Saturday, June 27, 2009

Solutions to Chapter 10 (p. 254)

First, an apology: no answers to the questions in Chapter 9 following those in page 228. Why, you ask? Because there's a bug in the authors' code leading to the questions on page 232; their implementation of foldTree compiles, but is broken (try it on non-shallow hierarchy trees to see why). While I don't mind trying to answer the exercises, debugging the authors' code is a bit too much for my patience.

(For the record, I am losing patience with this book; too many bugs, too many errors, too much hand-waving and unclear explanations.)

And now, answers to the exercises of Chapter 10:

#1. It's pretty trivial to modify parseRawPGM to parsePlainPGM; the modified lines are in bold:

parseAsciiPGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P2") "invalid ASCII header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey -> skipSpaces ==>&
    parseNats (width * height) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")

The main challenge is defining parseNats (plural), which repeatedly uses parseNat (singular) to construct a ByteString:

parseNats :: Int -> Parse L.ByteString
parseNats 0 = identity L.empty
parseNats n = parseNat ==> \v -> (L.cons (fromIntegral v)) <$> parseNats (n - 1)

#2. Let's start by thinking about how will the parsed data be stored. We can replace the type of the greyData field in Greymap, from ByteString to [Int] for example. Or, we can keep it a ByteString, but always allocate two bytes per value. Or, we can keep it a ByteString, and have the ByteString's size vary based on the greyMax value. The last option is the most space-efficient, which is why I've chosen it for the following implementation (changes from the original are in bold):

parseRawPGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P5") "invalid raw header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey ->
    parseByte ==>&
    parseBytes (width * height * (pixelSize maxGrey)) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")

pixelSize :: Int -> Int
pixelSize maxGrey = if maxGrey < 256 then 1 else 2

Why make pixelSize a top-level function, rather than put it in the where part of parseRawPGM? Because we'll want to use it elsewhere; the new greymap representation is best used with accessor functions. Here's a function that will read the pixel at position x, y:

getPixel :: Greymap -> Int -> Int -> Int
getPixel (Greymap w h greyMax greyData) x y
    | pixSize == 1 = firstByte
    | pixSize == 2 = (256 * firstByte) + secondByte
  where pixSize = pixelSize greyMax
        offset = fromIntegral $ (y * w + x) * pixSize
        dataFromOffset = L8.drop offset greyData
        firstByte = ord $ L8.head dataFromOffset
        secondByte = ord $ L8.head $ L8.tail dataFromOffset

Of course, scanning the entire image using this function would be mighty inefficient; a Greymap -> [[Int]] function can make life easier.

#3. One alternative is to peek into the data, and if switch to either parseRawPGM or parsePlainPGM, based on the file's header. However, the two functions are similar enough that I decided to create a single, unified function, thus:

parsePGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P5" || header == "P2") "invalid PGM header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey -> (skipAfterMaxGrey header) ==>&
    (parseData header) (width * height * (valuesPerPixel header maxGrey)) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")
        isRaw header = header == "P5"
        skipAfterMaxGrey header = if (isRaw header) then skipByte else skipSpaces
        parseData header = if (isRaw header) then parseBytes else parseNats
        valuesPerPixel header maxGrey = if (isRaw header) then (pixelSize maxGrey) else 1

skipByte :: Parse ()
skipByte = parseByte ==>& identity ()

Basically, in every location where the two parsers would behave differently, I've used a helper function. The helper function often returns which function to call, as the data passed to the function is identical in both cases; thus, parseData returns either parseBytes (for raw files) or parseNats (for plain files), and skipAfterMaxGrey returns either skipByte or skipSpaces.

Ever wondered why the raw parser used parseByte rather than skipSpaces after the grey-max value? Because the actual value bytes could happen to be, say, 8 (tab) or 32 (space), and this would cause them to be erroneously skipped. I had to define skipByte here since parseByte has a different return type than skipSpaces, so the helper function skipAfterMaxGrey couldn't be used to pick between them. skipByte has the same type as skipSpaces, which solves the problem.

(There's a subtle bug here, since the function defined in solution #1 above wasn't updated with the change dictated by solution #2. Can you fix that?)

Friday, June 5, 2009

Solutions to Chapter 9 (p. 228)

#1. The Info type derives Ord; and because FilePath is its first field, a list of Infos will be sorted by name. Therefore, we simply need to pass (reverse . sort) to get reverse alphabetic order.

#2. First, let's try to understand why using id yields preorder traversal. The key line in ControlledVisit.hs is bolded below:

traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]
traverse order path = do
    names <- getUsefulContents path
    contents <- mapM getInfo (path : map (path </>) names)
    liftM concat $ forM (order contents) $ \info -> do
      if isDirectory info && infoPath info /= path
        then traverse order (infoPath info)
        else return [info]

I.e., the contents to traverse is constructed as the path itself, followed by its content (loaded into names a line earlier). To change to postorder, we can modify this single line, thus:

traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]
traverse order path = do
    names <- getUsefulContents path
    contents <- mapM getInfo ((map (path </>) names) ++ [path])
    liftM concat $ forM (order contents) $ \info -> do
      if isDirectory info && infoPath info /= path
        then traverse order (infoPath info)
        else return [info]

But of course, that's not what the question is about: we should provide an order function that does this for us, rather than modify traverse directly! All this function has to do is move the first item to the list's end, so:

postorder :: [Info] -> [Info]
postorder ns = tail ns ++ [head ns]

#3. Explanations to this solution appear throughout the solution itself, as comments:

-- The InfoP type is the type of a function that
-- extracts data from an Info record:
type InfoP a = Info -> a

-- pathP extracts the path; but we already got a
-- function that does that, by virtue of Info's
-- definition as a record; so pathP is just an alias.
pathP :: InfoP FilePath
pathP = infoPath

-- For sizeP, we can't use infoSize as-is, since
-- we must handle the "Nothing" case:
sizeP :: InfoP Integer
sizeP inf = case (infoSize inf) of
              Just size -> size
              Nothing   -> -1

-- We don't really need equalP: We'll be using a lift
-- function in a minute. But it helps us formulate
-- what this lift function should look like.
equalP :: (Eq a) => InfoP a -> a -> InfoP Bool
equalP f k inf = f inf == k

-- Now, liftP abstracts the equality test from
-- equalP into any test, passed as the first argument:
liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c
liftP op f k inf = f inf `op` k

-- And we can use liftP to define greaterP, lesserP
-- (could also have used it to define equalP):
greaterP, lesserP :: (Ord a) => InfoP a -> a -> InfoP Bool
greaterP = liftP (>)
lesserP = liftP (<)
-- Note that these three definitions are identical to
-- the ones in the book (p. 223) -- but we're using
-- a different definition of liftP.

-- Now, to define andP and orP. We begin (as in the book,
-- p. 224) with simpleAndP, which helps us formulate
-- what the lifting function should look like:
simpleAndP :: InfoP Bool -> InfoP Bool -> InfoP Bool
simpleAndP f g inf = f inf && g inf

-- Let's lift the "&&" operator out of this definition:
liftP2 :: (a -> b -> c) -> InfoP a -> InfoP b -> InfoP c
liftP2 op f g inf = f inf `op` g inf

andP, orP :: InfoP Bool -> InfoP Bool -> InfoP Bool
andP = liftP2 (&&)
orP = liftP2 (||)

-- constP is almost unchanged (cf. p. 224), except
-- that our representation of an info record is a
-- single Info entry, vs. four entries in the old
-- representation:
constP :: a -> InfoP a
constP k _ = k

-- This is liftP, rewritten in terms of liftP2:
liftP' op f k inf = f inf `op` constP k

-- Lifting paths: convert a function that operates
-- on pathnames to one that operates on Info 
-- records:
liftPath :: (FilePath -> a) -> InfoP a
liftPath f = f . pathP

-- Now, the myTest2 function, as defined on p. 225,
-- should work unmodified using our new predicates
-- and combinators:
myTest2 = (liftPath takeExtension `equalP` ".cpp") `andP`
          (sizeP `greaterP` 131072)

-- To conclude, by defining infix operators, we can
-- have the definitions of myTest3 (p. 225) and myTest4
-- (p. 226), again unmodified:
(==?) = equalP
(&&?) = andP
(>?) = greaterP

myTest3 = (liftPath takeExtension ==? ".cpp") &&? (sizeP >? 131072)

infix 4 ==?
infixr 3 &&?
infix 4 >?

myTest4 = liftPath takeExtension ==? ".cpp" &&? sizeP >? 131072

#4. Because of the way traverse is defined, a directory must be part of the output if it is to be traversed. So, the traversal predicate is used to filter directories, and the filtering predicate applies to files only:

-- The first parameter is a predicate that should return true
-- only for directories that we wish to traverse. The second
-- parameter is a predicate that is used to filter files (i.e.,
-- return false for files to be removed from the output).
traverse2 :: (Info -> Bool) -> (Info -> Bool) -> FilePath -> IO [Info]
traverse2 travP filterP path = traverse (order2 travP filterP) path

order2 :: (Info -> Bool) -> (Info -> Bool) -> ([Info] -> [Info])
order2 tP fP infs = filter (\x -> (isDirectory x && tP x) || fP x) infs

Now, to find all mp3 files outside of my music directory, I can use:

traverse2 (not . liftPath (isInfixOf "Audio")) (liftPath (isInfixOf "mp3")) "c:\\"

(The liftPath function was defined in the solution to question #3 above.)