Saturday, June 27, 2009

Solutions to Chapter 10 (p. 254)

First, an apology: no answers to the questions in Chapter 9 following those in page 228. Why, you ask? Because there's a bug in the authors' code leading to the questions on page 232; their implementation of foldTree compiles, but is broken (try it on non-shallow hierarchy trees to see why). While I don't mind trying to answer the exercises, debugging the authors' code is a bit too much for my patience.

(For the record, I am losing patience with this book; too many bugs, too many errors, too much hand-waving and unclear explanations.)

And now, answers to the exercises of Chapter 10:

#1. It's pretty trivial to modify parseRawPGM to parsePlainPGM; the modified lines are in bold:

parseAsciiPGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P2") "invalid ASCII header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey -> skipSpaces ==>&
    parseNats (width * height) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")

The main challenge is defining parseNats (plural), which repeatedly uses parseNat (singular) to construct a ByteString:

parseNats :: Int -> Parse L.ByteString
parseNats 0 = identity L.empty
parseNats n = parseNat ==> \v -> (L.cons (fromIntegral v)) <$> parseNats (n - 1)

#2. Let's start by thinking about how will the parsed data be stored. We can replace the type of the greyData field in Greymap, from ByteString to [Int] for example. Or, we can keep it a ByteString, but always allocate two bytes per value. Or, we can keep it a ByteString, and have the ByteString's size vary based on the greyMax value. The last option is the most space-efficient, which is why I've chosen it for the following implementation (changes from the original are in bold):

parseRawPGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P5") "invalid raw header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey ->
    parseByte ==>&
    parseBytes (width * height * (pixelSize maxGrey)) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")

pixelSize :: Int -> Int
pixelSize maxGrey = if maxGrey < 256 then 1 else 2

Why make pixelSize a top-level function, rather than put it in the where part of parseRawPGM? Because we'll want to use it elsewhere; the new greymap representation is best used with accessor functions. Here's a function that will read the pixel at position x, y:

getPixel :: Greymap -> Int -> Int -> Int
getPixel (Greymap w h greyMax greyData) x y
    | pixSize == 1 = firstByte
    | pixSize == 2 = (256 * firstByte) + secondByte
  where pixSize = pixelSize greyMax
        offset = fromIntegral $ (y * w + x) * pixSize
        dataFromOffset = L8.drop offset greyData
        firstByte = ord $ L8.head dataFromOffset
        secondByte = ord $ L8.head $ L8.tail dataFromOffset

Of course, scanning the entire image using this function would be mighty inefficient; a Greymap -> [[Int]] function can make life easier.

#3. One alternative is to peek into the data, and if switch to either parseRawPGM or parsePlainPGM, based on the file's header. However, the two functions are similar enough that I decided to create a single, unified function, thus:

parsePGM =
    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&
    assert (header == "P5" || header == "P2") "invalid PGM header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxGrey -> (skipAfterMaxGrey header) ==>&
    (parseData header) (width * height * (valuesPerPixel header maxGrey)) ==> \bitmap ->
    identity (Greymap width height maxGrey bitmap)
  where notWhite = (`notElem` " \r\n\t")
        isRaw header = header == "P5"
        skipAfterMaxGrey header = if (isRaw header) then skipByte else skipSpaces
        parseData header = if (isRaw header) then parseBytes else parseNats
        valuesPerPixel header maxGrey = if (isRaw header) then (pixelSize maxGrey) else 1

skipByte :: Parse ()
skipByte = parseByte ==>& identity ()

Basically, in every location where the two parsers would behave differently, I've used a helper function. The helper function often returns which function to call, as the data passed to the function is identical in both cases; thus, parseData returns either parseBytes (for raw files) or parseNats (for plain files), and skipAfterMaxGrey returns either skipByte or skipSpaces.

Ever wondered why the raw parser used parseByte rather than skipSpaces after the grey-max value? Because the actual value bytes could happen to be, say, 8 (tab) or 32 (space), and this would cause them to be erroneously skipped. I had to define skipByte here since parseByte has a different return type than skipSpaces, so the helper function skipAfterMaxGrey couldn't be used to pick between them. skipByte has the same type as skipSpaces, which solves the problem.

(There's a subtle bug here, since the function defined in solution #1 above wasn't updated with the change dictated by solution #2. Can you fix that?)


  1. Could you give an example how to use the parse functions on some data? I wanted to do the exercises but I can't even start with the original parseRawPGM functions. I tried:

    parseRawPGM "foo"
    parseRawPGM (L8.pack "foo")
    parseRawPGM (ParseState (L8.pack "foo"))
    parseRawPGM (ParseState (L8.pack "foo") 0)

    but nothing seems to work and the error messages are very cryptic :(

  2. OK, I figured it out by myself :) parseRawPGM should be passed to parse function, like that:

    ghci> parse parseRawPGM (L8.pack "P5 2 2 255\n1003")
    Right Greymap 2x2 255

  3. Thank you Dr. Elkstein for providing these solutions online. They have been indispensable.

    There is a slight gap in parseNats as defined. It does not skip the spaces between pixels in the raster.

    As is done in the static header parsing parseNats should be using skipSpaces and the chaining combinator (==>&) to achieve this.

    parseNats :: Int -> Parse L.ByteString
    parseNats 0 = identity L.empty
    parseNats n = parseNat ==> \pixel -> skipSpaces ==>& ((L.cons (fromIntegral pixel)) <$> parseNats (n - 1))