Monday, April 27, 2009

Solutions to Chapter 4 (p. 97)

This exercises section has numerous mistakes and typos in it, including incorrect numbering (at least, that's the print I have). The numbers I'm using here are the "right" ones, not the ones that appear in print.

#1. It's foldl, of course, because in each step we have to multiply everything we've seen so far by 10 -- and "everything we've seen so far" refers to stuff on the left of the currently-added digit.

Because the minus sign can only appear as the first character, we'll use a special pattern just for it, before folding.

import Data.Char

asInt_fold :: String -> Int
asInt_fold ('-':xs) = -(asInt_fold xs)
asInt_fold xs = foldl step 0 xs
                where step x y = 10 * x + (digitToInt y)

#2. I can only assume the authors meant "" and "-" to be reported as errors. I also assume they wish us to use error directly, rather than rely on digitToInt doing the work for us. Finally, I assume we can't use isDigit, which wasn't introduced yet.

asInt_fold :: String -> Int
asInt_fold [] = error "Contains no digits"
asInt_fold ('-':xs) = -(asInt_fold xs)
asInt_fold xs = foldl step 0 xs
   where step x y | (y >= '0') && (y <= '9') = 10 * x + (digitToInt y)
                  | otherwise = error ("Not a digit '" ++ [y] ++ "'")

#3. (sigh) The signature of asInt_either is broken in the text. It should be:

type ErrorMessage = String
asInt_either :: String -> Either ErrorMessage Int

And of course, Either wasn't presented yet...! To add insult to injury, the book's index doesn't have the correct pointer to information about Either: you can read about it in page 210 (Chapter 8); or you can use the GHC online docs. To keep things simple, I will not be using the either function, just the Either type, with patterns and case to tell Left from Right:

type ErrorMessage = String

asInt_either :: String -> Either ErrorMessage Int
asInt_either [] = Left "Contains no digits"
asInt_either ('-':xs) = case inner of
                          Left err  -> Left err
                          Right val -> Right (-val)
   where inner = (asInt_either xs)
asInt_either xs = foldl step (Right 0) xs
   where step (Right x) y | (y >= '0') && (y <= '9') = Right (10 * x + (digitToInt y))
                          | otherwise = Left ("Not a digit '" ++ [y] ++ "'")
         step (Left err) _ = Left err

What makes the code so unattractive is the fact we have to deconstruct the answer at every folding step, and reconstruct it right back. If it was deconstructed to a Left, it stays a Left with the same error message; if it was deconstructed to a Right, we proceed normally in the step, but must wrap the result in a Right constructor again.

#4. Simply append the nested lists to each other:

myConcat :: [[a]] -> [a]
myConcat xs = foldr (++) [] xs

#5. Recursive:

takeWhile_rec :: (a -> Bool) -> [a] -> [a]
takeWhile_rec _ [] = []
takeWhile_rec pred (x:xs) | pred x    = x : takeWhile_rec pred xs
                          | otherwise = []

takeWhile is very similar to filter, except that if the predicate isn't met, we don't proceed (return an empty list instead of the accumulator):

takeWhile_foldr :: (a -> Bool) -> [a] -> [a]
takeWhile_foldr pred xs = foldr stepr [] xs
  where stepr x acc | pred x = x : acc
                    | otherwise = []

#6. What does groupBy do? The first argument is of type a -> a -> Bool, meaning it's a relation on the type a (for example, ">" is a relation on the integers; == is a relation on any comparable type; etc.) This, plus the function name, gives us a pretty strong hint. Let's check it out, in ghci:

Prelude> :module Data.List
Prelude Data.List> groupBy (>) [1,2,3,4,5,6,7]
[[1],[2],[3],[4],[5],[6],[7]]
Prelude Data.List> groupBy (>) [4,3,2,1]
[[4,3,2,1]]
Prelude Data.List> groupBy (>) [4,3,4,2,1]
[[4,3],[4,2,1]]
Prelude Data.List> groupBy (==) [4,3,4,2,1]
[[4],[3],[4],[2],[1]]
Prelude Data.List> groupBy (==) [1,1,1,2,1,1,2,2,2,1,2]
[[1,1,1],[2],[1,1],[2,2,2],[1],[2]]

Seems like it compares pairs of elements, and as long as each element fulfills the relation with the previous one, they are added to the same group. But then, something funny happens when you try other stuff:

*Main Data.List> groupBy (>) [4,2,3,1,3,5,2,3,2,1]
[[4,2,3,1,3],[5,2,3,2,1]]

Huh?

Well, apparently groupBy picks the first elements, and groups to it all elements that are in the relation with it; once an element not in the relationship with the first one is found, it is used to start a new group, and further elements will be compared to this head of the new group. And so on. So, again, each element is compared not with the preceding element, but with the first element (the head) of the group being constructed.

This only goes to show how important it is to use varied input sets when testing your code.

Now, let's see how this can be implemented using fold. Because each element is compared not to its direct neighbor, but to the first of its group, foldr makes little sense: seeing the rightmost element, we don't know what to compare it with! So, with foldl:

groupBy_foldl :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy_foldl p xs = foldl step [] xs
  where step [] x = [[x]]
        step acc x | p (head (last acc)) x = (init acc) ++ [((last acc) ++ [x])]
                   | otherwise             = acc ++ [[x]]

Here, the accumulator (acc) is a list of lists, and to get the first element of the last list we use head (last acc). I assume last isn't a very efficient function, so it might be more efficient to keep the last element of acc on its own, and append it to acc only when moving to a new sublist (or when we're done). This yields the less-readable function:

groupBy_foldl2 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy_foldl2 p xs = acc ++ [lastAcc]
  where (acc, lastAcc) = foldl step ([], []) xs
        step (acc, []) x = (acc, [x])
        step (acc, lastAcc) x | p (head lastAcc) x = (acc, lastAcc ++ [x])
                              | otherwise          = (acc ++ [lastAcc], [x])

... But while it's less readable, I'm not at all certain it's also faster. Besides, using fold is not the fastest way to implement groupBy anyway. My advice, as always, is to leave optimizations out of the game until you're certain that there actually is a performance problem.

#7. For any, we can use either left or right folding; here's a naive attempt:

any_foldl p xs = foldl step False xs
  where step acc x = acc || (p x)

any_foldr p xs = foldr step False xs
  where step x acc = acc || (p x)

Note that we are applying the predicate p to each x, and or-ing the results. So why not apply p using map?

any_foldl p xs = foldl (||) False (map p xs)

any_foldr p xs = foldr (||) False (map p xs)

Remember that Haskell is lazy, so this does not imply that we first apply p to all xs; rather, it is applied on a per-need basis. And indeed, this version is both shorter, easier to understand, and faster. And finally, the foldr version is far superior to the foldl one. To see why, ask yourself which of them allows Haskell to use short-circuit evaluation for the "or" operator.

For cycle... wait, cycle wasn't introduced yet. And it involves infinite lists, a concept which wasn't introduced yet, either. So we'll skip it for now.

For words, left and right versions both use an accumulator that includes the words collected so far, plus a boolean indicating if we're currently handling whitespace:

import Data.Char
 -- (for isSpace)

words_foldl :: String -> [String]
words_foldl ln = fst(foldl step ([],True) ln)
  where step (acc, True) x | isSpace x = (acc, True)
                           | otherwise = (acc ++ [[x]], False)
        step (acc, False) x | isSpace x = (acc, True)
                            | otherwise = ((init acc) ++ [(last acc) ++ [x]], False)

words_foldr :: String -> [String]
words_foldr ln = fst(foldr step ([],True) ln)
  where step x (acc, True) | isSpace x = (acc, True)
                           | otherwise = ([x] : acc, False)
        step x (acc, False) | isSpace x = (acc, True)
                            | otherwise = ((x : (head acc)) : tail acc, False)

Again, the right-folding version is better, this time since head and tail are preferred over init and last.

Finally, for unlines:

unlines_foldl :: [String] -> String
unlines_foldl lns = foldl step [] lns
  where step acc ln = acc ++ ln ++ "\n"

unlines_foldr :: [String] -> String
unlines_foldr lns = foldr step [] lns
  where step ln acc = ln ++ "\n" ++ acc

5 comments:

  1. At this point the book recommends Graham Hutton's paper:
    http://www.cs.nott.ac.uk/~gmh/fold.pdf

    ReplyDelete
  2. I think the answer to the first exercise is wrong, since it accepts input like "--0".

    ReplyDelete
  3. Hi freich,

    You're probably right. This can be solved, for example, by adding a helper function so that the first template calls the helper, and the helper doesn't further support a leading minus sign.

    ReplyDelete
  4. Your groupBy example is much is easier to understand but I didn't want to keep running init, head, tail on every element so I used a tuple to keep the accumulator, the current list and the list head (or comparison value). Not sure if its more efficient. I also wanted to mimic the alder32_try2 example.

    myGroupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    myGroupBy _ [] = []
    myGroupBy p (x:xs) = let (acc,ls,lsh) = foldl step ([],[x],x) xs
    in acc ++ [ls]
    where step (acc',ls',lsh') x'
    | p lsh' x' = (acc' , ls'++[x'], lsh')
    | otherwise = (acc'++[ls'], [x'] , x' )

    ReplyDelete
  5. In you "naive" any implementation, you should switch the arguments of ||, to allow short circuiting.

    ReplyDelete