Thursday, May 28, 2009

Solutions to Chapter 9 (p. 221)

#1. The order clearly matters. If the handle only applied to the internal of the bracketed operation, then errors in the bracketing itself -- namely, the openFile and hClose operations -- would not have been handled by it.

Wednesday, May 27, 2009

Solutions to Chapter 8 (p. 212)

#1. For a fantastic discussion of implementing a simple regular-expression matcher, see the first chapter of Beautiful Code -- probably the most beautiful chapter of that whole book. And as for the solution:

matchesGlob2 :: FilePath -> String -> Bool
matchesGlob2 fp "" = null fp
matchesGlob2 (c:fp) ('?':pat) = matchesGlob2 fp pat
matchesGlob2 fp ('*':pat) = matchStar fp pat
matchesGlob2 fp ('[':pat) = matchCharClass fp pat
matchesGlob2 (c:fp) (p:pat) = (c == p) && matchesGlob2 fp pat
matchesGlob2 "" _ = False

matchCharClass :: FilePath -> String -> Bool
 -- string is list of chars to match ++ "]" ++ rest of pattern.
 -- if starts with "!", negate meaning.
matchCharClass _ pat | not (']' `elem` pat) = error "Unmatched '['"
matchCharClass (c:fp) ('!':pat) = (c `notElem` (getCharClass pat)) && matchesGlob2 fp (afterCharClass pat)
matchCharClass (c:fp) pat = (c `elem` (getCharClass pat)) && matchesGlob2 fp (afterCharClass pat)

getCharClass :: String -> String
 -- assumes pat contains ']'
getCharClass pat = takeWhile (\x -> x /= ']') pat

afterCharClass :: String -> String
 -- assumes pat contains ']'
afterCharClass pat = tail $ dropWhile (\x -> x /= ']') pat  -- tail drops ']'

matchStar :: FilePath -> String -> Bool
matchStar "" pat = matchesGlob2 "" pat
matchStar fp pat = matchesGlob2 fp pat || matchStar (tail fp) pat

Tuesday, May 26, 2009

Solutions to Chapter 8 (p. 211)

#1. Contrary to the authors' claim, this one is pretty simple -- if you define a helper function. Here, the function prepend either prepends a string to a Right value, or propagates unchanged a Left value. We then use it to update all equations defining globToRegex' and charClass. The main function, globToRegex, is also updated to handle either case.

(Note: the update is applied to the original version of globToRegex, not to the modified versions created in previous exercises.)

module GlobRegex (globToRegex, matchesGlob) where

import Text.Regex.Posix ((=~))

type GlobError = String

globToRegex :: String -> Either GlobError String
globToRegex cs = case internal of
    Left error -> Left error
    Right val  -> Right ('^' : val ++ "$")
  where internal = globToRegex' cs

globToRegex' :: String -> Either GlobError String
globToRegex' "" = Right ""

globToRegex' ('*':cs) = prepend ".*" (globToRegex' cs)
globToRegex' ('?':cs) = prepend "." (globToRegex' cs)

globToRegex' ('[':'!':c:cs) = prepend ("[^" ++ [c]) (charClass cs)
globToRegex' ('[':c:cs) = prepend ['[', c] (charClass cs)
globToRegex' ('[':_) = Left "unterminated character class"

globToRegex' (c:cs) = prepend (escape c) (globToRegex' cs)

escape :: Char -> String
escape c | c `elem` regexChars = '\\' : [c]
         | otherwise = [c]
  where regexChars = "\\+()^$.{}]"

charClass :: String -> Either GlobError String
charClass (']':cs) = prepend "]" (globToRegex' cs)
charClass (c:cs) = prepend [c] (charClass cs)
charClass _ = Left "unterminated character class"

prepend :: String -> Either GlobError String -> Either GlobError String
prepend prefix (Left error) = Left error
prepend prefix (Right str)  = Right (prefix ++ str)

#2. namesMatching never uses globToRegex directly; it uses matchesGlob, which we update thus:

matchesGlob :: FilePath -> String -> Bool
f `matchesGlob` g = case (globToRegex g) of
    Right regex -> f =~ regex
    Left err    -> False

In other words, a bad glob expression simply matches nothing.

Saturday, May 23, 2009

Solutions to Chapter 8 (p. 210), question 3

#3. What a tricky question!

No OS shell that I'm aware of interprets "**" the way the authors suggest, but I decided to give it a go anyway. I'll present the solution in steps.

The first part is changing globToRegex. We want "*" to not match anything with a path-separator (slash or backslash, depending on your OS) in it; and we want "**" do to match such strings. So, editing GlobRegex.hs, with changes bolded:

module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where

import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
import import System.FilePath (pathSeparator)

globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"

globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""

globToRegex' ('*':'*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('*':cs) ign = "[^" ++ [pathSeparator] ++ "]*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "[^]" ++ globToRegex' cs ign

globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"

globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign

... the rest of the file is unchanged -- copied from answer #2 here...

What we're going to do, for "**" patterns, is pretty inefficient: collect all possible filepaths, and then filter out the matching ones. Starting top-down, we modify the namesMatching function (based on the definition from the previous post):

import Data.List (isInfixOf)

isPattern :: String -> Bool
isPattern = any (`elem` "[*?")

isDoubleStarPattern :: String -> Bool
isDoubleStarPattern = isInfixOf "**"

namesMatching :: String -> IO [FilePath]
namesMatching pat
  | not (isPattern pat) = do
      exists <- doesNameExist pat
      return (if exists then [pat] else [])
  | isDoubleStarPattern pat = do
      names <- allFilenames
      return (filter (flip matchesGlob pat) names)
  | otherwise = do
      case splitFileName pat of
        ... continues unchanged ...

The allFilenames pattern does just what its name suggests: collects all possible filepaths, working recursively from the current directory. Like this:

allFilenames :: Char -> IO [FilePath]
allFilenames = do
  contents <- getDirectoryContents "."
  allFilenames' (return []) "" contents

allFilenames' :: IO [FilePath] -> FilePath -> [FilePath] -> IO [FilePath]
-- The first argument (acc) is the result accumulator.
-- The second argument (pref) is the path prefix to apply to all fetched filenames.
-- The third argument (f:fs) is a stack of file or directory names to process.
allFilenames' acc _ [] = acc
allFilenames' acc pref (f:fs)
  | (f == ".") = allFilenames' acc pref fs
  | (f == "..") = allFilenames' acc pref fs
  | otherwise = do
      dirExists <- doesDirectoryExist (pref ++ f)
      dirCont <- if dirExists
                 then getDirectoryContents (pref ++ f)
                 else return []
      if dirExists 
         then do -- f is a directory, recurse into it
           restOfCurrent <- (allFilenames' acc pref fs)
           recurse <- (allFilenames' (return []) (pref ++ f ++ [pathSeparator]) dirCont)
           return (restOfCurrent ++ recurse)
         else do -- f is a file, add it to the accumulator and proceed
           acc' <- acc
           allFilenames' (return ((pref ++ f):acc')) pref fs

This works just fine. (A much neater version is presented early in Chapter 9.) The solution, however, has a few limitations:

  • It does not correctly handle patterns starting with "/" (or "\" under Windows).
  • If it tries to recurse into a directory where the current user has no read access, it fails.
  • It is not case-insensitive under Windows.
Let's fix the first problem first. By making allFilenames accept a single character -- the head of the glob pattern -- as a parameter, we can make it start from "." or from "/" (or "\"), as the case might be:

allFilenames :: Char -> IO [FilePath]
allFilenames fst = do
  contents <- getDirectoryContents start
  allFilenames' (return []) prefix contents
  where start = if fst == pathSeparator then [pathSeparator] else "."
        prefix = if fst == pathSeparator then [pathSeparator] else ""

We have to patch the call to this function accordingly, in namesMatching:

namesMatching :: String -> IO [FilePath]
namesMatching pat
  | not (isPattern pat) = do
      exists <- doesNameExist pat
      return (if exists then [pat] else [])
  | isDoubleStarPattern pat = do  --xx
      names <- allFilenames $ head pat
      return (filter (flip matchesGlob pat) names)
... continues unchanged ...

The other limitations are left as an exercise to the reader.

As noted in the beginning, this solution collects all names first, and filters on matchesGlob later. A probably-faster alternative is to include the filtering in allFilenames', so that when each item is added for addition to the accumulator (acc), we actually test it for a match before adding. A more effective and elegant solution is presented by the authors in Chapter 9.

Tuesday, May 19, 2009

Solutions to Chapter 8 (p. 210), questions 1-2

First, if you're having trouble loading Glob.hs, that's probably because you're using a too-new version of GHC. Try replacing the line

import Control.Exception (handle)

with

import Control.OldException (handle)

to use a backwards-compatible version of handle, which solves this problem.

#1. The trick to detecting the current OS is examining the path-separator character. It's '/' on Unix and Unix-like systems, and '\\' (a backslash) on Windows.

I'll be using the GlobRegex module which was updated in the previous post, which includes the matchesGlobIgnoreCase function. Here's the updated Glog.hs; changed or new lines are in bold.

import System.Directory (doesDirectoryExist, doesFileExist,
    getCurrentDirectory, getDirectoryContents)
import System.FilePath (dropTrailingPathSeparator, splitFileName, ())

import Control.OldException (handle)
import Control.Monad (forM)
import GlobRegex (matchesGlob, matchesGlobIgnoreCase)

import System.FilePath (pathSeparator)

isPattern :: String -> Bool
isPattern = any (`elem` "[*?")

namesMatching :: String -> IO [FilePath]
namesMatching pat
  | not (isPattern pat) = do
      exists <- doesNameExist pat
      return (if exists then [pat] else [])
  | otherwise = do
      case splitFileName pat of
        ("", baseName) -> do
          curDir <- getCurrentDirectory
          listMatches curDir baseName
        (dirName, baseName) -> do
          dirs <- if isPattern dirName
                  then namesMatching (dropTrailingPathSeparator dirName)
                  else return [dirName]
          let listDir = if isPattern baseName
                        then listMatches
                        else listPlain
          pathNames <- forM dirs $ \dir -> do
                         baseNames <- listDir dir baseName
                         return (map (dir ) baseNames)
          return (concat pathNames)

doesNameExist :: FilePath -> IO Bool
doesNameExist name = do
  fileExists <- doesFileExist name
  if fileExists then return True else doesDirectoryExist name

listMatches :: FilePath -> String -> IO [FilePath]
listMatches dirName pat = do
  dirName' <- if null dirName
              then getCurrentDirectory
              else return dirName
  let matcher = if isOsCaseInsensitive
                then matchesGlobIgnoreCase
                else matchesGlob
  handle (const (return [])) $ do
    names <- getDirectoryContents dirName'
    let names' = if isHidden pat
                 then filter isHidden names
                 else filter (not . isHidden) names
    return (filter (`matcher` pat) names')

isHidden ('.':_) = True
isHidden _ = False

listPlain :: FilePath -> String -> IO [FilePath]
listPlain dirName baseName = do
  exists <- if null baseName
            then doesDirectoryExist dirName
            else doesNameExist (dirName  baseName)
  return (if exists then [baseName] else [])

-- Only the case-insensitive Windows uses backlash as a path separator
isOsCaseInsensitive :: Bool
isOsCaseInsensitive = pathSeparator == '\\'

#2. The System.Posix.Files module includes the function getFileStatus, with the signature FilePath -> IO FileStatus. There's also the function isDirectory :: FileStatus -> Bool, so getFileStatus obviously works for directories, too.

Of course, using an OS-specific function where OS-agnostic code can do is a poor idea.

Solutions to Chapter 8 (p. 205)

#1. Simply evaluating globToRegex "[" gives the answer - error is evaluated. There are two interesting things to note here: first, there are two different places that can flag this error: in globalToRegex', which happens if the '[' is the last character in the string; or in charClass, which happens if the character class is nonempty but not terminated. Second, and more interesting: due to lazy evaluation, the error is flagged only after the rest of the regexp string was generated and printed. See, for example, the response from ghci here:

GlobRegex> globToRegex "whatnow?["
"^whatnow.*** Exception: unterminated character class

#2. There's probably something in POSIX regular expressions that allows for case insensitivity, but of course that's not interesting.

Let's add a new Bool parameter, which should be True if case is to be ignored. We'll also add a new helper function, escapeCase, which can return "xX" for any character x if case is ignored. escapeCase is used in two different contexts: inside a character class (a [...] block), we need a simple replacement, but outside of a character class, we need to create one, i.e., replace x with [xX]. Because escape is used to process any character outside of a character class block, it is the perfect case to handle this. Here goes:

module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where

import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)

globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"

globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""

globToRegex' ('*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "." ++ globToRegex' cs ign

globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"

globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign

escape :: Char -> Bool -> String
escape c _ | c `elem` regexChars = '\\' : [c]
  where regexChars = "\\+()^$.{}]"
escape c False = [c]
escape c True = '[' : (escapeCase c True) ++ "]"

escapeCase :: Char -> Bool -> String
escapeCase c True | lowerC /= upperC = [lowerC, upperC]
  where upperC = toUpper c
        lowerC = toLower c
escapeCase c _ = [c]

charClass :: String -> Bool -> String
charClass (']':cs) ign = ']' : globToRegex' cs ign
charClass (c:cs) ign = escapeCase c ign ++ charClass cs ign
charClass _ _ = error "unterminated character class"

matchesGlob :: FilePath -> String -> Bool
f `matchesGlob` g = f =~ globToRegex g False

matchesGlobIgnoreCase :: FilePath -> String -> Bool
f `matchesGlobIgnoreCase` g = f =~ globToRegex g True

Trying it out in ghci:

Prelude> :load "GlobRegex.hs"
[1 of 1] Compiling GlobRegex        ( GlobRegex.hs, interpreted )
Ok, modules loaded: GlobRegex.
*GlobRegex> globToRegex "hello" False
"^hello$"
*GlobRegex> globToRegex "hello" True
"^[hH][eE][lL][lL][oO]$"
*GlobRegex> globToRegex "HELLO" True
"^[hH][eE][lL][lL][oO]$"
*GlobRegex> globToRegex "foo[bar]" True
"^[fF][oO][oO][bBaArR]$"

Sunday, May 17, 2009

Solutions to Chapter 6

(Yes, I know I've skipped Chapter 5. It was a bit too boring; I might return to it later.)

#1. Whoa, isn't this a bit early for that? You can understand what second does by reading the previous example with great care, and trying to understand what it should do. The tricky part here is that Control.Arrow introduces the Arrow class, which acts just like Haskell's syntactical arrow: "->".

Let's use ghci to find out second's type:

Prelude> :module Control.Arrow
Prelude Control.Arrow> :type second
second :: (Arrow a) => a b c -> a (d, b) (d, c)

Definitely confusing. But let's replace a, of class Arrow, with a "simple" arrow:

second :: (->) b c -> (->) (d, b) (d, c)

(I've underlined the changes to make reading easier.) Moving to infix notation (think of "arrow" as an operator), we get:

second :: b -> c -> (d, b) -> (d, c)

This clarifies everything (really!). The first argument is a function, from type b to type c. The second argument is a pair: a d and a b. The function second can't really do anything with the d: it knows nothing about the type. But it can convert the b to a c, using the b->c function which is the first argument. And indeed, the result is a d (the first part of the second argument, unmodified) and a c (the second part of the second argument, after we've applied the function to it).

Confusing? Let's see this in action:

Prelude Control.Arrow> second head ("hello","world")
("hello",'w')
Prelude Control.Arrow> second tail ("hello","world")
("hello","orld")
Prelude Control.Arrow> second (head.tail.tail) ("hello","world")
("hello",'r')
Prelude Control.Arrow> second (+1) (10,20)
(10,21)

In all cases, the function provided is applied to the second element of the pair, while the first element is left untouched.

#2. To find something's type, just ask ghci:

Prelude> :type (,)
(,) :: a -> b -> (a, b)
Prelude> :type (,,)
(,,) :: a -> b -> c -> (a, b, c)

So these are operators that build tuples.