#1. The order clearly matters. If the handle only applied to the internal of the bracketed operation, then errors in the bracketing itself -- namely, the openFile and hClose operations -- would not have been handled by it.
Thursday, May 28, 2009
Wednesday, May 27, 2009
Solutions to Chapter 8 (p. 212)
#1. For a fantastic discussion of implementing a simple regular-expression matcher, see the first chapter of Beautiful Code -- probably the most beautiful chapter of that whole book. And as for the solution:
matchesGlob2 :: FilePath -> String -> Bool
matchesGlob2 fp "" = null fp
matchesGlob2 (c:fp) ('?':pat) = matchesGlob2 fp pat
matchesGlob2 fp ('*':pat) = matchStar fp pat
matchesGlob2 fp ('[':pat) = matchCharClass fp pat
matchesGlob2 (c:fp) (p:pat) = (c == p) && matchesGlob2 fp pat
matchesGlob2 "" _ = False
matchCharClass :: FilePath -> String -> Bool
-- string is list of chars to match ++ "]" ++ rest of pattern.
-- if starts with "!", negate meaning.
matchCharClass _ pat | not (']' `elem` pat) = error "Unmatched '['"
matchCharClass (c:fp) ('!':pat) = (c `notElem` (getCharClass pat)) && matchesGlob2 fp (afterCharClass pat)
matchCharClass (c:fp) pat = (c `elem` (getCharClass pat)) && matchesGlob2 fp (afterCharClass pat)
getCharClass :: String -> String
-- assumes pat contains ']'
getCharClass pat = takeWhile (\x -> x /= ']') pat
afterCharClass :: String -> String
-- assumes pat contains ']'
afterCharClass pat = tail $ dropWhile (\x -> x /= ']') pat -- tail drops ']'
matchStar :: FilePath -> String -> Bool
matchStar "" pat = matchesGlob2 "" pat
matchStar fp pat = matchesGlob2 fp pat || matchStar (tail fp) pat
Tuesday, May 26, 2009
Solutions to Chapter 8 (p. 211)
#1. Contrary to the authors' claim, this one is pretty simple -- if you define a helper function. Here, the function prepend either prepends a string to a Right value, or propagates unchanged a Left value. We then use it to update all equations defining globToRegex' and charClass. The main function, globToRegex, is also updated to handle either case.
(Note: the update is applied to the original version of globToRegex, not to the modified versions created in previous exercises.)
module GlobRegex (globToRegex, matchesGlob) where
import Text.Regex.Posix ((=~))
type GlobError = String
globToRegex :: String -> Either GlobError String
globToRegex cs = case internal of
Left error -> Left error
Right val -> Right ('^' : val ++ "$")
where internal = globToRegex' cs
globToRegex' :: String -> Either GlobError String
globToRegex' "" = Right ""
globToRegex' ('*':cs) = prepend ".*" (globToRegex' cs)
globToRegex' ('?':cs) = prepend "." (globToRegex' cs)
globToRegex' ('[':'!':c:cs) = prepend ("[^" ++ [c]) (charClass cs)
globToRegex' ('[':c:cs) = prepend ['[', c] (charClass cs)
globToRegex' ('[':_) = Left "unterminated character class"
globToRegex' (c:cs) = prepend (escape c) (globToRegex' cs)
escape :: Char -> String
escape c | c `elem` regexChars = '\\' : [c]
| otherwise = [c]
where regexChars = "\\+()^$.{}]"
charClass :: String -> Either GlobError String
charClass (']':cs) = prepend "]" (globToRegex' cs)
charClass (c:cs) = prepend [c] (charClass cs)
charClass _ = Left "unterminated character class"
prepend :: String -> Either GlobError String -> Either GlobError String
prepend prefix (Left error) = Left error
prepend prefix (Right str) = Right (prefix ++ str)
#2. namesMatching never uses globToRegex directly; it uses matchesGlob, which we update thus:
matchesGlob :: FilePath -> String -> Bool
f `matchesGlob` g = case (globToRegex g) of
Right regex -> f =~ regex
Left err -> False
In other words, a bad glob expression simply matches nothing.
Saturday, May 23, 2009
Solutions to Chapter 8 (p. 210), question 3
#3. What a tricky question!
No OS shell that I'm aware of interprets "**" the way the authors suggest, but I decided to give it a go anyway. I'll present the solution in steps.
The first part is changing globToRegex. We want "*" to not match anything with a path-separator (slash or backslash, depending on your OS) in it; and we want "**" do to match such strings. So, editing GlobRegex.hs, with changes bolded:
module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where
import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
import import System.FilePath (pathSeparator)
globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"
globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""
globToRegex' ('*':'*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('*':cs) ign = "[^" ++ [pathSeparator] ++ "]*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "[^]" ++ globToRegex' cs ign
globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"
globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign
... the rest of the file is unchanged -- copied from answer #2 here...
What we're going to do, for "**" patterns, is pretty inefficient: collect all possible filepaths, and then filter out the matching ones. Starting top-down, we modify the namesMatching function (based on the definition from the previous post):
import Data.List (isInfixOf)
isPattern :: String -> Bool
isPattern = any (`elem` "[*?")
isDoubleStarPattern :: String -> Bool
isDoubleStarPattern = isInfixOf "**"
namesMatching :: String -> IO [FilePath]
namesMatching pat
| not (isPattern pat) = do
exists <- doesNameExist pat
return (if exists then [pat] else [])
| isDoubleStarPattern pat = do
names <- allFilenames
return (filter (flip matchesGlob pat) names)
| otherwise = do
case splitFileName pat of
... continues unchanged ...
The allFilenames pattern does just what its name suggests: collects all possible filepaths, working recursively from the current directory. Like this:
allFilenames :: Char -> IO [FilePath]
allFilenames = do
contents <- getDirectoryContents "."
allFilenames' (return []) "" contents
allFilenames' :: IO [FilePath] -> FilePath -> [FilePath] -> IO [FilePath]
-- The first argument (acc) is the result accumulator.
-- The second argument (pref) is the path prefix to apply to all fetched filenames.
-- The third argument (f:fs) is a stack of file or directory names to process.
allFilenames' acc _ [] = acc
allFilenames' acc pref (f:fs)
| (f == ".") = allFilenames' acc pref fs
| (f == "..") = allFilenames' acc pref fs
| otherwise = do
dirExists <- doesDirectoryExist (pref ++ f)
dirCont <- if dirExists
then getDirectoryContents (pref ++ f)
else return []
if dirExists
then do -- f is a directory, recurse into it
restOfCurrent <- (allFilenames' acc pref fs)
recurse <- (allFilenames' (return []) (pref ++ f ++ [pathSeparator]) dirCont)
return (restOfCurrent ++ recurse)
else do -- f is a file, add it to the accumulator and proceed
acc' <- acc
allFilenames' (return ((pref ++ f):acc')) pref fs
This works just fine. (A much neater version is presented early in Chapter 9.) The solution, however, has a few limitations:
- It does not correctly handle patterns starting with "/" (or "\" under Windows).
- If it tries to recurse into a directory where the current user has no read access, it fails.
- It is not case-insensitive under Windows.
allFilenames :: Char -> IO [FilePath]
allFilenames fst = do
contents <- getDirectoryContents start
allFilenames' (return []) prefix contents
where start = if fst == pathSeparator then [pathSeparator] else "."
prefix = if fst == pathSeparator then [pathSeparator] else ""
We have to patch the call to this function accordingly, in namesMatching:
namesMatching :: String -> IO [FilePath]
namesMatching pat
| not (isPattern pat) = do
exists <- doesNameExist pat
return (if exists then [pat] else [])
| isDoubleStarPattern pat = do --xx
names <- allFilenames $ head pat
return (filter (flip matchesGlob pat) names)
... continues unchanged ...
The other limitations are left as an exercise to the reader.
As noted in the beginning, this solution collects all names first, and filters on matchesGlob later. A probably-faster alternative is to include the filtering in allFilenames', so that when each item is added for addition to the accumulator (acc), we actually test it for a match before adding. A more effective and elegant solution is presented by the authors in Chapter 9.
Tuesday, May 19, 2009
Solutions to Chapter 8 (p. 210), questions 1-2
First, if you're having trouble loading Glob.hs, that's probably because you're using a too-new version of GHC. Try replacing the line
import Control.Exception (handle)
with
import Control.OldException (handle)
to use a backwards-compatible version of handle, which solves this problem.
#1. The trick to detecting the current OS is examining the path-separator character. It's '/' on Unix and Unix-like systems, and '\\' (a backslash) on Windows.
I'll be using the GlobRegex module which was updated in the previous post, which includes the matchesGlobIgnoreCase function. Here's the updated Glog.hs; changed or new lines are in bold.
import System.Directory (doesDirectoryExist, doesFileExist,
getCurrentDirectory, getDirectoryContents)
import System.FilePath (dropTrailingPathSeparator, splitFileName, (>))
import Control.OldException (handle)
import Control.Monad (forM)
import GlobRegex (matchesGlob, matchesGlobIgnoreCase)
import System.FilePath (pathSeparator)
isPattern :: String -> Bool
isPattern = any (`elem` "[*?")
namesMatching :: String -> IO [FilePath]
namesMatching pat
| not (isPattern pat) = do
exists <- doesNameExist pat
return (if exists then [pat] else [])
| otherwise = do
case splitFileName pat of
("", baseName) -> do
curDir <- getCurrentDirectory
listMatches curDir baseName
(dirName, baseName) -> do
dirs <- if isPattern dirName
then namesMatching (dropTrailingPathSeparator dirName)
else return [dirName]
let listDir = if isPattern baseName
then listMatches
else listPlain
pathNames <- forM dirs $ \dir -> do
baseNames <- listDir dir baseName
return (map (dir >) baseNames)
return (concat pathNames)
doesNameExist :: FilePath -> IO Bool
doesNameExist name = do
fileExists <- doesFileExist name
if fileExists then return True else doesDirectoryExist name
listMatches :: FilePath -> String -> IO [FilePath]
listMatches dirName pat = do
dirName' <- if null dirName
then getCurrentDirectory
else return dirName
let matcher = if isOsCaseInsensitive
then matchesGlobIgnoreCase
else matchesGlob
handle (const (return [])) $ do
names <- getDirectoryContents dirName'
let names' = if isHidden pat
then filter isHidden names
else filter (not . isHidden) names
return (filter (`matcher` pat) names')
isHidden ('.':_) = True
isHidden _ = False
listPlain :: FilePath -> String -> IO [FilePath]
listPlain dirName baseName = do
exists <- if null baseName
then doesDirectoryExist dirName
else doesNameExist (dirName > baseName)
return (if exists then [baseName] else [])
-- Only the case-insensitive Windows uses backlash as a path separator
isOsCaseInsensitive :: Bool
isOsCaseInsensitive = pathSeparator == '\\'
#2. The System.Posix.Files module includes the function getFileStatus, with the signature FilePath -> IO FileStatus. There's also the function isDirectory :: FileStatus -> Bool, so getFileStatus obviously works for directories, too.
Of course, using an OS-specific function where OS-agnostic code can do is a poor idea.
Solutions to Chapter 8 (p. 205)
#1. Simply evaluating globToRegex "[" gives the answer - error is evaluated. There are two interesting things to note here: first, there are two different places that can flag this error: in globalToRegex', which happens if the '[' is the last character in the string; or in charClass, which happens if the character class is nonempty but not terminated. Second, and more interesting: due to lazy evaluation, the error is flagged only after the rest of the regexp string was generated and printed. See, for example, the response from ghci here:
GlobRegex> globToRegex "whatnow?[" "^whatnow.*** Exception: unterminated character class
#2. There's probably something in POSIX regular expressions that allows for case insensitivity, but of course that's not interesting.
Let's add a new Bool parameter, which should be True if case is to be ignored. We'll also add a new helper function, escapeCase, which can return "xX" for any character x if case is ignored. escapeCase is used in two different contexts: inside a character class (a [...] block), we need a simple replacement, but outside of a character class, we need to create one, i.e., replace x with [xX]. Because escape is used to process any character outside of a character class block, it is the perfect case to handle this. Here goes:
module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where
import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"
globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""
globToRegex' ('*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "." ++ globToRegex' cs ign
globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"
globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign
escape :: Char -> Bool -> String
escape c _ | c `elem` regexChars = '\\' : [c]
where regexChars = "\\+()^$.{}]"
escape c False = [c]
escape c True = '[' : (escapeCase c True) ++ "]"
escapeCase :: Char -> Bool -> String
escapeCase c True | lowerC /= upperC = [lowerC, upperC]
where upperC = toUpper c
lowerC = toLower c
escapeCase c _ = [c]
charClass :: String -> Bool -> String
charClass (']':cs) ign = ']' : globToRegex' cs ign
charClass (c:cs) ign = escapeCase c ign ++ charClass cs ign
charClass _ _ = error "unterminated character class"
matchesGlob :: FilePath -> String -> Bool
f `matchesGlob` g = f =~ globToRegex g False
matchesGlobIgnoreCase :: FilePath -> String -> Bool
f `matchesGlobIgnoreCase` g = f =~ globToRegex g True
Trying it out in ghci:
Prelude> :load "GlobRegex.hs" [1 of 1] Compiling GlobRegex ( GlobRegex.hs, interpreted ) Ok, modules loaded: GlobRegex. *GlobRegex> globToRegex "hello" False "^hello$" *GlobRegex> globToRegex "hello" True "^[hH][eE][lL][lL][oO]$" *GlobRegex> globToRegex "HELLO" True "^[hH][eE][lL][lL][oO]$" *GlobRegex> globToRegex "foo[bar]" True "^[fF][oO][oO][bBaArR]$"
Sunday, May 17, 2009
Solutions to Chapter 6
(Yes, I know I've skipped Chapter 5. It was a bit too boring; I might return to it later.)
#1. Whoa, isn't this a bit early for that? You can understand what second does by reading the previous example with great care, and trying to understand what it should do. The tricky part here is that Control.Arrow introduces the Arrow class, which acts just like Haskell's syntactical arrow: "->".
Let's use ghci to find out second's type:
Prelude> :module Control.Arrow Prelude Control.Arrow> :type second second :: (Arrow a) => a b c -> a (d, b) (d, c)
Definitely confusing. But let's replace a, of class Arrow, with a "simple" arrow:
second :: (->) b c -> (->) (d, b) (d, c)
(I've underlined the changes to make reading easier.) Moving to infix notation (think of "arrow" as an operator), we get:
second :: b -> c -> (d, b) -> (d, c)
This clarifies everything (really!). The first argument is a function, from type b to type c. The second argument is a pair: a d and a b. The function second can't really do anything with the d: it knows nothing about the type. But it can convert the b to a c, using the b->c function which is the first argument. And indeed, the result is a d (the first part of the second argument, unmodified) and a c (the second part of the second argument, after we've applied the function to it).
Confusing? Let's see this in action:
Prelude Control.Arrow> second head ("hello","world")
("hello",'w')
Prelude Control.Arrow> second tail ("hello","world")
("hello","orld")
Prelude Control.Arrow> second (head.tail.tail) ("hello","world")
("hello",'r')
Prelude Control.Arrow> second (+1) (10,20)
(10,21)
In all cases, the function provided is applied to the second element of the pair, while the first element is left untouched.
#2. To find something's type, just ask ghci:
Prelude> :type (,) (,) :: a -> b -> (a, b) Prelude> :type (,,) (,,) :: a -> b -> c -> (a, b, c)
So these are operators that build tuples.