Saturday, May 23, 2009

Solutions to Chapter 8 (p. 210), question 3

#3. What a tricky question!

No OS shell that I'm aware of interprets "**" the way the authors suggest, but I decided to give it a go anyway. I'll present the solution in steps.

The first part is changing globToRegex. We want "*" to not match anything with a path-separator (slash or backslash, depending on your OS) in it; and we want "**" do to match such strings. So, editing GlobRegex.hs, with changes bolded:

module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where

import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
import import System.FilePath (pathSeparator)

globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"

globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""

globToRegex' ('*':'*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('*':cs) ign = "[^" ++ [pathSeparator] ++ "]*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "[^]" ++ globToRegex' cs ign

globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"

globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign

... the rest of the file is unchanged -- copied from answer #2 here...

What we're going to do, for "**" patterns, is pretty inefficient: collect all possible filepaths, and then filter out the matching ones. Starting top-down, we modify the namesMatching function (based on the definition from the previous post):

import Data.List (isInfixOf)

isPattern :: String -> Bool
isPattern = any (`elem` "[*?")

isDoubleStarPattern :: String -> Bool
isDoubleStarPattern = isInfixOf "**"

namesMatching :: String -> IO [FilePath]
namesMatching pat
  | not (isPattern pat) = do
      exists <- doesNameExist pat
      return (if exists then [pat] else [])
  | isDoubleStarPattern pat = do
      names <- allFilenames
      return (filter (flip matchesGlob pat) names)
  | otherwise = do
      case splitFileName pat of
        ... continues unchanged ...

The allFilenames pattern does just what its name suggests: collects all possible filepaths, working recursively from the current directory. Like this:

allFilenames :: Char -> IO [FilePath]
allFilenames = do
  contents <- getDirectoryContents "."
  allFilenames' (return []) "" contents

allFilenames' :: IO [FilePath] -> FilePath -> [FilePath] -> IO [FilePath]
-- The first argument (acc) is the result accumulator.
-- The second argument (pref) is the path prefix to apply to all fetched filenames.
-- The third argument (f:fs) is a stack of file or directory names to process.
allFilenames' acc _ [] = acc
allFilenames' acc pref (f:fs)
  | (f == ".") = allFilenames' acc pref fs
  | (f == "..") = allFilenames' acc pref fs
  | otherwise = do
      dirExists <- doesDirectoryExist (pref ++ f)
      dirCont <- if dirExists
                 then getDirectoryContents (pref ++ f)
                 else return []
      if dirExists 
         then do -- f is a directory, recurse into it
           restOfCurrent <- (allFilenames' acc pref fs)
           recurse <- (allFilenames' (return []) (pref ++ f ++ [pathSeparator]) dirCont)
           return (restOfCurrent ++ recurse)
         else do -- f is a file, add it to the accumulator and proceed
           acc' <- acc
           allFilenames' (return ((pref ++ f):acc')) pref fs

This works just fine. (A much neater version is presented early in Chapter 9.) The solution, however, has a few limitations:

  • It does not correctly handle patterns starting with "/" (or "\" under Windows).
  • If it tries to recurse into a directory where the current user has no read access, it fails.
  • It is not case-insensitive under Windows.
Let's fix the first problem first. By making allFilenames accept a single character -- the head of the glob pattern -- as a parameter, we can make it start from "." or from "/" (or "\"), as the case might be:

allFilenames :: Char -> IO [FilePath]
allFilenames fst = do
  contents <- getDirectoryContents start
  allFilenames' (return []) prefix contents
  where start = if fst == pathSeparator then [pathSeparator] else "."
        prefix = if fst == pathSeparator then [pathSeparator] else ""

We have to patch the call to this function accordingly, in namesMatching:

namesMatching :: String -> IO [FilePath]
namesMatching pat
  | not (isPattern pat) = do
      exists <- doesNameExist pat
      return (if exists then [pat] else [])
  | isDoubleStarPattern pat = do  --xx
      names <- allFilenames $ head pat
      return (filter (flip matchesGlob pat) names)
... continues unchanged ...

The other limitations are left as an exercise to the reader.

As noted in the beginning, this solution collects all names first, and filters on matchesGlob later. A probably-faster alternative is to include the filtering in allFilenames', so that when each item is added for addition to the accumulator (acc), we actually test it for a match before adding. A more effective and elegant solution is presented by the authors in Chapter 9.

1 comment: