#3. What a tricky question!
No OS shell that I'm aware of interprets "**" the way the authors suggest, but I decided to give it a go anyway. I'll present the solution in steps.
The first part is changing globToRegex. We want "*" to not match anything with a path-separator (slash or backslash, depending on your OS) in it; and we want "**" do to match such strings. So, editing GlobRegex.hs, with changes bolded:
module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where
import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
import import System.FilePath (pathSeparator)
globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"
globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""
globToRegex' ('*':'*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('*':cs) ign = "[^" ++ [pathSeparator] ++ "]*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "[^]" ++ globToRegex' cs ign
globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"
globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign
... the rest of the file is unchanged -- copied from answer #2 here...
What we're going to do, for "**" patterns, is pretty inefficient: collect all possible filepaths, and then filter out the matching ones. Starting top-down, we modify the namesMatching function (based on the definition from the previous post):
import Data.List (isInfixOf)
isPattern :: String -> Bool
isPattern = any (`elem` "[*?")
isDoubleStarPattern :: String -> Bool
isDoubleStarPattern = isInfixOf "**"
namesMatching :: String -> IO [FilePath]
namesMatching pat
| not (isPattern pat) = do
exists <- doesNameExist pat
return (if exists then [pat] else [])
| isDoubleStarPattern pat = do
names <- allFilenames
return (filter (flip matchesGlob pat) names)
| otherwise = do
case splitFileName pat of
... continues unchanged ...
The allFilenames pattern does just what its name suggests: collects all possible filepaths, working recursively from the current directory. Like this:
allFilenames :: Char -> IO [FilePath]
allFilenames = do
contents <- getDirectoryContents "."
allFilenames' (return []) "" contents
allFilenames' :: IO [FilePath] -> FilePath -> [FilePath] -> IO [FilePath]
-- The first argument (acc) is the result accumulator.
-- The second argument (pref) is the path prefix to apply to all fetched filenames.
-- The third argument (f:fs) is a stack of file or directory names to process.
allFilenames' acc _ [] = acc
allFilenames' acc pref (f:fs)
| (f == ".") = allFilenames' acc pref fs
| (f == "..") = allFilenames' acc pref fs
| otherwise = do
dirExists <- doesDirectoryExist (pref ++ f)
dirCont <- if dirExists
then getDirectoryContents (pref ++ f)
else return []
if dirExists
then do -- f is a directory, recurse into it
restOfCurrent <- (allFilenames' acc pref fs)
recurse <- (allFilenames' (return []) (pref ++ f ++ [pathSeparator]) dirCont)
return (restOfCurrent ++ recurse)
else do -- f is a file, add it to the accumulator and proceed
acc' <- acc
allFilenames' (return ((pref ++ f):acc')) pref fs
This works just fine. (A much neater version is presented early in Chapter 9.) The solution, however, has a few limitations:
- It does not correctly handle patterns starting with "/" (or "\" under Windows).
- If it tries to recurse into a directory where the current user has no read access, it fails.
- It is not case-insensitive under Windows.
allFilenames :: Char -> IO [FilePath]
allFilenames fst = do
contents <- getDirectoryContents start
allFilenames' (return []) prefix contents
where start = if fst == pathSeparator then [pathSeparator] else "."
prefix = if fst == pathSeparator then [pathSeparator] else ""
We have to patch the call to this function accordingly, in namesMatching:
namesMatching :: String -> IO [FilePath]
namesMatching pat
| not (isPattern pat) = do
exists <- doesNameExist pat
return (if exists then [pat] else [])
| isDoubleStarPattern pat = do --xx
names <- allFilenames $ head pat
return (filter (flip matchesGlob pat) names)
... continues unchanged ...
The other limitations are left as an exercise to the reader.
As noted in the beginning, this solution collects all names first, and filters on matchesGlob later. A probably-faster alternative is to include the filtering in allFilenames', so that when each item is added for addition to the accumulator (acc), we actually test it for a match before adding. A more effective and elegant solution is presented by the authors in Chapter 9.
zsh implements '**' as given
ReplyDelete