#1. Simply evaluating globToRegex "[" gives the answer: an error is raised. There are two interesting things to note here. First, there are two different places that can flag this error: globToRegex', when the '[' is the last character in the string; or charClass, when the character class is nonempty but not terminated. Second, and more interesting: due to lazy evaluation, the error is flagged only after the part of the regexp string that precedes it has been generated and printed. See, for example, the response from ghci here:
GlobRegex> globToRegex "whatnow?["
"^whatnow.*** Exception: unterminated character class
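To see the laziness without relying on how ghci prints, here is a minimal sketch. globToRegexSketch is a hypothetical, cut-down stand-in for the book's translator (it does no escaping of regex metacharacters), and forceAll is a helper made up for this illustration; only the order of evaluation matters here.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical cut-down translator: same error behaviour as the original,
-- but without escaping of regex metacharacters.
globToRegexSketch :: String -> String
globToRegexSketch cs = '^' : go cs ++ "$"
  where
    go ""           = ""
    go ('*':rest)   = ".*" ++ go rest
    go ('?':rest)   = '.' : go rest
    go ('[':c:rest) = '[' : c : charClass rest
    go ('[':_)      = error "unterminated character class"
    go (c:rest)     = c : go rest

    charClass (']':rest) = ']' : go rest
    charClass (c:rest)   = c : charClass rest
    charClass []         = error "unterminated character class"

-- Force the whole string and report either its length or the error message.
forceAll :: String -> IO (Either String Int)
forceAll s = do
  r <- try (evaluate (length s))
  return $ case r of
    Left (ErrorCall msg) -> Left msg
    Right n              -> Right n
```

Taking only the first nine characters, take 9 (globToRegexSketch "whatnow?["), yields "^whatnow." because the erroring tail is never demanded. forceAll (globToRegexSketch "whatnow?[") walks the whole list and so comes back with Left "unterminated character class".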
#2. There's probably something in POSIX regular expressions that allows for case insensitivity, but of course that's not interesting.
Let's add a new Bool parameter, which should be True if case is to be ignored. We'll also add a new helper function, escapeCase, which returns "xX" for any cased character x if case is ignored, and just "x" otherwise. escapeCase is used in two different contexts: inside a character class (a [...] block), we need a simple replacement, but outside of a character class, we need to create one, i.e., replace x with [xX]. Because escape is used to process any character outside of a character class block, it is the perfect place to handle this. Here goes:
module GlobRegex (globToRegex, matchesGlob, matchesGlobIgnoreCase) where
import Text.Regex.Posix ((=~))
import Data.Char (toUpper, toLower)
globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"
globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""
globToRegex' ('*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "." ++ globToRegex' cs ign
globToRegex' ('[':'!':c:cs) ign = "[^" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':c:cs) ign = "[" ++ (escapeCase c ign) ++ (charClass cs ign)
globToRegex' ('[':_) _ = error "unterminated character class"
globToRegex' (c:cs) ign = (escape c ign) ++ globToRegex' cs ign
escape :: Char -> Bool -> String
escape c _ | c `elem` regexChars = '\\' : [c]
    where regexChars = "\\+()^$.{}]"
escape c False = [c]
escape c True = '[' : (escapeCase c True) ++ "]"
escapeCase :: Char -> Bool -> String
escapeCase c True | lowerC /= upperC = [lowerC, upperC]
    where upperC = toUpper c
          lowerC = toLower c
escapeCase c _ = [c]
charClass :: String -> Bool -> String
charClass (']':cs) ign = ']' : globToRegex' cs ign
charClass (c:cs) ign = escapeCase c ign ++ charClass cs ign
charClass _ _ = error "unterminated character class"
matchesGlob :: FilePath -> String -> Bool
f `matchesGlob` g = f =~ globToRegex g False
matchesGlobIgnoreCase :: FilePath -> String -> Bool
f `matchesGlobIgnoreCase` g = f =~ globToRegex g True
Trying it out in ghci:
Prelude> :load "GlobRegex.hs"
[1 of 1] Compiling GlobRegex ( GlobRegex.hs, interpreted )
Ok, modules loaded: GlobRegex.
*GlobRegex> globToRegex "hello" False
"^hello$"
*GlobRegex> globToRegex "hello" True
"^[hH][eE][lL][lL][oO]$"
*GlobRegex> globToRegex "HELLO" True
"^[hH][eE][lL][lL][oO]$"
*GlobRegex> globToRegex "foo[bar]" True
"^[fF][oO][oO][bBaArR]$"
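One case the session above doesn't exercise is a range inside a character class. The sketch below copies the translation functions from the module (leaving out the Text.Regex.Posix matching, so it runs with base alone) and checks what happens to "[a-z]". Each endpoint is expanded on its own, so the range becomes aA-zZ. Assuming an ASCII-ordered locale, the A-z part of that spans more than letters. This is a check of the code as written, not a fix.

```haskell
import Data.Char (toLower, toUpper)

globToRegex :: String -> Bool -> String
globToRegex cs ign = '^' : globToRegex' cs ign ++ "$"

globToRegex' :: String -> Bool -> String
globToRegex' "" _ = ""
globToRegex' ('*':cs) ign = ".*" ++ globToRegex' cs ign
globToRegex' ('?':cs) ign = "." ++ globToRegex' cs ign
globToRegex' ('[':'!':c:cs) ign = "[^" ++ escapeCase c ign ++ charClass cs ign
globToRegex' ('[':c:cs) ign = "[" ++ escapeCase c ign ++ charClass cs ign
globToRegex' ('[':_) _ = error "unterminated character class"
globToRegex' (c:cs) ign = escape c ign ++ globToRegex' cs ign

escape :: Char -> Bool -> String
escape c _ | c `elem` regexChars = '\\' : [c]
  where regexChars = "\\+()^$.{}]"
escape c False = [c]
escape c True = '[' : escapeCase c True ++ "]"

escapeCase :: Char -> Bool -> String
escapeCase c True | lowerC /= upperC = [lowerC, upperC]
  where upperC = toUpper c
        lowerC = toLower c
escapeCase c _ = [c]

charClass :: String -> Bool -> String
charClass (']':cs) ign = ']' : globToRegex' cs ign
charClass (c:cs) ign = escapeCase c ign ++ charClass cs ign
charClass _ _ = error "unterminated character class"

-- The range endpoints are expanded independently of the '-' between them.
rangeExample :: String
rangeExample = globToRegex "[a-z]" True

-- In ASCII order, the characters from 'A' to 'z' include some punctuation.
underscoreInRange :: Bool
underscoreInRange = '_' `elem` ['A' .. 'z']
```

Here rangeExample evaluates to "^[aA-zZ]$" and underscoreInRange is True, so a range glob with case ignored may match a few non-letters such as '_'. Handling ranges separately in charClass would be one way around that.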