[cl-ppcre-devel] Detecting "partial" matches
Dan Debertin
airboss at nodewarrior.org
Wed Apr 26 20:46:35 UTC 2006
Hi,
I'm using CL-PPCRE to develop a character-at-a-time lexer. This is
causing me some perplexity, though, with regexes like the common
notation for hexadecimal literals:
"^0(?:x[0-9A-Fa-f]+)?$"
This should match the string "0" between positions 0 and 1, as
just a bare literal zero, and should also match things like "0xa6"
between positions 0 and 4, but should not match simply "0x". But I want
the longest match possible, so (for example) I'd like to know that while
"0x" didn't match, parts of the regex *did* match and might produce a
"real" match depending on what comes after "x".
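As a quick illustration of the match positions described above, here is a minimal sketch that uses Python's standard `re` module as a stand-in for CL-PPCRE. The helper name `match_span` is made up for this example. It only reports whether the whole string matches and where, which is the kind of answer a plain regex scan gives.

```python
import re

# Same pattern as in the message; Python's `re` stands in for CL-PPCRE here.
HEX_LITERAL = re.compile(r"^0(?:x[0-9A-Fa-f]+)?$")


def match_span(text):
    """Return (start, end) of the match, end exclusive, or None if no match."""
    m = HEX_LITERAL.search(text)
    return m.span() if m else None


for text in ["0", "0xa6", "0x"]:
    # "0" and "0xa6" match in full; "0x" does not match at all.
    print(repr(text), match_span(text))
```

Note that the failed scan of "0x" carries no hint that more input could still rescue it.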
So, in succession, if the input is "0xa6 ", my scanner gets called thus:
1. Input: "0". a) A match. b) But it *could* possibly match more, depending on
what comes next.
2. Input: "0x". a) Not a match. b) But, once again, the possibility
exists that more input could still produce a longer match than "0".
3. Input: "0xa". a) A match. b) Because of the "+" attached to the
character class, a longer match is still possible.
4. Input: "0xa6". a) A match. b) As above.
5. Input: "0xa6 ". a) Not a match. b) Will *never* match no matter how
much more input you add to it.
CL-PPCRE just tells me a), and I also want to know b). Is there any way
to get this information (if it even exists) out of the scanner?
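To make a) and b) concrete, here is a rough sketch of the two answers for each step listed above. It is written in Python rather than Lisp, it does not use CL-PPCRE or any of its internals, and it is a hand-written state machine for this one regex only. The names `step`, `classify`, and the state labels are hypothetical, chosen just for this illustration.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

# Hand-written automaton for ^0(?:x[0-9A-Fa-f]+)?$ (state names are made up):
#   START --'0'--> ZERO --'x'--> X --hex digit--> HEX --hex digit--> HEX
ACCEPTING = {"ZERO", "HEX"}


def step(state, ch):
    """Return the next state, or None when no transition exists."""
    if state == "START" and ch == "0":
        return "ZERO"
    if state == "ZERO" and ch == "x":
        return "X"
    if state in ("X", "HEX") and ch in HEX_DIGITS:
        return "HEX"
    return None


def classify(text):
    """Return (is_match, can_extend) for the input seen so far.

    is_match   -- answer a): the input matches as it stands.
    can_extend -- answer b): some longer input could still match.
    """
    state = "START"
    for ch in text:
        state = step(state, ch)
        if state is None:
            # Dead: no amount of further input can produce a match.
            return (False, False)
    # Every live state here has a path to an accepting state,
    # so a longer match is always still possible.
    return (state in ACCEPTING, True)


for prefix in ["0", "0x", "0xa", "0xa6", "0xa6 "]:
    print(repr(prefix), classify(prefix))
```

For a general regex the second answer would have to come from the matcher's own automaton or backtracking state. The shortcut of treating every live state as extendable only works because this particular machine has no dead ends other than the failure case.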
TIA,
-Dan
--
Dan Debertin
airboss at nodewarrior.org
www.nodewarrior.org