[cl-ppcre-devel] Detecting "partial" matches

Wed Apr 26 20:46:35 UTC 2006

Hi,

I'm using CL-PPCRE to develop a character-at-a-time lexer, and I'm
stuck on regexes like this one, which matches either a bare zero or a
hexadecimal literal in the common notation:

"^0(?:x[0-9A-Fa-f]+)?$"

This should match the string "0" between positions 0 and 1, as a bare
literal zero, and should also match things like "0xa6" between positions
0 and 4, but should not match "0x" on its own. But I want the longest
match possible, so (for example) I'd like to know that although "0x"
didn't match, part of the regex *did* match and might still produce a
"real" match depending on what comes after the "x".

So, in succession, if the input is "0xa6 ", my scanner gets called thus:

1. Input: "0". a) A match. b) But it *could* match more, depending on
   what comes next.
2. Input: "0x". a) Not a match. b) But, once again, the possibility
   exists that more input could still produce a longer match than "0".
3. Input: "0xa". a) A match. b) Because of the "+" attached to the
   character class, a longer match is still possible.
4. Input: "0xa6". a) A match. b) As above.
5. Input: "0xa6 ". a) Not a match. b) Will *never* match no matter how
   much more input you add to it.
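
For concreteness, here is a minimal sketch of how the a) answers above
can be reproduced by handing CL-PPCRE each successive prefix of the
input. The names *hex-scanner* and prefix-matches are made up for this
illustration, and the load form assumes CL-PPCRE is installed somewhere
ASDF can find it.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system :cl-ppcre)

;; The regex from above, compiled once. (Illustrative name.)
(defparameter *hex-scanner*
  (cl-ppcre:create-scanner "^0(?:x[0-9A-Fa-f]+)?$"))

;; Illustrative helper: scan every prefix of INPUT, one character longer
;; each time, the way a character-at-a-time lexer would.
(defun prefix-matches (input)
  "Return a list of (PREFIX MATCH-START MATCH-END) for each prefix of INPUT.
MATCH-START and MATCH-END are NIL when the prefix does not match."
  (loop for end from 1 to (length input)
        for prefix = (subseq input 0 end)
        collect (multiple-value-bind (match-start match-end)
                    (cl-ppcre:scan *hex-scanner* prefix)
                  (list prefix match-start match-end))))

(prefix-matches "0xa6 ")
;; => (("0" 0 1) ("0x" NIL NIL) ("0xa" 0 3) ("0xa6" 0 4) ("0xa6 " NIL NIL))
```

Note that "0x" and "0xa6 " both come back as plain NIL, so steps 2 and 5
are indistinguishable from the return values alone.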

CL-PPCRE just tells me a), and I also want to know b). Is there any way
to get this information (if it even exists) out of the scanner? 

TIA,

-Dan
--
Dan Debertin             | 
airboss at nodewarrior.org  | 
www.nodewarrior.org      |