[cl-ppcre-devel] Re: cl-ppcre

Edi Weitz edi at agharta.de
Wed Jul 7 01:08:41 UTC 2004


Sorry for the delay, I had moved this email into the wrong IMAP
folder... :(

On Sun, 20 Jun 2004 12:07:08 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:

>   After more regexp experiments I found that the main difference
> between Perl and GNU Regexp is not the syntax of regexps (as I
> naively thought), but the definition of "the best match" (especially
> for the `|' alt node).
>
>   One can agree with the Perl man pages that the Perl definition may
> be better (and more comprehensible) for handwritten regexps. But is
> the "first match" strategy also better for writing lexers? I doubt it.
>
>   Consider languages where some word (token) can be a prefix of
> another word. This is not unusual: remember that in Lisp `12345' is
> a number and `12345a' is a symbol :)
>
>   When writing a "first match" lexer (and your deflexer macro is a
> "first match" lexer) one has to be careful with rule ordering and
> think about possible prefix ambiguity:

Yes. But if you prefer not to be careful you'll definitely sacrifice
performance...
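
To make the prefix ambiguity concrete, here is a minimal sketch. It uses
Python's `re` module as a stand-in, on the assumption that its ordered
alternation behaves like Perl's and CL-PPCRE's for this purpose. The
rule names, patterns, and the `first_match` helper are hypothetical and
are not taken from deflexer.

```python
import re

# Hypothetical token rules; names and patterns are illustrative only.
NUMBER = r"\d+"
SYMBOL = r"[0-9a-z]+"

def first_match(rules, text):
    """Return (name, lexeme) for the first rule that matches at position 0."""
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m:
            return name, m.group()
    return None

# Number rule first: the symbol `12345a` is cut short after its numeric prefix.
careless = [("number", NUMBER), ("symbol", SYMBOL)]

# One way to be careful: a number must not be followed by a symbol character.
careful = [("number", NUMBER + r"(?![0-9a-z])"), ("symbol", SYMBOL)]

print(first_match(careless, "12345a"))  # ('number', '12345')
print(first_match(careful, "12345a"))   # ('symbol', '12345a')
print(first_match(careful, "12345"))    # ('number', '12345')
```

The lookahead is only one possible fix; reordering the rules or making
the patterns mutually exclusive would serve the same purpose.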

>   My conclusion is that a 'success node is meaningful only for
> "longest match" regexp engines, because one can expect that such an
> engine could do better than matching all 'alt nodes in sequence and
> returning the longest match.
>
>   My new question is: how hard would it be to add a :longest-match
> option to create-scanner?

Pretty hard. This is not going to be done by me. However, if you
manage to add this yourself without breaking the rest of CL-PPCRE (and
without making it slower) I'll gladly accept your patches.
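
For comparison, the naive approach mentioned in the quoted text (match
every alternative in sequence and keep the longest result) can be
sketched outside the engine. This is only an illustration in Python,
not a proposal for how CL-PPCRE would implement such an option; the
`longest_match` helper is hypothetical, and it only handles top-level
alternatives, so alternations nested inside a pattern still follow
first-match rules.

```python
import re

def longest_match(alternatives, text, pos=0):
    """Try every alternative at `pos` and keep the one that consumes the most."""
    best = None
    for pattern in alternatives:
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or m.end() > best.end()):
            best = m
    return best

alts = [r"\d+", r"[0-9a-z]+"]

# Ordered alternation stops at the first alternative that succeeds.
first = re.match("|".join(alts), "12345a")
# The emulation runs all alternatives and compares their end positions.
longest = longest_match(alts, "12345a")

print(first.group())    # 12345
print(longest.group())  # 12345a
```

Every alternative is run on every call, which is the kind of
performance cost at stake here.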

> ps: I am not subscribed to cl-ppcre-devel mailing list. Please "Cc:"
> me your replies.

Subscribing to the list is easy and the list is low-volume. If you'd
like to continue this discussion please either subscribe to the list
or use it via nntp:

  <http://common-lisp.net/nntp.shtml>

Cheers,
Edi.



