[cl-ppcre-devel] Re: Question: PCRE -> Thompson NFA Implementation

Edi Weitz edi at agharta.de
Tue Feb 13 21:11:39 UTC 2007


Hi,

first of all:

1. Please use the mailing list.

     http://weitz.de/cl-ppcre/#mail

2. It's called CL-PPCRE and not PCRE.  PCRE is something else... :)

On Tue, 13 Feb 2007 12:23:36 -0800 (PST), Brent Fulgham <bfulg at pacbell.net> wrote:

> Although your Lisp implementation of a Perl-compatible regular
> expression engine already handily beats the original Perl version,
> it could be modified to be even faster for expressions that do not
> contain back-references.  See the following article that discusses
> the 1960's-era algorithm used in Awk/Grep that discusses this
> (http://swtch.com/~rsc/regexp/regexp1.html).
>
> In my testing, CL-PCRE isn't quite faster than Perl, though it makes
> a very creditable showing
> (http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna〈=all).
> Tcl, which uses the "Thompson DFA" algorithm discussed in the paper
> I referenced is nearly an order of magnitude faster on this
> benchmark than Perl.
>
> Please let me know if you have any interest exploring this.  I might
> try to play with this and see if I can make any headway...

I'm aware of the advantages of DFAs over NFAs for "simple" regular
expressions, but I shied away from them until now because having two
engines in CL-PPCRE would make the code base even bigger and more
complicated than it already is.  (And having /only/ a DFA engine
wouldn't be enough, right?  I haven't read the article yet, but I'm
pretty sure you'd have to let go some of Perl's more advanced regex
features.)

Also, although I boast about CL-PPCRE's performance on its web site,
I'm not too concerned about its speed anymore.  It's fast enough for
what I'm doing with it.

Having said that, the idea of automatically switching to a fast DFA
engine if possible (I guess this is what you want to do) is kind of
tempting.  If you come up with something that's really a big
improvement and adheres to CL-PPCREs current coding and documentation
standards, I'd be willing to review and possibly integrate it.  Right
now, I'm to busy to help with that, though.

Cheers,
Edi.



More information about the Cl-ppcre-devel mailing list