[cl-ppcre-devel] Matching on very long strings.

Matthew D. Swank akopa.gmane.poster at gmail.com
Sun Sep 28 20:28:07 UTC 2008


On Sun, 28 Sep 2008 21:31:05 +0200
Edi Weitz <edi at agharta.de> wrote:

> On Sun, 28 Sep 2008 14:15:40 -0500, "Matthew D. Swank"
> <akopa.gmane.poster at gmail.com> wrote:
> 
> > I tried using a construct like `(:sequence :start-anchor (:regex
> > ,regex)), where regex is a PCRE string, but matching still takes
> > forever (as in, I gave up after 10 min) when slurping a moderately
> > sized file (400k).  Note that matching works fine for files under 1k,
> > or if I break the input up into lines for line-oriented input.
> 
> Show us the regex you were using and some test data and then maybe we
> can help you to optimize it.
> 
> I suppose you read this?
> 
>   http://weitz.de/cl-ppcre/#blabla
> 
> Edi.

Well, the regexes are defined in the lexers in this file:
http://common-lisp.net/~mswank/apache-ppcre.lisp

The lexer API is in this file:
http://common-lisp.net/~mswank/cl-ppcre-lexer.lisp

Finally, the log file I'm lexing:
http://lcpug.asternix.com/pub/Main/ApacheLogProject/access.log

Compare

(with-open-file (in "access.log")
  (let ((foo (stream-gen *apache-pcrelex-line* in)))
    (time (loop :for x := (funcall foo)
                :unless x :return nil))))

with

(with-open-file (in "access.log")
  (let ((foo (stream-gen *apache-pcrelex* in)))
    (time (loop :for x := (funcall foo)
                :unless x :return nil))))

When I slurp the entire file into a string, each match seems to take
about a tenth of a second per token.
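
For reference, the slurped-string case boils down to roughly the
following.  This is only a minimal sketch, not the code from
cl-ppcre-lexer.lisp: SLURP-FILE, MAKE-ANCHORED-SCANNER and TOKENIZE are
hypothetical names made up for illustration, and it assumes CL-PPCRE
can be loaded through ASDF.  It also assumes that :START-ANCHOR matches
at the :START offset passed to SCAN, and it returns a simple string
from the slurp because the CL-PPCRE documentation says SCAN coerces
non-simple target strings to simple strings.

```lisp
;; Assumption: CL-PPCRE is installed somewhere ASDF can find it.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cl-ppcre))

(defun slurp-file (path)
  "Read the whole file at PATH into a fresh simple string (hypothetical helper)."
  (with-open-file (in path)
    (let* ((buffer (make-string (file-length in)))
           (count (read-sequence buffer in)))
      ;; FILE-LENGTH may exceed the number of characters actually read
      ;; (e.g. with multi-byte encodings), so trim to what was read.
      (subseq buffer 0 count))))

(defun make-anchored-scanner (regex)
  "Wrap the regex string REGEX so that it may only match where the scan starts."
  (cl-ppcre:create-scanner `(:sequence :start-anchor (:regex ,regex))))

(defun tokenize (string regex)
  "Return the consecutive matches of REGEX in STRING, beginning at index 0."
  (let ((scanner (make-anchored-scanner regex))
        (tokens '())
        (pos 0))
    (loop
      (multiple-value-bind (match-start match-end)
          (cl-ppcre:scan scanner string :start pos)
        ;; Stop when nothing matches at POS, or when the match is empty
        ;; (an empty match would never advance POS).
        (when (or (null match-start) (= match-end pos))
          (return (nreverse tokens)))
        (push (subseq string match-start match-end) tokens)
        (setf pos match-end)))))
```

Something like (tokenize (slurp-file "access.log") some-regex) would
then walk the whole file in one string, advancing :START after each
token instead of taking substrings.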


Matt

-- 
"You do not really understand something unless you can explain it to
your grandmother." -- Albert Einstein.


