[cl-ppcre-devel] Matching on very long strings.
Matthew D. Swank
akopa.gmane.poster at gmail.com
Sun Sep 28 20:28:07 UTC 2008
On Sun, 28 Sep 2008 21:31:05 +0200
Edi Weitz <edi at agharta.de> wrote:
> On Sun, 28 Sep 2008 14:15:40 -0500, "Matthew D. Swank"
> <akopa.gmane.poster at gmail.com> wrote:
>
> > I tried using a construct like `(:sequence :start-anchor (:regex
> > ,regex)) where regex is a pcre string, but matching still takes
> > forever (as in I gave up after 10 min) when slurping a moderately sized
> > file (400k). Note, matching works fine for files under 1k, or if I
> > break it up into lines for line-oriented input.
>
> Show us the regex you were using and some test data and then maybe we
> can help you to optimize it.
>
> I suppose you read this?
>
> http://weitz.de/cl-ppcre/#blabla
>
> Edi.
Well, the regexes are defined in the lexers in this file:
http://common-lisp.net/~mswank/apache-ppcre.lisp
The lexer api is in this file:
http://common-lisp.net/~mswank/cl-ppcre-lexer.lisp
Finally, the log file I'm lexing:
http://lcpug.asternix.com/pub/Main/ApacheLogProject/access.log
Compare
(with-open-file (in "access.log")
  (let ((foo (stream-gen *apache-pcrelex-line* in)))
    (time (loop :for x := (funcall foo)
                :unless x :return nil))))
with
(with-open-file (in "access.log")
  (let ((foo (stream-gen *apache-pcrelex* in)))
    (time (loop :for x := (funcall foo)
                :unless x :return nil))))
When I slurp the entire file into a string, the matches seem to take
about a tenth of a second per token.
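To make the per-token figure easier to reproduce, here is a minimal sketch of the measurement in plain Common Lisp. SLURP-FILE and TIME-TOKENS are hypothetical helpers written for illustration; they are not part of cl-ppcre or of the lexer files linked above, and the matcher is passed in as a function, so the sketch assumes nothing about how the lexer calls the regex engine.

```lisp
;; Sketch only: SLURP-FILE and TIME-TOKENS are hypothetical helpers,
;; not part of cl-ppcre or of the lexer files linked above.

(defun slurp-file (path)
  "Read the whole file at PATH into a single string."
  (with-open-file (in path)
    (let* ((buffer (make-string (file-length in)))
           (end (read-sequence buffer in)))
      ;; FILE-LENGTH may count bytes rather than characters, so the
      ;; buffer can be longer than what was read; trim to what was read.
      (subseq buffer 0 end))))

(defun time-tokens (match-fn string)
  "Call MATCH-FN repeatedly over STRING. Return the token count and the
average wall-clock seconds per token. MATCH-FN takes (STRING START) and
returns the end position of the token found from START, or NIL."
  (let ((start 0)
        (count 0)
        (t0 (get-internal-real-time)))
    (loop
      (let ((end (and (< start (length string))
                      (funcall match-fn string start))))
        ;; Stop on no match, or on an empty match that would never advance.
        (when (or (null end) (<= end start))
          (return))
        (incf count)
        (setf start end)))
    (let ((seconds (/ (- (get-internal-real-time) t0)
                      internal-time-units-per-second)))
      (values count
              (if (zerop count) 0 (float (/ seconds count)))))))
```

With cl-ppcre loaded, the matcher could be something like (lambda (s start) (nth-value 1 (cl-ppcre:scan scanner s :start start))), where scanner is assumed to come from cl-ppcre:create-scanner. Note that this lambda does not check that the match begins exactly at start. Running TIME-TOKENS on prefixes of the slurped string of increasing length should show whether the cost per token grows with the size of the string.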
Matt
--
"You do not really understand something unless you can explain it to
your grandmother." -- Albert Einstein.