[cl-ppcre-devel] Matching on very long strings.

Sébastien Saint-Sevin seb-cl-mailist at matchix.com
Mon Sep 29 11:16:29 UTC 2008


Hi Matthew,

Your "line-oriented" approach and your "full file in one string" approach are probably not doing the
same work.

With the full file in one string, unless you take care to stop the scan at the end of each line
(which you would want for line-by-line scanning, as your line-oriented attempt suggests), I guess
you are scanning to the end of the full string for each line, which is certainly very expensive.

But that's just a guess, as I've only had a very quick look at your code :-)
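
To illustrate the idea, here is a minimal sketch of bounding a scan to the current line with the
:start and :end keyword arguments of cl-ppcre:scan. The helper name scan-line-at is made up for
this example (it is not part of your lexer or of CL-PPCRE), and I assume Quicklisp is available
to load the library.

```lisp
;; Assumption: Quicklisp is installed; load this file with LOAD or at the REPL.
(ql:quickload :cl-ppcre :silent t)

;; SCAN-LINE-AT is a hypothetical helper, not part of CL-PPCRE.
(defun scan-line-at (regex string start)
  "Scan REGEX against STRING beginning at START, but never past the end of
the line that contains START. Returns three values: match start, match end
(both NIL when nothing matches) and the index where the line ends."
  (let ((line-end (or (position #\Newline string :start start)
                      (length string))))
    ;; :END keeps the scanner from walking the rest of the file
    ;; when the current line cannot match.
    (multiple-value-bind (match-start match-end)
        (cl-ppcre:scan regex string :start start :end line-end)
      (values match-start match-end line-end))))

;; A match on the first line:
(scan-line-at "\\d+" (format nil "abc 123~%def 456~%") 0)
;; => 4, 7, 7

;; No digits on the first line, so the scan gives up at index 3
;; instead of finding "456" on the second line:
(scan-line-at "\\d+" (format nil "abc~%def 456~%") 0)
;; => NIL, NIL, 3
```

Whether this fits your lexer depends on how stream-gen advances through the string, which I have
not checked.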

Cheers,
Sebastien.

Matthew D. Swank wrote:
> On Sun, 28 Sep 2008 21:31:05 +0200
> Edi Weitz <edi at agharta.de> wrote:
> 
>> On Sun, 28 Sep 2008 14:15:40 -0500, "Matthew D. Swank"
>> <akopa.gmane.poster at gmail.com> wrote:
>>
>>> I tried using a construct like `(:sequence :start-anchor (:regex
>>> ,regex)) where regex is a pcre string, but matching still takes
>>> forever (as in I gave up after 10 min) when slurping a moderately sized
>>> file (400k).  Note, matching works fine for files under 1k, or if I
>>> break it up into lines for line oriented input.
>> Show us the regex you were using and some test data and then maybe we
>> can help you to optimize it.
>>
>> I suppose you read this?
>>
>>   http://weitz.de/cl-ppcre/#blabla
>>
>> Edi.
> 
> Well, the regexes are defined in the lexers in this file:
> http://common-lisp.net/~mswank/apache-ppcre.lisp
> 
> The lexer api is in this file:
> http://common-lisp.net/~mswank/cl-ppcre-lexer.lisp
> 
> Finally, the log file I'm lexing:
> http://lcpug.asternix.com/pub/Main/ApacheLogProject/access.log
> 
> Compare 
> (with-open-file  (in "access.log") 
>   (let ((foo (stream-gen *apache-pcrelex-line* in))) 
>     (time (loop :for x := (funcall foo)
>                 :unless x :return nil))))
> 
> with
> 
> (with-open-file  (in "access.log") 
>   (let ((foo (stream-gen *apache-pcrelex* in))) 
>     (time (loop :for x := (funcall foo)
>                 :unless x :return nil))))
> 
> When I slurp the entire file into a string the matches seem to be
> taking about a tenth of a second for each token.
> 
> 
> Matt
> 


