From pete-cl-ppcre at kazmier.com Thu Jan 20 20:16:52 2005 From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com) Date: Thu, 20 Jan 2005 15:16:52 -0500 Subject: [cl-ppcre-devel] Need help with a slow regexp Message-ID: <20050120201652.GB2892@kazmier.com> Hello, Not sure if this is the appropriate forum as the email is not related to the development of cl-ppcre, but I did not find a list for users. Please feel free to redirect me elsewhere. I could use some help in figuring out why this regexp is so slow. As far as I can tell, there is nothing abnormal about it. I currently use the same regexp in python and its blazes through the input file. Bear in mind, this is the first time that I've used cl-ppcre. It is was an experiment to see if I could lisp for this little application. Here is the regexp (at least a small portion of it that exhibits the behavior I am seeing): ^(?:\\S+ ){7}(\\S+)\\s+- commAlarm Here is the input line it is matching against (note: this is a single line albeit a long one): 1105243660 11 Sun Jan 09 04:07:40 2005 sclax02.ibasis.net - commAlarm ovnyc00p.ov.i\vanet.net [1] private.enterprises.2496.1.1.5.5.1.0 (Integer): 0 [2] private.enterprises.\2496.1.1.5.5.2.0 (Integer): 115 [3] private.enterprises.2496.1.1.5.5.3.0 (OctetString): \ISUP: UNEX ANM [4] private.enterprises.2496.1.1.5.5.4.0 (OctetString): ISDN User Part Un\expected ANM [5] private.enterprises.2496.1.1.5.5.5.0 (Integer): 2 [6] private.enterpri\ses.2496.1.1.5.5.6.0 (Integer): 1 [7] private.enterprises.2496.1.1.5.5.7.0 (Integer): 1 \ [8] private.enterprises.2496.1.1.5.5.8.0 (Integer): 2 [9] private.enterprises.2496.1.1.\1.1.1.1.1.1.1.1376258 (Integer): 1376258 [10] private.enterprises.2496.1.1.1.1.1.1.1.1.2\.1376258 (Integer): 21 [11] private.enterprises.2496.1.1.1.1.1.1.1.1.4.1376258 (OctetStr\ing): ss7path-att [12] private.enterprises.2496.1.1.1.1.1.1.1.1.5.1376258 (OctetString):\ SS7 Path For ATT and NGT DPC 5.21.39 [13] private.enterprises.2496.1.1.1.1.1.1.1.1.3.13\76258 (Integer): 1245188 [14] private.enterprises.2496.1.1.5.5.9.0 (Integer): 1105243880\;1 .1.3.6.1.4.1.2496.1.1.4.1 0 Stuff 51 of those lines above into a into a file and try to match on that regexp and I get the following results: PGW> (time (parse-file "/tmp/sample")) Evaluation took: 2.984 seconds of real time 1.81 seconds of user run time 1.12 seconds of system run time 0 page faults and 228,191,424 bytes consed. I am hoping to parse a file that has close to 75,000 lines in that format. At this rate, I will never make it in a reasonable amount of time. Here is the PARSE-FILE function I am using: (defun parse-file (file) (with-open-file (in file) (do ((line (read-line in nil :eof) (read-line in nil :eof))) ((eql line :eof) t) (do-register-groups ((#'intern host)) ("^(?:\\S+ ){7}(\\S+)\\s+- commAlarm" line) (format t "Found host ~A~%" host))))) I've also read the docs and I am unable to find anything to help identify the problem with my regexp. I have tried single-line mode, however the results were very similiar. My platform is SBCL 0.8.18.23 and version 1.0 of cl-ppcre. Any help would be appreciated. Thanks, Pete From edi at agharta.de Thu Jan 20 20:51:05 2005 From: edi at agharta.de (Edi Weitz) Date: Thu, 20 Jan 2005 21:51:05 +0100 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: <20050120201652.GB2892@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Thu, 20 Jan 2005 15:16:52 -0500") References: <20050120201652.GB2892@kazmier.com> Message-ID: Hi! On Thu, 20 Jan 2005 15:16:52 -0500, pete-cl-ppcre at kazmier.com wrote: > Not sure if this is the appropriate forum as the email is not > related to the development of cl-ppcre, but I did not find a list > for users. Please feel free to redirect me elsewhere. It's fine to ask this kind of questions here. > I could use some help in figuring out why this regexp is so slow. > As far as I can tell, there is nothing abnormal about it. I > currently use the same regexp in python and its blazes through the > input file. Bear in mind, this is the first time that I've used > cl-ppcre. It is was an experiment to see if I could lisp for this > little application. > > Here is the regexp (at least a small portion of it that exhibits the > behavior I am seeing): > > ^(?:\\S+ ){7}(\\S+)\\s+- commAlarm > > Here is the input line it is matching against (note: this is a single > line albeit a long one): > > 1105243660 11 Sun Jan 09 04:07:40 2005 sclax02.ibasis.net - commAlarm ovnyc00p.ov.i\vanet.net [1] private.enterprises.2496.1.1.5.5.1.0 (Integer): 0 [2] private.enterprises.\2496.1.1.5.5.2.0 (Integer): 115 [3] private.enterprises.2496.1.1.5.5.3.0 (OctetString): \ISUP: UNEX ANM [4] private.enterprises.2496.1.1.5.5.4.0 (OctetString): ISDN User Part Un\expected ANM [5] private.enterprises.2496.1.1.5.5.5.0 (Integer): 2 [6] private.enterpri\ses.2496.1.1.5.5.6.0 (Integer): 1 [7] private.enterprises.2496.1.1.5.5.7.0 (Integer): 1 \ [8] private.enterprises.2496.1.1.5.5.8.0 (Integer): 2 [9] private.enterprises.2496.1.1.\1.1.1.1.1.1.1.1376258 (Integer): 1376258 [10] private.enterprises.2496.1.1.1.1.1.1.1.1.2\.1376258 (Integer): 21 [11] private.enterprises.2496.1.1.1.1.1.1.1.1.4.1376258 (OctetStr\ing): ss7path-att [12] private.enterprises.2496.1.1.1.1.1.1.1.1.5.1376258 (OctetString):\ SS7 Path For ATT and NGT DPC 5.21.39 [13] private.enterprises.2496.1.1.1.1.1.1.1.1.3.13\76258 (Integer): 1245188 [14] private.enterprises.2496.1.1.5.5.9.0 (Integer): 1105243880\;1 .1.3.6.1.4.1.2496.1.1.4.1 0 > > Stuff 51 of those lines above into a into a file and try to match on > that regexp and I get the following results: > > PGW> (time (parse-file "/tmp/sample")) > Evaluation took: > 2.984 seconds of real time > 1.81 seconds of user run time > 1.12 seconds of system run time > 0 page faults and > 228,191,424 bytes consed. That's much too slow and much too much consing. FWIW, here's what I get with SBCL 0.8.16 and 50 lines like the one from above: * (time (parse-file "/tmp/sample")) Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Evaluation took: 0.148 seconds of real time 0.1 seconds of user run time 0.01 seconds of system run time 0 page faults and 506,416 bytes consed. T > My platform is SBCL 0.8.18.23 and version 1.0 of cl-ppcre. My wild guess is that this is due your version of SBCL being from the new Unicode branch which I haven't tried yet. If you don't need full Unicode support then maybe you should switch it off. Or better, report this to the SBCL maintainers (if my guess is right). Also, see the note about simple strings in the CL-PPCRE docs. To show you that CL-PPCRE is not necessarily slow with full Unicode support here's the output from AllegroCL 7.0: CL-USER(5): (time (parse-file "/tmp/sample")) Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net Found host sclax02.ibasis.net ; cpu time (non-gc) 100 msec user, 30 msec system ; cpu time (gc) 110 msec user, 0 msec system ; cpu time (total) 210 msec user, 30 msec system ; real time 294 msec ; space allocation: ; 15,326 cons cells, 13,453,192 other bytes, 0 static bytes T > I am hoping to parse a file that has close to 75,000 lines in that > format. At this rate, I will never make it in a reasonable amount > of time. Here's the output from CMUCL 19a for 100,000 lines like above (with the FORMAT form in your function removed and Linux running within VMWare on my laptop): * (time (parse-file "/tmp/sample")) ; Compiling LAMBDA NIL: ; Compiling Top-Level Form: ; Evaluation took: ; 20.27 seconds of real time ; 3.01 seconds of user run time ; 17.25 seconds of system run time ; 40,541,141,192 CPU cycles ; [Run times include 1.05 seconds GC run time] ; 0 page faults and ; 938,959,528 bytes consed. ; T If that doesn't help, let me know. Cheers, Edi. From pete-cl-ppcre at kazmier.com Thu Jan 20 22:15:32 2005 From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com) Date: Thu, 20 Jan 2005 17:15:32 -0500 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: References: <20050120201652.GB2892@kazmier.com> Message-ID: <20050120221532.GA4114@kazmier.com> On Thu, Jan 20, 2005 at 09:51:05PM +0100, Edi Weitz wrote: > My wild guess is that this is due your version of SBCL being from the > new Unicode branch which I haven't tried yet. If you don't need full > Unicode support then maybe you should switch it off. Hmmm ... right now I am using a debian package, but I'll build myself an SBCL without unicode support and see what happens. Thanks, Pete From edi at agharta.de Thu Jan 20 23:13:27 2005 From: edi at agharta.de (Edi Weitz) Date: Fri, 21 Jan 2005 00:13:27 +0100 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: <20050120221532.GA4114@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Thu, 20 Jan 2005 17:15:32 -0500") References: <20050120201652.GB2892@kazmier.com> <20050120221532.GA4114@kazmier.com> Message-ID: On Thu, 20 Jan 2005 17:15:32 -0500, pete-cl-ppcre at kazmier.com wrote: > Hmmm ... right now I am using a debian package, but I'll build > myself an SBCL without unicode support and see what happens. I just checked myself - I installed the SBCL version from Debian unstable (which currently is 0.8.18.34) and got the same bad results that you got (for 50 lines): Evaluation took: 1.951 seconds of real time 0.4 seconds of user run time 1.54 seconds of system run time 1 page fault and 224,923,896 bytes consed. T Looks like you should ask on the SBCL list or switch. Cheers, Edi. From pete-cl-ppcre at kazmier.com Thu Jan 20 23:34:03 2005 From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com) Date: Thu, 20 Jan 2005 18:34:03 -0500 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: References: <20050120201652.GB2892@kazmier.com> <20050120221532.GA4114@kazmier.com> Message-ID: <20050120233403.GA4972@kazmier.com> And my SBCL build just finished (no unicode support) and things work as expected. I'll ask the folks on #lisp where to send a report. On Fri, Jan 21, 2005 at 12:13:27AM +0100, Edi Weitz wrote: > On Thu, 20 Jan 2005 17:15:32 -0500, pete-cl-ppcre at kazmier.com wrote: > > > Hmmm ... right now I am using a debian package, but I'll build > > myself an SBCL without unicode support and see what happens. > > I just checked myself - I installed the SBCL version from Debian > unstable (which currently is 0.8.18.34) and got the same bad results > that you got (for 50 lines): > > Evaluation took: > 1.951 seconds of real time > 0.4 seconds of user run time > 1.54 seconds of system run time > 1 page fault and > 224,923,896 bytes consed. > T > > Looks like you should ask on the SBCL list or switch. > > Cheers, > Edi. > From edi at agharta.de Fri Jan 21 08:44:11 2005 From: edi at agharta.de (Edi Weitz) Date: Fri, 21 Jan 2005 09:44:11 +0100 Subject: [cl-ppcre-devel] Re: [Sbcl-devel] unicode and cl-ppcre performance In-Reply-To: (Christophe Rhodes's message of "Fri, 21 Jan 2005 07:24:06 +0000") References: <87fz0vk6ck.fsf@coco.kazmier.com> Message-ID: On Fri, 21 Jan 2005 07:24:06 +0000, Christophe Rhodes wrote: > Also, does setting cl-ppcre:*regex-char-code-limit* to 128 before > making your parser function make any difference? That does the trick, thanks. I should have thought about this myself, sorry. Could you explain, for a Unicode ignorant like me, why CHAR-CODE-LIMIT is so high in SBCL and CLISP? Does Unicode really define 1114112 characters? Thanks, Edi. From edi at agharta.de Sun Jan 23 15:28:55 2005 From: edi at agharta.de (Edi Weitz) Date: Sun, 23 Jan 2005 16:28:55 +0100 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: <20050120201652.GB2892@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Thu, 20 Jan 2005 15:16:52 -0500") References: <20050120201652.GB2892@kazmier.com> Message-ID: Actually I just realized that your problem revealed a thinko in CL-PPCRE. I'll release a new version in a couple of hours (have to cook something for my wife and my kid first...) which fixes that. Cheers, Edi. From edi at agharta.de Sun Jan 23 17:44:20 2005 From: edi at agharta.de (Edi Weitz) Date: Sun, 23 Jan 2005 18:44:20 +0100 Subject: [cl-ppcre-devel] New version 1.1.0 Message-ID: Changelog: Version 1.1.0 2005-01-23 Cleaned up load.lisp and cl-ppcre.asd Make large hash tables smaller, if possible Correct treatment of constant regular expressions in DO-SCANS Download: Have fun, Edi. From edi at agharta.de Sun Jan 23 17:51:32 2005 From: edi at agharta.de (Edi Weitz) Date: Sun, 23 Jan 2005 18:51:32 +0100 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: <20050120201652.GB2892@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Thu, 20 Jan 2005 15:16:52 -0500") References: <20050120201652.GB2892@kazmier.com> Message-ID: OK, now these are the results I get with CL-PPCRE 1.1.0: * (lisp-implementation-type) "SBCL" * (lisp-implementation-version) "0.8.18.34" * *regex-char-code-limit* 1114112 * (defun parse-file (file) (with-open-file (in file) (do ((line (read-line in nil :eof) (read-line in nil :eof))) ((eql line :eof) t) (do-register-groups ((#'intern host)) ("^(?:\\S+ ){7}(\\S+)\\s+- commAlarm" line) ;; do something other than printing for more accurate TIME ;; results host)))) PARSE-FILE * (time (parse-file "/tmp/sample50")) ;; your sample, 50 lines Evaluation took: 0.03 seconds of real time 0.01 seconds of user run time 0.02 seconds of system run time 0 page faults and 519,008 bytes consed. T * (time (parse-file "/tmp/sample500")) ;; your sample, 500 lines Evaluation took: 0.314 seconds of real time 0.24 seconds of user run time 0.07 seconds of system run time 0 page faults and 5,194,960 bytes consed. T * (time (parse-file "/tmp/sample5000")) ;; your sample, 5000 lines Evaluation took: 1.187 seconds of real time 0.34 seconds of user run time 0.85 seconds of system run time 0 page faults and 51,942,672 bytes consed. T That's better, isn't it? Thanks for making me aware of this problem. Cheers, Edi. From download at hpc.unm.edu Mon Jan 24 15:13:59 2005 From: download at hpc.unm.edu (Jim Prewett) Date: Mon, 24 Jan 2005 08:13:59 -0700 (MST) Subject: [cl-ppcre-devel] error loading 1.1.0 under CMUCL 19a Message-ID: Hello, I'm having problems loading CL-PPCRE 1.1.0 under CMUCL 19a under FreeBSD. Lisp spews the following when I ask it to load cl-ppcre: bash-2.05b$ lisp -eval '(load "cl-ppcre-1.1.0/load.lisp")' ; Loading #p"/usr/home/dl/logs/cl-ppcre-1.1.0/load.lisp". ; In: LET ((CL-PPCRE-BASE-DIRECTORY #) MUST-COMPILE) ; (DOLIST (FILE '# #)) ; --> DO* BLOCK LET* TAGBODY ; ==> ; (LET (#) ; (TAGBODY)) ; Note: Variable FILE defined but never used. Error in function LISP::%ENUMERATE-MATCHES: Cannot supply a type without a name: #, Device=NIL, Directory=(:ABSOLUTE "usr" "home" "dl" "logs" "cl-ppcre-1.1.0"), Name=NIL, Type="x86f", Version=:UNSPECIFIC> [Condition of type SIMPLE-ERROR] Restarts: 0: [CONTINUE] Return NIL from load of "cl-ppcre-1.1.0/load.lisp". 1: [ABORT ] Skip remaining initializations. Debug (type H for help) (LISP::%ENUMERATE-MATCHES #, Device=NIL, Directory=(:ABSOLUTE "usr" "home" "dl" "logs" ...), Name=NIL, Type="x86f", Version=:UNSPECIFIC> T T #) Source: Error finding source: Error in function DEBUG::GET-FILE-TOP-LEVEL-FORM: Source file no longer exists: target:code/filesys.lisp. 0] 0 ; CMU Common Lisp 19a, running on spacebrain.hpc.unm.edu With core: /usr/local/lib/cmucl/lib/lisp.core Dumped on: Tue, 2004-08-03 10:19:33-06:00 on snapdragon.csl.sri.com See for support information. Loaded subsystems: Python 1.1, target Intel x86 CLOS based on Gerd's PCL 2004/04/14 03:32:47 * (quit) James E. Prewett Systems Team Leader 505.277.8210 Designated Security Officer download at hpc.unm.edu Jim at Prewett.org HPC Systems Engineer III @ HPC at UNM OpenPGP key: pub 1024D/31816D93 From pete-cl-ppcre at kazmier.com Mon Jan 24 20:24:47 2005 From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com) Date: Mon, 24 Jan 2005 15:24:47 -0500 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: References: <20050120201652.GB2892@kazmier.com> Message-ID: <20050124202447.GA17504@kazmier.com> On Sun, Jan 23, 2005 at 06:51:32PM +0100, Edi Weitz wrote: > OK, now these are the results I get with CL-PPCRE 1.1.0: > * (time (parse-file "/tmp/sample5000")) ;; your sample, 5000 lines > > Evaluation took: > 1.187 seconds of real time > 0.34 seconds of user run time > 0.85 seconds of system run time > 0 page faults and > 51,942,672 bytes consed. > T > > That's better, isn't it? Thanks for making me aware of this problem. It is better now. However, I am no longer using DO-REGISTER-GROUPS as I realized that I intended to use REGISTER-GROUPS-BIND all along. In any case, I am thrilled with the performance of cl-ppcre now that I have set *regex-char-code-limit* to an appropriate value. I think your library is great, my only problem now is the performance of READ-LINE in the free CL implementations (which is certainly not your problem). Its absolutely terrible in SBCL. I've inquired about it, but have not really received any good responses yet: http://sourceforge.net/mailarchive/forum.php?thread_id=6400934&forum_id=4133 From edi at agharta.de Mon Jan 24 20:26:31 2005 From: edi at agharta.de (Edi Weitz) Date: Mon, 24 Jan 2005 21:26:31 +0100 Subject: [cl-ppcre-devel] New release 1.2.0 Message-ID: Changelog: Version 1.2.0 2005-01-24 AllegroCL compatibility mode Fixed broken load.lisp file (caught by Jim Prewett and Zach Beane) Download: From edi at agharta.de Mon Jan 24 20:27:14 2005 From: edi at agharta.de (Edi Weitz) Date: Mon, 24 Jan 2005 21:27:14 +0100 Subject: [cl-ppcre-devel] error loading 1.1.0 under CMUCL 19a In-Reply-To: (Jim Prewett's message of "Mon, 24 Jan 2005 08:13:59 -0700 (MST)") References: Message-ID: On Mon, 24 Jan 2005 08:13:59 -0700 (MST), Jim Prewett wrote: > I'm having problems loading CL-PPCRE 1.1.0 under CMUCL 19a under > FreeBSD. Hi Jim! Thanks for the note, the load.lisp file was broken. 1.2.0 should fix this. Cheers, Edi. From edi at agharta.de Mon Jan 24 20:57:38 2005 From: edi at agharta.de (Edi Weitz) Date: Mon, 24 Jan 2005 21:57:38 +0100 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: <20050124202447.GA17504@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Mon, 24 Jan 2005 15:24:47 -0500") References: <20050120201652.GB2892@kazmier.com> <20050124202447.GA17504@kazmier.com> Message-ID: On Mon, 24 Jan 2005 15:24:47 -0500, pete-cl-ppcre at kazmier.com wrote: > It is better now. However, I am no longer using DO-REGISTER-GROUPS > as I realized that I intended to use REGISTER-GROUPS-BIND all along. > In any case, I am thrilled with the performance of cl-ppcre now that > I have set *regex-char-code-limit* to an appropriate value. Actually, if your Lisp implementation has a decent hash table implementation the value of *REGEX-CHAR-CODE-LIMIT* shouldn't matter much during scanning - only when a regular expression is created. The problem that I fixed in 1.1.0 was that your function created a new scanner each time it was called although this wasn't necessary. This resulted in much too much consing (because of the huge hash tables that were generated). > I think your library is great, my only problem now is the > performance of READ-LINE in the free CL implementations (which is > certainly not your problem). Its absolutely terrible in SBCL. I've > inquired about it, but have not really received any good responses > yet: > > http://sourceforge.net/mailarchive/forum.php?thread_id=6400934&forum_id=4133 Yeah, I've read that and I'm concerned, too. I'd love to see an encouraging response. Have you tried to download the trial versions of AllegroCL and/or LispWorks just to check how they cope with this? Cheers, Edi. From edi at agharta.de Tue Jan 25 01:08:22 2005 From: edi at agharta.de (Edi Weitz) Date: Tue, 25 Jan 2005 02:08:22 +0100 Subject: [cl-ppcre-devel] New release 1.2.1 Message-ID: I blew it twice this time because I was in a hurry - my apologies. Changelog: Version 1.2.1 2005-01-25 There was a wrong read-time conditional in api.lisp, sorry Download: Cheers, Edi. From pete-cl-ppcre at kazmier.com Tue Jan 25 16:52:49 2005 From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com) Date: Tue, 25 Jan 2005 11:52:49 -0500 Subject: [cl-ppcre-devel] Need help with a slow regexp In-Reply-To: References: <20050120201652.GB2892@kazmier.com> <20050124202447.GA17504@kazmier.com> Message-ID: <20050125165249.GA28136@kazmier.com> On Mon, Jan 24, 2005 at 09:57:38PM +0100, Edi Weitz wrote: > Yeah, I've read that and I'm concerned, too. I'd love to see an > encouraging response. Have you tried to download the trial versions > of AllegroCL and/or LispWorks just to check how they cope with this? I spent the morning downloading the trial versions of these products to test their speed in comparison to the free implementations. LW was very fast, Allegro was still slow compared to the equivalent python program, but faster than the fastest free implementation (Clisp). I want to see how fast LW and cl-ppcre compare to the equivalent python program (I think it'll be faster), I just haven't figured out how to get ASDF working with it yet. And I'm still hoping for some developer to comment on my post to the SBCL user list as I'm stuck with the open-source implementations. LispWorks: CL-USER 17 > (time (tester "/tmp/pgw-logs/trapd.log.ovnyc00p")) Timing the evaluation of (TESTER "/tmp/pgw-logs/trapd.log.ovnyc00p") user time = 0.880 system time = 0.180 Elapsed time = 0:00:01 Allocation = 36822608 bytes standard / 4070 bytes conses 0 Page faults Calls to %EVAL 34 T Allegro: CL-USER(7): (time (tester "/tmp/pgw-logs/trapd.log.ovnyc00p")) ; cpu time (non-gc) 2,730 msec user, 90 msec system ; cpu time (gc) 80 msec user, 0 msec system ; cpu time (total) 2,810 msec user, 90 msec system ; real time 3,493 msec ; space allocation: ; 67 cons cells, 67,825,256 other bytes, 0 static bytes T From wencel at gmail.com Sat Jan 29 16:16:50 2005 From: wencel at gmail.com (Lawrence Mitchell) Date: Sat, 29 Jan 2005 16:16:50 +0000 Subject: [cl-ppcre-devel] Using cl-ppcre in a text editor Message-ID: <2b1f8a5b0501290816133a2b82@mail.gmail.com> Hello there, I'm looking at trying to use cl-ppcre to add regular expression support to the Climacs editor (). A few things spring to mind: o Licensing differences. Climacs is released under the LGPL, while cl-ppcre is under as BSD-style license. I don't think this is a problem (as far as I can tell from reading the licenses), but if you know otherwise, I'd be grateful to hear. o How to best match up cl-ppcre's matching on strings with climacs' idea of a buffer. A climacs buffer is a sequence of objects (which may or may not be characters, but we'll ignore that for the moment). Now, I can easily generate a string of the contents of the buffer, and call SCAN (or whatever) on the string. However, this is going to be slow for large buffers (especially if we find something just after point, we've still constructed the whole buffer-string). The "obvious" solution to this is to use streams instead (probably), so, I wonder if cl-ppcre would be amenable to something like this? On another, somewhat unrelated note. One thing that one would like to do is regexp search and replace, now, if I know how many groups the user is going to input into their regexp before the fact, I can use REGISTER-GROUPS-BIND to get at the substring matches via variables. I guess there isn't any way to do this without knowing the input beforehand. So is the idea then just to use SCAN and then manually grab the substrings via REG-STARTS and REG-ENDS, or have I missed something obvious? Cheers, Lawrence -- Lawrence Mitchell From edi at agharta.de Sat Jan 29 19:47:56 2005 From: edi at agharta.de (Edi Weitz) Date: Sat, 29 Jan 2005 20:47:56 +0100 Subject: [cl-ppcre-devel] Using cl-ppcre in a text editor In-Reply-To: <2b1f8a5b0501290816133a2b82@mail.gmail.com> (Lawrence Mitchell's message of "Sat, 29 Jan 2005 16:16:50 +0000") References: <2b1f8a5b0501290816133a2b82@mail.gmail.com> Message-ID: Hi! On Sat, 29 Jan 2005 16:16:50 +0000, Lawrence Mitchell wrote: > I'm looking at trying to use cl-ppcre to add regular expression > support to the Climacs editor ( http://common-lisp.net/project/climacs/>). Sounds cool... ) > A few things spring to mind: > > o Licensing differences. Climacs is released under the LGPL, while > cl-ppcre is under as BSD-style license. I don't think this is a > problem (as far as I can tell from reading the licenses), but if you > know otherwise, I'd be grateful to hear. I don't see a problem but IANAL. It is my understanding that the BSD license basically means that you can do with CL-PPCRE whatever you want as long as you credit my original work - this is what I intended. So you could, e.g., incorporate it into a LPGL project without a problem. Of course, the original CL-PPCRE will still be available under the old license. > o How to best match up cl-ppcre's matching on strings with climacs' > idea of a buffer. > > A climacs buffer is a sequence of objects (which may or may not be > characters, but we'll ignore that for the moment). Now, I can > easily generate a string of the contents of the buffer, and call > SCAN (or whatever) on the string. However, this is going to be slow > for large buffers (especially if we find something just after point, > we've still constructed the whole buffer-string). > > The "obvious" solution to this is to use streams instead (probably), > so, I wonder if cl-ppcre would be amenable to something like this? Well, supporting all of Perl's regex facilities implies that you need to have random access to the target - I don't think you can fit streams into this picture. I'm not a CS guy but my understanding is that CL-PPCRE is based on an NFA and you can't change that easily. You can build a DFA that implements a subset of CL-PPCRE and that would work with streams but that wouldn't be CL-PPCRE anymore... :) Now, using another kind of structures (like, say, your buffers) that aren't strings but are random-access - that wouldn't be /too/ hard. It would involve going through three or four files and change SCHAR to something else but basically I don't really see a problem. However, as CL-PPCRE has a reputation for being quite fast I wouldn't want to sacrifice this for greater flexibility (buffers instead of strings, arbitrary objects instead of characters - you name it). I think the right way to do it would be to offer the ability to build different versions of CL-PPCRE based on *FEATURES*, i.e. at compile time you decide whether you want a fast regex engine for strings or if you want a not-so-fast regex engine for, say, buffers. Would that be OK for you? > On another, somewhat unrelated note. One thing that one would like > to do is regexp search and replace, now, if I know how many groups > the user is going to input into their regexp before the fact, I can > use REGISTER-GROUPS-BIND to get at the substring matches via > variables. I guess there isn't any way to do this without knowing > the input beforehand. So is the idea then just to use SCAN and then > manually grab the substrings via REG-STARTS and REG-ENDS, or have I > missed something obvious? No, I don't see a better way. If you don't now the regex then you have to check the return value and see how long the register arrays are. Cheers, Edi.