From edi at agharta.de Wed Jul 7 01:08:41 2004
From: edi at agharta.de (Edi Weitz)
Date: Wed, 07 Jul 2004 03:08:41 +0200
Subject: [cl-ppcre-devel] Re: cl-ppcre
In-Reply-To: (Daniel Skarda's message of "Sun, 20 Jun 2004 12:07:08 +0200")
References: <87n0384z7t.fsf@bird.agharta.de>
Message-ID: <87d638pqsm.fsf@bird.agharta.de>

Sorry for the delay, I had moved this email into the wrong IMAP
folder... :(

On Sun, 20 Jun 2004 12:07:08 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:

> After more regexp experiments I found that the main difference
> between Perl and GNU Regexp is not the syntax of regexps (as I
> naively thought), but the definition of "the best match" (especially
> for `|' alt node).
>
> One can agree with Perl man pages, that Perl definition could be
> better (and more comprehensible) for handwritten regexps. Is "first
> match" strategy also better for writing lexers? I doubt.
>
> Consider languages where some word (token) can be prefix of
> another word. This is not unusual: remember that in Lisp `12345' is
> number and `12345a' is symbol :)
>
> While writing "first match" lexer (and your deflexer macro is
> "first match" lexer) one has to be careful with rules ordering and
> think about possible prefix ambiguity:

Yes. But if you prefer not to be careful you'll definitely sacrifice
performance...

> My conclusion is, that 'success node is meaningful only for
> "longest match" regexps engines, because one can expect, that such
> engine could do better than match all 'alt nodes in sequence and
> return the longest match.
>
> My new question is: how hard it would be to add :longest-match
> option to create-scanner?

Pretty hard. This is not going to be done by me. However, if you
manage to add this yourself without breaking the rest of CL-PPCRE (and
without making it slower) I'll gladly accept your patches.

> ps: I am not subscribed to cl-ppcre-devel mailing list. Please "Cc:"
> me your replies.

Subscribing to the list is easy and the list is low-volume.
If you'd like to continue this discussion please either subscribe to
the list or use it via nntp:

Cheers,
Edi.

From edi at agharta.de Tue Jul 13 00:29:17 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 02:29:17 +0200
Subject: [cl-ppcre-devel] New version 0.7.8
Message-ID: <871xjgoile.fsf@bird.agharta.de>

Hi!

A new release is available from .

Here's the relevant part from the changelog:

Version 0.7.8
2004-07-13
New SIMPLE-CALLS keyword argument for REGEX-REPLACE(-ALL)
Added environment parameter to compiler macros (thanks to c.l.l article by Joe Marshall)
Added compiler macros for SCAN-TO-STRINGS and REGEX-REPLACE(-ALL) (they somehow got lost)

Have fun,
Edi

From jan at rychter.com Tue Jul 13 12:57:42 2004
From: jan at rychter.com (Jan Rychter)
Date: Tue, 13 Jul 2004 05:57:42 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
Message-ID:

I'm confused. I must be doing something wrong.

I have a string:

CL-USER> *str*
"1
2
3

4
"

Just to make sure it's really what it seems:

CL-USER> (loop for c across *str*
               do (format t "~S " c))

#\1 #\Newline #\2 #\Newline #\3 #\Newline #\Newline #\4 #\Newline
NIL


I wanted to match empty lines, so I did:

CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")
"1
2
3
!!
4
!!"

Now, I would normally expect this:

"1
2
3
!
4
"

Playing with regex-coach indeed produces the result I'd normally
expect. What am I doing wrong? (using CMUCL 19a, the testing version,
and CL-PPCRE-0.7.7)

many thanks,
--J.

From edi at agharta.de Tue Jul 13 07:10:28 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 09:10:28 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: (Jan Rychter's message of "Tue, 13 Jul 2004 05:57:42 -0700")
References:
Message-ID: <87zn64757f.fsf@bird.agharta.de>

On Tue, 13 Jul 2004 05:57:42 -0700, Jan Rychter wrote:

> I'm confused. I must be doing something wrong.
>
> I have a string:
>
> CL-USER> *str*
> "1
> 2
> 3
>
> 4
> "
>
> Just to make sure it's really what it seems:
>
> CL-USER> (loop for c across *str*
>                do (format t "~S " c))
>
> #\1 #\Newline #\2 #\Newline #\3 #\Newline #\Newline #\4 #\Newline
> NIL
>
>
> I wanted to match empty lines, so I did:
>
> CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")
> "1
> 2
> 3
> !!
> 4
> !!"
>
> Now, I would normally expect this:
>
> "1
> 2
> 3
> !
> 4
> "
>
> Playing with regex-coach indeed produces the result I'd normally
> expect. What am I doing wrong? (using CMUCL 19a, the testing version,
> and CL-PPCRE-0.7.7)

Yes, this looks like a bug. I'll try to fix this ASAP.

Thanks for the report.

Cheers,
Edi.

From edi at agharta.de Tue Jul 13 17:04:39 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 19:04:39 +0200
Subject: [cl-ppcre-devel] New version 0.7.9
Message-ID: <87pt6zdejc.fsf@bird.agharta.de>

Hi!

A new release is available from .

Here's the relevant part from the changelog:

Version 0.7.9
2004-07-13
Fixed bug in DO-SCANS (caught by Jan Rychter)

Have fun,
Edi

From edi at agharta.de Tue Jul 13 17:06:47 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 19:06:47 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: (Jan Rychter's message of "Tue, 13 Jul 2004 05:57:42 -0700")
References:
Message-ID: <87llhndefs.fsf@bird.agharta.de>

Should be fixed now. Please try.

> CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")

It's shorter to write

  (cl-ppcre:regex-replace-all "(?m)^$" *str* "!")

instead. This will also allow the compiler macro to compile the regex
at load time.

Cheers,
Edi.
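
The two spellings compared in the message above can be put side by side roughly as follows. This is only a sketch and assumes CL-PPCRE is already loaded; the function names and `*sample*` are made up for illustration. The first function wraps the explicit scanner in the standard `load-time-value` form so that it is also built only once, which is roughly the effect the message attributes to the compiler macro when the regex is a constant string.

```lisp
;; Assumes CL-PPCRE is loaded; all names defined here are illustrative only.

(defun mark-empty-lines/scanner (string)
  "Replace empty-line matches with \"!\" using an explicitly created scanner."
  ;; LOAD-TIME-VALUE builds the scanner once instead of on every call.
  (cl-ppcre:regex-replace-all
   (load-time-value
    (cl-ppcre:create-scanner "^$" :multi-line-mode t))
   string "!"))

(defun mark-empty-lines/literal (string)
  "Same replacement, with multi-line mode requested inline via (?m)."
  ;; A constant regex string leaves the hoisting to the compiler macro.
  (cl-ppcre:regex-replace-all "(?m)^$" string "!"))

;; The string from the original report: 1, 2, 3, an empty line, 4.
(defparameter *sample* (format nil "1~%2~%3~%~%4~%"))
```

Both functions should return the same string for the same input; which positions get a "!" depends on the CL-PPCRE version in use.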
From jan at rychter.com Wed Jul 14 06:35:28 2004
From: jan at rychter.com (Jan Rychter)
Date: Tue, 13 Jul 2004 23:35:28 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <87llhndefs.fsf@bird.agharta.de> (Edi Weitz's message of "Tue, 13 Jul 2004 19:06:47 +0200")
References: <87llhndefs.fsf@bird.agharta.de>
Message-ID:

> Should be fixed now. Please try.
>
> CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")

Thank you -- indeed, it is fixed. It now produces:

JWR-TEST> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "(?m)^$") *str* "!")
"1
2
3
!
4
!"

I guess it is debatable whether the last "!" should be there. Perl
doesn't behave that way, but I guess it _is_ an empty line, now that I
think of it. And I wanted to get "!" instead of empty lines. So it
actually makes more sense than Perl.

> It's shorter to write
>
> (cl-ppcre:regex-replace-all "(?m)^$" *str* "!")
>
> instead. This will also allow the compiler macro to compile the regex
> at load time.

Nice, thanks!

--J.

From edi at agharta.de Tue Jul 13 21:41:58 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 23:41:58 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: (Jan Rychter's message of "Tue, 13 Jul 2004 23:35:28 -0700")
References: <87llhndefs.fsf@bird.agharta.de>
Message-ID: <87n023h9eh.fsf@bird.agharta.de>

On Tue, 13 Jul 2004 23:35:28 -0700, Jan Rychter wrote:

> I guess it is debatable whether the last "!" should be there. Perl
> doesn't behave that way, but I guess it _is_ an empty line, now that
> I think of it. And I wanted to get "!" instead of empty lines. So it
> actually makes more sense than Perl.

Hmmm, yes it seems to make more sense. On the other hand, I'm trying
to be as close to Perl as possible. Do you see any pattern there? Any
idea why Perl doesn't add the last exclamation mark?

Cheers,
Edi.
From jan at rychter.com Wed Jul 14 10:49:50 2004
From: jan at rychter.com (Jan Rychter)
Date: Wed, 14 Jul 2004 03:49:50 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <87n023h9eh.fsf@bird.agharta.de> (Edi Weitz's message of "Tue, 13 Jul 2004 23:41:58 +0200")
References: <87llhndefs.fsf@bird.agharta.de> <87n023h9eh.fsf@bird.agharta.de>
Message-ID:

>>>>> "Edi" == Edi Weitz writes:

 Edi> On Tue, 13 Jul 2004 23:35:28 -0700, Jan Rychter
 Edi> wrote:
 >> I guess it is debatable whether the last "!" should be there. Perl
 >> doesn't behave that way, but I guess it _is_ an empty line, now that
 >> I think of it. And I wanted to get "!" instead of empty lines. So it
 >> actually makes more sense than Perl.

 Edi> Hmmm, yes it seems to make more sense. On the other hand, I'm
 Edi> trying to be as close to Perl as possible. Do you see any pattern
 Edi> there? Any idea why Perl doesn't add the last exclamation mark?

Uh, well, hmm. I've tried reading "man perlre", but the part about \z,
\Z and multiline strings gave me a headache.

I really have no idea why Perl doesn't treat the end of a string as an
"$" in this case, because it certainly does so for other expressions
(e.g. "^4$" _will_ match at the end of a multiline string ending in
"...\n4"). I see no reason to treat a string ending in "\n$" (on UNIX)
differently: "^$" should definitely match there, as a new line has
begun, and ended, being empty.

My suggestion would be to document this behavior. A brave soul could
report this to the Perl people, but I seriously doubt they'd consider
it a bug. It might be one of those DWIM things.

--J.
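
For anyone who wants to check which positions "(?m)^$" actually matches in their installation, or who wants the output originally expected in this thread, something along these lines may help. It is a sketch that assumes CL-PPCRE is loaded; `empty-line-positions`, `mark-interior-empty-lines` and `*sample*` are hypothetical names. The second regex is a swapped-in variant, not something proposed in the thread: it only accepts a line start that is directly followed by a newline, so the position after the final newline can never match.

```lisp
;; Assumes CL-PPCRE is loaded; all names defined here are illustrative only.

(defun empty-line-positions (string)
  "Return the start/end positions of every (?m)^$ match in STRING."
  (cl-ppcre:all-matches "(?m)^$" string))

(defun mark-interior-empty-lines (string)
  "Put \"!\" on empty lines, skipping the position after a trailing newline."
  ;; ^ must be followed by a newline, so end-of-string can never match.
  ;; The doubled backslash passes \n through the Lisp reader to the regex.
  (cl-ppcre:regex-replace-all "(?m)^(?=\\n)" string "!"))

;; The string from the original report: 1, 2, 3, an empty line, 4.
(defparameter *sample* (format nil "1~%2~%3~%~%4~%"))
```

Calling `(empty-line-positions *sample*)` shows whether the installed version reports a match after the trailing newline, while `(mark-interior-empty-lines *sample*)` marks only the empty line between 3 and 4.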