From edi at agharta.de  Wed Jul  7 01:08:41 2004
From: edi at agharta.de (Edi Weitz)
Date: Wed, 07 Jul 2004 03:08:41 +0200
Subject: [cl-ppcre-devel] Re: cl-ppcre
In-Reply-To: <m01xka5yr7.fsf@hobitin.ucw.cz> (Daniel Skarda's message of
	"Sun, 20 Jun 2004 12:07:08 +0200")
References: <m0smd0lw58.fsf@hobitin.ucw.cz> <87n0384z7t.fsf@bird.agharta.de>
	<m01xka5yr7.fsf@hobitin.ucw.cz>
Message-ID: <87d638pqsm.fsf@bird.agharta.de>

Sorry for the delay, I had moved this email into the wrong IMAP
folder... :(

On Sun, 20 Jun 2004 12:07:08 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:

>   After more regexp experiments I found that the main difference
> between Perl and GNU Regexp is not the syntax of regexps (as I
> naively thought), but the definition of "the best match" (especially
> for `|' alt node).
>
>   One can agree with the Perl man pages that Perl's definition could
> be better (and more comprehensible) for handwritten regexps. Is the
> "first match" strategy also better for writing lexers? I doubt it.
>
>   Consider languages where some word (token) can be a prefix of
> another word. This is not unusual: remember that in Lisp `12345' is
> a number and `12345a' is a symbol :)
>
>   While writing a "first match" lexer (and your deflexer macro is a
> "first match" lexer) one has to be careful with rule ordering and
> think about possible prefix ambiguity:

Yes. But if you prefer not to be careful you'll definitely sacrifice
performance...
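
To make the ordering issue concrete, here is a minimal sketch. It
assumes CL-PPCRE is already loaded; the two token regexes and the
function names are made up for this example and are not taken from
any real lexer.

```lisp
;; Assumes CL-PPCRE is already loaded; the token regexes are illustrative only.

;; Number rule first: the first alternative that matches wins, so the
;; trailing "a" is left over even though a longer match exists.
(defun number-first (string)
  (values (cl-ppcre:scan-to-strings "\\d+|[0-9a-z]+" string)))

;; Symbol rule first: the same input is now consumed as one token.
(defun symbol-first (string)
  (values (cl-ppcre:scan-to-strings "[0-9a-z]+|\\d+" string)))

(number-first "12345a")   ; => "12345"
(symbol-first "12345a")   ; => "12345a"
```

With Perl semantics the alternatives are tried left to right, so the
more specific (longer) rule has to come first.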

>   My conclusion is that the 'success node is meaningful only for
> "longest match" regexp engines, because one can expect that such an
> engine could do better than match all 'alt nodes in sequence and
> return the longest match.
>
>   My new question is: how hard would it be to add a :longest-match
> option to create-scanner?

Pretty hard. This is not going to be done by me. However, if you
manage to add this yourself without breaking the rest of CL-PPCRE (and
without making it slower) I'll gladly accept your patches.
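
For reference, the "match all alternatives in sequence and return the
longest" approach mentioned above can be emulated outside the engine.
The following is only a rough sketch, assuming CL-PPCRE is loaded;
LONGEST-MATCH-AT is a hypothetical helper, not part of CL-PPCRE, and
each individual regex is still matched with Perl semantics.

```lisp
;; Assumes CL-PPCRE is already loaded. LONGEST-MATCH-AT is a hypothetical
;; helper for illustration, not part of CL-PPCRE.
(defun longest-match-at (regexes string &key (start 0))
  "Try every regex in REGEXES at position START of STRING.
Return the end of the longest match and the index of the regex that
produced it, or NIL if none of them matches at START."
  (let ((best-end nil)
        (best-index nil))
    (loop for regex in regexes
          for index from 0
          do (multiple-value-bind (match-start match-end)
                 (cl-ppcre:scan regex string :start start)
               ;; SCAN finds the leftmost match at or after START, so
               ;; only accept matches that begin exactly at START.
               (when (and match-start
                          (= match-start start)
                          (or (null best-end) (> match-end best-end)))
                 (setf best-end match-end
                       best-index index))))
    (values best-end best-index)))

(longest-match-at '("\\d+" "[0-9a-z]+") "12345a")   ; => 6, 1
```

Note that this costs one scan per rule at every token position, which
is the kind of performance sacrifice mentioned earlier.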

> ps: I am not subscribed to cl-ppcre-devel mailing list. Please "Cc:"
> me your replies.

Subscribing to the list is easy and the list is low-volume. If you'd
like to continue this discussion please either subscribe to the list
or use it via nntp:

  <http://common-lisp.net/nntp.shtml>

Cheers,
Edi.


From edi at agharta.de  Tue Jul 13 00:29:17 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 02:29:17 +0200
Subject: [cl-ppcre-devel] New version 0.7.8
Message-ID: <871xjgoile.fsf@bird.agharta.de>

Hi!

A new release is available from

  <http://weitz.de/files/cl-ppcre.tgz>.

Here's the relevant part from the changelog:

  Version 0.7.8
  2004-07-13
  New SIMPLE-CALLS keyword argument for REGEX-REPLACE(-ALL)
  Added environment parameter to compiler macros (thanks to c.l.l article <aczhx5hj.fsf at ccs.neu.edu> by Joe Marshall)
  Added compiler macros for SCAN-TO-STRINGS and REGEX-REPLACE(-ALL) (they somehow got lost)

Have fun,
Edi


From jan at rychter.com  Tue Jul 13 12:57:42 2004
From: jan at rychter.com (Jan Rychter)
Date: Tue, 13 Jul 2004 05:57:42 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
Message-ID: <m2n024vzcp.fsf@tnuctip.rychter.com>

I'm confused. I must be doing something wrong.

I have a string:

CL-USER> *str*
"1
2
3

4
"

Just to make sure it's really what it seems:

CL-USER> (loop for c across *str*
               do (format t "~S " c))

#\1 #\Newline #\2 #\Newline #\3 #\Newline #\Newline #\4 #\Newline 
NIL


I wanted to match empty lines, so I did:

CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")
"1
2
3
!!
4
!!"

Now, I would normally expect this:

"1
2
3
!
4
"

Playing with regex-coach indeed produces the result I'd normally
expect. What am I doing wrong? (using CMUCL 19a, the testing version,
and CL-PPCRE-0.7.7)

many thanks,
--J.


From edi at agharta.de  Tue Jul 13 07:10:28 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 09:10:28 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <m2n024vzcp.fsf@tnuctip.rychter.com> (Jan Rychter's message of
	"Tue, 13 Jul 2004 05:57:42 -0700")
References: <m2n024vzcp.fsf@tnuctip.rychter.com>
Message-ID: <87zn64757f.fsf@bird.agharta.de>

On Tue, 13 Jul 2004 05:57:42 -0700, Jan Rychter <jan at rychter.com> wrote:

> I'm confused. I must be doing something wrong.
>
> I have a string:
>
> CL-USER> *str*
> "1
> 2
> 3
>
> 4
> "
>
> Just to make sure it's really what it seems:
>
> CL-USER> (loop for c across *str*
>                do (format t "~S " c))
>
> #\1 #\Newline #\2 #\Newline #\3 #\Newline #\Newline #\4 #\Newline 
> NIL
>
>
> I wanted to match empty lines, so I did:
>
> CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")
> "1
> 2
> 3
> !!
> 4
> !!"
>
> Now, I would normally expect this:
>
> "1
> 2
> 3
> !
> 4
> "
>
> Playing with regex-coach indeed produces the result I'd normally
> expect. What am I doing wrong? (using CMUCL 19a, the testing version,
> and CL-PPCRE-0.7.7)

Yes, this looks like a bug. I'll try to fix this ASAP. Thanks for the
report.

Cheers,
Edi.


From edi at agharta.de  Tue Jul 13 17:04:39 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 19:04:39 +0200
Subject: [cl-ppcre-devel] New version 0.7.9
Message-ID: <87pt6zdejc.fsf@bird.agharta.de>

Hi!

A new release is available from

  <http://weitz.de/files/cl-ppcre.tgz>.

Here's the relevant part from the changelog:

  Version 0.7.9
  2004-07-13
  Fixed bug in DO-SCANS (caught by Jan Rychter)

Have fun,
Edi


From edi at agharta.de  Tue Jul 13 17:06:47 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 19:06:47 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <m2n024vzcp.fsf@tnuctip.rychter.com> (Jan Rychter's message of
	"Tue, 13 Jul 2004 05:57:42 -0700")
References: <m2n024vzcp.fsf@tnuctip.rychter.com>
Message-ID: <87llhndefs.fsf@bird.agharta.de>

Should be fixed now. Please try.

> CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")

It's shorter to write

  (cl-ppcre:regex-replace-all "(?m)^$" *str* "!")

instead. This will also allow the compiler macro to compile the regex
at load time.
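
The two spellings side by side, as a small sketch (assuming CL-PPCRE
is loaded; the variable and function names are only illustrative):

```lisp
;; Assumes CL-PPCRE is already loaded; the names below are illustrative.

;; Scanner built explicitly; here it is created once when the file is loaded.
(defparameter *empty-line-scanner*
  (cl-ppcre:create-scanner "^$" :multi-line-mode t))

(defun mark-empty-lines/scanner (string)
  (values (cl-ppcre:regex-replace-all *empty-line-scanner* string "!")))

;; Constant regex string with the embedded (?m) modifier.
(defun mark-empty-lines/string (string)
  (values (cl-ppcre:regex-replace-all "(?m)^$" string "!")))

;; The empty middle line of "1", "", "2" becomes "!".
(mark-empty-lines/string (format nil "1~%~%2"))
```

Both functions should give the same result; the second one simply
leaves the scanner creation to the library.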

Cheers,
Edi.


From jan at rychter.com  Wed Jul 14 06:35:28 2004
From: jan at rychter.com (Jan Rychter)
Date: Tue, 13 Jul 2004 23:35:28 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <87llhndefs.fsf@bird.agharta.de> (Edi Weitz's message of "Tue,
	13 Jul 2004 19:06:47 +0200")
References: <m2n024vzcp.fsf@tnuctip.rychter.com>
	<87llhndefs.fsf@bird.agharta.de>
Message-ID: <m2n023t7tb.fsf@tnuctip.rychter.com>

> Should be fixed now. Please try.
> > CL-USER> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "^$" :multi-line-mode t) *str* "!")

Thank you -- indeed, it is fixed. It now produces:
  
  JWR-TEST> (cl-ppcre:regex-replace-all (cl-ppcre:create-scanner "(?m)^$") *str* "!")
  
  "1
  2
  3
  !
  4
  !"

I guess it is debatable whether the last "!" should be there. Perl
doesn't behave that way, but I guess it _is_ an empty line, now that I
think of it. And I wanted to get "!" instead of empty lines. So it
actually makes more sense than Perl.

> It's shorter to write
> 
>   (cl-ppcre:regex-replace-all "(?m)^$" *str* "!")
> 
> instead. This will also allow the compiler macro to compile the regex
> at load time.

Nice, thanks!

--J.


From edi at agharta.de  Tue Jul 13 21:41:58 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jul 2004 23:41:58 +0200
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <m2n023t7tb.fsf@tnuctip.rychter.com> (Jan Rychter's message of
	"Tue, 13 Jul 2004 23:35:28 -0700")
References: <m2n024vzcp.fsf@tnuctip.rychter.com>
	<87llhndefs.fsf@bird.agharta.de> <m2n023t7tb.fsf@tnuctip.rychter.com>
Message-ID: <87n023h9eh.fsf@bird.agharta.de>

On Tue, 13 Jul 2004 23:35:28 -0700, Jan Rychter <jan at rychter.com> wrote:

> I guess it is debatable whether the last "!" should be there. Perl
> doesn't behave that way, but I guess it _is_ an empty line, now that
> I think of it. And I wanted to get "!" instead of empty lines. So it
> actually makes more sense than Perl.

Hmmm, yes it seems to make more sense. On the other hand, I'm trying
to be as close to Perl as possible. Do you see any pattern there? Any
idea why Perl doesn't add the last exclamation mark?

Cheers,
Edi.


From jan at rychter.com  Wed Jul 14 10:49:50 2004
From: jan at rychter.com (Jan Rychter)
Date: Wed, 14 Jul 2004 03:49:50 -0700
Subject: [cl-ppcre-devel] empty line matches with cl-ppcre
In-Reply-To: <87n023h9eh.fsf@bird.agharta.de> (Edi Weitz's message of "Tue,
	13 Jul 2004 23:41:58 +0200")
References: <m2n024vzcp.fsf@tnuctip.rychter.com>
	<87llhndefs.fsf@bird.agharta.de> <m2n023t7tb.fsf@tnuctip.rychter.com>
	<87n023h9eh.fsf@bird.agharta.de>
Message-ID: <m21xjeooc1.fsf@tnuctip.rychter.com>

>>>>> "Edi" == Edi Weitz <edi at agharta.de> writes:
 Edi> On Tue, 13 Jul 2004 23:35:28 -0700, Jan Rychter <jan at rychter.com>
 Edi> wrote:
 >> I guess it is debatable whether the last "!" should be there. Perl
 >> doesn't behave that way, but I guess it _is_ an empty line, now that
 >> I think of it. And I wanted to get "!" instead of empty lines. So it
 >> actually makes more sense than Perl.

 Edi> Hmmm, yes it seems to make more sense. On the other hand, I'm
 Edi> trying to be as close to Perl as possible. Do you see any pattern
 Edi> there? Any idea why Perl doesn't add the last exclamation mark?

Uh, well, hmm. I've tried reading "man perlre", but the part about \z,
\Z and multiline strings gave me a headache. 

I really have no idea why Perl doesn't treat the end of a string as an
"$" in this case, because it certainly does so for other expressions
(e.g. "^4$" _will_ match at the end of a multiline string ending in
"...\n4"). I see no reason to treat a string ending in "\n$" (on UNIX)
differently: "^$" should definitely match there, as a new line has
begun, and ended, being empty.
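
As far as I recall, perlre does say that with /m "^" matches after any
newline within the string except when that newline is the last
character of the string, which would account for the missing "!". The
two cases can be checked with a short sketch like this one (assuming
CL-PPCRE is loaded; the variable names are only illustrative):

```lisp
;; Assumes CL-PPCRE is already loaded; variable names are illustrative.
(defparameter *ends-in-4* (format nil "1~%4"))            ; no trailing newline
(defparameter *ends-in-newline* (format nil "1~%~%4~%"))  ; trailing newline

;; "^4$" matches the last line even without a trailing newline.
(cl-ppcre:scan "(?m)^4$" *ends-in-4*)   ; => 2, 3 (match start and end)

;; Start/end offsets of every "^$" match. Whether an offset pair for the
;; very end of the string is included is exactly the point in question.
(cl-ppcre:all-matches "(?m)^$" *ends-in-newline*)
```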

My suggestion would be to document this behavior. A brave soul could
report this to the Perl people, but I seriously doubt they'd consider it
a bug. It might be one of those DWIM things.

--J.