From edi at agharta.de Mon Aug 1 13:27:30 2005
From: edi at agharta.de (Edi Weitz)
Date: Mon, 01 Aug 2005 15:27:30 +0200
Subject: [cl-ppcre-devel] New version 1.2.11
Message-ID:

ChangeLog:

Version 1.2.11
2005-08-01
Added external format for SBCL in ppcre-tests.lisp (thanks to Christophe Rhodes)

Download:

Cheers,
Edi.

From dpeschel at eskimo.com Wed Aug 3 18:59:48 2005
From: dpeschel at eskimo.com (Derek Peschel)
Date: Wed, 3 Aug 2005 11:59:48 -0700
Subject: [cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?
Message-ID: <20050803115948.A15221@eskimo.com>

Hi everybody,

I've been reading the CL-PPCRE docs and code to get a clear
specification of the syntax.  Ultimately I'd like to add syntax
highlighting for CL-PPCRE regexps to the Climacs text editor.

But there seems to be a certain amount of defensive or sloppy
programming (things being done in more than one place).  The scanner
knows something about skipping # comments but the lexer does too.
The lexer has code to ignore \E markers but I get the impression the
scanner removes them before the lexer starts.

If this kind of duplication does exist, is there a useful reason for
it?

-- Derek

From dpeschel at eskimo.com Wed Aug 3 18:37:50 2005
From: dpeschel at eskimo.com (Derek Peschel)
Date: Wed, 3 Aug 2005 11:37:50 -0700
Subject: [cl-ppcre-devel] Typo in documentation string for parser function GROUP?
Message-ID: <20050803113750.A14903@eskimo.com>

Hi everybody,

I only have version 1.2.10.  Since the change log for 1.2.11 doesn't
mention this problem, I assume it hasn't been fixed yet.

In parser.lisp, the documentation string for function GROUP starts
out

"Parses and consumes a .
The productions are: -> \"(\"\")\"
\"(?:\"\")\"
\"(?<\"\")\"

But shouldn't the last line read

\"(?>\"\")\"

? After all, (?< tokens always have something else after the angle
bracket.
-- Derek

From edi at agharta.de Wed Aug 3 21:11:06 2005
From: edi at agharta.de (Edi Weitz)
Date: Wed, 03 Aug 2005 23:11:06 +0200
Subject: [cl-ppcre-devel] Typo in documentation string for parser function GROUP?
In-Reply-To: <20050803113750.A14903@eskimo.com> (Derek Peschel's message of "Wed, 3 Aug 2005 11:37:50 -0700")
References: <20050803113750.A14903@eskimo.com>
Message-ID:

On Wed, 3 Aug 2005 11:37:50 -0700, Derek Peschel wrote:

> I only have version 1.2.10.  Since the change log for 1.2.11 doesn't
> mention this problem, I assume it hasn't been fixed yet.
>
> In parser.lisp, the documentation string for function GROUP starts
> out
>
> "Parses and consumes a .
> The productions are: -> \"(\"\")\"
> \"(?:\"\")\"
> \"(?<\"\")\"
>
> But shouldn't the last line read
>
> \"(?>\"\")\"
>
> ? After all, (?< tokens always have something else after the angle
> bracket.

Yes, that's obviously a typo.  Will be fixed in the next release.

Thanks,
Edi.

From edi at agharta.de Wed Aug 3 21:43:38 2005
From: edi at agharta.de (Edi Weitz)
Date: Wed, 03 Aug 2005 23:43:38 +0200
Subject: [cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?
In-Reply-To: <20050803115948.A15221@eskimo.com> (Derek Peschel's message of "Wed, 3 Aug 2005 11:59:48 -0700")
References: <20050803115948.A15221@eskimo.com>
Message-ID:

Hi!

On Wed, 3 Aug 2005 11:59:48 -0700, Derek Peschel wrote:

> I've been reading the CL-PPCRE docs and code to get a clear
> specification of the syntax.

Uh, I think there is no clear specification of the syntax.  Your best
bets probably are `man perlre' and the Camel Book but these are moving
targets.

> Ultimately I'd like to add syntax highlighting for CL-PPCRE regexps
> to the Climacs text editor.

Cool...

> But there seems to be a certain amount of defensive or sloppy
> programming (things being done in more than one place).

I wouldn't be surprised.

> The scanner knows something about skipping # comments but the lexer
> does too.

See below.

> The lexer has code to ignore \E markers but I get the impression the
> scanner removes them before the lexer starts.  If this kind of
> duplication does exist, is there a useful reason for it?

The \Q\E stuff (*allow-quoting*) was added pretty late, almost a year
after CL-PPCRE's first release.  The problem with \Q\E and friends is
that they're not really part of Perl's regex syntax either - they're
part of Perl's string syntax:

  edi at vmware:~$ perl -le '$a = "\Q*\E"; print $a'
  \*

That's why I ignored them first and later implemented them as a kind
of "pre-parsing" of the regex string (which itself uses regular
expressions).  In the process of doing this it is possible that a
dangling \E remains in the regex string and that's why the lexer is
instructed to specifically ignore these.  Maybe this can be done in
api.lisp as well but at that time it seemed easier to me to do that in
the lexer.  (The lexer is pretty ugly anyway because it has to cope
with a very ugly syntax.)

If you have a patch to make the code cleaner without breaking it I'd
be happy to incorporate it.

Cheers,
Edi.

From edi at agharta.de Wed Aug 3 23:32:37 2005
From: edi at agharta.de (Edi Weitz)
Date: Thu, 04 Aug 2005 01:32:37 +0200
Subject: [cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?
In-Reply-To: <20050803154956.A24602@eskimo.com> (Derek Peschel's message of "Wed, 3 Aug 2005 15:49:58 -0700")
References: <20050803115948.A15221@eskimo.com> <20050803154956.A24602@eskimo.com>
Message-ID:

[You forgot to Cc the mailing list.]

On Wed, 3 Aug 2005 15:49:58 -0700, Derek Peschel wrote:

> CL-PPCRE makes a start, with a grammar and a set of token types that
> the lexer produces.  The problem, as you know, is how to make sure
> the specification is complete while stating it in a way that's not
> tied to a particular implementation.  All the context-sensitive
> tricks (extended mode, backreference vs. octal constants, \Q and \E)
> make the spec harder to check too.

Definitely.
I'm afraid I won't be able to help very much, though.  CL-PPCRE's
parser/lexer is very much an ad hoc implementation and it's the first
parser I ever wrote.

> Climacs's syntax highlighters work with parse trees, so CL-PPCRE's
> use of them seemed to make it a good match for Climacs.  I think
> Climacs needs to reconstruct the original text from the parse tree,
> and it parses the text incrementally, and a good syntax module will
> flag errors yet allow further changes to the text.  So I don't know
> if the Climacs parse tree could be ready to pass to CL-PPCRE.  But
> that would be ideal.

Without knowing anything about Climacs' internals: How about using
CL-PPCRE::PARSE-STRING and feeding the result to Climacs?

> Are dangling \Es legal in Perl too?

Yep.

  edi at vmware:~$ perl -le '$a = "\Q*\E\E"; print $a'
  \*

Cheers,
Edi.

From dpeschel at eskimo.com Thu Aug 4 06:34:52 2005
From: dpeschel at eskimo.com (Derek Peschel)
Date: Wed, 3 Aug 2005 23:34:52 -0700
Subject: [cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?
In-Reply-To: ; from edi@agharta.de on Thu, Aug 04, 2005 at 01:32:37AM +0200
References: <20050803115948.A15221@eskimo.com> <20050803154956.A24602@eskimo.com>
Message-ID: <20050803233451.A12932@eskimo.com>

On Thu, Aug 04, 2005 at 01:32:37AM +0200, Edi Weitz wrote:

> [You forgot to Cc the mailing list.]

Oops.  Every list works differently.  If I reply to you and Cc the
list, is the server smart enough not to send you a second copy?  For
myself, I have the "don't send me copies of my posts" option on (Mutt
already saves a copy) so that would be a logical extension as long as
everyone uses it and everyone remembers to Cc the list.

> Without knowing anything about Climacs' internals: How about using
> CL-PPCRE::PARSE-STRING and feeding the result to Climacs?

That might be OK.  But the incremental parser algorithm keeps a lot
more state than just the parse tree, I think.
You don't have to worry about the algorithm, but you do have to use
Climacs's classes to describe your language.  You need to preserve
_all_ original text, since I believe Climacs uses the parse tree to
display the buffer, and I'd say you need to recover from errors by
continuing to parse (maybe in some restricted way) past them.

Considering these requirements it seems sensible to rewrite the
parser using Climacs's framework.

-- Derek

From edi at agharta.de Thu Aug 4 10:22:39 2005
From: edi at agharta.de (Edi Weitz)
Date: Thu, 04 Aug 2005 12:22:39 +0200
Subject: [cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?
In-Reply-To: <20050803233451.A12932@eskimo.com> (Derek Peschel's message of "Wed, 3 Aug 2005 23:34:52 -0700")
References: <20050803115948.A15221@eskimo.com> <20050803154956.A24602@eskimo.com> <20050803233451.A12932@eskimo.com>
Message-ID:

On Wed, 3 Aug 2005 23:34:52 -0700, Derek Peschel wrote:

> Oops.  Every list works differently.  If I reply to you and Cc the
> list, is the server smart enough not to send you a second copy?

No, but my procmailrc takes care of that.

> Considering these requirements it seems sensible to rewrite the
> parser using Climacs's framework.

OK, good luck.

Cheers,
Edi.
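
The \Q...\E handling discussed in this thread (quoting done as a pre-pass over the regex string, with any dangling \E ignored afterwards) can be illustrated with a rough sketch. This is not CL-PPCRE's code: it is a Python approximation that folds both steps into a single pass, the names quote_meta and expand_quoting are hypothetical, and the escaping rule (backslash every character other than ASCII letters, digits, and underscore) is assumed as a simplification of what Perl does.

```python
def quote_meta(ch: str) -> str:
    """Backslash-escape one character unless it is an ASCII letter, digit, or '_'.

    Assumption: a simplified stand-in for Perl's quoting rule.
    """
    if ch.isascii() and (ch.isalnum() or ch == "_"):
        return ch
    return "\\" + ch


def expand_quoting(regex: str) -> str:
    """Expand \\Q...\\E spans and drop dangling \\E markers (hypothetical helper)."""
    out = []
    quoting = False
    i, n = 0, len(regex)
    while i < n:
        two = regex[i:i + 2]
        if quoting:
            if two == "\\E":          # end of the quoted span
                quoting = False
                i += 2
            else:                     # inside \Q...: quote one character
                out.append(quote_meta(regex[i]))
                i += 1
        elif two == "\\Q":            # start of a quoted span
            quoting = True
            i += 2
        elif two == "\\E":            # dangling \E outside any span: ignore it
            i += 2
        elif regex[i] == "\\" and i + 1 < n:
            out.append(two)           # any other escape: copy both characters,
            i += 2                    # so an escaped backslash before E is kept
        else:
            out.append(regex[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_quoting(r"\Q*\E"))     # prints \*
    print(expand_quoting(r"\Q*\E\E"))   # prints \*
```

Both calls mirror the two Perl one-liners quoted in the thread. Handling everything in one scanner is only a choice made for this sketch; the thread describes the real implementation as a regex-based pre-parse followed by a lexer that skips leftover \E markers.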