From cl-ppcre-devel at frederic.jolliton.com  Sat Jul  1 13:57:26 2006
From: cl-ppcre-devel at frederic.jolliton.com (Frédéric Jolliton)
Date: Sat, 01 Jul 2006 15:57:26 +0200
Subject: [cl-ppcre-devel] [PATCH] Patches to mix parse trees and regex string.
Message-ID: <86u061zaux.fsf@mau.intra.tuxee.net>

Here are 2 patches (attached, if mailman doesn't drop them) for
CL-PPCRE for mixing parse trees and regex strings.

The first patch adds a (?.) syntax, where designate a keyword, and
is case sensitive. The ?. was chosen to match the idea of the #.
reader macro. It includes the synonym parse tree from the
corresponding keyword while the regex is parsed.

The second patch adds the (:REGEX ) construct to use regex strings
in parse trees. It's the opposite idea of the former patch.

The rationale is that sometimes it's preferable to use regex strings
(for compactness), while sometimes it's better to use parse trees
(when programmatically computed), but to my knowledge it was
impossible to mix them easily.

I hope that these patches are not too "hackish". In particular, the
reference to a synonym must be a keyword; it's not possible to specify
symbols in other packages. Also, since cl-ppcre uses symbol
properties, it's worse to use keywords (unless one takes care to avoid
name clashes). I'm not sure how to fix that.

These patches are meant to describe what I have in mind rather than to
be production quality.

What do you think about these suggestions?

Examples of use:

-=-=-
CL> (define-parse-tree-synonym :foo
      (:sequence #\a (:greedy-repetition 1 3 (:alternation #\b #\c))))
CL> (scan-to-strings "b(ar(?.FOO))a" "baracca")
"baracca"
#("aracc")
-=-=-

Mixing the other way:

-=-=-
CL> (scan-to-strings '(:sequence "b"
                       (:register (:sequence "ar"
                                             (:regex "a(?:b|c){1,3}")))
                       "a")
                     "baracca")
"baracca"
#("aracc")
CL>
-=-=-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: synonym-reference.patch
Type: text/x-patch
Size: 2911 bytes
Desc: [PATCH 1/2] Add a syntax (?.) to reference a tree synonym.
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: regex-tree.patch
Type: text/x-patch
Size: 719 bytes
Desc: [PATCH 2/2] Add a (:REGEX ) construct to use regex string into parse tree.
URL:
-------------- next part --------------
-- 
Frédéric Jolliton

From edi at agharta.de  Mon Jul  3 11:21:06 2006
From: edi at agharta.de (Edi Weitz)
Date: Mon, 03 Jul 2006 13:21:06 +0200
Subject: [cl-ppcre-devel] New version 1.2.15 (Was: Patches to mix parse trees and regex string)
In-Reply-To: <86u061zaux.fsf@mau.intra.tuxee.net> (Frédéric Jolliton's message of "Sat, 01 Jul 2006 15:57:26 +0200")
References: <86u061zaux.fsf@mau.intra.tuxee.net>
Message-ID:

Hi!

On Sat, 01 Jul 2006 15:57:26 +0200, Frédéric Jolliton wrote:

> The first patch adds a (?.) syntax, where designate a keyword, and
> is case sensitive. The ?. was chosen to match the idea of the #.
> reader macro. It includes the synonym parse tree from the
> corresponding keyword while the regex is parsed.

Thanks for this patch, but as you wrote in your email, I think this
one is a little bit too hackish. It also breaks compatibility with
Perl syntax.

> The second patch adds the (:REGEX ) construct to use regex strings
> in parse trees. It's the opposite idea of the former patch.

And thanks for this one as well. I've made a new release (1.2.15)
which incorporates your changes.

Cheers,
Edi.

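[Editor's illustration, not part of the original thread.] The (:REGEX ...) construct accepted into 1.2.15 lets a parse tree embed ordinary regex strings. A minimal sketch follows, assuming CL-PPCRE 1.2.15 or later can be found by ASDF; the names *YEAR-MONTH* and SPLIT-YEAR-MONTH are hypothetical and chosen only for this example.

```lisp
;; Assumption: ASDF is available and can locate CL-PPCRE >= 1.2.15.
;; Load this file with LOAD or evaluate the forms one by one at the REPL.
(require :asdf)
(asdf:load-system :cl-ppcre)

;; Hypothetical parse tree: S-expression nodes supply the structure,
;; (:REGEX ...) nodes supply the compact leaf patterns.
(defparameter *year-month*
  '(:sequence :start-anchor
              (:register (:regex "[0-9]{4}"))
              "-"
              (:register (:regex "[0-9]{2}"))
              :end-anchor))

(defun split-year-month (string)
  "Return a list (YEAR MONTH) of strings, or NIL if STRING does not match."
  (multiple-value-bind (match registers)
      (cl-ppcre:scan-to-strings *year-month* string)
    (when match
      (coerce registers 'list))))

;; (split-year-month "2006-07")   ; => ("2006" "07")
;; (split-year-month "July 2006") ; => NIL
```

The registers are declared in the S-expression part, so the embedded strings stay free of capturing groups and remain easy to swap out.
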
From cl-ppcre-devel at frederic.jolliton.com  Mon Jul  3 14:26:51 2006
From: cl-ppcre-devel at frederic.jolliton.com (Frédéric Jolliton)
Date: Mon, 03 Jul 2006 16:26:51 +0200
Subject: [cl-ppcre-devel] Re: New version 1.2.15 (Was: Patches to mix parse trees and regex string)
In-Reply-To: (Edi Weitz's message of "Mon, 03 Jul 2006 13:21:06 +0200")
References: <86u061zaux.fsf@mau.intra.tuxee.net>
Message-ID: <86d5cmzrv8.fsf@mau.intra.tuxee.net>

>> The first patch adds a (?.) syntax, where designate a keyword, and
>> is case sensitive. The ?. was chosen to match the idea of the #.
>> reader macro. It includes the synonym parse tree from the
>> corresponding keyword while the regex is parsed.
>
> Thanks for this patch, but as you wrote in your email, I think this
> one is a little bit too hackish. It also breaks compatibility with
> Perl syntax.

Ok, then I have another suggestion.

Let (:REGEX ) take optionally more symbols, and use place holders in
to insert corresponding syntax trees.

For example:

  (dpts tree1 (:regex "a{2,5}"))
  (dpts tree2 (:regex "b{1,3}"))
  (dpts tree3 (:regex "foo((?~)-bar-(?~)+)baz" tree1 tree2))

Where (?~) is the place holder. Or something else which doesn't break
compatibility with Perl syntax.

Without such a feature, the last tree would have been:

  (dpts tree3 (:sequence "foo"
                         (:register
                           (:sequence tree1
                                      "-bar-"
                                      (:greedy-repetition 1 nil tree2)))
                         "baz"))

(Where dpts = ppcre:define-parse-tree-synonym)

Is that a better alternative?

-- 
Frédéric Jolliton

From edi at agharta.de  Tue Jul  4 12:59:12 2006
From: edi at agharta.de (Edi Weitz)
Date: Tue, 04 Jul 2006 14:59:12 +0200
Subject: [cl-ppcre-devel] Re: New version 1.2.15
In-Reply-To: <86d5cmzrv8.fsf@mau.intra.tuxee.net> (Frédéric Jolliton's message of "Mon, 03 Jul 2006 16:26:51 +0200")
References: <86u061zaux.fsf@mau.intra.tuxee.net> <86d5cmzrv8.fsf@mau.intra.tuxee.net>
Message-ID:

On Mon, 03 Jul 2006 16:26:51 +0200, Frédéric Jolliton wrote:

> Ok, then I've another suggestion. Let (:REGEX ) take
> optionally more symbols, and use place holders in to insert
> corresponding syntax trees. For example:
>
>   (dpts tree1 (:regex "a{2,5}"))
>   (dpts tree2 (:regex "b{1,3}"))
>   (dpts tree3 (:regex "foo((?~)-bar-(?~)+)baz" tree1 tree2))
>
> Where (?~) is the place holder. Or something else which doesn't
> break compatibility with Perl syntax.

(?~) is not special in Perl, so this /would/ break compatibility with
Perl syntax. In fact, everything would break compatibility.

Apart from that, you'd have to change the parser accordingly, you'd
have to check if the number of occurrences of (?~) is equal to the
number of optional parameters, you'd have to check that (?~) is only
used within (:REGEX ...), and so on.

> Without such a feature, the last tree would have been:
>
>   (dpts tree3 (:sequence "foo"
>                          (:register
>                            (:sequence tree1
>                                       "-bar-"
>                                       (:greedy-repetition 1 nil tree2)))
>                          "baz"))
>
> (Where dpts = ppcre:define-parse-tree-synonym)
>
> Is that a better alternative?

I don't think it's worth the trouble. My personal opinion is that for
complicated regular expressions you should use the S-expression syntax
anyway. YMMV, of course.

Cheers,
Edi.

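[Editor's illustration, not part of the original thread.] The S-expression route recommended here already composes the three trees without any placeholder syntax. The sketch below makes the quoted TREE3 definition runnable, assuming CL-PPCRE 1.2.15 or later is loadable through ASDF; the helper TREE3-REGISTER is hypothetical.

```lisp
;; Assumption: ASDF is available and can locate CL-PPCRE >= 1.2.15.
;; Load this file with LOAD or evaluate the forms one by one at the REPL.
(require :asdf)
(asdf:load-system :cl-ppcre)

;; Leaf patterns stay compact regex strings wrapped in (:REGEX ...).
(cl-ppcre:define-parse-tree-synonym tree1 (:regex "a{2,5}"))
(cl-ppcre:define-parse-tree-synonym tree2 (:regex "b{1,3}"))

;; The composite refers to the synonyms by symbol, as in the message above.
(cl-ppcre:define-parse-tree-synonym tree3
  (:sequence "foo"
             (:register
              (:sequence tree1
                         "-bar-"
                         (:greedy-repetition 1 nil tree2)))
             "baz"))

(defun tree3-register (string)
  "Return the text captured by TREE3's register, or NIL if there is no match."
  (multiple-value-bind (match registers)
      (cl-ppcre:scan-to-strings '(:sequence tree3) string)
    (when match
      (aref registers 0))))

;; (tree3-register "fooaa-bar-bbbaz") ; => "aa-bar-bb"
```

Synonyms are stored on the symbol itself, so TREE1, TREE2 and TREE3 must be read in the same package where they were defined.
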
From cl-ppcre-devel at frederic.jolliton.com  Tue Jul  4 14:20:03 2006
From: cl-ppcre-devel at frederic.jolliton.com (Frédéric Jolliton)
Date: Tue, 04 Jul 2006 16:20:03 +0200
Subject: [cl-ppcre-devel] Re: New version 1.2.15
In-Reply-To: (Edi Weitz's message of "Tue, 04 Jul 2006 14:59:12 +0200")
References: <86u061zaux.fsf@mau.intra.tuxee.net> <86d5cmzrv8.fsf@mau.intra.tuxee.net>
Message-ID: <8664id310s.fsf@mau.intra.tuxee.net>

>> Ok, then I've another suggestion. Let (:REGEX ) take
>> optionally more symbols, and use place holders in to
>> insert corresponding syntax trees. For example:
[...]
>> Is that a better alternative?
>
> I don't think it's worth the trouble. My personal opinion is that
> for complicated regular expressions you should use the S-expression
> syntax anyway. YMMV, of course.

Ok. Indeed, it is adding too much complexity. I will stick with s-exp.

And thanks again for this great package!

-- 
Frédéric Jolliton

From edi at agharta.de  Sun Jul 16 13:35:36 2006
From: edi at agharta.de (Edi Weitz)
Date: Sun, 16 Jul 2006 15:35:36 +0200
Subject: [cl-ppcre-devel] New version 1.2.16
Message-ID:

ChangeLog:

Version 1.2.16
2006-07-16
Added :ELEMENT-TYPE to REGEX-REPLACE(-ALL)

Download:

http://weitz.de/files/cl-ppcre.tar.gz

Cheers,
Edi.
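
[Editor's illustration, not part of the original thread.] The :ELEMENT-TYPE keyword announced for 1.2.16 requests the array element type of the string that REGEX-REPLACE and REGEX-REPLACE-ALL return. A minimal sketch, assuming CL-PPCRE 1.2.16 or later is loadable through ASDF; SQUEEZE-SPACES is a hypothetical helper, and how BASE-CHAR is actually represented depends on the Lisp implementation.

```lisp
;; Assumption: ASDF is available and can locate CL-PPCRE >= 1.2.16.
;; Load this file with LOAD or evaluate the forms one by one at the REPL.
(require :asdf)
(asdf:load-system :cl-ppcre)

(defun squeeze-spaces (string)
  "Collapse each run of spaces in STRING into a single space.
The result is requested with element type BASE-CHAR, so STRING is
assumed to contain only base characters."
  (cl-ppcre:regex-replace-all " +" string " " :element-type 'base-char))

;; (squeeze-spaces "a   b  c") ; => "a b c"
```

Asking for BASE-CHAR can save memory on implementations that distinguish base strings from full character strings; on others it makes no observable difference.
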