From ondrej.svitek at gmail.com  Fri Mar 23 23:29:10 2007
From: ondrej.svitek at gmail.com (Ondrej Svitek)
Date: Sat, 24 Mar 2007 00:29:10 +0100
Subject: [cl-ppcre-devel] Re: cl-ppcre new feature proposal
In-Reply-To:
References: <86d303410703161631l11d8c1e4qef174f2f76bd3f49@mail.gmail.com>
Message-ID: <86d303410703231629v10590f79id88c4f974d79b427@mail.gmail.com>

Hello Edi,

I've followed your list of suggestions and am sending you the patch
[note: ASDF recognizes the new system as :cl-ppcre-testing].  The parser
extension is now user-controllable through the *ALLOW-NAMED-REGISTERS*
switch, and the changes are documented in the source and the HTML doc.

I've also discovered a subtle problem - according to the *ALLOW-QUOTING*
documentation:

* (let ((cl-ppcre:*allow-quoting* t))
    (cl-ppcre:scan "^\\Qa+\\E$" "a+"))
0
2
#()
#()

but my SBCL simply returns NIL.  It will be immediately obvious what's
happening from the following code:

(let ((cl-ppcre:*allow-named-registers* t))
  (cl-ppcre:scan "(?.*)" "abc"))
=> error
...
; (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?.*)"))
;
; caught ERROR:
;   (during EVAL of LOAD-TIME-VALUE)
;   Character 'r' may not follow '(?<' at position 3 in string "(?.*)"
; ==>
;   (CL-PPCRE:SCAN (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?.*)")) "abc")
...

SCAN function has a compiler-macro, which precompiles constant Perl
regexes at load time.  But LOAD-TIME-VALUE doesn't know about any
runtime bindings (of course) affecting the scanner closure creation.
Since compiler-macros may or may not get expanded, it is implementation
dependent what happens.  This code is likely to work in an interpreted
REPL (but SBCL compiles all forms by default, hence it doesn't work
here), but less likely to work when compiled.  The situation probably
affects more special variables than the two mentioned.

Again, this is a rather subtle problem and an unsuspecting user can get
quite puzzled by it.  I can think of the following remedies:

1. Clearly mention the pitfall in the doc and warn users to always
   explicitly use CREATE-SCANNER when binding special variables
   affecting closure generation.  They can even use LOAD-TIME-VALUE,
   provided that it contains the desired binding inside.

2. Don't use LOAD-TIME-VALUE in the SCAN compiler-macro (I think there
   are more similar places that have to be fixed too, but I haven't
   investigated them), but rather some kind of "FIRST-TIME-VALUE" - I
   mean, some simple sort of memoization, which would compute a scanner
   closure when it is needed for the first time, remembering it
   afterwards.  This would fix the problem with binding of specials
   (safe only for constant values, though, as only the first-time
   encountered binding would be remembered and effective).  It would
   also have the effect of spreading closure creation through program
   execution time.  This could be seen as a benefit sometimes, e.g. when
   a program uses lots of constant regexes, which cause a noticeable
   start-up pause while compiling them during load time (hypothetically,
   I haven't run across such a case).

Maybe there are some other possibilities; that's why I have just
mentioned this issue and haven't done anything to fix it.

I hope this helped.

Regards,
Ondrej

On 19/03/07, Edi Weitz <edi at agharta.de> wrote:
>
> [Cc to mailing list.]
>
> Hi Ondrej,
>
> On Sat, 17 Mar 2007 00:31:46 +0100, "Ondrej Svitek"
> wrote:
>
> > I've written a little extension to your wonderful CL-PPCRE library -
> > support for named registers and back-references. I don't know if
> > Perl has them (never used it), but ACL does and they proved useful
> > for me in certain situations.
> >
> > [...]
> >
> > Feel free to incorporate this change, if you like it. Or not, if not
> > :)
>
> Thanks for the code.  I'd be interested to incorporate this, but for
> that I'd like you to do the following:
>
> 1. Send a "unified diff" (diff -u) of your changes instead of a full
>    tarball.
>
> 2. Make sure to (if necessary) update all docstrings of functions that
>    changed their behaviour and to add docstrings for functions,
>    classes, or slots you added.
>
> 3. Add a user-visible switch to turn this new behaviour on or off, so
>    users can opt to have the old, Perl-compatible syntax instead.  The
>    default should be off.
>
> 4. Update the HTML documentation accordingly.
>
> Thanks in advance,
> Edi.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-ppcre-1.2.20-testing.diff.tar.gz
Type: application/x-gzip
Size: 16752 bytes
Desc: not available

From edi at agharta.de  Sat Mar 24 23:56:56 2007
From: edi at agharta.de (Edi Weitz)
Date: Sun, 25 Mar 2007 00:56:56 +0100
Subject: [cl-ppcre-devel] New release 1.3.0
Message-ID:

ChangeLog:

Version 1.3.0
2007-03-24
Optional support for named registers (patch by Ondrej Svitek)

Download:

  http://weitz.de/files/cl-ppcre.tar.gz

Cheers,
Edi.

From edi at agharta.de  Sun Mar 25 00:02:06 2007
From: edi at agharta.de (Edi Weitz)
Date: Sun, 25 Mar 2007 01:02:06 +0100
Subject: [cl-ppcre-devel] Re: cl-ppcre new feature proposal
In-Reply-To: <86d303410703231629v10590f79id88c4f974d79b427@mail.gmail.com>
	(Ondrej Svitek's message of "Sat, 24 Mar 2007 00:29:10 +0100")
References: <86d303410703161631l11d8c1e4qef174f2f76bd3f49@mail.gmail.com>
	<86d303410703231629v10590f79id88c4f974d79b427@mail.gmail.com>
Message-ID:

Hi Ondrej,

On Sat, 24 Mar 2007 00:29:10 +0100, "Ondrej Svitek" wrote:

> I've followed your list of suggestions and am sending you the patch

Thanks a lot.  I've made a new release (1.3.0) which incorporates your
patch.

BTW, the next time you send a patch, please don't include TAB
characters in it.

> SCAN function has a compiler-macro, which precompiles constant Perl
> regexes at load time. But LOAD-TIME-VALUE doesn't know about any
> runtime bindings (of course) affecting the scanner closure creation.
Yes, this is a known issue and you'll find it mentioned in the
documentation for *USE-BMH-MATCHERS* and *REGEX-CHAR-CODE-LIMIT*.  It
is clearly an oversight that it is not mentioned for *ALLOW-QUOTING*.
I've added that and changed the example accordingly.

Thanks for making me aware of that.

Cheers,
Edi.
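
The two remedies listed in Ondrej's message can be sketched roughly as
follows.  This is an illustration only, not code from CL-PPCRE or from
the patch.  It assumes CL-PPCRE is already loaded, and FIRST-TIME-VALUE,
QUOTED-MATCH-P and LAZY-QUOTED-MATCH-P are hypothetical names made up
for this sketch.  The memoizing macro is one simple way to get
"first time" semantics and is not thread-safe.

```lisp
;;; Sketch only.  Assumes CL-PPCRE is already loaded.
;;; FIRST-TIME-VALUE, QUOTED-MATCH-P and LAZY-QUOTED-MATCH-P are
;;; hypothetical names; they are not part of CL-PPCRE.

;; Remedy 1: call CREATE-SCANNER explicitly and put the special
;; binding INSIDE the LOAD-TIME-VALUE form, so the binding is in
;; effect when the scanner closure is built.
(defun quoted-match-p (string)
  (let ((scanner (load-time-value
                  (let ((cl-ppcre:*allow-quoting* t))
                    (cl-ppcre:create-scanner "^\\Qa+\\E$")))))
    ;; SCANNER is a variable, not a constant regex string, so the
    ;; SCAN compiler-macro has nothing to precompile here.
    (cl-ppcre:scan scanner string)))

;; Remedy 2: a simple memoizing macro.  Each expansion site gets its
;; own mutable cell (created at load time); FORM itself is evaluated
;; the first time control reaches the site and cached afterwards.
(defmacro first-time-value (form)
  (let ((cell (gensym "CELL")))
    `(let ((,cell (load-time-value (cons nil nil))))
       (unless (car ,cell)              ; CAR = "already computed?"
         (setf (cdr ,cell) ,form        ; CDR = cached value
               (car ,cell) t))
       (cdr ,cell))))

;; The scanner is now created on the first call, under whatever
;; dynamic bindings are in effect at that moment.  Only that first
;; binding is remembered.
(defun lazy-quoted-match-p (string)
  (cl-ppcre:scan (first-time-value
                  (cl-ppcre:create-scanner "^\\Qa+\\E$"))
                 string))
```

With this sketch, calling (lazy-quoted-match-p "a+") for the first time
inside (let ((cl-ppcre:*allow-quoting* t)) ...) builds the scanner with
quoting enabled, and later calls outside that binding reuse the same
scanner.  That is the "only the first-time encountered binding is
remembered" caveat from remedy 2.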