[cl-ppcre-devel] Using cl-ppcre in a text editor

Edi Weitz edi at agharta.de
Sat Jan 29 19:47:56 UTC 2005


Hi!

On Sat, 29 Jan 2005 16:16:50 +0000, Lawrence Mitchell <wencel at gmail.com> wrote:

> I'm looking at trying to use cl-ppcre to add regular expression
> support to the Climacs editor (<URL:
> http://common-lisp.net/project/climacs/>).

Sounds cool... :)

> A few things spring to mind:
>
> o Licensing differences.  Climacs is released under the LGPL, while
> cl-ppcre is under a BSD-style license.  I don't think this is a
> problem (as far as I can tell from reading the licenses), but if you
> know otherwise, I'd be grateful to hear.

I don't see a problem but IANAL.  It is my understanding that the BSD
license basically means that you can do with CL-PPCRE whatever you
want as long as you credit my original work - this is what I intended.
So you could, e.g., incorporate it into an LGPL project without a
problem.  Of course, the original CL-PPCRE will still be available
under the old license.

> o How to best match up cl-ppcre's matching on strings with climacs'
> idea of a buffer.
>
> A climacs buffer is a sequence of objects (which may or may not be
> characters, but we'll ignore that for the moment).  Now, I can
> easily generate a string of the contents of the buffer, and call
> SCAN (or whatever) on the string.  However, this is going to be slow
> for large buffers (especially if we find something just after point,
> we've still constructed the whole buffer-string).
>
> The "obvious" solution to this is to use streams instead (probably),
> so, I wonder if cl-ppcre would be amenable to something like this?

Well, supporting all of Perl's regex facilities implies that you need
to have random access to the target - I don't think you can fit
streams into this picture.  I'm not a CS guy but my understanding is
that CL-PPCRE is based on an NFA and you can't change that easily.
You can build a DFA that implements a subset of CL-PPCRE and that
would work with streams but that wouldn't be CL-PPCRE anymore... :)
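
[Editor's illustration, not part of the original message.]  One way to
see why random access matters is a back-reference: the engine has to
compare new input against text it matched earlier, and backtracking may
also revisit earlier positions.  The pattern and target below are made
up for this sketch; only CL-PPCRE:SCAN is the library's own function.

```lisp
;; Assumes CL-PPCRE is already loaded.
;; \1 must equal whatever register 1 matched earlier ("aa" here), so the
;; matcher needs to look back at positions 1-2 while standing at 4-5.
(cl-ppcre:scan "(a+)b\\1" "xaabaa")
;; => 1, 6, #(1), #(3)
;;    match start, match end, register starts, register ends
```

A forward-only stream could presumably serve such a pattern only by
buffering everything it has already delivered.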

Now, using other kinds of structures (like, say, your buffers) that
aren't strings but are random-access - that wouldn't be /too/ hard.
It would involve going through three or four files and changing SCHAR to
something else but basically I don't really see a problem.  However,
as CL-PPCRE has a reputation for being quite fast I wouldn't want to
sacrifice this for greater flexibility (buffers instead of strings,
arbitrary objects instead of characters - you name it).  I think the
right way to do it would be to offer the ability to build different
versions of CL-PPCRE based on *FEATURES*, i.e. at compile time you
decide whether you want a fast regex engine for strings or if you want
a not-so-fast regex engine for, say, buffers.  Would that be OK for
you?
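
[Editor's illustration, not part of the original message.]  A minimal
sketch of what such a compile-time switch could look like, using reader
conditionals.  The feature name :PPCRE-GENERIC-TARGET and the macro
TARGET-CHAR are hypothetical and do not exist in CL-PPCRE; ELT merely
stands in for whatever accessor a buffer type would provide.

```lisp
;; Hypothetical names: :PPCRE-GENERIC-TARGET and TARGET-CHAR are
;; assumptions made for this sketch, not part of CL-PPCRE.
(defmacro target-char (target index)
  "Expand into the element accessor chosen when this file was compiled."
  ;; Generic variant: works on any sequence, but is slower.
  #+ppcre-generic-target `(elt ,target ,index)
  ;; Default variant: fast access, valid for simple strings only.
  #-ppcre-generic-target `(schar ,target ,index))

;; Every direct SCHAR call in the matcher would then be written as
;; (target-char target index) instead.
```

To build the generic variant under this scheme one would evaluate
(pushnew :ppcre-generic-target *features*) before compiling; without it
the expansion is a plain SCHAR call, so the string-only build should
lose nothing in speed.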

> On another, somewhat unrelated note.  One thing that one would like
> to do is regexp search and replace, now, if I know how many groups
> the user is going to input into their regexp before the fact, I can
> use REGISTER-GROUPS-BIND to get at the substring matches via
> variables.  I guess there isn't any way to do this without knowing
> the input beforehand.  So is the idea then just to use SCAN and then
> manually grab the substrings via REG-STARTS and REG-ENDS, or have I
> missed something obvious?

No, I don't see a better way.  If you don't know the regex then you
have to check the return value and see how long the register arrays
are.
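
[Editor's illustration, not part of the original message.]  The
approach described above might be sketched as follows.  The function
name REGISTER-SUBSTRINGS is made up for this example; SCAN and its four
return values are CL-PPCRE's own.

```lisp
;; Assumes CL-PPCRE is already loaded.  REGISTER-SUBSTRINGS is a
;; hypothetical helper, not a CL-PPCRE function.
(defun register-substrings (regex target)
  "Return a list with one entry per register of REGEX matched against
TARGET (NIL for a register that took no part in the match), and T as a
second value.  Return NIL, NIL if there is no match at all."
  (multiple-value-bind (match-start match-end reg-starts reg-ends)
      (cl-ppcre:scan regex target)
    (declare (ignore match-end))
    (if match-start
        ;; The register arrays are as long as the regex has groups, so
        ;; the number of groups need not be known in advance.
        (values (loop for start across reg-starts
                      for end across reg-ends
                      collect (and start (subseq target start end)))
                t)
        (values nil nil))))

;; (register-substrings "(\\d+)-(\\d+)" "pages 10-20")
;; => ("10" "20"), T
```

The second return value distinguishes "matched, but the regex has no
registers" (NIL, T) from "no match" (NIL, NIL).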

Cheers,
Edi.



