[climacs-devel] Re: Regexs in climacs

Robert Strandh strandh at labri.fr
Thu Feb 3 05:42:19 UTC 2005


Hello, 

Lawrence Mitchell writes:
 > Lawrence Mitchell wrote:
 > 
 > > after my rather abortive attempts to try and define a
 > > delete-other-windows the other day, I thought I'd look at maybe
 > > supporting some kind of regex search in climacs.
 > 
 > Here's a rather trivial buffer-string based idea to demonstrate
 > that things [cw]ould work :).
 > 
 > (define-named-command com-re-search-backward ()
 >   (let ((regex (accept 'string
 >                        :prompt "RE search backward")))
 >     (re-search-backward regex (buffer (current-window))
 >                         (point (current-window)))))
 > 
 > (defun re-search-forward (regex buffer mark)
 >   (let ((string (coerce (buffer-sequence buffer (offset mark) (size buffer))
 >                         'string)))
 >     (multiple-value-bind (m-start m-end)
 >         (cl-ppcre:scan regex string)
 >       (when m-start
 >         (incf (offset mark) m-end)))))
 > 
 > (defun re-search-backward (regex buffer mark)
 >   (let ((string (coerce (buffer-sequence buffer 0 (offset mark))
 >                         'string)))
 >     ;; SCAN returns the leftmost match, i.e. the first one in the
 >     ;; buffer, not the one nearest to the mark.
 >     (multiple-value-bind (m-start)
 >         (cl-ppcre:scan regex string)
 >       (when m-start
 >         (setf (offset mark) m-start)))))
 > 
 > (define-named-command com-re-search-forward ()
 >   (let ((regex (accept 'string
 >                        :prompt "RE Search")))
 >     (re-search-forward regex (buffer (current-window))
 >                        (point (current-window)))))
 > 
 > [...]

Yes, I see.  Nice, but probably not ready for prime time since the
entire buffer is converted to a string. 

 > |> o Licensing differences.  Climacs is released under the LGPL, while
 > |> cl-ppcre is under a BSD-style license.  I don't think this is a
 > |> problem (as far as I can tell from reading the licenses), but if you
 > |> know otherwise, I'd be grateful to hear.
 > 
 > | I don't see a problem but IANAL.  It is my understanding that the BSD
 > | license basically means that you can do with CL-PPCRE whatever you
 > | want as long as you credit my original work - this is what I intended.
 > | So you could, e.g., incorporate it into an LGPL project without a
 > | problem.  Of course, the original CL-PPCRE will still be available
 > | under the old license.

OK, it's OK with me to stick it in.  Technically, there is a problem
though: the LGPL says that any addition to the software must be LGPL,
and cl-ppcre would be an addition, but it is not LGPL.  I guess that,
as the author, I can grant a special exception for cl-ppcre.

 > |> o How to best match up cl-ppcre's matching on strings with climacs'
 > |> idea of a buffer.
 > |>
 > |> A climacs buffer is a sequence of objects (which may or may not be
 > |> characters, but we'll ignore that for the moment).  Now, I can
 > |> easily generate a string of the contents of the buffer, and call
 > |> SCAN (or whatever) on the string.  However, this is going to be slow
 > |> for large buffers (especially if we find something just after point,
 > |> we've still constructed the whole buffer-string).
 > |>
 > |> The "obvious" solution to this is to use streams instead (probably),
 > |> so, I wonder if cl-ppcre would be amenable to something like this?
 > 
 > | Well, supporting all of Perl's regex facilities implies that you need
 > | to have random access to the target - I don't think you can fit
 > | streams into this picture.  I'm not a CS guy but my understanding is
 > | that CL-PPCRE is based on an NFA and you can't change that easily.
 > | You can build a DFA that implements a subset of CL-PPCRE and that
 > | would work with streams but that wouldn't be CL-PPCRE anymore... :)
 > 
 > | Now, using other kinds of structures (like, say, your buffers) that
 > | aren't strings but are random-access - that wouldn't be /too/ hard.
 > | It would involve going through three or four files and changing SCHAR to
 > | something else but basically I don't really see a problem.

I agree, replacing schar with calls to (buffer-object buffer offset)
would probably be all that is required. 
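
A minimal sketch of what that substitution might look like, assuming a
random-access accessor shaped like (buffer-object buffer offset).  The
toy-buffer class, the size method, and the two search functions are
hypothetical stand-ins written for illustration; neither Climacs nor
cl-ppcre is loaded, and the matcher handles only literal strings, not
regular expressions.

```lisp
;; Hypothetical stand-in for a Climacs buffer: a vector of arbitrary objects.
(defclass toy-buffer ()
  ((contents :initarg :contents :reader contents)))

(defgeneric buffer-object (buffer offset)
  (:documentation "Return the object at OFFSET in BUFFER."))

(defmethod buffer-object ((buffer toy-buffer) (offset integer))
  (aref (contents buffer) offset))

(defgeneric size (buffer)
  (:documentation "Return the number of objects in BUFFER."))

(defmethod size ((buffer toy-buffer))
  (length (contents buffer)))

(defun literal-match-at-p (literal buffer offset)
  "True if the string LITERAL occurs in BUFFER starting at OFFSET.
Each element is fetched with BUFFER-OBJECT where a string matcher
would use SCHAR, so no string copy of the buffer is made."
  (and (<= (+ offset (length literal)) (size buffer))
       (loop for i from 0 below (length literal)
             ;; EQL is false for non-character objects, so they never match.
             always (eql (char literal i)
                         (buffer-object buffer (+ offset i))))))

(defun search-literal (literal buffer &optional (start 0))
  "Return the first offset >= START at which LITERAL occurs, or NIL."
  (loop for offset from start to (- (size buffer) (length literal))
        when (literal-match-at-p literal buffer offset)
          return offset))
```

Because the search stops at the first hit, a match just after point
costs only a few accessor calls, however large the buffer is.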

 > | However,
 > | as CL-PPCRE has a reputation for being quite fast I wouldn't want to
 > | sacrifice this for greater flexibility (buffers instead of strings,
 > | arbitrary objects instead of characters - you name it).  I think the
 > | right way to do it would be to offer the ability to build different
 > | versions of CL-PPCRE based on *FEATURES*, i.e. at compile time you
 > | decide whether you want a fast regex engine for strings or if you want
 > | a not-so-fast regex engine for, say, buffers.  Would that be OK for
 > | you?

Probably the easiest way to accomplish this is to include a modified
version of cl-ppcre in the Climacs distribution which is adapted to
the buffer access method. 
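
For comparison, the compile-time switch on *FEATURES* suggested in the
quoted text could be sketched as follows.  The feature name
:buffer-regex and the names target-ref and count-matching are invented
for this sketch and are not part of cl-ppcre; with the feature absent
(the default here), the code compiles down to plain SCHAR.

```lisp
;; Hypothetical: :BUFFER-REGEX, TARGET-REF and COUNT-MATCHING are
;; illustration-only names.  Push :BUFFER-REGEX onto *FEATURES* before
;; compiling to select the buffer accessor instead of SCHAR.
(defmacro target-ref (target index)
  "Expand into the element accessor chosen when this file is read."
  #+buffer-regex `(buffer-object ,target ,index)
  #-buffer-regex `(schar ,target ,index))

(defun count-matching (char target length)
  "Count the elements among the first LENGTH of TARGET that are EQL to CHAR."
  (loop for i from 0 below length
        count (eql char (target-ref target i))))
```

The choice is fixed when the file is read, so the string build pays
nothing for the flexibility, at the cost of maintaining two builds.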

-- 
Robert Strandh

---------------------------------------------------------------------
Greenspun's Tenth Rule of Programming: any sufficiently complicated C
or Fortran program contains an ad hoc informally-specified bug-ridden
slow implementation of half of Common Lisp.
---------------------------------------------------------------------


