[climacs-devel] Re: Regexs in climacs
Robert Strandh
strandh at labri.fr
Thu Feb 3 05:42:19 UTC 2005
Hello,
Lawrence Mitchell writes:
> Lawrence Mitchell wrote:
>
> > after my rather abortive attempts to try and define a
> > delete-other-windows the other day, I thought I'd look at maybe
> > supporting some kind of regex search in climacs.
>
> Here's a rather trivial buffer-string based idea to demonstrate
> that things [cw]ould work :).
>
> (define-named-command com-re-search-backward ()
>   (let ((regex (accept 'string
>                        :prompt "RE search backward")))
>     (re-search-backward regex (buffer (current-window))
>                         (point (current-window)))))
>
> (defun re-search-forward (regex buffer mark)
>   ;; Copy the text from MARK to the end of the buffer into a string.
>   (let ((string (coerce (buffer-sequence buffer (offset mark) (size buffer))
>                         'string)))
>     ;; SCAN returns the start and end of the first match in STRING.
>     (multiple-value-bind (m-start m-end)
>         (cl-ppcre:scan regex string)
>       (when m-start
>         ;; M-END is relative to MARK, so advance MARK past the match.
>         (incf (offset mark) m-end)))))
>
> (defun re-search-backward (regex buffer mark)
>   ;; Copy the text from the beginning of the buffer up to MARK.
>   (let ((string (coerce (buffer-sequence buffer 0 (offset mark))
>                         'string)))
>     ;; Try start positions from MARK backwards, so that the match
>     ;; found is the one that starts closest to MARK.
>     (loop for start from (length string) downto 0
>           when (eql (cl-ppcre:scan regex string :start start) start)
>             do (setf (offset mark) start)
>                (return t))))
>
> (define-named-command com-re-search-forward ()
>   (let ((regex (accept 'string
>                        :prompt "RE Search")))
>     (re-search-forward regex (buffer (current-window))
>                        (point (current-window)))))
>
> [...]
Yes, I see. Nice, but probably not ready for prime time since the
entire buffer is converted to a string.
> |> o Licensing differences. Climacs is released under the LGPL, while
> |> cl-ppcre is under a BSD-style license. I don't think this is a
> |> problem (as far as I can tell from reading the licenses), but if you
> |> know otherwise, I'd be grateful to hear.
>
> | I don't see a problem but IANAL. It is my understanding that the BSD
> | license basically means that you can do with CL-PPCRE whatever you
> | want as long as you credit my original work - this is what I intended.
> | So you could, e.g., incorporate it into an LGPL project without a
> | problem. Of course, the original CL-PPCRE will still be available
> | under the old license.
OK, it's fine with me to include it. Technically, there is a problem,
though: the LGPL says that any addition to the software must be LGPL,
and cl-ppcre would be an addition that is not LGPL. I guess that, as
the author, I can grant a special exception for cl-ppcre.
> |> o How to best match up cl-ppcre's matching on strings with climacs'
> |> idea of a buffer.
> |>
> |> A climacs buffer is a sequence of objects (which may or may not be
> |> characters, but we'll ignore that for the moment). Now, I can
> |> easily generate a string of the contents of the buffer, and call
> |> SCAN (or whatever) on the string. However, this is going to be slow
> |> for large buffers (especially if we find something just after point,
> |> we've still constructed the whole buffer-string).
> |>
> |> The "obvious" solution is (probably) to use streams instead, so I
> |> wonder whether cl-ppcre would be amenable to something like this?
>
> | Well, supporting all of Perl's regex facilities implies that you need
> | to have random access to the target - I don't think you can fit
> | streams into this picture. I'm not a CS guy but my understanding is
> | that CL-PPCRE is based on an NFA and you can't change that easily.
> | You can build a DFA that implements a subset of CL-PPCRE and that
> | would work with streams but that wouldn't be CL-PPCRE anymore... :)
>
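To make the point about streams concrete, here is a minimal sketch in plain Common Lisp (no CL-PPCRE involved) of a hand-built DFA for the single fixed regex ab*c; the function name and the state keywords are hypothetical. The DFA reads each character exactly once, left to right, and never backs up, which is why this style of matcher can consume a stream, whereas a backtracking matcher may have to return to earlier positions. Features such as back-references cannot be expressed as a DFA at all, which is part of why a stream-friendly engine would only cover a subset.

```lisp
(defun stream-has-ab-star-c-p (stream)
  "Hypothetical example: return T if STREAM contains a match for the
regex ab*c anywhere.  Each character is read once; nothing is re-read."
  (let ((state :start))
    (loop for ch = (read-char stream nil nil)
          while ch
          do (setf state
                   (ecase state
                     ;; Waiting for the initial #\a.
                     (:start (if (char= ch #\a) :seen-a :start))
                     ;; Seen "a" followed by zero or more "b"s.
                     (:seen-a (case ch
                                (#\b :seen-a)
                                ;; A new #\a can start a fresh match.
                                (#\a :seen-a)
                                ;; #\c completes a match: accept at once.
                                (#\c (return-from stream-has-ab-star-c-p t))
                                (t :start))))))
    nil))
```

For example, (with-input-from-string (s "xxabbbcyy") (stream-has-ab-star-c-p s)) returns T, while the same call on "abxc" returns NIL.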
> | Now, using another kind of structure (like, say, your buffers) that
> | isn't a string but is random-access - that wouldn't be /too/ hard.
> | It would involve going through three or four files and changing SCHAR
> | to something else, but basically I don't really see a problem.
I agree; replacing SCHAR with calls to (buffer-object buffer offset)
would probably be all that is required.
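The idea of routing all access through one accessor can be illustrated with a minimal sketch that assumes nothing about CL-PPCRE's internals: a naive literal-substring search in which every read of the target goes through a generic function instead of SCHAR. TOY-BUFFER, TARGET-OBJECT, TARGET-SIZE and FIND-LITERAL are all hypothetical names; TOY-BUFFER merely stands in for a Climacs buffer, and its method plays the role of (buffer-object buffer offset).

```lisp
(defstruct (toy-buffer (:constructor make-toy-buffer (contents)))
  "Hypothetical stand-in for a Climacs buffer: a vector of arbitrary objects."
  (contents #() :type vector))

(defgeneric target-object (target index)
  (:documentation "Return the object at INDEX of TARGET.  Plays the role
that SCHAR plays inside a string-only matcher."))

(defmethod target-object ((target string) index)
  (char target index))

(defmethod target-object ((target toy-buffer) index)
  (aref (toy-buffer-contents target) index))

(defgeneric target-size (target))
(defmethod target-size ((target string)) (length target))
(defmethod target-size ((target toy-buffer))
  (length (toy-buffer-contents target)))

(defun find-literal (pattern target &key (start 0))
  "Return the index of the first occurrence of the string PATTERN in
TARGET at or after START, or NIL.  Only TARGET-OBJECT and TARGET-SIZE
touch TARGET, so the same code works on strings and toy buffers."
  (let ((plen (length pattern))
        (tlen (target-size target)))
    (loop for i from start to (- tlen plen)
          ;; Compare PATTERN against TARGET starting at position I.
          when (loop for j below plen
                     always (eql (char pattern j)
                                 (target-object target (+ i j))))
            return i)))
```

Non-character objects in a toy buffer simply never match a pattern character. Note that the generic-function call per element is the kind of overhead the CL-PPCRE author worries about in the next paragraph; an inlined SCHAR on a simple string is much cheaper.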
> | However,
> | as CL-PPCRE has a reputation for being quite fast I wouldn't want to
> | sacrifice this for greater flexibility (buffers instead of strings,
> | arbitrary objects instead of characters - you name it). I think the
> | right way to do it would be to offer the ability to build different
> | versions of CL-PPCRE based on *FEATURES*, i.e. at compile time you
> | decide whether you want a fast regex engine for strings or if you want
> | a not-so-fast regex engine for, say, buffers. Would that be OK for
> | you?
Probably the easiest way to accomplish this is to include a modified
version of cl-ppcre in the Climacs distribution which is adapted to
the buffer access method.
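The compile-time choice via *FEATURES* suggested above could look roughly like this minimal sketch. The feature keyword :PPCRE-BUFFER-ACCESS, the macro TARGET-REF, the variable *BUFFER-OBJECT-FUNCTION* and the function COUNT-OCCURRENCES are hypothetical; none of them is part of CL-PPCRE or Climacs. Because #+ and #- are read-time conditionals, the string build contains only the SCHAR expansion and pays nothing for the buffer support.

```lisp
;; Hypothetical feature keyword selecting the build.  To get the buffer
;; variant, evaluate (pushnew :ppcre-buffer-access *features*) before
;; compiling the engine.

#+ppcre-buffer-access
(defvar *buffer-object-function* nil
  "Hypothetical hook: a function of (buffer offset), for example one
that calls Climacs's BUFFER-OBJECT.")

(defmacro target-ref (target index)
  "Expand into the element access used by the matcher in this build."
  #-ppcre-buffer-access `(schar ,target ,index)
  #+ppcre-buffer-access `(funcall *buffer-object-function* ,target ,index))

;; Matcher code is written once against TARGET-REF, for example:
(defun count-occurrences (item target start end)
  "Count the positions in [START, END) of TARGET holding an object EQL to ITEM."
  (loop for i from start below end
        count (eql item (target-ref target i))))
```

In the default (string) build, (count-occurrences #\l "hello" 0 5) expands to plain SCHAR calls and returns 2.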
--
Robert Strandh
---------------------------------------------------------------------
Greenspun's Tenth Rule of Programming: any sufficiently complicated C
or Fortran program contains an ad hoc informally-specified bug-ridden
slow implementation of half of Common Lisp.
---------------------------------------------------------------------