[cl-ppcre-devel] Re: cl-ppcre
Edi Weitz
edi at agharta.de
Sat Jun 12 14:39:50 UTC 2004
Hi Daniel!
On Sat, 12 Jun 2004 15:54:11 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:
> today I explored the possibilities of regular expression
> implementations in various Debian Common Lisp packages. I really
> liked your library - thank you for writing the cl-ppcre library.
You're welcome.
> I also looked into the elegant cl-lexer package built on top of the
> cl-regex library. What I missed in cl-ppcre is a parse-tree node
> similar to cl-regex's 'success node, which defines the return value of
> the match/scan functions. With a 'success node one can build a `deflexer'
> macro on top of cl-ppcre as easily as on top of the cl-regex package.
> Is it possible to extend cl-ppcre with a similar feature?
I might look into this for a future version, but see below.
> Footnote: In cl-lexer, the deflexer macro
>   (deflexer foo
>     ("regexp" some action)            ; 0
>     ("another regexp" another action) ; 1
>     ...)
> numbers each pair of regexp and action, then combines the regexp parse
> trees into one big parse tree
>   `(alt
>      (seq (regexp tree) (success 0))
>      (seq (another regexp tree) (success 1))
>      ...)
> and uses the return value from match (i.e. the regexp serial number) to
> select the action associated with the matching regexp.
I've recently written demo code like this for another CL-PPCRE user
who also wanted to build a lexer:
(in-package :cl-user)
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defmacro with-unique-names ((&rest bindings) &body body)
    ;; see <http://www.cliki.net/Common%20Lisp%20Utilities>
    `(let ,(mapcar #'(lambda (binding)
                       (check-type binding (or cons symbol))
                       (if (consp binding)
                           (destructuring-bind (var x) binding
                             (check-type var symbol)
                             `(,var (gensym ,(etypecase x
                                               (symbol (symbol-name x))
                                               (character (string x))
                                               (string x)))))
                           `(,binding (gensym ,(symbol-name binding)))))
                   bindings)
       ,@body)))
(defmacro deflexer (name &body body)
  (with-unique-names (regex-table regex token sexpr-regex anchored-regex
                      string start scanner next-pos)
    ;; Build a table of (scanner . token) pairs once, when the lexer is defined.
    `(let ((,regex-table
            (loop for (,regex . ,token) in (list ,@(loop for (regex token) in body
                                                         collect `(cons ,regex ,token)))
                  for ,sexpr-regex = (etypecase ,regex
                                       (function
                                        (error "Compiled scanners are not allowed here"))
                                       (string
                                        (cl-ppcre::parse-string ,regex))
                                       (list
                                        ,regex))
                  ;; Anchor each regex so it can only match at the start position.
                  for ,anchored-regex = (cl-ppcre:create-scanner `(:sequence
                                                                    :modeless-start-anchor
                                                                    ,,sexpr-regex))
                  collect (cons ,anchored-regex ,token))))
       ;; The lexer tries the scanners in order and returns the token of the
       ;; first one that matches, plus the position where the match ended.
       (defun ,name (,string &key ((:start ,start) 0))
         (loop for (,scanner . ,token) in ,regex-table
               for ,next-pos = (nth-value 1 (cl-ppcre:scan ,scanner ,string :start ,start))
               when ,next-pos do (return (values ,token ,next-pos)))))))
You should be able to use it like this:
* (deflexer mylexer
    ("'.*'" 'string)
    ("#.*$" 'comment)
    ("[ \\t\\r\\f]+" 'ws)   ; backslashes doubled so the Lisp reader keeps them
    (":=" 'assign)
    ("[\[]" 'lbrack)
    ("[\]]" 'rbrack)
    ("[\,]" 'comma)
    ("[\:]" 'colon)
    ("[\;]" 'semicolon)
    ("[+-]?[0-9]*[\.][0-9]+([eE][+-]?[0-9]+)?" 'float)
    ("[+-]?[0-9]+" 'integer)
    ("[a-zA-Z0-9_]+" 'id)
    ("." 'unknown))
; Converted MYLEXER.
* (mylexer "a:=123.4?")
* (mylexer "a:=123.4?" :start 1)
* (mylexer "a:=123.4?" :start 3)
* (mylexer "a:=123.4?" :start 8)
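The calls above step through the string by hand, feeding each returned
position back in as :start. A minimal sketch of a driver loop that does
this automatically might look as follows. TOKENIZE and TOY-LEXER are
hypothetical names, not part of the demo code or of CL-PPCRE; TOY-LEXER
is only a hand-written stand-in with the same calling convention as a
lexer defined by DEFLEXER, so that the example runs without CL-PPCRE.

```lisp
(in-package :cl-user)

;; Hypothetical stand-in for a DEFLEXER-generated lexer: takes a string and
;; a :START position and returns (values token next-position), or NIL.
(defun toy-lexer (string &key (start 0))
  (let ((end (position-if-not #'digit-char-p string :start start)))
    (cond ((>= start (length string)) nil)
          ((digit-char-p (char string start))
           (values 'integer (or end (length string))))
          (t (values 'unknown (1+ start))))))

;; Hypothetical driver: call LEXER repeatedly, using each returned position
;; as the next :START, and collect the tokens.
(defun tokenize (lexer string)
  (loop with pos = 0
        while (< pos (length string))
        collect (multiple-value-bind (token next-pos)
                    (funcall lexer string :start pos)
                  ;; Guard against rules that match nothing or the empty string.
                  (unless (and next-pos (> next-pos pos))
                    (error "Lexer made no progress at position ~D" pos))
                  (setf pos next-pos)
                  token)))

(tokenize #'toy-lexer "12+3")   ; => (INTEGER UNKNOWN INTEGER)
```

With the lexer defined above, (tokenize #'mylexer "a:=123.4?") should
walk through the same positions as the four calls shown.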
Lexers defined this way only return tokens, but it should be trivial to
change the macro so that the newly defined lexer invokes functions
instead. Wouldn't that already do what you want? I'm not sure what the
approach you sketched above would buy you compared to this one.
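As a rough sketch of the "invokes functions" variant, here is one
possible shape, written as a plain function instead of a macro to keep
it short. It assumes CL-PPCRE is already loaded. MAKE-ACTION-LEXER is a
hypothetical name, and the choice to pass the matched substring to the
action is an assumption of this sketch. It uses the (:regex ...)
parse-tree form to embed the regex string rather than calling the
parser directly.

```lisp
(in-package :cl-user)

;; Sketch only: assumes CL-PPCRE has been loaded beforehand.
;; RULES is a list of (regex-string . function) pairs.  The returned closure
;; tries the rules in order at position START and calls the function of the
;; first rule that matches, passing it the matched substring.
(defun make-action-lexer (rules)
  (let ((table (loop for (regex . action) in rules
                     collect (cons (cl-ppcre:create-scanner
                                    `(:sequence :modeless-start-anchor
                                                (:regex ,regex)))
                                   action))))
    (lambda (string &key (start 0))
      (loop for (scanner . action) in table
            for end = (nth-value 1 (cl-ppcre:scan scanner string :start start))
            when end
              do (return (values (funcall action (subseq string start end))
                                 end))))))

;; Usage: each action turns the matched text into a value.
(defparameter *action-lexer*
  (make-action-lexer (list (cons "[0-9]+" #'parse-integer)
                           (cons "[a-z]+" #'string-upcase))))

(funcall *action-lexer* "abc123")           ; => "ABC", 3
(funcall *action-lexer* "abc123" :start 3)  ; => 123, 6
```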
PS: Please, if possible, continue this conversation on the mailing
list. Thanks.
