[cl-ppcre-devel] Re: cl-ppcre

Edi Weitz edi at agharta.de
Sat Jun 12 14:39:50 UTC 2004


Hi Daniel!

On Sat, 12 Jun 2004 15:54:11 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:

>   today I explored the regular expression implementations available
> in various Debian Common Lisp packages. I really liked your
> library - thank you for writing cl-ppcre.

You're welcome.

>   I also looked into the elegant cl-lexer package built on top of
> the cl-regex library.  What I missed in cl-ppcre is a parse-tree node
> similar to cl-regex's 'success node, which defines the return value
> of the match/scan functions. With a 'success node one can build a
> `deflexer' macro on top of cl-ppcre as easily as on top of cl-regex.
>
>   Is it possible to extend cl-ppcre with a similar feature?

I might look into this for a future version, but see below.

> Footnote: In cl-lexer, the deflexer macro
>
>   (deflexer foo
>     ("regexp" some action)                      ; 0
>     ("another regexp" another action)           ; 1
>     ...)
>
> numbers each regexp/action pair, then combines the regexp parse
> trees into one big parse tree
>
>   `(alt
>      (seq (regexp tree)         (success 0))
>      (seq (another regexp tree) (success 1))
>      ...)
>
> and uses the return value from the match (i.e. the regexp's serial
> number) to select the action associated with the matching regexp.

I've recently written demo code like this for another CL-PPCRE user
who also wanted to build a lexer:

  (in-package :cl-user)

  (eval-when (:compile-toplevel :load-toplevel :execute)
    (defmacro with-unique-names ((&rest bindings) &body body)
      ;; see <http://www.cliki.net/Common%20Lisp%20Utilities>
      `(let ,(mapcar #'(lambda (binding)
                         (check-type binding (or cons symbol))
                         (if (consp binding)
                           (destructuring-bind (var x) binding
                             (check-type var symbol)
                             `(,var (gensym ,(etypecase x
                                               (symbol (symbol-name x))
                                               (character (string x))
                                               (string x)))))
                           `(,binding (gensym ,(symbol-name binding)))))
                     bindings)
         ,@body)))

  (defmacro deflexer (name &body body)
      (with-unique-names (regex-table regex token sexpr-regex anchored-regex string start scanner next-pos)
        `(let ((,regex-table
                (loop for (,regex . ,token) in (list ,@(loop for (regex token) in body
                                                             collect `(cons ,regex ,token)))
                      for ,sexpr-regex = (etypecase ,regex
                                           (function
                                             (error "Compiled scanners are not allowed here"))
                                           (string
                                            (cl-ppcre::parse-string ,regex))
                                           (list
                                            ,regex))
                      for ,anchored-regex = (cl-ppcre:create-scanner `(:sequence
                                                                        :modeless-start-anchor
                                                                        ,,sexpr-regex))
                      collect (cons ,anchored-regex ,token))))
          (defun ,name (,string &key ((:start ,start) 0))
            (loop for (,scanner . ,token) in ,regex-table
                  for ,next-pos = (nth-value 1 (cl-ppcre:scan ,scanner ,string :start ,start))
                  when ,next-pos do (return (values ,token ,next-pos)))))))

You should be able to use it like this:

  *   (deflexer mylexer
        ("'.*'"          'string)
        ("#.*$"          'comment)
        ("[ \\t\\r\\f]+" 'ws)    ; doubled so CL-PPCRE sees \t, \r, \f
        (":="            'assign)
        ("[\[]"          'lbrack)
        ("[\]]"          'rbrack)
        ("[\,]"          'comma)
        ("[\:]"          'colon)
        ("[\;]"          'semicolon)
        ("[+-]?[0-9]*[\.][0-9]+([eE][+-]?[0-9]+)?" 'float)
        ("[+-]?[0-9]+"   'integer)
        ("[a-zA-Z0-9_]+" 'id)
        ("."             'unknown))
  ; Converted MYLEXER.

  MYLEXER
  * (mylexer "a:=123.4?")

  ID
  1
  * (mylexer "a:=123.4?" :start 1)

  ASSIGN
  3
  * (mylexer "a:=123.4?" :start 3)

  FLOAT
  8
  * (mylexer "a:=123.4?" :start 8)

  UNKNOWN
  9
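
Feeding each returned position back in as :start walks the whole
string. A minimal sketch, where TOKENIZE is a name made up for this
example and the lexer is assumed to return a token and the next
position like MYLEXER above:

```lisp
(defun tokenize (lexer string)
  "Call LEXER repeatedly on STRING and collect (token . text) pairs.
LEXER is assumed to take STRING and a :START keyword argument and to
return a token plus the position after the match, or NIL on failure."
  (loop with start = 0
        while (< start (length string))
        collect (multiple-value-bind (token next-pos)
                    (funcall lexer string :start start)
                  ;; Stop instead of looping forever if no rule matched
                  ;; or a rule matched the empty string.
                  (unless (and next-pos (> next-pos start))
                    (error "No progress at position ~D" start))
                  (prog1 (cons token (subseq string start next-pos))
                    (setf start next-pos)))))
```

With the lexer above, (tokenize #'mylexer "a:=123.4?") should return
((ID . "a") (ASSIGN . ":=") (FLOAT . "123.4") (UNKNOWN . "?")). None
of the rules matches a newline, so the sketch signals an error there.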

This one only returns tokens, but it should be trivial to change the
macro so that the newly defined lexer invokes functions instead.
Wouldn't that already do what you want? I'm not sure what the
approach you sketched above would buy you compared to this one.
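
Since the token forms are evaluated when the table is built, one way
to get function calls without touching the macro is to use function
objects as tokens and call whatever the lexer returns. A minimal
sketch; LEX-AND-CALL is a made-up name, and the convention that each
function receives the matched text is an assumption of this example:

```lisp
(defun lex-and-call (lexer string &key (start 0))
  "Run LEXER at START and call the function it returns on the matched text.
Returns the function's primary value and the next position, or NIL if
no rule matched."
  (multiple-value-bind (action next-pos)
      (funcall lexer string :start start)
    (when next-pos
      (values (funcall action (subseq string start next-pos))
              next-pos))))
```

With a lexer defined as

  (deflexer action-lexer
    ("[+-]?[0-9]+"   #'parse-integer)
    ("[a-zA-Z0-9_]+" #'string-upcase))

(lex-and-call #'action-lexer "42abc") should return 42 and 2.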

Cheers,
Edi.

PS: Please, if possible, continue this conversation on the mailing
    list. Thanks.



