[cl-ppcre-devel] defpatt

Klaus Harbo klaus at harbo.net
Wed Jun 2 11:02:44 UTC 2004


Working with cl-ppcre, I have found that I increasingly use the s-expr 
representation rather than the traditional string representation with 
its infix operators.  To make it easier to work with the s-expressions, 
I've developed 'defpatt' - a package which implements a notation for 
defining and referring to regular expressions in terms of cl-ppcre 
s-expressions.  I thought it might interest the readers of this list.

The package can be downloaded from 
http://www.harbo.net/downloads/defpatt-0.2.tar.gz .

Suggestions, comments, and improvements are welcome.

best regards,

-Klaus.

------
defpatt examples (from defpatt.lisp):
------

#| EXAMPLES

  ; If you want to try the examples, be sure to evaluate the
  ; expression below first - otherwise the other ones won't work.

 > (defpatt:defpatt-set-default-macro-char)

  ; Defines #\¤ as a macro character

=> T

 > (cl-ppcre:all-matches-as-strings ¤(alt "a" "c" "f") "abcdefghi")

  ; Note: Equivalent to "a|c|f"

=> ("a" "c" "f")

  ; That's all very well, but doesn't buy us very much.
  ; However, `defpatt' (following cl-ppcre's sexpr-based
  ; representation of REs) lets us document patterns much
  ; better by inserting comments into REs...

 > (cl-ppcre:scan-to-strings
         ¤(seq digit+  ; used space
           ws+    
           digit+  ; available space
           ws+
           digit+  ; remaining space
           ) "123   4567   7887")

  ; Note: `ws+' and `digit+' are defined above, in `defpatt-initialize'.

=> "123   4567   7887", #()

  ; ...as well as capture data in a structured fashion...

 > (cl-ppcre:register-groups-bind (used avail remain)
         (¤(seq (reg digit+)  ; used space
            ws+          
            (reg digit+)  ; available space
            ws+
            (reg digit+)  ; remaining space
            ) "123   4567   7887")
         (mapcar #'parse-integer (list used avail remain)))

  ; Note: `(reg ...)' creates a register binding

=> (123 4567 7887)

  ; ...but also lets us _FIRST_ define and document the abstraction...

 > (defpatt match-nums ()
     ¤(seq (reg digit+)  ; used space
       ws+
       (reg digit+)  ; available space
       ws+
       (reg digit+)  ; remaining space
       ))
=> MATCH-NUMS

  ; ...and _THEN_ use it...

 > (cl-ppcre:register-groups-bind (used avail remain)
         (¤match-nums "123   4567   7887")
       (mapcar #'parse-integer (list used avail remain)))

=> (123 4567 7887)

  ; ...which is much easier to understand, as I am sure you will
  ; agree.

 > (cl-ppcre:scan-to-strings ¤(upto "efg") "abcdefghi")

=> "abcd", #()

 > (cl-ppcre:scan-to-strings ¤(upto+ "efg") "abcdefghi")

=> "abcdefg", #()

  ; To see the raw cl-ppcre expansion of a `defpatt' expression,
  ; simply enter it:

 > ¤(seq (reg digit+)    ; used space
    ws+
    (reg digit+)    ; available space
    ws+
    (reg digit+)    ; remaining space
    )
=> (:SEQUENCE
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))))

  ; To see _HOW_ `defpatt' expands an expression use `macroexpand':

 > (macroexpand-1 '¤(seq (reg digit+)    ; used space
            ws+
            (reg digit+)    ; available space
            ws+
            (reg digit+)    ; remaining space
            ))
=> (LABELS ((++ (PATT) (REP PATT 1 NIL))
            (UPTO (PATT)
              `(:SEQUENCE
                (:FLAGS :SINGLE-LINE-MODE-P)
                (:GREEDY-REPETITION
                 0
                 NIL
                 (:SEQUENCE :EVERYTHING (:NEGATIVE-LOOKAHEAD ,PATT)))
                :EVERYTHING))
            (?? (PATT) (REP PATT 0 1))
            (UPTO+ (PATT) `(:SEQUENCE ,(UPTO PATT) ,PATT))
            (ALT (&REST ARGS) `(:ALTERNATION ,@ARGS))
            (** (PATT) (REP PATT 0 NIL))
            (SEQ (&REST ARGS) `(:SEQUENCE ,@ARGS))
            (REG (&REST ARGS) `(:REGISTER ,@ARGS))
            (REP (PATT &OPTIONAL (MIN 0) (MAX NIL))
              `(:GREEDY-REPETITION ,MIN ,MAX ,PATT)))
     (SYMBOL-MACROLET ((WS+
                        '(:GREEDY-REPETITION
                          1
                          NIL
                          :WHITESPACE-CHAR-CLASS))
                       (WS*
                        '(:GREEDY-REPETITION
                          0
                          NIL
                          :WHITESPACE-CHAR-CLASS))
                       (DIGIT '(:CHAR-CLASS (:RANGE #\0 #\9)))
                       (DIGIT+ (++ DIGIT))
                       (MATCH-NUMS
                        (DEFPATT-PATTERN (SEQ
                                          (REG DIGIT+)
                                          WS+
                                          (REG DIGIT+)
                                          WS+
                                          (REG DIGIT+))))
                       (DIGIT* (** DIGIT)))
       (SEQ
        (REG DIGIT+)
        WS+
        (REG DIGIT+)
        WS+
        (REG DIGIT+))))

  ; `upto' and `upto+' are good examples of how having an abstraction
  ; mechanism helps keep REs maintainable and understandable.  See
  ; their definitions above.

|#
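
For readers who want to try the `upto' and `upto+' patterns without
installing defpatt, here is a minimal sketch that builds the same parse
trees with ordinary functions.  The tree shapes are copied from the
macroexpansion shown in the examples above; the names `upto-tree' and
`upto+-tree' are hypothetical and are not part of defpatt or cl-ppcre.

```lisp
;; Hypothetical helpers (not defpatt's own code): each returns a
;; cl-ppcre parse tree as a plain list, mirroring the local UPTO and
;; UPTO+ functions visible in the macroexpansion above.

(defun upto-tree (patt)
  "Parse tree matching everything up to, but not including, PATT."
  `(:sequence
    ;; Single-line mode lets :EVERYTHING match newlines as well.
    (:flags :single-line-mode-p)
    ;; Consume characters as long as the consumed one is not
    ;; immediately followed by PATT...
    (:greedy-repetition 0 nil
     (:sequence :everything (:negative-lookahead ,patt)))
    ;; ...then take the one character that sits right before PATT.
    :everything))

(defun upto+-tree (patt)
  "Parse tree matching everything up to and including PATT."
  `(:sequence ,(upto-tree patt) ,patt))

;; With cl-ppcre loaded, the trees can be passed wherever a regex
;; string is accepted, for example:
;;   (cl-ppcre:scan-to-strings (upto-tree "efg") "abcdefghi")
;;   (cl-ppcre:scan-to-strings (upto+-tree "efg") "abcdefghi")
```

The block itself only builds lists, so it evaluates without cl-ppcre;
the commented calls need cl-ppcre loaded.  Since the trees are the same
as the expansions above, those calls should match "abcd" and "abcdefg"
respectively, as in the `upto' and `upto+' examples.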



