[cl-ppcre-devel] defpatt
Klaus Harbo
klaus at harbo.net
Wed Jun 2 11:02:44 UTC 2004
Working with cl-ppcre, I have found that I increasingly use the s-expr
representation rather than the traditional string representation with
its infix operators. To make it easier to work with the s-expressions,
I've developed 'defpatt' - a package which implements a notation for
defining and referring to regular expressions in terms of cl-ppcre
s-expressions. I thought it might interest the readers of this list.
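For readers who have not used the s-expression form, here is a minimal sketch of the underlying idea in plain Common Lisp, without defpatt. The variable names (`*digits*`, `*ws*`, `*three-numbers*`) are made up for illustration only; the keywords are cl-ppcre parse-tree symbols.

```lisp
;; Named sub-patterns as plain data (names are illustrative only).
(defparameter *digits* '(:greedy-repetition 1 nil :digit-class))        ; like \d+
(defparameter *ws* '(:greedy-repetition 1 nil :whitespace-char-class))  ; like \s+

;; Backquote splices the named pieces into a larger parse tree.
(defparameter *three-numbers*
  `(:sequence (:register ,*digits*) ,*ws*
              (:register ,*digits*) ,*ws*
              (:register ,*digits*)))

;; With cl-ppcre loaded, the tree can be passed wherever a regex
;; string is accepted, e.g.:
;;   (cl-ppcre:scan-to-strings *three-numbers* "123 4567 7887")
```

Building trees by hand like this works but gets verbose quickly, which is the gap defpatt is meant to fill.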
The package can be downloaded from
http://www.harbo.net/downloads/defpatt-0.2.tar.gz .
Suggestions, comments, improvements are welcome.
best regards,
-Klaus.
------
defpatt examples (from defpatt.lisp):
------
#| EXAMPLES
; If you want to try the examples, be sure to evaluate the
; expression below first - otherwise the other ones won't work.
> (defpatt:defpatt-set-default-macro-char)
; Defines #\¤ as macro character
=> T
> (cl-ppcre:all-matches-as-strings ¤(alt "a" "c" "f") "abcdefghi")
; Note: Equivalent to "a|c|f"
=> ("a" "c" "f")
; That's all very well, but doesn't buy us very much.
; However, `defpatt' (like cl-ppcre's sexpr-based
; representation of REs) lets us document
; the patterns much better by inserting comments
; into REs...
> (cl-ppcre:scan-to-strings
¤(seq digit+ ; used space
ws+
digit+ ; available space
ws+
digit+ ; remaining space
) "123 4567 7887")
; Note: `ws+' and `digit+' are defined above, in `defpatt-initialize'.
=> "123 4567 7887", #()
; ...as well as lets us capture data in a structured fashion...
> (cl-ppcre:register-groups-bind (used avail remain)
(¤(seq (reg digit+) ; used space
ws+
(reg digit+) ; available space
ws+
(reg digit+) ; remaining space
) "123 4567 7887")
(mapcar #'parse-integer (list used avail remain)))
; Note: `(reg ...)' creates a register binding
=> (123 4567 7887)
; ...but also lets us _FIRST_ define and document the abstraction...
> (defpatt match-nums ()
¤(seq (reg digit+) ; used space
ws+
(reg digit+) ; available space
ws+
(reg digit+) ; remaining space
))
=> MATCH-NUMS
; ...and _THEN_ use it...
> (cl-ppcre:register-groups-bind (used avail remain)
(¤match-nums "123 4567 7887")
(mapcar #'parse-integer (list used avail remain)))
=> (123 4567 7887)
; which is a lot more easily understood, as I am sure you will
; agree.
> (cl-ppcre:scan-to-strings ¤(upto "efg") "abcdefghi")
=> "abcd", #()
> (cl-ppcre:scan-to-strings ¤(upto+ "efg") "abcdefghi")
=> "abcdefg", #()
; To see the raw cl-ppcre expansion of a `defpatt' expression,
; simply enter it:
> ¤(seq (reg digit+) ; used space
ws+
(reg digit+) ; available space
ws+
(reg digit+) ; remaining space
)
=> (:SEQUENCE
(:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
(:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
(:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
(:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
(:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))))
; To see _HOW_ `defpatt' expands an expression use `macroexpand':
> (macroexpand-1 '¤(seq (reg digit+) ; used space
ws+
(reg digit+) ; available space
ws+
(reg digit+) ; remaining space
))
=> (LABELS ((++ (PATT) (REP PATT 1 NIL))
(UPTO (PATT)
`(:SEQUENCE
(:FLAGS :SINGLE-LINE-MODE-P)
(:GREEDY-REPETITION
0
NIL
(:SEQUENCE :EVERYTHING (:NEGATIVE-LOOKAHEAD ,PATT)))
:EVERYTHING))
(?? (PATT) (REP PATT 0 1))
(UPTO+ (PATT) `(:SEQUENCE ,(UPTO PATT) ,PATT))
(ALT (&REST ARGS) `(:ALTERNATION ,@ARGS))
(** (PATT) (REP PATT 0 NIL))
(SEQ (&REST ARGS) `(:SEQUENCE ,@ARGS))
(REG (&REST ARGS) `(:REGISTER ,@ARGS))
(REP (PATT &OPTIONAL (MIN 0) (MAX NIL))
`(:GREEDY-REPETITION ,MIN ,MAX ,PATT)))
(SYMBOL-MACROLET ((WS+
'(:GREEDY-REPETITION
1
NIL
:WHITESPACE-CHAR-CLASS))
(WS*
'(:GREEDY-REPETITION
0
NIL
:WHITESPACE-CHAR-CLASS))
(DIGIT '(:CHAR-CLASS (:RANGE #\0 #\9)))
(DIGIT+ (++ DIGIT))
(MATCH-NUMS
(DEFPATT-PATTERN (SEQ
(REG DIGIT+)
WS+
(REG DIGIT+)
WS+
(REG DIGIT+))))
(DIGIT* (** DIGIT)))
(SEQ
(REG DIGIT+)
WS+
(REG DIGIT+)
WS+
(REG DIGIT+))))
; `upto' and `upto+' are good examples of how having an abstraction
; mechanism helps keep REs maintainable and understandable. See
; their definitions above.
|#
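As a rough sketch, the two helpers can also be tried without defpatt by lifting them out of the expansion above into ordinary functions. Defining them globally with `defun` is an assumption made here for illustration; in the expansion they are local `labels` functions. The bodies are copied from that expansion.

```lisp
;; UPTO: repeatedly consume one character as long as PATT does not start
;; right after it, then consume one more character (the one directly
;; before PATT). Single-line mode lets :everything match newlines too.
(defun upto (patt)
  `(:sequence
    (:flags :single-line-mode-p)
    (:greedy-repetition 0 nil
                        (:sequence :everything (:negative-lookahead ,patt)))
    :everything))

;; UPTO+: the same prefix, followed by PATT itself.
(defun upto+ (patt)
  `(:sequence ,(upto patt) ,patt))

;; Print the parse tree that would be handed to cl-ppcre.
(print (upto+ "efg"))

;; With cl-ppcre loaded, the trees can be used directly, e.g.:
;;   (cl-ppcre:scan-to-strings (upto "efg") "abcdefghi")
;;   (cl-ppcre:scan-to-strings (upto+ "efg") "abcdefghi")
```

Because the helpers return plain lists, they compose with any other parse-tree fragment in the same way as the `seq` and `reg` forms shown in the examples.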