[cl-ppcre-devel] Quickly finding which alternative matched
Harold Lee
harold at hotelling.net
Tue Feb 13 00:18:43 UTC 2007
For an :ALTERNATION, is there an O(1) way to tell which alternate
matched? e.g. for a regexp (a)|(b)|(c)|(d)|... I don't want the O(n)
performance of scanning the list returned by SCAN or SCAN-TO-STRINGS.
Instead, I'd just like a number saying which one matched. I understand
that there can normally be nested groups, alternatives that are not
groups, and other complications - so maybe a list/vector of groups that
were matched would work as a general interface.
More background:
For a quick and dirty scanner (similar to lex) I thought I could use
CL-PPCRE. The "rules" (in lex terminology) are just regular expressions,
and so I combine them into a set of alternatives, e.g.
\s+ -> whitespace
[0-9]+ -> number
[a-zA-Z][a-zA-Z0-9]* -> identifier
=> "(\\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*)"
Actually, I use CL-PPCRE::PARSE-STRING to get a parse tree for each
expression, change any :REGISTER tags to :GROUP tags, and combine them
under an :ALTERNATION tag:
(defun combine-regexps (&rest regexps)
"Combine several regexp strings into a CL-PPCRE parse tree of
alternatives."
;; All registers are changed into groups, i.e. (x) -> (?:x) in regexp
syntax.
;; That keeps the registers from messing up the scanner's expectation
that
;; each register is one of the rules, and allows the list of rules to use
;; () instead of (?:) throughout for readability.
(let ((registers (mapcar (lambda (r)
`(:register ,(subst :group :register
(cl-ppcre::parse-string
r))))
regexps)))
`(:sequence :start-anchor (:group (:alternation , at registers)))))
More information about the Cl-ppcre-devel
mailing list