[cl-ppcre-devel] Quickly finding which alternative matched

Harold Lee harold at hotelling.net
Tue Feb 13 00:18:43 UTC 2007


For an :ALTERNATION, is there an O(1) way to tell which alternate 
matched? e.g. for a regexp (a)|(b)|(c)|(d)|... I don't want the O(n) 
performance of scanning the list returned by SCAN or SCAN-TO-STRINGS. 
Instead, I'd just like a number saying which one matched. I understand 
that there can normally be nested groups, alternatives that are not 
groups, and other complications - so maybe a list/vector of groups that 
were matched would work as a general interface.

More background:

For a quick and dirty scanner (similar to lex) I thought I could use 
CL-PPCRE. The "rules" (in lex terminology) are just regular expressions, 
and so I combine them into a set of alternatives, e.g.

\s+ -> whitespace
[0-9]+ -> number
[a-zA-Z][a-zA-Z0-9]* -> identifier

=> "(\\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*)"

Actually, I use CL-PPCRE::PARSE-STRING to get a parse tree for each 
expression, change any :REGISTER tags to :GROUP tags, and combine them 
under an :ALTERNATION tag:

(defun combine-regexps (&rest regexps)
  "Combine several regexp strings into a CL-PPCRE parse tree of 
alternatives."
  ;; All registers are changed into groups, i.e. (x) -> (?:x) in regexp 
syntax.
  ;; That keeps the registers from messing up the scanner's expectation 
that
  ;; each register is one of the rules, and allows the list of rules to use
  ;; () instead of (?:) throughout for readability.
  (let ((registers (mapcar (lambda (r)
                             `(:register ,(subst :group :register
                                                 (cl-ppcre::parse-string 
r))))
                           regexps)))
  `(:sequence :start-anchor (:group (:alternation , at registers)))))





More information about the Cl-ppcre-devel mailing list