[cl-ppcre-devel] Buffered multi-line question
Sébastien Saint-Sevin
seb-cl-mailist at matchix.com
Mon Oct 11 19:35:41 UTC 2004
> Hi Sébastien!
>
> On Mon, 11 Oct 2004 18:52:56 +0200, Sébastien Saint-Sevin
> <seb-cl-mailist at matchix.com> wrote:
>
> > I'm doing multi-line regex searches over big files that can't be
> > converted to a single string. So I introduced a kind of buffer that
> > I'm using to search.
> >
> > Now, I need to add a constraint to scan, do-scans & others (in
> > addition to (&key start end)) : I want to be able to specify to the
> > engine that a scan must start before a certain index in the string
> > (to avoid searching further results that will be cancelled later
> > because of my buffered multi-line matching process).
> >
> > Logically, this :must-start-before value corresponds to the first
> > line of my buffer. If no match starts in the first line, I need to move
> > the search one line forward, so everything that the engine would
> > match later on in the string is wasted time.
> >
> > How can I do it?
>
> Have you considered using something like
>
> (?s:(?=.{n}))<actual-regular-expression>
>
> where n obviously is an integer computed from your constraints above?
> I don't know how this'll behave performance-wise but you could just
> try it... :)
>
> Or have I misunderstood your question? Actually, I'm not sure why the
> END keyword parameter doesn't suffice. Can you give an example?
>
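A minimal sketch of the buffered scan described above, in plain Common Lisp, assuming the file's lines are already in a list. JOIN-LINES, SCAN-LINE-WINDOW and TOY-SCAN are hypothetical names. TOY-SCAN finds a literal string with SEARCH as a stand-in for a regex scanner; like SCAN today, it still looks past LIMIT before rejecting a match, which is exactly the wasted work the question is about.

```lisp
(defun join-lines (lines)
  "Concatenate LINES, terminating each one with a newline."
  (with-output-to-string (out)
    (dolist (line lines)
      (write-string line out)
      (write-char #\Newline out))))

(defun scan-line-window (scan-fn lines window-size)
  "Slide a buffer of WINDOW-SIZE lines over LINES and collect the matches
that start inside the buffer's first line.  SCAN-FN (hypothetical) is called
as (FUNCALL SCAN-FN BUFFER START LIMIT) and must return the start and end of
the leftmost match at or after START that begins before LIMIT, or NIL.
Returns a list of (LINE-INDEX MATCH-START MATCH-END); the offsets are
relative to the buffer whose first line is LINE-INDEX."
  (let ((results '()))
    (loop for line-index from 0
          for remaining on lines
          do (let ((buffer (join-lines (loop for line in remaining
                                             for i below window-size
                                             collect line)))
                   ;; a match must start in the first line (or on its newline)
                   (limit (1+ (length (first remaining))))
                   (pos 0))
               (loop
                 (multiple-value-bind (match-start match-end)
                     (funcall scan-fn buffer pos limit)
                   (unless match-start (return))
                   (push (list line-index match-start match-end) results)
                   ;; resume after this match; step past empty matches
                   (setf pos (if (= match-start match-end)
                                 (1+ match-end)
                                 match-end))))))
    (nreverse results)))

(defun toy-scan (needle)
  "Return a SCAN-FN that finds the literal string NEEDLE (no regex syntax)."
  (lambda (buffer start limit)
    (let ((match-start (search needle buffer
                               :start2 (min start (length buffer)))))
      (when (and match-start (< match-start limit))
        (values match-start (+ match-start (length needle)))))))
```

Each match is reported once, from the buffer whose first line contains its start; a match spanning more than WINDOW-SIZE lines cannot be found in this sketch.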
As far as I understand it, (?s:(?=.{n})) only guarantees that at least n
chars remain from match-start in the consumed string. This is not
what I want. I want something that guarantees that match-start will be before
index n (meaning the n'th char in the consumed string), whether match-end is
before or after this index n.
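A small pure-Lisp check of the arithmetic behind Edi's suggestion, assuming the scan runs up to END and that n is derived from END rather than being the index itself: (?s:(?=.{k})) succeeds at a start position p exactly when p + k <= END, so k = END - LIMIT + 1 accepts precisely the positions p < LIMIT. The helper names are hypothetical, and this says nothing about speed: the lookahead still walks k characters at each candidate position, and the scanner may still try (and reject) positions at or after LIMIT.

```lisp
(defun lookahead-succeeds-p (p k end)
  "True when at least K characters lie between position P and END,
which is the condition (?s:(?=.{K})) tests at P."
  (<= (+ p k) end))

(defun lookahead-length-for-limit (limit end)
  "The K for which the lookahead accepts exactly the positions P < LIMIT.
Assumes 0 <= LIMIT <= END, so that K is at least 1."
  (1+ (- end limit)))

;; Example: END = 10 and LIMIT = 4 give K = 7;
;; positions 0..3 pass the lookahead, positions 4..9 fail it.
```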
> > PS: Edi, if you are back, my previous post is still an open question
> > ;-) (the one with FILTER...)
>
> Yes, I'm back but unfortunately I'm very busy with commercial stuff
> right now. Sorry, filters will have to wait some more.
>
> Cheers,
> Edi.
Regarding FILTER, here is what I've got right now (it's OK for my needs, actually).
(defclass filter (regex)
  ((num :initarg :num
        :accessor num
        :type fixnum
        :documentation "The number of the register this filter refers to.")
   (predicate :initarg :predicate
              :accessor predicate
              :documentation "The predicate to validate the register with."))
  (:documentation "FILTER objects represent the combination of a register
and a predicate.
This is not available in regex strings, only in parse trees."))
(defmethod create-matcher-aux ((filter filter) next-fn)
  (declare (type function next-fn))
  ;; the position of the corresponding REGISTER within the whole
  ;; regex; we start to count at 0
  (let ((num (num filter)))
    (lambda (start-pos)
      (declare (type fixnum start-pos))
      (let ((reg-start (svref *reg-starts* num))
            (reg-end (svref *reg-ends* num)))
        ;; only bother to check if the corresponding REGISTER has
        ;; matched successfully already
        (and reg-start
             (funcall (predicate filter) (subseq *string* reg-start reg-end))
             (funcall next-fn start-pos))))))
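To see the control flow of this matcher in isolation, here is a stand-alone model in plain Common Lisp. The three special variables are stand-ins for the cl-ppcre internals *STRING*, *REG-STARTS* and *REG-ENDS*, and MAKE-FILTER-MATCHER and DIGITS-P are hypothetical names. It shows that the filter consumes no characters: it either fails or hands the same START-POS on to the next matcher.

```lisp
(defvar *model-string* ""
  "Stand-in for the string being matched.")
(defvar *model-reg-starts* (vector)
  "Stand-in for register start positions (NIL means not matched yet).")
(defvar *model-reg-ends* (vector)
  "Stand-in for register end positions.")

(defun make-filter-matcher (num predicate next-fn)
  "Return a closure that succeeds only if register NUM (0-based) has matched
and PREDICATE accepts its text; on success it calls NEXT-FN with START-POS
unchanged, because the filter itself consumes no characters."
  (lambda (start-pos)
    (let ((reg-start (svref *model-reg-starts* num))
          (reg-end (svref *model-reg-ends* num)))
      (and reg-start
           (funcall predicate (subseq *model-string* reg-start reg-end))
           (funcall next-fn start-pos)))))

(defun digits-p (text)
  "Example predicate: TEXT is non-empty and consists of decimal digits."
  (and (plusp (length text)) (every #'digit-char-p text)))
```

With register 0 covering "123" in "abc 123", the matcher built from DIGITS-P and IDENTITY returns the position it was called with; covering "abc", or with the register unset, it returns NIL.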
ADDED TO (defun convert-aux (parse-tree) ...
;; (:FILTER <number> <predicate>)
((:filter)
 (let ((backref-number (second parse-tree))
       (predicate (third parse-tree)))
   (declare (type fixnum backref-number))
   (when (or (not (typep backref-number 'fixnum))
             (<= backref-number 0))
     (signal-ppcre-syntax-error
      "Illegal back-reference: ~S"
      parse-tree))
   (unless (or (typep predicate 'symbol) (typep predicate 'function))
     (signal-ppcre-syntax-error
      "Illegal predicate: ~S"
      parse-tree))
   ;; stop accumulating into STARTS-WITH and increase
   ;; MAX-BACK-REF if necessary
   (setq accumulate-start-p nil
         max-back-ref (max (the fixnum max-back-ref)
                           backref-number))
   (make-instance 'filter
                  ;; we start counting from 0 internally
                  :num (1- backref-number)
                  :predicate predicate)))
ADDED FOR MY PURPOSES...
(defmethod create-scanner-with-predicate ((regex-string string) predicate
                                          &key case-insensitive-mode
                                               multi-line-mode
                                               single-line-mode
                                               extended-mode
                                               destructive)
  (declare (optimize speed (safety 0) (space 0) (debug 0)
                     (compilation-speed 0)
                     #+:lispworks (hcl:fixnum-safety 0)))
  (declare (ignore destructive))
  ;; parse the string into a parse-tree and then call CREATE-SCANNER again
  (let* ((*extended-mode-p* extended-mode)
         (quoted-regex-string (if *allow-quoting*
                                (quote-sections (clean-comments regex-string extended-mode))
                                regex-string))
         (*syntax-error-string* (copy-seq quoted-regex-string))
         (parse-tree (parse-string quoted-regex-string)))
    ;; wrap the result with FILTER to check for predicate
    (create-scanner
     `(:sequence (:register ,(shift-back-reference parse-tree))
                 (:filter 1 ,predicate))
     :case-insensitive-mode case-insensitive-mode
     :multi-line-mode multi-line-mode
     :single-line-mode single-line-mode
     :destructive t)))
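(defun shift-back-reference (tree)
  ;; the whole regex gets wrapped in a new register 1, so every
  ;; back-reference number in TREE has to be increased by one
  (if (and (consp tree) (eq (first tree) :back-reference))
      `(:back-reference ,(1+ (second tree)))
      (if (atom tree)
          tree
          (cons (shift-back-reference (car tree))
                (shift-back-reference (cdr tree))))))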
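Why the renumbering is needed: wrapping the parsed tree in a new outer register makes that register number 1, so every original register n becomes n + 1 and its back-references have to follow. The sketch below rebuilds the wrapped tree in plain Common Lisp. SHIFT-BACK-REFS and WRAP-FOR-FILTER are hypothetical names, and SHIFT-BACK-REFS assumes the parse tree is built from proper lists, mapping over the elements instead of recursing on CAR and CDR.

```lisp
(defun shift-back-refs (tree)
  "Return a copy of TREE in which every (:BACK-REFERENCE n) becomes
(:BACK-REFERENCE n+1).  Assumes TREE is built from proper lists."
  (cond ((atom tree) tree)
        ((eq (first tree) :back-reference)
         (list :back-reference (1+ (second tree))))
        (t (mapcar #'shift-back-refs tree))))

(defun wrap-for-filter (parse-tree predicate)
  "Build a tree of the shape CREATE-SCANNER-WITH-PREDICATE passes to
CREATE-SCANNER: the whole (renumbered) regex in register 1, then a
filter on register 1."
  (list :sequence
        (list :register (shift-back-refs parse-tree))
        (list :filter 1 predicate)))
```

For example, (wrap-for-filter '(:sequence (:register #\a) (:back-reference 1)) 'evenp) returns (:SEQUENCE (:REGISTER (:SEQUENCE (:REGISTER #\a) (:BACK-REFERENCE 2))) (:FILTER 1 EVENP)): the inner register is now register 2, and the back-reference points at it again.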