[cl-ppcre-devel] Need help with a slow regexp

pete-cl-ppcre at kazmier.com pete-cl-ppcre at kazmier.com
Thu Jan 20 20:16:52 UTC 2005


Hello,

Not sure if this is the appropriate forum as the email is not related to
the development of cl-ppcre, but I did not find a list for users.
Please feel free to redirect me elsewhere.

I could use some help in figuring out why this regexp is so slow.  As
far as I can tell, there is nothing abnormal about it.  I currently use
the same regexp in Python and it blazes through the input file.  Bear
in mind, this is the first time that I've used cl-ppcre.  It was an
experiment to see if I could use Lisp for this little application.

Here is the regexp (at least a small portion of it that exhibits the
behavior I am seeing):

    ^(?:\\S+ ){7}(\\S+)\\s+- commAlarm

Here is the input line it is matching against (note: this is a single
line albeit a long one):

1105243660 11 Sun Jan 09 04:07:40 2005 sclax02.ibasis.net       - commAlarm ovnyc00p.ov.ivanet.net [1] private.enterprises.2496.1.1.5.5.1.0 (Integer): 0  [2] private.enterprises.2496.1.1.5.5.2.0 (Integer): 115  [3] private.enterprises.2496.1.1.5.5.3.0 (OctetString): ISUP: UNEX ANM  [4] private.enterprises.2496.1.1.5.5.4.0 (OctetString): ISDN User Part Unexpected ANM  [5] private.enterprises.2496.1.1.5.5.5.0 (Integer): 2  [6] private.enterprises.2496.1.1.5.5.6.0 (Integer): 1  [7] private.enterprises.2496.1.1.5.5.7.0 (Integer): 1  [8] private.enterprises.2496.1.1.5.5.8.0 (Integer): 2  [9] private.enterprises.2496.1.1.1.1.1.1.1.1.1.1376258 (Integer): 1376258  [10] private.enterprises.2496.1.1.1.1.1.1.1.1.2.1376258 (Integer): 21  [11] private.enterprises.2496.1.1.1.1.1.1.1.1.4.1376258 (OctetString): ss7path-att  [12] private.enterprises.2496.1.1.1.1.1.1.1.1.5.1376258 (OctetString): SS7 Path For ATT and NGT DPC 5.21.39  [13] private.enterprises.2496.1.1.1.1.1.1.1.1.3.1376258 (Integer): 1245188  [14] private.enterprises.2496.1.1.5.5.9.0 (Integer): 1105243880;1 .1.3.6.1.4.1.2496.1.1.4.1 0

Stuff 51 copies of the line above into a file, try to match each one
against that regexp, and I get the following results:

PGW> (time (parse-file "/tmp/sample"))
Evaluation took:
  2.984 seconds of real time
  1.81 seconds of user run time
  1.12 seconds of system run time
  0 page faults and
  228,191,424 bytes consed.
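
For anyone reproducing this, the 51-line test file described above can
be generated with a few lines of Lisp.  This is only a sketch:
WRITE-SAMPLE-FILE is a hypothetical helper that was not part of the
original test, and the sample line has to be supplied by the caller as
a string.

```lisp
;; Hypothetical helper (an assumption, not from the original test):
;; writes COPIES copies of LINE to PATH, one per line, replacing any
;; existing file.  Returns the number of lines written.
(defun write-sample-file (path line &optional (copies 51))
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create)
    (dotimes (i copies copies)
      (write-line line out))))
```

Calling (write-sample-file "/tmp/sample" line), with LINE bound to the
input line shown earlier, should produce a file of the same shape as
the one timed here.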

I am hoping to parse a file that has close to 75,000 lines in that
format.  At this rate, I will never make it in a reasonable amount of
time.  Here is the PARSE-FILE function I am using:

(defun parse-file (file)
  (with-open-file (in file)
    (do ((line (read-line in nil :eof) (read-line in nil :eof)))
        ((eql line :eof) t)
      (do-register-groups ((#'intern host))
          ("^(?:\\S+ ){7}(\\S+)\\s+- commAlarm" line)
        (format t "Found host ~A~%" host)))))

I've also read the docs and I am unable to find anything to help
identify the problem with my regexp.  I have tried single-line mode,
however the results were very similar.  My platform is SBCL 0.8.18.23
and version 1.0 of cl-ppcre.

Any help would be appreciated.

Thanks,
Pete



