[cxml-devel] CXML::P/CONTENT causes heap exhaustion due to deep recursion where there are too many entity references

Ivan Shvedunov ivan4th at gmail.com
Sat Sep 22 17:36:27 UTC 2007


  Hi.

  I've noticed that parsing XML that contains many entity references
may cause CXML to eat up all available heap space and crash. This can
be easily reproduced:
(cxml:parse-rod
 (format nil "<xml>~{~a~}</xml>"
         (loop repeat 4000 collect "&lt;p&gt;zz"))
 (cxml-dom:make-dom-builder))
(tested on SBCL / Linux)

  Looking at the code, I've noticed that xml/xml-parse.lisp relies
heavily on tail recursion (although, frankly, expecting proper tail
call elimination everywhere is somewhat dubious in Common Lisp, since
the standard does not require it). What's worse, the P/CONTENT function
is not tail-recursive when it encounters an entity reference, and this
is exactly the cause of the crash above. I've attached the patch that
solved the problem for me (it's made against cxml-2007-08-05). I simply
noticed that recurse-on-entity returns NIL most of the time and used
that fact to cut off most of the recursion. My brain is somewhat foggy
right now from an ongoing coding race, so I haven't worked out what
recurse-on-entity actually does or how often it can return a non-NIL
value. Overall I don't think this is a proper solution; most likely
xml-parse.lisp needs some serious rework to make it more reliable. But
I think this hack will at least help in most cases.
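
  To illustrate the general problem, here is a toy sketch. It is not
CXML's code and not the attached patch: the token format, ENTITY-P,
EXPAND-ENTITY (a hypothetical stand-in for recurse-on-entity) and both
WALK functions are invented for the example. It assumes the
problematic shape is "combine the entity's result with the result of
parsing the rest", and it swaps the recursion for an explicit loop, so
it does not depend on tail call elimination at all.

```lisp
;; Toy model only. A token is either character data (any object) or a
;; list (:entity item...) whose items are the entity's result, usually none.

(defun entity-p (token)
  (and (consp token) (eq (first token) :entity)))

(defun expand-entity (token)
  "Hypothetical stand-in for recurse-on-entity; returns NIL for (:entity)."
  (rest token))

(defun walk/recursive (tokens handler)
  "Assumed problematic shape: the entity branch is not a tail call."
  (cond ((null tokens) nil)
        ((entity-p (first tokens))
         ;; APPEND runs after the recursive call returns, so every
         ;; entity reference keeps one more frame on the stack.
         (append (expand-entity (first tokens))
                 (walk/recursive (rest tokens) handler)))
        (t
         (funcall handler (first tokens))
         (walk/recursive (rest tokens) handler)))) ; tail call

(defun walk/iterative (tokens handler)
  "Same results as WALK/RECURSIVE, using a loop and constant stack depth."
  (let ((pending '()))                  ; entity results, newest first
    (dolist (token tokens (nreverse pending))
      (if (entity-p token)
          (let ((value (expand-entity token)))
            ;; The common case is NIL, where there is nothing to keep.
            (when value
              (setf pending (revappend value pending))))
          (funcall handler token)))))

;; Many entity references in a row no longer deepen the stack:
(walk/iterative (loop repeat 100000 collect (list :entity)) #'identity)
```

  The loop version makes the "returns NIL most of the time" case cheap
in the same spirit as the hack, but it still handles non-NIL results
by accumulating them in order.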

  Ivan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml-parse-recursion.diff
Type: text/x-diff
Size: 1105 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/cxml-devel/attachments/20070922/8e7a8f87/attachment.diff>

