[cxml-devel] CXML::P/CONTENT causes heap exhaustion due to deep recursion where there are too many entity references

David Lichteblau david at lichteblau.com
Wed Oct 3 15:33:28 UTC 2007


Hi,

Quoting Ivan Shvedunov (ivan4th at gmail.com):
>   I've noticed that parsing XML that contains many entity references
> may cause CXML to eat up all available heap space and crash. This can
> be easily reproduced:

Thank you for the report.  I have replaced the tail recursion in p/content
with a loop and also removed the call to append, which serves no purpose
in the current code base.
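
As an illustration only, the change is of the following kind.  This is a
minimal sketch and not the actual p/content code; the function names
handle-items-recursively and handle-items-iteratively are hypothetical.
Common Lisp does not guarantee that tail calls are optimized, so the
recursive form may use one stack frame per item, while the loop does not.

```lisp
;; Hypothetical stand-ins for illustration; not code from xml-parse.lisp.

(defun handle-items-recursively (items handler)
  "Call HANDLER on each element of ITEMS, recursing for the rest of the list."
  (when items
    (funcall handler (first items))
    ;; Tail call.  The standard does not require this to be optimized,
    ;; so the stack may grow by one frame per item.
    (handle-items-recursively (rest items) handler)))

(defun handle-items-iteratively (items handler)
  "Call HANDLER on each element of ITEMS in a loop.
Stack use does not depend on the number of items."
  (loop for item in items
        do (funcall handler item)))
```

Both functions visit the items in the same order; only the stack
behaviour for long inputs differs.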

(In addition, I have fixed string normalization in the DOM builder,
reducing the parse time for your example from 9s to around 2s for me.)

Please test cxml CVS again.


> Overall I don't think this is a proper solution, most likely
> xml-parse.lisp needs some serious rework to make it more reliable.

In general, recursion depth in the SAX parser should be at most
proportional to the nesting depth of XML elements being parsed (as well
as the nesting depth of entity references).  This level of recursion
should not usually be a problem.  Most processing of in-memory models
like DOM uses similar algorithms.

If any recursive function exceeds this complexity (in the way p/content
did prior to your patch), I would regard that as a bug.

If you find any other similar problems, please report them.


Note that the Klacks parser is available as an alternative to SAX for
applications that need stricter guarantees.  The Klacks parser will only
report one event at a time and does not use most of the recursive
functions found in the SAX parser.
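
A minimal sketch of pull-style parsing with Klacks, assuming the cxml
system is already loaded.  The function name max-element-depth is
hypothetical and chosen only for this example.  It pulls one event at a
time and tracks element nesting with a counter, so the application code
itself does not recurse.

```lisp
;; Assumes the cxml system (which provides the CXML and KLACKS packages)
;; has already been loaded.  MAX-ELEMENT-DEPTH is a hypothetical helper.

(defun max-element-depth (input)
  "Return the maximum element nesting depth of the XML document INPUT.
INPUT is anything accepted by CXML:MAKE-SOURCE, e.g. a pathname or string."
  (klacks:with-open-source (source (cxml:make-source input))
    (loop with depth = 0
          with max-depth = 0
          ;; PEEK returns the key of the current event, or NIL at the end.
          for key = (klacks:peek source)
          while key
          do (case key
               (:start-element
                (incf depth)
                (setf max-depth (max max-depth depth)))
               (:end-element
                (decf depth)))
             ;; Advance to the next event.
             (klacks:consume source)
          finally (return max-depth))))
```

For example, (max-element-depth #p"example.xml") would report the depth
of a file named example.xml, which is a placeholder name here.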


Thanks,
David


