[closure-devel] Parsing Outlook HTML emails

Elias Mårtenson lokedhs at gmail.com
Tue Sep 11 10:32:51 UTC 2012


I am currently faced with the task of parsing HTML emails generated by
Outlook. My frustrations with it could fill an entire email of their own,
so I won't go into them here.

Anyway, one thing it keeps doing is creating lots of non-standard tags of
the form <o:p></o:p> and the like. The problem is that when Closure-HTML
parses these, they end up like this: "#BAD TAGp>".

I worked around the problem by adding the following check to the function
NAME-RUNE-P: (rune= char #/:). This accepts the colon as a valid character
in a node name, so such nodes are ignored in the generated output.
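
As a rough illustration of the shape of that change, here is a standalone
sketch in plain Common Lisp. It is not the actual Closure-HTML source:
NAME-START-CHAR-P and NAME-CHAR-P are hypothetical stand-ins that work on
ordinary characters instead of runes, and the set of other accepted
characters (letters, digits, underscore, period, hyphen) is an assumption
made only for the example.

```lisp
;; Hypothetical stand-in for a name-start predicate; the accepted set
;; is assumed for illustration only.
(defun name-start-char-p (char)
  (or (alpha-char-p char)
      (char= char #\_)))

;; Hypothetical stand-in for NAME-RUNE-P, using characters, not runes.
(defun name-char-p (char)
  (or (name-start-char-p char)
      (digit-char-p char)
      (char= char #\.)
      (char= char #\-)
      ;; The added check: treat the colon as a valid name character,
      ;; so a prefixed name such as "o:p" is read as one tag name.
      (char= char #\:)))

;; With the colon accepted, every character of "o:p" passes the test.
(every #'name-char-p "o:p")
```

Without the colon clause, the same call on "o:p" would stop at the colon,
which matches the point where the tag name currently breaks off.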

Would it be reasonable to include this fix in an update to Closure-HTML?

Regards,
Elias