[cxml-devel] seeking suggestion for handling malformed attribute names
Russell Kliese
russell at kliese.id.au
Sat Oct 29 10:43:26 UTC 2011
Hi David,
Thanks for your reply.
It sounds like I would need to do some careful reading of the HTML and
XML specs in order to come up with a patch to get parsing right.
It may be possible, however, to solve the problem without having to
perfect the parser. I found that HTML Tidy can deal with the malformed
attributes if you use the option that removes proprietary attributes.
This way, only a set of known-good attributes is passed to the SAX
builder. Would this be appropriate for Closure HTML?
If you care to look at Tidy (I was using HTML Tidy for Linux released on
25 March 2009), here is an example invocation:
echo '<html>
<body>
<table>
<tr><td vertical-align:="" ;=""></td></tr>
</table>
</body>
</html>
' | tidy -q -asxml --force-output yes --drop-proprietary-attributes yes
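To illustrate the general idea of passing only known-good attributes on to a builder, here is a minimal Python sketch. It is not how Tidy or Closure HTML implement this; the function name `filter_attributes`, the allowlist `KNOWN_TD_ATTRIBUTES`, and the simplified name pattern are all assumptions made up for the example.

```python
import re

# Simplified, ASCII-only approximation of an XML attribute name (assumption:
# the real XML Name production also permits many non-ASCII characters).
_XML_NAME = re.compile(r"^[A-Za-z_:][A-Za-z0-9_:.\-]*$")

# Hypothetical allowlist for <td>; a real one would be derived from the DTD.
KNOWN_TD_ATTRIBUTES = {"align", "valign", "colspan", "rowspan", "class", "id", "style"}


def filter_attributes(attrs, allowed=KNOWN_TD_ATTRIBUTES):
    """Return only the (name, value) pairs that are safe to hand to a builder.

    A pair is kept when its name looks like a well-formed attribute name
    and also appears in the allowlist.
    """
    kept = []
    for name, value in attrs:
        if _XML_NAME.match(name) and name.lower() in allowed:
            kept.append((name, value))
    return kept


# The malformed attributes from the <td> above, plus one legitimate attribute.
attrs = [("vertical-align:", ""), (";", ""), ("colspan", "2")]
print(filter_attributes(attrs))
```

Here `;` fails the name check outright, while `vertical-align:` passes the simplified pattern but is dropped because it is not in the allowlist, so only `colspan` survives.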
Regards,
Russell
On 28/10/11 23:13, David Lichteblau wrote:
> Quoting Russell Kliese (russell at kliese.id.au):
>> I am looking for suggestions on how to continue parsing even when
>> attribute names are malformed.
> When Closure was written, I think a lot of effort was put into making it
> correct errors, but that logic is just not 100% complete. We'd need
> a set of good test cases.
>
> Unfortunately I don't have a ready-to-use patch at this point; do you
> have a suggestion?