[cxml-devel] DTD parsing

Mariano Montone marianomontone at gmail.com
Tue May 4 15:42:36 UTC 2010


Mariano Montone wrote:
> David Lichteblau wrote:
>   
>> Hi,
>>
>> Quoting Mariano Montone (marianomontone at gmail.com):
>>   
>>     
>>>     I am writing because I have a problem with DTD validation. I cannot
>>> parse the DTD. The DTD I want to use is this:
>>> http://www.w3.org/TR/html4/strict.dtd
>>>     
>>>       
>> I believe that can't work, because the DTD in question is an SGML DTD,
>> not an XML DTD.
>>
>> If you really need the SGML DTD, then Closure HTML has a parser for
>> those somewhere (in fact, its HTML parser is based on information
>> extracted from the DTD).
>>
>> Otherwise though, I would recommend use of the XHTML DTD instead, which
>> describes the same content model, just for XML.
>>
>> The XHTML DTD can definitely be parsed with cxml.
>>
>>
>> d.
>>   
>>     
> I can parse the DTD now, but I cannot parse my html output. What I want
> to do is to write a bunch of tests that validate my html serialization.
>
> So this is what I'm doing:
>
> (let ((html (with-output-to-string (s)
>               (serialize-html (generate (gen-html))
>                               s
>                               :mode :xml))))
>   (cxml:parse html (cxml-xmls:make-xmls-builder)))
>
> The html variable is a string like the following:
>
> "<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'
>               'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
> <html><head><title>qimt example</title></head><body><p><a
> href='http://example.com/'>Link to example.com</a></p><p><a
> href='http://example2.com/'>Link to example2.com</a></p><h2>Dynamic code
> generation</h2><ul><li>Item 0</li><li>Item 1</li><li>Item 2</li><li>Item
> 3</li><li>Item 4</li><li>Item 5</li><li>Item 6</li><li>Item
> 7</li><li>Item 8</li><li>Item 9</li><li>Item 10</li><li>Item
> 11</li><li>Item 12</li><li>Item 13</li><li>Item 14</li></ul></body></html>"
>
> I've already added the XHTML DTD to my catalog with:
>
> <delegateSystem
> systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> catalog="file:///home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd"/>
>
> And it is being detected, but there's a problem when parsing it:
>
> WARNING:
>    deprecated SAX default method used by a handler that is not a
> subclass of SAX:ABSTRACT-HANDLER or HAX:ABSTRACT-HANDLER
> WARNING: ignoring catalog error: Document not well-formed: element expected
> Context:
>   Line 27, column 9 in
> file://+/home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd
>
> What am I doing wrong now?
>
> Thanks again,
>
> Mariano
>   
And this is the error I get (what I wrote above were the warnings):

URI scheme :HTTP not supported
   [Condition of type CXML:XML-PARSE-ERROR]
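
One possible reading of the messages above, offered as a guess rather than a confirmed diagnosis: in the OASIS XML Catalogs format, the catalog attribute of delegateSystem is supposed to name another catalog file, not the DTD itself. If so, the parser would try to read xhtml1-strict.dtd as an XML catalog document, which would fit the "element expected" warning, and after ignoring that catalog it would fall back to the HTTP system identifier, which would fit the final error. A minimal sketch of a catalog that maps the identifiers directly to the local file, assuming the same path as above:

```xml
<?xml version="1.0"?>
<!-- Sketch of an OASIS XML catalog; the local path is copied from the
     delegateSystem entry in this message and is otherwise an assumption. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the system identifier from the DOCTYPE to the local copy. -->
  <system
      systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      uri="file:///home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd"/>
  <!-- Optionally map the public identifier to the same file. -->
  <public
      publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
      uri="file:///home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd"/>
</catalog>
```

The XHTML DTD also pulls in three entity files (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) through relative system identifiers, so local copies of those may need to sit next to the DTD as well.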




