[cxml-devel] DTD parsing

Mariano Montone marianomontone at gmail.com
Tue May 4 15:37:03 UTC 2010


David Lichteblau wrote:
> Hi,
>
> Quoting Mariano Montone (marianomontone at gmail.com):
>   
>>     I am writing because I have a problem with DTD validation: I cannot
>> parse the DTD. The DTD I want to use is this:
>> http://www.w3.org/TR/html4/strict.dtd
>>     
>
> I believe that can't work, because the DTD in question is an SGML DTD,
> not an XML DTD.
>
> If you really need the SGML DTD, then Closure HTML has a parser for
> those somewhere (in fact, its HTML parser is based on information
> extracted from the DTD).
>
> Otherwise though, I would recommend use of the XHTML DTD instead, which
> describes the same content model, just for XML.
>
> The XHTML DTD can definitely be parsed with cxml.
>
>
> d.
>   
I can parse the DTD now, but I cannot parse my HTML output. What I want
to do is write a set of tests that validate my HTML serialization.

So this is what I'm doing:

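;; Serialize the generated page in XML mode into a string...
(let ((html (with-output-to-string (s)
              (serialize-html (generate (gen-html))
                              s
                              :mode :xml))))
  ;; ...then parse that string with cxml, building an XMLS-style list.
  (cxml:parse html (cxml-xmls:make-xmls-builder)))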

The html variable is a string like the following:

"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'
              'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
<html><head><title>qimt example</title></head><body><p><a
href='http://example.com/'>Link to example.com</a></p><p><a
href='http://example2.com/'>Link to example2.com</a></p><h2>Dynamic code
generation</h2><ul><li>Item 0</li><li>Item 1</li><li>Item 2</li><li>Item
3</li><li>Item 4</li><li>Item 5</li><li>Item 6</li><li>Item
7</li><li>Item 8</li><li>Item 9</li><li>Item 10</li><li>Item
11</li><li>Item 12</li><li>Item 13</li><li>Item 14</li></ul></body></html>"

I've already added the xhtml dtd to my catalog with:

<delegateSystem
systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
catalog="file:///home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd"/>
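For comparison: in the OASIS XML Catalogs specification, the catalog attribute of a delegateSystem entry names another catalog file to consult, not the DTD itself. A resolver following the entry above would therefore try to read xhtml1-strict.dtd as an XML catalog document, which would be consistent with the "catalog error" reported below. Mapping a system identifier directly to a local copy of a DTD is normally done with a system entry. A minimal sketch, assuming the same local path as above and that this goes into whichever catalog file is already registered with cxml:

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the XHTML 1.0 Strict system identifier straight to a local
       copy of the DTD (path taken from the delegateSystem entry above). -->
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd"/>
</catalog>
```

Note that xhtml1-strict.dtd itself references the entity files xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent by relative URI, so local copies of those presumably need to sit next to the DTD as well.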

The catalog entry is being picked up, but parsing the DTD fails:

WARNING:
   deprecated SAX default method used by a handler that is not a
subclass of SAX:ABSTRACT-HANDLER or HAX:ABSTRACT-HANDLER
WARNING: ignoring catalog error: Document not well-formed: element expected
Context:
  Line 27, column 9 in
file://+/home/marian/work/cluts2/qimt/test/xhtml1-strict.dtd

What am I doing wrong now?

Thanks again,

Mariano



