<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Feb 6, 2009, at 10:21, David Lichteblau wrote:</div><br><blockquote type="cite"><div>Quoting Marco Antoniotti (<a href="mailto:marcoxa@cs.nyu.edu">marcoxa@cs.nyu.edu</a>):<br><blockquote type="cite">I get all my actual RUNE-DOM::ELEMENTs interleaved with "bogus" TEXT <br></blockquote><blockquote type="cite">elements containing just #\Newline and #\Tab (or more #\Tab).<br></blockquote><blockquote type="cite">This is obviously an artifact of parsing. (See attached figure from a <br></blockquote><blockquote type="cite">15-minute CXML browser I whipped up.)<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">What I do not know is (1) whether this makes sense or not, or (2) <br></blockquote><blockquote type="cite">whether it is dependent on my platform (LWM).<br></blockquote><br>Certainly -- XML preserves whitespace in character data, except for CRLF<br>to LF normalization.<br><br>There are no universally correct rules for whitespace normalization in<br>XML, and in general, any change to whitespace could change the meaning<br>of the document.<br><br><br>One rule that is relatively common is to consider whitespace<br>insignificant in "element content", i.e. in places where no<br>non-whitespace text nodes are allowed by the DTD.<br><br>This rule is implemented by CXML:MAKE-WHITESPACE-NORMALIZER<br>(see <a href="http://common-lisp.net/project/cxml/sax.html#misc">http://common-lisp.net/project/cxml/sax.html#misc</a>), which may be<br>helpful in your case.<br><br>However, note that the limitation to element content means that you<br>actually need to write or find a DTD that matches your document.<br>Without a DTD, this approach doesn't work.</div></blockquote><div><br></div><div>Ok. I think I understand this. 
I'll try CXML:MAKE-WHITESPACE-NORMALIZER (I need to understand how to use it first).</div><div><br></div><div>However, let me ask you this too. The SBML XML files start like this:</div><div><font class="Apple-style-span" face="'Courier New'"><br></font></div><div><div><font class="Apple-style-span" face="'Courier New'">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</font></div><div><font class="Apple-style-span" face="'Courier New'">&lt;!-- Created by Gepasi 3.30 on March 17, 2003, 12:57 --&gt;</font></div><div><font class="Apple-style-span" face="'Courier New'">&lt;sbml xmlns="<a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a>" level="1" version="1"&gt;</font></div></div><div><br></div>Am I correct in assuming that CXML would be able to forgo the DTD if it could access the<font class="Apple-style-span" face="'Courier New'"> <a href="http://www.sbml.org">www.sbml.org</a></font> site and find a DTD or an XSD there?</div><div><br></div><div>Pardon the naïveté of my questions, but I really do not know enough about XML.</div><div><br></div><div>Cheers</div><div>--</div><div>Marco</div><div><br></div><div><br><blockquote type="cite"><div><br>Other approaches are to use HTML rules for whitespace normalization<br>(which are trickier to get right, though, and cxml does not provide<br>a ready-to-use function for this) or to discard all whitespace. It<br>really depends on the schema and application.<br><br><br>(Note that we would like to have some support for this in cxml, because<br>whitespace rules also matter for indentation, and at some point we would<br>like to have more flexible/correct/useful indentation modes in our<br>serializer. 
Whitespace stripping could be considered a form of<br>indentation, in the sense that it is a "removal of all indentation".<br>But so far, I haven't found the time to implement anything in this<br>direction.)<br><br><br>d.<br></div></blockquote></div><br><div> <div><div>--</div><div>Marco Antoniotti</div></div><br> </div><br></body></html>