<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Dear Cyrus<div><br></div><div><br><div><div>On Dec 1, 2009, at 06:39 , Cyrus Harmon wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div><br>Marco,<br><br>Maybe the problem isn't that the liveness of the list (although, yes, it isn't particularly active), but rather the phrasing of your question:<br><br>"Am I correct in assuming that CXML would be able to forgo the DTD if it could access the <a href="http://www.sbml.org">www.sbml.org</a> site and find a DTD or a XSD there?"<br><br>What are you suggesting CXML should do by ignoring (if that's what you mean by forgo) a DTD that it may or not find at some site?<br><br>Do you have a URL to a SBML file with which the folks at home can try to convince CXML to forgo the DTD (whatever that means)?<br><br>thanks,<br></div></blockquote><div><br></div><div>you are right.  I am not very well versed in XML stuff, so my question is not really well phrased.</div><div><br></div><div>Let's recapitulate: AFAIU, there is a way to tell CXML to use a particular DTD in order to make the CXML:WHITESPACE-NORMALIZER work as expected.  I.e., drop the extra TEXT elements corresponding to newlines and indentation.  There is also a way to tell CXML to use a "resolver" downloaded from the net using DRAKMA (or another HTTP client library).</div><div><br></div><div>Now, in the case of SBML I do not have a DTD.  I have a file that starts as</div><div><br></div><div><div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="'Courier New'" size="3"><span class="Apple-style-span" style="font-size: 12px;"><?xml version="1.0" encoding="UTF-8"?></span></font></font></div><div><font class="Apple-style-span" color="#000000"><font class="Apple-style-span" face="'Courier New'" size="3"><span class="Apple-style-span" style="font-size: 12px;"><sbml xmlns="<a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a>" level="1" version="1"></span></font><br></font></div><div><br></div><div>AFAIU, using the <font class="Apple-style-span" face="'Courier New'" size="3"><span class="Apple-style-span" style="font-size: 12px;">xmlns="<a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a>"</span></font> should make it possible for CXML to get such document (which eventually is a <font class="Apple-style-span" face="'Courier New'" size="3"><span class="Apple-style-span" style="font-size: 12px;">.xsd</span></font>) maybe using DRAKMA, and therefore make it parse the rest of the file automagically dropping the TEXT elements.  In this sense this would make CXML "ignore" the DTD.</div><div><br></div><div>So my questions are</div><div><br></div><div>1 - Is this a correct assumption?</div><div>2 - since this is not CXML default behavior, is there a way to get CXML to do the "obvious" thing?</div><div><br></div><div>I know that I could possibly remove the TEXT elements by hand, after having built the internal structure; but it does not feel right.</div><div><br></div><div>If reading a XSchema file results in a DTD then CXML would not really "ignore it".  I am just wondere if it is possible and where exaclty to look in the code base.</div><div><br></div><div>Thanks</div><div>--</div><div>Marco</div><div><br></div><div><br></div><div><br></div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><br><blockquote type="cite"><div><br>Cyrus<br><br>On Nov 30, 2009, at 3:53 PM, Marco Antoniotti wrote:<br><br><blockquote type="cite">Well...  Any ideas on this or the list is dead?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Cheers<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Marco<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On Nov 25, 2009, at 14:09 , Marco Antoniotti wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">Hi<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">months ago I posted this problem and I still do not have a solution for it.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Note that I do not have a DTD for the file I want to parse (and I don't want and cannot write it).<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The SBML XML files I want to read start like this:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><?xml version="1.0" encoding="UTF-8"?><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><!-- Created by Gepasi 3.30 on March 17, 2003, 12:57 --><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><sbml xmlns="<a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a>" level="1" version="1"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Am I correct in assuming that CXML would be able to forgo the DTD if it could access the <a href="http://www.sbml.org">www.sbml.org</a> site and find a DTD or a XSD there?<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Note that my problem is to get rid of the spurious TEXT elements.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The <a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a> points to a xsd file.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Cheers<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">--<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Marco<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">On Feb 6, 2009, at 16:34 , Marco Antoniotti wrote:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">On Feb 6, 2009, at 10:21 , David Lichteblau wrote:<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Quoting Marco Antoniotti (<a href="mailto:marcoxa@cs.nyu.edu">marcoxa@cs.nyu.edu</a>):<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">I get all my actual RUNE-DOM::ELEMENTs interleaved with "bogus" TEXT<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">elements containing just #\Newline and #\Tab (or more #\Tab).<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">This is obviously an artifact of parsing.  (See attached figure from a<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">15 minutes CXML browser I whipped up)<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">What I do not know it's (1) whether this makes sense or not, or (2)<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">whether it is dependent on my platform (LWM).<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Certainly -- XML preserves whitespace in character data, except for CRLF<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">to LF normalization.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">There are no universally correct rules for whitespace normalization in<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">XML, and in general, any change to whitespace could change  the meaning<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">of the document.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">One rule that is relatively common is to consider whitespace<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">insignificant in "element content", e.g. in places where no<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">non-whitespace text nodes are allowed by the DTD.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">This rule is implemented by CXML:MAKE-WHITESPACE-NORMALIZER<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">(see <a href="http://common-lisp.net/project/cxml/sax.html#misc)">http://common-lisp.net/project/cxml/sax.html#misc)</a>, which may be<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">helpful in your case.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">However, note that the limitation to element content means that you<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">actually need to write or find a DTD that matches your document.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Without a DTD, this approach doesn't work.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Ok.  I think I understand this.  I'll try the CXML:MAKE-WHITESPACE-NORMALIZER (I need to understand how to use it first).<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">However, let me ask you this too.  The SBML XML files start like this:<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><?xml version="1.0" encoding="UTF-8"?><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><!-- Created by Gepasi 3.30 on March 17, 2003, 12:57 --><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><sbml xmlns="<a href="http://www.sbml.org/sbml/level1">http://www.sbml.org/sbml/level1</a>" level="1" version="1"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Am I correct in assuming that CXML would be able to forgo the DTD if it could access the <a href="http://www.sbml.org">www.sbml.org</a> site and find a DTD or a XSD there?<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Pardon the naïveté of my questions, but I really do not know enough about XML.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Cheers<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">--<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Marco<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Other approaches are to use HTML rules for whitespace normalization<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">(which are a more tricky to get right though, and cxml does not provide<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">a ready-to-use function for this) or to discard all whitespace.  It<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">really depends on the schema and application.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">(Note that we would like to have some support for this in cxml, because<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">whitespace rules also matter for indentation, and at some point we would<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">like to have more flexible/correct/useful indentation modes in our<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">serializer.  Whitespace stripping could be considered as a form of<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">indentation, in the sense that it is a "removal of all indentation".<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">But so far, I haven't found the time to implement anything in this<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">direction.)<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">d.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">--<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Marco Antoniotti<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">--<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Marco Antoniotti<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">--<br></blockquote><blockquote type="cite">Marco Antoniotti<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><br><br></div></blockquote></div><br><div><div><div>--</div><div>Marco Antoniotti</div></div><br></div><br></div></body></html>