[cxml-devel] CDATA doesn't preserve whitespace

David Lichteblau david at lichteblau.com
Sat Sep 16 07:25:55 UTC 2006


Quoting Sunil Mishra (smishra at sfmishras.com):
> CL-USER(116): (dom:map-document (cxml:make-namespace-normalizer
> (cxml:make-octet-stream-sink *standard-output*)) *)

Note that make-octet-stream-sink defaults to canonical mode for
historical reasons.

> <svg xmlns="http://www.w3.org/2000/svg">
>   <script
> type="text/css">
>     
>     
>     
>   </script>
> </svg>
> #<MULTIVALENT stream socket connected from localhost/3813 to
>   localhost/3817 @ #x205003d2>

Sorry, I don't see a bug.  In canonical mode the serializer outputs
character references for the newlines here, but it doesn't emit a
CDATA section in the first place, so that's fine.
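
To illustrate why nothing is lost in canonical mode, here is a minimal
sketch of newline escaping.  ESCAPE-NEWLINES is a hypothetical helper, not
part of cxml, and the decimal spelling "&#10;" is an assumption about the
exact form of the reference.  A conforming parser decodes the reference
back to the same newline character, so the whitespace round-trips even
though no CDATA section is written.

```lisp
;; Hypothetical helper for illustration only; this is not cxml's
;; serializer code.  Assumes the reference is spelled "&#10;".
(defun escape-newlines (string)
  "Return a copy of STRING with every newline replaced by \"&#10;\"."
  (with-output-to-string (out)
    (loop for ch across string
          do (if (char= ch #\Newline)
                 (write-string "&#10;" out)  ; character reference for U+000A
                 (write-char ch out)))))     ; everything else is copied as-is
```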

If you want to see a CDATA section, use non-canonical mode:

cl-user(43): (dom:map-document
              (cxml:make-octet-stream-sink *standard-output* :canonical nil)
              (cxml:parse-file "~/graph.xml" (cxml-dom:make-dom-builder)))
<?xml version="1.0" encoding="UTF-8"?>
<svg>
  <script type="text/css">
    <![CDATA[
     
    ]]>
  </script>
</svg>

> ``Within a CDATA section, only the CDEnd string is recognized as markup,
> so that left angle brackets and ampersands may occur in their literal
> form; they need not (and cannot) be escaped using "&lt;" and "&amp;".
> CDATA sections cannot nest.''
> 
> Can cxml please correctly follow this requirement?

It follows this requirement while parsing.

There is only one little "problem", in serialization (unrelated to your
question):

A document constructed in memory might include a CDATA section with
characters not representable in a CDATA section.  That is a user error,
and CXML should signal an error when told to serialize such a document
in non-canonical mode; right now I believe it does not signal that error
and outputs the user data as-is, resulting in output that isn't
well-formed.  (But I'm taking patches. :-))
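
One case of content that cannot be represented is the CDEnd string "]]>"
itself, since it would terminate the section early.  Until the serializer
checks for this, user code could check the data before building the CDATA
node.  A minimal sketch in plain Common Lisp follows; CDATA-REPRESENTABLE-P
and SPLIT-FOR-CDATA are hypothetical helpers, not part of cxml, and they
cover only the "]]>" case, not characters missing from the output encoding.

```lisp
;; Hypothetical helpers for illustration; neither is part of cxml.

(defun cdata-representable-p (string)
  "Return true if STRING does not contain the CDEnd delimiter \"]]>\"."
  (not (search "]]>" string)))

(defun split-for-cdata (string)
  "Split STRING into pieces, none of which contains \"]]>\".
Each piece can go into its own CDATA section; concatenating the pieces
gives back STRING."
  (loop with start = 0
        for pos = (search "]]>" string :start2 start)
        while pos
        ;; Cut between \"]]\" and \">\" so the delimiter is never whole.
        collect (subseq string start (+ pos 2)) into pieces
        do (setf start (+ pos 2))
        finally (return (append pieces (list (subseq string start))))))
```

The first function is enough to signal an error early; the second is the
usual workaround of spreading the data over adjacent CDATA sections.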


d.


