[cxml-devel] CXML on OpenMCL

David Lichteblau david at lichteblau.com
Wed Feb 21 09:09:10 UTC 2007


Hi,

Quoting Sunil Mishra (smishra at sfmishras.com):
> Attached is an example. You should be able to load up the latest cxml in
> the latest openmcl snapshot, and then load bug.lisp. That's enough for
> me to see the error.

Thanks.

What you are seeing is due to the automatic recoding to strings that I
introduced to make things easier for cxml users on non-Unicode Lisps.  :-)

When cxml:parse-file is used on such Lisps, the parser will use rods
internally, but since users don't tend to be interested in working with
runes (Closure itself is probably the only application that really wants
to see runes), the default is to recode those rods into UTF-8 strings
before handing them to the user-specified SAX handler.

So in this case, your DOM builder gets Lisp strings containing UTF-8
octets, but you explicitly created a DOM builder for runes, which is the
reason for the mismatch:

;; fails:
(cxml:parse-file "redirect.xml"
                 (rune-dom:make-dom-builder)
                 :recode t    ;<--- default setting
                 :entity-resolver #'default-entity-resolver)

There are two solutions.  One is to disable recoding and use runes:

;; works (using runes)
(cxml:parse-file "redirect.xml"
                 (rune-dom:make-dom-builder)
                 :recode nil    ;<--- disable recoding
                 :entity-resolver #'default-entity-resolver)

The other, which is what most users will want, is to leave recoding
enabled and use the builder from the cxml-dom package:

;; works (using characters representing UTF-8 octets)
(cxml:parse-file "redirect.xml"
                 (cxml-dom:make-dom-builder)   ;<--- note cxml-dom package
                 :entity-resolver #'default-entity-resolver)

On Lisps without Unicode support, cxml-dom is an alias for utf8-dom.

On Unicode-aware Lisps, cxml-dom is an alias for rune-dom, since runes
are characters anyway. 
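
To check which of the two a given Lisp ended up with, something like the
following could be evaluated after loading cxml.  It is only a sketch: it
assumes the alias is implemented as a package nickname (the mechanism is
not spelled out here), so that FIND-PACKAGE resolves cxml-dom to the
underlying package.

;; sketch: report which DOM implementation cxml-dom names on this Lisp
;; (assumption: the alias is a package nickname)
(let ((dom (find-package :cxml-dom)))
  (cond ((null dom) :cxml-dom-not-found)               ; cxml not loaded?
        ((eq dom (find-package :rune-dom)) :rune-dom)  ; Unicode-aware Lisp
        ((eq dom (find-package :utf8-dom)) :utf8-dom)  ; non-Unicode Lisp
        (t (package-name dom))))                       ; some other arrangement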


d.


