[Cxml-devel] how to create tags with SAX to make an almost identity transformation

Alexandre Rademaker arademaker at gmail.com
Mon Nov 3 16:33:45 UTC 2014


Thank you very much Russ! It works as expected! I have one last question. Running the parser with the command:

(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let ((h (make-instance 'preproc
                          :chained-handler (cxml:make-character-stream-sink out))))
    (cxml:parse #P"harem.xml" h :validate t)))

where the file harem.xml begins with (see the doctype):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd">
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
  <DOC DOCID="H2-dftre765">
    <p>...


the command writes the following to the teste.xml output file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd"<!ELEMENT EM #PCDATA>
<!ATTLIST EM ID CDATA #REQUIRED>
<!ATTLIST EM CATEG CDATA #IMPLIED>
<!ATTLIST EM TIPO CDATA #IMPLIED>
<!ATTLIST EM COMENT CDATA #IMPLIED>
<!ATTLIST EM SUBTIPO CDATA #IMPLIED>
<!ELEMENT ALT (#PCDATA|EM)*>
<!ELEMENT OMITIDO (#PCDATA|EM|ALT|p)*>
<!ELEMENT colHAREM (DOC)*>
<!ATTLIST colHAREM versao CDATA #REQUIRED>
<!ELEMENT p (#PCDATA|EM|OMITIDO|ALT)*>
<!ATTLIST p xml:space (default|preserve) "default">
<!ELEMENT DOC (#PCDATA|p|OMITIDO)*>
<!ATTLIST DOC DOCID CDATA #REQUIRED>
>
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
...


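That is, the handler copies the DTD declarations into the output, but incorrectly: the [ ] that should enclose them are missing, and the closing > of the DOCTYPE only appears after the last declaration. Is this a bug in the library or in my code?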

Thank you very much for this additional help! 

Best,

----
Alexandre Rademaker
http://arademaker.github.com


On Nov 3, 2014, at 1:35 PM, Russ Tyndall <russ at acceleration.net> wrote:

Howdy,

You will need to issue sax:start-element and sax:end-element calls instead of doing a string replace. Essentially, you replace the single sax:characters call with a series of characters and element calls.

E.g.:
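(defclass preproc (cxml:sax-proxy) ())

;; Split each text block at "|" and emit an empty <bar/> element
;; between consecutive chunks.
(defmethod sax:characters ((handler preproc) data)
  (let ((chunks (cl-ppcre:split "\\|" data)))
    (if (= 1 (length chunks))
        ;; Only one chunk: pass the text on unchanged.
        (call-next-method)
        (loop for c in chunks
              for first? = t then nil
              do (unless first?
                   (sax:start-element handler nil nil "bar" nil)
                   (sax:end-element handler nil nil "bar"))
                 ;; Each chunk contains no "|", so it is forwarded as plain text.
                 (sax:characters handler c)))))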

(cxml:parse "<test>content | ola</test>"
 (make-instance 'preproc
  :chained-handler (cxml:make-string-sink)))
=>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test>content <bar/> ola</test>"
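
To check which events a given text block should turn into, here is a rough, dependency-free sketch of the splitting step. The names split-on-bar and text-events are hypothetical helpers made up for this illustration; they are not part of CXML or CL-PPCRE. The sketch assumes that empty pieces should produce no sax:characters call and that every | should yield one <bar/>.

```lisp
;; Hypothetical helpers for illustration only; not part of CXML or CL-PPCRE.

(defun split-on-bar (text)
  "Split TEXT at every | and return the pieces, keeping empty ones."
  (loop with start = 0
        for pos = (position #\| text :start start)
        collect (subseq text start pos)
        while pos
        do (setf start (1+ pos))))

(defun text-events (text)
  "Return the events for TEXT as a list: a string stands for one
sax:characters call, the keyword :bar for one empty <bar/> element."
  (loop for piece in (split-on-bar text)
        for first? = t then nil
        ;; One :bar before every piece except the first.
        unless first? collect :bar
        ;; Empty pieces produce no character event.
        unless (string= piece "") collect piece))

(text-events "content | something more | and done")
;; => ("content " :BAR " something more " :BAR " and done")
```

In a handler, each string would become a sax:characters call and each :bar a sax:start-element / sax:end-element pair, as in the method above. Because empty pieces are kept while splitting, a text block that ends in | still gets its final <bar/>.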


Hope this helps,
Russ Tyndall
Acceleration.net

On 11/03/2014 07:47 AM, Alexandre Rademaker wrote:
> Hi,
> 
> I need to replace every | character with a <bar/> tag in all text blocks of a big XML file. That is, whenever I find
> 
> <test att="one|two">content | something more | and done</test>
> 
> I need to transform to
> 
> <test att="one|two">content <bar/> something more <bar/> and done</test>
> 
> Note that | can also occur in attribute values and, in that case, it must be kept unchanged. After reading the slide http://common-lisp.net/project/cxml/saxoverview/pages/11.html I wrote
> 
> ===
> (defclass preproc (cxml:sax-proxy) ())
> 
> (defmethod sax:characters ((handler preproc) data)
>   (call-next-method handler (cl-ppcre:regex-replace "\\|" data "<bar/>")))
> ===
> 
> But of course, it produces an escaped string, not a tag, in the final XML.
> 
> WML> (cxml:parse "<test>content | ola</test>"
>                      (make-instance 'preproc
>                                     :chained-handler (cxml:make-string-sink)))
> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
> <test>content &lt;bar/&gt; ola</test>"
> 
> Any idea or directions?
> 
> Best,
> 
> ----
> Alexandre Rademaker
> http://arademaker.github.com
> 
> 
> 
> _______________________________________________
> Cxml-devel mailing list
> Cxml-devel at common-lisp.net
> http://mailman.common-lisp.net/cgi-bin/mailman/listinfo/cxml-devel

