[Cxml-devel] how to create tags with SAX to make an almost identity transformation

Alexandre Rademaker arademaker at gmail.com
Mon Nov 3 17:23:36 UTC 2014


To clarify my last question, I know that I can use

(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (flet ((resolver (pubid sysid)
           (declare (ignore pubid sysid))
           (flexi-streams:make-in-memory-input-stream nil)))
    (let ((h (make-instance 'preproc
                            :chained-handler (cxml:make-character-stream-sink out))))
      (cxml:parse #P"CDSegundoHAREMclassico.xml" h :validate nil :entity-resolver #'resolver))))

to skip loading the DTD, but that also forces me to skip validating the input. It would be nicer to control how the declarations and the DOCTYPE definition are written to the output. In any case, the code in my last message produces invalid XML.


Another idea would be to use a DOM as the output of the proxy handler and serialize it with map-document avoiding the inclusion of the doctype declarations:

(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let* ((h (make-instance 'preproc :chained-handler (cxml-dom:make-dom-builder)))
         (dom (cxml:parse #P"CDSegundoHAREMclassico.xml" h :validate t)))
    (dom:map-document out dom :include-doctype nil)))

But this code produces many warnings like the one below and writes nothing to the output.

WARNING:
   deprecated SAX default method used by a handler that is not a subclass of SAX:ABSTRACT-HANDLER or HAX:ABSTRACT-HANDLER
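
One possible cause, offered as an assumption rather than a confirmed diagnosis: dom:map-document expects a SAX handler as its first argument, and here it receives the stream out directly, which would explain both the warnings and the empty file. A minimal sketch, assuming cxml is already loaded and using a small inline document in place of the real file:

```lisp
;; Assumption: cxml is already loaded, e.g. with (ql:quickload :cxml).
;; Build a DOM from an inline document, then serialize it through a sink.
(let ((dom (cxml:parse "<test>content</test>" (cxml-dom:make-dom-builder))))
  ;; The first argument is a handler (here a string sink), not a stream.
  (dom:map-document (cxml:make-string-sink) dom :include-doctype nil))
```

For the file case, the same idea would be to pass (cxml:make-character-stream-sink out) as the first argument instead of out itself." (cxml-dom:make-dom-builder))
                                   :include-doctype nil)))
</test>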


Best,

----
Alexandre Rademaker
http://arademaker.github.com


On Nov 3, 2014, at 2:33 PM, Alexandre Rademaker <arademaker at gmail.com> wrote:


Thank you very much Russ! It works as expected! I have one last question. Running the parser with the command:

(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let ((h (make-instance 'preproc :chained-handler (cxml:make-character-stream-sink out))))
    (cxml:parse #P"harem.xml" h :validate t)))

where the file harem.xml begins with (see the doctype):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd">
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
 <DOC DOCID="H2-dftre765">
   <p>...


the command writes the following to the output file teste.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd"<!ELEMENT EM #PCDATA>
<!ATTLIST EM ID CDATA #REQUIRED>
<!ATTLIST EM CATEG CDATA #IMPLIED>
<!ATTLIST EM TIPO CDATA #IMPLIED>
<!ATTLIST EM COMENT CDATA #IMPLIED>
<!ATTLIST EM SUBTIPO CDATA #IMPLIED>
<!ELEMENT ALT (#PCDATA|EM)*>
<!ELEMENT OMITIDO (#PCDATA|EM|ALT|p)*>
<!ELEMENT colHAREM (DOC)*>
<!ATTLIST colHAREM versao CDATA #REQUIRED>
<!ELEMENT p (#PCDATA|EM|OMITIDO|ALT)*>
<!ATTLIST p xml:space (default|preserve) "default">
<!ELEMENT DOC (#PCDATA|p|OMITIDO)*>
<!ATTLIST DOC DOCID CDATA #REQUIRED>
> 
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
...


That is, the handler writes the DTD inside the output but in the wrong way, without the [ ]. Is it a bug in the library or in my code?

Thank you very much for this additional help! 

Best,

----
Alexandre Rademaker
http://arademaker.github.com


On Nov 3, 2014, at 1:35 PM, Russ Tyndall <russ at acceleration.net> wrote:

Howdy,

You will need to issue sax:start-element and sax:end-element calls instead of doing a string replace. Essentially, you will replace the single sax:characters call with a series of characters and element calls.

EG:
(defclass preproc (cxml:sax-proxy) ())

(defmethod sax:characters ((handler preproc) data)
  (let ((chunks (cl-ppcre:split "\\|" data)))
    (if (= 1 (length chunks))
        (call-next-method)
        (loop for c in chunks
              for first? = t then nil
              do (unless first?
                   (sax:start-element handler nil nil "bar" nil)
                   (sax:end-element handler nil nil "bar"))
                 (sax:characters handler c)))))

(cxml:parse "<test>content | ola</test>"
            (make-instance 'preproc
                           :chained-handler (cxml:make-string-sink)))
=>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test>content <bar/> ola</test>"
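
One edge case worth checking with the approach above: by default cl-ppcre:split drops trailing empty strings, so a text chunk that ends in | (or consists only of |) may lose its final separator. A minimal sketch of a dependency-free splitter that keeps empty pieces; the name split-on-bar is hypothetical and not part of cxml or cl-ppcre:

```lisp
;; Hypothetical helper (not from cxml or cl-ppcre): split DATA at every |
;; and keep empty pieces, so leading and trailing separators are preserved.
(defun split-on-bar (data)
  (loop with start = 0
        for pos = (position #\| data :start start)
        ;; When POS is NIL, SUBSEQ takes everything up to the end of DATA.
        collect (subseq data start pos)
        while pos
        do (setf start (1+ pos))))

;; Plain text with one separator yields two pieces.
(split-on-bar "content | ola")
;; A trailing | yields an empty final piece instead of being dropped.
(split-on-bar "a|")
```

Using this helper in place of the cl-ppcre:split call would leave the rest of the method unchanged, since a string without | still yields a one-element list.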


Hope this helps,
Russ Tyndall
Acceleration.net

On 11/03/2014 07:47 AM, Alexandre Rademaker wrote:
> Hi,
> 
> I need to transform every | character into a <bar/> tag in all text blocks of a big XML file. That is, whenever I find
> 
> <test att="one|two">content | something more | and done</test>
> 
> I need to transform to
> 
> <test att="one|two">content <bar/> something more <bar/> and done</test>
> 
> Note that | can also occur in attribute values and, in that case, it must be kept unchanged. After reading the slide at http://common-lisp.net/project/cxml/saxoverview/pages/11.html I wrote
> 
> ===
> (defclass preproc (cxml:sax-proxy) ())
> 
> (defmethod sax:characters ((handler preproc) data)
>  (call-next-method handler (cl-ppcre:regex-replace "\\|" data "<bar/>")))
> ===
> 
> But of course, this produces an escaped string, not a tag, in the final XML.
> 
> WML> (cxml:parse "<test>content | ola</test>"
>                     (make-instance 'preproc
>                                    :chained-handler (cxml:make-string-sink)))
> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
> <test>content &lt;bar/&gt; ola</test>"
> 
> Any idea or directions?
> 
> Best,
> 
> ----
> Alexandre Rademaker
> http://arademaker.github.com
> 
> 
> 
> _______________________________________________
> Cxml-devel mailing list
> Cxml-devel at common-lisp.net
> http://mailman.common-lisp.net/cgi-bin/mailman/listinfo/cxml-devel

