[Cxml-devel] how to create tags with SAX to make an almost identity transformation
Alexandre Rademaker
arademaker at gmail.com
Mon Nov 3 16:33:45 UTC 2014
Thank you very much Russ! It works as expected! I have one last question. Running the parser with the command:
(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let ((h (make-instance 'preproc
                          :chained-handler (cxml:make-character-stream-sink out))))
    (cxml:parse #P"harem.xml" h :validate t)))
where the file harem.xml begins with (see the doctype):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd">
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
<DOC DOCID="H2-dftre765">
<p>...
the command produces in the teste.xml output file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd"<!ELEMENT EM #PCDATA>
<!ATTLIST EM ID CDATA #REQUIRED>
<!ATTLIST EM CATEG CDATA #IMPLIED>
<!ATTLIST EM TIPO CDATA #IMPLIED>
<!ATTLIST EM COMENT CDATA #IMPLIED>
<!ATTLIST EM SUBTIPO CDATA #IMPLIED>
<!ELEMENT ALT (#PCDATA|EM)*>
<!ELEMENT OMITIDO (#PCDATA|EM|ALT|p)*>
<!ELEMENT colHAREM (DOC)*>
<!ATTLIST colHAREM versao CDATA #REQUIRED>
<!ELEMENT p (#PCDATA|EM|OMITIDO|ALT)*>
<!ATTLIST p xml:space (default|preserve) "default">
<!ELEMENT DOC (#PCDATA|p|OMITIDO)*>
<!ATTLIST DOC DOCID CDATA #REQUIRED>
>
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
...
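That is, the handler copies the DTD declarations into the output, but incorrectly: they are not wrapped in the [ ] that delimit an internal subset, and the first declaration follows "harem.dtd" directly. Is this a bug in the library or in my code?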
Thank you very much for this additional help!
Best,
----
Alexandre Rademaker
http://arademaker.github.com
On Nov 3, 2014, at 1:35 PM, Russ Tyndall <russ at acceleration.net> wrote:
Howdy,
You will need to issue sax:start-element and sax:end-element calls instead of doing a string replace. Essentially, you replace the single sax:characters call with a series of sax:characters and element calls.
EG:
(defclass preproc (cxml:sax-proxy) ())

(defmethod sax:characters ((handler preproc) data)
  (let ((chunks (cl-ppcre:split "\\|" data)))
    (if (= 1 (length chunks))
        ;; Only one chunk: forward the text unchanged.
        (call-next-method)
        ;; Otherwise emit an empty <bar/> element between consecutive chunks.
        (loop for c in chunks
              for first? = t then nil
              do (unless first?
                   (sax:start-element handler nil nil "bar" nil)
                   (sax:end-element handler nil nil "bar"))
                 (sax:characters handler c)))))

(cxml:parse "<test>content | ola</test>"
            (make-instance 'preproc
                           :chained-handler (cxml:make-string-sink)))
=>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test>content <bar/> ola</test>"
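A caveat on the split-based method above: by default cl-ppcre:split drops trailing empty strings, so a text chunk that ends in | (for example "a|") yields a single chunk and is forwarded unchanged. The sketch below is one possible way around that, using only standard Common Lisp. The names map-bar-segments and bar-events are hypothetical helpers introduced here for illustration; they are not part of cxml or cl-ppcre.

```lisp
;; Hypothetical helper (not part of cxml or cl-ppcre).
;; Walks DATA left to right, calling ON-TEXT with each non-empty run of
;; characters that contains no |, and ON-BAR once for every | found.
(defun map-bar-segments (data on-text on-bar)
  (loop with start = 0
        for pos = (position #\| data :start start)
        do (let ((end (or pos (length data))))
             (when (< start end)
               (funcall on-text (subseq data start end))))
           (if pos
               (progn (funcall on-bar)
                      (setf start (1+ pos)))
               (return))))

;; Hypothetical demo: record the sequence of callbacks as a list.
(defun bar-events (data)
  (let ((events '()))
    (map-bar-segments data
                      (lambda (text) (push (list :text text) events))
                      (lambda () (push :bar events)))
    (nreverse events)))

(bar-events "content | more |")
;; => ((:TEXT "content ") :BAR (:TEXT " more ") :BAR)
```

Inside the sax:characters method for preproc, on-text would presumably forward the text with (call-next-method handler text) and on-bar would issue the sax:start-element / sax:end-element pair shown above. That wiring is an assumption and has not been tested against cxml here.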
Hope this helps,
Russ Tyndall
Acceleration.net
On 11/03/2014 07:47 AM, Alexandre Rademaker wrote:
> Hi,
>
> I need to replace every | character with a <bar/> tag in all text blocks of a big XML file. That is, whenever I find
>
> <test att="one|two">content | something more | and done</test>
>
> I need to transform it to
>
> <test att="one|two">content <bar/> something more <bar/> and done</test>
>
> Note that | can also occur in attribute values and, in that case, it must be kept unchanged. Reading the slide http://common-lisp.net/project/cxml/saxoverview/pages/11.html I wrote
>
> ===
> (defclass preproc (cxml:sax-proxy) ())
>
> (defmethod sax:characters ((handler preproc) data)
>   (call-next-method handler (cl-ppcre:regex-replace "\\|" data "<bar/>")))
> ===
>
> But of course, it produces an escaped string, not a tag, in the final XML.
>
> WML> (cxml:parse "<test>content | ola</test>"
>                   (make-instance 'preproc
>                                  :chained-handler (cxml:make-string-sink)))
> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
> <test>content &lt;bar/&gt; ola</test>"
>
> Any ideas or directions?
>
> Best,
>
> ----
> Alexandre Rademaker
> http://arademaker.github.com
>
>
>
> _______________________________________________
> Cxml-devel mailing list
> Cxml-devel at common-lisp.net
> http://mailman.common-lisp.net/cgi-bin/mailman/listinfo/cxml-devel