[s-xml-cvs] CVS update: public_html/index.html

Sven Van Caekenberghe scaekenberghe at common-lisp.net
Fri Jun 11 11:15:34 UTC 2004


Update of /project/s-xml/cvsroot/public_html
In directory common-lisp.net:/tmp/cvs-serv8363

Modified Files:
	index.html 
Log Message:
reworked the example(s) section with better and more code
added release history section

Date: Fri Jun 11 04:15:34 2004
Author: scaekenberghe

Index: public_html/index.html
diff -u public_html/index.html:1.2 public_html/index.html:1.3
--- public_html/index.html:1.2	Tue Jun  8 01:23:18 2004
+++ public_html/index.html	Fri Jun 11 04:15:34 2004
@@ -46,32 +46,160 @@
   Using a DOM parser is easier, but usually less efficient: see the next sections. To use the event-based API of the parser, you call the function start-parse-xml on a stream, specifying 3 hook functions:
 </p>
 <ul>
-  <li><b>new-element-hook</b> <tt>(name attributes seed) => seed</tt><br/> Called when the parser enters a new element. The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords and attribute values as strings) of the element are passed in, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value to be used for the next sibling.</li>
-  <li><b>finish-element-hook</b> <tt>(name attributes parent-seed seed) => seed</tt><br/> Called when the parser leaves an element. The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords and attribute values as strings) of the element are passed in, as well as the parent-seed directly from the parent, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value as result to passed to the parent.</li>
-  <li><b>text-hook</b> <tt>(string seed) => seed</tt><br/> Called when the parser finds text as contents. The string of the text encountered is passed in, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value to be used for the next sibling or as result passed to the parent.</li>
+  <li><b>new-element-hook</b> <tt>(name attributes seed) => seed</tt><br/> 
+    Called when the parser enters a new element. 
+    The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords 
+    and attribute values as strings) of the element are passed in, 
+    as well as the seed from the previous element (either the last encountered sibling or the parent). 
+    The hook must return a seed value to be passed to the first child element 
+    or directly to finish-element-hook (when there are no children).</li>
+  <li><b>finish-element-hook</b> <tt>(name attributes parent-seed seed) => seed</tt><br/> 
+    Called when the parser leaves an element. 
+    The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords 
+    and attribute values as strings) of the element are passed in, 
+    as well as the parent-seed, the seed passed to us when this element started,
+    i.e. passed to our corresponding new-element-hook,
+    as well as the seed from the previous element (either the last encountered sibling or the parent). 
+    The hook must return the final seed value for this element
+    to be passed to the next sibling or to the parent (when there are no more children).</li>
+  <li><b>text-hook</b> <tt>(string seed) => seed</tt><br/> 
+    Called when the parser finds text as contents. 
+    The string of the text encountered is passed in, as well as the seed from the previous element
+    (either the last encountered sibling or the parent).
+    The hook must return the final seed value for this element
+    to be passed to the next sibling or to the parent (when there are no more children).</li>
 </ul>      
 <p>
-  As an example, consider the next function that will echo an XML input stream to an output stream:
-</p>
-<pre>(defun echo-xml (in out)
-  (start-parse-xml
-   in
-   (make-instance 'xml-parser-state
-		  :new-element-hook #'(lambda (name attributes seed)
-					(declare (ignore seed))
-					(format out "<~a~:{ ~a='~a'~}>"
-						name
-						(mapcar #'(lambda (p) (list (car p) (cdr p)))
-							attributes)))
-		  :finish-element-hook #'(lambda (name attributes parent-seed seed)
-					   (declare (ignore attributes parent-seed seed))
-					   (format out "</~a>" name))
-		  :text-hook #'(lambda (string seed)
-				 (declare (ignore seed))
-				 (princ string out)))))</pre>
-<p>
-  The seed parameters and return values are not used here. As a simplification, we just print text and attribute values, in real code we would have to use print-xml-string to properly escape special characters. Have a look at the implementations of the different DOM representations, as well as the XML-RPC code and the CLOS serialization code elsewhere for more real-world examples.
+  As an example, consider the following tracer that shows how the different hooks are called:
 </p>
+<pre>(defun trace-xml-new-element-hook (name attributes seed)
+  (let ((new-seed (cons (1+ (car seed)) (1+ (cdr seed)))))
+    (trace-xml-log (car seed) 
+                   "(new-element :name ~s :attributes ~:[()~;~:*~s~] :seed ~s) => ~s" 
+                   name attributes seed new-seed)
+    new-seed))
+
+(defun trace-xml-finish-element-hook (name attributes parent-seed seed)
+  (let ((new-seed (cons (1- (car seed)) (1+ (cdr seed)))))
+    (trace-xml-log (car parent-seed)
+                   "(finish-element :name ~s :attributes ~:[()~;~:*~s~] :parent-seed ~s :seed ~s) => ~s" 
+                   name attributes parent-seed seed new-seed)
+    new-seed))
+
+(defun trace-xml-text-hook (string seed)
+  (let ((new-seed (cons (car seed) (1+ (cdr seed)))))
+    (trace-xml-log (car seed) 
+                   "(text :string ~s :seed ~s) => ~s" 
+                   string seed new-seed)
+    new-seed))
+
+(defun trace-xml (in)
+  "Parse and trace a toplevel XML element from stream in"
+  (start-parse-xml in
+		   (make-instance 'xml-parser-state
+				  :seed (cons 0 0) 
+                                  ;; seed car is xml element nesting level
+                                  ;; seed cdr is ever increasing from element to element
+				  :new-element-hook #'trace-xml-new-element-hook
+                                  :finish-element-hook #'trace-xml-finish-element-hook
+				  :text-hook #'trace-xml-text-hook)))</pre>
+<p>
+  This is the output of the tracer on two small XML documents, the seed is a CONS that keeps track of the nesting level in its CAR and of its flow through the hooks with an ever increasing number is its CDR:
+</p>
+<pre>S-XML 31 > (with-input-from-string (in "<FOO X='10' Y='20'><P>Text</P><BAR/><H1><H2></H2></H1></FOO>") (trace-xml in))
+(new-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :seed (0 . 0)) => (1 . 1)
+  (new-element :name :P :attributes () :seed (1 . 1)) => (2 . 2)
+    (text :string "Text" :seed (2 . 2)) => (2 . 3)
+  (finish-element :name :P :attributes () :parent-seed (1 . 1) :seed (2 . 3)) => (1 . 4)
+  (new-element :name :BAR :attributes () :seed (1 . 4)) => (2 . 5)
+  (finish-element :name :BAR :attributes () :parent-seed (1 . 4) :seed (2 . 5)) => (1 . 6)
+  (new-element :name :H1 :attributes () :seed (1 . 6)) => (2 . 7)
+    (new-element :name :H2 :attributes () :seed (2 . 7)) => (3 . 8)
+    (finish-element :name :H2 :attributes () :parent-seed (2 . 7) :seed (3 . 8)) => (2 . 9)
+  (finish-element :name :H1 :attributes () :parent-seed (1 . 6) :seed (2 . 9)) => (1 . 10)
+(finish-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :parent-seed (0 . 0) :seed (1 . 10)) => (0 . 11)
+(0 . 11)
+
+S-XML 32 > (with-input-from-string (in "<FOO><UL><LI>1</LI><LI>2</LI><LI>3</LI></UL></FOO>") (trace-xml in))
+(new-element :name :FOO :attributes () :seed (0 . 0)) => (1 . 1)
+  (new-element :name :UL :attributes () :seed (1 . 1)) => (2 . 2)
+    (new-element :name :LI :attributes () :seed (2 . 2)) => (3 . 3)
+      (text :string "1" :seed (3 . 3)) => (3 . 4)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 2) :seed (3 . 4)) => (2 . 5)
+    (new-element :name :LI :attributes () :seed (2 . 5)) => (3 . 6)
+      (text :string "2" :seed (3 . 6)) => (3 . 7)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 5) :seed (3 . 7)) => (2 . 8)
+    (new-element :name :LI :attributes () :seed (2 . 8)) => (3 . 9)
+      (text :string "3" :seed (3 . 9)) => (3 . 10)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 8) :seed (3 . 10)) => (2 . 11)
+  (finish-element :name :UL :attributes () :parent-seed (1 . 1) :seed (2 . 11)) => (1 . 12)
+(finish-element :name :FOO :attributes () :parent-seed (0 . 0) :seed (1 . 12)) => (0 . 13)
+(0 . 13)</pre>
+<p>
+  The following example counts tags, attributes and characters:
+</p>
+<pre>(defclass count-xml-seed ()
+  ((elements :initform 0)
+   (attributes :initform 0)
+   (characters :initform 0)))
+
+(defun count-xml-new-element-hook (name attributes seed)
+  (declare (ignore name))
+  (incf (slot-value seed 'elements))
+  (incf (slot-value seed 'attributes) (length attributes))
+  seed)
+
+(defun count-xml-text-hook (string seed)
+  (incf (slot-value seed 'characters) (length string))
+  seed)
+  
+(defun count-xml (in)
+  "Parse a toplevel XML element from stream in, counting elements, attributes and characters"
+  (start-parse-xml in
+		   (make-instance 'xml-parser-state
+				  :seed (make-instance 'count-xml-seed)
+				  :new-element-hook #'count-xml-new-element-hook
+				  :text-hook #'count-xml-text-hook)))
+
+(defun count-xml-file (pathname)
+  "Parse XMl from the file at pathname, counting elements, attributes and characters"
+  (with-open-file (in pathname)
+    (let ((result (count-xml in)))
+      (with-slots (elements attributes characters) result
+        (format t 
+  "~a contains ~d XML elements, ~d attributes and ~d characters.~%"
+                pathname elements attributes characters)))))</pre>
+<p>
+  This example removes XML markup:
+</p>
+<pre>(defun remove-xml-markup (in)
+  (let* ((state (make-instance 'xml-parser-state
+                              :text-hook #'(lambda (string seed) (cons string seed))))
+         (result (start-parse-xml in state)))
+    (apply #'concatenate 'string (nreverse result))))</pre>
+<p>
+  The next example is from the xml-element struct DOM implementation, where the SSAX parser hook functions are building the actual DOM:
+</p>
+<pre>(defun standard-new-element-hook (name attributes seed)
+  (declare (ignore name attributes seed))
+  '())
+
+(defun standard-finish-element-hook (name attributes parent-seed seed)
+  (let ((xml-element (make-xml-element :name name
+				       :attributes attributes
+				       :children (nreverse seed))))
+    (cons xml-element parent-seed)))
+
+(defun standard-text-hook (string seed)
+  (cons string seed))
+
+(defmethod parse-xml-dom (stream (output-type (eql :xml-struct)))
+  (car (start-parse-xml stream
+			(make-instance 'xml-parser-state
+				       :new-element-hook #'standard-new-element-hook
+				       :finish-element-hook #'standard-finish-element-hook
+				       :text-hook #'standard-text-hook))))
+</pre>
 <p>
   The parse state can be used to specify the initial seed value (nil by default), and the set of known entities (the 5 standard entities (lt, gt, amp, qout, apos) and nbps by default).
 </p>
@@ -118,7 +246,17 @@
   Tag and attribute names are converted to keywords. Note that XML is case-sensitive, hence the fact that Common Lisp has to resort to the special literal symbol syntax.
 </p>
 
-  <p>CVS version $Id: index.html,v 1.2 2004/06/08 08:23:18 scaekenberghe Exp $</p>
+<h3>Release History</h3>
+
+<ul>
+  <li>today: project moved to common-lisp.net, renamed to s-xml, added examples counter, tracer and remove-markup, improved documentation</li>
+  <li>release 3, Januari 13, 2004: added ASDF systems, optimized print-string-xml</li>
+  <li>release 2, June 10, 2003: added echo-xml function, we are no longer taking the car when the last seed is returned from start-parse-xml</li>
+  <li>release 1, May 25, 2003: first public release of working code, tested on <a href="http://openmcl.clozure.com/">OpenMCL</a>, rewritten to be event-based, to improve efficiency and to optionally use different DOM representations, more documentation</li>
+  <li>release 0, end of 2002: as part of an XML-RPC implementation</li>
+</ul>
+
+  <p>CVS version $Id: index.html,v 1.3 2004/06/11 11:15:34 scaekenberghe Exp $</p>
 
   <div class="footer">
     <p>Back to <a href="http://common-lisp.net/">Common-lisp.net</a>.</p>  





More information about the S-xml-cvs mailing list