From scaekenberghe at common-lisp.net Wed Jun 9 08:44:46 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Wed, 09 Jun 2004 01:44:46 -0700 Subject: [s-xml-cvs] CVS update: Module imported: s-xml-rpc Message-ID: Update of /project/s-xml/cvsroot/s-xml-rpc In directory common-lisp.net:/tmp/cvs-serv9860 Log Message: Project Creation Status: Vendor Tag: svc Release Tags: start N s-xml-rpc/Makefile N s-xml-rpc/s-xml-rpc.asd N s-xml-rpc/src/aserve.lisp N s-xml-rpc/src/base64.lisp N s-xml-rpc/src/package.lisp N s-xml-rpc/src/sysdeps.lisp N s-xml-rpc/src/validator1.lisp N s-xml-rpc/src/xml-rpc.lisp N s-xml-rpc/test/all-tests.lisp N s-xml-rpc/test/test-base64.lisp N s-xml-rpc/test/test-xml-rpc.lisp N s-xml-rpc/test/test.b64 No conflicts created by this import Date: Wed Jun 9 01:44:46 2004 Author: scaekenberghe New module s-xml-rpc added From scaekenberghe at common-lisp.net Wed Jun 9 09:01:01 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Wed, 09 Jun 2004 02:01:01 -0700 Subject: [s-xml-cvs] CVS update: Module imported: public_html Message-ID: Update of /project/s-xml/cvsroot/public_html In directory common-lisp.net:/tmp/cvs-serv20014 Log Message: Project Creation Status: Vendor Tag: svc Release Tags: start C public_html/index.html U public_html/style.css 1 conflicts created by this import. Use the following command to help the merge: cvs checkout -jsvc:yesterday -jsvc public_html Date: Wed Jun 9 02:01:01 2004 Author: scaekenberghe New module public_html added From scaekenberghe at common-lisp.net Fri Jun 11 08:20:58 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Fri, 11 Jun 2004 01:20:58 -0700 Subject: [s-xml-cvs] CVS update: s-xml/test/tracer.lisp Message-ID: Update of /project/s-xml/cvsroot/s-xml/test In directory common-lisp.net:/tmp/cvs-serv21504/test Added Files: tracer.lisp Log Message: a new example to help understand the SAX api (hook functions) Date: Fri Jun 11 01:20:58 2004 Author: scaekenberghe From scaekenberghe at common-lisp.net Fri Jun 11 08:22:47 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Fri, 11 Jun 2004 01:22:47 -0700 Subject: [s-xml-cvs] CVS update: s-xml/src/xml.lisp Message-ID: Update of /project/s-xml/cvsroot/s-xml/src In directory common-lisp.net:/tmp/cvs-serv23378/src Modified Files: xml.lisp Log Message: changed hook function comments Date: Fri Jun 11 01:22:47 2004 Author: scaekenberghe Index: s-xml/src/xml.lisp diff -u s-xml/src/xml.lisp:1.1.1.1 s-xml/src/xml.lisp:1.2 --- s-xml/src/xml.lisp:1.1.1.1 Mon Jun 7 11:49:58 2004 +++ s-xml/src/xml.lisp Fri Jun 11 01:22:47 2004 @@ -1,6 +1,6 @@ ;;;; -*- mode: lisp -*- ;;;; -;;;; $Id: xml.lisp,v 1.1.1.1 2004/06/07 18:49:58 scaekenberghe Exp $ +;;;; $Id: xml.lisp,v 1.2 2004/06/11 08:22:47 scaekenberghe Exp $ ;;;; ;;;; This is a Common Lisp implementation of a very basic XML parser. ;;;; The parser is non-validating and not at all complete (no CDATA). @@ -161,9 +161,9 @@ seed)) (finish-element-hook :documentation "Called when element ends" ;; Handle the end of an xml element with name and attributes, - ;; receiving the seed that was passed to our parent, - ;; receiving seed from last child/contents (or from new-level-hook for empty tags), - ;; return final seed for this element to higher levels + ;; receiving the seed that was passed by our parent, + ;; receiving seed from last child/contents + ;; return final seed for this element :accessor get-finish-element-hook :initarg :finish-element-hook :initform #'(lambda (name attributes parent-seed seed) @@ -171,7 +171,7 @@ seed)) (text-hook :documentation "Called when text is found" ;; Handle text in string, found as child/contents, - ;; receiving seed from parent element, return seed" + ;; receiving seed from parent element, return final seed for this element :accessor get-text-hook :initarg :text-hook :initform #'(lambda (string seed) From scaekenberghe at common-lisp.net Fri Jun 11 11:14:43 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Fri, 11 Jun 2004 04:14:43 -0700 Subject: [s-xml-cvs] CVS update: s-xml/src/xml.lisp Message-ID: Update of /project/s-xml/cvsroot/s-xml/src In directory common-lisp.net:/tmp/cvs-serv8028/src Modified Files: xml.lisp Log Message: further cleanup of examples and hook documentation Date: Fri Jun 11 04:14:42 2004 Author: scaekenberghe Index: s-xml/src/xml.lisp diff -u s-xml/src/xml.lisp:1.2 s-xml/src/xml.lisp:1.3 --- s-xml/src/xml.lisp:1.2 Fri Jun 11 01:22:47 2004 +++ s-xml/src/xml.lisp Fri Jun 11 04:14:42 2004 @@ -1,6 +1,6 @@ ;;;; -*- mode: lisp -*- ;;;; -;;;; $Id: xml.lisp,v 1.2 2004/06/11 08:22:47 scaekenberghe Exp $ +;;;; $Id: xml.lisp,v 1.3 2004/06/11 11:14:42 scaekenberghe Exp $ ;;;; ;;;; This is a Common Lisp implementation of a very basic XML parser. ;;;; The parser is non-validating and not at all complete (no CDATA). @@ -151,9 +151,10 @@ :accessor get-mini-buffer :initform (make-extendable-string)) (new-element-hook :documentation "Called when new element starts" - ;; Handle a new xml element with name and attributes, - ;; receiving seed from parent if any or top level - ;; return seed to be used for children/contents" + ;; Handle the start of a new xml element with name and attributes, + ;; receiving seed from previous element (sibling or parent) + ;; return seed to be used for first child (content) + ;; or directly to finish-element-hook :accessor get-new-element-hook :initarg :new-element-hook :initform #'(lambda (name attributes seed) @@ -161,17 +162,20 @@ seed)) (finish-element-hook :documentation "Called when element ends" ;; Handle the end of an xml element with name and attributes, - ;; receiving the seed that was passed by our parent, - ;; receiving seed from last child/contents - ;; return final seed for this element + ;; receiving parent-seed, the seed passed to us when this element started, + ;; i.e. passed to our corresponding new-element-hook + ;; and receiving seed from last child (content) + ;; or directly from new-element-hook + ;; return final seed for this element to next element (sibling or parent) :accessor get-finish-element-hook :initarg :finish-element-hook :initform #'(lambda (name attributes parent-seed seed) (declare (ignore name attributes parent-seed)) seed)) (text-hook :documentation "Called when text is found" - ;; Handle text in string, found as child/contents, - ;; receiving seed from parent element, return final seed for this element + ;; Handle text in string, found as contents, + ;; receiving seed from previous element (sibling or parent), + ;; return final seed for this element to next element (sibling or parent) :accessor get-text-hook :initarg :text-hook :initform #'(lambda (string seed) From scaekenberghe at common-lisp.net Fri Jun 11 11:14:43 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Fri, 11 Jun 2004 04:14:43 -0700 Subject: [s-xml-cvs] CVS update: s-xml/test/remove-markup.lisp s-xml/test/counter.lisp s-xml/test/tracer.lisp Message-ID: Update of /project/s-xml/cvsroot/s-xml/test In directory common-lisp.net:/tmp/cvs-serv8028/test Modified Files: counter.lisp tracer.lisp Added Files: remove-markup.lisp Log Message: further cleanup of examples and hook documentation Date: Fri Jun 11 04:14:43 2004 Author: scaekenberghe Index: s-xml/test/counter.lisp diff -u s-xml/test/counter.lisp:1.1.1.1 s-xml/test/counter.lisp:1.2 --- s-xml/test/counter.lisp:1.1.1.1 Mon Jun 7 11:49:59 2004 +++ s-xml/test/counter.lisp Fri Jun 11 04:14:43 2004 @@ -1,8 +1,8 @@ ;;;; -*- mode: lisp -*- ;;;; -;;;; $Id: counter.lisp,v 1.1.1.1 2004/06/07 18:49:59 scaekenberghe Exp $ +;;;; $Id: counter.lisp,v 1.2 2004/06/11 11:14:43 scaekenberghe Exp $ ;;;; -;;;; A simple SAX counter example that can be used as a performance test +;;;; A simple SSAX counter example that can be used as a performance test ;;;; ;;;; Copyright (C) 2004 Sven Van Caekenberghe, Beta Nine BVBA. ;;;; @@ -41,7 +41,7 @@ (let ((result (count-xml in))) (with-slots (elements attributes characters) result (format t - "~a countains ~d XML elements, ~d attributes and ~d characters.~%" + "~a contains ~d XML elements, ~d attributes and ~d characters.~%" pathname elements attributes characters))))) ;;;; eof Index: s-xml/test/tracer.lisp diff -u s-xml/test/tracer.lisp:1.1 s-xml/test/tracer.lisp:1.2 --- s-xml/test/tracer.lisp:1.1 Fri Jun 11 01:20:58 2004 +++ s-xml/test/tracer.lisp Fri Jun 11 04:14:43 2004 @@ -1,8 +1,8 @@ ;;;; -*- mode: lisp -*- ;;;; -;;;; $Id: tracer.lisp,v 1.1 2004/06/11 08:20:58 scaekenberghe Exp $ +;;;; $Id: tracer.lisp,v 1.2 2004/06/11 11:14:43 scaekenberghe Exp $ ;;;; -;;;; A simple SAX tracer example that can be used to understand how the hooks are called +;;;; A simple SSAX tracer example that can be used to understand how the hooks are called ;;;; ;;;; Copyright (C) 2004 Sven Van Caekenberghe, Beta Nine BVBA. ;;;; @@ -18,28 +18,33 @@ (terpri *standard-output*)) (defun trace-xml-new-element-hook (name attributes seed) - (trace-xml-log (car seed) - "(new-element :name ~s :attributes ~:[()~;~:*s~] :seed ~s)" - name attributes seed) - (cons (1+ (car seed)) (1+ (cdr seed)))) + (let ((new-seed (cons (1+ (car seed)) (1+ (cdr seed))))) + (trace-xml-log (car seed) + "(new-element :name ~s :attributes ~:[()~;~:*~s~] :seed ~s) => ~s" + name attributes seed new-seed) + new-seed)) (defun trace-xml-finish-element-hook (name attributes parent-seed seed) - (trace-xml-log (car parent-seed) - "(finish-element :name ~s :attributes ~:[()~;~:*s~] :parent-seed ~s :seed ~s)" - name attributes parent-seed seed) - (cons (1- (car seed)) (1+ (cdr seed)))) + (let ((new-seed (cons (1- (car seed)) (1+ (cdr seed))))) + (trace-xml-log (car parent-seed) + "(finish-element :name ~s :attributes ~:[()~;~:*~s~] :parent-seed ~s :seed ~s) => ~s" + name attributes parent-seed seed new-seed) + new-seed)) (defun trace-xml-text-hook (string seed) - (trace-xml-log (car seed) - "(text :string ~s :seed ~s)" - string seed) - seed) + (let ((new-seed (cons (car seed) (1+ (cdr seed))))) + (trace-xml-log (car seed) + "(text :string ~s :seed ~s) => ~s" + string seed new-seed) + new-seed)) (defun trace-xml (in) "Parse and trace a toplevel XML element from stream in" (start-parse-xml in (make-instance 'xml-parser-state - :seed (cons 0 0) + :seed (cons 0 0) + ;; seed car is xml element nesting level + ;; seed cdr is ever increasing from element to element :new-element-hook #'trace-xml-new-element-hook :finish-element-hook #'trace-xml-finish-element-hook :text-hook #'trace-xml-text-hook))) From scaekenberghe at common-lisp.net Fri Jun 11 11:15:34 2004 From: scaekenberghe at common-lisp.net (Sven Van Caekenberghe) Date: Fri, 11 Jun 2004 04:15:34 -0700 Subject: [s-xml-cvs] CVS update: public_html/index.html Message-ID: Update of /project/s-xml/cvsroot/public_html In directory common-lisp.net:/tmp/cvs-serv8363 Modified Files: index.html Log Message: reworked the example(s) section with better and more code added release history section Date: Fri Jun 11 04:15:34 2004 Author: scaekenberghe Index: public_html/index.html diff -u public_html/index.html:1.2 public_html/index.html:1.3 --- public_html/index.html:1.2 Tue Jun 8 01:23:18 2004 +++ public_html/index.html Fri Jun 11 04:15:34 2004 @@ -46,32 +46,160 @@ Using a DOM parser is easier, but usually less efficient: see the next sections. To use the event-based API of the parser, you call the function start-parse-xml on a stream, specifying 3 hook functions:

    -
  • new-element-hook (name attributes seed) => seed
    Called when the parser enters a new element. The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords and attribute values as strings) of the element are passed in, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value to be used for the next sibling.
  • -
  • finish-element-hook (name attributes parent-seed seed) => seed
    Called when the parser leaves an element. The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords and attribute values as strings) of the element are passed in, as well as the parent-seed directly from the parent, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value as result to passed to the parent.
  • -
  • text-hook (string seed) => seed
    Called when the parser finds text as contents. The string of the text encountered is passed in, as well as the seed from the parent or from the last encountered sibling. The hook must return a new seed value to be used for the next sibling or as result passed to the parent.
  • +
  • new-element-hook (name attributes seed) => seed
    + Called when the parser enters a new element. + The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords + and attribute values as strings) of the element are passed in, + as well as the seed from the previous element (either the last encountered sibling or the parent). + The hook must return a seed value to be passed to the first child element + or directly to finish-element-hook (when there are no children).
  • +
  • finish-element-hook (name attributes parent-seed seed) => seed
    + Called when the parser leaves an element. + The name of the element (tag) and the attributes (an unordered dotted pair list of attribute names as keywords + and attribute values as strings) of the element are passed in, + as well as the parent-seed, the seed passed to us when this element started, + i.e. passed to our corresponding new-element-hook, + as well as the seed from the previous element (either the last encountered sibling or the parent). + The hook must return the final seed value for this element + to be passed to the next sibling or to the parent (when there are no more children).
  • +
  • text-hook (string seed) => seed
    + Called when the parser finds text as contents. + The string of the text encountered is passed in, as well as the seed from the previous element + (either the last encountered sibling or the parent). + The hook must return the final seed value for this element + to be passed to the next sibling or to the parent (when there are no more children).

- As an example, consider the next function that will echo an XML input stream to an output stream: -

-
(defun echo-xml (in out)
-  (start-parse-xml
-   in
-   (make-instance 'xml-parser-state
-		  :new-element-hook #'(lambda (name attributes seed)
-					(declare (ignore seed))
-					(format out "<~a~:{ ~a='~a'~}>"
-						name
-						(mapcar #'(lambda (p) (list (car p) (cdr p)))
-							attributes)))
-		  :finish-element-hook #'(lambda (name attributes parent-seed seed)
-					   (declare (ignore attributes parent-seed seed))
-					   (format out "</~a>" name))
-		  :text-hook #'(lambda (string seed)
-				 (declare (ignore seed))
-				 (princ string out)))))
-

- The seed parameters and return values are not used here. As a simplification, we just print text and attribute values, in real code we would have to use print-xml-string to properly escape special characters. Have a look at the implementations of the different DOM representations, as well as the XML-RPC code and the CLOS serialization code elsewhere for more real-world examples. + As an example, consider the following tracer that shows how the different hooks are called:

+
(defun trace-xml-new-element-hook (name attributes seed)
+  (let ((new-seed (cons (1+ (car seed)) (1+ (cdr seed)))))
+    (trace-xml-log (car seed) 
+                   "(new-element :name ~s :attributes ~:[()~;~:*~s~] :seed ~s) => ~s" 
+                   name attributes seed new-seed)
+    new-seed))
+
+(defun trace-xml-finish-element-hook (name attributes parent-seed seed)
+  (let ((new-seed (cons (1- (car seed)) (1+ (cdr seed)))))
+    (trace-xml-log (car parent-seed)
+                   "(finish-element :name ~s :attributes ~:[()~;~:*~s~] :parent-seed ~s :seed ~s) => ~s" 
+                   name attributes parent-seed seed new-seed)
+    new-seed))
+
+(defun trace-xml-text-hook (string seed)
+  (let ((new-seed (cons (car seed) (1+ (cdr seed)))))
+    (trace-xml-log (car seed) 
+                   "(text :string ~s :seed ~s) => ~s" 
+                   string seed new-seed)
+    new-seed))
+
+(defun trace-xml (in)
+  "Parse and trace a toplevel XML element from stream in"
+  (start-parse-xml in
+		   (make-instance 'xml-parser-state
+				  :seed (cons 0 0) 
+                                  ;; seed car is xml element nesting level
+                                  ;; seed cdr is ever increasing from element to element
+				  :new-element-hook #'trace-xml-new-element-hook
+                                  :finish-element-hook #'trace-xml-finish-element-hook
+				  :text-hook #'trace-xml-text-hook)))
+

+ This is the output of the tracer on two small XML documents, the seed is a CONS that keeps track of the nesting level in its CAR and of its flow through the hooks with an ever increasing number is its CDR: +

+
S-XML 31 > (with-input-from-string (in "<FOO X='10' Y='20'><P>Text</P><BAR/><H1><H2></H2></H1></FOO>") (trace-xml in))
+(new-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :seed (0 . 0)) => (1 . 1)
+  (new-element :name :P :attributes () :seed (1 . 1)) => (2 . 2)
+    (text :string "Text" :seed (2 . 2)) => (2 . 3)
+  (finish-element :name :P :attributes () :parent-seed (1 . 1) :seed (2 . 3)) => (1 . 4)
+  (new-element :name :BAR :attributes () :seed (1 . 4)) => (2 . 5)
+  (finish-element :name :BAR :attributes () :parent-seed (1 . 4) :seed (2 . 5)) => (1 . 6)
+  (new-element :name :H1 :attributes () :seed (1 . 6)) => (2 . 7)
+    (new-element :name :H2 :attributes () :seed (2 . 7)) => (3 . 8)
+    (finish-element :name :H2 :attributes () :parent-seed (2 . 7) :seed (3 . 8)) => (2 . 9)
+  (finish-element :name :H1 :attributes () :parent-seed (1 . 6) :seed (2 . 9)) => (1 . 10)
+(finish-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :parent-seed (0 . 0) :seed (1 . 10)) => (0 . 11)
+(0 . 11)
+
+S-XML 32 > (with-input-from-string (in "<FOO><UL><LI>1</LI><LI>2</LI><LI>3</LI></UL></FOO>") (trace-xml in))
+(new-element :name :FOO :attributes () :seed (0 . 0)) => (1 . 1)
+  (new-element :name :UL :attributes () :seed (1 . 1)) => (2 . 2)
+    (new-element :name :LI :attributes () :seed (2 . 2)) => (3 . 3)
+      (text :string "1" :seed (3 . 3)) => (3 . 4)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 2) :seed (3 . 4)) => (2 . 5)
+    (new-element :name :LI :attributes () :seed (2 . 5)) => (3 . 6)
+      (text :string "2" :seed (3 . 6)) => (3 . 7)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 5) :seed (3 . 7)) => (2 . 8)
+    (new-element :name :LI :attributes () :seed (2 . 8)) => (3 . 9)
+      (text :string "3" :seed (3 . 9)) => (3 . 10)
+    (finish-element :name :LI :attributes () :parent-seed (2 . 8) :seed (3 . 10)) => (2 . 11)
+  (finish-element :name :UL :attributes () :parent-seed (1 . 1) :seed (2 . 11)) => (1 . 12)
+(finish-element :name :FOO :attributes () :parent-seed (0 . 0) :seed (1 . 12)) => (0 . 13)
+(0 . 13)
+

+ The following example counts tags, attributes and characters: +

+
(defclass count-xml-seed ()
+  ((elements :initform 0)
+   (attributes :initform 0)
+   (characters :initform 0)))
+
+(defun count-xml-new-element-hook (name attributes seed)
+  (declare (ignore name))
+  (incf (slot-value seed 'elements))
+  (incf (slot-value seed 'attributes) (length attributes))
+  seed)
+
+(defun count-xml-text-hook (string seed)
+  (incf (slot-value seed 'characters) (length string))
+  seed)
+  
+(defun count-xml (in)
+  "Parse a toplevel XML element from stream in, counting elements, attributes and characters"
+  (start-parse-xml in
+		   (make-instance 'xml-parser-state
+				  :seed (make-instance 'count-xml-seed)
+				  :new-element-hook #'count-xml-new-element-hook
+				  :text-hook #'count-xml-text-hook)))
+
+(defun count-xml-file (pathname)
+  "Parse XMl from the file at pathname, counting elements, attributes and characters"
+  (with-open-file (in pathname)
+    (let ((result (count-xml in)))
+      (with-slots (elements attributes characters) result
+        (format t 
+  "~a contains ~d XML elements, ~d attributes and ~d characters.~%"
+                pathname elements attributes characters)))))
+

+ This example removes XML markup: +

+
(defun remove-xml-markup (in)
+  (let* ((state (make-instance 'xml-parser-state
+                              :text-hook #'(lambda (string seed) (cons string seed))))
+         (result (start-parse-xml in state)))
+    (apply #'concatenate 'string (nreverse result))))
+

+ The next example is from the xml-element struct DOM implementation, where the SSAX parser hook functions are building the actual DOM: +

+
(defun standard-new-element-hook (name attributes seed)
+  (declare (ignore name attributes seed))
+  '())
+
+(defun standard-finish-element-hook (name attributes parent-seed seed)
+  (let ((xml-element (make-xml-element :name name
+				       :attributes attributes
+				       :children (nreverse seed))))
+    (cons xml-element parent-seed)))
+
+(defun standard-text-hook (string seed)
+  (cons string seed))
+
+(defmethod parse-xml-dom (stream (output-type (eql :xml-struct)))
+  (car (start-parse-xml stream
+			(make-instance 'xml-parser-state
+				       :new-element-hook #'standard-new-element-hook
+				       :finish-element-hook #'standard-finish-element-hook
+				       :text-hook #'standard-text-hook))))
+

The parse state can be used to specify the initial seed value (nil by default), and the set of known entities (the 5 standard entities (lt, gt, amp, qout, apos) and nbps by default).

@@ -118,7 +246,17 @@ Tag and attribute names are converted to keywords. Note that XML is case-sensitive, hence the fact that Common Lisp has to resort to the special literal symbol syntax.

-

CVS version $Id: index.html,v 1.2 2004/06/08 08:23:18 scaekenberghe Exp $

+

Release History

+ +
    +
  • today: project moved to common-lisp.net, renamed to s-xml, added examples counter, tracer and remove-markup, improved documentation
  • +
  • release 3, Januari 13, 2004: added ASDF systems, optimized print-string-xml
  • +
  • release 2, June 10, 2003: added echo-xml function, we are no longer taking the car when the last seed is returned from start-parse-xml
  • +
  • release 1, May 25, 2003: first public release of working code, tested on OpenMCL, rewritten to be event-based, to improve efficiency and to optionally use different DOM representations, more documentation
  • +
  • release 0, end of 2002: as part of an XML-RPC implementation
  • +
+ +

CVS version $Id: index.html,v 1.3 2004/06/11 11:15:34 scaekenberghe Exp $