[closure-devel] How to disregard namespaces

David Lichteblau david at lichteblau.com
Fri Mar 4 18:15:58 UTC 2011


Quoting Andrei Stebakov (lispercat at gmail.com):
> Say I need to parse html that I got from some external source and for
> some reason there are namespaces in the text:
> 
> (chtml:parse "<a href='someurl.com' somens:url='someurl.com'>text</a>"
>  (stp:make-builder))
> 
> The parser will choke on somens namespace since it's not mapped to any url:
>   0: (CXML-STP:STP-ERROR "attribute with prefix but no URI")[:EXTERNAL]
>   1: (CXML-STP:RENAME-ATTRIBUTE #<error printing object>)
>   2: (CXML-STP:MAKE-ATTRIBUTE "someurl.com" "somens:url" "")
>   3: ((SB-PCL::FAST-METHOD SAX:START-ELEMENT (CXML-STP-IMPL::BUILDER T
> T T T)) ..)

Indeed, something needs to be done to fix this, since chtml purports to
fix bogus html without erroring out.

At the moment, chtml liberally accepts these attributes for its own
internal PT representation, but then accidentally turns PT attributes
into HAX events (and then SAX events) without further validation.

I think it might be easiest to continue allowing them in PT, but to
change PT serialization to fix them before constructing HAX attribute
objects.

Here is a simple patch that just discards the attribute (changing its
name would be another option).  Note that the patch isn't good enough to
commit at this point, because it introduces a dependency from chtml
to cxml.

--- a/src/parse/html-parser.lisp
+++ b/src/parse/html-parser.lisp
@@ -98,16 +98,20 @@
 ;;; 		    (merge-pathnames (or pathname (pathname input))))))
        (parse-xstream xstream handler)))))
 
+(defun good-attribute-name-p (name)
+  (and (cxml::valid-name-p name)
+       (not (or (string-equal name "xmlns")
+		(position #\: name)))))
+
 (defun serialize-pt-attributes (plist recode)
   (loop
      for (name value) on plist by #'cddr
-     unless
-       ;; better don't emit as HAX what would be bogus as SAX anyway
-       (string-equal name "xmlns")
+     for n = #+rune-is-character (coerce (symbol-name name) 'rod)
+	     #-rune-is-character (symbol-name name)
+     ;; don't emit as HAX what would be bogus as SAX anyway
+     if (good-attribute-name-p n)
      collect
-     (let* ((n #+rune-is-character (coerce (symbol-name name) 'rod)
-	       #-rune-is-character (symbol-name name))
-	    (v (etypecase value
+     (let ((v (etypecase value
 		 (symbol (coerce (string-downcase (symbol-name value)) 'rod))
 		 (rod (funcall recode value))
 		 (string (coerce value 'rod)))))
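
To illustrate how the cxml dependency might be avoided, here is a rough
sketch of the same predicate without the call to cxml::valid-name-p.
The two helper functions are hypothetical and not part of chtml or cxml.
They assume that names arrive as Lisp strings and accept ASCII names
only, which is much narrower than the real XML Name production, so
treat this as an illustration rather than a drop-in replacement.

```lisp
;; Hypothetical stand-ins for CXML::VALID-NAME-P (ASCII only).
;; The real XML Name production allows many more characters.
(defun simple-name-start-char-p (char)
  "True if CHAR is an ASCII letter or an underscore."
  (and (find char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_")
       t))

(defun simple-name-char-p (char)
  "True if CHAR may appear after the first character of a name."
  (or (simple-name-start-char-p char)
      (and (find char "0123456789-.") t)))

(defun good-attribute-name-p (name)
  "True if NAME is a non-empty, prefix-free name other than xmlns.
A colon is rejected because it is not in the allowed character set."
  (and (plusp (length name))
       (simple-name-start-char-p (char name 0))
       (every #'simple-name-char-p name)
       (not (string-equal name "xmlns"))))
```

With these definitions, (good-attribute-name-p "href") is true, while
"somens:url" and "xmlns" are both rejected, so the attribute from the
example above would be dropped before it reaches the STP builder.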


> Is there a way to specify some global variable to turn off namespace
> processing?
> I saw *namespace-processing* variable in some other package but it
> doesn't seem to be relevant in this case.

You could use DOM instead of STP, I suppose.  DOM doesn't do these sorts
of checks IIRC.

(Personally I strongly prefer STP over DOM, but one reason for that
preference is that STP is stricter, which is nice when actually working
with XML.)


d.



