[closure-devel] How to disregard namespaces
Andrei Stebakov
lispercat at gmail.com
Fri Mar 4 18:54:00 UTC 2011
Can this be resolved at the level of (defun rename-attribute (attribute prefix uri) ...)?
That is, instead of signaling
((zerop (length uri))
 (stp-error "attribute with prefix but no URI"))
it would check a global variable such as *ignore-namespaces* and simply continue?
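A minimal sketch of the check proposed above, in portable Common Lisp and independent of cxml-stp. *ignore-namespaces* is only the variable suggested in this message, not an existing cxml-stp option, and attribute-namespace-acceptable-p is a hypothetical helper. It assumes that "continue" means reporting the inconsistency to the caller instead of signaling an error; the real rename-attribute would still have to decide whether to keep or drop such an attribute.

```lisp
(defvar *ignore-namespaces* nil
  "Hypothetical flag (from the proposal above): when true, an attribute
with a prefix but no namespace URI is tolerated instead of signaling.")

(defun attribute-namespace-acceptable-p (prefix uri)
  "Hypothetical sketch of the check in RENAME-ATTRIBUTE.
Returns T if PREFIX/URI are consistent, NIL if they are inconsistent but
*IGNORE-NAMESPACES* is true, and signals an error otherwise."
  (cond ((or (null prefix) (zerop (length prefix))) t) ; unprefixed name: fine
        ((plusp (length uri)) t)                       ; prefix bound to a URI
        (*ignore-namespaces* nil)                      ; tolerated: caller decides
        (t (error "attribute with prefix but no URI"))))
```

With the flag bound, (let ((*ignore-namespaces* t)) (attribute-namespace-acceptable-p "somens" "")) returns NIL instead of signaling.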
On Fri, Mar 4, 2011 at 1:15 PM, David Lichteblau <david at lichteblau.com> wrote:
> Quoting Andrei Stebakov (lispercat at gmail.com):
>> Say I need to parse HTML that I got from some external source, and for
>> some reason there are namespace prefixes in the text:
>>
>> (chtml:parse "<a href='someurl.com' somens:url='someurl.com'>text</a>"
>> (stp:make-builder))
>>
>> The parser chokes on the somens prefix, since it is not mapped to any URI:
>> 0: (CXML-STP:STP-ERROR "attribute with prefix but no URI")[:EXTERNAL]
>> 1: (CXML-STP:RENAME-ATTRIBUTE #<error printing object>)
>> 2: (CXML-STP:MAKE-ATTRIBUTE "someurl.com" "somens:url" "")
>> 3: ((SB-PCL::FAST-METHOD SAX:START-ELEMENT (CXML-STP-IMPL::BUILDER T
>> T T T)) ..)
>
> Indeed, something needs to be done to fix this, since chtml purports to
> fix bogus HTML without erroring out.
>
> At the moment, chtml liberally accepts these attributes for its own
> internal PT representation, but then accidentally turns PT attributes
> into HAX events (and then SAX events) without further validation.
>
> I think it might be easiest to continue allowing them in PT, but to
> change PT serialization to fix them before constructing HAX attribute
> objects.
>
> Here is a simple patch that just discards the attribute (changing its
> name would be another option). Note that the patch isn't good enough to
> commit at this point, because it introduces a dependency from chtml
> on cxml.
>
> --- a/src/parse/html-parser.lisp
> +++ b/src/parse/html-parser.lisp
> @@ -98,16 +98,20 @@
> ;;; (merge-pathnames (or pathname (pathname input))))))
> (parse-xstream xstream handler)))))
>
> +(defun good-attribute-name-p (name)
> + (and (cxml::valid-name-p name)
> + (not (or (string-equal name "xmlns")
> + (position #\: name)))))
> +
> (defun serialize-pt-attributes (plist recode)
> (loop
> for (name value) on plist by #'cddr
> - unless
> - ;; better don't emit as HAX what would be bogus as SAX anyway
> - (string-equal name "xmlns")
> + for n = #+rune-is-character (coerce (symbol-name name) 'rod)
> + #-rune-is-character (symbol-name name)
> + ;; don't emit as HAX what would be bogus as SAX anyway
> + if (good-attribute-name-p n)
> collect
> - (let* ((n #+rune-is-character (coerce (symbol-name name) 'rod)
> - #-rune-is-character (symbol-name name))
> - (v (etypecase value
> + (let ((v (etypecase value
> (symbol (coerce (string-downcase (symbol-name value)) 'rod))
> (rod (funcall recode value))
> (string (coerce value 'rod)))))
>
>
>> Is there a global variable I can set to turn off namespace
>> processing?
>> I saw a *namespace-processing* variable in another package, but it
>> doesn't seem to be relevant in this case.
>
> You could use DOM instead of STP, I suppose. DOM doesn't do these sorts
> of checks IIRC.
>
> (Personally I strongly prefer STP over DOM, but one reason for that
> preference is that STP is stricter, which is nice when actually working
> with XML.)
>
>
> d.
>
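The attribute filtering in David's patch can be tried out in isolation with the sketch below. It assumes PT attribute names are keyword symbols in a plist, as in the patch's serialize-pt-attributes loop, and it replaces the internal cxml::valid-name-p with a simple non-empty check, so it is not the actual chtml code. simple-attribute-name-p and filter-attribute-plist are hypothetical names.

```lisp
(defun simple-attribute-name-p (name)
  "Hypothetical, simplified stand-in for the patch's GOOD-ATTRIBUTE-NAME-P:
reject empty names, \"xmlns\", and any prefixed (colon-containing) name.
Unlike the patch, it does not call the internal CXML::VALID-NAME-P."
  (and (plusp (length name))
       (not (string-equal name "xmlns"))
       (not (position #\: name))))

(defun filter-attribute-plist (plist)
  "Return a fresh plist keeping only entries whose symbol names pass
SIMPLE-ATTRIBUTE-NAME-P; other entries are silently discarded."
  (loop for (name value) on plist by #'cddr
        when (simple-attribute-name-p (symbol-name name))
          append (list name value)))
```

For example, (filter-attribute-plist (list :href "someurl.com" :|somens:url| "someurl.com")) keeps only the :href entry, mirroring how the patch discards somens:url rather than renaming it.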