From lispercat at gmail.com Fri Mar 4 17:38:56 2011
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 4 Mar 2011 12:38:56 -0500
Subject: [closure-devel] How to disregard namespaces
Message-ID:

Say I need to parse html that I got from some external source and for
some reason there are namespaces in the text:

(chtml:parse "text"
             (stp:make-builder))

The parser will choke on somens namespace since it's not mapped to any url:
  0: (CXML-STP:STP-ERROR "attribute with prefix but no URI")[:EXTERNAL]
  1: (CXML-STP:RENAME-ATTRIBUTE #)
  2: (CXML-STP:MAKE-ATTRIBUTE "someurl.com" "somens:url" "")
  3: ((SB-PCL::FAST-METHOD SAX:START-ELEMENT (CXML-STP-IMPL::BUILDER T T T T)) ..)

Is there a way to specify some global variable to turn off namespace
processing?
I saw *namespace-processing* variable in some other package but it
doesn't seem to be relevant in this case.

Thank you,
Andrei

From david at lichteblau.com Fri Mar 4 18:15:58 2011
From: david at lichteblau.com (David Lichteblau)
Date: Fri, 4 Mar 2011 19:15:58 +0100
Subject: [closure-devel] How to disregard namespaces
In-Reply-To:
References:
Message-ID: <20110304181558.GA25455@radon>

Quoting Andrei Stebakov (lispercat at gmail.com):
> Say I need to parse html that I got from some external source and for
> some reason there are namespaces in the text:
>
> (chtml:parse "text"
>              (stp:make-builder))
>
> The parser will choke on somens namespace since it's not mapped to any url:
>   0: (CXML-STP:STP-ERROR "attribute with prefix but no URI")[:EXTERNAL]
>   1: (CXML-STP:RENAME-ATTRIBUTE #)
>   2: (CXML-STP:MAKE-ATTRIBUTE "someurl.com" "somens:url" "")
>   3: ((SB-PCL::FAST-METHOD SAX:START-ELEMENT (CXML-STP-IMPL::BUILDER T T T T)) ..)

Indeed, something needs to be done to fix this, since chtml purports to
fix bogus html without erroring out.

At the moment, chtml liberally accepts these attributes for its own
internal PT representation, but then accidentally turns PT attributes
into HAX events (and then SAX events) without further validation.

I think it might be easiest to continue allowing them in PT, but to
change PT serialization to fix them before constructing hax attribute
objects.

Here is a simple patch that just discards the attribute (changing its
name would be another option).  Note that the patch isn't good enough to
commit at this point, because it introduces a dependency from chtml
to cxml.

--- a/src/parse/html-parser.lisp
+++ b/src/parse/html-parser.lisp
@@ -98,16 +98,20 @@
 ;;;               (merge-pathnames (or pathname (pathname input))))))
       (parse-xstream xstream handler)))))
 
+(defun good-attribute-name-p (name)
+  (and (cxml::valid-name-p name)
+       (not (or (string-equal name "xmlns")
+                (position #\: name)))))
+
 (defun serialize-pt-attributes (plist recode)
   (loop
      for (name value) on plist by #'cddr
-     unless
-       ;; better don't emit as HAX what would be bogus as SAX anyway
-       (string-equal name "xmlns")
+     for n = #+rune-is-character (coerce (symbol-name name) 'rod)
+             #-rune-is-character (symbol-name name)
+     ;; don't emit as HAX what would be bogus as SAX anyway
+     if (good-attribute-name-p n)
      collect
-     (let* ((n #+rune-is-character (coerce (symbol-name name) 'rod)
-               #-rune-is-character (symbol-name name))
-            (v (etypecase value
+     (let ((v (etypecase value
                 (symbol (coerce (string-downcase (symbol-name value)) 'rod))
                 (rod (funcall recode value))
                 (string (coerce value 'rod)))))

> Is there a way to specify some global variable to turn off namespace
> processing?
> I saw *namespace-processing* variable in some other package but it
> doesn't seem to be relevant in this case.

You could use DOM instead of STP, I suppose.  DOM doesn't do these sorts
of checks IIRC.

(Personally I strongly prefer STP over DOM, but one reason for that
preference is that STP is stricter, which is nice when actually working
with XML.)


d.
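
[Editorial sketch, not part of the original message.]  The patch above is
held back because it calls cxml::valid-name-p from chtml.  As a rough
illustration of the same "discard the bogus attribute" idea without that
dependency, the filter could be written in plain Common Lisp.  Everything
below is an assumption made for illustration: the function names are
hypothetical and not part of chtml, attribute names are taken to be plain
strings rather than the symbols found in PT plists, and the character
test is only a loose approximation of XML name validation.

```lisp
;; Hypothetical stand-alone sketch; none of these names exist in chtml.
;; Assumption: attribute names are strings, and a cheap character check
;; stands in for cxml::valid-name-p so that no cxml dependency is needed.

(defun name-start-char-p (char)
  "True if CHAR may begin an attribute name (loose approximation)."
  (or (alpha-char-p char) (char= char #\_)))

(defun name-char-p (char)
  "True if CHAR may appear inside an attribute name (loose approximation)."
  (or (alphanumericp char) (find char "_-.")))

(defun plausible-attribute-name-p (name)
  "True if NAME is non-empty, looks like a name, has no colon, and is not xmlns."
  (and (plusp (length name))
       (name-start-char-p (char name 0))
       (every #'name-char-p name)
       ;; Explicit colon check, mirroring the patch: a prefixed name
       ;; would need a namespace URI that HTML input cannot supply.
       (not (find #\: name))
       (not (string-equal name "xmlns"))))

(defun drop-bogus-attributes (plist)
  "Return a fresh plist (name value ...) without the implausible names."
  (loop for (name value) on plist by #'cddr
        when (plausible-attribute-name-p name)
          collect name and collect value))
```

With these definitions,
(drop-bogus-attributes '("href" "x" "somens:url" "someurl.com" "xmlns" "y"))
returns ("href" "x").  Renaming the attribute instead of dropping it, the
other option mentioned above, would replace the filter with a mapping step.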

From lispercat at gmail.com Fri Mar 4 18:54:00 2011
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 4 Mar 2011 13:54:00 -0500
Subject: [closure-devel] How to disregard namespaces
In-Reply-To: <20110304181558.GA25455@radon>
References: <20110304181558.GA25455@radon>
Message-ID:

Can it be resolved at
(defun rename-attribute (attribute prefix uri)
level?
So instead of throwing
    ((zerop (length uri))
     (stp-error "attribute with prefix but no URI"))
it would check some global var like *ignore-namespaces* and just continue?

From lispercat at gmail.com Fri Mar 4 20:25:53 2011
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 4 Mar 2011 15:25:53 -0500
Subject: [closure-devel] How to disable from being generated?
Message-ID:

I agree that the style of the html code I sometimes get leaves much to be desired.
In this case there is a some data " (stp:make-builder)))

Will produce
.someClass { width: 26px; height:26px; } some data
How do I disable this from being generated?

Thank you,
Andrei

From david at lichteblau.com Fri Mar 4 21:11:23 2011
From: david at lichteblau.com (David Lichteblau)
Date: Fri, 4 Mar 2011 22:11:23 +0100
Subject: [closure-devel] How to disable from being generated?
In-Reply-To:
References:
Message-ID: <20110304211123.GB25455@radon>

Hi,

Quoting Andrei Stebakov (lispercat at gmail.com):
> I agree that style of the html code I sometimes get leaves much to be desired.
> In this case there is a