From lispercat at gmail.com Fri Nov 26 23:48:16 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 26 Nov 2010 18:48:16 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
Message-ID:

Just tried to evaluate

(let ((doc (chtml:parse (drakma:http-request "http://www.google.com")
                        (stp:make-builder))))
  (xpath:evaluate "//body" doc))

This results in an exception:

The value NIL is not of type VECTOR.
   [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME interactive evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [TERMINATE-THREAD] Terminate this thread (#)

Backtrace:
  0: (CXML-STP-IMPL::NORMALIZE-TEXT-NODES! ..)
  1: ((SB-PCL::FAST-METHOD XPATH-PROTOCOL:CHILD-PIPE-USING-NAVIGATOR
      ((EQL :DEFAULT-NAVIGATOR) CXML-STP:ELEMENT)) ..)
  2: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0.
      SB-PCL::.ARG1.)) ..)
  3: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
  4: ((LAMBDA (XPATH::N)) ..)
  5: (XPATH::MAPPEND-PIPE ..)
  6: (XPATH::MAPPEND-PIPE ..)
  7: (XPATH::MAPPEND-PIPE ..)
  8: (XPATH::MAPPEND-PIPE ..)
  9: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
 10: ((LAMBDA (XPATH::N)) #.(CXML-STP-IMPL::DOCUMENT :CHILDREN
      '(#.(CXML-STP:ELEMENT #| :PARENT of type DOCUMENT |# :CHILDREN '#
      :LOCAL-NAME "html" :NAMESPACE-URI "http://www.w3.org/1999/xhtml"))))
 11: (XPATH::MAPPEND-PIPE ..)
 12: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
 13: ((LAMBDA (XPATH:CONTEXT)) #)
 14: (XPATH:EVALUATE-COMPILED ..)

Evaluating "body" (without //) just returns an empty set.

Got the latest xpath, cxml-stp, closure-html.
What am I missing? Parsing html, should I use (stp:do-recursively ...)
instead for better results?

Thank you,
Andrew

From ivan4th at gmail.com Fri Nov 26 23:56:43 2010
From: ivan4th at gmail.com (Ivan Shvedunov)
Date: Sat, 27 Nov 2010 02:56:43 +0300
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
Message-ID:

Hello.

The error message indicates that your cxml-stp is not the most recent one.
Please use the version from quicklisp (make sure it was updated since its
beta release if you're already using it) or from the cxml-stp git repository:

git clone http://www.lichteblau.com/git/cxml-stp.git

On Sat, Nov 27, 2010 at 2:48 AM, Andrei Stebakov wrote:
> Just tried to evaluate
>
> (let ((doc (chtml:parse (drakma:http-request "http://www.google.com")
>                         (stp:make-builder))))
>   (xpath:evaluate "//body" doc))
>
> This results in an exception:
> The value NIL is not of type VECTOR.
>    [Condition of type TYPE-ERROR]
>
> Restarts:
>  0: [RETRY] Retry SLIME interactive evaluation request.
>  1: [*ABORT] Return to SLIME's top level.
>  2: [TERMINATE-THREAD] Terminate this thread (# RUNNING {C499B91}>)
>
> Backtrace:
>  0: (CXML-STP-IMPL::NORMALIZE-TEXT-NODES! ..)
>  1: ((SB-PCL::FAST-METHOD XPATH-PROTOCOL:CHILD-PIPE-USING-NAVIGATOR
> ((EQL :DEFAULT-NAVIGATOR) CXML-STP:ELEMENT)) ..)
>  2: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0.
> SB-PCL::.ARG1.)) ..)
>  3: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  4: ((LAMBDA (XPATH::N)) ..)
>  5: (XPATH::MAPPEND-PIPE ..)
>  6: (XPATH::MAPPEND-PIPE ..)
>  7: (XPATH::MAPPEND-PIPE ..)
>  8: (XPATH::MAPPEND-PIPE ..)
>  9: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  10: ((LAMBDA (XPATH::N)) #.(CXML-STP-IMPL::DOCUMENT :CHILDREN
> '(#.(CXML-STP:ELEMENT #| :PARENT of type DOCUMENT |# :CHILDREN '#
> :LOCAL-NAME "html" :NAMESPACE-URI "http://www.w3.org/1999/xhtml"))))
>  11: (XPATH::MAPPEND-PIPE ..)
>  12: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  13: ((LAMBDA (XPATH:CONTEXT)) #)
>  14: (XPATH:EVALUATE-COMPILED ..)
>
>
> When evaluating "body" (without //) just returns an empty set.
>
> Got the latest xpath, cxml-stp, closure-html.
> What am I missing? Parsing html, should I use (stp:do-recursively
> ...) instead for better results?
>
> Thank you,
> Andrew
>
> _______________________________________________
> plexippus-xpath-devel mailing list
> plexippus-xpath-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/plexippus-xpath-devel
>

-- 
Ivan Shvedunov
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807

From david at lichteblau.com Fri Nov 26 23:56:20 2010
From: david at lichteblau.com (David Lichteblau)
Date: Sat, 27 Nov 2010 00:56:20 +0100
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
Message-ID: <20101126235620.GC26825@radon>

Hi,

Quoting Andrei Stebakov (lispercat at gmail.com):
[...]
> When evaluating "body" (without //) just returns an empty set.
>
> Got the latest xpath, cxml-stp, closure-html.

you've got the latest tarball, but unfortunately this bug has only been
fixed in the development version of cxml-stp, and I've still not found
the time to make new tarball releases.

Please try switching from the tarball to the version in git.

> What am I missing? Parsing html, should I use (stp:do-recursively
> ...) instead for better results?


d.

From lispercat at gmail.com Sat Nov 27 03:27:52 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 26 Nov 2010 22:27:52 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To: <20101126235620.GC26825@radon>
References: <20101126235620.GC26825@radon>
Message-ID:

Now that I updated to the latest git clone (I am using quicklisp), the
crash doesn't happen anymore, but all requests to "body" or "//body"
return an empty node-set.

Thank you,
Andrew

On Fri, Nov 26, 2010 at 6:56 PM, David Lichteblau wrote:
> Hi,
>
> Quoting Andrei Stebakov (lispercat at gmail.com):
> [...]
>> When evaluating "body" (without //) just returns an empty set.
>>
>> Got the latest xpath, cxml-stp, closure-html.
>
> you've got the latest tarball, but unfortunately this bug has only been
> fixed in the development version of cxml-stp, and I've still not found
> the time to make new tarball releases.
>
> Please try switching from the tarball to the version in git.
>
>> What am I missing? Parsing html, should I use (stp:do-recursively
>> ...) instead for better results?
>
>
> d.
>

From rm at tuxteam.de Sat Nov 27 14:14:19 2010
From: rm at tuxteam.de (rm at tuxteam.de)
Date: Sat, 27 Nov 2010 15:14:19 +0100
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References: <20101126235620.GC26825@radon>
Message-ID: <20101127141419.GA25673@seid-online.de>

On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
> Now that I updated to the latest git clone (I am using quicklisp), the
> crash doesn't happen anymore, but all requests to "body" or "//body"
> return an empty node-set.

Which is correct ;-)
Your XPath expression is wrong, you're looking for a 'body' element in
the null-namespace but your html most likely is in the
'http://www.w3.org/1999/xhtml' namespace.

 HTH Ralf Mattes

From lispercat at gmail.com Sun Nov 28 04:39:52 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sat, 27 Nov 2010 23:39:52 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To: <20101127141419.GA25673@seid-online.de>
References: <20101126235620.GC26825@radon> <20101127141419.GA25673@seid-online.de>
Message-ID:

What method do you use to serialize the result returned by
(xpath:evaluate ....) back to string?
I assume (stp:serialize ...) won't work on xpath:node-set.

Thank you,
Andrew

On Sat, Nov 27, 2010 at 9:14 AM, wrote:
> On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
>> Now that I updated to the latest git clone (I am using quicklisp), the
>> crash doesn't happen anymore, but all requests to "body" or "//body"
>> return an empty node-set.
>
>
> Which is correct ;-)
> Your XPath expression is wrong, you're looking for a 'body' element in
> the null-namespace but your html most likely is in the 'http://www.w3.org/1999/xhtml'
> namespace.
>
>  HTH Ralf Mattes
>
>

From lispercat at gmail.com Sun Nov 28 04:49:13 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sat, 27 Nov 2010 23:49:13 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References: <20101126235620.GC26825@radon> <20101127141419.GA25673@seid-online.de>
Message-ID:

I just found that it's possible to (stp:serialize ...) using
(xpath:first-node ...), or just iterating over the nodes.

Thank you,
Andrew

On Sat, Nov 27, 2010 at 11:39 PM, Andrei Stebakov wrote:
> What method do you use to serialize the result returned by
> (xpath:evaluate ....) back to string?
> I assume (stp:serialize ...) won't work on xpath:node-set.
>
> Thank you,
> Andrew
>
> On Sat, Nov 27, 2010 at 9:14 AM, wrote:
>> On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
>>> Now that I updated to the latest git clone (I am using quicklisp), the
>>> crash doesn't happen anymore, but all requests to "body" or "//body"
>>> return an empty node-set.
>>
>>
>> Which is correct ;-)
>> Your XPath expression is wrong, you're looking for a 'body' element in
>> the null-namespace but your html most likely is in the 'http://www.w3.org/1999/xhtml'
>> namespace.
>>
>>  HTH Ralf Mattes
>>
>>
>

From lispercat at gmail.com Sun Nov 28 19:14:06 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sun, 28 Nov 2010 14:14:06 -0500
Subject: [plexippus-xpath-devel] parsed input != serialized output?
Message-ID:

Hello

I wonder if parse/serialize should arrive at the same string given to
the parser?
Let's say

(let ((sink (cxml:make-string-sink)))
  (stp:serialize (chtml:parse "<p><div>some text</div></p>"
                              (stp:make-builder)) sink)
  (sax:end-document sink))

I would expect the result to be "<p><div>some text</div></p>", but
instead it's "<p></p><div>some text</div>" (with some
headers).
Why would it rearrange the <p> tag in this manner? What other kinds of
re-arrangement to expect?

Thank you,
Andrew

From david at lichteblau.com Sun Nov 28 19:46:11 2010
From: david at lichteblau.com (David Lichteblau)
Date: Sun, 28 Nov 2010 20:46:11 +0100
Subject: [plexippus-xpath-devel] parsed input != serialized output?
In-Reply-To:
References:
Message-ID: <20101128194611.GA18633@radon>

Quoting Andrei Stebakov (lispercat at gmail.com):
> Hello
>
> I wonder if parse/serialize should arrive at the same string given to
> the parser?
> Let's say
>
> (let ((sink (cxml:make-string-sink)))
>   (stp:serialize (chtml:parse "<p><div>some text</div></p>"
>                               (stp:make-builder)) sink)
>   (sax:end-document sink))
>
> I would expect the result to be "<p><div>some text</div></p>", but
> instead it's "<p></p><div>some text</div>" (with some
> headers).

For XML (!), the content model should stay the same -- and even that
cannot be said on a character-by-character basis.

An XML declaration doesn't affect the content model and can therefore
change. (You can suppress it explicitly with a keyword argument.)

Also note that you're parsing HTML and writing XML. Perhaps you would
prefer to write HTML again, i.e. make a sink using chtml:make-xyz
instead of cxml:make-xyz?

> Why would it rearrange the <p> tag in this manner? What other kinds of
> re-arrangement to expect?

This question is very specific to Closure HTML. For comparison, XML
parsers certainly wouldn't do this sort of re-arrangement.

However, Closure HTML tries to follow the HTML DTD. div (a block
element) in p (itself a block element) isn't permitted in HTML (only
inline content is), so the parser does what browsers would also do, and
tries to "repair" the HTML to bring it closer to the DTD.

(Closure HTML was written to do this because it was actually part of a
web browser, namely Closure.)

Whether users of a general-purpose parser expect this step is certainly
a different question.

Unfortunately I don't have a ready-to-use patch to change this
behaviour.

A special-purpose change for this particular test case is to tweak the
DTD as follows. Changing the DTD works because much of the parser's
behaviour is actually not programmed in Lisp, but DTD-driven. (Note that
the DTD is re-parsed only when the fasl loads, e.g. after a restart of
the Lisp.)

diff --git a/resources/dtd/DTD-HTML-4.0-Transitional b/resources/dtd/DTD-HTML-4.0-Transitional
index 82f0a74..f7f6c91 100644
--- a/resources/dtd/DTD-HTML-4.0-Transitional
+++ b/resources/dtd/DTD-HTML-4.0-Transitional
@@ -526,7 +526,7 @@
-
+

References:
Message-ID: <20101128200249.GB24160@seid-online.de>

On Sun, Nov 28, 2010 at 02:14:06PM -0500, Andrei Stebakov wrote:
> Hello
>
> I wonder if parse/serialize should arrive at the same string given to
> the parser?

But that's impossible _unless_ you restrict yourself to canonical XML.

> Let's say
>
> (let ((sink (cxml:make-string-sink)))
>   (stp:serialize (chtml:parse "<p><div>some text</div></p>"
>                               (stp:make-builder)) sink)
>   (sax:end-document sink))
>
> I would expect the result to be "<p><div>some text</div></p>", but
> instead it's "<p></p><div>some text</div>" (with some
> headers).
> Why would it rearrange the <p> tag in this manner? What other kinds of
> re-arrangement to expect?

But it doesn't! You parse _html_ where '<p><div>' -> '<p></p><div>' ...
Parse your string as xml and you get what you want:

(let ((sink (cxml:make-string-sink)))
  (stp:serialize (cxml:parse "<p><div>some text</div></p>"
                             (stp:make-builder)) sink)
  (sax:end-document sink))

"<p><div>some text</div></p>"

 HTH Ralf Mattes

>
> Thank you,
> Andrew
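
The two answers in the first thread (query in the XHTML namespace, then
serialize each node of the result) can be combined into a minimal sketch.
It assumes cxml, cxml-stp and Plexippus XPath are already loaded, for
example via Quicklisp. The helper names serialize-node and body-strings
are hypothetical and not part of any of these libraries, and the prefix
"h" is an arbitrary choice.

```lisp
;; Sketch only. Assumes the systems are already loaded, e.g.
;; (ql:quickload '(:cxml-stp :xpath)).

(defun serialize-node (node)
  "Serialize a single STP NODE to an XML string (hypothetical helper)."
  (let ((sink (cxml:make-string-sink :omit-xml-declaration-p t)))
    (stp:serialize node sink)
    ;; For a string sink, SAX:END-DOCUMENT returns the accumulated string.
    (sax:end-document sink)))

(defun body-strings (doc)
  "Return each XHTML body element in DOC as a string (hypothetical helper)."
  ;; The prefix "h" is arbitrary; only the namespace URI has to match the
  ;; namespace the parser put the elements in.
  (xpath:with-namespaces (("h" "http://www.w3.org/1999/xhtml"))
    (xpath:map-node-set->list #'serialize-node
                              (xpath:evaluate "//h:body" doc))))
```

Calling (body-strings (chtml:parse html-string (stp:make-builder))) should
then give one string per matching element. When only a single match is
wanted, xpath:first-node on the node-set can be used instead of mapping
over all nodes.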