From lispercat at gmail.com Fri Nov 26 23:48:16 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 26 Nov 2010 18:48:16 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
Message-ID:
Just tried to evaluate:
(let ((doc (chtml:parse (drakma:http-request "http://www.google.com")
                        (stp:make-builder))))
  (xpath:evaluate "//body" doc))
This results in an exception:
The value NIL is not of type VECTOR.
[Condition of type TYPE-ERROR]
Restarts:
0: [RETRY] Retry SLIME interactive evaluation request.
1: [*ABORT] Return to SLIME's top level.
2: [TERMINATE-THREAD] Terminate this thread (#)
Backtrace:
0: (CXML-STP-IMPL::NORMALIZE-TEXT-NODES! ..)
1: ((SB-PCL::FAST-METHOD XPATH-PROTOCOL:CHILD-PIPE-USING-NAVIGATOR
((EQL :DEFAULT-NAVIGATOR) CXML-STP:ELEMENT)) ..)
2: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0.
SB-PCL::.ARG1.)) ..)
3: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
4: ((LAMBDA (XPATH::N)) ..)
5: (XPATH::MAPPEND-PIPE ..)
6: (XPATH::MAPPEND-PIPE ..)
7: (XPATH::MAPPEND-PIPE ..)
8: (XPATH::MAPPEND-PIPE ..)
9: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
10: ((LAMBDA (XPATH::N)) #.(CXML-STP-IMPL::DOCUMENT :CHILDREN
'(#.(CXML-STP:ELEMENT #| :PARENT of type DOCUMENT |# :CHILDREN '#
:LOCAL-NAME "html" :NAMESPACE-URI "http://www.w3.org/1999/xhtml"))))
11: (XPATH::MAPPEND-PIPE ..)
12: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
13: ((LAMBDA (XPATH:CONTEXT)) #)
14: (XPATH:EVALUATE-COMPILED ..)
Evaluating "body" (without //) just returns an empty set.
I have the latest xpath, cxml-stp, and closure-html.
What am I missing? When parsing HTML, should I use (stp:do-recursively
...) instead for better results?
Thank you,
Andrew
From ivan4th at gmail.com Fri Nov 26 23:56:43 2010
From: ivan4th at gmail.com (Ivan Shvedunov)
Date: Sat, 27 Nov 2010 02:56:43 +0300
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
Message-ID:
Hello.
The error message indicates that your cxml-stp is not the most recent one.
Please use the version from quicklisp (make sure it was updated since its
beta release if you're already using it) or from cxml-stp git repository:
git clone http://www.lichteblau.com/git/cxml-stp.git
On Sat, Nov 27, 2010 at 2:48 AM, Andrei Stebakov wrote:
> Just tried to evaluate
>
> (let ((doc (chtml:parse (drakma:http-request "http://www.google.com")
>                         (stp:make-builder))))
>   (xpath:evaluate "//body" doc))
>
> This results in an exception:
> The value NIL is not of type VECTOR.
>   [Condition of type TYPE-ERROR]
>
> Restarts:
>  0: [RETRY] Retry SLIME interactive evaluation request.
>  1: [*ABORT] Return to SLIME's top level.
>  2: [TERMINATE-THREAD] Terminate this thread (# RUNNING {C499B91}>)
>
> Backtrace:
>  0: (CXML-STP-IMPL::NORMALIZE-TEXT-NODES! ..)
>  1: ((SB-PCL::FAST-METHOD XPATH-PROTOCOL:CHILD-PIPE-USING-NAVIGATOR
> ((EQL :DEFAULT-NAVIGATOR) CXML-STP:ELEMENT)) ..)
>  2: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0.
> SB-PCL::.ARG1.)) ..)
>  3: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  4: ((LAMBDA (XPATH::N)) ..)
>  5: (XPATH::MAPPEND-PIPE ..)
>  6: (XPATH::MAPPEND-PIPE ..)
>  7: (XPATH::MAPPEND-PIPE ..)
>  8: (XPATH::MAPPEND-PIPE ..)
>  9: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  10: ((LAMBDA (XPATH::N)) #.(CXML-STP-IMPL::DOCUMENT :CHILDREN
> '(#.(CXML-STP:ELEMENT #| :PARENT of type DOCUMENT |# :CHILDREN '#
> :LOCAL-NAME "html" :NAMESPACE-URI "http://www.w3.org/1999/xhtml"))))
>  11: (XPATH::MAPPEND-PIPE ..)
>  12: ((LAMBDA (XPATH::NODE XPATH::STARTING-NODE)) ..)
>  13: ((LAMBDA (XPATH:CONTEXT)) #)
>  14: (XPATH:EVALUATE-COMPILED ..)
>
>
> When evaluating "body" (without //) just returns an empty set.
>
> Got the latest xpath, cxml-stp, closure-html.
> What am I missing? Parsing html, should I use (stp:do-recursively
> ...) instead for better results?
>
> Thank you,
> Andrew
>
> _______________________________________________
> plexippus-xpath-devel mailing list
> plexippus-xpath-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/plexippus-xpath-devel
>
--
Ivan Shvedunov
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807
From david at lichteblau.com Fri Nov 26 23:56:20 2010
From: david at lichteblau.com (David Lichteblau)
Date: Sat, 27 Nov 2010 00:56:20 +0100
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
Message-ID: <20101126235620.GC26825@radon>
Hi,
Quoting Andrei Stebakov (lispercat at gmail.com):
[...]
> When evaluating "body" (without //) just returns an empty set.
>
> Got the latest xpath, cxml-stp, closure-html.
you've got the latest tarball, but unfortunately this bug has only been
fixed in the development version of cxml-stp, and I've still not found
the time to make new tarball releases.
Please try switching from the tarball to the version in git.
> What am I missing? Parsing html, should I use (stp:do-recursively
> ...) instead for better results?
d.
From lispercat at gmail.com Sat Nov 27 03:27:52 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Fri, 26 Nov 2010 22:27:52 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To: <20101126235620.GC26825@radon>
References:
<20101126235620.GC26825@radon>
Message-ID:
Now that I updated to the latest git clone (I am using quicklisp), the
crash doesn't happen anymore, but all requests to "body" or "//body"
return an empty node-set.
Thank you,
Andrew
On Fri, Nov 26, 2010 at 6:56 PM, David Lichteblau wrote:
> Hi,
>
> Quoting Andrei Stebakov (lispercat at gmail.com):
> [...]
>> When evaluating "body" (without //) just returns an empty set.
>>
>> Got the latest xpath, cxml-stp, closure-html.
>
> you've got the latest tarball, but unfortunately this bug has only been
> fixed in the development version of cxml-stp, and I've still not found
> the time to make new tarball releases.
>
> Please try switching from the tarball to the version in git.
>
>> What am I missing? Parsing html, should I use (stp:do-recursively
>> ...) instead for better results?
>
>
> d.
>
From rm at tuxteam.de Sat Nov 27 14:14:19 2010
From: rm at tuxteam.de (rm at tuxteam.de)
Date: Sat, 27 Nov 2010 15:14:19 +0100
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
<20101126235620.GC26825@radon>
Message-ID: <20101127141419.GA25673@seid-online.de>
On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
> Now that I updated to the latest git clone (I am using quicklisp), the
> crash doesn't happen anymore, but all requests to "body" or "//body"
> return an empty node-set.
Which is correct ;-)
Your XPath expression is wrong: you're looking for a 'body' element in
the null namespace, but your HTML is most likely in the
'http://www.w3.org/1999/xhtml' namespace.
HTH Ralf Mattes
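
A minimal sketch of a namespace-aware query, following the explanation above. It assumes Quicklisp is available and that the systems are named closure-html, cxml-stp and xpath; the function name xhtml-bodies and the prefix "h" are made up for illustration. It binds a prefix to the XHTML namespace with xpath:with-namespaces and uses that prefix in the expression:

```lisp
;; Assumption: Quicklisp is installed and provides these systems.
(ql:quickload '(:closure-html :cxml-stp :xpath) :silent t)

;; XHTML-BODIES is a hypothetical helper, not part of any library.
(defun xhtml-bodies (html-string)
  "Parse HTML-STRING with Closure HTML and return its body elements as a list."
  (let ((doc (chtml:parse html-string (stp:make-builder))))
    ;; "h" is an arbitrary prefix; only the namespace URI has to match
    ;; the namespace that Closure HTML assigns to the parsed elements.
    (xpath:with-namespaces (("h" "http://www.w3.org/1999/xhtml"))
      (xpath:all-nodes (xpath:evaluate "//h:body" doc)))))
```

With the prefix in place, "//h:body" should select the element that the unprefixed "//body" misses.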
From lispercat at gmail.com Sun Nov 28 04:39:52 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sat, 27 Nov 2010 23:39:52 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To: <20101127141419.GA25673@seid-online.de>
References:
<20101126235620.GC26825@radon>
<20101127141419.GA25673@seid-online.de>
Message-ID:
What method do you use to serialize the result returned by
(xpath:evaluate ....) back to a string?
I assume (stp:serialize ...) won't work on an xpath:node-set.
Thank you,
Andrew
On Sat, Nov 27, 2010 at 9:14 AM, wrote:
> On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
>> Now that I updated to the latest git clone (I am using quicklisp), the
>> crash doesn't happen anymore, but all requests to "body" or "//body"
>> return an empty node-set.
>
>
> Which is correct ;-)
> Your XPath expression is wrong: you're looking for a 'body' element in
> the null namespace, but your HTML is most likely in the
> 'http://www.w3.org/1999/xhtml' namespace.
>
>  HTH  Ralf Mattes
>
>
From lispercat at gmail.com Sun Nov 28 04:49:13 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sat, 27 Nov 2010 23:49:13 -0500
Subject: [plexippus-xpath-devel] Beginner: trying out xpath
In-Reply-To:
References:
<20101126235620.GC26825@radon>
<20101127141419.GA25673@seid-online.de>
Message-ID:
I just found that it's possible to call (stp:serialize ...) on the result
of (xpath:first-node ...), or to iterate over the nodes and serialize each.
Thank you,
Andrew
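
A rough sketch of the approach described above, assuming Quicklisp and the cxml-stp and xpath systems. The helper names node->string and node-set->strings are hypothetical. Each node is serialized into a fresh string sink, and sax:end-document is called on the sink to obtain the accumulated string; this is intended for element nodes and is not checked against other node types:

```lisp
;; Assumption: Quicklisp is installed and provides these systems.
(ql:quickload '(:cxml-stp :xpath) :silent t)

;; NODE->STRING is a hypothetical helper, not part of any library.
(defun node->string (node)
  "Serialize a single STP element NODE to a string without an XML declaration."
  (let ((sink (cxml:make-string-sink :omit-xml-declaration-p t)))
    (stp:serialize node sink)
    ;; Closing the sink returns the text collected so far.
    (sax:end-document sink)))

;; NODE-SET->STRINGS is likewise hypothetical.
(defun node-set->strings (node-set)
  "Return one serialized string per node in NODE-SET."
  (xpath:map-node-set->list #'node->string node-set))
```

For a single result, (node->string (xpath:first-node node-set)) does the same for the first node only.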
On Sat, Nov 27, 2010 at 11:39 PM, Andrei Stebakov wrote:
> What method do you use to serialize the result returned by
> (xpath:evaluate ....) back to a string?
> I assume (stp:serialize ...) won't work on an xpath:node-set.
>
> Thank you,
> Andrew
>
> On Sat, Nov 27, 2010 at 9:14 AM, wrote:
>> On Fri, Nov 26, 2010 at 10:27:52PM -0500, Andrei Stebakov wrote:
>>> Now that I updated to the latest git clone (I am using quicklisp), the
>>> crash doesn't happen anymore, but all requests to "body" or "//body"
>>> return an empty node-set.
>>
>>
>> Which is correct ;-)
>> Your XPath expression is wrong: you're looking for a 'body' element in
>> the null namespace, but your HTML is most likely in the
>> 'http://www.w3.org/1999/xhtml' namespace.
>>
>>  HTH  Ralf Mattes
>>
>>
>
From lispercat at gmail.com Sun Nov 28 19:14:06 2010
From: lispercat at gmail.com (Andrei Stebakov)
Date: Sun, 28 Nov 2010 14:14:06 -0500
Subject: [plexippus-xpath-devel] parsed input != serialized output?
Message-ID:
Hello
I wonder if parse/serialize should arrive at the same string given to
the parser?
Let's say
(let ((sink (cxml:make-string-sink)))
(stp:serialize (chtml:parse "some text
"
(stp:make-builder)) sink)
(sax:end-document sink))
I would expect the result to be "some text
", but
instead it's "some text
" (with some
headers).
Why would it rearrange the tag in this manner? What other kinds of
rearrangement should I expect?
Thank you,
Andrew
From david at lichteblau.com Sun Nov 28 19:46:11 2010
From: david at lichteblau.com (David Lichteblau)
Date: Sun, 28 Nov 2010 20:46:11 +0100
Subject: [plexippus-xpath-devel] parsed input != serialized output?
In-Reply-To:
References:
Message-ID: <20101128194611.GA18633@radon>
Quoting Andrei Stebakov (lispercat at gmail.com):
> Hello
>
> I wonder if parse/serialize should arrive at the same string given to
> the parser?
> Let's say
>
> (let ((sink (cxml:make-string-sink)))
> (stp:serialize (chtml:parse "some text
"
> (stp:make-builder)) sink)
> (sax:end-document sink))
>
> I would expect the result to be "some text
", but
> instead it's "some text
" (with some
> headers).
For XML (!), the content model should stay the same -- and even that
cannot be said on a character-by-character basis. An XML declaration
doesn't affect the content model and can therefore change. (You can
suppress it explicitly with a keyword argument.)
Also note that you're parsing HTML and writing XML. Perhaps you would
prefer to write HTML again, i.e. make a sink using chtml:make-xyz
instead of cxml:make-xyz?
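
Both suggestions can be sketched roughly as follows, assuming Quicklisp and the closure-html and cxml-stp systems. The function names and the sample input are invented for illustration; :omit-xml-declaration-p is the cxml sink keyword for suppressing the declaration, and chtml:make-string-sink is one of the chtml:make-xyz sink constructors:

```lisp
;; Assumption: Quicklisp is installed and provides these systems.
(ql:quickload '(:closure-html :cxml-stp) :silent t)

;; HTML->XML-STRING is a hypothetical helper name.
(defun html->xml-string (html)
  "Parse HTML and write it back as XML, without the XML declaration."
  (stp:serialize (chtml:parse html (stp:make-builder))
                 (cxml:make-string-sink :omit-xml-declaration-p t)))

;; HTML->HTML-STRING is a hypothetical helper name.
(defun html->html-string (html)
  "Parse HTML and write it back as HTML using a Closure HTML sink."
  (stp:serialize (chtml:parse html (stp:make-builder))
                 (chtml:make-string-sink)))

;; Example call with made-up input:
;; (html->html-string "<p>some text</p>")
```

Serializing a whole document returns the sink's result directly, so no extra sax:end-document call is needed in this sketch.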
> Why would it rearrange the tag in this manner? What other kinds of
> re-arrangement to expect?
This question is very specific to Closure HTML. For comparison, XML
parsers certainly wouldn't do this sort of re-arrangement.
However, Closure HTML tries to follow the HTML DTD. div (a block
element) in p (itself a block element) isn't permitted in HTML (only
inline content is), so the parser does what browsers would also do, and
tries to "repair" the HTML to bring it closer to the DTD. (Closure HTML
was written to do this because it was actually part of a web browser,
namely Closure.)
Whether users of a general-purpose parser expect this step is certainly
a different question. Unfortunately I don't have a ready-to-use patch
to change this behaviour.
A special purpose change for this particular test case is to tweak the
DTD as follows. Changing the DTD works, because much behaviour of the
parser is actually not programmed in Lisp, but DTD-driven. (Note that
the DTD is re-parsed only when the fasl loads, e.g. after a restart of
the Lisp.)
diff --git a/resources/dtd/DTD-HTML-4.0-Transitional b/resources/dtd/DTD-HTML-4.0-Transitional
index 82f0a74..f7f6c91 100644
--- a/resources/dtd/DTD-HTML-4.0-Transitional
+++ b/resources/dtd/DTD-HTML-4.0-Transitional
@@ -526,7 +526,7 @@
-
+
References:
Message-ID: <20101128200249.GB24160@seid-online.de>
On Sun, Nov 28, 2010 at 02:14:06PM -0500, Andrei Stebakov wrote:
> Hello
>
> I wonder if parse/serialize should arrive at the same string given to
> the parser?
But that's impossible _unless_ you restrict yourself to canonical XML.
> Let's say
>
> (let ((sink (cxml:make-string-sink)))
> (stp:serialize (chtml:parse "some text
"
> (stp:make-builder)) sink)
> (sax:end-document sink))
>
> I would expect the result to be "some text
", but
> instead it's "some text
" (with some
> headers).
> Why would it rearrange the tag in this manner? What other kinds of
> re-arrangement to expect?
But it doesn't! You parse _html_ where '
' -> '
' ...
Parse your string as xml and you get what you want:
(let ((sink (cxml:make-string-sink)))
(stp:serialize (cxml:parse "some text
"
(stp:make-builder)) sink)
(sax:end-document sink))
"
some text
"
HTH Ralf Mattes
>
> Thank you,
> Andrew
>