[cxml-devel] XPath in CXML
Chas Emerick
cemerick at snowtide.com
Sun May 13 05:56:25 UTC 2007
Greetings,
I noticed a post by David Lichteblau on c.l.l where it was mentioned
that an XPath impl for CXML is on a todo list.
This is something that I've had a run at, although my end point
turned out to be far from the XPath standard. Rather than
implementing XPath itself, I implemented a "lispy" element set
addressing scheme. By lispy, I mean that rather than addressing
nodesets using strings, a declarative DSL is used. Here's an example
that yields the title element in an XHTML document:
    (xmatch :root "html" "head" "title")
Here's one that yields all of the TD elements in an XHTML document
that have a colspan attribute:
    (xmatch :root
            (:desc "td"
                   (:@ "colspan")))
And here's one that yields the second TD element in each row of a
table, only if it has a colspan of '1':
    (xmatch "tr" (:child "td"
                         (:@value "colspan" "1")))
The first two examples are context-free, thanks to the :root
directive; the third would need to be run on a table element (or list
thereof).
If it isn't already clear, xmatch is a macro that builds a closure
that, when provided with an element, document, or list as context,
returns the node-set that matches the XPath-esque definition. This
makes it very similar in style and usage to cl-ppcre:create-scanner.
It supports a subset of XPath predicates (such
as :@, :@value, :index, and a few others) that I have needed in my
application so far, but is by no means complete.
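To make the "compile once, apply to many contexts" idea concrete, here is a minimal sketch. It is not the xmatch code: it uses a plain function (the hypothetical compile-match) rather than a macro, it works on a made-up list representation of elements instead of CXML's DOM, and it covers only string steps, :child, :desc, :@ and :@value. :root and :index are left out.

```lisp
;; Hypothetical element representation: (name attributes . children),
;; where attributes is an alist of (name . value) strings and children
;; are nested elements or text strings. This is NOT CXML's DOM.
(defun el-name (el) (first el))
(defun el-attrs (el) (second el))
(defun el-children (el) (remove-if-not #'consp (cddr el)))

(defun el-attr (el attr)
  "Value of attribute ATTR on EL, or NIL if absent."
  (cdr (assoc attr (el-attrs el) :test #'string=)))

(defun el-descendants (el)
  "All element descendants of EL, in document order."
  (loop for child in (el-children el)
        append (cons child (el-descendants child))))

(defun compile-predicate (pred)
  "Turn (:@ name) or (:@value name value) into a one-argument test."
  (destructuring-bind (kind attr &optional value) pred
    (ecase kind
      (:@      (lambda (el) (not (null (el-attr el attr)))))
      (:@value (lambda (el) (equal (el-attr el attr) value))))))

(defun compile-step (step)
  "Turn one step into a function from an element to a list of matches.
A bare string is assumed to mean a child step with that element name."
  (destructuring-bind (axis name &rest preds)
      (if (stringp step) (list :child step) step)
    (let ((candidates (ecase axis
                        (:child #'el-children)
                        (:desc  #'el-descendants)))
          (tests (mapcar #'compile-predicate preds)))
      (lambda (el)
        (remove-if-not
         (lambda (c)
           (and (string= (el-name c) name)
                (every (lambda (test) (funcall test c)) tests)))
         (funcall candidates el))))))

(defun compile-match (&rest steps)
  "Return a closure mapping a context element to its matching elements."
  (let ((fns (mapcar #'compile-step steps)))
    (lambda (context)
      (let ((nodes (list context)))
        ;; Feed the matches of each step into the next step.
        (dolist (fn fns nodes)
          (setf nodes (loop for n in nodes append (funcall fn n))))))))

;; Usage: build the matcher once, then call it on a context element.
(defparameter *table*
  '("table" ()
    ("tr" ()
     ("td" (("colspan" . "1")) "a")
     ("td" () "b"))
    ("tr" ()
     ("td" (("colspan" . "2")) "c"))))

(funcall (compile-match "tr" '(:child "td" (:@value "colspan" "1")))
         *table*)
```

All the dispatch on step and predicate kinds happens when the closure is built, so the returned function only walks the tree. A macro version could do the same work at compile time.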
I'm passing this information on, not because I have code I'm ready to
contribute at the moment (although that could be arranged given some
time), but because I think this approach (while not standards-
compliant) is superior to any potential "direct" XPath implementation
for CXML. Perhaps CXML could grow something like xmatch; in addition
to it being used directly, a "proper" XPath implementation could be
built on top of an xmatch-like facility.
I don't want to belabor the point, but this approach is far more
flexible, allows for a much richer set of predicates (and custom
ones, at that), and doesn't confine the match definition to a flat
string -- sexps are good here for the same reasons they are good
elsewhere. For example, I have a couple of other macros that
generate xmatch definitions themselves; XPath strings *can* be
generated dynamically, but that is a Dark Path (at least by my
standards).
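As a rough illustration of that last point (not the actual macros, which aren't shown here), compare building a match step as list structure with building the equivalent XPath string. Both function names are hypothetical; the step syntax is the one from the examples earlier in this message.

```lisp
;; Hypothetical helper: build a step matching TD descendants that have
;; attribute ATTR, optionally with a specific VALUE. The result is plain
;; list structure, so no quoting or escaping is involved.
(defun cells-with-attr (attr &optional value)
  `(:desc "td" ,(if value
                    `(:@value ,attr ,value)
                    `(:@ ,attr))))

;; The string-based equivalent has to splice values into XPath syntax,
;; and silently produces a malformed expression if VALUE contains a
;; single quote.
(defun xpath-cells-with-attr (attr &optional value)
  (if value
      (format nil "//td[@~A='~A']" attr value)
      (format nil "//td[@~A]" attr)))

(cells-with-attr "colspan" "1")       ; a list that can be inspected or nested
(xpath-cells-with-attr "title" "it's") ; a string with unbalanced quotes
```

The list version composes with ordinary list operations and can be nested inside a larger definition without any knowledge of XPath's lexical rules.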
I hope I'm not suggesting anything patently obvious -- I'm
functionally new to Lisp (again, this being my second tour of duty,
after a long hiatus), so this may all be elementary to others.
Thanks for your time,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick at snowtide.com
http://snowtide.com