[cxml-devel] XPath in CXML

Chas Emerick cemerick at snowtide.com
Sun May 13 05:56:25 UTC 2007


Greetings,

I noticed a post by David Lichteblau on comp.lang.lisp mentioning
that an XPath implementation for CXML is on a todo list.

This is something I've had a run at, although where I ended up is
far from the XPath standard.  Rather than implementing XPath itself,
I implemented a "lispy" scheme for addressing sets of elements.  By
lispy, I mean that node-sets are addressed with a declarative DSL
rather than with strings.  Here's an example that yields the title
element in an XHTML document:

(xmatch :root "html" "head" "title")

Here's one that yields all of the TD elements in an XHTML document  
that have a colspan attribute:

(xmatch :root
         (:desc "td"
           (:@ "colspan")))

And here's one that yields the second TD element in each row of a  
table, only if it has a colspan of '1':

(xmatch "tr" (:child "td"
                (:@value "colspan" "1")))

The first two examples are context-free, thanks to the :root  
directive; the third would need to be run on a table element (or list  
thereof).

If it isn't already clear, xmatch is a macro that builds a closure  
that, when provided with an element, document, or list as context,  
returns the node-set that matches the XPath-esque definition.  This  
makes it very similar in style and usage to cl-ppcre:create-scanner.
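
To illustrate the build-once, call-many shape described above, here
is a toy sketch.  It is not the xmatch code: the names toy-match,
compile-step, children-named and has-attr-p are made up for this
example, nodes are plain lists of the form (name attribute-alist
child*) rather than CXML DOM nodes, and only string steps, :child and
:@ are handled.

```lisp
;; Toy node representation (an assumption, not CXML's DOM):
;;   (name attribute-alist child*)   ; text children are plain strings
(defun node-name (node) (first node))
(defun node-attrs (node) (second node))
(defun node-children (node) (remove-if-not #'consp (cddr node)))

(defun children-named (name nodes)
  "All child elements called NAME of every node in NODES, in order."
  (loop for node in nodes
        append (remove-if-not (lambda (child)
                                (string= name (node-name child)))
                              (node-children node))))

(defun has-attr-p (attr node)
  "True if NODE carries an attribute named ATTR."
  (assoc attr (node-attrs node) :test #'string=))

;; COMPILE-STEP runs at macroexpansion time, hence the EVAL-WHEN.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun compile-step (step)
    "Translate one step into a form mapping the variable NODES to a new list."
    (etypecase step
      (string `(children-named ,step nodes))
      (cons
       (destructuring-bind (axis name &rest predicates) step
         (assert (eq axis :child))      ; only :child in this toy
         (let ((form `(children-named ,name nodes)))
           (dolist (predicate predicates form)
             (ecase (first predicate)
               (:@ (setf form
                         `(remove-if-not
                           (lambda (node)
                             (has-attr-p ,(second predicate) node))
                           ,form)))))))))))

(defmacro toy-match (&rest steps)
  "Expand STEPS into a closure from a context (node or node list) to matches."
  `(lambda (context)
     ;; A single node starts with its name; a list of nodes does not.
     (let ((nodes (if (stringp (first context)) (list context) context)))
       ,@(loop for step in steps
               collect `(setf nodes ,(compile-step step)))
       nodes)))

;; Build the matcher once, then apply it to any context.
(defparameter *row*
  '("tr" ()
    ("td" (("colspan" . "2")) "a")
    ("td" () "b")))

(defparameter *spanning-cells* (toy-match (:child "td" (:@ "colspan"))))

(funcall *spanning-cells* *row*)
;; => (("td" (("colspan" . "2")) "a"))
```

The definition is walked once, at macroexpansion time, so each call
of the closure does only the tree traversal.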

It supports a subset of XPath predicates (such  
as :@, :@value, :index, and a few others) that I have needed in my  
application so far, but is by no means complete.

I'm passing this information on, not because I have code I'm ready to  
contribute at the moment (although that could be arranged given some  
time), but because I think this approach (while not standards- 
compliant) is superior to any potential "direct" XPath implementation  
for CXML.  Perhaps CXML could grow something like xmatch; in addition  
to it being used directly, a "proper" XPath implementation could be  
built on top of an xmatch-like facility.

I don't want to belabor the point, but this approach is far more
flexible, allows for a much richer set of predicates (including
custom ones), and doesn't confine the match definition to a flat
string -- sexps are good here for the same reasons they are good
elsewhere.  For example, I have a couple of other macros that
themselves generate xmatch definitions; XPath strings *can* be
generated dynamically, but that is a Dark Path (at least by my
standards).
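
As a sketch of what a macro that generates an xmatch definition might
look like: cells-with is a hypothetical name, xmatch is assumed to be
defined elsewhere, and nesting :@value under :desc is an assumption
extrapolated from the examples earlier in this message.

```lisp
;; Hypothetical helper macro; XMATCH itself is assumed to exist elsewhere.
(defmacro cells-with (attribute value)
  "Expand into an xmatch definition for TD elements whose ATTRIBUTE is VALUE."
  `(xmatch :root
           (:desc "td"
             (:@value ,attribute ,value))))

;; The generated definition can be inspected without xmatch loaded:
(macroexpand-1 '(cells-with "colspan" "1"))
;; => (XMATCH :ROOT (:DESC "td" (:@VALUE "colspan" "1")))
```

The definition is assembled with backquote as ordinary list
structure, so there is no string concatenation or escaping involved.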

I hope I'm not suggesting anything patently obvious -- I'm  
functionally new to Lisp (again, this being my second tour of duty,  
after a long hiatus), so this may all be elementary to others.

Thanks for your time,

Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction

cemerick at snowtide.com
http://snowtide.com
