[cxml-devel] Problems parsing HTML with embedded JS which itself embeds HTML

David Lichteblau david at lichteblau.com
Wed Dec 23 13:58:47 UTC 2009


Quoting Plamen . (plamen.usenet at gmail.com):
> http://yellow.local.ch/de/q?ext=1&name=&company=Berufsschule,+Fachschule&street=&city=&area=Bern+%28Kanton%29&phone=&suchen=Suchen#start=1
> 
> from which I need to extract some of the address/street/phone data. It
> seems that all HTML/XML parsers for CL can't parse it correctly, and
> most of the missing parts in the parsed representation are the ones
> that deal with the HTML source defining a JavaScript element
> which itself includes HTML as a string parameter in the embedded JS.
> Which is of course exactly the text which I need from the site :) Of

Please reduce the source code of the page to a self-contained example
and point out exactly which part of the document it is that gets
discarded.


d.