[closure-devel] forever stirring the tag soup
Ben Hyde
bhyde at pobox.com
Mon Jul 15 20:27:21 UTC 2013
Parsing a GitHub project page with chtml:parse never seems to finish; the attempt below is cut off by a 10-second timeout.
> (setf sgml::*parse-warn-level* 5)
> (let ((page (drakma:http-request "https://github.com/rss-sync/corpus")))
    (handler-case
        (bt:with-timeout (10)
          (chtml:parse page (make-instance 'hax:default-handler)))
      (condition (c) c)))
#<bordeaux-threads:timeout #x302009C80C6D>
Multiple <div> elements appearing inside a <thead> element are the root cause; the minimal case below reproduces the loop.
> (setf sgml::*parse-warn-level* 0)
0
> (chtml:parse "<table><thead><div>a</div><div>b</div></thead></table>" (make-instance 'hax:default-handler))
;; Parser warning: Line 1, column 26 : **** [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1, column 31 : **** [H] Saw </div> in thead -- ??? patched (</div> <div>) -> (<div> </div>)
;; Parser warning: Line 1, column 31 : **** [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1, column 32 : **** [H] Saw </div> in thead -- ??? patched (</div> <pcdata>) -> (<pcdata> </div>)
;; Parser warning: Line 1, column 32 : **** [-] Saw <pcdata> in thead -- nuked <pcdata>.
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
…
So far, I'm not clever enough to fix this.
- ben
P.S. Thanks for the awesome library.