[closure-devel] forever stirring the tag soup

Ben Hyde bhyde at pobox.com
Mon Jul 15 20:27:21 UTC 2013


Parsing a GitHub project page with chtml:parse never seems to finish. Here it is cut off by a 10-second timeout:

> (setf sgml::*parse-warn-level* 5) 
> (let ((page (drakma:http-request "https://github.com/rss-sync/corpus")))
    (handler-case
        (bt:with-timeout (10)
          (chtml:parse page (make-instance 'hax:default-handler)))
      (condition (c) c)))
#<bordeaux-threads:timeout #x302009C80C6D>

The root cause is multiple <div> elements appearing inside a <thead> element. A reduced test case:

> (setf sgml::*parse-warn-level* 0)
0
> (chtml:parse "<table><thead><div>a</div><div>b</div></thead></table>" (make-instance 'hax:default-handler))
;; Parser warning: Line 1,     column 26  : ****  [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1,     column 31  : ****  [H] Saw </div> in thead -- ??? patched (</div> <div>) -> (<div> </div>)
;; Parser warning: Line 1,     column 31  : ****  [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1,     column 32  : ****  [H] Saw </div> in thead -- ??? patched (</div> <pcdata>) -> (<pcdata> </div>)
;; Parser warning: Line 1,     column 32  : ****  [-] Saw <pcdata> in thead -- nuked <pcdata>.
;; Parser warning: Line 1,     column 38  : ****  [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1,     column 38  : ****  [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1,     column 38  : ****  [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
…
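
For anyone who wants to poke at this, here is a minimal sketch of a repro harness. The helper names thead-with-divs and parse-or-timeout are made up for illustration and are not part of closure-html. It assumes closure-html and bordeaux-threads are already loaded, and that bt:with-timeout is supported on your implementation.

```lisp
;; Assumption: closure-html and bordeaux-threads are already loaded,
;; e.g. via (ql:quickload '(:closure-html :bordeaux-threads)).

(defun thead-with-divs (n)
  "Hypothetical helper: return a table whose THEAD holds N <div> children."
  (with-output-to-string (out)
    (write-string "<table><thead>" out)
    (dotimes (i n)
      (format out "<div>~D</div>" i))
    (write-string "</thead></table>" out)))

(defun parse-or-timeout (html &optional (seconds 10))
  "Hypothetical helper: parse HTML as in the transcript above, but
return :TIMEOUT if the parse does not finish within SECONDS."
  (handler-case
      (bt:with-timeout (seconds)
        (chtml:parse html (make-instance 'hax:default-handler)))
    (bt:timeout () :timeout)))
```

Something like (parse-or-timeout (thead-with-divs 2) 5) should then exercise the same input shape as the reduced case above, and varying the count might help narrow down where the patch loop starts. With sgml::*parse-warn-level* at 0 the warnings will keep scrolling until the timeout fires, so raising it to 5 first, as in the first transcript, is probably wise.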

So far, I'm not clever enough to fix this.

 - ben

ps. Thanks for the awesome library.