From bobbie at ua.fm Sun Mar 10 18:13:41 2013
From: bobbie at ua.fm (Victor)
Date: Sun, 10 Mar 2013 20:13:41 +0200
Subject: [closure-devel] Problem parsing HTML from BBC
Message-ID:

Hello!

When trying to parse HTML from articles on BBC, like:

http://www.bbc.co.uk/news/world-africa-21734036

I get the following error:

prefix/URI mismatch for `xml' namespace

Backtrace is:

(6FA0924) : 0 (PRINT-CALL-HISTORY :CONTEXT NIL :PROCESS NIL :ORIGIN NIL :DETAILED-P NIL :COUNT 536870911 :START-FRAME-NUMBER 0 :STREAM # :PRINT-LEVEL 2 :PRINT-LENGTH 5 :SHOW-INTERNAL-FRAMES NIL :FORMAT :TRADITIONAL) 727
(6FA09D8) : 1 (PRINT-BACKTRACE-TO-STREAM #) 71
(6FA09F0) : 2 (GET-BACKTRACE) 311
(6FA0A24) : 3 (FUNCALL #'#<(:INTERNAL (HUNCHENTOOT:HANDLE-REQUEST (HUNCHENTOOT:ACCEPTOR HUNCHENTOOT:REQUEST)))> #) 95
(6FA0A3C) : 4 (SIGNAL #) 871
(6FA0A64) : 5 (%ERROR # (:FORMAT-CONTROL "prefix/URI mismatch for `xml' namespace" :FORMAT-ARGUMENTS NIL) 29262494) 111
(6FA0A78) : 6 (STP-ERROR "prefix/URI mismatch for `xml' namespace") 103
(6FA0A88) : 7 (RENAME-ATTRIBUTE # "xml" "") 287
(6FA0A9C) : 8 (MAKE-ATTRIBUTE "en-GB" "xml:lang" "") 303
(6FA0AC0) : 9 (FUNCALL #'#<#> # "http://www.w3.org/1999/xhtml" "div" "div" (# #)) 279
(6FA0AEC) : 10 (FUNCALL #'#<(:INTERNAL CLOSURE-HTML::RECURSE CLOSURE-HTML:SERIALIZE-PT)> #) 455
(6FA0B10) : 11 (FUNCALL #'#<(:INTERNAL CLOSURE-HTML::RECURSE CLOSURE-HTML:SERIALIZE-PT)> #) 559
(6FA0B40) : 12 (FUNCALL #'#<(:INTERNAL CLOSURE-HTML::RECURSE CLOSURE-HTML:SERIALIZE-PT)> #) 559
(6FA0B70) : 13 (FUNCALL #'#<(:INTERNAL CLOSURE-HTML::RECURSE CLOSURE-HTML:SERIALIZE-PT)> #) 559
(6FA0BA0) : 14 (SERIALIZE-PT # # :NAME "HTML" :PUBLIC-ID NIL :SYSTEM-ID NIL :DOCUMENTP T) 343
(6FA0BD8) : 15 (PARSE-XSTREAM # #) 263

The parse call in my program is:

    (closure-html:parse article-page (stp:make-builder))

As far as I understand, the problem has something to do with the xml prefix in attributes (here xml:lang, for the language), but I cannot work out how to fix it or work around it.
Could anybody please give me a hint about where to look for the problem and its solution?

Thanks,
Victor

From bobbie at ua.fm Wed Mar 13 13:04:01 2013
From: bobbie at ua.fm (Victor)
Date: Wed, 13 Mar 2013 15:04:01 +0200
Subject: [closure-devel] Problem parsing HTML from BBC
In-Reply-To:
References:
Message-ID:

On Sun, 10 Mar 2013 20:13:41 +0200, Victor wrote:

> When trying to parse HTML from articles on BBC, like:
>
> http://www.bbc.co.uk/news/world-africa-21734036
>
> I get the following error:
>
> prefix/URI mismatch for `xml' namespace

A brief search of the archives turned up a previous discussion of a similar problem:

http://lists.common-lisp.net/pipermail/closure-devel/2011-March/000108.html

As far as I understand, a proper and nice solution is still in the works.

Thanks,
Victor