[drakma-devel] Bug handling bad html?

Jeffrey Cunningham jeffrey at cunningham.net
Sun Feb 25 00:39:54 UTC 2007


On Sat Feb 24, 2007 at 09:47:15PM +0100, Edi Weitz wrote:
> My guess is that the website sends wrong content-type headers.  (Or,
> in other words, it claims to send UTF-8 but it doesn't.)  This is not
> unusual.  See the mailing list archive of the last weeks for similar
> problems and for workarounds.
> 
> If you still think this is a bug in FLEXI-STREAMS, send a simple,
> reproducible test case and point out where in the sequence of
> characters FLEXI-STREAMS thinks it's not UTF-8 anymore although it is.


I believe you are right: the site sends an incorrect content-type. This
gets it to work:

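;; Substitute a no-break space (U+00A0) for input that cannot be decoded.
(setf flexi-streams::*SUBSTITUTION-CHAR* (code-char #xA0))
;; Make FLEXI-STREAMS offer a USE-VALUE restart on decoding errors.
(setf flexi-streams::*PROVIDE-USE-VALUE-RESTART* t)
(http-request "http://www.gifttree.com/Christmas/Christmas-gift-idea.html")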


I read about the performance hit associated with making this the
default. But it raises some issues, at least for what I'm doing, which
is automating updates of information from some sites I have no control
over. In this case I set it to substitute one character for the 'bad'
character. Is it possible for a page to contain more than one bad
character? If so, how could that be handled?
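
As far as I understand the FLEXI-STREAMS documentation, the substitution
character is used every time a decoding error would otherwise be
signalled, so several bad characters in one page should all be replaced.
A minimal sketch that limits the setting to a single request by binding
the variables instead of setting them globally. It mirrors the two
variables used above; whether the second one is needed (or even exists)
depends on the FLEXI-STREAMS version, which is an assumption here.

```lisp
;; Sketch only: assumes DRAKMA and FLEXI-STREAMS are already loaded.
;; The bindings are undone when the LET form exits, so other requests
;; are unaffected.
(let ((flexi-streams::*substitution-char* (code-char #xA0))
      (flexi-streams::*provide-use-value-restart* t))
  (drakma:http-request
   "http://www.gifttree.com/Christmas/Christmas-gift-idea.html"))
```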

And more generally, shouldn't there be a way to configure Drakma so
that it may take a performance hit but is guaranteed not to die on any
HTML that is thrown at it?
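
One possible way to approximate that guarantee, offered as a sketch and
not as Drakma's recommended method: ask for the raw octets and decode
them as Latin-1, in which every octet maps to some character, so the
decoding step itself cannot fail. The function name FETCH-LENIENTLY is
hypothetical. The cost is that pages in other encodings come back with
wrong characters, and network or HTTP-level errors are still signalled.

```lisp
;; Sketch only: assumes DRAKMA and FLEXI-STREAMS are already loaded.
;; FETCH-LENIENTLY is a made-up helper name for illustration.
(defun fetch-leniently (url)
  "Return the body at URL as a string, ignoring the declared charset."
  ;; :FORCE-BINARY T makes HTTP-REQUEST return the body as octets
  ;; instead of decoding it according to the content-type header.
  (let ((octets (drakma:http-request url :force-binary t)))
    ;; Latin-1 assigns a character to each of the 256 octet values.
    (flexi-streams:octets-to-string octets :external-format :latin-1)))
```

For example, (fetch-leniently "http://www.gifttree.com/Christmas/Christmas-gift-idea.html")
would return the page as a string no matter what charset it claims.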

Thanks,

--Jeff
