[drakma-devel] drakma vs. http://popurls.com
Edi Weitz
edi at agharta.de
Tue Jan 30 07:52:45 UTC 2007
On Mon, 29 Jan 2007 18:20:17 -0800, Chris Dean <ctdean at sokitomi.com> wrote:
> The problem is that I regularly download web pages and many of them
> are poorly formed. I'd like my software to be permissive and return
> something reasonable.
Sure, I agree.
> Drakma is nicely designed and I'd like to keep using it. If I were
> to add this "feature" of less-strict UTF-8 where should I do that?
>
> I could modify (define-char-reader (stream flexi-utf-8-input-stream)
> ...) in some clever way I suppose.
My hope is that FLEXI-STREAMS is already "flexible" enough to deal
with this:
CL-USER 22 > (drakma:http-request "http://zappa.agharta.de/test.html")
Error: Unexpected value #xF6 in UTF-8 sequence.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.
Type :b for backtrace, :c <option number> to proceed, or :? for other options
CL-USER 23 : 1 > :a
CL-USER 24 > (defun use-replacement-char (condition)
               (declare (ignore condition))
               (use-value #.(code-char 65533)))
USE-REPLACEMENT-CHAR
CL-USER 25 > (let ((flex:*provide-use-value-restart* t))
               (handler-bind ((flex:flexi-stream-encoding-error #'use-replacement-char))
                 (drakma:http-request "http://zappa.agharta.de/test.html")))
"<html>
<body>
This is not really UTF-8: ��
</body>
</html>
"
200
((:DATE . "Tue, 30 Jan 2007 07:47:59 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
#<URI http://zappa.agharta.de/test.html>
#<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 226B80FB>
T
CL-USER 26 > (let ((flex:*provide-use-value-restart* t)
                   (flex:*substitution-char* #\?))
               (drakma:http-request "http://zappa.agharta.de/test.html"))
"<html>
<body>
This is not really UTF-8: ??
</body>
</html>
"
200
((:DATE . "Tue, 30 Jan 2007 07:50:30 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
#<URI http://zappa.agharta.de/test.html>
#<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 2263F957>
T
http://weitz.de/flexi-streams/#*provide-use-value-restart*
http://weitz.de/flexi-streams/#*substitution-char*
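In case the interplay of HANDLER-BIND and USE-VALUE in the first example is unfamiliar, here is a minimal sketch of the same pattern in plain Common Lisp, with no libraries involved. All names in it (BAD-OCTET-ERROR, DECODE-ASCII, DECODE-LENIENTLY) are hypothetical and made up for illustration; this is not how FLEXI-STREAMS is implemented, only the general shape of a decoder that offers a USE-VALUE restart and a caller that picks it.

```lisp
;; Hypothetical names throughout; this is NOT FLEXI-STREAMS code,
;; just an illustration of the restart mechanism.

(define-condition bad-octet-error (error)
  ((octet :initarg :octet :reader bad-octet))
  (:report (lambda (condition stream)
             (format stream "Unexpected value #x~X."
                     (bad-octet condition)))))

(defun decode-ascii (octets)
  "Decode a list of octets as ASCII.  For an octet above 127, signal
BAD-OCTET-ERROR with a USE-VALUE restart that supplies a character."
  (coerce (loop for octet in octets
                collect (if (< octet 128)
                            (code-char octet)
                            (restart-case
                                (error 'bad-octet-error :octet octet)
                              (use-value (char)
                                :report "Supply a character to use instead."
                                char))))
          'string))

(defun decode-leniently (octets &optional (substitute #\?))
  "Call DECODE-ASCII, replacing every bad octet with SUBSTITUTE."
  (handler-bind ((bad-octet-error
                   (lambda (condition)
                     (declare (ignore condition))
                     ;; Invoke the restart offered by DECODE-ASCII.
                     (use-value substitute))))
    (decode-ascii octets)))

;; (decode-leniently '(72 105 246)) returns "Hi?"
;; (decode-ascii '(72 105 246)) without a handler lands in the debugger.
```

The handler runs while the decoder is still on the stack, so decoding simply continues with the substituted character instead of unwinding.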
Does that help?
Cheers,
Edi.