[drakma-devel] drakma vs. http://popurls.com

Edi Weitz edi at agharta.de
Tue Jan 30 07:52:45 UTC 2007


On Mon, 29 Jan 2007 18:20:17 -0800, Chris Dean <ctdean at sokitomi.com> wrote:

> The problem is that I regularly download web pages and many of them
> are poorly formed.  I'd like my software to be permissive and return
> something reasonable.

Sure, I agree.

> Drakma is nicely designed and I'd like to keep using it.  If I were
> to add this "feature" of less-strict UTF-8, where should I do that?
>
> I could modify (define-char-reader (stream flexi-utf-8-input-stream)
> ...)  in some clever way I suppose.

My hope is that FLEXI-STREAMS is already "flexible" enough to deal
with this:

  CL-USER 22 > (drakma:http-request "http://zappa.agharta.de/test.html")

  Error: Unexpected value #xF6 in UTF-8 sequence.
    1 (abort) Return to level 0.
    2 Return to top loop level 0.

  Type :b for backtrace, :c <option number> to proceed,  or :? for other options

  CL-USER 23 : 1 > :a

  CL-USER 24 > (defun use-replacement-char (condition)
                 (declare (ignore condition))
                 (use-value #.(code-char 65533)))
  USE-REPLACEMENT-CHAR

  CL-USER 25 > (let ((flex:*provide-use-value-restart* t))
                 (handler-bind ((flex:flexi-stream-encoding-error #'use-replacement-char))
                   (drakma:http-request "http://zappa.agharta.de/test.html")))
  "<html>
    <body>
      This is not really UTF-8: ��
    </body>
  </html>
  "
  200
  ((:DATE . "Tue, 30 Jan 2007 07:47:59 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
  #<URI http://zappa.agharta.de/test.html>
  #<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 226B80FB>
  T

  CL-USER 26 > (let ((flex:*provide-use-value-restart* t)
                     (flex:*substitution-char* #\?))
                 (drakma:http-request "http://zappa.agharta.de/test.html"))
  "<html>
    <body>
      This is not really UTF-8: ??
    </body>
  </html>
  "
  200
  ((:DATE . "Tue, 30 Jan 2007 07:50:30 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
  #<URI http://zappa.agharta.de/test.html>
  #<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 2263F957>
  T

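The first transcript binds *PROVIDE-USE-VALUE-RESTART* so that a
USE-VALUE restart is available and the handler can supply #\U+FFFD.
The second binds *SUBSTITUTION-CHAR* as well, so the substitute is used
without any handler. See the FLEXI-STREAMS documentation:

http://weitz.de/flexi-streams/#*provide-use-value-restart*
http://weitz.de/flexi-streams/#*substitution-char*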

Does that help?

Cheers,
Edi.


