[drakma-devel] charset errors question

Jeff Cunningham jeffrey at jkcunningham.com
Mon Sep 24 17:01:47 UTC 2012


I've been running into some trouble using drakma to retrieve pages from 
certain commercial websites. It is very likely the HTML they are 
generating is broken one way or another. But the problem still remains 
as to how one can retrieve their pages using drakma.

For example, if you try this simple case:

    (http-request "http://www.walmart.com")

It will display the following:

WARNING: Problems determining charset (falling back to binary):
Corrupted Content-Type header:
Read character #\;, but expected #\=.

And the returned body is a vector of octets rather than a string. It can 
be decoded into a string, of course, but that is inconvenient to say the least.
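
For reference, the manual conversion can be sketched roughly as follows. This is only a minimal sketch: it assumes Quicklisp is available, body-as-string is a hypothetical helper name (not part of Drakma), and Latin-1 is chosen only because it maps every octet to some character, not because it is known to be the page's real encoding.

```lisp
;; Assumes Quicklisp is installed; loads Drakma and FLEXI-STREAMS.
(ql:quickload '(:drakma :flexi-streams) :silent t)

;; BODY-AS-STRING is a hypothetical helper, not part of Drakma.
(defun body-as-string (body &optional (external-format :latin-1))
  "Return BODY unchanged if it is already a string; otherwise decode the
octet vector with EXTERNAL-FORMAT. Latin-1 maps every octet to a
character, so decoding does not fail, although non-ASCII characters may
come out wrong if the page really uses another encoding."
  (if (stringp body)
      body
      (flexi-streams:octets-to-string body :external-format external-format)))

;; Usage (needs network access):
;; (body-as-string (drakma:http-request "http://www.walmart.com"))
```

The warning is still printed with this approach; the helper only deals with the octets that come back.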

Often the problem is that their metatag for the charset is simply wrong. 
Sometimes I can figure out what it is and supply this information, like 
this:

    (http-request "http://www.walmart.com" :external-format-in :UTF-8)

and it will solve the problem. But this particular example does not lend 
itself to this, at least using the following charsets:

  :UTF-8
  :UTF-7
  :iso-8859-1
  :iso-8859-2
  :iso-8859-3
  :iso-8859-4
  :iso-8859-5
  :iso-8859-6
  :iso-8859-7
  :iso-8859-8
  :iso-8859-9
  :BIG5
  :US-ASCII
  :UTF-16
  :UTF-32

I have no idea what their server is actually sending - it appears to be 
invalid for any of these charsets.
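
Since the warning complains about the Content-Type header rather than the body bytes, one way to see what the server sends is to look at that header directly. The following is a sketch under assumptions: raw-content-type is a hypothetical helper name, and it relies on http-request returning the response headers as its third value.

```lisp
;; Assumes Quicklisp is installed.
(ql:quickload :drakma :silent t)

;; RAW-CONTENT-TYPE is a hypothetical helper, not part of Drakma.
(defun raw-content-type (url)
  "Fetch URL and return the Content-Type header value as Drakma reports it."
  ;; :FORCE-BINARY T asks Drakma to return the body as octets
  ;; instead of trying to decode it.
  (multiple-value-bind (body status-code headers)
      (drakma:http-request url :force-binary t)
    (declare (ignore body status-code))
    ;; HEADERS is an alist keyed by keywords such as :CONTENT-TYPE.
    (drakma:header-value :content-type headers)))

;; Usage (needs network access):
;; (raw-content-type "http://www.walmart.com")
```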

Is there any way to get around this problem?

Best regards,
Jeff Cunningham