<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I've been running into some trouble using drakma to retrieve pages
from certain commercial websites. It is very likely that the HTML
they generate is broken in one way or another, but the problem
remains: how can one retrieve their pages using drakma? <br>
<br>
For example, if you try this simple case:<br>
<blockquote>(http-request <a class="moz-txt-link-rfc2396E" href="http://www.walmart.com">"http://www.walmart.com"</a>)<br>
</blockquote>
It will display the following: <br>
<br>
WARNING: Problems determining charset (falling back to binary):<br>
Corrupted Content-Type header:<br>
Read character #\;, but expected #\=.<br>
<br>
And the returned body is a vector of octets rather than a string. It
can be converted to a string, of course, but that is inconvenient to
say the least. <br>
<br>
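For reference, the conversion I mean looks something like this
(using flexi-streams, which drakma already depends on; here
<tt>body</tt> stands for the octet vector returned by http-request):<br>
<blockquote>; BODY is the octet vector returned by http-request<br>
(flexi-streams:octets-to-string body :external-format :latin-1)<br>
</blockquote>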
Often the problem is that their meta tag for the charset is simply
wrong. Sometimes I can figure out what the charset actually is and
supply it explicitly, like this: <br>
<blockquote>(http-request <a class="moz-txt-link-rfc2396E" href="http://www.walmart.com">"http://www.walmart.com"</a>
:external-format-in :UTF-8)<br>
</blockquote>
and that solves the problem. But this particular example does not
yield to that approach, at least not with any of the following
charsets:<br>
<br>
:utf-8<br>
:utf-7<br>
:iso-8859-1<br>
:iso-8859-2<br>
:iso-8859-3<br>
:iso-8859-4<br>
:iso-8859-5<br>
:iso-8859-6<br>
:iso-8859-7<br>
:iso-8859-8<br>
:iso-8859-9<br>
:big5<br>
:us-ascii<br>
:utf-16<br>
:utf-32<br>
<br>
I have no idea what their server is actually sending; it appears to
be invalid in all of these charsets.<br>
<br>
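The only workaround I have found so far is to bypass charset
detection entirely with <tt>:force-binary t</tt> and decode the
octets by hand as latin-1, which maps every possible byte value and
therefore cannot signal a decoding error (though non-Latin-1 text
comes out mangled). Something like this, again using flexi-streams:<br>
<blockquote>; :force-binary t skips drakma's charset handling altogether<br>
(flexi-streams:octets-to-string<br>
&nbsp;(http-request <a class="moz-txt-link-rfc2396E" href="http://www.walmart.com">"http://www.walmart.com"</a> :force-binary t)<br>
&nbsp;:external-format :latin-1)<br>
</blockquote>
But that is hardly satisfying. <br>
<br>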
Is there any way to get around this problem? <br>
<br>
Best regards,<br>
Jeff Cunningham<br>
</body>
</html>