[tbnl-announce] New version 0.8.7 (Was: Content length with multibyte character encodings)
Edi Weitz
edi at agharta.de
Tue Nov 29 09:14:59 UTC 2005
On Tue, 29 Nov 2005 00:18:08 +0200, Ignas Mikalajunas <ignas.mikalajunas at gmail.com> wrote:
> Content length is calculated by calling (length content) which
> produces wrong results with unicode characters in the string. Piso
> on #lisp proposed a solution - using (length (string-to-octets
> string :external-format :utf-8)) which translates to just (length
> (string-to-octets string :external-format)) in the code.
I won't do that because it's most likely a terrible performance hog if
you convert each page to octets by default (assuming that most users
already send octets).
I also don't understand why
  (length (string-to-octets string :external-format :utf-8))
translates to
  (length (string-to-octets string :external-format))
> The true way to solve this would be using (file-string-length),
> but the function is not working properly on sbcl yet.
Huh? How is that supposed to work (even if it worked on SBCL)?
*TBNL-STREAM* is a binary stream which accepts octets, isn't it?
> So could you please fix the (send-output),
IMHO there's nothing to "fix" because TBNL works as expected. The
docs clearly say that you're supposed to send octets, see for example
here:
<http://weitz.de/tbnl/#quirks>
Note that the UTF-8 example that comes with TBNL sends a correct
header.
FWIW, I've just released a new version where you can manually set the
CONTENT-LENGTH slot of the REPLY object. If it is not NIL, TBNL won't
bother to compute the content length so you can set it to any value
you want. Note, though, that you'll run into trouble
w.r.t. TBNL/Apache interaction if you set a wrong value there.
> because with the current setup browsers that strictly adhere to the
> content-length (IE 6.0, Opera) would trim 1 character of the
> response's body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
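A character encoded as UTF-8 takes between one and four octets, so the
difference between (length content) and the number of octets actually
sent is not simply one per character. As a minimal sketch only
(UTF-8-OCTET-LENGTH is a made-up name, not part of TBNL, and it assumes
that CHAR-CODE returns Unicode code points on your Lisp), the octets
can be counted like this:

```lisp
;; Hypothetical helper, NOT part of TBNL: count the octets a string
;; would occupy when encoded as UTF-8, without building the octet
;; vector. Assumes CHAR-CODE returns Unicode code points.
(defun utf-8-octet-length (string)
  (loop for char across string
        for code = (char-code char)
        sum (cond ((< code #x80) 1)       ; ASCII range: one octet
                  ((< code #x800) 2)      ; e.g. Latin-1 letters: two octets
                  ((< code #x10000) 3)    ; rest of the BMP: three octets
                  (t 4))))                ; beyond the BMP: four octets
```

For "abc" this returns 3, the same as LENGTH, while a string holding
only the character with code #xE4 has a LENGTH of 1 but occupies two
octets.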
Cheers,
Edi.