[tbnl-devel] Re: New version 0.8.7 (Was: Content length with multibyte character encodings)

Ignas Mikalajunas ignas.mikalajunas at gmail.com
Tue Nov 29 11:24:44 UTC 2005


> On Tue, 29 Nov 2005 00:18:08 +0200, Ignas Mikalajunas <ignas.mikalajunas at gmail.com> wrote:
>
> >   Content length is calculated by calling (length content) which
> > produces wrong results with unicode characters in the string. Piso
> > on #lisp proposed a solution - using (length (string-to-octets
> > string :external-format :utf-8)) which translates to just (length
> > (string-to-octets string :external-format)) in the code.
>
> I won't do that because it's most likely a terrible performance hog if
> you convert each page to octets by default (assuming that most users
> already send octets).

Sorry, I was not aware of that. If I understand you correctly, the right
way is to convert all of my pages (they are all UTF-8) to octets
before sending them to tbnl?

> I also don't understand why
>
>   (length (string-to-octets string :external-format :utf-8))
>
> translates to
>
>   (length (string-to-octets string :external-format))

Because the first one is cl-user:string-to-octets and the second one is
tbnl:string-to-octets.

> > because with the current setup browsers that strictly adhere to the
> > Content-Length (IE 6.0, Opera) would trim 1 character of the
> > response's body for each UTF-8 character in it.
>
> Nope, that's not how UTF-8 works.

What I meant was:
(length "ąčęė") returns 4,
whereas (length (string-to-octets "ąčęė")) returns 8.
This means that tbnl would send an 8-octet body with a Content-Length
of 4, and IE/Opera would display only "ąč". That's how it works on
SBCL.
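
A minimal sketch of the mismatch, assuming SBCL with Unicode support. It
uses the SBCL extensions sb-ext:string-to-octets and
sb-ext:octets-to-string rather than the tbnl function of the same name,
and the variable names *body* and *octets* are only illustrative:

```lisp
;; Sketch for SBCL only: sb-ext:string-to-octets and
;; sb-ext:octets-to-string are SBCL extensions, not portable Common Lisp.
(defparameter *body*
  ;; The string "ąčęė", built from code points so that the encoding
  ;; of this source file does not matter.
  (map 'string #'code-char '(#x105 #x10D #x119 #x117)))

(defparameter *octets*
  (sb-ext:string-to-octets *body* :external-format :utf-8))

(length *body*)    ; 4 characters
(length *octets*)  ; 8 octets, two per character in UTF-8

;; A client that stops reading after Content-Length octets keeps only
;; the first four octets, which decode to the first two characters.
(sb-ext:octets-to-string (subseq *octets* 0 (length *body*))
                         :external-format :utf-8)  ; "ąč"
```

The character count and the octet count agree only for pure ASCII
content, which is presumably why the problem goes unnoticed with
English-only pages.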

  Ignas

