[tbnl-devel] Re: New version 0.8.7

Edi Weitz edi at agharta.de
Tue Nov 29 14:58:30 UTC 2005


On Tue, 29 Nov 2005 13:24:44 +0200, Ignas Mikalajunas <ignas.mikalajunas at gmail.com> wrote:

> Sorry, I was not aware of that. If I understand you correctly, the
> right way is to convert all of my pages (they are all UTF-8) to
> octets before sending them to TBNL?

Yep.  Either that or (with the new version) figure out the octet
length by other, less expensive means and set it directly - if
SBCL allows you to return an arbitrary Unicode string to TBNL.
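
As a rough illustration of what "less expensive means" could look
like, here is a minimal sketch that counts UTF-8 octets from the
character codes without building the octet vector.  The function name
UTF-8-OCTET-LENGTH is hypothetical (it is not part of TBNL or SBCL),
and the sketch assumes a Lisp such as SBCL whose characters are full
Unicode code points; how the result is then handed to TBNL is not
shown here.

```lisp
;; Hypothetical helper, not part of TBNL or SBCL.
;; Assumes CHAR-CODE returns a Unicode code point (true for SBCL
;; built with Unicode support).
(defun utf-8-octet-length (string)
  "Return the number of octets STRING would occupy when encoded as UTF-8."
  (loop for char across string
        for code = (char-code char)
        sum (cond ((< code #x80) 1)      ; ASCII range
                  ((< code #x800) 2)     ; e.g. Latin letters with diacritics
                  ((< code #x10000) 3)   ; rest of the Basic Multilingual Plane
                  (t 4))))               ; supplementary planes

(utf-8-octet-length "hello")  ; => 5
(utf-8-octet-length "ąčęė")   ; => 8
```

This walks the string once and allocates nothing, which is the only
advantage it claims over encoding the string and taking its length.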

>> I also don't understand why
>>
>>   (length (string-to-octets string :external-format :utf-8))
>>
>> translates to
>>
>>   (length (string-to-octets string :external-format))
>
> Because the first one is cl-user:string-to-octets and the second one
> is tbnl:string-to-octets.

Ah, OK.  Assuming you meant :UTF-8 instead of :EXTERNAL-FORMAT in the
second form it's rather the other way around, though.  A call to
TBNL::STRING-TO-OCTETS will be translated to a call to the
corresponding function in SB-EXT.

>> > because with the current setup browsers that strictly adhere to the
>> > Content-Length (IE 6.0, Opera) would trim 1 character off the
>> > response body for each UTF-8 character in it.
>>
>> Nope, that's not how UTF-8 works.
>
> What I meant was:
> (length "ąčęė") returns 4
> though (length (string-to-octets "ąčęė")) is 8.
> Which means that TBNL would try to fit an 8-octet body into a
> content length of 4 and IE/Opera would display that as "ąč". That's
> how it works on SBCL.

But you said "each" character.  UTF-8 is a variable-length encoding
where one character takes from one to four octets (per RFC 3629; the
original definition allowed up to six).  For example, if your
characters are all within the ASCII charset you won't lose any octets
at all.
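
To make the variable-length point concrete, here is a small sketch
comparing character count and octet count for a few strings.  It
assumes SBCL with Unicode support, since SB-EXT:STRING-TO-OCTETS is
SBCL-specific; the helper name CHAR-AND-OCTET-COUNTS is made up for
this example.

```lisp
;; Hypothetical helper for illustration; assumes SBCL with Unicode support.
(defun char-and-octet-counts (string)
  "Return two values: the character count and the UTF-8 octet count of STRING."
  (values (length string)
          (length (sb-ext:string-to-octets string :external-format :utf-8))))

(char-and-octet-counts "hello")  ; => 5, 5  (ASCII only, nothing is lost)
(char-and-octet-counts "ąčęė")   ; => 4, 8  (two octets per character)
(char-and-octet-counts "€")      ; => 1, 3  (three octets for one character)
```

So the difference between the two lengths depends on which characters
the string contains, not simply on how many there are.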

Cheers,
Edi.


