[Bese-devel] character issues. aka: http is a binary protocol, get over it.

Marco Baringer mb at bese.it
Thu Dec 15 13:50:13 UTC 2005


[this is directed to those of you who are using ucw with non-latin-1
charsets.]

i'm currently trying to make ucw do the Right Thing(TM) with regard to
non-ascii characters. the first change, and the one i've almost
finished, is a large patch to rfc2388 which makes it work on
unsigned-byte streams (instead of character streams). this moves the
encoding problems out of rfc2388 (which shouldn't have to care about
them) and into the application (where we have enough information to
decide how to handle them)[1].
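
to illustrate the idea of keeping the parser on bytes and decoding only
in the application, here is a minimal sketch in python (not ucw's lisp
code). the names charset_of and decode_field are hypothetical, and the
latin-1 fallback is an assumption made for the example, not something
the patch specifies.

```python
from email.message import Message


def charset_of(content_type, default="latin-1"):
    """Return the charset parameter of a Content-Type value.

    The default used when no charset is given is an assumption
    made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_field(raw, content_type):
    """Decode raw form bytes in the application, not in the parser."""
    return raw.decode(charset_of(content_type))


# The same byte means different characters under latin-9 and latin-1.
euro = decode_field(b"\xa4", "text/plain; charset=ISO-8859-15")
currency = decode_field(b"\xa4", "text/plain; charset=ISO-8859-1")
```

the parser never has to guess: it hands over the bytes plus the
declared content-type, and the caller picks the decoding.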

once rfc2388 is "fixed" i'll move on to the httpd (and therefore
mod_lisp) backend and start treating the input/output streams as
unsigned-byte streams[2]. at this point much user code will break.
i'll try to make things as backward compatible as possible, but it's
still going to be painful, though i'm pretty sure it'll be worth it
in the end.

it would be helpful for me if i could get copies of the data people
are trying to send back and forth (especially non-latin-1 stuff). if
anyone could send me some example forms and files containing latin-9
(or whatever) data so i can test things out, that would be
great. even just an explanation of what you're trying to do (and
what currently doesn't work without ugly hacks) would be enough.

[1] - as a side effect of this i'll finally fix the large file
problem. if we attempt to parse more than N bytes of data in a form
submit we'll open a file and put the data there. the value passed to
user code will always be a _binary_ stream (even if the data was small
enough to be kept in memory) and the user can do what they want with
it. users will have access to the content-type and charset parameters
passed, plus a set of utility functions for encoding/decoding data.
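
the "spill to a file past N bytes, but always hand back a binary
stream" behaviour can be sketched in python with the standard
tempfile.SpooledTemporaryFile. this is only an illustration of the
technique, not the rfc2388 patch itself; store_part and the 4-byte
threshold are hypothetical names and values chosen for the example.

```python
import tempfile


def store_part(chunks, max_in_memory):
    """Collect byte chunks into a binary stream.

    Data stays in memory until it grows past max_in_memory bytes,
    then it is moved to a temporary file. Either way the caller
    gets a binary file-like object positioned at the start.
    """
    stream = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    for chunk in chunks:
        stream.write(chunk)
    stream.seek(0)
    return stream


small = store_part([b"ab"], max_in_memory=4)
large = store_part([b"abcd", b"efgh"], max_in_memory=4)
small_data = small.read()
large_data = large.read()
```

user code reads both streams the same way and does not need to know
which one ended up on disk.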

[2] - added bonus: sending images and pdf files as responses will
become trivial.
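
a rough python sketch of why byte-oriented output makes binary
responses simple: the header is ascii text encoded to bytes, and the
body is appended untouched. build_response is a hypothetical helper
for illustration only and says nothing about ucw's actual response
api.

```python
def build_response(content_type, body):
    """Build a complete HTTP response as bytes.

    Only the header block is encoded; the body bytes are passed
    through unchanged, so images and pdf files need no special case.
    """
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


png_magic = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of any png file
response = build_response("image/png", png_magic)
```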

-- 
-Marco
Ring the bells that still can ring.
Forget the perfect offering.
There is a crack in everything.
That's how the light gets in.
	-Leonard Cohen


