[Bese-devel] Re: character issues. aka: http is a binary protocol, get over it.

Marco Baringer mb at bese.it
Thu Dec 15 18:18:57 UTC 2005


Maciek Pasternacki <maciekp at japhy.fnord.org> writes:

> As for Latin-1 characters, in practice it allows a page to use any
> octets, which are passed as `cookies' without meaning or charset (in
> Latin-1 all octets are legal, so when I write <input name="Здравствуйте">
> in UTF-8-encoded source, the input field name, as far as the RFC is
> concerned, will just be the Latin-1 string "ÐÐ´ÑÐ°Ð²ÑÑÐ²ÑÐ¹ÑÐµ").
> Browsers will just send byte-for-byte copies of the field name as
> seen in the HTML source, and everything will be correct.
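A minimal sketch of why the byte-for-byte pass-through described above is lossless, assuming the page source is UTF-8 and the server side treats the field name as opaque Latin-1 octets. Latin-1 maps each of the 256 byte values to exactly one character, so reading UTF-8 bytes "as Latin-1" can always be reversed. The variable names are illustrative only.

```python
# Latin-1 maps every byte value 0x00-0xFF to exactly one code point,
# so decoding arbitrary bytes as Latin-1 never fails and never loses data.
name = "Здравствуйте"

utf8_bytes = name.encode("utf-8")          # octets as they appear in the UTF-8 HTML source
as_latin1 = utf8_bytes.decode("latin-1")   # the same octets viewed as a Latin-1 string

# One Latin-1 character per octet. Some of them are invisible C1 control
# characters (e.g. U+0097 from the first letter), so the printed string
# looks shorter than it really is.
print(len(utf8_bytes), len(as_latin1))     # 24 24

# Round trip: the browser returns the same octets, and the server can
# re-interpret them as UTF-8 to recover the original field name.
recovered = as_latin1.encode("latin-1").decode("utf-8")
print(recovered == name)                   # True
```

The key design point is that nothing in the path has to understand UTF-8 until the very end; only the final `decode("utf-8")` assigns meaning to the octets.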

this is good to know. however, what happens when one of the encoded
characters, viewed as a byte sequence, conflicts with the standard
application/x-www-form-urlencoded markers? for example, if the code
point 0626 (arabic yeh with hamza) were sent as the raw bytes 06 26,
it would parse as the control character 06 plus a #\& character, which
would confuse my current parser greatly.
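A quick byte-level check of the collision concern, as a sketch and not UCW's actual parser. In UTF-8, U+0626 is encoded as D8 A6, not 06 26; the pair 06 26 only appears in a 16-bit encoding such as UTF-16BE. Every byte of a multi-byte UTF-8 sequence is at least 0x80, so it can never equal `&` (0x26), `=` (0x3D), `%` (0x25) or `+` (0x2B). That means a parser can split the raw body on the ASCII delimiters first and decode each field afterwards. `parse_form` and its `charset` parameter are hypothetical names chosen for illustration.

```python
from urllib.parse import unquote_to_bytes

ch = "\u0626"  # ARABIC LETTER YEH WITH HAMZA ABOVE

print(ch.encode("utf-8").hex(" "))      # d8 a6  (what UTF-8 actually sends)
print(ch.encode("utf-16-be").hex(" "))  # 06 26  (second byte is ASCII '&')

# Non-ASCII text encoded as UTF-8 never produces a byte below 0x80.
sample = "Здравствуйте\u0626"
assert all(b >= 0x80 for b in sample.encode("utf-8"))


def parse_form(body: bytes, charset: str = "utf-8") -> list[tuple[str, str]]:
    """Hypothetical urlencoded parser: split on ASCII delimiters, then decode."""
    pairs = []
    for chunk in body.split(b"&"):
        if not chunk:
            continue
        key, _, value = chunk.partition(b"=")
        # '+' means space; replace it before unescaping so %2B stays a literal '+'.
        key = unquote_to_bytes(key.replace(b"+", b" "))
        value = unquote_to_bytes(value.replace(b"+", b" "))
        pairs.append((key.decode(charset), value.decode(charset)))
    return pairs
```

Usage: `parse_form(b"name=\xd8\xa6&x=1")` and `parse_form(b"name=%D8%A6&x=1")` both yield `[("name", "\u0626"), ("x", "1")]`, because the delimiters are found before any charset decoding happens.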

-- 
-Marco
Ring the bells that still can ring.
Forget the perfect offering.
There is a crack in everything.
That's how the light gets in.
	-Leonard Cohen


