[Bese-devel] Re: rfc2388

Mon Jul 17 21:05:05 UTC 2006

Lou Vanek <vanek at acd.net> writes:

> i don't see how sending rfc2388 code to an http server project helps,
> but if i explained all the problems i'm having with their choice
> of streams that may be of some use.
>
> on second thought, maybe you're right. Maybe they'll add rfc2388 support
> directly into the http server. I didn't get the impression that araneida
> was maintained any more, though.

the problem revolves around the encoding of the data. let's say you
have a form uploading a picture: if clisp tries to treat this as utf-8
encoded data you're going to get corruption. the only real solution is
to treat the input as binary, explicitly manage the decoding of the
various parts of the data, and convert each part only when you have
enough info to do it properly.
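
to make that concrete, here is a minimal sketch (in python rather than
lisp, purely for illustration). the sample bytes and the `decode_part`
helper are made up for this example and are not part of rfc2388 or
araneida. it shows that decoding arbitrary upload bytes as utf-8 fails,
while keeping the bytes and decoding only the parts whose charset is
known is safe.

```python
# first bytes of a JPEG file; byte 0xFF is never valid in utf-8
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

def decode_part(raw, charset=None):
    """Hypothetical helper: return text only when a charset is known,
    otherwise hand back the untouched bytes."""
    if charset is None:
        return raw
    return raw.decode(charset)

# treating the picture as utf-8 text does not work
try:
    image_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# binary part stays binary, text part is decoded with its declared charset
picture = decode_part(image_bytes)
caption = decode_part("caf\u00e9".encode("iso-8859-1"), "iso-8859-1")
```

python raises an error here; a lisp that substitutes or drops bytes
instead would corrupt the picture silently, which is worse.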

>>>The binary parser cannot handle pure-unix line endings,
>>>and i believe the binary parser requires two dashes surrounding
>>>the boundary string.
>> the binary parser doesn't know what a line ending is. it can handle
>> CRLF sequences just fine.
>
> but the state machine spends quite a bit of time dealing with line-end
> characters (or at least integers 13s and 10s) when parsing the mime headers.
> the mime headers that i was getting back from ff were getting lost in
> this state machine and (most of the time) not reaching the final state.

yeah, since the first thing in a boundary line is a CRLF sequence the
state machine spends the vast majority of its time looking for
boundaries.
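
a rough sketch of that byte-level boundary scan, in python for
illustration only. `split_parts` and the sample body are assumptions
made up for this example, not the actual rfc2388 state machine. it
looks for CRLF followed by two dashes and the boundary, and stops at
the boundary that also carries two trailing dashes.

```python
def split_parts(body, boundary):
    """Split a multipart body (bytes) on CRLF + "--" + boundary."""
    delimiter = b"\r\n--" + boundary
    # prepend CRLF so the very first boundary line matches the same pattern
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:            # chunks[0] is the (empty) preamble
        if chunk.startswith(b"--"):     # trailing "--" marks end of data
            break
        # drop the CRLF that terminates the boundary line itself
        parts.append(chunk[2:] if chunk.startswith(b"\r\n") else chunk)
    return parts

body = (b"--XyZ\r\n"
        b"Content-Disposition: form-data; name=\"a\"\r\n\r\n"
        b"one\r\n"
        b"--XyZ\r\n"
        b"Content-Disposition: form-data; name=\"b\"\r\n\r\n"
        b"\r\n\xff\r\n"
        b"--XyZ--\r\n")
parts = split_parts(body, b"XyZ")
```

note that the second part's payload contains a CRLF and a 0xFF byte and
comes through untouched, since nothing here ever interprets the bytes
as characters. each part still holds its headers followed by a blank
line and then the payload.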

> it's just my opinion, but i don't think it's too much of a stretch
> to support unix line endings in addition to DOS. But that's only
> my opinion. It seems to work for me on windows on my wacky setup.

once you've decided to treat the data as characters then you're
right. my problem is with treating the data as chars in the first place.

>>>clisp coalesces <cr><nl> into just <nl> on windows unless
>>>you are able to drop down into reading the stream in binary,
>>>which i wasn't able to do. I don't think that's possible in clisp
>>>for some types of streams.
>> then we need to add an :external-format when araneida. 
>
> i'm not having any more trouble supporting both
> unix and dos line endings. It doesn't require much code, and the
> version-c state machine is more robust.
>
> sam gave an explanation of why he coalesces the line-ending
> characters but i can't remember his explanation. It sounded
> good at the time. (saw it somewhere on c.l.l.)

i agree that read-char, on windows, should return #\Newline for a
cr-lf sequence, while on unix it will only consume the LF
character. that's what people expect from a text oriented
interface. however you'll notice that rfc2388 defines CR as byte 13
(and LF as byte 10), not as some code-point in a character space; it's
a binary protocol.
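
the difference between the two views can be sketched like this (python
for illustration; its universal-newline text layer only stands in for
the clisp behaviour described in this thread and is not the same
implementation). CR is byte 13 and LF is byte 10; the text layer folds
the pair into one newline character, the binary layer keeps both bytes.

```python
import io

CR, LF = 13, 10
raw = bytes([ord("a"), CR, LF, ord("b")])

# text layer with newline translation: CR LF comes back as a single "\n"
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()

# binary layer: every byte survives, so a parser can still see CR and LF
octets = list(io.BytesIO(raw).read())
```

once the text layer has dropped the CR there is no way for a parser
sitting on top of it to tell whether the sender wrote CRLF or a bare LF.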

the real issue in all of this is that araneida treats http as text...

> if you're
>> treating the http stream as text where do you setup the character
>> encoding?
>
> i'm just assuming character data is either ISO 8859-1 or a subset thereof,
> which is the way araneida is hard-coded and the mode i start clisp up in.
> not a perfect solution, i know, but i don't need a perfect solution.

iso-8859-1 has the nice property of not changing the binary data when
encoding/decoding (unlike utf-X), since every byte value maps to
exactly one character. if you don't have a choice that's what you
should use.
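
a quick way to check the round-trip property for a given encoding,
sketched in python (the `round_trips` helper is made up for this
example). it decodes all 256 byte values and encodes them again, then
compares the result with the input.

```python
def round_trips(encoding):
    """Hypothetical check: do all 256 byte values survive decode + encode?"""
    all_bytes = bytes(range(256))
    try:
        return all_bytes.decode(encoding).encode(encoding) == all_bytes
    except UnicodeError:
        # some byte sequence is not valid in this encoding
        return False

latin1_safe = round_trips("iso-8859-1")
utf8_safe = round_trips("utf-8")
```

with python's codecs the first check succeeds and the second one fails,
because bytes such as 0xFF are not valid utf-8 on their own.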

>> one of the main reasons i wrote the new rfc2388 was to deal with
>> non-ascii, non-utf data, i can not accept a change to rfc2388 which
>> breaks this.
>
> ok. thanks for saying it.
> i don't see why having both a binary and character parser hurts, though,
> especially if somebody's in the situation where they have a character stream
> and you can't change it. And it needs to also work with rfc2046 headers.
> But i understand it's easier to support just one parser.

it hurts because a character parser can not, by definition, work
correctly. sorry.

if someone on araneida needs multipart/form-data support then your
parser is a fine solution and should be included in araneida. at the
same time i have issues with supporting it in the generic rfc2388 parser
(which is the reason you sent it to this list, right?)

>>>i think the binary parser expects the boundary to be
>>>both prefixed and suffixed with two dashes, but the
>>>mime boundary that i received didn't end with two dashes,
>>>and rfc2046 doesn't require it.
>> rfc2388 specifies two dashes on either side of the boundary as an
>> end-of-data marker. between parts the boundary is only prefixed by the
>> dashes. note that rfc2388 and rfc2046 are different standards (albeit
>> very similar). rfc2388 does not purport to implement rfc2046.
>
> i didn't know that. Then i'm not receiving mime headers in rfc2388
> back from my browsers.
>
>> is there a browser out there sending rfc2046 in place of rfc2388?
>> (explorer right?)
>
> every boundary i have inspected so far starts with dashes but ends with
> some sort of line ending character (no double-dash).
>
> i spend 90% of my time debugging in ff1.5.x/windows, and 10% of my time
> in ie7b3. Given up on opera. Don't have access to mac unless I pull the
> ol' SE30 out of the closet. Lost my linux dual boot in a terrible accident.

do you have some data you could post? i've tested with ff on various
platforms (even windows) and have never seen this behaviour.

-- 
-Marco
Ring the bells that still can ring.
Forget your perfect offering.
There is a crack in everything.
That's how the light gets in.
	-Leonard Cohen