[mel-base-devel] Some remarks
Jochen Schmidt
js at codeartist.org
Thu Dec 7 22:32:02 UTC 2006
On 07.12.2006 at 22:17, Frédéric Jolliton wrote:
> Hi,
>
> Just some little remarks about mel-base. Except perhaps for point 1,
> it's nothing serious for my own needs.
>
> I've discovered mel-base some days ago, and while I searched the ML
> (it was fast :)) and googled for more info, I really hope I'm not
> asking obvious things.
>
> 1) Is there a way to open an IMAP folder with element-type
> (unsigned-byte 8) ?
Currently there is no way to open an IMAP folder with element-type
(unsigned-byte 8).
> make-imap-connection is calling make-connection with 'character, but
> it looks wrong to me. Technically, a mail is a sequence of
> (unsigned-byte 8) with no meaning at the low level, and not a sequence
> of (CL) characters. While some protocols forbid 8-bit characters,
> most (all?) of them use 0-127 to represent the ASCII set, but 128-255
> are unspecified and used for various encodings.
While I agree that 'character is wrong in the transport layer of
mel-base, I have to say that 8bit is not the standard encoding in mail
systems. There are extensions to the various mail protocols to support
8bit transfers, but AFAIK they are not guaranteed to work, since any
SMTP server between the sender and receiver might just strip 8bit bytes
out (or at least the 8th bit). The standard encoding for internet mail
messages is indeed not octets but 7bit ASCII; you're supposed to use
base64 or quoted-printable encoding to transfer 8bit contents. You're
right that many mail libraries, servers and tools do not comply with
this and use characters in the 8bit range. mel-base should be robust
enough to handle these cases (and it may well be that it is not...).
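
To illustrate how 8bit content is made safe for 7bit transport, here is
a minimal quoted-printable encoder in plain Common Lisp. It is only a
sketch: the name QUOTED-PRINTABLE-ENCODE is made up for this example
and is not part of mel-base, it does no line folding, it escapes every
octet outside the printable range 33-126 (including spaces and line
breaks), and it assumes CODE-CHAR maps codes below 128 to ASCII.

```lisp
;; Hypothetical helper, not part of mel-base.
(defun quoted-printable-encode (octets)
  "Return a 7bit string encoding OCTETS, escaping unsafe octets as =XX."
  (with-output-to-string (out)
    (loop for octet across octets
          do (if (and (<= 33 octet 126) (/= octet 61)) ; 61 is #\=
                 ;; Printable ASCII other than = is passed through.
                 (write-char (code-char octet) out)
                 ;; Everything else becomes = plus two hex digits.
                 (format out "=~2,'0X" octet)))))

;; "café" in utf-8 is the octets 99 97 102 195 169:
;; (quoted-printable-encode #(99 97 102 195 169)) => "caf=C3=A9"
```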
> I understand that returning array of (unsigned-byte 8) may not be
> convenient to users, but at least it is close to the "real" contents
> of mails.
>
> For example, a mail encoded with utf-8 cannot be processed by mel-base
> because the content is translated to another default representation
> (iso-8859-1?), probably depending on the CL implementation (or locale
> settings.) Also sometimes, I receive broken mails with 8-bit bytes in
> the header. Not failing in such cases may be interesting for robustness.
>
> CL-USER> (mel:content-type (first (mel:parts (first (mel:messages
> *inbox*)))))
> :TEXT
> :PLAIN
> (:CHARSET "utf-8")
>
> Here, mel-base knows it is the utf-8 charset, but it's too late since
> it is already a string internally.
Yes - but actually the string you get from accessing a mime-part in
mel-base is still meant to be 7bit us-ascii. I assumed "faithful I/O"
when I implemented mel-base, so that a bijective mapping between
characters and octets should always be possible. In other words: the
characters in mel-base's transport streams are actually meant to be
octets. I do not decode the contents automatically because not every
mail application needs decoded mime-parts. The idea was that one has to
convert the content in a second (higher-level) step.
I never got around to implementing these second-step routines, but
nowadays we should be able to use flexi-streams for that purpose. For
this it would indeed be better if all transport streams just used
'(unsigned-byte 8).
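
The "faithful I/O" assumption can be sketched as a pair of functions
that map characters to octets and back without loss. The names below
are invented for this example and do not exist in mel-base; the sketch
also assumes that the implementation has a character for every code
below 256 (true for Unicode-based Lisps such as SBCL, not guaranteed by
the standard).

```lisp
;; Hypothetical helpers, not part of mel-base.
(defun faithful-string-to-octets (string)
  "Turn each character of STRING back into the octet it stands for."
  (map '(simple-array (unsigned-byte 8) (*))
       (lambda (char)
         (let ((code (char-code char)))
           ;; A character above 255 cannot have come from a single octet.
           (if (< code 256)
               code
               (error "Character ~S does not represent an octet." char))))
       string))

(defun faithful-octets-to-string (octets)
  "Turn each octet into the character with the same code."
  (map 'string #'code-char octets))
```

Once the octets are recovered, the second (higher-level) step could
hand them to a decoder, for example flexi-streams' OCTETS-TO-STRING with
:EXTERNAL-FORMAT :UTF-8 for a part whose content type says utf-8.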
> If that was me, I would handle everything in (unsigned-byte 8) and
> even split/search for fields in the header using this element-type,
> and so on for all the internal stuff. And perhaps return a string to
> the user when the encoding is known or when the sequence is composed
> only of (unsigned-byte 7).
Yes - looking back, I wish I had just used octets.
> Also, not all CL implementations may support the ASCII subset. But ok,
> such implementations are probably not widespread :)
I think we currently could ignore those safely ;-)
> What do you think about that?
I think it would be better to use real octets instead of "octets by
assuming character I/O to be ASCII". Using octets would make it easier
to use flexi-streams with mel-base, but would make it a bit more
difficult to use it without any decoding.
One idea to get this transition done might be to use a #"abc" reader
macro which is read not as a string but as an octet vector, and to do
something similar for character literals. Then one would have to go
through the complete source to find all places where character streams
are assumed. AFAIK some CL I/O functions like LISTEN and PEEK-CHAR are
meant to work only on character streams - which is AFAIR one of the
reasons I used character streams instead of octet streams.
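
A rough sketch of such a #"abc" reader macro follows. Everything in it
is an assumption made for illustration: the function and variable names
are invented, the macro is installed in a private copy of the standard
readtable rather than globally, only ASCII characters are accepted, and
CHAR-CODE is assumed to return ASCII codes for them.

```lisp
;; Hypothetical reader macro, not part of mel-base.
(defun read-octet-literal (stream subchar arg)
  "Read characters up to the closing double quote as an octet vector."
  (declare (ignore subchar arg))
  (let ((octets (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t :fill-pointer 0)))
    (loop for char = (read-char stream t nil t)
          until (char= char #\")
          do (when (char= char #\\)
               ;; A backslash escapes the next character, as in strings.
               (setf char (read-char stream t nil t)))
             (let ((code (char-code char)))
               (when (> code 127)
                 (error "Non-ASCII character ~S in octet literal." char))
               (vector-push-extend code octets)))
    (coerce octets '(simple-array (unsigned-byte 8) (*)))))

;; Install #" in a copy of the standard readtable only.
(defvar *octet-readtable* (copy-readtable nil))
(set-dispatch-macro-character #\# #\" #'read-octet-literal *octet-readtable*)

;; With *READTABLE* bound to *OCTET-READTABLE*, #"abc" reads as #(97 98 99).
```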
>
> 2) What is the way to handle connection timeout in mel-base?
>
> When the connection times out, I get:
>
> end of file on #<SB-SYS:FD-STREAM for "a constant string" {D25A101}>
> [Condition of type END-OF-FILE]
>
> We have to know what exceptions the implementation raises when the
> connection is broken, which may not be convenient.
>
> Ultimately, it could be nice to have a restart to reconnect
> automatically. But on the other hand, it is perhaps bad because we
> lose the state of the connection, which may be necessary to mel-base.
There is no abstraction for timeouts that is independent of the folder
implementation. The imap implementation tries, to some degree, to
reconnect automatically if the connection stream is broken. This may
not work at all, or may not work in all lisp systems. It is indeed
difficult to robustly reconnect a lost imap connection because of the
implied state, which is then lost. To handle this one would need to
implement a higher-level transaction protocol which is able to track
the progress of the operations and to synchronize with the transaction
state again after a reconnection.
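
In user code, the simplest form of this is to catch the END-OF-FILE
condition, reconnect, and run the operation again. The sketch below is
generic and uses no mel-base functions: CALL-WITH-RECONNECT and its
arguments are invented for this example, the caller has to supply both
the operation and the reconnect step as functions, and replaying is
only safe for operations that do not depend on the lost connection
state. Other lisp systems may signal a different condition than
END-OF-FILE.

```lisp
;; Hypothetical helper, not part of mel-base.
(defun call-with-reconnect (operation reconnect &key (max-attempts 3))
  "Call OPERATION; on END-OF-FILE call RECONNECT and try again."
  (loop for attempt from 1
        do (handler-case (return (funcall operation))
             (end-of-file (condition)
               ;; Give up and re-signal after MAX-ATTEMPTS failures.
               (when (>= attempt max-attempts)
                 (error condition))
               (funcall reconnect)))))
```

A caller would pass two closures, one that performs the folder access
and one that re-opens the connection.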
>
> 3) As said in a private mail, SSL support could be nice too :)
>
> I've no idea how hard it would be to use cl-ssl with cl-mel-base.
I've successfully used SSL tunneling tools like stunnel to access
imaps mailboxes.
ciao,
Jochen