[mel-base-devel] Some remarks

Thu Dec 7 22:32:02 UTC 2006

On 07.12.2006 at 22:17, Frédéric Jolliton wrote:

> Hi,
>
> Just a few small remarks about mel-base. Except perhaps for point 1,
> nothing here is serious for my own needs.
>
> I discovered mel-base a few days ago, and although I searched the ML
> (that was fast :)) and googled for more info, I really hope I'm not
> asking obvious things.
>
> 1) Is there a way to open an IMAP folder with element-type
> (unsigned-byte 8)?

Currently there is no way to open an IMAP folder with element-type  
(unsigned-byte 8).

> make-imap-connection calls make-connection with 'character, but
> that looks wrong to me. Technically, a mail is a sequence of
> (unsigned-byte 8) with no meaning at the low level, not a sequence
> of (CL) characters. While some protocols forbid 8-bit characters,
> most (all?) of them use 0-127 to represent the ASCII set, while
> 128-255 are unspecified and used for various encodings.

While I agree that 'character is wrong in the transport layer of
mel-base, I have to say that 8bit is not the standard encoding in mail
systems. There are extensions to the various mail protocols to support
8bit transfers (such as the SMTP 8BITMIME extension), but AFAIK they
are not guaranteed to work, since any SMTP server between the sender
and the receiver might just strip 8bit bytes out (or at least the 8th
bit). The standard encoding for internet mail messages is indeed not
octets but 7bit ASCII; you're supposed to use base64 or
quoted-printable encoding to transfer 8bit content. You're right that
many mail libraries, servers and tools do not comply with this and use
characters in the 8bit range. mel-base should be robust enough to
handle these cases (and it may well be that it currently is not...).
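To illustrate the point about 7bit transfer encodings: decoding quoted-printable produces octets, not characters. What follows is a minimal sketch in portable CL. DECODE-QUOTED-PRINTABLE is a hypothetical helper, not part of mel-base, and it skips some RFC 2045 details such as stripping trailing whitespace before line ends:

```lisp
(defun decode-quoted-printable (string)
  "Decode the quoted-printable STRING into an (unsigned-byte 8) vector.
Hypothetical helper (not mel-base API) with simplified RFC 2045 handling."
  (let ((out (make-array (length string)      ; output never exceeds input length
                         :element-type '(unsigned-byte 8)
                         :fill-pointer 0))
        (n (length string))
        (i 0))
    (loop while (< i n)
          do (let ((ch (char string i)))
               (cond
                 ;; Ordinary character: it stands for its own ASCII octet.
                 ((char/= ch #\=)
                  (vector-push (char-code ch) out)
                  (incf i))
                 ;; Soft line break "=" + LF: drop both characters.
                 ((and (< (1+ i) n) (char= (char string (1+ i)) #\Newline))
                  (incf i 2))
                 ;; Soft line break "=" + CR LF (#\Return is semi-standard,
                 ;; but supported by the major implementations).
                 ((and (< (+ i 2) n)
                       (char= (char string (1+ i)) #\Return)
                       (char= (char string (+ i 2)) #\Newline))
                  (incf i 3))
                 ;; "=XX": two hex digits encode one octet.
                 ((and (< (+ i 2) n)
                       (digit-char-p (char string (1+ i)) 16)
                       (digit-char-p (char string (+ i 2)) 16))
                  (vector-push (+ (* 16 (digit-char-p (char string (1+ i)) 16))
                                  (digit-char-p (char string (+ i 2)) 16))
                               out)
                  (incf i 3))
                 (t (error "Malformed quoted-printable sequence at index ~D." i)))))
    out))
```

The result is an octet vector. Turning it into a string is a separate step that depends on the charset parameter of the MIME part.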

> I understand that returning array of (unsigned-byte 8) may not be
> convenient to users, but at least it is close to the "real" contents
> of mails.
>
> For example, a mail encoded in utf-8 cannot be processed by mel-base
> because the content is translated to another default representation
> (iso-8859-1?), probably depending on the CL implementation (or locale
> settings). Also, I sometimes receive broken mails with 8-bit bytes in
> the header. Not failing in such cases would improve robustness.
>
>   CL-USER> (mel:content-type (first (mel:parts (first (mel:messages *inbox*)))))
>   :TEXT
>   :PLAIN
>   (:CHARSET "utf-8")
>
> Here, mel-base knows it is the utf-8 charset, but it's too late since
> the content is already a string internally.

Yes, but the string you get from accessing a MIME part in mel-base is
still meant to be 7bit US-ASCII. I assumed "faithful I/O" when I
implemented mel-base, so that a bijective mapping between characters
and octets is always possible. In other words: the characters in
mel-base's transport streams are actually meant to be octets. I do not
decode the content automatically because not every mail application
needs decoded MIME parts. The idea was that the content gets converted
in a second (higher-level) step.
I never got around to implementing these second-step routines, but
nowadays we should be able to use flexi-streams for that purpose. To
do this, it would indeed be better if all transport streams just used
'(unsigned-byte 8).
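Under the "faithful I/O" assumption, that second step could look roughly like this: first map each character back to the octet it stands for, then decode according to the part's charset. This is a sketch with hypothetical names (OCTET-STRING-TO-OCTETS, DECODE-UTF-8) in portable CL, so it runs without dependencies. In practice, flexi-streams' OCTETS-TO-STRING with an :EXTERNAL-FORMAT of :UTF-8 would replace the hand-written decoder, which only checks the byte structure (it accepts overlong forms and surrogates):

```lisp
(defun octet-string-to-octets (string)
  "Map each character of STRING back to the octet it stands for.
Assumes faithful I/O, i.e. every CHAR-CODE is below 256."
  (map '(vector (unsigned-byte 8))
       (lambda (ch)
         (let ((code (char-code ch)))
           (unless (< code 256)
             (error "Character ~S does not stand for an octet." ch))
           code))
       string))

(defun decode-utf-8 (octets)
  "Decode the octet vector OCTETS as UTF-8 into a string.
Minimal sketch: checks the byte structure only."
  (let ((n (length octets))
        (i 0))
    (flet ((continuation (k)
             ;; Payload bits of the K-th continuation octet after position I.
             (let ((j (+ i k)))
               (unless (and (< j n) (= (logand (aref octets j) #xC0) #x80))
                 (error "Malformed UTF-8 at octet ~D." j))
               (logand (aref octets j) #x3F))))
      (with-output-to-string (out)
        (loop while (< i n)
              do (let ((b (aref octets i)))
                   (multiple-value-bind (code len)
                       (cond ((< b #x80) (values b 1))
                             ((= (logand b #xE0) #xC0)
                              (values (logior (ash (logand b #x1F) 6)
                                              (continuation 1))
                                      2))
                             ((= (logand b #xF0) #xE0)
                              (values (logior (ash (logand b #x0F) 12)
                                              (ash (continuation 1) 6)
                                              (continuation 2))
                                      3))
                             ((= (logand b #xF8) #xF0)
                              (values (logior (ash (logand b #x07) 18)
                                              (ash (continuation 1) 12)
                                              (ash (continuation 2) 6)
                                              (continuation 3))
                                      4))
                             (t (error "Invalid UTF-8 lead octet ~D at ~D." b i)))
                     (write-char (code-char code) out)
                     (incf i len))))))))
```

For the utf-8 part in Frédéric's example, one would apply DECODE-UTF-8 to the octets only after reading the :CHARSET parameter, which keeps undecoded access available for applications that don't need it.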

> If it were me, I would handle everything as (unsigned-byte 8), and
> even split/search for header fields using this element type, and so
> on for all the internal stuff. And perhaps return strings to the user
> when the encoding is known or when the sequence consists only of
> (unsigned-byte 7).

Yes; looking back, I wish I had used plain octets.
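The approach Frédéric describes (work on octets internally, hand out strings only for pure 7bit data) might look roughly like this for a single unfolded header line. This is a sketch with hypothetical helper names, not mel-base code, and it ignores header folding and RFC 2047 encoded words:

```lisp
(defun ascii-octets-p (octets)
  "True if every octet in OCTETS is in the 7bit range 0-127."
  (every (lambda (b) (< b 128)) octets))

(defun octets-or-string (octets)
  "Return OCTETS as a string if it is pure ASCII, otherwise unchanged."
  (if (ascii-octets-p octets)
      (map 'string #'code-char octets)
      octets))

(defun split-header-line (octets)
  "Split an unfolded header line such as \"Subject: hi\" at the first colon.
Returns the field name and the value (leading blanks skipped) as two values."
  (let ((colon (position (char-code #\:) octets)))
    (unless colon
      (error "No colon in header line."))
    (let ((start (or (position-if-not (lambda (b) (or (= b 32) (= b 9)))
                                      octets :start (1+ colon))
                     (length octets))))
      (values (octets-or-string (subseq octets 0 colon))
              (octets-or-string (subseq octets start))))))
```

A broken header with raw 8-bit bytes then yields an octet vector instead of signalling an error, so the caller can decide how to interpret it.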

> Also, not all CL implementations may support the ASCII subset. But OK,
> such implementations are probably not widespread :)

I think we can safely ignore those for now ;-)

> What do you think about that?

I think it would be better to use real octets instead of "octets by
assuming character I/O to be ASCII". Using octets would make it easier
to use flexi-streams with mel-base, but would make it a bit more
difficult to use mel-base without any decoding.

One idea for getting this transition done might be a #"abc" reader
macro that reads not a string but an octet vector, and something
similar for character literals. Then one would have to go over the
complete source to find all places where character streams are
assumed. AFAIK some CL I/O functions like LISTEN and PEEK-CHAR are
meant to work only on character streams (there is no standard
PEEK-BYTE, for example), which AFAIR is one of the reasons I used
character streams instead of octet streams.

>
> 2) What is the way to handle connection timeout in mel-base?
>
> When the connection times out, I get:
>
>   end of file on #<SB-SYS:FD-STREAM for "a constant string" {D25A101}>
>      [Condition of type END-OF-FILE]
>
> We have to know what exceptions the implementation raises when the
> connection is broken, which may not be convenient.
>
> Ultimately, it would be nice to have a restart to reconnect
> automatically. But on the other hand, it is perhaps bad because we
> lose the state of the connection, which may be necessary to mel-base.

There is no folder-implementation-independent abstraction for
timeouts. The IMAP implementation tries to some degree to reconnect
automatically if the connection stream is broken. This may not work,
or may not work in all Lisp systems. It is indeed difficult to
reconnect a lost IMAP connection robustly because the implied state is
lost with it. To handle this, one would need to implement a
higher-level transaction protocol that tracks the progress of the
operations and can resynchronize with the transaction state after a
reconnection.
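For the restart Frédéric suggests, a minimal sketch might look like the following. It only helps when the wrapped operation can be repeated from scratch after reconnecting (e.g. if it re-selects its mailbox itself), which is exactly the assumption that does not hold for stateful IMAP sequences in general. CALL-WITH-RECONNECT, CALL-WITH-AUTO-RECONNECT, the RECONNECT restart and the OPERATION/RECONNECT functions are all hypothetical, not mel-base API. The sketch treats any STREAM-ERROR (END-OF-FILE is one) as a lost connection:

```lisp
(defun call-with-reconnect (operation reconnect &key (max-attempts 3))
  "Call OPERATION, offering a RECONNECT restart that calls RECONNECT and
retries OPERATION. Gives up after MAX-ATTEMPTS failed calls."
  (let ((attempts 0))
    (loop
      (restart-case
          (return-from call-with-reconnect (funcall operation))
        (reconnect ()
          :report "Reconnect and retry the operation."
          (incf attempts)
          (when (>= attempts max-attempts)
            (error "Giving up after ~D attempts." attempts))
          (funcall reconnect))))))

(defun call-with-auto-reconnect (operation reconnect &key (max-attempts 3))
  "Like CALL-WITH-RECONNECT, but invoke the RECONNECT restart automatically
when a STREAM-ERROR such as END-OF-FILE is signalled."
  (handler-bind ((stream-error
                   (lambda (condition)
                     (declare (ignore condition))
                     (let ((restart (find-restart 'reconnect)))
                       (when restart
                         (invoke-restart restart))))))
    (call-with-reconnect operation reconnect :max-attempts max-attempts)))
```

Because every retry starts OPERATION from the beginning, anything it had already done on the old connection (flags set, messages expunged) is neither rolled back nor tracked; that is the part a transaction protocol would have to handle.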

>
> 3) As I said in a private mail, SSL support would be nice too :)
>
> I've no idea how hard it would be to use cl-ssl with cl-mel-base.

I've successfully used SSL tunneling tools like stunnel to access
IMAPS mailboxes.

ciao,
Jochen