From mel-base at frederic.jolliton.com Thu Dec 7 21:17:48 2006
From: mel-base at frederic.jolliton.com (Frédéric Jolliton)
Date: Thu, 07 Dec 2006 22:17:48 +0100
Subject: [mel-base-devel] Some remarks
Message-ID: <86y7pj2yo3.fsf@mau.intra.tuxee.net>

Hi,

Just some little remarks about mel-base. Except perhaps for point 1,
it's nothing serious for my own needs.

I discovered mel-base some days ago, and although I searched the ML
(it was fast :)) and googled for more info, I really hope I'm not
asking obvious things.

1) Is there a way to open an IMAP folder with element-type
(unsigned-byte 8)?

make-imap-connection is calling make-connection with 'character, but
it looks wrong to me. Technically, a mail is a sequence of
(unsigned-byte 8) with no meaning at the low level, and not a sequence
of (CL) characters. While some protocols forbid 8-bit characters,
most (all?) of them have 0-127 to represent the ASCII set, but 128-255
are unspecified and used for various encodings.

I understand that returning an array of (unsigned-byte 8) may not be
convenient for users, but at least it is close to the "real" contents
of mails.

For example, a mail encoded with utf-8 cannot be processed by mel-base
because the content is translated to another default representation
(iso-8859-1?), probably depending on the CL implementation (or locale
settings.) Also, I sometimes receive broken mails with 8-bit bytes in
the header. Not failing in such cases may be interesting for
robustness.

CL-USER> (mel:content-type (first (mel:parts (first (mel:messages *inbox*)))))
:TEXT
:PLAIN
(:CHARSET "utf-8")

Here, mel-base knows it is the utf-8 charset, but it's too late since
it is already a string internally.

If it were me, I would handle everything in (unsigned-byte 8) and
even split/search for fields in the header using this element-type,
and so on for all the internal stuff. And perhaps return a string to
the user when the encoding is known or when the sequence is composed
only of (unsigned-byte 7).

Also, not all CL implementations may support the ASCII subset. But ok,
such implementations are probably not widespread :)

What do you think about that?

2) What is the way to handle connection timeout in mel-base?

When the connection times out, I get:

  end of file on #
     [Condition of type END-OF-FILE]

We have to know what exceptions the implementation raises when the
connection is broken, which may not be convenient.

Ultimately, it could be nice to have a restart to reconnect
automatically. But on the other hand, it is perhaps bad because we
lose the state of the connection, which may be necessary to mel-base.

3) As said in a private mail, SSL support could be nice too :)

I've no idea how hard it would be to use cl-ssl with cl-mel-base.

Thanks again for this project!

-- 
Frédéric Jolliton

From js at codeartist.org Thu Dec 7 22:32:02 2006
From: js at codeartist.org (Jochen Schmidt)
Date: Thu, 7 Dec 2006 23:32:02 +0100
Subject: [mel-base-devel] Some remarks
In-Reply-To: <86y7pj2yo3.fsf@mau.intra.tuxee.net>
References: <86y7pj2yo3.fsf@mau.intra.tuxee.net>
Message-ID: <2439F349-60FC-4943-B52E-68B2D4D296CB@codeartist.org>

On 07.12.2006 at 22:17, Frédéric Jolliton wrote:

> Hi,
>
> Just some little remarks about mel-base. Except perhaps for point 1,
> it's nothing serious for my own needs.
>
> I discovered mel-base some days ago, and although I searched the ML
> (it was fast :)) and googled for more info, I really hope I'm not
> asking obvious things.
>
> 1) Is there a way to open an IMAP folder with element-type
> (unsigned-byte 8)?

Currently there is no way to open an IMAP folder with element-type
(unsigned-byte 8).

> make-imap-connection is calling make-connection with 'character, but
> it looks wrong to me. Technically, a mail is a sequence of
> (unsigned-byte 8) with no meaning at the low level, and not a sequence
> of (CL) characters. While some protocols forbid 8-bit characters,
> most (all?)
> of them have 0-127 to represent the ASCII set, but 128-255 are
> unspecified and used for various encodings.

While I agree that 'character is wrong in the transport layer of
mel-base, I have to say that 8bit is not the standard encoding in mail
systems. There are extensions to the different mail protocols to
support 8bit transfers, but it is AFAIK not guaranteed to work, since
any SMTP server between the sender and receiver might just strip 8bit
bytes out (or at least the 8th bit). The standard encoding for
internet-mail messages is indeed not octets but 7bit ASCII; you're
supposed to use base64 or quoted-printable encoding to transfer 8bit
contents. You're right that many mail libraries, servers and tools do
not comply with this and use characters in the 8bit range. mel-base
should (and it may well be that it does not...) be robust enough to
handle these cases.

> I understand that returning an array of (unsigned-byte 8) may not be
> convenient for users, but at least it is close to the "real" contents
> of mails.
>
> For example, a mail encoded with utf-8 cannot be processed by mel-base
> because the content is translated to another default representation
> (iso-8859-1?), probably depending on the CL implementation (or locale
> settings.) Also, I sometimes receive broken mails with 8-bit bytes in
> the header. Not failing in such cases may be interesting for
> robustness.
>
> CL-USER> (mel:content-type (first (mel:parts (first (mel:messages
> *inbox*)))))
> :TEXT
> :PLAIN
> (:CHARSET "utf-8")
>
> Here, mel-base knows it is the utf-8 charset, but it's too late since
> it is already a string internally.

Yes - but actually the string you get from accessing a mime-part in
mel-base is still thought to be of kind 7bit us-ascii. I assumed
"faithful i/o" when I implemented mel-base, so that a bijective
mapping between characters and octets should always be possible. In
other words: the characters in mel-base's transport-streams are
actually meant to be octets.
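
The "faithful i/o" assumption above can be illustrated with a minimal
sketch. Both helper names are hypothetical and are not part of
mel-base; the sketch also assumes that CHAR-CODE and CODE-CHAR map the
codes 0-255 one-to-one onto characters, which common Unicode-aware
Lisps do but the standard does not guarantee.

```lisp
;; Sketch only: both helper names are hypothetical, not mel-base API.
;; Assumption: CODE-CHAR and CHAR-CODE map codes 0-255 one-to-one onto
;; characters, as common Unicode-aware Lisps do.

(defun faithful-string-to-octets (string)
  "Return the octets that STRING stands for, one octet per character."
  (map '(simple-array (unsigned-byte 8) (*)) #'char-code string))

(defun faithful-octets-to-string (octets)
  "Return a string with one character per octet, codes preserved."
  (map 'string #'code-char octets))

;; The octets #xC3 #xA9 are the UTF-8 encoding of U+00E9 (e with acute
;; accent). Read through a character stream they arrive as two
;; characters, yet the original octets can still be recovered for a
;; later, higher-level decoding step.
(let* ((octets #(#xC3 #xA9))
       (as-read (faithful-octets-to-string octets)))
  (assert (= (length as-read) 2))
  (assert (equalp (faithful-string-to-octets as-read) octets)))
```

Once the octets are recovered, a decoding library could turn them into
text; flexi-streams, for instance, provides OCTETS-TO-STRING with an
:EXTERNAL-FORMAT argument such as :UTF-8.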
I do not decode the contents automatically because not every mail
application needs decoded mime-parts. The idea was that one has to
convert the content in a second (higher-level) step. I never got
around to implementing these second-step routines, but nowadays we
should be able to use flexi-streams for that purpose. For this it
would indeed be better if all transport streams just used
'(unsigned-byte 8).

> If it were me, I would handle everything in (unsigned-byte 8) and
> even split/search for fields in the header using this element-type,
> and so on for all the internal stuff. And perhaps return a string to
> the user when the encoding is known or when the sequence is composed
> only of (unsigned-byte 7).

Yes - looking back, I wish I had used just octets.

> Also, not all CL implementations may support the ASCII subset. But ok,
> such implementations are probably not widespread :)

I think we can currently ignore those safely ;-)

> What do you think about that?

I think it would be better to use real octets instead of "octets by
assuming character I/O to be ASCII". Using octets would make it easier
to use flexi-streams with mel-base but would make it a bit more
difficult to use it without any decoding. One idea to get this
transition done might be to use a #"abc" reader macro which is not
read as a string but as an octet vector, and do something similar for
character literals. Then one would have to look over the complete
source to find all places where character streams are assumed. AFAIK
some CL I/O functions like LISTEN and PEEK-CHAR are thought to work
only on character streams - which is AFAIR one of the reasons I used
character streams instead of octet streams.

>
> 2) What is the way to handle connection timeout in mel-base?
>
> When the connection times out, I get:
>
> end of file on #
> [Condition of type END-OF-FILE]
>
> We have to know what exceptions the implementation raises when the
> connection is broken, which may not be convenient.
>
> Ultimately, it could be nice to have a restart to reconnect
> automatically. But on the other hand, it is perhaps bad because we
> lose the state of the connection, which may be necessary to mel-base.

There is no folder-implementation-independent abstraction for
timeouts. The imap implementation tried to some degree to
automatically reconnect if the connection stream is broken. This may
not work, or may not work in all lisp systems. It is indeed difficult
to robustly reconnect a lost imap-connection because of the implied
state, which is then lost. To handle this one would need to implement
a higher-level transaction protocol which is able to track the
progress of the operations and to resynchronize with the transaction
state after a reconnection.

>
> 3) As said in a private mail, SSL support could be nice too :)
>
> I've no idea how hard it would be to use cl-ssl with cl-mel-base.

I've successfully used SSL tunneling tools like stunnel to access
imaps mailboxes.

ciao,
Jochen

From mel-base at frederic.jolliton.com Fri Dec 8 13:20:34 2006
From: mel-base at frederic.jolliton.com (Frédéric Jolliton)
Date: Fri, 08 Dec 2006 14:20:34 +0100
Subject: [mel-base-devel] Some remarks
In-Reply-To: <2439F349-60FC-4943-B52E-68B2D4D296CB@codeartist.org> (Jochen Schmidt's message of "Thu\, 7 Dec 2006 23\:32\:02 +0100")
References: <86y7pj2yo3.fsf@mau.intra.tuxee.net> <2439F349-60FC-4943-B52E-68B2D4D296CB@codeartist.org>
Message-ID: <86odqe34nx.fsf@mau.intra.tuxee.net>

Hi,

[..]

> The standard encoding for internet-mail messages is indeed not
> octets but 7bit ASCII; you're supposed to use base64 or
> quoted-printable encoding to transfer 8bit contents.

I agree. However, in 2006, I don't think you can find a 7bit-only
server anymore :) I'm working mostly with sendmail and cyrus imapd,
and both support 8bit (at least for the body, while for the header it
is either undefined behavior, mangled or rejected in such cases.)
The 8BITMIME extension for SMTP seems universal.

Out of curiosity, I made some stats from my imap server: 28998 mails
out of 244247 contain 8bit characters (11.8%.) [Which proves nothing,
just that it is usual to have to process 8bit mail.]

-- 
Frédéric Jolliton
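
Two ideas from this thread, checking a message for 8bit content and
handing back a string only when every octet fits in (unsigned-byte 7),
can be sketched in portable Common Lisp. The function names are
hypothetical and not part of mel-base, and the sketch assumes that
CODE-CHAR maps the codes 0-127 to ASCII, as discussed above.

```lisp
;; Sketch only: hypothetical helpers, not mel-base API.
;; Assumption: CODE-CHAR maps the codes 0-127 to the ASCII characters.

(defun contains-8bit-p (octets)
  "Return true if any element of OCTETS is 128 or above."
  (some (lambda (octet) (>= octet 128)) octets))

(defun octets-to-ascii-string (octets)
  "Return OCTETS as a string if all of them are 7bit, otherwise NIL.
A NIL result tells the caller to keep working with the raw octets."
  (unless (contains-8bit-p octets)
    (map 'string #'code-char octets)))

;; 72 and 105 are the ASCII codes for H and i; 233 is outside ASCII.
(assert (string= (octets-to-ascii-string #(72 105)) "Hi"))
(assert (null (octets-to-ascii-string #(72 233))))
```

Running CONTAINS-8BIT-P over the raw octets of every message in a
folder would give the kind of count quoted above.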