From rob.blackwell at aws.net Wed Apr 6 10:07:23 2011
From: rob.blackwell at aws.net (Rob Blackwell)
Date: Wed, 6 Apr 2011 11:07:23 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
Message-ID: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>

Hi,

I have some byte arrays which are UTF8 and some which are UTF8 with byte order markers.

I can convert these arrays to strings using

> (babel:octets-to-string foo)

and

> (babel:octets-to-string foo :start 3)

respectively, but I'm currently having to figure out whether there is a BOM, like this

> (subseq foo 0 3)
#(239 187 191)

If I use (babel:octets-to-string foo) on a byte array with BOM markers, then my SBCL Lisp image dies.

Is there a better way to ask Babel to discover the correct encoding by looking for Byte Order Marks? Ideally I'd like one function call that worked with any array and figured out which encoding was being used automatically and works whether or not a BOM is present?

Sorry if I'm missing something obvious, I'm a Babel newbie ..

Any guidance or code samples gratefully received.

Thanks,

Rob.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From luismbo at gmail.com Wed Apr 6 11:45:35 2011
From: luismbo at gmail.com (Luís Oliveira)
Date: Wed, 6 Apr 2011 12:45:35 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
Message-ID:

Hello,

On Wed, Apr 6, 2011 at 11:07 AM, Rob Blackwell wrote:
> If I use (babel:octets-to-string foo) on a byte array with BOM markers, then
> my SBCL Lisp image dies.
>
> Is there a better way to ask Babel to discover the correct encoding by
> looking for Byte Order Marks?
> Ideally I'd like one function call that worked
> with any array and figured out which encoding was being used automatically
> and works whether or not a BOM is present?

Babel handles BOMs in UTF-16 and UTF-32 properly. It uses them to
identify endianness then skips them. I'm not sure what one's supposed
to do with BOMs in UTF-8; probably skip them, certainly not crash!
This will require some debugging.

Cheers,

--
Luís Oliveira
http://r42.eu/~luis/

From luismbo at gmail.com Tue Apr 12 22:23:25 2011
From: luismbo at gmail.com (Luís Oliveira)
Date: Tue, 12 Apr 2011 23:23:25 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
Message-ID:

Hello again,

On Wed, Apr 6, 2011 at 11:07 AM, Rob Blackwell wrote:
> If I use (babel:octets-to-string foo) on a byte array with BOM markers, then
> my SBCL Lisp image dies.

I've tried this out and it works for me:

CL-USER> (babel:octets-to-string (babel-tests::ub8v 239 187 191 102 111 111))
"?foo"
CL-USER> (length *)
4

I'm guessing you're using SLIME and you haven't set your
slime-net-coding-system to 'utf-8-unix or something similar. Have a
look at the *inferior-lisp* when your Lisp crashes to see if that's
the case.
HTH,

--
Luís Oliveira
http://r42.eu/~luis/

From rob.blackwell at aws.net Thu Apr 21 21:36:12 2011
From: rob.blackwell at aws.net (Rob Blackwell)
Date: Thu, 21 Apr 2011 22:36:12 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To:
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
Message-ID: <6D8D0E393259694788637A444E1C5EC75C2AEE@venus.intra.aws.net>

Luis,

I updated my .emacs as follows and it worked

(set-language-environment "UTF-8")
(load (expand-file-name "~/quicklisp/slime-helper.el"))
(setq slime-net-coding-system 'utf-8-unix)

I'm still a little confused as to why the length is 4 and not 3 - shouldn't the byte order mark have been discarded?

Many thanks!

Rob.

-----Original Message-----
From: Luís Oliveira [mailto:luismbo at gmail.com]
Sent: 12 April 2011 23:23
To: Rob Blackwell
Cc: babel-devel at common-lisp.net
Subject: Re: [babel-devel] octets-to-string with UTF8 and Byte Order Marker

Hello again,

On Wed, Apr 6, 2011 at 11:07 AM, Rob Blackwell wrote:
> If I use (babel:octets-to-string foo) on a byte array with BOM
> markers, then my SBCL Lisp image dies.

I've tried this out and it works for me:

CL-USER> (babel:octets-to-string (babel-tests::ub8v 239 187 191 102 111 111))
"?foo"
CL-USER> (length *)
4

I'm guessing you're using SLIME and you haven't set your
slime-net-coding-system to 'utf-8-unix or something similar. Have a
look at the *inferior-lisp* when your Lisp crashes to see if that's
the case.

HTH,

--
Luís Oliveira
http://r42.eu/~luis/

From khaelin at gmail.com Sat Apr 23 14:50:54 2011
From: khaelin at gmail.com (Nicolas Martyanoff)
Date: Sat, 23 Apr 2011 16:50:54 +0200
Subject: [babel-devel] patch for cp1252
Message-ID: <874o5pggxd.fsf@gmail.com>

Hi,

I added support for the cp1252 encoding:

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cp1252.diff
URL:
-------------- next part --------------

I hope you will find it useful.

Regards,

--
Nicolas Martyanoff
http://codemore.org
khaelin at gmail.com
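
On the octets-to-string thread above: the octets 239 187 191 are the UTF-8 encoding of U+FEFF, so a decoder that does not treat them specially returns them as one leading character. That would be consistent with the length of 4 in the REPL output quoted in the thread. Below is a minimal sketch of skipping a leading UTF-8 BOM before decoding. The names utf8-bom-length and octets-to-string-skipping-bom are hypothetical and are not part of Babel. The sketch assumes Babel is already loaded (for example through Quicklisp) and uses only the :start argument shown earlier in the thread together with Babel's :encoding keyword. It does not attempt to detect any encoding other than UTF-8.

```lisp
;; Hypothetical helpers for illustration; neither function is part of Babel.

(defun utf8-bom-length (octets)
  "Return 3 if the vector OCTETS begins with the UTF-8 BOM (239 187 191), else 0."
  (if (and (>= (length octets) 3)
           (= (aref octets 0) 239)
           (= (aref octets 1) 187)
           (= (aref octets 2) 191))
      3
      0))

(defun octets-to-string-skipping-bom (octets)
  "Decode OCTETS as UTF-8, starting after a leading BOM when one is present.
Assumes the BABEL package has been loaded before this form is read."
  (babel:octets-to-string octets
                          :encoding :utf-8
                          :start (utf8-bom-length octets)))
```

With an (unsigned-byte 8) vector holding 239 187 191 102 111 111, for instance one built with (make-array 6 :element-type '(unsigned-byte 8) :initial-contents '(239 187 191 102 111 111)), octets-to-string-skipping-bom should return "foo", and the same call on a vector without the BOM decodes from offset 0 as usual.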