From rob.blackwell at aws.net  Wed Apr  6 10:07:23 2011
From: rob.blackwell at aws.net (Rob Blackwell)
Date: Wed, 6 Apr 2011 11:07:23 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
Message-ID: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>

Hi,

I have some byte arrays which are UTF8 and some which are UTF8 with byte
order markers.

I can convert these arrays to strings using

> (babel:octets-to-string foo)

and

> (babel:octets-to-string foo :start 3)

respectively, but I'm currently having to figure out whether there is a
BOM, like this

> (subseq foo 0 3)
#(239 187 191)

If I use (babel:octets-to-string foo) on a byte array with BOM markers,
then my SBCL Lisp image dies.

Is there a better way to ask Babel to discover the correct encoding by
looking for Byte Order Marks? Ideally I'd like one function call that
worked with any array and figured out which encoding was being used
automatically and works whether or not a BOM is present?

Sorry if I'm missing something obvious, I'm a Babel newbie .. Any
guidance or code samples gratefully received.

Thanks,

Rob.


From luismbo at gmail.com  Wed Apr  6 11:45:35 2011
From: luismbo at gmail.com (Luís Oliveira)
Date: Wed, 6 Apr 2011 12:45:35 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
Message-ID: <BANLkTimJ-nXuC2ki-oipygQ+YhucpAQxBg@mail.gmail.com>

Hello,

On Wed, Apr 6, 2011 at 11:07 AM, Rob Blackwell <rob.blackwell at aws.net> wrote:
> If I use (babel:octets-to-string foo) on a byte array with BOM markers, then
> my SBCL Lisp image dies.
>
> Is there a better way to ask Babel to discover the correct encoding by
> looking for Byte Order Marks? Ideally I'd like one function call that worked
> with any array and figured out which encoding was being used automatically
> and works whether or not a BOM is present?

Babel handles BOMs in UTF-16 and UTF-32 properly. It uses them to
identify endianness then skips them. I'm not sure what one's supposed
to do with BOMs in UTF-8; probably skip them, certainly not crash!
This will require some debugging.
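
In the meantime, a caller-side check along these lines might do. This is
only a sketch: sniff-bom and starts-with-octets-p are hypothetical
helpers, not part of Babel. It assumes UTF-8 when no BOM is present, and
it leaves UTF-16 and UTF-32 BOMs in place on the assumption that Babel
consumes them itself, as described above.

```lisp
;; Sketch only: these helpers are hypothetical and not part of Babel.

(defun starts-with-octets-p (octets prefix)
  "True if the vector OCTETS begins with the byte values in the list PREFIX."
  (and (>= (length octets) (length prefix))
       (loop for byte in prefix
             for i from 0
             always (= (aref octets i) byte))))

(defun sniff-bom (octets)
  "Return two values: an encoding keyword and the index to start decoding at."
  (cond
    ;; UTF-8 BOM: skip it by starting at index 3.
    ((starts-with-octets-p octets '(#xEF #xBB #xBF))
     (values :utf-8 3))
    ;; UTF-32 is tested before UTF-16 because FF FE 00 00 also begins
    ;; with the UTF-16 little-endian BOM FF FE. (UTF-16LE text whose
    ;; first character is NUL would be misread here.)
    ((or (starts-with-octets-p octets '(#xFF #xFE #x00 #x00))
         (starts-with-octets-p octets '(#x00 #x00 #xFE #xFF)))
     (values :utf-32 0))
    ((or (starts-with-octets-p octets '(#xFF #xFE))
         (starts-with-octets-p octets '(#xFE #xFF)))
     (values :utf-16 0))
    ;; Assumption: anything without a BOM is treated as UTF-8.
    (t (values :utf-8 0))))
```

The two values could then be handed to Babel, for example:

  (multiple-value-bind (encoding start) (sniff-bom foo)
    (babel:octets-to-string foo :encoding encoding :start start))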

Cheers,

-- 
Luís Oliveira
http://r42.eu/~luis/


From luismbo at gmail.com  Tue Apr 12 22:23:25 2011
From: luismbo at gmail.com (Luís Oliveira)
Date: Tue, 12 Apr 2011 23:23:25 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
Message-ID: <BANLkTi=3de47vanPZsTLLAfp6T=s0zuDyA@mail.gmail.com>

Hello again,

On Wed, Apr 6, 2011 at 11:07 AM, Rob Blackwell <rob.blackwell at aws.net> wrote:
> If I use (babel:octets-to-string foo) on a byte array with BOM markers, then
> my SBCL Lisp image dies.

I've tried this out and it works for me:

  CL-USER> (babel:octets-to-string (babel-tests::ub8v 239 187 191 102 111 111))
  "?foo"
  CL-USER> (length *)
  4

I'm guessing you're using SLIME and you haven't set your
slime-net-coding-system to 'utf-8-unix or something similar. Have a
look at the *inferior-lisp* buffer when your Lisp crashes to see if
that's the case.

HTH,

-- 
Luís Oliveira
http://r42.eu/~luis/


From rob.blackwell at aws.net  Thu Apr 21 21:36:12 2011
From: rob.blackwell at aws.net (Rob Blackwell)
Date: Thu, 21 Apr 2011 22:36:12 +0100
Subject: [babel-devel] octets-to-string with UTF8 and Byte Order Marker
In-Reply-To: <BANLkTi=3de47vanPZsTLLAfp6T=s0zuDyA@mail.gmail.com>
References: <6D8D0E393259694788637A444E1C5EC75C2A62@venus.intra.aws.net>
	<BANLkTi=3de47vanPZsTLLAfp6T=s0zuDyA@mail.gmail.com>
Message-ID: <6D8D0E393259694788637A444E1C5EC75C2AEE@venus.intra.aws.net>

Luis,

I updated my .emacs as follows and it worked:

(set-language-environment "UTF-8")
(load (expand-file-name "~/quicklisp/slime-helper.el"))
(setq slime-net-coding-system 'utf-8-unix)

I'm still a little confused as to why the length is 4 and not 3 -
shouldn't the byte order mark have been discarded?
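
A likely explanation, going by the REPL output quoted in this thread:
the octets 239 187 191 are the UTF-8 encoding of the single character
U+FEFF, so a decoder that does not treat it specially returns it as one
ordinary character followed by "foo", which gives a length of 4. A
minimal sketch for dropping it after decoding, assuming a Lisp with
Unicode characters (such as a default SBCL build); strip-leading-bom is
a hypothetical helper, not part of Babel:

```lisp
;; Sketch only: STRIP-LEADING-BOM is hypothetical and not part of Babel.

(defun strip-leading-bom (string)
  "Return STRING without a leading U+FEFF character, if there is one."
  (if (and (plusp (length string))
           (char= (char string 0) (code-char #xFEFF)))
      (subseq string 1)
      string))
```

Applied to the result of babel:octets-to-string, this should leave
strings without a leading U+FEFF unchanged.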

Many thanks!

Rob.


From khaelin at gmail.com  Sat Apr 23 14:50:54 2011
From: khaelin at gmail.com (Nicolas Martyanoff)
Date: Sat, 23 Apr 2011 16:50:54 +0200
Subject: [babel-devel] patch for cp1252
Message-ID: <874o5pggxd.fsf@gmail.com>


Hi,

I added support for the cp1252 encoding:

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cp1252.diff
URL: <https://mailman.common-lisp.net/pipermail/babel-devel/attachments/20110423/84c2bd22/attachment.ksh>
-------------- next part --------------

I hope you will find it useful.

Regards,

-- 
Nicolas Martyanoff
   http://codemore.org
   khaelin at gmail.com