[babel-devel] octets-to-string with UTF8 and Byte Order Marker

Luís Oliveira luismbo at gmail.com
Wed May 11 07:32:33 UTC 2011


Hello,

Sorry for the late reply.

On Thu, Apr 21, 2011 at 10:36 PM, Rob Blackwell <rob.blackwell at aws.net> wrote:
> I'm still a little confused as to why the length is 4 and not 3 - shouldn’t the byte order mark have been discarded?

I'm not sure. I couldn't find any clear indication of how a leading
BOM should be handled for UTF-8. The BOM FAQ seems to indicate it
should be converted to a ZERO WIDTH NO-BREAK SPACE (U+FEFF), maybe.
Any comments? It would perhaps be interesting to check what
well-established libraries such as ICU do.
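
As one data point (not Babel and not ICU), here is a minimal sketch of
how Python's standard codecs treat a leading UTF-8 BOM. The sample
bytes are made up for illustration: a BOM followed by "abc". The plain
"utf-8" codec keeps the BOM as a U+FEFF character, which gives the same
length of 4 you are seeing, while the separate "utf-8-sig" codec drops
it.

```python
# Hypothetical sample input: UTF-8 BOM (EF BB BF) followed by "abc".
data = b"\xef\xbb\xbfabc"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
kept = data.decode("utf-8")

# The "utf-8-sig" codec removes a single leading BOM if one is present.
stripped = data.decode("utf-8-sig")

print(len(kept), hex(ord(kept[0])))  # 4 0xfeff
print(len(stripped), stripped)       # 3 abc
```

So in at least one widely used library, discarding the BOM is opt-in
rather than the default for UTF-8.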

Cheers,

-- 
Luís Oliveira
http://r42.eu/~luis/



