[Ecls-list] UTF-8 sequence decoding errors [Was: Upcoming changes]

Matthew Mondor mm_lists at pulsar-zone.net
Sun Feb 13 09:41:27 UTC 2011


On Sun, 13 Feb 2011 09:59:37 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

Yes, I think that supporting that encoding would be very easy too.  The
only possibly tricky part is that users of that encoding may need to
output a more conventional UTF-8 stream to some streams, such as for
display, possibly with bad sequences converted to LATIN-1.  But a
program could read data from a UTF-8B external format stream and write
it back to another UTF-8B stream and be sure that the original data was
transparently copied as-is, without being bothered by decoding/encoding
errors on streams with that external format.
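
As a rough illustration of that round trip (in Python rather than ECL,
since Python's "surrogateescape" error handler from PEP 383 uses, to my
understanding, the same byte-to-surrogate mapping as UTF-8B), each
undecodable octet 0xNN becomes the lone surrogate U+DCNN on input and
is turned back into the original octet on output.  The function names
utf8b_decode and utf8b_encode are made up for this sketch and are not
an ECL or Python API:

```python
def utf8b_decode(data: bytes) -> str:
    # Valid UTF-8 decodes normally; every octet that is not part of a
    # valid sequence (0x80..0xFF) becomes the lone surrogate U+DC80..U+DCFF.
    return data.decode("utf-8", errors="surrogateescape")


def utf8b_encode(text: str) -> bytes:
    # Inverse mapping: U+DC80..U+DCFF are written back as the single
    # original octet, everything else is encoded as ordinary UTF-8.
    return text.encode("utf-8", errors="surrogateescape")


# Valid UTF-8 "café" followed by the Latin-1 bytes of "été" (invalid UTF-8).
raw = b"caf\xc3\xa9 \xe9t\xe9"
text = utf8b_decode(raw)

# The two stray 0xE9 octets are kept as U+DCE9, so nothing is lost.
assert text == "caf\u00e9 \udce9t\udce9"
assert utf8b_encode(text) == raw
```

The point is only that the in-memory string carries enough information
to reproduce the input octets exactly, so a copy from one such stream
to another never has to signal a decoding or encoding error.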

I'm not sure if ECL should itself treat those invalid octets
transparently as LATIN-1 when doing the output on a UTF-8
external-format stream, however.  It's possible that without this some
problems would occur in the debugger, SLIME, etc., which would be
presented with invalid UTF-8, namely encoded characters in the UTF-16
surrogate range.
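
To make the trade-off concrete, here is a small Python sketch (again
not ECL code, and to_display_utf8 is a hypothetical helper name) of
the LATIN-1 fallback: a strict UTF-8 encoder refuses the escaped
octets, while the fallback rewrites each U+DC80..U+DCFF as the
LATIN-1 character with the same octet value before encoding:

```python
def to_display_utf8(text: str) -> bytes:
    # Map each escaped octet U+DCNN (NN in 0x80..0xFF) to the Latin-1
    # character U+00NN, then encode the result as strict UTF-8.
    cleaned = "".join(
        chr(ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )
    return cleaned.encode("utf-8")


# "café" plus "été" whose two é were read as stray 0xE9 octets.
text = "caf\u00e9 \udce9t\udce9"

# A strict UTF-8 stream cannot represent the lone surrogates.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

assert strict_ok is False
assert to_display_utf8(text) == "caf\u00e9 \u00e9t\u00e9".encode("utf-8")
```

Note that this conversion is lossy: the output is valid UTF-8, but the
original octets can no longer be recovered from it, and octets in
0x80..0x9F end up as C1 control characters.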

> Seems that, according to Luis Oliveira in the Babel mailing list and to the
> previous blog entry, the informal specification of UTF-8B is here
>    http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
> but I can not seem to reach this page.

It also seems down from here.

Although archive.org has some archives I also couldn't find that
document there.

I could find various implementation notes however, such as an
implementation for iconv:
http://www.mail-archive.com/linux-utf8@nl.linux.org/msg05256.html

Also seems of interest:
http://hyperreal.org/~est/utf-8b/

In a previous post on this list I also posted example macros with some
documentation:
http://sourceforge.net/mailarchive/attachment.php?list_name=ecls-list&message_id=201101241340.p0ODek54021632%40ginseng.pulsar-zone.net&counter=1

Thanks again,
-- 
Matt



