[Ecls-list] Unicode: uncomfortable situation

Matthew Mondor mm_lists at pulsar-zone.net
Mon Jan 24 03:52:36 UTC 2011


On Sun, 23 Jan 2011 12:08:17 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> Should we just force Unicode by default? I do not want to keep getting
> random bug reports about cl+ssl and other software which is broken in that
> sense.

My builds always have Unicode enabled, although in some circumstances I
explicitly set the external format of some streams to LATIN-1 to get
8-bit clean characters.  The reason is that when ECL internally decodes
invalid UTF-8 sequences while reading, it may signal errors from which I
currently cannot recover without losing those sequences.  As long as it
remains easy to choose a different default external format (and to
specify the external format of individual streams) with Unicode support
built in, I don't expect any problems.
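To illustrate why LATIN-1 is 8-bit clean, here is a minimal sketch in
Python (used purely for illustration; this is not ECL code).  Every byte
0x00-0xFF maps to exactly one ISO-8859-1 character, so decoding never
fails and re-encoding restores the original bytes.  A strict UTF-8
decoder rejects the same input.

```python
# Arbitrary bytes, including sequences that are not valid UTF-8.
raw = bytes(range(256))

# LATIN-1 maps each byte 0xNN to the character U+00NN: decoding cannot fail.
text = raw.decode("latin-1")
assert len(text) == len(raw)           # one character per byte
assert text.encode("latin-1") == raw   # lossless round trip

# A strict UTF-8 decoder refuses the same input.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0x80 is a continuation byte and cannot start a sequence.
    print("UTF-8 decoding failed at byte offset", err.start)  # 128
```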

On invalid UTF-8 sequences, SBCL signals a condition that carries the
invalid sequence, which has not yet been consumed, along with a restart
to recover gracefully.  This lets the application decide what to do and
how to remap those bytes.  An application like an IRC client may decide
to use a UTF-8 input stream and map invalid UTF-8 sequences to
ISO-8859-* characters.  A program that must produce faithful output
might instead remap those bytes literally into an unassigned Unicode
character range, so that it can later restore the exact output,
including its errors, while still taking advantage of Unicode.
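Python's codec error handlers are a rough analogue of the SBCL
condition-plus-restart approach: the handler receives the exception
with the raw bytes and the offending span still available, and returns
a replacement string plus the position where decoding resumes.  The
sketch below (Python, not Lisp; the handler name "latin1-fallback" is
hypothetical) shows both policies.  The first reinterprets invalid
bytes as ISO-8859-1.  The second uses Python's built-in
"surrogateescape" handler, which maps each undecodable byte 0xNN to the
lone surrogate U+DCNN so the original bytes can be restored exactly.
Surrogates are not unassigned characters, but they play the same role
as the reserved range described above.

```python
import codecs

def latin1_fallback(err):
    # Hypothetical handler: err.object holds the raw bytes and
    # err.start:err.end is the invalid span, not yet consumed.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Reinterpret the offending bytes as ISO-8859-1 and resume after them.
    return bad.decode("latin-1"), err.end

codecs.register_error("latin1-fallback", latin1_fallback)

# Mixed input: "caf\u00e9 " encoded as UTF-8, then "caf\xe9" in Latin-1.
data = "caf\u00e9 ".encode("utf-8") + b"caf\xe9"

# Policy 1 (IRC client style): show invalid bytes as Latin-1 characters.
print(data.decode("utf-8", errors="latin1-fallback"))  # café café

# Policy 2 (faithful output): escape invalid bytes into surrogates,
# then restore the exact original bytes when encoding.
escaped = data.decode("utf-8", errors="surrogateescape")
assert escaped.encode("utf-8", errors="surrogateescape") == data
```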

With ECL, the invalid sequence has already been consumed by the time a
more generic error is signaled (I forget which one, but I can check my
CVS logs on request), so my only options were to ignore the invalid
sequence or to substitute a placeholder character for it (0x241a or
0xfffd).  If the output must remain unmodified, there is currently no
alternative to using bytes or 8-bit clean characters.  A year or more
ago we discussed this situation briefly on this list, but I haven't
looked into it since.  Perhaps this would be a good time to resume that
work.
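By contrast, substituting a placeholder character is lossy, which is
why it does not help when the output must stay unmodified.  A minimal
Python sketch (illustrative only, not ECL's decoder) shows two different
invalid inputs collapsing to the same string with the built-in "replace"
handler, which inserts U+FFFD:

```python
# 0xE9 and 0xE8 are lead bytes not followed by a continuation byte;
# 0xFF and 0xFE can never appear in UTF-8.
a = b"caf\xe9 \xff"
b = b"caf\xe8 \xfe"

sa = a.decode("utf-8", errors="replace")
sb = b.decode("utf-8", errors="replace")

assert sa == sb == "caf\ufffd \ufffd"   # different bytes, same text
assert sa.encode("utf-8") != a          # the original bytes are gone
```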

Thanks,
-- 
Matt