[Ecls-list] UTF-8 sequence decoding errors [Was: Upcoming changes]
Matthew Mondor
mm_lists at pulsar-zone.net
Sun Feb 13 01:14:26 UTC 2011
On Sat, 12 Feb 2011 20:01:17 -0500
Matthew Mondor <mm_lists at pulsar-zone.net> wrote:
> On Sat, 12 Feb 2011 23:49:14 +0100
> Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:
>
> > I did not realize that from your previous email. This is fixed now (trivial
> > typo in utf_8_decoder)
>
> I tested and it works fine for invalid UTF-8 bytes to LATIN-1
> conversion.
>
> I also ran a test related to my earlier suggestion of a way to
> preserve invalid input intact at output, which Andy Hefner later
> referred to as "UTF-8B", and it seems possible.
>
> The remaining problems with UTF-8B are that it requires support in the
> UTF-8 encoder, because those bytes should be output as-is. Moreover,
> things may break if the UTF-8 decoder does not transparently support
> the matching input conversion, because the implementation would then
> be unable to read what it can write. I got bitten by a stuck slime
> several times when decoding/encoding errors occurred and I was not
> careful enough: when outputting a stream with invalid characters in
> UTF-8 mode (i.e. printing #xDCxx range characters without passing
> them through the latin-1 conversion function), it seems that slime
> could not catch the decoding error :)
>
> But UTF-8B could in fact be considered another encoding, and if I
> really need it I might eventually send patches to make it optionally
> available. The advantages of this mode were mentioned on this list
> in the earlier "Unicode: uncomfortable situation" thread.
> Attached is the attempt nevertheless. It works fine except for
> literal output.
Oh, it seems that UTF-8B is at least planned for SBCL too (not sure if
it already has it), according to
http://sbcl10.sbcl.org/materials/crhodes/unicode-lt.pdf
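
For illustration only, the round-trip property discussed above can be
sketched in Python, whose "surrogateescape" error handler maps each
undecodable byte 0xNN to the code point U+DCNN and back. This is not
ECL code and not the attached patch; the sample bytes and the
to_latin1 helper are assumptions made up for this example, and the
helper is only one guess at what a latin-1 conversion step could do.

```python
# Sketch of the UTF-8B idea using Python's "surrogateescape" handler.
# The input below is a made-up sample: 0xE9 is a Latin-1 e-acute and
# 0xFF a stray byte; neither forms a valid UTF-8 sequence here.
raw = b"caf\xe9 ok \xff"

# Decoding: every invalid byte 0xNN becomes the code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]

# Encoding with the same handler emits those code points as the
# original raw bytes, so the implementation can read what it writes.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# A strict UTF-8 encoder, by contrast, rejects the #xDCxx characters.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False


def to_latin1(s):
    """Hypothetical helper: reinterpret escaped bytes as Latin-1 chars."""
    return "".join(
        chr(ord(c) - 0xDC00) if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in s
    )


clean = to_latin1(text)  # now safe to encode with a strict encoder
```

The strict-encoder failure is the situation to guard against when
#xDCxx characters reach an output stream that was not set up for this
mode: either the encoder must know about the escape range, or the text
has to go through a conversion step such as to_latin1 first.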
--
Matt