[Ecls-list] UTF-8 sequence decoding errors [Was: Upcoming changes]
Matthew Mondor
mm_lists at pulsar-zone.net
Sun Feb 13 01:01:17 UTC 2011
On Sat, 12 Feb 2011 23:49:14 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:
> I did not realize that from your previous email. This is fixed now (trivial
> typo in utf_8_decoder)
I tested it, and the conversion of invalid UTF-8 bytes to LATIN-1 now
works fine.
I also tested my earlier suggestion of a way to preserve invalid input
intact on output, which Andy Hefner later referred to as "UTF-8B", and
it seems possible.
The remaining problem with UTF-8B is that it requires support in the
UTF-8 encoder, because those bytes should be output as-is. Moreover,
this may break things if the UTF-8 decoder does not transparently
support this input conversion, because the implementation would
otherwise not be able to read what it can write. I got bitten by a
stuck slime several times when decoding/encoding errors occurred: if I
was not careful and output a stream containing invalid characters in
UTF-8 mode (i.e. printed characters in the #xDCxx range without passing
them through the latin-1 conversion function), it seems that slime
could not catch the decoding error :)
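To make the round-trip requirement concrete, here is a minimal sketch in
Python rather than Lisp. It is not ECL code and says nothing about how
ECL's decoder or encoder is written; it only uses Python's built-in
"surrogateescape" error handler, which follows the same idea as UTF-8B
(an undecodable byte 0xNN is read as the code point U+DCNN and written
back as the raw byte). The sample bytes are made up for illustration.

```python
# Illustration only: Python's "surrogateescape" handler stands in for UTF-8B.
# The sample input is hypothetical: 0xE9 on its own is invalid UTF-8,
# while 0xC3 0xA9 is a valid encoding of U+00E9.
raw = b"caf\xe9 \xc3\xa9"

# Decoding: each invalid byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print([hex(ord(c)) for c in text])

# Encoding with the matching handler emits those code points as raw bytes,
# so the implementation can write back exactly what it read.
round_tripped = text.encode("utf-8", errors="surrogateescape")
print(round_tripped == raw)

# A strict UTF-8 encoder has no representation for the escaped characters
# and signals an error instead, which is the failure mode described above.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError as err:
    strict_ok = False
    print("strict encoder rejected the text:", err.reason)
```

The point of the sketch is that the decoder and the encoder must agree on
the convention: with only one side supporting it, output containing the
escaped characters either fails to encode or cannot be read back.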
But UTF-8B could in fact be considered another encoding, and if I
really need it I might eventually send patches to make it optionally
available. The advantages of this mode were mentioned on this list in
the earlier "Unicode: uncomfortable situation" thread.
My attempt is attached nevertheless. It works fine except for
literal output.
Thanks a lot; to me it seems that ECL is now on par with SBCL for UTF-8
input handling.
--
Matt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: custom-read-line.lisp
URL: <https://mailman.common-lisp.net/pipermail/ecl-devel/attachments/20110212/f157a86e/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: InvalidUTF8.txt
URL: <https://mailman.common-lisp.net/pipermail/ecl-devel/attachments/20110212/f157a86e/attachment.txt>