[Ecls-list] UTF-8 sequence decoding errors [Was: Upcoming changes]

Matthew Mondor mm_lists at pulsar-zone.net
Sat Feb 12 21:20:46 UTC 2011


On Sat, 12 Feb 2011 19:07:43 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> Thanks for the detailed report. I made some changes.
> 
> * The exported symbols come from the EXT package. They are

Indeed, SI and EXT appear to be aliases; however when a condition type
is printed, SI appears to take precedence over EXT, so for instance:

decoding error on stream #<input stream "/tmp/InvalidUTF8.txt">
(:EXTERNAL-FORMAT (:UTF-8 :LF)):
  the octet sequence (233 99) cannot be decoded.
   [Condition of type SI:STREAM-DECODING-ERROR]

That's only a detail, of course.  I see that the symbol is now
external, nice.

> * Two restarts are provided USE-VALUE and CONTINUE. They can be used via the
> ANSI functions with the same name (I think you missed that point regarding
> USE-VALUE)

Indeed, I hadn't realized that USE-VALUE was also an ANSI function
until my second post.  I now see a CONTINUE restart as well.
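
For reference, a minimal sketch of how the USE-VALUE restart could be
driven from a handler through the ANSI function of the same name.  It is
ECL-specific, since it names EXT:STREAM-DECODING-ERROR.  The helper names
READ-LINE-REPLACING and PRINT-FILE-REPLACING are hypothetical, and I am
assuming the restart accepts a single replacement character:

```lisp
;; Sketch only: both functions are hypothetical helpers, not part of ECL.
;; Assumes EXT:STREAM-DECODING-ERROR is the exported condition type and
;; that the USE-VALUE restart takes one replacement character.
(defun read-line-replacing (stream &optional (replacement #\?))
  "Read one line from STREAM, substituting REPLACEMENT for undecodable input.
Returns NIL at end of file."
  (handler-bind ((ext:stream-decoding-error
                   (lambda (condition)
                     ;; ANSI USE-VALUE invokes the restart of that name if
                     ;; one is active for CONDITION; otherwise it returns
                     ;; NIL and the handler declines.
                     (use-value replacement condition))))
    (read-line stream nil nil)))

(defun print-file-replacing (path)
  "Print every line of PATH, opened as UTF-8 with LF line endings."
  (with-open-file (in path :external-format '(:utf-8 :lf))
    (loop for line = (read-line-replacing in)
          while line
          do (write-line line))))
```

Calling (print-file-replacing "/tmp/InvalidUTF8.txt") would then print
the file with a ? wherever a decoding error was signalled, provided the
restart behaves as assumed above.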

> * I am not likely to provide multi-character restarts for a simple reason:
> ECL's streams are too simple, not providing arbitrary push-back buffers for
> bytes. Having a USE-VALUE restart that returns more than one character may
> lead to unexpected problems with unread-char and other functions -- I do not
> mean it is impossible but it simply complicates the interface and right now
> I have no clear idea how to do that.

I agree that it's unnecessary; as long as the code can obtain the
invalid sequences and resume reading at that point, it should be fine.

So I gave the new changes a quick try; it's much better, although a
character is still getting lost after the CONTINUE restart, even if I
consume all bytes from the invalid octets supplied.  New test code
attached.  Also, in theory, each invalid sequence in that stream is a
single byte, while two invalid octets are supplied per occurrence,
but that's a detail as long as the CONTINUE restart doesn't lose bytes.
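
To illustrate the byte count for the reported sequence (233 99): 233 is
#xE9, a lead octet announcing a three-octet sequence, while 99 is #x63,
plain ASCII "c", which is not a continuation octet.  The sketch below is
portable ANSI code, not ECL's decoder; the function names are
hypothetical, and it ignores overlong and surrogate checks.  It only
counts how many leading octets would have to be discarded before
decoding can resume:

```lisp
;; Illustration only; these helpers are hypothetical and not taken from ECL.
(defun utf-8-expected-length (lead)
  "Sequence length announced by LEAD, or NIL if LEAD cannot start one."
  (cond ((< lead #x80) 1)          ; 0xxxxxxx: ASCII
        ((<= #xC2 lead #xDF) 2)    ; 110xxxxx
        ((<= #xE0 lead #xEF) 3)    ; 1110xxxx
        ((<= #xF0 lead #xF4) 4)    ; 11110xxx
        (t nil)))                  ; continuation octet or invalid lead

(defun continuation-octet-p (octet)
  "True if OCTET has the form 10xxxxxx."
  (= (logand octet #xC0) #x80))

(defun invalid-prefix-length (octets)
  "Number of leading OCTETS (a non-empty list) to discard before resuming.
Returns 0 when OCTETS starts with a structurally complete sequence."
  (let ((expected (utf-8-expected-length (first octets))))
    (if (null expected)
        1
        ;; Count the lead octet plus the continuation octets that follow
        ;; it, stopping at the first octet that is not a continuation.
        (let ((seen (1+ (loop for octet in (rest octets)
                              repeat (1- expected)
                              while (continuation-octet-p octet)
                              count t))))
          (if (= seen expected) 0 seen)))))

(invalid-prefix-length '(233 99))   ; => 1
```

Under these assumptions only the 233 is invalid, and the 99 remains
available to be decoded as an ordinary character.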

Thanks,
-- 
Matt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: InvalidUTF8.txt
URL: <https://mailman.common-lisp.net/pipermail/ecl-devel/attachments/20110212/81cca4ae/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: custom-read-line.lisp
URL: <https://mailman.common-lisp.net/pipermail/ecl-devel/attachments/20110212/81cca4ae/attachment.ksh>

