[Ecls-list] Project status and changes (directions? help?)

Mon Oct 21 10:47:19 UTC 2013

On Mon, 21 Oct 2013 12:24:50 +0200
"Pascal J. Bourguignon" <pjb at informatimago.com> wrote:

> When reading utf-8 or other unicode streams, invalid byte sequences can
> signal errors, be substituted by a given character, or be encoded into
> application reserved code points to be able to transparently transmit the
> invalid byte sequence.  Cf. clisp :INPUT-ERROR-ACTION parameter of
> ext:make-encoding (clisp encodings are external-format values).
> http://clisp.org/impnotes/encoding.html#make-encoding

I agree with the above, and it's currently possible in ECL to handle
UTF-8 decoding errors (ext:stream-decoding-error) with access to the
octets of the invalid sequence (ext:character-decoding-error-octets),
with an available restart (invoke-restart 'use-value ...).  Thus an
application is free to also recode the invalid octets to LATIN-1 or to
implement "UTF8-B" at its discretion, if it implements its own input
and output.
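
To illustrate the idea outside of Lisp, here is a minimal sketch in
Python, whose codec error handlers play a role comparable to handling
the decoding condition and supplying a replacement value.  It is not
ECL code: the handler name "latin1-fallback" and the sample bytes are
made up for this example.  The handler receives the invalid octets and
recodes them as LATIN-1 before decoding resumes.

```python
import codecs


def latin1_fallback(exc):
    """Recode the octets that failed UTF-8 decoding as LATIN-1 characters."""
    # Only decoding errors carry the offending octets we want to recode.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_octets = exc.object[exc.start:exc.end]
    # Return the replacement text and the position where decoding resumes.
    return bad_octets.decode("latin-1"), exc.end


# "latin1-fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("latin1-fallback", latin1_fallback)

# 0xE9 is "é" in LATIN-1 but an incomplete sequence in UTF-8.
data = b"caf\xe9 ok"
text = data.decode("utf-8", errors="latin1-fallback")
```

The application decides the policy here, at the cost of a callback for
every invalid sequence.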

The advantage of native modes such as UTF8-B or UTF8-LATIN-1 etc. would
be performance and simplicity in cases where this is wanted, but the
default UTF-8 streams would definitely continue to signal decoding
errors explicitly.
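
For comparison, this is roughly what such a native mode looks like in
another language.  Python's built-in "surrogateescape" error handler
follows the UTF-8b scheme: each undecodable byte is mapped to a
reserved code point (U+DC80 to U+DCFF) and mapped back on output,
while the default strict mode still raises an error.  The sample bytes
are made up for this sketch, and nothing here reflects how ECL would
implement it.

```python
# Two bytes that can never appear in valid UTF-8.
data = b"ok \xff\xfe end"

# Default (strict) decoding signals an error, as ECL's default streams do.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Native UTF-8b style mode: invalid bytes become reserved code points.
escaped = data.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original octets exactly.
restored = escaped.encode("utf-8", errors="surrogateescape")
```

The round trip is lossless, which is the property that makes this mode
attractive for transparently passing through invalid input.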

If you mean that CLISP can also optionally do such conversions
transparently on request (or that its interface allows user code to do
this more efficiently), that's good to know, and I should look at its
implementation for ideas on how it presents that interface.
I've added the link above to my notes; thanks a lot for your answer.
-- 
Matt