[Ecls-list] Invalid octet sequence -> SIMPLE-ERROR

Matthew Mondor mm_lists at pulsar-zone.net
Wed Aug 5 07:36:39 UTC 2009


On Mon, 3 Aug 2009 00:19:39 -0400
Matthew Mondor <mm_lists at pulsar-zone.net> wrote:

> > Regarding errors, maybe we can provide a custom condition for illegal
> > sequences. Please understand that this kind of things are implemented
> > as needed and so far there was no need for it. You seem to have
> > special requirements. Please provide a description of how the
> > condition should look like and what would be the expected behavior --
> > continuable error, what kind of restarts, etc -- and I will look into
> > a possible implementation.
> 
> I guess that there are two solutions: to allow access to illegal
> sequences so software can interpret them as wanted (i.e. a-la-SBCL via
> the condition system), or to automatically interpret these as
> iso8859-15 (or another single-byte encoding) when the system is
> configured to, and use the condition system when this can't be done or
> auto-fallback is disabled.

I could look more closely at how SBCL does it and am noting this here
in the thread for continuity:

Bivalent streams are necessary in this case, so the socket stream has
to be created with :element-type :default, as in the following:

(socket-make-stream socket
                    :element-type :default
                    :buffering :full
                    :input t
                    :output t)

By default, on a UTF-8 configured host, this stream will expect UTF-8
sequences at READ-CHAR.  The following example input function can then
catch invalid UTF-8 sequence errors and use READ-BYTE as necessary:

(defun custom-read-line (stream)
  (let ((line (make-array 0
                          :element-type 'character
                          :adjustable t
                          :fill-pointer t)))
    (loop
       do
         (let ((c #+ecl(handler-case
                           (read-char stream)
                         (simple-error ()
                           #\?))
                  #+sbcl(handler-bind
                            ((sb-int:stream-decoding-error
                              #'(lambda (e)
                                  (declare (ignore e))
                                  ;; Treat invalid UTF-8 sequences as
                                  ;; ISO-8859 characters.
                                  (let ((b (read-byte stream)))
                                    (vector-push-extend (code-char b) line 1))
                                  ;; Consume/sync
                                  (invoke-restart 'sb-int:attempt-resync))))
                          (read-char stream))))
           (when (char= c #\Newline)
             (return (values line t)))
           (vector-push-extend c line 1)))))

Of course, this is all implementation-specific, and one could argue
that a more portable way to go would be using babel or flexi-streams
with binary streams.  I wonder if a similar system shouldn't be
provided for the sake of completeness, however, as this avoids external
dependencies while permitting custom handling of invalid sequences.
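
As a rough sketch of that binary-stream route without external
libraries, the following decodes a vector of octets (as obtained with
READ-BYTE or READ-SEQUENCE) as UTF-8 and falls back to reading each
invalid byte as ISO-8859-1.  The function names are hypothetical, this
is not how babel or flexi-streams do it, and it assumes a
Unicode-enabled implementation where CODE-CHAR accepts any code point:

```lisp
(defun decode-utf-8-at (octets start)
  "Return the code point and length of the well-formed UTF-8 sequence
at START in OCTETS, or NIL if the bytes there are not valid UTF-8."
  (let* ((lead (aref octets start))
         (len (cond ((< lead #x80) 1)
                    ((<= #xC2 lead #xDF) 2)
                    ((<= #xE0 lead #xEF) 3)
                    ((<= #xF0 lead #xF4) 4))))
    (when (and len (<= (+ start len) (length octets)))
      (let ((code (if (= len 1)
                      lead
                      ;; Keep only the payload bits of the lead byte.
                      (logand lead (ash #xFF (- (1+ len)))))))
        (loop for k from 1 below len
              for byte = (aref octets (+ start k))
              do (if (= (logand byte #xC0) #x80)
                     (setf code (logior (ash code 6) (logand byte #x3F)))
                     (return-from decode-utf-8-at nil)))
        ;; Reject overlong forms, surrogates, and out-of-range values.
        (when (and (>= code (aref #(0 0 #x80 #x800 #x10000) len))
                   (not (<= #xD800 code #xDFFF))
                   (<= code #x10FFFF))
          (values code len))))))

(defun octets-to-string-with-fallback (octets)
  "Decode OCTETS as UTF-8, mapping each invalid byte to the character
with the same code (that is, reading it as ISO-8859-1)."
  (with-output-to-string (out)
    (let ((i 0))
      (loop while (< i (length octets))
            do (multiple-value-bind (code len) (decode-utf-8-at octets i)
                 (cond (code
                        (write-char (code-char code) out)
                        (incf i len))
                       (t
                        ;; Invalid or truncated sequence: consume one
                        ;; byte and resynchronize at the next one.
                        (write-char (code-char (aref octets i)) out)
                        (incf i))))))))
```

The fallback consumes exactly one byte per failure, which plays the
same role as the resync restart in the SBCL version above.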

So the sb-int:stream-decoding-error function has two arguments, stream
and sequence, which are passed as keyword arguments to ERROR together
with the stream-decoding-error condition, whose report prints out the
information.

I'm not sure if it's easy to obtain those arguments from handler-case
or handler-bind, but if it were, bivalent streams would be unnecessary
to access the invalid bytes, and the condition could be signaled after
the bytes were consumed instead of requiring a sync restart (like ECL
currently does when signaling SIMPLE-ERROR).
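
For what it's worth, standard condition slots with readers would make
those arguments available to any handler.  The following is only a
sketch of what such a condition might look like; the condition name,
its slots and the helper functions are hypothetical and are not part of
ECL or SBCL:

```lisp
;; Hypothetical condition type carrying the stream and the bad bytes.
(define-condition octet-decoding-error (error)
  ((stream :initarg :stream :reader octet-decoding-error-stream)
   (octets :initarg :octets :reader octet-decoding-error-octets))
  (:report (lambda (condition out)
             (format out "Invalid octet sequence ~S on ~S."
                     (octet-decoding-error-octets condition)
                     (octet-decoding-error-stream condition)))))

;; Stand-in for a decoder that has already consumed the bad bytes
;; before signaling, so no resync restart is needed afterwards.
(defun signal-bad-octets (stream octets)
  (error 'octet-decoding-error :stream stream :octets octets))

;; The handler recovers the offending bytes through the slot reader
;; and maps each one to the character with the same code.
(defun recover-octets-as-string (stream octets)
  (handler-case (signal-bad-octets stream octets)
    (octet-decoding-error (e)
      (map 'string #'code-char (octet-decoding-error-octets e)))))
```

With something like this, a HANDLER-CASE clause alone is enough to
interpret the invalid bytes, without a bivalent stream.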
-- 
Matt



