[Ecls-list] UTF-8 sequence decoding errors [Was: Upcoming changes]
Juan Jose Garcia-Ripoll
juanjose.garciaripoll at googlemail.com
Sat Feb 12 18:07:43 UTC 2011
Thanks for the detailed report. I made some changes.
* The exported symbols come from the EXT package. They are:
character-coding-error
character-coding-error-external-format
character-decoding-error
character-decoding-error-octets
character-encoding-error
character-encoding-error-code
stream-encoding-error
stream-decoding-error
* Two restarts are provided: USE-VALUE and CONTINUE. They can be invoked via
the ANSI functions of the same name (I think you missed that point regarding
USE-VALUE).
* Encoding errors are now signaled as well. Previously the function had not
been plugged into the engine.
* I am not likely to provide multi-character restarts for a simple reason:
ECL's streams are too simple, not providing arbitrary push-back buffers for
bytes. Having a USE-VALUE restart that returns more than one character may
lead to unexpected problems with unread-char and other functions -- I do not
mean it is impossible but it simply complicates the interface and right now
I have no clear idea how to do that.
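
As a rough illustration of how the two restarts are meant to be driven from a handler, here is a portable sketch. It does not use ECL's decoder: DEMO-DECODING-ERROR, DEMO-DECODE and DEMO-DECODE-ALL are hypothetical names invented for this example, and the "decoder" simply rejects any octet above 127. Only RESTART-CASE, HANDLER-BIND, USE-VALUE and CONTINUE are standard ANSI operators.

```lisp
;; Hypothetical stand-in for a decoder; this is NOT ECL's implementation.
(define-condition demo-decoding-error (error)
  ((octets :initarg :octets :reader demo-decoding-error-octets)))

(defun demo-decode (octet)
  "Return the character for OCTET, or signal DEMO-DECODING-ERROR when
OCTET is above 127. Offers CONTINUE and USE-VALUE restarts."
  (if (< octet 128)
      (code-char octet)
      (restart-case (error 'demo-decoding-error :octets (list octet))
        (continue ()
          :report "Skip the invalid octet."
          nil)
        (use-value (c)
          :report "Substitute a single character."
          c))))

(defun demo-decode-all (octets &key replace-char)
  "Decode OCTETS into a string. Invalid octets are replaced by
REPLACE-CHAR when it is given, and skipped otherwise."
  (let ((out '()))
    (handler-bind ((demo-decoding-error
                     (lambda (e)
                       (declare (ignore e))
                       ;; The ANSI functions USE-VALUE and CONTINUE find
                       ;; and invoke the restarts of the same name.
                       (if replace-char
                           (use-value replace-char)
                           (continue)))))
      (dolist (o octets)
        (let ((c (demo-decode o)))
          (when c (push c out)))))
    (coerce (nreverse out) 'string)))
```

Calling (demo-decode-all '(72 255 105) :replace-char #\?) substitutes the replacement character for the bad octet, while omitting :replace-char drops it. Each restart yields at most one character, which matches the single-character limitation described above.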
I attached a modified version of your code.
Best,
Juanjo
--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com
-------------- next part --------------
(defun custom-read-line (stream &key (max 512) replace-char)
  "Read one line from STREAM, recovering from invalid UTF-8 sequences.
Returns the line without its terminator, and T. END-OF-FILE from
READ-CHAR propagates to the caller."
  (let ((line (make-array max
                          :element-type 'character
                          :adjustable t
                          :fill-pointer 0)))
    (flet ((add-char (c)
             (declare (type character c))
             ;; VECTOR-PUSH does not grow the array: characters beyond
             ;; MAX are silently dropped.
             (vector-push c line))
           (finalize-line ()
             ;; Strip a trailing CR so CRLF-terminated lines come out clean.
             (let ((len (length line)))
               (when (and (> len 0)
                          (char= #\Return (aref line (1- len))))
                 (vector-pop line)))
             line))
      (loop
        do
           (let ((c
                   ;; No way to determine the invalid octet values with
                   ;; old ECL; return the Unicode replacement character.
                   #+old-ecl
                   (handler-case
                       (read-char stream)
                     (simple-error ()
                       #\UFFFD))
                   ;; SBCL provides the invalid octets, which we can import
                   ;; before invoking the ATTEMPT-RESYNC restart to resume.
                   #+sbcl
                   (handler-bind
                       ((sb-int:stream-decoding-error
                          #'(lambda (e)
                              ;; Treat invalid UTF-8 octets as
                              ;; ISO-8859-1 characters.
                              (mapc #'(lambda (c)
                                        (when (> c 127)
                                          (add-char (code-char c))))
                                    (sb-int:character-decoding-error-octets e))
                              (invoke-restart 'sb-int:attempt-resync))))
                     (read-char stream))
                   ;; Test with new ECL
                   #+ecl
                   (handler-bind
                       ((ext:character-decoding-error ; exported from EXT
                          #'(lambda (e)
                              (mapc #'(lambda (c)
                                        (format t "~%Code: ~A" c)
                                        (when (> c 127)
                                          ;; Never happens
                                          (add-char (code-char c))))
                                    (ext:character-decoding-error-octets e))
                              ;; Either replace the character or ignore it
                              (if replace-char
                                  (use-value replace-char)
                                  (continue)))))
                     (read-char stream))))
             (when (char= #\Newline c)
               (return (values (finalize-line) t)))
             (add-char c))))))

(defun test (&rest args)
  (with-open-file (stream "InvalidUTF8.txt")
    (loop
      do
         (let ((line (handler-case
                         (apply #'custom-read-line stream args)
                       (end-of-file ()
                         (loop-finish)))))
           (format t "~A~%" line)))))

(test)

#+ecl
(test :replace-char #\?)
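
The test above expects a file named InvalidUTF8.txt, whose contents are not included in this message. One possible way to produce a small input with a malformed sequence is sketched below. The function name WRITE-INVALID-UTF8-FILE and the chosen bytes are assumptions made for illustration; the octet #xFF is used because it can never appear in well-formed UTF-8.

```lisp
;; Hypothetical helper, not part of the original attachment.
(defun write-invalid-utf8-file (&optional (path "InvalidUTF8.txt"))
  "Write two short lines to PATH, the first containing the invalid octet #xFF."
  (with-open-file (out path
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :supersede
                       :if-does-not-exist :create)
    ;; "ab" #xFF "cd" LF, followed by "ef" LF
    (write-sequence #(97 98 255 99 100 10 101 102 10) out))
  path)
```

Running (write-invalid-utf8-file) before (test) gives the reader one invalid octet to recover from on the first line and a clean second line.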