[Ecls-list] Automatic string conversion - wanted or needed?
Matthew Mondor
mm_lists at pulsar-zone.net
Thu May 1 15:49:04 UTC 2014
On Thu, 01 May 2014 11:44:00 +0200
"Pascal J. Bourguignon" <pjb at informatimago.com> wrote:
> clisp distinguishes the following encodings:
>
> CUSTOM:*DEFAULT-FILE-ENCODING* for :external-format :default
> CUSTOM:*FOREIGN-ENCODING* for FFI
> CUSTOM:*MISC-ENCODING* for the rest
> CUSTOM:*PATHNAME-ENCODING* for pathnames
> CUSTOM:*TERMINAL-ENCODING* for the terminal
This is interesting, thanks PJB. I guess that EXT:*FOREIGN-ENCODING*
and EXT:*PATHNAME-ENCODING* would be useful here... On Unix, CLISP
seems to default to #<ENCODING CHARSET:UTF-8 :UNIX> for the
corresponding CUSTOM variables here, which also seems sane.
As for encoding/decoding, ECL supports sequence streams, which could
easily be used to write the necessary string conversion functions
without extra dependencies. The following can also serve as an example
of using sequence streams:
(defun utf-8-string-encode (string)
  "Encodes the supplied STRING into a UTF-8 octet vector, which it returns."
  (let ((v (make-array (+ 5 (length string)) ; Best case (ASCII); may grow
                       :element-type '(unsigned-byte 8)
                       :adjustable t
                       :fill-pointer 0)))
    (with-open-stream (s (ext:make-sequence-output-stream
                          v :external-format :utf-8))
      (loop
        for c across string
        do
           (write-char c s)
           ;; Keep at least 5 free octets so that the next character
           ;; (at most 4 octets in UTF-8) always fits.
           (let ((d (array-dimension v 0)))
             (when (< (- d (fill-pointer v)) 5)
               (adjust-array v (* 2 d))))))
    v))
(defun utf-8-string-decode (bytes)
  "Decodes the UTF-8 octet vector BYTES into a string, which it returns.
Octets of invalid sequences are imported as LATIN-1 characters."
  ;; STRING in ADD-CHAR refers to the accumulator bound by LOOP's WITH.
  (macrolet ((add-char (c)
               `(vector-push-extend ,c string 1024)))
    (with-open-stream (s (ext:make-sequence-input-stream
                          bytes :external-format :utf-8))
      (loop
        with string = (make-array 1024
                                  :element-type 'character
                                  :adjustable t
                                  :fill-pointer 0)
        for c of-type character =
          (handler-bind
              ((ext:stream-decoding-error
                 #'(lambda (e)
                     (mapc #'(lambda (o)
                               ;; Assume LATIN-1 and import
                               (add-char (code-char o)))
                           (ext:character-decoding-error-octets e))
                     (invoke-restart 'continue)))
               (end-of-file
                 #'(lambda (e)
                     (declare (ignore e))
                     (loop-finish))))
            (read-char s))
        do (add-char c)
        finally (return string)))))
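For comparison, here is a minimal portable sketch of the encoding half that does not rely on ECL's sequence streams at all, building the octets by hand with bit operations instead. This is a swapped-in technique, not what UTF-8-STRING-ENCODE above does; the name PORTABLE-UTF-8-ENCODE is hypothetical, and the sketch assumes CHAR-CODE returns Unicode code points (as in Unicode-enabled builds of ECL, SBCL and CLISP) and does no surrogate validation.

```lisp
(defun portable-utf-8-encode (string)
  "Hypothetical sketch: encode STRING into a vector of UTF-8 octets.
Assumes CHAR-CODE returns Unicode code points; surrogates are not checked."
  (let ((v (make-array (max 16 (length string))
                       :element-type '(unsigned-byte 8)
                       :adjustable t
                       :fill-pointer 0)))
    (flet ((out (octet) (vector-push-extend octet v)))
      (loop for c across string
            for code = (char-code c)
            do (cond ((< code #x80)      ; 1 octet:  0xxxxxxx
                      (out code))
                     ((< code #x800)     ; 2 octets: 110xxxxx 10xxxxxx
                      (out (logior #xC0 (ash code -6)))
                      (out (logior #x80 (logand code #x3F))))
                     ((< code #x10000)   ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
                      (out (logior #xE0 (ash code -12)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F))))
                     (t                  ; 4 octets, up to #x10FFFF
                      (out (logior #xF0 (ash code -18)))
                      (out (logior #x80 (logand (ash code -12) #x3F)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F)))))))
    v))
```

The stream-based version is shorter and leaves the details to ECL's external-format machinery; the hand-written one only shows what that machinery has to produce.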
--
Matt
More information about the ecl-devel mailing list