[Ecls-list] Automatic string conversion - wanted or needed?

Matthew Mondor mm_lists at pulsar-zone.net
Thu May 1 15:49:04 UTC 2014

On Thu, 01 May 2014 11:44:00 +0200
"Pascal J. Bourguignon" <pjb at informatimago.com> wrote:

> clisp distinguishes the following encodings:
> CUSTOM:*DEFAULT-FILE-ENCODING*   for :external-format :default
> CUSTOM:*MISC-ENCODING*           for the rest
> CUSTOM:*PATHNAME-ENCODING*       for pathnames
> CUSTOM:*TERMINAL-ENCODING*       for the terminal

This is interesting, thanks PJB.  I guess that EXT:*FOREIGN-ENCODING*
and EXT:*PATHNAME-ENCODING* would be useful here...  CLISP on Unix
seems to default to #<ENCODING CHARSET:UTF-8 :UNIX> for the
corresponding CUSTOM variables, which seems sane, too.

As for encoding/decoding, ECL supports sequence streams, which could
easily be used to write the necessary string conversion functions
without extra dependencies.  If it can serve as an example of using
sequence streams:

(defun utf-8-string-encode (string)
  "Encodes the supplied STRING to a UTF-8 octets vector which it returns."
  (let ((v (make-array (+ 5 (length string)) ; Best case but we might grow
                       :element-type '(unsigned-byte 8)
                       :adjustable t
                       :fill-pointer 0)))
    (with-open-stream (s (ext:make-sequence-output-stream
                          v :external-format :utf-8))
      (loop
         for c across string
         do
           (write-char c s)
           ;; Keep at least 5 free octets ahead of the next character
           (let ((d (array-dimension v 0)))
             (when (< (- d (fill-pointer v)) 5)
               (adjust-array v (* 2 d))))))
    v))

(defun utf-8-string-decode (bytes)
  "Decodes the UTF-8 octets vector BYTES to a string which it returns.
Invalid sequence octets are imported as LATIN-1 characters."
  (macrolet ((add-char (c)
               `(vector-push-extend ,c string 1024)))
    (with-open-stream (s (ext:make-sequence-input-stream
                          bytes :external-format :utf-8))
      (loop
         with string = (make-array 1024
                                   :element-type 'character
                                   :adjustable t
                                   :fill-pointer 0)
         for c of-type character =
           (handler-bind
               ((ext:character-decoding-error
                 #'(lambda (e)
                     (mapc #'(lambda (o)
                               ;; Assume LATIN-1 and import
                               (add-char (code-char o)))
                           (ext:character-decoding-error-octets e))
                     (invoke-restart 'continue)))
                (end-of-file
                 #'(lambda (e)
                     (declare (ignore e))
                     ;; No more input: leave the loop, run FINALLY
                     (loop-finish))))
             (read-char s))
         do (add-char c)
         finally (return string)))))
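
A quick round-trip check could look like the sketch below.  It assumes
an ECL built with Unicode support and that the two functions above are
already loaded; the sample string is only an illustration.

```lisp
;; Sketch only: relies on UTF-8-STRING-ENCODE and UTF-8-STRING-DECODE
;; as defined above, on an ECL built with Unicode support.
(let* ((text (coerce (list #\h (code-char 233) #\l #\l #\o) 'string))
       (octets (utf-8-string-encode text)))
  ;; U+00E9 takes two octets in UTF-8, the four ASCII characters one each.
  (assert (= 6 (length octets)))
  ;; Decoding the octets should give back the original string.
  (assert (string= text (utf-8-string-decode octets))))
```

Building the string with CODE-CHAR avoids depending on the encoding
of the source file itself.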

