[Ecls-list] Automatic string conversion - wanted or needed?

Matthew Mondor mm_lists at pulsar-zone.net
Thu May 1 15:49:04 UTC 2014


On Thu, 01 May 2014 11:44:00 +0200
"Pascal J. Bourguignon" <pjb at informatimago.com> wrote:

> clisp distinguishes the following encodings:
> 
> CUSTOM:*DEFAULT-FILE-ENCODING*   for :external-format :default
> CUSTOM:*FOREIGN-ENCODING*        for FFI
> CUSTOM:*MISC-ENCODING*           for the rest
> CUSTOM:*PATHNAME-ENCODING*       for pathnames
> CUSTOM:*TERMINAL-ENCODING*       for the terminal

This is interesting, thanks PJB.  I guess that EXT:*FOREIGN-ENCODING*
and EXT:*PATHNAME-ENCODING* would be useful for ECL...  CLISP on unix
seems to have #<ENCODING CHARSET:UTF-8 :UNIX> by default here for the
corresponding CUSTOM variables, which seems sane, too.

As for encoding/decoding, ECL supports sequence streams, which could
easily be used to write the necessary string conversion functions
without extra dependencies.  The following can also serve as an example
of using sequence streams:

(defun utf-8-string-encode (string)
  "Encodes the supplied STRING to a UTF-8 octet vector, which is returned."
  (let ((v (make-array (+ 5 (length string)) ; Best case but we might grow
                       :element-type '(unsigned-byte 8)
                       :adjustable t
                       :fill-pointer 0)))
    (with-open-stream (s (ext:make-sequence-output-stream
                          v :external-format :utf-8))
      (loop
         for c across string
         do
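           (write-char c s)
           ;; Keep at least 5 free octets so that the next encoded
           ;; character has room; double the vector when running low.
           (let ((d (array-dimension v 0)))
             (when (< (- d (fill-pointer v)) 5)
               (adjust-array v (* 2 d))))))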
    v))

(defun utf-8-string-decode (bytes)
  "Decodes the UTF-8 octet vector BYTES to a string, which is returned.
Octets of invalid sequences are imported as LATIN-1 characters."
  (macrolet ((add-char (c)
               `(vector-push-extend ,c string 1024)))
    (with-open-stream (s (ext:make-sequence-input-stream
                          bytes :external-format :utf-8))
      (loop
         with string = (make-array 1024
                                   :element-type 'character
                                   :adjustable t
                                   :fill-pointer 0)
         for c of-type character =
           (handler-bind
               ((ext:stream-decoding-error
                 #'(lambda (e)
                     (mapc #'(lambda (o)
                               ;; Assume LATIN-1 and import
                               (add-char (code-char o)))
                           (ext:character-decoding-error-octets e))
                     (invoke-restart 'continue)))
                (end-of-file
                 #'(lambda (e)
                     (declare (ignore e))
                     (loop-finish))))
             (read-char s))
         do (add-char c)
         finally (return string)))))
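
A quick round-trip check of the two functions above could look like the
following.  This is only a sketch: it assumes a Unicode-enabled ECL build
with both functions already loaded, and the test string is a made-up
example (the e-acute is built with CODE-CHAR so that the source file
encoding does not matter):

(let* ((original (format nil "h~Cllo" (code-char 233))) ; hypothetical input
       (octets (utf-8-string-encode original))
       (decoded (utf-8-string-decode octets)))
  ;; Encoding and then decoding should give back an equal string.
  (assert (string= original decoded))
  ;; Return the octet count and the decoded string for inspection.
  (values (length octets) decoded))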

-- 
Matt



