[Ecls-list] Character encoding/decoding utilities

Matthew Mondor mm_lists at pulsar-zone.net
Fri Aug 26 09:52:07 UTC 2011


On Fri, 26 Aug 2011 11:00:12 +0200
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> On Fri, Aug 26, 2011 at 10:24 AM, Matthew Mondor
> <mm_lists at pulsar-zone.net> wrote:
> 
> > Although there are existing third-party libraries for character
> > encoding conversions (including one personal implementation), since ECL
> > has everything needed for UTF-8 encoding/decoding I think that it'd be
> > nice if it could out-of-the-box permit conversion between bytes and
> > strings with the wanted external-format.
> >
> 
> I have long wanted to implement sequence-input/output-streams, which would
> be the generalization of string-input/output-streams to sequences. The idea
> would be to look at the latter and see how much of it can be refactored.
> That would be more or less what you need, am I wrong?

If the output element type can be byte and the input element type
character, as well as the converse, and if the external-format is taken
into consideration (and is configurable), with encoding happening during
character->byte conversion and decoding during byte->character
conversion, then indeed this would work.

In other words, if I understand correctly, something like the following
would become possible?

;;; Write a Unicode character string to a UTF-8 encoded byte vector
(let ((v (make-array 16 ; Expect implementation to adjust ^2 or *2 as needed
                     :element-type '(unsigned-byte 8)
                     :adjustable t
                     :fill-pointer 0)))
  (with-open-stream (os (make-sequence-output-stream
                         v :external-format '(:UTF-8 :LF)))
    (format os "some unicode string~%")
    v)) ; Contains the UTF-8 encoded bytes

;;; Read a Unicode character string from a UTF-8 encoded byte vector
(let ((v <vector of bytes to read/decode>))
  (with-open-stream (is (make-sequence-input-stream
                         v :external-format '(:UTF-8 :LF)))
    (read-line is))) ; UCS-4 characters, may generate decoding exceptions

> I have one caveat, though, I would not implement string streams using those
> sequence streams -- the external format of those string streams is fixed by
> the ECL's interpretation of the strings and should never be changed.

This would not be a problem, considering the more flexible alternative
available when needed.
-- 
Matt



