[Ecls-list] Character encoding/decoding utilities
Matthew Mondor
mm_lists at pulsar-zone.net
Fri Aug 26 09:52:07 UTC 2011
On Fri, 26 Aug 2011 11:00:12 +0200
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:
> On Fri, Aug 26, 2011 at 10:24 AM, Matthew Mondor
> <mm_lists at pulsar-zone.net>wrote:
>
> > Although there are existing third-party libraries for character
> > encoding conversions (including one personal implementation), since ECL
> > has everything needed for UTF-8 encoding/decoding I think that it'd be
> > nice if it could out-of-the-box permit conversion between bytes and
> > strings with the wanted external-format.
> >
>
> I have long wanted to implement sequence-input/output-streams, which would
> be the generalization of string-input/output-streams to sequences. The idea
> would be to look at the latter and see how much of it can be refactored.
> That would be more or less what you need, am I wrong?
If the output element type can be byte and the input element type
character, as well as the converse, and if the external format is taken
into consideration (and is configurable), with encoding happening during
character->byte and decoding during byte->character, then indeed this
would work.
In other words, if I understand correctly, something like the following
would become possible?
;;; Write a Unicode character string to a UTF-8 encoded byte vector
(let ((v (make-array 16 ; Expect implementation to grow it (^2 or *2) as needed
                     :element-type '(unsigned-byte 8)
                     :adjustable t
                     :fill-pointer 0)))
  (with-open-stream (os (make-sequence-output-stream
                         v :external-format '(:UTF-8 :LF)))
    (format os "some unicode string~%")
    v)) ; Contains the UTF-8 encoded bytes
;;; Read a Unicode character string from a UTF-8 encoded byte vector
(let ((v <vector of bytes to read/decode>))
  (with-open-stream (is (make-sequence-input-stream
                         v :external-format '(:UTF-8 :LF)))
    (read-line is))) ; UCS-4 characters, may signal decoding errors
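For reference, the character->byte step that such a stream would perform can be sketched in portable Common Lisp. This is only an illustration under stated assumptions: `string-to-utf8-octets` is a hypothetical helper name, not an ECL function; it assumes `char-code` returns the Unicode code point, and it does no validation (surrogate code points, for instance, are encoded as-is).

```lisp
;;; Hypothetical helper (not part of ECL): encode STRING as UTF-8 octets
;;; into an adjustable vector with a fill pointer, using only standard CL.
;;; Assumes CHAR-CODE returns the Unicode code point of each character.
(defun string-to-utf8-octets (string)
  (let ((v (make-array 16
                       :element-type '(unsigned-byte 8)
                       :adjustable t
                       :fill-pointer 0)))
    (flet ((out (octet)
             ;; VECTOR-PUSH-EXTEND grows the vector when it is full
             (vector-push-extend octet v)))
      (loop for ch across string
            for code = (char-code ch)
            do (cond ((< code #x80)      ; 1 octet: 0xxxxxxx
                      (out code))
                     ((< code #x800)     ; 2 octets: 110xxxxx 10xxxxxx
                      (out (logior #xC0 (ash code -6)))
                      (out (logior #x80 (logand code #x3F))))
                     ((< code #x10000)   ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
                      (out (logior #xE0 (ash code -12)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F))))
                     (t                  ; 4 octets: 11110xxx and three 10xxxxxx
                      (out (logior #xF0 (ash code -18)))
                      (out (logior #x80 (logand (ash code -12) #x3F)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F)))))))
    v))
```

With this sketch, (string-to-utf8-octets "abc") yields a vector holding 97 98 99. A sequence output stream with a configurable external format would make such hand-written helpers unnecessary.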
> I have one caveat, though, I would not implement string streams using those
> sequence streams -- the external format of those string streams is fixed by
> the ECL's interpretation of the strings and should never be changed.
This would not be a problem, considering the more flexible alternative
available when needed.
--
Matt