[Ecls-list] Major changes (help wanted)

Anton Vodonosov avodonosov at yandex.ru
Sat Jan 3 01:13:38 UTC 2009


on Friday, January 2, 2009, 9:08:54 PM Juan wrote:

>  - ECL supports external formats. They may be a symbol, denoting the encoding
>    or an encoding option, a hash table mapping bytes to Unicode codes,
>    or a list of these. Valid symbols are :DEFAULT, :LATIN-1, :ISO-8859-1,
>    :UTF-8, :UCS-{2,4}{,BE,LE}, :CR, :LF and :CRLF. Default option is :LF.

> For an example of how user defined encodings are implemented:

> (defvar *a* (loop for i from 0 below 128 collect (cons i i)))
> (defvar *b* (ext:make-encoding *a*))
> (with-open-file (s "foo.txt" :direction :output :external-format *b*
>                                                 :if-exists :supersede)
>   (write-line "abcd" s))
> (si::system "cat foo.txt")
> (with-open-file (s "foo.txt" :direction :output :external-format :utf-8
>                                                 :if-exists :supersede)
>   (write-line "?bcd" s))
> (si::system "cat foo.txt")
> (with-open-file (s "foo.txt" :direction :input :external-format *b*)
>   (read-line s)) ;; Signals an error : character outside of encoding

> Valid encoding names are any symbol that names a file in
> contrib/encodings/ ($libdir/encodings after installing)

> I also added encoding files for most useful Windows codepages, but I
> would need help in debugging the Japanese and Chinese encodings. These
> variable width encodings are also available, although not shipped by
> default because of their size. They can be generated with the file
> ecl/contrib/encodings/generate.lisp by uncommenting the appropriate
> lines.

> Variable width encodings, except for UTF-8, are rather inefficient:
> they require a large hash table mapping multi-bytes to Unicode
> characters and vice versa.

> Juanjo


Hello Juan,

If ECL supports user-defined encodings, would it be useful to
allow specifying them in a more functional way? The user would
provide functions like octets-to-string for reading and
strings-to-octets for writing, and maybe some others, like
(compute-number-of-chars byte-seq start end), the number of chars
needed to decode the byte sequence BYTE-SEQ from START to END, and
(compute-number-of-octets byte-seq start end).
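
To make this a bit more concrete, here is a rough sketch in portable
Common Lisp of what such a functional specification could look like,
using Latin-1 as the simplest case. Everything in it is an assumption
for illustration: FUNCTIONAL-ENCODING and *LATIN-1-SKETCH* are
hypothetical names, not part of ECL, and the argument conventions are
only one possible choice.

```lisp
;; Hypothetical sketch: FUNCTIONAL-ENCODING and *LATIN-1-SKETCH* are
;; illustrative names only; nothing here is part of ECL.
(defstruct functional-encoding
  octets-to-string          ; (octets start end) -> string, for reading
  strings-to-octets         ; (string start end) -> octet vector, for writing
  compute-number-of-chars)  ; (octets start end) -> number of characters

(defvar *latin-1-sketch*
  (make-functional-encoding
   :octets-to-string
   (lambda (octets start end)
     ;; Latin-1 maps every octet to the character with the same code.
     (map 'string #'code-char (subseq octets start end)))
   :strings-to-octets
   (lambda (string start end)
     (map '(vector (unsigned-byte 8))
          (lambda (char)
            (let ((code (char-code char)))
              (if (< code 256)
                  code
                  (error "Character ~S is outside of the encoding." char))))
          (subseq string start end)))
   :compute-number-of-chars
   (lambda (octets start end)
     (declare (ignore octets))
     ;; Fixed-width encoding: one octet per character.
     (- end start))))

;; Round trip through the two conversion functions.
(let* ((octets (funcall (functional-encoding-strings-to-octets *latin-1-sketch*)
                        "abcd" 0 4))
       (string (funcall (functional-encoding-octets-to-string *latin-1-sketch*)
                        octets 0 (length octets))))
  (print octets)
  (print string))
```

A variable-width encoding would only need different closures in the
same three slots, with no lookup table visible to the stream code.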

Another way: the user provides a function that creates a
Gray stream from a byte stream provided by ECL's runtime. This
Gray stream then handles all further operations.

I've been thinking about this idea, that ECL could implement external
formats in streams using an external library like flexi-streams, since
I first discovered ECL about a year ago. But I never managed to find
the time to work it out into a more concrete, code-oriented
suggestion.

Now that you have already implemented external formats, this
suggestion makes even less sense. On the other hand, perhaps you may
find something useful in it.

Best regards,
- Anton




