[Ecls-list] Major changes (help wanted)

Juan Jose Garcia-Ripoll juanjose.garciaripoll at googlemail.com
Fri Jan 2 19:08:54 UTC 2009


 - ECL supports external formats. An external format may be a symbol denoting
   an encoding or an encoding option, a hash table mapping bytes to Unicode
   code points, or a list of these. Valid symbols are :DEFAULT, :LATIN-1,
   :ISO-8859-1, :UTF-8, :UCS-{2,4}{,BE,LE}, :CR, :LF and :CRLF. The default
   line-terminator option is :LF.
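
To illustrate the "list of these" form, here is a minimal sketch that
combines an encoding symbol with a line-terminator option. It assumes ECL
and that the list is passed directly as the :external-format argument; the
helper names write-crlf-sample and file-bytes are made up for this example.

```lisp
;; Assumes ECL; WRITE-CRLF-SAMPLE and FILE-BYTES are hypothetical helpers.
(defun write-crlf-sample (path)
  "Write one line to PATH as UTF-8 with CRLF line terminators."
  (with-open-file (s path :direction :output
                          :external-format '(:utf-8 :crlf)
                          :if-exists :supersede)
    (write-line "abcd" s))
  path)

(defun file-bytes (path)
  "Return the raw bytes of PATH as a list of integers."
  (with-open-file (s path :element-type '(unsigned-byte 8))
    (loop for b = (read-byte s nil) while b collect b)))
```

If the :CRLF option is applied on output, (file-bytes (write-crlf-sample
"foo.txt")) should return the four bytes of "abcd" followed by 13 and 10.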

Here is an example of how a user-defined encoding is created and used:

;; Encoding table: bytes 0..127 map to the same character codes (ASCII only).
(defvar *a* (loop for i from 0 below 128 collect (cons i i)))
(defvar *b* (ext:make-encoding *a*))
(with-open-file (s "foo.txt" :direction :output :external-format *b*
                   :if-exists :supersede)
  (write-line "abcd" s))
(si::system "cat foo.txt")
;; Overwrite the file as UTF-8; "ß" is written as bytes outside 0..127.
(with-open-file (s "foo.txt" :direction :output :external-format :utf-8
                   :if-exists :supersede)
  (write-line "ßbcd" s))
(si::system "cat foo.txt")
(with-open-file (s "foo.txt" :direction :input :external-format *b*)
  (read-line s)) ;; Signals an error: character outside of encoding

Any symbol that names a file in contrib/encodings/ ($libdir/encodings
after installing) is a valid encoding name.

I also added encoding files for the most useful Windows code pages, but I
would need help debugging the Japanese and Chinese encodings. These
variable-width encodings are also available, although they are not shipped
by default because of their size. They can be generated by uncommenting
the appropriate lines in ecl/contrib/encodings/generate.lisp.

Variable-width encodings, except for UTF-8, are rather inefficient:
they require a large hash table mapping multibyte sequences to Unicode
characters and vice versa.
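
To make the cost concrete, here is a rough sketch in portable Common Lisp
of the kind of two-way lookup such an encoding needs. It is not ECL's
internal representation: the table layout (two separate hash tables) and
all names are assumptions made for illustration. The three sample entries
are Shift_JIS codes; a real table has thousands of them.

```lisp
;; Hypothetical names and layout, for illustration only.
(defvar *bytes->code* (make-hash-table)
  "Maps a byte sequence, packed into one integer, to a Unicode code point.")
(defvar *code->bytes* (make-hash-table)
  "Maps a Unicode code point back to the packed byte sequence.")

(defun add-mapping (bytes code)
  "Register BYTES <-> CODE in both directions."
  (setf (gethash bytes *bytes->code*) code
        (gethash code *code->bytes*) bytes))

(add-mapping #x41   #x0041)   ; single byte: LATIN CAPITAL LETTER A
(add-mapping #x82A0 #x3042)   ; two bytes:   HIRAGANA LETTER A
(add-mapping #x82A2 #x3044)   ; two bytes:   HIRAGANA LETTER I

(defun decode-unit (bytes)
  "Return the code point for BYTES, or NIL if it is outside the encoding."
  (values (gethash bytes *bytes->code*)))

(defun encode-code (code)
  "Return the packed bytes for CODE, or NIL if it cannot be encoded."
  (values (gethash code *code->bytes*)))
```

Every character read or written costs one hash lookup, and both
directions must be kept in memory, which is why a large variable-width
encoding is slower and bigger than the fixed-width ones.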

Juanjo

-- 
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28009 (Spain)
http://juanjose.garciaripoll.googlepages.com

