[Ecls-list] Status of external formats

Juan Jose Garcia-Ripoll juanjose.garciaripoll at googlemail.com
Fri Jan 9 10:30:33 UTC 2009


Hi,

right now ECL supports the encodings listed below, plus a number of
aliases for them. The support is based on internal conversion tables
generated from the sources provided by the Unicode Consortium, except
for the ISO-2022-* family, which are stateful encodings and require
some hand-written code.

These encodings are tested against iconv in the ECL test suite. Some
tests have been disabled for a simple reason: for some characters
iconv outputs the precomposed Unicode character instead of the
sequence formed by a base character and a combining one (accent,
drawing, etc). Until ECL supports normalization of Unicode strings,
the two outputs cannot be compared reliably.
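
To illustrate the comparison problem, here is a minimal sketch in
Python (not ECL code; the helper name same_text is made up for this
example). It assumes NFC is the normalization form one would compare
under, which is an assumption and not something ECL does today:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False: code points differ
print(same_text(precomposed, decomposed))  # True: equal after normalization
```

A test that compares the raw output of two converters behaves like the
first comparison; one that normalizes both sides first behaves like
the second.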

This also raises the question of whether we should just fall back to
using iconv. That has several potential problems. One of them is
stateful encoders, which have to remember data between the characters
we read. Currently ECL handles this rather simply, so that even in the
multithreaded case the worst that can happen is a wrong input or
output.
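
What "stateful" means can be sketched with Python's own ISO-2022-JP
codec (an illustration only; it shows neither ECL's nor iconv's
implementation, and decode_chunks is a name invented here). The same
two bytes decode differently depending on whether the decoder has
already seen the escape sequence that switches to JIS X 0208:

```python
import codecs

ESC_JISX0208 = b"\x1b$B"   # escape sequence: switch to JIS X 0208
PAYLOAD = b'$"'            # two bytes whose meaning depends on the state


def decode_chunks(chunks):
    """Decode byte chunks in order with ONE shared incremental decoder."""
    decoder = codecs.getincrementaldecoder("iso2022_jp")()
    return [decoder.decode(chunk) for chunk in chunks]


# The decoder remembers the escape sequence between the two calls.
with_state = decode_chunks([ESC_JISX0208, PAYLOAD])

# A fresh decoder starts in ASCII and reads the same bytes literally.
without_state = decode_chunks([PAYLOAD])

print(with_state)
print(without_state)
```

If that remembered state lived inside an external library, every
stream would have to carry its own conversion state around.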

The other problem is encodings that map one character to two Unicode
characters (say for instance JISX0203). Since with iconv this mapping
would be opaque to ECL, unread-char, file-position and other routines
would stop working properly.

Juanjo

ATARIST        DOS-CP865       ISO-8859-3       WINDOWS-CP1251
CP-856         DOS-CP866       ISO-8859-4       WINDOWS-CP1252
DOS-CP437      DOS-CP869       ISO-8859-5       WINDOWS-CP1253
DOS-CP737      DOS-CP874       ISO-8859-6       WINDOWS-CP1254
DOS-CP775      ISO-2022-JP     ISO-8859-7       WINDOWS-CP1255
DOS-CP850      ISO-2022-JP-1   ISO-8859-8       WINDOWS-CP1256
DOS-CP852      ISO-8859-1      ISO-8859-9       WINDOWS-CP1257
DOS-CP855      ISO-8859-10                      WINDOWS-CP1258
DOS-CP857      ISO-8859-11                      WINDOWS-CP932
DOS-CP860      ISO-8859-13                      WINDOWS-CP936
DOS-CP861      ISO-8859-14     KOI8-R           WINDOWS-CP949
DOS-CP862      ISO-8859-15     KOI8-U           WINDOWS-CP950
DOS-CP863      ISO-8859-16
DOS-CP864      ISO-8859-2

-- 
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28009 (Spain)
http://juanjose.garciaripoll.googlepages.com

