[Ecls-list] Character encoding/decoding utilities

Matthew Mondor mm_lists at pulsar-zone.net
Fri Aug 26 08:24:44 UTC 2011


Hello,

Although there are existing third-party libraries for character
encoding conversions (including one personal implementation), ECL
already has everything needed for UTF-8 encoding/decoding, so I think
it would be nice if it permitted conversion between bytes and strings
with the wanted external-format out of the box.

If there exists such a facility natively other than reading/writing
files, I'd be happy to know.  What I tried initially was creating a
string stream and trying to change its external-format, but that is not
permitted.

A simple example situation is decoding URL % HEX HEX escape
sequences.  If the string containing such a URL already exists, it
needs to be treated as bytes rather than characters for the unescaping
to happen, and the unescaped bytes might represent UTF-8 characters,
which then need to be "re-read" into a Unicode string.
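
To make the first half of that concrete, here is a rough sketch in
portable Common Lisp.  PERCENT-DECODE-TO-OCTETS is a hypothetical name
of my own, not an existing ECL function, and the code assumes that
every unescaped character in the input is ASCII.  It only produces the
bytes; turning them into a string is the step that still needs an
external-format aware facility.

```lisp
;; Sketch only: PERCENT-DECODE-TO-OCTETS is a hypothetical helper, not
;; an ECL function.  Assumes every unescaped character is ASCII.
(defun percent-decode-to-octets (string)
  "Return a vector of octets with each %XX escape in STRING replaced
by the byte it denotes."
  (let ((octets (make-array (length string)
                            :element-type '(unsigned-byte 8)
                            :fill-pointer 0))
        (end (length string)))
    (do ((i 0 (1+ i)))
        ((>= i end) octets)
      (let ((c (char string i)))
        (if (char= c #\%)
            ;; Both hex digits must be present and valid.
            (let ((hi (and (< (+ i 2) end)
                           (digit-char-p (char string (+ i 1)) 16)))
                  (lo (and (< (+ i 2) end)
                           (digit-char-p (char string (+ i 2)) 16))))
              (unless (and hi lo)
                (error "Malformed escape at offset ~D." i))
              (vector-push (+ (* 16 hi) lo) octets)
              (incf i 2))               ; skip the two hex digits
            (vector-push (char-code c) octets))))))
```

For instance, (percent-decode-to-octets "caf%C3%A9") yields the five
octets 99 97 102 195 169, the last two being the UTF-8 encoding of
U+00E9.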

If there was a facility such as
make-bytes-output-stream/with-output-to-bytes and
make-bytes-input-stream/with-input-from-bytes, for instance, with
customizable external-format, this would be very easy to do.  Of
course, reading from a bytes array would then also be expected to
signal the same errors as when reading from a file.

Or, alternatively, something like what SBCL and Babel (if I remember
correctly) provide: ext:octets-to-string and ext:string-to-octets with
an :encoding key.  I'm not sure, though, what the proper interface
would be to deal with decoding errors here, if we wanted to permit
resuming.  I guess that the error condition could contain the offset
where the error was found and/or have a restart like read...
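
To illustrate what I have in mind for the error interface, here is a
minimal sketch in portable Common Lisp.  All the names
(OCTETS-DECODING-ERROR, DECODE-UTF-8-CHAR, UTF-8-OCTETS-TO-STRING) are
hypothetical and only UTF-8 is handled; it assumes an implementation
with Unicode characters.  The condition carries the offset of the bad
byte, and a USE-VALUE restart lets a handler substitute a character and
resume decoding at the next byte.

```lisp
;; Sketch only; every name defined here is hypothetical, not an ECL API.
(define-condition octets-decoding-error (error)
  ((offset :initarg :offset :reader octets-decoding-error-offset))
  (:report (lambda (condition stream)
             (format stream "Invalid UTF-8 sequence at offset ~D."
                     (octets-decoding-error-offset condition)))))

(defun decode-utf-8-char (octets i)
  "Return the character starting at index I and the index following it,
or NIL if the octets at I are not a well-formed UTF-8 sequence."
  (let* ((b (aref octets i))
         ;; Number of continuation bytes implied by the lead byte.
         (extra (cond ((< b #x80) 0)
                      ((<= #xC2 b #xDF) 1)
                      ((<= #xE0 b #xEF) 2)
                      ((<= #xF0 b #xF4) 3)))
         ;; Payload bits of the lead byte: masks #x1F, #x0F, #x07.
         (code (and extra
                    (if (zerop extra) b (logand b (ash #x3F (- extra)))))))
    (when (and extra (<= (+ i extra 1) (length octets)))
      (loop for j from (1+ i) to (+ i extra)
            for c = (aref octets j)
            do (if (= (logand c #xC0) #x80)
                   (setf code (logior (ash code 6) (logand c #x3F)))
                   (return-from decode-utf-8-char nil)))
      (when (and (>= code (aref #(0 #x80 #x800 #x10000) extra)) ; overlong
                 (not (<= #xD800 code #xDFFF))                  ; surrogate
                 (< code (min #x110000 char-code-limit)))
        (values (code-char code) (+ i extra 1))))))

(defun utf-8-octets-to-string (octets)
  "Decode OCTETS as UTF-8.  On malformed input, signal
OCTETS-DECODING-ERROR with a USE-VALUE restart available."
  (with-output-to-string (out)
    (let ((i 0))
      (loop while (< i (length octets))
            do (multiple-value-bind (char next) (decode-utf-8-char octets i)
                 (if char
                     (setf i next)
                     (restart-case (error 'octets-decoding-error :offset i)
                       (use-value (replacement)
                         :report "Substitute a character and continue."
                         ;; Resume one byte past the offending offset.
                         (setf char replacement
                               i (1+ i)))))
                 (write-char char out))))))
```

A caller wanting lenient decoding could then write

```lisp
(handler-bind ((octets-decoding-error
                 (lambda (c) (declare (ignore c)) (use-value #\?))))
  (utf-8-octets-to-string #(104 105 255 33)))
```

which returns "hi?!", while a caller that binds no handler gets an
error reporting offset 2.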

Suggestions? (other than using third-party or custom facilities, of
course).  Once we have established what interface we want, I could
probably help with the implementation if necessary.

Thanks,
-- 
Matt



