[Ecls-list] Character encoding/decoding utilities
Matthew Mondor
mm_lists at pulsar-zone.net
Fri Aug 26 08:24:44 UTC 2011
Hello,
Although third-party libraries exist for character encoding conversion
(including one implementation of my own), ECL already has everything
needed for UTF-8 encoding/decoding, so I think it would be nice if it
could convert between bytes and strings with a chosen external-format
out of the box.
If such a native facility already exists, other than reading and
writing files, I'd be happy to hear about it. My first attempt was to
create a string stream and change its external-format, but that is not
permitted.
A simple example is decoding URL escape sequences of the form
% HEX HEX. If the URL to be decoded already exists as a string, it must
be treated as bytes rather than characters for the unescaping to work,
and the unescaped bytes may encode UTF-8 characters, which then need to
be "re-read" into a Unicode string.
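To make the example concrete, here is a rough sketch of that two-step
process in Python (used only for illustration; this is not ECL code).
The %HH escapes are first turned into raw bytes, and only then are those
bytes decoded as text. The name percent_decode is hypothetical, and
storing literal non-ASCII characters as their UTF-8 bytes is an
assumption.

```python
import string


def percent_decode(url: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: undo %HH escapes, then decode the bytes as text."""
    raw = bytearray()
    i = 0
    while i < len(url):
        escape = url[i + 1:i + 3]
        if url[i] == "%" and len(escape) == 2 and all(c in string.hexdigits for c in escape):
            # A valid escape stands for one raw byte, not one character.
            raw.append(int(escape, 16))
            i += 3
        else:
            # Assumption: literal characters are stored as their UTF-8 bytes.
            raw += url[i].encode("utf-8")
            i += 1
    # Only now are the collected bytes "re-read" as characters; invalid
    # sequences raise UnicodeDecodeError, much like reading a file would.
    return raw.decode(encoding)
```

For example, percent_decode("caf%C3%A9") yields "café". Python's
standard library offers the same split as urllib.parse.unquote_to_bytes
followed by bytes.decode.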
If there were facilities such as
make-bytes-output-stream/with-output-to-bytes and
make-bytes-input-stream/with-input-from-bytes, with a customizable
external-format, this would be very easy to do. Of course, reading from
a byte array would then also be expected to signal the same errors as
reading from a file.
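Python's io module happens to have exactly this layering, so a rough
sketch of the proposed interface can wrap an in-memory byte buffer
(io.BytesIO) in a text stream with a chosen encoding (io.TextIOWrapper).
The names with_output_to_bytes and with_input_from_bytes only mirror the
macros suggested above; taking a callback instead of a body is an
assumption, since Python has no macros.

```python
import io


def with_output_to_bytes(writer, encoding="utf-8", errors="strict") -> bytes:
    """Hypothetical analog of the proposed WITH-OUTPUT-TO-BYTES."""
    buf = io.BytesIO()
    # newline="" disables newline translation so the output bytes are exact.
    stream = io.TextIOWrapper(buf, encoding=encoding, errors=errors, newline="")
    writer(stream)
    stream.flush()  # push buffered characters through the encoder into buf
    return buf.getvalue()


def with_input_from_bytes(data: bytes, reader, encoding="utf-8", errors="strict"):
    """Hypothetical analog of the proposed WITH-INPUT-FROM-BYTES."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding,
                              errors=errors, newline="")
    # With errors="strict", bad input raises UnicodeDecodeError, just as
    # reading a text file opened with the same encoding would.
    return reader(stream)
```

The external-format is simply the encoding argument, e.g.
with_output_to_bytes(lambda s: s.write("café"), encoding="latin-1")
produces b"caf\xe9".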
Alternatively, there could be something like what SBCL and Babel
provide (if I remember correctly): ext:octets-to-string and
ext:string-to-octets with an :encoding key. However, I'm not sure what
the proper interface would be for dealing with decoding errors there if
we wanted to allow resuming. I guess the error condition could contain
the offset where the error was found and/or provide a restart, as
reading does...
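One possible shape for the error handling, sketched in Python for
illustration: the condition carries the byte offsets of the offending
sequence, and an optional on_error callback stands in for a restart by
supplying replacement text and letting decoding resume after the bad
bytes. OctetDecodingError, octets_to_string and on_error are
hypothetical names; Python has no restarts, so the callback is only an
approximation, and decoding the valid prefix separately assumes a
stateless encoding such as UTF-8.

```python
class OctetDecodingError(ValueError):
    """Hypothetical condition carrying the byte offsets of the bad sequence."""

    def __init__(self, start: int, end: int, encoding: str):
        super().__init__(f"cannot decode bytes {start}..{end} as {encoding}")
        self.start = start
        self.end = end
        self.encoding = encoding


def octets_to_string(data: bytes, encoding="utf-8", on_error=None) -> str:
    """Decode data; on_error(data, start, end) may return replacement text."""
    parts = []
    pos = 0
    while True:
        try:
            parts.append(data[pos:].decode(encoding))
            return "".join(parts)
        except UnicodeDecodeError as exc:
            # exc offsets are relative to data[pos:]; make them absolute.
            start, end = pos + exc.start, pos + exc.end
            if on_error is None:
                raise OctetDecodingError(start, end, encoding) from exc
            # Keep the valid prefix (assumes a stateless encoding), then
            # "resume" with the handler's replacement after the bad bytes.
            parts.append(data[pos:start].decode(encoding))
            parts.append(on_error(data, start, end))
            pos = end
```

Without a handler the caller learns where decoding failed; with one,
e.g. on_error=lambda d, s, e: "?", b"ab\xffcd" decodes to "ab?cd".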
Suggestions (other than using third-party or custom facilities, of
course)? Once we have settled on the interface we want, I could probably
help with the implementation if necessary.
Thanks,
--
Matt