[Ecls-list] Two little unicode fixes.

Andy Hefner ahefner at gmail.com
Sun Feb 19 23:52:03 UTC 2012

I've tracked down the cause of some encoding-related problems I ran
into when using string literals containing special characters
(squared, cubed, and degree symbols) in source files encoded as
latin-1. I pass :external-format :latin-1 to compile-file, but the
resulting objects/fasls signal an error during their initialization
at load time; the error indicates an attempt to decode latin-1
characters as UTF-8. A peek at the .data file confirmed that the
characters were not UTF-8 encoded in the compiler output.
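
For anyone who wants to try this, here is a minimal reproduction sketch.
It is not taken from the patch. The file name and the variable *unit*
are made up for illustration, and it assumes the implementation accepts
:latin-1 as an external-format name. On an affected build I would expect
the LOAD to signal the decoding error described above; on a fixed build
it should just load.

```lisp
;; Hypothetical file name; any writable path would do.
(defparameter *source* "latin1-literal.lisp")

;; Write a one-form source file, encoded as latin-1, whose string
;; literal contains the cubed symbol (code point 179).
(with-open-file (out *source* :direction :output
                              :if-exists :supersede
                              :external-format :latin-1)
  (format out "(defparameter *unit* \"m~C\")~%" (code-char 179)))

;; Compile with the matching external format, then load the result.
;; COMPILE-FILE returns the truename of the output file.
(load (compile-file *source* :external-format :latin-1))
```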

I've attached a patch which fixes this by changing DATA-C-DUMP,
supplying :external-format :utf8 to WT-FILTERED-DATA when writing the
data string if unicode is enabled. I also changed the surrounding
WITH-OPEN-FILE to supply :external-format :latin-1, as a pun for
passthrough encoding. I'm sure this isn't necessary, but without the
assurances of using a binary stream (or knowledge of how
:external-format :default is interpreted throughout ECL), I wanted to
pin it down to a specific behavior.

I found another bug along the way, in the use of
sequence-output-streams in C::UTF8-ENCODED-STRING. If the output
vector has to be resized, the original vector is returned instead of
the newer, larger vector. I fixed it by making the vector adjustable,
so adjust-array calls replace-array, the original vector object is
updated in place, and we get what we want. An alternative would be an
interface like that of string output streams, with a function to
retrieve the accumulated result, but this way was a one-line change.
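
The reason the adjustable vector helps can be shown with plain
ADJUST-ARRAY, independent of ECL's internals. The sketch below is only
an illustration (GROW-AND-CHECK is a made-up helper, not code from the
compiler): for an actually adjustable array the standard guarantees
that the returned array is identical to the argument, while for a
non-adjustable one the result may be a fresh array, leaving the caller
holding the stale original.

```lisp
(defun grow-and-check (adjustable-p)
  "Return true if ADJUST-ARRAY handed back the very same vector object."
  (let* ((vec (make-array 2 :element-type '(unsigned-byte 8)
                            :adjustable adjustable-p
                            :initial-element 0))
         (grown (adjust-array vec 4 :initial-element 0)))
    (eq vec grown)))

;; Requested as adjustable: the same object is returned, so anything
;; still holding VEC sees the larger vector.
(grow-and-check t)

;; Not requested as adjustable: the result is implementation-dependent,
;; and may be a new vector distinct from the original.
(grow-and-check nil)
```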

Here's an example of how it returns the wrong result when the initial
size estimate (* 1.2 length) is too small:

> (length (c::utf8-encoded-string (string (code-char 179))))
0    ; should be 2.
> (length (c::utf8-encoded-string (concatenate 'string (string (code-char 179)) ".")))
2    ; should be 3.
> (length (c::utf8-encoded-string (concatenate 'string (string (code-char 179)) "..")))
4    ; correct.
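
The expected lengths can be checked without touching the compiler at
all. This is a small portable sketch (UTF8-OCTET-COUNT is a made-up
name, not an ECL function) that counts how many octets the UTF-8
encoding of a string needs, assuming CHAR-CODE returns the Unicode
code point, or the latin-1 code in a build without unicode:

```lisp
(defun utf8-octet-count (string)
  "Number of octets needed to encode STRING as UTF-8."
  (loop for ch across string
        for code = (char-code ch)
        sum (cond ((< code #x80) 1)       ; ASCII
                  ((< code #x800) 2)      ; includes code point 179
                  ((< code #x10000) 3)
                  (t 4))))

(utf8-octet-count (string (code-char 179)))                            ; => 2
(utf8-octet-count (concatenate 'string (string (code-char 179)) "."))  ; => 3
(utf8-octet-count (concatenate 'string (string (code-char 179)) "..")) ; => 4
```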
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ecl-unicode-fixes.diff
Type: text/x-patch
Size: 1474 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/ecl-devel/attachments/20120219/fd5e7808/attachment.bin>
