[Ecls-list] Unicode 16-bits
Daniel Herring
dherring at tentpost.com
Tue Feb 22 03:33:16 UTC 2011
On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:
> Would you find it useful to have an ECL that only supports character codes 0 - 65535? That would probably make it easier to embed the part of the Unicode database associated with it (< 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16 bits vs 32 bits per character).
...
Not sure I follow. For many people, that would be fine; but it's a subset
of Unicode and could cause confusion when it breaks.
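
To make the "when it breaks" case concrete, here is a minimal sketch in
Python (not ECL code). The helper name fits_in_16_bits is made up for
illustration; it only checks whether a character's code point falls in
the 0 - 65535 range that a 16-bit build could represent.

```python
def fits_in_16_bits(ch: str) -> bool:
    """Return True if the single character ch has a code point in 0..65535."""
    # Hypothetical helper for illustration; not part of ECL.
    return ord(ch) <= 0xFFFF


# A few sample characters: ASCII, Latin-1, CJK, and two beyond 65535.
samples = ["A", "\u00e9", "\u4e2d", "\U0001D11E", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} fits in 16 bits: {fits_in_16_bits(ch)}")
```

The last two samples (a musical symbol and an emoji) are the kind of
input a 16-bit-only build would have to reject or mangle.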
Lately I've heard several fairly knowledgeable people say UTF-8 really is
ideal. While UTF-32 allows constant-time indexing to a given code point,
that doesn't help with common tasks, because one user-visible character
can span several code points (combining marks and such).
They appear to be supported by (or have subverted) Wikipedia.
http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues
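
A small Python sketch of the combining-mark point, using only the
standard unicodedata module. The function base_character_count is a
made-up name, and counting non-combining code points is only a rough
stand-in for proper grapheme segmentation, so treat it as an
illustration rather than a correct text-boundary algorithm.

```python
import unicodedata


def base_character_count(text: str) -> int:
    """Count code points that are not combining marks.

    Rough approximation of user-visible characters; real segmentation
    has more rules than this.
    """
    return sum(1 for c in text if unicodedata.combining(c) == 0)


# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
decomposed = "e\u0301"

# Python strings index by code point, much like a UTF-32 array would.
print(len(decomposed))                       # code points
print(base_character_count(decomposed))      # approximate visible characters
print(decomposed[0])                         # indexing yields a bare "e"

# Storage cost of the same text in two encodings.
print(len(decomposed.encode("utf-8")))       # bytes in UTF-8
print(len(decomposed.encode("utf-32-le")))   # bytes in UTF-32
```

Indexing position 0 returns the letter without its accent, so even a
fixed-width encoding needs a scan to find character boundaries.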
As for the database, you can always split it into separately loadable
chunks and throw an error if a chunk is not available when needed.
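
One way the chunking idea could look, sketched in Python rather than in
ECL's own code. Everything here is an assumption made for illustration:
the class names, the choice of one chunk per 65536 code points, and the
loader callables standing in for whatever would actually read a chunk
from disk.

```python
class ChunkUnavailableError(LookupError):
    """Raised when a lookup needs a database chunk that cannot be loaded."""


class ChunkedDatabase:
    # Hypothetical layout: one chunk per block of 65536 code points.
    CHUNK_SIZE = 0x10000

    def __init__(self, loaders):
        # loaders maps a chunk index to a zero-argument callable that
        # returns a dict of {code_point: character_name} for that chunk.
        self._loaders = loaders
        self._chunks = {}

    def lookup(self, code_point):
        index = code_point // self.CHUNK_SIZE
        if index not in self._chunks:
            loader = self._loaders.get(index)
            if loader is None:
                raise ChunkUnavailableError(
                    f"no chunk {index} available for U+{code_point:04X}"
                )
            # Load on first use and cache for later lookups.
            self._chunks[index] = loader()
        return self._chunks[index].get(code_point)


# Only chunk 0 is shipped in this example.
db = ChunkedDatabase({0: lambda: {0x41: "LATIN CAPITAL LETTER A"}})
print(db.lookup(0x41))

try:
    db.lookup(0x1D11E)
except ChunkUnavailableError as err:
    print("error:", err)
```

A standalone executable could then embed only chunk 0 and still fail
loudly, rather than silently, on characters outside it.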
- Daniel