[Ecls-list] Unicode 16-bits

Daniel Herring dherring at tentpost.com
Tue Feb 22 03:33:16 UTC 2011


On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:

> Would you find it useful to have an ECL that only supports character codes 0 - 65535? That would make it probably easier to embed the part of the Unicode database associated to it (< 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16-bits vs 32-bits per character)

...

Not sure I follow.  For many people, that would be fine; but it's a subset 
of Unicode and could cause confusion when a character outside that range 
shows up.

Lately I've heard several fairly knowledgeable people say UTF-8 really is 
ideal.  While UTF-32 allows immediate indexing to a given code point, that 
doesn't help with common tasks, because one user-visible character can 
span several code points (combining marks and such).
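
To make the combining-mark point concrete, here is a small sketch in Python 
(my choice for illustration, not anything ECL does). It only uses the standard 
unicodedata module; the variable names are made up for the example.

```python
import unicodedata

# "é" written as a base letter plus a combining mark: two code points,
# but one character as far as a reader is concerned.
decomposed = "e\u0301"          # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

# Even with a fixed-width encoding, the decomposed form takes two units.
n_utf32_units = len(decomposed.encode("utf-32-le")) // 4

# Constant-time indexing hands back a code point, not a whole character:
first = decomposed[0]           # just the bare 'e', accent lost

# The second code point is a combining mark (nonzero combining class).
is_combining = unicodedata.combining(decomposed[1]) != 0
```

So indexing by code point is cheap in UTF-32, but the index you want for 
editing or display is usually not a code point index anyway.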

They appear to be supported by (or have subverted) Wikipedia.
http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues

As for the database, you can always split it into separately loadable 
chunks and throw an error if a chunk is not available when needed.
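
A rough sketch of what I mean, again in Python and purely hypothetical: the 
chunk size, the class and function names, and the stand-in loader are all 
assumptions for illustration, not how ECL stores its database.

```python
import unicodedata

CHUNK_BITS = 12  # assumption: 4096 code points per chunk


class MissingChunkError(LookupError):
    """Raised when the chunk covering a code point is not available."""


class ChunkedNameTable:
    """Looks up character names, loading database chunks on first use."""

    def __init__(self, loader):
        self._loader = loader  # callable: chunk index -> dict, or None
        self._chunks = {}      # cache of chunks loaded so far

    def name(self, ch):
        cp = ord(ch)
        index = cp >> CHUNK_BITS
        if index not in self._chunks:
            chunk = self._loader(index)
            if chunk is None:
                start = index << CHUNK_BITS
                raise MissingChunkError(
                    f"chunk {index} (starting at U+{start:04X}) is not available"
                )
            self._chunks[index] = chunk
        return self._chunks[index].get(cp)


def bmp_only_loader(index):
    """Stand-in loader that only ships chunks for U+0000..U+FFFF."""
    start = index << CHUNK_BITS
    if start > 0xFFFF:
        return None
    end = start + (1 << CHUNK_BITS)
    return {cp: unicodedata.name(chr(cp), None) for cp in range(start, end)}


table = ChunkedNameTable(bmp_only_loader)
latin_a = table.name("A")  # served from the first chunk
```

A lookup for something like U+1D11E would raise MissingChunkError here, 
which is at least a clear failure instead of a silent limitation.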

- Daniel



