[elephant-devel] Development tools

George Khouri gk1 at four-four.com
Mon Nov 20 06:53:23 UTC 2006


Ian,
Does storing Unicode strings little-endian require little-endian input strings, or do you (or will you) convert big-endian Unicode strings, which is how (I believe) they are represented on the PPC (OpenMCL)?
Thanks,
George

>Unicode:
>
>By the way, I'm just cleaning up the last of my unicode updates.  I kept
>having problems with the efficiency hacks in the current support for
>Unicode -- there was no canonical representation of strings in the
>database; each lisp+machine coded it differently.  Also, even though
>most strings have codes in the ASCII or Latin-1 character set, SBCL was
>still storing 32-bit characters.  It now uses the smallest coding size
>(8, 16 or 32) necessary to represent the string.  Support for 8 or 16 is
>fairly efficient but if you use unicode code pages > 0 there will be a
>performance and storage hit.  I put in a convention where all 16/32
>bit unicode strings are stored little-endian (x86 is a little-endian
>machine) so I can use native string reader functions to pull shorts and
>ints out of the byte vectors when possible.  This should greatly compact
>string storage on most unicode supporting systems (2x on allegro, 4x on
>SBCL).
>
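
To make the scheme in the quoted text concrete, here is a minimal sketch of "smallest coding size, stored little-endian". It is written in Python purely for illustration and is not Elephant's code; the names smallest_width, encode_le and decode_le are hypothetical. The "<" prefix in the struct format forces little-endian byte order whatever the host is, and the packing works from character codes rather than from the host's in-memory string bytes.

```python
import struct

# Hypothetical helpers, not Elephant's API: map a width in bytes to the
# matching unsigned struct code (8-, 16- and 32-bit).
_FORMATS = {1: "B", 2: "H", 4: "I"}

def smallest_width(s):
    """Return 1, 2 or 4: bytes per character needed for the largest code."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4

def encode_le(s):
    """Pack s at the smallest fixed width, always little-endian ('<')."""
    width = smallest_width(s)
    fmt = "<%d%s" % (len(s), _FORMATS[width])
    return width, struct.pack(fmt, *map(ord, s))

def decode_le(width, data):
    """Inverse of encode_le: unpack fixed-width little-endian codes."""
    count = len(data) // width
    fmt = "<%d%s" % (count, _FORMATS[width])
    return "".join(map(chr, struct.unpack(fmt, data)))
```

In this sketch an ASCII or Latin-1 string costs one byte per character, which is where a 4x saving over always storing 32-bit characters would come from.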
--------
George Khouri
gk1 at four-four.com


