[elephant-devel] Development tools
Ian Eslick
eslick at csail.mit.edu
Mon Nov 20 14:53:38 UTC 2006
George,
On writes it does the conversion to little-endian manually, using
char-code and ldb to extract bytes. Similarly, the default read is
implemented with the lisp routines code-char/dpb, but it can exploit
native byte-copy routines in some cases. I'll implement that later as a
performance (re)-enhancement.
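
To make the scheme concrete, here is a minimal sketch in Python rather than Lisp. It is an illustration under assumptions, not the Elephant serializer itself: the function names (smallest_width, encode_le, decode_le) are hypothetical, and the shifts and masks merely stand in for what ldb and dpb do on the Lisp side. The width selection follows the "smallest coding size" rule described in the earlier message quoted below.

```python
# Illustrative sketch only: the real serializer is written in Lisp, and
# every name in this block is hypothetical.

def smallest_width(s: str) -> int:
    """Return 1, 2 or 4: the fewest bytes per character that fit every code point."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


def encode_le(s: str, width: int) -> bytes:
    """Write each character as `width` bytes, least significant byte first."""
    out = bytearray()
    for ch in s:
        code = ord(ch)  # analogue of char-code
        for i in range(width):
            # analogue of (ldb (byte 8 (* 8 i)) code): extract the i-th byte
            out.append((code >> (8 * i)) & 0xFF)
    return bytes(out)


def decode_le(data: bytes, width: int) -> str:
    """Rebuild characters from fixed-width little-endian code points."""
    chars = []
    for start in range(0, len(data), width):
        code = 0
        for i in range(width):
            # analogue of dpb: deposit the i-th byte back into the integer
            code |= data[start + i] << (8 * i)
        chars.append(chr(code))  # analogue of code-char
    return "".join(chars)
```

Because the byte order is spelled out by the shifts, the stored bytes come out the same on a big-endian host such as the PPC as on x86; only the speed of the copy differs.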
The rest of the serializer still uses the underlying C representation of
byte order so integers will reflect the endianness of the host machine
and the width of the host's native integer (i.e., 8 bytes on 64-bit machines).
I decided for this version that lisp-independence and bug-free string
decoding were more useful than machine-type independence.
With some more work (and performance hits) the whole thing could become
standardized on a particular byte order, but I won't do that unless asked.
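
As a rough illustration of that trade-off, the Python sketch below contrasts a host-dependent integer layout with a standardized one. It is not Elephant code: pack_native and pack_fixed are hypothetical names, and the struct format "@n" (the platform's ssize_t) is only an assumed stand-in for "the host's native integer".

```python
import struct
import sys


def pack_native(value: int) -> bytes:
    """Host byte order and host integer width; the output differs between machines."""
    return struct.pack("@n", value)


def pack_fixed(value: int) -> bytes:
    """Always 8 bytes, little-endian, whatever machine runs this."""
    return struct.pack("<q", value)


if __name__ == "__main__":
    print(sys.byteorder, struct.calcsize("@n"), pack_native(1).hex())
    print("fixed:", pack_fixed(1).hex())
```

A database written with the native layout can only be read back reliably on a machine with the same byte order and word size, which is the limitation described above; the fixed layout removes it at the cost of a conversion on hosts whose native format differs.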
Ian
George Khouri wrote:
> Ian,
> Does storing unicode strings little-endian require little-endian input strings, or do you convert big-endian unicode strings, as (I believe) they are represented on the PPC (OpenMCL)?
> Thanks,
> George
>
>
>> Unicode:
>>
>> By the way, I'm just cleaning up the last of my unicode updates. I kept
>> having problems with the efficiency hacks in the current support for
>> Unicode -- there was no canonical representation of strings in the
>> database; each lisp+machine coded it differently. Also, even though
>> most strings have codes in the ASCII or Latin-1 character set, SBCL was
>> still storing 32-bit characters. It now uses the smallest coding size
>> (8,16 or 32) necessary to represent the string. Support for 8 or 16 is
>> fairly efficient but if you use unicode code pages > 0 there will be a
>> performance and storage hit. I put in a convention where all 16/32
>> bit unicode strings are stored little-endian (x86 is a little-endian
>> machine) so I can use native string reader functions to pull shorts and
>> ints out of the byte vectors when possible. This should greatly compact
>> string storage on most unicode supporting systems (2x on allegro, 4x on
>> SBCL).
>>
>>
> --------
> George Khouri
> gk1 at four-four.com
>