[elephant-devel] Development tools

Ian Eslick eslick at csail.mit.edu
Mon Nov 20 14:53:38 UTC 2006


George,

On writes, it does the conversion to little-endian manually, using
char-code and ldb to extract the bytes.  Similarly, the default read path is
implemented with the Lisp functions code-char and dpb, although in some cases
it could exploit native byte-copy routines instead.  I'll implement that later
as a performance (re-)enhancement.
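
A minimal Python sketch of that manual approach follows. It assumes each character is stored as a fixed-width little-endian code unit of 1, 2 or 4 bytes. Shifting and masking stand in for Lisp's `ldb` on the write side and `dpb` on the read side. The function names are hypothetical and are not Elephant's API.

```python
def encode_le(s: str, width: int) -> bytes:
    """Write each character's code point as `width` little-endian bytes."""
    out = bytearray()
    for ch in s:
        code = ord(ch)                                  # like char-code
        if code >= 1 << (8 * width):
            raise ValueError(f"{ch!r} does not fit in {width} byte(s)")
        for i in range(width):
            out.append((code >> (8 * i)) & 0xFF)        # like (ldb (byte 8 (* 8 i)) code)
    return bytes(out)


def decode_le(data: bytes, width: int) -> str:
    """Rebuild characters by depositing each byte back into its position."""
    if len(data) % width:
        raise ValueError("truncated character data")
    chars = []
    for start in range(0, len(data), width):
        code = 0
        for i in range(width):
            code |= data[start + i] << (8 * i)          # like (dpb byte (byte 8 (* 8 i)) code)
        chars.append(chr(code))                         # like code-char
    return "".join(chars)
```

For example, `encode_le("Aé", 2)` yields `b"A\x00\xe9\x00"`. At width 2, for text with no characters outside the Basic Multilingual Plane, the output matches Python's own `utf-16-le` codec.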

The rest of the serializer still uses the underlying C representation of
byte order, so integers will reflect the endianness of the host machine
and the width of the host's native integer (i.e., 8 bytes on 64-bit machines).
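
The following Python sketch shows why host-order integers are machine-dependent. It uses the standard `struct` module, where the `@l` format packs a C `long` in the host's native byte order and size. That size is platform-dependent, so this only approximates a raw C copy. The helper names are hypothetical.

```python
import struct
import sys


def pack_host_long(n: int) -> bytes:
    """Pack n as a C long in native byte order and size, much like copying
    the integer's raw memory."""
    return struct.pack("@l", n)


def misread_on_other_endian_host(data: bytes) -> int:
    """Decode the bytes as a host with the opposite byte order would."""
    other = "big" if sys.byteorder == "little" else "little"
    return int.from_bytes(data, other, signed=True)


raw = pack_host_long(1)
print(sys.byteorder, len(raw), raw.hex())
print(misread_on_other_endian_host(raw))  # a large number, not 1
```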

I decided for this version that Lisp independence and bug-free string
decoding were more useful than machine-type independence.
With some more work (and some performance cost) the whole thing could be
standardized on a single byte order, but I won't do that unless asked.
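
If integers were standardized on one byte order, the write and read path might look roughly like the hypothetical Python sketch below. It uses a fixed 8-byte, little-endian, two's-complement encoding that decodes identically on any host. The cost is an explicit conversion on big-endian machines instead of a raw copy.

```python
def write_int64_le(n: int) -> bytes:
    """Portable encoding: always 8 bytes, little-endian, two's complement.
    Raises OverflowError if n does not fit in 64 bits."""
    return n.to_bytes(8, "little", signed=True)


def read_int64_le(data: bytes) -> int:
    """Inverse of write_int64_le; independent of the host's byte order."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(data, "little", signed=True)
```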

Ian



George Khouri wrote:
> Ian,
> Does the storing of unicode strings little-endian require little-endian input strings, or will/do you convert big-endian unicode strings as (I believe) are represented on the PPC (OpenMCL)?
> Thanks,
> George
>
>   
>> Unicode:
>>
>> By the way, I'm just cleaning up the last of my unicode updates.  I kept
>> having problems with the efficiency hacks in the current support for
>> Unicode -- there was no canonical representation of strings in the
>> database; each lisp+machine coded it differently.  Also, even though
>> most strings have codes in the ASCII or Latin-1 character set, SBCL was
>> still storing 32-bit characters.  It now uses the smallest coding size
>> (8, 16 or 32) necessary to represent the string.  Support for 8 or 16 is
>> fairly efficient but if you use unicode code pages > 0 there will be a
>> performance and storage hit.  I put in a convention where all 16/32
>> bit unicode strings are stored little-endian (x86 is a little-endian
>> machine) so I can use native string reader functions to pull shorts and
>> ints out of the byte vectors when possible.  This should greatly compact
>> string storage on most unicode supporting systems (2x on allegro, 4x on
>> SBCL).
>>
>>     
> --------
> George Khouri
> gk1 at four-four.com
>   
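
The width-selection rule and little-endian convention from the quoted note could be sketched in Python as follows. The function names are hypothetical, and so is returning the width alongside the bytes. Elephant's actual on-disk tag layout is not described in this thread. The `<` prefix in `struct` fixes the byte order, so the bulk read below gives the same result on big-endian hosts such as the PPC.

```python
import struct

_FORMATS = {1: "B", 2: "H", 4: "I"}   # unsigned 8-, 16- and 32-bit code units


def pick_width(s: str) -> int:
    """Smallest code-unit size (1, 2 or 4 bytes) that holds every character."""
    top = max(map(ord, s), default=0)
    if top < 0x100:        # ASCII / Latin-1
        return 1
    if top < 0x10000:      # Basic Multilingual Plane (plane 0)
        return 2
    return 4               # characters beyond plane 0


def pack_string(s: str) -> tuple[int, bytes]:
    """Return (width, little-endian bytes) using the narrowest width."""
    width = pick_width(s)
    data = struct.pack(f"<{len(s)}{_FORMATS[width]}", *map(ord, s))
    return width, data


def unpack_string(width: int, data: bytes) -> str:
    """Bulk-read all code units at once in explicit little-endian order."""
    count = len(data) // width
    codes = struct.unpack(f"<{count}{_FORMATS[width]}", data)
    return "".join(map(chr, codes))
```

With this scheme, `"hello"` packs into 5 bytes rather than the 20 a fixed 32-bit coding would use. That is the kind of 4x saving the note mentions for SBCL.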


