[elephant-devel] BTREE Sorting on Symbol Strings?
Ian Eslick
eslick at csail.mit.edu
Sun Jan 21 22:21:02 UTC 2007
How often do users of Elephant rely on BTrees where the symbols are
ordered according to the symbol's name?
Would it be terribly inconvenient to you if you had to convert symbol
keys to strings to get an alphabetical ordering, but were still
assured of contiguity of identical symbol keys in secondary btrees?
The argument is that we serialize symbols to strings all the time
(slot access, etc) and this engenders a great deal of overhead. Most
symbol serialization is highly redundant and can be factored out by
assigning a persistent ID to each symbol as we do with persistent
objects. This results in significantly less disk space (a constant
vs. N*char_width bits for every slot value), reduced IO bandwidth
(and increased locality), and less serialization/deserialization time.
The downside is that the C function passed to BDB which is used to
compare two strings so that the BTree is ordered does not have access
to the persistent table. Thus we can't order strings according to
their characters, but only according to their ID which means a random
order. Of course symbols will be identical to themselves and so will
be grouped together in duplicate indices.
Sorting according to the characters of the symbol may be possible,
but there are a number of implications that require some thinking
about and I want to put this off for now.
As this is a user configurable option (my-config.sexp) and will be
enabled by default I don't think there is any harm in promoting this.
Comments?
Thank you,
Ian
More information about the elephant-devel
mailing list