[elephant-devel] working with many millions of objects

Ian Eslick eslick at csail.mit.edu
Thu Oct 12 01:18:16 UTC 2006


I don't think that performance of the base Elephant is good enough to
justify an every-object-is-persistent model.  It's just not seamless
enough or efficient enough yet. 

DCM takes a good step in the performance direction for most applications
(single lisp images) and I'm hoping that some of the changes I'd like to
get into 0.7.0 or 0.8.0 integrate the DCM ideas into the persistent
object protocol.
>> 2.  using a Hash instead of a BTree in the primary database?  I am 
>> unsure what this means for elephant.
>>     
> I don't think that you make progress with that.  Elephant depends very
> deeply on the BTree,
> and, as far as I know, so does Sleepycat.
Actually BDB has four record types of which BTree is one.  However, I
believe it would be a fairly major rewrite as I believe I recall that
the C API for Hash access is quite different and most of elephant would
have to be adjusted.

The tradeoff boils down to this:
- Every BTree access requires log(N) page accesses, if you have high
locality than N or N-1 of these is cached and you average O(1) with a
lower constant than Hash accesses
- Every Hash access requires log(1) page accesses, but each access has
higher overhead and for large hashes there is no way to ensure disk
access locality other than repeated hits.

So if you are using alot of indexing and accessing pages in some sorted
order, you'll win with BTrees - but if you are doing random accesses
you'll trash the cache by an extra log(N) factor.

Ian



More information about the elephant-devel mailing list