[elephant-devel] summer of code: native lisp backend?
Ian Eslick
eslick at media.mit.edu
Thu Mar 27 14:05:27 UTC 2008
Just to capture these references for posterity.
By fractal trees I think Leslie means variations on this work, which
optimizes the internal node structure of B+-trees, i.e.
http://www.pittsburgh.intel-research.net/people/gibbons/papers/fpbptrees.pdf
This approach optimizes the in-memory cache performance of the linear
scan through the keys of a large BTree node. To Robert's point,
having a basic BTree implementation gets us a good chunk of the way to
this, without the extra complexity of a sophisticated node structure.
The performance gains of adding that complexity could be quite
interesting if we traverse these structures in memory often enough,
but I suspect that with any first-pass implementation we'll find that
Lisp raises some different issues that suggest a different
optimization strategy (e.g. typical properties of the data, overhead
of data conversion, etc.). Moreover, if we're disk-dominated, this
performance enhancement ends up on the wrong end of Amdahl's law.
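To make the baseline concrete, here is a minimal sketch in Common Lisp
of a plain B+-tree node and the per-node key search that the
cache-optimized layouts try to speed up. All names here are
hypothetical and not part of Elephant; it's only meant to show where
that optimization would live.

  ;; Hypothetical plain B+-tree node: a sorted key vector plus children.
  (defstruct (bt-node (:constructor make-bt-node (keys children leaf-p)))
    (keys     #() :type simple-vector)  ; sorted keys
    (children #() :type simple-vector)  ; child nodes, or values at a leaf
    (leaf-p   nil :type boolean))

  (defun bt-node-find-child (node key &key (test #'<))
    "Return the index of the child to descend into for KEY: a plain
  linear scan over the sorted key vector, which is exactly the step that
  fractal/cache-optimized node layouts reorganize to touch fewer cache
  lines."
    (let ((keys (bt-node-keys node)))
      (loop for i from 0 below (length keys)
            when (funcall test key (svref keys i))
              return i
            finally (return (length keys)))))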
S(b) trees improve disk read performance at the expense of
complexity. The key idea is to make long-range scans more efficient
by changing the disk page allocation strategy to ensure that most or
all of the children of a B-Tree block are allocated in contiguous
pages on disk, so you can fetch many pages at once, knowing that in
the average case you're going to be reading more pages in that
contiguous region. Update cost is increased, but if updates are far
less frequent than reads, the result is a net win. This also
motivates putting more data inline in the leaf nodes instead of only
storing references or short key/values.
Reference: http://citeseer.ist.psu.edu/cache/papers/cs/1208/http:zSzzSzwww.cs.umb.eduzSz~poneilzSzsb-tree.pdf/the-sb-tree-an.pdf
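Purely as illustration of that allocation idea, and assuming a
hypothetical page allocator with RESERVE-EXTENT and READ-BYTES
operations (neither of which exists in Elephant today), the strategy
looks roughly like this:

  ;; Hedged sketch: keep all children of one B-Tree block in one
  ;; contiguous extent so a range scan can issue a single large read.
  (defun allocate-child-pages (heap n &key (page-size 4096))
    "Reserve N contiguous pages for the children of one block and
  return their addresses."
    (let ((base (reserve-extent heap (* n page-size))))  ; hypothetical
      (loop for i from 0 below n
            collect (+ base (* i page-size)))))

  (defun scan-child-pages (heap pages &key (page-size 4096))
    "Read sibling pages with one large request when they are
  contiguous, falling back to one read per page otherwise."
    (if (every (lambda (a b) (= (+ a page-size) b))
               pages (rest pages))
        (read-bytes heap (first pages) (* (length pages) page-size))
        (mapcar (lambda (p) (read-bytes heap p page-size)) pages)))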
My observation in my own applications is that I spend a great deal of
time waiting on disk access; anything we can do to trade in-memory
operations or computation for less average disk access time is likely
to be a bigger win than reducing in-memory costs.
The nice thing about both of these hacks is that they happen under the
B-Tree abstraction so there is no need to rush to implement them.
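One way to keep that freedom (and the "strategy" pattern Robert
mentions below) is to have the backend talk to its trees only through
a small generic-function protocol. The names here are hypothetical,
but any of the variants above could later be dropped in behind it
without callers changing:

  ;; Hypothetical protocol; a plain B+-tree, a fractal-node tree, or an
  ;; S(b)-style tree could each implement it.
  (defgeneric tree-get (tree key)
    (:documentation "Look up KEY, returning (values value found-p)."))
  (defgeneric tree-put (tree key value)
    (:documentation "Insert or update KEY with VALUE."))
  (defgeneric tree-map-range (tree start end function)
    (:documentation "Call FUNCTION on each key/value pair with
  START <= key < END, in key order."))

  (defclass plain-b+tree () ())     ; first-pass implementation
  (defclass sb-style-b+tree () ())  ; same protocol, different page layout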
Ian
On Mar 26, 2008, at 11:17 PM, Robert L. Read wrote:
> On Wed, 2008-03-26 at 20:33 +0100, Leslie P. Polzer wrote:
>> I suppose this is a good opportunity for me to chime in with
>> a few thoughts. Aren't B+trees a choice that is too conservative
>> for a modern storage backend?
>> There seem to be more modern data structures (S(b) trees or
>> fractal trees) that are especially well suited for storing
>> variable-length keys.
>
> Perhaps. I would certainly hope that our design would allow such
> structures to be swapped out as a "strategy" pattern.
>
> Personally, I had never heard of those structures until you mentioned
> them, and a quick search does not yield a concise description of their
> advantages---or of how to implement them.
>
> If you can briefly describe the differences, that would be great. On
> the other hand, a LISP-native back end using B+-trees would be a nice
> leap forward for us; using something better might be even better but
> would not lift us into a new valence band of quality.
>
> The fact that one can find example implementations of the B+-tree,
> possibly even in LISP, is an advantage.
>
> My personal philosophy is gradualism --- crawling is the best way to
> learn to walk, and having a B+-tree implementation gets us half the
> way to the latest data structure.
>
>>
>> Leslie
>>