[elephant-devel] summer of code: native lisp backend?
Ian Eslick
eslick at media.mit.edu
Thu Mar 27 14:05:27 UTC 2008
Just to capture these references for posterity.
By fractal trees I think Leslie means variations on this work, which
optimizes the internal node structure of B+-trees, i.e.
http://www.pittsburgh.intel-research.net/people/gibbons/papers/fpbptrees.pdf
This approach optimizes the in-memory cache performance of the linear
scan through the keys of a large BTree node. To Robert's point,
having a basic BTree implementation gets us a good chunk of the way to
this, without the extra complexity of a sophisticated node structure.
The performance gains of adding that complexity could be quite
interesting if we traverse these structures in memory often enough,
but I suspect that with any first-pass implementation we'll find that
Lisp raises some different issues that suggest a different
optimization strategy (e.g. typical properties of the data, overhead
of data conversion, etc.). Moreover, if we're disk-dominated, this
performance enhancement ends up on the wrong end of Amdahl's law.
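To make the baseline concrete, here is a minimal sketch in Common Lisp
of a plain B+-tree node and the per-node key search that the
cache-optimized layouts try to speed up. All names here are
hypothetical and not part of Elephant; it's only meant to show where
that optimization would live.

  ;; Hypothetical plain B+-tree node: a sorted key vector plus children.
  (defstruct (bt-node (:constructor make-bt-node (keys children leaf-p)))
    (keys     #() :type simple-vector)  ; sorted keys
    (children #() :type simple-vector)  ; child nodes, or values at a leaf
    (leaf-p   nil :type boolean))

  (defun bt-node-find-child (node key &key (test #'<))
    "Return the index of the child to descend into for KEY: a plain
  linear scan over the sorted key vector, which is exactly the step that
  fractal/cache-optimized node layouts reorganize to touch fewer cache
  lines."
    (let ((keys (bt-node-keys node)))
      (loop for i from 0 below (length keys)
            when (funcall test key (svref keys i))
              return i
            finally (return (length keys)))))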
S(b) trees improve disk read performance at the expense of
complexity. The key idea is to make long-range scans more efficient
by changing the disk page allocation strategy to ensure that most or
all of the children of a B-Tree block are allocated in contiguous
pages on disk, so you can fetch many pages at once, knowing that in
the average case you're going to be reading more pages in that
contiguous region. Update cost is increased, but if updates are far
less frequent than reads, the result is a net win. This also
motivates putting more data inline in the leaf nodes instead of only
storing references or short key/values.
Reference: http://citeseer.ist.psu.edu/cache/papers/cs/1208/http:zSzzSzwww.cs.umb.eduzSz~poneilzSzsb-tree.pdf/the-sb-tree-an.pdf
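Purely as illustration of that allocation idea, and assuming a
hypothetical page allocator with RESERVE-EXTENT and READ-BYTES
operations (neither of which exists in Elephant today), the strategy
looks roughly like this:

  ;; Hedged sketch: keep all children of one B-Tree block in one
  ;; contiguous extent so a range scan can issue a single large read.
  (defun allocate-child-pages (heap n &key (page-size 4096))
    "Reserve N contiguous pages for the children of one block and
  return their addresses."
    (let ((base (reserve-extent heap (* n page-size))))  ; hypothetical
      (loop for i from 0 below n
            collect (+ base (* i page-size)))))

  (defun scan-child-pages (heap pages &key (page-size 4096))
    "Read sibling pages with one large request when they are
  contiguous, falling back to one read per page otherwise."
    (if (every (lambda (a b) (= (+ a page-size) b))
               pages (rest pages))
        (read-bytes heap (first pages) (* (length pages) page-size))
        (mapcar (lambda (p) (read-bytes heap p page-size)) pages)))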
My observation in my own applications is that I spend a great deal of
time waiting on disk access; anything we can do to trade in-memory
operations or computation for less average disk access time is likely
to be a bigger win than reducing in-memory costs.
The nice thing about both of these hacks is that they happen under the
B-Tree abstraction so there is no need to rush to implement them.
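One way to keep that freedom (and the "strategy" pattern Robert
mentions below) is to have the backend talk to its trees only through
a small generic-function protocol. The names here are hypothetical,
but any of the variants above could later be dropped in behind it
without callers changing:

  ;; Hypothetical protocol; a plain B+-tree, a fractal-node tree, or an
  ;; S(b)-style tree could each implement it.
  (defgeneric tree-get (tree key)
    (:documentation "Look up KEY, returning (values value found-p)."))
  (defgeneric tree-put (tree key value)
    (:documentation "Insert or update KEY with VALUE."))
  (defgeneric tree-map-range (tree start end function)
    (:documentation "Call FUNCTION on each key/value pair with
  START <= key < END, in key order."))

  (defclass plain-b+tree () ())     ; first-pass implementation
  (defclass sb-style-b+tree () ())  ; same protocol, different page layout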
Ian
On Mar 26, 2008, at 11:17 PM, Robert L. Read wrote:
> On Wed, 2008-03-26 at 20:33 +0100, Leslie P. Polzer wrote:
>> I suppose this is a good opportunity for me to chime in with
>> a few thoughts. Aren't B+trees a choice that is too conservative
>> for a modern storage backend?
>> There seem to be more modern data structures (S(b) trees or
>> fractal trees) that are especially well suited for storing
>> variable-length keys.
>
> Perhaps. I would certainly hope that our design would allow such
> structures to be swapped out as a "strategy" pattern.
>
> Personally, I had never heard of those structures until you mentioned
> them, and a quick search does not yield a concise description of their
> advantages---or of how to implement them.
>
> If you can briefly describe the differences, that would be great. On
> the other hand, a LISP-native back end using B+-trees would be a nice
> leap forward for us; using something better might be even better but
> would not lift us into a new valence band of quality.
>
> The fact that one can find example implementations of the B+-tree,
> possibly even in LISP, is an advantage.
>
> My personal philosophy is gradualism --- crawling is the best way to
> learn to walk, and having a B+-tree implementation gets us half the
> way to the latest data structure.
>
>>
>> Leslie
>>