[elephant-devel] A Modest Proposal

Ian Eslick eslick at csail.mit.edu
Sat Sep 9 16:57:49 UTC 2006


Dear Developers,

Every time I have to work directly with BTrees I find myself become
quite vexed.  When I was playing around with factoring out the backends
I had a first hand opportunity to observe how Robert had to play games
to get the SQL back end to implement the BTree interface.   It strikes
me that "the right thing" is to define a more abstract interface that is
exported from the elephant package by default and plan to deprecate the
BTree interface.

I would like to propose to put a new table/index design into the
roadmap, along with a period of guaranteed backwards compatibility for
the BTree interface to expire in a year or so and which does not bind
any new backends to support it.  Existing systems which directly use the
BTree interface can still access it but perhaps by importing it directly
from the backend or via a special package.

So what is the right persistent collection model?  We want something
that plays nicely with transactions, doesn't place any onerous
performance burden on backends and plays nicely with the existing BDB
and SQL implementations.

- An iterator model?  Problematic because iterators are meant to be
something akin to closures, an object that encapsulates the state of an
iteration.  It would be easy to move an iterator around outside the
dynamic extent of a transaction which, from experience, would lead to
crashes in the BDB backend which is deadly for new users and annoying to me.

- A more abstract table class?  What methods?
  - (defclass table ...)
  - (get-value key table) => value
  - (find-key value table) => key, pointer
  - (do-table () )
  - (defclass table-pointer ...)
     - (get pointer)
     - (set pointer)
    - (predecessor pointer &key (skip-dups nil) (keys-only nil)) => key,
value ; update pointer
    - (successor pointer &key (skip-dups nil) (keys-only nil)) => key,
value ; update pointer

This proposes to add a layer of in-memory bookkeeping that can do the
right thing if the bookkeeping objects fall outside of the scope of a
given transaction.  (For example, cursors are released when the
transaction terminates regardless of whether with-btree-cursor was used
and can be re-opened if the pointers are used in a new transaction -
traversal state can now persist cheaply across transactions). 

In fact this would allow a simple iterator model to be supported as well
as the kind of multiple-pointer traversals common in compiled queries. 
This barrier between the user and cursors would simplify use of the BDB
backend, at a minimum.

- Do we want to implement something like the AllegroStore persistent set
object?  This would be something that only keeps OID's so it's
lightweight to serialize, store and manipulate but the retrieval
function de-references appropriately.

I need to do some more research into all this, but wanted to put my
thoughts out there and invite comment.

Ian





More information about the elephant-devel mailing list