[elephant-devel] Lisp Backend Architecture

Tue Oct 28 17:30:22 UTC 2008

I am hoping to clarify some of the prior discussions[1] about the native
Lisp backend for Elephant and propose a basic architecture.  Hopefully we
can modularize development so if somebody wants to hack for a few days on
the backend he can avoid being overwhelmed.

Multiprocess support: What features do the major lisps have for locking and
concurrency?  What features are we missing from C/linux--what does BDB do in
this regard that is hard to do in Lisp?  I do not know too much about
implementing efficient multi-threaded lisp programs where there is a lot of
contention.

Distribution: Are others interested in making the backend distributed?  I
prefer the design to allow adding an efficient distributed architecture at a
later date.

Custom indexing: A lot of discussion has gone into the best implementation
practice for BTrees.  But what about other indexing structures?
Multidimensional indexing is relevant to my project now because we have
spacial information (longitude latitude pairs) we would like to query.
Right now I am using a kd-tree with nodes made persistent by elephant, but
usually this sort of index would be implemented by the database itself.  I
imagine multi-dimensional indexing could improve queries, too.

There are other indexing needs as well.  Document-level search is another
common case.  I imagine an efficient search library is beyond our capacity,
but how could we make the database suited to implement search on top of the
thing?  BTrees are not everything, unless I am missing something.

I propose the following basic architecture for the backend:

Logging package with generic undo/redo logging support.  It would be
customizable but it would not depend on other code from the Lisp backend.  A
database could plug into the log by implementing undo, redo, and
serialization functions.  By modularizing logging, it should make it easier
to hack on it without being familiar with the rest of the backend.  It will
also be reusable, for what it's worth.

A persistent heap.  Beneath indices and data storage would be a heap-on-disk
layer with transaction support.  The heap would have methods for reading,
writing, and locking sequences of bytes.  It would also handle   The
persistent heap alone would qualify as a database and might be useful as a
basis for other DB projects or experiments.  Cachine would probably not be
implemented at this level, though that would be easiest.  It may be possible
to implement the transactional heavy lifting for the persistent heap and
make it extensible enough to be used throughout.

B-Trees.  A MOP-based implementation of B-Trees may be customizable enough
to plug into the the rest of the system with a few additional method
declarations for the persistent, transactional version.  Specialized methods
would access the persistent heap for reads and writes.  Locking of the sort
done by BDB (level 2 etc.) could probably be accomplished in these methods
specialized for our persistent B-Tree.

Bkd-Trees and other indexing structures could be implemented similarly to
B-Trees.

Caching.  B-Tree nodes or other lispy objects should be cached as opposed to
byte arrays.  Caching of B-Tree nodes requires additional multiprocess code,
so not all the transactional magic happens in the persistent heap layer.
I'm a little fuzzy on exactly how this will work, but it sounds reasonable
enough.

The DB user would interact mostly with the B-Tree  and other indexing
structures, and with transactions.

How does this sound for a basic architecture?  The multiprocess details are
not fleshed out so your comments are appreciated.  I have attached my basic
implementation of a logger and persistent heap to make this clearer.

Best regards,

Red

[1]  The most lengthy discussion I can find is here:
http://common-lisp.net/pipermail/elephant-devel/2008-May/004108.html
The trac has a page on the Lisp backend:
http://trac.common-lisp.net/elephant/wiki/LispDataStore
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20081028/d566669c/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: persistent-heap.tar.gz
Type: application/x-gzip
Size: 7163 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20081028/d566669c/attachment.bin>