[elephant-devel] Lisp-only data store (prototype)

Mon Feb 2 20:29:50 UTC 2009

Hi All,

I was inspired the other day to come up with a minimal, quick-and- 
dirty, all-lisp data store for Elephant based on cl-prevalence.  The  
problem with cl-prevalence is that every access to a persistent slot  
has to be explicitly transactionally protected so it can be  
recovered.  This is onerous and violates the abstractions provided by  
the rest of the elephant data stores.

The general idea is to hide the cl-prevalence transaction model inside  
the existing meta-object protocol.  Instead of the usual hash table,  
we use trees from cl-collections as our replacement for btrees and  
indexes - this gives us reasonably efficient successor/predecessor ops  
and makes it easier to implement the mapping and cursor APIs.

It's not going to be nearly as fast as a full prevalence solution, but  
it will be faster than BDB and should be much easier to install and  
work with on a single-image basis.  The new db-clp store requires a  
patches to cl-prevalence and cl-containers.  I'll see about getting  
those put into the mainstream, but can send them to anyone interesting  
in hacking on this in the meantime.

I checked a prototype of this into elephant-1.0 today; it does not  
interfere with existing functionality at all and I'm not sure yet  
whether it will be part of 1.0.  You can open a new store, subject to  
the following caveats, by:

In elephant root: ln -s src/contrib/eslick/db-clp/ele-clp.asd ele- 
clp.asd

(open-store '(:CLP "/home/me/db/"))

where /home/me/db/ is a fresh directory.

85%+ of the tests pass and a ton of stuff does work: persistent  
classes, btrees, dup-btrees, indexed-btrees, mapping (mostly), and  
class indexing (mostly).

However, there are still some very serious holes.
The ones I'm aware of are:

- You can only create, but not re-open a store
   (this is due to a bootstrapping problem in recreating persistent  
instances
    when loading a snapshot)

- No cursors are supported (all those tests fail), but should be easy  
as there is a good match between the RB tree and the btree  
abstractions but you will need to add a couple of functions to cl- 
containers.

- On-line recovery has not been tested.

- I'm unsure of how cl-prevalence guarantees global transaction  
serializability - I don't see locks anywhere in the code.

- I faked out a bunch of serializer tests by including the default  
serializer,
   because they depend on buffer streams which the xml serializer  
doesn't support;
   the tests should be fixed to be more general and not rely on buffer  
streams.

- Some issues in schema evolution tests

Performance issues:

- Reads need to be serialized so are currently considered transactions  
for simplicity, but they write a transaction log - they should be  
rewritten to only use the serialization mechanism but not write a log  
and the grabbing of a lock during serialization should be done once  
per with-transaction call.

- with-transaction is a no-op, a significant performance enhancement  
would be to bundle a set of primitive operations such as tx-write- 
slot, tx-remove-kv and write a single prevalence log entry for them.   
This would avoid a disk write per primop.

- I started using splay trees, retreated to RB trees due to bugs and  
finally retreated to binary-search-trees due to more bugs.  Moving to  
a more efficient, balanced tree data structure will improve the  
performance of all operations.  However, several tests use linear  
insertion, reducing the asymptotic behavior of the tree to that of a  
list - it really grinds the tests to a halt.

Obvious next steps, in order of increasing difficulty:
- Implement cursor API
- Fix red-black or splay tree implementation in cl-containers
- Figure out how to load from a snapshot
- Use with-transaction to improve performance
- Read transaction optimizations

Regards,
Ian