[elephant-devel] Lisp-only data store (prototype)
Ian Eslick
eslick at media.mit.edu
Mon Feb 2 20:29:50 UTC 2009
Hi All,
I was inspired the other day to come up with a minimal, quick-and-
dirty, all-lisp data store for Elephant based on cl-prevalence. The
problem with cl-prevalence is that every access to a persistent slot
has to be explicitly transactionally protected so it can be
recovered. This is onerous and violates the abstractions provided by
the rest of the elephant data stores.
The general idea is to hide the cl-prevalence transaction model inside
the existing meta-object protocol. Instead of the usual hash table,
we use trees from cl-collections as our replacement for btrees and
indexes - this gives us reasonably efficient successor/predecessor ops
and makes it easier to implement the mapping and cursor APIs.
It's not going to be nearly as fast as a full prevalence solution, but
it will be faster than BDB and should be much easier to install and
work with on a single-image basis. The new db-clp store requires a
patches to cl-prevalence and cl-containers. I'll see about getting
those put into the mainstream, but can send them to anyone interesting
in hacking on this in the meantime.
I checked a prototype of this into elephant-1.0 today; it does not
interfere with existing functionality at all and I'm not sure yet
whether it will be part of 1.0. You can open a new store, subject to
the following caveats, by:
In elephant root: ln -s src/contrib/eslick/db-clp/ele-clp.asd ele-
clp.asd
(open-store '(:CLP "/home/me/db/"))
where /home/me/db/ is a fresh directory.
85%+ of the tests pass and a ton of stuff does work: persistent
classes, btrees, dup-btrees, indexed-btrees, mapping (mostly), and
class indexing (mostly).
However, there are still some very serious holes.
The ones I'm aware of are:
- You can only create, but not re-open a store
(this is due to a bootstrapping problem in recreating persistent
instances
when loading a snapshot)
- No cursors are supported (all those tests fail), but should be easy
as there is a good match between the RB tree and the btree
abstractions but you will need to add a couple of functions to cl-
containers.
- On-line recovery has not been tested.
- I'm unsure of how cl-prevalence guarantees global transaction
serializability - I don't see locks anywhere in the code.
- I faked out a bunch of serializer tests by including the default
serializer,
because they depend on buffer streams which the xml serializer
doesn't support;
the tests should be fixed to be more general and not rely on buffer
streams.
- Some issues in schema evolution tests
Performance issues:
- Reads need to be serialized so are currently considered transactions
for simplicity, but they write a transaction log - they should be
rewritten to only use the serialization mechanism but not write a log
and the grabbing of a lock during serialization should be done once
per with-transaction call.
- with-transaction is a no-op, a significant performance enhancement
would be to bundle a set of primitive operations such as tx-write-
slot, tx-remove-kv and write a single prevalence log entry for them.
This would avoid a disk write per primop.
- I started using splay trees, retreated to RB trees due to bugs and
finally retreated to binary-search-trees due to more bugs. Moving to
a more efficient, balanced tree data structure will improve the
performance of all operations. However, several tests use linear
insertion, reducing the asymptotic behavior of the tree to that of a
list - it really grinds the tests to a halt.
Obvious next steps, in order of increasing difficulty:
- Implement cursor API
- Fix red-black or splay tree implementation in cl-containers
- Figure out how to load from a snapshot
- Use with-transaction to improve performance
- Read transaction optimizations
Regards,
Ian
More information about the elephant-devel
mailing list