Hi,<div><br></div><div>Thanks for working on this. I am particularly interested in the potential of this backend because I need (a) speed and (b) the ability to build a binary which will run on end-user machines without requiring them to install a separate library.</div>

<div><br></div><div>If you send me the necessary patches (or get them integrated into their respective libraries), I would be happy to help you test this code on my application.</div><div><br></div><div>I look forward to hearing from you.</div>

<div><br><div class="gmail_quote">On Mon, Feb 2, 2009 at 12:29 PM, Ian Eslick <span dir="ltr"><<a href="mailto:eslick@media.mit.edu">eslick@media.mit.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi All,<br>

<br>

I was inspired the other day to come up with a minimal, quick-and-<br>

dirty, all-lisp data store for Elephant based on cl-prevalence.  The<br>

problem with cl-prevalence is that every access to a persistent slot<br>

has to be explicitly transactionally protected so it can be<br>

recovered.  This is onerous and violates the abstractions provided by<br>

the rest of the elephant data stores.<br>

<br>

The general idea is to hide the cl-prevalence transaction model inside<br>

the existing meta-object protocol.  Instead of the usual hash table,<br>

we use trees from cl-collections as our replacement for btrees and<br>

indexes - this gives us reasonably efficient successor/predecessor ops<br>

and makes it easier to implement the mapping and cursor APIs.<br>

<br>

It's not going to be nearly as fast as a full prevalence solution, but<br>

it will be faster than BDB and should be much easier to install and<br>

work with on a single-image basis.  The new db-clp store requires a<br>

patches to cl-prevalence and cl-containers.  I'll see about getting<br>

those put into the mainstream, but can send them to anyone interesting<br>

in hacking on this in the meantime.<br>

<br>

I checked a prototype of this into elephant-1.0 today; it does not<br>

interfere with existing functionality at all and I'm not sure yet<br>

whether it will be part of 1.0.  You can open a new store, subject to<br>

the following caveats, by:<br>

<br>

In elephant root: ln -s src/contrib/eslick/db-clp/ele-clp.asd ele-<br>

clp.asd<br>

<br>

(open-store '(:CLP "/home/me/db/"))<br>

<br>

where /home/me/db/ is a fresh directory.<br>

<br>

85%+ of the tests pass and a ton of stuff does work: persistent<br>

classes, btrees, dup-btrees, indexed-btrees, mapping (mostly), and<br>

class indexing (mostly).<br>

<br>

<br>

However, there are still some very serious holes.<br>

The ones I'm aware of are:<br>

<br>

- You can only create, but not re-open a store<br>

   (this is due to a bootstrapping problem in recreating persistent<br>

instances<br>

    when loading a snapshot)<br>

<br>

- No cursors are supported (all those tests fail), but should be easy<br>

as there is a good match between the RB tree and the btree<br>

abstractions but you will need to add a couple of functions to cl-<br>

containers.<br>

<br>

- On-line recovery has not been tested.<br>

<br>

- I'm unsure of how cl-prevalence guarantees global transaction<br>

serializability - I don't see locks anywhere in the code.<br>

<br>

- I faked out a bunch of serializer tests by including the default<br>

serializer,<br>

   because they depend on buffer streams which the xml serializer<br>

doesn't support;<br>

   the tests should be fixed to be more general and not rely on buffer<br>

streams.<br>

<br>

- Some issues in schema evolution tests<br>

<br>

Performance issues:<br>

<br>

- Reads need to be serialized so are currently considered transactions<br>

for simplicity, but they write a transaction log - they should be<br>

rewritten to only use the serialization mechanism but not write a log<br>

and the grabbing of a lock during serialization should be done once<br>

per with-transaction call.<br>

<br>

- with-transaction is a no-op, a significant performance enhancement<br>

would be to bundle a set of primitive operations such as tx-write-<br>

slot, tx-remove-kv and write a single prevalence log entry for them.<br>

This would avoid a disk write per primop.<br>

<br>

- I started using splay trees, retreated to RB trees due to bugs and<br>

finally retreated to binary-search-trees due to more bugs.  Moving to<br>

a more efficient, balanced tree data structure will improve the<br>

performance of all operations.  However, several tests use linear<br>

insertion, reducing the asymptotic behavior of the tree to that of a<br>

list - it really grinds the tests to a halt.<br>

<br>

Obvious next steps, in order of increasing difficulty:<br>

- Implement cursor API<br>

- Fix red-black or splay tree implementation in cl-containers<br>

- Figure out how to load from a snapshot<br>

- Use with-transaction to improve performance<br>

- Read transaction optimizations<br>

<br>

Regards,<br>

Ian<br>

<br>

<br>

<br>

_______________________________________________<br>

elephant-devel site list<br>

<a href="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</a><br>

<a href="http://common-lisp.net/mailman/listinfo/elephant-devel" target="_blank">http://common-lisp.net/mailman/listinfo/elephant-devel</a><br>

</blockquote></div><br><br clear="all"><br>-- <br>Elliott Slaughter<br><br>"Any road followed precisely to its end leads precisely nowhere." - Frank Herbert<br>

</div>