Hi,<div><br></div><div>Thanks for working on this. I am particularly interested in the potential of this backend because I need (a) speed and (b) the ability to build a binary which will run on end-user machines without requiring them to install a separate library.</div>
<div><br></div><div>If you send me the necessary patches (or get them integrated into their respective libraries), I would be happy to help you test this code on my application.</div><div><br></div><div>I look forward to hearing from you.</div>
<div><br><div class="gmail_quote">On Mon, Feb 2, 2009 at 12:29 PM, Ian Eslick <span dir="ltr">&lt;<a href="mailto:eslick@media.mit.edu">eslick@media.mit.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi All,<br>
<br>
I was inspired the other day to come up with a minimal, quick-and-dirty,<br>
all-lisp data store for Elephant based on cl-prevalence. The<br>
problem with cl-prevalence is that every access to a persistent slot<br>
has to be explicitly transactionally protected so it can be<br>
recovered. This is onerous and violates the abstractions provided by<br>
the rest of the elephant data stores.<br>
<br>
The general idea is to hide the cl-prevalence transaction model inside<br>
the existing meta-object protocol. Instead of the usual hash table,<br>
we use trees from cl-containers as our replacement for btrees and<br>
indexes - this gives us reasonably efficient successor/predecessor ops<br>
and makes it easier to implement the mapping and cursor APIs.<br>
<br>
It's not going to be nearly as fast as a full prevalence solution, but<br>
it will be faster than BDB and should be much easier to install and<br>
work with on a single-image basis. The new db-clp store requires<br>
patches to cl-prevalence and cl-containers. I'll see about getting<br>
those put into the mainstream, but can send them to anyone interested<br>
in hacking on this in the meantime.<br>
<br>
I checked a prototype of this into elephant-1.0 today; it does not<br>
interfere with existing functionality at all and I'm not sure yet<br>
whether it will be part of 1.0. You can open a new store, subject to<br>
the following caveats, by:<br>
<br>
In elephant root: ln -s src/contrib/eslick/db-clp/ele-clp.asd ele-clp.asd<br>
<br>
(open-store '(:CLP "/home/me/db/"))<br>
<br>
where /home/me/db/ is a fresh directory.<br>
<br>
85%+ of the tests pass and a ton of stuff does work: persistent<br>
classes, btrees, dup-btrees, indexed-btrees, mapping (mostly), and<br>
class indexing (mostly).<br>
<br>
<br>
However, there are still some very serious holes.<br>
The ones I'm aware of are:<br>
<br>
- You can create a store, but not re-open one<br>
(this is due to a bootstrapping problem in recreating persistent<br>
instances when loading a snapshot)<br>
<br>
- No cursors are supported (all those tests fail). Adding them should be<br>
easy, as there is a good match between the RB tree and the btree<br>
abstractions, but it will require adding a couple of functions to<br>
cl-containers.<br>
<br>
- On-line recovery has not been tested.<br>
<br>
- I'm unsure of how cl-prevalence guarantees global transaction<br>
serializability - I don't see locks anywhere in the code.<br>
<br>
- I faked out a bunch of serializer tests by including the default<br>
serializer, because they depend on buffer streams, which the xml<br>
serializer doesn't support; the tests should be fixed to be more<br>
general and not rely on buffer streams.<br>
<br>
- Some issues in schema evolution tests<br>
<br>
Performance issues:<br>
<br>
- Reads need to be serialized, so for simplicity they are currently<br>
treated as transactions, which means they also write a transaction<br>
log. They should be rewritten to use only the serialization<br>
mechanism without writing a log, and the lock taken during<br>
serialization should be grabbed once per with-transaction call.<br>
<br>
- with-transaction is currently a no-op. A significant performance<br>
enhancement would be to bundle a set of primitive operations, such as<br>
tx-write-slot and tx-remove-kv, and write a single prevalence log entry<br>
for them. This would avoid a disk write per primop.<br>
<br>
- I started using splay trees, retreated to RB trees due to bugs, and<br>
finally retreated to binary-search-trees due to more bugs. Moving to<br>
a more efficient, balanced tree data structure will improve the<br>
performance of all operations. In particular, several tests insert keys<br>
in linear (sorted) order, which degrades the unbalanced tree to the<br>
asymptotic behavior of a list - it really grinds the tests to a halt.<br>
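<br>
To illustrate the linear-insertion point, here is a minimal sketch in<br>
plain Common Lisp. It does not use cl-containers or Elephant; the names<br>
node, bst-insert, bst-height, build-tree and median-first-order are<br>
made up for this example, and it assumes "linear insertion" means<br>
inserting keys in ascending order into an unbalanced binary search tree.<br>
<pre>
```lisp
;; Illustration only: hypothetical names, not code from Elephant or cl-containers.
(defstruct node key left right)

(defun bst-insert (root key)
  "Insert KEY into the unbalanced tree at ROOT (iteratively); return the root."
  (if (null root)
      (make-node :key key)
      (let ((node root))
        (loop
          (cond ((> (node-key node) key)          ; KEY belongs in the left subtree
                 (if (node-left node)
                     (setf node (node-left node))
                     (progn (setf (node-left node) (make-node :key key))
                            (return root))))
                ((> key (node-key node))          ; KEY belongs in the right subtree
                 (if (node-right node)
                     (setf node (node-right node))
                     (progn (setf (node-right node) (make-node :key key))
                            (return root))))
                (t (return root)))))))            ; duplicate key: tree unchanged

(defun bst-height (root)
  "Number of nodes on the longest path from ROOT down to a leaf."
  (if (null root)
      0
      (1+ (max (bst-height (node-left root))
               (bst-height (node-right root))))))

(defun build-tree (keys)
  "Insert KEYS one by one, in the given order, into an empty tree."
  (let ((root nil))
    (dolist (k keys root)
      (setf root (bst-insert root k)))))

(defun median-first-order (lo hi)
  "Keys LO..HI ordered so that the midpoint of every subrange comes first."
  (when (>= hi lo)
    (let ((mid (floor (+ lo hi) 2)))
      (cons mid (append (median-first-order lo (1- mid))
                        (median-first-order (1+ mid) hi))))))

;; The same 1023 keys, inserted in two different orders.
(defparameter *sorted-height*
  (bst-height (build-tree (loop for i from 1 to 1023 collect i))))
(defparameter *balanced-height*
  (bst-height (build-tree (median-first-order 1 1023))))
```
</pre>
With ascending insertion every node only ever gets a right child, so<br>
*sorted-height* is 1023 and each lookup walks a list. The median-first<br>
order gives a height of 10 for the same keys, which is the depth a<br>
balanced tree would keep regardless of insertion order.<br>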
<br>
Obvious next steps, in order of increasing difficulty:<br>
- Implement cursor API<br>
- Fix red-black or splay tree implementation in cl-containers<br>
- Figure out how to load from a snapshot<br>
- Use with-transaction to improve performance<br>
- Read transaction optimizations<br>
<br>
Regards,<br>
Ian<br>
</blockquote></div><br><br clear="all"><br>-- <br>Elliott Slaughter<br><br>"Any road followed precisely to its end leads precisely nowhere." - Frank Herbert<br>
</div>