[elephant-devel] Backend musings

Fri Feb 29 13:48:30 UTC 2008

It strikes me that Tokyo Cabinet's only major benefit is have a better  
licensing model.  Getting it to the same state as BDB would be a  
significant project (and it doesn't support Windows, I believe).

Mirroring Robert's sentiments, I think we'd be far better off spending  
limited developer cycles on a lisp-only solution running that solves  
both licensing, integration, performance, usability and other  
headaches experienced with our ffi or socket based solutions.

An elephant data store has to:
- Support persistent slot writes
- Expose a persistent key-value store, indexed key-value store and
   iterator/cursor interfaces over them
- Support a transaction model that guarantees the ACID properties
- Work correctly in a threading model
- Ideally would also work in a multi-process, shared-memory scenario

I think the big design decisions for a lisp data store include:
- Backing-store model (prevalence-style logging, binary paging,  
Rucksack style maps, etc)
- Multi-threaded transaction support (page/object locking vs. per-txn  
replication)
- Multi-process support?

It would also be nice, eventually, to have a lisp server model where  
we can launch a lisp instance on the same or a remote machine to serve  
access to the underlying store.

My latest thinking is that we can spend disk space to save time/ 
complexity and assume that we have lots of main memory available.   
This can allow us to avoid messing with fixed-sized pages and locking.

So an initial approach might be the following:
- BTree-style nodes and slot value writes are duplicated in a per- 
transaction memory log.
   (btree ops are compressed into edit operations, but the altered  
page is also duplicated for later reads)
- Any reads to these subsystems first look up data in this transaction  
cache
- We serialize commit ops and cancel active transactions that have  
read or written data
   written by the commiting transaction.
- Thus, on commit, we just write to the log file, update the btree  
pages, and write to the slot store

We can make serialization of btree nodes easier by just serializing  
the lisp structure and having bins of differently sized disk regions  
to write them to rather than maintaining fixed sized pages.  Later, we  
could choose to enhance the system by converting to a low-level read/ 
write into the C byte stream that we fetch from files.

Just a few thoughts...

Ian

PS - I don't see B+Trees in CL-CONTAINERS.  Is it in there with  
another name?  Also, we need a B+Tree that actually is mapped to a  
disk file rather than into memory as seems to be the case with most of  
cl-containers

On Feb 29, 2008, at 5:45 AM, Leslie P. Polzer wrote:

>
> I have been thinking about new backend possibilities, and
> would like to know your opinion.
>
> Tokyo Cabinet would require us to write a FFI for it.
> I don't know what it does to prevent deadlocks; probably
> nothing, as I have looked at its B tree code and have
> found just mutexes without any special handling
> whatsoever.
>
> iI'm going to ask the author for more clarification),
> a slight performance hit and possibly a less convoluted
> glueing code than the FFI solution. Plus, one could
> offload the storage to another machine, of course.
>
> Then there would be the possibility of using the
> B+Trees provided by CL-CONTAINERS. This would
> be a Lisp base upon which one could build upon,
> layering transactions and file storage above.
>
> What do you think?
>
>  Leslie
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel