[elephant-devel] Backend musings

Fri Feb 29 19:48:24 UTC 2008

On Feb 29, 2008, at 2:31 PM, Henrik Hjelte wrote:

> On Fri, Feb 29, 2008 at 2:48 PM, Ian Eslick <eslick at media.mit.edu>  
> wrote:
>> An elephant data store has to:
>> - Support persistent slot writes
>> - Expose a persistent key-value store, indexed key-value store and
>>   iterator/cursor interfaces over them
>> - Support a transaction model that guarantees the ACID properties
>> - Work correctly in a threading model
>> - Ideally would also work in a multi-process, shared-memory scenario
>
> To me the last point is is very important. Even more important than
> multithreading, and you will probably get multithreading support at
> the same time if you implement it correctly. A database that can't be
> used from several computers is not much of a database. It can be
> tricky, but should be doable.
>
> If you think conceptually dividing client and server code, and make a
> generic interface between these parts. Then you can have different
> ways of implementing the client-server communication protocol. For a
> single process application, the client-server protocol can of course
> be simple function calls between the server code and the client code
> in the same lisp process. Then for a multi process setup you can
> probably want different ways of communicating. One "simple" and
> straightforward socket interface that works on all platforms, then
> perhaps you (at least I will) want to add another performance
> optimized communication protocol that might be less generic. From the
> start, what need to be done is to mentally see the different client
> and server parts, and make sure no state (variables and memory areas)
> are shared between them. Perhaps have an eye on the design so if
> possible trying to keep the number of roundtrips betwen client and
> server down, Then, it probably becomes easy to add a basic simple
> client-server socket solution.
>
> As an extra bonus, having a clear client/server mindset from the start
> might make the design clearer and better.

What would be the goal of having a lisp-only client/server database  
vs. using the existing postmodern model or cl-perec or one of the  
other ORM solutions that exists?  Wouldn't the socket protocol tends  
to swamp any server-side performance benefits?

Do you how locking is handled and conflicts detected in something like  
Postgresql?

>>
>> I think the big design decisions for a lisp data store include:
>> - Backing-store model (prevalence-style logging, binary paging,
>> Rucksack style maps, etc)
>> - Multi-threaded transaction support (page/object locking vs. per-txn
>> replication)
>> - Multi-process support?
>
> Strong vote for...
>
>> It would also be nice, eventually, to have a lisp server model where
>> we can launch a lisp instance on the same or a remote machine to  
>> serve
>> access to the underlying store.
>
> See my comments above.
>
>> My latest thinking is that we can spend disk space to save time/
>> complexity and assume that we have lots of main memory available.
>> This can allow us to avoid messing with fixed-sized pages and  
>> locking.
>
> Yes, don't optimize these things too early.
>
>> So an initial approach might be the following:
>> - BTree-style nodes and slot value writes are duplicated in a per-
>> transaction memory log.
>>   (btree ops are compressed into edit operations, but the altered
>> page is also duplicated for later reads)
>> - Any reads to these subsystems first look up data in this  
>> transaction
>> cache
>> - We serialize commit ops and cancel active transactions that have
>> read or written data
>>   written by the commiting transaction.
>> - Thus, on commit, we just write to the log file, update the btree
>> pages, and write to the slot store
>
> In this model, isn't there a risk of problems:
> Transaction A starts, the world is in a coherent state and you want to
> get data from different b-trees that are related.
> Transaction B starts, does his things quick and writes new things to
> the database.
> But now, if transaction A wants to read some data that B has updated,
> it will suddently get the brand new fresh data from transaction B,
> which was not intended since A wants to read all data as the world
> looked from the beginning.

Transaction B doesn't write to the database until it's committing, at  
which time transaction A is aborted because B wrote over data that A  
read.  I'm describing something very similar to versioning, except not  
at the object duplicate-on-write model which doesn't fit with the  
current BDB architecture.

> Contrasted by a version numbered transaction model that don't
> overwrite data (like Rucksack does), transaction A can always get
> coherent data since B doesn't overwrite the db, it just write a new
> version.

Versioning, or duplication, makes sense while the transactions are  
occurring.  If there is a conflict between A and B then in the  
versioned case you need to decide which version will be the surviving  
version for a following transaction C.

> Am I right, is this a problem?
>
>> - Thus, on commit, we just write to the log file, update the btree
>> pages, and write to the slot store
>
> If we do a client/server solution, these kind of things can be
> performance optimized perhaps by splitting it into different
> subprocesses.
> The server can maybe just update the log file, save data in its cache,
> then return OK/Pending to the client, so the client can continue. Then
> the server process can continue doing the rest of the work until it is
> finished. This could be done with a multithreaded design as well. But,
> we can leave it for later and start out with something simple. But it
> could be something to think about in the design.

>>
>> We can make serialization of btree nodes easier by just serializing
>> the lisp structure and having bins of differently sized disk regions
>> to write them to rather than maintaining fixed sized pages.  Later,  
>> we
>> could choose to enhance the system by converting to a low-level read/
>> write into the C byte stream that we fetch from files.
>
> Agree. Better hunt performance later. It is not obvious what is the
> most limiting factor for performance from the start.
>
> About other B+tree implementations, there is one in rucksack as well,
> memory based but a clean design that can be adapted for disk usage. It
> is also not astrophysics to make one from scratch, or translate from
> one of the many implementations for other languages.
>
> About project management, how do we discuss the design and what needs
> to be done in the best way? Mails like these? The only problem with
> that is that it might be difficult to contribute to the discussion
> when you don't have the time to respond to these mails, so a design
> document or wiki could also be good because you can contribute more
> when you have time rather than when the discussion is ongoing on the
> list. Or is the mailing list enough, I am not sure...

I think if there is a critical mass of people interested, then we  
should start a design document on the Trac wiki.  I think all you need  
is a common-lisp.net account and be giving a Trac password.

> /Henrik
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel