[elephant-devel] Lisp Btrees: design considerations

Leslie P. Polzer leslie.polzer at gmx.net
Wed May 14 07:28:35 UTC 2008


> For example, in BDB, the primitive is the page.  BTree nodes are layed
> out in one or more pages, each page has some binary metadata
> explaining it's type and organization (free list, etc).  A short key
> value is written directly into the page, a long one is written into an
> overflow page, etc.

InnoDB also uses this approach.


> - new keys and values of differing or less length are written in
> place, otherwise new
>    space is allocated at the end of the file.

Okay, but maybe let the user reserve extent space.


> - transactions simply keep track of the primitive operations on the
> database and the associated data in a memory queue and write those ops
> to disk as part of the txn commit process.  The pages and key/value
> pairs that will be touched in that operation are also stored in that
> txn log.
> - when a transaction commits, it replays the log to write everything
> to disk appropriately.  The list of touched data is then passed up the
> commit chain to invalidate any pending transactions that have a
> conflict.  Everything is speculative in this case, but we don't have
> to deal with locking.

I like this approach.


> A big decision is:
> - Use cffi/uffi and do much of the serialization & btree
> implementation in C/static memory

IME this FFI stuff can be quite hard to debug.


> I lean towards using cffi to manipulate static data, just because it's
> going to be easier to get performance via that method and it's also
> going to be much easier to do a multi-process implementation (by
> operating on static memory and primitive locks in a shared memory
> region).

I cannot estimate the performance trade-offs involved here,
but in general I'm in favor of a Lisp-based approach...


>    - the binary pages could be stored in static data

I don't think I understand this. What does "static" mean here
(and above)?


> and the primitives btree ops
>      could directly manipulate data within the page?  We pass a C
> function that
>      directly sorts binary sequences rather than having to deserialize
> to sort.  We'd
>      need to write that in lisp to operate on static data or on lisp
> arrays.  Deserializing
>      on each key comparison is too expensive.

Yes.


> A nice auxiliary hack would be:
> - rewrite memutils to entirely use uffi/cffi to manipulate static data
> rather
>    than calling out to C to do it.  Maintains efficiency but removes
> the compilation
>    build step except for users of BDB

I have looked at memutils/the serializer, but it's very hard for me
to replace it by something else, because I'm not sure what would be
required of a replacement.

The partial conclusion to which I came was that memutils just models
a bivalent stream so the backend can communicate with the serializer.

The data format is hard to figure out for me, however, because of
all the UFFI stuff involved...

  Leslie




More information about the elephant-devel mailing list