[elephant-devel] Re: BDB vs postmodern

Thu Feb 21 11:12:03 UTC 2008

> So, putting licensing issues aside, what is the real difference/ advantage 
> of one data store over the other? In a recent email I read  by Alex, he 
> mentions he's going to work on improving performance on  the postmodern 
> data store. So, even after that improves and  performance is matched with 
> that of BDB, is there an advantage of one  data store over the other?

matching the performance of BDB is not among the postmodern backend's design 
goals, and I don't think it's possible.
having the storage engine separate from the application introduces significant 
communication overhead -- with each query taking 0.1-1 ms, we simply cannot 
do more than 1000-10000 queries per second from a single client.
elephant reads/writes slots individually, so this means we cannot have more 
than 1000-10000 slot operations per second. that sounds quite prohibitive.
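
to make that arithmetic concrete, here is a small sketch in Python (not part 
of elephant; the function name is made up for illustration). it assumes a 
single client that issues queries strictly one after another, so the bound is 
just the inverse of the per-query round-trip time:

```python
def max_queries_per_second(latency_ms: float) -> float:
    """Upper bound on sequential queries per second for one client.

    Assumes each query must finish before the next one starts, so the
    bound is 1000 ms divided by the per-query round-trip time.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # The two ends of the 0.1-1 ms range mentioned above.
    for latency in (0.1, 1.0):
        bound = max_queries_per_second(latency)
        print(f"{latency} ms/query -> at most {bound:.0f} queries/sec")
```

since every slot read or write is its own query, the same bound applies to 
slot operations.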

 lR> Since Postgres does allow for features such as replication,
 lR> clustering,

to my knowledge it doesn't work well enough, so you'll probably be limited 
to having only one active DB machine

 lR>  and fail-over with multiple active simultaneous client connections,
 lR> does this mean that I could have multiple (separate) lisp clients using
 lR> elephant connecting to a separate Postgres cluster with no concurrency
 lR> issues?

yes, that is the idea behind the postmodern backend -- it was made to allow 
scaling to multiple clients and multiple machines.
as i understand it, at the time db-postmodern was created, other backends did 
not work correctly even when multiple processes on the same machine connected 
to a single DB.
at least, in the grand-prix test suite (which tries to work with the DB from 
two processes simultaneously) there is a remark: "Currently (jan 07) is 
expected to fail." for the elephant/bdb engine.

besides multiple clients, there are some other benefits of using PostgreSQL 
as a backend -- for example, thanks to its MVCC approach it doesn't suffer 
from locking issues: readers do not block writers, and it never says 
out-of-memory when you read too much.

however, we've run into the scalability problems mentioned above -- a Lisp 
application using persistent objects as if they were local does too many 
slot accesses, reading the same data many times, etc.

the solution seems to be trivial: implement caching on the client side. the 
simple solution is to cache data within a single transaction. the complex 
solution is to cache data across transactions, tracking changes and 
invalidating stale cache entries automatically.
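
the two options can be sketched roughly as follows. this is an illustration 
in Python, not the db-postmodern code: Store, TransactionCache, SharedCache 
and the change-log invalidation scheme are all assumptions made up for the 
example, and it ignores transaction isolation and concurrency entirely.

```python
class Store:
    """Stand-in for the remote database (hypothetical, not a real API)."""

    def __init__(self):
        self.data = {}
        self.log = []         # keys in the order they were written
        self.round_trips = 0  # reads that actually reached the "server"

    def read_slot(self, oid, slot):
        self.round_trips += 1
        return self.data.get((oid, slot))

    def write_slot(self, oid, slot, value):
        self.data[(oid, slot)] = value
        self.log.append((oid, slot))

    def changes_since(self, position):
        """Return the keys written after `position` and the new position."""
        return self.log[position:], len(self.log)


class TransactionCache:
    """Simple option: the cache is thrown away with the transaction."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, oid, slot):
        key = (oid, slot)
        if key not in self.cache:  # only a miss costs a round trip
            self.cache[key] = self.store.read_slot(oid, slot)
        return self.cache[key]

    def write(self, oid, slot, value):
        self.store.write_slot(oid, slot, value)
        self.cache[(oid, slot)] = value


class SharedCache(TransactionCache):
    """Complex option: the cache survives across transactions."""

    def __init__(self, store):
        super().__init__(store)
        self.seen = 0  # how much of the change log we have processed

    def begin_transaction(self):
        # Evict every entry written (by anyone) since our last transaction.
        # Our own writes are evicted too, which costs one extra read.
        changed, self.seen = self.store.changes_since(self.seen)
        for key in changed:
            self.cache.pop(key, None)


if __name__ == "__main__":
    store = Store()
    store.write_slot(1, "name", "old")
    cache = SharedCache(store)
    cache.begin_transaction()
    cache.read(1, "name")
    cache.read(1, "name")               # served from the cache
    store.write_slot(1, "name", "new")  # some other client writes
    cache.begin_transaction()           # stale entry is invalidated here
    print(cache.read(1, "name"), store.round_trips)
```

the simple variant only needs a fresh TransactionCache per transaction; the 
shared variant trades that simplicity for fewer round trips when the same 
slots are read again in later transactions.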

we've implemented both options, so now it's possible to have all slots 
cached on the Lisp side. as long as there are many more reads than writes 
(which is common for web applications, for example), the strain on the 
database is significantly decreased. all index queries (select object by 
value or by range) still go through the database, but typically there are 
many more slot queries than index queries.

but now the performance is not directly comparable with BDB's -- it depends 
on usage patterns. for one kind of application caching might work very well 
and we'll get performance that is better than BDB's. for another kind of 
application caching could be just additional overhead, and postmodern's 
performance will suck.
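
a rough way to see why it depends on usage patterns is a toy cost model, 
sketched here in Python. the model and all names in it are assumptions for 
illustration, not measurements: every read pays some cache bookkeeping 
overhead, and only cache misses also pay the database round trip. it 
compares cached against uncached postmodern reads, not against BDB.

```python
def avg_read_cost_ms(hit_ratio, db_latency_ms, cache_overhead_ms):
    """Average cost of one slot read with a client-side cache.

    Every read pays the cache overhead; only misses also pay the
    database round trip.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_overhead_ms + (1.0 - hit_ratio) * db_latency_ms


def caching_pays_off(hit_ratio, db_latency_ms, cache_overhead_ms):
    """True if a cached read is cheaper on average than always hitting the DB."""
    cached = avg_read_cost_ms(hit_ratio, db_latency_ms, cache_overhead_ms)
    return cached < db_latency_ms
```

under this model caching wins only when the hit ratio exceeds 
cache_overhead_ms / db_latency_ms, so a read-heavy application that keeps 
touching the same slots benefits, while one that rarely re-reads data pays 
the overhead for nothing.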

also keep in mind that the postmodern backend is indeed "younger" and can 
still have some glitches.