[elephant-devel] Re: BDB vs postmodern
Alex Mizrahi
killerstorm at newmail.ru
Thu Feb 21 11:12:03 UTC 2008
> So, putting licensing issues aside, what is the real difference/ advantage
> of one data store over the other? In a recent email I read by Alex, he
> mentions he's going to work on improving performance on the postmodern
> data store. So, even after that improves and performance is matched with
> that of BDB, is there an advantage of one data store over the other?
matching BDB's performance is not among the postmodern backend's design
goals, and I don't think it's possible.
having a storage engine separate from the application introduces significant
communication overhead -- with each query taking 0.1-1 ms, we simply cannot
do more than 1000-10000 queries per second from a single client.
elephant reads/writes slots individually, so this means we cannot have more
than 1000-10000 slot operations per second. that sounds quite prohibitive.
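the bound above is just the reciprocal of the per-query latency. a minimal
sketch of the arithmetic, assuming queries are issued strictly one after
another from a single client (the function name is hypothetical, for
illustration only):

```python
def max_ops_per_second(latency_ms: float) -> float:
    """Upper bound on sequential round trips per second for one client."""
    # Each query occupies the client for latency_ms, so at most
    # 1000 / latency_ms of them fit into one second.
    return 1000.0 / latency_ms


for latency_ms in (0.1, 1.0):
    bound = max_ops_per_second(latency_ms)
    print(f"{latency_ms} ms per query: at most {bound:.0f} slot operations/s")
```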
lR> Since Postgres does allow for features such as replication,
lR> clustering,
to my knowledge it doesn't work well enough, so you'll probably be limited
to having only one active DB machine.
lR> and fail-over with multiple active simultaneous client connections,
lR> does this mean that I could have multiple (separate) lisp clients using
lR> elephant connecting to a separate Postgres cluster with no concurrency
lR> issues?
yes, that is the idea behind the postmodern backend -- it was made to allow
scaling to multiple clients and multiple machines.
as I understand it, at the time db-postmodern was created, the other backends
did not work correctly even when multiple processes on the same machine
connected to a single DB.
at least, in the grand-prix test suite (which tries to work with the DB from
two processes simultaneously) there is a remark: "Currently (jan 07) is expected
to fail." for the elephant/bdb engine.
besides multiple clients, there are some other benefits of using PostgreSQL
as a backend -- for example, thanks to its MVCC approach it doesn't suffer
from locking issues: readers do not block writers. it also never says
out-of-memory when you read too much.
however, we've run into the scalability problems mentioned above -- a Lisp
application using persistent objects as if they were local does too many
slot accesses, reading the same data many times, etc.
the solution seems to be trivial: implement caching on the client side. the
simple solution is to cache data within a single transaction. the complex
solution is to cache data across transactions, tracking changes and
invalidating stale cache entries automatically.
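to illustrate the simple option, here is a minimal sketch of a cache that
lives only for one transaction. it is written in Python rather than Lisp,
and it is not db-postmodern's actual code: SlotStore, Transaction and the
round_trips counter are hypothetical names, with an in-memory dict standing
in for the remote database.

```python
class SlotStore:
    """Stand-in for the remote database; counts simulated round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def read(self, oid, slot):
        self.round_trips += 1
        return self.data.get((oid, slot))

    def write(self, oid, slot, value):
        self.round_trips += 1
        self.data[(oid, slot)] = value


class Transaction:
    """Caches slot values for the lifetime of one transaction only."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # (oid, slot) -> value, discarded with the transaction

    def read(self, oid, slot):
        key = (oid, slot)
        if key not in self.cache:
            # First access in this transaction: pay for one round trip.
            self.cache[key] = self.store.read(oid, slot)
        return self.cache[key]

    def write(self, oid, slot, value):
        # Write through to the store and keep the cache consistent.
        self.store.write(oid, slot, value)
        self.cache[(oid, slot)] = value


store = SlotStore()
store.data[(1, "name")] = "alice"

txn = Transaction(store)
values = [txn.read(1, "name") for _ in range(100)]
print(store.round_trips)  # 100 reads of the same slot cost one round trip
```

since the cache is thrown away when the transaction ends, there is nothing
to invalidate, which is what makes this the simple option.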
we've implemented both options, so now it's possible to have all slots
cached on the Lisp side. as long as there are many more reads than writes
(which is common for web applications, for example), the strain on the
database is significantly decreased. all index queries (select object by
value or by range) still go through the database, but typically there are
many more slot queries than index queries.
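one possible way to keep a cache valid across transactions is sketched
below, in Python rather than Lisp. this is an assumption for illustration,
not a description of how db-postmodern tracks changes: the store keeps a
log of written keys, and each client asks for the new log entries when a
transaction begins and evicts those keys. Store, CachingClient,
changes_since and begin are all hypothetical names.

```python
class Store:
    """Stand-in for the database, with a log of which keys were written."""

    def __init__(self):
        self.data = {}
        self.change_log = []  # keys in the order they were written
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def write(self, key, value):
        self.round_trips += 1
        self.data[key] = value
        self.change_log.append(key)

    def changes_since(self, position):
        """Return keys written after `position` and the new log position."""
        self.round_trips += 1
        return self.change_log[position:], len(self.change_log)


class CachingClient:
    """Keeps cached slot values across transactions, evicting stale ones."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.log_position = 0

    def begin(self):
        # One round trip per transaction to learn what changed meanwhile.
        changed, self.log_position = self.store.changes_since(self.log_position)
        for key in changed:
            self.cache.pop(key, None)

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.store.write(key, value)
        self.cache[key] = value


store = Store()
a, b = CachingClient(store), CachingClient(store)
key = (1, "name")

a.begin()
a.write(key, "alice")
b.begin()
first = b.read(key)   # fetched from the store and cached by b
a.begin()
a.write(key, "bob")
stale = b.read(key)   # b has not started a new transaction yet
b.begin()             # b learns that the key changed and evicts it
fresh = b.read(key)
print(first, stale, fresh)
```

with many more reads than writes the change log stays short, so the extra
round trip at the start of each transaction is cheap compared with the slot
reads it saves. with a write-heavy load the evictions dominate and the
cache is mostly overhead.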
but now performance is not directly comparable with BDB's -- it depends
on usage patterns. for one kind of application caching might work very
well and we'll get performance that is better than BDB's. for another kind
of application caching could be just additional overhead, and postmodern's
performance will suck.
also keep in mind that the postmodern backend is indeed "younger" and can
still have some glitches.