[elephant-devel] BDB vs postmodern

Thu Feb 21 12:44:00 UTC 2008

On Thu, Feb 21, 2008 at 2:44 AM, lists at infoway.net <lists at infoway.net> wrote:
> Hi all,
>
>  I don't mean to start a war here or put any work down. However, I just
>  needed some clarification/direction into which way the data stores
>  work is going.

>  Since Postgres does allow for features such as replication,
>  clustering, and fail-over with multiple active simultaneous client
>  connections, does this mean that I could have multiple (separate) lisp
>  clients using elephant connecting to a separate Postgres cluster with
>  no concurrency issues?

It is going in a scalable direction, I hope. Since we run a web
startup in the consumer segment, we will hopefully need a scalable and
fast solution.
If the postmodern backend can't take it, we can make a new backend,
perhaps made in Lisp all the way.
If elephant itself can't take it we will need a new persistence
solution, but then we will provide a migration opportunity.

The postmodern backend is optimized for use from several processes,
for example a cluster of several web servers connected to a single
database engine. If you want to do this you have to choose between
SQL, perec, elephant-sql or elephant-postmodern. I experienced
performance problems with AllegroCache for this use case, but Franz
said that they were going to fix it, so it probably works as well.

When elephant-postmodern is not scalable enough, it is probably
possible to implement a solution that works with clusters of database
servers as well. I looked into it briefly at one point. The slightly
tricky part is that the postmodern backend creates a new table for
each btree, so the database schema evolves over time. Some of the
replication solutions I looked at had a slight problem with that. But
since we control when a new table is created, we have a spot where we
can trigger schema updates across the cluster, so it is doable.
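
To illustrate the idea of a single spot where table creation is
propagated, here is a minimal sketch in Python. It is not Elephant or
postmodern code: the Cluster class, the create_btree_table function and
the table layout are hypothetical, and in-memory SQLite databases stand
in for the Postgres servers of a real cluster.

```python
import sqlite3


class Cluster:
    """Toy stand-in for a set of replicated database servers.

    Hypothetical: each node is an in-memory SQLite database, not Postgres.
    """

    def __init__(self, n_nodes):
        self.nodes = [sqlite3.connect(":memory:") for _ in range(n_nodes)]

    def create_btree_table(self, name):
        """The one place where a new btree table is created.

        Because every new table goes through this function, it can apply
        the same schema change to every node in the cluster.
        """
        # DDL cannot take bound parameters, so validate the identifier.
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
        ddl = f"CREATE TABLE IF NOT EXISTS {name} (k BLOB PRIMARY KEY, v BLOB)"
        for node in self.nodes:
            node.execute(ddl)
            node.commit()

    def tables_on(self, index):
        """Return the sorted table names present on one node."""
        rows = self.nodes[index].execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]


cluster = Cluster(3)
cluster.create_btree_table("btree_1")
```

A real replication setup would also have to cope with nodes that are
down or lagging when the table is created; the sketch ignores that.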

About performance measurements: since there is no good "real world"
test case for performance, it is difficult to say. Simply timing the
test suite surely does not measure the effects of caching. But for a
single-user app BDB wins easily.

>  Now, the part that confused me was that of "complex
>  queries and joins". I was certainly under the impression that properly
>  designing your object model could be more beneficial than the relation
>  model and joins of the SQL world. It might have not been directly
>  mentioned on the Elephant project, but has certainly been mentioned by
>  the folks of AllegroCache and, in essence, the two projects seem to
>  have a lot in common.
It is a difficult question to answer. It depends on the use case. One
thing that is good about the sql model is that the query engine can
optimize the query for you, and it might know more of the runtime
environment than you do as a programmer when hand-optimizing the
queries. And in the usual implementation you save roundtrips of data
between server and client. On the other hand, direct pointer access
through an object database might give great performance in some cases.
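
The round-trip point can be made concrete with a small sketch in
Python. Everything in it is an assumption for illustration: the author
and post tables, the CountingDB wrapper, and the use of in-memory
SQLite instead of a networked server. It only counts how many queries
the client issues for a join done by hand versus a join done by the
query engine.

```python
import sqlite3


class CountingDB:
    """Wraps a connection and counts queries, as a proxy for round trips."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.round_trips = 0
        self.conn.executescript(
            """
            CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE post (id INTEGER PRIMARY KEY,
                               author_id INTEGER, title TEXT);
            INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
            INSERT INTO post VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
            """
        )

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()


def client_side_join(db):
    """Join by hand: one query for the authors, then one per author."""
    result = []
    for author_id, name in db.query("SELECT id, name FROM author ORDER BY id"):
        posts = db.query(
            "SELECT title FROM post WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        for (title,) in posts:
            result.append((name, title))
    return result


def server_side_join(db):
    """Let the query engine do the join in a single query."""
    return db.query(
        "SELECT a.name, p.title FROM author a "
        "JOIN post p ON p.author_id = a.id ORDER BY a.id, p.id"
    )


by_hand = CountingDB()
by_engine = CountingDB()
rows_by_hand = client_side_join(by_hand)
rows_by_engine = server_side_join(by_engine)
```

Both functions return the same rows, but the hand-written join issues
one query per author plus one, while the engine-side join issues one.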

I sometimes think about ways to marry the two approaches: creating a
relational query API for an object-oriented database. Or rather,
making a new type of relational database embedded in Lisp, with
classes as a datatype (domain), and being able to call methods on the
objects in queries. This is inspired by the writings of C. J. Date.
This model might even be truer to the mathematical relational model
than current SQL databases. But that is something for the future, and
to make it you would probably build it on top of a persistence layer
like elephant.

>  One of the things that AllegroCache has that I haven't seen in
>  Elephant is the "oid" keyword parameter to many of their functions. As
>  per their documentation: oid - if true return the object id instead of
>  the object. So, this could be used as a way to speed up certain
>  queries in Elephant, such as getting a COUNT without having to
>  generate the entire object.
Yes, it would be particularly useful for doing homemade joins.

If you could have a more advanced query API doing joins on the server,
that would save even more roundtrips.
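
As a rough illustration of why returning object ids helps, here is a
sketch in Python. It does not show the AllegroCache or Elephant API:
the ToyStore class, its oid keyword and the pickle-based storage are
all hypothetical. It only shows that a count based on ids does not
need to deserialize any object.

```python
import pickle


class ToyStore:
    """Hypothetical object store that keeps serialized objects by oid."""

    def __init__(self):
        self._rows = {}     # oid -> pickled object
        self._next_oid = 1
        self.loads = 0      # number of objects deserialized so far

    def add(self, obj):
        """Store an object and return its oid."""
        oid = self._next_oid
        self._next_oid += 1
        self._rows[oid] = pickle.dumps(obj)
        return oid

    def get(self, oid):
        """Deserialize and return the object with the given oid."""
        self.loads += 1
        return pickle.loads(self._rows[oid])

    def all_instances(self, oid=False):
        """Return all objects, or only their oids if oid is true."""
        if oid:
            return list(self._rows)
        return [self.get(o) for o in self._rows]


store = ToyStore()
for name in ("ann", "bob", "cid"):
    store.add({"name": name})

count = len(store.all_instances(oid=True))   # no object is deserialized
loads_after_count = store.loads
objects = store.all_instances()              # deserializes every object
```

The same ids could be intersected or matched on the client for a
homemade join, with only the matching objects fetched afterwards.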

/Henrik