[elephant-devel] Discussions

Sat May 17 03:22:46 UTC 2008

Robert said:

 > I'll go out on a limb and say that offering object-level caching is
 > the single biggest performance enhancement we make for the most
 > common cases.

A clarifying question: how did you ensure ACID properties in the DCM
scenario in the presence of threading?  Unless BDB or SQL knows about
the reads that you've done, you can't tell whether another transaction
has clobbered data that you are currently using, because the reads
come directly from memory.

For example, you can easily read the old 'balance' on the checking
account, do your computing while someone else writes that same object,
then write back an incorrect value.
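
The lost update described above can be reproduced in a few lines.  This
is only a sketch: a plain dict stands in for the store, the two
"threads" are interleaved by hand so the outcome is deterministic, and
none of the names below are Elephant or DCM API.

```python
# Hypothetical illustration only: a dict stands in for the database.
store = {"balance": 100}

def stale_withdraw(cached_balance, amount):
    # Writes a value computed from an earlier in-memory read,
    # without telling the store that the read ever happened.
    store["balance"] = cached_balance - amount

a_read = store["balance"]    # A reads 100 into its cache
b_read = store["balance"]    # B reads 100 into its cache
stale_withdraw(b_read, 30)   # B writes 70
stale_withdraw(a_read, 50)   # A writes 50, silently discarding B's write
lost_update_balance = store["balance"]
```

Two withdrawals totalling 80 leave a balance of 50 instead of 20,
because the store never learned that A's read had gone stale.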

Rucksack tracks changes by versioning objects in memory and rolling
back newer versions when older versions are committed.  This is a
copy-on-write model that keeps everything in memory during the
transaction, then writes the txn log and a version of the object to
disk, updating the in-memory 'valid' version as appropriate.
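
To make the versioning idea concrete, here is a rough sketch of a
copy-on-write transaction with a version check at commit.  It is not
Rucksack's code: the class names are invented, a list stands in for the
on-disk log, and the conflict policy (the later committer is rejected)
is an assumption chosen for simplicity.

```python
import copy

class Conflict(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self.valid = {}  # key -> (version, value): the in-memory 'valid' version
        self.log = []    # stand-in for the on-disk transaction log

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.base = {}      # key -> version observed at first read
        self.copies = {}    # key -> private copy (copy-on-write)
        self.dirty = set()  # keys this transaction has written

    def read(self, key):
        if key not in self.copies:
            version, value = self.store.valid.get(key, (0, None))
            self.base[key] = version
            self.copies[key] = copy.deepcopy(value)
        return self.copies[key]

    def write(self, key, value):
        self.read(key)  # make sure the base version is recorded
        self.copies[key] = value
        self.dirty.add(key)

    def commit(self):
        # Reject the commit if anything we read has since been replaced.
        for key, seen in self.base.items():
            if self.store.valid.get(key, (0, None))[0] != seen:
                raise Conflict(key)
        for key in self.dirty:
            entry = (self.base[key] + 1, self.copies[key])
            self.store.log.append((key,) + entry)  # log first
            self.store.valid[key] = entry          # then install as 'valid'

store = VersionedStore()
setup = store.begin()
setup.write("balance", 100)
setup.commit()

a, b = store.begin(), store.begin()
a.write("balance", a.read("balance") - 50)
b.write("balance", b.read("balance") - 30)
b.commit()
try:
    a.commit()
    a_committed = True
except Conflict:
    a_committed = False
```

Here A's commit fails instead of overwriting B's result, because the
store knows which version A's computation was based on.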

Leslie had a good related e-mail on this topic a few days ago:

> I don't know what the best decision might be here.
> But I have a use case that might help; it has the following
> features:
>
>  * I access the slots of two persistent objects.
>
>  * The number of the slots and the times requested
>    together produce very bad performance (think seconds)
>    even with PM txn caching (for comparison, BDB is about
>    three times faster)
>
>  * The environment is multi-threaded (web server), but the
>    slots won't be changed by any other process.
>
>  * Ideally the slots would be cached only for this one
>    function and the functions called by it (and only
>    per-invocation, i.e. slot caches get refreshed right at
>    the beginning of the function).
>
>  * This is currently the only place in my app where I would
>    need the performance advantages of slot caching. In all
>    other places ACID is highly preferred and speed is sufficient.
>
>  * The desired behaviour can be somewhat modelled by CLSQL's
>    OO interface:
>
>      - get the objects from the DB at the beginning
>
>      - work with those in-memory objects
>
>      - write back the values to the DB at the end of the process
>
>    The difference is that I don't want the whole object (other slot
>    values of it might be changed from outside!) but only a few
>    selected slots.

I think we can basically do this today.  A refresh command simply
reads from the DB for all cached slots (done in a transaction, this is
thread safe and avoids the aforementioned problem).  You then operate
on the cached data while nothing happens in the transaction, and at
the end you do a 'save' and those cached slots get written to disk.

I think this meets Leslie's use case, and I think it's an hour or two
to implement on top of what is already there.
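
The refresh/save cycle could look roughly like the following.  This is
a sketch under stated assumptions, not Elephant's interface: the
`SlotCache` class and its methods are hypothetical, a dict keyed by
(object id, slot) stands in for the DB, and a lock stands in for a
transaction.

```python
import threading

class SlotCache:
    """Caches a chosen subset of one object's slots (hypothetical helper)."""

    def __init__(self, db, lock, oid, slots):
        self.db = db        # dict keyed by (oid, slot); stands in for the store
        self.lock = lock    # stands in for a database transaction
        self.oid = oid
        self.slots = list(slots)
        self.values = {}
        self.dirty = set()

    def refresh(self):
        # Read every cached slot in one "transaction".
        with self.lock:
            self.values = {s: self.db[(self.oid, s)] for s in self.slots}
        self.dirty.clear()

    def get(self, slot):
        return self.values[slot]

    def set(self, slot, value):
        if slot not in self.values:
            raise KeyError(slot)
        self.values[slot] = value
        self.dirty.add(slot)

    def save(self):
        # Write back only the cached slots that were modified.
        with self.lock:
            for s in self.dirty:
                self.db[(self.oid, s)] = self.values[s]
        self.dirty.clear()

db = {("acct", "balance"): 100, ("acct", "owner"): "leslie"}
cache = SlotCache(db, threading.Lock(), "acct", ["balance"])
cache.refresh()
cache.set("balance", cache.get("balance") - 30)
db[("acct", "owner")] = "ian"  # an uncached slot changed from outside
cache.save()
```

Only the selected slot is written back, so the outside change to the
other slot survives.  The sketch relies on the stated condition that
nobody else writes the cached slots between refresh and save.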

 > However, I don't know if this is more important than a native-lisp
 > backend, or a query-language.  For the next year at least I am
 > working at a job rather than working on my lisp application; and
 > even then I was happy with the performance I was getting out of DCM.
 > So I personally don't have performance need that drives anything.
 > I wish I knew how many new users we would have from better
 > performance vs. a native-lisp backend vs. a query-language, or what
 > our existing users would prefer.

My two dollars on this topic is that the most interesting thing to  
improve adoption and overall utility is a lisp-only backend to get  
going with.  The most interesting value to the current users,  
including myself, is a query system that manages and abstracts some of  
the performance query hacks that today you have to write yourself in  
lisp, often over and over.

I think of the query system, by the way, as a DSL (domain-specific
language) extension of Lisp, not a SQL syntax.  So it's not an
either/or; it's exactly what Lisp was meant to do: enable linguistic
abstraction that makes thinking about a given problem easier.  That's
what I think of when I hear 'lisp as the query language'.

Rucksack strikes me as the best way to start on the lisp-only front,  
because so much is there.  It's a non-trivial port/adaptation so  
someone needs to be willing to put in a week or two (at least) of  
serious effort.

I think we may also be able to change it so that it only writes a
transaction log and doesn't write the underlying DB unless something
is flushed from the cache.  What I like about Rucksack for a more
prevalence-style model (and maybe I'm misreading this and it's not
flushing objects to disk on each write) is that it already implements
versioning as its transaction model, which gets around the performance
problems of fine-grained locking.  If we add in Robert's DCM ideas
about having a cache instead of the whole DB in memory, then we could
imagine writing flushed objects to disk, effectively syncing the
in-memory objects to disk incrementally rather than having to do a
full snapshot every so often.
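
A rough sketch of that log-plus-lazy-flush idea, under loud
assumptions: the `LoggedCache` class is invented for illustration, a
list stands in for the transaction log, a dict stands in for the
underlying DB, eviction is simple LRU, and the capacity is at least 1.
Nothing here is Rucksack or DCM code.

```python
from collections import OrderedDict

class LoggedCache:
    def __init__(self, capacity):
        self.capacity = capacity    # assumed >= 1
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.dirty = set()          # cached keys not yet written to the DB
        self.log = []               # stand-in for the append-only txn log
        self.disk = {}              # stand-in for the underlying DB

    def write(self, key, value):
        self.log.append((key, value))  # every write hits the log
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self._evict()

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.disk[key]  # load on a cache miss
            self._evict()
        self.cache.move_to_end(key)
        return self.cache[key]

    def _evict(self):
        # The DB is written only when a dirty object leaves the cache.
        while len(self.cache) > self.capacity:
            key, value = self.cache.popitem(last=False)
            if key in self.dirty:
                self.disk[key] = value
                self.dirty.discard(key)

c = LoggedCache(capacity=2)
c.write("a", 1)
c.write("b", 2)
disk_before_eviction = dict(c.disk)
c.write("c", 3)  # evicts "a", which is flushed to the DB
disk_after_eviction = dict(c.disk)
```

The DB stays untouched until the cache overflows, and from then on it
is brought up to date one evicted object at a time, with the log
covering whatever is still only in memory.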

Regards,
Ian
