[elephant-devel] Discussions

Sat May 17 20:23:19 UTC 2008

On Fri, 2008-05-16 at 23:22 -0400, Ian Eslick wrote:
> Robert said:
> 
>  > I'll go out on a limb and say that offering object-level caching is
>  > the single biggest performance enhancement we make for the most
>  > common cases.
> 
> A clarifying question.  How did you ensure ACID properties in the DCM  
> scenario in the presence of threading?  Without letting BDB or sql  
> know about the reads that you've done, you can't tell if a prior  
> transaction has clobbered data that you are currently using because  
> the reads are directly from memory.
> 
> e.g. You can easily read the old 'balance' on the checking account, do  
> your computing while someone else has written that same object, then  
> write back an incorrect value.

I'm assuming what I consider to be a standard 3-tiered model, consisting
of a Presentation Layer, a Business Logic Layer, and a Persistence
Layer.  Following the Law of Demeter, the Presentation Layer is not
allowed to talk to the Persistence Layer.

An operation like making a journal entry in an accounting system, or
more simply even just decrementing an account, is implemented by a
method on the Business Logic Layer or Domain Object Layer (a "Manager"
in the DCM model).  Such a method is normally called a "Business Rule."
Generally, Business Rules tend to demand atomicity, be loggable, and
change the underlying store by calls to the Persistence Layer.  How the
Domain Object Layer enforces ACID rules is its own business.  In
practice, this would typically be done by obtaining locks on the Domain
Object(s) that are being changed.  Since in this model there is
exactly one official copy of the object in memory at any time, this is
straightforward.  If I'm decrementing a counter, the code in the Domain
Object Layer obtains a lock on the counter.  If I am debiting one account
and crediting another in a financial transaction, I obtain a lock on
both accounts.  If I'm doing something more complicated, I might have to
lock the entire General Ledger, but in general that should not be
necessary.
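
To make the locking concrete, here is a minimal sketch, written in Python
rather than Lisp and not taken from DCM.  The names Account,
AccountManager and transfer are made up for illustration, and the call to
the Persistence Layer is only indicated by a comment.  It assumes the
Manager holds the single in-memory copy of each account and that every
change goes through it.

```python
import threading


class Account:
    """A domain object; the manager holds the only in-memory copy."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


class AccountManager:
    """Hypothetical 'Manager': the only code allowed to change accounts."""

    def __init__(self):
        self._accounts = {}

    def open_account(self, account_id, balance=0):
        self._accounts[account_id] = Account(account_id, balance)

    def balance(self, account_id):
        account = self._accounts[account_id]
        with account.lock:
            return account.balance

    def transfer(self, from_id, to_id, amount):
        """Business rule: debit one account and credit another atomically."""
        if from_id == to_id:
            raise ValueError("cannot transfer to the same account")
        source = self._accounts[from_id]
        target = self._accounts[to_id]
        # Take the two locks in a fixed order (by id) so that two opposite
        # transfers running at the same time cannot deadlock.
        first, second = sorted((source, target), key=lambda a: a.account_id)
        with first.lock, second.lock:
            if source.balance < amount:
                raise ValueError("insufficient funds")
            source.balance -= amount
            target.balance += amount
            # A real manager would write both accounts through the
            # Persistence Layer here, before releasing the locks.
```

Because the balance is read and written while both locks are held, the
stale-balance problem Ian describes cannot occur as long as nothing
bypasses the Manager.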

In this model, the Persistence Layer doesn't help you with ACID
properties.  This works if you assume that the Domain Object Layer is the
ONLY thing that can touch the database.  From an organizational point of
view, the Manager is the central point of contact for a collection of
objects (like accounts), and nobody is allowed to permanently manipulate
an account except via operations defined on the Manager.  The Manager
can therefore be responsible for concurrency control.

Concurrency at the domain object level could be done via
versioning/journaling as in Rucksack, or via locks.  I have never used
the former, but it sounds like a great idea.
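
For comparison with locks, here is a rough sketch of the versioning
alternative in its simplest optimistic form, again in Python.  This is a
generic version check and is not meant to describe how Rucksack actually
works; VersionedCell, ConflictError and update are names invented for the
example.  A writer remembers the version it read and its commit is
refused if someone else has committed in between.

```python
import threading


class ConflictError(Exception):
    """Raised when the object changed since it was read."""


class VersionedCell:
    """Holds one value plus a version counter (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        # Guards only the short read and commit steps, not the computation.
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value, self._version

    def commit(self, new_value, read_version):
        with self._guard:
            if self._version != read_version:
                raise ConflictError("stale read")
            self._value = new_value
            self._version += 1


def update(cell, fn, retries=100):
    """Apply fn to the current value, retrying if another writer got in first."""
    for _ in range(retries):
        value, version = cell.read()
        try:
            cell.commit(fn(value), version)
            return
        except ConflictError:
            continue
    raise ConflictError("gave up after repeated conflicts")
```

The trade-off against locks is that no lock is held while the business
rule computes; the cost is that a conflicting writer has to redo its
work.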

> 
> Rucksack tracks changes by versioning objects in memory and rolling  
> back newer versions when older versions are committed.  This is a
> copy-on-write model which keeps everything in memory during the  
> transaction, but then writes the txn log and a version of the object  
> to disk, updating the in-memory 'valid' version as appropriate.
> 
> Leslie had a good related e-mail on this topic a few days ago:
> 
> > I don't know what the best decision might be here.
> > But I have a use case that might help; it has the following
> > features:
> >
> >  * I access the slots of two persistent objects.
> >
> >  * The number of the slots and the times requested
> >    together produce very bad performance (think seconds)
> >    even with PM txn caching (for comparison, BDB is about
> >    three times faster)
> >
> >  * The environment is multi-threaded (web server), but the
> >    slots won't be changed by any other process.
> >
> >  * Ideally the slots would be cached only for this one
> >    function and the functions called by it (and only
> >    per-invocation, i.e. slot caches get refreshed right at
> >    the beginning of the function).
> >
> >  * This is currently the only place in my app where I would
> >    need the performance advantages of slot caching. In all
> >    other places ACID is highly preferred and speed is sufficient.
> >
> >  * The desired behaviour can be somewhat modelled by CLSQL's
> >    OO interface:
> >
> >      - get the objects from the DB at the beginning
> >
> >      - work with those in-memory objects
> >
> >      - write back the values to the DB at the end of the process
> >
> >    The difference is that I don't want the whole object (other slot
> >    values of it might be changed from outside!) but only a few
> >    selected slots.
> 
> I think we can basically do this today.  A refresh command simply  
> reads from the DB for all cached slots (in a transaction this is  
> thread safe and avoids the aforementioned problem).  You operate on  
> the cached data, nothing happens in the transaction, at the end you do  
> a 'save' and those cached slots get written to disk.
> I think this meets Leslie's use case and I think it's an hour or two  
> to implement on top of what is already there.
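
The refresh/save cycle described above could look roughly like the
following Python sketch.  It is not Elephant's API: SlotCache, the
dictionary used as the store, and the lock standing in for a transaction
are all assumptions made for illustration.  The point is that only the
selected slots are read at refresh time and written at save time, so
other slots changed from outside are left alone.

```python
import threading


class SlotCache:
    """Caches a chosen subset of one object's slots (hypothetical helper)."""

    def __init__(self, store, txn_lock, object_id, slot_names):
        self._store = store          # stand-in store: {object_id: {slot: value}}
        self._txn_lock = txn_lock    # stands in for a database transaction
        self._object_id = object_id
        self._slot_names = tuple(slot_names)
        self._slots = {}

    def refresh(self):
        """Read the selected slots from the store in one 'transaction'."""
        with self._txn_lock:
            record = self._store[self._object_id]
            self._slots = {name: record[name] for name in self._slot_names}

    def get(self, name):
        return self._slots[name]

    def set(self, name, value):
        if name not in self._slot_names:
            raise KeyError(name)
        self._slots[name] = value    # in memory only; the store is untouched

    def save(self):
        """Write back only the cached slots in one 'transaction'."""
        with self._txn_lock:
            self._store[self._object_id].update(self._slots)
```

Note that this sketch does nothing to detect another writer changing the
same cached slots between refresh and save; it relies on the assumption
in the use case that nobody else changes them.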
> 
>  > However, I don't know if this is more important than a native-lisp
>  > backend, or a query-language.  For the next year at least I am
>  > working at a job rather than working on my lisp application; and even
>  > then I was happy with the performance I was getting out of DCM.  So I
>  > personally don't have a performance need that drives anything.  I wish
>  > I knew how many new users we would have from better performance vs. a
>  > native-lisp backend vs. a query-language, or what our existing users
>  > would prefer.
> 
> My two dollars on this topic is that the most interesting thing to  
> improve adoption and overall utility is a lisp-only backend to get  
> going with.  The most interesting value to the current users,  
> including myself, is a query system that manages and abstracts some of  
> the performance query hacks that today you have to write yourself in  
> lisp, often over and over.

Good. I'll support whatever gets to $5 first.  :->. 

> 
> I think of the query system, by the way, as a DSL (domain specific  
> language) extension of lisp, not a SQL syntax.  So it's not an either  
> or, it's exactly what Lisp was meant to do, enable linguistic  
> abstraction that makes thinking about a given problem easier.  That's  
> what I think when I hear 'lisp as the query language'.

Right.  LISP generally does that by the addition of symbols to the
language, not the taking away of them.  So the full power of LISP
remains available, in addition to new macros/functions etc. that are
convenient.

> 
> Rucksack strikes me as the best way to start on the lisp-only front,  
> because so much is there.  It's a non-trivial port/adaptation so  
> someone needs to be willing to put in a week or two (at least) of  
> serious effort.

I agree with this.

> 
> I think we may also be able to change it so that it only writes a  
> transaction log and doesn't write the underlying DB unless something  
> is flushed from the cache.  What I like about Rucksack for a more  
> prevalence style model (and maybe I'm misreading this and it's not  
> flushing objects to disk on each write) is that it already implements  
> versioning as its transaction model, which gets around fine-grained  
> locking performance problems.  If we add in Robert's DCM ideas about  
> having a cache instead of the whole DB in memory, then we could  
> imagine writing flushed objects to disk and effectively incrementally  
> syncing the memory objects to disk rather than having to do a full  
> snapshot every so often.

This does sound exciting.  I guess however that Rucksack offers two
quite different things:  A pre-existing Btree implementation, and an
in-memory versioning concurrency control model.  I assume that you are
mentioning these together because the Rucksack Btree by itself would not
give us an ACID transaction model, as we currently have with the other
backends.

> 
> Regards,
> Ian