[elephant-devel] Re: Postmodern, Act II

Mon May 5 13:15:43 UTC 2008

 LPP> you explained that global-sync cache mode only works within a txn.
 LPP> I thought that per-transaction cache mode does this, and that global
 LPP> sync extends this behaviour so transactions/threads can share the
 LPP> cache?
 LPP> If not, what's the difference between them?

both modes rely on transactional semantics -- the assumption that
transactions are isolated and that changes from one transaction do not (and
should not) propagate into another.
and both use transaction boundaries -- begin/commit -- for their
initialization/finalization.

in per-transaction mode the cache is created at the beginning of a
transaction and is abandoned on commit.
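
a minimal python sketch of that lifecycle, not elephant's actual code: the
class name, the dict-like `backend` and the `backend_reads` counter are all
made up for illustration.

```python
class PerTxnCache:
    """Read cache that lives exactly as long as one transaction."""

    def __init__(self, backend):
        self.backend = backend    # hypothetical dict-like store
        self.cache = None         # no cache outside a transaction
        self.backend_reads = 0    # counts reads that reached the backend

    def begin(self):
        self.cache = {}           # fresh, empty cache for this transaction

    def read(self, key):
        if key not in self.cache:
            self.backend_reads += 1
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def commit(self):
        self.cache = None         # cache is abandoned, nothing is kept
```

repeated reads of the same key inside one transaction hit the backend once;
the next transaction starts cold again.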

global-sync mode extends this by reusing the cache between transactions.
when a transaction starts, it checks whether there are existing cache
instances in a pool and takes one if it exists, or creates a new one.
if an existing cache instance is used, it gets synchronized on first use --
and only once within a single transaction. synchronization actually means
reading what has changed and wiping those entries; however, it is somewhat
complicated to pull only the changes made _since_ the last update.
when the transaction commits successfully, the cache is returned to the pool
and can be re-used later.
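
the steps above can be sketched roughly like this in python. it is only an
illustration under assumptions: `Store`, `SyncCache`, `Transaction`, the
append-only `write_log` and the "position in the log" trick for finding
changes since the last update are all hypothetical, and the sketch does not
model real isolation or aborted transactions.

```python
class Store:
    """Hypothetical backend: key/value data plus a log of written keys."""

    def __init__(self):
        self.data = {}
        self.write_log = []       # every write appends its key here

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        self.write_log.append(key)

    def changes_since(self, position):
        # keys written after `position`, plus the new end of the log
        return self.write_log[position:], len(self.write_log)


class SyncCache:
    def __init__(self, log_position):
        self.entries = {}
        self.log_position = log_position  # how much of the log we have seen
        self.synced = True                # a new, empty cache has nothing stale


class Transaction:
    def __init__(self, store, pool):
        self.store, self.pool = store, pool
        if pool:
            # begin: reuse a pooled cache; it must be synchronized before use
            self.cache = pool.pop()
            self.cache.synced = False
        else:
            self.cache = SyncCache(len(store.write_log))

    def _sync(self):
        # runs on first use, and only once per transaction
        if not self.cache.synced:
            changed, end = self.store.changes_since(self.cache.log_position)
            for key in changed:
                self.cache.entries.pop(key, None)   # wipe stale entries
            self.cache.log_position = end
            self.cache.synced = True

    def read(self, key):
        self._sync()
        if key not in self.cache.entries:
            self.cache.entries[key] = self.store.get(key)
        return self.cache.entries[key]

    def write(self, key, value):
        self._sync()
        self.store.put(key, value)
        self.cache.entries[key] = value

    def commit(self):
        self.pool.append(self.cache)   # successful commit: back to the pool
```

a cache that sat in the pool while another transaction wrote key "a" drops
its stale entry for "a" during `_sync` and re-reads it from the store.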

sidenote: "global-sync" actually means "globally synchronized", which sort
of means that all writes in all processes/threads are tracked and cache
entries are invalidated (synchronized) accordingly. that is what allows us
to re-use the cache between transactions.

so, both modes only make sense when there is a considerable number of
database reads within each transaction: per-transaction mode for obvious
reasons, and sync cache because synchronization (which must be done inside
each transaction) requires at least a few database commands, and it is not
worth synchronizing if that is only going to save a single read command.
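
as back-of-the-envelope arithmetic, in python: the cost model below (every
command costs the same, a known hit rate) and all the numbers are
assumptions for illustration, not measurements.

```python
def sync_cache_pays_off(reads_in_txn, expected_hit_rate, sync_commands):
    """True if the reads saved by cache hits outnumber the sync commands."""
    saved_reads = reads_in_txn * expected_hit_rate
    return saved_reads > sync_commands


# one read per transaction can never win back a 3-command synchronization
single_read = sync_cache_pays_off(1, 1.0, 3)

# 50 reads with 80% hits save about 40 commands for the price of 3
many_reads = sync_cache_pays_off(50, 0.8, 3)
```

here `single_read` is False and `many_reads` is True.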

interaction of global-sync cache and threads is another question --
currently it does not share data among concurrently running threads. if you
have 10 threads running in parallel, you'll have 10 independent cache
instances.
the cache pool is global though, so if you create a new thread for each
event, it won't be a problem -- the thread will take the first cache in the
pool if one is available.
it might seem that this behaviour is suboptimal -- sharing a cache among
threads would save space.
but it is a tradeoff -- a shared cache would require some sort of locking,
and that might reduce performance. so which scheme is preferred actually
depends on database usage patterns.
as cache sharing is considerably more complex to implement, only the
isolated scheme is implemented for now.
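
to illustrate the two thread patterns, here is a small python sketch. the
names (`CachePool`, `run_concurrently`, `run_one_thread_per_event`) are
hypothetical and a cache is just a dict here; note that only the pool itself
is locked, the caches are never shared between running threads.

```python
import threading


class CachePool:
    def __init__(self):
        self._free = []
        self._lock = threading.Lock()   # guards the pool, not the caches
        self.created = 0                # how many cache instances exist

    def acquire(self):
        with self._lock:
            if self._free:
                return self._free.pop()
            self.created += 1
            return {}                   # a cache is just a dict in this sketch

    def release(self, cache):
        with self._lock:
            self._free.append(cache)


def run_concurrently(pool, n):
    """n threads that all hold a cache at the same moment."""
    barrier = threading.Barrier(n)

    def worker():
        cache = pool.acquire()
        barrier.wait()                  # nobody releases until all n acquired
        pool.release(cache)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_one_thread_per_event(pool, events):
    """A new thread per event, one after another."""
    def worker():
        cache = pool.acquire()
        pool.release(cache)

    for _ in range(events):
        t = threading.Thread(target=worker)
        t.start()
        t.join()
```

with 10 parallel threads the pool ends up with 10 cache instances, while 10
short-lived threads run one after another keep reusing a single one.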