[elephant-devel] Re: Postmodern, Act II

Alex Mizrahi killerstorm at newmail.ru
Sat May 3 11:31:28 UTC 2008


 ??>> but if you do not start your transactions explicitly, enclosing as
 ??>> many operations as possible, global-sync-cache absolutely makes no
 ??>> sense -- it takes more effort to synchronize changes than to actually
 ??>> load value from database, if that's just a single value. so, maybe, if
 ??>> cache is set into global sync mode, it should signal error if there are
 ??>> no explicit transactions -- because that would be misuse of
 ??>> global sync cache, leading to significant overhead.

 LPP> Can you explain this in a bit more detail?

"global sync cache" works by tracking changes made to btrees in the 
database -- each write to a btree is also written into the update_log table.
then, at the start of each transaction (or more precisely, before the first 
actual btree read/write operation) the cache gets synchronized -- basically 
it pulls the log of all changes since the last update, and invalidates cache 
entries according to what it has read from the DB. additionally it does some 
bookkeeping for change tracking.
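
to make the mechanism above concrete, here is a toy model in Python. it is only a sketch, not the backend's actual code: the class names (UpdateLog, Database, SyncCache) and the in-memory list standing in for the update_log table are assumptions made up for illustration, and the extra bookkeeping for change tracking is left out.

```python
class UpdateLog:
    """Hypothetical stand-in for the update_log table: append-only (btree_id, key) entries."""

    def __init__(self):
        self.entries = []

    def append(self, btree_id, key):
        self.entries.append((btree_id, key))

    def since(self, position):
        # everything logged after the given position
        return self.entries[position:]


class Database:
    """Toy shared store; every btree write is also recorded in the update log."""

    def __init__(self):
        self.data = {}
        self.log = UpdateLog()

    def write(self, btree_id, key, value):
        self.data[(btree_id, key)] = value
        self.log.append(btree_id, key)

    def read(self, btree_id, key):
        return self.data.get((btree_id, key))


class SyncCache:
    """Per-process cache, synchronized once per transaction before the first btree access."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.log_position = 0   # how much of the log this cache has already seen
        self.synced = False

    def begin(self):
        # starting a transaction only marks the cache as needing a sync
        self.synced = False

    def _sync(self):
        if self.synced:
            return
        # pull all changes since the last sync and invalidate matching entries
        changes = self.db.log.since(self.log_position)
        for entry in changes:
            self.cache.pop(entry, None)
        self.log_position += len(changes)
        self.synced = True

    def read(self, btree_id, key):
        self._sync()
        k = (btree_id, key)
        if k not in self.cache:
            self.cache[k] = self.db.read(btree_id, key)
        return self.cache[k]

    def write(self, btree_id, key, value):
        self._sync()
        self.db.write(btree_id, key, value)
        self.cache[(btree_id, key)] = value
```

in this model two SyncCache instances sharing one Database see each other's writes only after their next begin(), and the sync cost is paid once per transaction no matter how many reads follow.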

thus, global sync cache only makes sense if you do many (hundreds of) database 
reads in each transaction. if that's not your situation, don't use it :)

 ??>> or you think it makes sense to allow such behaviour? it might make
 ??>> sense in REPL, for example..

 LPP> I put transactions only in an explicit transaction block if it
 LPP> makes sense to me, i.e. if there are several successive operations.

 LPP> Why would I put a single operation into a WITH-TRANSACTION block?
 LPP> It clutters the code.

this cache mode (and the postmodern backend in general) is oriented toward 
webserver-like workloads -- each web request is always wrapped in a 
transaction. if a request does no DB activity, that's OK -- the overhead of 
starting a txn is not that significant on the scale of typical HTTP request 
time. but many requests read lots of values from the database -- on the scale 
of thousands -- and the sync cache makes a big difference in this case.

even without the cache, there is considerable overhead when doing a single 
read outside a transaction -- at minimum, postmodern will do BEGIN and COMMIT, 
which require roundtrips to the server, so we have something like 3x overhead 
here.
if we were optimizing for standalone read statements, we could try relying 
on postgresql implicit transactions -- but that would significantly 
complicate the logic, so we don't do this.
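
the roundtrip arithmetic can be sketched in Python like this. it is a counting model only: CountingConnection, the helper names and the placeholder SELECT are all made up for illustration, nothing is sent to a real server, and it assumes one roundtrip per statement.

```python
class CountingConnection:
    """Hypothetical stand-in for a database connection; counts one roundtrip per statement."""

    def __init__(self):
        self.roundtrips = 0

    def execute(self, statement):
        # nothing is sent anywhere; we only count what would go over the wire
        self.roundtrips += 1


def read_in_one_transaction(conn, keys):
    """Wrap all reads in a single explicit BEGIN ... COMMIT."""
    conn.execute("BEGIN")
    for key in keys:
        conn.execute(f"SELECT value FROM btree WHERE key = {key!r}")  # placeholder query
    conn.execute("COMMIT")


def read_standalone(conn, keys):
    """Each read gets its own BEGIN/COMMIT, as happens outside an explicit transaction."""
    for key in keys:
        read_in_one_transaction(conn, [key])


def roundtrips(reader, n_reads):
    """Count the roundtrips a given reading strategy needs for n_reads reads."""
    conn = CountingConnection()
    reader(conn, range(n_reads))
    return conn.roundtrips
```

under these assumptions 100 standalone reads cost 300 roundtrips, while the same 100 reads inside one transaction cost 102.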

but while BEGIN/COMMIT is an unavoidable evil, cache synchronization overhead 
can be avoided if not needed, so i thought it's worth giving some kind of 
warning in case people are using the backend in a sub-optimal mode.



