[elephant-devel] Object caching

Wed Jun 11 16:35:59 UTC 2008

I'm dissatisfied with the current approach to object caching in  
elephant-unstable.

Issues:
- Not thread-safe
- Breaks most transactional guarantees
- Adds per-instance slot overhead to maintain the cache mode
- Incompatible with the indexing API
- No standard usage model with well-understood implications

This leads me to some strong statements about the features:

- The :cache option simply reserves space during instance allocation  
to cache values.
- Cached objects can never be considered thread-safe or transactional
(isolated/atomic)
- Directly setting the cache mode of an object is an advanced feature  
intended to be used by a higher level API

I think that something lightweight along the lines of Robert's DCM or
my snapshot-set model might provide guidance on a higher-level model
that exploits caching (i.e., caching is used in some larger context,
such as a 'with-cached-objects' macro or a check-in/check-out protocol
with some guard object as in DCM).

I can think of a couple of primary usage models:

1) Check-in/check-out a set of objects on which a thread will perform  
repeated operations.

    This checkout should be guarded and implicitly turns on a caching
policy (save on check-in, write-through for state durability, etc.).

2) A read-only pool of objects shared by many threads (in-memory  
objects w/ on-disk indices)

3) There is a variation on #2 which allows for updates to the pool via
a single writer.  The user is responsible for using the single-writer
API to avoid conflicts.  This requires either accepting that updates to
cached objects are not atomic, or turning caching off during updates so
that they are (there may still be race conditions that violate
isolation/atomicity here).
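Usage models 2 and 3 can be sketched as a copy-on-write pool: readers take no locks, and the single writer writes to the store and then publishes a new in-memory snapshot. This is a hypothetical Python sketch, not Elephant's Lisp API; the `SharedPool` class and its methods are invented for illustration, and the "store" is just a dict. Note that the store writes and the snapshot swap are not one atomic step, which is exactly the isolation/atomicity gap described above.

```python
import threading
from types import MappingProxyType

class SharedPool:
    """In-memory pool shared by many reader threads, updated by one writer.

    `store` is a dict-like stand-in for the on-disk DB (hypothetical).
    """
    def __init__(self, store):
        self._store = store
        self._snapshot = dict(store)     # never mutated after publication
        self._writer = threading.Lock()  # enforces the single-writer rule

    def get(self, key):
        # Lock-free read from whichever snapshot is current.
        return self._snapshot.get(key)

    def snapshot(self):
        # Read-only view of one snapshot, for consistent multi-key reads.
        return MappingProxyType(self._snapshot)

    def update(self, changes):
        with self._writer:
            # Write to the store first; a crash here leaves the store
            # partially updated (these updates are not atomic).
            for key, value in changes.items():
                self._store[key] = value
            new = dict(self._snapshot)
            new.update(changes)
            self._snapshot = new         # publish with one reference swap
```

Each individual `get` sees either the old or the new snapshot, but two successive `get` calls can straddle an update; a reader that needs a consistent view should call `snapshot()` once and read everything from it.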

To support any of these cases we need something beyond what is already
there; for example, perhaps a model that provides cheap mutexes via the
DB rather than only in memory?

For example, a web session is rendering a set of objects to a client
and updating the client on each request.  Rather than hitting the DB
on every web transaction, we want to allow the session to run for
a while, keeping that state in memory, and then commit the changes when
a 'commit' button is clicked.  Alternatively, we might want a
write-through policy that tracks changes and only hits the DB when the
user has actually changed something.
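The two policies in the web-session example can be illustrated with a small guarded check-out object. This is a minimal Python sketch, not Elephant's Lisp API: `Store`, `Checkout`, the policy names, and the per-object lock table are all hypothetical stand-ins, and the store is a plain dict rather than a real DB.

```python
import threading

class Store:
    """Dict-backed stand-in for the persistent store (hypothetical)."""
    def __init__(self):
        self.data = {}    # (oid, slot) -> value
        self.writes = 0   # number of writes that reached the store

    def read(self, oid, slot):
        return self.data.get((oid, slot))

    def write(self, oid, slot, value):
        self.writes += 1
        self.data[(oid, slot)] = value


class Checkout:
    """Guarded check-out of a fixed set of objects with a caching policy.

    "on-checkin":    buffer changes in memory, save them on check-in
    "write-through": update the cache and the store on every real change
    """
    _locks = {}                      # oid -> lock guarding that object
    _locks_guard = threading.Lock()  # protects the _locks table itself

    def __init__(self, store, oids, policy="on-checkin"):
        if policy not in ("on-checkin", "write-through"):
            raise ValueError(f"unknown policy: {policy}")
        self.store, self.policy = store, policy
        self.oids = sorted(set(oids))  # fixed order avoids lock-order deadlocks
        self.cache, self.dirty = {}, set()

    def __enter__(self):
        with Checkout._locks_guard:
            self._held = [Checkout._locks.setdefault(o, threading.Lock())
                          for o in self.oids]
        for lock in self._held:
            lock.acquire()           # blocks while another thread has it out
        return self

    def get(self, oid, slot):
        self._check(oid)
        if (oid, slot) not in self.cache:  # first read hits the store
            self.cache[(oid, slot)] = self.store.read(oid, slot)
        return self.cache[(oid, slot)]

    def set(self, oid, slot, value):
        if self.get(oid, slot) == value:
            return                   # unchanged: never touch the store
        self.cache[(oid, slot)] = value
        if self.policy == "write-through":
            self.store.write(oid, slot, value)
        else:
            self.dirty.add((oid, slot))

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:     # on error, buffered changes are dropped
                for oid, slot in self.dirty:
                    self.store.write(oid, slot, self.cache[(oid, slot)])
        finally:
            for lock in reversed(self._held):
                lock.release()
        return False

    def _check(self, oid):
        if oid not in self.oids:
            raise KeyError(f"object {oid!r} is not checked out")
```

In the web-session example, "on-checkin" corresponds to saving when the user clicks 'commit', and "write-through" to saving each real change as it happens; in both cases the guard keeps other threads from checking out the same objects in the meantime.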

In my application a user might be editing a questionnaire and the UI  
provides an explicit indication that the questionnaire is being  
edited.  A session 'checks out' the questionnaire.  The questionnaire  
has an association with its root questions, which have an association  
with their sub-questions, etc.  Ideally this whole tree would be  
checked out; either based on a user-provided function or some  
declaration that defines the checkout set.  I'd use a write-through
policy so that work is never lost, and have the application layer
implement any needed undo/reset functionality.
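The "user-provided function" option for defining the checkout set could be as simple as a traversal from the root object. A hypothetical Python sketch follows; `Questionnaire`, `Question`, `checkout_set`, and `questionnaire_related` are illustrative names, not part of Elephant.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can go in a set
class Question:
    text: str
    subquestions: list = field(default_factory=list)

@dataclass(eq=False)
class Questionnaire:
    title: str
    questions: list = field(default_factory=list)

def questionnaire_related(obj):
    # User-provided rule: questionnaire -> root questions -> sub-questions.
    if isinstance(obj, Questionnaire):
        return obj.questions
    return obj.subquestions

def checkout_set(root, related):
    """Return root plus every object reachable through `related`, once each."""
    seen, stack, result = set(), [root], []
    while stack:
        obj = stack.pop()
        if obj in seen:          # shared sub-trees are only collected once
            continue
        seen.add(obj)
        result.append(obj)
        stack.extend(related(obj))
    return result
```

A declarative alternative would list, per class, which slots to follow; either way, the resulting list is the set a check-out would guard and cache.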

Any other use cases?

In short, there are a lot of ways to get into trouble with this
mechanism, so I think it behooves us to spec out a reasonably robust API
for this facility that gives people canned ways to gain the performance
benefits without putting too many holes in their feet.

Thanks,
Ian