[elephant-devel] Object caching
Ian Eslick
eslick at media.mit.edu
Wed Jun 11 16:35:59 UTC 2008
I'm dissatisfied with the current approach to object caching in
elephant-unstable.
Issues:
- Not thread safe
- Breaks most transactional guarantees
- Adds per-instance slot overhead to maintain the cache mode
- Incompatible with indexing API
- No standard usage model with well understood implications
This leads me to some strong statements about the features:
- The :cache option simply reserves space during instance allocation
to cache values.
- Cached objects can never be considered thread safe or transactional
(isolated/atomic)
- Directly setting the cache mode of an object is an advanced feature
intended to be used by a higher level API
I think that something lightweight along the lines of Robert's DCM or
my snapshot-set model might provide guidance on a higher level model
that exploits caching. (i.e. caching is used in some larger context
like a 'with-cached-objects' macro or a check-in/check-out protocol
with some guard object as in DCM).
I can think of a couple of primary usage models:
1) Check-in/check-out a set of objects on which a thread will perform
repeated operations.
This checkout should be guarded and implicitly turns on a caching
policy (save on check-in, write-through for state durability, etc.).
2) A read-only pool of objects shared by many threads (in-memory
objects w/ on-disk indices)
3) There is a variation on #2 which allows for updates to the pool via
a single writer. The user is responsible for using the single-writer
API to avoid conflicts. This requires either accepting that updates to
cached objects are not atomic, or turning caching off during updates so
that they are (there may still be race conditions that violate
isolation/atomicity here).
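
To make model 1 concrete, here is a rough sketch of a guarded check-out with a save-on-check-in policy. It is written in Python purely for illustration; Store, checkout and CheckoutError are hypothetical names, not anything in Elephant, and a plain dict stands in for the database.

```python
import threading
from contextlib import contextmanager


class CheckoutError(Exception):
    """Raised when a requested object is already checked out."""


class Store:
    """Hypothetical stand-in for the persistent store: oid -> dict of slots."""

    def __init__(self):
        self.rows = {}
        self._guard = threading.Lock()  # protects the set of checked-out oids
        self._checked_out = set()

    @contextmanager
    def checkout(self, oids):
        oids = list(oids)
        # The guard: refuse the whole checkout if any object is already taken.
        with self._guard:
            busy = [o for o in oids if o in self._checked_out]
            if busy:
                raise CheckoutError(busy)
            self._checked_out.update(oids)
        try:
            # Copy slot values into a private in-memory cache for this thread.
            cache = {o: dict(self.rows[o]) for o in oids}
            yield cache
            # Save on check-in: only reached if the body did not raise.
            for o, slots in cache.items():
                self.rows[o] = dict(slots)
        finally:
            # Always release the objects, even if the body failed.
            with self._guard:
                self._checked_out.difference_update(oids)
```

Inside the with-body all reads and writes hit the private cache only; the store sees the new values at check-in, and nothing is written if the body raises. The write-back here is a plain loop, so a real version would presumably wrap it in a single DB transaction.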
To support any of these cases we need something beyond what is already there;
for example a model that provides cheap mutexes via the DB instead of
just in-memory?
For example, a web session is rendering a set of objects to a client
and updating the client on each request. Rather than hitting the DB
on every web transaction, we want to allow the session to run for
a while, keeping that state in memory, then commit the changes when a
'commit' button is clicked, or perhaps we want a write-through policy
that keeps track of changes by only hitting the DB when the user has
changed something.
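
The two session policies above could look something like the following. Again this is only a Python sketch under assumptions: CountingDB and SessionCache are made-up names, and the write counter exists only to show how often the "DB" is actually hit.

```python
class CountingDB:
    """Hypothetical stand-in for the database; counts writes."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, key, value):
        self.data[key] = value
        self.writes += 1


class SessionCache:
    """In-memory session state with a deferred-commit or write-through policy."""

    def __init__(self, db, write_through=False):
        self.db = db
        self.write_through = write_through
        self.values = dict(db.data)  # load once when the session starts
        self.dirty = set()           # keys changed since the last commit

    def get(self, key):
        return self.values[key]      # reads never touch the DB

    def set(self, key, value):
        if key in self.values and self.values[key] == value:
            return                   # nothing changed, so no DB traffic
        self.values[key] = value
        if self.write_through:
            self.db.write(key, value)   # durable immediately
        else:
            self.dirty.add(key)         # durable only after commit()

    def commit(self):
        """What the 'commit' button would call under the deferred policy."""
        for key in sorted(self.dirty):
            self.db.write(key, self.values[key])
        self.dirty.clear()
```

With the deferred policy, any number of edits to one slot costs a single write at commit time; with write-through, each real change costs one write and re-setting an unchanged value costs none.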
In my application a user might be editing a questionnaire and the UI
provides an explicit indication that the questionnaire is being
edited. A session 'checks out' the questionnaire. The questionnaire
has an association with its root questions, which have an association
with their sub-questions, etc. Ideally this whole tree would be
checked out; either based on a user-provided function or some
declaration that defines the checkout set. I'd use a write-through
policy so work was never lost and have the application layer implement
any needed undo/reset functionality.
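
As an illustration of the "user-provided function" option, the checkout set can be computed by walking the associations from the root. This is a Python sketch with hypothetical names (checkout_set, children); the tree below is invented example data, not my actual schema.

```python
def checkout_set(root, children):
    """Return every object reachable from root, in depth-first order.

    children is the user-provided function mapping an object to the
    objects it is associated with. Shared or cyclic references are
    visited only once.
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so the first child is popped (and listed) first.
        stack.extend(reversed(list(children(node))))
    return order


# Invented example: a questionnaire with root questions and sub-questions.
tree = {
    "questionnaire": ["q1", "q2"],
    "q1": ["q1a", "q1b"],
    "q2": [],
}
objects = checkout_set("questionnaire", lambda n: tree.get(n, []))
```

The resulting list is what would be handed to the check-out call, so the whole tree is guarded and cached as one unit.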
Any other use cases?
In short, there are a lot of ways to get into trouble with this
mechanism, so I think it behooves us to spec out an API for using this
facility that is reasonably robust and can give people canned ways to
gain the performance benefits without putting too many holes in their
feet.
Thanks,
Ian