[elephant-devel] Discussions
Ian Eslick
eslick at media.mit.edu
Sat May 17 03:22:46 UTC 2008
Robert said:
> I'll go out on a limb and say that offering object-level caching is
> the single biggest performance enhancement we make for the most
> common cases.
A clarifying question. How did you ensure ACID properties in the DCM
scenario in the presence of threading? Without letting BDB or sql
know about the reads that you've done, you can't tell if a prior
transaction has clobbered data that you are currently using because
the reads are directly from memory.
e.g. You can easily read the old 'balance' on the checking account, do
your computing while someone else has written that same object, then
write back an incorrect value.
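To make the failure concrete, here is a minimal sketch of that lost update, written in Python for brevity. The `db` dict and the variable names are hypothetical stand-ins, not Elephant or DCM code, and the two "threads" are interleaved by hand so the outcome is deterministic.

```python
# Hypothetical in-memory stand-in for the database.
db = {"balance": 100}

# Two threads each read the balance straight from an object cache,
# so the database never learns that these reads happened.
cached_a = db["balance"]  # thread A reads 100
cached_b = db["balance"]  # thread B reads 100

# Thread B deposits 50 and commits first.
db["balance"] = cached_b + 50  # balance is now 150

# Thread A withdraws 30 using its stale cached read and commits.
db["balance"] = cached_a - 30  # balance is now 70; B's deposit is lost

# A serialized execution of both updates would have produced 120.
expected = 100 + 50 - 30
lost_update = db["balance"] != expected
```

Neither write fails, which is the problem: nothing in the store can tell that thread A based its write on a value that was no longer current.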
Rucksack tracks changes by versioning objects in memory and rolling
back newer versions when older versions are committed. This is a copy-
on-write model which keeps everything in memory during the
transaction, but then writes the txn log and a version of the object
to disk, updating the in-memory 'valid' version as appropriate.
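A version check at commit time is one common way to catch that kind of conflict. The sketch below is a generic illustration in Python, assuming a hypothetical `VersionedStore` class; it is not Rucksack's actual implementation, and it aborts the stale writer rather than rolling back a newer version.

```python
class ConflictError(Exception):
    """Raised when a commit is based on a stale read."""


class VersionedStore:
    """Hypothetical store that keeps a version number next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Return (version, value); unknown keys start at version 0.
        return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        # Accept the write only if nobody committed since our read.
        current_version, _ = self._data.get(key, (0, None))
        if current_version != read_version:
            raise ConflictError(f"{key!r} changed since version {read_version}")
        self._data[key] = (current_version + 1, new_value)


store = VersionedStore()
store.commit("balance", 0, 100)

version_a, balance_a = store.read("balance")  # transaction A reads
version_b, balance_b = store.read("balance")  # transaction B reads

store.commit("balance", version_b, balance_b + 50)  # B commits first

try:
    store.commit("balance", version_a, balance_a - 30)  # A is now stale
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Here transaction A would have to re-read the balance and retry, instead of silently overwriting B's deposit.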
Leslie had a good related e-mail on this topic a few days ago:
> I don't know what the best decision might be here.
> But I have a use case that might help; it has the following
> features:
>
> * I access the slots of two persistent objects.
>
> * The number of the slots and the times requested
> together produce very bad performance (think seconds)
> even with PM txn caching (for comparison, BDB is about
> three times faster)
>
> * The environment is multi-threaded (web server), but the
> slots won't be changed by any other process.
>
> * Ideally the slots would be cached only for this one
> function and the functions called by it (and only
> per-invocation, i.e. slot caches get refreshed right at
> the beginning of the function).
>
> * This is currently the only place in my app where I would
> need the performance advantages of slot caching. In all
> other places ACID is highly preferred and speed is sufficient.
>
> * The desired behaviour can be somewhat modelled by CLSQL's
> OO interface:
>
> - get the objects from the DB at the beginning
>
> - work with those in-memory objects
>
> - write back the values to the DB at the end of the process
>
> The difference is that I don't want the whole object (other slot
> values of it might be changed from outside!) but only a few
> selected slots.
I think we can basically do this today. A refresh command simply
reads from the DB for all cached slots (in a transaction this is
thread safe and avoids the aforementioned problem). You operate on
the cached data, nothing happens in the transaction, at the end you do
a 'save' and those cached slots get written to disk.
I think this meets Leslie's use case, and I think it's an hour or two
to implement on top of what is already there.
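A rough sketch of that refresh/save pattern, in Python rather than Lisp. The `SlotCache` class, its method names, and the dict standing in for the database are all assumptions made for illustration; none of this is Elephant's API. In a real store, `refresh` and `save` would each run inside a transaction.

```python
class SlotCache:
    """Hypothetical per-invocation cache for a few selected slots."""

    def __init__(self, db, oid, slots):
        self.db = db              # dict mapping (oid, slot) -> value
        self.oid = oid
        self.slots = tuple(slots)
        self.values = {}

    def refresh(self):
        # Read every cached slot from the store in one pass.
        for slot in self.slots:
            self.values[slot] = self.db[(self.oid, slot)]

    def save(self):
        # Write back only the cached slots; other slots are untouched.
        for slot in self.slots:
            self.db[(self.oid, slot)] = self.values[slot]


db = {(1, "hits"): 10, (1, "owner"): "alice"}

cache = SlotCache(db, oid=1, slots=["hits"])
cache.refresh()

# Heavy work happens purely in memory, with no store access.
for _ in range(1000):
    cache.values["hits"] += 1

# Another part of the app changes a slot that is not cached.
db[(1, "owner")] = "bob"

cache.save()
```

Because only the selected slots are written back, the outside change to the uncached slot survives the save.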
> However, I don't know if this is more important than a native-lisp
> backend, or a query-language. For the next year at least I am working
> at a job rather than working on my lisp application; and even then I was
> happy with the performance I was getting out of DCM. So I personally
> don't have performance need that drives anything. I wish I knew how
> many new users we would have from better performance vs. a native-lisp
> backend vs. a query-language, or what our existing users would prefer.
My two dollars on this topic is that the most interesting thing to
improve adoption and overall utility is a lisp-only backend to get
going with. The most interesting value to the current users,
including myself, is a query system that manages and abstracts some of
the performance query hacks that today you have to write yourself in
lisp, often over and over.
I think of the query system, by the way, as a DSL (domain specific
language) extension of lisp, not a SQL syntax. So it's not an either/or;
it's exactly what Lisp was meant to do: enable linguistic
abstraction that makes thinking about a given problem easier. That's
what I think when I hear 'lisp as the query language'.
Rucksack strikes me as the best way to start on the lisp-only front,
because so much is there. It's a non-trivial port/adaptation so
someone needs to be willing to put in a week or two (at least) of
serious effort.
I think we may also be able to change it so that it only writes a
transaction log and doesn't write the underlying DB unless something
is flushed from the cache. What I like about Rucksack for a more
prevalence style model (and maybe I'm misreading this and it's not
flushing objects to disk on each write) is that it already implements
versioning as its transaction model, which gets around fine-grained
locking performance problems. If we add in Robert's DCM ideas about
having a cache instead of the whole DB in memory, then we could
imagine writing flushed objects to disk and effectively incrementally
syncing the memory objects to disk rather than having to do a full
snapshot every so often.
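The idea in the paragraph above can be sketched roughly as follows, in Python. Every write is appended to a log, and an object only reaches the underlying store when it is evicted from the cache. The `LoggedCache` class, the LRU eviction policy, and the in-memory `log` and `disk` stand-ins are my assumptions for illustration, not Rucksack or DCM code.

```python
from collections import OrderedDict


class LoggedCache:
    """Hypothetical write-behind cache backed by an append-only log."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, most recently used last
        self.log = []               # stand-in for the transaction log
        self.disk = {}              # stand-in for the on-disk object store

    def _evict(self):
        # Flush least recently used objects to disk until we fit again.
        while len(self.cache) > self.capacity:
            old_key, old_value = self.cache.popitem(last=False)
            self.disk[old_key] = old_value

    def write(self, key, value):
        self.log.append((key, value))  # durability comes from the log
        self.cache[key] = value
        self.cache.move_to_end(key)
        self._evict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.disk[key]  # cache miss: load the flushed object
        self.cache[key] = value
        self._evict()
        return value


store = LoggedCache(capacity=2)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.write(key, value)
```

After these three writes only `"a"` has been written to `disk`, while the log holds all three entries. Recovery after a crash would replay the log over whatever is on disk, which this sketch leaves out.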
Regards,
Ian