[Cl-perec-devel] A couple of questions about the design (plus assistance request)

Tue Apr 21 16:44:04 UTC 2009

Hi,

I have a series of questions about several decisions in cl-perec I've
found counterintuitive or just plain hard to understand. As I don't
really know the requirements that led to perec's design, I need to ask
here to get a proper picture. Apologies in advance for the length of
this writeup :)

First, the background of what I'm doing: we're writing a weblocks
(http://common-lisp.net/project/cl-weblocks/) application, and decided
to back it up by cl-perec. To access any given persistence layer,
weblocks requires an implementation of a fairly minimal generic glue,
called a store. A store allows for querying objects en masse by class
or individually by ID, as well as persisting them into the backend or
deleting them. Thus we started implementing weblocks-store, but in the
course of doing so, ran into several hurdles.

Now, the problems: one basic issue is that weblocks doesn't make any
guarantees about not touching objects it doesn't want persisted; the
API for stores has an explicit synchronisation point in the
PERSIST-OBJECT method, and any changes not saved by it are expected to
be transient. That is of course incompatible with perec's "everything
is persisted implicitly" approach.

Another very important obstacle is that perec refuses to even consider
ever touching any data whatsoever outside of a transaction, whereas
weblocks has a fairly loose approach to transactioning things; you can
request transactions in several places, but in general you only get
what you explicitly ask for. That makes it very difficult to access
instances as returned by perec, even if you're only interested in
reading them. The requirement to REVIVE-INSTANCES all the time doesn't
help things either.

As the first solution, we adopted the approach of another store, based
on Elephant, which has a similarly implicit persistence strategy. I.e.
for all requests to the elephant store, proxy objects are returned
instead of the actual instances returned by Elephant/perec, and their
states are only synchronised (by a bit of MOP hackery involving
copying all slots) when PERSIST-OBJECT is called. For each class
that's queried, a proxy class based upon introspected slots is
created, and the proxy instances are instantiated from that. That
worked somewhat, but had a number of problems, not the least of which
was the fact that any methods defined for persistent classes weren't
defined for their proxies, which are what you actually get to work
with inside weblocks. There have been additional issues with
synchronising internal perec slots in the copied objects, which was
hard to do in a way that preserved all the information without
stomping over anything perec might have updated in the meantime.
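
For reference, the slot-copying step described above can be sketched
roughly as follows. This is only an illustration, not our actual store
code: COPY-SLOTS, EVENT-PROXY and DESCRIBE-EVENT are made-up names, plain
STANDARD-CLASS instances stand in for perec's persistent ones, and the
SB-MOP package assumes SBCL (closer-mop offers the same functions
portably).

```lisp
;; Illustrative only: plain CLOS classes stand in for a perec class and
;; its generated proxy. Assumes SBCL for the SB-MOP package.
(defclass event ()
  ((title :initarg :title :accessor title-of)
   (place :initarg :place :accessor place-of)))

(defclass event-proxy ()
  ((title :initarg :title :accessor title-of)
   (place :initarg :place :accessor place-of)))

(defun copy-slots (from to)
  "Copy every bound slot of FROM that TO also has; return TO."
  (dolist (slot (sb-mop:class-slots (class-of from)) to)
    (let ((name (sb-mop:slot-definition-name slot)))
      (when (and (slot-exists-p to name)
                 (slot-boundp from name))
        (setf (slot-value to name) (slot-value from name))))))

;; A method specialised on the real class is not applicable to the
;; proxy, which is the dispatch problem mentioned above.
(defgeneric describe-event (event))
(defmethod describe-event ((event event))
  (format nil "~A at ~A" (title-of event) (place-of event)))
```

Copying an EVENT into a fresh EVENT-PROXY with COPY-SLOTS carries the
slot values over, but DESCRIBE-EVENT has no applicable method for the
proxy, so any behaviour defined on the persistent class is lost.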

For the above reasons, we wrote version two, which attempts to use
transient instances of persistent classes. This should have all kinds
of advantages, including being faster, less memory-intensive and
generally avoiding duplicating what's already present in perec, except
that there are a number of rather opaque design decisions that prevent
it from working as expected.

First, it seems that it's not actually possible to query a database
and get back anything other than objects fanatically attached to the
idea that data only exists inside a transaction. I thought
ENSURE-PERSISTENT would be the solution, but it seems not to be the
case:

KABINETT-TEST> (with-transaction
                (defparameter *foo* (select-first-matching-instance 'event))
                (ensure-transient *foo*))
T
KABINETT-TEST> *foo*
#<EVENT :persistent #f 215 {BDCC951}>
KABINETT-TEST> (with-transaction (ensure-persistent *foo*))

Inconsistent cache
  [Condition of type SIMPLE-ERROR]

Restarts:
 0: [TERMINATE-TRANSACTION] return (values) from the WITH-TRANSACTION
block executing the current terminal action :COMMIT
 1: [COMMIT-TRANSACTION] mark transaction for commit only and return
(values) from the WITH-TRANSACTION block
 2: [ROLLBACK-TRANSACTION] mark transaction for rollback only and
return (values) from the WITH-TRANSACTION block
 3: [RESTART-TRANSACTION] rollback the transaction by unwinding the
stack and restart the WITH-TRANSACTION block in a new database
transaction
 4: [ABORT] Return to SLIME's top level.
 5: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
RUNNING {BDC6B19}>)

Backtrace:
 0: (CL-PEREC::UPDATE-INSTANCE-CACHE-FOR-CREATED-INSTANCE
#<unavailable lambda list>)
 1: ((SB-PCL::FAST-METHOD MAKE-PERSISTENT-USING-CLASS (T
PERSISTENT-OBJECT)) #<unavailable argument> #<unavailable argument>
#<PERSISTENT-CLASS EVENT> #<EVENT :persistent #f 215 {BDCC951}>)

It's very surprising to me that objects can't be persisted once
they're made transient. MAKE-PERSISTENT explicitly asks that only
freshly created instances be persisted, and everything else throws the
above error. Even more surprising for me was to find out, upon further
investigation, that the transiented instance is actually removed
completely from the database (!). I can't imagine any reason for doing
so, so I'd be grateful for some explanation of that decision.

The fact that perec insists so strongly on all data accesses happening
inside a transaction is also rather inconvenient and, IMHO, not all
that sensible. The instances returned by a SELECT represent a snapshot
of data at a particular point in time; once they're cached it makes
very little conceptual sense to require them to be read in a
transaction -- the transaction is only there to ensure their internal
consistency, not to ensure that the data is representative of the
current DB state in arbitrarily long-lived objects. While I can see
the reason behind doing it that way (one, to ensure that results
returned by SCROLLs, which only generate a "SELECT COUNT(*)", stay
reasonable, and two, because logically the objects are completely
linked to the data in the RDBMS, which is merely an implementation
detail of their strong persistency and stays in the background), I
think it's overly restrictive and effectively prevents perec being
used with anything that hasn't been specifically written to match its
philosophy from the ground up.

So, with all of the above, I'd like to have a way to do the following:

1. Create, read and modify instances of persistent classes that are
transient in the sense of not being immediately reflected to the
RDBMS, but which can nevertheless be synced to the DB upon request.
It's important that they get assigned stable OIDs, i.e. an instance may
or may not have an OID, but once it gets one, it doesn't change, even
if the object changes its status from persistent to transient or
vice-versa.

2. Read back instances of persistent classes and request them to be
fully cached, ie. returned in a form that's safe to read even outside
of a transaction. It's not as important that it holds for results
returned as SCROLLs, which are lazy by definition, but it should be
possible to get a list of plain old objects that can be safely given
to anyone to read. Whether or not writes are supported in this mode is
a separate question that needs extra thinking.

3. Safely manipulate transactions in a predictable and consistent way.
The fact that I'm both required to rely on single global variables AND
forbidden from mixing objects between nested transactions makes me
uneasy. Ideally, I'd like an API for explicit manipulation of
transaction slots, for which the global *TRANSACTION* variable would
just be the default value.
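
To make point 3 a bit more concrete, here is a toy model of the API
shape I have in mind. None of this is perec code and every name in it
(*CURRENT-TRANSACTION*, TOY-TRANSACTION, RECORD-WRITE,
WITH-TOY-TRANSACTION) is hypothetical; it only shows an operation that
takes the transaction as an explicit argument and falls back to a
special variable when none is given.

```lisp
;; Toy model, not perec code: every name here is hypothetical.
(defvar *current-transaction* nil
  "Default transaction, playing the role of perec's *TRANSACTION*.")

(defstruct toy-transaction
  (name nil)
  (writes '()))         ; instances written through this transaction

(defun record-write (instance &key (transaction *current-transaction*))
  "Record a write to INSTANCE in TRANSACTION (default: the global one)."
  (unless transaction
    (error "No transaction supplied and no default transaction active."))
  (push instance (toy-transaction-writes transaction))
  transaction)

(defmacro with-toy-transaction ((var &key name) &body body)
  "Bind VAR and the global default to a fresh transaction around BODY."
  `(let* ((,var (make-toy-transaction :name ,name))
          (*current-transaction* ,var))
     ,@body))
```

With something along these lines, code nested inside two
WITH-TOY-TRANSACTION forms could call (record-write x) to use the
innermost transaction, or (record-write x :transaction outer) to target
the outer one explicitly, without rebinding any global.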

I'll be happy to do the work required to implement the above, provided
I get the required assistance (ideally by having levy available on IRC
:), but I don't just want to hack blindly without asking.

Cheers,
Maciej