From mathrick at gmail.com Tue Apr 21 16:44:04 2009
From: mathrick at gmail.com (Maciej Katafiasz)
Date: Tue, 21 Apr 2009 18:44:04 +0200
Subject: [Cl-perec-devel] A couple of questions about the design (plus assitance request)
Message-ID: <50b7e5180904210944x76bc4415y5cb36e1dc33d368a@mail.gmail.com>

Hi,

I have a series of questions about several decisions in cl-perec that I've found counterintuitive or just plain hard to understand. As I don't really know the requirements that led to perec's design, I need to ask here to get a proper picture. Apologies in advance for the length of this write-up :)

First, the background of what I'm doing: we're writing a weblocks (http://common-lisp.net/project/cl-weblocks/) application, and decided to back it with cl-perec. To access any given persistence layer, weblocks requires an implementation of a fairly minimal generic glue layer, called a store. A store allows for querying objects en masse by class or individually by ID, as well as persisting them to the backend or deleting them. So we started implementing weblocks-store, but in the course of doing so ran into several hurdles.

Now, the problems. One basic issue is that weblocks makes no guarantees about not touching objects it doesn't want persisted; the store API has an explicit synchronisation point in the PERSIST-OBJECT method, and any changes not saved through it are expected to be transient. That is of course incompatible with perec's "everything is persisted implicitly" approach. Another very important obstacle is that perec refuses to even consider touching any data whatsoever outside of a transaction, whereas weblocks has a fairly loose approach to transactions: you can request them in several places, but in general you only get what you explicitly ask for. That makes it very difficult to access instances as returned by perec, even if you're only interested in reading them.
The requirement to REVIVE-INSTANCES all the time doesn't help either.

As a first solution, we adopted the approach of another store, based on Elephant, which has a similarly implicit persistence strategy. That is, for all requests to the Elephant store, proxy objects are returned instead of the actual instances returned by Elephant/perec, and their state is only synchronised (by a bit of MOP hackery that copies all slots) when PERSIST-OBJECT is called. For each class that's queried, a proxy class is created based on its introspected slots, and the proxy instances are instantiated from that. That worked to a degree, but had a number of problems, not least of which was that methods defined for persistent classes weren't defined for their proxies, which are what you actually get to work with inside weblocks. There were also issues with synchronising perec's internal slots in the copied objects: it was hard to do in a way that preserved all the information without stomping on anything perec might have updated in the meantime.

For these reasons, we wrote version two, which attempts to use transient instances of persistent classes. This should have all kinds of advantages, including being faster, using less memory and generally avoiding duplication of what's already present in perec, except that a number of rather opaque design decisions prevent it from working as expected. First, it seems that it's not actually possible to query a database and get back anything other than objects fanatically attached to the idea that data only exists inside a transaction.
I thought ENSURE-PERSISTENT would be the solution, but that seems not to be the case:

  KABINETT-TEST> (with-transaction
                   (defparameter *foo* (select-first-matching-instance 'event))
                   (ensure-transient *foo*))
  T
  KABINETT-TEST> *foo*
  #
  KABINETT-TEST> (with-transaction (ensure-persistent *foo*))
  Inconsistent cache
     [Condition of type SIMPLE-ERROR]

  Restarts:
   0: [TERMINATE-TRANSACTION] return (values) from the WITH-TRANSACTION block executing the current terminal action :COMMIT
   1: [COMMIT-TRANSACTION] mark transaction for commit only and return (values) from the WITH-TRANSACTION block
   2: [ROLLBACK-TRANSACTION] mark transaction for rollback only and return (values) from the WITH-TRANSACTION block
   3: [RESTART-TRANSACTION] rollback the transaction by unwinding the stack and restart the WITH-TRANSACTION block in a new database transaction
   4: [ABORT] Return to SLIME's top level.
   5: [TERMINATE-THREAD] Terminate this thread (#)

  Backtrace:
    0: (CL-PEREC::UPDATE-INSTANCE-CACHE-FOR-CREATED-INSTANCE #)
    1: ((SB-PCL::FAST-METHOD MAKE-PERSISTENT-USING-CLASS (T PERSISTENT-OBJECT)) # # # #)

It's very surprising to me that objects can't be persisted again once they've been made transient. MAKE-PERSISTENT explicitly requires that only freshly created instances be persisted; anything else signals the above error. Even more surprising was finding out, upon further investigation, that the instance made transient is actually removed from the database completely (!). I can't imagine any reason for doing so, so I'd be grateful for an explanation of that decision.

The fact that perec insists so strongly on all data access happening inside a transaction is also rather inconvenient and, IMHO, not all that sensible.
The instances returned by a SELECT represent a snapshot of the data at a particular point in time; once they're cached, it makes very little conceptual sense to require them to be read in a transaction. The transaction is only there to ensure their internal consistency, not to ensure that the data in arbitrarily long-lived objects reflects the current DB state. I can see the reasons for doing it that way: first, to ensure that results returned by SCROLLs, which only generate a "SELECT COUNT(*)", stay reasonable; and second, because logically the objects are completely tied to the data in the RDBMS, which is merely an implementation detail of their strong persistence and stays in the background. Still, I think it's overly restrictive and effectively prevents perec from being used with anything that hasn't been written specifically to match its philosophy from the ground up.

So, with all of the above in mind, I'd like to have a way to do the following:

1. Create, read and modify instances of persistent classes that are transient in the sense of not being immediately reflected in the RDBMS, but which can nevertheless be synced to the DB on request. It's important that they get assigned stable OIDs, i.e. an instance may or may not have an OID, but once it gets one, the OID doesn't change, even if the object's status changes from persistent to transient or vice versa.

2. Read back instances of persistent classes and request them to be fully cached, i.e. returned in a form that's safe to read even outside of a transaction. It's not as important that this holds for results returned as SCROLLs, which are lazy by definition, but it should be possible to get a list of plain old objects that can safely be handed to anyone to read. Whether or not writes are supported in this mode is a separate question that needs more thought.

3. Safely manipulate transactions in a predictable and consistent way.
   The fact that I'm both required to rely on a single global variable AND forbidden to mix objects between nested transactions makes me uneasy. Ideally, I'd like an API for explicit manipulation of transaction slots, for which the global *TRANSACTION* variable would just be the default value.

I'll be happy to do the work required to implement the above, provided I get the necessary assistance (ideally by having levy available on IRC :), but I don't want to just hack away blindly without asking first.

Cheers,
Maciej