[elephant-cvs] CVS elephant/doc
ieslick
ieslick at common-lisp.net
Thu Apr 19 22:25:52 UTC 2007
Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv25569/doc
Modified Files:
scenarios.texinfo
Log Message:
final snapshot scenario and code changes
--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo 2007/04/19 05:24:37 1.5
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo 2007/04/19 22:25:51 1.6
@@ -7,7 +7,7 @@
@menu
* File System Replacement:: Deployment of Elephant as file replacement
-* Checkpointing Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
+* Checkpointing Conventional Program State:: How to recover the application state as recorded in a set of interdependent standard classes for purposes of undo, crash recovery and session persistence.
* Persistent System Objects:: Making persistent objects a natural part of your system
* Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
* Multithreaded Web Applications:: Elephant is a natural match for web applications
@@ -99,9 +99,9 @@
@footnote{Example provided by Ian Eslick, April 2007}
-@node Checkpointing Program State
+@node Checkpointing Conventional Program State
@comment node-name, next, previous, up
-@section Checkpointing Program State
+@section Checkpointing Conventional Program State
Another challenge for many programs is saving some subset of program
state. This could involve checkpointing an evolving computation,
@@ -168,12 +168,368 @@
@subsection Implementation: The Snapshot Set
-To generalize all this behavior, we will define a new class called a
-snapshot set. The set itself is a persistent object that wraps a
-btree, but provides all the automation to store and recover sets of
-objects.
+In this section we walk through the implementation of the snapshot set
+in detail, because it:
+
+@itemize
+@item Provides insight into the constraints of serialization and lisp object identity
+@item Shows how to leverage Elephant for more sophisticated applications than
+      persistent indices and class slots
+@item Helps you understand a useful utility (that we may add to an extensions
+      release in the future)
+@end itemize
+
+To generalize the behavior discussed above, we will define a new
+persistent class called a snapshot set. The set itself is a wrapper
+around the btree, but provides all the automation to store and recover
+sets of standard objects.
+
+@lisp
+(defpclass snapshot-set ()
+  ((index :accessor snapshot-set-index :initform (make-btree))
+   (next-id :accessor snapshot-set-next-id :initform 0)
+   (root :accessor snapshot-set-root :initform nil)
+   (cache :accessor snapshot-set-cache
+          :initform (make-hash-table :weak-keys t)
+          :transient t)
+   (touched :accessor snapshot-set-touched
+            :initform (make-array 20 :element-type 'fixnum
+                        :initial-element 0 :fill-pointer t :adjustable t)
+            :transient t))
+  (:documentation "Keeps track of a set of standard objects
+    allowing a single snapshot call to update the store
+    controller with the latest state of all objects registered with
+    this set"))
+@end lisp
+
+The set class keeps track of the on-disk btree that stores instances
+by unique id, the next id to allocate, an optional root object, an
+in-memory cache of registered objects and a record of the ids touched
+during a snapshot. Notice the use of the @code{:transient} keyword
+argument for the cache and touched slots.
+
+There are two major operations supported by sets: @code{snapshot} and
+@code{restore}. These save objects to disk and restore objects to
+memory, along with proper recovery of multiple references to the same
+object.
+
+Additional operations are:
+
+@itemize
+@item Registration: Adding and removing objects from a set
+@item Root operations: Easy access to a single root hash table or object
+@item Mapping: Walk over all objects in a set
+@end itemize
+
+To enable snapshots, we have to register a set of root objects with
+the set. The function below ignores objects that are already cached;
+otherwise it allocates a new ID and caches the object.
+
+@lisp
+(defmethod register-object ((object standard-object) (set snapshot-set))
+  "Register a standard object. Not recorded until snapshot is called on db"
+  (aif (lookup-cached-id object set)
+       (values object it)
+       (let ((id (incf (snapshot-set-next-id set))))
+         (cache-snapshot-object id object set)
+         (values object id))))
+
+(defun lookup-cached-id (obj set)
+  (gethash obj (snapshot-set-cache set)))
+
+(defun cache-snapshot-object (id obj set)
+  (setf (gethash obj (snapshot-set-cache set)) id))
+@end lisp
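+
+The identity-based registration above can be tried without Elephant.
+The following is only a minimal sketch in plain Common Lisp: the names
+@code{*ids*}, @code{*next-id*} and @code{register} are hypothetical,
+and an ordinary (strong) @code{eq} hash table stands in for the weak
+cache of the real class.
+
+@lisp
+;; Hypothetical stand-ins for the cache and next-id slots
+(defvar *ids* (make-hash-table :test 'eq)) ; object -> id
+(defvar *next-id* 0)
+
+(defun register (object)
+  "Return OBJECT's id, allocating a fresh one on first registration."
+  (or (gethash object *ids*)
+      (setf (gethash object *ids*) (incf *next-id*))))
+
+;; In a fresh session: two lists that are equal but not eq receive
+;; distinct ids, and registering the same list twice reuses its id.
+(let ((a (list 1 2)) (b (list 1 2)))
+  (list (register a) (register b) (register a)))
+=> (1 2 1)
+@end lisp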
+
+A function parallel to @code{register-object} registers hash tables.
+One very important invariant implied here is that the cache always
+holds the @code{eq}-unique in-memory objects, each mapped back to a
+serialized object in the backing btree. There is no need, however, to
+write objects to the store immediately, and this gives us some
+transactional properties: snapshots are atomic, consistent and
+durable. Isolation is not enforced by snapshots.
+
+This means that the transient cache has to be valid immediately after
+the snapshot set is loaded from the data store.
+
+@lisp
+(defmethod initialize-instance :after ((set snapshot-set) &key lazy-load &allow-other-keys)
+  (unless lazy-load (restore set)))
+@end lisp
+
+This also has consequences for unregistration. Removing a root object
+should also result in the removal of all objects that are unreachable
+from other roots. However, since side effects are not permanent until
+a snapshot operation, we merely have to garbage collect ids that were
+not touched during a snapshot operation. This makes unregistration
+simple.
+
+@lisp
+(defmethod unregister-object (object (set snapshot-set))
+  "Drops the object from the cache and backing store"
+  (let ((id (gethash object (snapshot-set-cache set))))
+    (when (null id)
+      (error "Object ~A not registered in ~A" object set))
+    (drop-cached-object object set)))
+@end lisp
+
+But snapshots are a little bit more work.
+
+@lisp
+(defmethod snapshot ((set snapshot-set))
+  "Saves all objects in the set (and any objects reachable from the
+   current set of objects) to the persistent store"
+  (with-transaction (:store-controller (get-con (snapshot-set-index set)))
+    (loop for (obj . id) in (get-cache-entries (snapshot-set-cache set)) do
+      (save-snapshot-object id obj set))
+    (collect-untouched set)))
+
+(defun save-snapshot-object (id obj set)
+  (unless (touched id set)
+    (setf (get-value id (snapshot-set-index set))
+          (cond ((standard-object-subclass-p obj)
+                 (save-proxy-object obj set))
+                ((hash-table-p obj)
+                 (save-proxy-hash obj set))
+                (t (error "Can only snapshot standard-objects and hash-tables"))))
+    (touch id set))
+  id)
+
+(defun collect-untouched (set)
+  (map-btree (lambda (k v)
+               (declare (ignore v))
+               (unless (touched k set)
+                 (remove-kv k (snapshot-set-index set))))
+             (snapshot-set-index set))
+  (clear-touched set))
+@end lisp
+
+We go through all objects in the cache, storing objects as we go via
+@code{save-snapshot-object}. This function is responsible for storing
+objects and hash tables and recursing on any instances that are
+referenced. Every object that is saved is added to a touch list, so
+that it is not stored again and so that the
+@code{collect-untouched} call can delete newly unreachable
+objects from the persistent store. Any newly found
+objects are added to the in-memory cache which, being a weak hash
+table, should eventually drop references to objects that are not
+referred to elsewhere.
+
+It should be noted that garbage objects not yet collected from the
+weak cache may be stored to and restored from the
+persistent store. However, this is merely a storage overhead: they
+will eventually be dropped across sessions, as there are no saved
+references to them.
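+
+The touch-then-collect step is a small mark-and-sweep. As a rough
+sketch only, here it is with ordinary hash tables standing in for the
+btree and the touch list; @code{sweep-untouched} and its arguments
+are hypothetical names, not part of Elephant.
+
+@lisp
+(defun sweep-untouched (store touched)
+  "Remove from STORE every id that is not marked in TOUCHED.
+Returns the removed ids in ascending order and clears TOUCHED."
+  (let ((dead '()))
+    ;; Gather the unmarked ids first, then delete them
+    (maphash (lambda (id value)
+               (declare (ignore value))
+               (unless (gethash id touched) (push id dead)))
+             store)
+    (dolist (id dead) (remhash id store))
+    (clrhash touched)
+    (sort dead #'<)))
+
+;; Ids 1 and 3 were saved during this snapshot; id 2 was not
+(let ((store (make-hash-table)) (touched (make-hash-table)))
+  (dolist (id '(1 2 3)) (setf (gethash id store) :saved))
+  (setf (gethash 1 touched) t
+        (gethash 3 touched) t)
+  (sweep-untouched store touched))
+=> (2)
+@end lisp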
+
+Now when we serialize a standard object, all the slot values are
+stored inline. This means that by default, a slot that refers to a
+standard object would get an immediately serialized copy rather
+than a reference. This of course makes it impossible to restore
+multiple references to a single object. The approach taken here is to
+instantiate a @emph{proxy} object, which is a new instance of the
+original object's class and stores ordinary values directly in its
+slots. Any references to hash tables or standard objects are replaced
+with a reference object that records the unique id of the target so it
+can be properly restored.
+
+@lisp
+(defun save-proxy-object (obj set)
+  (let ((svs (subsets 2 (slots-and-values obj))))
+    (if (some #'reify-class-p (mapcar #'second svs))
+        (let ((proxy (make-instance (type-of obj))))
+          (loop for (slotname value) in svs do
+            (setf (slot-value proxy slotname)
+                  (if (reify-class-p value)
+                      (reify-value value set)
+                      value)))
+          proxy)
+        obj)))
+@end lisp
+
+The function checks whether any slot value can be reified (represented
+by a unique id) and if so, makes a new proxy instance and properly
+instantiates its slots, returning it to the main store function which
+writes the proxy object to the btree.
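+
+To make the proxy idea concrete, here is a minimal sketch for a class
+with one fixed reference slot. It assumes a hypothetical @code{ref}
+structure in place of the reference object and takes the id lookup as
+a function argument; the real code walks all slots generically.
+
+@lisp
+(defstruct ref id)   ; hypothetical stand-in for the reference object
+
+(defclass node ()
+  ((value :initarg :value :accessor node-value)
+   (next :initarg :next :accessor node-next)))
+
+(defun make-proxy (node id-of)
+  "Copy NODE, replacing a NODE in its next slot with a REF to its id.
+ID-OF is a function that maps an object to its unique id."
+  (let ((next (node-next node)))
+    (make-instance 'node
+                   :value (node-value node)
+                   :next (if (typep next 'node)
+                             (make-ref :id (funcall id-of next))
+                             next))))
+
+;; The proxy of B carries a reference to A's id instead of A itself
+(let* ((a (make-instance 'node :value 1 :next nil))
+       (b (make-instance 'node :value 2 :next a))
+       (proxy (make-proxy b (lambda (obj) (declare (ignore obj)) 7))))
+  (ref-id (node-next proxy)))
+=> 7
+@end lisp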
+
+On restore, we simply load all objects into memory.
+
+@lisp
+(defmethod restore ((set snapshot-set))
+  "Restores a snapshot by setting the snapshot-set state to the last snapshot.
+   If this is used during runtime, the user needs to drop all references
+   to objects and retrieve again from the snapshot set. Also used to initialize
+   the set state when a set is created, for example pulled from the root of a
+   store-controller, unless :lazy-load is specified"
+  (clear-cache set)
+  (map-btree (lambda (id object)
+               (load-snapshot-object id object set))
+             (snapshot-set-index set)))
+
+(defun load-snapshot-object (id object set)
+  (let ((object (ifret object (get-value id (snapshot-set-index set)))))
+    (cond ((standard-object-subclass-p object)
+           (load-proxy-object id object set))
+          ((hash-table-p object)
+           (load-proxy-hash id object set))
+          (t (error "Unrecognized type ~A for id ~A in set ~A" (type-of object) id set)))))
+@end lisp
+
+If an object has a reference object in a slot, then we simply restore
+that object as well. @code{load-snapshot-object} accepts null for an
+object so it can be used recursively when a reference object refers to
+an object (via the unique id) that is not yet cached. The @code{load}
+functions return an object so that they can be used directly to create
+values for writing slots or hash entries.
+
+@lisp
+(defun load-proxy-object (id obj set)
+  (ifret (lookup-cached-object id set)
+         (progn
+           (cache-snapshot-object id obj set)
+           (let ((svs (subsets 2 (slots-and-values obj))))
+             (loop for (slotname value) in svs do
+               (when (setrefp value)
+                 (setf (slot-value obj slotname)
+                       (load-snapshot-object (snapshot-set-reference-id value) nil set)))))
+           obj)))
+@end lisp
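+
+Caching an object before following its references is what lets
+several references resolve to one instance. The sketch below
+illustrates this with plain hash tables; the @code{cell} class,
+@code{load-cell} and the record layout are assumptions made for the
+example, not the Elephant implementation.
+
+@lisp
+(defclass cell ()
+  ((value :initarg :value :accessor cell-value)
+   (next :initform nil :accessor cell-next)))
+
+(defun load-cell (id records cache)
+  "Return the cell for ID, building it on first use.
+RECORDS maps id -> (value next-id); CACHE maps id -> restored cell."
+  (or (gethash id cache)
+      (destructuring-bind (value next-id) (gethash id records)
+        (let ((cell (make-instance 'cell :value value)))
+          ;; Cache before recursing so that shared (and cyclic)
+          ;; references resolve to this same instance
+          (setf (gethash id cache) cell)
+          (when next-id
+            (setf (cell-next cell) (load-cell next-id records cache)))
+          cell))))
+
+;; Cells 2 and 4 both refer to cell 1, which is restored only once
+(let ((records (make-hash-table)) (cache (make-hash-table)))
+  (setf (gethash 1 records) '(:one nil)
+        (gethash 2 records) '(:two 1)
+        (gethash 4 records) '(:four 1))
+  (eq (cell-next (load-cell 2 records cache))
+      (cell-next (load-cell 4 records cache))))
+=> t
+@end lisp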
+
+The full source code for @code{snapshot-sets} can be found in the
+Elephant source tree under @code{src/contrib/eslick/snapshot-set.lisp}.
+
+@subsection Using Snapshot Sets
+
+A snapshot set is quite easy to use. Load the complete code and play
+with this simple walkthrough. First we need to create a set object,
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+@end lisp
+
+and add it to the root so we don't lose track of it.
-@subsection Isolating snapshot sets
+@lisp
+(add-to-root 'my-set my-set)
+@end lisp
+
+Then we need some objects to play with.
+
+@lisp
+(defclass my-test-class ()
+  ((value :accessor test-value :initarg :value)
+   (reference :accessor test-reference :initarg :reference)))
+
+(setf obj1 (make-instance 'my-test-class :value 1 :reference nil))
+(setf obj2 (make-instance 'my-test-class :value 2 :reference obj1))
+(setf obj3 (make-instance 'my-test-class :value 3 :reference obj2))
+
+(register-object obj3 my-set)
+(snapshot my-set)
+@end lisp
+
+Now your set should have persistent versions of all three objects that
+are reachable from @code{obj3}.
+
+@lisp
+(map-set (lambda (x) (print (test-value x))) my-set)
+=>
+3
+2
+1
+@end lisp
+
+Of course such fully connected objects are not always common, so we'll
+demonstrate using hash tables to create root indexes into our objects
+and sidestep registration calls entirely. We'll create a fresh set to
+work with.
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+(add-to-root 'my-set my-set)
+
+(setf obj4 (make-instance 'my-test-class :value 4 :reference obj1))
+(setf obj5 (make-instance 'my-test-class :value 5 :reference nil))
+
+(setf hash (make-hash-table))
+(setf (snapshot-root my-set) hash)
+
+(setf (gethash 'obj3 hash) obj3)
+(setf (gethash 'obj4 hash) obj4)
+(setf (gethash 'obj5 hash) obj5)
+
+(snapshot my-set)
+@end lisp
+
+To properly simulate restoring objects, we need to drop our old hash
+table as well as clear the persistent object cache so the snapshot set
+transient object is reset.
+
+@lisp
+(setf my-set nil)
+(setf hash nil)
+(elephant::flush-instance-cache *store-controller*)
+@end lisp
+
+Now we'll pretend we're starting up a new session.
+
+@lisp
+(setf my-set (get-from-root 'my-set))
+(setf hash (snapshot-root my-set))
+@end lisp
+
+The cache is automatically populated by the implicit @code{restore}
+call during snapshot-set initialization, and our hash table should now
+have all the proper references. We'll pull out a few.
+
+@lisp
+(setf o4 (gethash 'obj4 hash))
+(setf o3 (gethash 'obj3 hash))
+(setf o2 (test-reference o3))
+
+(not (or (eq o4 obj4)
+         (eq o3 obj3)
+         (eq o2 obj2)))
+=> t
+@end lisp
+
+The new objects should not be @code{eq} to the old ones, as we have
+restored fresh copies from the disk.
+
+If you review the setup above, @code{obj3} references @code{obj2}
+which references @code{obj1} and @code{obj4} also references
+@code{obj1}. So if the objects were properly restored, these
+references should be @code{eq}.
+
+@lisp
+(eq (test-reference o2) (test-reference o4))
+=> t
+@end lisp
+
+And finally we can demonstrate the restorative power of snapshot sets.
+
+@lisp
+(remhash 'obj5 hash)
+
+(gethash 'obj5 hash)
+=> nil nil
+
+(restore my-set)
+(setf hash (snapshot-root my-set))
+
+(gethash 'obj5 hash)
+=> #<MY-TEST-CLASS ..> t
+
+(test-value *)
+=> 5
+@end lisp
+
+This means that while our set object was not reset, the restore
+operation properly restored the old reference structure of our root
+hash object. Unfortunately, in this implementation you have to reset
+your lisp pointers to get access to the restored objects.
+
+A future version could traverse the existing object cache, dropping
+new references and restoring old ones so that in-memory lisp pointers
+were still valid.
+
+@subsection Isolating multiple snapshot sets
A brief note on how to separate out the objects you want to store from
those you don't may be useful. We want to snapshot groups of
@@ -281,10 +637,9 @@
Of course this doesn't work for multi-threaded environments, or for
[11 lines skipped]