[elephant-cvs] CVS elephant/doc

Thu Apr 19 22:25:52 UTC 2007

Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv25569/doc

Modified Files:
	scenarios.texinfo 
Log Message:
final snapshot scenario and code changes

--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/19 05:24:37	1.5
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/19 22:25:51	1.6
@@ -7,7 +7,7 @@
 
 @menu 
 * File System Replacement:: Deployment of Elephant as file replacement
-* Checkpointing Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
+* Checkpointing Conventional Program State:: How to recover the application state as recorded in a set of interdependent standard classes for purposes of undo, crash recovery and session persistence.
 * Persistent System Objects:: Making persistent objects a natural part of your system
 * Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
 * Multithreaded Web Applications:: Elephant is a natural match for web applications
@@ -99,9 +99,9 @@
 
 @footnote{Example provided by Ian Eslick, April 2007}
 
-@node Checkpointing Program State
+@node Checkpointing Conventional Program State
 @comment node-name, next, previous, up
-@section Checkpointing Program State
+@section Checkpointing Conventional Program State
 
 Another challenge for many programs is saving some subset of program
 state.  This could involve checkpointing an evolving computation,
@@ -168,12 +168,368 @@
 
 @subsection Implementation: The Snapshot Set
 
-To generalize all this behavior, we will define a new class called a
-snapshot set.  The set itself is a persistent object that wraps a
-btree, but provides all the automation to store and recover sets of
-objects.
+In this section we walk through the implementation of the snapshot set
+in detail, as it provides:
+
+@itemize
+@item Insight into the constraints of serialization and Lisp object
+      identity
+@item A demonstration of how to leverage Elephant for applications more
+      sophisticated than persistent indices and class slots
+@item A useful utility (that we may add to an extensions release in the
+      future)
+@end itemize
+
+To generalize the behavior discussed above, we will define a new
+persistent class called a snapshot set.  The set itself is a wrapper
+around a btree that automates storing and recovering sets of standard
+objects.
+
+@lisp
+(defpclass snapshot-set ()
+  ((index :accessor snapshot-set-index :initform (make-btree))
+   (next-id :accessor snapshot-set-next-id :initform 0)
+   (root :accessor snapshot-set-root :initform nil)
+   (cache :accessor snapshot-set-cache 
+          :initform (make-hash-table :weak-keys t) 
+          :transient t)
+   (touched :accessor snapshot-set-touched 
+            :initform (make-array 20 :element-type 'fixnum 
+                         :initial-element 0 :fill-pointer t :adjustable t)
+            :transient t))
+  (:documentation "Keeps track of a set of standard objects
+    allowing a single snapshot call to update the store
+    controller with the latest state of all objects registered with
+    this set"))
+@end lisp
+
+The set class keeps track of the on-disk btree that stores instances
+by ID, the counter used to allocate new IDs, a root object, and two
+transient slots: an in-memory cache mapping registered objects to
+their IDs, and a record of the IDs touched during the current
+snapshot.  Notice the use of the @code{:transient} slot option for
+these two slots: they live only in memory and are never written to
+the store.
+
+Sets support two major operations: @code{snapshot} and
+@code{restore}.  These save objects to disk and restore them to
+memory, correctly recovering multiple references to the same object.
+
+Additional operations are:
+
+@itemize
+@item Registration: Adding and removing objects from a set
+@item Root operations: Easy access to a single root hash table or object
+@item Mapping: Walk over all objects in a set
+@end itemize
+
+To enable snapshots, we register a set of root objects with the set.
+@code{register-object} ignores objects that are already cached;
+otherwise it allocates a new ID and caches the object.
+
+@lisp
+(defmethod register-object ((object standard-object) (set snapshot-set))
+  "Register a standard object.  Not recorded until snapshot is called on db"
+  (aif (lookup-cached-id object set)
+       (values object it)
+       (let ((id (incf (snapshot-set-next-id set))))
+	 (cache-snapshot-object id object set)
+	 (values object id))))
+
+(defun lookup-cached-id (obj set)
+  (gethash obj (snapshot-set-cache set)))
+
+(defun cache-snapshot-object (id obj set)
+  (setf (gethash obj (snapshot-set-cache set)) id))
+@end lisp
+
+A parallel function registers hash tables.  One very important
+invariant implied here is that every object in the cache is the single
+(@code{eq}) in-memory instance for its ID, and that ID maps back to a
+serialized object in the backing btree.  There is no need, however, to
+write objects to the store immediately, and deferring the writes gives
+us some transactional properties: snapshots are atomic, consistent and
+durable.  Isolation is not enforced by snapshots.
+
+This means that the transient cache has to be valid immediately after
+the snapshot set is loaded from the data store.
+
+@lisp
+(defmethod initialize-instance :after ((set snapshot-set) &key lazy-load &allow-other-keys)
+  (unless lazy-load (restore set)))
+@end lisp
+
+This also has consequences for unregistration.  Removing a root object
+should also remove all objects that are no longer reachable from other
+roots.  However, since side effects are not permanent until a snapshot
+operation, we merely have to garbage collect IDs that were not touched
+during a snapshot operation.  This makes unregistration simple.
+
+@lisp
+(defmethod unregister-object (object (set snapshot-set))
+  "Drops the object from the cache.  Its stored copy is removed by the
+   next snapshot if it is no longer reachable from another root."
+  (let ((id (gethash object (snapshot-set-cache set))))
+    (when (null id)
+      (error "Object ~A not registered in ~A" object set))
+    (drop-cached-object object set)))
+@end lisp
+
+Snapshots take a little more work.
+
+@lisp
+(defmethod snapshot ((set snapshot-set))
+  "Saves all objects in the set (and any objects reachable from the
+   current set of objects) to the persistent store"
+  (with-transaction (:store-controller (get-con (snapshot-set-index set)))
+    (loop for (obj . id) in (get-cache-entries (snapshot-set-cache set)) do
+	  (save-snapshot-object id obj set))
+    (collect-untouched set)))
+
+(defun save-snapshot-object (id obj set)
+  (unless (touched id set)
+    ;; Touch the ID before recursing so that cyclic references terminate
+    (touch id set)
+    (setf (get-value id (snapshot-set-index set))
+          (cond ((standard-object-subclass-p obj)
+                 (save-proxy-object obj set))
+                ((hash-table-p obj)
+                 (save-proxy-hash obj set))
+                (t (error "Can only snapshot standard-objects and hash-tables")))))
+  id)
+
+(defun collect-untouched (set)
+  (map-btree (lambda (k v)
+               (declare (ignore v))
+               (unless (touched k set)
+                 (remove-kv k (snapshot-set-index set))))
+             (snapshot-set-index set))
+  (clear-touched set))
+@end lisp
+
+We go through all objects in the cache, storing each one via
+@code{save-snapshot-object}.  This function is responsible for storing
+objects and hash tables and for recursing on any instances they
+reference.  Every saved ID is added to a touch list, so that no object
+is stored twice and so that the @code{collect-untouched} call can
+delete newly unreachable objects from the persistent store.  Any newly
+found objects are added to the in-memory cache which, being a weak
+hash table, should eventually drop references to objects that are not
+referred to elsewhere.
+
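+The snapshot walk is essentially a mark-and-sweep pass over the object
+graph.  As an illustration only, the following self-contained sketch
+models it with plain hash tables: @code{toy-snapshot} is a hypothetical
+name that is not part of the snapshot-set code, an EQL hash table
+stands in for the btree, and each object is a list of a name followed
+by the objects it references.  It marks an ID as touched before
+recursing so that cyclic references terminate, then deletes every
+stored ID that the pass did not touch.
+
+@lisp
+;;; Illustrative sketch only; none of these names are Elephant API.
+(defun toy-snapshot (roots ids store)
+  "Store every object reachable from ROOTS in STORE under its ID from
+   IDS (an EQ hash table), then remove entries this pass did not touch."
+  (let ((touched (make-hash-table)))
+    (labels ((save (obj)
+               (let ((id (or (gethash obj ids)
+                             (setf (gethash obj ids)
+                                   (1+ (hash-table-count ids))))))
+                 ;; Mark before recursing so that cycles terminate
+                 (unless (gethash id touched)
+                   (setf (gethash id touched) t)
+                   ;; Stored form: the name followed by referenced IDs
+                   (setf (gethash id store)
+                         (cons (car obj) (mapcar #'save (cdr obj)))))
+                 id)))
+      (mapc #'save roots))
+    ;; Sweep: collect the keys first, then drop untouched IDs
+    (dolist (id (loop for k being the hash-keys of store collect k))
+      (unless (gethash id touched)
+        (remhash id store)))
+    store))
+@end lisp
+
+For example, two objects that refer to each other are stored exactly
+once each, under IDs 1 and 2; a second snapshot taken from an
+unrelated root then sweeps both entries away.
+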
+Note that garbage objects that have not yet been collected from the
+weak cache may still be stored to and restored from the persistent
+store.  This is merely a storage overhead: since there are no saved
+references to them, they will eventually be dropped across sessions.
+
+Now, when we serialize a standard object, all of its slot values are
+stored inline.  This means that, by default, a slot that refers to a
+standard object gets an immediately serialized copy of that object
+rather than a reference, which makes it impossible to restore multiple
+references to a single object.  The approach taken here is to
+instantiate a @emph{proxy} object: a new instance of the original
+object's class whose slots hold ordinary values as-is.  Any reference
+to a hash table or standard object is replaced with a reference object
+that records the unique ID of the referenced object so it can be
+properly restored.
+
+@lisp
+(defun save-proxy-object (obj set)
+  (let ((svs (subsets 2 (slots-and-values obj))))
+    ;; Only build a proxy when some slot value must become a reference
+    (if (some #'reify-class-p (mapcar #'second svs))
+        (let ((proxy (make-instance (type-of obj))))
+          (loop for (slotname value) in svs do
+               (setf (slot-value proxy slotname)
+                     (if (reify-class-p value)
+                         (reify-value value set)
+                         value)))
+          proxy)
+        obj)))
+@end lisp
+
+The function checks whether any slot value can be reified (represented
+by a unique ID) and, if so, makes a new proxy instance, fills in its
+slots, and returns it to @code{save-snapshot-object}, which writes the
+proxy object to the btree.
+
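+To see why proxies preserve shared structure, here is a minimal
+standalone sketch of the proxy step.  The @code{node} class, the
+@code{id-ref} structure, @code{*slot-names*} and @code{make-proxy} are
+hypothetical names invented for this example: @code{node} plays the
+role of a user's standard class and @code{id-ref} the role of a
+reference object.  A fixed slot list is assumed in place of the slot
+enumeration that the real code obtains from @code{slots-and-values}.
+
+@lisp
+;;; Illustrative sketch only; none of these names are Elephant API.
+(defclass node ()
+  ((value :initarg :value :accessor node-value)
+   (ref :initarg :ref :accessor node-ref)))
+
+(defstruct id-ref id)
+
+(defparameter *slot-names* '(value ref)
+  "Slots copied into a proxy; assumed fixed for this example.")
+
+(defun make-proxy (obj ids)
+  "Return OBJ unchanged if no slot holds a NODE.  Otherwise return a
+   fresh NODE whose NODE-valued slots hold ID-REFs carrying the ID that
+   IDS (an EQ hash table) records for the referenced node."
+  (flet ((reify-p (value) (typep value 'node)))
+    (if (notany #'reify-p
+                (mapcar (lambda (s) (slot-value obj s)) *slot-names*))
+        obj
+        (let ((proxy (make-instance (class-of obj))))
+          (dolist (s *slot-names* proxy)
+            (let ((value (slot-value obj s)))
+              (setf (slot-value proxy s)
+                    (if (reify-p value)
+                        (make-id-ref :id (gethash value ids))
+                        value))))))))
+@end lisp
+
+As in @code{save-proxy-object}, the original object is left untouched,
+so the live graph in memory keeps its direct references while the
+stored copy holds only IDs.
+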
+On restore, we simply load all objects into memory.
+
+@lisp
+(defmethod restore ((set snapshot-set))
+  "Restores a snapshot by setting the snapshot-set state to the last snapshot.
+   If this is used during runtime, the user needs to drop all references
+   to objects and retrieve again from the snapshot set.  Also used to initialize
+   the set state when a set is created, for example pulled from the root of a
+   store-controller, unless :lazy-load is specified"
+  (clear-cache set)
+  (map-btree (lambda (id object)
+	       (load-snapshot-object id object set))
+	     (snapshot-set-index set)))
+
+(defun load-snapshot-object (id object set)
+  (let ((object (ifret object (get-value id (snapshot-set-index set)))))
+    (cond ((standard-object-subclass-p object)
+	   (load-proxy-object id object set))
+	  ((hash-table-p object)
+	   (load-proxy-hash id object set))
+	  (t (error "Unrecognized type ~A for id ~A in set ~A" (type-of object) id set)))))
+@end lisp
+
+If an object has a reference object in a slot, then we simply restore
+the referenced object as well.  @code{load-snapshot-object} accepts
+@code{nil} for an object so that it can be used recursively when a
+reference object refers (via its unique ID) to an object that is not
+yet cached; it then reads the object from the btree itself.  The
+@code{load} functions return an object so that they can be used
+directly to create values for writing slots or hash entries.
+
+@lisp
+(defun load-proxy-object (id obj set)
+  (ifret (lookup-cached-object id set)
+	 (progn
+	   (cache-snapshot-object id obj set)
+	   (let ((svs (subsets 2 (slots-and-values obj))))
+	     (loop for (slotname value) in svs do
+		  (when (setrefp value)
+		    (setf (slot-value obj slotname)
+			  (load-snapshot-object (snapshot-set-reference-id value) nil set)))))
+	   obj)))
+@end lisp
+
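+The restore side can be sketched the same way.  In this hypothetical,
+self-contained example (@code{toy-node} and @code{toy-restore} are
+invented names, not part of the snapshot-set code), the store maps
+each ID to a property list whose @code{:ref} entry is either @code{nil}
+or @code{(:id N)}, standing in for a serialized reference object.  Like
+@code{load-proxy-object}, it caches each object before following its
+references, so every ID is rebuilt once and shared or cyclic
+references come back @code{eq}.
+
+@lisp
+;;; Illustrative sketch only; none of these names are Elephant API.
+(defstruct toy-node value ref)
+
+(defun toy-restore (store)
+  "Rebuild the objects in STORE and return an EQL hash table mapping
+   each ID to its TOY-NODE."
+  (let ((cache (make-hash-table)))
+    (labels ((load-id (id)
+               (or (gethash id cache)
+                   (let ((plist (gethash id store))
+                         (node (make-toy-node)))
+                     ;; Cache first so that cyclic references terminate
+                     (setf (gethash id cache) node
+                           (toy-node-value node) (getf plist :value))
+                     ;; Resolve a reference by loading (or reusing) its ID
+                     (let ((ref (getf plist :ref)))
+                       (when ref
+                         (setf (toy-node-ref node) (load-id (second ref)))))
+                     node))))
+      (loop for id being the hash-keys of store do (load-id id)))
+    cache))
+@end lisp
+
+For instance, if IDs 2 and 4 both hold @code{(:id 1)} in their
+@code{:ref} entry, the rebuilt nodes for 2 and 4 share one @code{eq}
+node for ID 1, mirroring the @code{obj1} check in the walkthrough that
+follows.
+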
+A full set of source code for @code{snapshot-sets} can be found in the
+Elephant source tree under @code{src/contrib/eslick/snapshot-set.lisp}.
+
+@subsection Using Snapshot Sets
+
+A snapshot set is quite easy to use.  Load the complete code and play
+with this simple walkthrough.  First we need to create a set object,
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+@end lisp
+
+and add it to the root so we don't lose track of it.
 
-@subsection Isolating snapshot sets
+@lisp
+(add-to-root 'my-set my-set)
+@end lisp
+
+Then we need some objects to play with.
+
+@lisp
+(defclass my-test-class ()
+  ((value :accessor test-value :initarg :value)
+   (reference :accessor test-reference :initarg :reference)))
+
+(setf obj1 (make-instance 'my-test-class :value 1 :reference nil))
+(setf obj2 (make-instance 'my-test-class :value 2 :reference obj1))
+(setf obj3 (make-instance 'my-test-class :value 3 :reference obj2))
+
+(register-object obj3 my-set)
+(snapshot my-set)
+@end lisp
+
+Now your set should have persistent versions of all three objects that
+are reachable from @code{obj3}.
+
+@lisp
+(map-set (lambda (x) (print (test-value x))) my-set)
+=>
+3
+2
+1
+@end lisp
+
+Of course, such fully connected object graphs are not always common,
+so we'll demonstrate using a hash table as a root index into our
+objects, which sidesteps explicit registration calls entirely.  We'll
+create a fresh set to work with.
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+(add-to-root 'my-set my-set)
+
+(setf obj4 (make-instance 'my-test-class :value 4 :reference obj1))
+(setf obj5 (make-instance 'my-test-class :value 5 :reference nil))
+
+(setf hash (make-hash-table))
+(setf (snapshot-root my-set) hash)
+
+(setf (gethash 'obj3 hash) obj3)
+(setf (gethash 'obj4 hash) obj4)
+(setf (gethash 'obj5 hash) obj5)
+
+(snapshot my-set)
+@end lisp
+
+To properly simulate restoring objects, we need to drop our old hash
+table and clear the persistent object cache so that the snapshot set's
+transient slots are reset.
+
+@lisp
+(setf my-set nil)
+(setf hash nil)
+(elephant::flush-instance-cache *store-controller*)
+@end lisp
+
+Now we'll pretend we're starting up a new session.
+
+@lisp
+(setf my-set (get-from-root 'my-set))
+(setf hash (snapshot-root my-set))
+@end lisp
+
+The cache is automatically populated by the implicit @code{restore}
+call during snapshot-set initialization, and our hash table should now
+have all the proper references.  We'll pull out a few.
+
+@lisp
+(setf o4 (gethash 'obj4 hash))
+(setf o3 (gethash 'obj3 hash))
+(setf o2 (test-reference o3))
+
+(not (or (eq o4 obj4)
+         (eq o3 obj3)
+         (eq o2 obj2)))
+=> t
+@end lisp
+
+The new objects should not be @code{eq} to the old ones, as we have
+restored fresh copies from the disk.
+
+If you review the setup above, @code{obj3} references @code{obj2},
+which references @code{obj1}, and @code{obj4} also references
+@code{obj1}.  So if the objects were properly restored, these
+references should be @code{eq}.
+
+@lisp
+(eq (test-reference o2) (test-reference o4))
+=> t
+@end lisp
+
+And finally we can demonstrate the restorative power of snapshot sets.
+
+@lisp
+(remhash 'obj5 hash)
+
+(gethash 'obj5 hash)
+=> nil nil
+
+(restore my-set)
+(setf hash (snapshot-root my-set))
+
+(gethash 'obj5 hash)
+=> #<MY-TEST-CLASS ..> t
+
+(test-value *)
+=> 5
+@end lisp
+
+This means that although our set object was not reset, the restore
+operation properly restored the old reference structure of our root
+hash table.  Unfortunately, in this implementation you have to reset
+your Lisp references to get access to the restored objects.
+
+A future version could traverse the existing object cache, dropping
+new references and restoring old ones, so that in-memory Lisp
+references would remain valid.
+
+@subsection Isolating multiple snapshot sets
 
 A brief note on how to separate out the objects you want to store from
 those you don't may be useful.  We want to snapshot groups of
@@ -281,10 +637,9 @@
 
 Of course this doesn't work for multi-threaded environments, or for

[11 lines skipped]