[elephant-cvs] CVS elephant/doc

ieslick ieslick at common-lisp.net
Thu Apr 19 05:24:37 UTC 2007


Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv1234/doc

Modified Files:
	scenarios.texinfo 
Log Message:
Documentation edits; edi's lispworks patch to ele build

--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/12 02:47:23	1.4
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/19 05:24:37	1.5
@@ -1,55 +1,292 @@
 @c -*-texinfo-*-
 
- at node Usage Scenarios
+ at node Design Patterns
 @comment node-name, next, previous, up
- at chapter Usage Scenarios
- at cindex Usage Scenarios
+ at chapter Design Patterns
+ at cindex Design Patterns
 
 @menu 
-* File Replacement:: Simple deployment of Elephant as file replacement
+* File System Replacement:: Deployment of Elephant as file replacement
+* Checkpointing Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
 * Persistent System Objects:: Making persistent objects a natural part of your system
-* Crash Recovery:: How to recover application state from application or system crashes
 * Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
 * Multithreaded Web Applications:: Elephant is a natural match for web applications
 * Graph-oriented Applications:: Elephant is good, but not optimized, for graph-oriented applications.
 * Real-World Application Examples:: See some real-world applications Elephant has been used for and a brief discussion of how it was used and any novel uses of Elephant.
 @end menu
 
- at node File Replacement
+This chapter explores different ways that Elephant can be used to
+solve common problems in user programs.  The term ``Design Pattern''
+may be overkill as there is no formal specification of patterns.
+However the goals is similar to classical design patterns: provide a
+coherent description of how to approach ceratain common problems using
+Elephant as an enabling tool.
+
+Most of this chapter falls short of a tutorial in the application of a
+pattern.  Instead it provides a conceptual guide to implementing the
+pattern along with some code examples to show how Elephant features
+are invoked to support the pattern.
+
+The authors hope that users of Elephant will find this a good source
+of inspiration for how to apply Elephant to their own programs and
+that they will be motivated to contribute design patterns of their own.
+
+
+ at node File System Replacement
 @comment node-name, next, previous, up
- at section File Replacement
+ at section File System Replacement
 
-One of the annoying overheads in writing many programs is persisting
-data between lisp sessions or invocations of that program.  Elements
-such as configuration files, raw data such as graphics and other
-formats take time, attention and are a potential source of bugs.
-Elephant can ease these concerns and allow you to work directly with
-your natural in-memory representations with no work to encode/decode
-formats or manage files in the file system.
+One of the more annoying time-wasting activities in programming is
+saving and restoring data from disk.  Data in configuration files,
+static data such as graphics and other formats take time and attention
+away from solving the main problem and are additional sources of bugs.
+Because Elephant's serializer supports most lisp types, Elephant can
+greatly simplify ease these concerns and allow you to work directly
+with your natural in-memory representations with almost no work to
+encode/decode formats or manage files in the file system.
 
 The simplest way to accomplish this is to simply open a store
 controller and use the root btree as a key-value store instead of a
-file system directory.  You might hide some of the complexity sort of
-like this:
+file system directory.  You might hide some of the details like this:
 
 @lisp
-(defmacro def-resource (name initializer)
-  (assert (symbolp name))
-  `(defparameter name (list nil nil ,initializer)))
-
-(defun call-initializer (init)
-  (case init-stmt
-    (symbol (funcall (symbol-function init-stmt)))
-    (list (apply (first init) (rest init)))))
+(defvar *resources* (make-hash-table))
 
 (defun get-resource (name)
-  (if (and (symbol-value name)
-      (symbol-value-name)
-      (let ((newval (get-from-root name)))
-        (if newval
-            (setq name (add-to-root name newval))
-            (setq name (add-to-root name (call-initializer
- at end lisp 
+  (multiple-value-bind (value foundp) (gethash name *resources*)
+    (if foundp
+        value
+        (multiple-value-bind (value foundp) (get-from-root name)
+           (if foundp
+               value
+               (error "Resource named ~A was not initialized" name))))))
+
+(defun set-resource (value name)
+  (add-to-root name value)
+  (setf (gethash name *resources*) value))
+
+(defsetf get-resource set-resource)
+ at end lisp
+
+Another simple metaphor is to use Elephant btrees as persistent hash
+tables that persist key-value pairs for you.  We'll wrap the Elephant
+btree in a simple class to provide a little conceptual isolation.
+
+ at lisp
+(defclass phash ()
+  ((btree :accessor phash-btree :initarg :btree :initform (make-btree))))
+
+(defun make-persistent-hash (name)
+  (let ((btree (get-from-root name)))
+    (if btree
+        (make-instance 'phash :btree btree)
+        (let ((phash (make-instance 'phash)))
+           (add-to-root name (phash-btree phash))
+           phash))))
+
+(defun getphash (key phash)
+  (get-value key (phash-btree phash)))
+
+(defun setphash (value key phash)
+  (setf (get-value key (phash-btree phash)) value))
+
+(defsetf getphash setphash)
+ at end lisp
+
+Of course to make a proper abstraction we'd want to provide some
+conditions that allowed restarts that initialized values or allowed
+users to update the hash in the background and continue computation.
+
+ at footnote{Example provided by Ian Eslick, April 2007}
+
+ at node Checkpointing Program State
+ at comment node-name, next, previous, up
+ at section Checkpointing Program State
+
+Another challenge for many programs is saving some subset of program
+state.  This could involve checkpointing an evolving computation,
+keeping track of state for the purposes of 'undo' or enabling crash
+recovery at key points in the program's execution.  
+
+One approach is to transform all our program state into persistent
+objects.  However if the use of program state is slot-access
+intensive, this can have a significant performance impact.  To improve
+the performance of the application, careful use of transactions is
+needed which further complicates program design and operation.
+
+Can Elephant be used to provide a simple solution that retains the
+in-memory performance that we want?  Can we do all this without having
+to put a ton of persistence assumptions into our main program code?
+The answer is yes, assuming you are willing to explicitly checkpoint
+your code and adhere to some simple constraints in accessing your
+program objects.
+
+ at subsection Assumptions
+
+To get speed, we want all our objects to be standard lisp objects that
+are in memory and have no special harnesses that would interfere with
+using the full power of lisp.  At some point in execution, we want to
+store the current state of a bunch of objects to disk, but make it
+easy to reproduce the exact state at a later point in time.  For
+simplicity, we'll assume that we are talking about collections of CLOS
+objects.
+
+An additional complication is that many programs have sets of
+interdependant objects.  These could be complex program graphs, the
+state of an active search process or a standard OO system that uses a
+bunch of program objects to function.  This means that we need to
+persist not just object state, but also references and any object that
+is referred to.
+
+Using CLOS reflection we can provide a general solution to capturing
+objects, slot values and references.  However to reproduce references,
+we'll need to be able to find the object referenced and the only
+general way to do that is to store it as well.  Thus a snapshot is a
+closed set of self-referential objects.  
+
+The assumptions required to implement the simple checkpointing
+implemented here is:
+
+ at itemize
+ at item @strong{Use standard CLOS objects and references to other CLOS objects.}
+We need reflection to 
+ at item @strong{Use standard hash tables to keep track of sets of objects.}
+Your program should use the hash table as an entry point to find
+objects.  When objects are restored, just replace an existing hash
+table with the new one and access your objects that way.  Any parts of
+your program that have pointers into your objects but are not
+themselves snapshotted, will need to be able to refresh their pointers
+in some way.
+ at item @strong{Find your root object (s) and know what is ``reachable'' from them.}
+Ensure that you aren't referring to standard objects outside those you
+want to store as they will be stored too (persistent object references
+are fine though).  Make sure your root refers to objects that refers
+to other objects and so on such that all objects you want to store can
+be reached by some set of pointer traversals.  Looping references are
+fine.
+ at end itemize
+
+ at subsection Implementation: The Snapshot Set
+
+To generalize all this behavior, we will define a new class called a
+snapshot set.  The set itself is a persistent object that wraps a
+btree, but provides all the automation to store and recover sets of
+objects.
+
+ at subsection Isolating snapshot sets
+
+A brief note on how to separate out the objects you want to store from
+those you don't may be useful.  We want to snapshot groups of
+inter-referential objects without sucking in the whole system in one
+snapshot.  These object sets must be closed and fully connected.  If
+the program consists of a set of subgraphs, a root element of each
+graph should be stored in a hash table that is then treated as the
+snapshot root.
+
+ at itemize
+ at item @strong{Manual registration:} 
+Objects without external references are easy, just @code{register} or
+ at code{unregister} them from the @code{snapshot-set} as needed and then
+map over them to get them back.
+ at item @strong{Implicit registration:} 
+Just store objects in a hash that is the root of a @code{snapshot-set}
+and you are good to go.
+ at item @strong{Graphs:} 
+Graphs are easy to store as they naturally consist of a closed set of
+objects.  If the graph nodes reference other system objects that you
+don't want to store, you'll need to implement something akin to the
+indirection provided here.  Just store the root of the graph in the
+snapshot set root and go from there.
+ at item @strong{All instances of a type:} 
+Another easy way to create sets is to overload @code{make-instance} to
+store all new objects in a weak hash table that is treated as the root
+of a @code{snapshot-set} (NOTE: I have not verified that weak hashes
+are properly serialized and reproduced - I suspect they are not so you
+might have to copy after a @code{restore}).
+ at end itemize
+
+For more complex applications, you can isolate these closed sets of
+objects by using @code{snapshot-set} root hash tables as an
+indirection mechanism.  Instead of storing direct references in an
+object slot or hash value, isolation is ensured by storing keys and
+indirecting through a hash table to get the target object.  This can
+be hidden from the programmer in multiple ways.  The easiest way is
+just to make sure that when you store references you store a key and
+overload the slot accessor.  A sketch of this follows:
+
+ at lisp
+(defparameter *island1-hash* (make-hash-table))
+(defparameter *island2-hash* (make-hash-table))
+(defvar *unique-id* 0)
+
+(defclass island1-object ()
+  ((pointer-to-island1 :accessor child :initform nil)
+   (pointer-to-island2 :accessor neighbor :initform nil)))
+
+(defmethod neighbor :around ((obj island1-object))
+  (let ((key (call-next-method)))
+     (when key (gethash key *island2-hash*))))
+
+(defmethod (setf neighbor) :around (ref (obj island1-object))
+  (cond ((subtypep (type-of ref) 'island2-object)
+         (let ((key (find-object ref *island2-hash*)))
+           (if key 
+               (progn
+                 (call-next-method key obj)
+                 obj)
+               (progn
+                 (setf (gethash (incf *unique-id*) *island2-hash*) ref)
+                 (call-next-method *unique-id* obj)
+                 obj))))
+        (t (call-next-method))))
+
+(defun find-object (obj hash)
+   (map-hash (lambda (k v)
+               (declare (ignore k))
+               (if (eq obj v)
+                   (return-from find-object obj)))
+              hash))
+ at end lisp
+
+The same template would apply to @code{island2} references to
+ at code{island1} objects.  You could further simplify creating these
+hash table indirections with a little macro:
+
+ at lisp
+(defmacro def-snapshot-reference-wrapper (accessor-name (source-classname target-classname hashname uid))
+  (with-gensysms (obj key ref)
+   `(progn
+      (defmethod ,accessorname :around ((,obj ,source-classname))
+         (let ((,key (call-next-method)))
+           (when ,key (gethash ,key ,hashname))))
+      (defmethod (setf ,accessorname) :around (,ref (,obj ,source-classname))
+         (cond ((subtypep (type-of ,ref) ,target-classname)
+                (let ((,key (find-object ,ref ,hashname)))
+                  (if ,key 
+                      (progn
+                        (call-next-method ,key ,obj)
+                        ,obj)
+                      (progn
+                        (setf (gethash (incf ,uid) ,hashname) ,ref)
+                        (call-next-method ,uid ,obj)
+                        ,obj))))
+               (t (call-next-method)))))))
+
+(defclass island2-object ()
+  ((pointer-to-island2 :accessor child :initform nil)
+   (pointer-to-island1 :accessor neighbor :initform nil)))
+
+(def-snapshot-reference-wrapper neighbor (island2 island1 *island1-hash* *unique-id*))
+ at end lisp
+
+Of course this doesn't work for multi-threaded environments, or for
+separating more complex collections of types.  I am also sure that
+more elegant solutions could be generated.
+
+In most cases, we assume the user will have a natural collection of objects
+that is closed
+
+ at footnote{Example provided by Ian Eslick, April 2007}
 
 @node Persistent System Objects
 @comment node-name, next, previous, up
@@ -62,10 +299,6 @@
 @item Look up objects using class indices
 @end itemize
 
- at node Crash Recovery
- at comment node-name, next, previous, up
- at section Crash Recovery
-
 @node Elephant as Database
 @comment node-name, next, previous, up
 @section Elephant as Database
@@ -135,7 +368,7 @@
 @item Process Components
 @item Bulk storage of post-processed web data
 @item Class indexes on strings
- at item Cheap associations
+ at item User associations
 @item Inverted document index
 @end itemize   
 




More information about the Elephant-cvs mailing list