[elephant-cvs] CVS elephant/doc
ieslick
ieslick at common-lisp.net
Thu Apr 19 05:24:37 UTC 2007
Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv1234/doc
Modified Files:
scenarios.texinfo
Log Message:
Documentation edits; edi's lispworks patch to ele build
--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo 2007/04/12 02:47:23 1.4
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo 2007/04/19 05:24:37 1.5
@@ -1,55 +1,292 @@
@c -*-texinfo-*-
- at node Usage Scenarios
+ at node Design Patterns
@comment node-name, next, previous, up
- at chapter Usage Scenarios
- at cindex Usage Scenarios
+ at chapter Design Patterns
+ at cindex Design Patterns
@menu
-* File Replacement:: Simple deployment of Elephant as file replacement
+* File System Replacement:: Deployment of Elephant as file replacement
+* Checkpointing Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
* Persistent System Objects:: Making persistent objects a natural part of your system
-* Crash Recovery:: How to recover application state from application or system crashes
* Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
* Multithreaded Web Applications:: Elephant is a natural match for web applications
* Graph-oriented Applications:: Elephant is good, but not optimized, for graph-oriented applications.
* Real-World Application Examples:: See some real-world applications Elephant has been used for and a brief discussion of how it was used and any novel uses of Elephant.
@end menu
- at node File Replacement
+This chapter explores different ways that Elephant can be used to
+solve common problems in user programs. The term ``Design Pattern''
+may be overkill as there is no formal specification of patterns.
+However the goals is similar to classical design patterns: provide a
+coherent description of how to approach ceratain common problems using
+Elephant as an enabling tool.
+
+Most of this chapter falls short of a tutorial in the application of a
+pattern. Instead it provides a conceptual guide to implementing the
+pattern along with some code examples to show how Elephant features
+are invoked to support the pattern.
+
+The authors hope that users of Elephant will find this a good source
+of inspiration for how to apply Elephant to their own programs and
+that they will be motivated to contribute design patterns of their own.
+
+
+ at node File System Replacement
@comment node-name, next, previous, up
- at section File Replacement
+ at section File System Replacement
-One of the annoying overheads in writing many programs is persisting
-data between lisp sessions or invocations of that program. Elements
-such as configuration files, raw data such as graphics and other
-formats take time, attention and are a potential source of bugs.
-Elephant can ease these concerns and allow you to work directly with
-your natural in-memory representations with no work to encode/decode
-formats or manage files in the file system.
+One of the more annoying time-wasting activities in programming is
+saving and restoring data from disk. Data in configuration files,
+static data such as graphics and other formats take time and attention
+away from solving the main problem and are additional sources of bugs.
+Because Elephant's serializer supports most lisp types, Elephant can
+greatly simplify ease these concerns and allow you to work directly
+with your natural in-memory representations with almost no work to
+encode/decode formats or manage files in the file system.
The simplest way to accomplish this is to simply open a store
controller and use the root btree as a key-value store instead of a
-file system directory. You might hide some of the complexity sort of
-like this:
+file system directory. You might hide some of the details like this:
@lisp
-(defmacro def-resource (name initializer)
- (assert (symbolp name))
- `(defparameter name (list nil nil ,initializer)))
-
-(defun call-initializer (init)
- (case init-stmt
- (symbol (funcall (symbol-function init-stmt)))
- (list (apply (first init) (rest init)))))
+(defvar *resources* (make-hash-table))
(defun get-resource (name)
- (if (and (symbol-value name)
- (symbol-value-name)
- (let ((newval (get-from-root name)))
- (if newval
- (setq name (add-to-root name newval))
- (setq name (add-to-root name (call-initializer
- at end lisp
+ (multiple-value-bind (value foundp) (gethash name *resources*)
+ (if foundp
+ value
+ (multiple-value-bind (value foundp) (get-from-root name)
+ (if foundp
+ value
+ (error "Resource named ~A was not initialized" name))))))
+
+(defun set-resource (value name)
+ (add-to-root name value)
+ (setf (gethash name *resources*) value))
+
+(defsetf get-resource set-resource)
+ at end lisp
+
+Another simple metaphor is to use Elephant btrees as persistent hash
+tables that persist key-value pairs for you. We'll wrap the Elephant
+btree in a simple class to provide a little conceptual isolation.
+
+ at lisp
+(defclass phash ()
+ ((btree :accessor phash-btree :initarg :btree :initform (make-btree))))
+
+(defun make-persistent-hash (name)
+ (let ((btree (get-from-root name)))
+ (if btree
+ (make-instance 'phash :btree btree)
+ (let ((phash (make-instance 'phash)))
+ (add-to-root name (phash-btree phash))
+ phash))))
+
+(defun getphash (key phash)
+ (get-value key (phash-btree phash)))
+
+(defun setphash (value key phash)
+ (setf (get-value key (phash-btree phash)) value))
+
+(defsetf getphash setphash)
+ at end lisp
+
+Of course to make a proper abstraction we'd want to provide some
+conditions that allowed restarts that initialized values or allowed
+users to update the hash in the background and continue computation.
+
+ at footnote{Example provided by Ian Eslick, April 2007}
+
+ at node Checkpointing Program State
+ at comment node-name, next, previous, up
+ at section Checkpointing Program State
+
+Another challenge for many programs is saving some subset of program
+state. This could involve checkpointing an evolving computation,
+keeping track of state for the purposes of 'undo' or enabling crash
+recovery at key points in the program's execution.
+
+One approach is to transform all our program state into persistent
+objects. However if the use of program state is slot-access
+intensive, this can have a significant performance impact. To improve
+the performance of the application, careful use of transactions is
+needed which further complicates program design and operation.
+
+Can Elephant be used to provide a simple solution that retains the
+in-memory performance that we want? Can we do all this without having
+to put a ton of persistence assumptions into our main program code?
+The answer is yes, assuming you are willing to explicitly checkpoint
+your code and adhere to some simple constraints in accessing your
+program objects.
+
+ at subsection Assumptions
+
+To get speed, we want all our objects to be standard lisp objects that
+are in memory and have no special harnesses that would interfere with
+using the full power of lisp. At some point in execution, we want to
+store the current state of a bunch of objects to disk, but make it
+easy to reproduce the exact state at a later point in time. For
+simplicity, we'll assume that we are talking about collections of CLOS
+objects.
+
+An additional complication is that many programs have sets of
+interdependant objects. These could be complex program graphs, the
+state of an active search process or a standard OO system that uses a
+bunch of program objects to function. This means that we need to
+persist not just object state, but also references and any object that
+is referred to.
+
+Using CLOS reflection we can provide a general solution to capturing
+objects, slot values and references. However to reproduce references,
+we'll need to be able to find the object referenced and the only
+general way to do that is to store it as well. Thus a snapshot is a
+closed set of self-referential objects.
+
+The assumptions required to implement the simple checkpointing
+implemented here is:
+
+ at itemize
+ at item @strong{Use standard CLOS objects and references to other CLOS objects.}
+We need reflection to
+ at item @strong{Use standard hash tables to keep track of sets of objects.}
+Your program should use the hash table as an entry point to find
+objects. When objects are restored, just replace an existing hash
+table with the new one and access your objects that way. Any parts of
+your program that have pointers into your objects but are not
+themselves snapshotted, will need to be able to refresh their pointers
+in some way.
+ at item @strong{Find your root object (s) and know what is ``reachable'' from them.}
+Ensure that you aren't referring to standard objects outside those you
+want to store as they will be stored too (persistent object references
+are fine though). Make sure your root refers to objects that refers
+to other objects and so on such that all objects you want to store can
+be reached by some set of pointer traversals. Looping references are
+fine.
+ at end itemize
+
+ at subsection Implementation: The Snapshot Set
+
+To generalize all this behavior, we will define a new class called a
+snapshot set. The set itself is a persistent object that wraps a
+btree, but provides all the automation to store and recover sets of
+objects.
+
+ at subsection Isolating snapshot sets
+
+A brief note on how to separate out the objects you want to store from
+those you don't may be useful. We want to snapshot groups of
+inter-referential objects without sucking in the whole system in one
+snapshot. These object sets must be closed and fully connected. If
+the program consists of a set of subgraphs, a root element of each
+graph should be stored in a hash table that is then treated as the
+snapshot root.
+
+ at itemize
+ at item @strong{Manual registration:}
+Objects without external references are easy, just @code{register} or
+ at code{unregister} them from the @code{snapshot-set} as needed and then
+map over them to get them back.
+ at item @strong{Implicit registration:}
+Just store objects in a hash that is the root of a @code{snapshot-set}
+and you are good to go.
+ at item @strong{Graphs:}
+Graphs are easy to store as they naturally consist of a closed set of
+objects. If the graph nodes reference other system objects that you
+don't want to store, you'll need to implement something akin to the
+indirection provided here. Just store the root of the graph in the
+snapshot set root and go from there.
+ at item @strong{All instances of a type:}
+Another easy way to create sets is to overload @code{make-instance} to
+store all new objects in a weak hash table that is treated as the root
+of a @code{snapshot-set} (NOTE: I have not verified that weak hashes
+are properly serialized and reproduced - I suspect they are not so you
+might have to copy after a @code{restore}).
+ at end itemize
+
+For more complex applications, you can isolate these closed sets of
+objects by using @code{snapshot-set} root hash tables as an
+indirection mechanism. Instead of storing direct references in an
+object slot or hash value, isolation is ensured by storing keys and
+indirecting through a hash table to get the target object. This can
+be hidden from the programmer in multiple ways. The easiest way is
+just to make sure that when you store references you store a key and
+overload the slot accessor. A sketch of this follows:
+
+ at lisp
+(defparameter *island1-hash* (make-hash-table))
+(defparameter *island2-hash* (make-hash-table))
+(defvar *unique-id* 0)
+
+(defclass island1-object ()
+ ((pointer-to-island1 :accessor child :initform nil)
+ (pointer-to-island2 :accessor neighbor :initform nil)))
+
+(defmethod neighbor :around ((obj island1-object))
+ (let ((key (call-next-method)))
+ (when key (gethash key *island2-hash*))))
+
+(defmethod (setf neighbor) :around (ref (obj island1-object))
+ (cond ((subtypep (type-of ref) 'island2-object)
+ (let ((key (find-object ref *island2-hash*)))
+ (if key
+ (progn
+ (call-next-method key obj)
+ obj)
+ (progn
+ (setf (gethash (incf *unique-id*) *island2-hash*) ref)
+ (call-next-method *unique-id* obj)
+ obj))))
+ (t (call-next-method))))
+
+(defun find-object (obj hash)
+ (map-hash (lambda (k v)
+ (declare (ignore k))
+ (if (eq obj v)
+ (return-from find-object obj)))
+ hash))
+ at end lisp
+
+The same template would apply to @code{island2} references to
+ at code{island1} objects. You could further simplify creating these
+hash table indirections with a little macro:
+
+ at lisp
+(defmacro def-snapshot-reference-wrapper (accessor-name (source-classname target-classname hashname uid))
+ (with-gensysms (obj key ref)
+ `(progn
+ (defmethod ,accessorname :around ((,obj ,source-classname))
+ (let ((,key (call-next-method)))
+ (when ,key (gethash ,key ,hashname))))
+ (defmethod (setf ,accessorname) :around (,ref (,obj ,source-classname))
+ (cond ((subtypep (type-of ,ref) ,target-classname)
+ (let ((,key (find-object ,ref ,hashname)))
+ (if ,key
+ (progn
+ (call-next-method ,key ,obj)
+ ,obj)
+ (progn
+ (setf (gethash (incf ,uid) ,hashname) ,ref)
+ (call-next-method ,uid ,obj)
+ ,obj))))
+ (t (call-next-method)))))))
+
+(defclass island2-object ()
+ ((pointer-to-island2 :accessor child :initform nil)
+ (pointer-to-island1 :accessor neighbor :initform nil)))
+
+(def-snapshot-reference-wrapper neighbor (island2 island1 *island1-hash* *unique-id*))
+ at end lisp
+
+Of course this doesn't work for multi-threaded environments, or for
+separating more complex collections of types. I am also sure that
+more elegant solutions could be generated.
+
+In most cases, we assume the user will have a natural collection of objects
+that is closed
+
+ at footnote{Example provided by Ian Eslick, April 2007}
@node Persistent System Objects
@comment node-name, next, previous, up
@@ -62,10 +299,6 @@
@item Look up objects using class indices
@end itemize
- at node Crash Recovery
- at comment node-name, next, previous, up
- at section Crash Recovery
-
@node Elephant as Database
@comment node-name, next, previous, up
@section Elephant as Database
@@ -135,7 +368,7 @@
@item Process Components
@item Bulk storage of post-processed web data
@item Class indexes on strings
- at item Cheap associations
+ at item User associations
@item Inverted document index
@end itemize
More information about the Elephant-cvs
mailing list