[elephant-cvs] CVS elephant/doc

Sun Apr 1 20:22:24 UTC 2007

Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv22287

Modified Files:
	tutorial.texinfo user-guide.texinfo 
Log Message:
Documentation changes, mostly to transaction section of tutorial

--- /project/elephant/cvsroot/elephant/doc/tutorial.texinfo	2007/04/01 14:33:29	1.11
+++ /project/elephant/cvsroot/elephant/doc/tutorial.texinfo	2007/04/01 20:22:24	1.12
@@ -732,6 +732,8 @@
 transaction that performs all the updates atomically and thus
 enforcing consistency.
 
+@subsection Why do we need Transactions?
+
 Most real applications will need to use explicit transactions rather
 than relying on the primitives alone because you will want multiple
 read-modify-update operations to act as an atomic unit.  A good example
@@ -815,6 +817,8 @@
 And presto, we have an ACID compliant, thread-safe, persistent banking
 system!  
 
+@subsection Using @code{with-transaction}
+
 What is @code{with-transaction} really doing for us?  It first starts
 a new transaction, attempts to execute the body, and if successful
 commits the transaction.  If anywhere along the way there is a deadlock
@@ -823,14 +827,145 @@
 to retry the transaction a fixed number of times by re-executing the
 whole body.
 
-The other value transactions provide is the capability to delay
-flushing dirty data to disk.  The most time-intensive part of
-persistent operations is flushing newly written data to disk.  Using
-the default auto-commit behavior requires a flush on every operation
-which can become very expensive.  Because a transaction caches values,
-all the values read or written are cached in memory until the
-transaction completes, dramatically decreasing the number of flushes
-and the total time taken.
+And this brings us to two important caveats: nested transactions and
+idempotent side-effects.
+
+@subsection Nesting Transactions
+
+In general, you want to avoid nesting @code{with-transaction}
+statements.  Nested transactions are valid for some data stores
+(namely Berkeley DB), but typically only a single transaction can be
+active at a time.  The purpose of a nested transaction, in data stores
+that provide it, is to break a long transaction into chunks.  This way
+if there is contention on a given subset of variables, only the inner
+transaction is restarted while the larger transaction can continue.
+When inner transactions commit, their results become part of the
+outer transaction until it in turn commits.
+
+If you have transaction protected primitive operations (such as
+@code{deposit} and @code{withdraw}) and you want to perform a group of
+such transactions, for example a transfer between accounts, you can
+use the macro @code{ensure-transaction} instead of @code{with-transaction}.
+
+@lisp
+(defun deposit (account amount)
+  "Wrap the balance read and the setf with the new balance"
+  (ensure-transaction ()
+    (let ((balance (balance account)))
+      (setf (balance account) 
+            (+ balance amount)))))
+
+(defun deposit (account amount)
+  "A more concise version with incf doing both read and write"
+  (ensure-transaction ()
+    (incf (balance account) amount)))
+
+(defun withdraw (account amount)
+  (ensure-transaction ()
+    (decf (balance account) amount)))
+
+(defun transfer (src dst amount)
+  "There are four primitive read/write operations 
+   grouped together in this transaction"
+  (with-transaction ()
+    (withdraw src amount)
+    (deposit dst amount)))
+@end lisp
+
+@code{ensure-transaction} is exactly like @code{with-transaction}
+except that it will reuse an existing transaction if there is one;
+otherwise it creates a new one.  There is no harm, in fact, in using
+this macro all the time.
+
+Notice the use of @code{decf} and @code{incf} above.  The primary
+reason to use Lisp is that it is good at hiding complexity using
+shorthand constructs just like this.  This means it is also going
+to be good at hiding data dependencies that should be captured in a
+transaction!
+
+@subsection Idempotent Side Effects
+
+Within the body of a @code{with-transaction}, any non-database
+operations need to be @emph{idempotent}.  That is, the side effects of
+the body must be the same no matter how many times the body is
+executed.  This is done automatically for side effects on the database,
+but not for side effects like pushing a value on a lisp list, or
+creating a new standard object.
+
+@lisp
+(defparameter *transient-objects* nil)
+
+(defun load-transients ()
+   "This is the wrong way!"
+   (with-transaction ()
+      (loop for i from 0 upto 3 do
+         (push (get-from-root i) *transient-objects*))))
+@end lisp
+
+In this contrived example we are pulling a set of standard objects
+from the database using an integer key and pushing them onto a list
+for later use.  However, if there is a conflict where some other
+process writes a key-value pair to a matching key, the whole
+transaction will abort and the loop will be run again.  In a heavily
+contended system you might see results like the following.
+
+@lisp
+(defun test-list ()
+   (setf *transient-objects* nil)
+   (load-transients)
+   (length *transient-objects*))
+
+(test-list)
+=> 3
+
+(test-list)
+=> 5
+
+(test-list)
+=> 4
+@end lisp
+
+So the solution is to update the lisp variable only once, with the
+value returned after the transaction completes successfully.
+
+@lisp
+(defun load-transients ()
+  "This is a better way"
+  (setq *transient-objects*
+        (with-transaction ()
+            (loop for i from 0 upto 3 collect
+                  (get-from-root i)))))
+@end lisp
+
+Of course we would need to use @code{nreverse} if we cared about the
+order of instances in @code{*transient-objects*}.  The best rule of
+thumb is that transaction bodies should be purely functional as above,
+except for side effects to the persistent store (such as persistent
+slot writes, adding to btrees, etc.).
+
+If you do need side effects to lisp memory, such as writes to
+transient slots, make sure they are idempotent and that other
+processes will not be reading the written values until the transaction
+completes.
+
+@subsection Transactions and Performance
+
+By now transactions almost look like more work than they are worth!
+Well, there are still some significant benefits to be had.  Part of how
+transactions are implemented is that they gather together all the
+writes that are supposed to be made to the database and store them
+until the transaction commits, and then write them atomically.
+
+The most time-intensive part of persistent operations is flushing
+newly written data to disk.  Using the default auto-committing
+behavior requires a flush for every primitive write operation.  This
+can become very expensive!  Because all the values read or written are
+cached in memory until the transaction completes, the number of
+flushes can be dramatically reduced.
+
+But don't take my word for it; run the following statements and see
+for yourself the visceral impact transactions can have on system
+performance.
 
 @lisp
 (defpclass test ()
@@ -872,52 +1007,42 @@
 thumb is to keep the number of objects touched in a transaction well
 under 1000.
 
-And this brings us to the last caveat we'll introduce in this
-introductory tutorial: nested transactions.
-
-In general, avoid nesting transactions.  Nested transactions are valid
-for some data stores (namely Berkeley DB), but typically only a single
-transaction is valid at a time.  The purpose of a nested transaction
-is to allow a long transaction to be broken up into chunks.  This way
-if there is contention on a given subset of variables, only the
-subtransaction is restarted while the larger transaction can continue.
-Subtransactions commit their results and they become part of the
-outer transaction until it in turn commits.
-
-If you have transaction protected primitive operations (such as
-@code{deposit} and @code{withdraw}) and you want to perform a group of
-such transactions, for example a transfer between accounts, you can
-use the macro @code{ensure-transaction} instead of @code{with-transaction}.
-
-@lisp
-(defun deposit (account amount)
-  (ensure-transaction ()
-    (let ((balance (balance account)))
-      (setf (balance account) 
-            (+ balance amount)))))
-
-(defun withdraw (account amount)
-  (ensure-transaction ()
-    (decf (balance account) amount)))
-
-(defun transfer (src dst amount)
-  (with-transaction ()
-    (withdraw src amount)
-    (deposit dst amount)))
-@end lisp
-
-@code{ensure-transaction} is exactly like @code{with-transaction}
-except it will reuse an existing transaction, if there is one, or
-create a new one.  There is no harm, in fact, in using this macro all
-the time.
+@subsection Transactions and Applications
 
 Designing and tuning a transactional architecture can become quite
-complicated.  The best strategy at the beginning is a conservative
-one, break things up into the smallest logical sets of primitive
-operations and only wrap higher level functions in transactions when
-they absolutely have to commit together. See @ref{Transaction Details}
-for the full details and @pxref{Usage Scenarios} for more examples of
-how systems can be designed and tuned using transactions.
+complex.  Moreover, bugs in your system can be very difficult to find
+as they only show up when transactions are interleaved within a
+larger, multi-threaded application.  
+
+In many cases, however, you can ignore transactions, for example
+when you don't have any other concurrent processes running.  In this
+case all operations are sequential and there is no chance of
+conflicts.  You would only want to use transactions for write
+performance.
+
+You can also ignore transactions if your application can guarantee
+that concurrency won't generate any conflicts.  For example, a web app
+that guarantees only one thread will write to objects in a particular
+session can avoid transactions altogether.  However, it is good to be
+careful about making these assumptions.  In the above example, a
+reporting function that iterates over sessions, users or other objects
+may still see partial updates (e.g. a user's id was written prior to
+the query, but not the name).  However, if you don't care about these
+infrequent glitches, this case would still hold.
+
+If these cases don't apply to your application, or you aren't sure,
+you will fare best by programming defensively.  Break your system into
+the smallest logical sets of primitive operations
+(e.g. @code{withdraw} and @code{deposit}) using
+@code{ensure-transaction} and then wrap the highest level calls made
+to your system in with-transaction when the operations absolutely have
+to commit together or you need the extra performance.  Try not to have
+more than two levels of transactional accesses, with the top using
+with-transaction and the bottom using ensure-transaction.
+
+@xref{Transaction Details}, for more details, and see @ref{Usage
+Scenarios} for examples of how systems can be designed and tuned using
+transactions.
 
 @node Advanced Topics
 @comment node-name, next, previous, up
--- /project/elephant/cvsroot/elephant/doc/user-guide.texinfo	2007/04/01 14:33:29	1.5
+++ /project/elephant/cvsroot/elephant/doc/user-guide.texinfo	2007/04/01 20:22:24	1.6
@@ -23,26 +23,6 @@
 * Performance Tuning:: How to get the most from Elephant.
 @end menu
 
-@node Persistent objects
-@comment node-name, next, previous, up
-@section Persistent Objects
-
-Finally, if you for some reason make an instance with a specified OID
-which already exists in the database, @code{initargs} take precedence
-over values in the database, which take precedences over
-@code{initforms}.
-
-Also currently there is a bug where
-@code{initforms} are always evaluated, so beware.
-(What is the current model here?)
-
-Readers, writers, accessors, and @code{slot-value-using-class} are
-employed in redirecting slot accesses to the database, so override
-these with care.  Because @code{slot-value, slot-boundp,
-slot-makunbound} are not generic functions, they are not guaranteed by
-the specification to work properly with persistent slots.  However the
-proper behavior has been verified on SBCL, Allegro and Lispworks.  
-
 @node The Store Controller
 @comment node-name, next, previous, up
 @section The Store Controller
@@ -90,6 +70,26 @@
 
 Empty.
 
+@node Persistent objects
+@comment node-name, next, previous, up
+@section Persistent Objects
+
+Finally, if you for some reason make an instance with a specified OID
+which already exists in the database, @code{initargs} take precedence
+over values in the database, which take precedence over
+@code{initforms}.
+
+Also currently there is a bug where
+@code{initforms} are always evaluated, so beware.
+(What is the current model here?)
+
+Readers, writers, accessors, and @code{slot-value-using-class} are
+employed in redirecting slot accesses to the database, so override
+these with care.  Because @code{slot-value, slot-boundp,
+slot-makunbound} are not generic functions, they are not guaranteed by
+the specification to work properly with persistent slots.  However the
+proper behavior has been verified on SBCL, Allegro and Lispworks.  
+
 @node Class Indices
 @comment node-name, next, previous, up
 @section Class Indices
@@ -141,6 +141,111 @@
 @comment node-name, next, previous, up
 @section Querying persistent instances
 
+
+
+A SQL select-like interface is in the works, but for now queries are
+limited to manual mapping over class instances or doing small queries
+with @code{get-instances-*} functions.  One advantage of this is that
+it is easy to estimate the performance costs of your queries and to
+choose standard and derived indices that give you the ordering and
+performance you want.
+
+There is, however, a quick and dirty query API example that is not
+officially supported in the release but is intended to invite comment.
+This is an example of a full query system that would automatically
+perform joins, use the appropriate indices and perhaps even adaptively
+suggest or add indices to facilitate better performance on common
+queries.
+
+There are two functions @ref{Function elephant:get-query-instances}
+and @ref{Function elephant:map-class-query} which accept a set of
+constraints instead of the familiar value or range arguments.
+
+We'll use the classes @code{person} and @code{department} to
+illustrate how to perform queries over a set of objects that may be
+constrained by their relationships to other objects.
+
+@lisp
+(defpclass person ()
+  ((name :initarg :name :index t)
+   (salary :initarg :salary :index t)
+   (department :initarg :department)))
+
+(defmethod print-object ((p person) stream)
+  (format stream "#<PERS: ~A>" (slot-value p 'name)))
+
+(defun print-name (inst)
+  (format t "Name: ~A~%" (slot-value inst 'name)))
+
+(defpclass department ()
+  ((name :initarg :name)
+   (manager :initarg :manager)))
+
+(defmethod print-object ((d department) stream)
+  (format stream "#<DEPT ~A, mgr = ~A>"
+          (slot-value d 'name)
+          (when (slot-boundp d 'manager)
+                (slot-value (slot-value d 'manager) 'name))))
+@end lisp
+
+Here we have a simple employee database with managers (also of type
+person) and departments.  This simple system will provide fodder for
+some reasonably complex constraints.  Let's create a few departments.
+
+@lisp
+(setf marketing (make-instance 'department :name "Marketing"))
+(setf engineering (make-instance 'department :name "Engineering"))
+(setf sales (make-instance 'department :name "Sales"))
+@end lisp
+
+And manager @code{person} instances for the departments.
+
+@lisp
+(make-instance 'person :name "George" :salary 140000 :department marketing)
+(setf (slot-value marketing 'manager) *)
+
+(make-instance 'person :name "Sally" :salary 140000 :department engineering)
+(setf (slot-value engineering 'manager) *)
+
+(make-instance 'person :name "Freddy" :salary 180000 :department sales)
+(setf (slot-value sales 'manager) *)
+@end lisp
+
+And of course we need some folks to manage.
+
+@lisp
+(defparameter *names*
+  '("Jacob" "Emily" "Michael" "Joshua" "Andrew" "Olivia" "Hannah" "Christopher"))
+
+(defun random-element (list)
+  "Choose a random element from the list and return it"
+  (nth (random (length list)) list))
+
+(with-transaction ()
+  (loop for i from 0 upto 40 do
+    (make-instance 'person
+      :name (format nil "~A~A" (random-element *names*) i)
+      :salary (floor (+ (* (random 1000) 100) 30000))
+      :department (case (random 3)
+                    (0 marketing)
+                    (1 engineering)
+                    (2 sales)))))
+@end lisp
+
+In the following examples, your results will differ because employee
+names, salaries and departments are assigned at random.  However, these
+examples are illustrative of what you should see if you run the same
+code.
+
+
+
+For those familiar with SQL, if an instance of @code{person} has a
+pointer to an instance of @code{department} then that relation can be
+used to perform a join.  Of course joins in the object world won't
+return a table; instead they will return conjunctions of objects that
+satisfy a mutual set of constraints.
+
+
 @node Using BTrees
 @comment node-name, next, previous, up
 @section Using BTrees
@@ -174,6 +279,14 @@
 @comment node-name, next, previous, up
 @section Transaction Details
 
+You can trace @code{elephant::execute-transaction} to see the sequence
+of calls to @code{execute-transaction} that occur dynamically and
+detect where transactions are and are not happening.  We may add some
+transaction diagnosis and tracing tools in the future, such as
+throwing a condition when @code{with-transaction} forms are nested
+dynamically.
+
+
 ;; Transaction architecture:
 ;;
 ;; User and designer considerations: