[elephant-cvs] CVS elephant/doc

ieslick ieslick at common-lisp.net
Wed Apr 4 15:28:28 UTC 2007


Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv9316/doc

Modified Files:
	installation.texinfo scenarios.texinfo tutorial.texinfo 
	user-guide.texinfo 
Log Message:
Added support for complex serialization (no sorting), latest doc changes and a preliminary GC wrapper

--- /project/elephant/cvsroot/elephant/doc/installation.texinfo	2007/04/02 00:51:06	1.7
+++ /project/elephant/cvsroot/elephant/doc/installation.texinfo	2007/04/04 15:28:28	1.8
@@ -175,31 +175,32 @@
 @subsection Packages
 
 Now that Elephant has been loaded, you can call @code{use-package} in
-the cl-user package or create a new package that imports the symbols
-exported from package :elephant.
+the cl-user package, 
 
 @lisp
 CL-USER> (use-package :elephant)
 => T
+@end lisp
 
-OR
+use a predefined user package, 
 
-(defpackage :elephant-user 
-  (:use :common-lisp :elephant))
+@lisp
+CL-USER> (in-package :elephant-user)
+=> T
+
+ELE-USER>
 @end lisp
 
-Beginners can skip to the end of this section.  
+or import the symbols into your own project package from :elephant.
 
-Elephant has a common package called elephant that exports a set of
-generic functions.  It also contains a dispatcher based on the first
-element of a specification list that calls the relevant backend
-version of @code{open-controller}, the internal method that creates a
-@code{store-controller}.  Each backend has it's own subclass
-implementing the abstract interface of @code{store-controller}.
+@lisp
+(defpackage :my-project
+  (:use :common-lisp :elephant))
+@end lisp
 
 @subsection Opening a Store
 
-As discussed in the tutoral, you can now open a store to begin using
+As discussed in the tutorial, you need to open a store to begin using
 Elephant:
 
 @lisp
--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/02 13:09:46	1.2
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/04 15:28:28	1.3
@@ -5,39 +5,138 @@
 @chapter Usage Scenarios
 @cindex Usage Scenarios
 
-Sorry, haven't written this section yet.
-
-Simple file replacement and indexing
-- Keep track of ordinary objects, ignore metaprotocol
-
-Persist system objects
-- Intermingle persistent objects and regular objects
-- Look up objects using class indices
-
-Full database system
-- storage, rich data models, references, queries, etc
-
-Multithreaded web applications
-- DB + multithreading
-
-Object-oriented data storage, large graph traversals
-
-
-@node Commercial Applications
-
-Elephant is used by Konsenti(tm), a for-profit company of Robert L. Read, one of the maintainers of Elephant.  It can be visited at
-@uref{http://konsenti.com}.  Konsenti uses the Data Collection Management (DCM) package, which can be
-found in the contrib directory, under user rread.  DCM provides prevalence-style in-memory write-through caching.
-The most enjoyable feature about Elephant for this project is that new Business Layer objects can be created without having to 
-deal with an Object-Relational Mapping, which has allowed extremely rapid development.  All Business objects are persisted via
-a director in DCM (which sits on top of Elephant.)  Many of these business objects are in fact finite state machines decorated 
-with functions.  The functions are represented by their lambda expression stored in slots on the business objects.  A complete
-Message Factory and double-entry accounting system are also implemented as DCM objects.  Binary objects, such as uploaded 
-PDFs that can be attached to objects as comments, are treated as simple objects and stored directly in Elephant.  Konsenti
-is completely based on utf-8, and unicode characters outside of the ISO-9959-1 character set are routinely stored in 
-Elephant.  Konsenti uses Postgres as a backend; but Elephant makes it so easy to migrate between repositories that we 
-could change this decision at any time.
-
-
+@menu
+* File Replacement:: Simple deployment of Elephant as file replacement
+* Persistent System Objects:: Making persistent objects a natural part of your system
+* Crash Recovery:: How to recover application state from application or system crashes
+* Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
+* Multithreaded Web Applications:: Elephant is a natural match for web applications
+* Graph-oriented Applications:: Elephant is good, but not optimized, for graph-oriented applications.
+* Real-World Application Examples:: See some real-world applications Elephant has been used for and a brief discussion of how it was used and any novel uses of Elephant.
+@end menu
+
+@node File Replacement
+@comment node-name, next, previous, up
+@section File Replacement
+
+One of the annoying overheads in writing many programs is persisting
+data between lisp sessions or invocations of that program.  Managing
+configuration files, raw data such as graphics, and other file
+formats takes time and attention and is a potential source of bugs.
+Elephant can ease these concerns and allow you to work directly with
+your natural in-memory representations with no work to encode/decode
+formats or manage files in the file system.
+
+The simplest way to accomplish this is to open a store controller
+and use the root btree as a key-value store instead of a file system
+directory.  You might hide some of the complexity with helpers along
+these lines:
+
+@lisp
+(defmacro def-resource (name initializer)
+  (assert (symbolp name))
+  `(defparameter ,name (list nil nil ,initializer)))
+
+(defun call-initializer (init)
+  (etypecase init
+    (symbol (funcall (symbol-function init)))
+    (list (apply (first init) (rest init)))))
+
+(defun get-resource (name)
+  "Return the cached value, else the stored value, else a fresh one."
+  (let ((resource (symbol-value name)))
+    (or (first resource)
+        (setf (first resource)
+              (or (get-from-root name)
+                  (add-to-root name (call-initializer (third resource))))))))
+@end lisp
+
+@node Persistent System Objects
+@comment node-name, next, previous, up
+@section Persistent System Objects
+
+Persist system objects: 
+
+@itemize
+@item Intermingle persistent objects and regular objects
+@item Look up objects using class indices
+@end itemize
+
+@node Crash Recovery
+@comment node-name, next, previous, up
+@section Crash Recovery
+
+@node Elephant as Database
+@comment node-name, next, previous, up
+@section Elephant as Database
+
+Full database system: storage, rich data models, references, queries, etc
+
+@node Multithreaded Web Applications
+@comment node-name, next, previous, up
+@section Multithreaded Web Applications
+
+Multithreaded web applications: DB + multithreading + web objects
+
+@node Graph-oriented Applications
+@comment node-name, next, previous, up
+@section Graph-oriented Applications
+
+@node Real-World Application Examples
+@comment node-name, next, previous, up
+@section Real-World Application Examples
+
+@subsection Konsenti
+
+Elephant is used by Konsenti(tm), a for-profit company of Robert
+L. Read, one of the maintainers of Elephant.  It can be visited at
+@uref{http://konsenti.com}.
+
+Konsenti uses the Data Collection Management (DCM) package, found in
+the @file{src/contrib/rread} directory.  DCM provides
+prevalence-style in-memory write-through caching.  The most enjoyable
+feature about Elephant for this project is that new Business Layer
+objects can be created without having to deal with an
+Object-Relational Mapping, enabling extremely rapid development.  
+
+All Business objects are persisted via a @code{director} in DCM (which
+sits on top of Elephant.)  Many of these business objects are in fact
+finite state machines decorated with functions.  The functions are
+represented by lambda s-expressions stored in slots on the business
+objects.  A complete Message Factory and double-entry accounting
+system are also implemented as DCM objects.  Binary objects, such as
+uploaded PDFs, can be attached to objects as comments and are stored
+directly in Elephant.  Konsenti is based on UTF-8, and Unicode
+characters outside of the ISO-8859-1 character set are routinely
+stored in Elephant.  Konsenti uses Postgres as a backend for licensing
+reasons, but use of other data stores is possible.
+
+@node Conceptminer
+@subsection Conceptminer
+
+Conceptminer is an Elephant-based web-mining framework developed by
+Ian Eslick (@uref{http://www.media.mit.edu/~eslick}) that performs
+large-scale text analysis over the web to identify semantic
+relationships such as ``PartOf'', ``DesireOf'' and ``EffectOf''
+between English phrases.  
+
+Elephant's persistence capabilities are used to keep full records of
+all source material, extracted relationships and search queries so
+that it is always possible to trace the source of a learned relation
+and to avoid repeated queries to web search engines.  Conceptminer
+used Elephant 0.6.0 and the development branch of Elephant 0.9 to
+perform months of analysis consisting of millions of pages and a
+page/query database of over ten gigabytes.
+
+There are several uses of Elephant in Conceptminer that bear
+examination:
+
+@itemize
+@item Process Components
+@item Bulk storage of post-processed web data
+@item Class indexes on strings
+@item Cheap associations
+@item Inverted document index
+@end itemize
 
 
--- /project/elephant/cvsroot/elephant/doc/tutorial.texinfo	2007/04/02 00:51:06	1.13
+++ /project/elephant/cvsroot/elephant/doc/tutorial.texinfo	2007/04/04 15:28:28	1.14
@@ -520,9 +520,9 @@
 
 But what if we want to read out our friends from oldest to youngest?
 One way is to employ another btree that maps birthdays to names, but
-this will require storing values multiple times for each update and
-increases the burden on the programmer.  Elephant provides a better
-way.
+this requires multiple @code{get-value} calls for each update,
+increasing the burden on the programmer.  Elephant provides several
+better ways to do this.
 
 The next section @ref{Indexing Persistent Classes} shows you how to
 order and retrieve persistent classes by one or more slot values.  
@@ -547,20 +547,20 @@
 (defmethod print-object ((f friend) stream)
   (format stream "#<~A>" (name f)))
 
-(defun encode-birthday (dmy)
+(defun encode-date (dmy)
   (apply #'encode-universal-time 
     (append '(0 0 0) dmy)))
 
 (defmethod (setf birthday) (dmy (f friend))
   (setf (slot-value f 'birthday)
-        (encode-birthday dmy))
+        (encode-date dmy))
   dmy)
 
-(defun decode-birthday (utime)
+(defun decode-date (utime)
   (subseq (multiple-value-list (decode-universal-time utime)) 3 6))
 
 (defmethod birthday ((f friend))
-  (decode-birthday (slot-value f 'birthday)))
+  (decode-date (slot-value f 'birthday)))
 @end lisp
 
 Notice the class argument ``:index t''.  This tells Elephant to store
@@ -579,9 +579,9 @@
 (defun print-friend (friend)
   (format t " name: ~A birthdate: ~A~%" (name friend) (birthday friend)))
 
-(make-instance 'friend :name "Carlos" :birthday (encode-birthday '(1 1 1972)))
-(make-instance 'friend :name "Adriana" :birthday (encode-birthday '(24 4 1980)))
-(make-instance 'friend :name "Zaid" :birthday (encode-birthday '(14 8 1976)))
+(make-instance 'friend :name "Carlos" :birthday (encode-date '(1 1 1972)))
+(make-instance 'friend :name "Adriana" :birthday (encode-date '(24 4 1980)))
+(make-instance 'friend :name "Zaid" :birthday (encode-date '(14 8 1976)))
 
 (get-instances-by-class 'friends)
 => (#<Carlos> #<Adriana> #<Zaid>)
@@ -629,14 +629,14 @@
    (birthday :initarg :birthday :index t)))
 @end lisp
 
-Notice the :index argument to the slots.  Also notice that we dropped
-the class :index argument.  Specifying that a slot is indexed
-automatically registers the class as indexed.  While slot indices
-increase the cost of writes and disk storage, each entry is only
-slightly larger than the size of the slot value.  Numbers, small
-strings and symbols are good candidate types for indexed slots, but
-any value may be used, even different types.  Once a slot is indexed,
-we can use the index to retrieve objects by slot values.
+Notice the :index argument to the slots and that we dropped the class
+:index argument.  Specifying that a slot is indexed automatically
+registers the class as indexed.  While slot indices increase the cost
+of writes and disk storage, each entry is only slightly larger than
+the size of the slot value.  Numbers, small strings and symbols are
+good candidate types for indexed slots, but any value may be used,
+even different types.  Once a slot is indexed, we can use the index to
+retrieve objects by slot values.
 
 @code{get-instances-by-value} will retrieve all instances that are
 equal to the value argument.
@@ -652,7 +652,7 @@
 (get-instances-by-range 'friends 'name "Adam" "Devin")
 => (#<Adriana> #<Carlos>)
 
-(get-instances-by-range 'friend 'birthday (encode-birthday '(1 1 1974)) (encode-birthday '(31 12 1984)))
+(get-instances-by-range 'friend 'birthday (encode-date '(1 1 1974)) (encode-date '(31 12 1984)))
 => (#<Zaid> #<Adriana>)
 
 (mapc #'print-friend *)
@@ -676,8 +676,8 @@
 @end lisp
 
 There are also functions for mapping over instances of a slot index.
-To map over values, use the :value keyword argument.  To map by range,
-use the :start and :end arguments.
+To map over duplicate values, use the :value keyword argument.  To map
+by range, use the :start and :end arguments.
 
 @lisp
 (map-class-index #'print-friend 'friend 'name :value "Carlos")
@@ -690,21 +690,21 @@
 => NIL
 
 (map-class-index #'print-friend 'friend 'birthday
-                 :start (encode-birthday '(1 1 1974)) 
-                 :end (encode-birthday '(31 12 1984)))
+                 :start (encode-date '(1 1 1974)) 
+                 :end (encode-date '(31 12 1984)))
  name: Zaid birthdate: (14 8 1976)
  name: Adriana birthdate: (24 4 1980)
 => NIL
 
 (map-class-index #'print-friend 'friend 'birthday 
                  :start nil 
-                 :end (encode-birthday '(10 10 1978)))
+                 :end (encode-date '(10 10 1978)))
  name: Carlos birthdate: (1 1 1972)
  name: Zaid birthdate: (14 8 1976)
 => NIL
 
 (map-class-index #'print-friend 'friend 'birthday
-                 :start (encode-birthday '(10 10 1975))
+                 :start (encode-date '(10 10 1975))
                  :end nil)
  name: Zaid birthdate: (14 8 1976)
  name: Adriana birthdate: (24 4 1980)
@@ -827,27 +827,29 @@
 @subsection Using @code{with-transaction}
 
 What is @code{with-transaction} really doing for us?  It first starts
-a new transaction, attempts to execute the body, and if successful
-commit the transaction.  If anywhere along the way there is a deadlock
-with another thread, contention, or an error the transaction is
+a new transaction, attempts to execute the body, and commits the
+transaction if successful.  If at any time during the dynamic extent of
+this process there is a conflict with another thread's transaction, an
+error, or other non-local transfer of control, the transaction is
 aborted.  If it was aborted due to contention or deadlock, it attempts
 to retry the transaction a fixed number of times by re-executing the
 whole body.
 
-And this brings us to two important caveats: nested transactions and
-idempotent side-effects.
+And this brings us to two important constraints on transaction bodies:
+no dynamic nesting and idempotent side-effects.
 
 @subsection Nesting Transactions
 
-In general, you want to avoid nesting @code{with-transaction}
-statements.  Nested transactions are valid for some data stores
-(namely Berkeley DB), but typically only a single transaction can be
-active at a time.  The purpose of a nested transaction in data stores
-that provide it, is break a long transaction into chunks.  This way if
-there is contention on a given subset of variables, only the inner
-transaction is restarted while the larger transaction can continue.
-When commit their results, those results become part of the outer
-transaction until it in turn commits.
+In general, you want to avoid nested uses of @code{with-transaction}
+statements over multiple functions.  Nested transactions are valid for
+some data stores (namely Berkeley DB), but typically only a single
+transaction can be active at a time.  The purpose of a nested
+transaction in data stores that support them is to break a long
+transaction into subsets.  This way if there is contention on a given
+subset of variables, only the inner transaction is restarted while the
+larger transaction can continue.  When the inner transaction commits
+its results, those results become part of the outer transaction but
+are not written to disk until the outer transaction commits.
 
 If you have transaction protected primitive operations (such as
 @code{deposit} and @code{withdraw}) and you want to perform a group of
@@ -922,13 +924,13 @@
    (load-transients)
    (length *transient-objects*))
 
-(test-list)
+(test-list 3)
 => 3
 
-(test-list)
+(test-list 3)
 => 5
 
-(test-list)
+(test-list 3)
 => 4
 @end lisp
 
@@ -936,39 +938,40 @@
 parameters is atomic if the transaction completes.
 
 @lisp
-(defun load-transients ()
+(defun load-transients (n)
   "This is a better way"
   (setq *transient-objects*
         (with-transaction ()
-            (loop for i from 0 upto 3 collect
+            (loop for i from 0 upto n collect
                   (get-from-root i)))))
 @end lisp
 
-Of course we would need to use @code{nreverse} if we cared about the
-order of instances in @code{*transient-objects*}.  The best rule of
-thumb is that transaction bodies should be purely functional as above,
-except for side effects to the persistent store such as persistent
-slot writes, adding to btrees, etc).
-
-If you do need side effects to lisp memory, such as writes to
-transient slots, make sure they are idempotent and that other
-processes will not be reading the written values until the transaction
+(Of course we would need to use @code{nreverse} if we cared about the
+order of instances in @code{*transient-objects*})  
+
+The best rule-of-thumb is to ensure that transaction bodies are purely
+functional as above, except for side effects to persistent objects and
+btrees.
+
+If you really do need to execute side-effects into lisp memory, such
+as writes to transient slots, make sure they are idempotent and that
+other processes cannot read the written values until the transaction
 completes.
 
 @subsection Transactions and Performance
 
 By now transactions almost look like more work than they are worth!
-Well there are still some significant benefits to be had.  Part of how
-transactions are implemented is that they gather together all the
-writes that are supposed to made to the database and store them until
-the transaction commits, and then writes them atomically.  
-
-The most time-intensive part of persistent operations is flushing
-newly written data to disk.  Using the default auto-committing
-behavior requires a flush for every primitive write operation.  This
-can become very expensive!  Because all the values read or written are
-cached in memory until the transaction completes, the number of
-flushes can be dramatically reduced.
+Fortunately, there are also performance benefits to explicit use of
+transactions.  Transactions gather together all the writes that are
+supposed to be made to the database and store them in memory until the
+transaction commits, and only then write them to the disk.
+
+The most time-intensive component of a transaction is waiting while
+flushing newly written data to disk.  Using the default
+auto-committing behavior requires a disk flush for every primitive
+write operation.  This is very, very expensive!  Because all the
+values read or written are cached in memory until the transaction
+completes, the number of flushes can be dramatically reduced.
 
 But don't take my word for it, run the following statements and see
 for yourself the visceral impact transactions can have on system
@@ -1003,12 +1006,11 @@
 
 When we increase the number of objects within the transaction, the
 time cost does not go up linearly.  This is because the total time to
-write a hundred simple objects is still dominated by the final
-synchronization step.
+write a hundred simple objects is still dominated by the disk writes.
 
 These are huge differences in performance!  However we cannot have
-infinitely sized transactions due to the need to cache values in
-working memory.  Large operations (such as loading data into a
+infinitely sized transactions due to the finite size of the data
+store's memory cache.  Large operations (such as loading data into a
 database) need to be split into a sequential set of smaller
 transactions.  When dealing with persistent objects a good rule of
 thumb is to keep the number of objects touched in a transaction well
@@ -1021,11 +1023,11 @@
 as they only show up when transactions are interleaved within a
 larger, multi-threaded application.  
 
-In many cases, however, you can ignore transactions.  For example,
-when you don't have any other concurrent processes running.  In this
-case all operations are sequential and there is no chance of
-conflicts.  You would only want to use transactions for write
-performance.
+In many cases you can simply ignore transactions, for example when
+you don't have any other concurrent processes running.  In this case
+all operations are sequential and there is no chance of conflicts.
+You would only want to use transactions to improve performance on
+repeated sets of operations.
 
 You can also ignore transactions if your application can guarantee
 that concurrency won't generate any conflicts.  For example, a web app
@@ -1064,6 +1066,10 @@
 features in the user guide that were not covered in this tutorial.
 
 @itemize
+@item @strong{Using Multiple Threads and Processes}
+  What constraints must be accommodated to use Elephant data stores in
+multiple threads?  What capabilities are there to share data stores
+among multiple processes or machines?
 @item @strong{Class Heirarchies and Queries}
   There are some subtle issues to take into account when querying
 persistent classes.  For example, how do you query a base class of
@@ -1072,34 +1078,26 @@
 @item @strong{Derived Class Indices}
   You can create your own indices for classes that are arbitrary
 lisp functions of the persistent object.
+@item @strong{Dynamic Class Index Management}
+  It is possible to add and remove indexes from classes at runtime.
 @item @strong{Class Definition/Database Conflict Resolution}
   When you startup lisp, there are potential conflicts between the
 class definition and the indexing records in the database.  There are
 some constraints to account for and some facilities to manage
 how slots, class indices and 
-@item @strong{Dynamic Class Index Management}
-  It is possible to add and remove indexes from classes at runtime.
+@item @strong{Indexed BTrees}
+  Indexed BTrees are just like BTrees, except it is possible to add
+indexes which are BTrees whose values are primary keys in the parent
+@code{indexed-btree}.  This allows for multiple ordering and groupings
+of the values of a BTree.
 @item @strong{BTree Cursors}
   If you need to do more than iterate over a collection, or you need
 to delete elements of the collection as you iterate cursors are an
 important data structure.  They implement a variety of operators for
 moving backward and forward over a btree, including ranged operations
 and iterating of duplicate or unique values.
-@item @strong{Indexed BTrees}
-  Indexed BTrees are just like BTrees, except it is possible to add
-indexes which are BTrees who's values are primary keys in the parent
-@code{indexed-btree}.  This allows for multiple ordering and groupings
-of the values of a BTree.
 @item @strong{Using the Map Operators}
   Mapping operators can be very efficient if properly utilized.
-@item @strong{Handling Errors and Conditions}
-  There are a variety of errors that can occur in Elephant that need
-to be dealt with by applications.
-@item @strong{Deadlock Detection in Berkeley DB}
-  Berkeley DB requires an external process to detect deadlock
-conditions among transactions.  The :deadlock-detect keyword argument
-to open-store for Berkeley DB specs will launch this process on most
-lisps.
 @item @strong{Using Multiple Stores}
   Multiple store controllers can be open simultaneously.  However it
 does make the code more complex and you need to be careful about how
@@ -1108,10 +1106,14 @@
   You can implement your own version of with-transaction using the
 underlying controller methods for starting, aborting and committing
 transactions.  You had better know what you are doing, however!
-@item @strong{Using Multiple Threads and Processes}
-  What constraints must be accommodated to use Elephant data stores in
-multiple threads?  What capabilities are there to share data stores
-among multiple processes or machines?
+@item @strong{Handling Errors and Conditions}
+  There are a variety of errors that can occur in Elephant that need
+to be dealt with by applications.
+@item @strong{Deadlock Detection in Berkeley DB}
+  Berkeley DB requires an external process to detect deadlock
+conditions among transactions.  The :deadlock-detect keyword argument
+to open-store for Berkeley DB specs will launch this process on most
+lisps.
 @end itemize
 
 Further, @pxref{Usage Scenarios} for information about Elephant design patterns, solutions to common problems and other scenarios with multiple possible solutions.
--- /project/elephant/cvsroot/elephant/doc/user-guide.texinfo	2007/04/02 00:51:06	1.8
+++ /project/elephant/cvsroot/elephant/doc/user-guide.texinfo	2007/04/04 15:28:28	1.9
@@ -8,9 +8,9 @@
 @menu
 * The Store Controller:: Behind the curtain.
 * Serialization details:: The devil hides in the details.
-* Persistent objects:: All the dirt on persistent objects.
+* Persistent Classes and Objects:: All the dirt on persistent objects.
 * Class Indices:: In-depth discussion about indexing persistent indices.
-* Querying persistent instances:: Retrieving instances of classes.
+@c * Querying persistent instances:: Retrieving instances of classes.
 * Using BTrees:: Using the native btree.
 * Secondary Indices:: Alternative ways to index collections.
 * Using Cursors:: Low-level access to BTrees.
@@ -21,67 +21,248 @@
 * Repository Migration and Upgrade:: How to move objects from one repository to another.
 * Garbage Collection:: How to recover storage and OIDs in long-lived repositories.
 * Performance Tuning:: How to get the most from Elephant.
+* Berkeley DB Data Store:: Commands and concerns specific to the :BDB data store
+* CL-SQL Data Store:: Commands and concerns specific to the :CLSQL data store
 @end menu
 
 @node The Store Controller
 @comment node-name, next, previous, up
 @section The Store Controller
 
-What is @code{open-store} doing?  It creates a @code{store-controller}
-object, and sets the special @code{*store-controller*} to point to it.
-The store controller holds the handles to the database environment and
-tables, and some other bookkeeping.  If for some reason you need to
-run recovery on the database (see sleepycat docs) you can specify that
-with the @code{:recover} and @code{:recover-fatal} keys.
+An instance of the @code{store-controller} class mediates interactions
+between Lisp and a data store.  All elephant operations are performed
+in the context of a store controller.  To be more specific, a data
+store provides a subclass of @code{store-controller} specialized to
+that data store.  Typically this object contains pointers to the disk
+files, foreign memory regions and any other necessary bookkeeping
+information to support Elephant operations such as slot writes and
+btree operations.  The store also contains the root objects and other
+bookkeeping common to all data stores.
+
+To obtain a @code{store-controller} object, call the function
+@code{open-store} with a store controller specification.  The current
+data store specification formats are:
+
+@itemize
+@item Berkeley DB: '(:BDB "/path/to/datastore/directory/")
+@item CL-SQL: '(:CLSQL (<sql-db-name> <sql-connect-command>))
+@end itemize
+
+Valid CLSQL database tags for @code{<sql-db-name>} are
+@code{:SQLITE} and @code{:POSTGRESQL}.  The @code{<sql-connect-command>} is
+what you would pass to CLSQL's @code{connect} command.
+
+The open store function uses the first symbol in the specification
+(i.e. :BDB or :CLSQL) to dispatch instance creation to the specified
+data store which returns a specialized instance of
+@code{store-controller}.  @code{open-store} then initializes the store
+using an internal call to @code{open-controller}.
+
+The final step of @code{open-store} is to set the global variable
+@code{*store-controller*}.  This special variable is used as a default
+value in the optional or keyword arguments to a number of operations
+such as:
+
+@itemize
+@item @code{make-instance} for persistent objects
+@item @code{get-from-root} and @code{add-to-root} for accessing a store's root
+@item @code{make-btree} for creating persistent index instances
+@end itemize
+
+Each of these functions also accepts an explicit store controller
+argument for use in multiple store environments.  Normal applications
+should only be aware that this global parameter is used.  For further
+discussion of @code{*store-controller*}, @pxref{Multi-repository Operation}.
+
+Additionally, @code{open-store} accepts data store specific keyword
+arguments.  For example, you can force recovery to be run on Berkeley
+DB data stores:
+
+@lisp
+(open-store *my-spec* :recover t)
+@end lisp
+
+The data store sections of the user guide (@ref{Berkeley DB Data
+Store} and @ref{CL-SQL Data Store}) list all the data-store specific
+options to various elephant functions.
+
+When you finish your application, @code{close-store} will close the
+store controller.  Failing to do this properly may lead to a need to
+run recovery on the data store during the next session.  Again, see
+the relevant data store sections for more details.
 
-To create one by hand one can do, 
+@node Serialization details
+@comment node-name, next, previous, up
+@section Serialization details
 
-@lisp
-* (setq *store-controller* (make-instance 'store-controller :path "testdb"))
-=> #<STORE-CONTROLLER @{49252F75@}>
+This section captures the details of how various types of objects are
+serialized and some considerations to keep in mind when storing lisp
+objects.
+
+The high level factors that you need to keep in mind are:
+
+@itemize
+@item Circular References: 
+The serializer properly handles circular references to/from objects
+such as cons cells, standard objects, arrays, etc.  It accomplishes
+this by assigning an ID to any non-atomic object and keeping a mapping
+between previously serialized objects and these ids.
+@end itemize
+
+Here is an introduction to 
+
+@itemize
+@item 
+@end itemize
+
+We will also review and add to the considerations outlined in the tutorial:
 
-* (open-controller *store-controller*)
-=> #<STORE-CONTROLLER @{49252F75@}>
+@enumerate
+
+
+@item @strong{Lisp identity can't be preserved}.  Since this is a store which
+persists across invocations of Lisp, this probably doesn't even make
+sense.  However if you get an object from the index, store it to a
+lisp variable, then get it again - they will not be eq:
+
+@lisp
+(setq foo (cons nil nil))
+=> (NIL)
+(add-to-root "my key" foo)
+=> (NIL)
+(add-to-root "my other key" foo)
+=> (NIL)
+(eq (get-from-root "my key")
+      (get-from-root "my other key"))
+=> NIL
 @end lisp
 
-but
+@item @strong{Nested aggregates are stored in one buffer}.
+If you store a set of objects in a hash table and then store the hash
+table, all of those objects will get stored in one large binary buffer
+with the hash keys.  This is true for all other aggregates that can
+store type T (cons, array, standard object, etc).
+
+@item @strong{Circular References}.
+The serializer properly handles circular references to/from objects
+such as cons cells, standard objects, arrays, etc.  It accomplishes
+this by assigning an ID to any non-atomic object and keeping a mapping
+between previously serialized objects and these ids.
+
+@item @strong{Mutated substructure does not persist}.
 
 @lisp
-* (open-store "testdb"))
+(setf (car foo) T)
+=> T
+(get-from-root "my key")
+=> (NIL)
 @end lisp
 
-is the preferred mechanism.
+This will affect all aggregate types: objects, conses, hash-tables, et
+cetera.  (You can of course manually re-store the cons.)  In this sense
+elephant does not automatically provide persistent collections.  If you 
+want to persist every access, you have to use BTrees (@pxref{Using BTrees}).
 
-This opens the environment and database.  The @code{persistent-*} objects
-reference the @code{*store-controller*} special.  (This is in part because
-slot accessors can't take additional arguments.)  If for some reason
-you want to operate on 2 store controllers, you'll have to do that by
-flipping the @code{*store-controller*} special.
-
-@code{close-store} closes the store controller.  Alternatively
-@code{close-controller} can be called on a handle.  Don't forget to do
-this or else you may need to run recovery later.  There is a
-@code{with-open-controller} macro.  Opening and closing a controller
-is very expensive.
+@item @strong{Storage limitations}.
+The serializer writes sequentially into a foreign memory byte array
+before passing that array to a given data store's API.  There are
+practical limits to the size of this buffer.  Moreover, in most data
+stores there is a practical limit to the size of a transaction.
+Either of these considerations should encourage you to plan to limit
+the size of objects that you serialize to disk.  A good rule of thumb
+is to stay under a megabyte.
 
-@node Serialization details
-@comment node-name, next, previous, up
-@section Serialization details
+@item @strong{Serialization and deserialization can be costly}.  While
+serialization is pretty fast, it is still expensive to store large
+objects wholesale.  Also, since object identity is impossible to
+maintain, deserialization must re-cons or re-allocate the entire
+object every time, increasing the number of GCs the system does.  This
+eager allocation is contrary to how most people want to use a
+database: one of the reasons to use a database is if your objects
+can't fit into main memory all at once.
+
+@item @strong{Merge-conflicts in heavily multi-process/threaded situations}.  
+This is the common read-modify-write problem in all databases.  We will talk
+more about this in the @ref{Transactions} section.
+
+@end enumerate
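+
+The mutation behavior described above can be worked around in two
+ways, sketched here under the assumption that a store is open and that
+@code{foo} is the cons from the earlier examples.  The key
+@code{"my table"} and the variable @code{*table*} are made up for
+illustration.
+
+@lisp
+;; Option 1: re-store the aggregate after mutating it.
+(setf (car foo) t)
+(add-to-root "my key" foo)
+(get-from-root "my key")
+=> (T)
+
+;; Option 2: keep the collection in a BTree, so that each write
+;; goes to the store instead of to an in-memory copy.
+(defvar *table* (make-btree))
+(add-to-root "my table" *table*)
+(setf (get-value "answer" *table*) 42)
+(get-value "answer" *table*)
+@end lisp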
 
-Empty.
 
-@node Persistent objects
+@node Persistent Classes and Objects
 @comment node-name, next, previous, up
-@section Persistent Objects
+@section Persistent Classes and Objects
+
+Persistent classes are Elephant's answer to the limitations of
+ordinary lisp object serialization, namely support for persistent
+references.  Any persistent object, when serialized, only serializes a
+reference to the object and not the whole object.  For example you can
+serialize a node in the graph of persistent objects without worrying
+about serializing the entire graph.
+
+@subsection Persistent Class Definition
+
+Other than specifying the metaclass or using @code{defpclass}, the only
+important difference from an ordinary @code{defclass} form is the
+specification of a slot storage policy.  The policy is specified by a
+boolean slot option, either @code{:persistent} or @code{:transient}.
+Slots are @code{:persistent} by default.
+
+@lisp
+(defclass my-pclass ()
+   ((pslot1 :accessor pslot1 :initarg :pslot1 :initform 'one)
+    (pslot2 :accessor pslot2 :initarg :pslot2 :initform 'two :persistent t)
+    (tslot1 :accessor tslot1 :initarg :tslot1 :initform nil :transient t))
+   (:metaclass persistent-metaclass))
+@end lisp
+
+The :index options to persistent classes are discussed in persistent
+indices.
+
+Slot storage class implications are straightforward.  Persistent slot
+writes are durably stored to disk in an automatic or encompassing
+transaction.  Transient slots are initialized on instance creation
+according to initforms or to initargs.  They are never stored to nor
+loaded from the database.
+
+
+@subsection Instance Creation
+
+Persistent objects are instances of the persistent classes defined
+above.  All persistent objects inherit from the class
+@code{persistent} and share two properties: a unique ID and a
+reference to the specification of the @code{store-controller} in which
+they reside.  This is ensured by the instance creation protocol
+implemented by @code{persistent-metaclass}.
+
+Instances are created as normal, with a call to make-instance and
+appropriate initargs.
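+
+For example, assuming a store is open and @code{my-pclass} is defined
+as above (the variable name @code{*obj*} is illustrative only):
+
+@lisp
+;; Initargs override the initforms given in the class definition.
+(defvar *obj* (make-instance 'my-pclass :pslot1 'uno))
+
+;; A write to a persistent slot is stored in the database.
+(setf (pslot2 *obj*) 'deux)
+
+;; A transient slot lives only in memory and is never stored.
+(setf (slot-value *obj* 'tslot1) 'scratch)
+@end lisp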
+
+
+The two properties of @code{persistent} can be specified explicitly
+during instance creation:
+
+@lisp
+(make-instance 'my-pclass :from-oid 100 :sc *store-controller*)
+@end lisp
+
+These three elements (class, OID and store controller) are all that is
+needed to create a new instance.
+
+If you do make an instance with a specified OID which already exists
+in the database, @code{initargs} to @code{make-instance} take
+precedence over values in the database, which take precedence over
+any @code{initforms} defined in the class.
+
+* Default store controller & instance creation 
+* What happens to persistent objects when store-controller is closed?
+
+
 
-Finally, if you for some reason make an instance with a specified OID
-which already exists in the database, @code{initargs} take precedence
-over values in the database, which take precedences over
-@code{initforms}.
-
-Also currently there is a bug where
- at code{initforms} are always evaluated, so beware.
-(What is the current model here?)
+:: User-defined persistent objects
+
+* slot types
+* caching
+* slot access protocol
 
 Readers, writers, accessors, and @code{slot-value-using-class} are
 employed in redirecting slot accesses to the database, so override
@@ -90,6 +271,20 @@
 the specification to work properly with persistent slots.  However the
 proper behavior has been verified on SBCL, Allegro and Lispworks.  
 
+:: Initialization
+
+Also currently there is a bug where @code{initforms} are always
+evaluated, so beware.  (What is the current model here?)
+
+:: Class Redefinition and Evolution
+
+* What happens when you redefine a class online? 
+* Drop & add slots?  Change slot status?
+* What if you connect to an old database with a new class specification?
+  (ref to class indices behavior)
+
+:: Storage and Performance Considerations
+
 @node Class Indices
 @comment node-name, next, previous, up
 @section Class Indices
@@ -137,112 +332,111 @@
 somewhat user customizable; documentation for this exists in the source
 file referenced above.
 
-@node Querying persistent instances
-@comment node-name, next, previous, up
-@section Querying persistent instances
-
-A SQL select-like interface is in the works, but for now queries are
-limited to manual mapping over class instances or doing small queries
-with @code{get-instances-*} functions.  One advantage of this is that
-it is easy to estimate the performance costs of your queries and to
-choose standard and derived indices that give you the ordering and
-performance you want.
-
-There is, however, a quick and dirty query API example that is not
-officially supported in the release but is intended to invite comment.
-This is an example of a full query system that would automatically
-perform joins, use the appropriate indices and perhaps even adaptively
-suggest or add indices to facilitate better performance on common
-queries.
-
-There are two functions @ref{Function elephant:get-query-instances,,,includes/fun-elephant-get-query-instance }
-and @ref{Function elephant:map-class-query,,,includes/fun-elephant-map-class-query} which accept a set of
-constraints instead of the familiar value or range arguments.
-
-We'll use the classes @code{person} and @code{department} to
-illustrate how to perform queries over a set of objects that may be
-constrainted by their relationships to other objects.
-
-@lisp
-(defpclass person ()
-  ((name :initarg :name :index t)
-   (salary :initarg :salary :index t)
-   (department :initarg :dept)))
-
-(defmethod print-object ((p person) stream)
-  (format stream "#<PERS: ~A>" (slot-value p 'name)))
-
-(defun print-name (inst)
-  (format t "Name: ~A~%" (slot-value inst 'name)))
-
-(defpclass department ()
-  ((name :initarg :name)
-   (manager :initarg :manager)))
-
-(defmethod print-object ((d department) stream)
-  (format stream "#<DEPT ~A, mgr = ~A>"
-          (slot-value d 'name)
-          (when (slot-boundp d 'manager)
-                (slot-value (slot-value d 'manager) 'name))))
-@end lisp
-
-Here we have a simple employee database with managers (also of type
-person) and departments.  This simple system will provide fodder for
-some reasonably complex constraints.  Let's create a few departments.
-
-@lisp
-(setf marketing (make-instance 'department :name "Marketing"))
-(setf engineering (make-instance 'department :name "Engineering"))
-(setf sales (make-instance 'department :name "Sales"))
-@end lisp
-
-And manager @code{people} for the departments.
-
-@lisp
-(make-instance 'person :name "George" :salary 140000 :department marketing)
-(setf (slot-value marketing 'manager) *)
-
-(make-instance 'person :name "Sally" :salary 140000 :department engineering)
-(setf (slot-value engineering 'manager) *)
-
-(make-instance 'person :name "Freddy" :salary 180000 :department sales)
-(setf (slot-value sales 'manager) *)
-@end lisp
-

[156 lines skipped]



