[elephant-cvs] CVS elephant/doc/oldfiles

Sat Apr 28 02:31:15 UTC 2007

Update of /project/elephant/cvsroot/elephant/doc/oldfiles
In directory clnet:/tmp/cvs-serv16753/doc/oldfiles

Added Files:
	INSTALL NEWS NOTES TODO TUTORIAL 
Log Message:
Cleaning up root directory files; map-index performance enhancement, index api cleanup, ensure transaction fix, alpha quality documentation draft

--- /project/elephant/cvsroot/elephant/doc/oldfiles/INSTALL	2007/04/28 02:31:14	NONE
+++ /project/elephant/cvsroot/elephant/doc/oldfiles/INSTALL	2007/04/28 02:31:14	1.1

------------
Requirements
------------

Supported Lisps:
CMUCL 19a Linux
SBCL 0.9.17/1.0+ Linux / Mac OSX
Allegro CL 7.0/8.0 Linux / Mac OSX
OpenMCL 0.14.2
LispWorks (port in-progress)

Lisp libraries:
ASDF        - http://www.cliki.net/asdf
UFFI 1.5.4+ - http://uffi.b9.com/

A Backend Database:
1) Oracle Berkeley DB 4.5 - http://www.oracle.com/database/berkeley-db.html
2) CLSQL - http://clsql.b9.com/ with an appropriate SQL installation.  
   Tested with SQLite3 and PostgreSQL so far

A C compiler, probably gcc or Visual Studio.  Presumably you have this if you installed

------------------
Short Instructions
------------------

The new build system should work out of the box on most Un*x 
platforms that have asdf, uffi and either clsql or Berkeley DB 
installed in the usual places.

Try:  (asdf:operate 'asdf:load-op :elephant) 
Then: (open-store '(<backend> <spec>))

Where <backend> = { :bdb | :clsql }
      <spec>    = { "fresh directory for BDB files" | '(:sqlite3 "db path") | '(:postgresql "db path") }

This should load all files, including compiling libraries, on most
systems.  For Win32, see the instructions below.

(We'll improve the build process for Win32 if there is demand)

-----------------
Long Instructions
-----------------

For SBCL, CMUCL, Allegro 8.0+, MCL and CLISP:

0) Unpack Elephant.  I put mine in the directory

/usr/local/share/common-lisp/elephant-0.6.x/

1) Install ASDF. 

Ensure that you have a recent version of ASDF installed as 
the load process now depends upon it.

2) Install UFFI

3) Install a backend: Either Berkeley DB 4.5, PostgreSQL, or SQLite 3.

-------
SQL
-------

For relational database systems, refer to the formal documentation
under the heading "SQL-BACK-END".

-------------
Berkeley 4.5:
-------------

(Note: 0.6.0 required BDB 4.3; to upgrade 0.6.0 to 0.6.1, upgrade BDB to 4.5, 
 modify my-config.sexp appropriately, then run 0.6.1+; your underlying Berkeley DB 
 files will automatically upgrade when the DB is opened.  To use 0.6.1, you will
 have to manually migrate your 0.6.0 database to a fresh database created in 0.6.1)

Under Un*x, you may actually already have this installed, though 
it may be compiled with funny options, so if things don't work 
you may want to try to start from scratch.  FreeBSD has a port 
for this, as I'm sure do other BSDs (including DarwinPorts/Fink.)  
Take note of where libdb.so and db.h are installed, usually:

  /usr/local/BerkeleyDB.4.5/lib/libdb.so and
  /usr/local/BerkeleyDB.4.5/include/db.h, or

  /usr/local/lib/db45/libdb.so and
  /usr/local/include/db45/db.h

a) Site specific configuration

   config.sexp

This file contains an alist of string paths: the root of the Berkeley DB
distribution (:berkeley-db-root), the library to load (:berkeley-db-lib) and,
if you're running Linux, the pthreads library (:pthread-lib).
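As a rough illustration, a config.sexp for a Linux machine might look like
the sketch below.  The key names are the ones described above, and the
Berkeley DB paths reuse the first install location listed earlier.  The
pthreads path is only a placeholder assumption, and the exact set of keys
your Elephant version expects may differ, so check the file shipped with
your copy.

```lisp
;; config.sexp -- illustrative sketch only; adjust every path for your system.
((:berkeley-db-root . "/usr/local/BerkeleyDB.4.5/")
 (:berkeley-db-lib  . "/usr/local/BerkeleyDB.4.5/lib/libdb.so")
 ;; Placeholder path (an assumption): point this at your system's
 ;; pthreads library on Linux.
 (:pthread-lib      . "/lib/libpthread.so.0"))
```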

For Win32 (directions courtesy of Bill Clementson): 
---------------------------------------------------
Create an MSVC dll project and add src/db-bdb/libberkeley-db.c,
src/db-bdb/libberkeley-db.def and the Berkeley DB libdb43.lib files
to the project (should be in the build_win32/release folder)

Add the Berkeley DB dbinc include files directory and the
build_win32/release directory (where the Berkeley DB install
instructions build the Berkeley DB objects by default) to
the build directories for the project

Build the Elephant DLL file

Since you've statically included libdb43.lib inside
libberkeley-db.c, it may or may not be necessary to load
libdb43.dll into Lisp (see below.)

4) Compile and load Elephant:  

The new backend load process should work automatically on Un*x 
systems, but if there are problems with loading foreign libraries,
you can test your C tools setup by running 'make' in the elephant
root directory.  This will build the common memutils library 
in src/memutil/libmemutil.so/dylib that all backends require.

There is a new two-phase load process.  The first requires that
you use asdf to load the main elephant front-end:

(asdf:operate 'asdf:load-op :elephant)

This will load and compile Elephant.  This will also automatically 
load UFFI.  

When you call (open-store <spec>) inside lisp it will automatically
load the remaining dependencies for the specified backend via ASDF.

To test the load process explicitly the following asdf files are
provided:

if you are using Berkeley DB, type:
  (asdf:operate 'asdf:load-op :ele-bdb)

if you are using CLSQL, type:
  (asdf:operate 'asdf:load-op :ele-clsql)

if you are using SQLite3, type:
  (asdf:operate 'asdf:load-op :ele-sqlite3)

5) Make the documentation:

Execute:

make

in the doc directory.  This should build the HTML version of the texinfo files.

-------
Testing
-------

Elephant uses RT for regression testing, available at:

http://www.cliki.net/RT

Once RT is installed, run:

(asdf:operate 'asdf:load-op :elephant-tests)
(in-package :ele-tests)
(setf *default-spec* <backend>)
   Where <backend> = { *testsqlite3-spec* | *testpg-spec* | *testbdb-spec* }
(do-backend-tests) 

This will test the standalone API for your backend.  Currently all tests are
passing on 0.6.0.  There will be a set of migration tests that will be 'ignored'
but the final message should indicate no failing tests. 

This should take less than 5 minutes on decent hardware.  

The tests are not idempotent, so if you run the tests a second time,
they are likely to fail.  To avoid this, for example if you are
debugging tests, just run the script delscript.sh (or do the
equivalent on Win32) in the elephant/tests directory.

Elephant allows migration between repositories.  To test this:

(do-migration-tests *default-spec* <backend>)
  where <backend> is a different *testXXXXX-spec* variable to test migration
  to that backend.

This should take less than 2 minutes on decent hardware.

A backend is considered "green" if it can pass both the backend tests and the 
migration tests.

--- /project/elephant/cvsroot/elephant/doc/oldfiles/NEWS	2007/04/28 02:31:14	NONE
+++ /project/elephant/cvsroot/elephant/doc/oldfiles/NEWS	2007/04/28 02:31:14	1.1
April, 2006 - Elephant 0.6.0 released by 
Robert Read and Ian Eslick.  Supports class slot
indexing and benefits from a clean refactoring
of backends and a host of other small changes.
This is a solid BETA release.

November 30, 2005 - Elephant 0.3.0 released by
the new maintainer, Robert L. Read, providing 
support for relational database backends, repository
migration, and multi-repository operation.

As of this release, the documentation provides a
lot of information about installation and getting 
things working; I wouldn't at all claim that it 
is complete, smooth, or well organized.  The more 
notes I get about the use of Elephant, the more 
inclined I will be to invest time in improving 
the documentation.

October 7, 2004 - 

Elephant 0.2.1 released.  Thanks to Bill Clementson,
Elephant should compile on Win32 now.  Also, a few minor
fixups.

September 19, 2004 -

Elephant 0.2 released.  This is a BETA release.

New features:

- Secondary indices and cursors
- PPC Darwin OpenMCL / SBCL
- Doc strings and improved documentation
- An RT-based test suite
- many bugfixes

This release has been tested on CMUCL 19a, SBCL 0.8.14 and
Allegro 6.2 on x86 Linux and FreeBSD, and OpenMCL 0.14.2-p1
and SBCL 0.8.14 on PPC Darwin.

September 2, 2004 -

The bad news: there was a bug in 0.1 which made OID
generation inside of manual transactions deadlock.

The good news: this is fixed, and I've added OpenMCL
support.  So I'm releasing 0.1-p1.

August 30, 2004 - 

Elephant 0.1 was released August 30th, 2004.  This is an
ALPHA quality release, so claims about correctness,
performance and safety should be taken with a grain of salt.
This release has been tested on CMUCL 19a, SBCL 0.8.13 and
Allegro 6.2 on x86 Linux and FreeBSD.  OpenMCL and Lispworks
versions will come soon.  As a proof of concept I've
compiled and run CL-IRC

http://www.common-lisp.net/project/cl-irc

making all objects and slots persistent, except for the
socket-streams.  It runs, and saves everything except for
the socket-streams.
--- /project/elephant/cvsroot/elephant/doc/oldfiles/NOTES	2007/04/28 02:31:15	NONE
+++ /project/elephant/cvsroot/elephant/doc/oldfiles/NOTES	2007/04/28 02:31:15	1.1

-------
GENERAL
-------

This has been optimized for use with CMUCL / SBCL / Allegro.  
OpenMCL has been minimally supported.  Lispworks is a target as well 
but less so as the developers don't have access to it.

Theoretically one can port this to any lisp with a decent
FFI and MOP.  However since those are two of the less
standardized bits of Lisp, in practice this might be
difficult.

From top to bottom, here are the implementation layers:

ELEPHANT package
persistent meta-object, persistent collections
controller
serializer
memutils package
UFFI / implementation specific stuff
libsleepycat.so
Sleepycat 4.2/3

While I loathe specials, since you can't change the signature
of slot accessors, in order to pass parameters to the
database / serializer, specials are needed.  Also specials
will probably play nice with threaded lisps.  
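
The constraint can be sketched in plain Common Lisp.  A slot reader has a
fixed one-argument signature, so a method on it has no way to receive a
store argument; a dynamically bound special supplies it instead.  Every
name here (*current-store*, toy-object, toy-value, make-toy-store) is
hypothetical and only mimics the pattern; it is not Elephant's code.

```lisp
;; Hypothetical names throughout; this mimics the pattern, not Elephant's code.
(defvar *current-store* nil
  "Special variable standing in for the database handle.")

(defclass toy-object ()
  ((key :initarg :key :reader toy-key)))

;; A reader-style generic function: the signature is fixed at one argument,
;; so the store cannot be passed in as a parameter.
(defgeneric toy-value (object))

(defmethod toy-value ((object toy-object))
  ;; The store is picked up from the dynamic environment instead.
  (values (gethash (toy-key object) *current-store*)))

(defun make-toy-store (&rest plist)
  "Build a hash table from alternating keys and values."
  (let ((table (make-hash-table :test #'equal)))
    (loop for (k v) on plist by #'cddr
          do (setf (gethash k table) v))
    table))

(defvar *store-a* (make-toy-store "x" 1))
(defvar *store-b* (make-toy-store "x" 2))
(defvar *obj* (make-instance 'toy-object :key "x"))

;; LET rebinds the special for the dynamic extent of its body only, so the
;; same one-argument call reads from whichever store is currently bound.
(defvar *from-a* (let ((*current-store* *store-a*)) (toy-value *obj*)))
(defvar *from-b* (let ((*current-store* *store-b*)) (toy-value *obj*)))
```

In Lisps where LET bindings of specials are per-thread, each thread can
bind its own store this way without disturbing the others.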

-----------------------
CLASSES AND METACLASSES
-----------------------

Persistent classes which the user defines are declared and
instrumented by using the persistent-metaclass.  Ideally
creating persistent versions of class, slot-definition, et al
would be enough, but in reality various implementations do
things in different ways.

CMUCL / SBCL: there's a bit of work to make class slot
allocation and reader / writer / slot-boundp work right.

Allegro: is using slot-boundp instead of
slot-boundp-using-class inside of shared-initialize, which
necessitates some work.

CMUCL doesn't do non-standard allocation types correctly, so
we've created our own slot definition keyword :transient.
In the future this will change.

Andrew will add some notes here in the future.

-----------
COLLECTIONS
-----------

While we support serializing and persisting a wide class of
Lisp data types, there are problems with persisting
aggregate types (conses, lists, arrays, objects,
hash-tables...)

1) not automatic: there's no way for elephant to know when
you've changed a value in an aggregate object, so you have
to manually restore it back into the slot to get it saved.

example 1: you put a cons into the database.  you change
its car.  this is not saved unless you resave the cons into
the database.

example 2: slot-1 of obj A (saved in the database) contains
a cons.  you change the car of the cons.  this is not
reflected into the database unless you resave A.

2) merge-conflicts: changing one value and saving an
aggregate will write out the whole aggregate, possibly
blowing away changes other threads have made behind your
back.  this is not protected by transactions!

3) consing, non-lazy and expensive (de)serialization: you
have to serialize/deserialize the entire aggregate every
time you save it or restore it.  This is pretty fast all
things considered, but it's probably better to use
persistent collections.

4) you have to store the entire collection in memory,
whereas one of the points of the database is to store large
collections of objects.....
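
Problems 1 and 2 can be reproduced without a database.  The sketch below
uses a toy store that serializes values to strings on save and re-reads
them on load; toy-save and toy-load are hypothetical names, not Elephant
functions, and the "two writers" are simulated sequentially rather than
with real threads.

```lisp
;; Toy stand-in for a persistent store: values are serialized to strings on
;; save and re-read on load.  TOY-SAVE and TOY-LOAD are hypothetical names.
(defvar *toy-store* (make-hash-table :test #'equal))

(defun toy-save (key value)
  "Serialize VALUE (the whole aggregate) and store it under KEY."
  (setf (gethash key *toy-store*)
        (with-standard-io-syntax (prin1-to-string value)))
  value)

(defun toy-load (key)
  "Deserialize and return a fresh copy of the value stored under KEY."
  (with-standard-io-syntax
    (values (read-from-string (gethash key *toy-store*)))))

;; Problem 1: a mutation is not noticed until the aggregate is re-saved.
(defvar *pair* (cons 1 2))
(toy-save "pair" *pair*)
(setf (car *pair*) 99)                      ; changes the in-memory cons only
(defvar *before-resave* (toy-load "pair"))  ; the stored copy is unchanged
(toy-save "pair" *pair*)                    ; explicit re-save
(defvar *after-resave* (toy-load "pair"))   ; the change is now stored

;; Problem 2: two writers each load, modify and save the whole list.
(toy-save "list" (list 0 0))
(defvar *copy-a* (toy-load "list"))
(defvar *copy-b* (toy-load "list"))
(setf (first *copy-a*) 1)
(setf (second *copy-b*) 2)
(toy-save "list" *copy-a*)
(toy-save "list" *copy-b*)                  ; overwrites writer A's change
(defvar *final* (toy-load "list"))
```

Writer A's update to the first element is lost because writer B saved a
complete copy of the list that never contained it.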

For these and other reasons, we provide a hash-table-like
interface to Berkeley BTrees.  These have many advantages
over ordinary hash-tables from the point of view of
persistence.

There is a separate table for BTrees.  This is because we
use a hand coded C function for sorting, which understands a
little of the serialized data.  It can handle numbers (up to
64-bit bignums -- they are approximated by floats) and
strings (case-insensitive for 8-bit, code-point-order for
16-bit Unicode.)  It should be fast but we don't want a
performance penalty on objects.

Secondary indices are mostly handled on the lisp side,
because of our weird table layout (see below) and to avoid
crossing FFI boundaries.  Some unscientific microbenchmarks
indicated that there was no performance benefit on CMUCL /
SBCL, and only minor benefit (asymptotically nil) on
OpenMCL.  They have a separate table.  Actually two handles
are opened on this table: one which is plain, and one which
is associated to the primary btree table by a no-op indexing
function.  Since we maintain the secondary keys ourselves,
the associated handle is good for gets / cursor traversals.
We use the unassociated handle for updates.
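
The idea of maintaining secondary keys on the lisp side can be sketched
with two ordinary hash tables.  This is only a toy under stated
assumptions: toy-put and toy-lookup-by-secondary are hypothetical names,
and the sketch does not model Berkeley DB, the two handles, or cursors.

```lisp
;; Hypothetical toy: two hash tables stand in for the primary and secondary
;; tables; none of these names are Elephant functions.
(defvar *primary* (make-hash-table :test #'equal))    ; primary key -> value
(defvar *secondary* (make-hash-table :test #'equal))  ; secondary key -> primary keys

(defun toy-put (key value key-fn)
  "Store VALUE under KEY and update the secondary table by hand."
  (multiple-value-bind (old found) (gethash key *primary*)
    (when found
      ;; Drop the stale secondary entry that pointed at the old value.
      (let ((old-skey (funcall key-fn old)))
        (setf (gethash old-skey *secondary*)
              (remove key (gethash old-skey *secondary*) :test #'equal)))))
  (setf (gethash key *primary*) value)
  (pushnew key (gethash (funcall key-fn value) *secondary*) :test #'equal)
  value)

(defun toy-lookup-by-secondary (skey)
  "Return the values whose secondary key is SKEY."
  (mapcar (lambda (key) (values (gethash key *primary*)))
          (gethash skey *secondary*)))

;; Index (name . age) records by age.
(toy-put "ann" '("Ann" . 30) #'cdr)
(toy-put "bob" '("Bob" . 30) #'cdr)
(toy-put "ann" '("Ann" . 31) #'cdr)   ; re-indexes Ann under 31

(defvar *age-30* (toy-lookup-by-secondary 30))
(defvar *age-31* (toy-lookup-by-secondary 31))
```

Every write goes through toy-put, which keeps both tables consistent;
reads by secondary key only consult the tables and never modify them.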

----------
CONTROLLER
----------

The controller is accessed through the special
*store-controller*.  The controller keeps track of

1) the environment handle
2) the DB handle(s)
3) the instance cache
4) the root object

The environment handle and DB handle currently aren't really
exposed.  Eventually they should be, so that tuning flags
can be set on them.

OIDs are generated by a bit of C code, which isn't great,
nor that safe (to get acceptable performance I use
DB_TXN_NOSYNC.)  Waiting for Sleepycat 4.3.

The instance cache is implemented as a values-weak
hash-table.  This is a hash-table where the values can be
collected, and when they are, the entire key-value entry is
deleted.  There are implementations of this on the various
platforms.  The instance cache is there to make
deserialization of persistent objects faster.  Since we

[249 lines skipped]
--- /project/elephant/cvsroot/elephant/doc/oldfiles/TODO	2007/04/28 02:31:15	NONE
+++ /project/elephant/cvsroot/elephant/doc/oldfiles/TODO	2007/04/28 02:31:15	1.1

[498 lines skipped]
--- /project/elephant/cvsroot/elephant/doc/oldfiles/TUTORIAL	2007/04/28 02:31:15	NONE
+++ /project/elephant/cvsroot/elephant/doc/oldfiles/TUTORIAL	2007/04/28 02:31:15	1.1

[923 lines skipped]