[elephant-devel] Status of Elephant Unstable Branch

Sun Mar 23 14:52:13 UTC 2008

My fellow Elephants,

Unstable isn't unstable anymore!  All BDB tests, including migration,  
for BDB/Mac/Allegro and BDB/Mac/SBCL are green as of today's checkin.

All major new features are implemented, including:
- Instance map, class schema evolution and MOP compliance
- New slot types
   - Cached read, write-through slots
   - Hierarchical indexed slots
   - Virtual, hierarchical derived indices
   - Set-valued slots
   - Many-to-1 and many-to-many slot associations
- Trivial query interface example (query.lisp)
- Migration and upgrade
- Partial test suite (basic association, indexing, migration, basic  
schema-evolution)

There are definitely holes in the test suite that need to be plugged  
and I'm sure that this will uncover bugs, particularly in the schema  
evolution, upgrade or association infrastructure.  The steps needed to  
prepare this branch for the next release are:

- Integrate patches from the main repository
   (Leslie's patch is the only one that I haven't already integrated  
into unstable, I think)

- Evaluate multi-threading issues for schema evolution
   (only one thread should be able to manipulate class objects at a  
time)

- Upgrade Postmodern and CLSQL data stores
   - Support btrees with duplicate keys
   - Some minor API additions for upgrade & bootstrapping

- Testing
   - Expand testing for schema evolution (most complex/subtle bugs  
were there)
   - Validate upgrade procedure 0.9.1 -> 0.9.2
   - Verify referential integrity (delete object, what happens to  
stale refs?)
   - Standard tests for new features

- Documentation of new features

I am tied up with work for the next two weeks.  I'm happy to support  
bug fixes, lisp compatibility issues, etc - but progress will only be  
made for the remainder of March and early April if others step in to  
help.

Robert and I hope to integrate this work into another 0.9.x release in  
late April.  I think this new functionality makes Elephant  
sufficiently feature-rich and robust that after some burn-in time we  
should consider packaging this into a 1.0 release that we can commit  
to support for the longer term.  We can have a 1.1 development branch  
in which we add major new features like an all-lisp data store or a query  
compiler as longer term projects.

There are a few features that could use attention that could, but need  
not, make it into the upcoming release:

- Online GC strategy

   Now that we have an oid table that maintains information for each  
object and is used to de-serialize a reference, we can implement  
facilities such as forwarding pointers, counts or marks that make it  
possible to build an online persistent heap GC facility without an  
overly significant cost or code impact.

- Query language/interpreter

    Daniel Salama is thinking about the query syntax and is motivated  
to help implement something there.  I'd be psyched to see an  
interpreter that extends my sketch to take good advantage of indices  
and associations.

- System-level schema evolution

   Robert is thinking through some system-level schema versioning and  
evolution ideas akin to the PostgreSQL notion of schemas, but neither  
of us has the bandwidth to implement this right now.  The basic idea  
is to group a set of class schemas into a version set and to use these  
version tags to dispatch a generic-function that can override the  
default transformation of an instance from one schema version to the  
next.  This would allow you to connect to an old DB with new code,  
call a global upgrade fn, and have everything converted in one go.

   This would be an independent application layer so would not impact  
an upcoming release either way.
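
To make the version-set idea above concrete, here is a minimal sketch of the
dispatch it describes. It is written in Python purely for illustration and is
not Elephant code: the registry, the `upgrader` decorator, the version tags
"v1"/"v2", and the slot-dictionary representation of an instance are all
assumptions. A real implementation would presumably be a CLOS generic
function specialized on version tags.

```python
# Hypothetical registry of custom upgrade functions, keyed by
# (class name, from-version tag, to-version tag).
_upgraders = {}


def upgrader(class_name, from_tag, to_tag):
    """Register a function that overrides the default transformation."""
    def register(fn):
        _upgraders[(class_name, from_tag, to_tag)] = fn
        return fn
    return register


def default_transform(slots, new_slot_names):
    """Default: keep slots that still exist, set new ones to None, drop the rest."""
    return {name: slots.get(name) for name in new_slot_names}


def upgrade_instance(class_name, slots, from_tag, to_tag, new_slot_names):
    """Use the registered override for this version pair, else the default."""
    fn = _upgraders.get((class_name, from_tag, to_tag))
    if fn is not None:
        return fn(slots)
    return default_transform(slots, new_slot_names)


def upgrade_all(instances, from_tag, to_tag, schemas):
    """Global upgrade: convert every (class name, slots) pair in one pass.

    `schemas` maps each class name to its slot names in the target version set.
    """
    return [
        (name, upgrade_instance(name, slots, from_tag, to_tag, schemas[name]))
        for name, slots in instances
    ]


@upgrader("person", "v1", "v2")
def split_name(slots):
    """Example override: v2 stores first and last name in separate slots."""
    first, _, last = slots["name"].partition(" ")
    return {"first": first, "last": last}
```

In this sketch, calling `upgrade_all` on an old store's instances applies
`split_name` to every "person" and the default transformation to every other
class, which is the "convert everything in one go" behavior described above.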

Regards,
Ian

PS - I did some profiling of the unstable branch on BDB/Mac to see  
what effects different query strategies might have. I thought some of  
you would be interested in this.  This is preliminary and not well  
controlled, but the order of magnitude should be about right.

The objects described below are 5-slot objects with a mix of indexed,  
cached, transient, etc.

Persistent object creation: 3000 objects per second
Persistent object reference deserialization w/ object instantiation:  
10k per second
Persistent object reference deserialization of oids only: 40k per second

This last # would be the key factor in handling queries over large  
object databases.  Since we can instantiate using only an oid, we only  
need to instantiate objects we need.  This should make things like  
counts and paging pretty efficient for moderately sized databases.   
Indexing, of course, will have a significant impact on the performance  
of query by reducing the number of manipulated OIDs.
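
As a rough illustration of why the oid-only rate matters, here is a small
sketch of counting and paging over a stream of oids. It is Python rather than
Lisp, and every name in it (`count_matches`, `fetch_page`, the `instantiate`
callback) is hypothetical, not part of Elephant. The rates are the
preliminary figures quoted above: at 40k oids per second, walking 100,000
references takes about 2.5 seconds, against about 10 seconds if every object
were instantiated at 10k per second.

```python
from itertools import islice


def count_matches(oids):
    """Count results by walking oids only; nothing is instantiated."""
    return sum(1 for _ in oids)


def fetch_page(oids, page, page_size, instantiate):
    """Instantiate only the objects on the requested page (0-based)."""
    start = page * page_size
    return [instantiate(oid) for oid in islice(oids, start, start + page_size)]


def estimated_seconds(n_refs, page_size, oid_rate=40_000, object_rate=10_000):
    """Rough cost model: scan every oid, then instantiate a single page.

    The default rates are the preliminary measurements from this post.
    """
    return n_refs / oid_rate + page_size / object_rate
```

Under these assumptions, `estimated_seconds(100_000, 20)` comes to roughly
2.5 seconds, almost all of it spent scanning oids, which is why an index that
shrinks the oid set should dominate query cost.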
