<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.3.2">

</HEAD>

<BODY>

Dear Team,<BR>

<BR>

    This is a long email exchange about future directions betwen Ian Eslick and myself <BR>

that I thought worth archiving here.<BR>

    <BR>

<BR>

<BR>

Robert,<BR>

<BR>

<BR>

A few follow-on comments:<BR>

<BR>

- The implications and architecture of CRUD is outside my sphere of<BR>

experience.  Can you expand on what you'd like to enable?<BR>

<BR>

- Perhaps we should plan (or dream) about next generation features when<BR>

we have some desired usage cases as targets.  Small webapps,<BR>

multi-server webapps, long-running systems with a need for<BR>

recoverability (my research), etc.<BR>

<BR>

- We should have multiple ways to manage concurrency as concurrency<BR>

problems come in many flavors.  Same should be true for optimizing<BR>

performance.  A couple of models for common applications should be<BR>

trivially supported.<BR>

<BR>

Webapps for example often have similar functionality in separate threads<BR>

operating over a common database.  Session objects don't need protection<BR>

(could be 'local persistent') but shared data structures do if they're<BR>

written (even something as simple as statistics counters).<BR>

<BR>

In my research system I have alot of analyzers groveling over my DB,<BR>

sometimes on separate machines or CPUs so it's mostly read-only but<BR>

sometimes I annotate objects which need to be protected by transactions<BR>

in case of the rare conflict.<BR>

<BR>

> I can't comment intelligently on that.<BR>

<BR>

The point of my three categories was to allow a three-tier storage model<BR>

using the elephant APIs to maintain the programmer's view of managed<BR>

objects.  An automatic generational system can upgrade/downgrade the<BR>

storage class of an object or alternatively a manual director can do so<BR>

by migrating things from in-memory director indexing to on-disk elephant<BR>

indexing.<BR>

<BR>

local = normal in-memory object semantics + non-persistent indexing<BR>

local persistent =  fast reads + slower write-through + persistent<BR>

indexing (this does not provide ACID properties but ensures that all<BR>

writes are logged to the DB as atomic transactions for recoverability)<BR>

global persistent = for objects that are accessed from multiple<BR>

threads/processes (current elephant model of always read from disk)<BR>

<BR>

If an API allowed you to upgrade/downgrade storage classes then it would<BR>

be trivial to write an automatic or manual director model on top of<BR>

those primitive storage classes.<BR>

<BR>

> Yes (I think.)  The ability to add function-based indexes that are very<BR>

> lightweight in-memory, reconstitutable things seems like a powerful<BR>

> feature to me.<BR>

<BR>

I'm not sure what part of what I was talking about you mean by "function<BR>

based indexes" here...<BR>

<BR>

[Rob:] I mean the existing ability to build an index based on a <BR>

function such as (x mod 10).<BR>

<BR>

Do you know what makes the SQL serializer so much slower than the BDB<BR>

serializer?  Is it just the conversion/copying to a 64-bit rep?<BR>

<BR>

[Rob:] It's because I use simple base-64 encoding to encode Unicode into<BR>

ascii so that we can work easily with SQL back-ends; this would be <BR>

easy to improve.<BR>

<BR>

Re: compressing serializer<BR>

<BR>

Not entirely sure what you were getting at here either, but it did lead<BR>

me to think of a couple of things that hadn't occurred to me earlier:<BR>

<BR>

We might think about ways of extending the serializer for objects as<BR>

well as aggregate user structures.  For example we could add some<BR>

serializer tables that could be cached in memory at the cost of lookups<BR>

on infrequently accessed types.<BR>

<BR>

Right now we write the slot name and package for every slot in objects<BR>

as well as for persistent objects.  This adds hugely to the storage cost<BR>

and serialization time.  If we have an ID for slotnames and we have a<BR>

cached serializer record for the class then we know how to map IDs to<BR>

slotnames and can just increment IDs when we redefine classes and<BR>

add/drop slots.  This would work for non-persistent objects as well as<BR>

for persistent slots.<BR>

<BR>

Currently we serialize non-persistent classes as:<BR>

<BR>

type-byte + serialize-id + classname + classpackage + slotname1 +<BR>

slotpackage1 + slotvalue1 + ... slotvalueN<BR>

<BR>

In unicode systems the slotname & classname overhead for meaningful<BR>

names can be 2-4 bytes per character.  At 8 characters a piece a<BR>

standard object in a 2-byte unicode system with one slot with a single<BR>

4-byte fixnum would take 1 + 4 + 16 + 16 + 16 + 16 + 5 = 70 bytes! <BR>

<BR>

To store a persistent reference:<BR>

type + oid + classname + classpackage = 1 + 4 + 16 + 16 = 37 bytes<BR>

<BR>

To store a persistent slot fixnum:<BR>

key: btree-oid + slot-name-string + slot-name-package<BR>

value: serialized value = type + fixnum<BR>

key = 4 + 16 + 16 = 36<BR>

value = 5<BR>

<BR>

Ok, too many numbers but this is for archival purposes.  :)<BR>

<BR>

If we introduce btrees that associate classes with a unique id and a<BR>

serializer record:<BR>

<BR>

classname+package->classid (for finding the id on serialization, can<BR>

cache in memory)<BR>

classid->serializer-record (for deserialization, can cache this in memory)<BR>

<BR>

Then a serialized non-persistent class would be:<BR>

type-byte + serialize-id + classid + slotid1 + slotvalue1 = 1 + 4 + 4 +<BR>

2 + 5 = 16 bytes<BR>

as opposed to 70bytes of storage and a bunch of string manipulations.<BR>

<BR>

There is also great benefit for references:<BR>

type + oid + classid = 9 bytes (vs. 37 on average)<BR>

<BR>

And even 40% for persistent slots:<BR>

key = oid + classid + slotid = 10 (vs. 36)<BR>

value = 5 (same)<BR>

<BR>

There are some challenges here.  What happens if a non-persistent class<BR>

is redefined?  Probably the same that happens today, actually.  This may<BR>

be highly problematic to do in a backwards compatible way, but we could<BR>

have an interim release that predicated deserialization semantics on the<BR>

database version.  So we save 3-5x storage and accordingly, probably<BR>

3-5x time for a negligible increase in main memory but for more<BR>

complexity (tighter coupling between the serializer behavior and the<BR>

current database.<BR>

<BR>

Maybe we could get a 'summer of code' project or two going on to extend<BR>

/ improve elephant and see if we can get it ready for a 1.0 release and<BR>

mainstream use?  Just an idle thought...<BR>

<BR>

You might want to forward excerpts of this discussion to the list for<BR>

archival purposes!<BR>

<BR>

<BR>

Robert L. Read wrote:<BR>

> Dear Ian,<BR>

><BR>

>      You understand my thought better than I do.<BR>

><BR>

> > Ultimately I think all these variations come down to trying to deal with<BR>

> > the tension between persistence semantics, performance and concurrency.<BR>

><BR>

> There is also the desire to implement a consistent CRUD interface for<BR>

> a wide range or Tier 2 (business rule) objects (that's using the<BR>

> terminology<BR>

> of 6 years ago, perhaps it has progressed.)<BR>

><BR>

> ><BR>

> > DCM, if I understand you correctly, deals with archiving in-memory<BR>

> > objects in a simple way.  It is providing ODB semantics, addressing<BR>

> > performance issues by keeping active objects in-memory and allowing for<BR>

> > persistence by moving to a slower storage.  It ignores concurrency.<BR>

> > The semantic benefits are that you access and find objects the same way<BR>

> > regardless of how they're stored; it's a separation of concerns (storage<BR>

> > latency vs. indexing).<BR>

><BR>

> Yes.  The separation of concerns is key; in fact, it support a<BR>

> non-persistent<BR>

> model as well, although I don't actually use that in my current app.<BR>

><BR>

> ><BR>

> > Prevalence tries to solve the performance problem using traditional DB<BR>

> > semantics and ignoring process or machine level concurrency.  This is<BR>

> > appropriate for most web environments with a single, multi-threaded<BR>

> > application server.<BR>

> ><BR>

> > Elephant builds on top of BDB's model of supporting full ACID<BR>

> > characteristics in the face of multi-process (but not multi-machine)<BR>

> > concurrency, moderate performance levels (library vs. client/server) and<BR>

> > so requires the more complex semantics of transactions.<BR>

> ><BR>

> ><BR>

> > Now I'm not so sure that my indexing support helps your performance<BR>

> > goals.  You still have to index into the btree structures which, in the<BR>

> > random case, will require log(N) disk accesses on retrieval.  Elephant<BR>

> > is more than happy to allow you to keep in-memory indices of persistent<BR>

> > objects; they just don't persist between sessions.<BR>

><BR>

> You're right; class indexing doesn't help.  But being able to say<BR>

> "Get me all objects in this class" is much easier with what you have done.<BR>

> In fact, I have not yet felt the need for any indexing (beyond on the<BR>

> primary<BR>

> key, or id, of the objects.)  But its nice to know that arrow is in my<BR>

> quiver.<BR>

> I think people really don't twig to how big a gigabyte really is, and with<BR>

> 16-gig machines just around the corner, the old "look it up in the<BR>

> database<BR>

> every time" is as old a broadcast television.  Of course, sometimes one<BR>

> need to build an in-memory index (data structure) but even then with<BR>

> no I/O<BR>

> one can rip through a large set of objects very quickly on modern<BR>

> machines.<BR>

><BR>

> ><BR>

> > The big improvement in speed I imagine we're looking for is being able<BR>

> > to cache slot values in memory.  This means for fast objects we have to<BR>

> > back off on supporting concurrency.  Ideally it seems we'd want to be<BR>

> > able to declare a class or more likely an instance, to be one of<BR>

> > 'local', 'local persistent' and 'global persistent' with semantics as<BR>

> > follows:<BR>

><BR>

> Yes --- personally, I like managing concurrency at the "director" level,<BR>

> rather than the database level.  I have a hunch that one ends up doing<BR>

> it anyway---it just takes a little in-memory state to force concurrency<BR>

> headaches.<BR>

> ><BR>

> > 'local' - uses the local indexing system but otherwise just like any<BR>

> > other in-memory object<BR>

> > 'local persistent' - uses the local indexing system, caches values<BR>

> > in-memory but supports write through semantics so that any writes are<BR>

> > persistent.  This is basically like prevalence but might be slightly<BR>

> > more costly than just writing a log file as we have to index and update<BR>

> > the btree.  This does not support concurrency and a given thread or<BR>

> > process could perhaps 'lock' the entry so that there is no<BR>

> > interference.  The downside vs. typical prevalence is that we are still<BR>

> > using the more complex ODB semantics rather than a simple logging<BR>

> semantics.<BR>

> > 'global persistent' - the current model<BR>

><BR>

> I can't comment intelligently on that.<BR>

><BR>

> ><BR>

> > Some cheap features we might add to elephant to enable the DCM and<BR>

> > prevalence models:<BR>

> ><BR>

> > 1) Allow for cached instance data - requires modifying the object<BR>

> > shared-initialize protocol for allocating storage and the read/write<BR>

> > protocol for accessing slot data (effort: medium)<BR>

> ><BR>

> > We can have a write-through mode to support the prevalence model or a<BR>

> > 'snapshot' only model which does not provide ACID compliance but avoids<BR>

> > the write overhead and instead allows a snapshot to be taken<BR>

> > periodically by writing instance data to the underlying elephant store.<BR>

> ><BR>

> > new parameter: (make-instance 'foo :ele-allocation<BR>

> > :local/local-writethrough/global)<BR>

> > new api: (change-allocation instance :local/local-writethrough/global)<BR>

> >                (snapshot instance/index/:all)<BR>

> ><BR>

> > If you downgraded from local-persistent to global the local storage<BR>

> > wouldn't be reclaimed until the object was garbage collected and<BR>

> > re-retrieved because we'd rely on the underlying object storage.<BR>

> ><BR>

> > 2) Hierarchical indexing: not sure there's a natural semantics for<BR>

> > this.  One way might be to allow for an option which indicates :local,<BR>

> > :global or :all to the retrieval functions (get-instances-for...).  A<BR>

> > local inverted index would be maintained in-memory allowing for :local<BR>

> > retrievals to avoid going to memory, global to go after long-term memory<BR>

> > and :all to do both (effort: medium)<BR>

><BR>

> Yes (I think.)  The ability to add function-based indexes that are very<BR>

> lightweight in-memory, reconstitutable things seems like a powerful<BR>

> feature to me.<BR>

><BR>

> Let me mention another direction that I have been thinking of (which your<BR>

> "liveness" discussion already hints at):<BR>

><BR>

> 1)  I would like to build an intelligent, use-analyzing "generational"<BR>

> manager.<BR>

> The relative use of each object is recorded, and the desired resource<BR>

> allocation<BR>

> (in terms of megabytes) in each layer of the storage hierarchy is<BR>

> automatically<BR>

> managed.  So invisibly, without thought from the program, the system<BR>

> decides to<BR>

> move objects up and down the storager hierarchy based on usage.  (This is<BR>

> quite different than the current explicit usage that gdcm requires;<BR>

> but it<BR>

> is a step.<BR>

><BR>

> 2)  The current CLSQL serialization is horribly inefficient (by a<BR>

> factor or<BR>

> five, easy.)  As I've mentioned, this doesn't bother me, because I only<BR>

> go to the database on writes and at startup.  But it could become an<BR>

> issue.<BR>

> An OBDB (or a relational one) could instead of using a context-free<BR>

> serializer<BR>

> could attempt to serialize an object in the context of the DCM<BR>

> collection (or<BR>

> your class indexing) in which it resides.  For example, if one images a<BR>

> class that has a slot that has some Guassian or Poisson distribution,<BR>

> then one<BR>

> could build a "compressing serializer".  The effect would, I think, be<BR>

> especially<BR>

> tiny representations of the stored data.  In my model of the universe<BR>

> this<BR>

> makes it all the more reasonable to move "up" the storage hierarchy<BR>

> for the<BR>

> "home" position of some object.  In essence, you can make a 1 gigabyte<BR>

> store<BR>

> even bigger, and the number of pages you have to read from the disc<BR>

> even smaller.<BR>

> This would make a nice PhD thesis, btw.  My own dissertation was somewhat<BR>

> related to this.  Unfortunately, doing that now would be "premature<BR>

> optimization"<BR>

> of my current business plans.<BR>

><BR>

> ><BR>

> ><BR>

> > Anyway, a few thoughts to let simmer until the next time we feel<BR>

> ambitious!<BR>

> ><BR>

> > Ian<BR>

> ><BR>

><BR>

> Thanks.  I really appreciate all you have done; I don't mind<BR>

> continuing to<BR>

> be the maintainer of Elephant, but if you would rather take over that<BR>

> would<BR>

> be fine with me.<BR>

><BR>

<BR>

</BODY>

</HTML>