<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.3.2">
</HEAD>
<BODY>
Dear Team,<BR>
<BR>
This is a long email exchange about future directions between Ian Eslick and me <BR>
that I thought worth archiving here.<BR>
<BR>
<BR>
<BR>
Robert,<BR>
<BR>
<BR>
A few follow-on comments:<BR>
<BR>
- The implications and architecture of CRUD are outside my sphere of<BR>
experience. Can you expand on what you'd like to enable?<BR>
<BR>
- Perhaps we should plan (or dream) about next generation features when<BR>
we have some desired use cases as targets: small webapps,<BR>
multi-server webapps, long-running systems with a need for<BR>
recoverability (my research), etc.<BR>
<BR>
- We should have multiple ways to manage concurrency as concurrency<BR>
problems come in many flavors. Same should be true for optimizing<BR>
performance. A couple of models for common applications should be<BR>
trivially supported.<BR>
<BR>
Webapps for example often have similar functionality in separate threads<BR>
operating over a common database. Session objects don't need protection<BR>
(could be 'local persistent') but shared data structures do if they're<BR>
written (even something as simple as statistics counters).<BR>
<BR>
In my research system I have a lot of analyzers groveling over my DB,<BR>
sometimes on separate machines or CPUs so it's mostly read-only but<BR>
sometimes I annotate objects which need to be protected by transactions<BR>
in case of the rare conflict.<BR>
<BR>
> I can't comment intelligently on that.<BR>
<BR>
The point of my three categories was to allow a three-tier storage model<BR>
using the elephant APIs to maintain the programmer's view of managed<BR>
objects. An automatic generational system can upgrade/downgrade the<BR>
storage class of an object or alternatively a manual director can do so<BR>
by migrating things from in-memory director indexing to on-disk elephant<BR>
indexing.<BR>
<BR>
local = normal in-memory object semantics + non-persistent indexing<BR>
local persistent = fast reads + slower write-through + persistent<BR>
indexing (this does not provide ACID properties but ensures that all<BR>
writes are logged to the DB as atomic transactions for recoverability)<BR>
global persistent = for objects that are accessed from multiple<BR>
threads/processes (current elephant model of always read from disk)<BR>
<BR>
If an API allowed you to upgrade/downgrade storage classes then it would<BR>
be trivial to write an automatic or manual director model on top of<BR>
those primitive storage classes.<BR>
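What such an upgrade/downgrade API might look like, as a minimal sketch (Python pseudocode of a hypothetical director; Elephant's real API is Lisp, and all of these names are invented for illustration):<BR>

```python
# Hypothetical sketch of the three storage classes and a director that
# migrates objects between them; none of these names are Elephant's.
from enum import Enum

class Storage(Enum):
    LOCAL = "local"                          # in-memory semantics, non-persistent index
    LOCAL_PERSISTENT = "local-persistent"    # cached reads, write-through to the DB
    GLOBAL_PERSISTENT = "global-persistent"  # always read from disk (current model)

class Managed:
    def __init__(self, value, storage=Storage.LOCAL):
        self.value = value
        self.storage = storage
        self.reads = 0  # usage statistic a generational director could consult

def change_storage(obj, target):
    """Upgrade/downgrade the storage class of a single object."""
    obj.storage = target
    return obj

# A trivial automatic director: demote rarely-read objects to disk.
def auto_direct(objs, threshold=10):
    for o in objs:
        if o.reads < threshold:
            change_storage(o, Storage.GLOBAL_PERSISTENT)

obj = Managed(42)
auto_direct([obj])
print(obj.storage)  # Storage.GLOBAL_PERSISTENT
```

A manual director would call `change_storage` directly instead of relying on usage counts.<BR>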
<BR>
> Yes (I think.) The ability to add function-based indexes that are very<BR>
> lightweight in-memory, reconstitutable things seems like a powerful<BR>
> feature to me.<BR>
<BR>
I'm not sure what part of what I was talking about you mean by "function<BR>
based indexes" here...<BR>
<BR>
[Rob:] I mean the existing ability to build an index based on a <BR>
function such as (x mod 10).<BR>
<BR>
Do you know what makes the SQL serializer so much slower than the BDB<BR>
serializer? Is it just the conversion/copying to a 64-bit rep?<BR>
<BR>
[Rob:] It's because I use simple base-64 encoding to encode Unicode into<BR>
ASCII so that we can work easily with SQL back-ends; this would be <BR>
easy to improve.<BR>
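A quick illustration of the overhead (the exact encoding the SQL back-end uses may differ; this just shows why a base-64 text representation costs space and copying):<BR>

```python
# Base-64 inflates data by roughly a third and adds an encode/decode
# copy on every round trip; raw UTF-8 bytes would usually be smaller.
import base64

s = "persistent-slot-name"
raw = s.encode("utf-8")
b64 = base64.b64encode(raw)

print(len(raw), len(b64))  # 20 28
```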
<BR>
Re: compressing serializer<BR>
<BR>
Not entirely sure what you were getting at here either, but it did lead<BR>
me to think of a couple of things that hadn't occurred to me earlier:<BR>
<BR>
We might think about ways of extending the serializer for objects as<BR>
well as aggregate user structures. For example we could add some<BR>
serializer tables that could be cached in memory at the cost of lookups<BR>
on infrequently accessed types.<BR>
<BR>
Right now we write the slot name and package for every slot in objects<BR>
as well as for persistent objects. This adds hugely to the storage cost<BR>
and serialization time. If we have an ID for slotnames and we have a<BR>
cached serializer record for the class then we know how to map IDs to<BR>
slotnames and can just increment IDs when we redefine classes and<BR>
add/drop slots. This would work for non-persistent objects as well as<BR>
for persistent slots.<BR>
<BR>
Currently we serialize non-persistent classes as:<BR>
<BR>
type-byte + serialize-id + classname + classpackage + slotname1 +<BR>
slotpackage1 + slotvalue1 + ... slotvalueN<BR>
<BR>
In unicode systems the slotname & classname overhead for meaningful<BR>
names can be 2-4 bytes per character. At 8 characters apiece, a<BR>
standard object in a 2-byte unicode system with one slot holding a single<BR>
4-byte fixnum would take 1 + 4 + 16 + 16 + 16 + 16 + 5 = 74 bytes! <BR>
<BR>
To store a persistent reference:<BR>
type + oid + classname + classpackage = 1 + 4 + 16 + 16 = 37 bytes<BR>
<BR>
To store a persistent slot fixnum:<BR>
key: btree-oid + slot-name-string + slot-name-package<BR>
value: serialized value = type + fixnum<BR>
key = 4 + 16 + 16 = 36<BR>
value = 5<BR>
<BR>
Ok, too many numbers but this is for archival purposes. :)<BR>
<BR>
If we introduce btrees that associate classes with a unique id and a<BR>
serializer record:<BR>
<BR>
classname+package->classid (for finding the id on serialization, can<BR>
cache in memory)<BR>
classid->serializer-record (for deserialization, can cache this in memory)<BR>
<BR>
Then a serialized non-persistent class would be:<BR>
type-byte + serialize-id + classid + slotid1 + slotvalue1 = 1 + 4 + 4 +<BR>
2 + 5 = 16 bytes<BR>
as opposed to 74 bytes of storage and a bunch of string manipulations.<BR>
<BR>
There is also great benefit for references:<BR>
type + oid + classid = 9 bytes (vs. 37 on average)<BR>
<BR>
And persistent slots shrink to well under half their size:<BR>
key = oid + classid + slotid = 10 (vs. 36)<BR>
value = 5 (same)<BR>
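The counts above can be checked mechanically (a sketch; it assumes the 2-byte unicode, 8-character names, 4-byte oids/ids, 2-byte slotids and 5-byte serialized fixnum used above):<BR>

```python
# Back-of-envelope check of the serialization byte counts.
NAME = 2 * 8  # 16 bytes per 8-character name in a 2-byte encoding

# Non-persistent object: type, serialize-id, class+pkg, slot+pkg, value
old_object = 1 + 4 + NAME + NAME + NAME + NAME + 5
# With id tables: type, serialize-id, classid, slotid, value
new_object = 1 + 4 + 4 + 2 + 5

# Persistent reference: type, oid, classname, classpackage
old_ref = 1 + 4 + NAME + NAME
new_ref = 1 + 4 + 4            # type, oid, classid

# Persistent slot key: btree-oid, slot name, slot package
old_slot_key = 4 + NAME + NAME
new_slot_key = 4 + 4 + 2       # oid, classid, slotid

print(old_object, new_object)      # 74 16
print(old_ref, new_ref)            # 37 9
print(old_slot_key, new_slot_key)  # 36 10
```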
<BR>
There are some challenges here. What happens if a non-persistent class<BR>
is redefined? Probably the same that happens today, actually. This may<BR>
be highly problematic to do in a backwards compatible way, but we could<BR>
have an interim release that predicated deserialization semantics on the<BR>
database version. So we save 3-5x in storage and, accordingly, probably<BR>
3-5x in time, for a negligible increase in main memory but more<BR>
complexity (tighter coupling between the serializer behavior and the<BR>
current database).<BR>
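The interim-release idea might look something like this (all names hypothetical; just dispatching the deserializer on the stored database format version):<BR>

```python
# Sketch: predicate deserialization semantics on the database version,
# so old stores keep the name-based format and new ones use id tables.
LEGACY_FORMAT_VERSION = 1  # hypothetical cutoff version

def deserialize(buf, db_version):
    if db_version <= LEGACY_FORMAT_VERSION:
        return deserialize_by_names(buf)  # legacy: classname/slotname strings
    return deserialize_by_ids(buf)        # new: classid/slotid table lookups

def deserialize_by_names(buf):
    return ("names", buf)  # stand-in for the legacy path

def deserialize_by_ids(buf):
    return ("ids", buf)    # stand-in for the new path

print(deserialize(b"x", 1)[0], deserialize(b"x", 2)[0])  # names ids
```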
<BR>
Maybe we could get a 'summer of code' project or two going to extend<BR>
/ improve elephant and see if we can get it ready for a 1.0 release and<BR>
mainstream use? Just an idle thought...<BR>
<BR>
You might want to forward excerpts of this discussion to the list for<BR>
archival purposes!<BR>
<BR>
<BR>
Robert L. Read wrote:<BR>
> Dear Ian,<BR>
><BR>
> You understand my thought better than I do.<BR>
><BR>
> > Ultimately I think all these variations come down to trying to deal with<BR>
> > the tension between persistence semantics, performance and concurrency.<BR>
><BR>
> There is also the desire to implement a consistent CRUD interface for<BR>
> a wide range of Tier 2 (business rule) objects (that's using the<BR>
> terminology<BR>
> of 6 years ago, perhaps it has progressed.)<BR>
><BR>
> ><BR>
> > DCM, if I understand you correctly, deals with archiving in-memory<BR>
> > objects in a simple way. It is providing ODB semantics, addressing<BR>
> > performance issues by keeping active objects in-memory and allowing for<BR>
> > persistence by moving to a slower storage. It ignores concurrency.<BR>
> > The semantic benefits are that you access and find objects the same way<BR>
> > regardless of how they're stored; it's a separation of concerns (storage<BR>
> > latency vs. indexing).<BR>
><BR>
> Yes. The separation of concerns is key; in fact, it supports a<BR>
> non-persistent<BR>
> model as well, although I don't actually use that in my current app.<BR>
><BR>
> ><BR>
> > Prevalence tries to solve the performance problem using traditional DB<BR>
> > semantics and ignoring process or machine level concurrency. This is<BR>
> > appropriate for most web environments with a single, multi-threaded<BR>
> > application server.<BR>
> ><BR>
> > Elephant builds on top of BDB's model of supporting full ACID<BR>
> > characteristics in the face of multi-process (but not multi-machine)<BR>
> > concurrency, moderate performance levels (library vs. client/server) and<BR>
> > so requires the more complex semantics of transactions.<BR>
> ><BR>
> ><BR>
> > Now I'm not so sure that my indexing support helps your performance<BR>
> > goals. You still have to index into the btree structures which, in the<BR>
> > random case, will require log(N) disk accesses on retrieval. Elephant<BR>
> > is more than happy to allow you to keep in-memory indices of persistent<BR>
> > objects; they just don't persist between sessions.<BR>
><BR>
> You're right; class indexing doesn't help. But being able to say<BR>
> "Get me all objects in this class" is much easier with what you have done.<BR>
> In fact, I have not yet felt the need for any indexing (beyond on the<BR>
> primary<BR>
> key, or id, of the objects.) But it's nice to know that arrow is in my<BR>
> quiver.<BR>
> I think people really don't twig to how big a gigabyte really is, and with<BR>
> 16-gig machines just around the corner, the old "look it up in the<BR>
> database<BR>
> every time" is as old as broadcast television. Of course, sometimes one<BR>
> needs to build an in-memory index (data structure), but even then, with<BR>
> no I/O<BR>
> one can rip through a large set of objects very quickly on modern<BR>
> machines.<BR>
><BR>
> ><BR>
> > The big improvement in speed I imagine we're looking for is being able<BR>
> > to cache slot values in memory. This means for fast objects we have to<BR>
> > back off on supporting concurrency. Ideally it seems we'd want to be<BR>
> > able to declare a class or more likely an instance, to be one of<BR>
> > 'local', 'local persistent' and 'global persistent' with semantics as<BR>
> > follows:<BR>
><BR>
> Yes --- personally, I like managing concurrency at the "director" level,<BR>
> rather than the database level. I have a hunch that one ends up doing<BR>
> it anyway---it just takes a little in-memory state to force concurrency<BR>
> headaches.<BR>
> ><BR>
> > 'local' - uses the local indexing system but otherwise just like any<BR>
> > other in-memory object<BR>
> > 'local persistent' - uses the local indexing system, caches values<BR>
> > in-memory but supports write through semantics so that any writes are<BR>
> > persistent. This is basically like prevalence but might be slightly<BR>
> > more costly than just writing a log file as we have to index and update<BR>
> > the btree. This does not support concurrency and a given thread or<BR>
> > process could perhaps 'lock' the entry so that there is no<BR>
> > interference. The downside vs. typical prevalence is that we are still<BR>
> > using the more complex ODB semantics rather than a simple logging<BR>
> semantics.<BR>
> > 'global persistent' - the current model<BR>
><BR>
> I can't comment intelligently on that.<BR>
><BR>
> ><BR>
> > Some cheap features we might add to elephant to enable the DCM and<BR>
> > prevalence models:<BR>
> ><BR>
> > 1) Allow for cached instance data - requires modifying the object<BR>
> > shared-initialize protocol for allocating storage and the read/write<BR>
> > protocol for accessing slot data (effort: medium)<BR>
> ><BR>
> > We can have a write-through mode to support the prevalence model or a<BR>
> > 'snapshot' only model which does not provide ACID compliance but avoids<BR>
> > the write overhead and instead allows a snapshot to be taken<BR>
> > periodically by writing instance data to the underlying elephant store.<BR>
> ><BR>
> > new parameter: (make-instance 'foo :ele-allocation<BR>
> > :local/local-writethrough/global)<BR>
> > new api: (change-allocation instance :local/local-writethrough/global)<BR>
> > (snapshot instance/index/:all)<BR>
> ><BR>
> > If you downgraded from local-persistent to global the local storage<BR>
> > wouldn't be reclaimed until the object was garbage collected and<BR>
> > re-retrieved because we'd rely on the underlying object storage.<BR>
> ><BR>
> > 2) Hierarchical indexing: not sure there's a natural semantics for<BR>
> > this. One way might be to allow for an option which indicates :local,<BR>
> > :global or :all to the retrieval functions (get-instances-for...). A<BR>
> > local inverted index would be maintained in-memory allowing for :local<BR>
> retrievals to avoid going to disk, :global to go after long-term storage<BR>
> > and :all to do both (effort: medium)<BR>
><BR>
> Yes (I think.) The ability to add function-based indexes that are very<BR>
> lightweight in-memory, reconstitutable things seems like a powerful<BR>
> feature to me.<BR>
><BR>
> Let me mention another direction that I have been thinking of (which your<BR>
> "liveness" discussion already hints at):<BR>
><BR>
> 1) I would like to build an intelligent, use-analyzing "generational"<BR>
> manager.<BR>
> The relative use of each object is recorded, and the desired resource<BR>
> allocation<BR>
> (in terms of megabytes) in each layer of the storage hierarchy is<BR>
> automatically<BR>
> managed. So invisibly, without thought from the program, the system<BR>
> decides to<BR>
> move objects up and down the storage hierarchy based on usage. (This is<BR>
> quite different from the current explicit usage that gdcm requires,<BR>
> but it<BR>
> is a step.)<BR>
><BR>
> 2) The current CLSQL serialization is horribly inefficient (by a<BR>
> factor of<BR>
> five, easy.) As I've mentioned, this doesn't bother me, because I only<BR>
> go to the database on writes and at startup. But it could become an<BR>
> issue.<BR>
> An ODB (or a relational one) could, instead of using a context-free<BR>
> serializer,<BR>
> attempt to serialize an object in the context of the DCM<BR>
> collection (or<BR>
> your class indexing) in which it resides. For example, if one imagines a<BR>
> class that has a slot with some Gaussian or Poisson distribution,<BR>
> then one<BR>
> could build a "compressing serializer". The effect would, I think, be<BR>
> especially<BR>
> tiny representations of the stored data. In my model of the universe<BR>
> this<BR>
> makes it all the more reasonable to move "up" the storage hierarchy<BR>
> for the<BR>
> "home" position of some object. In essence, you can make a 1 gigabyte<BR>
> store<BR>
> even bigger, and the number of pages you have to read from the disc<BR>
> even smaller.<BR>
> This would make a nice PhD thesis, btw. My own dissertation was somewhat<BR>
> related to this. Unfortunately, doing that now would be "premature<BR>
> optimization"<BR>
> of my current business plans.<BR>
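One simple instance of a "compressing serializer" that exploits a known value distribution, sketched here with a variable-length integer code (the names and the choice of varints are my illustration, not anything from DCM or Elephant):<BR>

```python
# If a slot's values cluster near a small mean (e.g. Poisson-distributed
# counts), a variable-length integer code stores the common values in a
# single byte instead of a fixed 4-byte fixnum representation.
def varint_encode(n):
    """Encode a non-negative integer, 7 bits per byte, high bit = more."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

values = [3, 1, 4, 1, 5, 200, 2]  # mostly small counts, one outlier
fixed = 4 * len(values)           # fixed 4-byte encoding
compressed = sum(len(varint_encode(v)) for v in values)

print(fixed, compressed)  # 28 8
```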
><BR>
> ><BR>
> ><BR>
> > Anyway, a few thoughts to let simmer until the next time we feel<BR>
> ambitious!<BR>
> ><BR>
> > Ian<BR>
> ><BR>
><BR>
> Thanks. I really appreciate all you have done; I don't mind<BR>
> continuing to<BR>
> be the maintainer of Elephant, but if you would rather take over that<BR>
> would<BR>
> be fine with me.<BR>
><BR>
<BR>
</BODY>
</HTML>