[elephant-devel] An intereste exchange between Ian and Rob....

Tue May 9 17:14:05 UTC 2006

Dear Team,

    This is a long email exchange about future directions betwen Ian
Eslick and myself 
that I thought worth archiving here.

Robert,

A few follow-on comments:

- The implications and architecture of CRUD is outside my sphere of
experience.  Can you expand on what you'd like to enable?

- Perhaps we should plan (or dream) about next generation features when
we have some desired usage cases as targets.  Small webapps,
multi-server webapps, long-running systems with a need for
recoverability (my research), etc.

- We should have multiple ways to manage concurrency as concurrency
problems come in many flavors.  Same should be true for optimizing
performance.  A couple of models for common applications should be
trivially supported.

Webapps for example often have similar functionality in separate threads
operating over a common database.  Session objects don't need protection
(could be 'local persistent') but shared data structures do if they're
written (even something as simple as statistics counters).

In my research system I have alot of analyzers groveling over my DB,
sometimes on separate machines or CPUs so it's mostly read-only but
sometimes I annotate objects which need to be protected by transactions
in case of the rare conflict.

> I can't comment intelligently on that.

The point of my three categories was to allow a three-tier storage model
using the elephant APIs to maintain the programmer's view of managed
objects.  An automatic generational system can upgrade/downgrade the
storage class of an object or alternatively a manual director can do so
by migrating things from in-memory director indexing to on-disk elephant
indexing.

local = normal in-memory object semantics + non-persistent indexing
local persistent =  fast reads + slower write-through + persistent
indexing (this does not provide ACID properties but ensures that all
writes are logged to the DB as atomic transactions for recoverability)
global persistent = for objects that are accessed from multiple
threads/processes (current elephant model of always read from disk)

If an API allowed you to upgrade/downgrade storage classes then it would
be trivial to write an automatic or manual director model on top of
those primitive storage classes.

> Yes (I think.)  The ability to add function-based indexes that are
very
> lightweight in-memory, reconstitutable things seems like a powerful
> feature to me.

I'm not sure what part of what I was talking about you mean by "function
based indexes" here...

[Rob:] I mean the existing ability to build an index based on a 
function such as (x mod 10).

Do you know what makes the SQL serializer so much slower than the BDB
serializer?  Is it just the conversion/copying to a 64-bit rep?

[Rob:] It's because I use simple base-64 encoding to encode Unicode into
ascii so that we can work easily with SQL back-ends; this would be 
easy to improve.

Re: compressing serializer

Not entirely sure what you were getting at here either, but it did lead
me to think of a couple of things that hadn't occurred to me earlier:

We might think about ways of extending the serializer for objects as
well as aggregate user structures.  For example we could add some
serializer tables that could be cached in memory at the cost of lookups
on infrequently accessed types.

Right now we write the slot name and package for every slot in objects
as well as for persistent objects.  This adds hugely to the storage cost
and serialization time.  If we have an ID for slotnames and we have a
cached serializer record for the class then we know how to map IDs to
slotnames and can just increment IDs when we redefine classes and
add/drop slots.  This would work for non-persistent objects as well as
for persistent slots.

Currently we serialize non-persistent classes as:

type-byte + serialize-id + classname + classpackage + slotname1 +
slotpackage1 + slotvalue1 + ... slotvalueN

In unicode systems the slotname & classname overhead for meaningful
names can be 2-4 bytes per character.  At 8 characters a piece a
standard object in a 2-byte unicode system with one slot with a single
4-byte fixnum would take 1 + 4 + 16 + 16 + 16 + 16 + 5 = 70 bytes! 

To store a persistent reference:
type + oid + classname + classpackage = 1 + 4 + 16 + 16 = 37 bytes

To store a persistent slot fixnum:
key: btree-oid + slot-name-string + slot-name-package
value: serialized value = type + fixnum
key = 4 + 16 + 16 = 36
value = 5

Ok, too many numbers but this is for archival purposes.  :)

If we introduce btrees that associate classes with a unique id and a
serializer record:

classname+package->classid (for finding the id on serialization, can
cache in memory)
classid->serializer-record (for deserialization, can cache this in
memory)

Then a serialized non-persistent class would be:
type-byte + serialize-id + classid + slotid1 + slotvalue1 = 1 + 4 + 4 +
2 + 5 = 16 bytes
as opposed to 70bytes of storage and a bunch of string manipulations.

There is also great benefit for references:
type + oid + classid = 9 bytes (vs. 37 on average)

And even 40% for persistent slots:
key = oid + classid + slotid = 10 (vs. 36)
value = 5 (same)

There are some challenges here.  What happens if a non-persistent class
is redefined?  Probably the same that happens today, actually.  This may
be highly problematic to do in a backwards compatible way, but we could
have an interim release that predicated deserialization semantics on the
database version.  So we save 3-5x storage and accordingly, probably
3-5x time for a negligible increase in main memory but for more
complexity (tighter coupling between the serializer behavior and the
current database.

Maybe we could get a 'summer of code' project or two going on to extend
/ improve elephant and see if we can get it ready for a 1.0 release and
mainstream use?  Just an idle thought...

You might want to forward excerpts of this discussion to the list for
archival purposes!

Robert L. Read wrote:
> Dear Ian,
>
>      You understand my thought better than I do.
>
> > Ultimately I think all these variations come down to trying to deal
with
> > the tension between persistence semantics, performance and
concurrency.
>
> There is also the desire to implement a consistent CRUD interface for
> a wide range or Tier 2 (business rule) objects (that's using the
> terminology
> of 6 years ago, perhaps it has progressed.)
>
> >
> > DCM, if I understand you correctly, deals with archiving in-memory
> > objects in a simple way.  It is providing ODB semantics, addressing
> > performance issues by keeping active objects in-memory and allowing
for
> > persistence by moving to a slower storage.  It ignores concurrency.
> > The semantic benefits are that you access and find objects the same
way
> > regardless of how they're stored; it's a separation of concerns
(storage
> > latency vs. indexing).
>
> Yes.  The separation of concerns is key; in fact, it support a
> non-persistent
> model as well, although I don't actually use that in my current app.
>
> >
> > Prevalence tries to solve the performance problem using traditional
DB
> > semantics and ignoring process or machine level concurrency.  This
is
> > appropriate for most web environments with a single, multi-threaded
> > application server.
> >
> > Elephant builds on top of BDB's model of supporting full ACID
> > characteristics in the face of multi-process (but not multi-machine)
> > concurrency, moderate performance levels (library vs. client/server)
and
> > so requires the more complex semantics of transactions.
> >
> >
> > Now I'm not so sure that my indexing support helps your performance
> > goals.  You still have to index into the btree structures which, in
the
> > random case, will require log(N) disk accesses on retrieval.
Elephant
> > is more than happy to allow you to keep in-memory indices of
persistent
> > objects; they just don't persist between sessions.
>
> You're right; class indexing doesn't help.  But being able to say
> "Get me all objects in this class" is much easier with what you have
done.
> In fact, I have not yet felt the need for any indexing (beyond on the
> primary
> key, or id, of the objects.)  But its nice to know that arrow is in my
> quiver.
> I think people really don't twig to how big a gigabyte really is, and
with
> 16-gig machines just around the corner, the old "look it up in the
> database
> every time" is as old a broadcast television.  Of course, sometimes
one
> need to build an in-memory index (data structure) but even then with
> no I/O
> one can rip through a large set of objects very quickly on modern
> machines.
>
> >
> > The big improvement in speed I imagine we're looking for is being
able
> > to cache slot values in memory.  This means for fast objects we have
to
> > back off on supporting concurrency.  Ideally it seems we'd want to
be
> > able to declare a class or more likely an instance, to be one of
> > 'local', 'local persistent' and 'global persistent' with semantics
as
> > follows:
>
> Yes --- personally, I like managing concurrency at the "director"
level,
> rather than the database level.  I have a hunch that one ends up doing
> it anyway---it just takes a little in-memory state to force
concurrency
> headaches.
> >
> > 'local' - uses the local indexing system but otherwise just like any
> > other in-memory object
> > 'local persistent' - uses the local indexing system, caches values
> > in-memory but supports write through semantics so that any writes
are
> > persistent.  This is basically like prevalence but might be slightly
> > more costly than just writing a log file as we have to index and
update
> > the btree.  This does not support concurrency and a given thread or
> > process could perhaps 'lock' the entry so that there is no
> > interference.  The downside vs. typical prevalence is that we are
still
> > using the more complex ODB semantics rather than a simple logging
> semantics.
> > 'global persistent' - the current model
>
> I can't comment intelligently on that.
>
> >
> > Some cheap features we might add to elephant to enable the DCM and
> > prevalence models:
> >
> > 1) Allow for cached instance data - requires modifying the object
> > shared-initialize protocol for allocating storage and the read/write
> > protocol for accessing slot data (effort: medium)
> >
> > We can have a write-through mode to support the prevalence model or
a
> > 'snapshot' only model which does not provide ACID compliance but
avoids
> > the write overhead and instead allows a snapshot to be taken
> > periodically by writing instance data to the underlying elephant
store.
> >
> > new parameter: (make-instance 'foo :ele-allocation
> > :local/local-writethrough/global)
> > new api: (change-allocation instance :local/local-
writethrough/global)
> >                (snapshot instance/index/:all)
> >
> > If you downgraded from local-persistent to global the local storage
> > wouldn't be reclaimed until the object was garbage collected and
> > re-retrieved because we'd rely on the underlying object storage.
> >
> > 2) Hierarchical indexing: not sure there's a natural semantics for
> > this.  One way might be to allow for an option which
indicates :local,
> > :global or :all to the retrieval functions (get-instances-for...).
A
> > local inverted index would be maintained in-memory allowing
for :local
> > retrievals to avoid going to memory, global to go after long-term
memory
> > and :all to do both (effort: medium)
>
> Yes (I think.)  The ability to add function-based indexes that are
very
> lightweight in-memory, reconstitutable things seems like a powerful
> feature to me.
>
> Let me mention another direction that I have been thinking of (which
your
> "liveness" discussion already hints at):
>
> 1)  I would like to build an intelligent, use-analyzing "generational"
> manager.
> The relative use of each object is recorded, and the desired resource
> allocation
> (in terms of megabytes) in each layer of the storage hierarchy is
> automatically
> managed.  So invisibly, without thought from the program, the system
> decides to
> move objects up and down the storager hierarchy based on usage.  (This
is
> quite different than the current explicit usage that gdcm requires;
> but it
> is a step.
>
> 2)  The current CLSQL serialization is horribly inefficient (by a
> factor or
> five, easy.)  As I've mentioned, this doesn't bother me, because I
only
> go to the database on writes and at startup.  But it could become an
> issue.
> An OBDB (or a relational one) could instead of using a context-free
> serializer
> could attempt to serialize an object in the context of the DCM
> collection (or
> your class indexing) in which it resides.  For example, if one images
a
> class that has a slot that has some Guassian or Poisson distribution,
> then one
> could build a "compressing serializer".  The effect would, I think, be
> especially
> tiny representations of the stored data.  In my model of the universe
> this
> makes it all the more reasonable to move "up" the storage hierarchy
> for the
> "home" position of some object.  In essence, you can make a 1 gigabyte
> store
> even bigger, and the number of pages you have to read from the disc
> even smaller.
> This would make a nice PhD thesis, btw.  My own dissertation was
somewhat
> related to this.  Unfortunately, doing that now would be "premature
> optimization"
> of my current business plans.
>
> >
> >
> > Anyway, a few thoughts to let simmer until the next time we feel
> ambitious!
> >
> > Ian
> >
>
> Thanks.  I really appreciate all you have done; I don't mind
> continuing to
> be the maintainer of Elephant, but if you would rather take over that
> would
> be fine with me.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20060509/223ed811/attachment.html>