[elephant-devel] Learning about Elephant

Ian Eslick eslick at media.mit.edu
Tue Jan 27 18:32:26 UTC 2009


On Jan 27, 2009, at 11:00 AM, John wrote:
>
> 1) Every read/write of pclass slots is done directly from/to the  
> database, so no in-memory "copy" exists (unless some sort of  
> transient cache slot model is used). This is good. However, is  
> Elephant "intelligent" enough so that if you attempt to setf a slot  
> value with the same value stored in the slot, Elephant will "avoid"  
> the writing to the database, since the value hasn't really changed?  
> If that's not the out-of-the-box behavior, I presume that it would
> be relatively easy to add this functionality using MOP. I also  
> assume that doing this will require another read before the write in  
> order to compare the value to be written, but then again, Elephant  
> claims that reads are much cheaper than writes.
>

Such a thing would be possible, but you would need to have a local  
copy of the value in memory and this could create more problems than  
it solves.  It's also unclear to me how common this case is.  I don't
write the same value to the same slot very often!
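
To make the trade-off in the question concrete, here is a minimal sketch
of the read-before-write variant. It is written in Python rather than
Lisp, and SlotStore and write_if_changed are hypothetical names invented
for illustration; none of this is Elephant's API. It only shows that
skipping a redundant write costs one extra read per setf.

```python
_UNBOUND = object()  # sentinel for a slot that has never been written


class SlotStore:
    """Hypothetical stand-in for a persistent store; counts reads and writes."""

    def __init__(self):
        self._data = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self._data.get(key, _UNBOUND)

    def write(self, key, value):
        self.writes += 1
        self._data[key] = value


def write_if_changed(store, key, new_value):
    """Write only when the stored value differs; always costs one read."""
    if store.read(key) == new_value:
        return False  # value unchanged, write skipped
    store.write(key, new_value)
    return True
```

Whether this pays off depends on how often the same value really is
written twice, and on the read being meaningfully cheaper than the write.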

> 2) According to Oracle: "Berkeley DB includes support for building  
> highly available applications based on replication..." (http://www.oracle.com/technology/documentation/berkeley-db/db/ref/rep/intro.html 
> ) So, I assume that in the event that a single machine becomes  
> unable to handle the load of a busy application, BDB supports  
> replication to multiple machines with near-instant replication  
> effects (performance is inversely proportional to replication speed).
> So, the question is, can Elephant accommodate this model? If I
> read correctly, the manual states that elephant maintains a weak  
> hash of persistent objects. So, if BDB is deployed in a distributed  
> model and Elephant is running on each separate machine, we could in
> theory "trust" that the data read/written from/to disk will be all  
> in sync across all machines (as long as database IO on same object/ 
> slot occurs at a frequency greater than the replication rate). The  
> question I have is with the weak hash. If a write is made on one
> machine, the data on disk is updated across all machines. However,
> the weak hash remains stale on the machines where the data was only
> replicated to. Is this an actual problem or does Elephant use the  
> weak hash "intelligently" to recover in the event that the weak hash  
> becomes "unexpectedly" stale?

The weak hash is simply used to speed up the 'recreation' of a
persistent reference when it is deserialized from the underlying
storage medium (BDB in this case), so it fits into this model.  I've
looked several times at building in the support to enable a full
replication model for BDB, but it's a fair bit of work to deal with  
the single-master requirement for replication and distributed  
transactions for global coherence. The write performance and  
contention performance may decline noticeably, but conflict-free read  
performance should be the same as in the non-replicated case.  I don't  
fully understand all these issues yet and am prioritizing a lisp-only  
Prevalence style backend in my development roadmap.
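
As a rough illustration of a weak hash used this way, the Python sketch
below maps object ids to reference proxies. PersistentRef and ref_for_oid
are hypothetical names, and the assumption that the proxy carries only an
object id (and no slot values) is a simplification made for the example,
not a description of Elephant's internals.

```python
import weakref


class PersistentRef:
    """Hypothetical proxy: holds only an object id, never slot values."""

    def __init__(self, oid):
        self.oid = oid


# Entries disappear automatically once no one else holds the proxy.
_ref_cache = weakref.WeakValueDictionary()


def ref_for_oid(oid):
    """Return the live proxy for oid, creating one only if none exists."""
    ref = _ref_cache.get(oid)
    if ref is None:
        ref = PersistentRef(oid)
        _ref_cache[oid] = ref
    return ref
```

Under that assumption the table caches identity rather than data, so a
slot written on another machine leaves nothing in it to go stale.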

> 3) I come from a RDBMS world, so I'm still learning the modalities  
> of connected objects vs just related rows. So, reading the tutorials  
> you describe a friends model using PSets. So, imagine a concept  
> similar to Facebook in terms of a friends database. I have millions  
> of people created in the system and they all create their list of  
> friends. Some people may have "few" or no friends while others could  
> have hundreds of thousands of friends (e.g. Pres. Obama). Are PSets  
> the correct way to model this for a larger number of objects, or is
> there a more appropriate methodology recommended in Elephant?  
> Obviously the idea behind this is so that you could perform  
> manipulations on these "friends" relatively easily, such as adding or removing
> friends, or perform global queries such as listing all friends of people
> who are friends of Pres. Obama. There are references on the list  
> about a query system being worked on and some vanilla version being  
> available, but independently of that, I think my question is more  
> related to the object model implementation. Maybe I'm wrong.

Psets today are a convenience API around the BTree that can be  
overridden by data stores later for more efficient implementation  
(e.g. hash vs. tree) as appropriate for that store.  BTrees are cheap
for BDB to create and use, so using them at any size is fine.  You
might want to wrap an abstraction around the BTree directly, since the
pset API is an unordered set.  If you want to do range extraction or
set intersection you'll want the btree's support for ordered objects.
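
To show why ordering matters for those two operations, here is a small
Python sketch over sorted lists of keys, which stand in for an ordered
BTree. The function names and the idea of using integer ids as keys are
assumptions for the example; this is not the Elephant BTree API.

```python
from bisect import bisect_left, bisect_right


def key_range(sorted_keys, low, high):
    """Return the keys k with low <= k <= high from a sorted list."""
    start = bisect_left(sorted_keys, low)
    stop = bisect_right(sorted_keys, high)
    return sorted_keys[start:stop]


def intersect_sorted(a, b):
    """Intersect two sorted, duplicate-free key lists in one merge pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common
```

With unordered sets, the range query would have to scan every element;
with ordered keys both operations can walk the data in key order.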

The query system currently doesn't support psets, although it should  
eventually.

Slot-associations currently don't deal with reflexive references  
(associations among instances of a single class), but that would also  
address the 'friends-of' problem.

I hope you enjoy elephant!  If you want to add/implement or spec out  
some of these changes you are thinking about, I'd be happy to support  
you.

Cheers,
Ian

> Anyway, thank you for your help in advance. I look forward to hearing
> back from anyone soon and to learning more about Elephant.
>
> Thanks,
> JD