[elephant-devel] Representational Question

Ian Eslick eslick at media.mit.edu
Sat Mar 1 00:03:10 UTC 2008


On Feb 29, 2008, at 6:50 PM, lists at infoway.net wrote:

> Hi Ian,
>
> Thanks for the prompt response. I know the querying facility is not
> necessarily a priority at this time, but it will someday become a
> reality :)
>
> To tell you the truth, we haven't really had any direct experience
> with Elephant in production or on larger-scale projects. However,
> we do feel that the whole concept of object prevalence given the  
> complexity of the overall data model would make Elephant a more  
> appropriate framework than continuing the relational path (maybe  
> we're just wrong and Elephant is not best suited for this at all).  
> As it is, we currently need to do a lot of work to maintain all the  
> data relations and integrity in the current system and hopefully  
> working only with the object models would make things easier and  
> more "maintainable". Granted, I agree that at this moment it's a
> lot easier to formulate those queries in SQL, but I'd like to at
> least be able to set up a parallel model and migrate data over so we
> could compare performance (we're not even going to talk about the
> complexity/difficulty of querying in Elephant, since we know that at
> this stage, it is much more complex than SQL queries).
>
> I hope I'm not wrong, but your opinion definitely counts for more,
> since you (et al.) know a lot more about this than we do.

Well, the first thing that occurs to me is search.  I don't know what  
the performance would be, but you might just try a search-style  
algorithm.

Use map-index, and for each element just chase the dependency chain:
(accessor (accessor (accessor object))). If you have a multi-valued
slot, you'll have to expand the search to all of its children. There
may be issues with this, not least of which is performance, but it's a
good place to start. It would be interesting to measure the
performance, try different approaches, and compare them to a pure
SQL solution.
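A rough sketch of that search, written in Python purely for illustration (Elephant itself is Common Lisp, and none of the names below are Elephant API). The classes Person, Office and Doctor, the helper chase, and the simplified query are all hypothetical. The query keeps male subscribers in zip 33012 who prefer an office whose name contains "HEAL" and that has a male cardiologist. It leaves out the distance, county and insurance-plan criteria. The plain list passed to search stands in for mapping over an index; in practice you would want to start from the most selective index you have.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List

# Hypothetical stand-ins for persistent classes (not Elephant's API).
@dataclass
class Doctor:
    gender: str
    specialties: List[str] = field(default_factory=list)

@dataclass
class Office:
    name: str
    doctors: List[Doctor] = field(default_factory=list)

@dataclass
class Person:
    name: str
    gender: str
    zip_code: str
    offices: List[Office] = field(default_factory=list)

def chase(obj: Any, path: Iterable[str]) -> List[Any]:
    """Follow a chain of slot names from obj, fanning out over list-valued slots."""
    frontier = [obj]
    for slot in path:
        next_frontier: List[Any] = []
        for item in frontier:
            value = getattr(item, slot)
            if isinstance(value, list):
                next_frontier.extend(value)  # multi-valued slot: expand every child
            else:
                next_frontier.append(value)
        frontier = next_frontier
    return frontier

def matches_query(p: Person) -> bool:
    """Male, zip 33012, with a preferred 'HEAL' office that has a male cardiologist."""
    if p.gender != "M" or p.zip_code != "33012":
        return False  # cheap checks on the root object first
    for office in chase(p, ["offices"]):
        if "HEAL" not in office.name.upper():
            continue
        if any(d.gender == "M" and "cardiology" in d.specialties
               for d in chase(office, ["doctors"])):
            return True
    return False

def search(candidates: Iterable[Person],
           predicate: Callable[[Person], bool]) -> List[Person]:
    """Stand-in for mapping over an index: test every candidate, keep the hits."""
    return [p for p in candidates if predicate(p)]
```

Cheap tests on the root object run before any chain is chased, so most candidates are rejected without touching their offices or doctors.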

> As for the second question, the answer is no. The objects would not
> be stored in bulk. The idea is to keep an audit log of user-initiated
> changes on individual entities (e.g. changing a Person's
> address, correcting a name, or assigning a health insurance plan,
> etc).

Hmmm... One way to do this is to use :after methods on the slot
accessors you want to log. Those :after methods write a log entry
into the database, so the log entry and the change itself are
committed together in the same transaction. You can arrange this so
that all writes within a transaction get written at once.
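The same idea in Python, as an analogy only. CLOS would use an :after method on the (setf ...) accessor. Python has no method combination, so this sketch overrides __setattr__ and appends the log entry right after the write. Store, its transaction context manager and AuditedPerson are hypothetical stand-ins for the database and for Elephant's own transaction handling. The point is that each write and its log entry are buffered together, and either both commit or both are discarded.

```python
from contextlib import contextmanager

class Store:
    """Hypothetical in-memory store standing in for the database (not Elephant's API)."""

    def __init__(self):
        self.data = {}        # committed values, keyed by (oid, slot)
        self.log = []         # committed audit entries
        self._pending = None  # (writes, entries) while a transaction is open

    def read(self, oid, slot):
        if self._pending is not None and (oid, slot) in self._pending[0]:
            return self._pending[0][(oid, slot)]  # see our own uncommitted write
        return self.data.get((oid, slot))

    def write(self, oid, slot, value):
        if self._pending is None:
            raise RuntimeError("writes must happen inside a transaction")
        self._pending[0][(oid, slot)] = value

    def record(self, entry):
        self._pending[1].append(entry)

    @contextmanager
    def transaction(self):
        self._pending = ({}, [])
        try:
            yield
            writes, entries = self._pending
            self.data.update(writes)   # commit the changes...
            self.log.extend(entries)   # ...and their audit entries together
        finally:
            self._pending = None       # on an exception, both are discarded


class AuditedPerson:
    """Every write to an audited slot is followed by a log entry (cf. an :after method)."""

    AUDITED = ("name", "address", "plan")

    def __init__(self, store, oid):
        self.__dict__["_store"] = store  # bypass __setattr__ for bookkeeping fields
        self.__dict__["_oid"] = oid

    def __getattr__(self, slot):
        # Only reached for names not found normally, i.e. the audited slots.
        if slot not in self.AUDITED:
            raise AttributeError(slot)
        return self._store.read(self._oid, slot)

    def __setattr__(self, slot, value):
        if slot not in self.AUDITED:
            raise AttributeError(slot)
        old = self._store.read(self._oid, slot)
        self._store.write(self._oid, slot, value)          # the primary write
        self._store.record((self._oid, slot, old, value))  # "after" hook: log it
```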

You could also use :after methods on initialize-instance to catch  
object creation, etc.

I haven't thought this through fully, but that's one way to do it.
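For comparison, here is a minimal sketch of the fetch/diff/write alternative from the earlier reply, which is also close to the Rails workflow described at the bottom of the thread. It assumes plain (non-persistent) records, models the timestamp check with a hypothetical integer version counter (updated_at), and uses a dict as a stand-in for the store. PersonRecord, changed_slots, submit and StaleObjectError are illustrative names only. In a real system the fetch, compare and write would all have to run inside one transaction.

```python
from dataclasses import dataclass, fields, replace

@dataclass
class PersonRecord:
    first_name: str
    last_name: str
    address: str
    updated_at: int  # hypothetical version stamp used for the optimistic check

class StaleObjectError(Exception):
    """Raised when someone else changed the record after it was fetched."""

def changed_slots(before, after, ignore=("updated_at",)):
    """Return {slot: (old, new)} for every field whose value differs."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if f.name not in ignore and getattr(before, f.name) != getattr(after, f.name)
    }

def submit(db, key, original, edited, log):
    """Write only if unchanged since `original` was fetched; log only changed slots."""
    fresh = db[key]  # fetch a fresh copy
    if fresh.updated_at != original.updated_at:
        raise StaleObjectError(key)
    diff = changed_slots(fresh, edited)
    if not diff:
        return {}  # nothing changed: skip the write entirely
    db[key] = replace(edited, updated_at=fresh.updated_at + 1)
    log.append((key, diff))
    return diff
```

The introspection comes from dataclasses.fields; on the Lisp side, a similar loop would need to walk the class's slot definitions through the MOP.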

> Thanks,
> Waldo
>
> On Feb 29, 2008, at 3:36 PM, Ian Eslick wrote:
>
>> Hi Waldo,
>>
>> Why do you want to migrate to Elephant for production and not stick  
>> with something like CL-SQL or cl-perec on top of a relational  
>> database so you get all the facilities that you're familiar with?
>>
>> Also, please don't expect a query system anytime soon.  Finishing  
>> it is not in my critical path right now and no one else has stepped  
>> up and volunteered to lead or help with it.
>>
>> As for your query problem, I think the SQL solution for queries  
>> like that is likely to be faster in the end than putting this into  
>> Elephant.  Elephant is not intended or designed to support  
>> efficient relational operations.  That's what relational DBs are  
>> for!  :)
>>
>> Wait until the next big update to Elephant before you go too far
>> down this road; I'm hoping that some of the new features I'm planning
>> will at least make this a little bit easier.
>>
>> For your second question, if you are going to save/store the  
>> objects in bulk, you can just use standard classes.  Then you can  
>> have a transaction to fetch/diff/write the composite object to  
>> ensure atomicity of updates.  This diff would also produce your  
>> log.  However, that means that you lose the indexing capability of
>> persistent objects.
>>
>> Ian
>>
>> On Feb 29, 2008, at 2:47 PM, lists at infoway.net wrote:
>>
>>> Hi all,
>>>
>>> As I explore more and more things to do in Elephant
>>> and Lisp, I think we're ready to start migrating some of our RoR
>>> apps over, if only as an exercise for now; someday we'll migrate
>>> them to production.
>>>
>>> Since we all have a very strong and hard-headed background in
>>> MySQL and relational models, it's been extremely difficult for us
>>> to move away from that mentality and think in terms of objects. Some
>>> of Elephant's terminology, such as class indexes, confuses us into
>>> thinking that a class index lets us look at a set of objects the
>>> same way we would look at a MySQL table.
>>>
>>> I've read and seen in the src the early efforts to build a
>>> query system into Elephant. That would be great, and as our efforts
>>> approach that phase, we hope to contribute to it.
>>>
>>> So, in this email, I will first ask for advice on how best to
>>> represent the structure of our objects/classes and indices in
>>> Elephant so that we can ultimately query the data. Again,
>>> I'm not going to ask for the querying strategy (just yet), but
>>> ultimately we will need to be able to answer queries like the one
>>> below. Obviously I don't expect anyone to give me the full
>>> representation of this, but any advice/hints on how best to
>>> represent them will help greatly.
>>>
>>> We have a database with many related tables. For simplicity's
>>> sake, we'll describe a simplified scenario. We have a table
>>> with people information (e.g. first and last names, date of birth,
>>> and gender). We have a linked table with each person's addresses
>>> (multiple addresses in case they moved; each address is
>>> timestamped so the most recent address is the current address).
>>> Then, each person may be subscribed to one or more health
>>> insurance plans, so there is a table linking each person to
>>> one or more health insurance plans (and a table that defines the
>>> health insurance plans).
>>>
>>> Now, each person may select up to N preferred medical offices  
>>> where they would like to receive treatment. Again, there is a  
>>> table that links the person with one or more medical office.  
>>> Needless to say, there is a table of medical offices. Each medical  
>>> office is also linked to a timestamped address table, where the  
>>> most recent address is the current one (in the event the office  
>>> moves). To further expand on the issue, each office has one or  
>>> more doctors rendering services, so there is a table that links  
>>> the offices to the doctors, and of course, there is a table of  
>>> doctors that contains basic information, such as fname, lname, and  
>>> gender. Last, but not least, a doctor may be specialized in  
>>> multiple areas, so there is a table that links doctors to all the  
>>> specialties they have been certified on, and thus there is yet  
>>> another table that lists all possible specialties.
>>>
>>> Now, assuming I was able to explain the scenario correctly, we  
>>> then have users asking the system for information such as:
>>>
>>> "List all people (subscribers), who are male and live in zip code  
>>> 33012 who are contracted under Health Insurance Plan A that have  
>>> selected (as their preferred medical office) medical offices with  
>>> male cardiologists that work within 10 miles of 33012 zip code or  
>>> in MIAMI-DADE county and whose office names contain the sequence  
>>> of letters 'HEAL'"
>>>
>>> The way we see it, the concept of tables disappears and so do the  
>>> tables that provide many-to-many joins. So, we end up with some  
>>> classes such as "Person" which contains a reference to a list of  
>>> "Address" objects, and a list of preferred "Medical-Office"  
>>> objects, where each Medical-Office object has a list of Doctor  
>>> objects and each Doctor has a list of Specialty objects, etc, etc.
>>>
>>> Now, we assume that each of these classes will need to maintain
>>> multiple indices, such as the Person class being indexed on first
>>> name, last name, dob, and gender, among others, and the Address
>>> class indexed on zip code and county name, among others, and so on
>>> and so forth.
>>>
>>> The querying is one problem. The data representation is another.  
>>> We think it's clear that we should have, as an example, a Person  
>>> class. However, the representation of the links between a Person  
>>> and its Addresses or Medical-Offices is not 100% clear. If we  
>>> represent them as a slot in the Person class, where this slot
>>> would be a list or a set of references to the Address class, then
>>> in order for us to query on those, we would always need to
>>> fetch all objects in those slots in order to apply any search
>>> criteria, which seems like a bottleneck. If that were the solution,
>>> I assume we could implement logic such that Addresses are pushed
>>> onto the list, so that the most recent address is in the CAR; that
>>> way we wouldn't necessarily need to read the entire list of
>>> Addresses for each member, but just fetch the CAR of the slot.
>>>
>>> Now, onto the second question. One of the other requirements we  
>>> have is that we need to keep an audit log of data changes. The way  
>>> we do it in RoR is relatively simple. We fetch an object from the  
>>> DB and present it on the browser. When the user submits, we fetch  
>>> another fresh copy from the DB and if the timestamps are the same  
>>> (meaning no one else changed the record) we compare changes to the  
>>> object's attributes (slots). If there are any differences, we save  
>>> the changes (we're trying to avoid unnecessary trips to the DB)  
>>> and if the changes are saved successfully, we write a log of ONLY  
>>> the attributes that were changed (which is pretty trivial in Ruby).
>>>
>>> From what we've read in Elephant's manual, this seems harder  
>>> because we don't want to work directly on the Elephant object but on
>>> a memory copy while the user takes his/her time in the browser, and
>>> after submitting, we would take the changes and commit them to the
>>> Elephant object. This makes me think that we would need two classes
>>> for each object (one with and one without the persistent metaclass).
>>> The other problem would be how to "easily" have two objects  
>>> introspect themselves and spit out the slots that changed between  
>>> the two.
>>>
>>> Are we looking at this incorrectly? Any advice would be greatly
>>> appreciated.
>>>
>>> Thanks,
>>> Waldo
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel at common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>
>



