[elephant-devel] Representational Question

Fri Feb 29 19:47:41 UTC 2008

Hi all,

As I explore more of what can be done in Elephant and Lisp, I think
we're ready to start migrating some of our RoR apps over, if only as
an exercise at first, with the aim of someday moving them to
production.

Since we all have a very strong and hard-headed background in MySQL
and relational models, it's been extremely difficult for us to move
away from that mentality and think in terms of objects. Some of
Elephant's terminology, such as class indexes, confuses us into
thinking that a class index lets us look at a set of objects in much
the same way as a MySQL table.

I've read about, and seen in the src, the beginnings of an effort to
build a query system into Elephant. That would be great, and as our
own work approaches that phase, we hope to contribute to it.

So, in this email, I will first ask for advice on how best to
represent the structure of our objects/classes and indices in Elephant
so that we can ultimately query the data. Again, I'm not going
to ask about the querying strategy (just yet), but ultimately we will
need to be able to answer queries like the one quoted further down.
Obviously I don't expect anyone to give me the full representation,
but any advice or hints on how best to represent the data will help
greatly.

We have a database with many related tables. For simplicity, we'll
describe a reduced scenario. We have a table with people
information (e.g. first and last names, date of birth, and gender). We
have a linked table with each person's addresses (multiple addresses
in case they moved; each address is timestamped so the most recent
address is the current address). Then, each person may be subscribed
to one or more health insurance plans, so there is a table linking
each person to one or more health insurance plans (and a table that
defines the health insurance plans).

Now, each person may select up to N preferred medical offices where  
they would like to receive treatment. Again, there is a table that  
links the person with one or more medical offices. Needless to say,
there is a table of medical offices. Each medical office is also  
linked to a timestamped address table, where the most recent address  
is the current one (in the event the office moves). To further expand  
on the issue, each office has one or more doctors rendering services,  
so there is a table that links the offices to the doctors, and of  
course, there is a table of doctors that contains basic information,  
such as fname, lname, and gender. Last, but not least, a doctor may be  
specialized in multiple areas, so there is a table that links doctors  
to all the specialties they have been certified in, and thus there is
yet another table that lists all possible specialties.

Now, assuming I was able to explain the scenario correctly, we then  
have users asking the system for information such as:

"List all people (subscribers), who are male and live in zip code  
33012 who are contracted under Health Insurance Plan A that have  
selected (as their preferred medical office) medical offices with male  
cardiologists that work within 10 miles of zip code 33012 or in
MIAMI-DADE county and whose office names contain the sequence of
letters 'HEAL'"

The way we see it, the concept of tables disappears and so do the  
tables that provide many-to-many joins. So, we end up with some  
classes such as "Person" which contains a reference to a list of  
"Address" objects, and a list of preferred "Medical-Office" objects,  
where each Medical-Office object has a list of Doctor objects and each  
Doctor has a list of Specialty objects, etc, etc.
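
To make the shape we have in mind concrete, here is a rough sketch of
that object graph. It is written in Python purely for brevity and is
not Elephant code; the class and field names, and the sample values,
are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List


@dataclass
class Address:
    zip_code: str
    county: str
    recorded_at: datetime  # the latest timestamp marks the current address


@dataclass
class Specialty:
    name: str


@dataclass
class Doctor:
    first_name: str
    last_name: str
    gender: str
    specialties: List[Specialty] = field(default_factory=list)


@dataclass
class MedicalOffice:
    name: str
    addresses: List[Address] = field(default_factory=list)
    doctors: List[Doctor] = field(default_factory=list)


@dataclass
class Person:
    first_name: str
    last_name: str
    dob: date
    gender: str
    addresses: List[Address] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)  # plan names, simplified
    preferred_offices: List[MedicalOffice] = field(default_factory=list)


# The join tables are gone: every relationship is a plain reference.
cardiology = Specialty("Cardiology")
doctor = Doctor("John", "Doe", "M", [cardiology])
office = MedicalOffice(
    "Healthy Heart",
    [Address("33012", "MIAMI-DADE", datetime(2008, 1, 1))],
    [doctor],
)
person = Person(
    "Jane", "Roe", date(1970, 5, 17), "F",
    [Address("33012", "MIAMI-DADE", datetime(2007, 6, 1))],
    ["Plan A"],
    [office],
)
```

Answering a question about a person's doctors then means walking
references (person, office, doctor, specialty) rather than joining
tables.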

Now, we assume that each of these classes will need to maintain
multiple indices: the Person class indexed on first name, last name,
dob and gender, among others; the Address class indexed on zip code
and county name, among others; and so on.
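
As a way of checking our understanding of what such an index buys us,
here is a toy in-memory sketch, again in Python and not using
Elephant's API. ClassIndex is a hypothetical stand-in that maps a slot
value to the ids of the objects holding it, and the sample records are
invented.

```python
from collections import defaultdict


class ClassIndex:
    """Hypothetical stand-in for an index on one slot of one class."""

    def __init__(self, slot):
        self.slot = slot
        self._by_value = defaultdict(set)

    def add(self, obj_id, obj):
        # Record the object's id under the value it holds in the indexed slot
        self._by_value[obj[self.slot]].add(obj_id)

    def lookup(self, value):
        # Return a copy so callers cannot mutate the index by accident
        return set(self._by_value.get(value, ()))


people = {
    1: {"last_name": "Roe", "gender": "M"},
    2: {"last_name": "Doe", "gender": "F"},
    3: {"last_name": "Roe", "gender": "F"},
}

by_gender = ClassIndex("gender")
by_last_name = ClassIndex("last_name")
for pid, p in people.items():
    by_gender.add(pid, p)
    by_last_name.add(pid, p)

# Intersect two index lookups instead of scanning every Person
matches = by_gender.lookup("F") & by_last_name.lookup("Roe")
```

The point of the sketch is only that each index answers one criterion
cheaply and that combining criteria becomes set intersection.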

The querying is one problem. The data representation is another. We  
think it's clear that we should have, as an example, a Person class.  
However, the representation of the links between a Person and its  
Addresses or Medical-Offices is not 100% clear. If we represent them
as a slot in the Person class, holding a list or a set of references
to Address objects, then querying on those means we always need to
fetch all the objects in those slots before we can apply any search
criteria, which seems like a bottleneck. If that were the solution, I
assume we could implement logic such that Addresses are pushed onto
the list, so that the most recent address is in the CAR; we wouldn't
necessarily need to read the entire list of Addresses for each member,
but could just fetch the CAR of the slot.
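
The newest-first idea, sketched in Python rather than Lisp (the names
are hypothetical, and it assumes addresses are always recorded in
chronological order):

```python
from datetime import datetime


class Person:
    def __init__(self, name):
        self.name = name
        self.addresses = []  # newest first, like pushing onto a Lisp list

    def move_to(self, zip_code, when):
        # Insert at the head so the current address is always element 0
        self.addresses.insert(0, (when, zip_code))

    @property
    def current_zip(self):
        # Only the head of the list is read, never the full history
        return self.addresses[0][1] if self.addresses else None


p = Person("Jane Roe")
p.move_to("33010", datetime(2006, 3, 1))
p.move_to("33012", datetime(2008, 1, 15))
```

This only helps for "current address" lookups; a search over past
addresses would still have to read the whole list.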

Now, onto the second question. One of the other requirements we have  
is that we need to keep an audit log of data changes. The way we do it  
in RoR is relatively simple. We fetch an object from the DB and  
present it on the browser. When the user submits, we fetch another  
fresh copy from the DB and if the timestamps are the same (meaning no  
one else changed the record) we compare changes to the object's  
attributes (slots). If there are any differences, we save the changes  
(we're trying to avoid unnecessary trips to the DB) and if the changes  
are saved successfully, we write a log of ONLY the attributes that  
were changed (which is pretty trivial in Ruby).
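
Our actual code is Ruby, but the flow is roughly the following,
sketched here in Python with plain dicts standing in for records. The
function name, the "updated_at" field and the in-memory db are all
invented for illustration.

```python
def save_with_audit(db, record_id, shown, submitted, audit_log, now):
    """shown: the copy sent to the browser; submitted: the user's values."""
    fresh = db[record_id]  # fetch a fresh copy from the "DB"
    if fresh["updated_at"] != shown["updated_at"]:
        raise RuntimeError("record was changed by someone else")

    # Keep only the attributes whose submitted value differs
    changes = {
        key: (fresh.get(key), value)
        for key, value in submitted.items()
        if key != "updated_at" and fresh.get(key) != value
    }
    if not changes:
        return changes  # nothing changed, so skip the write entirely

    updated = dict(fresh)
    for key, (_, new_value) in changes.items():
        updated[key] = new_value
    updated["updated_at"] = now
    db[record_id] = updated

    # Log ONLY the attributes that changed, as (old, new) pairs
    audit_log.append((record_id, now, changes))
    return changes


db = {1: {"first_name": "Jane", "gender": "F", "updated_at": 1}}
log = []
shown = dict(db[1])
result = save_with_audit(
    db, 1, shown, {"first_name": "Janet", "gender": "F"}, log, now=2
)
```

A second submit that still carries the old timestamp is rejected,
which is how we detect that someone else changed the record in the
meantime.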

From what we've read in Elephant's manual, this seems harder because
we don't want to work directly on the Elephant object but on an
in-memory copy while the user takes his/her time in the browser; after
the submit, we would take the changes and commit them to the Elephant
object. That makes me think we would need two classes for each object
(one with and one without the persistent metaclass). The other problem
would be how to "easily" have two objects introspect themselves and
spit out the slots that changed between the two.
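
What we are after would look something like this sketch, written in
Python since that is quicker for us to type than Lisp. It does not use
Elephant at all; changed_slots is a hypothetical helper, and how to do
the equivalent slot introspection on persistent classes is exactly
what we are unsure about.

```python
import copy


class Person:
    def __init__(self, first_name, last_name, gender):
        self.first_name = first_name
        self.last_name = last_name
        self.gender = gender


def changed_slots(original, edited):
    """Compare two instances slot by slot using introspection (vars)."""
    old, new = vars(original), vars(edited)
    return {
        name: (old[name], new[name])
        for name in old
        if old[name] != new[name]
    }


stored = Person("Jane", "Roe", "F")  # stands in for the persistent object
working = copy.copy(stored)          # detached copy the user edits
working.last_name = "Doe"

diff = changed_slots(stored, working)
for name, (_, new_value) in diff.items():
    setattr(stored, name, new_value)  # commit only the slots that changed
```

Here a single class serves both roles because copy.copy gives a
detached instance; whether something similar is possible without a
second, non-persistent class is part of the question.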

Are we looking at this incorrectly? Any advice would be greatly
appreciated.

Thanks,
Waldo