[elephant-devel] Representational Question

Thu Mar 6 15:10:50 UTC 2008

I agree with Robert.  The best way to start is to use lisp as a query  
language and essential do a search/match over the object graph.

The rub comes when you start looking at performance.  A linear scan of  
alot of class instances can become prohibitive, even for small set  
sizes, especially since you are randomly accessing the elements of  
related class instances.  This is in large part what class indexing is  
for, to reduce those scans to a reasonable range of instances and  
avoid too many random accesses of disk btree pages.

One of the big things that SQL engines exploit are the way relations  
are stored on disk.  A relational query using foreign key references  
from one table (class) to another allows SQL engines to filter rows of  
table A using a list of rows of table B via join tables.

(select a
  :where (= (fkey a) (oid b))
  :where (and (< 2/5/2004 b.date) (< b.date 2/5/2006))).

A simple lisp implementation is forced to be O(N) where N is the  
number of instances of A - we simply scan instances of A and test  
(date (fkey a)).  With an index on (fkey a) and (date b) we can make  
this into O( n + m log N + m log m ) I think.  We do a range scan over  
date values from instances of B -- O(m log m) -- and sort them.   Then  
using the ordered index of B oids -- the index for (fkey a) -- we can  
simply skip along finding all instances of A that have one of the  
matching instances of B.  This is m doctors * log n accesses.

This is necessarily rough since the BTree branching factor + key  
scanning has an impact as does access patterns influcing memory ops  
vs. # of disk seeks, memory, simple caching and streaming cache effects.

Even for small set sizes (ignoring constant factors), the complex  
approach is helpful.

A = 1000
B = 100
Selecting 10 A's using 3 B's

simple cost = 1000
complex cost = 19

A = 1000000 (person->doctor)
B = 200000 (doctor->degree date)
Selecting 10000 A's using 100 B's

simple cost = 1000000
complex cost = 10000 + 300 + 200 = 10600

My #'s may be off here, but you get the general idea.  I'm hoping that  
the stuff we're writing now makes it reasonably easy for users to  
perform the optimizations mentioned above manually.  As we get this to  
work, we may be able to get a query interpreter to do the above select  
query for us.  There are further optimizations, such as deciding what  
linear scans, index scans and join ordering will best optimize a query  
(i.e. a query planner) - but one step at a time!

Ian

Ian

On Mar 6, 2008, at 9:13 AM, Robert L. Read wrote:

> On Wed, 2008-03-05 at 21:50 -0800, Paul Legato wrote:
>> I think the thing is not that we need professional consulting, but we
>> are trying to become the professional consultants. :) Could you more
>> knowledgable ODBMS coders recommend some good books or papers on the
>> topic of ODBMS care and feeding that we should read to get a better
>> handle on all this?
>>
>> Cheers,
>> Paul
>
> Thanks for the input, Paul.
>
> I'm afraid I can't recommend a book on ODBMS, but for me a starting
> point is this:
>
> http://www.ibm.com/developerworks/library/wa-objprev/
>
> I have personally always used Elephant in "prevalence" style where  
> (Ian
> doesn't do this, for example, because his database is too big.)
>
> I personally think this mad would be clarified if someone (like me or
> this community) would write a short guide to "doing SQL like things"
> with LISP analogs.  This would have the advantage of showing how  
> simple
> it is to do the SQL operations in LISP.
>
> To me, the fact that Elepahnt allows you to use LISP as your query
> language rather than creating yet another mismatch between the
> programming language and the query language is a major advantage.
>
> I will make a 30-second beginning for such a document:
>
> *) We have a way to map over every instance of a class.  A simple  
> filter
> performs the equivalent of a "select" in SQL, but with much wider
> computational power.
>
> *)  The "aggregate" operations in SQL (SUM, MAX, MIN, AVG) can be  
> easily
> expressed in terms of the the "reduce" operation in LISP.
>
> *)  Any sort of "graph" problem (topological sort, find-all-children,
> find-all-descendants, etc.) is easier to program in LISP than in SQL.
>
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel