[elephant-devel] Representational Question
Ian Eslick
eslick at media.mit.edu
Thu Mar 6 15:10:50 UTC 2008
I agree with Robert. The best way to start is to use lisp as a query
language and essential do a search/match over the object graph.
The rub comes when you start looking at performance. A linear scan of
alot of class instances can become prohibitive, even for small set
sizes, especially since you are randomly accessing the elements of
related class instances. This is in large part what class indexing is
for, to reduce those scans to a reasonable range of instances and
avoid too many random accesses of disk btree pages.
One of the big things that SQL engines exploit are the way relations
are stored on disk. A relational query using foreign key references
from one table (class) to another allows SQL engines to filter rows of
table A using a list of rows of table B via join tables.
(select a
:where (= (fkey a) (oid b))
:where (and (< 2/5/2004 b.date) (< b.date 2/5/2006))).
A simple lisp implementation is forced to be O(N) where N is the
number of instances of A - we simply scan instances of A and test
(date (fkey a)). With an index on (fkey a) and (date b) we can make
this into O( n + m log N + m log m ) I think. We do a range scan over
date values from instances of B -- O(m log m) -- and sort them. Then
using the ordered index of B oids -- the index for (fkey a) -- we can
simply skip along finding all instances of A that have one of the
matching instances of B. This is m doctors * log n accesses.
This is necessarily rough since the BTree branching factor + key
scanning has an impact as does access patterns influcing memory ops
vs. # of disk seeks, memory, simple caching and streaming cache effects.
Even for small set sizes (ignoring constant factors), the complex
approach is helpful.
A = 1000
B = 100
Selecting 10 A's using 3 B's
simple cost = 1000
complex cost = 19
A = 1000000 (person->doctor)
B = 200000 (doctor->degree date)
Selecting 10000 A's using 100 B's
simple cost = 1000000
complex cost = 10000 + 300 + 200 = 10600
My #'s may be off here, but you get the general idea. I'm hoping that
the stuff we're writing now makes it reasonably easy for users to
perform the optimizations mentioned above manually. As we get this to
work, we may be able to get a query interpreter to do the above select
query for us. There are further optimizations, such as deciding what
linear scans, index scans and join ordering will best optimize a query
(i.e. a query planner) - but one step at a time!
Ian
Ian
On Mar 6, 2008, at 9:13 AM, Robert L. Read wrote:
> On Wed, 2008-03-05 at 21:50 -0800, Paul Legato wrote:
>> I think the thing is not that we need professional consulting, but we
>> are trying to become the professional consultants. :) Could you more
>> knowledgable ODBMS coders recommend some good books or papers on the
>> topic of ODBMS care and feeding that we should read to get a better
>> handle on all this?
>>
>> Cheers,
>> Paul
>
> Thanks for the input, Paul.
>
> I'm afraid I can't recommend a book on ODBMS, but for me a starting
> point is this:
>
> http://www.ibm.com/developerworks/library/wa-objprev/
>
> I have personally always used Elephant in "prevalence" style where
> (Ian
> doesn't do this, for example, because his database is too big.)
>
> I personally think this mad would be clarified if someone (like me or
> this community) would write a short guide to "doing SQL like things"
> with LISP analogs. This would have the advantage of showing how
> simple
> it is to do the SQL operations in LISP.
>
> To me, the fact that Elepahnt allows you to use LISP as your query
> language rather than creating yet another mismatch between the
> programming language and the query language is a major advantage.
>
> I will make a 30-second beginning for such a document:
>
> *) We have a way to map over every instance of a class. A simple
> filter
> performs the equivalent of a "select" in SQL, but with much wider
> computational power.
>
> *) The "aggregate" operations in SQL (SUM, MAX, MIN, AVG) can be
> easily
> expressed in terms of the the "reduce" operation in LISP.
>
> *) Any sort of "graph" problem (topological sort, find-all-children,
> find-all-descendants, etc.) is easier to program in LISP than in SQL.
>
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
More information about the elephant-devel
mailing list