[elephant-devel] Query performance

Wed Feb 11 03:36:29 UTC 2009

On Tue, Feb 10, 2009 at 6:33 PM, Ian Eslick <eslick at media.mit.edu> wrote:

> For that particular query there is a more efficient way, but it's not
> natively supported in any of the high level APIs.  We're not (yet)
> trying to reproduce all the queries you are accustomed to in SQL
> databases.  The goal of the query system is to move in that direction
> when it makes sense.  For example we don't have an efficient COUNT.

I can live with that for the time being. I just need to learn how to run
these queries efficiently, even if that means calling the low-level API.
Once the query system is ready we may no longer need those calls, or we may
still need them in some cases.

> If you need to do lots of table ops, better to use a SQL database!
> The goal of an OODB is really to support queries based on the link-
> structure of the objects and not based on treating their slots like
> ORM columns.
>

Agreed. Maybe I could create a class STATE with a slot holding all the
"offices" in each state and access them through that. Would that be more
efficient? I suppose that for any "relational" slot, I could "link" it to a
new persistent class and then just follow the links. I just have a feeling
that it may not necessarily be as efficient.

Regardless, I think the query system will be a big plus here. Even after
all the links are defined, there will always be some query that requires
some type of table op.

> That said you can use the underlying dup-btrees directly when you need
> to...but it turns out I wrote one for my own use that you can look at
> as an example:
>
> (ele::get-unique-values (find-inverted-index 'provider 'state))
>
> I can probably export some of the macros I use to make using cursors
> more convenient, but I'd rather put the limited time I have into the
> query system which we can extend to do stuff like this...
>
> I'm curious what performance you get on the get-unique-values call...

Ask and you shall receive :) The performance difference is tremendous, and
it is more efficient in terms of consing as well:

(time (ele::get-unique-values (find-inverted-index 'provider 'state)))
Evaluation took:
  0.349 seconds of real time
  0.014830 seconds of total run time (0.010738 user, 0.004092 system)
  4.30% CPU
  27 lambdas converted
  724,402,627 processor cycles
  489,624 bytes consed

("AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "GA" "HI" "IA" "ID" "IL" "IN" "KS"
 "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC" "ND" "NE" "NH" "NJ"
 "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VA" "VT"
 "WA" "WI" "WV" "WY")

I will look at your get-unique-values and learn from it. I'm sure the same
approach can be applied to other queries of this type.
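
To illustrate the general idea behind collecting unique values from an
index with duplicate keys, here is a minimal sketch in plain Common Lisp.
It is not Elephant's get-unique-values and calls nothing from Elephant: a
sorted vector of (key . value) pairs stands in for the index, and the names
unique-keys and *provider-by-state*, along with the sample data, are
hypothetical. Because the keys arrive in sorted order, each key only needs
to be compared with the previous one, so the only consing is one cell per
distinct key.

```lisp
;; Toy model only: a sorted vector of (key . value) pairs stands in for a
;; duplicate-key index. Nothing here calls Elephant.
(defun unique-keys (sorted-pairs &key (test #'equal))
  "Return the distinct keys of SORTED-PAIRS, a vector of (key . value)
conses sorted by key. Each entry is visited once and one cons cell is
allocated per distinct key."
  (let ((result '())
        (seen-any nil)
        (previous nil))
    (loop for pair across sorted-pairs
          for key = (car pair)
          ;; Sorted input means duplicates are adjacent, so comparing
          ;; against the previous key is enough to detect them.
          do (unless (and seen-any (funcall test key previous))
               (push key result)
               (setf previous key
                     seen-any t)))
    (nreverse result)))

;; Hypothetical sample data: a state code paired with a provider id.
(defparameter *provider-by-state*
  (vector '("AL" . 101) '("AL" . 102) '("AZ" . 201)
          '("CA" . 301) '("CA" . 302) '("CA" . 303)))

(unique-keys *provider-by-state*)   ; => ("AL" "AZ" "CA")
```

A real index cursor could presumably jump straight to the next distinct key
instead of stepping over every duplicate, which this sketch does not model.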

Thanks again
JD