[elephant-devel] Re: traversing btree using multiple indices

Fri Apr 4 12:04:47 UTC 2008

 ??>>  i believe that SQL RDBMS work this way too -- if one needs fast
 ??>> retrieval by several keys, he should create index on them. RDMBS knows
 ??>> how to sort tuples, though

 SR> Well yes, but generally SQL RDBMS's will make efficient use of indexes
 SR> it they are created on all the keys without requiring a combined
 SR> index.

sure it's more optimized, but algorithmically it has same worst case as 
elephant has -- it has to pick _all_ elements from one of indices and check 
if they satisfy another condition.
thus, if you have 10000 events with name = "Sean" and 10000 events for given 
date range, it will _have_ to read all 10000 events and verify them.

for example, in postgresql query:

 explain select count(*) from tree1771 where qi = 5 and (value > 15000) and 
(value < 30000);

plan with combined index:

 Aggregate  (cost=29.93..29.94 rows=1 width=0)
   ->  Index Scan using tree1771_idx on tree1771  (cost=0.00..29.91 rows=9 
width=0)
         Index Cond: ((qi = 5) AND (value > 15000) AND (value < 30000))

and a plan with individual ones:

Aggregate  (cost=42.67..42.68 rows=1 width=0)
   ->  Index Scan using tree1771_qi_idx on tree1771  (cost=0.00..42.65 
rows=9 width=0)
         Index Cond: (qi = 5)
         Filter: ((value > 15000) AND (value < 30000))

so it gets all tuples from index and filters them -- we can do this with 
elephant too.
or, another example how postgresql joins two relations (that's more like 
situation we have, because we have to use btrees of pairs rather than 
arbitrary tuples):

 Aggregate  (cost=581.32..581.33 rows=1 width=0) (actual time=6.074..6.075 
rows=1 loops=1)
   ->  Hash Join  (cost=298.64..581.22 rows=38 width=0) (actual 
time=3.965..5.994 rows=102 loops=1)
         Hash Cond: ("outer".value = "inner".value)
         ->  Bitmap Heap Scan on tree1770  (cost=18.16..293.56 rows=1360 
width=8) (actual time=0.405..1.575 rows=1252 loops=1)
               Recheck Cond: ((qi > 8) AND (qi < 10))
               ->  Bitmap Index Scan on tree1770_idx  (cost=0.00..18.16 
rows=1360 width=0) (actual time=0.366..0.366 rows=1252 loops=1)
                     Index Cond: ((qi > 8) AND (qi < 10))
         ->  Hash  (cost=277.71..277.71 rows=1107 width=8) (actual 
time=3.429..3.429 rows=1145 loops=1)
               ->  Bitmap Heap Scan on tree1771  (cost=8.87..277.71 
rows=1107 width=8) (actual time=0.244..2.112 rows=1145 loops=1)
                     Recheck Cond: (qi = 5)
                     ->  Bitmap Index Scan on tree1771_qi_idx 
(cost=0.00..8.87 rows=1107 width=0) (actual time=0.208..0.208 rows=1145 
loops=1)
                           Index Cond: (qi = 5)
 Total runtime: 6.145 ms
(13 rows)

it gets retrieves values from both relations (1252 from one and 1145 from 
another) and finds there intersection via "hash join".
it's also possible to do this in elephant on user level -- it might be more 
efficient than doing btree lookup to check each row individually.