[elephant-devel] Re: traversing btree using multiple indices
Alex Mizrahi
killerstorm at newmail.ru
Fri Apr 4 12:04:47 UTC 2008
??>> i believe that SQL RDBMS work this way too -- if one needs fast
??>> retrieval by several keys, he should create index on them. RDMBS knows
??>> how to sort tuples, though
SR> Well yes, but generally SQL RDBMS's will make efficient use of indexes
SR> it they are created on all the keys without requiring a combined
SR> index.
sure it's more optimized, but algorithmically it has same worst case as
elephant has -- it has to pick _all_ elements from one of indices and check
if they satisfy another condition.
thus, if you have 10000 events with name = "Sean" and 10000 events for given
date range, it will _have_ to read all 10000 events and verify them.
for example, in postgresql query:
explain select count(*) from tree1771 where qi = 5 and (value > 15000) and
(value < 30000);
plan with combined index:
Aggregate (cost=29.93..29.94 rows=1 width=0)
-> Index Scan using tree1771_idx on tree1771 (cost=0.00..29.91 rows=9
width=0)
Index Cond: ((qi = 5) AND (value > 15000) AND (value < 30000))
and a plan with individual ones:
Aggregate (cost=42.67..42.68 rows=1 width=0)
-> Index Scan using tree1771_qi_idx on tree1771 (cost=0.00..42.65
rows=9 width=0)
Index Cond: (qi = 5)
Filter: ((value > 15000) AND (value < 30000))
so it gets all tuples from index and filters them -- we can do this with
elephant too.
or, another example how postgresql joins two relations (that's more like
situation we have, because we have to use btrees of pairs rather than
arbitrary tuples):
Aggregate (cost=581.32..581.33 rows=1 width=0) (actual time=6.074..6.075
rows=1 loops=1)
-> Hash Join (cost=298.64..581.22 rows=38 width=0) (actual
time=3.965..5.994 rows=102 loops=1)
Hash Cond: ("outer".value = "inner".value)
-> Bitmap Heap Scan on tree1770 (cost=18.16..293.56 rows=1360
width=8) (actual time=0.405..1.575 rows=1252 loops=1)
Recheck Cond: ((qi > 8) AND (qi < 10))
-> Bitmap Index Scan on tree1770_idx (cost=0.00..18.16
rows=1360 width=0) (actual time=0.366..0.366 rows=1252 loops=1)
Index Cond: ((qi > 8) AND (qi < 10))
-> Hash (cost=277.71..277.71 rows=1107 width=8) (actual
time=3.429..3.429 rows=1145 loops=1)
-> Bitmap Heap Scan on tree1771 (cost=8.87..277.71
rows=1107 width=8) (actual time=0.244..2.112 rows=1145 loops=1)
Recheck Cond: (qi = 5)
-> Bitmap Index Scan on tree1771_qi_idx
(cost=0.00..8.87 rows=1107 width=0) (actual time=0.208..0.208 rows=1145
loops=1)
Index Cond: (qi = 5)
Total runtime: 6.145 ms
(13 rows)
it gets retrieves values from both relations (1252 from one and 1145 from
another) and finds there intersection via "hash join".
it's also possible to do this in elephant on user level -- it might be more
efficient than doing btree lookup to check each row individually.
More information about the elephant-devel
mailing list