[elephant-devel] Re: db-postmodern: performance considerations

Sun Dec 9 02:15:33 UTC 2007

 IE> 2) In retrospect it was a mistake to expose the cursor API to the
 IE> users, but Elephant started as a low-level interface to BDB with some
 IE> minimal support for persistent objects and has grown from there so at
 IE> the time it made sense.  I recommend we consider deprecating the
 IE> cursor API for user use and find the use cases that people care about
 IE> and implement a higher level API to them (abstract datastructures, a
 IE> query language, etc).  Then we drop support for the cursor API (but
 IE> document what works for those who want to shoot themselves in the
 IE> foot).

i know at least one usage pattern that can be done via cursor API, but it's 
unlikely that map can support this even if we improve it.
say we need 10 latest items. we can get them via map, passing it a tricky 
function that will abort enumeration as soon as it will collect 10 items.
but what if need 10 latest items of different types (coming from different 
indices) -- say apples and oranges? with cursors it's straightforward to do 
this, but with map it's quite tricky if doable at all

 IE> The map API is intended to provide a slightly higher level interface
 IE> to cursors so that you can imagine a traversal of a large set, only
 IE> the current element of which 'must' be in memory so you can reasonably
 IE> expect GC's to occur while doing a single map operation.

 IE> What prospects are there for having a reasonable map implementation in
 IE> postmodern?

to implement map in "reasonable" way (so it will be interruptible and will 
not demand all dataset being retrieved to fit in memory) we need kind of 
cursors, so it's simplier just to implement good cursors, and we already 
have a map implementation :).

the problem with current db-postmodern cursor implementation is that it 
fetches data via SQL cursors, and assumtion that they work fast was not 
correct.

i have an idea how to do fetching efficiently: we have current key/value in 
cursor, and when we need next (for example) we execute such query:

SELECT key, value FROM btree11 WHERE (key > our_key) ORDER BY key LIMIT 1 ; 
for duplicate-less btrees
SELECT key, value FROM btree11 WHERE (key > our_key) OR ((key = our_key) AND 
(value > our_value)) ORDER BY key, value LIMIT 1 ; for btree-index

i've tested queries like this, seem to work pretty fast in PostgreSQL.
of course we can fetch next 10 into cache, for example, to minimize 
communication overhead.
and this seems to fit well with the rest of implementation, so only code 
that is fetching data from backend needs to be changed.
well, i think i'll just try to implement this..

 IE> Thanks for all the hard work Alex!

actually hard work was done by Henrik, so all thanks for postmodern backend 
should go to him. i've just polished it a little..

 IE> PS - Has anyone validated my new default map-index under postmodern,
 IE> or is that moot now that you have a specialized version?

only value-set-p case is specialized, other cases use generic 
implementation.
if there are tests for this (grep finds some), it passes them.