[elephant-devel] Re: db-postmodern: performance considerations
Alex Mizrahi
killerstorm at newmail.ru
Sun Dec 9 02:15:33 UTC 2007
IE> 2) In retrospect it was a mistake to expose the cursor API to the
IE> users, but Elephant started as a low-level interface to BDB with some
IE> minimal support for persistent objects and has grown from there so at
IE> the time it made sense. I recommend we consider deprecating the
IE> cursor API for user use and find the use cases that people care about
IE> and implement a higher level API to them (abstract datastructures, a
IE> query language, etc). Then we drop support for the cursor API (but
IE> document what works for those who want to shoot themselves in the
IE> foot).
i know at least one usage pattern that can be done via cursor API, but it's
unlikely that map can support this even if we improve it.
say we need 10 latest items. we can get them via map, passing it a tricky
function that will abort enumeration as soon as it will collect 10 items.
but what if need 10 latest items of different types (coming from different
indices) -- say apples and oranges? with cursors it's straightforward to do
this, but with map it's quite tricky if doable at all
IE> The map API is intended to provide a slightly higher level interface
IE> to cursors so that you can imagine a traversal of a large set, only
IE> the current element of which 'must' be in memory so you can reasonably
IE> expect GC's to occur while doing a single map operation.
IE> What prospects are there for having a reasonable map implementation in
IE> postmodern?
to implement map in "reasonable" way (so it will be interruptible and will
not demand all dataset being retrieved to fit in memory) we need kind of
cursors, so it's simplier just to implement good cursors, and we already
have a map implementation :).
the problem with current db-postmodern cursor implementation is that it
fetches data via SQL cursors, and assumtion that they work fast was not
correct.
i have an idea how to do fetching efficiently: we have current key/value in
cursor, and when we need next (for example) we execute such query:
SELECT key, value FROM btree11 WHERE (key > our_key) ORDER BY key LIMIT 1 ;
for duplicate-less btrees
SELECT key, value FROM btree11 WHERE (key > our_key) OR ((key = our_key) AND
(value > our_value)) ORDER BY key, value LIMIT 1 ; for btree-index
i've tested queries like this, seem to work pretty fast in PostgreSQL.
of course we can fetch next 10 into cache, for example, to minimize
communication overhead.
and this seems to fit well with the rest of implementation, so only code
that is fetching data from backend needs to be changed.
well, i think i'll just try to implement this..
IE> Thanks for all the hard work Alex!
actually hard work was done by Henrik, so all thanks for postmodern backend
should go to him. i've just polished it a little..
IE> PS - Has anyone validated my new default map-index under postmodern,
IE> or is that moot now that you have a specialized version?
only value-set-p case is specialized, other cases use generic
implementation.
if there are tests for this (grep finds some), it passes them.
More information about the elephant-devel
mailing list