[elephant-devel] map-index

Fri Nov 30 15:54:20 UTC 2007

Yeah, map-index is complex but that's due to problems with cursors on  
secondary btrees in Elephant.  I'd like to rewrite it but didn't find  
a good strategy before.  See my notes below.

I have one idea I'll look into for the standard case of walking a  
simple range.

map-index is generic, you are welcome to override it with a postmodern  
specific version rather than stressing about making the cursor APIs  
work such that they make map-index work.

On Nov 29, 2007, at 1:40 PM, Alex Mizrahi wrote:

> hello
>
> i'm now trying to improve implementation of cursor in postmodern  
> backend.
> it would be nice to optimize it for usage patterns of map-index, but
> map-index implementation frightens me: is there a reason for such
> complexity?
>
> for example, just to enumarate index with two values it does calls  
> like
> this:
>
> (cursor-pfirst c) => 1
> (cursor-pnext-dup c) => nil
> (cursor-pset c 1)
> (cursor-pnext-nodup c) => 2
> (cursor-pnext-dup c) => nil
> (cursor-pset c 2)
> (cursor-pnext-nodup c) => nil
>
> while in my opinion it would be enough to do this:
>
> (cursor-pfirst c) => 1
> (cursor-pnext c) => 2
> (cursor-pnext c) => nil

What if there are more than two values = 2?  This will only get the  
first one.  In this case, you need to get all duplicates for each value.

The pset is used because pnext-dup = nil causes the cursor to have an  
invalid location (Elephant design problem) so we use pset to get back  
into the current duplicate set so we can do a pnext-no-dup on a valid  
cursor.

If your pnext-dup can return nil and still point at the last element  
of the duplicate set, then you could remove that call in your version  
of map-index.

> even if cursor is aware of such behaviour and caches cursor-pset,  
> it's still
> a waste of CPU cycles to do 3 times more calls and demand cursor to do
> dup/nodup checks.

Althoughthese calls are probably not a big deal because the pages are  
cached in memory and page fetch time swamps a few function calls so  
it's in the noise for overall performance.  For really large DBs there  
may be issues keeping all the higher pages in memory to support pset  
(since pnext only fetches adjacent pages).  I'll think about this a  
bit...

I think this is basically what happens since you don't need to do a  
pset when pnext-dup returns nil.

> next, if we have query by value, i think optimal sequence would be:
>
> (cursor-pset c value) => 1
> (cursor-pnext-dup c) => 2
> (cursor-pnext-dup c) => 3
> (cursor-pnext-dup c) => nil
> ...
>
> that is, if cursor has enough logic to enumerate duplicates  (all  
> values for
> given key), why not use it?
> however when cursor-pnext-dup returns NIL, that means there's no  
> more values
> with such key, elephant does once more (cursor-pset c value), and  
> tries
> (cursor-pnext-nodup c), and then uses lisp-compare<= on returned  
> value.
>
> this was actually a reason for error Henrik have reported.
> db-postmodern's ordering of keys differs from lisp-compare ordering  
> (and in
> some cases it's just random). normally that should break only range  
> queries.
> but with aforementioned (overly-general?) map-index logic it breaks  
> even
> simple get-instances-by-value calls, which is not good.
>
> so, am i missing something, or really it would be better to refactor
> map-index?
>
> if it would be calling only what's needed, but won't do cursor-pset  
> "for
> generality" i can skip cursor-pset optimization -- i don't think  
> anybody in
> real code working with cursors will repeatadly call cursor-pset  
> without a
> reason.
>
> with best regards, Alex 'killer_storm' Mizrahi.
>
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel