[elephant-devel] Secondary DB behavior

Sun Jan 29 00:31:51 UTC 2006

I looked into the code further and the current bdb-controller is
manually maintaining the association between the btree database and the
indices secondary database.  (It's opened with
sleepycat::db-fake-associate).  This means that lisp code is determining
what values are written into the secondary indices.  sleepycat still
maintains the association for reads against the indices-assoc db so you
get the primary btree key/value pair on reads to a secondary index and
proper cursor read behavior, but it doesn't take care of cleaning out
prior references to the primary key in the updated secondary index. 

(I assume this change was to avoid allowing people to write arbitrary
lisp code that was called via a callback from the berkeley code?)

What we really want is an inverted index where the value is guaranteed
to be unique, but not the keys so we can rapidly index to the primary
value via the non-unique keys.  I couldn't find any place in the
sleepycat docs where it clarified this.  Without having the ability to
lookup entries by the value (inverted index) I don't know if this is
easily possible or not...secondary indices would have to be dual
structures, one containing the key-ordered btree and another, perhaps a
hash, that maps primary keys onto btree entries.

I worked around this for now by just removing the primary key-value pair
and then rewriting it after updating with the new slot values so that
all the secondary indices delete their references based on the prior
value and then update with the new value, but this is an expensive
solution in the long term.  I can do better writing a custom version of
(setf get-value) for my class indices, but fixing the underlying problem
seems like a better solution.

I'm not sure exactly how to integrate with the RT code yet, but while
I'm figuring that out for my index tests here is a test case.

(defun mul-key (skey pkey val)
    (if (numberp val)
        (values t (* val 2))
        (values nil)))

(defun test-secondary-index ()
    (let* ((primary (make-indexed-btree))
             (secondary (add-index primary :index-name 'test
                                                          :key-form 'mul-key
                                                          :populate nil)))
        (setf (get-value 'key primary) 2)
        (assert (eq (get-value 4 secondary) 2))  ;; this should work
        (setf (get-value 'key primary) 4)
        (assert (eq (get-value 8 secondary) 4))  ;; good so far, this
should also work
        (assert (eq (get-value 4 secondary) nil)) ;; ** bug **  this
assert should succeed, but doesn't
         ))

I reproduced this phenomenon on the current HEAD (0.5.0 release
candidate). 

This is only an issue when you update a secondary index based on a
varying value in the primary index value.  For example, if I compute a
secondary index key based on a slot value of an object in the primary
index, then I overwrite that value, I get the duplicated key-pkey pairs
in the secondary index.  I think this is a feature because the current
code clearly has no provision for this behavior.  However it's a feature
I think we'd like to think about carefully and document clearly if we
don't change it.

Ian

Robert L. Read wrote:
> I seriously doubt that is the intended behavior.
>
> It seems to me that is a very serious bug (possibly a relatively
> recent one, as well.)
>
> It is unfortunate that we don't have  test that would detect that.  If
> you have the energy, please write such a test (probably based on code you
> already have.)
>
> If you can't do it, I will be able to do it, but not until later this
> week.
>
>
> On Sat, 2006-01-28 at 14:37 -0500, Ian Eslick wrote:
>> This was not clear in the documentation, but secondary db's appear to
>> have the following behavior.
>>
>> btree - an indexed btree
>> inverted - an secondary index of btree
>> fn - A function that computes the secondary key given the value given to
>> btree
>>
>> Write pkey/value1 into btree
>> inverted has fn(value1)/pkey
>>
>> Change the association of pkey by writing
>> pkey/value2 into btree
>>
>> inverted now has:
>> fn(value1)/pkey
>> fn(value2)/pkey
>>
>> Namely, each secondary index contains the entire history of the values
>> of pkey and associates
>> them with pkey.  If you are sorting and searching objects by using
>> secondary indices this is
>> not very useful. 
>>
>> Is this the intended functionality?  This could be a setup problem on my
>> end or the way secondary
>> indices were intended to work.  In which case it makes sense for me to
>> make sure that when
>> a secondary value changes, the original secondary key/value pair is
>> deleted from the secondary
>> index.  I just discovered this and the uniqueness of mappings to pkey in
>> the above example is
>> crucial for my application.
>>
>> Thank you,
>> Ian
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net <mailto:elephant-devel at common-lisp.net>
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>