[elephant-devel] virtual subtree?

Tue Apr 17 15:34:57 UTC 2007

Comments are inline below...

On Apr 17, 2007, at 11:11 AM, Joe Corneli wrote:

> Thanks again for the detailed response... here are some follow-up
> questions.  Not critical, but a few answers might satisfy my
> curiousity...
>
>    The easiest thing I can think of is to create a derived index on
>    the classes of elements in your btree and then use map-index to map
>    over the instances.
>
>    ELE-TESTS> (setf my-things (make-indexed-btree))
>    ELE-TESTS> (add-index my-things :index-name 'thing-type
>                          :key-form '(lambda (index k v)
>                                      (values t (type-of v))))
>    ELE-TESTS> (map-index (lambda (sk v pk) (print v))
>                          (get-index my-things 'thing-type) :value  
> 'symbol)
>
> OK, this will enable me to grab all of the triples, which is good.
> But if I understand correctly, it is not associated with *subsequent*
> efficient search through the results, so I should use idioms like:

Yup, thus my second example...

>    ELE-TESTS> (add-index my-things :index-name 'triples-first
>                           :key-form '(lambda (index k v)
>                                         (if (subtypep (type-of v)  
> 'triple)
>                                             (values t (triple-first  
> v))
>                                             (values nil nil)))
>                           :populate t)
>
> [Note that I have fiddled with the `lambda' form you supplied, the
> original
>
> (lambda (index k v)
>   (if (subtypep (type-of v)
>                 'triple) t)
>   (values t (triple-first v))
>   (values nil nil))
>
> did not make sense to me -- a typo?]

A typo!

> You say:
>
>    This will create an index 'triples-first which only indexes triples
>    and does so by the value of the first element.  Thus you can easily
>    retrieve all triples with the first element eq to 5.
>
> This sounds good -- but -- I would also like to be able to look up
> triples by the middle, end, and combinations (like, "match beginning
> and end").  No problem from the coding point of view, but I want to
> check that creating all of these indices isn't nuts from the storage
> point of view.  Such additional indices don't take up too much space,
> right?  (I certainly don't have any other better ideas in mind!  but
> thought I should clarify that there may be as many as 7 indices for
> all the different look-up combinations.)

Each index has one entry for every triple you store.  The entry  
contains the 'value' returned by the lambda expression (as small as 5- 
bytes for fixnums) and size of the primary key (5-bytes if using a  
fixnum id) of the triples in the main btree.  Add to this this the  
overhead for the BTree data structures.  Since it's all on disk the  
storage really isn't that obscene.

Combinations are usually dealt by joins (elements where head = 1 AND  
tail = 2 AND type = 10).

You could do this manually, but it means deserializing each set  
individually and then doing an intersection between them.  If your  
sets are likely to be small, you could just do the intersection in  
lisp.  If your sets are likely to be very large, then you need to be  
more clever.

I'm working on a query system (for class instances only) that would  
automatically exploit the three indexes and do cheap joins using  
sorted instance oids.  i.e.

(constrain (t thing)
    where (eq (type t) 'canDo)
          (eq (target t) 'bark))

To get all instances of class triple that represent sources that  
canDo bark.  (X canDo 'bark)  If the indexes are defined only over  
triples, than no class-of or type-of constraint is needed.  The user  
will still need to keep track of some of these constraints as not all  
optimizations can be automatically determined.

Computationally, this would involve getting all the oids from the  
type index, all oids from the bark index, doing a sorted merge then  
using the oids to pull out the instances during a mapping operation.   
i.e.

(map-query-result
   (lambda (triple) (print (source triple)))
   (constraint (t thing)
     ...))

The syntax will evolve, of course, but it will be something like  
this.  The query engine will go into the 1.0 or 1.1 release.  I'll  
try to write it so someone could write a more general version to do  
this for their own btrees rather than relying on the automated class  
support.  The query language uses slot/accessor semantics and  
reflection so a more general solution would require a different and  
probably more verbose syntax.

>    This does create a parallel 'index' which is its own btree.  There
>    isn't a good way to do this using only a single btree as we don't
>    expose the sort order constraints or allow the user to create  
> partial
>    keys so they can use cursors to iterate over a subset of the btree.
>    Now if you indexed your things by typename, then you could do this
>    with a single tree, although you'd have to go over indexes

Actually this would _only_ work for non-numeric types and is perhaps  
too dependent on the current serializer.  I think I'd rather not  
restrict the data store implementation that much given that we're due  
to have 4 separate ones in the future.

> Hm, this is an interesting suggestion, but I think it would eliminate
> the "elegance" (perhaps questionable) of being able to easily and
> cleanly reach any Thing by its unique ID number (my current strategy).
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel