[elephant-devel] virtual subtree?

Tue Apr 17 15:56:51 UTC 2007

A note on BTree storage overhead for an index entry...

The actual disk overhead of an Elephant BTree entry will vary.  In  
the BDB backend, there is a 4-byte BTree id that is used to identify  
the specific Elephant btree within a single BDB Btree.  So for the  
Elephnat BDB backend that's a minimum of 14 bytes of storage, not  
including btree node headers, page overhead and pointers for the BDB  
Btree (available from the BDB docs).

I'm not sure about the layers of overhead in the CL-SQL backend.

Ian

On Apr 17, 2007, at 11:34 AM, Ian Eslick wrote:

> Comments are inline below...
>
> On Apr 17, 2007, at 11:11 AM, Joe Corneli wrote:
>
>> Thanks again for the detailed response... here are some follow-up
>> questions.  Not critical, but a few answers might satisfy my
>> curiousity...
>>
>>    The easiest thing I can think of is to create a derived index on
>>    the classes of elements in your btree and then use map-index to  
>> map
>>    over the instances.
>>
>>    ELE-TESTS> (setf my-things (make-indexed-btree))
>>    ELE-TESTS> (add-index my-things :index-name 'thing-type
>>                          :key-form '(lambda (index k v)
>>                                      (values t (type-of v))))
>>    ELE-TESTS> (map-index (lambda (sk v pk) (print v))
>>                          (get-index my-things 'thing-type) :value  
>> 'symbol)
>>
>> OK, this will enable me to grab all of the triples, which is good.
>> But if I understand correctly, it is not associated with *subsequent*
>> efficient search through the results, so I should use idioms like:
>
> Yup, thus my second example...
>
>>    ELE-TESTS> (add-index my-things :index-name 'triples-first
>>                           :key-form '(lambda (index k v)
>>                                         (if (subtypep (type-of v)  
>> 'triple)
>>                                             (values t (triple- 
>> first v))
>>                                             (values nil nil)))
>>                           :populate t)
>>
>> [Note that I have fiddled with the `lambda' form you supplied, the
>> original
>>
>> (lambda (index k v)
>>   (if (subtypep (type-of v)
>>                 'triple) t)
>>   (values t (triple-first v))
>>   (values nil nil))
>>
>> did not make sense to me -- a typo?]
>
> A typo!
>
>> You say:
>>
>>    This will create an index 'triples-first which only indexes  
>> triples
>>    and does so by the value of the first element.  Thus you can  
>> easily
>>    retrieve all triples with the first element eq to 5.
>>
>> This sounds good -- but -- I would also like to be able to look up
>> triples by the middle, end, and combinations (like, "match beginning
>> and end").  No problem from the coding point of view, but I want to
>> check that creating all of these indices isn't nuts from the storage
>> point of view.  Such additional indices don't take up too much space,
>> right?  (I certainly don't have any other better ideas in mind!  but
>> thought I should clarify that there may be as many as 7 indices for
>> all the different look-up combinations.)
>
> Each index has one entry for every triple you store.  The entry  
> contains the 'value' returned by the lambda expression (as small as  
> 5-bytes for fixnums) and size of the primary key (5-bytes if using  
> a fixnum id) of the triples in the main btree.  Add to this this  
> the overhead for the BTree data structures.  Since it's all on disk  
> the storage really isn't that obscene.
>
> Combinations are usually dealt by joins (elements where head = 1  
> AND tail = 2 AND type = 10).
>
> You could do this manually, but it means deserializing each set  
> individually and then doing an intersection between them.  If your  
> sets are likely to be small, you could just do the intersection in  
> lisp.  If your sets are likely to be very large, then you need to  
> be more clever.
>
> I'm working on a query system (for class instances only) that would  
> automatically exploit the three indexes and do cheap joins using  
> sorted instance oids.  i.e.
>
> (constrain (t thing)
>    where (eq (type t) 'canDo)
>          (eq (target t) 'bark))
>
> To get all instances of class triple that represent sources that  
> canDo bark.  (X canDo 'bark)  If the indexes are defined only over  
> triples, than no class-of or type-of constraint is needed.  The  
> user will still need to keep track of some of these constraints as  
> not all optimizations can be automatically determined.
>
> Computationally, this would involve getting all the oids from the  
> type index, all oids from the bark index, doing a sorted merge then  
> using the oids to pull out the instances during a mapping  
> operation.  i.e.
>
> (map-query-result
>   (lambda (triple) (print (source triple)))
>   (constraint (t thing)
>     ...))
>
> The syntax will evolve, of course, but it will be something like  
> this.  The query engine will go into the 1.0 or 1.1 release.  I'll  
> try to write it so someone could write a more general version to do  
> this for their own btrees rather than relying on the automated  
> class support.  The query language uses slot/accessor semantics and  
> reflection so a more general solution would require a different and  
> probably more verbose syntax.
>
>>    This does create a parallel 'index' which is its own btree.  There
>>    isn't a good way to do this using only a single btree as we don't
>>    expose the sort order constraints or allow the user to create  
>> partial
>>    keys so they can use cursors to iterate over a subset of the  
>> btree.
>>    Now if you indexed your things by typename, then you could do this
>>    with a single tree, although you'd have to go over indexes
>
> Actually this would _only_ work for non-numeric types and is  
> perhaps too dependent on the current serializer.  I think I'd  
> rather not restrict the data store implementation that much given  
> that we're due to have 4 separate ones in the future.
>
>> Hm, this is an interesting suggestion, but I think it would eliminate
>> the "elegance" (perhaps questionable) of being able to easily and
>> cleanly reach any Thing by its unique ID number (my current  
>> strategy).
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel