[elephant-devel] virtual subtree?
Ian Eslick
eslick at csail.mit.edu
Tue Apr 17 15:34:57 UTC 2007
Comments are inline below...
On Apr 17, 2007, at 11:11 AM, Joe Corneli wrote:
> Thanks again for the detailed response... here are some follow-up
> questions. Not critical, but a few answers might satisfy my
> curiousity...
>
> The easiest thing I can think of is to create a derived index on
> the classes of elements in your btree and then use map-index to map
> over the instances.
>
> ELE-TESTS> (setf my-things (make-indexed-btree))
> ELE-TESTS> (add-index my-things :index-name 'thing-type
> :key-form '(lambda (index k v)
> (values t (type-of v))))
> ELE-TESTS> (map-index (lambda (sk v pk) (print v))
> (get-index my-things 'thing-type) :value
> 'symbol)
>
> OK, this will enable me to grab all of the triples, which is good.
> But if I understand correctly, it is not associated with *subsequent*
> efficient search through the results, so I should use idioms like:
Yup, thus my second example...
> ELE-TESTS> (add-index my-things :index-name 'triples-first
> :key-form '(lambda (index k v)
> (if (subtypep (type-of v)
> 'triple)
> (values t (triple-first
> v))
> (values nil nil)))
> :populate t)
>
> [Note that I have fiddled with the `lambda' form you supplied, the
> original
>
> (lambda (index k v)
> (if (subtypep (type-of v)
> 'triple) t)
> (values t (triple-first v))
> (values nil nil))
>
> did not make sense to me -- a typo?]
A typo!
> You say:
>
> This will create an index 'triples-first which only indexes triples
> and does so by the value of the first element. Thus you can easily
> retrieve all triples with the first element eq to 5.
>
> This sounds good -- but -- I would also like to be able to look up
> triples by the middle, end, and combinations (like, "match beginning
> and end"). No problem from the coding point of view, but I want to
> check that creating all of these indices isn't nuts from the storage
> point of view. Such additional indices don't take up too much space,
> right? (I certainly don't have any other better ideas in mind! but
> thought I should clarify that there may be as many as 7 indices for
> all the different look-up combinations.)
Each index has one entry for every triple you store. The entry
contains the 'value' returned by the lambda expression (as small as 5-
bytes for fixnums) and size of the primary key (5-bytes if using a
fixnum id) of the triples in the main btree. Add to this this the
overhead for the BTree data structures. Since it's all on disk the
storage really isn't that obscene.
Combinations are usually dealt by joins (elements where head = 1 AND
tail = 2 AND type = 10).
You could do this manually, but it means deserializing each set
individually and then doing an intersection between them. If your
sets are likely to be small, you could just do the intersection in
lisp. If your sets are likely to be very large, then you need to be
more clever.
I'm working on a query system (for class instances only) that would
automatically exploit the three indexes and do cheap joins using
sorted instance oids. i.e.
(constrain (t thing)
where (eq (type t) 'canDo)
(eq (target t) 'bark))
To get all instances of class triple that represent sources that
canDo bark. (X canDo 'bark) If the indexes are defined only over
triples, than no class-of or type-of constraint is needed. The user
will still need to keep track of some of these constraints as not all
optimizations can be automatically determined.
Computationally, this would involve getting all the oids from the
type index, all oids from the bark index, doing a sorted merge then
using the oids to pull out the instances during a mapping operation.
i.e.
(map-query-result
(lambda (triple) (print (source triple)))
(constraint (t thing)
...))
The syntax will evolve, of course, but it will be something like
this. The query engine will go into the 1.0 or 1.1 release. I'll
try to write it so someone could write a more general version to do
this for their own btrees rather than relying on the automated class
support. The query language uses slot/accessor semantics and
reflection so a more general solution would require a different and
probably more verbose syntax.
> This does create a parallel 'index' which is its own btree. There
> isn't a good way to do this using only a single btree as we don't
> expose the sort order constraints or allow the user to create
> partial
> keys so they can use cursors to iterate over a subset of the btree.
> Now if you indexed your things by typename, then you could do this
> with a single tree, although you'd have to go over indexes
Actually this would _only_ work for non-numeric types and is perhaps
too dependent on the current serializer. I think I'd rather not
restrict the data store implementation that much given that we're due
to have 4 separate ones in the future.
> Hm, this is an interesting suggestion, but I think it would eliminate
> the "elegance" (perhaps questionable) of being able to easily and
> cleanly reach any Thing by its unique ID number (my current strategy).
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
More information about the elephant-devel
mailing list