[elephant-devel] Lisp Btrees: design considerations

Wed May 14 14:20:57 UTC 2008

On May 14, 2008, at 10:02 AM, Robert L. Read wrote:

> On Wed, 2008-05-14 at 09:40 -0400, Ian Eslick wrote:
>> If the BTree node binary data is stored in a lisp octet array, and
>> we
>> don't want to deserialize to compare keys, then we need to write the
>> procedure that you can find in libberkeley-db.c in the src/db-bdb
>> file.  It performs a content-sensitive comparison on a byte-by-byte
>> basis.  This has huge benefits of allowing us to do key comparisons
>> 'in-place' without creating lisp objects.  We pass this C function
>> to
>> BDB which uses it to compare keys and values directly in the binary
>> file pages.
>
> I understand the value of this, but is it not possible to write the
> comparator on serialized values in LISP?  This would then be passed  
> down
> to the lowest level IO operation, which is still in LISP.

I think we can do a much more elegant job, and I like your vision here.

> In general I'm imagining a system that does just that as its "query
> language", --- take a very lisp-like specification of a predicate and
> pass it down to the lowest level I/O operation, or as low as you can
> get, and apply it there.  This allows us to retain the elegant  
> "mapcar"
> and "map-reduce" strategies as basic querying mechanisms.

It would be nice to unify the comparisons in the map operators with  
those in the database!!

> If we make a modular serializer, that serializes on the basis of data
> type, then it is indeed a challenge for the implementor to implement a
> function that performs comparisons on the serialized representation of
> that type...but a very worthwhile challenge, I think.
>

It should be since we always know the type before we consume:

(defmethod compare-serialized-< ((typetag (eql +string+)) stream1  
stream2)
	Use ops like read-octet and read-int32 to read data from the stream
         and compare...  Recurse on substructure as necessary
       (let ((type1 (read-octet stream1))
             (type2 (read-octet stream2)))
          (if (eq type1 type2)
              (compare-serialized type1 stream1 stream2)
              (< type1 type2))))

Which of course is a common idiom that you could compress into:
   (dispatch-on-subtype stream1 stream2)

Which returns an ordering on types or an ordering on the values of the  
same types.  Makes it easy to implement equal and equalp sorting  
directly on top of the dbs!

This code isn't like to be terribly pretty given the preponderance of  
low-level ops, but if you had a compressed data format you could  
create a tag for it and dispatch your own comparison function when  
that tag was reached...

Ian

>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel