[elephant-devel] Fwd: some patches

Ian Eslick eslick at csail.mit.edu
Fri Aug 3 14:01:18 UTC 2007


First of all, BDB does not sort on the serialized values except for  
values for which no lisp ordering exists.  BDB is given a C function  
which decodes the serialized format on the fly, without talking to  
Lisp, but properly orders all of lisp's orderable objects.

Orderable lisp types:
1. Numeric types: all numeric types are sorted identically to lisp (=  
< > <= >=, etc)
    Numeric types include fixnum, integer, float, double-float, rational
2. String types: all string types are sorted identically to lisp  
(string< string<= string=, etc)

Unorderable lisp types:
1. Symbols: I do sort symbols by symbol-name as a convenience.
    Lisp does not provide an ordering fn for symbols so this is additive
2. Objects, structs, etc: lisp does not provide an ordering function  
for objects
3. Arrays, hash tables, etc: lisp does not provide an ordering  
function for these types

Some of these types can be grouped by eq, eql, equal or equalp.  More  
on this below.

Because a BTree requires some order for all objects, we need to  
expand on the lisp specification of ordering.  First we sort on type  
so that all unorderable objects of a given type are stored as a  
group.  Then we choose an arbitrary order (binary rep).  Now most of  
the hairy issues arise when we mix types in an index and want to  
traverse a range in the index or do cursor ops on an index.  The  
specification is this:

1) Elephant makes no guarantee about the order of unorderable types.   
Portable implementations should not depend on an order being  
consistent.  Unorderable types should be treated as sets.  Anything  
that is equalp in lisp will be grouped together in the data store  
(arrays, structs, symbols)

2) Elephant makes no guarantee about the relative order of different  
types in an index.

*** There is an open issue: how to specify a range by object type.   
You can't do :start and :end because you don't know what object will  
be at the start and end due to #1 and #2 above.  I think this should  
be dealt with as a special case and a less efficient procedure use

I can't agree with either of your proposals due to performance issues:

1) The performance impact of (position, value) is extreme.  I turn  
all of the BTree O(log N) operations into O(N).  I now have to read  
all values out of the BTree until I identify the insertion point,  
then renumber all the following elements.  This defeats the whole  
purpose of having a BTree in the first place.

2) I hate to say it but the CL-SQL approach to cursor operations is  
not just terribly inefficient, it also defeats the purpose of having  
indicies on disk by requiring that all values be pulled into memory  
to support cursor or map operations.  I see the CL-SQL backend as a  
solution to licensing problems for small databases.  For complex  
queries over large databases one is better off with an object  
relational solution like that implemented by CL-SQL's native metaclass.

The reason that lisp needs to implement a more complex sorting  
function to support cursor operations over BTrees is that BTrees  
benefit from ordering types relative to each other (we can't  
intersperse numeric and non-numeric types without violating lisp  
ordering) and the cursor fn's need to mirror this.  This doesn't  
violate lisp ordering, it is additive.  The clarity of the required  
constraints could be improved, though.

 > There is a hidden danger in relying upon an order based on the  
serialized value---namely,
 > you can now not swap out serializers without drastic side  
effects.  Since one of the main
 > ways that we can improve performance is by writing better  
serializers (and, in particular,
 > serializers that are specific to particular data types), this  
seems like a bad idea.

If designers adhere to the two restrictions above, then a new  
serializer that supports lisp ordering and introduces a new ordering  
of types or within types that are unordered in lisp will have no  
impact on application code.

Sorry if any point is unclear, I'm in a hurry.  Please ask clarifying  
questions if necessary.

Ian

PS - This leaves me with a question.  Is it possible in any  
relational DB to register a sorting fn that you can sort by when  
doing a sorted query?  Typically rows are sorted by their unique ID  
field (i.e. the key used in the underlying BTree).



On Aug 2, 2007, at 5:55 PM, Robert L. Read wrote:

> Yes, I think I understand this.  However, a costly alternative does  
> exist:  just never let
> BDB use its own order.  Always impose one that we can compute in  
> lisp.  Then in
> BDB you store a (position,value) pair instead of a value, and  
> either ensure that BDB
> sorts on the first part of the binary representation of the  
> position the way you want it to,
> or you add a lot of logic into the "cursor-next" operation.  This  
> is how it is done on the
> CLSQL side.
>
> It is almost certainly a bit slower, and it is certainly a bit  
> harder to code.
>
> It seems to me that the root of the problem is that BDB does indeed  
> order based on a
> serialized value.  That is what we should remove.  Certainly, if  
> someone were to write
> a Pure-LISP backend, which I hope will occur eventually, it would  
> seem silly for them to
> have to respect an artifact inherited from BDB, when part of the  
> purpose of such a project
> is to escape dependence on BDB.
>
> Forgive me if I'm confused but I assert that we should reverse your  
> argument: we should
> force BDB to be isomorphic to a lisp sorter, not build a lisp  
> sorted isomorphic to BDB.
>
>
> On Tue, 2007-07-31 at 15:24 -0400, Ian Eslick wrote:
>> The practical problem that led to the current design of index  
>> sorting is that we cannot use lisp code to define the sorting  
>> function for serialized values inside BDB Btrees (same problem I  
>> imagine that Henrik had with postmodern). Instead, there is a  
>> hairy custom C procedure that is registered with BDB that parses  
>> the serialized format so that sorting is done first by type  
>> (symbol, string, object, pobject, etc) followed by ordering within  
>> numeric types, strings and symbols. Everything else is ordered  
>> based on the byte ordering of its serialized representation. To  
>> map across indices correctly, we need to know up front whether the  
>> start value is less than the end value. And so we need a standard  
>> lisp function that is isomorphic to the BDB sorting function.  
>> Ideally postmodern would have a similar sorting function that    
>> properly interprets the serialized format just like the BDB  
>> function does. I think it's best to have a single standard  
>> ordering that is as close to lisp's notion of ordering as possible  
>> so we don't have to maintain different orderings. Ian PS - It  
>> might be possible to have a lisp ordering function implement BDB's  
>> notion of sorting by registering it as a callback, however it  
>> would have to deserialize the BDB values each time. So the  
>> problems with this are both stability concerns for foreign  
>> callbacks and the performance impact of serialization/ 
>> deserialization for internal BDB operations. On the cleanliness/ 
>> performance axis, I think the current approach is the right  
>> tradeoff (it's the original one Ben made, FYI). On Jul 31, 2007,  
>> at 12:50 PM, Robert L. Read wrote: > Personally, I think the only  
>> sensible way to handle this problem is > to require the user to >  
>> specify an ordering function. We can of course provide a default,  
>> > which will be error-prone > but tend to work most of the time. >  
>> > The function called "my-generic-less-than" which is in the  
>> source > tree now could be > a starting point for a generic  
>> ordering. > > > On Tue, 2007-07-24 at 09:48 -0400, Ian Eslick  
>> wrote: >> Robert and I have had some extended discussions on  
>> ordering in >> indices. I think that all we really need to agree  
>> on is _some_ >> canonical ordering. If we have mixed types in an  
>> index, how should >> they be ordered relative to each other? In  
>> BDB we have a C >> function which implements the ordering based on  
>> the type tag and >> then based on the type within it. Are you  
>> relying on a pure binary >> sort in postmodern? Robert or I will  
>> get to submitting that patch >> shortly. I have recently sent in a  
>> patch to lisp-compare<= so >> we'll see if we had to make parallel  
>> changes. Thanks, Ian On Jul >> 24, 2007, at 3:50 AM, Henrik Hjelte  
>> wrote: > I sent this message >> yesterday but I guess it got stuck  
>> in the mailing > list filter. >> Perhaps the attachment was too  
>> big. Since my > common-lisp.net >> user hhjelte does not have  
>> write access to elephant I > have >> placed the patches from here  
>> instead: > darcs get http://common- >> lisp.net/project/grand-prix/ 
>> darcs/elephant > > ---------- >> Forwarded message ---------- >  
>> From: Henrik Hjelte >> <henrik at evahjelte.com> > Date: Jul 23, 2007  
>> 11:28 PM > Subject: >> some patches > To: elephant-devel at common- 
>> lisp.net > > > Here are >> some darcs patches that might be of  
>> interest. I had some > >> problems with map-index on db-postmodern  
>> that made me almost rip >> my > hair of, but finally I made it to  
>> work again. The problem is >> that > map-index for a string value  
>> rely on the ordering in the >> btree > (continue-p makes use of  
>> less than for strings). The >> postmodern > backend relies on how  
>> the database backend orders >> things, which is not > always the  
>> same thing. Is it a necessary >> feature that b-trees of > string  
>> and objects are required to be >> ordered by lisp-compare<=? > >  
>> In the process of solving the bug I >> have upgraded the test  
>> framework > to use FiveAM instead of RT, It >> has in my opinion a  
>> very nice syntax > and some useful features to >> track  
>> dependencies between tests. I hope > you agree that it >> improves  
>> on things. > > /Henrik Hjelte > >>  
>> _______________________________________________ > elephant-devel  
>> >> site list > elephant-devel at common-lisp.net > http://common- >>  
>> lisp.net/mailman/listinfo/elephant-devel >>  
>> _______________________________________________ elephant-devel >>  
>> site list elephant-devel at common-lisp.net http://common-lisp.net/  
>> >> mailman/listinfo/elephant-devel >  
>> _______________________________________________ > elephant-devel  
>> site list > elephant-devel at common-lisp.net > http://common- 
>> lisp.net/mailman/listinfo/elephant-devel  
>> _______________________________________________ elephant-devel  
>> site list elephant-devel at common-lisp.net http://common-lisp.net/ 
>> mailman/listinfo/elephant-devel
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel




More information about the elephant-devel mailing list