<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.3.2">
</HEAD>
<BODY>
I agree with all of your last email.<BR>
<BR>
However, I think in your discussion of complexity of the existing operations in the CL-SQL side<BR>
you are implying there are lots of O(n) operations when in fact this only occurs when one <BR>
indexes a slot that has a small number of values relative to the number of objects or uses <BR>
a very non-selective functional index. I wrote that stuff before you implemented slot indexing.<BR>
However, you are correct --- in some cases it will perform very poorly. However, we now have<BR>
the Postmodern back-end (which I am already using in production), and it probably doesn't suffer<BR>
from the same problems. Moreover, the licensing issues with BDB seem to have gone away.<BR>
<BR>
I understand that you and I have slightly different approaches. I am thinking of Prevalence-style<BR>
implementations, that don't rely on disk-based indices (for example, DCM uses hash-tables), and <BR>
you are thinking of databases way to large to fit into memory, and relying upon designing an <BR>
index structure using the powerful Elephant mechanism to have an efficient system. I much<BR>
prefer that than supporting the notion of using an ORM, as you mention. In fact, I hope Elephant<BR>
will be an ORM-zilla --- I hope it will teach people there is no need to deal with that impedance<BR>
mismatch.<BR>
<BR>
I think you clarification of the specification (by saying it is unspecified) is very helpful.<BR>
<BR>
With respect to the "open issue" you raise about "get-instances-by-range", I wrote the following <BR>
little article:<BR>
<BR>
I take as a basic assumption that a B-Tree is a data structure, not an<BR>
abstract data type. The abstract data type that it implements is an <BR>
Ordered Set, that is a set equipped with a total order (a total order<BR>
being one that is antisymmetic, transitive and complete.)<BR>
<BR>
A system like Elephant, or a relational database, should present an<BR>
abstract data type, and not expose implementation details. That's<BR>
we call relational databases "relational" rather than "ISAM-al" or <BR>
"BTree-al". Even though in practice they don't (since typically<BR>
duplicate tuples are allowed, though implementing multisets rather than<BR>
sets, properly), what we call RDBMS proper are based upon the notion <BR>
of the relation, an abstract data type. I think Elephant should <BR>
be based on the simpler notion of a Set or a Dictionary, and <BR>
support the notion of an Ordered Set.<BR>
<BR>
There are two basic Abstract Data Types that Elephant provides. This<BR>
is a kind legacy of the original API. We implement the "Map" ADT and<BR>
the "Ordered Set" ADT. Example of "Map" are "add-to-store(key,value)"<BR>
and "get-from-store(key,value)". Persistent classes are examples<BR>
of the "Set" or "Ordered Set" ADT. A fundamental problem with <BR>
thinking in terms of these Abstract Data Types as defined by the <BR>
Wikipedia, for example, is that they are not defined to have an<BR>
"enumerate all values" operation. This is an abstract operation but<BR>
is essential in practice. If you can't do that, you have to have <BR>
a separate store to keep track of the objects in the store, and that <BR>
is obviously no good.<BR>
<BR>
(We provide nice mechanisms for enumerating all objects in both cases.<BR>
We have the "map-btree" function and you have written "map-class".<BR>
In practice, there are also cursors to do similar things.)<BR>
<BR>
It seems to me an open question whether we should define persistent<BR>
classes to be a "Set" or an "Ordered Set".<BR>
<BR>
As you point out, lisp doesn't provide an ordering for objects. This<BR>
is because, unlike the integers, it is unclear what the ordering should<BR>
be. Even for strings, the ordering should technically be dependent on<BR>
the language in which you interpret the strings---not every culture <BR>
orders their letters in the same way.<BR>
<BR>
It seems to me that everything would be a lot cleaner if we think<BR>
of fundamentally presenting the "Set" ADT rather than the "Ordered Set"<BR>
ADT. The "Set" ADT does not support "get-instances-by-range", but the <BR>
"Ordered Set" does (however, this must be based on a LISP-defined order.)<BR>
<BR>
In your previous email you suggest a specification wherein Elephant<BR>
makes not guarantee about the order of unorderable types. This is <BR>
effectively the specification of a "Set" type. The clearest thing<BR>
of course would be our test suite reflected the distinction between<BR>
Sets and Ordered Sets by testing the persistence of order on Ordered<BR>
Sets but not relying upon it all for Sets.<BR>
<BR>
If one thinks about the asymptotic complexity of both of these kinds<BR>
of ADTs, it is reasonable to expect O(log(n)) insertion, deletion, and<BR>
lookups in all cases. <BR>
<BR>
Elephant provides several index mechanisms. The most elegant is <BR>
the indexing of a slot value which you implemented; but there are <BR>
also function indexes. For example, one can implement a boolean<BR>
function "rhymes-with-cat" and build an index on that. This allows<BR>
you to efficiently enumerate only lisp objects that for which <BR>
"rhymes-with-cat" is true. More reaonably, one could build an<BR>
index on a class slot that had a boolean value. <BR>
This is the place where the current CL-SQL backend has an O(n) operation--<BR>
when there are duplicate keys and index. Recall that I coded that <BR>
before you wrote the class indexing stuff. I personally don't think it <BR>
is very important, and I use DCM anyway. If anybody actually encounters <BR>
this as a performance problem, I'll be happy to work on it --- <BR>
unless of course you're using<BR>
Postgres, in which case I would recommend switching to the new (and not<BR>
officially released) Postmodern back-end. I have done that on my<BR>
production site already.<BR>
<BR>
<BR>
<BR>
On Fri, 2007-08-03 at 10:01 -0400, Ian Eslick wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
<FONT COLOR="#000000">First of all, BDB does not sort on the serialized values except for </FONT>
<FONT COLOR="#000000">values for which no lisp ordering exists. BDB is given a C function </FONT>
<FONT COLOR="#000000">which decodes the serialized format on the fly, without talking to </FONT>
<FONT COLOR="#000000">Lisp, but properly orders all of lisp's orderable objects.</FONT>
<FONT COLOR="#000000">Orderable lisp types:</FONT>
<FONT COLOR="#000000">1. Numeric types: all numeric types are sorted identically to lisp (= </FONT>
<FONT COLOR="#000000">< > <= >=, etc)</FONT>
<FONT COLOR="#000000"> Numeric types include fixnum, integer, float, double-float, rational</FONT>
<FONT COLOR="#000000">2. String types: all string types are sorted identically to lisp </FONT>
<FONT COLOR="#000000">(string< string<= string=, etc)</FONT>
<FONT COLOR="#000000">Unorderable lisp types:</FONT>
<FONT COLOR="#000000">1. Symbols: I do sort symbols by symbol-name as a convenience.</FONT>
<FONT COLOR="#000000"> Lisp does not provide an ordering fn for symbols so this is additive</FONT>
<FONT COLOR="#000000">2. Objects, structs, etc: lisp does not provide an ordering function </FONT>
<FONT COLOR="#000000">for objects</FONT>
<FONT COLOR="#000000">3. Arrays, hash tables, etc: lisp does not provide an ordering </FONT>
<FONT COLOR="#000000">function for these types</FONT>
<FONT COLOR="#000000">Some of these types can be grouped by eq, eql, equal or equalp. More </FONT>
<FONT COLOR="#000000">on this below.</FONT>
<FONT COLOR="#000000">Because a BTree requires some order for all objects, we need to </FONT>
<FONT COLOR="#000000">expand on the lisp specification of ordering. First we sort on type </FONT>
<FONT COLOR="#000000">so that all unorderable objects of a given type are stored as a </FONT>
<FONT COLOR="#000000">group. Then we choose an arbitrary order (binary rep). Now most of </FONT>
<FONT COLOR="#000000">the hairy issues arise when we mix types in an index and want to </FONT>
<FONT COLOR="#000000">traverse a range in the index or do cursor ops on an index. The </FONT>
<FONT COLOR="#000000">specification is this:</FONT>
<FONT COLOR="#000000">1) Elephant makes no guarantee about the order of unorderable types. </FONT>
<FONT COLOR="#000000">Portable implementations should not depend on an order being </FONT>
<FONT COLOR="#000000">consistent. Unorderable types should be treated as sets. Anything </FONT>
<FONT COLOR="#000000">that is equalp in lisp will be grouped together in the data store </FONT>
<FONT COLOR="#000000">(arrays, structs, symbols)</FONT>
<FONT COLOR="#000000">2) Elephant makes no guarantee about the relative order of different </FONT>
<FONT COLOR="#000000">types in an index.</FONT>
<FONT COLOR="#000000">*** There is an open issue: how to specify a range by object type. </FONT>
<FONT COLOR="#000000">You can't do :start and :end because you don't know what object will </FONT>
<FONT COLOR="#000000">be at the start and end due to #1 and #2 above. I think this should </FONT>
<FONT COLOR="#000000">be dealt with as a special case and a less efficient procedure use</FONT>
<FONT COLOR="#000000">I can't agree with either of your proposals due to performance issues:</FONT>
<FONT COLOR="#000000">1) The performance impact of (position, value) is extreme. I turn </FONT>
<FONT COLOR="#000000">all of the BTree O(log N) operations into O(N). I now have to read </FONT>
<FONT COLOR="#000000">all values out of the BTree until I identify the insertion point, </FONT>
<FONT COLOR="#000000">then renumber all the following elements. This defeats the whole </FONT>
<FONT COLOR="#000000">purpose of having a BTree in the first place.</FONT>
<FONT COLOR="#000000">2) I hate to say it but the CL-SQL approach to cursor operations is </FONT>
<FONT COLOR="#000000">not just terribly inefficient, it also defeats the purpose of having </FONT>
<FONT COLOR="#000000">indicies on disk by requiring that all values be pulled into memory </FONT>
<FONT COLOR="#000000">to support cursor or map operations. I see the CL-SQL backend as a </FONT>
<FONT COLOR="#000000">solution to licensing problems for small databases. For complex </FONT>
<FONT COLOR="#000000">queries over large databases one is better off with an object </FONT>
<FONT COLOR="#000000">relational solution like that implemented by CL-SQL's native metaclass.</FONT>
<FONT COLOR="#000000">The reason that lisp needs to implement a more complex sorting </FONT>
<FONT COLOR="#000000">function to support cursor operations over BTrees is that BTrees </FONT>
<FONT COLOR="#000000">benefit from ordering types relative to each other (we can't </FONT>
<FONT COLOR="#000000">intersperse numeric and non-numeric types without violating lisp </FONT>
<FONT COLOR="#000000">ordering) and the cursor fn's need to mirror this. This doesn't </FONT>
<FONT COLOR="#000000">violate lisp ordering, it is additive. The clarity of the required </FONT>
<FONT COLOR="#000000">constraints could be improved, though.</FONT>
<FONT COLOR="#000000"> > There is a hidden danger in relying upon an order based on the </FONT>
<FONT COLOR="#000000">serialized value---namely,</FONT>
<FONT COLOR="#000000"> > you can now not swap out serializers without drastic side </FONT>
<FONT COLOR="#000000">effects. Since one of the main</FONT>
<FONT COLOR="#000000"> > ways that we can improve performance is by writing better </FONT>
<FONT COLOR="#000000">serializers (and, in particular,</FONT>
<FONT COLOR="#000000"> > serializers that are specific to particular data types), this </FONT>
<FONT COLOR="#000000">seems like a bad idea.</FONT>
<FONT COLOR="#000000">If designers adhere to the two restrictions above, then a new </FONT>
<FONT COLOR="#000000">serializer that supports lisp ordering and introduces a new ordering </FONT>
<FONT COLOR="#000000">of types or within types that are unordered in lisp will have no </FONT>
<FONT COLOR="#000000">impact on application code.</FONT>
<FONT COLOR="#000000">Sorry if any point is unclear, I'm in a hurry. Please ask clarifying </FONT>
<FONT COLOR="#000000">questions if necessary.</FONT>
<FONT COLOR="#000000">Ian</FONT>
<FONT COLOR="#000000">PS - This leaves me with a question. Is it possible in any </FONT>
<FONT COLOR="#000000">relational DB to register a sorting fn that you can sort by when </FONT>
<FONT COLOR="#000000">doing a sorted query? Typically rows are sorted by their unique ID </FONT>
<FONT COLOR="#000000">field (i.e. the key used in the underlying BTree).</FONT>
<FONT COLOR="#000000">On Aug 2, 2007, at 5:55 PM, Robert L. Read wrote:</FONT>
<FONT COLOR="#000000">> Yes, I think I understand this. However, a costly alternative does </FONT>
<FONT COLOR="#000000">> exist: just never let</FONT>
<FONT COLOR="#000000">> BDB use its own order. Always impose one that we can compute in </FONT>
<FONT COLOR="#000000">> lisp. Then in</FONT>
<FONT COLOR="#000000">> BDB you store a (position,value) pair instead of a value, and </FONT>
<FONT COLOR="#000000">> either ensure that BDB</FONT>
<FONT COLOR="#000000">> sorts on the first part of the binary representation of the </FONT>
<FONT COLOR="#000000">> position the way you want it to,</FONT>
<FONT COLOR="#000000">> or you add a lot of logic into the "cursor-next" operation. This </FONT>
<FONT COLOR="#000000">> is how it is done on the</FONT>
<FONT COLOR="#000000">> CLSQL side.</FONT>
<FONT COLOR="#000000">></FONT>
<FONT COLOR="#000000">> It is almost certainly a bit slower, and it is certainly a bit </FONT>
<FONT COLOR="#000000">> harder to code.</FONT>
<FONT COLOR="#000000">></FONT>
<FONT COLOR="#000000">> It seems to me that the root of the problem is that BDB does indeed </FONT>
<FONT COLOR="#000000">> order based on a</FONT>
<FONT COLOR="#000000">> serialized value. That is what we should remove. Certainly, if </FONT>
<FONT COLOR="#000000">> someone were to write</FONT>
<FONT COLOR="#000000">> a Pure-LISP backend, which I hope will occur eventually, it would </FONT>
<FONT COLOR="#000000">> seem silly for them to</FONT>
<FONT COLOR="#000000">> have to respect an artifact inherited from BDB, when part of the </FONT>
<FONT COLOR="#000000">> purpose of such a project</FONT>
<FONT COLOR="#000000">> is to escape dependence on BDB.</FONT>
<FONT COLOR="#000000">></FONT>
<FONT COLOR="#000000">> Forgive me if I'm confused but I assert that we should reverse your </FONT>
<FONT COLOR="#000000">> argument: we should</FONT>
<FONT COLOR="#000000">> force BDB to be isomorphic to a lisp sorter, not build a lisp </FONT>
<FONT COLOR="#000000">> sorted isomorphic to BDB.</FONT>
<FONT COLOR="#000000">></FONT>
<FONT COLOR="#000000">></FONT>
<FONT COLOR="#000000">> On Tue, 2007-07-31 at 15:24 -0400, Ian Eslick wrote:</FONT>
<FONT COLOR="#000000">>> The practical problem that led to the current design of index </FONT>
<FONT COLOR="#000000">>> sorting is that we cannot use lisp code to define the sorting </FONT>
<FONT COLOR="#000000">>> function for serialized values inside BDB Btrees (same problem I </FONT>
<FONT COLOR="#000000">>> imagine that Henrik had with postmodern). Instead, there is a </FONT>
<FONT COLOR="#000000">>> hairy custom C procedure that is registered with BDB that parses </FONT>
<FONT COLOR="#000000">>> the serialized format so that sorting is done first by type </FONT>
<FONT COLOR="#000000">>> (symbol, string, object, pobject, etc) followed by ordering within </FONT>
<FONT COLOR="#000000">>> numeric types, strings and symbols. Everything else is ordered </FONT>
<FONT COLOR="#000000">>> based on the byte ordering of its serialized representation. To </FONT>
<FONT COLOR="#000000">>> map across indices correctly, we need to know up front whether the </FONT>
<FONT COLOR="#000000">>> start value is less than the end value. And so we need a standard </FONT>
<FONT COLOR="#000000">>> lisp function that is isomorphic to the BDB sorting function. </FONT>
<FONT COLOR="#000000">>> Ideally postmodern would have a similar sorting function that </FONT>
<FONT COLOR="#000000">>> properly interprets the serialized format just like the BDB </FONT>
<FONT COLOR="#000000">>> function does. I think it's best to have a single standard </FONT>
<FONT COLOR="#000000">>> ordering that is as close to lisp's notion of ordering as possible </FONT>
<FONT COLOR="#000000">>> so we don't have to maintain different orderings. Ian PS - It </FONT>
<FONT COLOR="#000000">>> might be possible to have a lisp ordering function implement BDB's </FONT>
<FONT COLOR="#000000">>> notion of sorting by registering it as a callback, however it </FONT>
<FONT COLOR="#000000">>> would have to deserialize the BDB values each time. So the </FONT>
<FONT COLOR="#000000">>> problems with this are both stability concerns for foreign </FONT>
<FONT COLOR="#000000">>> callbacks and the performance impact of serialization/ </FONT>
<FONT COLOR="#000000">>> deserialization for internal BDB operations. On the cleanliness/ </FONT>
<FONT COLOR="#000000">>> performance axis, I think the current approach is the right </FONT>
<FONT COLOR="#000000">>> tradeoff (it's the original one Ben made, FYI). On Jul 31, 2007, </FONT>
<FONT COLOR="#000000">>> at 12:50 PM, Robert L. Read wrote: > Personally, I think the only </FONT>
<FONT COLOR="#000000">>> sensible way to handle this problem is > to require the user to > </FONT>
<FONT COLOR="#000000">>> specify an ordering function. We can of course provide a default, </FONT>
<FONT COLOR="#000000">>> > which will be error-prone > but tend to work most of the time. > </FONT>
<FONT COLOR="#000000">>> > The function called "my-generic-less-than" which is in the </FONT>
<FONT COLOR="#000000">>> source > tree now could be > a starting point for a generic </FONT>
<FONT COLOR="#000000">>> ordering. > > > On Tue, 2007-07-24 at 09:48 -0400, Ian Eslick </FONT>
<FONT COLOR="#000000">>> wrote: >> Robert and I have had some extended discussions on </FONT>
<FONT COLOR="#000000">>> ordering in >> indices. I think that all we really need to agree </FONT>
<FONT COLOR="#000000">>> on is _some_ >> canonical ordering. If we have mixed types in an </FONT>
<FONT COLOR="#000000">>> index, how should >> they be ordered relative to each other? In </FONT>
<FONT COLOR="#000000">>> BDB we have a C >> function which implements the ordering based on </FONT>
<FONT COLOR="#000000">>> the type tag and >> then based on the type within it. Are you </FONT>
<FONT COLOR="#000000">>> relying on a pure binary >> sort in postmodern? Robert or I will </FONT>
<FONT COLOR="#000000">>> get to submitting that patch >> shortly. I have recently sent in a </FONT>
<FONT COLOR="#000000">>> patch to lisp-compare<= so >> we'll see if we had to make parallel </FONT>
<FONT COLOR="#000000">>> changes. Thanks, Ian On Jul >> 24, 2007, at 3:50 AM, Henrik Hjelte </FONT>
<FONT COLOR="#000000">>> wrote: > I sent this message >> yesterday but I guess it got stuck </FONT>
<FONT COLOR="#000000">>> in the mailing > list filter. >> Perhaps the attachment was too </FONT>
<FONT COLOR="#000000">>> big. Since my > common-lisp.net >> user hhjelte does not have </FONT>
<FONT COLOR="#000000">>> write access to elephant I > have >> placed the patches from here </FONT>
<FONT COLOR="#000000">>> instead: > darcs get <A HREF="http://common">http://common</A>- >> lisp.net/project/grand-prix/ </FONT>
<FONT COLOR="#000000">>> darcs/elephant > > ---------- >> Forwarded message ---------- > </FONT>
<FONT COLOR="#000000">>> From: Henrik Hjelte >> <<A HREF="mailto:henrik@evahjelte.com">henrik@evahjelte.com</A>> > Date: Jul 23, 2007 </FONT>
<FONT COLOR="#000000">>> 11:28 PM > Subject: >> some patches > To: elephant-devel@common- </FONT>
<FONT COLOR="#000000">>> lisp.net > > > Here are >> some darcs patches that might be of </FONT>
<FONT COLOR="#000000">>> interest. I had some > >> problems with map-index on db-postmodern </FONT>
<FONT COLOR="#000000">>> that made me almost rip >> my > hair of, but finally I made it to </FONT>
<FONT COLOR="#000000">>> work again. The problem is >> that > map-index for a string value </FONT>
<FONT COLOR="#000000">>> rely on the ordering in the >> btree > (continue-p makes use of </FONT>
<FONT COLOR="#000000">>> less than for strings). The >> postmodern > backend relies on how </FONT>
<FONT COLOR="#000000">>> the database backend orders >> things, which is not > always the </FONT>
<FONT COLOR="#000000">>> same thing. Is it a necessary >> feature that b-trees of > string </FONT>
<FONT COLOR="#000000">>> and objects are required to be >> ordered by lisp-compare<=? > > </FONT>
<FONT COLOR="#000000">>> In the process of solving the bug I >> have upgraded the test </FONT>
<FONT COLOR="#000000">>> framework > to use FiveAM instead of RT, It >> has in my opinion a </FONT>
<FONT COLOR="#000000">>> very nice syntax > and some useful features to >> track </FONT>
<FONT COLOR="#000000">>> dependencies between tests. I hope > you agree that it >> improves </FONT>
<FONT COLOR="#000000">>> on things. > > /Henrik Hjelte > >> </FONT>
<FONT COLOR="#000000">>> _______________________________________________ > elephant-devel </FONT>
<FONT COLOR="#000000">>> >> site list > <A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A> > <A HREF="http://common">http://common</A>- >> </FONT>
<FONT COLOR="#000000">>> lisp.net/mailman/listinfo/elephant-devel >> </FONT>
<FONT COLOR="#000000">>> _______________________________________________ elephant-devel >> </FONT>
<FONT COLOR="#000000">>> site list <A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A> <A HREF="http://common-lisp.net/">http://common-lisp.net/</A> </FONT>
<FONT COLOR="#000000">>> >> mailman/listinfo/elephant-devel > </FONT>
<FONT COLOR="#000000">>> _______________________________________________ > elephant-devel </FONT>
<FONT COLOR="#000000">>> site list > <A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A> > <A HREF="http://common">http://common</A>- </FONT>
<FONT COLOR="#000000">>> lisp.net/mailman/listinfo/elephant-devel </FONT>
<FONT COLOR="#000000">>> _______________________________________________ elephant-devel </FONT>
<FONT COLOR="#000000">>> site list <A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A> <A HREF="http://common-lisp.net/">http://common-lisp.net/</A> </FONT>
<FONT COLOR="#000000">>> mailman/listinfo/elephant-devel</FONT>
<FONT COLOR="#000000">> _______________________________________________</FONT>
<FONT COLOR="#000000">> elephant-devel site list</FONT>
<FONT COLOR="#000000">> <A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A></FONT>
<FONT COLOR="#000000">> <A HREF="http://common-lisp.net/mailman/listinfo/elephant-devel">http://common-lisp.net/mailman/listinfo/elephant-devel</A></FONT>
<FONT COLOR="#000000">_______________________________________________</FONT>
<FONT COLOR="#000000">elephant-devel site list</FONT>
<FONT COLOR="#000000"><A HREF="mailto:elephant-devel@common-lisp.net">elephant-devel@common-lisp.net</A></FONT>
<FONT COLOR="#000000"><A HREF="http://common-lisp.net/mailman/listinfo/elephant-devel">http://common-lisp.net/mailman/listinfo/elephant-devel</A></FONT>
</PRE>
</BLOCKQUOTE>
</BODY>
</HTML>