[elephant-devel] Sorting
Robert L. Read
read at robertlread.net
Mon Oct 22 19:55:42 UTC 2007
I think this is the best policy that we can generate.
I personally don't like it, as I've mentioned before. I don't think
we should have an arbitrary order. If you don't have a specified order,
then you don't really have a btree in meaningful sense. You can't do a
range query over something that is arbitrary, and if you don't have a
range query, why call it a btree? You should just call it a dictionary,
(which happens to be implemented with btrees) equip it with an
enumeration, and have something just as efficient and simpler than a
btree, which furthermore decreases the chances of someone accidentally
depending on the order, since you never provide a way to utilize it
(although of course humans, being human, will accidentally depend on the
ordering of objects returned by the enumeration, so the problem never
goes away from a software engineering point of view.)
However, I don't have time to reorganize things to present a
"dictionary" in addition to a "btree" abstraction; and even if someone
volunteered to do this work, it would be worthy of a new release, rather
than a discussion of the current release. And since Ian has done much
more work than I have in the last year, I will defer to his judgment.
Is there a reason that string sorting is case insensitive? This seems
like asking for trouble --- that's something that people are likely to
depend on and be surprised when they switch repositories and find it is
different.
On Mon, 2007-10-22 at 13:18 -0400, Ian Eslick wrote:
> There have been a number of discussions about sorting issues and
> guarantees among the various data store implementations. This is
> difficult due to different backend serialized representations. This
> is my understanding of the current guarantees. Let's commit to this
> as part of locking down 0.9.1.
>
> Any ordered data structure, such as btree or indexed-btrees, has to
> make some guarantees about the function that defines the ordering.
> Currently, the BDB data store and Elephant core implements the
> following policy (I believe this is also true for CL-SQL stores as
> well):
>
> General policies:
>
> In a BTree with multiple types, each type is sorted as an independent
> subsequence. Types are guaranteed to be separated, but the order of
> the type sub-sequences is not guaranteed.
>
> In an Index with multiple values, there is no guarantee about the
> ordering of duplicate values, the the index is treated as a sorted
> sequence of unordered equivalence classes. Tests and users should
> not depend on this.
>
> Specific types:
>
> Numeric types (fixnum, bignum, float, double, rational and complex):
> Sorted properly over the continuous real numbers, smallest to biggest
>
> Characters:
> Currently considered and sorted in with all the above numeric types
> (this is probably wrong)
>
> Strings:
> Sorted alphabetically, case-INSENSITIVE
>
> Symbols/Pathnames:
> Sorted as strings above, but symbols and paths are sorted
> independently as type classes.
>
> Persistent objects are sorted according to OID, but the OID ordering
> may change at any time, so long as object equivalency is maintained
> (for example, if we implement GC). Users should not depend on the
> OID for anything other than uniqueness unless they are implemented a
> new low-level persistent-collection data structure like (N:1 and N:N)
> associations that is GC aware. Otherwise the order can only be
> guaranteed within a given transaction.
>
> All other types have no user-visible ordering guarantees. They have
> an arbitrary ordering within the data store, but there is no lisp
> ordering that makes sense for hash tables, standard objects, etc.
> (You can create derived indices of an indexed-btree that compute a
> number, string, etc that can serve as an ordering for arbitrary
> objects).
>
> Open issues:
> - Characters as independent class, or as numeric codes?
> - If/how to map over a type subset in an index or btree
>
>
> Anything I missed?
>
> Ian
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
More information about the elephant-devel
mailing list