[elephant-devel] Sorting
Ian Eslick
eslick at csail.mit.edu
Mon Oct 22 17:18:36 UTC 2007
There have been a number of discussions about sorting issues and
guarantees among the various data store implementations. This is
difficult due to different backend serialized representations. This
is my understanding of the current guarantees. Let's commit to this
as part of locking down 0.9.1.
Any ordered data structure, such as btree or indexed-btrees, has to
make some guarantees about the function that defines the ordering.
Currently, the BDB data store and Elephant core implements the
following policy (I believe this is also true for CL-SQL stores as
well):
General policies:
In a BTree with multiple types, each type is sorted as an independent
subsequence. Types are guaranteed to be separated, but the order of
the type sub-sequences is not guaranteed.
In an Index with multiple values, there is no guarantee about the
ordering of duplicate values, the the index is treated as a sorted
sequence of unordered equivalence classes. Tests and users should
not depend on this.
Specific types:
Numeric types (fixnum, bignum, float, double, rational and complex):
Sorted properly over the continuous real numbers, smallest to biggest
Characters:
Currently considered and sorted in with all the above numeric types
(this is probably wrong)
Strings:
Sorted alphabetically, case-INSENSITIVE
Symbols/Pathnames:
Sorted as strings above, but symbols and paths are sorted
independently as type classes.
Persistent objects are sorted according to OID, but the OID ordering
may change at any time, so long as object equivalency is maintained
(for example, if we implement GC). Users should not depend on the
OID for anything other than uniqueness unless they are implemented a
new low-level persistent-collection data structure like (N:1 and N:N)
associations that is GC aware. Otherwise the order can only be
guaranteed within a given transaction.
All other types have no user-visible ordering guarantees. They have
an arbitrary ordering within the data store, but there is no lisp
ordering that makes sense for hash tables, standard objects, etc.
(You can create derived indices of an indexed-btree that compute a
number, string, etc that can serve as an ordering for arbitrary
objects).
Open issues:
- Characters as independent class, or as numeric codes?
- If/how to map over a type subset in an index or btree
Anything I missed?
Ian
More information about the elephant-devel
mailing list