[elephant-devel] Sorting
Ian Eslick
eslick at csail.mit.edu
Mon Oct 22 19:45:34 UTC 2007
Lisp, by the way, only allows comparisons between characters in terms
of their code. So sorting them as one of numerous numeric values, or
independently by code, is adding to the lisp spec, not violating it.
Ian
On Oct 22, 2007, at 1:18 PM, Ian Eslick wrote:
> There have been a number of discussions about sorting issues and
> guarantees among the various data store implementations. This is
> difficult due to different backend serialized representations.
> This is my understanding of the current guarantees. Let's commit
> to this as part of locking down 0.9.1.
>
> Any ordered data structure, such as btree or indexed-btrees, has to
> make some guarantees about the function that defines the ordering.
> Currently, the BDB data store and Elephant core implements the
> following policy (I believe this is also true for CL-SQL stores as
> well):
>
> General policies:
>
> In a BTree with multiple types, each type is sorted as an
> independent subsequence. Types are guaranteed to be separated, but
> the order of the type sub-sequences is not guaranteed.
>
> In an Index with multiple values, there is no guarantee about the
> ordering of duplicate values, the the index is treated as a sorted
> sequence of unordered equivalence classes. Tests and users should
> not depend on this.
>
> Specific types:
>
> Numeric types (fixnum, bignum, float, double, rational and complex):
> Sorted properly over the continuous real numbers, smallest to biggest
>
> Characters:
> Currently considered and sorted in with all the above numeric types
> (this is probably wrong)
>
> Strings:
> Sorted alphabetically, case-INSENSITIVE
>
> Symbols/Pathnames:
> Sorted as strings above, but symbols and paths are sorted
> independently as type classes.
>
> Persistent objects are sorted according to OID, but the OID
> ordering may change at any time, so long as object equivalency is
> maintained (for example, if we implement GC). Users should not
> depend on the OID for anything other than uniqueness unless they
> are implemented a new low-level persistent-collection data
> structure like (N:1 and N:N) associations that is GC aware.
> Otherwise the order can only be guaranteed within a given transaction.
>
> All other types have no user-visible ordering guarantees. They
> have an arbitrary ordering within the data store, but there is no
> lisp ordering that makes sense for hash tables, standard objects,
> etc. (You can create derived indices of an indexed-btree that
> compute a number, string, etc that can serve as an ordering for
> arbitrary objects).
>
> Open issues:
> - Characters as independent class, or as numeric codes?
> - If/how to map over a type subset in an index or btree
>
>
> Anything I missed?
>
> Ian
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
More information about the elephant-devel
mailing list