[elephant-devel] Fwd: some patches

Henrik Hjelte henrik at evahjelte.com
Wed Aug 1 08:23:24 UTC 2007


On 7/31/07, Ian Eslick <eslick at csail.mit.edu> wrote:
> The practical problem that led to the current design of index sorting
> is that we cannot use lisp code to define the sorting function for
> serialized values inside BDB Btrees (same problem I imagine that
> Henrik had with postmodern).

Exactly, the postmodern backend relies on how postgresql sorts things.
And when I changed strings to be saved as blobs instead of strings
in order to overcome a limit on the length of strings, I got a problem
with map-index which relies on the ordering of values. I have reverted
the change now, which means that keys as strings are truncated
if they are longer than 2000 characters.

 Instead, there is a hairy custom C
> procedure that is registered with BDB that parses the serialized
> format so that sorting is done first by type (symbol, string, object,
> pobject, etc) followed by ordering within numeric types, strings and
> symbols.  Everything else is ordered based on the byte ordering of
> its serialized representation.
In postmodern, integers and strings are sorted according to how
postgresql sorts things. Objects are sorted by order of creation, older
first, and equal serialized representation gets the same number.

> To map across indices correctly, we need to know up front whether the
> start value is less than the end value.

Yes now, but in the the common special case where you want to map over
the same value you shouldn't actually need a sorting. You could just
let the backend position the cursor on the value where the key is the
value you want, and move
the cursor over all duplicates. For this case, you need not in theory have an
ordering of the values, only the ability to compare equality.

But this is no longer a big problem, since I reverted how postmodern works.
However if you use objects as keys in btree, relying on the ordering of their
serialized representation might not get what you expect when you query
for identical objects. This problem is shared with the BDB backend.

> Ideally postmodern would have a similar sorting function that
> properly interprets the serialized format just like the BDB function
> does.
It is at least quite similar.

> I think it's best to have a single standard ordering that is as close
> to lisp's notion of ordering as possible so we don't have to maintain
> different orderings.
I agree.

/Henrik



More information about the elephant-devel mailing list