[elephant-devel] Lisp Btrees: design considerations

Robert L. Read read at robertlread.net
Thu May 15 03:15:32 UTC 2008


I feel compelled to respectfully disagree with Alex's views stated
below.

The purpose of compression is of course NOT to make the store smaller.
It is to decrease the amount of I/O that is required to access the
store.  I think this is where most of the time is spent.  I suspect Alex
is correct that even if Elephant were coded perfectly (which it is not),
there would be some databases for which it might be CPU-bound.  It is
hard for me to imagine this is true of most databases.

I would like to make three points.  Firstly, irrespective of the
ever-changing ratio of I/O costs within the storage hierarchy, smaller
will always be faster.  Solid state drives and gigantic RAMs don't
change this; they merely move the benefit by allowing one to store more
data inside the cache.  As technology improves, this cache can be
thought of as RAM caching disk, fast RAM caching slow RAM, or
register sets caching fast RAM.  The fact that locality of reference is
a positive good for performance remains an iron-clad law as long as you
actually have a storage hierarchy with different levels of smaller and
faster vs. bigger and slower storage.  All of my professional and
academic experience leads me to believe that nearly any database
application is I/O-bound, and that finding a way to trade a constant
amount of CPU time for a decrease in I/O load is a good idea.

Secondly, the compression we're talking about need not be "optimal"
compression that seeks the maximum possible ratio.  It is instead
something that can be computed efficiently, yet still saves space.

Finally, the examples that you name -- small objects, integers, small
strings, and object pointers -- are precisely the objects for which
compact representations are efficient to compute.

For example, a simple Huffman encoding based on character frequency
will give about a factor of 3 compression on strings.
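
To make that concrete, here is a toy sketch (nothing in it is Elephant
code, and every name is made up) that measures what a character-frequency
Huffman code buys on a given string:

  (defun char-frequencies (string)
    "Return an alist of (character . count) for STRING."
    (let ((table (make-hash-table)))
      (loop for ch across string
            do (incf (gethash ch table 0)))
      (loop for ch being the hash-keys of table using (hash-value n)
            collect (cons ch n))))

  (defun huffman-code-lengths (freqs)
    "Given an alist of (item . weight), return an alist of (item . bits)
     using the standard Huffman construction."
    (let ((nodes (loop for (item . weight) in freqs
                       collect (cons weight (list item))))
          (depths (make-hash-table)))
      (loop while (> (length nodes) 1)
            do (setf nodes (sort nodes #'< :key #'car))
               (let ((a (pop nodes))
                     (b (pop nodes)))
                 ;; every leaf under the two merged nodes gains one bit
                 (dolist (item (append (cdr a) (cdr b)))
                   (incf (gethash item depths 0)))
                 (push (cons (+ (car a) (car b))
                             (append (cdr a) (cdr b)))
                       nodes)))
      (loop for (item) in freqs
            collect (cons item (max 1 (gethash item depths 0))))))

  (defun average-bits-per-char (string)
    "Average Huffman code length, in bits, over the characters of STRING."
    (let* ((freqs (char-frequencies string))
           (bits  (huffman-code-lengths freqs)))
      (float (/ (loop for (ch . count) in freqs
                      sum (* count (cdr (assoc ch bits))))
                (length string)))))

  ;; (average-bits-per-char <some English text>)
  ;; => well under the 8 bits of a one-octet-per-character encoding;
  ;;    the table is cheap to build and trivial to ship with the page.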

If one imagines an integer-valued slot with a Poisson-like distribution
of values, for example one whose average value is less than a thousand
but which still occasionally contains a much larger number, then one can
expect a simple codec that exploits this distribution to use about 10
bits for typical values and 30 bits for the outliers.  The result is
again roughly a 3-fold reduction in I/O.
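
Again, to be concrete, here is the kind of codec I have in mind, as a
pair of toy functions (the names and the 7-bit grouping are arbitrary
choices for the sketch, not anything Elephant does today):

  (defun encode-varint (n)
    "Encode non-negative integer N as a list of octets, 7 bits per
     octet, low bits first; the high bit of each octet is a
     continuation flag."
    (loop collect (let ((byte (ldb (byte 7 0) n)))
                    (setf n (ash n -7))
                    (if (zerop n) byte (logior byte #x80)))
          until (zerop n)))

  (defun decode-varint (octets)
    "Inverse of ENCODE-VARINT."
    (loop with result = 0
          for shift from 0 by 7
          for byte in octets
          do (setf result (dpb (ldb (byte 7 0) byte) (byte 7 shift) result))
          until (zerop (logand byte #x80))
          finally (return result)))

  ;; (encode-varint 900)       => (132 7)       ; two octets for a typical value
  ;; (encode-varint 1000000)   => (192 132 61)  ; three octets for an outlier
  ;; (decode-varint '(132 7))  => 900

The common case costs two octets instead of four or eight, and the rare
outlier still round-trips correctly.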

Object pointers may in theory range over the entire address space, but
the OIDs we actually store in the database often come from a set of
quite knowable size, allowing us to use fewer than 64 or even 32 bits
to represent them.  If we know that a slot holds only OIDs of a
particular class, and that that class has only 2**20 members, then 20
bits, as opposed to 64, suffice to represent the entire space of
possible OIDs.
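
For illustration only (this is not how Elephant lays anything out
today, and the names are invented), packing three such 20-bit OIDs
into a single word is a few lines of LDB/DPB:

  (defconstant +oid-bits+ 20)

  (defun pack-oids (oids)
    "Pack a list of small OIDs into one integer, +OID-BITS+ bits each."
    (loop with packed = 0
          for oid in oids
          for pos from 0 by +oid-bits+
          do (setf packed (dpb oid (byte +oid-bits+ pos) packed))
          finally (return packed)))

  (defun unpack-oids (packed count)
    "Extract COUNT OIDs previously packed by PACK-OIDS."
    (loop repeat count
          for pos from 0 by +oid-bits+
          collect (ldb (byte +oid-bits+ pos) packed)))

  ;; (unpack-oids (pack-oids '(17 4095 1048575)) 3) => (17 4095 1048575)

Three OIDs in 60 bits instead of three 64-bit words is the same
locality argument as before: more of the index fits in each page.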

If one imagines a page-based system that brings in a group of objects in
one page, then getting 3 times as many objects for the same I/O is a
huge overall win.

However, my point here was that I wanted to stay in "pure" Lisp so that
I/we would have the power to do this.  (I understand now that Ian was
not proposing otherwise.)  Whether it will work is somewhat academic,
because compression should certainly be deployed as a "strategy" --
that is, either the system decides, based on statistics, to employ
compression for a given slot, or you declare a slot to be :poisson 1000
or :english-text or something similar to suggest the most efficient
codec; a toy sketch of what I mean follows.
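
Nothing like the following exists in Elephant today -- every name is
invented for the example, and the :poisson case simply reuses the
variable-length codec sketched above -- but the dispatch could be as
simple as this:

  (defparameter *slot-codecs*
    '((item-count . (:poisson 1000))     ; small integers, rare outliers
      (notes      . (:english-text)))    ; character-frequency coding
    "Declared compression strategies, keyed by slot name.")

  (defgeneric encode-value (strategy value &rest args)
    (:documentation "Encode VALUE under the named compression STRATEGY."))

  (defmethod encode-value ((strategy (eql :poisson)) value &rest args)
    (declare (ignore args))
    ;; small integers go through the variable-length codec
    (encode-varint value))

  (defmethod encode-value ((strategy (eql :english-text)) value &rest args)
    (declare (ignore args))
    ;; placeholder: a real codec would Huffman-encode the string
    (map 'list #'char-code value))

  (defun encode-slot (slot-name value)
    "Encode VALUE for SLOT-NAME, or pass it through untouched if the
     slot has no declared strategy."
    (let ((spec (cdr (assoc slot-name *slot-codecs*))))
      (if spec
          (apply #'encode-value (first spec) value (rest spec))
          value)))

  ;; (encode-slot 'item-count 900) => (132 7)
  ;; (encode-slot 'notes "lisp")   => (108 105 115 112)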

As you point out, the need for a type-specific codec -- one that
compresses very well and improves performance by exploiting specialized
knowledge of a particular application-dependent datatype -- may be quite
rare, but I think supporting it is an important ability.

Therefore, since we should undoubtedly just develop a plain B+Tree
first, and any implementation of compression should take a
strategy-based approach, this argument is somewhat academic.

However, in theory someone could write a compressing serializer right
now, independently of implementing a pure-Lisp Btree-based backend.
For example, the Rucksack serializer could be inserted in place of our
existing serializer (or even made a configurable option) if we found it
to be a better codec; the hook could look something like the sketch
below.
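
Very roughly, and with entirely hypothetical names (COMPRESSING-ENCODE
and COMPRESSING-DECODE do not exist anywhere), the hook might be no
more than this:

  (defvar *value-codec*
    (list :encode #'identity :decode #'identity)
    "Pluggable value codec; swap these two functions to change serializers.")

  (defun store-encoded (value)
    "Run VALUE through whatever encoder is currently configured."
    (funcall (getf *value-codec* :encode) value))

  (defun load-encoded (stored)
    "Inverse of STORE-ENCODED under the configured codec."
    (funcall (getf *value-codec* :decode) stored))

  ;; Switching to a compressing serializer then becomes a one-line
  ;; configuration change:
  ;; (setf *value-codec*
  ;;       (list :encode #'compressing-encode :decode #'compressing-decode))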

On Wed, 2008-05-14 at 18:36 +0300, Alex Mizrahi wrote:
> IE> I was quite wrong-headed on the performance point.  The biggest
> IE> time cost in database operations is time spent moving data to/from
> IE> disk, in comparison CPU time is quite small.
> 
> this depends on the size of the database -- relatively small ones can
> be cached in RAM, and with asynchronous writes the disk won't be a
> bottleneck.  also note that RAM gets larger and cheaper each year, so
> today's "relatively small" databases, on the scale of several
> gigabytes, were considered big some years ago.
> 
> also, databases are often optimized to minimize HDD seek times, but we
> now have quite affordable Solid State Drives, which have no seek-time
> overhead at all and are pretty fast, so such optimizations no longer
> make sense.
> 
> also, elephant's workload is likely to be very different from a
> relational database's -- with an RDBMS people often try to do as much
> work as possible in a single optimized query, while Elephant users
> tend to do lots of small queries -- because it's, um, easy.  also,
> Elephant lacks the optimization tricks that an RDBMS can do, so it has
> to rely on crude processing power.
> 
> so i believe that Elephant will be CPU-bound for most uses/users, and
> any complications like compression will make it slower.
> 
> people who will use Elephant for storage and rapid retrieval of some
> terabytes of text will find compression a cool feature, but i
> seriously doubt that there are many people who consider elephant a
> tool for such stuff.
> 
> if you want to optimize something, better make sure that small
> objects, like numbers, object pointers, small strings, etc., can be
> stored/retrieved with as little overhead as possible -- i think most
> values in a database are such.



