[elephant-devel] Lisp Btrees: design considerations

Alex Mizrahi killerstorm at newmail.ru
Thu May 15 07:39:14 UTC 2008


 RLR> The purpose of compression is of course NOT to make the store smaller.
 RLR> It is to decrease the amount of I/O that is required to access the
 RLR> store.

the time spent on I/O does not depend linearly on the amount of data 
transferred.

for example, an HDD reads data in blocks, each of them no smaller than 512 
bytes (and most likely larger, since the drive does read-ahead).
if you compress a 500-byte value into 50 bytes -- a 10x reduction -- it 
won't make the I/O any faster: you still pay for the whole block.
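
to make that arithmetic explicit, a minimal sketch (the 512-byte block size 
and the function name are my own, purely for illustration):

  ;; how many device blocks a read of BYTES actually touches
  (defun blocks-needed (bytes &optional (block-size 512))
    (ceiling bytes block-size))

  ;; (blocks-needed 500) => 1
  ;; (blocks-needed 50)  => 1   ; 10x smaller payload, same single block read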

moreover, random reads on an HDD are orders (!) of magnitude slower than 
sequential reads -- the time to reposition the head dwarfs the time spent 
reading consecutive blocks.

i can show this in numbers -- just checked with iometer, doing random reads:
512 byte blocks: 106 operations per second (~10 ms seek time), 0.05 MiB/sec
64 KiB   blocks:  95 operations per second (~10 ms seek time), 5.95 MiB/sec
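
(the MiB/sec figures are just ops/sec times block size; a quick sanity 
check, with names of my own choosing:)

  (defun throughput-mib/s (ops-per-sec block-bytes)
    (/ (* ops-per-sec block-bytes) (* 1024.0 1024.0)))

  ;; (throughput-mib/s 106 512)   => ~0.05
  ;; (throughput-mib/s 95 65536)  => ~5.94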

you see, we move roughly 120 times (!) more data, but there is very little 
difference in the number of operations per second.
so compression will only matter to people who store at least a hundred KiB 
per value.

or it will matter if people do lots of sequential reads -- but it's hard 
for me to imagine such database access patterns.

 RLR> there would be some databases for which it might be CPU-bound.  It is
 RLR> hard for me to imagine this is true of most databases.

then it's worth defining what "most databases" means in the case of 
elephant -- optimization without a concrete target makes no sense at all.

IMHO most people will be working with databases smaller than a few 
gigabytes -- a database that size fits in RAM, so read I/O is a non-issue 
for them (DDR reads run at gigabytes per second); write I/O might matter, 
though.

 RLR> If one imagines an integer-valued slot with a Poisson distribution of
 RLR> values, for example having an average value of less than a thousand
 RLR> but still occasionally containing a much large number, then one can
 RLR> expect a simple codec that employs this fact to use about 10 bits for
 RLR> each value, and 30 bits for the outliers.  The result is again a
 RLR> 3-fold reduction in I/O.

bits, huh? what storage nowadays works at the bit level?
even cache lines are 32 bytes wide, so it makes no difference whether you 
read 1 byte or 32 of them.
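
for what it's worth, a codec like the one described would in practice be 
byte-aligned anyway; here is a minimal LEB128-style sketch (my own names, 
not elephant's API), where a value around a thousand takes 2 bytes and a 
million-range outlier takes 3, versus 8 for a fixed 64-bit slot:

  ;; encode a non-negative integer as bytes, 7 payload bits per byte,
  ;; high bit set when more bytes follow
  (defun encode-varint (n)
    (loop collect (let ((chunk (ldb (byte 7 0) n)))
                    (setf n (ash n -7))
                    (if (zerop n) chunk (logior chunk #x80)))
          until (zerop n)))

  ;; decode the byte list back into an integer
  (defun decode-varint (bytes)
    (loop for b in bytes
          for shift from 0 by 7
          sum (ash (ldb (byte 7 0) b) shift)))

  ;; (encode-varint 500)      => (244 3)        ; 2 bytes
  ;; (encode-varint 1000000)  => (192 132 61)   ; 3 bytes
  ;; (decode-varint '(244 3)) => 500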

 RLR> Object pointers may of course cross the entire space in theory, but
 RLR> OIDs that we store in the database may indeed come from a set of quite
 RLR> knowable size, allowing us to use less than 64 bits or 32 bits to
 RLR> represent them.  If we know the type of a slot to be only OIDs of a
 RLR> particular class, we may very well know that that class has only 2**20
 RLR> members, and therefore demands only 20 bits, as opposed to 64, to
 RLR> represent the entire space of possible OIDs.

if you expect the database to be large enough for I/O to matter, why do you 
then expect it to contain that few objects?
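
and even granting the premise, byte-aligned storage turns the saving into 
bytes per OID, not bits; a back-of-the-envelope (purely illustrative names):

  ;; bytes needed to represent an OID space of a given size
  (defun oid-bytes (max-oids)
    (ceiling (integer-length (1- max-oids)) 8))

  ;; (oid-bytes (expt 2 20)) => 3   ; 2**20 objects: 3 bytes instead of 8
  ;; (oid-bytes (expt 2 32)) => 4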



