[elephant-devel] Lisp Btrees: design considerations
Alex Mizrahi
killerstorm at newmail.ru
Thu May 15 07:39:14 UTC 2008
RLR> The purpose of compression is of course NOT to make the store smaller.
RLR> It is to decrease the amount of I/O that is required to access the
RLR> store.
time to do I/O does not depend linearly on the amount of I/O.
for example, an HDD reads data in blocks, each block no smaller than 512
bytes (and most likely larger, since the drive does read-ahead).
if you compress your data 10x, down to 50 bytes, it won't make the I/O any
faster.
moreover, random reads on an HDD are orders (!) of magnitude slower than
sequential reads -- the time to reposition the head is larger than the time
to read the consecutive blocks.
i can show this in numbers -- i just checked with iometer, doing random reads:
  512-byte blocks: 106 operations per second (~10 ms seek time), 0.05 MiB/sec
  64 KiB blocks:   95 operations per second (~10 ms seek time), 5.95 MiB/sec
you see, 120 times (!) more I/O, but almost no difference in the number of
operations per second.
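(a quick back-of-envelope check in the REPL -- ops/sec times block size
reproduces those throughput figures, since the ~10 ms seek dominates either
way:

(/ (* 106 512) (* 1024 1024.0))    ; => ~0.05 MiB/sec for 512-byte blocks
(/ (* 95 65536) (* 1024 1024.0))   ; => ~5.9 MiB/sec for 64 KiB blocks
)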
so compression will only matter to people who store at least a hundred KiB
in their values.
or it will matter if people do lots of sequential reads -- but it's hard for
me to imagine such database access patterns.
RLR> there would be some databases for which it might be CPU-bound. It is
RLR> hard for me to imagine this is true of most databases.
then it's worth defining what "most databases" means in the case of
elephant -- optimization without a concrete target makes no sense at all.
IMHO most people will be working with databases smaller than a few
gigabytes --
so read I/O is a non-issue for them (the working set sits in RAM, and DDR
reads are on the scale of gigabytes per second);
write I/O might matter, though.
RLR> If one imagines an integer-valued slot with a Poisson distribution of
RLR> values, for example having an average value of less than a thousand
RLR> but still occasionally containing a much larger number, then one can
RLR> expect a simple codec that employs this fact to use about 10 bits for
RLR> each value, and 30 bits for the outliers. The result is again a
RLR> 3-fold reduction in I/O.
bits, huh? what storage nowadays works at the bit level?
even cache lines are 32 bytes wide, so it makes no difference whether you
read 1 byte or 32 of them.
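if one really wants variable-length encoding, the practical unit is the
octet anyway.  purely as an illustration (not a claim about what elephant's
serializer does), a 7-bits-per-octet "varint" looks like this:

(defun encode-varint (n)
  "encode non-negative integer N as a list of octets:
7 payload bits per octet plus a continuation bit."
  (loop
     collect (let ((octet (ldb (byte 7 0) n)))
               (setf n (ash n -7))
               (if (zerop n) octet (logior octet #x80)))
     until (zerop n)))

(defun decode-varint (octets)
  "inverse of ENCODE-VARINT."
  (loop with result = 0
        for octet in octets
        for shift from 0 by 7
        do (setf result (logior result (ash (ldb (byte 7 0) octet) shift)))
        until (zerop (logand octet #x80))
        finally (return result)))

(encode-varint 1000) => (232 7), i.e. two octets instead of four or eight --
the savings come in whole bytes, never in bits.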
RLR> Object pointers may of course cross the entire space in theory, but
RLR> OIDs that we store in the database may indeed come from a set of quite
RLR> knowable size, allowing us to use less than 64 bits or 32 bits to
RLR> represent them. If we know the type of a slot to be only OIDs of a
RLR> particular class, we may very well know that that class has only 2**20
RLR> members, and therefore demands only 20 bits, as opposed to 64, to
RLR> represent the entire space of possible OIDs.
if you expect the database to be large enough for I/O to matter, why do you
then think there would be so few objects?
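and even then, byte alignment caps the win -- a 2**20 OID space rounds up to
three octets, e.g.:

(ceiling (integer-length (1- (expt 2 20))) 8)  ; => 3 octets, vs 8 for a raw 64-bit word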