[elephant-devel] Lisp Btrees: design considerations

Wed May 14 15:36:16 UTC 2008

 IE> I was quite wrong-headed on the performance point.  The biggest time
 IE> cost in database operations is time spent moving data to/from disk, in
 IE> comparison CPU time is quite small.

this depends on the size of the database -- relatively small ones can be
cached in RAM, and with asynchronous writes the disk won't be a bottleneck.
also note that RAM gets larger and cheaper each year, so what counts as
"relatively small" today, on the scale of several gigabytes, was considered
big some years ago.

also, databases are often optimized to minimize HDD seek times. but we now
have quite affordable Solid State Drives, which have no mechanical seek
overhead at all and are pretty fast, so such optimizations do not make much
sense anymore.

also, Elephant's workload is likely to be very different from that of a
relational database -- with an RDBMS people often try to do as much work as
possible with a single optimized query, while Elephant users tend to do lots
of small queries -- because it's, um, easy.  also Elephant lacks the
optimization tricks that an RDBMS can do, so it has to rely on raw
processing power.

so i believe that Elephant will be CPU-bound for most uses and users, and
any complication like compression will make it slower.
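
a rough way to check this claim on your own machine is sketched below. it
is only an illustration under assumptions: Python's pickle stands in for
Elephant's serializer, zlib stands in for whatever compression would be used,
and time_roundtrips is a hypothetical helper, not anything from Elephant. it
times serialize/deserialize round trips of a few small values with and
without a compression step, so the numbers say nothing about disk I/O.

```python
import pickle
import time
import zlib


def time_roundtrips(values, compress, repeats=1000):
    """Return seconds spent on `repeats` serialize/deserialize passes over `values`."""
    start = time.perf_counter()
    for _ in range(repeats):
        for v in values:
            blob = pickle.dumps(v)            # stand-in for the store's serializer
            if compress:
                blob = zlib.compress(blob)    # extra CPU work on the write path
                blob = zlib.decompress(blob)  # extra CPU work on the read path
            assert pickle.loads(blob) == v    # sanity check: value survives the trip
    return time.perf_counter() - start


# small values of the kind the post argues dominate a typical database
small_values = [42, 3.14, "a short string", ("oid", 12345)]

plain = time_roundtrips(small_values, compress=False)
packed = time_roundtrips(small_values, compress=True)
print(f"plain: {plain:.4f}s  with zlib: {packed:.4f}s")
```

the absolute times depend on the machine, so only the ratio between the two
figures is of interest.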

people who use Elephant for storage and rapid retrieval of terabytes of
text will find compression a cool feature, but i seriously doubt that many
people consider Elephant a tool for that kind of thing.

if you want to optimize something, better make sure that small objects, like
numbers, object pointers, small strings etc., can be stored/retrieved with as
little overhead as possible -- i think most values in a database are like
that.
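
to make "as little overhead as possible" concrete, here is a minimal sketch
of a tagged encoding for small values. this is not Elephant's actual
serializer format: the tag values, the encode/decode names and the size
limits are all made up for illustration. the idea is just that a small
integer or short string costs one tag byte plus a tiny payload.

```python
import struct

# hypothetical one-byte type tags, invented for this sketch
TAG_SMALL_INT = 0x01   # integer in 0..255, stored in one byte
TAG_INT64 = 0x02       # any other integer that fits in a signed 64-bit word
TAG_SHORT_STR = 0x03   # string whose UTF-8 form is under 256 bytes


def encode(value):
    """Encode a small value as a tag byte followed by a compact payload."""
    if isinstance(value, int) and not isinstance(value, bool):
        if 0 <= value < 256:
            return bytes([TAG_SMALL_INT, value])
        # struct.error is raised if the integer does not fit in 64 bits
        return bytes([TAG_INT64]) + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) < 256:
            return bytes([TAG_SHORT_STR, len(data)]) + data
    raise ValueError("value not covered by this sketch")


def decode(blob):
    """Inverse of encode()."""
    tag = blob[0]
    if tag == TAG_SMALL_INT:
        return blob[1]
    if tag == TAG_INT64:
        return struct.unpack(">q", blob[1:9])[0]
    if tag == TAG_SHORT_STR:
        length = blob[1]
        return blob[2:2 + length].decode("utf-8")
    raise ValueError("unknown tag")


print(len(encode(42)))       # 2 bytes: tag + value
print(len(encode(100000)))   # 9 bytes: tag + 64-bit payload
print(len(encode("abc")))    # 5 bytes: tag + length + 3 bytes of text
```

with a layout like this, reading a small value back is a tag check and a
slice, with no decompression step in the way.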