[elephant-devel] Lisp Btrees: design considerations

Ian Eslick eslick at media.mit.edu
Fri May 16 15:26:56 UTC 2008



I just wanted to clarify a few points in this interesting discussion.

> but if we are using a high-level language like Lisp, it's much more
> likely that CPU overhead will be large compared to memory latency.
> and since compression, even the simplest/fastest kind, will only add
> to CPU overhead, i'm very skeptical about its benefits.

I doubt that using a Lisp with a modern compiler affects CPU time in
any interesting way; I've observed that an algorithm implemented with
reasonable data structure choices runs within tens of percent of the
same algorithm in C.  I don't care how fast your CPU is: data access
patterns dominate run time unless you're working on small or highly
local datasets, and only then _might_ inner-loop speed matter.

> by the way, interesting fact: some time ago i found that even
> with the cache enabled in elephant/postmodern,
> slot reads are relatively slow -- on the scale of 10 thousand per
> second or so. it was quite surprising
> that the offending function is form-slot-key (which is used in CL-SQL too):

At least in Allegro, the profiler only counts thread time, and much of
the delay shown by 'time' vs. the profiler is spent waiting on
cache-miss IO.  It may be that thread time is only a handful of
percentage points of the total.

> (defun form-slot-key (oid name)
>   (format nil "~A ~A" oid name))

Format is notoriously slow.  Still, I find it hard to believe that it  
matters that much.  (See above)
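If format really were the bottleneck, one could sidestep its directive
interpretation entirely.  A minimal sketch of that idea (my own names,
not Elephant's actual code, and it assumes oid is an integer and name
a symbol or string):

```
;; Hypothetical sketch: build the slot key without FORMAT's
;; directive-parsing overhead, using PRINC-TO-STRING + CONCATENATE.
(defun form-slot-key-fast (oid name)
  (concatenate 'string
               (princ-to-string oid)
               " "
               (string name)))
```

The win, if any, comes from skipping the runtime parse of the "~A ~A"
control string, not from anything clever about the concatenation.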

> and even more surprising was to find out that there is no portable
> and fast way to convert an integer to a string.
> so i had to use SBCL's internal function called
> sb-impl::quick-integer-to-string -- apparently the SBCL
> developers knew about this problem, so they made this function for
> themselves.

For fun, I did a test with princ: for integers alone it's 2x faster,
and for the above idiom it's only about 50% faster, though some of
that is the string-stream overhead.   If you do it manually as they
did and write the digits into an array one character at a time it can
get pretty fast, but probably only 3x-4x.
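For concreteness, the "write it one character at a time" approach
looks roughly like this.  This is my own sketch, not SBCL's
implementation, and it assumes a non-negative integer of at most 20
decimal digits (enough for unsigned 64-bit values):

```
;; Hand-rolled integer printer: peel off digits with TRUNCATE and
;; fill a character buffer from the right, then take the used suffix.
;; Assumes N is a non-negative integer of at most 20 digits.
(defun int-to-string (n)
  (declare (type (integer 0) n))
  (if (zerop n)
      "0"
      (let ((buf (make-array 20 :element-type 'character))
            (pos 20))
        (loop while (plusp n)
              do (multiple-value-bind (q r) (truncate n 10)
                   (setf n q)
                   (decf pos)
                   (setf (char buf pos) (digit-char r))))
        (subseq buf pos))))
```

All the work is a handful of fixnum divisions and array writes, which
is why this style can beat the generic printer by a few x.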

> so i think that subtle effects like I/O bandwidth will only be seen
> if we hardcore-optimize elephant and the applications themselves.
> but as it is now, the storage backend won't have a big influence on
> overall performance and just needs to be moderately good

I agree with this in general.  However, I feel like this whole
discussion is premature optimization done while blindfolded.  I
do think that over time it would be nice to give people tools like  
compression to enable/disable for their particular applications, but  
it's probably not worth all the debate until we solve some of the  
bigger problems in our copious free time.

By the way, if you are caching your data in memory and using the most
conservative transaction model of BDB (blocking commits), you are
write-bandwidth limited.  At the end of each little transaction, you
have to write a log entry to disk, then flush a 16kb page to disk,
then wait for the controller to say OK, then write the log again, and
only then return to your regularly scheduled programming.  In a
prevalence model, you only write the log and occasionally dump to
disk.  I suspect that the SQL backends function similarly: when you
finish a transaction you have to wait for all of this to happen over
a socket...

Prevalence is much, much faster because you don't have to flush data  
structures on each commit, so cl-prevalence performance with  
Elephant's data and transaction abstractions would be a really nice  
design point.  I wonder if we would get some of this benefit from a  
Rucksack adaptation?

Ian
