[elephant-devel] Lisp Btrees: design considerations
Ian Eslick
eslick at media.mit.edu
Fri May 16 15:26:56 UTC 2008
I just wanted to clarify a few points in this interesting discussion.
> but if we are using a high-level language like Lisp, it's much more
> likely that CPU overhead will be large compared to memory latency.
> and as compression, even the simplest/fastest kind, will only add to
> CPU overhead, i'm very skeptical about its benefits.
I doubt that using a Lisp with a modern compiler affects CPU time in
any interesting way; I've observed that an algorithm implemented with
reasonable data structure choices runs within tens of percent of the
same algorithm in C. I don't care how fast your CPU is; it's the data
access patterns that dominate time unless you're working on small or
highly local datasets, in which case the inner loop speed _might_
matter.
> by the way, interesting fact: some time ago i found that even
> with the cache enabled in elephant/postmodern,
> slot reads are relatively slow -- on the scale of 10 thousand per
> second or so. it was quite surprising
> that the offending function is form-slot-key (which is used in
> CL-SQL too):
At least in Allegro, the profiler only counts thread time, and much of
the delay shown by 'time' vs. the profiler is taken up waiting on
cache-miss IO. It may be that thread time is only a handful of
percentage points of the total time?
> (defun form-slot-key (oid name)
>   (format nil "~A ~A" oid name))
Format is notoriously slow. Still, I find it hard to believe that it
matters that much. (See above)
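For reference, here is a minimal sketch of the kind of alternative being
discussed (the name form-slot-key-fast is hypothetical, not part of
Elephant's actual API): building the key with concatenate and
princ-to-string skips format's control-string parsing on every call,
though it still conses a fresh string per argument.

```lisp
;; Hypothetical faster variant of the FORMAT-based key builder quoted
;; above.  PRINC-TO-STRING prints each argument directly, avoiding
;; FORMAT's per-call interpretation of the "~A ~A" control string.
(defun form-slot-key-fast (oid name)
  (concatenate 'string
               (princ-to-string oid)
               " "
               (princ-to-string name)))
```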
> and even more surprising was to find out that there is no portable
> and fast way to convert an integer to a string.
> so i had to use SBCL's internal function called
> sb-impl::quick-integer-to-string -- apparently SBCL developers knew
> about this problem, so they made this function for themselves.
For fun, I did a test with princ: for integers only it's 2x faster,
and for the above idiom it's only about 50% faster, but some of that
is the with-string-stream idea. If you do it manually as they did and
write it to an array one character at a time, it can get pretty fast,
but probably only 3x-4x.
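A minimal sketch of that manual approach (not SBCL's actual internal,
and handling non-negative integers only): size a string from the digit
count, then fill it one character at a time from the low-order digit up.

```lisp
;; Illustrative digit-at-a-time integer printer, as described above.
;; Assumes N is a non-negative integer.
(defun integer-to-string (n)
  (if (zerop n)
      "0"
      (let* ((len (loop for x = n then (floor x 10)
                        while (plusp x)
                        count t))          ; number of decimal digits
             (s (make-string len)))
        ;; Fill from the rightmost position with N's low-order digit.
        (loop for i from (1- len) downto 0
              do (multiple-value-bind (q r) (floor n 10)
                   (setf (char s i)
                         (code-char (+ r (char-code #\0))))
                   (setf n q)))
        s)))
```

This avoids both the generic printer and any intermediate stream, which
is essentially why the hand-rolled version wins.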
> so i think that subtle effects like I/O bandwidth will only be seen
> if we heavily optimize elephant and the applications themselves.
> but as it is now, the storage backend won't have a big influence on
> overall performance and just needs to be moderately good.
I agree with this in general. However, I feel like this whole
discussion is premature optimization done while blindfolded. I do
think that over time it would be nice to give people tools like
compression to enable/disable for their particular applications, but
it's probably not worth all the debate until we solve some of the
bigger problems in our copious free time.
By the way, if you are caching your data in memory and using the most
conservative transaction model of BDB (blocking commits), you are
write-bandwidth limited. At the end of each little transaction, you
have to write a log entry to disk, then flush a 16kb page to disk,
then wait for the controller to say OK, then write the log again, and
then return to your regularly scheduled programming. In prevalence,
you only write the log and occasionally dump to disk. I suspect that
the SQL backends function similarly: when you finish a transaction,
you have to wait for all of this to happen over a socket...
Prevalence is much, much faster because you don't have to flush data
structures on each commit, so cl-prevalence performance with
Elephant's data and transaction abstractions would be a really nice
design point. I wonder if we would get some of this benefit from a
Rucksack adaptation?
Ian