[elephant-devel] Lisp Btrees: design considerations
Alex Mizrahi
killerstorm at newmail.ru
Fri May 16 19:50:39 UTC 2008
IE> I doubt that using a Lisp with a modern compiler affects CPU time in
IE> any interesting way;
IE> I've observed that algorithm implementation with reasonable data
IE> structure choices is within 10's of % points of the same algorithm in
IE> C.
it's possible to optimize a single function, but typical (idiomatic) Lisp
programs:
* generate megatons of garbage
* use dynamic type dispatch (lacking type declarations)
and so often have significantly inferior performance.
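to illustrate the dynamic-dispatch point, here's a minimal sketch (my own
example, not Elephant code) of what type declarations buy you under SBCL:

;; without declarations, + must dispatch on the argument types at run
;; time and may box intermediate results:
(defun sum-untyped (v)
  (reduce #'+ v))

;; with declarations the compiler can open-code fixnum arithmetic
;; (assuming the running sum itself stays a fixnum):
(defun sum-typed (v)
  (declare (type (simple-array fixnum (*)) v)
           (optimize (speed 3)))
  (let ((s 0))
    (declare (type fixnum s))
    (dotimes (i (length v) s)
      (incf s (aref v i)))))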
IE> I don't care how fast your CPU is, it's the data access patterns that
IE> dominate time unless you're working on small or highly local datasets,
IE> then the inner loop speed _might_ matter.
i'm assuming you're speaking about memory bandwidth, because i think
disk I/O will matter only on relatively large databases.
memory bandwidth is also consumed in large quantities by garbage
generation.
let's take a concrete piece of software: Elephant's serializer. i'm assuming
it's an important piece of code and has been optimized a lot?
i'm taking the test case from code Robert once posted, slightly modified:
(defparameter *teststring* "supercalifragiliciousexpialidocious")

(defun dotest (n s)
  (dotimes (x n)
    (in-out-value s)))
ELE-TESTS> (time (dotest (truncate (* 1000 1000) (length *teststring*))
                         *teststring*))
Evaluation took:
0.249 seconds of real time
0.248016 seconds of user run time
0.0 seconds of system run time
0 calls to %EVAL
0 page faults and
6,398,328 bytes consed.
;; testing on core2 2.1 GHz, SBCL 1.0.13
not that great, is it?
it processed roughly 10 megabytes of raw string data per second (5 MB
serialized, 5 MB deserialized) -- that's 10 times slower than the linear
read speed of modern HDDs.
and for 1 MB of data it produced 2 MB of garbage.
or is there something wrong with my test?
??>> that abusing function is form-slot-key (that is used in CL-SQL too):
IE> At least in Allegro, the profiler only counts thread time and that
IE> much of the delay shown by 'time' vs. the profiler is taken up waiting
IE> on cache-miss IO. It may be that thread time is only a handful of
IE> percentage points of the total time?
i'm pretty sure there was very little I/O in my tests.
??>> and even more surprising was to find out that there is no portable
??>> and fast way to convert an integer to a string.
??>> so i had to use SBCL's internal function called
??>> sb-impl::quick-integer-to-string -- apparently SBCL developers knew
??>> about this problem, so they made this function for themselves.
IE> For fun, I did a test with princ and for integers only it's 2x faster,
IE> for the above idiom it's only about 50% faster, but some of that is
IE> the with-string-stream idea. If you do it manually as they did and
IE> write it to an array one character at a time it can get pretty fast,
IE> but probably only 3x-4x.
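for reference, the "write it to an array one character at a time" approach
can be sketched roughly like this (my own illustration, not SBCL's actual
sb-impl::quick-integer-to-string; non-negative fixnums only, for brevity):

(defun int-to-string (n)
  (declare (type (and fixnum unsigned-byte) n)
           (optimize (speed 3)))
  (if (zerop n)
      "0"
      (let* (;; count the decimal digits to size the result string
             (len (loop for m = n then (truncate m 10)
                        while (> m 0) count t))
             (s (make-string len)))
        ;; fill the string from the last digit backwards
        (loop for i from (1- len) downto 0
              do (multiple-value-bind (q r) (truncate n 10)
                   (setf (char s i) (code-char (+ r (char-code #\0))))
                   (setf n q)))
        s)))

no FORMAT, no string stream, and the only allocation is the result string
itself.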
maybe they fixed something in SBCL? i'm using version 1.0.13 and get very
different results with the functions that are actually in Elephant
(db-postmodern's function is optimized, db-clsql's is not):
CL-USER> (defun dotest-pm (n o)
           (loop for i from 0 below n
                 summing (length (db-postmodern::form-slot-key i o))))
DOTEST-PM
CL-USER> (defun dotest-clsql (n o)
           (loop for i from 0 below n
                 summing (length (db-clsql::form-slot-key i o))))
DOTEST-CLSQL
CL-USER> (time (dotest-pm 100000 'o))
Evaluation took:
0.05 seconds of real time
0.052003 seconds of user run time
0.0 seconds of system run time
[Run times include 0.024 seconds GC run time.]
0 calls to %EVAL
0 page faults and
5,592,976 bytes consed.
688890
CL-USER> (time (dotest-clsql 100000 'o))
Evaluation took:
1.502 seconds of real time
1.416088 seconds of user run time
0.084005 seconds of system run time
[Run times include 0.7 seconds GC run time.]
0 calls to %EVAL
0 page faults and
404,779,792 bytes consed.
688890
the optimized version seems to be about 30 times faster (and conses roughly
70 times less).
IE> I agree with this in general. However I feel like this whole
IE> discussion is premature optimization being done while blindfolded.
well, you guys started it by discussing compression and other aspects :)
IE> Prevalence is much, much faster because you don't have to flush data
IE> structures on each commit, so cl-prevalence performance with
IE> Elephant's data and transaction abstractions would be a really nice
IE> design point.
yep, that would be very interesting
IE> I wonder if we would get some of this benefit from a Rucksack
IE> adaptation?
perhaps there is a way to hack it one way or another.
one (high-level) way is to serialize dirty objects into a different
location (a log) in some customized format that would allow us to bring
them back into the normal data store later.
another (low-level) way is to hijack the heap (as far as i understood
from a brief look, rucksack does all reads/writes via a heap object) to
log changes instead of writing them to disk.
also it might be good to use mmapped files -- the OS can decide when to
write changes back by itself, while we write a log for safety.
but mmapped files are definitely not pure Lisp :(
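for the high-level variant, the core of the idea is just an append-only log
that is flushed before a commit is acknowledged; a minimal sketch (SERIALIZE
here is a hypothetical placeholder for whatever serializer the store uses,
not an actual Elephant or Rucksack function):

(defun log-dirty-objects (objects log-pathname)
  "Append the serialized form of OBJECTS to the log file and force it
to disk, so the changes survive a crash before the next full flush."
  (with-open-file (out log-pathname
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :append
                       :if-does-not-exist :create)
    (dolist (obj objects)
      (serialize obj out))       ; hypothetical serializer hook
    (finish-output out)))        ; flush before acknowledging commit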