[elephant-devel] Lisp Btrees: design considerations

Ian Eslick eslick at media.mit.edu
Fri May 16 23:02:39 UTC 2008


I should also say that the serializer has to be thread safe, so there
is some non-trivial time overhead from locking on each serializer
call.  I wish I could do primitive atomic ops, or at least something
like a futex, in Common Lisp; then I could create deadlock-free data
structures and dispense with this overhead...
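
To make the cost concrete, here is a minimal sketch of the kind of
guard I mean, assuming the bordeaux-threads library is loaded;
SERIALIZE-VALUE is a hypothetical stand-in for the real serializer
entry point:

(defvar *serializer-lock* (bt:make-lock "serializer")
  "One global lock guarding shared serializer state.")

(defun serialize-value-safe (value buffer)
  ;; every call pays a lock acquire/release, even in the common
  ;; uncontended case
  (bt:with-lock-held (*serializer-lock*)
    (serialize-value value buffer)))  ; SERIALIZE-VALUE is hypothetical

With a compare-and-swap primitive (e.g. sb-ext:compare-and-swap on
newer SBCLs), the uncontended fast path could be a single atomic
instruction instead.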

Ian

On May 16, 2008, at 6:54 PM, Ian Eslick wrote:

> I don't disagree with you that it's easy to create expensive/slow
> programs.  My only point was that when it matters, it doesn't have
> to be the deciding factor.
>
> The big overheads in Elephant are due to Lisp's data model and the
> use of the MOP.  The serializer is currently designed for safety and
> generality across Lisps, with reasonable but not completely optimal
> performance.  Since the whole point is dealing with the
> implementation of the Lisp data model, doing well would require
> specializing on each Lisp's internal data representation so I can
> just do byte copies rather than going through the general
> indirection APIs.  The code is there; perhaps I should selectively
> re-enable the optimizations for SBCL...
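>
> As a rough illustration -- a sketch, not the serializer's actual
> code -- the specialization might look like this: a portable path
> through the generic accessors, and an SBCL path that leans on
> SIMPLE-BASE-STRING being one octet per character:
>
> (defun write-string-bytes (string buffer start)
>   ;; copy STRING's character codes into BUFFER at START and return
>   ;; the next free index
>   #+sbcl
>   ;; SBCL path: with types declared, this compiles to tight byte
>   ;; moves instead of generic accessor indirection (assumes the
>   ;; string holds 8-bit characters only)
>   (locally (declare (optimize (speed 3) (safety 0)))
>     (let ((s (coerce string 'simple-base-string)))
>       (declare (type simple-base-string s)
>                (type (simple-array (unsigned-byte 8) (*)) buffer)
>                (type fixnum start))
>       (dotimes (i (length s))
>         (setf (aref buffer (+ start i)) (char-code (schar s i))))
>       (+ start (length s))))
>   #-sbcl
>   ;; portable fallback: same result through the general string API
>   (loop for i from 0 below (length string)
>         do (setf (aref buffer (+ start i))
>                  (char-code (char string i)))
>         finally (return (+ start (length string)))))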
>
> Perhaps these points underscore your argument about the overhead of
> Lisp.  It is more work to write fast programs in Lisp because
> performance is at odds with the common idioms.  Other languages,
> like C, make it easy to write efficient programs, but much harder to
> write programs quickly and to maintain large programs.  There is no
> free lunch, as my physicist friends are fond of saying.
>
>
> I was curious where the time was going in the example.  The
> deserializer creates fresh strings of type 'character.  Aren't
> characters 4 bytes in Unicode SBCL?  That would mean you're handling
> 8M bytes, and that for 4M bytes of data it produced 2M bytes of
> garbage (probably all consing in the test framework).  Allegro uses
> UTF-16, so the same test is only 2M bytes of total string input.
> I'm not sure where the 25% overhead came from, though.
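>
> A quick back-of-the-envelope check of those numbers, assuming 4-byte
> characters on Unicode SBCL and 2-byte characters on Allegro's UTF-16:
>
> (* 1000000 4 2)  ; => 8000000, bytes touched in SBCL's round trip
> (* 1000000 2)    ; => 2000000, bytes of string input on Allegro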
>
> The time cost is still surprising though.
>
> ELE-USER> (time (dotest (truncate (* 1000 1000) (length *teststring*))
>                         *teststring*))
> ; cpu time (non-gc) 140 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  140 msec user, 0 msec system
> ; real time  151 msec
> ; space allocation:
> ;  228,879 cons cells, 2,516,384 other bytes, 0 static bytes
> NIL
> ELE-USER>
>
> About half of the memory overhead is consing in dotimes, etc.
> However, notice that it didn't trigger a GC.  Even when the GC runs,
> it isn't a significant cause of time overhead.  Calling malloc and
> free a lot isn't cheap either...
>
> Cheers,
> Ian
>
> PS - There was also an ill-conceived safety check that crept into
> the serializer recently, which accounts for about half of the time
> in the example.  I'll have to think about why it is there.
>
>
>
>
> On May 16, 2008, at 3:50 PM, Alex Mizrahi wrote:
>
>> IE> I doubt that using a Lisp with a modern compiler affects CPU time
>> IE> in any interesting way; I've observed that algorithm
>> IE> implementations with reasonable data structure choices come
>> IE> within tens of percentage points of the same algorithm in C.
>>
>> it's possible to optimize a single function, but typical (idiomatic)
>> Lisp programs:
>> * generate megatons of garbage
>> * use dynamic type dispatch (lacking type declarations)
>> and often have significantly inferior performance; see the sketch
>> below.
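>>
>> (a minimal illustration, not from Elephant: the same loop with and
>> without type information -- the first goes through generic dispatch
>> on every operation, the second compiles to unboxed float arithmetic
>> on SBCL)
>>
>> (defun sum-generic (v)
>>   ;; no declarations: generic + and generic AREF on each iteration
>>   (let ((s 0))
>>     (dotimes (i (length v) s)
>>       (incf s (aref v i)))))
>>
>> (defun sum-declared (v)
>>   (declare (type (simple-array double-float (*)) v)
>>            (optimize (speed 3) (safety 1)))
>>   ;; the declared element type lets the compiler open-code the adds
>>   (let ((s 0d0))
>>     (declare (type double-float s))
>>     (dotimes (i (length v) s)
>>       (incf s (aref v i)))))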
>>
>> IE> I don't care how fast your CPU is, it's the data access patterns
>> IE> that dominate time unless you're working on small or highly
>> IE> local datasets, then the inner loop speed _might_ matter.
>>
>> I'm assuming you're speaking about memory bandwidth, because I think
>> disk I/O will matter only on relatively large databases.
>>
>> memory bandwidth is also eaten in large quantities by garbage
>> generation.
>>
>> let's take a concrete piece of software: Elephant's serializer.  I'm
>> assuming it's an important piece of code and has been optimized a lot?
>>
>> I'm taking a test case from code Robert once posted, slightly
>> modified:
>>
>> (defparameter *teststring* "supercalifragiliciousexpialidocious")
>>
>> ;; round-trip S through Elephant's serializer N times
>> (defun dotest (n s)
>>   (dotimes (x n)
>>     (in-out-value s)))
>>
>> ELE-TESTS> (time (dotest (truncate (* 1000 1000) (length *teststring*))
>>                          *teststring*))
>> Evaluation took:
>> 0.249 seconds of real time
>> 0.248016 seconds of user run time
>> 0.0 seconds of system run time
>> 0 calls to %EVAL
>> 0 page faults and
>> 6,398,328 bytes consed.
>>
>>
>> ;; testing on core2 2.1 GHz, SBCL 1.0.13
>>
>> not that great, is it?
>> it processed roughly 10 megabytes of raw string data per second
>> (5 MB serialized, 5 MB deserialized) -- that's 10 times slower than
>> the linear read speed of modern HDDs.
>> and for 1 MB of data it produced 2 MB of garbage.
>>
>> or is there something wrong with my test?
>>
>>
>> ??>> the offending function is form-slot-key (which is used in
>> ??>> CL-SQL too):
>>
>> IE> At least in Allegro, the profiler only counts thread time, and
>> IE> much of the delay shown by 'time' vs. the profiler is taken up
>> IE> waiting on cache-miss IO.  It may be that thread time is only a
>> IE> handful of percentage points of the total time?
>>
>> I'm pretty sure there was very little I/O in my tests.
>>
>> ??>> and even more surprising was to find out that there is no
>> ??>> portable and fast way to convert an integer to a string, so I
>> ??>> had to use SBCL's internal function
>> ??>> sb-impl::quick-integer-to-string -- apparently the SBCL
>> ??>> developers knew about this problem, so they made this function
>> ??>> for themselves.
>>
>> IE> For fun, I did a test with princ: for integers only it's 2x
>> IE> faster, and for the above idiom it's only about 50% faster, but
>> IE> some of that is the with-string-stream idea.  If you do it
>> IE> manually as they did and write it to an array one character at a
>> IE> time, it can get pretty fast, but probably only 3x-4x.
>>
>> maybe they fixed something in SBCL?  I'm using version 1.0.13 and
>> get very different results with functions that are actually in
>> Elephant (db-postmodern's function is optimized, db-clsql's is not):
>>
>> CL-USER> (defun dotest-pm (n o)
>>            (loop for i from 0 below n
>>                  summing (length (db-postmodern::form-slot-key i o))))
>> DOTEST-PM
>>
>> CL-USER> (defun dotest-clsql (n o)
>>            (loop for i from 0 below n
>>                  summing (length (db-clsql::form-slot-key i o))))
>> DOTEST-CLSQL
>>
>>
>> CL-USER> (time (dotest-pm 100000 'o))
>> Evaluation took:
>> 0.05 seconds of real time
>> 0.052003 seconds of user run time
>> 0.0 seconds of system run time
>> [Run times include 0.024 seconds GC run time.]
>> 0 calls to %EVAL
>> 0 page faults and
>> 5,592,976 bytes consed.
>> 688890
>> CL-USER> (time (dotest-clsql 100000 'o))
>> Evaluation took:
>> 1.502 seconds of real time
>> 1.416088 seconds of user run time
>> 0.084005 seconds of system run time
>> [Run times include 0.7 seconds GC run time.]
>> 0 calls to %EVAL
>> 0 page faults and
>> 404,779,792 bytes consed.
>> 688890
>>
>>
>> the optimized version seems to be about 30 times faster (1.502 s
>> vs. 0.05 s).
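>>
>> (a hypothetical sketch of the shape of the difference -- these are
>> not the actual form-slot-key definitions: the slow idiom parses a
>> format string and conses intermediate strings on every call, the
>> fast one writes once into a single output string)
>>
>> (defun slot-key-slow (id slot)
>>   ;; FORMAT re-parses its control string and allocates temporaries
>>   ;; on every call
>>   (format nil "~A ~A" id slot))
>>
>> (defun slot-key-fast (id slot)
>>   ;; one output string, no intermediate copies
>>   (with-output-to-string (s)
>>     (princ id s)
>>     (write-char #\Space s)
>>     (princ slot s)))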
>>
>>
>> IE> I agree with this in general.  However I feel like this whole
>> IE> discussion is premature optimization being done while
>> IE> blindfolded.
>>
>> well, you guys started it by discussing compression and other
>> aspects :)
>>
>> IE> Prevalence is much, much faster because you don't have to flush
>> IE> data structures on each commit, so cl-prevalence performance
>> IE> with Elephant's data and transaction abstractions would be a
>> IE> really nice design point.
>>
>> yep, that would be very interesting
>>
>> IE>   I wonder if we would get some of this benefit from a Rucksack
>> IE> adaptation?
>>
>> perhaps there is a way to hack it one way or another.
>> one (high-level) way is to serialize dirty objects into a different
>> location (a log), in some customized format that will let us bring
>> them back into the normal data store later; see the sketch below.
>> another (low-level) way is to hijack the heap (as far as I
>> understood from a brief look, Rucksack does all reads/writes via a
>> heap object) to log changes instead of writing them to disk.
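>>
>> (a rough sketch of the high-level way; SERIALIZE-TO-STREAM,
>> DESERIALIZE-FROM-STREAM and APPLY-TO-STORE are hypothetical
>> placeholders for whatever the store provides)
>>
>> (defun log-dirty-objects (objects log-path)
>>   ;; append serialized dirty objects to the log at commit time
>>   (with-open-file (log log-path :direction :output
>>                                 :if-exists :append
>>                                 :if-does-not-exist :create
>>                                 :element-type '(unsigned-byte 8))
>>     (dolist (obj objects)
>>       (serialize-to-stream obj log))
>>     ;; flush Lisp-side buffers; a real log would also fsync
>>     (finish-output log)))
>>
>> (defun replay-log (log-path store)
>>   ;; later, bring logged objects back into the normal data store
>>   (with-open-file (log log-path :element-type '(unsigned-byte 8))
>>     (loop for obj = (deserialize-from-stream log nil)
>>           while obj
>>           do (apply-to-store obj store))))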
>>
>> also it might be good to use mmap'ed files -- the OS can then decide
>> by itself when to write changes back, while we write a log for
>> safety (a rough SBCL sketch follows).
>> but mmap'ed files are definitely not pure Lisp :(
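>>
>> (an SBCL-only sketch via sb-posix, just to show the shape of it --
>> this is my assumption about how one might wire it up, not Rucksack
>> or Elephant code)
>>
>> (require :sb-posix)
>>
>> (defun mapped-heap-demo (path length)
>>   (let ((fd (sb-posix:open path
>>                            (logior sb-posix:o-rdwr sb-posix:o-creat)
>>                            #o644)))
>>     (unwind-protect
>>          (progn
>>            (sb-posix:ftruncate fd length) ; make the file big enough
>>            (let ((addr (sb-posix:mmap nil length
>>                                       (logior sb-posix:prot-read
>>                                               sb-posix:prot-write)
>>                                       sb-posix:map-shared fd 0)))
>>              (unwind-protect
>>                   (progn
>>                     ;; write through the mapping; the OS decides when
>>                     ;; the dirty page actually reaches the disk
>>                     (setf (sb-sys:sap-ref-8 addr 0) 42)
>>                     ;; force write-back when durability is needed
>>                     (sb-posix:msync addr length sb-posix:ms-sync))
>>                (sb-posix:munmap addr length))))
>>       (sb-posix:close fd))))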
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel