[elephant-devel] Lisp Btrees: design considerations
Ian Eslick
eslick at media.mit.edu
Fri May 16 22:54:35 UTC 2008
I don't disagree with you that it's easy to create expensive/slow
programs. My only point was that when it matters, it doesn't have to
be the deciding issue.
The big overheads in Elephant are due to Lisp's data model and the use
of the MOP. The serializer is currently designed for safety and
generality across Lisps, with reasonable but not completely optimal
performance. Since the whole point is dealing with the implementation
of the Lisp data model, doing well would require specializing on each
Lisp's internal data representation so I can just do byte copies
rather than going through the general indirection APIs. The code is
there; perhaps I should selectively re-enable the optimizations for
SBCL...
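To make the trade-off concrete, here is a rough sketch of the two paths; the function names and the buffer representation are hypothetical, not Elephant's actual serializer API. The generic path goes through the portable character accessors, while the specialized path declares the concrete string type so the compiler can open-code the accesses:

```lisp
;; Hypothetical sketch: generic vs. specialized string serialization.
;; Neither function is Elephant's real code.

(defun serialize-string-generic (string buffer)
  "Portable path: iterate with generic accessors, one CHAR-CODE at a time."
  (loop for ch across string
        do (vector-push-extend (char-code ch) buffer)))

(defun serialize-string-specialized (string buffer)
  "Specialized path: declare the concrete representation so the
compiler can open-code element access instead of dispatching."
  (declare (type simple-string string)
           (optimize (speed 3) (safety 0)))
  (loop for i of-type fixnum from 0 below (length string)
        do (vector-push-extend (char-code (schar string i)) buffer)))
```

On a real port you would go further and copy the underlying character vector bytewise using implementation internals; the sketch only shows where the type declarations would let that happen.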
Perhaps these points underscore your argument about the overhead of
Lisp. It is more work to write fast programs in Lisp because
performance is at odds with the common idioms. Other languages, like
C, make it easy to write efficient programs, but much harder to write
programs quickly or to maintain large ones. There is no free lunch,
as my physicist friends are fond of saying.
I was curious where the time was going in the example. The
deserializer creates fresh strings of type 'character. Aren't
characters 4 bytes in Unicode SBCL? That would mean you're handling
8M bytes, and for 4M bytes of data it produced 2M bytes of garbage
(probably all consing in the test framework). Allegro uses UTF-16, so
the same test is only 2 MB of total string input. I'm not sure
where the 25% overhead came from, though.
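As a back-of-the-envelope check of those byte counts (assuming 4-byte characters in Unicode SBCL and 2-byte characters in Allegro's UTF-16, per the discussion above; the helper is purely illustrative):

```lisp
;; Bytes of raw string data touched for the ~1M-character test,
;; given a character width and a number of passes over the data.
(defun string-bytes (nchars bytes-per-char passes)
  (* nchars bytes-per-char passes))

;; SBCL, 4-byte chars, serialize + deserialize: 8M bytes handled.
(string-bytes 1000000 4 2)
;; Allegro, 2-byte chars, one pass of string input: 2M bytes.
(string-bytes 1000000 2 1)
```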
The time cost is still surprising though.
ELE-USER> (time (dotest (truncate (* 1000 1000) (length *teststring*))
                        *teststring*))
; cpu time (non-gc) 140 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 140 msec user, 0 msec system
; real time 151 msec
; space allocation:
; 228,879 cons cells, 2,516,384 other bytes, 0 static bytes
NIL
ELE-USER>
About half of the memory overhead is consing in dotimes, etc.
However, notice that it didn't trigger a GC. Even when the GC runs, it
isn't a significant cause of time overhead. Calling malloc and free a
bunch of times is also a little expensive...
Cheers,
Ian
PS - There was also an ill-conceived safety check that crept into the
serializer recently which accounts for about half of the time in the
example. I'll have to think about why it is there.
On May 16, 2008, at 3:50 PM, Alex Mizrahi wrote:
> IE> I doubt that using a Lisp with a modern compiler affects CPU time in
> IE> any interesting way;
> IE> I've observed that algorithm implementation with reasonable data
> IE> structure choices is within 10's of % points of the same algorithm in
> IE> C.
>
> it's possible to optimize a single function, but typical (idiomatic)
> Lisp programs:
> * generate megatons of garbage
> * use dynamic type dispatch (lacking type declarations)
> and often have significantly inferior performance.
>
> IE> I don't care how fast your CPU is, it's the data access patterns that
> IE> dominate time unless you're working on small or highly local datasets,
> IE> then the inner loop speed _might_ matter.
>
> I'm assuming you're speaking about memory bandwidth, because I think
> disk I/O will matter only on relatively large databases.
>
> Memory bandwidth is also eaten in large quantities by garbage
> generation.
>
> Let's take a concrete piece of software: Elephant's serializer. I'm
> assuming it's an important piece of code and was optimized a lot?
>
> I'm taking a test case from code Robert once posted, a little bit
> modified:
>
> (defparameter *teststring* "supercalifragiliciousexpialidocious")
> (defun dotest (n s)
>   (dotimes (x n)
>     (in-out-value s)))
>
> ELE-TESTS> (time (dotest (truncate (* 1000 1000) (length *teststring*))
>                          *teststring*))
> Evaluation took:
> 0.249 seconds of real time
> 0.248016 seconds of user run time
> 0.0 seconds of system run time
> 0 calls to %EVAL
> 0 page faults and
> 6,398,328 bytes consed.
>
>
> ;; testing on core2 2.1 GHz, SBCL 1.0.13
>
> Not that great, is it?
> It processed roughly 10 megabytes of raw string data per second (5
> MB serialized, 5 MB deserialized) -- that's 10 times slower than the
> linear read speed of modern HDDs.
> And for 1 MB of data it produced 2 MB of garbage.
>
> Or is there something wrong with my test?
>
>
> ??>> the offending function is form-slot-key (which is used in CL-SQL
> ??>> too):
>
> IE> At least in Allegro, the profiler only counts thread time and that
> IE> much of the delay shown by 'time' vs. the profiler is taken up waiting
> IE> on cache-miss IO. It may be that thread time is only a handful of
> IE> percentage points of the total time?
>
> I'm pretty sure there was very little I/O in my tests.
>
> ??>> and even more surprising was to find out that there is no portable
> ??>> and fast way to convert an integer to a string.
> ??>> so I had to use SBCL's internal function called
> ??>> sb-impl::quick-integer-to-string -- apparently SBCL
> ??>> developers knew about this problem so they made this function for
> ??>> themselves.
>
> IE> For fun, I did a test with princ and for integers only it's 2x faster,
> IE> for the above idiom it's only about 50% faster, but some of that is
> IE> the with-string-stream idea. If you do it manually as they did and
> IE> write it to an array one character at a time it can get pretty fast,
> IE> but probably only 3x-4x.
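[The digit-at-a-time approach described above can be sketched roughly like this; this is a hypothetical helper for illustration, not SBCL's actual sb-impl::quick-integer-to-string:]

```lisp
;; Emit digits into a character buffer from the right, then build the
;; string once.  Avoids streams and per-digit intermediate strings.
;; Buffer is sized for fixnum-range integers.
(defun integer-to-string (n)
  (if (zerop n)
      "0"
      (let ((buf (make-array 24 :element-type 'character))
            (i 24)
            (neg (minusp n))
            (m (abs n)))
        (loop while (plusp m)
              do (decf i)
                 (setf (aref buf i)
                       (code-char (+ (char-code #\0) (mod m 10))))
                 (setf m (truncate m 10)))
        (when neg
          (decf i)
          (setf (aref buf i) #\-))
        (subseq buf i))))
```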
>
> Maybe they fixed something in SBCL? I'm using version 1.0.13 and get
> very different results, with functions that are actually in Elephant
> (db-postmodern's function is optimized, db-clsql's is not):
>
> CL-USER> (defun dotest-pm (n o)
>            (loop for i from 0 below n
>                  summing (length (db-postmodern::form-slot-key i o))))
> DOTEST-PM
>
> CL-USER> (defun dotest-clsql (n o)
>            (loop for i from 0 below n
>                  summing (length (db-clsql::form-slot-key i o))))
> DOTEST-CLSQL
>
>
> CL-USER> (time (dotest-pm 100000 'o))
> Evaluation took:
> 0.05 seconds of real time
> 0.052003 seconds of user run time
> 0.0 seconds of system run time
> [Run times include 0.024 seconds GC run time.]
> 0 calls to %EVAL
> 0 page faults and
> 5,592,976 bytes consed.
> 688890
> CL-USER> (time (dotest-clsql 100000 'o))
> Evaluation took:
> 1.502 seconds of real time
> 1.416088 seconds of user run time
> 0.084005 seconds of system run time
> [Run times include 0.7 seconds GC run time.]
> 0 calls to %EVAL
> 0 page faults and
> 404,779,792 bytes consed.
> 688890
>
>
> The optimized version seems to be about 30 times faster.
>
>
> IE> I agree with this in general. However I feel like this whole
> IE> discussion is premature optimization being done while blindfolded.
>
> Well, you guys started it by discussing compression and other aspects :)
>
> IE> Prevalence is much, much faster because you don't have to flush data
> IE> structures on each commit, so cl-prevalence performance with
> IE> Elephant's data and transaction abstractions would be a really nice
> IE> design point.
>
> yep, that would be very interesting
>
> IE> I wonder if we would get some of this benefit from a Rucksack
> IE> adaptation?
>
> Perhaps there is a way to hack it one way or another.
> One (high-level) way is to serialize dirty objects into a different
> location (a log) in some customized format that will allow us to
> bring them back into the normal data store later.
> Another (low-level) way is to hijack the heap (as far as I understood
> from a brief look, Rucksack does all reads/writes via a heap object)
> to log changes instead of writing them to disk.
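[The high-level logging idea above could be sketched like this; all names here are hypothetical and the sketch ignores durability and serialization format, which a real design would have to address:]

```lisp
;; Hypothetical sketch: record dirty objects in an in-memory log
;; instead of flushing the data store, then replay them later.
(defvar *dirty-log* '())

(defun log-dirty-object (oid bytes)
  "Append a change record instead of writing it to the store."
  (push (cons oid bytes) *dirty-log*))

(defun replay-log (apply-fn)
  "Bring logged changes back into the normal store, oldest first."
  (dolist (entry (reverse *dirty-log*))
    (funcall apply-fn (car entry) (cdr entry)))
  (setf *dirty-log* '()))
```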
>
> It might also be good to use mmapped files -- the OS can decide when
> to write changes by itself, while we could write a log for safety.
> But mmapped files are definitely not pure Lisp :(
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel