[elephant-devel] Re: Severely poor performance in some obvious cases

Fri Nov 30 15:36:33 UTC 2007

Is there any reason that we can't store the byte-stream data directly
in postmodern?  We already have an efficient, mostly non-consing
byte-array serializer with the following format:

[btree_id][data_type][data_format]

If you used a new table for each btree, then you could strip the
btree_id and pass the type + format to postgres.  Integers are stored
big-endian and strings in left-to-right lisp-code order, so it might
just work without modification.
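
A minimal sketch of why the big-endian layout matters, assuming the
backend orders stored byte strings by plain byte-wise comparison.  The
helpers encode-u32-be and bytes< are hypothetical names for
illustration, not part of Elephant or postmodern, and only unsigned
32-bit integers are covered:

```lisp
;; Hypothetical helpers for illustration; not Elephant's serializer.
(defun encode-u32-be (n)
  "Encode the unsigned 32-bit integer N as 4 bytes, most significant first."
  (let ((bytes (make-array 4 :element-type '(unsigned-byte 8))))
    (dotimes (i 4 bytes)
      (setf (aref bytes i) (ldb (byte 8 (* 8 (- 3 i))) n)))))

(defun bytes< (a b)
  "Lexicographic byte-wise comparison of two byte vectors."
  (let ((pos (mismatch a b)))
    (cond ((null pos) nil)              ; identical sequences
          ((= pos (length a)) t)        ; A is a proper prefix of B
          ((= pos (length b)) nil)      ; B is a proper prefix of A
          (t (< (aref a pos) (aref b pos))))))

;; 255 encodes as #(0 0 0 255) and 256 as #(0 0 1 0), so the byte-wise
;; order agrees with the numeric order.
(bytes< (encode-u32-be 255) (encode-u32-be 256))
```

A little-endian encoding would put the low byte first and sort 256
before 255.  Signed integers would need extra handling for the sign
bit.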

However I'm speculating as I don't understand all the issues.  As I  
understand it, CL-SQL has the problem that the byte storage methods  
for different SQL engines are different enough to make a common API  
difficult to implement.
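
For a rough sense of what skipping the base64 step would save on the
storage side, here is a sketch assuming standard padded base64 (4
output characters for every 3 input bytes).  base64-length is a
hypothetical helper, not an Elephant function:

```lisp
;; Hypothetical helper for illustration; assumes padded base64.
(defun base64-length (n-bytes)
  "Number of characters needed to base64-encode N-BYTES with padding."
  (* 4 (ceiling n-bytes 3)))

;; One million raw bytes become 1333336 characters of text.
(base64-length 1000000)
```

That is about a third more data to write and read than the raw bytes,
before any per-row overhead in the database.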

Ian

On Nov 29, 2007, at 10:06 AM, Robert L. Read wrote:

> On Thu, 2007-11-29 at 16:16 +0200, Alex Mizrahi wrote:
>> AP> Am I missing something really basic here?
>>
>> actually it's quite a strange situation that you have *many*
>> employees with the same name but you want just one (a random one).
>> i cannot imagine why one needs this in the real world..
>>
>> or are you saying that all have different names, but it still does
>> consing? this could be a bug then..
>>
>
> With respect to consing, it is important to point out that our
> serializer conses heavily (for the postmodern and CL-SQL backends).
> This is because I used base64 to transform the byte-streams into
> character strings.
>
> Most relational databases (including Postgres) provide a way of storing
> byte sequences directly.  However, this is not standardized and not
> portable.  In fact, I spoke to Kevin Rosenberg, the author of CL-SQL,
> and he and CL-SQL don't have a good way to do it.
>
> However, since postmodern is Postgres specific, it could avoid this
> step, by using a back-end specific serializer.  I suspect this would
> have a huge impact on performance, both by decreasing consing (minor)
> and by decreasing the amount of disc I/O that has to be done (major).
>
> (BDB doesn't have this problem, because it natively uses
> byte-sequences, not character-sequences.)
>
> Please see the code below, which demonstrates that pushing 1 million
> bytes through the serializer (without even going to the database)
> creates roughly 8.7 million bytes of garbage in 0.433 seconds.  (This
> is on a new, fast, 2 gigabyte 64-bit machine, against postmodern):
>
> (asdf:operate 'asdf:load-op :elephant)
> (asdf:operate 'asdf:load-op :ele-clsql)
> (asdf:operate 'asdf:load-op :postmodern)
>
> (asdf:operate 'asdf:load-op :elephant-tests)
> (in-package "ELEPHANT-TESTS")
>
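> ;; Use the postmodern test store.
> (setq *default-spec* *testpm-spec*)
>
> (setq teststring "supercalifragiliciousexpialidocious")
> (setq testint 42)
>
> ;; Target load: about one million bytes of string data.
> (setq totalserializationload (* 1000 1000))
>
> ;; Number of round trips needed to reach that total.
> (setq n (ceiling totalserializationload (length teststring)))
>
> (open-store *default-spec*)
>
> ;; Push the string through the serializer N times.
> (time
>  (dotimes (x n)
>    (in-out-value teststring)))
>
> (close-store)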
>
> *****
> Results in:
> Evaluation took:
>  0.433 seconds of real time
>  0.172974 seconds of user run time
>  0.058991 seconds of system run time
>  0 calls to %EVAL
>  0 page faults and
>  8,731,728 bytes consed.
> NIL
> ELE-TESTS>
>
> I personally think making a back-end specific serializer to avoid the
> base64 encoding would make a significant performance difference.  This
> is not much of an issue for me personally, since I keep everything
> cached in memory anyway.
>
>
> -- 
> Robert L. Read, PhD
> http://konsenti.com
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel