[rucksack-devel] rucksack performance

Cyrus Harmon ch-rucksack at bobobeach.com
Fri Jan 12 16:33:17 UTC 2007


Well, I guess I'm willing to buy this argument, in principle. In
practice, with no indexing I can load my 0.5M objects into Rucksack.
With (:index t) but no indices on any slot, performance is
relatively good, but then I get heap exhaustion (or was it the p-car
problem? I can't remember at the moment) after 150K or 200K objects.
With two slot indices, performance degrades significantly around 40K
objects into the load (this is with the restructured loop, which
batches transactions into groups of 500 objects), and the load process
eventually falls over, although my latest attempt is still running,
with around 55K objects loaded so far.
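For anyone following along, the "restructured loop" pattern (splitting a bulk load into transactions of 500 objects each) can be sketched roughly as below. This is a language-neutral illustration in Python, not Rucksack code: `batches`, `bulk_load`, and `run_in_transaction` are hypothetical names, and the callback merely stands in for wrapping each batch in Rucksack's with-transaction form.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

BATCH_SIZE = 500  # objects per transaction, the value used in the load above


def batches(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_load(records: Iterable[T],
              run_in_transaction: Callable[[List[T]], None],
              size: int = BATCH_SIZE) -> int:
    """Hand each batch to `run_in_transaction` (a hypothetical hook standing
    in for one with-transaction per batch) and return the number of batches."""
    count = 0
    for chunk in batches(records, size):
        run_in_transaction(chunk)  # one transaction per batch of <= size objects
        count += 1
    return count


if __name__ == "__main__":
    sizes: List[int] = []
    # Record each batch's size instead of committing anything.
    n = bulk_load(range(1234), lambda chunk: sizes.append(len(chunk)))
    print(n, sizes)  # prints: 3 [500, 500, 234]
```

The batch size is a parameter so it can be tuned; 500 is simply the figure from the load described above, not a recommended value.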

For my next attempt, I'll disable the GC. This is done by commenting
out the GC call in the with-transaction form, right?

Cyrus

On Jan 12, 2007, at 7:53 AM, Arthur Lemmens wrote:

> Cyrus Harmon wrote:
>
>> I guess performance is the only issue I can think of. Yes, you have
>> to pay the cost of indexing either way, but, at least in many
>> systems, it can be faster to do a bunch of "inserts" and then index
>> the table, to use RDBMS-speak. It's not so much an issue of debugging
>> performance problems as of working around the performance bottleneck
>> of inserting into an index. I guess in an ideal world we wouldn't need
>> to disable indexing during a bulk creation phase.
>
> I don't see why that would be any faster in Rucksack. As far as I can
> see, you just move the indexing work to a later stage; I don't think
> you optimize it. But maybe I'm missing something.
>
> Arthur