[rucksack-devel] rucksack performance
Cyrus Harmon
ch-rucksack at bobobeach.com
Fri Jan 12 16:33:17 UTC 2007
Well, I guess I'm willing to buy this argument, in principle. In
practice, with no indexing I can load my 0.5M objects into rucksack.
With an (:index t) but no indices on any slot, performance is
relatively good, but then I get heap exhaustion (or was it the p-car
problem? I can't remember at the moment) after 150k or 200k objects.
With two slot indices, performance degrades significantly around 40k
objects into the load (this is with the restructured loop, which batches
transactions into groups of 500 objects), and the load process
eventually falls over at some point, although it's still running on
my latest attempt, with around 55k objects loaded so far.
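A minimal sketch of that batching, in case it helps pin down what the loop does. MAP-IN-BATCHES is a hypothetical helper, not part of rucksack, and the commented usage assumes an open rucksack and a made-up MAKE-ONE-OBJECT function:

```lisp
;; Hypothetical helper (not part of rucksack): call FN on successive
;; sublists of ITEMS, each at most BATCH-SIZE elements long.
(defun map-in-batches (fn items &key (batch-size 500))
  (loop while items
        do (let ((batch (loop repeat batch-size
                              while items
                              collect (pop items))))
             (funcall fn batch))))

;; Assumed usage: one transaction per batch of 500 objects.
;; MAKE-ONE-OBJECT and RECORDS are placeholders for illustration only.
;; (map-in-batches (lambda (batch)
;;                   (rucksack:with-transaction ()
;;                     (mapc #'make-one-object batch)))
;;                 records)
```

The point of batching this way is that each commit covers a bounded number of new objects, rather than one commit per object or one huge commit for the whole load.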
For my next attempt, I'll disable the GC. This is done by commenting
out the call in the with-transaction form, right?
Cyrus
On Jan 12, 2007, at 7:53 AM, Arthur Lemmens wrote:
> Cyrus Harmon wrote:
>
>> I guess performance is the only issue I can think of. Yes, you have
>> to pay the cost of indexing either way, but, at least in many
>> systems, it can be faster to do a bunch of "inserts" and then index
>> the table, using rdbms-speak. It's not so much an issue of debugging
>> performance problems, as working around the performance bottleneck of
>> inserting into an index. I guess in an ideal world we wouldn't need
>> to disable indexing during a bulk creation phase.
>
> I don't see why that would be any faster in Rucksack. As far as I can
> see you just move the indexing work to a later stage, but I don't think
> you optimize it. But maybe I'm missing something.
>
> Arthur
>
>