[rucksack-devel] Re: Fwd: State of the nation and heap patch

Cyrus Harmon ch-rucksack at bobobeach.com
Tue Feb 12 20:01:55 UTC 2008


Yes, turning off the indices does seem to help the load time. Based on
progress so far, I'd guess this will bring the load time down to around
30 minutes instead of two hours.

Is there a way to add the indices back once the data is loaded?
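This thread doesn't say whether Rucksack can add indexes back after a load. As a generic illustration only, here is a minimal Python sketch of the usual "load first, index later" pattern. `Store`, `Record`, `rebuild_index` and the `color` slot are all hypothetical names, not Rucksack's (Common Lisp) API. Index maintenance is skipped during the bulk load. The index is then built in a single pass over the loaded objects, and indexing is switched back on for later inserts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical persistent object with one indexed slot, `color`."""
    oid: int
    color: str


class Store:
    """Hypothetical store whose slot index can be switched off for bulk loads."""

    def __init__(self, indexing=True):
        self.objects = []
        self.indexing = indexing
        self.color_index = defaultdict(set)  # slot value -> set of oids

    def add(self, record):
        self.objects.append(record)
        if self.indexing:
            # Incremental maintenance: one index update per insert.
            self.color_index[record.color].add(record.oid)

    def rebuild_index(self):
        # Build the index from scratch in one pass over every loaded object,
        # then turn indexing back on so later inserts keep it current.
        self.color_index = defaultdict(set)
        for record in self.objects:
            self.color_index[record.color].add(record.oid)
        self.indexing = True


if __name__ == "__main__":
    bulk = Store(indexing=False)
    for oid, color in enumerate(["red", "blue", "red"]):
        bulk.add(Record(oid, color))  # no index work during the load
    bulk.rebuild_index()
    print(dict(bulk.color_index))
```

With set-valued entries, the rebuilt index has the same contents as one maintained incrementally. The difference is only *when* the indexing work is done.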

It's still not as fast as I would like, but one step at a time...

No, the values being indexed are pretty spread out, so I doubt that's  
the problem.

Thanks again for your help,

Cyrus

On Feb 12, 2008, at 3:26 AM, Arthur Lemmens wrote:

> Cyrus Harmon wrote:
>
>> Thanks for these changes. My initial tests suggest that the new
>> version is faster, but not overwhelmingly so. I'll try to do some
>> more rigorous benchmarking (and profiling) and see what I can come
>> up with.
>
> I expect that most of the time is spent on class and slot indexing,
> but it would be good to confirm that first. For example, could you
> time how long it takes to import your data without any indexing at
> all? And then with class indexing but no slot indexing?
>
> Do you have slot indexes where relatively few slot values map to many
> objects? In that case, the current implementation is far too slow,
> because it uses a plain list to represent the set of all btree values
> that belong to one key. I'm working on changing that to a different
> data structure, but I haven't finished yet.
>
> Arthur
>
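Arthur's point about plain lists can be made concrete with a minimal Python sketch. This is not Rucksack's code, and `ListSlotIndex` / `SetSlotIndex` are hypothetical names. It assumes that each insert must avoid duplicates, as any list standing in for a *set* has to. Under that assumption, each insert into a list-backed entry scans the whole list. Loading n objects that share one slot value therefore costs n(n-1)/2 comparisons. A hash set makes each insert O(1) on average.

```python
from collections import defaultdict


class ListSlotIndex:
    """Hypothetical slot index: each key maps to a plain list of values."""

    def __init__(self):
        self.table = defaultdict(list)
        self.comparisons = 0  # equality checks made by the duplicate scan

    def add(self, key, obj):
        bucket = self.table[key]
        # Linear scan to keep set semantics: O(len(bucket)) per insert.
        for existing in bucket:
            self.comparisons += 1
            if existing == obj:
                return
        bucket.append(obj)

    def values(self, key):
        return set(self.table.get(key, ()))


class SetSlotIndex:
    """Same interface, but each key maps to a hash set (O(1) average insert)."""

    def __init__(self):
        self.table = defaultdict(set)

    def add(self, key, obj):
        self.table[key].add(obj)

    def values(self, key):
        return set(self.table.get(key, ()))


if __name__ == "__main__":
    n = 2000
    list_index, set_index = ListSlotIndex(), SetSlotIndex()
    # Worst case from the thread: one slot value shared by n objects.
    for obj_id in range(n):
        list_index.add("red", obj_id)
        set_index.add("red", obj_id)
    assert list_index.values("red") == set_index.values("red")
    print(list_index.comparisons)  # prints 1999000, i.e. n * (n - 1) / 2
```

When slot values are spread out, as Cyrus reports, each list stays short and the quadratic cost never shows up. That is consistent with his doubt that this is the bottleneck in his case.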
