[elephant-devel] working with many millions of objects
Red Daly
reddaly at gmail.com
Tue Oct 10 07:46:01 UTC 2006
I will be running experiments in informatics and modeling in the future
that may involve tens or hundreds of millions of objects. Given how easy
Elephant has been to use so far, it would be great to use it as the
persistent store and avoid building too many custom data structures.
I have recently run up against some performance bottlenecks when using
Elephant to work with very large datasets (in the hundreds of millions
of objects). Using the Sleepycat backend, I am able to import data very
quickly given a DB_CONFIG file with the following contents:
set_lk_max_locks 500000
set_lk_max_objects 500000
set_lk_max_lockers 500000
set_cachesize 1 0 0
I can import data very quickly until the 1 GB cache is no longer large
enough for the database to be accessed entirely in memory; at that point
disk I/O makes additional writes much slower. (I have also tried
increasing the cache beyond 1 GB, but the database fails to open if the
cache is too large, e.g. 2 GB. I have 1.25 GB of physical memory and 4 GB
of swap, so the constraint seems to be physical memory.) The set_lk_max_*
lines allow a single transaction to hold hundreds of thousands of
individual locks, so lock exhaustion no longer limits transaction throughput.
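
For reference, here is a rough sketch of what those DB_CONFIG directives
correspond to in the Berkeley DB 4.x C API. Elephant's BDB backend normally
picks them up from the DB_CONFIG file itself, so this is only illustrative;
the environment path is a placeholder, and the multi-region cache comment is
a guess at a workaround, not a confirmed fix.

/* Illustrative only: the same tuning as the DB_CONFIG file above,
 * applied programmatically to a Berkeley DB environment. */
#include <stdio.h>
#include <db.h>

int main(void) {
    DB_ENV *env;
    int ret;

    if ((ret = db_env_create(&env, 0)) != 0) {
        fprintf(stderr, "db_env_create: %s\n", db_strerror(ret));
        return 1;
    }

    /* set_lk_max_locks / set_lk_max_objects / set_lk_max_lockers 500000 */
    env->set_lk_max_locks(env, 500000);
    env->set_lk_max_objects(env, 500000);
    env->set_lk_max_lockers(env, 500000);

    /* set_cachesize 1 0 0 -> a 1 GB cache in a single region.  Splitting a
     * larger cache across several regions (last argument > 1) is one thing
     * to try if a single 2 GB region fails to allocate. */
    env->set_cachesize(env, 1, 0, 0);

    /* "/path/to/elephant/db" is a placeholder for the store directory. */
    if ((ret = env->open(env, "/path/to/elephant/db",
                         DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                         DB_INIT_LOG | DB_INIT_TXN, 0)) != 0) {
        fprintf(stderr, "env->open: %s\n", db_strerror(ret));
        return 1;
    }
    env->close(env, 0);
    return 0;
}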
What are the technical restrictions on writing several million objects
to the datastore? Is it feasible to create a batch-import feature that
lets large datasets be imported using an amount of memory reasonable
for a desktop computer?
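
To make the batch-import idea concrete, here is a purely hypothetical sketch,
at the Berkeley DB C level, of what such a feature could do underneath
Elephant: write records in bounded transactions (10,000 puts per commit here,
an arbitrary number) so that no single transaction needs hundreds of thousands
of locks and memory use stays flat. The next_record() helper and the buffer
sizes are made up for illustration.

#include <string.h>
#include <db.h>

#define CHUNK 10000

/* next_record() is an assumed callback that fills the key/value buffers
 * with the next record to import and returns 0 when input is exhausted. */
int next_record(char *key, char *val);

int load_records(DB_ENV *env, DB *db) {
    char kbuf[64], vbuf[256];
    int more = 1, ret;

    while (more) {
        DB_TXN *txn;
        if ((ret = env->txn_begin(env, NULL, &txn, 0)) != 0)
            return ret;

        /* Put at most CHUNK records under this one transaction. */
        for (int i = 0; i < CHUNK && (more = next_record(kbuf, vbuf)); i++) {
            DBT key, val;
            memset(&key, 0, sizeof key);
            memset(&val, 0, sizeof val);
            key.data = kbuf;  key.size = (u_int32_t)strlen(kbuf) + 1;
            val.data = vbuf;  val.size = (u_int32_t)strlen(vbuf) + 1;
            if ((ret = db->put(db, txn, &key, &val, 0)) != 0) {
                txn->abort(txn);
                return ret;
            }
        }

        /* Commit the chunk and release its locks before starting the next. */
        if ((ret = txn->commit(txn, 0)) != 0)
            return ret;
    }
    return 0;
}

An Elephant-level batch importer would presumably do the same kind of
chunking around its own transaction machinery, committing every N objects
instead of wrapping the whole import in one huge transaction.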
I hope this email is at least amusing!
Thanks again,
red daly