[elephant-devel] working with many millions of objects

Robert L. Read read at robertlread.net
Tue Oct 10 12:53:16 UTC 2006


Yes, it's amusing.

In my own work I use the Postgres backend; I know very little about
SleepyCat.  It seems to me that this is more of a SleepyCat issue than an
Elephant issue.  Perhaps you should ask the SleepyCat list?

Are you importing things into SleepyCat directly, in the serialization
format Elephant expects, so that the objects can be read back by Elephant?
If so, I assume it is just a question of solving the SleepyCat problems.
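
For what it's worth, the route I would expect is to let Elephant do the
serialization for you by going through its own API rather than writing raw
Berkeley DB records.  A minimal sketch, assuming the standard Elephant
entry points (open-store, persistent-metaclass, with-transaction,
add-to-root); the store path, package name, and DATUM class are made-up
placeholders:

  ;; Load Elephant (plus its BDB backend system, if needed).
  (asdf:operate 'asdf:load-op :elephant)

  (defpackage :import-sketch (:use :cl :elephant))
  (in-package :import-sketch)

  ;; Open a Berkeley DB (SleepyCat) store; Elephant takes care of the
  ;; on-disk serialization format.
  (defvar *store* (open-store '(:bdb "/tmp/elephant-bdb/")))

  ;; Slots of a persistent-metaclass class are stored by Elephant.
  (defclass datum ()
    ((key   :initarg :key   :accessor datum-key)
     (value :initarg :value :accessor datum-value))
    (:metaclass persistent-metaclass))

  (defun import-record (key value)
    "Create one persistent object and register it under KEY in the root."
    (with-transaction ()
      (add-to-root key (make-instance 'datum :key key :value value))))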

An alternative would be to use the SQL-based backend.  However, I doubt
this will solve your problem, since at present we (well, I wrote it) use a
very inefficient serialization scheme for the SQL-based backend that
base64-encodes everything.  This has the advantage of working trouble-free
with different database backends, but it could clearly be improved upon.
Still, it is more than efficient enough for all my work, and at present
nobody is clamoring to have it improved.
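
If you do want to try it, switching backends is mostly a matter of the
spec you hand to open-store.  A sketch --- with the caveat that the
ele-clsql system name and the exact shape of the Postgres connection spec
(host, database, user, password) below are assumptions on my part:

  ;; Load the SQL backend and open a Postgres-backed store.
  (asdf:operate 'asdf:load-op :ele-clsql)

  (defvar *sql-store*
    (open-store '(:clsql (:postgresql "localhost" "mydb" "me" "secret"))))

The objects end up as base64-encoded serialized blobs in the database,
which is what makes it portable across SQL backends but not fast.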

Is your problem importing the data, or using it once it is imported?
It's hard for me to imagine a dataset so large that even the import time
is prohibitive --- suppose it takes 24 hours; can you not afford to pay
that?
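
If the import itself is the bottleneck, one thing to try that needs no
changes to Elephant is chunking the import into many moderate-sized
transactions, so that no single transaction needs the hundreds of
thousands of locks your DB_CONFIG allows for.  A sketch, reusing the DATUM
class and open store from the sketch above (the batch size and the
(key . value) record format are arbitrary):

  (defun batch-import (records &key (batch-size 1000))
    "Import RECORDS, a list of (KEY . VALUE) conses, in BATCH-SIZE chunks."
    (loop while records do
          (with-transaction ()
            (dotimes (i batch-size)
              (when (null records) (return))   ; end of input mid-batch
              (destructuring-bind (key . value) (pop records)
                (add-to-root key
                             (make-instance 'datum :key key :value value)))))))

Each commit then releases its locks and gives Berkeley DB a chance to
flush dirty pages, so lock-table and cache pressure should stay bounded
even for millions of objects.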

A drastic and potentially expensive measure would be to switch to a
64-bit architecture with a large amount of memory.  I intend to do that
when forced by performance issues in my own work.



On Tue, 2006-10-10 at 00:46 -0700, Red Daly wrote:

> I will be running experiments in informatics and modeling in the future 
> that may contain (tens or hundreds of) millions of objects.  Given the 
> ease of use of elephant so far, it would be great to use it as the 
> persistent store and avoid creating too many custom data structures.
> 
> I have recently run up against some performance bottlenecks when using 
> elephant to work with very large datasets (in the hundreds of millions 
> of objects).  Using SleepyCat, I am able to import data very quickly 
> with a DB_CONFIG file with the following contents:
> 
> set_lk_max_locks 500000
> set_lk_max_objects 500000
> set_lk_max_lockers 500000
> set_cachesize 1 0 0
> 
> I can import data very quickly until the 1 GB cache is too small to
> allow complete in-memory access to the database.  At this point it seems
> that disk I/O makes additional writes happen much more slowly.  (I have
> also tried increasing the 1 GB cache size, but the database fails to open
> if it is too large--e.g. 2 GB.  I have 1.25 GB physical memory and 4 GB
> swap, so the constraint seems to be physical memory.)  The max_lock,
> etc. lines allow transactions to contain hundreds of thousands of
> individual locks, relieving the transaction throughput bottleneck.
> 
> What are the technical restrictions on writing several million objects 
> to the datastore?  Is it feasible to create a batch import feature to 
> allow large datasets to be imported using reasonable amounts of memory 
> for a desktop computer?
> 
> I hope this email is at least amusing!
> 
> Thanks again,
> red daly
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel