[elephant-devel] Berkeley DB error: Cannot allocate memory.
Ian Eslick
eslick at media.mit.edu
Wed Sep 24 12:45:17 UTC 2008
> the main problem for us is that we automatically import data files and
> data fragments, where one of them can easily result in the creation of
> thousands or even tens of thousands of heavily cross-referencing
> objects, each with a number of index entries associated with it. The
> complexity of imports is not known in advance and can vary drastically
> from import to import.
>
> None of the objects is big in itself, but the number easily adds up to
> tens of thousands of inserts, updates, or deletes in one logical
> transaction (concurrency is less of an issue for our imports and we
> could easily live with blocking the DB during the various imports).
> Since transactions of this size don't work with BDB as a backend, we
> currently do not use transactions for the import as a whole. We do use
> them for the import of single entries, each consisting of a few dozen
> or so objects, to ensure consistency within an entry and to speed up
> insertions.
Then it sounds like you're doing the right thing already, given the
current constraints in Elephant. See some comments below.
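Roughly what I mean, as an untested sketch (ELE is Elephant's package
nickname; IMPORT-ENTRY stands in for your per-entry import code):
  ;; Commit each logical entry in its own transaction instead of
  ;; wrapping the whole import in one, so no single transaction ever
  ;; holds more than a few dozen objects' worth of page locks.
  (defun import-all (entries)
    (dolist (entry entries)
      (ele:with-transaction ()      ; one small txn per entry
        (import-entry entry))))     ; placeholder for your code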
>
>>
>> Max concurrent transactions is upper-bounded by your machine's
>> processing capacity, and the max # of locks is proportional to the
>> logarithm of the number of objects, which grows very slowly and so
>> can be given practical upper bounds.
> Naive question, but why is the # of locks proportional to the
> logarithm of the number of objects to be inserted? Well, probably I'll
> have to check how Elephant uses locks in its BTrees.
This discussion is about the BDB backend; Elephant itself makes no
commitments about locking in the various stores.
From the BDB docs:
"For Btree and Recno access methods, you will need one lock object per
level of the database tree, at a minimum. (Unless keys are quite large
with respect to the page size, neither Recno nor Btree database trees
should ever be deeper than five levels.) Then, you will need one lock
object for each leaf page of the database tree that will be
simultaneously accessed."
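To make that concrete, back of the envelope: with 4KB pages and on the
order of 50 key/data pairs per leaf page (this obviously depends on
your key and value sizes), 10,000 inserts touch roughly 10000/50 = 200
leaf pages. One transaction covering the whole import holds locks on
all of them until commit, so it needs a couple hundred lock objects; a
transaction covering a single entry of a few dozen objects needs only
the ~5 tree levels plus a handful of leaf pages. That's why chunking
the import stays inside the default lock table and one giant
transaction doesn't.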
> I'm a bit surprised about this. Most relational databases use
> journalling and/or rollback files to ensure that they can commit or
> roll back large transactions as one atomic unit. The procedure is
> quite neatly described in http://www.sqlite.org/atomiccommit.html and
> http://www.sqlite.org/tempfiles.html, but works AFAIK more or less
> the same on other RDBMSs.
Actually, as I understand it, SQLite makes all the edits in user
memory rather than in a shared memory region, so it's just easier to
allocate the memory in that case. The tradeoff, as you point out, is
serializing side-effecting transactions at the whole-DB level. Though
if you had shared locks you could probably still pull off the same
rough model...
BDB is tuned for highly concurrent, but independent, write
transactions across multiple threads/processes. The shared memory
region is used for sharing the fine-grained locks as well as for a
shared cache, so that each process doesn't have to separately allocate
memory and so that processes can run at various levels of read
isolation. This unfortunately leads to the requirement to pre-allocate
that shared memory region, and to your current dilemma.
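If you'd rather raise those pre-allocated limits than restructure the
import, BDB reads a DB_CONFIG file from the environment home directory
when the environment is opened. Something along these lines, where the
numbers are purely illustrative and set_cachesize takes <gbytes>
<bytes> <ncache> (here 256MB in a single region):
  set_lk_max_locks 100000
  set_lk_max_objects 100000
  set_lk_max_lockers 10000
  set_cachesize 0 268435456 1
You'd still have to guess an upper bound for your biggest import,
though, which is exactly the problem.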
BDB does journaling for recovery/durability purposes, but if your
shared memory gets mucked up by an allocation error, you need to flush
everything (even the dirty stuff), recover from the last known good DB
state, and clean up any in-process transactions in the journal.
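(Concretely: shut all processes down and run BDB's db_recover utility
on the environment home, e.g. "db_recover -h /path/to/env", before
reopening the store.)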
> I've just tried it out with SQLite, which quite happily handles a
> transaction of 10 million inserts as an atomic unit both for commit
> and rollback; to be sure, in this case at the price of an extended
> read/write lock on the DB and, of course, not in one physical write
> operation. Main memory hardly enters the equation there.
Yep, this is definitely a sweet spot for SQLite.
We actually might be able to configure BDB to behave similarly (tune
it for single-writer, multiple-readers) as an optional setup for
Elephant, but we'd have to look into this a bit, as I'm not entirely
sure BDB can meet your constraints here. I'm a little stale on all
this these days...
Thanks,
Ian