[elephant-devel] Berkeley DB error: Cannot allocate memory.
Ian Eslick
eslick at media.mit.edu
Wed Sep 24 12:45:17 UTC 2008
> the main problem for us is that we automatically import data files and
> data fragments, where one of them can easily result in the creation of
> thousands or even tens of thousands of heavily cross-referencing
> objects, each with a number of index entries associated with it. The
> complexity of imports is not known in advance and can vary drastically
> from import to import.
>
> None of the objects is big in itself, but the number easily adds up to
> tens of thousands of inserts, updates, or deletes in one logical
> transaction (concurrency is less of an issue for our imports and we
> could easily live with blocking the DB during the various imports).
> Since transactions of this size don't work with BDB as a backend, we
> currently do not use transactions for the import as a whole. We do use
> them for the import of single entries, each consisting of a few dozen
> or so objects, to ensure consistency within an entry and to speed up
> insertions.
Then it sounds like you're doing the right thing already, given the
current constraints in Elephant. See some comments below.
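Roughly what I mean, as an untested sketch (ELE is Elephant's package
nickname; IMPORT-ENTRY stands in for your per-entry import code):
  ;; Commit each logical entry in its own transaction instead of
  ;; wrapping the whole import in one, so no single transaction ever
  ;; holds more than a few dozen objects' worth of page locks.
  (defun import-all (entries)
    (dolist (entry entries)
      (ele:with-transaction ()      ; one small txn per entry
        (import-entry entry))))     ; placeholder for your code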
>
>>
>> Max concurrent transactions is upper-bounded by your machine's
>> processing capacity, and the max # of locks is proportional to the
>> logarithm of the number of objects, which grows very slowly and so
>> can be given practical upper bounds.
> Naive question, but why is the # of locks proportional to the
> logarithm of the number of objects to be inserted? Well, probably I'll
> have to check how Elephant uses locks in its BTrees.
This discussion is about the BDB backend; Elephant itself makes no
commitments about locking in the various stores.
From the BDB docs:
"For Btree and Recno access methods, you will need one lock object per
level of the database tree, at a minimum. (Unless keys are quite large
with respect to the page size, neither Recno nor Btree database trees
should ever be deeper than five levels.) Then, you will need one lock
object for each leaf page of the database tree that will be
simultaneously accessed."
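To make that concrete, back of the envelope: with 4KB pages and on the
order of 50 key/data pairs per leaf page (this obviously depends on
your key and value sizes), 10,000 inserts touch roughly 10000/50 = 200
leaf pages. One transaction covering the whole import holds locks on
all of them until commit, so it needs a couple hundred lock objects; a
transaction covering a single entry of a few dozen objects needs only
the ~5 tree levels plus a handful of leaf pages. That's why chunking
the import stays inside the default lock table and one giant
transaction doesn't.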
> I'm a bit surprised about this. Most relational databases use
> journalling and/or rollback files to ensure that they can commit or
> roll back large transactions as one atomic unit. The procedure is
> quite neatly described in http://www.sqlite.org/atomiccommit.html and
> http://www.sqlite.org/tempfiles.html, but works AFAIK more or less
> the same on other RDBMSs.
Actually, as I understand it, SQLite makes all the edits in user
memory rather than in a shared memory region, so it's just easier to
allocate the memory in that case. The tradeoff, as you point out, is
serializing side-effecting transactions at the whole-DB level. Though
if you had shared locks you could probably still pull off the same
rough model...
BDB is tuned for highly concurrent, but independent, write
transactions across multiple threads/processes. The shared memory
region is used for sharing the fine-grained locks as well as for a
shared cache, so that each process doesn't have to separately allocate
memory and so that processes can run at various levels of read
isolation. This unfortunately leads to the requirement to pre-allocate
that shared memory region, and to your current dilemma.
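If you'd rather raise those pre-allocated limits than restructure the
import, BDB reads a DB_CONFIG file from the environment home directory
when the environment is opened. Something along these lines, where the
numbers are purely illustrative and set_cachesize takes <gbytes>
<bytes> <ncache> (here 256MB in a single region):
  set_lk_max_locks 100000
  set_lk_max_objects 100000
  set_lk_max_lockers 10000
  set_cachesize 0 268435456 1
You'd still have to guess an upper bound for your biggest import,
though, which is exactly the problem.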
BDB does journaling for recovery/durability purposes, but if your
shared memory gets mucked up by an allocation error, you need to flush
everything (even the dirty stuff), recover from the last known good DB
state, and clean up any in-process transactions in the journal.
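(Concretely: shut all processes down and run BDB's db_recover utility
on the environment home, e.g. "db_recover -h /path/to/env", before
reopening the store.)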
> I've just tried it out with SQLite, which quite happily handles a
> transaction of 10 million inserts as an atomic unit both for commit
> and rollback; to be sure, in this case at the price of an extended
> read/write lock on the DB and, of course, not in one physical write
> operation. Main memory hardly enters the equation there.
Yep, this is definitely a sweet spot for SQLite.
We actually might be able to configure BDB to behave similarly (tune
it for single-writer, multiple-readers) as an optional setup for
Elephant, but we'd have to look into this a bit, as I'm not entirely
sure BDB can meet your constraints here. I'm a little stale on all
this these days...
Thanks,
Ian