[elephant-devel] BDB Run recovery errors seen after switching to ELEPHANT-1-0-A2

Yarek Kowalik yarek.kowalik at gmail.com
Wed Mar 11 19:38:43 UTC 2009


More info about my configuration:

#+(and (or sbcl allegro openmcl lispworks) (not (or mswindows windows)) (not (or macosx darwin)))
((:compiler             . :gcc)
 (:berkeley-db-version         . "4.7")
 (:berkeley-db-include-dir    . "/usr/local/BerkeleyDB.4.7/include/")
 (:berkeley-db-lib-dir         . "/usr/local/BerkeleyDB.4.7/lib/")
 (:berkeley-db-lib         . "/usr/local/BerkeleyDB.4.7/lib/libdb-4.7.so")
 (:berkeley-db-deadlock     . "/usr/local/BerkeleyDB.4.7/bin/db_deadlock")
 (:berkeley-db-cachesize     . 20971520)
 (:berkeley-db-max-locks     . 2000)
 (:berkeley-db-max-objects     . 2000)
 (:berkeley-db-map-degree2     . t)
 (:clsql-lib-paths         . nil)
 (:prebuilt-libraries         . nil))
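
A rough shell sketch for sanity-checking that the Berkeley DB files named in the config above exist on a given machine. The function name check_bdb_paths is made up for illustration, and the default prefix is simply the one used in the config; treat this as a convenience check, not as part of Elephant.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elephant or Berkeley DB): report which of
# the include dir, lib dir, shared library and db_deadlock binary named in
# the config exist under an install prefix.
check_bdb_paths() {
    prefix="${1:-/usr/local/BerkeleyDB.4.7}"
    status=0
    for rel in include lib lib/libdb-4.7.so bin/db_deadlock; do
        if [ -e "$prefix/$rel" ]; then
            echo "ok      $prefix/$rel"
        else
            echo "MISSING $prefix/$rel"
            status=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return $status
}
```

Calling check_bdb_paths with no argument checks the prefix from the config; passing another directory checks that prefix instead.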



On Wed, Mar 11, 2009 at 11:42 AM, Yarek Kowalik <yarek.kowalik at gmail.com> wrote:

> - I did a fresh pull from darcs:
>
>   darcs get http://www.common-lisp.net/project/elephant/darcs/elephant-1.0
>
> - I rebuilt elephant, and weblocks/elephant
>
> - I set up my weblocks store with recover and register flags both set to T:
>
> (defstore *elephant-store* :elephant
>   :spec `(:BDB ,(namestring (merge-pathnames
>                              (make-pathname :directory '(:relative "data/store"))
>                              (asdf-system-directory :fashion-origami)))
>                 :register t :recover t))
>
> - I ran db_recover in the data store directory
>
> - I started the two processes (one serving port 80 the other 443).
>
> - Went to 443 first.  All is well.
>
> - Went to 80 next.  The web application dies (trace below).
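
The recovery step in the list above could look roughly like the following shell sketch. The names recover_env, DRY_RUN and DB_RECOVER are invented here, and the default binary location is an assumption based on the BerkeleyDB.4.7 prefix used for db_deadlock in the config. Recovery is normally meant to run while no other process has the environment open, so both Lisp processes should probably be stopped first.

```shell
#!/bin/sh
# Hypothetical wrapper around Berkeley DB's db_recover utility.
# -h names the environment (store) directory, -v asks for verbose output.
recover_env() {
    env_dir="$1"
    # Assumed location: the same bin/ directory that holds db_deadlock.
    db_recover_bin="${DB_RECOVER:-/usr/local/BerkeleyDB.4.7/bin/db_recover}"
    if [ ! -d "$env_dir" ]; then
        echo "no such environment directory: $env_dir" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be run.
        echo "$db_recover_bin -v -h $env_dir"
        return 0
    fi
    "$db_recover_bin" -v -h "$env_dir"
}
```

Running it as DRY_RUN=1 recover_env data/store prints the command without touching the store, which is a cheap way to confirm the paths before doing the real thing.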
>
> *Note:* The trace does not look much different from before.
>
> *Note2:* SLIME died when compiling elephant - got stuck on
>
> gcc -L/usr/local/BerkeleyDB.4.7/lib/ -I/usr/local/BerkeleyDB.4.7/include/
> -shared -march=x86-64 -fPIC -Wall -g -O2 -g
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.c -o
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.so -lm
>
> I recompiled from command line and got these warnings:
>
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.c: In
> function ‘lisp_compare2’:
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.c:1053:
> warning: unused variable ‘i’
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.c: In
> function ‘lisp_compare_key2’:
> /home/yarek/.sbcl/site/elephant-1.0/src/db-bdb/libberkeley-db.c:1147:
> warning: unused variable ‘i’
>
>
>
> Not sure how to proceed.  In order to ensure that I'm doing the right
> steps, can you please confirm:
>
> 1. Are my weblocks store settings correct?
> 2. Did I get the right repository from darcs?
> 3. Is manual compilation OK for libberkeley-db.so?
>
> Yarek
>
> Here is what I get as debug trace:
>
> Berkeley DB error #-30974: DB_RUNRECOVERY: Fatal error, run database
> recovery
>    [Condition of type ELEPHANT:BDB-DB-ERROR]
>
> Restarts:
>  0: [TERMINATE-THREAD] Terminate this thread (#<THREAD
> "hunchentoot-worker-2" RUNNING {1003705211}>)
>
> Backtrace:
>   0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION (LAMBDA #)
> {100296C8F9}>)
>   1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT:BDB-DB-ERROR {1003081261}>)
>   2: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {1003089A39}>)
>   3: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION {10036CB5E1}>
> #<CLOSURE (LAMBDA #) {1003089A59}>)
>   4: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {10036CB5E1}>
> #<CLOSURE (LAMBDA #) {1003089A39}>)
>   5: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {1003081261}>)
>   6: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {10030899F9}>)
>   7: (INVOKE-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {1003081261}>)
>   8: (INVOKE-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {1003081261}>)[:EXTERNAL]
>   9: ((SB-PCL::FAST-METHOD HUNCHENTOOT:MAYBE-INVOKE-DEBUGGER (T))
> #<unavailable argument> #<unavailable argument> #<ELEPHANT:BDB-DB-ERROR
> {1003081261}>)
>  10: (SIGNAL #<ELEPHANT:BDB-DB-ERROR {1003081261}>)[:EXTERNAL]
>  11: (ERROR #<ELEPHANT:BDB-DB-ERROR {1003081261}>)[:EXTERNAL]
>  12: ((FLET #:LAMBDA43) #<ELEPHANT:BDB-DB-ERROR {1003081261}>)
>  13: ((FLET #:LAMBDA43) #<ELEPHANT:BDB-DB-ERROR {1003081261}>)[:EXTERNAL]
>  14: (SIGNAL #<ELEPHANT:BDB-DB-ERROR {1003081261}>)[:EXTERNAL]
>  15: (ERROR ELEPHANT:BDB-DB-ERROR)[:EXTERNAL]
>       Locals:
>         SB-DEBUG::ARG-0 = 3
>         SB-DEBUG::ARG-1 = ELEPHANT:BDB-DB-ERROR
>  16: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unavailable argument> #<unavailable
> argument> #<unavailable argument> #<unavailable argument>)[:EXTERNAL]
>       Locals:
>         SB-DEBUG::ARG-0 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-2 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-3 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-4 = :<NOT-AVAILABLE>
>  17: (ELEPHANT::MAP-BTREE-VALUES #<unavailable lambda list>)
>       [No Locals]
>  18: (ELEPHANT::GET-DB-SCHEMAS #<unavailable lambda list>)
>       [No Locals]
>  19: (ELEPHANT:MAP-CLASS #<unavailable argument> #<unavailable
> argument>)[:EXTERNAL]
>       Locals:
>         SB-DEBUG::ARG-0 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-2 = :<NOT-AVAILABLE>
>  20: ((SB-PCL::FAST-METHOD WEBLOCKS:COUNT-PERSISTENT-OBJECTS
> (WEBLOCKS-ELEPHANT:ELEPHANT-STORE T)) ..)[:EXTERNAL]
>       Locals:
>         SB-DEBUG::ARG-0 = 4
>         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-2 = :<NOT-AVAILABLE>
>         SB-DEBUG::ARG-3 = #<WEBLOCKS-ELEPHANT:ELEPHANT-STORE {1002D92011}>
>         SB-DEBUG::ARG-4 = FASHION-ORIGAMI::PRODUCT-SET
>
>
>
> On Wed, Mar 11, 2009 at 10:38 AM, Ian Eslick <eslick at media.mit.edu> wrote:
>
>> BDB docs state that :register t :recover t solve the multi-process recover
>> problem.  If you have already registered a process, it will inhibit recovery
>> when a second process connects.
>>
>> I have another user with a deserialization error problem.  Try updating to
>> the latest elephant-1.0 and you should see an updated deserialization-error
>> report in the backtrace that tells you more about why there was an error.
>>
>> Ian
>>
>>
>> On Mar 11, 2009, at 1:28 PM, Yarek Kowalik wrote:
>>
>>> Ok so that did not work.
>>>
>>> I remember that last time I had to remove :recover t from OPEN-STORE
>>> method in weblocks/src/store/elephant/elephant.lisp since that was the
>>> source of problems with opening a store with two processes.  I think it
>>> forces the recovery mode at connection startup.
>>>
>>> I wonder if I should try :recover nil :register t -- doing that now.
>>>
>>> Yarek
>>>
>>> On Wed, Mar 11, 2009 at 10:06 AM, Yarek Kowalik <yarek.kowalik at gmail.com>
>>> wrote:
>>> Info:
>>>
>>> - BDB 4.7.
>>> - Ubuntu 8.04 on both 32 bit (Intel and whatever Amazon EC2 is using) and
>>> 64 bit (AMD X2 64) versions.
>>> - got the ELEPHANT-1-0-A2 via darcs
>>>
>>> I'm trying out the :register t and :recover t options now.
>>>
>>> Yarek
>>>
>>>
>>> On Tue, Mar 10, 2009 at 4:20 PM, Ian Eslick <eslick at media.mit.edu>
>>> wrote:
>>> You're using BDB 4.7, right?  What machine, os, word-width, etc?
>>>
>>> Add :register t and :recover t to the open-store keyword list.
>>>
>>> You can also try downloading from the elephant-1.0 repo but using:
>>>
>>> darcs get --tag=ELEPHANT-1-0-A2
>>> http://www.common-lisp.net/project/elephant/darcs/elephant-1.0
>>>
>>> Ian
>>>
>>>
>>>
>>>
>>> On Mar 10, 2009, at 6:57 PM, Yarek Kowalik wrote:
>>>
>>> > Hi Ian,
>>> >
>>> > Thanks for replying.
>>> >
>>> > The only difference on my end between Jan version and now is the
>>> > version of Elephant. I can switch back to the unstable version and
>>> > the current app works fine.
>>> >
>>> > Re: slots on the controller: all are set to some value - none are
>>> > unbound, but some are NIL (see below).
>>> >
>>> > The only reason I was upgrading was due to some other errors seen with
>>> > map-inverted-index, which returned NILs (though there is a way to
>>> > clean those up).
>>> >
>>> > I hope that Leslie can shed more light. Is there a way to set the
>>> > 'register' flag?
>>> >
>>> > Yarek
>>> >
>>> >
>>> > #<DB-BDB::BDB-STORE-CONTROLLER {10024ADF61}>
>>> > --------------------
>>> > Class: #<STANDARD-CLASS DB-BDB::BDB-STORE-CONTROLLER>
>>> > --------------------
>>> > All Slots:
>>> > BTREES                 = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X006417A0 :TYPE (* T)> [set value] [make unbound]
>>> > CID-SEQ                = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X006439A0 :TYPE (* T)> [set value] [make unbound]
>>> > DB                     = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X00641060 :TYPE (* T)> [set value] [make unbound]
>>> > DB-VERSION             = 100 [set value] [make unbound]
>>> > DEADLOCK-DETECT-THREAD = NIL [set value] [make unbound]
>>> > DEADLOCK-PID           = NIL [set value] [make unbound]
>>> > DESERIALIZE            = ELEPHANT-SERIALIZER2::DESERIALIZE [set
>>> > value] [make unbound]
>>> > DESERIALIZE-FN         = #<FUNCTION (SB-C::&OPTIONAL-DISPATCH ..)>
>>> > [set value] [make unbound]
>>> > DUP-BTREES             = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X00641EE0 :TYPE (* T)> [set value] [make unbound]
>>> > ENVIRONMENT            = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X0063F320 :TYPE (* T)> [set value] [make unbound]
>>> > GC-MARK-LIST           = NIL [set value] [make unbound]
>>> > GC-MARK-TABLE          = NIL [set value] [make unbound]
>>> > GC-MARKING-P           = NIL [set value] [make unbound]
>>> > GC-MAX-OID             = NIL [set value] [make unbound]
>>> > INDEX-TABLE            = #<BDB-BTREE oid:-2> [set value] [make
>>> > unbound]
>>> > INDICES                = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X00642620 :TYPE (* T)> [set value] [make unbound]
>>> > INDICES-ASSOC          = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X00642D60 :TYPE (* T)> [set value] [make unbound]
>>> > INSTANCE-CACHE         = #<HASH-TABLE :TEST EQL :COUNT 8
>>> > {100271C671}> [set value] [make unbound]
>>> > INSTANCE-CACHE-LOCK    = #S(SB-THREAD:MUTEX :NAME NIL :%OWNER
>>> > NIL :STATE 0) [set value] [make unbound]
>>> > INSTANCE-CLASS-INDEX   = #<BDB-BTREE-INDEX oid:1> [set value] [make
>>> > unbound]
>>> > INSTANCE-TABLE         = #<BDB-INDEXED-BTREE oid:-3> [set value]
>>> > [make unbound]
>>> > METADATA               = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X006409B0 :TYPE (* T)> [set value] [make unbound]
>>> > OID-DB                 = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X006434E0 :TYPE (* T)> [set value] [make unbound]
>>> > OID-SEQ                = #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP
>>> > #X00643B70 :TYPE (* T)> [set value] [make unbound]
>>> > ROOT                   = #<BDB-BTREE oid:-1> [set value] [make
>>> > unbound]
>>> > SCHEMA-CACHE           = #<HASH-TABLE :TEST EQ :COUNT 0
>>> > {100271C561}> [set value] [make unbound]
>>> > SCHEMA-CACHE-LOCK      = #S(SB-THREAD:MUTEX :NAME NIL :%OWNER
>>> > NIL :STATE 0) [set value] [make unbound]
>>> > SCHEMA-CLASSES         = NIL [set value] [make unbound]
>>> > SCHEMA-NAME-INDEX      = #<BDB-BTREE-INDEX oid:0> [set value] [make
>>> > unbound]
>>> > SCHEMA-TABLE           = #<BDB-INDEXED-BTREE oid:-4> [set value]
>>> > [make unbound]
>>> > SERIALIZE              = ELEPHANT-SERIALIZER2::SERIALIZE [set value]
>>> > [make unbound]
>>> > SERIALIZE-FN           = #<FUNCTION ELEPHANT-SERIALIZER2::SERIALIZE>
>>> > [set value] [make unbound]
>>> > SERIALIZER-VERSION     = 2 [set value] [make unbound]
>>> > SPEC                   = (:BDB "/home/yarek/lisp/projects/zzz/data/
>>> > store/" :RECOVER NIL) [set value] [make unbound]
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Mar 10, 2009 at 2:50 PM, Ian Eslick <eslick at media.mit.edu>
>>> > wrote:
>>> > Unfortunately that's not a highly informative backtrace.  Did you
>>> > upgrade to the latest, and this caused it, or did something suddenly
>>> > change that caused the January '09 version to stop working?
>>> >
>>> > Some possible sources of these problems:
>>> >
>>> > 1) Somehow the 'register' flag that helps support multiple processes
>>> > is causing problems; it is no longer set by default I believe.
>>> > (Leslie may know more)
>>> >
>>> > 2) The store-controller is not being opened properly.  Are all the
>>> > slots set in the controller after the second process is opened?
>>> >
>>> > Can you be more specific about what changed between January and now?
>>> >
>>> > Ian
>>> >
>>> > On Mar 10, 2009, at 5:31 PM, Yarek Kowalik wrote:
>>> >
>>> > > Hi folks,
>>> > >
>>> > > I have two processes accessing the same BDB.  One process manages
>>> > > weblocks requests on port 80, the other on port 443. In the elephant
>>> > > from last January, I was able to start, connect and use BDB from
>>> > > both processes.  Now, when the user is redirected to port 443 and
>>> > > the process tries to retrieve data from the BDB, I get a
>>> > > DB_RUNRECOVERY error (see trace below).  This happens when the port
>>> > > 443 process connects to BDB for the very first time.
>>> > >
>>> > > I think I have seen this before, and I think it had to do with some
>>> > > default configuration on the controller, some argument that forced
>>> > > the BDB into recovery mode when the process first starts up.
>>> > >
>>> > > Any idea how to resolve this? It's killing my secure connection on
>>> > > my web app - it's urgent.
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Yarek
>>> > >
>>> > >
>>> > >
>>> > > Berkeley DB error #-30974: DB_RUNRECOVERY: Fatal error, run database
>>> > > recovery
>>> > >    [Condition of type ELEPHANT:BDB-DB-ERROR]
>>> > >
>>> > > Restarts:
>>> > >  0: [TERMINATE-THREAD] Terminate this thread (#<THREAD
>>> > > "hunchentoot-worker-6" RUNNING {B6CD101}>)
>>> > >
>>> > > Backtrace:
>>> > >   0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION (LAMBDA
>>> > > #) {AD4FBA5}>)
>>> > >   1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)
>>> > >   2: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>> > > SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {B729FED}>)
>>> > >   3: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION {AF23819}>
>>> > > #<CLOSURE (LAMBDA #) {B729FFD}>)
>>> > >   4: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {AF23819}>
>>> > > #<CLOSURE (LAMBDA #) {B729FED}>)
>>> > >   5: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)
>>> > >   6: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>> > > SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {B729FCD}>)
>>> > >   7: (INVOKE-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)
>>> > >   8: (INVOKE-DEBUGGER #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)[:EXTERNAL]
>>> > >   9: ((SB-PCL::FAST-METHOD HUNCHENTOOT:MAYBE-INVOKE-DEBUGGER (T))
>>> > > #<unavailable argument> #<unavailable argument>
>>> > > #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)
>>> > >  10: (SIGNAL #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)[:EXTERNAL]
>>> > >  11: (ERROR #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)[:EXTERNAL]
>>> > >  12: ((FLET #:LAMBDA43) #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)
>>> > >  13: ((FLET #:LAMBDA43) #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)[:EXTERNAL]
>>> > >  14: (SIGNAL #<ELEPHANT:BDB-DB-ERROR {B729BF9}>)[:EXTERNAL]
>>> > >  15: (ERROR ELEPHANT:BDB-DB-ERROR)[:EXTERNAL]
>>> > >       Locals:
>>> > >         SB-DEBUG::ARG-0 = 3
>>> > >         SB-DEBUG::ARG-1 = ELEPHANT:BDB-DB-ERROR
>>> > >  16: ((SB-PCL::FAST-METHOD ELEPHANT:GET-VALUE (T DB-BDB::BDB-BTREE))
>>> > > #<unavailable lambda list>)
>>> > >       [No Locals]
>>> > >  17: (ELEPHANT::ENSURE-SLOT-DEF-INDEX #<unavailable argument>
>>> > > #<unavailable argument>)
>>> > >       Locals:
>>> > >         SB-DEBUG::ARG-0 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>>> > >  18: ((SB-PCL::FAST-METHOD ELEPHANT:FIND-INVERTED-INDEX
>>> > > (ELEPHANT:PERSISTENT-METACLASS T)) #<unavailable argument>
>>> > > #<unavailable argument> #<unavailable argument> #<unavailable
>>> > > argument>)[:EXTERNAL]
>>> > >       Locals:
>>> > >         SB-DEBUG::ARG-0 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-2 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-3 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-4 = :<NOT-AVAILABLE>
>>> > >  19: (ELEPHANT:MAP-INVERTED-INDEX #<unavailable argument>
>>> > > #<unavailable argument> #<unavailable argument>)[:EXTERNAL]
>>> > >       Locals:
>>> > >         SB-DEBUG::ARG-0 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-2 = :<NOT-AVAILABLE>
>>> > >         SB-DEBUG::ARG-3 = :<NOT-AVAILABLE>
>>> > >
>>> > >
>>> > >
>>> > > _______________________________________________
>>> > > elephant-devel site list
>>> > > elephant-devel at common-lisp.net
>>> > > http://common-lisp.net/mailman/listinfo/elephant-devel
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>
>