[elephant-devel] Strange behavior with indexed-btree

Ian Eslick eslick at media.mit.edu
Sun Jan 11 03:51:31 UTC 2009


This is very odd behavior.  I have a live webapp that uses all this  
stuff pretty heavily.

I did see this active cursor message earlier, but it was due to some  
debugging churn in the BDB transaction handler a day or two ago.   
Those issues should all be resolved now, but it's possible something  
isn't working right.  Since I did make some changes to the main  
transaction macros and that means that everything depending on those  
macros needs to be compiled - asdf should do that automatically but  
you may want to force a rebuild on the new elephant tree and your app.

Sometimes I find it useful to manually run a full, recover-fatal pass  
manually just to be sure BDB itself is happy.  (db_recover -f)

Deserialization errors have become pretty rare.  I suppose it is  
possible that you have a database that has a value that somehow got  
corrupted.  Have you tried your app on a fresh database?  It might be  
worth doing that to see if you can reproduce the problem.  (How  
critical is the data you have in there now?)

By the way:
- Do you have a 32 or 64 bit BDB/SBCL?
- Have you double checked your my-config.sexp to make sure everything  
is set right?

Can you produce a stand-alone test case I can try locally?  If there  
aren't too many dependencies, I'd be happy to try reproduce this  
locally or login to a machine and look at it remotely.

By the way, stack traces should be a bit more meaningful now if you  
want to send me the latest full stack trace.

Another thing you can try is tracing 'get-value' and (setf get-value)  
to see all the data you are reading/writing; perhaps it's one  
particular value that is causing the serializer to choke?

Ian

On Jan 10, 2009, at 6:27 PM, Kevin Raison wrote:

> Unfortunately, things remain as before.  Here is some additional debug
> output that I found in the *inferior-lisp* buffer of my emacs session:
>
> Deserialization error in map: returning nil for element
> transaction has active cursors
> PANIC: Invalid argument
> PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
> PANIC: fatal region error detected; run recovery
>
>
>
> The deserialization error happens in one of two spots:
> - when trying to get-from-root a btree or indexed btree
> - when trying to get a pset via a call to get-value on the btree
>
> Running recovery does not help;  the errors start as soon as I load  
> and
> execute my code post-recovery.
>
> Ian, can you enlighten me as to where in the code you believe these
> errors to originate?  Also, can you think of anything that I might be
> doing incorrectly in my code that would cause this sort of behavior?
> Here is a brief outline of what I am doing:
>
> There is
>  - a waitqueue (via sbcl's sb-thread:make-waitqueue)
>  - a tcp listener thread that writes its input to an incoming work  
> queue
>  - 3 reader threads who use the waitqueue to get work items from the
> work queue
> The readers parse the incoming data and create a persistent object  
> from
> it.  They also index the individual words of one of the persistent
> object's slots using a btree.  These actions are contained in a
> with-transaction block.
> There is also a hunchentoot web server thread running.  Users can  
> search
> the indexed text via this web interface.
>
> It is really rather simple.  The errors crop up randomly, sometimes  
> when
> I am running searches via the web interface while the listener and
> reader threads are active, and sometimes while the listener and reader
> threads are doing their thing without any web-based reads happening.
>
> Cheers,
> Kevin
>
>
> Kevin Raison wrote:
>> Thanks, Ian.  All tests pass for me as well.  I am running my
>> application with the new code and will let you know how it goes.
>>
>> Thanks again!
>> Kevin
>>
>> Ian Eslick wrote:
>>> I just checked in a few more fixes a few minutes ago.  I finally was
>>> able to reproduce some of this locally.  Try cleaning out the test
>>> database and re-running the tests.  Everything passes for me on a
>>> fresh DB and my own application is running fine too.
>>>
>>> Ian
>>>
>>> On Jan 10, 2009, at 2:04 PM, Kevin Raison wrote:
>>>
>>>> After pulling the latest patches and rebuilding without
>>>> optimizations as
>>>> you suggest, I get the following errors, the second of which is
>>>> slightly
>>>> different than what I was receiving previously.
>>>>
>>>> First, a deserialization error:
>>>> Condition ELEPHANT-TYPE-DESERIALIZATION-ERROR was signalled.
>>>>   [Condition of type ELEPHANT-TYPE-DESERIALIZATION-ERROR]
>>>>
>>>> Restarts:
>>>> 0: [RETRY] Retry SLIME REPL evaluation request.
>>>> 1: [ABORT] Return to SLIME's top level.
>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>>> RUNNING {10034F31F1}>)
>>>>
>>>> Backtrace:
>>>>  0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION  
>>>> (LAMBDA #)
>>>> {10040B2389}>)
>>>>  1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>> {1002CB3531}>)
>>>>  2: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-
>>>> ERROR
>>>> {1002CB3531}>)
>>>>  3: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {1002CB3AC9}>)
>>>>  4: (INVOKE-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>> {1002CB3531}>)
>>>>  5: ((FLET #:FUN23) #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>> {1002CB3531}>)
>>>>  6: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) ..)[:EXTERNAL]
>>>>  7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (REINDEX-LOG-ENTRIES) #<NULL-
>>>> LEXENV>)
>>>>  8: (SWANK::EVAL-REGION "(reindex-log-entries)\n")
>>>>  9: ((LAMBDA ()))
>>>> 10: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {1002C596F9}>)
>>>> 11: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>>> request." #<CLOSURE (LAMBDA #) {1002C59619}>)
>>>> 12: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>>> {1002C595E9}>)
>>>> 13: (SWANK::REPL-EVAL "(reindex-log-entries)\n")
>>>>
>>>> And then this one with any subsequent access to the bdb:
>>>>
>>>> Bad type argument:
>>>>
>>>>
>>>>  BDB-DB-ERROR
>>>>   [Condition of type SIMPLE-TYPE-ERROR]
>>>>
>>>> Restarts:
>>>> 0: [RETRY] Retry SLIME REPL evaluation request.
>>>> 1: [ABORT] Return to SLIME's top level.
>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>>> RUNNING {10034F31F1}>)
>>>>
>>>> Backtrace:
>>>>  0: (MAKE-CONDITION BDB-DB-ERROR)[:EXTERNAL]
>>>>  1: (ERROR BDB-DB-ERROR)[:EXTERNAL]
>>>>  2: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unused argument> #<unused
>>>> argument>
>>>> #<BDB-STORE-CONTROLLER /srv/console/db/> #<CLOSURE (LAMBDA #)
>>>> {10037C53D9$
>>>>  3: (ADD-LOG-ENTRY)[:EXTERNAL]
>>>>  4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ADD-LOG-ENTRY :SEVERITY 2
>>>> :HOST-UUID "123" :TIMESTAMP ...) #<NULL-LEXENV>)
>>>>  5: (SWANK::EVAL-REGION "(add-log-entry :severity 2 :host-uuid
>>>> \"123\"
>>>> :timestamp (get-universal-time)\n                       :program
>>>> \"blah\" :text \"test entry\")\n")
>>>>  6: ((LAMBDA ()))
>>>>  7: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {10037C1BF9}>)
>>>>  8: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>>> request." #<CLOSURE (LAMBDA #) {1003636499}>)
>>>>  9: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>>> {1003636469}>)
>>>> 10: (SWANK::REPL-EVAL "(add-log-entry :severity 2 :host-uuid  
>>>> \"123\"
>>>> :timestamp (get-universal-time)\n                 :program \"blah\"
>>>> :text \"test entry\")\n")
>>>> 11: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(add-log-
>>>> entry
>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>>                :program \"blah\" :text \"test entry\")\n"$
>>>> 12: (SWANK::EVAL-FOR-EMACS (SWANK:LISTENER-EVAL "(add-log-entry
>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>>               :program \"blah\" :text \"test entry\")\n") "CONSO$
>>>> 13: (SWANK::PROCESS-REQUESTS NIL)
>>>> 14: ((LAMBDA ()))
>>>> 15: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>> 16: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION  
>>>> {10032FB281}>
>>>> #<CLOSURE (LAMBDA #) {10038AD289}>)
>>>> 17: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {10032FB281}>
>>>> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>> 18: (SWANK::HANDLE-REQUESTS #<SWANK::CONNECTION {10032FB281}> NIL)
>>>> 19: (SWANK::CALL-WITH-BINDINGS NIL #<CLOSURE (LAMBDA #)
>>>> {10034F5189}>)
>>>> 20: ((FLET SB-THREAD::WITH-MUTEX-THUNK))
>>>> 21: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477))
>>>> 22: (SB-THREAD::CALL-WITH-MUTEX ..)
>>>> 23: ((LAMBDA ()))
>>>> 24: ("foreign function: call_into_lisp")
>>>> 25: ("foreign function: new_thread_trampoline")
>>>>
>>>>
>>>> Ian Eslick wrote:
>>>>> Also try pushing :elephant-without-optimize onto *features*  
>>>>> prior to
>>>>> doing a force rebuild of elephant.  This should give you some more
>>>>> information and rule out any optimization/declaration related  
>>>>> bugs.
>>>>>
>>>>> Thank you,
>>>>> Ian
>>>>>
>>>>>
>>>>> On Jan 7, 2009, at 9:10 PM, Kevin Raison wrote:
>>>>>
>>>>>> I was finally able to recreate this in the repl (as opposed to
>>>>>> seeing it
>>>>>> in my error logs), so here is a trace:
>>>>>>
>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>>> #<BDB-INDEXED-BTREE oid:2>.
>>>>>>  [Condition of type UNBOUND-SLOT]
>>>>>>
>>>>>> Backtrace:
>>>>>> 0: ((SB-PCL::FAST-METHOD SLOT-UNBOUND (T T T)) #<unavailable
>>>>>> argument> #<unavailable argument> #<unavailable argument>
>>>>>> #<BDB-INDEXED-BTREE oid:2> DB-BDB::INDICES-CACHE)
>>>>>> 1: (SB-PCL::SLOT-UNBOUND-INTERNAL #<BDB-INDEXED-BTREE oid:2> 2)
>>>>>> 2: ((SB-PCL::FAST-METHOD (SETF GET-VALUE) (T T
>>>>>> DB-BDB::BDB-INDEXED-BTREE)) #<unavailable lambda list>)
>>>>>> 3: ((LAMBDA (WORD)) "34766")
>>>>>> 4: (SB-IMPL::MAP1 #<CLOSURE (LAMBDA #) {1003CA4ED9}> (("" "asa"
>>>>>> "106007" "deny" "inbound" "udp" ...)) :LIST T)
>>>>>> 5: (MAPCAR #<CLOSURE (LAMBDA #) {1003CA4ED9}> ("" "asa" "106007"
>>>>>> "deny" "inbound" "udp" ...))[:EXTERNAL]
>>>>>> 6: ((SB-PCL::FAST-METHOD INDEX-LOG-ENTRY (LOG-ENTRY))  
>>>>>> #<unavailable
>>>>>> argument> #<unavailable argument> #<LOG-ENTRY oid:6373>)
>>>>>> 7: ((LAMBDA ()))
>>>>>> 8: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unavailable argument>
>>>>>> #<unavailable
>>>>>> argument> #<unavailable argument> #<unavailable argument>)
>>>>>> [:EXTERNAL]
>>>>>> 9: (ADD-LOG-ENTRY ..)
>>>>>> 10: (SB-INT:SIMPLE-EVAL-IN-LEXENV ..)
>>>>>>
>>>>>> And here is the code that sometimes causes the issue:
>>>>>> (defmethod index-log-entry ((log-entry log-entry))
>>>>>> (let ((*store-controller* *syslog-controller*))
>>>>>>   (let ((btree (get-from-root "log-entry-index" :sc
>>>>>> *syslog-controller*)))
>>>>>>     (map-words #'(lambda (word)
>>>>>>                    (unless (or (member word *superfluous-
>>>>>> words* :test
>>>>>> #'string-equal)
>>>>>>                                (< (length word) 2))
>>>>>>                      (unless (existsp word btree)
>>>>>>                        (setf (get-value word btree) (make- 
>>>>>> pset :sc
>>>>>> *syslog-controller*)))
>>>>>>                      (insert-item log-entry
>>>>>>                                   (get-value word btree))))
>>>>>>                (text log-entry)))))
>>>>>>
>>>>>>
>>>>>> If I drop and recreate the btree, everything runs smoothly (no
>>>>>> errors)
>>>>>> for a few hours and then the above error message starts showing  
>>>>>> up
>>>>>> about
>>>>>> half to three quarters of the time.
>>>>>>
>>>>>> Thanks again,
>>>>>> Kevin
>>>>>>
>>>>>>
>>>>>> Kevin Raison wrote:
>>>>>>> I am seeing an intermittent error with 1.0 alpha when trying to
>>>>>>> write to
>>>>>>> an indexed btree (using BerkeleyDB 4.7 as provided by Ubuntu's
>>>>>>> package
>>>>>>> repositories):
>>>>>>>
>>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>>>> #<BDB-INDEXED-BTREE oid:2>
>>>>>>>
>>>>>>> Within the same thread, sometimes this happens and sometimes I  
>>>>>>> am
>>>>>>> able
>>>>>>> to read and write to the btree.  I am using sbcl 1.0.24 on 32  
>>>>>>> bit
>>>>>>> intel
>>>>>>> linux.  Is there something obvious that might help alleviate  
>>>>>>> this
>>>>>>> or
>>>>>>> should I provide more context?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Kevin
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> elephant-devel site list
>>>>>>> elephant-devel at common-lisp.net
>>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>>> _______________________________________________
>>>>>> elephant-devel site list
>>>>>> elephant-devel at common-lisp.net
>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>> _______________________________________________
>>>>> elephant-devel site list
>>>>> elephant-devel at common-lisp.net
>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>> _______________________________________________
>>>> elephant-devel site list
>>>> elephant-devel at common-lisp.net
>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel at common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel





More information about the elephant-devel mailing list