[elephant-devel] Strange behavior with indexed-btree

Kevin Raison raison at chatsubo.net
Sat Jan 10 23:27:43 UTC 2009


Unfortunately, things remain as before.  Here is some additional debug 
output that I found in the *inferior-lisp* buffer of my emacs session:

Deserialization error in map: returning nil for element
transaction has active cursors
PANIC: Invalid argument
PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: fatal region error detected; run recovery 
 


The deserialization error happens in one of two spots:
- when trying to get-from-root a btree or indexed btree
- when trying to get a pset via a call to get-value on the btree

Running recovery does not help;  the errors start as soon as I load and 
execute my code post-recovery.

Ian, can you enlighten me as to where in the code you believe these 
errors to originate?  Also, can you think of anything that I might be 
doing incorrectly in my code that would cause this sort of behavior? 
Here is a brief outline of what I am doing:

There is
  - a waitqueue (via sbcl's sb-thread:make-waitqueue)
  - a tcp listener thread that writes its input to an incoming work queue
  - 3 reader threads who use the waitqueue to get work items from the 
work queue
The readers parse the incoming data and create a persistent object from 
it.  They also index the individual words of one of the persistent 
object's slots using a btree.  These actions are contained in a 
with-transaction block.
There is also a hunchentoot web server thread running.  Users can search 
the indexed text via this web interface.

It is really rather simple.  The errors crop up randomly, sometimes when 
I am running searches via the web interface while the listener and 
reader threads are active, and sometimes while the listener and reader 
threads are doing their thing without any web-based reads happening.

Cheers,
Kevin


Kevin Raison wrote:
> Thanks, Ian.  All tests pass for me as well.  I am running my 
> application with the new code and will let you know how it goes.
> 
> Thanks again!
> Kevin
> 
> Ian Eslick wrote:
>> I just checked in a few more fixes a few minutes ago.  I finally was  
>> able to reproduce some of this locally.  Try cleaning out the test  
>> database and re-running the tests.  Everything passes for me on a  
>> fresh DB and my own application is running fine too.
>>
>> Ian
>>
>> On Jan 10, 2009, at 2:04 PM, Kevin Raison wrote:
>>
>>> After pulling the latest patches and rebuilding without  
>>> optimizations as
>>> you suggest, I get the following errors, the second of which is  
>>> slightly
>>> different than what I was receiving previously.
>>>
>>> First, a deserialization error:
>>> Condition ELEPHANT-TYPE-DESERIALIZATION-ERROR was signalled.
>>>    [Condition of type ELEPHANT-TYPE-DESERIALIZATION-ERROR]
>>>
>>> Restarts:
>>>  0: [RETRY] Retry SLIME REPL evaluation request.
>>>  1: [ABORT] Return to SLIME's top level.
>>>  2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>> RUNNING {10034F31F1}>)
>>>
>>> Backtrace:
>>>   0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION (LAMBDA #)
>>> {10040B2389}>)
>>>   1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>> {1002CB3531}>)
>>>   2: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION- 
>>> ERROR
>>> {1002CB3531}>)
>>>   3: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {1002CB3AC9}>)
>>>   4: (INVOKE-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-ERROR  
>>> {1002CB3531}>)
>>>   5: ((FLET #:FUN23) #<ELEPHANT-TYPE-DESERIALIZATION-ERROR  
>>> {1002CB3531}>)
>>>   6: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>> (DB-BDB::BDB-STORE-CONTROLLER T)) ..)[:EXTERNAL]
>>>   7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (REINDEX-LOG-ENTRIES) #<NULL- 
>>> LEXENV>)
>>>   8: (SWANK::EVAL-REGION "(reindex-log-entries)\n")
>>>   9: ((LAMBDA ()))
>>>  10: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {1002C596F9}>)
>>>  11: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>> request." #<CLOSURE (LAMBDA #) {1002C59619}>)
>>>  12: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>> {1002C595E9}>)
>>>  13: (SWANK::REPL-EVAL "(reindex-log-entries)\n")
>>>
>>> And then this one with any subsequent access to the bdb:
>>>
>>> Bad type argument:
>>>
>>>
>>>   BDB-DB-ERROR
>>>    [Condition of type SIMPLE-TYPE-ERROR]
>>>
>>> Restarts:
>>>  0: [RETRY] Retry SLIME REPL evaluation request.
>>>  1: [ABORT] Return to SLIME's top level.
>>>  2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>> RUNNING {10034F31F1}>)
>>>
>>> Backtrace:
>>>   0: (MAKE-CONDITION BDB-DB-ERROR)[:EXTERNAL]
>>>   1: (ERROR BDB-DB-ERROR)[:EXTERNAL]
>>>   2: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unused argument> #<unused  
>>> argument>
>>> #<BDB-STORE-CONTROLLER /srv/console/db/> #<CLOSURE (LAMBDA #)  
>>> {10037C53D9$
>>>   3: (ADD-LOG-ENTRY)[:EXTERNAL]
>>>   4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ADD-LOG-ENTRY :SEVERITY 2
>>> :HOST-UUID "123" :TIMESTAMP ...) #<NULL-LEXENV>)
>>>   5: (SWANK::EVAL-REGION "(add-log-entry :severity 2 :host-uuid  
>>> \"123\"
>>> :timestamp (get-universal-time)\n                       :program
>>> \"blah\" :text \"test entry\")\n")
>>>   6: ((LAMBDA ()))
>>>   7: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {10037C1BF9}>)
>>>   8: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>> request." #<CLOSURE (LAMBDA #) {1003636499}>)
>>>   9: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>> {1003636469}>)
>>>  10: (SWANK::REPL-EVAL "(add-log-entry :severity 2 :host-uuid \"123\"
>>> :timestamp (get-universal-time)\n                 :program \"blah\"
>>> :text \"test entry\")\n")
>>>  11: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(add-log- 
>>> entry
>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>                 :program \"blah\" :text \"test entry\")\n"$
>>>  12: (SWANK::EVAL-FOR-EMACS (SWANK:LISTENER-EVAL "(add-log-entry
>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>                :program \"blah\" :text \"test entry\")\n") "CONSO$
>>>  13: (SWANK::PROCESS-REQUESTS NIL)
>>>  14: ((LAMBDA ()))
>>>  15: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>  16: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION {10032FB281}>
>>> #<CLOSURE (LAMBDA #) {10038AD289}>)
>>>  17: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {10032FB281}>
>>> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>  18: (SWANK::HANDLE-REQUESTS #<SWANK::CONNECTION {10032FB281}> NIL)
>>>  19: (SWANK::CALL-WITH-BINDINGS NIL #<CLOSURE (LAMBDA #)  
>>> {10034F5189}>)
>>>  20: ((FLET SB-THREAD::WITH-MUTEX-THUNK))
>>>  21: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477))
>>>  22: (SB-THREAD::CALL-WITH-MUTEX ..)
>>>  23: ((LAMBDA ()))
>>>  24: ("foreign function: call_into_lisp")
>>>  25: ("foreign function: new_thread_trampoline")
>>>
>>>
>>> Ian Eslick wrote:
>>>> Also try pushing :elephant-without-optimize onto *features* prior to
>>>> doing a force rebuild of elephant.  This should give you some more
>>>> information and rule out any optimization/declaration related bugs.
>>>>
>>>> Thank you,
>>>> Ian
>>>>
>>>>
>>>> On Jan 7, 2009, at 9:10 PM, Kevin Raison wrote:
>>>>
>>>>> I was finally able to recreate this in the repl (as opposed to
>>>>> seeing it
>>>>> in my error logs), so here is a trace:
>>>>>
>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>> #<BDB-INDEXED-BTREE oid:2>.
>>>>>   [Condition of type UNBOUND-SLOT]
>>>>>
>>>>> Backtrace:
>>>>>  0: ((SB-PCL::FAST-METHOD SLOT-UNBOUND (T T T)) #<unavailable
>>>>> argument> #<unavailable argument> #<unavailable argument>
>>>>> #<BDB-INDEXED-BTREE oid:2> DB-BDB::INDICES-CACHE)
>>>>>  1: (SB-PCL::SLOT-UNBOUND-INTERNAL #<BDB-INDEXED-BTREE oid:2> 2)
>>>>>  2: ((SB-PCL::FAST-METHOD (SETF GET-VALUE) (T T
>>>>> DB-BDB::BDB-INDEXED-BTREE)) #<unavailable lambda list>)
>>>>>  3: ((LAMBDA (WORD)) "34766")
>>>>>  4: (SB-IMPL::MAP1 #<CLOSURE (LAMBDA #) {1003CA4ED9}> (("" "asa"
>>>>> "106007" "deny" "inbound" "udp" ...)) :LIST T)
>>>>>  5: (MAPCAR #<CLOSURE (LAMBDA #) {1003CA4ED9}> ("" "asa" "106007"
>>>>> "deny" "inbound" "udp" ...))[:EXTERNAL]
>>>>>  6: ((SB-PCL::FAST-METHOD INDEX-LOG-ENTRY (LOG-ENTRY)) #<unavailable
>>>>> argument> #<unavailable argument> #<LOG-ENTRY oid:6373>)
>>>>>  7: ((LAMBDA ()))
>>>>>  8: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unavailable argument>
>>>>> #<unavailable
>>>>> argument> #<unavailable argument> #<unavailable argument>) 
>>>>> [:EXTERNAL]
>>>>>  9: (ADD-LOG-ENTRY ..)
>>>>> 10: (SB-INT:SIMPLE-EVAL-IN-LEXENV ..)
>>>>>
>>>>> And here is the code that sometimes causes the issue:
>>>>> (defmethod index-log-entry ((log-entry log-entry))
>>>>>  (let ((*store-controller* *syslog-controller*))
>>>>>    (let ((btree (get-from-root "log-entry-index" :sc
>>>>> *syslog-controller*)))
>>>>>      (map-words #'(lambda (word)
>>>>>                     (unless (or (member word *superfluous-
>>>>> words* :test
>>>>> #'string-equal)
>>>>>                                 (< (length word) 2))
>>>>>                       (unless (existsp word btree)
>>>>>                         (setf (get-value word btree) (make-pset :sc
>>>>> *syslog-controller*)))
>>>>>                       (insert-item log-entry
>>>>>                                    (get-value word btree))))
>>>>>                 (text log-entry)))))
>>>>>
>>>>>
>>>>> If I drop and recreate the btree, everything runs smoothly (no  
>>>>> errors)
>>>>> for a few hours and then the above error message starts showing up
>>>>> about
>>>>> half to three quarters of the time.
>>>>>
>>>>> Thanks again,
>>>>> Kevin
>>>>>
>>>>>
>>>>> Kevin Raison wrote:
>>>>>> I am seeing an intermittent error with 1.0 alpha when trying to
>>>>>> write to
>>>>>> an indexed btree (using BerkeleyDB 4.7 as provided by Ubuntu's
>>>>>> package
>>>>>> repositories):
>>>>>>
>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>>> #<BDB-INDEXED-BTREE oid:2>
>>>>>>
>>>>>> Within the same thread, sometimes this happens and sometimes I am
>>>>>> able
>>>>>> to read and write to the btree.  I am using sbcl 1.0.24 on 32 bit
>>>>>> intel
>>>>>> linux.  Is there something obvious that might help alleviate this  
>>>>>> or
>>>>>> should I provide more context?
>>>>>>
>>>>>> Thanks!
>>>>>> Kevin
>>>>>>
>>>>>> _______________________________________________
>>>>>> elephant-devel site list
>>>>>> elephant-devel at common-lisp.net
>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>> _______________________________________________
>>>>> elephant-devel site list
>>>>> elephant-devel at common-lisp.net
>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>> _______________________________________________
>>>> elephant-devel site list
>>>> elephant-devel at common-lisp.net
>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel at common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
> 
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel




More information about the elephant-devel mailing list