[elephant-devel] Re: Updated version of last Postmodern patch bundle

Alex Mizrahi killerstorm at newmail.ru
Fri Mar 21 12:49:35 UTC 2008


i have several notes about concurrent tests:

1. threaded-idx-access and provoke-deadlock are essentially the same -- they 
change slot values of zork instances.
    there are some differences: threaded-idx-access runs multiple test 
batches, provoke-deadlock works in one batch.
    provoke-deadlock runs a much larger number of threads -- 30, instead of 
5 in threaded-idx-access.
    provoke-deadlock concentrates on the first object, while 
threaded-idx-access works on all of them.

    i do not think any of these differences are essential. having multiple 
slightly different tests in the hope that one of them will spot errors is 
like "voodoo programming", so i think we should delete provoke-deadlock,
    or rewrite it so that it is really different. (the classic way to 
provoke a deadlock is to do criss-cross slot updates: one thread updates A 
then B while another updates B then A.)

2. provoke-deadlock does not wait until threads are finished before it wipes 
classes and closes the controller. obviously, this causes weird errors.
    is it intentional behaviour?

3. threaded-idx-access joins all threads it finds. i think that's not OK 
because there might be threads spawned by SLIME etc.
    it's better to collect the threads we create and join only those, 
i.e.

    (mapcar #'sb-thread:join-thread
            (loop for i from 1 to 5
                  collect (let ((i i))
                            (bt:make-thread (lambda () ...)))))

   also note (let ((i i)) ...) -- this is essential, because loop is allowed 
to reuse a single binding of i across iterations, and then all closures 
would refer to the same i binding.

4. join-thread seems to be SBCL-specific. can't we find some thread-synch 
primitive that will work on all implementations? (i haven't yet looked into 
this).

5. threaded-idx-access is not really an automatic test -- if something 
breaks, it lands in the debugger.
    i think there should be something like handler-case inside each 
thread, and if an error happens inside the thread, it should be reported to 
the main thread, which re-signals it, or something similar.
    also, it's worth checking that the operations were correct -- classes 
are accessible via index, slot values are correct etc.
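
a rough sketch of how points 3-5 could fit together. it assumes a 
bordeaux-threads version that exports join-thread and is installed where 
ASDF can find it; run-workers is a made-up name, not something from the 
test suite, and the worker body here is a placeholder rather than real 
slot updates:

```lisp
;; sketch only: assumes bordeaux-threads is available through ASDF
(require "asdf")
(asdf:load-system "bordeaux-threads")

(defun run-workers (n worker)
  "Run WORKER (a function of one argument, the worker index) in N threads.
Joins only the threads created here and returns a list of
(index . condition) pairs for the workers that signalled an error."
  (let ((failures '())
        (lock (bt:make-lock)))
    (let ((threads
            (loop for i from 1 to n
                  collect (let ((i i)) ; fresh binding for each closure
                            (bt:make-thread
                             (lambda ()
                               ;; catch errors instead of dropping
                               ;; into the debugger
                               (handler-case (funcall worker i)
                                 (error (e)
                                   (bt:with-lock-held (lock)
                                     (push (cons i e) failures)))))
                             :name (format nil "worker-~d" i))))))
      ;; join our own threads only, not SLIME's or anyone else's
      (mapc #'bt:join-thread threads))
    failures))

;; usage: worker 3 fails on purpose, the others succeed
(defparameter *failures*
  (run-workers 5 (lambda (i)
                   (when (= i 3)
                     (error "worker ~d failed" i)))))
```

the main thread can then check the returned list and signal an error 
itself if it is not empty, so a broken run fails the test instead of 
stopping in the debugger.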

now, about test runs. indeed, it yields lots of _different_ deadlocks in 
isolation mode "read committed". the most bizarre one i've seen was the 
result of an interaction of four(!) threads, however there were ones with a 
mere two participants.

OTOH results of running this with serializable isolation mode are quite 
consistent:

WARNING:
   Error while executing prepared statement "TREE12DELETE-BOTH" (params: (0
                                                                          433)).
retrying txn due to:Database error 40001: could not serialize access due to 
concurrent update

conclusions are clear to me: use serializable isolation mode FTW.
it's not worth trying to fix "read committed" mode, because there are no 
logical grounds for it to work fine when data can be changed at any time 
and is not consistent within a single transaction.
i think it's better to completely ban "read committed" mode, but i can make 
this configurable for people with masochistic intentions :).
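
to illustrate why the serializable results are easy to live with: the 
"retrying txn" warning above amounts to rerunning the whole transaction 
when a 40001 error comes back. a minimal sketch of that pattern, not the 
actual db-postmodern code; the condition serialization-failure and the 
function call-with-retries are hypothetical stand-ins defined only for 
this example:

```lisp
;; hypothetical stand-in for the backend's "40001" serialization error
(define-condition serialization-failure (error) ())

(defun call-with-retries (thunk &key (max-attempts 5))
  "Call THUNK; if it signals SERIALIZATION-FAILURE, call it again,
up to MAX-ATTEMPTS times. The last failure is re-signalled."
  (loop for attempt from 1
        do (handler-case (return (funcall thunk))
             (serialization-failure (e)
               ;; give up after the last allowed attempt
               (when (>= attempt max-attempts)
                 (error e))))))

;; usage: the first two attempts fail, the third one goes through
(defparameter *attempts* 0)
(defparameter *result*
  (call-with-retries
   (lambda ()
     (incf *attempts*)
     (when (< *attempts* 3)
       (error 'serialization-failure))
     :committed)))
```

the thunk has to contain the entire transaction body, since after a 
serialization failure everything done in that transaction is rolled back.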

as i have ideas on how to "improve" the concurrent test suite, it would 
probably be better if i just take over the suite and implement them. 
however, i'm open to other ideas about the future of this test suite :).

also i have a question unrelated to the test suite: i get an error saying 
that thread-alive-p, used in reap-orphaned-connections in db-postmodern, is 
not defined. where should this function come from, a newer version of 
bordeaux-threads (perhaps i have an outdated package)? 





