Hi,<div><br></div><div>I am sending this email with the ECL mailing list and the GC developers mailing list in Bcc. I have just discovered a serious race condition that prevents our program from exiting. It arises between the cleanup code triggered by a call to dlclose() and the exit code of a POSIX thread.</div>
<div><br></div><div>Roughly, we just run ECL, load a bunch of libraries (DLLs) and then quit the program. At exit time two things happen: the libraries have to be unloaded and the servicing threads exit. The program then hangs, as the two backtraces below show.</div>
<div><br></div><div>1) This is one of the servicing threads. It is trying to exit and, in the process, it acquires the GC lock; for some reason it then calls into dyld (frames #2 to #5 below are lazy symbol binding, which waits for the dyld global lock). I have not yet located where in the GC this happens, but from the symptoms it seems to be close to GC_unregister...</div>
<div><br></div><div>(gdb) thread 2</div><div><div>(gdb) bt</div><div>#0 0x00007fff88009bf2 in __psynch_mutexwait ()</div><div>#1 0x00007fff897d31a1 in pthread_mutex_lock ()</div><div>#2 0x00007fff84eae623 in dyldGlobalLockAcquire ()</div>
<div>#3 0x00007fff6172a745 in __dyld__ZN26ImageLoaderMachOCompressed20doBindFastLazySymbolEjRKN11ImageLoader11LinkContextEPFvvES5_ ()</div><div>#4 0x00007fff61717922 in __dyld__ZN4dyld18fastBindLazySymbolEPP11ImageLoaderm ()</div>
<div>#5 0x00007fff84eae716 in dyld_stub_binder_ ()</div><div>#6 0x0000000101d01458 in C.88.15036 ()</div><div>#7 0x0000000101c73100 in GC_inner_start_routine (sb=0x1041deeb0, arg=0x102117ea0) at pthread_start.c:67</div>
<div>#8 0x0000000101c6eb1c in GC_call_with_stack_base (fn=0x101c73030 &lt;GC_inner_start_routine&gt;, arg=0x102117ea0) at misc.c:1510</div><div>#9 0x0000000101c74565 in GC_start_routine (arg=0x102117ea0) at pthread_support.c:1504</div>
<div>#10 0x00007fff897d48bf in _pthread_start ()</div><div>#11 0x00007fff897d7b75 in thread_start ()</div><div><br></div><div>2) This is the main thread. It is trying to close a bunch of libraries, none of which is related to the thread above. However, when dlclose() is called, dyld runs code that belongs to the garbage collector (GC_dyld_image_remove, which calls GC_remove_roots and tries to take the GC lock), and we hit the race condition: each thread appears to be waiting for a lock that the other one holds.</div>
<div><br></div><div>(gdb) thread 1</div><div>[Switching to thread 1 (process 37491), "com.apple.main-thread"]</div><div>0x00007fff88009bf2 in __psynch_mutexwait ()</div><div>(gdb) bt</div><div>#0 0x00007fff88009bf2 in __psynch_mutexwait ()</div>
<div>#1 0x00007fff897d31a1 in pthread_mutex_lock ()</div><div>#2 0x0000000101c74833 in GC_lock () at pthread_support.c:1784</div><div>#3 0x0000000101c6c53d in GC_remove_roots (b=0x104f03220, e=0x104f03238) at mark_rts.c:311</div>
<div>#4 0x0000000101c61f20 in GC_dyld_image_remove (hdr=0x104eff000, slide=4377800704) at dyn_load.c:1319</div><div>#5 0x00007fff61714bdd in __dyld__ZN4dyld11removeImageEP11ImageLoader ()</div><div>#6 0x00007fff6171858d in __dyld__ZN4dyld20garbageCollectImagesEv ()</div>
<div>#7 0x00007fff6171c432 in __dyld_dlclose ()</div><div>#8 0x00007fff84eaebd5 in dlclose ()</div><div>#9 0x0000000101c2ae8c in dlclose_wrapper [inlined] () at /Users/jjgarcia/devel/ecl/src/c/ffi/libraries.d:432</div>
<div>#10 0x0000000101c2ae8c in ecl_library_close (block=0x103be4e00) at libraries.d:432</div><div>#11 0x0000000101c2af79 in ecl_library_close_all () at libraries.d:448</div><div>#12 0x0000000101b1a84d in cl_shutdown () at main.d:301</div>
<div>#13 0x0000000101b1a964 in si_exit (narg=4377800704) at main.d:839</div><div>#14 0x0000000101b13e47 in main ()</div></div><div><br></div><div><div><br></div>-- <br>Instituto de Física Fundamental, CSIC<br>c/ Serrano, 113b, Madrid 28006 (Spain) <br>
<a href="http://juanjose.garciaripoll.googlepages.com" target="_blank">http://juanjose.garciaripoll.googlepages.com</a><br>
</div>
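<div><br></div><div>P.S. To make the suspected lock ordering concrete, here is a minimal sketch in plain pthreads. It is not code from the GC, dyld or ECL: gc_lock and dyld_lock are hypothetical stand-ins for the two locks that appear in the backtraces, and take_both and count_blocked_threads are names made up for this illustration. One thread takes gc_lock and then wants dyld_lock (like the exiting servicing thread); the other takes dyld_lock and then wants gc_lock (like the main thread inside dlclose()). The sketch uses pthread_mutex_trylock for the second lock so that it reports the conflict instead of hanging; with a blocking pthread_mutex_lock both threads would wait forever.</div>

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the two locks seen in the backtraces. */
static pthread_mutex_t gc_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dyld_lock = PTHREAD_MUTEX_INITIALIZER;

static atomic_int holding;   /* threads that hold their first lock      */
static atomic_int attempted; /* threads that have tried the second lock */
static atomic_int blocked;   /* threads that found the second lock busy */

struct lock_order { pthread_mutex_t *first, *second; };

static void *take_both(void *arg)
{
    struct lock_order *o = arg;

    pthread_mutex_lock(o->first);

    /* Wait until the other thread also holds its first lock. */
    atomic_fetch_add(&holding, 1);
    while (atomic_load(&holding) < 2)
        sched_yield();

    /* A blocking lock here would never return; trylock just reports it. */
    if (pthread_mutex_trylock(o->second) == EBUSY)
        atomic_fetch_add(&blocked, 1);
    else
        pthread_mutex_unlock(o->second);

    /* Keep the first lock until the other thread has made its attempt. */
    atomic_fetch_add(&attempted, 1);
    while (atomic_load(&attempted) < 2)
        sched_yield();

    pthread_mutex_unlock(o->first);
    return NULL;
}

/* Returns how many of the two threads would have blocked forever.
   Error checks on pthread_create/pthread_join are omitted for brevity. */
int count_blocked_threads(void)
{
    struct lock_order service = { &gc_lock, &dyld_lock };  /* exiting thread */
    struct lock_order closer  = { &dyld_lock, &gc_lock };  /* dlclose() path */
    pthread_t t1, t2;

    atomic_store(&holding, 0);
    atomic_store(&attempted, 0);
    atomic_store(&blocked, 0);

    pthread_create(&t1, NULL, take_both, &service);
    pthread_create(&t2, NULL, take_both, &closer);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return atomic_load(&blocked);
}
```

<div>Calling count_blocked_threads() from a small main() should report that both threads found their second lock busy, which is the same shape as the hang above if the servicing thread really does hold the GC lock when it calls into dyld.</div>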