[Ecls-list] ecl with old libc: deadlock in gc due to signal handling

Juan Jose Garcia-Ripoll juanjose.garciaripoll at googlemail.com
Wed Jan 19 21:16:16 UTC 2011


I only have one suggestion, which is to temporarily deregister that thread
so that the garbage collector does not suspend it. Something like this:


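        /* Waiting may fail! */
        int status;
        struct GC_stack_base stack_base;
        /* Hide this thread from the collector while it is blocked. */
        GC_unregister_my_thread();
        status = sigwait(&handled_set, &signo);
        /* Register again before the early exit below and before
         * running any code that may allocate. */
        GC_get_stack_base(&stack_base);
        GC_register_my_thread(&stack_base);
        if (status == 0) {
            if (interrupt_signal == signo)
                goto RETURN;
            signal_code = call_handler(lisp_signal_handler, signo,
                           NULL, NULL);
            if (!Null(signal_code)) {
                mp_process_run_function(3, @'si::handle-signal',
                            @'si::handle-signal',
                            signal_code);
            }
        }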

Unfortunately this cannot be used when the library works as expected
(sigwait does not block all signals), because some interrupt handlers may
need the garbage collector to work.
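
To see the failure and the effect of deregistering in isolation, here is
a small stand-alone sketch. It is not ECL or Boehm GC code: every demo_*
name, the use of SIGUSR1 as the suspend signal and the 200 ms timeout are
made up for the example, and it assumes Linux with POSIX unnamed
semaphores. A worker that keeps the signal blocked stands in for the
thread stuck in the old sigwait; a collector that skips a thread stands
in for that thread being unregistered. The collector uses sem_timedwait
so that a missing acknowledgement is counted instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the collector's suspend signal. */
#define DEMO_SIG_SUSPEND SIGUSR1

static sem_t demo_ack_sem;   /* posted by each thread that runs the handler */
static sem_t demo_ready_sem; /* posted once a worker has set its signal mask */
static sem_t demo_exit_sem;  /* posted to let the workers finish */

/* Suspend handler: only the "I am stopped" half of the handshake. */
static void demo_suspend_handler(int sig)
{
    int saved_errno = errno;
    (void)sig;
    sem_post(&demo_ack_sem); /* sem_post is async-signal-safe */
    errno = saved_errno;
}

/* sem_wait that retries when a signal handler interrupts it. */
static void demo_sem_wait(sem_t *sem)
{
    while (sem_wait(sem) == -1 && errno == EINTR)
        continue;
}

/* Worker: optionally keeps the suspend signal blocked, then idles. */
static void *demo_worker(void *arg)
{
    int keep_blocked = *(int *)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, DEMO_SIG_SUSPEND);
    pthread_sigmask(keep_blocked ? SIG_BLOCK : SIG_UNBLOCK, &set, NULL);
    sem_post(&demo_ready_sem);
    demo_sem_wait(&demo_exit_sem);
    return NULL;
}

/* Signals the first n threads and counts the acknowledgements.
 * Gives up after timeout_ms where a real collector would wait forever. */
static int demo_stop_world(pthread_t *threads, int n, long timeout_ms)
{
    int acks = 0;
    for (int i = 0; i < n; i++)
        pthread_kill(threads[i], DEMO_SIG_SUSPEND);
    for (int i = 0; i < n; i++) {
        struct timespec deadline;
        int r;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        while ((r = sem_timedwait(&demo_ack_sem, &deadline)) == -1
               && errno == EINTR)
            continue;
        if (r != 0)
            break; /* timed out: some thread never ran the handler */
        acks++;
    }
    return acks;
}

/* Runs two workers and returns the number of missing acknowledgements.
 * block_second:      the second worker keeps the signal blocked.
 * second_registered: the collector signals the second worker at all. */
static int demo_run(int block_second, int second_registered)
{
    pthread_t threads[2];
    int flags[2] = { 0, block_second };
    int signalled = second_registered ? 2 : 1;
    struct sigaction sa;
    int acks;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_suspend_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(DEMO_SIG_SUSPEND, &sa, NULL);
    sem_init(&demo_ack_sem, 0, 0);
    sem_init(&demo_ready_sem, 0, 0);
    sem_init(&demo_exit_sem, 0, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, &flags[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        demo_sem_wait(&demo_ready_sem);
    acks = demo_stop_world(threads, signalled, 200);
    for (int i = 0; i < 2; i++)
        sem_post(&demo_exit_sem);
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    sem_destroy(&demo_ack_sem);
    sem_destroy(&demo_ready_sem);
    sem_destroy(&demo_exit_sem);
    return signalled - acks;
}
```

With these definitions demo_run(1, 1) reports one missing
acknowledgement, which is the point where the real collector would hang,
while demo_run(1, 0) reports none because the blocked thread is never
waited for.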

Juanjo

On Wed, Jan 19, 2011 at 7:21 PM, Anton Vodonosov <avodonosov at yandex.ru> wrote:

> Hello.
>
> I am building ECL for glibc-2.2.5. With that old glibc version
> a deadlock occurs any time when garbage collection starts.
>
> I found out the mechanics of how it happens.
>
> I am not sure you want to fix it, because the libc version is old,
> but maybe you can advise me on how to work around it.
>
> Here is how it happens. Two parts are involved:
>
> 1. The Boehm-Demers-Weiser GC tries to stop all the threads before
>   performing garbage collection (this is called "stop the world").
>   This is implemented by sending a SIG_SUSPEND signal to
>   every thread. The signal handler in every thread then
>   reports "ok, I am stopped" to the thread that wants to perform
>   the garbage collection, and then waits until the GC instructs
>   it to restart.
>
>   The "I am stopped" confirmation is sent via a
>   semaphore: sem_post(&GC_suspend_ack_sem).
>
>   The GC expects this from every thread. It calls
>   sem_wait(&GC_suspend_ack_sem) once for every thread
>   that was notified by the SIG_SUSPEND signal.
>
>   The corresponding code is in src/gc/pthread_stop_world.c,
>   in the function GC_stop_world, which calls GC_suspend_all.
>   The signal handler behavior is implemented in
>   GC_suspend_handler_inner.
>
> 2. ECL has a special thread which handles all the signals
>   not handled by other threads.
>
>   See its implementation in the function
>   asynchronous_signal_servicing_thread, file src/c/unixint.d.
>
>   It is an endless loop of
>      sigwait(<signals blocked in other threads>);
>
> The deadlock is caused by the difference in sigwait behavior
> between the old libc and the contemporary libc.
>
> Namely, what happens when the asynchronous_signal_servicing_thread
> is waiting in sigwait(<signals blocked in other threads>),
> and some signal _not_ from this set arrives? In particular, when
> the GC sends the SIG_SUSPEND signal.
>
> The contemporary libc calls the signal handler. The old libc
> doesn't call the signal handler; sigwait just blocks
> the signals other than those it waits for.
>
> As a result, with the old libc sem_post(&GC_suspend_ack_sem)
> is never performed by the asynchronous_signal_servicing_thread,
> therefore the GC waits on the semaphore forever.
>
> ECL hangs the first time the GC is invoked, for example on
> (MAKE-ARRAY 3000000).
>
> What would be the easiest way to work around this problem?
>
> Best regards,
> - Anton
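
For readers who have not seen the arrangement from point 2 of the quoted
message, a reduced sketch follows. It is an illustration only, not the
code from src/c/unixint.d: the demo_* names and the choice of SIGUSR2 are
assumptions, and the thread services a single signal rather than looping
forever. The signal is blocked before the thread is created so that every
thread inherits the mask, and the dedicated thread then collects it
synchronously with sigwait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Signals routed to the servicing thread (hypothetical name). */
static sigset_t demo_handled_set;

/* Servicing thread: waits for one signal from the set and records it. */
static void *demo_servicing_thread(void *arg)
{
    int *received = arg;
    int signo = 0;
    if (sigwait(&demo_handled_set, &signo) == 0)
        *received = signo;
    return NULL;
}

/* Blocks signo everywhere, starts the servicing thread, sends signo to
 * the process and returns the signal number the thread saw, or -1. */
static int demo_route_signal(int signo)
{
    pthread_t tid;
    int received = -1;

    sigemptyset(&demo_handled_set);
    sigaddset(&demo_handled_set, signo);
    /* Block first: threads created later inherit this mask. */
    pthread_sigmask(SIG_BLOCK, &demo_handled_set, NULL);
    if (pthread_create(&tid, NULL, demo_servicing_thread, &received) != 0)
        return -1;
    /* Process-directed signal: it stays pending until sigwait takes it. */
    kill(getpid(), signo);
    pthread_join(tid, NULL);
    return received;
}
```

Calling demo_route_signal(SIGUSR2) returns SIGUSR2 without any handler
being installed, because the signal is consumed by sigwait in the
dedicated thread.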



-- 
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com