"Got signal before environment was installed on our thread"
Fabrizio Fabbri
strabixbox at yahoo.com
Tue Sep 12 00:18:30 UTC 2017
> On Sep 11, 2017, at 7:13 PM, Dima Pasechnik <dimpase+ecl at gmail.com> wrote:
>
>> On Mon, Sep 4, 2017 at 11:15 AM, Daniel Kochmański <daniel at turtleware.eu> wrote:
>> From the backtrace it is clear that the failure occurs inside the call to
>> GC_init. Such errors are known to happen when another GC has already been
>> initialized on the system (I've linked the issue). It might be
>> caused by something else in bdwgc, I don't know. Either way I'd focus on
>> the GC_init part.
>
> Our project (sagemath) only uses libgc within the embedded ECL. Thus I
> am really puzzled how another libgc instance might kick in and spoil
> the game for ECL.
>
> One possibility is that clang is using libgc, and thus, in principle,
> libgc might be sitting somewhere in the runtime?!
>
>
>>
>> To check that my assertion is right, you could put a printf before and
>> after the call to GC_init. I'm not familiar enough with bdwgc internals to say
>> what is wrong, though. Maybe updating the bundled GC sources will help? Or
>> linking with the libgc on the system? It might be a bug in bdwgc
>> that has already been fixed.
>
> We are not using the bdwgc shipped with ECL, we use a separate libgc
> 7.6.0, which is the latest stable.
> (Is there a reason to ship bdwgc sources with ECL - do you patch it?)
>
I'm using ECL with the non-embedded bdwgc as well, and I don't have this issue.
Make sure that bdwgc is not also built statically into ECL. I would expect linking problems in that case, but it is worth double-checking.
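
As for Daniel's suggestion of printing before and after the GC_init call, here is a minimal sketch of how that could look. The helper name trace_call and the stand-in fake_gc_init are hypothetical, they are not part of ECL or bdwgc; the stand-in is only there so the snippet builds without libgc. The important detail is the fflush: without it, buffered output may be lost when the process segfaults.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: call fn() with a marker written and flushed before
 * and after it. If the process dies inside fn(), the "leaving" line never
 * shows up in the log. */
void trace_call(const char *name, void (*fn)(void), FILE *log)
{
    assert(name != NULL && fn != NULL && log != NULL);
    fprintf(log, "entering %s\n", name);
    fflush(log);              /* flush now: buffered text is lost on a crash */
    fn();
    fprintf(log, "leaving %s\n", name);
    fflush(log);
}

/* Hypothetical stand-in for GC_init(), used only to keep this sketch
 * independent of libgc. It just counts how often it was called. */
int fake_init_calls = 0;
void fake_gc_init(void) { fake_init_calls++; }
```

With gc.h included, the real call would presumably be trace_call("GC_init", GC_init, stderr), placed where init_alloc calls GC_init according to the backtrace. If "entering GC_init" is printed and "leaving GC_init" is not, the crash happens inside GC_init.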
> Thanks,
> Dima
>
>>
>> Regards,
>>
>> Daniel
>>
>>
>>
>>> On 04.09.2017 12:04, Dima Pasechnik wrote:
>>>
>>> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <daniel at turtleware.eu>
>>> wrote:
>>>>
>>>> I don't think it's related to shared vs. static linking, but rather to two GCs running
>>>> concurrently. Try commenting out the GC_init call in ECL and see what
>>>> happens.
>>>
>>> I don't understand how two GCs can run concurrently on a memory region
>>> controlled by ECL which is statically linked to GC...
>>> In fact I am pretty sure no other instances of GC are running anywhere
>>> within our process tree.
>>>
>>> By the way, I don't know whether it's obvious from the backtrace that
>>> cl_boot() has been completed, or not.
>>>
>>> If it actually was completed, could it be a bug that invalidates the
>>> bit indicating that cl_boot() has been done?
>>>
>>> We have seen similar troubles with clang recently, related to FPE.
>>> There an FPE bit was flipped by assignment of a double to an
>>> integer type (sic!).
>>> It took us a lot of head banging on various hard surfaces to debug this:
>>> https://trac.sagemath.org/ticket/22799
>>> it turned out we did hit a known bug:
>>> https://bugs.llvm.org//show_bug.cgi?id=17686
>>>
>>>> Do you need SIGCHLD for anything? run-program was rewritten, and SIGCHLD
>>>> handling was no longer a viable option for it.
>>>>
>>> We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we
>>> can now simply skip it altogether.
>>>
>>> Thanks,
>>> Dima
>>>
>>>> I'm on my phone; I will be available after the weekend.
>>>>
>>>> Regards, D.
>>>>
>>>>
>>>> On 1 September 2017 14:47:57 CEST, Dima Pasechnik
>>>> <dimpase+ecl at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>> Thanks for the message. The scenario you talk about only happens if GC
>>>>> is a shared library, right?
>>>>>
>>>>> I've rebuilt GC disabling shared libs, and ECL doing static linking to
>>>>> GC.
>>>>> And I still get very similar segfaults:
>>>>>
>>>>> ;;; ECL C Backtrace
>>>>> ;;; 0 ecl_internal_error (0x87d79b375)
>>>>> ;;; 1 init_unixint (0x87d7c17e0)
>>>>> ;;; 2 init_unixint (0x87d7c1582)
>>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>>> ;;; 5 unknown (0x7ffffffff193)
>>>>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)
>>>>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
>>>>> ;;; 8 GC_push_roots (0x87d7ef9c2)
>>>>> ;;; 9 GC_mark_some (0x87d7ec97c)
>>>>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)
>>>>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
>>>>> ;;; 12 GC_init (0x87d7f08ea)
>>>>> ;;; 13 init_alloc (0x87d7d5669)
>>>>> ;;; 14 cl_boot (0x87d69f66b)
>>>>> ...
>>>>>
>>>>> And a very similar picture on the develop branch of ECL - although
>>>>> I had to change our code, as in particular
>>>>> ECL_OPT_TRAP_SIGCHLD is gone...
>>>>>
>>>>> So, what can it be? Some signals issue?
>>>>>
>>>>> Thanks,
>>>>> Dima
>>>>>
>>>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <daniel at turtleware.eu>
>>>>> wrote:
>>>>>>
>>>>>> Hey Dima,
>>>>>>
>>>>>> this looks like the issue with having GC initialized before ECL kicks
>>>>>> in.
>>>>>> See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>>>>> discussion about this problem. Basically some other component already
>>>>>> called
>>>>>> GC_init and ECL calls it once more. It's arguably not a bug.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>>> On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I'm struggling to understand strange segfaults coming from
>>>>>>> ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>>>>> follows:
>>>>>>>
>>>>>>> Got signal before environment was installed on our thread
>>>>>>> [2: No such file or directory]
>>>>>>>
>>>>>>> ;;; ECL C Backtrace
>>>>>>> ;;; 0 ecl_internal_error (0x87d790765)
>>>>>>> ;;; 1 init_unixint (0x87d7b6bd0)
>>>>>>> ;;; 2 init_unixint (0x87d7b6972)
>>>>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>>>>> ;;; 5 unknown (0x7ffffffff193)
>>>>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)
>>>>>>> ;;; 7 GC_mark_some (0x87db12eec)
>>>>>>> ;;; 8 GC_stopped_mark (0x87db09baa)
>>>>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)
>>>>>>> ;;; 10 GC_init (0x87db16f4f)
>>>>>>> ;;; 11 init_alloc (0x87d7caa59)
>>>>>>> ;;; 12 cl_boot (0x87d694a5b)
>>>>>>> ;;; 13 initecl (0x87d218340)
>>>>>>> ;;; 14 initecl (0x87d20a43f)
>>>>>>> ;;; 15 initecl (0x87d207e28)
>>>>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)
>>>>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)
>>>>>>> ;;; 21 PyObject_Call (0x800a7d3e3)
>>>>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)
>>>>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)
>>>>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)
>>>>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)
>>>>>>> ;;; 28 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)
>>>>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
>>>>>>> Segmentation fault (core dumped)
>>>>>>>
>>>>>>> It looks as if ECL (version 16.1.2) is being called before
>>>>>>> initialisation is complete, but is it possible to say more without a
>>>>>>> debugger?
>>>>>>>
>>>>>>> More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>>>>> with libatomic_ops version 7.4.6.
>>>>>>> And only reproducible on FreeBSD.
>>>>>>>
>>>>>>> ECL is built with --disable-threads; GC is built with or without
>>>>>>> threads---result is still the same.
>>>>>>> (so it's unclear to me where pthread_* calls in the trace
>>>>>>> come from).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dima
>>>>>>>
>>>>>>> PS. the segfault is at the bottom of
>>>>>>> https://trac.sagemath.org/ticket/22679#comment:87
>>>>>>
>>>>>>
>>>>>>
>>>> -- Sent with K-9 Mail.
>>
>>
>