"Got signal before environment was installed on our thread"

Daniel Kochmański daniel at turtleware.eu
Mon Sep 4 10:15:19 UTC 2017


From the backtrace it is certain that the failure happens inside the call
to GC_init. Such errors are known to occur when another component has
already initialized GC (I've linked the issue). It might also be caused by
something else in bdwgc; I don't know. Either way, I'd focus on the
GC_init part.
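Reading the backtrace bottom-up: cl_boot() calls init_alloc(), which calls
GC_init(). GC_init() starts a collection (GC_try_to_collect_inner,
GC_stopped_mark, GC_mark_some), and while the collector is pushing stack
roots a signal is delivered. ECL's signal handling code (reported as
init_unixint) then finds that the Lisp environment is not installed yet
and calls ecl_internal_error(), which prints "Got signal before
environment was installed on our thread". The guard pattern can be
sketched as follows. This is a simplified illustration, not ECL's actual
code: the flag `booted`, the outcome codes and both function names are
hypothetical, and SIGUSR1 stands in for whatever signal really arrives so
that the sketch can be exercised safely with raise().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for ECL's "boot has finished" flag. */
static volatile sig_atomic_t booted = 0;

/* What the handler decided last time it ran:
 * 0 = never called, 1 = rejected (environment not installed yet),
 * 2 = handled normally. */
static volatile sig_atomic_t last_outcome = 0;

static void guarded_handler(int sig)
{
    (void)sig;
    if (!booted) {
        /* ECL would report an internal error here, with the message
         * "Got signal before environment was installed on our thread". */
        last_outcome = 1;
        return;
    }
    /* Normal path: the environment exists, so the signal can be
     * turned into a Lisp-level condition. */
    last_outcome = 2;
}

/* Installs guarded_handler for SIGUSR1 (a stand-in for the real signal).
 * Returns 0 on success, -1 on failure, like sigaction() itself. */
static int install_guarded_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = guarded_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

The point of the sketch is that the message is produced by the handler,
not by GC_init itself: it only tells us that a signal arrived while
cl_boot() had not finished, which is consistent with the frames above.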

To make sure that my assertion is right, you may put a printf before and
after the call to GC_init. I'm not familiar enough with bdwgc internals to
say what is wrong, though. Maybe updating the bundled GC sources will
help? Or linking with the system libgc? It might be a bug in bdwgc that
has already been fixed.
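For example, assuming init_alloc() calls GC_init() directly (as the
backtrace suggests), the call could be wrapped as below. Writing to
stderr and flushing after each line matters here: buffered stdout output
is typically lost when the process segfaults. TRACE_CALL is a
hypothetical helper, not part of ECL or bdwgc; it takes the stream as a
parameter only so that it can be tested.

```c
#include <stdio.h>

/* Hypothetical helper: prints a marker to `stream` before and after
 * evaluating `call`, flushing each time so that the "before" marker
 * survives even if `call` crashes the process. */
#define TRACE_CALL(stream, call)                      \
    do {                                              \
        fprintf((stream), "before %s\n", #call);      \
        fflush(stream);                               \
        call;                                         \
        fprintf((stream), "after %s\n", #call);       \
        fflush(stream);                               \
    } while (0)
```

In init_alloc() this would read `TRACE_CALL(stderr, GC_init());`. If
"before GC_init()" is printed but "after GC_init()" never is, the crash
is inside GC_init.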

Regards,

Daniel


On 04.09.2017 12:04, Dima Pasechnik wrote:
> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <daniel at turtleware.eu> wrote:
>> I don't think it's related to shared vs. static, but rather to two GCs
>> running concurrently. Try commenting out the GC_init call in ECL and see
>> what happens.
> I don't understand how two GCs can run concurrently on a memory region
> controlled by ECL, which is statically linked to GC...
> In fact I am pretty sure no other instances of GC are running anywhere
> within our process tree.
>
> By the way, I don't know whether the backtrace makes it obvious that
> cl_boot() has completed or not.
>
> If it actually did complete, could a bug be invalidating the bit
> that indicates cl_boot() has finished?
>
> We recently had similar trouble with clang, related to FPE.
> There, an FPE flag was set by assigning a double to an integer
> type (sic!).
> It took us a lot of head banging on various hard surfaces to debug this:
> https://trac.sagemath.org/ticket/22799
> It turned out we had hit a known bug:
> https://bugs.llvm.org//show_bug.cgi?id=17686
>
>> Do you need SIGCHLD for anything? run-program was rewritten, and SIGCHLD
>> handling wasn't a viable option for it anymore.
>>
> We do set ECL_OPT_TRAP_SIGCHLD to 0, so I presume we
> can now simply skip it altogether.
>
> Thanks,
> Dima
>
>> I'm on my phone; I'll be available after the weekend.
>>
>> Regards, D.
>>
>>
>> On 1 September 2017 14:47:57 CEST, Dima Pasechnik <dimpase+ecl at gmail.com>
>> wrote:
>>> Hi Daniel,
>>> Thanks for the message. The scenario you talk about only happens if GC
>>> is a shared library, right?
>>>
>>> I've rebuilt GC with shared libraries disabled, and ECL linked statically against GC.
>>> And I still get very similar segfaults:
>>>
>>> ;;; ECL C Backtrace
>>> ;;;    0 ecl_internal_error (0x87d79b375)
>>> ;;;    1 init_unixint (0x87d7c17e0)
>>> ;;;    2 init_unixint (0x87d7c1582)
>>> ;;;    3 pthread_sigmask (0x80103779d)
>>> ;;;    4 pthread_getspecific (0x801036d6f)
>>> ;;;    5 unknown (0x7ffffffff193)
>>> ;;;    6 GC_push_current_stack (0x87d7ef7c3)
>>> ;;;    7 GC_with_callee_saves_pushed (0x87d7f7360)
>>> ;;;    8 GC_push_roots (0x87d7ef9c2)
>>> ;;;    9 GC_mark_some (0x87d7ec97c)
>>> ;;;   10 GC_stopped_mark (0x87d7e6b7a)
>>> ;;;   11 GC_try_to_collect_inner (0x87d7e6a75)
>>> ;;;   12 GC_init (0x87d7f08ea)
>>> ;;;   13 init_alloc (0x87d7d5669)
>>> ;;;   14 cl_boot (0x87d69f66b)
>>> ...
>>>
>>> I see a very similar picture on the develop branch of ECL, although
>>> I had to change our code since, in particular,
>>> ECL_OPT_TRAP_SIGCHLD is gone...
>>>
>>> So, what can it be? Some signal-handling issue?
>>>
>>> Thanks,
>>> Dima
>>>
>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <daniel at turtleware.eu>
>>> wrote:
>>>>   Hey Dima,
>>>>
>>>>   this looks like the issue with having GC initialized before ECL kicks in.
>>>>   See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>>>   discussion of this problem. Basically, some other component has already
>>>>   called GC_init, and ECL calls it once more. It's arguably not a bug.
>>>>
>>>>   Best regards,
>>>>
>>>>   Daniel
>>>>
>>>>
>>>>   On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>>
>>>>>   Dear all,
>>>>>
>>>>>   I'm struggling to understand strange segfaults coming from
>>>>>   ECL (+Maxima) embedded into Python on FreeBSD; they typically look
>>>>>   as follows:
>>>>>
>>>>>   Got signal before environment was installed on our thread
>>>>>      [2: No such file or directory]
>>>>>
>>>>>   ;;; ECL C Backtrace
>>>>>   ;;;    0 ecl_internal_error (0x87d790765)
>>>>>   ;;;    1 init_unixint (0x87d7b6bd0)
>>>>>   ;;;    2 init_unixint (0x87d7b6972)
>>>>>   ;;;    3 pthread_sigmask (0x80103779d)
>>>>>   ;;;    4 pthread_getspecific (0x801036d6f)
>>>>>   ;;;    5 unknown (0x7ffffffff193)
>>>>>   ;;;    6 GC_push_all_stacks (0x87db1ea2c)
>>>>>   ;;;    7 GC_mark_some (0x87db12eec)
>>>>>   ;;;    8 GC_stopped_mark (0x87db09baa)
>>>>>   ;;;    9 GC_try_to_collect_inner (0x87db09a75)
>>>>>   ;;;   10 GC_init (0x87db16f4f)
>>>>>   ;;;   11 init_alloc (0x87d7caa59)
>>>>>   ;;;   12 cl_boot (0x87d694a5b)
>>>>>   ;;;   13 initecl (0x87d218340)
>>>>>   ;;;   14 initecl (0x87d20a43f)
>>>>>   ;;;   15 initecl (0x87d207e28)
>>>>>   ;;;   16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>>>   ;;;   17 PyImport_AppendInittab (0x800b3d71f)
>>>>>   ;;;   18 PyImport_AppendInittab (0x800b3d1a8)
>>>>>   ;;;   19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>   ;;;   20 _PyBuiltin_Init (0x800b162d7)
>>>>>   ;;;   21 PyObject_Call (0x800a7d3e3)
>>>>>   ;;;   22 PyEval_EvalFrameEx (0x800b2121c)
>>>>>   ;;;   23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>>>   ;;;   24 PyEval_EvalCode (0x800b1ad96)
>>>>>   ;;;   25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>>>   ;;;   26 PyImport_AppendInittab (0x800b3ddb8)
>>>>>   ;;;   27 PyImport_AppendInittab (0x800b3d71f)
>>>>>   ;;;   28 PyImport_AppendInittab (0x800b3d1a8)
>>>>>   ;;;   29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>   ;;;   30 _PyBuiltin_Init (0x800b162d7)
>>>>>   ;;;   31 PyEval_EvalFrameEx (0x800b22dd1)
>>>>>   Segmentation fault (core dumped)
>>>>>
>>>>>   It looks as if ECL (version 16.1.2) is being called before
>>>>>   initialisation is complete, but is it possible to say more without a
>>>>>   debugger?
>>>>>
>>>>>   More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>>>   with libatomic_ops version 7.4.6.
>>>>>   It is only reproducible on FreeBSD.
>>>>>
>>>>>   ECL is built with --disable-threads; GC is built with or without
>>>>>   threads, and the result is the same either way
>>>>>   (so it's unclear to me where the pthread_* calls in the trace
>>>>>   come from).
>>>>>
>>>>>   Thanks,
>>>>>   Dima
>>>>>
>>>>>   PS. The segfault is at the bottom of
>>>>>   https://trac.sagemath.org/ticket/22679#comment:87
>>>>
>>>>




More information about the ecl-devel mailing list