"Got signal before environment was installed on our thread"

Tue Sep 12 00:18:30 UTC 2017

> On Sep 11, 2017, at 7:13 PM, Dima Pasechnik <dimpase+ecl at gmail.com> wrote:
> 
>> On Mon, Sep 4, 2017 at 11:15 AM, Daniel Kochmański <daniel at turtleware.eu> wrote:
>> From the backtrace it is clear that the failure occurs inside the call to
>> GC_init. Such errors are known to happen when another GC was already
>> initialized on the system (I've linked the issue). It might be
>> caused by something else in bdwgc, I don't know. Either way I'd focus on
>> the GC_init part.
> 
> Our project (sagemath) only uses libgc within the embedded ECL. Thus I
> am really puzzled how another libgc instance might kick in and spoil
> the game for ECL.
> 
> One possibility is that clang is using libgc, and thus, in principle,
> libgc might be sitting somewhere in the runtime?!
> 
> 
>> 
>> To check whether my assertion is right, you could put a printf before and
>> after the call to GC_init. I'm not familiar enough with bdwgc internals to say
>> what is wrong, though. Maybe updating the bundled GC sources will help? Or
>> linking with the libgc on the system? It might be a bug in bdwgc
>> that has already been fixed.
> 
> We are not using the bdwgc shipped with ECL, we use a separate libgc
> 7.6.0, which is the latest stable.
> (Is there a reason to ship bdwgc sources with ECL - do you patch it?)
> 

I'm using ECL with a non-embedded bdwgc as well, and I don't have this issue.

Make sure that bdwgc is not also built statically into ECL. I would expect linker errors in that case, but it is worth double-checking.
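
Since ECL is loaded from Python here, one rough way to see whether some other component has already brought a GC into the process is to ask the dynamic linker before the ECL module is imported. The sketch below is only an illustration: the helper name symbol_visible is made up, and it assumes a POSIX system where ctypes.CDLL(None) wraps dlopen(NULL). It only sees symbols in the global namespace, so a negative answer does not rule out a copy of libgc that was loaded privately by another extension module or linked statically with hidden symbols.

```python
import ctypes


def symbol_visible(name):
    """Return True if `name` resolves in the process-wide symbol namespace.

    Hypothetical helper, for illustration only.
    """
    # CDLL(None) corresponds to dlopen(NULL): the main program plus the
    # libraries that were loaded with global visibility.
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)
    except AttributeError:
        # ctypes raises AttributeError when the symbol cannot be resolved.
        return False
    return True


if __name__ == "__main__":
    # Run this in the embedding Python process *before* importing the
    # module that boots ECL.
    for name in ("GC_init", "GC_malloc"):
        print(name, "already visible:", symbol_visible(name))
```

If GC_init shows up as visible before ECL has been loaded, that would point towards the double-initialization scenario discussed in this thread; if it does not, the check is simply inconclusive.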

> Thanks,
> Dima
> 
>> 
>> Regards,
>> 
>> Daniel
>> 
>> 
>> 
>>> On 04.09.2017 12:04, Dima Pasechnik wrote:
>>> 
>>> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <daniel at turtleware.eu>
>>> wrote:
>>>> 
>>>> I don't think it's related to shared vs. static linking, but rather to two GCs
>>>> running concurrently. Try commenting out the GC_init call in ECL and see what
>>>> happens.
>>> 
>>> I don't understand how two GCs can run concurrently on a memory region
>>> controlled by ECL which is statically linked to GC...
>>> In fact I am pretty sure no other instances of GC are running anywhere
>>> within our process tree.
>>> 
>>> By the way, I don't know whether it's obvious from the backtrace that
>>> cl_boot() has been completed, or not.
>>> 
>>> If it actually was completed, could it be a bug that invalidates the
>>> bit indicating that cl_boot() has been done?
>>> 
>>> We have seen similar troubles with clang recently, related to FPE.
>>> There an FPE bit was flipped by assignment of a double to an
>>> integer type (sic!).
>>> It took us a lot of head banging on various hard surfaces to debug this:
>>> https://trac.sagemath.org/ticket/22799
>>> it turned out we did hit a known bug:
>>> https://bugs.llvm.org//show_bug.cgi?id=17686
>>> 
>>>> Do you need SIGCHLD for anything? run-program was rewritten, and SIGCHLD
>>>> handling was no longer a viable option for it.
>>>> 
>>> We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we
>>> can now simply skip it altogether.
>>> 
>>> Thanks,
>>> Dima
>>> 
>>>> I'm on my phone; I'll be available after the weekend.
>>>> 
>>>> Regards, D.
>>>> 
>>>> 
>>>> On 1 September 2017 at 14:47:57 CEST, Dima Pasechnik
>>>> <dimpase+ecl at gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi Daniel,
>>>>> Thanks for the message. The scenario you talk about only happens if GC
>>>>> is a shared library, right?
>>>>> 
>>>>> I've rebuilt GC with shared libs disabled, and ECL linked statically
>>>>> against GC.
>>>>> And I still get very similar segfaults:
>>>>> 
>>>>> ;;; ECL C Backtrace
>>>>> ;;;    0 ecl_internal_error (0x87d79b375)
>>>>> ;;;    1 init_unixint (0x87d7c17e0)
>>>>> ;;;    2 init_unixint (0x87d7c1582)
>>>>> ;;;    3 pthread_sigmask (0x80103779d)
>>>>> ;;;    4 pthread_getspecific (0x801036d6f)
>>>>> ;;;    5 unknown (0x7ffffffff193)
>>>>> ;;;    6 GC_push_current_stack (0x87d7ef7c3)
>>>>> ;;;    7 GC_with_callee_saves_pushed (0x87d7f7360)
>>>>> ;;;    8 GC_push_roots (0x87d7ef9c2)
>>>>> ;;;    9 GC_mark_some (0x87d7ec97c)
>>>>> ;;;   10 GC_stopped_mark (0x87d7e6b7a)
>>>>> ;;;   11 GC_try_to_collect_inner (0x87d7e6a75)
>>>>> ;;;   12 GC_init (0x87d7f08ea)
>>>>> ;;;   13 init_alloc (0x87d7d5669)
>>>>> ;;;   14 cl_boot (0x87d69f66b)
>>>>> ...
>>>>> 
>>>>> And a very similar picture on the develop branch of ECL - although
>>>>> I had to change our code, as in particular
>>>>> ECL_OPT_TRAP_SIGCHLD is gone...
>>>>> 
>>>>> So, what can it be? Some signals issue?
>>>>> 
>>>>> Thanks,
>>>>> Dima
>>>>> 
>>>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <daniel at turtleware.eu>
>>>>> wrote:
>>>>>> 
>>>>>>  Hey Dima,
>>>>>> 
>>>>>>  this looks like the issue with having GC initialized before ECL kicks in.
>>>>>>  See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>>>>>  discussion about this problem. Basically some other component already
>>>>>>  called GC_init and ECL calls it once more. It's arguably not a bug.
>>>>>> 
>>>>>>  Best regards,
>>>>>> 
>>>>>>  Daniel
>>>>>> 
>>>>>> 
>>>>>>>  On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>  Dear all,
>>>>>>> 
>>>>>>>  I'm struggling to understand strange segfaults coming from
>>>>>>>  ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>>>>>  follows:
>>>>>>> 
>>>>>>>  Got signal before environment was installed on our thread
>>>>>>>     [2: No such file or directory]
>>>>>>> 
>>>>>>>  ;;; ECL C Backtrace
>>>>>>>  ;;;    0 ecl_internal_error (0x87d790765)
>>>>>>>  ;;;    1 init_unixint (0x87d7b6bd0)
>>>>>>>  ;;;    2 init_unixint (0x87d7b6972)
>>>>>>>  ;;;    3 pthread_sigmask (0x80103779d)
>>>>>>>  ;;;    4 pthread_getspecific (0x801036d6f)
>>>>>>>  ;;;    5 unknown (0x7ffffffff193)
>>>>>>>  ;;;    6 GC_push_all_stacks (0x87db1ea2c)
>>>>>>>  ;;;    7 GC_mark_some (0x87db12eec)
>>>>>>>  ;;;    8 GC_stopped_mark (0x87db09baa)
>>>>>>>  ;;;    9 GC_try_to_collect_inner (0x87db09a75)
>>>>>>>  ;;;   10 GC_init (0x87db16f4f)
>>>>>>>  ;;;   11 init_alloc (0x87d7caa59)
>>>>>>>  ;;;   12 cl_boot (0x87d694a5b)
>>>>>>>  ;;;   13 initecl (0x87d218340)
>>>>>>>  ;;;   14 initecl (0x87d20a43f)
>>>>>>>  ;;;   15 initecl (0x87d207e28)
>>>>>>>  ;;;   16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>>>>>  ;;;   17 PyImport_AppendInittab (0x800b3d71f)
>>>>>>>  ;;;   18 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>>  ;;;   19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>>  ;;;   20 _PyBuiltin_Init (0x800b162d7)
>>>>>>>  ;;;   21 PyObject_Call (0x800a7d3e3)
>>>>>>>  ;;;   22 PyEval_EvalFrameEx (0x800b2121c)
>>>>>>>  ;;;   23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>>>>>  ;;;   24 PyEval_EvalCode (0x800b1ad96)
>>>>>>>  ;;;   25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>>>>>  ;;;   26 PyImport_AppendInittab (0x800b3ddb8)
>>>>>>>  ;;;   27 PyImport_AppendInittab (0x800b3d71f)
>>>>>>>  ;;;   28 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>>  ;;;   29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>>  ;;;   30 _PyBuiltin_Init (0x800b162d7)
>>>>>>>  ;;;   31 PyEval_EvalFrameEx (0x800b22dd1)
>>>>>>>  Segmentation fault (core dumped)
>>>>>>> 
>>>>>>>  It looks as if ECL (version 16.1.2) is being called before an
>>>>>>>  initialisation is complete, but is it possible to say more without a
>>>>>>>  debugger?
>>>>>>> 
>>>>>>>  More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>>>>>  with libatomic_ops version 7.4.6.
>>>>>>>  And only reproducible on FreeBSD.
>>>>>>> 
>>>>>>>  ECL is built with --disable-threads; GC is built with or without
>>>>>>>  threads---result is still the same.
>>>>>>>  (so it's unclear to me where pthread_* calls in the trace
>>>>>>>  come from).
>>>>>>> 
>>>>>>>  Thanks,
>>>>>>>  Dima
>>>>>>> 
>>>>>>>  PS. the segfault is at the bottom of
>>>>>>>  https://trac.sagemath.org/ticket/22679#comment:87
>>>>>> 
>>>>>> 
>>>>>> 
>>>> -- Sent with K-9 Mail.
>> 
>> 
> 
