"Got signal before environment was installed on our thread"

Mon Sep 4 10:04:35 UTC 2017

On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <daniel at turtleware.eu> wrote:
> I don't think it's related to shared vs. static, but rather to two GCs running
> concurrently. Try commenting out the GC_init call in ECL and see what happens.

I don't understand how two GCs can run concurrently on a memory region
controlled by ECL which is statically linked to GC...
In fact I am pretty sure no other instances of GC are running anywhere
within our process tree.

By the way, I don't know whether it's obvious from the backtrace that
cl_boot() has been completed, or not.

If it actually was completed, could it be a bug that invalidates the
bit indicating that cl_boot() has been done?
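
To make the failure mode concrete: the message in the subject line suggests
a signal handler that ran while the runtime's environment was not yet set up.
Below is a minimal sketch of that kind of check in plain ISO C. All names
(`env_installed`, `on_signal`, `signal_seen_before_env`) are hypothetical, and
this is an illustration only, not ECL's actual handler.

```c
#include <signal.h>

/* Hypothetical stand-in for "the environment is installed on this thread".
   This is NOT ECL's data structure, just a flag for illustration. */
static volatile sig_atomic_t env_installed = 0;

/* Set by the handler when a signal arrives while env_installed is still 0. */
static volatile sig_atomic_t early_signal_seen = 0;

static void on_signal(int signo)
{
    (void)signo;
    if (!env_installed)
        early_signal_seen = 1;  /* a real runtime might report a fatal error here */
}

/* Install the handler, optionally mark the environment as installed, then
   deliver SIGTERM to ourselves. Returns 1 if the handler ran before the
   environment was in place, 0 if it ran afterwards, -1 on error. */
int signal_seen_before_env(int install_env_first)
{
    void (*previous)(int) = signal(SIGTERM, on_signal);
    int result;

    if (previous == SIG_ERR)
        return -1;

    early_signal_seen = 0;
    env_installed = install_env_first ? 1 : 0;

    /* raise() does not return until the handler has finished. */
    result = (raise(SIGTERM) == 0) ? (int)early_signal_seen : -1;

    signal(SIGTERM, previous);  /* restore the previous disposition */
    return result;
}
```

The point of the sketch is the ordering: once the handler is registered, any
signal delivered before the environment is set up takes the "early" branch,
whatever the reason for the signal.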

We have recently seen similar trouble with clang, related to FPE.
There, an FPE bit was flipped by the assignment of a double to an
integer type (sic!).
It took us a lot of head banging on various hard surfaces to debug this:
https://trac.sagemath.org/ticket/22799
It turned out we had hit a known bug:
https://bugs.llvm.org//show_bug.cgi?id=17686

>
> Do you need SIGCHLD for anything? Run-program was rewritten, and SIGCHLD
> handling wasn't a viable option for it anymore.
>
We do set ECL_OPT_TRAP_SIGCHLD to 0, so I presume we
can now simply skip it altogether.

Thanks,
Dima

> I'm on my phone; I'll be available after the weekend.
>
> Regards, D.
>
>
> On 1 September 2017 14:47:57 CEST, Dima Pasechnik <dimpase+ecl at gmail.com>
> wrote:
>>
>> Hi Daniel,
>> Thanks for the message. The scenario you talk about only happens if GC
>> is a shared library, right?
>>
>> I've rebuilt GC disabling shared libs, and ECL doing static linking to GC.
>> And I still get very similar segfaults:
>>
>> ;;; ECL C Backtrace
>> ;;;    0 ecl_internal_error (0x87d79b375)
>> ;;;    1 init_unixint (0x87d7c17e0)
>> ;;;    2 init_unixint (0x87d7c1582)
>> ;;;    3 pthread_sigmask (0x80103779d)
>> ;;;    4 pthread_getspecific (0x801036d6f)
>> ;;;    5 unknown (0x7ffffffff193)
>> ;;;    6 GC_push_current_stack (0x87d7ef7c3)
>> ;;;    7 GC_with_callee_saves_pushed (0x87d7f7360)
>> ;;;    8 GC_push_roots (0x87d7ef9c2)
>> ;;;    9 GC_mark_some (0x87d7ec97c)
>> ;;;   10 GC_stopped_mark (0x87d7e6b7a)
>> ;;;   11 GC_try_to_collect_inner (0x87d7e6a75)
>> ;;;   12 GC_init (0x87d7f08ea)
>> ;;;   13 init_alloc (0x87d7d5669)
>> ;;;   14 cl_boot (0x87d69f66b)
>> ...
>>
>> I see a very similar picture on the develop branch of ECL, although
>> I had to change our code there since, in particular,
>> ECL_OPT_TRAP_SIGCHLD is gone...
>>
>> So, what can it be? Some signals issue?
>>
>> Thanks,
>> Dima
>>
>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <daniel at turtleware.eu>
>> wrote:
>>>
>>>  Hey Dima,
>>>
>>>  This looks like the issue with having GC initialized before ECL kicks in.
>>>  See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>>  discussion about this problem. Basically, some other component already
>>>  called GC_init, and ECL calls it once more. It's arguably not a bug.
>>>
>>>  Best regards,
>>>
>>>  Daniel
>>>
>>>
>>>  On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>
>>>>
>>>>  Dear all,
>>>>
>>>>  I'm struggling to understand strange segfaults coming from
>>>>  ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>>  follows:
>>>>
>>>>  Got signal before environment was installed on our thread
>>>>     [2: No such file or directory]
>>>>
>>>>  ;;; ECL C Backtrace
>>>>  ;;;    0 ecl_internal_error (0x87d790765)
>>>>  ;;;    1 init_unixint (0x87d7b6bd0)
>>>>  ;;;    2 init_unixint (0x87d7b6972)
>>>>  ;;;    3 pthread_sigmask (0x80103779d)
>>>>  ;;;    4 pthread_getspecific (0x801036d6f)
>>>>  ;;;    5 unknown (0x7ffffffff193)
>>>>  ;;;    6 GC_push_all_stacks (0x87db1ea2c)
>>>>  ;;;    7 GC_mark_some (0x87db12eec)
>>>>  ;;;    8 GC_stopped_mark (0x87db09baa)
>>>>  ;;;    9 GC_try_to_collect_inner (0x87db09a75)
>>>>  ;;;   10 GC_init (0x87db16f4f)
>>>>  ;;;   11 init_alloc (0x87d7caa59)
>>>>  ;;;   12 cl_boot (0x87d694a5b)
>>>>  ;;;   13 initecl (0x87d218340)
>>>>  ;;;   14 initecl (0x87d20a43f)
>>>>  ;;;   15 initecl (0x87d207e28)
>>>>  ;;;   16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>>  ;;;   17 PyImport_AppendInittab (0x800b3d71f)
>>>>  ;;;   18 PyImport_AppendInittab (0x800b3d1a8)
>>>>  ;;;   19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>  ;;;   20 _PyBuiltin_Init (0x800b162d7)
>>>>  ;;;   21 PyObject_Call (0x800a7d3e3)
>>>>  ;;;   22 PyEval_EvalFrameEx (0x800b2121c)
>>>>  ;;;   23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>>  ;;;   24 PyEval_EvalCode (0x800b1ad96)
>>>>  ;;;   25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>>  ;;;   26 PyImport_AppendInittab (0x800b3ddb8)
>>>>  ;;;   27 PyImport_AppendInittab (0x800b3d71f)
>>>>  ;;;   28 PyImport_AppendInittab (0x800b3d1a8)
>>>>  ;;;   29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>  ;;;   30 _PyBuiltin_Init (0x800b162d7)
>>>>  ;;;   31 PyEval_EvalFrameEx (0x800b22dd1)
>>>>  Segmentation fault (core dumped)
>>>>
>>>>  It looks as if ECL (version 16.1.2) is being called before an
>>>>  initialisation is complete, but is it possible to say more without a
>>>>  debugger?
>>>>
>>>>  More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>>  with libatomic_ops version 7.4.6.
>>>>  And only reproducible on FreeBSD.
>>>>
>>>>  ECL is built with --disable-threads; GC is built with or without
>>>>  threads, and the result is the same either way.
>>>>  (so it's unclear to me where pthread_* calls in the trace
>>>>  come from).
>>>>
>>>>  Thanks,
>>>>  Dima
>>>>
>>>>  PS. the segfault is at the bottom of
>>>>  https://trac.sagemath.org/ticket/22679#comment:87
>>>
>>>
>>>
>
> -- Sent with K-9 Mail.