"Got signal before environment was installed on our thread"
Dima Pasechnik
dimpase+ecl at gmail.com
Mon Sep 4 10:04:35 UTC 2017
On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <daniel at turtleware.eu> wrote:
> I don't think it's related to shared vs. static; rather, two GCs running
> concurrently. Try commenting out the GC_init call in ECL and see what happens.
I don't understand how two GCs can run concurrently on a memory region
controlled by ECL which is statically linked to GC...
In fact I am pretty sure no other instances of GC are running anywhere
within our process tree.
By the way, I don't know whether it's obvious from the backtrace that
cl_boot() has been completed, or not.
If it actually was completed, could it be a bug that invalidates the
bit indicating that cl_boot() has been done?
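To make the failure mode behind the subject line concrete, here is a minimal sketch in plain C, independent of ECL. It is not ECL's actual code: `current_env`, `on_signal`, `install_handler`, and `install_env` are hypothetical names, and the only assumption is that a signal handler consults an environment pointer that is published late during start-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime's per-thread environment. */
struct env { int booted; };

static struct env the_env;
static struct env *volatile current_env = NULL;   /* NULL until boot finishes */
static volatile sig_atomic_t early_signals = 0;   /* arrived before install */
static volatile sig_atomic_t handled_signals = 0; /* arrived after install */

/* Handler: does only async-signal-safe work (counter updates). */
static void on_signal(int signo)
{
    (void)signo;
    if (current_env == NULL)
        early_signals++;    /* the "signal before environment" case */
    else
        handled_signals++;
}

/* Install the handler first, as a runtime might do early in its boot. */
static int install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Publish the environment last, once boot is complete. */
static void install_env(void)
{
    the_env.booted = 1;
    current_env = &the_env;
}
```

Any signal delivered between `install_handler()` and `install_env()` lands in the first branch of the handler. In this sketch the same branch would also be taken if something reset `current_env` after boot, which is the kind of corruption asked about above.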
We have seen similar troubles with clang recently, related to FPE.
There, an FPE bit was flipped by the assignment of a double to an
integer type (sic!).
It took us a lot of head banging on various hard surfaces to debug this:
https://trac.sagemath.org/ticket/22799
It turned out we had hit a known bug:
https://bugs.llvm.org//show_bug.cgi?id=17686
>
> Do you need SIGCHLD for anything? run-program was rewritten, and SIGCHLD
> handling wasn't a viable option for it anymore.
>
We do set ECL_OPT_TRAP_SIGCHLD to 0, so I presume we
can now simply skip it altogether.
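One way to double-check that nothing in the process has touched SIGCHLD is to query its current disposition with POSIX `sigaction`. The following is a diagnostic sketch only; `sigchld_is_default` is a hypothetical helper name, not part of ECL or Python, and it says nothing about how ECL handles its own options.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGCHLD still has its default disposition,
 * 0 if something changed it (a handler or SIG_IGN), -1 on error. */
static int sigchld_is_default(void)
{
    struct sigaction cur;

    /* Passing NULL as the new action makes sigaction a pure query. */
    if (sigaction(SIGCHLD, NULL, &cur) != 0)
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 0;   /* a three-argument handler is installed */
    return cur.sa_handler == SIG_DFL ? 1 : 0;
}
```

Calling such a helper in the embedding code right after cl_boot() returns would show whether a handler was installed during boot. Note that an ignored SIGCHLD can also be inherited from the parent process, so a result of 0 is not proof by itself that ECL changed anything.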
Thanks,
Dima
> I'm on my phone; I will be available after the weekend.
>
> Regards, D.
>
>
> On 1 September 2017 14:47:57 CEST, Dima Pasechnik <dimpase+ecl at gmail.com>
> wrote:
>>
>> Hi Daniel,
>> Thanks for the message. The scenario you talk about only happens if GC
>> is a shared library, right?
>>
>> I've rebuilt GC disabling shared libs, and ECL doing static linking to GC.
>> And I still get very similar segfaults:
>>
>> ;;; ECL C Backtrace
>> ;;; 0 ecl_internal_error (0x87d79b375)
>> ;;; 1 init_unixint (0x87d7c17e0)
>> ;;; 2 init_unixint (0x87d7c1582)
>> ;;; 3 pthread_sigmask (0x80103779d)
>> ;;; 4 pthread_getspecific (0x801036d6f)
>> ;;; 5 unknown (0x7ffffffff193)
>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)
>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
>> ;;; 8 GC_push_roots (0x87d7ef9c2)
>> ;;; 9 GC_mark_some (0x87d7ec97c)
>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)
>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
>> ;;; 12 GC_init (0x87d7f08ea)
>> ;;; 13 init_alloc (0x87d7d5669)
>> ;;; 14 cl_boot (0x87d69f66b)
>> ...
>>
>> And a very similar picture on the develop branch of ECL - although
>> I had to change our code, as in particular
>> ECL_OPT_TRAP_SIGCHLD is gone...
>>
>> So, what can it be? Some signals issue?
>>
>> Thanks,
>> Dima
>>
>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <daniel at turtleware.eu>
>> wrote:
>>>
>>> Hey Dima,
>>>
>>> this looks like the issue with having GC initialized before ECL kicks
>>> in.
>>> See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>> discussion about this problem. Basically some other component already
>>> called GC_init and ECL calls it once more. It's arguably not a bug.
>>>
>>> Best regards,
>>>
>>> Daniel
>>>
>>>
>>> On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>
>>>>
>>>> Dear all,
>>>>
>>>> I'm struggling to understand strange segfaults coming from
>>>> ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>> follows:
>>>>
>>>> Got signal before environment was installed on our thread
>>>> [2: No such file or directory]
>>>>
>>>> ;;; ECL C Backtrace
>>>> ;;; 0 ecl_internal_error (0x87d790765)
>>>> ;;; 1 init_unixint (0x87d7b6bd0)
>>>> ;;; 2 init_unixint (0x87d7b6972)
>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>> ;;; 5 unknown (0x7ffffffff193)
>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)
>>>> ;;; 7 GC_mark_some (0x87db12eec)
>>>> ;;; 8 GC_stopped_mark (0x87db09baa)
>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)
>>>> ;;; 10 GC_init (0x87db16f4f)
>>>> ;;; 11 init_alloc (0x87d7caa59)
>>>> ;;; 12 cl_boot (0x87d694a5b)
>>>> ;;; 13 initecl (0x87d218340)
>>>> ;;; 14 initecl (0x87d20a43f)
>>>> ;;; 15 initecl (0x87d207e28)
>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)
>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)
>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)
>>>> ;;; 21 PyObject_Call (0x800a7d3e3)
>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)
>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)
>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)
>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)
>>>> ;;; 28 PyImport_AppendInittab (0x800b3d1a8)
>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)
>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
>>>> Segmentation fault (core dumped)
>>>>
>>>> It looks as if ECL (version 16.1.2) is being called before an
>>>> initialisation is complete, but is it possible to say more without a
>>>> debugger?
>>>>
>>>> More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>> with libatomic_ops version 7.4.6.
>>>> And only reproducible on FreeBSD.
>>>>
>>>> ECL is built with --disable-threads; GC is built with or without
>>>> threads; the result is the same either way
>>>> (so it's unclear to me where the pthread_* calls in the trace
>>>> come from).
>>>>
>>>> Thanks,
>>>> Dima
>>>>
>>>> PS. the segfault is at the bottom of
>>>> https://trac.sagemath.org/ticket/22679#comment:87
>>>
>>>
>>>
>
> -- Sent with K-9 Mail.