<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div></div><div><br></div><div><br>On Sep 21, 2017, at 8:31 AM, Dima Pasechnik <<a href="mailto:dimpase+ecl@gmail.com">dimpase+ecl@gmail.com</a>> wrote:<br><br></div><blockquote type="cite"><div><div dir="auto"><br><br>On Tue, Sep 12, 2017 at 1:18 AM, Fabrizio Fabbri <<a href="mailto:strabixbox@yahoo.com">strabixbox@yahoo.com</a>> wrote:<br>><br>>> On Sep 11, 2017, at 7:13 PM, Dima Pasechnik <<a href="mailto:dimpase%2Becl@gmail.com">dimpase+ecl@gmail.com</a>> wrote:<br>>><br>>>> On Mon, Sep 4, 2017 at 11:15 AM, Daniel Kochmański <<a href="mailto:daniel@turtleware.eu">daniel@turtleware.eu</a>> wrote:<br>>>> From the backtrace it is sure that fail is caused inside the call to<br>>>> GC_init. Such errors are known to have happened when another GC was<br>>>> initialized already on the system (I've linked the issue). It might be<br>>>> caused by something else in bdwgc, I don't know. Either way I'd focus on<br>>>> GC_init part.<br>>><br>>> Our project (sagemath) only uses libgc within the embedded ECL. Thus I<br>>> am really puzzled how another libgc instance might kick in and spoil<br>>> the game for ECL.<br>>><br>>> One possibility is that clang is using libgc, and thus, in principle,<br>>> libgc might be sitting somewhere in the runtime?!<br>>><br>>><br>>>><br>>>> To make sure, that I'm right with my assertion you may put printf before and<br>>>> after call to GC_init. I'm not quite familiar with bdwgc internals to say,<br>>>> what is wrong though. Maybe updating bundled sources of GC will help? Or<br>>>> linking with libgc on the system? 
It might be that it was a bug in bdwgc<br>>>> which got already fixed.<br>>><br>>> We are not using the bdwgc shipped with ECL, we use a separate libgc<br>>> 7.6.0, which is the latest stable.<br>>> (Is there a reason to ship bdwgc sources with ECL - do you patch it?)<br>>><br>><br>> I'm using ecl with the non embedded bdwgc as well and I don't have issue.<br>><br>> Ensure that bdwgc it's not also build statically in ecl as well. I expect linking problems in that case but worth it double check.<br><br>here is a part of a stacktrace from the debugger, in a scenario where<br>a call to embedded ECL from Python leads to a ECL's stack overflow, on<br>an already initialised ECL; it seems to be related to a particular thread this call comes from (another, usual, calling sequence<br>does not lead to crashes). There is no mention of GC in the stacktrace.<br><br></div></div></blockquote><div><br></div><div>If the current thread was created outside the Lisp environment, you need to import it before calling any ECL function.</div><div>That is done with the following pair of calls:</div><div>ecl_import_current_thread (before the first ECL call on that thread)</div><div>ecl_release_current_thread (once the thread is done with ECL)</div><div><br></div><div>You can see an example here:</div><div><a href="https://gitlab.com/embeddable-common-lisp/ecl/tree/develop/examples/threads/import">https://gitlab.com/embeddable-common-lisp/ecl/tree/develop/examples/threads/import</a></div><div><br></div><div>Maybe you already do that, but it is worth mentioning.</div><div><br></div><div>Best</div>F.<br><blockquote type="cite"><div><div dir="auto">This looks to me as a lack of thread safety on ECL side, although I might be wrong.<br>...<br> frame #16: 0x000000088444b9d6 libecl.so.16.1`si_serror(narg=6, cformat=0x0000000000d27ba0, eformat=0x00000008847d12a0) at error.d:549<br> frame #17: 0x000000088448bd42 libecl.so.16.1`ecl_cs_overflow at stacks.d:76<br> frame #18: 0x00000008844168af libecl.so.16.1`ecl_interpret(frame=0x00007fffdeff2658, env=0x0000000000000001, bytecodes=0x0000000000db33c0) at 
interpreter.d:286<br> frame #19: 0x0000000884414afc libecl.so.16.1`ecl_apply_from_stack_frame(frame=0x00007fffdeff2658, x=0x0000000000db33c0) at eval.d:79<br> frame #20: 0x000000088441545b libecl.so.16.1`cl_apply(narg=0, fun=0x0000000000db33c0, lastarg=0x0000000000000001) at eval.d:164<br> frame #21: 0x0000000883e0e1b4 ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_safe_funcall(__pyx_v_func=0x0000000000769600, __pyx_v_arg=0x0000000000e6dfa0) at ecl.c:5831<br> frame #22: 0x0000000883e0d519 ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_safe_read_string(__pyx_v_s="(setf *load-verbose* NIL)") at ecl.c:6084<br> frame #23: 0x0000000883e0d02b ecl.so`__pyx_f_4sage_4libs_3ecl_ecl_eval(__pyx_v_s=0x0000000882add970, __pyx_skip_dispatch=0) at ecl.c:10682<br> frame #24: 0x0000000883e0cd4c ecl.so`__pyx_pf_4sage_4libs_3ecl_10ecl_eval(__pyx_self=0x0000000000000000, __pyx_v_s=0x0000000882add970) at ecl.c:10762<br> frame #25: 0x0000000883e0cab7 ecl.so`__pyx_pw_4sage_4libs_3ecl_11ecl_eval(__pyx_self=0x0000000000000000, __pyx_v_s=0x0000000882add970) at ecl.c:10745<br> frame #26: 0x0000000800d8a68f libpython2.7.so.1`call_function(pp_stack=0x00007fffdeff2c00, oparg=1) at ceval.c:4340<br> frame #27: 0x0000000800d854d2 libpython2.7.so.1`PyEval_EvalFrameEx(f=0x00000008829939b0, throwflag=0) at ceval.c:2989<br>...<br> frame #91: 0x0000000800d88361 libpython2.7.so.1`PyEval_CallObjectWithKeywords(func=0x000000087cdf99e0, arg=0x000000080064e060, kw=0x0000000000000000) at ceval.c:4221<br> frame #92: 0x0000000800de60d1 libpython2.7.so.1`t_bootstrap(boot_raw=0x0000000807015598) at threadmodule.c:620<br> frame #93: 0x00000008012d3b55 libthr.so.3`___lldb_unnamed_symbol1$$libthr.so.3 + 325<br><br><br><br>><br>>> Thanks,<br>>> Dima<br>>><br>>>><br>>>> Regards,<br>>>><br>>>> Daniel<br>>>><br>>>><br>>>><br>>>>> On 04.09.2017 12:04, Dima Pasechnik wrote:<br>>>>><br>>>>> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <<a href="mailto:daniel@turtleware.eu">daniel@turtleware.eu</a>><br>>>>> wrote:<br>>>>>><br>>>>>> I dont 
think its related to shared vs static - rather two gc running<br>>>>>> concurrently. Try commenting out GC_init call in ecl and see what<br>>>>>> happens.<br>>>>><br>>>>> I don't understand how two GCs can run concurrently on a memory region<br>>>>> controlled by ECL which is statically linked to GC...<br>>>>> In fact I am pretty sure no other instances of GC are running anywhere<br>>>>> within our process tree.<br>>>>><br>>>>> By the way, I don't know whether it's obvious from the backtrace that<br>>>>> cl_boot() has been completed, or not.<br>>>>><br>>>>> If it actually was completed, could it be a bug that invalidates the<br>>>>> bit indicating that cl_boot() has been done?<br>>>>><br>>>>> We have seen similar troubles with clang recently, related to FPE.<br>>>>> There an FPE bit was flipped by assignment of a double to an<br>>>>> integer type (sic!).<br>>>>> It took us a lot of head banging on various hard surfaces to debug this:<br>>>>> <a href="https://trac.sagemath.org/ticket/22799">https://trac.sagemath.org/ticket/22799</a><br>>>>> it turned out we did hit a known bug:<br>>>>> <a href="https://bugs.llvm.org//show_bug.cgi?id=17686">https://bugs.llvm.org//show_bug.cgi?id=17686</a><br>>>>><br>>>>>> Do you need sigchld for anything? Run-program was rewritten and sigchld<br>>>>>> handling wasnt viable option anymore for it.<br>>>>>><br>>>>> We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we<br>>>>> now can simply skip it all together.<br>>>>><br>>>>> Thanks,<br>>>>> Dima<br>>>>><br>>>>>> Im on phone, will be avail after the weekend.<br>>>>>><br>>>>>> Regards, D.<br>>>>>><br>>>>>><br>>>>>> Dnia 1 września 2017 14:47:57 CEST, Dima Pasechnik<br>>>>>> <<a href="mailto:dimpase%2Becl@gmail.com">dimpase+ecl@gmail.com</a>><br>>>>>> napisał(a):<br>>>>>>><br>>>>>>> Hi Daniel,<br>>>>>>> Thanks for the message. 
The scenario you talk about only happens if GC<br>>>>>>> is a shared library, right?<br>>>>>>><br>>>>>>> I've rebuilt GC disabling shared libs, and ECL doing static linking to<br>>>>>>> GC.<br>>>>>>> And I still get very similar segfaults:<br>>>>>>><br>>>>>>> ;;; ECL C Backtrace<br>>>>>>> ;;; 0 ecl_internal_error (0x87d79b375)<br>>>>>>> ;;; 1 init_unixint (0x87d7c17e0)<br>>>>>>> ;;; 2 init_unixint (0x87d7c1582)<br>>>>>>> ;;; 3 pthread_sigmask (0x80103779d)<br>>>>>>> ;;; 4 pthread_getspecific (0x801036d6f)<br>>>>>>> ;;; 5 unknown (0x7ffffffff193)<br>>>>>>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)<br>>>>>>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)<br>>>>>>> ;;; 8 GC_push_roots (0x87d7ef9c2)<br>>>>>>> ;;; 9 GC_mark_some (0x87d7ec97c)<br>>>>>>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)<br>>>>>>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)<br>>>>>>> ;;; 12 GC_init (0x87d7f08ea)<br>>>>>>> ;;; 13 init_alloc (0x87d7d5669)<br>>>>>>> ;;; 14 cl_boot (0x87d69f66b)<br>>>>>>> ...<br>>>>>>><br>>>>>>> And a very similar picture on the develop branch of ECL - although<br>>>>>>> I had to change our code, as in particular<br>>>>>>> ECL_OPT_TRAP_SIGCHLD is gone...<br>>>>>>><br>>>>>>> So, what can it be? Some signals issue?<br>>>>>>><br>>>>>>> Thanks,<br>>>>>>> Dima<br>>>>>>><br>>>>>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <<a href="mailto:daniel@turtleware.eu">daniel@turtleware.eu</a>><br>>>>>>> wrote:<br>>>>>>>><br>>>>>>>> Hey Dima,<br>>>>>>>><br>>>>>>>> this looks like the issue with having GC initialized before ECL kicks<br>>>>>>>> in.<br>>>>>>>> See <a href="https://gitlab.com/embeddable-common-lisp/ecl/issues/371">https://gitlab.com/embeddable-common-lisp/ecl/issues/371</a> for a<br>>>>>>>> discussion about this problem. Basically some other component already<br>>>>>>>> called<br>>>>>>>> GC_init and ECL calls it once more. 
It's arguably not a bug.<br>>>>>>>><br>>>>>>>> Best regards,<br>>>>>>>><br>>>>>>>> Daniel<br>>>>>>>><br>>>>>>>><br>>>>>>>>> On 31.08.2017 15:29, Dima Pasechnik wrote:<br>>>>>>>>><br>>>>>>>>><br>>>>>>>>> Dear all,<br>>>>>>>>><br>>>>>>>>> I'm struggling to understand strange segfaults coming from<br>>>>>>>>> ECL(+Maxima) on FreeBSD embedded into Python; they typically look as<br>>>>>>>>> follows:<br>>>>>>>>><br>>>>>>>>> Got signal before environment was installed on our thread<br>>>>>>>>> [2: No such file or directory]<br>>>>>>>>><br>>>>>>>>> ;;; ECL C Backtrace<br>>>>>>>>> ;;; 0 ecl_internal_error (0x87d790765)<br>>>>>>>>> ;;; 1 init_unixint (0x87d7b6bd0)<br>>>>>>>>> ;;; 2 init_unixint (0x87d7b6972)<br>>>>>>>>> ;;; 3 pthread_sigmask (0x80103779d)<br>>>>>>>>> ;;; 4 pthread_getspecific (0x801036d6f)<br>>>>>>>>> ;;; 5 unknown (0x7ffffffff193)<br>>>>>>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)<br>>>>>>>>> ;;; 7 GC_mark_some (0x87db12eec)<br>>>>>>>>> ;;; 8 GC_stopped_mark (0x87db09baa)<br>>>>>>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)<br>>>>>>>>> ;;; 10 GC_init (0x87db16f4f)<br>>>>>>>>> ;;; 11 init_alloc (0x87d7caa59)<br>>>>>>>>> ;;; 12 cl_boot (0x87d694a5b)<br>>>>>>>>> ;;; 13 initecl (0x87d218340)<br>>>>>>>>> ;;; 14 initecl (0x87d20a43f)<br>>>>>>>>> ;;; 15 initecl (0x87d207e28)<br>>>>>>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)<br>>>>>>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)<br>>>>>>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)<br>>>>>>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)<br>>>>>>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)<br>>>>>>>>> ;;; 21 PyObject_Call (0x800a7d3e3)<br>>>>>>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)<br>>>>>>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)<br>>>>>>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)<br>>>>>>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)<br>>>>>>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)<br>>>>>>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)<br>>>>>>>>> ;;; 28 PyImport_AppendInittab 
(0x800b3d1a8)<br>>>>>>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)<br>>>>>>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)<br>>>>>>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)<br>>>>>>>>> Segmentation fault (core dumped)<br>>>>>>>>><br>>>>>>>>> It looks as if ECL (version 16.1.2) is being called before an<br>>>>>>>>> initialisation is complete, but it it possible to say more without a<br>>>>>>>>> debugger?<br>>>>>>>>><br>>>>>>>>> More details: is is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0<br>>>>>>>>> with libatomic_ops version 7.4.6.<br>>>>>>>>> And only reproducible on FreeBSD.<br>>>>>>>>><br>>>>>>>>> ECL is built with --disable-threads; GC is built with or without<br>>>>>>>>> threads---result is still the same.<br>>>>>>>>> (so it's unclear to me where pthread_* calls in the trace<br>>>>>>>>> come from).<br>>>>>>>>><br>>>>>>>>> Thanks,<br>>>>>>>>> Dima<br>>>>>>>>><br>>>>>>>>> PS. the segfault is at the bottom of<br>>>>>>>>> <a href="https://trac.sagemath.org/ticket/22679#comment:87">https://trac.sagemath.org/ticket/22679#comment:87</a><br><br></div>
</div></blockquote></body></html>