What is the best version of ECL to use right now?

Dennis Ogbe do at ogbe.net
Wed Jun 5 14:21:32 UTC 2019


Hello Daniel, thanks for taking the time to respond to this.

> what is the object you call cl_class_of on? are you sure it is initialized cl_object? you may try attaching gdb to the process (see src/utils/gdbinit for useful configuration).

I'm not sure. I'm also not calling cl_class_of directly. If I look at the stack traces from core files generated from these crashes, I see the following:

1. A CLOS method is called somewhere in my program, resulting in the generic dispatch mechanism being triggered (generic_function_dispatch -> _ecl_standard_dispatch)

2. _ecl_standard_dispatch calls fill_spec_vector, presumably as part of the generic function dispatch mechanism (this is just what I can infer from the ECL sources; I am not sure whether I got this right).

3. fill_spec_vector seems to inspect a stack frame and pull out the types of the arguments. It calls cl_class_of(...) as part of this. This [1] is the exact line where cl_class_of is called and crashes. This is an example stack trace:

#0  0x00007fde3d402428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fde3d40402a in __GI_abort () at abort.c:89
#2  0x00007fde45c26275 in ecl_internal_error (s=s@entry=0x7fde45cb5fb5 "not a lisp data object") at /build/ecl/src/c/error.d:61
#3  0x00007fde45c117cc in cl_class_of (x=<optimized out>) at /build/ecl/src/c/instance.d:396
#4  0x00007fde45c11d54 in fill_spec_vector (frame=0x7fdc73ffc010, frame=0x7fdc73ffc010, gf=0x2c0823f0, vector=0x2a674930) at /build/ecl/src/c/gfun.d:139
#5  _ecl_standard_dispatch (frame=0x7fdc73ffc010, frame@entry=0x7fdc73ffc0a0, gf=0x2c0823f0) at /build/ecl/src/c/gfun.d:235
#6  0x00007fde45c1217d in generic_function_dispatch_vararg (narg=<optimized out>) at /build/ecl/src/c/gfun.d:272
#7  0x000000000069032f in L11store_object (narg=<optimized out>, v1obj=0x2d0ffe00, v2stream=0x2d0e5750) at /root/.cache/common-lisp/ecl-16.1.3-unknown-linux-x64/root/radio/build/src/ai/lisp-deps/cl-store/plumbing.c:276

[...] there are multiple more calls to Lxxstore_object() methods below this

I am having trouble debugging this because I highly doubt that the generic function dispatch mechanism itself is broken (otherwise *nothing ever* would work, right?). So I think something else is causing this confusion in fill_spec_vector.

> I've used recently ECL with threads disabled and all seemed to work. I would try playing with flags (i.e first allow use autodetected boehm, then skip the with-dffi flag if it still doesn't work, then remove enable-shared and at last enabl-edebug). If ./configure --disable-threads without any additions still crashes then it is indeed problem with this exactly flag.

I've compiled it with only the --disable-threads flag now and I still get the same crash in the call to GC_init() in cl_boot(). However, starting the ECL interpreter works fine, and embedding ECL into a small, single-threaded example program also works.
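
For reference, here is a rough sketch of the sequence of builds that the suggestion above amounts to, starting from my original configure line and dropping one flag per step. The function name configure_candidates is made up for illustration, and the script only prints the command lines; it does not run configure or build anything. I am assuming that the last step, with nothing but threads disabled, corresponds to the plain --disable-threads build.

```shell
#!/bin/sh
# Sketch only: print the configure invocations to try, in order.
# Each line drops one more flag than the previous one.
configure_candidates() {
    # 0. original configuration that crashes in cl_boot
    echo "./configure --enable-shared --enable-threads=no --enable-boehm=included --with-dffi --enable-debug=yes"
    # 1. let configure autodetect the Boehm GC
    echo "./configure --enable-shared --enable-threads=no --with-dffi --enable-debug=yes"
    # 2. skip --with-dffi
    echo "./configure --enable-shared --enable-threads=no --enable-debug=yes"
    # 3. remove --enable-shared
    echo "./configure --enable-threads=no --enable-debug=yes"
    # 4. remove the debug flag as well: only threads are disabled
    echo "./configure --disable-threads"
}

configure_candidates
```

If the crash survives down to the last line, the remaining suspect is the threads flag itself (or how the library is embedded), which is the situation I am in.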

Could it be that I am missing something when trying to embed ECL in a large C++ codebase? Do I have to worry about the Boehm GC not functioning when most of the program is not designed to use GC_MALLOC? I am also statically linking my lisp code, would that make a difference here?

[1] https://gitlab.com/embeddable-common-lisp/ecl/blob/4c3dcfdbd52e427910486b2c1f7b6c03a62016e5/src/c/gfun.d#L138

Thanks,
Dennis

Daniel Kochmański <daniel at turtleware.eu> writes:

> On Tue, 04 Jun 2019 20:22:48 -0400
> Dennis Ogbe <do at ogbe.net> wrote:
>
>> Hello Daniel,
>>
>> thanks for your reply, that's about what I expected. It's not a secret at all---My team and I (a bunch of graduate students) are building an "intelligent" radio network using software-defined radios. The source is not opened--yet--since we are competing as part of a DARPA Grand Challenge [1].
>>
>> While I have you here: I am currently fighting a strange bug that crashes my process. I am still in the phase where its occurrences seem random to me, so I can't tell you how to reproduce it, but the crashes seem localized to the if statement in fill_spec_vector in src/c/gfun.d--the call to cl_class_of() crashes with an unrecoverable error "not a lisp object".
>
> what is the object you call cl_class_of on? are you sure it is initialized cl_object? you may try attaching gdb to the process (see src/utils/gdbinit for useful configuration).
>>
>> Since I've seen merge requests like [2] I wanted to try to disable threading, since I won't be using it. But when I compile ecl with
>>
>> ./configure --enable-shared --enable-threads=no --enable-boehm=included --with-dffi --enable-debug=yes
>>
>> I now crash in cl_boot in a GC function (GC_push_all_eager)! Is building without threads supposed to work or am I trying the wrong thing here? My original problem (the crash in fill_spec_vector) only happens about 1/500 times I call the offending function (it's the store function from cl-store), and I am still investigating what the culprit could be. If you have any thoughts--I'd appreciate it!
>
> I've used recently ECL with threads disabled and all seemed to work. I would try playing with flags (i.e first allow use autodetected boehm, then skip the with-dffi flag if it still doesn't work, then remove enable-shared and at last enabl-edebug). If ./configure --disable-threads without any additions still crashes then it is indeed problem with this exactly flag.
>>
>> Thanks,
>> Dennis
>
> Regards,
> Daniel
>>
>> [1] https://www.spectrumcollaborationchallenge.com/
>> [2] https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/100
>>
>> Daniel Kochmański <daniel at turtleware.eu> writes:
>>
>> > Hello Dennis,
>> >
>> > On Mon, 2019-06-03 at 20:02 -0400, Dennis Ogbe wrote:
>> >> Hello,
>> >>
>> >> I am working on embedding ECL in a reasonably-sized C++ program and I
>> >> have been using v16.1.3 until now, since it seems like this is the
>> >> latest official release.
>> >
>> > Yes, 16.1.3 is the latest official release.
>> >>
>> >> However, it seems like there is a lot of activity and bug fixes in
>> >> the develop branch and I already ran into a few bugs (for example
>> >> [1]) that are fixed in develop, but are not in any release. The
>> >> documentation also seems to overlap more with the develop branch than
>> >> the latest release.
>> >
>> > That is also true, we work on the next release and we expect to make
>> > the new one soon™ (only a few tasks has been left over to implement).
>> >>
>> >> In your opinion, what is the best and most stable ECL version to use
>> >> as of June 2019? I have some reservations about simply picking a
>> >> random commit from a dev branch, so I wanted to reach out and ask
>> >> y'all directly.
>> >
>> > There is no good answer for that. While develop branch indeed has many
>> > improvements in form of bug fixes and new (dare I say – exciting)
>> > features it is only loosely tested. Before each release we work hard to
>> > test the release candidate against a big variety of operating systems,
>> > architectures and libraries (cl-test-grid is an invaluable help with
>> > that) and try to fix regressions. If you feel adventurous just pick
>> > develop branch, we do not commit there half-baked things (only stuff
>> > which we are certain about or which was a subject of a peer review /
>> > testing around the thing being changed) - it is fairly stable. But
>> > there is no guarantee that you won't hit some ugly regression we are
>> > not aware of yet. Otherwise you may try to live with 16.1.3 until we
>> > release the new 16.2.0 version – hopefully withing a few months from
>> > now.
>> >>
>> >> Thanks for all the hard work, this project is great!
>> >
>> > That's very kind of you to say that. If it is not a secret what are you
>> > working on?
>> >>
>> >> [1] https://gitlab.com/embeddable-common-lisp/ecl/issues/418
>> >>
>> >
>> > Best regards,
>> > Daniel


