[Ecls-list] Possible unwinding issues?
Matthew Mondor
mm_lists at pulsar-zone.net
Sat Aug 29 20:04:13 UTC 2009
It is not yet entirely clear to me what causes this, but I often see ECL
loop endlessly after receiving a signal (including SIGTERM and,
occasionally, SIGSEGV; I have not yet found the cause of the crashes
that generate the SIGSEGV).
Also, a very simple thread creation and killing test succeeds
without any apparent problem, yet an endless loop in the thread being
killed does occur in another small program. I noticed that when a
thread exits, unless it's the main thread, ecl_unwind() is called.
I've started wondering whether there is a bug in the unwinding
code.
At the ECL REPL (not SLIME's, which hides most of what happens inside),
I was also able to produce an interesting loop that ran until the stack
was full:
stdin"> signaled an error.
Explanation: Interrupted system call.
Broken at SI:BYTECODES.No restarts available.
Broken at SI:BYTECODES.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Read or write operation to stream #<input stream "stdin"> signaled an error.
Explanation: Interrupted system call.
Broken at SI:BYTECODES.No restarts available.
Broken at SI:BYTECODES.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Read or write operation to stream #<input stream "stdin"> signaled an error.
Explanation: Interrupted system call.
Broken at SI:BYTECODES.No restarts available.
Broken at SI:BYTECODES.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Read or write operation to stream #<input stream "stdin"> signaled an error.
[...]
This, however, was with 8.12.0, which appears to handle SIGINT more gracefully: the loop only started after I used the 1. CONTINUE restart and then sent another SIGINT via ^C.
With CVS HEAD, the endless loop triggers immediately at the first SIGINT (here's a ktrace/kdump result):
[...]
28996 1 ecl CALL read(0,0xbb703000,0x1000)
28996 1 ecl RET read -1 errno 4 Interrupted system call
28996 1 ecl PSIG SIGINT caught handler=0xbbb464c0 mask=(): code=SI_NOINFO
28996 1 ecl CALL mprotect(0xbb8f9000,0x1c0,1)
28996 1 ecl RET mprotect 0
28996 1 ecl CALL setcontext(0xbfbfdd84)
28996 1 ecl RET write JUSTRETURN
28996 1 ecl PSIG SIGSEGV caught handler=0xbbb46610 mask=(11): code=SEGV_ACCERR, addr=0xbb8f9000, trap=6)
28996 1 ecl CALL issetugid
28996 1 ecl RET issetugid 0
28996 1 ecl CALL issetugid
28996 1 ecl RET issetugid 0
28996 1 ecl PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=558, uid=0)
So the SIGINT handler is called, and soon a SIGSEGV (access error)
occurs and an endless loop without any syscall ensues, until I kill the
process with SIGTERM at which point it exits immediately.
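In the trace, mprotect() is called on 0xbb8f9000 with protection 1 (PROT_READ on most systems), and the SIGSEGV then reports SEGV_ACCERR at that same address. That looks like a write to a page that had just been made read-only. The following C sketch reproduces that sequence in isolation, assuming a Linux-like system (some platforms raise SIGBUS instead). The names guard_page, on_segv and demo_protected_write are hypothetical and not from the ECL sources. Note that mprotect() is not on the POSIX list of async-signal-safe functions, although calling it from a handler is common practice.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS with strict -std= */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guard_page;                  /* page toggled with mprotect() */
static size_t guard_len;
static volatile sig_atomic_t faults;      /* number of faults handled */
static volatile sig_atomic_t last_code;   /* si_code of the last fault */

/* SIGSEGV handler: if the fault is on the guard page, make the page
 * writable again so the faulting store can be restarted on return. */
static void on_segv(int signo, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;

    (void)signo;
    (void)ctx;
    if (addr < guard_page || addr >= guard_page + guard_len)
        _exit(1);                 /* unrelated fault: give up */
    faults++;
    last_code = info->si_code;
    if (mprotect(guard_page, guard_len, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
}

/* Makes a page read-only, writes to it, and returns how many faults
 * the handler saw (or -1 on setup failure). */
int demo_protected_write(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);

    assert(ps > 0);
    guard_len = (size_t)ps;
    guard_page = mmap(NULL, guard_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;     /* deliver siginfo_t with si_addr/si_code */
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    if (mprotect(guard_page, guard_len, PROT_READ) != 0)
        return -1;

    *(volatile char *)guard_page = 42;    /* faults, then is restarted */
    return (int)faults;
}
```

If the handler did not restore write access (or jumped somewhere that touches the page again), the store would fault repeatedly without making any system call. That would be consistent with the syscall-free loop described above, but it is only a hypothesis until the debug build is inspected.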
The first time I noticed a SIGSEGV followed by an endless loop was when
ecl-min compiled with __thread was crashing, so the endless loop might
well be a hallmark of a problem somewhere in the SIGSEGV handler (and
jump_to_sigsegv_handler() does call ecl_unwind() as well).
I'll have to try looking more closely at this with gdb on a debug
build, but was wondering if other ECL users are also seeing similar
symptoms.
Thanks,
--
Matt