interrupting and destroying threads [internals, help wanted]

Sat Mar 3 00:45:26 UTC 2018

TL;DR: Interrupting and destroying threads doesn't seem to work reliably.
Interrupts, for example from slime, aren't reliably acted on, and sometimes
threads are killed instead of interrupted. Some attempts to fix this are
discussed below.

Background:

There are two interrupt systems in ABCL, one defined on
org.armedbear.lisp.Lisp, and one on org.armedbear.lisp.LispThread. It is
unclear why there are two parallel systems.

The interrupt defined on org.armedbear.lisp.Lisp works as follows:

Call (interrupt-lisp).
An object variable "interrupted" is set.
At several places in the interpreter (eval, tagbody) the variable is
checked.
When compiling code, typically at branch points, code is emitted to check
the variable and, if it is set, to call handleInterrupts. handleInterrupts
starts a break loop. It's not clear which thread that will happen in. There
is a delay between when an interrupt is signaled and when the break is
called, and the break happens in whichever thread next checks the variable,
which may not be the thread in which interrupt-lisp was called.
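To make the thread ambiguity concrete, here is a minimal Java sketch of a single process-wide flag polled at safe points. The class and method names are hypothetical, not ABCL's actual code. The break loop is replaced by recording which thread handled the interrupt, and the sketch assumes the flag is cleared atomically, which may not match ABCL.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the interrupt-lisp system: one process-wide flag,
// polled at safe points by whatever thread happens to reach one.
class GlobalInterruptSketch {
    static final AtomicBoolean interrupted = new AtomicBoolean(false);
    static volatile String handledBy = null;

    // Analogue of (interrupt-lisp): just sets the shared flag.
    static void interruptLisp() {
        interrupted.set(true);
    }

    // Analogue of the check in eval, tagbody and at compiled branch points.
    static void checkInterrupt() {
        // getAndSet clears the flag atomically, so exactly one thread handles it.
        if (interrupted.getAndSet(false)) {
            handleInterrupts();
        }
    }

    // Stand-in for handleInterrupts: ABCL would start a break loop here.
    static void handleInterrupts() {
        handledBy = Thread.currentThread().getName();
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static String demo() {
        interruptLisp(); // requested from the calling thread...
        Thread worker = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                checkInterrupt(); // one safe point per iteration
            }
        }, "worker");
        worker.start();
        join(worker);
        return handledBy; // ...but handled by the thread that polled first
    }

    public static void main(String[] args) {
        System.out.println("interrupt handled in: " + demo());
    }
}
```

Here the interrupt is requested from the main thread but handled in the thread named "worker", simply because that is the only thread reaching a check.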

This interrupt system is *not* the one used in slime or bordeaux threads.

The second interrupt system, which *is* used by slime and bordeaux threads,
uses an instance variable on LispThread called "threadInterrupted". One
calls (interrupt-thread thread function &rest args) to interrupt. At that
point threadInterrupted is set to true and the function is queued for
eventual (hopefully prompt) execution. Then Java's built-in
Thread.interrupt() is called, which also sets JVM-internal state in the
thread indicating a request for interrupt.
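A rough Java model of this second system follows. It assumes a per-thread queue of pending functions plus a flag. LispThreadLike and interruptThread are hypothetical stand-ins for LispThread and interrupt-thread, and processThreadInterrupts only mirrors the role of the ABCL method of that name.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model of the LispThread interrupt system: a per-thread queue of
// functions, a per-thread flag, and the JVM's own interrupt status.
class ThreadInterruptSketch {
    static final class LispThreadLike {
        final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
        volatile boolean threadInterrupted = false;
        Thread javaThread;

        // Analogue of (interrupt-thread thread function &rest args).
        void interruptThread(Runnable fn) {
            pending.add(fn);          // queue the function for later execution
            threadInterrupted = true; // flag for Lisp-level checks
            javaThread.interrupt();   // JVM interrupt: wakes sleep/wait/join
        }

        // Analogue of processThreadInterrupts: run everything queued so far.
        void processThreadInterrupts() {
            threadInterrupted = false;
            Runnable fn;
            while ((fn = pending.poll()) != null) {
                fn.run();
            }
        }
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        LispThreadLike lt = new LispThreadLike();
        lt.javaThread = new Thread(() -> {
            try {
                Thread.sleep(10_000); // a wait point that honours Thread.interrupt()
                log.add("slept");
            } catch (InterruptedException e) {
                log.add("woken");
                lt.processThreadInterrupts();
            }
        });
        lt.javaThread.start();
        lt.interruptThread(() -> log.add("interrupt function ran"));
        join(lt.javaThread);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [woken, interrupt function ran]
    }
}
```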

Java's interrupt is checked by JVM internals that wait, such as
Thread.sleep, Object.wait and Thread.join, and by explicit checks in user
code. Blocking I/O generally does not respond to it, except on NIO
interruptible channels, which are closed and throw
ClosedByInterruptException instead. When the JVM detects the interrupt, an
InterruptedException is thrown. When user code does the check (via
Thread.interrupted() or isInterrupted()), it has to throw the exception
itself. The intent is that exception handlers process the interrupts.
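The following plain-JDK example (no ABCL code involved) shows both paths. A thread blocked in Thread.sleep receives an InterruptedException, and sleep clears the interrupt status as it throws. A CPU-bound loop only reacts because it polls Thread.interrupted() and throws the exception itself.

```java
// Plain JDK behaviour of Thread.interrupt(); no ABCL code involved.
class JavaInterruptBasics {
    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    // A thread blocked in sleep gets an InterruptedException; the status is cleared.
    static String interruptSleeper() {
        String[] result = new String[1];
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(10_000);
                result[0] = "slept normally";
            } catch (InterruptedException e) {
                result[0] = "InterruptedException, status now "
                        + Thread.currentThread().isInterrupted();
            }
        });
        t.start();
        t.interrupt();
        join(t);
        return result[0];
    }

    // A CPU-bound loop never throws by itself: it must poll and throw explicitly.
    static String interruptBusyLoop() {
        String[] result = new String[1];
        Thread t = new Thread(() -> {
            try {
                long work = 0;
                while (true) {
                    work++;                     // stands in for "anything else"
                    if (Thread.interrupted()) { // explicit check; also clears the status
                        throw new InterruptedException("noticed by explicit check");
                    }
                }
            } catch (InterruptedException e) {
                result[0] = e.getMessage();
            }
        });
        t.start();
        t.interrupt();
        join(t);
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(interruptSleeper());  // InterruptedException, status now false
        System.out.println(interruptBusyLoop()); // noticed by explicit check
    }
}
```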

In several places ABCL explicitly catches the exception and calls
processThreadInterrupts, which executes the queued interrupt functions;
the thread should then continue with whatever it was doing. Presumably
threads die when an InterruptedException is not handled.
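Here is a sketch of this catch-and-process pattern applied to a sleep. The names pending, lispSleep and processThreadInterrupts are illustrative, and a single static queue stands in for the interrupted thread's own queue. An InterruptedException runs the queued functions and then the sleep resumes until the original deadline, rather than unwinding the thread.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a sleep that treats InterruptedException as "run the queued
// interrupt functions, then carry on", instead of letting it unwind the thread.
class ProcessAndResumeSketch {
    // A single queue standing in for the interrupted thread's own queue.
    static final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();

    static void processThreadInterrupts() {
        Runnable fn;
        while ((fn = pending.poll()) != null) {
            fn.run();
        }
    }

    static void lispSleep(long millis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return;
            }
            try {
                TimeUnit.NANOSECONDS.sleep(remaining);
            } catch (InterruptedException e) {
                processThreadInterrupts(); // handle the interrupt, then resume sleeping
            }
        }
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(() -> {
            lispSleep(500);
            log.add("sleep finished");
        });
        worker.start();
        pending.add(() -> log.add("interrupt function ran"));
        worker.interrupt();
        join(worker);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [interrupt function ran, sleep finished]
    }
}
```

The design choice worth noting is the deadline: resuming with the remaining time keeps the caller's sleep duration intact even though the interrupt woke it early.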

InterruptedException is handled at only a few points: thread-join, sleep
and object-wait. Unlike with interrupt-lisp, the compiler does *not*
generate code to check for it. As a consequence, if a thread is doing
anything else it will not notice the interrupt. This leads to a poor
interactive experience: slime's control-c often does not work in a timely
manner.

Handling the interrupt can be tricky if it happens at the wrong time, since
there is no guarantee that the rest of the Lisp state is consistent at that
point.

Related to this, the implementation of destroy-thread is suboptimal.
Bordeaux threads documents that it is implementation-defined whether
unwind-protect cleanup forms are run when a thread is destroyed. In
practice many implementations seem to run them: the lparallel package
(highly recommended!) depends on that and implements the ability to kill
worker threads on all supported platforms *except* ABCL.

Thread destruction is implemented in a similar manner to interrupts. A
variable is set when destruction is requested, and there is a single place
where it is checked and acted on (the beginning of eval). When the request
is detected, a ThreadDestroyed exception is thrown. ThreadDestroyed is
caught at the top of the thread's execution, which seems to explain why
unwind-protects are not handled. Because the only check is in eval, a
thread executing only compiled code will not actually be destroyed.

---

As you can see, this is something of a mess. I've made some initial
attempts to remedy it, but am not confident that they are the correct way
of doing things, or that they will work reliably. The changes are:

1) Whenever Lisp interrupts are checked, also check for thread interrupts.
There are a few places in the java code that do this and I simply add a
check for thread interrupts at those points as well. In addition I modify
the code generation so that when checks for lisp interrupts are generated
as part of compiled code, I also generate a check for thread interrupts.
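A minimal sketch of change 1, using hypothetical names: safePoint stands for the check the compiler would emit at a branch point, and it now tests the per-thread flag as well as the global one. The loop in demo() never blocks, so only this emitted check lets the queued function run inside the busy thread. The global-flag branch is left as a stub.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of change 1: the check emitted at compiled branch points
// tests the per-thread flag as well as the global Lisp flag.
class SafePointSketch {
    static final AtomicBoolean lispInterrupted = new AtomicBoolean(false); // global flag

    static final class ThreadState { // per-thread state, like LispThread
        volatile boolean threadInterrupted = false;
        final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();

        // No Thread.interrupt() here: the request is only seen at a safe point.
        void interruptThread(Runnable fn) {
            pending.add(fn);
            threadInterrupted = true;
        }
    }

    static void safePoint(ThreadState self) {
        if (lispInterrupted.getAndSet(false)) {
            // existing behaviour would enter the break loop here (omitted)
        }
        if (self.threadInterrupted) {
            self.threadInterrupted = false;
            Runnable fn;
            while ((fn = self.pending.poll()) != null) {
                fn.run();
            }
        }
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static String demo() {
        ThreadState state = new ThreadState();
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicReference<String> ranIn = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            long counter = 0;
            while (!stop.get()) { // a tight loop that never blocks...
                counter++;
                safePoint(state); // ...but has a check at its back-edge
            }
        }, "busy-worker");
        worker.start();
        state.interruptThread(() -> {
            ranIn.set(Thread.currentThread().getName());
            stop.set(true);
        });
        join(worker);
        return ranIn.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupt function ran in: " + demo());
    }
}
```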

2) Don't call the JVM's Thread.interrupt(). The benefit of not calling it
is that you remove the possibility that the Lisp will be in an
inconsistent state when it handles the interrupt. The disadvantage is that
you won't be able to interrupt anything that isn't running Lisp code, such
as a thread blocked inside a Java method.
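The disadvantage can be illustrated with a small Java sketch. The names are hypothetical, and a CountDownLatch stands in for any blocking Java call. Because no Thread.interrupt() is issued, the queued function only runs at the first check after the blocking call returns.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the trade-off in change 2: with no Thread.interrupt(),
// a thread blocked in Java code cannot run queued interrupt functions until the
// blocking call returns and it reaches its next safe point.
class NoJavaInterruptSketch {
    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
        CountDownLatch release = new CountDownLatch(1); // any blocking Java call
        Thread worker = new Thread(() -> {
            try {
                release.await();        // no Lisp safe points in here
            } catch (InterruptedException e) {
                log.add("interrupted"); // not reached: nobody calls interrupt()
            }
            log.add("blocking call returned");
            Runnable fn;
            while ((fn = pending.poll()) != null) { // first safe point afterwards
                fn.run();
            }
        });
        worker.start();
        pending.add(() -> log.add("interrupt function ran")); // request only queued
        release.countDown(); // the blocking call completes on its own
        join(worker);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [blocking call returned, interrupt function ran]
    }
}
```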

3) Have destroy-thread use interrupt-thread to throw to a new catch tag
which is set around the thread's run function. There is already a
provision for defining a wrapper around a thread's run function: a Lisp
function called (unsurprisingly) THREAD-FUNCTION-WRAPPER. Currently it has
an abort restart handler. However, destroying a thread is not necessarily
the same as aborting.

Since the throw works correctly with respect to unwind-protect and other
Lisp state, the behavior of destroy is predictable: active unwind-protect
cleanup forms are run before the thread exits.
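Here is a Java analogue of change 3. It assumes that a dedicated unchecked exception plays the role of the throw to the catch tag, and that try/finally plays the role of unwind-protect. DestroyThrow, wrap, safePoint and destroyThread are hypothetical names, and the interrupt state is kept static for a single thread for brevity. Destroy is expressed as an ordinary queued interrupt function that throws, the cleanup runs during unwinding, and the wrapper catches the throw so the thread exits normally.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical Java analogue of change 3: destroy-thread = interrupt-thread with a
// function that throws to a tag caught by the thread's run-function wrapper.
class DestroyViaThrowSketch {
    // Plays the role of the throw to the new catch tag.
    static final class DestroyThrow extends RuntimeException {
        DestroyThrow() {
            super(null, null, false, false); // no stack trace needed for control flow
        }
    }

    // Static, single-thread interrupt state, for brevity.
    static final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    static volatile boolean threadInterrupted = false;

    // The check compiled code would run at branch points.
    static void safePoint() {
        if (threadInterrupted) {
            threadInterrupted = false;
            Runnable fn;
            while ((fn = pending.poll()) != null) {
                fn.run();
            }
        }
    }

    // Analogue of THREAD-FUNCTION-WRAPPER establishing the catch tag.
    static Runnable wrap(Runnable body, List<String> log) {
        return () -> {
            try {
                body.run();
            } catch (DestroyThrow t) {
                log.add("thread exited via destroy tag");
            }
        };
    }

    // destroy-thread expressed as an interrupt whose function throws.
    static void destroyThread() {
        pending.add(() -> { throw new DestroyThrow(); });
        threadInterrupted = true;
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        Runnable body = () -> {
            try { // try/finally stands in for unwind-protect
                while (true) {
                    safePoint(); // busy code with emitted checks
                }
            } finally {
                log.add("cleanup form ran");
            }
        };
        Thread t = new Thread(wrap(body, log));
        t.start();
        destroyThread();
        join(t);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [cleanup form ran, thread exited via destroy tag]
    }
}
```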

Note: the ThreadDestroyed exception is never thrown now.

---
The above changes seem to work "ok" but haven't been extensively tested.
slime's control-c now often gets a fast response, and lparallel can kill
worker threads.

So the first question I have is: is there something I've missed, or does
it seem like the strategy above should work? Is anybody doing thread-heavy
work that this could be tested against?

Having done the above and verified that it passes the smoke test, I'd like
to enable Java's Thread.interrupt() so that interrupts can happen even when
JVM or foreign Java code is running. I've enabled it to see what would
happen, and things work ok in some cases, but there are gaps. For example,
in one case lparallel code complained about working with a lock.

So the second question is: what needs to be done to make
Thread.interrupt() reliable?

My current speculation is that the way to handle it is in code generation.
The code generation already uses exception catching for unwind-protect and
some other cases. One thought is, effectively, to automatically add more
cleanup code wherever the compiler generates a (JVM exception) catch.
However, the compiler is complicated and I don't yet understand enough of
how it works.

The differences, as they stand now, can be seen at
https://github.com/armedbear/abcl/compare/master...alanruttenberg:thread-interrupt

Comments, suggestions, help, would very much be welcomed.

Regards,
Alan Ruttenberg