[Ecls-list] Slightly disruptive change (in threads)

Fri Feb 17 07:53:09 UTC 2012

On Sun, 12 Feb 2012 23:32:32 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> I have finally convinced myself that there is no way to reuse the operating
> system mutexes from pthreads or Windows if we still want to have
> interruptible threads.
> 
> As a possible fix I have uploaded to git/CVS an implementation based on
> libatomics' CAS (compare-and-swap) combined with some simple-minded wait
> scheme (very similar to SBCL's). The code looks very simple. It consists of
> a function, get_lock_inner(), which is executed with disable threads,
> followed by some code that decides whether to wait and for how long.
> 
> The difference with respect to pthreads is that get_lock_inner() stores in
> the lock two values, the owner and the counter, which are enough to know
> whether a lock is owned or not. With that, WITH-LOCK becomes implementable
> with lisp functions and no special magic (See also below).
> 
> I would appreciate if you could test it and discuss here both the stability
> and the philosophy of the implementation.

I just got my broken hardware back and could finally try the new ECL.

I see that this is similar to an adaptive spinlock: in blocking mode it
spins, then sleeps if the lock cannot be obtained.  However, it
naturally cannot awaken immediately when the lock is freed; it has to
wait until the sleep operation returns, so it busy-sleeps.
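
A minimal sketch of the spin-then-sleep pattern described above, assuming
C11 atomics and POSIX nanosleep().  This is not ECL's actual code: the
names sketch_lock, sketch_try_lock, sketch_lock_acquire and sketch_unlock,
the spin count and the nap lengths are all hypothetical, and the interrupt
masking around the real get_lock_inner() is left out.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical lock layout: an owner id plus a recursion counter. */
typedef struct {
    atomic_uintptr_t owner;  /* 0 when the lock is free */
    unsigned counter;        /* recursion depth, touched only by the owner */
} sketch_lock;

/* One non-blocking attempt: CAS the owner field from 0 to `self`. */
static bool sketch_try_lock(sketch_lock *l, uintptr_t self)
{
    uintptr_t expected = 0;
    if (atomic_compare_exchange_strong(&l->owner, &expected, self)) {
        l->counter = 1;
        return true;
    }
    if (expected == self) {  /* already ours: recursive acquire */
        l->counter++;
        return true;
    }
    return false;            /* held by somebody else */
}

/* Blocking acquire: spin for a while, then nap with a growing delay.
 * A release that happens during a nap goes unnoticed until the nap
 * ends, which is the wake-up latency discussed above. */
static void sketch_lock_acquire(sketch_lock *l, uintptr_t self)
{
    enum { SPIN_TRIES = 100 };        /* assumed value */
    const long max_nap_ns = 1000000;  /* assumed cap: 1 ms */
    long nap_ns = 1000;               /* assumed first nap: 1 us */

    for (;;) {
        for (int i = 0; i < SPIN_TRIES; i++)
            if (sketch_try_lock(l, self))
                return;
        struct timespec nap = { 0, nap_ns };
        nanosleep(&nap, NULL);
        nap_ns *= 2;
        if (nap_ns > max_nap_ns)
            nap_ns = max_nap_ns;
    }
}

/* Release one level; returns false if `self` does not own the lock. */
static bool sketch_unlock(sketch_lock *l, uintptr_t self)
{
    if (atomic_load(&l->owner) != self)
        return false;
    if (--l->counter == 0)
        atomic_store(&l->owner, 0);
    return true;
}
```

With a scheme like this, a waiter that has just started a nap stays asleep
for the whole nap even if the lock is released a microsecond later, whereas
an OS mutex would normally be woken by the release itself.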

I noticed a performance degradation (perhaps half as many requests per
second served), possibly due to the delay needed to come back from
sleep, but I'm not sure.

Unfortunately my testing was quite cursory, as I could no longer build
SLIME under the new ECL; it errors out about the lack of condition
variables.  It's too late for me to try to fix this now, so I will
report in more detail during the weekend if necessary.

As for stability testing, I could stress test the HTTPd using various
ab runs, but still experienced some instability when doing runs of 5000+
connections at 16+ concurrency (ab -c16 -n5000).  A stack overflow
error was eventually produced at the REPL and I could not recover
gracefully without restarting ECL.  The previous runs using -n1000 were
successful, but I haven't run more than a dozen so far.

Thanks,
-- 
Matt