[Ecls-list] Latest changes

Matthew Mondor mm_lists at pulsar-zone.net
Sun Mar 18 22:19:04 UTC 2012


On Sun, 18 Mar 2012 17:33:04 +0100
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> I have uploaded a set of changes that covers various things, including some
> recent bug reports and the implementation of locks and condition variables.
> I will focus on the locks here.
> 
> The code has been tested using Matthew Mondor's web server (
> http://article.gmane.org/gmane.lisp.ecl.general/8965) with a varying number
> of connections and threads. I have witnessed some glitches, but only when
> recompiling and using a new executable. Will continue testing, though.
> 
> The current philosophy is as follows:
> * The lock is a word-sized field in a structure, atomically updated by the
> locking thread.
> * The no-wait locking amounts to trying to set the field atomically
> (using libatomics), returning NIL if the field is already set.
> * The wait mechanism consists of three phases:
> ++ try to acquire the lock once
> ++ enter ecl_wait_on(), which will
> +++ first perform a brief spin lock
> +++ then enter a loop with increasing wait times
> * The unlocking routine is now designed to wake at least one thread. This
> is done by issuing an interrupt that disrupts the waiting loop in
> ecl_wait_on().
> 
> As discussed earlier, the advantage of this user space implementation is
> that we have full control of the status of the lock even when interrupts
> occur.
> 
> Time-wise, it seems to be slower than the pthreads implementation, reaching
> 400 connections / second or 2.5 ms / connection in Matthew's tests, vs. 1.5
> ms / connection for pthreads (ECL 12.1.1). On the other hand, I have not
> seen any segfaults, and the difference still puzzles me: using a profiler
> I do not see any extra time being spent in the waiting loop (it is 0.4% of
> the total time).

Thanks again for working on this.

When I did the initial benchmarking, the problem seemed to be that a
thread waiting for a lock wouldn't wake up as quickly as before once
the lock became available.  The wakeup mechanism probably helps, but
it's plausible that threads waiting on a lock still don't wake up as
fast as with the pthreads implementation, which would likely explain
the difference.
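
For reference, my reading of the quoted design, reduced to a standalone
sketch (the names, the backoff constants and the use of usleep() are
made up for illustration; the real code wakes waiters with an interrupt
rather than relying on the sleep expiring):

  #include <atomic_ops.h>
  #include <unistd.h>

  /* Lock field; initialize with AO_TS_INITIALIZER (clear). */
  typedef struct {
      AO_TS_t owned;
  } sketch_lock;

  /* No-wait acquire: one atomic test-and-set, fails if already owned. */
  static int
  sketch_trylock(sketch_lock *l)
  {
      return AO_test_and_set_acquire(&l->owned) == AO_TS_CLEAR;
  }

  /* Waiting acquire: brief spin, then sleep with increasing intervals. */
  static void
  sketch_acquire(sketch_lock *l)
  {
      useconds_t delay = 100;          /* initial wait in microseconds */
      int i;

      for (i = 0; i < 1000; i++)       /* spin phase */
          if (sketch_trylock(l))
              return;
      for (;;) {                       /* backoff phase */
          if (sketch_trylock(l))
              return;
          usleep(delay);
          if (delay < 10000)           /* cap the wait at 10 ms */
              delay *= 2;
      }
  }

  static void
  sketch_release(sketch_lock *l)
  {
      AO_CLEAR(&l->owned);     /* the real unlock also wakes a waiter */
  }

In such a scheme the worst case is a waiter that has just gone back to
sleep when the lock is released, which is why the interrupt-based
wakeup matters; a pthread mutex unlock instead wakes a blocked waiter
directly in the kernel.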

I'll try to test and stress-test the new implementation soon.

I also recently learned that the boehm-gc library does not determine
the stack position and length properly on NetBSD.  This is suspected to
cause issues with Mono (which I don't use, but there is a related
NetBSD PR filed, lib/46147).  It is possible that this is affecting
stability as well, and it should be fixed to use pthread_attr_get_np()
and pthread_attr_getstack(), as some other ports already do.  I started
looking at the latest alpha with the intention of modifying it, but I'm
not done yet; the system-specific boehm-gc code is very messy and I
haven't had much time to put into it.
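
For the record, the combination needed is roughly the following (a
standalone sketch, not the actual boehm-gc change, and the function
name is mine):

  #include <pthread.h>
  #include <stdio.h>

  /*
   * Print the calling thread's stack region on NetBSD.
   * pthread_attr_get_np() fills an initialized attribute object with
   * the thread's actual attributes, and pthread_attr_getstack() then
   * returns the lowest stack address and the stack size; on ports
   * where the stack grows downward the base is addr + size.
   */
  static void
  print_stack_bounds(void)
  {
      pthread_attr_t attr;
      void *addr;
      size_t size;

      pthread_attr_init(&attr);
      pthread_attr_get_np(pthread_self(), &attr);
      pthread_attr_getstack(&attr, &addr, &size);
      pthread_attr_destroy(&attr);

      printf("stack: %p .. %p (%zu bytes)\n",
          addr, (char *)addr + size, size);
  }

  int
  main(void)
  {
      print_stack_bounds();
      return 0;
  }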

BTW, the lock most threads are waiting on is the accept-lock, which
serializes access to the accept(2) syscall.  This lock is only
necessary on some operating systems.  On NetBSD and Linux it is not
necessary to serialize accept(2) calls, yet the last time I removed the
lock (it used to be an option), some deadlock of sorts would happen at
startup, on both Linux and NetBSD.  If the new locking implementation
is more stable, it's likely that this lock can be avoided (the threads
will sleep in accept(2) instead).  This could improve performance if
the new locking code's wakeup latency is high.
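
To make this concrete, the pattern amounts to something like the
following (hypothetical names, not the server's actual code):

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <pthread.h>

  static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

  /*
   * Accept one connection, optionally serializing the accept(2) call.
   * On NetBSD and Linux accept(2) can be called concurrently, so
   * serialize can be 0 and idle threads simply sleep in the kernel
   * until a connection arrives.
   */
  static int
  accept_one(int listen_fd, int serialize)
  {
      int fd;

      if (serialize)
          pthread_mutex_lock(&accept_lock);
      fd = accept(listen_fd, NULL, NULL);
      if (serialize)
          pthread_mutex_unlock(&accept_lock);
      return fd;
  }
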
-- 
Matt



