[Ecls-list] Latest changes
Daniel Herring
dherring at tentpost.com
Tue Mar 27 12:22:06 UTC 2012
On Sun, 18 Mar 2012, Juan Jose Garcia-Ripoll wrote:
> The current philosophy is as follows:
> * The lock is a word-sized field in a structure, atomically updated by the locking thread.
> * The no-wait locking amounts to trying to reset the field atomically (using libatomics), and returning NIL if the field is already set.
> * The wait mechanism consists of three phases
> ++ try to acquire the lock once
> ++ enter ecl_wait_on() which will
> +++ first perform a brief spin lock
> +++ then enter a loop with increasing wait times
> * The unlocking routine is now designed to wake at least one thread. This is done by issuing an interrupt that will disrupt the waiting loop in ecl_wait_on().
>
> As discussed earlier, the advantage of this user space implementation is that we have full control of the status of the lock even when interrupts occur.
... and later in the thread you mention reimplementing this with a FIFO
queue and adding logic for threads to place themselves on the end of the
queue.
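For reference, here is roughly how I'd sketch that try / spin / backoff
pattern in portable C. The names and constants below are mine, not ECL's,
and it assumes GCC's __atomic builtins plus nanosleep():

#include <stdbool.h>
#include <time.h>

static int lock_word = 0;   /* hypothetical lock word: 0 = free, 1 = held */

/* no-wait acquire: one atomic attempt, report failure instead of blocking */
static bool try_acquire(void)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&lock_word, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* waiting acquire: brief spin, then sleep with increasing wait times */
static void acquire(void)
{
    for (int i = 0; i < 100; i++)         /* brief spin phase */
        if (try_acquire())
            return;
    struct timespec ts = { 0, 1000 };     /* start at 1 microsecond */
    while (!try_acquire()) {
        nanosleep(&ts, NULL);             /* interruptible sleep */
        if (ts.tv_nsec < 1000000)
            ts.tv_nsec *= 2;              /* back off, up to roughly 1 ms */
    }
}

static void release(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
    /* ECL additionally interrupts a waiter so it rechecks right away;
       with plain sleeps a waiter only notices on its next timeout */
}

The exponential backoff keeps the polling cost bounded, at the price of
wakeup latency once the sleeps get long.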
I'm no expert, but your approach above matches the pattern I use for
portable code. On linux, I've started using the futex API directly.
The futex philosophy is roughly (rough sketch after this list)
- use a word-sized field with atomic operations
- when contention is detected, invoke the kernel with the expected value
  and what to do if the kernel still sees that value:
  - block this thread
  - wake all waiting threads
  - wake some waiting threads
  - etc.
- the kernel has a fancy algorithm to dynamically manage waiting lists
  in a known location based on the lock word address
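A bare-bones sketch of that protocol, essentially the classic three-state
lock from Ulrich Drepper's "Futexes Are Tricky" paper (Linux-only, with
error handling omitted and my own names):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>

/* 0 = unlocked, 1 = locked with no waiters, 2 = locked and maybe contended */
static uint32_t futex_word = 0;

static long sys_futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void futex_lock(void)
{
    uint32_t c = 0;
    /* fast path: 0 -> 1 without entering the kernel */
    if (__atomic_compare_exchange_n(&futex_word, &c, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    /* slow path: mark the word contended and ask the kernel to block us;
       FUTEX_WAIT only sleeps if the word still equals 2 when it checks */
    if (c != 2)
        c = __atomic_exchange_n(&futex_word, 2, __ATOMIC_ACQUIRE);
    while (c != 0) {
        sys_futex(&futex_word, FUTEX_WAIT, 2);
        c = __atomic_exchange_n(&futex_word, 2, __ATOMIC_ACQUIRE);
    }
}

static void futex_unlock(void)
{
    /* 1 means nobody waited; 2 means someone may be asleep in the kernel */
    if (__atomic_exchange_n(&futex_word, 0, __ATOMIC_RELEASE) == 2)
        sys_futex(&futex_word, FUTEX_WAKE, 1);  /* wake exactly one waiter */
}

The uncontended paths never enter the kernel at all, which is where the
speed comes from.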
One phrase they use is the "thundering herd". If a large number of
threads are waiting on a single lock, and you wake all the threads, then
they will all compete for the lock, and all but one will be blocked...
The futex API has a strategy or two to address this, but it is unclear to
me how well they work in practice.
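As far as I can tell the main tools are waking a bounded number of
threads (FUTEX_WAKE with a count of 1) and FUTEX_CMP_REQUEUE, which wakes
a few threads and silently moves the rest onto a different futex, e.g.
the mutex the woken thread will grab anyway; I believe that is how glibc
avoids the herd in pthread_cond_broadcast. Very roughly (names are mine):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <limits.h>

/* Wake one waiter and requeue the rest onto the mutex word, so the others
   wake up one at a time as the mutex is handed over instead of all
   stampeding for it at once. */
static void broadcast(uint32_t *cond_word, uint32_t *mutex_word)
{
    uint32_t seen = __atomic_load_n(cond_word, __ATOMIC_RELAXED);
    /* args: wake at most 1, requeue up to INT_MAX (smuggled through the
       timeout slot), the target futex, and the value *cond_word must
       still hold -- otherwise the kernel returns EAGAIN and the caller
       should retry */
    syscall(SYS_futex, cond_word, FUTEX_CMP_REQUEUE, 1,
            (void *)(uintptr_t)INT_MAX, mutex_word, seen);
}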
> Time-wise, it seems to be slower than the pthreads implementation, reaching 400 connections / second or 2.5 ms / connection in Matthew's tests, vs. 1.5 ms / connection for pthreads (ECL 12.1.1). On the other hand, I have not seen
> any segfaults, and the difference still puzzles me: using a profiler I do not see any more time being spent in the waiting loop (it is 0.4% of the total time).
Kernel-based approaches have the benefit of tighter integration with the
scheduler. To me, the futex API looks about right, and you are emulating
the kernel-side of a futex by polling and sending signals in userspace.
The NPTL now uses futexes on linux, so it's actually getting hard to
outperform pthreads on that platform. Occasionally a polling loop still
beats other kernel calls, but I think that is mostly in cases where
contention is very short and you don't mind pegging the CPU while waiting
(or in cases where polling avoids combining heavy constructs like a mutex
and a condition variable).
I don't know of any portable futex emulation. SBCL had their lutexes for
a while, but I don't think they were fully debugged. I believe these are
now being replaced by safe points. The BSDs have linux emulation; you
might find some good ideas in their documentation.
e.g.
http://www.freebsd.org/doc/en/articles/linux-emulation/mi.html#FUTEXES
Later,
Daniel
P.S. Just found this. Might be useful. Hopefully faster than strace and
friends.
http://nptltracetool.sourceforge.net/