[Ecls-list] Latest changes
Daniel Herring
dherring at tentpost.com
Tue Mar 27 12:22:06 UTC 2012
On Sun, 18 Mar 2012, Juan Jose Garcia-Ripoll wrote:
> The current philosophy is as follows:
> * The lock is a word-sized field in a structure, atomically updated by the locking thread.
> * The no-wait locking amounts to trying to reset the field atomically (using libatomics), and returning NIL if the field is already set.
> * The wait mechanism consists of three phases
> ++ try to acquire the lock once
> ++ enter ecl_wait_on() which will
> +++ first perform a brief spin lock
> +++ then enter a loop with increasing wait times
> * The unlocking routine is now designed to wake at least one thread. This is done by issuing an interrupt that will disrupt the waiting loop in ecl_wait_on().
>
> As discussed earlier, the advantage of this user space implementation is that we have full control of the status of the lock even when interrupts occur.
... and later in the thread you mention reimplementing this with a FIFO
queue and adding logic for threads to place themselves on the end of the
queue.
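For reference, here is roughly how I'd sketch that try / spin / backoff
pattern in portable C. The names and constants below are mine, not ECL's,
and it assumes GCC's __atomic builtins plus nanosleep():

#include <stdbool.h>
#include <time.h>

static int lock_word = 0;   /* hypothetical lock word: 0 = free, 1 = held */

/* no-wait acquire: one atomic attempt, report failure instead of blocking */
static bool try_acquire(void)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&lock_word, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* waiting acquire: brief spin, then sleep with increasing wait times */
static void acquire(void)
{
    for (int i = 0; i < 100; i++)         /* brief spin phase */
        if (try_acquire())
            return;
    struct timespec ts = { 0, 1000 };     /* start at 1 microsecond */
    while (!try_acquire()) {
        nanosleep(&ts, NULL);             /* interruptible sleep */
        if (ts.tv_nsec < 1000000)
            ts.tv_nsec *= 2;              /* back off, up to roughly 1 ms */
    }
}

static void release(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
    /* ECL additionally interrupts a waiter so it rechecks right away;
       with plain sleeps a waiter only notices on its next timeout */
}

The exponential backoff keeps the polling cost bounded, at the price of
wakeup latency once the sleeps get long.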
I'm no expert, but your approach above matches the pattern I use for
portable code. On linux, I've started using the futex API directly.
The futex philosophy is roughly (rough sketch after this list)
- use a word-sized field with atomic operations
- when contention is detected, invoke the kernel with the expected value
  and what to do if the kernel still sees that value:
  - block this thread
  - wake all waiting threads
  - wake some waiting threads
  - etc.
- the kernel has a fancy algorithm to dynamically manage waiting lists
  in a known location based on the lock word address
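A bare-bones sketch of that protocol, essentially the classic three-state
lock from Ulrich Drepper's "Futexes Are Tricky" paper (Linux-only, with
error handling omitted and my own names):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>

/* 0 = unlocked, 1 = locked with no waiters, 2 = locked and maybe contended */
static uint32_t futex_word = 0;

static long sys_futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void futex_lock(void)
{
    uint32_t c = 0;
    /* fast path: 0 -> 1 without entering the kernel */
    if (__atomic_compare_exchange_n(&futex_word, &c, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    /* slow path: mark the word contended and ask the kernel to block us;
       FUTEX_WAIT only sleeps if the word still equals 2 when it checks */
    if (c != 2)
        c = __atomic_exchange_n(&futex_word, 2, __ATOMIC_ACQUIRE);
    while (c != 0) {
        sys_futex(&futex_word, FUTEX_WAIT, 2);
        c = __atomic_exchange_n(&futex_word, 2, __ATOMIC_ACQUIRE);
    }
}

static void futex_unlock(void)
{
    /* 1 means nobody waited; 2 means someone may be asleep in the kernel */
    if (__atomic_exchange_n(&futex_word, 0, __ATOMIC_RELEASE) == 2)
        sys_futex(&futex_word, FUTEX_WAKE, 1);  /* wake exactly one waiter */
}

The uncontended paths never enter the kernel at all, which is where the
speed comes from.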
One phrase they use is the "thundering herd". If a large number of
threads are waiting on a single lock, and you wake all the threads, then
they will all compete for the lock, and all but one will be blocked...
The futex API has a strategy or two to address this, but it is unclear to
me how well they work in practice.
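As far as I can tell the main tools are waking a bounded number of
threads (FUTEX_WAKE with a count of 1) and FUTEX_CMP_REQUEUE, which wakes
a few threads and silently moves the rest onto a different futex, e.g.
the mutex the woken thread will grab anyway; I believe that is how glibc
avoids the herd in pthread_cond_broadcast. Very roughly (names are mine):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <limits.h>

/* Wake one waiter and requeue the rest onto the mutex word, so the others
   wake up one at a time as the mutex is handed over instead of all
   stampeding for it at once. */
static void broadcast(uint32_t *cond_word, uint32_t *mutex_word)
{
    uint32_t seen = __atomic_load_n(cond_word, __ATOMIC_RELAXED);
    /* args: wake at most 1, requeue up to INT_MAX (smuggled through the
       timeout slot), the target futex, and the value *cond_word must
       still hold -- otherwise the kernel returns EAGAIN and the caller
       should retry */
    syscall(SYS_futex, cond_word, FUTEX_CMP_REQUEUE, 1,
            (void *)(uintptr_t)INT_MAX, mutex_word, seen);
}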
> Time-wise, it seems to be slower than the pthreads implementation, reaching 400 connections / second or 2.5 ms / connection in Matthew's tests, vs. 1.5 ms / connection for pthreads (ECL 12.1.1). On the other hand, I have not seen
> any segfaults, and the difference still puzzles me: using a profiler I do not see any more time being spent in the waiting loop (it is 0.4% of the total time).
Kernel-based approaches have the benefit of tighter integration with the
scheduler. To me, the futex API looks about right, and you are emulating
the kernel-side of a futex by polling and sending signals in userspace.
The NPTL now uses futexes on linux, so it's actually getting hard to
outperform pthreads on that platform. Occasionally a polling loop still
beats other kernel calls, but I think that is mostly in cases where
contention is very short and you don't mind pegging the CPU while waiting
(or in cases where polling avoids combining heavy constructs like a mutex
and a condition variable).
I don't know of any portable futex emulation. SBCL had their lutexes for
a while, but I don't think they were fully debugged. I believe these are
now being replaced by safe points. The BSDs have linux emulation; you
might find some good ideas in their documentation.
e.g.
http://www.freebsd.org/doc/en/articles/linux-emulation/mi.html#FUTEXES
Later,
Daniel
P.S. Just found this. Might be useful. Hopefully faster than strace and
friends.
http://nptltracetool.sourceforge.net/