[Ecls-list] threading failures

Daniel Kochmański daniel at turtleware.eu
Wed Sep 2 10:09:04 UTC 2015


Hm, then I can't reproduce either of them. Spawning too many threads
blows the heap, but that's understandable. It might be because I have
x86_64 and a newer kernel (maybe it happens only on 32-bit x86, or Linux 3.2
had some bug?).
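
To compare setups, a minimal sketch like the one below collects the
details that seem relevant here (implementation version, machine type,
OS version, and whether :THREADS is on *FEATURES*). It uses only standard
Common Lisp functions; the name ENVIRONMENT-SUMMARY is made up for
illustration and is not part of ECL.

```lisp
;; Hypothetical helper (the name is an assumption, not an ECL function):
;; gather the environment details that may differ between two machines.
(defun environment-summary ()
  (list :implementation (lisp-implementation-type)
        :version (lisp-implementation-version)
        :machine (machine-type)          ; architecture string, implementation-dependent
        :os (software-type)
        :os-version (software-version)   ; may be NIL if the implementation cannot tell
        :threads (and (member :threads *features*) t)))

;; Print one "KEY value" pair per line.
(format t "~{~s ~a~%~}" (environment-summary))
```

Running this on both machines and diffing the output would show whether
the architecture or the kernel version is the only difference.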

You can track this issue here:
https://gitlab.com/embeddable-common-lisp/ecl/issues/150

Thanks,
Daniel

James M. Lawrence writes:

> The version with a homemade semaphore is
>
> (defstruct sema
>   (count 0)
>   (lock (mp:make-lock :recursive nil))
>   (cvar (mp:make-condition-variable)))
>
> (defun inc-sema (sema)
>   (mp:with-lock ((sema-lock sema))
>     (incf (sema-count sema))
>     (mp:condition-variable-signal (sema-cvar sema))))
>
> (defun dec-sema (sema)
>   (mp:with-lock ((sema-lock sema))
>     (loop (cond ((plusp (sema-count sema))
>                  (decf (sema-count sema))
>                  (return))
>                 (t
>                  (mp:condition-variable-wait
>                   (sema-cvar sema) (sema-lock sema)))))))
>
> (defun test (message-count worker-count)
>   (let ((to-workers (make-sema))
>         (from-workers (make-sema)))
>     (loop repeat worker-count
>           do (mp:process-run-function
>               "test"
>               (lambda ()
>                 (loop
>                    (dec-sema to-workers)
>                    (inc-sema from-workers)))))
>     (loop
>        (loop repeat message-count
>              do (inc-sema to-workers))
>        (loop repeat message-count
>              do (dec-sema from-workers))
>        (assert (zerop (sema-count to-workers)))
>        (assert (zerop (sema-count from-workers)))
>        (format t ".")
>        (finish-output))))
>
> (defun run ()
>   (test 10000 64))
>
> RUN fails with:
>
> Condition of type: SIMPLE-ERROR
> Attempted to recursively lock #<lock (nonrecursive) 0a4597f8> which is
> already owned by #<process "test">
>
> In the previous test case, by "hang" I meant that it hangs
> indefinitely, as opposed to printing dots in spurts. Both these cases
> fail within seconds for me, sometimes immediately. They should be
> compiled. Increasing the number of threads (second argument to TEST)
> will typically cause a quicker failure in these kinds of stress tests.
> 4-core machine:
>
> Linux xi 3.2.0-24-generic-pae #39-Ubuntu SMP Mon May 21 18:54:21 UTC
> 2012 i686 i686 i386 GNU/Linux
>
> (:NEW :LINUX :FORMATTER :ECL-WEAK-HASH :LITTLE-ENDIAN :ECL-READ-WRITE-LOCK
>  :LONG-LONG :UINT64-T :UINT32-T :UINT16-T :RELATIVE-PACKAGE-NAMES :LONG-FLOAT
>  :UNICODE :DFFI :CLOS-STREAMS :CMU-FORMAT :UNIX :ECL-PDE :DLOPEN :CLOS :THREADS
>  :BOEHM-GC :ANSI-CL :COMMON-LISP :IEEE-FLOATING-POINT :PREFIXED-API :FFI
>  :PENTIUM3 :COMMON :ECL)
>
>
> On Tue, Sep 1, 2015 at 1:04 AM, Daniel Kochmański <daniel at turtleware.eu> wrote:
>> Hello,
>>
>> that's probably my fault, sorry. I migrated the bugs manually and
>> probably missed this one (I remember this bug, but can't find it anywhere).
>>
>> I'm adding it to the regression tests in the repository, thanks! Yes, the
>> old reports are unfortunately lost.
>>
>> As a side note, please use the ecl-devel at common-lisp.net mailing list;
>> I'm closing the old one today. You can subscribe here:
>> https://mailman.common-lisp.net/listinfo/ecl-devel . All archives before
>> 2015-08-10 have been imported to the new one, and the gmane stream is
>> redirected (if you use it).
>>
>> Regards,
>> Daniel
>>
>> James M. Lawrence writes:
>>
>>> Hello, the threading bugs I reported a while ago appear to have not
>>> survived the migration from sourceforge, and the old pages are now
>>> 404'd. There were a number of test cases, including
>>>
>>> (defun test (message-count worker-count)
>>>   (let ((to-workers (mp:make-semaphore))
>>>         (from-workers (mp:make-semaphore)))
>>>     (loop repeat worker-count
>>>           do (mp:process-run-function
>>>               "test"
>>>               (lambda ()
>>>                 (loop
>>>                    (mp:wait-on-semaphore to-workers)
>>>                    (mp:signal-semaphore from-workers)))))
>>>     (loop
>>>        (loop repeat message-count
>>>              do (mp:signal-semaphore to-workers))
>>>        (loop repeat message-count
>>>              do (mp:wait-on-semaphore from-workers))
>>>        (assert (zerop (mp:semaphore-count to-workers)))
>>>        (assert (zerop (mp:semaphore-count from-workers)))
>>>        (format t ".")
>>>        (finish-output))))
>>>
>>> (defun run ()
>>>   (test 10000 64))
>>>
>>> RUN will eventually hang on all versions of ECL I've tried, including
>>> the latest. Another test case was a variant of the above using a
>>> homemade semaphore. I can rewrite that and other test cases, but
>>> before doing so I'd like to know whether the old reports are really
>>> lost or have survived in some form.
>>>
>>> Best,
>>> lmj
>>>
>>
>> --
>> Daniel Kochmański | Poznań, Poland
>> ;; aka jackdaniel
>>
>> "Be the change that you wish to see in the world." - Mahatma Gandhi

-- 
Daniel Kochmański | Poznań, Poland
;; aka jackdaniel

"Be the change that you wish to see in the world." - Mahatma Gandhi