[Ecls-list] Bug report (?) and two questions

Thu Nov 29 07:50:01 UTC 2001

On Wed, Nov 28, 2001 at 04:18:08PM +0100, Dr. Edmund Weitz wrote:
> worm <worm at arrakis.es> writes:
> 
> > > If I define the factorial function in the usual, recursive way,[1]
> > > and call it (without compiling it) with an argument of 10000, this
> > > will result in a reproducible complete crash of my system. After
> > > about two or three seconds, the machine is completely unresponsive
> > > to the mouse or the keyboard and the only 'fix' is to power it
> > > down and hose the file system. This is Linux 2.4.10 (SuSE 7.3) on
> > > an IBM Thinkpad with 256 MB RAM and a 850 MHz P III.
> > 
> > This is due to the fact that the interpreted implementation is not
> > tail recursive. If you compile this code, the recursive call will be
> > converted to a C goto. If you simply interpret it, the stack will be
> > filled 10000 times and you will produce over 10000 bignums, which
> > can be rather memory intensive in case of a factorial. I will look
> > into this, though, to make sure that no other bug is popping out.
> 
> Thanks for your reply. I've investigated this a little further, and
> here's what I got.
> 
> 1. I compiled the function. If I do this, (FAC 10000) works fine - but
>    only once: If I submit something like (DOTIMES (N 10) (FAC 10000)),
>    I get lots of messages like
> 
>         Needed to allocate blacklisted block at 0x99a9000
> 
>    followed by
> 
>         Out of Memory!  Returning NIL!
> 
>    The system doesn't get as unresponsive as in the original
>    (interpreted) case, though. I can still switch to a console and
>    kill the process.
> 
> 2. Noticing that my original definition wasn't properly
>    tail-recursive, I changed it to 
> 
>      (defun fac (n &key (result 1))
>        (if (= n 0)
>            result
>            (fac (1- n) :result (* n result))))
> 
>    I compiled it and tried (FAC 10000) again. This time I got the
>    correct result but a lot of "Needed to allocate..." messages before
>    the result was printed. Then I tried (DOTIMES (N 3) (FAC 10000))
>    and got
>      
>         Out of Memory!  Returning NIL!
>         Segmentation violation.
> 
> 3. I tried to intermingle the calls to FAC with (GC T), but that
>    didn't help.
> 
> I think this must be some problem with garbage collection. CLISP and
> CMUCL can execute this code interpreted as well as compiled without
> failing. BTW, my ECLS is built with the default configuration,
> i.e. Boehm-Weiser-GC. Let me know if you want me to run the test cases
> with the original, conservative GC.
> 

Could it be that the storage allocated for bignums isn't marked
pointer-free for the garbage collector?  AFAIK, if a memory area is
allocated using malloc(), the garbage collector will assume that it
might contain pointers.  When such a memory block is deallocated, the
GC will scan the memory area looking for bitpatterns that points into
address areas of the heap that are not yet allocated, but which the GC
might want to allocate from in the future.  These bit-patterns are
then put on a blacklist so that the GC won't allocate from those areas
of the heap in the future.  If you have a large area of essentially
random data, the GC will find random pointers into the heap, and the
blacklist will be congested.  I think random data in a memory block
which is allowed to contain pointers is the allocator's worst
nightmare.

I would think that during computation of a huge number like 10000!,
you'll see memory being allocated, filled with essentially random
data, deallocated, allocated, filled.. etc.

astor

-- 
Alexander Kjeldaas                Mail:  astor at fast.no