[Ecls-list] computing results into preallocated bignums

Tue Sep 8 23:27:37 UTC 2009

When the recent changes to ECL's use of GMP were made, one thing came up:

If we could preallocate limb space for bignums in such a way that GMP 
doesn't need to resize it, one could avoid the use of registers 
altogether. Jason Moxham responded to the suggestion by making available 
macros that would guarantee space needs (it's pretty clear how much space 
an answer is going to need, but MPIR development wants to reserve the 
right to use a little more, should that ever be beneficial for speed)

I decided to run a little experiment with addition to see if avoiding 
extra memcpy's is going to make a difference:

In the file "num_arith.d" I tried out the following:

OLD CODE in ecl_plus:
------------------------------------------------------------------------------
 		case t_bignum:
                         z = _ecl_big_register0();
                         _ecl_big_add(z,x,y);
 			return _ecl_big_register_normalize(z);
------------------------------------------------------------------------------
NEW CODE in ecl_plus:
------------------------------------------------------------------------------
 		case t_bignum:
                         i = x->big.big_size;
                         if (i < 0) i = -i;
                         j = y->big.big_size;
                         if (j < 0) j = -j;
                         if (i > j ) dim = (i+1); else dim = (j+1);
                         z = ecl_alloc_compact_object(t_bignum, (cl_index) (dim*sizeof(mp_limb_t)));
                         z->big.big_limbs = ECL_COMPACT_OBJECT_EXTRA(z);
                         z->big.big_size = 0;
                         z->big.big_dim = dim;
                         _ecl_big_add(z, x, y);
------------------------------------------------------------------------------
(so this only affects bignum+bignum). I tried out the following program:

-------------------------------------------------------------------------
(defun fibo (n)
    (let ((a 1)(b 1)(c 0))
    (loop for i from 1 to n do
      (setq c a)
      (setq a (+ a b))
      (setq b c))
    b))

(defun test (n m)
     (let ((t0 (get-internal-run-time)))
     (loop for i from 1 to m do
       (fibo n))
     (- (get-internal-run-time) t0)))

-------------------------------------------------------

On a 2391.454 MHz AMD Opteron(tm) Processor 250 I tried getting some 
timings from

(test 4000 4000)

It was hard to get stable results, so the following figures are indicative 
only:

OLD: 15665, 16050, 16032
NEW: 15217, 15233, 15210

after

(compile 'fibo)
(compile 'test)

OLD: 9182, 9189, 9181 [but also some much lower values at the start]
NEW: 8156, 8144, 8144

so, I think the conclusion is: Yes, in some situations, the extra 
allocations/deallocations and memcpy involved in bignum arithmetic do have 
an influence on computation times. With macro's becoming available to 
guarantee allocation sizes, it may be worthwhile to rewrite bignum 
arithmetic to avoid extra register use as much as possible.

(incidentally, the "ECL" executable is still dynamically linked to libecl, 
so renaming "ecl" to "ecl_old" and then recompiling ecl does NOT lead to a 
situation where one can compare two versions of ecl!)

Kind regards,

Nils