[Gsll-devel] Fwd: Efficient access to externally generated double-float arrays?

Liam Healy lhealy at common-lisp.net
Fri Oct 22 19:33:16 UTC 2010


---------- Forwarded message ----------
From: Sebastian Sturm <Sebastian.Sturm at itp.uni-leipzig.de>
Date: Fri, Oct 22, 2010 at 2:13 PM
Subject: Re: [Gsll-devel] Efficient access to externally generated
double-float arrays?
To: Liam Healy <lhealy at common-lisp.net>, gsll-devel at common-lisp.net


> I'm surprised about the grid:gref result.   Please run a profiler and
> confirm that the time consumed is within grid:gref and its callees,

I tried using sb-sprof, but as a complete CL newbie I had a hard time
making sense of its output and thus ended up ignoring the profiler and
using trial-and-error again. It seems that a fair amount of consing is
due to the index linearization. Simply replacing grid:gref by
grid:gref* for one-dimensional arrays reduces consing by a factor of 3
in my experiments. It seems that the rest can be eliminated by
specifying the data type (here it's :double) at compile-time;
incorporating this as an option to grid:gref via some macro-trickery
would probably uglify the GSLL framework somewhat (I guess?), but I'd
be willing to pay that price in return for an order-of-magnitude
speedup. Also, I haven't tried if similar problems arise (and if
similar workarounds are available) for the case of complex-valued
arrays.
That being said, I'm currently using the following force-function
(where caching the foreign-ptrs is irrelevant with respect to consing,
but reduces the amount of cycles needed to access array elements by a
factor of ~ 5):

(defun better-force-function (dim)
 "Given an integer dim, this constructs a function that, when supplied with a
 N-dimensional vector Z and some output vector (-> pointer?), yields the
 corresponding forces"
 (declare (fixnum dim))
 (let ((temp-values (make-array 2 :element-type 'double-float
:initial-element 0.0d0)))
  (lambda (zvector output)
    (let ((zvector-fptr (grid::foreign-pointer zvector))
           (output-fptr (grid::foreign-pointer output)))
       (macrolet ((quick-ref (the-vector n)
                    `(cffi:mem-aref
                      ,(case the-vector
                         (zvector 'zvector-fptr)
                         (output 'output-fptr))
                      :double
                      ,n)))
         (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i))
           (setf (aref temp-values 0) 0.0d0)
           (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m))
             (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n))
               (setf (aref temp-values 1) 0.0d0)
               (do ((k m (1+ k))) ((> k n)) (declare (fixnum k))
                 (incf (aref temp-values 1) (quick-ref zvector k)))
               (incf (aref temp-values 0) (expt (aref temp-values 1) -2))))
           (setf (quick-ref output i)
                 (- (quick-ref zvector i)
                    (aref temp-values 0)))))))))

Using the timing code from my first post and dim set to 50, I get the
following results:

- using plain-vanilla grid:gref:
Evaluation took:
 2.692 seconds of real time
 2.668223 seconds of total run time (2.624302 user, 0.043921 system)
 [ Run times consist of 0.082 seconds GC time, and 2.587 seconds non-GC time. ]
 99.11% CPU
 5,906,443,268 processor cycles
 189,391,360 bytes consed

- using the unpleasant cffi:mem-aref workaround:
Evaluation took:
 0.004 seconds of real time
 0.003107 seconds of total run time (0.003049 user, 0.000058 system)
 75.00% CPU
 6,997,397 processor cycles
 0 bytes consed

> As a side point, it seems like your use of an array is a case of
> premature optimization: you've concluded it will run faster by
> allocating an array of length two, without studying the results both
> ways.  I think there would be no measurable difference in speed;
> perhaps even faster with scalars.  And I disagree with you about
> elegance: scalars are much more understandable to me and therefore
> elegant than array cruft.  But it's your choice and a side point
> anyway.

Concerning the last point, I fully agree with you; probably my
previous post was somewhat unclear. Accessing certain unnamed elements
of some arbitrary temp-array is probably the most ugly solution one
could think of, but it seems to be a simple way to ensure low overhead
(I took the idea from a paper by Fateman). Since it doesn't really
relate to the problem at hand, I figured that I could just leave it at
that until the more pressing issue is resolved. Still, I appreciate
very much any advice on how to further improve speed and/or clarity.

Thanks alot for your help,
Sebastian




More information about the gsll-devel mailing list