[Gsll-devel] Introducing "Grid Structured Data"

Sun Feb 21 03:41:05 UTC 2010

I agree there are different classes of usage, and it's certainly
my hope that whatever we adopt will be usable and convenient for both
cases.  I'm not sure there will be a dramatic difference in
efficiency; I think this might be a case of premature optimization.

Anyway, I'd like to make a distinction between surface syntax and core
implementation.  I think any algorithm can be implemented with any
surface syntax that we want, and it seems that a lot of the syntax of
xarray and grid is similar, and where they're different sometimes
it's because I ran out of time and didn't carry up through the layers
some of what Tamas did in affi that I see in xarray.  Other times it's
just a missing feature, like reduction.  As far as core implementation
goes, it seems like there ought to be a choice between affi and
xarray, presuming there's a difference in efficiency or some other
useful quality.
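
As a rough illustration of that separation, here is a minimal Python
sketch (not grid, affi, or xarray code; every class and function name
below is hypothetical).  Two interchangeable core implementations expose
the same element accessor, and a reduction is written only against that
surface, so it does not care which core sits underneath.  The "affine"
core here simply assumes index = offset + i*stride0 + j*stride1.

```python
class AffineBackend:
    """Hypothetical core: flat storage addressed by an affine index map."""

    def __init__(self, data, shape, strides=None, offset=0):
        self.data = list(data)
        self.shape = shape
        # Default to row-major strides when none are given.
        self.strides = strides or (shape[1], 1)
        self.offset = offset

    def ref(self, i, j):
        s0, s1 = self.strides
        return self.data[self.offset + i * s0 + j * s1]


class NestedBackend:
    """Hypothetical core: plain nested lists, no index arithmetic."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.rows[0]))

    def ref(self, i, j):
        return self.rows[i][j]


def total(a):
    """Reduction written only against the shared surface (shape, ref)."""
    rows, cols = a.shape
    return sum(a.ref(i, j) for i in range(rows) for j in range(cols))


flat = AffineBackend([1, 2, 3, 4, 5, 6], (2, 3))
nested = NestedBackend([[1, 2, 3], [4, 5, 6]])
# A transposed view of the same flat data, obtained by swapping strides.
transposed = AffineBackend([1, 2, 3, 4, 5, 6], (3, 2), strides=(1, 3))
```

Whether one core is actually faster than the other is something this
sketch says nothing about; it only shows that the choice can be hidden
behind one notation.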

Liam

On Sun, Jan 24, 2010 at 7:24 PM, Mirko Vukovic <mirko.vukovic at gmail.com> wrote:
> Some thoughts on the two interfaces (grid, xarray) discussed here ...
>
> I am trying to figure out if we can classify different types of usage of
> vector and matrix data.  The classification below is very rough with much
> gray area in-between.
>
> At some basic level, collections of numbers are either
>
> 1. vectors and arrays to be processed by numerical algorithms
> 2. just collections of numbers that will be parsed or processed in some
>    semi-numerical algorithms
>
> Packages such as GSL and LAPACK will deal mostly with the first kind.
>
> For other uses, like when dealing with results from multiple experiments, we
> are using vectors and arrays as indexed storage with fast access, but there
> may not be anything `algebraic' (in the sense of linear algebra) to those
> collections.
>
> In this second case, we may choose to process all the numbers in the
> collection, or some random subset of them.  (In either case, vectorized
> processing of those collections may be desired - Tamas has published a
> package that does that).
>
> It seems to me that Tamas' (now abandoned) `affi' package, on top of which
> `grid' is built, is a natural fit for case 1 above, while xarray is a
> natural fit for case 2 above.
>
> In addition, someone noted that affi is probably faster than xarray (to be
> verified), which is of paramount importance for the number crunching
> libraries (We first use non-numeric tools at the top level when parsing the
> data, which then may pass the data to the number-crunchers in gsll, lla,
> where speed is important).
>
> In that case, the two packages may each have a valid role.  The optimal
> outcome would be a unified notation, in which the notation of grid would
> be a subset of that of xarray.
>
> Mirko
>