[Ecls-list] SSE Intrinsics in ECL

Fri Aug 20 15:00:03 UTC 2010

On Thu, Aug 19, 2010 at 2:53 PM, Juan Jose Garcia-Ripoll
<juanjose.garciaripoll at googlemail.com> wrote:
> I can wait until you fix the names. Or would you rather want me to include
> it, so that your contribution does not break while I work on the next
> release?

Actually, apart from the issue of merge conflicts, I'm also a bit
concerned about compiler compatibility since I only tested on one
setup: gcc + x86_64. Committing and running test builds would be a
good way to find out if there are any unexpected issues. Note that
since I don't understand how the autoconf stuff works, SSE is
automatically enabled in config.h by testing defines provided by the C
compiler (the condition should work correctly on gcc and msvc). On
32-bit x86 it is necessary to add -msse2 (for gcc) to CFLAGS.
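
As a rough illustration only, a compiler-define test of this kind could
look like the sketch below. The macro name HYPOTHETICAL_HAVE_SSE2 is made
up for this example and is not taken from ECL's config.h; the compiler
macros it tests (__SSE2__, _M_X64, _M_IX86_FP) are the ones gcc and msvc
document.

```c
/* HYPOTHETICAL_HAVE_SSE2 is an invented name for this sketch; the macro
   that ECL's config.h really defines may be different. */
#if defined(__SSE2__)
   /* gcc defines __SSE2__ when SSE2 code generation is enabled:
      by default on x86_64, and with -msse2 on 32-bit x86. */
#  define HYPOTHETICAL_HAVE_SSE2 1
#elif defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
   /* msvc: x64 targets always have SSE2; on 32-bit x86, /arch:SSE2
      sets _M_IX86_FP to 2. */
#  define HYPOTHETICAL_HAVE_SSE2 1
#else
#  define HYPOTHETICAL_HAVE_SSE2 0
#endif

/* Returns 1 if this file was compiled with SSE2 enabled, 0 otherwise. */
int sse2_enabled_at_compile_time(void)
{
    return HYPOTHETICAL_HAVE_SSE2;
}
```

With gcc on 32-bit x86 this function would return 0 unless -msse2 is in
CFLAGS, which matches the note above.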

Incidentally, maybe you could quickly look through the files and give
your impression of my naming choices on general grounds?

http://github.com/angavrilov/ecl-sse/blob/master/contrib/sse/sse.lsp
http://github.com/angavrilov/ecl-sse/blob/master/contrib/sse/sse-core.lsp
http://github.com/angavrilov/ecl-sse/blob/master/contrib/sse/sse-utils.lsp

A summary of these naming decisions is as follows:

0) Original intrinsic function name suffix meanings (invented by Intel):

- _ss = operates on the 0th single-float in the register
- _ps = operates on all 4 single-floats
- _sd = operates on the 0th double-float
- _pd = operates on both double-floats
- _epi?? = operates on all ints of size ??
- _epu?? = operates on all unsigned ints of size ??
- _si128 = operates on the whole register as one big int
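
To make the suffix meanings concrete, here is a small C sketch using the
original Intel intrinsics (not the Lisp wrappers). The helper function
names are made up for the example; the _mm_* calls are the standard ones
from emmintrin.h.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* _ss: only element 0 is added; elements 1..3 are copied from a. */
void add_lowest_single(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* _ps: all four single-floats are added element by element. */
void add_all_singles(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* _epi32: the register is treated as four 32-bit integers.
   _si128: the load and store move the whole 128-bit register at once. */
void add_all_int32(const int a[4], const int b[4], int out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

For inputs {1, 2, 3, 4} and {10, 20, 30, 40}, add_lowest_single gives
{11, 2, 3, 4} while add_all_singles gives {11, 22, 33, 44}.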

1) I dropped the _mm_ prefix, and instead named the package 'sse' to
allow casual use as "sse:something".

Pros: With packages, the prefix is just redundant clutter.

Cons: The new Intel AVX instruction intrinsics use _mm256_ to
distinguish them (although the first CPUs to support these instructions
are only due in 2011). This can be solved by putting them in a different
package, but then it would be impossible to import both sse and
potential future avx definitions (not that it makes much sense to mix
the two).

2) I dropped "ep" from "epi" and "epu".

Pros: "e" is meaningless because I don't plan to support MMX due to
obsolescence. "p" is likewise pointless because all integer ops work
on the whole register.

Cons: dropping "p" makes conversion function naming somewhat awkward
when contrasted with "si32" to denote an ordinary integer, and when
compared with the "pi" below.

3) I renamed "si128" to "pi".

Rationale: "si128" is too long and unlike the ordinary "epi". I hate
it. "i" alone is too short.

4) I renamed comparison functions using graphic characters, e.g. <=-ps
instead of _mm_cmple_ps

Rationale: Distinguishing "cmple", "cmplt" etc at a glance is quite
hard, at least for me.
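
For reference, this is what the two easily confused C intrinsics do. The
sketch below calls _mm_cmple_ps and _mm_cmplt_ps directly; the helper
names are invented, and _mm_movemask_ps is used only to turn the
per-element result into an integer that is easy to inspect.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Bit i of the result is set when a[i] <= b[i]. */
int less_equal_mask(const float a[4], const float b[4])
{
    /* Each element of m is all ones where the test holds, else all zeros. */
    __m128 m = _mm_cmple_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(m);  /* collects the top bit of each element */
}

/* Bit i of the result is set when a[i] < b[i]. */
int less_than_mask(const float a[4], const float b[4])
{
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(m);
}
```

The two calls differ by one letter but give different masks whenever some
elements are equal, which is the kind of slip a graphic name makes easier
to spot.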

Arithmetic ops are still called "add-ps" etc, because things like
"+ps", "+-ps", "--ps" and so on are ugly and even more confusing.

5) I added some new constants like 0.0-ps, true-ps, false-ps, 0-pi and
so on, and likewise macros such as if-ps, not-ps and neg-ps.

6) I added new lispy array access functions like aref-ps,
row-major-aref-ps and so on.

They always ignore the precise element type, only checking that it's a
specialized array and 16 bytes are accessible after the specified
index. row-major-aref-prefetch-* even ignore array bounds, since the
underlying instructions are only caching hints and officially safe to
use with bad addresses.
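
The checking rule can be pictured with the C sketch below. It is an
assumption about the general idea, not the code in the contrib files:
the function names and the (data, total_bytes, elem_size, index)
parameters are hypothetical, and only _mm_loadu_ps, _mm_storeu_ps and
_mm_prefetch are real intrinsics.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: reads four single-floats starting at element
   `index` of a buffer whose elements are `elem_size` bytes each.
   The element type is ignored; the only check is that 16 bytes are
   available from that position. Returns 1 on success, 0 otherwise. */
int load_ps_checked(const void *data, size_t total_bytes,
                    size_t elem_size, size_t index, float out[4])
{
    size_t offset;
    if (elem_size == 0 || index > total_bytes / elem_size)
        return 0;                       /* index is past the buffer */
    offset = index * elem_size;         /* cannot overflow after the test */
    if (total_bytes - offset < 16)
        return 0;                       /* fewer than 16 bytes left */
    /* Unaligned load, so the position need not be 16-byte aligned. */
    _mm_storeu_ps(out,
                  _mm_loadu_ps((const float *)((const char *)data + offset)));
    return 1;
}

/* Hypothetical helper: issues a cache hint for element `index`.
   No bounds check is made, mirroring the description above. */
void prefetch_unchecked(const void *data, size_t elem_size, size_t index)
{
    _mm_prefetch((const char *)data + index * elem_size, _MM_HINT_T0);
}
```

With a buffer of six floats, index 2 succeeds because exactly 16 bytes
remain, and index 3 is rejected because only 12 bytes remain.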

I think that's all, but maybe I forgot something.

Alexander