2.11 Vector Intrinsics

The following vector operation intrinsics are available by importing the gcc.simd module.

Template: void gcc.simd.prefetch (bool rw, ubyte locality) (const(void)* addr)

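Emit the prefetch instruction. The value of addr is the address of the memory to prefetch. The value of rw is a compile-time constant, true (one) or false (zero): one means that the prefetch is preparing for a write to the memory address, and zero, the default, means that it is preparing for a read. The value of locality must be a compile-time constant integer between zero and three. For the underlying GCC built-in, zero means that the data has no temporal locality and need not be left in the cache after the access, three means that the data has a high degree of temporal locality and should be left in all levels of cache possible, and one and two mean a low and a moderate degree of temporal locality, respectively.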

This intrinsic is the same as the GCC built-in function __builtin_prefetch.

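import gcc.simd : prefetch;

// Fragment: a, b, n and j are assumed to be declared elsewhere.
for (size_t i = 0; i < n; i++)
{
    a[i] = a[i] + b[i];
    prefetch!(true, 1)(&a[i+j]);   // a[i+j] is about to be written
    prefetch!(false, 1)(&b[i+j]);  // b[i+j] is about to be read
    // ...
}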
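
The loop above leaves a, b, n and j undeclared. A self-contained variant might look like the following minimal sketch. The helper name addWithPrefetch and the prefetch distance of 16 elements are assumptions chosen for illustration and are not part of gcc.simd. Because D bounds-checks slice indexing, the sketch only forms addresses that lie inside the arrays.

```d
import gcc.simd : prefetch;

// Hypothetical helper (not part of gcc.simd): adds b to a element-wise,
// prefetching a fixed distance ahead of the current index.
void addWithPrefetch(float[] a, const(float)[] b)
{
    assert(b.length >= a.length);

    // Assumed prefetch distance in elements; a good value is target-specific.
    enum size_t ahead = 16;

    foreach (i; 0 .. a.length)
    {
        // Guard the index so that &a[i + ahead] never trips a bounds check.
        if (i + ahead < a.length)
        {
            prefetch!(true, 1)(&a[i + ahead]);   // will be written
            prefetch!(false, 1)(&b[i + ahead]);  // will only be read
        }
        a[i] += b[i];
    }
}
```

Whether the prefetches help at all depends on the target and the access pattern, so the distance is best treated as a tuning parameter.
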

Template: V gcc.simd.loadUnaligned (V)(const V* p)

Load unaligned vector from the address p.

float4 v;
ubyte[16] arr;

v = loadUnaligned(cast(float4*)arr.ptr);

Template: V gcc.simd.storeUnaligned (V)(V* p, V value)

Store vector value to unaligned address p.

float4 v;
ubyte[16] arr;

storeUnaligned(cast(float4*)arr.ptr, v);
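
The two intrinsics are typically used as a pair when data sits at an address with no alignment guarantee. The following is a minimal sketch of such a round trip, assuming float4 is imported from core.simd. The helper doubleAt is hypothetical and only serves the illustration.

```d
import core.simd : float4;
import gcc.simd : loadUnaligned, storeUnaligned;

// Hypothetical helper (not part of gcc.simd): doubles the four floats
// stored at an arbitrary, possibly misaligned, byte offset in buf.
void doubleAt(ubyte[] buf, size_t offset)
{
    assert(offset + float4.sizeof <= buf.length);

    // The pointer is only handed to the unaligned intrinsics and is never
    // dereferenced directly, since it may not be 16-byte aligned.
    auto p = cast(float4*)(buf.ptr + offset);

    float4 v = loadUnaligned(p);
    v = v + v;
    storeUnaligned(p, v);
}
```
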

Template: V0 gcc.simd.shuffle (V0, V1, M)(V0 op1, V1 op2, M mask)
Template: V gcc.simd.shuffle (V, M)(V op1, M mask)

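Construct a permutation of elements from one or two vectors, returning a vector of the same type as the input vector(s). The mask is an integral vector with the same width and element count as the output vector. For the underlying GCC built-in, the elements of op1 are numbered from 0 and those of op2 from N, where N is the element count; mask elements are taken modulo N in the one-vector form and modulo 2*N in the two-vector form.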

This intrinsic is the same as the GCC built-in function __builtin_shuffle.

int4 a = [1, 2, 3, 4];
int4 b = [5, 6, 7, 8];
int4 mask1 = [0, 1, 1, 3];
int4 mask2 = [0, 4, 2, 5];
int4 res;

res = shuffle(a, mask1);    // res is [1,2,2,4]
res = shuffle(a, b, mask2); // res is [1,5,3,6]
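
Two common permutations can be written with shuffle as in the sketch below, assuming int4 is imported from core.simd. The helper names reversed and interleaveLow are hypothetical.

```d
import core.simd : int4;
import gcc.simd : shuffle;

// Hypothetical helper: reverses the element order of v.
int4 reversed(int4 v)
{
    int4 mask = [3, 2, 1, 0];
    return shuffle(v, mask);
}

// Hypothetical helper: interleaves the first two elements of a and b.
int4 interleaveLow(int4 a, int4 b)
{
    int4 mask = [0, 4, 1, 5];  // indices 4..7 select elements of b
    return shuffle(a, b, mask);
}
```

Because the mask is an ordinary vector value, it can also be computed at run time, which is the main difference from shufflevector.
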

Template: V gcc.simd.shufflevector (V1, V2, M...)(V1 op1, V2 op2, M mask)
Template: V gcc.simd.shufflevector (V, mask...)(V op1, V op2)

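Construct a permutation of elements from two vectors, returning a vector with the same element type as the input vectors and the same length as the mask. The mask is a list of constant integer indices: indices below the element count of op1 select from op1, and higher indices select from op2.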

This intrinsic is the same as the GCC built-in function __builtin_shufflevector.

int8 a = [1, -2, 3, -4, 5, -6, 7, -8];
int4 b = shufflevector(a, a, 0, 2, 4, 6);   // b is [1,3,5,7]
int4 c = [-2, -4, -6, -8];
int8 d = shufflevector!(int8, 4, 0, 5, 1, 6, 2, 7, 3)(c, b); // d is [1,-2,3,-4,5,-6,7,-8], the same as a
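
A smaller sketch that stays within 128-bit vectors, for targets where int8 is not available, might look as follows. It assumes int4 is imported from core.simd, and the helper name lowHigh is hypothetical.

```d
import core.simd : int4;
import gcc.simd : shufflevector;

// Hypothetical helper: takes the first two elements of a and the last
// two elements of b. Indices 0..3 refer to a, indices 4..7 refer to b.
int4 lowHigh(int4 a, int4 b)
{
    return shufflevector(a, b, 0, 1, 6, 7);
}
```
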

Template: E gcc.simd.extractelement (V, int idx)(V val)

Extracts a single scalar element from a vector val at a specified index idx.

int4 a = [0, 10, 20, 30];
int k = extractelement!(int4, 2)(a);    // k is 20

Template: V gcc.simd.insertelement (V, int idx)(V val, B e)

Inserts a scalar element e into a vector val at a specified index idx.

int4 a = [0, 10, 20, 30];
int4 b = insertelement!(int4, 2)(a, 50); // b is [0,10,50,30]
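
extractelement and insertelement combine naturally when individual lanes need scalar treatment. The sketch below assumes int4 is imported from core.simd; the helpers sum4 and swapEnds are hypothetical.

```d
import core.simd : int4;
import gcc.simd : extractelement, insertelement;

// Hypothetical helper: adds up the four elements of v.
int sum4(int4 v)
{
    return extractelement!(int4, 0)(v) + extractelement!(int4, 1)(v)
         + extractelement!(int4, 2)(v) + extractelement!(int4, 3)(v);
}

// Hypothetical helper: swaps the first and last elements of v.
int4 swapEnds(int4 v)
{
    const int first = extractelement!(int4, 0)(v);
    const int last = extractelement!(int4, 3)(v);
    v = insertelement!(int4, 0)(v, last);
    return insertelement!(int4, 3)(v, first);
}
```
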

Template: V gcc.simd.convertvector (V, T)(T val)

Convert a vector val from one integral or floating vector type to another. The result is an integral or floating vector that has had every element cast to the element type of the return type.

This intrinsic is the same as the GCC built-in function __builtin_convertvector.

int4 a = [1, -2, 3, -4];
float4 b = [1.5, -2.5, 3, 7];
float4 c = convertvector!float4(a);    // c is [1,-2,3,-4]
double4 d = convertvector!double4(a);  // d is [1,-2,3,-4]
double4 e = convertvector!double4(b);  // e is [1.5,-2.5,3,7]
int4 f = convertvector!int4(b);        // f is [1,-2,3,7]
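
A typical use is converting integer data to floating point before arithmetic. The following minimal sketch assumes int4 and float4 are imported from core.simd; the helper name scaled is hypothetical.

```d
import core.simd : float4, int4;
import gcc.simd : convertvector;

// Hypothetical helper: converts v to float and multiplies every element
// by factor.
float4 scaled(int4 v, float factor)
{
    float4 f = convertvector!float4(v);
    float4 k = factor;  // broadcast the scalar to all four elements
    return f * k;
}
```
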

Template: V0 gcc.simd.blendvector (V0, V1, M)(V0 op0, V1 op1, M mask)

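Construct a conditional merge of elements from two vectors, returning a vector of the same type as the input vector(s). The mask is an integral vector with the same width and element count as the output vector. As the example below shows, each result element is taken from op0 where the corresponding mask element is set, and from op1 otherwise.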

int4 a = [1, 2, 3, 4];
int4 b = [3, 2, 1, 4];
auto c = blendvector(a, b, a > b);  // c is [3,2,3,4]
auto d = blendvector(a, b, a < b);  // d is [1,2,1,4]
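
The first line of the example above amounts to an element-wise maximum. Wrapped in a function, it might look like the following sketch, which assumes int4 is imported from core.simd. The helper name elementMax is hypothetical.

```d
import core.simd : int4;
import gcc.simd : blendvector;

// Hypothetical helper: element-wise maximum of a and b. Where a > b the
// element of a is kept, otherwise the element of b is used.
int4 elementMax(int4 a, int4 b)
{
    return blendvector(a, b, a > b);
}
```
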