What does SAXPY do?

SAXPY is a combination of scalar multiplication and vector addition, and it’s very simple: it takes as input two vectors of 32-bit floats X and Y with N elements each, and a scalar value A. It multiplies each element X[i] by A and adds the result to Y[i]. A simple C implementation looks like this.
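The listing itself is not reproduced here, so the following is only a minimal sketch of such a sequential loop, written in the C-style subset of C++. The function name `saxpy` and its parameter order are illustrative choices, not taken from the original listing.

```cpp
#include <cstddef>

// Sequential SAXPY: y[i] = a * x[i] + y[i] for every i in [0, n).
// x is read-only; y is both an input and the output.
void saxpy(std::size_t n, float a, const float* x, float* y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(3, 2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}`.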

What is DAXPY?

DAXPY adds a scalar multiple of a double-precision vector to another double-precision vector: it computes a constant alpha times a vector x plus a vector y, and the result overwrites the initial values of y. In other words, it performs the vector operation y <- alpha*x + y.
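Since DAXPY and SAXPY differ only in element precision, one way to sketch both is a single template. The helper `axpy` below is hypothetical and is not the BLAS interface (it takes no stride arguments, for instance); it only illustrates the operation described above.

```cpp
#include <cstddef>
#include <vector>

// Generic AXPY: y <- alpha * x + y, overwriting y in place.
// T = double corresponds to the DAXPY operation, T = float to SAXPY.
// Hypothetical helper: processes min(x.size(), y.size()) elements.
template <typename T>
void axpy(T alpha, const std::vector<T>& x, std::vector<T>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

With `std::vector<double>` arguments the compiler deduces `T = double`, which gives the double-precision behaviour.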

What is the SAXPY operation in data structures?

Alternatively, the SAXPY operation can be described as a function acting on individual elements, applied to every element of the input data. Suppose the i-th element of x is $x_i$ and the i-th element of y is $y_i$. Then we can define $f(t, p, q) = t\,p + q$ and write $\forall i:\ y_i \leftarrow f(a, x_i, y_i)$.
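The element-wise view can be sketched by separating the per-element function from the loop that applies it. The name `saxpy_map` is hypothetical, and the sketch assumes x and y have the same length.

```cpp
#include <cstddef>
#include <vector>

// Per-element function from the text: f(t, p, q) = t * p + q.
inline float f(float t, float p, float q) {
    return t * p + q;
}

// Apply f at every index i: y_i <- f(a, x_i, y_i).
// Assumes x.size() == y.size(); no bounds checking is done on x.
void saxpy_map(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = f(a, x[i], y[i]);
    }
}
```

Because each call to `f` touches only index `i`, the iterations are independent of one another, which is what makes the operation easy to run in parallel.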

How does the SAXPY kernel work?

The SAXPY kernel contains the same computation as the sequential C version, but instead of looping over the N elements, we launch a single thread for each of the N elements, and each thread computes its array index using blockIdx.x*blockDim.x + threadIdx.x.
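To make the index arithmetic concrete, here is a CPU-only emulation in plain C++. It is not CUDA code: `block_idx`, `block_dim` and `thread_idx` are ordinary parameters standing in for `blockIdx.x`, `blockDim.x` and `threadIdx.x`, and both function names are hypothetical. The `i < n` guard is an assumption added here for the case where N is not a multiple of the block size.

```cpp
#include <cstddef>
#include <vector>

// Work done by one emulated "thread": compute a global index, then
// perform the same multiply-add as the sequential loop body.
void saxpy_thread(std::size_t block_idx, std::size_t block_dim,
                  std::size_t thread_idx, std::size_t n, float a,
                  const float* x, float* y) {
    const std::size_t i = block_idx * block_dim + thread_idx;
    if (i < n) {  // guard: the last block may extend past n
        y[i] = a * x[i] + y[i];
    }
}

// Sequential stand-in for a kernel launch. On a GPU these iterations
// would run as parallel threads instead of nested loops.
// Assumes block_dim > 0 and x.size() == y.size().
void saxpy_emulated_launch(float a, const std::vector<float>& x,
                           std::vector<float>& y, std::size_t block_dim) {
    const std::size_t n = y.size();
    const std::size_t num_blocks = (n + block_dim - 1) / block_dim;  // round up
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            saxpy_thread(b, block_dim, t, n, a, x.data(), y.data());
        }
    }
}
```

With five elements and a block size of two, three blocks are launched and the sixth emulated thread does nothing because of the guard.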

How do you perform SAXPY in a single line?

With Thrust, SAXPY can be performed in a single line using thrust::transform(), which acts like a parallel foreach (∥∀!), applying a multiply-add (MAD) operation to each element of the input vectors x and y.
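The Thrust listing is not reproduced here. As a stand-in, the same shape can be sketched with `std::transform` from the C++ standard library, which is the sequential analogue of this pattern. This is a swapped-in technique rather than the Thrust code itself, and the name `saxpy_transform` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// SAXPY as a single transform call: for each pair (x[i], y[i]),
// write a * x[i] + y[i] back into y[i].
// Assumes y has at least as many elements as x.
void saxpy_transform(float a, const std::vector<float>& x,
                     std::vector<float>& y) {
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

The lambda plays the role of the multiply-add operation, and the transform call plays the role of the foreach over both input vectors.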

Is SAXPY a good way to compare the performance of different approaches?

Because it’s so simple, and does very little computation, SAXPY is not really a great computation to use for comparing the performance of the different approaches, but that’s not my intent. My goal is to demonstrate multiple ways to program on the CUDA platform today, not to suggest that any one is better than any other.