HPC - Array reversal with CUDA

Moreno Marzolla

Last updated: 2023-11-29

Write a program that reverses an array v[] of length \(n\), i.e., exchanges v[0] and v[n-1], v[1] and v[n-2] and so on. You should write the following functions:

  1. reverse() reverses an array in[] into a different array out[] (the input is not modified). Assume that in[] and out[] reside on non-overlapping memory blocks.

  2. inplace_reverse() reverses the array in[] “in place”, i.e., exchanging elements using \(O(1)\) additional storage; therefore, you are not allowed to allocate a temporary output vector.

The file cuda-reverse.cu provides a serial implementation of reverse() and inplace_reverse(). Your goal is to modify the functions to make use of the GPU, defining any additional kernels that are required.
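For reference, a serial baseline could look like the sketch below. The contents of cuda-reverse.cu are not reproduced here, so the names reverse_serial() and inplace_reverse_serial(), the int element type, and the signatures are assumptions made for illustration only.

```cuda
/* Hypothetical serial baseline (names, types and signatures are assumed,
   not taken from cuda-reverse.cu). */

/* Copy in[] into out[] in reverse order; in[] is not modified. */
void reverse_serial(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[n - 1 - i] = in[i];
    }
}

/* Reverse in[] in place using a single temporary variable. */
void inplace_reverse_serial(int *in, int n)
{
    int i = 0, j = n - 1;
    while (i < j) {
        const int tmp = in[j];
        in[j] = in[i];
        in[i] = tmp;
        i++;
        j--;
    }
}
```

Both loops perform independent operations at each iteration, which is what makes the one-thread-per-element mapping possible on the GPU.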

Hints

reverse() can be parallelized by launching \(n\) CUDA threads; each thread copies a single element from the input to the output array. Since the array size \(n\) can be large, you should create as many one-dimensional thread blocks as needed to have at least \(n\) threads. Have a look at the lecture notes on how to do this.
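The hint above can be sketched as follows. This is one possible approach, not the reference solution: the kernel name reverse_kernel(), the block size BLKDIM = 256, the int element type and the host signature of reverse() are all assumptions. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 256 /* threads per block; assumed value, not from the exercise */

/* Each thread copies one element of in[] to its mirrored position in out[]. */
__global__ void reverse_kernel(const int *in, int *out, int n)
{
    const int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        out[n - 1 - idx] = in[idx];
    }
}

/* Host wrapper (assumed signature): in[] and out[] are host arrays of length n. */
void reverse(const int *in, int *out, int n)
{
    if (n <= 0) return;

    int *d_in, *d_out;
    const size_t size = (size_t)n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMalloc((void **)&d_out, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);

    /* Round up so that nblocks * BLKDIM >= n */
    const int nblocks = (n + BLKDIM - 1) / BLKDIM;
    reverse_kernel<<<nblocks, BLKDIM>>>(d_in, d_out, n);

    /* This copy waits for the kernel to complete before reading d_out */
    cudaMemcpy(out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The expression (n + BLKDIM - 1) / BLKDIM is integer division rounded up, so the last block may contain threads with idx >= n; the test inside the kernel keeps them from touching memory outside the arrays.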

inplace_reverse() can be parallelized by launching \(\lfloor n/2 \rfloor\) CUDA threads (note the rounding down: when \(n\) is odd, the middle element stays where it is). Each thread swaps an element in the first half of in[] with the corresponding element in the second half.
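A minimal sketch of the in-place variant is shown below, under the same assumptions as before: the kernel name inplace_reverse_kernel(), BLKDIM = 256, int elements and the host signature are illustrative choices, and error checking is omitted. The device buffer d_in is the GPU copy of the array itself, not a second output vector; the swap uses one temporary variable per thread.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 256 /* threads per block; assumed value, not from the exercise */

/* Thread idx swaps in[idx] with its mirror element in[n - 1 - idx]. */
__global__ void inplace_reverse_kernel(int *in, int n)
{
    const int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n / 2) {
        const int opp = n - 1 - idx;
        const int tmp = in[idx];
        in[idx] = in[opp];
        in[opp] = tmp;
    }
}

/* Host wrapper (assumed signature): in[] is a host array of length n. */
void inplace_reverse(int *in, int n)
{
    const int nthreads = n / 2; /* integer division rounds down */
    if (nthreads <= 0) return;  /* nothing to swap for n < 2 */

    int *d_in;
    const size_t size = (size_t)n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);

    const int nblocks = (nthreads + BLKDIM - 1) / BLKDIM;
    inplace_reverse_kernel<<<nblocks, BLKDIM>>>(d_in, n);

    cudaMemcpy(in, d_in, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
}
```

No two threads touch the same pair of elements, so the swaps need no synchronization.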

To map threads to array elements it is possible to use the expression:

    const int idx = threadIdx.x + blockIdx.x * blockDim.x;

In both cases the program might create more threads than are actually needed; care must be taken to ensure that the extra threads do nothing, e.g., using

    if (idx < n) {
      /* body */
    }
    /* else do nothing */

for reverse(), and

    if (idx < n/2) {
      /* body */
    }
    /* else do nothing */

for inplace_reverse().
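To see why the guards are needed, it helps to count the surplus threads. The helper functions below are hypothetical (they are not part of cuda-reverse.cu) and run entirely on the host; they compute how many blocks a launch requires and how many of the launched threads have no work to do.

```cuda
/* Hypothetical host-side helpers, introduced only for illustration. */

/* Number of blocks of blkdim threads needed to cover `work` items
   (integer division rounded up). */
int blocks_needed(int work, int blkdim)
{
    return (work + blkdim - 1) / blkdim;
}

/* Number of launched threads that have no item to process. */
int idle_threads(int work, int blkdim)
{
    return blocks_needed(work, blkdim) * blkdim - work;
}
```

For example, with n = 1000 and 256 threads per block, reverse() needs 4 blocks (1024 threads, 24 of them idle), while inplace_reverse() needs 2 blocks to cover n/2 = 500 swaps (512 threads, 12 of them idle). Without the if tests those surplus threads would read and write outside the array bounds.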

To compile:

    nvcc cuda-reverse.cu -o cuda-reverse

To execute:

    ./cuda-reverse [n]

Example:

    ./cuda-reverse

Files