HPC - Array reversal with CUDA

Moreno Marzolla

Last updated: 2022-11-17

Write a program that reverses an array v[] of length \(n\), i.e., exchanges v[0] and v[n-1], v[1] and v[n-2] and so on. You should write two versions of the program:

  1. The first version reverses an input array in[] into a different output array out[], so that the input is not modified. You can assume that in[] and out[] are mapped to different, non-overlapping memory blocks.

  2. The second version reverses an array in[] “in place” using \(O(1)\) additional storage.
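
As a small worked illustration of the two versions, the index mapping can be written out explicitly. The index \(i\) is introduced here only for illustration. For the first version:

\[
\texttt{out}[i] = \texttt{in}[n-1-i], \qquad 0 \le i < n
\]

For the second version, in[i] and in[n-1-i] are exchanged for \(0 \le i < \lfloor n/2 \rfloor\). For example, with \(n = 5\) the exchanges are between positions 0 and 4 and between positions 1 and 3, while the middle element in[2] stays where it is.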

The file cuda-reverse.cu provides a CPU-based implementation of reverse() and inplace_reverse(). Modify these functions to use the GPU.

Hint: reverse() can be easily transformed into a kernel executed by \(n\) CUDA threads (one for each array element), where each thread copies one element from in[] to out[]. Use one-dimensional thread blocks, since that makes it easy to map threads to array elements. inplace_reverse() can be transformed into a kernel as well, but in this case only \(\lfloor n/2 \rfloor\) CUDA threads are required (note the rounding): each thread swaps an element from the first half of in[] with the corresponding element from the second half. Make sure that the program also works when the input length \(n\) is odd.
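
The hint above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the reference solution: the element type (int), the function signatures, the kernel names reverse_kernel and inplace_reverse_kernel, and the block size BLKDIM are all hypothetical and may differ from what cuda-reverse.cu actually uses. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define BLKDIM 256 /* assumed block size; other supported values work too */

/* One thread per element: thread i copies in[i] to its mirrored position. */
__global__ void reverse_kernel(const int *in, int *out, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) { /* the last block may contain surplus threads */
        out[n - 1 - i] = in[i];
    }
}

/* One thread per pair: thread i swaps in[i] with in[n-1-i]. */
__global__ void inplace_reverse_kernel(int *in, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n / 2) { /* integer division: for odd n the middle element is untouched */
        const int j = n - 1 - i;
        const int tmp = in[i];
        in[i] = in[j];
        in[j] = tmp;
    }
}

/* Host wrapper (assumed signature): reverses in[] into out[] on the GPU. */
void reverse(const int *in, int *out, int n)
{
    if (n <= 0) return; /* nothing to do; a launch with 0 blocks is invalid */
    int *d_in, *d_out;
    const size_t size = n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMalloc((void **)&d_out, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);
    /* round the number of blocks up so that at least n threads are created */
    reverse_kernel<<<(n + BLKDIM - 1) / BLKDIM, BLKDIM>>>(d_in, d_out, n);
    /* this copy waits for the kernel to finish before reading the result */
    cudaMemcpy(out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

/* Host wrapper (assumed signature): reverses in[] in place on the GPU. */
void inplace_reverse(int *in, int n)
{
    const int half = n / 2; /* number of swaps, i.e. floor(n/2) */
    if (half <= 0) return;  /* n <= 1: nothing to swap */
    int *d_in;
    const size_t size = n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);
    inplace_reverse_kernel<<<(half + BLKDIM - 1) / BLKDIM, BLKDIM>>>(d_in, n);
    cudaMemcpy(in, d_in, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
}
```

With these wrappers, calling reverse(in, out, 5) on in[] = {1, 2, 3, 4, 5} would leave in[] unchanged and fill out[] with {5, 4, 3, 2, 1}; calling inplace_reverse(in, 5) would then turn in[] into the same reversed sequence. The in-place kernel needs only a single temporary per thread, which matches the \(O(1)\) additional storage requirement, and no synchronization is needed because each pair of elements is touched by exactly one thread.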

To compile:

    nvcc cuda-reverse.cu -o cuda-reverse

To execute:

    ./cuda-reverse [n]

Example:

    ./cuda-reverse

Files