Last updated: 2022-11-17
Write a program that reverses an array v[] of length \(n\), i.e., exchanges v[0] and v[n-1], v[1] and v[n-2], and so on. You should write two versions of the program:
the first version reverses an input array in[] into a different output array out[], so that the input is not modified. You can assume that in[] and out[] are mapped to different, non-overlapping memory blocks.
The second version reverses an array in[] “in place” using \(O(1)\) additional storage.
The file cuda-reverse.cu provides a CPU-based implementation of reverse() and inplace_reverse(). Modify the functions to make use of the GPU.
Hint: reverse() can be easily transformed into a kernel executed by \(n\) CUDA threads (one for each array element). Each thread copies one element from in[] to out[]. Use one-dimensional thread blocks, since that makes it easy to map threads to array elements.
inplace_reverse() can be transformed into a kernel as well, but in this case only \(\lfloor n/2 \rfloor\) CUDA threads are required (note the rounding): each thread swaps an element from the first half of in[] with the appropriate element from the second half. Make sure that the program also works when the input length \(n\) is odd: in that case the middle element stays where it is and no thread needs to touch it.
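The index mapping described in the hint can be sketched as follows. This is only one possible approach, not the reference solution. The kernel names, the block size BLKDIM, the element type int, and the signatures of reverse() and inplace_reverse() are assumptions made for illustration; the ones in cuda-reverse.cu may differ. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 256 /* threads per block; an assumed value, not from the exercise */

/* One thread per element: thread i writes out[i] = in[n-1-i]. */
__global__ void reverse_kernel(const int *in, int *out, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) { /* the last block may have more threads than elements */
        out[i] = in[n - 1 - i];
    }
}

/* One thread per pair: thread i swaps in[i] with in[n-1-i], for
   i < n/2 (integer division). If n is odd, the middle element is
   never touched. */
__global__ void inplace_reverse_kernel(int *in, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n / 2) {
        const int j = n - 1 - i;
        const int tmp = in[i];
        in[i] = in[j];
        in[j] = tmp;
    }
}

/* Hypothetical host wrapper: reverses in[] into out[] using the GPU. */
void reverse(const int *in, int *out, int n)
{
    if (n <= 0) return; /* a grid with zero blocks is not a valid launch */
    int *d_in, *d_out;
    const size_t size = n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMalloc((void **)&d_out, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);
    /* Round the number of blocks up so that all n elements are covered. */
    reverse_kernel<<<(n + BLKDIM - 1) / BLKDIM, BLKDIM>>>(d_in, d_out, n);
    cudaMemcpy(out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

/* Hypothetical host wrapper: reverses in[] in place using the GPU. */
void inplace_reverse(int *in, int n)
{
    const int n_swaps = n / 2; /* floor(n/2) threads do useful work */
    if (n_swaps == 0) return;  /* n <= 1: nothing to swap */
    int *d_in;
    const size_t size = n * sizeof(*in);
    cudaMalloc((void **)&d_in, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);
    inplace_reverse_kernel<<<(n_swaps + BLKDIM - 1) / BLKDIM, BLKDIM>>>(d_in, n);
    cudaMemcpy(in, d_in, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
}
```

For example, with \(n = 5\) the in-place kernel needs \(\lfloor 5/2 \rfloor = 2\) working threads: thread 0 swaps in[0] with in[4], thread 1 swaps in[1] with in[3], and in[2] is left alone. Both kernels guard on the thread index because the number of launched threads is rounded up to a multiple of BLKDIM.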
To compile:
nvcc cuda-reverse.cu -o cuda-reverse
To execute:
./cuda-reverse [n]
Example:
./cuda-reverse