Write a program that reverses an array v[] of length \(n\), i.e., exchanges v[0] and v[n-1], v[1] and v[n-2], and so on. You should write two versions of the program: the first version reverses an input array in[] into a different output array out[], so that the input is not modified. You can assume that in[] and out[] are mapped to different, non-overlapping memory blocks. The second version reverses an array in[] "in place" using \(O(1)\) additional storage.
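For reference, here is a minimal CPU sketch of the index mapping that both versions must implement: element \(i\) pairs with element \(n-1-i\). The function names cpu_reverse() and cpu_inplace_reverse() and their signatures are hypothetical and are not necessarily those used in cuda-reverse.cu.

```cuda
/* Hypothetical reference versions (host code; compiles with nvcc). */

/* Out-of-place: out[n-1-i] receives in[i]; in[] is left untouched.
   Assumes in and out do not overlap. */
void cpu_reverse(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[n - 1 - i] = in[i];
    }
}

/* In-place: swap in[i] with in[n-1-i] for i = 0 .. floor(n/2)-1.
   Only one temporary is used, i.e., O(1) extra storage. When n is odd,
   the middle element in[n/2] has no partner and is never touched. */
void cpu_inplace_reverse(int *in, int n)
{
    for (int i = 0; i < n / 2; i++) {
        const int opp = n - 1 - i;
        const int tmp = in[i];
        in[i] = in[opp];
        in[opp] = tmp;
    }
}
```

The in-place loop stops at \(\lfloor n/2 \rfloor\): running it up to \(n\) would swap every pair twice and leave the array unchanged.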
The file cuda-reverse.cu provides a CPU-based implementation of reverse() and inplace_reverse(). Modify the functions to use the GPU.
reverse() can be easily transformed into a kernel executed by \(n\) CUDA threads (one for each array element). Each thread copies one element from in[] to out[]. Use one-dimensional thread blocks, since that makes it easy to map threads to array elements.
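A minimal sketch of the out-of-place version, assuming int elements and a block size of 256 threads; the names reverse_kernel(), gpu_reverse() and BLKDIM are hypothetical. Since \(n\) is generally not a multiple of the block size, the number of blocks is rounded up and each thread checks that its index is in range.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 256  /* assumed block size (hypothetical) */

/* One thread per element: thread i copies in[i] to out[n-1-i]. */
__global__ void reverse_kernel(const int *in, int *out, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {  /* the last block may contain threads past the end */
        out[n - 1 - i] = in[i];
    }
}

/* Hypothetical host wrapper: copies in[] to the device, launches
   ceil(n/BLKDIM) one-dimensional blocks, and copies the result back
   into out[]. Error checking is omitted for brevity. */
void gpu_reverse(const int *in, int *out, int n)
{
    if (n <= 0) return;
    const size_t size = n * sizeof(int);
    int *d_in, *d_out;
    cudaMalloc((void **)&d_in, size);
    cudaMalloc((void **)&d_out, size);
    cudaMemcpy(d_in, in, size, cudaMemcpyHostToDevice);
    const int nblocks = (n + BLKDIM - 1) / BLKDIM;
    reverse_kernel<<<nblocks, BLKDIM>>>(d_in, d_out, n);
    /* cudaMemcpy on the default stream waits for the kernel to finish */
    cudaMemcpy(out, d_out, size, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Each thread writes a distinct element of out[], so no synchronization between threads is needed.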
inplace_reverse() can be transformed into a kernel as well, but in this case only \(\lfloor n/2 \rfloor\) CUDA threads are required (note the rounding): each thread \(i\) swaps an element in[i] from the first half of in[] with the matching element in[n-1-i] from the second half. Make sure that the program also works when the input length \(n\) is odd; in that case the middle element stays where it is.
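A minimal sketch of the in-place version under the same assumptions (int elements, hypothetical names inplace_reverse_kernel(), gpu_inplace_reverse() and BLKDIM). Only \(\lfloor n/2 \rfloor\) threads do useful work; the pairs \((i, n-1-i)\) they handle are disjoint, so threads never touch the same element and no synchronization is needed.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 256  /* assumed block size (hypothetical) */

/* Thread i (0 <= i < floor(n/2)) swaps v[i] with v[n-1-i].
   For odd n the middle element v[n/2] has no partner and stays put. */
__global__ void inplace_reverse_kernel(int *v, int n)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n / 2) {
        const int opp = n - 1 - i;
        const int tmp = v[i];  /* O(1) extra storage per thread */
        v[i] = v[opp];
        v[opp] = tmp;
    }
}

/* Hypothetical host wrapper. The device buffer d_v is only the device
   copy of v[]; the kernel itself reverses it in place. */
void gpu_inplace_reverse(int *v, int n)
{
    const int nthreads = n / 2;  /* integer division rounds down */
    if (nthreads == 0) return;   /* n < 2: nothing to swap */
    const size_t size = n * sizeof(int);
    int *d_v;
    cudaMalloc((void **)&d_v, size);
    cudaMemcpy(d_v, v, size, cudaMemcpyHostToDevice);
    const int nblocks = (nthreads + BLKDIM - 1) / BLKDIM;
    inplace_reverse_kernel<<<nblocks, BLKDIM>>>(d_v, n);
    cudaMemcpy(v, d_v, size, cudaMemcpyDeviceToHost);
    cudaFree(d_v);
}
```

The bound check uses n / 2 rather than n: launching \(n\) threads would make every pair swap twice (once by each partner, possibly concurrently), which is a data race.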
To compile:

nvcc cuda-reverse.cu -o cuda-reverse