Last updated: 2022-11-17
The server has three identical GPUs (NVIDIA GeForce GTX 1070). The first one is used by default, although it is possible to select another card either programmatically (cudaSetDevice(0) uses the first GPU, cudaSetDevice(1) uses the second one, and so on), or using the environment variable CUDA_VISIBLE_DEVICES.
For example,
CUDA_VISIBLE_DEVICES=0 ./cuda-stencil1d
runs cuda-stencil1d on the first GPU (default), while
CUDA_VISIBLE_DEVICES=1 ./cuda-stencil1d
runs the program on the second GPU.
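The programmatic route can be sketched as follows. This is a minimal illustration, not part of the course code: select_device() is a hypothetical helper name, and it only wraps the CUDA runtime calls cudaGetDeviceCount() and cudaSetDevice() with a range check.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of the provided sources): make device
   `dev` the current one, after checking that it actually exists.
   Returns 0 on success, -1 on failure. */
int select_device(int dev)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "No usable CUDA device found\n");
        return -1;
    }
    if (dev < 0 || dev >= count) {
        fprintf(stderr, "Device %d out of range (%d visible)\n", dev, count);
        return -1;
    }
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

Note that device numbers passed to cudaSetDevice() are relative to the devices made visible by CUDA_VISIBLE_DEVICES: with CUDA_VISIBLE_DEVICES=1 the program sees a single device, which it addresses as device 0.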
Run deviceQuery from the command line to display the hardware features of the GPUs.
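Some of the same information can also be obtained from inside a program. The sketch below assumes only the CUDA runtime API (cudaGetDeviceCount() and cudaGetDeviceProperties()); print_device_info() is a hypothetical name, and the selection of fields printed is arbitrary.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: print a few properties of every visible device.
   Returns the number of devices found (0 if the query fails). */
int print_device_info(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) {
            continue; /* skip devices that cannot be queried */
        }
        printf("Device %d: %s\n", d, prop.name);
        printf("  compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  global memory (bytes): %zu\n", prop.totalGlobalMem);
    }
    return count;
}
```

The maxThreadsPerBlock field is the limit that determines the largest usable thread block size.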
The program cuda-dot.cu computes the dot product of two arrays x[] and y[] of length \(n\). Modify the program to use the GPU, by transforming the dot() function into a kernel. The dot product \(s\) of two arrays x[] and y[] is defined as
\[ s = \sum_{i=0}^{n-1} x[i] \times y[i] \]
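As a point of reference, the definition translates directly into a serial loop on the host. The sketch below assumes single-precision arrays and uses the hypothetical name dot_serial(); the function in the provided cuda-dot.cu may differ. A serial version like this is handy for checking the result computed by the GPU.

```cuda
/* Serial reference (assumption: float arrays; the name dot_serial is
   hypothetical). Computes s = x[0]*y[0] + ... + x[n-1]*y[n-1]. */
float dot_serial(const float *x, const float *y, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++) {
        s += x[i] * y[i];
    }
    return s;
}
```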
Some modifications of the dot() function are required to use the GPU. In this exercise we implement a simple (although not efficient) approach where we use a single block of BLKDIM threads. The algorithm works as follows:
1. The CPU allocates a tmp[] array of BLKDIM elements on the GPU, in addition to copies of x[] and y[].
2. The CPU launches a single 1D thread block containing BLKDIM threads; use the maximum number of threads per block supported by the hardware, which is BLKDIM = 1024.
3. Thread \(t\) (\(t = 0, \ldots, \mathit{BLKDIM}-1\)) computes the value of the expression \((x[t] \times y[t] + x[t + \mathit{BLKDIM}] \times y[t + \mathit{BLKDIM}] + x[t + 2 \times \mathit{BLKDIM}] \times y[t + 2 \times \mathit{BLKDIM}] + \ldots)\) and stores the result in tmp[t] (see Figure 1).
4. When the kernel terminates, the CPU transfers tmp[] back to host memory and performs a sum-reduction to compute the final result.
Your program must work correctly for any value of \(n\), even if it is not a multiple of BLKDIM.
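One possible shape for the scheme described above is sketched below. It is an illustration under stated assumptions, not the reference solution: the arrays are assumed to hold float values, dot_kernel() and dot_gpu() are hypothetical names, and error checking of the CUDA calls is omitted for brevity. The loop bound i < n is what makes the kernel work when \(n\) is not a multiple of BLKDIM: threads whose next index would fall past the end of the arrays simply stop accumulating.

```cuda
#include <cuda_runtime.h>

#define BLKDIM 1024

/* Thread t accumulates the products at indices t, t + BLKDIM,
   t + 2*BLKDIM, ... that are less than n, and stores the partial
   sum in tmp[t]. Must be launched with a single 1D block. */
__global__ void dot_kernel(const float *x, const float *y, int n, float *tmp)
{
    const int t = threadIdx.x;
    float s = 0.0f;
    /* blockDim.x equals BLKDIM for the launch used below */
    for (int i = t; i < n; i += blockDim.x) {
        s += x[i] * y[i];
    }
    tmp[t] = s;
}

/* Hypothetical host wrapper: copies the inputs to the GPU, runs the
   kernel, and completes the reduction on the CPU. */
float dot_gpu(const float *x, const float *y, int n)
{
    float tmp[BLKDIM];
    float *d_x, *d_y, *d_tmp;
    const size_t size = (size_t)n * sizeof(float);

    /* Device copies of x[] and y[], plus the tmp[] array */
    cudaMalloc((void **)&d_x, size);
    cudaMalloc((void **)&d_y, size);
    cudaMalloc((void **)&d_tmp, sizeof(tmp));
    cudaMemcpy(d_x, x, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, size, cudaMemcpyHostToDevice);

    /* One block of BLKDIM threads */
    dot_kernel<<<1, BLKDIM>>>(d_x, d_y, n, d_tmp);

    /* This blocking copy waits for the kernel to finish */
    cudaMemcpy(tmp, d_tmp, sizeof(tmp), cudaMemcpyDeviceToHost);

    /* Final sum-reduction on the CPU */
    float result = 0.0f;
    for (int i = 0; i < BLKDIM; i++) {
        result += tmp[i];
    }

    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_tmp);
    return result;
}
```

With a single block, only one multiprocessor of the GPU does any work, which is one reason this approach is simple but not efficient.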
A better way to compute a reduction will be shown in future lectures.
To compile:
nvcc cuda-dot.cu -o cuda-dot -lm
To execute:
./cuda-dot [len]
Example:
./cuda-dot