HPC - Dot product

Moreno Marzolla

Last updated: 2022-09-14

The file omp-dot.c contains a serial program that computes the dot product of two arrays v1[] and v2[]. The program accepts the array lengths \(n\) as the only command line parameter. The arrays are initialized deterministically in order to know their scalar product without the need to compute it; this is useful for testing.. Recall that the dot product of v1[] and v2[] is defined as:

\[ \sum_{i = 0}^{n-1} v1[i] \times v2[i] \]

Parallelize the serial program using the omp parallel construct with the appropriate clauses. It is instructive to begin without using the omp parallel for directive, and calculating the endpoints of the iterations by hand as follows: let \(P\) be the size of the OpenMP thread pool, partition the arrays into \(P\) blocks of approximately uniform size. Thread \(p\) (\(0 \leq p < P\)) computes the dot product my_p of the subvectors with indices \(\texttt{my_start}, \ldots, \texttt {my_end}-1\):

\[ \texttt{my_p}: = \sum_{i=\texttt{my_start}}^{\texttt{my_end}-1} v1[i] \times v2[i] \]

There are several ways to accumulate partial results. One possibility is to make sure that the value calculated by thread \(p\) is stored in partial_p[p], where partial_p[] is an array of length \(P\); in this way each thread handles a different element of partial_p[] and does not check race condition. The master computes the final result as the sum of the values ​​in partial_p[]. Be careful to manage correctly the case where the length \(n\) of the arrays is not a multiple of \(P\).

The solution above is instructive but very tedious. In fact, unless there are specific reasons to do otherwise, you should use the omp parallel for directive with the reduction() clause, and let the compiler take care of everything.

To compile:

    gcc -fopenmp -std=c99 -Wall -Wpedantic omp-dot.c -o omp-dot

To execute:

    ./omp-dot [n]

For example, if you want to use two OpenMP threads:

    OMP_NUM_THREADS=2 ./omp-dot 1000000