Last updated: 2022-09-14

The file omp-dot.c contains a serial program that computes the dot product of two arrays `v1[]` and `v2[]`. The program accepts the array length \(n\) as its only command-line parameter. The arrays are initialized deterministically, so that their dot product is known in advance without computing it; this is useful for testing. Recall that the dot product of `v1[]` and `v2[]` is defined as:

\[ \sum_{i = 0}^{n-1} v1[i] \times v2[i] \]

Parallelize the serial program using the `omp parallel` construct with the appropriate clauses. It is instructive to begin without the `omp parallel for` directive, computing the endpoints of the iterations by hand as follows: let \(P\) be the size of the OpenMP thread pool, and partition the arrays into \(P\) blocks of approximately uniform size. Thread \(p\) (\(0 \leq p < P\)) computes the dot product `my_p` of the subvectors with indices \(\texttt{my\_start}, \ldots, \texttt{my\_end}-1\):

\[ \texttt{my\_p} := \sum_{i=\texttt{my\_start}}^{\texttt{my\_end}-1} v1[i] \times v2[i] \]

There are several ways to accumulate the partial results. One possibility is to have thread \(p\) store its value in `partial_p[p]`, where `partial_p[]` is an array of length \(P\); this way each thread writes to a different element of `partial_p[]`, so there is no *race condition*. The master then computes the final result as the sum of the values in `partial_p[]`. Be careful to handle correctly the case where the length \(n\) of the arrays is not a multiple of \(P\).

The solution above is instructive but very tedious. In fact, unless there are specific reasons to do otherwise, you should use the `omp parallel for` directive with the `reduction()` clause, and let the compiler take care of everything.

To compile:

`gcc -fopenmp -std=c99 -Wall -Wpedantic omp-dot.c -o omp-dot`

To execute:

`./omp-dot [n]`

For example, if you want to use two OpenMP threads:

`OMP_NUM_THREADS=2 ./omp-dot 1000000`