/****************************************************************************
*
* omp-dot.c - Dot product
*
* Copyright (C) 2018--2023 by Moreno Marzolla
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
****************************************************************************/
/***
% HPC - Dot product
% Moreno Marzolla
% Last updated: 2023-10-15
The file [omp-dot.c](omp-dot.c) contains a serial program that
computes the dot product of two arrays `v1[]` and `v2[]`. The program
receives the array length $n$ as its only command-line parameter. The
arrays are initialized deterministically, so that their dot product is
known without computing it explicitly. The dot product of `v1[]` and
`v2[]` is defined as:
$$
\sum_{i = 0}^{n-1} v1[i] \times v2[i]
$$
The goal of this exercise is to parallelize the serial program using
the `omp parallel` construct with the appropriate clauses. It is
instructive to begin without the `omp parallel for` directive,
computing the endpoints of each thread's iterations explicitly. To
this end, let $P$ be the size of the OpenMP thread pool; partition the
arrays into $P$ blocks of approximately uniform size. Thread $p$ ($0
\leq p < P$) computes the dot product `my_p` of the subvectors with
indices $\texttt{my\_start}, \ldots, \texttt{my\_end}-1$:
$$
\texttt{my\_p} := \sum_{i=\texttt{my\_start}}^{\texttt{my\_end}-1} v1[i] \times v2[i]
$$
There are several ways to accumulate the partial results. One
possibility is to store the value computed by thread $p$ in
`partial_p[p]`, where `partial_p[]` is an array of length $P$. In this
way each thread writes to a different element of `partial_p[]`, so no
race conditions are possible. After the parallel region completes, the
master thread computes the final result by summing the contents of
`partial_p[]`. Be sure to correctly handle the case where $n$ is not
an integer multiple of $P$.
The solution above is instructive but tedious and inefficient. Unless
there are specific reasons to do otherwise, in practice you should use
the `omp parallel for` directive with the `reduction()` clause, and
let the compiler take care of everything.
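A minimal sketch of this idiomatic version (the name `dot_reduction` is illustrative):

```c
#include <stddef.h>

/* Idiomatic version: `omp parallel for` with a sum reduction. The
   compiler partitions the iterations and combines the per-thread
   partial sums into `result`. */
int dot_reduction(const int *v1, const int *v2, size_t n)
{
    int result = 0;
    size_t i;
#pragma omp parallel for reduction(+:result)
    for (i = 0; i < n; i++) {
        result += v1[i] * v2[i];
    }
    return result;
}
```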
To compile:

        gcc -fopenmp -std=c99 -Wall -Wpedantic omp-dot.c -o omp-dot

To execute:

        ./omp-dot [n]

For example, if you want to use two OpenMP threads:

        OMP_NUM_THREADS=2 ./omp-dot 1000000
## File
- [omp-dot.c](omp-dot.c)
***/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <assert.h>
void fill( int *v1, int *v2, size_t n )
{
    const int seq1[3] = { 3, 7, 18};
    const int seq2[3] = {12, 0, -2};
    size_t i;
    for (i=0; i<n; i++) {
        v1[i] = seq1[i%3];
        v2[i] = seq2[i%3];
    }
}

int dot( const int *v1, const int *v2, size_t n )
{
    int result = 0;
    size_t i;
    for (i=0; i<n; i++) {
        result += v1[i] * v2[i];
    }
    return result;
}

int main( int argc, char *argv[] )
{
    size_t n = 10*1024*1024l; /* default array length */
    const size_t n_max = 512*1024*1024l; /* maximum length */
    int *v1, *v2;

    if ( argc > 2 ) {
        fprintf(stderr, "Usage: %s [n]\n", argv[0]);
        return EXIT_FAILURE;
    }
    if ( argc > 1 ) {
        n = atol(argv[1]);
    }

    if ( n > n_max ) {
        fprintf(stderr, "FATAL: The array length must be at most %lu\n", (unsigned long)n_max);
        return EXIT_FAILURE;
    }

    printf("Initializing array of length %lu\n", (unsigned long)n);
    v1 = (int*)malloc( n*sizeof(v1[0])); assert(v1 != NULL);
    v2 = (int*)malloc( n*sizeof(v2[0])); assert(v2 != NULL);
    fill(v1, v2, n);

    const int expect = (n % 3 == 0 ? 0 : 36);

    const double tstart = omp_get_wtime();
    const int result = dot(v1, v2, n);
    const double elapsed = omp_get_wtime() - tstart;

    if ( result == expect ) {
        printf("Test OK\n");
    } else {
        printf("Test FAILED: expected %d, got %d\n", expect, result);
    }
    printf("Elapsed time: %f\n", elapsed);

    free(v1);
    free(v2);
    return EXIT_SUCCESS;
}