HPC - Implementing the “schedule(dynamic)” clause by hand

Moreno Marzolla

Last updated: 2022-10-18

We know that the schedule(dynamic) OpenMP clause dynamically assigns iterations of a “for” loop to the first available OpenMP thread. The purpose of this exercise is to simulate the schedule(dynamic) clause using only the omp parallel construct (not omp parallel for).

File omp-dynamic.c contains a serial program that initializes an array vin[] with \(n\) random integers (the value of \(n\) can be passed from the command line). The program creates a second array vout[] of the same length, where vout[i] = Fib(vin[i]) for each \(i\). Fib(k) is the k-th Fibonacci number: Fib(0) = Fib(1) = 1; Fib(k) = Fib(k-1) + Fib(k-2) for \(k \geq 2\). The computation of Fibonacci numbers is deliberately inefficient to ensure that there are huge variations of the running time depending on the argument.

First, try to parallelize the “for” loop indicated by the [TODO] comment using the omp parallel for construct. Keeping the array length constant, measure the execution times in the following cases:

  1. Using the #pragma omp parallel for directive with the default schedule, which for GCC is the static schedule with chunk size approximately equal to \(n / \textit{thread\_pool\_size}\);

  2. Using the #pragma omp parallel for schedule(static, chunk_size) directive with a smaller chunk size, e.g., 64 or less;

  3. Using the #pragma omp parallel for schedule(dynamic) directive; recall that in this case the default chunk size is 1.
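The three variants above might be sketched as follows (an illustration only; the helper fill() and the fib() signature are assumptions, not taken from omp-dynamic.c — in practice you would enable one variant at a time on the loop marked [TODO]):

```c
#include <omp.h>

int fib(int k) { return (k < 2) ? 1 : fib(k-1) + fib(k-2); }

void fill(const int *vin, int *vout, int n)
{
    int i;

    /* Variant 1: default schedule (static, large chunks on GCC) */
#pragma omp parallel for
    for (i = 0; i < n; i++)
        vout[i] = fib(vin[i]);

    /* Variant 2: static schedule with a small chunk size */
#pragma omp parallel for schedule(static, 64)
    for (i = 0; i < n; i++)
        vout[i] = fib(vin[i]);

    /* Variant 3: dynamic schedule; default chunk size is 1 */
#pragma omp parallel for schedule(dynamic)
    for (i = 0; i < n; i++)
        vout[i] = fib(vin[i]);
}
```

With the chunk size left at the default in variant 1, a thread that happens to receive a chunk full of large inputs becomes a bottleneck; the dynamic schedule avoids this by handing out one iteration at a time.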

Then, try to simulate the behavior of step 3 (dynamic schedule with chunk size 1) using the omp parallel construct only. Proceed as follows: inside the parallel region, use a shared variable idx to hold the index of the next element of vin[] that must be processed. Each thread atomically acquires the current value of idx and then increments the shared variable; this must be done inside a critical region. However, the actual computation of Fib(vin[idx]) must be performed outside the critical region, so that the threads can actually compute in parallel.
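One possible shape of the hand-rolled dynamic schedule is sketched below (an assumption about the intended structure, not the reference solution; the helper fill_dynamic() and fib() are hypothetical names):

```c
#include <omp.h>

int fib(int k) { return (k < 2) ? 1 : fib(k-1) + fib(k-2); }

void fill_dynamic(const int *vin, int *vout, int n)
{
    int idx = 0; /* shared: index of the next element to process */
#pragma omp parallel default(none) shared(idx, vin, vout, n)
    {
        int my_idx;
        do {
            /* Atomically grab the next index and advance it */
#pragma omp critical
            {
                my_idx = idx;
                idx++;
            }
            /* The expensive computation happens OUTSIDE the
               critical region, so threads truly run in parallel */
            if (my_idx < n)
                vout[my_idx] = fib(vin[my_idx]);
        } while (my_idx < n);
    }
}
```

Note that each thread keeps looping until the index it acquires is out of range; idx may end up slightly larger than n (at most by the number of threads), which is harmless since out-of-range indices are never used to access the arrays.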

To compile:

    gcc -std=c99 -Wall -Wpedantic -fopenmp omp-dynamic.c -o omp-dynamic

To execute:

    ./omp-dynamic [n]

Example with two OpenMP threads:

    OMP_NUM_THREADS=2 ./omp-dynamic