HPC - Monte Carlo approximation of \(\pi\)

Moreno Marzolla

Last updated: 2023-10-19

The file omp-pi.c implements a serial Monte Carlo algorithm for computing the approximate value of \(\pi\). Monte Carlo algorithms use pseudo-random numbers to evaluate some function of interest.

Figure 1: Monte Carlo computation of the value of \(\pi\)

To estimate \(\pi\) we generate \(N\) random points uniformly distributed inside the square with corners at \((-1, -1)\) and \((1, 1)\) (Figure 1). Let \(x\) be the number of points that fall inside the circle inscribed in the square; then, the ratio \(x / N\) is an approximation of the ratio between the area of the circle and the area of the square. Since the area of the circle is \(\pi\) and the area of the square is \(4\), we have \(x/N \approx \pi / 4\), from which \(\pi \approx 4x / N\). This estimate becomes more accurate as we generate more points.

The goal of this exercise is to modify the serial program to make use of shared-memory parallelism using OpenMP.

The hard (and inefficient) way

Start with a version that uses the omp parallel construct. Let \(P\) be the number of OpenMP threads; then, the program operates as follows:

  1. The user specifies the number \(N\) of points to generate as a command-line parameter, and the number \(P\) of OpenMP threads using the OMP_NUM_THREADS environment variable.

  2. Thread \(p\) generates \(N/P\) points using function generate_points() (already provided), and stores the result in inside[p]. inside[] is an integer array of length \(P\) that must be declared outside of the parallel region, since it must be shared across all OpenMP threads.

  3. At the end of the parallel region, the master (thread 0) computes \(x\) as the sum of the content of inside[]; from this the estimate of \(\pi\) can be computed as above.

You may initially assume that the number of points \(N\) is an integer multiple of \(P\); when you get a working program, relax this assumption to make the computation correct for any value of \(N\).

The better way

A better approach is to let the compiler parallelize the “for” loop in generate_points() using omp parallel and omp for. There is a small issue: the rand() function is not reentrant, and therefore cannot be used concurrently by multiple threads. We use rand_r() instead, which is reentrant but requires each thread to keep a local seed and pass it to the function. To achieve this we split the omp parallel and omp for directives, so that a different local seed can be given to each thread:

#pragma omp parallel default(none) shared(n, n_inside)
{
        const int my_id = omp_get_thread_num();
        unsigned int my_seed = 17 + 19*my_id; /* per-thread seed for rand_r() */
#pragma omp for reduction(+:n_inside)
        for (int i=0; i<n; i++) {
                /* generate a random point in the square [-1,1] x [-1,1] */
                const double x = (2.0 * rand_r(&my_seed) / RAND_MAX) - 1.0;
                const double y = (2.0 * rand_r(&my_seed) / RAND_MAX) - 1.0;
                if ( x*x + y*y <= 1.0 ) {
                        n_inside++;
                }
        }
}

Compile with:

    gcc -std=c99 -fopenmp -Wall -Wpedantic omp-pi.c -o omp-pi -lm

Run with:

    ./omp-pi [N]

For example, to compute the approximate value of \(\pi\) using \(P=4\) OpenMP threads and \(N=20000\) points:

    OMP_NUM_THREADS=4 ./omp-pi 20000
