The file omp-pi.c contains a serial program for computing the approximate value of \(\pi\) using a Monte Carlo algorithm. Monte Carlo algorithms use pseudorandom numbers to compute an approximation of some quantity of interest.

The idea is quite simple (see Figure 1). We generate \(N\) random points uniformly distributed inside the square with corners at \((-1, -1)\) and \((1, 1)\). Let \(x\) be the number of points that lie inside the circle inscribed in the square; then, the ratio \(x / N\) is an approximation of the ratio between the area of the circle and the area of the square. Since the area of the circle is \(\pi\) and the area of the square is \(4\), we have \(x/N \approx \pi / 4\) which yelds \(\pi \approx 4x / N\). This estimate becomes more accurate as the number of points \(N\) increases.

Modify the serial program to make use of shared-memory parallelism with OpenMP. Start with a version that uses the omp parallel construct. Let \(P\) be the number of OpenMP threads; then, the program operates as follows:

The user specifies the number \(N\) of points to generate as a command-line parameter, and the number \(P\) of OpenMP threads using the OMP_NUM_THREADS environment variable.

Thread \(p\) generates \(N/P\) points using the provided function generate_points(), and stores the result in inside[p] where inside[] is an integer array of length \(P\). The array must be declared outside the parallel region since it must be shared across all OpenMP threads.

At the end of the parallel region, the master (thread 0) computes the sum of the values in the inside[] array, and from that value the approximation of \(\pi\).

You may initially assume that the number of points \(N\) is an integer multiple of \(P\); when you get a working program, relax this assumption to make the computation correct for any value of \(N\).