The file mpi-pi.c contains a serial program for computing the approximate value of \(\pi\) using a Monte Carlo algorithm. Monte Carlo algorithms use pseudorandom numbers to compute an approximation of some quantity of interest.

The idea is quite simple (see Figure 1). We generate \(N\) random points uniformly distributed inside the square with corners at \((-1, -1)\) and \((1, 1)\). Let \(x\) be the number of points that lie inside the circle inscribed in the square; then, the ratio \(x / N\) is an approximation of the ratio between the area of the circle and the area of the square. Since the area of the circle is \(\pi\) and the area of the square is \(4\), we have \(x/N \approx \pi / 4\) which yelds \(\pi \approx 4x / N\). This estimate becomes more accurate as the number of points \(N\) increases.

Modify the serial program to parallelize the computation. Several parallelization strategies are possible, but for now you are advised to implement the following one (\(P\) is the number of MPI processes):

Each process gets the value of the number of points \(N\) from the command line; you may initially assume that \(N\) is a multiple of \(P\), and then relax this requirement to make the program with any value of \(N\).

Each process \(p\), including the master, generates \(N/P\) random points and keeps track of the number \(x_p\) of points inside the circle;

Each process \(p > 0\) sends its local value \(x_p\) to the master using point-to-point send/receive operations.

The master receives \(x_p\) from all each process \(p = 1, \ldots, P-1\) (the master already knows \(x_0\)), computes their sum \(x\) and the prints the approximate value of \(\pi\) as \((4x / N)\).

Step 3 above should be performed using send/receive operations. However, note that this is not efficient and should be done using the MPI reduction operation that will be introduced in the next lecture.