SE292: High Performance Computing (Aug, 2014)
============
Assignment 4
============
Due date: By midnight of Dec 1, 2014

1) Distributed Memory Parallel Programming using MPI (70 points)

Multiply two matrices A and B to produce a matrix C using distributed-memory message passing with MPI. Use only simple MPI sends and receives, plus the MPI calls that determine the rank of a process and the total number of processes. Use a 2D block distribution to decompose both A and B when computing C (e.g., see Section 3.2.2 of Introduction to Parallel Computing, Grama et al., 2nd Ed., 2003). Generate the double-precision values of A and B randomly within the processes.
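The index arithmetic behind a 2D block distribution can be sketched as below. This is only an illustration: the helper name block_range and the policy of giving the first (n % parts) blocks one extra index are assumptions, not something this assignment prescribes. Apply it once along the rows and once along the columns of the process grid.

```c
#include <assert.h>

/* Hypothetical helper: split n indices into `parts` contiguous blocks and
 * report the start index and length of block `idx` (0-based).
 * Assumed policy: the first (n % parts) blocks get one extra index. */
void block_range(int n, int parts, int idx, int *start, int *count)
{
    assert(parts > 0 && idx >= 0 && idx < parts && n >= 0);
    int base  = n / parts;   /* minimum block length */
    int extra = n % parts;   /* number of blocks that get one more index */
    *count = base + (idx < extra ? 1 : 0);
    *start = idx * base + (idx < extra ? idx : extra);
}
```

For example, with a 1000 x 1000 matrix on a 3x4 process grid, a process in grid row 0 would own 334 rows starting at row 0, and a process in grid column 3 would own 250 columns starting at column 750.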

Use square matrices A and B of size 1000 to 5000 in steps of 1000. Run your experiments with 4 (2x2 process grid), 8 (2x4 process grid), 12 (3x4 process grid) and 16 (4x4 process grid) processes, on both 1 and 2 compute nodes (e.g., for 4 processes, run all 4 on 1 node, and then run 2 on each of 2 nodes).
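One way to place ranks on these process grids is sketched below. It assumes a row-major ordering of ranks, which is a choice made here for illustration; the helper names rank_to_coords and coords_to_rank are hypothetical. In an MPI program the rank would come from MPI_Comm_rank.

```c
#include <assert.h>

/* Hypothetical helper: map a rank to (row, col) in a process grid of
 * grid_rows x grid_cols processes, assuming row-major rank ordering. */
void rank_to_coords(int rank, int grid_rows, int grid_cols, int *row, int *col)
{
    assert(grid_rows > 0 && grid_cols > 0);
    assert(rank >= 0 && rank < grid_rows * grid_cols);
    *row = rank / grid_cols;
    *col = rank % grid_cols;
}

/* Inverse mapping: grid coordinates back to the rank. */
int coords_to_rank(int row, int col, int grid_cols)
{
    return row * grid_cols + col;
}
```

On the 3x4 grid, for instance, rank 7 maps to grid row 1 and grid column 3 under this ordering.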

Show graphs of execution times and speedups for different (1) matrix sizes, (2) process grids, and (3) single/dual node configurations, relative to sequential execution using a single process. Run your experiments on the compute nodes of the dell-cluster.
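The speedup values for the plots can be computed as sketched below, taking speedup as the single-process time divided by the parallel time. The function names are hypothetical, and efficiency is included only as an optional derived metric that the assignment does not ask for.

```c
#include <assert.h>

/* Speedup relative to the single-process run: T_seq / T_par. */
double speedup(double t_seq, double t_par)
{
    assert(t_par > 0.0);
    return t_seq / t_par;
}

/* Optional metric (not required here): speedup per process. */
double efficiency(double t_seq, double t_par, int nprocs)
{
    assert(nprocs > 0);
    return speedup(t_seq, t_par) / nprocs;
}
```

With made-up timings of 8.0 s for the sequential run and 2.0 s on 4 processes, the speedup would be 4.0 and the efficiency 1.0.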

Submit a single C program that takes the matrix size and block size as inputs, a script file to launch the program on one or two nodes, and a PDF report with your plots and analysis. Send these files by email to simmhan@serc.iisc.in.


2) Shared Memory Parallel Programming using OpenMP (20 points)

The sequential code to calculate the value of PI is given below. Convert it to a shared-memory parallel program using OpenMP. Run the program on an 8-core machine using 1, 2, 4, 6 and 8 parallel threads. Report the time taken and the speedup for each thread count, compared to the single-threaded program below. Submit the code and analysis in a single PDF file.

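#include <stdio.h>

static long num_steps = 100000;  /* number of integration intervals */
double step;                     /* width of each interval */

int main(void)
{
  long i; double x, pi, sum = 0.0;
  step = 1.0/(double) num_steps;
  /* Midpoint rule for the integral of 4/(1+x*x) over [0,1], which equals pi */
  for (i = 0; i < num_steps; i++) {
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0+x*x);
  }
  pi = step * sum;
  printf("pi = %.12f\n", pi);
  return 0;
}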
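One possible shape for the OpenMP version is sketched below, using a parallel loop with a sum reduction. It is a starting point under stated assumptions rather than the expected solution: the function name compute_pi is hypothetical, and other approaches (for example, per-thread partial sums combined manually) are equally valid.

```c
/* Sketch of one possible OpenMP version; compute_pi is a hypothetical name.
 * Compile with OpenMP enabled (e.g. -fopenmp with GCC); without it the
 * pragma is ignored and the loop runs sequentially. */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double) num_steps;
    double sum = 0.0;
    long i;

    /* Each thread accumulates a private partial sum; the reduction clause
     * combines them when the loop ends. x is declared inside the loop body
     * so that every thread has its own copy. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

The number of threads can be set through the OMP_NUM_THREADS environment variable before each run, and omp_get_wtime() from omp.h can be called before and after the function to time it.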