SE292: High Performance Computing (Aug, 2014)

============
Assignment 4
============

Due date: By midnight of Dec 1, 2014

1) Distributed Memory Parallel Programming using MPI (70 points)

Multiply two matrices A and B to get a new matrix C using distributed-memory message passing with MPI. Use simple MPI sends and receives, and MPI calls to determine the rank of each process and the total number of processes. Use a 2D block distribution to decompose both matrices A and B when calculating the values of C (e.g., see Section 3.2.2 of Introduction to Parallel Computing, Grama et al., 2nd Ed., 2003).

Generate the double-precision values of A and B randomly within the processes. Use square matrices of size 1000 to 5000, in steps of 1000.

Perform your experiments on 4 (2x2 process grid), 8 (2x4 process grid), 12 (3x4 process grid) and 16 (4x4 process grid) processes, running on both 1 and 2 compute nodes (e.g., for 4 processes, run 4 processes on 1 node and then run 2 each on 2 nodes). Show graphs of execution times and speedups for different (1) matrix sizes, (2) process grids, and (3) single/dual nodes, compared to sequential execution using a single process. Run your experiments on the compute nodes of the dell-cluster.

Submit a single C program that takes the matrix size and block size as inputs, a script file to launch the program on one or two nodes, and a PDF report with your plots and analysis. Send these files by email to simmhan@serc.iisc.in.

2) Shared Memory Parallel Programming using OpenMP (20 points)

The sequential code to calculate the value of PI is given below. Change this to a shared memory parallel program using OpenMP. Run the program on an 8-core machine using 1, 2, 4, 6 and 8 parallel threads. Find the time taken and the speedup for the different numbers of threads, compared to the single-threaded program below. Submit the code and analysis in a single PDF file.

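#include <stdio.h>

static long num_steps = 100000;
double step;

int main(void)
{
    int i;
    double x, pi, sum = 0.0;

    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;              /* midpoint of the i-th interval */
        sum = sum + 4.0 / (1.0 + x * x);   /* integrand 4/(1+x^2) */
    }
    pi = step * sum;
    printf("pi = %.15f\n", pi);
    return 0;
}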
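
One possible way to approach the OpenMP conversion is sketched below. It is only an illustration, not a reference solution: the helper names compute_pi and wall_time are hypothetical (they do not appear in the handout), and the sketch assumes a compiler with OpenMP support (for example, gcc with -fopenmp). The loop is split across threads with a parallel for directive, and a reduction clause combines the per-thread partial sums, which avoids a data race on sum.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>   /* present only when OpenMP is enabled at compile time */
#endif

/* Hypothetical helper (not from the handout): midpoint-rule estimate of pi
   using up to num_threads OpenMP threads. */
double compute_pi(long num_steps, int num_threads)
{
    double step = 1.0 / (double) num_steps;
    double sum = 0.0;
    long i;

#ifdef _OPENMP
    omp_set_num_threads(num_threads);
#else
    (void) num_threads;   /* built without OpenMP: the loop runs sequentially */
#endif

    /* Each thread accumulates a private partial sum; the reduction clause
       adds the partial sums into sum when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;   /* declared inside the loop, so private */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Hypothetical helper: wall-clock time in seconds when OpenMP is enabled,
   CPU time of the process otherwise. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double) clock() / CLOCKS_PER_SEC;
#endif
}
```

To collect the measurements, one could call wall_time() before and after compute_pi(num_steps, p) for p = 1, 2, 4, 6 and 8, and report the speedup as the single-thread time divided by the p-thread time. With num_steps = 100000 the loop is quite short, so thread startup overhead may dominate the timings; repeating each run several times and averaging is probably worthwhile.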