Parallel Programming Practice

Parallel Programming (3:1)

The objective of this course is to give you some level of confidence in parallel programming techniques, algorithms and tools. At the end of the course, you would (we hope) be in a position to deal with real-life parallel problems and to explore new avenues of research in the area of parallel programming.

Class

Santanu T
Anwit Roy
Abhishek Kolipaka
Sharath Chandra
Gowthami Gottipatti
Rooparam Chaudhury
M Aravind Krishnan
Sreekanth Raja
Priti Parate
Rajendra Kumar
Vaibhav Goel
P Vinay Kumar
Chandrakant Raj

Instructor

Sathish Vadhiyar

Reading Materials

Introduction to Parallel Computing
by Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar Publisher: Addison Wesley; (2003) ISBN: 0-201-64865-2
Various publication materials, lecture slides

Additional Reading

Petascale Computing: Algorithms and Applications
David A. Bader (Ed.), Chapman & Hall/CRC Computational Science Series. 2007.
The Sourcebook of Parallel Computing
by Jack Dongarra (Editor), Ian Foster, Geoffrey Fox (Editor), Ken Kennedy, Andy White (Editor), Linda Torczon, Wiliiam Gropp Publisher: Morgan Kaufmann; (November 2002) ISBN: 1-558-60871-0
Numerical Linear Algebra for High Performance Computers
by Jack Dongarra, Iain Duff, Danny Sorensen, Henk van der Vorst Publisher: SIAM; (1998) ISBN: 0-89871-428-1

Meeting Hours

11:30-1:00; Tuesday, Thursday; Room 202, SERC

Grading System

Sessionals
- 2 Mid-Term Exams (approximately February end, March end) - 20
- Assignments (4/5) - 30
Terminal
- Final Exam (During the week of final exam) - 50

Syllabus

Introduction ppt
Parallel Programming Tools
1. MPI - 1 ppt
  1. Collective communication implementations ppt
  2. MPI process groups and topologies
2. MPI-2
  1. Overview ppt
  2. I/O Optimizations ppt
3. GPU Programming
  1. Quick review ppt
  2. Obtaining High Performance on GPUs ppt
Parallel Algorithms and Applications

Parallel Linear algebra

Dense LA - Gaussian elimination (different distributions), tridiagonal systems (divide and conquer algorithm)
Sparse LA and relate graph algorithms - basics and sequential solutions; parallel steps for ordering, symbolic and numerical factorization, triangular solve; parallel graph partitioning; iterative methods; parallel graph coloring

FFT ppt
Molecular Dynamics ppt
Population evolution or Game of Life ppt
Cosmology ppt
Mesh applications (e.g. finite element computations) ppt
Parallel Bioinformatics ppt

Parallel System Software
1. Scheduling in Parallel Systems ppt
2. Checkpointing and Fault tolerance for large systems ppt

Reading Material

Introduction: Look at the concurrency and parallelization slides of HPC 2010 course for details on programming models, metrics, architecture classification, cache coherence, interconnection networks, synchronization, steps in parallelization (mapping, orchestration etc.) and the corresponding reading material suggested in the HPC 2010 web site.
MPI-1: Online tutorial "MPI Complete Reference". Google for it.
Collective Communications: Lecture slides, and Google for paper "Optimization of Collective Communication Operations in MPICH" by Thakur, Rabenseifner and Gropp, IJHPCA 2005
MPI-2: Online tutorial: http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm
I/O Optimizations: Google for paper Rajeev Thakur, William Gropp, and Ewing Lusk, “A Case for Using MPI's Derived Datatypes to. Improve I/O Performance,” in Proceedings of SC98
GPU and CUDA:
- Lecture slides and CUDA SDK code samples – NVIDIA - http://www.nvidia.com/object/cuda_get_samples.html
- CUDA Optimizations - Chapter 5 of CUDA programming guide and slides 30-34 of advanced CUDA
Dense Linear Algebra
- Lecture Slides
Sparse Linear Algebra
- Sources [You can get these papers from me and take photocopies]
  - Parallel Algorithms for sparse linear systems - Heath, Ng and Peyton
  - Reordering sparse matrices for parallel elimination - Liu
  - A parallel graph partitioning algorithm for a message-passing multiprocessor - Gilbert and Zmijewski
  - Task scheduling for parallel sparse Cholesky factorization - Geist and Ng
  - Parallel algorithms for forward and back substitution in direct solution of sparse linear systems - Anshul Gupta and Vipin Kumar
  - Matrix computations - Golub and van Loan
  - Lecture slides
- General steps of sparse matrix factorization: Heath, Ng and Peyton - pages 420-429
- Parallel ordering
  - Heath, Ng and Peyton - pages 429-435 till Kernighan Lin
  - Liu - pages 75 and 89 (you can read other pages on reduction of elimination tree heights if interested),
  - for Kernighan Lin ordering - Gilbert and Zmijewski - pages 427-433, 437-440, your lecture slides
  - for mapping: Heath, Ng and Peyton - 437-439, figures 9 and 10; Geist and Ng - sections 3 and 4
  - for numerical factorization: Heath, Ng and Peyton - 442-450
FFT - FFT Chapter in "Introduction to Parallel Computing" book
Molecular Dynamics - lecture slides, the paper ``A New Parallel Method for Molecular Dynamics Simulation of Macromolecular Systems'' by Plimpton and Hendrickson
Game of Life - Lecture slides
Cosmology
- Lecture slides
- The paper "Scalable parallel formulations of the Barnes-Hut method for n-body simulations" by Grama, Kumar and Sameh. In Parallel Computing 24(1998) 797-822.
- The paper "Load balancing and data locality in adaptive hierarchical N-body methods: Barnes-Hut, Fast Multipole, and Dardiosity" by Singh, Holt, Totsuka, Gupta and Hennessey. In Journal of Parallel and Distributed Computing, 1994
Mesh applications
- Lecture slides
- Paper: "Multilevel diffusion schemes for repartitioning of adaptive meshes" by Schloegel et al.
- Paper: "A unified algorithm for load-balancing adaptive scientific simulations" by Schloegel et al.
- Paper: "Dynamic repartitioning of adaptively refined meshes" by Schloegel et al.
Bioinformatics
- Paper: "Parallel biological sequence comparison using prefix computations" by Aluru et al.
Scheduling
- Paper: "Backfilling with lookahead to optimize the packing of parallel jobs" by Shmueli and Feitelson. JPDC 2005.
- Paper: "A comparison study of eleven static mapping heuristics for a class of meta-tasks on heterogeneous computing systems" by Tracy Braun et al., HCW 1999.
Checkpointing:
- Paper: "An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance" by James Plank

Mid-Term Exams

Assignments

Platform for Assignments
Important Notes

Collective communications and 2D Jacobi
CUDA programming and optimizations
Parallel Subcolumn Cholesky
Load Balancing N-body Simulations
Parallel Job Scheduling

Links

Designing and Building Parallel Programs. Ian Foster