Parallel Programming (3:1)
The objective of this course is to give you some level of
confidence in parallel programming techniques, algorithms and tools. At the end of the course, you would (we
hope) be in a position to deal with real-life parallel problems and to
explore new avenues of research in the area of parallel programming.
Class

Instructor
Sathish Vadhiyar
Reading Materials
- Introduction
to Parallel Computing
by
Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar Publisher:
Addison Wesley; (2003) ISBN: 0-201-64865-2
- Various publication materials, lecture slides
Additional Reading
- Petascale
Computing: Algorithms and Applications
David A. Bader (Ed.), Chapman & Hall/CRC Computational Science Series. 2007.
- The
Sourcebook of Parallel Computing
by Jack Dongarra (Editor), Ian
Foster, Geoffrey Fox (Editor), Ken Kennedy, Andy White (Editor), Linda
Torczon, Wiliiam Gropp
Publisher: Morgan Kaufmann; (November 2002) ISBN:
1-558-60871-0
-
Numerical
Linear Algebra for High Performance Computers
by
Jack Dongarra, Iain Duff, Danny Sorensen, Henk van der Vorst Publisher:
SIAM; (1998) ISBN: 0-89871-428-1
Meeting Hours
11:30-13:00; Tuesday, Thursday; Room 202, SERC
Grading System
Syllabus
- Prerequisites -
Introduction,
MPI
- Parallel Programming
tools/languages/models - 1
- MPI -
collective communication
implementations, communicator groups, process topologies
-
OpenMP
- Applications - Basic
-
Dense and Sparse Linear
Algebra
-
FFT
-
Molecular Dynamics
-
Population evolution or
Game of Life
- Parallel Programming
tools/languages/models - 2
-
MPI-2
-
Parallel I/O optimizations
- GPU programming -
CUDA review,
CUDA optimizations,
advanced CUDA
- OpenCL - class ppt,
lecture 5
from open university kit
- Charm++ -
ppt
- Applications - Advanced
- Cosmology/n-body
simulations ppt
- Mesh applications
ppt
- List algorithms with GPUs
ppt
- Parallel Programming
tools/languages/models - 3
- Software Transactional memory,
Galois (Guest Lecture by Dr. Uday Reddy, CSA)
- Parallel System
Software
- Scheduling in Parallel
Systems ppt
- Checkpointing and Fault
tolerance for large systems ppt
Reading Material
-
Introduction: Grama et al. - 2.4, 3.1, 3.5, 5.1, 5.6
-
MPI-1: Online tutorial "MPI Complete Reference". Google for
it.
-
Collective Communications: Lecture slides, and Google for
paper "Optimization of Collective Communication Operations in MPICH" by Thakur, Rabenseifner and Gropp,
IJHPCA 2005
-
OpenMP - Lecture slides, and
OpenMP tutorial:
http://www.llnl.gov/computing/tutorials/openMP
-
Dense LA - lecture slides
-
Sparse LA
- Sources [You can get these papers from me and take photocopies]
- Parallel Algorithms for sparse linear systems - Heath, Ng and
Peyton
- Reordering sparse matrices for parallel elimination - Liu
- A parallel graph partitioning algorithm for a message-passing
multiprocessor - Gilbert and Zmijewski
- Task scheduling for parallel sparse Cholesky factorization -
Geist and Ng
- Parallel algorithms for forward and back substitution in direct
solution of sparse linear systems - Anshul Gupta and Vipin Kumar
- Matrix computations - Golub and van Loan
- Lecture slides
- General steps of sparse matrix factorization: Heath, Ng and Peyton -
pages 420-429
- Parallel ordering
- Heath, Ng and Peyton - pages 429-435 till Kernighan Lin
- Liu - pages 75 and 89 (you can read other pages on reduction of
elimination tree heights if interested),
- for Kernighan Lin ordering - Gilbert and Zmijewski - pages
427-433, 437-440, your lecture slides
- for mapping: Heath, Ng and Peyton
- 437-439, figures 9 and 10; Geist and Ng - sections 3 and 4
- for numerical factorization: Heath, Ng and Peyton - 442-450
- FFT - FFT Chapter in "Introduction to Parallel Computing" book
- Molecular Dynamics - lecture slides, the paper ``A New Parallel Method
for Molecular Dynamics Simulation of Macromolecular Systems'' by Plimpton
and Hendrickson
- Game of Life - Lecture slides
-
MPI-2: Online tutorial:
http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm
-
Parallel
I/O Optimizations: Google for paper Rajeev Thakur, William Gropp, and Ewing
Lusk, “A Case for Using MPI's Derived Datatypes to. Improve I/O
Performance,” in Proceedings of SC98
-
GPU and CUDA:
- OpenCL
- Charm++
- Cosmology
- Lecture slides
- The paper "Scalable parallel formulations of the Barnes-Hut method
for n-body simulations" by Grama, Kumar and Sameh. In Parallel Computing
24(1998) 797-822.
- The paper "Load balancing and data locality in adaptive hierarchical
N-body methods: Barnes-Hut, Fast Multipole, and Dardiosity" by Singh,
Holt, Totsuka, Gupta and Hennessey. In Journal of Parallel and
Distributed Computing, 1994
- Mesh applications
- Lecture slides
- Paper: "Multilevel diffusion schemes for repartitioning of adaptive
meshes" by Schloegel et al.
- Paper: "A unified algorithm for load-balancing adaptive scientific
simulations" by Schloegel et al.
- Paper: "Dynamic repartitioning of adaptively refined meshes" by
Schloegel et al.
- Paper: "Dynamic Octree Load Balancing Using Space-filling curves" by
Campbell et al. - Section 2.5
- Paper: "Irregularity in Multi-dimensional space-filling curves with
applications in multimedia databases" by Mokbel and Aref - Section 4
- List Ranking
- Lecture slides
-
Paper: Optimization of Linked List Prefix Computations
on Multithreaded GPUs Using CUDA
- Software Transactional Memory and Galois
- STM
- Compiler and runtime support for efficient software
transactional memory, Ali-Reza Adl-Tabatabai, Brian T. Lewis, Vijay
Menon, Brian R. Murphy, Bratin Saha, Tatiana Shpeisman, In PLDI
2006.
- McRT-STM: A high performance software transactional memory
system for a multi-core runtime. B. Saha, A.-R. Adl-Tabatabai, R.
Hudson, C. C. Minh, and B. Hertzberg. In PPoPP 2006: Principles and
Practice of Parallel Programming.
- Galois
- How much parallelism is there in irregular applications? [Vinayak]
Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali,
and Calin Casçaval In Proc. Symp. on Principles and practice of
parallel programming (PPoPP), pages 3-14, New York, NY, USA, 2009.\
- The Tao of Parallelism in Algorithms [Suraj] Keshav Pingali,
Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan,
Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich,
Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. In ACM SIGPLAN
PLDI 2011
- Scheduling
- Paper: "Backfilling with lookahead to optimize the packing of
parallel jobs" by Shmueli and Feitelson. JPDC 2005.
- Paper: "A comparison study of eleven static mapping heuristics for a
class of meta-tasks on heterogeneous computing systems" by Tracy Braun
et al., HCW 1999.
- Checkpointing:
- Paper: "An overview of checkpointing in uniprocessor and distributed
systems, focusing on implementation and performance" by James Plank
- And other references given in the references slides of the lecture
slides
Mid-Term Exams
Assignments
-
Assignment 1 - Collective
Communications and Open MP
-
Assignment 2 - Matrix
Multiplication
-
Assignment 3 - Parallel Ordering
-
Assignment 4 - MPI I/O and CUDA
Optimizations
-
Assignment 5 - Load balancing using
space filling curves (SFCs)
-
Platform for Assignments
-
Important
Notes