DS 295: Parallel Programming

Parallel Programming

  • Instructor: Sathish Vadhiyar
  • Course number: DS295
  • Credits: 3:1
  • Semester: Jan, 2018
  • Lecture: Tue/Th 1130AM-1PM (First class: Jan 4, 1130AM)
  • Room: CDS 202
  • Office Hours: Fridays, 5-6 PM


The objective of this course is to give you some level of confidence in parallel programming techniques, algorithms and tools. At the end of the course, you would (we hope) be in a position to apply parallelization to your project areas and beyond, and to explore new avenues of research in the area of parallel programming.

The course covers parallel programming tools, constructs, models, algorithms, parallel matrix computations, parallel programming optimizations, scientific applications and parallel system software.


DS 221: Introduction to Scalable Systems

Grading Scheme

  • Sessionals
    • Two mid-Term exams (Feb 15, Mar 15) – 30
    • Assignments ( 2. Due: February 4, March 2) – 20
  • Terminal
    • Assignment (1. Due: March 30) – 10
    • Final project (Proposal: March 8, Final presentation: April 18, Final report: April 19) – 20
    • Final Exam (April 26) – 20


  • Introduction to Parallel Computing. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Publisher: Addison Wesley. ISBN: 0-201-64865-2. 2003.
  • Parallel Computing. Theory and Practice. Michael J. Quinn. Publisher: Tata: McGraw-Hill. ISBN: 0-07-049546-7. 2002.
  • Various publication materials and references that will be posted along with the lecture slides.

Lecture Slides

Topic Reading Materials
  • Parallel programming tools/models/languages
    • MPI communicator groups, process topologies pdf
  • CUDA Optimizations pdf, Advanced CUDA Slides, CUDA occupancy calculator xls
  • Parallel Algorithms
    • MPI collective communication algorithms pdf ppt
    • Divide and conquer algorithms pdf
      • Solving Tridiagonal systems
      • Sample sort
  • Collective Communications: Lecture slides, and Google for paper “Optimization of Collective Communication Operations in MPICH” by Thakur, Rabenseifner and Gropp, IJHPCA 2005
  • Mapping to network topologies. Section 5.1 in book “Parallel Computing: Theory and Practice” by Michael J Quinn
  • Tridiagonal systems – lecture slides
  • Sorting
    • Paper: On the versatility of parallel sorting by regular sampling. Li et al. Parallel Computing. 1993. (Pages 1-6)
    • Paper: Parallel Sorting by regular sampling. Shi and Schaeffer. JPDC 1992. (Pages 2-4)



Final Project

The final project has to clearly demonstrate the uniqueness of your work over existing work and show adequate performance improvements. You can work in a team of max 2 members. It can be in

  • parallelizing well known algorithms or problems. Examples:
    • hybrid executions on both CPU and GPU cores.
    • graph algorithms – spanning trees, shortest paths, satisfiability, mesh generation or refinement (e.g., Delaunay)
    • sorting, searching
    • clustering algorithms
    • parallelization of fundamental algorithms you would encounter in an algorithm book. e.g.,
      1. Introduction to Algorithms. Third Edition. Cormen, Leiserson, Rivest, Stein
      2. The Design and Analysis of Algorithms. Aho, Hopcroft, Ulman
      3. Data Structures and Algorithms. Aho, Hopcroft, Ulman
  • parallelizing numerical methods, Examples:
    • Dense martrix or sparse matrix computations. e.g., Cholesky, QR, Inverse, SVD, conjugate gradient etc.
    • >Transforms (FFT, wavelet etc.)
    • Any numerical method you would encounter in matrix/numerical methods book. e.g.,
      1. Introduction to Numerical Analysis. Second Edition. Hildebrand
      2. Elementary Numerical Analysis. Third Edition. Conte, de Boor
  • System software. Examples:
    • Techniques for scheduling or mapping parallel tasks to processors to achieve least makespan or throughput
    • Load balancing techniques for irregular computations
    • Automatic techniques for splitting tasks among GPU and CPU cores.
    • Automatic data management techniques for GPU cores.
    • Proposing genetic programming abstractions
    • Fault tolerance

Sample Projects from Previous Years

  • Finding maximum clique in hybrid system – pdf
  • Parallel hybrid implementation of SVM – pdf
  • GPU TSP – pdf
  • GPU implementation of implicit Runge-Kutta methods – pdf
  • Parallel implementation of shared nearest neighbor clustering algorithm – pdf
  • Hybrid implementation of alternate least square algorithm – pdf
  • Approximate k-nearest neighbor search – pdf
  • Incremental Delaunay triangulation – pdf
  • k-maximum subarray problem – pdf
  • Scale Invariant Feature Transform (SIFT) – pdf
  • AKS algorithm for primality proving – pdf
  • Betweenness centrality – pdf

Ethics and Rules


  1. Please do not even exchange ideas with your friends since there is a thin line between exchanging ideas and codes looking the same.
  2. Please do not look up web/books for solutions.
  3. See Dr. Yogesh’ nice writeup on plagiarism policies in his HPC page


All assignments will be evaluated for a maximum of 10. There will be a penalty of -1 for every additional day taken for submission after the assignment due date.

Thus, you will have to be judicious regarding deciding when to submit your assignments.


Suppose you have completed 1/2 of the assignment by the due date.

Scenario 1:

You think that it will take another 1 day to finish 3/4 of the assignment. In this scenario, if you submit by the due date, you will get a maximum score of 5 and if you submit a day after, you will get a maximum score of 6.5 (=7.5-1, -1 for the extra day). Thus, you will get better score if you take an extra day, finish 3/4 of the assignment and then submit.

Scenario 2:

You think that it will take another 3 days to finish 3/4 of the assignment.  In this scenario, if you submit by the due date, you will get a maximum score of 5 and if you submit 3 days after, you will get a maximum score of 4.5 (=7.5-3, -3 for the three extra days). Thus, you will get better score if you submit your assignment that is 1/2 complete by the due date than submit the assignment that will be 3/4 complete after 3 days.