**DS 200 (AUG) 0:1 Research Methods**

*Faculty*

This course develops the soft skills required of CDS students. Each student needs to complete the following modules (each spanning 3 hours): seminar attendance, literature review, technical writing (reading, writing, reviewing), technical presentation, CV/resume preparation, grant writing, intellectual property generation (patenting), incubation/start-up opportunities, and academia/industry job search.

Compulsory for all CDS students; all modules need to be completed by every student.

**DS 201 (AUG) 2:0 Bioinformatics**

*K. Sekar and D. Pal*

Unix utilities, overview of various biological databases (Protein Data Bank, Structural Classification of Proteins, genome database and Cambridge Structural Database for small molecules), introduction to protein structures, introduction to solving macromolecular structures using various biophysical methods, protein structure analysis, visualization of biological macromolecules, data mining techniques using protein sequences and structures. Short sequence alignments, multiple sequence alignments, genome alignments, phylogenetic analysis, genome context-based methods, RNA and transcriptome analysis, mass spectrometry applications in proteome and metabolome analysis, molecular modeling, protein docking and dynamics simulation. Algorithms, scaling challenges and order of computing in big biological data.

Pre-requisites: Undergraduate level familiarity in Physics, Chemistry and Maths.

* C. Branden and J. Tooze (eds), Introduction to Protein Structure, Garland, 1991.

* Mount, D.W., Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, 2001.

* Baxevanis, A.D., and Ouellette, B.F.F. (Eds), Bioinformatics: A practical guide to the analysis of the genes and proteins, Wiley-Interscience, 1998.

**DS 202 (JAN) 2:1 Algorithmic Foundations of Big Data Biology**

*Chirag Jain*

Overview:

This course will cover computer science techniques involved in the analysis of biological big data. The focus will be on understanding the algorithmic and mathematical foundations of the methods, and how these methods get implemented in associated tools to support biological applications. Hands-on programming assignments will be offered to appreciate the complexities of real sequencing data.

Syllabus:

(0) Introduction – basics of biological data, high-throughput DNA/RNA sequencing and associated biotechnological breakthroughs, data structures and algorithms warm-up

(1) Exact string pattern matching: Z algorithm, Knuth-Morris-Pratt and Boyer-Moore

(2) Genome-scale index structures: suffix tries and suffix trees, Burrows-Wheeler Transform, FM-Index

(3) Approximate string pattern matching: Hamming distance, edit distance, dynamic programming, pairwise and multiple sequence alignment

(4) Alignment-free sequence comparison: co-linear chaining problem, whole-genome comparison

(5) Genome assembly: de Bruijn graphs, overlap graphs, haplotype assembly and phasing

(6) Pattern discovery: Hidden Markov models, gene finding

(7) Phylogenetics – algorithms for evolutionary tree reconstruction, distance-based phylogeny, neighbour-joining algorithm

(8) Trending topics – cancer genomics, deep learning in genomics, transcriptomics, single-cell omics, population genomics

Pre-requisites:

Knowledge of basic data structures, algorithms, programming experience, and DS-221 (or) DS-201 (or) E0-251 (or) E0-225 (or) consent from the Instructor

References:

· Gusfield, Dan. “Algorithms on strings, trees, and sequences: Computer science and computational biology.” ACM SIGACT News 28.4 (1997): 41-60.

· Durbin, Richard, et al. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.

· Jones, Neil C. and Pavel Pevzner. An introduction to bioinformatics algorithms. MIT Press, 2004.

· Mäkinen, Veli, et al. Genome-scale algorithm design. Cambridge University Press, 2015.

· Aluru, Srinivas, ed. Handbook of computational molecular biology. CRC Press, 2005.

**DS 207 (JAN) 3:1 Introduction to Natural Language Processing**

*Danish Pruthi*

Overview:

This course is a graduate-level introduction to the field of Natural Language Processing (NLP), which involves building computational systems to handle human languages. We interact with NLP systems on a daily basis—such systems answer the questions we ask (using Google, or other search engines), curate the content we read, autocomplete words we are likely to type, translate text from languages we don’t know, flag content on social media that we might find harmful, etc. Such systems are prominently used in industry as well as academia, especially for analyzing textual data. In this course, we will cover text classification & representation learning, (large & not-so-large) language models, conditioned generation, including machine translation, summarization, multilinguality, structured prediction and decoding techniques, information extraction, and broader societal and ethical implications of language technologies.

Expected Outcomes:

The course aspires to equip students with the key ideas to confidently tackle NLP problems. The assignments in the course would enable students to develop, debug and evaluate NLP systems in practice.

Pre-requisites:

The class is intended for graduate students and senior undergraduates. There are no strict prerequisites in terms of IISc courses that must be completed before registering for this course. However, students are expected to know the basics of linear algebra, probability, calculus, and neural networks. Programming assignments require proficiency in Python.

**DS 211 (AUG) 3:0 Numerical Optimization**

*Deepak Subramani*

Introduces numerical optimization with emphasis on convergence and numerical analysis of algorithms as well as applying them in problems of practical interest. Topics include: Methods for solving matrix problems and linear systems that arise in the context of optimization algorithms. Major algorithms in unconstrained optimization (e.g., modified Newton, quasi-Newton, steepest descent, nonlinear conjugate gradient, trust-region methods, line search methods), constrained optimization (e.g., simplex, barrier, penalty, sequential gradient, augmented Lagrangian, sequential linear constrained, interior point methods), derivative-free methods (e.g., simulated annealing, Bayesian optimization, Surrogate-assisted optimization), dynamic programming, and optimal control.

Pre-requisites: Basic knowledge of Numerical Methods, linear algebra, and/or consent from the advisor

* Numerical Optimization, J. Nocedal and S. Wright, Springer Series in Operations Research and Financial Engineering, 2006.

* Linear Programming with MATLAB, M. Ferris, O. Mangasarian, and S. Wright, MPS-SIAM Series on Optimization, 2007.

* Practical Methods of Optimization by R. Fletcher 2nd edition, Wiley, 1987.

**DS 215 (AUG) 3:0 Introduction to Data Science**

*Anirban Chakraborty*

Probability and Statistics Primer: Discrete and continuous random variables and their probability distributions, Theoretical distributions, Markov and Chebyshev Inequalities, Transform methods, Pairs of random variables and their joint statistics, Conditional probability distribution, Correlation, Independence, Vector random variables, Convergence of random sequences, Sums of random variables, Central limit theorem, Laws of large numbers, Random Processes and examples, Statistics between two processes, Sum processes, Independent increment and Markov property, Poisson processes, Random telegraph signal, Gaussian processes, Stationarity, Time averages and Ergodic theorems.

Statistical Inference – Parameter Estimation Theory: Minimum Variance Unbiased Estimator, CRLB, General linear models, General MVU estimation: Sufficient statistic, RBLS theorem, BLUEs, Gauss-Markov theorem, Maximum Likelihood (ML) estimation, Least-squares (LS) estimation, Sequential, constrained and nonlinear least squares, Bayesian philosophy, MMSE and Bayesian MSE, Bayesian ML, MAP, LMMSE, Sequential LMMSE and Kalman filters.

Statistical Hypothesis Testing: Null and alternative hypotheses, Type-I and Type-II errors, Pearson’s Chi-square: Applications in testing goodness of fit, independence and homogeneity, T-Tests.

Machine Learning Fundamentals: Data clustering: K-means, Expectation-Maximization and Gaussian Mixture Model, Evaluation of clustering performance, Principal Component Analysis, Linear regression, Overfitting, Bias-variance tradeoff, Regularization techniques, Gaussian processes regression, K-NN, Logistic regression, Naive Bayes classification, Introduction to neural networks, Building blocks of ANN and modern deep nets, Forward propagation, Backpropagation, Training by gradient descent, Hands-on tutorial with Numpy, Scipy, Scikit-learn, Tensorflow/Pytorch.

Pre-requisites: Undergraduate level knowledge of linear algebra, multivariate calculus, numerical methods, basic programming skills (in any programming language).

* Athanasios Papoulis and S. Unnikrishna Pillai, Probability, Random Variables and Stochastic Processes, McGraw Hill Education, 2017.

* Alberto Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd Edition, Pearson, 2008.

* Steven M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Pearson, 1993.

* Jerome H. Friedman, Robert Tibshirani and Trevor Hastie, The Elements of Statistical Learning, Springer, 2001.

* Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

* Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, The MIT Press, 2016.

**DS 221 (AUG) 3:1 Introduction to Scalable Systems**

*Matthew Jacob, Sathish Vadhiyar and Chirag Jain*

Architecture: computer organization, single-core optimizations including exploiting cache hierarchy and vectorization, parallel architectures including multi-core, shared memory, distributed memory and GPU architectures; Algorithms and Data Structures: algorithmic analysis, overview of trees and graphs, algorithmic strategies, concurrent data structures; Parallelization Principles: motivation, challenges, metrics, parallelization steps, data distribution, PRAM model; Parallel Programming Models and Languages: OpenMP, MPI, CUDA; Distributed Computing: Commodity cluster and cloud computing; Distributed Programming: MapReduce/Hadoop model.

Pre-requisites: Basic knowledge of system science and/or consent from the advisor

* Parallel Computer Architecture: A Hardware/Software Approach. David Culler, Jaswinder Pal Singh. Publisher: Morgan Kaufmann. ISBN: 981-4033-103. 1999.

* Parallel Computing: Theory and Practice. Michael J. Quinn. Publisher: Tata McGraw-Hill. ISBN: 0-07-049546-7. 2002.

* Computer Systems – A Programmer’s Perspective. Bryant and O’Hallaron. Publisher: Pearson Education. ISBN: 81-297-0026-3. 2003.

* Data Structures, Algorithms, and Applications in C++, 2nd Edition, Sartaj Sahni

* Introduction to Parallel Computing. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Publisher: Addison Wesley. ISBN: 0-201-64865-2. 2003.

* An Introduction to Parallel Programming. Peter S Pacheco. Publisher: Morgan Kaufmann. ISBN: 978-93-80931-75-3. 2011.

* Online references for OpenMP, MPI, CUDA

* Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Jack Dongarra and Geoffrey Fox, Morgan Kaufmann, 2011

* Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010

**DS 222 (AUG) 3:1 Machine Learning with Large Datasets**

*P P Talukdar*

Streaming algorithms and Naive Bayes, fast nearest neighbor, parallel perceptrons, parallel SVM, randomized algorithms, hashing, sketching, scalable SGD, parameter servers, graph-based semi-supervised learning, scalable link analysis, large-scale matrix factorization, speeding up topic modeling, big learning and data platforms, learning with GPUs.

Pre-requisites: Prior exposure to machine learning.

* Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman

* Scaling up Machine Learning: Parallel and Distributed Approaches. Ron Bekkerman, Mikhail Bilenko, John Langford

* Foundations of Data Science. Avrim Blum, John Hopcroft, Ravi Kannan

* Research literature

**DS 226 (AUG) 2:1 Introduction to Computing for AI & Machine Learning**

*Sashikumaar Ganesan*

This course aims to build the foundation of computational thinking with applications to Artificial Intelligence and Machine Learning (AI&ML). It also covers how to build a neural network and how to train, evaluate and optimize it with TensorFlow.

Topics:

Programming Foundation: Fundamentals of digital storage of data, Performance of a computer, Caches, Debugging and Profiling, Basic optimization techniques for serial codes.

Introduction to Object-oriented programming: Object and Data Structure Basics, Python Statements, Methods and Functions.

Object-oriented programming (OOP): Inheritance, Encapsulation, Abstraction, Polymorphism. OOP concepts in Python. OOP concepts in C++. Python tools for Data Science: Pandas, NumPy, Matplotlib, Scikit-Learn, Just-in-Time (JIT) compilers, Numba.

Computational Thinking: Arrays, Matrix-Vector, Matrix multiplication, Solving dense and sparse systems. Basic machine learning algorithms.

Deep Learning with Open-source AI/ML Packages: Tensors, TensorFlow basics, mlpack, Interface to mlpack, Simple statistics and plotting, Loading and exploring data, Learning with TensorFlow and Keras, Mini-project.

Prerequisites: Basic knowledge of mathematics, data structures, and algorithms.

Textbooks:

1. John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach. 6th edition, Morgan Kaufmann, 2017. https://www.elsevier.com/books/computer-architecture/hennessy/978-0-12-811905-1

2. Shaw, Zed A. Learn Python 3 the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code. Addison-Wesley Professional, 2017.

3. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, O’Reilly Media, Inc. 2019

**DS 244 (JAN) 2:1 Hardware-aware Scientific Computing**

*Sashikumaar Ganesan*

This course focuses on building hardware-aware hybrid parallel libraries for Exascale computing. It lies at the intersection of computer architecture, object-oriented parallel programming and numerical algorithms.

Topics:

Parallel Computer Architecture: Fundamentals of Parallel Multicore Architecture, Communication networks. Debugging and Profiling, GDB, Profiling with GProf.

Hybrid Parallel Programming Models: Shared Memory Programming basics, OpenMP with accelerator devices, C++ threads, Intel Threading Building Blocks, Message-passing, MPI with threads, OpenCL, CUDA with multi-GPU, CUDA-Aware MPI, Kokkos, Task-based programming, StarPU, MPI+X programming model.

Parallel Algorithms: Speedup & scalability, Roofline model, Performance measurements, Likwid. Computational Linear Algebra, Sparse linear algebra; Iterative Solution of Linear Sparse Systems: SELL-C-sigma, Varying precision algorithms. High-Performance Libraries: MAGMA, MANDALA.

Course Plan: https://indianinstituteofscience.sharepoint.com/sites/hasc

Prerequisites: Good knowledge of C/C++ and Consent from the instructor

Textbooks:

1. Frédéric Magoules, François-Xavier Roux, Guillaume Houzeaux. Parallel Scientific Computing, Wiley, 2016, doi: 10.1002/9781118761687

2. Georg Hager, Gerhard Wellein. Introduction to High Performance Computing for Scientists and Engineers. CRC Press, 2010.

3. John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach. 6th edition, Morgan Kaufmann, 2017. https://www.elsevier.com/books/computer-architecture/hennessy/978-0-12-811905-1

4. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Introduction to Parallel Computing. 2nd edition, Benjamin/Cummings, 2003. https://www-users.cs.umn.edu/~karypis/parbo

**DS 250 (JAN) 3:1 Multigrid Methods**

*Sashikumaar Ganesan*

Classical iterative methods, convergence of classical iterative methods, Richardson iteration method, Krylov subspace methods: Generalized minimal residual (GMRES), Conjugate Gradient (CG), Bi-CG method. Geometric Multigrid Method: Grid transfer, Prolongation and restriction operators, two-level method, Convergence of coarse grid approximation, Smoothing analysis. Multigrid Cycles: V-cycle, W-cycle, F-cycle, convergence of multigrid cycles, remarks on computational complexity. Algebraic Multigrid Method: Hierarchy of levels, Algebraic smoother, Coarsening, Interpolation, remarks on parallel implementation.

Pre-requisites: Good knowledge of Linear Algebra and/or consent from the instructor.

* Pieter Wesseling, An Introduction to Multigrid Methods, R.T. Edwards, Inc., 2004.

* William L. Briggs, Van Emden Henson and Steve F. McCormick, A Multigrid Tutorial, SIAM, 2nd edition, 2000.

**DS 252 (AUG) 3:1 Cloud Computing**

*Yogesh Simmhan*

Context: Shared/distributed memory computing; Data/task parallel computing; Role of Cloud computing.

Technology: Cloud Virtualization, Elastic computing; Infrastructure/Platform/Software as a Service (IaaS/PaaS/SaaS); Public/Private Clouds; Service oriented architectures; Mobile, Edge and Fog computing; Multi-clouds.

Application Design Patterns: Workflow and dataflow; Batch, transactional and continuous; Scaling, locality and speedup; Cloud, Mobile and Internet of Things (IoT) applications.

Execution Models: Synchronous/asynchronous patterns; Scale up/Scale out; Data marshalling/unmarshalling; Load balancing; stateful/stateless applications; Performance metrics; Consistency, Availability and Partitioning (CAP theorem).

Programming project using public Cloud infrastructure, e.g. Amazon AWS or Microsoft Azure. Cloud resources will be provided.

Pre-requisites: Data Structures, Programming and Algorithm concepts. Programming experience.

* Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Jack Dongarra and Geoffrey Fox, Morgan Kaufmann, 2011

* Current literature.

**DS 255 (JAN) 3:1 System Virtualization**

*J. Lakshmi*

Virtualization as a construct for resource sharing; Re-emergence of virtualization and its importance for Cloud computing; System abstraction layers and modes of virtualization; Mechanisms for system virtualization – binary translation, emulation, para-virtualization and hardware virtualization; Virtualization using HAL layer – Exposing physical hardware through HAL (example of x86 architecture) from an OS perspective; System bootup process; Virtual Machine Monitor; Processor virtualization; Memory Virtualization; NIC virtualization; Disk virtualization; Graphics card virtualization; OS-level virtualization and the container model; OS resource abstractions and virtualization constructs (Linux Docker example); Virtualization using APIs – JVM example.

Pre-requisites: Basic course on operating systems and consent of the instructor.

* J. Smith, R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes, Morgan Kaufmann, 2005.

* D. Bovet, M. Cesati, Understanding the Linux Kernel, Third Edition, O’Reilly, 2005.

* Wolfgang Mauerer, Linux Kernel Architecture, Wiley India, 2012.

* D. Chisnall, The Definitive Guide to the Xen Hypervisor, Prentice Hall, 2007

* R. Bryant, D. O’Hallaron, Computer Systems: A Programmer’s Perspective (2nd Edition), Addison Wesley, 2010

* Current literature.

**DS 256 (JAN) 3:1 Scalable Systems for Data Science**

*Yogesh Simmhan*

Design of distributed program models and abstractions, such as MapReduce, Dataflow and Vertex-centric models, for processing volume, velocity and linked datasets, and for storing and querying over NoSQL datasets.

Approaches and design patterns to translate existing data-intensive algorithms and analytics into these distributed programming abstractions.

Distributed software architectures, runtime and storage strategies used by Big Data platforms such as Apache Hadoop, Spark, Storm, Giraph and Hive to execute applications developed using these models on commodity clusters and Clouds in a scalable manner.

This course has a hands-on project where students will work with real, large datasets and commodity clusters, and use scalable algorithms and platforms to develop a Big Data application.

Pre-requisites: Data Structures, Programming and Algorithm concepts with strong programming experience, and DS 221 (or) DS 222 (or) DS 252 (or) consent from the Instructor

* Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010

* Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman, 2nd Edition (v2.1), 2014.

* Current literature

**DS 260 (JAN) 3:0 Medical Imaging**

*Phaneendra Yalavarthy*

X-ray Physics, interaction of radiation with matter, X-ray production, X-ray tubes, dose, exposure, screen-film radiography, digital radiography, X-ray mammography, X-ray Computed Tomography (CT). Basic principles of CT, single and multi-slice CT. Tomographic image reconstruction, filtering, image quality, contrast resolution, CT artifacts. Magnetic Resonance Imaging (MRI): brief history, MRI major components. Nuclear Magnetic Resonance: basics, localization of MR signal, gradient selection, encoding of MR signal, T1 and T2 relaxation, k-space filling, MR artifacts. Ultrasound basics, interaction of ultrasound with matter, generation and detection of ultrasound, resolution. Doppler ultrasound, nuclear medicine (PET/SPECT), multi-modal imaging, PET/CT, SPECT/CT, oncological imaging, medical image processing and analysis, image fusion, contouring, segmentation, and registration.

Pre-requisites: Basic knowledge of system theory and Consent from the instructor.

* The Essential Physics of Medical Imaging, J. T. Bushberg, J. A. Seibert, E. M. Leidholdt Jr., and J. M. Boone, Second Edition, Lippincott Williams & Wilkins Publishers, 2002.

* Physics of Radiology, A. B. Wolbarst, Second Edition, Medical Physics Publishing, 2005.

* Current Literature

**DS 261 (AUG) 3:1 Artificial Intelligence for Medical Image Analysis**

*Vaanathi Sundaresan*

**Topics**

1). Overview of biological and medical imaging modalities and research/clinical applications

2). Quick introduction to: (a) medical imaging tools (viewers, formats, etc.) and (b) Pytorch/Tensorflow

3). Basic Mathematics (Linear Algebra, Probability, and Optimization)

4). Challenges in biomedical image data handling and curation

5). Detection / Segmentation / Image classification for biomedical images

6). Machine Learning Methods: SVM, PCA, KNN, FCM, etc. applied to medical image analysis

7). Overview of Neural networks and Deep Learning: Principle of learning, CNNs, Loss Functions, etc. for medical image analysis

8). Transfer learning, fine tuning, and generalization in medical image analysis

9). Evaluation methodology in medical image analysis: metrics, calibration, uncertainty, bias, etc.

10). Challenges in healthcare AI deployment: reproducibility, interpretability and regulatory

11). Unsupervised/self-supervised learning in biomedical image analysis

12). Generative models and inverse problems in medical imaging

**Laboratory Component**

1). Systematically test a number of methods for medical image analysis using artificial intelligence methods

2). Mini project to solve a test problem in medical image analysis (e.g., segmentation of hepatic vessels in the liver using X-ray CT images)

Pre-requisites: Basic knowledge of Systems and Signals, Proficiency in Python, C/C++ and Consent from the instructor.

* Kevin Zhou, Medical Image Recognition, Segmentation and Parsing: Machine Learning and Multiple Object Approaches, Elsevier, 1st Edition – December 2, 2015.

* Jon Krohn, Grant Beyleveld, Aglaé Bassens, Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence, Addison Wesley, 2019.

* Current Literature

**DS 263 (AUG) 3:1 Video Analytics**

*R. Venkatesh Babu, Anirban Chakraborty and Arjun Jain*

Review of Digital Image and Video Processing, Camera Models, Background Modelling, Object Detection and Recognition, Local Feature Extraction, Biologically Inspired Vision, Object Classification, Segmentation, Object Tracking, Activity Recognition, Anomaly Detection, Handling Occlusion, Scale and Appearance changes, Other Applications.

Pre-requisites: Image Processing, Probability, Linear Algebra.

* Richard Szeliski, Computer Vision: Algorithms and Applications, Springer 2010

* Forsyth, D.A., and Ponce, J., Computer Vision: A Modern Approach, Pearson Education, 2003.

* Current Literature

**DS 265 (JAN) 3:1 Deep Learning for Computer Vision**

*R. Venkatesh Babu and Anirban Chakraborty*

Computer vision – brief overview; Machine Learning – overview of selected topics; Introduction to Neural Networks, Backpropagation, Multi-layer Perceptrons; Convolutional Neural Networks; Training Neural Networks; Deep Learning Software Frameworks; Popular CNN Architectures; Recurrent Neural Networks; Applications of CNNs – Classification, Detection, Segmentation, Visualization, Model compression; Unsupervised learning; Generative Adversarial Networks.

Prerequisites: Basic knowledge of Computer Vision and Machine Learning, Proficiency in Python, C/C++.

- Current Literature

**DS 269 (JAN) 2:1 Computational Methods for Reacting Flows**

*Aditya Konduri*

Course description: This is an advanced elective course for research students. The first part of the course trains students in developing detailed-chemistry-based reacting flow solvers, specifically those relevant to combustion processes. It introduces dimensionality reduction concepts coupled with neural networks for parametrising thermo-chemical properties, which can significantly lower computational costs. The second part of the course focuses on data analysis methods for combustion datasets. Both standard and machine-learning-based analysis methods are covered. In addition to the theoretical background, the course involves programming a one-dimensional solver and hands-on data analysis exercises.

**Topics**

- Governing equations: conservation of mass, momentum, energy and species. Low Mach number and fully compressible formulations. Non-dimensional numbers. (1 week)
- Discretisation methods: finite difference and finite volume. (1 week)
- Introduction to chemical kinetics: global and elementary reactions, Arrhenius equation, chemical time scales, stiffness. (1 week)
- Elements of solver development: initial and boundary conditions, simulation algorithms, verification and validation (3 weeks)
- Dimensionality reduction: principal component analysis, higher order moment tensors (1 week)
- Regression methods for thermo-chemical coefficients (1.5 weeks)
- DNS database analysis: premixed and non-premixed turbulent flames, modes of combustion, flame structure, turbulence-chemistry interactions, chemical explosive mode analysis (3.5 weeks)
- Machine learning based analysis: flame surface extraction, detection of combustion instabilities (2 weeks)

Prerequisites: Basic knowledge of combustion (AE 241 or equivalent), numerical methods for differential equations (DS 289 or equivalent) and machine learning (E0 229 or equivalent), or consent from the instructor. Good proficiency in programming.

Evaluation: assignments, paper presentation, project, final exam.

Reference textbooks:

- An Introduction to Combustion, Stephen R. Turns, McGraw Hill, 2011.
- Theoretical and numerical combustion, Thierry Poinsot and Denis Veynante, RT Edwards Inc., 2005.
- Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Steven L. Brunton and J. Nathan Kutz, Cambridge University Press, 2019.
- Research papers, material/notes provided by instructor.

**DS 284 (AUG) 2:1 Numerical Linear Algebra**

*Phani Motamarri*

This is a foundational course on computational linear algebra, which is fundamental to computational and data science research in many emerging scientific disciplines, including AI/ML and quantum computing.

(i) **Preliminaries:** Matrix-vector multiplication, Matrix-matrix multiplication, Review of rank, column-space, null-space and invertibility of matrices, Matrix and vector norms, Orthogonal vectors and matrices, arithmetic complexity, floating point arithmetic, Conditioning and stability of a problem, Forward and Backward stability.

(ii) **Matrix decompositions and direct methods for linear system solutions:** Singular value decomposition (SVD), Rank and matrix approximations using SVD, Projectors, QR factorization, Gram-Schmidt orthogonalization, Least squares problems, pseudoinverse, normal equations, Gaussian elimination, LU factorization, Pivoting, Cholesky decomposition, Stability of Matrix-decomposition algorithms.

(iii) **Eigenvalue problems:** Gershgorin theorem, Similarity transformation, normal matrices, eigenvalue and eigenvector computations, eigendecomposition, Rayleigh quotients, Hessenberg transformation, Schur decomposition, real symmetric eigenvalue problems, power method, Rayleigh quotient iteration, inverse iteration, Jordan canonical form, QR algorithm with and without shifts, Subspace iteration, Bi-diagonalization techniques for computing SVD.

(iv) **Iterative methods:** Krylov subspace methods (Lanczos, Arnoldi, GMRES, Conjugate gradients, Bi-conjugate gradients), Approximating eigenvalues and eigenvectors using iterative methods.

Pre-requisites: Undergraduate level understanding of linear algebra, multivariate calculus and a familiarity with a programming environment (Matlab/Octave/Python/C/C++ etc)

* Biswa Nath Datta, Numerical Linear Algebra and Applications, 2nd Edition, 2004

* Lloyd N. Trefethen and David Bau, III, Numerical linear algebra, SIAM, 1997.

* C. G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing, 1994.

* David C. Lay, Linear Algebra and its Applications, Pearson, 2013.

* Golub, G., Van Loan, C.F., Matrix Computations, Johns Hopkins University Press, 1996.

* Saad, Y., Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, 2003

**DS 285 (JAN) 3:1 Tensor Computations for Data Science**

*Ratikanta Behera*

*Fundamentals:* Basic concepts of matrix properties: norms, rank, trace, inner products, Kronecker product, similarity matrix. Fast Fourier transform, diagonalization of matrices. Toeplitz and circulant matrices with their properties (eigenvalue and eigenvector), block matrix computation, and warm-up algorithms.

*Introduction to Tensors:* Tensors and tensor operations: Mode-n product of a tensor. Kronecker product of two tensors, tensor element product, tensor trace, tensor convolution, quantitative tensor product, Khatri-Rao product, the outer product. The Einstein product and t-product tensors. The explicit examples include identity tensor, symmetric tensor, orthogonal tensor, tensor rank, and block tensor.

*Tensor Decomposition:* Block tensor decomposition, Canonical Polyadic (CP) decomposition, the Tucker decomposition, the multilinear singular value (the higher-order SVD or HOSVD) decomposition, the hierarchical Tucker (HT) decomposition, and the tensor-train (TT) decomposition. Eigenvalue decomposition and singular value decomposition via t-product and the Einstein product. Truncated tensor singular value decomposition. Tensor inversion, and Moore-Penrose inverse. Power tensor, solving system of multilinear equations.

*Applications of Tensor Decompositions:* Low-rank tensor approximation, background removal with robust principal tensor component analysis, image deblurring, image compression, compressed sensing with robust regression, higher-order statistical moments for anomaly detection, solving elliptic partial differential equations.

*Tensors for Deep Neural Networks:* Deep neural networks, Tensor networks, and their decompositions, including CP decomposition, Tucker decomposition, Hierarchical Tucker decomposition, Tensor train, tensor ring decomposition, and Transform-based tensor decomposition.

Prerequisites: DS 284 – Numerical Linear Algebra, or MA219 – Linear Algebra, with basic programming skills (in any programming language, preferably Matlab) and consent of the instructor.

* Liu, Y. (Ed.). Tensors for Data Processing: Theory, Methods, and Applications. Academic Press. (2021)

* Liu Y, Liu J, Long Z, Zhu C. Tensor Computation for Data Analysis. Springer; 2022.

* T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev., 51(3):455–500, 2009.

* C. D. Martin, R. Shafer, B. Larue. An order-p tensor factorization with applications in imaging. SIAM J Sci Comput. 2013;35(1): A474–90.

* M. Brazell, N. Li, C. Navasca, et al. Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal Appl. 2013;34(2):542–570.

* Ji, Y., Wang, Q., Li, X., & Liu, J. (2019). A survey on tensor techniques and applications in machine learning. IEEE Access, 7, 162950-162990.

* Current literature

**DS 288 (AUG) 3:0 Numerical Methods**

*(Equivalent to UE 201 : Introduction to Scientific Computing)*

*Ratikanta Behera*

Review of multivariable calculus (Derivative, gradients, partial derivatives, Jacobian, chain-rule, Hessian), applications to backpropagation and automatic differentiation in ML, first order differential equations, Taylor series, and convergence, Picard’s theorem. Root finding: Functions and polynomials, zeros of a function, roots of a nonlinear equation, bracketing, bisection, Regula falsi method, secant, and Newton-Raphson methods. Interpolation, splines, polynomial fits, Chebyshev approximation. Optimization: Extremization of functions, simple search, Nelder-Mead simplex method, Powell’s method, gradient-based methods. Numerical Integration and Differentiation: Evaluation of integrals, elementary analytical methods, trapezoidal and Simpson’s rules, Romberg integration, Gaussian quadrature and orthogonal polynomials, multidimensional integrals, summation of series, Euler-Maclaurin summation formula, numerical differentiation and estimation of errors.
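To make the root-finding topics concrete, here is a minimal sketch of the bisection and Newton-Raphson methods named above. It is an illustration under simple assumptions (a continuous scalar function, a bracketing interval for bisection, a user-supplied derivative for Newton-Raphson); the function names, tolerances, and iteration limits are hypothetical choices, not part of the syllabus.

```python
def bisection(f, a, b, tol=1e-12, max_iter=200):
    """Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0:
        raise ValueError("root is not bracketed: f(a) and f(b) share a sign")
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0 or 0.5 * (b - a) < tol:
            return m
        if fa * fm < 0:
            b, fb = m, fm  # root lies in the left half
        else:
            a, fa = m, fm  # root lies in the right half
    return 0.5 * (a + b)


def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) from x0 until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


# Example: both methods applied to x^2 - 2 = 0 on [1, 2].
root_bisect = bisection(lambda x: x * x - 2.0, 1.0, 2.0)
root_newton = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Bisection halves the bracket at every step and cannot fail once a sign change is found, while Newton-Raphson converges much faster near a simple root but depends on the starting point.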

Pre-requisites: Basic knowledge of multivariate calculus and elementary real analysis

* Richard L. Burden and J. Douglas Faires, Numerical Analysis: Theory and Applications, India Edition, Cengage Brooks-Cole Publishers, 2010.

* Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P., Numerical Recipes in C/FORTRAN, Prentice Hall of India, New Delhi, 1994.

* Borse, G.J., Numerical Methods with MATLAB: A Resource for Scientists and Engineers, PWS Publishing Co., Boston, 1997.

**DS 289 (JAN) 3:1 Numerical Solution of Differential Equations**

*Aditya Konduri*

Ordinary differential equations: Lipschitz condition, solutions in closed form, power series method. Numerical methods: error analysis, stability and convergence, Euler and Runge-Kutta methods, multistep methods, Adams-Bashforth and Adams-Moulton methods, Gear’s open and closed methods, predictor-corrector methods. Sturm-Liouville problem: eigenvalue problems, special functions, Legendre, Bessel, and Hermite functions. Partial differential equations: classification, elliptic, parabolic and hyperbolic PDEs, Dirichlet, Neumann and mixed boundary value problems, separation of variables, Green’s functions for inhomogeneous problems. Numerical solution of PDEs: relaxation methods for elliptic PDEs, Crank-Nicolson method for parabolic PDEs, Lax-Wendroff method for hyperbolic PDEs. Calculus of variations and variational techniques for PDEs, integral equations. Finite element method and finite difference time domain method, method of weighted residuals, weak and Galerkin forms, ordinary and weighted/general least squares. Fitting models to data, parameter estimation using PDEs.
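As a small illustration of the Euler and Runge-Kutta methods listed above, the sketch below integrates a scalar initial value problem y' = f(t, y) with the explicit Euler scheme and the classical fourth-order Runge-Kutta scheme. The helper names (`euler_step`, `rk4_step`, `integrate`) and the test problem y' = -y are assumptions made for this example.

```python
import math


def euler_step(f, t, y, h):
    """One explicit Euler step: first-order accurate."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step: fourth-order accurate."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(step, f, t0, y0, t_end, n_steps):
    """Advance y from t0 to t_end with n_steps equal steps of the given scheme."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


# Example: y' = -y, y(0) = 1, whose exact solution is exp(-t).
decay = lambda t, y: -y
y_euler = integrate(euler_step, decay, 0.0, 1.0, 1.0, 100)
y_rk4 = integrate(rk4_step, decay, 0.0, 1.0, 1.0, 100)
exact = math.exp(-1.0)
```

With the same step size, the Runge-Kutta result is several orders of magnitude closer to the exact value than the Euler result, which reflects the difference between fourth-order and first-order global error.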

Pre-requisites: Basic course on numerical methods and consent of the instructor.

* Arfken, G.B., and Weber, H.J., Mathematical Methods for Physicists, Sixth Edition, Academic Press, 2005.

* Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P., Numerical Recipes in C/FORTRAN – The art of Scientific Computing, Second Edn, Cambridge University Press, 1998.

* Lynch, D.R., Numerical Partial Differential Equations for Environmental Scientists and Engineers – A First Practical Course, Springer, New York, 2005.

**DS 290 (AUG) 3:0 Modelling and Simulation**

*Soumyendu Raha*

Statistical description of data, data-fitting methods, regression analysis, analysis of variance, goodness of fit. Probability and random processes, discrete and continuous distributions, Central Limit theorem, measure of randomness, Monte Carlo methods. Stochastic Processes and Markov Chains, Time Series Models. Modelling and simulation concepts. Discrete-event simulation: Event scheduling/Time advance algorithms, verification and validation of simulation models. Continuous Simulation: Modelling with and Simulation of Stochastic Differential Equations.

Pre-requisites: Basic course on numerical methods and consent of the instructor.

* Banks, J., Carson, J.S., and Nelson, B., Discrete-Event System Simulation, Second Edn, Prentice Hall of India, 1996.

* Francois E. Cellier, Ernesto Kofman, Continuous System Simulation, Springer, 2006, ISBN: 0387261028.

* Peter E. Kloeden, Eckhard Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, 1999.

* Peter E. Kloeden, Eckhard Platen, Henri Schurz, Numerical Solution of SDE through Computer Experiments, Springer-Verlag, 1994.

**DS 291 (AUG) 3:1 Finite Elements: Theory and Algorithms**

*Sashikumaar Ganesan*

Generalized (weak) derivatives, Sobolev norms and associated spaces, inner-product spaces, Hilbert spaces, construction of finite element spaces, mapped finite elements, two- and three-dimensional finite elements, interpolation and discretization error, variational formulation of second order elliptic boundary value problems, finite element algorithms and implementation for linear elasticity, Mindlin-Reissner plate problem, systems in fluid mechanics.

Pre-requisites: Good knowledge of numerical analysis along with basic programming background and/or consent from the instructor.

* Sashikumaar Ganesan, Lutz Tobiska: Finite elements: Theory and Algorithms, Cambridge-IISc Series, Cambridge University Press, 2017

* Dietrich Braess, Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics, Cambridge University Press, 3rd ed., 2007.

* Susanne C. Brenner, Ridgway Scott, The Mathematical Theory of Finite Element Methods, Springer-Verlag, 3rd ed., 2008.

* Current literature

**DS 294 (JAN) 3:0 Data Analysis and Visualization**

*Anirban Chakraborty*

Data pre-processing, data representation, data reconstruction, machine learning for data processing, convolutional neural networks, visualization pipeline, isosurfaces, volume rendering, vector field visualization, applications to biological and medical data, OpenGL, visualization toolkit, linear models, principal components, clustering, multidimensional scaling, information visualization.

Pre-requisites: Basic knowledge of numerical methods and consent from instructor

* Hansen, C.D., and Johnson, C.R., Visualization Handbook, Academic Press, 2004.

* Ware, C., Information Visualization: Perception for Design, Morgan Kaufmann, Second Edn, 2004.

* Current literature

**DS 295 (JAN) 3:1 Parallel Programming (www)**

*Sathish Vadhiyar*

Parallel Algorithms: MPI collective communication algorithms including prefix computations, sorting, graph algorithms, GPU algorithms; Parallel Matrix computations: dense and sparse linear algebra, GPU matrix computations; Algorithm models: Divide-and-conquer, Mesh-based communications, BSP model; Advanced Parallel Programming Models and Languages: advanced MPI including MPI-2 and MPI-3, advanced concepts in CUDA programming; Scientific Applications: sample applications include molecular dynamics, evolutionary studies, N-Body simulations, adaptive mesh refinements, bioinformatics; System Software: sample topics include scheduling, mapping, performance modeling, fault tolerance.

Pre-requisites: Introduction to Scalable Systems course; alternatively, students are expected to prepare from the slides that will be provided on introduction to parallel computing, OpenMP, MPI, and CUDA.

* Parallel Computing: Theory and Practice. Michael J. Quinn. Publisher: Tata McGraw-Hill. ISBN: 0-07-049546-7. 2002.

* Introduction to Parallel Computing. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Publisher: Addison Wesley. ISBN: 0-201-64865-2. 2003.

* An Introduction to Parallel Programming. Peter S Pacheco. Publisher: Morgan Kaufmann. ISBN: 978-93-80931-75-3. 2011.

* Online references for OpenMP, MPI, CUDA

* Literature: relevant conference and journal papers.

#### DS 298 (Jan) 3:1 Random Variates in Computation

*Murugesan Venkatapathi*

This course is aimed at introducing graduate students to random variate generation, and statistical methods in computation with continuously varying numbers. Basic sets of operations namely linear algebra, integration of functions, and evaluation of statistical parameters are addressed in high dimensions where a purely numerical approach may either be unviable or significantly less efficient. The following is a brief description of the contents of the coursework.

**Topics**:

Part I – Random variate generation: Descriptive statistics; probability distributions; convergence of samples; concentration inequalities; operations on random variables and transformations; variates using inverse transform method; numerical stability of inversion; rejection sampling; scaling of rejection sampling with number of dependent variables; acceptance-complement method; linear transformations of multivariate distributions; specialized algorithms.

Part II – Randomized numerical linear algebra: Randomized SVD approximations and low-rank projections; matrix norm estimation; approximate matrix multiplication; single-view/streaming approximations of a matrix; randomized solution of linear system of equations and linear regressions.

Part III – Random sampling and integration/estimation: Monte Carlo sampling; brief note on quasi-Monte Carlo (QMC) and deterministic sampling; Markov Chain Monte Carlo (MCMC) methods (Gibbs sampler, Metropolis type updates, and Hamiltonian dynamics); high-dimensional integration using MCMC; non-convex domains and integration using N-Sphere Monte Carlo (NSMC); stopping and confidence intervals; scaling of methods with number of dimensions.
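To illustrate two of the Part I topics, the sketch below generates variates by the inverse transform method and by rejection sampling, using only the Python standard library. The target distributions (an exponential distribution, and the density f(x) = 2x on [0, 1] with a uniform proposal) and the function names are assumptions chosen for this example.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse transform: solve F(x) = 1 - exp(-rate * x) = u for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return -math.log(1.0 - u) / rate


def sample_linear_density(rng):
    """Rejection sampling for f(x) = 2x on [0, 1].

    Proposal: uniform on [0, 1] with envelope constant M = 2, so a
    proposed x is accepted with probability f(x) / M = x.
    """
    while True:
        x = rng.random()
        if rng.random() <= x:
            return x


# Example: sample means approach 1/rate and 2/3 respectively.
rng = random.Random(0)  # fixed seed so the run is reproducible
exp_mean = sum(sample_exponential(2.0, rng) for _ in range(100_000)) / 100_000
lin_mean = sum(sample_linear_density(rng) for _ in range(50_000)) / 50_000
```

The rejection sampler accepts half of its proposals on average (the acceptance rate is 1/M); how this rate degrades as the number of variables grows is the scaling issue mentioned in the topic list.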

**Prerequisites**: Undergraduate level statistics and graduate level linear algebra in the engineering curriculum.

**Reference material:**

- Luc Devroye, *Non-uniform random variate generation*, Springer-Verlag, New York, 1986.
- Martinsson, P. and Tropp, J., Randomized numerical linear algebra: Foundations and algorithms, *Acta Numerica* 29, 403-572 (2020).
- Petros Drineas, Ravi Kannan, and Michael W. Mahoney, Fast Monte Carlo Algorithms for Matrices, *SIAM Journal on Computing* 36:1, 132-206 (2006).
- Avrim Blum, John Hopcroft, and Ravindran Kannan, *Foundations of Data Science*.
- Other reference materials to be distributed by the instructor.

**DS 299 0:28 Dissertation Project**

This includes the analysis, design of hardware/software, construction of an apparatus/instruments, and testing and evaluation of its performance. The project work is usually based on a scientific/engineering problem of current interest. Every student has to complete the work in the specified period and should submit the Project Report for final evaluation. The students will be evaluated at the end of the first-year summer for 4 credits. The term-wise split of credits is as follows: 0:4 Summer, 0:8 AUG, 0:16 JAN.

**DS 303 (AUG) 2:0 Chemoinformatics**

*Debnath Pal*

Exploring current chemoinformatics resources for synthetic polymers, pigments, pesticides, herbicides, diagnostic markers, biodegradable materials, biomimetics. Primary, secondary and tertiary sources of chemical information. Database search methods: chemical indexing, proximity searching, 2D and 3D structure and substructure searching. Introduction to quantum methods, combinatorial chemistry (library design, synthesis and deconvolution), spectroscopic methods and analytical techniques. Analysis and use of chemical reaction information, chemical property information, spectroscopic information, analytical chemistry information, chemical safety information. Representing intermolecular forces: ab initio potentials, statistical potentials, forcefields, molecular mechanics. Monte Carlo methods, simulated annealing, molecular dynamics. High throughput synthesis of molecules and automated analysis of NMR spectra. Predicting reactivity of biologically important molecules, combining screening and structure ‘SAR by NMR’. Computer storage of chemical information, data formats, OLE, XML, web design and delivery.

Pre-requisites: Basic knowledge of chemistry and background in Maths.

* Current Scientific Literature and Web lectures: Lectures posted online.

* Maizell, R.E., How to find Chemical Information: A guide for Practicing Chemists, Educators, and students, John Wiley and Sons, 1998. ISBN 0-471-12579-2.

* Gasteiger, J., and Engel, T., Chemoinformatics. A Textbook, Wiley-VCH, 2003. ISBN: 3-527-30681-1

**DS 305 (AUG) 3:1 Topics in Web-scale Knowledge Harvesting**

*P P Talukdar*

Entity extraction, entity normalization, entity categorization, relation extraction, distant supervision, curriculum learning, knowledge base (KB) inference, open information extraction (OpenIE), temporal inference, ontology evolution, bootstrapped learning, learning from limited supervision in KBs, scalable learning and inference over large datasets for KB construction, recent KB construction systems, multilingual knowledge acquisition, knowledge acquisition from multiple modalities, representation learning for knowledge harvesting.

Pre-requisites: Basic knowledge of machine learning and/or natural language processing will be helpful although not mandatory.

Current Literature.

**DS 307 (AUG) 3:0 (elective) Ethics in AI**

*Danish Pruthi*

We interact with AI technology on a daily basis—such systems answer the questions we ask (using Google, or other search engines), curate the content we read, unlock our phones, allow entry to airports, etc. Further, with the recent advances in large language and vision models, the impact of such technology on our lives is only expected to grow. This course introduces students to ethical implications associated with design, development and deployment of AI technology spanning NLP, Vision and Speech applications.

Specifically, this seminar course would facilitate discussions among students structured around pre-selected readings on topics related to ethics in AI, including but not limited to:

- Foundational and philosophical frameworks of ethics (e.g., consequentialism, deontology, virtue ethics) to reason about ethical dilemmas
- Experimenting with human-subjects: protocols and guidelines
- Ethical concerns associated with data collection and curation
- Biases and algorithmic fairness; debiasing and mitigating harms
- Misinformation, disinformation and hate-speech; approaches to identify propaganda and manipulation in news, to identify fake news, deep fakes, political framing.
- Privacy; protection algorithms against personality profiling
- Algorithmic Audits; transparency; explainability; robustness
- Content Moderation: Copyright, Recognizing AI generated content, watermarking; regulations and policies (and how ethical considerations depend on countries’ policies)
- Environmental impact of model training and inference

Prerequisites: The class is intended for graduate students and senior undergraduates. Students should have finished at least a basic machine learning course, and any one course related to the discussed applications (computer vision, speech or NLP).

Class size: We plan to cap the course at 25 students to better facilitate in-class discussions.

**DS 323 (AUG) 1:1 Parallel Computing for Finite Element Methods**

*Sashikumaar Ganesan*

This course will provide an introduction to parallel finite element data structure and its efficient implementation in ParMooN (Parallel Mathematics and object oriented Numerics), an open source parallel finite element package. Further, the implementation of the parallel (MPI/OpenMPI) geometric multigrid solver will also be taught. Parallel finite element solution of scalar and incompressible Navier-Stokes equations in two- and three-dimensions using ParMooN (cmg.cds.iisc.ac.in/parmoon/) will also be a part of this course.

Pre-requisites: Good knowledge of finite element methods and C/C++.

- Sashikumaar Ganesan, Lutz Tobiska: Finite elements: Theory and Algorithms, Cambridge-IISc Series, Cambridge University Press, 2017
- An Introduction to Parallel Programming. Peter S Pacheco. Publisher: Morgan Kaufmann. ISBN: 978-93-80931-75-3. 2011
- Current literature

**DS 360 (JAN) 3:0 Topics in Medical Imaging**

*Phaneendra K Yalavarthy*

Three-dimensional Medical Image Processing, Medical Image reconstruction using high performance computing, General Purpose Graphics Processing Units (GP-GPU) computing for Medical Image processing, reconstruction, and Analysis, Computer Aided Detection (CAD) systems – Algorithms, Analysis, Medical Image Registration: rigid and non-rigid registration, Volume based image analysis, Medical Image Enhancement: Deblurring techniques, Four-dimensional Medical Imaging, Molecular Imaging, Diffuse Optical Tomography, and Medical Image Informatics.

Pre-requisites: DS 260 or E9 241 or consent from the Instructor.

* Current Literature

**DS 363 (Aug) 3:1 Topics in Visual Analytics**

*Anirban Chakraborty and R. Venkatesh Babu*

This course aims to provide an introduction to research topics in the area of computer vision and machine learning and would be beneficial for students who are pursuing or intend to pursue research in the aforementioned area. We shall read and discuss an eclectic mix of classic and recent research papers on topics including (but not limited to) object and scene recognition, grouping, segmentation, pose modelling, motion estimation and visual tracking, activity recognition, 3D scene representation and understanding, vision and language models, deep generative models, vulnerabilities of deep vision models and mitigation strategies, zero/few-shot learning, domain adaptation, continual learning for vision tasks etc. This predominantly paper-reading style course would be interspersed with lectures/tutorials clarifying the fundamentals needed to assimilate the more advanced topics. Students will also need to complete significant hands-on projects towards successful completion of the course.

**Prerequisites**: A first course in data analysis or machine learning (e.g., DS 216, E1 213, E0 270, DS 265 etc. or similar) is a mandatory requirement. A course in computer vision (e.g., DS 265), image processing (e.g., E9 241, E9 246 etc.) or related fields (e.g., DS 261); or prior exposure to computer vision projects would be strongly preferred.

**Resources**: Current literature and classic papers from the domain

**DS 391 (JAN) 3:0 Data Assimilation to Dynamical Systems**

*Soumyendu Raha*

Quick introduction to nonlinear dynamics: bifurcations, unstable manifolds and attractors, Lyapunov exponents, sensitivity to initial conditions and concept of predictability. Markov chains, evolution of probabilities (Fokker-Planck equation), state estimation problems. An introduction to the problem of data assimilation (with examples): Bayesian viewpoint, discrete and continuous time cases. Kalman filter (linear estimation theory). Least squares formulation (possibly PDE examples). Nonlinear filtering: particle filtering and MCMC sampling methods. Introduction to advanced topics (as and when time permits): parameter estimation, relations to control theory, relations to synchronization.

Pre-requisites: Consent from the Instructor.

* Edward Ott, Chaos in Dynamical Systems, Cambridge University Press, 2nd Edition, 2002 (or one of the many excellent books on dynamical systems).

* Van Leeuwen, Peter Jan, Cheng, Yuan, Reich, Sebastian, Nonlinear Data Assimilation, Springer Verlag, July 2015.

* Sebastian Reich, Colin Cotter, Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press, August 2015

* Law, Kody, and Stuart, Andrew, and Zygalakis, Konstantinos, Data Assimilation, A Mathematical Introduction, Springer Texts in Applied Mathematics, September 2015.

**DS 392 (JAN) 3:1 Environmental Data Analytics**

*Deepak Subramani*

The course introduces and trains students in different Data Analytics techniques used in the Geosciences including Machine Learning and Deep Learning algorithms. Emphasis is laid on understanding the algorithms as well as using them in practice. The course is designed as research case studies from Ocean Modeling, Remote Sensing and Study of the Natural Environment with significant hands-on activity. At the end of the course, students would be able to recognize, model and solve geoscience problems that require application of data analytics methods.

Syllabus: Data-Driven Modelling in the Geosciences: Problem Formulation and Computational Modeling Approaches. Handling and Analysing Spatiotemporal Geoscience Data (Remote Sensing, In-situ instruments, Primitive Equation Models). [1 Week]

Hands-on Applications of Supervised (Linear Methods, Nonlinear Methods) and Unsupervised Learning (Clustering, Dimensionality Reduction) in Environmental Analytics. [5 Weeks]

Hands-on Applications of Deep Learning (Multi-Layer Perceptron, Convolutional and Recurrent Architectures) in Remote Sensing of the Natural Environment. [3 Weeks]

Bayesian Inference and Data Assimilation for Physics-based Dynamic Data-driven Environmental Systems. [3 Weeks]

Reinforcement Learning for Ocean Sensing. [2 Weeks]

Prerequisites: DS 211 (Numerical Optimization), DS 221 (Introduction to Scalable Systems), DS 284 (Numerical Linear Algebra), Consent of Instructor

Evaluation: Case studies based on recent literature and research from QUEST Lab; Fortnightly Quizzes; Coding Competitions (Mid-Term and End-Term); Course Project (which must be suitable for a publication)

Textbooks:

1. Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Second Edition. O’Reilly Media, 2019.

2. Särkkä, Simo. Bayesian Filtering and Smoothing. Cambridge University Press, 2013.

3. Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

4. Recent Literature, Selected Chapters, Material/Notes Provided by Instructor

**DS 393 (JAN) 3:1 High-performance computing for Quantum Modeling of Materials**

*Phani Motamarri*

Course Description: Quantum mechanics based electronic structure calculations are widely used in materials science, chemistry, physics, and related fields for modelling materials. This area has recently received increasing research attention in computational science and extreme-scale computing. This advanced elective course is designed to equip students with the relevant mathematical theory, numerical algorithms and implementation aspects underlying the widely used computational techniques employed in electronic structure calculations. The course will also highlight methods applicable to today’s exascale computing era. At the end of the course, students will appreciate key computational and implementation aspects relevant to behind-the-scene algorithms of black-box codes used for electronic structure calculations.

**Syllabus:**

Part I – *Review of fundamentals*: Dirac notation, Linear vector spaces, Calculus of variations, Classical mechanics and postulates of quantum mechanics, System of Identical Particles, Slater Determinant, Hartree Fock Theory, Density Functional Theory equations (DFT), Pseudopotentials in DFT, PAW formalism in DFT, Density Functional Perturbation Theory;

*Review of state-of-the-art basis for electronic structure calculations*: Plane waves, Atomic-orbital basis sets, LAPW approach, Finite-difference, Finite-elements

Part II – *Iterative eigenvalue algorithms for large-scale DFT calculations*: Davidson type methods, Preconditioned Descent methods, Preconditioned Conjugate Gradient methods, LOBPCG approach, RMM-DIIS method, Chebyshev filtered subspace iteration;

*Iterative algorithms for solving large linear system of equations*: Krylov subspace, GMRES, MinRES, Conjugate gradient;

*Mathematical theory of mixing schemes used for DFT*: Kohn-Sham fixed-point problem, SCF convergence, Simple mixing, Quasi-Newton approaches (Broyden, Anderson, Pulay, Kerker methods);

*Geometry Optimization methods*: Steepest descent, BFGS, L-BFGS, FIRE

Part III – *Scalable methods for DFT calculations using finite-elements (DFT-FE)*: Introduction to heterogenous architectures of today, Importance of accurate and efficient large-scale DFT calculations with generic boundary conditions, Local real-space variational formulation for DFT, Finite-element (FE) discretization of DFT equations, Mixed precision based subspace iteration algorithms for FE discretized eigenproblem on hybrid CPU-GPU architectures, Generalised force approach for computing atomic forces and stresses in DFT-FE

**Prerequisites:** Background in Numerical Methods/Numerical Linear Algebra/Scalable systems/Quantum Mechanics, Working knowledge of any DFT code, and/or consent from instructor

**Evaluation:** Student presentations on the above topics, Coding assignments, Course projects involving numerical implementations on HPC architectures

**References/Textbooks:**

- Richard M Martin, Electronic Structure – Basic Theory and Practical Methods, Second Edition, Cambridge University Press, 2020.
- Zhaojun Bai, James Demmel, Jack Dongarra, Axel Ruhe, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM Publishers. http://www.netlib.org/utk/people/JackDongarra/etemplates/index.html
- Lin Lin, Jianfeng Lu, A Mathematical Introduction to Electronic Structure Theory, SIAM Publishers
- Research Papers, Material/Notes Provided by Instructor

**DS 397 (JAN) 2:1 Topics in Embedded Computing**

*S K Nandy*

Introduction to embedded processing, dataflow architectures, architecture of embedded SoC platforms, dataflow process networks, compiling techniques/optimizations for stream processing, architecture of runtime reconfigurable SoC platforms, simulation, design space exploration and synthesis of applications on runtime reconfigurable SoC platforms, additional topics including but not limited to computation models for coarse grain reconfigurable architectures (CGRA), readings and case study of REDEFINE architecture, compiler back-ends for CGRAs.

Pre-requisites: Basic knowledge of digital electronics, computer organization and design, computer architecture, data structures and algorithms, and consent of instructor.

* Current literature.

**DS 216 (JAN) Machine Learning for Data Science**

*Vaanathi Sundaresan*

**Course description:** This four-credit course aims to cover machine learning techniques and statistical methods required for planning, developing and evaluating methods, especially applicable for various data analysis tasks. The course would also require students to implement programming assignments/projects related to these topics.

**Learning objectives:**

- To understand the machine learning concepts and choose appropriate techniques/methods for various tasks.
- To be able to build a data analytics pipeline suited for various real-world applications for various modalities – e.g., audio/speech/sensory signal processing, image analysis applications such as object detection, tracking or counting, business/commercial survey data analysis etc.
- Evaluate the performance of the method with respect to a gold standard target and analyze the competency of the method.
- Determine the significance in the improvement/change in the performance of the method given the population/dataset size in the real-world scenarios.

**Topics:**

**Foundations of machine learning:**

- **Review of ML fundamentals:** Un/semi/self-supervised learning, feature-based clustering, model fitting, linear regression, generalized linear models, discriminative models: logistic regression, discriminant analysis basics, regularization. (2 weeks)
- **Unsupervised/supervised ML techniques:** Clustering techniques; Expectation maximization (EM) – K-nearest neighbor classifiers, Gaussian mixture models, generalized EM; Representation learning; Supervised ML methods: kernel-based methods – support vector machines, ensemble methods: classification and regression trees (CART), boosting/bootstrap aggregation, Bayesian networks – hidden Markov models, conditional random fields. (4.5 weeks)
- **Dimensionality reduction techniques:** Principal component analysis (PCA), linear discriminant analysis (LDA), t-distributed stochastic neighbor embedding (t-SNE), independent component analysis (ICA). (1.5 weeks)
- **Deep learning basics:** Computational graphs, feedforward networks, loss functions, convolutional neural networks, backpropagation, optimization, feature saliency and visualization, encoder-decoder models, graph-based models, generative models. (2 weeks)

**ML applications and statistical evaluation:**

- **Classification, segmentation and decision-making:** Template matching, correlation – audio/speech signals; Regression and classification on publicly available sensory/survey data, image segmentation & classification – machine learning classifiers, feature-based and rule-based decision making, uncertainty estimation. (1 week)
- **Evaluation of analysis tasks:** Evaluation metrics, segmentation evaluation metrics (IoU, Dice, Jaccard indices, Hausdorff distance measures), classification evaluation metrics (confusion matrix, sensitivity, specificity, accuracy), registration metrics (MSE, MAE). (1 week)
- **Testing statistical significance of ML applications:**

**Practicals:**

- Implementation of linear and logistic regressions, least square fitting, under/overfitting, regularization, LDA
- Scikit-learn: ML classifiers, comparison of unsupervised/supervised ML classifiers on data, feature ranking, feature reduction. PyTorch/TensorFlow: deep learning parameter tuning, training regime, feature visualization/saliency. Python: Statistical tests (t-test/ANOVA).
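
As a small illustration of the evaluation metrics named under the topics above, the following sketch computes Dice, IoU (Jaccard index), sensitivity, specificity, and accuracy from binary predictions and a binary gold-standard target. It is a plain-Python example with hypothetical function names; returning 1.0 when a denominator is zero is a convention assumed here, and other conventions exist.

```python
def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) for two equal-length binary sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    tn = sum(1 for p, t in zip(pred, target) if not p and not t)
    return tp, fp, fn, tn


def _ratio(num, den):
    # Assumed convention: an empty denominator counts as a perfect score.
    return num / den if den else 1.0


def evaluate(pred, target):
    """Compute overlap and classification metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Example: five pixels (or samples) with two true positives.
metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Dice and IoU are monotonically related (Dice = 2 IoU / (1 + IoU)), so they rank methods identically on a single case even though their values differ.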

**Prerequisites:** Basic knowledge of linear algebra and probability, or consent from the instructor. Good proficiency in programming.

**Evaluation:** Assignments, ML application project, mid-term, final exam

**References:**

- C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- I. Goodfellow, Y. Bengio and A. Courville: Deep Learning, 2016.
- Jerome H. Friedman, Robert Tibshirani and Trevor Hastie, The Elements of Statistical Learning, Springer, 2001.
- R.O.Kuehl, Design of experiments : statistical principles of research design and analysis, 2000.
- Research papers, material/notes provided by instructor.