Overview
Cloud computing is a key distributed systems paradigm that has grown popular in the last few years. Cloud technologies are pervasive, touching our daily lives any time we access the world wide web, use a mobile app, or make a retail purchase. Clouds are also the de facto infrastructure for "Big Data" applications. While innovative Cloud services are offered by information technology companies, Cloud computing is also grounded in foundational distributed systems and scalable software systems principles, and is an active area of research by the academic community.
This introductory course on Cloud computing will teach both the fundamental concepts of how and why Cloud systems work, and the Cloud technologies that manifest these concepts, such as those from Amazon AWS, Microsoft Azure, and OpenStack. Students will learn distributed systems concepts like virtualization, data parallelism, the CAP theorem, and performance analysis at scale. They will also get a practitioner's view by learning "Big Data" programming patterns such as Map-Reduce (Hadoop), Vertex-centric graphs (Giraph) and Continuous Dataflows (Storm), and NoSQL storage systems to build Cloud applications. Besides a hands-on project on Cloud infrastructure, the course will include research readings and guest lectures from industry.
Students who perform well in this course will be eligible to undertake their final year M.Tech./M.E. project in the DREAM:Lab under the instructor's supervision.
Intended Learning Outcomes (ILO)
At the end of the course, students will have achieved the following learning objectives.
- Parallel and Distributed Systems Context: Classify and describe the architecture and taxonomy of parallel and distributed computing, including shared and distributed memory, and data and task parallel computing. Explain and contrast the role of Cloud computing within this space.
- Cloud Virtualization, Abstractions and Enabling Technologies: Explain virtualization and its role in elastic computing. Characterize the distinctions between Infrastructure, Platform and Software as a Service (IaaS, PaaS, SaaS) abstractions, and Public and Private Clouds, and analyze their advantages and disadvantages. Describe service oriented architectures that are foundational to the WWW.
- Programming Patterns for "Big Data" Applications on Cloud: Examine the design of task and data parallel distributed algorithms for Clouds and use them to construct Cloud applications. Demonstrate the use of Map-Reduce, Vertex-Centric and Continuous Dataflow programming models. Apply Amdahl's law and data locality principles to analyze and characterize the potential speedup of Cloud applications.
- Application Execution Models on Clouds: Compare synchronous and asynchronous execution patterns. Design and implement Cloud applications that can scale up on a VM and out across multiple VMs. Demonstrate the use of data marshalling/unmarshalling for executing remote Cloud applications, and use asynchronous queues for coordination and synchronization of concurrent tasks. Illustrate the use of NoSQL Cloud storage for information storage and retrieval.
- Performance, scalability and consistency on Clouds: Illustrate the use of load balancing techniques for stateful and stateless applications. Describe and compare different performance metrics for evaluating Cloud applications and demonstrate their use for application measurement. Explain the distinctions between Consistency, Availability and Partition tolerance (CAP theorem), and discuss the types of Cloud applications that exhibit these features.
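As an illustration of the speedup analysis named in the third outcome, Amdahl's law bounds the speedup of an application as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The sketch below is only illustrative and is not course material: the function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized.

    parallel_fraction: share of the total work that can run in parallel (0..1).
    workers: number of workers (e.g. cores or VMs) sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; the parallel part is split across workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical application in which 90% of the work is parallelizable.
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"{n:5d} workers -> speedup {amdahl_speedup(0.9, n):.2f}")
```

With p = 0.9 the speedup can never exceed 1 / (1 - p) = 10, however many VMs are added, which is why the serial fraction matters when scaling out.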
Pre-requisites
While this is an introductory course in Cloud computing, it builds upon students' prior knowledge of computing, software systems and programming. Students must be familiar with Data Structures (e.g. Arrays, Queues, Trees, Hashmaps, Graphs) and Algorithms (e.g. Sorting, Searching, Graph traversal, String algorithms). Students must be comfortable with programming these data structures and algorithms, preferably using Java v5 or above, or Python. Practical experience with network (socket) programming is encouraged. One of the following courses, or prior approval by the instructor, is required: SE 286 (Data Structures & Programming), SE 292 (HPC), SE 295 (Parallel Programming), E0 251 (Data Structures and Algorithms), E0 253 (Operating Systems) or E0 264 (Distributed Computing Systems).
Teaching and Learning Activities
- Lectures
- Lectures will form the primary teaching activity, the schedule for which is outlined below. Lecture material will address the intended learning objectives, and loosely follow the corresponding chapters identified in the course text book. The lecture material will be made available before the class, and the lectures are meant to be interactive, with learning taking place through discussion in class. A mailing list/online forum will be available for discussions outside the classroom, between students and with the faculty. Student engagement in class and in the online forum will count towards the assessment of student participation, which carries 5% of the assessment weightage.
- Guest Lectures
- Structured lectures will be supplemented by several guest lectures by practitioners and researchers from industry and academia. These will serve to show the practical relevance of the course content and also the open problems that remain.
- Homework
- Homework serves the dual purpose of being a learning activity as well as a means of assessment. There will be three homework assignments involving short answer questions and problem solving, each counting towards 10% of the assessment weightage. The students will be expected to understand and apply concepts learnt from class lectures and the text book, as well as use online resources, to complete these assignments.
- Research Reading & Summarization
- Cloud computing is an active area of research and it is important to understand both the gaps in technology and the novel research in this field. Graduate students also need to be able to build upon concepts learnt in the course to explore active research. Students will be expected to read one paper from a selection of suggested ones, and submit a 2-page report that summarizes: the key hypothesis or problem being solved, the novel research techniques being used, the experiments or analysis to support the hypothesis, and a justification/critique of the positive/negative aspects of the paper. This carries a 10% weightage for assessment.
- Project
- A student software project will encourage problem-based learning. The project will apply foundational concepts discussed in the lectures to practical applications. Students will be given several project topics to choose from or may propose a topic of their own. Students will be provided with Cloud computing resources on the OpenStack Private Cloud available at the DREAM:Lab. The project may be performed in teams of up to two students, with both participants expected to work cohesively and contribute equally to the design, development and analysis. A report and a demo are required at both the mid-term and the final stage. The total assessment weightage for the project is 30%.
- Exam
- There will be two exams for the course, a mid-term and a final exam, with 10% and 15% assessment weightages respectively. The mid-term exam will assess the intended learning objectives covered until the seventh week of classes, while the final exam will assess all the intended learning objectives from the entire course.
Assessment
The total assessment score for the course is based on a 1000 point scale. Of this, the weightage to different activities will be as follows:
30% Homework | Three homework assignments (100 points each). |
10% Research Summary | Reading and summary report on one research paper (100 points). |
30% Project | One ungraded but required assignment, one mid-term project review and demo (100 points), and one final project review and demo (200 points). |
25% Exams | One Mid-term (100 points) and one Final (150 points) exam. |
5% Participation | Participation (i.e. not just "attendance") in classroom discussions and online forum for the course (50 points). |
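The arithmetic behind this breakdown can be checked with a short sketch. The weights and point values below are copied from the table above; the function name `final_percentage` and the sample student scores are hypothetical and only illustrate how points map to an overall percentage on the 1000 point scale.

```python
# Activity -> (weight in percent, maximum points), as listed in the table above.
ASSESSMENT = {
    "Homework": (30, 300),          # three assignments, 100 points each
    "Research Summary": (10, 100),
    "Project": (30, 300),           # mid-term (100) + final (200) review and demo
    "Exams": (25, 250),             # mid-term (100) + final (150)
    "Participation": (5, 50),
}
TOTAL_POINTS = 1000


def final_percentage(earned: dict) -> float:
    """Convert earned points per activity into an overall percentage."""
    total = 0
    for activity, points in earned.items():
        _, max_points = ASSESSMENT[activity]
        if not 0 <= points <= max_points:
            raise ValueError(f"{activity}: points must be in [0, {max_points}]")
        total += points
    return 100.0 * total / TOTAL_POINTS


# Consistency checks: weights add up to 100% and points to the 1000 point scale.
assert sum(weight for weight, _ in ASSESSMENT.values()) == 100
assert sum(points for _, points in ASSESSMENT.values()) == TOTAL_POINTS

if __name__ == "__main__":
    # Hypothetical scores for one student, for illustration only.
    sample = {"Homework": 240, "Research Summary": 80, "Project": 270,
              "Exams": 200, "Participation": 45}
    print(f"Overall: {final_percentage(sample):.1f}%")
```

Each percentage point of weightage corresponds to 10 points, so the 25% exam weightage is the 100 point mid-term plus the 150 point final.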
Academic Integrity
Students must uphold IISc's Academic Integrity guidelines. While these are common sense, it is helpful to review them since failure to follow them will lead to sanctions and penalties, including a reduced or failing grade in the course. Severe cases of academic violations will be reported to the Institute and may lead to expulsion.
Learning takes place both within and outside the class. Hence, discussions between students and reference to online material are encouraged as part of the course to achieve the intended learning objectives. However, while you may learn from any valid source, you must form your own ideas and complete problems and assignments by yourself. All work submitted by students as part of their academic assessment must be their own.
- Plagiarism
- Verbatim reproduction of material from external sources (web pages, books, papers, etc.) is not acceptable. If you are paraphrasing external content (or even your own prior work) or were otherwise influenced by them while completing your assignments, projects or exams, you must clearly acknowledge them. When in doubt, add a citation!
- Cheating
- While you may discuss lecture topics and broad outlines of homework problems and projects with others, you cannot collaborate in completing the assignments, copy someone else's solution or falsify results. You cannot use notes or unauthorized resources during exams, or copy from others. The narrow exception to collaboration is between team-mates when completing the project, and even there, the contribution of each team member for each project assignment should be clearly documented.
- Classroom Behavior
- Ensure that the course atmosphere, whether in class, outside it or on the online forum, is conducive to learning. Participate in discussions but do not dominate or be abusive. There are no “stupid” questions. Be considerate of your fellow students and avoid disruptive behavior.
Resources
Textbook | Select topics from Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Jack Dongarra and Geoffrey Fox, Morgan Kaufmann, 2011 (Tata Book House) |
Online Forum | se252.jan15@mailman.serc.iisc.in | Mailman Info Webpage (To Be Activated) |
Teaching & Office Hours
Lecture | Tue/Thu 2-3:30PM, SERC 202 |
Office Hours | Fri 4-5PM or by appointment (i.e., send email), SERC 411 |
Tentative Schedule
The schedule is based on two 1.5 hour lectures each week, on Tue/Thu 2-3:30PM, and 3 hours of independent practical exercise.
Lecture No. | Date | Topics Covered & Assignments | Slides |
---|---|---|---|
1 | Tue 6 Jan | Course Introduction Assignment: Sign up on mailing list. | L1 |
2 | Thu 8 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (Web Services & SOA) | L2 |
3 | Tue 13 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/PaaS/SaaS) Project 0 available. | L3 |
4 | Tue 20 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (Virtualization) | L4 |
5 | Thu 22 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/AWS) Project 0 submission due. Project topics available. | L5/6 |
6 | Tue 27 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/AWS/OpenStack) Homework A Available | L5/6 |
7 | Thu 29 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (PaaS) | L7 |
8 | Fri 30 Jan | ILO 1: Parallel and Distributed Systems Context (Flynn's Taxonomy) | L8+9 |
9 | Tue 3 Feb | ILO 1: Parallel and Distributed Systems Context (Distributed Comp Models) | L8+9 |
10 | Thu 5 Feb | ILO 1: Parallel and Distributed Systems Context (Scalability Metrics) Project topics and teams decided. Research paper list available Homework A Submission Due on Feb 6. | L10 |
11 | Tue 10 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (How to review a paper? Task, Data and Pipeline Parallelism) Research Paper Selection Completed by Thu 12 Feb. | L11+12 |
* | Thu 12 Feb | OpenStack and Scheduling (Vedsar) | OpenStack |
12 | Thu 19 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Task, Data and Pipeline Parallelism) | L11+12 |
13 | Tue 24 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Map-Reduce and Hadoop/HDFS) | L13+14 |
14 | Thu 26 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Map-Reduce and Hadoop/HDFS) | L13+14 |
15 | Tue 3 Mar | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Graph Analytics and Giraph) | L15+16 |
* | Thu 5 Mar | Mid-term Exam | |
16 | Tue 10 Mar | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Graph Analytics and Giraph) Research Paper mid-term draft submission due. | L15+16 |
17 | Thu 12 Mar | ILO 4: Application Execution Models on Clouds (Cloud Scheduling Characteristics) Project Mid-term Report Submission Due. | L17 |
* | Fri 13 Mar | Project Mid-term Review and Demo from 3-6PM. | |
18 | Tue 17 Mar | ILO 4: Application Execution Models on Clouds (List Scheduling and DAG Scheduling) | L18+19 |
19 | Thu 19 Mar | ILO 4: Application Execution Models on Clouds (List Scheduling and DAG Scheduling) | L18+19 |
20 | Tue 24 Mar | ILO 4: Application Execution Models on Clouds (Dynamic Scheduling) | L20 |
21 | Tue 31 Mar | ILO 5: Performance, scalability and consistency on Clouds (Cloud Performance Benchmarks & Monitoring) Homework B Available | L21+22 |
22 | Tue 7 Apr | ILO 5: Performance, scalability and consistency on Clouds (Cloud Performance Benchmarks & Monitoring) | L21+22 |
23 | Thu 9 Apr | ILO 5: Performance, scalability and consistency on Clouds (CAP Theorem) Homework B Due on Fri 10 Apr | L23+24 |
24 | Tue 14 Apr | ILO 5: Performance, scalability and consistency on Clouds (CAP Theorem) | L23+24 |
25 | Thu 16 Apr | ILO 5: Performance, scalability and consistency on Clouds (BASE & Weak Consistency) Homework C Available. Due Sat 25 Apr. | L25 |
* | Thu 23 Apr | Final Research Paper Due. Final Project Report Submission Due. | |
* | Fri 24 Apr | Final Project Review and Demo between 3-6PM. | |
* | Sat 25 Apr | Homework C Submission Due | |
* | Mon 27 Apr | Final Exam from 2-5PM | |
Assignments
All assignments, unless noted otherwise, are due by midnight on the mentioned date.
Homeworks
Email your homework to simmhan@serc.iisc.in with subject line "SE252_JAN2015_HW-A_StudentName". Replace "HW-A" with "HW-B" or "HW-C" for those submissions, and "StudentName" with your first name.
- Homework A (Updated) has been posted on Fri 30 Jan, due on Fri 6 Feb.
- Homework B has been posted on Tue 31 Mar, due on Fri 10 Apr.
- Homework C has been posted on Thu 16 Apr, due on Sat 25 Apr.
Projects
Email your project submissions to simmhan@serc.iisc.in with subject line "SE252_JAN2015_PROJ-0_StudentName". Replace "PROJ-0" with "PROJ-MIDTERM" or "PROJ-FINAL" for those submissions.
Revised Project 0 is posted. Revised startup code to begin Project 0 is available. Project 0 is due on Thu 22 Jan.
List of Projects: Students can also propose a different topic. Topic and team (max. 2 students) selection due on Thu 5 Feb.
Mid-term project report due on Thu 12 Mar.
Mid-term project review and demo on Fri 13 Mar from 3-6PM.
Final project report due on Thu 23 Apr.
Final project review and demo on Fri 24 Apr from 3-6PM.
ID | Type | Title | Team |
---|---|---|---|
1 | App-aaS | IISc Campus Map using OpenStreetMaps | Arnab Sen, Chetan Mahajan |
2 | PaaS | Edge+Cloud CEP Processing for IoT | Niranjan Singh |
3 | IaaS | USB Cloud Simulator for AWS | Diptaparna Biswas |
4 | PaaS | Online Analytics/Viz on Storm+Hive | Vamshi, Anshu |
5 | Analytics-aaS | Time-series graph algorithms using NELL | Varshitha, Ravikant |
Research Summary
Email your research summary to simmhan@serc.iisc.in with subject line "SE252_JAN2015_RES-MIDTERM_StudentName". Replace "RES-MIDTERM" with "RES-FINAL" for the final submission.
Research paper list assigned on Tue 17 Feb. Research Paper mid-term draft submission is due on Tue 10 Mar.
Final Research Paper Summary due by Thu 23 Apr.
[1] Arnab Sen: Applications of Social Networks and Crowdsourcing for Disaster Management Improvement, Besaleva, L.I., Weaver, A.C., International Conference on Social Computing (SocialCom), 2013, http://dx.doi.org/10.1109/SocialCom.2013.38
[2] Chetan Mahajan: Jie Li, Marty Humphrey, Deborah A. Agarwal, Keith R. Jackson, Catharine van Ingen, Youngryel Ryu: eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform. IPDPS 2010, http://dx.doi.org/10.1109/IPDPS.2010.5470418
[3] Niranjan Singh: CloneCloud: elastic execution between mobile device and cloud, Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti, In Conference on Computer systems (EuroSys), 2011, http://doi.acm.org/10.1145/1966445.1966473
[4] Diptaparna Biswas: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, Cesar A. F. De Rose and Rajkumar Buyya, Software: Practice and Experience, Volume 41, Issue 1, pages 23-50, January 2011, http://dx.doi.org/10.1002/spe.995
[5] Vamshi: Meteor Shower: A Reliable Stream Processing System for Commodity Data Centers, Huayong Wang, Li-Shiuan Peh, Koukoumidis, E., Shao Tao, Mun Choon Chan, IPDPS, 2012, http://dx.doi.org/10.1109/IPDPS.2012.108
[6] Anshu: Christopher Olston, Greg Chiou, Laukik Chitnis, Francis Liu, Yiping Han, Mattias Larsson, Andreas Neumann, Vellanki B.N. Rao, Vijayanand Sankarasubramanian, Siddharth Seth, Chao Tian, Topher ZiCornell, and Xiaodan Wang. 2011. Nova: continuous Pig/Hadoop workflows. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (SIGMOD '11), http://dx.doi.org/10.1145/1989323.1989439
[7] Varshitha: Zuhair Khayyat, Karim Awara, Amani Alonazi, Hani Jamjoom, Dan Williams, and Panos Kalnis. 2013. Mizan: a system for dynamic load balancing in large-scale graph processing. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13), http://dx.doi.org/10.1145/2465351.2465369
[8] Ravikant: Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. 2013. A distributed graph engine for web scale RDF data. In Proceedings of the 39th international conference on Very Large Data Bases (PVLDB'13), http://dx.doi.org/10.14778/2535570.2488333
Acknowledgement
The course syllabus has been designed based on the Curriculum Initiative on Parallel and Distributed Computing by the NSF/IEEE-TCPP, and the Computer Science Curricula 2013, by the ACM/IEEE-Computer Society's Joint Task Force on Computing Curricula.