Supercomputer Education and Research Centre (SERC)   |   Indian Institute of Science (IISc), Bangalore

Course website for past years: [Jan 2014]

Introduction to Cloud Computing

Instructor: Yogesh Simmhan
Number: SE252
Credits: 3:1
Semester: Jan, 2015
Schedule: Tue/Thu 2-3:30PM (First class on Tue 6 Jan, 2015)
Room: SERC 202
Pre-requisites: Knowledge of Data Structures, Programming and Algorithm concepts. Programming experience required, preferably in Java 5+ or Python. One of the following courses or prior instructor approval is required: SE 286 (Data Structures & Programming), SE 292 (HPC), SE 295 (Parallel Programming), E0 251 (Data Structures and Algorithms), E0 253 (Operating Systems) or E0 264 (Distributed Computing Systems).

Overview

Cloud computing is a key distributed systems paradigm that has grown popular in the last few years. Cloud technologies are pervasive, touching our daily lives any time we access the world wide web, use a mobile app, or make a retail purchase. Clouds are also the de facto infrastructure for "Big Data" applications. While innovative Cloud services are offered by information technology companies, Cloud computing is also grounded in foundational distributed systems and scalable software systems principles, and is an active area of research by the academic community.

This introductory course on Cloud computing will teach both the fundamental concepts of how and why Cloud systems work, and the Cloud technologies that manifest these concepts, such as those from Amazon AWS, Microsoft Azure, and OpenStack. Students will learn distributed systems concepts like virtualization, data parallelism, the CAP theorem, and performance analysis at scale. They will also get a practitioner's view by learning "Big Data" programming patterns such as Map-Reduce (Hadoop), Vertex-centric graphs (Giraph) and Continuous Dataflows (Storm), and NoSQL storage systems to build Cloud applications. Besides a hands-on project on Cloud infrastructure, the course will include research readings and guest lectures from industry.

Students who perform well in this course will be eligible to undertake their final year M.Tech./M.E. project in the DREAM:Lab under the instructor's supervision.

Intended Learning Outcomes (ILO)

At the end of the course, students will have achieved the following learning objectives.

  1. Parallel and Distributed Systems Context: Classify and describe the architecture and taxonomy of parallel and distributed computing, including shared and distributed memory, and data and task parallel computing. Explain and contrast the role of Cloud computing within this space.
  2. Cloud Virtualization, Abstractions and Enabling Technologies: Explain virtualization and its role in elastic computing. Characterize the distinctions between Infrastructure, Platform and Software as a Service (IaaS, PaaS, SaaS) abstractions, and Public and Private Clouds, and analyze their advantages and disadvantages. Describe service oriented architectures that are foundational to the WWW.
  3. Programming Patterns for "Big Data" Applications on Cloud: Examine the design of task and data parallel distributed algorithms for Clouds and use them to construct Cloud applications. Demonstrate the use of Map-Reduce, Vertex-Centric and Continuous Dataflow programming models. Apply Amdahl's law and data locality principles to analyze and characterize the potential speedup of Cloud applications.
  4. Application Execution Models on Clouds: Compare synchronous and asynchronous execution patterns. Design and implement Cloud applications that can scale up on a VM and out across multiple VMs. Demonstrate the use of data marshalling/unmarshalling for executing remote Cloud applications, and use asynchronous queues for coordination and synchronization of concurrent tasks. Illustrate the use of NoSQL Cloud storage for information storage and retrieval.
  5. Performance, scalability and consistency on Clouds: Illustrate the use of load balancing techniques for stateful and stateless applications. Describe and compare different performance metrics for evaluating Cloud applications and demonstrate their use for application measurement. Explain the distinctions between Consistency, Availability and Partition tolerance (CAP theorem), and discuss the types of Cloud applications that exhibit these features.
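
As an illustration of the speedup analysis named in ILO 3, the following is a minimal sketch of Amdahl's law in Python. It is not course material; the function names `amdahl_speedup` and `max_speedup` and the parameter names are assumptions chosen for this example. It assumes a fixed-size workload in which a fraction of the work parallelizes perfectly across workers and the rest stays serial.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    workers: number of processors or VMs sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial part is unchanged; parallel part shrinks by the worker count.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction
```

For example, with 90% of the work parallelizable, `amdahl_speedup(0.9, 8)` is roughly 4.7, and `max_speedup(0.9)` shows the speedup can never exceed about 10 however many VMs are added.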

Pre-requisites

While this is an introductory course in Cloud computing, it builds upon students' prior knowledge of computing, software systems and programming. Students must be familiar with Data Structures (e.g. Arrays, Queues, Trees, Hashmaps, Graphs) and Algorithms (e.g. Sorting, Searching, Graph traversal, String algorithms). Students must be comfortable with programming these data structures and algorithms, preferably using Java v5 or above, or Python. Practical experience with network (socket) programming is encouraged. One of the following courses, or prior approval by the instructor, is required: SE 286 (Data Structures & Programming), SE 292 (HPC), SE 295 (Parallel Programming), E0 251 (Data Structures and Algorithms), E0 253 (Operating Systems) or E0 264 (Distributed Computing Systems).

Teaching and Learning Activities

Lectures
Lectures will form the primary teaching activity; their schedule is outlined below. Lecture material will address the intended learning objectives and loosely follow the corresponding chapters of the course textbook. The material will be made available before class, and the lectures are meant to be interactive, with learning taking place through discussion in class. A mailing list/online forum will be available for discussions outside the classroom, among students and with the faculty. Student engagement in class and in the online forum will count towards the participation score, which carries 5% of the assessment weightage.
Guest Lectures
Structured lectures will be supplemented by several guest lectures by practitioners and researchers from industry and academia. These will serve to show the practical relevance of the course content and also the open problems that remain.
Homework
Homework serves the dual purpose of being a learning activity as well as a means of assessment. There will be three homework assignments involving short answer questions and problem solving, each counting for 10% of the assessment weightage. Students will be expected to understand and apply concepts learnt from class lectures and the textbook, as well as use online resources, to complete these assignments.
Research Reading & Summarization
Cloud computing is an active area of research and it is important to understand both the gaps in technology and the novel research in this field. Graduate students also need to be able to build upon concepts learnt in the course to explore active research. Students will be expected to read one paper from a selection of suggested ones, and submit a 2 page report that summarizes: the key hypothesis or problem being solved, the novel research techniques being used, the experiments or analysis to support the hypothesis, and a justification/critique of the positive/negative aspects of the paper. This carries a 10% weightage for assessment.
Project
A student software project will encourage problem-based learning. The project will apply foundational concepts discussed in the lectures to practical applications. Students will be given several project topics to choose from or may propose a topic of their own. Students will be provided with Cloud computing resources on the OpenStack private Cloud available at the DREAM:Lab. The project may be performed in teams of up to two students, with both participants expected to work cohesively and contribute equally to the design, development and analysis. A report and a demo are required at both the mid-term and the final stage. The total assessment weightage for the project is 30%.
Exam
There will be two exams for the course, a mid-term and a final exam, with 10% and 15% assessment weightages respectively. The mid-term exam will assess the intended learning objectives covered until the seventh week of classes, while the final exam will assess all the intended learning objectives from the entire course.

Assessment

The total assessment score for the course is based on a 1000 point scale. Of this, the weightage to different activities will be as follows:

30% Homework: Three homework assignments (100 points each).
10% Research Summary: Reading and summary report on one research paper (100 points).
30% Project: One ungraded but required assignment, one mid-term project review and demo (100 points), and one final project review and demo (200 points).
25% Exams: One Mid-term (100 points) and one Final (150 points) exam.
5% Participation: Participation (i.e. not just "attendance") in classroom discussions and online forum for the course (50 points).
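
The point allocation above can be checked with a short Python sketch. The maximum points come from the list above; the dictionary keys and the helper names `total_score` and `percentage` are hypothetical, introduced only for this illustration, and this is not an official grading tool.

```python
from typing import Mapping

# Maximum points per assessed activity, taken from the Assessment list.
# The key names are illustrative, not part of the course material.
MAX_POINTS = {
    "homework_a": 100,
    "homework_b": 100,
    "homework_c": 100,
    "research_summary": 100,
    "project_midterm": 100,
    "project_final": 200,
    "exam_midterm": 100,
    "exam_final": 150,
    "participation": 50,
}


def total_score(earned: Mapping[str, float]) -> float:
    """Sum earned points, validating each entry against its maximum."""
    total = 0.0
    for activity, points in earned.items():
        if activity not in MAX_POINTS:
            raise KeyError(f"unknown activity: {activity}")
        if not 0 <= points <= MAX_POINTS[activity]:
            raise ValueError(f"{activity}: points out of range")
        total += points
    return total


def percentage(earned: Mapping[str, float]) -> float:
    """Earned points as a percentage of the 1000-point scale."""
    return 100.0 * total_score(earned) / sum(MAX_POINTS.values())
```

The maxima sum to 1000, matching the stated scale: 300 for homework, 100 for the research summary, 300 for the project, 250 for exams and 50 for participation.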

Academic Integrity

Students must uphold IISc's Academic Integrity guidelines. While these are common sense, it is helpful to review them since failure to follow them will lead to sanctions and penalties, including a reduced or failing grade in the course. Severe violations will be reported to the Institute and may lead to expulsion.

Learning takes place both within and outside the class. Hence, discussions between students and reference to online material are encouraged as part of the course to achieve the intended learning objectives. However, while you may learn from any valid source, you must form your own ideas and complete problems and assignments by yourself. All work submitted by students as part of their academic assessment must be their own.

Plagiarism
Verbatim reproduction of material from external sources (web pages, books, papers, etc.) is not acceptable. If you are paraphrasing external content (or even your own prior work) or were otherwise influenced by them while completing your assignments, projects or exams, you must clearly acknowledge them. When in doubt, add a citation!
Cheating
While you may discuss lecture topics and broad outlines of homework problems and projects with others, you cannot collaborate in completing the assignments, copy someone else's solution or falsify results. You cannot use notes or unauthorized resources during exams, or copy from others. The narrow exception to collaboration is between team-mates when completing the project, and even there, the contribution of each team member to each project assignment should be clearly documented.
Classroom Behavior
Ensure that the course atmosphere, both in the class, outside and on the online forum, is conducive for learning. Participate in discussions but do not dominate or be abusive. There are no “stupid” questions. Be considerate of your fellow students and avoid disruptive behavior.

Resources

Textbook: Select topics from Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Jack Dongarra and Geoffrey Fox, Morgan Kaufmann, 2011 (Tata Book House)
Online Forum: se252.jan15@mailman.serc.iisc.in | Mailman Info Webpage (To Be Activated)

Teaching & Office Hours

Lecture: Tue/Thu 2-3:30PM, SERC 202
Office Hours: Fri 4-5PM or by appointment (i.e., send email), SERC 411

Tentative Schedule

The schedule is based on two 1.5-hour lectures each week (Tue/Thu 2-3:30PM) and 3 hours of independent practical exercise.

Lecture No. | Date | Topics Covered & Assignments | Slides
1 | Tue 6 Jan | Course Introduction. Assignment: Sign up on mailing list. | L1
2 | Thu 8 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (Web Services & SOA) | L2
3 | Tue 13 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/PaaS/SaaS). Project 0 available. | L3
4 | Tue 20 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (Virtualization) | L4
5 | Thu 22 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/AWS). Project 0 submission due. Project topics available. | L5/6
6 | Tue 27 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (IaaS/AWS/OpenStack). Homework A available. | L5/6
7 | Thu 29 Jan | ILO 2: Cloud Virtualization, Abstractions and Enabling Technologies (PaaS) | L7
8 | Fri 30 Jan | ILO 1: Parallel and Distributed Systems Context (Flynn's Taxonomy) | L8+9
9 | Tue 3 Feb | ILO 1: Parallel and Distributed Systems Context (Distributed Comp Models) | L8+9
10 | Thu 5 Feb | ILO 1: Parallel and Distributed Systems Context (Scalability Metrics). Project topics and teams decided. Research paper list available. Homework A submission due on Feb 6. | L10
11 | Tue 10 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (How to review a paper? Task, Data and Pipeline Parallelism). Research paper selection completed by Thu 12 Feb. | L11+12
* | Thu 12 Feb | OpenStack and Scheduling (Vedsar) | OpenStack
12 | Thu 19 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Task, Data and Pipeline Parallelism) | L11+12
13 | Tue 24 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Map-Reduce and Hadoop/HDFS) | L13+14
14 | Thu 26 Feb | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Map-Reduce and Hadoop/HDFS) | L13+14
15 | Tue 3 Mar | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Graph Analytics and Giraph) | L15+16
* | Thu 5 Mar | Mid-term Exam | -
16 | Tue 10 Mar | ILO 3: Algorithms and Programming Patterns for Cloud Applications (Graph Analytics and Giraph). Research paper mid-term draft submission due. | L15+16
17 | Thu 12 Mar | ILO 4: Application Execution Models on Clouds (Cloud Scheduling Characteristics). Project mid-term report submission due. | L17
* | Fri 13 Mar | Project Mid-term Review and Demo from 3-6PM. | -
18 | Tue 17 Mar | ILO 4: Application Execution Models on Clouds (List Scheduling and DAG Scheduling) | L18+19
19 | Thu 19 Mar | ILO 4: Application Execution Models on Clouds (List Scheduling and DAG Scheduling) | L18+19
20 | Tue 24 Mar | ILO 4: Application Execution Models on Clouds (Dynamic Scheduling) | L20
21 | Tue 31 Mar | ILO 5: Performance, scalability and consistency on Clouds (Cloud Performance Benchmarks & Monitoring). Homework B available. | L21+22
22 | Tue 7 Apr | ILO 5: Performance, scalability and consistency on Clouds (Cloud Performance Benchmarks & Monitoring) | L21+22
23 | Thu 9 Apr | ILO 5: Performance, scalability and consistency on Clouds (CAP Theorem). Homework B due on Fri 10 Apr. | L23+24
24 | Tue 14 Apr | ILO 5: Performance, scalability and consistency on Clouds (CAP Theorem) | L23+24
25 | Thu 16 Apr | ILO 5: Performance, scalability and consistency on Clouds (BASE & Weak Consistency). Homework C available. Due Sat 25 Apr. | L25
* | Thu 23 Apr | Final Research Paper due. Final Project Report submission due. | -
* | Fri 24 Apr | Final Project Review and Demo between 3-6PM. | -
* | Sat 25 Apr | Homework C submission due. | -
* | Mon 27 Apr | Final Exam from 2-5PM | -

Assignments

All assignments, unless noted otherwise, are due by midnight on the mentioned date.

Homeworks

Email your homework to simmhan@serc.iisc.in with the subject line "SE252_JAN2015_HW-A_StudentName". Replace "HW-A" with "HW-B" or "HW-C" for those submissions, and "StudentName" with your first name.
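
As a small illustration of the subject-line convention above, the sketch below builds the string in Python. The helper name `homework_subject` and the sample name used in the usage note are hypothetical; only the "SE252_JAN2015_HW-A_StudentName" pattern and the three homework tags come from this page.

```python
# Homework tags named in the submission instructions.
VALID_TAGS = {"HW-A", "HW-B", "HW-C"}


def homework_subject(tag: str, first_name: str) -> str:
    """Build the email subject line for a homework submission."""
    if tag not in VALID_TAGS:
        raise ValueError(f"tag must be one of {sorted(VALID_TAGS)}")
    name = first_name.strip()
    if not name:
        raise ValueError("first_name must not be empty")
    # Pattern from the page: SE252_JAN2015_<tag>_<first name>
    return f"SE252_JAN2015_{tag}_{name}"
```

For a student whose first name is Asha (a made-up example), `homework_subject("HW-B", "Asha")` returns "SE252_JAN2015_HW-B_Asha".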

  1. Homework A (Updated) has been posted on Fri 30 Jan, due on Fri 6 Feb.
  2. Homework B has been posted on Tue 31 Mar, due on Fri 10 Apr.
  3. Homework C has been posted on Thu 16 Apr, due on Sat 25 Apr.

Projects

Email your project submissions to simmhan@serc.iisc.in with the subject line "SE252_JAN2015_PROJ-0_StudentName". Replace "PROJ-0" with "PROJ-MIDTERM" or "PROJ-FINAL" for those submissions.

Revised Project 0 is posted. Revised startup code to begin Project 0 is available. Project 0 is due on Thu 22 Jan.

List of Projects: Students can also propose a different topic.
Topic and team (max. 2 students) selection due on Thu 5 Feb.
Mid-term project report due on Thu 12 Mar.
Mid-term project review and demo on Fri 13 Mar from 3-6PM.
Final project report due on Thu 23 Apr.
Final project review and demo on Fri 24 Apr from 3-6PM.
ID	Type	        Title					Team
---	-------         ---------------------------------------	-------------------------
1	App-aaS		IISc Campus Map using OpenStreetMaps	Arnab Sen, Chetan Mahajan
2	PaaS		Edge+Cloud CEP Processing for IoT 	Niranjan Singh
3	IaaS		USB Cloud Simulator for AWS		Diptaparna Biswas
4	PaaS		Online Analytics/Viz on Storm+Hive	Vamshi, Anshu
5	Analytics-aaS	Time-series graph algorithms using NELL	Varshitha, Ravikant
			

Research Summary

Email your research summary to simmhan@serc.iisc.in with the subject line "SE252_JAN2015_RES-MIDTERM_StudentName". Replace "RES-MIDTERM" with "RES-FINAL" for the final submission.

Research paper list assigned on Tue 17 Feb.
Research Paper mid-term draft submission is due on Tue 10 Mar.
Final Research Paper Summary due by Thu 23 Apr.
[1] Arnab Sen:
Applications of Social Networks and Crowdsourcing for Disaster Management Improvement, Besaleva, L.I., Weaver, A.C., International Conference on Social Computing (SocialCom), 2013, http://dx.doi.org/10.1109/SocialCom.2013.38 

[2] Chetan Mahajan:
Jie Li, Marty Humphrey, Deborah A. Agarwal, Keith R. Jackson, Catharine van Ingen, Youngryel Ryu:
eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform. IPDPS 2010, http://dx.doi.org/10.1109/IPDPS.2010.5470418 

[3] Niranjan Singh:
CloneCloud: elastic execution between mobile device and cloud, Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti, In Conference on Computer systems (EuroSys), 2011, http://doi.acm.org/10.1145/1966445.1966473 

[4] Diptaparna Biswas:
CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, Cesar A. F. De Rose and Rajkumar Buyya, Software: Practice and Experience, Volume 41, Issue 1, pages 23-50, January 2011, http://dx.doi.org/10.1002/spe.995

[5] Vamshi:
Meteor Shower: A Reliable Stream Processing System for Commodity Data Centers, Huayong Wang, Li-Shiuan Peh, E. Koukoumidis, Shao Tao, Mun Choon Chan, IPDPS, 2012, http://dx.doi.org/10.1109/IPDPS.2012.108

[6] Anshu:
Christopher Olston, Greg Chiou, Laukik Chitnis, Francis Liu, Yiping Han, Mattias Larsson, Andreas Neumann, Vellanki B.N. Rao, Vijayanand Sankarasubramanian, Siddharth Seth, Chao Tian, Topher ZiCornell, and Xiaodan Wang. 2011. Nova: continuous Pig/Hadoop workflows. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (SIGMOD '11), http://dx.doi.org/10.1145/1989323.1989439 

[7] Varshitha:
Zuhair Khayyat, Karim Awara, Amani Alonazi, Hani Jamjoom, Dan Williams, and Panos Kalnis. 2013. Mizan: a system for dynamic load balancing in large-scale graph processing. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13). http://dx.doi.org/10.1145/2465351.2465369 

[8] Ravikant:
Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. 2013. A distributed graph engine for web scale RDF data. In Proceedings of the 39th international conference on Very Large Data Bases (PVLDB'13), http://dx.doi.org/10.14778/2535570.2488333 

Acknowledgement

The course syllabus has been designed based on the Curriculum Initiative on Parallel and Distributed Computing by the NSF/IEEE-TCPP, and the Computer Science Curricula 2013, by the ACM/IEEE-Computer Society's Joint Task Force on Computing Curricula.