Abstract: Most programmers are used to writing programs on their desktops or laptops (or packaging software so that it runs on customer desktops). Some programmers are used to writing programs that run on servers in their enterprises to serve webpages, work with databases, and help manage the information systems of organizations. The cloud is a relatively recent phenomenon, where large amounts of cheap computational resources are available on demand, and programmers have to use these resources judiciously to achieve their objectives. In addition to being a new programming environment, the cloud has also popularized a new breed of programs. Most programs that run in the cloud operate on massive amounts of information (called "big data"), and perform data analysis and machine learning as integral parts of what they do. So, if you are a programmer used to programming individual machines, how do you program the cloud? What is different about the cloud? How do you deal with the fact that machines and disks can fail in the cloud? How do you deal with performance issues in the cloud? How do you deal with securing your data and your customers' data in the cloud? How do you deal with accidental or malicious disclosure of private information of consumers in the cloud? Do we have the right programming languages to deal with these issues? I will give an overview of these topics, and describe several projects at Microsoft Research that are trying to answer some of these questions.
Abstract: With the astonishing speed of cloud adoption, virtualization technology has received a massive shot in the arm as it forms the backbone of cloud data centers. While virtualization has the obvious benefits of allowing heterogeneity of operating systems while permitting cloud managers to utilize physical machines to their fullest, it has created its own set of issues. In this talk we explore problems with virtualization and its usage in cloud data centers that we are working on at IIT Bombay. We broadly classify the work into 4 categories: (a) Hypervisor measurement/characterization and improvements - where we look into issues such as memory sharing between VMs for memory overprovisioning and I/O multiplexing for I/O scalability in virtualized environments. (b) VM Provisioning, Placement and Migration - where we explore issues in power aware provisioning of virtual machines, balanced placement, migration models and the tradeoffs between migration and replication of virtual machines. (c) Performance and Availability modeling and Simulation of cloud environments and (d) Cloud costing and Engineering issues in building and managing large clouds where we look at overheads of cloud managers such as OpenStack and CloudStack.
Abstract: Irregular algorithms, that is, algorithms exhibiting data-dependent and therefore statically unpredictable control flow and memory access patterns, are ubiquitous in many problem areas such as social network analysis and machine learning. Such algorithms typically operate on dynamic data structures and are challenging to parallelize. There is growing interest in using GPUs for irregular algorithms and past work has demonstrated their efficacy. However, the need for processing larger graphs poses limitations to existing techniques which implicitly assume that the complete graph resides in the main memory of the system. In this work, we deal with processing graphs that do not fit into GPU memory and develop out-of-core techniques to efficiently model the processing. Parallelizing an irregular algorithm on GPUs is quite challenging and making such an algorithm out-of-core only exacerbates the challenge. Therefore, we propose a domain-specific language to synthesize out-of-core irregular computations for GPUs. The synthesizer takes as input a high-level description of an irregular algorithm and a scheduling policy to efficiently generate CUDA code for out-of-core execution of the algorithm on the GPU. We demonstrate the expressiveness of the synthesizer by automatic generation of out-of-core computations for four graph algorithms, and illustrate that the generated code performs very close to the hand-tuned out-of-core versions.
The following Big Data Sessions are facilitated by Prof. Dinakar Sitaram (PESIT) and Prof. S. Pyne (C.R.Rao Institute).
Abstract: The Internet of Things (IoT) envisions a highly networked future where every object is integrated to interact with each other, allowing for communications between objects, as well as between humans and objects. IoT is rapidly transforming our lives by deeply affecting every industry, from manufacturing, logistics, supply chain, healthcare, buildings, and transportation to telecommunications. Large data volumes from IoT will drive radical changes and will require new Big Data strategies. The quicker enterprises can start analyzing their data, the more business value they can derive. Data to decisions is about moving from insight to action and moving to fact-based decision making at all levels of the organization. In this talk, I will enumerate the opportunities, challenges and first solutions for capturing, organizing and analyzing the vast streams of data that will result from the Internet of Things.
Abstract: We will explore how Spark's approach is different from other MapReduce technologies like Hadoop and Storm. Spark gives us a comprehensive, unified framework to manage big data that are diverse in nature (text data, graph data, etc.) as well as in source (batch vs. real-time streaming data). Spark lets you quickly write applications in Java, Scala, or Python. Spark, in addition to MapReduce operations, supports SQL queries, streaming data, machine learning and graph data processing. In this demo, we will briefly cover each of the core concepts and walk through a brief operation for each using various examples.
Abstract: This talk addresses the well-known challenge involved in simultaneously delivering high productivity and high parallel performance on modern multicore architectures through the development of domain-specific languages (DSLs) and their optimizing code generators. It presents the domain of Image Processing Pipelines as the motivating case by presenting PolyMage, our DSL and its code generator for automatic and effective optimization of image processing pipelines. PolyMage takes an image processing pipeline expressed in a high-level language (embedded in Python) and generates an optimized C/C++ implementation of the pipeline. We show how certain techniques, including those based on the polyhedral compiler framework, need to be specialized in order to provide significant improvements in parallel performance over existing approaches, including manual and library-based implementations and another state-of-the-art DSL (Halide). Experimental results on a modern multicore system and a short demo will also be presented. More information at http://mcl.csa.iisc.ernet.in/polymage.html
Abstract: This is a foundation level workshop aimed at early career researchers who have just begun, or aspire to enter, the world of scholarly publishing. The workshop provides background information on academic publishing. It outlines the various important steps that, as an Author, you need to follow in preparing your manuscript for a successful publication. It will also provide advice about how to properly structure your article and the importance of using proper scientific language in a manuscript. The significant points on the key area of plagiarism are also highlighted, so that you know the rules and regulations and can avoid breaking them and harming your work. More than one million scientific articles are published every year. With that in mind, it is increasingly important for researchers to find efficient and impactful ways to make their research stand out from this growing crowd. This workshop also provides information on the best tools to use, both from Elsevier and from the industry.
Prof. Viktor Prasanna will offer comments to prospective authors from the perspective of the Editor in Chief (EIC) of JPDC.
Abstract: In this talk we will review the memory hierarchy in multi-core architectures and discuss a few recent research topics. The talk will cover issues relating to shared last-level caches, stacked DRAM caches, and hybrid on-chip and off-chip memory architectures for multi-core systems. It will also discuss research issues in memory scheduling and meta-data checking in DRAM caches.
Abstract: Aadhaar is the world's largest biometric identity system, with 840 million residents enrolled and over 1 million new residents enrolled each day. Aadhaar is designed to handle 100 million authentications per day. This talk will begin by describing the Aadhaar ecosystem and the services provided by Aadhaar. The architecture, design principles and technologies used in designing the Aadhaar system will be explained. The challenges faced when designing and operating a large scale system of this nature will also be discussed.
NOTE: Registration is not complete and accepted until you receive a confirmation email. Participants will be responsible for making their own travel and lodging arrangements. However, limited accommodation is available at IIIT on a request basis. Please send an e-mail to suresh.purini@iiit.ac.in for any queries.