Yogesh Simmhan is an Associate Professor in the Department of Computational and Data Sciences (CDS) and a Swarna Jayanti Fellow at the Indian Institute of Science (IISc), Bangalore. His research explores scalable software platforms, algorithms and applications on distributed systems. It spans Cloud and Edge Computing, Temporal Graph Processing, and Distributed Storage and Machine Learning, and supports emerging Big Data and Internet of Things (IoT) applications. He has published over 100 peer-reviewed papers. He has won the Best Paper Award at the IEEE International Conference on Cloud Computing (CLOUD) 2019, the IEEE TCSC SCALE Challenge Award in 2019 and 2012, the Distinguished Paper Award at Euro-Par 2018, and the IEEE/ACM Supercomputing HPC Storage Challenge Award in 2008. He received the IEEE TCSC Award for Excellence in Scalable Computing (Middle Career Researcher) in 2020. He is an Associate Editor-in-Chief of the Journal of Parallel and Distributed Computing (JPDC) and an Associate Editor of Future Generation Computer Systems (FGCS). Previously, he served as an Associate Editor of IEEE Transactions on Cloud Computing and as a member of the IEEE Future Directions Initiative on Big Data.

Yogesh has a Ph.D. in Computer Science from Indiana University, Bloomington, and was previously a Research Assistant Professor at the University of Southern California (USC), Los Angeles, and a Postdoc at Microsoft Research, San Francisco. He is a Senior Member of the IEEE and the ACM.

My research is on distributed and scalable data platforms to support Big Data, Internet of Things (IoT), UAV and computer vision applications on novel computing infrastructure, such as Clouds and Edge devices. I lead the DREAM:Lab - Distributed Research on Emerging Applications and Machines - at CDS.

We have open positions for Ph.D. students, postdocs and Research and Development staff in our group to work on some of these exciting projects! Candidates should have expertise in Big Data platforms, Edge/Cloud Computing and Applied Machine Learning, with strong programming, algorithms and systems skills. Prospective research students should apply through the IISc research degree admissions for the CDS department and choose the DREAM:Lab as one of their lab preferences. See here for staff position details.

Some active research areas are:

  • Temporal Graphs: Platforms, Algorithms and Analytics. Graphs whose structure and properties vary over time are common, yet they are less examined in the literature.
    •   We have developed a novel Interval-centric Computing Model (ICM) [ICDE2020] for defining both time-respecting and time-independent algorithms over temporal graphs. Graphite is its scalable implementation over Apache Giraph. Over 10 graph algorithms have been mapped to ICM. Graphite has scaled to graphs with over 130M interval vertices and 5.5B interval edges on an 8-node commodity cluster.
    •   We have recently explored low-latency path queries over temporal property graphs, resulting in the Granite system [CCGRID2020, JPDC2021]. Granite uses a novel query cost model to optimize distributed execution.
    •   We have several new and ongoing projects on temporal graphs:
        •   scalable training of Graph Neural Networks (GNNs);
        •   incremental computing over temporal and streaming graph updates;
        •   memory-efficient out-of-core and window-based graph processing;
        •   streaming partitioning of large graphs to preserve local community structures; and
        •   temporal graph centrality methods to identify high-risk populations using COVID-19 contact-tracing networks, as part of the GoCoronaGo project.
    •   We are also exploring high-performance temporal graph analytics as part of the National Supercomputing Mission, jointly with IIT-H and IIIT-H, with an emphasis on parallel algorithmic patterns and application resiliency.
    •   In the past, we have also examined the use of cloud elasticity to scale graph processing [CLOUD2019] and subgraph-centric processing of temporal graphs [IPDPS2015], besides a survey on scalable graph processing frameworks [CSUR2018].
  • Distributed Analytics and Storage on Edge, Cloud and Drones. Edge and fog computing are emerging computing paradigms. Edge and fog resources are becoming more widely available as part of IoT deployments. Edge devices like the NVIDIA Jetson also have on-board GPU accelerators. Our research explores analytics and storage platforms over edge, fog and cloud resources, including fleets of drones/UAVs.
    •   Federated learning over edge devices is an important problem. Accelerated edge and mobile/smartphone devices are now widely available, and they are often collocated with video data sources. We focus on the systems aspects of federated learning. These include scheduling and orchestrating deep models to efficiently use 100s of edge devices and accelerators in a wide-area network, while trading off accuracy, resiliency, performance and privacy.
    •   Anveshak is a domain-specific model and platform for distributed video analytics [TPDS2021]. It trades off scalability, accuracy and latency when running DNN models on edge, fog and cloud resources. It won the IEEE TCSC SCALE Challenge in 2019 [SCALE2019].
    •   An active area of interest is computing, data management and scheduling for autonomous unmanned aerial vehicles (UAVs), or drones [INFOCOM2021]. Open problems include:
        •   UAV routing for complex missions;
        •   deciding where to run machine learning models, on the UAVs or on the backend;
        •   balancing compute, network and energy capacity against application deadlines in the context of 5G communications; and
        •   computer vision and tracking algorithms that use drones to assist the visually impaired.
    •   ElfStore is a distributed storage platform for the edge, designed using P2P and HDFS concepts [ICWS2019]. We are currently examining consistency models, caching strategies and the mobility of edge devices for distributed storage. We are also exploring the storage and querying of time-series data using distributed edge devices [EUROPAR2020].
    •   City-scale camera networks and drone cameras now generate large volumes of video data. Intelligent deep models can perform inferencing over this data. As a result, there is a critical need for NoSQL databases to manage large video repositories. We are exploring distributed video storage and querying systems that natively support DNN-based inferencing and spatio-temporal queries while preserving privacy. These systems should also use any available edge accelerators. They must choose between a priori indexing and inferencing at ingest time, and on-demand inferencing at query time.
    •   Platforms for large IoT and edge deployments are difficult to validate because few researchers have access to edge clusters with 1000s of devices. We developed VIoLET, a container-based emulation environment for deploying large-scale edge and fog testbeds on which to validate these platforms [EUROPAR2018, TCPS2021]. We are extending it to Ultra-VIoLET. Ultra-VIoLET will support diverse network configurations, device mobility and energy constraints. It will also couple the computing and network models with physical system simulators such as Gazebo and SUMO for drone and vehicle mobility.
    •   I coordinate the IBM-IISc Hybrid Cloud Lab, a collaboration between faculty at IISc and researchers at IBM. The lab explores the role of AI and verification in efficiently managing distributed information, data center operations and microservices within hybrid cloud and edge environments.
    •   In the past, we have also examined dataflow execution engines [ICSOC2017] and dataflow scheduling [TCPS2017, CCGRID2018]. We have also published a survey article on scheduling on edge, fog and cloud resources [SPE2019].
  • Scalable Data Management and Analytics for Science and Society. We engage with our science and engineering collaborators on multi-disciplinary projects of social and scientific impact.
    •   During the COVID-19 pandemic, our team developed the GoCoronaGo contact tracing app for federated collection of Bluetooth-based proximity data at institutional scale [JIISC2020]. We use various temporal graph techniques to assign contact risk scores to users. These scores help with preventive measures and with digital contact tracing if a COVID-19 case is found. The app is being deployed on the IISc campus.
    •   SATVAM is an Indo-US project on low-cost air quality monitoring in urban spaces, with IIT-K, IIT-B and Duke University. Our group is examining ways to autonomously monitor and manage the IoT fabric. We are also developing machine learning models that improve the calibration, and hence the accuracy, of low-cost commodity sensors [ESCIENCE2019, AMT2021].
    •   The Genome India Project is a new pan-India initiative for next-generation genome sequencing of 20,000 subjects. We are part of a consortium of 20+ institutions, led by the Centre for Brain Research at IISc. We are investigating reliable, scalable and affordable storage and management of the sequencing data, and graph-based analytics over it [HIPCW2019].
    •   EQWATER is a project supported by the IMPRINT program to ensure an equitable supply of water in mega-cities. We are exploring network analytics to optimize supply schedules and to manage data from field devices. In the past, we have proposed an IoT software architecture for data-driven smart city utilities [SPE2018].

ORCID: 0000-0003-4140-7774 | Google Scholar | DBLP

Recent publications since 2019 are listed below. See here for all publications

  1. Shriram Ramesh, Animesh Baranawal, and Yogesh Simmhan, Granite: A Distributed Engine for Scalable Path Queries over Temporal Property Graphs, Journal of Parallel and Distributed Computing (JPDC), Vol. 151, pp. 94-111, May 2021, 10.1016/j.jpdc.2021.02.004 [CORE A*]
  2. Aakash Khochare, Aravindhan Krishnan, and Yogesh Simmhan, A Scalable Platform for Distributed Object Tracking across a Many-camera Network, IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 32, pp. 1479-1493, June 2021, 10.1109/TPDS.2021.3049450 [CORE A*]
  3. Aakash Khochare, Yogesh Simmhan, Francesco Betti Sorbelli and Sajal K. Das, Heuristic Algorithms for Co-scheduling of Edge Analytics and Routes for UAV Fleet Missions, IEEE International Conference on Computer Communications (INFOCOM), 2021 [CORE A*] (To Appear)
  4. Shrey Baheti, Parwat Singh Anjana, Sathya Peri and Yogesh Simmhan, DiPETrans: A Framework for Distributed Parallel Execution of Transactions of Blocks in Blockchain, Concurrency and Computation: Practice and Experience, 2021 (To Appear)
  5. Shrey Baheti, Shreyas Badiger, and Yogesh Simmhan, VIoLET: An Emulation Environment for Validating IoT Deployments at Large-Scales, ACM Transactions on Cyber-Physical Systems (TCPS), 5(3), 2021, 10.1145/3446346
  6. Amrita Namtirtha, Animesh Dutta, Biswanath Dutta, Amritha Sundararajan and Yogesh Simmhan, Best Influential Spreaders Identification Using Network Global Structural Properties, Nature Scientific Reports, 2021
  7. Manoj K Agarwal, Animesh Baranawal, Yogesh Simmhan and Manish Gupta, Event Related Data Collection from Microblog Streams, International Conference on Database and Expert Systems Applications (DEXA), 2021, 10.1007/978-3-030-86475-0_31
  8. Ravi Sahu, Ayush Nagal, Kuldeep Kumar Dixit, Harshavardhan Unnibhavi, Srikanth Mantravadi, Srijith Nair, Yogesh Simmhan, Brijesh Mishra, Rajesh Zele, Ronak Sutaria, Vidyanand Motiram Motghare, Purushottam Kar, and Sachchida Nand Tripathi, Robust statistical calibration and characterization of portable low-cost air quality monitoring sensors to quantify real-time O3 and NO2 concentrations in diverse environments, Atmospheric Measurement Techniques (AMT), 14, pp. 37–52, 2021, 10.5194/amt-14-37-2021
  9. Prateeksha Varshney and Yogesh Simmhan, Characterizing Application Scheduling on Edge, Fog and Cloud Computing Resources, Software: Practice and Experience, 50(5), 2020, pp. 558-595
  10. Yogesh Simmhan, Tarun Rambha, Aakash Khochare, Shriram Ramesh, Animesh Baranawal, John Varghese George, Rahul Atul Bhope, Amrita Namtirtha, Amritha Sundararajan, Sharath Suresh Bhargav, Nihar Thakkar and Raj Kiran, GoCoronaGo: Privacy Respecting Contact Tracing for COVID-19 Management, Journal of the Indian Institute of Science, Vol. 100, 2020, 10.1007/s41745-020-00201-5
  11. Shriram Ramesh, Animesh Baranawal and Yogesh Simmhan, A Distributed Path Query Engine for Temporal Property Graphs, IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2020, pp. 499-508 [CORE A]
  12. Swapnil Gandhi and Yogesh Simmhan, An Interval-centric Model for Distributed Computing over Temporal Graphs, IEEE International Conference on Data Engineering (ICDE), 2020, pp. 1129-1140, 10.1109/ICDE48307.2020.00102 [CORE A*]
  13. Srikrishna Acharya, Amrutur Bharadwaj, Yogesh Simmhan, Aditya Gopalan, Parimal Parag and Himanshu Tyagi, CORNET: A Co-Simulation Middleware for Robot Networks, IEEE International Conference on COMmunication Systems & NETworkS (COMSNETS), 2020, pp. 245-251
  14. Dhruv Garg, Prathik Shirolkar, Anshu Shukla and Yogesh Simmhan, TorqueDB: Distributed Querying of Time-Series Data from Edge-local Storage, International Conference on Parallel and Distributed Computing (Euro-Par), Lecture Notes in Computer Science, Vol. 12247, Springer, 2020, 10.1007/978-3-030-57675-2_18 [CORE A]
  15. Yogesh Simmhan, Aakash Khochare and S.K. Ramachandra, Chapter: Computing and storage models for edge computing, in Edge Computing: Models, Technologies and Applications, IET, 2020, 10.1049/pbpc033e_ch6
  16. Rajkumar Buyya, Satish Narayana Srirama, Giuliano Casale, Rodrigo N. Calheiros, Yogesh Simmhan, Blesson Varghese, Erol Gelenbe, Bahman Javadi, Luis Miguel Vaquero, Marco A. S. Netto, Adel Nadjaran Toosi, Maria Alejandra Rodriguez, Ignacio Martin Llorente, Sabrina De Capitani di Vimercati, Pierangela Samarati, Dejan S. Milojicic, Carlos A. Varela, Rami Bahsoon, Marcos Dias de Assuncao, Omer Rana, Wanlei Zhou, Hai Jin, Wolfgang Gentzsch, Albert Y. Zomaya and Haiying Shen, A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade, ACM Computing Surveys (CSUR), 51(5), 2019, pp. 105:1-105:38 [CORE A*]
  17. Prateeksha Varshney and Yogesh Simmhan, AutoBoT: Resilient and Cost-effective Scheduling of a Bag of Tasks on Spot VMs, IEEE Transactions on Parallel and Distributed Systems (TPDS), 30(7), 2019, pp. 1512-1527 [CORE A*]
  18. Sumit Kumar Monga, R. Sheshadri K and Yogesh Simmhan, ElfStore: A Resilient Data Storage Service for Federated Edge and Fog Resources, IEEE International Conference on Web Services (ICWS), 2019, pp. 336-345, 10.1109/ICWS.2019.00062 [CORE A]
  19. Yogesh Simmhan, Chapter: Big Data and Fog Computing, Encyclopedia of Big Data Technologies, Springer, 2019
  20. Siddharth D. Jaiswal and Yogesh Simmhan, A Partition-centric Distributed Algorithm for Identifying Euler Circuits in Large Graphs, IEEE International Workshop on High-Performance Big Data, Deep Learning, and Cloud Computing (HPBDC), Co-located with IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019, pp. 452-459
  21. Aakash Khochare and Yogesh Simmhan, A scalable and composable analytics platform for distributed wide-area tracking, ACM International Conference on Distributed Computing and Networking (ICDCN), 2019, pp. 506 [CORE B] (Extended Abstract)
  22. Ravikant Dindokar and Yogesh Simmhan, Adaptive Partition Migration for Irregular Graph Algorithms on Elastic Resources, IEEE International Conference on Cloud Computing (CLOUD), 2019, pp. 281-290 [CORE B]
  23. Diksha Chaudhary, Bratati Kahali and Yogesh Simmhan, An Empirical Study on Efficient Storage of Human Genome Data, Women in Data Science and Computing Workshop, Co-located with IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2019, pp. 87-92
  24. Aakash Khochare, Sheshadri Ramachandra, Shriram Ramesh and Yogesh Simmhan, Dynamic Scaling of Video Analytics for Wide-area Tracking in Urban Spaces, IEEE International Scalable Computing Challenge (SCALE), Co-located with IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019, pp. 76-81 (SCALE Challenge Winner)
  25. Prithvi Alva, K. R. Sheetal Kumar, Yogesh Simmhan and M. S. Mohan Kumar, Enabling Equitable Water Supply in a Mega-city using a Big Data Analytics Platform, International Conference on Computing and Control for Water Industry (CCWI), 2019, pp. 1-2 (Extended Abstract)
  26. Yogesh Simmhan, Malati Hegde, Rajesh Zele, Sachchida N. Tripathi, Srijith Nair, Sumit K. Monga, Ravi Sahu, Kuldeep Dixit, Ronak Sutaria, Brijesh Mishra, Anamika Sharma and S. V. R. Anand, SATVAM: Toward an IoT Cyber-Infrastructure for Low-Cost Urban Air Quality Monitoring, IEEE International Conference on eScience (eScience), 2019, pp. 57-66
  27. Shilpa Chaturvedi and Yogesh Simmhan, Toward Resilient Stream Processing on Clouds using Moving Target Defense, IEEE International Symposium on Real-Time Distributed Computing (ISORC), 2019, pp. 134-142


  • IEEE TCSC Award for Excellence in Scalable Computing (Middle Career Researcher), 2020 for contributions to "Big Data Platforms, Programming Models and Dataflow Scheduling on Distributed Systems"
  • Swarna Jayanti Fellowship, 2019-2024. "Scalable Management and Analytics of Temporal Graphs"
  • Best Paper Award, IEEE International Conference on Cloud Computing (CLOUD), 2019. "Adaptive Partition Migration for Irregular Graph Algorithms on Elastic Resources", Dindokar and Simmhan
  • IEEE SCALE Challenge. First Place, 2019. "Dynamic Scaling of Video Analytics for Wide-area Tracking in Urban Spaces", Khochare, et al.
  • Euro-Par Distinguished Paper Award, 2018. "VIoLET: A Large-scale Virtual Environment for Internet of Things", Badiger, Baheti and Simmhan
  • IEEE HiPC Best Paper Finalist, 2018. "ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads", Kalyanasundaram and Simmhan
  • IEEE SCALE Challenge. First Place, 2012. "Adaptive Energy Forecasting and Information Diffusion for Smart Power Grids", Simmhan, et al.
  • Microsoft Ship-It Award, 2009. "Microsoft Trident Scientific Workflow Workbench", Barga, et al.
  • IEEE/ACM Supercomputing HPC Storage Challenge. First Place, 2008. "GrayWulf: Scalable Cluster Architecture for Data Intensive Computing", Szalay, et al.

Current Service

Recent Past Service

The primary course I teach is DS256: Scalable Systems for Data Science (3:1). The CDS department has offered it in the January semester since 2016. It is a soft-core course for the M.Tech.(CDS) program. The course covers the platforms and tools used to develop algorithms, write programs and analyze Big Data. A major programming project is an essential part of the course. In it, students work with large, real-world datasets and use Big Data platforms at scale.

I co-teach DS221: Introduction to Scalable Systems (3:0), a core course for the M.Tech.(CDS) program. It is designed for students whose undergraduate major was not computer science. It introduces systems concepts, including computer architecture, operating systems, data structures, algorithms and programming. It also covers more advanced topics in parallel computing and Big Data platforms.

I will be teaching the Data Engineering at Scale core course in the new M.Tech. in Data Science and Business Analytics (DSBA) program starting in August 2021. The program is part of IISc's push towards online degrees targeted at industry professionals. The course will train students to use Big Data platforms to acquire, manage, process and derive insights from large-scale, fast and linked data. Students will also learn the core distributed systems principles that make these platforms work.

I give lectures on data engineering, Cloud and IoT topics as part of several online certification programs jointly conducted by IISc and TalentSprint, including Computational Data Science; Digital Health and Imaging; and Deep Learning: Foundations and Applications.

Earlier, I taught the DS286: Data Structures and Programming (2:1) core course in the Aug semester, sometimes with Prof. Venkatesh Babu. I also co-taught the SE292: High Performance Computing (3:0) core course in the Aug 2014 semester, along with Prof. Govindarajan. Both of these have been discontinued, and their topics absorbed into DS221.

Previously, I offered SE252: Introduction to Cloud Computing (3:1) as an elective course in the Aug semester. The course covered parallel and distributed computing; IaaS/PaaS/SaaS Clouds; Big Data processing patterns on Clouds; runtime execution models on Clouds; and performance evaluation of Cloud applications. Some of these topics have since been subsumed into DS256.

Current Students

  • Aakash Khochare Ph.D., CDS (2016 - Present)
  • Animesh Baranawal M.Tech.(Research), CDS (2019 - Present)
  • Bharati Khanijo Ph.D., CDS (2019 - Present)
  • Prashanthi S.K. Ph.D., CDS (2020 - Present)
  • Srikrishna Acharya Ph.D., RBCCPS, jointly with Prof. Bharadwaj Amrutur (2017 - Present)
  • Suman Raj Ph.D., CDS (2020 - Present)
  • Varad Vinod Kulkarni Ph.D., CDS (2021 - Present)

Current Staff

  • Amrita Namtirtha Postdoc Researcher (2020 - Present)
  • Deepsubhra Guha Roy IOE Postdoc Researcher (2021 - Present)
  • Akarsh Chaturvedi Project Staff (2021 - Present)
  • Badri Narayanan Project Staff (2021 - Present)
  • Harshil Gupta Project Staff (2021 - Present)
  • Koustav K. Mondal Project Staff (2021 - Present)
  • Rounaq Choudhuri Project Staff (2021 - Present)
  • Sai Anuroop Kesanapalli Project Staff (2021 - Present)
  • Tuhin Khare Project Staff (2020 - Present)

Lab Alumni

The last known affiliation of each alumnus is listed.
  • Sunny Anand M.Tech.(CDS), 2021
  • Swapnil Gandhi M.Tech.(Research), 2020, Microsoft Research
  • Siddharth Jaiswal M.Tech.(Research), 2020, Ph.D. Student, IIT, Kharagpur
  • Shayal Chabbra M.Tech.(Research), 2020, Microsoft
  • Shriram Ramesh M.Tech.(CDS), 2020, Wells Fargo
    • IISc Motorola Medal for Best CDS M.Tech.(CDS) Thesis (2020)
  • Prateeksha Varshney M.Tech.(Research), 2019, Microsoft
    • CDS Honorable Mention for M.Tech.(Research) Thesis (2019)
  • Shilpa Chaturvedi M.Tech.(Research), 2019, NetApp ATG
  • Shrey Baheti M.Tech.(CDS), 2019, Cargill
  • Nashez Zubair M.Tech.(CDS), 2019, Blaize
  • Anshu Shukla M.Sc.(Engg.), 2018, Microsoft
    • IISc NetApp Medal for Best CDS M.Sc.(Engg.) Thesis (2019)
  • Ravikant Dindokar M.Sc.(Engg.), 2018, VMware
  • Abhilash Sharma M.Sc.(Engg.), 2018, SkyPoint Cloud
  • Siva Prakash Reddy Komma M.Tech.(CDS), 2018, Oracle
  • Rajrup Ghosh M.Tech.(CDS), 2017, Ph.D. Student, USC, Los Angeles
    • IISc Motorola Medal for Best CDS M.Tech.(CDS) Thesis (2017)
  • Neel Choudhury M.Tech.(CP), 2015, Google
    • IISc Motorola Medal for Best CDS M.Tech.(CP) Thesis (2015)
  • Tarun Sharma M.Tech.(CP), 2015, Nvidia
  • Vedsar Kushwaha M.Tech.(CP), 2015, Amazon Web Services

Yogesh has received a number of sponsored research grants from Government of India agencies. These include the Ministry of Electronics and Information Technology (MeitY), the Ministry of Education (MOE/MHRD), the Department of Science and Technology (DST) and the Department of Biotechnology (DBT). He has also received funding from the Indo-US Science and Technology Forum (IUSSTF). At IISc, he has been an investigator on proposals with cumulative funding of over INR 130 Million (USD 1.75 Million). In the past, he has received grants from the US NSF, DARPA and DOE.

He also actively collaborates with industry. Over the years, his lab's research has been supported by faculty fellowships, unrestricted grants, Corporate Social Responsibility (CSR) awards and Cloud credits. He is grateful to the corporations that provided them, including Microsoft, IBM Research, Facebook, VMware, Accenture, NetApp ATG, Huawei, AWS and TechMahindra.