CLOUD COMPUTING SEMINAR SERIES
Speaker : Divya Pathak, IBM India Research Lab.
Title : IT-Bench: Benchmarking LLM-Powered Agents for Intelligent IT Operations in Real-World Scenarios
Date & Time : April 04, 2025 (Friday), 02:00 PM
Venue : #102, CDS Seminar Hall
Abstract: AI-driven automation is revolutionizing IT operations, with LLM-based agents managing incident diagnosis, localisation, and remediation. But how effective are these agents in real-world scenarios? In this talk, we introduce IT-Bench, an open-source benchmarking framework designed to evaluate LLM-powered agents for intelligent IT operations. Unlike traditional benchmarks that rely on static datasets, IT-Bench involves live system interactions, testing AI agents in dynamic IT environments with real failure scenarios. Built on cloud-native principles, IT-Bench seamlessly scales across Kubernetes-based infrastructures, allowing for rigorous, production-grade assessments across hundreds of real-world IT situations. The session will cover how real-world incident scenarios are created, test applications are deployed, and IT agents are benchmarked.
Bio: Divya Pathak is a Research Engineer at IBM Research India, where she has contributed to observability, chaos engineering, and AIOps to develop resilient, adaptive, and self-healing systems. Her current research focuses on enhancing observability for multi-agent workflows. Before joining IBM Research, she completed her M.Tech (Research) in Computer Science and Engineering at IIT Hyderabad.
Pranjal Gupta is a Research Software Engineer at IBM Research India, specializing in artificial intelligence (AI) and machine learning (ML) for IT operations (AIOps). His research focuses on automating and optimizing IT workflows to enhance efficiency and minimize manual intervention.
Host Faculty: Prof. Yogesh Simmhan
About: The IBM-IISc Hybrid Cloud Lab (IIHCL), hosted at IISc, curates the Cloud Computing Seminar series, in which guest speakers from industry and academia discuss the latest technologies and research on cloud and edge computing, distributed computing systems, and AI/ML/Big Data platforms.
ALL ARE WELCOME