CDS Systems + ML Seminar Series
Speaker : Anjaly Parayil
Affiliation : Microsoft M365 Research
Title : Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale
Date & Time : July 3, 2025 (Thursday), 12:00 Noon
Venue : #102, CDS Seminar Hall
ABSTRACT
Managing inference workloads for Large Language Models (LLMs) at scale demands balancing the diverse and often conflicting service-level agreement (SLA) requirements of latency-sensitive tasks (e.g., chatbots) and latency-insensitive tasks (e.g., report generation). In this talk, I present our recent work, done in collaboration with Prof. Yogesh Simmhan, on LLM infrastructure optimization to address these challenges. We propose a comprehensive serving framework that dynamically adjusts to workload characteristics using control knobs operating at multiple timescales. Our system integrates short-term request routing with long-term GPU VM scaling and model placement, formulating the resource allocation challenge as an Integer Linear Programming (ILP) problem. Evaluations using real and simulated production requests, spanning three geographic regions and four open-source models, demonstrate up to 25% savings in GPU-hours and an 80% reduction in scaling overhead.
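For context on the ILP framing mentioned in the abstract, the snippet below is a minimal toy sketch (in Python, using the open-source PuLP modeling library) of how a model-placement decision can be posed as an ILP: binary variables choose which GPU VM hosts each model, subject to per-VM capacity, with GPU-hour cost as the objective. All model names, VMs, costs, and capacities are invented for illustration and are not taken from the speakers' system.

```python
# Toy ILP sketch: place each model on a GPU VM to minimize total GPU-hour
# cost under per-VM capacity limits. Purely illustrative data and names.
import pulp

models = ["chat", "summarize", "report"]      # hypothetical model set
vms = ["vm-eastus", "vm-westeu"]              # hypothetical GPU VMs
cost = {                                      # hypothetical GPU-hour cost of (model, vm)
    ("chat", "vm-eastus"): 4, ("chat", "vm-westeu"): 5,
    ("summarize", "vm-eastus"): 3, ("summarize", "vm-westeu"): 6,
    ("report", "vm-eastus"): 5, ("report", "vm-westeu"): 4,
}
capacity = {"vm-eastus": 2, "vm-westeu": 2}   # max models hosted per VM

prob = pulp.LpProblem("model_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", list(cost), cat="Binary")  # x[m, v] = 1 if model m runs on VM v

# Objective: total GPU-hour cost of the chosen placement.
prob += pulp.lpSum(cost[k] * x[k] for k in cost)
# Each model must be placed on exactly one VM.
for m in models:
    prob += pulp.lpSum(x[m, v] for v in vms) == 1
# No VM may host more models than its capacity allows.
for v in vms:
    prob += pulp.lpSum(x[m, v] for m in models) <= capacity[v]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (m, v), var in x.items():
    if var.value() == 1:
        print(f"{m} -> {v}")
```

A production formulation would of course add SLA, routing, and scaling terms across timescales; this sketch only shows the basic shape of an ILP placement model.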
BIO: Anjaly is a Senior Researcher at M365 Research, where she leads applied research on the efficiency and reliability of cloud services. Her work sits at the intersection of machine learning and systems, ensuring the continuous availability of cloud services and improving the efficiency of cloud infrastructure running diverse workloads, including emerging Large Language Model workloads. Previously, she was a Postdoctoral Researcher at the US Army Research Laboratory's Computational and Information Sciences Directorate, specializing in reinforcement learning and Bayesian inference. Anjaly earned her doctorate from the Department of Aerospace Engineering at the Indian Institute of Science, with a thesis on uncertain systems and multi-agent control that received the Prof. A. K. Rao Medal for Best Ph.D. Thesis. Her work has resulted in over 25 publications and multiple patent filings.
Host Faculty: Prof. Yogesh Simmhan
ALL ARE WELCOME