BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.4.0.1//EN
TZID:Asia/Kolkata
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:141@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20250905T140000
DTEND;TZID=Asia/Kolkata:20250905T150000
DTSTAMP:20250821T123704Z
URL:https://cds.iisc.ac.in/events/seminar-cds-102-september-05th-200-servi
 ng-models-fast-and-slow-optimizing-heterogeneous-llm-inferencing-workloads
 -at-scale/
SUMMARY:{Seminar} @ CDS: #102\, September 05th\, 2:00: "Serving Models\, F
 ast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale"
DESCRIPTION:\n\nCDS Systems + ML Seminar Series\n\n\n\nSpeaker : Anjaly Par
 ayil\nAffiliation : Microsoft M365 Research\nTitle : Serving Models\, Fast
  and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale\nDa
 te & Time : September 5\, 2025 (Friday)\, 2:00 – 3:00 PM\nVenue : #
 102\, CDS Seminar Hall\n\n\n\nABSTRACT\n\nManaging inference workloads for
  Large Language Models (LLMs) at scale demands balancing the diverse and o
 ften conflicting SLA requirements of latency-sensitive (e.g.\, chatbots) a
 nd latency-insensitive (e.g.\, report generation) tasks. In this talk\, I 
 present our recent work\, done in collaboration with Prof. Yogesh Simmhan\
 , on LLM infrastructure optimization to address these challenges. We propo
 se a comprehensive serving framework that dynamically adjusts to workload 
 characteristics using multi-timescale control knobs. Our system integrates
  short-term request routing with long-term GPU VM scaling and model placem
 ent\, formulating the resource allocation challenge as an Integer Linear P
 rogramming (ILP) problem. Evaluations using real and simulated production 
 requests\, spanning three geographic regions and four open-source models\,
  demonstrate up to 25% savings in GPU-hours and 80% reduction in scaling o
 verhead.\n\nBIO: Anjaly is a Senior Researcher at M365 Research leading ap
 plied research at the intersection of efficiency and reliability of cloud 
 services. In particular\, she works at the intersection of machine learnin
 g and systems to ensure continuous availability of cloud services as well 
 as the efficiency of cloud infrastructure running various workloads\, incl
 uding the newly emerged Large Language Model workloads. Previously\, s
 he served as a Postdoctoral Researcher at the US Army Research Laboratory
 ’s Computational and Information Sciences Directorate\, specializing in 
 reinforcement learning and Bayesian inferencing. Anjaly earned her doctora
 te from the Indian Institute of Science’s Department of Aerospace Engine
 ering\, with a thesis on uncertain systems and multi-agent control that re
 ceived the Prof. A. K. Rao Medal for Best Ph.D. Thesis. Her work has resul
 ted in over 25 publications and multiple patent filings.\n\n\n\nHost Facul
 ty: Prof. Yogesh Simmhan\n\n\n\nALL ARE WELCOME
CATEGORIES:Events,Talks
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20240905T140000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR