BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:131@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20250703T120000
DTEND;TZID=Asia/Kolkata:20250703T130000
DTSTAMP:20250701T172931Z
URL:https://cds.iisc.ac.in/events/seminar-cds-102-july-03rd-1200-serving-m
 odels-fast-and-slow-optimizing-heterogeneous-llm-inferencing-workloads-at-
 scale/
SUMMARY:Cancelled: {Seminar} @ CDS: #102\, July 03rd\, 12:00: "Serving Mode
 ls\, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at 
 Scale"
DESCRIPTION:The following seminar stands cancelled as the speaker is unab
 le to give the talk due to a medical emergency. We apologize for the inc
 onvenience caused.\n\n\n\nCDS Systems + ML Seminar Series\n\n\n\nSpeake
 r : Anj
 aly Parayil\nAffiliation : Microsoft M365 Research\nTitle : Serving Models
 \, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Sc
 ale\nDate & Time : July 3\, 2025 (Thursday)\, 12:00 Noon\nVenue : #10
 2\, CDS Seminar Hall\n\n\n\nABSTRACT\n\nManaging inference workloads for L
 arge Language Models (LLMs) at scale demands balancing the diverse and oft
 en conflicting SLA requirements of latency-sensitive (e.g.\, chatbots) and
  latency-insensitive (e.g.\, report generation) tasks. In this talk\, I pr
 esent our recent work\, done in collaboration with Prof. Yogesh Simmhan\, 
 on LLM infrastructure optimization to address these challenges. We propose
  a comprehensive serving framework that dynamically adjusts to workload ch
 aracteristics using multi-timescale control knobs. Our system integrates s
 hort-term request routing with long-term GPU VM scaling and model placemen
 t\, formulating the resource allocation challenge as an Integer Linear Pro
 gramming (ILP) problem. Evaluations using real and simulated production re
 quests\, spanning three geographic regions and four open-source models\, d
 emonstrate up to 25% savings in GPU-hours and 80% reduction in scaling ove
 rhead.\n\nBIO: Anjaly is a Senior Researcher at M365 Research leading app
 lied research at the intersection of efficiency and reliability of clou
 d services. She works at the interface of machine learning and systems t
 o ensure the continuous availability of cloud services and the efficienc
 y of cloud infrastructure running diverse workloads\, including emergin
 g Large Language Model workloads. Previously\, she served as a Postdocto
 ral Researcher at the US Army Research Laboratory’s Computational and In
 formation Sciences Directorate\, specializing in reinforcement learning a
 nd Bayesian inference. Anjaly earned her doctorate from the Indian Insti
 tute of Science’s Department of Aerospace Engineering\, with a thesis o
 n uncertain systems and multi-agent control that received the Prof. A. K
 . Rao Medal for Best Ph.D. Thesis. Her work has resulted in over 25 publ
 ications and multiple patent filings.\n\n\n\nHost Faculty : Prof. Yoges
 h Simmhan\n\n\n\nALL ARE WELCOME
CATEGORIES:Events,Talks
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:19700101T000000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR