BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:176@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20260121T110000
DTEND;TZID=Asia/Kolkata:20260121T120000
DTSTAMP:20260108T163117Z
URL:https://cds.iisc.ac.in/events/seminar-cds-102-january-21st-1100-enabli
 ng-determinism-in-llm-inference/
SUMMARY:{Seminar} @ CDS: #102\, January 21st: 11:00: "Enabling Determinism 
 in LLM Inference."
DESCRIPTION:Department of Computational and Data Sciences\nDepartment Semin
 ar\n\n\n\nSpeaker : Dr. Ashish Panwar\, Microsoft Research India\nTitle : 
 Enabling Determinism in LLM Inference\nDate & Time: January 21st\, 20
 26 (Wednesday)\, 11:00 AM\nVenue : # 102\, CDS Seminar Hall\n\n\n\nABSTRAC
 T\nIn LLM inference\, the same prompt may yield different outputs across d
 ifferent runs even when sampling hyper-parameters are fixed. At the system
  level\, this non-determinism stems from the non-associativity of floating
 -point arithmetic combined with dynamic batching\, as GPU kernels adapt th
 eir reduction strategies based on the batch size. A straightforward way to
  enforce determinism is to disable dynamic batching\, but this severely de
 grades throughput. Another approach is to make kernels batch-invariant\; h
 owever\, this tightly couples determinism to kernel design\, requiring spe
 cialized kernels that apply a universal reduction strategy to all tokens r
 egardless of batch size. This coupling also imposes fixed runtime overhead
 s\, regardless of how much of the workload actually requires determinism.\
 n\nI will present LLM-42\, an alternative approach to enable determinism i
 n LLM inference. LLM-42 is inspired by speculative execution and some inte
 resting properties of GPU kernel implementations. Our key observation is t
 hat determinism does not require a universal reduction strategy: it suffic
 es that each token position is decoded using a consistent reduction schedu
 le. Moreover\, most GPU kernels already use shape-consistent reductions. L
 everaging these observations\, LLM-42 decodes tokens along a non-determini
 stic fast path and enforces determinism via a lightweight verify-rollbac
 k loop. The verifier replays candidate tokens under a fixed-shape reductio
 n schedule\, commits those that are guaranteed to be consistent across run
 s\, and rolls back those violating determinism. By decoupling determinism 
 from kernel design\, LLM-42 achieves deterministic inference with unmodifi
 ed kernels and incurs overhead only in proportion to the traffic that requ
 ires determinism.\n\nBIO: Ashish Panwar is a Principal Researcher at Micro
 soft Research India\, where he explores methods to improve large language
  model inference. His broader research interests span operating systems\,
  memory systems\, and GPUs. Prior to joining Microsoft Research in 2022\, 
 he obtained his MSc (Engg) and PhD from the CSA department at IISc where h
 e was advised by Prof. K. Gopinath and Prof. Arkaprava Basu.\n\nHost Facul
 ty: Jayant Haritsa\, CDS\n\n\n\nALL ARE WELCOME
CATEGORIES:Events,Talks
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20250121T110000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR