BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:118@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20250402T150000
DTEND;TZID=Asia/Kolkata:20250402T160000
DTSTAMP:20250325T123744Z
URL:https://cds.iisc.ac.in/events/ph-d-thesis-colloquium-202-cds-02-april-
 2025-systems-optimizations-for-dnn-training-and-inference-on-accelerated-e
 dge-devices/
SUMMARY:Ph.D. Thesis Colloquium: CDS #419: 02 April 2025: "Systems Optimiz
 ations for DNN Training and Inference on Accelerated Edge Devices"
DESCRIPTION:DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES\nPh.D. Thesis Col
 loquium\n\nSpeaker: Ms. Prashanthi S. K.\nS.R. Number: 06-18-01-10-12-20-
 1-18362\nTitle: "Systems Optimizations for DNN Training and Inference on A
 ccelerated Edge Devices"\nResearch Supervisor: Prof. Yogesh Simmhan\nDat
 e & Time: April 2\, 2025 (Wednesday)\, 03:00 PM\nVenue: CDS #419\n\nABST
 RACT\n\nDeep Neural Networks (DNNs) have had a significant i
 mpact on a wide variety of domains\, such as Autonomous Vehicles\, Smart C
 ities\, and Healthcare\, through low-latency inferencing on edge computing
  devices close to the data source. Recently\, there has also been a push t
 owards training DNN models on the edge. This is driven by the increasing d
 ata collected from edge devices in Cyber-Physical Systems (CPS)\, the grow
 ing computing power of edge devices\, and the rise of on-device training p
 aradigms such as Federated Learning and Continuous Learning that focus on 
 privacy and personalization.\n\nExisting literature has focused heavily on
  optimizing edge inference\, and there is very limited systems research on
 optimizing DNN training and on concurrent training and inference on the ed
 ge. Previous work on server GPUs cannot be directly applied since edge dev
 ices are architecturally different from cloud/server GPUs and find use in 
 varied field deployments that have power or energy constraints. Through th
 is PhD thesis\, we design system optimizations and tune edge platforms to 
 help DNN training and inference workloads utilize the full potential of ac
 celerated edge hardware.\n\nSpecifically\, in this thesis\, we make four c
 ontributions: 1) Characterize the impact of training and device parameters
 on the performance and energy of DNN training workloads\; 2) Develop empir
 ical ML models to predict and optimize the performance of training workloa
 ds in a power-constrained setting\; 3) Develop an analytical roofline mode
 l to understand and explain the impact of device parameters on power and p
 erformance of training and inference\; and 4) Design a scheduler for concurrent tr
 aining and inference workloads to meet diverse QoS goals of latency and th
 roughput within a power budget.\n\nWe motivate the need for training on th
 e edge and the associated systems research challenges\, and conduct a rigoro
 us empirical performance characterization of four classes of NVIDIA Jetson
  accelerated edge devices for DNN training. We vary training and device pa
 rameters such as I/O pipelining and parallelism\, storage media\, mini-bat
 ch sizes\, and power modes\, and examine their effect on CPU and GPU utili
 zation\, fetch stalls\, training time\, energy usage\, and variability. Ou
 r analysis exposes several resource inter-dependencies and counter-intuiti
 ve insights\, while also helping quantify conventional wisdom.\n\nBuilding upon t
 he insights from our characterization\, we develop PowerTrain\, a pre-trai
 ning and transfer-learning approach to accurately predict the performance 
 and power consumption of a given DNN training workload using any specified
  power mode (CPU/GPU/memory frequencies\, core count) on NVIDIA Jetson dev
 ices. We use these predictions to instantly construct a Pareto front and r
 eturn a configuration that minimizes training time within a power budget. 
 PowerTrain requires minimal additional profiling for transfer learning to 
 a new workload and generalizes to different models\, datasets\, and other 
 edge devices. Our predictions outperform the NVIDIA prediction tool and ot
 her baselines\, with low prediction errors of 5-15%.\n\nIn Pagoda\, we i
 nvestigate analytical roofline-based characterization to understand and ex
 plain the impact of power modes on various workloads. We develop a time r
 oofline and a novel energy roofline model for diverse power modes. We coup
 le this with an analytical model of the compute (FLOP) and memory access (
 bytes) for DNN workloads to analyze them from first principles. Lastly\, w
 e apply these methods to modify the power mode and\, hence\, the roofline 
 of the edge device to optimize the latency and energy usage for DNN infere
 nce. Our experiments show energy benefits of up to 15% without degrading e
 xecution time.\n\nFinally\, we design Fulcrum\, a scheduler that optimizes th
 e power and performance of DNN training and inference workloads\, both ind
 ividually and when run concurrently. Specifically\, we develop an interlea
 ved approach for concurrent workload execution scheduled at the minibatch 
 granularity\, offering low variability in the inference latency. We also p
 ropose two novel optimization strategies that satisfy the diverse QoS goal
 s of meeting inference latency and maximizing training throughput while st
 aying within a power budget for field deployments. Our gradient descent-ba
 sed multi-dimensional search approach (GMD) quickly converges to a solutio
 n with less profiling of power modes\, while our active-learning-based a
 pproach (ALS) generalizes well across various problem configurations. Both
  our strategies outperform baselines and are close to the optimal solution
 .\n\nALL ARE WELCOME
CATEGORIES:Events,Ph.D. Thesis Colloquium
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:19700101T000000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR