BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
TZID:Asia/Kolkata
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:153@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20251016T160000
DTEND;TZID=Asia/Kolkata:20251016T170000
DTSTAMP:20251009T074003Z
URL:https://cds.iisc.ac.in/events/phd-thesis-defense-102-cds-seminar-hall-
 oct-16-thursday-400-pm-systems-optimizations-for-dnn-training-and-inferenc
 e-on-accelerated-edge-devices/
SUMMARY:PhD Thesis Defense: #102: CDS Seminar Hall: Oct 16 (Thursday) @ 4:
 00 PM: "Systems Optimizations for DNN Training and Inference on Accelerate
 d Edge Devices"
DESCRIPTION:DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES\nPh.D. Thesis Def
 ense\n\n\n\nSpeaker : Ms. Prashanthi S. K.\nS.R. Number : 06-18-01-10-12-2
 0-1-18362\nTitle : "Systems Optimizations for DNN Training and Inference o
 n Accelerated Edge Devices"\nResearch Supervisor : Prof. Yogesh Simmhan\n
 Thesis Examiner : Dr. Preeti Malakar\, Indian Institute of Technology Kanp
 ur\nDate & Time : Oct 16\, 2025 (Thursday) at 4:00 PM\nVenue : # 102 CDS S
 eminar Hall\n\n\n\nABSTRACT\nDeep Neural Networks (DNNs) have had a s
 ignificant impact on a wide variety of domains\, such as Autonomous Vehicl
 es\, Smart Cities\, and Healthcare\, through low-latency inferencing on ed
 ge computing devices close to the data source. Recently\, there has also b
 een a push towards training DNN models on accelerated edge devices having 
 on-board Graphics Processing Units (GPUs) with 100-1000s of Compute Unifie
 d Device Architecture (CUDA) cores. This is driven by the increasing data 
 collected from edge devices in Cyber-Physical Systems (CPS) and Internet o
 f Things (IoT)\, the growing computing power of edge devices\, and the ris
 e of on-device training paradigms such as Federated Learning and Continuou
 s Learning that focus on privacy and personalization.\n\nExisting literatu
 re has primarily focused on optimizing edge inference. There is limited sy
 stems research on optimizing DNN training\, and concurrent training and in
 ference on edge accelerators. Previous work on server GPUs cannot be direc
 tly applied to edge devices since they have architectural distinctions fro
 m cloud/server GPUs\, in particular their 1000s of power modes consisting 
 of Central Processing Unit (CPU) core count and CPU\, GPU and memory freq
 uencies. They are also used in varied field deployments that impose power 
 or energy constraints. In this dissertation\, we characterize\, model and 
 predict the behavior of NVIDIA Jetson edge accelerators and their power mo
 de configurations for DNN workloads. These employ both empirical Machine L
 earning (ML)-based models and analytical roofline-driven models. We levera
 ge these to design system optimizations to tune the edge platform for DNN 
 training and inference workloads\, and help DNNs effectively utilize the f
 ull potential of accelerated edge hardware.\n\nWe first motivate the need 
 for training on the edge and the associated systems research challenges th
 rough a rigorous empirical performance characterization of four classes of
  NVIDIA Jetson accelerated edge devices for DNN training. We vary paramete
 rs of the PyTorch training framework and edge device\, such as I/O pipelin
 ing and parallelism\, storage media\, mini-batch sizes and power modes\, a
 nd examine their effect on CPU and GPU utilization\, fetch stalls\, traini
 ng time\, energy usage\, and variability. Our analysis exposes several res
 ource inter-dependencies and counter-intuitive insights\, while also helpi
 ng quantify known wisdom. We also study the impact of containerized DNN in
 ference and training workloads and contrast it against bare-metal executio
 n on running time\, CPU\, GPU and memory utilization\, and energy consumpt
 ion.\n\nBuilding upon these insights\, we develop PowerTrain\, a transfer-
 learning approach to accurately predict the performance and power usage of
  a new DNN training workload for any given power mode. PowerTrain does a o
 ne-time costly profiling of 100s of power modes for one DNN model trainin
 g on a Jetson device to train a reference prediction model. It is then abl
 e to generalize this using transfer learning to different DNN models\, dat
 asets and edge devices with limited custom profiling. We use these predict
 ions to instantly construct a Pareto frontier for the behavior of the new 
 DNN workload and decide the power mode configuration that minimizes the tr
 aining time within a power budget. Our predictions outperform the NVIDIA p
 rediction tool and other baselines\, and have low prediction errors of 5-1
 5% on time and power.\n\nIn Pagoda\, we investigate analytical roofline-ba
 sed characterization to understand and explain the impact of power modes f
 or various workloads. We develop a time roofline and a novel energy roofli
 ne model for diverse power modes. We couple this with an analytical model 
 of the compute (FLOP) and memory access (bytes) for DNN workloads to analy
 ze them from first principles. Lastly\, we apply these methods to modify t
 he power mode and\, hence\, the roofline of the edge device to optimize th
 e latency and energy usage for DNN inference. Our experiments show energy 
 benefits of up to 15% with minimal degradation in time.\n\nFinally\, we de
 sign Fulcrum\, a scheduler that optimizes the power and performance of DNN
  training and inference workloads\, both individually and when run concurr
 ently. Specifically\, we develop a managed interleaving approach for concu
 rrent workload execution scheduled at the minibatch granularity\, offering
  low variability in the inference latency compared to native interleaving 
 done by the GPU scheduler. We also propose two novel optimizations that s
 atisfy the diverse Quality of Service (QoS) goals of meeting inference la
 tency and maximizing training throughput while staying within a power bud
 get for field deployments. Our gradient descent-based multi-dimensional s
 earch approach (GMD) quic
 kly converges to a solution with less profiling of power modes\, while o
 ur active-learning-based approach (ALS) generalizes well across various pr
 oblem configurations. Both our strategies outperform baselines and are clo
 se to the optimal solution.\n\nTogether\, these contributions holistically
  offer a deeper understanding of the performance of DNN workloads on edge 
 accelerators\, help accurately model the impact of power modes on their pe
 rformance and power usage\, and provide systems optimizations to effective
 ly leverage edge accelerators for DNN training and inferencing in practica
 l situations.\n\n\n\nALL ARE WELCOME
CATEGORIES:Events,Thesis Defense
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20241016T160000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR