Ph.D. Thesis Colloquium: CDS 102: 28 April 2025: “Scalable Platform for Intelligent Orchestration of Autonomous Systems Across Edge-Cloud Continuum”

When

28 Apr 25    
4:00 PM - 5:00 PM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
Ph.D. Thesis Colloquium


Speaker : Ms. Suman Raj
S.R. Number : 06-18-01-10-12-20-1-18450
Title : “Scalable Platform for Intelligent Orchestration of Autonomous Systems Across Edge-Cloud Continuum”
Research Supervisor: Prof. Yogesh Simmhan
Date & Time : April 28, 2025 (Monday), 04:00 PM
Venue : CDS # 102


ABSTRACT
The benefits of autonomous mobile platforms, such as unmanned aerial vehicles (UAVs) equipped with onboard cameras, are enhanced by co-located compact edge accelerators, such as the NVIDIA Jetson with hundreds of CUDA cores. These accelerators enable rapid inference of Deep Neural Network (DNN) models and computer vision algorithms to support real-time analytics workflows for diverse domains, ranging from smart crop monitoring to assisting Visually Impaired People (VIPs), with drones operating either individually or as part of a fleet.

However, programming such drones and edge devices for efficient, resilient and responsive operations poses challenges. The limited compute capacity of edge devices needs to be intelligently complemented by cloud computing to offload compute-intensive analytics in a timely and cost-effective manner. Routing fleets of drones to accomplish complex tasks requires us to optimize for and adapt to network variability, edge failures, latency and energy constraints, and monetary costs. Further, these systems need to be intuitively programmable so that practical applications can be designed across distributed autonomous platforms and edge resources. We address these challenges in this dissertation.

First, we design a task scheduling strategy, GEMS, that makes real-time decisions for DNN inference tasks generated by drones, executing them either on local edge accelerators or on remote cloud resources. The goals are to maximize the Quality of Service (QoS) and Quality of Experience (QoE) for a VIP assistive application within the deadline constraints of each task. GEMS accounts for the task deadline, cloud and edge pricing, and dynamic network variability. Our realistic experiments using up to 84 emulated drones show up to 2.7x higher QoS utility, up to 75% higher QoE utility, jerk within ±1 m/s³, up to 42% lower yaw error, and a task completion rate of up to 88% compared to state-of-the-art baselines for diverse computer vision workloads.

Next, we study co-scheduling of DNN inferencing and routes for a fleet of drones used in collaborative applications such as smart agriculture. The drones need to visit a set of waypoints to collect data and perform analytics, and have access to onboard edge compute, stationary fogs at cell towers, and mobile fogs on public buses. We define this as a Mission Scheduling Problem, which is NP-complete, and design MARC as a divide-and-assign heuristic to solve this optimization problem. In our simulation-based evaluation with fleets of up to 50 drones, MARC achieves a 100% task completion rate and up to 31% higher average utility than contemporary baselines, and is within 75% of the optimal solution solved using MILP, which is tractable only for small inputs.

Further, we explore resilient scheduling of autonomous systems to ensure continuity of service despite drone and edge failures in the context of wildfire response, where a heterogeneous UAV fleet helps detect stranded individuals and generate evacuation routes to safety. We develop the AeroResQ platform with algorithms that dynamically adapt to failures by using heartbeats across drones and an onboard distributed datastore. Two strategies ensure uninterrupted detection, route planning, and monitoring: a load-balancing algorithm that handles the active requests being processed by failed drones, and re-assignment of spatial regions across the available drones. Our evaluations, conducted using an emulated environment based on recent Southern California wildfire data, demonstrate the robustness of our platform across failure scenarios and fleet configurations. The system achieves real-time performance with ≤1s end-to-end latency per evacuation request, well below the 2s request interval, while maintaining over 98% successful task reassignment and completion.

Finally, we design the Ocularone platform as an integrated Drones-as-a-Service (DaaS) programming framework that enables rapid development of analytics-driven UAV applications across the edge-cloud continuum. It abstracts drone navigation, control, and sensing into intuitive, composable Python-based interfaces, and can embed the scheduling strategies that we have developed, such as GEMS. We validate the platform with an assistive application that uses buddy drones to help visually impaired individuals navigate. The application is further enabled by accurate DNN-based distance estimation algorithms that we have developed to assess nearby obstacles, which are fine-tuned with curated datasets to improve real-world accuracy. It can be implemented in under 40 lines of code using Ocularone.


ALL ARE WELCOME