DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
Ph.D. Thesis Colloquium
Speaker: Mr. Rohit Chowdhury
S.R. Number: 06-18-01-10-12-18-1-16320
Title: "Fast and Scalable Algorithms for Intelligent Routing of Autonomous Marine Vehicles"
Research Supervisor: Dr. Deepak Subramani
Date & Time: December 20, 2023 (Wednesday) at 10:00 AM
Venue: # 102, CDS Seminar Hall
ABSTRACT
Autonomous marine agents play a pivotal role in diverse ocean applications, serving as indispensable instruments for acquiring crucial environmental information. They are used to explore and monitor harsh environments, e.g., to map ocean topography, study coral reefs, perform search and rescue, and monitor the structural health of oil and gas installations. In naval security, these agents are used for surveillance and strategic monitoring of maritime activities. Building intelligence to use these agents optimally is essential for reducing operational costs.
The path planning problem is as follows: an autonomous marine agent must optimally traverse from a given start location to a given target location in a stochastic dynamic velocity field, such as ocean currents, while avoiding obstacles or restricted regions in the flow. A key challenge is that the agent is heavily advected by the flow. Optimality may refer to minimising expected travel time, energy consumption, or risk of failure, or maximising data collection. While there are multiple methods for solving path planning problems, each with its own challenges, we develop and use (i) a fast and scalable MDP-based offline planning software that computes optimal policies, and (ii) a novel sequence-modelling-based deep learning approach for onboard routing and dynamic planning, where the objective is to learn optimal action sequences for the agent. The goal of this thesis is to develop efficient, fast and scalable artificial intelligence algorithms for optimal planning and onboard routing of autonomous marine agents in stochastic dynamic environments.
The thesis comprises five works organised into two parts based on the solution approach. In the first part, we model the path planning problem as a Markov Decision Process (MDP) and aim to compute an optimal policy. However, the key challenge here is that solving an MDP can be prohibitively expensive for large state and action spaces. To overcome this challenge, we either approximate the optimal policy or accelerate the computation using GPUs.
- Physics-driven Q-learning for onboard routing: First, the distribution of exact time-optimal paths predicted by stochastic Dynamically Orthogonal (DO) Hamilton-Jacobi level set partial differential equations (HJLS PDEs) is utilised to learn an initial action-value function that approximates the optimal policy. Flow data collected by onboard sensors are used to obtain a posterior estimate of the environment. The approximate optimal policy is then refined in-mission by performing epsilon-greedy Q-learning in simulated posterior environments. We showcase the computational advantage of the approach, which comes at the cost of approximating the optimal policy.
- GPU-accelerated path planning: We compute an exact optimal policy by solving the path planning problem modelled as an MDP. To solve large-scale real-time problems, which can otherwise be computationally expensive, we introduce an efficient end-to-end GPU-accelerated algorithm that builds the MDP model (computing transition probabilities and expected one-step rewards) and solves it to compute an optimal policy. We develop methodical and algorithmic solutions to overcome the limited global memory of GPUs: a dynamic reduced-order representation of the ocean flows, exploitation of the sparse structure of the state transition probability matrix, and a neighbouring-subgrid concept that saves memory and reduces computational effort. We achieve significant speedups compared to conventional sequential computation.
- Multi-objective GPU-accelerated path planning: The end-to-end GPU-accelerated MDP solver is extended to a multi-objective path planner that solves multi-objective optimisation problems, such as minimising both the expected mission completion time and energy consumption. The MDPs are modelled with scalarised rewards for the multiple objectives. The solver is used to solve numerous instances of complex scenarios with other sources of environmental uncertainty, enabling us to compute optimal operating curves in a fraction of the time required by traditional solvers.
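The in-mission refinement in the first item above uses epsilon-greedy Q-learning. A minimal tabular sketch follows, in which `transition_fn` and `reward_fn` are hypothetical stand-ins for the simulated posterior environment, not the thesis software:

```python
import random

def epsilon_greedy_q_update(Q, state, actions, reward_fn, transition_fn,
                            epsilon=0.1, alpha=0.1, gamma=0.99):
    """Perform one epsilon-greedy Q-learning step and return the next state.

    Q is a dict mapping (state, action) -> estimated action value; it would
    here be initialised from the HJLS-PDE-derived paths rather than zeros.
    """
    # Explore with probability epsilon, otherwise exploit the current estimate.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))
    next_state = transition_fn(state, action)   # sampled posterior dynamics
    reward = reward_fn(state, action, next_state)
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    # Temporal-difference update toward the bootstrapped one-step target.
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return next_state
```

Repeated over simulated episodes, these updates nudge the PDE-initialised action-value function toward the optimal one for the posterior environment.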
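The exact-policy computation in the two GPU-accelerated planners above is, at its core, dynamic programming over the MDP's transition probabilities and expected one-step rewards. A minimal CPU value-iteration sketch in NumPy (the dense array shapes and the tiny example are illustrative assumptions, not the thesis solver, whose GPU version instead batches sparse products over a reduced-order flow representation):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve a small MDP by value iteration.

    P: (A, S, S) array of transition probabilities P[a, s, s'].
    R: (A, S) array of expected one-step rewards.
    Returns the optimal value function and a greedy optimal policy.
    """
    S = P.shape[1]
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V          # (A, S): one-step lookahead per action
        V_new = Q.max(axis=0)          # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmax(axis=0)     # optimal values and policy
```

For a multi-objective variant as in the third item, `R` would be a scalarisation such as `w * R_time + (1 - w) * R_energy`; sweeping the weight `w` and re-solving traces an operating curve.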
In the second part, we convert the optimal path planning problem into a supervised learning problem via sequence modelling. This approach involves learning optimal action sequences based on the available environment information and expert trajectories. It also avoids the issue of re-computing optimal policies for onboard routing.
- Intelligent onboard routing using decision transformers: We develop a novel deep learning method based on the decision transformer (a decoder-only model) for onboard routing of autonomous marine agents. Training data are obtained from the aforementioned HJLS PDE or MDP solvers and processed into sequences of states, actions and returns. The model is trained autoregressively on these sequences and then tested in different environment settings. We demonstrate that (i) a trained agent learns to infer the surrounding flow and performs optimal onboard routing when the agent's state estimation is accurate, (ii) specifying the target location (in the case of multiple targets) as part of the state enables a trained agent to route itself to the correct destination, and (iii) a trained agent is robust to limited noise in state transitions and is capable of reaching target locations in completely new flow scenarios. We extensively showcase end-to-end planning and onboard routing in various canonical and idealised ocean flow scenarios.
- Path planning with environment encoders and action decoders: We propose a novel combination of a dynamically orthogonal flow representation with uncertainty and a transformer model (encoder-decoder) for the path planning task. We model the problem as a sequence-to-sequence translation task in which the source sequence is the agent's knowledge representation of the uncertain environmental flow and the target sequence is the optimal sequence of actions the agent must execute. We demonstrate that a trained transformer model can predict near-optimal paths for unseen flow realisations and obstacle configurations in a fraction of the time required by traditional planners. Validation shows generalisation to unseen obstacle configurations. We also analyse the predictions of both transformer models, viz., the decoder-only and the encoder-decoder, and explain the inner mechanics of learning through a novel visualisation of the self-attention of actions and states along the trajectories.
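Decision transformers of the kind used in the first item condition each predicted action on the return-to-go, so expert trajectories are typically repackaged into (return-to-go, state, action) triples before autoregressive training. A small illustrative helper (the exact tokenisation used in the thesis is not specified here):

```python
def to_dt_sequence(states, actions, rewards):
    """Convert one expert trajectory into (return-to-go, state, action)
    triples suitable for decision-transformer training.

    The return-to-go at step t is the sum of rewards from t to the end,
    so conditioning on it lets the model target a desired total return.
    """
    rtg, total = [], 0.0
    for r in reversed(rewards):        # suffix sums, computed right to left
        total += r
        rtg.append(total)
    rtg.reverse()
    return list(zip(rtg, states, actions))
```

For time-optimal routing the rewards would typically be negative per-step costs, so the return-to-go shrinks in magnitude as the agent nears the target.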
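At inference time, an encoder-decoder planner of the kind described in the last item consumes the flow representation once and then emits actions one at a time. A schematic greedy-decoding loop, in which `encode` and `decode_step` are hypothetical stand-ins for the trained transformer's two halves:

```python
def greedy_decode(encode, decode_step, flow_sequence, start_token, end_token,
                  max_len=100):
    """Sequence-to-sequence inference sketch for action prediction.

    The encoder summarises the agent's (uncertain) environmental flow
    representation; the decoder then predicts the next action from that
    context and the actions emitted so far, until an end token appears.
    """
    memory = encode(flow_sequence)          # environment context, computed once
    actions = [start_token]
    for _ in range(max_len):
        nxt = decode_step(memory, actions)  # greedy next-action prediction
        if nxt == end_token:
            break
        actions.append(nxt)
    return actions[1:]                      # drop the start token
```

Because the encoder runs once per mission and each decoder step is a single forward pass, this is the structure that yields near-optimal paths in a fraction of a traditional planner's runtime.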
ALL ARE WELCOME