Ph.D. Thesis {Colloquium}: CDS: “Investigation of the Indian summer monsoon employing statistical and machine learning methods”

When

27 Nov 24    
9:30 AM - 10:30 AM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
Ph.D. Thesis Colloquium
============================================================
Speaker : Ms. AKANKSHA MANOJ RAJAK
S.R. Number : 06-18-01-10-12-18-2-16460
Title : “Investigation of the Indian summer monsoon employing statistical and machine learning methods”
Research Supervisor : Dr. Deepak Narayanan Subramani
Date & Time : November 27, 2024 (Wednesday), 9:30 AM
Venue : # 102 CDS Seminar Hall
=============================================================
ABSTRACT
Indian summer monsoon rainfall contributes approximately 70-90% of India’s annual precipitation, profoundly impacting agricultural productivity, water resource management, and thereby the Indian economy. The monsoon onset occurs over the coast of Kerala around the first week of June, and the season is characterised by spells of above-average rain (called active spells) and below-average rain (called break spells). This thesis presents an analysis of the detection and forecasting of the onset date of the Indian summer monsoon using change point detection methods and deep neural network models. We also study the utility of unsupervised statistical methods in detecting active and break spells.

In the first part of the thesis, statistical change point detection (CPD) methods are applied to a 114-year gridded dataset of rainfall in India to detect monsoon onset, and active and break phases of the monsoon, in an unsupervised manner. We apply and study the results from multiple CPD methods such as the Pruned Exact Linear Time (PELT) method for efficient sequential change detection, Bayesian Online Change Point Detection (BOCD) for real-time onset detection, and Topological Data Analysis (TDA) for pattern recognition in complex data structures. These methods successfully identify monsoon onset dates from rainfall time series and establish relationships between the detected change points and their interannual variability. Furthermore, CPD methods systematically identify active and break spells in monsoon patterns by analyzing statistical property shifts between change points across multiple locations in India.
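As a rough illustration of how a PELT-style search locates a shift in a rainfall series, the sketch below implements the penalised dynamic program with PELT pruning for a change in mean. It is a minimal sketch and not the thesis implementation: the squared-error segment cost, the penalty value, the function name `pelt_mean_shift`, and the synthetic series are all assumptions made for this example.

```python
import numpy as np

def pelt_mean_shift(signal, penalty):
    """Return change point positions for shifts in the mean (illustrative)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Prefix sums let any segment cost be evaluated in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(a, b):
        # Sum of squared deviations of x[a:b] from its own mean.
        seg_sum = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg_sum ** 2 / (b - a)

    best = np.full(n + 1, np.inf)   # best[t]: optimal penalised cost of x[:t]
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        totals = [best[s] + cost(s, t) + penalty for s in candidates]
        i = int(np.argmin(totals))
        best[t] = totals[i]
        last[t] = candidates[i]
        # PELT pruning: keep only starts that could still be optimal later.
        candidates = [s for s, v in zip(candidates, totals)
                      if v - penalty <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        t = int(last[t])
        change_points.append(t)
    return sorted(change_points)

# Synthetic daily rainfall (mm): a dry regime followed by a wet regime.
rain = np.array([1.0] * 60 + [10.0] * 60)
onset_candidates = pelt_mean_shift(rain, penalty=10.0)
print(onset_candidates)
```

On real gridded rainfall the choice of cost function and penalty strongly affects how many change points are reported, so both would need tuning against known onset dates.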

In the second part of the thesis, spatiotemporal deep neural network models were developed to forecast the onset date with a 30- to 60-day lead time. Reanalysis and remotely sensed atmospheric and oceanic data, including sea surface temperature (SST), total cloud cover (TCC), net top thermal radiation for the clear sky atmosphere (TTRC) or outgoing longwave radiation (OLR), and mean sea level pressure (MSLP) from 1 March to 30 April, are used as input to the deep neural models. Data were obtained from the ERA5 reanalysis, the MODIS Aqua and Terra satellites, and the NOAA satellites. Three different neural architectures were developed to predict the onset of monsoon from the above data: (i) ConvLSTM, (ii) Depthwise Separable Convolution, and (iii) Transformers. For the first two architectures, the daily spatial data of the input variables were fed into the model in a full space representation and a reduced space representation using convolutional auto-encoders. For the third architecture, only the reduced space representation was used. Separate models for each input variable, combined models for all input variables, and models with different lead times were developed, and the results were noted. For each model, extensive ablation studies were conducted. The impact of the different data mixes (reanalysis, remotely sensed) used for training is quantified. Our analysis revealed that SST and TTRC can provide accurate predictions with an error margin of less than 3 days by the end of March (60-day lead time), while variables like cloud fraction and MSLP require data from April (30-day lead time) to contribute effectively to the onset prediction. The best performing neural models for each variable are stacked together using a meta learner and the final onset date is predicted. This model achieves a mean absolute error of 1.8 days with a correlation coefficient of 0.89 at a 30-day lead time and 2.0 days with a correlation of 0.87 at a 60-day lead time. The existing dynamical and data-driven models have a 4-day error, and we show superior predictive ability.
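To make the stacking step and the reported metrics concrete, the sketch below combines per-variable onset predictions with a linear meta learner and scores the result by mean absolute error and Pearson correlation. It is a sketch under stated assumptions: the abstract does not specify the form of the meta learner, so the least-squares combiner, all function names, and the synthetic numbers here are hypothetical and are not results from the thesis.

```python
import numpy as np

def fit_linear_meta_learner(base_preds, targets):
    """Least-squares intercept and weights combining base-model predictions."""
    base_preds = np.asarray(base_preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    design = np.column_stack([np.ones(len(targets)), base_preds])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef

def predict_stacked(coef, base_preds):
    """Apply the fitted intercept and weights to new base predictions."""
    base_preds = np.asarray(base_preds, dtype=float)
    design = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return design @ coef

def mean_absolute_error(pred, obs):
    """Average absolute difference, here in days."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(obs, float))))

def pearson_correlation(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    return float(np.corrcoef(pred, obs)[0, 1])

# Synthetic example: onset day-of-year for 40 years and four noisy base models
# (standing in for, e.g., SST, TTRC, cloud and MSLP models).
rng = np.random.default_rng(0)
observed = 152 + rng.normal(0.0, 7.0, size=40)
base = observed[:, None] + rng.normal(0.0, 3.0, size=(40, 4))

coef = fit_linear_meta_learner(base[:30], observed[:30])   # fit on 30 years
stacked = predict_stacked(coef, base[30:])                 # score on 10 held out
print(mean_absolute_error(stacked, observed[30:]))
print(pearson_correlation(stacked, observed[30:]))
```

In practice a meta learner is usually fitted on out-of-sample predictions from the base models so that it does not simply reward the model that overfits the training years most.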

The enhanced prediction capabilities of the developed model provide valuable lead time for agricultural planning and water resource management, potentially improving decision-making processes for marginal farmers, crop-insurance providers and government irrigation departments across India.

================================================================
ALL ARE WELCOME