Ph.D. Thesis Defense: #102: CDS Seminar Hall: 08 September 2025 “Investigation of the Indian Summer Monsoon Rainfall Using Statistical and Machine Learning Techniques”

When

8 Sep 2025
10:00 AM - 11:00 AM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
Ph.D. Thesis Defense


Speaker : Ms. AKANKSHA MANOJ RAJAK
S.R. Number : 06-18-01-10-12-18-2-16460
Title : “Investigation of the Indian Summer Monsoon Rainfall Using Statistical and Machine Learning Techniques”
Research Supervisor : Dr. Deepak Subramani
Thesis examiner : Prof. Srinivasa Ramanujam Kannan, Indian Institute of Technology, Bhubaneswar
Date & Time : September 08, 2025 (Monday), 10:00 AM
Venue : # 102 CDS Seminar Hall


ABSTRACT

The Indian Summer Monsoon is an important atmospheric phenomenon, marked by a characteristic seasonal wind reversal pattern, delivering 70 to 90% of the annual rainfall to the Indian subcontinent. Monsoon rain profoundly impacts agricultural productivity, water resource management, and thereby the Indian economy. The first significant event of the monsoon season is its start, called Monsoon Onset over Kerala (MOK), which occurs around the first week of June. The monsoon rain then reaches Central India over the next two to three weeks and covers the entire country by mid-July. The season has active and break spells, traditionally defined by the rainfall anomaly being more than one standard deviation above (active) or below (break) the mean. The first part of this thesis focuses on understanding how change points in rain, outgoing longwave radiation, winds, and cloud cover relate to the MOK and to rain spells. The second part focuses on forecasting the MOK one season ahead using deep learning techniques with reanalysis and remotely sensed data.
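
The one-standard-deviation rule for active and break spells can be illustrated with a minimal Python sketch. It assumes the anomaly is standardized against the mean and standard deviation of the series itself; the thesis may instead use a daily climatology and a minimum spell duration. The function name `classify_spells` and the rainfall values are hypothetical and purely illustrative.

```python
from statistics import mean, pstdev

def classify_spells(rain, threshold=1.0):
    """Label each value 'active', 'break', or 'normal' from its standardized anomaly."""
    mu = mean(rain)
    sigma = pstdev(rain)
    if sigma == 0:
        raise ValueError("rainfall series has zero variance")
    labels = []
    for r in rain:
        z = (r - mu) / sigma          # standardized anomaly
        if z > threshold:
            labels.append("active")   # more than one standard deviation above the mean
        elif z < -threshold:
            labels.append("break")    # more than one standard deviation below the mean
        else:
            labels.append("normal")
    return labels

# Synthetic daily rainfall in mm (assumption: illustrative values, not thesis data).
rain = [10, 12, 11, 30, 28, 9, 1, 0, 11, 10]
labels = classify_spells(rain)
```

Here the two wettest days are labelled active, the two driest are labelled break, and the rest are normal.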

In the first part of the thesis, statistical change point detection methods are applied to identify significant transitions in monsoon patterns across different regions of the Indian subcontinent.

Change point detection identifies the time at which a statistically significant change in the data-generating properties of a time series occurs, making it inherently suitable for detecting monsoon transitions in an unsupervised manner. We focus on two critical regions: Kerala, the traditional entry point of the southwest monsoon into mainland India, and Central India, a core monsoon zone. For these regions, we implement both the offline Pruned Exact Linear Time (PELT) algorithm and the Bayesian Online Change Point Detection (BOCD) method. These techniques are applied to univariate rainfall time series as well as to multivariate datasets incorporating outgoing longwave radiation, wind patterns, and cloud cover, starting from April 1 of each year. In the Kerala region, our change point analysis reveals significant shifts in monsoon dynamics over the 1975-2024 period. The mean and standard deviation of rainfall have increased for the month of May while decreasing for June and July during the last quarter of the 20th century. We observe a 3% increase in overall rainfall variability despite a 3% decrease in the southwest monsoon contribution to annual rainfall. These findings highlight the increasing pre-monsoon activity and evolving monsoon characteristics that challenge conventional fixed threshold-based definitions. We propose using the first significant rainfall change point as an objective proxy for Monsoon Onset over Kerala (MOK), demonstrating several advantages over the IMD’s conventional threshold-based criteria, particularly in its ability to adapt to changing baseline conditions. When extending this methodology to Central India, we discover a consistent propagation lag of 21-24 days between corresponding change points in Kerala and Central India, providing a statistical framework for understanding monsoon progression across regions.
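
To make the offline method concrete, the following is a minimal pure-Python sketch of PELT for detecting shifts in the mean of a univariate series. The abstract does not state the cost function, penalty, or implementation used in the thesis, so the squared-error cost, the penalty value, the function name `pelt_mean_shift`, and the synthetic series are all assumptions made for illustration. BOCD is not shown.

```python
def pelt_mean_shift(y, penalty):
    """Return change point indices for mean shifts, using PELT with a squared-error cost."""
    n = len(y)
    # Prefix sums let the cost of any segment y[a:b] be computed in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Sum of squared deviations of y[a:b] from its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: minimal penalized cost of segmenting y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)    # last[t]: start index of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[a] + cost(a, t) + penalty, a) for a in candidates
        )
        # Pruning step: discard segment starts that can never be optimal again.
        candidates = [a for a in candidates if best[a] + cost(a, t) <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts to recover the change points.
    change_points = []
    t = n
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points)

# Synthetic series with one mean shift at index 8 (assumption: not thesis data).
series = [1, 2, 1, 2, 1, 2, 1, 2, 9, 10, 9, 10, 9, 10, 9, 10]
detected = pelt_mean_shift(series, penalty=10.0)
```

For this synthetic series `detected` is `[8]`, the index at which the mean jumps. In the onset application described above, the first detected change point of the season would be the candidate proxy for MOK; the penalty controls how large a shift must be before it counts as significant.
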
Our analysis further reveals fundamental constraints in monsoon behavior: approximately 57.8% of all transitions either originate from or remain in normal conditions, with a complete absence of direct transitions between extreme states (break-to-active or active-to-break). These findings establish a robust approach for characterizing interannual variability patterns, the frequency of high and low intensity rainfall periods, and the spatial propagation of monsoon signals, with significant implications for intraseasonal forecasting.
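
The transition statistics above can be tallied from a sequence of spell states as in the sketch below. It assumes a transition is any pair of consecutive states, including a state followed by itself, and that "originate from or remain in normal" means the first state of the pair is normal; the abstract does not spell out either convention. The state sequence is synthetic.

```python
from collections import Counter

def transition_counts(states):
    """Count every pair of consecutive states, including self-transitions."""
    return Counter(zip(states, states[1:]))

# Synthetic spell sequence (assumption: illustrative only, not thesis data).
states = ["normal", "normal", "active", "normal", "break",
          "break", "normal", "active", "active", "normal"]

counts = transition_counts(states)
total = sum(counts.values())

# Transitions whose starting state is normal (normal -> anything).
from_normal = sum(c for (src, _dst), c in counts.items() if src == "normal")
share_from_normal = from_normal / total

# Direct jumps between the two extreme states.
direct_extreme = counts[("active", "break")] + counts[("break", "active")]
```

In this toy sequence 4 of the 9 transitions start from normal and `direct_extreme` is 0, mirroring the kind of constraint reported above.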

In the second part of the thesis, we transition from detecting monsoon onset to predicting it through the development of advanced deep learning approaches. Our research progresses from classification to precise date prediction, and from static to temporal models, each with increasing sophistication and performance. First, we establish a baseline approach by developing a CNN-based classification model for categorizing Monsoon Onset over Kerala (MOK) timing into Early, Normal, and Delayed classes. This model uses monthly mean sea surface temperature (SST) data from March, providing approximately a 60-day lead time for predictions. The network was initially trained from scratch on ERA5 reanalysis data, achieving an accuracy of 60.0% on the test dataset. We then implemented an efficient transfer learning approach, fine-tuning the pre-trained ERA5 model on MODIS satellite data, which resulted in an accuracy of 53.3%. Notably, this represents a substantial improvement over the 40.0% accuracy obtained when applying the ERA5-trained model directly to MODIS data without fine-tuning, where the model failed entirely to identify Early and Normal onset classes (0% recall for both classes). The fine-tuning effectively addressed the systematic differences between datasets observed in the Arabian Sea, Bay of Bengal, and Indian Ocean regions, improving the macro-average F1-score from 0.190 to 0.564. This adaptation demonstrates the model’s flexibility across different data sources and its potential for satellite-based operational forecasting. Gradient-weighted Class Activation Mapping (GradCAM) visualizations revealed valuable insights into the model’s decision-making process, highlighting how it focuses on specific regions in the Arabian Sea and Bay of Bengal when making predictions for different onset categories.
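
The gap between accuracy and macro-average F1 in the no-fine-tuning case can be seen in a short sketch. The labels below are hypothetical: a 15-sample test set with 4 Early, 5 Normal, and 6 Delayed cases, where every prediction is Delayed. The actual class counts and test-set size are not given in the abstract; this configuration is chosen only because it yields figures consistent with the 40.0% accuracy and 0.190 macro F1 quoted above.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores; a class with no hits scores 0."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

classes = ["Early", "Normal", "Delayed"]

# Hypothetical labels (assumption): the model predicts Delayed for every sample.
y_true = ["Early"] * 4 + ["Normal"] * 5 + ["Delayed"] * 6
y_pred = ["Delayed"] * 15

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, classes)
```

Accuracy rewards the single class that happens to be predicted, whereas macro F1 averages over all three classes and so penalizes the zero recall on Early and Normal.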

While effective, this initial model relies only on monthly mean data without incorporating the temporal evolution of meteorological conditions. To address this limitation, we developed more sophisticated spatio-temporal neural network architectures that capture the temporal dynamics of pre-monsoon conditions. Two distinct models were created: a March model using data available through March 31 (providing a 60-day lead time) and an April model incorporating data through April 30 (providing a 30-day lead time). These models were trained to process multiple meteorological variables from ERA5 reanalysis, including sea surface temperature (SST), total cloud cover (TCC), net top thermal radiation for clear sky (TTRC/OLR), winds at 850 hPa and 200 hPa pressure levels, and mean sea level pressure (MSLP). Using data from 1940 to 2009 for training and validation, with 2010 to 2024 reserved for testing, we systematically evaluated the predictive capacity of individual variables and their combinations. Our most effective architecture integrates all meteorological variables into a unified spatio-temporal framework, achieving an RMSE of approximately 2 days and a correlation coefficient of 0.89 at a 30-day lead time. Early stopping and L2 regularization effectively prevented overfitting, yielding a model that balances computational efficiency with prediction accuracy. The model performs consistently across different test periods, suggesting robust real-world applicability. The enhanced prediction capabilities of these models provide valuable lead time for agricultural planning and water resource management, potentially improving decision-making for marginal farmers, crop insurance providers, and government irrigation departments across India.
By establishing a progression from simple classification to precise onset date prediction, this research contributes practical tools that can be implemented in operational forecasting settings to support communities and economies dependent on accurate monsoon predictions.
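
For reference, the two skill measures quoted for the onset date models, RMSE in days and the correlation coefficient, can be computed as in the sketch below. It assumes onset dates are expressed as day-of-year integers and that the correlation is Pearson's; the observed and predicted values are synthetic and do not come from the thesis.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (here, days)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Synthetic onset dates as day of year (assumption: illustrative values only).
observed = [152, 157, 148, 160, 155]
predicted = [154, 155, 150, 158, 157]

error_days = rmse(observed, predicted)
correlation = pearson_r(observed, predicted)
```

Every synthetic prediction is off by exactly two days, so `error_days` is 2.0. The correlation measures something different: whether early and late years are ranked correctly, regardless of a constant offset.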


ALL ARE WELCOME