Ph.D. Thesis Colloquium: CDS Seminar Hall # 102, “Learning Deep Neural Networks From Limited and Imperfect Data”


25 Apr 24    
10:00 AM - 11:00 AM

Event Type

Ph.D. Thesis Colloquium

Speaker : Mr. Harsh Rangwani

S.R. Number : 06-18-01-10-12-19-1-17477

Title : “Learning Deep Neural Networks From Limited and Imperfect Data”
Research Supervisor: Prof. Venkatesh Babu
Date & Time : April 25, 2024 (Thursday) at 10:00 AM
Venue : # 102 CDS Seminar Hall


Deep Neural Networks have improved in capability by orders of magnitude since AlexNet won the ImageNet challenge in 2012. One of the major reasons for this success is the availability of large-scale, well-curated datasets. These datasets (e.g., ImageNet and MSCOCO) are often manually balanced across categories (classes) to facilitate learning all of them. This curation is expensive and often requires discarding valuable annotated data to equalize class frequencies, because the distribution of real-world data (e.g., on the internet) differs significantly from that of curated datasets and is dominated by samples from common categories. Algorithms designed for well-curated datasets perform suboptimally when used to learn from imperfect datasets with long-tailed imbalances and distribution shifts. For deep models to be widely used, it is necessary to do away with the costly curation process by developing robust algorithms that can learn from real-world data distributions. Toward this goal, we develop practical algorithms that enable Deep Neural Networks to learn from the limited and imperfect data present in the real world.

This thesis is divided into four parts, each covering a scenario of learning from limited or imperfect data. The first part focuses on Learning Generative Models for Long-Tail Data, where we mitigate mode collapse for tail (minority) classes and enable image generation for these classes that is as diverse and aesthetically pleasing as for the head (majority) classes. In the second part, we enable effective generalization on tail classes through Inductive Regularization schemes, which allow tail classes to generalize as well as head classes without explicitly generating images. In the third part, we develop algorithms for Optimizing Relevant Metrics, rather than average accuracy, when learning from long-tailed data with limited annotation (the semi-supervised setting). The fourth part focuses on effective domain adaptation of the model to various domains with zero to very few labeled samples.

Generative Models for Long-Tail Data. We first evaluate the performance of generative models, specifically variants of Generative Adversarial Networks (GANs), on long-tailed datasets. These GAN variants suffer either from mode collapse or from missing class modes during generation. To mitigate this, we propose Class Balancing GAN with a Classifier in the Loop, which uses a classifier to assess the modes in generated images and regularizes the GAN to produce all classes equally. To alleviate the dependence on the classifier, and following our observation that the explosion of the spectral norm of Batch Norm parameters is the major cause of mode collapse, we develop an inexpensive group Spectral Regularizer (gSR) that mitigates this spectral collapse, significantly improving the performance of state-of-the-art (SotA) conditional GANs (SNGAN and BigGAN) on long-tailed data. However, we observed that norm regularization introduces class confusion in the generated images. In our latest work, NoisyTwins, we factorize the latent space by design into a distinct Gaussian for each class and enforce class consistency and intra-class diversity using a self-supervised objective (BarlowTwins). This allows high-resolution StyleGANs to scale to long-tailed datasets with ≥ 1000 classes, such as ImageNet-LT and iNaturalist2019, achieving SotA performance.
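The group Spectral Regularizer described above penalizes the spectral norm of Batch Norm parameters. As a minimal sketch, assuming each class's Batch Norm parameter vector is reshaped into a matrix of equal-sized groups and the penalty is the sum over classes of that matrix's largest singular value (estimated here by power iteration), the penalized quantity could be computed as follows. The grouping scheme and the names `group_matrix`, `spectral_norm`, and `group_spectral_penalty` are illustrative assumptions, not the thesis's implementation.

```python
import math
import random


def group_matrix(params, num_groups):
    """Reshape a flat per-class parameter vector into num_groups rows of equal width."""
    if num_groups <= 0 or len(params) % num_groups != 0:
        raise ValueError("len(params) must be a positive multiple of num_groups")
    width = len(params) // num_groups
    return [list(params[g * width:(g + 1) * width]) for g in range(num_groups)]


def spectral_norm(matrix, iters=100, seed=0):
    """Estimate the largest singular value of `matrix` by power iteration on M^T M."""
    rows, cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        mv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # M v
        w = [sum(matrix[i][j] * mv[i] for i in range(rows)) for j in range(cols)]  # M^T M v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # M^T M v vanished (e.g., an all-zero matrix): report a zero norm.
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in mv))  # ||M v|| for the top right singular vector


def group_spectral_penalty(class_params, num_groups):
    """Sum over classes of the spectral norm of each grouped parameter matrix."""
    return sum(spectral_norm(group_matrix(p, num_groups)) for p in class_params)


# Hypothetical per-class Batch Norm gains (4 channels, 2 groups of 2 channels each).
class_gains = [
    [3.0, 0.0, 0.0, 1.0],  # grouped as [[3, 0], [0, 1]] -> spectral norm 3
    [1.0, 2.0, 2.0, 4.0],  # grouped as [[1, 2], [2, 4]] -> spectral norm 5
]
penalty = group_spectral_penalty(class_gains, num_groups=2)  # approximately 8.0
```

In an actual GAN, such a penalty would presumably be scaled by a coefficient and added to the training loss inside an autodiff framework so that gradients reach the Batch Norm parameters; this pure-Python version only shows which quantity is being penalized.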

Inductive Regularization Schemes for Long-Tailed Data. While data generation is a promising way to improve classification models on tail classes, it comes at the cost of training an auxiliary GAN model. Hence, lightweight techniques such as increasing the loss weights of tail classes (re-weighting) while training CNNs are a practical way to improve minority-class performance. However, even with re-weighting, the model attains minima only for the head-class loss and converges to saddle points for the tail classes. We show that introducing an inductive bias to escape saddles and converge to minima for tail classes, using Sharpness-Aware Minimization (SAM), significantly improves performance on tail classes. Further, training Vision Transformers (ViTs) for long-tail recognition is hard, as they lack inductive biases such as locality of features, which makes them data-hungry. We propose DeiT-LT, which introduces out-of-distribution (OOD) and low-rank distillation from a CNN to instill CNN-like robustness into scalable ViTs.
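To make the SAM idea concrete, here is a sketch of a single Sharpness-Aware Minimization update on a toy loss with an analytic gradient: perturb the weights by a step of radius `rho` along the normalized gradient, take the gradient at the perturbed point, and apply it to the original weights. The toy quadratic loss, the name `sam_step`, and the values of `lr` and `rho` are assumptions for illustration; the paragraph above applies SAM to (re-weighted) classification losses of deep networks, which this sketch does not reproduce.

```python
import math


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update on a weight vector w."""
    g = grad_fn(w)
    g_norm = math.sqrt(sum(x * x for x in g))
    if g_norm == 0.0:
        # Exactly stationary: no ascent direction, so leave w unchanged.
        return list(w)
    # Ascent step of radius rho toward (approximately) the worst-case nearby weights.
    w_adv = [wi + rho * gi / g_norm for wi, gi in zip(w, g)]
    # The gradient evaluated at the perturbed weights ...
    g_adv = grad_fn(w_adv)
    # ... is applied to the original weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


# Toy loss L(w) = 0.5 * (w0^2 + 10 * w1^2), a hypothetical stand-in for a training loss.
def toy_loss(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def toy_grad(w):
    return [w[0], 10.0 * w[1]]


w = [1.0, 1.0]
for _ in range(5):
    w = sam_step(w, toy_grad)
```

The extra gradient evaluation roughly doubles the cost per step compared with plain gradient descent; the benefit is that the update favors regions where the loss stays low throughout the `rho`-neighborhood, i.e., flatter minima.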

Semi-Supervised Long-Tailed Learning. The above methods operate in the supervised long-tail setting, where they avoid discarding annotated data. However, the full benefit of long-tailed methods is realized when they also exploit the abundant unlabeled data available (i.e., the semi-supervised setting). To this end, we introduce a paradigm in which performance is measured on a held-out set using relevant metrics, such as worst-case recall and the harmonic mean (H-mean) of recalls, and this feedback is used to guide learning in the semi-supervised long-tailed setting. We introduce Cost-Sensitive Self-Training (CSST), which generalizes self-training-based semi-supervised learning (e.g., FixMatch) to long-tailed settings, with strong guarantees and strong empirical performance. A common recent practice is to obtain a robust model through self-supervised pre-training and then fine-tune it. In this setup, we introduce SelMix, an inexpensive fine-tuning technique that optimizes the relevant metrics starting from pre-trained models. SelMix also relaxes the assumption that the unlabeled data follows the same distribution as the labeled data, making models robust to distribution shifts.
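The two metrics named above can be computed from predictions on a held-out set: worst-case recall is the minimum per-class recall, and the recall H-mean is the harmonic mean of per-class recalls, which collapses toward zero when any class is poorly recognized. This is a minimal sketch in plain Python with hypothetical function names and toy data; it only computes the metrics and does not implement CSST or SelMix.

```python
from collections import Counter


def per_class_recall(y_true, y_pred, num_classes):
    """Fraction of each class's held-out samples that are predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = []
    for c in range(num_classes):
        if totals[c] == 0:
            raise ValueError(f"class {c} has no samples in the held-out set")
        recalls.append(hits[c] / totals[c])
    return recalls


def worst_case_recall(recalls):
    return min(recalls)


def recall_hmean(recalls):
    """Harmonic mean of per-class recalls; zero if any class has zero recall."""
    if any(r == 0.0 for r in recalls):
        return 0.0
    return len(recalls) / sum(1.0 / r for r in recalls)


# Toy held-out set with three classes (hypothetical data).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
recalls = per_class_recall(y_true, y_pred, num_classes=3)  # [0.75, 0.5, 1.0]
```

On this toy set the average accuracy is 0.75, while the worst-case recall is 0.5 and the recall H-mean is about 0.69, illustrating how these metrics expose weak (typically tail) classes that average accuracy can hide.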

Efficient Domain Adaptation. The long-tail learning algorithms above focus on the limited-data setup and on improving in-distribution generalization. For practical use, however, a model must also learn from imperfect data and perform well across various domains. Toward this goal, we develop Submodular Subset Selection for Adversarial Domain Adaptation, which carefully selects a few target samples for labeling so as to maximally improve model performance in the target domain. To further improve the efficiency of the adaptation procedure, we introduce Smooth Domain Adversarial Training (SDAT), which converges to smooth minima that generalize better; these smooth minima enable efficient and effective model adaptation across domains and tasks.
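Submodular subset selection typically relies on the fact that greedily adding the element with the largest marginal gain gives a (1 - 1/e)-approximation when maximizing a monotone submodular function under a cardinality budget. The sketch below illustrates this with a generic facility-location objective, f(S) = sum over i of max over j in S of sim(i, j), which rewards picking target samples that are representative of the unlabeled pool. The objective, the exponential similarity, the toy features, and all function names are assumptions for illustration; the thesis's selection criterion may combine other terms and is not reproduced here.

```python
import math


def facility_location(selected, sim):
    """f(S) = sum_i max_{j in S} sim[i][j]; monotone submodular when sim >= 0."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, budget):
    """Greedily pick up to `budget` indices, each time taking the largest marginal gain."""
    n = len(sim)
    selected = []
    current = 0.0
    for _ in range(min(budget, n)):
        best_j, best_gain = None, -math.inf
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(selected + [j], sim) - current
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        current += best_gain
    return selected


# Hypothetical 1-D features of five unlabeled target samples forming two clusters.
features = [0.0, 0.1, 0.2, 5.0, 5.1]
sim = [[math.exp(-abs(a - b)) for b in features] for a in features]
chosen = greedy_select(sim, budget=2)  # the central point of the first cluster, then one point of the second
```

This naive version recomputes f from scratch for every candidate; practical implementations usually cache each row's current maximum and often use lazy greedy evaluation to scale to large target pools.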