
M.Tech Research Thesis Colloquium: CDS: 26th April 2022: “Towards Robust and Scalable Video Surveillance: Cross-modal and Domain Generalizable Person Re-identification”

26 Apr @ 3:00 PM -- 4:00 PM

M.Tech Research Thesis Colloquium

Speaker: Ms. Chaitra S. Jambigi
S.R. Number: 06-18-02-10-22-19-1-16951
Title: “Towards Robust and Scalable Video Surveillance: Cross-modal and Domain Generalizable Person Re-identification”
Research Supervisor: Dr. Anirban Chakraborty
Date & Time: April 26, 2022 (Tuesday), 3:00 PM
Venue: #102, CDS Seminar Hall

With rapid technological advances, video surveillance systems are now commonly deployed in public places such as malls and airports, as well as across private residential areas. These systems play a critical role in ensuring safety and security against criminal/anomalous activities. ‘Person Re-Identification’ (re-ID) is a key component of such systems and is well studied in the modern computer vision literature. The task of person re-ID is typically posed as an instance retrieval problem in a large wide-area network of cameras with non-overlapping fields of view (FoV). When presented with an image of a person of interest (query) as observed in any given camera, the goal is to retrieve all image instances of the target with the same identity from all other cameras (gallery) in the network. Despite the extensive research in this area, there is still a gap between the efficacy of existing re-ID frameworks under laboratory settings and their real-world deployability, thus necessitating the development of practical solutions for person re-ID. In this thesis, we explore two such research directions to build robust and scalable person re-ID models.
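The instance-retrieval formulation above can be sketched in a few lines: embed the query and gallery images, then rank gallery entries by similarity to the query. The cosine-similarity metric and the toy feature vectors below are illustrative assumptions, not the specific architecture discussed in the talk.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    # L2-normalize embeddings, score each gallery entry by cosine similarity
    # to the query, and return gallery indices from most to least similar.
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)
```

In a deployed system the features would come from a trained re-ID network; the ranking step itself is exactly this nearest-neighbour search over the gallery.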

The first part of the thesis proposes a solution for the challenging and open problem of Visible-Thermal Person Re-ID (VT Re-ID). In this cross-modal retrieval problem, the query image of a target (in dark/low-light conditions) is captured using a thermal imaging camera, and the re-ID system needs to search and retrieve observations corresponding to the same identity from the gallery set, which is composed of visible-spectrum images of various targets captured using standard RGB cameras in well-lit environments. Such a system has major applications in night-time surveillance and enables round-the-clock monitoring of places of interest. Existing cross-modal re-ID methods align the modalities via adversarial learning or complex feature-extraction modules that rely heavily on domain knowledge. We propose a simple but effective framework, MMD-ReID, to explicitly reduce the modality gap. MMD-ReID takes inspiration from ‘Maximum Mean Discrepancy’ (MMD), a statistical tool that measures the distance between two distributions. Our method uses a novel margin-based formulation to match class-conditional feature distributions of the visible and thermal samples, minimizing intra-class distances while maintaining feature discriminability across identities. Extensive experiments show that our method outperforms state-of-the-art approaches by significant margins.
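As a rough illustration of the MMD idea, the sketch below estimates a kernel MMD between two feature batches and applies a hinge margin so the modality gap is penalized only above a threshold. The RBF kernel, bandwidth, and margin value are assumptions for illustration and not necessarily the choices made in MMD-ReID.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances mapped through a Gaussian kernel.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of squared MMD between samples x and y:
    # mean in-batch kernel similarities minus twice the cross-batch similarity.
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

def margin_mmd_loss(visible_feats, thermal_feats, margin=0.1, sigma=1.0):
    # Hinged MMD: pull the two modality distributions together, but stop
    # once the estimated gap falls below the margin.
    return max(mmd2(visible_feats, thermal_feats) - margin, 0.0)
```

Matching class-conditional distributions, as in the thesis, would apply this loss per identity rather than to whole batches.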

The second part of the thesis addresses the more challenging problem of Domain Generalization (DG) in person re-ID. Most existing re-ID models are trained and tested on the same dataset and perform poorly when evaluated on a new dataset (domain) without explicit fine-tuning on annotated samples from the latter. Recent multi-source DG methods use meta-learning approaches, which are prone to overfitting on the seen domains. To overcome this, we propose a novel strategy based on a supervised contrastive learning framework for learning domain-agnostic features. Our method models domain variations by creating hallucinated ‘positive’ samples that realistically mimic the perturbations one expects from domain shift. We empirically show that our proposed pool of perturbation strategies yields better generalizable features, achieving state-of-the-art performance across unseen domains. We also hypothesize that training on a related auxiliary task that is preserved across domains can help in learning robust features. With attribute prediction as the chosen auxiliary task, we experimentally show that such training indeed leads to better generalization of the learnt model.
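A minimal sketch of supervised contrastive learning with hallucinated positives might look like the following. The additive-noise perturbation and the batch-wise loss details are illustrative assumptions; the thesis's actual perturbation pool is more structured than simple noise.

```python
import numpy as np

def hallucinate_positive(feat, rng, scale=0.05):
    # Illustrative "hallucinated" positive: additive noise standing in for a
    # richer pool of domain-shift-mimicking perturbations.
    return feat + scale * rng.normal(size=feat.shape)

def supcon_loss(features, labels, temperature=0.1):
    # Supervised contrastive loss: every same-identity sample in the batch
    # (including hallucinated positives) is pulled toward the anchor, while
    # other identities are pushed away via the softmax denominator.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (f @ f.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -mean_pos[pos.any(axis=1)].mean()  # average over anchors with positives
```

Because the perturbed copies share the anchor's identity label, the loss explicitly asks the encoder to be invariant to the simulated domain shift.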

Lastly, we study the task of predicting facial attributes on masked images, a novel problem that has become especially relevant during the recent pandemic. Existing attribute-prediction methods are designed to work well on faces without occlusion and perform unsatisfactorily on masked faces. We propose a novel ‘unmasking’ technique based on image inpainting to mitigate this challenge and experimentally validate its efficacy.
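As a toy illustration of the unmask-then-predict idea, the sketch below fills masked pixels of a grayscale image by iterative neighbourhood averaging before handing the result to an attribute predictor. The thesis's method uses a learned image-inpainting network; this diffusion-style fill and the `predictor` callback are purely assumptions for illustration.

```python
import numpy as np

def unmask_then_predict(image, mask, predictor, iters=50):
    # Toy inpainting for a 2-D grayscale image: repeatedly replace masked
    # pixels with the average of their 4-neighbours, then run the (hypothetical)
    # attribute predictor on the restored face.
    img = image.astype(float).copy()
    ys, xs = np.where(mask)
    for _ in range(iters):
        padded = np.pad(img, 1, mode='edge')
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        img[ys, xs] = neigh[ys, xs]  # only masked pixels are updated
    return predictor(img)
```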
