M.Tech (Research) Thesis Defense [Online]: Deep Learning Based Approaches for 3D Reconstruction and Person Re-identification

22 Apr 20    Yogesh Simmhan


M.Tech (Research)  Thesis Defense (Online)

Speaker : Navaneet K L

S.R. Number : 06-18-02-10-22-16-1-13933

Title : Deep Learning Based Approaches for 3D Reconstruction and Person Re-identification

Date & Time : 22 April 2020 (Wednesday), 10:00 AM

Venue : The Defense will be conducted online.



In this thesis, we explore two distinct problems: single image based 3D reconstruction and person re-identification. While the problems may seem unrelated, the underlying tools used to solve them are similar. Specifically, for both tasks, we propose fully and weakly supervised learning based solutions built on deep CNN architectures.

Single Image based 3D Reconstruction: Knowledge of the 3D properties of objects is necessary to build effective computer vision systems. Applications such as robot navigation, grasping and autonomous driving rely heavily on the availability of such data. However, obtaining 3D data is time-consuming and expensive. Here, we consider the problem of single image based 3D reconstruction, where the 3D geometry of an object is predicted from an image of the object taken from a single viewpoint. Since reconstructing the entire 3D geometry from a single image is ill-posed, multiple reconstructions can correspond to the same input image while differing in the unobserved views. To address this ambiguity, we explore the idea of predicting multiple plausible solutions. In the latter part of our work, we shift our attention to weakly supervised approaches. To overcome the issues associated with collecting 3D data, we use multiple 2D images of the objects from different views as the supervisory data. We propose a differentiable projection module to facilitate training with such 2D images. We empirically demonstrate that the performance of the weakly supervised approach is comparable to that of the 3D supervised approach with as few as two training images per 3D model, while generalizing better to real-world scenarios where 3D data is absent. Finally, we extend this approach to predict features associated with a 3D object, such as color, semantic part segmentation labels and surface normals. We observe that, as in the case of shape prediction, the performance of the weakly supervised approach is on par with that of the 3D supervised baselines.
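As a rough illustration of what a differentiable projection module can look like, the sketch below projects a predicted point cloud onto the image plane and splats each point with a Gaussian kernel to obtain a smooth silhouette that can be compared against a 2D mask. This is not taken from the thesis: the function names, the pinhole camera with a fixed depth offset, the Gaussian-plus-tanh rendering and the mean squared error loss are all assumptions chosen for illustration. NumPy computes only the forward pass; every operation is smooth, so the same code written in an automatic differentiation framework would yield gradients with respect to the point coordinates.

```python
import numpy as np

def project_points(points, focal=1.0, depth_offset=2.0):
    """Perspective-project (N, 3) points to (N, 2) image-plane coordinates.

    Assumption: a pinhole camera looking down +z, with the object shifted
    depth_offset units in front of it so that all depths stay positive.
    """
    z = points[:, 2] + depth_offset
    return focal * points[:, :2] / z[:, None]

def soft_silhouette(points, size=32, sigma=0.05, extent=1.0):
    """Render a smooth (size, size) silhouette from a point cloud.

    Each projected point contributes a Gaussian blob; tanh squashes the
    summed contributions into [0, 1). Hypothetical rendering choice.
    """
    uv = project_points(points)                        # (N, 2)
    axis = np.linspace(-extent, extent, size)
    gx, gy = np.meshgrid(axis, axis, indexing="xy")    # pixel centres
    dx = gx[None] - uv[:, 0, None, None]               # (N, size, size)
    dy = gy[None] - uv[:, 1, None, None]
    splat = np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))
    return np.tanh(splat.sum(axis=0))

def silhouette_loss(points, target_mask):
    """Mean squared error between the rendered and the given 2D mask."""
    rendered = soft_silhouette(points, size=target_mask.shape[0])
    return float(np.mean((rendered - target_mask) ** 2))

# Toy usage: a single point at the origin lands on the image centre.
single_point = np.zeros((1, 3))
mask = soft_silhouette(single_point, size=33)
```

With one such loss per available view (each view using its own camera pose), a small number of 2D masks can stand in for 3D ground truth during training.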

Sequential multi-camera feature fusion for person re-identification: Given a target image as a query, person re-identification (re-id) systems retrieve a ranked list of candidate matches on a per-camera basis. In deployed systems, a human operator scans these lists and labels sighted targets. However, classical re-id approaches generate the per-camera lists independently. Therefore, target identifications made by the operator in a subset of cameras cannot be used to improve the ranking of the target in the remaining cameras of the network. To address this shortcoming, we propose a novel sequential multi-camera re-id approach. The proposed approach can accommodate human operator inputs and provides early gains via a monotonic improvement in target ranking. A fusion function is used to combine the inputs from multiple cameras at the feature level. We formulate an optimization procedure custom-designed to incrementally improve the query representation. We empirically demonstrate that the proposed fusion-based framework and the novel loss function significantly improve retrieval performance. We also develop a re-id user interface and perform a comparative analysis of human operator performance to demonstrate the superiority and real-world feasibility of our approach. The above framework is also extended to the case of video-based person re-identification.
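The sequential idea can be sketched in a few lines: once the operator confirms the target in one camera, the confirmed feature is fused with the query, and the fused representation is used to rank the gallery of the next camera. The thesis describes a fusion function trained with a custom loss; the weighted average with renormalisation used here is only a hypothetical stand-in, and all names and feature values below are made up for illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse(query, confirmed, n_seen):
    """Fold an operator-confirmed feature into the query representation.

    Placeholder fusion: weighted average, then renormalisation. n_seen is
    the number of features already represented by the query.
    """
    return l2_normalize((n_seen * query + confirmed) / (n_seen + 1))

def rank(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    return np.argsort(-(l2_normalize(gallery) @ query))

# Toy 3-D features (illustrative values only).
query = np.array([1.0, 0.0, 0.0])           # initial query image
confirmed_a = np.array([0.6, 0.8, 0.0])     # match confirmed in camera A
camera_b = np.array([[1.0, -0.5, 0.0],      # index 0: distractor
                     [0.5, 1.0, 0.0]])      # index 1: true target

before = rank(query, camera_b)              # ranking from the query alone
fused = fuse(query, confirmed_a, n_seen=1)
after = rank(fused, camera_b)               # ranking after fusion
```

In this toy case the distractor is ranked first when only the original query is used, and the true target moves to the top once the confirmation from camera A has been fused in.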