DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
M.Tech (Research) Thesis Defense (Online)
Speaker : Navaneet K L
S.R. Number : 06-18-02-10-22-16-1-13933
Title : Deep Learning Based Approaches for 3D Reconstruction and Person Re-identification
Date & Time : 22 April 2020 (Wednesday), 10:00 AM
Venue : The Defense will be conducted online.
_______________________________________________________________________________________________________________________________________
ABSTRACT
In this thesis, we explore two diverse problems: single-image 3D reconstruction and person re-identification. While the problems may seem different, the underlying tools used to solve them are similar. Specifically, for both tasks, we propose fully or weakly supervised learning-based solutions built on deep CNN architectures.
Single Image based 3D Reconstruction: Knowledge of 3D
properties of objects is necessary to build effective computer vision systems.
Applications like robot navigation, grasping and autonomous driving rely
heavily on the availability of such data. However, the process of obtaining 3D
data is time-consuming and expensive. Here, we consider the problem of
single-image 3D reconstruction, where the 3D geometry of an
object is predicted from an image of the object taken from a single viewpoint.
Since the problem of reconstructing the entire 3D geometry from just a single
image is ill-posed, there can exist multiple reconstructions which correspond
to the input image, but differ in the unobserved views. To account for this ambiguity,
we explore the idea of predicting multiple plausible solutions. In the latter
part of our work, we shift our attention to weakly supervised approaches. To
overcome the issues associated with collecting 3D data, we consider multiple 2D
images of the objects from different views as the supervisory data. We propose
a differentiable projection module to facilitate training with such 2D images.
We empirically demonstrate that the performance of the weakly supervised
approach is comparable to that of the 3D supervised approach with as little as
two training images per 3D model, while being more generalizable to real-world
scenarios where 3D data is absent. Finally, we extend this to predict features
associated with a 3D object such as color, semantic part segmentation label and
surface normal. We observe that, similar to the case of shape
prediction, the performance of the weakly supervised approach is on par with that
of the 3D supervised baselines.
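As an illustration of how a projection step can be made differentiable, the following is a minimal sketch and not the module proposed in the thesis. It assumes a point-cloud shape representation, an orthographic camera looking along the z-axis, and Gaussian splatting of each point onto the image grid; the names soft_silhouette, silhouette_loss, size and sigma are hypothetical.

```python
import math

def soft_silhouette(points, size=8, sigma=0.15):
    """Orthographically project 3D points along z onto a size x size mask in [0, 1].

    Assumption for illustration: each point contributes a Gaussian blob, and a
    pixel is "occupied" with probability 1 - prod(1 - blob). Every operation is
    smooth in x and y, so gradients could flow back to the point coordinates.
    """
    # Pixel centres span [-1, 1] on both image axes.
    centres = [-1.0 + (2.0 * i + 1.0) / size for i in range(size)]
    mask = []
    for cy in centres:
        row = []
        for cx in centres:
            miss = 1.0  # probability that no point covers this pixel
            for x, y, _z in points:  # depth is dropped by the orthographic camera
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                miss *= 1.0 - math.exp(-d2 / (2.0 * sigma ** 2))
            row.append(1.0 - miss)
        mask.append(row)
    return mask

def silhouette_loss(pred, target):
    """Mean squared error between a projected mask and a ground-truth 2D mask."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

# Usage: compare the projection of a predicted shape with a 2D supervisory mask.
predicted = [(-0.875, -0.875, 0.3), (0.125, 0.125, -0.2)]
target = soft_silhouette([(-0.875, -0.875, 0.0), (0.375, 0.125, 0.0)])
loss = silhouette_loss(soft_silhouette(predicted), target)
```

In a training setup, the same computation would be written in an automatic differentiation framework so that the loss on 2D masks from several views can update the network that predicts the 3D points.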
Sequential multi-camera feature fusion for person
re-identification: Given a target image as query, person re-identification
systems retrieve a ranked list of candidate matches on a per-camera basis. In
deployed systems, a human operator scans these lists and labels sighted targets.
However, classical re-id approaches generate per-camera lists independently.
Therefore, target identifications by the operator in a subset of cameras cannot be
utilized to improve the ranking of the target in the remaining cameras of the
network. To address this shortcoming, we propose a novel sequential
multi-camera re-id approach. The proposed approach can accommodate human
operator inputs and provides early gains via a monotonic improvement in target
ranking. A fusion function is utilized to combine the inputs from multiple
cameras at the feature level. We formulate an optimization procedure
custom-designed to incrementally improve the query representation. We
empirically demonstrate that the proposed fusion-based framework and the novel
loss function significantly improve the retrieval performance. We also develop
a re-id user interface and perform comparative analysis of human operator
performance to demonstrate the superiority and real-world feasibility of our
approach. The above framework is also extended to the case of video-based
person re-identification.
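The sequential idea can be illustrated with a minimal sketch, which is not the fusion function or optimization procedure of the thesis. It assumes that the learned fusion is replaced by a plain average of L2-normalized feature vectors and that galleries are ranked by cosine similarity; the names l2_normalize, fuse and rank, and the toy feature values, are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse(query, confirmed):
    """Combine the query feature with operator-confirmed features from other cameras.

    Assumption for illustration: a simple mean stands in for a learned fusion function.
    """
    feats = [l2_normalize(query)] + [l2_normalize(f) for f in confirmed]
    dim = len(query)
    mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return l2_normalize(mean)

def rank(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])

# Toy example: the target looks different in camera B than in the original query.
query = [1.0, 0.0]
camera_b = [[0.9, 0.1],   # index 0: distractor resembling the query
            [0.2, 1.0]]   # index 1: true target seen from a new viewpoint
confirmed_in_a = [[0.0, 1.0]]  # target sighting labelled by the operator in camera A

before = rank(query, camera_b)
after = rank(fuse(query, confirmed_in_a), camera_b)
```

In this toy setting the operator's confirmation in one camera changes the query representation used for the next camera, which is the mechanism that independent per-camera ranking lacks.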
________________________________________________________________________________________________________________
ALL ARE WELCOME