Ph.D. Open Thesis Defense: Deep Visual Representations: A study on Augmentation, Visualization and Robustness
14 Feb @ 11:00 AM -- 12:00 PM
Speaker : Konda Reddy Mopuri
S.R. Number : 06-18-02-10-12-12-1-09607
Title : Deep Visual Representations: A study on Augmentation, Visualization and Robustness
Date & Time : 14 February 2019 (Thursday), 11:00 AM
Venue : CDS Seminar hall #102
Deep neural networks have achieved unprecedented performance on various learning tasks. In particular, Convolutional Neural Networks (CNNs) are shown to
learn representations that can efficiently discriminate hundreds of visual categories. They learn a hierarchy of representations, ranging from low-level edge and
blob detectors to high-level semantic features such as object categories. These representations can be employed as off-the-shelf visual features in various
vision tasks such as image classification, scene retrieval, caption generation, etc. In this thesis, we investigate three important aspects of the
representations learned by the CNNs: (i) incorporating useful side and additional information to augment the learned visual representations, (ii) visualizing
them, and (iii) studying their susceptibility to adversarial perturbations.
In the first part of this thesis, we present approaches that exploit useful ‘side and additional’ information to enrich the learned representations with more
semantics. Specifically, we learn to encode additional discriminative information from (i) an objectness prior, and (ii) the strong supervision offered by image captions.
Objectness prior: In order to encode comprehensive visual information from a scene, existing methods typically employ deep-learned visual
representations in a sliding-window framework. However, scenes are typically compositions of objects. We exploit objectness information while aggregating the
visual features from individual image regions into a compact image representation.
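The aggregation idea can be sketched as follows. This is an illustrative toy, not the thesis's actual formulation: the function `aggregate_with_objectness`, the weighting scheme, and the example features and scores are all hypothetical, assuming only that each image region comes with a CNN feature vector and an objectness score.

```python
import numpy as np

def aggregate_with_objectness(region_features, objectness_scores):
    """Pool per-region CNN features into one image descriptor,
    weighting each region by its (normalised) objectness score.
    Illustrative sketch; the exact aggregation in the thesis may differ."""
    w = np.asarray(objectness_scores, dtype=float)
    w = w / w.sum()                      # normalise the objectness weights
    feats = np.asarray(region_features)  # shape: (n_regions, dim)
    image_repr = (w[:, None] * feats).sum(axis=0)
    # L2-normalise, as is common for retrieval descriptors
    return image_repr / np.linalg.norm(image_repr)

# Three regions with 4-D features; the second region is the most "object-like"
feats = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
scores = [0.1, 0.8, 0.1]
v = aggregate_with_objectness(feats, scores)
```

Regions likely to contain objects thus dominate the pooled descriptor, instead of every sliding window contributing equally.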
Strong supervision: In a typical supervised learning setting for object recognition, labels offer only weak supervision: all a label provides is the presence or
absence of an object, neglecting a great deal of useful information such as object attributes, context, etc. Image captions, on the other hand, provide rich
information about the image contents. Therefore, to improve the performance of CNN representations, we exploit image captions as strong supervision for the
application of scene retrieval. We show that strong supervision, when combined with pairwise constraints, helps the representations effectively learn the
graded relevances between image pairs.
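A minimal sketch of these two ingredients, under stated assumptions: caption-derived graded relevance is approximated here by simple word overlap (the thesis derives richer relevances from caption text), and the pairwise constraint is enforced with a standard margin ranking loss. Both function names are hypothetical.

```python
import numpy as np

def caption_relevance(caption_a, caption_b):
    """Graded relevance between two images from their captions,
    approximated by word-overlap (Jaccard). A cheap stand-in for
    the richer caption-based relevance used in practice."""
    a = set(caption_a.lower().split())
    b = set(caption_b.lower().split())
    return len(a & b) / len(a | b)

def pairwise_rank_loss(sim_high, sim_low, margin=0.2):
    """Margin ranking loss enforcing a pairwise constraint: the query
    must be closer to the more caption-relevant image than to the less
    relevant one by at least `margin`."""
    return max(0.0, margin - (sim_high - sim_low))
```

Training on many such pairs pushes embedding similarities to respect the graded relevance ordering, rather than a binary relevant/irrelevant split.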
Visualization: Despite their impressive performance, CNNs offer limited transparency and are therefore treated as black boxes. Increasing depth, intricate
architectures, and sophisticated regularizers make them complex machine learning models. One way to make them transparent is to provide visual explanations for
their predictions, i.e., to visualize the image regions that guide their predictions. In the second part of the thesis, we develop a novel visualization method to
locate the evidence in the input for a given activation at any layer in the architecture.
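To make the idea of "evidence for an activation" concrete, here is a minimal gradient-based sketch for a toy two-layer ReLU network: input locations with large gradient magnitude for a chosen unit are the ones driving that activation. This is only an illustration of the general principle; the thesis proposes a more involved visualization method.

```python
import numpy as np

def evidence_map(x, W1, W2, unit):
    """Gradient of one chosen activation w.r.t. the input of a tiny
    2-layer ReLU net: |gradient| marks input locations that act as
    evidence for that activation. Illustrative sketch only."""
    h = np.maximum(0.0, W1 @ x)       # hidden layer with ReLU
    a = W2 @ h                        # target-layer activations
    grad_h = W2[unit] * (h > 0)       # backprop through the ReLU gate
    grad_x = W1.T @ grad_h            # gradient at the input
    return a[unit], np.abs(grad_x)

# Second input dimension is gated off by the ReLU, so it carries no evidence
a_val, sal = evidence_map(np.array([2.0, -3.0]), np.eye(2),
                          np.array([[1.0, 1.0]]), 0)
```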
Robustness: Along with their successful adaptation across various vision tasks, the learned representations are also observed to be unstable to the addition of special
noise of small magnitude, called adversarial perturbations. Thus, the third and final part of the thesis focuses on the stability of the representations to such input perturbations.
Data-free objectives: These additive perturbations cause CNNs to produce inaccurate predictions with high confidence and threaten their
deployability in the real world. In order to craft these perturbations (either image-specific or image-agnostic), existing methods solve complex fooling objectives
that require samples from the target data distribution. For the first time, we introduce data-free objectives to craft image-agnostic adversarial perturbations that
can effectively fool CNNs. Our objectives expose the fragility of the learned representations across various vision tasks, even in the black-box attacking
scenario, where no information about the target model is known.
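The flavour of a data-free objective can be sketched as follows: instead of optimizing over training images, the perturbation alone is driven to excite the network's activations, subject to an L-infinity bound. The single-linear-layer "network", the objective, and `craft_uap` are all simplifying assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def craft_uap(W, dim, eps=0.1, lr=0.5, steps=100, seed=0):
    """Craft an image-agnostic perturbation with a data-free objective:
    maximise log ||relu(W @ delta)|| using no data samples at all,
    keeping delta inside the L_inf ball of radius eps.
    Toy one-layer sketch of the data-free idea."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps / 2, eps / 2, size=dim)
    for _ in range(steps):
        a = np.maximum(0.0, W @ delta)
        n = np.linalg.norm(a) + 1e-8
        # analytic gradient of -log||relu(W delta)|| w.r.t. delta
        grad = -(W.T @ a) / n**2
        delta = np.clip(delta - lr * grad, -eps, eps)  # L_inf projection
    return delta
```

Because the objective never touches the target data distribution, the same recipe applies when no training samples are available.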
Modelling the adversaries: Recent works have shown the existence of image-agnostic perturbations that can fool CNNs over most natural images. Existing methods
present optimization approaches to craft these perturbations. However, for a given classifier, they generate one perturbation at a time, a single
instance from the manifold of adversarial perturbations; in order to build robust models, it is essential to explore this manifold. We propose, for the first time, a generative approach to model the distribution of such perturbations. Our generative model is inspired by GANs
and is trained using fooling and diversity objectives. The proposed generator network captures the distribution of adversarial perturbations for a given
classifier and readily generates a wide variety of such perturbations. We demonstrate that the perturbations crafted by our model (i) achieve state-of-the-art
fooling rates, (ii) exhibit wide variety, and (iii) deliver excellent cross-model generalizability.
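Two of the ingredients above can be sketched in miniature: a generator mapping a latent code to a bounded perturbation, and a diversity term that penalises similarity between perturbations generated from different latents. Both functions are hypothetical stand-ins (a linear map with `tanh` instead of a deep GAN-style generator; cosine similarity as the diversity penalty), shown only to convey the structure of the training objectives.

```python
import numpy as np

def generator(z, G, eps=0.1):
    """Toy linear 'generator': maps a latent code z to a perturbation
    squashed into the allowed L_inf ball of radius eps.
    Stand-in for the GAN-style deep generator in the thesis."""
    return eps * np.tanh(G @ z)

def diversity_penalty(d1, d2):
    """Diversity objective: penalise cosine similarity between pairs of
    generated perturbations, so distinct latents yield distinct
    perturbations rather than collapsing to a single one."""
    return d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-8)

# Perturbations from any latent code stay inside the L_inf ball
d = generator(np.array([10.0, -10.0, 0.0, 0.0]), np.eye(4))
```

Training would minimise a fooling loss on the classifier plus this diversity penalty over sampled latent pairs, so the generator covers a wide variety of perturbations instead of one instance.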
ALL ARE WELCOME