Department of Computational and Data Sciences
Department Seminar
Speaker: Arun Balajee Vasudevan, Postdoctoral Researcher, Carnegie Mellon University
Title: "Multimodal Learning in 3D Environments: Perception and Simulation"
Date & Time: February 04, 2025 (Tuesday), 11:30 AM
Venue: #102, CDS Seminar Hall
ABSTRACT
Autonomous robots have many potential applications, including virtual assistants, VR/AR, gaming, self-driving technology, and city planning. To achieve autonomy, a robot needs to see and hear its environment before it can converse or navigate to perform a human-desired task. Specifically, scene understanding involves building 3D geometry, decoding semantics, understanding surrounding objects and humans, and planning actions. My talk addresses these fundamental challenges under two broad themes: geometry and multimodal perception, and data-driven simulation and navigation.
Under geometry and perception, I introduce the use of multiple modalities for robot perception tasks: the user's gaze, visual sensors such as cameras or range sensors (e.g., Kinect), and speech/language instructions from human referrals. I then delve deeper into the audio sensing modality for these tasks using binaural microphones. On the geometry side, I present ongoing work on constructing digital twins of the real world through 4D reconstruction of dynamic scenes from a robot's ground-level visuals.
Under the theme of data-driven simulation and robot navigation: following perception, robots must navigate and take meaningful actions in the world. This broadly involves two aspects: wayfinding and motion planning. Earlier works address wayfinding from directional instructions while overlooking human aspects. I briefly present a new paradigm that integrates principles from cognitive science with learning-based methods to tackle language-based wayfinding for robots in real-world outdoor environments. For the second aspect, motion planning, I propose learning driver behavior models for MPC-based planners to build data-driven simulators.
In the long term, I envision bridging these two themes to build multimodal digital-twin simulators of the real world, which could aid the training and testing of planners, VR/AR setups, gaming, and more. Lastly, the talk also covers my future research plans.
BIO: Arun Balajee Vasudevan is currently a postdoctoral researcher at Carnegie Mellon University under Prof. Deva Ramanan. His core research interests are computer vision and multimodal learning. His work spans multimodal (vision, language, and sound) perception and navigation, 3D/4D reconstruction, motion planning, and improving foundation models. He has published predominantly in vision and machine learning conferences and journals such as CVPR, ECCV, ICML, IJCV, and TPAMI. He completed his PhD under Prof. Luc Van Gool at ETH Zurich, received his MSc in Computer Science from EPFL in 2016, and earned his undergraduate degree in Electrical Engineering from the Indian Institute of Technology Jodhpur in 2014. From March 2025, he will join Amazon as a Research Scientist.
Host Faculty: Dr. Anirban Chakraborty
ALL ARE WELCOME