CDS-KIAC Seminar @ CDS #102, 5 January: “The Three P’s of Modern Computer Vision: Pixels, Perception, and Physics”

When

5 Jan 26
11:30 AM - 12:30 PM


We welcome you to a CDS-KIAC talk on 5 January 2026 (Monday). The details are as follows:


Speaker : Dr. Anand Bhattad, Assistant Professor of Computer Science at Johns Hopkins University
Title : The Three P’s of Modern Computer Vision: Pixels, Perception, and Physics
Date and Time : January 5, 2026, 11:30 AM
Venue : #102, CDS Seminar Hall.


Abstract:
For decades, computer vision has been guided by what Jitendra Malik and colleagues called the “Three R’s”: Recognition (what is it?), Reconstruction (what is its 3D shape?), and Reorganization (what belongs together?). This framework drove extraordinary progress. However, the rise of generative models opens a new frontier: moving from static description to dynamic understanding. This talk presents a new paradigm I call the Three P’s: Pixels, Perception, and Physics.

I will begin with a puzzle: state-of-the-art models generate photorealistic images, yet their outputs often contain impossible shadows, broken perspective, and objects that defy gravity, violating principles established by Galileo four centuries ago. Are these “world models” or merely sophisticated pixel parrots?

The answer is more interesting than yes or no. Through systematic probing, I will demonstrate that generative models have learned fundamental scene properties—depth, surface normals, albedo, and shading—without explicit supervision, recovering intrinsic representations that researchers have pursued since Barrow and Tenenbaum’s foundational 1978 work. These models perceive far more than they appear to. The deeper question is: where exactly does understanding end and imitation begin?

My research maps this boundary. I introduce Visual Jenga, a new scene understanding task that reveals implicit physical knowledge by testing stability when objects are removed. I demonstrate how representing scenes as 3D primitives enables geometric control that pixel-based editing cannot achieve, and how physics-based cues, such as shadows, can steer generation toward plausibility. I conclude with a vision for the field: The Three R’s taught machines to describe the visual world. The Three P’s will teach them to understand how it works.

Bio of Speaker:
Anand Bhattad is an Assistant Professor of Computer Science at Johns Hopkins University and a member of the Data Science and AI Institute. He leads the Pixels, Perception, and Physics (3P) Vision Group, which focuses on building perception-driven and physics-aware visual models from raw pixel data. His research spans computer vision, computer graphics, and computational photography. Prior to joining Hopkins, he was a Research Assistant Professor at the Toyota Technological Institute at Chicago and a visiting scholar at UC Berkeley. He earned his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign, advised by David Forsyth.

Host Faculty: Prof. Venkatesh Babu, CDS


ALL ARE WELCOME