BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
TZID:Asia/Kolkata
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:94@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20241223T110000
DTEND;TZID=Asia/Kolkata:20241223T120000
DTSTAMP:20241220T063435Z
URL:https://cds.iisc.ac.in/events/seminar-cds-102-december-23-1100-counter
 factual-world-modeling-a-framework-for-constructing-vision-foundation-mode
 ls/
SUMMARY:{Seminar} @ CDS: #102\, December 23\, 11:00: "Counterfactual World 
 Modeling – A framework for constructing vision foundation models."
DESCRIPTION:Department of Computational and Data Sciences\nDepartment Semin
 ar\n\n\n\nSpeaker: Mr. Rahul\, PhD student at Stanford's NeuroAILab\,
 \nTitle: "Counterfactual World Modeling – A framework for constructing
  vision foundation models."\nDate & Time: December 23\, 2024\, 11:00 AM
 \nVenue: #102\, CDS Seminar Hall\n\n\n\nABSTRACT\nFoundation models of natur
 al language have shown how large pre-trained neural networks can provide s
 olutions to a wide range of tasks. However\, in machine vision\, most lead
 ing approaches employ different architectures for different tasks\, traine
 d on costly task-specific labeled datasets. In this talk\, I will introduc
 e Counterfactual World Modeling (CWM)\, a framework for constructing a vis
 ual foundation model: a unified\, unsupervised network that can be prompte
 d to perform a wide variety of visual computations. CWM has two key compon
 ents that resolve the core issues that have hindered the application of th
 e foundation model concept to vision. The first is structured masking\, a 
 generalization of masked prediction methods that encourages a prediction m
 odel to capture the low-dimensional structure in visual data. Specifically
 \, we can sample a patch-level prompt to meaningfully control scene dyna
 mics. This in turn enables CWM’s second main idea – the observation th
 at many apparently distinct visual representations can be computed\, in a 
 zero-shot manner\, by comparing the prediction model’s output on real in
 puts versus slightly modified ("counterfactual") inputs. This talk will de
 scribe how CWM enables the extraction of low and mid-level vision structur
 es such as optical flow\, keypoints\, and object segments under a unified 
 architecture. Further\, I’ll demonstrate that patch-level prompting also
  enables sophisticated image editing capabilities that have previously
  been challenging to achieve even with task-specific models. Finally\, I
  will also discuss how the CWM framework can be bootstrapped to extract
  increasingly powerful vision structures – paving the way for real-world
  robotics applications\, where robust task-general perception remains a
  bottleneck
 .\n\nBIO: Rahul is a fourth-year CS PhD student at Stanford's NeuroAILab\,
  where he is advised by Prof. Dan Yamins. His research explores the mechan
 isms that enable the interpretation of physical dynamics from visual image
 ry\, both in humans and machines. He holds a Master's in Computer Vision f
 rom CMU\, where he worked on 3D shape modeling. Prior to that\, he was a r
 esearch assistant at the Vision and AI Lab at the Indian Institute of Scie
 nce\, focusing on human and object pose estimation as well as domain adapt
 ation.\n\nHost Faculty: Prof. Venkatesh Babu\n\n\n\nALL ARE WELCOME
CATEGORIES:Events,Talks
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20231224T110000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR