BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.4.0.1//EN
TZID:Asia/Kolkata
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:203@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20260616T110000
DTEND;TZID=Asia/Kolkata:20260616T120000
DTSTAMP:20260529T155003Z
URL:https://cds.iisc.ac.in/events/ph-d-thesis-colloquium-102-cds-16-june-2
 026-continuous-spatial-and-distributional-control-for-faithful-image-synth
 esis/
SUMMARY:Ph.D: Thesis Colloquium: 102: CDS: 16\, June 2026 “Continuous\, S
 patial\, and Distributional Control for Faithful Image Synthesis”
DESCRIPTION:DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES\nPh.D. Thesis Col
 loquium\n\n\n\nSpeaker: Mr. Rishubh Parihar\nS.R. Number: 06-18-01-10-12-2
 1-1-19480\nTitle: “Continuous\, Spatial\, and Distributional Control for
  Faithful Image Synthesis”\nResearch Supervisor: Prof. Venkatesh Babu\nD
 ate & Time : June 16\, 2026 (Tuesday)\, 11:00 AM\nVenue : #102\, CDS Semi
 nar Hall\n\n\n\nABSTRACT\nDeep generative models have transformed imag
 e synthesis over the last decade\, advancing from early Generative Adversa
 rial Networks (GANs) to modern text-to-image models capable of generating 
 highly photorealistic images. The primary goal of these models is to learn
  the underlying data distribution from a given set of samples (e.g.\, imag
 es)\, enabling the synthesis of novel instances. This typically involves l
 earning a function that maps from a simple tractable distribution (e.g.\, 
 Gaussian) to the data distribution\, parameterized by a neural network. Ho
 wever\, while these models can generate highly diverse image variations\, 
 they offer little to no direct control over the synthesized content. This 
 lack of precision restricts a user's ability to accurately convey their in
 tent about scene layout\, semantic attributes\, and object identities duri
 ng image synthesis\, limiting these models' utility as practical creative 
 tools. To address these challenges\, this thesis proposes a comprehensive 
 suite of frameworks that introduce precise\, intuitive control mechanisms 
 for image synthesis. Specifically\, we explore three crucial dimensions of
  control for image generation:\n\n 	Continuous Control: Smoothly modulatin
 g the intensity of semantic attributes\, such as precisely varying a facia
 l expression by directly manipulating the model's latent representations.\
 n 	Spatial Control: Governing scene composition by realistically inserting
  new elements into existing images or specifying precise geometric propert
 ies of objects\, such as 3D orientation and scale\, during image generatio
 n.\n 	Test-Time Distribution Control: Steering the sample distribution of 
 pretrained generative models to achieve target criteria\, such as attribut
 e balancing for debiased face generation.\n\nContinuous Semantic Control: 
 Visual data inherently contains numerous semantic attributes with continuo
 us factors of variation\, such as a person's age or an object's size. Capt
 uring these continuous changes is essential for fine-grained generative co
 ntrol\, yet standard models often restrict synthesis to discrete or binary
  variations. In our work\, FLAME\, we propose an efficient method to disco
 ver disentangled edit directions in the latent space of pretrained StyleGA
 N models\, enabling continuous control over facial attributes during the s
 ynthesis process in a training-free manner. To extend this capability to t
 he unconstrained setting of in-the-wild\, text-conditioned image generatio
 n\, we next develop PreciseControl\, a method to personalize diffusion mod
 els with fine-grained facial attribute control. By conditioning the diffus
 ion model on the disentangled latent space of StyleGAN\, this approach ach
 ieves smooth attribute editing while preserving the compositionality of te
 xt-guided generation. Finally\, to move beyond specific domains like faces
 \, we introduce KontinuousKontext\, extending continuous control capabilit
 ies to foundational instruction-driven image editing models. This framewor
 k allows users to smoothly adjust the intensity of diverse editing tasks s
 uch as stylization\, object shape\, and scene lighting through an intuitiv
 e\, slider-based interface.\n\nSpatial Control: We investigate the control
  of spatial scene elements through two important tasks: object insertion w
 ithin existing scenes and the grounding of 3D object properties such as l
 ocation\, orientation\, and scale during generation. For object insertion\
 , we propose Text2Place\, a test-time training approach that leverages the
  generative priors of text-to-image diffusion models to predict 2D afforda
 nces for realistic human placement. We extend this capability to 3D-aware 
 object insertion in MonoPlace3D and Depth-Aware Editing\, ensuring that in
 serted elements naturally blend into the 3D scene with accurate perspectiv
 e\, scale\, and harmonious occlusions. For spatial grounding in text-to-im
 age generation\, we first introduce CompassControl\, a method to precisely
  control the 3D orientation of text-described scene objects. Moving toward
  full scene synthesis\, we develop SeeThrough3D\, where we propose a novel
  primitive-based scene representation that models objects as translucent 3
 D boxes to condition the generation process\, achieving robust 3D layout g
 rounding with accurate occlusion handling.\n\nTest-Time Distribution Contr
 ol: Beyond individual image-level control\, an important but largely unexp
 lored direction is steering the distribution of generated samples from a p
 retrained generative model. To achieve this without the prohibitive cost o
 f post-training\, we develop training-free guidance mechanisms that steer 
 the empirical distribution of a sampled batch toward a user-specified targ
 et. In BalancingAct\, we introduce attribute distribution guidance within 
 the bottleneck $h$-space of diffusion models\, enabling the generation of 
 image batches that adhere to a user-provided reference attribute distribut
 ion over subgroups\, such as for demographic balancing. We then address th
 e problem of diversity collapse in pretrained flow models in Do Not Settle
  at the Mode! The key idea is to maximize the pairwise distance between in
 ternal representations within a batch while ensuring samples remain anchor
 ed to the learned feature manifold via feature guidance during sampling. T
 his divergence mechanism significantly enhances the diversity of the base 
 generative model without compromising visual quality.\n\n\n\nALL ARE WELCO
 ME
CATEGORIES:Events,Ph.D. Thesis Colloquium
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20250616T110000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR