{Seminar} @ SERC : 4th Floor Auditorium: 19th September: “Uncovering the complexity of DNA regulatory sequence”

When

19 Sep 24    
4:00 PM - 5:00 PM

Event Type

Department of Computational and Data Sciences
Department Seminar


SPEAKER : Prof. Rahul Siddharthan, The Institute of Mathematical Sciences, Chennai.
TITLE : “Uncovering the complexity of DNA regulatory sequence”
Date & Time : September 19, 2024, 04:00 PM.
Venue : # 421, 4th Floor Auditorium


ABSTRACT
Genes are regulated by proteins called transcription factors (TFs) which bind DNA and recruit or inhibit the transcriptional machinery. Individual TFs recognize and bind to short patterns or “motifs” in DNA, typically 8-15 basepairs long, and identifying these motifs, both de novo and from databases of known motifs, is a longstanding problem. However, it appears that there are sequence signatures in DNA extending well beyond these “core motifs”. We present two approaches to studying this.
The first, THiCweed (NAR 2018), an algorithm for analysing ChIP-seq data, treats it as a clustering problem. We find that clustering ChIP-seq sequence based on sequence similarity uncovers known motifs, but also many variants of known motifs, extraneous motifs, and sequence signatures extending well beyond the core motif. An extension of this approach to general tabular data, MMM (“Madras Mixture Model”), was published in 2024.

A second approach, SequeCNNs (in preparation), uses a convolutional neural network to distinguish binding from non-binding sequence. It shows high accuracy even when the core motif is removed from the sequence, suggesting that there are other strong sequence signatures for TF-binding DNA. It also provides a framework for analysing and visualising the significance of mutations, individually and in combination, over hundreds of basepairs surrounding the core motif. Extension of this work to identifying other functional sequence, such as TAD boundaries, is in progress, and a future goal is to generate synthetic sequence that can perform such functions in vivo.
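As a rough illustration of the "core motif removed" setting described above, the sketch below one-hot encodes a DNA sequence (a common input format for convolutional networks) after overwriting a motif window with random bases. It is not taken from SequeCNNs: the function names, the example sequence, the motif position, and the choice of random-base replacement are all assumptions made for illustration, since the abstract does not say how the motif is removed.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Any base outside ACGT (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def mask_core_motif(seq, start, length, rng):
    """Overwrite seq[start:start+length] with random bases.

    Assumption: random replacement is just one possible way to
    'remove' a motif; the talk's actual procedure may differ.
    """
    if start < 0 or start + length > len(seq):
        raise ValueError("motif window falls outside the sequence")
    filler = "".join(rng.choice(BASES) for _ in range(length))
    return seq[:start] + filler + seq[start + length:]

rng = random.Random(0)                      # fixed seed for repeatability
sequence = "TTGACGTCATGCAAGGTCACGT"         # made-up 22 bp example
masked = mask_core_motif(sequence, start=2, length=8, rng=rng)  # hypothetical motif window
encoded = one_hot(masked)                   # 22 rows, one per base
```

The flanking bases are left untouched, so any signal a classifier finds in `encoded` must come from outside the masked window. Random filler can occasionally recreate a motif-like pattern by chance, which a real analysis would need to check for.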

References:
THiCweed: A Agrawal, S Sambare, L Narlikar, R Siddharthan, NAR 2018
MMM: C Kumari and R Siddharthan, PLOS One 2024
SequeCNNs: C Mohan Kumar, L Narlikar, R Siddharthan, in preparation

BIOGRAPHY
Dr. Rahul Siddharthan is a Professor at The Institute of Mathematical Sciences, Chennai. He obtained his PhD in physics from the Indian Institute of Science. His interest in biology stems from his second postdoctoral stint at the Rockefeller University, New York. He joined the physics group at IMSc in 2004, and in 2013 he started a new group, and new PhD programme, in computational biology. He is broadly interested in bioinformatic algorithms, regulatory genomics, chromatin biology, evolutionary biology, and, recently, machine learning and health/disease/clinical practice. More details about his ongoing work are available at https://www.imsc.res.in/~rsidd/research.html

Host Faculty: Dr. Chirag Jain


ALL ARE WELCOME