{Seminar} @ CDS: #102: 27th May: “High-coverage information extraction.”

When

27 May 2024
11:00 AM - 12:00 PM

Event Type

Department of Computational and Data Sciences
Department Seminar


Speaker : Sneha Singhania, PhD student at the Max Planck Institute for Informatics and Saarland University in Germany
Title : “High-coverage information extraction.”
Date & Time : May 27, 2024, 11:00 AM
Venue : #102, CDS Seminar Hall


ABSTRACT
Structured knowledge, in the form of entities and relations, is a powerful asset for search, recommendations, and data integration, and it is used extensively by many different stakeholders. However, converting noisy internet content into crisp knowledge structures requires heavy-duty processing of vast amounts of data. Information extraction (IE) with language models (LMs) is a mature approach, but it still struggles to achieve high precision and high recall at the same time, which limits its reliable use. In my talk, I will present three lines of work on high-coverage IE from various knowledge sources. First, I will detail how to identify and filter content-rich web documents, and lay out our approach to ranking documents for the automatic construction of knowledge bases (KBs). Second, I will discuss the emerging role of LMs as KBs, as examined through various probing techniques. Finally, I will introduce the L3X framework, which uses retrieval-augmented LMs to extract long lists of entities from long documents. Together, these techniques help us better handle the unknowns and construct complete structured knowledge.

BIOGRAPHY
Sneha Singhania is a PhD student at the Max Planck Institute for Informatics and Saarland University in Germany, advised by Gerhard Weikum and Simon Razniewski. Her research aims to close the knowledge gap between data sources and models in order to generate reliable output. Previously, she graduated from IIIT-Bangalore with a dual degree in Computer Science, worked as a researcher at Accenture Labs, and interned at Apple Research in Cupertino.

Host Faculty: Dr. Danish Pruthi


ALL ARE WELCOME