Network Seminar Series

Robust and efficient frontier pipelines for complex knowledge intensive tasks in the era of LLMs

Dr. Venktesh Viswanathan, Postdoctoral Researcher, TU Delft

#228

Abstract

Complex Knowledge Intensive Tasks (CKIT) involve challenging problems, such as complex fact-checking and complex Question Answering (QA), that require multistep reasoning over knowledge from hybrid sources, as shown in the example above. These tasks usually require a complex pipeline with three parts: query understanding, which might entail decomposing the query as shown above; retrieval of appropriate evidence from knowledge sources; and a generative component that reasons over the given evidence and produces the output the task requires. These tasks have a direct impact on real-world complex information needs in domains such as healthcare, where multidisciplinary teams (MDTs) require concise information on the protocol for handling a patient, obtained by reasoning over the patient's prior reports and matching symptoms with the appropriate protocol, to help doctors make informed decisions. Progress on complex QA and complex fact-checking also helps research analysts and journalists by obviating the need for time-consuming search. In my talk, I will first introduce the retrieval and reasoning gap that persists for CKIT even in the current era of frontier models/Large Language Models (LLMs), based on detailed studies on existing benchmarks and on new benchmarks collected by our group. Then I will introduce efficient exemplar selection methods with theoretical guarantees that can help induce reasoning capabilities in LLMs. I will also introduce neighborhood-aware retrieval approaches that aim to bridge the retrieval gap by solving the bounded recall problem in the top-k contexts that RAG systems use to obtain predictions from the LLM. While presenting encouraging results, we will also discuss limitations and exciting future directions for advancing robust frontier pipelines for CKIT.


Bio

Venktesh Viswanathan is a postdoctoral researcher in the Web Information Systems group in the Department of EEMCS at TU Delft, working with Dr. Avishek Anand. He completed his PhD at IIIT-Delhi, India, and is a recipient of the prestigious PMFDR (Prime Minister Fellowship for Doctoral Research). His thesis, titled "Learning Content curation and enrichment", focused on building pipelines for tagging and organizing content in online learning platforms and on improving knowledge discovery in such platforms. He currently works at the intersection of NLP and IR, building efficient and robust frontier pipelines for knowledge intensive tasks such as complex fact-checking and complex Question Answering, for applications that have direct social impact. His work has been published at ECML-PKDD, IEEE TKDE, WSDM, SIGIR, ECIR, CIKM, AIED, and EMNLP. He has also reviewed for EMNLP, NAACL, ECIR, SIGIR, CIKM, WSDM, SIGKDD, and ACL.