In modern data-driven intelligent systems, data are cheap, but verified truth is scarce. Consider validating digital biomarkers against costly medical imaging, auditing large language models for bias with limited human annotation, or detecting faults in industrial systems where ground-truth inspection is expensive. Across these domains, we face the same challenge: how can we make statistically reliable statements about whether two populations differ when revealing group membership is costly? This talk presents a framework for two-group inference under verification constraints — a modern rethinking of two-sample testing that issues reliable statistical certificates even when only a small, adaptively chosen subset of labels is observed. The approach combines ideas from areas such as active learning, sequential analysis, and classical hypothesis testing, leading to adaptive procedures that improve efficiency without compromising validity. If time permits, I will also touch upon recent work on cohort discovery, where these ideas are used to actively uncover subpopulations with significant effects.
Gautam Dasarathy is an Associate Professor in the School of Electrical, Computer and Energy Engineering at Arizona State University in Tempe, Arizona, USA. He is also an Amazon Scholar in Amazon Last Mile Sciences. His research focuses on statistical machine learning, information processing, and networked systems. He received his Ph.D. from the University of Wisconsin-Madison and held postdoctoral positions at Carnegie Mellon University and Rice University. Dr. Dasarathy received the NSF CAREER Award in 2021 and the Distinguished Alumnus Award from his alma mater, VIT University (India), in 2022.