CNI Seminar Series

Automated A/B testing with the Upper Confidence Bound

Koulik Khamaru, Assistant Professor of Statistics, Rutgers University

#267

Abstract

Modern decision-making increasingly relies on adaptive experimentation, particularly in settings such as A/B testing, multi-armed bandits, and reinforcement learning. While these methods enable more efficient learning and allocation of resources, they fundamentally challenge traditional statistical inference: classical i.i.d.-based tools often break down under adaptive data collection, resulting in biased estimators and misleading confidence intervals. This talk offers an overview of statistical inference in these adaptive environments. We highlight the pitfalls of naive inference through concrete examples and introduce the concept of stability, originally formulated by Lai and Wei (1982), as a unifying principle for valid inference under adaptivity. We demonstrate how algorithms such as the Upper Confidence Bound (UCB) achieve stability, enabling the application of classical inferential tools despite the lack of independence. Key illustrations include the empirical mean in a stochastic bandit and the contextual bandit problem, both of which admit central limit theorems.
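For readers new to UCB, the sketch below illustrates the algorithm on a Gaussian stochastic bandit. It is a minimal illustration, not material from the talk: the function name, the Gaussian reward model, and the exploration constant `alpha` are all assumptions made for the example.

```python
import numpy as np

def ucb_bandit(arm_means, horizon, alpha=2.0, rng=None):
    """Run a UCB-style algorithm on a Gaussian stochastic bandit.

    Returns per-arm pull counts and empirical means. The Gaussian
    reward model and the value of alpha are illustrative choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(arm_means)
    counts = np.zeros(k, dtype=int)   # times each arm has been pulled
    means = np.zeros(k)               # running empirical mean per arm

    # Pull every arm once so each confidence bound is well defined.
    for a in range(k):
        counts[a] = 1
        means[a] = rng.normal(arm_means[a], 1.0)

    for t in range(k, horizon):
        # Upper confidence bound: empirical mean plus an exploration
        # bonus that shrinks as an arm accumulates samples.
        bonus = np.sqrt(alpha * np.log(t + 1) / counts)
        a = int(np.argmax(means + bonus))
        reward = rng.normal(arm_means[a], 1.0)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # incremental mean

    return counts, means

# Example: a two-armed bandit with a small gap between the arms.
counts, means = ucb_bandit(arm_means=[0.5, 0.4], horizon=10_000)
print(counts, means)
```

The empirical means tracked here are the adaptively collected averages whose limiting behavior the talk's central limit theorems describe.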


Bio

Koulik Khamaru is an Assistant Professor of Statistics at Rutgers University. He earned his PhD from the University of California, Berkeley, working under the mentorship of Professors Martin J. Wainwright and Michael I. Jordan. Prior to his doctoral studies, he completed his bachelor's and master's degrees in Statistics at the Indian Statistical Institute, Kolkata. His research centers on reinforcement learning (RL) algorithm design, with a recent focus on developing methods for hypothesis testing and inference with RL data. Previously, he worked on a broad range of problems at the interface of statistics, machine learning, and optimization, including contributions to the theory of the expectation–maximization (EM) algorithm, Gaussian mixture models, model misspecification, factor analysis, and non-convex optimization.