Abstract In statistical learning theory, multi-armed bandits constitute a popular model for a set of alternatives competing for a fixed amount of resources. The classical multi-armed bandit problem deals with the notion of “maximising rewards” or equivalently “minimising regret” over a finite or infinite time horizon. However, in this talk, I shall discuss the notion of “optimal stopping” in the context of multi-armed bandits, where the central objective is to identify a certain feature of the multi-armed bandit as quickly as possible. The particular feature I shall focus on throughout the talk is that of all but one of the arms (known as anomalous arm) being probabilistically identical, with a goal of identifying the anomalous arm as quickly as possible while ensuring that the probability of error is below a maximum acceptable tolerance. In the literature, this problem is well-studied under the name of “odd arm identification”, for the case when each arm yields iid observations. In this talk, I shall present the first known results for the problem of odd arm identification when each arm yields Markov observations.This is ongoing work with Prof. Rajesh Sundaresan. A part of this work was presented at the 2019 IEEE International Symposium on information Theory (ISIT), a detailed manuscript of which is available at https://arxiv.org/abs/1904.11361.
Karthik P N Karthik is a PhD student in the Wireless Information Systems Lab, Department of ECE, working under the supervision of Prof. Rajesh Sundaresan. Prior to joining for PhD, he served as a project assistant in the Signal Processing for Communications Lab of the Dept of ECE, where he worked with Prof. Chandra R. Murthy. Karthik holds a Bachelor’s degree in Electronics and Communications from RV College of Engineering, Bangalore, where he graduated from in 2014.