Online Sampling Algorithms for Identifying the Largest Community

# 104






Abstract

We consider a population, partitioned into a set of communities, and study the problem of identifying the largest community within the population via sequential, random sampling of individuals. There are multiple sampling domains, referred to as boxes, which also partition the population. Each box may consist of individuals of different communities, and each community may in turn be spread across multiple boxes. The learning agent can, at any time, sample (with replacement) a random individual from any chosen box; when this is done, the agent learns the community the sampled individual belongs to, and also whether or not this individual has been sampled before. The goal of the agent is to minimize the probability of mis-identifying the largest community in a fixed budget setting, by optimizing both the sampling strategy as well as the decision rule. We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm. In several cases of interest, the exponential decay rates of the probability of error under our algorithms are shown to be optimal up to constant factors.

Nikhil Karamchandani, IIT Bombay

Nikhil Karamchandani (Senior Member, IEEE) received the M.S. degree from the Department of Electrical and Computer Engineering from the University of California at San Diego in 2007 and the Ph.D. degree from the Department of Electrical and Computer Engineering, University of California at San Diego, in 2011. From 2011 to 2014, he was a postdoctoral scholar with the University of California at Los Angeles and the Information Theory and Applications (ITA) Center, University of California at San Diego. He is currently an Associate Professor with the Department of Electrical Engineering, IIT Bombay. His research interests include networks, information theory, and online learning.