Constrained Multi-Armed Bandits

The classical multi-armed bandit formulations, the best arm is defined via the optimization of a single attribute associated with each arm distribution, typically its mean reward. In practice however, there are often multiple attributes of interest. For example, in clinical trials, one might be interested in not just the efficacy of a certain treatment protocol, but also the severity of its side effects, price, etc. Similarly, a wireless node trying to determine which channel to transmit on, might want to balance several criteria, including throughput, delay, energy consumption, etc. In this work, we propose a multi-criterion multi-armed bandits formulation, where the best arm is defined as one that optimized one criterion, subject to constraints on others. We propose near-optimal algorithms for regret minimization in this setting, and also establish an interesting tradeoff between regret minimization and feasibility identification. This is joint work with Anmol Kagrecha and Krishna Jagannathan.

Jayakrishnan Nair is an Associate Professor of Electrical Engineering at IIT Bombay. His research draws on tools from queueing theory, applied probability, game theory, and control theory to address performance evaluation and design issues in networks, service systems, and smart power grids. He is a recipient of best paper awards at IFIP Performance 2010 & 2020, and ACM e-Energy 2020.

Abstract

Jayakrishnan U Nair, IIT Bombay