Mean-Field Control for Restless Bandits and Weakly Coupled MDPs

Reinforcement learning suffers from the well-known curse of dimensionality: the size of the state space grows exponentially with the number of dimensions. Resource allocation problems are a typical example. In such a problem, an operator faces a population of entities whose states evolve over time. The entities' evolutions are coupled only through the controller's actions: this is a « weakly coupled MDP ». These problems are computationally hard in general for a finite population of entities but, interestingly, become easier when the population is infinite. In particular, there exist several LP-based relaxations (including the famous Whittle index) that generally provide near-optimal solutions. The goal of this talk is to introduce these policies and to present recent results on when they become asymptotically optimal as the number of resources goes to infinity.
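For readers unfamiliar with the LP relaxation mentioned above, one standard form is sketched below. This is an illustration only, assuming homogeneous arms with a finite state space S, binary actions a ∈ {0,1} (1 = activate), transition kernel P(s' | s, a), reward r(s, a), an average-reward criterion, and a budget of exactly αN activations per step among N arms. None of these symbols or modelling choices come from the abstract itself. The hard per-step budget is relaxed to hold only on average, which decouples the arms, and y(s, a) is the long-run fraction of arms in state s taking action a:

```latex
\begin{aligned}
\max_{y \ge 0} \quad & \sum_{s \in \mathcal{S}} \sum_{a \in \{0,1\}} r(s,a)\, y(s,a) \\
\text{s.t.} \quad & \sum_{a \in \{0,1\}} y(s',a) = \sum_{s \in \mathcal{S}} \sum_{a \in \{0,1\}} y(s,a)\, P(s' \mid s,a) \qquad \forall s' \in \mathcal{S}, \\
& \sum_{s \in \mathcal{S}} \sum_{a \in \{0,1\}} y(s,a) = 1, \\
& \sum_{s \in \mathcal{S}} y(s,1) = \alpha .
\end{aligned}
```

Because every feasible policy for N arms induces a feasible y, the optimal value of this LP upper-bounds the best achievable average reward per arm. Index-type policies can then be derived from it; Whittle's index, for instance, comes from pricing the budget constraint with a Lagrange multiplier (a subsidy for passivity).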

Nicolas Gast has been a tenured research scientist at Inria (Grenoble, France) since 2014 and is currently visiting MIT for the semester. He graduated from École Normale Supérieure (Paris, France) in 2007 and received a Ph.D. from the University of Grenoble in 2010. He was a research fellow at EPFL from 2010 to 2014. His research focuses on developing and using stochastic models and optimization methods to design control algorithms for large-scale systems.

Dr. Nicolas Gast, INRIA