CNI Seminar Series

Reinforcement Learning in Non-Stationary Environments

Prof. Pranay Sharma, Assistant Professor, Centre for Machine Intelligence and Data Science (C-MInDS), IIT Bombay

#283

Abstract

We consider the problem of non-stationary reinforcement learning (RL) in the infinite-horizon average-reward setting. We model it as a Markov Decision Process with time-varying rewards and transition probabilities. Existing non-stationary RL algorithms focus on model-based and model-free value-based methods. Policy-based methods, despite their flexibility in practice, are not theoretically well understood in non-stationary RL. We propose and analyze the first model-free policy-based algorithm, Non-Stationary Natural Actor-Critic (NS-NAC), a policy gradient method with restart-based exploration for change and a novel interpretation of learning rates as adapting factors. Further, we present a bandit-over-RL-based parameter-free algorithm, BORL-NS-NAC, that does not require prior knowledge of the variation budget.
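To give a flavour of the restart idea described above, the following is a minimal, purely illustrative sketch of a restart-based actor-critic loop on a toy two-state MDP whose reward function changes partway through. This is not the NS-NAC algorithm from the talk (which uses natural gradients and a careful learning-rate schedule); every name, constant, and modelling choice here is a hypothetical simplification for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def run_restart_actor_critic(T=2000, restart_period=500, seed=0):
    """Toy restart-based actor-critic (illustrative only, not NS-NAC).

    The reward function flips at t = T//2 (non-stationarity); parameters
    are periodically reset so the learner forgets stale estimates and
    re-adapts to the changed environment.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2
    theta = np.zeros((n_states, n_actions))  # policy logits (actor)
    V = np.zeros(n_states)                   # state-value estimates (critic)
    alpha, beta, gamma = 0.1, 0.05, 0.9      # actor/critic steps, discount
    s = 0
    rewards = []
    for t in range(T):
        if t > 0 and t % restart_period == 0:
            # Restart: wipe actor and critic to enable fresh exploration.
            theta = np.zeros((n_states, n_actions))
            V = np.zeros(n_states)
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        # Time-varying reward: the rewarding action flips at t = T//2.
        good_action = 0 if t < T // 2 else 1
        r = 1.0 if a == good_action else 0.0
        s_next = rng.integers(n_states)      # toy random transitions
        delta = r + gamma * V[s_next] - V[s] # TD error
        V[s] += beta * delta                 # critic update
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0                # grad of log-softmax policy
        theta[s] += alpha * delta * grad_log_pi  # actor update
        rewards.append(r)
        s = s_next
    return np.array(rewards)
```

Without the periodic reset, the policy learned before the change point keeps favouring the stale action; the restart trades a brief exploration cost for the ability to track the new reward function, which is the trade-off the analysis in the talk quantifies.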


Bio

Pranay is an Assistant Professor at IIT Bombay in the Centre for Machine Intelligence and Data Science (C-MInDS). Until January 2025, he was a Research Scientist in the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received his PhD in Electrical Engineering and Computer Science from Syracuse University. Before that, he completed his B.Tech-M.Tech dual degree in Electrical Engineering at IIT Kanpur. His research interests include federated and collaborative learning, stochastic optimization, reinforcement learning, and differential privacy.