Reward Biased Maximum Likelihood Estimate (RBMLE) approach to Online Machine Learning

Abstract

We revisit the Reward Biased Maximum Likelihood Estimate (RBMLE) algorithm, proposed in (Kumar and Becker, 1982) to solve the problem of adaptive control (Bellman, 1961; Feldbaum, 1960). It solved the "exploration vs. exploitation" problem by employing a bias in favor of parameters with larger optimal rewards, thereby introducing the principle of optimism in the face of uncertainty. Modern attention is focused on the much finer notion of "learning regret" (Lai and Robbins, 1985). We show that RBMLE has a regret of O(log T) over a time horizon of T steps, similar to state-of-the-art algorithms. Simulation studies show that RBMLE outperforms other algorithms such as UCRL2 and Thompson Sampling.
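
To illustrate the reward-biasing idea described in the abstract, here is a minimal sketch of an RBMLE-style index policy for a Bernoulli multi-armed bandit. This is an assumed, simplified instantiation for illustration only (the grid search, the `alpha_scale` parameter, and the `log t` bias schedule are choices made here, not the exact formulation from the talk or the 1982 paper): for each arm we maximize the log-likelihood of its observed rewards plus a bias term that grows slowly with time and favors larger candidate means, then pull the arm with the largest biased maximum.

```python
import numpy as np

def rbmle_bernoulli_bandit(arm_probs, horizon, alpha_scale=1.0, rng=None):
    """Sketch of an RBMLE-style index policy for a Bernoulli bandit.

    For each arm, maximize (log-likelihood of that arm's observed rewards)
    + alpha(t) * (candidate mean), i.e. the likelihood is biased toward
    parameters promising larger rewards, and pull the arm whose biased
    maximum is largest. alpha(t) here grows like log t (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(arm_probs)
    counts = np.zeros(n_arms)      # number of pulls per arm
    successes = np.zeros(n_arms)   # number of reward-1 outcomes per arm
    grid = np.linspace(0.01, 0.99, 99)  # candidate mean parameters
    total_reward = 0.0

    for t in range(1, horizon + 1):
        alpha = alpha_scale * np.log(t + 1)  # reward bias, increasing in t
        indices = np.empty(n_arms)
        for a in range(n_arms):
            # Bernoulli log-likelihood of arm a's data, evaluated on the grid
            loglik = (successes[a] * np.log(grid)
                      + (counts[a] - successes[a]) * np.log(1.0 - grid))
            # Reward-biased maximum: bias the likelihood toward larger means
            indices[a] = np.max(loglik + alpha * grid)
        arm = int(np.argmax(indices))
        reward = float(rng.random() < arm_probs[arm])
        counts[arm] += 1
        successes[arm] += 1
        total_reward += reward

    return total_reward

# Example usage: three arms with unknown means 0.3, 0.5, 0.7
# print(rbmle_bernoulli_bandit([0.3, 0.5, 0.7], horizon=5000))
```

Note that an arm that has never been pulled has a flat (zero) log-likelihood, so its index reduces to the bias term alone; this automatically encourages early exploration of every arm, which is one way the "optimism in the face of uncertainty" principle manifests in this sketch.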


Rahul Singh, IISc Bangalore

Rahul Singh is an Assistant Professor in the Department of ECE at IISc. Earlier, he was a postdoctoral researcher at The Ohio State University and at LIDS, MIT. He received his B.Tech. in EE from the Indian Institute of Technology Kanpur in 2009, his M.S. in EE from the University of Notre Dame in 2011, and his Ph.D. in Computer Engineering from the Department of Electrical and Computer Engineering, Texas A&M University, College Station, in 2015. He has also worked in industry, as a Data Scientist at Encored Inc. and in the Machine Learning Group at Intel, Santa Clara. His research interests include stochastic control, machine learning, Markov decision processes, and networks.