The Reward Biased Maximum Likelihood Estimate (RBMLE) Approach to Online Machine Learning
Abstract
We revisit the Reward Biased Maximum Likelihood Estimate (RBMLE) algorithm, proposed in (Kumar and Becker, 1982) to solve the problem of adaptive control (Bellman, 1961; Feldbaum, 1960). It addressed the "exploration vs. exploitation" problem by employing a bias in favor of parameters with larger optimal rewards, thereby introducing the principle of optimism in the face of uncertainty. Modern attention focuses on the much finer notion of "learning regret" (Lai and Robbins, 1985). We show that RBMLE has a regret of O(log T) over a time horizon of T steps, matching state-of-the-art algorithms. Simulation studies show that RBMLE outperforms other algorithms such as UCRL2 and Thompson Sampling.
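To make the reward bias concrete, below is a minimal sketch of the RBMLE principle applied to a Bernoulli multi-armed bandit. The log(t) bias schedule, the grid-search maximization, and all function names (`rbmle_index`, `rbmle_bandit`) are illustrative assumptions, not the paper's construction; the paper itself treats adaptive control of Markov chains and establishes the O(log T) regret there.

```python
import numpy as np

def rbmle_index(successes, pulls, bias, grid_size=1000):
    """Reward-biased MLE index for one Bernoulli arm (illustrative).

    Maximizes  log-likelihood(theta) + bias * theta  over theta in (0, 1).
    The additive bias favors parameters with larger mean reward, which is
    the RBMLE form of optimism in the face of uncertainty.
    """
    theta = np.linspace(1e-6, 1 - 1e-6, grid_size)  # grid search keeps the sketch simple
    log_lik = successes * np.log(theta) + (pulls - successes) * np.log(1.0 - theta)
    return theta[np.argmax(log_lik + bias * theta)]

def rbmle_bandit(true_means, horizon, seed=0):
    """Run the RBMLE principle on a Bernoulli bandit; returns pull counts."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes, pulls = np.zeros(k), np.zeros(k)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            bias = np.log(t)  # assumed schedule: grows with t, but slower than t
            arm = int(np.argmax([rbmle_index(successes[i], pulls[i], bias)
                                 for i in range(k)]))
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        pulls[arm] += 1
    return pulls

# Example: the arm with the highest mean should accumulate most pulls.
print(rbmle_bandit([0.2, 0.5, 0.8], horizon=5000))
```

The key design choice in this sketch is that the bias term grows without bound but sublinearly in t: with no bias the plain MLE can lock onto a suboptimal arm forever, while a bias that grows too fast never stops exploring; a slowly growing bias is what lets the undiscounted optimality of the original RBMLE sharpen into logarithmic regret.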