Risk-Sensitive Bandits: Arm Mixtures Optimality and Regret-Efficient Algorithms

# 212






Abstract

In canonical stochastic multi-armed bandit problems, the focus is often utilitarian: bandit algorithms aim to maximize rewards while disregarding the associated risks. This talk presents a new framework for risk-aware sequential decision making that unifies various risk measures under a common approach. Specifically, the talk explores a broad class of risk metrics called distortion riskmetrics. Unlike previous studies, which assume that a single best arm is optimal, we make the novel observation that for many riskmetrics the optimal strategy is to select a mixture of arms rather than a single one. This finding exposes significant limitations in current bandit algorithms, which are not designed to handle such mixtures. To bridge this gap, we introduce new algorithms capable of tracking optimal mixtures when the risk measure favors them. The talk will also address the technical difficulties in establishing information-theoretic lower bounds on regret in the mixture-optimality setting. We will close by discussing open questions and future research directions in risk-sensitive decision making.

Dr. Arpan Mukherjee, Postdoctoral researcher, Imperial College London

Arpan Mukherjee is an incoming postdoctoral researcher at Imperial College London. He obtained his Ph.D. from the Department of Electrical, Computer and Systems Engineering (ECSE) at Rensselaer Polytechnic Institute (RPI), where he was a recipient of the B. J. Baliga Fellowship. Prior to joining RPI, Arpan obtained his MTech from the Department of Electronics and Electrical Communication Engineering at IIT Kharagpur in 2019. He is broadly interested in problems at the intersection of signal processing, statistics, and machine learning.