Modern large-scale learning paradigms rely on leveraging data from multiple agents to improve performance. However, to reap the benefits of more data, one must account for two key challenges: (i) the possibility of outliers that can be generated adversarially, and (ii) the communication bottleneck created by limited bandwidth channels. While the themes of adversarial robustness and communication efficiency have been well-explored in the context of supervised learning, not much is known in this regard when it comes to sequential decision-making under uncertainty. In this talk, I will discuss how to bridge the above gap by considering the linear stochastic bandit formalism. First, we will consider a linear bandit setup involving M agents who can collaborate via a central server to minimize regret. A fraction of these agents is adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, it can also disrupt the learning process due to adversaries. We provide a fundamental understanding of this tension by designing new robust algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. When the fraction of corrupted agents is small, our algorithms enjoy a clear benefit of collaboration despite adversaries. Using an information-theoretic argument, we also prove a matching lower bound, providing the first set of tight, near-optimal regret bounds for collaborative linear bandits with adversaries. In the second part of the talk, we will introduce a linear stochastic bandit formulation over a rate-limited channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To that end, we develop a novel adaptive encoding and decision-making strategy. When the unknown model is d-dimensional, we prove that a channel capacity of O(d) bits suffices to achieve order-optimal regret. We also establish that for the simpler unstructured multi-armed bandit problem, 1-bit channel capacity is sufficient to achieve optimal regret bounds.
Aritra Mitra is currently an Assistant Professor at the Department of Electrical and Computer Engineering at North Carolina State University. His research interests include control theory, optimization, statistical signal processing, machine learning, and distributed algorithms. Previously, he was a Postdoctoral Researcher at the University of Pennsylvania from 2020 to 2022. Prior to that, he received his Ph.D. degree from Purdue University in 2020, his M.Tech degree from the Indian Institute of Technology Kanpur in 2015, and his B.E. degree from Jadavpur University in 2013, all in Electrical Engineering. He was a recipient of the University Gold Medal at Jadavpur University and the Academic Excellence Award at IIT Kanpur.