Abstract
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non adaptive methods in this setting. Specifically, prior studies generally rely on some subset of the following assumptions: (i) uniformly-bounded gradient norms, (ii) uniformly-bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order optimal convergence rate scaling inversely with the square-root of T after T iterations (up to polylogarithmic factors) under the same assumptions as optimally-tuned non adaptive SGD (unbounded gradient norms and affine noise variance scaling). We also go beyond uniform smoothness function settings, to derive similar results for (L_0, L_1)-smooth (non-convex) functions (where the Hessian can affinely scale with the gradient, e.g. such as the exponential function). This talk is based on joint work with Matthew Faw, Isidoros Tziotis, Litu Rout, Constantine Caramanis, Aryan Mokhtari and Rachel Ward. References: https://arxiv.org/abs/2202.05791 (COLT 2022) and https://arxiv.org/abs/2302.06570 (COLT 2023) Dr. Sanjay Shakkottai, University of Texas
Sanjay Shakkottai received his Ph.D. from the ECE Department at the University of Illinois at Urbana-Champaign in 2002. He is with The University of Texas at Austin, where he is a Professor in the Chandra Family Department of Electrical and Computer Engineering, and holds the Cockrell Family Chair in Engineering #15. He received the NSF CAREER award in 2004 and was elected as an IEEE Fellow in 2014. He was a co-recipient of the IEEE Communications Society William R. Bennett Prize in 2021. He is currently the Editor in Chief of IEEE/ACM Transactions on Networking. His research interests lie at the intersection of algorithms for resource allocation, statistical learning and networks, with applications to wireless communication networks and online platforms. Web: https://sites.google.com/view/sanjay-shakkottai/