Novel First Order Bayesian Optimization with an Application to Reinforcement Learning

Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods do not use gradient information even when it is available. First Order Bayesian Optimization (FOBO) methods, on the other hand, exploit the available gradient information to reach better solutions faster. However, existing FOBO methods do not use a crucial piece of information: the gradient is zero at the optima. Further, the inherently sequential nature of existing FOBO methods incurs a high computational cost, limiting their wide applicability. To alleviate these difficulties, we propose a relaxed statistical model that leverages the gradient information to search directly for points where the gradient vanishes. To accomplish this, we develop novel acquisition algorithms that search for global optima effectively. Unlike existing FOBO methods, the proposed methods are parallelizable. Through extensive experiments on standard test functions, we compare the performance of our methods against that of existing methods. Furthermore, we explore an application of the proposed FOBO methods to policy gradient reinforcement learning.
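The core idea, treating optima as points where the gradient vanishes, can be illustrated with a toy sketch. This is not the authors' model or acquisition algorithm. It assumes a 1-D objective, a squared-exponential Gaussian process fitted directly to gradient observations, and a hypothetical acquisition `prob_near_zero` that scores each candidate by the posterior probability that |g(x)| < eps. All names and parameter values (`LENGTH_SCALE`, `eps`, the 401-point candidate grid) are illustrative assumptions.

```python
import math

import numpy as np

PRIOR_VAR = 1.0      # assumed prior variance of the gradient GP (illustrative)
LENGTH_SCALE = 0.5   # assumed kernel length scale (illustrative)


def rbf_kernel(a, b):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return PRIOR_VAR * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)


def gp_posterior(x_obs, g_obs, x_query, noise=1e-6):
    """GP posterior mean and std of the gradient g at x_query, given observed gradients."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_query, x_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, g_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = PRIOR_VAR - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_near_zero(mean, std, eps=0.05):
    """Hypothetical acquisition: P(|g(x)| < eps) under the Gaussian posterior."""
    return np.array([normal_cdf((eps - m) / s) - normal_cdf((-eps - m) / s)
                     for m, s in zip(mean, std)])


def zero_gradient_search(grad, lo, hi, n_init=3, n_iter=10, seed=0):
    """Sequentially query the candidate most likely to have a (near-)zero gradient."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(lo, hi, n_init)
    g_obs = np.array([grad(x) for x in x_obs])
    grid = np.linspace(lo, hi, 401)  # candidate points for the acquisition
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, g_obs, grid)
        x_next = grid[np.argmax(prob_near_zero(mean, std))]
        x_obs = np.append(x_obs, x_next)
        g_obs = np.append(g_obs, grad(x_next))
    best = np.argmin(np.abs(g_obs))  # observed point with the smallest |gradient|
    return x_obs[best], g_obs[best]
```

For example, `zero_gradient_search(lambda x: 2.0 * (x - 1.0), -2.0, 3.0)` runs the search on the gradient of f(x) = (x - 1)^2 and should return a point near x = 1. Because this toy acquisition rewards any zero of the gradient, it can equally land on maxima or saddle points. It is also purely sequential. Both are limitations that the abstract says the proposed acquisition algorithms are designed to address.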

Prabuchandran K.J. is an Assistant Professor at IIT Dharwad. He completed his Ph.D. in the area of reinforcement learning at the Department of Computer Science and Automation, IISc. After his Ph.D., Prabuchandran worked as a Research Scientist at IBM Research Labs, India, for a year and a half on change detection algorithms for multivariate compositional data. He then pursued postdoctoral research at IISc, Bangalore, as an Amazon-IISc Postdoctoral Scholar for a year and a half on multi-agent reinforcement learning and stochastic optimization algorithms. His research lies at the intersection of reinforcement learning, stochastic control and optimization, machine learning, Bayesian optimization and stochastic approximation algorithms. His research also focuses on applying techniques from these fields to problems arising in applications such as wireless sensor networks, traffic signal control and social networks.

K J Prabuchandran, IIT Dharwad