Nonconvex Policy Search Using Variational Inequalities

Yusen Zhan; Haitham Bou Ammar; Matthew E Taylor

doi:10.1162/neco_a_01004

Neural Comput. 2017 Oct;29(10):2800-2824. doi: 10.1162/neco_a_01004. Epub 2017 Aug 4.

Authors

Yusen Zhan¹, Haitham Bou Ammar², Matthew E Taylor³

Affiliations

¹ School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163, U.S.A. yusen.zhan@wsu.edu.
² Department of Computer Science, American University of Beirut, 1107 2020, Lebanon. hb71@aub.edu.lb.
³ School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163, U.S.A. taylorm@eecs.wsu.edu.

PMID: 28777726
DOI: 10.1162/neco_a_01004

Abstract

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware units. Motivated by such constraints, projection-based methods have been proposed for safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve these problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
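
To make the idea of a projection-based Mann iteration over a nonconvex constraint set concrete, here is a minimal sketch of a generic Mann-type projected iteration on a toy problem. It is not the algorithm from the letter, whose details the abstract does not give. Everything in it is an assumption chosen for illustration: the annulus-shaped feasible set, the deterministic quadratic objective with minimizer `c`, the step sizes `rho` and `alpha`, and the function names `project_annulus` and `mann_iteration`.

```python
import math

def project_annulus(x, r_min=1.0, r_max=3.0):
    """Return a nearest point of the annulus r_min <= ||x|| <= r_max.

    The annulus is a simple nonconvex set (assumed here for illustration).
    """
    norm = math.hypot(x[0], x[1])
    if norm == 0.0:
        # Every point of the inner circle is equally near; pick one.
        return (r_min, 0.0)
    radius = min(max(norm, r_min), r_max)
    scale = radius / norm
    return (x[0] * scale, x[1] * scale)

def mann_iteration(grad, theta0, rho=0.5, alpha=0.5, steps=500):
    """Generic Mann-type update:
    theta <- (1 - alpha) * theta + alpha * P_K(theta - rho * grad(theta)).
    """
    theta = theta0
    for _ in range(steps):
        g = grad(theta)
        # Gradient step followed by projection onto the nonconvex set.
        y = project_annulus((theta[0] - rho * g[0], theta[1] - rho * g[1]))
        # Mann averaging of the current iterate and the projected point.
        theta = ((1 - alpha) * theta[0] + alpha * y[0],
                 (1 - alpha) * theta[1] + alpha * y[1])
    return theta

# Toy objective 0.5 * ||theta - c||^2 whose unconstrained minimizer c lies
# inside the hole of the annulus, so the constraint is active.
c = (0.2, 0.0)

def grad(theta):
    return (theta[0] - c[0], theta[1] - c[1])

theta_star = mann_iteration(grad, (2.0, 1.0))
print(theta_star)
```

In this toy setting the iterates settle on the point of the inner circle closest to `c`. Intermediate iterates can leave the set, because a convex combination of two points of a nonconvex set need not belong to it; the sketch makes no attempt to reproduce the stochastic setting or the convergence guarantees stated in the abstract.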

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.