Neural approximations for infinite-horizon optimal control of nonlinear stochastic systems

T Parisini; R Zoppoli

doi:10.1109/72.728390

IEEE Trans Neural Netw. 1998;9(6):1388-408. doi: 10.1109/72.728390.

Authors

T Parisini¹, R Zoppoli

Affiliation

¹ Department of Electrical, Electronic, and Computer Engineering, DEEI-University of Trieste, 34175 Trieste, Italy.

PMID: 18255818
DOI: 10.1109/72.728390

Abstract

A feedback control law is proposed that drives the controlled vector v_t of a discrete-time dynamic system (in general, nonlinear) to track a reference v_t* over an infinite time horizon, while minimizing a given cost function (in general, nonquadratic). The behavior of v_t* over time is completely unpredictable. Random noises act on the dynamic system and on the state observation channel, which may also be nonlinear. The random noises and the initial state are, in general, non-Gaussian; it is assumed that all such random vectors are mutually independent, and that their probability density functions are known. As is well known, so general a non-LQG (linear quadratic Gaussian) optimal control problem is very difficult to solve. The proposed solution is based on three main approximating assumptions: 1) the optimal control problem is stated in a receding-horizon framework where v_t* is assumed to remain constant within a shifting-time window; 2) the control law is assigned a given structure (that of a multilayer feedforward neural network) in which a finite number of parameters have to be determined in order to minimize the cost function (this makes it possible to approximate the original functional optimization problem by a nonlinear programming one); and 3) the control law is given a "limited memory," which prevents the amount of data to be stored from increasing over time. The errors resulting from the second and third assumptions are discussed. Due to the very general assumptions under which the approximate optimal control law is derived, we are not able to report stability results. However, simulation results show that the proposed method may constitute an effective tool for solving, to a sufficient degree of accuracy, a wide class of control problems traditionally regarded as difficult (an example of freeway traffic optimal control is given that may be of practical importance).
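
The three approximating assumptions can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar dynamics, the observation map, the uniform noise ranges, the cost exponents, the memory length `M`, the window length `N`, the network size, and the function names are all hypothetical choices made for illustration. The abstract does not say how the network parameters are determined, so a plain random search stands in here for whatever nonlinear programming method is actually used. The controlled variable is taken to be the state itself.

```python
import numpy as np

M = 3        # limited memory: number of past observations kept (assumed, M >= 2)
N = 10       # length of the shifting time window (assumed)
HIDDEN = 8   # hidden units of the feedforward network (assumed)

def init_params(rng, n_in, n_hidden=HIDDEN):
    """Finite parameter vector of a one-hidden-layer feedforward network."""
    return {
        "W1": 0.1 * rng.standard_normal((n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": 0.1 * rng.standard_normal((1, n_hidden)),
        "b2": np.zeros(1),
    }

def control_law(params, y_hist, u_hist, v_ref):
    """Limited-memory control law: last M observations, last M-1 controls, reference."""
    z = np.concatenate([y_hist, u_hist, [v_ref]])
    h = np.tanh(params["W1"] @ z + params["b1"])
    return float((params["W2"] @ h + params["b2"])[0])

def step(x, u, xi):
    """Toy nonlinear dynamics with additive noise (assumed)."""
    return 0.8 * np.sin(x) + u + xi

def observe(x, eta):
    """Toy nonlinear observation channel with additive noise (assumed)."""
    return np.tanh(x) + eta

def expected_cost(params, seed, n_samples=64):
    """Monte Carlo estimate of a nonquadratic cost over one window.

    Reusing the same seed gives common random numbers, so two parameter
    sets are compared on identical noise and reference samples.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)        # non-Gaussian initial state
        v_ref = rng.uniform(-1.0, 1.0)    # reference held constant within the window
        y_hist = np.zeros(M)
        u_hist = np.zeros(M - 1)
        for _ in range(N):
            y = observe(x, rng.uniform(-0.05, 0.05))
            y_hist = np.append(y_hist[1:], y)      # drop the oldest observation
            u = control_law(params, y_hist, u_hist, v_ref)
            u_hist = np.append(u_hist[1:], u)      # drop the oldest control
            x = step(x, u, rng.uniform(-0.05, 0.05))
            total += abs(x - v_ref) ** 1.5 + 0.1 * abs(u) ** 1.5
    return total / n_samples

def random_search(params, seed, iters=30, sigma=0.05):
    """Stand-in optimizer: keep a random perturbation only if the cost drops."""
    rng = np.random.default_rng(seed + 1)
    best, best_cost = params, expected_cost(params, seed)
    for _ in range(iters):
        cand = {k: v + sigma * rng.standard_normal(v.shape) for k, v in best.items()}
        cost = expected_cost(cand, seed)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

if __name__ == "__main__":
    p0 = init_params(np.random.default_rng(0), n_in=2 * M)
    p1, c1 = random_search(p0, seed=0)
    print("initial cost:", expected_cost(p0, seed=0), "after search:", c1)
```

In this sketch the network input has fixed size 2M (M observations, M-1 controls, one reference value), so storage does not grow with time, and the search over control functions is reduced to a search over the finite set of weights. At run time the same law would be applied at every step with the current reference value, which is the receding-horizon idea.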