Last-position elimination-based learning automata

Junqi Zhang; Cheng Wang; MengChu Zhou

doi:10.1109/TCYB.2014.2309478

Last-position elimination-based learning automata

IEEE Trans Cybern. 2014 Dec;44(12):2484-92. doi: 10.1109/TCYB.2014.2309478. Epub 2014 Apr 2.

Authors

Junqi Zhang, Cheng Wang, MengChu Zhou

PMID: 24710837
DOI: 10.1109/TCYB.2014.2309478

Abstract

An update scheme of the state probability vector of actions is critical for learning automata (LA). The most popular is the pursuit scheme that pursues the estimated optimal action and penalizes others. This paper proposes a reverse philosophy that leads to last-position elimination-based learning automata (LELA). The action graded last in terms of the estimated performance is penalized by decreasing its state probability and is eliminated when its state probability becomes zero. All active actions, that is, actions with nonzero state probability, equally share the penalized state probability from the last-position action at each iteration. The proposed LELA is characterized by the relaxed convergence condition for the optimal action, the accelerated step size of the state probability update scheme for the estimated optimal action, and the enriched sampling for the estimated nonoptimal actions. The proof of the ϵ-optimal property for the proposed algorithm is presented. Last-position elimination is a widespread philosophy in the real world and has proved to be also helpful for the update scheme of the learning automaton via the simulations of well-known benchmark environments. In the simulations, two versions of the LELA, using different selection strategies of the last action, are compared with the classical pursuit algorithms Discretized Pursuit Reward-Inaction (DP(RI)) and Discretized Generalized Pursuit Algorithm (DGPA). Simulation results show that the proposed schemes achieve significantly faster convergence and higher accuracy than the classical ones. Specifically, the proposed schemes reduce the interval to find the best parameter for a specific environment in the classical pursuit algorithms. Thus, they can have their parameter tuning easier to perform and can save much more time when applied to a practical case. Furthermore, the convergence curves and the corresponding variance coefficient curves of the contenders are illustrated to characterize their essential differences and verify the analysis results of the proposed algorithms.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Artificial Intelligence*
Decision Support Techniques*
Game Theory*
Pattern Recognition, Automated / methods*
Reward*