Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Extension to Learning Task Events

Kazuhiro Sakamoto; Hinata Yamada; Norihiko Kawaguchi; Yoshito Furusawa; Naohiro Saito; Hajime Mushiake

doi:10.3389/fncom.2022.784604

Front Comput Neurosci. 2022 Jun 2:16:784604. doi: 10.3389/fncom.2022.784604. eCollection 2022.

Authors

Kazuhiro Sakamoto¹ ², Hinata Yamada¹, Norihiko Kawaguchi², Yoshito Furusawa², Naohiro Saito², Hajime Mushiake²

Affiliations

¹ Department of Neuroscience, Faculty of Medicine, Tohoku Medical and Pharmaceutical University, Sendai, Japan.
² Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Abstract

Learning is a crucial basis for biological systems to adapt to their environments. Environments include various states or episodes, and episode-dependent learning is essential for adapting to such complex situations. Here, we developed a model for learning a two-target search task used in primate physiological experiments. In the task, the agent is required to gaze at one of four presented light spots. Two neighboring spots serve as the correct target alternately, and the correct target pair is switched after a certain number of consecutive successes. For the agent to obtain rewards with high probability, it must make decisions based on the actions and results of the previous two trials. Our previous work achieved this by using a dynamic state space. However, to learn a task that includes events such as fixation on the initial central spot, the model framework must be extended. For this purpose, we propose a "history-in-episode architecture." Specifically, we divide states into episodes and histories, and actions are selected based on the histories within each episode. When we compared the proposed model, including the dynamic state space, with the conventional SARSA method in the two-target search task, the former performed close to the theoretical optimum, while the latter never achieved a target-pair switch because it had to re-learn each correct target each time. The reinforcement learning model including the proposed history-in-episode architecture and dynamic state space enables episode-dependent learning and provides a basis for learning systems that are highly adaptable to complex environments.

Keywords: dynamic state space; episode-dependent learning; history-in-episode architecture; reinforcement learning; target search task.
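
Illustrative sketch (not the authors' implementation). To make the task structure described in the abstract concrete, the following Python sketch simulates a toy two-target search task and trains a tabular SARSA agent whose state is the (action, reward) history of the previous two trials. Everything here is an assumption for illustration: the class and function names, the switch threshold of 5 consecutive successes, the rule that the correct target alternates only after a success, the random choice of the next neighboring pair, and the learning parameters. The fixed two-trial history is a simplification; it is neither the dynamic state space nor the history-in-episode architecture proposed in the paper, and it omits task events such as central fixation.

```python
import random
from collections import defaultdict

N_SPOTS = 4        # four light spots, indexed 0..3 around a ring
SWITCH_AFTER = 5   # hypothetical threshold; the abstract only says "a certain number"


class TwoTargetSearchTask:
    """Toy two-target search task (all parameters are illustrative assumptions)."""

    def __init__(self, switch_after=SWITCH_AFTER, rng=None):
        self.rng = rng or random.Random(0)
        self.switch_after = switch_after
        self.pair = (0, 1)   # current pair of neighboring correct spots
        self.next_idx = 0    # which member of the pair is correct on this trial
        self.streak = 0      # consecutive successes on the current pair

    def step(self, action):
        """Apply a gaze choice (0..3) and return the reward (1 or 0)."""
        reward = 1 if action == self.pair[self.next_idx] else 0
        if reward:
            self.next_idx = 1 - self.next_idx   # assumed: alternate after a success
            self.streak += 1
            if self.streak >= self.switch_after:
                self._switch_pair()
        else:
            self.streak = 0
        return reward

    def _switch_pair(self):
        # Assumed rule: move to a different pair of neighboring spots at random.
        start = self.rng.choice([s for s in range(N_SPOTS) if s != self.pair[0]])
        self.pair = (start, (start + 1) % N_SPOTS)
        self.next_idx = 0
        self.streak = 0


def run_sarsa(env, n_trials=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA over a fixed two-trial history; returns (reward rate, Q-table)."""
    rng = random.Random(seed)
    q = defaultdict(float)

    def choose(state):
        # Epsilon-greedy action selection over the four spots.
        if rng.random() < epsilon:
            return rng.randrange(N_SPOTS)
        values = [q[(state, a)] for a in range(N_SPOTS)]
        return values.index(max(values))

    state = ((None, None), (None, None))   # no history before the first trial
    action = choose(state)
    total = 0
    for _ in range(n_trials):
        reward = env.step(action)
        total += reward
        next_state = (state[1], (action, reward))   # slide the two-trial window
        next_action = choose(next_state)
        # On-policy SARSA update using the action actually chosen next.
        td_target = reward + gamma * q[(next_state, next_action)]
        q[(state, action)] += alpha * (td_target - q[(state, action)])
        state, action = next_state, next_action
    return total / n_trials, q
```

Calling `run_sarsa(TwoTargetSearchTask())` returns the fraction of rewarded trials and the learned Q-table, which can be compared against the 0.25 chance level of picking one of four spots at random.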