Emergence of belief-like representations through reinforcement learning

Jay A Hennig; Sandra A Romero Pinto; Takahiro Yamaguchi; Scott W Linderman; Naoshige Uchida; Samuel J Gershman

doi:10.1371/journal.pcbi.1011067

PLoS Comput Biol. 2023 Sep 11;19(9):e1011067. doi: 10.1371/journal.pcbi.1011067. eCollection 2023 Sep.

Authors

Jay A Hennig^{1,2}, Sandra A Romero Pinto^{2,3,4}, Takahiro Yamaguchi^{3,5}, Scott W Linderman^{6,7}, Naoshige Uchida^{2,3}, Samuel J Gershman^{1,2}

Affiliations

¹ Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America.
² Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America.
³ Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America.
⁴ Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America.
⁵ Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America.
⁶ Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America.
⁷ Department of Statistics, Stanford University, Stanford, California, United States of America.

Abstract

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs" (optimal Bayesian estimates of the hidden states in the task). Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
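For readers unfamiliar with the two quantities the abstract contrasts, the following is a minimal sketch, not the authors' implementation. It shows a textbook Bayesian belief update over discrete hidden states and the standard temporal-difference form of a reward prediction error. The two-state task, the function names, and all probabilities and the discount factor are hypothetical values chosen only for illustration.

```python
def belief_update(belief, transition, likelihood):
    """One step of Bayesian filtering over discrete hidden states.

    belief[i]        : probability of hidden state i before the new observation
    transition[i][j] : P(next state = j | current state = i)
    likelihood[j]    : P(new observation | next state = j)
    """
    n = len(belief)
    # Predict: propagate the current belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the observation likelihood, then normalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


def td_error(reward, value_now, value_next, gamma=0.9):
    """Reward prediction error: delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


# Hypothetical two-state task (illustrative numbers only):
# state 0 = "waiting", state 1 = "reward imminent".
belief = [0.5, 0.5]
transition = [[0.8, 0.2],
              [0.0, 1.0]]
likelihood = [0.1, 0.9]  # the observed cue is more likely in state 1

posterior = belief_update(belief, transition, likelihood)
delta = td_error(reward=1.0, value_now=0.5, value_next=0.0)
```

In a belief-based model, value would be computed as a function of `posterior`. The abstract's claim is that an RNN trained only to estimate value from observations can produce similar prediction errors without ever being asked to compute such a posterior.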

Copyright: © 2023 Hennig et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Bayes Theorem
Learning*
Neural Networks, Computer
Reinforcement, Psychology*
Reward

Grants and funding

U19 NS113201/NS/NINDS NIH HHS/United States