Meta-reinforcement learning via orbitofrontal cortex

Ryoma Hattori; Nathan G Hedrick; Anant Jain; Shuqi Chen; Hanjia You; Mariko Hattori; Jun-Hyeok Choi; Byung Kook Lim; Ryohei Yasuda; Takaki Komiyama

doi:10.1038/s41593-023-01485-3

Nat Neurosci. 2023 Dec;26(12):2182-2191. doi: 10.1038/s41593-023-01485-3. Epub 2023 Nov 13.

Authors

Ryoma Hattori¹ ² ³ ⁴ ⁵, Nathan G Hedrick⁶ ⁷ ⁸ ⁹, Anant Jain¹⁰, Shuqi Chen⁶ ⁷ ⁸ ⁹, Hanjia You⁶ ⁷ ⁸ ⁹, Mariko Hattori⁶ ⁷ ⁸ ⁹, Jun-Hyeok Choi⁶, Byung Kook Lim⁶, Ryohei Yasuda¹⁰, Takaki Komiyama¹¹ ¹² ¹³ ¹⁴

Affiliations

¹ Department of Neurobiology, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
² Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
³ Department of Neurosciences, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
⁴ Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
⁵ Department of Neuroscience, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, University of Florida, Jupiter, FL, USA. rhattori0204@gmail.com.
⁶ Department of Neurobiology, University of California San Diego, La Jolla, CA, USA.
⁷ Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA.
⁸ Department of Neurosciences, University of California San Diego, La Jolla, CA, USA.
⁹ Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
¹⁰ Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA.
¹¹ Department of Neurobiology, University of California San Diego, La Jolla, CA, USA. tkomiyama@ucsd.edu.
¹² Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA. tkomiyama@ucsd.edu.
¹³ Department of Neurosciences, University of California San Diego, La Jolla, CA, USA. tkomiyama@ucsd.edu.
¹⁴ Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA. tkomiyama@ucsd.edu.

Abstract

The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca²⁺/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making.
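
As an illustration of the within-session, trial-by-trial RL referred to in the abstract, the following is a minimal sketch of a tabular Q-learning agent on a two-choice probabilistic reversal task. It is not the deep RL model or the task design used in the paper: the function name `run_session`, the reward probabilities, block length, learning rate and softmax temperature are all hypothetical values chosen for illustration. In a meta-RL setting, a slower learning process acting across sessions would shape how this fast, trial-by-trial policy behaves; that outer loop is not modeled here.

```python
import math
import random


def run_session(n_trials=500, block_len=50, p_high=0.7, p_low=0.1,
                alpha=0.3, beta=5.0, seed=0):
    """Simulate one session; return the fraction of high-probability choices.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]      # action values for the two choices (0 and 1)
    high_side = 0       # which choice currently has the higher reward probability
    n_correct = 0

    for t in range(n_trials):
        # Reverse the reward contingencies at the start of each new block.
        if t > 0 and t % block_len == 0:
            high_side = 1 - high_side

        # Softmax over two actions reduces to a logistic of the value difference.
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_choose_1 else 0

        # Probabilistic reward depends on whether the high side was chosen.
        p_reward = p_high if choice == high_side else p_low
        reward = 1.0 if rng.random() < p_reward else 0.0

        # Trial-by-trial value update from the reward prediction error.
        q[choice] += alpha * (reward - q[choice])

        n_correct += int(choice == high_side)

    return n_correct / n_trials


if __name__ == "__main__":
    print(run_session())
```

Under these assumptions, changing `alpha` or `beta` changes how quickly the simulated agent follows a reversal, which is the kind of policy property that learning across sessions could adjust.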

MeSH terms

Animals
Mice
Prefrontal Cortex / physiology
Reinforcement, Psychology*
Reversal Learning / physiology
Reward*
