A reinforcement learning model with choice traces for a progressive ratio schedule

Front Behav Neurosci. 2024 Jan 10:17:1302842. doi: 10.3389/fnbeh.2023.1302842. eCollection 2023.

Abstract

The progressive ratio (PR) lever-press task serves as a benchmark for assessing goal-oriented motivation. However, a well-recognized limitation of the PR task is that only a single data point, known as the breakpoint, is obtained from an entire session as a barometer of motivation. Because the breakpoint is defined as the final ratio of responses achieved in a PR session, variations in choice behavior during the PR task cannot be captured. We addressed this limitation by constructing four reinforcement learning models: a simple Q-learning model, an asymmetric model with two learning rates, a perseverance model with choice traces, and a perseverance model without learning. Because we noticed that male mice performed frequent magazine nosepokes during PR tasks, these models incorporated three behavioral choices: reinforced lever presses, non-reinforced lever presses, and void magazine nosepokes. The best model was the perseverance model, which predicted a gradual reduction in the amplitudes of reward prediction errors (RPEs) upon void magazine nosepokes. We confirmed this prediction experimentally with fiber photometry of extracellular dopamine (DA) dynamics in the ventral striatum of male mice using a genetically encoded GPCR activation-based fluorescent DA sensor (GRABDA2m). We verified the applicability of the model by acute intraperitoneal injection of low-dose methamphetamine (METH) before a PR task, which increased the frequency of magazine nosepokes during the PR session without changing the breakpoint. The perseverance model captured this behavioral modulation as an increase in initial action values, which are customarily set to zero and disregarded in reinforcement learning analysis. Our findings suggest that the perseverance model can reveal the effects of psychoactive drugs on choice behaviors during PR tasks.
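The perseverance model described above (Q-learning over three actions, extended with choice traces) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter names and values (learning rates, inverse temperature, trace weight), the softmax choice rule, and the trace-update form are common conventions for choice-stickiness models and are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (values are assumptions, not fitted estimates)
ALPHA = 0.2     # learning rate for action values
ALPHA_C = 0.1   # learning rate for choice traces
BETA = 3.0      # inverse temperature of the softmax
PHI = 1.0       # weight of the choice trace (perseverance / stickiness)

N_ACTIONS = 3   # 0: reinforced press, 1: non-reinforced press, 2: void nosepoke
Q = np.zeros(N_ACTIONS)   # action values (initial values matter; see METH result)
C = np.zeros(N_ACTIONS)   # choice traces

def choose():
    """Softmax over action value plus weighted choice trace."""
    logits = BETA * Q + PHI * C
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p)

def update(action, reward):
    """Update value and traces; return the reward prediction error (RPE)."""
    rpe = reward - Q[action]
    Q[action] += ALPHA * rpe
    # Trace decays toward 1 for the chosen action, toward 0 for the others,
    # so recently repeated choices are favored independently of reward.
    chosen = np.eye(N_ACTIONS)[action]
    C[:] += ALPHA_C * (chosen - C)
    return rpe
```

With a positive initial value for the nosepoke action, repeated unrewarded nosepokes produce negative RPEs whose magnitude shrinks trial by trial, matching the model's prediction of gradually attenuating RPE amplitudes upon void magazine nosepokes; raising the initial action values (as suggested for METH) increases how often that action is sampled before its value decays.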

Keywords: choice stickiness; dopamine; fiber photometry; methamphetamine; mouse; operant conditioning; reward prediction error; ventral striatum.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by MEXT/JSPS KAKENHI (Grant numbers 21K18198, 21H00212, and 22H03033 to NT) and AMED (Grant number JP22dm0207069 to KT and SY).