Test-retest reliability of reinforcement learning parameters

Jessica V Schaaf; Laura Weidinger; Lucas Molleman; Wouter van den Bos

doi:10.3758/s13428-023-02203-4

Behav Res Methods. 2023 Sep 8. doi: 10.3758/s13428-023-02203-4. Online ahead of print.

Authors

Jessica V Schaaf^{1,2,3}, Laura Weidinger^{4,5}, Lucas Molleman^{6,5}, Wouter van den Bos^{6,5}

Affiliations

¹ Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands. jessica.schaaf@radboudumc.nl.
² Cognitive Neuroscience Department, Radboud University Medical Centre, Nijmegen, the Netherlands. jessica.schaaf@radboudumc.nl.
³ Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands. jessica.schaaf@radboudumc.nl.
⁴ DeepMind, London, United Kingdom.
⁵ Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany.
⁶ Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands.

PMID: 37684495
DOI: 10.3758/s13428-023-02203-4

Abstract

It has recently been suggested that parameter estimates of computational models can be used to understand individual differences at the process level. One area of research in which this approach, called computational phenotyping, has taken hold is computational psychiatry. One requirement for successful computational phenotyping is that behavior and parameters are stable over time. Surprisingly, the test-retest reliability of behavior and model parameters remains unknown for most experimental tasks and models. The present study seeks to close this gap by investigating the test-retest reliability of canonical reinforcement learning models in the context of two often-used learning paradigms: a two-armed bandit and a reversal learning task. We tested independent cohorts for the two tasks (N = 69 and N = 47) via an online testing platform with a between-test interval of five weeks. Whereas reliability was high for personality and cognitive measures (with ICCs ranging from .67 to .93), it was generally poor for the parameter estimates of the reinforcement learning models (with ICCs ranging from .02 to .52 for the bandit task and from .01 to .71 for the reversal learning task). Given that simulations indicated that our procedures could detect high test-retest reliability, this suggests that a significant proportion of the variability must be ascribed to the participants themselves. In support of that hypothesis, we show that mood (stress and happiness) can partly explain within-participant variability. Taken together, these results are critical for current practices in computational phenotyping and suggest that individual variability should be taken into account in the future development of the field.
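As an illustration of the reliability measure reported above, the following is a minimal sketch of an intraclass correlation coefficient computed for a test-retest design. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption here, and the function name and example values are hypothetical rather than taken from the study.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is a list of rows, one per participant, each holding one
    value per session (e.g. [estimate_at_test, estimate_at_retest]).
    The ICC form is an assumption; the abstract does not specify it.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-participant mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Hypothetical parameter estimates for three participants at test and retest.
# Every participant shifts by the same amount between sessions.
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_absolute_agreement(shifted), 3))  # 0.667
```

In this toy example the rank order of participants is perfectly preserved, yet the coefficient falls below 1 because an absolute-agreement ICC also penalizes a systematic shift between sessions. A consistency-type ICC would not, which is why the choice of form matters when comparing reported values.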

Keywords: Computational modeling; Computational phenotyping; Computational psychiatry; Reinforcement learning; Test–retest reliability.
