Computational noise in reward-guided learning drives behavioral variability in volatile environments

Nat Neurosci. 2019 Dec;22(12):2066-2077. doi: 10.1038/s41593-019-0518-9. Epub 2019 Oct 28.

Abstract

When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these 'non-greedy' decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here, using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood oxygen level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
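The idea of computational noise in value learning can be illustrated with a minimal sketch: a delta-rule (Rescorla-Wagner) learner whose update steps are corrupted by Gaussian noise whose standard deviation scales with the magnitude of the prediction error, paired with a softmax choice rule. This is an illustrative assumption-laden toy model, not the authors' exact formulation; the function name, parameter names, and values are hypothetical.

```python
import numpy as np

def noisy_q_learning(rewards, alpha=0.3, zeta=0.5, beta=5.0, seed=0):
    """Simulate value learning on a two-armed bandit where each update
    is corrupted by 'Weber-like' noise: the standard deviation of the
    learning noise scales with the size of the prediction error.

    rewards: array of shape (n_trials, 2) giving the reward each arm
             would pay on each trial.
    alpha:   learning rate; zeta: noise scaling; beta: softmax inverse
             temperature. (Names are illustrative, not the paper's
             exact notation.)
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                  # learned action values
    choices = np.empty(len(rewards), dtype=int)
    for t, r in enumerate(rewards):
        # softmax choice between the two current action values
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p1)
        choices[t] = c
        delta = r[c] - q[c]          # reward prediction error
        # learning noise scaled by prediction-error magnitude
        noise = rng.normal(0.0, zeta * abs(delta))
        q[c] += alpha * delta + noise  # noisy delta-rule update
    return choices, q
```

With zeta set to 0 this reduces to standard noiseless Q-learning; with zeta > 0 the learner occasionally makes 'non-greedy' choices even under a near-deterministic (high beta) choice rule, because the values themselves fluctuate from trial to trial.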

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Choice Behavior / physiology
  • Decision Making / physiology*
  • Female
  • Frontal Lobe / physiology
  • Humans
  • Learning / physiology*
  • Magnetic Resonance Imaging
  • Male
  • Models, Neurological
  • Neuroimaging
  • Pupil / physiology
  • Reinforcement, Psychology
  • Reward*
  • Young Adult