Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making

PLoS Comput Biol. 2021 Jun 3;17(6):e1009070. doi: 10.1371/journal.pcbi.1009070. eCollection 2021 Jun.

Abstract

Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
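The two mechanisms named in the abstract can be illustrated in code. The sketch below is a minimal, hypothetical Python rendering of the ideas only, not the authors' actual model: a novelty bonus (inverse visit count) stands in for reward to drive exploration before any external reward is found, and a surprise signal (negative log-probability of the observed transition under the learned world-model) scales up the learning rate of both the world-model and the model-free action-values. All names and constants (novelty_weight, surprise_gain, alpha0) are assumptions for illustration.

    import numpy as np

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))            # model-free action-values
    T = np.ones((n_states, n_actions, n_states))   # world-model: transition counts
    visits = np.zeros(n_states)                    # state visit counts for novelty

    alpha0, gamma = 0.1, 0.9                       # base learning rate, discount factor
    novelty_weight, surprise_gain = 1.0, 5.0       # hypothetical scaling constants

    def novelty(s):
        # Novelty decays with the number of visits to a state.
        return 1.0 / (1.0 + visits[s])

    def step_update(s, a, r, s_next):
        visits[s_next] += 1

        # Surprise: a transition judged unlikely by the world-model yields
        # a large negative log-probability.
        p_pred = T[s, a, s_next] / T[s, a].sum()
        surprise = -np.log(p_pred)

        # Surprise raises the effective learning rate used for both the
        # world-model and the model-free value update.
        alpha = min(1.0, alpha0 * (1.0 + surprise_gain * surprise))

        # World-model update: count-based increment, amplified by surprise.
        T[s, a, s_next] += 1.0 + surprise_gain * surprise

        # Model-free TD update with a novelty bonus that substitutes for
        # reward before any external reward has been encountered.
        target = r + novelty_weight * novelty(s_next) + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])

In this toy version, action selection would follow the Q-values (e.g., softmax), matching the abstract's finding that choices are dominated by model-free values, while the world-model T is used mainly to compute the surprise signal rather than for planning.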

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adaptation, Psychological*
  • Algorithms
  • Choice Behavior / physiology
  • Computational Biology
  • Decision Making / physiology
  • Electroencephalography / statistics & numerical data
  • Exploratory Behavior* / physiology
  • Humans
  • Learning / physiology
  • Models, Neurological
  • Models, Psychological*
  • Reinforcement, Psychology*
  • Reward

Grants and funding

This research was supported by Swiss National Science Foundation No. CRSII2 147636 (Sinergia, MHH and WG) and No. 200020 184615 (WG), and by the European Union Horizon 2020 Framework Program under grant agreement No. 785907 (Human Brain Project, SGA2, MHH and WG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.