Actor-critic reinforcement learning in the songbird

Curr Opin Neurobiol. 2020 Dec:65:1-9. doi: 10.1016/j.conb.2020.08.005. Epub 2020 Sep 6.

Abstract

It feels rewarding to ace your opponent on match point. Here, we propose that common mechanisms underlie reward learning and performance learning. First, when a singing bird unexpectedly hits the right note, its dopamine (DA) neurons are activated, much as they are when a thirsty monkey receives an unexpected juice reward. Second, these DA signals reinforce vocal variations much as they reinforce stimulus-response associations. Third, limbic inputs to DA neurons signal the predicted quality of song syllables, much as they signal the predicted reward value of a place or a stimulus during foraging. Finally, songbirds may solve difficult problems in reinforcement learning - such as credit assignment and catastrophic forgetting - with node perturbation and consolidation of reinforced vocal patterns in motor cortical circuits. Consolidation occurs downstream of a canonical 'actor-critic' circuit motif that learns to maximize performance quality in essentially the same way it learns to maximize reward: by computing and learning from prediction errors.
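
As a rough illustration of the actor-critic idea described above, the following Python sketch trains a single "syllable" toward a target pitch. It is not the authors' model. The function name train_syllable, the scalar pitch, the quadratic quality score, the Gaussian perturbation, and all learning rates are assumptions chosen only to make the loop concrete. The actor adds random variability to its output (a one-parameter stand-in for node perturbation), the critic keeps a running prediction of performance quality, and both learn from the same prediction error.

```python
import random


def train_syllable(target_pitch=1.0, trials=2000, noise_sd=0.2,
                   actor_lr=0.5, critic_lr=0.1, seed=0):
    """Toy actor-critic loop with node perturbation (all values hypothetical)."""
    rng = random.Random(seed)
    actor_mean = 0.0         # pitch the actor currently aims for
    predicted_quality = 0.0  # critic's running estimate of syllable quality

    for _ in range(trials):
        # Node perturbation: inject random variability into the motor output.
        perturbation = rng.gauss(0.0, noise_sd)
        pitch = actor_mean + perturbation

        # Performance evaluation: closer to the target is better (maximum 0).
        quality = -(pitch - target_pitch) ** 2

        # Prediction error: actual quality minus the critic's prediction.
        prediction_error = quality - predicted_quality

        # Actor: shift toward perturbations that sounded better than predicted.
        actor_mean += actor_lr * prediction_error * perturbation

        # Critic: move the prediction toward the observed quality.
        predicted_quality += critic_lr * prediction_error

    return actor_mean, predicted_quality


if __name__ == "__main__":
    mean, predicted = train_syllable()
    print(f"learned pitch: {mean:.2f}, predicted quality: {predicted:.3f}")
```

In this sketch the prediction error plays the role the abstract assigns to the DA signal: it is positive when a variation turns out better than the critic expected, and the actor reinforces only those variations. Subtracting the critic's prediction does not change the average direction of learning, but it reduces the trial-to-trial noise in the actor's updates.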

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Dopaminergic Neurons
  • Motor Cortex*
  • Reinforcement, Psychology
  • Reward
  • Songbirds*
  • Vocalization, Animal