The influence of trial order on learning from reward vs. punishment in a probabilistic categorization task: experimental and computational analyses

Ahmed A Moustafa; Mark A Gluck; Mohammad M Herzallah; Catherine E Myers

doi:10.3389/fnbeh.2015.00153

Front Behav Neurosci. 2015 Jul 24:9:153. doi: 10.3389/fnbeh.2015.00153. eCollection 2015.

Authors

Ahmed A Moustafa¹, Mark A Gluck², Mohammad M Herzallah³, Catherine E Myers⁴

Affiliations

¹ School of Social Sciences and Psychology and MARCS Institute for Brain and Behaviour, University of Western Sydney, Sydney, NSW, Australia; Department of Veterans Affairs, New Jersey Health Care System, East Orange, NJ, USA.
² Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ, USA.
³ Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ, USA; Al-Quds Cognitive Neuroscience Lab, Palestinian Neuroscience Initiative, Faculty of Medicine, Al-Quds University, Jerusalem, Palestine.
⁴ Department of Veterans Affairs, New Jersey Health Care System, East Orange, NJ, USA; Department of Pharmacology, Physiology and Neuroscience, Rutgers-New Jersey Medical School, Newark, NJ, USA; Department of Psychology, Rutgers University-Newark, Newark, NJ, USA.

Abstract

Previous research has shown that trial ordering affects cognitive performance, but this has not been tested using category-learning tasks that differentiate learning from reward and punishment. Here, we tested two groups of healthy young adults using a probabilistic category learning task of reward and punishment in which there are two types of trials (reward, punishment) and three possible outcomes: (1) positive feedback for correct responses in reward trials; (2) negative feedback for incorrect responses in punishment trials; and (3) no feedback for incorrect answers in reward trials and correct answers in punishment trials. Hence, trials without feedback are ambiguous, and may represent either successful avoidance of punishment or failure to obtain reward. In Experiment 1, the first group of subjects received an intermixed task in which reward and punishment trials were presented in the same block, as a standard baseline task. In Experiment 2, a second group completed the separated task, in which reward and punishment trials were presented in separate blocks. Additionally, in order to understand the mechanisms underlying performance in the experimental conditions, we fit individual data using a Q-learning model. Results from Experiment 1 show that subjects who completed the intermixed task paradoxically valued the no-feedback outcome as a reinforcer when it occurred on reinforcement-based trials, and as a punisher when it occurred on punishment-based trials. This is supported by patterns of empirical responding, where subjects showed more win-stay behavior following an explicit reward than following an omission of punishment, and more lose-shift behavior following an explicit punisher than following an omission of reward. In Experiment 2, results showed similar performance whether subjects received reward-based or punishment-based trials first. 
However, when the Q-learning model was applied to these data, subjects in the reward-first and punishment-first conditions differed in the relative weighting of neutral feedback. Specifically, early training on reward-based trials led to omission of reward being treated as similar to punishment, but prior training on punishment-based trials led to omission of reward being treated more neutrally. This suggests that early training on one type of trial, specifically reward-based trials, can create a bias in how neutral feedback is processed, relative to early punishment-based training or training that mixes positive and negative outcomes.
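
As an illustration of the kind of model described above, the following is a minimal sketch of a Q-learning agent for a task with reward trials, punishment trials, and an ambiguous no-feedback outcome. It is not the authors' implementation. The single learning rate `alpha`, the softmax inverse temperature `beta`, the free parameter `r0` (the subjective value assigned to no feedback), the coding of explicit feedback as +1 and -1, and all function names are assumptions made for this example; the abstract does not specify the model's parameterization.

```python
import math
import random


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing category A given the two action values."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def subjective_reward(trial_type, correct, r0):
    """Map a trial outcome to a scalar reinforcement signal.

    Reward trial:     correct -> +1, incorrect -> no feedback (valued at r0).
    Punishment trial: incorrect -> -1, correct -> no feedback (valued at r0).
    r0 is an assumed free parameter: r0 < 0 treats no feedback like a
    punisher, r0 > 0 like a reinforcer, and r0 = 0 as neutral.
    """
    if trial_type == "reward":
        return 1.0 if correct else r0
    if trial_type == "punishment":
        return r0 if correct else -1.0
    raise ValueError(f"unknown trial type: {trial_type!r}")


def q_update(q, reward, alpha):
    """Standard delta-rule update: move q toward the received reinforcement."""
    return q + alpha * (reward - q)


def simulate(trials, alpha=0.2, beta=3.0, r0=0.0, seed=0):
    """Run the agent over (stimulus, trial_type, correct_action) tuples.

    Returns the learned Q-table and the proportion of correct choices.
    """
    rng = random.Random(seed)
    q = {}  # (stimulus, action) -> value; unseen pairs default to 0.0
    n_correct = 0
    for stimulus, trial_type, correct_action in trials:
        q_a = q.get((stimulus, "A"), 0.0)
        q_b = q.get((stimulus, "B"), 0.0)
        p_a = softmax_choice_prob(q_a, q_b, beta)
        action = "A" if rng.random() < p_a else "B"
        correct = action == correct_action
        n_correct += int(correct)
        r = subjective_reward(trial_type, correct, r0)
        old = q.get((stimulus, action), 0.0)
        q[(stimulus, action)] = q_update(old, r, alpha)
    return q, n_correct / len(trials)
```

Under these assumptions, fitting `r0` separately for each subject or condition is one way a model could express whether omission of reward is weighted like a punishment (negative `r0`) or treated more neutrally (`r0` near zero).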

Keywords: Q-learning computational model; category learning; intermixed trials; punishment; reward.

Grants and funding

R01 AA018737/AA/NIAAA NIH HHS/United States