Dynamic computation of hierarchical prediction errors during sequence learning
Cognitive, Systems and Behavioural Neurobiology
Final Report Abstract
We investigated whether and to what extent the computational and neural mechanisms of reinforcement learning are also employed in statistical sequence learning. We developed an n-back Markov decision task and combined it with model-based fMRI analysis. Human subjects inside the fMRI scanner were instructed to predict the occurrence of a visual reward on either the left or the right side of the screen with the goal of maximizing their rewards. Rewards appeared equally often on the left and right side of the screen (zero-order reward probabilities = 0.5), while the temporal dependencies of reward were controlled by a set of conditional probabilities. We investigated two sets of conditions: (a) first-order conditions with a 1-back conditional reward probability of 0.2 or 0.8 and (b) second-order conditions with a 2-back conditional reward probability of 0.2 or 0.8. Crucially, the average rewards for the left and right options were equal, but the reward received was conditioned on a specific temporal order of events, which allowed us to disentangle the different orders of learning processes. We found that subjects were able to exploit the n-back conditional probabilities to maximize reward. We modeled the choice behavior with reinforcement learning models of different orders. We found a co-existence of 1-back and 2-back Q-values in the ventromedial prefrontal cortex (vmPFC), suggesting that this region dynamically represented the RL model of the matching order. The brain might estimate higher-order sequential dependencies in the same manner as it estimates average rewards. We also analyzed the experimental data with other computational strategies, i.e., Bayesian sequence learning, win-stay-lose-switch heuristics, and meta reinforcement learning. The Bayesian sequence learning analysis also confirmed the neural correlates of learning from different orders of reward probabilities in the vmPFC. The win-stay-lose-switch heuristic did not outperform a random baseline in explaining subjects' choice behavior. The meta reinforcement learning model showed task representations resembling optimal behavior for each matching task order. In summary, our study highlights the dynamic recruitment of an n-back reinforcement learning mechanism for guiding decisions and choice behavior.
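To make the modeling approach concrete, the following is a minimal sketch (not the authors' code) of an n-back Q-learning model of the kind described above: the "state" is taken to be the last n reward locations, and the Q-values for choosing left or right in that state are updated with a reward prediction error. The class name `NBackQLearner` and the parameter values for the learning rate `alpha` and softmax inverse temperature `beta` are illustrative assumptions, not values from the study.

```python
import numpy as np
from collections import defaultdict

class NBackQLearner:
    """Illustrative n-back Q-learning agent (0 = left, 1 = right)."""

    def __init__(self, n_back, alpha=0.1, beta=3.0):
        self.n_back = n_back          # order of the temporal dependency (e.g., 1 or 2)
        self.alpha = alpha            # learning rate (assumed value)
        self.beta = beta              # softmax inverse temperature (assumed value)
        # Q[state][action]; the state is the tuple of the last n reward sides
        self.Q = defaultdict(lambda: np.zeros(2))
        self.history = ()             # most recent n reward sides

    def choose(self, rng):
        """Softmax choice between left and right given the current n-back state."""
        q = self.Q[self.history]
        p = np.exp(self.beta * q) / np.sum(np.exp(self.beta * q))
        return rng.choice(2, p=p)

    def update(self, action, reward, outcome_side):
        """Prediction-error update of the chosen action's value, then shift the history."""
        delta = reward - self.Q[self.history][action]   # reward prediction error
        self.Q[self.history][action] += self.alpha * delta
        self.history = (self.history + (outcome_side,))[-self.n_back:]

# Usage example with a toy random sequence of reward sides (for illustration only).
rng = np.random.default_rng(0)
agent = NBackQLearner(n_back=2)
for outcome_side in rng.integers(0, 2, size=100):
    a = agent.choose(rng)
    r = 1.0 if a == outcome_side else 0.0
    agent.update(a, r, outcome_side)
```

In this sketch, a 1-back and a 2-back learner differ only in how much outcome history defines the state, which is one way to implement the different model orders that were compared against subjects' choices.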
Publications
- Guo, Rong, Felix Blankenburg, Jan Gläscher and Klaus Obermayer. Dynamic computation of hierarchical prediction errors during sequence learning. 14th Annual Meeting of Neuroeconomics. Berlin, Germany
- Guo, Rong, Jan Gläscher, Felix Blankenburg and Klaus Obermayer (2019). Dynamic n-back reinforcement learning representations in ventromedial prefrontal cortex. COSYNE2019 Abstract. Lisbon, Portugal