Learning from positive and negative rewards in a spiking neural network model of basal ganglia

3 Author(s)
Jitsev, J. (Dept. of Cortical Networks & Cognitive Functions, Max-Planck-Inst. for Neurological Res., Cologne, Germany); Morrison, A.; Tittgemeyer, M.

Despite the vast amount of experimental findings on the role of the basal ganglia in reinforcement learning, there is still a general lack of network models that use spiking neurons and plausible plasticity mechanisms to demonstrate network-level reward-based learning. In this work we extend a recent spiking actor-critic network model of the basal ganglia, aiming to create a minimal realistic model of learning from both positive and negative rewards. We hypothesize, and implement in the model, a segregation of not only the dorsal striatum but also the ventral striatum into populations of medium spiny neurons (MSNs) that carry either the D1 or the D2 dopamine (DA) receptor type. This segregation allows explicit representation of both positive and negative expected reward within the respective population. In line with recent experiments, we further assume that D1 and D2 MSN populations have distinct, opposing DA-modulated bidirectional synaptic plasticity. We implement the spiking network model in the simulator NEST and conduct experiments with delayed rewards in a grid-world setting, where a moving agent has to reach a goal state while maximizing the total obtained reward. We demonstrate that, unlike the original model, the network can learn not only to approach positive rewards but also to consistently avoid punishments. The spiking network model thus highlights the functional role of D1-D2 MSN segregation within the striatum and explains the necessity for the reversed direction of DA-dependent plasticity found at synapses converging on different types of striatal MSNs.
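The idea of representing positive and negative expected reward in separate populations with opposing DA-dependent plasticity can be illustrated with a minimal, non-spiking sketch. It is not the authors' NEST model or their plasticity rule: it assumes a tabular TD(0) critic for a single state, uses the TD error as a stand-in for the phasic dopamine signal, and the names `update_split_value`, `train`, `v_d1`, and `v_d2` are hypothetical.

```python
def update_split_value(v_d1, v_d2, reward, next_value, alpha=0.1, gamma=0.9):
    """One TD(0)-style update of a state value split into two non-negative parts.

    Assumption (not from the paper): v_d1 stands in for the D1 population's
    estimate of expected positive reward, v_d2 for the D2 population's
    estimate of expected negative reward, and value = v_d1 - v_d2.
    """
    value = v_d1 - v_d2
    # TD error, used here as a stand-in for the phasic dopamine signal.
    delta = reward + gamma * next_value - value
    # Opposing plasticity: a positive delta strengthens D1 and weakens D2,
    # a negative delta does the reverse. Both parts are clipped at zero.
    v_d1 = max(0.0, v_d1 + alpha * delta)
    v_d2 = max(0.0, v_d2 - alpha * delta)
    return v_d1, v_d2, delta


def train(reward, steps=200, v_d1=0.0, v_d2=0.0):
    """Repeatedly apply the same reward in a state that leads to a terminal state."""
    for _ in range(steps):
        v_d1, v_d2, _delta = update_split_value(v_d1, v_d2, reward, next_value=0.0)
    return v_d1, v_d2


if __name__ == "__main__":
    print("rewarded state:", train(reward=1.0))
    print("punished state:", train(reward=-1.0))
```

In this sketch a repeatedly rewarded state ends up encoded almost entirely in `v_d1` and a repeatedly punished state in `v_d2`, which is the kind of explicit, separate representation of negative expected reward that a single non-negative value estimate cannot provide.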

Published in:

Neural Networks (IJCNN), The 2012 International Joint Conference on

Date of Conference:

10-15 June 2012