Skip to Main Content
Despite the vast amount of experimental findings on the role of the basal ganglia in reinforcement learning, there is still general lack of network models that use spiking neurons and plausible plasticity mechanisms to demonstrate network-level reward-based learning. In this work we extend a recent spiking actor-critic network model of the basal ganglia, aiming to create a minimal realistic model of learning from both positive and negative rewards. We hypothesize and implement in the model segregation of not only the dorsal striatum, but also of the ventral striatum into populations of medium spiny neurons (MSNs) that carry either D1 or D2 dopamine (DA) receptor type. This segregation allows explicit representation of both positive and negative expected reward within respective population. In line with recent experiments, we further assume that D1 and D2 MSN populations have distinct, opposing DA-modulated bidirectional synaptic plasticity. We implement the spiking network model in the simulator NEST and conduct experiments involving application of delayed rewards in a grid world setting, where a moving agent has to reach a goal state while maximizing the total obtained reward. We demonstrate that the network can learn not only to approach the positive rewards, but also to consequently avoid punishments as opposed to the original model. The spiking network model highlights thus functional role of D1-D2 MSN segregation within striatum and explains necessity for reversed direction of DA-dependent plasticity found at synapses converging on different types of striatal MSNs.