We describe the autonomous development of binocular vergence control in an active robotic vision system through attention-gated reinforcement learning (AGREL). The control policy is implemented by a neural network that maps the outputs of a population of disparity energy neurons to a set of vergence commands. The network learns to maximize a reward signal based on an internal representation of the visual input: the total activation of the population of disparity energy neurons. This system extends previous work using Q-learning by increasing the complexity of the policy in two ways. First, the input state space is continuous rather than discrete, and is based on a larger diversity of neurons. Second, we increase the number of possible actions. We evaluate the network's learning and performance on natural images and with real objects in a cluttered environment. The policies learned by the network outperform those learned by Q-learning in two ways: the mean squared errors are smaller, and the closed-loop frequency response has larger bandwidth.
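To make the learning scheme concrete, the following is a minimal, hedged sketch of an attention-gated, REINFORCE-style update of the kind the abstract describes: a single-layer softmax policy maps a population activity vector to discrete vergence commands, and the reward-prediction error scales an update concentrated on the action that was actually selected. All sizes, the learning rate, and the toy reward (a bandit stand-in for total population activation) are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NEURONS = 40   # size of the disparity-energy population (hypothetical)
N_ACTIONS = 7    # number of discrete vergence commands (hypothetical)
ETA = 0.1        # learning rate (hypothetical)

# One-layer policy mapping population activity to action preferences.
W = rng.normal(scale=0.01, size=(N_ACTIONS, N_NEURONS))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def select_action(state):
    """Sample a vergence command from the softmax policy."""
    p = softmax(W @ state)
    return rng.choice(N_ACTIONS, p=p), p

def agrel_update(state, action, probs, reward, baseline):
    """Attention-gated, REINFORCE-style update: the reward-prediction
    error (reward - baseline) scales a gradient concentrated on the
    action actually selected."""
    global W
    delta = reward - baseline
    grad = -probs[:, None] * state[None, :]  # push all rows down...
    grad[action] += state                    # ...but boost the chosen action
    W += ETA * delta * grad

# Toy episode loop: in the real system the reward is the total activation
# of the disparity-energy population, which peaks when the eyes verge on
# the stimulus; here we fake it with a bandit whose best arm is `target`.
target = 3
baseline = 0.0
for _ in range(2000):
    state = rng.random(N_NEURONS)           # stand-in for population response
    action, probs = select_action(state)
    reward = 1.0 if action == target else 0.0
    agrel_update(state, action, probs, reward, baseline)
    baseline += 0.05 * (reward - baseline)  # running-average reward as critic
```

In this sketch the gating means that credit assignment touches only the weights feeding the selected command, which is the core difference from a tabular Q-learning policy over a discretized state space.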