This paper describes a system that autonomously learns to perform saccadic gaze control on a stereo pan-tilt unit. Instead of learning a direct map from image positions to a centering action, the system first learns a forward model that predicts how image features move in the visual field as the gaze is shifted. Gaze control can then be performed by searching for the action that best centers a feature in both the left and the right image. Formulating the problem this way lets us collect many training examples from each action, so learning converges much faster. Training uses scale-invariant feature transform (SIFT) features that are detected and matched before and after each saccade, so no special environment is needed during the training stage. We demonstrate that our system stabilises after only 300 saccades, more than 100 times fewer than the best current approaches.
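The two-stage idea (learn a forward model of feature motion, then search for the action that centres a feature in both images) can be illustrated with a minimal sketch. The abstract does not specify the model form, so everything below is an assumption: each camera's feature displacement is taken to be linear in the pan/tilt action, d = M a with M a 2x2 matrix; the image centre is the origin; and candidate saccades lie on a regular grid. SIFT detection and matching are replaced by simulated, noisy displacements, and all names (`fit_forward_model`, `predict`, `best_saccade`, `simulate`) are hypothetical.

```python
import itertools
import random

# Hypothetical model (not the paper's): displacement d = M @ a for action a = (pan, tilt).

def fit_forward_model(samples):
    """Least-squares fit of M in d = M a from (action, displacement) pairs.

    Solves the normal equations M = (sum d a^T)(sum a a^T)^{-1} for 2x2 M.
    """
    s_aa = [[0.0, 0.0], [0.0, 0.0]]
    s_da = [[0.0, 0.0], [0.0, 0.0]]
    for a, d in samples:
        for i in range(2):
            for j in range(2):
                s_aa[i][j] += a[i] * a[j]
                s_da[i][j] += d[i] * a[j]
    det = s_aa[0][0] * s_aa[1][1] - s_aa[0][1] * s_aa[1][0]
    inv = [[s_aa[1][1] / det, -s_aa[0][1] / det],
           [-s_aa[1][0] / det, s_aa[0][0] / det]]
    return [[sum(s_da[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def predict(M, p, a):
    """Forward model: predicted image position of feature p after action a."""
    return (p[0] + M[0][0] * a[0] + M[0][1] * a[1],
            p[1] + M[1][0] * a[0] + M[1][1] * a[1])

def best_saccade(m_left, m_right, p_left, p_right, grid):
    """Search the grid for the action that best centres the feature in both images."""
    def cost(a):
        ql = predict(m_left, p_left, a)
        qr = predict(m_right, p_right, a)
        return ql[0] ** 2 + ql[1] ** 2 + qr[0] ** 2 + qr[1] ** 2
    return min(grid, key=cost)

def simulate(M, n_saccades=30, feats=20, noise=0.5):
    """Simulated training data: every matched feature in a saccade is one example."""
    samples = []
    for _ in range(n_saccades):
        a = (random.uniform(-1, 1), random.uniform(-1, 1))
        for _ in range(feats):
            d = predict(M, (0.0, 0.0), a)
            samples.append((a, (d[0] + random.gauss(0, noise),
                                d[1] + random.gauss(0, noise))))
    return samples

random.seed(0)
TRUE_LEFT = [[-20.0, 0.5], [0.3, -18.0]]    # made-up camera responses
TRUE_RIGHT = [[-19.0, -0.4], [0.2, -18.5]]
M_l = fit_forward_model(simulate(TRUE_LEFT))
M_r = fit_forward_model(simulate(TRUE_RIGHT))
GRID = [(i / 20, j / 20) for i, j in itertools.product(range(-20, 21), repeat=2)]
print(best_saccade(M_l, M_r, (10.0, -5.0), (12.0, -4.0), GRID))
```

In this toy setting each of the 30 simulated saccades contributes 20 feature matches, which mirrors the abstract's point that one action yields many training examples; the grid search stands in for whatever search the actual system performs.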