In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that can overcome this limitation and results in more robust learning of invariant representations.
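As an illustration of the standard temporal-association approach mentioned above, the following is a minimal sketch of a trace-style Hebbian learning rule, in which each output unit keeps a decaying average of its recent activity so that inputs seen close together in time become bound to the same unit. This is an assumption about what the standard method looks like in its simplest form; it is not the modification proposed in this paper. The function name `trace_rule_train`, the winner-take-all competition, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np


def trace_rule_train(sequences, n_outputs=4, lr=0.05, decay=0.8, seed=0):
    """Train weights with a trace-style Hebbian rule (illustrative sketch).

    sequences: list of arrays of shape (T, n_inputs); each array is one
    image sequence showing a single object undergoing a transformation.
    Returns a weight matrix of shape (n_outputs, n_inputs).
    """
    rng = np.random.default_rng(seed)
    n_inputs = sequences[0].shape[1]
    # Hypothetical initialisation: uniform random weights in [0, 1).
    w = rng.random((n_outputs, n_inputs))

    for seq in sequences:
        # Reset the activity trace at the start of each sequence, which
        # assumes every sequence contains exactly one isolated object.
        trace = np.zeros(n_outputs)
        for x in seq:
            # Winner-take-all competition (an assumption of this sketch).
            y = np.zeros(n_outputs)
            y[np.argmax(w @ x)] = 1.0
            # Decaying average of recent output activity.
            trace = decay * trace + (1.0 - decay) * y
            # Move each unit's weights toward the current input in
            # proportion to its trace, not its instantaneous response.
            w += lr * trace[:, None] * (x[None, :] - w)
    return w


# Toy usage: two "objects", each appearing at four different positions.
sequences = [np.eye(8)[:4], np.eye(8)[4:]]
weights = trace_rule_train(sequences)
print(weights.shape)
```

Because the trace is reset per sequence and carries activity forward across frames, the sketch relies on each sequence showing one object in isolation, which is the requirement the abstract identifies as the limitation of this approach.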