Locating and identifying complex objects in a visual scene is a common problem in computer vision and image analysis. One way to reduce the amount of image data that must be classified is to base the classification on smaller features of the image, which are then combined into a more complex structure to identify the complete object. For example, locating two eyes, a nose and a mouth can enable us to identify a face without paying attention to the hair, chin or cheeks. In this paper, we present a system and training technique for learning to recognise an object from its component features. Our system incorporates an attention-based mechanism to predict the location of features. We demonstrate the effectiveness of our system with an experiment in face detection; the attention mechanism is shown to improve the overall classification speed and accuracy of feature location.