Progress in the automatic detection and identification of humans in video, given a minimal number of labelled faces as training data, is described. This is an extremely challenging problem owing to the many sources of variation in a person's imaged appearance: pose, scale, facial expression, illumination, partial occlusion, motion blur, and so on. The developed method combines approaches from computer vision, for detection and pose estimation, with approaches from machine learning, for classification. A 'generative' model of a person's head is defined, consisting of a coarse 3D model and multiple texture maps. This allows faces to be rendered with a variety of facial expressions and at poses differing from those in the training data. It is shown that the identity of a target face can then be determined by first proposing faces with a similar pose and then classifying the target face as one of the proposed faces or as none of them. Furthermore, the texture maps of the model can be updated automatically as new poses and expressions are detected. Results are demonstrated for detecting three characters in a TV situation comedy.
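The two-stage identification idea (propose pose-compatible faces, then accept one of them or reject all) can be illustrated with a minimal sketch. It is not the authors' method: the `RenderedFace` type, the reduction of pose to a single yaw angle, the Euclidean distance on a hypothetical feature vector, and the thresholds `max_pose_diff` and `reject_dist` are all illustrative assumptions. The gallery stands in for faces rendered from each person's head model at various poses, and a fixed distance threshold stands in for the learned classifier.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RenderedFace:
    """Hypothetical stand-in for a face rendered from one person's head model."""
    identity: str                 # which person's model produced this rendering
    yaw: float                    # pose, simplified here to yaw in degrees
    features: Tuple[float, ...]   # hypothetical appearance descriptor


def propose(gallery: Sequence[RenderedFace], target_yaw: float,
            max_pose_diff: float = 15.0) -> List[RenderedFace]:
    """Stage 1: keep only renderings whose pose is close to the target's pose."""
    return [f for f in gallery if abs(f.yaw - target_yaw) <= max_pose_diff]


def identify(gallery: Sequence[RenderedFace], target_yaw: float,
             target_features: Sequence[float],
             reject_dist: float = 1.0) -> Optional[str]:
    """Stage 2: match the target to the nearest proposal, or reject it.

    Returns the identity of the closest pose-compatible rendering, or None
    when no proposal is within reject_dist (the 'none of them' outcome).
    """
    proposals = propose(gallery, target_yaw)
    if not proposals:
        return None
    best = min(proposals, key=lambda f: math.dist(f.features, target_features))
    if math.dist(best.features, target_features) <= reject_dist:
        return best.identity
    return None


# Toy gallery: two renderings of person "A" at different poses, one of "B".
gallery = [
    RenderedFace("A", 0.0, (0.0, 0.0)),
    RenderedFace("A", 30.0, (1.0, 1.0)),
    RenderedFace("B", 0.0, (3.0, 0.0)),
]

print(identify(gallery, 5.0, (0.2, 0.1)))  # A
print(identify(gallery, 5.0, (1.5, 2.0)))  # None: too far from every proposal
```

Restricting the comparison to pose-compatible proposals first means the final decision only has to separate identities at roughly the same pose, which is the motivation for rendering the model at poses absent from the training data.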