Keypoint matching is a standard tool for solving the correspondence problem in vision applications. In 3-D face tracking, however, this approach often falls short: the complexity of the human face, combined with the wide viewpoint, nonrigid expression, and lighting variations found in typical applications, produces appearance changes that existing keypoint detectors and descriptors cannot handle. In this paper, we propose a new approach that tailors keypoint matching to tracking the 3-D pose of the user's head in a video stream. The core idea is to learn keypoints that are explicitly invariant to these challenging transformations. First, we select keypoints that are stable under randomly drawn small viewpoint changes, nonrigid deformations, and illumination changes. Then, we learn keypoint descriptors incrementally across large viewing angles to obtain discriminative descriptors. At matching time, to reduce the ratio of outlier correspondences, we use second-order color information to prune keypoints that are unlikely to lie on the face. Moreover, we integrate optical flow correspondences adaptively to remove motion jitter efficiently. Extensive experiments show that the proposed approach yields fast, robust, and accurate 3-D head tracking even in very challenging scenarios.
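The keypoint selection step can be illustrated with a minimal sketch. It is not the implementation used in this work; it only assumes that stability is measured as the fraction of randomly perturbed views in which a candidate keypoint is detected again. The function names `stability_scores` and `select_stable`, the pixel tolerance `tol`, and the assumption that detections have already been mapped back to the reference frame are all hypothetical choices made for illustration.

```python
import numpy as np


def stability_scores(ref_points, detections_per_trial, tol=2.0):
    """Return, for each reference keypoint, the fraction of trials in which
    some detection lies within `tol` pixels of it.

    ref_points: (N, 2) array of candidate keypoints in the reference frame.
    detections_per_trial: list of (M_i, 2) arrays, one per randomly perturbed
        view, assumed to be already mapped back to the reference frame.
    """
    ref = np.asarray(ref_points, dtype=float).reshape(-1, 2)
    hits = np.zeros(len(ref))
    for det in detections_per_trial:
        det = np.asarray(det, dtype=float).reshape(-1, 2)
        if len(det) == 0:
            continue  # nothing detected in this trial: no keypoint gets a hit
        # Pairwise distances between reference keypoints and detections.
        dist = np.linalg.norm(ref[:, None, :] - det[None, :, :], axis=2)
        hits += dist.min(axis=1) <= tol
    return hits / max(len(detections_per_trial), 1)


def select_stable(ref_points, detections_per_trial, k, tol=2.0):
    """Return indices and scores of the k most stable reference keypoints."""
    scores = stability_scores(ref_points, detections_per_trial, tol)
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


# Toy usage: three candidates, three simulated perturbed views.
candidates = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
trials = [
    [[0.0, 1.0], [10.0, 10.0]],
    [[0.0, 0.5], [50.0, 50.0]],
    [[0.0, 0.0], [10.0, 11.0], [20.0, 20.0]],
]
best_idx, best_scores = select_stable(candidates, trials, k=2)
```

In this toy example the first candidate is found in every trial and the second in two of three, so these two are kept. A real pipeline would generate the trials by warping and relighting the reference view and running an actual detector on each result.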