We propose a novel approach to point matching under large viewpoint and illumination changes that is suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods. Most of these methods rely either on ad hoc local descriptors or on estimating local affine deformations. By contrast, we treat wide-baseline matching of keypoints as a classification problem, in which each class corresponds to the set of all possible views of one keypoint. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual keypoints and by using statistical classification tools to produce a compact description of this view set. At run-time, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use a classification method to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. In the context of pose estimation, we present experimental results for both planar and non-planar objects in the presence of occlusions, illumination changes, and cluttered backgrounds. We show that the method is both reliable and suitable for initializing real-time applications.
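The classification formulation can be illustrated with a small sketch. Note that the abstract does not name the classifier it uses. Here a simple nearest-class-mean classifier over intensity-normalized patches is swapped in, purely for illustration. Training synthesizes many randomly warped views of each keypoint and keeps one mean patch per class as the compact description of its view set. At run-time, a patch is assigned to the best-scoring class, or rejected if no score clears a threshold. The helper names, patch size, warp ranges, threshold, and the synthetic texture image are all hypothetical choices, not values from the work itself.

```python
import numpy as np

PATCH = 15            # patch side in pixels (hypothetical value)
HALF = PATCH // 2


def make_textured_image(rng, size=128, blur_passes=6):
    """Hypothetical stand-in for a training image: smoothed random texture."""
    img = rng.random((size, size))
    for _ in range(blur_passes):  # cheap periodic 5-point blur
        img = (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5.0
    return img


def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])


def random_affine(rng, max_angle=np.pi / 12, scale_range=(0.9, 1.1)):
    """Random 2x2 map A = R(theta) R(-phi) S R(phi): rotation plus anisotropic scaling."""
    theta = rng.uniform(-max_angle, max_angle)
    phi = rng.uniform(-np.pi, np.pi)          # direction of the scaling axes
    s = np.diag(rng.uniform(*scale_range, size=2))
    return rot(theta) @ rot(-phi) @ s @ rot(phi)


def sample_patch(image, center, A):
    """Extract a PATCH x PATCH view around `center` (row, col), warped by A."""
    offs = np.arange(-HALF, HALF + 1)
    grid = np.stack(np.meshgrid(offs, offs, indexing="ij"), axis=-1).reshape(-1, 2)
    src = grid @ A.T + np.asarray(center)     # warped sampling positions
    r = np.clip(np.rint(src[:, 0]).astype(int), 0, image.shape[0] - 1)
    c = np.clip(np.rint(src[:, 1]).astype(int), 0, image.shape[1] - 1)
    return image[r, c].reshape(PATCH, PATCH)  # nearest-neighbour sampling


def normalize(patch):
    """Zero-mean, unit-norm vector: removes affine intensity (illumination) changes."""
    v = patch.astype(float).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def train(image, keypoints, views_per_point=200, seed=0):
    """Offline step: synthesize warped views per keypoint, keep one unit-norm mean each."""
    rng = np.random.default_rng(seed)
    means = []
    for kp in keypoints:
        views = [normalize(sample_patch(image, kp, random_affine(rng)))
                 for _ in range(views_per_point)]
        m = np.mean(views, axis=0)
        means.append(m / np.linalg.norm(m))
    return np.array(means)


def classify(means, patch, min_score=0.5):
    """Run-time step: return the best class index, or None if no class scores high enough."""
    scores = means @ normalize(patch)        # correlation with each class description
    best = int(np.argmax(scores))
    return best if scores[best] >= min_score else None


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = make_textured_image(rng)
    keypoints = [(30, 30), (30, 64), (30, 98), (64, 30), (64, 64), (64, 98)]
    means = train(image, keypoints)
    view = sample_patch(image, keypoints[4], random_affine(rng))  # unseen view
    print(classify(means, view))
```

All of the per-view work happens in `train`, which can run beforehand. At run-time, `classify` needs only one normalization and one matrix-vector product per observed feature. Returning `None` models the "if any" case: a feature that resembles no trained keypoint is rejected rather than forced into a class.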