We wish to match sets of images to sets of images, where both sets undergo distortions such as viewpoint and lighting changes. To this end we have developed a joint manifold distance (JMD), which measures the distance between two subspaces, each invariant to a desired group of transformations, for example affine warping of the image plane. The JMD may be seen as generalizing invariant distance metrics such as tangent distance in two important ways. First, formally representing priors on the image distribution avoids certain difficulties which in previous work have required ad hoc correction. Second, we observe that previous distances have been computed using what amounted to "home-grown" nonlinear optimizers, and that more reliable results can be obtained with generic optimizers developed in the numerical analysis community, which automatically set the parameters that home-grown methods must tune by hand. The JMD is used in this work to cluster faces in video. Sets of faces detected in contiguous frames define the subspaces, and the distance between subspaces is computed using the JMD. In this way the principal cast of a movie can be "discovered" as the principal clusters. We demonstrate the method on a feature-length movie.
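As a rough illustration of the idea, and not the authors' implementation, the following Python sketch computes a regularised distance between two affine subspaces with a generic least-squares solver. Everything in it is an assumption made for illustration: the names `fit_subspace` and `joint_distance`, the use of short synthetic vectors in place of face images, the quadratic penalty `prior_weight` standing in for the priors on the image distribution, and the choice of SciPy's `least_squares` as the generic optimizer. The transformation parameters (such as affine warps) are omitted entirely, so the residual here happens to be linear in the unknowns; the generic solver is kept because the full problem, with warp parameters included, would be nonlinear.

```python
import numpy as np
from scipy.optimize import least_squares


def fit_subspace(samples, rank):
    """Fit an affine subspace (mean + orthonormal basis) to row-vector samples."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:rank].T  # basis has shape (dim, rank)


def joint_distance(sub_a, sub_b, prior_weight=0.1):
    """Smallest penalised squared gap between a point on sub_a and one on sub_b."""
    (mean_a, basis_a), (mean_b, basis_b) = sub_a, sub_b
    rank_a = basis_a.shape[1]

    def residuals(coeffs):
        u, v = coeffs[:rank_a], coeffs[rank_a:]
        gap = (mean_a + basis_a @ u) - (mean_b + basis_b @ v)
        # Assumed prior: a quadratic penalty on moving away from each mean.
        return np.concatenate([gap, np.sqrt(prior_weight) * coeffs])

    start = np.zeros(rank_a + basis_b.shape[1])
    # The solver chooses its own step sizes and damping; none are set here.
    result = least_squares(residuals, start)
    return 2.0 * result.cost  # cost is half the sum of squared residuals


# Synthetic stand-in for face tracks: each track varies along one direction.
rng = np.random.default_rng(0)
dim = 20
template_1 = rng.normal(size=dim)
template_2 = rng.normal(size=dim)
direction = rng.normal(size=dim)


def make_track(template, n=15):
    amounts = rng.normal(size=(n, 1))
    return template + amounts * direction + 0.01 * rng.normal(size=(n, dim))


track_a = fit_subspace(make_track(template_1), rank=1)
track_b = fit_subspace(make_track(template_1), rank=1)  # same template as track_a
track_c = fit_subspace(make_track(template_2), rank=1)  # different template

d_same = joint_distance(track_a, track_b)
d_diff = joint_distance(track_a, track_c)
print("same template:", d_same, "different template:", d_diff)
```

In this toy setting the two tracks built from the same template should come out closer than the tracks built from different templates, which is the property a clustering step would rely on. A pairwise matrix of such distances could then be passed to any standard clustering routine.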