In this paper, we address the problem of classifying image sets for face recognition, where each set contains images belonging to the same subject and typically covering large variations. By modeling each image set as a manifold, we formulate the problem as the computation of the distance between two manifolds, called manifold-manifold distance (MMD). Since an image set can come in three pattern levels, point, subspace, and manifold, we systematically study the distance among the three levels and formulate them in a general multilevel MMD framework. Specifically, we express a manifold by a collection of local linear models, each depicted by a subspace. MMD is then converted to integrate the distances between pairs of subspaces from one of the involved manifolds. We theoretically and experimentally study several configurations of the ingredients of MMD. The proposed method is applied to the task of face recognition with image sets, where identification is achieved by seeking the minimum MMD from the probe to the gallery of image sets. Our experiments demonstrate that, as a general set similarity measure, MMD consistently outperforms other competing nondiscriminative methods and is also promisingly comparable to the state-of-the-art discriminative methods.