We consider the problems of clustering, classification, and visualization of high-dimensional data when no straightforward Euclidean representation exists. In this paper, we propose using the properties of information geometry and statistical manifolds to define similarities between data sets using the Fisher information distance. We show that this metric can be approximated using entirely nonparametric methods, as the parameterization and geometry of the manifold are generally unknown. Furthermore, by using multidimensional scaling methods, we are able to reconstruct the statistical manifold in a low-dimensional Euclidean space, enabling effective learning on the data. As a whole, we refer to our framework as Fisher information nonparametric embedding (FINE) and illustrate its uses on practical problems, including a biomedical application and document classification.
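The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: each data set is one-dimensional, densities are estimated with a fixed-bandwidth Gaussian kernel, the Fisher information distance is approximated by twice the Hellinger distance (an approximation that holds only for nearby densities), and the embedding uses classical multidimensional scaling. All function names and parameter values are hypothetical, and any geodesic (shortest-path) refinement of the pairwise distances is omitted.

```python
import numpy as np

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian kernel density estimate of 1-D samples, evaluated on a grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs ** 2).sum(axis=1)
    return dens / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

def approx_fisher_distance(p, q, dx):
    """Assumed approximation: 2 * Hellinger distance between gridded densities."""
    hellinger = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)
    return 2.0 * hellinger

def classical_mds(D, dim):
    """Embed points in `dim` dimensions from a matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]       # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale

def fine_sketch(datasets, dim=2, bandwidth=0.3, grid_size=512):
    """Density estimates -> pairwise distances -> low-dimensional embedding."""
    lo = min(d.min() for d in datasets) - 3.0 * bandwidth
    hi = max(d.max() for d in datasets) + 3.0 * bandwidth
    grid = np.linspace(lo, hi, grid_size)
    dx = grid[1] - grid[0]
    pdfs = [kde_on_grid(d, grid, bandwidth) for d in datasets]
    n = len(pdfs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = approx_fisher_distance(pdfs[i], pdfs[j], dx)
    return D, classical_mds(D, dim)

# Synthetic example: five data sets drawn from Gaussians with shifted means.
rng = np.random.default_rng(0)
means = [0.0, 0.5, 1.0, 1.5, 2.0]
datasets = [rng.normal(m, 1.0, size=400) for m in means]
D, Y = fine_sketch(datasets, dim=2)
```

Here each row of `Y` is the low-dimensional coordinate of one entire data set, so standard Euclidean clustering or classification methods can be applied to the rows. Twice the Hellinger distance is only one possible approximation; a divergence-based approximation could be substituted in `approx_fisher_distance` without changing the rest of the sketch.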