In this paper, we propose a statistical framework for clustering spherical data, which commonly arise in machine learning, data mining, and computer vision applications. Our framework is based on finite Langevin mixture models, which provide a natural representation of normalized vectors in high-dimensional spaces, where the data lie on the unit hypersphere. Moreover, we develop a minimum message length (MML) criterion for selecting the components of finite Langevin mixtures, from which different probabilistic information divergence distances are then derived. Through empirical experiments, we demonstrate the merits of the proposed learning framework on challenging applications involving spam filtering using visual email content and email categorization.
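
As an illustration of the model family named above, the following minimal sketch evaluates a Langevin (also known as von Mises-Fisher) log-density on the unit hypersphere and the posterior component memberships of a finite mixture. It assumes the standard density p(x | mu, kappa) = kappa^(d/2-1) / ((2*pi)^(d/2) * I_(d/2-1)(kappa)) * exp(kappa * mu^T x) with kappa > 0. The function names, the series-based Bessel evaluation, and the example parameters are illustrative assumptions, not the paper's implementation, and the sketch does not cover the MML criterion or the divergence measures.

```python
import math


def log_bessel_i(v, x, terms=200):
    """Log of the modified Bessel function I_v(x), x > 0, from its power series.

    Summed in log space for stability. The fixed number of terms is an
    assumption that suits moderate x only; large concentrations need more
    terms or an asymptotic approximation.
    """
    logs = [
        (2 * m + v) * math.log(x / 2.0) - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    peak = max(logs)
    return peak + math.log(sum(math.exp(t - peak) for t in logs))


def normalize(vec):
    """Project a nonzero vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def langevin_log_pdf(x, mu, kappa):
    """Log-density of a Langevin distribution at unit vector x.

    mu is the unit mean direction and kappa > 0 the concentration.
    """
    d = len(x)
    v = d / 2.0 - 1.0
    log_norm = (
        v * math.log(kappa)
        - (d / 2.0) * math.log(2.0 * math.pi)
        - log_bessel_i(v, kappa)
    )
    return log_norm + kappa * sum(a * b for a, b in zip(x, mu))


def responsibilities(x, weights, mus, kappas):
    """Posterior probability of each mixture component given x."""
    logs = [
        math.log(w) + langevin_log_pdf(x, m, k)
        for w, m, k in zip(weights, mus, kappas)
    ]
    peak = max(logs)
    exps = [math.exp(t - peak) for t in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-component mixture on the sphere in three dimensions.
x = normalize([1.0, 0.1, 0.0])
post = responsibilities(
    x,
    weights=[0.5, 0.5],
    mus=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    kappas=[10.0, 10.0],
)
print(post)
```

Because x lies much closer to the first mean direction, the first component receives most of the posterior mass. In three dimensions the normalizer reduces to kappa / (4 * pi * sinh(kappa)), which gives a convenient check on the series evaluation.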