We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for approximating both symmetric and asymmetric distributions. In addition, the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and thus without loss of information. The number of clusters is determined using the minimum message length (MML) principle. Parameter estimation is done by a hybrid stochastic expectation-maximization (HSEM) algorithm. The results are compared with those obtained by other selection criteria (AIC, MDL and MMDL). The performance of our method is tested by clustering real data and by applying it to an image object recognition problem.
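As an illustration of the distribution the mixture is built on, the following is a minimal sketch of the generalized Dirichlet log-density in the standard Connor-Mosimann parameterization. The abstract does not state a parameterization, so this form is an assumption, and the function name `gd_logpdf` and its argument names are hypothetical. It is not the MML or HSEM procedure, only the component density such a mixture would evaluate.

```python
import math


def gd_logpdf(x, alpha, beta):
    """Log-density of a generalized Dirichlet distribution at x.

    Assumed (Connor-Mosimann) form, for x_i > 0 and sum(x) < 1:
      p(x) = prod_i Gamma(a_i + b_i) / (Gamma(a_i) Gamma(b_i))
             * x_i^(a_i - 1) * (1 - x_1 - ... - x_i)^g_i
    with g_i = b_i - a_{i+1} - b_{i+1} for i < d, and g_d = b_d - 1.
    """
    d = len(x)
    if not (len(alpha) == len(beta) == d):
        raise ValueError("x, alpha and beta must have the same length")
    # Outside the open simplex the density is zero.
    if any(v <= 0.0 for v in x) or sum(x) >= 1.0:
        return float("-inf")

    logp = 0.0
    cum = 0.0  # running sum x_1 + ... + x_i
    for i in range(d):
        cum += x[i]
        if i < d - 1:
            g = beta[i] - alpha[i + 1] - beta[i + 1]
        else:
            g = beta[i] - 1.0
        # Normalizing constant of the i-th factor.
        logp += (math.lgamma(alpha[i] + beta[i])
                 - math.lgamma(alpha[i]) - math.lgamma(beta[i]))
        # Kernel of the i-th factor.
        logp += (alpha[i] - 1.0) * math.log(x[i]) + g * math.log(1.0 - cum)
    return logp
```

When beta_i = alpha_{i+1} + beta_{i+1} for every i < d, all exponents g_i with i < d vanish and the density reduces to an ordinary Dirichlet. For example, `gd_logpdf([0.2, 0.3], [1, 1], [2, 1])` evaluates the uniform Dirichlet(1, 1, 1) density, whose value is 2 everywhere on the simplex.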