In this paper, we examine the problem of clustering count data using finite mixtures of distributions. The multinomial and the multinomial Dirichlet distributions are widely used to model count data. We show that these two distributions are not the best choice in every application, and we propose another model that uses the generalized Dirichlet as a prior to the multinomial. Parameter estimation is based on the expectation-maximization approach, and the number of components is determined with the minimum description length criterion. We compare our method with standard approaches to show its merits. The comparison involves indexing spatial color image databases.
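As a point of reference for the baseline named in the abstract, the log-probability of a count vector under the multinomial Dirichlet (Dirichlet compound multinomial) distribution can be sketched as follows. This is a minimal illustration of the standard baseline only, not the generalized Dirichlet model the paper proposes; the function name `dcm_log_pmf` and its arguments are assumptions made for this example.

```python
import math

def dcm_log_pmf(counts, alpha):
    """Log-probability of `counts` under a Dirichlet compound multinomial.

    counts: non-negative integer counts, one per category (hypothetical input).
    alpha:  positive Dirichlet parameters, one per category (hypothetical input).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)       # total number of observations
    a = sum(alpha)        # sum of the Dirichlet parameters
    # Multinomial coefficient: log n! - sum_i log x_i!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalizers after integrating out the multinomial parameters
    log_ratio = math.lgamma(a) - math.lgamma(n + a)
    log_ratio += sum(math.lgamma(x + al) - math.lgamma(al)
                     for x, al in zip(counts, alpha))
    return log_coef + log_ratio

# With two categories and alpha = (1, 1), every split of three observations
# is equally likely, so the probability of [2, 1] is 1/4.
p = math.exp(dcm_log_pmf([2, 1], [1.0, 1.0]))
```

In a finite mixture, a term of this kind would be evaluated once per component and weighted by the mixing proportions; how the paper's generalized Dirichlet prior changes the expression is not reproduced here.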
Date of Conference: 24-27 Nov. 2007