Clustering of Count Data Using Generalized Dirichlet Multinomial Distributions

Author: Nizar Bouguila, Concordia Univ., Montreal

In this paper, we examine the problem of clustering count data. We analyze this problem using finite mixtures of distributions. The multinomial distribution and the multinomial Dirichlet distribution (MDD) are widely used to model count data. We show that these two distributions are not the best choice in every application, and we propose another model, the multinomial generalized Dirichlet distribution (MGDD), which is the composition of the generalized Dirichlet distribution and the multinomial, in the same way that the MDD is the composition of the Dirichlet and the multinomial. Parameter estimation is based on the deterministic annealing expectation-maximization (DAEM) approach, and the number of mixture components is determined with the minimum description length (MDL) criterion. We compare our method to standard approaches, namely multinomial and multinomial Dirichlet mixtures, to show its merits. The comparison covers several applications: spatial color image database indexing, handwritten digit recognition, and text document clustering.
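
To make the "composition" in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It draws a probability vector from a generalized Dirichlet distribution using the standard stick-breaking construction with independent Beta variables, then draws multinomial counts from that vector. The function names, parameter values, and the use of Python's standard `random` module are assumptions made for illustration; the DAEM estimation and MDL model selection described above are not shown.

```python
import random


def sample_generalized_dirichlet(alphas, betas, rng):
    """Draw a probability vector of length len(alphas) + 1.

    Stick-breaking construction: each Beta draw takes a fraction
    of whatever probability mass is still unassigned.
    """
    remaining = 1.0
    probs = []
    for a, b in zip(alphas, betas):
        z = rng.betavariate(a, b)   # fraction of the remaining mass
        probs.append(remaining * z)
        remaining *= 1.0 - z
    probs.append(remaining)         # last category takes what is left
    return probs


def sample_mgdd_counts(alphas, betas, n_trials, rng):
    """Compound draw: generalized Dirichlet first, then multinomial counts."""
    probs = sample_generalized_dirichlet(alphas, betas, rng)
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n_trials):
        counts[k] += 1
    return counts


# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
probs = sample_generalized_dirichlet([2.0, 3.0, 1.5], [4.0, 2.0, 1.0], rng)
counts = sample_mgdd_counts([2.0, 3.0, 1.5], [4.0, 2.0, 1.0], n_trials=100, rng=rng)
print(counts, sum(counts))
```

In this sketch, three (alpha, beta) pairs yield a count vector over four categories. The generalized Dirichlet has two parameters per Beta draw, where the ordinary Dirichlet has a single parameter per category.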

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 20, Issue: 4)