Research on mixture language model-based document clustering

Authors:

Jian Wen (Comput. Sch., Nat. Univ. of Defence Technol., Changsha); Zhoujun Li

Language modeling with semantic smoothing has been proposed as an effective way to improve the quality of document clustering. However, the existing semantic smoothing model is not effective for partitional clustering because it cannot assign appropriate weights to “general” words in a collection. In this paper, inspired by mixture probability models, we propose a mixture language model for document clustering. The new model alleviates the effect of “general” words while also integrating context information and addressing polysemy within a document. Based on the new model, we present an EM algorithm for partitional clustering. Experimental results show that our algorithms improve cluster quality more effectively than previous methods.
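The abstract does not give the model's equations. As a minimal sketch, assuming a common construction, each word of a document is drawn from the document's cluster language model with probability λ (`lam`) and otherwise from a collection-wide background model, which absorbs "general" words. EM then alternates between cluster posteriors per document and word-source posteriors per cluster. The function name `mixture_lm_em`, the fixed `lam`, and the random initialization are illustrative assumptions, not the authors' method. This sketch also does not model the context and polysemy handling the abstract mentions.

```python
import math
import random
from collections import Counter


def mixture_lm_em(docs, k, lam=0.5, iters=100, seed=0):
    """Illustrative sketch (not the paper's exact model): EM clustering where
    each word comes from the document's cluster model with prob. lam, or from
    a shared collection ("general") model with prob. 1 - lam (0 < lam < 1).
    Returns (hard cluster labels, per-cluster word distributions)."""
    rng = random.Random(seed)
    counts = [Counter(d) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    total = sum(sum(c.values()) for c in counts)
    # Background model p(w|C): maximum-likelihood estimate over all documents.
    p_c = {w: sum(c[w] for c in counts) / total for w in vocab}

    # Random soft assignments break the symmetry between clusters.
    resp = []
    for _ in docs:
        row = [rng.random() for _ in range(k)]
        s = sum(row)
        resp.append([x / s for x in row])
    # t[j][w]: posterior that an occurrence of w in cluster j came from the
    # cluster model rather than the background model (initialized to 1).
    t = [{w: 1.0 for w in vocab} for _ in range(k)]

    for _ in range(iters):
        # M-step: cluster priors and cluster word distributions.
        pi = [sum(r[j] for r in resp) / len(docs) for j in range(k)]
        theta = []
        for j in range(k):
            weights = {w: 0.0 for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    weights[w] += r[j] * n * t[j][w]
            z = sum(weights.values())
            theta.append({w: (weights[w] / z if z > 0 else 1.0 / len(vocab))
                          for w in vocab})
        # E-step (a): word-source posteriors under the new theta.
        for j in range(k):
            for w in vocab:
                fg = lam * theta[j][w]
                t[j][w] = fg / (fg + (1 - lam) * p_c[w])
        # E-step (b): cluster posteriors, computed in log space.
        for i, c in enumerate(counts):
            logs = []
            for j in range(k):
                ll = math.log(max(pi[j], 1e-300))
                for w, n in c.items():
                    ll += n * math.log(lam * theta[j][w] + (1 - lam) * p_c[w])
                logs.append(ll)
            m = max(logs)
            exps = [math.exp(x - m) for x in logs]
            s = sum(exps)
            resp[i] = [e / s for e in exps]

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, theta


if __name__ == "__main__":
    docs = [s.split() for s in [
        "the cat the dog cat", "the dog cat the cat", "cat the the cat dog",
        "the stock the market price", "price the market the stock",
        "the the stock price market",
    ]]
    labels, theta = mixture_lm_em(docs, k=2)
    print(labels)
```

On this toy data, "the" and "cat" are equally frequent within the first group of documents. Because "the" is also common across the whole collection, the background component explains much of it. The fitted cluster model therefore gives "the" a lower weight than "cat", which illustrates the kind of down-weighting of "general" words the abstract describes. Mixing in the background model also keeps every word probability nonzero, so it doubles as smoothing.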

Published in:

2008 IEEE International Conference on Granular Computing (GrC 2008)

Date of Conference:

26-28 Aug. 2008