By Topic

Novel similarity measure for document clustering based on topic phrases

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
ELdesoky, A.E. ; Dept. of Comput. & Syst., Mansoura Univ., Mansoura ; Saleh, M. ; Sakr, N.A.

Document clustering is a subset of the data clustering field which categorizes large set of documents into similar and related groups. In the traditional vector space model (VSM) researchers have considered the unique word which occurs in the document set as the candidate feature. Recently a new trend which considered the phrase to be a more informative feature has taken place; the matter which contributes in improving the document clustering accuracy and effectiveness. This paper proposes a new approach for computing the similarity measure of the traditional VSM by considering the topic phrases of the document as the constituting terms for the VSM instead of the traditional term ldquowordrdquo and applying the new approach to the Buckshot method, which is a mix of the Hierarchical Agglomerative Clustering (HAC) algorithm and the K-means partitioning algorithm. Such a mechanism may raise the effectiveness of the clustering by increasing the evaluation metrics values.

Published in:

Networking and Media Convergence, 2009. ICNM 2009. International Conference on

Date of Conference:

24-25 March 2009