By Topic

A comparative study on unsupervised feature selection methods for text clustering

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Luying Liu ; Sch. of Comput. Sci., Beijing Univ., China ; Jianchu Kang ; Jing Yu ; Zhongliang Wang

Text clustering is one of the central problems in text mining and information retrieval area. For the high dimensionality of feature space and the inherent data sparsity, performance of clustering algorithms will dramatically decline. Two techniques are used to deal with this problem: feature extraction and feature selection. Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, four unsupervised feature selection methods, DF, TC, TVQ, and a new proposed method TV are introduced. Experiments are taken to show that feature selection methods can improves efficiency as well as accuracy of text clustering. Three clustering validity criterions are studied and used to evaluate clustering results.

Published in:

2005 International Conference on Natural Language Processing and Knowledge Engineering

Date of Conference:

30 Oct.-1 Nov. 2005