Fuzzy C-Means Text Clustering with Supervised Feature Selection

4 Author(s)
Wei Wang ; Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing ; Chunheng Wang ; Xia Cui ; Ai Wang

Traditional text clustering algorithms usually select features with unsupervised methods. In this paper we propose a new text clustering algorithm, SFFCM, which instead uses a supervised feature selection method. SFFCM is based on the EM algorithm. In the E-step, to calculate the expectation, a supervised feature selection algorithm computes a relevancy score for each term. In the M-step, the FCM (fuzzy c-means) algorithm produces the clustering from the selected terms. Experimental results on the standard document clustering benchmark corpora OHSUMED, 20-Newsgroups, and Reuters-21578 show that SFFCM generates better clustering results than the other clustering methods used as controls, and that supervised feature selection improves the performance of the text clustering algorithm. We also propose a supervised feature selection measure, CRF-CHI, which is based on the χ² statistic and the category relative frequency. The experimental results confirm that CRF-CHI is an effective supervised feature selection measure.
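
The alternation described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. The abstract does not define CRF-CHI, so the plain χ² statistic (maximum over clusters, on term presence) stands in for the relevancy score. Hard labels taken from the fuzzy memberships serve as the class labels, and the first round clusters on all terms. All of these are assumptions. The names `fcm`, `chi2_scores`, `sffcm_sketch`, and the toy matrix `X` are hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means; returns memberships U (n x c) and centers V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted centers
        # squared Euclidean distances, with a small floor to avoid division by zero
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, V

def chi2_scores(X, labels, c):
    """Plain chi-square relevancy per term (stand-in for CRF-CHI, which is not given)."""
    present = (X > 0).astype(float)
    N = X.shape[0]
    scores = np.zeros(X.shape[1])
    for k in range(c):
        in_k = (labels == k).astype(float)
        A = in_k @ present                      # in cluster k, term present
        B = (1.0 - in_k) @ present              # outside cluster k, term present
        C = in_k.sum() - A                      # in cluster k, term absent
        D = (1.0 - in_k).sum() - B              # outside cluster k, term absent
        denom = (A + C) * (B + D) * (A + B) * (C + D)
        chi = np.where(denom > 0,
                       N * (A * D - B * C) ** 2 / np.maximum(denom, 1e-12),
                       0.0)
        scores = np.maximum(scores, chi)        # keep the best score over clusters
    return scores

def sffcm_sketch(X, c, n_terms, rounds=5, seed=0):
    """Alternate term scoring (E-step analogue) and FCM on selected terms (M-step analogue)."""
    selected = np.arange(X.shape[1])            # assumption: start from all terms
    for _ in range(rounds):
        U, _ = fcm(X[:, selected], c, seed=seed)
        labels = U.argmax(axis=1)               # hard labels used as pseudo-classes
        scores = chi2_scores(X, labels, c)
        selected = np.sort(np.argsort(scores)[::-1][:n_terms])
    U, _ = fcm(X[:, selected], c, seed=seed)
    return U, selected

# Toy term-count matrix: terms 0-1 mark one topic, terms 2-3 another, terms 4-5 occur everywhere.
X = np.array([[3, 2, 0, 0, 1, 1],
              [2, 3, 0, 0, 1, 1],
              [3, 3, 0, 0, 1, 1],
              [0, 0, 3, 2, 1, 1],
              [0, 0, 2, 3, 1, 1],
              [0, 0, 3, 3, 1, 1]], dtype=float)
U, selected = sffcm_sketch(X, c=2, n_terms=4)
print(selected)
print(U.round(3))
```

In this toy matrix, terms 4 and 5 occur in every document, so their χ² score is zero regardless of the cluster assignment, and the selection step has no reason to prefer them over the topic-specific terms.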

Published in:

Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on (Volume: 1)

Date of Conference:

18-20 Oct. 2008