
A feature selection for Korean Web document clustering


Authors:
Heum Park (AI Lab., Dept. of Comput. Sci., Pusan Nat. Univ., South Korea); Young-Gi Kim; Hyuk-Chul Kwon

This paper is a comparative study of feature selection methods for Korean Web document clustering. First, we examined how term features and the co-links of Web documents affect clustering performance: we clustered Web documents by native term features, by co-links, and by both, and compared the results with the originally assigned categories. Next, we selected term features for each category from training documents using χ², information gain (IG), and mutual information (MI), and applied these features to a separate set of experimental documents. In addition, we propose a new method, max feature selection, which selects the terms that have the maximum count for a category in each experimental document and assigns each term its χ² (or MI or IG) value instead of its term frequency before clustering. In the results, χ² performed better than IG or MI, although the difference appears to be slight. When we applied max feature selection, however, clustering performance improved notably. Max feature selection is a simple but effective means of feature space reduction and performs strongly for Korean Web document clustering.
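The abstract does not give formulas for the scoring functions, so the following is only a minimal sketch of how a chi-square score and a pointwise mutual information score are commonly computed for one term and one category from a 2x2 document-count table. It is not the authors' implementation. The function names, the toy documents, and the category labels are all hypothetical and chosen purely for illustration.

```python
import math

# Toy corpus (invented for illustration): each document is a set of terms.
docs = [
    {"soccer", "goal"},
    {"soccer", "league"},
    {"stock", "market"},
    {"stock", "goal"},
]
labels = ["sports", "sports", "finance", "finance"]


def contingency(docs, labels, term, category):
    """Return document counts (a, b, c, d) for one term and one category.

    a: in category, contains term      b: not in category, contains term
    c: in category, lacks term         d: not in category, lacks term
    """
    a = b = c = d = 0
    for terms, label in zip(docs, labels):
        has_term = term in terms
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table; 0.0 if a marginal is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise MI, log(P(t, c) / (P(t) * P(c))), estimated from counts."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log(a * n / ((a + c) * (a + b)))


for term in ("soccer", "goal"):
    table = contingency(docs, labels, term, "sports")
    print(term, chi_square(*table), mutual_information(*table))
```

In this toy corpus, "soccer" appears only in the "sports" documents and therefore scores high on both measures, while "goal" is spread evenly across the two categories and scores zero. Ranking terms by such a score and keeping the top few per category is one common way to reduce the feature space before clustering.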

Published in:
Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE (Volume 3)

Date of Conference: 2-6 Nov. 2004
