Enhanced document clustering using fusion of multiscale wavelet decomposition

3 Author(s)
Mahmoud F. Hussin ; Arab Academy for Science and Technology and Maritime Transport, Alexandria, Egypt ; Ibrahim El Rube ; Mohamed S. Kamel

Most term weighting schemes for text document clustering rely on term-frequency analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of terms in a document, is that in most cases they are limited in their ability to filter out noise. In this paper, we propose a novel weighting approach that combines a fusion technique with wavelet-based estimation to achieve consistent improvements in clustering. Our approach involves three steps: (1) term frequency (TF) weighting, (2) estimation with multiple wavelets, and (3) data fusion. Specifically, we apply the wavelet at different scales to produce different estimates of the original TF, and use the fusion of these estimates as new features for clustering the documents. Experiments on clustering documents from the Reuters corpus verify that our weighting schemes using wavelet and fusion techniques effectively reduce noise and improve clustering performance, as evaluated using entropy and F-measure.
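The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the scales, the ordering of terms in the TF vector, or the fusion rule, so everything below is an assumption made for illustration: a Haar approximation (block means, which is what remains when all detail coefficients are zeroed), scales 1 and 2, and fusion by simple averaging. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def term_frequencies(doc, vocabulary):
    """Step 1: raw term-frequency (TF) vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[term]) for term in vocabulary]


def haar_approximation(signal, level):
    """Step 2: smoothed estimate of the signal at one scale.

    Assumption: a Haar wavelet with all detail coefficients discarded,
    which is equivalent to replacing each block of 2**level samples by
    its mean. A trailing partial block is averaged over its own length,
    a simplification of proper boundary handling.
    """
    block = 2 ** level
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


def fuse(estimates):
    """Step 3: fuse the per-scale estimates.

    Assumption: element-wise averaging; the paper may use another rule.
    """
    n = len(estimates)
    return [sum(values) / n for values in zip(*estimates)]


def fused_tf_features(doc, vocabulary, levels=(1, 2)):
    """Run the three steps and return the fused feature vector."""
    tf = term_frequencies(doc, vocabulary)
    estimates = [haar_approximation(tf, level) for level in levels]
    return fuse(estimates)


if __name__ == "__main__":
    vocabulary = ["wavelet", "fusion", "cluster", "noise"]
    doc = "wavelet wavelet fusion noise"
    print(term_frequencies(doc, vocabulary))   # [2.0, 1.0, 0.0, 1.0]
    print(fused_tf_features(doc, vocabulary))  # [1.25, 1.25, 0.75, 0.75]
```

The fused vectors would then be passed to any standard clustering algorithm in place of the raw TF vectors. Coarser scales smooth more aggressively, so fusing several scales is one way to trade noise suppression against loss of term-level detail.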

Published in:

2008 IEEE/ACS International Conference on Computer Systems and Applications

Date of Conference:

March 31 2008-April 4 2008