Most term weighting schemes for text document clustering rely on term-frequency-based analysis of the text content. A shortcoming of these indexing schemes, which consider only the occurrences of terms in a document, is that they are limited in their ability to filter out noise. In this paper, we propose a novel weighting approach that uses a fusion technique combined with wavelet-based estimation to achieve consistent improvements in clustering. Our approach involves three steps: (1) term frequency (TF) weighting, (2) estimation with multiple wavelets, and (3) data fusion. Specifically, we apply the wavelet at different scales to produce different estimates of the original TF, and use the fusion of these estimates as new features for clustering the documents. Experiments on clustering documents from the Reuters corpus verify that our weighting scheme, using wavelet and fusion techniques, effectively reduces noise and improves clustering performance as evaluated by entropy and F-measure.
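The three steps can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the scales, or the fusion rule, so everything concrete here is an assumption for illustration only: a Haar wavelet, a scale-wise approximation that discards detail coefficients, and fusion by element-wise mean. The function names (`term_frequencies`, `haar_estimate`, `fuse`, `wavelet_fused_features`) and the toy documents are hypothetical and not taken from the paper.

```python
from collections import Counter


def term_frequencies(docs, vocab):
    """Step 1: raw term-frequency vector per document over a fixed vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([float(counts[term]) for term in vocab])
    return rows


def haar_estimate(signal, level):
    """Step 2 (assumed Haar wavelet): smoothed estimate of `signal` at one scale.

    Discarding all Haar detail coefficients up to `level` is equivalent to
    replacing each block of 2**level samples by the block mean. A shorter
    trailing block is averaged on its own (an assumed boundary rule).
    """
    block = 2 ** level
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def fuse(estimates):
    """Step 3 (assumed fusion rule): element-wise mean of the estimates."""
    n = len(estimates)
    return [sum(values) / n for values in zip(*estimates)]


def wavelet_fused_features(tf_row, levels=(1, 2)):
    """Estimate one TF vector at several scales, then fuse the estimates."""
    return fuse([haar_estimate(tf_row, level) for level in levels])


# Toy corpus (hypothetical); vocabulary order is alphabetical by assumption.
docs = ["wavelet noise wavelet signal", "cluster document cluster term"]
vocab = sorted({word for doc in docs for word in doc.split()})
tf = term_frequencies(docs, vocab)
features = [wavelet_fused_features(row) for row in tf]
```

The rows of `features` would then replace the raw TF vectors as input to a clustering algorithm. Because each block mean preserves the block sum, the fused vector keeps the total term mass of the original TF vector while flattening isolated spikes; whether that matches the paper's notion of noise reduction depends on choices the abstract leaves open.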