Web page clustering is useful for taxonomy design, information extraction, and similarity search, and it can assist in the evaluation and visualization of search engine results. Accurate clustering is therefore a goal in Web mining and Web information extraction. Besides the particular clustering algorithm, the term weighting function applied to the features selected to represent Web pages is a key aspect of the clustering task. This paper evaluates the performance of six different term weighting functions for Web pages by means of the results of a partitioning clustering algorithm. In addition, two feature reduction methods have been applied: (1) the weighting function itself, and (2) removing all features that occur more times than upper thresholds in the page and in the collection, and all features that occur fewer times than lower thresholds in the page and in the collection. Through experimentation with a collection of Web documents used in clustering research, we have determined that the best results are obtained with the term weighting function based on a fuzzy combination of criteria.
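As an illustration of the second reduction method, the following is a minimal sketch of threshold-based feature pruning. It reflects one possible reading of the description, not the procedure actually used in the paper: the function name `prune_features`, its parameters, and the default threshold values are all hypothetical, and "occurrences in the collection" is assumed here to mean the total term count over all pages.

```python
from collections import Counter


def prune_features(pages, page_min=1, page_max=50, coll_min=2, coll_max=1000):
    """Drop features outside hypothetical page-level and collection-level bounds.

    pages: list of token lists, one list per Web page.
    Returns one dict per page mapping each retained feature to its in-page count.
    """
    # Total occurrences of each feature over the whole collection (assumption).
    coll_counts = Counter()
    for tokens in pages:
        coll_counts.update(tokens)

    # Features whose collection count lies within the collection thresholds.
    keep = {t for t, c in coll_counts.items() if coll_min <= c <= coll_max}

    pruned = []
    for tokens in pages:
        page_counts = Counter(tokens)
        # Within a page, also require the in-page count to lie within bounds.
        pruned.append({
            t: c for t, c in page_counts.items()
            if t in keep and page_min <= c <= page_max
        })
    return pruned


pages = [["a", "a", "a", "b", "c"], ["a", "b", "b", "d"]]
result = prune_features(pages, page_min=1, page_max=2, coll_min=2, coll_max=3)
```

In this toy collection, "a" is discarded as too frequent in the collection, and "c" and "d" are discarded as too rare, so only "b" survives in both pages. A term weighting function would then be applied to the retained features.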