The popularity of the Internet has caused a massive increase in the number of Web pages, and this information explosion poses a growing challenge for information retrieval systems. Document clustering has become an important process for helping information retrieval systems organize this vast amount of data. Grouping similar documents into clusters is believed to help users find relevant information more quickly and to focus their search in the appropriate direction. Feature selection is an important task in data analysis: it limits redundancy among features, promotes comprehensibility, and helps find clusters (or structures) hidden in high-dimensional data. This paper addresses document mining problems related to Web page clustering and classification, using principal component analysis for feature vector selection. Singular value decomposition is used to find the similarity measure, and a multilayer neural network is used to improve the performance of the clustering algorithm. We illustrate and discuss the system's performance using experimental evaluation results.
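As a rough illustration of the feature-reduction step described above, the following sketch projects bag-of-words document vectors onto their leading principal components and compares documents in the reduced space. It is a minimal sketch under stated assumptions, not the system evaluated in this paper: the whitespace tokenizer, the power-iteration solver, the cosine measure, and all function names (`term_counts`, `principal_components`, `similarity_matrix`) are hypothetical choices made for illustration. In particular, cosine similarity stands in for the SVD-based similarity measure, and the neural network stage is omitted.

```python
import math


def term_counts(docs):
    """Return a documents-by-terms count matrix and the sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    col = {w: j for j, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.lower().split():
            row[col[w]] += 1.0
        rows.append(row)
    return rows, vocab


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def principal_components(rows, k, iters=500):
    """Center the data and find the top-k principal directions.

    Assumption: power iteration with deflation is used here only to keep
    the sketch dependency-free; it needs at least two documents.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(c) / n for c in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # Sample covariance matrix of the terms (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    comps = []
    for _component in range(k):
        v = [1.0 + 0.01 * i for i in range(d)]  # deterministic start vector
        for _step in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))
        comps.append(v)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return centered, comps


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_matrix(docs, k=2):
    """Pairwise cosine similarity of documents in the k-dimensional PCA space."""
    rows, _vocab = term_counts(docs)
    centered, comps = principal_components(rows, k)
    reduced = [[sum(x * c for x, c in zip(r, comp)) for comp in comps]
               for r in centered]
    return [[cosine(a, b) for b in reduced] for a in reduced]


# Hypothetical toy corpus: two pages about neural networks, two about Web search.
docs = [
    "neural network training network",
    "neural network learning",
    "web page clustering",
    "web page retrieval clustering",
]
sim = similarity_matrix(docs, k=2)
```

On a toy corpus like this, pages that share vocabulary should score higher with each other than with pages on the other topic. A clustering algorithm could then operate on the reduced vectors or on the similarity matrix.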