Abstract:
Due to the exponential growth of documents on the Internet and the emerging need to organize them, the automated categorization of documents into predefined labels has received ever-increasing attention in recent years. This paper describes a method for the automatic clustering of documents using a hybrid classifier based on rough sets and neural networks, which we call Rough-Ann. First, the documents are represented with the vector space model, and the feature vectors are reduced using rough sets. The reduced feature vectors then serve as the training set for an artificial neural network, which clusters the documents. The experimental results show that the Rough-Ann algorithm is effective for document classification and performs better in classification precision, stability, and fault tolerance than the traditional classification methods (Bayesian classifiers, SVM, and kNN), especially for complex classification problems with many feature vectors.
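The first two stages described above (vector space representation, then rough-set feature reduction) can be illustrated with a minimal sketch. The abstract does not give the paper's actual reduction algorithm, so everything here is an assumption made for illustration: binary term-presence features stand in for the vector space model, a QuickReduct-style greedy search over the rough-set dependency degree stands in for the reduction step, and the function names (`to_binary_vectors`, `dependency`, `greedy_reduct`) and the toy documents are hypothetical. The neural network stage is omitted; the reduced vectors would be its training input.

```python
from collections import defaultdict


def to_binary_vectors(docs):
    """Simplified vector space model: one binary term-presence feature per word.
    (Assumption: the paper may use a different weighting, e.g. TF-IDF.)"""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        words = set(d.lower().split())
        vectors.append([1 if w in words else 0 for w in vocab])
    return vocab, vectors


def dependency(vectors, labels, attrs):
    """Rough-set dependency degree: the fraction of documents whose equivalence
    class (documents identical on `attrs`) carries a single label."""
    classes = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        classes[tuple(vec[a] for a in attrs)].append(lab)
    consistent = sum(len(labs) for labs in classes.values() if len(set(labs)) == 1)
    return consistent / len(vectors)


def greedy_reduct(vectors, labels):
    """QuickReduct-style greedy search (an illustrative stand-in, not the paper's
    method): repeatedly add the attribute that raises the dependency degree most,
    until it matches the dependency of the full attribute set."""
    all_attrs = list(range(len(vectors[0])))
    target = dependency(vectors, labels, all_attrs)
    chosen = []
    while dependency(vectors, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(vectors, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "goal match team",
    "team goal score",
    "stock market price",
    "market price fall",
]
labels = ["sport", "sport", "finance", "finance"]

vocab, vectors = to_binary_vectors(docs)
reduct = greedy_reduct(vectors, labels)
reduced_vectors = [[vec[a] for a in reduct] for vec in vectors]
print([vocab[a] for a in reduct])  # → ['goal']
print(reduced_vectors)             # → [[1], [1], [0], [0]]
```

On this toy corpus a single term already separates the two labels, so the eight-dimensional vectors shrink to one dimension. A greedy search of this kind returns a small attribute subset that preserves the dependency degree, though not necessarily a minimal one.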
Published in: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops
Date of Conference: 18-22 December 2006
Date Added to IEEE Xplore: 08 January 2007
Print ISBN: 0-7695-2749-3