Naive Bayes is a simple yet efficient classification method that has proved useful in many fields. However, its accuracy still needs to improve before it can be applied in practice. In this paper, we construct an extended model that assigns weights to important features. A method called CF is used to measure the relevance between a feature and a category, compensating for a deficiency of the chi-square (CHI) statistic. We select the best features with a newly proposed method called CHCFW, which reinforces the distribution of key features in a document and removes interfering features. Experimental results show that, compared with the original Naive Bayes model and other feature-weighting algorithms, the CHCFW method performs better and is better suited to larger sets of training documents.
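For orientation, the two standard building blocks named above can be sketched as follows: the chi-square statistic for a term and a category, and a Naive Bayes classifier whose per-term log-likelihoods are scaled by feature weights. This is a minimal illustration under stated assumptions, not the CHCFW method itself; the abstract does not define CF or CHCFW, so the `weights` dictionary is a hypothetical placeholder for whatever weights such a method would produce, and all function names are chosen here for illustration.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or category never varies
    return n * (a * d - c * b) ** 2 / denom


def train(docs, labels):
    """Estimate log priors and Laplace-smoothed multinomial log-likelihoods."""
    vocab = sorted({t for doc in docs for t in doc})
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    log_prior = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    log_like = {}
    for c, counts in term_counts.items():
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return log_prior, log_like


def classify(doc, log_prior, log_like, weights):
    """Return the class maximizing log P(c) + sum_t w_t * log P(t | c).

    `weights` maps a term to its weight (hypothetical placeholder for
    method-specific feature weights); terms without an entry get 1.0,
    which reduces to the unweighted Naive Bayes rule.
    """
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for t in doc:
            if t in log_like[c]:  # ignore terms unseen in training
                score += weights.get(t, 1.0) * log_like[c][t]
        scores[c] = score
    return max(scores, key=scores.get)
```

With an empty `weights` dictionary, `classify` behaves as plain multinomial Naive Bayes, so the effect of any weighting scheme can be compared against that baseline on the same training data.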
Published in:
Pervasive Computing Signal Processing and Applications (PCSPA), 2010 First International Conference on
Date of Conference: 17-19 Sept. 2010