Most content-based spam filters are rule-based or trained offline. Such filters handle new spam tactics poorly and are prone to high misclassification rates. This paper proposes an incremental, adaptive spam mail filter using Naïve Bayesian classification, which offers good performance, simplicity, and adaptability. We model an incremental scheme that receives a stream of emails and applies a sliding window, training on only the last w emails in order to classify new incoming messages. Subsequently, the new features of the tested messages are added to the existing features so that the model adapts to future incoming emails. The proposed model is tested on two corpora: Trec05p-1 and Trec06p. The parameters are the window size and the number of features; the evaluation metrics are the processing time per message and the ham and spam misclassification rates. The experimental results show that the number of features has little impact, whereas the window size has a significant effect on both the misclassification rates and the processing time. In addition, the overall accuracy is even better than that obtained from batch offline training, and the processing time is reduced significantly.
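The sliding-window scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `SlidingWindowNB`, the helper `process_stream`, whitespace tokenization, Laplace smoothing, and the assumption that the true label of each message becomes available right after it is classified are all choices made here for illustration. Feature selection (the "number of features" parameter) is omitted.

```python
import math
from collections import Counter, deque

LABELS = ("ham", "spam")


class SlidingWindowNB:
    """Multinomial Naive Bayes trained only on the last `window_size` emails.

    Hypothetical illustration; names and smoothing are assumptions.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()  # (tokens, label) pairs currently in the window
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def _tokens(text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def learn(self, text, label):
        """Add a labeled message and evict the oldest one if the window is full."""
        tokens = self._tokens(text)
        self.window.append((tokens, label))
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        if len(self.window) > self.window_size:
            old_tokens, old_label = self.window.popleft()
            counts = self.word_counts[old_label]
            for token in old_tokens:
                counts[token] -= 1
                if counts[token] == 0:
                    del counts[token]  # forget features that left the window
            self.doc_counts[old_label] -= 1

    def predict(self, text):
        """Return the label with the higher smoothed log-posterior."""
        tokens = self._tokens(text)
        vocab = set(self.word_counts["ham"]) | set(self.word_counts["spam"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in LABELS:
            prior = (self.doc_counts[label] + 1) / (total_docs + len(LABELS))
            total_words = sum(self.word_counts[label].values())
            score = math.log(prior)
            for token in tokens:
                likelihood = (self.word_counts[label][token] + 1) / (
                    total_words + vocab_size
                )
                score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def process_stream(messages, window_size):
    """Test-then-train loop: classify each message, then add it to the window."""
    model = SlidingWindowNB(window_size)
    predictions = []
    for text, true_label in messages:
        predictions.append(model.predict(text))
        model.learn(text, true_label)  # assumes the label arrives afterwards
    return predictions
```

Calling `process_stream` with a list of `(text, label)` pairs and a chosen `window_size` returns one prediction per message. Because counts are updated incrementally on arrival and eviction, the per-message cost in this sketch depends on the message length and vocabulary size rather than on retraining over the whole window.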