The problem posed by unsolicited bulk email, also known as "spam," creates a need for reliable anti-spam filters. In this paper, we design a SOM-based sequence analysis (SBSA) system for the spam filtering task and evaluate its performance. The system combines a SOM-based sequential data representation with a kNN classifier designed to make use of word sequence information. We compare this system with a traditional baseline method, the naive Bayesian filter. Three different cost scenarios and suitable cost-sensitive measurements are employed. The results show that the SBSA system is superior to the naive Bayesian filter, particularly when the misclassification cost for non-spam messages is high.
Published in: Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on (Volume: 4)
Date of Conference: July 31 - Aug. 4, 2005