Skip to Main Content
The problem introduced by the unsolicited bulk emails, also known as "spam" generates a need for reliable anti-spam filters. In this paper, we design and compare the performance of a newly designed SOM based sequence analysis (SBSA) system for the spam filtering task. The system is based on a SOM based sequential data representation combined with a kNN classifier designed to make use of word sequence information. We compare this system with the traditional baseline method naive Bayesian filter. Three different cost scenarios and suitable cost-sensitive measurements are employed. The results show that the SBSA system is superior to the naive Bayesian filter, particularly when the misclassification cost for non-spam message is high.