Accurate spam filters, such as Bayesian filters, incur a large cost for off-line training (or learning) based on the analysis of a large corpus of email. This paper presents cascaded simple (i.e., rule-based) filters for accurate and lightweight detection of email spam. We cascade three filters that classify email based on, respectively, the fingerprints of message bodies, white and black lists of the email addresses in the From header, and the words specific to spam and to legitimate email in the Subject header. Our filters need no training; instead, they collect this information by themselves while they are working, and especially when the user notifies them of a false negative decision (spam classified as legitimate). Experiments with about 20,000 real-world emails show that the cascaded simple filters achieve a false negative rate of about 0.025 with no false positives (legitimate email classified as spam) and a throughput of about 90 emails per second.
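The cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`CascadedSpamFilter`, `report_false_negative`, `confirm_legitimate`), the SHA-256 fingerprint over a whitespace- and case-normalised body, the stage order (taken from the order in which the abstract lists the filters), and the exact update rules are all assumptions made for illustration.

```python
import hashlib
import re


class CascadedSpamFilter:
    """Hypothetical three-stage rule-based cascade; no off-line training."""

    def __init__(self):
        self.spam_fingerprints = set()  # hashes of bodies reported as spam
        self.blacklist = set()          # From addresses seen on reported spam
        self.whitelist = set()          # From addresses the user confirmed
        self.spam_words = set()         # Subject words specific to spam
        self.ham_words = set()          # Subject words seen in legitimate mail

    @staticmethod
    def fingerprint(body):
        # Assumption: collapse whitespace and case, then hash with SHA-256.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    @staticmethod
    def subject_words(subject):
        return set(re.findall(r"[a-z0-9']+", subject.lower()))

    def classify(self, sender, subject, body):
        sender = sender.lower()
        # Stage 1: fingerprint of the message body.
        if self.fingerprint(body) in self.spam_fingerprints:
            return "spam"
        # Stage 2: white and black lists of From addresses.
        if sender in self.whitelist:
            return "legitimate"
        if sender in self.blacklist:
            return "spam"
        # Stage 3: spam-specific words in the Subject header.
        if self.subject_words(subject) & self.spam_words:
            return "spam"
        # Nothing matched: let the message through.
        return "legitimate"

    def report_false_negative(self, sender, subject, body):
        # The user says a message classified as legitimate was spam.
        sender = sender.lower()
        self.spam_fingerprints.add(self.fingerprint(body))
        if sender not in self.whitelist:
            self.blacklist.add(sender)
        # Keep only words not already seen in legitimate subjects.
        self.spam_words |= self.subject_words(subject) - self.ham_words

    def confirm_legitimate(self, sender, subject):
        # Assumed hook for collecting white-list and legitimate-word data.
        words = self.subject_words(subject)
        self.whitelist.add(sender.lower())
        self.ham_words |= words
        self.spam_words -= words
```

In this sketch every stage is a set lookup, so classification cost does not grow with the amount of mail already seen. A message that no stage flags is passed as legitimate, which favours avoiding false positives over catching every spam message; this is a design choice of the sketch, not a claim about the paper.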
Date of Conference: 18-25 July 2010