We propose an unsupervised spam filter, Bulk Mail Traffic Classification (BMTC), that filters junk mail from the perspective of Internet service providers (ISPs). Our insight is that spammers generally send large volumes of unsolicited email with only minor alterations to a common message body, a pattern that becomes visible when traffic is observed at scale. In our approach, we classify email delivery traffic into categories by the similarity of message contents. We then decide whether a particular category is spam from the number of similar mails it contains, and filter it accordingly. We also design a simulator, two sketch data structures, and a series of algorithms to support the method. We applied BMTC to email traffic captured at one of the largest commercial ISPs in China, and the experimental results indicate that our method can reduce the number of emails by 70.4%. The results also show that BMTC is practical: it can be deployed in a high-volume traffic environment handling millions of emails every day with small memory consumption.
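The core idea of grouping near-identical messages and flagging a group once it grows large can be illustrated with a small sketch. The abstract does not say which similarity measure or which two sketch structures BMTC uses, so this example substitutes well-known techniques: MinHash signatures over word shingles with locality-sensitive banding to group near-duplicates, and a Count-Min sketch to count group sizes in fixed memory. The class names, parameters (`bands`, `rows`, `threshold`), and the threshold value are hypothetical illustrations, not the authors' design.

```python
import hashlib
import re
from typing import List, Set


def _h(data: str, seed: int) -> int:
    """Deterministic 64-bit hash of a string under an integer seed."""
    digest = hashlib.blake2b(f"{seed}:{data}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingles(text: str, k: int = 3) -> Set[str]:
    """Word k-shingles of a lowercased message body."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(sh: Set[str], num_hashes: int) -> List[int]:
    """MinHash signature: the minimum seeded hash for each hash function."""
    return [min(_h(s, seed) for s in sh) for seed in range(num_hashes)]


class CountMinSketch:
    """Fixed-memory frequency counter; estimates never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, key: str, row: int) -> int:
        return _h(key, 1000 + row) % self.width

    def add(self, key: str) -> None:
        for row in range(self.depth):
            self.table[row][self._cell(key, row)] += 1

    def estimate(self, key: str) -> int:
        return min(self.table[row][self._cell(key, row)] for row in range(self.depth))


class BulkMailCounter:
    """Hypothetical bulk-mail detector: near-duplicate bodies share LSH band
    keys, and a mail is flagged once its group's estimated size reaches
    `threshold`."""

    def __init__(self, bands: int = 8, rows: int = 4, threshold: int = 50):
        self.bands, self.rows, self.threshold = bands, rows, threshold
        self.cms = CountMinSketch()

    def _band_keys(self, body: str) -> List[str]:
        sig = minhash(shingles(body), self.bands * self.rows)
        return [
            f"{b}:" + ",".join(map(str, sig[b * self.rows:(b + 1) * self.rows]))
            for b in range(self.bands)
        ]

    def observe(self, body: str) -> bool:
        """Count this mail and return True if it belongs to a bulk group."""
        keys = self._band_keys(body)
        for key in keys:
            self.cms.add(key)
        # Largest estimated group of near-duplicates that this mail joins.
        count = max(self.cms.estimate(key) for key in keys)
        return count >= self.threshold
```

Feeding the counter many copies of one message that differ only in a trailing reference number eventually flags them as bulk, while a one-off personal message stays below the threshold. The Count-Min sketch keeps memory constant regardless of traffic volume, in the spirit of the small-memory goal stated above; its trade-off is that hash collisions can only inflate, never shrink, the estimated group size.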