Since the 1990s, when spam became a serious threat to email communication, spammers and anti-spam filters have been locked in a prolonged competition that continues today. To filter spam through semantic analysis of email content, many content-based anti-spam approaches have been proposed, such as text-based filtering and image-based filtering. However, spammers' tricks have also evolved quickly, and no single anti-spam approach is now capable enough to handle diverse real-world spam effectively. How to combine current techniques into more effective anti-spam systems has therefore become the major focus of our research. In this paper, we propose a novel hierarchical anti-spam framework that applies multiple techniques, including text classification, image processing, and Optical Character Recognition, in different layers to detect spam. We evaluate the proposed approach on several public spam corpora as well as our personal corpus, and verify its effectiveness in terms of filtering capacity and filtering performance.
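To make the idea of layered filtering concrete, the following is a minimal sketch of how a hierarchical filter could chain a text layer and an Optical Character Recognition (OCR) layer. It is not the framework evaluated in this paper: the names `Email`, `text_layer`, `make_ocr_layer`, and `classify`, the toy keyword list, and the two-hit threshold are all hypothetical, the keyword count merely stands in for a trained text classifier, and the image-processing layer is omitted. The OCR engine is passed in as a plain function so that the sketch does not depend on any particular library.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Email:
    text: str = ""
    images: List[bytes] = field(default_factory=list)


# A layer returns True (spam), False (ham), or None (undecided, defer to the next layer).
Layer = Callable[[Email], Optional[bool]]

# Hypothetical toy vocabulary; a real system would use a trained text classifier.
SPAM_WORDS = {"viagra", "lottery", "winner"}


def text_layer(email: Email) -> Optional[bool]:
    """Flag the message when at least two spam keywords appear in its text."""
    words = email.text.lower().split()
    if not words:
        return None  # no text to judge, so defer
    hits = sum(w.strip(".,!?") in SPAM_WORDS for w in words)
    return True if hits >= 2 else None


def make_ocr_layer(ocr: Callable[[bytes], str]) -> Layer:
    """Build a layer that runs the text layer on text extracted from each image."""
    def ocr_layer(email: Email) -> Optional[bool]:
        for image in email.images:
            if text_layer(Email(text=ocr(image))):
                return True
        return None
    return ocr_layer


def classify(email: Email, layers: List[Layer]) -> bool:
    """Run the layers in order and stop at the first one that reaches a verdict."""
    for layer in layers:
        verdict = layer(email)
        if verdict is not None:
            return verdict
    return False  # no layer flagged the message, so treat it as ham
```

A caller would build the layer list once, for example `[text_layer, make_ocr_layer(my_ocr)]` with `my_ocr` being whatever function turns image bytes into a string, and pass each incoming message to `classify`. Placing the cheaper text layer first means the more expensive OCR step runs only on messages the text layer could not decide.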