A new challenge in spam email detection is the emergence of image spam, which embeds advertising messages in attached images to defeat conventional text-based anti-spam technologies. New techniques are needed to filter these spam messages. In this paper, we propose a prototype system that automatically classifies an image directly as spam or ham. The proposed method extracts latent topics from images to train a binary classifier for detecting spam images, and achieves better detection accuracy than conventional anti-spam approaches. In addition, a detection cascade is proposed to further reduce the computational overhead of the spam filter. Our algorithm is experimentally evaluated on a public spam image dataset and is shown to significantly improve both detection accuracy and execution efficiency over the baseline approach.
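To illustrate the general idea of a detection cascade, the following is a minimal sketch and not the system described in this paper. It assumes that cheap stages run first and return a verdict only when they are confident, so that the costlier latent-topic classifier runs only on the remaining images. All names here (`threshold_stage`, `classify`, `topic_classifier`, and the features `cheap_score` and `topic_score`) are hypothetical, and the thresholds are arbitrary.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for precomputed image features (not from the paper).
Image = Dict[str, float]
# A stage returns "spam", "ham", or None when it cannot decide.
Stage = Callable[[Image], Optional[str]]


def threshold_stage(feature: str, ham_below: float, spam_above: float) -> Stage:
    """Build a cheap stage that decides only when a feature is clearly extreme."""
    def stage(image: Image) -> Optional[str]:
        value = image[feature]
        if value < ham_below:
            return "ham"
        if value > spam_above:
            return "spam"
        return None  # undecided: defer to the next, costlier stage
    return stage


def classify(
    image: Image,
    stages: List[Stage],
    final_classifier: Callable[[Image], str],
) -> Tuple[str, int]:
    """Run stages in order and stop at the first confident verdict.

    Returns the verdict and the index of the stage that produced it;
    the index equals len(stages) when the final classifier decided.
    """
    for index, stage in enumerate(stages):
        verdict = stage(image)
        if verdict is not None:
            return verdict, index
    return final_classifier(image), len(stages)


def topic_classifier(image: Image) -> str:
    """Placeholder for an expensive binary classifier over latent topics."""
    return "spam" if image["topic_score"] >= 0.5 else "ham"


# Assumed configuration: one cheap stage with arbitrary thresholds.
stages = [threshold_stage("cheap_score", ham_below=0.1, spam_above=0.9)]
```

In this sketch, an image whose `cheap_score` is clearly extreme never reaches `topic_classifier`, which is where a cascade of this kind would save computation. How much it saves depends on how many images the early stages can settle.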