In contrast to text spam, image spam is a variant devised to evade traditional text-based spam classification and filtering. Various approaches to image spam filtering have been proposed, each with its own advantages and drawbacks in terms of time cost and efficiency. In this paper, we propose a new approach based on Base64 encoding of image files and the n-gram technique for feature extraction. We transform each image into its Base64 representation and extract features from the resulting text with the n-gram technique. With these features we train a support vector machine (SVM), which proves effective and efficient at distinguishing spam images from legitimate ones. Using a personal image corpus shared online as input, experimental results show that our approach, compared with some existing feature extraction methods, achieves very high performance for image spam classification on basic measures such as accuracy, precision, and recall. Moreover, our approach demonstrates its practicality by requiring less running time for image spam classification than the other methods.
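The feature extraction step described above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the implementation evaluated in the paper: the function names `base64_ngrams` and `to_vector`, the choice of character bigrams (n = 2), the relative-frequency normalization, and the toy byte strings standing in for image files are all hypothetical.

```python
import base64
from collections import Counter


def base64_ngrams(data, n=2):
    """Count overlapping character n-grams in the Base64 text of `data` (bytes)."""
    text = base64.b64encode(data).decode("ascii")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts, vocabulary):
    """Map n-gram counts to relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[gram] / total for gram in vocabulary]


# Toy byte strings standing in for image file contents (hypothetical data).
samples = [
    b"\x89PNG\r\n\x1a\n" + bytes(range(32)),
    b"GIF89a" + b"\x00" * 32,
]

counts = [base64_ngrams(s, n=2) for s in samples]
# The vocabulary is the sorted union of all n-grams seen in the samples.
vocabulary = sorted(set().union(*counts))
vectors = [to_vector(c, vocabulary) for c in counts]
```

Each entry of `vectors` is a fixed-length feature vector that could be passed, together with spam/legitimate labels, to any SVM implementation. The choice of n trades vocabulary size against discriminative power, and the value used here is only illustrative.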