Spam emails increasingly convey their messages in human-readable images instead of text, which makes detection by conventional content filters difficult. However, the text contained in spam images can be very useful for spam detection. In this paper we propose an effective algorithm for text localization in spam images. The basic idea is to distinguish non-text edges from text edges using a selected set of edge features. Furthermore, we construct a corner detection algorithm based on a circular template to predict the corner points of the text in an image, which is crucial for text localization. Our evaluation on real-world data (spam image samples from the SpamArchive public dataset) shows that the algorithm identifies 96% of the text contained in spam images, with a precision of up to 97.6%.
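The abstract does not give the details of the circular-template corner detector, so the following is only a minimal sketch of one well-known technique of that kind (a SUSAN-style detector), not the authors' algorithm. It counts, for each pixel, how many pixels inside a circular template have a similar intensity to the center; a small count suggests a corner. The function names, the radius, the intensity threshold `t`, and the area ratio are all illustrative assumptions.

```python
# Sketch of a SUSAN-style circular-template corner test (assumed technique,
# not the paper's method). Images are plain lists of rows of gray values.

def circular_offsets(radius):
    """Return (dy, dx) offsets of all pixels inside a disc of the given radius."""
    return [
        (dy, dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def similar_area(image, y, x, offsets, t):
    """Count template pixels whose intensity is within t of the center pixel."""
    center = image[y][x]
    return sum(1 for dy, dx in offsets if abs(image[y + dy][x + dx] - center) <= t)


def corner_candidates(image, radius=3, t=20, ratio=0.5):
    """Return (row, col) positions whose similar area is below ratio * template size.

    Hypothetical defaults: radius, t, and ratio would need tuning on real data.
    Border pixels closer than `radius` to the image edge are skipped.
    """
    offsets = circular_offsets(radius)
    limit = ratio * len(offsets)
    height, width = len(image), len(image[0])
    candidates = []
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            if similar_area(image, y, x, offsets, t) < limit:
                candidates.append((y, x))
    return candidates


if __name__ == "__main__":
    # Synthetic test image: a bright 10x10 square on a dark 20x20 background.
    img = [[255 if 5 <= r <= 14 and 5 <= c <= 14 else 0 for c in range(20)]
           for r in range(20)]
    print(corner_candidates(img))
```

On a straight edge roughly half of the template matches the center pixel, while at a right-angle corner only about a quarter does, which is why a threshold near one half separates the two cases. The sketch returns every pixel that passes the test, so pixels adjacent to a true corner may also be reported; a practical detector would typically add non-maximum suppression before using the points for text localization.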
Date of Conference: 25-27 May 2008