Email spam is a persistent problem, and spammers are growing more dedicated and sophisticated. Even popular social media sites such as Facebook, Twitter, and Google Plus are not exempt from email spam, as they all interface with email systems. Driven by an “arms race” between spammers and spam filter developers, spam has changed continually over the years. In this paper, we analyze email spam trends on a dataset collected by the Spam Archive, which contains 5.1 million spam emails spanning 15 years (1998-2013). We apply statistical analysis to different headers in email messages (e.g., content type and length) and to items embedded in the message body (e.g., URL links and HTML attachments). We also investigate topic drift by applying topic modeling to the content of email spam. Moreover, we extract sender-to-receiver IP routing networks from email spam and perform network analysis on them. Our results show the dynamic nature of email spam over one and a half decades and demonstrate that the email spam business is not dying but becoming more capricious.
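To make the kind of per-message extraction described above concrete, the following is a minimal sketch, not the pipeline used in the paper. It assumes each message is available as raw RFC 822 text and that relay addresses appear as bracketed IPv4 literals in the Received headers; real Received headers are free-form, so this pattern will miss some hops. The function name `extract_features` and the regular expressions are hypothetical and introduced only for illustration.

```python
import re
from email import message_from_string

# Assumption: relay IPs appear as bracketed IPv4 literals, e.g. "[203.0.113.5]".
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
# Simplified URL pattern; it stops at whitespace, quotes, and angle brackets.
URL = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_features(raw_message):
    """Return header, body, and routing features for one raw email (hypothetical helper)."""
    msg = message_from_string(raw_message)

    # Each relay prepends its own Received header, so the list is newest-first.
    # Reversing it orders the IPs from the sender side toward the receiver.
    ips = []
    for header in reversed(msg.get_all("Received", [])):
        ips.extend(IPV4_IN_BRACKETS.findall(header))

    # Concatenate the decoded text of every non-multipart part.
    body_chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        body_chunks.append(payload.decode("utf-8", errors="replace"))
    body = "".join(body_chunks)

    return {
        "content_type": msg.get_content_type(),
        "body_length": len(body),
        "url_count": len(URL.findall(body)),
        # Directed edges between consecutive relays, usable as a routing graph.
        "hops": list(zip(ips, ips[1:])),
    }
```

Aggregating `content_type`, `body_length`, and `url_count` by year would give trend curves of the sort analyzed here, and the `hops` edges from all messages could be merged into a single directed graph for network analysis.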