Microblogging services attract people and companies who want to share their ideas and interests. Because the length of microblog messages is limited, users post URLs that link to other websites for detailed information. URLs that attract more attention therefore spread widely and represent popular information. However, not all of these URLs are useful. Many are spam URLs, posted by automated agents or pushed automatically by services on other websites. Based on the features of popular URLs, we divide them into four categories and propose a clustering and classification algorithm that distinguishes spam URLs from genuinely popular ones. We conduct comparative experiments on English (Twitter) and Chinese (Sina Weibo) messages. We conclude that more than half of the popular URLs are spam. Most spam URLs are pushed from other websites, and even genuinely popular URLs receive much of their attention from these pushing services. Although the proportions of URLs in Twitter and Sina Weibo messages differ, the characteristics of the spam URLs are similar. Our method detects spam URLs and their authors efficiently without manual annotation, and it is useful for both research on and business applications of microblogs.