Abstract:
This paper proposes a mutual detection mechanism between spam blogs and keywords for filtering spam blogs out of updated blog data. Spam blogs hinder the extraction of useful marketing information from the blogosphere because they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is that their articles are copied and pasted from normal blogs and news articles. Another is that the same article is posted multiple times to increase its exposure and the resulting advertising income. Because of these characteristics, spam blogs share common keywords, and such blogs and keywords can form large spam bi-clusters. Based on such clusters, this paper explains how to detect spam blogs and spam keywords through mutual filtering. It reports a maximum filtering precision of 95%, based on a preliminary experiment with approximately six months of updated blog data and a more detailed experiment with one day of data.
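The abstract does not spell out the filtering algorithm, so the following is only a minimal sketch of the general idea of mutual detection over a blog-keyword bipartite graph, not the authors' method. It assumes a toy fixed-point rule: a blog is flagged when it contains at least `blog_min_hits` known spam keywords, and a keyword is flagged when it appears in at least `kw_min_blogs` flagged blogs. The function name `mutual_filter`, both thresholds, the seed keyword set, and the sample data are all hypothetical.

```python
from collections import defaultdict


def mutual_filter(blog_keywords, seed_keywords, blog_min_hits=2, kw_min_blogs=2, max_iter=10):
    """Hypothetical mutual spam detection on a blog-keyword bipartite graph.

    blog_keywords: dict mapping a blog id to the set of keywords in its posts.
    seed_keywords: initial set of known spam keywords (an assumed input).
    Returns (spam_blogs, spam_keywords) once the two sets stop changing.
    """
    spam_keywords = set(seed_keywords)
    spam_blogs = set()

    # Invert the mapping once: keyword -> set of blogs that contain it.
    keyword_blogs = defaultdict(set)
    for blog, kws in blog_keywords.items():
        for kw in kws:
            keyword_blogs[kw].add(blog)

    for _ in range(max_iter):
        # Blogs sharing enough spam keywords are flagged as spam blogs.
        new_blogs = {b for b, kws in blog_keywords.items()
                     if len(kws & spam_keywords) >= blog_min_hits}
        # Keywords shared by enough spam blogs are flagged as spam keywords.
        new_keywords = spam_keywords | {kw for kw, blogs in keyword_blogs.items()
                                        if len(blogs & new_blogs) >= kw_min_blogs}
        if new_blogs == spam_blogs and new_keywords == spam_keywords:
            break  # fixed point: the spam bi-cluster no longer grows
        spam_blogs, spam_keywords = new_blogs, new_keywords

    return spam_blogs, spam_keywords


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    blogs = {
        "spam1": {"diet", "pills", "cheap", "loan"},
        "spam2": {"diet", "pills", "loan", "casino"},
        "spam3": {"pills", "loan", "casino"},
        "normal1": {"hiking", "camera", "diet"},
        "normal2": {"recipe", "garden"},
    }
    spam_blogs, spam_keywords = mutual_filter(blogs, {"pills", "loan"})
    print(sorted(spam_blogs))     # ['spam1', 'spam2', 'spam3']
    print(sorted(spam_keywords))  # ['casino', 'diet', 'loan', 'pills']
```

In this toy run, "diet" and "casino" are learned as spam keywords because several flagged blogs share them, yet `normal1` is not flagged even though it mentions "diet", since one spam keyword falls below the assumed `blog_min_hits` threshold. Requiring multiple shared items in both directions is what lets a dense bi-cluster reinforce itself while isolated overlaps are ignored.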
Date of Conference: 28-31 July 2009
Date Added to IEEE Xplore: 29 September 2009