Link spam techniques enable some pages to achieve higher-than-deserved rankings in search engine results, which degrades the quality of those results. Classification methods can detect link spam, and in any classification problem the choice of features plays an important role. This paper proposes using genetic programming (GP) to derive new features from existing link-based features and using the new features as inputs to SVM and GP classifiers for identifying link spam. Experiments on WEBSPAM-UK2006 show that classifiers using 10 newly generated features perform much better than classifiers using the original 41 link-based features and on par with classifiers using 138 transformed link-based features. The newly generated features can therefore improve link spam classification performance.
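As a rough illustration of the idea of deriving a feature with genetic programming, the following minimal sketch evolves one arithmetic expression over existing numeric features. It is not the paper's method: the operator set, the mutation-only search, the population settings, and the Fisher-style separation score used as fitness are all assumptions made for this example, and the toy data are invented. In a full pipeline the evolved expressions would be evaluated on every host and the resulting values passed to a classifier such as an SVM.

```python
import math
import operator
import random

def pdiv(a, b):
    """Protected division: return 1.0 when the divisor is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Assumed operator set for the expression trees (not taken from the paper).
OPS = [operator.add, operator.sub, operator.mul, pdiv]

def random_tree(rng, n_features, depth=3):
    """Random expression tree; a leaf is the index of an original feature."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    return (rng.choice(OPS),
            random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))

def evaluate(tree, row):
    """Value of the derived feature for one sample (a list of floats)."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return op(evaluate(left, row), evaluate(right, row))

def _mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) * (x - m) for x in xs) / len(xs)

def fitness(tree, X, y):
    """Assumed fitness: squared gap between class means over summed variances."""
    vals = [evaluate(tree, row) for row in X]
    m1, v1 = _mean_var([v for v, label in zip(vals, y) if label == 1])
    m0, v0 = _mean_var([v for v, label in zip(vals, y) if label == 0])
    score = (m1 - m0) * (m1 - m0) / (v1 + v0 + 1e-9)
    return score if math.isfinite(score) else 0.0  # discard overflowed trees

def mutate(tree, rng, n_features):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(rng, n_features, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_features), right)
    return (op, left, mutate(right, rng, n_features))

def evolve_feature(X, y, rng, pop_size=30, generations=20):
    """Keep the fitter half each generation and refill it with mutants."""
    n = len(X[0])
    pop = [random_tree(rng, n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng, n)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=lambda t: fitness(t, X, y))

if __name__ == "__main__":
    # Invented toy data: two link-based features per host, label 1 = spam.
    X = [[8.0, 1.0], [9.0, 1.5], [7.0, 1.0], [1.0, 2.0], [2.0, 3.0], [1.5, 2.5]]
    y = [1, 1, 1, 0, 0, 0]
    best = evolve_feature(X, y, random.Random(0))
    print("fitness of evolved feature:", fitness(best, X, y))
```

Running the search several times with different seeds would yield several derived features, which is one plausible way to obtain a small feature set like the 10 features mentioned above.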
Published in:
Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on (Volume: 1)
Date of Conference: 28-29 March 2011