Abstract:
Shark-search and HITS are classical algorithms for topic crawlers. To address topic drift in the HITS algorithm and noise links in the Shark-search algorithm, a new topic crawling strategy that improves on both algorithms is proposed. First, the depth of a given web page is analyzed and the page is segmented into blocks with the VIPS (vision-based page segmentation) algorithm. To predict relevant links, a multi-granularity Shark-search algorithm is combined with a query-dependent HITS algorithm. This compensates for the Shark-search algorithm's lack of a global view and reduces noise links, and it also eliminates the topic drift of the HITS algorithm. Experiments show that, compared with the traditional strategies, the new search strategy further improves the total query rate and the query information retrieved.
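
As a rough illustration of the link-analysis half of the strategy, the following Python sketch runs plain HITS by power iteration on a small link graph and then blends the resulting authority score with a text-relevance score. The abstract does not state how the two scores are combined, so `link_priority` and its `weight` parameter are hypothetical, and the relevance value stands in for whatever a Shark-search-style scorer would produce; VIPS segmentation and the multi-granularity scoring are not modelled here.

```python
import math


def hits(graph, iterations=50):
    """Plain HITS by power iteration.

    graph maps each page to the list of pages it links to.
    Returns (hub, authority) dictionaries with L2-normalised scores.
    """
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for u, outs in graph.items():
            for v in outs:
                new_auth[v] += hub[u]
        # Hub of a page: sum of the authority scores of pages it links to.
        new_hub = {n: sum(new_auth[v] for v in graph.get(n, [])) for n in nodes}
        # Normalise so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        auth = {n: x / a_norm for n, x in new_auth.items()}
        hub = {n: x / h_norm for n, x in new_hub.items()}
    return hub, auth


def link_priority(text_relevance, authority, weight=0.5):
    """Hypothetical linear blend of a content score and a link-analysis score.

    This is an assumption for illustration, not the paper's formula.
    """
    return weight * text_relevance + (1.0 - weight) * authority


if __name__ == "__main__":
    # Toy graph: pages "a" and "b" both link to page "c".
    toy_graph = {"a": ["c"], "b": ["c"], "c": []}
    hub_scores, auth_scores = hits(toy_graph)
    # 0.8 is a made-up text relevance for the link pointing at "c".
    print(link_priority(0.8, auth_scores["c"]))
```

Restricting `graph` to pages already judged relevant to the query is one way a query-dependent variant might limit topic drift, though that filtering step is left out of this sketch.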
Date of Conference: 12-13 December 2015
Date Added to IEEE Xplore: 12 May 2016