The Crawling Strategy of Shark-Search Algorithm Based on Multi Granularity | IEEE Conference Publication | IEEE Xplore

Abstract:

Shark-search and HITS are classical algorithms for topic crawlers. To address topic drift in the HITS algorithm and noise links in the Shark-search algorithm, a new topic crawling strategy that improves on both algorithms is proposed. First, the depth of the web page is analyzed, and the VIPS (vision-based page segmentation) algorithm is applied to divide a given web page into blocks. To predict relevant links, a multi-granularity Shark-search algorithm is adopted and combined with the query-dependent HITS algorithm. This compensates for the Shark-search algorithm's lack of a "global" view and reduces noise links, while also eliminating the "topic drift" phenomenon of the HITS algorithm. Experiments show that, compared with the traditional strategy, the new search strategy further improves the total query rate and the query information.
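
For orientation, the link-scoring step of the classical Shark-search algorithm, as it is commonly described, can be sketched as follows. This is a minimal illustration, not the multi-granularity variant proposed in the paper, and it does not cover VIPS segmentation or HITS. The function names, the bag-of-words cosine similarity, and the default values of `delta`, `beta`, and `gamma` are assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine_sim(query: str, text: str) -> float:
    """Bag-of-words cosine similarity (an assumed stand-in for the relevance measure)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def potential_score(
    query: str,
    parent_text: str,
    parent_inherited: float,
    anchor_text: str,
    anchor_context: str,
    delta: float = 0.5,  # decay factor (assumed value)
    beta: float = 0.8,   # weight of anchor text vs. its context (assumed value)
    gamma: float = 0.5,  # weight of inherited vs. neighborhood score (assumed value)
) -> float:
    """Score an unvisited link found on a parent page."""
    # Inherited score: a relevant parent passes on its own relevance,
    # an irrelevant parent passes on the (decayed) score it inherited.
    parent_rel = cosine_sim(query, parent_text)
    inherited = delta * parent_rel if parent_rel > 0 else delta * parent_inherited

    # Neighborhood score: anchor text plus the text surrounding the anchor.
    anchor = cosine_sim(query, anchor_text)
    context = 1.0 if anchor > 0 else cosine_sim(query, anchor_context)
    neighborhood = beta * anchor + (1 - beta) * context

    return gamma * inherited + (1 - gamma) * neighborhood
```

Links with a higher potential score are fetched first. Because every link on a page is scored only from its parent and its local anchor text, irrelevant links on a relevant page can still receive a high inherited score, which is the noise-link problem the abstract refers to.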
Date of Conference: 12-13 December 2015
Date Added to IEEE Xplore: 12 May 2016
Conference Location: Hangzhou, China
