Web Page's Blocks Based Topical Crawler

3 Author(s)
Weifeng Zhang (Coll. of Comput., Nanjing Univ. of Posts & Telecommun., Nanjing, China); Baowen Xu; Hong Lu

Link context is widely used in information retrieval and classification. In topical (vertical) crawlers, link context is used to predict whether a link leads to content related to the target topic. The context of a link is usually taken to be its anchor text, the full text of the containing web page, or the words within a fixed-size window around the link. Each choice has a drawback: the full page text often covers too many themes, anchor text is too brief, and the right window size is hard to determine. In this paper, we propose determining the scope of link context by web page block segmentation, on the premise that links in the same block are more closely related. A corner-classification-based neural network is used to represent and filter the topics. Our experiments show that web crawlers using block-based link context achieve better accuracy, and that the corner classification neural network is suitable for representing and filtering topics.
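The idea of block-scoped link context can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a fixed set of HTML tags (`BLOCK_TAGS`, a hypothetical choice) marks block boundaries, and it replaces the paper's corner classification neural network with a plain keyword-overlap score, used here only to show how the extracted context might feed a relevance filter. All function and class names are illustrative.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit "blocks". The paper's segmentation may differ.
BLOCK_TAGS = {"div", "td", "li", "p", "section", "article"}


class BlockContextParser(HTMLParser):
    """Assign each link to its innermost enclosing block and collect that block's text."""

    def __init__(self):
        super().__init__()
        # The root pseudo-block catches text and links outside any block tag.
        self.stack = [{"tag": None, "text": [], "links": []}]
        self.done = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": [], "links": []})
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.stack[-1]["links"].append(href)

    def handle_endtag(self, tag):
        # Close blocks up to the matching open tag; ignore stray end tags.
        if tag in BLOCK_TAGS and any(b["tag"] == tag for b in self.stack):
            while True:
                block = self.stack.pop()
                self.done.append(block)
                if block["tag"] == tag:
                    break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["text"].append(data.strip())


def link_contexts(html):
    """Map each link URL to the text of the block that contains it."""
    parser = BlockContextParser()
    parser.feed(html)
    parser.close()
    blocks = parser.done + parser.stack
    return {href: " ".join(b["text"]) for b in blocks for href in b["links"]}


def relevance(context, topic_terms):
    """Fraction of topic terms found in the context (a stand-in for the paper's classifier)."""
    words = {w.strip(".,;:!?").lower() for w in context.split()}
    return len(words & topic_terms) / len(topic_terms)


page = """
<div><p>Topical crawler tutorials and focused crawling papers:
<a href="/crawlers">read more</a></p></div>
<div><p>Cheap flights and hotel deals <a href="/travel">read more</a></p></div>
"""
contexts = link_contexts(page)
topic = {"crawler", "crawling", "focused"}
scores = {href: relevance(ctx, topic) for href, ctx in contexts.items()}
```

Both links in the sample page share the anchor text "read more", so anchor text alone cannot separate them, and the full page text mixes two themes. Scoping the context to the enclosing block keeps each link's surrounding words apart, which is the property the abstract attributes to block-based link context.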

Published in:

Service-Oriented System Engineering, 2008. SOSE '08. IEEE International Symposium on

Date of Conference:

18-19 Dec. 2008