
Improved focused crawling using Bayesian object based approach

3 Author(s)
Ghozia, A. (Comput. Eng. & Sci. Dept., Menofiya Univ., Menouf); Sorour, H.; Aboshosha, A.

The rapid growth of the World Wide Web has made it difficult for general-purpose search engines, e.g. Google and Yahoo, to retrieve most of the relevant results in response to user queries. Vertical search engines, each specialized in a specific topic, have therefore become vital. A vertical search engine is built with the help of a focused crawler, which traverses the Web, selecting pages relevant to a predefined topic and ignoring those that are off topic. A crawling strategy guides the focused crawler toward the relevant pages. In this paper, a new crawling strategy is presented that helps in building a vertical search engine. With this strategy, the crawler stays focused on the user's interests within the topic. We build a model that describes the features of Web pages that distinguish relevant Web documents from irrelevant ones. This is accomplished through a supervised learning process: the Web page is treated as an object with a set of features, and the features' values determine the relevance of the Web page through a Bayesian model. Results from practical experiments demonstrated the efficiency of the proposed crawling strategy.
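The idea of scoring a page "as an object having a set of features" through a Bayesian model can be illustrated with a minimal sketch. The abstract does not specify the features or the exact form of the model, so everything below is an assumption made for illustration: the feature names (`title`, `anchor`, `parent`), the naive Bayes independence assumption, the Laplace smoothing, and the helper names `PageRelevanceModel` and `rank_frontier` are hypothetical and are not taken from the paper.

```python
import heapq
import math
from collections import defaultdict


class PageRelevanceModel:
    """Naive Bayes over binary page features.

    Assumption: the abstract only says "a Bayesian model"; the naive
    (feature-independence) form and the smoothing are illustrative choices.
    """

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.class_counts = {True: 0, False: 0}
        # feature_counts[label][name]: labeled pages in which the feature is present
        self.feature_counts = {True: defaultdict(int), False: defaultdict(int)}

    def train(self, examples):
        """Supervised learning step: examples are (features, is_relevant) pairs."""
        for features, relevant in examples:
            self.class_counts[relevant] += 1
            for name in self.feature_names:
                if features.get(name, False):
                    self.feature_counts[relevant][name] += 1

    def _log_joint(self, features, label):
        total = sum(self.class_counts.values())
        # Laplace-smoothed class prior
        log_p = math.log((self.class_counts[label] + 1) / (total + 2))
        for name in self.feature_names:
            # Laplace-smoothed P(feature present | label)
            p_present = (self.feature_counts[label][name] + 1) / (
                self.class_counts[label] + 2
            )
            present = features.get(name, False)
            log_p += math.log(p_present if present else 1.0 - p_present)
        return log_p

    def relevance(self, features):
        """Posterior probability that a page with these features is relevant."""
        log_rel = self._log_joint(features, True)
        log_irr = self._log_joint(features, False)
        shift = max(log_rel, log_irr)  # subtract the max for numerical stability
        rel = math.exp(log_rel - shift)
        irr = math.exp(log_irr - shift)
        return rel / (rel + irr)


def rank_frontier(model, candidates):
    """Order candidate URLs from most to least probably relevant.

    candidates maps a URL to its feature dict; a focused crawler would
    fetch URLs in this order instead of plain breadth-first order.
    """
    heap = [(-model.relevance(feats), url) for url, feats in candidates.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


if __name__ == "__main__":
    # Toy labeled pages, invented for this sketch (not data from the paper).
    training = [
        ({"title": True, "anchor": True, "parent": True}, True),
        ({"title": True, "anchor": False, "parent": True}, True),
        ({"title": False, "anchor": True, "parent": True}, True),
        ({"title": False, "anchor": False, "parent": False}, False),
        ({"title": False, "anchor": True, "parent": False}, False),
        ({"title": False, "anchor": False, "parent": True}, False),
    ]
    model = PageRelevanceModel(["title", "anchor", "parent"])
    model.train(training)
    frontier = {
        "http://example.org/on-topic": {"title": True, "anchor": True, "parent": True},
        "http://example.org/off-topic": {"title": False, "anchor": False, "parent": False},
    }
    print(rank_frontier(model, frontier))
```

Ordering the crawl frontier by posterior relevance is one plausible way to keep a crawler "focused"; the strategy actually evaluated in the paper may differ in its features, model structure, and scheduling.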

Published in:

National Radio Science Conference, 2008 (NRSC 2008)

Date of Conference:

18-20 March 2008