Skip to Main Content
The rapid growth of the World-Wide-Web made it difficult for general purpose search engines, e.g. Google and Yahoo, to retrieve most of the relevant results in response to the user queries. A vertical search engine specialized in a specific topic became vital. Building vertical search engines is accomplished by the help of a focused crawler. A focused crawler traverses the Web selecting out relevant pages to a predefined topic and neglecting those out of concern. The focused crawler is guided toward those relevant pages through a crawling strategy. In this paper, a new crawling strategy is presented that helps building a vertical search engine. With this strategy, the crawler is kept focused to the user interests toward the topic. We build a model that describes the Web pages' features that distinguish relevant Web documents from those that are irrelevant. This is accomplished in the form of a supervised learning process, the Web page is treated as an object having a set of features, and the features' values determine the relevancy of the Web page through a Bayesian model. Results from practical experiments proved the efficiency of the proposed crawling strategy.