Analysis of Duplicated Web Pages Identification Methods in Search Engine

2 Author(s)
Fei Duan ; Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China ; Yan Zheng

Identifying duplicated web pages is one of the steps involved in a search engine, and how well it is done affects the search engine's performance. This article studies and summarizes the basic processing steps and key technologies of duplicated web page identification in search engines. On the basis of some experiments, we analyze and compare the performance of several basic algorithms and summarize their advantages and disadvantages. Finally, we propose using distributed computing, such as Hadoop, to identify duplicated web pages more efficiently when processing massive amounts of internet information in a search engine.

Published in:

Database Technology and Applications (DBTA), 2010 2nd International Workshop on

Date of Conference:

27-28 Nov. 2010