The Research of Web Page De-duplication Based on Web Pages Reshipment Statement

Authors:

Min-yan Wang (Coll. of Comput. Sci. & Inf. Eng., Zhejiang Gongshang Univ., Hangzhou, China); Dong-sheng Liu

The Web page de-duplication module is an important part of a search engine system: by filtering the Web pages downloaded by the engine's crawler and eliminating duplicates, it improves the engine's performance and quality. Starting from the source of duplicated Web pages, namely reshipment (republishing a page from another site), this paper proposes a de-duplication method that extracts information, including the original Web site and the Web page title, and eliminates duplicated Web pages based on feature codes. Experiments show that the method achieves satisfactory results in eliminating duplicated Web pages at large scale.
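The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a reshipment statement looks like "Source: <site>", that the feature code is a SHA-256 hash of the normalized source site plus the title, and that pages arrive as dictionaries with `url`, `title`, and `body` keys. The names `SOURCE_PATTERN`, `feature_code`, and `deduplicate` are hypothetical.

```python
import hashlib
import re

# Assumed (hypothetical) form of a reshipment statement, e.g. "Source: example.com".
SOURCE_PATTERN = re.compile(
    r"(?:Source|Reprinted from|From)\s*[:：]\s*(\S+)", re.IGNORECASE
)


def normalize(text):
    """Collapse whitespace and lowercase so trivial variations compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def feature_code(title, body):
    """Return a hash of (original site, title), or None if no statement is found."""
    match = SOURCE_PATTERN.search(body)
    if match is None:
        return None
    source = match.group(1).rstrip(".,;")  # drop trailing punctuation
    key = normalize(source) + "|" + normalize(title)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Keep the first page for each feature code; keep pages without a code."""
    seen = set()
    unique = []
    for page in pages:
        code = feature_code(page["title"], page["body"])
        if code is not None:
            if code in seen:
                continue  # same original site and title already kept
            seen.add(code)
        unique.append(page)
    return unique


pages = [
    {"url": "a", "title": "Flood Warning Issued",
     "body": "Text. Source: news.example.com"},
    {"url": "b", "title": "flood  warning issued",
     "body": "Text again. source: News.Example.com."},
    {"url": "c", "title": "Other", "body": "No statement here."},
]
result = deduplicate(pages)
```

In this sketch, pages `a` and `b` share a feature code, so only `a` is kept. Page `c` carries no reshipment statement and is passed through unchanged; a real system would need a separate content-based check for such pages.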

Published in:

2009 First International Workshop on Database Technology and Applications

Date of Conference:

25-26 April 2009