A novel web page duplication detection framework

4 Author(s)
Zhongming Han ; Sch. of Comput. Sci. & Inf. Eng., Beijing Technol. & Bus. Univ., Beijing, China ; Dagao Duan ; Hongzhi Liu ; Jianzhi Sun

There are many redundant Web pages on the Internet. In this paper, we present a novel multilayer framework for detecting duplicated Web pages, based on tag statistics and text similarity comparison. We propose two algorithms for detecting similar text paragraphs and implement the framework. The experimental results show that our approach achieves high performance, which means that duplicated Web pages can be detected efficiently using only tag statistics and text comparison.
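The abstract does not describe the framework's layers or its two paragraph-detection algorithms in detail, so the following is only an illustrative sketch of the general idea of combining tag statistics with text comparison. It is not the authors' method. Every name here (`page_stats`, `cosine`, `shingles`, `jaccard`, `is_duplicate`), the choice of cosine similarity over tag counts, the word-shingle Jaccard measure, and both thresholds are assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class _PageStats(HTMLParser):
    """Collects start-tag counts and text nodes from an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

    def handle_data(self, data):
        # Keep every non-empty text node (script/style content is not filtered).
        if data.strip():
            self.text_parts.append(data.strip())


def page_stats(html):
    """Return (tag_counts, joined_text) for one page."""
    parser = _PageStats()
    parser.feed(html)
    parser.close()
    return parser.tags, " ".join(parser.text_parts)


def cosine(a, b):
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def shingles(text, k=3):
    """Set of k-word shingles; short texts yield a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(html_a, html_b, tag_threshold=0.9, text_threshold=0.8):
    """Layer 1: cheap tag-statistic filter. Layer 2: text comparison.

    Both thresholds are hypothetical values, not taken from the paper.
    """
    tags_a, text_a = page_stats(html_a)
    tags_b, text_b = page_stats(html_b)
    if cosine(tags_a, tags_b) < tag_threshold:
        return False
    return jaccard(shingles(text_a), shingles(text_b)) >= text_threshold
```

In this sketch the tag comparison acts as an inexpensive first filter, so the costlier text comparison runs only on pairs of pages whose markup structure already looks alike.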

Published in:

Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on

Date of Conference:

6-8 Nov. 2009