An approach based on extracted data for wrapper maintenance

3 Author(s)
Wei Luo (School of Computer Science & Technology, Shandong University, Jinan, China); Qingzhong Li; Yanhui Ding

Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. Web data extraction involves two main issues: wrapper generation and wrapper maintenance. In this paper, we propose a novel approach to automatic wrapper maintenance. It is based on the observation that, despite various changes to a page, many important features of the page are preserved, such as the syntactic patterns, annotations, and content of the extracted data items. The approach uses these preserved features to locate the desired values in the changed pages, so that the wrappers can then be repaired. Experiments on real Web sites show that the proposed approach can effectively maintain wrappers so that they continue to extract the desired data accurately.
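The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below shows how the three preserved features it names (syntactic patterns, annotations, and the content of previously extracted values) could be combined to relocate each field in a changed page and update the wrapper. The page model (`Node` with a path, a nearby label, and its text), the `FieldProfile` record, the equal-weight scoring, and the 1.5 threshold are all hypothetical assumptions, not the authors' method.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Node:
    # Hypothetical, simplified page model: one text-bearing node of the page.
    path: str   # location of the node, e.g. an XPath-like string
    label: str  # text of the nearest annotation/label, e.g. "Price:"
    text: str   # the node's own text content


@dataclass
class FieldProfile:
    # Features recorded while the old wrapper still worked (assumed model).
    pattern: str        # syntactic pattern of the values, as a regex
    annotation: str     # label that accompanied the values ("" if none)
    samples: list[str]  # values extracted before the page changed


def score(node: Node, prof: FieldProfile) -> float:
    """Sum of three equally weighted (assumed) feature scores."""
    s = 0.0
    # 1. Syntactic pattern: does the text have the same shape as old values?
    if re.fullmatch(prof.pattern, node.text.strip()):
        s += 1.0
    # 2. Annotation: is the node still labelled the same way?
    if prof.annotation and prof.annotation.lower() in node.label.lower():
        s += 1.0
    # 3. Content: similarity to the closest previously extracted value.
    s += max((SequenceMatcher(None, node.text, v).ratio() for v in prof.samples),
             default=0.0)
    return s


def repair(wrapper: dict[str, str], profiles: dict[str, FieldProfile],
           page: list[Node], threshold: float = 1.5) -> dict[str, str]:
    """Return a copy of the wrapper with each field re-pointed to its best match.

    A field keeps its old path when no node scores at least `threshold`.
    """
    repaired = dict(wrapper)
    for field, prof in profiles.items():
        best = max(page, key=lambda n: score(n, prof), default=None)
        if best is not None and score(best, prof) >= threshold:
            repaired[field] = best.path
    return repaired


# Usage with invented data: the page layout changed, so the old paths broke.
old_wrapper = {"title": "/html/body/h1",
               "price": "/html/body/div[1]/span",
               "isbn": "/html/body/div[1]/b"}
profiles = {
    "title": FieldProfile(r"[A-Z].*", "", ["Data Mining: Concepts and Techniques"]),
    "price": FieldProfile(r"\$\d+\.\d{2}", "Price", ["$39.99", "$52.50"]),
    "isbn": FieldProfile(r"\d{13}", "ISBN", ["9780123814791"]),
}
changed_page = [
    Node("/html/body/div[2]/h2", "", "Data Mining Concepts"),
    Node("/html/body/div[3]/span", "Price:", "$45.00"),
    Node("/html/body/div[3]/em", "Stock:", "12 left"),
]
repaired = repair(old_wrapper, profiles, changed_page)
print(repaired)
```

In this toy run, `title` and `price` are re-pointed to their new locations, while `isbn`, which no longer appears on the page, keeps its old path because no node reaches the threshold. A real system would also need to turn the chosen nodes back into the wrapper's own extraction rules.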

Published in:

2010 5th International Conference on Pervasive Computing and Applications (ICPCA)

Date of Conference:

1-3 Dec. 2010