Data extraction from Web data sources

Author:
J. Robinson, Dept. of Comput. Sci., Essex Univ., Colchester, UK

This paper explains the basic data structures used in a new page analysis technique for creating wrappers (data extractors) for the result pages that Web sites produce in response to user queries submitted via Web page forms. The key structure, called a tpGrid, is a representation of the Web page that is easier to analyse than the raw HTML code. The analysis looks for repetition patterns of sets of tagSets, which are defined in the paper.
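
The abstract does not define the tpGrid or tagSets, so the following is only a rough illustration of the general idea of finding repeated tag structure in a result page, not the paper's method. It assumes a simplified stand-in: each text node is labelled with the path of open tags above it, and paths that occur more than once are treated as candidate record fields. The names `TagPathCollector`, `repeated_paths`, and `min_repeats` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Record the stack of open tags above each non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.rows = []  # list of (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.rows.append((tuple(self.stack), text))


def repeated_paths(html, min_repeats=2):
    """Return (tag_path, text) pairs whose tag path occurs at least min_repeats times."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(path for path, _ in collector.rows)
    repeated = {path for path, n in counts.items() if n >= min_repeats}
    return [(path, text) for path, text in collector.rows if path in repeated]
```

On a page with one heading and a list of similar items, this keeps the item fields and drops the heading, because only the item paths repeat. A real wrapper generator would need a richer representation than bare tag paths, which is presumably what the tpGrid provides.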

Published in:

Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on

Date of Conference:

30 Aug.-3 Sept. 2004