Exploiting the Information Web

2 Author(s)
Gregg, D.G. (Bus. Sch., Colorado Univ., Denver, CO); Walczak, S.

The World Wide Web is an increasingly important data source for business decision making; however, extracting information from the Web remains one of the challenging issues related to Web business intelligence applications. To use heterogeneous Web data for decision making, documents containing relevant data must be located, and the data of interest within the documents must be identified and extracted. Currently, most automatic information extraction systems can only cope with a limited set of document formats or do not adapt well to changes in document structure. As a result, many real-world data sources with complex document structures cannot be consistently interpreted using a single information extraction system. This paper presents an adaptive information extraction system prototype that combines multiple information extraction approaches to allow more accurate and resilient data extraction for a wide variety of Web sources. The Amorphic Web information extraction system prototype can locate data of interest based on domain knowledge or page structure, can automatically generate a wrapper for a data source, and can detect when the structure of a Web-based resource has changed and respond by searching the updated resource to locate the desired data. The prototype Amorphic information extraction system demonstrated improved information extraction accuracy for the four different extraction scenarios examined when compared with traditional data extraction approaches.
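
The adaptive behavior described in the abstract (use a wrapper while the page structure holds, detect when it breaks, fall back to a domain-knowledge search, and regenerate the wrapper) might be sketched roughly as follows. This is an illustrative sketch only and not the Amorphic implementation: the names Wrapper, locate_by_domain_knowledge, generate_wrapper and adaptive_extract are hypothetical, and the assumptions that a wrapper is a pair of literal delimiter strings and that domain knowledge is a label plus a regular expression are made here purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Wrapper:
    """Hypothetical structure-based wrapper: literal text around the value."""
    prefix: str
    suffix: str

    def extract(self, page: str) -> Optional[str]:
        start = page.find(self.prefix)
        if start == -1:
            return None  # expected structure not found
        start += len(self.prefix)
        if not self.suffix:
            return page[start:].strip()
        end = page.find(self.suffix, start)
        if end == -1:
            return None
        return page[start:end].strip()


def locate_by_domain_knowledge(
    page: str, label: str, value_pattern: str
) -> Optional[Tuple[int, int]]:
    """Assumed fallback: find the label, then the first value-shaped text after it."""
    label_match = re.search(re.escape(label), page, flags=re.IGNORECASE)
    if label_match is None:
        return None
    value_match = re.search(value_pattern, page[label_match.end():])
    if value_match is None:
        return None
    offset = label_match.end()
    return offset + value_match.start(), offset + value_match.end()


def generate_wrapper(page: str, start: int, end: int, context: int = 12) -> Wrapper:
    """Build a wrapper from the text immediately around a located value."""
    return Wrapper(page[max(0, start - context):start], page[end:end + context])


def adaptive_extract(
    page: str, wrapper: Optional[Wrapper], label: str, value_pattern: str
) -> Tuple[Optional[str], Optional[Wrapper]]:
    """Try the wrapper first; on failure, search the page and regenerate it."""
    if wrapper is not None:
        value = wrapper.extract(page)
        # A value that no longer looks right is treated as a structure change.
        if value is not None and re.fullmatch(value_pattern, value):
            return value, wrapper
    span = locate_by_domain_knowledge(page, label, value_pattern)
    if span is None:
        return None, wrapper
    start, end = span
    return page[start:end], generate_wrapper(page, start, end)
```

In use, the first call would pass None as the wrapper so that one is generated from the page; later calls pass the returned wrapper back in, and a new wrapper comes back whenever the old one stops matching. The literal-delimiter wrapper is deliberately simplistic; the paper's own wrapper representation is not specified in the abstract.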

Published in:

Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on (Volume: 37, Issue: 1)