Differential Analysis on Deep Web Data Sources

Authors: Tantan Liu (Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA); Fan Wang; Jiedan Zhu; G. Agrawal

The growing use of the Internet in everyday life has been creating new challenges and opportunities for data mining techniques. A relatively new trend on the Internet is the deep web. Because a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in the data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity, and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning step to summarize the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.
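The abstract does not define a differential rule formally, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes two sources that describe the same entities, compares a numeric target value per entity, and uses a hash table keyed by an attribute-value pattern to accumulate the differences. The function name `mine_differential_rules`, the thresholds `min_support` and `min_diff`, and the sample flight records are all hypothetical.

```python
from collections import defaultdict


def mine_differential_rules(source_a, source_b, target, min_support=2, min_diff=10.0):
    """Return {(attribute, value): mean(target_a - target_b)} for patterns
    whose mean difference is at least min_diff in absolute value.

    Illustrative only: single-attribute patterns, mean difference as the score.
    """
    # Hash table: (attribute, value) pattern -> list of per-entity differences.
    diffs = defaultdict(list)

    # Only entities present in both sources can be compared.
    for entity in source_a.keys() & source_b.keys():
        rec_a, rec_b = source_a[entity], source_b[entity]
        delta = rec_a[target] - rec_b[target]
        for attr, value in rec_a.items():
            # Use only descriptive attributes on which both sources agree.
            if attr != target and rec_b.get(attr) == value:
                diffs[(attr, value)].append(delta)

    rules = {}
    for pattern, deltas in diffs.items():
        # Drop patterns with too few supporting entities or too small a gap.
        if len(deltas) >= min_support:
            mean = sum(deltas) / len(deltas)
            if abs(mean) >= min_diff:
                rules[pattern] = mean
    return rules


# Hypothetical data: the same four flights as listed by two travel sites.
site_a = {
    "f1": {"airline": "X", "stops": 0, "price": 300.0},
    "f2": {"airline": "X", "stops": 1, "price": 250.0},
    "f3": {"airline": "Y", "stops": 0, "price": 400.0},
    "f4": {"airline": "Y", "stops": 1, "price": 200.0},
}
site_b = {
    "f1": {"airline": "X", "stops": 0, "price": 270.0},
    "f2": {"airline": "X", "stops": 1, "price": 230.0},
    "f3": {"airline": "Y", "stops": 0, "price": 400.0},
    "f4": {"airline": "Y", "stops": 1, "price": 205.0},
}

rules = mine_differential_rules(site_a, site_b, target="price")
```

With this sample data, the sketch reports that flights on airline X and non-stop flights are on average more expensive on the first site, while the remaining patterns fall below the assumed threshold.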

Published in: Data Mining Workshops (ICDMW), 2010 IEEE International Conference on

Date of Conference: 13 Dec. 2010