By Topic

Schema matching using duplicates

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Bilke, A. ; Technische Univ. Berlin, Germany ; Naumann, F.

Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Now, our schema matching algorithm is able to identify corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.

Published in:

Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on

Date of Conference:

5-8 April 2005