15th International Conference on Scientific and Statistical Database Management. SSDBM 2003

In most cases, unique identifiers are required to join data from different databases. If globally unique keys are absent or corrupted, supplementing data from one source with data extracted from another becomes difficult. The main question is: does a given record relate to the same entity as another record, or not? This leads to a classification problem with at least two classes: identical and not identical. Classifying pairs of records requires a three-step procedure. First, suitable common properties (attributes) of the data are defined for all of the different sources. Second, to allow comparisons, the values of the records are transformed to these common properties. Finally, the classification is performed on an almost finite subset, namely the range of an appropriate comparison function. Different classification techniques can be applied, such as association rules, classification trees, neural networks, or record linkage techniques. The unknown parameters of the classification rules are computed by sampling and supervised learning. Unbiased error rates can be estimated, for instance, by cross-validation. Special attention must be paid to controlling the computational complexity of the identification process. The approach is illustrated with data from two library databases and from the planned German administrative record census, which is intended to replace a regular census.
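The three-step procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the attribute names, the sample records, the exact-match agreement bits, and the function names (`normalize`, `compare`, `fit_weights`, `classify`) are all hypothetical. The classifier uses Fellegi-Sunter style log-likelihood-ratio weights as one possible record linkage technique, with parameters estimated from a small labeled sample.

```python
from math import log2

# Step 1 (assumed): common attributes available in both sources.
ATTRIBUTES = ("name", "birth_year", "city")

def normalize(record):
    """Step 2: transform a source record to the common attributes."""
    return {a: str(record.get(a, "")).strip().lower() for a in ATTRIBUTES}

def compare(rec_a, rec_b):
    """Comparison function: one agreement bit per attribute.

    Its range is finite (2**len(ATTRIBUTES) patterns). Missing or empty
    values are counted as disagreement.
    """
    a, b = normalize(rec_a), normalize(rec_b)
    return tuple(int(a[k] != "" and a[k] == b[k]) for k in ATTRIBUTES)

def fit_weights(labeled_pairs, smoothing=1.0):
    """Supervised learning: estimate (agree, disagree) weights per attribute.

    m = P(agree | identical), u = P(agree | not identical), both with
    additive smoothing so that neither probability is exactly 0 or 1.
    """
    n = len(ATTRIBUTES)
    counts = {True: [0] * n, False: [0] * n}
    totals = {True: 0, False: 0}
    for rec_a, rec_b, is_match in labeled_pairs:
        totals[is_match] += 1
        for i, bit in enumerate(compare(rec_a, rec_b)):
            counts[is_match][i] += bit
    weights = []
    for i in range(n):
        m = (counts[True][i] + smoothing) / (totals[True] + 2 * smoothing)
        u = (counts[False][i] + smoothing) / (totals[False] + 2 * smoothing)
        weights.append((log2(m / u), log2((1 - m) / (1 - u))))
    return weights

def classify(rec_a, rec_b, weights, threshold=0.0):
    """Step 3: classify a pair from its comparison pattern."""
    pattern = compare(rec_a, rec_b)
    score = sum(w[0] if bit else w[1] for bit, w in zip(pattern, weights))
    return "identical" if score > threshold else "not identical"

# Hypothetical labeled sample: (record from source A, record from source B, label).
TRAINING_PAIRS = [
    ({"name": "Anna Schmidt", "birth_year": 1970, "city": "Berlin"},
     {"name": " anna schmidt", "birth_year": "1970", "city": "berlin"}, True),
    ({"name": "Max Meyer", "birth_year": 1985, "city": "Bonn"},
     {"name": "Max Meyer", "birth_year": 1985, "city": "Köln"}, True),
    ({"name": "Anna Schmidt", "birth_year": 1970, "city": "Berlin"},
     {"name": "Max Meyer", "birth_year": 1985, "city": "Berlin"}, False),
    ({"name": "Eva Klein", "birth_year": 1990, "city": "Hamburg"},
     {"name": "Tom Wolf", "birth_year": 1962, "city": "Bonn"}, False),
]

WEIGHTS = fit_weights(TRAINING_PAIRS)
```

With the weights fitted, `classify(record_a, record_b, WEIGHTS)` labels a new pair. In practice the error rates of such a rule would be estimated on held-out pairs, for example by cross-validation, and candidate pairs would be restricted (for instance by grouping records on a cheap key) to keep the number of comparisons manageable.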

Published in:

Scientific and Statistical Database Management, 2003. 15th International Conference on

Date of Conference:

9-11 July 2003