
An Algorithm for Detecting Similar Data in Replicated Databases Using Multi Criteria Decision Making

3 Author(s)
Sorkhabi, V.B. ; Dept. of Comput. Eng., Azad Univ. Shabestar Branch, Tabriz, Iran ; Derakhshi, M.-R.F. ; Shahamfar, H.

Duplicate data can cause many problems in all types of databases, especially distributed and replicated databases. Such data undermine consistency and introduce redundancy, two important concerns in databases. Databases or replicas may contain similar records that look different but refer to the same real-world entity. Reasons for such duplicates include entry errors, non-standard abbreviations, differences in the details of various database schemas, packet loss, and noisy environments. This paper proposes an approach for detecting duplicate or similar data across replicas in distributed or replicated databases, that is, records that are faulty or noisy and are therefore treated as different data. A multi-criteria decision-making algorithm is employed for this purpose. To detect identical records, priorities are first defined for the fields, and then the percentage of similarity between records is evaluated. The algorithm's time overhead is reduced by using a special order of priorities. The multi-criteria decision-making algorithm is also used to decide how to combine records with each other and which record is the complete and true one. An instance-based learning approach is employed to learn how to set priorities for the various fields, create a uniform schema, and find each field's appropriate match in the other replica.
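The abstract does not give the scoring formula, so the following is only a minimal sketch of how priority-weighted record similarity with priority-ordered comparison might look. The field names, weights, threshold, the use of `difflib.SequenceMatcher` as the per-field similarity measure, and the early-exit rule are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher

# Hypothetical field priorities (weights sum to 1.0); not taken from the paper.
PRIORITIES = {"name": 0.5, "email": 0.3, "city": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two field values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a, rec_b, priorities=PRIORITIES, threshold=0.8):
    """Weighted similarity, comparing high-priority fields first.

    Returns (score, is_duplicate). Stops early once the remaining
    fields can no longer lift the score to the threshold (an assumed
    way of exploiting the priority order to save comparisons).
    """
    score = 0.0
    remaining = sum(priorities.values())
    for field, weight in sorted(priorities.items(), key=lambda kv: -kv[1]):
        score += weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        remaining -= weight
        if score + remaining < threshold:
            return score, False  # threshold unreachable; skip the rest
    return score, score >= threshold


if __name__ == "__main__":
    a = {"name": "John Smith", "email": "j.smith@example.com", "city": "Tabriz"}
    b = {"name": "Jon Smith", "email": "j.smith@example.com", "city": "Tabriz"}
    print(record_similarity(a, b))
```

Comparing the highest-weighted fields first means that clearly different records are usually rejected after one or two field comparisons, which is one plausible reading of the claim that ordering the priorities reduces time overhead.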

Published in:

Environmental and Computer Science, 2009. ICECS '09. Second International Conference on

Date of Conference:

28-30 Dec. 2009