I. Introduction
Similarity measures play a vital role in nearly every field of science and engineering. A similarity measure can be described as a process for determining the degree of similarity between two objects [1], [2]. The identification of similar database records is an important entity matching application. The term ‘duplicate record detection’ describes the process of recognising records that represent the same real-world entity in a given database. The difficulty is that duplicated records may not share the same record key, so various methods have been employed to locate and cleanse erroneous or duplicated records in a typical dataset. Duplicated or erroneous data can result from several factors, including data entry errors, such as typing the name “John” as “Jon”; missing validation checks or constraints, such as an age value of 320; and differing representation conventions, such as 22 E, 7th St vs. 22 East Seventh Street. An additional problem may also result from structural differences between database sources [3].
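As an illustration of the kind of string similarity underlying such comparisons (a minimal sketch, not the specific measure adopted in this paper), a normalised edit distance can flag “John” and “Jon” as likely duplicates while keeping unrelated names apart. The function names and the use of Levenshtein distance here are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise edit distance to a similarity score in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical comparison of the name variants mentioned above.
print(similarity("John", "Jon"))   # 0.75 -- likely the same entity
print(similarity("John", "Mary"))  # 0.0  -- clearly distinct
```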