An Unsupervised Algorithm for Learning Blocking Schemes

Abstract:

Pairwise comparison of data objects is a requisite step in many data mining applications, but it has quadratic complexity. In applications such as record linkage, blocking methods may be applied to reduce the cost: the data is first partitioned into a set of blocks, and pairwise comparisons are computed only for pairs within each block. To date, blocking methods have required either that the blocking scheme be given or that training data be provided so that a supervised learning algorithm can determine a blocking scheme. In either case, a domain expert is required. This paper develops an unsupervised method for learning a blocking scheme for tabular data sets. The method is divided into two phases. In the first phase, a weakly labeled training set is generated automatically in time linear in the number of records in the entire dataset. The second phase casts blocking key discovery as a Fisher feature selection problem. The approach is compared to a state-of-the-art supervised blocking key discovery algorithm on three real-world databases and achieves favorable results.
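To make the cost argument concrete, the following is a minimal sketch of blocking in general, not the paper's algorithm. The toy records, the `zip_key` blocking key, and the helper names `block_records` and `candidate_pairs` are all assumptions made for illustration. The sketch partitions records by a key and enumerates comparisons only inside each block.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, blocking_key):
    """Group record ids by the value that the blocking key assigns to each record."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[blocking_key(rec)].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Return the set of record-id pairs that fall in the same block."""
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical toy data; records 3 and 4 are deliberately a likely duplicate
# that the chosen key separates.
records = {
    1: {"name": "John Smith", "zip": "75201"},
    2: {"name": "Jon Smith", "zip": "75201"},
    3: {"name": "Mary Jones", "zip": "75201"},
    4: {"name": "Mary Jones", "zip": "10001"},
    5: {"name": "Alan Turing", "zip": "10001"},
}


def zip_key(rec):
    """Hypothetical blocking key: the record's zip code."""
    return rec["zip"]


blocks = block_records(records, zip_key)
pairs = candidate_pairs(blocks)
all_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With this key, only pairs inside the two zip-code blocks are compared, but the likely duplicate (3, 4) is never compared because its records land in different blocks. That trade-off between reducing comparisons and retaining true matches is why the choice of blocking scheme matters.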

Published in: 2013 IEEE 13th International Conference on Data Mining

Date of Conference: 07-10 December 2013

Date Added to IEEE Xplore: 03 February 2014

Electronic ISBN: 978-0-7695-5108-1

DOI: 10.1109/ICDM.2013.60

Conference Location: Dallas, TX, USA
