Cleaning Relations Using Knowledge Bases | IEEE Conference Publication | IEEE Xplore

Abstract:

We study the data cleaning problem of detecting and repairing wrong relational data, as well as marking correct data, using well-curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rule that can make actionable decisions on relational data by building connections between a relation and a KB. The key novelty is that a DR simultaneously models two opposite semantics of a relation using types and relationships in a KB: the positive semantics, which explains how attribute values are linked to each other in correct tuples, and the negative semantics, which indicates how wrong attribute values are connected to other, correct attribute values within the same tuple. Naturally, a DR can mark the values in a tuple as correct if the tuple matches the positive semantics, and it can detect and repair an error if the tuple matches the negative semantics. We study fundamental problems associated with DRs, such as rule generation and rule consistency. We present efficient algorithms, based on rule order selection and inverted indexes, for applying DRs to clean a relation. Extensive experiments on both real-world and synthetic datasets verify the effectiveness and efficiency of applying DRs in practice.
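To make the positive and negative semantics concrete, here is a minimal Python sketch under simplifying assumptions: a toy in-memory KB, a hypothetical `DetectiveRule` with one evidence column and one target column, and a single KB predicate for each semantics. It is not the authors' rule language or algorithm; real DRs can involve more attributes and types, and the `(subject, predicate)` lookup index below only gestures at the role of the paper's inverted indexes rather than reproducing their design. All entities, predicates, and names (`KB_TYPES`, `KB_TRIPLES`, `apply_rule`, `hasCapital`, `largestCity`) are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy KB (not from the paper): entity types plus
# (subject, predicate, object) triples.
KB_TYPES = {
    "Australia": {"Country"}, "Canberra": {"City"}, "Sydney": {"City"},
    "Canada": {"Country"}, "Ottawa": {"City"}, "Toronto": {"City"},
}
KB_TRIPLES = [
    ("Australia", "hasCapital", "Canberra"),
    ("Australia", "largestCity", "Sydney"),
    ("Canada", "hasCapital", "Ottawa"),
    ("Canada", "largestCity", "Toronto"),
]

# Simple lookup index: (subject, predicate) -> set of objects.
INDEX = defaultdict(set)
for s, p, o in KB_TRIPLES:
    INDEX[(s, p)].add(o)


def has_type(entity, kb_type):
    return kb_type in KB_TYPES.get(entity, set())


def objects(subject, predicate):
    # .get avoids inserting empty entries into the defaultdict on lookup.
    return INDEX.get((subject, predicate), set())


@dataclass(frozen=True)
class DetectiveRule:
    """Hypothetical two-column rule: an evidence column and a target column."""
    evidence_col: str
    evidence_type: str
    target_col: str
    target_type: str
    positive_pred: str  # how a correct target value links to the evidence value
    negative_pred: str  # how a wrong target value links to the evidence value


def apply_rule(rule, row):
    """Return (row, status), status in {'correct', 'repaired', 'error', 'unknown'}."""
    e, v = row[rule.evidence_col], row[rule.target_col]
    if not (has_type(e, rule.evidence_type) and has_type(v, rule.target_type)):
        return row, "unknown"  # the rule does not apply to this tuple
    if v in objects(e, rule.positive_pred):
        return row, "correct"  # positive semantics matched: mark as correct
    if v in objects(e, rule.negative_pred):
        # Negative semantics matched: v is an error. Repair only if the
        # positive relationship yields exactly one well-typed candidate.
        fixes = [o for o in objects(e, rule.positive_pred)
                 if has_type(o, rule.target_type)]
        if len(fixes) == 1:
            return {**row, rule.target_col: fixes[0]}, "repaired"
        return row, "error"
    return row, "unknown"


rule = DetectiveRule("Country", "Country", "Capital", "City",
                     "hasCapital", "largestCity")
print(apply_rule(rule, {"Country": "Australia", "Capital": "Sydney"}))
# → ({'Country': 'Australia', 'Capital': 'Canberra'}, 'repaired')
print(apply_rule(rule, {"Country": "Canada", "Capital": "Ottawa"}))
# → ({'Country': 'Canada', 'Capital': 'Ottawa'}, 'correct')
```

Requiring exactly one well-typed candidate before repairing is a conservative choice made for this sketch: when the KB offers several candidates, the tuple is flagged as an error without guessing a fix, and values the KB does not know at all are left as "unknown".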
Date of Conference: 19-22 April 2017
Date Added to IEEE Xplore: 18 May 2017
Electronic ISSN: 2375-026X
Conference Location: San Diego, CA, USA
Department of Computer Science, Tsinghua University
Qatar Computing Research Institute, HBKU
Department of Computer Science, Tsinghua University
Department of Computer Science, Tsinghua University