D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Bilgic, M.
Licamele, L.
Getoor, L.
Shneiderman, B.
Maryland Univ., College Park, MD
This paper appears in: Visual Analytics Science And Technology, 2006 IEEE Symposium On
Publication Date: Oct. 31 2006-Nov. 2 2006
On page(s): 43-50
Location: Baltimore, MD
ISBN: 1-4244-0591-2
INSPEC Accession Number: 9211077
Digital Object Identifier: 10.1109/VAST.2006.261429
Current Version Published: 2006-12-26
Abstract
Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to identify visually. D-Dupe users resolve ambiguities either by merging nodes or by marking them as distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned and therefore should not have contained duplicates; despite this, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface.
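The iterative, chained nature of resolution described in the abstract can be illustrated with a minimal Python sketch. This is not D-Dupe's implementation: the abstract does not specify its algorithms, so the scoring function (a weighted mix of name similarity and co-author overlap), the weight alpha, the function names, and the toy co-authorship data below are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Attribute similarity: character-level ratio of two names (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(s, t):
    """Relational similarity: overlap of two neighbor (co-author) sets."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def rank_pairs(names, coauthors, distinct, alpha=0.5):
    """Rank candidate duplicate pairs, skipping pairs already marked distinct."""
    scored = []
    for u, v in combinations(sorted(names), 2):
        if frozenset((u, v)) in distinct:
            continue
        relational = jaccard(coauthors[u] - {v}, coauthors[v] - {u})
        score = alpha * name_similarity(names[u], names[v]) + (1 - alpha) * relational
        scored.append((score, u, v))
    return sorted(scored, reverse=True)


def merge(names, coauthors, keep, drop):
    """Merge node `drop` into node `keep`, rewiring its co-author links."""
    for n in coauthors.pop(drop):
        coauthors[n].discard(drop)
        if n != keep:
            coauthors[n].add(keep)
            coauthors[keep].add(n)
    del names[drop]


# Hypothetical toy network: node id -> author name, node id -> co-author ids.
names = {1: "L. Getoor", 2: "Lise Getoor", 3: "M. Bilgic",
         4: "Mustafa Bilgic", 5: "B. Shneiderman"}
coauthors = {1: {3, 5}, 2: {4, 5}, 3: {1}, 4: {2}, 5: {1, 2}}
distinct = set()  # pairs the user has marked as distinct, stored as frozensets

# Before any merge, nodes 3 and 4 share no co-authors.
before = jaccard(coauthors[3], coauthors[4])

# The user merges the two Getoor references ...
merge(names, coauthors, keep=1, drop=2)

# ... and nodes 3 and 4 now share their only co-author, so they rise in the ranking.
after = jaccard(coauthors[3], coauthors[4])
top = rank_pairs(names, coauthors, distinct)[0]
```

In this toy data, merging the two Getoor references gives the two Bilgic references a common co-author, which is the kind of chained decision the abstract describes. Marking a pair as distinct would correspond to adding frozenset((u, v)) to the distinct set so that the pair is no longer proposed.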