Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, few methods conducive to efficient large-scale analysis exist. In this paper, we focus on the task of re-identification. Re-identification in the context of dynamic networks is a matching problem that involves comparing the behavior of networked entities across two time periods. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question of why we are able to re-identify entities on real-world dynamic networks. Our contribution is twofold. First, we address the challenge of scale with a framework for matching that does not require pairwise comparisons to ascertain the similarity scores between networked entities. Second, we show that our method is robust against missing links but less tolerant of noise. Using our framework, we provide a performance estimate for re-identification on networks based solely on their degree distribution and dynamics. This work has significant implications for re-identification problems where scale is a challenge, as well as for problems where false negatives (e.g., fraudulent consumers who are not labeled as fraudulent) cannot be observed.