t-Closeness: Privacy Beyond k-Anonymity and l-Diversity
Ninghui Li
Tiancheng Li
Suresh Venkatasubramanian
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN;
This paper appears in: Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Publication Date: 15-20 April 2007
On page(s): 106-115
Location: Istanbul
ISBN: 1-4244-0803-2
INSPEC Accession Number: 9551415
Digital Object Identifier: 10.1109/ICDE.2007.367856
Current Version Published: 2007-06-04
Abstract
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute. In this paper we show that l-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
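The three requirements named in the abstract can be compared on a toy table. The sketch below is illustrative only and is not taken from the paper's text here: the records, the attribute names (`zip`, `salary`), and the helper names are hypothetical. It also assumes an ordered sensitive attribute whose ground distance between the i-th and j-th values is |i - j| / (m - 1), under which the earth mover distance reduces to a sum of absolute running differences, and it reads "well-represented" in its simplest form, as l distinct values.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_ids):
    """Group records that agree on every quasi-identifying attribute."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[a] for a in quasi_ids)].append(rec)
    return list(classes.values())


def distribution(records, sensitive, domain):
    """Relative frequency of each domain value among the given records."""
    counts = Counter(rec[sensitive] for rec in records)
    return [counts[v] / len(records) for v in domain]


def emd_ordered(p, q):
    """Earth mover distance on an ordered domain of m values.

    Assumes ground distance |i - j| / (m - 1) between the i-th and j-th
    values, so the result lies in [0, 1].
    """
    m = len(p)
    if m < 2:
        return 0.0
    running, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi      # mass that must still be moved rightwards
        total += abs(running)
    return total / (m - 1)


def is_k_anonymous(classes, k):
    """Every equivalence class holds at least k records."""
    return all(len(c) >= k for c in classes)


def is_distinct_l_diverse(classes, sensitive, l):
    """Every equivalence class holds at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in c}) >= l for c in classes)


def worst_distance(records, classes, sensitive, domain):
    """Largest distance between a class distribution and the table's."""
    overall = distribution(records, sensitive, domain)
    return max(
        emd_ordered(distribution(c, sensitive, domain), overall)
        for c in classes
    )


# Hypothetical generalized table: salaries in thousands, ZIP codes masked.
groups = {"476**": [3, 4, 5], "479**": [6, 8, 11], "4760*": [7, 9, 10]}
records = [{"zip": z, "salary": s} for z, vals in groups.items() for s in vals]
domain = sorted({r["salary"] for r in records})

classes = equivalence_classes(records, ["zip"])
worst = worst_distance(records, classes, "salary", domain)

print(is_k_anonymous(classes, 3))                    # True
print(is_distinct_l_diverse(classes, "salary", 3))   # True
print(worst <= 0.2)                                  # False: worst is about 0.375
```

In this toy table every class is 3-anonymous and has three distinct salaries, yet the first class contains only the three lowest salaries, so an observer learns that anyone in it earns comparatively little. The distance check flags exactly that class, which is the kind of gap between l-diversity and attribute disclosure that the abstract describes. The threshold 0.2 is an arbitrary choice for the illustration.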