In light of developments in technology for analyzing personal data, public concerns regarding privacy are rising. Often a data holder, such as a hospital or bank, needs to share person-specific records in such a way that the identities of the individuals who are the subjects of the data cannot be determined. Generalization techniques such as k-anonymity, l-diversity, and t-closeness were proposed as solutions to the problem of privacy breach, at the cost of information loss. Very few papers have dealt with personalized generalization. However, all these methods were developed to solve the external linkage problem that results in sensitive attribute disclosure. It is very easy to prevent sensitive attribute disclosure by simply not publishing quasi-identifiers and sensitive attributes together. The only reason to publish generalized quasi-identifiers and sensitive attributes together is to support data mining tasks that consider both types of attributes in the database. Our goal in this paper is to eliminate the privacy breach (how much an adversary learns from the published data) and increase the utility (the accuracy of the data mining task) of a released database. This is achieved by transforming part of the quasi-identifier and personalizing the sensitive attribute values. Our experiments, conducted on datasets from the UCI Machine Learning Repository, demonstrate an incremental gain in data mining utility while preserving privacy to a great extent.
Date of Conference: 6-7 March 2009