Data mining is a technique for identifying patterns and trends in large collections of data. The collected data may contain personal information whose disclosure can violate the privacy of individuals, which makes privacy a critical issue in data mining. Existing privacy-preserving data mining techniques work well for relational data with a fixed schema and low dimensionality. In this paper, an anonymization method for sparse, high-dimensional transactional data is proposed. It uses an anonymized group formation strategy that relies on efficient Nearest-Neighbor (NN) search in high-dimensional spaces. The problem of high dimensionality is addressed by anonymizing each group of transactions according to its relevant Quasi-Identifiers (QIDs). The privacy requirement is fulfilled by partitioning the transactional dataset into disjoint sets of transactions, referred to as anonymized groups. Each group contains the QIDs and the frequencies of the sensitive items. The proposed NN search algorithm maximizes the quality of each individual group and can be used for sparse, high-dimensional data. However, the number of groups formed is proportional to the number of sensitive items, which opens the way to inference attacks. To overcome this problem, anonymization can be integrated with anatomization, in which the same data is published as two distinct tables: the quasi-identifier table and the sensitive table. This enhancement would prevent inference attacks, which are the major drawback of the NN search algorithm.
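The group formation and two-table publication described above can be sketched as follows. This is a minimal illustration and not the algorithm of the paper: it assumes Jaccard distance over the non-sensitive items as the similarity measure, uses a brute-force neighbor search in place of the efficient NN search the paper relies on, and treats the group size `k` as an assumed parameter. The function names `jaccard_distance`, `form_groups`, and `anatomize`, as well as the toy items, are hypothetical.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Distance between two item sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def form_groups(transactions, sensitive_items, k):
    """Greedily partition transaction indices into disjoint groups of size k.

    Assumption: similarity is measured on the quasi-identifier (QID) items
    only, i.e. each transaction with its sensitive items removed.
    """
    qid = [t - sensitive_items for t in transactions]
    unassigned = set(range(len(transactions)))
    # Seed groups with transactions that carry a sensitive item first.
    seeds = sorted(
        unassigned,
        key=lambda i: (not (transactions[i] & sensitive_items), i),
    )
    groups = []
    for seed in seeds:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        # Brute-force nearest-neighbor search over the remaining transactions.
        neighbours = sorted(
            unassigned,
            key=lambda i: (jaccard_distance(qid[seed], qid[i]), i),
        )[: k - 1]
        unassigned.difference_update(neighbours)
        groups.append([seed] + neighbours)
    # Only the last group can be short; fold it into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


def anatomize(transactions, sensitive_items, groups):
    """Publish the grouped data as a QID table and a sensitive table."""
    qid_table = []        # rows: (group id, sorted QID items of one transaction)
    sensitive_table = []  # rows: (group id, sensitive item, frequency in group)
    for gid, members in enumerate(groups):
        counts = Counter()
        for i in members:
            qid_table.append((gid, sorted(transactions[i] - sensitive_items)))
            counts.update(transactions[i] & sensitive_items)
        for item, n in sorted(counts.items()):
            sensitive_table.append((gid, item, n))
    return qid_table, sensitive_table


# Toy data (hypothetical): six transactions, two sensitive items.
transactions = [
    {"bread", "milk", "pregnancy_test"},
    {"bread", "milk"},
    {"beer", "chips"},
    {"beer", "chips", "antidepressant"},
    {"bread", "butter"},
    {"beer", "soda"},
]
sensitive = {"pregnancy_test", "antidepressant"}

groups = form_groups(transactions, sensitive, k=3)
qid_table, sensitive_table = anatomize(transactions, sensitive, groups)
print(groups)
print(sensitive_table)
```

Within each published group, a sensitive item is linked only to the group identifier and a frequency, so an observer who recognizes a transaction by its QID items can associate it with a sensitive item only with probability bounded by that frequency divided by the group size.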
Published in: 2012 International Conference on Recent Trends in Information Technology (ICRTIT)
Date of Conference: 19-21 April 2012