Hierarchical DP-K Anonymous Data Publishing Model Based on Binary Tree | IEEE Conference Publication | IEEE Xplore


Abstract:

As data opening and sharing accelerate in the power industry, the risk of sensitive data leakage is also gradually increasing. Privacy protection in data release is one of the central issues in research on leakage control, and k-anonymity has been a prominent topic in privacy protection research in recent years. In this paper, we propose a hierarchical DP-K anonymous data release model based on binary tree clustering, which builds on existing k-anonymity schemes and aims to minimize information loss. A binary tree-based clustering algorithm (BTCA) is proposed to group similar data records into the same equivalence class, which improves the clustering result, reduces the information loss caused by releasing the anonymous data set, and improves data availability. For the clustered anonymous data set, different privacy budgets are then allocated according to the privacy rights of the quasi-identifier attributes, and a differential privacy noise-addition mechanism provides hierarchical protection for data with different degrees of sensitivity, which enhances the privacy of the data.
Date of Conference: 19-22 February 2023
Date Added to IEEE Xplore: 29 March 2023
Conference Location: Pyeongchang, Korea, Republic of

I. Introduction

As a new factor of production in the era of the digital economy, data creates value through flow and sharing, but at the same time it faces a huge risk of leakage. Power data is a vital tangible asset of enterprises; it covers regulation data, marketing data, production equipment data, and human and property management data. With the continuous development of the company's external data service business, power data applications and value mining, such as data release, have become a rigid demand. Data security and privacy protection are the core issues in this process. Power grid data contains a large amount of users' personal privacy information and commercially sensitive data, and its leakage will not only bring huge economic losses but also incur legal liability. Therefore, data needs to be anonymized when it is opened, to protect sensitive data from leakage. With the development of the ubiquitous power Internet of Things, open power data will play a huge role in information perception, interconnection, and open sharing. As data becomes more liquid and big data and cloud computing technologies bring their advantages into full play, the energy and value of power data will be mined and released more deeply [1]. Opening power data allows power grid enterprises to make maximum use of their equipment, network, customer, and platform advantages and to build a shared power energy ecosystem. On the other hand, privacy problems in data released for information sharing and data mining are increasingly prominent. If an attacker mounts a reverse-inference attack on the released data and restores the original data, private enterprise operation data and account information will be leaked or damaged, causing huge economic losses to enterprises. How to share information while effectively protecting private, sensitive information from leakage is therefore particularly important.
When releasing data, publishers must protect the sensitive information in the data set without destroying the information and data characteristics that users of the data require. There are currently three main methods for protecting private data: (1) K-anonymity. The most widely used technique today is k-anonymity, in which companies hide unique identifiers such as ID numbers or names when publishing user data, to protect users' privacy. However, this approach cannot resist link attacks and background knowledge attacks; (2) Data perturbation. This method uses scrambling, distortion, randomization, and similar techniques to disturb the original data so that it loses its authenticity and integrity. The attacker cannot obtain the real data, but the availability of the data is greatly reduced; (3) Data encryption. This method combines cryptography with data security, using homomorphic encryption and asymmetric encryption to build distributed secure computation that supports privacy protection, for example secure multiparty computation. Its problem is that it requires too many computing resources and its cost is very high. Thus, all three methods have problems and cannot effectively prevent privacy leakage during data opening. As the country has continued to strengthen the regulation of data security and personal privacy in recent years, including the Data Security Law, which took effect on September 1, 2021, ensuring that data is used legally and compliantly and preventing the leakage of important enterprise data and personal privacy is a major challenge facing the current digital transformation and the mining of power data value.
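To make the k-anonymity property in method (1) concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the BTCA algorithm or the DP-K model proposed in this paper: the record fields (`age`, `zip`, `usage_kwh`), the helper names, and the fixed-width age generalization are all hypothetical. A table is treated as k-anonymous when every combination of quasi-identifier values occurs in at least k records.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a range of the given width (illustrative only)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"age": 23, "zip": "100", "usage_kwh": 310},
    {"age": 27, "zip": "100", "usage_kwh": 295},
    {"age": 31, "zip": "200", "usage_kwh": 410},
    {"age": 38, "zip": "200", "usage_kwh": 388},
]
quasi_identifiers = ["age", "zip"]

# Every raw (age, zip) pair is unique, so the raw table is not 2-anonymous.
raw_ok = is_k_anonymous(records, quasi_identifiers, k=2)

# After coarsening age into decades, each equivalence class holds two records.
generalized = [generalize_age(r) for r in records]
generalized_ok = is_k_anonymous(generalized, quasi_identifiers, k=2)
```

In this toy table the generalization step is what causes information loss: exact ages are replaced by ranges. Grouping similar records before generalizing, as a clustering step would, is one way to keep those ranges narrow.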
