High-level overview of the anonymization process (i.e., converting raw data → de-identified data) used for PPDP, which is the main focus of this study.
Abstract:
With the continuous increase in avenues of personal data generation, privacy protection has become a hot research topic, resulting in various proposed mechanisms to address this social issue. The main technical solutions for guaranteeing a user’s privacy are encryption, pseudonymization, anonymization, differential privacy (DP), and obfuscation. Despite the success of other solutions, anonymization has been widely used in commercial settings for privacy preservation because of its algorithmic simplicity and low computing overhead. It facilitates unconstrained analysis of published data that DP and other recent techniques cannot offer, and it is a mainstream solution for responsible data science. In this paper, we present a comprehensive analysis of clustering-based anonymization mechanisms (CAMs) that have been recently proposed to preserve both privacy and utility in data publishing. We systematically categorize the existing CAMs based on heterogeneous types of data (tables, graphs, matrices, etc.), and we present an up-to-date, extensive review of existing CAMs and the metrics used for their evaluation. We discuss the superiority and effectiveness of CAMs over traditional anonymization mechanisms. We highlight the significance of CAMs for privacy preservation in different computing paradigms, such as social networks, the internet of things, cloud computing, AI, and location-based systems. Furthermore, we present various representative CAMs that compromise individual privacy rather than safeguarding it. In addition, this article provides extended knowledge about each technique (e.g., key assertion(s), strengths, weaknesses, clustering methods used in the anonymization process, and percentage improvements in quantitative results), giving a clear view of how much this topic has been investigated thus far and which research gaps call for pertinent solutions in the near future.
Finally, we discuss the technical challenges of applying CAM...
Published in: IEEE Access (Volume: 10)