Privacy-Preserving Personal Identifiable Information (PII) Label Detection Using Machine Learning | IEEE Conference Publication | IEEE Xplore

Abstract:

In today's data-driven world, protecting PII is essential to safeguarding personal privacy. PII tags serve as crucial markers for identifying and processing sensitive information within databases. However, the authentication and registration of PII tags can be time-consuming and error-prone. To address this challenge, we propose a method for privacy-controlled PII tag detection that combines machine learning (ML) with regular expressions. The proposed approach leverages various techniques, including feature engineering, adaptive learning, and machine learning, to extract meaningful patterns and relationships from data. By training the model on large datasets that encompass diverse PII elements such as names, addresses, phone numbers, email addresses, and social security numbers, we enable it to learn and classify PII identifiers effectively across different documents. A key advantage of the proposed method is that it automates the detection of PII identifiers, reducing the reliance on manual interpretation and minimizing the potential for human error. By integrating machine learning algorithms, we empower organizations to efficiently identify and process sensitive information present in their databases, bolstering privacy protection measures. Moreover, this approach facilitates the development of scalable and accurate solutions for privacy-based PII tag search. This advancement paves the way for enhanced data privacy across the enterprise and helps ensure compliance with regulations and standards pertaining to the protection of personal information. By combining the strengths of ML and regular expressions, the proposed method enables organizations to detect and handle PII identifiers more effectively. This not only streamlines data management processes but also strengthens privacy safeguards, contributing to a more secure and privacy-aware data ecosystem.
Date of Conference: 06-08 July 2023
Date Added to IEEE Xplore: 23 November 2023
Conference Location: Delhi, India

I. Introduction

In the digital age, where data drives numerous aspects of our lives, safeguarding PII has emerged as a critical concern. PII encompasses any data that can be used to identify or trace an individual, such as a name, address, phone number, or social security number. Unauthorized exposure or misuse of PII can result in severe privacy breaches, identity theft, and other detrimental consequences for both individuals and organizations. Consequently, there is an escalating demand for robust mechanisms to detect and handle PII within datasets while preserving privacy.
The primary objective of PII label identification is to develop resilient and precise models capable of identifying and labeling PII across diverse types of data, including structured data and unstructured text. The goal is to protect sensitive information while facilitating the secure and responsible use of data for various applications. PII label identification leverages ML and natural language processing (NLP) techniques to automatically identify and label PII within textual data. This process finds application in multiple domains, including healthcare, finance, and e-commerce, where protecting sensitive information is paramount and compliance with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) is mandatory.
Implementing PII label identification involves a multifaceted approach. Initially, the dataset is preprocessed to extract textual content and transform it into a suitable representation for analysis. This may involve techniques such as tokenization, stemming, and stop-word removal. Subsequently, ML algorithms, such as classification or sequence labeling models, are trained on annotated data to recognize patterns and features indicative of PII.
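The preprocessing steps named above (tokenization, stemming, and stop-word removal) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the stop-word list, the suffix list, and the function names `tokenize`, `stem`, and `preprocess` are all assumptions made for the example, and the toy suffix stripper merely stands in for a real stemmer such as Porter's.

```python
import re

# Illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "at", "in", "on", "and", "or", "to", "of"}

# Toy suffix list (assumption); this is NOT the Porter stemmer, only a stand-in.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase tokens, keeping e-mail-like strings whole."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\w+", text.lower())


def stem(token):
    """Strip one common suffix, leaving e-mail-like tokens untouched."""
    if "@" in token:
        return token
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("Contact john.doe@example.com regarding the billing"))
# ['contact', 'john.doe@example.com', 'regard', 'bill']
```

One design choice worth noting: the tokenizer deliberately keeps e-mail-like strings intact, because splitting or stemming them would destroy exactly the surface pattern a PII detector needs to see.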
Models trained on annotated data learn from a large volume of labeled examples, allowing them to generalize and accurately classify PII in new and unseen data. To enhance the performance and adaptability of PII label identification models, additional techniques such as feature engineering, ensemble learning, and deep learning may be employed. Feature engineering involves extracting meaningful attributes from the data that help differentiate PII from non-PII content. Ensemble learning combines predictions from multiple models to improve overall accuracy and robustness. Deep learning techniques, such as recurrent neural networks (RNNs) or transformers, excel at capturing complex patterns and dependencies in text data, thereby enhancing the detection of PII.
Successful implementation of PII label identification enables organizations to comply with data protection regulations, safeguard sensitive information, and foster trust with their customers. It empowers them to implement appropriate security measures, including access controls, encryption, and anonymization, to protect PII throughout its lifecycle. Moreover, automating the identification process reduces the burden of manual inspection and the potential for human error, leading to greater efficiency and effectiveness in PII handling.
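To make the ideas of regular-expression matching, feature engineering, and ensemble learning more concrete, the sketch below shows one simple way each could look in code. It is a hedged illustration under stated assumptions, not the method evaluated in the paper: the patterns in `PII_PATTERNS` cover only a US-style social security number, a ten-digit phone number, and a basic e-mail shape, and the names `regex_labels`, `token_features`, and `majority_vote` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; real-world formats vary widely.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def regex_labels(text):
    """Return (label, matched_text) pairs found by the regex rules."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        found.extend((label, m.group()) for m in pattern.finditer(text))
    return found


def token_features(token):
    """Hand-crafted features that a downstream classifier could consume."""
    return {
        "has_at": "@" in token,
        "digit_ratio": sum(c.isdigit() for c in token) / max(len(token), 1),
        "is_capitalized": token[:1].isupper(),
        "length": len(token),
    }


def majority_vote(predictions):
    """Combine the labels predicted by several models by majority voting."""
    return Counter(predictions).most_common(1)[0][0]


text = "Reach me at jane@example.org or 555-123-4567; SSN 123-45-6789."
print(regex_labels(text))
# [('EMAIL', 'jane@example.org'), ('SSN', '123-45-6789'), ('PHONE', '555-123-4567')]
print(majority_vote(["PII", "O", "PII"]))
# PII
```

In a combined system, the regex rules would typically catch rigidly formatted identifiers, while a trained classifier using features like those above would handle free-form items such as names and addresses; majority voting is only one of several possible ways to merge their outputs.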
