
Contrast-Then-Approximate: Analyzing Keyword Leakage of Generative Language Models


Abstract:

There is an increasing tendency to fine-tune large-scale pre-trained language models (LMs) on small private datasets to improve their capability for downstream applications. In this paper, we systematically analyze the pre-train-then-fine-tune process of generative LMs and show that fine-tuned LMs can leak sensitive keywords of the private datasets even without any prior knowledge of the downstream tasks. Specifically, we propose a novel and efficient keyword inference attack framework to accurately and maximally recover sensitive keywords. Owing to the fine-tuning process, pre-trained and fine-tuned models may respond differently to identical input prefixes. To identify potentially sensitive sentences used to train the fine-tuned LM, we introduce a contrast difference score that assesses the response variations between a pre-trained LM and its corresponding fine-tuned LM. We then iteratively fine-tune the pre-trained model on these sensitive sentences to minimize the disparity between the target model and the pre-trained model, thereby maximizing the number of inferred sensitive keywords. We instantiate two types of keyword inference attacks (i.e., domain and private) under our framework and conduct comprehensive experiments on three downstream applications to evaluate their performance. The experimental results demonstrate that our domain keyword inference attack achieves a precision of 85%, while our private keyword inference attack extracts highly sensitive personal information for a significant number of individuals (approximately 0.3% of all customers in the private fine-tuning dataset, which contains 40,000 pieces of personal information).
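The abstract does not reproduce the exact definition of the contrast difference score; as a rough illustration only, the sketch below assumes the score compares how well a candidate sentence is explained by the fine-tuned LM versus the pre-trained LM, measured by average per-token log-likelihood. The model names, function names, and scoring formula are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a contrast difference score (illustrative, not the paper's exact definition).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def sentence_log_likelihood(model, tokenizer, sentence, device="cpu"):
    """Average per-token log-likelihood of `sentence` under `model`."""
    inputs = tokenizer(sentence, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # `outputs.loss` is the mean cross-entropy over tokens; negate it for a log-likelihood.
    return -outputs.loss.item()


def contrast_difference_score(pretrained, finetuned, tokenizer, sentence):
    """Assumed scoring rule: higher values suggest the sentence is better explained
    by the fine-tuned LM, i.e. it may reflect the private fine-tuning data."""
    return (sentence_log_likelihood(finetuned, tokenizer, sentence)
            - sentence_log_likelihood(pretrained, tokenizer, sentence))


# Usage (model names are placeholders):
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# pretrained = AutoModelForCausalLM.from_pretrained("gpt2")
# finetuned = AutoModelForCausalLM.from_pretrained("path/to/finetuned-gpt2")
# score = contrast_difference_score(pretrained, finetuned, tokenizer, "Some candidate sentence.")
```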
Page(s): 5166 - 5180
Date of Publication: 22 April 2024



I. Introduction

The rapid development of deep learning techniques in Natural Language Processing (NLP) has led to significant advancements in Language Models (LMs), making them fundamental to various NLP tasks, such as text classification [1], [2] and question answering [3]. Recent popular LMs, such as Google’s BERT [4] and OpenAI’s GPT family [5], are composed of multiple layers of Transformer blocks with millions of parameters [6]. Pre-training these LMs on massive text corpora collected from the Internet is a common practice [7]. Large-scale LMs can understand and generate fluent natural language [8] and can be applied directly to various downstream tasks with only minor parameter updates. Pre-trained LMs can further be fine-tuned on small private datasets for domain-specific applications without incurring the high cost of training from scratch [4].
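To make the pre-train-then-fine-tune workflow described above concrete, the following minimal sketch adapts a publicly pre-trained causal LM to a small private text corpus with a standard language-modeling objective. The model name ("gpt2"), corpus, and hyperparameters are placeholders for illustration and are not the setup used in this paper.

```python
# Minimal sketch of fine-tuning a pre-trained causal LM on a small private corpus
# (placeholder model, data, and hyperparameters).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Stand-in for a small private dataset.
private_corpus = [
    "Example private sentence one.",
    "Example private sentence two.",
]

model.train()
for epoch in range(3):
    for text in private_corpus:
        batch = tokenizer(text, return_tensors="pt")
        # Language-modeling loss: predict each token from its prefix.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```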

