Generating Valid and Natural Adversarial Examples with Large Language Models

Abstract:

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been shown to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural: they fail to preserve semantic meaning, grammaticality, and human imperceptibility. Building on the strong language understanding and generation capabilities of large language models (LLMs), we propose LLM-Attack, which aims to generate adversarial examples that are both valid and natural. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against baseline adversarial attack models demonstrate the effectiveness of LLM-Attack, which outperforms the baselines by a significant margin in both human and GPT-4 evaluation. The model can generate adversarial examples that are typically valid and natural, preserving semantic meaning, grammaticality, and human imperceptibility.
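
The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how word importance is measured, so the sketch assumes a common leave-one-out criterion (the drop in the victim model's score when a word is deleted). The names `toy_positive_score`, `rank_words`, and `attack` are hypothetical, the victim model is a toy keyword scorer, and a hard-coded synonym table stands in for synonyms that would be obtained from an LLM.

```python
from typing import Callable, Dict, List

Scorer = Callable[[List[str]], float]


def toy_positive_score(words: List[str]) -> float:
    """Hypothetical victim model: returns a score in [0, 1]; above 0.5 means 'positive'."""
    positive = {"great", "wonderful", "good"}
    negative = {"bad", "boring", "awful"}
    raw = sum((w in positive) - (w in negative) for w in words)
    return min(1.0, max(0.0, 0.5 + 0.25 * raw))


def rank_words(words: List[str], score: Scorer) -> List[int]:
    """Stage 1 (assumed criterion): order word indices by how much deleting each word lowers the score."""
    base = score(words)
    drops = [(base - score(words[:i] + words[i + 1:]), i) for i in range(len(words))]
    # Largest drop first; ties keep their original left-to-right order.
    return [i for _, i in sorted(drops, key=lambda t: -t[0])]


def attack(words: List[str], score: Scorer, synonyms: Dict[str, List[str]]) -> List[str]:
    """Stage 2: greedily replace the most important words with the candidate that lowers the score most."""
    current = list(words)
    for i in rank_words(words, score):
        best = current
        for candidate in synonyms.get(current[i], []):
            trial = current[:i] + [candidate] + current[i + 1:]
            if score(trial) < score(best):
                best = trial
        current = best
        if score(current) <= 0.5:  # prediction is no longer 'positive'; stop early
            break
    return current


# Stand-in for LLM-provided synonyms (an assumption for illustration only).
SYNONYMS = {"great": ["fine"], "wonderful": ["decent"]}
sentence = ["a", "great", "and", "wonderful", "film"]
adversarial = attack(sentence, toy_positive_score, SYNONYMS)
```

In a real setting the scorer would be the target classifier's confidence for the original label, and the candidate list would come from prompting an LLM for context-appropriate synonyms; both are left abstract here.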

Published in: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD)

Date of Conference: 08-10 May 2024

Date Added to IEEE Xplore: 10 July 2024

DOI: 10.1109/CSCWD61410.2024.10580402

Conference Location: Tianjin, China
