Decomposition, Synthesis, and Attack: A Multi-Instruction Fusion Method for Jailbreaking LLMs

Abstract:

Large language models (LLMs) can transform natural language instructions into executable commands for IoT devices such as autonomous aerial vehicles (AAVs), creating new development opportunities. However, the safety risks of LLMs translating commands into machine or program control instructions cannot be overlooked. The jailbreak instructions currently used to test LLM security are often restricted to specific modes or tasks, so they lack diversity and leave some tasks unexplored. To address this issue, we introduce a multi-instruction fusion (MIF) method that automatically fuses harmful prompts with various task instructions into jailbreaks. First, we adopt a reverse decomposition strategy to obtain sufficient supervised data for fusing harmful prompts and harmless task instructions into jailbreaks, and we use this data to build a task instruction synthesizer. Then, to find the optimal instruction combinations in the vast combination space, we propose ReNB, a representative-node-based selection strategy that ranks and filters instruction combinations on a small set of representative samples, thereby accelerating the identification of valid combinations. Experimental results show that MIF significantly improves the attack success rate (ASR), exceeding 90% on GPT-4o-mini, LLaMa2-70B, and Qwen2-7B and outperforming state-of-the-art (SOTA) baselines.

Published in: IEEE Internet of Things Journal (Volume: 12, Issue: 8, 15 April 2025)

Page(s): 9420-9434

Date of Publication: 03 January 2025

DOI: 10.1109/JIOT.2025.3525741
