
AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception


Abstract:

As one of the most representative applications built on deep learning, audio systems, including keyword spotting, automatic speech recognition, and speaker identification, have recently been demonstrated to be vulnerable to adversarial examples, which has raised general concerns in both academia and industry. Existing attacks follow the adversarial example generation paradigm from computer vision, i.e., overlaying optimized additive perturbations on the original voices. However, because additive perturbations are inherently audible to humans, balancing stealthiness and attack capability remains a challenging problem. In this paper, we rethink the stealthiness of audio adversarial examples and instead introduce another kind of audio distortion, i.e., reverberation, as a new perturbation format for stealthy adversarial example generation. Such convolutional adversarial perturbations are crafted as real-world impulse responses and behave as natural reverberation to deceive humans. Based on this idea, we propose AdvReverb to construct, optimize, and deliver phoneme-level convolutional adversarial perturbations on both speech and music carriers with a well-designed objective. Experimental results demonstrate that AdvReverb achieves attack success rates above 95% on three audio-domain tasks while attaining superior perceptual quality and remaining stealthy to human perception in both over-the-air and over-the-line delivery scenarios.
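To make the distinction concrete, the minimal NumPy sketch below contrasts the conventional additive perturbation format (adding an optimized noise signal to the carrier) with a convolution-based format in which the perturbation acts like a room impulse response, so the distortion resembles reverberation. The carrier signal, filter length, and decay profile are illustrative placeholders, not AdvReverb's actual optimization procedure.

import numpy as np

# Illustrative assumptions: a random 1-second carrier at 16 kHz and a
# hand-crafted impulse-response-like filter; in the attack these would be
# a real recording and an optimized filter, respectively.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                  # placeholder audio carrier

# (1) Conventional additive perturbation: x_adv = x + delta
delta = 0.001 * rng.standard_normal(x.shape)    # small additive noise
x_adv_additive = x + delta

# (2) Convolutional perturbation: x_adv = x * h, where h plays the role of
#     a room impulse response so the change is perceived as reverberation.
h = np.zeros(512)
h[0] = 1.0                                      # direct-path component
h[1:] = 0.05 * rng.standard_normal(511) * np.exp(-np.arange(1, 512) / 100.0)  # decaying reflections
x_adv_convolutional = np.convolve(x, h, mode="full")[: len(x)]

The convolutional form keeps the perturbation tied to the carrier itself (every perturbed sample is a weighted echo of earlier carrier samples), which is why it can pass as a natural acoustic effect rather than added noise.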
Page(s): 1948 - 1962
Date of Publication: 21 December 2023



I. Introduction

Voice user interfaces (VUIs) have gained increasing popularity recently due to their non-contact and human-centered interaction experience. Modern audio systems underlying VUIs exhibit prodigious speech cognition capabilities powered by deep learning, including spotting keywords [1], [2], identifying individuals [3], [4], and understanding utterances [5], [6]. However, these powerful features are also accompanied by multifaceted security issues owing to the endogenous vulnerability of deep learning and the omnipresent availability of VUIs. In particular, the latest studies have demonstrated the significant threat of adversarial example attacks to audio systems, enabling adversaries to invade VUIs effortlessly by just slightly perturbing the input. This opens up the potential for stealthy device activation, targeted user impersonation, or even malicious command execution.
