Universal Adversarial Attacks on Text Classifiers | IEEE Conference Publication | IEEE Xplore

Abstract:

Despite the vast success neural networks have achieved in different application domains, they have been shown to be vulnerable to adversarial perturbations (small changes in the input) that lead them to produce the wrong output. In this paper, we propose a novel method, based on gradient projection, for generating universal adversarial perturbations for text, namely a sequence of words that can be added to any input in order to fool the classifier with high probability. We observed that text classifiers are quite vulnerable to such perturbations: inserting even a single adversarial word at the beginning of every input sequence can drop the accuracy from 93% to 50%.
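The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a gradient-projection search for a single universal trigger word might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the hand-picked 2-d embeddings in `EMB`, the toy logistic classifier (`W`, `B`) over the mean embedding, the four-example `DATA` set, and the step size and iteration count. The idea shown is to take a gradient ascent step on the loss in embedding space, then project the result back onto the nearest real word.

```python
import math

# Toy 2-d word embeddings (hypothetical values, hand-picked for illustration).
EMB = {
    "movie": (0.0, 0.3),
    "good": (1.0, 0.1),
    "great": (2.0, 0.2),
    "bad": (-1.0, 0.1),
    "awful": (-2.0, 0.2),
    "superb": (5.0, 0.0),
}
# Hypothetical logistic classifier applied to the mean embedding of the input.
W, B = (1.0, 0.0), 0.0

# Tiny labeled set: 1 = positive, 0 = negative.
DATA = [
    (["movie", "good"], 1),
    (["great", "movie", "good"], 1),
    (["movie", "bad"], 0),
    (["awful", "movie", "bad"], 0),
]


def prob_positive(vectors):
    """P(label = 1) for a list of embedding vectors."""
    n = len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(2)]
    z = sum(w * m for w, m in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(data, trigger=None):
    """Accuracy on data, optionally with one trigger word prepended to every input."""
    correct = 0
    for words, label in data:
        vecs = [EMB[w] for w in words]
        if trigger is not None:
            vecs = [EMB[trigger]] + vecs
        correct += int((prob_positive(vecs) >= 0.5) == bool(label))
    return correct / len(data)


def loss_gradient(data, trig_vec):
    """Gradient of the summed cross-entropy loss w.r.t. the trigger embedding."""
    grad = [0.0, 0.0]
    for words, label in data:
        vecs = [trig_vec] + [EMB[w] for w in words]
        err = prob_positive(vecs) - label  # d(loss)/d(logit)
        for d in range(2):
            grad[d] += err * W[d] / len(vecs)  # d(logit)/d(trigger)
    return grad


def project(vec):
    """Projection step: nearest vocabulary word by squared Euclidean distance."""
    return min(EMB, key=lambda w: sum((a - b) ** 2 for a, b in zip(EMB[w], vec)))


def find_universal_trigger(data, start="good", lr=20.0, steps=3):
    """Alternate a gradient ascent step on the loss with projection onto the vocabulary."""
    word = start
    for _ in range(steps):
        g = loss_gradient(data, EMB[word])
        stepped = [e + lr * gi for e, gi in zip(EMB[word], g)]  # ascend the loss
        word = project(stepped)  # snap back to a real word
    return word


if __name__ == "__main__":
    trigger = find_universal_trigger(DATA)
    print(trigger, accuracy(DATA), accuracy(DATA, trigger))
```

On this toy data the search settles on the word "superb", and prepending it to every input lowers accuracy from 1.0 to 0.5, because the linear model is pushed to predict the positive class everywhere. These numbers are properties of the toy setup only and say nothing about the results reported in the paper.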
Date of Conference: 12-17 May 2019
Date Added to IEEE Xplore: 17 April 2019
Conference Location: Brighton, UK
