A Differentiable Language Model Adversarial Attack on Text Classifiers

The training phase of the DILMA architecture consists of several steps. First step: obtain logits from a Language Model for input. Second step: sampling from the multinom...

Abstract:

Transformer models play a crucial role in state-of-the-art solutions to problems arising in the field of natural language processing (NLP). They have billions of parameters and are typically considered black boxes. The robustness of huge Transformer-based models for NLP is an important question due to their wide adoption. One way to understand and improve the robustness of these models is to explore an adversarial attack scenario: check whether a small perturbation of an input, invisible to the human eye, can fool a model. Due to the discrete nature of textual data, gradient-based adversarial methods, widely used in computer vision, are not applicable per se. The standard strategy to overcome this issue is to develop token-level transformations, which do not take the whole sentence into account. The semantic meaning and grammatical correctness of the sentence are often lost in such approaches. In this paper, we propose a new black-box sentence-level attack. Our method fine-tunes a pre-trained language model to generate adversarial examples. The proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation. Moreover, because a fine-tuned language model is used, the generated adversarial examples are hard to detect, so current models are not robust. Hence, it is difficult to defend against the proposed attack, which is not the case for other attacks. Our attack demonstrates the largest decrease in classification accuracy on all datasets (on AG News: 0.95 without attack, 0.89 under the SamplingFool attack, 0.82 under the DILMA attack).
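The two ingredients named in the abstract, sampling tokens from language-model logits and a loss that combines a substitute classifier score with an approximate edit distance, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the weight `beta`, and the particular way the two terms are combined are hypothetical and are not taken from this page. The sketch only computes values; it carries no gradients, and an actual differentiable setup would need an autodiff framework and a relaxation of the discrete sampling step.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits_per_position, rng):
    """Draw one token id per position from the multinomial given by the logits."""
    tokens = []
    for logits in logits_per_position:
        probs = softmax(logits)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens


def attack_loss(true_class_prob, approx_edit_distance, beta=1.0, eps=1e-12):
    """Hypothetical combined loss (an assumed form, for illustration only).

    true_class_prob: substitute classifier's probability of the true label
        for the generated text; lower means a more successful attack.
    approx_edit_distance: model-estimated edit distance to the original text.
    """
    # Penalise generated text that drifts away from roughly one edit.
    distance_term = beta * (1.0 - approx_edit_distance) ** 2
    # Penalise text that the substitute classifier still labels correctly.
    classifier_term = -math.log(1.0 - true_class_prob + eps)
    return distance_term + classifier_term


rng = random.Random(0)
toy_logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]  # 2 positions, 3-token vocabulary
tokens = sample_tokens(toy_logits, rng)
loss = attack_loss(true_class_prob=0.3, approx_edit_distance=2.0)
```

Under these assumptions, the loss falls as the substitute classifier loses confidence in the true label and rises as the estimated edit distance moves away from a single edit, which matches the abstract's goal of small perturbations that still fool the classifier.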

Published in: IEEE Access (Volume: 10)

Page(s): 17966 - 17976

Date of Publication: 11 February 2022

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2022.3148413

Ivan Fursov

Skolkovo Institute of Science and Technology, Moscow, Russia

Ivan Fursov was born in Chelyabinsk, Russia. He received the master’s degree from the Skolkovo Institute of Science and Technology, in 2020. He is currently a Deep Learning Research Engineer at Tinkoff and continues to work on new approaches in adversarial attacks on NLP models. In his master’s thesis, he proposed a new adversarial attack on sequence classifiers.

Alexey Zaytsev

Skolkovo Institute of Science and Technology, Moscow, Russia

Alexey Zaytsev was born in Kharkiv, Ukraine. He graduated from the MIPT, in 2012. He received the Ph.D. degree in mathematics from the IITP RAS, in 2017. He is currently an Assistant Professor at Skoltech. His research interests include the development of new methods for sequential data, Bayesian optimization, and embeddings for weakly structured data. In his master’s thesis, he proposed a modification of Bayesian approac...

Pavel Burnyshev

Skolkovo Institute of Science and Technology, Moscow, Russia

Huawei Noah’s Ark Laboratory, Moscow, Russia

Pavel Burnyshev was born in Perm, Russia. He graduated from the MIPT, in 2020. He is currently pursuing the Master of Science degree with the Skolkovo Institute of Science and Technology. He is also a Data Scientist at the NLP Department, Huawei, and works on adversarial attacks for machine translation.

Ekaterina Dmitrieva

HSE University, Moscow, Russia

Ekaterina Dmitrieva is currently pursuing the Ph.D. degree with the CS Faculty, HSE University. Her research interests include semantic parsing, in particular text2SQL models and adversarial attacks.

Nikita Klyuchnikov

Skolkovo Institute of Science and Technology, Moscow, Russia

Nikita Klyuchnikov received the M.Sc. degree in information science and technology from the Skolkovo Institute of Science and Technology, the M.Sc. degree in applied mathematics and physics from the Moscow Institute of Physics and Technology, in 2016, and the Ph.D. degree in computational and data science and engineering from the Skolkovo Institute of Science and Technology, in 2021. His main research interests include ma...

Andrey Kravchenko

University of Oxford, Oxford, U.K.

Ekaterina Artemova

Huawei Noah’s Ark Laboratory, Moscow, Russia

HSE University, Moscow, Russia

Ekaterina Artemova graduated from HSE University. She received the Ph.D. degree from the Institute of System Analysis, RAS. She is currently a Postdoctoral Researcher at the CS Faculty, HSE University, and advises the Noah’s Ark NLP Team on advanced research topics. She focuses on NLU tasks, ranging from ToD systems to IE, and on creating new datasets.

Evgenia Komleva

Skolkovo Institute of Science and Technology, Moscow, Russia

Evgenia Komleva graduated from the MIPT, in 2021. She is currently pursuing the master’s degree in data science with the Skolkovo Institute of Science and Technology. She is also working on NLP problems at ABBYY and plans to continue her research on adversarial attacks.

Evgeny Burnaev

Skolkovo Institute of Science and Technology, Moscow, Russia

Artificial Intelligence Research Institute (AIRI), Moscow, Russia

Evgeny Burnaev received the M.Sc. degree from the Moscow Institute of Physics and Technology, in 2006, and the Ph.D. degree from the Institute for Information Transmission Problems, in 2008. He is currently an Associate Professor at the Skolkovo Institute of Science and Technology, Moscow, Russia. His research interests include Gaussian processes for multi-fidelity surrogate modeling and optimization, deep learning for 3D...


Andrey Kravchenko is currently a Researcher at the University of Oxford and the Skolkovo Institute of Science and Technology. His Ph.D. research was at the intersection of machine learning and unstructured data extraction. He also played a significant role in the DIADEM project, which produced state-of-the-art research in the field of large-scale, fully automated web data extraction. His current research interests include the theory and application of anomaly detection in big data using sequences and graphs, and in particular, the development of efficient machine learning algorithms based on the embedding of vectors. He also works on exploring the broader connection between black-box machine learning models and knowledge-based systems, with a particular focus on knowledge graphs.
