WordChange: Adversarial Examples Generation Approach for Chinese Text Classification

As an important carrier for disseminating information in the Internet age, text contains a large amount of information. In recent years, adversarial example attacks in the discrete text domain have received widespread attention: adding small perturbations to the text data can cause a deep neural network (DNN) to produce the opposite prediction. In this paper, we present “WordChange”, an adversarial example generation approach for Chinese text classification based on multiple modification strategies, and we evaluate its effectiveness on sentiment analysis datasets and a spam dataset. The method locates important word positions with a keyword contribution algorithm. We are the first to propose a “word-split” strategy for substituting keywords, designed around the structural and semantic properties of Chinese text. We are also the first to apply “swap” and “insert” strategies to Chinese text to generate adversarial examples. We further discuss the influence of multiple Chinese word segmentation tools and different text lengths on the proposed method, as well as the diversification of Chinese text modification strategies. Finally, the adversarial texts generated against a long short-term memory (LSTM) network can be successfully transferred to other text classifiers and real-world applications.


I. INTRODUCTION
Deep Neural Networks (DNNs) are widely employed in various fields of scientific research. Recent research finds that DNNs are vulnerable to adversarial attacks, that is, the purposeful addition of small perturbations to the original input to deceive the target classifier [1]. On the one hand, adversarial attacks prove the vulnerability of DNN models; on the other hand, they reveal that DNNs carry certain risks when deployed in systems with higher security requirements. Attackers could use adversarial samples to disguise spam emails, scam short messages, advertising, and malicious online comments as normal text to deceive the system, which seriously affects the security of the network environment.
The adversarial sample was first discovered in the DNN-based image recognition task. It successfully fools the neural network by adding tiny, unnoticeable noise to the image, and it can also be transferred to the physical world [2]. Although adversarial attacks have achieved high success rates on images [3]-[5], the natural differences between text and images increase the difficulty of generating adversarial text. It is difficult to add perturbations directly to discrete data. Perturbations in images are hard to detect, whereas perturbations in text are easy to notice, and it is difficult to keep the semantics unchanged. There has been some research on adversarial text generation [6]-[9], which can be divided into black-box attacks and white-box attacks. In the white-box setting, attackers can access all the parameters or gradient information of the model; in the black-box setting, attackers can only query the output predicted by the model or have no model information at all. Therefore, compared with white-box attacks, black-box attacks are more widely used in practical applications. Meanwhile, there are large differences between languages, so a method of generating adversarial samples for one language does not carry over to another. How to keep semantic integrity and readability in the process of generating adversarial samples is also an urgent problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Shorif Uddin.
In this paper, we propose a black-box method called WordChange to generate Chinese adversarial samples. We first perform a purification operation on the original text and then calculate the contribution value of the words to locate keywords. Finally, we design keyword modification strategies to generate adversarial samples based on the language characteristics of Chinese. This method effectively attacks popular text classification models while retaining readability, and achieves strong attack results.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The main contributions of this paper are: 1) We propose a Chinese adversarial sample generation method which successfully deceives the DNN classification model by making simple and small changes to the original text when the model parameters and structure are unknown; 2) We introduce a keyword search method based on clause split filtering, which more accurately locates the keywords that affect the predictions of the model. We also design keyword replacement methods better suited to Chinese: Chinese character swap, character insertion, and Chinese character split and replacement, which make minor modifications to the original sample and preserve semantic integrity; 3) In experiments attacking the LSTM [16] model on two real-world review datasets, the classification accuracy drops by an average of 45%. The experimental results show that the adversarial samples generated by our approach are effective and of high quality; 4) The classification accuracy on the spam dataset decreases by an average of 48%. Our method not only effectively attacks texts in common scenarios but also carries over to more security-critical problems, which shows a certain universality.

II. RELATED WORK
A. TEXT CLASSIFICATION TASKS AND MODELS
The rapid development of the Internet has triggered an explosive growth of network data, and text plays an important role as a way of disseminating information. Faced with enormous amounts of text data, text classification has become a research hotspot in the field of Natural Language Processing (NLP). Currently, many text classification methods are still based on machine learning algorithms, such as Naive Bayes (NB), k-Nearest Neighbors (kNN), Decision Trees, and Support Vector Machines (SVM) [10]-[13]. Although these methods can achieve good classification results, their ability to express text features is relatively weak. Therefore, deep learning has gradually become the main research direction for text classification tasks, using models such as Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and deep learning networks with attention mechanisms [14], [15]. Among them, the most widely used is the RNN, which models sequence data to reduce the bias in semantic understanding. However, RNNs may lose the ability to learn the relationship between pieces of information in long texts. Long Short-Term Memory (LSTM) [16] and the Gated Recurrent Unit (GRU) emerged to address this. LSTM and GRU can learn long-term dependencies and are suitable for processing and predicting events with relatively long intervals and delays in the sequence, so they are better suited to text classification, especially long text classification tasks. Zhu and Yang [17] proposed a feature fusion model, C_BiGRU_ATT, based on deep learning, which uses a CNN and an Attention-based Bidirectional Gated Recurrent Unit (BiGRU) at the character level and word level for text classification. Tao et al. [18] proposed a novel Radical-aware Attention-based Four-Granularity (RAFG) model which applies a serialized BLSTM structure and takes full advantage of Chinese characters, words, character-level radicals, and word-level radicals simultaneously. Qiao et al. [19] proposed a Chinese text classification network named the word-character attention model (WCAM), which uses GRU to integrate two levels of attention models: word-level and character-level.
Currently, text classification based on deep learning has achieved good results and is widely used in different security tasks, including spam detection, sentiment analysis, online public opinion monitoring, and fake news detection. The security of all these systems is therefore particularly important.

B. TEXT ADVERSARIAL ATTACKS
The security of text-based network systems is closely related to the robustness of deep learning models. In 2014, Szegedy et al. [1] first confirmed that a DNN model used for object recognition may be deceived by adding perturbations to input images. Many mature methods for generating adversarial samples for DNNs have since been proposed, such as FGSM (Fast Gradient Sign Method) [20], JSMA (Jacobian-based Saliency Map Attack) [21], C&W (Carlini and Wagner Attacks) [22], and DeepFool [23]. However, most of these methods are aimed at images. Because of the differences between images and text, some of them cannot be directly applied to text. At present, adversarial sample generation for text has also made some progress. Jia and Liang [24] were the first to consider text adversarial sample generation on reading comprehension systems, and the research gained attention in NLP. Liang et al. [25] adopted the idea of FGSM to identify text items that are important for classification by computing the cost gradients, and designed three perturbation strategies: insertion, modification, and removal. Suranjana and Mehta [26] also used FGSM to modify the original text by deleting or replacing words. For the added and replaced words, this method constructed a candidate pool of synonyms, spelling errors, and type-specific keywords. Gong et al. [27] used FGSM and DeepFool to attack the word embeddings and searched for a valid nearest neighbor as a replacement. However, the method relies on well-trained word embeddings, which makes it time-consuming. Ebrahimi et al. [28] used synonyms to replace one or two words to generate an adversarial sample, which largely retains semantic integrity.
The methods mentioned above are all white-box attacks, and relatively little research has been done on black-box attacks. Gao et al. [29] proposed a black-box algorithm, DeepWordBug. It uses a word importance function computed from the output of the model to find the keywords in the text, and then modifies the text by insertion, deletion, substitution, and swap to generate adversarial samples. Li et al. [30] proposed a general attack framework, TEXTBUGGER, and evaluated its effect on Deep Learning-based Text Understanding (DLTU) systems. Ren et al. [31] proposed a greedy algorithm called probability weighted word saliency (PWWS) based on synonym substitution. Iyyer et al. [32] proposed syntactically controlled paraphrase networks (SCPNs) and used them to generate adversarial examples. Given a sentence and a target syntactic form, SCPNs are trained to produce paraphrases of the sentence.
However, most current adversarial text generation methods were designed for English. The modification rules are mostly based on operations on single letters in a word and cannot be applied to Chinese text. Wang et al. [33] first proposed a method for Chinese adversarial text. They designed a keyword calculation function and used homophones to substitute words. However, the attack results are not good enough, and the keyword modification strategy is limited to a single approach, as it does not make full use of the features of Chinese characters.

A. PROBLEM STATEMENT
We focus on non-targeted Chinese adversarial attacks under black-box settings. The keywords are located by querying the predicted labels of the model, and the keyword modification method is used to generate adversarial text samples with semantic integrity. The purpose is to generate adversarial text $\hat{S}$ from legitimate input text $S$, and to explore a more concise and efficient method from the attacker's perspective so that attacks can promote defenses. The premise of a black-box attack is that attackers cannot access information such as the parameters, structure, or gradients of the target model $F$.
Attackers can add perturbations to keywords $x$ of the input text $S$ to generate adversarial text $\hat{S}$ so that $F: \hat{S} \to \hat{y}$ with $\hat{y} \neq y$, where $y$ is the label of the original text. Figure 1 shows the process of generating adversarial samples.
Since Chinese does not have natural separators like English, the text needs to be segmented first. The text $S = x_1, x_2, \ldots, x_n$ after segmentation is a discrete space, $D$ is a dictionary of input words, and $x_n \in D$ represents the $n$th word in the original text sequence. For text classification tasks, given a pre-trained LSTM [16] model $F: (S) \to Y$, this model maps the feature space $X$ of the original input text to a set of classification labels $Y = \{y_1, y_2, \ldots, y_i\}$, where the labels may come from several categories.

B. PURIFICATION OPERATION
In general, the key features that determine the predictions of the model are not evenly distributed across the clauses of a long sentence, as many clauses only state facts that are unrelated to the classification. If keywords are searched directly in the entire text, some of the words found may not contribute to the classification. Considering the difference between the search spaces of long and short text, and to locate the keywords that affect the label more accurately, we propose a ''purification'' operation: filtering out the words or sentences that are not helpful for classification, leaving the remaining text with the highest contribution to the current classification label. Finding keywords in this ''rich text'' effectively improves accuracy.
According to the characteristics of Chinese, the text is divided into clauses at punctuation marks. For each clause $s_i$, the original text with $s_i$ deleted is fed to the model $F$, which outputs the predicted label $y_s$, where $F: (S - s_i) \to y_s$. If $y \neq y_s$, the key information is contained in the clause, and $s_i$ is added as a candidate sentence to $S'$. We then use the jieba library (a Chinese word segmentation package for Python) to tag and record the part of speech of all words $X$ in the obtained candidate sentences as a ''word: part-of-speech'' dictionary. We remove the words with meaningless parts of speech POS = {prep., pron., num., art.} from $X$ to obtain the candidate keywords $X'$.
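The purification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_label` and `pos_tag` are hypothetical callables standing in for the black-box classifier $F$ and a part-of-speech tagger, the punctuation set is assumed, and the tag set `STOP_POS` is an assumed mapping of the paper's POS list onto jieba-style tags.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Assumed clause delimiters; the paper does not list the exact punctuation set.
CLAUSE_PUNCT = r"[，。！？；,.!?;]"

# Assumed jieba-style tags for "meaningless" parts of speech:
# p = preposition, r = pronoun, m = numeral.
STOP_POS = {"p", "r", "m"}


def split_clauses(text: str) -> List[str]:
    """Split text into non-empty clauses at punctuation marks."""
    return [c for c in re.split(CLAUSE_PUNCT, text) if c.strip()]


def purify(
    text: str,
    predict_label: Callable[[str], int],
    pos_tag: Callable[[str], Sequence[Tuple[str, str]]],
) -> Tuple[List[str], List[str]]:
    """Return (candidate clauses, candidate keywords) for one input text."""
    original_label = predict_label(text)
    clauses = split_clauses(text)

    # A clause is a candidate if deleting it changes the predicted label.
    candidates = []
    for i, clause in enumerate(clauses):
        remaining = "".join(clauses[:i] + clauses[i + 1:])
        if predict_label(remaining) != original_label:
            candidates.append(clause)

    # Keep words whose part of speech is not in the filtered set.
    keywords: List[str] = []
    for clause in candidates:
        for word, pos in pos_tag(clause):
            if pos not in STOP_POS and word not in keywords:
                keywords.append(word)
    return candidates, keywords
```

With jieba installed, `pos_tag` could wrap `jieba.posseg.cut`, which yields pairs exposing `.word` and `.flag`. The sketch drops punctuation when rejoining clauses and returns an empty candidate list when no single clause flips the label; how the original method handles that case is not stated in the text.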

C. WORD CONTRIBUTION CALCULATION ALGORITHM
In text classification tasks, different words may have different sentiment classification tendencies. To modify as few words as possible while changing the text tendency the most, the key operation of the algorithm is finding the words that contribute most to the original category. A word with a high contribution is one whose removal greatly reduces the ability of the text to be classified into the current category. We rank and locate candidate words according to their impact on the category. To get a quantitative representation of the contribution value, the confidence degree $P$ is introduced:

$$C_F(x_i, y_i) = P_F(y_i \mid S) - P_F(y_i \mid S_{x_i})$$

where $P_F(y_i \mid S)$ is the probability that the text $S$ gets the predicted label $y_i$ according to the classifier $F$, and $S_{x_i}$ represents the text after deleting the word $x_i$. For a long piece of text, the ''purification'' operation reduces both the text length and the time needed to calculate the contribution values $C_F(x_i, y_i)$, so the contribution of each word for the particular classifier $F$ can be determined more reliably.
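Reading the contribution value as the drop in confidence for the original label when a word is deleted (an interpretation of the description above), the ranking can be sketched as follows. `predict_proba` is a hypothetical callable that returns the classifier's confidence for a given label; deleting every occurrence of a word is also an assumption of this sketch.

```python
from typing import Callable, List, Sequence, Tuple


def word_contributions(
    words: Sequence[str],
    candidates: Sequence[str],
    label: int,
    predict_proba: Callable[[Sequence[str], int], float],
) -> List[Tuple[str, float]]:
    """Rank candidate words by how much deleting them lowers P(label | text)."""
    base = predict_proba(words, label)
    scores = []
    for word in candidates:
        # Text with the candidate word deleted (all occurrences, by assumption).
        reduced = [w for w in words if w != word]
        scores.append((word, base - predict_proba(reduced, label)))
    # Highest contribution first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores
```

Each candidate costs one query to the model, which is why shrinking the candidate set through purification also shortens the time spent on this step.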

D. KEYWORD MODIFICATION STRATEGY
The key to generating adversarial text is to add perturbations to certain words $x$ in the sentence $S$ so that the generated text $\hat{S}$ does not affect normal human reading but fools a text detector or classifier. According to current research, keyword modification strategies for English adversarial texts can be summarized from references [23], [25], and [29] as follows: (1) replace original words with synonyms; (2) randomly exchange adjacent letters in words; (3) replace a certain letter in a word with another character; (4) randomly insert letters in a word; (5) randomly delete letters other than the first and last letters in a word, etc. However, these methods cannot be applied to Chinese text. The basic units of English are the 26 letters, most of which carry no meaning of their own, so modifying individual letters does not affect the semantics of words. The basic units of Chinese are the thousands of commonly used Chinese characters, each of which expresses its own semantics. Therefore, a word modification strategy based on Chinese characters requires diversified attempts and strategic choices. Based on the above analysis, we use three Chinese keyword modification strategies to generate adversarial samples that fool deep neural networks with small changes to the original text. Examples of the modification strategies are shown in Table 1:

1) CHINESE CHARACTER EXCHANGE (CCE)
Exchange the positions of Chinese characters within a word. Although changing the position of Chinese characters does not seem to guarantee semantic continuity in theory, psychological studies [34] have shown that humans can read and understand scrambled text, because reading habits automatically restore the order of the text and recover its meaning.

2) CHARACTER INSERTION (CI)
Randomly insert disturbing symbols into words. We manually create a set of disturbing symbols that have no practical meaning and do not affect the semantics of the text, such as punctuation marks and Roman characters.

3) CHINESE CHARACTER SPLIT AND REPLACEMENT (CCSR)
Chinese characters can be divided into upper-lower structures and left-right structures. Because humans read from left to right, text split along the left-right structure looks only slightly different from the original text to human observers. Therefore, we propose a Chinese character split and replacement method that uses split variants to replace the characters with a left-right structure and then uses homophones to replace the other characters. Although the glyphs of the Chinese characters have changed, humans can still accurately grasp the semantics of sentences through context. CCSR first manually constructs a dictionary which contains all Chinese characters with a left-right structure and their split variants. The original text is replaced with the variants by looking up each character in the dictionary. Meanwhile, another dictionary of homophones is constructed manually to ensure that every Chinese character has a homophone that can replace it.
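The three strategies can be sketched as small Python functions. This is an illustrative sketch only: the symbol set and the two tiny dictionaries are hypothetical stand-ins for the manually built resources described above, and swapping two adjacent characters is one assumed reading of "exchange".

```python
import random
from typing import Dict, Sequence

# Hypothetical, illustrative resources (not the authors' dictionaries).
DISTURB_SYMBOLS = ["*", "#", "~", "a"]
SPLIT_VARIANTS = {"好": "女子", "明": "日月", "林": "木木"}  # left-right splits
HOMOPHONES = {"很": "狠", "是": "事", "店": "电"}            # same pinyin


def cce(word: str, rng: random.Random) -> str:
    """Chinese character exchange: swap two adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def ci(word: str, rng: random.Random,
       symbols: Sequence[str] = DISTURB_SYMBOLS) -> str:
    """Character insertion: insert one disturbing symbol into the word."""
    # Insert inside the word when possible, otherwise append.
    pos = rng.randrange(1, len(word)) if len(word) > 1 else len(word)
    return word[:pos] + rng.choice(list(symbols)) + word[pos:]


def ccsr(word: str,
         split_variants: Dict[str, str] = SPLIT_VARIANTS,
         homophones: Dict[str, str] = HOMOPHONES) -> str:
    """Split left-right characters; replace the rest with homophones."""
    out = []
    for ch in word:
        if ch in split_variants:
            out.append(split_variants[ch])
        else:
            # Characters missing from both dictionaries are left unchanged.
            out.append(homophones.get(ch, ch))
    return "".join(out)
```

For example, `ccsr("很好")` yields `"狠女子"` with the dictionaries above. Passing an explicit `random.Random` instance keeps the random strategies reproducible across runs.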

E. GENERAL DESCRIPTION OF THE ALGORITHM
Based on the word contribution calculation algorithm and keyword modification strategies described above, we propose an adversarial text sample generation method for Chinese. First, we perform word segmentation on the text and divide the text into clauses to obtain the clause set $S_{seg}$; we then delete each clause $s_i$ of the original text in turn and check whether the predicted label is the same as the original label. If it is different, $s_i$ is added to the candidate key sentence set $S'$. Second, we tag the clauses in the candidate sentence set and delete the words with meaningless parts of speech POS = {prep., pron., num., art.} to get the candidate keyword set $X'$. We calculate the contribution score $C$ of each keyword and sort the keywords in descending order. Finally, we apply the keyword modification strategy function $T(\cdot)$ to modify the keywords one by one and predict the label after each modification. $\sigma$ is the maximum modification threshold; within this threshold, the number of modifications varies dynamically. If the predicted label changes, an adversarial example has been successfully generated and we stop modifying the text. $T(\cdot)$ can be any of the three keyword modification methods, and $Cost(\cdot)$ is the cumulative number of text modifications.
The WordChange algorithm is described as follows:
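A minimal Python sketch of the attack loop described in the preceding paragraph is given here for illustration; it is not the authors' listing. The callables `predict_label` (the black-box model $F$) and `modify` (the strategy $T(\cdot)$) are hypothetical, the keywords are assumed to be already ranked by contribution, and the cost is assumed to count modified keywords.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def word_change(
    words: Sequence[str],
    ranked_keywords: Sequence[str],
    predict_label: Callable[[Sequence[str]], int],
    modify: Callable[[str], str],
    sigma: int = 30,
) -> Tuple[Optional[List[str]], int]:
    """Modify keywords in contribution order until the label flips.

    Returns (adversarial word list or None, number of modified keywords).
    """
    original_label = predict_label(words)
    adversarial = list(words)
    cost = 0
    for keyword in ranked_keywords:
        if cost >= sigma:  # maximum modification threshold reached
            break
        replacement = modify(keyword)
        adversarial = [replacement if w == keyword else w for w in adversarial]
        cost += 1
        if predict_label(adversarial) != original_label:
            return adversarial, cost  # attack succeeded, stop early
    return None, cost
```

Stopping at the first label change keeps the number of modifications as small as possible, which matches the goal of changing few words while flipping the prediction.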

IV. EXPERIMENTAL RESULTS AND ANALYSIS OF TEXT SENTIMENT CLASSIFICATION
Sentiment analysis, also called opinion mining, is the process of analyzing, processing, summarizing, and reasoning about subjective texts with emotional color [35]. Sentiment analysis text is a kind of subjective text with emotion, including human attitudes and opinions on entities such as products, services, and organizations. Potential users can browse the review text to understand the views of the public. In this section, we evaluate the effectiveness of adversarial text generated from sentiment analysis datasets. First, we introduce the experimental datasets, models, baseline methods, and evaluation criteria; then we evaluate the experimental results and analyze the effectiveness of the proposed method; finally, we transfer the generated adversarial text to Chinese sentiment analysis platforms to observe the transfer performance.

A. EXPERIMENTAL SETUP
We use two public benchmark datasets as the experimental data for the adversarial sample of sentiment analysis: Ctrip Hotel Reviews dataset 1 and JD.com product review dataset 1 . Both sets of data use 1 and −1 to represent positive and negative samples. The specific dataset information is shown in Table 2.
To evaluate the effectiveness of the method intuitively, a Word-LSTM (Long Short-Term Memory Network) [16] model is used as the attack target, because the LSTM model performs well on natural language processing tasks and can therefore better measure the effectiveness of our method. The network contains a random embedding layer to accept word input. The embedding vectors are then fed through five LSTM layers, each with 100 hidden nodes. The hidden state of the LSTM layers is fed to a fully connected layer with a LogSoftMax activation function to get the final classification confidence value. We set the learning rate to 0.0005, the batch size to 128, and the maximum number of epochs to 20 during training. Unknown words are mapped to the ''unknown'' embedding vector. The maximum modification threshold is set to 30. Attack performance is measured by classification accuracy: the lower the classification accuracy of the model, the more effective the attack method. Table 3 summarizes the experimental results and compares the performance with WordHandling [33] and DeepWordBug [29].

1) BASIC EXPERIMENTS
The keyword contribution calculation algorithm and the three modification strategies proposed in our method achieve good attack results on both datasets and outperform the baselines. The CCE strategy reduces the classification accuracy by an average of 32.94%, the CI strategy by an average of 44.41%, and the CCSR strategy by 45.44%. In summary, the WordChange method can effectively generate adversarial text with strong attack performance, and its three keyword modification strategies support a variety of attacks.

2) THRESHOLD ANALYSIS
The operation threshold σ is a dynamic parameter that represents the maximum number of keyword modifications. To explore the impact of the threshold on the utility of the generated adversarial text, experiments were performed with different thresholds. We use the same experimental conditions and parameters as WordHandling, select 1000 pieces of data longer than 120 words, and also set the maximum modification range to 30. The accuracy on adversarial examples with different operation thresholds on the sentiment analysis datasets is summarized in Figure 2. As the threshold increases, the space of possible text modifications continues to grow. When the threshold reaches 15, the model accuracy becomes stable. This shows that our method achieves more effective attacks within a smaller operating space than WordHandling.

3) ADVERSARIAL SAMPLE QUALITY
To measure the quality of the adversarial samples generated by WordChange, the Word Mover's Distance (WMD) [36] was used to test the similarity between the generated text and the original text. The smaller the WMD score, the higher the similarity between the texts. For each of the three modification strategies, 2000 pieces of data were randomly selected for testing, under the same experimental conditions as WordHandling. Table 4 shows the proportion of data in each interval of the WMD score. The largest proportion of scores falls in the 0-0.2 interval, which verifies that the samples generated by WordChange are of high quality. Note that the adversarial text generated by the CCSR modification strategy has the best attack performance in Table 3 but the worst adversarial sample quality in Table 4. This is because the CCSR method modifies words slightly more heavily than the CCE and CI methods, so the quality of its text is not as good as that of the other two methods.

C. DISCUSSION AND ANALYSIS
1) IMPACT OF DIFFERENT WORD SEGMENTATION METHODS
Word segmentation refers to the process of recombining a sequence of consecutive characters into a sequence of words according to certain specifications. In English, spaces are used as natural delimiters between words. In Chinese, sentences and paragraphs can be easily separated by obvious delimiters, but words have no formal delimiter. Chinese word segmentation is therefore much more complex and difficult than English word segmentation, and it has gradually become a research hotspot. We study the impact of common word segmentation tools such as jieba, THULAC [37], and FoolNLTK on the generation of adversarial texts. Because the underlying segmentation algorithms differ, the segmentation results for the same sentence differ. Table 5 shows examples of different word segmentation results.
Different word segmentation strategies may also have an impact on the generation of adversarial samples. We generate adversarial text with each of the above word segmentation methods and explore the difference in their ability to deceive classification models. The experimental results are shown in Figure 3 and Figure 4. The adversarial text generated with each word segmentation method effectively reduces the accuracy of the classifier, and the accuracy difference among them is not large, which illustrates that our attack method can be applied with multiple word segmentation strategies.

2) ANALYSIS OF MODIFICATION STRATEGIES
We also explore the performance of several other modification strategies, namely homophone replacement (HR), Chinese character splitting (CCS), and Tongue-flatted or Tongue-rolled Pronunciation replacement (TTPR). This part of the experiment explores the diversity of Chinese adversarial text generation strategies and also provides more ideas for future defense work. Figure 5 summarizes the experimental results for all modification strategies on the sentiment analysis datasets.
•HR: The homophone replacement strategy is the modification strategy used in WordHandling; two Chinese characters are homophones if they have the same pinyin code. We expand the homophone replacement dictionary so that it covers almost all Chinese characters that have homophones. The experimental results show that the classification accuracy of the model is reduced by 44.9% on average.
•CCS: Since there are not many detachable Chinese characters with left-right structures, we combine Chinese character splitting (CCS) and homophone replacement (HR) into the Chinese character split and replacement (CCSR) strategy above, which avoids the situation where the targeted keywords cannot be completely modified and reduces the unreadable text produced by too many homophone replacements. Although the splitting method alone may not be able to modify all the keywords as the other strategies do, it still reduces the classification accuracy of the model by 31.36% on average.
•TTPR: Tongue-flatted and tongue-rolled pronunciation is a unique characteristic of Chinese. Tongue-flatted pronunciation refers to the sounds z, c, and s (pinyin code), produced with the tongue extended flat against or near the upper teeth. Tongue-rolled pronunciation refers to the sounds zh, ch, sh, and r (pinyin code), produced with the tip of the tongue raised, touching or approaching the front of the hard palate. Tongue-flatted and tongue-rolled sounds are articulated differently but sound similar. Inspired by this, text in which tongue-flatted and tongue-rolled pronunciations are exchanged remains understandable through pronunciation association and context. TTPR cannot modify the keywords comprehensively because only a limited number of Chinese characters have tongue-flatted or tongue-rolled pronunciation, but it still has a certain attack performance, reducing the classification accuracy by 16%.
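The TTPR idea can be sketched at the pinyin level as follows. This is a hypothetical illustration: it works on toneless pinyin syllables, `pinyin_to_chars` is an assumed lookup from syllable to candidate characters, and r is left unchanged because the text names no flat counterpart for it.

```python
from typing import Dict, List, Optional

# Flat (z, c, s) and rolled (zh, ch, sh) initials, as listed in the text.
FLAT_TO_ROLLED = {"z": "zh", "c": "ch", "s": "sh"}
ROLLED_TO_FLAT = {rolled: flat for flat, rolled in FLAT_TO_ROLLED.items()}


def swap_initial(syllable: str) -> Optional[str]:
    """Swap a flat initial for its rolled partner or vice versa."""
    # Check rolled initials first, since "zh" also starts with "z".
    for rolled, flat in ROLLED_TO_FLAT.items():
        if syllable.startswith(rolled):
            return flat + syllable[len(rolled):]
    for flat, rolled in FLAT_TO_ROLLED.items():
        if syllable.startswith(flat):
            return rolled + syllable[len(flat):]
    return None  # no flat or rolled initial to swap


def ttpr_candidates(syllable: str,
                    pinyin_to_chars: Dict[str, List[str]]) -> List[str]:
    """Characters whose pinyin is the flat/rolled counterpart of `syllable`."""
    swapped = swap_initial(syllable)
    if swapped is None:
        return []
    # The swapped syllable may not exist in Chinese; the lookup filters it out.
    return pinyin_to_chars.get(swapped, [])
```

Because many characters have neither kind of initial, and some swapped syllables do not exist, the candidate list is often empty, which is consistent with the limited coverage noted for TTPR.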

3) TRANSFERABILITY
The adversarial samples generated for one classification model can also successfully fool other classification models for the same task, which indicates that adversarial samples are transferable. In the field of computer vision, Papernot et al. [38] have confirmed that by generating adversarial examples through white-box attacks on a substitute model, an effective black-box attack can be implemented on the target model. We examine whether this transferability also holds for Chinese adversarial texts in the natural language domain.
To investigate whether Chinese adversarial text has this attribute, we save the adversarial text generated on the LSTM [16] model and evaluate its effect on other models and platforms. According to the results of the threshold analysis experiment, a better attack effect with smaller text modifications is obtained when the threshold is 15. Therefore, we set the operation threshold to 15. We use two deep learning classification networks, TextCNN [39] and DPCNN [40], as the models to which our generated adversarial samples are transferred. During the training of the two networks, the learning rate was 0.001, the batch size was 64, and the maximum number of epochs was 50. We also added two Chinese sentiment analysis APIs, the Baidu AI sentiment platform (https://ai.baidu.com/tech/nlp/sentiment_classify) and the Tencent Cloud 2 sentiment analysis platform, as transfer targets. The results are shown in Table 6.
As observed in Table 6, the classification accuracy is reduced in every transferability evaluation on the two datasets. Most adversarial texts can be successfully transferred to other models or even to text detection platforms. For example, the adversarial texts generated from the Ctrip dataset have a success rate of 66.55% when attacking the DPCNN model, whose original accuracy is above 96%. The reduction in classification accuracy reaches a maximum of 34.75% on the DPCNN model. Consequently, the adversarial text generated by WordChange can successfully implement adversarial attacks across multiple models and platforms. For the service provided by Tencent Cloud in particular, the CI strategy fails to reduce the classification accuracy effectively. We conjecture that the service filters out all useless special characters in Chinese text when preprocessing the input data. Overall, the CCSR strategy has the best transferability, which illustrates that adversarial examples generated by focusing on the features of Chinese can achieve more effective attacks.

4) COMPARISON OF ATTACKS ON OTHER TEXT CLASSIFICATION MODELS
In the experimental setup, we chose the LSTM model because of its better classification performance, so that adversarial attacks on it can effectively evaluate our method. The experiments in this paper are therefore all performed on LSTM models. In this section, we further verify the effectiveness of our method on TextCNN [39]. We used the same dataset to train the TextCNN network and obtain a pre-trained model. During training, the learning rate was 0.001, the batch size was 64, and the maximum number of epochs was 50. Table 7 compares the experimental results on TextCNN [39] and LSTM [16]. As observed in Table 7, the attack performance on the TextCNN model is slightly lower than that on the LSTM model. We think this is because the original classification accuracy of the TextCNN model is relatively low. In summary, our method can effectively attack the LSTM model as well as the TextCNN model.

V. EXPERIMENT RESULTS AND ANALYSIS OF SPAM DETECTION
Spam can easily contain false information (advertising, financial promotion, gambling information, etc.). When an attacker adds an adversarial sample to an email, it causes the detection system to incorrectly classify spam as normal email or normal email as spam, which increases the probability of users clicking on virus-carrying emails and affects network security. Exploring the security issues raised by adversarial spam samples effectively promotes the robustness of deep models and also allows the universality of our method to be evaluated comprehensively. In this section, we mainly show the performance of adversarial text generated on the spam dataset.

A. EXPERIMENTAL SETUP
We use a public spam corpus consisting of an English dataset (trec06p) and a Chinese dataset (trec06c) from the International Text Retrieval Conference. The trec06c dataset was cleaned and converted to UTF-8 encoding to serve as the experimental dataset; the specific information is shown in Table 8. We select a total of 10001 samples covering both spam and normal emails. The spam category is marked as −1, and the normal email category is marked as 1. The target models are the same as in Section IV. The attack performance is measured by the accuracy of spam detection: when the predicted label is inconsistent with the actual label of the original email, the method has successfully spoofed the spam detection system.

B. EXPERIMENT RESULTS
We take a random choice of words as a benchmark method for comparison. Meanwhile, TF-IDF [41] and TextRank [42] are also used as benchmark keyword selection algorithms. Table 9 summarizes the results of the attack on the LSTM [16] model and the performance of the different modification strategies.
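To make the TF-IDF baseline concrete, the following is a minimal sketch of ranking the words of one document by TF-IDF score. It assumes the texts are already segmented into token lists and uses the plain variant tf × log(N / df) without smoothing; the function name `tfidf_keywords` and the toy English tokens are illustrative only, and the configuration behind Table 9 may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf_keywords(docs: Sequence[Sequence[str]],
                   doc_index: int,
                   top_k: int = 3) -> List[str]:
    """Return the top_k tokens of docs[doc_index] ranked by TF-IDF.

    Assumption: tf = count / document length, idf = log(N / df).
    Ties are broken alphabetically so the ranking is deterministic.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    tokens = docs[doc_index]
    tf = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    corpus = [
        ["free", "prize", "call", "now"],
        ["meeting", "call", "now"],
        ["meeting", "notes", "now"],
    ]
    print(tfidf_keywords(corpus, doc_index=0, top_k=2))
```

Words that occur in every document receive a score of zero under this variant, so they are never selected ahead of document-specific words.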
As observed in Table 9, a high attack success rate can also be achieved on spam detection. Compared with the baseline methods, the results of our keyword contribution value algorithm under the different modification strategies demonstrate the superiority of the proposed approach. Figure 6 shows the effect of different values of the operation threshold σ on the performance of the adversarial text. The accuracy on the spam dataset gradually decreases as the operation threshold increases. With a threshold of 30, the adversarial text has the highest fool rate.
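A threshold sweep of the kind summarized in Figure 6 can be sketched as follows. The sketch assumes that σ caps the number of words modified per text, which is an interpretation rather than a definition taken from this section. The classifier and the attack are toy stand-ins (a trigger-word detector and a symbol insertion), and all names (`toy_classify`, `toy_attack`, `sweep_thresholds`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

TRIGGERS = {"free", "prize"}  # toy spam keywords, for illustration only


def toy_classify(tokens: Sequence[str]) -> int:
    """Toy detector: -1 (spam) if any trigger word is present, else 1."""
    return -1 if any(t in TRIGGERS for t in tokens) else 1


def toy_attack(tokens: Sequence[str], sigma: int) -> List[str]:
    """Perturb at most sigma trigger words by inserting '*' in the middle."""
    out, budget = [], sigma
    for t in tokens:
        if budget > 0 and t in TRIGGERS:
            mid = len(t) // 2
            out.append(t[:mid] + "*" + t[mid:])
            budget -= 1
        else:
            out.append(t)
    return out


def sweep_thresholds(texts: Sequence[Sequence[str]],
                     labels: Sequence[int],
                     attack: Callable[[Sequence[str], int], List[str]],
                     classify: Callable[[Sequence[str]], int],
                     thresholds: Sequence[int]) -> Dict[int, float]:
    """Accuracy of the classifier on adversarial texts for each threshold."""
    results = {}
    for sigma in thresholds:
        preds = [classify(attack(t, sigma)) for t in texts]
        results[sigma] = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return results


if __name__ == "__main__":
    texts = [["free", "prize", "now"], ["free", "call"]]
    labels = [-1, -1]
    print(sweep_thresholds(texts, labels, toy_attack, toy_classify, [0, 1, 2]))
```

In this toy setting the accuracy falls as the budget grows, which mirrors the trend described for the spam dataset, although the real curve depends on the model and the modification strategy.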

C. DISCUSSION AND ANALYSIS
Through the analysis of the above experimental results, it can be initially observed that the spam dataset and the sentiment analysis dataset yield similar experimental results. However, during the experiment, the performance on the positive and negative samples of the spam dataset differs markedly. To explore this issue, we evaluate the success rate on 1500 positive and negative samples separately. Figure 7 shows the results of the three modification strategies on normal emails and spam.
The accuracy on spam gradually decreases to less than 10%, while the accuracy on normal email remains at 90%. We believe the reason is that the content of spam is mostly commercial advertising, porn marketing, scams, or phishing sites. The features of spam are relatively uniform and concentrated, while the content of normal email is more extensive and diverse.
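The separate evaluation of spam and normal email amounts to computing accuracy per class. A minimal sketch, assuming the label convention of the experimental setup (−1 for spam, 1 for normal email) and a hypothetical helper name `per_class_accuracy`:

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_class_accuracy(labels: Sequence[int],
                       preds: Sequence[int]) -> Dict[int, float]:
    """Accuracy computed separately for each true label."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {y: correct[y] / total[y] for y in total}


if __name__ == "__main__":
    # -1 = spam, 1 = normal email; predictions on adversarial texts
    labels = [-1, -1, -1, 1, 1]
    preds = [1, 1, -1, 1, 1]
    print(per_class_accuracy(labels, preds))
```

Splitting the metric this way exposes asymmetries that a single overall accuracy would hide, such as one class being far easier to attack than the other.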
To further explore the reasons, we randomly select 3,000 spam emails and 3,000 normal emails from the training and testing data, build word clouds from them, and observe the keywords that appear most frequently in the text, as shown in Figure 8. The results show that the high-frequency words of spam are concentrated in terms glossed as "electronic technology", "international", "service", "hotline", etc., while the information in normal email is more dispersed and common. After the keywords of a spam email are perturbed, the remaining text receives a higher positive score, whereas a normal email still contains much normal text after perturbation, so its score is not pushed far enough for it to be predicted as spam.
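The frequency analysis behind a word cloud can be approximated by counting tokens per class. The sketch below assumes the emails are already segmented into token lists; the helper name `top_words`, the stopword set, and the toy English tokens are illustrative and do not reproduce the data of Figure 8.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple


def top_words(token_lists: Sequence[Sequence[str]],
              stopwords: FrozenSet[str] = frozenset(),
              k: int = 5) -> List[Tuple[str, int]]:
    """Return the k most frequent tokens across all documents."""
    counts = Counter(
        word
        for tokens in token_lists
        for word in tokens
        if word not in stopwords
    )
    return counts.most_common(k)


if __name__ == "__main__":
    spam = [
        ["service", "hotline", "now"],
        ["service", "international"],
        ["hotline", "service"],
    ]
    print(top_words(spam, stopwords=frozenset({"now"}), k=2))
```

Running the same count on spam and on normal email separately gives a quick check of whether one class is dominated by a small set of words.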

VI. CONCLUSION
In this paper, we propose WordChange, a Chinese adversarial text generation approach based on multiple modification strategies. It efficiently and accurately misleads the classification model under the black-box condition of unknown model details, and can thereby deceive security systems. Our method first applies a text filtering operation that removes words with no actual semantics to form a candidate keyword pool, and then uses the keyword contribution score to calculate the importance of the words. Extracting keywords on a per-clause basis effectively reduces the search space and locates words more accurately. Meanwhile, we introduce three modification strategies: a Chinese character exchange strategy based on reading inertia, a character insertion strategy that adds disturbing symbols, and a Chinese character split and replacement strategy based on glyph structure and pinyin characteristics. The experimental results show that WordChange can generate higher-quality adversarial samples on both the sentiment analysis dataset and the spam dataset. The average classification accuracy of the LSTM [16] model is reduced by 45% and 48%, respectively. We also evaluate the effect of the adversarial samples under multiple word segmentation tools, which shows that our method is versatile. Besides, we extend the set of modification operations for Chinese text with, for example, tongue-flatted or tongue-rolled pronunciation replacement and homophone replacement. For other text classification models and online platforms, the transferability of the adversarial samples also implies that they have vulnerabilities that can be attacked. We hope our study will provide more ideas and possibilities for further research on deep neural networks and Chinese natural language processing.