Sentiment Classification for Chinese Text Based on Interactive Multitask Learning

In this paper, an interactive multitask learning method for Chinese text sentiment classification is proposed. It makes full use of the interaction between tasks and simultaneously solves the two tasks of emotional dictionary expansion and sentiment classification. The proposed method divides text sentiment classification and emotional dictionary expansion into a primary task and a subtask, and it adopts the Enhanced Language Representation with Informative Entities (ERNIE) model as the text representation learning model for the primary task. The text sentiment classification task is then completed through a max-pooling layer and a fully connected layer. Meanwhile, the classical BiLSTM + attention + CRF model is used in the subtask to extract emotional words from the text. In addition, a multitask information interaction mechanism is used: the prediction information from the primary task and the subtask is fed back into the shared potential representation of the two tasks, and after iterative training, the performance of both tasks is further optimized. Micro-blogs on COVID-19 are used to form the experimental data set. The results demonstrate the superiority of the proposed method over other approaches, and they further verify the superiority of ERNIE over BERT, RoBERTa and XLNet for the sentiment classification of Chinese text.


I. INTRODUCTION
Currently, with the increasing popularity of social media, people are accustomed to using the Internet as a place to express their personal feelings. A large amount of text full of personal opinions and emotional inclinations appears on the Internet. As a very important part of public opinion research, text sentiment classification has attracted growing attention in recent years, and the related research has developed rapidly. Text sentiment classification techniques are primarily divided into machine learning methods and emotional dictionary methods. At present, many studies have shown that machine learning-based methods achieve a better classification effect than dictionary-based ones [1], [2]. However, [3], [4] show that machine learning relies heavily on labelled data, and with the rapid development of the Internet, the amount of unsupervised data increases exponentially every day.
The associate editor coordinating the review of this manuscript and approving it for publication was Chenguang Yang.
The labour costs of labelling data are also increasing [5]. The dictionary-based sentiment classification method [6] is an unsupervised method that can quickly process large amounts of data for emotion analysis. However, there are some problems with relying on an emotion dictionary alone to categorize text emotions.
1) Unlike the direct expression of emotion in English words, Chinese words may express different emotions in different contexts. For example, the two expressions '' '' (English meaning: what do you mean) and '' '' (English meaning: you are quite interesting) express two opposite emotions. If the classification is based solely on the emotional word '' '' (English meaning: mean) contained in the emotional dictionary, it is possible to make the wrong classification.
2) In social networks, there are various types of new buzzwords. For example, '' '' (English meaning: I want to commend the medical staff). In popular Internet slang, '' '' means ''praise'' and is a positive emotion word. There are many similar words in social networking. Therefore, the emotional dictionary requires timely updates; otherwise, it will be prone to mistakes in text sentiment classification.
3) In different backgrounds, the same words play different roles in text sentiment classification. For example, in a general scenario, ''infection'' is a neutral word, and the emotions are evenly distributed. It does not play a leading role in text sentiment classification. However, in the context of COVID-19, ''infection'' can be more important.
Many researchers have proposed combining an emotional dictionary with machine learning to make up for the deficiencies of the two tools [7]-[11]. However, these studies either regard the construction of the sentiment dictionary and the sentiment classification of texts as two independent tasks [7]-[9], or they regard a fixed emotional lexicon as a type of feature that directly serves the classification task [10], [11]. Addressing the two approaches as independent tasks executed in a pipelined manner may result in insufficient use of the joint information between the tasks [12]. On the other hand, if a fixed emotional dictionary serves the emotion classification task only as a feature, it cannot meet the need to extend the emotional dictionary, and its adaptability is relatively poor [13]. Recent studies [12], [14], [15] show that an integrated model can obtain comparable or even better results than the pipeline method. However, these studies are aimed at only the two tasks of concept extraction and conceptual emotion; moreover, in [14], [15], the two tasks were linked only through unified tags, and the correlation between the two tasks was not modelled. Therefore, we propose an interactive multitask learning model (IMT-SDSA) that makes full use of the interaction between the two tasks and simultaneously solves the two tasks of emotional dictionary extension and emotion classification. The IMT-SDSA model divides text sentiment classification and emotional dictionary expansion into a primary task and a subtask. In the primary task, the ERNIE model is used as the text representation learning model. ERNIE is the pre-training model proposed by Baidu [16]; it learns the semantic representations of real-world concepts by learning entity knowledge.
By extending the training corpus, especially with the introduction of a forum dialogue corpus, ERNIE enhances the semantic expression ability of the model. In the subtask, the classical BiLSTM + attention + CRF model is used to extract emotional words from the text. IMT-SDSA also introduces a multitask information interaction mechanism similar to the work of [12]: the prediction information from the primary task and the subtask is fed back into the potential representations shared by the two tasks, and the performance of the two tasks is further optimized through iterative training. However, the models used in [12] are CNNs, which easily lose information from long texts, whereas the texts obtained from the Internet when monitoring online public opinion include not only short texts but also long ones. In this paper, the BiLSTM and ERNIE models we chose are applicable to both short and long texts.
The contributions of this paper include the following:
1) This paper proposes an interactive multitask learning model, which combines the two tasks of emotional dictionary expansion and sentiment classification. Micro-blog texts with COVID-19 as the subject are used as the corpus to verify the superior performance of the model and provide a basis for monitoring network public opinion.
2) The information interaction mechanism of the multitask learning network is introduced to maximize the interaction between tasks, and the experiments show that the information interaction mechanism can significantly improve the performance of the model in Chinese text sentiment classification.
3) The performance of the ERNIE model in Chinese text sentiment classification is compared with other state-of-the-art models such as BERT, RoBERTa and XLNet. The results show that ERNIE wins by a narrow margin in terms of performance. Moreover, through our experiments, we find that ERNIE is more efficient than these other models.
This paper is organized as follows. Section 2 discusses related works. Section 3 focuses on the sentiment classification method for Chinese texts. Section 4 presents the procedures for the experiment conducted here and the experimental results. Finally, Section 5 concludes this paper and discusses future work.

II. RELATED WORK

A. SENTIMENT CLASSIFICATION
This section introduces related work from two perspectives: emotion classification methods based on an emotional dictionary and methods based on machine learning. The emotion classification method based on an emotional dictionary uses the emotional words recorded in emotional dictionaries, along with their corresponding emotional scores or categories, as emotional feature information. These emotional features are combined using a specially designed algorithm and are ultimately used to obtain the emotional polarity of short texts. However, current research on this method mostly focuses on the construction of the emotional dictionary [9], [17]-[19]. Liang W et al. constructed a slang dictionary to address the problem that slang often appears in short texts [17]. GuiXian X et al. built an extensible dictionary of emotions that includes polysemous emotions and scene emotions in addition to basic emotions; they used the extended dictionary and designed rules for emotional scoring to categorize texts [9]. Martine H et al. collected fine-grained emotional scores through crowdsourced coding to build a group-based dictionary for analysing statements by political parties and media reports [18]. Wu J et al. proposed a method that constructs multiple emotional dictionaries, such as a basic emotional dictionary, an emoji dictionary and a semantic rule set, to classify Chinese micro-blog emotions, but this method relies on an established semantic rule set [19].
Sentiment classification methods based on machine learning are further divided into statistical classification methods and deep learning classification methods. Statistical sentiment classification methods are primarily supervised learning methods. Commonly used models are the naive Bayes [20], random forest [21], support vector machine [21] and Markov blanket models [22]. In recent years, with the development of deep learning technology, research on sentiment classification has gradually begun to focus on the field of deep learning. Zhou J et al. proposed using the stacked BiLSTM model to extract the features of sequence words and apply them to the classification of emotions in micro-blogs, which achieved good results [23]. Zhang K et al. proposed an interactive attention-shifting network (IATN) for the classification of cross-domain emotions. The network includes an interactive attention-shifting mechanism and can better transfer emotional features across domains by combining the information of sentences and concepts [24]. In addition, various advanced pre-training language models have also been applied to sentiment classification research [25], [26]. Song Y et al. added a recursive attention network to BERT to model context and targets, and good results were achieved [25]. On the basis of XLNet, Xinrong G et al. improved the process by combining the advantages of the pre-training language model and the generalized learning system, and they designed a new architecture to extract the features of the deep context and randomly search the high-level context in the generalized space to complete the emotion analysis task [26].

B. MULTITASK LEARNING
Multitask learning has been successful in many fields of AI, such as natural language processing [27]-[34], speech recognition [35], [36] and computer vision [37], [38]. Many researchers have applied multitask learning to sentiment classification and achieved good results [29]-[34]. Guangquan L et al. placed a text sequence representation task together with binary and ternary classification tasks into a multitask learning framework. The framework learned the representation of input sequences using variational autoencoders (VAEs) and then performed a binary classification task and a ternary classification task, respectively [29]. The work of [31] puts text sentiment analysis and satire detection into a multitask learning framework and uses a neural network to model the correlation between the two tasks. The work of [32] addresses multimodal problems using a multitask learning framework; the model includes a classifier for analysing text, a classifier for analysing images, and a classifier that makes predictions by combining the two. To solve the problem of the unbalanced distribution of emotions in supervised learning methods, Fangzhao W et al. proposed an effective non-balanced emotion classification method, which extracted multiple balanced subsets from unbalanced training data and synergistically learned a robust emotion classifier over these subsets through a multitask learning framework. To enhance the learning ability of emotion classifiers in non-equilibrium situations, they also extracted prior knowledge of emotional expressions from existing emotional vocabularies and a large amount of unlabelled data and applied it to the proposed method [33]. Li and Lam proposed a novel LSTM-based deep multitask learning framework for aspect term extraction from user review sentences, named MIN, and they designed two LSTMs with extended memory and neural memory operations to process aspects and viewpoints through memory interaction [34].
The above research does not fully consider the interaction between tasks. In this paper, an interactive multitask learning model is proposed to integrate emotional dictionary expansion with text sentiment classification. It takes full advantage of the relationship between the two tasks and iterates the prediction results of the two tasks through an information transmission mechanism to improve the performance of the model. This model not only solves the problem of sentiment classification in Chinese texts but also solves the problem of the dynamic expansion of emotional dictionaries in other fields.

III. METHOD
In this section, we will introduce the concrete construction of the IMT-SDSA model.
Task definition: We regard sentiment classification and sentiment dictionary expansion as related annotation tasks. Suppose there is a sequence of characters T = {w_1, w_2, ..., w_n}. Here, w_i represents a single character in Chinese, such as the single character '' '' in the word '' '' (English meaning: mean). There is an emotional tag set y^sen = {neg, pos, neu}. The purpose of sentiment classification is to judge the corresponding tag y^sen_i of T. The purpose of extending the sentiment dictionary is to extract the words from the text that affect the classification result. These words are either single characters or phrases with multiple characters, so the task can be regarded as a sequence tagging problem using BIO tags.
To solve these two tasks, we introduce a multitask learning network, which combines the two tasks of emotional dictionary expansion and text sentiment classification. The network improves the performance of the two tasks by facilitating collaboration between them. We divide them into primary and secondary tasks: the primary task is sentiment classification, and the secondary task is emotional dictionary extension. The structural framework of IMT-SDSA is shown in Figure 1.
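The dictionary-expansion subtask described above reduces to standard BIO decoding over characters. A minimal sketch follows; the tag names (B-pos, I-pos, B-neg, I-neg, O) are illustrative assumptions, since the paper does not enumerate its exact tag set.

```python
# Hedged sketch: decoding BIO tags over a character sequence to recover
# emotion words for the dictionary-expansion subtask. Tag names are
# illustrative; the paper does not list the exact tag inventory.
def decode_bio(chars, tags):
    """Group characters whose BIO tags mark an emotion word."""
    words, current, polarity = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                words.append(("".join(current), polarity))
            current, polarity = [ch], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(ch)
        else:  # an "O" tag closes any open span
            if current:
                words.append(("".join(current), polarity))
            current, polarity = [], None
    if current:
        words.append(("".join(current), polarity))
    return words

# A toy 6-character sequence containing one positive 2-character word:
print(decode_bio(list("abcdef"), ["O", "B-pos", "I-pos", "O", "O", "O"]))
```

Each decoded (word, polarity) pair is what would be inserted into the sentiment dictionary at the end of the iterations.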
As shown in Figure 1, our model contains five parts, namely the feature extraction part, the subtask part, the primary task part, the joint learning part and the information interaction mechanism. Next, we will introduce these five parts.

FIGURE 1. Structural framework of IMT-SDSA. The representation h^t_i of a character w_i in the text is obtained through a CNN and shared with the primary and auxiliary tasks. In the auxiliary task, the new feature representation (coloured blue), obtained by the BiLSTM layer, is passed to the attention layer; the new feature representation c_i of each character (coloured deep blue) is obtained by calculating the relationship between the characters and the document-level global feature r^g_i produced by the attention layer. The c_i is passed to the CRF layer and to the ERNIE model in the primary task; c_i, h^t_i, and the features obtained through the dictionary are concatenated together as the input of the ERNIE model to obtain the new feature representation (coloured orange), which then predicts the sentiment classification result through a max-pooling layer and a fully connected layer. The prediction results of the primary and auxiliary tasks are transferred to the shared feature h^t_i through the fully connected layer, and the best prediction results are obtained over t iterations. At that point, the prediction result of the t-th iteration of the auxiliary task is inserted into the dictionary to complete the expansion.

A. FEATURE EXTRACTION
Unlike English, the basic unit in Chinese is a single character. Therefore, when processing Chinese text, the first thing to do is to segment the text and then input the words into the language model to obtain the vectors of the words. However, there are usually two problems with this approach.
1) The segmentation of longer words; for example, '' '' (English meaning: COVID-19) is an independent term, but it is usually divided into '' '', '' '' and '' ''.
2) On social networks, there are often many network terms that are not in the normal vocabulary, such as '' '' (English meaning: interesting), '' '' (English meaning: exciting), '' '' (English meaning: sister) and others.
Therefore, in this paper, instead of using words as the model input, we use character vectors as input. Let S_T be the embedding matrix representing the character sequence T, where S_T = [w_1 w_2 ... w_n]. We use a one-dimensional convolution operation based on a filter F ∈ R^{d×l} to extract the advanced features of the character embedding matrix S_T. Here, d represents the dimension of the character vector and l represents the length of the filter window. To handle longer words, we use l ∈ {2, 3, 4, 5, 6, 7}.
For each character-embedding window c_{i:i+l−1}, the corresponding feature z_{i,l} can be obtained by

z_{i,l} = f(F_l · c_{i:i+l−1} + b_l),

where f is a nonlinear activation function and b_l is a bias term. Then, the new feature of character w_i is obtained through the max-pooling layer, i.e.,

h^cnn_i = max(z_{i,2}, z_{i,3}, ..., z_{i,7}).

When t = 1, h^t_i is the same as h^cnn_i, which is the output of the feature extraction part and the input of the primary and secondary tasks; the calculation of h^t_i when t ≥ 2 will be introduced later.
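A toy illustration of this multi-width convolution with max pooling follows. The dimensions, weights, and the choice of ReLU as the activation are assumptions made purely for the sketch, not values from the paper.

```python
# Illustrative sketch of multi-width 1-D convolution over character
# embeddings: each filter of window length l slides over the sequence,
# and max pooling keeps the strongest response per character position.
def conv_feature(window, filt, bias=0.0):
    # Dot product of an l x d window with an l x d filter, then ReLU.
    s = sum(w * f for row, frow in zip(window, filt)
            for w, f in zip(row, frow)) + bias
    return max(0.0, s)

def extract(seq, filters):
    """seq: list of d-dim character vectors; filters: {l: l x d weights}."""
    feats = []
    for i in range(len(seq)):
        scores = []
        for l, filt in filters.items():
            if i + l <= len(seq):  # window must fit inside the sequence
                scores.append(conv_feature(seq[i:i + l], filt))
        feats.append(max(scores) if scores else 0.0)  # max pooling
    return feats

seq = [[1.0], [2.0], [3.0], [4.0]]              # 4 characters, d = 1
filters = {2: [[0.5], [0.5]], 3: [[1.0], [0.0], [1.0]]}
print(extract(seq, filters))
```

In the real model the filter bank spans window lengths 2 through 7, so longer terms such as multi-character named entities still fall inside a single window.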

B. SECONDARY TASK
The secondary task is to identify the words that affect the sentiment class of the input sequence through the BiLSTM + attention + CRF model. The input is the new feature representation of the character sequence {h^t_1, h^t_2, ..., h^t_n}, and the output is the corresponding sequence of tags {y^sd_1, y^sd_2, ..., y^sd_n}. The influence of a character should be calculated at the level of the whole character sequence. Therefore, we adopt the document-level entity recognition model proposed in our previous study [39], which is based on the classical BiLSTM-CRF model and helps the attention mechanism include the correlation between the current word and all the other words in the entire sequence, to obtain the global representation of each character. A document-level global feature representation r^g_i is produced by the attention layer, whose output c_i is taken as the input of the CRF layer, for which

p(y_j | S_T) = exp(score(S_T, y_j)) / Σ_{j'} exp(score(S_T, y_{j'})),

where T_{y_{i−1}, y_i} is the transition score from label y^sd_{i−1} to y^sd_i, score() calculates the score of a tag sequence for the input sequence T, and y_j ranges over all possible output tag sequences. W represents the model parameters. Unlike previous work [39], we adjust the loss function of the model, which will be introduced in the joint learning part. In the t-th iteration, we combine the characters that carry emotional labels after classification to form emotional words, store them in the emotional dictionary, and thereby build and expand the emotional dictionary.
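The CRF probability above can be made concrete with a brute-force sketch. Enumerating all tag sequences is for illustration only; real CRF implementations use the forward algorithm for the normalizer and Viterbi for decoding. The toy emission and transition values are assumptions.

```python
import math

# Sketch of the CRF probability: the score of a tag sequence is the sum of
# per-position emission scores plus label-transition scores T[y_{i-1}][y_i],
# and p(y | S_T) normalizes exp(score) over all candidate sequences.
def seq_score(emissions, trans, tags):
    s = sum(emissions[i][t] for i, t in enumerate(tags))   # emissions
    s += sum(trans[a][b] for a, b in zip(tags, tags[1:]))  # transitions
    return s

def crf_prob(emissions, trans, tags, all_seqs):
    z = sum(math.exp(seq_score(emissions, trans, y)) for y in all_seqs)
    return math.exp(seq_score(emissions, trans, tags)) / z

# Two positions, two labels, uniform transitions:
emissions = [[1.0, 0.0], [0.0, 1.0]]
trans = [[0.0, 0.0], [0.0, 0.0]]
seqs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(crf_prob(emissions, trans, (0, 1), seqs))
```

The probabilities over all candidate sequences sum to 1, which is exactly the normalization the denominator of the formula provides.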

C. PRIMARY TASK
In this section, we introduce a Chinese sentiment classification model that integrates the emotional dictionary as our primary task model. It introduces the emotional dictionary on the basis of the ERNIE model and combines the features calculated from the emotional dictionary with the features h^t_i obtained by the feature extraction module to form a new feature representation as the input of the ERNIE model. Below, we introduce the feature construction based on the sentiment dictionary and then the model framework.

1) FEATURE CONSTRUCTION BASED ON SENTIMENT DICTIONARY
We use n-gram templates to match w_i and its context. Since the emotional dictionary contains only two types of emotional words, negative and positive, we use a 2-dimensional vector to mark whether a segmented phrase matches an emotional word in the dictionary. Here, we set n ≤ 5. Each w_i thus obtains an 18-dimensional feature vector f_i, in which each 2-d sub-vector f_{i,j} represents the positive and negative labels of the matched emotional words in the emotional dictionary D, as shown in Figure 2.
Here, ⊕ represents the concatenation operation.
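A hedged sketch of the dictionary-matching features follows. The exact template set that yields the 18-dimensional vector is not spelled out in the text, so one (positive, negative) indicator pair per n-gram length is used here as an assumption.

```python
# Hedged sketch: every n-gram (n <= 5) containing character i is looked up
# in the sentiment dictionary, and a 2-d (positive, negative) indicator is
# emitted per n-gram length. The paper's exact 18-dim template layout is
# not specified, so this layout is an assumption for illustration.
def dict_features(chars, i, lexicon, max_n=5):
    feats = []
    for n in range(1, max_n + 1):
        pos = neg = 0
        # every window of length n that contains position i
        for start in range(max(0, i - n + 1), min(i + 1, len(chars) - n + 1)):
            gram = "".join(chars[start:start + n])
            if lexicon.get(gram) == "pos":
                pos = 1
            elif lexicon.get(gram) == "neg":
                neg = 1
        feats.extend([pos, neg])
    return feats

lex = {"好": "pos", "不好": "neg"}
print(dict_features(list("很不好"), 2, lex))  # features for '好' at index 2
```

Note how the character alone matches a positive entry while the 2-gram containing it matches a negative one, which is exactly the context sensitivity the templates are meant to expose.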

2) MODEL OF PRIMARY TASK
We use the ERNIE model, a max-pooling layer and a fully connected network to complete the sentiment classification of Chinese text. We will not introduce the specific principles underlying ERNIE here. The feature d_i acquired from the emotional dictionary is concatenated with the feature c_i acquired through the secondary task and the feature h^t_i obtained by feature extraction to obtain the new feature

h^new_i = d_i ⊕ c_i ⊕ h^t_i.

In ERNIE, T-Encoder(·) is a multi-layer bidirectional Transformer encoder, and K-Encoder(·) is a knowledge encoder that injects knowledge into the language representation. ERNIE is linked with knowledge bases and can automatically match entities in the input sequence; these matched entities receive their corresponding vector representations {e_1, e_2, ..., e_m} through the TransE model. We take the strongest features representing the input sequence through a max-pooling layer, and the resulting vector is input into a fully connected layer to achieve sentiment classification.
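The fusion-and-classification head can be sketched as follows; the ERNIE encoder is replaced by the identity purely for illustration, and the toy weights and dimensions are assumptions.

```python
# Minimal sketch of the primary-task head: the dictionary feature d_i, the
# subtask feature c_i and the shared feature h_i are concatenated per
# character, the (here: identity) encoder output is max-pooled over the
# sequence, and a fully connected layer scores the emotion classes.
def classify(d, c, h, weights, bias):
    fused = [di + ci + hi for di, ci, hi in zip(d, c, h)]  # per-char concat
    pooled = [max(col) for col in zip(*fused)]             # max pooling
    return [sum(w * x for w, x in zip(row, pooled)) + b    # FC layer
            for row, b in zip(weights, bias)]

# Two characters, three fused dimensions, three emotion classes:
logits = classify([[1.0], [0.0]], [[0.0], [2.0]], [[1.0], [1.0]],
                  [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0])
print(logits)
```

The argmax over the three logits would give the predicted class among {neg, pos, neu}; in the full model the fused vectors pass through ERNIE before pooling.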

D. INFORMATION INTERACTION MECHANISM
For better information interaction between the two tasks, the information interaction mechanism updates the shared potential features {h^t_1, h^t_2, ..., h^t_n} of the two tasks with the prediction results generated in the previous iteration. At iteration t (t ≥ 2), the update of h^t_i takes the form

h^t_i = f(h^cnn_i ⊕ y^{sd(t−1)}_i ⊕ y^{sen(t−1)}_i; θ),

where θ represents all the parameters in the fully connected layer, and y^{sd(t−1)}_i and y^{sen(t−1)}_i represent the predicted results of the secondary and primary tasks, respectively, in iteration t−1.
Thus, in each iteration, the prediction results generated by the two tasks in the previous iteration are shared, so that the new predictions can be adjusted according to the previous round's results, and the performance is constantly improved.
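The iterative interaction loop can be sketched as below. The function names, the toy fully connected substitute `fc`, and the thresholded predictors are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the iterative interaction: at iteration t > 1 the shared
# feature is rebuilt from the CNN feature plus both tasks' previous
# predictions through a (toy) fully connected layer, then both task
# heads are re-run on the refreshed features.
def interact(h_cnn, predict_sub, predict_main, fc, iterations=3):
    h, y_sd, y_sen = list(h_cnn), None, None
    for t in range(1, iterations + 1):
        if t > 1:  # feed previous predictions back into the shared feature
            h = [fc(hc, s, m) for hc, s, m in zip(h_cnn, y_sd, y_sen)]
        y_sd = predict_sub(h)    # auxiliary task: per-character tag
        y_sen = predict_main(h)  # primary task: per-character emotion signal
    return y_sd, y_sen

fc = lambda hc, s, m: hc + 0.1 * (s + m)          # toy feedback layer
thresh = lambda h: [1 if x > 0.5 else 0 for x in h]  # toy predictor
print(interact([0.4, 0.6], thresh, thresh, fc))
```

In training, the loop would also backpropagate through the fully connected layer, so the feedback weights are learned rather than fixed as they are here.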

E. JOINT LEARNING
We use Θ_aux and Θ_main to represent all the parameters of the auxiliary task model and the primary task model, respectively. Some parameters are shared between Θ_aux and Θ_main, such as the parameters used to extract single-character features with the CNN and the parameter θ in the information interaction mechanism.
During the auxiliary task, the probability vector P(y)^t = [p(y_1)^t, p(y_2)^t, ..., p(y_k)^t] over all possible annotated sequences is produced, where p(y_j | S_T)^t represents the probability that the given j-th tagging sequence is the true annotation and t is the maximum number of iterations. Y_k = {y_1, y_2, ..., y_j, ..., y_k} indicates which tagging sequence is the true annotation; it is a one-hot encoding vector. The auxiliary task loss function can be defined as Equation (17):

Loss_aux = − Σ_{j=1}^{k} Y_j log p(y_j | S_T)^t.    (17)

In the primary task, the output is the emotional tag y^sen_i of the character sequence. The loss function of the primary task can be defined as Equation (18):
Loss_main = − Σ_{i=1}^{c} [y^sen_i log p(y^sen_i)^t + (1 − y^sen_i) log(1 − p(y^sen_i)^t)].    (18)

In Equation (18), c represents the number of emotional categories, p(y^sen_i)^t represents the predicted probability of the real emotion label y^sen_i, t is the maximum number of iterations, and y^sen is a one-hot vector with dimension c. All the parameters in the model are learned together by minimizing the total loss function, defined as

Loss_all = λ · Loss_main + (1 − λ) · Loss_aux,    (19)

where λ is the weight parameter and λ ∈ [0, 1].
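Equations (17)-(19) amount to a weighted combination of two cross-entropy-style losses. A minimal sketch follows; the sum-versus-mean reduction is an assumption.

```python
import math

# Sketch of the joint objective: the primary-task cross-entropy (Eq. 18)
# is blended with the auxiliary sequence loss (Eq. 17) by a convex weight
# lambda in [0, 1], as in Eq. (19).
def cross_entropy(y_true, y_prob):
    """Binary cross-entropy summed over the label dimensions."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob))

def total_loss(loss_main, loss_aux, lam=0.7):
    return lam * loss_main + (1 - lam) * loss_aux

# One-hot target over 2 classes and a confident prediction:
print(total_loss(cross_entropy([1, 0], [0.9, 0.1]), 0.5))
```

With λ = 0.7 (the value found optimal in the experiments), the primary task dominates the gradient while the dictionary-expansion loss still regularizes the shared layers.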

IV. EXPERIMENT
The experiment is divided into four small experiments. The first experiment is used to obtain the weight λ of the total loss function and the number of iterations t. In the second experiment, the superiority of the IMT-SDSA model is verified by comparing it with existing advanced models. Then, the superiority of the multitask learning framework is verified by comparing the IMT-SDSA model with the IMT-SDSA-primary part, which keeps only the primary task, and the IMT-SDSA-auxiliary part, which keeps only the auxiliary task. Finally, the ERNIE model in the IMT-SDSA-primary part is replaced by BERT, XLNet, and RoBERTa.
We verify the superiority of the ERNIE model for use in Chinese text sentiment classification.

A. DATA SET
In view of the fact that COVID-19 has evolved into a global public event, a related public opinion monitoring problem is imminent. Therefore, the data set used in the experiment comes from micro-blog texts on COVID-19. There are sixty thousand texts, which were manually annotated. The number of texts expressing positive emotions is 25392, the number expressing negative emotions is 16902, and there are 17706 neutral texts. In this paper, a text of 1-50 characters is defined as an ultra-short text, a text of 51-200 characters as a short text, and a text of 201-300 characters as a middle-short text. The proportions of the three types of texts in the data set are shown in Figure 3. In addition, when constructing the emotional dictionary, we used the Chinese commendatory dictionary of Li Jun as the basic sentiment dictionary, and we randomly extracted 10% of the texts from the positive, negative and neutral text databases. Based on the basic sentiment dictionary, we annotated approximately 27000 words in these 6000 texts with positive and negative emotion labels, which are used as the training data.

B. EXPERIMENTAL SETUP
In this experiment, the dimension of all the feature vectors is set to 300, obtained through word2vec training; the best λ is found through a grid search with a step size of 0.05. The number of neural units in the BiLSTM is set to 1000, the learning rate is set to 10^-3, and the L2 regularization coefficient is set to 10^-5. To avoid overfitting, we adopted dropout; the dropout values of the BiLSTM and attention layers are 0.3 and 0.5, respectively. The Adam weight decay strategy of transfer-learning optimization was adopted for ERNIE, BERT, XLNet and the other pre-trained models used in the experiment. We set the maximum learning rate learning_rate = 5e-5 to avoid overfitting, and we set the weight decay weight_decay = 0.01. The proportion of training warm-up is set to warmup_proportion = 0.1; thus, the learning rate gradually increases to learning_rate over the first 10% of training. We set the number of epochs num_epoch = 3 for each training session, and the batch_size was 128. To monitor the training in real time, we set eval_interval = 100, that is, after every 100 steps, we ran validation, recorded the score and preserved the optimal model. The other benchmark model parameters used in this experiment follow their original settings.
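The warm-up behaviour described above can be sketched as a linear schedule. The post-warm-up linear decay to zero is an assumption; it is the usual pairing with Adam weight decay in BERT-style fine-tuning, but the paper does not state the decay shape.

```python
# Sketch of the warm-up schedule: the learning rate climbs linearly to
# the peak over the first warmup_proportion of the training steps, then
# (assumption) decays linearly back to zero.
def lr_at(step, total_steps, peak=5e-5, warmup_proportion=0.1):
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)          # linear warm-up
    return peak * (total_steps - step) / max(1, total_steps - warmup_steps)

print([lr_at(s, 100) for s in (0, 5, 10, 55, 100)])
```

With warmup_proportion = 0.1, the peak of 5e-5 is reached exactly at 10% of the steps, matching the setup described in the text.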
Our experiments were performed on a Tesla V100 GPU with 16 GB of memory, and our PC had 64 GB of memory.

C. EXPERIMENTAL RESULTS AND ANALYSIS
First, the weight λ of Loss_main in the total loss function Loss_all and the optimal number of iterations t in the message-passing mechanism are obtained. The value range of λ is [0, 1], and we increase the weight from low to high with a step size of 0.05. Precision (P), Recall (R), and F-measure (F) are used as the evaluation indicators of IMT-SDSA. The performance of IMT-SDSA with different values of λ is shown in Figure 4. As shown in Figure 4, when the value of λ is 0.7, the performance of the model is optimal, and the F value is approximately 0.84.
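The grid search over λ can be sketched as follows; `evaluate()` stands in for a full train-and-validate run, and the toy scoring function (peaking at 0.7) is purely illustrative.

```python
# Sketch of the grid search described above: lambda is swept from 0 to 1
# in steps of 0.05 and the value with the best validation F-measure wins.
def grid_search(evaluate, step=0.05):
    best_lam, best_f = None, float("-inf")
    lam = 0.0
    while lam <= 1.0 + 1e-9:
        f = evaluate(round(lam, 2))      # round to keep grid points exact
        if f > best_f:
            best_lam, best_f = round(lam, 2), f
        lam += step
    return best_lam, best_f

# Toy stand-in for a train-and-validate run, peaking at lambda = 0.7:
toy_f = lambda lam: -(lam - 0.7) ** 2
print(grid_search(toy_f))
```

The same sweep structure applies to the iteration count t, except over the integer range [0, 5] discussed next.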
TABLE 1 shows the results for the number of iterations t in the message-passing mechanism, with the value range set to [0, 5]. When t = 0, there is no message-passing mechanism. We found that when the number of iterations exceeded 5, the performance of the model did not change significantly, but the running time of the model was affected.
From TABLE 1, we can observe that when the number of iterations is 3, the performance of the model is optimal; as the number of iterations continues to increase, the performance does not change significantly but shows a downward trend. When t = 5, the F-measure of the model decreased by 0.004 compared with that at t = 3, and the running time doubled. From TABLE 1, we can also observe that when t ≤ 3, the F-measure presents a gradual upward trend, and the F-measure at t = 3 is 0.043 higher than that at t = 0. When the number of iterations is 0, the performance of the model is the worst, and the performance improves significantly as the number of iterations increases, which indicates that the message-passing mechanism is conducive to improving the performance of the model.
Then, to verify the superiority of the IMT-SDSA model for Chinese text sentiment classification in the context of COVID-19, the following benchmark models were used in this experiment. 1) The AEN-BERT model proposed in [25]: this model is based on BERT and uses an attentional encoder network (AEN) that avoids recurrence, in which attention-based encoders are used to model the context and the target. 2) The method of [23], which combines the word2vec model with a stacked BiLSTM model to analyse the emotions in Chinese micro-blogs; we selected the best-performing CBOW + stacked BiLSTM variant as the benchmark model.
3) The HSSC model proposed by [40]: this model is an end-to-end hierarchical learning model for text summarization and sentiment classification, in which the sentiment classification layer is placed on top of the text summarization layer to form a hierarchical structure.
The benchmark models were applied to the data set we collected, along with the proposed model (IMT-SDSA). A comparison of the results for the four models is shown in TABLE 2, from which we can observe that IMT-SDSA performs better than the three benchmark models. As shown in TABLE 2, the F-measure of the IMT-SDSA model was 0.192 higher than that of the weakest model, CBOW + stacked BiLSTM, and 0.015 higher than that of AEN-BERT, which has the best performance among the three other models. Through our analysis, we found that the input of AEN-BERT must be divided into two parts, the ''context'' part and the ''target word'' part, through which the goal of sentiment classification is achieved by operating on the two parts. When the two parts of the input are identical, the Inter-MHA part of the model loses its meaning; even though the multi-head self-attention mechanism is retained, its performance is poor when classifying long texts. This finding is also verified by the statistics on the error samples: the text categorization results are generally bad when the length of the text exceeds 150 characters. For the CBOW + stacked BiLSTM model, to ensure that the model receives inputs of the same size, the length of the text must be standardized. The model therefore sets an average sentence length, truncates the words that exceed this length, and pads shorter texts with zeros up to the average length. In the original work, the average length is 13, so the model performs well on ultra-short text.
However, in the corpus we collected, the average text length is 105.2764 characters, far longer than the texts processed by that model. Moreover, many writers do not express their true feelings in the first few sentences of a longer text. Consider the following example (English translation of a Chinese micro-blog post): ''Early in the morning, it was a bit overcast. After breakfast, I stood by the window and looked at the nearby street. There were no people there. Because of the new crown pneumonia, the New Year atmosphere had disappeared. This is really the most deserted Spring Festival in my life, but fortunately, I can still be with my parents; we haven't stayed together for a long time, just like this doing housework together, reading books, and chatting. I believe everything will pass. When the spring flowers are blooming, we can walk out of the house as before, look at the flowers, and enjoy the beautiful scenery.'' Judging from only the first 20 words of this text, we would conclude that it expresses a neutral emotion, but in fact the text expresses a positive emotion.
Therefore, the model is rather weak on medium-length and long texts; its F-measure on our data set is 0.651. The HSSC model first extracts a summary from the text and then classifies the text based on that summary. When training the model, we extracted 1,000 texts from the training corpus. After analysing the misclassified texts, we found that the model performs generally poorly because writing in micro-blog posts is relatively free and undisciplined, which makes it difficult to capture the important information when generating the summary. This differs from the experimental data used in [39], which consist of commodity reviews: those texts have fewer words and focus on describing a single commodity, so the key information is easier to capture when generating the summary.
Next, we compare the performance of the full IMT-SDSA model with that of its sentiment classification part alone (IMT-SDSA-primary part) and its sentiment dictionary extension part alone (IMT-SDSA-auxiliary part) on the experimental data set to verify the superiority of the multitask learning framework. The results are shown in TABLE 3 and TABLE 4. Clearly, the full IMT-SDSA model performs better than either single-task part. By analysing the samples misclassified by the IMT-SDSA-primary part, we found that the model performs more poorly on sentences in the training corpus whose emotion is only weakly expressed. For example, a sentence meaning ''my brother had a fever yesterday, but he is well today'' should be classified as expressing positive emotion. However, the IMT-SDSA-primary part predicts the neutral tag for this sentence, apparently failing to treat ''wen de yi pi'' (an Internet term meaning ''good'') as a positive sentiment word; hence the classification error. Through the auxiliary task, IMT-SDSA predicts an emotion distribution for each token in the sequence, and it predicts a strongly positive emotion for the four characters of ''wen de yi pi'', while the distributions over the other words are more uniform. This finding indicates that ''wen de yi pi'' is an important word that represents the emotion of this sentence. IMT-SDSA aggregates this knowledge through the message-passing mechanism, so that it can be used in a later iteration to obtain the correct classification. Similarly, TABLE 4 indicates that our multitask learning framework also improves the performance of sentiment dictionary expansion on its own. In this article, we regard sentiment dictionary expansion as a sequence labelling problem; the contribution of document-level global features to this sequence labelling problem was addressed in our previous work [39].
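The interaction idea described above, in which the auxiliary task's per-token emotion distribution tells the primary classifier which tokens matter, can be sketched as follows. This is a hypothetical illustration, not the authors' actual architecture: the function name `interact`, the use of a ''neutral'' tag probability as an informativeness signal, and the simple weighted average are all our assumptions.

```python
def interact(token_vecs, token_emotion_probs):
    """Hypothetical sketch of the message-passing step.

    Each token vector is weighted by how emotionally informative the
    auxiliary task predicts it to be (1 minus the predicted probability
    of the 'neutral' tag), and the weighted vectors are averaged into a
    sentence representation for the primary sentiment classifier.
    """
    weights = [1.0 - p["neutral"] for p in token_emotion_probs]
    total = sum(weights) or 1.0  # avoid division by zero for all-neutral input
    dim = len(token_vecs[0])
    sentence = [0.0] * dim
    for vec, w in zip(token_vecs, weights):
        for i in range(dim):
            sentence[i] += w * vec[i] / total
    return sentence
```

In this sketch, a token the auxiliary task tags as strongly positive (low neutral probability) dominates the sentence representation, mirroring how ''wen de yi pi'' drives the correct classification in the example above.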
Finally, we replace ERNIE in the primary task of the IMT-SDSA model with BERT, XLNet, and RoBERTa, three other widely used pretrained models, to verify the superiority of the ERNIE model for Chinese text sentiment classification. The experimental results are shown in TABLE 5, from which we can observe that ERNIE wins by a narrow margin in terms of classification performance. Moreover, our experiments show that ERNIE is more efficient than the other models: taking 50,000 training samples as an example, the training time of ERNIE is 21.3 minutes, that of XLNet is 33 minutes, and BERT and RoBERTa take 46.2 minutes and 48.4 minutes, respectively. Therefore, for the sentiment analysis of Chinese text, ERNIE is superior to the other models.
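The reported training times translate into the following relative costs for each encoder; this snippet simply does the arithmetic over the numbers quoted above.

```python
# Training times (minutes) on 50,000 samples, as reported in the text.
train_minutes = {"ERNIE": 21.3, "XLNet": 33.0, "BERT": 46.2, "RoBERTa": 48.4}

# Each model's training time expressed as a multiple of ERNIE's.
relative_cost = {name: t / train_minutes["ERNIE"] for name, t in train_minutes.items()}
for name, r in sorted(relative_cost.items(), key=lambda kv: kv[1]):
    print(f"{name}: {r:.2f}x the ERNIE training time")
```

BERT and RoBERTa take more than twice as long as ERNIE on the same data, which supports the efficiency claim even where the accuracy margin is narrow.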

V. CONCLUSION
In this paper, a multitask learning framework is proposed that integrates emotional dictionary expansion and text sentiment classification to improve sentiment classification performance for Chinese text. An information transfer mechanism is designed to improve the prediction results of the two tasks and to strengthen the correlation between them. Experiments show that the proposed method performs better than the other benchmark models. In addition, we demonstrate the superiority of the ERNIE model over other advanced models, such as BERT, RoBERTa and XLNet, for Chinese text sentiment classification. In social media, besides expressing emotion through words, people also like to express their feelings through emoticons and various abbreviations, such as ''awsl'' (English meaning: that's great, I'm blown away). Therefore, in future studies we will introduce more such features to improve the performance of the model. In addition, as noted in [41], we must consider whether user privacy is violated when collecting such texts from the Internet for sentiment analysis research, a question that deserves careful consideration.