Emotion Wheel Attention-Based Emotion Distribution Learning

Emotion distribution learning is an effective multi-emotion analysis paradigm proposed in recent years. Its core idea is to record the degree to which an example expresses each emotion through an emotion distribution, which makes it suitable for emotion analysis tasks with emotional ambiguity. To address the problem that prior knowledge from emotion psychology is seldom considered in existing emotion distribution learning methods, we propose an Emotion Wheel Attention-based Emotion Distribution Learning (EWA-EDL) model. EWA-EDL generates a prior emotion distribution describing the psychological relevance of each basic emotion, and then directly integrates the prior knowledge based on the emotion wheel into a deep neural network through the attention mechanism. The deep network of EWA-EDL is trained end-to-end to learn the emotion distribution prediction and emotion classification tasks simultaneously. The EWA-EDL architecture includes five main parts: input layer, convolutional layer, pooling layer, attention layer and multi-task loss layer. Extensive comparative experiments on 8 commonly used textual emotion datasets show that EWA-EDL outperforms the compared emotion distribution learning methods on both emotion distribution prediction and emotion classification tasks.


I. INTRODUCTION
The task goal of emotion analysis is to uncover the emotional tendencies of people embedded in data [1], [2], which has important applications in many emerging artificial intelligence scenarios such as personalized recommendation [3] and intelligent customer service systems [4]. Emotion analysis mainly consists of three subtasks, namely emotion information extraction, emotion information classification, and emotion information retrieval and summarization [2]. The research in this paper focuses on emotion information classification, with the goal of improving the generalization performance of emotion recognition models.
Traditional emotion recognition models, mostly based on Single-Label Learning (SLL) or Multi-Label Learning (MLL), associate one or more emotion labels with each example and cannot quantitatively analyze multiple emotions with different expression strengths [5]. To address this problem, Zhou et al. first proposed Emotion Distribution Learning (EDL) in 2015 for the facial expression recognition task [6]. EDL draws on Label Distribution Learning (LDL) [5], [7] to argue that the emotions expressed by media such as text and images are a mixture of multiple basic emotions, with the various basic emotions expressed at different intensities in the same example. The expression intensity of each basic emotion of an example lies between 0 and 1, and the sum of the expression degrees of all emotions is 1. The expression degrees of all basic emotions on an example together constitute an Emotion Distribution [6]. In text emotion recognition tasks, it is common for a sentence to express multiple basic emotions of different intensities at the same time. As shown in Figure 1, the SemEval text emotion dataset [8] gives specific scores for each sentence on how strongly it expresses each of the six basic emotions. EDL represents multiple emotions simultaneously through the emotion distribution, and can better handle emotion recognition tasks where emotional ambiguity exists.

The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan.
Human emotions are a complex phenomenon: the various basic emotions are highly interrelated, showing positive or negative correlations. Positively correlated emotions are more likely to occur together, while negatively correlated emotions rarely co-occur [9]. For example, sentence (a) in Figure 1 is labeled with two basic emotions, sadness and fear, where sadness is the dominant emotion. By manually analyzing the text content, we found that sentence (a) should also imply the emotion of surprise (marked with a dashed box in Figure 1). Further analysis reveals that surprise and sadness are two emotions that are highly positively correlated in psychology. The case of sentence (b) is similar to that of sentence (a): manual analysis finds an unlabeled disgust emotion in this sentence that is highly correlated with the labeled sadness. How to effectively model the correlation between emotions is therefore a common research question across emotion recognition models.
In recent years, EDL has become a research hotspot in machine learning, and many scholars have published EDL-related work in top international conferences and journals [10]–[14]. Among the existing work, considering the correlation between emotion labels in the EDL model is an important research direction. For example, in 2019 Jia et al. [10] proposed a facial emotion distribution learning method based on the local relevance of labels, using local low-rank structure to capture that relevance. In 2021, Xu and Wang [11] proposed an attention mechanism-based emotion distribution learning method to model the relationship between the regions of an image and its emotion distribution. The basic idea of the above work is to incorporate emotional correlations mined from the training data into the EDL model, and it has achieved promising results. However, there is still relatively little work on introducing emotional correlations derived from prior psychological knowledge into EDL models.
Plutchik's wheel of emotions is a classic psychological model of emotion proposed by psychologist Robert Plutchik in 1980 to describe the correlation between basic human emotions [15]. Plutchik regarded human emotions as mixtures of eight basic emotions (anger, anticipation, joy, trust, fear, surprise, sadness, and disgust). The eight basic emotions together form Plutchik's wheel of emotions, in which two emotions in adjacent positions are positively correlated and two emotions in opposite positions are negatively correlated. Overall, the interval angle on the emotion wheel represents the degree of psychological correlation between the corresponding emotions. In 2019, He and Jin proposed a graph convolutional network-based EDL approach that considers psychological prior knowledge for the image emotion recognition task, with good results [12]. However, the model structure of that work is relatively simple, and the graph convolutional network containing the prior knowledge is trained relatively independently. So far, no EDL work has adopted the attention mechanism to directly integrate psychological prior knowledge based on the emotion wheel into a deep learning network.
To effectively introduce psychological prior knowledge into the EDL model and improve emotion analysis performance, we propose the Emotion Wheel Attention based Emotion Distribution Learning (EWA-EDL) model in this paper. The EWA-EDL model is based on the classical textual convolutional neural network; it introduces prior knowledge of emotion relevance from psychology into the neural network through the attention mechanism, and then uses a multi-task loss function to learn both the emotion distribution prediction and emotion classification tasks. The EWA-EDL model consists of five main components, namely, input layer, convolutional layer, pooling layer, attention layer, and multi-task loss layer. The input sentence is represented as a matrix of word embedding vectors in the input layer; new output features are then generated using multiple filters of different widths in the convolutional layer, followed by a standard max pooling operation in the pooling layer to obtain the maximum value among the features. In the attention layer, the EWA-EDL model constructs a prior emotion distribution based on Plutchik's wheel of emotions for each basic emotion, and fuses the prior emotion distributions into a final emotion distribution output through the attention mechanism. In the multi-task loss layer, the EWA-EDL model combines KL (Kullback-Leibler) loss and cross-entropy loss to learn the emotion distribution prediction and emotion classification tasks simultaneously. In this paper, we conduct comparative experiments on an English emotion distribution dataset (SemEval 2007 Task 14 [8]), four English single-label datasets (CBET [16], ISEAR [17], TEC [18] and Fairy Tales [19]) and three Chinese single-label datasets (NLP&CC 2013, NLP&CC 2014 [20] and WEC [21]). Experimental results show that the EWA-EDL model outperforms the compared EDL models on both emotion distribution prediction and emotion classification tasks.
The main contributions of this paper are twofold: (1) We use the attention mechanism to directly integrate psychological prior knowledge based on the emotion wheel into a deep learning network, and propose the EWA-EDL model.
(2) Comparative experiments on 8 commonly used text emotion datasets fully compare the proposed EWA-EDL model with other EDL models. Experimental results show that EWA-EDL outperforms existing emotion distribution learning methods on both the emotion distribution prediction and emotion classification tasks.
The remainder of this paper is organized as follows. Section II introduces emotion distribution learning and related research work; Section III describes in detail the emotion wheel attention-based emotion distribution learning proposed in this paper; Section IV describes the experimental setup, results and analysis; finally, Section V concludes the paper.

II. RELATED WORK
Traditional emotion analysis of texts focuses on classifying sentences by emotion polarity, i.e., discerning positive, negative, or neutral emotion [2]. However, emotion polarity recognition models cannot capture fine-grained emotions and are only suitable for simple emotion analysis tasks. Unlike traditional emotion polarity analysis, the goal of fine-grained emotion analysis is to identify fine-grained emotions in a text [6], that is, specific emotions such as anger, anticipation, and joy. Classical fine-grained emotion recognition models are generally based on single-label or multi-label learning, associating one or more emotion labels with each example [5]. Fine-grained emotion recognition models can handle many emotion analysis tasks, but they still lack the modeling capability to quantitatively represent the multiple emotions of varying degrees that an example implies [13]. In practical applications, many sentences express multiple emotions of varying degrees at the same time. For example, the commonly used SemEval text emotion dataset annotates the degree of expression of each of six fine-grained emotions for every sentence [8]. As shown in Figure 1, sadness is the main emotion of sentence (a), with an expression level of 91.1%; as a secondary emotion, the expression level of fear is 8.82%. The degree of expression of the primary emotion (sadness) in sentence (b) is 82.85% and that of the secondary emotion (anger) is 17.14%.
In order to quantitatively deal with the situation where an example expresses multiple emotions of different intensities at the same time, Zhou et al. [6] proposed Emotion Distribution Learning (EDL) in 2015, drawing on the research idea of Label Distribution Learning (LDL) [5]. Thereafter, Zhou et al. [13] proposed a text-oriented EDL method in 2016. A sentence may express one or more emotions, and each emotion has a different intensity of expression.
Emotion distribution learning can effectively handle the problem of expressing multiple emotions simultaneously in one example and is suitable for tasks where emotional ambiguity exists [6]. In recent years, many scholars have put forward much effective work in the field of EDL [11], [12], [14], [22]- [24]. Existing EDL works can be divided into three main categories according to whether or not they consider correlations between emotion labels.
The first class comprises EDL methods that do not consider the relevance of emotion labels. For example, Zhang et al. [22] proposed an EDL model based on a Multi-Task Convolutional Neural Network (MT-CNN) to jointly optimize the emotion distribution prediction and emotion classification tasks; Pandeya and Lee [23] presented an affective computing system that relies on music, video, and facial expression cues for emotion analysis; and Pandeya and Lee [24] later constructed a balanced music video emotion dataset with diversity of territory, language, culture and musical instruments.
The second class of methods learns label relevance from training data, which is a popular research direction in current EDL methods. Xu and Wang [11] proposed an attention mechanism-based method for learning emotion distribution, obtaining the relationship between various regions of an image and emotion distribution. Fei et al. [14] proposed a latent emotional memory network to learn the potential emotion distribution in the data, and used it effectively in the classification network.
The third class of EDL work considers the correlation between emotion labels using psychological prior knowledge to improve the generalization performance of emotion recognition models. There is relatively little EDL work considering prior knowledge of emotions. He and Jin [12] proposed an EDL approach (EmotionGCN) based on graph convolutional networks and the emotion wheel psychological model. EmotionGCN consists of two main components: a CNN module for extracting image features and a GCN-based weight generator that incorporates prior knowledge. However, the model structure of EmotionGCN is relatively simple: the GCN module, which learns the relevance of emotion labels, and the image-based CNN module are relatively independent, and only a matrix multiplication merges the outputs of the two modules. More effective EDL methods are needed that integrate psychological prior knowledge directly into deep learning networks. Compared with MT-CNN and EmotionGCN, the EWA-EDL model proposed in this paper uses the emotion wheel psychological model to define the distance between basic emotions and generate the corresponding prior emotion distributions. These distributions are fused with the convolutional neural network model, and a multi-task loss function is then used to simultaneously learn the emotion distribution prediction and emotion classification tasks in an end-to-end manner.

III. EMOTION WHEEL ATTENTION-BASED EMOTION DISTRIBUTION LEARNING

A. PLUTCHIK'S WHEEL OF EMOTIONS
According to psychological research, human emotions are clearly interrelated [15]. Certain emotions often appear at the same time, showing a high positive correlation, while other emotions are the opposite. The emotion wheel theory, proposed by Robert Plutchik in 1980, is a classic model for describing the interrelationship between emotions from a psychological perspective [15]. Plutchik's wheel of emotions contains eight basic emotions: anger, disgust, sadness, surprise, fear, trust, joy, and anticipation. As shown in Figure 2, we define the psychological distance between emotions according to the size of their interval angle on the emotion wheel: each 45-degree interval between two emotions corresponds to a distance of 1. The smaller the interval angle of two emotions, the smaller their psychological distance and the higher their emotional similarity. For example, joy and trust are adjacent emotions with an interval angle of 45 degrees, so their distance is set to 1; anticipation and surprise are opposite emotions with an interval angle of 180 degrees, so their distance is set to 4.
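As an illustrative sketch, the 45-degrees-per-step distance above can be computed by counting steps along the circular wheel; the ordering below follows the eight basic emotions named in the text, and the function name is our own:

```python
# Illustrative ordering of the eight basic emotions around Plutchik's wheel;
# adjacent entries are 45 degrees (distance 1) apart.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]

def wheel_distance(a: str, b: str) -> int:
    """Psychological distance = number of 45-degree steps along the wheel,
    taking the shorter way around the circle."""
    i, j = WHEEL.index(a), WHEEL.index(b)
    step = abs(i - j)
    return min(step, len(WHEEL) - step)
```

With this encoding, adjacent emotions (joy, trust) get distance 1 and opposite emotions (anticipation, surprise) get distance 4, matching the definition in the text.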
In the field of EDL, there has been some research using the emotion wheel as prior knowledge [12], [13]. For example, Zhou et al. [13] introduced an emotional constraint based on Plutchik's wheel of emotions into the optimization objective of a maximum entropy-based EDL method and achieved good results. He and Jin [12] proposed an EDL method based on graph convolutional networks and the emotion wheel, which showed good performance. Overall, however, there has been relatively little EDL work based on emotion wheels, and at present no EDL work uses the attention mechanism to directly integrate psychological prior knowledge based on the emotion wheel into a deep learning network.

B. PLUTCHIK'S WHEEL OF EMOTIONS ATTENTION-BASED EMOTION DISTRIBUTION LEARNING
We use d_x^y to denote the intensity with which sentence x expresses emotion y. The expression intensity scores of the various emotions on each sentence constitute an emotion distribution, where vector normalization guarantees that d_x^y ∈ [0, 1] and Σ_y d_x^y = 1. It is important to note that d_x^y is not a probability but the proportion of emotion y in the emotion distribution. If d_x^y were treated as a probability, it would mean that only one emotion label is correct for a sentence, whereas EDL considers an example to contain multiple emotions at the same time. The modeling goal of EDL is to learn a mapping from the sentence space X = R^m to emotion distributions over the label space Y = {y_1, y_2, . . . , y_C}, with each label y_i representing a basic emotion.
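The normalization constraint above can be sketched in a few lines; the raw scores here are invented toy values, not data from the paper:

```python
def normalize(scores):
    """Normalize raw per-emotion intensity scores into an emotion
    distribution with entries in [0, 1] that sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# Toy raw intensity scores for C = 6 basic emotions on one sentence.
d = normalize([0.0, 9.1, 0.9, 0.0, 0.0, 0.0])
```

Each entry of `d` is the proportion of the corresponding emotion in the distribution, not a class probability.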
We introduce the psychological prior knowledge based on Plutchik's Wheel of Emotions through the attention mechanism, and adopt a multi-task deep convolutional neural network to propose Emotion Wheel Attention based Emotion Distribution Learning (EWA-EDL) model. The architecture of the EWA-EDL model is shown in Figure 3.
Given a training dataset D = {(s_i, d_i)}_{i=1}^n, where s_i is a sentence and d_i its emotion distribution, the modeling goal of EWA-EDL is to learn a mapping from the sentence s_i to the emotion distribution d_i. EWA-EDL simultaneously optimizes two training tasks, namely emotion distribution prediction and emotion classification. The model architecture of EWA-EDL consists of five parts: input layer, convolutional layer, pooling layer, attention layer, and multi-task loss layer.

1) INPUT LAYER
The input to the EWA-EDL model is a sentence s = <w_1, w_2, . . . , w_M> consisting of M words. We denote the m-th word w_m by a k-dimensional word embedding vector x_m ∈ R^k. All the word vectors in the sentence are then concatenated to form the word vector matrix in Eq. 1:

x_{1:M} = x_1 ⊕ x_2 ⊕ · · · ⊕ x_M, (1)

where ⊕ is the concatenation operator. If the length of a sentence is less than M, we pad the end with zero vectors, so that each sentence is represented as a word vector matrix of dimension k × M.
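The input-layer construction can be sketched as follows (a minimal illustration of the zero-padded k × M matrix; the function name is our own):

```python
import numpy as np

def sentence_matrix(word_vectors, M, k):
    """Stack word embeddings into a k x M matrix, zero-padding
    sentences shorter than M words (sketch of the input layer)."""
    X = np.zeros((k, M))
    for m, v in enumerate(word_vectors[:M]):
        X[:, m] = v  # column m holds the m-th word's embedding
    return X
```

A sentence of 2 words with k = 3 and M = 5 thus yields a 3 × 5 matrix whose last three columns are zero.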

2) CONVOLUTIONAL LAYER
The convolutional layer contains multiple filters ω ∈ R^{h×k}, where h is the window width, and each filter produces a new feature representation. Let x_{p:p+h−1} denote the concatenation of words x_p, x_{p+1}, . . . , x_{p+h−1}; the feature v_p is then computed by applying filter ω to the window x_{p:p+h−1}, as in Eq. 2:

v_p = f(ω · x_{p:p+h−1} + b), (2)

where f(·) is a nonlinear activation function, such as the Sigmoid or ReLU function, and b is a bias term. The filter window covers all word windows x_{1:h}, x_{2:h+1}, · · · , x_{M−h+1:M} in the sentence, producing a feature map v in Eq. 3:

v = [v_1, v_2, . . . , v_{M−h+1}]. (3)
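Eqs. 2 and 3 can be sketched directly, with ReLU as the choice of f(·) (the word matrix is stored one k-dimensional word per row here for convenience; the function name is our own):

```python
import numpy as np

def conv_feature_map(X, w, b=0.0):
    """Compute v = [v_1, ..., v_{M-h+1}] with v_p = ReLU(w . x_{p:p+h-1} + b).

    X: M x k word matrix (one word embedding per row),
    w: h x k filter, b: scalar bias.
    """
    M = X.shape[0]
    h = w.shape[0]
    return np.array([max(float(np.sum(w * X[p:p + h]) + b), 0.0)
                     for p in range(M - h + 1)])
```

Sliding a width-2 filter over a 4-word sentence yields M − h + 1 = 3 features, as in Eq. 3.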

3) POOLING LAYER
A standard max-over-time pooling operation is applied to each feature map v to take its maximum value as the most significant feature, as in Eq. 4:

v̂ = max(v), (4)

where v̂ denotes the feature corresponding to the input sample and a particular filter.
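Applied across several filters, max-over-time pooling keeps one salient feature per filter; the feature-map values below are invented for illustration:

```python
import numpy as np

# Feature maps produced by three filters of different widths (toy values);
# max-over-time pooling (Eq. 4) keeps the single largest value of each map.
feature_maps = [np.array([0.2, 1.7, 0.5]),
                np.array([0.9, 0.1]),
                np.array([0.0, 0.3, 2.1, 0.4])]
v_hat = np.array([fm.max() for fm in feature_maps])  # one value per filter
```

The pooled vector `v_hat` has one entry per filter regardless of the varying feature-map lengths.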

4) ATTENTION LAYER
The attention mechanism was first proposed in the field of computer vision, with the main purpose of allowing a neural network to focus on specific parts of an image as needed, rather than on the whole image [25]. Bahdanau et al. [26] successfully applied the attention mechanism to natural language processing in 2014. In view of the outstanding performance of the attention mechanism on natural language processing tasks, we use it to introduce prior knowledge of emotional psychology into the deep network-based EDL model. We first generate, for each basic emotion, a prior emotion distribution describing the correlations between emotions; the CNN is trained to produce a preliminary emotion distribution that serves as attention weights, and the attention mechanism combines the two into the final predicted emotion distribution. According to the psychological distances between emotions in the emotion wheel model, we generate a prior emotion distribution f_α for each emotion α, α ∈ {1, 2, · · · , C}. In the prior emotion distribution f_α, the value of the emotion label α should be the largest, i.e., its degree of expression the highest, and the values of the other emotions should decrease as their emotion wheel distance from α increases. Overall, the prior emotion distribution f_α should be a symmetrically decreasing distribution centered on the emotion label α, and the prior distribution generated for each basic emotion is fixed. Following the conclusion of the LDL-based face age prediction work of Geng et al. [27], we assume that the prior emotion distribution obeys a Gaussian distribution. Given the emotion label α, we generate the prior emotion distribution f_α as in Eq. 5 and Eq. 6:

f_α^a = (1/Z) exp(−|a − α|² / (2σ²)), (5)

Z = Σ_a exp(−|a − α|² / (2σ²)), (6)

where σ is the standard deviation of the prior emotion distribution, Z is the normalization factor such that Σ_a f_α^a = 1, and |a − α| is the emotion wheel distance between emotion a and the true emotion α.
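The Gaussian prior of Eqs. 5 and 6 can be sketched as follows; the distance function is passed in so any wheel-based distance can be plugged in (function names are our own):

```python
import math

def prior_distribution(alpha, C, sigma, dist):
    """Gaussian prior emotion distribution f_alpha over C emotions
    (sketch of Eqs. 5-6); dist(a, alpha) is the emotion wheel distance."""
    scores = [math.exp(-dist(a, alpha) ** 2 / (2 * sigma ** 2))
              for a in range(C)]
    Z = sum(scores)  # normalization factor of Eq. 6
    return [s / Z for s in scores]
```

The resulting distribution peaks at α and decreases symmetrically with wheel distance, as required.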

The output vector v̂ of the pooling layer is transformed through a fully connected layer and a Softmax activation to obtain the preliminary emotion distribution g = [g_1, g_2, . . . , g_C], where g_j is the preliminarily predicted degree of expression of the j-th emotion. We then use g as attention weights over the prior emotion distributions of the basic emotions, and superimpose the weighted prior distributions to output an emotion distribution d̂ that incorporates prior knowledge of emotion psychology, as in Eq. 7:

d̂ = Σ_{j=1}^{C} g_j f_j. (7)
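The attention fusion of Eq. 7 is a weighted sum of the prior distributions, which reduces to a vector-matrix product (a sketch; the function name is our own):

```python
import numpy as np

def fuse(g, priors):
    """Sketch of Eq. 7: weight each basic emotion's prior distribution f_j
    by the preliminary prediction g_j and sum the weighted priors.

    g: length-C preliminary distribution; priors: C x C matrix, row j = f_j.
    """
    return np.asarray(g) @ np.asarray(priors)
```

Because g sums to 1 and every prior row sums to 1, the fused output d̂ is itself a valid emotion distribution.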

5) MULTI-TASK LOSS LAYER
The EWA-EDL model combines cross-entropy loss function and KL loss function to train both emotion distribution prediction and emotion classification tasks using an end-to-end approach. Two simultaneously trained learning tasks can reinforce each other and learn to obtain a more robust neural network model. For the dataset with labeled emotion distribution, the emotion with the highest degree of expression in the emotion distribution d i is used as the true emotion label of sentence s i for emotion classification. For the single-label dataset without labeled emotion distributions, we used label enhancement [28] to extend the true emotion labels into emotion distributions.
The objective loss function of EWA-EDL is a weighted combination of the cross-entropy loss and the KL loss, as in Eq. 8:

E = λ E_cls + (1 − λ) E_edl, (8)

where E_cls is the cross-entropy loss for the emotion classification task, E_edl is the KL loss for the emotion distribution prediction task, and λ is a weight parameter. Following previous work [22], λ is set to 0.7. The cross-entropy loss maximizes the probability of the target label and is a common objective for classification tasks, defined in Eq. 9:

E_cls = − Σ_i Σ_{j=1}^{C} 1(y_i = j) log a_j^{(i)}, (9)

where 1(δ) is the indicator function, with 1(δ) = 1 when δ is true and 0 otherwise, y_i is the true emotion label of sentence s_i, and a_j^{(i)}, j = 1, 2, . . . , C denotes the output value of sentence s_i at the last layer.
For emotion distribution prediction, the KL loss measures the difference between the predicted and true distributions, as defined in Eq. 10:

E_edl = Σ_i Σ_{j=1}^{C} d_j^{s_i} ln( d_j^{s_i} / d̂_j^{s_i} ), (10)

where d_j^{s_i} is the true degree of expression of the j-th emotion in sentence s_i and d̂_j^{s_i} is the predicted one.
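For a single sentence, the multi-task objective combining the cross-entropy and KL terms can be sketched as follows (a minimal illustration with an epsilon added for numerical stability; the function name is our own):

```python
import numpy as np

def multitask_loss(d_hat, d_true, y_true, lam=0.7, eps=1e-12):
    """Single-sentence sketch of the combined loss:
    lam * cross-entropy + (1 - lam) * KL(d_true || d_hat)."""
    d_hat, d_true = np.asarray(d_hat), np.asarray(d_true)
    e_cls = -np.log(d_hat[y_true] + eps)                # classification term
    e_edl = np.sum(d_true * np.log((d_true + eps) /
                                   (d_hat + eps)))      # distribution term
    return lam * e_cls + (1 - lam) * e_edl
```

When the predicted distribution matches the true one exactly, the KL term vanishes and only the cross-entropy term remains.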
The source code of the EWA-EDL model is released at https://github.com/zeakyop/EWA-EDL.

IV. EXPERIMENTS
To examine the performance of the EWA-EDL model proposed in this paper, three sets of experiments were conducted: analyzing the effect of the prior emotion distribution parameter σ on the performance of the EWA-EDL model, comparing the emotion prediction performance of multiple EDL methods on the English and Chinese datasets, and comparing the emotion classification performance of three deep network-based EDL models on seven single-label datasets.

A. DATASETS
In this paper, eight textual datasets were used for the experiments, namely the SemEval emotion distribution dataset [8] (SemEval 2007 Task 14), four single-label English datasets (CBET [16], ISEAR [17], TEC [18] and Fairy Tales [19]) and three single-label Chinese datasets (NLP&CC 2013 [20], NLP&CC 2014 [20], and WEC [21]). We present the details of all Chinese and English experimental datasets in Table 1, including the number of sentences for each emotion, the total number of sentences, and the average number of words per sentence.
The basic information of the five English text datasets used in this paper is as follows. The SemEval dataset [8] is an emotion distribution dataset labeled with multiple emotion expression intensities; it contains 1,250 news headlines, each annotated with six emotion labels (anger, joy, fear, surprise, sadness, and disgust) and their corresponding expression intensities. The SemEval headlines are mainly collected from mainstream English news sources such as the New York Times, CNN, BBC, and Google News. The CBET dataset [16] consists of 76,860 tweets covering 9 emotions, with 8,540 tweets per emotion. We selected a total of 51,240 tweets from the CBET dataset for the six emotions included in Plutchik's wheel of emotions (anger, joy, fear, surprise, sadness, and disgust). The ISEAR dataset [17] contains 7,666 sentences and 7 emotion labels; the sentences describe people's life situations and experiences when they feel certain emotions (anger, joy, fear, sadness, disgust, shame, and guilt). Five emotions (anger, joy, fear, sadness, and disgust) and 5,431 sentences from the ISEAR dataset were selected for the experiments in this paper. The TEC dataset [18] consists of 21,051 emotional tweets, each labeled with one of six emotions: anger, disgust, fear, joy, surprise, and sadness. The Fairy Tales dataset [19] contains 1,204 English sentences extracted from 185 fairy tales, covering 5 emotions (anger, joy, fear, surprise and sadness), with each sentence labeled with one emotion.
For the Chinese datasets, the NLP&CC 2013 and NLP&CC 2014 single-label Chinese datasets [20] contain 32,185 sentences (10,552 emotional and 21,633 emotionless) and 45,421 sentences (15,690 emotional and 29,731 emotionless) collected from Sina Weibo, respectively. Both datasets contain 7 emotions, namely anger, joy, fear, surprise, sadness, disgust, and like. In this paper, we retained 6 of these emotions and selected 7,581 and 11,431 emotional sentences from the two datasets, respectively. The WEC (Weibo Emotion Corpus) dataset [21] is an emotion corpus constructed by the Hong Kong Polytechnic University in 2016 from Weibo posts and contains 7 emotions (anger, joy, fear, surprise, sadness, disgust, and like). A total of 35,121 sentences covering six emotions from the WEC dataset were selected for the experiment.

B. IMPLEMENTATION DETAILS
The text preprocessing steps for the English datasets are as follows. First, special characters such as punctuation are removed, and only English letters and numbers are retained. Then, all letters are converted to lowercase and words are stemmed. Finally, the open-source pre-trained word2vec word embedding model [29] is used to represent words as 300-dimensional vectors. The word2vec model is trained on the Google News corpus of about 100 billion words, and its dictionary contains about 3 million words. For the preprocessing of Chinese text, we first remove special characters such as punctuation and retain only Chinese characters and numbers. Then, Jieba (https://github.com/fxsjy/jieba) is used for Chinese word segmentation. Finally, the pre-trained Chinese Word Vectors model [30] is used to represent words as 300-dimensional vectors. The Chinese Word Vectors model contains 850,000 words, trained on approximately 136 million words from Chinese corpora such as Baidu Encyclopedia. Out-of-vocabulary words are randomly initialized from a uniform distribution U(−ε, ε), where ε is set to 0.01. As input to the neural network, each sentence is converted by preprocessing into a matrix of 300-dimensional word embeddings. The maximum number of words in a sentence is set to the length of the longest sentence in each dataset. The pre-trained word embedding vectors are kept fixed during training.
To reasonably evaluate model performance, the experiments use standard stratified 10-fold cross-validation. In detail, we divide the dataset into ten equal parts while maintaining the class ratio, use each part as the test set once, and merge the remaining data into the training set, repeating this ten times. Each fold of the cross-validation is an independent emotion prediction task, and one-tenth of the training set is randomly selected as the validation set in each fold. To make the experimental results comparable, all compared models use the same data splits. The average of each evaluation metric over the ten folds is used to evaluate the final performance of the EDL model.
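The stratified fold assignment described above can be sketched without any ML library (a minimal illustration; the function name is our own):

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each example to one of n_folds test folds while preserving
    the per-class ratio (sketch of the stratified 10-fold protocol)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            fold_of[i] = pos % n_folds  # deal class members round-robin
    return fold_of
```

Each fold then serves once as the test set, with the remaining nine folds merged into the training set.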
For the emotion distribution prediction task, we used six commonly used EDL metrics for evaluating the quality of the predicted emotion distribution: Euclidean, Sørensen, Squared χ2, KL Divergence, Cosine, and Intersection [13]. For the emotion classification task, we used four classification metrics, namely Precision, Recall, F1-score, and Accuracy. The parameters of the EWA-EDL model are listed in Table 2. The experiments in this paper were run on a Lenovo workstation with an Intel Core i9-10900X 3.70GHz 10-core CPU and 128 GB RAM, under Ubuntu 18.04 with the PyTorch 1.5.0 deep learning framework.
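Three of the distribution metrics named above can be sketched using their common LDL definitions (assumed here from the LDL literature, since the paper does not restate the formulas):

```python
import math

def cosine(d, d_hat):
    """Cosine similarity between true and predicted distributions (higher is better)."""
    dot = sum(a * b for a, b in zip(d, d_hat))
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in d_hat))
    return dot / norm

def intersection(d, d_hat):
    """Histogram intersection: sum of element-wise minima (higher is better)."""
    return sum(min(a, b) for a, b in zip(d, d_hat))

def kl_divergence(d, d_hat, eps=1e-12):
    """KL divergence from the predicted to the true distribution (lower is better)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(d, d_hat))
```

A perfect prediction gives Cosine = 1, Intersection = 1, and KL Divergence = 0.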

C. EFFECT OF THE PRIOR EMOTION DISTRIBUTION PARAMETER σ ON THE PERFORMANCE OF EWA-EDL MODEL
The standard deviation parameter σ of the prior emotion distribution controls the dispersion of the distribution generated from the ground-truth emotion labels and is an important parameter affecting the performance of the EWA-EDL model. When σ is larger, the emotion distribution is more dispersed and the Gaussian curve is flatter; when σ is smaller, the curve is narrower and taller. To analyze the effect of σ on the performance of the EWA-EDL model, we vary σ from 0 to 1 in steps of 0.1 and record the corresponding Accuracy and Cosine metrics. The results on the SemEval dataset are shown in Figure 5.
As shown in Figure 5, on the emotion prediction task of the SemEval dataset, the Accuracy and Cosine metrics of the EWA-EDL model both reach their highest values at σ = 0.8. When σ takes values between 0 and 0.7, Accuracy and Cosine show a gradually increasing trend, indicating that when expanding individual emotion labels into emotion distributions, appropriately increasing the dispersion of the emotion distribution benefits the performance of the EDL model. At σ = 0.8, Cosine and Accuracy reach their maxima, indicating that the dispersion of the generated emotion distribution is optimal at this point. When σ increases beyond 0.8 toward 1, Cosine and Accuracy decrease significantly, indicating that the score of the true emotion in the emotion distribution becomes too low and the dispersion of the generated distribution too large. These results indicate that, while maintaining the dominance of the true emotion, appropriately increasing the dispersion of the prior emotion distribution helps to improve the performance of the EDL model.

D. PERFORMANCE COMPARISON OF MULTIPLE EDL MODELS FOR EMOTION DISTRIBUTION PREDICTION AND EMOTION CLASSIFICATION
To evaluate the performance of the EWA-EDL model proposed in this paper for emotion distribution prediction and emotion classification on emotion distribution datasets, we compare EWA-EDL with commonly used EDL and LDL methods, including SA-CPNN, AA-KNN, SA-LDSVR, SA-IIS, SA-BFGS, AA-BP [5], TextCNN [31] and MT-CNN [22]. AA-KNN and AA-BP are extended versions of the classical KNN algorithm and BP (Back Propagation) neural network for LDL tasks [5]. SA-LDSVR, SA-IIS, SA-BFGS and SA-CPNN are algorithms specifically designed for LDL tasks [5]. TextCNN [31] is a convolutional neural network model for text emotion classification. MT-CNN [22] is a multi-task convolutional neural network model for textual EDL proposed by Zhang et al. The specific experimental results of the eight emotion distribution learning methods for emotion distribution prediction and emotion classification on the SemEval dataset are shown in Table 3 (the best results for each metric are bolded). As shown in Table 3, the EWA-EDL model proposed in this paper performs better overall than the other EDL models on the SemEval dataset. The EWA-EDL model outperforms the six models SA-CPNN, AA-KNN, SA-LDSVR, SA-IIS, SA-BFGS, and AA-BP on all 10 metrics of the emotion distribution prediction and emotion classification tasks. EWA-EDL ranks first on average across the metrics, with MT-CNN and TextCNN second and third, indicating that deep learning methods outperform traditional methods on the EDL task. On the emotion distribution prediction task, the EWA-EDL model achieves the best results on all metrics except KL Divergence, on which it is slightly lower than the MT-CNN model. For example, on the Cosine metric, EWA-EDL is 0.9% higher than the next best MT-CNN model. The EWA-EDL model also shows excellent performance on the emotion classification task.
For example, on the Accuracy metric, the EWA-EDL model outperforms the next best MT-CNN model by 1.68%. On the other three classification metrics (all except Precision), the EWA-EDL model achieves the best score. These experimental results show that the EWA-EDL model predicts emotions better than the compared EDL models. Plutchik's wheel of emotions can better describe the correlation between basic emotions, and introducing psychological prior knowledge based on the emotion wheel through the attention mechanism significantly improves the performance of the EDL model.
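The evaluation metrics referenced above can be computed directly from predicted and ground-truth distributions. The sketch below uses standard textbook definitions of Cosine similarity, KL divergence, and top-1 classification accuracy; the smoothing constant and function names are our own assumptions, not the paper's implementation:

```python
import numpy as np

def cosine_score(p: np.ndarray, q: np.ndarray) -> float:
    """Cosine similarity between predicted and true distributions
    (higher is better)."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q), with additive smoothing to avoid log(0)
    (lower is better)."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def top1_accuracy(true_dists: np.ndarray, pred_dists: np.ndarray) -> float:
    """Emotion classification accuracy: the predicted top emotion
    matches the emotion with the highest ground-truth score."""
    true_top = np.argmax(true_dists, axis=1)
    pred_top = np.argmax(pred_dists, axis=1)
    return float(np.mean(true_top == pred_top))
```

Note that Cosine and Accuracy are "larger is better" while KL divergence is "smaller is better", which matches the ↑/↓ convention used in Table 3.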

E. EMOTION CLASSIFICATION PERFORMANCE COMPARISON OF DEEP NETWORK-BASED EMOTION DISTRIBUTION LEARNING MODELS
To evaluate the performance of the EWA-EDL model on traditional single-label emotion datasets, we conducted experiments comparing emotion classification performance on four English single-label datasets (CBET, ISEAR, TEC and Fairy Tales) and three Chinese single-label datasets (NLP&CC 2013, NLP&CC 2014 and WEC). On the single-label datasets, we used the LLE (Lexicon-based emotion distribution Label Enhancement) method [28] to enhance the example emotion labels into emotion distributions. The LLE method introduces lexicon information from the text, in addition to the true emotion labels of the sentences, to generate emotion distributions, and has been shown to perform well [28]. We compare the EWA-EDL model with the deep neural network-based TextCNN [31] and MT-CNN [22] models; the specific experimental results are shown in Table 4 (the best results for each metric are marked in bold).
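A minimal sketch of lexicon-based label enhancement in the spirit of LLE is given below, assuming a toy lexicon and a blending weight `alpha`. Both are illustrative assumptions; the actual LLE method [28] differs in detail:

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]
# Toy emotion lexicon mapping words to emotions (purely illustrative).
LEXICON = {"happy": "joy", "cry": "sadness", "furious": "anger", "scared": "fear"}

def lexicon_label_enhancement(tokens, true_emotion, alpha=0.7):
    """Illustrative label enhancement: blend the one-hot true label
    with normalized lexicon hit counts; alpha weights the true label
    so it keeps the highest score in the resulting distribution."""
    one_hot = np.zeros(len(EMOTIONS))
    one_hot[EMOTIONS.index(true_emotion)] = 1.0
    counts = np.zeros(len(EMOTIONS))
    for tok in tokens:
        if tok in LEXICON:
            counts[EMOTIONS.index(LEXICON[tok])] += 1
    # Fall back to the one-hot label when no lexicon word is present.
    lex = counts / counts.sum() if counts.sum() > 0 else one_hot
    dist = alpha * one_hot + (1 - alpha) * lex
    return dist / dist.sum()
```

The enhanced distribution keeps the true emotion dominant while redistributing some mass to emotions evidenced by lexicon words, which is what allows single-label datasets to be used for EDL training.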
As shown by the results in Table 4, the EWA-EDL model proposed in this paper exhibits overall better performance than the TextCNN and MT-CNN models on the seven single-label datasets. Taking the F1 score and accuracy metrics as examples, the EWA-EDL model achieves the best scores on all seven datasets. Specifically, on the CBET, ISEAR, TEC, Fairy Tales, NLP&CC 2013, NLP&CC 2014, and WEC datasets, the F1 score of the EWA-EDL model outperforms the MT-CNN model by 1.25%, 1.12%, 1.07%, 3.62%, 0.49%, 0.31%, and 0.17%, and the TextCNN model by 1.49%, 2.82%, 3.57%, 4.87%, 2.74%, 6.58%, and 1.61%, respectively. The EWA-EDL model outperforms the MT-CNN and TextCNN models by 1.04% and 3.27%, respectively, in average accuracy across all seven datasets. These experimental results illustrate that introducing psychological prior knowledge into a deep network-based EDL model helps to improve performance on the emotion classification task. The EWA-EDL model proposed in this paper effectively utilizes psychological knowledge and achieves better emotion classification performance than the TextCNN and MT-CNN models.

TABLE 3. Experimental results comparing the performance of eight emotion distribution learning methods for task 1 (emotion distribution prediction) and task 2 (emotion classification) on the SemEval dataset (↑ indicates that larger values are better, ↓ indicates that smaller values are better).

TABLE 4. Experimental results comparing the performance of three deep network-based emotion distribution learning models for emotion classification on seven single-label datasets.
In addition, consistent with the experimental results of Zhang et al. [22], the MT-CNN model outperforms the TextCNN model on all metrics across the seven datasets. This result suggests that a multi-task neural network trained with a combination of cross-entropy loss and KL loss is better suited to the emotion classification task than a traditional convolutional neural network.
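The combined training objective described above can be sketched as a weighted sum of a KL term on the emotion distribution and a cross-entropy term on the single true label. The balancing weight `lam` and function names are assumptions for illustration, not the exact formulation of [22]:

```python
import numpy as np

def multitask_loss(pred_dist, true_dist, true_label, lam=0.5, eps=1e-12):
    """Sketch of a multi-task objective: KL divergence to the target
    emotion distribution (distribution-prediction task) plus
    cross-entropy on the single true label (classification task),
    balanced by an assumed weight lam."""
    p = pred_dist + eps
    t = true_dist + eps
    kl = np.sum(t * np.log(t / p))             # distribution-prediction loss
    ce = -np.log(pred_dist[true_label] + eps)  # classification loss
    return float(lam * kl + (1 - lam) * ce)
```

Training on both terms jointly lets the network exploit the full emotion distribution while still being optimized for the dominant-emotion prediction that classification accuracy measures.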

V. CONCLUSION
In this paper, we propose an Emotion Wheel Attention based Emotion Distribution Learning (EWA-EDL) model that uses the attention mechanism to effectively exploit psychological prior knowledge. The EWA-EDL model computes correlations among basic emotions based on Plutchik's wheel of emotions, and incorporates this psychological prior knowledge into a multi-task convolutional neural network via the attention mechanism. Comparative experimental results on eight Chinese and English text emotion datasets show that the EWA-EDL model outperforms existing EDL methods on both emotion distribution prediction and emotion classification tasks.
In future work, we will consider more effective ways of exploiting prior knowledge to refine the attention mechanism and further improve the performance of the emotion distribution learning model.