Affective Knowledge Augmented Interactive Graph Convolutional Network for Chinese-Oriented Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) aims to identify the sentiment polarity of specific aspects in sentences, which can more accurately mine users' sentiment toward different aspects. Most existing works derive the sentiment features of specific aspects by interactively learning the dependencies between different aspects of the context. However, these works neglect to use external affective commonsense knowledge to augment the ability of Graph Convolutional Networks (GCNs) to interactively capture sentiment dependencies of inter-aspect words in different contexts. In addition, compared to ABSA research in English, existing research pays less attention to Chinese-oriented research. In this paper, we propose a novel knowledge-aware model in which affective knowledge augments an interactive GCN for Chinese-oriented ABSA, namely AKM-IGCN. Meanwhile, multi-head self-attention (MHSA) is applied to extract richer contextual syntactic and semantic interaction features. Moreover, this model can effectively analyze both Chinese and English comments. Hence, we conducted experiments on four Chinese datasets (Camera, Phone, Notebook and Car) and six English benchmark datasets (Restaurant14, Restaurant15, Restaurant16, Twitter, MAMS, Tshirt). Experimental results illustrate that our proposed model outperforms or approaches state-of-the-art models.


I. INTRODUCTION
With the rapid development of social media and e-commerce platforms, texts with sentiment tendencies have increased dramatically. People are eager to dig out more detailed information from these texts, so coarse-grained sentiment classification is gradually unable to meet their needs. Therefore, aspect-based sentiment analysis (ABSA) has been well applied in e-commerce and other fields, and has received extensive attention from academia and industry.
The associate editor coordinating the review of this manuscript and approving it for publication was Jolanta Mizera-Pietraszko .
Aspect-based sentiment analysis, namely the fine-grained sentiment classification task, aims to predict the sentiment polarity of a specific aspect of a sentence, such as positive, neutral, or negative. For example, given the sentence ''The price is reasonable, but the service is terrible.'', as shown in Figure 1(a), the sentiment polarities of the two aspects ''price'' and ''service'' are positive and negative respectively, i.e., opposite. Therefore, compared to sentence-level sentiment analysis, ABSA can accurately identify users' attitudes toward specific aspects. In this work, we investigate the task of judging sentiment polarity with respect to specific aspects. Recently, research on ABSA has mainly been based on deep neural models, because they can capture the semantic features of given aspects [1], [2], [3], [4]. Specifically, existing deep-neural-network research on ABSA can be divided into two categories: context-based methods [5], [6] and syntax-based methods [7], [8], [9]. Context-based methods usually combine convolutional or recurrent neural networks with attention mechanisms to capture aspect-word representations in a sentence. Syntax-based models usually use graph convolutional networks to model sentences. In the example in Figure 1(b), we use spaCy to obtain the dependency relations of the sentence, where ''nsubj'' means nominal subject, ''det'' means determiner, and ''root'' means root node. Below the example is the part of speech of each word, where ''JJ'' means adjective and ''VBD'' means past-tense verb. In Figure 2, we also give some examples from the Chinese and English datasets, and list their aspects and polarities respectively.
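Such dependency relations can be represented as (head, relation, dependent) triples. The sketch below hard-codes one plausible parse of the first clause rather than invoking spaCy, so it runs without a parser model:

```python
# Dependency parse of "The price is reasonable", hard-coded as
# (head, relation, dependent) triples for illustration; an actual
# spaCy parse may attach the words differently.
triples = [
    ("price", "det", "The"),
    ("reasonable", "nsubj", "price"),   # "price" is the nominal subject
    ("ROOT", "root", "reasonable"),     # "reasonable" is the root node
]

def neighbors(word, triples):
    """Words directly connected to `word` in the (undirected) parse."""
    out = set()
    for head, _, dep in triples:
        if head == word:
            out.add(dep)
        if dep == word:
            out.add(head)
    out.discard("ROOT")
    return out

# The aspect "price" is one hop away from the opinion word "reasonable",
# which is exactly the kind of link a syntax-based model exploits.
print(sorted(neighbors("price", triples)))  # ['The', 'reasonable']
```

Syntax-based models build their graphs from exactly such triples, so a short syntactic path between an aspect and an opinion word becomes a short path in the GCN's graph.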
Furthermore, motivated by the role of affective commonsense knowledge in natural language understanding (NLU) tasks [10], [11], researchers have used external knowledge to enhance semantic features in ABSA models [12], [13], [14]. However, most graph-based neural network models only consider the syntactic dependencies of sentences and tend to ignore commonsense knowledge when constructing the graph. To address the fact that the aforementioned models ignore the rich semantic information in external knowledge, and to enhance the extraction of global semantic information and of interactive information between aspect words, we introduce external affective commonsense knowledge and MHSA into the ABSA task. Hence, we propose an affective knowledge augmented interactive GCN (AKM-IGCN), which takes full advantage of pre-trained BERT [15] and leverages affective knowledge from SenticNet to enhance sentence dependency graphs. Experimental results show that our method outperforms or approaches state-of-the-art baseline methods on both Chinese and English datasets.
The main contributions of this paper are as follows:
1. For the first time, we adopt external knowledge based on SenticNet in the Chinese-oriented ABSA task, which provides a new idea for research on Chinese.
2. We propose a novel model in which an affective knowledge augmented dependency tree improves the interactive GCN's ability to capture inter-aspect affective dependencies. The proposed model can handle both Chinese and English reviews.
3. We incorporate MHSA to further model multiple aspects and obtain richer hidden feature information between context and aspects. Furthermore, Layer Normalization is introduced for interactively learning syntactic and global semantic features and for improving the training speed of the model.
4. Experimental results demonstrate the effectiveness of our proposed model on four Chinese and six English review datasets, achieving state-of-the-art performance on some benchmark datasets.

II. RELATED WORK

A. ASPECT-BASED SENTIMENT ANALYSIS
Aspect-based sentiment analysis (ABSA) is the task of identifying the sentiment polarity expressed by a sentence on different aspects. With the rapid development of deep learning, many neural networks have been applied to aspect-based sentiment classification. We divide previous research methods into seven categories: traditional machine learning models, recurrent neural networks (RNNs), long short-term memory networks (LSTMs), convolutional neural networks (CNNs), memory networks (MNs), attention mechanisms, and graph convolutional networks (GCNs).

1) TRADITIONAL MACHINE LEARNING MODELS
In the ABSA task, traditional machine learning-based models need to build a sentiment dictionary and manually extract features, which requires considerable labor and time. Kiritchenko et al. [16] used a sentiment dictionary and feature engineering to extract sentiment information, and achieved text sentiment classification through a Support Vector Machine (SVM). Li et al. [17] used chi-square statistics to extract features of feature words, assigned each feature word a weight, and finally used the SVM method to achieve text classification, which effectively improved classification performance.

2) RNN BASED MODELS
Luo et al. [18] proposed the Dual Cross-shared RNN framework (DOER) to solve aspect-level entity extraction and sentiment classification at the same time. The model uses two RNNs to handle the two tasks respectively, and a cross-shared unit to enhance the hidden-layer representations of the two parts.

3) LSTM BASED MODELS
Tang et al. [19] exploited two LSTM networks to capture bidirectional affective features in sentences in order to detect key information. Ma et al. [20] used two LSTMs to model context and aspect words respectively, and introduced an interactive attention network to learn interaction features between context and aspect words.

4) CNN BASED MODELS
Li et al. [21] proposed a CNN-based method, the T-NET model, which retains the information of the original context from the LSTM layer, making up for the defect that CNNs cannot handle multi-target sentiment well. Huang et al. [22] combined a CNN with a gating mechanism, incorporating aspect information into the CNN through parameterization. Xue et al. [23] used a CNN and a gating mechanism to extract affective features related to the aspect words; both works validated that CNNs can effectively extract affective features.

5) MN BASED MODELS
Tang et al. [24] designed a memory network and achieved good results on SemEval-2014 at the time. The RAM model proposed by Chen et al. [25] improved on the memory network model. Zhu et al. [26] proposed an aspect-level sentiment classification method with an auxiliary memory network to learn aspect words and sentiment words, further optimizing the aspect-level sentiment classification model. Chen et al. [25] also captured long-range sentiment information by employing multiple attention-based memory networks.

6) ATTENTION BASED MODELS
Ma et al. [20] proposed an interactive attention network (IAN) that extracts sentiment features from contextual and aspect terms, respectively. Fan et al. [27] proposed a fine-grained attention mechanism to extract the interactive information between context and aspect words and achieved good results. Huang et al. [28] extracted context- and aspect-dependent representations through LSTM layers and modeled their interaction through an attention-over-attention (AOA) module.

7) GCN BASED MODELS
Meanwhile, graph convolutional networks(GCNs) have been successfully applied to many NLP tasks.
Zhang et al. [7] proposed an aspect-specific graph convolutional network (ASGCN), exploiting syntactic dependencies in aspect-level sentiment analysis for the first time. In ASGCN, a bidirectional long short-term memory (Bi-LSTM) network is used to extract context information, and a mask mechanism is used to extract aspect-word information. Sun et al. [29] proposed an ABSA method based on dependency tree convolution. Zhang et al. [30] proposed a bidirectional GCN model, which builds a hierarchical syntactic graph and a hierarchical lexical graph for each sentence from syntactic dependencies and word co-occurrence information, and then fuses the two through a bidirectional GCN. Tang et al. [31] adopted a dependency-graph-enhanced double-transformer structure for aspect sentiment classification. To focus on key aspects in sentences, Wang et al. [32] reshaped and pruned ordinary dependency trees by exploiting relational graph attention networks.
Although the above GCN-based models achieve good performance on ABSA, they ignore the sentiment information between different aspects in the context. Instead, we incorporate external affective knowledge to augment the interactive sentiment relations between aspects extracted by GCNs on the dependency tree, and obtain rich semantic information between context and aspects via MHSA.

B. AFFECTIVE COMMONSENSE KNOWLEDGE
External knowledge plays an important role in many NLP tasks. Similarly, external affective commonsense knowledge is often introduced to augment sentiment feature representations in the ABSA task, because external commonsense knowledge helps models understand natural language. Incorporating commonsense knowledge into deep learning models has become a popular topic in many fields of research [33], [34]. However, existing methods lack the ability to exploit external knowledge to improve sentiment analysis in the ABSA task. Therefore, several studies in recent years have attempted to incorporate external affective knowledge to alleviate this problem [35], [36], [37]. SenticNet is a publicly available sentiment analysis resource that provides a sentiment value for each concept [38], [39], [40], [41], [42], [43]. In SenticNet, the values of positive sentiment words are close to 1, while those of negative sentiment words are close to -1. SenticNet is comparatively powerful and performs well in learning sentiment features [44]. Xing et al. [45] demonstrated that SenticNet performs better than other sentiment dictionaries. Ma et al. [44] combined external commonsense knowledge from SenticNet with LSTM models to capture affective features. Therefore, we use the external knowledge base SenticNet to augment the dependency graph representation of sentences and thereby enhance the GCN model's ability to capture sentiment.

III. PROPOSED MODEL

Figure 3 illustrates the overall architecture of our proposed AKM-IGCN model. It consists of four parts, namely the embedding layer, the GCN layer, the information interaction layer, and the output layer. We describe these parts in detail in turn.

A. TASK DEFINITION
Suppose a sentence contains n words and two aspects, i.e. s = {w_1, w_2, ..., a_{11}, a_{12}, ..., a_{1p}, ..., a_{21}, a_{22}, ..., a_{2q}, ..., w_n}, where w_i denotes the i-th word of the context and a_{ij} denotes the j-th word of the i-th aspect. Each instance contains the sentence, the aspect word and the corresponding sentiment polarity (Positive, Neutral, Negative), and each aspect consists of one or more words. The goal of ABSA is to identify the sentiment polarity of a given aspect by capturing aspect-relevant sentiment features from the context.
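Concretely, each (sentence, aspect) pair forms one classification instance, so one sentence with two aspects yields two instances with possibly opposite labels. A minimal sketch of this data layout, with illustrative field names rather than the datasets' actual format:

```python
# One ABSA instance per (sentence, aspect) pair; field names are
# illustrative, not the benchmark datasets' actual schema.
instances = [
    {"sentence": "The price is reasonable, but the service is terrible.",
     "aspect": "price", "polarity": "positive"},
    {"sentence": "The price is reasonable, but the service is terrible.",
     "aspect": "service", "polarity": "negative"},
]

for ex in instances:
    # Locate the aspect span in the context; models use this position
    # to focus attention or weight the dependency graph.
    start = ex["sentence"].index(ex["aspect"])
    print(ex["aspect"], "->", ex["polarity"], "at char", start)
```

Note that the same sentence appears twice with different gold labels, which is precisely why sentence-level classifiers cannot solve this task.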

B. CONSTRUCTING ASPECT GRAPHS AND AFFECTIVE KNOWLEDGE GRAPH 1) GENERATING ASPECT-KEY GRAPHS
In order to exploit the word-dependency representation of sentences, inspired by previous GCN-based works [18], we first construct the dependency graph of each input sentence from its dependency tree, and derive the adjacency matrix D ∈ R^{n×n} of the sentence graph, where D_{ij} = 1 if the i-th and j-th words are syntactically connected and D_{ij} = 0 otherwise. To highlight specific aspects in the context and extract features from key aspects, we build an augmented dependency graph in terms of specific aspects in Figure 4, and refine the graph by weighting each element of the adjacency matrix by its relative position, where p is the starting position of the aspect word, a^s_i represents the set of all aspect words, and |·| denotes the absolute value of an element. Therefore, we can extract relative dependencies between aspects and contexts. We integrate the weights of key aspects with the common dependency graph to enhance the syntactic dependencies of context words and derive a syntactic dependency matrix G of key aspects. To exploit the connections between multiple aspects in a sentence, we obtain a more refined graph of key aspects by incorporating information from other aspects into the adjacency matrix of key aspects, where a represents the aspect word, p_o indicates the starting position of the other aspect words, and a^o_i denotes the set of other aspects of length l. We construct an undirected adjacency matrix to enhance sentence dependencies, since some aspects need the affective relationships between other aspects to assist sentiment polarity classification.
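The graph construction above, together with the SenticNet-based edge weighting described later in this subsection, can be sketched as follows. This is a simplified stand-in for the paper's actual weighting equations, and the sentiment score is an illustrative value, not a real SenticNet 6 entry:

```python
# Sketch of an affective-knowledge-augmented adjacency matrix.
# The weighting scheme (1 for a dependency edge, plus the absolute
# affective values of the two connected words) is a simplification
# of the paper's equations, for illustration only.
words = ["The", "price", "is", "reasonable"]
edges = [(0, 1), (1, 3), (2, 3)]           # undirected dependency edges
senticnet = {"reasonable": 0.8}             # hypothetical score in [-1, 1]

n = len(words)
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 1.0                           # self-loops
for i, j in edges:
    # Base dependency edge, strengthened by the affective values of
    # the connected words (0.0 for words absent from the lexicon).
    w = 1.0 + abs(senticnet.get(words[i], 0.0)) \
            + abs(senticnet.get(words[j], 0.0))
    A[i][j] = A[j][i] = w                   # keep the matrix symmetric

print(A[1][3])  # edge price--reasonable strengthened to 1.8
```

Edges touching sentiment-bearing words thus carry larger weights, so the subsequent graph convolution propagates opinion information toward the aspect more strongly than along neutral edges.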
Therefore, we construct an adjacency matrix of inter-aspect words, as shown in Figure 4. Similarly, we also construct an undirected interactive aspect graph A^Inter of each sentence to extract the interactive dependencies between multiple aspects. To exploit the sentiment features between context and aspects, we combine the sentiment score of each word in SenticNet to enhance the matrix representation, where SenticNet(w_i) ∈ [-1, 1] is the sentiment score of word w_i: the stronger a positive word, the closer its affective value is to 1, while the stronger a negative word, the closer its affective value is to -1. We extracted a total of 39,891 words and their sentiment scores from SenticNet 6. On the basis of ordinary dependency trees, we merge SenticNet to obtain affective knowledge graphs with enhanced dependencies, as shown in Figure 4. Some words and their corresponding sentiment scores are listed in Table 1. We can then obtain the adjacency matrix augmented with affective knowledge.

C. EMBEDDING LAYERS

Pre-trained BERT models have received wide attention from researchers in many NLP tasks. The BERT model performs better for sentences with high complexity as well as for sentences with ambiguous words. Therefore, we chose the pre-trained model BERT [15], which extracts information in both directions simultaneously and can mine semantically richer information in sentences.

D. AFFECTIVE KNOWLEDGE AUGMENTED INTERACTIVE GCN LAYERS
Based on the sentiment adjacency matrix derived from the dependency tree and SenticNet, we feed the sentiment adjacency matrix into the GCN layer to learn aspect-specific sentiment dependencies, where A^key_i is the aspect-key normalized symmetric adjacency matrix, D^key_i is the degree of A^key_i, and g^{l-1}_i is the hidden representation of the previous GCN layer. Figure 5 shows an example of a single-layer GCN. We fuse the final representations of the aspect-key, inter-aspect and affective-knowledge graphs and feed them into the GCN model. Finally, we obtain rich syntactic dependency features from these GCN layers.
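A single degree-normalized GCN layer of the kind described here can be sketched as follows. The weight matrix and features are toy values (the real model learns them), and the bias term is omitted for brevity:

```python
# One graph-convolution layer: aggregate neighbor features through the
# adjacency matrix, apply a linear transform, normalize by node degree,
# then apply ReLU. A sketch of the standard GCN update, not the paper's
# exact equation.
def gcn_layer(A, H, W):
    n, d_in, d_out = len(A), len(H[0]), len(W[0])
    out = []
    for i in range(n):
        deg = sum(A[i]) or 1.0                  # degree of node i
        agg = [0.0] * d_in
        for j in range(n):                      # neighborhood aggregation
            for k in range(d_in):
                agg[k] += A[i][j] * H[j][k]
        # Linear transform, degree normalization, ReLU activation.
        row = [max(0.0, sum(agg[k] * W[k][o] for k in range(d_in)) / deg)
               for o in range(d_out)]
        out.append(row)
    return out

A = [[1, 1], [1, 1]]                            # two connected nodes
H = [[1.0, 0.0], [0.0, 1.0]]                    # toy node features
W = [[1.0], [1.0]]                              # d_in=2 -> d_out=1
print(gcn_layer(A, H, W))  # [[1.0], [1.0]]
```

Stacking two such layers, as the paper does, lets each aspect word aggregate information from syntactic neighbors up to two hops away.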

E. INFORMATION INTERACTIVE LEARNING LAYER
This layer is exploited to extract richer contextual semantic and sentiment information, and to interactively learn syntactic and global semantic information. Figure 6 illustrates the structure of MHSA. After acquiring the representation h produced by the interactive GCN, we introduce MHSA to capture richer semantic information, obtaining h_s ∈ R^{d_h×n}, where d_h refers to the dimension of MHSA. Meanwhile, Layer Normalization (LN) is applied for interactively learning syntactic features and global semantic features.
where a^t_i represents the fusion of the semantic features extracted by BERT and the syntactic features extracted by the GCN, H is the number of GCN layers, and σ is a nonlinear activation function applied after normalizing each output.
We normalize all neuron nodes of a single sample at each layer, which yields better overall feature quality, faster model training and better classification.
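Layer Normalization itself is straightforward: each sample's feature vector is rescaled to zero mean and unit variance over its own dimensions. A minimal sketch (the learnable gain and bias of standard LN are omitted):

```python
import math

# Layer Normalization over one sample's feature vector: subtract the
# mean and divide by the standard deviation of that vector's own
# dimensions. The learnable gain/bias parameters are omitted here.
def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

h = [2.0, 4.0, 6.0, 8.0]
print(layer_norm(h))  # zero-mean, unit-variance version of h
```

Because the statistics come from the single sample rather than the batch, LN behaves identically at training and inference time, which is one reason it stabilizes and speeds up training here.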

F. OUTPUT LAYER
The final comprehensive representation o is fed into a fully connected layer, and the softmax function is used to obtain the probability distribution over sentiment classes.

G. TRAINING

The aspect word representation is fed into the softmax function, and its probability for each sentiment category is calculated. The model parameters are then adjusted by minimizing the cross-entropy loss L = -Σ_{i=1}^{N} Σ_{j=1}^{C} y_i^j log ŷ_i^j, where i is the index of the data sample, j is a sentiment category, N is the total number of samples, C is the number of sentiment categories, y_i^j is the real sentiment category, and ŷ_i^j is the predicted sentiment category.
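The softmax output layer and the cross-entropy loss can be sketched for a single sample as follows (the logits and label are toy values):

```python
import math

# Softmax over the three polarity classes, numerically stabilized by
# subtracting the maximum logit before exponentiation.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Cross-entropy between a one-hot gold label and the predicted
# distribution; for one sample this is -log of the gold class's
# probability.
def cross_entropy(y_true, y_pred):
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

probs = softmax([2.0, 0.5, -1.0])        # toy logits over (Pos, Neu, Neg)
loss = cross_entropy([1, 0, 0], probs)   # true label: Positive
print(round(loss, 4))
```

Training minimizes this loss summed over all N samples, pushing the probability of each sample's gold polarity toward 1.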

IV. EXPERIMENTS

A. DATASETS AND EXPERIMENTAL SETTINGS
To demonstrate the effectiveness of our proposed model, we tested the performance of the AKM-IGCN model on four Chinese datasets (Car, Phone, Notebook, Camera) [47], [48], [49]. The sentiment polarity in the Chinese datasets is classified as positive or negative.
Inspired by [50], the statistical distribution of the Chinese datasets and six English benchmark datasets is shown in Table 2. Since the Restaurant14, Restaurant15, Restaurant16, MAMS and Twitter datasets are the most commonly used ABSA benchmarks and have abundant sources, we chose them to test the universality of our model. The Restaurant14 dataset is from SemEval-2014 Task 4 [51], Restaurant15 from SemEval-2015 Task 12 [52], and Restaurant16 from SemEval-2016 Task 5 [53]; the MAMS dataset contains multiple aspects per sentence [54], and the Twitter dataset is a collection of tweets [55]. We also conducted experiments on the low-resource English dataset Tshirt [56] to make the tests more universal. The sentiment polarity in the English datasets is divided into positive, neutral and negative.
In our implementation, we carefully validate the effectiveness of our AKM-IGCN on nine ABSA datasets. In particular, we employ pretrained model BERT to initialize the word embeddings. To avoid overfitting, we apply dropout on the word embeddings with a drop rate. The specific hyperparameters are shown in Table 3.

B. EVALUATION METRICS
To evaluate the performance of the model, we use Accuracy (Acc.) and Macro-F1 (MF1.) as the evaluation metrics. Accuracy is the proportion of correctly predicted samples among all samples.

Acc = (TP + TN) / (TP + TN + FP + FN)   (16)

MF1 is the F1 score averaged across all sentiment categories, and the F1 score is the harmonic mean of Recall and Precision. Here the value of C is 3, because there are three categories altogether (Positive, Neutral and Negative).
Precision is the proportion of truly correct samples among the samples predicted as correct, and Recall is the proportion of correctly predicted samples among all actually correct samples.
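These metrics can be computed directly from per-class counts; the predictions below are toy values for illustration:

```python
# Accuracy (Eq. 16) and Macro-F1 over C = 3 polarity classes,
# computed from true-positive / false-positive / false-negative counts.
def metrics(y_true, y_pred, classes):
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    # Macro-F1 averages the per-class F1 scores with equal weight.
    return acc, sum(f1s) / len(classes)

y_true = ["pos", "pos", "neu", "neg", "neg", "neg"]
y_pred = ["pos", "neu", "neu", "neg", "neg", "pos"]
acc, mf1 = metrics(y_true, y_pred, ["pos", "neu", "neg"])
print(acc, round(mf1, 4))
```

Because Macro-F1 weights every class equally regardless of its frequency, it is more informative than accuracy on the class-imbalanced benchmark datasets.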

C. COMPARED METHODS
To comprehensively evaluate the performance of our model on Chinese and English datasets, we introduce recent models for the ABSA task for comparison, including some state-ofthe-art models. These models include context-based models, syntax-based models and external knowledge-based models.
We compare the proposed model (AKM-IGCN) with the following methods: • TD-LSTM [19] models the context before and after aspect words separately, and uses two LSTM layers, forward and backward, for fine-grained sentiment classification.
• MGAN [57] proposes a coarse-grained and fine-grained classification task and introduces an attention mechanism to learn the interactions between aspect words and context words.
• RAM [25] leverages a bidirectional LSTM and designs a multi-attention mechanism for sentence representation.
• ATAE-LSTM [3] utilizes LSTM and attention mechanism to interact information by splicing aspect words and context, and explore the connection between aspect and context.
• IAN [4] exploits LSTM to encode context and aspect words separately, and then uses an interactive attention network to extract categorical features in context and aspect words, respectively.
• MemNet [24] applies multiple attention layers to obtain the importance of context words, where the output of each upper layer allows the next layer to obtain more accurate information.
• AOA [28] designs an interactive attention mechanism to jointly model context and aspect words.
• ASGCN [7] proposes to learn specific aspects of feature representation through GCN and dependency trees to solve the problem of long-distance multi-word dependencies for the first time.
• InterGCN [58] refines the graph by considering syntactic dependencies between context words and aspectspecific words, resulting in inter-aspect graphs.
• DGEDT [31] utilizes the planar representation and graph-based representation learned by Transformer to enhance the capability of GCN on ABSA task.
• RGAT [32] is applied for sentiment classification by efficiently encoding syntactic information and pruning a common dependency parse tree.
• TD-GAT [59] proposes a new ABSA method based on target-dependent graph attention network (TD-GAT), which explicitly exploits the dependencies between words.
• BiGCN [30] uses a two-layer graph convolutional neural network to fuse a hierarchical syntactic graph and a hierarchical vocabulary graph, making full use of the syntactic information between words, and then combining the word co-occurrence information to classify sentiment.
• Sentic-LSTM [42] integrates external affective commonsense knowledge into an LSTM to enhance aspect-based sentiment classification.
• MTKFN [35] utilizes multiple sources of knowledge to improve the ability of the model in ABSA. (Table note: some baseline results are retrieved from [50] and [68], and others are reproduced from open-source code; the optimal performances are in bold, and the second-best results are underlined.)
• SK-GCN [12] models syntactic dependency trees and commonsense knowledge graphs through syntax and knowledge.
• BERT-BASE [15] is the vanilla pre-trained BERT model fine-tuned for sentiment classification.
• AEN-BERT [60] designs an attentional encoder network to model the relationship between context and specific aspects.
• MAN-BERT [61] constructs two attentions with position function for sentiment classification and uses BERT as the encoder.
• ASGCN-BERT [7] uses BERT as the encoder on the basis of ASGCN.
• TGCN-BERT [62] proposes an approach to exploit the ABSA task with type-aware graph convolutional networks. BERT is denoted as the encoder.
• LCF-APC [50] proposes a multilingual and multitask-oriented model, and assists sentiment classification with the ATE subtask.
• AM-word-BERT [66] employs a masked attention mechanism to eliminate input noise by ignoring parts that are less relevant to the aspect.
• depGCN-BERT [63] adopts a self-attention method, which takes the output of the sentence encoding layer as input to obtain the latent graph structure, and further improves the original syntactic information through a graph convolutional network and gating mechanism.
• kumaGCN-BERT [64] adopts the latent graph structure, pays attention to the information of individual text words that have direct grammatical dependencies with aspect words, and filters out the grammatical information of texts that are indirectly dependent on them.
• SK-GCN-BERT [12] leverages syntax and knowledge to jointly model dependency trees and knowledge graphs.
• DGEDT-BERT [31] uses BERT as an encoder based on the DGEDT model.
• SPRN-BERT [65] designs a complex aspect-level sentiment analysis network model to extract contextual and aspect features through multi-channel convolution.
• AFGCN-BERT [58] designs an aspect focus graph to capture key aspect words and context words.
• AKM-IGCN is our proposed model, which augments the dependency tree with affective knowledge to capture aspect-specific affective dependencies.

D. MAIN RESULTS AND ANALYSIS
On the four Chinese and six English review datasets, the experimental results of the comparison models and our proposed AKM-IGCN model are shown in Table 4 and Table 5.
On the Chinese datasets, our model achieves state-of-the-art results on all but the Car dataset. This may be because the aspects in Car contain a large number of proper nouns from the automotive domain, which undoubtedly makes aspect sentiment identification more difficult.
On the English datasets, ATAE-LSTM concatenates the aspect-word encoding with the context encoding, but this simple concatenation fails to effectively exploit the effect of aspect words on sentiment prediction. The IAN model further considers the interaction between aspect words and targets; compared with ATAE-LSTM, its accuracy is improved on the three public datasets. AKM-IGCN achieves good results on all English datasets except REST14, especially on REST15, REST16 and MAMS, where its Macro-F1 is 1.15, 5.70 and 1.39 points higher than the second-best model, respectively. This is because external affective knowledge largely helps the GCN to extract sentiment features between aspects. Although the AKM-IGCN model is better than or close to the compared models on the TWITTER and REST14 datasets, the gap with other models is not obvious. For TWITTER, this may be because the dataset is derived from social networks, with insufficiently normative sentences and irregular grammar. For REST14, some sentences contain only one aspect, and the complexity of the entire dataset is relatively large. (Table note: baseline results marked ‡ are retrieved from [70], others are retrieved from [68] and [69], reported based on open-source code, retrieved from the original papers (†), or reported from [58].)
Our model obtains state-of-the-art performance on the REST16 and Tshirt datasets. Compared with InterGCN, the accuracy and Macro-F1 of our proposed AKM-IGCN are better on all datasets except the Acc of REST14, indicating that AKM-IGCN mines and utilizes the sentiment information ignored by InterGCN. This also verifies that external affective knowledge can effectively enhance the ability of InterGCN to capture the affective dependencies of inter-aspect words based on the syntactic information between words. Besides, the proposed AKM-IGCN also alleviates the consequences of grammar parsing errors, resulting in better results in aspect-based sentiment classification tasks.

E. ABLATION STUDY
To further test the impact of each component of AKM-IGCN on performance, we conduct an ablation study, with each variant described as follows: AKM-IGCN w/o MHSA: The model consists of an interactive GCN layer enhanced with external affective knowledge and a Layer Normalization module. The GCN layer learns the syntactic feature representation of each sentence, and the Layer Normalization module is introduced for interactive learning of syntactic and global semantic features.
AKM-IGCN w/o LN: The model consists of an interactive GCN layer enhanced with external affective knowledge and a multi-head self-attention layer; the MHSA layer extracts richer contextual semantic and sentiment information by establishing semantic relationships between aspects and contexts.
AKM-IGCN w/o AK: The model is composed of an interactive GCN layer and an information interactive learning layer (MHSA+LN), which lacks the utilization of contextual affective knowledge of specific aspects.
AKM-IGCN w/o dependency tree: We find that removing the dependency tree decreases performance, which indicates that the dependency tree-based GCN improves the quality of the dependency representation.
AKM-IGCN w/o MHSA+LN: The model only enhances the interactive GCN layer with affective knowledge, so the contextual affective knowledge of aspects is fully utilized while the information interaction layer is removed.
AKM-IGCN: A complete model that considers interactively learning syntactic features and global semantic features, as well as contextual word-aspect dependencies and sentiment information between opinions and aspects.
The performance comparison of the ablation experiments is shown in Table 6 and Table 7. The Acc and F1 values of the AKM-IGCN w/o MHSA+LN model are generally inferior to those of the other variants on the five datasets, which indicates that the MHSA and LN modules can interactively learn syntactic and semantic features. The performance of AKM-IGCN w/o MHSA is generally better than that of AKM-IGCN w/o LN, but there is still a gap compared with the full AKM-IGCN, which indicates that syntactic dependency information is beneficial to but not sufficient for recognizing sentiment polarity. The performance of AKM-IGCN w/o AK is worse than AKM-IGCN on all Chinese and English datasets except TWITTER, where it is slightly better, probably because the Twitter dataset is more sensitive to semantic and syntactic information. The performance of AKM-IGCN w/o AK on REST14 is close to AKM-IGCN, which indicates that REST14 is less sensitive to sentiment knowledge. Overall, the affective knowledge contributes the most to the AKM-IGCN model, followed by the information interaction layer composed of the MHSA+LN modules, though the influence of the interactive GCN on extracting inter-aspect features cannot be ignored. We enhance the dependency representation of each sentence with SenticNet, which greatly helps the model use sentiment features to judge the sentiment polarity of a given aspect.

F. IMPACT OF GCN LAYERS AND MHSA HEADS
In the experiment, the number of GCN layers is varied from 1 to 8, and the Acc and MF1 scores on all datasets are shown in Figure 7. When the number of GCN layers is 2, the model achieves the best Acc and MF1 scores, which shows that setting the number of GCN layers to 2 in this paper is reasonable. As the number of layers grows, the parameters increase, model training becomes more difficult, and performance gradually decreases.
In addition, we also study the effect of the number of MHSA heads on the final performance of AKM-IGCN when the number of GCN layers is 2. We vary the number of MHSA heads from 1 to 20 and test the Acc and Macro-F1 scores of AKM-IGCN on the Phone and REST14 datasets. The experimental results are shown in Figure 8. The Acc and Macro-F1 scores of AKM-IGCN peak when the number of MHSA heads is 12, and then begin to decline. Considering these factors, this paper sets the number of GCN layers and MHSA heads to 2 and 12, respectively.

G. CASE STUDY
To better understand how AKM-IGCN works, Figure 9 shows the attention visualization of the predictions for sentences and aspect in MHSA, ASGCN-DG, InterGCN and AKM-IGCN models. Among them, the depth of the color indicates the importance of the opinion word with sentiment to the aspect word. The darker the color, the more important the opinion word is to the aspect word.
For example, the first sentence ''The restaurant is amazing but the food is pricey!'' contains two aspects with opposite polarities, which may prevent attention-based models from accurately aligning these aspects with their related descriptors. The second example ''The place is cute but not upscale'' contains double negation, which can easily lead to incorrect model predictions. MHSA failed twice in these examples. With positional weights and syntactic dependencies, both ASGCN-DG and InterGCN correctly handle the different sentiment polarities of multiple aspects, which means that the GCN effectively integrates syntactic dependency information into a rich semantic representation.
Both AKM-IGCN and InterGCN can correctly identify the double negation, but the visualization shows that AKM-IGCN, enhanced with external affective knowledge, attends to aspect words and context words more distinctly than InterGCN. This is because we exploit external affective knowledge to augment the interactive GCN, which captures sentiment dependency features between different aspect words. Moreover, we leverage MHSA to further model multiple aspects and obtain richer hidden feature information. This means that AKM-IGCN effectively learns the interaction features between multiple aspects and extracts richer contextual semantic information.

V. CONCLUSION AND FUTURE WORK
Most existing work ignores the use of external affective knowledge to enhance the ability of GCNs to extract sentiment dependencies among multiple aspects, and there is little research on Chinese-oriented ABSA compared to English ABSA. To address these problems, we propose a novel AKM-IGCN model for aspect-based sentiment analysis. The proposed model integrates external affective knowledge, an attention mechanism and graph convolutional networks, and exploits sentiment-enhancement graphs, aspect-interaction graphs and key-aspect graphs of review texts to effectively identify the sentiment polarity of aspect words. MHSA and LN are used to strengthen the interaction between context and aspects, so that context and aspect representations are further coordinated and optimized. The experimental results demonstrate the effectiveness of our proposed model in aspect-based sentiment analysis.
Future work will apply new affective-knowledge-based graphs to address the problem that newly popular words cannot be found in the knowledge base, and will improve the modeling of syntactic relationships between different types of words in text.