IDCN: A Novel Interactive Dual Channel Network for Aspect Sentiment Triplet Extraction

Aspect sentiment triplet extraction (ASTE) is an important subtask of aspect-based sentiment analysis that aims to detect aspect terms, opinion terms, and the corresponding sentiment polarities simultaneously. Most methods directly employ GCNs to capture the syntactic dependency information in ASTE. However, these methods may lead to error propagation. Besides, GCN-based methods are weak at capturing sequence information and long-distance information. General neural networks such as LSTM are good at capturing this kind of information, but they are weak at modeling syntactic dependency information. To alleviate these problems, we propose a novel interactive dual channel network (IDCN) for ASTE. In IDCN, an interactive word pair generating (IWPG) module is designed to model the sequence information, long-distance dependency information, and correlation relations between word pairs simultaneously. In the IWPG module, the dual channels can learn different representations. Based on these representations, informative word-pair representations are learned by the interaction mechanism of the dual channels. Besides, we design a syntactic dependency fusion module that models the syntactic dependency information by constructing word pair dependency relation tensors and applying a pooling mechanism, which naturally injects the syntactic dependency knowledge into general neural networks and reduces error propagation. Extensive experiments have been performed on multiple datasets. The experimental results show that IDCN achieves state-of-the-art results, which validates its effectiveness.


I. INTRODUCTION
Aspect-based sentiment analysis (ABSA) has become a hot topic of research in natural language processing (NLP) [1], [2]. In the ABSA task, there are three basic tasks: aspect term extraction (ATE), opinion term extraction (OTE), and aspect level sentiment classification (ALSC) [3]. The ATE task aims at detecting aspect terms expressed in a specific sentence, the OTE task aims at detecting opinion terms expressed in a specific sentence, and the ALSC task aims at predicting the sentiment polarity toward a specific aspect term in a sentence [4]. Aspect sentiment triplet extraction (ASTE) is a combination of these three separate tasks. The ASTE task focuses on extracting aspect sentiment triplets that are expressed in a sentence [5]. An aspect sentiment triplet consists of the aspect term, opinion term, and the associated sentiment polarity. The ATE, OTE, ALSC, and ASTE tasks are all illustrated in Fig. 1.
The associate editor coordinating the review of this manuscript and approving it for publication was Agostino Forestiero.
Previous work on the ASTE task falls into three paradigms. The first is the pipeline method, which extracts the aspect term, opinion term, and sentiment polarity of an aspect sentiment triplet independently [5]. Pipeline methods have achieved great success. However, they ignore the close relation between the aspect term, opinion term, and sentiment polarity within a triplet, and errors made in earlier stages strongly affect the performance of later stages. To alleviate these problems, some works treat the ASTE task as a machine reading comprehension task [3], [6]. These machine reading comprehension approaches are the second type, and they train the multiple subtasks jointly. The third type is the end-to-end method [7], [8], [9]. End-to-end methods extract the aspect sentiment triplets by developing different kinds of tagging schemes based on pre-trained language models, such as BERT [10] and BART [11]. Graph convolutional networks (GCNs) have achieved strong performance in many fields, such as recommendation systems, community detection, and aspect-level sentiment classification. Recently, GCNs have also been applied to the ASTE task, where they operate on the graph obtained by syntactic parsing of a sentence [12]. Although these GCN-based methods have achieved excellent results, several challenges remain in the ASTE task.
FIGURE 1. A sentence with its dependency relations is used to illustrate the ATE, OTE, ALSC, and ASTE tasks. The aspect terms and opinion terms are marked in blue and yellow, respectively. The positive sentiment polarity and the negative sentiment polarity are marked in red and green, respectively.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Firstly, these GCN-based methods depend heavily on the quality of the dependency tree of a sentence. Errors from the off-the-shelf syntactic parser can propagate to the GCN-based methods, so the model may learn wrong and noisy information for ASTE. Secondly, GCN operates directly on the dependency graph and ignores the sequence information and long-distance dependency information in a sentence. Thirdly, although dependency relations can help the ASTE task to some extent, it is difficult for general neural networks to model syntactic dependency information. How to effectively integrate the syntactic dependency information with general neural networks, such as Bi-LSTM, remains a challenge.
In this paper, we propose a novel interactive dual channel network (IDCN) to alleviate the aforementioned problems. Firstly, we design an interactive word pair generating (IWPG) module to model the sequence information and long-distance information, as well as to capture the correlation relations between words in IDCN. In the IWPG module, two channels can learn different representations. Based on these representations, informative word-pair representations are obtained by the interaction mechanism of the dual channels. Secondly, in order to model the sequence information and long-distance dependency information, dual Bi-LSTM networks are employed on top of BERT in the IWPG module. Thirdly, we develop a novel way to model the syntactic dependency information and design the syntactic dependency fusion module in the IDCN model. We transform the dependency relations between words into word pair dependency relation tensors. These tensors contain informative syntactic dependency information about each word. The pooling mechanism is used to generate the syntactic dependency-aware word embeddings, which are then fed into the IWPG module. In short, the syntactic dependency information can be learned by utilizing the word pair dependency relation tensors and the pooling mechanism in the IDCN model. The IDCN model does not operate on the dependency tree of a sentence. Hence, it can avoid error propagation from the dependency tree. The main contributions of this research are the following.
• We propose a novel IDCN model for the ASTE task. The IDCN model can simultaneously model the correlation relation between words and capture the sequence information, long-distance dependency information, and syntactic dependency information in a sentence.
• We propose a novel interactive word pair generating (IWPG) module in IDCN. IWPG designs dual channels to interactively learn the informative word-pair representations and capture the semantic relations between word pairs.
• We propose a novel way to utilize the syntax dependency information and design the syntactic dependency fusion module in IDCN, which can naturally inject the informative syntax dependency information into the general neural networks such as Bi-LSTM, and alleviate the error propagation.
• Extensive experiments are conducted on multiple benchmark datasets. The experimental results show that IDCN obtains state-of-the-art results. Further, these results also verify the effectiveness of IDCN.

II. RELATED WORK
Sentiment analysis (SA) can be divided into three granularities: document-level sentiment analysis, sentence-level sentiment analysis, and fine-grained aspect-based sentiment analysis (ABSA) [13]. Document-level sentiment analysis assumes that a whole document contains only a single sentiment polarity. Sentence-level sentiment analysis assumes that a whole sentence contains only a single sentiment polarity, while a document can contain multiple different sentiment polarities. Aspect-based sentiment analysis assumes that a single sentence can contain several different sentiment polarities [14], [15]. In ABSA, there are multiple basic subtasks: aspect term extraction (ATE), opinion term extraction (OTE), and aspect-level sentiment classification (ALSC). Recently, the aspect sentiment triplet extraction (ASTE) task has been developed as a new subtask of ABSA. In essence, the ASTE task combines the three basic subtasks, yet it differs from each of them. The ASTE task aims at extracting aspect terms, opinion terms, and the corresponding sentiment polarities in a sentence simultaneously. In contrast, the ATE task focuses only on extracting the aspect terms in a sentence, the OTE task focuses only on extracting the opinion terms in a sentence, and the ALSC task focuses only on predicting the sentiment polarity toward an entity or a specific aspect of the entity in a sentence. The ASTE task is therefore more complex and harder than the ATE, OTE, and ALSC tasks.
In the ATE task, Hu and Liu [16] were the first to study detecting the aspect terms in a sentence. They developed several association rules to extract aspect terms. To further improve the performance of the rule-based methods, Popescu and Etzioni [17] applied the pointwise mutual information (PMI) in ATE. Tubishat et al. [18] developed new rules and the combination of dependency-based rules and pattern-based rules for explicit aspect term extraction. And they proposed the improved whale optimization algorithm (IWOA) for the selection of different rules. Ozyurt and Akcayol [19] proposed a sentence segment LDA (SS-LDA) to alleviate the data sparsity problem and the lack of co-occurrence patterns in the short sentence.
Zhang et al. [20] proposed a CNN-based model with filters that are dynamically generated by the aspect information. Venugopalan and Gupta [21] proposed an enhanced guided LDA model with BERT to extract aspect terms. The BERT model is employed as the semantic filter to enhance the ability to incorporate semantics. To save training time, Kumar et al. [22] proposed a novel hierarchical self-attention network (HSAN), and they fused two attention mechanisms to improve the performance of the model. Chen and Qian [23] proposed a novel active domain adaptation method for ATE. They developed the syntactic bridge and the semantic bridge to transfer knowledge across different domains. Klein et al. [24] proposed another cross-domain aspect term extraction method by multi-task learning. They utilized the relational features to improve the performance of the model.
In general, the OTE task is often viewed as a co-extraction task with other subtasks of ABSA. Yu et al. [25] proposed a multi-task learning framework for the aspect and opinion terms co-extraction. They developed a global inference method to model the intra-relation and inter-relation between these two tasks. Zhao et al. [26] treated the co-extraction task as the aspect-opinion pair extraction (AOPE) problem. They developed a multi-task learning framework based on shared spans. The aspect and opinion terms are jointly recognized by the span representations. Gao et al. [27] viewed the AOPE task as a machine-reading comprehension task and proposed a question-driven span labeling model (QDSL). In QDSL, they explored the internal relation between aspect terms and opinion terms to identify the aspect-opinion pairs in a sentence.
Zhang et al. [28] proposed a two-stage neural network model for the quadruple extraction in ABSA. The elements in the quadruple extraction are the aspect term, aspect category, opinion term, and the associated sentiment polarity, respectively. Dai et al. [29] investigated reinforcement learning in OTE and proposed a Padding-Enhanced Reinforcement learning model (PER). In PER, a multiplex heterogeneous graph is proposed to model the sequential information and syntactic information. To address the problem of lacking labeled data, Wu et al. [30] proposed a hybrid unsupervised method for the ATE and OTE tasks. The GRU network is used to extract aspect and opinion terms with the pseudo-labeled data.
In the ALSC task, Tang et al. [31] proposed the target-dependent LSTM (TD-LSTM) and the target-connection LSTM (TC-LSTM). In TD-LSTM, the left and right parts of a sentence are fed into the model separately. TD-LSTM is weak at capturing aspect-dependent sentence information. In order to further model the relation between aspect terms and the contexts, aspect embedding and word embedding are concatenated in TC-LSTM. Wang et al. [32] proposed an attention-based LSTM with aspect embedding (ATAE-LSTM) model for predicting the sentiment polarity towards an aspect. The ATAE-LSTM model utilized the attention mechanism and aspect embedding to obtain the aspect-dependent sentence representation. Xue and Li [33] proposed a gated convolutional network with aspect embedding (GCAE). In GCAE, the novel gated tanh-relu units can generate aspect-dependent sentiment features.
To tackle the aspect sentiment bias generated by the pre-trained model, Cao et al. [34] proposed a no-aspect differential sentiment (NADS) framework. They utilized contrastive learning between the raw sentence and the sentence template to improve the robustness. Zhao and Yu [35] proposed a knowledge-enabled BERT language representation model for ALSC, in which the sentiment knowledge is learned from a sentiment knowledge graph. Li et al. [36] proposed a dual graph convolutional networks (DualGCN) model for ALSC. DualGCN can model the semantic and syntactic information simultaneously. Liang et al. [37] utilized SenticNet to inject affective knowledge into the model. They developed a graph convolutional network based on SenticNet to model the contextual affective knowledge toward an aspect. To utilize the dependency types between words, Tian et al. [38] proposed a type-aware graph convolutional network (T-GCN). In T-GCN, the attention mechanism is employed to model different relations. Li et al. [39] investigated the generative model for ALSC and proposed a joint term-sentiment generator (JTSG) model. In JTSG, the encoder is used to encode the sentence information, and the decoder is used to predict the aspect term and its corresponding sentiment polarity. Yadav et al. [40] proposed a novel human-interpretable learning method for ALSC. They utilized the Tsetlin Machine (TM) to learn the particular sentiment toward the specific aspect.
Peng et al. [5] were the first to study the ASTE task. They proposed a two-stage framework to extract triplets from a sentence. In particular, the first stage of the model can output the aspect term, opinion term, and sentiment polarity. The second stage of the model can generate the triplet based on the outputs from the first stage. Mao et al. [3] viewed the ASTE task as a machine reading comprehension (MRC) problem. They developed two shared BERT-MRC models to solve two MRC problems which are constructed for the ASTE task. Meanwhile, Chen et al. [6] viewed the ASTE task as a multi-turn machine reading comprehension (MTMRC) problem. They designed three types of queries to model the relation between subtasks, and they proposed the bidirectional MRC (BMRC) to detect triplets in a sentence. Wu et al. [9] proposed a novel end-to-end framework that is based on the developed grid tagging scheme (GTS). In GTS, an effective inference strategy was designed to model the mutual indications between opinion factors. Zhang et al. [41] proposed a multi-task learning framework to jointly extract three elements of a triplet. Xu et al. [8] focused on the multi-word aspect and opinion terms and proposed a span-level approach. Besides, they developed a span pruning strategy for reducing the computation cost.
Yan et al. [42] investigated the ASTE task by the generation approach and proposed a unified generative framework. They utilized the pointer indexes and sentiment class indexes to extract the aspect term, opinion term, and the corresponding sentiment polarities. To model relations between words in ASTE, Chen et al. [12] proposed an enhanced multi-channel graph convolutional network (EMC-GCN) model. They also employed four linguistic features to improve the performance of the EMC-GCN model. All these methods have achieved great success. However, they do not model the syntactic dependency information through general neural networks, such as Bi-LSTM, and the GCN-based models are weak at capturing the sequence information and long-distance dependency information. Besides, the dependency trees generated by external dependency parsers can have a great influence on the performance of the GCN-based models.

III. METHODOLOGY
In this section, we describe the architecture of the proposed IDCN model in detail.

A. TASK DEFINITION
Given a sentence $S = \{w_1, w_2, \ldots, w_n\}$, where $n$ is the number of words in the sentence, the goal of the ASTE task is to extract all triplets $T = \{(a_1, o_1, s_1), \ldots, (a_m, o_m, s_m)\}$ from $S$, where $m$ is the number of triplets in $S$ and $(a_i, o_i, s_i)$ denotes the aspect term, opinion term, and corresponding sentiment polarity of the $i$-th triplet, respectively. The sentiment polarity $s_i$ of the aspect term $a_i$ takes exactly one of three labels: positive, negative, or neutral.
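As an illustration of the input and output defined above, a triplet can be held in a small record. The following is only a sketch of one possible data layout: the names `Polarity`, `Triplet`, and `render`, the use of inclusive token index spans, and the toy sentence (assembled from words mentioned in this paper) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Triplet:
    """One (aspect term, opinion term, sentiment polarity) triplet."""
    aspect: Tuple[int, int]    # inclusive (start, end) token indices (assumed layout)
    opinion: Tuple[int, int]   # inclusive (start, end) token indices (assumed layout)
    polarity: Polarity

    def render(self, words: List[str]) -> Tuple[str, str, str]:
        """Map the index spans back to surface strings."""
        aspect = " ".join(words[self.aspect[0]: self.aspect[1] + 1])
        opinion = " ".join(words[self.opinion[0]: self.opinion[1] + 1])
        return aspect, opinion, self.polarity.value


# Toy sentence S with n = 6 words and m = 1 triplet (illustrative only).
words = ["the", "chicken", "tikka", "marsala", "is", "good"]
triplets = [Triplet(aspect=(1, 3), opinion=(5, 5), polarity=Polarity.POSITIVE)]
rendered = [t.render(words) for t in triplets]
```

Storing spans as indices rather than strings keeps multi-word terms such as "chicken tikka marsala" unambiguous when the same word occurs more than once in a sentence.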

B. OVERVIEW OF IDCN
IDCN is designed with three goals: (i) to capture the correlation relations between word pairs, the sequence information, and the long-distance dependency information in a sentence; (ii) to alleviate the wrong and noisy information introduced by the off-the-shelf dependency parser; and (iii) to model the syntactic dependency information using general neural networks. To this end, we propose an end-to-end framework named interactive dual channel network (IDCN). The overall framework of the proposed IDCN is illustrated in Fig. 2.
In IDCN, a novel interactive word pair generating (IWPG) module is designed to learn the informative word pair representation and model the correlation relations between word pairs. In the IWPG module, the feature learning layer is used to capture the sequence information and long-distance dependency information, as well as to model the syntactic dependency information from word pair dependency relation tensors. The interaction layer is used to capture the comprehensive correlation relation between two words and output the informative word pair representation. The syntactic dependency fusion module is used to inject the syntactic dependency information into the model and generate the syntactic dependency-aware word embeddings from the constructed tensors. The embedding layer is employed to generate the contextualized word embeddings, which provide general knowledge for the model. In general, a pre-trained language model such as BERT is trained on large-scale general data and can learn inherent knowledge from that data.

C. EMBEDDING LAYER
The embedding layer consists of BERT. The pre-trained BERT model generates contextual word embeddings and provides a large amount of general knowledge for the model. Compared with fixed word embeddings, contextual word embeddings capture the meaning of a word more precisely according to the dynamic context of a specific sentence. The general knowledge can enhance the extraction ability of the model to some extent. The calculation formula is as follows:
$$h_i = \mathrm{BERT}(w_i)$$
where $w_i$ is the $i$-th input of a sentence $S$, $h_i \in \mathbb{R}^{d_{bert}}$ is the $i$-th output of BERT (the outputs for the whole sentence form a matrix in $\mathbb{R}^{n \times d_{bert}}$), $n$ is the total length of the sentence $S$, and $d_{bert}$ is the dimension of the hidden contextual representations in BERT. The symbol $\mathrm{BERT}(\cdot)$ denotes all operations of the BERT model.

D. SYNTACTIC DEPENDENCY FUSION MODULE
The syntactic dependency fusion module is designed to inject the syntactic information into the model and generate syntactic dependency-aware word embeddings. Firstly, we transform a specific sentence into a word pair dependency matrix. Each element of the word pair dependency matrix denotes the syntactic dependency relation between two words, such as nsubj and compound. Then, the word pair dependency relation tensors are obtained by a lookup table operation:
$$r^D_{ij} = E^\top \tilde{r}_{ij}$$
where $\tilde{r}_{ij} \in \mathbb{R}^{k}$ is the one-hot vector of the syntactic dependency relation $r_{ij}$ between word $i$ and word $j$, $k$ is the total number of syntactic dependency relations, and $E \in \mathbb{R}^{k \times d_r}$ is the embedding matrix, which is a learnable parameter of the model. The word pair dependency relation tensor $R \in \mathbb{R}^{n \times n \times d_r}$ is then obtained with $R_{i,j,:} = r^D_{ij}$. Given $R$, the syntactic dependency-aware word embeddings can be obtained by a pooling mechanism, such as average pooling or max pooling, or by more complicated networks, such as attention-based methods. In other words, the syntactic dependency information can be learned by utilizing the word pair dependency relation tensors together with the pooling mechanism or more complicated networks in IDCN. The proposed IDCN does not operate on the dependency tree of a sentence. Hence, it can avoid error propagation from the dependency tree.
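The construction of the word pair dependency matrix and the lookup step can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the helper names `build_relation_matrix` and `lookup`, the special `"none"` and `"self"` relation ids, the symmetric storage of each relation, and the toy dependency edges are all assumptions.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (head index, dependent index, relation name)


def build_relation_matrix(n: int, edges: List[Edge], rel2id: Dict[str, int]) -> List[List[int]]:
    """Return an n x n matrix holding a relation id for every word pair."""
    none_id, self_id = rel2id["none"], rel2id["self"]
    matrix = [[none_id] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = self_id            # assumption: special id on the diagonal
    for head, dep, rel in edges:
        matrix[head][dep] = rel2id[rel]
        matrix[dep][head] = rel2id[rel]   # assumption: relations stored symmetrically
    return matrix


def lookup(matrix: List[List[int]], embedding: List[List[float]]) -> List[List[List[float]]]:
    """Replace each relation id by its row of E, giving an n x n x d_r tensor."""
    return [[list(embedding[rel_id]) for rel_id in row] for row in matrix]


rel2id = {"none": 0, "self": 1, "nsubj": 2, "compound": 3}   # k = 4 relations
d_r = 4
rng = random.Random(0)
# E has shape k x d_r; it is randomly initialised here and would be learned in practice.
E = [[rng.uniform(-0.1, 0.1) for _ in range(d_r)] for _ in rel2id]

# Toy parse of the words "chicken tikka marsala good"; the edges are invented.
edges = [(2, 0, "compound"), (2, 1, "compound"), (3, 2, "nsubj")]
rel_matrix = build_relation_matrix(4, edges, rel2id)
R = lookup(rel_matrix, E)   # R[i][j] plays the role of r^D_ij
```

Indexing rows of `E` by relation id is equivalent to multiplying the embedding matrix by a one-hot vector, which is why the lookup needs no explicit one-hot encoding.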
Besides, we have performed experiments on the average pooling mechanism and the max-pooling mechanism. We found that the average pooling mechanism performs better than the max-pooling mechanism in the syntactic dependency fusion module. Therefore, to improve performance and keep the model simple, we use the average pooling mechanism to obtain the syntactic dependency-aware word embeddings. The average pooling mechanism can be viewed as an aggregation of syntactic relations between the current word and its syntactically adjacent words.
The calculation is as follows:
$$h^d_i = \mathrm{average\_pooling}(R_{i,:,:})$$
$$H^D = [h^d_1; h^d_2; \ldots; h^d_n]$$
where $R_{i,:,:} \in \mathbb{R}^{n \times d_r}$ denotes the $i$-th slice of $R$, the symbol $\mathrm{average\_pooling}(\cdot)$ denotes the average pooling operation along the first dimension, $h^d_i \in \mathbb{R}^{d_r}$ is the syntactic dependency-aware word embedding of the word $w_i$, $H^D \in \mathbb{R}^{n \times d_r}$ denotes the embedding matrix of the specific sentence $S$, and the symbol ";" denotes the concatenation of vectors along the first dimension. By the pooling mechanism, word pair dependency relation tensors can be easily converted to syntactic dependency-aware word embeddings. These embeddings can be naturally utilized and modeled by general neural network models such as Bi-LSTM. In this way, the syntactic dependency information is injected into the model by the syntactic dependency fusion module.
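The pooling step described above can be sketched in a few lines. The function names `average_pooling` and `dependency_aware_embeddings` and the tiny hand-written tensor are assumptions made for illustration; a real implementation would operate on framework tensors.

```python
from typing import List


def average_pooling(slice_i: List[List[float]]) -> List[float]:
    """Average an n x d_r slice R[i, :, :] over its first dimension."""
    n = len(slice_i)
    d_r = len(slice_i[0])
    return [sum(row[c] for row in slice_i) / n for c in range(d_r)]


def dependency_aware_embeddings(R: List[List[List[float]]]) -> List[List[float]]:
    """Stack the pooled vectors h^d_i into an n x d_r matrix H^D."""
    return [average_pooling(R[i]) for i in range(len(R))]


# Toy tensor with n = 2 words and d_r = 2 (values chosen by hand).
R = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 2.0], [4.0, 4.0]],
]
H_D = dependency_aware_embeddings(R)
```

Each row of `H_D` summarises all relations that one word takes part in, which is what allows a sequence model to consume the syntactic information without traversing the dependency tree.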

E. IWPG MODULE
The IWPG module consists of the feature learning layer and the interaction layer. In the following sections, we discuss them in detail.

1) FEATURE LEARNING LAYER
The feature learning layer is used to model the sequence information, long-distance dependency information, and syntactic dependency information. The feature learning layer consists of dual channels, and each channel can consist of any type of network, such as Bi-LSTM, CNN, or transformer. Bi-LSTM is better than CNN and transformer at modeling the sequence information and long-distance information, hence we employ Bi-LSTM in each channel of the feature learning layer. Besides, we have performed experiments on dual Bi-LSTMs with shared parameters and dual Bi-LSTMs without shared parameters. We found that the dual Bi-LSTMs with shared parameters perform better than those without shared parameters in the left and right channels of the feature learning layer. Therefore, to improve the performance of IDCN and reduce its number of parameters, the dual Bi-LSTMs are set to share parameters in the left and right channels.
For capturing the sequence information, long-distance dependency information, and syntactic dependency information simultaneously, the contextual word embeddings generated by BERT and syntactic dependency-aware word embeddings generated by the syntactic dependency fusion module are concatenated as the inputs of the feature learning layer.
The calculation is as follows:
$$h^l_i = \mathrm{BiLSTM}([h_i; h^d_i])$$
$$h^r_i = \mathrm{BiLSTM}([h_i; h^d_i])$$
where the symbol $\mathrm{BiLSTM}(\cdot)$ denotes all operations of the two Bi-LSTMs in the left and right channels, respectively. The dual Bi-LSTMs share parameters in the left and right channels. The symbol ";" denotes the concatenation of vectors along the second dimension. $h^l_i \in \mathbb{R}^{d_f}$ and $h^r_i \in \mathbb{R}^{d_f}$ denote the $i$-th outputs of the left and right channels, respectively.

2) INTERACTION LAYER
The interaction layer is designed to capture the association relation between two words and generate the informative word pair representation. The association relation between two words in a sentence plays an important role in the ASTE task. For example, in the sentence of Fig. 1, the word pairs ("chicken", "tikka"), ("chicken", "marsala"), and ("tikka", "marsala") all belong to the same aspect term "chicken tikka marsala". To effectively detect this aspect term, the words "tikka" and "marsala" can provide useful information for the word "chicken" and vice versa. Likewise, the opinion term "good" can provide the sentiment information for the aspect term "chicken tikka marsala". In short, informative word pair representations are important for the ASTE task. The calculation processes of the interaction layer can be obtained as follows: where $h^a_{ij} \in \mathbb{R}^{d_a}$ is the word pair representation of word $w_i$ and word $w_j$ and reflects the relations between them, $d_a$ is a hyper-parameter that denotes the dimension of the word pair representation, and $W_2 \in \mathbb{R}^{d_a \times (d_f + d_f)}$ is a learnable parameter of the interaction layer (the left and right channels each have their own learnable parameters). The symbol ";" denotes the concatenation of vectors.
The biaffine transformation is the left channel in the interaction layer, and the concatenation followed by a transformation is the right channel. Together, the dual channels in the interaction layer can capture a more comprehensive association relation between two words. Adding the outputs of the two channels generates more informative word pair representations.
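One way to picture the two channels is the sketch below, which adds a biaffine term to a concatenation-plus-linear term for every word pair. It is a hedged illustration only: the exact formula is not reproduced in this section, so the shape of the biaffine tensor `W1` (`d_a x d_f x d_f`), the choice of feeding the left-channel vector of word i and the right-channel vector of word j to both terms, and all function names are assumptions.

```python
from typing import List

Vector = List[float]


def biaffine(h_i: Vector, h_j: Vector, W1: List[List[List[float]]]) -> Vector:
    """Left channel (assumed form): out[a] = h_i^T W1[a] h_j."""
    return [
        sum(h_i[p] * W1_a[p][q] * h_j[q]
            for p in range(len(h_i)) for q in range(len(h_j)))
        for W1_a in W1
    ]


def concat_linear(h_i: Vector, h_j: Vector, W2: List[List[float]]) -> Vector:
    """Right channel: W2 applied to the concatenation [h_i; h_j]."""
    x = h_i + h_j  # list concatenation, length d_f + d_f
    return [sum(w * v for w, v in zip(row, x)) for row in W2]


def word_pair_table(h_left: List[Vector], h_right: List[Vector],
                    W1: List[List[List[float]]], W2: List[List[float]]) -> List[List[Vector]]:
    """Return an n x n table of d_a-dimensional word pair representations."""
    n = len(h_left)
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            left = biaffine(h_left[i], h_right[j], W1)
            right = concat_linear(h_left[i], h_right[j], W2)
            row.append([a + b for a, b in zip(left, right)])  # add the two channels
        table.append(row)
    return table


# Toy sizes: n = 2 words, d_f = 2, d_a = 1 (hand-picked weights, not learned).
W1 = [[[1.0, 0.0], [0.0, 1.0]]]   # identity, so the biaffine term is a dot product
W2 = [[1.0, 1.0, 1.0, 1.0]]       # sums all entries of the concatenation
h_left = [[1.0, 2.0], [3.0, 4.0]]
h_right = [[1.0, 0.0], [0.0, 1.0]]
table = word_pair_table(h_left, h_right, W1, W2)
```

The biaffine term captures multiplicative interactions between the two words, while the linear term over the concatenation captures additive ones, so summing them gives each word pair access to both kinds of evidence.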

F. PREDICTION AND EXTRACTION LAYER
After we get the word pair representation, $h^a_{ij}$ is fed into the prediction and extraction layer, and the calculation formula of the label distribution can be obtained as follows: where the symbol $\mathrm{softmax}(\cdot)$ denotes the softmax function, $p_{ij} \in \mathbb{R}^{q}$ is the label distribution of the word pair, and $q$ is the total number of classes of output labels. Following Chen et al. [12], we design ten kinds of word pair labels: B-AT, I-AT, A-WP, B-OT, I-OT, O-WP, P-S, N-S, NEU-S, and None. The label B-AT denotes that a word in the word pair is the beginning of an aspect term. The label I-AT denotes that a word in the word pair is inside an aspect term. The label A-WP denotes that the word pair forms an aspect term. The label B-OT denotes that a word in the word pair is the beginning of an opinion term. The label I-OT denotes that a word in the word pair is inside an opinion term. The label O-WP denotes that the word pair forms an opinion term. The labels P-S, N-S, and NEU-S denote that one word in the word pair belongs to an aspect term and the other belongs to an opinion term, and that the aspect-opinion pair carries positive, negative, and neutral sentiment polarity, respectively. The label None denotes that the word pair has none of the above relations.
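The label prediction for a single word pair can be sketched as a softmax over ten scores followed by an argmax. The sketch assumes that the word pair representation has already been projected to one score per label (how that projection is parameterised is not specified here), and the function names and example scores are hypothetical.

```python
import math
from typing import List, Tuple

# The ten word pair labels listed in the text (q = 10).
LABELS = ["B-AT", "I-AT", "A-WP", "B-OT", "I-OT", "O-WP", "P-S", "N-S", "NEU-S", "None"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores: List[float]) -> Tuple[str, List[float]]:
    """Return the most probable label and the full label distribution p_ij."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical scores for one word pair; the largest score sits on P-S.
scores = [0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 2.5, 0.3, 0.1, 1.0]
label, probs = predict_label(scores)
```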
By applying the argmax function to the label distribution, we obtain the specific label of each word pair. The triplets can then be obtained from the predicted results of a sentence. Firstly, we obtain the candidate aspect terms and opinion terms from the predicted results. Specifically, the aspect terms and opinion terms are identified by the B-AT, I-AT, B-OT, and I-OT labels along the diagonal of the predicted word pair table. The predicted word pair table can be seen in the upper right of Fig. 2. After we get all aspect terms and opinion terms, we utilize the lower triangular table to extract the triplets. We enumerate every combination of an extracted aspect term and an extracted opinion term. If an aspect term and an opinion term have any sentiment relation, they form an aspect-opinion pair. Among the sentiment relations predicted for the word pairs of this aspect-opinion pair, we take the most frequently predicted one as its sentiment polarity. Finally, we acquire a triplet (aspect term, opinion term, sentiment polarity).
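The extraction procedure described above can be sketched as follows. This is a simplified illustration, not the authors' decoder: the helper names, the rule that a term must start with a B- label, the way cells are read from the lower triangle, and the tie-breaking of the vote are assumptions, and the toy table is hand-written.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
SENTIMENTS = {"P-S": "positive", "N-S": "negative", "NEU-S": "neutral"}


def diagonal_spans(table: List[List[str]], begin: str, inside: str) -> List[Span]:
    """Collect terms from the diagonal: a `begin` label followed by `inside` labels."""
    spans, start = [], None
    n = len(table)
    for i in range(n):
        tag = table[i][i]
        if tag == begin:
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == inside and start is not None:
            continue
        else:
            if start is not None:
                spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, n - 1))
    return spans


def decode_triplets(table: List[List[str]]) -> List[Tuple[Span, Span, str]]:
    """Pair every aspect term with every opinion term and vote on the sentiment."""
    aspects = diagonal_spans(table, "B-AT", "I-AT")
    opinions = diagonal_spans(table, "B-OT", "I-OT")
    triplets = []
    for a in aspects:
        for o in opinions:
            votes = Counter()
            for i in range(a[0], a[1] + 1):
                for j in range(o[0], o[1] + 1):
                    tag = table[max(i, j)][min(i, j)]  # read the lower triangle
                    if tag in SENTIMENTS:
                        votes[tag] += 1
            if votes:  # at least one sentiment relation links the two terms
                triplets.append((a, o, SENTIMENTS[votes.most_common(1)[0][0]]))
    return triplets


# Toy predicted table for the words: chicken tikka marsala is good
n = 5
table = [["None"] * n for _ in range(n)]
for i, tag in enumerate(["B-AT", "I-AT", "I-AT", "None", "B-OT"]):
    table[i][i] = tag
for j in range(3):
    table[4][j] = "P-S"   # "good" is positively linked to each aspect word
result = decode_triplets(table)
```

Voting over all cells shared by the two terms makes the decoded polarity robust to a single mislabelled word pair inside a multi-word term.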

G. LOSS FUNCTION
Our goal is to minimize the loss function, which is calculated as follows:
$$L = L_w + \alpha L_d \quad (11)$$
$$L_w = -\sum_{i}^{n} \sum_{j}^{n} y_{ij} \log p_{ij} \quad (12)$$
$$L_d = -\sum_{i}^{n} \sum_{j}^{n} y_{ij} \log r^D_{ij} \quad (13)$$
where $L_w$ is the standard loss function for the aspect sentiment triplet extraction, $L_d$ is the constraint loss for the word pair dependency relation tensors, $\alpha$ is a hyper-parameter that weights the constraint loss, and $y_{ij}$ is the true probability distribution of the word pair.
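The objective can be sketched numerically as below. The sketch assumes that the two terms are combined as $L_w + \alpha L_d$, that both are cross-entropy terms with a leading minus sign, and that each $r^D_{ij}$ has already been normalised into a distribution over the same labels as $y_{ij}$; these are assumptions made here, and the function names and toy values are illustrative.

```python
import math
from typing import List

Table = List[List[List[float]]]  # n x n x q


def cross_entropy(targets: Table, predictions: Table) -> float:
    """Compute -sum_i sum_j y_ij . log(pred_ij) over the whole word pair table."""
    loss = 0.0
    for y_row, p_row in zip(targets, predictions):
        for y_ij, p_ij in zip(y_row, p_row):
            loss -= sum(y * math.log(p) for y, p in zip(y_ij, p_ij) if y > 0)
    return loss


def total_loss(y: Table, p: Table, r_d: Table, alpha: float) -> float:
    """Assumed combination: L = L_w + alpha * L_d."""
    return cross_entropy(y, p) + alpha * cross_entropy(y, r_d)


# Toy 1 x 1 table with q = 2 labels.
y = [[[1.0, 0.0]]]
p = [[[0.5, 0.5]]]       # word pair label distribution
r_d = [[[0.25, 0.75]]]   # normalised dependency relation vector (assumed)
loss = total_loss(y, p, r_d, alpha=0.1)
```

With these toy values the first term is log 2 and the second is 2 log 2, so a small alpha keeps the constraint term from dominating the triplet extraction loss.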

IV. EXPERIMENTS
A. DATASET
We use two typical datasets for training and testing the model in the experiments. Both are derived from SemEval 2014 [43], 2015 [44], and 2016 [45]. The first dataset, denoted $D_1$, is collected by Wu et al. [9]. The second dataset, denoted $D_2$, is collected by Xu et al. [7], who corrected some errors in the dataset proposed by Peng et al. [5]. Both datasets involve the restaurant and laptop domains. The statistics of both datasets are shown in Table 1.

B. EVALUATION METRICS
We use precision, recall, and F1 as the evaluation metrics, which are often used to evaluate the performance of the model in the ASTE task. The calculation formulas of precision, recall, and F1 are given as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
where $P$ and $R$ denote the precision and recall, respectively. $TP$ is the number of true positives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
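These metrics can be computed over sets of triplets as sketched below. The sketch assumes that a predicted triplet counts as a true positive only when it matches a gold triplet exactly; the function name and the toy triplets are illustrative.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]  # (aspect term, opinion term, sentiment polarity)


def precision_recall_f1(predicted: Iterable[Triplet],
                        gold: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over triplet sets."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # predicted and correct
    fp = len(predicted - gold)   # predicted but wrong
    fn = len(gold - predicted)   # missed gold triplets
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy example: one of two predictions is correct, one of two gold triplets is found.
gold = [("chicken tikka marsala", "good", "positive"), ("service", "slow", "negative")]
pred = [("chicken tikka marsala", "good", "positive"), ("service", "slow", "neutral")]
p, r, f1 = precision_recall_f1(pred, gold)
```

The guards against empty denominators return 0.0 when nothing is predicted or nothing is gold, which avoids a division error on sentences without triplets.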

C. EXPERIMENT SETTINGS
We use Stanza [46] to obtain the syntactic dependency information of a sentence. The base version of pre-trained BERT implemented in Transformers from Huggingface [47] is employed in the proposed IDCN. AdamW is used as the optimizer. The learning rate for BERT is set to 2e-5, and the learning rate for the other parameters in IDCN is set to 1e-3. Dropout is used to prevent overfitting, with the rate set to 0.5. The dimensionality of Bi-LSTM is 300. The dimensionality of the syntactic dependency-aware word embeddings is the same as the number of word pair relations. The hyper-parameter of the loss function is set to 0.01 on $D_2$ and on 14Lap of $D_1$, and it is set to 0.1 on the other datasets of $D_1$. The number of epochs is set to 100 and the batch size is 16.
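For reference, the settings above can be collected in a small configuration object. This is a sketch only: the class name `TrainingConfig`, its field names, the helper `loss_weight`, and the string identifiers used for datasets are hypothetical, while the values are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyper-parameters reported in the experiment settings."""
    optimizer: str = "AdamW"
    bert_lr: float = 2e-5      # learning rate for BERT parameters
    other_lr: float = 1e-3     # learning rate for all other parameters
    dropout: float = 0.5
    bilstm_dim: int = 300
    epochs: int = 100
    batch_size: int = 16


def loss_weight(dataset: str, subset: str) -> float:
    """Return the loss hyper-parameter: 0.01 on D2 and on 14Lap of D1, else 0.1."""
    if dataset == "D2" or (dataset == "D1" and subset == "14Lap"):
        return 0.01
    return 0.1


config = TrainingConfig()
```

Keeping two learning rates in the configuration reflects the common practice of fine-tuning the pre-trained encoder more gently than the randomly initialised layers on top of it.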

D. BASELINES
In the experiments, we compare the proposed IDCN with the variants of IDCN and the state-of-the-art baselines.

1) VARIANTS OF IDCN
The variants of the proposed IDCN are used to validate the effectiveness of the designed modules or layers in IDCN. IDCN-v1, -v2, and -v3 are designed to verify the effectiveness of the IWPG module. IDCN-v4 and -v5 are used to verify the effectiveness of the syntactic dependency fusion module.
1. IDCN-v1: This is the first variant of IDCN. It removes the IWPG module in IDCN. The concatenation of the contextualized word embeddings generated by BERT and the dependency-aware word tensors generated by the syntactic dependency fusion module is directly fed into the prediction and extraction layer. The concatenation tensors are directly used to predict the labels of word pairs.
2. IDCN-v2: This is the second variant of IDCN. It removes the interaction layer in the IWPG module. The outputs of the dual Bi-LSTMs are concatenated with the dependency-aware word tensors, and then the dual concatenation tensors are added as inputs to the prediction and extraction layer.
3. IDCN-v3: This is the third variant of IDCN. It replaces the dual Bi-LSTMs with the dual transformers. The outputs of the dual transformers are fed into the interaction layer. The dual transformers also share the same parameters.
4. IDCN-v4: This is the fourth variant of IDCN. It replaces the average pooling with the max pooling mechanism in the syntactic dependency fusion module. The outputs of the max pooling operation are viewed as the outputs of the syntactic dependency fusion module.
5. IDCN-v5: This is the fifth variant of IDCN. It discards the syntactic dependency fusion module in the IDCN. In other words, the inputs of the IWPG module in IDCN-v5 are only the contextualized word embeddings generated by BERT, not involving the syntactic dependency information.

2) STATE-OF-THE-ART METHODS
The proposed IDCN model is compared with the following state-of-the-art methods. In general, there are three kinds of methods for the ASTE task: pipeline-based methods, end-to-end methods, and machine reading comprehension-based methods.
1. CMLA+: In the first stage, the model jointly extracts the spans of aspect terms, opinion terms, and sentiment polarity. In the second stage, an MLP classifier is used to determine whether each candidate triplet should be kept as a predicted result [5].
2. RINANTE+: This model employs LSTM-CRF to jointly extract aspect terms, opinion terms, and the corresponding sentiment. In the first stage, it uses rules as weak supervision to model dependency relations between words. In the second stage, as in CMLA+, it employs an MLP to output all the valid triplets [5].
3. Li-unified-R: This model designs a customized multi-layer LSTM to identify aspect terms with the corresponding sentiment polarity and opinion terms, respectively. In the second stage, as in CMLA+, it employs an MLP to output all the valid triplets [5].
4. Peng-two-stage: This model simultaneously extracts the aspect terms with the corresponding sentiment and the opinion spans. A GCN is used to model the dependency relations between words to enhance the performance of the model. In the second stage, as in CMLA+, it employs an MLP to output all the valid triplets [5].
5. Peng+IOG: This model combines Peng-two-stage and the IOG model [48] for the ASTE task [9].
6. IMN+IOG: This model combines the IMN model [49] and the IOG model [48] for the ASTE task [9].
7. GTS-CNN: This model views the ASTE task as a unified tagging task and designs a novel grid tagging scheme for the ASTE task in an end-to-end fashion instead of a pipeline fashion. Specifically, GTS-CNN employs a CNN to extract the triplets [9].
8. GTS-BiLSTM: This model is also based on the grid tagging scheme presented in GTS-CNN. GTS-BiLSTM employs a Bi-LSTM to extract the triplets [9].
9. GTS-BERT: This model is also based on the grid tagging scheme presented in GTS-CNN. GTS-BERT employs BERT to extract the triplets [9].
10. OTE-MTL: This model is a multi-task learning framework for the opinion triplet extraction task, which jointly detects aspects, opinions, and sentiment dependencies. It employs a Bi-LSTM for sentence encoding [41].
11. JET-BERT: This model is based on a position-aware tagging scheme for jointly extracting the aspect sentiment triplets, which can specify the structural information of a triplet. It develops factorized features to effectively capture the interactions between the elements of a triplet [7].
12. S³E²: This model is a semantically and syntactically enhanced method for aspect sentiment triplet extraction. Specifically, it designs a graph representation for integrating the syntactic dependency information and semantic information. Besides, an LSTM is used to capture the contextual semantics [50].
13. EMC-GCN: This model is the current state-of-the-art framework for ASTE. It employs a GCN to model the relations between words. Besides, the part-of-speech information, syntactic dependency information, tree-based position information, and relative position information are used to enhance the model performance [12].
14. BMRC: This model is a multi-turn machine reading comprehension framework. It develops three types of queries for capturing the correlation relation between different subtasks. The two directions in the model can benefit each other and generate more precise triplets [6].

E. OVERALL RESULTS
In this section, we report all experimental results and discuss the reasons why the proposed IDCN model can obtain better results than the baselines.

1) RESULTS OF VARIANTS
The ablation experiments are used to verify the effectiveness of the important modules or layers of the proposed IDCN. The experimental results of all variants of IDCN on the dataset $D_1$ are shown in Table 2, and the results on the dataset $D_2$ are shown in Table 3.
We can observe from the experimental results of Table 2 and Table 3 that the proposed IDCN acquires the best precision, recall, and F1 values on all datasets.
The difference between IDCN-v1 and IDCN is that the IDCN-v1 discards the overall IWPG module. The IDCN-v1 model lacks the ability to model sequence information and long-distance information, as well as to capture the correlation relation between word pairs simultaneously. The above information and relation are crucial for the ASTE task. Hence, the absence of the IWPG module leads to the worst performance of the IDCN-v1 model on all datasets. The experimental results of IDCN-v1 verify the effectiveness of the designed IWPG module in IDCN.
There are two layers in the IWPG module: the interaction layer and the feature learning layer. IDCN-v2 and IDCN-v3 are used to verify their effectiveness. The difference between IDCN-v2 and IDCN is that IDCN-v2 does not consider the interaction mechanism of dual channels and does not capture the correlation relation between word pairs. Compared with IDCN, the performance degradation of IDCN-v2 validates the importance of the interaction layer. However, modeling only the correlation relation between word pairs cannot guarantee the performance of the model. This is demonstrated by the experimental results of IDCN-v3 on all datasets, since the IDCN-v3 model ignores the sequence information. Besides, the experimental results of IDCN-v3 also show that Bi-LSTM is more suitable than transformers for the feature learning layer of the proposed IDCN model.
The difference between IDCN-v4 and IDCN is that IDCN-v4 uses the max pooling mechanism to obtain the syntactic dependency-aware word embeddings in the syntactic dependency fusion module. The experimental results of IDCN-v4 validate the effectiveness of the average pooling mechanism in IDCN.
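To make the contrast between the two pooling mechanisms concrete, the sketch below pools a toy word pair dependency relation tensor into one vector per word. It assumes a tensor of shape n × n × d, where entry (i, j) holds a d-dimensional relation vector for the word pair (i, j), and assumes that pooling is applied over the second word index; the exact tensor layout and pooling axis used in IDCN are not restated in this section, so these choices, the function name `pool_word_pairs`, and the toy values are illustrative only.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # shape: n x n x d (assumed layout)


def pool_word_pairs(rel: Tensor3, mode: str = "avg") -> List[List[float]]:
    """Pool the relation vectors of each word over all of its word pairs."""
    pooled = []
    for row in rel:                    # row i: relation vectors for pairs (i, j), all j
        columns = list(zip(*row))      # d tuples, each holding n values
        if mode == "avg":
            pooled.append([sum(c) / len(c) for c in columns])
        elif mode == "max":
            pooled.append([max(c) for c in columns])
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return pooled


# Toy tensor: 2 words, 3 relation types, one-hot relation vectors.
rel = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
]
avg_emb = pool_word_pairs(rel, "avg")  # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
max_emb = pool_word_pairs(rel, "max")  # [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

In this toy case, average pooling keeps the proportion of each relation type a word takes part in, whereas max pooling only records whether a relation type occurs at all. This is one possible intuition for the difference between the two mechanisms, not a claim made by the paper.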
The syntactic dependency information can improve the performance of the model on the ASTE task, because syntactic dependency relations provide additional cues for extracting the aspect sentiment triplets. The difference between IDCN-v5 and IDCN is that IDCN-v5 discards the syntactic dependency fusion module and ignores the syntactic dependency information. The performance degradation on all datasets verifies the effectiveness of the syntactic dependency fusion module. Besides, the experimental results also demonstrate that syntactic dependency relations can enhance the representation ability of the model to some extent.

2) RESULTS OF STATE-OF-THE-ART METHODS
The proposed IDCN is compared with the state-of-the-art methods to validate its superior performance. We report the experimental results of EMC-GCN obtained by running its original code, and the experimental results of the other state-of-the-art methods are taken from their original papers or from Chen et al. [12]. All experimental results of the state-of-the-art methods on the datasets $D_1$ and $D_2$ are reported in Table 4 and Table 5, respectively. In Table 4 and Table 5, the symbol '−' denotes that the experimental results are not reported in the original paper.
As shown in Table 4, the proposed IDCN outperforms the state-of-the-art models on most of the datasets, surpassing them by 0.21%-0.88% in F1. As shown in Table 5, the proposed IDCN achieves the best performance on all datasets, surpassing the state-of-the-art models by 0.56%-1.77% in F1.
The first reason that the proposed IDCN performs better than other state-of-the-art methods is that the IWPG module can model the sequence information, long-distance information, and the association relation between word pairs, simultaneously. Other state-of-the-art methods lack the ability to capture these multiple types of information and relations, all of which are important for the ASTE task. The feature learning layer of the IWPG module captures the sequence and long-distance information by employing the Bi-LSTMs. The dual channels in the IWPG module can learn different informative representations. Through the interaction of the dual channels, the interaction layer can capture the association relation between word pairs and generate informative word-pair representations.
The second reason is that the syntactic dependency fusion module can capture the syntactic dependency information between word pairs and inject it into general neural networks such as Bi-LSTM. In general, the syntactic dependency information is modeled by GCNs, which are weak at capturing sequence information and long-distance information, whereas such information is difficult to incorporate into general neural networks. In the syntactic dependency fusion module, we construct the word pair dependency relation tensors from the syntactic dependency information between word pairs and then utilize the pooling mechanism to obtain the syntactic dependency-aware word embeddings. In this way, the syntactic dependency fusion module can naturally inject the informative syntactic dependency information into the general neural networks. Besides, the syntactic dependency information can also be captured and learned in the syntactic dependency fusion module.
The third reason is that the dual architecture of the proposed IDCN model can effectively learn different representations and integrate these different representations to generate the informative word-pair dependent representations. In other words, the dual architecture can provide more useful and comprehensive information for extracting the aspect sentiment triplets.
To further validate the generalization ability of the proposed IDCN, we have performed additional experiments on the aspect term extraction (ATE) task. The compared state-of-the-art method is EMC-GCN. The experimental results on all datasets are reported in Fig. 3, where (a) and (b) report the F1 values of EMC-GCN and IDCN on the datasets $D_1$ and $D_2$, respectively. As seen in Fig. 3, the proposed IDCN acquires the best F1 values on most datasets. The experimental results demonstrate that the proposed IDCN performs well not only on the ASTE task but also on the ATE task, which further verifies the effectiveness of the proposed IDCN.

3) EFFECT OF DIMENSION OF FEATURE LEARNING LAYER
We have performed additional experiments to investigate the effect of different dimensions of the feature learning layer of IDCN, in which we employ Bi-LSTMs. The performance for different dimensions of the feature learning layer is illustrated in Fig. 4. The dimensions considered are 100, 200, 300, and 400. In Fig. 4, (a) reports the F1 values on 14Res and 14Lap of the dataset $D_1$, and (b) reports the F1 values on 14Res and 14Lap of the dataset $D_2$. We can observe that the IDCN model acquires the best F1 values on all datasets when the dimension is 300. When the dimension is less than 300, the performance of the model improves as the dimension increases. When the dimension is greater than 300, the performance of the model decreases as the dimension increases.

4) CASE STUDY
We report some predicted results of IDCN in Fig. 5. In Fig. 5, the aspect terms are marked in orange and the opinion terms are marked in blue. A green solid line denotes that the aspect term and the opinion term belong to the same triplet and that the corresponding sentiment polarity is positive, while a red solid line denotes that the sentiment polarity of the triplet is negative. The label "Gold" denotes the ground-truth triplets in a sentence, and "IDCN" denotes the predicted results of the proposed IDCN. As seen in Fig. 5, the IDCN model can extract multiple complicated triplets in a sentence. The predicted results further verify the effectiveness of the proposed IDCN.

V. CONCLUSION
In this paper, we propose a novel interactive dual channel network (IDCN) model for the ASTE task. In the IDCN model, the IWPG module is designed to capture the sequence information, long-distance dependency information, and association relations between word pairs, simultaneously. The dual channels can effectively learn different representations. Based on these representations, the informative word-pair representations can be obtained by the designed interaction mechanism of the dual channels. In order to naturally inject the syntactic dependency information into the general neural networks and alleviate error propagation, the syntactic dependency fusion module is developed. It can model the syntactic relations and provide the syntactic dependency information for the IWPG module by constructing the word pair dependency relation tensors and applying average pooling. Multiple datasets are used to verify the effectiveness and generalization ability of the proposed IDCN model. The experimental results show that the proposed IDCN model acquires state-of-the-art results. In future work, we plan to apply the proposed IDCN model to other related tasks. Besides, incorporating multi-hop reasoning into the proposed IDCN model could further increase the interpretability of the prediction results.