A Multi-Layer Network for Aspect-Based Cross-Lingual Sentiment Classification

In the recent era, advances in communication technologies have made it easy for people from different regions to interact. Many organizations now adopt recent approaches, e.g., sentiment analysis and aspect-oriented sentiment classification, to evaluate user reviews and improve the quality of their products. Processing multi-lingual user reviews is a key challenge in Natural Language Processing (NLP). This paper proposes a multi-layer network with divided attention to perform aspect-based sentiment classification on cross-lingual data. The approach extracts the Part-of-Speech (POS) tagging information of the given reviews, preprocesses them, and converts them into tokens. Bi-lingual dictionaries are then leveraged to map the tokens from one language to another. Given the preprocessed and mapped reviews, vectors are generated with multi-lingual BERT and passed to the proposed deep learning classifier. A total of 10,351 restaurant reviews from the SemEval-2016 Task 5 dataset are used for the prediction of aspect-based sentiment. The results of cross-lingual validation suggest that the proposed approach significantly outperforms the state-of-the-art approaches, improving precision, recall, and F1 by more than 23%, 20%, and 22%, respectively.


I. INTRODUCTION
In a globalized world, the rapid growth of web technologies, e.g., social media and digital marketing, generates vast amounts of data and sets new trends. To evaluate and analyze such data, researchers and organizations use data mining techniques to extract meaningful patterns [1]. Among these, NLP is the field that deals with the processing of human-generated text. Sentiment analysis is one of the core tasks of NLP; it aims to predict opinion polarity, often in three categories (positive, negative, and neutral), and applies to nearly every domain, e.g., customer product reviews, political predictions, healthcare, and financial services.
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Tang.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Sentiment analysis of text can be performed at the document level, sentence level, and aspect level [2]-[4]. Businesses focus on aspect-level product analysis to understand the impact and limitation(s) of their products, which helps them plan to meet consumer requirements. Based on the target words of aspects, aspect-level analysis predicts consumer sentiment from product reviews [5]. These target words may appear explicitly or only implicitly in the review text: if the target words of an aspect physically occur in the review, the aspect is called explicit; otherwise, it is implicit. For example, the review ''The food quality of this restaurant is excellent but very costly'' contains the explicit aspect ''food quality'' with positive sentiment because of the opinion word ''excellent''. At the same time, it contains the implicit aspect ''food price'' with negative sentiment because of the opinion word ''costly'' [6]. Notably, we focus on explicit aspects in this paper. Recent advancements in technology provide communication channels among people of different regions and countries with different social values, cultures, and languages. Therefore, international companies receive reviews of their products in multiple languages. Since each language has a different grammatical structure, word senses, and background history, aspect-based sentiment analysis of such reviews is challenging for companies [7].
Until now, most efforts in aspect-based sentiment analysis have focused on mono-lingual data [7]. Using multi-lingual word embeddings to map data from one language to another handles cross-lingual data efficiently. However, the performance of multi-lingual word embeddings gradually decreases from resource-rich to resource-poor languages because of data unavailability [8]. To address this, different neural network-based models have been presented for multiple NLP tasks [3], [6], [7], [9]-[11], e.g., OpenAI [12], ULM-FiT [13], ELMo [14], and BERT [15]. Notably, BERT provides multi-lingual word embeddings for more than 100 languages [16] and outperforms the above models in sentiment prediction even for resource-poor languages. Similarly, some studies [17]-[19] exploit attention mechanisms in different neural networks to extract the targeted context. Although many studies have addressed aspect-based sentiment analysis, they ignore grammatical properties (Parts-of-Speech (POS)), e.g., nouns, adjectives, and verbs, which are strongly connected to aspects and their sentiments. For example, the sentence with POS tags ''The-DET food-NOUN seemed-VERB pretty-ADV fresh-ADJ and-CCONJ the-DET service-NOUN impeccable-ADJ'' contains nouns, verbs, and adjectives indicating that the food is pretty fresh and the service is impeccable. Moreover, homographic words are not targeted in multi-lingual aspect-based sentiment classification due to the small vocabulary size of cross-lingual data.
From this perspective, we propose a deep learning approach for cross-lingual aspect-based sentiment classification that exploits multi-lingual BERT and bi-lingual dictionaries. First, the proposed approach extracts the Part-of-Speech (POS) tagging information of the given reviews. Second, it preprocesses the reviews and converts them into tokens. Third, it leverages bi-lingual dictionaries to map the tokens from one language to another. Fourth, given the preprocessed and mapped reviews, it generates vectors with multi-lingual BERT. Fifth, the generated vectors are passed to the proposed deep learning classifier for training. Finally, the proposed approach is evaluated on a multi-lingual dataset for aspect-based sentiment classification. The results of cross-lingual validation suggest that the proposed approach significantly outperforms the state-of-the-art approaches, improving precision, recall, and F1 by more than 23%, 20%, and 22%, respectively. This paper makes the following contributions:
• A data modeling technique is introduced to map cross-lingual data from one language to another by exploiting bi-lingual dictionaries.
• A deep learning-based neural network is proposed for aspect-based sentiment classification that effectively utilizes the proposed data modeling technique for cross-lingual data. It uses a divided attention mechanism to attend to POS tagging information.
• The proposed approach is compared with state-of-the-art approaches for performance evaluation. The results of cross-lingual validation suggest that the proposed approach is accurate and outperforms the state-of-the-art approaches.
The rest of the paper is organized as follows. Section II presents the related work. Section III explains the proposed approach. Section IV describes the evaluation process, compares the performance of the proposed approach with the baseline approaches, and discusses threats to validity. Section V concludes the paper.

II. RELATED WORK
Sentiment analysis has a significant impact on digital marketing and customer reviews; therefore, most of its applications target such domains [20]. Cambria et al. [21] argued that most researchers consider sentiment analysis a simple task, whereas in reality it is a complex problem. They further explained that the primary aim of NLP is to achieve human-like performance and identified fifteen issues that must be solved to achieve human-like performance in sentiment analysis. These problems are divided into three layers: a syntactic layer that deals with the pre-processing of text, i.e., microtext normalization, sentence boundary disambiguation, POS tagging, text chunking, and lemmatization; a semantics layer that deals with the deconstruction and normalization of text, i.e., word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection; and a pragmatics layer that solves problems such as personality recognition, sarcasm detection, metaphor understanding, aspect extraction, and polarity detection by building on the syntactic and semantics layers.
Tang et al. [22] explained some of the essential applications of sentiment detection that include product comparison, opinion summarization, and opinion reason mining. They also mentioned other valuable tasks that can be performed using sentiment analysis, i.e., political discussion group posting, message sentiment filtering, email sentiment classification, attitude analysis, and sentiment with search engines.
Medhat et al. [23] described the taxonomy of sentiment analysis techniques and divided it into two main categories: machine learning approaches and lexicon-based approaches. Liu [24] further described feature selection as a critical factor for machine learning-based techniques and identified the most common feature selection methods: terms and their frequency, POS tagging, identification of sentiment words and phrases, sentiment shifters, and syntactic dependencies.
Traditionally, textual information is represented with one-hot vectors, which suffer from high dimensionality and cannot capture correlations between words. To avoid these issues, Bengio et al. [25] replaced one-hot vectors with a low-dimensional distributed representation, which has since become a standard technique. Notably, several pre-trained word embedding techniques, e.g., Word2Vec [26] and GloVe [27], are used to capture syntactic and semantic information from text.
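To make the dimensionality and correlation argument concrete, the sketch below contrasts one-hot vectors with a small dense embedding table. The vocabulary and the dense values are made up for illustration; real dense vectors would be learned, e.g., by Word2Vec or GloVe.

```python
import numpy as np

# Toy vocabulary (hypothetical, for illustration only).
vocab = ["food", "meal", "service", "price"]
index = {w: i for i, w in enumerate(vocab)}


def one_hot(word: str) -> np.ndarray:
    """One-hot vector: its dimension equals the vocabulary size, with a single 1."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Distinct one-hot vectors are orthogonal, so "food" and "meal" look exactly
# as unrelated as "food" and "price".
print(cosine(one_hot("food"), one_hot("meal")))  # 0.0

# Dense 3-dimensional vectors (made-up values; normally learned from text).
dense = {
    "food": np.array([0.9, 0.1, 0.0]),
    "meal": np.array([0.8, 0.2, 0.1]),
    "service": np.array([0.1, 0.9, 0.2]),
    "price": np.array([0.0, 0.2, 0.9]),
}
# Related words can now be closer to each other than unrelated ones.
print(cosine(dense["food"], dense["meal"]) > cosine(dense["food"], dense["price"]))  # True
```

The one-hot dimension grows with the vocabulary, whereas the dense size is a fixed design choice independent of it.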

A. MONO-LINGUAL ASPECT-BASED SENTIMENT CLASSIFICATION
The goal of aspect-based sentiment classification is to predict the sentiment polarity toward a particular aspect or target word. Schouten and Frasincar [5] categorized aspect-based sentiment analysis into three main categories: aspect detection, sentiment analysis, and joint aspect detection and sentiment analysis. Early methods for aspect-based sentiment classification are rule-based; e.g., Hu and Liu [20] used association rule mining to identify frequent nouns as product features and exploited WordNet to obtain synonyms and extract the sentiment of the extracted nouns. Nasukawa and Yi [28] identified the importance of the semantic relationship between a sentiment expression and its target subject, and used a syntactic parser and a sentiment lexicon to achieve better performance.
Among supervised machine learning techniques, Jiang et al. [9] introduced target-dependent features to improve target sentiment classification on Twitter data using a support vector machine. Other researchers [10], [11], [17], [18] adopted neural network-based models for targeted sentiment classification. For example, Tang et al. [29] proposed a target-dependent long short-term memory (LSTM) network for aspect-based sentiment analysis; they applied forward and backward LSTMs to capture bi-directional information around target words. Huang and Carley [30], in contrast, constructed parameterized filters and gates to incorporate aspect information into a CNN. Wang et al. [17], Tang et al. [31], and Chen et al. [11] exploited, respectively, an attention mechanism for aspect-based sentiment classification, an attention mechanism with explicit memory for aspect-based sentiment classification, and multi-layer attention with a recurrent neural network to combine the outputs of multiple layers.
Among neural models, LSTM with target-level attention is the most frequently adopted. However, LSTM has certain limitations, e.g., it has difficulty remembering long-term patterns in complex input sequences. Therefore, a few researchers [19], [32] have successfully used pre-trained transformer-based models, especially BERT [15], for aspect-based sentiment classification. Song et al. [19] proposed an attention encoder network that performs target word-based attention on the context without using a recurrent structure. Zeng et al. [32] proposed a local context mechanism that divides the context into local and global context. Sun et al. [33] constructed an auxiliary sentence to convert aspect-based sentiment classification into a sentence-pair classification task.
However, the approaches mentioned above are built explicitly for mono-lingual data (i.e., English). Therefore, they cannot effectively process and capture context information from cross-lingual data.

B. CROSS-LINGUAL ASPECT-BASED SENTIMENT CLASSIFICATION
Tubishat et al. reported that most of the work in NLP is limited to mono-lingual data in English, Chinese, and French [6]. With the advancement of digital and social media, many other languages need to be explored to use such data effectively. The main challenges in dealing with cross-lingual data are that each language has different word resources and a grammatical structure that differs from that of other languages. To overcome these challenges, the following approaches have been proposed for cross-lingual data.
Based on the work of Dashtipour et al. [34], which summarizes the mono-lingual approaches, Deriu et al. [35] leveraged large amounts of weakly supervised data with a multi-layer CNN to perform sentiment classification for multiple languages. Balahur and Turchi [36] used three different machine translation systems (Bing, Google, and Moses) to perform sentiment classification for French, German, and Spanish. Similarly, Lambert [37] proposed a machine learning-based model for aspect-level sentiment analysis of English and Spanish. The model assumes parallel annotated data for both the source and target languages; if such data is unavailable, the author used a translation medium to convert source-language data into the target language or vice versa. However, this method is highly dependent on parallel data and the translation medium.
Zhou et al. [7] proposed a supervised machine learning-based model (CLOpinionMiner) for opinion target extraction in cross-lingual scenarios for English and Chinese. The approach used the Bing Translator online service to translate English annotated data into Chinese. Although this technique was effective, it depends on the translation medium.
Conneau et al. [38] proposed an unsupervised method for cross-lingual word embedding. They used adversarial learning to map source and target language embeddings into the same vector space. This multi-lingual word embedding method supports up to 30 languages. However, the accuracy of mapping the source and target languages into the same vector space decreases gradually from resource-rich to resource-poor languages because of the limited data available for resource-poor languages.
Ghadery et al. [8] proposed a multi-lingual n-gram-based CNN for aspect category detection in online reviews. The authors used multi-lingual word embeddings to deal with multi-lingual data and divided aspect category detection into three subtasks: entity detection, attribute detection, and aspect category detection. The significant advantage of this method is that it does not depend on translation techniques. However, it only detects pre-defined aspect categories and is less effective for low-resource languages.
In conclusion, researchers have proposed many approaches for aspect-based sentiment classification. However, the closest recent studies [15], [19], [32] focus on mono-lingual aspect-based sentiment classification. Our proposed approach differs from the existing approaches in that we apply an attention-based deep learning algorithm to multi-lingual aspect-based sentiment classification.

III. APPROACH
A. OVERVIEW
Fig. 1 depicts an overview of the proposed approach. Aspect-based sentiment identification of multi-lingual reviews in four different languages is essentially a binary classification problem: every submitted multi-language review is automatically classified into one of two classes (i.e., positive or negative) for each of its aspects. The proposed approach predicts the aspect-based sentiments of reviews as follows:
• First, we extract the POS tags from the text of each review.
• Second, we preprocess each review to remove punctuation and stop-words and to split it into tokens.
• Third, we extract the aspects from each review and map them into other languages using ground-truth bilingual dictionaries [38].
• Fourth, we convert each aspect into vectors using multilingual BERT.
• Fifth, we pass the POS information and the generated vectors of each aspect and their classification information to an attention-based convolutional neural network as input for training.
• Finally, we extract the POS tagging information and aspect vectors of new reviews and input them to the trained binary classifier to predict their labels (i.e., positive or negative) for each aspect.
Each of the essential steps of the proposed approach is presented in the following sections.

B. ANNOTATING EXAMPLES
The following examples illustrate how the proposed approach predicts the aspect-based sentiment of reviews. The example reviews, in English and French, are taken from the restaurant domain of the SemEval-2016 Task 5 dataset, which is public and contains annotated reviews. In the examples, the features ''Text'', ''Language'', and ''Aspect Terms and Polarity'' represent the text of the reviews, the languages in which the reviews are written, and the associated terms/aspects with their polarities/sentiments, respectively. The following sections detail how the proposed approach processes these annotating examples.

C. PROBLEM DEFINITION
A review r from a set of reviews R can be formalized as
$$r = \langle t_r, l_r, at_r^{i}, p_r^{i} \rangle$$
where $t_r$ is the textual information of r, $l_r$ is the language of r, $at_r^{i}$ are the terms/aspects of r, and $p_r^{i}$ are the polarities/sentiments of $at_r^{i}$. For example review 2 from the annotating examples presented in Section III-B, we have $r_e = \langle t_{r_e}, l_{r_e}, at_{r_e}^{i}, p_{r_e}^{i} \rangle$, where $t_{r_e}$, $l_{r_e}$, $at_{r_e}^{i}$, and $p_{r_e}^{i}$ are ''Endroit sympa en plein centre touristique'', ''French'', ''Endroit'', and ''positive'', respectively. The proposed approach predicts the aspect-based sentiments of new reviews as either positive (noted as p) or negative (noted as n). Consequently, the automatic prediction of the aspect-based sentiment of r can be defined as a mapping function
$$f: r \rightarrow c, \quad c \in \{p, n\}$$
where c is the suggested sentiment from the polarity set {p, n} for each aspect.
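As a minimal sketch of the formalization above, a review can be held in a small record type. The class and field names (`Review`, `text`, `language`, `aspects`, `polarities`) and the helper `gold_labels` are hypothetical; `gold_labels` only reads the annotated polarities to show the shape of f's output and is not a trained predictor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Review:
    """Hypothetical container for r = <t_r, l_r, at_r^i, p_r^i>."""
    text: str              # t_r: textual information of r
    language: str          # l_r: language of r
    aspects: List[str]     # at_r^i: aspect terms of r
    polarities: List[str]  # p_r^i: polarity of each aspect term


# Signature of f: one label c in {"p", "n"} per aspect term of r.
SentimentFunction = Callable[[Review], Dict[str, str]]


def gold_labels(r: Review) -> Dict[str, str]:
    """Read the annotated polarities and express them as {aspect: c}."""
    short = {"positive": "p", "negative": "n"}
    return {a: short[p] for a, p in zip(r.aspects, r.polarities)}


# Example review 2 from Section III-B.
r_e = Review(
    text="Endroit sympa en plein centre touristique",
    language="French",
    aspects=["Endroit"],
    polarities=["positive"],
)

f: SentimentFunction = gold_labels  # a trained classifier would take this place
print(f(r_e))  # {'Endroit': 'p'}
```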

D. POS TAGS EXTRACTION
For each r, we extract the POS tags using spaCy (a Python library). We use spaCy because it is open source and provides NLP operations for multi-lingual text. After extracting the POS tagging information, a review r can be formalized as
$$r = \langle t_r, l_r, at_r^{i}, p_r^{i}, t_r^{pos} \rangle$$
where $t_r^{pos}$ contains the tagging information of r.
For example review 2 from the annotating examples presented in Section III-B, we have $r_e = \langle t_{r_e}, l_{r_e}, at_{r_e}^{i}, p_{r_e}^{i}, t_{r_e}^{pos} \rangle$, where $t_{r_e}^{pos}$ = Endroit/NOUN, sympa/ADJECTIVE, en/PREPOSITION, plein/ADJECTIVE, centre/NOUN, touristique/ADJECTIVE.
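A small sketch of how $t_r^{pos}$ can be rendered from (token, tag) pairs. The tags are copied from the worked example above. If spaCy is used, the pairs would typically come from a loaded pipeline (the model name `fr_core_news_sm` in the comment is an assumption); note that spaCy's `token.pos_` reports Universal POS tags such as ADJ and ADP rather than ADJECTIVE and PREPOSITION.

```python
from typing import List, Tuple

# With spaCy installed, the pairs could be produced like this (assumed model name):
#   import spacy
#   nlp = spacy.load("fr_core_news_sm")
#   tagged = [(tok.text, tok.pos_) for tok in nlp(text)]


def to_pos_string(tagged: List[Tuple[str, str]]) -> str:
    """Render (token, tag) pairs in the 'token/TAG' notation of t_r^pos."""
    return ", ".join(f"{tok}/{tag}" for tok, tag in tagged)


# Tags as given in the worked example for review 2.
tagged_e = [
    ("Endroit", "NOUN"), ("sympa", "ADJECTIVE"), ("en", "PREPOSITION"),
    ("plein", "ADJECTIVE"), ("centre", "NOUN"), ("touristique", "ADJECTIVE"),
]
t_pos_e = to_pos_string(tagged_e)
print(t_pos_e)
```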

E. PREPROCESSING
Multi-language reviews contain irrelevant text, e.g., punctuation and stop-words. Feeding such text into deep learning algorithms adds overhead in memory and processing time. Therefore, we use spaCy to preprocess each r, which makes the proposed approach more cost-effective. The preprocessing steps remove punctuation and stop-words and split each review into tokens. After preprocessing, a review r can be formalized as
$$r = \langle t_r, l_r, at_r^{i}, p_r^{i}, t_r^{pos}, w_i \rangle$$
where $w_i$ represents the preprocessed tokens of r.
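For example review 2 from the annotating examples presented in Section III-B, we have $r_e = \langle t_{r_e}, l_{r_e}, at_{r_e}^{i}, p_{r_e}^{i}, t_{r_e}^{pos}, w_e^{i} \rangle$, where $w_e^{i}$ are the preprocessed tokens presented in Table 1.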
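The paper performs these steps with spaCy; the sketch below is a dependency-free approximation that splits on whitespace, strips surrounding punctuation, and drops stop-words. The tiny French stop-word set is hypothetical; a real run would use a complete per-language list (spaCy ships such lists), and spaCy's tokenizer handles many cases that plain whitespace splitting does not.

```python
import string
from typing import List, Set

# Hypothetical, deliberately tiny French stop-word set for illustration.
FRENCH_STOP_WORDS: Set[str] = {"en", "et", "le", "la", "les", "de", "un", "une"}


def preprocess(text: str, stop_words: Set[str]) -> List[str]:
    """Split on whitespace, strip surrounding punctuation, and drop stop-words."""
    tokens = []
    for raw in text.split():
        tok = raw.strip(string.punctuation)
        if tok and tok.lower() not in stop_words:
            tokens.append(tok)
    return tokens


w_e = preprocess("Endroit sympa en plein centre touristique", FRENCH_STOP_WORDS)
print(w_e)  # ['Endroit', 'sympa', 'plein', 'centre', 'touristique']
```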

F. DATA MAPPING
Effective data modeling is the most critical step in cross-lingual data mapping from one language (the source language $l_s$) to another (the target language $l_t$), because the word embedding vectors of common words in two different languages may not be similar. To handle this dissimilarity, we propose a data mapping technique that exploits and combines the ground-truth bi-lingual dictionaries [38].
For data mapping from $l_s$ to $l_t$, we exploit bi-lingual dictionaries. Notably, if a source word is not listed in the bi-lingual dictionary, the source and target words are considered identical, i.e., the word is kept unchanged. We map $w_i$ of each preprocessed review r into the four languages. The mapping of $w_i$ can be formalized as
$$w_i \,(l_s) \mapsto \acute{W}_i \,(l_t)$$
where all tokens $w_i$ of $l_s$ are mapped as $\acute{W}_i$ into $l_t$. For example review 2 from the annotating examples presented in Section III-B, Table 2 shows the mapping of tokens between two languages.
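A minimal sketch of dictionary-based token mapping with the identity fallback described above. The toy French-to-English entries are illustrative (they are not taken from the dictionaries of [38] or from Table 2), and lower-casing before lookup is an assumption of the sketch.

```python
from typing import Dict, List


def map_tokens(tokens: List[str], bilingual: Dict[str, str]) -> List[str]:
    """Map l_s tokens to l_t; words missing from the dictionary are kept unchanged."""
    # Lower-casing before lookup is an assumption of this sketch.
    return [bilingual.get(tok.lower(), tok) for tok in tokens]


# Toy French-to-English entries (illustrative only).
fr_en = {"endroit": "place", "sympa": "nice", "plein": "full", "touristique": "touristic"}

w_e = ["Endroit", "sympa", "plein", "centre", "touristique"]
mapped = map_tokens(w_e, fr_en)
print(mapped)  # ['place', 'nice', 'full', 'centre', 'touristic']
```

Here "centre" has no entry, so it passes through unchanged, which is the fallback rule in action.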

G. WORD EMBEDDING
To convert into vectors the tokens of $l_s$, the tokens of $l_t$, the mappings from $l_s$ to $l_t$ and from $l_t$ to $l_s$, and the POS tags, we exploit multi-lingual BERT. Notably, among the POS tags we only consider 'noun', 'verb', 'adjective', and 'adverb', as they contain information about aspects, and we generate vectors only for them.
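The POS filter described above can be sketched as follows; only nouns, verbs, adjectives, and adverbs are kept for embedding. The tag spellings follow the worked example in Section III-D, and VERB and ADVERB are assumed spellings because no verb or adverb occurs in that example.

```python
from typing import List, Tuple

# Tag spellings follow the worked example in Section III-D; VERB and ADVERB are
# assumed spellings, since no verb or adverb occurs in that example.
CONTENT_POS = {"NOUN", "VERB", "ADJECTIVE", "ADVERB"}


def content_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Keep only nouns, verbs, adjectives, and adverbs; these are the words embedded."""
    return [tok for tok, tag in tagged if tag in CONTENT_POS]


tagged_e = [
    ("Endroit", "NOUN"), ("sympa", "ADJECTIVE"), ("en", "PREPOSITION"),
    ("plein", "ADJECTIVE"), ("centre", "NOUN"), ("touristique", "ADJECTIVE"),
]
print(content_words(tagged_e))  # ['Endroit', 'sympa', 'plein', 'centre', 'touristique']
```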
For example review 2 from the annotating examples presented in Section III-B, Table 3 presents the vectors generated by multi-lingual BERT.
Fig. 2 depicts the composition of the deep neural network-based classifier. A Convolutional Neural Network (CNN) is used to predict the aspect-based sentiment of R. We use a CNN for the composition of the proposed model for the following reasons: 1) CNN layers may learn deep semantic relationships between the input preprocessed words for aspect-based sentiment classification; 2) CNNs support parallel computation on modern GPUs, which reduces training time [39]; and 3) the proposed model may avoid the exploding gradient problem of recurrent neural networks [40], [41] by assigning different filter sizes.
To train the proposed deep learning model, we first concatenate the embeddings of the preprocessed tokens of each source-language review, $SE_{W_i}$ (Eq. 6), with the embeddings of their mappings into the target language, $ME_{\acute{W}_i}$ (Eq. 8), and pass them into a CNN. Second, we input the POS tags ($t_r^{pos}$) of $SE_{W_i}$ (Eq. 6) and the preprocessed tokens of the target language, $TE_{W_i}$, to separate CNNs. Third, the output of each CNN is passed to a separate dense layer to equalize their dimensions. We use three CNN layers with the settings filter = 128, kernel size = 1, and activation = tanh. Fourth, the equalized outputs of all dense layers are passed to a divided attention layer. Divided attention is a psychological term for simultaneously paying attention to two or more factors. Notably, we exploit POS words, preprocessed target words, and aspect classes to perform multi-head attention. The divided attention layer also merges the given outputs through the merge layer [42]. Fifth, the output of the divided attention layer is given to a dense layer that fully connects its 128 neurons to those in the next layer.
Finally, the output layer (2 neurons) maps both inputs into a single output (prediction) that gives the sentiment (p or n) of each aspect of r. We use binary cross-entropy (binary_crossentropy) as the loss function of the proposed model; it measures how far the predicted probabilities are from the true labels. Notably, we evaluate the proposed deep learning model under different settings, i.e., epochs (5, 10, 15, and 20), batch size (16, 32, 64, and 128), and activation function (tanh, sigmoid, and relu), and find the optimal hyperparameters to be epoch = 10, batch size = 16, and activation = tanh. We also incorporate a pooling unit between the merge layer and the divided attention layer to reduce the dimensions of the features.
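For reference, the per-example binary cross-entropy loss and the size of the hyperparameter search space can be written out as below. The sketch assumes that every combination of the listed values was tried (the text does not say whether the search was a full grid), and the clipping constant `eps` is an illustrative choice.

```python
import itertools
import math


def binary_crossentropy(y_true: int, p_pred: float, eps: float = 1e-7) -> float:
    """Per-example loss: -[y log p + (1 - y) log(1 - p)]."""
    p = min(max(p_pred, eps), 1 - eps)  # clip so log() never sees 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Values listed in the text; a full Cartesian grid is an assumption.
epochs = [5, 10, 15, 20]
batch_sizes = [16, 32, 64, 128]
activations = ["tanh", "sigmoid", "relu"]
grid = list(itertools.product(epochs, batch_sizes, activations))

print(len(grid))                              # 48
print(round(binary_crossentropy(1, 0.9), 4))  # 0.1054
print((10, 16, "tanh") in grid)               # True: the reported optimum
```

A confident correct prediction (p = 0.9 for a positive label) costs about 0.105, while the same probability for a negative label would cost about 2.3, which is what drives training toward well-separated outputs.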

IV. EVALUATION
In this section, the proposed approach is evaluated by comparing its performance with that of state-of-the-art approaches.

A. RESEARCH QUESTIONS
The following questions are investigated for the evaluation of the proposed approach.
• RQ1: Does the proposed approach outperform the state-of-the-art approaches? If yes, to what extent?
• RQ2: How do the bi-lingual dictionaries influence the proposed approach?
• RQ3: How does the divided attention influence the proposed approach?
• RQ4: How does the POS tagging information influence the proposed approach?
To answer the research question RQ1, the proposed approach is compared with three state-of-the-art approaches, SPC-BERT [15], AEN-BERT [19], and LCF-BERT [32], to check the performance improvement. We choose these approaches for comparison because they are recent approaches in aspect-based sentiment analysis.
To answer the research question RQ2, the performance of the proposed approach without divided attention is compared with that of AEN-BERT, the best of the state-of-the-art approaches, which uses neither bi-lingual mapping nor divided attention. This comparison checks the influence of bi-lingual mapping on the proposed approach.
To answer the research question RQ3, the performance of the proposed approach is compared with and without divided attention to check its influence on the proposed approach.
To answer the research question RQ4, the performance of the proposed approach is compared with and without POS tagging information to check its influence on the proposed approach.

B. DATASET
We use the SemEval-2016 Task 5 dataset created by Pontiki et al. [43]. It is a multi-lingual dataset for aspect-based sentiment analysis tasks, available in eight languages: English, French, Dutch, Spanish, Turkish, Chinese, Arabic, and Russian, and it contains reviews from seven domains. Although the dataset covers eight languages and seven domains, our experiments include only reviews from the ''restaurant'' domain in four languages (English, French, Dutch, and Spanish) due to the limitations of the bi-lingual dictionaries. The total number of selected reviews is 10,351, of which approximately 25.85%, 23.45%, 22.19%, and 28.51% are in English, French, Dutch, and Spanish, respectively.
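The per-language percentages imply approximate review counts; the sketch below derives them. These are rounded estimates computed from the reported percentages, not counts reported by the dataset authors.

```python
TOTAL_REVIEWS = 10351
share_pct = {"English": 25.85, "French": 23.45, "Dutch": 22.19, "Spanish": 28.51}

# Approximate number of reviews per language implied by the rounded percentages.
approx_counts = {lang: round(TOTAL_REVIEWS * pct / 100) for lang, pct in share_pct.items()}

print(round(sum(share_pct.values()), 2))  # 100.0
print(approx_counts)  # {'English': 2676, 'French': 2427, 'Dutch': 2297, 'Spanish': 2951}
print(sum(approx_counts.values()))        # 10351
```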

C. PROCESS
We evaluate the proposed approach as follows. First, we take the multi-lingual reviews R from an open-source dataset and extract their POS tagging information as discussed in Section III-D. Second, we preprocess each review r from R as discussed in Section III-E. Third, the preprocessed reviews (tokens) of $l_s$ are mapped into $l_t$ using bi-lingual dictionaries as discussed in Section III-F. Fourth, given the preprocessed information, we generate the input vectors for the proposed deep learning model using multi-lingual BERT as discussed in Section III-G. Finally, we carry out a cross-language validation on R. We divide R into four sets based on their language, denoted $l_i$ (i = 1 ... 4). For the i-th cross-validation, we use all reviews except those in $l_i$ as the training dataset and the reviews in $l_i$ as the testing dataset. The evaluation process for the i-th cross-validation is as follows:
• First, all the reviews ($R_{train}$) are selected from the training dataset, which is the union of all sets except $l_i$:
$$R_{train} = \bigcup_{j \in [1,4],\; j \neq i} l_j \qquad (9)$$
• Second, we train SPC-BERT, AEN-BERT, and LCF-BERT with data from $R_{train}$ for each language.
• Third, we train the proposed approach (CNN-BERT) with data from $R_{train}$ for each language in different configurations, i.e., with and without bi-lingual mapping and multi-lingual BERT, with and without divided attention, and with and without POS tagging information.
• Fourth, for each review in $R^{test}_i$ from the testing dataset, we predict its aspect-based sentiment using the trained SPC-BERT, AEN-BERT, LCF-BERT, and CNN-BERT and compare the prediction with its actual sentiment.
• Finally, we compute the evaluation metrics for each approach to compare their performances.
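The splitting rule of Eq. (9) amounts to leave-one-language-out validation. A minimal sketch, using hypothetical placeholder review IDs instead of real data; the function name `cross_language_splits` is illustrative.

```python
from typing import Dict, List, Tuple

LANGUAGES = ["English", "French", "Dutch", "Spanish"]  # l_1 ... l_4


def cross_language_splits(
    reviews_by_lang: Dict[str, List[str]],
) -> List[Tuple[str, List[str], List[str]]]:
    """For each l_i, train on the union of the other languages (Eq. 9), test on l_i."""
    splits = []
    for held_out in LANGUAGES:
        train = [r for lang in LANGUAGES if lang != held_out
                 for r in reviews_by_lang[lang]]
        test = list(reviews_by_lang[held_out])
        splits.append((held_out, train, test))
    return splits


# Placeholder review IDs, three per language.
toy = {lang: [f"{lang}-{k}" for k in range(3)] for lang in LANGUAGES}
for held_out, train, test in cross_language_splits(toy):
    print(held_out, len(train), len(test))  # 9 training and 3 test reviews each
```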

D. METRICS
Given the reviews R, we calculate the aspect-based sentiment-specific precision Pre, recall Rec, and f-measure F1 to evaluate the proposed approach, as these metrics are well known and have been used in previous studies [44], [45]. The metrics Pre, Rec, and F1 can be formalized as
$$Pre = \frac{TP}{TP + FP}$$
$$Rec = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Pre \times Rec}{Pre + Rec}$$
where Pre, Rec, and F1 are the precision, recall, and f-measure of the approaches for aspect-based sentiment prediction of R whose actual aspect-based sentiment is $as_i$.

TP is the number of reviews in R that are correctly predicted as $as_i$, FP is the number of reviews in R that are incorrectly predicted as $as_i$, and FN is the number of reviews in R that are not predicted as $as_i$ but actually are $as_i$.
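A minimal sketch of these metrics for one sentiment class $as_i$. The labels are toy values; returning 0.0 when a denominator is zero is an assumed convention, and how per-class scores are averaged across classes is not specified in the text, so the sketch stops at a single class.

```python
from typing import List, Tuple


def precision_recall_f1(y_true: List[str], y_pred: List[str], label: str) -> Tuple[float, float, float]:
    """Pre, Rec, and F1 for one sentiment class `label` (as_i)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == label and t == label)
    fp = sum(1 for t, p in pairs if p == label and t != label)
    fn = sum(1 for t, p in pairs if p != label and t == label)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1


y_true = ["p", "p", "n", "p", "n"]  # actual sentiments (toy data)
y_pred = ["p", "n", "p", "p", "n"]  # predicted sentiments
print(precision_recall_f1(y_true, y_pred, "p"))  # TP=2, FP=1, FN=1
```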

1) RQ1: COMPARISON AGAINST STATE-OF-THE-ART
To answer the research question RQ1, we compare CNN-BERT with the state-of-the-art approaches (SPC-BERT, AEN-BERT, and LCF-BERT). The results of all approaches are presented in Table 4. The first column gives the language of the training dataset, the second column gives the language of the testing dataset for each cross-language validation, and columns 3-5, 6-8, 9-11, and 12-14 give the performance of SPC-BERT, AEN-BERT, LCF-BERT, and CNN-BERT, respectively. The first row gives the evaluation metrics for each approach, rows 2-4, 6-8, 10-12, and 14-16 give the performance of the approaches on each testing-language dataset, and rows 5, 9, 13, and 17 give their average performance over all testing-language datasets. The best performance for each testing category is shown in bold. From Table 4, the following observations are made.
• Third, CNN-BERT significantly improves Rec and F1 on all testing categories for all training languages. However, CNN-BERT shows a slight reduction in Pre against SPC-BERT, AEN-BERT, and LCF-BERT on one testing category each: against SPC-BERT on English with Spanish as the training language, the reduction is 4.92% = (61.41% − 58.53%) / 58.53%; against AEN-BERT on Dutch with Spanish as the training language, the reduction is 1.98% = (50.06% − 49.09%) / 49.09%; and against LCF-BERT on English with French as the training language, the reduction is 2.76% = (59.92% − 58.31%) / 58.31%. The reason for these reductions is that the bi-lingual dictionaries involved (i.e., Spanish-to-English, Spanish-to-Dutch, and French-to-English) are not contextually rich.
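The relative reductions quoted above can be re-derived from the table values with the same formula (the difference divided by the second, lower value), which serves as a quick consistency check. The function name is illustrative.

```python
def relative_reduction_pct(higher: float, lower: float) -> float:
    """(higher - lower) / lower * 100, the formula used in the text."""
    return (higher - lower) / lower * 100


# (baseline Pre, CNN-BERT Pre) pairs quoted in the text, in %.
cases = {
    "SPC-BERT, English test, Spanish training": (61.41, 58.53),
    "AEN-BERT, Dutch test, Spanish training": (50.06, 49.09),
    "LCF-BERT, English test, French training": (59.92, 58.31),
}
for name, (baseline, ours) in cases.items():
    print(f"{name}: {relative_reduction_pct(baseline, ours):.2f}%")
```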
A one-way ANOVA is applied to F1 to further check the performance of CNN-BERT; it tests whether the mean F1 differs significantly among the given approaches. The results of ANOVA are presented in Fig. 3. They show that F > F crit, i.e., 21.16 > 2.76, and the p-value is 1.8E-09, which is less than 0.05; this indicates a significant difference among the F1 scores of the given approaches. Note that we also applied ANOVA to Pre and Rec, which confirms the significant improvement of CNN-BERT.
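To show what the F statistic measures, here is a from-scratch one-way ANOVA F computation on made-up numbers (not the paper's F1 scores): it is the ratio of between-group to within-group variance. In practice, `scipy.stats.f_oneway` returns the same statistic together with its p-value.

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy scores for three hypothetical approaches (NOT the paper's data).
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

Groups with identical means give F = 0; the further apart the group means are relative to the spread inside each group, the larger F becomes.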
Based on the above analysis, we conclude that CNN-BERT significantly improves the state-of-the-art in aspect-based sentiment classification of reviews.

2) RQ2: INFLUENCE OF BI-LINGUAL DICTIONARIES
To answer the research question RQ2, we compare CNN-BERT (without attention) with the best state-of-the-art approach (AEN-BERT). The results are presented in Table 5. The first column gives the language of the training dataset, the second column gives the language of the testing dataset for each cross-language validation, and columns 3-5 and 6-8 give the performance of AEN-BERT and CNN-BERT, respectively. The first row gives the evaluation metrics for each approach, rows 2-4, 6-8, 10-12, and 14-16 give the performance of the approaches on each testing-language dataset, and rows 5, 9, 13, and 17 give their average performance over all testing-language datasets. The best performance for each testing category is shown in bold.
From Table 5, the following observations are made.
• First, the use of bi-lingual dictionaries (CNN-BERT without attention) improves the performance of the proposed approach. Compared to the best state-of-the-art approach (AEN-BERT), which does not use bi-lingual dictionaries, the improvement of the proposed approach in average Pre,
Based on the above analysis, we conclude that bi-lingual dictionaries improve the aspect-based sentiment classification of reviews.
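The mapping step that RQ2 evaluates replaces tokens in one language with their dictionary counterparts in another. A minimal sketch, assuming a word-level dictionary with case-insensitive lookup and assuming that unmapped tokens are kept unchanged; the function name and the dictionary entries are hypothetical and do not reproduce the dictionaries used in the paper:

```python
def map_tokens(tokens: list[str], bilingual_dict: dict[str, str]) -> list[str]:
    """Map each token through a word-level bi-lingual dictionary.

    Lookup is case-insensitive; tokens without an entry are kept unchanged
    (an assumed fallback for illustration).
    """
    return [bilingual_dict.get(tok.lower(), tok) for tok in tokens]


# Toy Spanish-to-English dictionary (illustrative entries only).
es_en = {"comida": "food", "excelente": "excellent", "servicio": "service", "lento": "slow"}

review_tokens = ["La", "comida", "excelente", "pero", "servicio", "lento"]
print(map_tokens(review_tokens, es_en))
```

Because each token is looked up independently of its neighbours, a dictionary that is not contextually rich cannot disambiguate words whose translation depends on context. This is consistent with the Pre reductions attributed to the Spanish-to-English, Spanish-to-Dutch, and French-to-English dictionaries in RQ1.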

3) RQ3: INFLUENCE OF DIVIDED ATTENTION
To answer the research question RQ3, we compare the performance of CNN-BERT with and without divided attention. The results of the approaches are presented in Table 6. The first column represents the languages of the training dataset, the second column represents the languages of the testing dataset for each cross-language validation, and columns 3-5 and 6-8 represent the performance of CNN-BERT (without attention) and CNN-BERT (with attention), respectively. The first row lists the evaluation metrics for each approach, rows 2-4, 6-8, 10-12, and 14-16 describe the performance of the approaches on each testing-language dataset, and rows 5, 9, 13, and 17 represent the average performance of the approaches over all testing-language datasets. The best performance in each testing category is shown in bold.
From Table 6, the following observations are made.
• First, divided attention improves the performance of the proposed approach.
Based on the above analysis, we conclude that divided attention significantly influences the performance of the proposed approach in aspect-based sentiment classification of reviews.

4) RQ4: INFLUENCE OF POS TAGGING INFORMATION
To answer the research question RQ4, we compare CNN-BERT with its variants that exclude POS tagging information (i.e., verbs or adjectives). The results of the approaches are presented in Table 7. The first column represents the languages of the training dataset, the second column represents the languages of the testing dataset for each cross-language validation, and columns 3-5, 6-8, and 9-11 represent the performance of CNN-BERT (without verbs), CNN-BERT (without adjectives), and CNN-BERT, respectively. The first row lists the evaluation metrics for each approach, rows 2-4, 6-8, 10-12, and 14-16 describe the performance of the approaches on each testing-language dataset, and rows 5, 9, 13, and 17 represent the average performance of the approaches over all testing-language datasets. The best performance in each testing category is shown in bold.
VOLUME 9, 2021
.51%, respectively.
Based on the above analysis, we conclude that POS tagging information significantly influences the proposed approach in aspect-based sentiment classification of reviews.
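The ablation variants in Table 7 differ only in which POS categories are retained. A minimal sketch, assuming the ablation is realised by dropping tokens tagged as verbs or adjectives before vectorisation, and assuming Universal POS tags produced by an upstream tagger (e.g., spaCy's `token.pos_`). Both assumptions are illustrative, and the paper may instead remove POS features rather than tokens. The helper `drop_pos` and the tagged review are hypothetical:

```python
# Universal POS tags for the two ablated categories (assumed tagset).
VERB, ADJ = "VERB", "ADJ"


def drop_pos(tagged: list[tuple[str, str]], excluded: set[str]) -> list[str]:
    """Return the tokens whose POS tag is not in `excluded`."""
    return [tok for tok, tag in tagged if tag not in excluded]


# Hand-tagged toy review using Universal POS tags (illustrative, not from the dataset).
tagged_review = [
    ("The", "DET"), ("pasta", "NOUN"), ("tasted", "VERB"), ("amazing", "ADJ"),
    ("but", "CCONJ"), ("service", "NOUN"), ("was", "AUX"), ("slow", "ADJ"),
]

variants = {
    "CNN-BERT (without verbs)": drop_pos(tagged_review, {VERB}),
    "CNN-BERT (without adjectives)": drop_pos(tagged_review, {ADJ}),
    "CNN-BERT": drop_pos(tagged_review, set()),
}
for name, tokens in variants.items():
    print(f"{name}: {' '.join(tokens)}")
```

Removing adjectives strips words such as "amazing" and "slow" that carry most of the polarity in restaurant reviews. This illustrates why losing POS-based information can affect aspect-based sentiment classification.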

F. THREATS
1) THREATS TO VALIDITY
A threat to external validity is that only a limited number of reviews from the restaurant domain are considered (mentioned in Section IV-B) for the evaluation of the proposed approach. Although the proposed approach performs significantly better on the selected reviews, the results may not hold for other domains. Notably, bi-lingual dictionaries for other domains are either unavailable or not contextually rich.
A threat to construct validity is that the aspect-based sentiment labels in the exploited dataset may be incorrect. Consequently, the results may not hold if the labels are incorrect or if the dataset is revised.
Another threat to construct validity is that the proposed approach is not evaluated on external validation data, because such an evaluation requires experts in each language to validate the results. Consequently, the performance of the proposed approach on such data may be lower than reported.
A threat to internal validity is that we replicate SPC-BERT, AEN-BERT, and LCF-BERT for the comparison and evaluation of the proposed approach, so there could be undetected implementation issues. To mitigate this threat, we verify the implementations and their evaluation results.

V. CONCLUSION
Aspect-based sentiment classification for cross-lingual data is a challenging task due to the diversity of language structures. To perform effective aspect-based sentiment classification for cross-lingual data, we propose a deep learning-based classifier. The proposed approach extracts POS tagging information, preprocesses and tokenizes the given reviews, maps the tokens from one language to another, and generates vectors for the preprocessed reviews to train and evaluate the classifier. The results of cross-lingual validation suggest that the proposed approach significantly outperforms the state-of-the-art approaches and improves the precision, recall, and F1 by more than 23%, 20%, and 22%, respectively.
In the future, we intend to validate the proposed approach on cross-lingual data from multiple domains. Moreover, co-extracting aspect terms and sentiment polarities, and identifying co-occurrences and dependency relationships in cross-lingual data, could be key future directions.
QASIM UMER received the B.S. degree in computer science from Punjab University, Pakistan, in 2006, the M.S. degree in net distributed system development and the M.S. degree in computer science from the University of Hull, U.K., in 2009 and 2013, respectively, and the Ph.D. degree from Beijing Institute of Technology, China. He is currently an Assistant Professor with the Department of Computer Sciences, COMSATS University Islamabad, Vehari Campus, Pakistan. His research interests include machine learning, data mining, and software maintenance. He is also interested in developing practical tools to assist software engineers.
Korea. His research interests include cloud computing, peer-to-peer and mobile networking and computing, and distributed computing technology.