Cross-Platform Reputation Generation System Based on Aspect-Based Sentiment Analysis

The rapid growth of Internet-based applications such as social networks and e-commerce websites leads people to generate a tremendous amount of opinions and reviews about products and services. It has therefore become crucial to process them automatically. Over the last ten years, many systems have been proposed to generate and visualize reputation by mining textual and numerical reviews. However, they have neglected the fact that online reviews could be posted by malicious users who intend to affect the reputation of the target product. Moreover, these systems provide an overall reputation value for the entity and do not generate reputation scores for each aspect of the product. Therefore, we developed a system that incorporates spam filtering, review popularity, review posting time, and aspect-based sentiment analysis to generate accurate and reliable reputation values. The proposed model computes numerical reputation values for an entity and its aspects based on opinions collected from various platforms. Our proposed system also offers an advanced visualization tool that displays detailed information about its output. Experimental results on multiple datasets collected from various platforms (Twitter, Facebook, Amazon, ...) show the efficacy of the proposed system compared with state-of-the-art reputation generation systems.


I. INTRODUCTION
Having easy access to the web has radically changed the way people interact with brands and products. From physical products to online services, people tend to instantly share their opinions and reviews on various platforms on the Internet. A recent research experiment (https://business.trustpilot.com/reviews) shows that consumers are more willing to share a review when the experience they have had evokes emotions, whether positive or negative. This large volume of consumer reviews holds insightful information about the quality of the product or service; analyzing them therefore helps consumers make a better judgment about the targeted item. In the past few years, a new subfield of natural language processing (NLP) called reputation generation has become well established as an area of interest. The main focus of reputation generation systems is to produce a numerical value that reflects the esteem in which an entity is held, based on mining customer reviews and their numerical ratings.
The associate editor coordinating the review of this manuscript and approving it for publication was Jad Nasreddine.
Over the last decade, many reputation generation systems have been proposed [1]-[8] to generate and visualize the reputation of online products and services by fusing and mining textual and numerical reviews. However, these systems have not taken into consideration (1) extracting and processing reviews from various platforms, (2) filtering reviews written by potential spammers, (3) generating a numerical reputation value for each aspect of the target product, and (4) providing an advanced reputation visualization tool for a better decision-making process. We therefore designed and built an upgraded reputation generation model that overcomes the shortcomings of the previous systems in order to compute and visualize the reputation of an entity (product, movie, hotel, restaurant, service) with consistent reliability.
The proposed system collects and processes data from both e-commerce and social media platforms. Then, a spam filtering system is applied to eliminate spam reviews and prepare the cleaned output for aspect-based sentiment analysis (ABSA), where aspects of the target entity are extracted from the reviews with their sentiment polarities. Later, the time and popularity features of the reviews are exploited along with the ABSA results to finally generate a reputation value for each aspect of the target entity as well as the overall reputation value using mathematical formulas. The system also provides an analytical dashboard that displays in-depth information about the reputation of the target entity.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this manner, this study addresses the following research question: with the consideration of review popularity, review time, spam filtering, and ABSA, can the proposed reputation model offer better results in terms of generating and visualizing reputation than state-of-the-art (SOTA) systems?
This paper is organized as follows. Section II reviews related work on previous reputation generation systems and ABSA models. Section III presents the problem statement. Section IV describes our proposal. Section V details the experiments. Section VI presents the discussion. Finally, Section VII concludes the paper.

II. RELATED WORK
This section reviews the work done in the field of ABSA, and reputation generation systems based on NLP techniques.

A. ASPECT-BASED SENTIMENT ANALYSIS
Sentiment analysis (SA), also known as opinion mining, is one of the most rapidly growing research areas of the past few years [9]; it aims to extract the polarity of opinions expressed toward an entity. SA can generally be performed at different levels: document-level [10], sentence-level [11], and aspect-level [12]. This sub-section focuses on ABSA since it is applied in this paper. ABSA identifies the aspects in a given textual review about a product or service and determines which sentiment class each aspect belongs to. ABSA comprises two main processing phases: aspect extraction (AE) and aspect polarity classification (APC). The first phase deals with the extraction of aspects, which can be aspect terms [13], explicit aspects [14], and implicit aspects [15]. The second phase classifies the predefined aspects as positive, negative, or neutral. In [16], the authors were the first to propose a set of techniques for mining and summarizing product reviews based on NLP. Their main objective was to provide a feature-based summary of a large number of customer reviews of an online product. They started by mining the product features mentioned by customers using the association rule mining algorithm [17]. Next, they identified the opinion sentences in each review and determined the polarity of each opinion sentence. Finally, they produced a summary using the discovered information. Further, in [18], Poria et al. presented the first deep learning approach for the AE task in opinion mining. The authors employed a 7-layer deep convolutional neural network to tag each word in the textual opinions as either an aspect or a non-aspect word. They also proposed a set of heuristic linguistic patterns and integrated them with the deep learning classifier, which significantly improved the accuracy compared with the previous SOTA methods.
In [19], the authors proposed an attention-based long short-term memory (LSTM) [20] network for aspect-level sentiment classification. The idea is to learn aspect embeddings and let aspects participate in computing attention weights. The proposed model can focus on different parts of a sentence when different aspects are given, which makes it more competitive for aspect-level classification. The proposed model achieved better results than the standard LSTM on the SemEval 2014 Task 4 dataset [21]. In [22], Xue and Li addressed the deficiencies of the previous LSTM approaches by proposing a model based on convolutional neural networks [23] and gating mechanisms (GCAE), which was shown to be more accurate and efficient. The novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the provided aspect or entity. The architecture of the proposed model is much simpler than the attention layers used in previously existing models. Experiments on SemEval datasets show a performance improvement compared with the LSTM-based models. The authors in [24] proposed an interactive multi-task learning network (IMN) capable of jointly learning multiple related tasks simultaneously at both the token level and the document level. The IMN introduces a message-passing mechanism that allows informative interactions between tasks, enabling the correlation to be better exploited. Experiments on three benchmark datasets taken from SemEval 2014 and SemEval 2015 [25] show that IMN outperforms other baselines by large margins. Since most existing methods ignore the position information of the aspect when encoding the sentence, the authors in [26] proposed a hierarchical attention-based position-aware network (HAPN), which introduces position embeddings to learn position-aware representations of sentences and generate target-specific representations of contextual words.
HAPN achieved SOTA performance on the SemEval 2014 dataset compared with the previous methods. Xu et al. [27] presented a review reading comprehension (RRC) task in which they adopted BERT [28] as a base model and proposed a joint post-training and fine-tuning approach for AE and APC. Experimental results show that the proposed post-training approach is very effective. Later, in [29], the authors proposed a novel architecture called BERT Adversarial Training (BAT) that employs adversarial training for AE and APC by generating artificial data in the embedding space. The proposed model outperforms the standard BERT as well as the in-domain post-trained BERT in both AE and APC tasks. In [30], the authors exploited domain-specific BERT language model fine-tuning in addition to supervised task-specific fine-tuning to achieve a new SOTA performance on the SemEval 2014 Task 4 restaurants dataset. The authors also showed that the cross-domain adapted BERT model performs better than strong baseline models such as XLNet-base [31] and vanilla BERT-base. In [32], the authors compared the induced trees from pre-trained models and the dependency parsing trees on various popular models for the ABSA task. They found that the induced tree from fine-tuned RoBERTa [33] (FT-RoBERTa) outperforms the parser-provided tree. The experiments show that the RoBERTa-based model can outperform or approximate the previous SOTA performances on six datasets across four languages, including SemEval 2014 Task 4. Recently, the authors in [34] proposed a multi-task learning model named LCF-ATEPC for ABSA based on the multi-head self-attention and the local context focus (LCF) [35] mechanisms. The proposed model is multilingual and applicable to classic English review SA tasks, such as SemEval 2014 Task 4. The proposed model can automatically extract aspects and determine their sentiment polarities.
Since the LCF-ATEPC model currently achieves SOTA performance on AE and APC tasks, it was selected for use in this paper.

B. REPUTATION GENERATION
The Oxford Learner's Dictionaries define reputation as "the opinion that people have about what someone or something is like, based on what has happened in the past". In the 21st century, several reputation systems have been proposed to compute a satisfaction score for various online items [36]-[42], including movies, TV shows, hotels, and products. Surprisingly, these systems relied only on numerical reviews (ratings) during reputation computation and disregarded textual reviews until 2012, when Abdel-Hafez et al. [1] designed a reputation model that uses reviews expressed in natural language rather than users' ratings to compute a realistic reputation value for every feature of the product and for the product itself by incorporating opinion orientation and opinion strength (opinion mining). Nevertheless, no evidence was provided to support the efficiency of their product reputation system. Yan et al. [3] proposed the first system that combines opinion fusion and semantic analysis to generate and visualize reputation for Amazon's products. This system was later improved in [4] by adding a binary sentiment classification step before the opinion fusion and grouping phase. Benlahbib and Nfaoui [6] designed and built a reputation model that considers review time, review helpfulness, and review sentiment intensity during reputation computation and visualization. Elmurngi and Gherbi [5] proposed a system that computes reputation scores from users' feedback based on an SA model. The reputation score of a product is the ratio of the number of positive reviews to the total number of reviews of this product. The same idea was adopted in [43], [44]. In [45], the authors presented a reputation system dedicated to movies and TV shows.
The model integrates fine-grained opinion mining (Multinomial Naïve Bayes classifier trained on the SST-5 dataset [46]) and semantic analysis (Embeddings from Language Models (ELMo) [47]) to generate a realistic reputation value from users' reviews. Recently, Boumhidi and Nfaoui [8] proposed the first system that generates a reputation value for various entities (movies, products, hotels, and restaurants) from user-generated data posted on the Twitter microblogging website. The system applies a Bidirectional Encoder Representations from Transformers (BERT) classifier to extract the sentiment orientation of the textual tweets. Next, a sentiment intensity score is calculated from the positive tweets. Finally, the system combines the previous results with a popularity score computed from the extracted Twitter features (number of followers, account authenticity, number of likes, number of retweets) to generate a single numerical reputation value between 0 and 10. TABLE 1 summarizes the opinion mining techniques exploited during the reputation generation and visualization process by the previous reputation systems and by ours.

III. PROBLEM STATEMENT
The purpose of this research is to compute a numerical reputation value for a specific entity as well as the reputation of each aspect of that entity. For example, if the target entity is a phone, the goal is to generate the overall reputation value for the phone, e.g., "7/10", based on textual reviews collected from various platforms, as well as a reputation value for each aspect of the phone, e.g., "camera: 5/10", "design: 9/10", etc. The collected set of reviews $R_j = \{r_{1jk_1}, r_{2jk_2}, \dots, r_{mjk_p}\}$ for an entity $E_j$, posted by a set of users $U_j = \{u_{jk_1}, u_{jk_2}, \dots, u_{jk_p}\}$, is passed through review spam filtering, whose output is a spam-free set of reviews $R_j = \{r_{1jk_1}, r_{2jk_2}, \dots, r_{njk_o}\}$ ($n \le m$). This set is passed to an ABSA model named LCF-ATEPC in order to extract the entity aspects and their sentiment orientations for each review in $R_j$. Similar aspects are then grouped together with their polarities and, using mathematical formulas, the previous results are combined with a set of review time scores $TSR_j = \{tsr_{1j}, tsr_{2j}, \dots, tsr_{nj}\}$ and a set of review popularity scores $PSR_j = \{psr_{1j}, psr_{2j}, \dots, psr_{nj}\}$ to finally generate reliable and trustworthy reputation values. The set of review popularity scores $PSR_j$ is computed from the set of review likes $L_j = \{l_{1j}, l_{2j}, \dots, l_{nj}\}$ and the set of review shares $S_j = \{s_{1j}, s_{2j}, \dots, s_{nj}\}$.

IV. PROPOSED APPROACH
This section is divided into eight subsections that describe the architectural overview of the proposed system, data collection and processing, opinion spam detection, aspect extraction and classification, popularity score calculation, time score calculation, reputation generation, and finally, reputation visualization, respectively.

A. SYSTEM OVERVIEW
This system aims at generating a reputation value toward online entities (movies, hotels, restaurants, services, etc.) and computing a satisfaction score toward each aspect of the target entity by processing textual and numerical data collected from multiple platforms. FIGURE 1 presents its architecture. First, we start by collecting users' reviews from different platforms such as Twitter, Amazon, YouTube, etc. Next, an automatic spammers filtering system is employed to detect and eliminate unwanted spam reviews. Then, we apply a SOTA ABSA model to users' textual reviews in order to compute a score based on the sentiment orientation of the extracted aspects from those reviews. Further, we calculate a popularity score and a time score based on statistical features extracted with the textual reviews. Finally, we compute a reputation value based on the previously calculated scores, and we propose a new user-friendly visualization interface that displays in-depth details about the reputation of the target entity.

B. DATA COLLECTION AND PREPROCESSING
One of the important features of the proposed system is its ability to collect and process data from various platforms. Previous reputation generation systems gather the necessary data from either e-commerce websites such as Amazon and TripAdvisor or social media platforms such as Twitter and Facebook. In this work, we normalize the features of all platforms in order to create a single merged dataset by classifying platforms on the Internet into two types. The first type (e.g., Amazon, YouTube) exposes the textual review together with the number of likes it received. The second type (e.g., Twitter, Facebook) exposes the textual review together with the number of likes it received and the number of times it was shared across the network.
Using web scraping tools, we collected data from both types of platforms to create a merged dataset that contains the fields "user name", "review text", "review time", "review likes", "review shares", and "review host". We assigned "0" to "review shares" for reviews extracted from platforms of the first type, since they only provide the number of likes.
The textual reviews are cleaned using NLP techniques (text normalization, lower-casing, noise removal, etc.).
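The merged record layout described above can be sketched as follows. This is a minimal illustration only: the `Review` class, the `normalize` helper, and the raw keys (`user`, `text`, `year`, `likes`, `shares`) are hypothetical names chosen for the example, not part of the original system.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One row of the merged dataset (field names mirror the listed attributes)."""
    user_name: str
    review_text: str
    review_time: int    # posting year
    review_likes: int
    review_shares: int  # 0 for first-type platforms (likes only)
    review_host: str


def normalize(raw: dict, host: str) -> Review:
    """Map a raw scraped record (hypothetical keys) onto the merged schema."""
    return Review(
        user_name=raw["user"],
        review_text=raw["text"],
        review_time=int(raw["year"]),
        review_likes=int(raw.get("likes", 0)),
        # First-type platforms expose no share count, so default to 0.
        review_shares=int(raw.get("shares", 0)),
        review_host=host,
    )
```

With this layout, a record scraped from a first-type platform and one from a second-type platform end up with identical fields, so later stages do not need to know where a review came from.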

C. OPINION SPAM DETECTION
One of the main drawbacks of opinion-sharing platforms is that anyone, anywhere in the world, can post reviews about products or services without restriction. Opinion spammers aim to manipulate customers' opinions by either promoting or demoting the reputation of the target entity, thereby misleading consumers [53]. Filtering and eliminating spam reviews is critical for our reputation system in order to produce a trusted and reliable reputation value that leads to a safe decision-making process for customers. Commercial review hosting sites, e.g., Amazon and Yelp, have already made remarkable progress in detecting spam reviews [54]. However, since we collect people's opinions from multiple platforms, we chose to detect spam reviews using two normalized spammer behavioral features [55]. Notations used in this sub-section are listed in TABLE 2.

1) AUTHOR CONTENT SIMILARITY (CS)
Spammers regularly post reviews that are identical or near-identical to their previous reviews, since writing a new spam review each time is time-consuming. Thus, we identify spammers by calculating the similarity of their reviews: we form all pairs of reviews, without repetition, from the set of reviews $R_{jk}$ posted by author $k$, convert the reviews into vectors using a pre-trained BERT model from Hugging Face, and compute the cosine similarity of each pair of reviews in the combination set $CP(R_{jk})$. Averaging the cosine similarities of all pairs using Equation (1) yields a single numerical score between 0 and 10. This score is used to calculate a spammer behavior score later in this sub-section.
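The pairwise averaging step can be illustrated with a short sketch. It assumes the review embeddings are already available as plain lists of floats (in the paper they come from a pre-trained BERT model), and it returns the raw mean cosine similarity without any rescaling; the function names and the handling of authors with a single review are assumptions made for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def content_similarity(vectors):
    """Mean cosine similarity over all unordered pairs of one author's reviews."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # Assumption: an author with a single review shows no self-similarity.
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

An author who posts the same text repeatedly would obtain a value near 1 under this sketch, while an author whose reviews are unrelated would obtain a value near 0.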

2) USER NUMBER OF REVIEWS FREQUENCY (UNRF)
Posting too many reviews about the same entity is not normal behavior for genuine reviewers. A recent study shows that only 5-8% of spammers have a low ratio of reviews posted in a single day [56]. Our collected dataset may contain many reviews by the same user; therefore, we propose Equation (2) to calculate the frequency of the number of reviews posted by user $k$ toward the target entity $j$.
Based on the two spammer behavioral features described above, Equation (3) is used to calculate the spammer score. Each author is assigned a label from the set $L = \{\text{normal}, \text{spammer}\}$ by comparing the spammer score with a predetermined threshold $\tau$ described in Section V. The label "normal" is used for regular reviewers, and "spammer" is used for spam reviewers. Equation (4) is used to label each user.
Whenever a user is labeled as a spammer, all of their reviews are eliminated from the dataset. The cleaned, spammer-free dataset is then ready for the next step of the proposed reputation system.
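The labeling and filtering step can be sketched as below. The spammer scores are taken as given inputs, since the sketch does not reproduce Equation (3). The direction of the comparison (a score strictly above the threshold marks a spammer), the dictionary-based review records, and the function names are assumptions for illustration; the default threshold of 0.57 is the value reported as best in the experiments.

```python
def label_user(score, tau=0.57):
    """Assign 'spammer' or 'normal' by comparing the spammer score with tau."""
    # Assumption: a score strictly above tau marks a spammer.
    return "spammer" if score > tau else "normal"


def filter_spam(reviews, scores, tau=0.57):
    """Drop every review whose author is labeled as a spammer.

    reviews: list of dicts with a 'user_name' key (hypothetical layout)
    scores:  dict mapping user name to a precomputed spammer score
    """
    spammers = {user for user, s in scores.items() if label_user(s, tau) == "spammer"}
    return [r for r in reviews if r["user_name"] not in spammers]
```

Filtering at the user level rather than the review level matches the rule stated above: once an author is flagged, none of their reviews reach the sentiment analysis stage.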

D. ASPECT-BASED SENTIMENT ORIENTATION SCORE
The goal of this sub-section is to extract the aspects from the reviews and predict their sentiment polarities. For example, given the product review "The camera on this phone is good but the design is bad," the ABSA model needs to extract the aspects "camera" and "design" and correctly determine their polarity. In this review, the consumer's opinions on "camera" and "design" are positive and negative, respectively. To this end, we employed a multi-task learning model for ABSA, namely LCF-ATEPC (https://github.com/yangheng95/LCF-ATEPC). This model combines the APC task and the aspect term extraction (ATE) task, which means that it is capable of both extracting aspect terms and inferring their polarity. The input sequences are tokenized into separate tokens, and each token is assigned two kinds of labels. The first label indicates whether the token belongs to an aspect; the second label marks the polarity of the tokens belonging to the aspect. The LCF-ATEPC model integrates the pre-trained BERT model and applies self-attention and the local context focus (LCF) concept to ABSA. FIGURE 2 depicts the network architecture of LCF-ATEPC. Its main components are described as follows:
• BERT-Shared Layer: The LCF-ATEPC model deploys two independent BERT-Shared layers to extract local and global context features, $BERT^{l}$ and $BERT^{g}$, respectively. Both BERT-Shared layers are regarded as embedded layers, and the fine-tuning process is conducted independently according to the joint loss function of multi-task learning.
• Multi-Head Self-Attention (MHSA): The multi-head attention mechanism helps the model learn the relevant information of words in different representation subspaces. MHSA is based on multiple scaled dot-product attention operations that can be used to extract deep semantic features from the context. MHSA can avoid the negative influence of long-distance dependencies in the context when learning the features.
• Local Context Focus: Local context focus is a new technique that can be adapted to most fine-grained NLP tasks. The determination of the local context relies on the semantic-relative distance (SRD), which measures how far a token is from the aspect in order to determine whether a context word belongs to the local context of a targeted aspect. A Context-features Dynamic Mask (CDM) layer is employed to mask the non-local context features learned by the $BERT^{l}$ layer. With the CDM layer deployed, only the features of the less-semantic-relative context itself at the corresponding output positions are masked; the correlative representations between less-semantic-relative context words and aspects are preserved at the corresponding output positions. In addition to the CDM layer, a Context-features Dynamic Weighted (CDW) layer is employed to focus on local context words. Whereas the goal of CDM is to drop the features of the non-local context completely, with CDW the features of semantic-relative context words are retained intact while the features of less-semantic-relative context words are decayed by a weight based on their SRD with respect to the targeted aspect.
• Aspect Polarity Classifier: To perform the sentiment polarity classification, the LCF-ATEPC model combines the local context features and the global context features. Then, the aspect polarity classifier performs a head-pooling on the learned concatenated context features from the feature interactive learning layer. The Softmax function is applied to the hidden states on the corresponding position of the first token in the input sequence, in order to predict the sentiment polarity of the aspect.
• Aspect Term Extractor: The aspect term extractor performs basic token-level classification: each token is given a label, and a classification is performed to predict the aspects in the sentence.
The authors in [34] trained the LCF-ATEPC model on commonly used ABSA datasets, including the Laptop and Restaurant datasets of SemEval-2014 Task 4 and the ACL Twitter social dataset. However, they trained the model on those datasets separately. In this paper, we trained the model on a mixture of the three datasets in order to allow our system to handle reviews from different domains. The LCF-ATEPC model achieved SOTA performance on SemEval-2014 Task 4, and it is employed to perform ABSA later in this paper.
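The difference between the CDM and CDW layers described in the component list can be illustrated with a toy sketch. It is not the LCF-ATEPC implementation: the SRD is simplified here to the token distance to the nearest aspect token, the threshold `alpha` and the linear decay `1 - (SRD - alpha) / n` are assumptions chosen for illustration, and the functions return per-token weights rather than operating on BERT feature tensors.

```python
def srd(n_tokens, aspect_start, aspect_end):
    """Simplified semantic-relative distance: tokens to the nearest aspect token."""
    return [max(aspect_start - i, i - aspect_end, 0) for i in range(n_tokens)]


def cdm_mask(distances, alpha):
    """CDM: keep local-context positions (SRD <= alpha), zero out the rest."""
    return [1.0 if d <= alpha else 0.0 for d in distances]


def cdw_weights(distances, alpha):
    """CDW: keep local positions intact, decay the others with their SRD."""
    n = len(distances)
    return [1.0 if d <= alpha else 1.0 - (d - alpha) / n for d in distances]


tokens = "the camera on this phone is good but the design is bad".split()
distances = srd(len(tokens), aspect_start=1, aspect_end=1)  # aspect: "camera"
mask = cdm_mask(distances, alpha=3)
weights = cdw_weights(distances, alpha=3)
```

In a full model, each token's feature vector would be multiplied by the corresponding mask or weight, so that CDM removes distant context entirely while CDW only attenuates it.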

E. REPUTATION GENERATION
Since we are dealing with opinions shared on the Internet, the reputation of an entity is formed and influenced by various factors that shift public opinion over time. These factors include the popularity and reliability of the user sharing the online review, the time the review was posted online, and finally, the sentiment orientation of aspects of an entity within the online review. This section aims to introduce those three factors, and to calculate for each factor a numerical score that will be exploited to generate the final reputation value for a specific entity.

1) REVIEW POPULARITY SCORE
It is well known that social media and e-commerce platforms contain a huge number of user reviews of various online products. However, those reviews carry different weights in affecting the opinion of other users as well as the reputation of the entity. A statistical study conducted in 2017 (https://www.podium.com/resources/podium-state-of-online-reviews/) shows that 93% of consumers admit that online reviews influenced their purchase decisions, and 92% of consumers are more likely to purchase after reading a trusted review. What makes a review more influential than others is the amount of engagement it receives from other users. Since we are dealing with reviews collected from multiple platforms on the Internet, we have chosen two types of engagement features: likes and shares. Likes are a direct reflection of user preferences: if a user likes a review, they are likely to be interested in and approve of that review. By sharing a review, moreover, users essentially endorse the content to all of their followers.
The goal of this sub-section is to calculate a popularity score for each review in order to differentiate between reviews based on the amount of engagement they received in likes and shares. As mentioned previously, our system collects data from two different types of platforms: the first provides only the text and like features, while the second provides the text, like, and share features. For example, TripAdvisor is a platform of the first type: it does not provide a share feature, which means that a review cannot be shared within the network, and the attribute "review shares" is assigned a value of zero in our collected dataset. Therefore, we designed Equation (5) to calculate the popularity score for each review in our dataset. We multiply the values of likes and shares in the equation by 0.5 in order to fit the calculated popularity score into the interval between 0 and 1. Notations used in this sub-section are listed in TABLE 3.
The result is a numerical value between 0 and 1 calculated for each review in order to indicate its popularity. A higher popularity score is an indicator that the review is influential. Those popularity scores are going to be used to compute the reputation for the target entity.
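One way to obtain a score with these properties is sketched below. It is an assumed form and not a transcription of Equation (5): likes and shares are each divided by the largest value observed in the dataset and then weighted by 0.5, which keeps the result between 0 and 1. The function name and the division-by-maximum normalization are hypothetical.

```python
def popularity_scores(likes, shares):
    """Assumed popularity score: 0.5 * normalized likes + 0.5 * normalized shares."""
    # Fall back to 1 when no review received any like or share,
    # so that the division is always defined.
    max_likes = max(likes) or 1
    max_shares = max(shares) or 1
    return [
        0.5 * l / max_likes + 0.5 * s / max_shares
        for l, s in zip(likes, shares)
    ]
```

Under this sketch, a review from a first-type platform (shares fixed at zero) can reach at most 0.5, which is one consequence of the assumed normalization that a real implementation would need to check against the original equation.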

2) REVIEW TIME SCORE
One of the first things people do when reading a product review is check its posting time, and they frequently focus on the most recent reviews. The date of a review plays a major role in instilling confidence in potential customers, since the ownership of a business can change, branding can change, and products and services are constantly evolving. Older reviews carry much less weight with consumers, as shown in a recent study by Brightlocal, in which about 85% of consumers consider any old review to be irrelevant; this can impact the computation of reputation for a product or service. However, time does not affect every product in every domain. An old review of certain products, such as cheese or some classic movies, can still be relevant today, which means that the date of the review does not matter in this situation. In this sub-section, we propose Equation (6) to calculate a review time score. The result of the proposed equation is a numerical score between 0 and 1, where the most recent reviews receive a score close to 1 and the oldest reviews a score close to 0. Notations used in this sub-section are listed in TABLE 4. Note that this feature is optional in the proposed reputation system: the user can choose to discard the time factor, in which case the review time score is not employed when generating the reputation for certain products or services.
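A score with the stated behavior (between 0 and 1, higher for more recent reviews) can be sketched with a simple min-max normalization over posting years. This form is an assumption made for illustration and is not Equation (6); the function name and the choice to return 1.0 when all reviews share the same year are likewise hypothetical.

```python
def time_scores(years):
    """Assumed time score: min-max normalization of the posting year."""
    oldest, newest = min(years), max(years)
    if newest == oldest:
        # All reviews are equally recent, so none is penalized.
        return [1.0] * len(years)
    return [(y - oldest) / (newest - oldest) for y in years]
```

When the time factor is switched off, as the text allows, the caller would simply skip this function and treat every review as equally recent.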

3) REVIEW SENTIMENT ANALYSIS
We apply the LCF-ATEPC model to the spam-free review dataset in order to extract the aspects from each review along with their associated sentiment polarities. Then, we group identical aspects extracted across all reviews together with their sentiment polarities. The popularity score calculated previously for each review is also assigned to its extracted aspects. Our system computes a sentiment score $ssasp_{ij}$ for each aspect $asp_{ij}$ based on its positive and negative sentiment orientations using Equation (7). Notations used in this sub-section are listed in TABLE 5. Aspects with a neutral sentiment orientation are dismissed and not considered when calculating $ssasp_{ij}$, since the user was sentimentally unbiased toward those aspects.
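The grouping step and the exclusion of neutral aspects can be sketched as follows. The input is a flat list of (aspect, polarity) pairs as an ABSA model might produce them. The score itself, the share of positive mentions among positive and negative mentions, is an assumed stand-in for Equation (7), and the function name and lower-casing of aspect terms are hypothetical choices.

```python
from collections import defaultdict


def aspect_sentiment_scores(extractions):
    """Group identical aspects and score each as positives / (positives + negatives)."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, polarity in extractions:
        if polarity == "neutral":
            continue  # neutral mentions are dismissed, as described above
        counts[aspect.lower()][polarity] += 1
    return {
        aspect: c["positive"] / (c["positive"] + c["negative"])
        for aspect, c in counts.items()
    }
```

An aspect that only ever appears with neutral polarity produces no entry at all in the result, which mirrors the rule that such aspects do not contribute to the sentiment score.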
Based on the previously calculated features, the proposed system computes a reputation value for each aspect using Equation (8). The sentiment score $ssasp_{ij}$ and the average time score are multiplied by 9 in order to obtain a number bounded between 0 and 9; we then aggregate the result with a customized average of the negative and positive popularity scores, which is bounded between 0 and 1. The final result is a numerical value between 0 and 10 that represents the reputation of an aspect $asp_{ij}$. Notations used in this sub-section are listed in TABLE 8.

Rep(asp ij
In order to generate the overall reputation for an entity, the system calculates the average of all aspects' reputation values using Equation (9).
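The averaging step can be written in a few lines. The sketch assumes the per-aspect reputation values have already been computed and are stored in a dictionary keyed by aspect name; the function name is hypothetical.

```python
def overall_reputation(aspect_reputations):
    """Overall reputation as the plain average of the aspects' reputation values."""
    values = list(aspect_reputations.values())
    return sum(values) / len(values)
```

For instance, with the aspect values used as an illustration in the problem statement, `overall_reputation({"camera": 5.0, "design": 9.0})` gives 7.0.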

G. REVIEW SCORE
The review score is calculated from the popularity and time scores using Equation (10) and is used to determine the most influential review. This score is not considered in the reputation value computation; it is only employed during reputation visualization to identify the most influential review.
We denote by $rs_{ij}$ the score of review $i$ toward entity $j$.

H. REPUTATION VISUALIZATION
Compared with previous reputation systems, our system provides an advanced, efficiently designed, user-friendly interface. The interface displays all details regarding the reputation of a specific product, such as the overall reputation value of the entity, the reputation values of its aspects, the most reviewed aspects, and the most influential reviews. As shown in FIGURE 3, the proposed visualization tool is an interactive user interface in which the user can obtain more detailed information about a specific feature by hovering the cursor over it and clicking. The proposed reputation visualization tool helps users gain better insight into the targeted product or service and thereby supports them during their decision-making process.

A. EXPERIMENTAL DATA COLLECTION AND PREPROCESSING
Four experimental review datasets were collected, each belonging to a different domain (product, movie, hotel, restaurant). Every dataset contains a combination of reviews from various social media and e-commerce platforms, and each review includes a textual opinion expressed by the user, the user name, the review posting year, the number of likes, the number of shares, and the host platform. We hired four human annotators to manually label the four datasets by extracting the aspects in each review and identifying their polarity. TABLE 9 shows statistical information about the evaluation datasets, while TABLE 10 shows review samples from one of those datasets. All the textual reviews are cleaned and pre-processed by removing URLs, punctuation, and special characters, and by replacing slang words with formal ones. Finally, we prepare the cleaned textual reviews for the LCF-ATEPC model by performing tokenization and adding special tokens.
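The cleaning steps listed above (URL removal, punctuation and special-character removal, slang replacement) can be sketched in Python as follows. This is only an illustration under stated assumptions: the slang dictionary `SLANG` is hypothetical, since the mapping actually used is not given here, the regular expressions assume ASCII English text, and the model-specific tokenization and special tokens are not covered.

```python
import re

# Hypothetical slang dictionary; the mapping actually used is not specified.
SLANG = {"gr8": "great", "luv": "love", "u": "you"}

# Matches http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches anything that is not an ASCII letter, digit, or whitespace.
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_review(text):
    """Remove URLs, then punctuation/special characters, then map slang words."""
    text = URL_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    words = [SLANG.get(word.lower(), word) for word in text.split()]
    return " ".join(words)


cleaned = clean_review("Luv this phone!!! gr8 battery :) see https://example.com/review")
```

URLs are removed before punctuation so that the link is dropped as a whole rather than split into fragments.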

B. OPINION SPAM DETECTION
Due to the lack of available spam review datasets, an evaluation dataset of 1000 reviews was manually collected from various platforms on the Internet. We hired annotators to manually assign each user to one of two classes (Normal/Spammer) based on their review posting behavior. This procedure identified 682 genuine reviews and 318 spam reviews in our evaluation dataset. Spam review detection using the behavioral features of the spammer has two phases:
(1) Calculating the spammer score Score(a) based on two spammer behavioral features, CS and MNR.
(2) Evaluating the performance of the proposed spam review detection model by varying the value of the threshold from 0.50 to 0.68 with a step of 0.01, and using precision, recall, and accuracy as evaluation measures. TABLE 7 shows that the threshold value τ = 0.57 leads to the best performance in terms of accuracy.
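The threshold sweep in phase (2) can be sketched in Python as below. The sketch assumes that the spammer scores have already been computed and that a user is flagged as a spammer when the score is greater than or equal to the threshold; that decision rule, the function names, and the toy data are assumptions made for illustration and are not the evaluation data reported in TABLE 7.

```python
def evaluate(scores, labels, threshold):
    """Return (precision, recall, accuracy) for one threshold.

    labels: True means spammer. A user is predicted to be a spammer
    when score >= threshold (assumed decision rule).
    """
    tp = fp = tn = fn = 0
    for score, is_spammer in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_spammer:
            tp += 1
        elif predicted and not is_spammer:
            fp += 1
        elif not predicted and is_spammer:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return precision, recall, accuracy


def sweep(scores, labels):
    """Evaluate thresholds 0.50, 0.51, ..., 0.68 and return the most accurate one."""
    results = {}
    for k in range(50, 69):  # integer steps avoid floating-point drift
        threshold = k / 100
        results[threshold] = evaluate(scores, labels, threshold)
    best = max(results, key=lambda t: results[t][2])  # first maximum wins ties
    return best, results


# Toy data (illustrative only).
toy_scores = [0.30, 0.55, 0.60, 0.72, 0.52, 0.66]
toy_labels = [False, False, True, True, False, True]
best_threshold, all_results = sweep(toy_scores, toy_labels)
```

Iterating over integers and dividing by 100 keeps the threshold grid exact at 19 values, which a repeated `+= 0.01` would not guarantee.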

C. ASPECT-BASED SENTIMENT ANALYSIS
1) TRAINING DATASETS & HYPERPARAMETERS
LCF-ATEPC was trained on a merged dataset that we created from three of the most commonly used ABSA datasets: the Laptops and Restaurant datasets of SemEval-2014 Task 4 and an ACL Twitter dataset [57]. The three original datasets were reformatted, and each sample was annotated with Inside-outside-beginning (IOB) labels for the ATE task and polarity labels for the APC task. The polarity of each aspect may be positive, neutral, or negative.
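To make the annotation scheme concrete, the following Python sketch assigns IOB tags and polarity labels to a tokenized sentence. The tag names `B-ASP` and `I-ASP`, the span format, and the example sentence are assumptions for illustration; the exact label strings used in the merged dataset are not stated here.

```python
def iob_labels(tokens, aspect_spans):
    """Tag tokens with IOB labels and per-token polarity.

    aspect_spans: list of (start, end, polarity) with token indices,
    where end is exclusive. Tokens outside every span get "O" and None.
    """
    tags = ["O"] * len(tokens)
    polarities = [None] * len(tokens)
    for start, end, polarity in aspect_spans:
        tags[start] = "B-ASP"  # first token of the aspect term
        for i in range(start + 1, end):
            tags[i] = "I-ASP"  # remaining tokens of the same aspect term
        for i in range(start, end):
            polarities[i] = polarity
    return tags, polarities


tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "is", "dim"]
spans = [(1, 3, "positive"), (7, 8, "negative")]
tags, polarities = iob_labels(tokens, spans)
```

Here "battery life" is a two-token aspect with positive polarity and "screen" is a single-token aspect with negative polarity.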

2) MODEL COMPARISON
We have compared the LCF-ATEPC model with the following SOTA methods:
• AEN-BERT [58]: an attentional encoder network that employs the pre-trained BERT model to solve the APC task.
• BERT-BASE [28]: the original pre-trained model, adapted to aspect-based sentiment analysis in order to automatically extract aspect terms and classify aspects' polarity.
• BERT-SPC [58]: a fine-tuned BERT designed for text pair classification, adapted to solve the aspect-based sentiment-analysis task.

3) LCF-ATEPC MODEL'S RESULTS ON THE EVALUATION DATASETS
The pre-trained LCF-ATEPC model was applied to the four evaluation datasets in order to assess the performance of ATE and APC. FIGURE 4 displays the F1-score obtained for each dataset. We can observe that the model achieves good results in extracting the aspects and predicting their sentiment orientation. Interestingly, the model performed well on the movie and hotel datasets even though it was trained on the merged dataset, which does not contain any reviews from those domains.

D. REPUTATION VISUALIZATION
Our proposed reputation generation system provides a detailed visualization of the reputation of a specific item. The visual output makes it easier to identify new insights about the target entity. The designed dashboard provides the reputation value of each aspect, the overall reputation of the entity, the most influential reviews, and other statistical details. Users can rely on the dashboard to make data-driven business decisions. Compared with the previous reputation generation systems [3], [4], [6], our proposed system offers more detailed reputation visualization for a specific entity.

E. SYSTEM EVALUATION
Previous reputation generation systems have focused on generating the reputation from either social media or e-commerce websites, relying on the overall sentiment of the collected reviews. In this paper, we proposed an advanced cross-platform system that: 1) extracts and adjusts users' data from various platforms at the same time, which allows it to generate a reliable reputation value; 2) incorporates a spam filtering mechanism to remove reviews written by potential spammers; 3) employs the aspect-based sentiment-analysis technique to extract aspects related to the target entity and predict their sentiment polarities, which allows us to generate the reputation of each aspect using mathematical formulas; and 4) considers other important features, such as the time feature and the popularity of the people sharing their opinions, to increase the reliability of the generated reputation value. TABLE 16 summarizes the differences between previous reputation generation systems and our proposed system.
Experimental results (%) of the LCF-ATEPC model. $F1_{ate}$, $Acc_{apc}$ and $F1_{apc}$ are the macro-F1 score of the aspect term extraction (ATE) subtask, and the accuracy and macro-F1 score of the aspect polarity classification (APC) subtask, respectively. The unreported experimental results are indicated by ''-''. The '' '' means the F1 score of the ATE task is not available for the BERT-SPC input format. The optimal performances are in bold.
To evaluate the reliability of our system's output and the effectiveness of its components, and given the absence of standard evaluation metrics for this kind of system, we followed the same procedure used in several works, including [6], [59]. We invited back the same 32 users from [6], who come from different backgrounds (TABLE 17), to evaluate the effectiveness of four reputation generation systems: system 1 (our reputation system), system 2 [6], system 3 [4], and system 4 [3]. The volunteers were asked to assign a satisfaction score between 0 and 10 to each system based on its efficiency and helpfulness. To increase the validity of the experiments, we also invited three experts to rate and judge each reputation system. TABLES 17 and 18 present information about the participants.
Each volunteer rated the four systems based on their efficiency and helpfulness in supporting the decision-making process, answering the question ''which system is more reliable and helpful?''. In TABLE 19, we report the average of all ratings provided by the users for each system, $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $x_1, x_2, \dots, x_n$ are the observed ratings and n is the total number of ratings. We also measured the coefficient of variation (CV), which is the standard deviation divided by the mean, times 100%, as shown in Equation (11).

$$CV = \frac{\sigma}{\mu} \times 100\% \qquad (11)$$

We denote: σ: standard deviation.
As we can see, our system was rated highest among the four systems based on the average rating. Moreover, our proposed system is the only one to receive the perfect rating (10 out of 10) from 10 different users. With an average rating of 9.33, our proposed system is ranked first. System 2 received the second-highest rating with 7.78. In third place is system 3 with an average rating of 6.63. Finally, system 4 came last with an average rating of 6.37, which is close to that of system 3 since both systems use only sentiment or semantic features to generate the reputation of an entity. FIGURE 5 displays the ratings given by the users.
We also calculated and compared the coefficient of variation of the users' ratings for each system in order to measure the spread of the ratings. If the ratings all lie close to the mean/average, then the percentage of the coefficient of variation will be small, while if the ratings are spread out over a large range of values, then the percentage coefficient of variation will be large. This will help us determine if the ratings given for each system are balanced. As we can see in TABLE 19, the coefficient of variation of the group of ratings for our proposed system is the lowest compared to the other systems with a value of 6.12%.
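The mean and coefficient of variation described above can be computed with a short Python sketch. It assumes the population standard deviation (`statistics.pstdev`); whether the population or the sample standard deviation was used for TABLE 19 is not stated, so this choice is an assumption, and the ratings below are hypothetical.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(ratings):
    """Return the CV in percent: standard deviation / mean * 100.

    Uses the population standard deviation (an assumption; the sample
    standard deviation would give a slightly larger value).
    """
    mu = fmean(ratings)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(ratings) / mu * 100.0


# Hypothetical ratings on the 0-10 scale (illustrative only).
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
mean_rating = fmean(ratings)
cv_percent = coefficient_of_variation(ratings)
```

A smaller CV indicates that the ratings cluster tightly around the mean, which is the property used to compare how consistently each system was rated.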
In addition to the 32 volunteer users, we also asked three experts (TABLE 18) to rate the four systems based on their helpfulness and functionality. The results are shown in TABLE 20. As we can see, all the experts favor our proposed reputation generation system, giving it a higher rating than the other systems. With an average rating of 8.83, our proposed system takes first place, followed by system 2 with an average rating of 7.5, system 3 with an average score of 6.83, and finally system 4 in last place with an average rating of 6.0. FIGURE 6 shows a comparison between users' and experts' average ratings, which indicates that our system is more reliable at generating and visualizing reputation than the previous systems. We also asked the experts to write a review expressing their opinions about the proposed system; these reviews are presented in TABLE 21.

VI. DISCUSSION
The system proposed in this paper can be defined as an advanced decision-making tool, capable of producing numerical values that reflect the reputation of an entity (products, services, movies, hotels, etc.) from opinions and reviews shared on the Internet. The proposed system is the first to deal with opinions from various platforms, processing the features of different platforms with high flexibility. It is also the first to integrate an opinion spam filter that detects and eliminates spam opinions based on the characteristics of spammers' behavior, making our system more robust against spammers' attacks and leading to more reliable reputation values. Furthermore, one of its main components extracts and analyzes aspects of the target entity using SOTA aspect-based sentiment-analysis tools. The system additionally incorporates opinion posting time and popularity features when generating reputation, which makes it more reliable and trustworthy. A visualization tool is proposed in which the detailed output of the whole reputation generation procedure is displayed in an interactive, user-friendly interface, facilitating the online decision-making process for both regular users and business owners.

VII. CONCLUSION
In this paper, we proposed a reputation system capable of generating numerical reputation values for a specific item (product, movie, service, hotel, etc.) and its aspects based on opinions and reviews expressed online. The contribution of this work revolves around four components that were not exploited in previous systems. The first is cross-platform compatibility: the proposed system can collect and process opinions from different platforms (Facebook, Amazon, Twitter, TripAdvisor, etc.) and can manage and standardize those platforms' features. The second is opinion spam filtering, where spam opinions are detected and eliminated based on spammers' behavioral features, keeping only authentic opinions. The third is the use of a SOTA aspect-based sentiment-analysis model named LCF-ATEPC to extract and analyze the aspects within the textual opinions. Finally, we combined the previous results with a calculated review time score and review popularity score using mathematical formulas to obtain a reputation value for the targeted entity as well as reputation values for the entity's aspects. In addition, the system provides a holistic reputation visualization that displays the detailed output of the reputation generation process. To assess the effectiveness of our reputation system, we invited 32 participants and 3 experts to compare four SOTA reputation systems by giving a numerical satisfaction score to each system. Our reputation system achieved the highest average satisfaction scores from both users and experts. In the future, we plan to extend the system beyond numerical reputation values, for example by automatically generating a textual summary of the benefits and drawbacks of the targeted entity. We also intend to extend the system to handle multilingual content.
ACHRAF BOUMHIDI is currently pursuing the Ph.D. degree with the Faculty of Science Dhar El Mahraz, Fez, Morocco. His research interests include natural language processing (NLP) and social network analysis (SNA) for decision making in social media platforms.
ABDESSAMAD BENLAHBIB received the Ph.D. degree in computer science. He has published several papers in journals and conferences in the area of computer science, such as IEEE ACCESS, Journal of Organizational Computing and Electronic Commerce, International Journal of Electrical and Computer Engineering, and SemEval. His research interest includes the application of natural language processing techniques to support customers during their decision-making process in e-commerce platforms.
EL HABIB NFAOUI (Member, IEEE) received the joint Ph.D. degree in computer science from the University of Sidi Mohamed Ben Abdellah, Morocco, and the University of Lyon, France, in 2008, under a Cotutelle Agreement (doctorate in joint-supervision), and the HU Diploma degree (accreditation to supervise research) in computer science from the University of Sidi Mohamed Ben Abdellah, in 2013. He is currently a Professor of computer science with the University of Sidi Mohamed Ben Abdellah, Fez, Morocco. He has published in international reputed journals, books, and conferences, and has edited seven conference proceedings and special issue books. His current research interests include information retrieval, language representation learning, machine learning and deep learning, web mining and text mining, semantic web, web services, social networks, and multi-agent systems. He is a Co-Founder and the Chair of the IEEE Morocco Section Computational Intelligence Society Chapter. He is a Co-Founder and an Executive Member of the International Neural Network Society Morocco Regional Chapter. He co-founded the International Conference on Intelligent Computing in Data Sciences (ICSD2017) and the International Conference on Intelligent Systems and Computer Vision (ISCV2015). He has served as a reviewer for scientific journals and as program committee of several conferences. VOLUME 10, 2022