To Cluster or Not to Cluster: The Impact of Clustering on the Performance of Aspect-Based Collaborative Filtering

Collaborative filtering (CF) is one of the most widely used recommendation techniques. It suggests items to users based on the ratings of other users who share their preferences, so one of its aims is to find reliable neighbours. In practice, the user-item rating matrix is typically sparse, and relying on ratings alone to identify neighbours results in poor performance. User reviews can help overcome this limitation because of the diverse elements they contain. The most popular element is aspects, which provide a fine-grained analysis of users’ behaviours and thus improve personalised recommendations. However, increasing the number of aspects also increases sparsity and may therefore degrade recommendation performance. Clustering the aspects may lessen this sparsity, but it is as yet unclear how much this would affect the performance of CF systems. This study proposes a CF approach based on aspect clustering that addresses this issue in terms of rating prediction. The approach aims to reduce the sparseness in the multi-criteria rating matrix by grouping aspects into clusters based on their semantic similarity, which makes discovering the neighbourhood set less expensive and less memory-intensive. Our approach extracts aspects and represents them using Google’s pre-trained Word2vec model. The aspects are then organised into clusters using the K-means clustering algorithm. The multi-dimensional Euclidean distance is used to find appropriate neighbours, and ratings of unseen items are then predicted using the kNN algorithm. This study also identifies the number of aspects that significantly impacts CF performance. Experiments are carried out using a real large-scale dataset: the Amazon movie dataset. The proposed approach is evaluated by comparing its CF performance with that of three baseline approaches. Results show that the proposed approach improves CF performance compared to the other approaches in terms of three predictive accuracy metrics.

approach, which is the most commonly utilised approach in RSs [2]. CF produces a recommendation for a target user based on the similarities between the target user and other users who have previously expressed similar preferences. The primary premise behind the CF method is that users who have previously exhibited similar behaviours are more likely to have similar preferences in the future. Although CF techniques have yielded impressive outcomes, some main challenges are involved, such as data sparseness due to insufficient user ratings, reducing the effectiveness of RSs.
The number of e-commerce and social websites encouraging users to discuss their experiences has recently increased significantly. As a result, online comments (i.e., reviews) on various issues continue to increase exponentially [3], [4]. These reviews can be incorporated into the recommendation process as a new source of information because of the sizeable and rich information about users or items that they contain (i.e., the review elements). Reviews can be utilised to alleviate data sparsity and hence improve the performance of RSs by providing fine-grained analyses of users' behaviours, which can help improve personalised recommendations. For this purpose, different elements can be derived from reviews, such as aspects, review helpfulness, and contextual information [1], [5]. Aspects, also referred to as opinion targets, are the concepts on which opinions are expressed in a specific text/review. Aspects of restaurants, for instance, might be broken down into categories like ambience, meal quality, service, and price range.
Most studies that exploit aspects in CF demonstrate their benefit in improving CF performance, particularly in the rating prediction task [6], [7], [8], [9]. Extracting aspects from reviews is a challenging process due to the distinct characteristics of the reviews. As a result, most current studies [10], [11], [12] use fixed aspects that are defined manually or available in some datasets such as Yahoo! Movies. However, learned aspects extracted from reviews using specific methods are preferred over fixed ones [1], [13]. Learned aspects can help identify user preferences better, because users will only mention the aspects that are important to them. Furthermore, learned aspects can be utilised to describe items more accurately, which facilitates personalised recommendations based on user preferences.
Conventional CF relies on a user rating matrix to find the reliable neighbours of target users. Each cell in the rating matrix represents a user's overall rating of an item. Based on the rating matrix values, neighbours with the highest similarity values to the target user are selected using a similarity measure such as the cosine function or the Pearson correlation function [14]. In aspect-based CF, by contrast, each aspect is a criterion that must be considered throughout the neighbour selection process in addition to the overall rating. For this purpose, the similarity computation of conventional CF must be extended to reflect multi-criteria CF. There are two possible ways of computing similarity in multi-criteria CF [15], [16], [17]. The first is to calculate the similarity value for each criterion separately and then aggregate these values into a single similarity value. The second is to calculate the distances with respect to the multiple criteria directly in the multi-dimensional space. In this case, neighbours are users with similar ratings for the same criteria (i.e., aspects) as the target user. The cost of finding the nearest neighbours rises as the number of users/items and aspects increases. Clustering the aspects can reduce this cost because the matrix dimensions are reduced [10], [18]. However, there have been some concerns that clustering will reduce the performance of CF systems, and the extent of this impact is subject to further research.
This study, therefore, proposes a CF approach based on aspect clustering involving large-scale datasets. The approach aims to minimise the cost of discovering neighbours for a target user through a clustering strategy for aspects that reduces the sparseness in the multi-criteria rating matrix while still providing good CF performance [19]. Throughout this paper, we refer to this approach as aspect-clustering-based CF. The selection of relevant neighbours is a critical factor for improving CF performance. To a certain extent, the predictive accuracy of CF approaches depends on the choice of neighbours [10], [20], [21]. This study uses the semantically enhanced aspect extraction (SEAE) approach proposed in an earlier work [22] to extract aspects. The aspects are then represented using Google's pre-trained Word2vec model and grouped into clusters based on their semantic similarity using the K-means clustering algorithm. In addition, user similarity is calculated based on the cluster value (i.e., each cluster has a different number of aspects) rather than an individual aspect. This study explores the impact of clustering aspects on the performance of CF systems in terms of their prediction accuracy. We also examine whether all aspects extracted from reviews have the same impact on CF performance or whether some have a more significant effect than others. The contributions of this study are summarised as follows:
• We propose an enhanced aspect-based CF approach for large-scale datasets that clusters the learned aspects to minimise the cost of the CF rating prediction process without compromising its accuracy.
• We demonstrate that clustering the learnt aspects makes the selection of neighbours more relevant, as evidenced by enhanced prediction accuracy, since the prediction accuracy of CF techniques depends on the choice of neighbours.
• We carried out several experiments and identified the aspects that impact the performance of aspect-based CF.
The remainder of this paper is organised as follows. Section II presents a quick overview of the current state of the art in our field. Section III presents our proposed approach. Section IV describes the experimental results, and Section V discusses the evaluation and analyses the results of the proposed and baseline approaches. Finally, Section VI summarises this study and gives research directions for the future.

II. STATE OF THE ART
Presently, RSs are widely regarded as one of the most significant tools in the digital world. Over the last two decades, RSs have been successfully employed as information filtering methods to address information overload problems [1], [23]. They have been integrated into various platforms, including e-commerce and social networking sites [24]. RSs usually concentrate on two issues: predicting ratings and making recommendations [25], [26]. The present study focuses on predicting ratings for the CF approach, which is currently the most popular approach used in RSs to identify neighbours [27], [28], [29].

A. CONVENTIONAL COLLABORATIVE FILTERING
In the traditional CF recommendation algorithm, a user-rating matrix R: U×I, in which a set of items I is rated by a set of users U, is created for making recommendations for a target user. In general, the matrix R has numerous missing ratings (i.e., unrated items), and the more missing ratings there are, the sparser the matrix becomes, which results in poor CF performance. Conventional CF aims to estimate the values of these unrated items to recommend the relevant ones to the target user. The quality of the predicted values impacts the quality of the recommendations. The predictions for these ratings can be calculated using various prediction algorithms, including nearest-neighbour algorithms and model-based approaches. The k-nearest neighbour (kNN) algorithms have emerged as one of the most popular algorithms for CF [30], [31]. The success of kNN can be attributed to its automation of the process of acquiring and integrating content that reflects human decisions [31]. By doing so, it can compute inherently meaningful recommendations rather than just the compatibility of two items' specifications [31]. The kNN algorithm is used for regression and classification, and it was introduced by Joseph Hodges and Evelyn Fix in 1951 [32]. It was later developed as a non-parametric, lazy learning method by Thomas Cover. The kNN algorithm assumes that similar things lie close to one another. It relies on the similarity of item features (also known as closeness, distance, or proximity) and does not make any assumption regarding the data distribution [33]. The algorithm treats the data as points in a feature space, which makes the concept of distance meaningful. It also assumes that the training dataset consists of a collection of vectors, each of which is associated with a class label. The number of neighbours that affect the classification is determined by a single given value of k.
A predicted rating $p(U_1, i)$ for user $U_1$ and item $i$ is obtained by first using kNN to select the $k$ best correlated (or most similar) users for the target user $U_1$. The Pearson correlation and cosine-based functions are the most commonly utilised measures for determining user-item similarity [34]. Below is a detailed description of the two functions.

1) PEARSON CORRELATION COEFFICIENT
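The Pearson correlation coefficient determines the strength of a relationship between two user-item pairings by using the following equation [35]:

$$\mathrm{Sim}(U_1, U_2) = \frac{\sum_{i=1}^{n} (r_{U_1,i} - \bar{r}_{U_1})(r_{U_2,i} - \bar{r}_{U_2})}{\sqrt{\sum_{i=1}^{n} (r_{U_1,i} - \bar{r}_{U_1})^2}\,\sqrt{\sum_{i=1}^{n} (r_{U_2,i} - \bar{r}_{U_2})^2}} \tag{1}$$

where $\mathrm{Sim}(U_1, U_2)$ represents the similarity of two users $U_1$ and $U_2$; $r_{U_1,i}$ is user $U_1$'s rating for item $i$; $\bar{r}_{U_1}$ is user $U_1$'s average rating, and $n$ is the overall number of user-item pairs.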
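As a minimal sketch of the Pearson measure described above, the following Python function computes the correlation between two users over the items both have rated. The dictionary layout, the function name `pearson_similarity`, and the choice to take each user's mean over the co-rated items only are illustrative assumptions rather than details given in the text.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    ratings_a, ratings_b: dicts mapping item id -> rating (hypothetical layout).
    Assumption: each user's mean is taken over the co-rated items only.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # constant ratings give no defined correlation
    return num / (den_a * den_b)


# Toy ratings (hypothetical values)
u1 = {"m1": 5, "m2": 3, "m3": 4}
u2 = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}  # same ordering of preferences as u1
u3 = {"m1": 1, "m2": 3, "m3": 2}           # opposite ordering of preferences
print(pearson_similarity(u1, u2))
print(pearson_similarity(u1, u3))
```

Returning 0.0 when the overlap is too small or a user's ratings are constant is a design choice of this sketch; other implementations may skip such pairs instead.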

2) COSINE SIMILARITY
Cosine similarity is a vector-space measure based on linear algebra rather than a purely statistical approach, which distinguishes it from the Pearson-based measure. It computes the cosine of the angle between two vectors in a multi-dimensional space. The smaller the angle, the greater the similarity between the vectors and the closer the cosine value is to 1. The cosine similarity between two users $U_1$ and $U_2$ is described as follows:

$$\mathrm{Sim}(U_1, U_2) = \frac{\sum_{i=1}^{n} r_{U_1,i}\, r_{U_2,i}}{\sqrt{\sum_{i=1}^{n} r_{U_1,i}^2}\,\sqrt{\sum_{i=1}^{n} r_{U_2,i}^2}} \tag{2}$$

After the $k$ most similar users to user $U_1$ (i.e., neighbours) are selected, the rating prediction of user $U_1$ for item $i$ is made using the following function:

$$p(U_1, i) = \frac{\sum_{U \in N(U_1)} \mathrm{Sim}(U_1, U)\, r_{U,i}}{\sum_{U \in N(U_1)} \left|\mathrm{Sim}(U_1, U)\right|} \tag{3}$$

where $p(U_1, i)$ is the prediction of user $U_1$'s rating of item $i$, $r_{U,i}$ refers to user $U$'s rating of item $i$, $\mathrm{Sim}(U_1, U)$ represents the similarity score of two users $U_1$ and $U$, and $N(U_1)$ is user $U_1$'s neighbour set.
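The two steps described above, computing cosine similarity and then forming a similarity-weighted prediction from the $k$ nearest neighbours, can be sketched as follows. The data layout, the function names, and the restriction of the cosine to co-rated items are assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two users' rating vectors (co-rated items only)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(target, item, all_ratings, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [
        (cosine_similarity(all_ratings[target], ratings), ratings[item])
        for user, ratings in all_ratings.items()
        if user != target and item in ratings
    ]
    neighbours = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    weight = sum(abs(sim) for sim, _ in neighbours)
    if weight == 0:
        return None  # no usable neighbour for this item
    return sum(sim * rating for sim, rating in neighbours) / weight


# Toy rating data (hypothetical values)
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
print(predict_rating("u1", "c", ratings, k=2))
```

Because u2 agrees with u1 on both shared items, its rating of item c dominates the weighted average, pulling the prediction towards 4 rather than 2.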

B. ASPECT-BASED COLLABORATIVE FILTERING
Several studies have taken advantage of textual user reviews to improve CF performance. Several review elements can be extracted and incorporated into CF to enhance its performance [1], [5]. The present study focuses on the aspects element, which has the most significant impact on CF performance. An aspect is a concept that represents a topic associated with an item; for example, 'story', 'actor', and 'director' are well-known aspects for movies. Despite the benefits of aspects for improving CF performance, only a few researchers have examined their potential as a tool for enhancing CF performance and resolving CF issues [36], [37]. Most of these researchers rely on defining a fixed number of aspects due to the difficulty of extracting them from reviews. On the other hand, some researchers employ a text analysis technique called aspect-based sentiment analysis (ABSA). ABSA aims to extract the aspects and identify the sentiment associated with each aspect. It involves two main tasks, aspect extraction and aspect sentiment analysis, and approaches to these tasks can be categorised into three types: supervised, semi-supervised, and unsupervised [38]. As the volume of reviews grows, finding labelled data for reviews, which is required for supervised approaches, becomes more difficult. As a result, most research focuses on unsupervised approaches, which neither require laborious and time-consuming data annotation nor suffer from domain adaptation issues [39], [40]. The unsupervised approaches are based on vocabulary, frequency, syntactic relations, and topic models [22]. Rather than relying merely on the overall rating for making recommendations, aspects can enhance the recommendations by providing a fine-grained analysis of the users' interests.
The effectiveness of this element in improving CF performance has been demonstrated by studies that extract it from user reviews and incorporate it into CF [7], [9], [27], [41]. Musto et al. [6] extracted aspects and sub-aspects using Kullback-Leibler divergence, a non-symmetric measure, and used Nielsen's lexicon [42], which is based on the AFINN wordlist, to assign a sentiment score to each extracted aspect/sub-aspect. The extracted aspects were then integrated into a multi-criteria user-item CF algorithm, in which the multi-dimensional Euclidean distance [15] is used to calculate the similarity between two user-item pairs. The experimental results for different datasets (Amazon, TripAdvisor, and Yelp) showed that the proposed algorithm outperformed all algorithms based on matrix factorisation and the single-criterion recommendation algorithms in terms of the mean absolute error metric. Bauman et al. [43] developed a method for recommending items that improves the user's experience with the recommended items by using their most valuable aspects. The valuable aspects for users are identified using the Sentiment Utility Logistic model, for which an opinion parser, Double Propagation, is used for extracting the aspect-sentiment pairs. The experiments used the Yelp dataset, and the method outperformed the most positive/negative aspect approaches and the most popular aspect approach.
As artificial neural networks have advanced, researchers have looked into employing deep learning approaches to extract aspects and improve RSs [44], [45], [46], [47]. For example, Da'u et al. [45] presented a model that employs a deep learning technique for extracting aspect-sentiment pairs to improve the accuracy of the recommendations. It comprises two parts: extracting the aspect-sentiment pairs and predicting ratings. The experimental findings demonstrated the proposed model's usefulness in enhancing recommendation accuracy.
Some studies, such as the work of Wasid and Ali [10], utilise user-based clustering to group users into clusters from which neighbourhood sets are generated. The K-means algorithm is used to cluster users, and a Mahalanobis distance metric is used to compute similarities between users. Experiments on the Yahoo! Movies dataset show that CF with user clustering outperforms CF without clustering.
Zhang et al. [48] presented a user-based clustering approach to reduce the impact of data sparsity. In their algorithm, users with similar preferences are grouped into a similar cluster, and only neighbours from the same cluster as the target user are chosen. The efficiency of the proposed algorithm in enhancing CF performance was demonstrated by experimental findings utilising both the MovieLens and HetRec2011-MovieLens datasets. Xiaojun [49] presented an algorithm for clustering users using K-means clustering and developed an improved similarity metric to discover relevant neighbours to the target user. Then a list of recommendations was produced. The experimental findings demonstrated the effectiveness of the suggested algorithm in solving the data sparsity problem, and the recommendations were able to adapt to changes in the user's interests.
Although studies incorporating aspects into the CF recommendation process exist, few, if any, have addressed aspect-based CF utilising clustering. We therefore hypothesise that clustering of aspects in multi-criteria CF would result in the selection of more similar neighbours and thus improve CF performance in terms of rating prediction.
The objective of the current study is to propose a practical CF approach based on aspect clustering in which learnt aspects are grouped into clusters according to their semantic similarity using the K-means clustering algorithm. We extract aspects using the semantically enhanced aspect extraction (SEAE) approach described in [22]. The extracted aspects mentioned in the user review are assigned scores using a domain-specific lexicon discussed in the previous work [50]. Because the review text is often brief and only contains a limited number of aspects, only a few extracted aspects will obtain scores for a given review. The rest will be left without scores, creating a highly sparse multi-criteria rating matrix. This issue is the reason for developing the proposed approach using clustering in this study. The approach aims to reduce the sparseness in the multi-criteria rating matrix by grouping aspects into clusters based on their semantic similarity. The size of the multi-criteria rating matrix will thus depend on the number of aspect clusters rather than the total number of aspects. As a result, it will be less expensive and require less memory to discover the neighbourhood set. Furthermore, the proposed approach is designed to increase the ability to locate dependable and accurate neighbours for target users, which will improve CF performance. In addition, this study examines whether all the learnt aspects have the same impact on CF performance or whether some have a more significant impact than others. As such, other scenarios which do not have explicit feedback in the form of user reviews are beyond the scope of this work. Finally, CF performance using the proposed approach is compared with different approaches to evaluate its efficiency.

III. METHODOLOGY
This study aims to investigate the effectiveness of clustering aspects for improving the performance of aspect-based CF in terms of rating prediction. This study's methodology is based on an experimental approach using the Amazon movie dataset [51], a large real-world dataset. A two-phase methodology is proposed to achieve the aim: dataset preparation with aspects clustering, and developing an aspect-clustering-based CF approach. The former phase aims to populate the dataset with aspect clusters. Thus, it includes two steps: extracting and clustering aspects, and extending the dataset with new attributes that describe the aspect clusters. The latter phase aims to develop an aspect-clustering-based CF approach by following a two-step procedure for selecting the appropriate parameters to gain better CF performance. The steps are determining the optimum number of clusters and identifying the significant aspects. FIGURE 1 describes the general methodology of this study. The following sections describe each phase in detail.

A. PHASE 1: DATASET PREPARATION WITH ASPECTS CLUSTERING
This study uses a real large-scale dataset: the Amazon movie dataset. In order to use this dataset in our study, this phase extracts aspects from the dataset, and groups them into clusters. The aspect clusters are then added to the dataset as new attributes so that they can be used in the following stage of developing the aspect-clustering-based CF approach. This phase consists of two steps: extracting and clustering aspects, and extending the Amazon movie dataset.

1) EXTRACTING AND CLUSTERING ASPECTS
This step extracts the aspects from the Amazon movie dataset using the Semantically Enhanced Aspect Extraction (SEAE) approach described in [22] and the blocking technique mentioned in [52]. The SEAE is a hybrid approach consisting of three approaches: syntactic-relation-based and frequency-based approaches that work in parallel, followed by a semantic similarity-based approach that filters the aspects and extracts only the ones relevant to the target domain. The SEAE approach generates a list of main aspects and a list of core terms. The core terms are words closely related to the main aspects that do not appear in the list of main aspects because of their low frequency compared to the main aspects. In particular, limiting the aspects to specified words and ignoring those with similar meanings can have a negative impact on the aspect extraction process. In the movie domain, for example, the word picture appears in the list of main aspects, but the words image, photo, photograph and snapshot do not appear in the list, despite having a similar meaning to the word picture. Ignoring those words might negatively impact the extraction process; therefore, these words are identified as the core terms for the aspect picture using the SEAE approach. To summarise, the outputs of the extraction process using the SEAE approach are the main aspects and the core terms relating to the main aspects.
After the aspects are extracted, this step arranges them into clusters based on their semantic similarity and compiles the core terms with their associated main aspects. Specifically, this step partitions aspects into K clusters based on their semantic similarity using Google's pre-trained Word2vec model and the K-means clustering method. K-means is a widely used distance/centroid-based algorithm, in which distances are computed in order to allocate a point to a cluster [53]. The K-means algorithm associates each cluster with a centroid and aims to minimise the sum of the distances between the cluster centroid and the points assigned to the cluster. The similarity between aspects (i.e., points) is measured using the cosine similarity of their word embeddings, which are obtained from Google's pre-trained Word2vec model and can accurately capture the similarity values among words. We chose Google's pre-trained Word2vec model because of its large vocabulary: it is trained on about 100 billion words from the Google News dataset and contains three million words and phrases [54]. The aspect clustering process comprises three sub-steps: group the main aspects into clusters, match the core terms with their related main aspects, and merge the main aspects with their core terms into the identified clusters. The three sub-steps are described in the following sub-sections.
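To make the clustering step concrete, the sketch below groups a handful of aspect vectors with a K-means variant that assigns points by cosine similarity (often called spherical K-means). It is an illustration under stated assumptions: the three-dimensional vectors are made-up stand-ins for Word2vec embeddings, the first K aspects are used as initial centroids to keep the run deterministic, and the function name `kmeans_cosine` is hypothetical. The text does not specify how cosine similarity and K-means were combined in the original implementation.

```python
from math import sqrt


def normalise(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def kmeans_cosine(vectors, k, iterations=20):
    """Cluster word vectors by cosine similarity; returns word -> cluster index."""
    points = {word: normalise(vec) for word, vec in vectors.items()}
    words = list(points)
    centroids = [points[w] for w in words[:k]]  # naive deterministic initialisation
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each aspect joins its most similar centroid.
        new_assignment = {
            w: max(range(k), key=lambda c: cosine(points[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: each centroid becomes the normalised mean of its members.
        for c in range(k):
            members = [points[w] for w in words if assignment[w] == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return assignment


# Made-up 3-d vectors standing in for Word2vec embeddings
toy_vectors = {
    "actor": [0.9, 0.1, 0.0],
    "story": [0.1, 0.9, 0.1],
    "actress": [0.8, 0.2, 0.1],
    "plot": [0.2, 0.8, 0.0],
    "cast": [0.85, 0.15, 0.05],
}
print(kmeans_cosine(toy_vectors, k=2))
```

With real embeddings the vectors would have several hundred dimensions, and a library implementation with a better initialisation strategy would normally be preferred.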

a: GROUP THE MAIN ASPECTS INTO CLUSTERS
In this step, the main aspects extracted from the Amazon movie dataset using the SEAE approach are divided into clusters based on their semantic similarity. This step is accomplished by a simple function shown in FIGURE 2, which takes the main aspects as input and generates aspect clusters, which are then stored in AspectClusters_list. The main aspects are clustered using the K-means clustering algorithm, and cosine similarity is used for measuring the similarity between aspects. This step specifies four different values for the number of clusters (K): K = 8, 10, 13, and 15. These K values were chosen with the help of the elbow method, a heuristic for determining the best number of clusters. It is based on the idea that the number of clusters should be chosen such that adding another cluster does not result in significantly better data modelling [55].
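The elbow heuristic mentioned above can be illustrated with a small, self-contained example. The sketch runs a basic Euclidean K-means for several values of K and reports the within-cluster sum of squared distances; the "elbow" is the K after which this quantity stops dropping sharply. The toy 2-D points, the evenly spaced initialisation, and the function names are assumptions for illustration only and are unrelated to the K values 8, 10, 13, and 15 used in this study.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, iterations=50):
    """Run a basic K-means and return the within-cluster sum of squared distances."""
    # Deterministic initialisation: k points spread evenly through the input order.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Three well-separated toy groups, so the elbow is expected at K = 3
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
inertia = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k in sorted(inertia):
    print(k, round(inertia[k], 2))
```

In this toy run the drop from K = 2 to K = 3 is large, whereas moving to K = 4 brings almost no further improvement, which is the pattern the elbow method looks for.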

b: MATCH CORE TERMS WITH THEIR RELATED MAIN ASPECTS
Core terms are derived using the SEAE approach; most of these terms are semantically related to the main aspects. Because these terms occur far less often than the main aspects, they do not appear in the list of the main aspects. However, ignoring such terms would have a detrimental effect on the SEAE performance and subsequently affect CF performance. Several core terms are extracted from the Amazon movie dataset using the SEAE approach. This step seeks to match these terms with the most relevant main aspects based on their semantic similarity.
Because this study focuses on the movie domain, all the extracted main aspects are relevant to the movie domain. As a result, several extracted main aspects are semantically similar, implying that some core terms are likely to be related to multiple main aspects. A core term is therefore linked to the main aspect that has the highest similarity value for that term. Table 1 shows an example of some core terms and a list of the main aspects with their similarity values to the core terms. The aspects with the highest similarities to the core terms are selected. The words are represented using Google's pre-trained Word2vec model, and the semantic similarity between the core term and the main aspect is calculated using the cosine similarity. For example, the core term baby is most similar to the aspect child. Thus, the aspect child will be selected.
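The matching rule described above amounts to an arg-max over cosine similarities. The following sketch shows this with made-up three-dimensional vectors in place of Word2vec embeddings; the vectors, the function names, and the resulting similarity values are hypothetical and do not reproduce Table 1.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_core_terms(core_vectors, aspect_vectors):
    """Link every core term to the main aspect with the highest cosine similarity."""
    matches = {}
    for term, term_vec in core_vectors.items():
        best_aspect = max(
            aspect_vectors,
            key=lambda aspect: cosine_similarity(term_vec, aspect_vectors[aspect]),
        )
        matches[term] = best_aspect
    return matches


# Made-up vectors standing in for Word2vec embeddings
main_aspects = {
    "child": [0.9, 0.1, 0.2],
    "picture": [0.1, 0.9, 0.1],
    "music": [0.1, 0.2, 0.9],
}
core_terms = {
    "baby": [0.8, 0.2, 0.1],
    "photo": [0.2, 0.8, 0.2],
}
print(match_core_terms(core_terms, main_aspects))
```

If two main aspects tie for the highest similarity, `max` keeps the first one encountered; how ties are resolved in the original work is not stated.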

c: MERGE THE MAIN ASPECTS WITH THEIR CORE TERMS INTO CLUSTERS
This step combines the results of the previous two steps to produce a list of aspects that have been grouped into the clusters determined in step a (i.e., the main aspects grouped into clusters) together with the core terms identified in step b. This step is conducted using the function defined in
• AspectRating is a score given to the aspect based on the sentiment words that belong to the aspect. The scores of these sentiment words are determined using the domain-specific lexicon proposed in [50].
• NumberOfOccurrences is the number of times the aspect is mentioned in a review.

b: BRILLIANT ASPECT
This attribute identifies the aspect a user focused on more than the other aspects mentioned within the user's written review. The Brilliant aspect has the highest rating among all the aspects mentioned in the review. Referring to the previous example, the aspect 'music' will be selected as the Brilliant aspect since it has the highest rating among the three aspects.
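Selecting the Brilliant aspect reduces to taking the maximum over the aspect ratings of a review, as the short sketch below shows. The tuple layout (name, rating, occurrences) and the numeric values are hypothetical; only the rule "highest rating wins" comes from the text.

```python
def brilliant_aspect(aspects):
    """Return the name of the aspect with the highest rating in one review.

    aspects: list of (name, rating, occurrences) tuples (hypothetical layout).
    Ties are resolved in favour of the aspect listed first (an assumption).
    """
    if not aspects:
        return None  # the review mentions no aspect
    return max(aspects, key=lambda aspect: aspect[1])[0]


# Hypothetical ratings for a review mentioning three aspects
review_aspects = [("story", 3.5, 2), ("actor", 4.0, 1), ("music", 4.5, 1)]
print(brilliant_aspect(review_aspects))
```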
After the dataset has been extended, this step partitions the aspects in the Aspects attribute into the clusters defined in the previous step. Thus, the new attribute Aspects is defined as in Eq. (4), where $K$ is the number of clusters, $n$ is the number of aspects, and $m$ is the number of core terms. $A_{i,j}$ is the name of the $j$-th main aspect in cluster $i$, and $W_{A_{i,j}}$, $R_{A_{i,j}}$, and $O_{A_{i,j}}$ are the weight, rating and occurrences of aspect $A_{i,j}$, respectively. $C^{s}_{A_{i,j}}$ is the core term $s$ related to the main aspect $A_{i,j}$, and $W_{C^{s}_{A_{i,j}}}$, $R_{C^{s}_{A_{i,j}}}$, and $O_{C^{s}_{A_{i,j}}}$ are the weight, rating and occurrences of the core term $C^{s}_{A_{i,j}}$, respectively. $Feature_i$ refers to the average rating of all aspects and core terms in cluster $i$. This step is performed by a function named DataSet_GroupingAspects, presented in the next section, which takes two inputs: the extended Amazon movie dataset and the AspectCoreTerms_list generated in the previous step, which contains the aspects in each identified cluster with their core terms. This function returns a new dataset called Modified Dataset, which includes the extended dataset and the additional attributes specified for aspects.
The function generates a dataset for each of the four K values. The generated datasets differ in the number of added attributes and in the aspects that belong to each cluster, both of which are determined by the value of K. The procedure of this function is the same for each K value. It focuses on partitioning the aspects list in the Aspects attribute into the previously determined clusters. The aspects with their core terms corresponding to each cluster are stored in a newly added attribute. The $Feature_i$ attribute is then updated with the average rating for all aspects listed in each cluster.
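The per-cluster averaging described above can be sketched as follows. This is not the DataSet_GroupingAspects function itself; the function name `cluster_features`, the input layout, and the use of `None` for clusters with no mentioned aspect are assumptions made for illustration.

```python
def cluster_features(review_ratings, aspect_to_cluster, k):
    """Average the ratings of one review's aspects and core terms per cluster.

    review_ratings: dict mapping aspect or core term -> rating in one review.
    aspect_to_cluster: dict mapping aspect or core term -> cluster index (0..k-1).
    Returns a list of k values; None marks a cluster that the review does not
    mention (an assumption, since the handling of empty clusters is not specified).
    """
    buckets = [[] for _ in range(k)]
    for name, rating in review_ratings.items():
        cluster = aspect_to_cluster.get(name)
        if cluster is not None:
            buckets[cluster].append(rating)
    return [sum(b) / len(b) if b else None for b in buckets]


# Hypothetical cluster membership and review ratings
aspect_to_cluster = {"actor": 0, "actress": 0, "story": 1, "plot": 1, "music": 2}
review = {"actor": 4.0, "actress": 3.0, "music": 5.0}
print(cluster_features(review, aspect_to_cluster, k=3))
```

The resulting vector has one entry per cluster rather than one per aspect, which is how clustering shrinks the multi-criteria rating matrix.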

B. PHASE 2: DEVELOPING AN ASPECT-CLUSTERING-BASED COLLABORATIVE FILTERING APPROACH
This phase aims to develop an aspect-clustering-based CF approach to evaluate how well the clustering improves CF performance. Before determining the appropriate approach for this study, we must check two parameters: the optimal number of aspect clusters and the optimal number of aspects used. The following describes how these two parameters are identified.
where $p(U_1, i)$ is the prediction function for user $U_1$'s rating of item $i$, $\mathrm{sim}(U_1, U)$ is the similarity value between the two users $U_1$ and $U$, $r_{U,i}$ is user $U$'s rating of item $i$, $\bar{r}_{U_1}$ is user $U_1$'s mean rating, and $N(U_1)$ is the neighbour set of user $U_1$.
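Assuming the prediction function referred to here is the usual mean-centred kNN prediction (commonly called kNNWithMeans), it can be written as

$$p(U_1, i) = \bar{r}_{U_1} + \frac{\sum_{U \in N(U_1)} \mathrm{sim}(U_1, U)\,(r_{U,i} - \bar{r}_{U})}{\sum_{U \in N(U_1)} \mathrm{sim}(U_1, U)}$$

and sketched in Python as below. The data layout, the function names, and the fallback to the target user's mean when no neighbour is available are illustrative assumptions.

```python
def mean_rating(ratings, user):
    """Average of all ratings given by `user`."""
    values = ratings[user].values()
    return sum(values) / len(values)


def knn_with_means(target, item, ratings, similarities, k=50):
    """Mean-centred kNN prediction of `target`'s rating for `item`.

    ratings: dict user -> {item: rating}.
    similarities: dict user -> similarity to the target user (precomputed).
    """
    raters = [u for u in similarities if u != target and item in ratings[u]]
    neighbours = sorted(raters, key=lambda u: similarities[u], reverse=True)[:k]
    denominator = sum(similarities[u] for u in neighbours)
    if denominator == 0:
        return mean_rating(ratings, target)  # fallback: no usable neighbour
    offset = sum(
        similarities[u] * (ratings[u][item] - mean_rating(ratings, u))
        for u in neighbours
    ) / denominator
    return mean_rating(ratings, target) + offset


# Toy data (hypothetical values)
ratings = {
    "u1": {"a": 4, "b": 2},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 2, "b": 2, "c": 5},
}
similarities_to_u1 = {"u2": 0.8, "u3": 0.2}
print(knn_with_means("u1", "c", ratings, similarities_to_u1, k=2))
```

Centring each neighbour's rating on that neighbour's own mean compensates for users who systematically rate higher or lower than others.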
The similarity between users is calculated using the multi-dimensional Euclidean distance metric, one of the most popular metrics. The following equation gives the distance $d(R(U_1, i), R(U_2, i))$ between users $U_1$ and $U_2$ on item $i$:

$$d(R(U_1, i), R(U_2, i)) = \sqrt{\sum_{c=1}^{k} \left(R_c(U_1, i) - R_c(U_2, i)\right)^2} \tag{6}$$

where $k$ denotes the number of criteria determined by the number of clusters, and $R_c(U_1, i)$ is user $U_1$'s rating of item $i$ on criterion $c$. Simply put, the overall distance between two users is the average distance over all their shared items ($I$), as shown in Eq (7):

$$d(U_1, U_2) = \frac{1}{|I|} \sum_{i \in I} d(R(U_1, i), R(U_2, i)) \tag{7}$$

Distance and similarity are inversely related: the greater the distance between two users, the smaller their similarity value, and vice versa. As a result, using this distance measure, the similarity between two users is calculated as follows:

$$\mathrm{sim}(U_1, U_2) = \frac{1}{1 + d(U_1, U_2)} \tag{8}$$

The optimum number of aspect clusters is the K value that yields the best CF performance among the four experiments conducted for the different K values. The CF performance for rating prediction is evaluated using predictive accuracy metrics, which determine how closely the predicted ratings match the actual ratings provided in the dataset. In particular, Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE) are used. A lower value of these metrics indicates a higher CF performance since they measure the error between predicted and actual ratings [57]. The three metrics are defined by the following equations, respectively:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|p_i - r_i\right| \tag{9}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(p_i - r_i\right)^2 \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(p_i - r_i\right)^2} \tag{11}$$

where $N$ is the size of the test set, $p_i$ is the predicted rating calculated by the CF approach, and $r_i$ is the actual rating given by the user.
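The distance-based similarity and the three error metrics can be sketched together in a few lines of Python. The sketch takes the similarity as 1/(1 + d), one common way to realise the inverse relationship between distance and similarity, and assumes that every shared item carries a complete vector of cluster ratings; the data layout and function names are hypothetical.

```python
from math import sqrt


def item_distance(criteria_a, criteria_b):
    """Euclidean distance between two users' criterion ratings on one item."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(criteria_a, criteria_b)))


def user_similarity(user_a, user_b):
    """Similarity from the average multi-criteria distance over shared items.

    user_a, user_b: dicts mapping item -> list of cluster ratings.
    Assumption: similarity = 1 / (1 + average distance).
    """
    shared = set(user_a) & set(user_b)
    if not shared:
        return 0.0  # no shared item, so no evidence of similarity
    mean_distance = sum(item_distance(user_a[i], user_b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_distance)


def error_metrics(predicted, actual):
    """Return (MAE, MSE, RMSE) for paired predicted and actual ratings."""
    errors = [p - r for p, r in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, sqrt(mse)


# Toy data with two criteria (clusters) per item, hypothetical values
user_a = {"m1": [4.0, 3.0], "m2": [5.0, 5.0]}
user_b = {"m1": [4.0, 3.0], "m2": [2.0, 1.0], "m3": [1.0, 1.0]}
print(user_similarity(user_a, user_b))
print(error_metrics([4.0, 3.0, 5.0], [5.0, 3.0, 3.0]))
```

Because MSE and RMSE square the errors, a single large miss (here the error of 2) weighs more heavily in them than in MAE.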

2) DETERMINING THE NUMBER OF SIGNIFICANT ASPECTS
The second step of phase 2 determines whether all the main aspects extracted from the Amazon movie dataset using the SEAE approach have the same impact on CF performance or whether some have a more significant impact than others. The work of Musto et al. [6] was the inspiration for this step; they used their extraction method to extract 50 aspects and found that the CF algorithm performs better when only 10 of the 50 extracted aspects are used rather than all 50. This step evaluates CF performance using three different numbers/forms of aspects, described as follows:
• Top-10 most frequent aspects (Frtop-10): the top 10 aspects among all the extracted aspects with the highest number of occurrences (i.e., frequencies).
• Top-10 most relevant aspects to the domain (Sitop-10): the top 10 aspects of the extracted aspects with the highest semantic similarity values to the movie domain.
• The brilliant aspect: the aspect with the highest rating among all the aspects mentioned by a user.
The top-10 aspects for the first two forms are shown in Table 2.
For each number/form of aspects mentioned above, an experiment is conducted to assess CF performance in terms of rating prediction using the kNNWithMeans algorithm shown in Eq (5). For Frtop-10 and Sitop-10, the aspects are organised into 10 clusters, and each cluster represents the score of one aspect included in the form. The multi-dimensional Euclidean distance metric is used to determine the similarities between users based on all their shared items. For the brilliant aspect, the neighbours of a target user are selected from the users who are similar to the target user on the shared items and who have a similar brilliant aspect.
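Two of the aspect forms above, Frtop-10 and the brilliant aspect, can be illustrated with a short sketch. It assumes each review is stored as a user plus a mapping from aspect to score, and that a user's score for an aspect is the mean over their reviews; both are assumptions for illustration, as are the function names and toy data. Sitop-10 is omitted because it depends on word embeddings.

```python
from collections import Counter, defaultdict


def most_frequent_aspects(reviews, n=10):
    """Frtop-n: the n aspects mentioned in the largest number of reviews."""
    counts = Counter(aspect for review in reviews for aspect in review["aspects"])
    return [aspect for aspect, _ in counts.most_common(n)]


def brilliant_aspect(reviews, user):
    """The aspect with the highest mean score among those the user mentions."""
    scores = defaultdict(list)
    for review in reviews:
        if review["user"] == user:
            for aspect, score in review["aspects"].items():
                scores[aspect].append(score)
    if not scores:
        return None
    return max(scores, key=lambda a: sum(scores[a]) / len(scores[a]))


# Toy reviews (hypothetical aspect scores).
reviews = [
    {"user": "u1", "aspects": {"acting": 5, "story": 3}},
    {"user": "u1", "aspects": {"acting": 4, "music": 5}},
    {"user": "u2", "aspects": {"story": 2, "acting": 3}},
]

top_two = most_frequent_aspects(reviews, n=2)
u1_brilliant = brilliant_aspect(reviews, "u1")
u2_brilliant = brilliant_aspect(reviews, "u2")
```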

IV. EXPERIMENTAL RESULTS AND ANALYSIS
This section presents the experiments carried out to accomplish the previously described phases. All the experiments use the extended Amazon movie dataset. We only consider users who have rated at least 20 movies and movies that have been rated by at least 20 users, yielding a total of 13,214 users, 17,022 movies, and 650,145 reviews. All the experiments also employ the kNNWithMeans rating prediction algorithm shown in Eq (5), with k = 50. CF performance is assessed using three error metrics: MAE, MSE, and RMSE. The following subsections provide and discuss the results of each phase of the proposed approach.
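The mean-centred neighbourhood prediction used throughout the experiments can be sketched as below. This is a minimal illustration of the general kNN-with-means technique under stated assumptions, not the authors' code: the function names, the `sim` callback, and the toy ratings and similarity values are hypothetical.

```python
def mean_rating(user_ratings):
    """Mean of a user's overall ratings."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(target, item, ratings, sim, k=50):
    """Mean-centred kNN prediction of `target`'s rating for `item`.

    ratings: {user: {item: overall rating}}
    sim:     function (user_a, user_b) -> similarity value
    """
    base = mean_rating(ratings[target])
    # Candidate neighbours are the other users who rated the item.
    scored = [(sim(target, u), u) for u in ratings
              if u != target and item in ratings[u]]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return base  # no usable neighbours: fall back to the user's mean
    offset = sum(s * (ratings[u][item] - mean_rating(ratings[u]))
                 for s, u in neighbours)
    return base + offset / weight


# Toy data (hypothetical ratings and similarities).
ratings = {
    "t": {"a": 4, "b": 2},
    "u": {"a": 5, "b": 3, "c": 5},
    "v": {"a": 1, "c": 2},
}
toy_sims = {("t", "u"): 0.5, ("t", "v"): 0.25}
prediction = predict("t", "c", ratings, lambda a, b: toy_sims[(a, b)])
```

With `k=1`, only the most similar neighbour (`u`) contributes, so the prediction reduces to the target's mean plus that neighbour's deviation from their own mean.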

A. PHASE 1: DATASET PREPARATION WITH ASPECTS CLUSTERING
This section provides the results of the two steps performed for phase 1.

1) EXTRACTING AND CLUSTERING ASPECTS
In this step, the aspects are extracted from the Amazon movie dataset using the SEAE approach, resulting in 49 main aspects and 481 core terms. Then, the aspects are clustered by applying the three sub-steps of aspect clustering. The results of these steps are described below.

a: GROUP THE MAIN ASPECTS INTO CLUSTERS
In this step, the main aspects are divided into K clusters based on their semantic similarity using the K-means clustering algorithm. This step generates four different lists because K is set to four different values: 8, 10, 13, and 15. The distributions of the main aspects in the lists generated for each K value are shown in Table 3. The table indicates that the K-means algorithm performs well, since the aspects contained in the same cluster are semantically similar. Its efficacy is further supported by the fact that, for each proposed K value, the cluster sizes are generally consistent with one another.
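The grouping of aspect vectors into K clusters can be illustrated with a minimal K-means (Lloyd's algorithm) sketch. The two-dimensional vectors below are made-up stand-ins for the pre-trained Word2vec embeddings used in the study, and seeding the centroids with the first K vectors is a simplification chosen to keep the example deterministic; library implementations typically use randomised seeding.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=50):
    """Return one cluster label per vector (plain Lloyd's algorithm)."""
    # Deterministic seeding for this sketch: the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                updated.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels


# Toy aspect embeddings (hypothetical 2-D stand-ins for Word2vec vectors).
aspects = ["actor", "music", "cast", "song"]
vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
aspect_to_cluster = dict(zip(aspects, kmeans(vectors, k=2)))
```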

b: MERGE CORE TERMS WITH THEIR RELATED MAIN ASPECTS
This step generates a list of core terms for each main aspect based on the semantic relation between the core term and the main aspect. Table 4 displays the results of this step and lists each main aspect along with its associated core terms. The main aspects are listed in order from the highest number of core terms (i.e., song) to the lowest. In addition, the lists of core terms are alphabetically sorted. Two points stand out in Table 4. First, it shows how closely related the meanings of most core terms are to the relevant aspects, such as the core terms 'image' and 'photo' for the aspect picture and 'essay' and 'novel' for the aspect book. Second, some of the core terms, such as those listed for the performance, edition, and character aspects, are spelt incorrectly. This type of error is common in many user-written reviews, and considering these words is expected to benefit the aspect-based CF approach.

c: MERGE THE MAIN ASPECTS WITH THEIR CORE TERMS INTO CLUSTERS
This step creates a list comprising the aspects organised into clusters along with their core terms for each proposed K value (i.e., 8, 10, 13, 15). Table 5 illustrates the result of this step for K = 8 as an example, where each cluster contains the relevant aspects based on their semantic similarity using the K-means clustering algorithm. Each aspect (shown in bold) is followed by its core terms (if available) to benefit from the diversity of vocabulary for the same aspect, which is expected to improve aspect-based CF performance. These clusters will be used to find neighbours for a target user based on the similarities of each cluster's overall rating (where the clusters contain multiple aspects) rather than the single rating of each aspect, as in the conventional aspect-based CF.
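The idea of replacing per-aspect ratings with one overall rating per cluster can be sketched as follows. How the cluster rating is aggregated is not specified in this section, so the sketch assumes the mean of the scores of the aspects a review mentions in that cluster, with a default of 0.0 for clusters the review does not touch; the function name, the mapping, and the toy scores are hypothetical.

```python
def cluster_ratings(aspect_scores, aspect_to_cluster, n_clusters, default=0.0):
    """Collapse per-aspect scores of one review into one rating per cluster."""
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for aspect, score in aspect_scores.items():
        cluster = aspect_to_cluster.get(aspect)
        if cluster is None:
            continue  # aspect not assigned to any cluster
        sums[cluster] += score
        counts[cluster] += 1
    # Assumed aggregation: mean of the mentioned aspects in each cluster.
    return [sums[c] / counts[c] if counts[c] else default
            for c in range(n_clusters)]


# Toy mapping and review (hypothetical).
aspect_to_cluster = {"actor": 0, "cast": 0, "song": 1, "music": 1, "plot": 2}
review_scores = {"actor": 4, "cast": 2, "song": 5}
review_vector = cluster_ratings(review_scores, aspect_to_cluster, n_clusters=3)
```

The five-aspect vector for this review would have two empty entries (`music`, `plot`), whereas the three-cluster vector has only one, which illustrates how clustering reduces sparsity in the multi-criteria rating matrix.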

B. PHASE 2: DEVELOPING AN ASPECT-CLUSTERING-BASED COLLABORATIVE FILTERING APPROACH
This phase aims to develop an aspect-clustering-based CF approach. In order to develop this approach, several experiments are carried out to determine the values of two parameters: the optimal number of aspect clusters and the optimal number of used aspects. The results for identifying the optimal value for each parameter are described below.

1) DETERMINING THE OPTIMUM NUMBER OF CLUSTERS FOR ASPECTS
The results of phase 1 are four new datasets, each named Modified Dataset, which include the extended dataset along with the additional attributes specified for aspects. Each dataset is associated with a particular number of aspect clusters (K = 8, 10, 13, and 15). This step performs the CF rating prediction process on each dataset individually, using the kNNWithMeans algorithm, to determine the optimal number of aspect clusters. The CF performance for each dataset is reported in terms of the three metrics, and the dataset with the best CF performance determines the optimum number of aspect clusters (K). In particular, four experiments are conducted, each using five-fold cross-validation. In each fold, the dataset is split into two parts: training and testing (80% and 20%, respectively). The results of CF performance in terms of MAE, MSE, and RMSE are shown in Table 6. The best result in Table 6 is presented in boldface, which reflects the best CF performance for the rating prediction process using the kNNWithMeans algorithm. As a result, the optimum number for aspect clustering (K) is eight.
One interesting finding from Table 6 concerns the elbow method, which suggested K = 10 as the optimal value. To verify this, we conducted experiments using three other values close to the elbow's value (10) and reported the findings. K = 8 yields better results than K = 10, indicating that the value suggested by the elbow method should be validated with further experiments rather than relied on alone.
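The five-fold protocol used to compare the K values (each fold holding out 20% of the ratings for testing and training on the remaining 80%) can be sketched as below. The function names, the fixed seed, and the `evaluate` callback are hypothetical; in practice `evaluate` would train the CF model on the training indices and return an error metric such as RMSE on the test indices.

```python
import random


def k_fold_splits(n_ratings, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test


def cross_validated_error(n_ratings, evaluate, n_folds=5, seed=0):
    """Average the error returned by `evaluate(train, test)` over all folds."""
    errors = [evaluate(train, test)
              for train, test in k_fold_splits(n_ratings, n_folds, seed)]
    return sum(errors) / len(errors)


splits = list(k_fold_splits(100))
```

Running `cross_validated_error` once per candidate K and keeping the K with the lowest average error mirrors the selection procedure described above.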

2) DETERMINING THE NUMBER OF SIGNIFICANT ASPECTS
In this step, three experiments are performed, each of which uses a different number/form of aspects, as previously described. The step aims to verify whether all 49 aspects extracted from the Amazon movie dataset have the same impact on CF performance or whether some have a more significant impact than others. For each experiment, CF performance is evaluated in terms of the three metrics and then compared with CF performance using all extracted aspects organised into eight clusters (as previously indicated, the optimum number for aspect clustering is eight).
As before, five-fold cross-validation is carried out for each experiment, and the dataset is divided into training (80%) and testing (20%). The results of CF performance using the different forms of aspects are reported in Table 7. Table 7 shows that the All_aspects method produces the best results (shown in bold) compared to the other methods. However, the Frtop-10 and Sitop-10 methods also perform well, because their error metric values differ only slightly from those of the best method. This suggests that while all aspects are important, the top 10 aspects have the most significant impact on CF performance. Finally, while the Brilliant_aspect method has the lowest performance of all the methods, it still produces good results for the three metrics, since its results are not far from those of the best method. This highlights that the brilliant aspect can influence the neighbour selection process, which in turn affects CF performance.

V. EVALUATION AND ANALYSIS
The aspect-clustering-based CF approach (ASCF#8), in which all the aspects are organised into eight clusters, produces the best results in the prior experiments. The effectiveness of the ASCF#8 approach is evaluated by comparing its performance with three other approaches. The first is the aspect-based CF approach without clustering the aspects (ASCF#0). This comparison assesses how effectively the clustering process enhances CF performance, using the kNN algorithm (with k set to 50, as in the ASCF#8 approach) and the Euclidean similarity metric. The second is the multi-criteria CF approach (MCCF) proposed by Wasid and Ali [10], which clustered users based on their shared criteria (i.e., aspects). Our study employs a large-scale dataset, whereas Wasid and Ali's study employed a small-scale dataset. The MCCF approach used the Yahoo! Movies dataset, which has 62,156 ratings for 976 movies from 6,078 users. That dataset was further reduced to only 19,050 ratings provided by 484 users for 945 movies by keeping only users who gave ratings for at least 20 movies. This is not the case in our study, which concentrates on large-scale datasets. In addition, the kNN algorithm used for the MCCF approach's rating prediction has a k value of 30 in their study [10]. Because the dataset sizes in the two studies differ, we cannot rely solely on the neighbourhood size determined in their study (i.e., 30); three neighbourhood sizes are therefore tested for the MCCF approach in this evaluation: 10, 30, and 50. The last approach is the single-criterion CF that uses the Pearson correlation similarity metric (CFP). CFP relies only on the overall ratings for rating prediction and does not use aspects.
All the experiments use the Amazon movie dataset. Five-fold cross-validation is used for each approach, with the dataset split into 80% training and 20% testing for each fold. The experimental findings are shown in Table 8 in terms of MAE, MSE, and RMSE, together with the percentage improvement of the proposed approach over the baselines.
The results show that the proposed ASCF#8 approach considerably outperformed the baseline approaches on the three metrics. The results indicate that clustering aspects improves CF performance, as hypothesised in this study: compared to ASCF#0, the values of MAE, MSE and RMSE improve by 12.26%, 26.27%, and 14.13%, respectively. Additionally, the ASCF#8 and ASCF#0 approaches both perform better than the CFP approach, indicating that multi-criteria (i.e., aspect-based) CF performs better than single-criterion CF. Multi-criteria CF offers more information about user preferences, which helps identify the most suitable neighbours for the target user and enhances CF performance.
Moreover, according to the results in Table 8, the MCCF approach performs best when k = 10, and these results exceed those of the CFP approach, which is consistent with Wasid and Ali's findings [10]. On the other hand, the ASCF#8 approach significantly outperforms the MCCF (10) approach, with improvements of 9.85%, 18.21%, and 9.56% in terms of MAE, MSE and RMSE, respectively. One of the reasons for this finding is that, unlike Wasid and Ali's work, we rely on learned rather than fixed aspects. The learned aspects are extracted from the user reviews, whereas the fixed aspects are general ones. The fixed aspects are few and technical, which affects the identification of the appropriate neighbours, and they do not adequately reflect user preferences [1]. The results of both ASCF#8 and MCCF (10) also provide an interesting finding for this study: applying approaches designed for small-scale datasets to large-scale datasets does not work efficiently. Lastly, the MCCF (10) approach surpasses the ASCF#0 approach in terms of the error metrics, demonstrating the significance of the clustering process in improving CF performance.
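As a consistency check on the reported percentages, assume that the improvement is the relative reduction in error with respect to the baseline (an assumption, since the formula is not stated in the text). Because RMSE is the square root of MSE, the MSE improvement then follows directly from the RMSE improvement, and the reported figures agree with this relation to within rounding:

```latex
\text{Improvement} = \frac{E_{\text{baseline}} - E_{\text{proposed}}}{E_{\text{baseline}}} \times 100\%,
\qquad
\text{Improvement}_{\mathrm{MSE}} = 1 - \bigl(1 - \text{Improvement}_{\mathrm{RMSE}}\bigr)^{2}

% ASCF#8 vs ASCF#0:     1 - (1 - 0.1413)^2 = 1 - 0.8587^2 \approx 0.2626 \quad (\text{reported } 26.27\%)
% ASCF#8 vs MCCF (10):  1 - (1 - 0.0956)^2 = 1 - 0.9044^2 \approx 0.1821 \quad (\text{reported } 18.21\%)
```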

VI. CONCLUSION AND FUTURE WORK
In this study, we proposed an aspect-clustering-based CF approach to improve CF performance for the rating prediction process. The aim of using aspect clustering is to enhance the selection of the neighbourhood set by finding users with preferences similar to those of the target user, which impacts CF performance. Specifically, the approach aims to reduce the sparseness in the multi-criteria rating matrix by grouping aspects into clusters based on semantic similarity, which will be less expensive and require less memory to discover the neighbourhood set. Aspect clusters serve as the multiple criteria integrated into the aspect-based CF approach, and the similarities between users are calculated using the multi-dimensional Euclidean distance to identify the appropriate neighbours for the target user, namely those who share similar preferences on the available aspect clusters. The clustering process is done using the K-means algorithm, which proves its efficacy in aspect clustering, as shown in the results. In addition, different forms of aspects are proposed and assessed using the CF rating prediction algorithm to identify the number of aspects that significantly impacts CF performance. Experiments are carried out using the Amazon movie dataset to show the efficiency of the proposed approach in improving CF performance. Results show that grouping aspects into eight clusters and calculating user similarity based on these clusters significantly affects CF performance. Moreover, among the 49 extracted aspects, the top 10 aspects significantly impact CF performance. On the other hand, utilising all the aspects in the proposed approach is superior to utilising only the top 10 aspects. Finally, the proposed aspect-clustering-based CF approach outperformed the CF approach without clustering and the other baselines in prediction accuracy as measured by the MAE, MSE, and RMSE metrics.
The evaluation was conducted mainly on objective prediction accuracy, i.e., algorithm performance. However, prediction accuracy metrics do not replicate the real user experience. According to McLaughlin and Herlocker [58], precision, recall, and Normalised Discounted Cumulative Gain (NDCG) metrics reflect the user's real experience because, in most cases, users actually receive ranked lists from a recommender. Thus, our near-future work is to evaluate the proposed method based on these metrics.
Furthermore, we also plan to extract and explore more review elements and improve CF recommendation systems using deep learning techniques.