Novel SDDM Rating Prediction Models for Recommendation Systems

The accuracy of behavioral interactive features is a key factor in the performance of rating prediction. To explore the underlying rules of user behavior and represent interactive features more accurately, this paper proposes two rating prediction models based on the spatial dimension and distance measurement (SDDM). Both models take the mean value of the user behavior history as the user feature and obtain the interactive features of an item and a user by calculating the distance between them in each feature dimension. In the proposed SDDM-Var and SDDM-PCC models, the variance and the Pearson correlation coefficient (PCC), respectively, are used to evaluate the user's attention to each feature dimension so as to obtain the weight vector of the interactive features. Finally, to improve the generalization ability of the proposed models, the rating prediction is performed by a specially designed multi-layer full-connection neural network. Experiments on two public MovieLens datasets demonstrate the superior rating prediction performance of the proposed models in comparison with existing baseline models in terms of the root mean square error (RMSE): SDDM-Var and SDDM-PCC achieve 0.865 and 0.872, respectively, on MovieLens 100K, and 0.839 and 0.832, respectively, on MovieLens 1M.


I. INTRODUCTION
It is quite difficult for users in the 'Big Data' era to quickly find and obtain valuable knowledge from the massive information volumes presented by multiple sources. The emergence of recommendation systems has provided a generic solution to this problem. Such systems are now widely used in different fields, such as e-commerce, video and music streaming, news delivery, etc., and huge profits have been made by many Internet companies as a result of this [1], [2].
In the field of personalized recommendation systems, the most widely utilized approach is collaborative filtering (CF) [3], which is based on the user behavior sequence.
The associate editor coordinating the review of this manuscript and approving it for publication was Ehab Elsayed Elattar.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
CF provides recommendations based on computing the similarity between users or items, e.g., by using the cosine similarity or the Pearson correlation coefficient (PCC). CF is simple, quick, and efficient. However, it is difficult for traditional CF to meet current needs due to its limited generalization ability. To address this, matrix factorization (MF) [4] has been proposed, which maps users and items into the same potential feature space according to the rating matrix. The rating prediction can then be realized by calculating the inner product of the potential feature vectors of a user and an item. MF with gradient descent has been shown to deliver a clear performance improvement over traditional CF. However, the simple inner product operation still heavily limits the generalization ability of MF. With the development of deep learning (DL) technology in the past few years, deep neural networks (NNs) have evolved as a feasible way to improve the generalization ability of recommendation systems. For instance, by linking classical MF with NNs, the neural CF (NCF), proposed by He et al. in [5], uses a multi-layer perceptron to replace the inner product operation of MF, which effectively enhances the generalization ability of the model. However, NCF achieves better recommendation performance only at the expense of regularly training the potential feature vectors of users and items, and, due to the large number of users and items, such regular training consumes a lot of computing resources and time. As a solution to this problem, in some recent research works, the user features have been obtained from the mean value of the user behavior history [6], [7]. However, this approach often reduces the accuracy of rating prediction.
Consequently, some other researchers attempted to introduce time decay and attention mechanism for achieving more accurate user feature representation [8], [9]. Our previous research efforts in the field of rating prediction are presented in [10], [11].
In this paper, two rating prediction models, based on spatial dimension and distance measurement (SDDM), are proposed. The variance and PCC are used to explore the variation rules of potential feature vectors in the user behavior history, and a multi-layer full-connection NN is adopted to boost the accuracy of rating prediction and the generalization ability of the proposed models. The initial item features are obtained by pre-training, based on the Item2Vec model proposed in [12] for training item feature vectors. The user feature is the mean value of the potential feature vectors of the items in the user behavior history, while the interactive feature represents the difference (distance) between the item feature and the user feature in each dimension. To alleviate the limitation of using the historical mean value as a user feature, a weight vector for the interactive feature is constructed from the variance of the user behavior history or from PCC. Different from traditional CF, which uses PCC to measure the correlation between users or items [13], the research presented in this paper adopts PCC to measure the correlation between potential features and rating level, and the variance to evaluate the stability of potential features in the various dimensions. The results of experiments conducted on two public MovieLens datasets, using the mean absolute error (MAE) and root mean square error (RMSE) as evaluation metrics, indicate that the two proposed SDDM models achieve strong rating prediction performance.
In summary, the research work presented in this paper was focused on solving two main problems:
1) How to enhance the interactive feature representation by exploring the variation rules of the user potential features.
2) How to improve the generalization ability of the recommendation model so as to more accurately predict the user rating of items.
In order to solve these problems, we have conducted a lot of research and experiments. For this paper, the main contributions are as follows:
1) Two SDDM rating prediction models are proposed, whereby the interactive features of users and items are obtained by calculating the distance between them in each dimension.
2) The utilization of the variance and PCC is put forward to explore the variation rules of user potential features. Based on this, a weight vector of interactive features is constructed for the accurate representation of interactive features.
3) Under the premise of using the Item2Vec model for item feature pre-training, a specially designed multi-layer full-connection neural network is utilized to improve the generalization ability and rating prediction performance.

II. RELATED WORK

A. CONVENTIONAL RECOMMENDATION APPROACHES
The conventional recommendation approaches can be divided into three groups: content-based filtering (CBF), CF, and hybrid approaches utilizing both content and collaborative information. CBF [14] depends on the item portrait and user behavior. It searches for items similar to those the user has shown interest in, based on their portrait information in the user history, and recommends them to the user. CBF is widely used in industry due to its simplicity and efficiency.
However, when user-item ratings are the only information available for analysis, CF [15]-[17] is favored over CBF. A variety of CF recommendation models have been developed, such as SVD [18], SVD++ [19], and other MF-based models. Compared with the most basic CF, these models can improve the recommendation and generalization performance to a large extent. CF and MF are still hot research topics in the area of recommendation systems.

B. DEEP LEARNING-BASED RECOMMENDATION APPROACHES
Deep learning (DL) has been successfully applied in the fields of computer vision and natural language processing (NLP). A lot of research work has verified the excellent performance of DL in dealing with regression and classification tasks, e.g., [20], [21]. A new trend in recent years is represented by the DL-based recommendation approaches, initially developed by some large Internet technology companies. For instance, the wide & deep model, proposed by Google [22], combines the advantages of the logistic regression model and the NN model, resulting in better memorization and generalization abilities. The NN-based recommendation model, proposed by YouTube in the field of video streaming [23], is suitable for large-scale data screening recommendation scenarios. The deep interest network (DIN) model, proposed by Alibaba, can capture the user's interest direction by utilizing the attention mechanism, leading to better recommendation performance [24]. Academia has also paid a lot of attention to the use of DL in recommendation systems. For instance, Kuang et al. proposed the DMF-CDR model, which utilizes a multi-layer perceptron to learn the feature representation of users and items, and adds cross-domain information to alleviate the sparsity problem of CF and to improve the recommendation performance [25]. Based on a graph neural network (GNN), Xian et al. proposed the ReGNN model for recommendation tasks, which incorporates a repeated search mechanism and achieves more accurate prediction by modeling the repeated exploration behavior pattern of users [26]. These and other DL-based recommendation models, developed in recent years, demonstrate the advantages of DL technology in the field of recommendation systems.

C. ITEM2VEC MODEL
A common approach in this field is to construct a potential feature vector for each user and item, which is regarded as an embedding process. Being an adaptation of Word2Vec [27], [28], Item2Vec [12] is an item embedding training model that can adapt to different recommendation scenarios. Basically, Item2Vec maps all items to a potential feature space according to the user behavior history and calculates the similarity between items using the cosine of the angle between their vectors. In addition, the simple full-connection layer structure of Item2Vec ensures fast computation and good embedding performance. The optimization objective of Item2Vec is defined as follows:

$$\frac{1}{K}\sum_{i=1}^{K}\sum_{j \neq i}^{K} \log p(w_j \mid w_i)$$

where $w_i$ and $w_j$ are the center item and its corresponding surrounding items in any behavior sequence, and $K$ is the sequence length.

[...] of the interaction behavior of user $u_n$ with these items, the purpose of the proposed SDDM models is to predict the future interaction behavior $S_{u_n}^{i_m}$ of user $u_n$ towards item $i_m$, i.e., to predict any rating values $S_{u_n}^{i_m}$ missing in $S_{u_n}$.

B. OVERVIEW
The proposed SDDM models consist of the following two modules (Fig. 1):
1) Item feature pre-training module - based on Item2Vec [12], this module maps all items to the feature space by taking the user behavior record as input, in order to get item feature vectors that express user and item features in a multi-dimensional space. The user behavior record refers to the behavior history sets generated so far by the user's interaction with items, containing the rating values given by the user to different items (generally, a separate rating value is assigned by the user to each item feature). Following the Word2Vec concept, the behavior record can be interpreted as a sentence composed of different words. To enable faster convergence of the rating prediction algorithm executed in the other module, this module obtains the feature vector of an item from the co-occurrence probability of different items, using the Item2Vec model for pre-training instead of a common embedding layer.
2) Rating prediction module - based on a specially designed multi-layer full-connection NN, this module is used to improve the generalization ability of the proposed models by enhancing the feature representation. The input user and item feature vectors are each processed by the first two layers of the neural network, and then the difference between them in each feature dimension is calculated to obtain the interactive feature vector of the particular user and item. Finally, the Hadamard product of the interactive feature vector and the interactive weight is calculated, and the predicted rating is obtained through the final four layers of the full-connection NN. To compensate for the shortcomings of full-connection NNs, such as over-fitting, regularization parameters are added in the process of model training. The implementation of the designed multi-layer full-connection NN is described in detail in Subsection III.F.

C. PRE-TRAINING OF ITEM FEATURES
Pre-training of item features can be done in different ways, e.g., by utilizing the SVD model [18] or an embedding model [12], [27], [29]. The latter approach was utilized in the proposed SDDM models for the pre-training of item features, based on Item2Vec.
In the original Item2Vec model, all items in the behavior record of user $u_n$ are regarded as one sentence for training, without distinguishing between high and low ratings. However, the SDDM models proposed in this paper focus on explicit feedback behavior. So, for each user, the items in his/her behavior record are first divided into different sentences according to the rating value given by the user, so that the items in the same sentence are equally rated. Then these sentences are input into the Item2Vec model for training to obtain the set containing the feature vectors of all items. Each vector $v_{i_m}$ is a dense vector of dimension $j$, where $j$ represents the total number of item features.
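The rating-based splitting of a behavior record into sentences can be illustrated with a minimal sketch. The function name `rating_sentences`, the `(item_id, rating)` record layout, and the sample data are assumptions made for illustration only; the resulting sentences would then be passed to an Item2Vec (skip-gram) trainer, which is not shown here.

```python
from collections import defaultdict

def rating_sentences(history):
    """Group one user's (item_id, rating) records into per-rating 'sentences'.

    Items that received the same rating end up in the same sentence, so each
    sentence contains only equally rated items.
    """
    groups = defaultdict(list)
    for item_id, rating in history:
        groups[rating].append(item_id)
    # One sentence per rating level, in ascending rating order.
    return [groups[r] for r in sorted(groups)]

# Hypothetical behavior record of a single user (not taken from MovieLens).
history = [("m1", 5), ("m2", 3), ("m3", 5), ("m4", 1), ("m5", 3)]
sentences = rating_sentences(history)
print(sentences)
```

Within each sentence, the original order of the user's interactions is preserved, which matters if the embedding trainer uses a bounded context window.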
In the proposed SDDM models, the interactive features of a user and an item are calculated through the difference between them in each dimension. The interactive features need to reflect the similarity between the user and item features to a certain extent. However, the feature vector generated by Item2Vec cannot meet this requirement because Item2Vec measures the similarity between two feature vectors through the cosine of the angle between them, so the difference in each dimension cannot directly reflect the level of similarity. Consequently, it is necessary to normalize all generated item feature vectors so that each vector is transformed into a unit vector of length 1. So, in the SDDM models presented in this paper, the following normalization is applied to each vector:

$$v_{i_m} \leftarrow \frac{v_{i_m}}{\lVert v_{i_m} \rVert_2}$$
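The reason why unit-length normalization makes dimension-wise differences meaningful can be checked numerically: for two unit vectors $a$ and $b$, $\lVert a - b \rVert^2 = 2 - 2\cos(a, b)$, so a smaller distance corresponds to a larger cosine similarity. The following sketch verifies this identity; the helper names and sample vectors are illustrative assumptions.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two hypothetical item vectors, normalized to unit length.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

# Squared Euclidean distance, i.e. the sum of squared per-dimension differences.
sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
print(sq_dist, 2 - 2 * cosine(a, b))
```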

D. REPRESENTATION OF USER FEATURES
The user feature is the mean value, in each feature dimension, of the feature vectors of the items that received positive ratings from a user (as presented in his/her behavior history record). Given a set $V_q$, the interest feature matrix $M_{u_n}^{+}$ of user $u_n$ is formed, with one column per feature dimension. Then, the feature vector of user $u_n$ is obtained by average pooling of the item feature vectors. It can be considered that the user and item feature vectors are in the same potential space, as the former is obtained by average pooling of the item feature vectors present in the user behavior history record. A sample 3D feature space is shown in Fig. 2, where the blue arrows represent the feature vectors $v_1$ and $v_2$ of two items in the user behavior sequence, whereas the red arrow represents the user feature vector $v_u$ resulting from the average pooling of the feature vectors of these two items.
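Average pooling as used for the user feature can be sketched in a few lines. The function name `average_pool` and the two sample vectors (mirroring the two-item situation of Fig. 2) are assumptions for illustration.

```python
def average_pool(vectors):
    """Return the dimension-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one item vector is required")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# Two hypothetical unit item vectors in a 3D feature space.
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 1.0, 0.0]
v_u = average_pool([v1, v2])
print(v_u)
```

Note that the pooled vector is generally shorter than unit length even when all item vectors are unit vectors, which is one reason why the mean alone is a coarse summary of the user's history.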

E. FEATURE INTERACTION
Feature interaction produces an interactive feature vector from the user and item feature vectors according to interaction rules, which is similar to the concatenation operation performed on the user and item features. Before feature interaction, the user and item feature vectors are each mapped through the first two layers of the designed full-connection NN so as to enhance the feature representation. In this mapping, $\psi_{None}$ denotes the full-connection NN, where None means that no activation function is set.
The interactive feature is a vector composed of the distances between the user feature and the item feature in each dimension. In this step, the user and item feature vectors can each be regarded as a point in the potential space, and the distance between the two points in each dimension can be deemed to reflect, to some extent, the similarity or difference between the two entities. Thus, the interactive feature vector of item $i_m$ and user $u_n$ is given by their difference in each dimension. However, it is hard to achieve good performance by using only these interactive features as a basis for rating prediction because: (i) the user features are obtained by average pooling at the beginning, which leads to some inaccuracy; and (ii) not all dimensions are of equal importance/interest to users, so the distance in some dimensions cannot serve as a real basis for rating prediction. Therefore, it is necessary to determine which dimensions of the interactive features are meaningful and to weaken the influence of meaningless or less significant dimensions. Consequently, two different methods (the variance and PCC) are used in this paper to capture the user's interest in each dimension, based on the user behavior history record, which results in two different models, called SDDM-Var and SDDM-PCC, respectively. The two ways of calculating the weight $w_{u_n}^{(k)}$ of the $k$-th dimension for user $u_n$ are described in the following two subsections.

1) CALCULATION OF DIMENSIONS' WEIGHT USING VARIANCE
According to (3), the variance of each column of matrix $M_{u_n}^{+}$ is calculated, considering only the positive feedback of user $u_n$. The smaller the variance, the more stable the value of the corresponding dimension and the stronger its reference value. Since the purpose of the variance calculation is to get the weight of the interactive features, the smaller the variance of a dimension, the larger its weight. Therefore, the weight $w_{u_n}^{(k)}$ of the $k$-th dimension for user $u_n$ is obtained from the variance of that dimension, where $\bar{v}^{(k)}$ represents the mean value of the $k$-th dimension in the behavior history record of user $u_n$, and $\gamma$ is a user-defined parameter.
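The idea that a smaller per-dimension variance should yield a larger weight can be sketched as follows. This is only an illustration under stated assumptions: the mapping `exp(-gamma * variance)` is a hypothetical monotonically decreasing function chosen for the sketch and is not claimed to be the weighting formula of the SDDM-Var model; the population variance and the sample matrix are likewise assumptions.

```python
import math

def column_variances(matrix):
    """Population variance of each column of a row-major matrix."""
    n = len(matrix)
    dim = len(matrix[0])
    means = [sum(row[k] for row in matrix) / n for k in range(dim)]
    return [sum((row[k] - means[k]) ** 2 for row in matrix) / n
            for k in range(dim)]

def variance_weights(matrix, gamma):
    """Map each column variance to a weight that shrinks as variance grows.

    ASSUMPTION: exp(-gamma * var) is a stand-in mapping, not the paper's formula.
    """
    return [math.exp(-gamma * var) for var in column_variances(matrix)]

# Hypothetical positive-feedback matrix: rows are items, columns are dimensions.
m_pos = [[0.5, 0.1],
         [0.5, 0.9],
         [0.5, 0.5]]
weights = variance_weights(m_pos, gamma=15)
print(weights)
```

In this example the first dimension is perfectly stable across the user's positively rated items and therefore receives the largest weight, while the widely varying second dimension is strongly down-weighted.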

2) CALCULATION OF DIMENSIONS' WEIGHT USING PCC
When PCC is used for calculating the dimensions' weight, all rating values (both negative and positive) given by a user (as presented in his/her behavior history record) are considered, not just the positive ones, and the interactive feature vector of item $i_m$ and user $u_n$ is computed accordingly. Then, the weight $w_{u_n}^{(k)}$ of the $k$-th dimension for user $u_n$ is obtained by using PCC, where $r_{u_n}^{i_m}$ is the rating value given by user $u_n$ to item $i_m$ and $\bar{r}_{u_n}$ is the average value of all ratings in the behavior history record of user $u_n$.
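A per-dimension PCC weight can be sketched as the Pearson correlation between the values of one feature dimension and the user's ratings. Which feature values enter the correlation (raw item features or interactive features) is an assumption of this sketch, as is the choice to return 0 when the correlation is undefined for a constant column.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0.0 or dev_y == 0.0:
        # ASSUMPTION: an undefined correlation is treated as "no correlation".
        return 0.0
    return cov / (dev_x * dev_y)

def pcc_weights(features, ratings):
    """One PCC value per feature dimension (column) against the ratings."""
    dim = len(features[0])
    return [pearson([row[k] for row in features], ratings) for k in range(dim)]

# Hypothetical history: rows are rated items, columns are feature dimensions.
features = [[0.1, 0.5], [0.2, 0.5], [0.3, 0.5], [0.4, 0.5]]
ratings = [2, 3, 4, 5]
weights = pcc_weights(features, ratings)
print(weights)
```

Here the first dimension rises linearly with the rating and gets a weight close to 1, whereas the second dimension is constant and carries no information about the rating.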
The process of interactive weight calculation is formally presented as Algorithm 1.

Algorithm 1 Calculation of Dimensions' Weight for User $u_n$
Input: i_vec_List (feature vectors of items in the behavior history record), i_rat_List (item rating record), γ (user-defined parameter), dim (dimension of feature vector)
Output: dw (dimensions' weight)
start:
  Define dw as a list
  for k = 1 to dim
    if using PCC: calculate $w^{(k)}$ from i_vec_List and i_rat_List, as described in Subsection III.E.2
    if using Variance: calculate $w^{(k)}$ from i_vec_List and γ, as described in Subsection III.E.1
    append $w^{(k)}$ to dw
  end for
  return dw
end

After calculating the weight of each dimension (either by using the variance or PCC), the final interactive feature vector is obtained as the Hadamard product of the interactive feature vector and the weight vector.
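The combination of the dimension-wise difference with the dimension weights by a Hadamard (element-wise) product can be sketched as follows. The function names and sample values are hypothetical, and the use of a signed difference (rather than, e.g., an absolute difference) is an assumption of the sketch.

```python
def interactive_feature(item_vec, user_vec):
    """Dimension-wise difference between item and user vectors.

    ASSUMPTION: a signed difference is used here for illustration.
    """
    return [i - u for i, u in zip(item_vec, user_vec)]

def weighted_interaction(item_vec, user_vec, weights):
    """Hadamard product of the interactive feature vector and the weights."""
    diff = interactive_feature(item_vec, user_vec)
    return [d * w for d, w in zip(diff, weights)]

# Hypothetical 3-dimensional item vector, user vector, and weight vector.
g = weighted_interaction([0.5, 0.25, 1.0], [0.25, 0.75, 1.0], [1.0, 0.5, 2.0])
print(g)
```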

F. OUTPUT OF RATING PREDICTION
In the proposed SDDM models, a specially designed multi-layer full-connection NN is used for rating prediction. In the expression for the output $\hat{y}$ of the rating prediction module (cf. Fig. 1), $f_{LReLu}$ denotes a full-connection NN layer using Leaky ReLU as the activation function, $\psi_{None}$ denotes a full-connection NN layer without an activation function, $g$ denotes the calculation of the final interactive feature vector, $b_{u_n}$ and $b_{i_m}$ are the user and item offset terms (cf. [4]), and $\tau$ ($0 \le \tau \le 1$) is the offset term coefficient. When $\tau > 0.5$, the predicted rating leans towards user personalization; otherwise, it leans towards item popularity.
In the training process of the rating prediction module, the output of the $l$-th layer of the full-connection NN in forward propagation can be expressed as:

$$f_a(W_l X_l + B_l)$$

where $X_l$ denotes the input of the $l$-th layer of the full-connection NN, $f_a$ is the activation function, and $W_l$ and $B_l$ are the weight and bias, respectively, which are constantly updated during the training.
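The forward pass of a single full-connection layer, with and without an activation function, can be sketched in plain Python. The negative slope of 0.01 for Leaky ReLU and all numeric values are assumptions for illustration; in practice a DL framework would provide these operations.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x

def dense_forward(x, weight, bias, activation=None):
    """Compute activation(W x + B) for one layer; activation=None keeps it linear."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weight, bias)]
    if activation is None:
        return out
    return [activation(o) for o in out]

# Hypothetical 2-input, 2-output layer.
x = [1.0, -2.0]
W = [[1.0, 0.5],
     [0.5, 1.0]]
B = [0.5, 0.0]

linear = dense_forward(x, W, B)                          # layer without activation
hidden = dense_forward(x, W, B, activation=leaky_relu)   # layer with Leaky ReLU
print(linear, hidden)
```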
In the process of backward propagation, the mean square error (MSE) function is used as the loss function and L2 regularization is added to prevent the model from overfitting. In the loss, $y_i$ and $\hat{y}_i$ are the real and predicted rating values, respectively, $k$ is the number of samples in the current training batch, $L$ is the total number of NN layers, and $\lambda$ is the regularization coefficient. The rating prediction process is formally presented as Algorithm 2.
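An MSE loss with an L2 penalty over the layer weights can be sketched as below. The exact scaling of the two terms is an assumption of this sketch (batch mean of squared errors plus the regularization coefficient times the sum of squared weights), and the value `lam=0.5` is chosen only to keep the arithmetic easy to follow.

```python
def mse_l2_loss(y_true, y_pred, layer_weights, lam):
    """Batch MSE plus lam times the sum of squared weights of all layers.

    ASSUMPTION: constant factors may differ from the loss used in the paper.
    """
    k = len(y_true)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / k
    l2 = sum(w * w for matrix in layer_weights for row in matrix for w in row)
    return mse + lam * l2

# Hypothetical batch of two ratings and two tiny weight matrices.
loss = mse_l2_loss(
    y_true=[4.0, 2.0],
    y_pred=[3.0, 2.0],
    layer_weights=[[[1.0, 2.0]], [[2.0]]],
    lam=0.5,
)
print(loss)
```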

A. DATASETS
To evaluate the rating prediction performance of the proposed models in comparison with the existing baselines, experiments were conducted on two public movie rating datasets: MovieLens 100K (ML-100K) and MovieLens 1M (ML-1M). ML-100K contains 100,000 ratings from 1000 users on 1700 movies, whereas ML-1M contains one million ratings from 6000 users on 4000 movies. In the experiments, the ratings in each dataset were divided into five groups, corresponding to the rating values, ranging from 1 to 5. In terms of the user rating behavior, only users who provided at least 20 movie ratings were considered in the experiments.

[...] $\psi_{None}$(i_vec))
Calculate $v_{i-u}$ based on u_vec and i_vec, using (6) and (9)
Calculate Loss($\hat{y}$, $y_{label}$)
[...]
Update all NN parameters in $f$

For evaluating the rating prediction performance of the different models included in the comparison, the standard metrics RMSE and MAE were used, defined as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad MAE = \frac{1}{N}\sum_{i=1}^{N}\left\lvert y_i - \hat{y}_i \right\rvert$$

where $y_i$ and $\hat{y}_i$ are the real and predicted rating values, respectively, and $N$ is the number of ratings in the test set.
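The two evaluation metrics can be computed with a few lines of code. The function names and the sample ratings below are illustrative assumptions, not values from the experiments.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between real and predicted ratings."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error between real and predicted ratings."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

# Hypothetical real and predicted ratings.
y_true = [4.0, 3.0, 5.0, 1.0]
y_pred = [3.0, 3.0, 4.0, 3.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because RMSE squares the errors before averaging, the single large error in this example (2 rating points) raises RMSE above MAE; RMSE is never smaller than MAE on the same data.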

C. MODELS FOR COMPARISON
The two proposed models, SDDM-Var and SDDM-PCC, were compared to the following eight baseline models:
• IGMC [30] - an inductive graph-based matrix completion model, which can achieve good recommendation performance without using any auxiliary information.
• GC-MC [31] - a graph autoencoder model, based on differentiable message passing on a bipartite interaction graph.
• Factorized EAE [32] - a DL-based model for cross-domain recommendation, with good generalization performance.
• NNMF [33] - a recommendation model combining NN and MF, whereby the inner product operation of MF is replaced by a multi-layer perceptron.
• MetricF [34] - a metric factorization model, which uses the Euclidean distance (a metric satisfying the required inequality property) to measure the explicit proximity between users and items for rating prediction or personalized ranking.
• AutoSVD++ [35] - a hybrid model, which integrates a contractive autoencoder into the MF framework.
• HGAR [36] - a recommendation model, which combines the SVD algorithm and a multi-layer perceptron while also considering both implicit and explicit information.
• SSAERec [37] - a rating prediction model, based on a stacked sparse autoencoder and MF.

D. EXPERIMENT SETTINGS
The experiments with the proposed models were carried out under the DL framework PyTorch. Each dataset was divided according to the rating level by means of stratified random sampling, in which the training set and the test set account for 80% and 20%, respectively. In addition, the experiments with the proposed models were conducted in two steps: pre-training of item features and rating prediction. When using Item2Vec to pre-train item features, the Skip-gram training method was applied with 150 iterations. In the rating prediction step, the learning rate, the regularization coefficient, and the bias term coefficient were set to α = 0.005, λ = 0.005, and τ = 0.2, respectively. For calculating the weight of dimensions based on the variance, the value of parameter γ was set to 15.

Fig. 3 and Fig. 4 show the downward trend of MAE and RMSE, respectively, with an increasing number of epochs (footnote 2) for the two proposed models, SDDM-Var and SDDM-PCC, on the ML-100K and ML-1M datasets, whereby the item and user feature dimensions (footnote 3) are set to 128 for both models. From Fig. 3, the following observations can be made: (i) with an increasing number of epochs, the decrease of MAE is more stable for both models on ML-100K; (ii) on ML-1M, SDDM-PCC needed 35 epochs to start performing stably better than SDDM-Var (in terms of MAE); however, on ML-100K, SDDM-PCC is generally outperformed by SDDM-Var; (iii) on ML-1M, the MAE values for both models are lower than those on ML-100K, as a result of the much greater amount of data available in ML-1M, which allows the neural network to better learn the sample features of the data and achieve better rating prediction performance. From Fig. 4, it can be observed that the downward trend of RMSE closely follows that of MAE depicted in Fig. 3. Here, 30 epochs were needed for SDDM-PCC to start performing stably better (in terms of RMSE) than SDDM-Var on ML-1M. However, on ML-100K, SDDM-PCC is overall outperformed by SDDM-Var.
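The stratified 80%/20% split by rating level mentioned in the experiment settings can be sketched with the standard library. The `(user, item, rating)` record layout, the function name `stratified_split`, the fixed seed, and the synthetic records are assumptions of this sketch rather than details of the actual experimental code.

```python
import random
from collections import defaultdict

def stratified_split(records, train_frac=0.8, seed=0):
    """Split (user, item, rating) records so each rating level keeps the same ratio."""
    rng = random.Random(seed)
    by_rating = defaultdict(list)
    for rec in records:
        by_rating[rec[2]].append(rec)
    train, test = [], []
    for rating in sorted(by_rating):
        group = by_rating[rating][:]
        rng.shuffle(group)  # random sampling within each rating level
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Synthetic records: 10 ratings for each of the five rating levels.
records = [(u, m, r) for r in range(1, 6) for u in range(2) for m in range(5)]
train, test = stratified_split(records)
print(len(train), len(test))
```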
Footnote 2: Epoch - a parameter defining the number of times a model has worked through the entire training set.
Footnote 3: Feature dimension represents the amount of information expressed by a feature vector.

As different values of the feature dimension (d) may result in different prediction performance, additional experiments were carried out to investigate this. Tables 1 and 2 show the obtained MAE and RMSE results, respectively, for both proposed models after 70 epochs. From these results, it can be seen that both MAE and RMSE reach their highest value when d = 32, after which they steadily decrease, reaching a minimum when d = 128, except for the SDDM-PCC model applied to the ML-100K dataset, where the minimum is reached earlier: for MAE when d = 64, and for RMSE when d = 96. Thus, d = 128 is recommended as a good balance between the calculation time/cost and the prediction performance, as a further increase of the feature dimension leads to a significant increase of the calculation time and cost.

2) PERFORMANCE COMPARISON OF ALL MODELS
The rating prediction performance comparison of all considered models (in terms of RMSE) is presented in Table 3. The RMSE results for the proposed models, SDDM-Var and SDDM-PCC, were obtained from the conducted experiments, whereas the RMSE values for the baselines were all taken from the corresponding papers. As most of these papers do not provide MAE results, this metric was not used in this comparison.
The RMSE results in Table 3 clearly demonstrate that both proposed models outperform all the baseline models, on both datasets, with only one exception on ML-1M, namely the GC-MC model, which matches the rating prediction performance of the leader, SDDM-PCC. This superior performance of the proposed models is due to the fact that they fully consider the changes of item feature vectors in the user behavior history records in each dimension, which, compared to the baselines, allows them to more accurately grasp the users' interests in each feature dimension.

V. CONCLUSION
This paper has put forward two recommendation models, based on the spatial dimension and distance measurement (SDDM), for rating prediction by means of utilizing a specially designed multi-layer full-connection neural network. In the proposed models, the interactive features are obtained by calculating the distance between the user feature and item feature in each feature dimension. In addition, in order to achieve better prediction performance, the user's attention to different feature dimensions is fully considered and the interaction weight is obtained by utilizing the variance and the Pearson correlation coefficient (PCC), in the presented models, SDDM-Var and SDDM-PCC, respectively. The conducted experiments on two public MovieLens datasets confirmed that both proposed models outperform (in terms of RMSE) the baseline models considered.
Although the rating prediction performance of the proposed models is sufficiently high, there is still room for further improvement. For instance, extra user features, such as age and gender, as well as other relevant item attributes can be taken into account to improve the performance of the models. Extra work is also needed to achieve feature crosses. All this will be a subject of our future research work. In addition, we plan to optimize both models to work faster with high values of the feature dimension, so as to make them more suitable for real-time recommendations.

DIAO

IVAN GANCHEV (Senior Member, IEEE) received the degree (summa cum laude) in engineering and the Ph.D. degree (summa cum laude) from the Saint-Petersburg University of Telecommunications, in 1989 and 1995, respectively.
He is currently a Full Professor with the University of Plovdiv ''Paisii Hilendarski,'' an International Telecommunications Union (ITU-T) Invited Expert, and an Institution of Engineering and Technology (IET) Invited Lecturer. He was involved in more than 40 international and national research projects. He has served on the TPC of more than 330 prestigious international conferences/symposia/workshops, and has authored/coauthored one monographic book, three textbooks, four edited books, and more than 280 research papers in refereed international journals, books, and conference proceedings. He is also on the editorial board of multiple international journals and served as a guest editor for multiple international journals.