Dual Auto-Encoder Based Rating Prediction Recommendation Algorithm

Collaborative filtering is the most widely used method in recommendation algorithms, but it still faces the serious problem of data sparsity. Traditional collaborative filtering uses matrix decomposition to learn the latent features of users and items. As an extension of matrix decomposition, the Funk-SVD model has attracted wide attention due to its good scalability and easy implementation, but it has difficulty extracting the latent features of users and items from sparse rating information because it essentially learns only the linear relationship between users and items. To solve this problem, we propose a Dual auto-encoder based Rating Prediction Recommendation Algorithm (DRPRA). The DRPRA model exploits the strong feature-learning ability of deep learning by combining dual auto-encoders with Funk-SVD. First, the two auto-encoders capture the latent features of users and items respectively. Then, Funk-SVD combines the user features with the item features to reconstruct the rating matrix. Finally, we minimize the error between the original rating matrix and the reconstructed rating matrix, which alleviates the data sparsity problem and effectively improves the accuracy of rating prediction. We conducted extensive experiments on the Movielens-100K, MovieTweetings-10K, and Film Trust datasets, and the results show that the rating prediction model based on dual auto-encoders achieves superior recommendation performance.

preferences, so they may be interested in the same item. By calculating the similarity between users, the historical data of similar users with high similarity to the target user are used for prediction [3]. Item-based CF is based on the similarity between items; it determines the items similar to a user's historical items according to the historical ratings and recommends the items with high similarity to the target user [4]. Model-based CF relies on matrix factorization (MF), Funk-SVD, and other machine learning models; it uses the existing sparse data to predict missing user-item ratings and recommends the items with the highest predicted ratings to users [5]. Funk-SVD learns the linear relationship between users and items. In fact, the relationship between users and items is complex and non-linear, so it is difficult for Funk-SVD to capture the complex interaction between users and items.

[…] [10], GAN [11], [12], and AE [13], [14] have been applied to recommendation systems due to their advantages in feature learning. Based on the above, we propose a rating prediction […]

[…]ings into the matrix co-decomposition model. A method of calculating and quantifying the similarity values of different users is proposed to weight the identification of multiple neighboring users, and the identified neighboring users are applied to the rating prediction task of the matrix decomposition model [18]. Al-Shamri et al. argue that an unknown user rating value can be predicted directly, without finding and weighting similar users [19]. The above models introduce objective features of users or items as auxiliary information and search for possible associations between users based on these objective features, which improves rating prediction accuracy. PKER [20] introduces knowledge graphs for item representation and feeds them as auxiliary information into an extensible auto-encoder to alleviate the data sparsity problem. Agrec [21] treats the rating matrix as a graph and extracts graph features of items as higher-level feature representations to improve recommendation accuracy. GraphRec [22] uses user-item symbiotic graphs (bipartite graphs) to construct generic user and item attributes; this graph does not require external information to alleviate the sparsity problem and obtains better recommendation results. CAPR [23] uses auto-encoders with graph regularization to extract user features and construct higher-level representations, obtaining better user-item interaction features.

In this paper, we also use dual auto-encoders to extract user and item features for recommendation, but we introduce more important user and item attributes, such as the user's zip code (which contains the user's geographic information), the item name, and […]

The overall rating vector of all M users is denoted as $R_U \in \mathbb{R}^{M \times N}$; the rating vector of item $i$ is denoted as $r_i = (R_{1i}, \cdots, R_{Mi}) \in \mathbb{R}^{M}$, and the overall rating vector of all N items is denoted as $R_I \in \mathbb{R}^{M \times N}$. Each user and item has unique attribute features, such as the release time of a movie, the composition time and genre of a song, or the age and occupation of a user. We take full account of the user and item attribute features by one-hot encoding them. The attribute feature vector of item $i$ is denoted as $A_I \in \mathbb{R}^{W \times N}$, and the attribute feature vector of user $u$ is denoted as $A_U \in \mathbb{R}^{M \times Y}$.
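As a concrete illustration of the one-hot encoding and feature fusion described above, the following sketch builds a toy $\mathrm{cat}(R_I; A_I)$ matrix; the dimensions, genre names, and ratings are invented for illustration only and are not values from the paper.

```python
import numpy as np

# Toy dimensions and attribute values (assumptions for illustration only).
M, N = 4, 3                                     # number of users and items
genres = ["action", "comedy", "drama"]          # hypothetical item attribute vocabulary
item_genre = ["comedy", "action", "drama"]      # genre of each of the N items

# Sparse rating matrix R (0 = unobserved), shape: users x items.
R = np.array([[5, 0, 3],
              [0, 4, 0],
              [1, 0, 0],
              [0, 2, 5]], dtype=float)

# One-hot encode the item attribute: A_I has shape (W, N) with W = len(genres).
A_I = np.zeros((len(genres), N))
for i, g in enumerate(item_genre):
    A_I[genres.index(g), i] = 1.0

# R_I: each column is the rating vector of one item, shape (M, N).
R_I = R

# cat(R_I; A_I): stack ratings and attributes column-wise, shape (M + W, N).
cat_item = np.vstack([R_I, A_I])
print(cat_item.shape)   # (7, 3)
```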

To incorporate the auxiliary feature information, we fuse the overall item rating vector $R_I$ with the item attribute feature vector $A_I$ as the input of one auto-encoder to learn the latent features of item $i$, and fuse the overall user rating vector $R_U$ with the user attribute feature vector $A_U$ as the input of another auto-encoder to learn the latent features of user $u$. The fusion of the overall item rating vector and the item attribute feature vector, used as the overall feature vector of the item, is defined as $\mathrm{cat}(R_I; A_I)$; the fusion of the overall user rating vector and the user attribute feature vector, used as the overall feature vector of the user, is defined as $\mathrm{cat}(R_U; A_U)$, as shown in Eq. (1). […]

To further alleviate the data sparsity problem and improve the accuracy of recommendation algorithms on the rating prediction task, we design the Dual auto-encoder based Rating Prediction Recommendation Algorithm (DRPRA), whose framework is shown in Figure 1. We utilize auto-encoders to learn the latent features of both users and items, and minimize the deviation on the training data through the user and item representations learned by Funk-SVD.

We combine the overall rating vector $R_I$ of the items with the item attribute feature vector $A_I$ as the input to an auto-encoder to learn the latent features of item $i$. We denote by $\mathrm{cat}(R_I; A_I) \in \mathbb{R}^{(M+W)\times N}$ the concatenation of $R_I$ and $A_I$, where $R_I$ denotes the rating vectors of all N items and $A_I$ denotes the attribute feature vectors of the items. Using $\mathrm{cat}(R_I; A_I)$ as the input to the auto-encoder, the learned latent features of the items are denoted $p_i$. The purpose of an auto-encoder is to approximate its initial input; here we use the auto-encoder to make the output approximate the rating part of the input while extracting the latent features $p_i$ of the items. The encoding, the decoding, and the loss function of this part are shown in Eq. (3), Eq. (4), and Eq. (5). […]

In the above equations, $Q_u \in \mathbb{R}^{(N+Y)\times H}$ and $Q_u' \in \mathbb{R}^{H\times N}$ are the weight matrices, $b_u \in \mathbb{R}^{H}$ and $b_u' \in \mathbb{R}^{N}$ are the bias terms, and $g(\cdot)$ and $f(\cdot)$ are the activation functions, where $g(\cdot)$ uses Sigmoid and $f(\cdot)$ uses Identity.
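The following is a minimal sketch of one such semi-auto-encoder (here for the item side), assuming the encoder uses a Sigmoid activation, the decoder uses the Identity activation, and the reconstruction error is measured only on the rating part of the input, as described above; the shapes, initialization, and variable names are our own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M, W, N, H = 4, 3, 3, 2                      # users, item-attribute dims, items, hidden units
rng = np.random.default_rng(0)

cat_item = rng.random((M + W, N))            # stands in for cat(R_I; A_I)
R_I = cat_item[:M, :]                        # rating part of the input
mask = (R_I > 0).astype(float)               # only observed ratings contribute to the loss

# Encoder / decoder weights and biases of the item semi-auto-encoder.
Q_i     = rng.normal(scale=0.1, size=(M + W, H))   # encoder weight matrix
Q_i_dec = rng.normal(scale=0.1, size=(H, M))       # decoder weight matrix
b_i     = np.zeros(H)
b_i_dec = np.zeros(M)

# Encoding: latent item features p_i, one H-dimensional column per item.
P = sigmoid(Q_i.T @ cat_item + b_i[:, None])       # shape (H, N)

# Decoding with the Identity activation: reconstruct only the rating part.
R_I_hat = Q_i_dec.T @ P + b_i_dec[:, None]         # shape (M, N)

# Reconstruction loss over the observed ratings only.
loss = 0.5 * np.sum(mask * (R_I - R_I_hat) ** 2)
print(loss)
```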

Require: The rating matrix $R \in \mathbb{R}^{M \times N}$, the number of hidden neurons $h$, and the dimensions of the user and item attribute vectors.
Ensure: The prediction matrix $\hat{R}_{ui} = q_u^T \cdot p_i$.
1: Get the attribute information vector $a_u$ for each user;
2: Get the attribute information vector $a_i$ for each item;
3: Get the concatenated vectors $(R_U, A_U)$ and $(R_I, A_I)$ of the users' and items' attribute vectors and the corresponding rating vectors;
4: Initialize $Q_u$, $Q_u'$, $Q_i$, $Q_i'$ with truncated normally distributed random numbers, and set $p_i$ and $q_u$ to zero vectors;
5: Input $(R_U, A_U)$ and $(R_I, A_I)$ to the two semi-auto-encoders;
6: Minimize Eq. (11) using a stochastic gradient descent algorithm until the algorithm converges.

Here, the latent features of users and items are obtained separately using the auto-encoders: the high-dimensional sparse data are mapped into a low-dimensional latent space by the auto-encoder model, and the prediction rating matrix $q_u^T \cdot p_i$ is reconstructed in the low-dimensional space using the Funk-SVD technique, so that the final prediction matrix $q_u^T \cdot p_i$ has the lowest deviation from the original rating matrix; the loss function of this part is shown in Eq. (9). Overfitting is a common problem in the training of recommendation models; to avoid it, we introduce a regularization term on the auto-encoder weight matrices, as shown in Eq. (10).
From this we obtain the loss function of the DRPRA model, as shown in Eq. (11).
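A minimal sketch of how such a combined objective could be assembled is given below; we assume here that α and β weight the user and item auto-encoder reconstruction errors, that γ weights the L2 regularization of the weight matrices, and that the Funk-SVD error is taken over the observed entries only (the exact grouping of terms in Eq. (11) should be checked against the paper).

```python
import numpy as np

def drpra_loss(R, mask, R_U_hat, R_I_hat, P_u, P_i, weight_matrices, alpha, beta, gamma):
    """Assemble a DRPRA-style objective (sketch only; see the assumptions above)."""
    # Funk-SVD term: error between the observed ratings and q_u^T * p_i.
    R_pred = P_u.T @ P_i                               # (M, N) prediction matrix
    svd_term = np.sum(mask * (R - R_pred) ** 2)

    # Auto-encoder reconstruction terms on the rating parts.
    user_term = np.sum(mask * (R - R_U_hat) ** 2)
    item_term = np.sum(mask * (R - R_I_hat) ** 2)

    # L2 regularization over the auto-encoder weight matrices.
    reg_term = sum(np.sum(W ** 2) for W in weight_matrices)

    return svd_term + alpha * user_term + beta * item_term + gamma * reg_term

# Tiny usage example with random placeholder tensors.
rng = np.random.default_rng(0)
M, N, H = 4, 3, 2
R = rng.integers(0, 6, size=(M, N)).astype(float)
mask = (R > 0).astype(float)
P_u, P_i = rng.random((H, M)), rng.random((H, N))
Q_list = [rng.normal(size=(5, H)) for _ in range(4)]   # stand-ins for Q_u, Q_u', Q_i, Q_i'
print(drpra_loss(R, mask, R, R, P_u, P_i, Q_list, alpha=0.5, beta=0.5, gamma=1.0))
```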
Considering the problem of model overfitting, we utilize […]

As shown in Table 1, specific information about the datasets is presented in detail. In these datasets, we use not only the user-item rating matrix, but also the user's age, gender, occupation, and zip code (which implicitly contains the user's location information), as well as the year, genre, theme, and name of the item (these are key factors influencing the user's preferences and play an important role in training). Movielens-100K includes 100,000 ratings of 1,682 movies from 943 users. The user data provides demographic information in three fields: gender, age, and occupation; the movie data includes the movie title and genre; and the rating data includes the user ID, the movie ID, and the timestamp of each rating, with ratings ranging from 1 to 5.

The MovieTweetings-10K dataset is an extremely sparse dataset with ratings ranging from 1 to 10. In the experiments, we processed the dataset to retain only users with at least 10 rated items; this dataset contains only item attribute information and no user attribute information.
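A minimal sketch of this kind of filtering is shown below, assuming the raw ratings are available as a pandas DataFrame with user_id, item_id, and rating columns (the column names and the toy data are our assumptions, not the actual preprocessing script).

```python
import pandas as pd

# Hypothetical toy ratings; the real data would be loaded from the MovieTweetings files.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 13, 10],
    "rating":  [8, 7, 9, 6, 10, 5],
})

# Keep only users who rated at least MIN_RATINGS items.
MIN_RATINGS = 10
counts = ratings.groupby("user_id")["item_id"].transform("count")
filtered = ratings[counts >= MIN_RATINGS]
print(filtered)
```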

The Film Trust dataset contains 35,497 ratings of 2,071 movies by 1,508 users, with ratings ranging from 0.5 to 4. There are no attribute features of users or items in this dataset, so we exploit only the rating information, and we additionally compare the proposed algorithm with trust-based algorithm models.

In recommendation systems, the evaluation criteria are an important concern in assessing the accuracy of prediction. We measure accuracy with the root mean square error (RMSE) and the mean absolute error (MAE), which reveal the deviation between the predicted values in the experimental results and the corresponding values in the validation dataset; minimizing these error values yields the best prediction performance. They are calculated as shown in Eq. (13) and Eq. (14).
In the above formulas, $R_{u,i}$ is the global rating matrix and $\hat{R}_{u,i}$ is the prediction matrix. Obviously, the smaller the values of MAE and RMSE, the better the recommendation performance of the algorithm.

In the experiments on parameter settings, we set the […]; the optimizer is a mini-batch SGD with a batch size of 128. As shown in Figure 2 and Figure 3, we use histograms to show the experimental results of the DRPRA algorithm more intuitively.

Here, we tested the effect of the parameters α, β, and γ in Eq. (11) by randomly sampling the dataset at a ratio of 80%. Initially, α = 0.5, β = 0.5, and γ = 1. Next, we utilized the control-variable method to choose the optimal parameters of the model: each of the three parameters α, β, and γ was in turn sampled from {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5} while the remaining two parameters were kept constant. As shown in Figure 4, we analyse the effects of the parameters α, β, and γ on the datasets.
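As a small illustration of the evaluation criteria described above, the sketch below computes RMSE and MAE over the observed entries of a test rating matrix; the toy matrices are placeholders, not experimental data.

```python
import numpy as np

def rmse_mae(R_true, R_pred, mask):
    """RMSE and MAE over observed test ratings; mask is 1 where a ground-truth rating exists."""
    n = mask.sum()
    err = mask * (R_pred - R_true)
    rmse = np.sqrt(np.sum(err ** 2) / n)
    mae = np.sum(np.abs(err)) / n
    return rmse, mae

# Example with a tiny test matrix (0 marks unobserved entries).
R_true = np.array([[5.0, 0.0], [3.0, 4.0]])
R_pred = np.array([[4.5, 2.0], [3.5, 4.0]])
mask = (R_true > 0).astype(float)
print(rmse_mae(R_true, R_pred, mask))
```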