Selective Knowledge Transfer for Cross-Domain Collaborative Recommendation

Data sparsity is a major challenge for collaborative filtering recommender systems. A promising solution is to use feedback or ratings from multiple domains to collectively improve recommendation performance, known as cross-domain recommendation. Cross-domain recommendation with heterogeneous feedback is a popular variant, which transfers knowledge from more easily available auxiliary binary feedback to improve prediction performance in the target domain. Most existing work focuses on transferring knowledge between different domains of the same website, where user behavior data can be fully shared across domains. However, due to the constraints of business privacy policies, it is difficult to directly share identical user behavior data between different e-commerce websites. As a result, the user latent factors learned in the auxiliary domain cannot be directly transferred to the target domain; doing so would cause a negative transfer issue. In this article, we consider the setting where the target domain with numerical ratings and the auxiliary domain with binary feedback share only overlapping items rather than users. We propose a Selective Knowledge Transfer framework for Cross-domain Collaborative Recommendation, called SKT. The proposed SKT framework not only transfers the item latent factors learned from the auxiliary domain to the target domain, but also selectively transfers the user latent factors. In addition, by introducing co-graph regularization of user graphs and item graphs, SKT maintains the intrinsic geometric structure within each domain and thus avoids the negative transfer issue. Extensive experiments on two real-world datasets show that our SKT method significantly outperforms all baseline methods at various density levels.


I. INTRODUCTION
With the explosive growth of online information, recommender systems have become an important tool to help users find the information they desire effectively [1], [2]. At their core is estimating how likely a user is to adopt an item based on historical interactions such as ratings, purchases and clicks [3], [4]. Collaborative filtering (CF) addresses this by assuming that users who behaved similarly in the past will also exhibit similar preferences on items [5], [6]. Among the various existing collaborative filtering techniques, matrix factorization (MF), which learns latent factors for users and items, is the most popular one [7]-[10]. However, the recommendation accuracy of MF depends heavily on the rating matrix. Unfortunately, the rating matrix is usually very sparse in the real world, which forms a barrier to the widespread use of MF in realistic recommender systems. We have noticed that some websites often have a degree of homogeneity in their functionality and the information they provide. For example, there are many overlapping movies on IMDb and Douban, overlapping products on Amazon and Taobao, overlapping music on last.fm and Yahoo Music, and overlapping videos on Iqiyi and Tencent Video. This provides opportunities to improve recommendation quality by enriching the data, that is, using rich user-item interaction data (such as purchases, clicks, and ratings) on one website to improve the quality of recommendations (such as rating prediction) on another website. The task of using auxiliary data from other domains to improve the recommendation quality of the target domain is called cross-domain recommendation [11]-[15]. (The associate editor coordinating the review of this manuscript and approving it for publication was Nilanjan Dey.)
Most of the existing cross-domain recommendation techniques focus on transferring the rating patterns [16]-[18] or latent factors [14], [19]-[21] learned from the auxiliary domain to the target domain as priors or regularization to improve recommendation accuracy. Rating pattern transfer generally requires the user feedback of the auxiliary domain and the target domain to be homogeneous. When the user feedback of the auxiliary and target domains is heterogeneous, it is necessary to extract a partial rating pattern [22] or deeper latent knowledge [23] from the auxiliary data's rating pattern for knowledge transfer. Latent factor transfer typically requires the auxiliary domain and target domain to share overlapping users or items. However, due to the constraints of business privacy policies, it is often difficult to share user behavior data between different e-commerce websites [11]. Therefore, in cross-domain recommendation, it is often more realistic to adopt auxiliary domains whose items overlap with those of the target domain. Some cross-domain recommendation methods that share only overlapping items between domains have been proposed, which achieve knowledge transfer by sharing the same item latent factors across domains. For example, Collective Matrix Factorization (CMF) [24] jointly factorizes the target rating matrix and an item-side content matrix under the constraint of sharing the same item latent factors. Since the users in the auxiliary domain do not overlap with the users in the target domain, CMF does not constrain the auxiliary domain to share any user latent factors with the target domain. However, some users in the auxiliary domain may be similar to users in the target domain, that is, they have similar preferences on the corresponding items.
Selectively transferring the latent factors of such users from the auxiliary domain to the target domain can further alleviate the sparsity of the target data and help improve the accuracy of user-item rating prediction in the target domain.
In this work, we aim to boost the prediction performance of the target domain with numerical ratings (e.g., 5-star ratings) by using an auxiliary domain with binary ratings (e.g., likes or dislikes) whose items overlap with those in the target domain. To effectively utilize the latent factors extracted from the auxiliary data, we propose Selective Knowledge Transfer for Cross-domain Collaborative Recommendation (SKT). First, SKT jointly factorizes the auxiliary rating matrix and the target rating matrix under the constraints of sharing the same item latent factors and selectively sharing user latent factors. Second, to ensure positive transfer, we integrate the graph co-regularization of the user graph and item graph into the proposed SKT model to maintain the intrinsic geometric property of the learned latent factors in each domain. The main contributions of this article are summarized as follows.
-We propose that, in addition to the latent features of items, the latent features of users with similar preferences across domains can also serve as a bridge for knowledge transfer when only overlapping items are shared between domains. As far as we know, this is the first work exploring the establishment of user connections between domains that share only overlapping items.
-We extend the CMF model [24] by selectively sharing user's latent knowledge between domains and preserving the intrinsic geometry of entities within domains.
In this way, more useful knowledge can be transferred to the target domain while avoiding negative transfer.
-On two real-world datasets, we demonstrate the effectiveness of the proposed SKT method at density levels ranging from 0.01% to 1%, where SKT shows better performance than several state-of-the-art baseline methods.

The organization of this article is as follows. We first review related work in Section II. We then formulate the problem and describe the proposed SKT method in Section III, and conduct extensive empirical studies of SKT and the state-of-the-art methods in Section IV. Finally, we conclude this article in Section V. The notations used throughout this article are listed in Table 1.

II. RELATED WORK
In this section, we review related work on cross-domain recommendation.
CF exploits only user-item behavior interactions (e.g., ratings) and therefore suffers from the data sparsity issue. To address this issue, one solution is to transfer knowledge from relevant domains, called cross-domain recommendation [8], [16], [19], [25]. According to the overlapping scenarios of entities between the target and auxiliary domains, existing cross-domain recommendation techniques mainly fall into the following two categories.

A. CROSS-DOMAIN RECOMMENDATION WITH NON-OVERLAPPING ENTITIES
The CodeBook Transfer (CBT) method [16] assumes that two domains have similar group-level user behavior (i.e., a user-item rating pattern, referred to as a codebook), and then transfers the codebook extracted in the auxiliary domain to the target domain to reconstruct the target domain's rating matrix. An extension of CBT is the Rating Matrix Generation Model (RMGM) [17], which relaxes the hard membership constraint on user/item groups to soft membership. Gao et al. [26] relaxed the CBT constraint of sharing the same cluster-level rating model between the auxiliary and target domains, and instead achieved knowledge transfer by sharing a partial cluster-level rating pattern across multiple rating matrices. Since transferring the knowledge extracted from the auxiliary domain directly to the target domain may result in inconsistent knowledge, the CIT method [18] uses domain adaptation techniques to map and adjust the latent groups of users and items in the two domains to maintain consistency during the transfer learning process.
Another solution to the scenario where entities between domains do not overlap at all is to exploit social tags as a bridge for knowledge transfer between domains to achieve cross-domain recommendation [27], [28]. Their core idea is that users in different domains may have the same tagging behavior and thus tend to have similar preferences. The shared tags can be used to connect different domains.

B. CROSS-DOMAIN RECOMMENDATION WITH COMPLETELY OVERLAPPING ENTITIES
In this category, it is assumed that entities completely overlap between domains. The key idea of this type of approach is to use the latent factors of the overlapping entities shared between domains as bridges for knowledge transfer. The Collective Matrix Factorization (CMF) [24] approach jointly factorizes the auxiliary rating matrix and the target rating matrix by sharing the item latent factors, which enables knowledge transfer. Coordinate System Transfer (CST) [19] transfers the coordinate system from two binary auxiliary rating matrices to a numerical target rating matrix in an adaptive manner. CST performs well when the target data is not very sparse; when the target data is extremely sparse, constructing shared latent factors in a collective way may perform better [20]. Transfer by Collective Factorization (TCF) [20], [29] transfers the latent preferences of users and the latent features of items from binary auxiliary rating matrices to a numerical target rating matrix in a collective manner. Transfer Probability Collective Factorization (TPCF) [30] introduces information from multiple CF tasks into the target domain to alleviate sparsity problems. Interaction-rich Transfer by Collective Factorization (iTCF) [31] extends the CMF [24] method, not only constraining the auxiliary and target domains to share the same item latent factors, but also allowing interaction between the user latent factors of the two domains. Transfer by Mixed Factorization (TMF) [21] introduces two user interest profiles to model the user's latent preference, building on the iTCF [31] method. Although TCF, iTCF and TMF can handle heterogeneous user feedback, their assumption is too strict: users and items must have a one-to-one mapping between domains, which limits their application in practice.
The Embedding and Mapping framework for Cross-Domain Recommendation (EMCDR) [14] uses a multi-layer perceptron to capture the nonlinear mapping function across domains, which provides high flexibility for learning the domain-specific features of entities in each domain. However, it is usually expensive to identify cross-domain entity correspondences in real-world scenarios. To this end, Zhao et al. [13] proposed an active transfer learning method for cross-domain recommendation, which constructs entity correspondences between different domains through an entity selection strategy and then uses them as a bridge for knowledge transfer.
In addition to the two types of methods mentioned above, Zhang et al. [32] explored how to achieve cross-domain knowledge transfer when there are only partially overlapping entities between domains. In reality, due to the constraints of company policies, it is difficult to completely share the behavior data of users between websites [11]. To avoid leaking user privacy, different websites usually share only overlapping items. For this reason, different from the above work, in this article we focus on how to selectively transfer user latent knowledge from the auxiliary binary rating data to reduce the sparsity of the target numerical rating data when only overlapping items are shared between the auxiliary and target domains.

III. SELECTIVE KNOWLEDGE TRANSFER FOR CROSS-DOMAIN COLLABORATIVE RECOMMENDATION
In this section, we first define the problem setting, then propose the Selective Knowledge Transfer for Cross-domain Collaborative Recommendation (SKT) framework. Lastly, we show the optimization process of the proposed SKT method, and analyze the convergence and computational complexity of the optimization method.
VOLUME 9, 2021
A. PROBLEM DEFINITION
Here, we assume that $R_t$ and $R_a$ share only overlapping items, that is, the items in them are aligned, but they do not share overlapping users. Let $\mathcal{U} = \{u_1, u_2, \cdots\}$ and $\mathcal{V} = \{v_1, v_2, \cdots\}$ denote the cross-domain user and item sets, respectively. Denote by $D_t$ the target domain, by $D_a$ the auxiliary domain, and by $\tau \in \{t, a\}$ the domain index. Our task is to predict the missing values of the extremely sparse rating matrix $R_t$ in the target domain $D_t$ by selectively transferring the rating knowledge of the denser rating matrix $R_a$ in the auxiliary domain $D_a$.

B. SKT METHOD
The proposed SKT framework jointly decomposes the auxiliary binary rating matrix and the target numerical rating matrix, with the constraints of sharing item-specific latent factors and selectively sharing user-specific latent factors. In addition, the co-graph regularizations of user and item graphs from two domains are integrated into the collective matrix factorization framework, so that the learned latent factors can preserve their intrinsic geometric property to avoid negative transfer issues. The graph model is shown in Figure 1.

1) WEIGHTED COLLECTIVE MATRIX FACTORIZATION
Given the nonnegative target numerical rating matrix $R_t \in \mathbb{R}^{|\mathcal{U}|\times|\mathcal{V}|}$ and the auxiliary binary rating matrix $R_a \in \mathbb{R}^{|\mathcal{U}|\times|\mathcal{V}|}$, the latent factors of each rating matrix can be extracted by Weighted Nonnegative Matrix Factorization (WNMF) [33]. In WNMF, the nonnegative user-item rating matrix $R_\tau \in \mathbb{R}^{|\mathcal{U}|\times|\mathcal{V}|}$ is decomposed into two low-rank matrices $U_\tau \in \mathbb{R}^{|\mathcal{U}|\times d}$ and $V_\tau \in \mathbb{R}^{|\mathcal{V}|\times d}$ such that the reconstruction error of $R_\tau$ is minimized. WNMF amounts to the following optimization problem:

$$\min_{U_\tau \ge 0,\, V_\tau \ge 0} \big\| W_\tau \odot (R_\tau - U_\tau V_\tau^\top) \big\|_F^2, \tag{1}$$

where $W_\tau$ is the indicator matrix of observed ratings, $\odot$ denotes the element-wise product of matrices, and $\|\cdot\|_F$ is the Frobenius norm; for a matrix $X \in \mathbb{R}^{m\times n}$, $\|X\|_F = \big(\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2\big)^{1/2}$. Since the target numerical rating matrix $R_t$ shares the same items as the auxiliary binary rating matrix $R_a$, prediction accuracy can be improved by sharing the common item latent factors underlying the two rating matrices. Similar to CMF [24], we extend basic WNMF to simultaneously factorize the two related matrices, which leads to Weighted Collective Matrix Factorization (WCMF):

$$\min_{U_t,\, U_a,\, V \ge 0} \big\| W_t \odot (R_t - U_t V^\top) \big\|_F^2 + \lambda \big\| W_a \odot (R_a - U_a V^\top) \big\|_F^2, \tag{2}$$

where $\lambda$ is a trade-off parameter for balancing the auxiliary and target data. The above optimization problem can be further expressed as

$$\min_{U_t,\, U_a,\, V_t,\, V_a \ge 0} \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \lambda \big\| W_a \odot (R_a - U_a V_a^\top) \big\|_F^2 + \gamma_V \| V_t - V_a \|_F^2, \tag{3}$$

where the trade-off parameter $\gamma_V$ represents the confidence in the auxiliary data. In formula (3), the item latent factors of the auxiliary domain are transferred to the target domain through the regularization term $\|V_t - V_a\|_F^2$. However, when the target data is extremely sparse, transferring only the item latent factors learned from the auxiliary domain may not be sufficient.
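As a concrete illustration of the WNMF building block in Eq. (1), the following is a minimal sketch of the standard multiplicative update rules on synthetic data. The function and variable names are ours, not from the paper's implementation, and the data is random:

```python
import numpy as np

rng = np.random.default_rng(0)

def wnmf(R, W, d=5, n_iter=200, eps=1e-9):
    """Factorize R ~ U V^T under nonnegativity, with W masking unobserved cells."""
    m, n = R.shape
    U = rng.random((m, d))
    V = rng.random((n, d))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative by construction.
        U *= ((W * R) @ V) / (((W * (U @ V.T)) @ V) + eps)
        V *= ((W * R).T @ U) / (((W * (U @ V.T)).T @ U) + eps)
    return U, V

R = rng.integers(1, 6, size=(30, 20)).astype(float)   # toy 5-star ratings
W = (rng.random((30, 20)) < 0.3).astype(float)        # ~30% observed
U, V = wnmf(R, W)
err = np.linalg.norm(W * (R - U @ V.T))
print(err < np.linalg.norm(W * R))  # reconstruction beats the all-zero baseline
```

The same update pattern reappears, with extra coupling terms, in the SKT learning rules later in this section.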

2) SELECTIVELY TRANSFERRING THE LATENT FACTORS OF USERS LEARNED FROM THE AUXILIARY DOMAIN
Although the auxiliary and target domains do not share overlapping users, some users in the auxiliary domain may interact with the corresponding items similarly to users in the target domain, that is, they have similar preferences. This latent preference information is encoded into the user latent factors by collective matrix factorization. Adaptively selecting the user latent factors of the auxiliary domain that are useful to the target domain, and then transferring them, can further alleviate the sparsity of the target data and improve the predictive performance of cross-domain recommendation. To achieve this, we introduce a transformation matrix $P$ and constrain it to be row-sparse, so that $P$ can adaptively select the latent factors of users in $U_a$ and transfer them to the target domain. To this end, we integrate the constraint $U_t = U_a P^\top$ into framework (3). In addition, to induce row-sparsity in $P$, we impose the $\ell_{2,1}$-norm structured sparsity regularization on $P$. Therefore, we obtain the following optimization objective:

$$\min_{U_t,\, U_a,\, V_t,\, V_a,\, P \ge 0} \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \lambda \big\| W_a \odot (R_a - U_a V_a^\top) \big\|_F^2 + \gamma_V \| V_t - V_a \|_F^2 + \gamma_p \| P \|_{2,1}, \quad \text{s.t. } U_t = U_a P^\top, \tag{4}$$

where $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$-norm; for a matrix $X \in \mathbb{R}^{m\times n}$, $\|X\|_{2,1} = \sum_{i=1}^{m}\big(\sum_{j=1}^{n} x_{ij}^2\big)^{1/2}$. The above optimization problem can be further expressed as

$$\min_{U_t,\, U_a,\, V_t,\, V_a,\, P \ge 0} \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \lambda \big\| W_a \odot (R_a - U_a V_a^\top) \big\|_F^2 + \gamma_V \| V_t - V_a \|_F^2 + \gamma_U \| U_t - U_a P^\top \|_F^2 + \gamma_p \| P \|_{2,1}, \tag{5}$$

where $\gamma_U$ is a trade-off parameter representing the confidence in the auxiliary data, and $\gamma_p$ is a penalty parameter controlling the impact of the regularization term $\|P\|_{2,1}$.
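The row-sparsity mechanism can be made concrete with a small sketch of the $\ell_{2,1}$-norm and of the diagonal matrix $G_p$ that later drives its sub-gradient in the optimization. The helper names and the toy matrix are ours, for illustration only:

```python
import numpy as np

def l21_norm(P):
    # Sum of the Euclidean norms of the rows of P: rows driven to zero
    # correspond to auxiliary latent factors that are NOT transferred.
    return np.sqrt((P ** 2).sum(axis=1)).sum()

def subgradient_diag(P, eps=1e-12):
    # (G_p)_ii = 1 / (2 ||p_i||_2), so that d||P||_{2,1}/dP = 2 G_p P;
    # eps guards against all-zero rows.
    row_norms = np.sqrt((P ** 2).sum(axis=1))
    return np.diag(1.0 / (2.0 * row_norms + eps))

P = np.array([[3.0, 4.0],
              [0.0, 0.0],   # a fully suppressed row: nothing transferred
              [1.0, 0.0]])
print(l21_norm(P))  # 5 + 0 + 1 = 6.0
```

Unlike the Frobenius norm, the $\ell_{2,1}$-norm penalizes rows as units, which is what lets $P$ switch whole user latent factors on or off.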

3) GRAPH REGULARIZATION
To make the extracted latent factors of users and items maintain the intrinsic geometric structure and thus avoid negative transfer [34], [35], we propose to impose graph regularization constraints on the latent factor matrices $U_t$, $V_t$, $U_a$ and $V_a$. The graph regularization of $U_\tau$ and $V_\tau$ is defined as follows [36]:

$$\frac{1}{2}\sum_{i,j}(S^u_\tau)_{ij}\,\| u_{\tau i} - u_{\tau j} \|^2 = \mathrm{tr}\big(U_\tau^\top L^u_\tau U_\tau\big), \tag{6}$$

$$\frac{1}{2}\sum_{i,j}(S^v_\tau)_{ij}\,\| v_{\tau i} - v_{\tau j} \|^2 = \mathrm{tr}\big(V_\tau^\top L^v_\tau V_\tau\big), \tag{7}$$

where $(S^u_\tau)_{ij}$ and $(S^v_\tau)_{ij}$ denote the cosine similarity between $r_{\tau i\cdot}$ and $r_{\tau j\cdot}$ and between $r_{\tau \cdot i}$ and $r_{\tau \cdot j}$, respectively. They are defined as

$$(S^u_\tau)_{ij} = \frac{\langle r_{\tau i\cdot},\, r_{\tau j\cdot}\rangle}{\| r_{\tau i\cdot} \|\,\| r_{\tau j\cdot} \|}, \tag{8}$$

$$(S^v_\tau)_{ij} = \frac{\langle r_{\tau \cdot i},\, r_{\tau \cdot j}\rangle}{\| r_{\tau \cdot i} \|\,\| r_{\tau \cdot j} \|}, \tag{9}$$

where $r_{\tau i\cdot}$ and $r_{\tau \cdot j}$ denote the $i$-th row and the $j$-th column of the user-item rating matrix $R_\tau$, respectively, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. $L^u_\tau = D^u_\tau - S^u_\tau$ and $L^v_\tau = D^v_\tau - S^v_\tau$ are the graph Laplacian matrices of the user graph and the item graph, where $D^u_\tau$ and $D^v_\tau$ are diagonal matrices with $(D^u_\tau)_{ii} = \sum_j (S^u_\tau)_{ij}$ and $(D^v_\tau)_{ii} = \sum_j (S^v_\tau)_{ij}$.
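The construction of the similarity graph and the Laplacian-based regularizer can be sketched as follows; the names and random data are ours, and the sketch also checks the standard identity between the pairwise form and the trace form:

```python
import numpy as np

def cosine_similarity_graph(R, eps=1e-12):
    # Cosine similarity between the rows of R (the user graph; use R.T for items).
    normed = R / (np.linalg.norm(R, axis=1, keepdims=True) + eps)
    S = normed @ normed.T
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def graph_regularizer(U, S):
    D = np.diag(S.sum(axis=1))
    L = D - S                      # graph Laplacian
    return np.trace(U.T @ L @ U)   # = 1/2 * sum_ij S_ij ||u_i - u_j||^2

rng = np.random.default_rng(1)
R = rng.random((10, 8))            # toy rating matrix
U = rng.random((10, 3))            # toy user latent factors
S = cosine_similarity_graph(R)
reg = graph_regularizer(U, S)
pairwise = 0.5 * sum(S[i, j] * np.sum((U[i] - U[j]) ** 2)
                     for i in range(10) for j in range(10))
print(np.isclose(reg, pairwise))   # True: both forms agree for symmetric S
```

The trace form is what enters the objective, since it keeps the regularizer differentiable in the factor matrices.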

4) OPTIMIZATION FRAMEWORK
Integrating the graph regularizations of $U_\tau$ and $V_\tau$ into framework (5), we obtain the final optimization framework:

$$\min_{U_t,\, U_a,\, V_t,\, V_a,\, P \ge 0} \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \lambda \big\| W_a \odot (R_a - U_a V_a^\top) \big\|_F^2 + \gamma_V \| V_t - V_a \|_F^2 + \gamma_U \| U_t - U_a P^\top \|_F^2 + \gamma_p \| P \|_{2,1} + \alpha_U\, \mathrm{tr}(U_t^\top L^u_t U_t) + \alpha_V\, \mathrm{tr}(V_t^\top L^v_t V_t) + \beta_U\, \mathrm{tr}(U_a^\top L^u_a U_a) + \beta_V\, \mathrm{tr}(V_a^\top L^v_a V_a), \tag{10}$$

where $\alpha_U$ and $\alpha_V$ are the graph regularization parameters of users and items in the target domain, respectively, and $\beta_U$ and $\beta_V$ are the graph regularization parameters of users and items in the auxiliary domain, respectively.
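For concreteness, the full objective can be evaluated term by term as in the following sketch; all shapes, names, parameter values and random data are our illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(4)
n_u, n_v, d = 20, 15, 4

def laplacian(S):
    # L = D - S for a symmetric nonnegative similarity matrix S.
    return np.diag(S.sum(axis=1)) - S

def skt_objective(Rt, Wt, Ra, Wa, Ut, Vt, Ua, Va, P, Lut, Lvt, Lua, Lva,
                  lam=1.0, gV=1.0, gU=1.0, gp=0.1, aU=0.1, aV=0.1, bU=0.1, bV=0.1):
    J = np.linalg.norm(Wt * (Rt - Ut @ Vt.T)) ** 2            # target fit
    J += lam * np.linalg.norm(Wa * (Ra - Ua @ Va.T)) ** 2     # auxiliary fit
    J += gV * np.linalg.norm(Vt - Va) ** 2                    # shared item factors
    J += gU * np.linalg.norm(Ut - Ua @ P.T) ** 2              # selected user factors
    J += gp * np.sqrt((P ** 2).sum(axis=1)).sum()             # ||P||_{2,1}
    J += aU * np.trace(Ut.T @ Lut @ Ut) + aV * np.trace(Vt.T @ Lvt @ Vt)
    J += bU * np.trace(Ua.T @ Lua @ Ua) + bV * np.trace(Va.T @ Lva @ Va)
    return J

Rt, Ra = rng.random((n_u, n_v)), rng.random((n_u, n_v))
Wt = (rng.random((n_u, n_v)) < 0.3).astype(float)
Wa = np.ones((n_u, n_v))
Ut, Ua = rng.random((n_u, d)), rng.random((n_u, d))
Vt, Va = rng.random((n_v, d)), rng.random((n_v, d))
P = rng.random((d, d))
Su = rng.random((n_u, n_u)); Su = (Su + Su.T) / 2
Sv = rng.random((n_v, n_v)); Sv = (Sv + Sv.T) / 2
J = skt_objective(Rt, Wt, Ra, Wa, Ut, Vt, Ua, Va, P,
                  laplacian(Su), laplacian(Sv), laplacian(Su), laplacian(Sv))
print(J >= 0)  # every term is nonnegative for a symmetric similarity graph
```

Since each Laplacian is positive semidefinite for a symmetric nonnegative graph, all nine terms are nonnegative, which is what the multiplicative optimization in the next subsection relies on.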

C. LEARNING THE SKT
We can use an alternate minimization algorithm to optimize the proposed SKT framework. Specifically, we optimize a variable and calculate its update rules while fixing the remaining variables. Repeat the process until convergence.

1) LEARNING $U_t$ AND $U_a$
Fixing the other variables, the objective function in Equation (10) with respect to $U_t$ can be expressed as

$$J(U_t) = \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \gamma_U \| U_t - U_a P^\top \|_F^2 + \alpha_U\, \mathrm{tr}(U_t^\top L^u_t U_t).$$

The derivative of $J(U_t)$ with respect to $U_t$ is

$$\frac{\partial J(U_t)}{\partial U_t} = -2\big(W_t \odot R_t\big)V_t + 2\big(W_t \odot (U_t V_t^\top)\big)V_t + 2\gamma_U\big(U_t - U_a P^\top\big) + 2\alpha_U L^u_t U_t.$$

We use the Karush-Kuhn-Tucker (KKT) complementary condition for the nonnegativity of $U_t$ and let $\frac{\partial J(U_t)}{\partial U_t} = 0$. Since $L^u_t$ may take any sign, we replace it with $L^u_t = D^u_t - S^u_t$, where $D^u_t$ is the diagonal degree matrix of the user graph. We obtain the following updating rule for learning $U_t$:

$$U_t \leftarrow U_t \odot \frac{\big(W_t \odot R_t\big)V_t + \gamma_U U_a P^\top + \alpha_U S^u_t U_t}{\big(W_t \odot (U_t V_t^\top)\big)V_t + \gamma_U U_t + \alpha_U D^u_t U_t}, \tag{11}$$

where the fraction denotes element-wise division. Similarly, we obtain the updating rule for learning $U_a$:

$$U_a \leftarrow U_a \odot \frac{\lambda\big(W_a \odot R_a\big)V_a + \gamma_U U_t P + \beta_U S^u_a U_a}{\lambda\big(W_a \odot (U_a V_a^\top)\big)V_a + \gamma_U U_a P^\top P + \beta_U D^u_a U_a}. \tag{12}$$

2) LEARNING $V_t$ AND $V_a$
Likewise, fixing the other variables, the objective function in Equation (10) with respect to $V_t$ can be expressed as

$$J(V_t) = \big\| W_t \odot (R_t - U_t V_t^\top) \big\|_F^2 + \gamma_V \| V_t - V_a \|_F^2 + \alpha_V\, \mathrm{tr}(V_t^\top L^v_t V_t).$$

Using the KKT complementary condition for the nonnegativity of $V_t$, letting $\frac{\partial J(V_t)}{\partial V_t} = 0$, and replacing $L^v_t$ with $D^v_t - S^v_t$, we obtain the following updating rule for learning $V_t$:

$$V_t \leftarrow V_t \odot \frac{\big(W_t \odot R_t\big)^\top U_t + \gamma_V V_a + \alpha_V S^v_t V_t}{\big(W_t \odot (U_t V_t^\top)\big)^\top U_t + \gamma_V V_t + \alpha_V D^v_t V_t}. \tag{13}$$

Similarly, we obtain the updating rule for learning $V_a$:

$$V_a \leftarrow V_a \odot \frac{\lambda\big(W_a \odot R_a\big)^\top U_a + \gamma_V V_t + \beta_V S^v_a V_a}{\lambda\big(W_a \odot (U_a V_a^\top)\big)^\top U_a + \gamma_V V_a + \beta_V D^v_a V_a}. \tag{14}$$

3) LEARNING $P$
Fixing the other variables, the objective function in Equation (10) with respect to $P$ can be expressed as

$$J(P) = \gamma_U \| U_t - U_a P^\top \|_F^2 + \gamma_p \| P \|_{2,1}.$$

The derivative of $J(P)$ with respect to $P$ is

$$\frac{\partial J(P)}{\partial P} = -2\gamma_U U_t^\top U_a + 2\gamma_U P U_a^\top U_a + \gamma_p \frac{\partial \|P\|_{2,1}}{\partial P}.$$

Since $\|P\|_{2,1}$ is a non-smooth function at zero, we compute its sub-gradient as $\frac{\partial \|P\|_{2,1}}{\partial P} = 2 G_p P$ [37], [38], where $G_p$ is a diagonal sub-gradient matrix with $i$-th diagonal element equal to $\frac{1}{2\|p^i\|_2}$, and $p^i$ denotes the $i$-th row of $P$. Using the KKT complementary condition for the nonnegativity of $P$ and letting $\frac{\partial J(P)}{\partial P} = 0$, we can obtain

$$\big[-\gamma_U\big(U_t^\top U_a\big) + \gamma_U P U_a^\top U_a + \gamma_p G_p P\big] \odot P = 0.$$

Then we obtain the following updating rule for learning $P$:

$$P \leftarrow P \odot \frac{\gamma_U U_t^\top U_a}{\gamma_U P U_a^\top U_a + \gamma_p G_p P}. \tag{15}$$
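As a sanity check on this style of multiplicative rule, the following sketch runs the update for $P$ from Eq. (15) on random nonnegative data and verifies that the corresponding sub-objective does not increase. This is our illustration under assumed data, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)
n_u, d = 15, 4
Ut = rng.random((n_u, d))   # target user factors (held fixed here)
Ua = rng.random((n_u, d))   # auxiliary user factors (held fixed here)
P = rng.random((d, d))      # transformation matrix being learned
gU, gp, eps = 1.0, 0.1, 1e-12

def objective(P):
    # gamma_U * ||U_t - U_a P^T||_F^2 + gamma_p * ||P||_{2,1}
    return (gU * np.linalg.norm(Ut - Ua @ P.T) ** 2
            + gp * np.sqrt((P ** 2).sum(axis=1)).sum())

start = objective(P)
for _ in range(100):
    # Diagonal sub-gradient matrix G_p; eps guards zero rows.
    Gp = np.diag(1.0 / (2.0 * np.sqrt((P ** 2).sum(axis=1)) + eps))
    # Multiplicative update of Eq. (15): nonnegativity is preserved.
    P *= (gU * (Ut.T @ Ua)) / (gU * P @ (Ua.T @ Ua) + gp * Gp @ P + eps)
print(objective(P) <= start)  # the update does not increase J(P)
```

The updates for $U_t$, $U_a$, $V_t$ and $V_a$ follow the same nonnegative-numerator over positive-denominator pattern, with the Laplacian split $L = D - S$ supplying the sign separation.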

D. CONVERGENCE ANALYSIS AND TIME COMPLEXITY
Based on the above updating rules for learning the latent factors and the structured sparse matrix, we can prove that the learning algorithm converges.
Theorem 1: Updating $U_t$, $U_a$, $V_t$, $V_a$ and $P$ sequentially by Equations (11)-(15) monotonically decreases the objective function in Equation (10) until convergence.
We summarize the learning algorithm in Algorithm 1. The time complexities of SKT and the other baseline methods are listed in Table 3, where $q$ and $\tilde{q}$ denote the numbers of observed ratings in the target and auxiliary rating matrices, respectively.
|P| and |N | represent the average number of positive and negative feedbacks by a certain user in the target rating matrix, respectively. K denotes the total number of iterations of [32, Algorithms 1 and 2].
In addition, the proposed Algorithm 1 is general: it can handle not only heterogeneous user feedback but also homogeneous user feedback.

IV. EXPERIMENTS

A. DATASETS AND EVALUATION METRICS
1) DATASETS
We adopt two real-world datasets to evaluate the proposed SKT method. The first dataset, Netflix-MovieLens, contains users and aligned movies from two public benchmark sets, namely the Netflix Prize and the MovieLens project. The Netflix rating data contains more than 10^8 ratings with values in {1, 2, 3, 4, 5}, given by more than 4.8 × 10^5 users on around 1.8 × 10^4 movies. The MovieLens 20M rating data (https://grouplens.org/datasets/movielens/20m/) contains 2.0 × 10^7 ratings with values in {0.5, 1, 1.5, ..., 5}, given by more than 1.3 × 10^5 users on around 2.7 × 10^4 movies. We first randomly extract a 5000 × 5000 dense rating matrix R_t from the Netflix data, and then extract an item-side auxiliary matrix R_a of size 5000 × 5000 from the MovieLens data by identifying the movies appearing in both MovieLens 20M and Netflix. Clearly, R_t and R_a share only common items and no users. Similar to [20], [41], we preprocess R_a by relabeling ratings with value less than 4 as 0 (dislike) and ratings with value greater than or equal to 4 as 1 (like), to simulate heterogeneous auxiliary and target domain data.

(Algorithm 1, Step 4: for iter = 1 to K, fix all but one variable in turn and update U_t, U_a, V_t, V_a and P via Eqs. (11)-(15), respectively.)
The second dataset was crawled from an online social network, Goodreads [42], where users give ratings to books. The Goodreads rating data (www.junminghuang.com/datasets/goodreads.tar.gz) contains more than 3.1 × 10^7 ratings with values in {1, 2, 3, 4, 5}, given by more than 3.0 × 10^5 users on around 1.9 × 10^6 books. We randomly extract a 10000 × 5000 dense rating matrix R from the Goodreads data, and take the sub-matrices R_t = R_{1~5000,1~5000} as the target rating matrix and R_a = R_{5001~10000,1~5000} as the item-side auxiliary data, so that R_t and R_a share only common items and no common users. The Goodreads dataset also contains user-to-user relationships, which we do not use. Since R_t and R_a are effectively drawn at random from the entire dataset, users in both R_t and R_a do not have too many or too few user-to-user relationships compared to the overall distribution; in other words, the random extraction does not introduce any skew with respect to the user-to-user graph. To simulate heterogeneous auxiliary and target domain data, we preprocess R_a by relabeling ratings with value less than 4 as 0 (dislike) and ratings with value greater than or equal to 4 as 1 (like).
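The relabeling step used for both datasets can be sketched as follows; the 0-encodes-missing convention and the separate observation mask are our assumptions for illustration, while the threshold of 4 comes from the text:

```python
import numpy as np

def binarize(Ra):
    """Relabel numerical ratings as binary feedback: >=4 -> like (1), <4 -> dislike (0)."""
    Rb = np.zeros_like(Ra)
    observed = Ra > 0                 # 0 encodes "missing" here (our assumption)
    Rb[observed & (Ra >= 4)] = 1.0    # like
    # Observed ratings below 4 stay 0; the mask keeps "dislike" distinguishable
    # from "missing" for the weighted factorization.
    mask = observed.astype(float)
    return Rb, mask

Ra = np.array([[5.0, 2.0, 0.0],
               [3.5, 4.0, 1.0]])
Rb, mask = binarize(Ra)
print(Rb.tolist())    # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(mask.tolist())  # [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
```

The mask plays the role of the weight matrix W_a in the factorization, so that a genuine dislike (0) is not confused with an unobserved cell.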
In all of our experiments, the target-domain ratings in R_t are randomly split into a training set R_T and a test set R_E, each containing 50% of the ratings. R_E is kept unchanged, while different numbers of observed ratings (2,500, 12,500, 25,000, 125,000 and 250,000) are randomly picked from R_T for training, corresponding to density levels of 0.01%, 0.05%, 0.1%, 0.5% and 1%. The final datasets are summarized in Table 2.
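This split-and-subsample protocol can be sketched as below; the function names, toy matrix and seed are ours, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

def split_and_subsample(R, density, rng):
    """50/50 train/test split of observed ratings, then subsample the
    training pool down to a given density level of the full matrix."""
    rows, cols = np.nonzero(R)
    idx = rng.permutation(len(rows))
    half = len(idx) // 2
    test_idx, train_pool = idx[:half], idx[half:]
    n_train = int(round(density * R.size))  # e.g. density=0.001 -> 0.1% of cells
    keep = rng.choice(train_pool, size=min(n_train, len(train_pool)), replace=False)
    R_train = np.zeros_like(R)
    R_train[rows[keep], cols[keep]] = R[rows[keep], cols[keep]]
    R_test = np.zeros_like(R)
    R_test[rows[test_idx], cols[test_idx]] = R[rows[test_idx], cols[test_idx]]
    return R_train, R_test

# Toy target matrix: ~5% observed 5-star ratings on a 100 x 100 grid.
R = ((rng.random((100, 100)) < 0.05) * rng.integers(1, 6, (100, 100))).astype(float)
R_train, R_test = split_and_subsample(R, 0.001, rng)
print(np.count_nonzero(R_train))  # 10 training ratings at the 0.1% density level
```

Keeping R_E fixed while only the training density varies is what makes results across density levels directly comparable.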

2) EVALUATION METRICS
We adopt the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as the evaluation metrics:

$$\mathrm{MAE} = \frac{1}{|R_E|}\sum_{(u,i) \in R_E} |r_{ui} - \hat{r}_{ui}|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{|R_E|}\sum_{(u,i) \in R_E} (r_{ui} - \hat{r}_{ui})^2},$$

where $r_{ui}$ and $\hat{r}_{ui}$ denote the true and predicted ratings, respectively, and $|R_E|$ denotes the number of test ratings. We run 5 random trials when generating the required number of observed ratings from R_T, and report the averaged results.
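The two metrics can be computed directly, as in this minimal sketch with hypothetical ratings:

```python
import numpy as np

def mae(r_true, r_pred):
    # Mean Absolute Error over the test ratings.
    return np.mean(np.abs(r_true - r_pred))

def rmse(r_true, r_pred):
    # Root Mean Square Error over the test ratings.
    return np.sqrt(np.mean((r_true - r_pred) ** 2))

r_true = np.array([5.0, 3.0, 1.0, 4.0])
r_pred = np.array([4.5, 3.5, 2.0, 4.0])
print(mae(r_true, r_pred))   # (0.5 + 0.5 + 1.0 + 0.0) / 4 = 0.5
print(rmse(r_true, r_pred))  # sqrt((0.25 + 0.25 + 1.0 + 0.0) / 4) ~ 0.6124
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics can rank methods differently on sparse data.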

B. BASELINES AND PARAMETER SETTINGS
1) BASELINES
We compare our SKT method with the following related baseline algorithms:
• WNMF [33]. Weighted Nonnegative Matrix Factorization (WNMF) is a single-domain recommendation method. It decomposes the target rating matrix into the product of two low-dimensional nonnegative matrices, which are then used to predict ratings and make recommendations. The optimization objective of WNMF is shown in formula (1).
• PMF [8]. Probabilistic Matrix Factorization (PMF) is a matrix-factorization-based method that learns only in the target domain.
• GWNMF [36]. Graph Regularized Weighted Nonnegative Matrix Factorization (GWNMF) incorporates constructed user and item graphs into a nonnegative matrix factorization framework to exploit both internal and external information. It also learns only in the target domain.
• iCMF [24]. The Collective Matrix Factorization (CMF) method extends matrix factorization (MF) by jointly learning item latent factors from the user-item rating matrices of multiple domains. The iTCF [31] method is an extension of CMF which introduces richer interactions by smoothly sharing both the item latent features and predictability across two heterogeneous datasets. Here, we set the interaction parameter ρ between the user-specific latent features in iTCF to 1 as an approximation of CMF. We name this method iCMF.
• Item-CST [19]. Coordinate System Transfer (CST) is a transfer learning method for collaborative filtering that transfers the coordinate system from two auxiliary binary rating matrices to a target rating matrix in an adaptive way. Since our problem has only one auxiliary binary rating matrix, which overlaps with the items of the target domain, CST can be adapted to our problem by keeping only the item-side regularization term. We name this method Item-CST.
• Item-TMF [21]. Transfer by Mixed Factorization (TMF) is a state-of-the-art method for cross-domain recommendation using binary preferences as auxiliary data; it introduces two user interest profiles to model the user's latent preference, building on the iTCF [31] method. Like iTCF, TMF requires the auxiliary and target domains to share overlapping users and items. We adapt TMF to our problem by setting the interaction parameter ρ between the user-specific latent features to 1. We name this method Item-TMF.
• WNMF-TL. Zhao et al. [13] proposed an Active Transfer Learning method for cross-system recommendation, which constructs cross-domain entity correspondences and then plugs the actively constructed correspondences into a general matrix factorization model. In our problem formulation, the cross-domain items are in full one-to-one correspondence, hence we remove the active learning module of the original method. Moreover, for a fairer comparison with our SKT method, we use WNMF as the matrix factorization model and exploit the item similarity learned from the auxiliary binary rating data as a prior to constrain the item similarity in the target domain. We name this method WNMF-TL.
• KerKT [32]. Kernel-induced Knowledge Transfer (KerKT) is a cross-domain recommendation method based on partially overlapping entities. KerKT uses domain adaptation to align the feature distributions of overlapping entities between domains, and then uses diffusion kernel completion to correlate non-overlapping entities across domains. Since the items between the domains overlap completely in our problem, we adapt KerKT to this scenario.
Our SKT method jointly factorizes the auxiliary binary rating matrix and the target numerical rating matrix under the constraints of sharing the same item latent factors and selectively sharing user latent factors. In addition, we integrate the co-graph regularization of user and item graphs into the proposed weighted collective matrix factorization framework to avoid negative transfer. The parameter settings used for SKT are listed in Table 6.

C. EXPERIMENTAL RESULTS
The experimental results on Netflix-MovieLens and Goodreads are shown in Table 4 and Table 5, respectively. From these results, we make the following observations: 1) Among the non-transfer learning methods, GWNMF shows superior performance compared to WNMF at all density levels. In addition, when the target rating matrix is denser (e.g., ≥ 0.1% for Netflix-MovieLens and ≥ 0.5% for Goodreads), GWNMF tends to perform better than PMF, while PMF beats GWNMF when the target rating matrix becomes sparser (e.g., ≤ 0.05% for Netflix-MovieLens and ≤ 0.1% for Goodreads). The reason is that for GWNMF, when the target rating matrix is denser, the neighborhood structure information obtained is more accurate and the extracted latent factors are more refined. Finally, when the rating matrix in the target domain is very sparse, the non-transfer methods fail to give good recommendations.
2) The prediction performance of Item-CST is not always better than that of the non-transfer baselines across all tasks; in particular, when the target data becomes sparser, Item-CST performs worse than PMF and GWNMF. This indicates that when the target data is severely sparse, Item-CST is prone to negative transfer and is therefore unstable. A reasonable explanation is that when the target data becomes sparser, the divergence of the data distributions between the two domains is greater, and directly adapting the latent factors extracted from the auxiliary data to the target data easily incurs negative transfer.
3) The iCMF method performs better than the non-transfer baselines and Item-CST at all density levels. Unlike the adaptive knowledge transfer adopted by Item-CST, iCMF is a collective knowledge transfer method, a bidirectional form of knowledge transfer with richer interactions. iCMF extracts latent factors by joint matrix factorization, which reduces the data distribution divergence between the two domains. Therefore, iCMF copes with the negative transfer issue better than Item-CST. However, when the target data is extremely sparse, iCMF suffers from insufficient knowledge transfer.
4) Apart from our proposed SKT method, Item-TMF performs best. Item-TMF incorporates virtual user profiles into its prediction rule, which models user preferences more accurately and thereby improves recommendation performance.

5) We can see that in all cases, WNMF-TL is significantly better than the non-transfer method WNMF, which shows that using the item similarity learned from the auxiliary data to constrain the item similarity in the target data helps improve the prediction performance of the model. In addition, WNMF-TL is better than the non-transfer method GWNMF when the density is lower (e.g., ≤ 0.1% for Netflix-MovieLens and Goodreads), while GWNMF beats WNMF-TL when the target rating matrix becomes denser. This is because when the target data becomes denser, the item similarity computed from the original target rating data is more accurate than the item similarity learned from the auxiliary data.
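The idea of constraining target item factors with similarity learned from the auxiliary data can be sketched as a graph-regularized factorization. The Laplacian penalty below is one common way to encode such a constraint; it is an illustration under our own simplified assumptions, not the exact WNMF-TL objective:

```python
import numpy as np

def laplacian_from_ratings(R_aux):
    """Item-item cosine similarity computed on the auxiliary ratings,
    turned into a graph Laplacian L = D - S (S symmetric, nonnegative)."""
    norms = np.linalg.norm(R_aux, axis=0, keepdims=True) + 1e-12
    S = (R_aux / norms).T @ (R_aux / norms)
    np.fill_diagonal(S, 0.0)
    return np.diag(S.sum(axis=1)) - S

def graph_reg_mf(R, W, L, d=2, alpha=0.5, iters=300, lr=0.05, seed=0):
    """Factorize the sparse target ratings with an extra smoothness term
    alpha * tr(V.T @ L @ V): items that look similar in the auxiliary
    domain are pushed toward similar latent factors in the target domain."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(R.shape[0], d))
    V = rng.normal(scale=0.1, size=(R.shape[1], d))
    for _ in range(iters):
        E = W * (U @ V.T - R)
        U -= lr * (E @ V)
        V -= lr * (E.T @ U + alpha * (L @ V))  # similarity constraint on item factors
    return U, V

# toy data: 20 auxiliary users and 6 target users over the same 8 items
rng = np.random.default_rng(4)
R_aux = rng.random((20, 8))          # dense, nonnegative auxiliary ratings
L = laplacian_from_ratings(R_aux)
R = rng.normal(size=(6, 2)) @ rng.normal(size=(8, 2)).T
W = (rng.random(R.shape) < 0.4).astype(float)
U1, V1 = graph_reg_mf(R * W, W, L, alpha=0.5)  # with the similarity constraint
U0, V0 = graph_reg_mf(R * W, W, L, alpha=0.0)  # without it
smoothness = lambda V: float(np.trace(V.T @ L @ V))
```

Comparing `smoothness(V1)` with `smoothness(V0)` shows how the penalty pulls similar items' factors together; the trade-off discussed above appears when the target data is dense enough that its own similarities are more trustworthy than the auxiliary graph.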

6) Except for SKT, KerKT performs best when the density is 0.1%. However, KerKT performs worse than the other transfer learning methods when the density is lower than 0.1%. A possible reason is that KerKT suffers from under-transfer or negative transfer at these lower densities.

7) The proposed SKT achieves significantly better prediction performance than all the other baseline methods in all cases. Especially when the target rating matrix is extremely sparse, SKT achieves a greater performance improvement than the other baselines. Unlike the other three cross-domain recommendation methods, SKT can selectively transfer the latent factors of auxiliary-domain users, who do not overlap with the users in the target domain. In addition, SKT integrates the intra-domain entity similarity information from the target and auxiliary domains, through which positive transfer can be guaranteed.
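The "selective" mechanism can be illustrated with a small, self-contained sketch: express target user factors as a combination of auxiliary user factors through a matrix P, and apply a column-wise ℓ2,1 penalty whose proximal operator zeroes entire columns, i.e., deselects auxiliary users. This is a toy analogue of structured sparsity, not the actual SKT objective, and all names are ours:

```python
import numpy as np

def prox_l21_cols(P, t):
    """Proximal operator of t * sum_j ||P[:, j]||_2: columns whose l2
    norm falls below t are zeroed out entirely (deselected)."""
    norms = np.linalg.norm(P, axis=0, keepdims=True)
    return P * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def select_aux_users(U_tgt, U_aux, gamma=0.5, lr=0.05, iters=300):
    """Proximal gradient on 0.5*||U_tgt - P @ U_aux||_F^2 + gamma*||P||_{2,1}
    with column groups: a zero column of P means the corresponding
    auxiliary user's latent factors are NOT transferred."""
    P = np.zeros((U_tgt.shape[0], U_aux.shape[0]))
    for _ in range(iters):
        grad = (P @ U_aux - U_tgt) @ U_aux.T
        P = prox_l21_cols(P - lr * grad, lr * gamma)
    return P

# toy data: the 3 target users coincide with the first 3 of 6 auxiliary users
rng = np.random.default_rng(0)
U_aux = rng.normal(size=(6, 8))
U_aux /= np.linalg.norm(U_aux, axis=1, keepdims=True)
U_tgt = U_aux[:3].copy()
P = select_aux_users(U_tgt, U_aux)
col_norms = np.linalg.norm(P, axis=0)  # which auxiliary users were selected
```

In this toy setting, the columns corresponding to the irrelevant auxiliary users shrink to (near) zero, while the relevant ones survive, which is the behavior the structured sparsity constraint on P is designed to produce.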

D. PARAMETER ANALYSIS
In this section, we describe the hyper-parameter tuning process used to control each parameter's contribution. The proposed SKT has nine hyper-parameters: λ, d, γ_V, γ_U, γ_p, α_U, α_V, β_U, and β_V. The performance of SKT is relatively stable under different settings of the trade-off parameter λ. For the latent dimensionality d, setting it too large increases the time complexity of SKT, while setting it too small leaves too few user latent factors to select from, which reduces the performance of SKT. To simplify the analysis, we fixed λ = 1 and d = 20 and tuned the remaining parameters. To analyze the parameter γ_V, we set γ_U, γ_p, θ, and δ to the best parameter values for each density level, as shown in Table 6. Figure 2(a) shows how RMSE changes with different settings of γ_V. This parameter reflects the influence of sharing the item latent factors across domains on the matrix factorization. When the density level is lower than or equal to 0.1%, γ_V has a significant influence on RMSE, whereas when the density level is higher than or equal to 0.5%, its influence is not significant. In our experiments, to achieve the best prediction performance, we set γ_V = 100 when the density level is 0.01%; γ_V = 50 when the density level is between 0.05% and 0.1%; and γ_V = 1 when the density level is higher than or equal to 0.5%. Similarly, to analyze the parameter γ_U, we fix the remaining four parameters as shown in Table 6. The parameter γ_U reflects the influence of selectively transferring the user latent factors learned from the auxiliary domain on the matrix factorization. From Figure 2(b), we can see that the sparser the target data, the more significant the influence of γ_U on RMSE.
In our experiments, to obtain better prediction results, we set γ_U = 0.1 when the density is 1% and γ_U = 1 when the density is lower than or equal to 0.5%. We use the same method to analyze the parameters γ_p, θ, and δ. The parameter γ_p reflects the influence of applying the structured sparsity constraint on P to the matrix factorization. Figure 2(c) shows that γ_p has a significant influence on RMSE when the density level is 0.01%, but not when the density level is greater than or equal to 0.05%. In our experiments, we choose γ_p = 10 when the density level is lower than or equal to 0.1%, and γ_p = 50 when it is greater than or equal to 0.5%. From Figures 2(d) and 2(e), we can see that the parameters θ and δ have a significant influence on RMSE. These two parameters reflect the influence of the similarity between entities in the auxiliary and target domains on the matrix factorization. In our experiments, we set θ = 1 and δ = 1 when the density level is 0.01%; θ = 0.1 and δ = 0.1 when the density level is between 0.05% and 0.1%; and θ = 0.1 and δ = 0.01 when the density level is higher than or equal to 0.5%.
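The tuning protocol above, fixing all but one hyper-parameter at its best value and sweeping the remaining one on held-out data, can be sketched generically. The regularization weight `reg` below stands in for any single trade-off parameter; the whole example is illustrative, not the SKT tuning code:

```python
import numpy as np

def mf_val_rmse(R, W_train, W_val, d=2, reg=0.1, iters=300, lr=0.05, seed=0):
    """Fit a small regularized MF on the training entries and return the
    RMSE on the held-out validation entries for this value of `reg`."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(R.shape[0], d))
    V = rng.normal(scale=0.1, size=(R.shape[1], d))
    for _ in range(iters):
        E = W_train * (U @ V.T - R)
        U, V = U - lr * (E @ V + reg * U), V - lr * (E.T @ U + reg * V)
    err = W_val * (R - U @ V.T)
    return float(np.sqrt((err ** 2).sum() / max(W_val.sum(), 1.0)))

# sweep one hyper-parameter while everything else stays fixed
rng = np.random.default_rng(3)
R = rng.normal(size=(10, 2)) @ rng.normal(size=(8, 2)).T
mask = rng.random(R.shape)
W_train = (mask < 0.5).astype(float)                   # ~50% of entries for training
W_val = ((mask >= 0.5) & (mask < 0.7)).astype(float)   # ~20% held out for validation
grid = [0.01, 0.1, 1.0, 10.0]
scores = {g: mf_val_rmse(R, W_train, W_val, reg=g) for g in grid}
best = min(scores, key=scores.get)
```

Repeating such a one-dimensional sweep per parameter and per density level yields a table of best settings analogous to Table 6.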

V. CONCLUSION AND FUTURE WORK
In this article, we present a novel cross-domain recommendation method with heterogeneous feedback for knowledge transfer, called SKT. Specifically, SKT can not only directly transfer the latent features of items from the auxiliary domain, which shares only overlapping items with the target domain, but also selectively transfer the latent preferences of users from the auxiliary domain. Furthermore, to avoid negative transfer, we integrate the intra-domain entity similarities from the target and auxiliary domains into SKT. Experimental results show that the proposed SKT method achieves the best performance compared to seven non-transfer learning and cross-domain recommendation baselines.
For future work, we plan to extend our proposed method to scenarios with only a few, or even no, cross-domain entity correspondences. In addition, some interesting problems remain to be explored. For example, how does the sparsity of the auxiliary data influence prediction performance? And when several auxiliary domains are available, how should the best one be chosen?