Differentiable Ranking Metric Using Relaxed Sorting for Top-K Recommendation

Most recommenders generate recommendations for a user by computing the preference score of items, sorting the items according to the score, and filtering the top-$K$ items of high scores. Since sorting is not differentiable and is difficult to optimize with gradient descent, it is nontrivial to incorporate it in recommendation model training despite its relevance to top-$K$ recommendations. As a result, inconsistency occurs between existing learning objectives and ranking metrics of recommenders. In this work, we present the Differentiable Ranking Metric (DRM) that mitigates the inconsistency between model training and generating top-$K$ recommendations, aiming at improving recommendation performance by employing the differentiable relaxation of ranking metrics via joint learning. Using experiments with several real-world datasets, we demonstrate that the joint learning of the DRM objective and existing factor based recommenders significantly improves the quality of recommendations.


I. INTRODUCTION
To make top-K item recommendations, most recommender systems generate relevance scores of items with respect to a user and filter the top-K items of high scores. Thus, ranking, or equivalently, sorting items serves an important role in top-K recommendations. However, conventional model training procedures for recommenders are limited in how accurately they can reflect the ranking nature of top-K recommendations, because the sorting operation is not differentiable and incorporating it into end-to-end model training is challenging. Instead, most model based recommenders exploit surrogate objectives such as mean squared error or log-likelihood, which do not take into account ranking metrics. As has been noted in several works [1]-[3], optimizing such objectives that disregard the ranking nature of top-K recommendations does not always guarantee the best performance.
There are several ranking-oriented objectives, including pairwise and listwise objectives; however, neither kind is known to fit well with top-K recommendations. Pairwise objectives [4]-[7] enable a recommender to learn users' preferences by casting a recommendation task as a binary classification, predicting whether a user prefers an item to another item. For top-K recommendations, classifying top-K ranked items is important, but pairwise objectives do not emphasize the relative difference between top-K ranked items and non-top-K ranked items. While several listwise objectives [3], [8], [9] have been recently proposed to model the ranking nature of recommendations, adopting listwise objectives is limited in practice, especially in large-scale recommenders, due to their high computational complexity caused by modeling permutations with the Plackett-Luce distribution [10].
In this work, we address such limitations in existing ranking-oriented objectives for model based recommenders, aiming to bridge the gap between the learning objectives commonly used for training recommenders and the ranking nature of top-K recommendation tasks. Inspired by differentiable programming [11], [12], we present the differentiable ranking metric (DRM), which is a differentiable relaxation scheme of ranking metrics such as Prec@K or Recall@K. By employing the differentiable relaxation scheme for the sorting operation [11], DRM expedites direct optimization of ranking metrics for recommendation models.
To do so, we first reformulate the ranking metrics in terms of permutation matrix arithmetic and then relax the nondifferentiable permutation matrix in these arithmetic forms to a differentiable row-stochastic matrix. This reformulation and relaxation allow us to optimize ranking metrics in the differentiable form of DRM. Using DRM as an optimization objective renders end-to-end recommendation model training highly consistent with ranking metrics. Moreover, DRM can be readily incorporated into existing recommenders via joint learning with their objectives without modifying their model structure, thus preserving their benefits. Specifically, we adopt two state-of-the-art model based recommenders, WARP [5], [13] and CML [6], to which the joint learning can be applied.
Our experiments demonstrate that the DRM objective significantly improves the performance of top-K recommendations on several real-world datasets in terms of ranking metrics, compared with several other recommendation models.
Furthermore, we demonstrate how to adapt other ranking metric driven objective (RMDO) schemes, e.g., ApproxNDCG, to model based recommenders via joint learning. While RMDO schemes have been studied in the information retrieval community [7], [8], [14], [15], this joint learning approach has not been fully investigated for factor based recommenders. To the best of our knowledge, our work is the first to incorporate joint learning with relaxed ranking metrics into factor based recommenders.
The contribution of this work is summarized as follows: • We propose the DRM objective that alleviates the misalignment between the training objective and evaluation metrics in top-K recommendation models.
• We present a joint learning approach to train factor based recommenders using the DRM objective.
• Through extensive experiments, we empirically show that (1) our approach outperforms other state-of-the art top-K recommendation models and (2) the DRM objective fits better than other ranking metric driven objectives for top-K recommendation tasks.

II. PRELIMINARIES
Given a set of $M$ users $U = \{1, 2, \ldots, M\}$, a set of $N$ items $V = \{1, 2, \ldots, N\}$, and a set of interactions $y_{u,i}$ for all users $u \in U$ and all items $i \in V$, a recommendation model is learned to predict the preference or score $\hat{y}_{u,i} \in \mathbb{R}$ of user $u$ to item $i$. We use predicted preference and predicted score interchangeably to denote $\hat{y}_{u,i}$. We use binary implicit feedback $y_{u,i}$ such that $y_{u,i} = 1$ if user $u$ has interacted with item $i$, and 0 otherwise. Note that we only consider this binary feedback format in this work, while the approach can be generalized to other implicit feedback settings.
For user $u$, we use $i$ to represent positive items that $u$ has interacted with, and $j$ to represent negative items that $u$ has not. In addition, we use $V_u$ to represent the set of positive items for user $u$, and $\mathbf{y}_u$ for its bag-of-words notation, i.e., a column vector $[y_{u,1}, y_{u,2}, \ldots, y_{u,N}]^T$. Similarly, we use $\hat{\mathbf{y}}_u$ to represent the vector of predicted scores of items, i.e., $[\hat{y}_{u,1}, \hat{y}_{u,2}, \ldots, \hat{y}_{u,N}]^T$.

A. OBJECTIVES FOR RECOMMENDATION MODELS
In general, the objectives of recommenders are categorized into pointwise, pairwise, and listwise.
Pointwise objectives maximize the prediction accuracy independently of the errors of item rankings. While several pointwise objectives, such as mean squared error and cross-entropy, are commonly used, these objectives are known to have limitations in that small errors of the objectives do not always lead to high-quality recommendations [16].
In the recommender system domain, pairwise objectives have gained popularity because they are more closely related to top-K recommendations than pointwise objectives. Model training with pairwise objectives enables a recommender to learn users' preferences by casting the recommendation task as a binary classification, predicting whether user $u$ prefers item $i$ to item $j$. For example, Bayesian personalized ranking [4] minimizes the negative log-likelihood of the probability that user $u$ prefers item $i$ to item $j$:

$$\mathcal{L}_{\mathrm{BPR}} = -\sum_{(u,i,j)} \ln \sigma(\hat{y}_{u,i} - \hat{y}_{u,j}), \tag{1}$$

where $\sigma(\cdot)$ is a sigmoid function, and $\sigma(\hat{y}_{u,i} - \hat{y}_{u,j})$ is interpreted as the probability that user $u$ prefers item $i$ to item $j$. Another popular pairwise objective is the weighted hinge loss,

$$\mathcal{L}_{\mathrm{hinge}} = \sum_{(u,i,j)} w_{ui} \left[ \mu + \hat{y}_{u,j} - \hat{y}_{u,i} \right]_+, \tag{2}$$

where $[x]_+ = \max(0, x)$ and $\mu$ is a margin value. The weight $w_{ui}$ introduced in [5], [13] enables pairwise objectives to emphasize the loss of positive items at lower ranks; the value of $w_{ui}$ is chosen to be larger if the approximated rank is lower for a positive item $i$. Some choices of $w_{ui}$ are known to make optimizing the hinge loss closely related to maximizing discounted cumulative gain [17], [18].

For top-K recommendations, listwise objectives have been recently explored by a few research works [3], [8], [19]. In general, listwise objectives are based on the Plackett-Luce probability distribution of list permutations, i.e.,

$$p(\pi \mid \hat{\mathbf{y}}_u) = \prod_{k=1}^{N} \frac{\phi(\hat{y}_{u,\pi(k)})}{\sum_{l=k}^{N} \phi(\hat{y}_{u,\pi(l)})}, \tag{3}$$

where $\phi(\cdot)$ is an arbitrary smoothing function, e.g., $\phi(\cdot) = \exp(\cdot)$. These listwise objectives aim to maximize the probability of correctly ordered permutations by minimizing the negative log-likelihood or cross-entropy. However, they have a limitation of high computational complexity in that the complexity of calculating the permutation probability grows exponentially as the number of items in the dataset increases.
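For concreteness, the two pairwise objectives above can be written in a few lines of PyTorch. The following is a minimal sketch rather than the exact implementation used in our experiments; `pos_scores` and `neg_scores` are assumed to be aligned one-dimensional tensors holding $\hat{y}_{u,i}$ and $\hat{y}_{u,j}$ for sampled triples $(u, i, j)$, and `weights` holds $w_{ui}$:

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Equation (1): negative log-likelihood that each positive item
    # outscores its paired negative item.
    return -F.logsigmoid(pos_scores - neg_scores).sum()

def weighted_hinge_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
                        weights: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Equation (2): weighted hinge loss; `weights` plays the role of w_ui,
    # emphasizing positive items whose approximated rank is worse.
    return (weights * torch.clamp(margin + neg_scores - pos_scores, min=0.0)).sum()
```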

B. RANKING METRICS FOR TOP-K RECOMMENDATIONS
Here, we introduce common evaluation metrics for top-K recommendations. To explain ranking metrics, we represent the list of items sorted by the predicted scores for user $u$ as $\pi_u$, and the item at rank $k$ as $\pi_u(k)$. In addition, we define the $\mathrm{Hit}(u, k)$ function that specifies whether $\pi_u(k)$ is in the positive item set $V_u$. Specifically,

$$\mathrm{Hit}(u, k) = \mathbb{I}\left[ \pi_u(k) \in V_u \right], \tag{4}$$

where $\mathbb{I}[\text{statement}]$ is the indicator function, yielding 1 if the statement is true and 0 otherwise.

$K$-truncated Precision (Prec@K) and Recall (Recall@K) are two of the most widely used evaluation metrics for top-K recommendations. Prec@K specifies the fraction of hit items in $V_u$ among recommended items, and Recall@K specifies the fraction of recommended items among the items in $V_u$:

$$\mathrm{Prec}@K(u, \pi_u) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Hit}(u, k), \tag{5}$$

$$\mathrm{Recall}@K(u, \pi_u) = \frac{1}{|V_u|} \sum_{k=1}^{K} \mathrm{Hit}(u, k). \tag{6}$$

Notice that both metrics do not take into account differences in the ranking of recommended items, while they collectively emphasize top-K ranked items by counting only the items up to the $K$-th rank.

On the other hand, $K$-truncated Normalized Discounted Cumulative Gain (NDCG@K) and Average Precision (AP@K) take into account the relative ranks of items by weighting the impact of $\mathrm{Hit}(u, k)$. NDCG@K specifies a normalized value of DCG@K, which is divided by the ideal DCG, $\mathrm{IDCG}@K = \max_{\pi_u} \mathrm{DCG}@K(u, \pi_u)$:

$$\mathrm{DCG}@K(u, \pi_u) = \sum_{k=1}^{K} \frac{\mathrm{Hit}(u, k)}{\log_2(k + 1)}, \quad \mathrm{NDCG}@K(u, \pi_u) = \frac{\mathrm{DCG}@K(u, \pi_u)}{\mathrm{IDCG}@K}. \tag{7}$$

The $K$-truncated AP is defined as

$$\mathrm{AP}@K(u, \pi_u) = \frac{1}{\min(K, |V_u|)} \sum_{k=1}^{K} \mathrm{Prec}@k(u, \pi_u) \cdot \mathrm{Hit}(u, k), \tag{8}$$

where AP can be viewed as a weighted sum of Hit for each rank $k = 1, 2, \ldots, K$, weighted by Prec@k. We can represent the aforementioned metrics in a unified way as $O@K$ conditioned on weight functions $w(k, K)$, where $(u, \pi_u)$ of the metrics is omitted for simplicity:

$$O@K = \sum_{k=1}^{K} w(k, K)\, \mathrm{Hit}(u, k). \tag{9}$$
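Since all four metrics are instances of Equation (9) with binary relevance, they can be computed per user in a few lines. Below is an illustrative NumPy sketch; the helper name and the $\min(K, |V_u|)$ normalization of AP@K are our choices here, not a prescribed implementation:

```python
import numpy as np

def topk_metrics(ranked_items, positives, K):
    # Evaluate Equations (5)-(8) for one user, given the items sorted by
    # predicted score (pi_u) and the user's held-out positive set V_u.
    hit = np.array([1.0 if item in positives else 0.0 for item in ranked_items[:K]])
    ks = np.arange(1, K + 1)
    prec_at_k = np.cumsum(hit) / ks                      # Prec@k for k = 1..K
    dcg = (hit / np.log2(ks + 1)).sum()
    ideal_hits = min(K, len(positives))
    idcg = (1.0 / np.log2(np.arange(1, ideal_hits + 1) + 1)).sum()
    return {
        "Prec@K": hit.sum() / K,                         # Equation (5)
        "Recall@K": hit.sum() / len(positives),          # Equation (6)
        "NDCG@K": dcg / idcg,                            # Equation (7)
        "AP@K": (prec_at_k * hit).sum() / ideal_hits,    # Equation (8)
    }
```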

III. PROPOSED METHOD
In this section, we briefly review factor based recommenders with weighted hinge loss, describe a ranking metric based objective in terms of vector arithmetic, and show how to relax the aforementioned ranking metrics to be differentiable so that they can be optimized using gradient descent for training factor based recommenders. We call this relaxed approach DRM. We then describe our joint learning approach to train factor based recommenders using DRM. We also show that this learning approach can be used to incorporate other ranking metric driven objectives in training factor based recommenders.

A. FACTOR BASED RECOMMENDERS WITH HINGE LOSS
Factor based recommenders represent users and items in a latent vector space $\mathbb{R}^d$, and then formulate the preference score of user $u$ to item $i$ as a function of two vectors, user vector $\alpha_u$ and item vector $\beta_i$. The dot product is a common method for mapping a pair of user and item vectors to a predicted preference [4], [20], [21]. In [6], collaborative metric learning (CML) embeds users and items in the Euclidean space and defines its score function as the negative L2 distance between the two factors.
Our model uses either the dot product or the L2 distance of user vector $\alpha_u$ and item vector $\beta_i$ as a score function:

$$\hat{y}_{u,i} = \alpha_u^T \beta_i, \tag{10}$$

$$\hat{y}_{u,i} = -\|\alpha_u - \beta_i\|. \tag{11}$$

Note that $\|\cdot\|$ is the L2 norm. Having the score functions above, we update our model using the weighted hinge loss in Equation (2). Following [6], we calculate the weight $w_{ui}$ of the hinge loss by

$$w_{ui} = \log\left(\mathrm{rank}(i) + 1\right), \tag{12}$$

where $\mathrm{rank}(i)$ is the approximated rank of positive item $i$. For each update, $|J|$ negative items are sampled from $V - V_u$ and used to estimate $w_{ui}$.
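A sketch of Equations (10)-(12) in PyTorch follows; the estimator in `approx_rank_weight` follows the spirit of the WARP/CML rank approximation from $|J|$ sampled negatives, and its exact form here is an illustrative assumption:

```python
import torch

def score_dot(alpha_u, beta):
    # Equation (10): dot-product score of user u against all items in beta.
    return beta @ alpha_u

def score_l2(alpha_u, beta):
    # Equation (11): negative L2-distance score.
    return -torch.norm(beta - alpha_u, dim=-1)

def approx_rank_weight(pos_score, neg_scores, num_items, margin=1.0):
    # Equation (12), sketched: estimate the positive item's rank from the
    # fraction of the |J| sampled negatives that violate the margin, then
    # weight by log(rank + 1).
    violations = (neg_scores + margin > pos_score).float().sum()
    est_rank = torch.floor(violations * num_items / neg_scores.numel())
    return torch.log(est_rank + 1.0)
```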

B. DIFFERENTIABLE RANKING METRICS
Sorting and ranking items can be considered as a permutation of items. An $N$-dimensional permutation corresponds to a vector $p = [p_1, p_2, \ldots, p_N]^T$ where $p_i \in \{1, 2, \ldots, N\}$ and $p_i \neq p_j$ if $i \neq j$. For each vector $p$, we then have its permutation matrix $P \in \{0, 1\}^{N \times N}$, whose elements are

$$P_{i,j} = \begin{cases} 1 & \text{if } j = p_i, \\ 0 & \text{otherwise.} \end{cases}$$

For example, the permutation $p = [2, 1, 3]^T$ has the permutation matrix $P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

In [11], a continuous relaxation of sorting (namely NeuralSort) is proposed, which represents a sorting operation as a permutation matrix and then relaxes the matrix into a continuous form. Specifically, sorting a vector $s = [s_1, s_2, \ldots, s_N]^T$ in descending order can be represented by the permutation matrix

$$P^{(s)}_{k,j} = \begin{cases} 1 & \text{if } j = \arg\max \left[ (N + 1 - 2k)\,s - A_s \mathbf{1} \right], \\ 0 & \text{otherwise,} \end{cases} \tag{13}$$

where $\mathbf{1}$ is the column vector having 1 for all elements and $A_s$ is the matrix such that $A_{i,j} = |s_i - s_j|$. Then, softmax is used for relaxing the permutation matrix, i.e.,

$$\hat{P}^{(s)}_k = \mathrm{softmax}\left[ \frac{(N + 1 - 2k)\,s - A_s \mathbf{1}}{\tau} \right], \tag{14}$$

where $\tau > 0$ is a temperature parameter. Larger $\tau$ values make each row of the relaxed matrix flatter. This transformation of NeuralSort relaxes the permutation matrix in Equation (13) into a unimodal row-stochastic matrix, making the sorting of real-valued elements differentiable. Equation (14) is continuous everywhere and differentiable nearly everywhere with respect to the elements of $s$. Furthermore, as $\tau \to 0^+$, $\hat{P}^{(s)}$ reduces to the permutation matrix $P^{(s)}$.

The $k$-th row $P^{(s)}_k$ of the permutation matrix $P^{(s)}$ is equal to the one-hot vector representation of the $k$-th ranked item. Thus, we can reformulate Hit (Equation (4)) using the dot product of $\mathbf{y}_u$ and $P^{(\hat{y}_u)}_k$:

$$\mathrm{Hit}(u, k) = P^{(\hat{y}_u)}_k \mathbf{y}_u. \tag{15}$$

We then obtain the representation of the ranking metrics in Equation (9) in terms of vector arithmetic using Equation (15):

$$O@K(u) = \sum_{k=1}^{K} w(k, K)\, P^{(\hat{y}_u)}_k \mathbf{y}_u. \tag{16}$$
By replacing $P_k$ with its relaxation $\hat{P}_k$, we obtain the differentiable relaxed objective, which can be optimized using gradient descent:

$$\tilde{O}@K(u) = \sum_{k=1}^{K} w(k, K)\, \hat{P}^{(\hat{y}_u)}_k \mathbf{y}_u. \tag{17}$$

We empirically find that the equation below, namely the DRM objective, is more stable in model training:

$$\mathcal{L}_{\mathrm{DRM}} = -\sum_{u \in U} \log\left( \mathbf{a}_u\, \mathbf{y}_u \right). \tag{18}$$

Note that minimizing Equation (18) is equivalent to maximizing Equation (17) because the logarithm is monotonically increasing and

$$\tilde{O}@K(u) = \mathbf{a}_u\, \mathbf{y}_u, \tag{19}$$

where $\mathbf{a}_u = \sum_{k=1}^{K} w(k, K)\, \hat{P}^{(\hat{y}_u)}_k$ for each user $u$.
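To make the relaxation concrete, the following PyTorch sketch builds the relaxed permutation matrix of Equation (14) and evaluates the per-user DRM term of Equation (18); the Prec@K weighting $w(k, K) = 1/K$ and the small constant inside the logarithm are illustrative choices, not a definitive implementation:

```python
import torch

def relaxed_permutation_matrix(s, tau=1.0):
    # NeuralSort relaxation (Equation (14)): returns an (n x n) unimodal
    # row-stochastic matrix whose k-th row approximates the one-hot
    # indicator of the k-th highest-scoring item in s.
    n = s.numel()
    A = (s.unsqueeze(0) - s.unsqueeze(1)).abs()        # A[i, j] = |s_i - s_j|
    b = A.sum(dim=1)                                   # A_s @ 1
    k = torch.arange(1, n + 1, dtype=s.dtype, device=s.device)
    C = (n + 1 - 2 * k).unsqueeze(1) * s.unsqueeze(0)  # row k: (n + 1 - 2k) * s
    return torch.softmax((C - b.unsqueeze(0)) / tau, dim=-1)

def drm_loss(y_hat, y, K, tau=1.0):
    # Per-user DRM term (Equation (18)) with w(k, K) = 1/K assumed.
    P_hat = relaxed_permutation_matrix(y_hat, tau)
    a_u = P_hat[:K].sum(dim=0) / K                     # a_u = sum_k w(k, K) * P_hat_k
    return -torch.log((a_u * y).sum() + 1e-10)         # -log of the relaxed metric
```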

C. JOINT LEARNING WITH DRM
We incorporate the proposed objective in Equation (18) into the model learning structure via joint learning of two objectives,

$$\mathcal{L} = \mathcal{L}_{\mathrm{model}} + \lambda\, \mathcal{L}_{\mathrm{DRM}} + \Omega_{\mathrm{model}}, \tag{20}$$

where $\lambda$ is a scaling parameter for controlling the effect of $\mathcal{L}_{\mathrm{DRM}}$ and $\Omega_{\mathrm{model}}$ is a regularization term of model parameters. $\mathcal{L}_{\mathrm{model}}$ depends on the chosen base model. Specifically, we adopt two pairwise, hinge loss, factor based models, WARP and CML; therefore, $\mathcal{L}_{\mathrm{model}}$ corresponds to a hinge loss $\mathcal{L}_{\mathrm{hinge}}$.

Two regularization schemes are adopted. We first keep latent factors of users and items within the unit hypersphere, i.e., $\|\theta\| \leq 1$ where $\theta$ is either $\alpha$ or $\beta$. For the model exploiting negative L2 distance as a score function, we also adopt covariance regularization [22] between all pairs of latent factors, using the covariance matrix

$$C_{i,j} = \frac{1}{|\Theta|} \sum_{\theta \in \Theta} (\theta_i - \mu_i)(\theta_j - \mu_j),$$

where $\Theta = \{\alpha_1, \alpha_2, \ldots, \alpha_M, \beta_1, \beta_2, \ldots, \beta_N\}$ and $\mu$ is the average vector of all user and item factors. We define a regularization term $\mathcal{L}_C$ as

$$\mathcal{L}_C = \frac{1}{|\Theta|} \left( \|C\|_f - \|\mathrm{diag}(C)\|_2^2 \right),$$

where $\|\cdot\|_f$ is the Frobenius norm. In the case that L2 distance is used, we add this regularization to Equation (20) with a control parameter $\lambda_C$, i.e., $\Omega_{\mathrm{model}} = \lambda_C \mathcal{L}_C$. The learning procedure for DRM is summarized in Algorithm 1.
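A sketch of the joint objective and the covariance regularizer is given below; `factors` is assumed to be the stacked matrix of all user and item factors $\Theta$, and the function names are illustrative:

```python
import torch

def covariance_regularizer(factors):
    # Covariance regularization [22] over all user and item factors:
    # L_C = (||C||_F - ||diag(C)||_2^2) / |Theta|, with C the covariance of
    # the stacked factor matrix.
    mu = factors.mean(dim=0, keepdim=True)
    centered = factors - mu
    C = centered.t() @ centered / factors.size(0)
    return (torch.norm(C) - torch.norm(torch.diagonal(C)) ** 2) / factors.size(0)

def joint_loss(hinge_loss, drm_loss, lam, factors=None, lam_c=0.0):
    # Equation (20): L = L_model + lambda * L_DRM + Omega_model, where the
    # covariance term is added only for the L2-distance score function.
    loss = hinge_loss + lam * drm_loss
    if factors is not None and lam_c > 0.0:
        loss = loss + lam_c * covariance_regularizer(factors)
    return loss
```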
Similar to negative sampling [4], [6], [23], [24] used in factor based models with gradient descent updates, we follow a sampling procedure for positive items. We construct each training sample to contain a user $u$, a set of positive items $I$ of size $\rho$, and a set of negative items $J$ of size $\eta$. Specifically, we construct a list of items $\mathbf{y}_u$ of size $(\rho + \eta)$ where the first $\rho$ elements are from $I$ and the next $\eta$ elements are from $J$, i.e., $\mathbf{y}_u = [y_{u,i_1}, y_{u,i_2}, \ldots, y_{u,i_\rho}, y_{u,j_1}, \ldots, y_{u,j_\eta}]^T$. The list of predicted scores $\hat{\mathbf{y}}_u$ is constructed similarly using the score function in Equation (10) or (11).

The DRM objective has quadratic complexity in the number of items used to build the relaxed permutation matrix in Equation (14) for a user. Model training can become too time consuming if all items in $V$ are used in the objective in Equation (20). Instead, we sample a fixed number of positive and negative items for each user. Negative sampling draws $\eta$ negative items, as normally employed for learning recommendation models [4], [6], [23], [24]; similarly, we sample $\rho$ positive items to construct a list of items containing both positive and negative items. In this work, we set $\eta = 17$ and $\rho = 3$.
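The per-user sampling step can be sketched as follows; `user_positives` is assumed to be the set $V_u$, and rejection sampling of negatives is one simple choice among several:

```python
import numpy as np

def sample_training_lists(user_positives, num_items, rho=3, eta=17,
                          rng=np.random.default_rng()):
    # Build one DRM training example for a user: rho sampled positive items
    # followed by eta sampled negatives, with the matching relevance vector y_u.
    positives = np.fromiter(user_positives, dtype=np.int64)
    pos = rng.choice(positives, size=rho, replace=positives.size < rho)
    neg = []
    while len(neg) < eta:
        j = int(rng.integers(num_items))
        if j not in user_positives:                      # draw from V - V_u
            neg.append(j)
    items = np.concatenate([pos, np.asarray(neg)])
    y_u = np.concatenate([np.ones(rho), np.zeros(eta)])  # first rho entries are hits
    return items, y_u
```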

D. JOINT LEARNING WITH OTHER OBJECTIVES
Optimizing ranking metrics is considered an important problem in information retrieval [25], and a number of ranking metric driven objectives (RMDOs) have been proposed [7], [8], [14], [15]. ApproxNDCG and LambdaLoss employ the Bradley-Terry pairwise rank probability to have the differentiable proxy for ranks in NDCG. ListNet uses Plackett-Luce [10] distribution to represent distributions of ranks in NDCG. These prior works commonly view item ranks as nondifferentiable variables and relax them using probability distributions, while DRM treats item ranks as matrices and relaxes the matrices to be differentiable.
Besides, incorporating RMDOs into existing recommendation models without modifying the model structure has not gained much attention. We show that the model update strategy in Section III-C can be used for that purpose. For example, ApproxNDCG [7] relaxes the rank of an item $i$, $\mathrm{rank}(i)$, to be continuous and differentiable, i.e.,

$$\widehat{\mathrm{rank}}(i) = 1 + \sum_{j \neq i} \sigma\left( \frac{\hat{y}_{u,j} - \hat{y}_{u,i}}{\tau} \right),$$

where $\sigma(\cdot)$ is a sigmoid function. Note that $\tau$ is a temperature parameter as in Equation (14). Relaxation of ranking metrics is realized by plugging the relaxed rank $\widehat{\mathrm{rank}}(i)$ into the ranking metrics (Equation (9)). In the same vein as the DRM-driven joint learning in Equation (20), it is possible to use relaxed NDCG aggregated over users as a training objective in joint learning.
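For reference, the relaxed rank above amounts to a few lines of PyTorch; this is an illustrative sketch in which the $\sigma(0) = 0.5$ contribution of the $j = i$ term is subtracted out:

```python
import torch

def relaxed_rank(scores, tau=1.0):
    # ApproxNDCG-style relaxed rank:
    # rank(i) ~ 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    diff = (scores.unsqueeze(0) - scores.unsqueeze(1)) / tau  # diff[i, j] = (s_j - s_i) / tau
    return 1.0 + torch.sigmoid(diff).sum(dim=1) - 0.5         # remove the j == i term
```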
Similarly, other RMDOs with gradient update can be applied for the joint learning scheme.

IV. EMPIRICAL EVALUATION
In this section, we evaluate our DRM-driven joint learning scheme in comparison with state-of-the-art recommendation models and other joint learning objectives.

A. EXPERIMENTAL SETUP
We validate our approach and baselines on four real-world user-item interaction datasets.
• SketchFab contains users' likes on 3D models from a 3D model sharing web service. We treat each like as a user-item interaction.
• Epinion [26] contains 5-star rating reviews of customers on products from a web service. We view each rating as a user-item interaction.
• ML-20M [27] contains the movie ratings from a movie recommendation service. Ratings range from 0.5 to 5.0 in 0.5 increments. We treat ratings of four or higher as positive interactions and exclude the rest.
• Melon contains playlists from a music streaming service. To be consistent with the implicit feedback setting, we treat each playlist as a user and the songs in a playlist as the list of positive items of the user. The dataset statistics are summarized in Table 1.

TABLE 1.
Dataset statistics: # users, # items, and # interactions denote the number of users, items, and interactions, respectively; avg. row and avg. col denote the average number of items that each user has interacted with, and the average number of users who have interacted with each item, respectively. Density denotes the interaction matrix density (i.e., Density = # interactions / (# users × # items)).

1) EVALUATION PROTOCOL
We split each dataset into training, validation, and test sets in 70%, 10%, and 20% portions, respectively. We train each model using the training dataset to find the best hyperparameter settings, evaluating with the validation dataset; we use Recall@50 for model validation. We then train the model five times with the best hyperparameter settings using the training and validation datasets, evaluate the model using the test dataset, and report the average performance in evaluation metrics for all experiments. We use mean AP@10 (MAP@10), NDCG@10, Recall@50, and NDCG@50 averaged over all users, excluding users with fewer than five interactions in the training dataset. We also conduct Welch's t-test [28] on our results and denote the best results with p-value lower than 0.01 in boldface.

2) BASELINE MODELS
We compare DRM with the following baselines. Note that except for SLIM, all baselines fall in the category of factor based recommenders.
• SLIM [29] is an item-item collaborative filtering algorithm in which the item-item similarity matrix is represented as a sparse matrix. It poses L1 and L2 regularization on the item-item similarity matrix.
• CDAE [30] can be seen as a factor based recommender where user factors are generated via an encoder. The encoder takes the embeddings of items the user has consumed and the embedding of the user as input, and returns the latent factor of the user. We use a neural network with no hidden layers, as in the original implementation.
• WMF [21] uses the mean squared error as the objective function and minimizes it using alternating least squares.
• BPR [4] exploits the pairwise sigmoid objective.
• SQLRank [9] views a recommendation problem as sorting lists. It then optimizes the upper bound of the log-likelihood of probabilities of correctly sorted lists.
• SRRMF [31] is a factor based recommender that interpolates unobserved feedback scores to be nonzero to treat negative items differently.
• CML [6] models user-item preferences as a negative value of the distance between user and item vectors.

3) HYPERPARAMETER SETTINGS
We tune hyperparameters of the baselines and DRM using grid search; for SLIM, the search covers its L1 and L2 regularization coefficients. We use open source implementations of models and objectives: ApproxNDCG, LambdaLoss, and ListNet [8], WMF and BPR, SLIM, SQLRank, and SRRMF. We implement CDAE, CML, WARP, and DRM using Python 3.7.3 and PyTorch 1.5.0. We run our experiments on a machine with an Intel(R) Xeon(R) CPU E5-2698, 160 GB of memory, and an NVIDIA Tesla V100 GPU with CUDA 10.1.

B. OVERALL EVALUATION
Table 2 shows the recommendation performance of the baselines and the DRM variants (i.e., DRM_dot, DRM_L2) on different datasets. We record the performance of SQLRank only on the SketchFab and Epinion datasets, because we could not train SQLRank successfully on the large datasets, ML-20M and Melon, due to the huge training time.
When comparing models with the same structure, i.e., those that use the same score function and share objectives, we observe that DRM consistently outperforms the respective models. For example, WARP and DRM_dot use the same score function and share the hinge loss, but DRM_dot consistently outperforms WARP for all datasets. This pattern recurs for CML and DRM_L2, which use the L2 distance score function and share the hinge loss. This indicates that the proposed DRM objective leads factor based models to make better recommendations by exploiting the top-K recommendation nature.
We observe DRM L2 significantly improves performance over the baselines by up to 15.5% on SketchFab, Epinion, and Melon datasets in most cases except for Recall@50 on SketchFab and Epinion. Additionally, we observe that the performance gain is significant in NDCG@10 and MAP@10, except for ML-20M. DRM tends to perform better than the baselines, especially compared with other models with a similar structure (e.g., DRM dot outperforms WMF and WARP, all exploiting dot score function).
Notably, among the models using hinge loss, the models using dot product as a score function (WARP and DRM dot ) show better performance than the models using L2 distance (CML and DRM L2 ) on SketchFab and Epinion datasets, which are relatively small datasets. In contrast, the models exploiting L2 distance as a score function outperform the models using dot product on larger datasets, ML-20M and Melon.

C. BENEFITS OF DRM OBJECTIVE
In Table 3, we show the effect of the DRM objective in Equation (20) by adjusting the scaling parameter λ that controls the intensity of the DRM objective. If λ is set to 0, DRM_L2 shares the same learning objective and score function with CML, and DRM_dot becomes WARP. By λ = ∞, we denote the models trained only with the DRM objective, without the pairwise hinge loss.
Our DRM objective achieves significant performance enhancement, especially when the dot score function is used (e.g., the performance gain is up to 10% for Recall@50 and NDCG@50 in ML-20M dataset). It is worth noting that the joint learning does not require any additional training data or structural modification of the models. These results clarify the benefits of employing the proposed DRM objective.
Although the DRM objective is helpful for model performance, its effect diminishes when λ becomes too large; the performance tends to degrade once λ grows beyond 10.0 for both models. The DRM models trained without the pairwise hinge objective perform quite well compared to the other baselines; however, the best performance is achieved via joint learning using both the pairwise hinge loss and the listwise DRM loss.
The proposed objective gives the models a training objective better aligned with the goal of top-K recommendation. However, the proposed objective alone yields suboptimal performance. Our model balances the pairwise hinge objective and the listwise objective for optimization via a form of joint learning. This joint learning structure makes our DRM scheme readily applicable to modern recommenders, requiring no modification of the model structure.

D. ABLATION STUDY
The training procedure of DRM requires sampling ρ positive items. In Figure 1, we experiment on the SketchFab and Melon datasets to see the relation between the number of positive samples ρ and recommendation performance. As the number of positive items ρ increases, NDCG@10 improves; however, this positive effect gradually diminishes for large ρ beyond some point, e.g., ρ = 3 for SketchFab. We verify that sampling a fraction of positive and negative items is effective for fitting DRM models, without requiring the entire user history.
In Figure 2, we evaluate the models jointly learned with different RMDOs, including our proposed DRM objective and other RMDOs such as ApproxNDCG [7], LambdaLoss [14], and ListNet [8], on the ML-20M dataset, as described in Section III-D. We observe that the models with RMDOs other than the DRM objective can achieve performance gains over the respective base models, while our DRM model yields better performance than those. This observation is consistent with the performance benefit of Cofactor [32], whose joint learning combines co-occurrence matrix factorization embeddings [33] with a given model objective without using additional data. In Table 4, we also evaluate the training time of the joint learning approach, reporting the average time of 20 runs of model training on ML-20M. While the joint learning approach usually increases the training time compared to the original model (WARP or CML in the dashed column), DRM trains faster than the other RMDOs used for joint learning. As shown in Figure 2 and Table 4, DRM demonstrates not only performance improvement but also training efficiency in joint learning compared to other RMDOs.

V. RELATED WORK
The optimization of ranking metrics is considered an important problem in the domain of information retrieval [25]. A common approach to the problem is to optimize nondifferentiable ranking metrics using differentiable surrogates, such as upper bounds of the ranking metrics. In [7], ApproxNDCG was introduced to derive the relaxation of Hit for AP and NDCG with a unified view of ranking metrics and achieve the optimization of relaxed ranking metrics. Furthermore, ApproxNDCG was adopted in deep feed-forward networks for search engine optimization in [34]. For recommender systems, we present DRM, a new ranking metric driven objective that is efficient to train and yields competitive recommendation performance.
Listwise Collaborative Filtering [3] addresses the misalignment issue between the cost and objective of K -Nearest neighbors recommenders. Specifically, this work exploits the similarity between two lists for K -Nearest neighbors recommenders. Our work is complementary since we support factor based models that are commonly used in recent user services.
SQLRank [9] is a factor based recommender minimizing the upper bound of the negative log-likelihood of the permutation probability. Despite its theoretical soundness, we were not able to achieve high-performance models in our experiments with SQLRank. In [31], a pattern similar to our experiments is also found. Specifically, SRRMF [31] shows that merely treating missing ratings as zero values leads to suboptimal behaviors, and thus it exploits approximated ranks to smooth negative feedback to nonzero values. We conjecture that DRM achieves competitive performance because it makes use of both pairwise and listwise objectives, unlike SQLRank, which focuses on the theoretical foundation of listwise objective based model learning.
Cofactor [32] is the most similar model to ours in that a new objective was explored to train factor based models through joint learning without additional context information. Specifically, Cofactor uses word2vec-like [33], [35] embedding techniques to incorporate item co-occurrence information into a matrix factorization model. Unlike Cofactor with such word2vec based objective, our work concentrates on directly optimizing ranking metrics such as Precision through differentiable relaxation of the ranking metric itself.

VI. CONCLUSION
While learning based recommender systems are popular, their performance in terms of personalized ranking might be suboptimal because they are not directly optimized for top-K recommendation tasks. In this work, we proposed DRM, which enables sorting-embedded end-to-end training, and presented the joint learning of the DRM objective with existing factor based recommenders for improving top-K recommendation performance. DRM relaxes sorting into a continuous operation, which makes it suitable as a high-performance objective that can directly maximize ranking metrics. We experimentally demonstrated on several real-world datasets that our approach achieves better recommendation performance than other state-of-the-art recommender methods.
Our future work is to apply direct ranking optimization to deep learning based recommendation models such as Autoencoder for various recommendation scenarios.
HYUNSUNG LEE was born in Seoul, South Korea. He received the B.S. degree in computer engineering from Sungkyunkwan University, Suwon, South Korea, in 2019, where he is currently pursuing the master's degree in electrical and computer engineering. His research interests include recommendation systems, cluster orchestration, and reinforcement learning.
SANGWOO CHO is currently pursuing the bachelor's degree with the Department of Mathematics, Sungkyunkwan University, Suwon, South Korea. His research interests include multi-armed bandit, theoretical machine learning, and differential programming.
YEONGJAE JANG received the B.S. degree in mathematics from Sungkyunkwan University, Suwon, South Korea, in 2017, where he is currently pursuing the master's degree with the Department of Mathematics. He has also been working as a Data Engineer, since April 2021. His research interests include theoretical machine learning and service optimization.

From 2008 to 2018, he worked with Samsung Research, Samsung Electronics, as a Principal Engineer and the Vice President. Since 2018, he has been an Assistant Professor with the Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, South Korea. His research interests include intelligent application, data-driven monitoring, cloud computing, and networked cyber-physical systems.