A Survey of Matrix Completion Methods for Recommendation Systems
Big Data Mining and Analytics, December 2018, 1(4): 308-323

In recent years, recommendation systems have become increasingly popular and have been used in a broad variety of applications. Here, we investigate matrix completion techniques for recommendation systems based on collaborative filtering. The collaborative filtering problem can be viewed as predicting a user's favorability with respect to new items. When a rating matrix is constructed with users as rows, items as columns, and entries as ratings, the collaborative filtering problem can be modeled as a matrix completion problem: filling out the unknown elements in the rating matrix. This article presents a comprehensive survey of the matrix completion methods used in recommendation systems. We focus on the mathematical models for matrix completion and the corresponding computational algorithms, as well as their characteristics and potential issues. Several applications beyond traditional user-item association prediction are also discussed.


Introduction
Technology has given corporations and consumers more analytical capabilities than ever before, largely due to the birth of big data and the possibilities that spring from its utilization. Users can easily find answers to almost any question they encounter, and in many cases, to unexpected questions. Personal mobile devices can collect data on every communication a person makes, every image or video a person captures or receives, and every online transaction a person makes.
More importantly, corporations can now store all the needed information. This is of incredible value to such corporations, because a person's entire activity record can reveal his/her particular daily habits, and this information can be aggregated over an entire group. On the other hand, the huge amount of data also makes it difficult for users to make decisions that best fit their needs. A similar difficulty faces the corporations providing commodities and services, as it becomes difficult to process the data to understand user behaviors.
Fortunately, recent advances in the field of recommendation systems (a.k.a. recommender systems or recommender engines), a sub-field of machine learning, have provided the capability of making predictions based on the past activities of a user or his/her associations with other users' behaviors. Many computational algorithms have been developed for recommendation systems, which can predict the future interests of users from past preferences, such as user ratings expressing how much a user prefers one item over another. Recommendation systems have attracted much attention in both research and practice, since they can narrow complex and difficult decisions down to a few recommendations. Recommendation system techniques have been applied in diverse fields, including movies [1], music [2], television [3], books [4], e-learning [5], web search [6], jokes [7], news [6], bioinformatics [8, 9], and engineering [10].
Generally, a recommendation system is a subset of information filtering systems, whose goal is to predict the rating a user would give to an item. Recommendations are typically made through either content-based filtering or collaborative filtering approaches. Content-based filtering utilizes a set of discrete features that characterize a commodity and builds a user profile that indicates the items the user liked in the past; items with similar properties are then recommended. Instead of using item features and user profiles, collaborative filtering produces recommendations based on a user's as well as other users' past behaviors. The fundamental assumption underlying collaborative filtering is that if users shared similar ratings on the same set of items in the past, then they would likely rate other items similarly. Content-based filtering and collaborative filtering can be combined to build hybrid recommendation systems, which often demonstrate better recommendation precision than pure recommendation approaches.
In the literature, a few surveys overview different aspects of recommendation systems. Bobadilla et al. [11] presented the evolution of recommendation systems. Kunaver and Požrl [12] reviewed the work done in the area of recommendation diversity. Burke [13] discussed the implementation issues in hybrid recommendation systems. Desrosiers and Karypis [14] provided a survey on recommendation methods based on neighborhood information. He et al. [15] emphasized the influences of human factors in recommendation systems. Campos et al. [16] developed a review on recommendation approaches dealing with temporal context information. Yang et al. [17] investigated how social network information can be adopted by recommendation systems. Klašnja-Milicevic et al. [18] studied recommendation systems for online-based education and learning. Yera and Martínez [19] examined the fuzzy tools used in recommendation systems. Recently, Kotkov et al. [20] considered serendipity within recommendation systems. In this article, we focus on the matrix completion methods in collaborative filtering approaches, because the collaborative filtering problem can often be modeled as a matrix completion problem, whose goal is to fill in the unknown entries of the user-item rating matrix. We overview the mathematical models for matrix completion used in recommendation systems. We then survey the computational algorithms designed for these models, analyze their characteristics, and discuss the potential issues.
The rest of this survey article is organized as follows. In Section 2, the matrix completion problem and the low-rank assumption are discussed. Various matrix completion models are analyzed in Section 3, and the computational algorithms for these models are described in Section 4. Then, in Section 5, the uses of matrix-completion-based recommendation systems in several applications other than traditional user-item association prediction are discussed. Finally, Section 6 summarizes our conclusions and research directions.

Matrix Completion Problem
A typical collaborative filtering scenario in recommendation systems can be modeled as a matrix completion problem. Given a list of $m$ users $\{u_1, u_2, \ldots, u_m\}$ and $n$ items $\{i_1, i_2, \ldots, i_n\}$, the preferences of users toward the items can be represented as an incomplete $m \times n$ matrix $A$, where each entry either represents a certain rating or is unknown. The ratings in $A$ can be explicit indications, such as scores given by the users on a 1-5 scale or ordinal favorability (e.g., strongly agree, agree, neutral, disagree, and strongly disagree). These ratings can also be implicit indications, e.g., item purchases, website/store visits, or link click-throughs. It is generally assumed that a user rates a specific item only once. As a result, recommendations can be made by filling out the unknown entries and then ranking them according to the predicted values. Denoting $\Omega$ as the set of the $N$ entries in $A$ with known ratings, the general matrix completion problem is defined as finding a matrix $R$ such that $R_{ui} = A_{ui}$ for all entries $(u, i) \in \Omega$. In addition, we denote $\bar{\Omega}$ as the complement set of $\Omega$, and $P_\Omega(A)$ as the orthogonal projection of $A$ onto $\Omega$, i.e., the $m \times n$ matrix with the known elements of $A$ preserved and the unknown elements set to 0 [21]. However, since the number of known entries is less than the overall number of entries, there exist infinitely many solutions. Nevertheless, it is commonly believed that only a few latent factors [22] influence how much a user likes an item. For example, studies show that the attributes of actor/actress, director, and decade contribute most to a user's preference for a movie. This relatively small number of influencing factors, compared to the total number of users or items in the rating matrix $A$, provides a guiding framework to fill in the missing values and to select the correct complete matrix. This corresponds to the low-rank assumption in matrix completion, i.e., the rating matrix $A$ is low-rank or approximately low-rank. The low-rank assumption in matrix completion also agrees with the well-known Occam's razor principle in machine learning, whose goal here is to find the "simplest" complete matrix $R$ that is consistent with the known ratings in $A$.

Mathematical Models
Starting from the baseline model, we investigate various mathematical models, deterministic and probabilistic, that have been developed to address the matrix completion problem. The fundamental assumption is that a low-dimensional representation of users and items exists, though possibly unknown, which can be used to accurately model the user-item association. Such a low-dimensional representation is often characterized by a low-rank matrix. We also study models that employ various regularization methods and incorporate various constraints on the completed matrix.

Baseline model
Denoting $\mu$ as the average rating among all known ratings in the rating matrix $A$, the baseline model [23] fills out a missing element $R_{ui}$ by
$$R_{ui} = \mu + b_u + b_i,$$
where $b_u$ and $b_i$ represent the observed deviations of user $u$ and item $i$ from $\mu$, respectively. The training parameters $b_u$ and $b_i$ can be estimated by solving the following least squares problem:
$$\min_{b_*} \sum_{(u,i)\in\Omega} (A_{ui} - \mu - b_u - b_i)^2 + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big).$$
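As a minimal sketch of the baseline model (using a small hypothetical rating matrix and an arbitrarily chosen shrinkage strength `lam`), the deviations can be estimated by sequential shrunk averages, a common practical shortcut to the least squares problem:

```python
import numpy as np

# Hypothetical 4x3 rating matrix; 0 marks an unknown entry.
A = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [0., 2., 4.],
              [3., 5., 0.]])
known = A > 0
mu = A[known].mean()                 # global mean of the known ratings

lam = 10.0                           # shrinkage strength (arbitrary choice)
# Item deviations b_i: shrunk means of (rating - mu) per item.
b_i = np.array([(A[known[:, i], i] - mu).sum() / (lam + known[:, i].sum())
                for i in range(A.shape[1])])
# User deviations b_u: shrunk means of (rating - mu - b_i) per user.
b_u = np.array([(A[u, known[u]] - mu - b_i[known[u]]).sum()
                / (lam + known[u].sum())
                for u in range(A.shape[0])])

# Baseline prediction for every pair: R_ui = mu + b_u + b_i.
R = mu + b_u[:, None] + b_i[None, :]
```

The shrinkage toward zero via `lam` plays the role of the regularization term, pulling deviations of sparsely rated users and items toward the global mean.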

SVD model
The fundamental idea of the Singular Value Decomposition (SVD) model proposed by Sarwar et al. [24] is to decompose the rating matrix $A$ into a user feature matrix, a singular value matrix, and an item feature matrix of low rank. Starting from a normalized matrix $A_{norm}$, obtained by filling out the missing elements with preliminary simple predictions, the SVD model carries out an SVD operation on $A_{norm}$ such that
$$A_{norm} = U \Sigma V^T,$$
where $\Sigma$ is a diagonal matrix with the singular values sorted in descending order on its diagonal, and the columns of $U$ and $V$ contain the corresponding left and right singular vectors, respectively. By truncating $\Sigma$ to its top-$r$ part $\Sigma_r$, the matrices $U_r \Sigma_r^{1/2}$ and $V_r \Sigma_r^{1/2}$ represent the latent factor vectors for users and items, respectively. The dot product of the $u$-th row of $U_r \Sigma_r^{1/2}$ and the $i$-th row of $V_r \Sigma_r^{1/2}$ yields the predicted rating of user $u$ for item $i$. Sarwar et al. [25] employed a "folding-in" technique that builds the SVD incrementally as new users and items are added, so that the SVD model is scalable and can be built faster; however, this may lead to quality loss. Instead of carrying out the dot product operation, Billsus and Pazzani [26] used the latent vectors as feature vectors to train an artificial neural network to predict the user ratings.
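The SVD model can be sketched in a few lines of NumPy; the rating matrix, the fill-in rule (item means), and the truncation rank `r` below are all illustrative assumptions:

```python
import numpy as np

# Hypothetical 5x4 rating matrix; np.nan marks unknown entries.
A = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]])

# Preliminary simple prediction: fill unknowns with each item's mean rating.
col_mean = np.nanmean(A, axis=0)
A_norm = np.where(np.isnan(A), col_mean, A)

# SVD, then truncation to the top-r singular values.
U, s, Vt = np.linalg.svd(A_norm, full_matrices=False)
r = 2
sqrt_S = np.diag(np.sqrt(s[:r]))
user_factors = U[:, :r] @ sqrt_S          # rows ~ U_r * Sigma_r^(1/2)
item_factors = Vt[:r, :].T @ sqrt_S       # rows ~ V_r * Sigma_r^(1/2)

# The dot product of a user row and an item row predicts that rating.
R = user_factors @ item_factors.T
```

Splitting $\Sigma_r$ symmetrically between the two factor matrices is one common convention; the product is the same rank-$r$ approximation either way.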

Matrix factorization model
The matrix factorization model is a generalization of the SVD model, which intends to find a low-rank matrix factorization to approximate $A$. Assume an $r$-dimensional vector $x_u$ is associated with each user $u$ and measures the latent factors influencing the user's preferences for items, and an $r$-dimensional vector $y_i$ is associated with each item $i$ and represents the latent factors influencing $i$. The matrix factorization model uses the dot product $y_i^T x_u$ to capture the correlation between user $u$ and item $i$, and the predicted rating then becomes
$$R_{ui} = y_i^T x_u.$$
Assuming the columns of $X$ and $Y$ contain all the $x_u$ and $y_i$ vectors, respectively, the goal of the matrix completion is to estimate $R = X^T Y$. The parameters to be learned are the user feature vectors $x_u$ and the item feature vectors $y_i$, which can be done by minimizing the Frobenius norm error as follows:
$$\min_{x,y} \sum_{(u,i)\in\Omega} (A_{ui} - y_i^T x_u)^2 \qquad (6)$$
The potential problem of model (6) is that minimizing the Frobenius norm can easily lead to overfitting by biasing toward the known entries.

l2-regularized matrix factorization model
To avoid overfitting the observed user-item ratings, the l2-norm regularized matrix factorization method [27] regularizes the learning parameters by penalizing their magnitudes. Based on the matrix factorization model (6), this is done by minimizing the regularized l2-norm of $x_u$ and $y_i$ in addition to the Frobenius norm error term, as follows [28]:
$$\min_{x,y} \sum_{(u,i)\in\Omega} \big( (A_{ui} - y_i^T x_u)^2 + \lambda_1 (\|x_u\|^2 + \|y_i\|^2) \big) \qquad (7)$$
where $\lambda_1$ is a constant controlling the extent of regularization.
A more sophisticated l2-regularized matrix factorization model can be built on top of the baseline model by considering the user deviation $b_u$ and the item deviation $b_i$. Each predicted rating then becomes
$$R_{ui} = \mu + b_u + b_i + y_i^T x_u.$$
The parameters to be learned become $b_u$, $b_i$, $x_u$, and $y_i$, which can be done by minimizing the regularized l2-norm error as follows:
$$\min_{b,x,y} \sum_{(u,i)\in\Omega} \big( (A_{ui} - \mu - b_u - b_i - y_i^T x_u)^2 + \lambda_2 (b_u^2 + b_i^2 + \|x_u\|^2 + \|y_i\|^2) \big) \qquad (8)$$
where $\lambda_2$ is the regularization parameter. Because there are more training parameters, this model often yields a more accurate prediction. The prediction accuracy of regularized matrix factorization algorithms can often be improved by incorporating additional information or factors. Vozalis and Margaritis [29] utilized demographic data as an additional source of information. A more famous example is the SVD++ method [30], which is considered the model with the highest accuracy in the Netflix Prize [31]. SVD++ enhances the regularized SVD model by considering implicit feedback as an additional indication of user preferences. In SVD++, in addition to the latent factor vector $x_u$ associated with each user $u$, which measures the latent factors of $u$ influencing the preference of items, a set of item vectors $p_j$ is incorporated, one for each item rated by user $u$. The user vector then becomes $x_u + |R(u)|^{-1/2} \sum_{j \in R(u)} p_j$, where $R(u)$ denotes the set of items rated by user $u$, and the predicted rating of user $u$ for item $i$ is calculated as
$$R_{ui} = \mu + b_u + b_i + y_i^T \Big( x_u + |R(u)|^{-1/2} \sum_{j \in R(u)} p_j \Big),$$
with the corresponding regularized objective, model (11), taking the same form as Formula (8) with the additional $p_j$ parameters, where $\lambda_3$ is the regularization parameter for model (11).

l1-regularized SVD model and l1/l2-regularized SVD model
Regularization methods other than the l2-norm can also be incorporated into the SVD model. The l1-regularized SVD model [32] can generate sparse solutions, and the minimization problem then becomes
$$\min_{x,y} \sum_{(u,i)\in\Omega} \big( (A_{ui} - y_i^T x_u)^2 + \lambda_4 (\|x_u\|_1 + \|y_i\|_1) \big) \qquad (12)$$
where $\lambda_4$ is the regularization parameter controlling the extent of the l1-norms of the latent vectors $x_u$ and $y_i$.
Considering that l1-regularization can generate sparse solutions, while l2-regularization often leads to more accurate predictions, the l1/l2-regularized SVD model attempts to benefit from both by combining the l1-norm and l2-norm. As a result, the corresponding minimization objective function becomes
$$\min_{x,y} \sum_{(u,i)\in\Omega} \Big( (A_{ui} - y_i^T x_u)^2 + \lambda_5 \big( \alpha (\|x_u\|_1 + \|y_i\|_1) + \tfrac{1-\alpha}{2} (\|x_u\|^2 + \|y_i\|^2) \big) \Big) \qquad (13)$$
where $\alpha$ is a tunable parameter balancing the l1-norm and l2-norm terms, and $\lambda_5$ is the regularization parameter controlling the extent of the combined l1- and l2-norms.

Spectral regularization model
Instead of applying regularization on the decomposed matrices, Mazumder et al. [33], inspired by Candes and Tao [21], proposed a spectral regularization model that uses the nuclear (trace) norm of the recovered matrix $R$, defined as the sum of the singular values of $R$. The objective of the spectral regularization model is to balance the minimization of the approximation errors on the known entries against the nuclear norm of $R$:
$$\min_R \frac{1}{2} \| P_\Omega(R) - P_\Omega(A) \|_F^2 + \lambda_6 \|R\|_* \qquad (14)$$
where $\lambda_6$ is the regularization parameter controlling the extent of the nuclear norm. Note that Formula (14) is a convex model for completing matrix $A$.

Rank minimization model
Under the low-rank assumption, the matrix completion problem can be formulated as a matrix rank optimization problem:
$$\min_R \ \mathrm{rank}(R), \quad \text{s.t. } R_{ui} = A_{ui}, \ (u,i) \in \Omega \qquad (15)$$
where $\mathrm{rank}(R)$ denotes the rank of matrix $R$.
Unfortunately, finding the exact solution of the above rank optimization problem is well known to be NP-hard [34]. Nevertheless, low-rank matrix approximation is the general principle used in the matrix completion algorithms for recommendation systems.

Nuclear norm minimization model
The rank optimization problem can be relaxed to a nuclear norm (trace norm) optimization problem [35] by minimizing the sum of the singular values of $R$. The matrix completion problem is then reformulated as a convex optimization problem:
$$\min_R \|R\|_*, \quad \text{s.t. } R_{ui} = A_{ui}, \ (u,i) \in \Omega \qquad (16)$$
where $\|\cdot\|_*$ denotes the nuclear norm. Candes and Recht [36] showed that, under certain mild conditions, the solution obtained by optimizing the nuclear norm is equivalent to that obtained by rank minimization in Formula (15). If the application is "noisy", the nuclear norm minimization problem can be modeled as
$$\min_R \|R\|_*, \quad \text{s.t. } \|P_\Omega(R) - P_\Omega(A)\|_F \leqslant \delta \qquad (17)$$
where $\delta$ is the tolerance parameter that relaxes the $R_{ui} = A_{ui}, (u,i) \in \Omega$ condition in Formula (16).
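The nuclear norm itself is easy to illustrate (the data below are hypothetical, and `nuclear_norm` is a helper defined here, not a library routine):

```python
import numpy as np

def nuclear_norm(R):
    # The nuclear norm is the sum of the singular values of R.
    return np.linalg.svd(R, compute_uv=False).sum()

# For a rank-1 matrix x y^T, the single nonzero singular value is
# ||x|| * ||y||, so the nuclear norm equals that product.
x = np.array([[1.0], [2.0]])        # hypothetical column vector
y = np.array([[3.0, 4.0]])          # hypothetical row vector
R1 = x @ y                          # rank-1, 2x2
```

Because the nuclear norm sums singular values rather than counting the nonzero ones, it acts on the spectrum the way the l1-norm acts on a vector, which is why it is the standard convex surrogate for rank.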

Matrix factorization minimization model
For recommendation systems that involve the completion of a large matrix, handling the intermediate $m \times n$ matrices $R^{(j)}$ at each iteration step is costly from both the computation and the storage points of view. Instead of storing the complete recovered matrix, the matrix factorization minimization model uses an $r$-rank matrix factorization, $R = XY$, to represent the completed matrix $R$. The matrix completion problem is then formulated as a non-convex quadratic optimization problem by minimizing the sum of the Frobenius norms of $X$ and $Y$:
$$\min_{X,Y} \frac{1}{2} \big( \|X\|_F^2 + \|Y\|_F^2 \big), \quad \text{s.t. } (XY)_{ui} = A_{ui}, \ (u,i) \in \Omega \qquad (18)$$
Assuming that $X$ is $m \times r$ and $Y$ is $r \times n$, with $r \ll m, n$, the storage requirement of $XY$ becomes $(m + n) r$, which is significantly less than that of the $m \times n$ matrix $R$. Recht et al. [37] show that if $r$ is sufficiently greater than the rank of the optimal solution of the nuclear norm minimization model, the non-convex quadratic optimization is equivalent to the nuclear norm minimization.
An alternative matrix factorization model for matrix completion was designed by Wen et al. [38], which leads to the low-rank matrix fitting (LMaFit) algorithm:
$$\min_{X,Y,Z} \frac{1}{2} \|XY - Z\|_F^2, \quad \text{s.t. } Z_{ui} = A_{ui}, \ (u,i) \in \Omega \qquad (19)$$
Similar to Formula (18), although the constraint here is linear, the bilinear term $XY$ makes the objective non-convex. Consequently, Formula (19) is also a non-convex optimization model, which cannot guarantee globally optimal solutions.

Probabilistic model
The matrix completion problem has also been addressed by statistical models, starting from the probabilistic Latent Semantic Analysis (pLSA) model [2]. In pLSA, the focus is on the conditional probability $P(A_{ui} | u, i)$ that a user $u$ rates an item $i$ with rating $A_{ui}$. The fundamental idea is to derive a low-dimensional representation of the observed user-item ratings in terms of their affinity to hidden variables $c$ [1]. The probability of co-occurrence is modeled as a mixture of conditionally independent multinomial distributions:
$$P(\theta; u, i) = \sum_c P(c) P(u|c) P(i|c) = P(u) \sum_c P(c|u) P(i|c),$$
where $\theta$ is the vector of unknown parameters. Then, by incorporating a variational distribution $V(c; u, i)$ [5] and defining a risk function $R(\theta, V)$, the model maximizes a variational lower bound of the log-likelihood function:
$$R(\theta, V) = \sum_{(u,i)\in\Omega} \sum_c V(c; u, i) \log P(A_{ui}, c \,|\, u, i; \theta) + H(V),$$
where $H(V)$ is the entropy of the variational distribution $V$.
In addition to pLSA, numerous probabilistic models can be used to predict user-item ratings, including Bayesian probabilistic matrix factorization [39], the regression-based latent factor model [40], latent Dirichlet allocation [41], probabilistic factor analysis [42], and the restricted Boltzmann machine [43]. However, these models are not covered in this article; interested readers can find the details in the above references.

Constraints on the completed matrix
Many applications require the completed matrix to have certain properties. For example, if the matrix to be completed is a covariance matrix, it is expected to be semi-positive definite. Moreover, predicted negative values are meaningless in many applications; in the user-item affinity prediction problem, for example, it is difficult to explain the meaning of a predicted negative rating. Non-negative matrix completion intends to guarantee that all the recovered elements are non-negative. The non-negative matrix completion problem becomes a constrained optimization problem by adding the non-negative constraints:
$$\min_R \frac{1}{2} \|P_\Omega(R) - P_\Omega(A)\|_F^2 + \lambda_7 \|R\|_*, \quad \text{s.t. } R_{ui} \geqslant 0 \ \text{for all } (u,i) \qquad (23)$$
Similarly, assuming $A$ is an $n \times n$ symmetric matrix, the semi-positive definite constraint [44] can be imposed in a similar way:
$$\min_R \frac{1}{2} \|P_\Omega(R) - P_\Omega(A)\|_F^2 + \lambda_8 \|R\|_*, \quad \text{s.t. } R \succeq 0 \qquad (24)$$
where $R \succeq 0$ indicates that $R$ is semi-positive definite, and $\lambda_7$ and $\lambda_8$ are regularization parameters.
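The two constraint types correspond to two simple projection operators, sketched below with hypothetical inputs; the semi-positive definite projection follows the eigendecomposition recipe used by the constrained algorithms later in this survey:

```python
import numpy as np

def project_nonneg(X):
    # Projection onto the non-negative orthant: clip negatives to zero.
    return np.maximum(X, 0.0)

def project_psd(X):
    # Projection onto the semi-positive definite cone: symmetrize,
    # eigendecompose, and drop the negative eigenvalues.
    X = (X + X.T) / 2.0
    w, V = np.linalg.eigh(X)
    return (V * np.maximum(w, 0.0)) @ V.T

M = np.array([[2.0, -3.0],
              [-3.0, 2.0]])         # eigenvalues 5 and -1, so not PSD
P = project_psd(M)                  # closest PSD matrix in Frobenius norm
```

Both projections are cheap relative to the SVD work inside the completion iterations, so enforcing the constraints adds little overhead per step.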

Computational Algorithms for Matrix Completion in Recommendation Systems
In this section, considering the mathematical models described in Section 3, we review several popular recommendation algorithms based on matrix completion, including the Alternating Least Squares (ALS), spectral regularization with soft threshold, Alternating Direction Method of Multipliers (ADMM), Proximal Forward-Backward Splitting (PFBS), Singular Value Thresholding (SVT), Accelerated Proximal Gradient (APG), Fixed Point Continuation (FPC), nonlinear Successive Over-Relaxation (SOR), Stochastic Gradient Descent (SGD), and Expectation Maximization (EM) algorithms.

Alternating least squares
The ALS algorithm is designed for the l2-regularized matrix factorization model (Formula (7)). However, because of the term $y_i^T x_u$ for calculating $R_{ui}$, the objective function in Formula (7) is non-convex, and optimizing Formula (7) is NP-hard. Nevertheless, if $x_u$ is fixed by treating its variables as constants, the minimization objective of Formula (7) becomes a convex function of $y_i$ [22]; alternately, $y_i$ can then be fixed, and the objective of Formula (7) becomes a convex function of $x_u$. Therefore, in ALS, when one set of vectors is fixed, the other is calculated, and this process is repeated until convergence is reached. The derivation process for the user vectors $x_u$ for all $u$ can be expressed as
$$x_u^{(j)} = \Big( \sum_{i \in \Omega_u} y_i^{(j-1)} (y_i^{(j-1)})^T + \lambda_1 I_r \Big)^{-1} \sum_{i \in \Omega_u} A_{ui} \, y_i^{(j-1)},$$
and similarly, the process for calculating the item vectors $y_i$ for all $i$ is
$$y_i^{(j)} = \Big( \sum_{u \in \Omega_i} x_u^{(j)} (x_u^{(j)})^T + \lambda_1 I_r \Big)^{-1} \sum_{u \in \Omega_i} A_{ui} \, x_u^{(j)},$$
where $I_r$ is an $r \times r$ identity matrix, and $\Omega_u$ ($\Omega_i$) denotes the set of items rated by user $u$ (the set of users who rated item $i$). The benefit of the ALS approach is that it can be computationally parallelized, since within a sweep the calculation of each vector does not depend on the results of the others; therefore, it is an efficient optimization technique.
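A compact NumPy sketch of one ALS variant (random hypothetical data; the rank `r` and regularization `lam` are arbitrary choices) shows the two alternating ridge-regression solves and the monotone decrease of the regularized objective:

```python
import numpy as np

# Hypothetical setup: integer ratings 1..5, roughly 70% observed.
rng = np.random.default_rng(0)
m, n, r, lam = 6, 5, 2, 0.1
A = rng.integers(1, 6, size=(m, n)).astype(float)
mask = rng.random((m, n)) < 0.7
X = rng.standard_normal((m, r))     # user factor vectors (one per row)
Y = rng.standard_normal((n, r))     # item factor vectors (one per row)

def als_sweep(X, Y):
    # Fix Y and solve a small ridge regression for each user's vector,
    # then fix X and do the same for each item's vector.
    for u in range(m):
        Yu = Y[mask[u]]
        X[u] = np.linalg.solve(Yu.T @ Yu + lam * np.eye(r),
                               Yu.T @ A[u, mask[u]])
    for i in range(n):
        Xi = X[mask[:, i]]
        Y[i] = np.linalg.solve(Xi.T @ Xi + lam * np.eye(r),
                               Xi.T @ A[mask[:, i], i])

def objective():
    E = (A - X @ Y.T) * mask
    return (E ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum())

l0 = objective()
for _ in range(10):
    als_sweep(X, Y)
l1 = objective()
# Each sweep solves both subproblems exactly, so the objective never increases.
```

The per-user and per-item solves inside a sweep are independent of each other, which is the parallelism opportunity noted above.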
Replacing the Gram matrix $X X^T$ in the ALS algorithm with a kernel function $K(x_i, x_j)$, which measures the similarity between observation vectors, may lead to better prediction results [27]. Paterek [27] reported that $K(x_i, x_j) = e^{2(x_i^T x_j - 1)}$ is a good choice. The ALS algorithm can also be accelerated by integration with other approaches. For example, Hastie et al. [45] combined the Soft-Impute and ALS algorithms to obtain a Soft-Impute-ALS algorithm that outperforms both.

Spectral regularization with soft threshold
The Soft-Impute algorithm [33] is designed for the spectral regularization model (Formula (14)); it replaces the unknown elements with values from a soft-thresholded SVD at every iteration step. Starting from an initial matrix $R^{(0)}$, Soft-Impute carries out the following iterations:
$$\tilde{R} \leftarrow P_\Omega(A) + P_{\bar\Omega}(R^{(j)}), \qquad R^{(j+1)} = D_\lambda(\tilde{R}),$$
where $D_\lambda$ is the matrix shrinkage operator with threshold $\lambda$, which shrinks the singular values by $\lambda$ and discards the singular values falling below zero together with their associated singular vectors, i.e.,
$$D_\lambda(R) = \sum_k \max(\sigma_k - \lambda, 0) \, u_k v_k^T,$$
where $\sigma_k$ is the $k$-th singular value of $R$, and $u_k$ and $v_k$ are the corresponding left and right singular vectors, respectively. In the Soft-Impute algorithm, $[P_\Omega(A) - P_\Omega(R^{(j)})] + R^{(j)}$ replaces $P_\Omega(A) + P_{\bar\Omega}(R^{(j)})$ during the iterations, so that the first part $[P_\Omega(A) - P_\Omega(R^{(j)})]$ is sparse and the second part $R^{(j)}$ is low-rank; both can be efficiently stored and manipulated. Moreover, partial SVD algorithms are used to quickly calculate the $D_\lambda$ operator at each iteration step.
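A minimal Soft-Impute sketch (synthetic low-rank data; the threshold `tau` and iteration count are arbitrary, and the sparse-plus-low-rank storage trick is omitted for clarity):

```python
import numpy as np

def soft_threshold_svd(Z, tau):
    # Shrinkage operator D_tau: subtract tau from each singular value
    # and drop those that fall below zero.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_impute(A, mask, tau, n_iter=100):
    R = np.zeros_like(A)
    for _ in range(n_iter):
        # Keep the observed entries of A, fill the rest from the estimate.
        R = soft_threshold_svd(np.where(mask, A, R), tau)
    return R

rng = np.random.default_rng(1)
truth = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))  # rank-2
mask = rng.random(truth.shape) < 0.8
R = soft_impute(truth, mask, tau=0.1)
```

In a production setting the full SVD here is the piece one would replace with a partial SVD, exactly as the text describes.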

Proximal forward-backward splitting algorithm
The PFBS [46-49] is a soft-thresholding algorithm popularly used in signal analysis and image processing. Given the spectral regularization model (Formula (14)), the PFBS solution is characterized by the fixed point equation
$$R = D_{\delta \lambda_6} \big( R + \delta P_\Omega(A - R) \big)$$
for $\delta > 0$. Here, $D_{\delta \lambda_6}$ is the proximity operator of $\delta \lambda_6 \|R\|_*$. Then, given $Y^{(0)}$ as the initial matrix, a simplified PFBS algorithm can be expressed using the following iteration steps:
$$R^{(j+1)} = D_{\delta_j \lambda_6}(Y^{(j)}), \qquad Y^{(j+1)} = R^{(j+1)} + \delta_{j+1} P_\Omega(A - R^{(j+1)}).$$

Alternating direction method of multipliers
The ADMM [50] algorithm adopts the form of a decomposition-coordination procedure to break an optimization problem into small local sub-problems and coordinate the solutions of these sub-problems toward the global problem. The ADMM combines the advantages of the dual decomposition and augmented Lagrangian methods for optimization problems.
The ADMM algorithm for matrix completion starts from the following model, which is equivalent to model (14):
$$\min_{R, Y} \frac{1}{2} \|P_\Omega(Y) - P_\Omega(A)\|_F^2 + \lambda_6 \|R\|_*, \quad \text{s.t. } R = Y,$$
with the augmented Lagrangian function
$$L(R, Y, Z) = \frac{1}{2} \|P_\Omega(Y) - P_\Omega(A)\|_F^2 + \lambda_6 \|R\|_* + \langle Z, Y - R \rangle + \frac{\beta}{2} \|Y - R\|_F^2,$$
where $\beta$ is the penalty parameter for the violation of the linear constraint, and $\langle \cdot, \cdot \rangle$ denotes the standard trace inner product. Applying the original ADMM [51] algorithm to the augmented Lagrangian function, the following iterative scheme can be obtained:
$$R^{(j+1)} = \arg\min_R L(R, Y^{(j)}, Z^{(j)}), \qquad Y^{(j+1)} = \arg\min_Y L(R^{(j+1)}, Y, Z^{(j)}).$$
The updated Lagrange multiplier $Z^{(j+1)}$ [52, 53] is generalized as
$$Z^{(j+1)} = Z^{(j)} + \gamma \beta \big( Y^{(j+1)} - R^{(j+1)} \big),$$
where $\gamma$ denotes the learning rate with a suggested range of $0 < \gamma < \frac{\sqrt{5}+1}{2}$. Here, $R^{(j+1)}$ can be obtained by applying the matrix shrinkage operator, i.e.,
$$R^{(j+1)} = D_{\lambda_6 / \beta} \big( Y^{(j)} + Z^{(j)} / \beta \big),$$
and $Y^{(j+1)}$ can be obtained in closed form, element-wise, as
$$Y^{(j+1)} = \frac{1}{1+\beta} P_\Omega \big( A + \beta R^{(j+1)} - Z^{(j)} \big) + P_{\bar\Omega} \big( R^{(j+1)} - Z^{(j)} / \beta \big).$$
The ADMM algorithm is particularly suitable for handling matrix completion problems with additional constraints. Similar to the general matrix completion case, ADMM has been used to address a model equivalent to the non-negative matrix completion model (Formula (23)) by introducing a separation matrix variable [54, 55], with the $Y$-update composed with a projector $Q_+$, where $Q_+$ is an operator that projects its parameter matrix $X$ onto the non-negative space, such that $[Q_+(X)]_{ui} = \max(X_{ui}, 0)$. In this method, $Q_+$ is computed to generate $Y^{(j+1)}$, which cannot strictly guarantee non-negative elements in $R^{(j+1)}$. Nevertheless, when an appropriate penalty parameter is selected, $\|Y - R\|_F^2$ becomes small as convergence is approached, which can satisfy the non-negative requirements of many practical applications.
For the semi-positive definite matrix completion model (Formula (24)) [44], $R \in S^n_+$ must be satisfied, where $S^n_+$ denotes the cone (manifold) of positive semidefinite matrices in the space of symmetric $n \times n$ matrices. To satisfy this constraint, the iteration to obtain $R^{(j+1)}$ [56] composes the shrinkage step with the projector $P_{S^n_+}$, which is computed by carrying out an eigendecomposition of its parameter matrix and then eliminating the eigenvalues less than 0 together with their corresponding eigenvectors.

Singular value thresholding
The SVT algorithm [35] is a first-order algorithm for solving the nuclear norm optimization problem using
$$\min_R \ \tau \|R\|_* + \frac{1}{2} \|R\|_F^2, \quad \text{s.t. } R_{ui} = A_{ui}, \ (u,i) \in \Omega,$$
with a threshold parameter $\tau$. An iterative gradient ascent approach, formulated as Uzawa's algorithm [22] or linearized Bregman iterations [45], is applied:
$$R^{(j)} = D_\tau(Y^{(j)}), \qquad Y^{(j+1)} = Y^{(j)} + \delta P_\Omega(A - R^{(j)}),$$
where $\delta$ is the step size. Unlike the ADMM, PFBS, and Soft-Impute algorithms, which lead to solutions of the spectral regularization model (Formula (14)), the SVT algorithm converges to an approximate solution of the nuclear norm minimization model (Formula (16)). This is because a very large $\tau$ value is usually picked, so that the $\tau \|R\|_*$ term dominates the $\frac{1}{2}\|R\|_F^2$ term in the minimization objective. The SVT algorithm considers the global pattern of $A$ and seeks a complete matrix $R$ with minimized nuclear norm to recover the missing entries in $A$. However, it suffers from high computational cost: the matrix shrinkage operator $D_\tau$, which requires calculating the SVD to obtain the singular values and vectors of $Y^{(j)}$, is computed at every iteration step. Cai and Osher [57] reformulated $D_\tau(Y^{(j)})$ by projecting $Y^{(j)}$ onto a 2-norm ball and then applying complete orthogonal decomposition and polar decomposition to the projection, which saves 50% or more computational time compared to the SVT implementation with full SVD. A more popular alternative strategy is to compute a partial SVD for the singular values of interest, because only the singular values exceeding $\tau$ matter in $D_\tau$. Partial SVD implementations based on Krylov subspace algorithms, such as the Lanczos algorithm with reorthogonalization, can efficiently accelerate the SVT algorithm if the number of singular values exceeding $\tau$ is significantly less than $\min(m, n)$. However, if this number exceeds $0.2 \min(m, n)$, the computational cost of partial SVD using Krylov subspace methods can exceed that of full SVD [25]. Alternatively, recent studies show that partial SVD calculations based on randomized SVD [58], rank-revealing techniques [59], single-pass SVD [60], and subspace reuse [61] can keep the $D_\tau$ computation cost low throughout the SVT iterations.
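The two-line SVT iteration can be sketched as follows (synthetic data; `tau`, `delta`, and the iteration count are illustrative, and a full SVD is used instead of the partial SVDs discussed above):

```python
import numpy as np

def shrink(Y, tau):
    # Matrix shrinkage operator D_tau: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt(A, mask, tau, delta, n_iter):
    Y = np.zeros_like(A)
    for _ in range(n_iter):
        R = shrink(Y, tau)                 # R^(j) = D_tau(Y^(j))
        Y = Y + delta * mask * (A - R)     # gradient-ascent step on Omega
    return R

rng = np.random.default_rng(2)
truth = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 8))  # rank-2
mask = rng.random(truth.shape) < 0.7
R = svt(truth, mask, tau=2.0, delta=1.2, n_iter=300)
# Relative error on the observed entries shrinks as the iteration converges.
rel_err = np.linalg.norm((R - truth) * mask) / np.linalg.norm(truth * mask)
```

The `shrink` call is the per-iteration bottleneck, which is precisely what the partial, randomized, and subspace-reuse SVD strategies above are meant to cheapen.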

Fixed point continuation
Recently, Ma et al. [63] designed an FPC algorithm, a matrix extension of the fixed point iterative algorithm for the l1-regularized problem [49], to solve the nuclear norm regularized linear least squares problem (Formula (14)). The fundamental idea of the FPC algorithm is based on an operator splitting technique. As an extended result from Ref. [49], $R^*$ is the optimal solution to Formula (14) if and only if
$$0 \in \lambda_6 \, \partial \|R^*\|_* + P_\Omega(R^*) - P_\Omega(A).$$
Considering the following equivalent model,
$$0 \in \tau \lambda_6 \, \partial \|R^*\|_* + R^* - \big( R^* - \tau (P_\Omega(R^*) - P_\Omega(A)) \big),$$
FPC applies operator splitting by setting
$$Y = R^* - \tau \big( P_\Omega(R^*) - P_\Omega(A) \big),$$
and the above model becomes $R^* = D_{\tau \lambda_6}(Y)$. This leads to the following FPC iteration scheme:
$$Y^{(j)} = R^{(j)} - \tau \big( P_\Omega(R^{(j)}) - P_\Omega(A) \big), \qquad R^{(j+1)} = D_{\tau \lambda_6}(Y^{(j)}).$$
Nonlinear successive over-relaxation
The LMaFit algorithm [38] solves the model in Formula (19) with a nonlinear SOR scheme that alternately updates the factor matrices using a weighted combination controlled by a relaxation factor $\omega$. Notice that when $\omega = 1$, the SOR iteration is equivalent to the Gauss-Seidel (GS) iteration. Nevertheless, when $\omega$ is appropriately set, the SOR iterations in LMaFit lead to significant convergence acceleration compared to GS iterations.

Stochastic gradient descent
The l2-regularized matrix factorization problem (Formula (8)) can be solved by SGD optimization [64], which iterates over all known ratings. For each $(u, i) \in \Omega$, the prediction error is calculated as
$$e_{ui} = A_{ui} - \big( \mu + b_u + b_i + y_i^T x_u \big).$$
Then, the parameters $b_u$, $b_i$, $x_u$, and $y_i$ are updated iteratively in the opposite direction of the gradients:
$$b_u^{(j+1)} = b_u^{(j)} + \gamma \big( e_{ui}^{(j)} - \lambda_2 b_u^{(j)} \big), \qquad b_i^{(j+1)} = b_i^{(j)} + \gamma \big( e_{ui}^{(j)} - \lambda_2 b_i^{(j)} \big),$$
$$x_u^{(j+1)} = x_u^{(j)} + \gamma \big( e_{ui}^{(j)} y_i^{(j)} - \lambda_2 x_u^{(j)} \big), \qquad y_i^{(j+1)} = y_i^{(j)} + \gamma \big( e_{ui}^{(j)} x_u^{(j)} - \lambda_2 y_i^{(j)} \big),$$
where $\gamma$ is the learning rate. Takacs et al. [65] extended the above model by dedicating different learning rate ($\gamma$) and regularization ($\lambda$) values to different learning parameters to obtain better accuracy.
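A sketch of the SGD loop for the biased model (hypothetical data; the learning rate `gamma` and regularization `lam` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 5, 4, 2
A = rng.integers(1, 6, size=(m, n)).astype(float)
obs = [(u, i) for u in range(m) for i in range(n) if rng.random() < 0.8]

mu = float(np.mean([A[u, i] for u, i in obs]))  # mean of observed ratings
b_u, b_i = np.zeros(m), np.zeros(n)             # user and item deviations
X = 0.1 * rng.standard_normal((m, r))           # user latent vectors
Y = 0.1 * rng.standard_normal((n, r))           # item latent vectors
gamma, lam = 0.01, 0.05                         # learning rate, regularization

def rmse():
    errs = [A[u, i] - (mu + b_u[u] + b_i[i] + X[u] @ Y[i]) for u, i in obs]
    return float(np.sqrt(np.mean(np.square(errs))))

before = rmse()
for epoch in range(50):
    for u, i in obs:
        e = A[u, i] - (mu + b_u[u] + b_i[i] + X[u] @ Y[i])
        b_u[u] += gamma * (e - lam * b_u[u])
        b_i[i] += gamma * (e - lam * b_i[i])
        # Update x_u and y_i together so y_i's step uses the old x_u.
        X[u], Y[i] = (X[u] + gamma * (e * Y[i] - lam * X[u]),
                      Y[i] + gamma * (e * X[u] - lam * Y[i]))
after = rmse()
```

Unlike ALS, each update touches only one observed rating, which is why SGD scales naturally to sparse rating matrices with many users and items.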
In the SVD++ algorithm for Formula (11), the SGD iteration scheme accordingly gains the update
$$p_l^{(j+1)} = p_l^{(j)} + \gamma \big( e_{ui}^{(j)} |R(u)|^{-1/2} y_i^{(j)} - \lambda_3 p_l^{(j)} \big) \quad \text{for } \forall l \in R(u),$$
with the updates of $b_u$, $b_i$, $x_u$, and $y_i$ analogous to those above. Compared to the SVD model, the SVD++ model often results in improved accuracy, as it considers implicit feedback; however, the tradeoff is that there are significantly more parameters to train, which makes the SVD++ model difficult to scale to very large datasets.
The SGD optimization method can also be applied to the l1-regularized matrix factorization problem (Formula (12)) and the l1/l2-regularized matrix factorization problem (Formula (13)). Defining a vector sign function $\mathrm{SGN}(x) = (\mathrm{sgn}(x_1), \ldots, \mathrm{sgn}(x_r))^T$, where $\mathrm{sgn}(\cdot)$ denotes the signum function for a scalar, the iteration scheme for model (12) for updating the latent factor vectors $x_u$ and $y_i$ becomes
$$x_u^{(j+1)} = x_u^{(j)} + \gamma \big( e_{ui}^{(j)} y_i^{(j)} - \lambda_4 \, \mathrm{SGN}(x_u^{(j)}) \big)$$
and
$$y_i^{(j+1)} = y_i^{(j)} + \gamma \big( e_{ui}^{(j)} x_u^{(j)} - \lambda_4 \, \mathrm{SGN}(y_i^{(j)}) \big),$$
respectively. For Formula (13), the iteration scheme for updating $x_u$ and $y_i$ becomes
$$x_u^{(j+1)} = x_u^{(j)} + \gamma \big( e_{ui}^{(j)} y_i^{(j)} - \lambda_5 \alpha \, \mathrm{SGN}(x_u^{(j)}) - \lambda_5 (1 - \alpha) \, x_u^{(j)} \big)$$
and
$$y_i^{(j+1)} = y_i^{(j)} + \gamma \big( e_{ui}^{(j)} x_u^{(j)} - \lambda_5 \alpha \, \mathrm{SGN}(y_i^{(j)}) - \lambda_5 (1 - \alpha) \, y_i^{(j)} \big),$$
respectively. The other iteration steps are similar to those of SGD for the l2-regularized matrix factorization model (Formula (8)).

Expectation maximization algorithm
The parameters of probabilistic models, such as pLSA, are learned using the EM algorithm [4]. In the EM algorithm, the parameters are estimated iteratively, starting from an initial guess. Each iteration computes an Expectation (E) step and a Maximization (M) step in alternation [66]. The E-step uses the current estimate of the parameters to obtain the distribution of the unobserved variables, given the observed values of the known variables. The M-step re-estimates the model parameters to maximize the log-likelihood function.

For pLSA, the E-step yields the following bound of the log-likelihood:
$$\sum_{(u,i)\in\Omega} \sum_c V^{(j+1)}(c; u, i; \theta^{(j)}) \big( \log P(u|c) + \log P(c|i) \big).$$
The M-step then maximizes this bound $R(\theta^{(j+1)}, V^{(j+1)})$ with respect to $\theta^{(j+1)}$. The EM iterations are repeated until the likelihood improvement is smaller than a predetermined threshold value.
The EM algorithm is in general a non-convex optimization process. It has been shown that each EM iteration either improves the true likelihood or reaches a local maximum.

Applications for Recommendation Systems Using Matrix Completion
In addition to the usual application of user-item association prediction, here we present other applications of recommendation systems based on matrix completion.

Computational drug repositioning
Computational drug repositioning is an important and efficient approach for identifying new treatments with known drugs. Luo et al. [8] modeled the drug repositioning problem as a recommendation system (DRRS) to discover new disease indications for drugs. In the DRRS, the related data sources and validated information of drugs and diseases are integrated to construct a heterogeneous drug-disease interaction network (Fig. 1). The heterogeneous network is then represented as a large adjacency matrix (Fig. 2), in which the unknown drug-disease associations appear as blank entries. A fast SVT algorithm [61] is used to complete the drug-disease adjacency matrix with predicted scores for the unknown drug-disease pairs. Comprehensive experimental results show that the DRRS improves prediction accuracy compared with other state-of-the-art approaches in both system-wide and de novo predictions.
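As a rough sketch of this completion step, a plain SVT iteration can be written as follows. The DRRS uses a fast SVT variant [61], so this basic version only approximates it; the threshold `tau` (typically chosen proportional to the matrix dimensions), step size `delta`, and iteration count are illustrative choices.

```python
import numpy as np

def svt_complete(M_obs, mask, tau, delta=1.2, n_iter=500):
    """Basic singular value thresholding (SVT) for matrix completion.

    M_obs holds the known entries (and zeros elsewhere); mask is 1 where
    an entry is observed. Each iteration soft-thresholds the singular
    values of the auxiliary matrix Y, then corrects Y on the observed
    entries. tau, delta, and n_iter are illustrative choices.
    """
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        Y = Y + delta * mask * (M_obs - X)               # correct on the observed set
    return X
```

The soft-thresholding step keeps the iterate low-rank, while the correction step enforces agreement with the known entries; the returned matrix supplies predicted scores for the blank drug-disease pairs.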

Sports game results predictions
The National Collegiate Athletic Association (NCAA) Men's Division I Basketball Tournament, commonly known as "March Madness", is one of the most popular sporting events in the United States. Every year, 68 out of 364 NCAA Division I teams are selected after the regular season to participate in a single-elimination tournament for the NCAA men's basketball championship. By arranging the teams along the rows and columns, the regular-season games can be displayed as a matrix (Fig. 3), where a blue dot represents a game between two teams in the regular season.
Ji et al. [67,68] employed matrix completion recommendation systems to predict the March Madness results. Game parameters, including field goal percentage, three-pointer percentage, free throw percentage, offensive rebounds, defensive rebounds, assists, turnovers, steals, blocks, and fouls, were predicted by completing the corresponding game parameter matrices. These predicted parameters provided a predicted scenario for a game between two teams that had not met in the regular season. The parameters were then fed to a neural network to predict the outcomes of the March Madness playoff games. In the 2015 March Madness, this method correctly predicted the outcomes of 49 out of 63 games.

Business to business electronic commerce
The use of recommendation systems in electronic commerce (e-commerce) applications focusing on Business-to-Customer (B2C) approaches, such as the Netflix problem, has been discussed previously. These B2C e-commerce applications also include online retailers such as Amazon, Best Buy, Walmart, and most other corporations that dominate the retail industry. Recommendation systems are also applied in commerce to Business-to-Business (B2B) transactions, where, as with the B2C online retailers, B2B users seek to minimize information overload and let a computational algorithm provide effective business intelligence.
The attributes of a typical B2B e-commerce recommendation system can be classified into three main categories: system inputs, system processes, and system outputs. The system inputs include data collected from the business, which comprise industry-specific conditions, supplier data, past and current customer activities, and customer ratings of goods and services [69]. In this way, the B2B recommender functions similarly to the content-based filtering approaches in B2C systems. The two differ in their outputs, however: the B2B system is not focused on delivering a computationally derived associated product to the customer, but on establishing links between a business and another stakeholder and on identifying potential opportunities with other businesses. For instance, the system can use website browsing data and the consequent purchases to evaluate advertising effectiveness, and then recommend partnerships with marketing companies. Another approach is supply chain management, where the system recommends suppliers based on their past delivery performance, in terms of timeliness and quality, and the sales that resulted for the manufacturers. These data can help the business negotiate prices, discover opportunities, and evaluate the return on investment of any given decision.

Gene expression predictions
Over the last 50 years, one of the most dynamic fields of study in biomedical research has been the investigation of protein folding and its effects on gene expression. The genomic manifestations of many human diseases and pathological conditions are related to protein folding [70], a process through which a protein can exist in four possible states. The first is the "unfolded state", in which a protein has been assembled with all the proper chemical components but is not functional. The second is the "molten globule", or partially folded, state. The third is the "native state", in which the protein is folded into its proper three-dimensional structure and is biologically functional. The fourth is the "amyloid fibril" state, in which the protein is misfolded and becomes deformed. The latter two states have captivated many biological scientists because their impacts on the expression of genes can lead to diseases such as Alzheimer's disease, Parkinson's disease, Huntington's disease, Bovine Spongiform Encephalopathy, and Rheumatoid Arthritis.
Advances in high-performance computing have allowed researchers to investigate the effects of gene expression and have led to the use of extremely large datasets to predict how genes are expressed based on their underlying protein structure. One such method is the use of low-rank matrix completion on known but sparse gene expression levels to recommend future gene expressions. A low-rank matrix is formed based on the underlying biological conditions. For instance, it is generally known that many genes interact with each other; interdependent factors therefore contribute to the protein folding phases, leading to gene transcription and ultimately gene expression, which can be characterized computationally in a correlated data matrix. Since gene expression values are likely to lie in a low-dimensional linear subspace, the resulting matrix can be considered a low-rank matrix [71]. The techniques discussed in the previous sections, such as nuclear norm minimization, can then be applied to recover and complete the matrix, thereby yielding predictions of the missing gene expression levels.

Microblogging recommendations
In a digital age where many people across the globe get their information from social networking platforms, the popular "microblogging" site Tumblr, where users share short messages with a wide audience, can employ recommendation systems to help users find other similar messages or microblogs. Since posts are generally short, a large number of messages are generated every day, leading to massive amounts of dynamic text data, in addition to images. Unlike other collaborative filtering approaches where users rank preferences, on Tumblr the ratings take a more binary form: users simply choose to follow or not follow a post. This limitation can be mitigated by incorporating users' activities and the contents of their posts, which can include a combination of text, tags, and images [72]. These activities are analyzed using machine learning techniques, such as a convolutional neural network, through which all relevant features can be extracted from the vast datasets. Additionally, the text features can be obtained with a second neural network known as "word2vec", which transforms text data into vectors in which words from similar contexts are close to each other, with multiple degrees of similarity [73]. Ultimately, the missing information about users who do not follow other users can essentially be inferred from the activities they have performed in their own posts. By incorporating user features, the matrix completion models can make recommendations in the inductive setting, where predictions can be made for users not present in the training data set.
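The inductive setting can be illustrated with a simple bilinear model $r_{ui} \approx a_u^{T} M b_i$, where $a_u$ and $b_i$ are user and item feature vectors and only the small matrix $M$ is learned, so a user absent from the training ratings can still be scored from its features alone. The model form, the squared loss, and plain gradient descent are illustrative choices, not the specific method of Ref. [72].

```python
import numpy as np

def fit_inductive_mc(A, B, R, mask, lr=0.002, lam=1e-4, n_iter=3000, seed=0):
    """Inductive matrix completion sketch: ratings modeled as r_ui ~ a_u^T M b_i.

    A (users x d_u) and B (items x d_i) hold feature vectors; R holds the
    ratings and mask is 1 on observed entries. Gradient descent on the
    regularized squared loss learns M. All sizes and hyperparameters are
    illustrative.
    """
    rng = np.random.default_rng(seed)
    M = 0.01 * rng.standard_normal((A.shape[1], B.shape[1]))
    for _ in range(n_iter):
        E = mask * (R - A @ M @ B.T)        # errors on observed entries only
        M += lr * (A.T @ E @ B - lam * M)   # gradient step on the squared loss
    return M

def predict(M, a_new, B):
    """Score every item for a new user given only its feature vector."""
    return B @ (M.T @ a_new)
```

Because predictions depend only on features and the learned $M$, `predict` works for users never seen during training, which is exactly the inductive behavior described above.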

Conclusion
Matrix completion approaches have become important methodologies in recommendation systems, and they are often more accurate than nearest-neighbor approaches. Motivated by the famous Netflix Prize problem, many recommendation system models have been proposed, and many computational algorithms have been developed accordingly. This survey aims to provide a comprehensive review of the matrix completion models and algorithms for recommendation systems, although it cannot cover all available models and algorithms.
There have been quite a few research directions that go beyond recommendation systems based on matrix completion. In reality, the popularity of an item may change over time, which can be addressed by incorporating temporal dynamics into the recommendation model. For example, Koren and Bell [64] proposed models that incorporate time-changing factors to gain insight into how the influence between two items rated by the same user decays over time. A more general problem than matrix completion is tensor completion, which concerns recovering missing values in high-dimensional data. Liu et al. [74] defined a trace norm for tensors and extended the nuclear (trace) norm minimization model to tensor completion. Moreover, traditional recommendation systems focus on prediction accuracy only. In practical applications, however, objectives such as diversity and novelty are also important, although they may conflict with accuracy [75]. Hence, multi-objective optimization algorithms [76-78] are needed to find recommendations with respect to the tradeoffs among conflicting objectives. Furthermore, with the development of modern parallel and distributed computing architectures, much effort has been put into designing efficient parallel algorithms [79,80] that enable matrix completion techniques to make efficient recommendations for large-scale datasets.

Formula (10), the SVD++ prediction rule, estimates a rating as $\hat{r}_{uc} = b + b_u + b_c + y_c^{T}\big(x_u + |W(u)|^{-1/2}\sum_{j\in W(u)} p_j\big)$, where $W(u)$ denotes the set of items associated with user $u$. The parameters to be learned are $b_u$, $b_c$, $x_u$, $p_j$, and $y_c$, which is done by minimizing the regularized squared error over the observed entries.

Fig. 3 Game matrix of 364 NCAA Division I basketball teams. The x- and y-axes represent the NCAA teams, and each point indicates a match between two teams during the regular season.