Low-Rank Matrix Completion: A Contemporary Survey

As a paradigm for recovering unknown entries of a matrix from partial observations, low-rank matrix completion (LRMC) has generated a great deal of interest. Over the years, a large body of work has accumulated on this topic, yet it is not easy to extract the essential knowledge from these studies, mainly because many of them are highly theoretical or devoted to proposing a new LRMC technique. In this paper, we give a contemporary survey of LRMC. To provide a better view, insight, and understanding of the potentials and limitations of LRMC, we present early scattered results in a structured and accessible way. Specifically, we classify the state-of-the-art LRMC techniques into two main categories and then explain each category in detail. We then discuss the issues to be considered when using LRMC techniques, including the intrinsic properties required for matrix recovery and how to exploit special structure in LRMC design. We also discuss convolutional neural network (CNN)-based LRMC algorithms exploiting the graph structure of a low-rank matrix. Furthermore, we present the recovery performance and the computational complexity of state-of-the-art LRMC techniques. Our hope is that this paper will serve as a useful guide for practitioners and non-experts to catch the gist of LRMC.


I. INTRODUCTION
In the era of big data, the low-rank matrix has become a useful and popular tool to express two-dimensional information. One well-known example is the rating matrix in recommendation systems representing users' tastes on products [1]. Since users expressing similar ratings on multiple products tend to share interest in a new product, columns associated with users sharing the same interest are highly likely to be similar, resulting in the low-rank structure of the rating matrix (see Fig. 1). Another example is the Euclidean distance matrix formed by the pairwise distances of a large number of sensor nodes. Since the rank of a Euclidean distance matrix in the k-dimensional Euclidean space is at most k + 2 (if k = 2, then the rank is at most 4), this matrix can be readily modeled as a low-rank matrix [2], [3], [4].
One major benefit of the low-rank matrix is that the essential information in the matrix, expressed in terms of its degrees of freedom, is much smaller than the total number of entries. Therefore, even when the number of observed entries is small, we still have a good chance to recover the whole matrix. There are a variety of scenarios where the number of observed entries of a matrix is tiny. In recommendation systems, for example, users are asked to submit feedback in the form of a rating, e.g., 1 to 5 for a purchased product. However, users often do not want to leave feedback, and thus the rating matrix will have many missing entries. Also, in an internet of things (IoT) network, sensor nodes are limited in their radio communication range or suffer power outages, so only a small portion of the entries of the Euclidean distance matrix is available.
When there is no restriction on the rank of a matrix, the problem of recovering unknown entries of the matrix from partially observed entries is ill-posed. This is because any value can be assigned to an unknown entry, which in turn means that there is an infinite number of matrices that agree with the observed entries. As a simple example, consider the following 2 × 2 matrix with one unknown entry marked ?
If M is full rank, i.e., the rank of M is two, then any value except 10 can be assigned to ?.
In contrast, if M is a low-rank matrix (the rank is one in this trivial example), the two columns differ only by a constant factor, and hence the unknown element ? can be easily determined from the linear relationship between the two columns (? = 10). This example is obviously simple, but the fundamental principle of recovering a large-dimensional matrix is not much different, and the low-rank constraint plays a central role in recovering the unknown entries of the matrix.
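The rank-one reasoning above can be sketched in a few lines of code. The concrete entry values below are illustrative assumptions (any observed entries with b·c/a = 10 are consistent with the text); only the completed value ? = 10 is fixed by the example:

```python
# Hypothetical observed entries of the 2x2 example with
# Omega = {(1,1), (1,2), (2,1)} observed and (2,2) unknown.
a, b = 1.0, 2.0     # first row (both observed)
c = 5.0             # entry (2,1) observed

# Rank one means column 2 = alpha * column 1 for some scalar alpha.
alpha = b / a              # determined from the observed first row
unknown = alpha * c        # ? = (b / a) * c = 10
print(unknown)             # -> 10.0
```

Under the full-rank assumption, by contrast, any value other than 10 keeps the two columns linearly independent.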
Before we proceed, we discuss a few notable applications where the underlying matrix is modeled as a low-rank matrix.

1) Recommendation system: In 2006, the online DVD rental company Netflix announced a contest to improve the quality of the company's movie recommendation system. The company released a training set of about half a million customers, containing ratings on more than ten thousand movies, each movie being rated on a scale from 1 to 5 [1]. (The full Netflix dataset consists of ratings of more than 17,000 movies by more than 2.5 million users; the number of known entries is only about 1% [1].) The training data can be represented as a large-dimensional matrix in which each column represents the ratings of a customer on the movies. The primary goal of the recommendation system is to estimate the users' interests in products using this sparsely sampled rating matrix. Users sharing the same interests in key factors such as the type, the price, and the appearance of a product often provide the same ratings on the movies. The ratings of those users might form a low-rank column space, resulting in the low-rank model of the rating matrix (see Fig. 1).

[Figure caption: The Euclidean distance matrix can be recovered with 92% of distance errors below 0.5 m using 30% of observed distances [4].]

2) Phase retrieval: The problem of recovering a signal, not necessarily sparse, from the magnitude of its observation is referred to as phase retrieval. Phase retrieval is an important problem in X-ray crystallography and quantum mechanics since only the magnitude of
[Fig. 3. Image reconstruction via LRMC: original image, image with noise and scribbles, and reconstructed image. Recovered images achieve peak SNR ≥ 32 dB.]
the Fourier transform is measured in these applications [5]. Suppose the unknown time-domain signal m = [m_0 · · · m_{n−1}]^T is acquired in the form of the measured magnitude of its Fourier transform. That is, we observe

|z_ω| = | Σ_{t=0}^{n−1} m_t e^{−j2πωt/n} |,  ω ∈ Ω,   (2)

where Ω is the set of sampled frequencies.
Further, let f_ω = (1/√n)[1 e^{−j2πω/n} · · · e^{−j2πω(n−1)/n}]^H and M = mm^H, where m^H is the conjugate transpose of m. Then, (2) can be rewritten as

|z_ω|² = ⟨F_ω, M⟩,  ω ∈ Ω,

where F_ω = f_ω f_ω^H is the rank-1 matrix of the waveform f_ω. Using this simple transform, we can express the quadratic magnitude |z_ω|² as a linear measurement of M. In essence, the phase retrieval problem can be converted into the problem of reconstructing the rank-1 matrix M in the positive semi-definite (PSD) cone [5]:

min_M rank(M) subject to ⟨F_ω, M⟩ = |z_ω|² (ω ∈ Ω), M ⪰ 0.

3) Localization in IoT networks: In recent years, the internet of things (IoT) has received much attention for its plethora of applications such as healthcare, automatic metering, environmental monitoring (temperature, pressure, moisture), and surveillance [6], [7], [2].
Since actions in IoT networks, such as fire alarms, energy transfer, and emergency requests, are made primarily at the data center, the data center should figure out the location information of all devices in the network. In this scheme, called network localization (a.k.a. cooperative localization), each sensor node measures the distance information of adjacent nodes and then sends it to the data center. The data center then constructs a map of the sensor nodes using the collected distance information [8]. For various reasons, such as the power outage of a sensor node or the limitation of the radio communication range (see Fig. 1), only a small number of distances is available at the data center. Also, in vehicular networks, it is not easy to measure the distances to all adjacent vehicles when a vehicle is located in a dead zone. An example of the observed Euclidean distance matrix is a partially filled matrix with unknown entries marked ?,
where d_ij is the pairwise distance between sensor nodes i and j. Since the rank of a Euclidean distance matrix M is at most k + 2 in the k-dimensional Euclidean space [4], the problem of reconstructing M can be well modeled as an LRMC problem.
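The rank bound can be checked numerically. The sketch below builds an EDM of squared pairwise distances (the convention under which the rank bound holds) from random sensor locations; the point set and sizes are arbitrary choices for illustration:

```python
import numpy as np

# Rank check for a Euclidean distance matrix: for points in R^k, the
# matrix of squared pairwise distances has rank at most k + 2.
rng = np.random.default_rng(0)
n, k = 20, 2
P = rng.standard_normal((n, k))          # n sensor locations in R^k
G = P @ P.T                              # Gram matrix
g = np.diag(G)
D = g[:, None] + g[None, :] - 2.0 * G    # D[i, j] = ||p_i - p_j||^2
print(np.linalg.matrix_rank(D))          # at most k + 2 = 4
```

With only a fraction of the entries of D observed, recovering the rest is exactly the LRMC problem described above.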

4) Image compression and restoration: When there is dirt or scribbles in a two-dimensional image (see Fig. 1), one simple solution is to replace the contaminated pixels with an interpolated version of adjacent pixels. A better way is to exploit the fact that a few singular values dominate in an image. In fact, one can readily approximate an image by a low-rank matrix without perceptible loss of quality. By using the clean (uncontaminated) pixels as observed entries, the original image can be recovered via low-rank matrix completion.
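The dominance of a few singular values can be illustrated with a truncated SVD. The synthetic "image" below is a stand-in assumption for a real photograph; it is rank-2 by construction, so its rank-2 approximation is essentially exact:

```python
import numpy as np

# Low-rank approximation by truncated SVD: keep only the r largest
# singular values and the corresponding singular vectors.
x = np.linspace(0.0, 1.0, 64)
img = np.outer(x, x) + 0.5 * np.outer(np.sin(2 * np.pi * x), x)

U, s, Vt = np.linalg.svd(img)
r = 2
img_r = (U[:, :r] * s[:r]) @ Vt[:r]      # best rank-r approximation
rel_err = np.linalg.norm(img - img_r) / np.linalg.norm(img)
```

For a natural image the error would not be zero, but it typically decays quickly with r, which is what makes LRMC-based restoration effective.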

5) Massive multiple-input multiple-output (MIMO): By exploiting hundreds of antennas at the base station (BS), massive MIMO can offer a large gain in capacity. In order to optimize the performance gain of massive MIMO systems, channel state information at the transmitter (CSIT) is required [9]. One way to acquire the CSIT is to let each user directly feed back its own pilot observation to the BS for the joint CSIT estimation of all users [12].
In this setup, the MIMO channel matrix H can be reconstructed in two steps: 1) estimating the pilot observation matrix Y using least squares (LS) or linear minimum mean square error (LMMSE) estimation and 2) reconstructing H using the model Y = HΦ, where each column of Φ is the pilot signal from one antenna at the BS [10], [11]. Since the number of resolvable paths P is limited in most cases, one can readily assume that rank(H) ≤ P [12]. In massive MIMO systems, P is often much smaller than the dimension of H due to the limited number of clusters around the BS. Thus, the problem of recovering H at the BS can be solved via rank minimization subject to the linear constraint Y = HΦ [11].
The paradigm of LRMC has received much attention ever since the works of Fazel [21], Candes and Recht [22], and Candes and Tao [23]. Over the years, there has been a large body of work on this topic [5], [57], [48], [49], but it might not be easy to grasp the essentials of LRMC from these studies. One reason is that many of these works are highly theoretical, relying on random matrix theory, graph theory, manifold analysis, and convex optimization. Another reason is that most of these works propose a new LRMC technique, so it is difficult to catch the general idea and big picture of LRMC from them.
The primary goal of this paper is to provide a contemporary survey on LRMC, a paradigm to recover unknown entries of a low-rank matrix from partial observations. To provide a better view, insight, and understanding of the potentials and limitations of LRMC to researchers and practitioners in a friendly way, we present early scattered results in a structured and accessible way. Firstly, we classify the state-of-the-art LRMC techniques into two main categories and then explain each category in detail. Secondly, we present issues to be considered when using LRMC techniques. Specifically, we discuss the intrinsic properties required for low-rank matrix recovery and explain how to exploit a special structure, such as a positive semidefinite structure, Euclidean distance structure, or graph structure, in LRMC design. We note that there are previous surveys on this topic: an overview of LRMC algorithms and their performance guarantees can be found in [73], and a survey with an emphasis on first-order LRMC techniques together with their computational efficiency is presented in [74]. Our work is distinct from these previous studies in several aspects. Firstly, we categorize the state-of-the-art LRMC techniques into two classes and then explain the details of each class, which can help researchers easily determine which technique can be used for a given problem setup.
Secondly, we provide a comprehensive survey of LRMC techniques together with extensive simulation results on the recovery quality and running-time complexity, from which one can easily see the pros and cons of each LRMC technique and gain better insight into the choice of LRMC algorithms. Finally, we discuss how to exploit a special structure of a low-rank matrix in LRMC algorithm design. In particular, we introduce the CNN-based LRMC algorithm that exploits the graph structure of a low-rank matrix.
We briefly summarize notations used in this paper.
• For a vector a ∈ R^n, diag(a) ∈ R^{n×n} is the diagonal matrix formed by a.
• For a matrix A ∈ R^{n1×n2}, a_i ∈ R^{n1} is the i-th column of A.
• rank(A) is the rank of A.
• A^T ∈ R^{n2×n1} is the transpose of A.
• For A, B ∈ R^{n1×n2}, ⟨A, B⟩ = tr(A^T B) and A ⊙ B are the inner product and the Hadamard product (element-wise multiplication) of the two matrices A and B, respectively, where tr(·) denotes the trace operator.
• ‖A‖, ‖A‖_*, and ‖A‖_F stand for the spectral norm (i.e., the largest singular value), the nuclear norm (i.e., the sum of singular values), and the Frobenius norm of A, respectively.
• σ_i(A) is the i-th largest singular value of A.
• I_d is the d-dimensional identity matrix.
• If A is a square matrix (i.e., n1 = n2 = n), diag(A) ∈ R^n is the vector formed by the diagonal entries of A.
• vec(X) is the vectorization of X.

II. BASICS OF LOW-RANK MATRIX COMPLETION
In this section, we discuss the principle of recovering a low-rank matrix from partial observations.
Basically, the desired low-rank matrix M can be recovered by solving the rank minimization problem

min_X rank(X) subject to x_ij = m_ij, (i, j) ∈ Ω,   (9)

where Ω is the index set of observed entries (e.g., Ω = {(1, 1), (1, 2), (2, 1)} in the example in (1)). One can alternatively express the problem using the sampling operator P_Ω. The sampling operation P_Ω(A) of a matrix A is defined as

[P_Ω(A)]_ij = A_ij if (i, j) ∈ Ω, and 0 otherwise.

Using this operator, the problem (9) can be equivalently formulated as

min_X rank(X) subject to P_Ω(X) = P_Ω(M).   (10)

A naive way to solve the rank minimization problem (10) is a combinatorial search. Specifically, we first assume that rank(M) = 1. Then any two columns of M are linearly dependent, and thus we have a system of expressions m_i = α_{i,j} m_j for some α_{i,j} ∈ R. If the system has no solution under the rank-one assumption, we move to the next assumption, rank(M) = 2. In this case, we solve the new system of expressions m_i = α_{i,j} m_j + α_{i,k} m_k. This procedure is repeated until a solution is found. Clearly, the combinatorial search strategy is infeasible for most practical scenarios since its complexity is exponential in the problem size [76]. For example, when M is an n × n matrix, it can be shown that the number of systems of expressions to be solved is O(n·2^n).
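The sampling operator P_Ω is straightforward to implement with a boolean mask. The sketch below uses the index set Ω = {(1, 1), (1, 2), (2, 1)} from the example; the matrix entries are illustrative assumptions:

```python
import numpy as np

def sample(A, mask):
    """P_Omega(A): keep the observed entries of A, zero out the rest."""
    return np.where(mask, A, 0.0)

A = np.array([[1.0, 2.0],
              [5.0, 10.0]])
mask = np.array([[True, True],
                 [True, False]])   # Omega = {(1,1), (1,2), (2,1)}, 1-based

print(sample(A, mask))
# The constraint P_Omega(X) = P_Omega(M) fixes X only on Omega.
```

All of the algorithms discussed below act on P_Ω(M) through exactly this kind of masking.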
As a cost-effective alternative, various low-rank matrix completion (LRMC) algorithms have been proposed over the years. Roughly speaking, depending on how the rank information is used, LRMC algorithms can be classified into two main categories: 1) those not using the rank information and 2) those exploiting the rank information. In this section, we provide an in-depth discussion of the two categories (see the outline of LRMC algorithms in Fig. 3).

A. LRMC Algorithms Without the Rank Information
In this subsection, we explain the LRMC algorithms that do not require the rank information of the original low-rank matrix.

1) Nuclear Norm Minimization (NNM):
Since the rank minimization problem (10) is NP-hard [21], it is computationally intractable when the dimension of the matrix is large. One common trick to avoid this computational issue is to replace the non-convex objective function with its convex surrogate, converting the combinatorial search problem into a convex optimization problem.
There are two clear advantages of solving a convex optimization problem: 1) a local optimum is globally optimal and 2) there are many efficient polynomial-time convex optimization solvers (e.g., the interior point method [77] and semi-definite programming (SDP) solvers).
In the LRMC problem, the nuclear norm ‖X‖_*, the sum of the singular values of X, has been widely used as a convex surrogate of rank(X) [22]:

min_X ‖X‖_* subject to P_Ω(X) = P_Ω(M).   (11)

Indeed, it has been shown that the nuclear norm is the convex envelope (the "best" convex approximation) of the rank function on the set {X ∈ R^{n1×n2} : ‖X‖ ≤ 1} [21]. Note that the relaxation from the rank function to the nuclear norm is conceptually analogous to the relaxation from the ℓ0-norm to the ℓ1-norm in compressed sensing (CS) [39], [40], [41].
Now, a natural question one might ask is whether the NNM problem in (11) offers a solution comparable to that of the rank minimization problem in (10). In [22], it has been shown that if the observed entries of a rank-r matrix M ∈ R^{n×n} are suitably random and the number of observed entries satisfies

|Ω| ≥ C µ0 n^{1.2} r log n

for a constant C, where µ0 is the largest coherence of M (see the definition in Subsection III-A2), then M is the unique solution of the NNM problem (11) with overwhelming probability (see Appendix B).
It is worth mentioning that the NNM problem in (11) can also be recast as a semidefinite program (SDP) (see Appendix A):

min_{X, W1, W2} tr(W1) + tr(W2)
subject to ⟨A_k, X⟩ = b_k, k = 1, …, |Ω|,
[W1 X; X^T W2] ⪰ 0,   (13)

where {A_k}_{k=1}^{|Ω|} is the sequence of linear sampling matrices and {b_k}_{k=1}^{|Ω|} are the observed entries. The problem (13) can be solved by off-the-shelf SDP solvers such as SDPT3 [24] and SeDuMi [25] using interior-point methods [26], [27], [28], [31], [30], [29]. It has been shown that the computational complexity of SDP techniques is O(n^3), where n = max(n1, n2) [30]. Also, it has been shown that, under suitable conditions, a recovery guarantee involving a positive constant ω holds [29]. Alternatively, one can reconstruct M by solving an equivalent nonconvex quadratic optimization form of the NNM problem [32]. This approach has a computational benefit since the number of primal variables is reduced from n1 n2 to r(n1 + n2) (r ≤ min(n1, n2)). Interested readers may refer to [32] for more details.

2) Singular Value Thresholding (SVT):
While the solution of the NNM problem in (11) can be obtained by solving (13), this procedure is computationally burdensome when the size of the matrix is large.
As an effort to mitigate this computational burden, the singular value thresholding (SVT) algorithm has been proposed [33]. The key idea of this approach is to add a regularization term to the objective function of the NNM problem:

min_X τ‖X‖_* + (1/2)‖X‖_F² subject to P_Ω(X) = P_Ω(M),   (14)

where τ is the regularization parameter. In [33, Theorem 3.1], it has been shown that the solution of (14) converges to the solution of the NNM problem as τ → ∞. (In practice, a large value of τ has been suggested, e.g., τ = 5n for an n × n low-rank matrix, for the fast convergence of SVT. For example, when τ = 5000, SVT requires 177 iterations to reconstruct a 1000 × 1000 matrix of rank 10 [33].)
Let L(X, Y) be the Lagrangian function associated with (14), i.e.,

L(X, Y) = τ‖X‖_* + (1/2)‖X‖_F² + ⟨Y, P_Ω(M) − P_Ω(X)⟩,

where Y is the dual variable. Let X* and Y* be the primal and dual optimal solutions. Then, by strong duality [77], we have

min_X L(X, Y*) = L(X*, Y*) = max_Y L(X*, Y).

The SVT algorithm finds X* and Y* in an iterative fashion. Specifically, starting with Y_0 = 0_{n1×n2}, SVT updates X_k and Y_k as

X_k = arg min_X L(X, Y_{k−1}),
Y_k = Y_{k−1} + δ_k P_Ω(M − X_k),

where {δ_k}_{k≥1} is a sequence of positive step sizes. Note that X_k can be expressed as

X_k = arg min_X τ‖X‖_* + (1/2)‖X‖_F² − ⟨Y_{k−1}, P_Ω(X)⟩
    = arg min_X τ‖X‖_* + (1/2)‖X − Y_{k−1}‖_F²,

where the last equality holds because Y_{k−1} is supported on Ω, i.e., P_Ω(Y_{k−1}) = Y_{k−1}. Due to the inclusion of the nuclear norm, finding the solution X_k seems difficult. However, thanks to an intriguing result of Cai et al. [33], we can obtain the solution easily:

X_k = D_τ(Y_{k−1}),
where D_τ is the singular value thresholding operator defined by D_τ(A) = U diag({max(σ_i − τ, 0)}) V^T for a matrix A with SVD A = U diag({σ_i}) V^T. To conclude, the update equations for X_k and Y_k are given by

X_k = D_τ(Y_{k−1}),   (21a)
Y_k = Y_{k−1} + δ_k P_Ω(M − X_k).   (21b)

One can see from (21a) and (21b) that the SVT algorithm is computationally efficient since only a truncated SVD and elementary matrix operations are needed in each iteration. Indeed, let r_k be the number of singular values of Y_{k−1} greater than the threshold τ, and suppose {r_k} converges to the rank of the original matrix, i.e., lim_{k→∞} r_k = r. Then the computational complexity of SVT is O(r n1 n2). Note also that the number of iterations to achieve an ε-approximation is O(1/√ε) [33]. In Table I, we summarize the SVT algorithm. For the details of the stopping criterion of SVT, see [33, Section 5].
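A minimal SVT implementation following (21a) and (21b) might look as follows. The test matrix, sampling rate, and step size are illustrative assumptions; τ = 5n follows the heuristic mentioned above:

```python
import numpy as np

def svt(M_obs, mask, tau, delta, n_iter=1000):
    """Singular value thresholding: shrink the singular values of Y by tau
    (the operator D_tau), then take a step on the residual over Omega."""
    Y = np.zeros_like(M_obs)
    X = Y
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt          # X_k = D_tau(Y_{k-1})
        Y = Y + delta * np.where(mask, M_obs - X, 0.0)   # dual update on Omega
    return X

# Demo: random rank-2 matrix, about 60% of entries observed.
rng = np.random.default_rng(1)
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.6
X_hat = svt(np.where(mask, M, 0.0), mask, tau=5.0 * n, delta=1.2)
```

A production implementation would use a truncated SVD and the stopping criterion of [33] instead of a fixed iteration count.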
Over the years, various SVT-based techniques have been proposed [35], [78], [79]. In [78], an iterative matrix completion algorithm using an SVT-based operator called the proximal operator has been proposed. Similar algorithms inspired by the iterative hard thresholding (IHT) algorithm in CS have also been proposed [35], [79]. (By ε-approximation, we mean ‖M̂ − M*‖_F ≤ ε, where M̂ is the reconstructed matrix and M* is the optimal solution of SVT.)

3) Iteratively Reweighted Least Squares (IRLS) Minimization: Yet another simple and computationally efficient way to solve the NNM problem is the IRLS minimization technique [36], [37]. In essence, the NNM problem can be recast as a weighted least squares problem

min_X tr(W X X^T) subject to P_Ω(X) = P_Ω(M),   (22)

where W = (X X^T)^{−1/2}. It can be shown that (22) is equivalent to the NNM problem (11) since we have [36]

‖X‖_* = tr((X X^T)^{1/2}) = tr(W X X^T).

The key idea of the IRLS technique is to find X and W in an iterative fashion. The update expressions are

X_k = arg min_{P_Ω(X)=P_Ω(M)} tr(W_{k−1} X X^T),   (24a)
W_k = (X_k X_k^T)^{−1/2}.   (24b)

Note that the weighted least squares subproblem (24a) can be solved easily by updating each and every column of X_k [36]. To compute W_k, we need a matrix inversion (24b). To avoid ill-behavior (i.e., some singular values of X_k approaching zero), an approach using a perturbation of the singular values has been proposed [36], [37]. Similar to SVT, the per-iteration computational complexity of the IRLS-based technique is O(r n1 n2). Also, IRLS requires O(log(1/ε)) iterations to achieve an ε-approximation solution. We summarize the IRLS minimization technique in Table II.
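The iteration (24a)-(24b) can be sketched as follows. Here the weighted least squares step fills the unobserved entries of each column in closed form, and a decaying perturbation ε stands in for the singular-value perturbation mentioned above; all test parameters are illustrative assumptions:

```python
import numpy as np

def irls_mc(M_obs, mask, n_iter=100, eps0=1.0):
    """IRLS sketch for NNM: alternate W = (X X^T + eps I)^(-1/2) with a
    column-wise weighted least squares fill-in of the unobserved entries."""
    n1, n2 = M_obs.shape
    X = np.where(mask, M_obs, 0.0)
    eps = eps0
    for _ in range(n_iter):
        w, V = np.linalg.eigh(X @ X.T + eps * np.eye(n1))
        W = (V / np.sqrt(w)) @ V.T                    # (X X^T + eps I)^(-1/2)
        X_new = np.where(mask, M_obs, 0.0)
        for j in range(n2):                           # solve (24a) column by column
            o, u = mask[:, j], ~mask[:, j]
            if u.any() and o.any():
                # minimize x^T W x with the observed entries of x held fixed
                X_new[u, j] = -np.linalg.solve(W[np.ix_(u, u)],
                                               W[np.ix_(u, o)] @ M_obs[o, j])
        X = X_new
        eps = max(0.9 * eps, 1e-9)                    # singular value perturbation
    return X

# Demo: random rank-1 matrix, about 70% of entries observed.
rng = np.random.default_rng(2)
n = 20
M = np.outer(rng.standard_normal(n), rng.standard_normal(n))
mask = rng.random((n, n)) < 0.7
X_hat = irls_mc(np.where(mask, M, 0.0), mask)
```

The per-column solve is exact because tr(W X X^T) = Σ_j x_j^T W x_j decouples across columns.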

B. LRMC Algorithms Using Rank Information
In many applications, such as localization in IoT networks, recommendation systems, and image restoration, we encounter situations where the rank of the desired matrix is known in advance.
As mentioned, the rank of a Euclidean distance matrix in a localization problem is at most k + 2 (k is the dimension of the Euclidean space). In this situation, the LRMC problem can be formulated as a Frobenius norm minimization (FNM) problem:

min_X (1/2)‖P_Ω(M) − P_Ω(X)‖_F² subject to rank(X) ≤ r.   (25)

Since the rank constraint is an inequality, an approach using approximate rank information (e.g., an upper bound on the rank) has been proposed [43]. The FNM problem has two main advantages: 1) the problem is well-posed in the noisy scenario and 2) the cost function is differentiable, so various gradient-based optimization techniques (e.g., gradient descent, conjugate gradient, Newton methods, and manifold optimization) can be used to solve it.
In this subsection, we explain these techniques in detail.

1) Greedy Techniques:
In recent years, greedy algorithms have been popularly used for LRMC due to their computational simplicity. In a nutshell, they solve the LRMC problem by making a heuristic decision at each iteration with the hope of finding the right solution in the end.
Let r be the rank of the desired low-rank matrix M ∈ R^{n×n} and M = UΣV^T be its singular value decomposition, where U, V ∈ R^{n×r}. Note that M can be expressed as a linear combination of r rank-one matrices:

M = Σ_{i=1}^r σ_i(M) u_i v_i^T.

The main task of greedy techniques is to identify the atom set A_M = {u_1 v_1^T, …, u_r v_r^T} of rank-one matrices representing M. Once the atom set A_M is found, the singular values σ_i(M) = σ_i can be computed easily by solving the least squares problem

min_α ‖b − Aα‖_2²,   (27)

where A = [vec(P_Ω(ϕ_1)) · · · vec(P_Ω(ϕ_r))] with ϕ_i = u_i v_i^T, α = [α_1 · · · α_r]^T, and b = vec(P_Ω(M)).
One popular greedy technique is atomic decomposition for minimum rank approximation (ADMiRA) [43], which can be viewed as an extension of the compressive sampling matching pursuit (CoSaMP) algorithm in CS [38], [39], [40], [41]. ADMiRA employs a strategy of adding as well as pruning to identify the atom set A_M. In the adding stage, ADMiRA identifies the 2r rank-one matrices that best represent the residual and then adds these matrices to the pre-chosen atom set.
Specifically, if X_{i−1} is the output matrix generated in the (i−1)-th iteration and A_{i−1} is its atom set, then ADMiRA computes the residual R_i = P_Ω(M) − P_Ω(X_{i−1}) and adds the 2r leading principal components of R_i to A_{i−1}. In other words, the enlarged atom set Ψ_i is given by

Ψ_i = A_{i−1} ∪ {u_{R_i,j} v_{R_i,j}^T : 1 ≤ j ≤ 2r},

where u_{R_i,j} and v_{R_i,j} are the j-th principal left and right singular vectors of R_i, respectively.
Note that Ψ_i contains at most 3r elements. In the pruning stage, ADMiRA refines Ψ_i into a set of r atoms. To be specific, if X_i is the best approximation of M among matrices spanned by the atoms of Ψ_i, i.e.,

X_i = arg min_X {‖P_Ω(M) − P_Ω(X)‖_F : X ∈ span(Ψ_i)},

then the refined atom set A_i is expressed as

A_i = {u_{X_i,j} v_{X_i,j}^T : 1 ≤ j ≤ r},

where u_{X_i,j} and v_{X_i,j} are the j-th principal left and right singular vectors of X_i, respectively.
The computational complexity of ADMiRA is mainly due to two operations: the least squares operation in (27) and the SVD-based operation to find the leading atoms of the required matrix (e.g., R_k and X_{k+1}). First, since (27) involves the pseudo-inverse of A (of size |Ω| × O(r)), its computational cost is O(r|Ω|). Second, the computational cost of performing a truncated SVD of O(r) atoms is O(r n1 n2). Since |Ω| < n1 n2, the computational complexity of ADMiRA per iteration is O(r n1 n2). Also, the number of iterations of ADMiRA to achieve an ε-approximation is O(log(1/ε)) [43]. In Table III, we summarize the ADMiRA algorithm. Yet another well-known greedy method is the rank-one matrix pursuit algorithm [44], an extension of the orthogonal matching pursuit algorithm in CS [42]. In this approach, instead of choosing multiple atoms of the matrix, a single atom corresponding to the largest singular value of the residual matrix R_k is chosen at a time.
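The rank-one matrix pursuit idea can be sketched as follows, with a full least squares refit of the atom weights at each step; the sizes and the number of atoms are illustrative assumptions:

```python
import numpy as np

def rank_one_pursuit(M_obs, mask, n_atoms):
    """Greedy LRMC sketch: repeatedly pick the leading singular pair of the
    residual on Omega as a new rank-one atom, then refit all atom weights
    by least squares on the observed entries."""
    atoms, X = [], np.zeros_like(M_obs)
    for _ in range(n_atoms):
        R = np.where(mask, M_obs - X, 0.0)           # residual on Omega
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        atoms.append(np.outer(U[:, 0], Vt[0]))       # leading rank-one atom
        A = np.column_stack([a[mask] for a in atoms])
        alpha, *_ = np.linalg.lstsq(A, M_obs[mask], rcond=None)
        X = sum(c * a for c, a in zip(alpha, atoms))
    return X

# Demo: random rank-2 matrix, about 60% of entries observed.
rng = np.random.default_rng(3)
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.6
X_hat = rank_one_pursuit(np.where(mask, M, 0.0), mask, n_atoms=10)
```

With a fully observed matrix and n_atoms equal to the true rank, this reduces to the truncated SVD and is exact; with partial observations, more atoms and refitting are typically needed.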
2) Alternating Minimization Techniques: Many LRMC algorithms [33], [43] require the computation of a (partial) SVD to obtain the singular values and vectors (with cost O(rn²)). As an effort to further reduce the computational burden of the SVD, alternating minimization techniques have been proposed [45], [46], [47]. The basic premise behind the alternating minimization techniques is that a low-rank matrix M ∈ R^{n1×n2} of rank r can be factorized into the product of a tall matrix and a fat matrix, i.e., M = XY, where X ∈ R^{n1×r} and Y ∈ R^{r×n2} (r ≪ n1, n2). The key idea of this approach is to find X and Y minimizing the residual, defined as the difference between the original matrix and its estimate on the sampling space. In other words, X and Y are recovered by solving

min_{X,Y} f(X, Y) = (1/2)‖P_Ω(M) − P_Ω(XY)‖_F².   (31)

Power factorization, a simple alternating minimization algorithm, finds the solution of (31) by updating X and Y alternately as [45]

X_{i+1} = arg min_X ‖P_Ω(M) − P_Ω(X Y_i)‖_F²,
Y_{i+1} = arg min_Y ‖P_Ω(M) − P_Ω(X_{i+1} Y)‖_F².

Alternating steepest descent (ASD) is another alternating method for finding the solution of (31) [46].
The key idea of ASD is to update X and Y by applying the steepest gradient descent method to (31). Specifically, ASD first computes the gradient of f(X, Y) with respect to X and then updates X along the steepest descent direction:

X_{i+1} = X_i − t_{x_i} ∇f_{Y_i}(X_i),

where the gradient descent direction ∇f_{Y_i}(X_i) and stepsize t_{x_i} are given by

∇f_{Y_i}(X_i) = −(P_Ω(M) − P_Ω(X_i Y_i)) Y_i^T,
t_{x_i} = ‖∇f_{Y_i}(X_i)‖_F² / ‖P_Ω(∇f_{Y_i}(X_i) Y_i)‖_F².

After updating X, ASD updates Y in a similar way:

Y_{i+1} = Y_i − t_{y_i} ∇f_{X_{i+1}}(Y_i),

where

∇f_{X_{i+1}}(Y_i) = −X_{i+1}^T (P_Ω(M) − P_Ω(X_{i+1} Y_i)),
t_{y_i} = ‖∇f_{X_{i+1}}(Y_i)‖_F² / ‖P_Ω(X_{i+1} ∇f_{X_{i+1}}(Y_i))‖_F².

The low-rank matrix fitting (LMaFit) algorithm finds the solution in a different way, by solving [47]

min_{X,Y,Z} (1/2)‖XY − Z‖_F² subject to P_Ω(Z) = P_Ω(M).

With arbitrary inputs X_0 ∈ R^{n1×r} and Y_0 ∈ R^{r×n2} and Z_0 = P_Ω(M), the variables X, Y, and Z are updated in the i-th iteration as

X_{i+1} = Z_i Y_i^†,
Y_{i+1} = X_{i+1}^† Z_i,
Z_{i+1} = X_{i+1} Y_{i+1} + P_Ω(M − X_{i+1} Y_{i+1}),

where X^† is the Moore-Penrose pseudoinverse of X.
The running time of alternating minimization algorithms is short for two reasons: 1) they do not require an SVD computation and 2) the matrices to be (pseudo-)inverted are smaller than those in the greedy algorithms. While the inversion of a large matrix (of size |Ω| × O(r)) is required in greedy algorithms (see (27)), alternating minimization only requires the pseudo-inversion of X and Y (of size n1 × r and r × n2, respectively). Indeed, the computational complexity of this approach is O(r|Ω| + r²n1 + r²n2), which is much smaller than that of SVT and ADMiRA when r ≪ min(n1, n2). Also, the number of iterations of ASD and LMaFit to achieve an ε-approximation is O(log(1/ε)) [46], [47]. It has been shown that alternating minimization techniques are simple to implement and require only a small amount of memory [84]. A major drawback of these approaches is that they may converge to a local optimum.
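An ASD sketch following the gradients and exact line-search step sizes above; the random initialization and the test sizes are illustrative assumptions:

```python
import numpy as np

def asd(M_obs, mask, r, n_iter=500, seed=0):
    """Alternating steepest descent: gradient steps on X and Y in turn,
    each with an exact line-search step size."""
    rng = np.random.default_rng(seed)
    n1, n2 = M_obs.shape
    X = rng.standard_normal((n1, r))
    Y = rng.standard_normal((r, n2))
    for _ in range(n_iter):
        R = np.where(mask, M_obs - X @ Y, 0.0)
        Gx = -R @ Y.T                                 # gradient w.r.t. X
        tx = np.sum(Gx * Gx) / max(np.sum(np.where(mask, Gx @ Y, 0.0) ** 2), 1e-15)
        X = X - tx * Gx
        R = np.where(mask, M_obs - X @ Y, 0.0)
        Gy = -X.T @ R                                 # gradient w.r.t. Y
        ty = np.sum(Gy * Gy) / max(np.sum(np.where(mask, X @ Gy, 0.0) ** 2), 1e-15)
        Y = Y - ty * Gy
    return X @ Y

# Demo: random rank-2 matrix, about 60% of entries observed.
rng = np.random.default_rng(4)
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.6
X_hat = asd(np.where(mask, M, 0.0), mask, r=2)
```

Note that no SVD or matrix inversion appears in the loop, which is the source of the speed advantage discussed above.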

3) Optimization over Smooth Riemannian Manifold:
In many applications where the rank of the matrix is known a priori (i.e., rank(M) = r), one can strengthen the constraint of (25) by defining the feasible set

F = {X ∈ R^{n1×n2} : rank(X) = r}.

Note that F is not a vector space, and thus conventional optimization techniques cannot be used directly to solve the problem defined over F. While this is bad news, a remedy is that F is a smooth Riemannian manifold [53], [48]. Roughly speaking, a smooth manifold is a generalization of R^{n1×n2} on which a notion of differentiability exists. For a more rigorous definition, see, e.g., [55], [56]. A smooth manifold equipped with an inner product, often called a Riemannian metric, forms a smooth Riemannian manifold. Since the smooth Riemannian manifold is a differentiable structure equipped with an inner product, one can use all the ingredients needed to solve an optimization problem with a quadratic cost function, such as the Riemannian gradient, Hessian matrix, exponential map, and parallel translation [55]. Therefore, optimization techniques in R^{n1×n2} (e.g., steepest descent, Newton's method, conjugate gradient) can be used to solve (25) over the smooth Riemannian manifold F.
In recent years, many efforts have been made to solve matrix completion over smooth Riemannian manifolds. These works are classified by their specific choice of Riemannian manifold structure. One well-known approach is to solve (25) over the Grassmann manifold of orthogonal matrices [49]. In this approach, the feasible set can be expressed as F = {QR^T : Q^T Q = I, Q ∈ R^{n1×r}, R ∈ R^{n2×r}}, and thus solving (25) amounts to finding an n1 × r orthonormal matrix Q satisfying

min_Q min_R ‖P_Ω(M) − P_Ω(QR^T)‖_F².   (39)

In [49], an approach to solve (39) over the Grassmann manifold has been proposed.
Recently, it has been shown that the original matrix can be reconstructed by unconstrained optimization over the smooth Riemannian manifold F [50]. Often, F is expressed using the singular value decomposition as

F = {UΣV^T : U ∈ R^{n1×r}, V ∈ R^{n2×r}, U^T U = V^T V = I_r, Σ = diag(σ_1, …, σ_r), σ_1 ≥ · · · ≥ σ_r > 0}.

The FNM problem (25) can then be reformulated as an unconstrained optimization over F:

min_{X ∈ F} (1/2)‖P_Ω(M) − P_Ω(X)‖_F².

One can easily obtain closed-form expressions of the required ingredients, such as tangent spaces, the Riemannian metric, the Riemannian gradient, and the Hessian matrix, for this unconstrained optimization [53], [55], [56]. In fact, the major benefits of the Riemannian optimization-based LRMC techniques are their simplicity of implementation and fast convergence. Similar to ASD, the per-iteration computational cost is low since no full-scale SVD is required. (The Grassmann manifold is defined as the set of linear subspaces in a vector space [55].)
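As a heavily simplified stand-in for a proper Riemannian method, the sketch below takes a Euclidean gradient step on the FNM cost and retracts onto the fixed-rank set by truncated SVD (a common retraction); a true Riemannian algorithm would first project the gradient onto the tangent space of F. The step size heuristic (the reciprocal of the sampling rate) and test sizes are illustrative assumptions:

```python
import numpy as np

def fixed_rank_descent(M_obs, mask, r, step, n_iter=300):
    """Simplified fixed-rank sketch: gradient step on the FNM cost,
    then retraction onto F = {X : rank(X) = r} by truncated SVD."""
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        G = np.where(mask, X - M_obs, 0.0)           # gradient of the FNM cost
        U, s, Vt = np.linalg.svd(X - step * G, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]              # rank-r retraction
    return X

# Demo: rank-2 matrix, about 60% observed; step ~ 1 / sampling rate.
rng = np.random.default_rng(5)
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.6
X_hat = fixed_rank_descent(np.where(mask, M, 0.0), mask, r=2, step=1.0 / 0.6)
```

The retraction keeps every iterate exactly rank r, which is what distinguishes this family from the nuclear-norm methods of the previous subsection.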

4) Truncated NNM: Truncated NNM is a variation of the NNM-based technique that requires the rank information r. While the NNM technique takes into account all the singular values of the desired matrix, truncated NNM considers only the n − r smallest singular values [57].
Specifically, truncated NNM finds the solution to

min_X ‖X‖_r subject to P_Ω(X) = P_Ω(M),   (42)

where ‖X‖_r = Σ_{i=r+1}^n σ_i(X). We recall that σ_i(X) is the i-th largest singular value of X. Since

Σ_{i=1}^r σ_i(X) = max_{U^T U = I_r, V^T V = I_r} tr(U^T X V),

we have ‖X‖_r = ‖X‖_* − max_{U^T U = V^T V = I_r} tr(U^T X V), and thus the problem (42) can be reformulated as

min_X ‖X‖_* − max_{U^T U = V^T V = I_r} tr(U^T X V) subject to P_Ω(X) = P_Ω(M).   (45)

This problem can be solved iteratively. Specifically, starting from X_0 = P_Ω(M), truncated NNM updates X_i by solving [57]

min_X ‖X‖_* − tr(U_{i−1}^T X V_{i−1}) subject to P_Ω(X) = P_Ω(M),   (46)

where U_{i−1}, V_{i−1} ∈ R^{n×r} are the matrices of left and right singular vectors of X_{i−1}, respectively.
We note that the approach in (46) has two main advantages: 1) the rank information of the desired matrix can be incorporated and 2) various gradient-based techniques, including the alternating direction method of multipliers (ADMM) [80], [81], ADMM with an adaptive penalty (ADMM-AP) [82], and the accelerated proximal gradient line search method (APGL) [83], can be employed. (Although truncated NNM is a variant of NNM, we put it in the second category since it exploits the rank information of the low-rank matrix.) Note also that the dominant operation is the truncated SVD, whose complexity is O(r n1 n2); this is much smaller than that of the NNM technique (see Table V) as long as r ≪ min(n1, n2).
Similar to SVT, the iteration complexity of truncated NNM needed to achieve an ε-approximate solution has been characterized in [57]. Alternatively, a difference of two convex functions (DC)-based algorithm can be used to solve (45) [58]. In Table IV, we summarize the truncated NNM algorithm.
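To make the decomposition in (45) concrete, the following snippet (ours) checks numerically that the truncated nuclear norm ||X||_r equals ||X||_* minus the trace term evaluated at the r leading singular vectors:

```python
import numpy as np

def truncated_nuclear_norm(X, r):
    """||X||_r = sum of the n - r smallest singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[r:].sum()

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 6))
r = 2

# DC decomposition used by truncated NNM:
#   ||X||_r = ||X||_* - max_{U^T U = V^T V = I} tr(U^T X V),
# where the maximum is attained by the r leading singular vectors.
U, s, Vt = np.linalg.svd(X)
Ur, Vr = U[:, :r], Vt[:r, :].T
lhs = truncated_nuclear_norm(X, r)
rhs = s.sum() - np.trace(Ur.T @ X @ Vr)
```

The two quantities agree to machine precision, which is exactly why the iteration (46) fixes U, V at the previous iterate's singular vectors.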

III. ISSUES TO BE CONSIDERED WHEN USING LRMC TECHNIQUES
In this section, we study the main principles that make the recovery of a low-rank matrix possible and discuss how to exploit a special structure of a low-rank matrix in algorithm design.

A. Intrinsic Properties
There are two key properties characterizing the LRMC problem: 1) sparsity of the observed entries and 2) incoherence of the matrix. Sparsity indicates that an accurate recovery of the undersampled matrix is possible even when the number of observed entries is very small.
Incoherence indicates that nonzero entries of the matrix should be spread out widely for the efficient recovery of a low-rank matrix. In this subsection, we go over these issues in detail.

1) Sparsity of Observed Entries:
Sparsity expresses the idea that when a matrix has the low-rank property, it can be recovered using only a small number of observed entries. A natural question arising from this is: how many entries do we need to observe for accurate recovery? As a toy example, consider a 4 × 4 rank-one matrix M. One can easily see that if we observe all entries of one column and one row, then the rest can be determined by the simple linear relationship between them, since M is a rank-one matrix.
Specifically, if we observe the first row and the first column, and (say) the first and second columns differ by a factor of three, then as long as we know one entry in the second column, the rest of that column can be recovered. Thus, the DOF of M is 4 + 4 − 1 = 7. The following lemma generalizes this observation.

Lemma 2. The DOF of a square n × n matrix with rank r is 2nr − r². Also, the DOF of an n_1 × n_2 matrix with rank r is (n_1 + n_2)r − r².
Proof: Since the rank of the matrix is r, we can freely choose the values of all entries of r columns, resulting in nr degrees of freedom for the first r columns. Once r independent columns, say m_1, ⋯, m_r, are constructed, each of the remaining n − r columns is expressed as a linear combination of the first r columns (e.g., m_{r+1} = α_1m_1 + ⋯ + α_rm_r), so that r linear coefficients (α_1, ⋯, α_r) can be freely chosen for each of these columns. By adding nr and (n − r)r, we obtain the desired result. The generalization to an n_1 × n_2 matrix is straightforward.
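The counting argument above also suggests a concrete recovery procedure: observe the first r rows and columns (2nr − r² entries) and solve an r × r linear system per remaining column. A small sketch (ours, assuming the top-left r × r block is invertible, which holds generically):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r matrix

# Observed entries: first r rows and first r columns (2nr - r^2 entries).
A = M[:r, :r]      # top-left r x r block (observed)
top = M[:r, :]     # first r rows (observed)
left = M[:, :r]    # first r columns (observed)

# Each column j >= r satisfies m_j = sum_i alpha_i m_i; the coefficients
# solve the r x r system A @ alpha = top[:, j], at cost O(r^3) per solve.
alpha = np.linalg.solve(A, top[:, r:])    # coefficients for all unknown columns
M_hat = np.hstack([left, left @ alpha])   # fill in the remaining columns
```

The reconstruction is exact, matching the DOF count: no entry beyond the 2nr − r² observed ones was needed.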
This lemma says that if n is large and r is small enough (e.g., r = O(1)), the essential information in a matrix is just on the order of n (DOF = O(n)), which is clearly much smaller than the total number of entries of the matrix. Interestingly, the DOF is the minimum number of observed entries required for the recovery of a matrix. If this condition is violated, that is, if the number of observed entries is less than the DOF (i.e., m < 2nr − r²), no algorithm whatsoever can recover the matrix. In Fig. III-A1, we illustrate how to recover the matrix when the number of observed entries equals the DOF. In this figure, we assume that the blue colored entries are observed (since we observe the first r rows and columns, we have 2nr − r² observations in total). In a nutshell, the unknown entries of the matrix are found in a two-step process. First, we identify the linear relationship between the first r columns and the rest. For example, the (r + 1)-th column can be expressed as a linear combination of the first r columns:

m_{r+1} = α_1m_1 + ⋯ + α_rm_r.    (48)

Since the first r entries of m_1, ⋯, m_{r+1} are observed (see Fig. III-A1(a)), we have r unknowns (α_1, ⋯, α_r) and r equations, so we can identify the linear coefficients α_1, ⋯, α_r at the cost O(r³) of an r × r matrix inversion. Once these coefficients are identified, we can recover the unknown entries m_{r+1,r+1}, ⋯, m_{r+1,n} of m_{r+1} using the linear relationship in (48) (see Fig. III-A1(b)). By repeating this step for the rest of the columns, we can identify all unknown entries with O(rn²) computational complexity (each unknown entry requires r multiplications and r − 1 additions, and there are (n − r)² unknown entries).

2) Incoherence of a Matrix: Incoherence expresses the idea that if the nonzero entries of the matrix are spread out widely, then the matrix can be recovered with a relatively small number of entries (in [75], it has been shown that the number of entries required to recover the matrix using nuclear norm minimization is on the order of n^{1.2} when the rank is O(1)). For example, consider two rank-one matrices M_1, M_2 ∈ R^{n×n}. The matrix M_1 has only four nonzero entries, located at its top-left corner. Suppose n is large, say n = 1000, and all entries but the four elements in the top-left corner are observed (99.99% of the entries are known). In this case, even though the rank of the matrix is just one, there is no way to recover this matrix since the information-bearing entries are missing. This tells us that although the rank of a matrix is very small, one might not be able to recover it if the nonzero entries of the matrix are concentrated in a certain area.
In contrast to M_1, one can accurately recover the matrix M_2 with only 2n − 1 (= DOF) known entries. In other words, one row and one column are enough to recover M_2.
One can deduce from this example that the spread of observed entries is important for the identification of unknown entries.
In order to quantify this effect, we need to measure the concentration of a matrix. Since a matrix has a two-dimensional structure, we need to check the concentration in both the row and column directions. This can be done by checking the concentration of the left and right singular vectors.
Recall that the SVD of a matrix M is

M = UΣV^T = Σ_{i=1}^{r} σ_i u_i v_i^T,    (49)

where U = [u_1 ⋯ u_r] and V = [v_1 ⋯ v_r] are the matrices constructed by the left and right singular vectors, respectively, and Σ is the diagonal matrix whose diagonal entries are σ_i.
From (49), one can see that the concentration of a matrix is determined by the concentration of its singular vectors. The coherence, a measure of concentration of a matrix, is formally defined as [75]

μ(U) = (n/r) max_{1≤i≤n} ||P_U e_i||²_2,    (50)

where e_i is the i-th standard basis vector and P_U is the projection onto the range space of U. Since the columns of U = [u_1 ⋯ u_r] are orthonormal, we have P_U = UU^T. Note that both μ(U) and μ(V) should be computed to check the concentration in the vertical and horizontal directions, respectively. One can show that

1 ≤ μ(U) ≤ n/r.    (51)

The lower bound holds because

Σ_{i=1}^{n} ||P_U e_i||²_2 = Σ_{i=1}^{n} e_i^T P_U e_i = tr(UU^T) = Σ_{j=1}^{r} Σ_{i=1}^{n} |u_{ij}|² = r,

where the first equality is due to the idempotency of P_U (i.e., P_U^T P_U = P_U) and the last equality is because Σ_{i=1}^{n} |u_{ij}|² = 1 for each column; hence max_i ||P_U e_i||²_2 ≥ r/n. Coherence is maximized when the nonzero entries of a matrix are concentrated in a single row (or column). For example, consider a 3 × 3 matrix whose nonzero entries are concentrated in the first row, so that U = [1 0 0]^T. Then ||P_U e_1||²_2 = 1 and ||P_U e_2||²_2 = ||P_U e_3||²_2 = 0. As shown in Fig. III-A2(a), the standard basis vector e_1 lies in the space spanned by U while the others are orthogonal to this space, so that the maximum coherence is achieved (max_i ||P_U e_i||²_2 = 1 and μ(U) = 3). In contrast, consider a 3 × 3 matrix whose entries are spread out evenly, so that U = (1/√3)[1 1 1]^T. Then ||P_U e_1||²_2 = ||P_U e_2||²_2 = ||P_U e_3||²_2 = 1/3. In this case, as illustrated in Fig. III-A2(b), ||P_U e_i||²_2 is the same for all standard basis vectors e_i, achieving the lower bound in (51) and the minimum coherence (max_i ||P_U e_i||²_2 = 1/3 and μ(U) = 1). As discussed in (12), the number of measurements required to recover the low-rank matrix is proportional to the coherence of the matrix [22], [23], [75].
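The definition in (50) is straightforward to compute; the helper below (ours) reproduces the two 3 × 3 examples above:

```python
import numpy as np

def coherence(U):
    """mu(U) = (n/r) * max_i ||P_U e_i||_2^2 for U with orthonormal columns."""
    n, r = U.shape
    P = U @ U.T                          # orthogonal projection onto range(U)
    return (n / r) * np.max(np.diag(P))  # ||P_U e_i||^2 = P_ii for a projection

U_spiky = np.array([[1.0], [0.0], [0.0]])   # energy concentrated on e_1
U_flat = np.ones((3, 1)) / np.sqrt(3.0)     # energy spread evenly
```

For `U_spiky` the coherence attains its maximum n/r = 3, and for `U_flat` it attains the lower bound 1, matching the discussion above.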

B. Working With Different Types of Low-Rank Matrices
In many practical situations the matrix has a certain structure, and we want to make the most of this structure to maximize the gains in performance and computational complexity. We go over several such cases, including LRMC of the PSD matrix [54], the Euclidean distance matrix [4], and the recommendation matrix [67], and discuss how the special structure can be exploited in the algorithm design.

1) Low-Rank PSD Matrix Completion:
In some applications, the desired matrix M ∈ R^{n×n} not only has a low-rank structure but is also positive semidefinite (i.e., M = M^T and z^TMz ≥ 0 for any vector z). In this case, the problem to recover M can be formulated as

min_X rank(X) subject to P_Ω(X) = P_Ω(M), X ⪰ 0.    (54)

Similar to the rank minimization problem (10), the problem (54) can be relaxed using the nuclear norm, and the relaxed problem can be solved via SDP solvers.
The problem (54) can be simplified if the rank of the desired matrix is known in advance. Let rank(M) = k. Then, since M is positive semidefinite, there exists a matrix Z ∈ R^{n×k} such that M = ZZ^T. Using this, the problem (54) can be concisely expressed as

min_{Z ∈ R^{n×k}} ||P_Ω(ZZ^T) − P_Ω(M)||²_F.    (55)

Since (55) is an unconstrained optimization problem with a differentiable cost function, many gradient-based techniques such as steepest descent, conjugate gradient, and Newton methods can be applied. It has been shown that under suitable conditions on the coherence of M and the number of observed entries |Ω|, the global convergence of gradient-based algorithms is guaranteed [59].
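As an illustration of (55), the following sketch (ours) runs plain gradient descent on Z; the step size, iteration count, and symmetric sampling pattern are our choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 2
Z_true = rng.standard_normal((n, k))
M = Z_true @ Z_true.T                        # rank-k PSD target

mask = rng.random((n, n)) < 0.6
mask = np.triu(mask) | np.triu(mask).T       # symmetric sampling pattern

def loss(Z):
    """f(Z) = 0.5 * ||P_Omega(Z Z^T - M)||_F^2."""
    return 0.5 * np.linalg.norm(mask * (Z @ Z.T - M)) ** 2

Z = 0.1 * rng.standard_normal((n, k))        # small random initialization
l0 = loss(Z)
step = 1e-3
for _ in range(2000):
    R = mask * (Z @ Z.T - M)                 # residual on observed entries
    Z -= step * (R + R.T) @ Z                # gradient of f with respect to Z
```

Any of the faster solvers named above (conjugate gradient, Newton) would replace the inner update; the point here is only that (55) needs no explicit rank or PSD constraint.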

2) Euclidean Distance Matrix Completion:
Low-rank Euclidean distance matrix completion arises in localization problems (e.g., sensor node localization in IoT networks). Let z_1, ⋯, z_n be the sensor locations in the k-dimensional Euclidean space (k = 2 or k = 3). Then, the Euclidean distance matrix M is defined by m_{ij} = ||z_i − z_j||²_2. It is obvious that M is symmetric with diagonal elements being zero (i.e., m_{ii} = 0). As mentioned, the rank of the Euclidean distance matrix M is at most k + 2 (i.e., rank(M) ≤ k + 2). Also, one can show that a matrix D ∈ R^{n×n} with zero diagonal is a Euclidean distance matrix if and only if D = D^T and [52]

x^TDx ≤ 0 for all x such that h^Tx = 0,    (56)

where h = [1 1 ⋯ 1]^T ∈ R^n. Using these, the problem to recover the Euclidean distance matrix M can be formulated as

min_X ||P_Ω(X) − P_Ω(M)||²_F subject to X satisfying (56) and rank(X) ≤ k + 2.    (57)

Let Y = ZZ^T where Z = [z_1 ⋯ z_n]^T ∈ R^{n×k} is the matrix of sensor locations. Then, one can easily check that

m_{ij} = ||z_i − z_j||²_2 = y_{ii} + y_{jj} − 2y_{ij}.    (58)

Thus, by letting g(Y) = diag(Y)h^T + hdiag(Y)^T − 2Y, the problem in (57) can be equivalently formulated as

min_Y ||P_Ω(g(Y)) − P_Ω(M)||²_F subject to Y ⪰ 0, rank(Y) = k.    (59)

Since the feasible set associated with the problem in (59) is a smooth Riemannian manifold [53], [54], an extension of the Euclidean space on which a notion of differentiation exists [55], [56], various gradient-based optimization techniques such as steepest descent, Newton method, and conjugate gradient algorithms can be applied to solve (59) [3], [4], [55].
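The identity (58) and the rank bound rank(M) ≤ k + 2 are easy to verify numerically; a small check (ours) with random sensor locations:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 2
Z = rng.standard_normal((n, k))     # sensor locations z_1, ..., z_n

# Squared-distance matrix via (58): m_ij = y_ii + y_jj - 2 y_ij with Y = Z Z^T.
Y = Z @ Z.T
d = np.diag(Y)[:, None]             # column vector diag(Y)
M = d + d.T - 2 * Y                 # = g(Y) with h the all-ones vector

# Direct pairwise computation for comparison.
M_direct = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
```

Both constructions agree, M is symmetric with a zero diagonal, and its rank is at most k + 2 = 4 even though n = 10.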

3) Convolutional Neural Network Based Matrix Completion:
In recent years, approaches using CNNs to solve the LRMC problem have been proposed. These approaches are particularly useful when the desired low-rank matrix is expressed as a graph model (e.g., the recommendation matrix with a user graph expressing the similarity between users' rating results) [67], [64], [65], [66], [70], [68], [69]. The main idea of CNN-based LRMC algorithms is to express the low-rank matrix as a graph structure and then apply a CNN to the constructed graph to recover the desired matrix.

Graphical Model of a Low-Rank Matrix: In this approach, the similarity between the columns of M (e.g., users sharing similar ratings) is encoded by a column graph G_c whose adjacency matrix is W_c ∈ R^{n_2×n_2}. The adjacency matrix W_r ∈ R^{n_1×n_1} of the row graph G_r is defined in a similar way.

CNN-based LRMC:
Let U ∈ R^{n_1×r} and V ∈ R^{n_2×r} be matrices such that M = UV^T. The primary task of the CNN-based approach is to find functions f_r and f_c mapping the vertex sets of the row and column graphs G_r and G_c of M to U and V, respectively. Here, each vertex of G_r (resp. G_c) corresponds to a row of U (resp. V). Since it is difficult to express f_r and f_c explicitly, we can learn these nonlinear mappings using CNN-based models. In the CNN-based LRMC approach, U and V are initialized at random and updated in each iteration. Specifically, U and V are updated to minimize the following loss function [67]:

l(U, V) = ||P_Ω(UV^T) − P_Ω(M)||²_F + τ(tr(U^TR_rU) + tr(V^TR_cV)),    (61)

where τ is a regularization parameter and R_r and R_c are the graph Laplacians of G_r and G_c, respectively. In other words, we find U and V such that the Euclidean distance between connected vertices is minimized (see (61)). The update procedures of U and V are [67]:
1) Initialize U and V at random and assign each row of U and V to each vertex of the row graph G_r and the column graph G_c, respectively.
2) Extract the feature matrices ∆U and ∆V by performing a graph-based convolution operation on G r and G c , respectively.
3) Update U and V using the feature matrices ∆U and ∆V, respectively.
4) Compute the loss function in (61) using the updated U and V, and perform back propagation to update the filter parameters.
5) Repeat the above procedures until the value of the loss function is smaller than a pre-chosen threshold.
One important issue in the CNN-based LRMC approach is to define a graph-based convolution operation to extract the feature matrices ∆U and ∆V (see the second step). Note that the input data G_r and G_c do not lie on regular lattices like images, and thus a classical CNN cannot be directly applied to G_r and G_c. One possible option is to define the convolution operation in the Fourier domain of the graph. In recent years, CNN models based on the Fourier transform of graph-structured data have been proposed [68], [69], [70], [71], [72]. In [68], an approach using the eigendecomposition of the Laplacian has been proposed. To further reduce the model complexity, CNN models using polynomial filters have been proposed [70], [69], [71]. In essence, the Fourier transform of a graph can be computed using the (normalized) graph Laplacian. Let R_r be the graph Laplacian of G_r (i.e., R_r = I − D_r^{−1/2}W_rD_r^{−1/2}, where D_r is the diagonal degree matrix of G_r). Then, the graph Fourier transform F_r(u) of a vertex assigned with the vector u is defined as

F_r(u) = Q_r^Tu,    (62)

where R_r = Q_rΛ_rQ_r^T is an eigendecomposition of the graph Laplacian R_r [63]. Also, the inverse graph Fourier transform F_r^{−1}(u′) of u′ is defined as

F_r^{−1}(u′) = Q_ru′.    (63)

Let z be the filter used in the convolution; then the output ∆u of the graph-based convolution on a vertex assigned with the vector u is defined as [63], [70]

∆u = F_r^{−1}(F_r(z) ⊙ F_r(u)),    (64)

where ⊙ denotes the element-wise product. From (62) and (63), (64) can be expressed as

∆u = Q_rGQ_r^Tu,    (65)

where G = diag(F_r(z)) is the matrix of filter parameters defined in the graph Fourier domain.
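The transforms (62)-(65) can be sketched in a few lines; below is our illustration on a random symmetric adjacency matrix (the graph and filter are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                       # symmetric adjacency, no self-loops

Dm = np.diag(1 / np.sqrt(W.sum(axis=1)))
R = np.eye(n) - Dm @ W @ Dm                  # normalized graph Laplacian R_r

lam, Q = np.linalg.eigh(R)                   # R = Q Lambda Q^T

def gft(u):                                  # graph Fourier transform (62)
    return Q.T @ u

def igft(up):                                # inverse transform (63)
    return Q @ up

u = rng.standard_normal(n)                   # signal on the vertices
z = rng.standard_normal(n)                   # filter on the vertices
delta_u = igft(gft(z) * gft(u))              # spectral convolution (64)
```

Because Q is orthogonal, the inverse transform undoes the forward transform exactly, and the convolution matches the matrix form (65).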
We next update U and V using the feature matrices ∆U and ∆V. In [67], a cascade of a multi-graph CNN followed by a long short-term memory (LSTM) recurrent neural network has been proposed for this update. The computational cost of this approach is O(r|Ω| + r²n_1 + r²n_2), which is much lower than that of the SVD-based LRMC techniques (i.e., O(rn_1n_2)) as long as r ≪ min(n_1, n_2).
Finally, we compute the loss function l(U_i, V_i) in (61). (One can easily check that F_r^{−1}(F_r(u)) = u and F_r(F_r^{−1}(u′)) = u′.)

4) Atomic Norm Minimization:
In ADMiRA, a low-rank matrix is represented using a small number of rank-one matrices. Atomic norm minimization (ANM) generalizes this idea to arbitrary data in which the signal is represented using a small number of basis elements called atoms. Examples of ANM applications include sound navigation ranging systems [85] and line spectral estimation [86]. To be specific, let X = Σ_{i=1}^{k} α_iH_i be a signal with k distinct frequency components. Then the atom H_i is defined as H_i = a(f_i)b_i^H, where

a(f_i) = [1 e^{j2πf_i} ⋯ e^{j2πf_i(n_1−1)}]^T ∈ C^{n_1}

is the steering vector and b_i ∈ C^{n_2} is the vector of normalized coefficients (i.e., ||b_i||_2 = 1). We denote the set of such atoms H_i as H. Using H, the atomic norm of X is defined as

||X||_H = inf { Σ_i α_i : X = Σ_i α_iH_i, α_i ≥ 0, H_i ∈ H }.    (67)

Note that the atomic norm ||X||_H is a generalization of the ℓ_1-norm and also the nuclear norm to the space of sinusoidal signals [86], [39].
Let X_o be the observation of X; then the problem to reconstruct X can be modeled as the ANM problem

min_X (1/2)||X_o − X||²_F + τ||X||_H,    (68)

where τ > 0 is a regularization parameter. By using [87, Theorem 1], we have

||X||_H = inf_{u,W} { (1/(2n_1))tr(T(u)) + (1/2)tr(W) : [[T(u), X], [X^H, W]] ⪰ 0 },    (69)

where T(u) is the Hermitian Toeplitz matrix whose first column is u, and the equivalent expression of the problem (68) is

min_{X,u,W} (1/2)||X_o − X||²_F + (τ/2)((1/n_1)tr(T(u)) + tr(W)) subject to [[T(u), X], [X^H, W]] ⪰ 0.    (70)

Note that the problem (70) can be solved via an SDP solver (e.g., SDPT3 [24]) or greedy algorithms [88], [89].
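To illustrate the atomic decomposition behind (67) (with the steering-vector atoms defined above; frequencies, coefficients, and sizes are arbitrary stand-ins), a small sketch of ours:

```python
import numpy as np

def steering(f, n1):
    """Steering vector a(f) = [1, e^{j2*pi*f}, ..., e^{j2*pi*f*(n1-1)}]^T."""
    return np.exp(2j * np.pi * f * np.arange(n1))

rng = np.random.default_rng(6)
n1, n2, k = 16, 4, 3
atoms = []
X = np.zeros((n1, n2), dtype=complex)
for f in rng.random(k):                       # k distinct frequencies in [0, 1)
    b = rng.standard_normal(n2) + 1j * rng.standard_normal(n2)
    b /= np.linalg.norm(b)                    # normalized coefficient vector
    H = np.outer(steering(f, n1), b.conj())   # rank-one atom H_i = a(f_i) b_i^H
    atoms.append(H)
    X += (0.5 + rng.random()) * H             # X = sum_i alpha_i H_i, alpha_i > 0
```

Each atom is a rank-one matrix, so a signal built from k atoms has rank at most k, which is exactly the low-rank structure that ANM exploits.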

IV. NUMERICAL EVALUATION
In this section, we study the performance of the LRMC algorithms. In our experiments, we focus on the algorithms listed in Table V. The original matrix is generated as the product of two random matrices A ∈ R^{n_1×r} and B ∈ R^{n_2×r}, i.e., M = AB^T. The entries of these two matrices, a_{ij} and b_{pq}, are independent and identically distributed random variables sampled from the normal distribution N(0, 1). The sampled elements are also chosen at random. The sampling ratio p is defined as

p = |Ω|/(n_1n_2),

where |Ω| is the cardinality of Ω (i.e., the number of observed entries). In the performance evaluation of the LRMC algorithms, we use the mean square error (MSE) and the exact recovery ratio R, which are defined, respectively, as

MSE = (1/(n_1n_2))||M − M̂||²_F and R = (number of successful trials)/(total number of trials),

where M̂ is the reconstructed low-rank matrix. We say a trial is successful if the MSE is less than the threshold ε; in our experiments, we set ε = 10^{−6}. Here, R can be used to represent the probability of successful recovery.
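The experimental setup above is easy to reproduce; a minimal sketch (ours) of the data generation and the success criterion (the sampling ratio 0.4 and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, r = 100, 100, 5
A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
M = A @ B.T                            # test matrix M = A B^T

p = 0.4                                # target sampling ratio |Omega| / (n1 n2)
mask = rng.random((n1, n2)) < p        # observed entries chosen at random

eps = 1e-6                             # success threshold on the MSE

def mse(M_hat):
    """MSE = ||M - M_hat||_F^2 / (n1 n2)."""
    return np.linalg.norm(M - M_hat) ** 2 / (n1 * n2)
```

Any LRMC algorithm under test would take `mask * M` as input and be scored with `mse`; a trial counts as successful when the MSE falls below `eps`.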
We first examine the exact recovery ratio of the LRMC algorithms in terms of the sampling ratio and the rank of M. In our experiments, we set n 1 = n 2 = 100 and compute the phase transition [90] of the LRMC algorithms. Note that the phase transition is a contour plot of the success probability P (we set P = 0.5) where the sampling ratio (x-axis) and the rank (y-axis) form a regular grid of the x-y plane. The contour plot separates the plane into two areas: the area above the curve is one satisfying P < 0.5 and the area below the curve is a region achieving P > 0.5 [90] (see Fig. IV). The higher the curve, therefore, the better the algorithm would be.
In general, the LRMC algorithms perform poorly when the matrix has a small number of observed entries. We next examine the efficiency of the LRMC algorithms for different problem sizes (see Table VI). For the iterative LRMC algorithms, we set the maximum number of iterations to 300. We see that LRMC algorithms such as SVT, IRLS-M, ASD, ADMiRA, and LRGeomCG run fast. For example, it takes less than a minute for these algorithms to reconstruct a 1000 × 1000 matrix, while the other algorithms take much longer. In the noisy scenario, we also observe that FNM-based algorithms perform well (see Fig. IV). In this experiment, we compute the MSE of the LRMC algorithms against the rank of the original low-rank matrix for different SNR settings (SNR = 20 and 50 dB). We observe that in the low and mid SNR regimes (e.g., SNR = 20 dB), TNNR-ADMM performs comparably to the NNM-based algorithms since the FNM-based cost function is robust to noise. In the high SNR regime (e.g., SNR = 50 dB), the convex algorithm (NNM with SDPT3) exhibits the best performance in terms of MSE. The performance of TNNR-ADMM is notably better than that of the rest of the LRMC algorithms. For example, given rank(M) = 20, the MSE of TNNR-ADMM is around 0.04, while the MSE of the rest is higher than 1.
Finally, we apply LRMC techniques to recover images corrupted by impulse noise. In this experiment, we use 256×256 standard grayscale images (e.g., boat, cameraman, lena, and pepper images) and the salt-and-pepper noise model with different noise density ρ = 0.3, 0.5, and 0.7.
For the FNM-based LRMC techniques, the rank is set to the number of singular values σ_i greater than a relative threshold ε > 0 (i.e., σ_i > ε max_i σ_i). From the simulation results, we observe that the peak SNR (pSNR), defined as the ratio of the maximum pixel value of the image to the noise variance, of all LRMC techniques is at least 52 dB when ρ = 0.3 (see Table VII). In particular, NNM using SDPT3, SVT, and IRLS-M outperform the rest, achieving pSNR ≥ 57 dB even with the high noise level ρ = 0.7.
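The corruption model above can be cast as a missing-entry problem: salt-and-pepper pixels are treated as unobserved, and the clean pixels define the observation set Ω for LRMC. A small sketch (ours, with random stand-in data in place of a real image):

```python
import numpy as np

rng = np.random.default_rng(8)
img = rng.random((64, 64))                 # stand-in for a grayscale image in [0, 1)
rho = 0.3                                  # noise density

u = rng.random(img.shape)                  # decide which pixels get corrupted
noisy = img.copy()
noisy[u < rho / 2] = 0.0                   # "pepper" pixels
noisy[(u >= rho / 2) & (u < rho)] = 1.0    # "salt" pixels

# Corrupted pixels are discarded; the clean ones form the observation set Omega,
# so the denoising task becomes exactly the LRMC problem studied above.
omega = (u >= rho)
```

An LRMC algorithm would then be run on `noisy` restricted to `omega` to fill in the corrupted pixels.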

V. CONCLUDING REMARKS
In this paper, we presented a contemporary survey of LRMC techniques. Firstly, we classified state-of-the-art LRMC techniques into two main categories based on the availability of the rank information. Specifically, when the rank of the desired matrix is unknown, we formulated the LRMC problem as the NNM problem and discussed several NNM-based LRMC techniques such as SDP-based NNM, SVT, and truncated NNM. When the rank of the original matrix is known a priori, the LRMC problem can be modeled as the FNM problem. We discussed various FNM-based LRMC techniques (e.g., greedy algorithms, alternating projection methods, and optimization over the Riemannian manifold). Secondly, we discussed fundamental issues and principles that one needs to be aware of when solving the LRMC problem. Specifically, we discussed two key properties characterizing the LRMC problem: the sparsity of the observed entries and the incoherence of the original matrix. We also explained how to exploit the special structure of a low-rank matrix in the algorithm design, and compared the recovery performance and complexity of state-of-the-art LRMC techniques (see Table V). In general, there is a trade-off between running time and recovery performance. In fact, FNM-based LRMC algorithms such as LMaFit and ADMiRA converge much faster than the convex optimization-based algorithms (see Table VI), but the NNM-based LRMC algorithms are more reliable than the FNM-based LRMC algorithms (see Fig. IV). So, one should consider the given setup and operating conditions to obtain the best trade-off between complexity and performance.
We conclude the paper by providing some future research directions.
• When the dimension of a low-rank matrix increases and the computational complexity thus grows significantly, we want an algorithm with a good recovery guarantee whose complexity scales linearly with the problem size. Without doubt, in real-time applications such as IoT localization and massive MIMO, low complexity and short running time are of great importance. The development of implementation-friendly algorithms and architectures would accelerate the dissemination of LRMC techniques.
• Most of the LRMC techniques assume that the original low-rank matrix is a random matrix whose entries are randomly generated. In many practical situations, however, the entries of the matrix are not purely random but chosen from a finite set of integers (e.g., the integer-valued ratings in a recommendation matrix). The development of LRMC techniques exploiting this discrete structure would be an interesting research direction.

APPENDIX A
CONVERSION OF THE NNM PROBLEM INTO THE STANDARD SDP FORM

The standard form of an SDP is

min_Y ⟨C, Y⟩ subject to ⟨A_k, Y⟩ = b_k, k = 1, ⋯, l, Y ⪰ 0,    (71)

where C is a given matrix, and {A_k}_{k=1}^{l} and {b_k}_{k=1}^{l} are given sequences of matrices and constants, respectively. To convert the NNM problem in (11) into the standard SDP form in (71), we need a few steps. First, we convert the NNM problem in (11) into the epigraph form:

min_{X,t} t subject to ||X||_* ≤ t, P_Ω(X) = P_Ω(M).    (72)
Next, we transform the constraints in (72) into the standard form in (71). (Note that min_X ||X||_* = min_{X,t} {t : ||X||_* ≤ t}, which justifies the epigraph form (72).) We first consider the inequality constraint ||X||_* ≤ t. Note that ||X||_* ≤ t if and only if there exist symmetric matrices W_1 ∈ R^{n_1×n_1} and W_2 ∈ R^{n_2×n_2} such that [21, Lemma 2]

tr(W_1) + tr(W_2) ≤ 2t and Y = [[W_1, X], [X^T, W_2]] ⪰ 0.    (73)

Using Y, the equality constraint P_Ω(X) = P_Ω(M) can be written as

P̄_Ω(Y) = P̄_Ω(M̄),    (74)

where M̄ = [[0_{n_1×n_1}, M], [M^T, 0_{n_2×n_2}]] and

P̄_Ω(Y) = [[0_{n_1×n_1}, P_Ω(X)], [(P_Ω(X))^T, 0_{n_2×n_2}]]

is the extended sampling operator. We now consider the equality constraint P̄_Ω(Y) = P̄_Ω(M̄) in (74). One can easily see that this condition is equivalent to

⟨Y, e_ie_{j+n_1}^T⟩ = ⟨M̄, e_ie_{j+n_1}^T⟩, (i, j) ∈ Ω,    (75)

where {e_1, ⋯, e_{n_1+n_2}} is the standard ordered basis of R^{n_1+n_2}. Let A_k = e_ie_{j+n_1}^T and b_k = ⟨M̄, e_ie_{j+n_1}^T⟩ for each (i, j) ∈ Ω. Then,

⟨Y, A_k⟩ = b_k, k = 1, ⋯, |Ω|.    (76)

Combining (73) and (76), one can express the problem in a concise form as (13), which is the desired result.
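The equivalence in (73) can be checked numerically: for X = UΣV^T, the choices W_1 = UΣU^T and W_2 = VΣV^T (our construction for the check) attain tr(W_1) + tr(W_2) = 2||X||_* with a PSD block matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
n1, n2 = 5, 7
X = rng.standard_normal((n1, n2))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Block matrix Y = [[W1, X], [X^T, W2]] is a nonnegative combination of the
# rank-one PSD matrices [u_i; v_i][u_i; v_i]^T, hence PSD, and its trace
# equals 2 * ||X||_* (the tightest choice of t in (73)).
W1 = U @ np.diag(s) @ U.T
W2 = Vt.T @ np.diag(s) @ Vt
Y = np.block([[W1, X], [X.T, W2]])
```

This shows the inequality in (73) is achieved with equality at t = ||X||_*, so the SDP reformulation loses nothing.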

APPENDIX B PERFORMANCE GUARANTEE OF NNM
Sketch of proof: Exact recovery of the desired low-rank matrix M can be guaranteed under the uniqueness condition of the NNM problem [22], [23], [75]. To be specific, let M = UΣV^T be the SVD of M, where U ∈ R^{n_1×r}, Σ ∈ R^{r×r}, and V ∈ R^{n_2×r}. Also, let R^{n_1×n_2} = T ⊕ T^⊥ be the orthogonal decomposition in which T^⊥ is defined as the subspace of matrices whose row and column spaces are orthogonal to the row and column spaces of M, respectively, and T is the orthogonal complement of T^⊥. It has been shown that M is the unique solution of the NNM problem if the following conditions hold [22, Lemma 3.1]: 1) there exists a matrix Y = UV^T + W such that P_Ω(Y) = Y, W ∈ T^⊥, and ||W|| < 1; 2) the restriction of the sampling operator P_Ω to T is an injective (one-to-one) mapping.
The construction of Y obeying 1) and 2) is in turn conditioned on the observation model and on incoherence assumptions of the form

max(μ(U), μ(V)) ≤ μ_0 and max_{i,j} |e_{ij}| ≤ μ_1 √(r/(n_1n_2)),

where μ_0 and μ_1 are some constants, e_{ij} is the (i, j)-th entry of E = UV^T, and μ(U) and μ(V) are the coherences of the column and row spaces of M, respectively.
Under these assumptions, if the number m of observed entries satisfies

m ≥ C max(μ_1², μ_0^{1/2}μ_1, μ_0n^{1/4}) nrγ log n,    (80)

where γ > 2 is some constant and n_1 = n_2 = n, then M is the unique solution of the NNM problem with probability at least 1 − βn^{−γ}. Further, if r ≤ μ_0^{−1}n^{1/5}, (80) can be improved to m ≥ Cμ_0γn^{1.2}r log n with the same success probability.
One direct interpretation of this theorem is that the desired low-rank matrix can be reconstructed exactly using NNM with overwhelming probability even when m is much less than n_1n_2.

[Table: MSE results for different problem sizes, where rank(M) = 5 and p = 2 × DOF; n_1 = n_2 = 50, 500, 1000.]