An Efficient Tensor Completion Method via New Latent Nuclear Norm

In tensor completion, the latent nuclear norm is commonly used to induce low-rank structure, but it largely fails to capture global information because it relies on an unbalanced unfolding scheme. To overcome this drawback, we define a new latent nuclear norm equipped with a more balanced unfolding scheme as a low-rank regularizer. Combined with the Frank-Wolfe (FW) algorithm, the new latent nuclear norm yields an efficient completion method that exploits the sparsity structure of the observed tensor. Specifically, both the FW linear subproblem and the line search only need to access the observed entries, so only sparse tensors and a set of small basis matrices have to be maintained during iterations. Most operations act on sparse tensors, and the closed-form solution of the FW linear subproblem is obtained from a rank-one SVD. We theoretically analyze the space and time complexity of the proposed method and show that it is much more efficient than other norm-based completion methods for higher-order tensors. Extensive visual-data inpainting experiments demonstrate that the proposed method achieves state-of-the-art performance at smaller time and space costs, which is valuable for memory-limited devices in practical applications.


I. INTRODUCTION
In the past decades, tensor completion has attracted increasing attention due to its wide applications in a variety of fields, such as computer vision [1]-[8], multi-relational link prediction [9]-[11], and recommendation systems [12]-[15]. The goal of tensor completion is to recover an incomplete tensor from partially observed entries, and most existing methods achieve this under a low-rank structure assumption. To the best of our knowledge, these methods can be mainly categorized into tensor decomposition based methods and rank-minimization based methods.
Tensor decomposition based methods decompose the incomplete tensor into a sequence of low-rank factors and then predict the missing entries from these latent factors. For example, CANDECOMP/PARAFAC (CP) decomposition based methods [16]-[21] recover the target tensor as a sum of rank-one component tensors, and Tucker decomposition based methods [22]-[25] represent it as a core tensor multiplied by a factor matrix along each mode. In recent years, the Tensor-Train and Tensor-Ring decompositions have been commonly used to express a higher-order incomplete tensor as a multilinear product of a sequence of low-order latent cores [26]-[29]. Unfortunately, tensor decomposition based methods are non-convex and may get stuck in local solutions. In addition, most of them require a predefined rank, and their performance is rather sensitive to the rank selection. For the Tucker, Tensor-Train, and Tensor-Ring decompositions, the rank is a vector, so finding the optimal rank is computationally expensive due to the huge number of possible choices.
¹ Faculty of Automation, Guangdong University of Technology, Guangzhou, 510006, China.
Rank-minimization based methods are another type of approach that exploits the low-rank structure of an incomplete tensor. Since minimizing the tensor rank rank(·) is NP-hard, a number of norms have been defined as convex surrogates of the tensor rank, the most commonly used being the overlapped nuclear norm [30]-[32] and the latent nuclear norm [33], [34]. In [30], the overlapped nuclear norm via the Tucker rank was first proposed under the assumption that all modes are low-rank, but it performs poorly when the target tensor is low-rank only in certain modes. In contrast, the latent nuclear norm [33] generalizes better, especially for tensors that are low-rank in only a few modes. However, both regularizers rely on the mode-k unfolding scheme (one mode versus the rest), so the unfolding matrices are usually highly unbalanced. For a significantly unbalanced matrix of size m × n, the matrix rank is upper-bounded by the small value min{m, n} and therefore largely fails to capture the global information of the target tensor. Considering the powerful capacity of the Tensor Train decomposition for representing higher-order tensors, overlapped and latent nuclear norms via Tensor Train were proposed in [31] and [34], respectively. These two norms are still based on an unbalanced unfolding scheme, i.e., the k-mode unfolding scheme (the first k modes versus the rest). Although the Tensor Ring nuclear norm [32] unfolds the target tensor with a more balanced scheme, it has a set of weighting parameters that must be carefully tuned, which is expensive. Finally, the above-mentioned norm regularizers are commonly minimized by the alternating direction method of multipliers (ADMM) or block coordinate descent (BCD) algorithms, which usually require a computationally expensive partial SVD of a large dense matrix.
To address the above-mentioned drawbacks, this paper defines a new latent nuclear norm based on a more balanced unfolding scheme, which is shown to be more powerful than other norm regularizers in exploiting the low-rank global information of the target tensor. Note that, although the new norm uses the same balanced unfolding scheme as the overlapped TR nuclear norm, it requires no additional weighting parameters for the unfolding matrices. Moreover, instead of the expensive ADMM or BCD algorithms, the Frank-Wolfe (FW) algorithm is developed to minimize the proposed latent nuclear norm for tensor completion. Under the FW framework, we show that the linear subproblem has a closed-form solution obtained from a rank-one SVD, and that most steps of the algorithm only need to access the observed entries. By exploiting the sparsity of the observed tensor, we only maintain a sparse tensor and small basis matrices instead of full-size tensors, and thus require much less space in each iteration. Because the proposed method operates on sparse tensors and only performs rank-one SVDs during iterations, its time complexity is much smaller than that of methods based on other tensor norms, as discussed later. Furthermore, extensive visual-data inpainting experiments confirm that the proposed method achieves state-of-the-art performance at smaller time and space costs, which is valuable for memory-limited devices in practical applications. To sum up, the contributions of this paper are as follows:
• By using a more balanced unfolding scheme, a new latent nuclear norm is proposed, which is shown to be more powerful than other norm regularizers in exploiting the global information of the target tensor.
• An efficient tensor completion method, i.e., the new latent nuclear norm combined with the Frank-Wolfe algorithm, is developed, which has much smaller space and time complexity than methods based on other tensor norms.
• The proposed method requires neither a predefined rank nor additional weighting parameters for the unfolding matrices, and is empirically shown to achieve outstanding performance at smaller time and space costs. This is valuable for memory-limited devices in practical applications.
The rest of this paper is organized as follows. Related works are described in Section II. Notations and preliminaries are introduced in Section III. In Section IV, we define a new latent nuclear norm, develop an efficient Frank-Wolfe based algorithm, and theoretically analyze its time and space complexity. In Section V, the performance of the proposed method is investigated on synthetic data and real-world visual data. Finally, Section VI concludes the paper.

II. RELATED WORKS
Our work is related to latent-norm based completion methods [33], [34] and Tensor-Ring based completion methods [28], [32], [35]. In [33], Tomioka et al. proposed the latent nuclear norm based on the mode-k unfolding scheme (one mode versus the rest) and showed that it generalizes better than the overlapped nuclear norm [30] when only a few modes are low-rank. Since the mode-k unfolding scheme is significantly unbalanced, the unfolding matrices are usually unbalanced and their rank is often too small to describe the global information of the target tensor. Recently, Wang et al. [34] defined a new latent nuclear norm via Tensor Train; however, it may still rely on significantly unbalanced matrices due to the unbalanced k-mode unfolding scheme. Wang et al. [28] first applied Tensor Ring decomposition by alternating least squares (TR-ALS) to incomplete data. Yuan et al. [35] proposed a method named Tensor Ring low-rank factors (TRLRF), which combines nuclear norm regularization with TR decomposition. However, these two TR-decomposition based methods have a large per-iteration computational complexity and thus may run out of memory on large-scale data. Moreover, the TR-rank is a vector, so manually finding the optimal rank is very challenging due to the huge number of possible choices. In [32], Yu et al. defined an overlapped Tensor Ring nuclear norm with a more balanced unfolding scheme and showed that it substantially improves the recovery performance in visual-data inpainting. Unfortunately, its computational complexity is still large, and its set of weighting parameters requires computationally expensive tuning.
In contrast, the proposed latent nuclear norm is defined via a more balanced unfolding scheme and requires neither a predefined rank nor additional weighting parameters. Moreover, combined with the FW algorithm, the new latent nuclear norm yields an efficient method that is shown to exploit global information more effectively at smaller time and space costs.

B. Preliminaries
In this section, we briefly describe the Tensor Ring decomposition, Tensor Circular Unfolding, and their relation.
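The Tensor Ring decomposition [36], [37] was recently proposed to represent a higher-order tensor by a sequence of 3rd-order latent core tensors, i.e., TR-cores. Specifically, given an Nth-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the TR-cores are denoted by $\mathcal{G}_k \in \mathbb{R}^{R_{k-1} \times I_k \times R_k}$ and the TR-rank by the vector $[R_1, R_2, \ldots, R_N]$ with $R_0 = R_N$. The Tensor Ring decomposition of $\mathcal{X}$ can be formally expressed element-wise as
$$\mathcal{X}(i_1, i_2, \ldots, i_N) = \mathrm{Tr}\left(\mathbf{G}_1(i_1)\,\mathbf{G}_2(i_2) \cdots \mathbf{G}_N(i_N)\right),$$
where $\mathbf{G}_k(i_k) = \mathcal{G}_k(:, i_k, :) \in \mathbb{R}^{R_{k-1} \times R_k}$ is the $i_k$-th lateral slice of $\mathcal{G}_k$ and $\mathrm{Tr}(\cdot)$ is the matrix trace operation. More details of the Tensor Ring decomposition can be found in [36], [37].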
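For illustration, the trace formula can be checked with a minimal NumPy sketch. It assumes each core is stored as an array of shape $(R_{k-1}, I_k, R_k)$ and that the ring closes with $R_0 = R_N$; the helper names `tr_entry` and `tr_to_full` are hypothetical, and contracting to the full tensor is only practical for small sizes.

```python
import numpy as np


def tr_entry(cores, idx):
    """Evaluate X(i_1, ..., i_N) = Tr(G_1(i_1) G_2(i_2) ... G_N(i_N))."""
    prod = np.eye(cores[0].shape[0])
    for G, i in zip(cores, idx):
        prod = prod @ G[:, i, :]  # lateral slice of shape (R_{k-1}, R_k)
    return np.trace(prod)


def tr_to_full(cores):
    """Contract all TR cores into the full tensor (small sizes only)."""
    out = cores[0]  # running shape: (R_0, I_1 * ... * I_k, R_k)
    for G in cores[1:]:
        r0, m, _ = out.shape
        _, n, r2 = G.shape
        out = np.einsum("amb,bnc->amnc", out, G).reshape(r0, m * n, r2)
    # Close the ring: trace over the first and last bond indices (R_0 = R_N).
    full = np.einsum("ama->m", out)
    return full.reshape([G.shape[1] for G in cores])


# A 4th-order tensor with TR-rank [R_1, R_2, R_3, R_4]; ranks[-1] supplies R_0 = R_4.
rng = np.random.default_rng(0)
dims, ranks = [3, 4, 2, 5], [2, 3, 2, 4]
cores = [rng.standard_normal((ranks[k - 1], dims[k], ranks[k])) for k in range(4)]
X = tr_to_full(cores)
print(X.shape)  # (3, 4, 2, 5)
```

Entry-wise evaluation with `tr_entry` touches only N small slices, which is why TR-based methods can work with individual observed entries without forming the full tensor.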
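To efficiently exploit the global information of high-order tensors, Yu et al. [32], [38] defined a balanced unfolding scheme named Tensor Circular Unfolding (TCU) in Definition 1 and described its relation with the TR decomposition in Theorem 1.
Definition 1. (Tensor Circular Unfolding [32], [38]) Suppose $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is an Nth-order tensor. The tensor circular unfolding of $\mathcal{X}$ is a matrix $\mathbf{X}_{<k,d>} \in \mathbb{R}^{I_a I_{a+1} \cdots I_k \times I_{k+1} \cdots I_{a-1}}$ whose elements are
$$\mathbf{X}_{<k,d>}\left(\overline{i_a i_{a+1} \cdots i_k},\ \overline{i_{k+1} \cdots i_{a-1}}\right) = \mathcal{X}(i_1, i_2, \ldots, i_N),$$
where $d < N$ is a positive integer and
$$a = \begin{cases} k - d + 1, & d \le k, \\ k - d + 1 + N, & \text{otherwise}. \end{cases}$$
The $d$ consecutive modes $\{a, a+1, \ldots, k\}$ enumerate the rows of $\mathbf{X}_{<k,d>}$, and the remaining modes enumerate its columns (mode indices are taken cyclically, modulo N). To make the scheme easier to understand, Fig. 1 illustrates the circular unfoldings $\{\mathbf{X}_{<k,2>}\}_{k=1}^{5}$ obtained by unfolding $\mathcal{X}$ along modes $\{k-1, k\}$, specified by a red arc.
Fig. 1: Illustration of the Tensor Ring representation of a 5th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times I_4 \times I_5}$ and its Tensor Circular Unfoldings. Each node of $\{\mathcal{G}_k \in \mathbb{R}^{R_{k-1} \times I_k \times R_k}\}_{k=1}^{5}$ denotes a tensor whose order is determined by its number of edges. An edge connecting two nodes denotes a contraction between the two tensors along a specific mode. The Tensor Circular Unfoldings $\{\mathbf{X}_{<k,2>}\}_{k=1}^{5}$ are obtained by unfolding $\mathcal{X}$ along modes $\{k-1, k\}$, specified by a red arc.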
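A minimal NumPy sketch of Definition 1 follows. It assumes 1-based k as in the text and C-order (row-major) linearization of the row and column multi-indices; the linearization used in [32], [38] may differ, which only permutes rows and columns and leaves ranks and singular values unchanged. `circular_modes`, `tcu_unfold`, and `tcu_fold` are hypothetical helper names.

```python
import numpy as np


def circular_modes(N, k, d):
    """0-based row modes {a, ..., k} and column modes {k+1, ..., a-1}, taken cyclically."""
    rows = [(k - d + j) % N for j in range(d)]
    cols = [(k + j) % N for j in range(N - d)]
    return rows, cols


def tcu_unfold(X, k, d):
    """Tensor circular unfolding X_<k,d> (k is 1-based, 1 <= d < N)."""
    rows, cols = circular_modes(X.ndim, k, d)
    m = int(np.prod([X.shape[i] for i in rows]))
    return np.transpose(X, rows + cols).reshape(m, -1)


def tcu_fold(M, k, d, shape):
    """Inverse of tcu_unfold: rebuild a tensor of the given shape from X_<k,d>."""
    rows, cols = circular_modes(len(shape), k, d)
    perm = rows + cols
    Y = M.reshape([shape[i] for i in perm])
    return np.transpose(Y, np.argsort(perm))


X = np.arange(2 * 3 * 4 * 5 * 6, dtype=float).reshape(2, 3, 4, 5, 6)
# k = 1, d = 2 gives a = 5, so modes {5, 1} index the rows, as in Fig. 1.
M = tcu_unfold(X, k=1, d=2)
print(M.shape)  # (12, 60)
```

Because the unfolding is a transpose followed by a reshape, folding back is exact, and d close to N/2 yields nearly square matrices.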
Theorem 1 [32], [38] reveals the relation between the Tensor Circular Unfolding scheme and the Tensor Ring decomposition, which implies that low-rank global information can be exploited by the Tensor Circular Unfolding scheme.
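For illustration, the link can be checked numerically with a standard property of the TR format, which we state as our own reading of the relation rather than as a restatement of Theorem 1: cutting the ring at the two bonds that separate modes $\{a, \ldots, k\}$ from the remaining modes gives $\mathrm{rank}(\mathbf{X}_{<k,d>}) \le R_{a-1} R_k$, with $R_0 = R_N$. The sketch below builds a random TR tensor and compares the rank of each circular unfolding with this bound. It assumes C-order linearization, and its helper names are hypothetical.

```python
import numpy as np


def tr_to_full(cores):
    """Contract TR cores of shape (R_{k-1}, I_k, R_k), with R_0 = R_N, into a full tensor."""
    out = cores[0]
    for G in cores[1:]:
        r0, m, _ = out.shape
        _, n, r2 = G.shape
        out = np.einsum("amb,bnc->amnc", out, G).reshape(r0, m * n, r2)
    return np.einsum("ama->m", out).reshape([G.shape[1] for G in cores])


def tcu_unfold(X, k, d):
    """Circular unfolding X_<k,d> for 1-based k; rows = modes {a, ..., k} taken cyclically."""
    N = X.ndim
    rows = [(k - d + j) % N for j in range(d)]
    cols = [(k + j) % N for j in range(N - d)]
    m = int(np.prod([X.shape[i] for i in rows]))
    return np.transpose(X, rows + cols).reshape(m, -1)


def tcu_rank_bound(ranks, k, d):
    """R_{a-1} * R_k for ranks = [R_1, ..., R_N]; the modulo wraps R_0 to R_N."""
    N = len(ranks)
    return ranks[(k - d - 1) % N] * ranks[k - 1]


rng = np.random.default_rng(1)
N, I, d = 5, 4, 2
ranks = [2, 3, 2, 2, 3]                      # [R_1, ..., R_5], with R_0 = R_5
cores = [rng.standard_normal((ranks[j - 1], I, ranks[j])) for j in range(N)]
X = tr_to_full(cores)
observed = [np.linalg.matrix_rank(tcu_unfold(X, k, d)) for k in range(1, N + 1)]
bounds = [tcu_rank_bound(ranks, k, d) for k in range(1, N + 1)]
```

Each unfolding here is 16 × 64, yet its rank stays within the product of two TR-ranks, which is the low-rank structure a nuclear norm on $\mathbf{X}_{<k,d>}$ can exploit.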

IV. LATENT TENSOR-RING NUCLEAR NORM AND FRANK-WOLFE BASED ALGORITHM
A. Latent Tensor-Ring Nuclear Norm
As is well known, the most common definitions of the nuclear norm in tensor completion are the overlapped nuclear norm and the latent nuclear norm via the Tucker/TT rank [30], [31], [33], [34]. These nuclear norms are based on the mode-k unfolding scheme (one mode versus the rest) or the k-mode unfolding scheme (the first k modes versus the rest), and thus may produce significantly unbalanced unfoldings. For a significantly unbalanced matrix of size m × n, a sufficiently large rank is usually required to describe the global information, which is impossible because the rank is upper-bounded by the small value min{m, n}. Although the TR nuclear norm [32] applies a more balanced unfolding scheme, i.e., Tensor Circular Unfolding (TCU), to exploit the global information and achieves fairly good performance, its computationally expensive selection of weighting parameters makes it unsuitable for practical applications. Moreover, we found that the performance of the TR nuclear norm largely depends on the choice of its weighting parameters. To solve these issues, we define a new nuclear norm, named the latent TR nuclear norm, based on the TCU scheme as follows.
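Definition 2. (Latent Tensor-Ring Nuclear Norm) Suppose $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is an Nth-order tensor. The latent Tensor-Ring nuclear norm of $\mathcal{X}$ is
$$\|\mathcal{X}\|_{ltrnn} = \inf_{\mathcal{X}_1 + \cdots + \mathcal{X}_N = \mathcal{X}} \sum_{k=1}^{N} \left\|(\mathcal{X}_k)_{<k,d>}\right\|_*,$$
where $\|\cdot\|_*$ denotes the matrix nuclear norm. Note that the latent TR nuclear norm is defined as the infimum over N tensors $\{\mathcal{X}_k\}_{k=1}^{N}$, each of which is low-rank in its specific unfolding $(\mathcal{X}_k)_{<k,d>}$.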
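Because Definition 2 takes an infimum, any explicit split $\mathcal{X} = \sum_k \mathcal{X}_k$ gives an upper bound on $\|\mathcal{X}\|_{ltrnn}$; in particular, assigning all of $\mathcal{X}$ to one component k gives $\|\mathcal{X}\|_{ltrnn} \le \|\mathbf{X}_{<k,d>}\|_*$. The NumPy sketch below evaluates the objective for a given split under the C-order unfolding assumed earlier. It does not compute the infimum itself, which would require solving an optimization problem, and `split_cost` and `single_component_bounds` are hypothetical names.

```python
import numpy as np


def _modes(N, k, d):
    return [(k - d + j) % N for j in range(d)], [(k + j) % N for j in range(N - d)]


def unfold(X, k, d):
    rows, cols = _modes(X.ndim, k, d)
    return np.transpose(X, rows + cols).reshape(int(np.prod([X.shape[i] for i in rows])), -1)


def fold(M, k, d, shape):
    rows, cols = _modes(len(shape), k, d)
    perm = rows + cols
    return np.transpose(M.reshape([shape[i] for i in perm]), np.argsort(perm))


def split_cost(parts, d):
    """sum_k ||(X_k)_<k,d>||_* for one split X = sum(parts); upper-bounds ||X||_ltrnn."""
    return sum(np.linalg.norm(unfold(P, k, d), "nuc") for k, P in enumerate(parts, start=1))


def single_component_bounds(X, d):
    """||X_<k,d>||_* for each k, i.e. the cost of assigning all of X to component k."""
    return [np.linalg.norm(unfold(X, k, d), "nuc") for k in range(1, X.ndim + 1)]


rng = np.random.default_rng(2)
shape, d = (3, 4, 3, 4), 2
parts, expected = [], 0.0
for k in range(1, 5):
    m, n = unfold(np.zeros(shape), k, d).shape
    a, b = rng.standard_normal(m), rng.standard_normal(n)
    parts.append(fold(np.outer(a, b), k, d, shape))   # rank one in its own unfolding
    expected += np.linalg.norm(a) * np.linalg.norm(b)  # nuclear norm of a b^T
X = sum(parts)
upper_bound = min(split_cost(parts, d), *single_component_bounds(X, d))
```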
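Therefore, a new tensor completion model via the latent Tensor-Ring nuclear norm is formulated as
$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_{ltrnn} \quad \text{s.t.} \quad \mathcal{X}_\Omega = \mathcal{T}_\Omega,$$
where $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are the true tensor and the reconstructed tensor, respectively. $\Omega$ denotes the index set of the observed entries, so $\mathcal{T}_\Omega$ represents the observed entries of the true tensor. $(\mathcal{X}_k)_{<k,d>}$ is the circularly unfolded matrix of size $m_k \times n_k$, where $m_k = I_a I_{a+1} \cdots I_k$ and $n_k = I_{k+1} I_{k+2} \cdots I_{a-1}$. Since a balanced unfolding scheme helps to capture the global information, d is set to N/2 by default (rounded to an integer when N is odd).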

B. Frank-Wolfe Based Algorithm
Although the alternating direction method of multipliers (ADMM) and block coordinate descent (BCD) are usually used to solve nuclear norm based completion models, they have to operate on full-size tensors and perform partial SVDs during iterations [7], [30]-[32], [35], [39], [40]. This incurs large time and space costs on large-scale data. Similar to [11], this section instead develops a Frank-Wolfe [41], [42] based algorithm to solve problem (6) by exploiting the sparsity structure and using only a rank-one SVD in each iteration, which will be shown to be much more efficient in time and space. Under the Frank-Wolfe framework, we first transform (6) into
$$\min_{\mathcal{X}} \ F(\mathcal{X}) = \frac{1}{2}\left\|P_\Omega(\mathcal{X}) - P_\Omega(\mathcal{T})\right\|_F^2 \quad \text{s.t.} \quad \|\mathcal{X}\|_{ltrnn} \le \beta, \qquad (7)$$
where $P_\Omega(\mathcal{X})$ keeps the entry $\mathcal{X}(i_1, \ldots, i_N)$ if $(i_1, \ldots, i_N) \in \Omega$ and sets it to 0 otherwise, and $\beta > 0$ is a constraint parameter. Let $D = \{\mathcal{S} : \|\mathcal{S}\|_{ltrnn} \le \beta\}$ denote the feasible set. We then solve problem (7) via the following three steps.
Linear subproblem of $\mathcal{S}^{(t+1)}$. For the linear subproblem $\mathcal{S}^{(t+1)} := \arg\min_{\mathcal{S} \in D} \langle \mathcal{S}, \nabla F(\mathcal{X}^{(t)}) \rangle$, Proposition 1 shows that the closed-form solution can be obtained efficiently from a rank-one SVD.
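Proposition 1. The closed-form solution of the linear problem $\mathcal{S}^{(t+1)} := \arg\min_{\mathcal{S} \in D} \langle \mathcal{S}, \nabla F(\mathcal{X}^{(t)}) \rangle$ is given by
$$\mathcal{S}^{(t+1)} = \mathrm{fold}_{k^*}\left(\beta\, \mathbf{u}_{k^*} \mathbf{v}_{k^*}^\top\right),$$
where $\mathrm{fold}_{k}(\cdot)$ is the inverse of the circular unfolding $(\cdot)_{<k,d>}$, $k^* = \arg\max_{k \in \{1, \ldots, N\}} \sigma_{\max}\left(-\nabla F(\mathcal{X}^{(t)})_{<k,d>}\right)$, and $(\mathbf{u}_{k^*}, \mathbf{v}_{k^*})$ denotes the pair of left and right singular vectors corresponding to the largest singular value $\sigma_{\max}\left(-\nabla F(\mathcal{X}^{(t)})_{<k^*,d>}\right)$.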
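Proof. Let $\|\mathcal{S}\|_{ltrnn}$ be the latent TR norm of $\mathcal{S}$. Its dual norm is defined as
$$\|\mathcal{Y}\|_{ltrnn}^* = \sup_{\|\mathcal{S}\|_{ltrnn} \le 1} \langle \mathcal{S}, \mathcal{Y} \rangle.$$
From this definition and the constraint $\|\mathcal{S}\|_{ltrnn} \le \beta$, it is easy to get that
$$\min_{\mathcal{S} \in D} \langle \mathcal{S}, \nabla F(\mathcal{X}^{(t)}) \rangle = -\beta \left\|-\nabla F(\mathcal{X}^{(t)})\right\|_{ltrnn}^*.$$
Note that, according to [33], the dual norm is given by
$$\left\|-\nabla F(\mathcal{X}^{(t)})\right\|_{ltrnn}^* = \max_{k} \left\|-\nabla F(\mathcal{X}^{(t)})_{<k,d>}\right\|_2,$$
where $\|-\nabla F(\mathcal{X}^{(t)})_{<k,d>}\|_2$ denotes the spectral norm, i.e., the largest singular value of $-\nabla F(\mathcal{X}^{(t)})_{<k,d>}$. Hence,
$$\min_{\mathcal{S} \in D} \langle \mathcal{S}, \nabla F(\mathcal{X}^{(t)}) \rangle = -\beta\, \sigma_{\max}\left(-\nabla F(\mathcal{X}^{(t)})_{<k^*,d>}\right).$$
This minimum is attained when $\mathcal{S}_{<k^*,d>} = \beta\, \mathbf{u}_{k^*} \mathbf{v}_{k^*}^\top$: in that case $\langle \mathcal{S}, \nabla F(\mathcal{X}^{(t)}) \rangle = -\beta\, \mathbf{u}_{k^*}^\top \left(-\nabla F(\mathcal{X}^{(t)})_{<k^*,d>}\right) \mathbf{v}_{k^*} = -\beta\, \sigma_{\max}$, and $\mathcal{S}$ is feasible because assigning it entirely to component $k^*$ gives $\|\mathcal{S}\|_{ltrnn} \le \|\beta\, \mathbf{u}_{k^*} \mathbf{v}_{k^*}^\top\|_* = \beta$. Therefore, $\mathcal{S}^{(t+1)} = \mathrm{fold}_{k^*}(\beta\, \mathbf{u}_{k^*} \mathbf{v}_{k^*}^\top)$.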
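A dense NumPy sketch of Proposition 1 follows, assuming C-order circular unfoldings. For brevity it uses a full `numpy.linalg.svd`, whereas the method itself only needs the leading singular pair. It checks that $\langle \mathcal{S}, \nabla F \rangle = -\beta\,\sigma_{\max}$ and that randomly drawn feasible rank-one atoms never do better. `fw_linear_step` is a hypothetical name, and a random tensor stands in for the gradient.

```python
import numpy as np


def _modes(N, k, d):
    return [(k - d + j) % N for j in range(d)], [(k + j) % N for j in range(N - d)]


def tcu_unfold(X, k, d):
    rows, cols = _modes(X.ndim, k, d)
    return np.transpose(X, rows + cols).reshape(int(np.prod([X.shape[i] for i in rows])), -1)


def tcu_fold(M, k, d, shape):
    rows, cols = _modes(len(shape), k, d)
    perm = rows + cols
    return np.transpose(M.reshape([shape[i] for i in perm]), np.argsort(perm))


def fw_linear_step(G, d, beta):
    """Return S = fold_{k*}(beta u v^T), k* (1-based), and sigma_max(-G_<k*,d>)."""
    best = None
    for k in range(1, G.ndim + 1):
        U, s, Vt = np.linalg.svd(-tcu_unfold(G, k, d), full_matrices=False)
        if best is None or s[0] > best[0]:
            best = (s[0], k, U[:, 0], Vt[0])
    sigma, k_star, u, v = best
    return tcu_fold(beta * np.outer(u, v), k_star, d, G.shape), k_star, float(sigma)


rng = np.random.default_rng(4)
G = rng.standard_normal((3, 4, 3, 4))   # stands in for grad F(X^(t))
beta, d = 2.0, 2
S, k_star, sigma = fw_linear_step(G, d, beta)
value = float(np.sum(S * G))            # <S, grad F>

# Feasible atoms fold_k(beta * x y^T / (|x| |y|)) should never beat S.
candidates = []
for _ in range(200):
    k = int(rng.integers(1, G.ndim + 1))
    M = tcu_unfold(G, k, d)
    x = rng.standard_normal(M.shape[0])
    y = rng.standard_normal(M.shape[1])
    candidates.append(beta * (x @ M @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```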
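From problem (7), it is easy to check that $\nabla F(\mathcal{X}^{(t)}) = P_\Omega(\mathcal{X}^{(t)}) - P_\Omega(\mathcal{T})$, which is a sparse tensor whose nonzero entries lie in $\Omega$, and the rank-one SVD of each of its unfoldings can be computed efficiently by the power method [43].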
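The sketch below checks the gradient formula by a central finite difference and computes the rank-one SVD of an unfolded gradient by plain power iteration on $\mathbf{A}^\top\mathbf{A}$. This is a generic power method and not necessarily the exact variant of [43]. Several details are assumptions: the stopping rule, the boolean-mask representation of $\Omega$ (used only for clarity), and the function names. Because the power iteration touches the matrix only through `A @ x` and `A.T @ y`, a `scipy.sparse` matrix should work in place of the dense array.

```python
import numpy as np


def gradient(X, T, mask):
    """grad F(X) = P_Omega(X) - P_Omega(T) for F(X) = 0.5 ||P_Omega(X) - P_Omega(T)||_F^2."""
    return np.where(mask, X - T, 0.0)


def objective(X, T, mask):
    R = gradient(X, T, mask)  # the residual on Omega equals the gradient
    return 0.5 * float(np.sum(R * R))


def top_singular_pair(A, iters=2000, tol=1e-12, seed=0):
    """Power iteration on A^T A; returns (u, sigma, v) with A v ~ sigma u."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    u, sigma = np.zeros(A.shape[0]), 0.0
    for _ in range(iters):
        u = A @ v
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:            # start vector in the null space (e.g. A == 0)
            return u, 0.0, v
        u = u / norm_u
        w = A.T @ u
        sigma_new = float(np.linalg.norm(w))
        v = w / sigma_new
        done = abs(sigma_new - sigma) <= tol * sigma_new
        sigma = sigma_new
        if done:
            break
    return u, sigma, v


rng = np.random.default_rng(5)
shape = (4, 5, 4, 5)
a, b = rng.standard_normal(20), rng.standard_normal(20)
T = 10.0 * np.outer(a, b).reshape(shape) + 0.1 * rng.standard_normal(shape)
mask = rng.random(shape) < 0.5         # about 50% observed
X = np.zeros(shape)                    # FW starting point
G = gradient(X, T, mask)

# Central finite difference along a random direction E (exact for a quadratic F).
E = rng.standard_normal(shape)
h = 1e-4
fd = (objective(X + h * E, T, mask) - objective(X - h * E, T, mask)) / (2 * h)
analytic = float(np.sum(G * E))
fd_rel_error = abs(fd - analytic) / max(1.0, abs(analytic))

# Rank-one SVD of -G_<2,2> (rows = modes {1, 2}), which here equals a plain reshape.
A = -G.reshape(20, 20)
u, sigma, v = top_singular_pair(A)
U_ref, s_ref, Vt_ref = np.linalg.svd(A)
```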
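Line search of $\gamma^{(t+1)}$. With F in problem (7), the step size $\gamma^{(t+1)}$ is given by solving
$$\gamma^{(t+1)} = \arg\min_{\gamma \in [0,1]} F\left((1-\gamma)\mathcal{X}^{(t)} + \gamma\, \mathcal{S}^{(t+1)}\right). \qquad (14)$$
Note that the objective of problem (14) is a quadratic function of $\gamma$, i.e., $\hat{a}\gamma^2 + \hat{b}\gamma + \hat{c}$ with $\hat{a} = \frac{1}{2}\|P_\Omega(\mathcal{S}^{(t+1)} - \mathcal{X}^{(t)})\|_F^2$, $\hat{b} = \langle P_\Omega(\mathcal{X}^{(t)} - \mathcal{T}),\, P_\Omega(\mathcal{S}^{(t+1)} - \mathcal{X}^{(t)}) \rangle$, and $\hat{c}$ independent of $\gamma$. Hence, it has a simple closed-form solution:
$$\gamma^{(t+1)} = \min\left\{\max\left\{-\frac{\hat{b}}{2\hat{a}},\, 0\right\},\, 1\right\}.$$
Update of $\mathcal{X}^{(t+1)}$. Note that computing $\gamma^{(t+1)}$ only requires the entries indexed by $\Omega$, i.e., $\mathcal{S}^{(t+1)}_\Omega$ and $\mathcal{X}^{(t)}_\Omega$. Hence, instead of computing and storing full tensors during iterations, we can follow the efficient update scheme proposed in [11], which consists of two steps. Step 1 only stores the sparse tensors $\mathcal{S}^{(t+1)}_\Omega$ and $\mathcal{X}^{(t+1)}_\Omega = (1-\gamma^{(t+1)})\mathcal{X}^{(t)}_\Omega + \gamma^{(t+1)}\mathcal{S}^{(t+1)}_\Omega$, together with a set of basis matrices $\{\mathbf{U}_k, \mathbf{\Sigma}_k, \mathbf{V}_k\}_{k=1}^{N}$ such that $\mathcal{X}^{(t+1)} = \sum_{k=1}^{N} \mathrm{fold}_k(\mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^\top)$, updated as
$$\mathbf{\Sigma}_k \leftarrow (1-\gamma^{(t+1)})\mathbf{\Sigma}_k \ (k \ne k^*), \qquad \mathbf{U}_{k^*} \leftarrow [\mathbf{U}_{k^*}, \mathbf{u}_{k^*}], \quad \mathbf{V}_{k^*} \leftarrow [\mathbf{V}_{k^*}, \mathbf{v}_{k^*}], \quad \mathbf{\Sigma}_{k^*} \leftarrow \begin{bmatrix} (1-\gamma^{(t+1)})\mathbf{\Sigma}_{k^*} & \mathbf{0} \\ \mathbf{0} & \gamma^{(t+1)}\beta \end{bmatrix},$$
where $\{\mathbf{U}_k, \mathbf{\Sigma}_k, \mathbf{V}_k\}_{k=1}^{N}$ are initialized to empty matrices. It is not difficult to check that these formulas realize the update $\mathcal{X}^{(t+1)} := (1-\gamma^{(t+1)})\mathcal{X}^{(t)} + \gamma^{(t+1)}\mathcal{S}^{(t+1)}$. Step 2 uses the trick shown in Algorithm 1 to reduce the size of the basis matrices without considerably increasing the objective value $F(\mathcal{X})$ when $\sum_{k=1}^{N} R_k > \bar{R}$, where $R_k$ is the number of columns of $\mathbf{U}_k$ and $\bar{R}$ is a given threshold. This trick prevents the basis matrices from gradually growing in size and causing a memory explosion.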
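A minimal sketch of the line search and of Step 1 of the update follows. It assumes the observed entries are stored as an integer index array `idx` of shape $(|\Omega|, N)$, and that component k keeps `U[k]`, a vector `s[k]` holding the diagonal of $\mathbf{\Sigma}_k$, and `V[k]`. All helper names are hypothetical. `observed_entries` evaluates $\mathcal{X}_\Omega$ directly from the factors without forming the full tensor; the dense tensor in the demo is used only to check the algebra. Algorithm 1 (Step 2) is not sketched.

```python
import numpy as np


def _modes(N, k, d):
    """0-based row modes {a..k} and column modes {k+1..a-1} of X_<k,d> (1-based k)."""
    return [(k - d + j) % N for j in range(d)], [(k + j) % N for j in range(N - d)]


def unfold_positions(idx, shape, k, d):
    """Row/column positions in X_<k,d> (C order) of the tensor indices idx (|Omega| x N)."""
    rows, cols = _modes(len(shape), k, d)
    r = np.ravel_multi_index(tuple(idx[:, m] for m in rows), [shape[m] for m in rows])
    c = np.ravel_multi_index(tuple(idx[:, m] for m in cols), [shape[m] for m in cols])
    return r, c


def observed_entries(U, s, V, idx, shape, d):
    """X_Omega for X = sum_k fold_k(U_k diag(s_k) V_k^T), touching only Omega."""
    out = np.zeros(len(idx))
    for k in U:
        if s[k].size == 0:
            continue
        r, c = unfold_positions(idx, shape, k, d)
        out += np.einsum("nj,j,nj->n", U[k][r], s[k], V[k][c])
    return out


def line_search(x_obs, s_obs, t_obs):
    """Exact minimizer over [0, 1] of 0.5 * ||(1 - g) x + g s - t||^2 on Omega."""
    direction = s_obs - x_obs
    dd = float(direction @ direction)
    if dd == 0.0:
        return 0.0
    return float(np.clip(-((x_obs - t_obs) @ direction) / dd, 0.0, 1.0))


def fw_update(U, s, V, k_star, u, v, gamma, beta):
    """Step 1: scale every Sigma_k by (1 - gamma) and append beta * u v^T to component k*."""
    for k in s:
        s[k] = (1.0 - gamma) * s[k]
    U[k_star] = np.column_stack([U[k_star], u])
    V[k_star] = np.column_stack([V[k_star], v])
    s[k_star] = np.append(s[k_star], gamma * beta)


# Consistency check against a dense reference on a tiny problem.
rng = np.random.default_rng(6)
shape, d, beta = (3, 4, 3, 4), 2, 5.0
N = len(shape)
size = {k: [int(np.prod([shape[i] for i in m])) for m in _modes(N, k, d)] for k in range(1, N + 1)}
U = {k: np.zeros((size[k][0], 0)) for k in size}
V = {k: np.zeros((size[k][1], 0)) for k in size}
s = {k: np.zeros(0) for k in size}
mask = rng.random(shape) < 0.5
idx = np.argwhere(mask)                  # observed indices, same order as X[mask]
T = rng.standard_normal(shape)
X_dense = np.zeros(shape)
for _ in range(6):
    k_star = int(rng.integers(1, N + 1))
    u = rng.standard_normal(size[k_star][0])
    u /= np.linalg.norm(u)
    v = rng.standard_normal(size[k_star][1])
    v /= np.linalg.norm(v)
    rows, cols = _modes(N, k_star, d)
    perm = rows + cols
    S_dense = np.transpose((beta * np.outer(u, v)).reshape([shape[i] for i in perm]), np.argsort(perm))
    gamma = line_search(observed_entries(U, s, V, idx, shape, d), S_dense[mask], T[mask])
    fw_update(U, s, V, k_star, u, v, gamma, beta)
    X_dense = (1.0 - gamma) * X_dense + gamma * S_dense
max_err = np.max(np.abs(observed_entries(U, s, V, idx, shape, d) - X_dense[mask]))
```

In this sketch the cost of evaluating $\mathcal{X}_\Omega$ grows with $|\Omega|$ and the number of stored columns, not with the full tensor size, which is the point of the factored update.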
We summarize the complete procedure in Algorithm 2. Since the algorithm only accesses the observed entries of $\mathcal{S}$ and $\mathcal{X}$ and requires only rank-one SVDs, it is efficient in terms of both space and time.
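To tie the three steps together, the following is a dense, small-scale sketch of the FW iteration for problem (7). It is an illustration under stated assumptions rather than Algorithm 2 itself. It stores full tensors instead of sparse tensors and basis matrices, uses `numpy.linalg.svd` instead of the power method, and omits the basis reduction of Algorithm 1. It also stops on a relative-decrease test with tol = 1e-5; the form of this stopping rule is our assumption. On synthetic data, β is set to the cost of the known ground-truth split, which upper-bounds $\|\mathcal{T}\|_{ltrnn}$ and so keeps $\mathcal{T}$ feasible. Function names are hypothetical.

```python
import numpy as np


def _modes(N, k, d):
    return [(k - d + j) % N for j in range(d)], [(k + j) % N for j in range(N - d)]


def unfold(X, k, d):
    rows, cols = _modes(X.ndim, k, d)
    return np.transpose(X, rows + cols).reshape(int(np.prod([X.shape[i] for i in rows])), -1)


def fold(M, k, d, shape):
    rows, cols = _modes(len(shape), k, d)
    perm = rows + cols
    return np.transpose(M.reshape([shape[i] for i in perm]), np.argsort(perm))


def ltrnn_fw(T, mask, beta, d, max_iter=300, tol=1e-5):
    """Dense Frank-Wolfe sketch for min 0.5||P_Omega(X - T)||_F^2 s.t. ||X||_ltrnn <= beta."""
    X = np.zeros_like(T)
    history = [0.5 * np.sum(np.where(mask, X - T, 0.0) ** 2)]
    for _ in range(max_iter):
        G = np.where(mask, X - T, 0.0)                  # gradient P_Omega(X - T)
        best = None
        for k in range(1, T.ndim + 1):                  # linear subproblem (Prop. 1)
            Uk, sk, Vtk = np.linalg.svd(-unfold(G, k, d), full_matrices=False)
            if best is None or sk[0] > best[0]:
                best = (sk[0], k, Uk[:, 0], Vtk[0])
        _, k_star, u, v = best
        S = fold(beta * np.outer(u, v), k_star, d, T.shape)
        direction = np.where(mask, S - X, 0.0)          # exact line search on Omega
        dd = np.sum(direction ** 2)
        gamma = 0.0 if dd == 0 else float(np.clip(-np.sum(G * direction) / dd, 0.0, 1.0))
        X = (1.0 - gamma) * X + gamma * S
        history.append(0.5 * np.sum(np.where(mask, X - T, 0.0) ** 2))
        if history[-2] - history[-1] <= tol * max(history[-2], 1e-300):
            break
    return X, history


rng = np.random.default_rng(7)
shape, d = (6, 6, 6, 6), 2
parts = [fold(rng.standard_normal((36, 2)) @ rng.standard_normal((2, 36)), k, d, shape)
         for k in range(1, 5)]
T = sum(parts)
beta = sum(np.linalg.norm(unfold(P, k, d), "nuc") for k, P in enumerate(parts, start=1))
mask = rng.random(shape) < 0.5                           # 50% observed
X_hat, history = ltrnn_fw(T, mask, beta, d)
rse = np.linalg.norm(X_hat - T) / np.linalg.norm(T)
```

Because the exact line search may always choose γ = 0, the objective values in `history` never increase; how fast the RSE falls depends on β and the iteration budget.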

C. Analysis of Space-Complexity and Time-Complexity
Space and time complexity are important criteria for evaluating an algorithm. In this section, for an Nth-order tensor $\mathcal{X}$ of size $I \times I \times \cdots \times I$, we analyze the space and time complexity of the proposed method. As seen from Algorithm 2, all operations are based on sparse tensors with $\|\Omega\|_1$ observed entries and a set of basis matrices $\{\mathbf{U}_k \in \mathbb{R}^{I^d \times R_k}, \mathbf{\Sigma}_k \in \mathbb{R}^{R_k \times R_k}, \mathbf{V}_k \in \mathbb{R}^{I^{N-d} \times R_k}\}_{k=1}^{N}$, where only the diagonal of $\mathbf{\Sigma}_k$ needs to be stored. Thus, the space complexity of the proposed method is $O((I^d + I^{N-d} + 1)R + \|\Omega\|_1)$ with $R = \sum_{k=1}^{N} R_k$. For the time complexity of the proposed method, the main per-iteration cost lies in the update of $\mathcal{S}^{(t+1)}_\Omega$, which consists of the rank-one SVDs of $-\nabla F(\mathcal{X})_{<k,d>} \in \mathbb{R}^{I^d \times I^{N-d}}$ for $k = 1, \ldots, N$ and the computation of Equation (17). The rank-one SVDs performed by the power method cost $O(N(I^N + I^d + I^{N-d}))$, and Equation (17) costs $O(\|\Omega\|_1)$. Since $\|\Omega\|_1 \le I^N$, the overall time complexity of the proposed method is $O(N(I^N + I^d + I^{N-d}))$ per iteration.
Algorithm 1 Reducing the size of basis matrices.
… $R_k$ = number of nonzero elements in $\mathbf{\Sigma}_J$; 9: end for; Output: …
Algorithm 2 FW-based algorithm for latent Tensor-Ring nuclear norm minimization.
Input: partially observed entries $\mathcal{T}_\Omega$. Parameters: $\bar{R}$, tol $= 10^{-5}$.
1: Initialize: … $R_{k^*} = R_{k^*} + 1$; 10: if $\sum_{k=1}^{N} R_k > \bar{R}$ then 11: reduce the size of $\{\mathbf{U}_k, \mathbf{\Sigma}_k, \mathbf{V}_k\}_{k=1}^{N}$ by Algorithm 1; 12: …
Since HaLRTC and TRNNM introduce N auxiliary variables and N Lagrange multipliers to simplify the optimization, both require a per-iteration space complexity of $O((2N+1)I^N)$. TNN requires two additional variables, and SiLRTC-TT has N auxiliary variables; thus, their per-iteration space complexities are $O(3I^N)$ and $O((N+1)I^N)$, respectively. Similar to the proposed algorithm, FFWTensor only needs to store sparse tensors and a set of basis matrices, at a cost of $O((I^{N-1} + I + 1)R + \|\Omega\|_1)$ per iteration. The per-iteration time complexities of these algorithms follow from the corresponding papers.
As seen from TABLE I, LTRNNFW requires a much smaller space complexity than the other compared algorithms when the target tensor $\mathcal{X}$ has a high missing ratio and $R \ll I^d$, because LTRNNFW efficiently exploits the sparsity structure of $\mathcal{X}$. When N = 3, LTRNNFW reduces to the unscaled version of FFWTensor, so they have the same space complexity. It is worth noting that LTRNNFW requires much less storage than FFWTensor when N > 3, because $(I^d + I^{N-d})$ is significantly smaller than $(I^{N-1} + I)$.
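To make the comparison concrete, the sketch below plugs numbers into the per-iteration space bounds with constants dropped. It assumes the LTRNNFW count is $(I^d + I^{N-d} + 1)R + \|\Omega\|_1$, by analogy with the FFWTensor bound and the $(I^d + I^{N-d})$ term above. The sizes are arbitrary illustrative choices, not settings from the experiments.

```python
def space_ltrnnfw(I, N, d, R, n_obs):
    """Entries stored by LTRNNFW: basis matrices (I^d + I^(N-d) + 1) * R plus observed entries."""
    return (I ** d + I ** (N - d) + 1) * R + n_obs


def space_ffwtensor(I, N, R, n_obs):
    """Entries stored by FFWTensor: (I^(N-1) + I + 1) * R plus observed entries."""
    return (I ** (N - 1) + I + 1) * R + n_obs


def space_dense(I, N, copies):
    """Full-tensor methods, e.g. copies = 2N + 1 for HaLRTC and TRNNM."""
    return copies * I ** N


I, N, R = 10, 6, 50
n_obs = I ** N // 10                     # 90% missing
for name, value in [
    ("LTRNNFW", space_ltrnnfw(I, N, N // 2, R, n_obs)),
    ("FFWTensor", space_ffwtensor(I, N, R, n_obs)),
    ("HaLRTC / TRNNM", space_dense(I, N, 2 * N + 1)),
]:
    print(f"{name:15s} {value:>12,d}")
```

With these sizes, LTRNNFW stores 200,050 entries, FFWTensor 5,100,550, and HaLRTC/TRNNM 13,000,000. For N = 3 the first two formulas coincide, which matches the statement that LTRNNFW then reduces to (unscaled) FFWTensor.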
Note that both LTRNNFW and FFWTensor have a lower order of time complexity than the other compared algorithms, which benefits from the sparsity structure of the target tensor and the efficient rank-one SVD used during iterations. In contrast, the other algorithms have to operate on full-size tensors and perform partial SVDs in each iteration. Typically, a rank-one SVD is significantly faster than a partial SVD, especially for large matrices. Therefore, it is not surprising that LTRNNFW and FFWTensor are time-efficient.

V. EXPERIMENTS
A. Effect of β for the Proposed Method
This section investigates the effect of the constraint parameter β on the proposed method using synthetic data $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with the latent structure $\mathcal{X} = \sum_{k=1}^{N} \mathcal{X}_k$. All $\{\mathcal{X}_k\}_{k=1}^{N}$ are generated such that $(\mathcal{X}_k)_{<k,d>} \in \mathbb{R}^{m_k \times n_k}$ is low-rank, i.e., $(\mathcal{X}_k)_{<k,d>} = \mathbf{A}\mathbf{B}^\top$, where the entries of $\mathbf{A} \in \mathbb{R}^{m_k \times r_k}$ and $\mathbf{B} \in \mathbb{R}^{n_k \times r_k}$ are drawn from the standard Gaussian distribution $N(0, 1)$. For simplicity, all modes have the same dimension and all components the same rank, i.e., $I_1 = I_2 = \cdots = I_N = I$ and $r_1 = r_2 = \cdots = r_N$. A uniformly random missing ratio of 50% is considered in this experiment, and the relative squared error (RSE) is used as the evaluation metric. The RSE between the estimate $\hat{\mathcal{X}}$ and the true tensor $\mathcal{X}$ is defined by $\mathrm{RSE} = \|\mathcal{X} - \hat{\mathcal{X}}\|_F / \|\mathcal{X}\|_F$. Fig. 2 plots RSE versus β for tensors of size 30 × 30 × 30 × 30 (4D), 20 × 20 × 20 × 20 × 20 (5D), and 10 × 10 × 10 × 10 × 10 × 10 (6D) with corresponding rank tuples (5, 5, 5, 5) (4D), (6, 6, 6, 6, 6) (5D), and (7, 7, 7, 7, 7, 7) (6D). The plots illustrate that the proposed method is robust to the constraint parameter β over a wide range, which is an important property for algorithms in practical applications.
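A minimal sketch of this synthetic setup follows. It assumes the C-order circular unfolding used above and a smaller size than in Fig. 2 so that it runs instantly. `make_latent_tensor` and `rse` are hypothetical names.

```python
import numpy as np


def _modes(N, k, d):
    return [(k - d + j) % N for j in range(d)], [(k + j) % N for j in range(N - d)]


def fold(M, k, d, shape):
    rows, cols = _modes(len(shape), k, d)
    perm = rows + cols
    return np.transpose(M.reshape([shape[i] for i in perm]), np.argsort(perm))


def unfold(X, k, d):
    rows, cols = _modes(X.ndim, k, d)
    return np.transpose(X, rows + cols).reshape(int(np.prod([X.shape[i] for i in rows])), -1)


def make_latent_tensor(I, N, r, d, rng):
    """X = sum_k X_k with (X_k)_<k,d> = A B^T, where A and B have N(0, 1) entries."""
    shape = (I,) * N
    parts = []
    for k in range(1, N + 1):
        A = rng.standard_normal((I ** d, r))
        B = rng.standard_normal((I ** (N - d), r))
        parts.append(fold(A @ B.T, k, d, shape))
    return sum(parts), parts


def rse(X_hat, X):
    """RSE = ||X - X_hat||_F / ||X||_F."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


rng = np.random.default_rng(8)
I, N, r, d = 6, 4, 3, 2
X, parts = make_latent_tensor(I, N, r, d, rng)
mask = rng.random(X.shape) >= 0.5        # uniformly random 50% missing
component_ranks = [np.linalg.matrix_rank(unfold(P, k, d)) for k, P in enumerate(parts, start=1)]
observed_fraction = mask.mean()
```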

B. Performance in High-Order Form
To the best of our knowledge, reshaping low-order tensors into high-order tensors is a common practice for improving the performance of TT/TR-based methods in visual-data completion [27], [28], [31], [32], [38]. To evaluate the proposed method in high-order form, the first 180 frames of the brain Magnetic Resonance Imaging (MRI) data [30], cropped to size 180 × 216, are considered in this experiment. Thus, we represent the MRI data as a 3rd-order tensor of size 180 × 216 × 180 and further reshape it into tensors of size 12 × 15 × 12 × 18 × 12 × 15 (6D), 4 × 5 × 9 × 4 × 6 × 9 × 4 × 5 × 9 (9D), and 4 × 5 × 3 × 3 × 4 × 6 × 3 × 3 × 4 × 5 × 3 × 3 (12D). RSE, peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [45], storage size during iteration (SSDI), and RunTime are used to evaluate the performance. The PSNR between the estimate $\hat{\mathcal{X}}$ and the true tensor $\mathcal{X}$ is defined by $\mathrm{PSNR} = 10 \log_{10}(255^2/\mathrm{MSE})$, where $\mathrm{MSE} = \|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 / \mathrm{num}(\mathcal{X})$ and $\mathrm{num}(\mathcal{X})$ denotes the number of entries of $\mathcal{X}$. We choose FFWTensor as the baseline because both it and the proposed method take full advantage of the sparsity structure of the observed tensor during iterations. For simplicity, the SSDI of both FFWTensor and the proposed method is defined as the total number of entries of the basis matrices $\{\mathbf{U}_k \in \mathbb{R}^{p_k \times r_k}, \mathbf{\Sigma}_k \in \mathbb{R}^{r_k \times r_k}, \mathbf{V}_k \in \mathbb{R}^{r_k \times q_k}\}_{k=1}^{N}$ (counting only the diagonal of $\mathbf{\Sigma}_k$) plus the number of observed entries, i.e., $\mathrm{SSDI} = \sum_{k=1}^{N}(p_k r_k + r_k + q_k r_k) + \|\Omega\|_1$. TABLE II shows the performance of FFWTensor and LTRNNFW under different-order forms {3D, 6D, 9D, 12D} and missing ratios {70%, 75%, 80%, 85%, 90%, 95%}. The proposed method obtains significantly better results in the high-order forms {6D, 9D}, while its performance slightly degrades after further reshaping into the 12D form. This implies that reshaping a low-order tensor into a high-order tensor does help to improve performance, especially when it is reshaped into an appropriate high-order form.
However, FFWTensor performs worse after reshaping into high-order forms. In addition, it can be observed that:
• In the 3D case, the proposed method obtains results similar to those of FFWTensor, because the proposed method reduces to the unscaled version of FFWTensor for 3rd-order tensors.
• In the high-order cases, i.e., {6D, 9D, 12D}, the proposed method obtains significantly better results than FFWTensor in terms of RSE, PSNR, SSIM, SSDI, and RunTime. Note that the main difference between the proposed method and FFWTensor is the more balanced unfolding scheme applied in the proposed method. The better {RSE, PSNR, SSIM} values illustrate the powerful ability of the balanced unfolding scheme to capture global information. The smaller SSDI and RunTime values indicate a more compact data representation and higher space and time efficiency, which is meaningful for large-scale data or limited memory.
Moreover, as shown in Fig. 3, the frame recovered by the proposed method is clearer than that recovered by FFWTensor. All these results show the superiority of the proposed method in processing high-order tensors.
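The metrics and reshaping used in this experiment can be sketched as follows. The sketch assumes a plain C-order `numpy.reshape` to split each mode (e.g., 180 = 12 × 15); the index ordering actually used in the experiments is not specified here, so only the shapes are illustrated. PSNR assumes 8-bit data with peak value 255, as in the definition above, and SSIM is not sketched. Function names are hypothetical.

```python
import math

import numpy as np

FORMS = {
    "6D": (12, 15, 12, 18, 12, 15),
    "9D": (4, 5, 9, 4, 6, 9, 4, 5, 9),
    "12D": (4, 5, 3, 3, 4, 6, 3, 3, 4, 5, 3, 3),
}


def psnr(X_hat, X, peak=255.0):
    """PSNR = 10 log10(peak^2 / MSE), with MSE = ||X - X_hat||_F^2 / num(X)."""
    mse = np.sum((X - X_hat) ** 2) / X.size
    return 10.0 * np.log10(peak ** 2 / mse)


def ssdi(basis_shapes, n_obs):
    """sum_k (p_k r_k + r_k + q_k r_k) + ||Omega||_1, with basis_shapes = [(p_k, q_k, r_k), ...]."""
    return sum(p * r + r + q * r for p, q, r in basis_shapes) + n_obs


# Every high-order form keeps all 180 * 216 * 180 entries of the MRI tensor.
form_sizes_ok = all(math.prod(s) == 180 * 216 * 180 for s in FORMS.values())

# A small stand-in for reshaping: (6, 4) -> (2, 3, 2, 2) splits each mode in C order.
small = np.arange(24.0).reshape(6, 4)
high_order = small.reshape(2, 3, 2, 2)

X = np.full((8, 8), 100.0)
noisy_psnr = psnr(X + 1.0, X)            # MSE = 1
```

Under C-order reshaping, entry `(i, j, k, l)` of the high-order tensor comes from row `3 * i + j` and column `2 * k + l` of the original matrix, so each original mode is split into consecutive sub-modes.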
As shown in Fig. 4 … the powerful ability of a more balanced unfolding scheme to capture global information. The smaller time cost results from the efficient use of the sparsity structure and the rank-one SVD operation during iterations. Although FFWTensor has a time cost comparable to that of the proposed method, it fails to perform as well as the proposed method in most cases, especially at high missing ratios {90%, 95%}. The other methods (i.e., HaLRTC, SiLRTC-TT, TNN, and TRNNM) achieve results comparable to those of the proposed method in some cases, but at a much larger time cost. Moreover, for HaLRTC, SiLRTC-TT, and TRNNM, the computationally expensive tuning of several weighting parameters significantly increases their time cost. These results imply that, compared with the proposed method, other norm-based completion methods are not good choices for large-scale data in practical applications. In addition, the visual results of each method on these three data sets are shown in Figs. 7, 8, and 9. The proposed method recovers images with better resolution and captures much more detailed information, e.g., wheel, beard, and eyes.

VI. CONCLUSION
In this paper, a new latent nuclear norm equipped with a more balanced unfolding scheme is defined for low-rank regularization, and an efficient Frank-Wolfe algorithm is developed for its optimization by exploiting the sparsity structure and the rank-one SVD operation. Our theoretical analysis shows that the proposed method is much more efficient than other norm-based methods in terms of both time and space, which is important for memory-limited devices in practical applications. Furthermore, extensive experimental results confirm that the proposed method achieves state-of-the-art performance in visual-data inpainting at smaller time and space costs.