Joint Factors and Rank Estimation for the Canonical Polyadic Decomposition Based on Convex Optimization

Estimating the minimal number of rank-1 tensors in the Canonical Polyadic Decomposition (CPD), known as the canonical rank, is a challenging research problem. To address it, we propose a method based on convex optimization that jointly estimates the CP factors and the canonical rank, called FARAC, for joint FActors and RAnk canonical estimation for the PolyadiC Decomposition. We formulate the FARAC method as a convex optimization problem in which a sparsity-promoting constraint is placed on the superdiagonal of the core tensor of the CPD, while the Frobenius norm of the off-diagonal terms is constrained to be bounded. We propose an alternating minimization strategy applied to the Lagrangian to solve the optimization problem. The FARAC method has been validated on synthetic data with varying noise levels, as well as on three real data sets. Compared to state-of-the-art methods, FARAC exhibits very good rank estimation accuracy over a large range of SNR values. Additionally, FARAC can handle the case in which the canonical rank exceeds one of the dimensions of the input tensor.


I. INTRODUCTION
Large amounts of data are routinely generated by sensor networks, massive experiments, simulations, etc. In a very wide range of applications, data require more than two dimensions for an efficient description [2], [10], [28], [41]. Multidimensional arrays (a.k.a. tensors) are suitable for representing such data, as they capture complex interactions among input features. Tensors can be seen as a generalization of vectors (1st-order tensors) and matrices (2nd-order tensors). The order of a tensor is its number of dimensions.
The most popular tensor decomposition is the Canonical Polyadic Decomposition (CPD) [37]. The CPD is widely used in different fields such as chemometrics, telecommunications, and blind source separation [16], [24], [26], [27], [39]. The CPD is particularly attractive owing to its uniqueness property [1], [14]. However, the CPD requires knowledge of the rank. Unfortunately, the problem of determining the canonical rank is NP-hard [4].
In the present study, we propose estimating both the canonical rank and the CP factors from noisy observations. We formulate the problem of interest as a convex optimization problem. The remainder of the paper is organized as follows. Section II introduces notation and preliminaries in multilinear algebra. Section III reviews existing work on the estimation of the canonical rank. Section IV describes our proposed approach and our algorithm for solving the problem. Finally, Section VI presents a number of numerical experiments to evaluate our proposed approach and compare it with the CORCONDIA (CORe CONsistency DIAgnostic) method.

II. NOTATIONS AND ALGEBRAIC BACKGROUND
In this section, we recall some algebraic definitions on tensor algebra from [37].

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Definition 1 (Inner product): The inner product of two N-order tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is defined as:
$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 \cdots i_N} \, y_{i_1 \cdots i_N}.$

Definition 2 (Tensor Frobenius norm): The norm of a tensor $\mathcal{X}$ is defined as:
$\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}.$

Definition 3 (n-mode multiplication): The n-mode product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ with a matrix $U \in \mathbb{R}^{J \times I_n}$ is a tensor of order N and size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$ whose entries are given by:
$(\mathcal{X} \times_n U)_{i_1 \cdots i_{n-1} \, j \, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 \cdots i_N} \, u_{j i_n}.$

Definition 4 (Diagonal tensor): A tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is diagonal if all of its entries are zero except those on its superdiagonal, that is, $x_{i_1 \cdots i_N} \neq 0$ only if $i_1 = i_2 = \cdots = i_N$.

Definition 5 (Rank-one tensor): A tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ of order N is rank one if it can be written as the outer product of N vectors:
$\mathcal{X} = u_1 \circ u_2 \circ \cdots \circ u_N,$
where $u_n \in \mathbb{R}^{I_n}$ with $1 \leq n \leq N$.

Definition 6 (Unfolding): The n-mode unfolding of a tensor $\mathcal{X}$ is a matrix denoted by $X_{(n)}$, whose columns are the n-mode fibers of $\mathcal{X}$.
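As a concrete illustration of Definitions 3 and 6, the following NumPy sketch (ours, not from the paper) implements one consistent choice of unfolding and the n-mode product, and checks the standard matrix identity $(\mathcal{X} \times_n U)_{(n)} = U X_{(n)}$; the function names are our own:

```python
import numpy as np

def unfold(X, n):
    """n-mode unfolding: rows indexed by mode n, columns by the n-mode fibers."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    """n-mode product X x_n U for a matrix U of shape (J, I_n)."""
    Y = np.tensordot(U, X, axes=(1, n))   # contracts I_n; the new J axis comes first
    return np.moveaxis(Y, 0, n)           # put the J axis back in position n

X = np.arange(24, dtype=float).reshape(2, 3, 4)
U = np.ones((5, 3))
Y = mode_n_product(X, U, 1)
assert Y.shape == (2, 5, 4)
# Matrix identity: (X x_n U)_(n) = U @ X_(n)
assert np.allclose(unfold(Y, 1), U @ unfold(X, 1))
```

Any fixed ordering of the fibers works, as long as unfolding and product use the same convention.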
Definition 7 (Tucker model): The Tucker decomposition represents a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ of multilinear ranks $(R_1, \ldots, R_N)$ using N factors $U_n \in \mathbb{R}^{I_n \times R_n}$ and a core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ as follows:
$\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \cdots \times_N U_N = \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} g_{r_1 \cdots r_N} \, u_{1,r_1} \circ \cdots \circ u_{N,r_N},$
where $u_{n,r} \in \mathbb{R}^{I_n}$ is the r-th column of the n-th factor matrix $U_n$ with $1 \leq n \leq N$.
Definition 8 (Canonical Polyadic Decomposition (CPD)): A tensor of order N and canonical rank R follows a CPD if it admits a factorization as a sum of R rank-one tensors. The CPD of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is given by:
$\mathcal{X} = \sum_{r=1}^{R} u_{1,r} \circ u_{2,r} \circ \cdots \circ u_{N,r}. \quad (2)$
The CPD is a special case of the Tucker model in which the core tensor is diagonal and the multilinear ranks are all equal to R.
Hence, the CPD in eq. (2) can be written in the Tucker format as follows:
$\mathcal{X} = \mathcal{I}_R \times_1 U_1 \times_2 U_2 \cdots \times_N U_N,$
where $\mathcal{I}_R \in \mathbb{R}^{R \times \cdots \times R}$ is the diagonal tensor with ones on its superdiagonal and $U_n = [u_{n,1}, \ldots, u_{n,R}]$.
Definition 9 (Tensor rank): The rank R of a tensor is defined as the smallest number of components in an exact CPD.
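The equivalence between the CP model and a Tucker model with a diagonal core can be checked numerically. The following NumPy sketch (our illustration, not code from the paper) builds the same third-order tensor both ways:

```python
import numpy as np

I, R = (4, 5, 6), 3
rng = np.random.default_rng(0)
U = [rng.standard_normal((i, R)) for i in I]

# CPD as a sum of R rank-one (outer product) terms, as in eq. (2)
X_cp = sum(np.einsum('i,j,k->ijk', U[0][:, r], U[1][:, r], U[2][:, r])
           for r in range(R))

# The same tensor in Tucker format with a diagonal R x R x R core
G = np.zeros((R, R, R))
G[np.arange(R), np.arange(R), np.arange(R)] = 1.0
X_tucker = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

assert np.allclose(X_cp, X_tucker)
```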
We have the following upper bound on the maximum rank (the largest attainable rank) for a 3-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ [14]:
$R \leq \min(I_1 I_2, \; I_1 I_3, \; I_2 I_3).$

III. RELATED WORKS
Existing approaches for canonical rank detection include: • CORe CONsistency DIAgnostic (CORCONDIA) [33] is a heuristic method for detecting the number of components of the CP model. It measures the similarity between the core tensor estimated for a given CP model and the ideal identity core (a quantity called core consistency). By analyzing the gap between the core consistencies of different CP models, it determines the correct number of components: starting from the one-component model, the last model whose core array is still similar to the ideal diagonal tensor gives the proper number of components to use. CORCONDIA is intuitive and simple. However, its computation is prohibitive even for small tensors [8].
In addition, the CORCONDIA method cannot handle the case when the canonical rank exceeds one of the dimensions of the input tensor.
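For intuition, here is a simplified NumPy sketch of a CORCONDIA-style diagnostic (our illustration, following the usual $100(1 - \|\mathcal{G} - \mathcal{T}\|^2/R)$ score from [33]; the helper names are ours): given CP factors, the implied full Tucker core is fitted via pseudoinverses and compared with the ideal superdiagonal core:

```python
import numpy as np

def core_consistency(X, factors):
    """Core consistency of a 3-way CP model: fit a full Tucker core for the
    given CP factors (least squares, full column rank assumed), then compare
    it with the ideal diagonal core of ones."""
    P = [np.linalg.pinv(U) for U in factors]
    G = np.einsum('ai,bj,ck,ijk->abc', P[0], P[1], P[2], X)
    R = factors[0].shape[1]
    T = np.zeros((R, R, R))
    T[np.arange(R), np.arange(R), np.arange(R)] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / R)

# On a noiseless rank-2 tensor with its true factors, the core is (near) ideal
rng = np.random.default_rng(1)
U = [rng.standard_normal((d, 2)) for d in (4, 5, 6)]
X = np.einsum('ir,jr,kr->ijk', U[0], U[1], U[2])
cc = core_consistency(X, U)
assert cc > 99.0
```

With an over-specified rank or mismatched factors, the fitted core drifts away from the diagonal ideal and the score drops, which is the gap CORCONDIA inspects.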
• A fast version of CORCONDIA was presented in [8].
It suggests an efficient way to compute the CORCON-DIA diagnostic that takes advantage of sparse data and works well as the tensor size grows. In cases where either the tensor or the factors or both are sparse [8], their algorithm significantly outperforms the state-of-the-art baselines and scales well when the tensor size increases.
In the fully dense scenario, their proposed algorithm is as good as the state of the art (The CORCONDIA method) for rank estimation.
• Automatic Relevance Determination (ARD) [23] is a Bayesian approach applied to the Tucker and CP models. In ARD, entries of CP factors are assigned a Gaussian prior. The objective is to find the rank and the CP factors by solving an l 2 −regularized CP decomposition. By assigning priors to model hyperparameters and learning the hyperparameters of these priors, ARD reduces the excess of components and simplifies the core structure.
• Detection of the number of components in CAN-DECOMP models via minimum description length (N-D MDL) [17] is an extension of the minimum description length (MDL) to the multilinear case for detecting the number of components in a CP model using the generalized unfolding of the observation tensor. Under the condition that the tensor rank is smaller than the size of the most squared unfolded matrix, the generalized N-D MDL criterion estimates the number of components of the CPD. The drawback of the multilinear MDL algorithm is that it fails to work when the tensor rank is larger than the size of the most squared unfolded matrix [17].
• Using a Bayesian approach, [20] integrates tensor rank determination into the CPD. A scalable algorithm is developed that processes one mini-batch of data at a time by incorporating stochastic optimization into the probabilistic tensor CPD.
• Automatic tensor rank estimation for nonnegative tensor CPD is proposed in [19]. By interpreting the nonnegative CPD using probability density functions, the problem is formulated as a statistical inference.
• The Gaussian-gamma prior was introduced in [21] within a Bayesian framework in the context of probabilistic CPD modeling for automatic rank determination.
• The GSL-CP method proposed in [40] estimates the rank as well as the loading matrices of the decomposed tensor. Without knowing the expected rank, the CPD is computed by promoting group sparsity of the loading matrix in orthonormal subspace.
• The canonical rank of a tensor is estimated in [6] using an improved version of MDL (iMDL).
• Tensor learning models for regression are proposed in [38]. In their regularization process, they employ the group-sparsity norm, which promotes a low-rank decomposition of the CP core as well as automatic selection of the rank during the learning procedure.
• Using orthogonal CPD, [30] views the superdiagonal of the CP core as analogous to the vector of singular values. To determine the rank of an incomplete tensor, they integrate a regularization with the CP-based tensor nuclear norm.
• The method proposed in [25] is based on convolutional neural networks (CNN) with a pre-decomposition using CPD providing rank-one components to the CNN.
In short, state-of-the-art methods either rely on Bayesian approaches, perform rank estimation within a learning framework, or impose overly restrictive constraints such as factor orthogonality. In contrast to existing methods, the proposed method belongs to the family of deterministic parameter estimators.

IV. PROPOSED METHOD (FARAC)
The purpose of this section is to present the FARAC method for estimating the canonical rank and the CP factors simultaneously. This is accomplished by minimizing the l_0 norm of the superdiagonal of the core tensor, together with a constraint on the reconstruction error and another one on the off-diagonal terms, which are allowed to be non-zero but bounded. Our goal is to find a CP core tensor structured as shown in Fig. 1: the canonical rank is the number of strictly positive, ordered values of the tensor's superdiagonal.

FIGURE 1. CP core tensor G of a rank-R tensor of order 3 and size R_0.

In the following, $\mathcal{G}$ denotes the CP core tensor, $\lambda$ its superdiagonal, and $\tilde{\mathcal{G}}$ is defined according to:
$\tilde{\mathcal{G}} = \mathcal{G} - \mathrm{diag}(\lambda),$
where $\mathrm{diag}(\lambda)$ is a diagonal tensor whose superdiagonal is $\lambda$. Our optimization problem can then be expressed mathematically as follows:
$\min_{\lambda, \tilde{\mathcal{G}}, \{U_n\}_n} \|\lambda\|_0 \quad \text{s.t.} \quad \|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|_F^2 \leq \epsilon_1, \quad \|\tilde{\mathcal{G}}\|_F^2 \leq \epsilon_2, \quad (3)$
where $\epsilon_1$ and $\epsilon_2$ are small positive constants. Unlike existing methods that estimate the canonical rank, we do not constrain the CP factors to be orthogonal [11], [13]. However, since minimizing the l_0 norm is NP-hard [5], we minimize the l_1 norm of $\lambda$ instead. Eq. (3) becomes:
$\min_{\lambda, \tilde{\mathcal{G}}, \{U_n\}_n} \gamma_1 \|\lambda\|_1 \quad \text{s.t.} \quad \|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|_F^2 \leq \epsilon_1, \quad \|\tilde{\mathcal{G}}\|_F^2 \leq \epsilon_2, \quad (4)$
where $\gamma_1$ is a strictly positive hyper-parameter.
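Since the l_1 relaxation is ultimately handled by soft thresholding, the following minimal sketch (our illustration, with a hypothetical threshold value) shows how the operator zeroes small superdiagonal entries, which is what yields the rank estimate:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1: shrinks entries toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# A hypothetical superdiagonal with two near-zero entries
lam = np.array([2.0, 0.3, -0.05, 1.2, 0.01])
lam_sparse = soft_threshold(lam, 0.1)
# Entries below the threshold are set exactly to zero; the number of
# surviving entries is the rank estimate
assert np.count_nonzero(lam_sparse) == 3
```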

A. DERIVATION OF OUR APPROACH (FARAC)
In this section, we present the proposed approach for solving the constrained optimization problem in eq. (4). Recall that the Lagrangian function is the objective function augmented with the constraint equations through the Lagrange multipliers. The Lagrangian function of the problem in eq. (4) is thus given by:
$L_{\mathcal{G}, \{U_n\}} = \gamma_1 \|\lambda\|_1 + \gamma_2 \|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|_F^2 + \gamma_3 \|\tilde{\mathcal{G}}\|_F^2, \quad (5)$
where $\gamma_2$ and $\gamma_3$ are two strictly positive Lagrange multipliers. Following the Lagrangian method [35], $L_{\mathcal{G}, \{U_n\}}$ is minimized with respect to $\{U_n\}_n$, $\tilde{\mathcal{G}}$ and $\lambda$. We proceed iteratively. At each iteration, we first minimize $L_{\mathcal{G}, \{U_n\}}$ w.r.t. the n-th CP factor $U_n$, assuming the remaining factors and $\mathcal{G}$ are known; this is a classical linear regression problem. We then minimize $L_{\mathcal{G}, \{U_n\}}$ w.r.t. $\tilde{\mathcal{G}}$ using the factors $\{U_n\}_n$ from the previous step. The convexity w.r.t. $\tilde{\mathcal{G}}$ is demonstrated in Appendix B, and $\tilde{\mathcal{G}}$ is updated using the Adam optimizer [7]. Finally, we minimize $L_{\mathcal{G}, \{U_n\}}$ w.r.t. $\lambda$, which is also a convex optimization problem as shown in Appendix B; its solution is given by the soft-thresholding operator [34] applied with the current updates of $\tilde{\mathcal{G}}$ and $\{U_n\}_n$. The final expressions of the exact solutions for the CP factors, the gradient of the Lagrangian w.r.t. $\tilde{\mathcal{G}}$, and the update formula for the superdiagonal of $\mathcal{G}$ are given below. Details of the computations can be found in Appendix A.
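The alternating scheme just described can be sketched in NumPy for a third-order tensor. This is a simplified illustration under stated assumptions, not the authors' implementation: plain gradient steps replace Adam, and the penalty weights `gamma1`, `gamma2`, `gamma3` and step size `lr` are hypothetical:

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def tucker3(G, U):
    return np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def farac_sweep(X, G, U, gamma1=0.1, gamma2=1.0, gamma3=0.1, lr=0.01):
    """One alternating pass: exact least-squares factor updates, a gradient
    step on the off-diagonal core entries (Adam in the paper), and a
    soft-thresholding step on the superdiagonal."""
    R = G.shape[0]
    diag = (np.arange(R),) * 3
    # 1) Each factor update is a linear regression X_(n) ~ U_n W_n
    for n in range(3):
        V = list(U)
        V[n] = np.eye(R)                 # leave mode n open: W_n = (G x_m U_m)_(n)
        W = unfold(tucker3(G, V), n)
        U[n] = np.linalg.lstsq(W.T, unfold(X, n).T, rcond=None)[0].T
    # 2) Gradient step on the off-diagonal core entries
    E = tucker3(G, U) - X
    grad = 2 * gamma2 * np.einsum('ijk,ia,jb,kc->abc', E, U[0], U[1], U[2])
    off = grad + 2 * gamma3 * G          # gradient of gamma3 * ||offdiag(G)||^2
    off[diag] = 0.0                      # the superdiagonal is handled in step 3
    G -= lr * off
    # 3) Proximal (soft-thresholding) step on the superdiagonal lambda
    E = tucker3(G, U) - X
    g_lam = 2 * gamma2 * np.einsum('ijk,ir,jr,kr->r', E, U[0], U[1], U[2])
    G[diag] = soft_threshold(G[diag] - lr * g_lam, lr * gamma1)
    return G, U

# Tiny usage check: a rank-2 tensor of size 4 x 4 x 4, rank bound R_0 = 3
rng = np.random.default_rng(0)
F = [rng.standard_normal((4, 2)) for _ in range(3)]
X = np.einsum('ir,jr,kr->ijk', F[0], F[1], F[2])
G = rng.standard_normal((3, 3, 3))
U = [rng.standard_normal((4, 3)) for _ in range(3)]
loss0 = np.linalg.norm(tucker3(G, U) - X)
for _ in range(5):
    G, U = farac_sweep(X, G, U)
assert np.linalg.norm(tucker3(G, U) - X) < loss0
```

The exact update formulas used by FARAC (eqs. (7)-(9)) differ in detail; the sketch only conveys the three-block alternation.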
2: Compute the gradient of $L_{\mathcal{G}, \{U_n\}}$ w.r.t. $\tilde{\mathcal{G}}$ using (7).
3: Update the biased first moment estimate of the off-diagonal entries: $m \leftarrow \beta_1 m + (1 - \beta_1) \nabla_{\tilde{\mathcal{G}}} L$.
4: Update the biased second raw moment estimate of the off-diagonal entries: $v \leftarrow \beta_2 v + (1 - \beta_2) (\nabla_{\tilde{\mathcal{G}}} L)^2$.
5: Compute the bias-corrected first and second moment estimates of the off-diagonal entries: $\hat{m} = m / (1 - \beta_1^t)$, $\hat{v} = v / (1 - \beta_2^t)$.
6: Update $\tilde{\mathcal{G}}$: $\tilde{\mathcal{G}} \leftarrow \tilde{\mathcal{G}} - \alpha \, \hat{m} / (\sqrt{\hat{v}} + \epsilon)$.
7: Update the superdiagonal of $\mathcal{G}$ using eq. (8), where the soft-thresholding operator is defined in eq. (9).
The whole algorithm of our derived approach is described in Algorithm 1.
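Steps 2-6 above are the standard Adam recursion of [7]. A minimal self-contained version (ours, applied here to a scalar toy problem rather than to the off-diagonal core entries, with a toy step size `alpha=0.1` instead of the paper's 0.001) reads:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: biased moments, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad           # step 3: first moment
    v = beta2 * v + (1 - beta2) * grad ** 2      # step 4: second raw moment
    m_hat = m / (1 - beta1 ** t)                 # step 5: bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # step 6
    return theta, m, v

# Toy check: minimize f(x) = x^2 (gradient 2x) starting from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
assert abs(x) < 0.5
```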

V. COMPLEXITY ANALYSIS
• We evaluate the complexity of Algorithm 1 by taking into account the SVDs performed at initialization and the per-iteration updates, where T is the number of iterations.

VI. NUMERICAL EXPERIMENTS
All the experiments are conducted on a computer with an Intel Core i7 9th-generation 2.6 GHz processor and 32 GB of RAM, running Windows 10.
It should be noted that the gradient computations are performed automatically with the TensorFlow framework [9]. The learning rate and the exponential decay rates used in the Adam optimizer are set to their default values (α = 0.001, β1 = 0.9, β2 = 0.999).

A. NUMERICAL SIMULATIONS
1) SYNTHETIC DATA
We create a synthetic rank-R real-valued tensor X of order N using the CP model. The CP factors are drawn from a single realization of the standard normal distribution. B is a zero-mean, unit-variance white noise tensor. Our noisy data tensor is given by eq. (11) below.

FIGURE 3. Convergence curve of the mean reconstruction error, measured by the Relative Squared Error (RSE), along iterations using FARAC. We used a noisy tensor X_noise of size 5 × 5 × 5 and an SNR of 25 dB. The threshold parameter is equal to 0.02.

FIGURE 4.
Mean principal angle between the three subspaces spanned by the CP factors along iterations using our method. We used a noisy tensor X_noise of size 5 × 5 × 5 and an SNR of 25 dB. The threshold parameter is equal to 0.02.
The noisy tensor is given by:
$\mathcal{X}_{noise} = \bar{\mathcal{X}} + \sigma \bar{\mathcal{B}}, \quad (11)$
where $\bar{\mathcal{X}} = \mathcal{X} / \|\mathcal{X}\|_F$ and $\bar{\mathcal{B}} = \mathcal{B} / \|\mathcal{B}\|_F$. Hence, the SNR is calculated using the following formula:
$\mathrm{SNR} = 10 \log_{10}\!\left(\frac{\|\bar{\mathcal{X}}\|_F^2}{\sigma^2 \|\bar{\mathcal{B}}\|_F^2}\right) = -20 \log_{10}(\sigma).$
FARAC has been run on a rank-2 tensor X of size I × I × I with R_0 = I = 5. Unless other settings are indicated, experiments are conducted on a tensor with these parameters. Similar results are obtained for tensors of other orders and sizes. Using X and eq. (11), we generate 100 noisy realizations of the input tensor. Accuracy is defined as the proportion of realizations for which the estimated rank is correct.

FIGURE 5. FARAC vs. CORCONDIA accuracy w.r.t. SNR for a noisy tensor X_noise with SNR values ranging from 0 dB to 40 dB. X_noise is of size 5 × 5 × 5. Different CP models with ranks ranging from 1 to 5 are used to fit CORCONDIA. R = 2 is the true rank.
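The data generation and SNR formula above can be reproduced with a few lines of NumPy (our sketch; the random seed is illustrative, and the SNR expression assumes the unit-norm signal and noise of eq. (11)):

```python
import numpy as np

rng = np.random.default_rng(42)
I, R = 5, 2

# Rank-2 ground truth from standard normal CP factors
U = [rng.standard_normal((I, R)) for _ in range(3)]
X = np.einsum('ir,jr,kr->ijk', U[0], U[1], U[2])
B = rng.standard_normal((I, I, I))            # white Gaussian noise

Xb = X / np.linalg.norm(X)                    # unit Frobenius norm signal
Bb = B / np.linalg.norm(B)                    # unit Frobenius norm noise
sigma = 10 ** (-25 / 20)                      # sigma giving a 25 dB SNR
X_noise = Xb + sigma * Bb

snr = 20 * np.log10(np.linalg.norm(Xb) / np.linalg.norm(sigma * Bb))
assert abs(snr - 25.0) < 1e-9
```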

FIGURE 6.
Accuracy of rank estimation of a noisy tensor X_noise with low SNR values ranging from 0 dB to 10 dB using FARAC. The size of X_noise is 5 × 5 × 5. The rank upper bound used is R_0 = 5, while the true rank is R = 2.

2) REAL DATASETS
• Amino acid fluorescence: As described in [31], the data set includes the excitation and emission spectra of five samples of different concentrations of tyrosine, tryptophane and phenylalanine, forming a tensor of 5 (samples) × 51 (excitation) × 201 (emission). This dataset can be described by a rank-3 CP model.
• Sugar process data [32]: The dataset contains 265 samples that can be arranged in a third-order tensor of size 265 × 571 × 7. The first mode relates to samples, the second to emission wavelengths, and the third to excitation wavelengths. The (i, j, k)-th element of this tensor, X(i, j, k), represents the emission intensity measured from sample i, excited at wavelength k, and measured at wavelength j. This dataset is modeled by a rank-4 CP model.

FIGURE 7. CORCONDIA accuracy for a noisy tensor X_noise with SNR values ranging up to 5 dB. The size of X_noise is 5 × 5 × 5. Different CP models with ranks ranging from 1 to 5 are used to fit CORCONDIA. R = 2 is the true rank.
• Dorrit fluorescence data [15]: Fluorescence spectrometer was used to measure 27 synthetic samples containing different concentrations of four analytes (hydroquinone, tryptophan, phenylalanine and dopa). Each fluorescence landscape corresponds to 233 emission wavelengths and 24 excitation wavelengths. A rank-4 CP model is used to model this dataset, which is represented as a tensor of size 27 × 233 × 24.

1) SYNTHETIC DATA
• The accuracy of the estimated ranks using the proposed approach is illustrated in Fig. 2. We can clearly see that the FARAC approach estimates the exact canonical rank while being robust to the choice of the threshold parameter.
• Fig. 3 depicts the convergence curve of the proposed method in terms of the Relative Squared Error (RSE) at each iteration t, which is given by:
$\mathrm{RSE}(t) = \frac{\|\mathcal{X} - \hat{\mathcal{X}}^{(t)}\|_F^2}{\|\mathcal{X}\|_F^2},$
where $\hat{\mathcal{X}}^{(t)}$ is the tensor reconstructed from the estimates at iteration t. We can see that the reconstruction error quickly decreases to zero over the iterations.
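A direct implementation of this error measure (our sketch) is:

```python
import numpy as np

def rse(X, X_hat):
    """Relative squared error between the data tensor and its reconstruction."""
    return np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2

X = np.ones((3, 3, 3))
assert rse(X, X) == 0.0
# A 10% uniform shrinkage of every entry gives a 1% relative squared error
assert abs(rse(X, 0.9 * X) - 0.01) < 1e-12
```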
• The recovery of the CP factors is shown in Fig. 4 by checking the subspace error. This can be done by computing the principal angle [36] between the true subspaces and the estimated ones at each iteration t:
$\theta_n^{(t)} = \angle\!\left(\mathrm{span}(U_n), \, \mathrm{span}(\hat{U}_n^{(t)})\right),$
where $U_n$ and $\hat{U}_n^{(t)}$ are the exact and the estimated factors at iteration t, respectively. We see that the principal angle between the subspaces of the three CP factors converges to a low value as the iterations proceed.
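Principal angles between two column spaces can be computed from the singular values of $Q_1^T Q_2$, where $Q_1, Q_2$ are orthonormal bases of the two spaces. The following NumPy sketch (ours, not the paper's code) illustrates the metric used in Fig. 4:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # cosines of the angles
    return np.arccos(np.clip(s, -1.0, 1.0))

rng = np.random.default_rng(3)
U = rng.standard_normal((6, 2))
# The same column space under an invertible mixing gives (near) zero angles
angles = principal_angles(U, U @ np.array([[2.0, 1.0], [0.0, 3.0]]))
assert np.allclose(angles, 0.0, atol=1e-5)
```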
• In Fig. 5, we compare the FARAC method with CORCONDIA with respect to the SNR. We can see that FARAC clearly outperforms the state-of-the-art method over a large range of SNR values.
• In Table 1, we show the accuracy of the rank estimation using FARAC when the true rank exceeds one of the dimensions of the input tensor. As we can see, FARAC can handle this difficult scenario for a large range of SNR values. The CORCONDIA approach, on the other hand, is ineffective in that situation [17].
• According to Fig. 6, at very low SNR, FARAC tends to overestimate the true rank. In contrast, rank estimation using the CORCONDIA method becomes very difficult at low SNR, as shown in Fig. 7. Compared with the state-of-the-art method, FARAC shows very good rank estimation accuracy over a large range of SNR values while being robust to the choice of the threshold parameter. FARAC also handles the difficult case where the rank is larger than one of the dimensions of the input tensor.

2) REAL DATASETS
• In Figs. 8 and 9, we present the convergence of the FARAC loss, as well as the rank estimate over iterations, on the amino acid fluorescence, the Dorrit fluorescence, and the sugar process datasets. As shown in these figures, the rank is well estimated for the three real datasets. The threshold parameter is selected so that the reconstruction loss curve exhibits good convergence properties; depending on the noise level, one can use a grid search over the values [0.1, 0.01, 0.001]. It is important to note that the FARAC method is robust to the choice of the threshold, as shown in Fig. 2. For the amino acid and Dorrit fluorescence datasets, the threshold parameter is set to 0.01; for the sugar process dataset, it is set to 0.001.

VII. CONCLUSION
We have addressed in this work the challenging problem of canonical rank estimation by formulating it as a convex optimization problem. The proposed method, called FARAC, jointly estimates the canonical rank and the CP factors.
We have compared FARAC to the well-known CORCON-DIA method and found that FARAC is much more accurate.
We have also shown that FARAC is strongly robust to the choice of the threshold parameter and can handle the difficult case in which the rank exceeds one of the dimensions of the tensor, unlike the CORCONDIA method. Last but not least, we have illustrated the efficiency of the FARAC method on real datasets.

A. DETAILED COMPUTATIONS FOR THE DERIVATION OF FARAC
Let us recall the Lagrangian of problem (4) that we want to minimize w.r.t. $\{U_n\}_n$, $\tilde{\mathcal{G}}$ and $\lambda$:
$L_{\mathcal{G}, \{U_n\}} = \gamma_1 \|\lambda\|_1 + \gamma_2 \|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|_F^2 + \gamma_3 \|\tilde{\mathcal{G}}\|_F^2.$
• We denote by $L_{U_n}$ the part of $L_{\mathcal{G}, \{U_n\}}$ which depends only on $U_n$:
$L_{U_n} = \gamma_2 \|\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N\|_F^2. \quad (12)$
The matricized form of eq. (12) is given by:
$L_{U_n} = \gamma_2 \|X_{(n)} - U_n W_n\|_F^2, \quad \text{with } W_n = G_{(n)} (U_N \otimes \cdots \otimes U_{n+1} \otimes U_{n-1} \otimes \cdots \otimes U_1)^T.$
In contrast to the derivation of a classical CPD, $G_{(n)}$ is not the identity matrix. The gradient of $L_{U_n}$ w.r.t. $U_n$ can be calculated as in a traditional matrix linear regression problem:
$\nabla_{U_n} L_{\mathcal{G}, \{U_n\}} = -2\gamma_2 (X_{(n)} - U_n W_n) W_n^T.$
We set $\nabla_{U_n}(L_{\mathcal{G}, \{U_n\}})$ to 0 and get the exact formula for $U_n$:
$U_n = X_{(n)} W_n^T (W_n W_n^T)^{-1}.$
• The part of $L_{\mathcal{G}, \{U_n\}}$ that depends only on $\tilde{\mathcal{G}}$ is denoted by $L_{\tilde{\mathcal{G}}}$:
$L_{\tilde{\mathcal{G}}} = \gamma_2 \|\mathcal{X} - (\tilde{\mathcal{G}} + \mathrm{diag}(\lambda)) \times_1 U_1 \cdots \times_N U_N\|_F^2 + \gamma_3 \|\tilde{\mathcal{G}}\|_F^2.$
• We want to differentiate $L_{\mathcal{G}, \{U_n\}}$ w.r.t. $\lambda$. Let us first rewrite the reconstruction term as follows:
$\mathcal{X} - \mathcal{G} \times_1 U_1 \cdots \times_N U_N = \mathcal{E} - \sum_{l=1}^{R_0} \lambda_l \, u_{1,l} \circ \cdots \circ u_{N,l}, \quad \text{with } \mathcal{E} = \mathcal{X} - \tilde{\mathcal{G}} \times_1 U_1 \cdots \times_N U_N.$
• We denote by $L_\lambda$ the part of $L_{\mathcal{G}, \{U_n\}}$ which depends only on $\lambda$. $L_\lambda$ is given as follows:
$L_\lambda = \gamma_1 \|\lambda\|_1 + \gamma_2 \left\|\mathcal{E} - \sum_{l=1}^{R_0} \lambda_l \, u_{1,l} \circ \cdots \circ u_{N,l}\right\|_F^2.$
Let us differentiate $L_\lambda$ with respect to $\lambda_l$ and set it to 0; the resulting update is the soft-thresholding operator applied to the least-squares estimate of $\lambda_l$.
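The closed-form factor update derived above can be verified numerically. In this NumPy sketch (ours), the normal-equation solution recovers the true factor exactly on noiseless data; the unfolding convention and helper names are our own:

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

rng = np.random.default_rng(7)
I, R0 = (4, 5, 6), 3
G = rng.standard_normal((R0, R0, R0))
U = [rng.standard_normal((i, R0)) for i in I]
X = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

# X_(0) = U_0 W_0 with W_0 built from G and the other factors, hence
# U_0 = X_(0) W_0^T (W_0 W_0^T)^{-1}  (the normal-equation solution)
W0 = unfold(np.einsum('abc,jb,kc->ajk', G, U[1], U[2]), 0)
U0_hat = unfold(X, 0) @ W0.T @ np.linalg.inv(W0 @ W0.T)
assert np.allclose(U0_hat, U[0])
```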
B. CONVEXITY ANALYSIS
In this section, we demonstrate that the Lagrangian in eq. (14) is convex w.r.t. $\lambda$ and $\tilde{\mathcal{G}}$, so that the global minimum can be reached. To do so, we show that the Hessians w.r.t. $\lambda$ and $\tilde{\mathcal{G}}$ are positive semi-definite matrices.
• Convexity w.r.t. $\tilde{\mathcal{G}}$: We first place the elements $\tilde{\mathcal{G}}(r_1, \ldots, r_N)$ in a vector $x \in \mathbb{R}^{R_0^N - R_0}$ and use the same method as for $\lambda$. Let $H_{\tilde{\mathcal{G}}}$ be the Hessian of $L_{\mathcal{G}, \{U_n\}}$ w.r.t. $\tilde{\mathcal{G}}$. Using (13), we compute the second derivatives of $L_{\mathcal{G}, \{U_n\}}$; these derivatives are taken only over the off-diagonal entries of $\mathcal{G}$, since the superdiagonal of $\tilde{\mathcal{G}}$ is zero. For any vector $x$, the corresponding quadratic form satisfies
$x^T H_{\tilde{\mathcal{G}}} x = \gamma_2 \|Z\|^2 + \gamma_3 \|x\|^2 \geq 0,$
where $Z$ is a linear function of $x$ determined by the factors, since $\gamma_2$ and $\gamma_3$ are strictly positive values.