Hyper-Laplacian Regularized Multi-View Subspace Clustering With a New Weighted Tensor Nuclear Norm

In this paper, we present WHLR-MSC, a hyper-Laplacian regularized method with a new weighted tensor nuclear norm for multi-view subspace clustering. Specifically, we first stack the subspace representation matrices of the different views into a tensor, which neatly captures the higher-order correlations between the views. Second, in order to let the singular values contribute differently to the tensor nuclear norm based on the tensor Singular Value Decomposition (t-SVD), we constrain the constructed tensor with a weighted tensor nuclear norm, which captures the class discrimination information of the sample distribution more accurately. Third, since from a geometric point of view the data are usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space, the WHLR-MSC model uses hyper-Laplacian graph regularization to capture the local geometric structure of the data. An effective algorithm is proposed for solving the optimization problem of the WHLR-MSC model. Extensive experiments on five benchmark image datasets show the effectiveness of the proposed WHLR-MSC method.


I. INTRODUCTION
In the era of big data, data are usually generated from different sources or collected from different views; such data are called multi-view data [1]. In practical applications, many image datasets come with multiple view representations. For example, a scene can be described by two views, image and text, and a picture can be described by three views, edge, Fourier and texture. Because multi-view data carry both consistent and complementary information, they have received extensive attention from researchers in recent years, and a series of multi-view clustering algorithms have been produced. At present, these can be roughly divided into five categories: co-training methods [2]-[4], matrix factorization methods [5]-[7], multi-kernel learning methods [8]-[10], subspace learning methods [11]-[15] and graph-based methods [16]-[20]. In this paper, we mainly study subspace clustering methods for multi-view data.
In recent years, subspace clustering has become one of the research hotspots in the fields of signal processing [21], computer vision (such as face clustering [22] and motion segmentation [23]), and pattern recognition [24]. In fact, data in a high-dimensional space are often located in low-dimensional subspaces. The goal of subspace clustering is to divide the dataset in the high-dimensional space into different clusters, each of which corresponds to a low-dimensional subspace. Low-rank representation (LRR) is one of the most widely used techniques in subspace clustering [25]. Inspired by this, Zhang et al. [12] propose the low-rank tensor constrained multi-view subspace clustering (LT-MSC) method. LT-MSC applies a tensor nuclear norm constraint to the tensor constructed from the representation matrices of the different views, and effectively utilizes the relevant information between views. However, it ignores the local geometric structure of the data samples embedded in the high-dimensional ambient space. To address this problem, Lu et al. [13] propose the hyper-Laplacian regularized multi-view subspace clustering with low-rank tensor constraint (HLR-MSCLRT) method. HLR-MSCLRT uses hyper-Laplacian graph regularization to preserve the local geometric structure embedded in the high-dimensional ambient space, and further improves the clustering performance. Xie et al. [26] propose a method for unifying multi-view self-representations for clustering by tensor multi-rank minimization (t-SVD-MSC); t-SVD-MSC stacks the subspace representation matrices of the different views into a 3-order tensor, and updates this tensor by t-SVD based tensor nuclear norm minimization. Wu et al. [27] construct a 3-order tensor by stacking all transition probability matrices, and then recover an essential tensor from it by t-SVD based tensor nuclear norm minimization. The t-SVD based tensor nuclear norm is an effective convex relaxation of the rank minimization model, and has achieved impressive results in multi-view clustering. However, most existing t-SVD based tensor nuclear norm minimization methods do not treat different singular values differently. This ignores the prior information carried by the singular values, which degrades clustering performance.
In order to solve the above problems, in this paper, we propose a hyper-Laplacian regularized multi-view subspace clustering with a new weighted tensor nuclear norm (WHLR-MSC) method.
The main contributions of our work are summarized as follows:
1) Our algorithm stacks the subspace representation matrices of the different views into a tensor and imposes a low-rank constraint on it to capture the global structure of the data, while a hyper-Laplacian graph regularization term captures the local geometric structure of the data; together, these effectively utilize the relevant information between views.
2) We use the proposed weighted tensor nuclear norm minimization to impose different weights on different singular values. Thus, the prior knowledge about each singular value is fully considered, making the algorithm more flexible in practical applications.
3) Extensive experimental results on five image datasets show that the proposed method is significantly superior to state-of-the-art methods.
The remainder of this paper is organized as follows. Section II briefly reviews hypergraph-related knowledge and the tensor nuclear norm. In Section III, we present our method, including the weighted tensor nuclear norm, the proposed WHLR-MSC model, the related optimization details, the connection with related low-rank tensor methods, and the corresponding convergence and complexity analysis. In Section IV, we give the experimental results and analysis. Finally, we conclude the paper in Section V.

II. RELATED WORK
This section briefly summarizes the hypergraph related knowledge and tensor nuclear norm.

A. HYPERGRAPH RELATED KNOWLEDGE
For a hypergraph $G = (V, E, S)$, $E = \{e_j\}$ is the set of hyperedges. Each hyperedge $e_j$ can connect more than two vertices and can be regarded as a subset of the vertex set $V$. When each hyperedge contains only two vertices, the hypergraph degenerates into an ordinary graph. $S$ represents the weight matrix of the hypergraph, and its element $s(e_j)$ is the weight of hyperedge $e_j$. A hypergraph can be represented by a $|V| \times |E|$ incidence matrix $\mathbf{H}$, whose elements are defined as $h(v_i, e_j) = 1$ if $v_i \in e_j$ and $h(v_i, e_j) = 0$ otherwise. In addition, in order to better describe the hypergraph, the degree of a vertex $v_i$ and the degree of a hyperedge $e_j$ are defined as $d(v_i) = \sum_{e_j \in E} s(e_j)\, h(v_i, e_j)$ and $d(e_j) = \sum_{v_i \in V} h(v_i, e_j)$, respectively. We use diagonal matrices $\mathbf{D}_V$ and $\mathbf{D}_E$ to denote the vertex degree matrix and the hyperedge degree matrix, whose diagonal elements are $d(v_i)$ and $d(e_j)$, respectively.
According to [28], the Laplacian matrix of the hypergraph is defined as
$$\mathbf{L}_h = \mathbf{I} - \mathbf{D}_V^{-1/2}\mathbf{H}\mathbf{S}\mathbf{D}_E^{-1}\mathbf{H}^{\mathsf T}\mathbf{D}_V^{-1/2}. \tag{1}$$
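To make the construction concrete, the following minimal Python/NumPy sketch (not from the original paper; the function name and the dense-matrix representation are our own choices for illustration) computes the normalized hypergraph Laplacian of Eq. (1) from an incidence matrix and hyperedge weights.

import numpy as np

def hypergraph_laplacian(H, s):
    """Normalized hypergraph Laplacian following Zhou et al. [28].

    H : (n_vertices, n_edges) binary incidence matrix, H[i, j] = 1 iff v_i in e_j.
    s : (n_edges,) hyperedge weights.
    Assumes every vertex belongs to at least one hyperedge (no zero degrees).
    """
    S = np.diag(s)                      # hyperedge weight matrix
    d_v = H @ s                         # vertex degrees: weighted incident edges
    d_e = H.sum(axis=0)                 # hyperedge degrees: vertices per edge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    Theta = Dv_inv_sqrt @ H @ S @ De_inv @ H.T @ Dv_inv_sqrt
    # L_h = I - Dv^{-1/2} H S De^{-1} H^T Dv^{-1/2}
    return np.eye(H.shape[0]) - Theta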

B. TENSOR NUCLEAR NORM
A tensor is a multidimensional array and a generalization of the concept of a matrix. The tensor nuclear norm in [29] is defined as
$$\|\mathcal{A}\|_* = \sum_{m=1}^{M} \zeta_m \big\|\mathbf{A}_{(m)}\big\|_*, \tag{2}$$
where the $\zeta_m$ are constants with $\zeta_m > 0$ and $\sum_{m=1}^{M}\zeta_m = 1$, and $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_M}$. $\mathbf{A}_{(m)}$ denotes the mode-$m$ matricization of the tensor $\mathcal{A}$, defined as $\mathrm{unfold}_m(\mathcal{A}) = \mathbf{A}_{(m)} \in \mathbb{R}^{I_m \times (I_1 \cdots I_{m-1} I_{m+1} \cdots I_M)}$, and $\|\cdot\|_*$ denotes the matrix nuclear norm.
The optimal solution of problem (2) can be obtained by solving $M$ independent optimization problems. Let $b$ denote the number of singular values of $\mathbf{A}_{(m)}$; the $m$-th ($m = 1, 2, \cdots, M$) subproblem involves
$$\zeta_m \big\|\mathbf{A}_{(m)}\big\|_* = \zeta_m \sum_{i=1}^{b} \sigma_i\big(\mathbf{A}_{(m)}\big), \tag{3}$$
where $\sigma_i(\mathbf{A}_{(m)})$ denotes the $i$-th largest singular value of $\mathbf{A}_{(m)}$.
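As a quick illustration (not part of the original paper), the following Python/NumPy sketch evaluates Eq. (2): unfold implements the mode-m matricization and tensor_nuclear_norm sums the weighted nuclear norms of the unfoldings; both function names are ours.

import numpy as np

def unfold(A, m):
    # Mode-m matricization: move axis m to the front and flatten the rest.
    return np.moveaxis(A, m, 0).reshape(A.shape[m], -1)

def tensor_nuclear_norm(A, zeta):
    # Weighted sum of the nuclear norms of the mode-m unfoldings (Eq. (2)).
    # zeta: positive constants summing to 1, one per mode.
    return sum(z * np.linalg.norm(unfold(A, m), ord='nuc')
               for m, z in enumerate(zeta))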

III. THE PROPOSED METHOD
In this section, we first give the weighted tensor nuclear norm and the objective function of our proposed WHLR-MSC method, then introduce an iterative alternating method for WHLR-MSC optimization, and finally discuss the relationship between WHLR-MSC and related low-rank tensor methods and analyze its convergence and complexity.

A. WEIGHTED TENSOR NUCLEAR NORM
In problem (3), the same parameter $\zeta_m$ is applied to every singular value, which is not very reasonable: different singular values may have different importance and should therefore be treated differently. For example, for any given image there are large differences among its non-zero singular values, especially between the first few large singular values and the last few small ones; the larger singular values are generally associated with the salient parts of the image. Therefore, in order to retain these salient parts, the larger singular values should be shrunk less, which existing tensor nuclear norm minimization does not take into account. To address this problem, we introduce the prior information of the singular values and propose the following weighted tensor nuclear norm:
$$\|\mathcal{A}\|_{\omega,*} = \sum_{m=1}^{M} \zeta_m \sum_{i=1}^{b} \omega_i\, \sigma_i\big(\mathbf{A}_{(m)}\big), \tag{4}$$
where $\|\mathcal{A}\|_{\omega,*}$ is called the weighted nuclear norm of the tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_M}$ and $\omega_i$ denotes the $i$-th element of the weight vector $\omega$.
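For a single unfolding, the proximal step associated with Eq. (4) is weighted singular value thresholding: the i-th singular value is shrunk by an amount proportional to ω_i, so assigning smaller weights to the larger singular values shrinks them less. Below is a minimal Python/NumPy sketch of this operator (our own illustration; weighted_svt is a hypothetical helper name, ω is assumed non-decreasing so that the thresholding is a valid proximal step, and ω is assumed to have at least as many entries as there are singular values).

import numpy as np

def weighted_svt(F, omega, tau):
    # Weighted singular value thresholding: shrink the i-th singular value
    # of F by tau * omega[i]; small weights on the leading singular values
    # preserve the dominant structure of F.
    U, sigma, Vt = np.linalg.svd(F, full_matrices=False)
    w = np.asarray(omega, dtype=float)[:len(sigma)]
    beta = np.maximum(sigma - tau * w, 0.0)
    return (U * beta) @ Vt   # U diag(beta) V^T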

B. PROPOSED WHLR-MSC
The objective function of our proposed WHLR-MSC method is as follows:
$$\begin{aligned} &\min_{\mathcal{A},\,\mathbf{B}}\ \|\mathcal{A}\|_{\omega,*} + \lambda_1\sum_{v=1}^{V}\operatorname{tr}\big(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T}\big) + \lambda_2\|\mathbf{B}\|_{2,1}\\ &\ \text{s.t.}\ \ \mathbf{X}^{(v)} = \mathbf{X}^{(v)}\mathbf{A}^{(v)} + \mathbf{B}^{(v)},\quad \mathcal{A} = \Phi\big(\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \cdots, \mathbf{A}^{(V)}\big),\quad \mathbf{B} = \big[\mathbf{B}^{(1)}; \mathbf{B}^{(2)}; \cdots; \mathbf{B}^{(V)}\big], \end{aligned} \tag{5}$$
where $\lambda_1$ and $\lambda_2$ are balance parameters, $\Phi(\cdot)$ stacks the different subspace representation matrices into a 3-way tensor $\mathcal{A}$, $\operatorname{tr}(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T})$ is the hyper-Laplacian regularized term, and $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$-norm. By using Eqs. (4) and (5), we obtain
$$\begin{aligned} &\min\ \sum_{m=1}^{3}\gamma_m\sum_{i=1}^{b}\omega_i\,\sigma_i\big(\mathbf{A}_{(m)}\big) + \sum_{v=1}^{V}\operatorname{tr}\big(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T}\big) + \gamma\|\mathbf{B}\|_{2,1}\\ &\ \text{s.t.}\ \ \mathbf{X}^{(v)} = \mathbf{X}^{(v)}\mathbf{A}^{(v)} + \mathbf{B}^{(v)},\quad \mathcal{A} = \Phi\big(\mathbf{A}^{(1)}, \cdots, \mathbf{A}^{(V)}\big),\quad \mathbf{B} = \big[\mathbf{B}^{(1)}; \mathbf{B}^{(2)}; \cdots; \mathbf{B}^{(V)}\big], \end{aligned} \tag{6}$$
where $\gamma_m = \zeta_m/\lambda_1 > 0$ and $\gamma = \lambda_2/\lambda_1$. An inexact augmented Lagrange multiplier (ALM) method [30] is used to solve our objective function. In order to make the objective function separable, we introduce auxiliary variables $\mathbf{G}_m$ in place of the unfoldings $\mathbf{A}_{(m)}$:
$$\begin{aligned} &\min\ \sum_{m=1}^{3}\gamma_m\|\mathbf{G}_m\|_{\omega,*} + \sum_{v=1}^{V}\operatorname{tr}\big(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T}\big) + \gamma\|\mathbf{B}\|_{2,1}\\ &\ \text{s.t.}\ \ \mathbf{X}^{(v)} = \mathbf{X}^{(v)}\mathbf{A}^{(v)} + \mathbf{B}^{(v)},\quad \mathbf{P}_m\mathbf{a} = \mathbf{g}_m,\quad \mathbf{B} = \big[\mathbf{B}^{(1)}; \mathbf{B}^{(2)}; \cdots; \mathbf{B}^{(V)}\big], \end{aligned} \tag{7}$$
where $\mathbf{g}_m$ and $\mathbf{a}$ are the vectorizations of the matrix $\mathbf{G}_m$ and of the tensor $\mathcal{A}$, respectively. $\mathbf{P}_m$, an alignment matrix corresponding to the mode-$m$ unfolding, is a permutation matrix whose function is to align the corresponding elements of $\mathbf{G}_m$ and $\mathbf{A}_{(m)}$. We construct the augmented Lagrangian function for Eq. (7) as follows:
$$\begin{aligned} \mathcal{L} ={}& \sum_{m=1}^{3}\Big(\gamma_m\|\mathbf{G}_m\|_{\omega,*} + \langle\boldsymbol{\alpha}_m,\,\mathbf{P}_m\mathbf{a}-\mathbf{g}_m\rangle + \frac{\mu}{2}\|\mathbf{P}_m\mathbf{a}-\mathbf{g}_m\|_2^2\Big) + \gamma\|\mathbf{B}\|_{2,1}\\ &+ \sum_{v=1}^{V}\Big(\operatorname{tr}\big(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T}\big) + \langle\mathbf{Y}_v,\,\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}\rangle + \frac{\mu}{2}\big\|\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}\big\|_F^2\Big), \end{aligned} \tag{8}$$
where $\mu$ is the penalty parameter, and $\boldsymbol{\alpha}_m$ and $\mathbf{Y}_v$ are the Lagrange multipliers.

C. OPTIMIZATION OF WHLR-MSC
Update $\mathbf{A}^{(v)}$: Fixing $\mathbf{B}^{(v)}$ and $\mathbf{G}_m$, we solve the following subproblem to update $\mathbf{A}^{(v)}$:
$$\min_{\mathbf{A}^{(v)}}\ \operatorname{tr}\big(\mathbf{A}^{(v)}\mathbf{L}_h^{(v)}(\mathbf{A}^{(v)})^{\mathsf T}\big) + \frac{\mu}{2}\Big\|\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}+\frac{\mathbf{Y}_v}{\mu}\Big\|_F^2 + \frac{\mu}{2}\sum_{m=1}^{3}\big\|\mathbf{A}^{(v)}-\mathbf{C}_m^{(v)}\big\|_F^2, \tag{9}$$
where $\mathbf{C}_m^{(v)} = \Psi_v\big(\mathbf{P}_m^{\mathsf T}(\mathbf{g}_m - \boldsymbol{\alpha}_m/\mu)\big)$ and $\Psi_v(\cdot)$ represents the operation of selecting the elements corresponding to the $v$-th view and reshaping them into a matrix. Setting the derivative of Eq. (9) with respect to $\mathbf{A}^{(v)}$ to zero, we get the closed-form solution of $\mathbf{A}^{(v)}$ as the solution of the Sylvester equation
$$\Big(\big(\mathbf{X}^{(v)}\big)^{\mathsf T}\mathbf{X}^{(v)} + 3\mathbf{I}\Big)\mathbf{A}^{(v)} + \frac{1}{\mu}\mathbf{A}^{(v)}\Big(\mathbf{L}_h^{(v)} + \big(\mathbf{L}_h^{(v)}\big)^{\mathsf T}\Big) = \big(\mathbf{X}^{(v)}\big)^{\mathsf T}\Big(\mathbf{X}^{(v)}-\mathbf{B}^{(v)}+\frac{\mathbf{Y}_v}{\mu}\Big) + \sum_{m=1}^{3}\mathbf{C}_m^{(v)}. \tag{10}$$
Update $\mathbf{a}$: For the update of $\mathbf{a}$, we simply construct the tensor $\mathcal{A}$ from the updated matrices $\mathbf{A}^{(v)}$ and then vectorize it, i.e., $\mathbf{a} = \mathrm{vec}\big(\Phi(\mathbf{A}^{(1)}, \cdots, \mathbf{A}^{(V)})\big)$.
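Eq. (10) has the form C·A + A·D = E, a standard Sylvester equation. The sketch below is only a hedged illustration: the reconstruction of Eq. (10) above is ours, update_A is a hypothetical helper name, and C_list stands for the three terms C_m^(v).

import numpy as np
from scipy.linalg import solve_sylvester

def update_A(Xv, Bv, Yv, Lh, C_list, mu):
    # Left factor (X^T X + 3 I), right factor (L_h + L_h^T)/mu, and the
    # right-hand side of Eq. (10); solve_sylvester solves C A + A D = E.
    n = Xv.shape[1]
    C = Xv.T @ Xv + 3.0 * np.eye(n)
    D = (Lh + Lh.T) / mu
    E = Xv.T @ (Xv - Bv + Yv / mu) + sum(C_list)
    return solve_sylvester(C, D, E)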
Update $\mathbf{B}$: Fixing $\mathbf{A}^{(v)}$ and $\mathbf{G}_m$, we solve the following subproblem to update $\mathbf{B}$:
$$\min_{\mathbf{B}}\ \gamma\|\mathbf{B}\|_{2,1} + \frac{\mu}{2}\sum_{v=1}^{V}\Big\|\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}+\frac{\mathbf{Y}_v}{\mu}\Big\|_F^2. \tag{11}$$
According to [25], the optimal solution of Eq. (11) is as follows:
$$\mathbf{B}_{:,i} = \begin{cases} \dfrac{\|\mathbf{D}_{:,i}\|_2 - \gamma/\mu}{\|\mathbf{D}_{:,i}\|_2}\,\mathbf{D}_{:,i}, & \|\mathbf{D}_{:,i}\|_2 > \gamma/\mu,\\[4pt] \mathbf{0}, & \text{otherwise}, \end{cases} \tag{12}$$
where $\mathbf{D}_{:,i}$ represents the $i$-th column of $\mathbf{D} = [\mathbf{D}^{(1)}; \cdots; \mathbf{D}^{(V)}]$ with $\mathbf{D}^{(v)} = \mathbf{X}^{(v)} - \mathbf{X}^{(v)}\mathbf{A}^{(v)} + \mathbf{Y}_v/\mu$.
Update $\mathbf{G}_m$: Fixing $\mathbf{A}^{(v)}$ and $\mathbf{B}$, from Eq. (8) the subproblem for $\mathbf{G}_m$ is
$$\min_{\mathbf{G}_m}\ \gamma_m\|\mathbf{G}_m\|_{\omega,*} + \langle\boldsymbol{\alpha}_m,\,\mathbf{P}_m\mathbf{a}-\mathbf{g}_m\rangle + \frac{\mu}{2}\|\mathbf{P}_m\mathbf{a}-\mathbf{g}_m\|_2^2, \tag{13}$$
which is equivalent to
$$\min_{\mathbf{G}_m}\ \gamma_m\|\mathbf{G}_m\|_{\omega,*} + \frac{\mu}{2}\|\mathbf{G}_m-\mathbf{F}\|_F^2, \tag{14}$$
where $\Psi_m(\cdot)$ denotes the operation of reshaping a vector into the matrix corresponding to the mode-$m$ unfolding and $\mathbf{F} = \Psi_m\big(\mathbf{P}_m\mathbf{a} + \frac{\boldsymbol{\alpha}_m}{\mu}\big)$. Then, according to [31], the optimal solution of Eq. (14) is
$$\mathbf{G}_m = \mathbf{U}\,\mathcal{J}_{\frac{\gamma_m}{\mu}\omega}(\mathbf{F})\,\mathbf{V}^{\mathsf T}, \tag{15}$$
where $\mathbf{F} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf T}$ is the SVD of $\mathbf{F}$, $\mathcal{J}_{\frac{\gamma_m}{\mu}\omega}(\mathbf{F}) = \operatorname{diag}(\beta_1, \beta_2, \cdots, \beta_l)$ and $\beta_i = \max\big(\sigma_i(\mathbf{F}) - \frac{\gamma_m}{\mu}\omega_i,\ 0\big)$.
Update $\mathbf{g}_m$: We update $\mathbf{g}_m$ by directly replacing its elements with the corresponding elements of the updated $\mathbf{G}_m$, i.e.,
$$\mathbf{g}_m = \mathrm{vec}(\mathbf{G}_m). \tag{16}$$
Update $\boldsymbol{\alpha}_m$: The Lagrange multiplier $\boldsymbol{\alpha}_m$ is updated by
$$\boldsymbol{\alpha}_m = \boldsymbol{\alpha}_m + \mu(\mathbf{P}_m\mathbf{a}-\mathbf{g}_m), \tag{17}$$
and, likewise, $\mathbf{Y}_v$ is updated by the standard ALM step $\mathbf{Y}_v = \mathbf{Y}_v + \mu\big(\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}\big)$.
We summarize the entire solution procedure for Eq. (8) in Algorithm 1.
Algorithm 1 WHLR-MSC
1: Input: multi-view data {X^(v)} with V views, parameters γ_m's, γ, ω and cluster number K.
2: Initialize: A^(v) = B^(v) = 0, Y_v = 0, G_m = 0, α_m = 0; set μ, ρ, μ_max and ε.
3: While not converged do
4:   For v = 1 to V do
5:     Update A^(v) by Eq. (10).
6:     Update B by Eq. (12).
7:     Update Y_v.
8:   End
9:   Update a by vectorizing the tensor Φ(A^(1), ⋯, A^(V)).
10:  For m = 1 to 3 do
11:    Update G_m by Eq. (15).
12:    Update g_m by Eq. (16).
13:    Update α_m by Eq. (17).
14:  End
15:  Update the parameter μ by μ = min(ρμ, μ_max).
16:  Check the convergence conditions:
17:  ‖X^(v) − X^(v)A^(v) − B^(v)‖_∞ < ε and ‖P_m a − g_m‖_∞ < ε.
18: Obtain the similarity matrix by W = (1/V) Σ_{v=1}^{V} (|A^(v)| + |(A^(v))^T|).
19: Use the spectral clustering method on the similarity matrix W.
20: Output: Clustering result Q.
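For the B update of Eq. (12), the column-wise ℓ2,1 shrinkage can be written compactly. The sketch below is our own illustration, with hypothetical names l21_shrink and update_B; X, A and Y are assumed to be lists of per-view NumPy arrays with matching column counts.

import numpy as np

def l21_shrink(D, tau):
    # Column-wise soft shrinkage for the l2,1-norm proximal step (cf. [25]):
    # each column D[:, i] is scaled by max(||D[:, i]|| - tau, 0) / ||D[:, i]||.
    norms = np.linalg.norm(D, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return D * scale

def update_B(X, A, Y, mu, gamma):
    # Stack the per-view residuals D^(v) = X^(v) - X^(v) A^(v) + Y_v / mu
    # and apply Eq. (12) column by column on the stacked matrix.
    D = np.vstack([X[v] - X[v] @ A[v] + Y[v] / mu for v in range(len(X))])
    return l21_shrink(D, gamma / mu)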
D. CONNECTION WITH RELATED LOW-RANK TENSOR METHODS
If the weight vector ω is set to all ones and the hyper-Laplacian regularization term is removed, our model degenerates into the model of LT-MSC [12]; keeping the hyper-Laplacian term while using uniform weights recovers HLR-MSCLRT [13]. Therefore, the LT-MSC and HLR-MSCLRT methods can be regarded as two special cases of our proposed method.

E. CONVERGENCE AND COMPLEXITY ANALYSIS
In this section, the convergence property and the complexity of our proposed method are analyzed. Since the objective function in Eq. (5) is not smooth and involves the blocks A^(v), B, G_m and ω, it is difficult to prove the theoretical convergence of the proposed WHLR-MSC method. However, it can be shown via Eq. (6) that WHLR-MSC converges to a local optimum of the objective function: when the other variables are fixed, each single-variable subproblem is convex and admits a closed-form solution. For the proposed WHLR-MSC method, Figure 2 in Section IV-F shows the convergence curves on five datasets. We can easily see that the proposed algorithm for solving WHLR-MSC converges quickly.
The main cost of the ALM for solving Eq. (7) lies in the G_m updates, each of which requires computing the SVD of a mode-m unfolding of the representation tensor in every iteration.

IV. EXPERIMENTS AND ANALYSIS OF RESULTS
In this section, in order to evaluate the effectiveness of our proposed WHLR-MSC method, we conduct numerical experiments on five benchmark datasets, and use six popular evaluation metrics to measure the clustering results. We then describe the parameter settings and carry out performance evaluation, convergence analysis, computation cost analysis and weighted values analysis for the proposed method. Finally, in order to visually display the clustering effect, we present visualizations of several multi-view clustering methods. All experiments are conducted in Matlab 2019a on a computer with an Intel(R) Core(TM) i5-4210U CPU at 1.70 GHz and 12 GB of memory.

A. DATASET DESCRIPTIONS
We select five widely used datasets that involve different clustering tasks, including object clustering, scene clustering and face clustering: COIL-20, MSRC-v1, Yale, Extended YaleB and ORL. Their detailed descriptions can be found in [13]. A brief summary is given in Table 1, and Figure 1 shows some example images.

C. EVALUATION METRICS
We use six widely used evaluation metrics to quantitatively evaluate the performance of all clustering methods: accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), F-score, Precision and Recall. Their detailed definitions can be found in [35]. The higher the value of each metric, the better the clustering performance.
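For reference, ACC, NMI and ARI can be computed as below; this is a standard recipe rather than the paper's code. NMI and ARI come directly from scikit-learn, while ACC requires the best one-to-one label mapping, found with the Hungarian algorithm (labels are assumed to be integers 0..K-1, and clustering_accuracy is a hypothetical helper name).

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    # ACC: best one-to-one mapping between predicted and true labels,
    # found by maximizing the matched pairs with the Hungarian algorithm.
    y_true = np.asarray(y_true); y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(-cost)   # negate to maximize
    return cost[row, col].sum() / y_true.size

# NMI and ARI come directly from scikit-learn:
# nmi = normalized_mutual_info_score(y_true, y_pred)
# ari = adjusted_rand_score(y_true, y_pred)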

E. PERFORMANCE EVALUATION
In order to compare the clustering algorithms fairly, we run all of them 10 times and report their mean values and standard deviations. At the same time, we carefully tune the parameters of each clustering algorithm to achieve its best performance. The experimental results are shown in Tables 2-6, from which we draw the following conclusions.
1) Compared with the other methods, our proposed WHLR-MSC method achieves the best clustering performance on all five datasets. For example, on the ORL dataset, our method improves on the second-best HLR-MSCLRT method by approximately 9.5%, 4.2%, 12.8%, 12.5%, 15.7% and 8.8% on the six evaluation metrics, respectively. On the YaleB dataset, our method improves by about 1.1%, 1.7%, 2.1%, 1.9%, 2.0% and 1.8% in terms of ACC, NMI, ARI, F-score, Precision and Recall, respectively, compared with the second-best HLR-MSCLRT method. The main reason is that the weighted tensor nuclear norm fully considers the prior information of the singular values, which is of great significance in practice.
2) Multi-view clustering methods can be roughly divided into two categories: tensor-based methods and matrix-based methods. The tensor-based methods, including LT-MSC, HLR-MSCLRT and our proposed method, achieve significantly better clustering performance than the other methods on all datasets except COIL-20. The main reason is that tensor-based methods can explore the high-order correlations between views and, at the same time, mine the complementary information between different views more comprehensively.
3) We observe that ConcatFea performs better than LRR_best and SPC_best on all datasets except COIL-20, which indicates that the performance of directly concatenating features depends on the data. On most datasets, multi-view clustering methods perform better than single-view clustering methods, which shows that, compared with simply expanding single-view data, fully integrating multi-view data improves its separability.
4) In short, compared with LT-MSC and HLR-MSCLRT, the proposed WHLR-MSC method achieves the best performance on all five datasets. These two methods are the most closely related to ours, so this comparison shows that using a graph regularization term to capture the local structure of the data and a weighted tensor nuclear norm to treat singular values differently is effective for improving clustering performance.

F. CONVERGENCE ANALYSIS
As shown in Figure 2, we analyze the convergence on the MSRC-v1, Yale, YaleB, ORL and COIL-20 datasets. We record the reconstruction error of each iteration, defined as
$$\text{Reconstruction Error} = \max_{v}\big\|\mathbf{X}^{(v)}-\mathbf{X}^{(v)}\mathbf{A}^{(v)}-\mathbf{B}^{(v)}\big\|_\infty.$$
We can see that WHLR-MSC converges very quickly. After 20 iterations, the error is close to 0, which shows that our algorithm converges stably after a certain number of iterations.

G. COMPUTATION COST
In Table 7 we report the average running time (in seconds) of the proposed WHLR-MSC method and the eight compared multi-view clustering methods on all datasets. We can easily see that the multi-view subspace clustering methods (including LT-MSC, HLR-MSCLRT and WHLR-MSC) take longer than the other compared methods. Compared with LT-MSC and HLR-MSCLRT, the average running time of our proposed method is at an intermediate level. In short, although the proposed WHLR-MSC method is time-consuming, it significantly improves the clustering performance.

H. WEIGHTED VALUES ANALYSIS
We analyze the influence of the weight vector ω on the clustering performance on the MSRC-v1 and ORL datasets, as shown in Figure 3 and Figure 4, respectively. In each figure, the x-axis and y-axis represent the weighted value and the evaluation metric value, respectively. We can easily see that the weight parameter ω has a large influence on the clustering performance, which shows the importance of fully considering the prior knowledge of the singular values. On the MSRC-v1 and ORL datasets, our method performs well overall (in ACC and NMI) with ω = [1 100 10 0.1 5] and ω = [1 10 50], respectively. This may be because each singular value of the matrix contributes differently to the robustness of the clustering algorithm in the presence of illumination changes and other noise.

I. VISUALIZATION
The visualization of the clustering results of the nine multi-view clustering algorithms on the MSRC-v1 dataset is shown in Figure 5. We can clearly see that almost all the multi-view clustering algorithms can divide the MSRC-v1 dataset into different clusters. However, compared with the other eight methods, WHLR-MSC produces a more compact distribution of samples from the same cluster.

V. CONCLUSION
In this paper, we propose a hyper-Laplacian regularized multi-view subspace clustering with a new weighted tensor nuclear norm (WHLR-MSC) method. In our model, a low-rank tensor constraint is used to capture the global structure of the data and explore the high-order correlations between different views, while hyper-Laplacian graph regularization is used to capture the local geometric structure of the data. In addition, the weighted tensor nuclear norm fully considers the prior information of the singular values and can more accurately obtain the class discrimination information of the sample distribution. An efficient algorithm is proposed to optimize our model. Extensive experiments on five benchmark image datasets show that our method is superior to state-of-the-art methods.