Adaptive Graph Representation for Clustering

Many graph construction methods for clustering cannot consider both the local and global data structures when constructing the initial graph. Meanwhile, redundant features, outliers, and data with important characteristics are treated equally in the graph optimization process. As a result, the learned representation graph may not capture the optimal structure for clustering. This paper proposes a novel clustering model, named adaptive graph construction and low-rank representation of weighted noise (ACLWN), to overcome these problems. ACLWN is composed of an adaptive representation graph construction model named ARG and an adaptive weighted sparse representation graph learning model named AWSG. In ARG, manifold learning and sparse representation are employed to capture the local structure of data. In AWSG, an adaptive weight matrix is proposed to strengthen the important features and improve the robustness of the low-dimensional representation graph. Moreover, constraints such as non-negative low-rank, sparsity and distance regularization terms are imposed to capture the local and global structures of data. Comprehensive experimental results show that our method outperforms the compared state-of-the-art methods and that the low-dimensional representation graph constructed by ACLWN is more suitable for clustering.


I. INTRODUCTION
Clustering, as an unsupervised learning method, has long been favored by researchers in machine learning, data mining and pattern recognition. A cluster is a set of data points that are similar to one another and dissimilar to the points in other clusters [1], [2], [3]. Spectral clustering [4], [5], [6], the most typical graph learning clustering method, performs well on complex high-dimensional data. It first constructs an initial graph to describe the similarity relationships among data, then derives a low-dimensional representation matrix from the initial graph, and finally obtains the clusters by k-Means [7]. Spectral clustering is accurate and robust only when the initial graph is well constructed. The same holds for other clustering methods based on graph representation.
The associate editor coordinating the review of this manuscript and approving it for publication was Qilian Liang .
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The existing graph construction strategies can be roughly categorized into three groups. (1) Capture the similarity between data points by a distance metric. Jurusan et al. utilized the straight-line distance between data points to assess similarity [8]; Yin et al. used the cosine function to construct the similarity matrix in the original space [9]; and Ding et al. proposed a random compact Gaussian (RCG) kernel and used it to measure the similarity between data points [10]. However, these methods are unable to automatically collect the structural information of points that is suitable for graph learning clustering. (2) Obtain the similarity between data points based on global self-representation. Each point is encoded as a weighted combination of all other points, i.e., a data point can be represented by its adjacent and reachable indirect neighbors. Yun et al. utilized the weighted tensor nuclear norm to capture the fundamental spatial structure [11]; Shang et al. proposed a self-representation method based on dual-graph regularized feature selection [12]; Weng et al. introduced the Laplace smoothing criteria for graph construction by adopting data self-representativeness [13]. Nevertheless, these graph construction methods are not ideal for high-dimensional datasets because of their high time complexity. (3) Obtain the similarity between data points by describing each point as a linear combination of its neighbors. kNN-Graph [14], [15] methods represent a data point only according to its k closest points. Roweis et al. developed a locally linear embedding graph (LLE-graph) [16] by using kNN-Graph to represent a data point. However, because of the various domain parameters, these methods often provide different similarity results.
Moreover, the existing initial graph construction methods cannot capture the full local structural features from the original data since they can only reflect structural relationships of data to some extent.
To obtain more information about data structure for graph learning clustering, researchers have achieved promising results in extracting the complete structural information from the initial graph. Elhamifar et al. proposed sparse subspace clustering (SSC) [17] by employing a self-expression dictionary and compressed sensing. Since sparse representation can capture the neighborhood structure of data, numerous sparse subspace clustering methods have emerged, such as ILRDFL [18], LRGPDDL [19] and SSC+E [20]. Although the sparsity constraint can independently embody the relationships between pairwise points, it fails to reflect the global structure of data. To capture the global structure of data, another important research direction, low-rank representation (LRR) [21], was proposed by Liu et al. LRR can strengthen the correlation between samples within clusters while weakening the correlation between clusters. Because of the rich and complex information in high-dimensional data structures, many novel LRR-based methods have been proposed to explore the implicit information in data [22], [23], [24], [25]. Among them, latent low-rank representation [24] is popular because it can recover the hidden effects of inter-point structure well and remove damaged points by integrating some latent observations. Xie et al. proposed a low-rank sparse preserving projection method [25], which learns a robust weight matrix by employing LRR-based methods to reduce the influence of outliers and noise. Unfortunately, LRR cannot reflect the adjacency relationship between points.
As a result, how to better capture both the local and global structures has become a hot research topic [26], [27], [28], [29], [30]. Zhu et al. [29] designed a novel subspace clustering method that learns a representation graph by conducting feature selection and subspace learning in a self-representation framework. The method addressed the shortcoming that existing methods cannot obtain local and global structures simultaneously. Han et al. [30] combined distance regularization and non-negative regularization to improve the latent LRR model. However, the representation graph learned by these methods lacks physical interpretability because it contains a large number of negative elements. Moreover, in the initial graph construction, these models treat redundant features and even outliers as equally significant data. To balance the importance of data, Li et al. adaptively assigned weights to distinguish and filter noise and outliers [31]. Wen et al. proposed an Adaptive Weighted Nonnegative Low-Rank Representation (AWNLRR) [32] model to assign low weights to redundant features or outliers. These methods were still unable to obtain the optimal graph representation.
To keep exploring the optimal representation graph for graph learning, we propose a novel model, named adaptive graph construction and low-rank representation of weighted noise (ACLWN), based on manifold learning [33], [34] and sparse representation [35]. The framework of the ACLWN model is illustrated in Figure 1. ACLWN includes an adaptive representation graph construction model (ARG) for constructing the initial graph and an adaptive weighted sparse representation graph learning model (AWSG) for obtaining the low-dimensional representation matrix of data. First, to generate an initial graph capturing the local structure of data, ARG represents each data point by adaptively obtaining its k nearest neighbors. Next, to optimize the initial graph and obtain a low-dimensional representation graph, AWSG jointly employs sparse, non-negative and low-rank constraints to capture the global and local structure of data. Subsequently, this method uses a weight matrix to guide the learning of the low-dimensional representation matrix, which adaptively assigns more weight to important features and less weight to redundant features or outliers. Meanwhile, AWSG uses the $\ell_{2,1}$-norm [36] in place of the nuclear norm in AWNLRR to further constrain the representation of noise and outliers. Since ACLWN can obtain both the global and local structures of data and is robust to noise and outliers, it is suitable for graph learning clustering.
The main contributions of this paper are summarized as follows.
1. A graph learning clustering model is proposed for adaptive graph construction and weighted sparse representation.
2. The model fully utilizes the geometric structural information of data to guide the construction of the initial graph. It overcomes the disadvantages of initial graph construction based on the complete graph and the k-nearest graph.
3. An adaptive weight matrix is employed to improve robustness to noise and outliers, so that the generated low-dimensional representation matrix can accurately express the significant features of data.
4. Extensive experiments on real datasets are conducted to verify that our framework is more effective than the other state-of-the-art baseline algorithms.
The remaining sections are organized as follows. Section II briefly surveys the related works. Section III proposes the clustering framework ACLWN. Section IV describes the optimization steps of ACLWN. In Section V, the framework is analyzed in terms of experimental results on different datasets, parameter sensitivity and convergence. Section VI concludes the paper.

II. RELATED WORKS
In this section, we begin with a brief introduction to some of the symbolic conventions used throughout the paper, and then describe three types of traditional graph construction methods. Finally, two representation-based techniques are introduced, i.e., representation-based subspace clustering (RSC) and representation-based classification methods (RC), which are most closely related to the method proposed in this paper.
A. NOTATIONAL CONVENTIONS
Throughout this paper, given a dataset $X = \{x_1, x_2, \ldots, x_n\} \in \mathbb{R}^{d \times n}$ with $d$ features and $n$ instances, its $j$th column vector and $(i, j)$th element are denoted by $x_j$ and $x_{ij}$, respectively. $\|X\|_p$ denotes the $p$-norm of matrix $X$ for $p = 1, 2, \{1, 2\}, F$. In the definitions of these norms, $X^T$ is the conjugate transpose of $X$ and $\lambda$ is an eigenvalue of $X^T X$. $I \in \mathbb{R}^{d \times n}$ is an identity matrix, and $\mathbf{1} \in \mathbb{R}^{d \times 1}$ signifies a column vector with all entries equal to one. $\mathrm{Tr}(X)$ and $X^T$ are the trace and transpose of matrix $X$, respectively. $\alpha \odot \beta$ denotes the element-wise multiplication of the vectors $\alpha$ and $\beta$.
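For orientation, the definitions most commonly attached to these norm symbols are sketched below. The spectral-norm expression follows from the eigenvalue remark above. The element-wise form of the $\ell_1$-norm and the column-wise form of the $\ell_{2,1}$-norm are assumptions based on common usage in the low-rank representation literature and may differ from the convention intended here.

```latex
\begin{aligned}
\|X\|_1 &= \sum_{i=1}^{d}\sum_{j=1}^{n} |x_{ij}|, &
\|X\|_2 &= \sqrt{\lambda_{\max}\left(X^{T}X\right)}, \\
\|X\|_{2,1} &= \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{d} x_{ij}^{2}}, &
\|X\|_F &= \sqrt{\sum_{i=1}^{d}\sum_{j=1}^{n} x_{ij}^{2}}
         = \sqrt{\operatorname{Tr}\left(X^{T}X\right)}.
\end{aligned}
```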

B. GRAPH CONSTRUCTION
The traditional graph learning methods [37], [38] convert the dataset $X$ into a graph as follows.
a. Complete graph. It constructs edges between each point $x_i$ and all the other points in $X$.
b. k-nearest graph [14], [15]. It constructs edges between each point $x_i$ in $X$ and its $k$ closest neighbors.
Assume $S$ is the initial graph, and each node represents a point. If nodes $i$ and $j$ share an edge, three classic definitions of the similarity $s_{ij}$ are as follows.
1) Binary similarity: $s_{ij} = 1$.
2) Cosine similarity [9]: $s_{ij} = \frac{x_i^T x_j}{\|x_i\|_2 \|x_j\|_2}$.
3) Gaussian kernel similarity [10]: a decreasing exponential function of the squared distance $\|x_i - x_j\|_2^2$, where $\sigma$ is the scale parameter.
These measures have some limitations. For example, although the binary similarity is simple, it cannot reflect the similarity between complex data. Cosine similarity cannot take into account the local geometric structure of data. Gaussian kernel similarity is a distance-based measure, which is sensitive to noise, outliers and redundant features.
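As a rough illustration of the k-nearest graph and the three similarity measures, the following Python sketch builds a symmetric k-nearest graph on a toy point set and evaluates each measure on its edges. The function names, the symmetrization rule, and the Gaussian form $\exp(-\|x_i - x_j\|_2^2 / (2\sigma^2))$ are assumptions made for this example, not definitions taken from the paper.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def knn_edges(points, k):
    """Symmetric k-nearest graph: keep edge (i, j) if either point selects the other."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def similarities(points, edges, sigma=1.0):
    """Return {edge: (binary, cosine, gaussian)} for every edge of the graph."""
    out = {}
    for i, j in edges:
        xi, xj = points[i], points[j]
        dot = sum(u * v for u, v in zip(xi, xj))
        norm = math.sqrt(sum(u * u for u in xi)) * math.sqrt(sum(v * v for v in xj))
        cosine = dot / norm  # undefined for zero vectors
        gaussian = math.exp(-sq_dist(xi, xj) / (2.0 * sigma ** 2))
        out[(i, j)] = (1.0, cosine, gaussian)
    return out
```

With `k` equal to `len(points) - 1`, `knn_edges` returns the complete graph, so the same helper covers both graph types.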

C. REPRESENTATION-BASED SUBSPACE CLUSTERING
In this paper, representation-based subspace clustering (RSC) refers to methods built on sparse, low-rank and other constraints. Its general framework [17], [21], [39] is given by Formula (1), in which the first term is the regularization term of the variable $Z$, and $\psi(E)$ represents noise and outliers constrained by various norms. $E$ is the reconstruction error term, while $\lambda$ is the corresponding regularization parameter of $\psi(E)$. Formula (1) can learn a low-dimensional representation matrix $A \in \mathbb{R}^{n \times c}$, which is useful for clustering. It can also extract intrinsic geometric structural information from high-dimensional complex data. The various RSC algorithms differ in their choice of the regularization term of $Z$. Usually, the final clustering result of RSC is obtained by using the following four steps: 1) create the initial graph matrix $M \in \mathbb{R}^{n \times n}$, where $m_{ij}$ denotes the similarity between $x_i$ and $x_j$; 2) con-

D. REPRESENTATION-BASED CLASSIFICATION METHODS
Representation-based classification algorithms assume that, in the joint linear representation of a test point, the training points in the same class as the test point contribute more than the points in other classes. Based on this assumption, many well-known representation-based supervised classification methods have been proposed, such as sparse representation classification (SRC) [40], cooperative representation classification (CRC) [41], regularization robust coding (RRC) [42], and local constrained linear coding (LLC) [43]. The representation-based classification (RC) [44], [45], [46] methods can be uniformly abstracted as Formula (2), where $P$ and $Q$ represent the training set and the test set, respectively, and $\lambda$ denotes the regularization parameter. Under various norm constraints, $\phi(d \odot \alpha)$ is regarded as the regularization term of $\alpha$. Prior knowledge is denoted by the vectors $r$ and $d$. The RC methods differ in their choice of the regularization term $\phi(d \odot \alpha)$ and of the parameters $r$ and $d$.

III. ADAPTIVE GRAPH CONSTRUCTION AND LOW-RANK REPRESENTATION OF WEIGHTED NOISE (ACLWN)
As described above, a high-quality initial graph obtained at the early stage makes it possible to learn a good low-dimensional representation matrix in the subsequent optimization. This section presents the graph learning clustering framework for adaptive graph construction and low-rank representation of weighted noise (ACLWN).

A. ADAPTIVE REPRESENTATION GRAPH CONSTRUCTION MODEL (ARG)
Traditional initial graph construction methods, as described in Section II-B, suffer from many shortcomings. The intuition behind manifold learning is that two very similar data points will also have very similar representations when the initial graph is constructed [33], [34]. Sparse representation [35] is robust to noise and outliers. In this section, we propose a novel graph construction method that combines manifold learning and sparse representation, given by Formula (3), where $S$ is the initial graph to be learned, and its $i$th column vector and $(i, j)$th element are denoted by $s_i$ and $s_{ij}$, respectively. Normally, we measure the similarity of two points by calculating the distance between them.
Here, the first term of Formula (3) calculates the distance between points and adaptively chooses the k nearest neighbors of the current point according to the distance information. In other words, it achieves a sparse representation by representing the current point according to the principle of competitive representation. The $\ell_1$-norm in the second term ensures the sparseness of the columns of the representation matrix and improves the contribution of highly discriminative features to the representation matrix. We also impose the constraints $s_{ii} = 0$ and $0 \le s_{ij} \le 1$ on the representation matrix to reduce the self-representation contribution of points.
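A minimal sketch of how such an adaptive k-nearest-neighbor representation could be computed for each point. It assumes the closed-form adaptive-neighbor weights of Nie et al. [50], on which Section IV-A builds, together with a sum-to-one constraint on each $s_i$. The function name, the row-wise storage of $s_i$, and the tie-handling rule are hypothetical choices made for the example.

```python
def adaptive_neighbor_weights(points, k):
    """Return S as a list of rows, where S[i] holds the weight vector s_i of point i."""
    n = len(points)
    if not 0 < k < n - 1:
        raise ValueError("need 0 < k < n - 1")
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared distances to every other point, ascending; s_ii stays 0.
        dist = sorted((sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
                      for j in range(n) if j != i)
        d_next = dist[k][0]  # distance to the (k+1)-th nearest neighbor
        denom = k * d_next - sum(d for d, _ in dist[:k])
        for d, j in dist[:k]:
            # Closer neighbors get larger weights, and the k weights sum to one.
            # If all k+1 distances tie, fall back to uniform weights (assumption).
            S[i][j] = (d_next - d) / denom if denom > 1e-12 else 1.0 / k
    return S
```

Each row has at most `k` non-zero entries, lies in $[0, 1]$, and sums to one, which matches the constraints discussed for the representation matrix.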
We add the constraint $\mathbf{1}^T s_i = 1$ to Formula (3) to ensure that all points receive equal attention in the representation. This makes the second term a constant, which is equivalent to a sparse constraint on $S$. Formula (3) can then be transformed into Formula (4), $\min_{S} \sum_{i,j=1}^{n} \|x_i - x_j\|_2^2 s_{ij}$, subject to the constraints above. Formula (4) has a trivial solution, i.e., $x_i$ has a similarity value of 1 with a single point $x_j$ and 0 with the other points. To address this issue, we add an induction factor to Formula (4), which gives Formula (5), so that the similarity value becomes $\frac{1}{n}$ between $x_i$ and the other points.

B. ADAPTIVE WEIGHTED SPARSE REPRESENTATION GRAPH LEARNING MODEL (AWSG)
Both representation-based clustering and classification algorithms have demonstrated that the data representation matrix includes a large amount of discriminant information, as explained in Section II-C and Section II-D. To maximize the geometric structure of the data collected in the initial graph and eventually generate a low-dimensional representation matrix comprising discriminant information, we propose the objective function of AWSG in Formula (6), where $W$ is the weight matrix, with $W \ge 0$; $Z$ is the low-dimensional representation matrix to be learned; and $\lambda_1$ and $\lambda_2$ are penalty parameters. The weight matrix $W$ is used to regularize the data reconstruction error. The first two terms of Formula (6) adaptively assign high weights to significant elements and low weights to redundant features or outliers, which improves the contribution of the highly discriminative features of important points to the representation matrix.
Hence the method is robust to noise and outliers. Under the Frobenius-norm constraint, the second term and the first term together constitute a Lasso problem, which is convenient to optimize. The constraint $W^T \mathbf{1} = \mathbf{1}$ ensures that all points are treated equally. Usually, distance information is used to measure the local structure of data. Therefore, we add the distance term $\sum_{i,j=1}^{n} \|x_i - x_j\|_2^2$ to Formula (6) in the process of obtaining a low-dimensional representation matrix. With $D$ denoting the distance matrix, this gives Formula (7). To eliminate negative effects in the low-dimensional representation matrix $Z$, such as all-zero rows of $Z$ and self-representation of points, we add the constraints $Z\mathbf{1} = \mathbf{1}$ and $\mathrm{diag}(Z) = 0$ to Formula (7). Finally, we obtain the objective function in Formula (8).

C. VERIFY THE FORMULA (6)
In this section, we prove that Formula (6) can generate a sparse, non-negative and low-dimensional representation graph.
With respect to $W$, the problem is also the same as $n$ separate sub-problems, one for each $w_i$ [47], each of which can produce a sparse solution $w_i$. The penalty parameter $\lambda_1$ controls the sparsity. Thus, we can deduce that a sparse weight matrix $W$ will be produced by solving Formula (6).
Most notably, combining the regularization term $\frac{\lambda_1}{2}\|W\|_F^2$ with the boundary constraints $W \ge 0$ and $W^T \mathbf{1} = \mathbf{1}$ avoids a trivial solution of $W$ [44], [48]. The constraint $Z \ge 0$ ensures that the graph learned from the similarity between points is interpretable. Furthermore, non-negative constraints can improve representation-based graph learning performance [49].

IV. OPTIMIZATION ALGORITHMS
A. OPTIMIZATION ALGORITHM FOR FORMULA (5)
Following the work of Nie et al. [50], we define the $j$th element $d_{ij}$ of the distance vector $d_i$ as $d_{ij} = \|x_i - x_j\|_2^2$. Then Formula (5) can be rewritten as Formula (9). The Lagrange function of Formula (9) with respect to the conditions $0 \le s_{ij} \le 1$ and $\mathbf{1}^T s_i = 1$ is written as Formula (10), where $\eta$ is the scalar Lagrange coefficient and $\xi$ is the vector Lagrange coefficient. We differentiate Formula (10) with respect to $s_i$ and set the partial derivative to 0, which gives the $j$th element $s_{ij}$ of $s_i$ in Eq. (11). By multiplying both sides of Eq. (11) by $s_{ij}$ and applying the Karush-Kuhn-Tucker (KKT) [51] condition $s_{ij}\xi_j = 0$, we can obtain a closed-form expression for $s_{ij}$, Formula (12). Suppose $d_{i1}, d_{i2}, \ldots, d_{in}$ are sorted in ascending order and $s_i$ must satisfy $s_{ik} > 0$ and $s_{i,k+1} = 0$, so that $s_i$ contains at most $k$ non-zero values; then we have Inequality (13). Imposing the constraint $\mathbf{1}^T s_i = 1$ on Inequality (13), we can derive Formula (14). To learn the self-adaptive $s_i$ with $k$ neighbors according to Formula (13) and Formula (14), $\beta$ is given by Formula (15). According to Formula (13), Formula (14) and Formula (15), the $j$th element $s_{ij}$ of $s_i$ can be defined as in Formula (16). Finally, we summarize the pseudo code of ARG in Algorithm 1.

Formula (8) is a non-convex optimization problem with two unknown variables, $Z$ and $W$. Many methods can address this kind of problem, such as the alternating direction method (ADM) [52] and the accelerated proximal gradient (APG) method [53].
Because of the characteristics of the problem and the efficacy of the ADM approach, we employ ADM to solve Formula (8). To make the problem simpler, we introduce two auxiliary variables $E$ and $U$, and then turn Formula (8) into Formula (17). We first rewrite Eq. (17) as the augmented Lagrangian function [54] $L(\Upsilon)$ in Formula (18), where $\mu$ ($\mu > 0$) is the penalty parameter, $C_1$ and $C_2$ are Lagrange multipliers, and $\Upsilon = \{Z, W, E, U, C_1, C_2\}$ is the set of variables. Then, with the remaining variables fixed, the optimal value of each variable is obtained in turn, iteratively.
Step 1. Update $W$ and fix the other variables, as in Formula (19). Since $E$ is a fixed variable, Formula (19) can be converted into Formula (20). Formula (20) and Formula (9) are optimized in a similar way; for further details, please see the treatment of Formula (9). Finally, the variable $W$ can be calculated in closed form, where the constraint $w_i^T \mathbf{1} = 1$ fixes the sum of the entries of $w_i$ over $j = 1, \ldots, m$ to one.
Step 2. Update $E$ and fix the other variables. After an intermediate variable is defined, each optimal value of $e_{ij}$ in $E$ can be obtained using Eq. (24).
Step 3. Update $U$ and fix the other variables, as in Formula (26). Defining $M = Z + \frac{C_2}{\mu}$, Formula (26) can be turned into Formula (27). We can obtain a closed-form solution by simplifying Formula (27).
Step 4. Update $Z$ and fix the other variables. We first find a latent solution $\hat{Z}$ by minimizing Formula (30), which gives a closed-form solution. Referring to Formula (20), the optimization of Formula (31) can be solved, where the Lagrange multiplier is $\zeta_i = (n - 1)^{-1}(1 - z_i \mathbf{1})$ and $z_i$ is the $i$th row of $\hat{Z}$. Eventually, we can obtain an ideal $Z$ by plugging the value of $\zeta_i$ into Formula (32).
Step 5. Update $C_1$, $C_2$ and $\mu$, and fix the other variables. The update uses the values of $U$ and $E$ at the $k$th iteration (the current step) and the $(k-1)$th iteration (the previous step), respectively; $\mu_{\max}$ and $\rho$ are positive parameters. Finally, we summarize our AWSG in Algorithm 2.
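Step 5 can be illustrated with the multiplier and penalty updates commonly used with ADM and augmented Lagrangian methods: a dual ascent step on each constraint residual, and a geometric increase of $\mu$ capped at $\mu_{\max}$. The sketch below works under that assumption and is not necessarily the exact rule of Algorithm 2. The function names are hypothetical, and which residual pairs with $C_1$ and which with $C_2$ is left to the caller.

```python
def dual_update(C, residual, mu):
    """Multiplier step C <- C + mu * residual, element-wise on nested lists."""
    return [[c + mu * r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(C, residual)]

def penalty_update(mu, rho, mu_max):
    """Grow the penalty parameter geometrically, capped at mu_max."""
    return min(rho * mu, mu_max)
```

A typical call sequence per iteration would update both multipliers with their own residual matrices and then call `penalty_update` once, so that constraint violations are penalized more strongly as the iterations proceed.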

Algorithm 2: The Algorithm AWSG
Input: Initial graph matrix $S$ and penalty parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$;
Output: $W$ and $Z$;

When constructing the initial graph, Algorithm 1 only calculates the distance vectors between the current point and its k nearest neighbors, where $k \ll n$ and $n$ is the number of points. Hence its time complexity can be ignored compared with the other steps. There are five steps in Algorithm 2. Since Steps 1, 2, and 5 are element-wise operations, they are solved quickly and have low time complexity. The matrix inversion in Step 4 has high time complexity; however, because the inversion is performed outside the iterative loop in our method, its cost is minimal. In Step 3, the procedure of iteratively updating $U$ has a high time complexity, $O(t \cdot n^2)$, where $t$ is the number of iterations. Therefore, the overall time complexity of ACLWN is $O(t \cdot n^2)$.

V. EXPERIMENTS AND ANALYSIS
A. EXPERIMENTS ENVIRONMENT
In this section, we compare the performance of ACLWN with those of its baseline methods on face, object and non-image datasets, and quantify it with two common evaluation measures. The baseline methods include k-Means [7], Ncut [55], SSC [17], LRR [21], NSHLRR [56], AWNLRR [32], FTRR [57] and SGL [58]. Among them, k-Means runs directly on the original features, while the other methods cluster by learning various representation graphs from the datasets. In the experiments, Ncut constructs the adjacency graph through a Gaussian kernel function. SSC, LRR, AWNLRR and NSHLRR perform spectral clustering on the learned graphs. AWNLRR and NSHLRR utilize the k-nearest graph method to construct the initial graph. FTRR uses a low-pass filter to obtain the similarity matrix and then uses k-Means to cluster. SGL is one of the latest graph learning methods. Moreover, since k-Means is unstable, we run it 30 times and take the mean values of the measures as its final results. These experiments are run on Windows 10 and MATLAB 2019a, on a hardware platform with an Intel(R) Core(TM) i7-10510U CPU and 20GB RAM.

B. DATASETS
2) Face datasets: Face image datasets are the most typical image datasets for testing the performance of graph learning clustering. We chose three common face datasets in this paper: Extended YaleB, ORL and MSRA25 [50] (see Figure 2).
3) UCI Datasets: To test the performance of our model, we chose three UCI datasets: Cars, Vehicle and Yeast.
4) Handwritten Digit Datasets: As shown in Figure 2(b), we chose the Mnist [57] dataset and the Profile view of the Handwritten [59] datasets, with 10 classes ranging from "0" to "9" and 216 pixel gray levels.
5) UCSC Datasets: We chose the Cora dataset from the UCSC website to evaluate the clustering performance of ACLWN and its baseline methods, and then turned it into a form suitable for graph learning clustering.
The dataset includes 2708 scientific papers split into seven categories, which are case-based, genetic algorithm, neural network, probabilistic technique, reinforcement learning, rule learning, and theory. Each paper has a vector of 1433 words, representing the 1433 features.

C. EVALUATION METRICS
Accuracy and normalized mutual information (NMI) are two standard measures used to quantitatively evaluate clustering performance. The ground truth and predicted labels are denoted by $Y$ and $P$, respectively.
Accuracy is defined as the fraction of points whose predicted label, after mapping, agrees with the ground truth, where $\tau(\cdot)$ denotes the indicator function and $P$ is mapped to its best group label using the Kuhn-Munkres method. Normalized mutual information is defined as the mutual information between $Y$ and $P$ normalized by their entropies, where $I(\cdot)$ stands for mutual information and $H(\cdot)$ stands for information entropy. The standardization is realized by calculating the information gain.
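The two measures can be sketched in Python as follows. This is an illustration under stated assumptions: the best label mapping is found by brute-force search over permutations, which stands in for the Kuhn-Munkres method and is only practical for a small number of clusters, and NMI is normalized by $\sqrt{H(Y)H(P)}$, which is one of several common conventions. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy via brute-force search over cluster-to-class mappings."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with None so every cluster can be assigned a distinct target.
    classes += [None] * max(0, len(clusters) - len(classes))
    joint = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(joint[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(y_true, y_pred):
    """Mutual information divided by the geometric mean of the two entropies."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in Counter(zip(y_true, y_pred)).items())
    denom = math.sqrt(entropy(y_true) * entropy(y_pred))
    return mi / denom if denom > 0 else 0.0
```

Both functions are invariant to how the clusters are numbered, which is why a predicted labeling that merely permutes the true labels scores 1 on each measure.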

D. EXPERIMENTAL RESULTS AND ANALYSIS
The experimental results of the different clustering methods on the datasets introduced in Section V-B are quantitatively evaluated by Accuracy and NMI. The clustering accuracies are listed in Tables 2-5, where the bold black numbers indicate the best results. Tables 2-4 show the results of our model ACLWN and its baseline methods on the top k subsets of the COIL20, YaleB, and ORL datasets, respectively. Table 5 summarizes the clustering accuracies of the different methods on the rest of the datasets. Figure 3 illustrates the NMI results of our model and its baseline methods on all the datasets. It can be seen that our model ACLWN achieves the best results on these datasets. The specific analysis is as follows. (1) It is clear from Tables 2-5 and Figure 3 that the clustering accuracies of k-Means are generally lower than those of the representation-based clustering models. Therefore, we can conclude that representation-based methods are superior to methods that cluster directly on the original datasets. Since the representation-based methods eliminate redundant features to some extent, the representation matrices obtained by these methods are more discriminative.
(2) Table 2, Table 4 and Figure 3 compare the Accuracy and NMI of ACLWN and its baseline methods on the COIL20 and ORL datasets. Ncut outperforms the representation-based methods SSC, LRR, NSHLRR, SGL and AWNLRR in most cases. This is because Ncut is a distance-based method, which captures more of the local structure of data than the methods based on sparse and low-rank representation.
(3) Figure 3 and Table 4 show that SSC and LRR have lower clustering Accuracy and NMI in most cases compared with NSHLRR, SGL, AWNLRR and ACLWN. Since SSC employs sparse representation, it only captures the local structure of data. LRR only obtains the global structure of data by using low-rank representation. As a result, SSC and LRR cannot capture both the global and local data structures, and their final representation graphs are weaker than those of the other methods.
(4) Although NSHLRR, SGL, AWNLRR and ACLWN all use the non-negative sparse low-rank representation, AWNLRR and ACLWN employ a weight matrix to efficiently reduce the representation of noise and outliers while increasing the weights of important features. Besides, the k-nearest neighbor method based on Euclidean distance is still used in the initial graph construction of AWNLRR, hence its experimental results are not ideal. To produce a more robust initial graph, ACLWN uses an adaptive neighbor graph construction method based on the local structures of data. Therefore, as shown in Tables 2-5 and Figure 3, our method has the best clustering Accuracy and NMI results on the various datasets.
(5) We also select the most recent method, FTRR, to further show that our adaptive graph construction method is more effective than most other graph construction methods. In FTRR, a low-pass filter construction method is used to create the representation graph. As demonstrated in Tables 2-5 and Figure 3, the clustering Accuracy and NMI of our method are generally greater than those of FTRR, indicating that our method still performs better than FTRR.
Therefore, ACLWN has the following advantages in graph representation: 1) ACLWN jointly employs sparse, low-rank, distance regularization and non-negative constraints to obtain the geometric structure of data. It is the efficient integration of these constraints that guides the generation of low-dimensional representation graphs suitable for clustering. 2) ACLWN uses the weight matrix to suppress noise and outliers while increasing the weight of important features, resulting in a more accurate low-dimensional representation graph. 3) ACLWN uses manifold learning and sparse representation graph construction methods to produce a high-quality initial graph.

E. PARAMETER SENSITIVITY ANALYSIS
There are four parameters in the ACLWN model: the adaptive number of neighbors $k$ in the ARG method and the three penalty parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ in the AWSG method. This section analyzes the influence of the four parameters on the clustering accuracy on the YaleB, COIL20 and Cora datasets. Figure 4 shows the relationship between the number of nearest neighbors $k$ and the clustering accuracy on the YaleB, COIL20 and Cora datasets when the penalty parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ are fixed. In Figure 4(a), as $k$ takes different values, the minimum and maximum accuracies are 87.49% and 88.77%, respectively, so the difference on the YaleB dataset is only 1.28%. The difference between the minimum and maximum accuracies is 3.96% and 7.34% on the COIL20 and Cora datasets, respectively. As a result, the adaptive nearest neighbor number $k$ has little influence on the representation of the current data, and the clustering accuracy generally tends to be stable as $k$ varies. That is, ARG is highly insensitive to the number of nearest neighbors $k$. We recommend setting $k$ within the range $[8, 14]$ to ensure high clustering accuracy and lower time complexity.
Next, we analyze the sensitivity of our method to the parameters λ1, λ2 and λ3. For illustration, we define the coarse parameter set as {10^−4, 10^−3, 10^−2, 10^−1, 10^0, 10^1, 10^2, 10^3, 10^4}. λ1 acts as the harmonic parameter of the weight matrix W, which avoids meaningless trivial solutions. Figures 5, 6 and 7 show the relationship between the three parameters and the clustering accuracy on the YaleB, COIL20 and Cora datasets. From Figures 5(a), 6(a) and 7(a), we can see that when λ2 and λ3 are fixed, the clustering accuracy is only slightly affected by the choice of λ1 from the coarse set. Hence, our method is largely insensitive to λ1. In contrast, λ2 and λ3 have a large influence on clustering performance. As shown in Figures 5(b), 6(b) and 7(b), when λ1 is fixed and different combinations of λ2 and λ3 are chosen, the clustering accuracy is not always maximal, and the gap between the maximum and minimum clustering accuracy is irregular. This occurs because λ2 and λ3 directly determine the contribution of their corresponding terms during graph learning, and therefore the quality of the learned representation matrix. Thus, these parameters need to be tuned to achieve the best clustering performance. Finding common optimal values for all three parameters is challenging because the datasets are diverse. We therefore use a simple and effective selection procedure. Based on the previous analysis, we first fix λ1 to a value such as 1 and then find the best combination of λ2 and λ3 from the coarse set. Around this best combination, we define a candidate set that may contain the optimal values of the two parameters. The method is then rerun with various combinations of λ2 and λ3 drawn from the candidate set. In this manner, we determine the penalty parameters that give the best clustering performance.
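The coarse-to-fine selection of λ2 and λ3 described above can be sketched as follows. This is a sketch under stated assumptions: the paper does not specify how the candidate set is built, so `fine_set` (log-spaced values within one decade of the coarse optimum) is a hypothetical choice, and `evaluate(lam1, lam2, lam3)` is an assumed callable that returns the clustering accuracy of one run.

```python
import itertools
import math

# Coarse set from the text: {10^-4, ..., 10^4}.
COARSE_SET = [10.0 ** e for e in range(-4, 5)]

def best_pair(evaluate, lam1, candidates2, candidates3):
    """Return (accuracy, lam2, lam3) for the best pair in the given grids."""
    return max(
        (evaluate(lam1, l2, l3), l2, l3)
        for l2, l3 in itertools.product(candidates2, candidates3)
    )

def fine_set(center, points=5):
    """Hypothetical candidate set: log-spaced values within one decade of center."""
    exp = math.log10(center)
    return [10.0 ** (exp - 1 + 2 * i / (points - 1)) for i in range(points)]

def coarse_to_fine(evaluate, lam1=1.0):
    """Fix lam1, search the coarse grid, then refine around the coarse optimum."""
    _, l2, l3 = best_pair(evaluate, lam1, COARSE_SET, COARSE_SET)
    return best_pair(evaluate, lam1, fine_set(l2), fine_set(l3))
```

Because the refined grid contains the coarse optimum, the second pass should do at least as well as the first (up to floating-point rounding of the grid values), at the cost of `points ** 2` additional runs.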

F. CONVERGENCE ANALYSIS
In this section, we demonstrate the convergence of ACLWN empirically by running its iterative optimization on the YaleB, COIL20 and Cora datasets. The objective function value of ACLWN is denoted obj. In Figure 8(a), obj decreases monotonically and rapidly in the first few iterations, then briefly increases, and finally stabilizes. In Figures 8(b) and 8(c), obj decreases and then levels off, showing a convergence tendency. Figures 8(a), 8(b) and 8(c) also show that the clustering accuracy increases significantly at first and then stabilizes.
Therefore, ACLWN converges well in practice, which enables it to obtain a locally optimal solution.
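One way to monitor this behavior in an implementation is a stopping test on the relative change of obj. The sketch below is an assumption for illustration, not the stopping rule used in the paper: `has_converged`, the tolerance `tol` and the `window` length are hypothetical. Requiring several consecutive small changes guards against stopping during a brief rise such as the one seen in Figure 8(a).

```python
def has_converged(obj_history, tol=1e-4, window=3):
    """True once the last `window` relative changes in obj are all below tol."""
    if len(obj_history) <= window:
        return False  # not enough iterations to judge
    recent = obj_history[-(window + 1):]
    return all(
        abs(curr - prev) <= tol * max(abs(prev), 1e-12)
        for prev, curr in zip(recent, recent[1:])
    )
```

In an optimization loop, the objective value would be appended to `obj_history` after each iteration and the loop would exit when `has_converged(obj_history)` returns true or a maximum iteration count is reached.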

VI. CONCLUSION
This paper proposes a new graph learning clustering framework, ACLWN, which consists of two parts: ARG and AWSG. ARG adaptively generates the initial graph from the local structure of the data. AWSG introduces an adaptive weight matrix into the graph optimization process to suppress the effect of redundant features, noise and outliers on the representation. ACLWN jointly employs sparse representation, low-rank representation, a distance regularization term and non-negative constraints to capture the geometric structure of the data and thereby form a more discriminative low-dimensional graph. Experiments on object, face, UCI non-image and other datasets show that ACLWN outperforms the baseline clustering methods.
MEI CHEN received the Ph.D. degree in computer science from Lanzhou University, in 2016. She is currently a Professor with the School of Electronic and Information Engineering, Lanzhou Jiaotong University. She has published over 20 research papers in conferences and journals, such as IEEE Access, Pattern Recognition, Atmospheric Environment, Frontiers of Computer Science, and WAIM. Her research interests include artificial intelligence and data mining. She is a member of CCF.
YOUSHUAI WANG received the bachelor's degree from the School of Electronics and Information Engineering, Lanzhou Jiaotong University, in June 2020, where he is currently pursuing a graduate degree. His current research interests include data mining and graph learning clustering.
YONGXU CHEN is currently pursuing the master's degree in computer technology with the School of Electronic and Information Engineering, Lanzhou Jiaotong University. His research interests include community detection and clustering.
HONGYU ZHU was born in Chongqing, China, in 1996. She is currently pursuing the master's degree with Lanzhou Jiaotong University. Her research interests include clustering in data mining.
YUE XIE is currently pursuing the master's degree in computer technology with the School of Electronic and Information Engineering, Lanzhou Jiaotong University. His research interests include community detection and clustering.
PENGJU GUO was born in Changzhi, Shanxi, in 1995. He received the master's degree from Lanzhou Jiaotong University. His research interests include clustering and community discovery in complex networks.