A Group Norm Regularized Factorization Model for Subspace Segmentation

Subspace segmentation assumes that data come from a union of different subspaces, and the goal of segmentation is to partition the data into their corresponding subspaces. Low-rank representation (LRR) is a classic spectral-type method for solving subspace segmentation problems: one first obtains an affinity matrix by solving an LRR model and then performs spectral clustering for segmentation. This paper proposes a group norm regularized factorization model (GNRFM), inspired by the LRR model, for subspace segmentation, and then designs an Accelerated Augmented Lagrangian Method (AALM) to solve this model. Specifically, we adopt group norm regularization to make the columns of the factor matrix sparse, thereby achieving low rank; this means no singular value decompositions (SVDs) are required, and the computational complexity of each step is greatly reduced. We obtain affinity matrices using different LRR models and then perform clustering tests on several sets of synthetic noisy data and real data. Compared with traditional models and algorithms, the proposed method is faster and more robust to noise, so the final clustering results are better. Moreover, the numerical results show that our algorithm converges quickly, requiring only about ten iterations.


I. INTRODUCTION
With the advent of the era of big data, we are confronted with a great deal of data every day. Although the data volume is large, it may come from only a few low-rank subspaces. Subspace segmentation [1] divides data into several clusters, each cluster being a subspace; this problem arises in machine learning, computer vision, image processing, finance, and other fields [2]-[6]. Subspace segmentation is an important clustering problem, which is mainly solved by the following four types of methods: mixture of Gaussians [7], factorization [8], algebraic [9], and spectral-type methods [1], [10].
In the spectral-type methods, one first computes an affinity matrix and then performs spectral clustering (such as Normalized Cuts (N_Cut) [11]). The main variation among spectral-type methods is how the affinity matrix is learned. Spectral clustering has attractive advantages: the algorithm is efficient, the data can be of any shape, the method is not sensitive to abnormal data, and it can be applied to high-dimensional problems. Recently there have been many studies and developments in spectral clustering, such as [12]-[17].
LRR is a classic, effective spectral-type method for solving subspace segmentation problems. The Low-Rank Representation (LRR) [1] problem was proposed by Liu et al. in 2010. They assume that data samples come from a union of multiple subspaces, and the purpose of the LRR method is to denoise the data and assign each sample to the subspace it belongs to. They proved that LRR exactly recovers each true subspace for clean data; for noisy data, LRR approximately recovers the subspaces of the original data with theoretical guarantees. In reference [1], spectral clustering using the affinity matrix obtained by LRR is more accurate and robust than other methods.
When solving the LRR problem, traditional methods minimize the nuclear norm as a convex surrogate for the rank in the objective function. This convex approximation guarantees convergence of the designed algorithms. However, a singular value decomposition (SVD) is required to solve nuclear norm problems. An SVD is time-consuming, with computational complexity O(n^3) for an n × n affinity matrix. Many classic algorithms employ SVD to solve LRR, such as the Accelerated Proximal Gradient method (APG [18]), the Alternating Direction Method (ADM [19]), and the Linearized Alternating Direction Method with Adaptive Penalty (LADMAP [20]). APG solves an approximation of the LRR problem, but its clustering results are inferior. LADMAP performs best among these algorithms; however, it is still slow, especially for high-dimensional data. Along this line, Lin et al. proposed the accelerated LADMAP (LADMAP(A) [20]).
They used a skinny SVD technique to reduce the complexity to O(rn^2), where r is the rank of the affinity matrix. However, the rate of convergence is sub-linear, which requires more iterations, and the rank depends on hyperparameter selection. A matrix factorization method (HMFALM) was proposed by Chen et al. [22]. They decomposed the affinity matrix into UV and then used the Augmented Lagrangian Method (ALM) to solve the model, where U ∈ R^{n×r}, V ∈ R^{r×n}. It is difficult to determine the rank in a factorization model, so Chen et al. designed a greedy method to traverse the rank r: first choose a proper interval d, run the algorithm for ranks 1, d+1, 2d+1, ..., kd+1, ..., and stop when the results begin to worsen. This process searches through the candidate ranks one by one to find the optimal rank. Although the resulting problem is non-convex, the algorithm does not require SVD; only multiplications of the factor matrices are required, with complexity O(rmn), where m is the dimension of the data.
HMFALM requires an outer loop to find the rank r and an inner loop that iterates until a stopping criterion is met, and the rank found depends heavily on a given hyperparameter. To overcome these shortcomings, we introduce group norm regularization on U and design an adaptive rank-finding matrix factorization model (GNRFM) to solve LRR. We first let U ∈ R^{n×K}, where K is a large number. The group norm (l_{2,1}) regularization drives some columns of the factor matrix U to become zero vectors. In this manner, the rank of the affinity matrix is automatically reduced, achieving the purpose of adaptively adjusting the rank. We design the Accelerated Augmented Lagrangian Method (AALM) to solve GNRFM. In summary, the contributions of this work include: • We are the first to use group norm regularization to solve this rank minimization problem. Specifically, we design the GNRFM model for the subspace segmentation problem. Compared with the traditional nuclear norm LRR model, our model has lower computational complexity. Compared with the factorization model HMFALM [22], GNRFM finds the rank adaptively, without greedy searches.
• The group regularization term also has positive anti-noise effects, so the GNRFM is more robust.
• We design the AALM algorithm to solve GNRFM, which uses a one-step inner iteration acceleration technique and deletes zero vector columns to reduce the computational complexity of each step. The numerical results show that the AALM algorithm converges in about ten steps.
The structure of this paper is as follows. Section 2 introduces the LRR problem, its nuclear norm approximation model [1], and the matrix factorization model [22]. Section 3 introduces our model, GNRFM, details the Accelerated ALM (AALM) for this model, gives a time complexity analysis, and explains how to use GNRFM's solution for spectral clustering. Numerical experimental results are reported in Section 4. Finally, Section 5 concludes this paper.

II. THE LRR PROBLEM AND TWO TYPES OF MODELS
In order to facilitate a shared understanding of notation, we offer a summary in Table 1 of the primary notation used in this paper. First, let us recall the following LRR problem:

min_Z rank(Z) s.t. X = XZ, (1)

where X ∈ R^{m×n} is the data matrix, m is the dimension of the data vectors, n is the number of data vectors, and Z ∈ R^{n×n}. We refer to the optimal solution Z* of the above problem as the ''lowest-rank representation'' of the data X with respect to the dictionary X. This is an NP-hard problem, because minimizing the rank is akin to minimizing an l_0 norm, and the solution is not unique. Following the classic approach for solving low-rank problems, Liu et al. [1] took advantage of the nuclear norm as a convex surrogate and obtained the following convex optimization problem:

min_Z ||Z||_* s.t. X = XZ. (2)

Liu et al. [23] proved that under some conditions the solution of (2) is unique and is one of the solutions of (1). This solution Z* can be transformed into an affinity matrix for the data X, which can then be used for subspace segmentation. The uniqueness of (2) is given by Wei and Lin [24]:

Theorem 1: Suppose the skinny SVD of X is X = UΣV^T; then the minimizer of problem (2) is uniquely given by Z* = VV^T. (3)

This formula naturally implies that Z* precisely recovers the affinity matrix in [25].
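As a quick numerical illustration of Theorem 1, the closed-form LRR solution for clean data can be checked with a few lines of NumPy (a minimal sketch; the function name is ours, not part of the model):

```python
import numpy as np

def lrr_clean_solution(X, r):
    """Closed-form LRR solution for clean data (Theorem 1): given the
    skinny SVD X = U S V^T of rank r, the minimizer of
    min ||Z||_* s.t. X = XZ is Z* = V V^T."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:r].T                 # right singular vectors of the rank-r part
    return Vr @ Vr.T

# Sanity check on noise-free data drawn from a rank-3 subspace.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
Z = lrr_clean_solution(X, r=3)
assert np.allclose(X @ Z, X, atol=1e-6)   # Z reproduces the data: X = XZ
```

For data from independent subspaces, this Z* is block-diagonal up to permutation, which is exactly what makes it usable as an affinity matrix.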
Since the solution of (2) is one of the solutions of (1), we refer the reader to Corollary 4.1 in [23]. To make the model robust to noise, Liu et al. [23] proposed the following noisy LRR nuclear norm model:

min_{Z,E} ||Z||_* + µ||E||_{2,1} s.t. X = XZ + E, (4)

where ||E||_{2,1} = Σ_j sqrt(Σ_i (E_{i,j})^2) and µ > 0 balances the two terms. Several SVD-based algorithms were designed to solve (4), but they lack speed. Chen et al. [22] therefore factorized Z into a low-rank product Z = ÛV and proposed the following matrix factorization model:

min_{Û,V,E} ||E||_{2,1} s.t. X = XÛV + E, (5)

where Û ∈ R^{n×r}, V ∈ R^{r×n}. Writing U = XÛ, the model becomes

min_{U,V,E} ||E||_{2,1} s.t. X = UV + E. (6)

However, the rank r of this model must be specified. Chen et al. [22] designed a greedy method to find the optimal rank: 1. Provide the interval d and hyperparameter µ. 2. Solve problem (6) for r = 1, d+1, 2d+1, ..., kd+1, ... and stop when r + µ||E(r)||_{2,1} begins to worsen. Thus, they search through the candidate ranks one by one to find the optimal rank.
Assume the optimal rank is r = r*, and let (U*, V*, E*) be the solution obtained from (6). Since the data matrix X spans the data space, by the theorem in [1], [23] we obtain the optimal Z* by Z* = X^+ U* V*, where X^+ is the pseudo-inverse of X. The rank obtained this way depends heavily on the hyperparameter µ, and numerous additional iterations must be performed before the optimal rank is found. In order to find the rank adaptively and reduce the number of iterations, we design a new model in Section 3, which adds the group norm regularization term ||U||_{2,1} to model (6).

III. THE GROUP NORM REGULARIZED FACTORIZATION MODEL AND ALGORITHM
In terms of computation speed, the matrix factorization method is superior to the nuclear norm approximation method. However, it is difficult to estimate the rank of the restored matrix Z with the former. Our goal is therefore an adaptive method for estimating the rank for different types of data. The rank of the product matrix is bounded by the number of columns of the factor matrix, and it decreases when some of those columns are zero. So we start from an oversized factor matrix and drive some of its columns to zero by introducing group norm regularization; this achieves the purpose of adjusting the rank adaptively.

A. THE GROUP NORM REGULARIZED FACTORIZATION MODEL
Assume that X ∈ R^{m×n} is a matrix of data samples, where m is the dimension of the data, n is the number of data points, and some of the data contain noise. We hope to remove the noise and represent the clean data with a low rank to obtain an affinity matrix. We obtain the group norm regularized factorization model (GNRFM) by adding the group norm regularization term ||U||_{2,1} to (6):

min_{U,V,E} µ_U ||U||_{2,1} + (µ_V / 2)||V||_F^2 + ||E||_{2,1} s.t. X = UV + E, (7)

where X ∈ R^{m×n}, U ∈ R^{m×K}, V ∈ R^{K×n}, K is a large number, and ||·||_F is the classic Frobenius norm. ||U||_{2,1} is the group norm of U: ||U||_{2,1} = Σ_j sqrt(Σ_i (U_{i,j})^2). The true rank r of X is usually unknown, and K is a relatively large initial guess, such as K = n. Owing to the group norm regularization, some columns of U become zero under proper parameters µ_U, µ_V. If s columns of U are zeroed by the group norm ||·||_{2,1}, then rank(UV) ≤ K − s. So we reach the goal of adjusting the rank of UV adaptively simply by introducing group norm regularization. The term ||V||_F^2 is also very important, because it lets U and V balance and mutually restrain each other in GNRFM.
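The column-wise group norm above is simple to compute; the following NumPy sketch (function name ours) makes explicit why zero columns contribute nothing to the penalty, which is what drives columns of U toward exact zero:

```python
import numpy as np

def group_norm(U):
    """l_{2,1} group norm: the sum of the Euclidean norms of the columns of U,
    i.e. sum_j sqrt(sum_i U[i, j]**2)."""
    return np.sqrt((U ** 2).sum(axis=0)).sum()

U = np.array([[3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0]])
# Column norms are 5, 0, 1, so ||U||_{2,1} = 6; the zero column adds nothing,
# which is exactly the column-sparsity-inducing behaviour used by GNRFM.
assert group_norm(U) == 6.0
```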
In summary, the GNRFM model adaptively estimates ranks for different types of data without the need to design additional updated rank strategies. The regularization terms make the model more resistant to noise. Of course, we introduce two extra hyperparameters µ U and µ V , but numerical results show that our model is less sensitive to hyperparameters relative to other models.

B. THE AUGMENTED LAGRANGIAN METHOD (ALM)
In this section, we introduce the ALM method to solve (7). Problem (7) is bi-convex, i.e., convex in U for fixed V and convex in V for fixed U. Sun and Fevotte [26], Shen et al. [27], Xu et al. [28], and Chen et al. [22] all used similar ALM methods to solve such bi-convex problems and obtained relatively good numerical results. The augmented Lagrangian function of (7) is defined as

L_β(U, V, E, Y) = µ_U ||U||_{2,1} + (µ_V / 2)||V||_F^2 + ||E||_{2,1} + <Y, X − UV − E> + (β/2)||X − UV − E||_F^2, (8)

where β is a penalty parameter, Y ∈ R^{m×n} is the Lagrange multiplier corresponding to the constraint X = UV + E, and <·,·> is the usual inner product. It is well known that, starting from Y^0 = 0, the classic Augmented Lagrangian Method solves at the t-th iteration

(U^{t+1}, V^{t+1}, E^{t+1}) = argmin_{U,V,E} L_{β_t}(U, V, E, Y^t) (9)

and then updates Y^{t+1} = Y^t + β_t (X − U^{t+1} V^{t+1} − E^{t+1}). Similar to the classical ALM, we update E and (U, V) at the t-th iteration separately:

(U^{t+1}, V^{t+1}) = argmin_{U,V} L_{β_t}(U, V, E^t, Y^t), (10a)
E^{t+1} = argmin_E L_{β_t}(U^{t+1}, V^{t+1}, E, Y^t). (10b)

It is difficult to solve (10a) directly because U and V are coupled, so we propose an inner iteration technique to obtain an approximate solution:

U^{l+1} = argmin_U L_{β_t}(U, V^l, E^t, Y^t), (11)
V^{l+1} = argmin_V L_{β_t}(U^{l+1}, V, E^t, Y^t), (12)

where l indexes the inner iteration steps. The V-subproblem (12) is solved by the least squares method:

V^{l+1} = ((U^{l+1})^T U^{l+1} + (µ_V / β_t) I_K)^{-1} (U^{l+1})^T (X − E^t + Y^t / β_t), (13)

where I_K is the K-order identity matrix.
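The least-squares V-update can be sketched in a few lines of NumPy. This is a sketch under the assumption that the V-subproblem takes the ridge form min_V (µ_V/2)||V||_F^2 + (β/2)||X − UV − E + Y/β||_F^2; the helper name is ours:

```python
import numpy as np

def update_V(U, X, E, Y, beta, mu_V):
    """Closed-form ridge least-squares V-update:
    V = (U^T U + (mu_V/beta) I_K)^{-1} U^T (X - E + Y/beta)."""
    K = U.shape[1]
    G = X - E + Y / beta                                    # target that UV should fit
    return np.linalg.solve(U.T @ U + (mu_V / beta) * np.eye(K), U.T @ G)

rng = np.random.default_rng(0)
U = rng.standard_normal((8, 3))
X = rng.standard_normal((8, 10))
E = np.zeros((8, 10)); Y = np.zeros((8, 10))
V = update_V(U, X, E, Y, beta=1.0, mu_V=0.1)
# First-order optimality of the subproblem: mu_V * V = beta * U^T (G - U V).
assert np.allclose(0.1 * V, U.T @ (X - U @ V))
```

Because the system matrix is only K × K, this step costs O(K^2 n + K^2 m) rather than anything cubic in n.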
Since the U-subproblem (11) is difficult to solve directly, inspired by [20] we linearize the quadratic term in (11) and add a proximal term:

U^{l+1} = argmin_U µ_U ||U||_{2,1} + <∇q(U^l), U − U^l> + (η/2)||U − U^l||_F^2, (14)

where q(U) = (β_t/2)||X − U V^l − E^t + Y^t/β_t||_F^2 is the smooth part of (11) and the proximal parameter η is chosen in the same way as proposed in [20].
We obtain the solution of (14) by soft-threshold shrinkage:

U^{l+1}_{*i} = max(0, 1 − µ_U / (η ||G_{*i}||_2)) G_{*i}, with G = U^l − ∇q(U^l)/η, (15)

for i = 1, 2, ..., K, where X_{*i} represents the i-th column of a matrix X.
Owing to the soft-thresholding rule, some columns of U become exactly zero, so we obtain a low-rank solution. Similarly, we get an explicit column-wise expression for E:

E^{t+1}_{*i} = max(0, 1 − 1/(β_t ||H_{*i}||_2)) H_{*i}, with H = X − U^{t+1}V^{t+1} + Y^t/β_t, (16)

for i = 1, 2, ..., n.
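The column-wise shrinkage used for both U and E is the proximal operator of the l_{2,1} norm; a minimal NumPy sketch (function name ours) shows how weak columns are set exactly to zero while strong columns are only shrunk:

```python
import numpy as np

def column_soft_threshold(G, tau):
    """Proximal operator of tau * ||.||_{2,1}: shrink each column of G toward
    zero, setting it exactly to zero when its Euclidean norm is below tau."""
    norms = np.linalg.norm(G, axis=0)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return G * scale

G = np.array([[0.3, 4.0],
              [0.4, 3.0]])                 # column norms 0.5 and 5.0
U = column_soft_threshold(G, tau=1.0)
assert np.all(U[:, 0] == 0.0)              # weak column is zeroed entirely
assert np.allclose(U[:, 1], [3.2, 2.4])    # strong column is shrunk by (1 - 1/5)
```

This zeroing of whole columns is precisely the mechanism by which GNRFM reduces the rank of UV without any SVD.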
To prevent ALM from converging to an infeasible point, we adopt the strategies proposed by Lu and Zhang [29] to update β_t in the third step of Algorithm 1. At this point, we have given explicit formulas for updating all the variables in (9) at the t-th iteration. Based on these updating formulas, we employ Algorithm 1 to solve problem (7).
Algorithm 1 ALM for GNRFM
Input: data X, an overestimated rank K, and hyperparameters µ_U, µ_V.
while not converged do
1. Update (U^{t+1}, V^{t+1}) according to (13) and (15) to find an approximate solution of (10a).
2. Update E^{t+1} according to (16) to find an approximate solution of (10b), and update the multiplier Y^{t+1}.
3. If t > 0 and the constraint violation has not decreased sufficiently, set β_{t+1} = min(ρβ_t, β_max); otherwise keep β_{t+1} = β_t.
4. Set t ← t + 1.
end while
Many books and articles (Boyd [30], Chen et al. [22], Sun and Fevotte [26], Shen et al. [27], Xu et al. [28]) numerically demonstrate strong convergence behaviour and fast computation for non-convex problems of this matrix factorization type. However, proving the convergence of ALM applied to non-convex problems remains very difficult at present. The last four articles show that the ALM algorithm converges to a KKT point under some strong conditions, which are difficult to verify theoretically; this topic deserves future research. For a detailed discussion of convergence, we refer the reader to [22].

C. THE ACCELERATED ALM METHOD (AALM) FOR GNRFM
In this section, we propose two techniques to accelerate the ALM for GNRFM. The techniques aim to reduce the computational complexity at each iteration and the number of iterations. In Figure 1 and Figure 2 of Section 4, we compare the accelerated and unaccelerated ALM on synthetic data.
The first technique performs only one inner iteration for U and V, which is also adopted in [22]: at the t-th iteration we apply the soft-threshold U-update (15) once with V = V^t, which gives (21), and then the least-squares V-update (13) once with U = U^{t+1}, which gives (22). Only one V^{t+1} is computed here, which also facilitates the later proof. Although we solve (10a) for (U, V) with only one inner iteration, the numerical results show that Algorithm 2 converges in about ten steps.
The computational complexity primarily stems from the matrix multiplication at each iteration. In the present case, some columns from matrix U are zero owing to the utilization of group norm regularization. This fact inspires the second technique, that is, we delete the zero columns in U and the corresponding rows in V before performing matrix multiplication. In numerical experiments, we find that r ≤ K t+1 ≤ K t ≤ K . Here, K t is the number of non-zero columns of U at the t-th iteration. Next, we offer a theoretical proof to ensure that deleting the zero vector column does not affect convergence.
Theorem 2: When updating U and V by (21) and (22), if the i-th column of U is a zero vector, then this column remains a zero vector in all subsequent iterations.
Proof: Suppose the i-th column of U^t is zero. Then, by (22), the corresponding i-th row of V^t is also zero, so the i-th column of the gradient in (21) vanishes and the soft-thresholding step keeps the i-th column of U^{t+1} at zero. Therefore, the second technique does not affect convergence, and it speeds up the calculation. We provide a detailed comparison of ALM and AALM on synthetic data in the next section. By applying the above acceleration techniques, we arrive at Algorithm 2 below.
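The second acceleration technique amounts to a simple pruning step. The NumPy sketch below (function name ours) checks the key invariant from Theorem 2: deleting zero columns of U and the corresponding rows of V leaves the product UV unchanged:

```python
import numpy as np

def prune_zero_columns(U, V):
    """Delete the zero columns of U and the corresponding rows of V.
    Since a zeroed column stays zero (Theorem 2), U @ V is unchanged while
    subsequent matrix products become cheaper (K_t shrinks toward the rank)."""
    keep = np.linalg.norm(U, axis=0) > 0
    return U[:, keep], V[keep, :]

rng = np.random.default_rng(1)
U = rng.standard_normal((6, 4)); U[:, [1, 3]] = 0.0   # two columns already zeroed
V = rng.standard_normal((4, 5))
Up, Vp = prune_zero_columns(U, V)
assert Up.shape == (6, 2)
assert np.allclose(Up @ Vp, U @ V)                    # the product UV is preserved
```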

D. TIME COMPLEXITY
The time complexity of the AALM algorithm depends on two factors: the total number of iterations and the computational complexity of each iteration.

Algorithm 2 Accelerated ALM (AALM) for GNRFM
Input: data X, an overestimated rank K, and hyperparameters µ_U, µ_V.
while not converged do
1. Update (U^{t+1}, V^{t+1}) according to (21) and (22) to find an approximate solution of (10). Delete the zero columns of U and the corresponding rows of V.
2. Update E^{t+1} according to (16), and update the multiplier Y^{t+1}.
3. If t > 0 and the constraint violation has not decreased sufficiently, set β_{t+1} = min(ρβ_t, β_max); otherwise keep β_{t+1} = β_t.
4. Set t ← t + 1.
end while

The numerical results show that our algorithm converges rapidly, needing only about ten iterations. So we focus on discussing the computational cost per iteration.
From Algorithm 2, we see that the computational complexity arises from matrix multiplication. At the t-th iteration, the computational complexity of AALM is O(K_t^2 n + K_t^2 m + K_t mn), where K_t is the number of non-zero columns of U at the t-th iteration. Since r ≤ K_{t+1} ≤ K_t ≤ K, the time complexity per iteration of AALM is O(rmn).

E. SUBSPACE SEGMENTATION (CLUSTERING)
Similar to Liu et al. [23], we design the following algorithm to perform subspace segmentation (clustering) based on the U, V obtained by solving (7). In the fifth step of Algorithm 3, each entry is squared to ensure that the elements of the affinity matrix are positive. In the third and sixth steps, one SVD is needed; for small data sets this does not take much time. For large-scale data, the Nyström approximation is a popular family of methods to replace the SVD [31], especially for spectral clustering [32]. The data sets tested in this article are not particularly large, so the cost of these two steps is not critical. In summary, Algorithm 3 describes how to use the solution obtained by GNRFM for clustering.
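For the large-scale case where the SVD-based post-processing of Algorithm 3 is skipped, the paper uses the lightweight symmetrization W = (|Z| + |Z^T|)/2 (see the HARUS experiment in Section 4). A minimal NumPy sketch, with a helper name of our choosing:

```python
import numpy as np

def affinity_from_Z(Z):
    """Symmetric nonnegative affinity matrix from a representation matrix Z:
    W = (|Z| + |Z^T|) / 2, suitable as input to spectral clustering."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0

Z = np.array([[1.0, -2.0],
              [0.5,  3.0]])
W = affinity_from_Z(Z)
assert np.allclose(W, W.T)   # symmetric
assert np.all(W >= 0)        # nonnegative
```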

IV. NUMERICAL EXPERIMENTS
In this section, we test the efficiency of our algorithm and compare it with other algorithms. We run our algorithm on a PC with a 3.2 GHz AMD Ryzen 7 2700 processor and 16 GB of RAM. All computations are done in Matlab 2016b, and a few routines are written in C++. We compare our algorithm with three methods: LADMAP(A) [20], IRLS [21], and HMFALM [22]. The first method is based on model (4); it is faster than other SVD algorithms because it adaptively adjusts the penalty term to accelerate convergence and uses skinny SVD instead of traditional SVD, reducing the complexity from O(n^3) to O(rn^2), where r is the predicted rank of Z. IRLS smooths the objective function by introducing regularization terms and then solves for the variables alternately by weighted least squares. Although no singular value decomposition is required, the matrix product complexity is still O(n^3). During the solution process, the Matlab command lyap is used to solve a Sylvester equation (sometimes the solution of the equation is not unique, and the program terminates), but on some problems the number of iterations is smaller than that of LADMAP(A). HMFALM, based on the matrix factorization model (6), does not require SVD, so its complexity is O(rmn), where m is the dimension of the data. Its outer loop searches for the rank, starting from 1 and increasing in steps of d; for each outer iteration, the inner loop runs until its stopping condition is met, and this continues until the best rank interval is found. HMFALM is faster than the first two algorithms, but it is very sensitive to the hyperparameter µ, and without a regularization term its anti-noise ability is poor.
Our GNRFM model adds the group norm regularization term to the matrix factorization model (6); by the nature of this term, the factor matrix acquires zero columns, which adaptively reduces the rank. Although the rank starts to decrease from a large number K, it takes only a few iterations to fall from a large rank to a small one. The numerical results show that our AALM algorithm converges in about ten iterations for (7). The stopping criterion in our numerical experiments is

||X − U^t V^t − E^t||_F / ||X||_F < ε,

where ε is a moderately small number. In the following numerical experiments, we adopt the classic evaluation indicators Accuracy (Acc) and Normalized Mutual Information (NMI) to measure the clustering results. The larger the Acc and NMI values, the better the clustering performance.

A. EXPERIMENTS ON SYNTHETIC DATA
We first compare ALM and AALM (before and after acceleration) on synthetic data. For the inner iteration of ALM, we try two stopping criteria: 1. The inner iteration stops after five fixed steps. 2. The inner iteration stops when it converges, i.e., when the relative change between successive inner iterates falls below a tolerance.
The construction of the noisy synthetic data is the same as in [1], [20], [22], [33]. The specific procedure is as follows. First, denote the number of subspaces by s, the number of basis vectors in each subspace by r, and the dimension of the data by d. For the first subspace, we construct the basis B_1, a random orthogonal matrix of size d × r. The bases {B_i}_{i=2}^s of the other subspaces are obtained by B_{i+1} = T B_i, where T is a random rotation matrix. This ensures that the subspaces are independent of each other and that the basis of each subspace is linearly independent. Then, in the i-th subspace, we use the basis to generate p samples: X_i = B_i P_i, where the entries of P_i ∈ R^{r×p} are independent and identically distributed standard normal N(0, 1). We then randomly contaminate 20% of the data vectors, say x, by adding noise of the form x ← x + ση, where η is a zero-mean, unit-variance Gaussian noise vector and σ sets the noise intensity relative to the data. Finally, we obtain the data matrix X = [X_1, X_2, ..., X_s] ∈ R^{d×sp}. We set s = 40, p = 50, d = 2000, r = 5, and σ = 0.05, and generate synthetic data as described above. Figures 1 and 2 compare ALM and AALM. In Figure 1, the horizontal axis represents time (s), and the vertical axis shows the log_10 of the error, so Figure 1 compares the convergence of ALM and AALM. In Figure 2, the vertical axis represents the relative error of E_0, the noise added to the synthetic data, so Figure 2 compares how well ALM and AALM capture the noise. The purple line shows ALM with the second inner-iteration criterion: each outer step iterates until the inner iteration converges. The green line represents the inner iteration with five fixed steps. The red line illustrates the inner iteration with a single fixed step, but without deleting the zero vector columns.
The blue line is the inner iteration with one step, with the zero vector columns of U deleted. From Figures 1 and 2, we see that AALM, using both acceleration techniques, converges fastest and obtains the best recovery result. When ALM runs each inner iteration to convergence, it requires the fewest outer iterations (11 steps), but it is the slowest overall. Comparing the blue line with the red line, we observe that deleting the zero vector columns validates our previous analysis: with no effect on convergence or the result, the deletions save memory and speed up the calculations. From Figures 1 and 2, we also see that the inner iteration does not need to converge; even a single step greatly reduces the calculation time.
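The synthetic-data construction described above can be sketched in NumPy as follows (a sketch with smaller default sizes than the experiments; the exact noise formula is an assumption, scaled relative to the contaminated columns as the σ percentages suggest):

```python
import numpy as np

def make_synthetic(s=5, p=50, d=200, r=5, sigma=0.05, noise_frac=0.2, seed=0):
    """Union-of-subspaces data: random orthogonal basis B_1, rotated bases
    B_{i+1} = T B_i, samples X_i = B_i P_i with i.i.d. N(0,1) coefficients,
    then Gaussian noise added to a random fraction of the columns."""
    rng = np.random.default_rng(seed)
    B, _ = np.linalg.qr(rng.standard_normal((d, r)))      # B_1: d x r, orthonormal
    blocks = []
    for _ in range(s):
        P = rng.standard_normal((r, p))                   # i.i.d. N(0, 1) coefficients
        blocks.append(B @ P)
        T, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation
        B = T @ B                                         # basis of the next subspace
    X = np.hstack(blocks)                                 # X = [X_1, ..., X_s], d x sp
    idx = rng.choice(s * p, size=int(noise_frac * s * p), replace=False)
    # Assumed noise form: sigma times the column norm times unit Gaussian noise.
    X[:, idx] += sigma * np.linalg.norm(X[:, idx], axis=0) \
                 * rng.standard_normal((d, idx.size))
    return X

X = make_synthetic()
assert X.shape == (200, 250)
```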
From Table 2 to Table 4, we use LADMAP(A), HMFALM, and AALM to obtain affinity matrices on synthetic data with different noise levels, and then Algorithm 3 is used for clustering. The goal is to compare the noise resistance and hyperparameter sensitivity of AALM against the other algorithms. If the noise level is too high, the information in the data is lost, so for the noise intensity we select σ = [0.05, 0.1, 0.2], that is, 5%, 10%, and 20% of the original data. For the hyperparameters, we select µ = [0.1, 0.2, 0.5] for the three algorithms LADMAP(A), IRLS, and HMFALM (the parameters mentioned in the corresponding articles). The parameter µ in LADMAP(A) and IRLS is the regularization parameter for resisting noise; from model (4), it is apparent that the larger the µ, the stronger the anti-noise effect. For the HMFALM model, µ plays a role in finding the rank. For our algorithm, we select (µ_U = 1, µ_V = 10), (µ_U = 1, µ_V = 20), and (µ_U = 1, µ_V = 50). µ_U and µ_V are regularization parameters for resisting noise, and µ_U drives the column-sparsity of the factor matrix U to adaptively find the rank. The larger µ_U and µ_V are, the better the noise resistance; the larger µ_U is, the faster the rank drops. For the other parameters of IRLS and LADMAP(A), we select the optimal values reported in the corresponding articles, and we select ε = 10^{-5}, β_1 = 1, β_max = 10^5, and ρ = 2 for the HMFALM algorithm, with search gap d = 0.025n. We select the same ε = 10^{-5}, β_1 = 1, β_max = 10^5, and ρ = 2 for our algorithm; ε is the stopping tolerance, and β is the increasing penalty parameter of AALM. For all results, we run each algorithm three times and take the average as the result for each synthetic data set. The best result in each case is shown in bold.
Tables 2 to 4 feature synthetic data with different noise levels and dimensions, and our AALM algorithm performs best in almost all cases, in both clustering quality and speed. For Table 2, the low-noise case (σ = 0.05), the best of the three algorithms exceeds 90% accuracy across the different dimensional problems, but HMFALM and LADMAP(A) are very sensitive to the parameters; that is, the results change greatly if the parameters change slightly, and there is no single parameter µ that works for all the synthetic data. For Table 3, the moderate-noise case (σ = 0.1), our AALM results are still the best, with a speed more than six times that of HMFALM and about forty times that of LADMAP(A). Particularly in the high-dimensional situation d = 2000, AALM is more than 10% more accurate than the other two algorithms. For Table 4, the highest-noise case (σ = 0.2), our clustering accuracy and NMI are significantly higher than those of the other two algorithms. Although our model (7) has one more hyperparameter than (6), it is not sensitive to the hyperparameters, whereas the clustering results of the other two models are greatly affected by µ. In addition, for σ = [0.05, 0.1, 0.2], our clustering accuracy is the best, in some cases upwards of 20% higher than that of the other algorithms. So the GNRFM model, with the group norm regularization term, has good noise immunity and is robust.
Specifically, we take (s, p, d, r) = (10, 20, 200, 5) and (µ_U = 1, µ_V = 50) from Table 2 and draw Figure 3 and Figure 4 for verification. Here the data come from ten subspaces; each subspace has 20 vectors, and each vector is 200-dimensional. Figure 3 shows, as a color map, the values of the affinity matrix obtained by Algorithm 3 after the AALM algorithm solves the GNRFM model. The affinity matrix is clearly divided into ten blocks; the correlation between different subspaces is very small, and the correlation within each subspace is very high, so the affinity matrix accurately segments the subspaces. In Figure 4, we use t-SNE [34] to visualize the affinity results. The results are clearly divided into ten categories, so our model and algorithm are effective for subspace segmentation.

B. EXPERIMENTS ON REAL DATA
In this section, we test the clustering effectiveness of our algorithm on motion segmentation datasets (the Hopkins155 dataset [35] and the HARUS dataset [36]) and face segmentation datasets (the Extended Yale B dataset [37] and the CMU PIE dataset [38]). Descriptions of the four data sets are given in Table 5.
The Hopkins155 dataset contains 156 data sequences; each video sequence contains from 39 to 550 data vectors (from two or three motions), and the dimension of each data vector is 72 (24 frames × 3). We specify the number of classes (two or three) in each sequence and apply HMFALM, LADM, IRLS, and AALM to each of the 156 sequences to solve for the affinity matrix and perform clustering. In Table 6, we give the total accuracy, average NMI, average iteration count, and average time in the cases of two motions, three motions, and all motions. For HMFALM, LADM, and IRLS, we select µ = 2.4 (the optimal parameter tested by the respective authors). For the AALM algorithm, we select µ_U = 0.005, µ_V = 3.
As seen from Table 6, our algorithm is faster than the other three. For the NMI index with three motions, our results are not as good as those of IRLS, but our algorithm is much faster. For video data, fast clustering is very important; overall, our model and algorithm achieve the best results. The Human Activity Recognition Using Smartphones (HARUS) dataset contains sensor signals collected from a group of 30 volunteers carrying out 6 activities. The HARUS dataset contains 10,299 signals, each a 561-dimensional feature vector. In Table 7, we give the accuracy, NMI, iteration count, and time of the four algorithms on the HARUS dataset. For HMFALM, LADMAP(A), and IRLS, we select µ = 0.01 (the optimal parameter tested by [33]; a large µ is too time-consuming); for the AALM algorithm, we again select µ_U = 0.005, µ_V = 3. Because the data dimensions are too large, instead of using Algorithm 3 to convert Z into an affinity matrix (the SVD would require too much time), we use the formula W = (|Z| + |Z^T|)/2 for the conversion. From Table 7, we see that on the large HARUS data set the IRLS algorithm does not work. Among the remaining algorithms, our AALM takes less than two seconds and is almost 20% more accurate than the other two. We then use t-SNE to visualize the affinity results in Figure 5; the data are roughly separated into six categories with small errors.
In Figure 6, we show samples from the Extended Yale B and CMU PIE datasets. The Extended Yale B dataset contains 38 subjects (people), each with 64 facial images; the CMU PIE dataset contains 68 subjects, each with 170 facial images. The Extended Yale B dataset also contains noise from different lighting angles. Figure 6 shows 20 pictures of one person's face in which the lighting noise is such that some faces cannot be seen clearly or even become dark; for instance, the fourth picture cannot be identified even by the human eye. The CMU PIE dataset features varying human expressions in addition to lighting noise, so clustering is more difficult. Similar to Lu et al. [21], we conduct two experiments on each dataset by forming the first five subjects and the first ten subjects into a dataset X. First, we resize all pictures to 32 × 32. Second, to reduce noise, we project them by principal component analysis (PCA) onto a 30-dimensional (80-dimensional) subspace for five subjects and a 60-dimensional (160-dimensional) subspace for ten subjects in the Extended Yale B (CMU PIE) dataset. Third, we apply HMFALM, LADM, IRLS, and AALM to solve the low-rank representation problem and obtain the different affinity matrices. Finally, we compare the clustering results of Algorithm 3 with the different affinity matrices. We set µ = 1.5 for HMFALM, LADM, and IRLS (the optimal parameter tested by the respective authors) and (µ_U = 1, µ_V = 20) for AALM.
As can be seen from Table 8, the IRLS algorithm gives the worst results. The remaining three algorithms achieve the same accuracy and NMI, but ours is the fastest. We use t-SNE to visualize the affinity results for the Extended Yale B dataset; for the five subjects in Figure 7, the data are roughly separated into five categories with small errors. From Table 9, for the CMU PIE dataset, it is apparent that our algorithm achieves the best results, and for the large-scale problem of ten subjects it is much faster than the others. Our algorithm is thus fast and effective for face clustering.
In summary, our algorithm achieves the best accuracy with the fastest computing speed in real problems with sports motion data (Hopkins155 dataset and HARUS dataset) and facial data (Extended Yale B dataset and CMU PIE dataset). In addition, our AALM algorithm can immediately provide the affinity results for data.

V. CONCLUSION
In this paper, we first propose the group norm regularized factorization model (GNRFM) to solve the rank-minimization low-rank representation problem, and then apply it to subspace segmentation. The traditional nuclear norm approximation method costs O(n^3) per iteration, while our group norm model costs O(rmn). Compared with the traditional factorization model, which greedily searches for the rank, we find the rank adaptively by introducing the group norm regularization term, which greatly reduces the number of iterations. In addition, the group norm regularization term has anti-noise effects and makes the model more robust. On synthetic and real data, our GNRFM model and the algorithm designed for it achieve excellent clustering results. Furthermore, the AALM algorithm requires only about ten iterations, so it is much faster than traditional algorithms; it is thus well suited to fast, on-demand clustering in the era of big data. In the future, we will consider applying the affinity matrix obtained by GNRFM to AP clustering [39], multi-view affinity learning [40], [41], and spectral clustering with an interplay manner [41]. In particular, our group norm regularized factorization method can be applied to a range of low-rank problems. Theoretical proofs will be provided in future work.