Subspace Segmentation by Low Rank Representation via the Sparse-Promoting Quasi-rank Function

In this paper, a general optimization formulation is proposed for subspace segmentation by low rank representation via the sparse-promoting quasi-rank function. We prove that, for clean data drawn from independent linear subspaces, the optimal solution to our formulation not only has the lowest rank but also forms a block-diagonal matrix, which implies that it is reasonable to use any sparse-promoting quasi-rank function as the measure of low rank in subspace clustering. For data contaminated by Gaussian noise and/or gross errors, the gross-error part is modeled with a sparse-promoting matrix function, and the alternating direction method of multipliers is applied to the resulting problem; every sub-optimization problem has a closed-form optimal solution whenever the band restricted thresholding operator induced by the corresponding sparse-promoting function has an analytic expression. Finally, taking a specific sparse-promoting function, the fraction function, we conduct a series of simulations on different databases to test the performance of our algorithm. Experimental results show that our algorithm achieves a lower clustering error rate and higher values of the evaluation indicators ACC, NMI and ARI than other state-of-the-art subspace clustering algorithms.


I. INTRODUCTION
In the past few decades, there has been an explosion of high dimensional databases in many fields, such as machine learning, computer vision and signal processing. As the number of dimensions in a database increases, distance measures become increasingly meaningless; in fact, in very high dimensions the distances between data points are almost equidistant [12], which degrades the performance of many traditional clustering algorithms. High dimensional data, however, are often not uniformly distributed across the ambient space: their essential dimensions are much lower than that of the ambient space, and they often lie in a union of low dimensional subspaces [2], [29], [46]. To cluster such data into groups, each of which corresponds to an underlying subspace, subspace clustering methods [42], [49] have been proposed.
In general, subspace clustering methods roughly fall into four kinds: iterative methods [3], [24], statistical methods [19], [36], [45], algebraic methods [37], [44], [50] and spectral clustering-based methods [8], [28], [35], of which the first three are sensitive to noise, outliers and initialization [16]. The spectral clustering-based methods, which first build an affinity matrix that accurately describes the similarity of each pair of data points and then feed this affinity matrix to the framework of spectral clustering [40], have performed superiorly in many fields such as computer vision, signal and image processing and machine learning [1], [25], [34], [55]. Clearly, the key to spectral clustering-based methods is to build a good affinity matrix.
Recently, several methods for constructing a good affinity matrix have been proposed based on sparse and low rank representation, such as sparse subspace clustering (SSC) [16], low rank representation (LRR) [32], [33] and low rank subspace clustering (LRSC) [26], [51]. In SSC schemes, the sparse representation and the ℓ1-norm minimization are employed to obtain a desired affinity matrix. If the clean data are drawn from independent linear subspaces, it is proved that the points which have nonzero coefficients in the sparse representation of a point lie in the same subspace. Different from SSC, the low rank representation and the nuclear norm minimization are employed for a desired low-rank affinity matrix in the LRR and LRSC schemes. If the clean data are from independent linear subspaces, both LRR and LRSC show that the optimal solution to the nuclear norm minimization not only coincides with the optimal solution to the rank minimization but also forms a block-diagonal matrix, i.e., its (i, j)th entry is nonzero only if the ith and the jth points are from the same subspace, which implies that the optimal solution is a good affinity matrix. However, if the data are corrupted by noise, especially by gross errors, it is not clear from a theoretical viewpoint whether the obtained affinity matrix has the lowest rank or whether the obtained noise matrix is the sparsest, both of which are extremely important for a good affinity matrix. To obtain a better affinity matrix, many non-convex low rank approximation functions have been employed, such as the ℓ0-norm [4], the Schatten p-norm [9], [53], [59], the weighted Schatten p-norm [54], the log-determinant rank function [27] and the multivariate GMC penalty function [4]. Extensive experiments demonstrate that these approximation functions have a better performance.
However, are these non-convex functions used as the measure of the low rank in subspace clustering reasonable in theory and is there any other function that can replace them?
In this paper, the two questions above are mainly discussed. First, we show that the sparse-promoting penalty function can produce a sparse solution for noiseless signals in sparse signal recovery, and based on this result, we demonstrate that the optimal solution to the sparse-promoting quasi-rank function minimization has the lowest rank and forms a block-diagonal matrix if the uncorrupted data are drawn from independent linear subspaces. This conclusion, which coincides with the nuclear norm minimization used in LRR and LRSC, implies that any sparse-promoting quasi-rank function used as the measure of low rank in subspace clustering is reasonable. Second, if the data are corrupted by Gaussian noise and/or gross errors, we replace the low rank and gross-error parts with the sparse-promoting quasi-rank function and the sparse-promoting matrix function, respectively. The resulting optimization problem is different from LRR and LRSC. Subsequently, we apply the alternating direction method of multipliers to solving it, and every sub-optimization problem has a closed-form optimal solution when the band restricted thresholding operator induced by the corresponding sparse-promoting function has an analytic expression. Finally, taking a specific sparse-promoting function, the fraction function, we conduct a series of simulations on different databases, including the Extended Yale B face database, the ORL face database and the Hopkins 155 motion segmentation database, to demonstrate the performance of our algorithm. The results show that our algorithm achieves a lower clustering error rate and higher values of the evaluation indicators ACC, NMI and ARI than other state-of-the-art subspace clustering algorithms.
The paper is organized as follows. In section II, we review some existing approaches to subspace clustering by sparse and low rank representation, including sparse subspace clustering (SSC), low rank representation (LRR) and low rank subspace clustering (LRSC). In section III, we review some existing results on sparse signal recovery and show that the sparse-promoting penalty function leads to a sparse solution of the constrained minimization problem, which is useful in discussing subspace segmentation by low rank representation with clean data. In section IV, we discuss the regularized rank minimization via the sparse-promoting quasi-rank function and show that the optimal solution can be acquired by a matrix band restricted thresholding operator. Subspace segmentation by low rank representation via the sparse-promoting quasi-rank function with clean data is discussed in section V, where we show that any sparse-promoting quasi-rank function used as the measure of low rank in subspace clustering is reasonable. The corresponding problem with corrupted data is discussed in section VI. A series of simulations is conducted to test the performance of our algorithm by taking the fraction function as a specific sparse-promoting function in section VII. Finally, section VIII gives the conclusions.

II. RELATED WORK
In this section, some classic methods for subspace clustering by sparse and low rank representation, including sparse subspace clustering (SSC) [16], subspace segmentation by low rank representation (LRR) [32], [33] and low rank subspace clustering (LRSC) [26], [51], are reviewed. Let X = (x_1, x_2, ···, x_N) ∈ R^{d×N} be a set of d-dimensional data points drawn from an unknown union of k linear subspaces S_j (j = 1, 2, ···, k) with corresponding dimensions d_j. Subspace clustering aims to find the number k of subspaces and cluster the data into groups, each of which corresponds to an underlying subspace. As described in Section I, the key to spectral clustering-based methods is to construct a good affinity matrix that accurately depicts the similarity of each pair of points. If the set of data points has the self-expressive property, i.e., each data point in a union of subspaces can be efficiently represented as a linear combination of the other data points, then the representation coefficients, considered as a similarity measure, can serve as a good affinity matrix. SSC, LRR and LRSC all construct an affinity matrix based on representation coefficients, an approach widely used at present. The major difference between them is that SSC is based on a sparse representation while LRR and LRSC are based on a low rank representation.
A. SPARSE SUBSPACE CLUSTERING (SSC)
The goal of SSC is to find a sparse representation of the data matrix X by minimizing the number of nonzero coefficients. If the data are clean, the following model is employed:

min_C ||C||_0 subject to X = XC, diag(C) = 0, (II.1)

where C is the sparse representation matrix and ||C||_0 denotes the number of nonzero elements of C. Because this minimization problem is in general NP-hard, ℓ1 minimization, the tightest convex relaxation of ℓ0 minimization, is considered instead:

min_C ||C||_1 subject to X = XC, diag(C) = 0, (II.2)

where ||C||_1 = Σ_{ij} |c_ij|, i.e., the sum of the absolute values of the elements of the matrix. It is shown in [16] that under some conditions the solutions to (II.1) and (II.2) coincide and that c_ij ≠ 0 only if points i and j are in the same subspace.
If the data are corrupted by Gaussian noise G and gross errors E, i.e., only a small percentage of the entries of X are corrupted [7], the affinity matrix C can be acquired by solving the following convex optimization model:

min_{C,E,G} ||C||_1 + λ_e ||E||_1 + (λ_g/2) ||G||_F^2 subject to X = XC + E + G, diag(C) = 0. (II.3)

B. SUBSPACE CLUSTERING BY LOW RANK REPRESENTATION (LRR)
Different from SSC, the goal of LRR is to find a low rank representation. If the data are clean, the following model is considered:

min_C rank(C) subject to X = XC. (II.4)

Because of the discrete nature of the rank function, this problem is difficult to solve in general. Hence, the following convex optimization model is considered:

min_C ||C||_* subject to X = XC, (II.5)

in which ||C||_* is the nuclear norm of C, i.e., the sum of the singular values of C. It is shown in [32], [33] that the solution to model (II.5) is the Costeira and Kanade affinity matrix C = V_1 V_1^T, where X = U_1 Σ V_1^T is the rank-d singular value decomposition of X. In fact, V_1 V_1^T is a block-diagonal matrix if the linear subspaces are independent [11], which implies that C = V_1 V_1^T is a good affinity matrix. With the data corrupted by Gaussian noise and/or gross errors, the following convex optimization problem is considered:

min_{C,E} ||C||_* + λ ||E||_{2,1} subject to X = XC + E, (II.6)

where ||E||_{2,1} = Σ_j (Σ_i |e_ij|^2)^{1/2} is the ℓ2,1 norm of E.
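For reference, the two matrix norms appearing in (II.5) and (II.6) can be sketched in a few lines of numpy (a minimal illustration, not part of the cited implementations):

```python
import numpy as np

def nuclear_norm(C):
    """Sum of singular values of C, as in model (II.5)."""
    return np.linalg.svd(C, compute_uv=False).sum()

def l21_norm(E):
    """Column-wise l2,1 norm: sum_j sqrt(sum_i |e_ij|^2), as in model (II.6)."""
    return np.sqrt((E ** 2).sum(axis=0)).sum()

# Sanity check: for a rank-1 matrix u v^T the nuclear norm is ||u||_2 ||v||_2.
u = np.array([3.0, 4.0])          # ||u||_2 = 5
v = np.array([1.0, 0.0, 0.0])     # ||v||_2 = 1
M = np.outer(u, v)
print(nuclear_norm(M))            # 5.0
print(l21_norm(M))                # only the first column is nonzero -> 5.0
```

The ℓ2,1 norm encourages whole columns of E to vanish, matching the assumption in (II.6) that only some samples are grossly corrupted.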

C. LOW RANK SUBSPACE CLUSTERING (LRSC)
Although the low rank representation is used in both LRR and LRSC, LRSC seeks a symmetric low rank affinity matrix C. Thus, LRSC solves the following non-convex optimization problems:

min_C ||C||_* subject to X = XC, C = C^T, (II.7)

if X is clean, and

min_{Y,C,G,E} ||C||_* + (τ/2) ||G||_F^2 + γ ||E||_1 subject to X = Y + G + E, Y = YC, C = C^T, (II.8)

if X is corrupted by the Gaussian noise G and the gross errors E. It is shown that, if the data X are clean, the optimal solution to problem (II.7) is also the Costeira and Kanade affinity matrix C = V_1 V_1^T, similar to LRR, and hence C can be used as an affinity matrix. Moreover, if X is corrupted only by the Gaussian noise G (i.e., γ = ∞), then Y and C can be acquired by thresholding the singular values of X and Y, respectively.

III. SPARSE SIGNAL RECOVERY VIA THE SPARSE-PROMOTING PENALTY FUNCTION
In this section we briefly review some definitions and results on sparse signal recovery; in particular, the optimal solution to the regularized minimization via the sparse-promoting penalty function is given by the band restricted thresholding operator. In addition, for the constrained minimization problem, we show that the sparse-promoting penalty function leads to a sparse solution, which will be used in discussing subspace segmentation by low rank representation when the data are clean.
Sparse signal recovery, i.e., solving a high dimensional underdetermined system for sparse solutions, has attracted much attention in recent years and is beneficial in different fields, such as compressed sensing [13], [15], [21] and subspace clustering [16], [32], [33]. To model this problem, the following constrained minimization problem, also known as the ℓ0 minimization problem [39], is commonly considered:

min_x ||x||_0 subject to Ax = b, (III.1)

in which A is an m × N real matrix with rank(A) ≪ N, x ∈ R^N, b ∈ R^m, and ||x||_0, named the ℓ0 norm, is the number of nonzero entries of x. If A is a full row rank matrix, rank(A) ≪ N becomes m ≪ N. However, the ℓ0 optimization problem is NP-hard and sensitive to noise because of the discontinuous nature of the ℓ0 norm.
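For intuition, (III.1) can be solved exactly for tiny instances by exhausting supports; the sketch below (with a synthetic A and b, chosen arbitrarily) does so and illustrates the combinatorial cost behind the NP-hardness:

```python
import itertools
import numpy as np

def l0_min_bruteforce(A, b, tol=1e-9):
    """Search supports of increasing size for the sparsest x with Ax = b.
    Exponential cost -- exactly why problem (III.1) is NP-hard in general."""
    m, N = A.shape
    for k in range(N + 1):
        for support in itertools.combinations(range(N), k):
            cols = list(support)
            x = np.zeros(N)
            if k > 0:
                coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
                x[cols] = coef
            if np.linalg.norm(A @ x - b) < tol:
                return x
    return None

# b is a combination of two columns of A, so the sparsest solution has ||x||_0 = 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = 2.0 * A[:, 1] - 1.5 * A[:, 4]
x = l0_min_bruteforce(A, b)
print(np.count_nonzero(np.abs(x) > 1e-8))  # 2
```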
An important approach to the ℓ0 minimization problem is to replace the ℓ0-norm with a continuous relaxation function, which is called a sparse-promoting penalty function in this paper.
Definition III.1. A function p : R → [0, +∞) is called a sparse-promoting function if p(0) = 0, p is even and nondecreasing on [0, +∞), and p is concave on (0, +∞); it is called strict if p is strictly concave on (0, +∞).
Definition III.2. Let p(·) be a (strict) sparse-promoting function. Then
(1) for a signal x ∈ R^N, the function P(x) = Σ_{i=1}^N p(x_i) is called the (strict) sparse-promoting penalty function;
(2) given a matrix X ∈ R^{m×N}, the function P*(X) = Σ_{i,j} p(x_ij) is called the (strict) sparse-promoting matrix function;
(3) assume that X = UΣV^T is the singular value decomposition (SVD) of the matrix X ∈ R^{m×N}, in which U and V are m × m and N × N unitary matrices and Σ = diag(σ_i)_{1≤i≤m} is the diagonal matrix of the singular values σ_i of X; then the function P(X) = Σ_i p(σ_i) is called the (strict) sparse-promoting quasi-rank function.
There are many different sparse-promoting functions in the literature; some relevant examples are given in Table 1.
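The three liftings in Definition III.2 are mechanical; below is a small numpy sketch, using the fraction function as the scalar p (a is a free parameter and the value 2.0 is an arbitrary illustrative choice):

```python
import numpy as np

# Any scalar sparse-promoting function p lifts three ways (Definition III.2).
def p(t, a=2.0):
    """Fraction function p_a(t) = a|t| / (a|t| + 1)."""
    return a * np.abs(t) / (a * np.abs(t) + 1.0)

def P_vec(x, a=2.0):
    """Vector penalty: P(x) = sum_i p(x_i)."""
    return p(x, a).sum()

def P_mat(X, a=2.0):
    """Matrix (entrywise) penalty: P*(X) = sum_{i,j} p(x_ij)."""
    return p(X, a).sum()

def P_quasirank(X, a=2.0):
    """Quasi-rank function: apply p to the singular values of X."""
    return p(np.linalg.svd(X, compute_uv=False), a).sum()

X = np.diag([1.0, 0.0, 0.0])
print(P_quasirank(X))   # p(1) = 2/3: a rank-1 matrix scores roughly its rank
```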
Replacing the ℓ0 norm with a sparse-promoting penalty function P(·), the minimization (III.1) is rewritten as

min_x P(x) subject to Ax = b (III.2)

for the constrained problem and

min_{x∈R^N} (1/2)||Ax − b||_2^2 + λP(x) (III.3)

for the regularization problem, where λ > 0 is the regularization parameter. Note that all the sparse-promoting functions are strict except for the ℓ1 norm. The ℓ1 norm is typical and popular, and considerable excellent theoretical work has been based on it [6], [14]. However, because the ℓ1-norm, as a convex surrogate of the ℓ0-norm, often fails both to produce the sparsest solution and to select the positions of the nonzero coefficients [17], [58], many strict sparse-promoting functions have been discussed [17], [20], [22], [31], [43], [58], [60].
In the following, we first demonstrate that any sparse-promoting penalty function can produce a sparse optimal solution in sparse signal recovery.
Lemma III.1. (1) Suppose that P(x) is a strict sparse-promoting penalty function and x̂ is an optimal solution to problem (III.2); then the columns of the matrix A corresponding to the support of x̂ are linearly independent, i.e., ||x̂||_0 = k ≤ rank(A).
(2) If P(x) = ||x||_1, problem (III.2) has at least one optimal solution x* with at most rank(A) nonzeros.
Proof. (1) Assume that x̂ is an optimal solution to problem (III.2) with ||x̂||_0 = k > rank(A); then the k columns of A corresponding to the support of x̂ are linearly dependent. Thus, there exists a non-trivial vector h ∈ R^N with the same support as x̂ such that Ah = 0 and max_{1≤j≤N} |h_j| < min_{j: x̂_j ≠ 0} |x̂_j|. Obviously, x̂ − h and x̂ + h are also solutions to Ax = b, and for every j the entries x̂_j − h_j, x̂_j + h_j and x̂_j have the same sign.
Because p(t) is strictly concave on t ∈ (0, +∞), we have

p(|x̂_j|) = p((1/2)|x̂_j − h_j| + (1/2)|x̂_j + h_j|) > (1/2) p(|x̂_j − h_j|) + (1/2) p(|x̂_j + h_j|)

whenever h_j ≠ 0, and hence

P(x̂) > (1/2) P(x̂ − h) + (1/2) P(x̂ + h) ≥ min{P(x̂ − h), P(x̂ + h)},

which contradicts the fact that x̂ is an optimal solution to problem (III.2). Therefore ||x̂||_0 ≤ rank(A).
Remark III.1. It is worth pointing out that, when P(x) = ||x||_1, problem (III.2) may have more than one solution, and the number of nonzeros of the other solutions may be larger than rank(A), which is different from Lemma III.1 (1).
Second, we introduce an important result: any sparse-promoting function can induce a band restricted thresholding operator.
where τ is called the thresholding parameter and c is called the band parameter.
Definition III.4. Let h_τ be a band restricted thresholding operator. Then
(1) the vector band restricted thresholding operator H_τ : R^N → R^N is defined by H_τ(x) = (h_τ(x_1), h_τ(x_2), ···, h_τ(x_N))^T;
(2) the matrix band restricted thresholding operator H_τ : R^{m×N} → R^{m×N} is defined by H_τ(X) = U H_τ(Σ) V^T, where H_τ(Σ) = diag(h_τ(σ_i)) and X = UΣV^T is the singular value decomposition of the matrix X.
Remark III.2. Lemma III.3 shows that any sparse-promoting penalty function leads to a band restricted thresholding operator H_τ, but its analytic expression does not always exist. For example, only when p = 1/2, 2/3 and 1 do the band restricted thresholding operators induced by P(x) = ||x||_p^p (0 < p ≤ 1) have analytic expressions.
At the end of this section, we introduce in detail a specific sparse-promoting function, the fraction function p(x) = a|x| / (a|x| + 1) (a > 0), since it will be used in section VII. In fact, the fraction function is widely used in image restoration [23], [41] and sparse signal recovery [31]. In [31], the equivalence between ℓ0 minimization and fraction function minimization is discussed, and the analytic expression of the optimal solution to model (III.3) is given.
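A quick numerical illustration (with arbitrary values of a) of how the fraction function interpolates toward the ℓ0 norm:

```python
import numpy as np

def p_a(t, a):
    """Fraction function p_a(t) = a|t| / (a|t| + 1): concave, p_a(0) = 0,
    and p_a(t) -> 1 for every t != 0 as a -> infinity."""
    return a * np.abs(t) / (a * np.abs(t) + 1.0)

x = np.array([0.0, 0.5, -2.0, 0.0, 3.0])
for a in (1.0, 10.0, 1e4):
    print(a, p_a(x, a).sum())
# As a grows, the penalty sum approaches ||x||_0 = 3.
```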

IV. LOW RANK MINIMIZATION VIA THE SPARSE-PROMOTING QUASI-RANK FUNCTION
In this section we mainly discuss the regularized rank minimization via the sparse-promoting quasi-rank function and show that the optimal solution can be acquired by a matrix band restricted thresholding operator. Assume that the corrupted data matrix is D = Z + G, where G is the Gaussian noise and Z is an unknown clean low rank matrix; then the following regularized rank minimization is commonly employed to find a low rank approximation of Z:

min_Z (1/2) ||D − Z||_F^2 + λ rank(Z), (IV.1)

where λ > 0 is a parameter. Because of the discrete nature of the rank function, this problem is NP-hard in general. Hence, the following convex optimization model is considered:

min_Z (1/2) ||D − Z||_F^2 + λ ||Z||_*, (IV.2)

in which ||Z||_* is the nuclear norm of Z. The optimal solution to this problem is Z = U S_λ(Σ) V^T [5], where D = UΣV^T with Σ = diag(σ_i)_{1≤i≤m} is the singular value decomposition of D, and S_λ(Σ) = diag(s_λ(σ_i)) is the matrix soft thresholding operator with s_λ(σ) = max{σ − λ, 0}. Before giving the main conclusion of this section, we need the following inequality.
Lemma IV.1. (Von Neumann's Inequality) Let matrices X, Y ∈ R^{m×N} be given, and let σ_1(X) ≥ σ_2(X) ≥ ··· ≥ σ_N(X) and σ_1(Y) ≥ σ_2(Y) ≥ ··· ≥ σ_N(Y) denote the singular values of X and Y, respectively. Then

tr(X^T Y) ≤ Σ_{i=1}^N σ_i(X) σ_i(Y). (IV.4)

Equality holds if and only if there exist two unitary matrices U and V such that X = UΣ_X V^T and Y = UΣ_Y V^T hold simultaneously, where Σ_X and Σ_Y are the diagonal matrices whose diagonal elements are the decreasingly arranged singular values of X and Y, respectively.
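Von Neumann's inequality and its equality case are easy to verify numerically; the sketch below uses arbitrary random matrices:

```python
import numpy as np

# Numerical check of Lemma IV.1: tr(X^T Y) <= sum_i sigma_i(X) * sigma_i(Y).
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 4))
Y = rng.standard_normal((5, 4))

lhs = np.trace(X.T @ Y)
rhs = (np.linalg.svd(X, compute_uv=False) * np.linalg.svd(Y, compute_uv=False)).sum()
print(lhs <= rhs)   # True

# Equality case: Y2 shares X's singular vectors, with sorted singular values.
U, _, Vt = np.linalg.svd(X)
Y2 = U[:, :4] @ np.diag([4.0, 3.0, 2.0, 1.0]) @ Vt
lhs2 = np.trace(X.T @ Y2)
rhs2 = (np.linalg.svd(X, compute_uv=False) * np.array([4.0, 3.0, 2.0, 1.0])).sum()
print(abs(lhs2 - rhs2) < 1e-8)  # True
```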
Theorem IV.1. Let D ∈ R^{m×N} be a given matrix and P(Z) a sparse-promoting quasi-rank function. Then the optimal solution Z* to the following minimization

min_Z (1/2) ||D − Z||_F^2 + λ P(Z) (IV.5)

is the matrix band restricted thresholding operator H_τ(D) defined in Definition III.4, where the band restricted thresholding operator h_τ is induced by the sparse-promoting function p(·).
Proof. Let D = U_D Σ_D V_D^T and Z = U_Z Σ_Z V_Z^T be the SVDs of D and Z, where Σ_D and Σ_Z are the m × N diagonal matrices whose diagonal elements are the decreasingly arranged singular values of D and Z, respectively. We have

(1/2)||D − Z||_F^2 + λP(Z) = (1/2)||D||_F^2 − tr(D^T Z) + (1/2)||Z||_F^2 + λ Σ_i p(σ_i(Z))
 ≥ Σ_i [ (1/2)(σ_i(D) − σ_i(Z))^2 + λ p(σ_i(Z)) ],

where the inequality holds by (IV.4). Hence the problem reduces to the separable optimization

min_{σ_1(Z), σ_2(Z), ···, σ_N(Z)} Σ_i [ (1/2)(σ_i(D) − σ_i(Z))^2 + λ p(σ_i(Z)) ].

By Lemma III.2, its optimal solution is (σ_i(Z))* = h_τ(σ_i(D)).
From Lemma IV.1, we notice that the equality is achieved if and only if Z = U_D Σ_Z V_D^T, where Σ_Z is the m × N diagonal matrix whose diagonal elements are the decreasingly arranged singular values of Z*. That is, the optimal solution Z* to problem (IV.5) is

Z* = U_D diag(h_τ(σ_i(D))) V_D^T = H_τ(D).
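When no analytic expression for h_τ is available (Remark III.2), the scalar problems can still be solved numerically. The sketch below approximates h_τ by a grid search over the scalar objective and assembles H_τ(D) as in Theorem IV.1; the fraction function with a = 2 and all parameter values are illustrative choices only:

```python
import numpy as np

def scalar_prox(sigma, lam, p, grid=None):
    """Numerically minimize 0.5*(sigma - t)^2 + lam*p(t) over t >= 0.
    A grid search stands in for the analytic band restricted thresholding
    operator h_tau, which exists in closed form only for special p."""
    if grid is None:
        grid = np.linspace(0.0, max(sigma, 1.0) * 1.5, 20001)
    vals = 0.5 * (sigma - grid) ** 2 + lam * p(grid)
    return grid[np.argmin(vals)]

def matrix_threshold(D, lam, p):
    """H_tau(D): apply the scalar prox to each singular value (Theorem IV.1)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ np.diag([scalar_prox(si, lam, p) for si in s]) @ Vt

p_frac = lambda t: 2.0 * np.abs(t) / (2.0 * np.abs(t) + 1.0)  # fraction function, a = 2
D = np.diag([5.0, 0.1])
Z = matrix_threshold(D, lam=0.5, p=p_frac)
# The large singular value is barely shrunk; the small one is set to zero,
# so the result has lower rank than D.
```

With p(t) = |t| the same routine recovers the soft thresholding operator s_λ(σ) = max{σ − λ, 0} used for the nuclear norm in (IV.2).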

V. SUBSPACE SEGMENTATION BY LOW RANK REPRESENTATION VIA THE SPARSE-PROMOTING QUASI-RANK FUNCTION WITH UNCORRUPTED DATA
In this section, we show that the optimal solution to the optimization model based on the sparse-promoting quasi-rank function has the lowest rank and forms a block-diagonal matrix if the uncorrupted data are drawn from independent linear subspaces. This conclusion implies that any sparse-promoting quasi-rank function used as the measure of low rank in subspace clustering is reasonable.
Given a clean data matrix Y ∈ R^{m×N}, whose columns are drawn from a union of k low dimensional linear subspaces of unknown dimensions {d_i}_{i=1}^k satisfying d_i ≪ m and Σ_{i=1}^k d_i ≪ N, we consider the following optimization problem:

min_C P(C) subject to Y = YC, (V.1)

where P(C) is any sparse-promoting quasi-rank function.
The following theorem shows that the optimal solution to problem (V.1) is again the Costeira and Kanade affinity matrix C = V_1 V_1^T, similar to the nuclear norm case in LRR and LRSC. In fact, this matrix is called the Shape Interaction Matrix (SIM) in [11] and has been widely used for subspace clustering.
Theorem V.1. Let rank(Y) = k and Y = UΣV^T be the SVD of Y, where Σ = diag(σ_i) is a diagonal matrix whose diagonal elements are the decreasingly arranged singular values of Y. Then the optimal solution to problem (V.1) is C* = V_1 V_1^T, where V_1 consists of the first k columns of V.
Proof. By Lemma III.1, the vector δ* of singular values of the optimal solution satisfies ||δ*||_0 ≤ rank(Y) = k. On the other hand, Y = YC implies that rank(Y) ≤ rank(C), and hence rank(Y) = rank(C) = k.
It follows from rank(Y) = rank(C) = k that equation (V.4) can be rewritten with Σ_1 = diag(σ_1, σ_2, ···, σ_k) and ∆_1 = diag(δ_1, δ_2, ···, δ_k), which yields equations (V.6) and (V.7). Equation (V.6) means that the columns of Q_2 must be orthogonal to the columns of V_1, and hence the columns of Q_1 must lie in the range of V_1. Thus, Q_1 = V_1 R for a rotation matrix R, which gives V_1^T Q_1 = V_1^T V_1 R = R. Together with Equation (V.7), we obtain R = V_1^T P_1 ∆_1, which leads to ∆_1 = I_k and P_1 = V_1 R_1. Hence, C* = V_1 V_1^T. The proof is completed.
By the properties of the Shape Interaction Matrix (SIM) [11], the following corollary holds, which implies that any sparse-promoting quasi-rank function as the measure of low rank in subspace clustering is reasonable.
Corollary 1. When the linear subspaces are independent, the optimal solution V_1 V_1^T to problem (V.1) forms a block-diagonal matrix: the (i, j)th entry of V_1 V_1^T can be nonzero only if the ith and jth samples are from the same subspace.
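Theorem V.1 and Corollary 1 are easy to check numerically; the sketch below (with arbitrary synthetic subspaces) builds clean data from two independent subspaces and verifies that V_1 V_1^T is feasible and block-diagonal:

```python
import numpy as np

# Two independent subspaces of dimensions 2 and 3 in R^20, 15 samples each.
rng = np.random.default_rng(3)
B1 = rng.standard_normal((20, 2))          # basis of subspace 1
B2 = rng.standard_normal((20, 3))          # basis of subspace 2
Y = np.hstack([B1 @ rng.standard_normal((2, 15)),
               B2 @ rng.standard_normal((3, 15))])

k = np.linalg.matrix_rank(Y)               # k = 2 + 3 = 5 almost surely
_, _, Vt = np.linalg.svd(Y)
V1 = Vt[:k].T                              # first k right singular vectors
C = V1 @ V1.T                              # shape interaction matrix

off_block = np.abs(C[:15, 15:]).max()      # cross-subspace entries
print(k, off_block)                        # off_block is numerically ~0
```

Feasibility YC = Y holds because C is the orthogonal projection onto the row space of Y; the vanishing off-diagonal block is exactly Corollary 1 (up to sample ordering).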

VI. SUBSPACE SEGMENTATION BY LOW RANK REPRESENTATION VIA THE SPARSE-PROMOTING QUASI-RANK FUNCTION WITH CORRUPTED DATA
In this section, we discuss subspace segmentation by low rank representation via the sparse-promoting quasi-rank function when the data are corrupted by Gaussian noise and/or gross errors. We replace the low rank and gross-error parts with the sparse-promoting quasi-rank function and the sparse-promoting matrix function, respectively. Subsequently, we apply the alternating direction method of multipliers to solving the resulting problem, and every sub-optimization problem has a closed-form optimal solution when the band restricted thresholding operator induced by the corresponding sparse-promoting function has an analytic expression.
Let a corrupted data matrix X = Y + G + E be given, where Y is an unknown clean matrix, G is the Gaussian noise and E is the gross errors. We consider the following optimization problem:

min_{Y,C,G,E} P(C) + (λ_1/2) ||G||_F^2 + λ_2 P*(E) subject to X = Y + G + E, Y = YC, (VI.1)

where P(C) and P*(E) are the sparse-promoting quasi-rank function and the sparse-promoting matrix function, respectively. Clearly, this problem is equivalent to the following problem:

min_{C,G,E} P(C) + (λ_1/2) ||G||_F^2 + λ_2 P*(E) subject to X = XC + G + E. (VI.2)

We introduce an auxiliary variable S, and problem (VI.2) is rewritten as

min_{S,C,G,E} P(C) + (λ_1/2) ||G||_F^2 + λ_2 P*(E) subject to X = XS + G + E, S = C. (VI.3)

This problem can be solved by the alternating direction method of multipliers (ADMM), which minimizes the following augmented Lagrangian function:

L(S, C, G, E, Λ_1, Λ_2) = P(C) + (λ_1/2)||G||_F^2 + λ_2 P*(E) + ⟨Λ_1, X − XS − G − E⟩ + ⟨Λ_2, S − C⟩ + (ν/2)(||X − XS − G − E||_F^2 + ||S − C||_F^2), (VI.4)

where Λ_1 and Λ_2 are Lagrange multipliers and ν is a penalty parameter. Given the variables S_k, C_k, E_k, G_k and Λ_{1,k}, Λ_{2,k} and ν_k, the update rules are as follows:

C_{k+1} = argmin_C P(C) + (ν_k/2)||C − (S_k + Λ_{2,k}/ν_k)||_F^2, (VI.5)

S_{k+1} = argmin_S (ν_k/2)||X − XS − G_k − E_k + Λ_{1,k}/ν_k||_F^2 + (ν_k/2)||S − C_{k+1} + Λ_{2,k}/ν_k||_F^2, (VI.6)

G_{k+1} = argmin_G (λ_1/2)||G||_F^2 + (ν_k/2)||X − XS_{k+1} − G − E_k + Λ_{1,k}/ν_k||_F^2, (VI.7)

E_{k+1} = argmin_E λ_2 P*(E) + (ν_k/2)||X − XS_{k+1} − G_{k+1} − E + Λ_{1,k}/ν_k||_F^2. (VI.8)

Update the Lagrange multipliers as follows:

Λ_{1,k+1} = Λ_{1,k} + ν_k (X − XS_{k+1} − G_{k+1} − E_{k+1}), (VI.9)

Λ_{2,k+1} = Λ_{2,k} + ν_k (S_{k+1} − C_{k+1}). (VI.10)

Update the parameter ν_k as follows:

ν_{k+1} = min{ρ ν_k, ν_max}. (VI.11)

Termination condition:

max{||X − XS_{k+1} − G_{k+1} − E_{k+1}||_∞, ||S_{k+1} − C_{k+1}||_∞} < ε. (VI.12)

The sub-optimization problems above are solved separately, and the specific methods are as follows. The sub-optimization problem (VI.5) can be solved by Theorem IV.1, and the optimal solution is

C_{k+1} = U_k H_τ(Σ_k) V_k^T, (VI.13)

where H_τ(·) is the matrix band restricted thresholding operator and U_k Σ_k V_k^T is the SVD of the matrix S_k + Λ_{2,k}/ν_k. The sub-optimization problems (VI.6) and (VI.7) have closed-form solutions, which can be obtained by Frobenius norm minimization:

S_{k+1} = (X^T X + I)^{−1} (X^T (D_k + Λ_{1,k}/ν_k) + C_{k+1} − Λ_{2,k}/ν_k), (VI.14)

where D_k = X − E_k − G_k, and

G_{k+1} = (ν_k/(λ_1 + ν_k)) (X Z_{k+1} − E_k + Λ_{1,k}/ν_k), (VI.15)

where Z_{k+1} = I − S_{k+1}.
Since the sub-optimization problem (VI.8) is separable across columns, the ith column vector e_{i,k+1} of the optimal solution E_{k+1} is

e_{i,k+1} = argmin_e λ_2 P(e) + (ν_k/2) ||e − (X − XS_{k+1} − G_{k+1} + Λ_{1,k}/ν_k)_i||_2^2, (VI.16)

where (·)_i denotes the ith column and P(·) is the sparse-promoting penalty function corresponding to P*(·). By Lemma III.3, we have

e_{i,k+1} = (sign(e_{i1}) h_τ(B_µ(e_{i1})), ···, sign(e_{iN}) h_τ(B_µ(e_{iN})))^T, (VI.17)

where h_τ is the band restricted thresholding operator induced by the sparse-promoting function p(·). Hence, the algorithm, called LRRSP, for subspace segmentation by low rank representation via the sparse-promoting quasi-rank function is as follows.
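As an illustrative sketch only (not the authors' LRRSP implementation), the update rules (VI.5)–(VI.12) can be written as the following ADMM loop. Soft thresholding stands in for the band restricted thresholding operators, which would be substituted for a concrete sparse-promoting function, and all parameter values (lam1, lam2, nu, rho) are arbitrary placeholders:

```python
import numpy as np

def lrrsp_admm_sketch(X, lam1=10.0, lam2=0.1, nu=1.0, rho=1.1, nu_max=1e6,
                      n_iter=200, tol=1e-6):
    """Schematic ADMM loop for model (VI.3). Entrywise/singular-value soft
    thresholding stands in for the prox operators of P and P*, whose exact
    form depends on the chosen sparse-promoting function."""
    m, N = X.shape
    S = np.zeros((N, N)); C = np.zeros((N, N))
    G = np.zeros((m, N)); E = np.zeros((m, N))
    L1 = np.zeros((m, N)); L2 = np.zeros((N, N))
    I = np.eye(N)
    for _ in range(n_iter):
        # (VI.5) C-update: prox of P at S + L2/nu (soft-thresholded SVD stand-in)
        U, s, Vt = np.linalg.svd(S + L2 / nu, full_matrices=False)
        C = U @ np.diag(np.maximum(s - 1.0 / nu, 0.0)) @ Vt
        # (VI.6) S-update: least squares, closed form
        D = X - E - G
        S = np.linalg.solve(X.T @ X + I,
                            X.T @ (D + L1 / nu) + C - L2 / nu)
        # (VI.7) G-update: ridge shrinkage of the residual
        G = nu / (lam1 + nu) * (X - X @ S - E + L1 / nu)
        # (VI.8) E-update: entrywise soft thresholding stand-in
        T = X - X @ S - G + L1 / nu
        E = np.sign(T) * np.maximum(np.abs(T) - lam2 / nu, 0.0)
        # (VI.9)-(VI.11) multipliers and penalty parameter
        r1 = X - X @ S - G - E
        r2 = S - C
        L1 = L1 + nu * r1
        L2 = L2 + nu * r2
        nu = min(rho * nu, nu_max)
        # (VI.12) termination
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return C, E, G

# Tiny smoke run: low rank data plus a few gross corruptions.
rng = np.random.default_rng(7)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 40))
X[::7, ::11] += 5.0
C, E, G = lrrsp_admm_sketch(X, n_iter=100)
print(C.shape, E.shape)
```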

VII. EXPERIMENTS
In this section we conduct a series of simulations to test the performance of the above algorithm by taking the fraction function p_a(x) = a|x| / (1 + a|x|) as a specific sparse-promoting function. We evaluate the performance of the LRRSP algorithm by the subspace clustering error rate and the cluster evaluation indicators ACC, NMI and ARI. The performance of the LRRSP algorithm is compared with other state-of-the-art subspace clustering algorithms, such as SCC [8], LSA [56], LRR, LRR-H and LRSC, whose implementations are provided by their authors. Experiments are performed on three standard databases: the Extended Yale B face database, the ORL face database and the Hopkins155 motion segmentation database. The experimental environment is the Microsoft Windows 10 operating system on a Lenovo G500 notebook computer with an Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz processor and 4 GB of memory.
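ACC is the clustering accuracy after the best one-to-one matching of predicted to true cluster labels; a common way to compute it (a sketch, not necessarily the evaluation code used here) is via the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(labels_true, labels_pred):
    """ACC: best accuracy over all one-to-one matchings of predicted cluster
    ids to ground-truth ids, found with the Hungarian algorithm."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    k = max(labels_true.max(), labels_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(labels_true, labels_pred):
        cost[t, p] += 1
    row, col = linear_sum_assignment(-cost)     # maximize matched count
    return cost[row, col].sum() / labels_true.size

# Predicted ids are a permutation of the truth with one mistake -> ACC = 5/6.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]
print(clustering_accuracy(y_true, y_pred))
```

NMI and ARI can be computed with sklearn.metrics.normalized_mutual_info_score and sklearn.metrics.adjusted_rand_score, which are invariant to label permutations by construction.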

A. EXPERIMENTS WITH THE FACE DATABASE
In this subsection, the face databases, including the Extended Yale B database and the ORL face database [30], are used to evaluate the performance of our algorithm. Face clustering refers to the problem of clustering a set of face images from multiple individuals according to the identity of each individual. Here, the data matrix X is of dimension m × N, where m is the number of pixels and N is the number of images. According to the Lambertian assumption, the set of all images of an individual with a fixed pose and varying illumination forms a cone in the image space and lies close to a linear subspace of dimension 9 [2]. In practice, a few pixels deviate from the Lambertian model due to cast shadows and specularities, which can be modeled as sparse outlying entries. Therefore, the face clustering problem reduces to clustering a set of images drawn from multiple subspaces and corrupted by gross errors.
The Extended Yale B database contains 38 subjects, each with 64 face images of size 192 × 168 pixels taken under different lighting conditions. To reduce the storage space and computing cost of all algorithms, we down-sample each image to 48 × 42 pixels and then vectorize it into a 2016-dimensional vector as a data sample. Figure 1 shows some sample images of the Extended Yale B face database. Following the experimental settings in [16], the 38 subjects are divided into 4 groups: subjects 1 to 10, 11 to 20, 21 to 30 and 31 to 38. We then apply the clustering algorithms to each trial, i.e., each set of n subjects, and finally take the mean and median of the clustering results of all trials for each n. All choices of n ∈ {2, 3, 5, 8, 10} are considered in the first three groups and all choices of n ∈ {2, 3, 5, 8} in the fourth group. Table 2 shows the mean and median subspace clustering error rates of the different algorithms, including SCC, LSA, LRR, LRR-H, SSC and LRSC. For n = 5, the three indicators ACC, NMI and ARI are shown in Table 3.
The ORL database contains 400 frontal face images from 40 subjects of different ages, genders and races. Each subject has 10 face images of size 92 × 112 pixels with a dark background. The images vary in facial expression, facial accessories and details, such as smiling or not smiling, eyes open or closed, and wearing or not wearing glasses. The pose of the face also changes, with in-depth and in-plane rotations of up to 20 degrees, and the face size varies by up to 10%. Some sample images of the ORL face database are shown in Figure 2. Following the experimental settings in [30], we resize all the images in the ORL database to 56 × 46 and vectorize them into 2576-dimensional vectors as the data samples. We divide the 40 subjects into two groups: 1 to 20 and 21 to 40. For each group, all choices of n ∈ {5, 10, 15, 20} subjects are used in the experiments; the clustering errors are shown in Table 4, and for n = 10 the three indicators ACC, NMI and ARI are shown in Table 5.

B. EXPERIMENTS ON MOTION SEGMENTATION
In this subsection, the Hopkins155 motion segmentation database [47] is used to evaluate the performance of our algorithm. Motion segmentation is the problem of dividing a video sequence of multiple rigidly moving objects into multiple spatio-temporal regions, which correspond to the different motions in the scene. Specifically, a set of N feature points is extracted and tracked through each frame of the video, and the 2F-dimensional vector (F is the total number of frames of the video) obtained by stacking the tracked positions of a feature point corresponds to a feature trajectory. The goal of motion segmentation is to separate these feature trajectories according to the underlying motion of each object. Under the affine projection model, all feature trajectories related to the motion of a single rigid body lie in an affine subspace of R^{2F} of dimension d = 1, 2 or 3; equivalently, they lie in a linear subspace of R^{2F} of dimension at most 4 [46]. Therefore, the feature trajectories of moving objects can be used as data points in subspaces, and segmenting the feature trajectories amounts to clustering the samples by subspace.
The Hopkins155 motion segmentation database is made up of 155 video sequences, of which 120 contain two motions, each sequence having 30 frames with 266 feature trajectories, and 35 contain three motions, each sequence having 29 frames with 398 feature trajectories. Some sample images of the Hopkins155 motion segmentation database are shown in Figure 3. Following the experimental settings in [18], we first apply the principal component analysis algorithm to reduce the dimension of the trajectories from 2F (F is the total number of frames of the video) to 4n (n is the number of subspaces), and then run our algorithm on both the 2F-dimensional and the 4n-dimensional data. The clustering results for the 2F-dimensional data points are shown in Table 6 and for the 4n-dimensional data points in Table 7.
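The PCA projection step can be sketched as follows (an illustration on random data with assumed dimensions, not the code of [18]):

```python
import numpy as np

def pca_project(X, d):
    """Project the columns of X (2F-dimensional feature trajectories) onto
    their top-d principal directions, as in the 4n-dimensional experiments."""
    Xc = X - X.mean(axis=1, keepdims=True)     # center the trajectories
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d].T @ Xc

F, N, n = 30, 100, 2                 # 30 frames, 100 trajectories, 2 motions
X = np.random.default_rng(5).standard_normal((2 * F, N))
Xr = pca_project(X, 4 * n)
print(Xr.shape)                      # (8, 100)
```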
The above experimental results show that our algorithm achieves a lower clustering error rate and higher values of the evaluation indicators ACC, NMI and ARI than the other algorithms on the different databases.

VIII. CONCLUSIONS
In this paper, we propose a general optimization formulation for subspace segmentation by low rank representation via the sparse-promoting quasi-rank function. Our key contribution is to show that any sparse-promoting quasi-rank function as the measure of low rank in subspace clustering is reasonable, i.e., the optimal solution to our optimization formulation forms a block-diagonal matrix if the clean data are drawn from independent linear subspaces. In addition, the alternating direction method of multipliers (ADMM) is applied to solving the optimization problem when the data are corrupted by Gaussian noise and gross errors. Finally, a series of simulations on different databases is conducted.
JIGEN PENG received the B.S. degree in mathematics from Jiangxi University, Jiangxi, China, in 1989, and the M.Sc. and Ph.D. degrees in applied mathematics and computing mathematics from Xi'an Jiaotong University, Xi'an, China, in 1992 and 1998, respectively. He is currently a professor with the School of Mathematics and Information Science, Guangzhou University, Guangzhou, China. His current research interests include nonlinear functional analysis and applications, data set matching theory, machine learning theory, and sparse information processing.