A Randomized Sparsity Preserving Projection and its Exponential Approach for Image Recognition

To preserve the sparse representation of the original high-dimensional data, sparsity preserving projection (SPP) was proposed, which involves solving a series of $\ell_1$ minimization problems and a large-scale generalized eigenvalue problem. Compared to many other dimensionality reduction techniques, SPP does not contain any model parameters. Besides, it may have some discriminative ability even though no class labels are provided. However, SPP becomes inefficient on large-scale high-dimensional data sets, since solving the $\ell_1$ minimization problems is quite time-consuming. To address this problem, this paper proposes a randomized SPP (RSPP) via random sketching, in which the $\ell_1$ minimization problems can be solved efficiently. Furthermore, a preprocessed approach and an exponential approach are presented which, with a little extra work, handle the small-sample-size problem that may occur in RSPP. Finally, experiments on image recognition over four real-world data sets are conducted, and the corresponding results demonstrate the superiority of the proposed methods.


I. INTRODUCTION
The ability to produce and collect data has improved unprecedentedly with the development of society and the progress of science and technology; therefore, in many application fields the original data usually has high dimensions. To deal with the so-called ''curse of dimensionality'' problem, one should reduce the dimensionality while keeping valuable information preserved in the low-dimensional reduced subspace. In this paper, I consider sparsity preserving projection (SPP) [1], which aims to preserve the sparse reconstructive relationship of the raw data based on sparse representation, that is, by solving a series of $\ell_1$ minimization problems mathematically. Since the sparse representation approach does not need any parameters, SPP tends to be more objective and convenient to use in practice compared to many other dimensionality reduction (DR) methods. Moreover, SPP may contain some natural discriminating information even if no class labels are provided, although it is an unsupervised DR method.
The popular DR methods up to now can be roughly divided into three categories based on linearity. The first class is the linear DR methods, which are designed to preserve the global structures of the original data. To date, principal component analysis (PCA) [2] and linear discriminant analysis (LDA) [3] are still the two most classical and popular representatives: the former is unsupervised and aims to maximize the variance of the low-dimensional representation in the reduced subspace, which is mathematically equivalent to the singular value decomposition (SVD) of the standardized data, while the latter is supervised and aims to maximize the inter-class distance while minimizing the intra-class distance in the reduced subspace. Compared to these linear DR methods, SPP has the ability to uncover manifold structures, since real-world data is usually nonlinear and lies on a low-dimensional manifold [4].
The second class is the nonlinear DR methods, which are designed to discover the local manifold structures of the raw data. Isometric mapping (ISOMAP) [5] and Laplacian eigenmaps (LE) [6] are two typical representatives, which have shown outstanding performance in discovering nonlinear structures in classification applications. However, they may require special tricks to handle the so-called ''out-of-sample'' problem that occurs in recognition applications. Compared to these nonlinear DR methods, SPP can handle recognition problems easily since it offers a linear transformation matrix along with the low-dimensional representation.
The third class is the local linearized DR methods, which can not only uncover the nonlinear manifold structures but also be applied to recognition applications, represented by locality preserving projection (LPP) [7], [8] and neighborhood preserving embedding (NPE) [9], [10]. However, they are highly sensitive to their model parameters, while these parameters are usually quite difficult to determine beforehand. Compared to these linearized DR methods, SPP is a parameter-free method which does not contain any model parameters.
However, one should admit that SPP and its extensions [11], [12], [13], [14], [15], [16], [17] suffer from the expensive computational cost of solving a series of $\ell_1$ minimization problems, especially on large-scale high-dimensional data sets. It can be noticed that the major part of these $\ell_1$ minimization problems is the linear sparse reconstructive constraints, where the coefficient matrix, namely the data matrix, is kept unchanged while the right-hand vector takes every sample successively. Therefore, if these linear constraints can be treated skillfully and cheaply, one may sharply reduce the computational cost of solving many $\ell_1$ minimization problems, which would in turn promote the applications of SPP and its extensions.
To the best of my knowledge, preconditioning is the most common approach to deal with a series of linear constraints where the coefficient matrix is fixed. More precisely, one computes a certain matrix decomposition of the coefficient matrix, such as a QR decomposition, an LU decomposition or an SVD. Unfortunately, these decompositions are also too expensive to obtain when the scale of the data matrix is large. Fortunately, the data matrix in real-world problems is usually sparse, which implies that it may admit an approximate low-rank matrix factorization. Recently, randomized (or stochastic) algorithms [18], [19], [20] based on random sketching (or sampling) have become the most widely used approach to compute an approximate decomposition of a low-rank matrix. The first contribution of this paper is to apply the random sketching technique to SPP and propose a randomized SPP (RSPP). In fact, the random sketching approach can not only reduce the computational cost sharply, but also yield a better sparse reconstructive weight matrix: in many practical problems the data is usually contaminated by noise, and the noise can be interpreted mathematically as very small singular values which are eliminated automatically by randomized algorithms. Another drawback of SPP is the small-sample-size (SSS) problem. The mathematical model of SPP becomes ill-posed when the number of rows of the data matrix is larger than the number of columns, just as in many other linearized DR methods. Many great efforts have been made to handle the SSS problem in the past few decades, including preprocessed methods [1], [3], [7], [9], kernel methods [21], [22], [23], two-dimensional methods [24], [25], [26] and exponential methods [27], [28], [29], [30], [31], [32], [33]. It can be observed that the proposed RSPP may still suffer from the SSS problem. Since an approximate decomposition of the data matrix has already been obtained, one can immediately apply the preprocessed approach or the exponential approach to overcome the SSS problem with a little extra work, which is the second contribution of this paper (see Figure 1).
The rest of this paper is organized as follows. In Section II, I briefly review the sparse representation models and the random sketching technique. In Section III, I propose a randomized sparsity preserving projection (RSPP) and two solutions to RSPP to overcome the SSS problem. In Section IV, I conduct some image recognition experiments on four real-world data sets. Finally, in Section V, I give some conclusions and future works.

II. PRELIMINARY
In this section, I briefly review the sparse representation models used to capture the sparse structures of the raw data and the random sketching technique used to compute an approximate low-rank decomposition of the data matrix.

A. SPARSE REPRESENTATION
In most existing DR methods, the local structures are typically modeled by a specific adjacency weight matrix. To construct a desirable weight matrix, Qiao et al. [1] suggest encoding the sparse structures straightforwardly, since sparse features have been found to be a common characteristic in most linearized DR methods.
Denote by x_i ∈ R^m the i-th sample vector and set X = [x_1, x_2, ..., x_n] ∈ R^{m×n}. SPP suggests reconstructing each sample x_i using as few other samples as possible. To reach this goal, a series of $\ell_1$ minimization problems called standard mode (SM),

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad x_i = X s_i, \; \mathbf{1}^T s_i = 1, \tag{1}$$

should be solved for each sample x_i, where s_i = [s_{i1}, ..., s_{i,i-1}, 0, s_{i,i+1}, ..., s_{in}]^T is the sparse reconstructive weight vector of x_i, ∥·∥_1 denotes the $\ell_1$ norm of a vector, 1 ∈ R^n denotes a column vector of all ones, and the superscript T denotes the transpose of the corresponding vector or matrix. Once each s_i has been obtained, the weight matrix of SPP can be given as S = [s_1, s_2, ..., s_n]^T. Qiao et al. [1] pointed out that the obtained weight matrix S has some natural discriminative abilities even though no class labels are provided.
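To make this construction concrete, the following is a minimal sketch of solving SM (1) for a single sample as a linear program via the standard split s = u − v with u, v ≥ 0. The function name spp_weights_sm is my own, and the use of scipy.optimize.linprog is an illustrative choice rather than the solver used in the paper; the same routine also solves the reduced problem (6) of Section III if M is passed in place of X.

```python
import numpy as np
from scipy.optimize import linprog

def spp_weights_sm(X, i):
    """Solve min ||s||_1 s.t. x_i = X s, 1^T s = 1, s_i = 0, i.e. SM (1),
    as a linear program via the split s = u - v with u, v >= 0."""
    m, n = X.shape
    x_i = X[:, i]
    D = np.delete(X, i, axis=1)             # drop column i so that s_i = 0
    ones = np.ones(n - 1)
    A_eq = np.vstack([np.hstack([D, -D]),         # D (u - v) = x_i
                      np.hstack([ones, -ones])])  # 1^T (u - v) = 1
    b_eq = np.append(x_i, 1.0)
    c = np.ones(2 * (n - 1))                # sum(u) + sum(v) = ||s||_1
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))  # assumes the LP is feasible
    s = res.x[:n - 1] - res.x[n - 1:]
    return np.insert(s, i, 0.0)             # restore the zero at position i
```

Note that (1) is only feasible when x_i lies in the affine span of the remaining samples; in practice one of the relaxed modes introduced next is solved instead.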
In many practical problems, the data is usually contaminated by noise. Therefore, the constraint x_i = X s_i does not always hold exactly. Three types of extensions have been introduced to fix this problem. The first type, called relaxed mode (RM), is defined as

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad \|x_i - X s_i\|_2 \le \varepsilon, \; \mathbf{1}^T s_i = 1, \tag{2}$$

where ∥·∥_2 denotes the $\ell_2$ norm of a vector or matrix and ε > 0 is a given error tolerance. The second type, called extended mode (EM), is expressed as

$$\min_{s_i, t_i} \left\| \begin{bmatrix} s_i \\ t_i \end{bmatrix} \right\|_1 \quad \text{s.t.} \quad x_i = X s_i + t_i, \; \mathbf{1}^T s_i = 1, \tag{3}$$

where t_i = x_i − X s_i ∈ R^m denotes the reconstruction error. Finally, the third type, called dual mode (DM), is

$$\min_{s_i} \|x_i - X s_i\|_2 \quad \text{s.t.} \quad \|s_i\|_1 \le \kappa, \tag{4}$$

where κ > 0 is a given upper bound.

B. RANDOM SKETCHING
Randomized algorithms usually contain two steps to obtain the desired approximate low-rank matrix decomposition of a given matrix. In the first step, one uses random sketching to determine a low-dimensional subspace which captures most of the range of the given matrix. In the second step, one projects the given matrix onto this subspace, and the projected matrix is then processed deterministically to get an approximate low-rank matrix decomposition. Specifically, for the data matrix X, I need to compute the matrix product

$$Z = X \Omega,$$

where Ω ∈ R^{n×(k+p)} is the random sketching matrix, k is the target rank of X, which is assumed to be much smaller than both m and n, and p is the oversampling parameter, which is usually a small positive integer given in advance.
To capture most of the action of X, I then seek to construct a matrix Q ∈ R^{m×(k+p)} whose columns form an orthonormal basis for the range of Z, which implies that X ≈ QQ^T X. Denoting M = Q^T X ∈ R^{(k+p)×n}, I immediately obtain an approximate low-rank matrix decomposition of X, namely, X ≈ QM. If a specific matrix decomposition of M is then computed, I get the desired approximate decomposition of X. Taking the SVD for example, assume that M = Ũ Σ̃ Ṽ^T, where Ũ ∈ R^{(k+p)×(k+p)} is orthogonal, Ṽ ∈ R^{n×(k+p)} has orthonormal columns, and Σ̃ ∈ R^{(k+p)×(k+p)} is diagonal, its diagonal elements being the singular values of M sorted in descending order. Then X ≈ (QŨ) Σ̃ Ṽ^T is an approximate SVD of X. To measure the approximation between X and QQ^T X, Halko et al. [18] established the following result: choose a target rank k, an oversampling parameter p and a standard Gaussian matrix Ω; then

$$\mathbb{E}\,\|X - QQ^T X\|_2 \le \eta\, \sigma_{k+1}(X), \tag{5}$$

where E denotes the expectation with respect to the random matrix Ω, η > 1 is a constant depending on k, p, m and n, and σ_{k+1}(X) is the (k+1)-th largest singular value of X.
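As a toy illustration of this two-step scheme for the fixed-rank problem, the following numpy sketch computes an approximate SVD of X. It follows the notation above (Ω, Q, M, k, p), but it is an assumed implementation for illustration, not the exact routine of [18].

```python
import numpy as np

def randomized_svd(X, k, p=10):
    """Two-step randomized approximate SVD of X (fixed-rank problem).
    Step 1: sketch the range of X; Step 2: decompose the small matrix M."""
    m, n = X.shape
    Omega = np.random.randn(n, k + p)   # standard Gaussian sketching matrix
    Z = X @ Omega                       # samples the range of X
    Q, _ = np.linalg.qr(Z)              # orthonormal basis, m x (k+p)
    M = Q.T @ X                         # projected matrix, (k+p) x n
    U_tilde, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    return Q @ U_tilde, sigma, Vt       # X ≈ (Q U_tilde) diag(sigma) Vt
```

When the numerical rank of X is roughly k, the dominant cost is the two products XΩ and Q^T X, instead of the far more expensive full SVD of the m × n matrix X.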
It can be noticed that the error ∥X − QQ^T X∥_2 is controlled very well in expectation if σ_{k+1}(X) is small enough. Therefore, k is also called the numerical rank of X. In some cases, k is given beforehand, while in others it is part of the problem to determine k such that the error is small enough, that is,

$$\|X - QQ^T X\|_2 \le \epsilon,$$

where ϵ > 0 is a given tolerance. The resulting problems are called the fixed-rank problem and the fixed-precision problem when the numerical rank k and the error tolerance ϵ are given beforehand, respectively. Detailed randomized algorithms to generate Q for these two problems can be found in [18] and the references therein.

III. RANDOMIZED SPP AND ITS SOLUTIONS
In this section, I first propose a randomized sparsity preserving projection, called RSPP, based on the random sketching technique, and then present a preprocessed approach and an exponential approach to deal with the SSS problem that occurs in RSPP.

A. RANDOMIZED SPP
Inspired by the adaptive range finder proposed in [18] to solve the fixed-precision problem, where the numerical rank k is assumed to be unknown in advance, and by the energy ratio rule used in PCA [2] and in many PCA-preprocessed DR methods [1], [3], [7], [9], I directly give the following randomized algorithm (Algorithm 1) to generate the matrix Q, which is expected to capture most of the action of the data matrix X.
In Algorithm 1, the subscript F denotes the Frobenius norm of a matrix, and the notation [Q, V] denotes the matrix spliced from Q and V. It can be noticed that

$$\|X\|_F^2 = \mathrm{tr}(X^T X) = \sum_i \sigma_i^2(X),$$

where tr(X^T X) denotes the trace of X^T X. Therefore, the scalar γ represents the sum of squares of all singular values of X, and the scalar s represents the sum of squares of all singular values of M, or equivalently of QM. Since QM is the approximate reconstruction of X, the stopping criterion s > ργ roughly realizes the energy ratio rule. Moreover, the block size r is a small positive integer which balances the computational cost and the reliability. Obviously, the reduced dimension k, that is, the number of columns of Q, depends on the energy ratio ρ. I should point out that the reduced dimension k described here and the target rank k mentioned above are obtained in two different ways, although they may have some interconnections. Under the assumption that the data matrix X in a real-world problem is usually low-rank, the reduced dimension k is usually much smaller than both m and n.
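The following numpy sketch mirrors the description of Algorithm 1 given here, with the block Gaussian sampling, the re-orthogonalization of each block against the current Q, and the energy-ratio stopping rule s > ργ; the incremental update of s is my reading of the splicing step, so treat it as a sketch rather than the reference implementation.

```python
import numpy as np

def adaptive_range_finder(X, rho=0.95, r=10):
    """Sketch of Algorithm 1: grow Q in blocks of r Gaussian samples
    until the captured energy exceeds rho * ||X||_F^2."""
    m, n = X.shape
    gamma = np.linalg.norm(X, 'fro') ** 2   # total energy of X
    Q = np.empty((m, 0))
    s = 0.0
    while s <= rho * gamma:
        Omega = np.random.randn(n, r)       # r fresh Gaussian sample vectors
        B = X @ Omega
        B = B - Q @ (Q.T @ B)               # project out the current range
        V, _ = np.linalg.qr(B)              # orthonormal basis of the new block
        Q = np.hstack([Q, V])               # splice [Q, V]
        s += np.linalg.norm(V.T @ X, 'fro') ** 2  # energy captured so far
    M = Q.T @ X
    return Q, M                             # X ≈ Q M, with k = Q.shape[1]
```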
Once the orthonormal matrix Q and the reduced matrix M are obtained via Algorithm 1, the $\ell_1$ minimization problem (1) can be transformed approximately into the following $\ell_1$ minimization problem, also called standard mode (SM),

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad m_i = M s_i, \; \mathbf{1}^T s_i = 1, \tag{6}$$

where m_i ∈ R^k is the i-th column of M. Since k is much smaller than m, the $\ell_1$ minimization problem (6) can be solved efficiently, which sharply reduces the computational cost of generating the affinity weight matrix S. Moreover, it can be noticed that the constraint m_i = Ms_i in (6) holds if the constraint x_i = Xs_i in (1) holds; however, the converse is not true. Despite this, I can assert that the constraint x_i = Xs_i would hold approximately, that is, x_i ≈ Xs_i, if the constraint m_i = Ms_i holds, according to the error estimation (5) and the fact that

$$x_i - X s_i = X(e_i - s_i) \approx QQ^T X (e_i - s_i) = Q(m_i - M s_i),$$

where e_i ∈ R^n is the i-th column of the n-order identity matrix I_n. Recalling the assumption that the data in many practical problems is usually contaminated by noise, which implies that the constraint can only hold approximately, that is, x_i ≈ Xs_i, I suggest here replacing it by m_i = Ms_i, differing from the three types of extensions described briefly in (2), (3) and (4). In fact, the noise can be interpreted mathematically as very small singular values of X, which are eliminated automatically by Algorithm 1. Therefore, the minimization problem (6) provides a chance to get a better affinity weight matrix S for SPP.
In fact, I can also present three types of extensions of (6), just as RM (2), EM (3) and DM (4) extend SM (1). The only change needed is to replace X and x_i by M and m_i, respectively. The performances of these four modes are examined in detail in Section IV.
Once the sparse representation s_i has been computed by solving the $\ell_1$ minimization problem (6) for each projected sample m_i, the adjacent weight matrix S can be written in the form S = [s_1, s_2, ..., s_n]^T. In order to preserve the sparse structures encoded in S, I impose the minimization problem

$$\min_{w} \sum_{i=1}^{n} \left\| w^T x_i - w^T X s_i \right\|_2^2,$$

which, with the additional constraint w^T X X^T w = 1, can be immediately transformed into the following generalized eigenvalue problem (GEP)

$$X G X^T w = \lambda X X^T w, \tag{7}$$

where G = S + S^T − S^T S. Denote the dimension of the reduced subspace by d; then the linear transformation matrix can be given as W = [w_1, w_2, ..., w_d], where w_i ∈ R^m is an eigenvector corresponding to the i-th largest eigenvalue of the GEP (7), and the low-dimensional representation matrix can be computed as Y = W^T X. Finally, a numerical algorithm for RSPP is summarized in Algorithm 2.
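A small sketch of the eigen-step of Algorithm 2 in numpy/scipy, assuming X X^T is nonsingular so that the GEP (7) is well posed; the choice of scipy.linalg.eigh as the generalized symmetric eigensolver is mine, not prescribed by the paper.

```python
import numpy as np
from scipy.linalg import eigh

def rspp_projection(X, S, d):
    """Eigen-step of Algorithm 2: solve the GEP (7) X G X^T w = lambda X X^T w
    and keep the eigenvectors of the d largest eigenvalues."""
    G = S + S.T - S.T @ S                # symmetric by construction
    A = X @ G @ X.T
    B = X @ X.T                          # must be nonsingular (no SSS problem)
    vals, vecs = eigh(A, B)              # ascending generalized eigenvalues
    W = vecs[:, ::-1][:, :d]             # d largest eigenpairs
    return W, W.T @ X                    # transformation W and representation Y
```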
It can be found that Algorithm 2 may still suffer from the SSS problem, since the GEP (7) becomes ill-posed if m > n.
In the rest of this section, I discuss how to deal with the SSS problem skillfully and cheaply once the approximation X ≈ QM has already been obtained.

B. PREPROCESSED RSPP
Taking PCA as a preprocessor before solving the GEP (7) may be the most direct and simplest way to handle this problem. In fact, PCA is mathematically equivalent to the SVD, and an approximate SVD can be obtained from X ≈ QM. Suppose that the approximate SVD of X is given as

$$X \approx U \tilde{\Sigma} \tilde{V}^T \quad \text{with} \quad U = Q\tilde{U},$$

where Σ̃ ∈ R^{k×k} is diagonal and its diagonal elements are the singular values of M, which are also regarded as approximations of those of X, sorted in descending order. Denote X̄ = U^T X; then the preprocessed RSPP can be solved by computing the eigenvectors {z_i}_{i=1}^d which correspond to the d largest eigenvalues of

$$\bar{X} G \bar{X}^T z = \lambda \bar{X} \bar{X}^T z. \tag{8}$$

Finally, I can obtain the transformation matrix W = U[z_1, z_2, ..., z_d]. It can be noticed that X̄ = U^T X = Ũ^T M and that Ũ is orthogonal; thus the GEP (8) is equivalent to the following GEP

$$M G M^T \tilde{z} = \lambda M M^T \tilde{z}, \tag{9}$$

where z̃ = Ũ z. Therefore, I can get the transformation matrix W = QZ with Z = [z̃_1, z̃_2, ..., z̃_d] and the reduced representation matrix Y = Z^T M. In fact, the GEP (9) can also be obtained from the minimization problem

$$\min_{z} \sum_{i=1}^{n} \left\| z^T m_i - z^T M s_i \right\|_2^2,$$

which is reasonable in some sense, since the weight matrix S actually computed from problem (6) encodes the sparse structures of M. I should point out that the preprocessor described here is an approximation of the PCA preprocessor. However, it outperforms the PCA preprocessor at least in terms of computing time, since PCA is quite time-consuming when both m and n are large, whereas the proposed preprocessor based on the random sketching technique performs very well as long as the assumption that the data matrix X is low-rank holds. Finally, a numerical algorithm for the preprocessed RSPP is summarized in Algorithm 3.
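The corresponding eigen-step of PRSPP only involves the k × k pair (M G M^T, M M^T) of the GEP (9), which is why it remains cheap when k ≪ m; here is a minimal sketch under the same assumptions as the previous one.

```python
import numpy as np
from scipy.linalg import eigh

def prspp_projection(Q, M, S, d):
    """Eigen-step of Algorithm 3: solve the small GEP (9)
    M G M^T z = lambda M M^T z, then map back with W = Q Z."""
    G = S + S.T - S.T @ S
    vals, vecs = eigh(M @ G @ M.T, M @ M.T)   # k x k pair, cheap since k << m
    Z = vecs[:, ::-1][:, :d]                  # d largest eigenpairs
    return Q @ Z, Z.T @ M                     # W = QZ and Y = Z^T M
```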

C. EXPONENTIAL RSPP
It can be confirmed that the GEP (9) is well-posed if M has full row rank and its smallest singular value is bounded away from zero. However, there is no obvious evidence that M has full row rank, since it is simply defined by M = Q^T X, although the SSS problem never occurred in the experiments conducted in Section IV. To be on the safe side, in this subsection I present an exponential approach to RSPP, motivated by ESPP [32], which has been shown to handle the SSS problem and to outperform SPP in recognition accuracy.
According to the GEP (9), I formulate the model of exponential RSPP as the following standard eigenvalue problem (SEP)

$$\bigl(\exp(\tau M G M^T)\bigr)^{-1} \exp(\tau M M^T)\, z = \lambda z, \tag{10}$$

where τ > 0 is a proper scaling parameter chosen to avoid overflow or annihilation in the matrix exponential calculation. Let z_i be the eigenvector corresponding to the i-th smallest eigenvalue of the SEP (10); then I can get the transformation matrix W = QZ and the representation matrix Y = Z^T M with Z = [z_1, z_2, ..., z_d]. Denote the spectral decompositions of MM^T and MGM^T by

$$M M^T = U_A \Lambda_A U_A^T, \qquad M G M^T = U_B \Lambda_B U_B^T, \tag{11}$$

respectively, where U_A, U_B are both orthogonal and Λ_A, Λ_B are both diagonal; then

$$\exp(\tau M M^T) = U_A \exp(\tau \Lambda_A) U_A^T, \qquad \exp(\tau M G M^T) = U_B \exp(\tau \Lambda_B) U_B^T. \tag{12}$$

Moreover, the scaling parameter τ can be chosen as

$$\tau = \frac{1}{\max(\lambda_{A1}, \lambda_{B1})}, \tag{13}$$

as suggested by ESPP [32], where λ_{A1} and λ_{B1} are the biggest diagonal entries of Λ_A and Λ_B, respectively. Finally, a numerical algorithm for the exponential RSPP is summarized in Algorithm 4.
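A sketch of the exponential step, under the SEP form (10) reconstructed above together with the spectral formulas (11)-(12) and the scaling rule (13); forming exp(τ MGM^T)^{-1} exp(τ MM^T) explicitly is an illustrative choice that is adequate for small k, not necessarily the numerical scheme of [32].

```python
import numpy as np
from scipy.linalg import eigh, solve

def erspp_projection(Q, M, S, d, tau=None):
    """Eigen-step of Algorithm 4 under the SEP form (10) assumed above."""
    G = S + S.T - S.T @ S
    A = M @ M.T                          # k x k
    B = M @ G @ M.T                      # k x k
    lamA, UA = eigh(A)                   # spectral decompositions (11)
    lamB, UB = eigh(B)
    if tau is None:
        tau = 1.0 / max(lamA.max(), lamB.max())   # scaling rule (13)
    expA = (UA * np.exp(tau * lamA)) @ UA.T       # exp(tau M M^T), cf. (12)
    expB = (UB * np.exp(tau * lamB)) @ UB.T       # exp(tau M G M^T)
    vals, vecs = np.linalg.eig(solve(expB, expA)) # SEP (10)
    idx = np.argsort(vals.real)[:d]               # d smallest eigenvalues
    Z = vecs[:, idx].real
    return Q @ Z, Z.T @ M                         # W = QZ and Y = Z^T M
```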

IV. EXPERIMENTAL RESULTS
In this section, I implement some experiments on image recognition to illustrate the superiority of the proposed algorithms, that is, Algorithms 3 and 4, on some real-world image data sets. All experiments are conducted in MATLAB R2019b on an Intel Core 2.71GHz PC with 16 GB memory under Windows 10.
In each example, I select l objects randomly from each of the c classes to form the training data matrix X_1 ∈ R^{m×n}, where n = lc, and use the rest of the images to form the testing data matrix X_2. I then use the training data matrix X_1 to obtain the transformation matrix W and the corresponding reduced matrix Y_1 = W^T X_1 by a certain method, and project the testing data matrix X_2 into the subspace spanned by W to obtain the corresponding reduced matrix Y_2. Finally, I perform image recognition between Y_1 and Y_2 using a nearest neighbor classifier [34]. To obtain credible numerical results, every experiment is repeated 10 times with random splits and the mean results are reported, in the spirit of cross validation. It should first be pointed out that PRSPP-SM does not encounter the SSS problem in any experiment. I then show, in Table 1, the CPU time t_1 in seconds for computing the corresponding weight matrix and the CPU time t_2 in seconds for the whole image recognition process of a certain algorithm (so t_2 contains t_1), where the symbol ''#'' means the corresponding experiment did not finish within 24 hours. From Table 1, one can observe that t_1 occupies almost all (more than 99%, in fact) of t_2 in all experiments, which implies that the work of reducing the computational cost of finding the weight matrix of SPP is quite meaningful and necessary. From the view of varying m, one can find that the CPU times become larger and larger as the dimension m increases, where SPP-EM has the largest growth followed by SPP-SM, SPP-DM and SPP-RM, and the proposed PRSPP-SM has the smallest growth. Moreover, PRSPP-SM consumes the least time, less than one second even when m = 10,000. Of course, this is partially due to the fact that n is relatively small in this example. However, a smaller CPU time for computing the weight matrix does not imply a better image recognition performance; one should also take the recognition accuracy rates into consideration. To this end, I give in Table 2 the recognition accuracy rates [34] with respect to the reduced dimension d, which is set to the dimension k determined by Algorithm 1.
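The evaluation pipeline just described (project both data matrices with W, then classify each test sample by its nearest training neighbor [34]) can be sketched as follows; recognition_accuracy and the label vectors y_train, y_test are hypothetical names for illustration.

```python
import numpy as np

def recognition_accuracy(W, X_train, y_train, X_test, y_test):
    """Project both sets with W, then classify each test sample by its
    nearest training neighbor in the reduced subspace."""
    Y1 = W.T @ X_train                      # d x n_train
    Y2 = W.T @ X_test                       # d x n_test
    # Squared Euclidean distances between every test/train pair.
    d2 = ((Y2[:, :, None] - Y1[:, None, :]) ** 2).sum(axis=0)
    pred = y_train[np.argmin(d2, axis=1)]   # label of the nearest neighbor
    return np.mean(pred == y_test)
```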
It can be observed from Table 2 that PRSPP-SM achieves the largest accuracy rate for each m. Based on the observations from Tables 1 and 2, I may conclude that PRSPP-SM can not only reduce the computational cost but also obtain a better projected subspace for the Yale database. In the following examples I only give the performances of SPP-RM and SPP-DM for comparison, since both of them are highly competitive with the others in both CPU time and recognition accuracy. It can be confirmed that PRSPP-SM, PRSPP-RM, PRSPP-EM and PRSPP-DM still do not encounter the SSS problem in any experiment. I then give the CPU times t_2 and the accuracy rates of image recognition in Tables 3 and 4, respectively.
It can be found from Table 3 that both PRSPP-EM and PRSPP-DM consume almost the same CPU time as PRSPP-SM, the former a bit more and the latter a bit less, and all three consume much less than both SPP-RM and SPP-DM. However, PRSPP-RM consumes the most, even more than both SPP-RM and SPP-DM. From Table 4, one can observe that PRSPP-RM, PRSPP-EM and PRSPP-DM achieve higher accuracy rates than PRSPP-SM in most cases, while most of them outperform both SPP-RM and SPP-DM, especially for larger l. Moreover, both the CPU times and the accuracy rates of all algorithms grow as l increases, as seen from Tables 3 and 4. Based on the observations above, I will not report the numerical behavior of PRSPP-RM in the following examples any more.
From the numerical results of Examples 4.1 and 4.2, one can find that PRSPP outperforms SPP/PCA in both CPU time and accuracy rate no matter which sparse representation mode is taken. The greatest advantage of PRSPP is that it computes the affinity weight matrix S efficiently by using the random sketching technique. In fact, PRSPP can be seen as an approximate version of SPP/PCA, since the preprocessor in PRSPP is an approximation of the PCA preprocessor. However, the computational cost of obtaining an approximate SVD of X is much less than that of PCA, since the former deals with a problem of size k × n while the latter deals with one of size m × n.

B. PERFORMANCES WITH DIFFERENT ENERGY RATIO
In this subsection, I compare the performances of SPP/PCA and PRSPP with different energy ratios ρ on two real-world image databases. It can be confirmed that PRSPP-SM, PRSPP-EM and PRSPP-DM still do not encounter the SSS problem in any experiment. I then show the CPU times t_2 and the accuracy rates of image recognition in Table 5 and in Figures 6 and 7, respectively.
From Table 5, one can observe that the CPU times of PRSPP-SM, PRSPP-EM and PRSPP-DM grow as the energy ratio ρ increases, and that PRSPP-DM consumes the least among them. Moreover, all of them are much less than those of SPP-RM on the MNIST database and those of both SPP-RM and SPP-DM on the COIL20 database.
From Figures 6 and 7, one can find that PRSPP-SM, PRSPP-EM and PRSPP-DM all achieve high recognition accuracy rates, much higher than those of both SPP-RM and SPP-DM for all reduced dimensions d, especially for the smaller ones. Furthermore, on both databases PRSPP-DM achieves slightly higher accuracy rates than both PRSPP-SM and PRSPP-EM, while SPP-DM achieves significantly higher accuracy rates than SPP-RM.
From the numerical results of Example 4.3, one can find that PRSPP outperforms SPP/PCA in both CPU time and accuracy rate in all cases with different energy ratios ρ. Of course, the performance of PRSPP is heavily dependent on the choice of ρ, and the best choice of ρ surely depends on the image database. To be on the safe side, one can take ρ as 0.95 or 0.98 when facing a new database, just as is commonly done for the PCA preprocessor. Moreover, I only give the performances of SPP-DM and PRSPP-DM in the next example, since they seem to be the most competitive ones in most cases.

C. PERFORMANCES OF THE EXPONENTIAL APPROACH
In this subsection, I compare the performances of SPP/PCA, ESPP, PRSPP and ERSPP on four real-world image databases.
Example 4.4: In this example, I consider the four image databases mentioned above, that is, Yale, ORL, MNIST and COIL20. Their main information as used here is listed in Table 6, where n_test denotes the number of testing samples.
To show the performances of ERSPP (Algorithm 4), I conduct four experiments with only one sparse representation mode, DM, and the four different energy ratios ρ shown in Table 6 on the four image databases. For comparison, I also give the performances of SPP-DM, ESPP-DM and PRSPP-DM, where SPP-DM keeps 98% energy in the PCA preprocessing step, ESPP-DM only adopts the asymmetric algorithm [32], and PRSPP-DM shares the same ρ with ERSPP-DM. The block size r in Algorithm 1 is set to 5, and the reduced dimension d of both ERSPP-DM and PRSPP-DM is set to the dimension k determined by Algorithm 1. The other parameters are kept the same as in Example 4.1. I give the CPU times t_2 and the accuracy rates of image recognition on each database in Table 7 and in Figures 8, 9, 10 and 11. From Table 7, one can observe that the exponential approaches of both SPP-DM and PRSPP-DM only pay a bit of extra CPU time.
From Table 7 and Figures 8, 9, 10 and 11, the exponential approach of RSPP (ERSPP) can not only overcome the SSS problem on the safe side, even though this problem does not happen in any of my examples, but also obtain higher accuracy rates than PRSPP with just a bit of extra work.
From all the numerical results of the experiments in this section, I may conclude that RSPP and its two solutions perform better than both SPP and ESPP, especially in terms of CPU time when both m and n are large.

V. CONCLUSION
In this paper, I propose a randomized sparsity preserving projection (RSPP) to reduce the computational cost of evaluating the affinity weight matrix in SPP, along with two efficient solutions to RSPP to overcome the SSS problem that occurs in it. The two main contributions of my work can be summarized as follows. 1) I derive a randomized dimensionality reduction method, RSPP, based on the random sketching technique, which obtains an approximate low-rank matrix decomposition of the data matrix and then solves the large-scale $\ell_1$ minimization problems efficiently to obtain the affinity weight matrix. 2) I design two effective solutions to RSPP, PRSPP and ERSPP, which deal with the SSS problem by reusing the already obtained approximate matrix decomposition with a little extra work. The experimental results show that both of them can not only reduce the computational cost sharply but also reach higher image recognition accuracy rates.
The ideas of PRSPP and ERSPP can be applied to the extensions of SPP and to other linearized DR methods along with their extensions, to obtain better recognition accuracies on much larger and more complicated real-world data sets. This is one of my future works. I should also point out that only the Gaussian random sketching matrix has been considered in this paper, since it is easy to implement and has an elegant error estimation. However, a Gaussian matrix will not preserve certain structures or properties of the raw data. Taking sparsity for example, the projected data matrix M will not be sparse any more even if the original data matrix X is. In that case, one should consider some other structured random sketching matrix, such as the subsampled randomized Hadamard transform (SRHT) [35] or the subsampled randomized Fourier transform (SRFT) [36]. This is another one of my future works.

Algorithm 1 Adaptive Range Finder With Fixed Energy Ratio
Input: Data matrix X, energy ratio ρ and block size r.
Output: Orthonormal matrix Q with k columns and reduced matrix M such that X ≈ QM.
1. Compute γ = ∥X∥²_F, and set Q = [ ], M = [ ] and s = 0;
2. Generate a standard Gaussian random matrix Ω with r columns, and compute B = XΩ;
3. Compute B := B − QQ^T B and an orthonormal basis V of B by the economic QR decomposition;
4. Set Q := [Q, V], M := [M; V^T X] and s := s + ∥V^T X∥²_F;
5. If s > ργ, return Q and M; otherwise, go to Step 2.


Algorithm 2 Randomized SPP (RSPP)
Input: Data matrix X, energy ratio ρ, block size r and reduced dimension d.
Output: Transformation matrix W and representation matrix Y.
1. Implement Algorithm 1 with X, ρ and r to obtain Q and M;
2. Compute the weight matrix S by solving problem (6);
3. Seek the eigenvectors {w_i}_{i=1}^d corresponding to the d largest eigenvalues of the GEP (7);
4. Set W = [w_1, w_2, ..., w_d], and compute Y = W^T X.

Algorithm 3 Preprocessed RSPP (PRSPP)
Input: Data matrix X, energy ratio ρ, block size r and reduced dimension d.
Output: Transformation matrix W and representation matrix Y.
1. Implement Algorithm 1 with X, ρ and r to obtain Q and M;
2. Compute the weight matrix S by solving problem (6);
3. Seek the eigenvectors {z_i}_{i=1}^d corresponding to the d largest eigenvalues of the GEP (9);
4. Set Z = [z_1, z_2, ..., z_d], and compute W = QZ and Y = Z^T M.

FIGURE 2. Samples of the first individual from the Yale face database.

Algorithm 4 Exponential RSPP (ERSPP)
Input: Data matrix X, energy ratio ρ, block size r and reduced dimension d.
Output: Transformation matrix W and representation matrix Y.
1. Implement Algorithm 1 with X, ρ and r to obtain Q and M;
2. Compute the weight matrix S by solving problem (6);
3. Perform the spectral decompositions (11) of MM^T and MGM^T;
4. Choose the scaling parameter τ as in (13), and compute the matrix exponentials (12);
5. Seek the eigenvectors {z_i}_{i=1}^d corresponding to the d smallest eigenvalues of the SEP (10);
6. Set Z = [z_1, z_2, ..., z_d], and compute W = QZ and Y = Z^T M.
A. PERFORMANCES WITH DIFFERENT SPARSE REPRESENTATION MODES
In this subsection, I compare the performances of SPP/PCA and PRSPP with different sparse representation modes via two examples.
Example 4.1: In this example, I consider the Yale face database, which contains 165 images of c = 15 individuals; the 11 samples of the first individual are shown in Figure 2. I set l = 6, which means n = 90. I perform four different experiments with respect to the number of rows, that is, m = 400, m = 625, m = 2500 and m = 10,000, to show the performances of the proposed PRSPP (Algorithm 3, abbreviated to PRSPP-SM in this section), where the energy ratio ρ and the block size r are chosen as 0.95 and 10, respectively. For comparison, I also give the performances of SPP keeping ρ = 0.95 energy in the PCA preprocessing step (abbreviated to SPP/PCA or SPP) based on the different sparse representation modes SM (1), RM (2), EM (3) and DM (4), where the error tolerance ε in (2) and the upper bound κ in (4) are set to 0.05 and 5, respectively.

FIGURE 3. Samples of the first individual from the ORL face database.

Example 4.2: In this example, I consider the ORL face database (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html), which contains 400 images of c = 40 individuals with cropped size m = 56 × 46. The 10 samples of the first individual are shown in Figure 3. To show the performances of PRSPP combined with the different sparse representation modes SM (6), RM (2), EM (3) and DM (4), where X and x_i in the last three modes are replaced by M and m_i respectively, I implement four different experiments with respect to the number of training samples from each class, namely l = 5, 6, 7, 8, which means n = 200, n = 240, n = 280 and n = 320. For comparison, I also give the performances of SPP-RM and SPP-DM. The other parameters in these algorithms are set as in Example 4.1.

Example 4.3: In this example, two image databases are considered. The first one, the MNIST database, contains 60,000 training handwritten digits and 10,000 testing handwritten digits with cropped size m = 28 × 28, and the second one, the COIL20 image database, contains 1,440 images of c = 20 distinct subjects with cropped size m = 64 × 64. The first 20 handwritten digits from the training set are shown in Figure 4, and the twenty subjects of COIL20 are shown in Figure 5.

FIGURE 4. The first 20 training samples from the MNIST handwritten digits database.

FIGURE 5. Twenty subjects from the COIL20 image database.

FIGURE 6. Recognition results on the MNIST database.

FIGURE 9. Recognition results on the ORL database.

FIGURE 10. Recognition results on the MNIST database.

FIGURE 11. Recognition results on the COIL20 database.

TABLE 1. CPU times on the Yale database with different m.

TABLE 2. Accuracy rates on the Yale database with different m.

TABLE 3. CPU times on the ORL database with different l.

TABLE 4. Accuracy rates on the ORL database with different l.

TABLE 5. CPU times on the MNIST and COIL20 databases with different ρ.
FIGURE 7. Recognition results on the COIL20 database.

TABLE 6. Main information of four image databases.
FIGURE 8. Recognition results on the Yale database.

TABLE 7. CPU times on four image databases.
