An Orthogonal Locality and Globality Dimensionality Reduction Method Based on Twin Eigen Decomposition

Dimensionality reduction is a hot research topic in pattern recognition. Traditional dimensionality reduction methods can be divided into linear and nonlinear methods. Linear dimensionality reduction methods usually utilize Euclidean distances to explore global geometric structures, while nonlinear dimensionality reduction methods can preserve local manifold structures in learned low-dimensional subspaces. However, redundant information and noise in the raw high-dimensional data restrict the classification performance of these methods. To solve this problem, we propose a novel orthogonal dimensionality reduction method based on twin eigen decomposition, called orthogonal locality and globality preserving projections (OLGPP). Orthogonality, a commonly used criterion in pattern recognition, is insensitive to redundant information and noise. OLGPP not only combines the advantages of global Euclidean structures and local manifold structures, but is also insensitive to the redundant information and noise of the raw high-dimensional data thanks to the embedded orthogonality criterion. Additionally, the objective function based on twin eigen decomposition can be solved sequentially to obtain analytical solutions. We design extensive experiments on four well-known datasets to evaluate the performance of OLGPP. In these experiments, OLGPP achieves the best results, and its average recognition rates are about 10% higher than those of the classic local dimensionality reduction method (i.e., locality preserving projection), which proves that OLGPP is an effective dimensionality reduction method.


I. INTRODUCTION
With the rapid development of electronic information technology, the rapid increase of data generation and storage methods has led to the accumulation of massive data in many fields such as social media, knowledge graphs, intelligent search engines and so on. Generally speaking, these data are usually accompanied by high dimensionality and complex structures [1]. If tasks such as classification and clustering are directly performed on these data, not only will the tasks incur huge computational and storage costs, but the curse of dimensionality [2] will also appear. Especially in some fields of pattern recognition [3], machine learning [4] and neural networks [5], [6], how to deal with massive data has become a problem of practical application significance. At present, dimensionality reduction (DR) [7], [8] is one of the effective methods to solve this problem. DR aims to reduce redundant information of raw high-dimensional data and retain effective information of the data as much as possible. The complexity and dimensionality of raw high-dimensional data can be decreased while retaining intrinsic feature information by using DR methods.
In practical applications, principal component analysis (PCA) [9] and linear discriminant analysis (LDA) [10] are two representative linear DR methods. PCA attempts to find the optimal projection directions by maximizing the variance of the samples. However, PCA does not consider the class label information [11] of samples, which makes it difficult for PCA to perform well on classification tasks in most cases. Unlike PCA, LDA takes into account the class label information. LDA seeks to learn a set of projection directions by maximizing the inter-class scatter matrix and simultaneously minimizing the intra-class scatter matrix. The research results from subspace linear discriminant analysis (SLDA) [12] demonstrate that LDA exhibits favorable inter-class separability and intra-class aggregation in the learned projection subspace. When the complexity and dimensionality of data increase, the distribution of the data is often nonlinear and more feature information is hidden in local nonlinear structures, as pointed out in [13]. PCA and LDA usually play a significant role in some fields such as sentiment analysis [14], image segmentation [15] and memristive neural networks [16]. However, PCA and LDA only preserve the global linear structures in the learned projection subspace, which leads to poor classification performance on such data.
To make up for the deficiency of linear DR methods, scholars have proposed many nonlinear DR methods, for instance, local linear embedding (LLE) [17], Laplacian eigenmaps (LE) [18], multi-dimensional scaling (MDS) [19], isometric mapping (Isomap) [20], Hessian LLE (HLLE) [21] and so on. Among them, LLE and Isomap are the most classical nonlinear DR methods. To retain the sample manifold structures in the local neighborhood, LLE constructs a local reconstruction weight matrix of sample points and minimizes a reconstruction error in the learned projection subspace. Isomap is a method derived from MDS, which preserves the intrinsic geometric information of local manifold structures in the process of DR. Specifically, the detailed ideas are as follows: (1) firstly, set the k-nearest neighbor parameter and construct the nearest neighbor connection graph; (2) secondly, use a shortest path algorithm (Dijkstra's algorithm [22] or Floyd's algorithm [23]) to calculate the shortest path between any two points on the graph; (3) finally, utilize the MDS method to obtain the projections in the learned projection subspace. In addition, since the shortest path algorithm incurs a huge computational cost, parallel technology [24], [25] can be used to reduce the computational burden of DR. Unfortunately, the out-of-sample problem [26] exists in these methods: when they are applied to new data, they perform poorly. To overcome this shortcoming, locality preserving projection (LPP) [27] and neighborhood preserving embedding (NPE) [28] were proposed based on the ideas of the traditional methods. There are some similarities between LPP and NPE. Both of them attempt to discover local nonlinear manifolds hidden in raw high-dimensional data.
The differences are as follows: LPP constructs a Laplacian graph to keep the local structure information when finding the projection directions in the learned projection subspace, while NPE obtains the corresponding projection directions by calculating the edge weights between samples to retain the local structure information. Furthermore, LPP and NPE suffer from the small sample size (SSS) problem [29], which leads to inferior performance. To avoid the interference of the SSS problem, kernel LPP (KLPP) [30] and kernel NPE (KNPE) [31] were proposed based on kernel technology. Many experimental results indicate that KLPP and KNPE have better classification performance than LPP and NPE. Besides, sample-dependent LPP (SLPP) [32], which utilizes the similarities between sample pairs to settle the SSS problem, is also a feasible method.
Combining the advantages of global Euclidean structures and local manifold structures, Zang et al. [33] proposed elastic preserving projections (EPP). EPP can preserve both the global geometric structures and the local manifold structures in the learned projection subspace. However, due to the complexity of the data distribution and the influence of noises, it is difficult for EPP to fully explore the local manifold structures. In pattern recognition, orthogonality is a commonly used criterion. As [34], [35] pointed out, the orthogonality criterion is less sensitive to the complexity of the data distribution and the influence of noises, and it can also enhance the ability to extract local manifold structures. In this paper, to overcome the effect of noises and the complexity of the data distribution and to obtain more discriminative low-dimensional features, we propose a novel DR method named orthogonal locality and globality preserving projections (OLGPP). By constructing a local undirected neighborhood graph and a global geometric graph, OLGPP effectively utilizes the local manifold structures and the global Euclidean structures. Meanwhile, OLGPP optimizes the ratio between the two structures so that the data mapped into the learned projection subspace carry more effective and comprehensive intrinsic feature information. By introducing the orthogonality constraints, OLGPP has favorable discriminative power and can capture more intrinsic local structures of the raw data. Experiments are designed to evaluate the effectiveness of OLGPP on several real-world datasets, including the Semeion handwritten digit dataset, the COIL-20 object image dataset, the ORL face image dataset and the UMIST face dataset. The experimental results demonstrate that OLGPP always has better recognition ability under the same conditions. The rest of this paper is organized as follows. Section II reviews some related methods. The next section introduces the construction and optimization of OLGPP in detail.
Then we discuss the singularity problem in section IV. Section V presents and analyzes the experimental results. The paper is summarized in the final section.

II. REVIEW OF RELATED WORK
Suppose that X = [x_1, x_2, x_3, ..., x_n] ∈ R^{m×n} consists of n samples, where x_i (i = 1, 2, 3, ..., n) is an m-dimensional column vector and each column vector represents a sample. The main goal of DR methods is to find an optimal mapping R^m → R^d (d < m) that preserves the information structures of the raw high-dimensional data in the learned projection subspace as much as possible. Generally speaking, we can describe this mapping as y_i = P^T x_i, P ∈ R^{m×d}, where P represents the projection matrix obtained by a DR method, y_i (i = 1, 2, 3, ..., n) represents the low-dimensional features of x_i, and d is the dimensionality of y_i.
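To make this notation concrete, the linear mapping y_i = P^T x_i can be written in a few lines of NumPy; the sizes m, n, d and the random matrices below are purely illustrative:

```python
import numpy as np

# Illustrative sizes: n samples of dimensionality m, reduced to d dimensions.
m, n, d = 50, 200, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))   # columns are the samples x_i
P = rng.standard_normal((m, d))   # a projection matrix from some DR method

Y = P.T @ X                       # low-dimensional features y_i = P^T x_i
print(Y.shape)                    # (5, 200): d rows, one column per sample
```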

A. PRINCIPAL COMPONENT ANALYSIS
PCA, as a mainstream DR method, has been widely used in data processing. Assuming that the sample set X is centralized, i.e., Σ_{i=1}^{n} x_i = 0, PCA intends to find a set of projection directions P = [p_1, p_2, p_3, ..., p_d] maximizing the variance between the samples. From the theory of PCA, the following formulation can be obtained:

max tr(P^T X X^T P)    (1)

Using the Lagrange multiplier method [36], the following eigenvalue problem can be given:

X X^T p = λ p    (2)

By solving (2), the projection directions P can be given by the eigenvectors corresponding to the first d largest eigenvalues.

B. LOCALITY PRESERVING PROJECTION
LPP is a classical linear DR method based on the k-nearest neighbor method. A local graph is constructed to retain the local manifold structures of the raw high-dimensional data. The corresponding nearest neighbor weight matrix is as follows:

S_ij = exp(−||x_i − x_j||^2 / t), if x_j is among the k-nearest neighbors of x_i; S_ij = 0, otherwise,    (3)

where t ∈ (0, +∞) is a kernel parameter. The objective function of LPP can be given as follows:

min Σ_{i,j} ||y_i − y_j||^2 S_ij    (4)

By substituting y_i = P^T x_i in (4), we can obtain a further optimization criterion for LPP:

min tr(P^T X L X^T P)  subject to  P^T X D X^T P = I    (5)

where S is the nearest neighbor weight matrix, D is the diagonal matrix whose elements are the row sums of S, and L = D − S is the Laplacian matrix. The optimization criterion of LPP can be turned into the following solvable generalized eigenvalue problem by using the Lagrange multiplier method:

X L X^T P = λ X D X^T P    (6)

When this eigenvalue problem is solved, the eigenvectors corresponding to the first d smallest eigenvalues give the projection directions, and the corresponding low-dimensional features are Y = [y_1, y_2, y_3, ..., y_n], where y_i = P^T x_i (i = 1, 2, 3, ..., n).
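As a minimal sketch of the two formulations above, the code below computes PCA directions from the eigendecomposition of XX^T and LPP directions from the generalized eigenvalue problem X L X^T P = λ X D X^T P. The heat-kernel neighborhood construction, the small ridge term that keeps X D X^T positive definite, and the default parameters are our own assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def pca_directions(X, d):
    """Top-d eigenvectors of the (centered) scatter matrix X X^T."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center so sum_i x_i = 0
    w, V = eigh(Xc @ Xc.T)                        # eigenvalues in ascending order
    return V[:, ::-1][:, :d]                      # keep the d largest

def lpp_directions(X, d, k=5, t=1.0):
    """LPP: solve X L X^T p = lam X D X^T p and keep the d smallest eigenvalues."""
    n = X.shape[1]
    dist2 = cdist(X.T, X.T, "sqeuclidean")
    S = np.zeros((n, n))
    nbrs = np.argsort(dist2, axis=1)[:, 1:k + 1]  # k nearest neighbors (skip self)
    for i in range(n):
        S[i, nbrs[i]] = np.exp(-dist2[i, nbrs[i]] / t)
    S = np.maximum(S, S.T)                        # symmetrize: undirected graph
    D = np.diag(S.sum(axis=1))
    L = D - S
    # small ridge keeps X D X^T positive definite (guards the SSS case)
    w, V = eigh(X @ L @ X.T, X @ D @ X.T + 1e-8 * np.eye(X.shape[0]))
    return V[:, :d]                               # d smallest eigenvalues
```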

III. ORTHOGONAL LOCALITY AND GLOBALITY PRESERVING PROJECTIONS AND ITS OPTIMIZATION
In practical applications, the raw high-dimensional data is frequently complex and the distribution of the data is generally nonlinear, so that intrinsic feature information often exists in the local manifold structures. Meanwhile, these data are often accompanied by large amounts of redundant information and noises. Most traditional DR methods do not pay attention to these manifold structures, and the existence of redundant information and noises also prevents some traditional methods from fully capturing the feature information of the global and local structures. To enhance the ability of feature extraction and obtain more intrinsic local structures, we propose a novel DR method named OLGPP. Thanks to the orthogonality criterion, our method has better robustness and feature extraction ability, so that it can discover more efficient manifold structures and obtain more comprehensive intrinsic feature information.
OLGPP combines the merits of traditional linear and nonlinear methods with the orthogonality criterion. Compared with the traditional methods, our method can not only preserve the elasticity between samples, but also reduce the sensitivity to noise. Our method can be divided into the following parts. First, we construct a local undirected neighborhood graph and formulate the local structure objective function for preserving the local manifold structures. Second, we describe the global Euclidean structures by constructing a global geometric graph and then give the global structure objective function for keeping the global Euclidean structures. Finally, combined with the orthogonality criterion, we present the complete objective function, which considers both the global and local structures, and give a twin eigen decomposition method [34] to solve it.

A. LOCAL UNDIRECTED NEIGHBORHOOD GRAPH CONSTRUCTION AND ITS OBJECTIVE FUNCTION
In order to preserve the local manifold structures, we adopt the k-nearest neighbor method to construct the local undirected neighborhood graph G_local. We can obtain this graph as follows: if x_j belongs to one of the k-nearest neighbors of x_i, put an undirected edge between x_i and x_j. The corresponding weights of G_local are described as

S_local,ij = exp(−||x_i − x_j||^2 / t), if x_i and x_j are connected in G_local; S_local,ij = 0, otherwise.    (8)

According to the corresponding linear transformation method, we give the local structure objective function as follows:

min tr(P^T X L_local X^T P)    (9)

where L_local = D_local − S_local, and D_local is the diagonal matrix whose diagonal elements are the sums of the elements in each row of S_local, that is, D_local = diag(Σ_j S_local,1j, Σ_j S_local,2j, Σ_j S_local,3j, ..., Σ_j S_local,nj). Minimizing this objective function yields more compact local features.
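The construction of the local undirected neighborhood graph and its Laplacian can be sketched as follows; the heat-kernel weighting is assumed (as in LPP), and the function name is ours:

```python
import numpy as np
from scipy.spatial.distance import cdist

def local_graph_laplacian(X, k=5, t=1.0):
    """Build the k-NN heat-kernel weight matrix S_local and L_local = D_local - S_local."""
    n = X.shape[1]
    dist2 = cdist(X.T, X.T, "sqeuclidean")
    S = np.zeros((n, n))
    nbrs = np.argsort(dist2, axis=1)[:, 1:k + 1]   # k nearest neighbors of each x_i
    for i in range(n):
        S[i, nbrs[i]] = np.exp(-dist2[i, nbrs[i]] / t)
    S = np.maximum(S, S.T)                         # undirected edges
    D = np.diag(S.sum(axis=1))                     # row sums on the diagonal
    return D - S, S
```

Each row of the returned Laplacian sums to zero, the defining property of a graph Laplacian.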

B. GLOBAL GEOMETRIC GRAPH CONSTRUCTION AND ITS OBJECTIVE FUNCTION
Inspired by the undirected complete graph, we construct the global geometric graph G_global to describe the global Euclidean structures. In other words, there is an undirected edge between any two points in the graph. The corresponding weights of G_global are defined as

S_global,ij = exp(−||x_i − x_j||^2 / t)    (10)

Similarly, the global structure objective function can be given as follows:

max tr(P^T X L_global X^T P)    (11)

where S_global is the global weight matrix, D_global is the diagonal matrix whose elements are the row sums of S_global, and L_global = D_global − S_global is the Laplacian matrix. The purpose of maximizing this objective function is to obtain more comprehensive global structure information.
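Since the global geometric graph is simply the complete graph with heat-kernel weights, its Laplacian needs no neighbor search; a sketch under the same assumptions as above (the function name is ours):

```python
import numpy as np
from scipy.spatial.distance import cdist

def global_graph_laplacian(X, t=1.0):
    """Complete-graph heat-kernel weights S_global (every pair connected) and
    the Laplacian L_global = D_global - S_global."""
    dist2 = cdist(X.T, X.T, "sqeuclidean")
    S = np.exp(-dist2 / t)            # an edge between every pair of samples
    np.fill_diagonal(S, 0.0)          # no self-loops
    D = np.diag(S.sum(axis=1))
    return D - S, S
```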

C. ORTHOGONAL LOCALITY AND GLOBALITY PRESERVING PROJECTIONS OBJECTIVE FUNCTION
From Section III.A and Section III.B, it can be seen that we should minimize the objective function tr(P^T X L_local X^T P) so that the local manifold structures can be used to explore the intrinsic local features. Meanwhile, in order to keep the global Euclidean structures, we need to maximize the objective function tr(P^T X L_global X^T P). Inspired by [37], we use a method based on twin eigen decomposition to solve our objective function. According to the idea of this method, the first projection direction is obtained first and then the other projection directions are solved sequentially. Therefore, the final objective function can be provided in two parts:

model 1:  max_p p^T X (α L_global − (1 − α) L_local) X^T p    (14)

model 2:  max_p p^T X (α L_global − (1 − α) L_local) X^T p,  subject to p^T p_i = 0 (i = 1, 2, ..., k − 1)    (15)

where α ∈ (0, 1) is a trade-off parameter used to balance the local manifold structures and the global Euclidean structures.
In addition, to ensure the uniqueness of the solutions, we further add the constraint p^T X H_dd X^T p = 1 to both models, where H_dd = D_local − D_global is the constraint matrix. In conclusion, the first projection direction can be acquired by solving model 1, and then the kth (2 ≤ k ≤ d) orthogonal projection direction can be obtained from model 2.

D. OPTIMIZATION
To acquire the first orthogonal projection direction and the kth orthogonal projection direction, we give two theorems to solve model 1 and model 2 respectively. From Theorem 1, the first orthogonal projection direction can be obtained accurately, and the kth orthogonal projection direction can then be obtained sequentially according to Theorem 2.
Theorem 1: Let p be the solution of the following eigenvalue problem; then p is also the first orthogonal projection direction of OLGPP:

(X H_dd X^T)^{-1} X (α L_global − (1 − α) L_local) X^T p = λ p    (16)

where H_dd = D_local − D_global.

Proof: By using the Lagrange multiplier method, model 1 (14) can be transformed into the following problem:

L(p) = p^T X (α L_global − (1 − α) L_local) X^T p − λ (p^T X H_dd X^T p − 1)

where λ is a Lagrange multiplier. Taking the partial derivative of L(p) with respect to p and setting it to zero, we get

X (α L_global − (1 − α) L_local) X^T p = λ X H_dd X^T p    (20)

Obviously, multiplying both sides of (20) by (X H_dd X^T)^{-1} yields (16), which means Theorem 1 is established.

Theorem 2: Let A = X (α L_global − (1 − α) L_local) X^T, B = X H_dd X^T and P^{(k−1)} = [p_1, p_2, ..., p_{k−1}]. Let p be the eigenvector corresponding to the maximum eigenvalue of the following solvable eigenvalue problem; then p is also the kth orthogonal projection direction:

{I − B^{-1} P^{(k−1)} [(P^{(k−1)})^T B^{-1} P^{(k−1)}]^{-1} (P^{(k−1)})^T} B^{-1} A p = λ p    (21)

Proof: By means of the Lagrange multiplier method, model 2 (15) can be transformed into the following problem:

L(p) = p^T A p − λ (p^T B p − 1) − Σ_{i=1}^{k−1} σ_i p^T p_i    (24)

where λ and σ_i are Lagrange multipliers. Taking the partial derivative of L(p) with respect to p and setting it to zero, we get

2 A p − 2 λ B p − Σ_{i=1}^{k−1} σ_i p_i = 0    (25)

Multiplying the left side of (25) by p_1^T B^{-1}, p_2^T B^{-1}, ..., p_{k−1}^T B^{-1} respectively, and using the constraints p_i^T p = 0, a system of k − 1 equations (26) can be obtained. For the convenience of calculation, the equations in (26) can be combined in matrix form, so that we get

2 (P^{(k−1)})^T B^{-1} A p = [(P^{(k−1)})^T B^{-1} P^{(k−1)}] σ^{(k−1)}    (27)

where σ^{(k−1)} = [σ_1, σ_2, ..., σ_{k−1}]^T. From (27), we can further derive the expression of the Lagrange multipliers σ^{(k−1)} from the known variables:

σ^{(k−1)} = 2 [(P^{(k−1)})^T B^{-1} P^{(k−1)}]^{-1} (P^{(k−1)})^T B^{-1} A p    (28)

At present, the expression of σ^{(k−1)} is substituted into (25); dividing by 2 and multiplying by B^{-1}, we get the final formulation

{I − B^{-1} P^{(k−1)} [(P^{(k−1)})^T B^{-1} P^{(k−1)}]^{-1} (P^{(k−1)})^T} B^{-1} A p = λ p    (29)

Above all, Theorem 2 can be proved.

Algorithm 1 OLGPP
Input: sample set X, neighborhood size k, trade-off parameter α, kernel parameter t, subspace dimensionality d
1 Construct S_local and S_global by (8) and (10) respectively
2 Obtain L_local and L_global by (9) and (11)
3 Obtain the first orthogonal projection direction p_1 by solving (16)
4 for k = 2, 3, 4, ..., d do
5     compute the kth projection direction by solving (21)
6 end for
7 Output: orthogonal projection matrix P = [p_1, p_2, p_3, ..., p_d]

Therefore, the kth projection direction can be obtained by solving (29). In Algorithm 1, we show our OLGPP method in detail.
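Algorithm 1 can be sketched as follows. Two points are our own assumptions rather than the paper's specification: the constraint matrix is taken as B = X(D_local − D_global)X^T (the matrix whose singularity Section IV discusses), and a small ridge eps is added to keep B invertible:

```python
import numpy as np

def olgpp_directions(X, S_local, S_global, alpha=0.05, d=5, eps=1e-6):
    """Sequential (twin eigen decomposition) solver sketch for OLGPP.

    Assumption: the constraint matrix is B = X (D_local - D_global) X^T,
    ridge-regularized by eps against the SSS singularity.
    """
    m = X.shape[0]
    D_l = np.diag(S_local.sum(axis=1))
    D_g = np.diag(S_global.sum(axis=1))
    L_l, L_g = D_l - S_local, D_g - S_global
    A = X @ (alpha * L_g - (1.0 - alpha) * L_l) @ X.T   # global/local trade-off
    B = X @ (D_l - D_g) @ X.T + eps * np.eye(m)         # constraint matrix
    Binv = np.linalg.inv(B)

    def top_eigvec(M):
        w, V = np.linalg.eig(M)                          # M is not symmetric
        return np.real(V[:, np.argmax(np.real(w))])

    P = [top_eigvec(Binv @ A)]                           # first direction (Theorem 1)
    for _ in range(1, d):
        Pk = np.column_stack(P)
        # deflation enforcing p^T p_i = 0, mirroring the projector in Theorem 2
        M = (np.eye(m) - Binv @ Pk @ np.linalg.inv(Pk.T @ Binv @ Pk) @ Pk.T) @ Binv @ A
        P.append(top_eigvec(M))
    return np.column_stack(P)
```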
By solving model 1 and model 2, we can get the orthogonal projection matrix P = [p_1, p_2, p_3, ..., p_d] corresponding to the first d largest eigenvalues. According to y_i = P^T x_i, P ∈ R^{m×d}, the orthogonal features of the sample set X can be obtained as Y = [P^T x_1, P^T x_2, P^T x_3, ..., P^T x_n], Y ∈ R^{d×n}. Finally, we utilize the nearest neighbor classifier to obtain the final recognition results.

E. TIME COMPLEXITY ANALYSIS
The time complexity of our method is not very high in comparison with other related methods. In essence, the formulation of our method is a generalized eigenvalue problem; in other words, we can get the projection matrix by solving this problem. Specifically, the most computationally expensive steps of OLGPP are steps 1 and 5. According to Algorithm 1, the time complexity of step 1 is O(n^2 m + m^2 n), and the time complexity of step 5 is O((d − 1)m^2). So the total time complexity is O(n^2 m + m^2 n + (d − 1)m^2).

IV. DISCUSSION ON SINGULARITY
In practical applications, the SSS problem is a common problem. When the SSS problem occurs, the number of samples n is smaller than the dimensionality of the data m. In this section, we discuss the singularity of the constraint matrix X(D_local − D_global)X^T when the SSS problem exists in our method.
(a) If the diagonal elements of D_local − D_global do not contain zeros, then rank(D_local − D_global) = n. According to the property of the rank of a matrix product, i.e., rank(AB) ≤ min(rank(A), rank(B)), we have rank((D_local − D_global)X^T) ≤ min(rank(D_local − D_global), rank(X^T)) ≤ n, and therefore rank(X(D_local − D_global)X^T) ≤ n.
(b) If the diagonal elements of D_local − D_global contain zeros, then rank(D_local − D_global) < n. Similarly, according to the property of the rank of a matrix product, rank((D_local − D_global)X^T) < n, so rank(X(D_local − D_global)X^T) < n.
In summary, rank(X(D_local − D_global)X^T) is at most n when the SSS problem exists. Since the constraint matrix is of size m × m and n < m, the constraint matrix is singular. In order to avoid the influence of the SSS problem, we can adopt two methods to process the data: (1) utilizing PCA to preprocess the original data; (2) adopting regularization techniques to transform the original generalized eigenvalue decomposition problem into a solvable problem.
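The second remedy can be illustrated directly: a small ridge term εI turns a singular (rank-deficient) constraint matrix into an invertible one. The matrix below is synthetic, only mimicking the SSS shape m > n:

```python
import numpy as np

m, n = 6, 3
rng = np.random.default_rng(5)
X = rng.standard_normal((m, n))           # SSS case: fewer samples than dimensions
C = X @ X.T                               # m x m but rank <= n < m: singular
assert np.linalg.matrix_rank(C) <= n

C_reg = C + 1e-4 * np.eye(m)              # ridge regularization
assert np.linalg.matrix_rank(C_reg) == m  # now full rank, hence invertible
```

Since C is positive semidefinite, every eigenvalue of C_reg is at least 1e-4, which is why the regularized matrix is guaranteed to be nonsingular.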

V. EXPERIMENTS AND ANALYSIS
To evaluate the performance of OLGPP, we design experiments [38] on the Semeion handwritten digit dataset, the COIL-20 object image dataset, the UMIST face dataset and the ORL face image dataset. To prevent the SSS problem from influencing the final experimental results, we use PCA to reduce the dimensionality of each sample to 100. As in [39], we select the kernel parameter from the series t = [0.2l, 0.4l, 0.6l, 0.8l, l, 2l, 4l, 6l, 8l, 10l], where l is the average Euclidean distance of all samples. The trade-off parameter α and the k-nearest neighbor parameter are set to 0.05 and 5 respectively according to [37], and all comparison methods adopt the same parameter settings where these parameters are involved. Moreover, to avoid the influence of sample randomness on the final experimental results, we conduct ten independent random experiments and take the average of the ten runs to compute the recognition rates. Additionally, all the recognition rates shown in Section V are obtained by using the nearest neighbor classifier.
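The evaluation protocol above (a kernel-parameter grid built from the average pairwise distance l, and nearest neighbor classification in the learned subspace) can be sketched as follows; the helper names are ours:

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_distance(X):
    """l: average Euclidean distance over all distinct sample pairs."""
    D = cdist(X.T, X.T)
    n = X.shape[1]
    return D.sum() / (n * (n - 1))

def t_grid(l):
    """Candidate kernel parameters, as in the protocol above."""
    return [c * l for c in (0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10)]

def one_nn_accuracy(Y_train, labels_train, Y_test, labels_test):
    """Nearest neighbor classification in the learned subspace (columns = samples)."""
    idx = cdist(Y_test.T, Y_train.T).argmin(axis=1)
    return np.mean(labels_train[idx] == labels_test)
```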

A. EXPERIMENTS ON SEMEION HANDWRITTEN DIGIT DATASET
The Semeion handwritten digit dataset contains 1593 handwritten digits, written by about 80 persons, each of whom wrote the digits twice. The first time, the digits were written in a normal way (accurately writing down all digits from 0 to 9), and the second time, the digits were written quickly (ignoring accuracy). In this section, we randomly select χ (χ = 30, 40, 50, 60) images from each class of the dataset as training samples and use the other images as testing samples. Such random selections are executed ten times, and Table 1 lists the average recognition rates of the ten random experiments.
LDA is a DR method that takes into account the global Euclidean structures, while LPP and NPE are DR methods that explore the local manifold structures hidden in the raw data. Table 1 shows that the average recognition rates of LDA are better than those of LPP and NPE when the number of training samples is relatively small. However, the recognition ability of LPP and NPE becomes significantly better than that of LDA as the number of training samples increases. This also indicates that in complex high-dimensional data, the distribution of the data is nonlinear and effective feature information generally exists in the local structures. If we still adopt linear DR methods to process these data, intrinsic feature information will be lost in the process of DR, which is one of the reasons for low recognition rates. Compared with LPP and SLPP, EPP considers both the local structures and the global structures. When reconstructing data in the learned projection subspace after dimensionality reduction, it retains more structure information, which is also the reason why EPP in Table 1 consistently has higher recognition rates than the other comparison methods. By introducing the orthogonality criterion, our method can further capture the intrinsic local and global structures of the raw data, which is one of the reasons why our method has the best average recognition rates. In Table 1, our method outperforms the other comparison methods in recognition rates, which also proves the advantage of our method.
Besides, to demonstrate the advantages of the proposed method more intuitively, Figure 1 shows the variation of the average recognition rates over different dimensionalities. It can be observed from Figure 1 that when the dimensionality of samples is small, the average recognition rates of LDA increase obviously faster than those of LPP, NPE and SLPP. This phenomenon shows that linear DR methods considering the global Euclidean structures can effectively improve classification performance in low dimensionalities. Benefiting from both the local and the global structures, the average recognition rates of EPP are superior to those of LPP, NPE and SLPP throughout, which also indicates that EPP can preserve the two structures effectively and reconstruct the feature information in the learned projection subspace. As the dimensionality increases, the average recognition rates of all methods also begin to increase, and the recognition rates of our method exhibit the fastest increasing trend, which also indicates that our method can be effectively applied to high-dimensional data. Different from the other methods, the curve of SLPP grows slowly. However, the average recognition rates of all the methods tend to be stable when the dimensionality increases to a certain value, and our method shows a smoother trend than the compared methods. This also illustrates that our method has better robustness.

B. EXPERIMENTS ON COIL-20 OBJECT IMAGE DATASET
The COIL-20 object image dataset collects images of 20 kinds of objects, in which one image is taken for every 5° of rotation of each object. Therefore, there are 72 images of each object, and 1440 object images are collected in the whole dataset. Figure 2 shows a frontal image of each of the 20 objects. χ (χ = 6, 7, 8, 9) images are selected randomly from each class as training samples and the other images are regarded as testing samples. We conduct ten such random experiments and the average recognition rates are listed in Table 2. From Table 2, it can be observed that the recognition rates of all methods present an upward trend with the increase of training samples, and the average recognition rates of EPP consistently outperform those of LPP, NPE and SLPP, while our method always maintains the optimal recognition rates. Different from the Semeion dataset, NPE has the lowest average recognition rates in Table 2. Furthermore, the standard deviation of OLGPP is generally lower than that of the other methods in Table 2. From the viewpoint of its mathematical meaning, the standard deviation reflects the degree of dispersion among the values in a group. In other words, a low standard deviation means that the recognition rates obtained in each random experiment are extremely close to the average recognition rates and there are few differences between the recognition rates, which proves that our method has favorable stability for different samples.

C. EXPERIMENTS ON UMIST FACE DATASET
The UMIST face dataset consists of 20 persons with a total of 564 images, and for each person a series of pose images from the side view to the front view is collected. In our experiments, we randomly choose χ (χ = 6, 7, 8, 9) images from each class of the dataset as training samples while the remaining images are regarded as testing samples. In Table 3, we tabulate the average recognition rates over ten random experiments. From Table 3, it can be seen that the overall trend of the average recognition rates is the same as that on the COIL-20 image dataset, and NPE always shows the worst classification performance among all the methods. Because it considers both the global and local structures, the average recognition rates of EPP are optimal among the four compared methods, and our method still maintains the best classification performance.

D. EXPERIMENTS ON ORL FACE IMAGE DATASET
The ORL face image dataset contains face images of 40 persons taken against a dark and uniform background. For each person, 10 images are collected at different times, under different illumination, with different facial expressions and different facial details. We randomly select χ (χ = 3, 4, 5, 6) images from each class as training samples while the remaining images are treated as testing samples, repeat this ten times, and tabulate the average recognition rates in Table 4. Similar to the experimental results on the COIL-20 and UMIST datasets, the recognition ability of NPE is the worst in Table 4. On the ORL face image dataset, our method always has the optimal classification performance, and its standard deviation is consistently lower than those of the other methods.

E. EXPERIMENTAL RESULTS ANALYSIS
From all the experimental results in Section V, some conclusions can be drawn as follows: (1) the recognition ability of all the methods increases quickly as the number of training samples increases, which indicates that abundant training samples play an important role in the improvement of recognition ability; (2) compared with linear DR methods, nonlinear methods can effectively discover the intrinsic local structures hidden in the raw high-dimensional data, which is beneficial to DR; (3) our method, integrated with the orthogonality criterion, always has a low standard deviation, which means that our method has favorable robustness and is capable of handling complex high-dimensional data and noises in the datasets; (4) in all the experiments, our method achieves the best results, which demonstrates that the proposed method is a feasible DR method.

F. THE IMPACT OF THE SAMPLE RANDOMNESS
In the experiments, the recognition rates in the above tables are the average values computed over ten independent random experiments. To illustrate more intuitively whether sample randomness affects the recognition rates, we enumerate, in the form of histograms, the recognition rates in each random experiment on the COIL-20 and the ORL datasets respectively when images are selected as training samples from each class. By analyzing Figure 3 and Figure 4, we can draw some conclusions: (1) in each random experiment, the recognition rates of all the methods vary, but the variation of our method is not obvious. In terms of the overall trend, compared with the other methods, the variation of our method is relatively gentle and almost close to a fixed value, which is the reason why the standard deviation of our method is smaller than that of the other methods in Table 2 and Table 4; (2) our method has the highest recognition rates in every random experiment, which further illustrates the feasibility of the method.

G. THE IMPACT OF KERNEL PARAMETER
In our objective function, it is necessary to use a kernel parameter t to compute the weights of the global geometric graph and the local undirected neighborhood graph. The size of the kernel parameter directly affects the construction of the two graphs, which means that the kernel parameter may influence the global and local structures preserved in the learned subspace. In this section, we analyze whether the kernel parameter affects the recognition ability of our method. In Figure 5, we illustrate the best recognition rates for different parameters when χ images are selected for training on all experimental datasets. As can be seen from Figure 5, when the number of training samples is fixed, the recognition rates corresponding to different parameters change little, which indicates that the kernel parameter rarely affects the classification performance. Simultaneously, we can observe a common trend on the four datasets: the recognition rates of our method increase as the number of training samples increases. Together with Tables 1-4, we can conclude that the classification performance of OLGPP is hardly influenced by the kernel parameter, and our method is strongly robust to the kernel parameter. Besides, OLGPP is more sensitive to the number of training samples, and increasing the number of training samples is a practicable way to improve the recognition rates.

VI. CONCLUSION
In this paper, a novel DR method named OLGPP is proposed. In our method, the local manifold structures and the global Euclidean structures are utilized to explore the intrinsic structure information hidden in the raw high-dimensional data, and the discriminative power of the low-dimensional orthogonal features is further enhanced by the orthogonality criterion. Furthermore, our method is also insensitive to noises, which is conducive to its recognition ability. By constructing the orthogonal objective function and solving this function sequentially, low-dimensional orthogonal features are obtained. Extensive experimental results on several datasets demonstrate that our method is an effective DR method.
In addition, some of the raw high-dimensional data also contain class label information. As reported in [11], classification performance can be improved by using the class label information. Our method is an unsupervised method, and our future work will focus on embedding class label information to further improve the classification performance of our method.