New Variants of Global-Local Partial Least Squares Discriminant Analysis for Appearance-Based Face Recognition

We propose new appearance-based face recognition methods based on global-local structure-preserving partial least squares discriminant analysis. Two variants of the method are described in this article: neighbourhood-preserving partial least squares discriminant analysis (NPPLS-DA) and uncorrelated NPPLS-DA (UNPPLS-DA). In contrast to standard partial least squares discriminant analysis (PLS-DA), which effectively captures only the global Euclidean structure of the face space, both NPPLS-DA and UNPPLS-DA are designed to find an embedding that preserves global as well as local neighbourhood information and to obtain a face subspace that best detects the essential manifold structure of the face space. Unlike global-local features extracted by other methods, the global-local PLS-DA features are obtained by maximizing the covariance between the data matrix and a response matrix coded with the class structure of the data. Furthermore, in UNPPLS-DA, an uncorrelated constraint is introduced into the objective function of NPPLS-DA to extract uncorrelated features, which are important in many pattern recognition problems. We compare the proposed NPPLS-DA and UNPPLS-DA methods with several competing methods on six different face databases. The experimental results show that the proposed methods provide better representation and consistently achieve higher face recognition rates than the competing methods.


I. INTRODUCTION
In appearance-based face recognition, a facial image is considered a vector of pixels and is represented as a single point in a high-dimensional space. As the number of images in the data set increases, the complexity of classification and discrimination also increases. It was pointed out in [1] that the difficulty in high-dimensional classification is intrinsically caused by the existence of many noisy features that do not contribute to the reduction of the misclassification rate. Thus, dimensionality reduction and feature selection techniques are employed prior to classification to extract features that contain most of the necessary information in the data. The most widely used dimension reduction techniques are principal component analysis (PCA) [2], [3], partial least squares (PLS) [4]-[8] and linear discriminant analysis (LDA) [9], [10]. As indicated in [1], these methods project the original data onto directions that result in little misclassification error; the directions assign more weight to the features that contribute the most to the classification of the data. PCA is an unsupervised dimension reduction technique that captures most of the variance of the data, while LDA is a supervised dimension reduction technique that aims at discriminating the different classes in the data.
PLS is a statistical method that models the linear relationship between sets of observed variables X and Y by means of latent variables (components). The method was first developed by Wold [11], [12] and has since gained wide acceptance in fields such as chemometrics, bioinformatics, the social sciences, and medicine. The ability of PLS to address high dimensionality and collinearity problems in spectral data makes it a powerful and standard tool for the analysis of chemical data in chemometrics [5], [13], [14]. Although PLS was not designed for discrimination and classification tasks, it has been successfully applied to these problems with outstanding performance [15]-[17]. PLS for discrimination, better known as partial least squares discriminant analysis (PLS-DA), was shown in [17] to have a statistical relationship with LDA, and it was further suggested that PLS should be used instead of PCA when discrimination is the goal and dimension reduction is needed. Despite its successes in discrimination and preserving linear structures, the performance of PLS-DA has been reported to degrade under nonlinear conditions [18]-[20]. Attempts to modify PLS-DA to handle nonlinear data include kernelized extensions of PLS-DA [21]-[24] that extend the nonlinear kernel-based PLS methodology proposed in [25]. A disadvantage of the kernel-based method, as pointed out in [26], is that the relation between the original variables and the modelled lower-dimensional data is not clear. Another approach towards a nonlinear PLS-DA is the locally weighted PLS-DA (LW-PLS-DA) [27], which extends the locally weighted PLS approach described in [28] by integrating locally weighted regression [29] into PLS-DA. This extension of PLS-DA also has a disadvantage, highlighted in [27]: if a PLS-DA model is trained on only local information, the interpretation of the resulting model at a global level remains a challenge. The need for nonlinear dimensionality reduction and representation in appearance-based face recognition has been a well-researched topic. Nonlinear methods such as kernel methods [30], ISOMAP [31]-[33] and maximum variance unfolding (MVU) [34] were developed to preserve the global structure of the face space, while manifold learning methods such as locally linear embedding (LLE) [35], Laplacian eigenmaps [36], locality-preserving projection (LPP) [37]-[39] and neighbourhood-preserving embedding (NPE) [40], [41] attempt to capture local manifold structures. A systematic investigation in [42] revealed that although nonlinear methods have flexibility in learning continuous and smooth data, inconsistent performance was observed when the data sets have a complicated distribution. In fact, in a subspace of low nonlinearity, similar performance is observed for both linear and nonlinear methods. It was further highlighted in [43] that when treating multi-clustered data, as in the case of appearance-based face recognition, there is a need to simultaneously treat both the global clustering structure and the local clustering structure.
Since global and local structures are both important for appearance-based face recognition and classification, a method that can reduce the dimensionality of a data set while preserving both its global and local structures is highly desirable. Feature extraction is currently witnessing a burgeoning growth of approaches that embrace the global and local structure-preserving (GLSP) framework and report very promising results [44]-[50]. These approaches are similar and involve developing an objective function whose optimizer yields global and local structure-preserving features in the low-dimensional embedding; the optimal projection direction is thus the optimal solution of the objective. The resulting optimization problem can be transformed into an eigenvalue problem and solved using existing high-performance computational methods for eigenvalue problems.
To make PLS-DA more effective for dimensionality reduction and discrimination, we propose modifications to PLS-DA based on the GLSP framework. The GLSP objective function in our approach combines the PLS-DA objective and the objective of LLE to capture the local manifold structure. Features extracted from our methods are termed as global-local PLS-DA features. The global and local objectives are combined using two different multi-objective optimization criteria which results in two algorithmic procedures for extracting the features.
The rest of the paper is organized as follows. In section II, we provide a brief review of the PLS-DA and LLE algorithms. Section III (A) describes the extraction of global-local PLS-DA features via the trace ratio optimization criterion and outlines the neighbourhood-preserving PLS-DA (NPPLS-DA) method. In section III (B), we incorporate the uncorrelated constraint into the NPPLS-DA optimization criterion, which gives rise to the uncorrelated NPPLS-DA (UNPPLS-DA) method. The computational complexities of the two methods are discussed in section III (C). Extensive experimental results on six face databases are presented in section IV. Finally, we present some concluding remarks in section V.

II. PRELIMINARIES
In appearance-based face recognition, image data are represented as vectors, i.e., as points in a high-dimensional vector space. For example, a $p \times q$ 2D image can be mapped to a vector $x \in \mathbb{R}^{pq}$ by concatenating rows or columns of the image. Despite this high-dimensional embedding, it is often found that the data in fact lie on a low-dimensional manifold. The primary goal of subspace analysis is to identify, represent and parameterize this manifold in accordance with some optimality criteria.
Let $X = [x_1, x_2, \ldots, x_n]^T$ represent the $n \times m$ data matrix where each $x_i$ is a face vector of dimension $m$, concatenated from a $p \times q$ face image such that $m = p \times q$. Here, $m$ represents the total number of pixels in the face image, and $n$ is the number of different face images in the training set.
To take advantage of class information, the training set may be expanded to contain multiple images of an individual, providing examples of how a person's image may change due to variations in lighting conditions, facial expressions and small changes of orientation. Suppose the training set contains face vectors from $C$ individuals; then the expanded training set is represented by $X = [X^{(1)}, X^{(2)}, \ldots, X^{(C)}]$ with $X^{(c)} = [x^{(c)}_1, x^{(c)}_2, \ldots, x^{(c)}_{n_c}]$, $c = 1, 2, \ldots, C$, such that sample $x^{(c)}_i$ is the $i$th face vector belonging to the $c$th individual (class), and $n_c$ denotes the number of samples in the $c$th class such that $\sum_{c=1}^{C} n_c = n$. In what follows, we give a brief review of the PLS-DA and LLE techniques applied to appearance-based face recognition.
A. PARTIAL LEAST SQUARES DISCRIMINANT ANALYSIS (PLS-DA)
PLS-DA is derived from partial least squares, which models the linear relationship between two sets of observed variables by means of components (also called latent variables). Suppose the two (mean-centred) data sets are $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$ and $Y = [y_1, y_2, \ldots, y_n]^T \in \mathbb{R}^{n \times C}$, where the rows correspond to observations and the columns correspond to variables. The main idea behind PLS is to find a projection vector pair $(w, v)$ that maximizes the covariance of the projected data. Mathematically, this is represented by the constrained optimization problem of the form
$$\max_{w, v} \; \mathrm{cov}(Xw, Yv) \quad \text{subject to} \quad w^T w = v^T v = 1. \tag{2}$$
In matrix form, (2) can be written as
$$\max_{w, v} \; w^T X^T Y v \quad \text{subject to} \quad w^T w = v^T v = 1. \tag{3}$$
The Lagrangian associated with (3) is
$$L(w, v, \lambda_X, \lambda_Y) = w^T X^T Y v - \lambda_X (w^T w - 1) - \lambda_Y (v^T v - 1).$$
Differentiating the Lagrangian and setting the derivatives to zero gives
$$X^T Y v = 2\lambda_X w, \tag{4}$$
$$Y^T X w = 2\lambda_Y v, \tag{5}$$
which can be solved simultaneously to find $w$ and $v$. Further, from (4) and (5), and using the constraints, we find that $2\lambda_X = 2\lambda_Y = w^T X^T Y v$. The PLS component that is often sought for dimension reduction is the vector $w$. By letting $\lambda = 2\lambda_X$ and writing $v = Y^T X w / \lambda$, (4) and (5) can be combined into a single eigenvalue problem in $w$ of the form
$$X^T Y Y^T X w = \lambda^2 w. \tag{6}$$
Alternatively, one can also cast problem (6) as a subspace optimization problem in the dimensionally reduced projection matrix $\hat{W}$, that is,
$$\max_{W} \; \mathrm{tr}(W^T X^T Y Y^T X W) \quad \text{subject to} \quad W^T W = I, \tag{7}$$
where the columns of $\hat{W}$ are the $d$ eigenvectors of $X^T Y Y^T X$ associated with the $d$ largest eigenvalues.
For appearance-based face recognition, since the purpose of analysis is discrimination, a dummy matrix representing class membership is used as the response matrix $Y$ and takes the form
$$Y = \begin{bmatrix} 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n_C} & 0_{n_C} & \cdots & 1_{n_C} \end{bmatrix}, \tag{8}$$
where $0_{n_i}$ and $1_{n_i}$ are $n_i \times 1$ vectors of zeros and ones, respectively. For example, if the data set contains two classes, then the matrix $Y$ is designed as a single-column vector with entries of 1 for all samples in the first class and 0 for samples in the second class, i.e.,
$$Y = \begin{bmatrix} 1_{n_1} \\ 0_{n_2} \end{bmatrix}.$$
Further, if the data have three classes, then the $Y$ matrix is encoded with three columns as follows:
$$Y = \begin{bmatrix} 1_{n_1} & 0_{n_1} & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & 0_{n_2} \\ 0_{n_3} & 0_{n_3} & 1_{n_3} \end{bmatrix}.$$
One may choose to centre the class membership matrix $Y$ in (8) to have zero mean. It has been shown in [17] that the eigenvalue problem in (6) is a slightly altered eigenvalue problem associated with the between-class separation matrix in LDA. Therefore, PLS-DA maximizes the class separation in the lower-dimensional subspace. There also exists a close relationship between PLS-DA and PCA; both methods maximize the variance of the samples. The main difference between the two methods is that PLS-DA computes weight vectors in a supervised way, i.e., with full awareness of the class labels, while PCA is an unsupervised method. The PLS-DA method is sometimes even referred to as a supervised PCA.
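To make the construction concrete, the following is a minimal Python/NumPy sketch of the dummy coding in Eq. (8) and the eigenvector extraction in Eqs. (6)-(7). The function names are ours, X is assumed mean-centred, and labels are assumed to be integers in {0, ..., C-1}; this is an illustration of the formulas above, not the authors' code.

```python
import numpy as np

def class_membership_matrix(labels, n_classes):
    # Dummy response matrix Y of Eq. (8): Y[i, c] = 1 iff sample i belongs to class c
    labels = np.asarray(labels)
    Y = np.zeros((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def plsda_components(X, Y, d):
    # Top-d eigenvectors of X^T Y Y^T X (Eqs. (6)-(7)) form the projection matrix W
    M = X.T @ Y @ Y.T @ X              # m x m symmetric positive semidefinite matrix
    evals, evecs = np.linalg.eigh(M)   # eigenvalues returned in ascending order
    return evecs[:, ::-1][:, :d]       # keep the d leading eigenvectors
```

Projecting the data as Z = X @ W then yields the PLS-DA features used for dimension reduction.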

B. LOCALLY LINEAR EMBEDDING (LLE)
Here, we assume that face vectors lie on a nonlinear manifold $\mathcal{M}$ embedded in the high-dimensional face space $\mathbb{R}^m$. The original LLE algorithm described in [35] is an unsupervised version that assumes class information is unavailable. Each data point is assumed to reside on a locally linear patch of the manifold and can be reconstructed by a linear combination of its $k$ nearest neighbours. To make use of class information, we interpret 'neighbours' to mean data from the same class, i.e., each data point $x^{(c)}_i$ from class $c$ is reconstructed by $\sum_{j=1}^{n_c} S^{(c)}_{ij} x^{(c)}_j$. The linear coefficients (weights) $S^{(c)}_{ij}$ used in the reconstruction are constrained by enforcing $\sum_{j=1}^{n_c} S^{(c)}_{ij} = 1$. The constraint ensures that the weights are invariant under rotations, rescalings and translations. As such, the reconstruction weights carry information about the geometry of the local patch of the manifold on which the face vectors from class $c$ reside. The total reconstruction error is the sum of the reconstruction errors over all the classes:
$$J_1(S) = \sum_{c=1}^{C} \sum_{i=1}^{n_c} \left\| x^{(c)}_i - \sum_{j=1}^{n_c} S^{(c)}_{ij} x^{(c)}_j \right\|^2. \tag{9}$$
Minimizing $J_1$ subject to the constraint $\sum_{j=1}^{n_c} S^{(c)}_{ij} = 1$ yields the optimal weights $S^{(c)}_{ij}$ in the same way as in [35].
To preserve the local geometric structure in the low-dimensional embedding, the same reconstruction is preserved for each point. In particular, if $z^{(c)}_i$ is the low-dimensional embedding of $x^{(c)}_i$, $i = 1, \ldots, n_c$, $c = 1, 2, \ldots, C$, we minimize the total reconstruction error in the low-dimensional space, i.e.,
$$J_2(Z) = \sum_{c=1}^{C} \sum_{i=1}^{n_c} \left\| z^{(c)}_i - \sum_{j=1}^{n_c} S^{(c)}_{ij} z^{(c)}_j \right\|^2. \tag{10}$$
With the assumption that the low-dimensional embedding is a subspace of the linear $m$-dimensional face space associated with $X$, we may write $z_i = W^T x_i$, where $W \in \mathbb{R}^{m \times d}$ is the projection matrix to be determined. As a consequence, (10) becomes
$$J_2(W) = \mathrm{tr}\left( W^T X^T (I - S)^T (I - S) X W \right), \tag{11}$$
where
$$S = \mathrm{diag}\left( S^{(1)}, S^{(2)}, \ldots, S^{(C)} \right) \tag{12}$$
is the $n \times n$ block-diagonal matrix assembled from the class-wise weight matrices. Minimization of (11) results in an $m \times m$ eigenvalue problem in which the columns of $W$ are the bottom $d$ eigenvectors of the matrix $X^T (I - S)^T (I - S) X$, i.e., those associated with its $d$ smallest eigenvalues.
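As an illustration, the following is a minimal sketch of how the class-wise reconstruction weights of Eq. (9) can be computed, following the closed-form local Gram-matrix solve of standard LLE [35]. The regularization term reg is our own assumption for handling singular Gram matrices; it is not specified in the text.

```python
import numpy as np

def lle_weights_for_class(Xc, reg=1e-3):
    # Weights S^(c) reconstructing each sample of class c from the other samples
    # of the same class, with each row summing to one (cf. [35])
    n_c = Xc.shape[0]
    S = np.zeros((n_c, n_c))
    for i in range(n_c):
        idx = [j for j in range(n_c) if j != i]
        D = Xc[idx] - Xc[i]                        # neighbours shifted to the origin
        G = D @ D.T                                # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(idx))  # regularize if G is singular
        w = np.linalg.solve(G, np.ones(len(idx)))
        S[i, idx] = w / w.sum()                    # enforce the sum-to-one constraint
    return S
```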

III. GLOBAL-LOCAL PLS-DA
The core idea of PLS-DA is to optimize the covariance between extracted features (latent components) and the response (class membership) matrix. Several different modified models for optimizing the projection matrix $W$ in Euclidean space have been summarized in [4], and the more recent nonlinear extensions can be found in [21]-[24], [27]. However, none of these algorithms specifically addresses local manifold structures of data. Since face data may reside on a nonlinear submanifold [38], for the PLS-DA method to be more effective for face recognition, local manifold learning is needed. From the perspective of classification, an effective dimensionality reduction method should be able to maximize class separation and preserve (if not minimize) within-class distances of data points in lower-dimensional subspaces. It is established in [17], [51] that PLS-DA can only effectively maximize between-class separability, where the between-class scatter matrix is given by
$$S_b = X^T Y Y^T X, \tag{13}$$
and $S_b$ is a slightly altered version of the between-class scatter matrix associated with the Fisher criterion [17]. It is also observed in section II-B that information on within-class structures can be captured through the local linear models in the LLE construction, in which the optimum construction is associated with the minor eigenspace of the matrix
$$S_w = X^T (I - S)^T (I - S) X. \tag{14}$$
The between-class scatter matrix $S_b$ captures the global class structure of the data, while the matrix $S_w$ captures the local within-class structure. To combine global and local modelling of the data within the existing PLS-DA framework, we insist that the projection matrix $W \in \mathbb{R}^{m \times d}$ that achieves the low-dimensional embedding in the linear $m$-dimensional face space associated with $X$ is an optimizer of the multi-objective optimization problem
$$\max_W \; \mathrm{tr}(W^T S_b W), \qquad \min_W \; \mathrm{tr}(W^T S_w W). \tag{15}$$
As such, the optimizing matrix $W$ maximizes between-class separation and, at the same time, captures and preserves the local structure of the data set.
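A sketch of how the two scatter matrices might be assembled in practice is given below. It reuses the hypothetical helpers from the earlier sketches, takes $S_b = X^T Y Y^T X$ as in the PLS-DA objective (6), and assumes the rows of X are grouped by class so that S in Eq. (12) is block diagonal.

```python
import numpy as np
from scipy.linalg import block_diag

def global_local_scatter(X, labels, n_classes):
    # Global class structure, Eq. (13)
    Y = class_membership_matrix(labels, n_classes)
    Sb = X.T @ Y @ Y.T @ X
    # Class-wise LLE weights assembled into the block-diagonal S of Eq. (12);
    # assumes the rows of X are ordered class by class
    labels = np.asarray(labels)
    blocks = [lle_weights_for_class(X[labels == c]) for c in range(n_classes)]
    S = block_diag(*blocks)
    # Local within-class structure, Eq. (14)
    M = np.eye(X.shape[0]) - S
    Sw = X.T @ M.T @ M @ X
    return Sb, Sw
```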
In the following subsections, we describe two global-local PLS-DA methods that are based on two different strategies in solving (15).

A. NEIGHBOURHOOD-PRESERVING PLS-DA
Our first global-local PLS-DA method is the neighbourhood-preserving PLS-DA (NPPLS-DA) method, obtained by rewriting the multi-objective optimization problem (15) as a trace ratio optimization problem of the form
$$\max_{W^T W = I} \; \frac{\mathrm{tr}(W^T S_b W)}{\mathrm{tr}(W^T S_w W)}. \tag{16}$$
This problem can be solved using the iterative trace ratio optimization procedure (see [55] or [56]), where the optimum solution $W^*$ is the solution to the eigenvalue problem
$$(S_b - \delta^* S_w) W^* = W^* \Lambda, \tag{17}$$
with $\delta^* = \mathrm{tr}(W^{*T} S_b W^*) / \mathrm{tr}(W^{*T} S_w W^*)$. Without loss of generality, we let the diagonal entries of $\Lambda \in \mathbb{R}^{d \times d}$ be arranged in descending order such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ and the columns of $W^*$ be denoted by the vectors $w_1, w_2, \ldots, w_d$. The matrix $W^*$ defines the optimal projection matrix. In particular, if $z_i \in \mathbb{R}^d$ is the low-dimensional embedding of the face vector $x^{(c)}_i$, then we write
$$z_i = W^{*T} x^{(c)}_i. \tag{18}$$
These low-dimensional embeddings are the global-local PLS-DA features obtained using NPPLS-DA. It is acknowledged in [56] that the solution to (16) may not exist when the matrix $S_w$ is not positive definite. In face recognition, this can occur when the number of features is larger than the number of samples. To ensure positive definiteness of $S_w$, we use the idea of regularization by adding a constant shift to the diagonal elements of $S_w$, so that $S_w$ becomes $\tilde{S}_w = S_w + \gamma I$ for some $\gamma > 0$. The regularized matrix $\tilde{S}_w$ is better conditioned than $S_w$ and leads to more viable numerical schemes. The optimization problem in (16) is then modified to the regularized trace ratio optimization problem
$$\max_{W^T W = I} \; \frac{\mathrm{tr}(W^T S_b W)}{\mathrm{tr}(W^T (S_w + \gamma I) W)}. \tag{19}$$
Diagonal shifts of $S_w$ result in diagonal shifts of the eigenvalue problem in (17). It is well known that eigenspaces are invariant under diagonal shifts; therefore, the solution to (19) is also the solution to (16). The algorithm for the NPPLS-DA method based on iterative trace ratio optimization is given in Algorithm 1.
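The following is a minimal sketch of the iterative trace ratio procedure for the regularized problem (19), based on the generic scheme of [55], [56] rather than the authors' Appendix A; the random starting point, iteration cap and tolerance are our assumptions.

```python
import numpy as np

def trace_ratio(Sb, Sw, d, gamma=1e-3, n_iter=50, tol=1e-8):
    # Iterative trace ratio: alternately update the ratio delta and the top-d
    # eigenvectors of (Sb - delta * Sw_reg) until delta converges
    m = Sb.shape[0]
    Sw_reg = Sw + gamma * np.eye(m)            # regularization of Eq. (19)
    W = np.linalg.qr(np.random.randn(m, d))[0] # random orthonormal start
    delta = 0.0
    for _ in range(n_iter):
        evals, evecs = np.linalg.eigh(Sb - delta * Sw_reg)
        W = evecs[:, ::-1][:, :d]              # top-d eigenvectors, cf. Eq. (17)
        new_delta = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw_reg @ W)
        if abs(new_delta - delta) < tol:
            break
        delta = new_delta
    return W
```

The convergence of this scheme is reported to be fast in [55], which matches the small per-iteration cost discussed in section III (C).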
B. UNCORRELATED NEIGHBOURHOOD-PRESERVING PLS-DA (UNPPLS-DA)
The statistical uncorrelation of discriminant vectors has been shown to be a favourable property in pattern classification [61]-[63]. Statistically uncorrelated features contain minimum redundancy and can lead to better performance of subspace learning methods.
Suppose that $z_i$ and $z_j$ are two different components of the extracted feature $z = xW$. The covariance between $z_i$ and $z_j$ is
$$\mathrm{Cov}(z_i, z_j) = w_i^T X^T X w_j = w_i^T S_t w_j, \tag{20}$$
where $S_t = X^T X$ is the total scatter matrix. Hence, the correlation coefficient between $z_i$ and $z_j$ is
$$\mathrm{Cor}(z_i, z_j) = \frac{w_i^T S_t w_j}{\sqrt{(w_i^T S_t w_i)(w_j^T S_t w_j)}}. \tag{21}$$
We can see from (20) and (21) that $\mathrm{Cor}(z_i, z_j) = 0$ when $w_i^T S_t w_j = 0$. This means that the two features $z_i$ and $z_j$ are mutually uncorrelated when $w_i^T S_t w_j = 0$. To simplify computation, the vector $w_i$ is normalized to satisfy
$$w_i^T S_t w_i = 1. \tag{22}$$
Based on the analysis above, the desired uncorrelated feature vectors can be obtained when
$$w_i^T S_t w_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} \tag{23}$$
and criterion (23) can be summarized as
$$W^T S_t W = I. \tag{24}$$
The optimization problem (15), combined with criterion (24), results in a constrained optimization problem similar to (16), namely
$$\max_{W^T S_t W = I} \; \frac{\mathrm{tr}(W^T S_b W)}{\mathrm{tr}(W^T S_w W)}. \tag{25}$$
The iterative trace ratio optimization procedures in [55] and [56] can be extended to a more general constraint of the form $W^T C W = I$, where $C$ is any positive semidefinite matrix. To see this, first let $C = S_t$. The Lagrangian associated with (25) is
$$L(W, \Lambda) = \frac{\phi_b(W)}{\phi_w(W)} - \mathrm{tr}\left( \Lambda^T (W^T S_t W - I) \right), \tag{26}$$
where $\Lambda \in \mathbb{R}^{d \times d}$ is the Lagrange multiplier matrix, and we define $\phi_b(W) = \mathrm{tr}(W^T S_b W)$ and $\phi_w(W) = \mathrm{tr}(W^T S_w W)$. The optimum solution $(W^*, \Lambda^*)$ satisfies $\partial L(W^*, \Lambda^*)/\partial W = 0$, that is,
$$(S_b - \delta^* S_w) W^* = \frac{\phi_w(W^*)}{2} \, S_t W^* (\Lambda^{*T} + \Lambda^*), \tag{27}$$
where $\delta^* = \phi_b(W^*)/\phi_w(W^*)$. Let $Q \in \mathbb{R}^{d \times d}$ be an orthogonal matrix that diagonalizes $\Lambda^{*T} + \Lambda^*$. Then, (27) can be transformed to
$$(S_b - \delta^* S_w) V^* = S_t V^* \Sigma^*, \tag{28}$$
where $V^* = W^* Q$ and $\Sigma^* = \frac{\phi_w(W^*)}{2} Q^T (\Lambda^{*T} + \Lambda^*) Q$. From (28), we may now interpret $\Sigma^*$ as a diagonal matrix containing the generalized eigenvalues of the pair $(S_b - \delta^* S_w, S_t)$, and $V^*$ as a matrix whose columns are the corresponding generalized eigenvectors. Since the trace function is invariant under orthogonal transformation, $V^*$ is also an optimizer of (25).
Let the columns of $V^*$ be $v_1, v_2, \ldots, v_d$, ordered according to the generalized eigenvalues $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_d$. The matrix $V^*$ is the optimum projection matrix, and the uncorrelated global-local PLS-DA feature vectors are extracted as
$$z_i = V^{*T} x^{(c)}_i.$$
If, however, $S_t$ is degenerate, as it usually is in face recognition problems, we can use the same regularization technique as in section III-A to solve this problem. The complete algorithm for the uncorrelated NPPLS-DA is given in Algorithm 2.
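A sketch of the corresponding constrained step is shown below: at each trace ratio iteration, the ordinary eigenvalue problem is replaced by the generalized one in (28). SciPy's eigh for matrix pairs returns eigenvectors normalized so that $V^T S_t V = I$, which is exactly the uncorrelated constraint (24); the regularization of $S_t$ mirrors section III-A, and the loop parameters are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def unpplsda(Sb, Sw, St, d, gamma=1e-3, n_iter=50, tol=1e-8):
    m = Sb.shape[0]
    St_reg = St + gamma * np.eye(m)            # regularize a degenerate S_t
    V = np.linalg.qr(np.random.randn(m, d))[0]
    delta = 0.0
    for _ in range(n_iter):
        # Generalized eigenproblem (Sb - delta*Sw) v = sigma * St v of Eq. (28);
        # eigh normalizes eigenvectors so that V^T St_reg V = I
        evals, evecs = eigh(Sb - delta * Sw, St_reg)
        V = evecs[:, ::-1][:, :d]              # top-d generalized eigenvectors
        new_delta = np.trace(V.T @ Sb @ V) / np.trace(V.T @ Sw @ V)
        if abs(new_delta - delta) < tol:
            break
        delta = new_delta
    return V                                   # columns satisfy the constraint (24)
```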

C. DISCUSSION
The NPPLS-DA and UNPPLS-DA methods are similar to LDA and its variants (for example, [10]) in that the optimal projection to low-dimensional embeddings is obtained by modelling between-class and within-class scatter subject to constraints. However, the modelling of between-class and within-class scatter in the global-local PLS-DA methods is quite different from that in LDA. As mentioned earlier, the between-class scatter matrix (13) in NPPLS-DA (and UNPPLS-DA) differs from the between-class scatter matrix associated with the Fisher criterion used in LDA, and the statistical differences have been shown in [17], [51]. Furthermore, the matrix $S_w$ in (14), which is akin to the within-class scatter matrix in LDA, is constructed based on the assumption that local (within-class) structure is represented by a nonlinear manifold embedded in the higher-dimensional face space. In LDA, by contrast, the within-class scatter matrix is formed by a pooled within-class sum of squares and cross-products matrix, irrespective of local class structures. In NPPLS-DA (and UNPPLS-DA), local class structures are represented by the matrix $S$, and the sum of within-class distances after projection is minimized while preserving the structural information embedded in $S$. The different ways in which between-class and within-class scatter are interpreted lead to the extraction of different features. In the experimental section, the differences between our methods and the LDA-related method in [10] are further highlighted.
The computational cost of the NPPLS-DA and UNPPLS-DA algorithms differs from that of other dimension reduction algorithms mainly in two steps: 1) computing the reconstruction weights, and 2) computing the lower-dimensional embedding. Computing the optimal reconstruction weights is done exactly as in the LLE algorithm with a computational complexity of $O(mnk^3)$, where $k$ is the number of nearest neighbours [58], [59]. The most expensive step in the computation of the lower-dimensional embedding via the iterative trace ratio optimization procedure is solving the $d$-dimensional eigenvalue problem iteratively. In general, this requires $O(d^3)$ operations per iteration [60], and it has been observed in [55] that the convergence is fast.

IV. EXPERIMENTAL RESULTS
In this section, we use six benchmark face databases, including the Yale face database [38], ORL face database [64], AR face database [65], Extended Yale B face database [66], CMU-PIE face database [67] and Essex face database [68], to evaluate the effectiveness of the proposed methods for face representation and recognition. The experiments are designed to compare the performance of the proposed NPPLS-DA and UNPPLS-DA methods with existing methods, including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), neighbourhood-preserving embedding (NPE) (a linear version of LLE) and the more recent robust sparse linear discriminant analysis (RSLDA) [10]. All experiments are implemented in MATLAB 2019a on a PC (64-bit operating system) with an Intel(R) Core(TM) i7-7700HQ CPU (2.80 GHz) and 8 GB memory.

A. FACE REPRESENTATION
Face representation is a necessary step in face recognition: face images must be represented before recognition can occur. Face representations are tuned to capture the essential properties of face images. Such properties give rise to basis functions (face bases) that are crucial for face representation. The basis functions correspond to the coordinate axes (dimensions) that define the face subspace. Once the face bases are extracted, the face images to be represented are projected along the axes that define the basis to generate the features needed for representation. Specifically, the learned face subspace is spanned by the face bases; therefore, any face image in the face subspace can be represented as a linear combination of the face bases. The face representations are then used for recognition, subject to further transformations. The objective of face representation is twofold: 1) the face basis should be complete and low in dimension; 2) the face basis should capture most of the energy or face content, allowing efficient derivation of suitable face representations. It is also instructive to visualize a face basis, which can be displayed as a sort of feature image. Using the Yale face database, we present the first six face bases of the different algorithms in Fig. 1. It is interesting to note that the NPPLS-DA and UNPPLS-DA bases are somewhat similar.
Algorithm 1 NPPLS-DA
Input:
1. Training set with class labels $x^{(c)}_i$, $i = 1, \ldots, n_c$, $c = 1, \ldots, C$
Output:
1. The optimal transformation matrix $W^* \in \mathbb{R}^{m \times d}$
2. The $d$-dimensional embedding coordinates $Z$ for the original input data $X$
Step 1: Construct the data matrix $X$.
Step 2: Construct the class membership matrix $Y$ based on Eq. (8).
Step 3: Compute $S_b$ in Eq. (13).
Step 4: Compute the reconstruction weights $S^{(c)}_{ij}$ for each data point $x^{(c)}_i$ by solving the minimization problem in Eq. (9).
Step 5: Construct the matrix $S_w = X^T (I - S)^T (I - S) X$, where $S$ is given in Eq. (12).
Step 6: Form the regularized matrix $\tilde{S}_w = S_w + \gamma I$.
Step 7: Compute $W^*$ using the iterative trace ratio optimization procedure in Appendix A (with $S_w \to \tilde{S}_w$).
Step 8: Compute the $d$-dimensional embedding $Z = X W^*$.

Algorithm 2 UNPPLS-DA
Input:
1. Training set with class labels $x^{(c)}_i$, $i = 1, \ldots, n_c$, $c = 1, \ldots, C$
Output:
1. The optimal transformation matrix $V^* \in \mathbb{R}^{m \times d}$
2. The $d$-dimensional embedding coordinates $Z$ for the original input data $X$
Step 1: Construct the data matrix $X$.
Step 2: Construct the class membership matrix $Y$ based on Eq. (8).
Step 3: Construct the total scatter matrix $S_t = X^T X$.
Step 4: Compute $S_b$ in Eq. (13).
Step 5: Compute the reconstruction weights $S^{(c)}_{ij}$ by solving the minimization problem in Eq. (9).
Step 6: Construct the matrix $S_w = X^T (I - S)^T (I - S) X$, where $S$ is given in Eq. (12).
Step 7: Compute $V^*$ using the iterative trace ratio optimization procedure in Appendix B.
Step 8: Compute the $d$-dimensional embedding $Z = X V^*$.
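For illustration, the steps of Algorithms 1 and 2 can be wired together using the hypothetical helper functions from the earlier sketches; this is our reading of the two algorithms, not the authors' implementation. X is assumed mean-centred with rows grouped by class.

```python
import numpy as np

def npplsda_fit(X, labels, n_classes, d, gamma=1e-3):
    Sb, Sw = global_local_scatter(X, labels, n_classes)  # Steps 2-6 of Algorithm 1
    W = trace_ratio(Sb, Sw, d, gamma=gamma)              # Step 7 (Appendix A)
    return W, X @ W                                      # Step 8: Z = X W*

def unpplsda_fit(X, labels, n_classes, d, gamma=1e-3):
    Sb, Sw = global_local_scatter(X, labels, n_classes)
    St = X.T @ X                                         # Step 3 of Algorithm 2
    V = unpplsda(Sb, Sw, St, d, gamma=gamma)             # Step 7 (Appendix B)
    return V, X @ V                                      # Step 8: Z = X V*
```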

B. FACE RECOGNITION
In the face recognition experiments, we randomly select images from each individual to form the training set that is used to learn a transformation matrix. Using the obtained transformation matrix, we map the remaining images (the test set) to a lower-dimensional face subspace. The nearest neighbour classifier with the Euclidean distance measure is then used to classify the test images in the lower-dimensional face subspace. We repeat this procedure 10 times and report the mean classification accuracy of each method.
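A sketch of this evaluation protocol is given below. The original experiments are in MATLAB; this Python version with scikit-learn's 1-NN classifier is our stand-in and assumes the projection matrix W has already been learned from the training split.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def recognition_rate(X, labels, W, train_idx, test_idx):
    # Project the training and test images to the learned face subspace
    Z_train, Z_test = X[train_idx] @ W, X[test_idx] @ W
    # Nearest neighbour classifier under the Euclidean metric
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(Z_train, labels[train_idx])
    return clf.score(Z_test, labels[test_idx])  # mean accuracy on the test set
```

Averaging this rate over 10 random train/test splits reproduces the reporting scheme described above.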
The recognition process can be structured into three phases. First, the face bases are calculated from the training set of face images. The test set of face images is then projected into the face subspace spanned by the face bases. Last, the test images are identified in the face subspace using a nearest neighbour classifier. In our preliminary experiments, we observe that for δ > 0, the UNPPLS-DA method extracts at most C face bases, whereas for δ < 0, it can extract more than C face bases. Since all the databases used in our experiments have a substantially large number of individuals (C), we set δ > 0 in all our experiments and tune it to achieve the best performance for the UNPPLS-DA method.

1) EXPERIMENTS ON THE YALE FACE DATABASE
The best average recognition rates together with their corresponding standard deviations and reduced dimensionality (in brackets) obtained by the different methods are shown in Table 1. The UNPPLS-DA method obtained the best average recognition rate for each p. For example, the best average recognition rates obtained by the Baseline, PCA, PLS-DA, RSLDA, NPE and NPPLS-DA when p = 8 are 52%, 56%, 64.44%, 70.22%, 81.33% and 82.00%, respectively, while the best average recognition rate obtained by the UNPPLS-DA method for the same p is 83.33%. Fig. 3 shows the plots of the average recognition rates versus reduced dimensionality. It can be seen that the recognition rates of all the methods increase with the number of reduced dimensions. However, in all cases, NPPLS-DA and UNPPLS-DA significantly outperform the ordinary PLS-DA method as well as PCA and RSLDA. For this database, NPPLS-DA and UNPPLS-DA show only a slight improvement compared to NPE.

2) EXPERIMENTS ON THE ORL FACE DATABASE
Fig. 4 shows some sample images from the ORL database.
A random subset with p (= 2, 4, 6, 8) images per individual was taken with labels to form the training set, and the rest of the database was considered the testing set. For each given p, we average the results over 10 random splits. The experimental design is the same as before. The recognition results are shown in Table 2. The best average recognition rates for RSLDA, NPPLS-DA and UNPPLS-DA when p = 2 are 73.06%, 84.50% and 81.53%, respectively, while the best recognition rates for RSLDA, NPPLS-DA and UNPPLS-DA when p = 8 are 95.25%, 98.50% and 98.87%, respectively. These results show that our proposed approaches have the potential not only to improve the discriminant power of PLS-DA but also to provide better discriminant power than RSLDA. The findings also indicate that both NPPLS-DA and UNPPLS-DA are capable of learning from examples, i.e., both methods take advantage of a large training sample size. For this database, the performance of NPPLS-DA and UNPPLS-DA is comparable to NPE.

3) EXPERIMENTS ON THE EXTENDED YALE B FACE DATABASE
The Extended Yale B database contains 2414 front-view face images of 38 individuals. Each person provides 64 face images under different illumination conditions. In this experiment, we use the cropped face images with a resolution of 32 × 32 pixels. Each face image is represented by a 1024-dimensional vector in the face space, and the pixel values (features) are scaled to [0, 1]. Fig. 6 shows some sample images from the Extended Yale B database.
For each individual, we randomly select p (= 5, 10, 20, 30) images to form the training set, and the remaining images are treated as the test set. For each given p, we average the result over 10 random splits. Table 3 shows the best average recognition rates, standard deviations and the corresponding reduced dimensionality (in brackets) obtained by the different methods across the 10 different splits of the database. In addition, the plots of the average recognition rates versus reduced dimensionality of the different methods are presented in Fig. 7. The results for this database emphatically highlight the superiority of NPPLS-DA and UNPPLS-DA over the other methods used in the comparison. While the recognition rates for NPPLS-DA and UNPPLS-DA increase with the number of training samples, the performances of RSLDA, PCA and PLS-DA become even worse than the Baseline. Performance below the Baseline indicates that it is better to perform face recognition on the original database without dimensionality reduction than to use RSLDA, PCA or PLS-DA for dimension reduction prior to recognition. The NPE method tends to be quite sensitive to the number of training samples, but overall, for this database, both NPPLS-DA and UNPPLS-DA outperform the NPE method. We can also see from Table 3 and Fig. 7 that the UNPPLS-DA method performs much better than NPPLS-DA when p = 10, 20 and 30. The UNPPLS-DA method appears to be very sensitive to the training sample size (p) on this database: as the size of the training set increases, the average recognition rate of UNPPLS-DA also increases. In contrast, we did not see much improvement in the performance of NPPLS-DA when p = 10, 20 and 30. For example, the best average recognition rate for NPPLS-DA when p = 20 is 83.33%; after increasing the training set size to p = 30, the best average recognition rate of NPPLS-DA drops to 82.39%. Overall, UNPPLS-DA performed better than all other methods on this database, which indicates that statistically uncorrelated features can sometimes encode more discriminant information and enhance performance.

4) EXPERIMENTS ON THE CMU-PIE FACE DATABASE
The CMU-PIE face database contains 41,368 facial images of 68 individuals. The face images of each individual were captured across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. In our experiments, we use a subset of the database; Fig. 8 shows some sample images.
A random subset with p (= 3, 5, 7, 10) images per individual is taken to form the training sets, and the rest of the images are used as test sets. For each p, we average the result over 10 random splits. Table 4 shows the best average recognition rate, standard deviation and corresponding reduced dimensionality (in brackets) for the different methods. As can be seen, NPPLS-DA and UNPPLS-DA achieve higher average recognition rates than all other methods. PCA and PLS-DA perform rather poorly, with recognition rates worse than the Baseline method. The performances of UNPPLS-DA and NPPLS-DA on this database are comparable. It is observed in Fig. 9 that the recognition rates of RSLDA, NPPLS-DA and UNPPLS-DA increase as the dimensionality of the face subspace increases, but the rate of increase for NPPLS-DA and UNPPLS-DA is much higher than that of RSLDA. For each p, both the NPPLS-DA and UNPPLS-DA methods are able to achieve their best performance using fewer than 10 dimensions. This performance behaviour is also observed on the other databases.
This finding shows that both methods are able to capture most of the discriminant information in the data using a small number of dimensions. The recognition rates of both the NPPLS-DA and UNPPLS-DA methods are very stable on this database. The sensitivity of the NPE method to the number of training samples is rather obvious for this database, where for p = 10, its performance dips below the baseline.

5) EXPERIMENTS ON THE AR FACE DATABASE
The AR face database contains over 4000 colour images of 126 individuals (70 men and 56 women). The images were taken in two different sessions (separated by two weeks).  All images were taken using the same camera under different conditions of illumination, facial expressions and occlusions (sun glasses and scarf). Each image is of 768×576 pixels and each pixel is represented by 24 bits of RGB colour values. Fig. 10 shows some sample images from the AR database.
In our experiments, we use a cropped subset of the AR database [65]. Random subsets of images per individual are taken to form the training and test sets. For each p, we average the result over 10 random splits. Table 5 shows the best average recognition rate, standard deviation and corresponding reduced dimensionality (in brackets) of the different methods. Fig. 11 shows the plots of the average recognition rates versus reduced dimensionality for the different methods on the AR face database. With the exception of the NPE method, NPPLS-DA and UNPPLS-DA consistently outperform the other methods. The PCA and PLS-DA methods performed the worst on this database; their recognition rates for each p are not significantly better than the Baseline. The RSLDA method also performs rather poorly on this database. This indicates that discovering the manifold structure of the face space is very important in enhancing the discriminant ability of PLS-DA in face recognition. Moreover, the good performances of NPPLS-DA and UNPPLS-DA on this database demonstrate the robustness of these methods with respect to variations in facial expression and lighting conditions.

6) EXPERIMENTS ON THE ESSEX FACE DATABASE
The Essex face database is a challenging database containing 7900 face images acquired under complicated conditions. The images show variations in facial expression and lighting conditions, and include subjects with and without beards and glasses. We used a subset of the database containing 7780 face images belonging to 389 individuals in our experiments. All images are cropped and resized to 32 × 32 pixels. Each image is represented by a 1024-dimensional vector in the face space, and the pixel values (features) are scaled to [0, 1].
We randomly select p (= 3, 5, 7, 10) images per individual to form the training set, and the rest of the face images are used as the test set. For each p, we average the result over 10 random splits of the data. The recognition rates of the different methods on the Essex face database are shown in Table 6. From this table, one can see that both NPPLS-DA and UNPPLS-DA perform significantly better than the PLS-DA method for every p. This indicates that features extracted using NPPLS-DA and UNPPLS-DA contain more discriminant information than those extracted using PLS-DA. Fig. 12 shows the plots of the average recognition rate versus reduced dimensionality for the different methods. As can be seen, both NPPLS-DA and UNPPLS-DA achieve better overall performance than the other four methods, requiring only the first few dimensions (fewer than 10) to achieve their best performance. On the other hand, the performances of the PCA and PLS-DA methods are again not significantly better than the Baseline for each p. This experiment further reveals the importance of local manifold learning in face recognition.

C. DISCUSSION
We have systematically performed experiments on six face databases. The recognition rates for each method on the six databases are reported in Figs. 3, 5, 7, 9, 11 and 12 and in Tables 1-6. Several observations can be made from these results. Methods with local manifold learning, i.e., NPPLS-DA, UNPPLS-DA and NPE, show significantly better performance than methods without local manifold learning. This result emphasizes the importance of local manifold learning in face recognition: local manifold learning detects the face manifold structure, and as such, these methods can recognize face images with different expressions, poses and lighting conditions. More importantly, both NPPLS-DA and UNPPLS-DA are robust to variations in facial expression, pose and lighting conditions, findings that are markedly evident from the results on the Extended Yale B and CMU-PIE databases. Both NPPLS-DA and UNPPLS-DA also consistently performed better than the Baseline on all six databases, indicating that these methods are capable of reducing the dimension of the problem while preserving most of the information in the data necessary for successful recognition. The superiority of NPPLS-DA and UNPPLS-DA is particularly evident in their ability to achieve the best recognition rate with only a few dimensions (fewer than 10 on average). Finally, in some cases, the UNPPLS-DA method shows a performance improvement over the NPPLS-DA method. The improvement of UNPPLS-DA over NPPLS-DA is most significant on the Extended Yale B database: Table 3 shows that when p = 30, UNPPLS-DA obtains an average recognition rate of 93.26%, which is significantly higher than the 82.39% obtained by NPPLS-DA. This implies that uncorrelated features can sometimes encode more discriminant information that leads to a better (higher) recognition rate.

V. CONCLUSION
In this article, new variants of subspace learning algorithms based on the global and local structure-preserving framework are described and analysed. These methods are modified from the conventional PLS-DA method and are primarily designed to improve the discrimination power of PLS-DA and make it more suitable for appearance-based face recognition. Global-local PLS-DA combines the ability of PLS-DA to maximize between-class separability with preservation of the local manifold structure. The first such method, called neighbourhood-preserving partial least squares discriminant analysis (NPPLS-DA), involves solving a trace ratio optimization problem with a unity constraint. A direct extension of NPPLS-DA is also proposed, called uncorrelated neighbourhood-preserving partial least squares discriminant analysis (UNPPLS-DA). The UNPPLS-DA method is also formulated as a trace ratio optimization similar to NPPLS-DA, but with a statistically uncorrelated constraint designed to ensure that features extracted using UNPPLS-DA contain minimum redundancy. Experimental results on six face databases confirm that the proposed NPPLS-DA and UNPPLS-DA methods consistently outperform PCA, PLS-DA and the robust sparse LDA methods in several different aspects. In comparison with NPE, both NPPLS-DA and UNPPLS-DA are less sensitive to the number of training samples and more robust to variations in facial expression, pose and lighting conditions. It is clear that the global-local PLS-DA features are able to encode more discriminant information in the low-dimensional face subspace. Our results further strengthen the conclusion that modelling and preserving the local manifold structure of the face space is very important and can lead to better performance in appearance-based face recognition.