Robust Manifold Embedding for Face Recognition

Flexible manifold embedding (FME) has been recognized as an effective method for face recognition because it integrates both the class label information of labeled data and the manifold structure information of all data. To achieve good performance, however, FME usually requires sufficient samples to make the manifold smooth, and it is often hard to provide enough samples in practice. In view of facial symmetry, we utilize left/right mirror face images to address the deficiency of samples in manifold embedding. These mirror images can reflect variations in illumination, pose, or both that the original face images cannot provide. We therefore propose a robust manifold embedding (RME) algorithm, which fully uses the class label information and correctly captures the underlying manifold structure. The proposed RME algorithm integrates two complementary characteristics: label fitness and manifold smoothness. Moreover, the original face images and their left/right mirror images are jointly used in the learning of RME, which makes it more robust against variations in both illumination and pose. Extensive experiments on several public face databases demonstrate that the proposed RME algorithm achieves higher recognition accuracy than the compared methods.


I. INTRODUCTION
Dimensionality reduction is an active research topic in high-dimensional image recognition. In past decades, a large number of dimensionality reduction algorithms have been proposed [1]-[7]. Linear discriminant analysis (LDA) and principal component analysis (PCA) are the two most classical of them. LDA is a supervised method. It uses the class label information of the training samples and solves the classification problem by simultaneously maximizing the between-class scatter matrix ($S_B$) and minimizing the within-class scatter matrix ($S_W$) in the expected low-dimensional feature space. By contrast, PCA is an unsupervised method that finds the directions of maximum scatter for optimal reconstruction without using any class label information. Owing to the simplicity and high efficiency of LDA and PCA, many variants of the two algorithms have been proposed [8]-[14]. Because they exploit class label information, supervised methods offer better recognition performance than unsupervised methods when enough labeled training samples are available. In addition, with the development of deep learning in recent years, face recognition methods based on neural networks have attracted more and more attention [15]-[17]. Deep learning has made great breakthroughs in image recognition owing to its strong learning ability, data-driven nature and adaptability. However, it incurs heavy computation, demands high-end hardware, requires complex model design, and offers poor interpretability.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojie Ju.
Structurally, human faces are non-rigid, and face images essentially exhibit a complicated nonlinear structure that traditional linear dimensionality reduction methods cannot reveal well. Owing to its effectiveness in nonlinear dimensionality reduction, manifold learning is widely used to capture the low-dimensional essential structures embedded in high-dimensional spaces [18]-[22]. The most classical manifold learning methods include locally linear embedding (LLE), ISOMAP and Laplacian Eigenmap (LE). Despite their effectiveness, these methods suffer from the so-called out-of-sample problem: they cannot obtain a projection matrix that maps a new sample into the desired low-dimensional embedding space. To solve this issue, researchers have proposed many improved manifold learning methods. He et al. [23] proposed the locality preserving projection (LPP) algorithm to describe the essential face manifold structure. Vural and Guillemot [24] presented a semi-supervised manifold learning method, in which an interpolation function is built to provide an out-of-sample extension for general supervised manifold learning methods. To utilize the local structure information of the data, Lu et al. [25] proposed a novel manifold linear regression framework. From this framework they derived manifold discriminant regression learning (MDRL) and robust manifold discriminant regression learning. Building on findings in sparse coding theory, Raducanu and Dornaika [26] proposed a generalized out-of-sample extension solution. Shi et al. [27] presented a novel supervised multi-manifold learning method that extracts the multi-manifold features of images. In manifold learning algorithms, the local neighborhood size is important, and performance is sensitive to its choice.
To determine the neighborhood size, Zhang et al. [28] presented an adaptive manifold learning framework. Based on graph manifold learning in a high-dimensional feature space, Wang et al. [29] presented an unsupervised feature selection method, in which the importance of features in the original space is computed by means of L1-regularized least squares and spectral regression analysis. Fang et al. [30] proposed locality and similarity preserving embedding (LSPE) to preserve locality, similarity and the sparse reconstruction relationship. To eliminate the effects of occlusions and illumination variations, Wang et al. [31] presented a manifold regularized local sparse representation (MRLSR) method. Zhang et al. [32] proposed a patch alignment framework that unifies spectral-analysis-based dimensionality reduction algorithms, in which discriminative locality alignment (DLA) was used. DLA consists of two major stages: part optimization and whole alignment. Nevertheless, DLA is overly sensitive to parameter values such as the neighborhood size and the target dimensionality. Liu and Jin [33] presented an enhanced discriminative locality alignment (EDLA) algorithm, which simultaneously uses local structure information and class label information, leading to better performance than DLA.
Recently, representation-based classification methods have become a research hotspot in pattern recognition and computer vision. Wright et al. [34] introduced sparse representation into image classification and presented the sparse representation based classification (SRC) algorithm. In SRC, a given test sample is first represented as a linear combination of all the training samples, and the class with the minimum reconstruction error is then chosen as the final decision. Zhang et al. [35] asked whether collaborative representation or L1-norm sparsity is responsible for the classification capability of SRC. They analyzed the mechanism of SRC and concluded that collaborative representation is what contributes to its performance. Based on this observation, they developed a collaborative representation based classification (CRC) method for image classification. Observed data contain many features, and it is well known that different features contribute differently to pattern representation and classification. Based on this fact, Yang et al. [36] proposed a relaxed collaborative representation (RCR) method. Xu et al. [37] presented a two-phase test sample sparse representation (TPTSSR) method, which makes a coarse-to-fine classification decision for face samples. Assuming that the samples from one object lie in a linear subspace, a linear regression classification (LRC) algorithm was proposed in [38]. To effectively use both the label information and the manifold structure of the observed data, Nie et al. [39] presented a unified manifold learning framework called flexible manifold embedding (FME). FME employs a linear regression function to map a new sample to the desired feature space.
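The "represent, then compare class-wise residuals" idea behind SRC and CRC can be made concrete. The following is a minimal NumPy sketch of CRC in its regularized least-squares form, as we understand the scheme in [35]. It assumes unit-norm samples, the coding vector $\rho = (X^T X + \lambda I)^{-1} X^T y$, and a class-wise residual normalized by the norm of that class's coefficients. The function name crc_classify and the default lam are illustrative choices, not taken from this paper.

```python
import numpy as np


def crc_classify(X, labels, y, lam=1e-3):
    """CRC with regularized least squares (a sketch of the scheme in [35]).

    X      : (m, n) training matrix, one sample per column
    labels : (n,) class label of each column
    y      : (m,) test sample
    lam    : ridge parameter (illustrative default)
    """
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
    y = y / np.linalg.norm(y)
    n = X.shape[1]
    # Collaborative coding over all classes: rho = (X^T X + lam I)^{-1} X^T y.
    rho = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best_label, best_score = None, np.inf
    for c in np.unique(labels):
        idx = labels == c
        # Class-wise residual, normalized by the norm of that class's coefficients.
        residual = np.linalg.norm(y - X[:, idx] @ rho[idx])
        score = residual / max(np.linalg.norm(rho[idx]), 1e-12)
        if score < best_score:
            best_label, best_score = c, score
    return best_label
```

The matrix $X^T X + \lambda I$ depends only on the training set, so its factorization can be reused for every test sample. This is the main efficiency argument for CRC over L1-based SRC.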
In practical applications, the same face shows considerable changes in pose, illumination and expression. As a result, the image variations of one person caused by pose, illumination and expression are almost always larger than the variations caused by a change in face identity [40]. To extract features that yield good performance, one should prepare enough training samples covering different poses, illuminations and expressions. In practice, however, it is difficult or burdensome to collect sufficient facial images of each object with different poses, illuminations and expressions as training samples. Fortunately, both the facial structure and the facial expression are symmetrical [41]. This has motivated many studies that exploit symmetry to enhance the diversity of training samples. For example, Xu et al. [42] proposed first generating a set of new samples based on the symmetry of the face. Both the original samples and the newly generated symmetrical training samples are then used to perform face classification.
Inspired by the prior work in [43] and the symmetry of human faces [44], we propose a novel robust manifold embedding (RME) algorithm. The key idea of RME is to take both the manifold information and the symmetry information of facial images fully into consideration to boost recognition performance. The advantages of RME are two-fold. Firstly, because it uses the mirror images of the original images, RME achieves better recognition performance even when there are few training samples. Secondly, it integrates two complementary characteristics: label fitness and manifold smoothness.
The rest of this paper is organized as follows. Section II reviews related work. Section III presents the proposed RME algorithm. Experimental results are presented in Section IV. Finally, Section V concludes the paper.

II. BRIEF REVIEW OF RELATED WORKS
Suppose that there are $n$ training samples from $c$ classes and that each class has $n_1$ training samples. In each class, the first $n_2$ samples are labeled and the rest are unlabeled. Let $X = [x_1, x_2, \dots, x_u, x_{u+1}, \dots, x_n] \in \mathbb{R}^{m\times n}$ be the training set. Here $m$ is the dimensionality of the samples, $u = c \times n_2$ is the number of labeled samples, and $n = c \times n_1$ is the total number of training samples. Each labeled sample $x_i$ belongs to class $y_i \in \{1, 2, \dots, c\}$. We denote the class labels of all samples by a binary label matrix $Y \in \mathbb{R}^{n\times c}$, whose element $Y_{ij}$ in the $i$th row and $j$th column is defined as
$$Y_{ij} = \begin{cases} 1, & \text{if } x_i \text{ is labeled and } y_i = j, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Let $G = \{X, S\}$ be an undirected connected graph with nodes $X$ and similarity matrix $S \in \mathbb{R}^{n\times n}$. The similarity matrix $S$ is symmetric, and each element $S_{ij}$ represents the similarity of a pair of samples [45]. The Laplacian matrix is defined as $L = D - S$, where $D$ is a diagonal matrix with diagonal elements $D_{ii} = \sum_j S_{ij}$.
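The label matrix and graph Laplacian defined above can be built as in the minimal NumPy sketch below. It assumes a symmetric k-nearest-neighbor graph with heat-kernel weights. The text only states that $S$ is symmetric, so the k-NN rule, the bandwidth sigma and the helper names label_matrix and knn_laplacian are hypothetical. Class indices are 0-based in the code.

```python
import numpy as np


def label_matrix(y_labeled, n, c):
    """Binary label matrix Y of Eq. (1), with 0-based class indices.

    The first len(y_labeled) samples are the labeled ones, matching the
    ordering X = [x_1, ..., x_u, x_{u+1}, ..., x_n]; unlabeled rows stay zero.
    """
    Y = np.zeros((n, c))
    Y[np.arange(len(y_labeled)), y_labeled] = 1.0
    return Y


def knn_laplacian(X, k=5, sigma=None):
    """Symmetric k-NN similarity matrix S (heat-kernel weights) and L = D - S.

    X is (m, n) with one sample per column. The k-NN rule, the heat kernel and
    the default bandwidth are illustrative assumptions.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    if sigma is None:
        sigma = float(np.sqrt(dist2.mean())) or 1.0  # hypothetical default bandwidth
    S = np.zeros((n, n))
    for i in range(n):
        d = dist2[i].copy()
        d[i] = np.inf                      # exclude the sample itself
        nbrs = np.argsort(d)[:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)                 # i ~ j if either is a k-NN of the other
    D = np.diag(S.sum(axis=1))             # D_ii = sum_j S_ij
    return S, D - S
```

Every row of $L$ sums to zero, and $L$ is positive semidefinite. The GFHF and FME objectives below rely on both properties.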

A. GAUSSIAN FIELDS AND HARMONIC FUNCTIONS (GFHF)
Based on label fitness and manifold smoothness, GFHF predicts the class label matrix $F \in \mathbb{R}^{n\times c}$. As shown in [43], GFHF minimizes the objective function
$$g_G(F) = \frac{1}{2}\sum_{i,j=1}^{n} \|F_i - F_j\|^2 S_{ij} + \lambda_\infty \sum_{i=1}^{u} \|F_i - Y_i\|^2, \tag{2}$$
where $Y_i$ and $F_i$ are the $i$th rows of $Y$ and $F$, respectively. $Y_i$ contains the given class label of the $i$th labeled sample, and $F_i$ contains the predicted class label of the $i$th sample. $\lambda_\infty$ is a very large number, which forces the predicted labels of the labeled samples to (approximately) equal their given labels, i.e., $F_i \approx Y_i$ for $i = 1, \dots, u$ [46]. Equation (2) can be rewritten as
$$g_G(F) = \mathrm{Tr}(F^T L F) + \mathrm{Tr}\big((F - Y)^T U (F - Y)\big), \tag{3}$$
where $L = D - S$ is the graph Laplacian matrix and $D$ is the diagonal matrix with $D_{ii} = \sum_j S_{ij}$. $U$ is a diagonal matrix whose first $u$ diagonal elements are $\lambda_\infty$ and whose remaining diagonal elements are 0.
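The GFHF objective $\mathrm{Tr}(F^T L F) + \mathrm{Tr}((F - Y)^T U (F - Y))$ is an unconstrained quadratic in $F$. Setting its gradient $2LF + 2U(F - Y)$ to zero gives $F = (L + U)^{-1} U Y$. The sketch below assumes a finite stand-in for $\lambda_\infty$ (lam_inf = 1e6) and a connected graph with at least one labeled node, so that $L + U$ is invertible. The name gfhf is illustrative.

```python
import numpy as np


def gfhf(L, Y, u, lam_inf=1e6):
    """Closed-form GFHF label prediction F = (L + U)^{-1} U Y.

    L : (n, n) graph Laplacian, Y : (n, c) label matrix whose first u rows are
    the labeled samples, lam_inf : large finite stand-in for lambda_infinity.
    """
    n = L.shape[0]
    # U = diag(lam_inf, ..., lam_inf, 0, ..., 0) with u nonzero entries.
    U = np.diag(np.r_[np.full(u, lam_inf), np.zeros(n - u)])
    return np.linalg.solve(L + U, U @ Y)
```

For unlabeled nodes this gives the harmonic solution: each predicted row is the similarity-weighted average of its neighbors' rows. This is also why GFHF yields labels only for the $n$ given samples and no projection for new ones.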

B. FLEXIBLE MANIFOLD EMBEDDING (FME)
By maintaining label fitness and manifold smoothness, GFHF predicts the class labels of unlabeled samples. However, GFHF does not produce a projection matrix; that is, it cannot map new samples to the desired subspace. In contrast, the linear Laplacian regularized least squares (LapRLS/L) method [47] can map new samples into the label space through the linear regression function
$$h(X) = X^T W + e_n t^T,$$
where $W \in \mathbb{R}^{m\times c}$ is a transformation matrix and $t \in \mathbb{R}^{c\times 1}$ is a translation vector. The objective function of LapRLS/L simultaneously minimizes the ridge regression error and preserves the manifold smoothness. It is defined as follows.
where $\lambda$ and $\mu$ are two balance factors. Equation (4) can be rewritten as follows, where $e_n = [1, 1, \dots, 1]^T \in \mathbb{R}^n$ is the all-ones vector. From this definition, the prediction labels $F$ obtained by LapRLS/L are restricted to the space spanned by the training samples $X$. For a new sample, its class label can be obtained with the projection matrix $W$. Nie et al. [39] pointed out that the constraint in Equation (6) may be too strict for data sampled from a nonlinear manifold. To address this problem, they proposed flexible manifold embedding (FME), which introduces a regression residue $F_0$ into the constraint of Equation (6), i.e.,
$$F = X^T W + e_n t^T + F_0,$$
where the regression residue $F_0 \in \mathbb{R}^{n\times c}$ measures the mismatch between $F$ and $h(X)$. To obtain the optimal prediction label matrix $F$, FME minimizes the following objective function.
where $\lambda_1$ and $\lambda_2$ are balance factors. Equation (8) can be further converted as follows.
To seek the optimal solution to Equation (9), we alternately fix one of $W$ and $F$ and set the derivative of the objective function in Equation (9) with respect to the other equal to zero. The solutions of $W$ and $F$ can be represented as follows.
Because the solutions of $W$ and $F$ in Equation (9) depend on each other, they cannot be obtained directly. Instead, the objective function is minimized iteratively. The FME algorithm is outlined as follows.

Algorithm 1 FME
Step 1. Set each element of matrix $F \in \mathbb{R}^{n\times c}$ to 1.
Step 2. Compute the optimal projection matrix $W$ using Equation (10).
Step 3. Use Equation (11) to compute the optimal label matrix $F$.
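The alternating scheme of Algorithm 1 can be illustrated with a minimal NumPy sketch. It assumes the FME-style objective
$$J(F, W, t) = \mathrm{Tr}(F^T L F) + \mathrm{Tr}\big((F - Y)^T U (F - Y)\big) + \lambda_1 \|X^T W + e_n t^T - F\|_F^2 + \lambda_2 \|W\|_F^2,$$
which may weight its terms differently from Equations (8)-(9). Each step below is the exact minimizer of $J$ over one block of variables with the other block fixed, so $J$ never increases. The function name fme_alternating and the finite value used for $\lambda_\infty$ are illustrative.

```python
import numpy as np


def fme_alternating(X, L, Y, u, lam1=1.0, lam2=1.0, lam_inf=1e6, iters=100, tol=1e-10):
    """Alternating minimization of an FME-style objective (assumed form):

    J(F, W, t) = Tr(F^T L F) + Tr((F - Y)^T U (F - Y))
                 + lam1 * ||X^T W + e_n t^T - F||^2 + lam2 * ||W||^2

    X is (m, n) with one sample per column; the first u samples are labeled.
    """
    m, n = X.shape
    U = np.diag(np.r_[np.full(u, lam_inf), np.zeros(n - u)])
    F = np.ones((n, Y.shape[1]))                 # Step 1: every element of F set to 1
    Xc = X - X.mean(axis=1, keepdims=True)       # each feature centered over samples
    A = Xc @ Xc.T + (lam2 / lam1) * np.eye(m)    # fixed system matrix of the W-step
    M = L + U + lam1 * np.eye(n)                 # fixed system matrix of the F-step
    history = []
    for _ in range(iters):
        # W-step (F fixed): ridge regression of F on X with an unpenalized t.
        W = np.linalg.solve(A, Xc @ F)
        t = (F.sum(axis=0) - W.T @ X.sum(axis=1)) / n
        R = X.T @ W + t                          # h(X) = X^T W + e_n t^T
        # F-step (W, t fixed): F = (L + U + lam1 I)^{-1} (U Y + lam1 h(X)).
        F = np.linalg.solve(M, U @ Y + lam1 * R)
        J = (np.trace(F.T @ L @ F) + np.trace((F - Y).T @ U @ (F - Y))
             + lam1 * np.sum((R - F) ** 2) + lam2 * np.sum(W ** 2))
        history.append(J)
        if len(history) > 1 and history[-2] - J <= tol * max(1.0, abs(J)):
            break
    return F, W, t, history
```

The W-step first eliminates $t$, using $t = (F^T e_n - W^T X e_n)/n$; this is why the centered data matrix appears. Once trained, a new sample $x$ is mapped through $W^T x + t$. This is the out-of-sample extension that GFHF lacks.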

III. ROBUST MANIFOLD EMBEDDING (RME)
The advantages of FME are two-fold. First, it uses a linear regression function $h(X) = X^T W + e_n t^T$ to map new samples from the feature space to the class label space. Second, FME integrates both the label information and the manifold structure to improve recognition performance. However, the performance of this method depends heavily on having enough training samples for each object. In practice, we usually do not have many training samples per object for FME. Face recognition in particular often has few training samples per object, which leads to the small sample size problem. In addition, face appearance can change drastically with variations in expression, illumination and pose. In particular, the variations among images of the same face are almost always greater than the variations caused by changes in face identity [48]-[51]. These factors degrade the recognition performance of FME.
It is widely recognized that both the facial structure and the facial expression are symmetric [41], [42]. Based on this observation, we can use mirror images of a face image to represent possible illumination and pose changes that the original face image cannot provide. Specifically, we generate a left-mirror image and a right-mirror image for each face image from its left and right half images, respectively. In this way, we obtain three training samples from each face image. Although the method of generating mirror images is simple, the generated mirror images help reflect more of the possible illumination and pose variations of the original face image. As a result, the proposed RME algorithm is robust to variations in illumination and pose.
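The mirror-image construction is given as Step 1 of Algorithm 2, with the halves $z(:, 1:q/2)$ and $z(:, q/2+1:\mathrm{end})$. It can be sketched in NumPy as follows. The sketch assumes grayscale images stored as 2-D arrays with an even width $q$. The names mirror_faces and build_views are illustrative, and NumPy's default row-major vectorization is an arbitrary but consistent choice.

```python
import numpy as np


def mirror_faces(z):
    """Left- and right-mirror images of a p x q face image z (q assumed even).

    z1 = [z_L, flip(z_L)] keeps the left half and mirrors it onto the right;
    z2 = [flip(z_R), z_R] keeps the right half and mirrors it onto the left.
    """
    q = z.shape[1]
    if q % 2:
        raise ValueError("image width q must be even")
    h = q // 2
    z_left, z_right = z[:, :h], z[:, h:]         # z(:, 1:q/2) and z(:, q/2+1:end)
    z1 = np.hstack([z_left, z_left[:, ::-1]])    # left-mirror image
    z2 = np.hstack([z_right[:, ::-1], z_right])  # right-mirror image
    return z1, z2


def build_views(images):
    """Vectorize originals and mirrors into column matrices X, Z1, Z2, each (p*q, n)."""
    X, Z1, Z2 = [], [], []
    for z in images:
        z1, z2 = mirror_faces(z)
        X.append(z.ravel())
        Z1.append(z1.ravel())
        Z2.append(z2.ravel())
    return np.column_stack(X), np.column_stack(Z1), np.column_stack(Z2)
```

Both mirror images are exactly left-right symmetric. Together with the original, each face contributes three columns, matching the three training samples per face image described above.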
The objective function of the proposed RME algorithm is minimized as follows.
where $Z_1$ and $Z_2$ are the data matrices consisting of the left-mirror images and the right-mirror images, respectively, of the original training samples in the training set $X$. $\lambda_\infty$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are balance factors. Equation (12) can be further reformulated as follows.
To obtain the optimal solution to Equation (13), we set the derivatives of the objective function in Equation (13) with respect to $W$ and $F$ to zero. The analytic solutions of $W$ and $F$ are given as follows.
In our algorithm, we assume that there are $n$ samples from all objects. These include $u$ labeled samples (the training samples) and $l = n - u$ unlabeled samples (the test samples). The proposed RME algorithm is outlined in Algorithm 2.
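The following is a hypothetical sketch, not the objective of Equation (12). It shows one way the mirror matrices $Z_1$ and $Z_2$ could enter an FME-style objective: they are regressed onto the same label matrix $F$ with weight $\lambda_3$,
$$J = \mathrm{Tr}(F^T L F) + \mathrm{Tr}\big((F - Y)^T U (F - Y)\big) + \lambda_1 \|X^T W + e_n t^T - F\|_F^2 + \lambda_3 \big(\|Z_1^T W + e_n t^T - F\|_F^2 + \|Z_2^T W + e_n t^T - F\|_F^2\big) + \lambda_2 \|W\|_F^2.$$
The objective is minimized by alternating exact W- and F-steps in the spirit of Steps 2-4 of Algorithm 2. The roles assigned to $\lambda_1$, $\lambda_2$ and $\lambda_3$ and the function name rme_alternating are assumptions.

```python
import numpy as np


def rme_alternating(X, Z1, Z2, L, Y, u, lam1=1.0, lam2=1.0, lam3=0.5,
                    lam_inf=1e6, iters=100, tol=1e-10):
    """Hypothetical RME-style solver; the objective is an assumption, not Eq. (12).

    J = Tr(F^T L F) + Tr((F-Y)^T U (F-Y)) + lam1 ||X^T W + e t^T - F||^2
        + lam3 (||Z1^T W + e t^T - F||^2 + ||Z2^T W + e t^T - F||^2) + lam2 ||W||^2
    """
    m, n = X.shape
    U = np.diag(np.r_[np.full(u, lam_inf), np.zeros(n - u)])
    views = [(X, lam1), (Z1, lam3), (Z2, lam3)]
    ones = np.ones((n, 1))
    # Weighted design [A^T, e] per view; the last column carries the translation t.
    B = np.vstack([np.sqrt(w) * np.hstack([A.T, ones]) for A, w in views])
    P = np.diag(np.r_[np.full(m, lam2), 0.0])    # ridge penalty on W only, not on t
    G = B.T @ B + P
    w_sum = sum(w for _, w in views)
    M = L + U + w_sum * np.eye(n)                 # fixed system matrix of the F-step
    F = np.ones((n, Y.shape[1]))                  # all-ones initialization
    history = []
    for _ in range(iters):
        # W-step: weighted ridge regression of F on the original and mirror views.
        T = np.vstack([np.sqrt(w) * F for _, w in views])
        Theta = np.linalg.solve(G, B.T @ T)
        W, t = Theta[:m], Theta[m]
        R = [A.T @ W + t for A, _ in views]
        # F-step: (L + U + sum_k w_k I) F = U Y + sum_k w_k R_k.
        F = np.linalg.solve(M, U @ Y + sum(w * Rk for (_, w), Rk in zip(views, R)))
        J = (np.trace(F.T @ L @ F) + np.trace((F - Y).T @ U @ (F - Y))
             + sum(w * np.sum((Rk - F) ** 2) for (_, w), Rk in zip(views, R))
             + lam2 * np.sum(W ** 2))
        history.append(J)
        if len(history) > 1 and history[-2] - J <= tol * max(1.0, abs(J)):
            break
    return F, W, t, history
```

Setting lam3 = 0 drops the mirror terms and leaves a plain FME-style objective. Comparing the two settings is one way to check how much the mirror views contribute.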

IV. EXPERIMENTAL RESULTS
In this section, we carry out extensive experiments to evaluate the performance of the proposed RME algorithm. We compare it with the following state-of-the-art algorithms: MSEC [52], CRC [35], FME [39], SOSI [24], MDRL [25], the method presented in [42], GFHF [43], and NLDLSR [53]. In our experiments, six public face databases are used as benchmarks to evaluate the proposed method: CMU PIE, Extended YaleB, FERET, Georgia Tech (GT), UMIST, and Yale, which exhibit large variations in illumination and pose.

A. PARAMETERS SELECTION
In our RME algorithm, there are four parameters to tune: the balance factors $\lambda_1$, $\lambda_2$ and $\lambda_3$, and the number of nearest neighbors $k$. Fig. 1 shows how the average recognition rate of RME varies with $\lambda_1$, $\lambda_2$, $\lambda_3$ and $k$. From Fig. 1, we find that $\lambda_1$ should be greater than 10, $\lambda_2$ should be between 1 and 10, $\lambda_3$ should be less than 4, and $k$ should be between 3 and 5.

B. EXPERIMENTS ON CMU PIE FACE DATABASE
The CMU PIE face database contains 41,368 face images of 68 subjects in total, captured under variations in pose, illumination, and expression. In this paper, we fixed the pose and expression and obtained 21 images under different lighting conditions for each subject. The size of each image in CMU PIE was 32 × 32 pixels. Fig. 2 shows some sample images of one person.

VOLUME 8, 2020

Algorithm 2 RME Algorithm
Step 1. Generate the left-mirror and right-mirror images for each face image. Let $z \in \mathbb{R}^{p\times q}$ be a given face image, and let $z_1$ and $z_2$ denote the corresponding left and right mirror face images, respectively. The left-half image $z_L$ and the right-half image $z_R$ are first obtained by splitting $z$ along its vertical center line. The two half images are then mirrored to generate two other half images $\tilde{z}_L$ and $\tilde{z}_R$. The obtained mirror images are represented as
$$z_1 = [z_L, \tilde{z}_L], \qquad z_2 = [\tilde{z}_R, z_R],$$
where $z_L = z(:, 1:q/2)$ and $z_R = z(:, q/2+1:\mathrm{end})$. The mirrored halves $\tilde{z}_L$ and $\tilde{z}_R$ are obtained by reversing the column order:
$$\tilde{z}_L(:, j) = z_L(:, q/2+1-j), \qquad \tilde{z}_R(:, j) = z_R(:, q/2+1-j), \qquad j = 1, \dots, q/2.$$
Step 2. Set $t = 0$ and set all the elements of the matrix $F_0 \in \mathbb{R}^{n\times c}$ to 1.
Step 3. Compute matrix $W_t$ using Equation (14).
Step 4. Compute matrix $F_t$ using Equation (15).
Step 6. Check convergence: if the iteration has converged, stop and output the optimal $F^* = F_t$ and $W^* = W_t$; otherwise, go to Step 3.
Step 8. Use the nearest neighbor classifier for classification.
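Step 8 of Algorithm 2 does not state the feature space explicitly. Assuming that both training and test samples are first mapped by the learned projection matrix $W$ (features $W^T x$), the nearest neighbor step can be sketched as follows. The name nn_classify is illustrative.

```python
import numpy as np


def nn_classify(W, X_train, y_train, X_test):
    """1-NN classification after projecting samples with W (features W^T x).

    The translation t shifts every projected sample equally, so it does not
    change Euclidean nearest-neighbor decisions and is omitted.
    """
    P_train = W.T @ X_train            # (c, n_train)
    P_test = W.T @ X_test              # (c, n_test)
    # Squared Euclidean distances between every test and training feature.
    d2 = (np.sum(P_test ** 2, axis=0)[:, None] + np.sum(P_train ** 2, axis=0)[None, :]
          - 2.0 * P_test.T @ P_train)  # (n_test, n_train)
    return y_train[np.argmin(d2, axis=1)]
```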
In the experiment, the first $l$ ($l = 1, 2, 3$) face images of each subject are selected for the training set, and the rest are used for the test set. To faithfully reflect the performance of each algorithm, we manually tune the parameters of every algorithm to their best values. In the FME algorithm, we set $k = 10$, $\lambda_1 = 40$, and $\lambda_2 = 5$. In the RME algorithm, we set $k = 10$, $\lambda_1 = 40$, $\lambda_2 = 5$, and $\lambda_3 = 3$. The recognition performance of MSEC, CRC, FME, MDRL, SOSI, and the proposed RME is shown in Table 1.
From Table 1, we can see that the recognition performance of the proposed RME is significantly superior to that of MSEC, CRC, MDRL, and SOSI, especially when the number of training samples is 1 or 2. In addition, both FME and the proposed RME achieve a recognition rate of 100% regardless of the training sample size.

C. EXPERIMENTS ON THE EXTENDED YALEB FACE DATABASE
The Extended YaleB database contains images of 38 distinct persons, and each person has his/her frontal images taken under 45 different lighting directions. In our experiments, each image is resized to 48 × 42 pixels. Fig. 3 shows some sample images of one person.
For this experiment, we select the first $l$ ($l = 1, 2$) images of each subject as the training set and use the rest as the test set. In the FME algorithm, the optimal neighbor number $k$ is set to 6, and the optimal balance coefficients $\lambda_1$ and $\lambda_2$ are set to 40 and 5, respectively. In the proposed RME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 6, 40, 5, and 3, respectively. The recognition rates of the compared methods are shown in Table 2.
It can be seen from Table 2 that the proposed RME achieves the best recognition rate among the compared approaches, far higher than MSEC, CRC, FME, MDRL and SOSI. Note that FME and RME obtain the same recognition performance on the CMU PIE database, which indicates that both methods cope well with the illumination variations in that database. However, on the Extended YaleB database, which has large illumination variations, the recognition rate of our method is superior to that of FME. This means that the proposed RME is more robust to lighting variations.

D. EXPERIMENTS ON FERET FACE DATABASE
The FERET database [54] contains a total of 13,539 face images of 1,565 subjects. The images vary in size, pose, illumination, facial expression, and age. We selected 1400 images of 200 individuals (each one has 7 images). Each image was resized to 40 × 40 pixels. Fig. 4 illustrates the sample images of one individual.
In this experiment, the first $l$ ($l = 1, 2, 3, 4$) face images of each individual are taken as the training set, and the rest as the test set. For the FME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$ and $\lambda_2$ are set to 10, 100 and 5, respectively. For the proposed RME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 10, 100, 5 and 3, respectively. Table 3 tabulates the recognition rates of each method.
Based on the results in Table 3, the proposed RME algorithm performs better than MSEC, CRC, FME, MDRL, and SOSI regardless of the training sample size. With one training sample per subject, the recognition rates of the six compared algorithms are 43.67%, 43.97%, 47.50%, 47.12%, 46.53%, and 50.58%, respectively.

E. EXPERIMENTS ON GEORGIA TECH FACE DATABASE
The Georgia Tech face database, built at the Georgia Institute of Technology, contains images of 50 people taken in two or three sessions. For each individual, 15 color JPEG images with cluttered backgrounds were taken at a resolution of 640 × 480 pixels. The faces in these images may be frontal and/or tilted, with different expressions, illuminations, and scales. Each image was resized to 60 × 50 pixels and converted to grayscale. Fig. 5 shows a set of sample images of one person.
In this experiment, we select the first one to six images per individual as the training set and use the rest as the test set. For the FME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$ and $\lambda_2$ are set to 5, 40 and 5, respectively. For the proposed RME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 5, 40, 5 and 3, respectively. Table 4 presents the recognition results of the compared methods.
From Table 4, we can see that the recognition rates of the proposed RME are much higher than those of MSEC, CRC, FME, MDRL, and SOSI. This is especially true when there is only one training sample per subject. The results further indicate that our RME algorithm is robust to variations in illumination, expression, and pose.

F. EXPERIMENTS ON UMIST FACE DATABASE
The UMIST database consists of a total of 575 face images of 20 people. The individuals are a mix of race, sex, and appearance and are photographed in a range of poses from profile to frontal views. The number of views per subject varies from 19 to 48. The size of each face image is 56 × 46 pixels.
In this experiment, $l$ ($l = 1, 2, 3, 4, 5$) face images of each person are used as the training set and the rest as the test set. All algorithms are run 10 times. For the FME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$ and $\lambda_2$ are set to 5, 35 and 6, respectively. For the proposed RME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 5, 35, 5 and 2, respectively. The recognition results are shown in Table 5. It can be seen from Table 5 that our RME achieves the best recognition performance regardless of the training sample size.

G. EXPERIMENTS ON YALE FACE DATABASE
The Yale face database includes 165 images of 15 subjects, and each subject has 11 images under various lighting conditions and facial expressions. In this experiment, we manually crop each image and resize it to 50 × 40 pixels.
In this experiment, from 2 up to 5 images per individual are selected for the training set, and the remaining images are used for testing. All algorithms are run 10 times. For the FME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$ and $\lambda_2$ are set to 5, 20 and 4, respectively. For the proposed RME algorithm, the optimal neighbor number $k$ and balance coefficients $\lambda_1$, $\lambda_2$ and $\lambda_3$ are set to 5, 25, 5 and 3, respectively. Table 6 shows the recognition results, from which we can see that our method performs better than the compared algorithms.

V. CONCLUSION
In this paper, we propose a novel robust manifold embedding (RME) algorithm. The proposed RME algorithm uses both the label information and the manifold smoothness of the training and test samples. Accurately capturing the manifold structure of the data requires plenty of samples; however, face recognition applications often lack sufficient samples. Based on the symmetry of the face, the left/right mirror images of all face image samples are generated and used in the proposed RME. The advantages of the proposed RME are as follows. Firstly, the label information and manifold structure are fully utilized in RME. Secondly, RME is more robust to variations in illumination and pose. Experiments on six public face databases demonstrate that the proposed RME algorithm performs well.

CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest regarding the publication of this paper.