F-2D-QPCA: A Quaternion Principal Component Analysis Method for Color Face Recognition

Two-dimensional quaternion principal component analysis (2D-QPCA) is one of the successful dimensionality reduction methods for color face recognition. However, 2D-QPCA is sensitive to outliers. To overcome this shortcoming, an efficient robust method (F-2D-QPCA) based on the Frobenius norm (F-norm) is presented. The goal of F-2D-QPCA is to find the projection matrix such that the projected data has the maximum variance measured by the F-norm. The method is more robust to outliers and achieves higher recognition accuracy than other methods such as 2D-QPCA, $R_{1}$-2-DPCA, F-norm 2DPCA and 2D-PCA. We also study a quaternion optimization problem in detail, propose a nongreedy iterative algorithm and prove its convergence. Experiments on several color face databases illustrate the superiority of the proposed method.


I. INTRODUCTION
Face recognition has been an active research topic in recent years. Principal component analysis (PCA) and its variants have been successfully used for grayscale face recognition [1]-[7]. Based on the Karhunen-Loeve procedure for the characterization of human faces [1], Turk and Pentland [2] presented the eigenface method for face recognition. Early PCA methods mainly deal with grayscale images by using a vector to represent a grayscale image. As a result, the color information and part of the spatial information of images are not fully utilized. In order to make full use of the spatial information of images, Yang et al. [3] proposed a novel two-dimensional principal component analysis (2D-PCA) by using a matrix to represent a grayscale image. All these methods adopt the squared Euclidean norm or the squared Frobenius norm as the distance metric. However, these norms are very sensitive to outliers. Thus, these methods have great shortcomings when processing data sets with outliers [8], [9].
To improve robustness, the $\ell_1$-norm is adopted because it can suppress outliers very well [9], [10]. Ke and Kanade [10] used the $\ell_1$-norm instead of the squared Euclidean norm to construct the reconstruction error, and then presented the robust L1-PCA method by minimizing this error. Based on L1-norm maximization, Kwak [11] found the projection vectors and presented the PCA-L1 method for image representation. For solving PCA-L1, Nie et al. [12] presented a nongreedy iterative algorithm. Correspondingly, to exploit spatial structure, many 2-D PCA methods based on the $\ell_1$-norm have been proposed. Inspired by 2D-PCA, Li et al. [13] and Wang et al. [14] extended PCA-L1 to 2-DPCA-L1 and gave the greedy algorithm and the nongreedy algorithm, respectively. Pang et al. [15] extended PCA-L1 to robust tensor analysis with the $\ell_1$-norm. By applying a sparse constraint to 2-DPCA-L1, Wang and Wang [16] developed 2-DPCAL1-S for simultaneously robust and sparse modelling. Recently, Mi et al. presented a novel robust method, called nuclear norm based on PCA (N-PCA), to take full advantage of the structure information of the error image [17], and a generalized robust 2-DPCA, named 2-DPCA with $\ell_{2,p}$-norm minimization ($\ell_{2,p}$-2-DPCA), for image representation and recognition [18].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
However, it is well known that the image covariance matrix characterizes the geometric structure of images, but these methods based on the $\ell_1$-norm do not involve the image covariance matrix [19], [22], [23]. Moreover, although the $\ell_1$-norm can suppress the role of outliers, it is not known whether it can enhance the role of small distances. Thus, to partially overcome this shortcoming, based on the F-norm, Li et al. [20] proposed an F-norm distance metric based robust 2DPCA (F-norm 2DPCA) for face recognition, and based on the $R_1$-norm, Ding et al. [22] developed a rotational invariant $\ell_1$-norm ($R_1$-norm) PCA ($R_1$-PCA). Motivated by $R_1$-PCA and 2D-PCA, Gao et al. [24] maximized the image covariance with the $R_1$-norm and proposed the $R_1$-2-DPCA method for grayscale face recognition.
None of the above methods uses the color information of the image. In [4], Torres et al. pointed out the importance of color information in face recognition and extended traditional PCA to color face recognition by using the R, G, B color channels separately. However, this method does not consider the relationship between the three channels. In order to overcome this shortcoming, Yang and Liu [5] used a set of color component combination coefficients to convert the three color channels into one channel $D$ by $D = x_1 R + x_2 G + x_3 B$ and presented a general discriminant model for color face recognition, but the optimal coefficients $x_1$, $x_2$ and $x_3$ are difficult to obtain. Xiang et al. [6] used a row vector to denote a color channel and then represented a color image as a $3 \times n$ matrix. Then, by utilizing both the spatial and color information, they proposed a color 2D-PCA (C2DPCA) method for color face recognition. However, none of these methods uses the color information directly.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To directly deal with three channels of color image, the quaternion with zero real part was used to represent the color pixel consisting of three components [25]- [34]. Based on quaternion matrix theory, Jia et al. [35] presented the color two-dimensional principal component analysis (2D-QPCA) method for color face recognition. With the aid of two-dimensional quaternion matrices rather than one-dimensional quaternion vectors, 2D-QPCA utilizes the color information and the spatial characteristics simultaneously and mathematically. Recently, Xiao and Zhou [44] proposed a novel quaternion ridge regression (QRR) model for two-dimensional QPCA (QRR-2D-QPCA) and mathematically proved that this QRR model is equivalent to the QCM model of 2D-QPCA.
In this paper, inspired by the F-norm 2DPCA method and the 2D-QPCA method, we propose the F-2D-QPCA method, which maximizes the image covariance based on the quaternion F-norm and obtains the eigenface subspace by a nongreedy algorithm. Compared to most existing 2D-PCA methods and the 2D-QPCA method, our method has the following advantages. First, our method treats a color image as a quaternion matrix, which makes full use of the color and spatial information of the image. Second, our method is more robust to outliers because the F-norm weakens the role of outliers. Third, our method is based on the image covariance matrix, so it makes good use of the geometric structure of images.
The paper is organized as follows. In Section II, we review quaternion matrices and elaborate the principle of the 2D-QPCA method and the QSR-2D-QSPCA method for color face recognition. In Section III, we propose a quaternion optimization problem and develop a nongreedy iterative algorithm. We then propose a new color two-dimensional quaternion principal component analysis method for color face recognition, which is based on the F-norm and denoted as the F-2D-QPCA method. In Section IV, experiments verify the efficiency of our method. Finally, the conclusion is presented in Section V.

II. PRELIMINARY
In this section, we review the relationship between quaternion matrices and color images, give some properties of quaternion matrices and elaborate the principle of the 2D-QPCA method for color face recognition.

A. QUATERNION MATRICES AND COLOR IMAGES
In 1843, William Rowan Hamilton discovered the quaternion $q = q_1 + q_2 i + q_3 j + q_4 k$, where $q_1, q_2, q_3, q_4$ are real and $i, j, k$ are three imaginary units satisfying $i^2 = j^2 = k^2 = ijk = -1$. The set of all quaternions is denoted by $\mathbb{Q}$. The conjugate of $q$ is defined as $q^* = q_1 - q_2 i - q_3 j - q_4 k$ and the modulus $|q|$ is defined as $|q| = \sqrt{qq^*} = \sqrt{q_1^2 + q_2^2 + q_3^2 + q_4^2}$. If the real part is zero, we call $q = ri + gj + bk$ a pure quaternion, which can represent a pixel in the RGB color space, where $r, g, b$ stand for the values of the red, green and blue components, respectively. So, an $m \times n$ color image can be saved as an $m \times n$ pure quaternion matrix $A = (a_{ij})_{m \times n} = Ri + Gj + Bk$ with nonnegative integer matrices $R$, $G$ and $B$.
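The quaternion arithmetic described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the tuple layout `(q1, q2, q3, q4)` and the helper names `qmul`, `qconj`, `qabs` and `pixel_to_quaternion` are assumptions made purely for illustration.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Conjugate: keep the real part, negate the three imaginary parts."""
    return (q[0], -q[1], -q[2], -q[3])

def qabs(q):
    """Modulus: square root of the sum of the squared components."""
    return math.sqrt(sum(x * x for x in q))

def pixel_to_quaternion(r, g, b):
    """Hypothetical helper: encode an RGB pixel as the pure quaternion r*i + g*j + b*k."""
    return (0.0, float(r), float(g), float(b))

# The three imaginary units.
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
```

Multiplication is not commutative in this sketch: `qmul(I, J)` gives `K`, while `qmul(J, I)` gives the negative of `K`, which is the property that later makes the trace of a quaternion matrix product order-dependent.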
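For $A = (a_{ij}) \in \mathbb{Q}^{m \times n}$ and $a \in \mathbb{Q}^{n \times 1}$, we list several quaternion matrix and vector norms which will be used in this paper [41].
(a). Euclidean norm: $\|a\|_2 = \|a^*\|_2 = \sqrt{a^* a}$, where $(\cdot)^*$ denotes the conjugate transpose operation; (b). Frobenius norm: $\|A\|_F = \big(\sum_{i,j} |a_{ij}|^2\big)^{1/2} = \sqrt{\mathrm{Tr}(A^* A)}$, where $\mathrm{Tr}(M)$ denotes the trace of $M$; (c). 2-norm or spectral norm: $\|A\|_2 = \max_{\|a\|_2 = 1} \|Aa\|_2$. The real matrix $A^R$ is known as the real representation of the quaternion matrix $A$.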
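As a hedged illustration of the Frobenius norm of a quaternion matrix, the sketch below computes it in two ways: entrywise from the moduli, and through the trace of $A^*A$ using quaternion arithmetic. The storage format (a list of rows of `(real, i, j, k)` tuples) and the function names are assumptions for illustration, not part of the paper.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def frobenius_norm(A):
    """Entrywise form: square root of the sum of squared moduli."""
    return math.sqrt(sum(x * x for row in A for q in row for x in q))

def trace_of_gram(A):
    """Tr(A* A): the j-th diagonal entry of A* A is sum_i conj(a_ij) * a_ij."""
    total = (0.0, 0.0, 0.0, 0.0)
    for row in A:
        for q in row:
            term = qmul(qconj(q), q)
            total = tuple(t + u for t, u in zip(total, term))
    return total

# A small 2 x 2 quaternion matrix used only as an example.
A = [
    [(1.0, 2.0, 0.0, 0.0), (0.0, 0.0, 3.0, 0.0)],
    [(0.0, 1.0, 1.0, 1.0), (2.0, 0.0, 0.0, -1.0)],
]
gram_trace = trace_of_gram(A)
```

In this example the trace of $A^*A$ has zero imaginary parts, and the square root of its real part agrees with `frobenius_norm(A)`.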
For A, B ∈ Q m×n , C ∈ Q n×s , the following properties are well known [21]. (a).
(c). $A$ is a column unitary matrix if and only if $A^R$ is a column orthogonal matrix.
Inspired by [22], we present a new quaternion matrix norm.
Let Re(A) denote the real part of A. After the simple derivation, we can get the following result.
Theorem 1: For $A \in \mathbb{Q}^{m \times n}$, $B \in \mathbb{Q}^{n \times m}$, $C \in \mathbb{Q}^{n \times n}$, we have (1). $\mathrm{Re}(\mathrm{Tr}(AB)) = \mathrm{Re}(\mathrm{Tr}(BA))$, whereas in general $\mathrm{Tr}(AB) \neq \mathrm{Tr}(BA)$; (2).
[43]: where the $m \times k$ projected matrix $B$ is called the projected feature image of the image $A$. A good projection matrix $V$ can be determined by the total scatter of the projected samples. That is, the following function is adopted: where the covariance matrix $G_V$ of the projected feature images of the training samples can be denoted by and $E(\cdot)$ denotes the mathematical expectation. The physical significance of maximizing (2) is to find projection directions $v_1, v_2, \cdots, v_k$ such that the total scatter of the resulting projected samples is maximized. Because we can define the color image covariance matrix (QCM) which is an $m \times m$ nonnegative definite matrix and can be evaluated directly using the training image samples.
denote the set of the training color image samples, where class index j = 1, 2, · · · , M .
We compute the average image $\bar{A}$ and the color image covariance (scatter) matrix (QCM) $G$ of the training samples. The aim of 2D-QPCA is to find a set of unitary projection basis vectors $v_1, \cdots, v_k$, where $\hat{V} = \mathrm{Span}(v_1, \cdots, v_k)$ is often called the eigenface subspace or the projection matrix, such that, when projected onto $\hat{V}$, the projected sample of $A_s$ has the maximal scatter. The matrix $\hat{V}$ that maximizes the trace of the generalized total scatter criterion $V^* G V$ meets this requirement. In other words, $\hat{V}$ is the solution of the following problem: $\hat{V} = \arg\max_{V^* V = I_k} \mathrm{Tr}(V^* G V)$. In the 2D-QPCA method [35], the columns $v_1, \cdots, v_k$ of $\hat{V}$ are the eigenvectors (called eigenfaces) of $G$ corresponding to the first $k$ largest eigenvalues.
be the set of 2D quaternion samples and the set of projected quaternion samples, respectively. Here all quaternion samples are mean-centered, i.e., $E(X_i) = 0$. As mentioned in the previous section, the objective of 2D-QPCA is to find an orthonormal quaternion basis $V = (v_1, \cdots, v_k)$ so that the projected quaternion samples have the largest scatter after projection. Note that the QCM model for 2D-QPCA in [44] works in the column direction. That is, $Y_i = V^* X_i$ is defined as the projected sample of $X_i$. Under the constraint of least-squares error, maximizing the scatter of the projected quaternion samples is equivalent to minimizing the reconstruction error between the projected quaternion samples and the input quaternion samples. Hence, the solution of the QCM model [44] is equal to the solution of a reconstruction-error minimization problem. Based on this observation, the quaternion ridge regression (QRR) model has been proposed. Taking advantage of sparse regularization, the QSR model for 2D-QSPCA has been further developed by regularizing the QRR model with $\ell_1$-norm penalties. Theorem 2 presents the QSR model for 2D-QSPCA.
be a set of 2D quaternion samples and the columns of V s = (v s1 , · · · , v sk ) be the quaternion sparse basis vectors of 2D-QSPCA. V s can be obtained as follows.
In [44], an alternating minimization algorithm was developed to iteratively compute the solution of QSR model in the equivalent complex domain. The procedure is given in Table 2.

III. F-2D-QPCA
In this section, we propose a new color two-dimensional quaternion principal component analysis method for color face recognition, which is based on the F-norm and denoted as the F-2D-QPCA method. For this, we need to solve a quaternion optimization problem.

A. A QUATERNION OPTIMIZATION PROBLEM
In this subsection, we propose a quaternion optimization problem and develop a nongreedy iterative algorithm, which not only has a closed-form solution in each iteration but also converges well.

With $Q = V^* W U$, we have $\mathrm{Re}(\mathrm{Tr}(HW)) = \mathrm{Re}(\mathrm{Tr}(U \Lambda V^* W)) = \mathrm{Re}(\mathrm{Tr}(\Lambda V^* W U)) = \sum_{k=1}^{d} \sigma_k \mathrm{Re}(Q(k,k))$. Since $Q$ is column unitary, we obtain $\sum_{k=1}^{d} \sigma_k \mathrm{Re}(Q(k,k)) \leq \sum_{k=1}^{d} \sigma_k$, and the equality holds if and only if $Q(k,k) = 1$ for all $k = 1, 2, \cdots, d$, that is, $Q = I_{n \times d}$ and $W = V I_{n \times d} U^*$. Now we consider how to find the optimal solution of (8). In summary, the optimization problem (8) finally becomes an optimization problem in which $H$ is a function of $W$. To solve this problem, we present a nongreedy iterative algorithm (see Table 3). For the QPCA-F algorithm, we have the following results.
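The structure of the nongreedy iteration can be sketched in Python under strong simplifying assumptions that are not the paper's setting: the data are real rather than quaternion, and only one projection vector is sought ($d = 1$). In that case $H$ is a row vector and the update $W = V I_{n \times d} U^*$ reduces to normalizing $H^T$. All function names are hypothetical.

```python
import math
import random

def matvec(X, w):
    """Product of a real matrix X (list of rows) with a vector w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def objective(samples, w):
    """Sum of ||X_i w||_2, the d = 1 case of sum_i ||X_i W||_F."""
    return sum(norm(matvec(X, w)) for X in samples)

def nongreedy_step(samples, w):
    """Build h = sum_i X_i^T (X_i w) / ||X_i w|| and return h / ||h||."""
    h = [0.0] * len(w)
    for X in samples:
        y = matvec(X, w)
        ny = norm(y)
        if ny == 0.0:
            continue  # a sample with zero projection contributes nothing
        for row, yi in zip(X, y):
            for j, xij in enumerate(row):
                h[j] += xij * yi / ny
    nh = norm(h)
    return w if nh == 0.0 else [hj / nh for hj in h]

def fit_direction(samples, iters=100, tol=1e-12, seed=0):
    """Iterate from a random unit vector until the objective stops increasing."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = norm(w)
    w = [x / scale for x in w]
    history = [objective(samples, w)]
    for _ in range(iters):
        w = nongreedy_step(samples, w)
        history.append(objective(samples, w))
        if history[-1] - history[-2] <= tol:
            break
    return w, history
```

In this simplified setting each step has a closed form, and the recorded objective values in `history` should be nondecreasing up to rounding, which mirrors the monotonicity argument used in the convergence analysis. Extending the sketch to $d > 1$ would require a singular value decomposition, and the quaternion case additionally requires quaternion matrix arithmetic.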

Combining the Cauchy-Schwarz inequality with (15), (17) and (10) shows that the sequence of objective values generated by the algorithm is monotone. Because a monotone bounded sequence must have a limit, this sequence converges. Due to the imperfect theory of quaternion matrix calculus, we cannot discuss this problem on the quaternion ring. Next, we transform this problem into a problem in the real field through the real representation of the quaternion matrix. Because $\mathrm{Re}(\mathrm{Tr}(HW)) = \mathrm{Tr}(\hat{H} \hat{W})$, where $\hat{H} \in \mathbb{R}^{d \times 4n}$ is the first $d$ rows of $H^R$ and $\hat{W} \in \mathbb{R}^{4n \times d}$ is the first $d$ columns of $W^R$, the optimization problem (13) finally becomes an optimization problem in the real field, which tells us that the optimization problem (8) is equivalent to a real optimization problem. From (18), (19) and Theorem 2 of [23], we can obtain the following result.
$\|X_i \hat{W}\|_F$, where $\hat{W}$ is a local solution of (8).

B. F-2D-QPCA
In this subsection, based on the QPCA-F algorithm proposed in the previous subsection, we propose a new color two-dimensional quaternion principal component analysis method for color face recognition. The F-norm is a unitarily invariant norm and can retain the nice properties of traditional 2D-QPCA such as geometric structure and rotational invariance. Moreover, compared to the squared F-norm and the $R_1$-norm, the F-norm weakens the role of large distances, and compared to the squared Frobenius norm, the F-norm enhances the role of small distances. Thus, to improve the robustness of the 2D-QPCA method and the accuracy of the $R_1$-2-DPCA method, we employ the F-norm instead of the squared Euclidean distance and the $R_1$-norm as the distance metric in the model (6) and obtain the following objective function: $\max_{V^* V = I_k} \sum_{s} \|(A_s - \bar{A}) V\|_F$. Then, we obtain the F-2D-QPCA algorithm (see Table 4).
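The claim that the F-norm weakens the role of large distances relative to the squared F-norm can be made concrete with a small numeric sketch. The distance values below are invented for illustration and the function `outlier_share` is hypothetical; it reports the fraction of the total objective contributed by one sample.

```python
def outlier_share(distances, outlier_index, squared):
    """Fraction of the summed objective that comes from one chosen sample."""
    terms = [d * d if squared else d for d in distances]
    return terms[outlier_index] / sum(terms)

# Three typical samples at distance 1 and one outlier at distance 10.
distances = [1.0, 1.0, 1.0, 10.0]
share_squared = outlier_share(distances, 3, squared=True)   # 100 / 103
share_f_norm = outlier_share(distances, 3, squared=False)   # 10 / 13
```

With these made-up numbers the outlier accounts for 100/103 of a squared-norm objective but only 10/13 of an unsquared one, so the remaining samples keep more influence on the learned projection when the norm is not squared.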

IV. EXPERIMENTS
In this section, we test the F-2D-QPCA method on the well-known Georgia Tech face (GT) database and the color Face Recognition Technology database (FT), and compare it with the 2D-QPCA method, the $R_1$-2-DPCA method, the F-norm 2DPCA method, the 2D-PCA method and the QRR-2D-QPCA method. All experiments in this section are performed on a personal computer with a 3.2 GHz Intel Core i5-6500 and 16 GB of 2400 MHz DDR4 memory, using MATLAB R2018b and the Quaternion toolbox for Matlab (QTFM 2.6) [45]. It is worth pointing out that Jia et al. [37]-[39] have developed a quaternion calculation toolbox based on the real number field, which has achieved higher accuracy and calculation efficiency.
The GT database is composed of color images of 50 individuals with 15 views per individual, and with no specific order in their viewing direction.
All images in the Georgia Tech face database are manually cropped, and then resized to 44 × 33 pixels. All 50 persons are used. We randomly select 4 images per person and then add salt and pepper noise. The noise is randomly distributed and covers 0.0172 to 0.1550 of the image area (see Figure 1). Thus, we get a new gallery for the experiments, recorded as GT-noise. In this new dataset, the first x face images per individual are chosen for training and the remaining five face images are used for testing. The number of chosen eigenfaces or projection vectors is recorded as r. Our approach and the aforementioned three approaches are employed to extract low-dimensional representations, respectively. This process is repeated 5 times.
Also, we randomly select 3 images per person, add salt and pepper noise to the entire area, and get a new gallery for the experiments, recorded as GT-noise1.
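One possible way to produce this kind of corruption is sketched below. It is not the authors' MATLAB procedure: the image format (a list of rows of RGB tuples), the choice to set all three channels of a corrupted pixel to black or white with equal probability, and the function name `add_salt_and_pepper` are all assumptions.

```python
import random

def add_salt_and_pepper(image, fraction, seed=0):
    """Return a copy of an RGB image with `fraction` of its pixels set to black or white."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    noisy = [list(row) for row in image]       # copy so the input stays unchanged
    count = round(fraction * rows * cols)      # number of pixels to corrupt
    for pos in rng.sample(range(rows * cols), count):
        r, c = divmod(pos, cols)
        # Assumption: salt (white) and pepper (black) are equally likely.
        noisy[r][c] = (255, 255, 255) if rng.random() < 0.5 else (0, 0, 0)
    return noisy

# Example: a uniform gray 44 x 33 image with 10% of its pixels corrupted.
gray = [[(128, 128, 128)] * 33 for _ in range(44)]
corrupted = add_salt_and_pepper(gray, 0.10, seed=1)
```

Sampling pixel positions without replacement keeps the corrupted area equal to the requested fraction, which makes it straightforward to reproduce a given noise level.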
The FT database contains 14126 color face images of 1199 individuals. The minimal number of face images for one person is 6, and the maximal one is 44. The size of each cropped color face image is 192 × 128 pixels. We choose 219 persons with 10 views per individual as samples. Some samples are shown in Figure 2.
To further validate the robustness of R1-2-DPCA, we randomly choose three images from each class (person) in the GT database and add a 22 × 16 pixel object image (outlier) to the chosen images. Combining the remaining images, we get a new gallery for the experiments, recorded as GT-outlier. Figure 3 shows some images of this new gallery. Also, we randomly choose two images from each class (person) in the FT database and add a 48 × 32 pixel object image (outlier) to the chosen images. Combining the remaining images, we get a new gallery for the experiments, recorded as FT-outlier. In the experiments, 21 images per person, which include 16 noise-free images and five noised images, are randomly chosen for training, and the remaining images are used as probe images. We employ four approaches to extract low-dimensional representations of images, respectively. We repeat this five times to evaluate the performance of each method. Table 4 shows the average recognition accuracy of each approach on the GT, GT-noise, GT-outlier, FT, and FT-outlier databases. Figure 4 plots the average classification curve versus the number of projection vectors on the GT-outlier database. For these four methods, a large number of experiments on the above five databases show that the best recognition accuracy generally appears for r ≤ 6. Figure 5 plots the average classification curve versus the number of projection vectors on the GT-noise1 database. Figure 6 plots the convergence curves of our method on the GT and FT databases, which show that our algorithm converges well. As for the computational complexity, compared with 2D-QPCA, F-2D-QPCA requires one more singular value decomposition of a quaternion matrix and one more product of two quaternion matrices. Compared with F-norm 2DPCA, the computation of F-2D-QPCA is not more than four times higher, because F-2D-QPCA processes color images whereas F-norm 2DPCA processes grayscale images.
From the above experimental results, we can obtain the following conclusions.
1) F-2D-QPCA and 2D-QPCA are superior to $R_1$-2-DPCA and 2D-PCA, because the first two methods use the color information of the image, whereas the last two methods only deal with the grayscale image.
2) F-2D-QPCA is slightly superior to 2D-QPCA on four databases and better than 2D-QPCA on the GT-noise1 and GT-outlier databases. This is probably because 2D-QPCA is sensitive to small variations due to illumination, pose and occlusion, which results in an unstable representation of face images.
Compared with 2D-QPCA, our approach is slightly more accurate. But in [20], for grayscale images, F-norm 2DPCA is superior to 2D-PCA on the modified Extended Yale B, AR and CMU PIE databases.
At the end of this section, we test QRR-2D-QPCA [44] and F-2D-QPCA on the AR face database. We use a popular subset of AR containing 100 individuals with 26 views per individual. Some samples are shown in Figure 7. We use seven nonoccluded color face images in session one for training and the corresponding seven nonoccluded images in session two for testing. All face images are resized to 32 × 32 pixels.
In our experiments, we compare the performance of F-2D-QPCA with QSR-2D-QSPCA. We refer to [44] to specify the parameter setting in the QSR model. First, $\lambda_2$ is set to 0.001. As shown in Table 2, the sparsity of the basis of the QSR model is controlled via the parameter $\lambda_{1,j}$. Following [44], we do not explicitly preassign the value of $\lambda_{1,j}$. Instead, we specify the cardinality (the number of nonzero elements, denoted by card) of the basis. Card is set to 4 and 32 in our test. The QSR model reduces to the QRR model when card = 32.
The number of chosen eigenfaces ranges from 1 to 32. The face recognition rates of the two methods are shown in Figure 8. For the three methods, the projected testing samples are all classified by the nearest-neighbor classifier using the F-norm distance. The difference between F-2D-QPCA and QSR-2D-QSPCA lies in the method of computing the eigenface subspace $V$. Table 6 presents the recognition rates and the time for calculating the eigenfaces when the number of features is chosen as 10, 20 and 30. From the results we can see that F-2D-QPCA reaches the highest face recognition rate and costs less time than the QSR and QRR models. The reason is that the QSR model needs to be solved iteratively, which takes a lot of time.
F-2D-QPCA has a higher recognition rate than QRR-2D-QPCA. Moreover, because many parameters have to be determined, the complexity of the latter is much higher than that of the former.
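The classification step can be sketched as follows. This is a minimal illustration, not the authors' code: each projected sample is assumed to be stored as a real matrix (for a quaternion feature matrix, listing the four real components of every entry yields the same F-norm), and the names `frobenius_distance` and `nearest_neighbor` are hypothetical.

```python
import math

def frobenius_distance(F, G):
    """F-norm of the difference of two equally sized real feature matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_f, row_g in zip(F, G) for x, y in zip(row_f, row_g))
    )

def nearest_neighbor(probe, gallery):
    """Return the label of the gallery feature matrix closest to the probe.

    `gallery` is a list of (label, feature_matrix) pairs.
    """
    best_label, best_distance = None, math.inf
    for label, features in gallery:
        d = frobenius_distance(probe, features)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label

# Toy example with two classes and 2 x 2 feature matrices.
gallery = [
    ("person_a", [[0.0, 0.0], [0.0, 0.0]]),
    ("person_b", [[5.0, 5.0], [5.0, 5.0]]),
]
predicted = nearest_neighbor([[4.0, 5.0], [5.0, 6.0]], gallery)
```

Because every compared method shares this classifier, differences in recognition rate can be attributed to how the eigenface subspace is computed rather than to the decision rule.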

V. CONCLUSION
In this paper, a robust quaternion-matrix-based subspace learning method is presented. F-2D-QPCA employs the F-norm as the distance metric to measure the image covariance matrix. Compared to the F-norm 2DPCA method and most existing 2D-PCA methods, the F-2D-QPCA method is more accurate. Also, compared to the 2D-QPCA method, the F-2D-QPCA method is slightly more robust to outliers. Moreover, our method retains the desirable properties of 2D-QPCA such as rotational invariance and geometric structure. To solve F-2D-QPCA, we propose a nongreedy iterative algorithm which converges well. Experimental results on several color face image databases show the effectiveness of the F-2D-QPCA method.