Marginal Fisher Analysis With Polynomial Matrix Function

Marginal Fisher analysis (MFA) is a dimensionality reduction method based on a graph embedding framework. In contrast to traditional linear discriminant analysis (LDA), which requires the data to follow a Gaussian distribution, MFA is suitable for non-Gaussian data and has better pattern classification ability. However, MFA suffers from the small-sample-size (SSS) problem. This paper aims to solve the small-sample-size problem while improving the classification performance of MFA. Based on a matrix function dimensionality reduction framework, the criterion of the MFA method is reconstructed using a polynomial matrix function transformation, and a new MFA method is proposed, named PMFA (polynomial marginal Fisher analysis). The major contributions of the proposed PMFA method are that it solves the small-sample-size problem of MFA and that it enlarges the distance between marginal inter-class sample points, so it achieves better pattern classification performance. Experiments on public face datasets show that PMFA achieves better classification performance than MFA and its improved variants.

Many dimensionality reduction methods have been proposed. Principal component analysis (PCA) [1] and LDA [2] are widely used linear subspace algorithms. As an unsupervised learning algorithm, the principle of PCA is to maximize the covariance of the reduced-dimension samples. As a supervised learning algorithm, the principle of LDA is to make different classes as far apart as possible and samples of the same class as close as possible after dimensionality reduction.

There are many manifold-based dimensionality reduction methods, such as ISOMAP [3], LLE [4], LE [5], MVU [6], t-SNE [7], LPP [8], and NPE [9]. What they have in common is that each finds a neighborhood around every sample point and retains the local structure of the sample points while mapping the high-dimensional data into a low-dimensional space. With the emergence of these classical manifold learning algorithms, some researchers have sought to unify manifold learning algorithms within a single framework.

In mathematics, any function can be approximated by a polynomial, so a polynomial can be used to uniformly represent the various optional forms of functions discussed above. This makes the framework easier to use.

This paper aims to solve the SSS problem and improve the classification ability of MFA. Based on the above idea, i.e., combining the polynomial function with the framework in [21], we propose a new MFA method, named PMFA (polynomial marginal Fisher analysis). Specifically, we use two appropriate polynomials to map the scatter matrices of MFA into a new space, which avoids the SSS problem and yields better pattern classification performance. We also discuss the design of the two polynomial functions and provide a theoretical analysis of the proposed method. Experiments are conducted on a synthetic dataset and several public face datasets, which show that the proposed PMFA is an effective method. As an effective feature extraction method, like MFA and its variants, PMFA can be applied in many fields, such as face recognition [13], [16], facial expression recognition [22], autism trait classification [23], and image representation [24].

The remainder of this paper is organized as follows. Section II summarizes MFA and the matrix function dimensionality reduction framework. Section III presents polynomial marginal Fisher analysis (PMFA). Section IV verifies PMFA with experiments. Finally, Section V summarizes the study and future directions.

MFA first constructs an intrinsic graph $G^c = \{X, W\}$ and a penalty graph $G^p = \{X, W^p\}$, which are used to describe intra-class compactness and inter-class separability, respectively. Then, MFA tries to find an optimal projection matrix $U$ and makes a projection $y_i = U^T x_i$, so that the dimension of $y_i$ is smaller than that of $x_i$. To clarify the method, Table 1 summarizes the frequently used notations.

The intra-class compactness of the projected samples is defined as:

$$\tilde{S}_c = \sum_{i}\ \sum_{j \in N_{k_1}(i)\ \mathrm{or}\ i \in N_{k_1}(j)} \left\| U^T x_i - U^T x_j \right\|^2 = 2\,\mathrm{tr}\!\left( U^T X L X^T U \right),$$

where $N_{k_1}(i)$ represents the index set of the $k_1$ nearest same-class neighbors of the sample $x_i$ in the intrinsic graph $G^c$, and $L = D - W$ is the Laplacian matrix of the intrinsic graph, with $D$ the diagonal degree matrix $D_{ii} = \sum_j W_{ij}$.

The inter-class separability of the projected samples is defined as:

$$\tilde{S}_p = \sum_{i}\ \sum_{(i,j) \in N_{k_2}(c_i)\ \mathrm{or}\ (i,j) \in N_{k_2}(c_j)} \left\| U^T x_i - U^T x_j \right\|^2 = 2\,\mathrm{tr}\!\left( U^T X L^p X^T U \right),$$

where $N_{k_2}(c_i)$ represents the index set of the $k_2$ nearest pairs for the class $c_i$ in the penalty graph $G^p$, and $L^p$ is the Laplacian matrix of the penalty graph.
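For concreteness, the two graphs can be sketched in code. The helper below is our own illustration (its name `build_mfa_graphs` is ours, and it simplifies MFA's per-class nearest-pair definition of the penalty graph to per-sample out-of-class nearest neighbors); it builds the adjacency matrices $W$ and $W^p$ and their Laplacians with NumPy:

```python
import numpy as np

def build_mfa_graphs(X, labels, k1=5, k2=20):
    """Construct intrinsic/penalty adjacency matrices and their Laplacians.

    X: (D, N) data matrix, one sample per column; labels: (N,) class ids.
    Note: for simplicity the penalty graph links each sample to its k2 nearest
    neighbors from *other* classes, an approximation of MFA's per-class
    nearest-pair definition.
    """
    N = X.shape[1]
    # squared Euclidean distances between all pairs of samples
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist, np.inf)

    W = np.zeros((N, N))   # intrinsic graph: same-class k1 nearest neighbors
    Wp = np.zeros((N, N))  # penalty graph: out-of-class k2 nearest neighbors
    for i in range(N):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        diff = np.where(labels != labels[i])[0]
        nn_same = same[np.argsort(dist[i, same])[:k1]]
        nn_diff = diff[np.argsort(dist[i, diff])[:k2]]
        W[i, nn_same] = W[nn_same, i] = 1.0
        Wp[i, nn_diff] = Wp[nn_diff, i] = 1.0

    L = np.diag(W.sum(axis=1)) - W      # intrinsic Laplacian
    Lp = np.diag(Wp.sum(axis=1)) - Wp   # penalty Laplacian
    return L, Lp
```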

The marginal Fisher criterion is defined as:

$$U^* = \arg\max_{U} \frac{\mathrm{tr}\!\left(U^T X L^p X^T U\right)}{\mathrm{tr}\!\left(U^T X L X^T U\right)}.$$

Let $Z^p = X L^p X^T$ and $Z = X L X^T$. The above criterion can then be solved as the generalized eigenvector problem:

$$Z^p u = \lambda\, Z u,$$

where the eigenvectors associated with the largest eigenvalues form the columns of the projection matrix $U$. The basic MFA algorithm is given in Algorithm 1.
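A minimal sketch of this step with SciPy's generalized symmetric eigensolver is shown below; the function name `mfa_projection` is ours, and the sketch assumes the Laplacians produced by the previous snippet:

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, L, Lp, dim):
    """Basic MFA: solve Z^p u = lambda Z u and keep the top `dim` eigenvectors.

    X: (D, N) data matrix; L, Lp: intrinsic and penalty Laplacians (N, N).
    Note: when the feature dimension D exceeds N, Z = X L X^T is singular and
    this generalized eigensolver fails -- the small-sample-size problem.
    """
    Z = X @ L @ X.T       # intra-class compactness matrix
    Zp = X @ Lp @ X.T     # inter-class separability matrix
    # eigh solves the symmetric generalized problem Zp u = lambda Z u;
    # it requires Z to be positive definite
    eigvals, eigvecs = eigh(Zp, Z)
    order = np.argsort(eigvals)[::-1]    # largest ratio first
    return eigvecs[:, order[:dim]]       # projection matrix U of shape (D, dim)
```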

For the above generalized eigenvector problem, the rank of the matrix $Z$ satisfies the inequality

$$\mathrm{rank}(Z) = \mathrm{rank}\!\left(X L X^T\right) \le \min\{\mathrm{rank}(X), \mathrm{rank}(L)\} \le N.$$

When the feature dimension is larger than the number of samples $N$, the matrix $Z$ is singular, so the generalized eigenvector problem cannot be solved directly; this is the small-sample-size problem of MFA.

Algorithm 1 (basic MFA): construct the intrinsic and penalty graphs, compute $Z$ and $Z^p$, compute the eigenvalues and eigenvectors of the generalized eigenvector problem, and output the final linear projection matrix $U$.

The definition of the matrix function and its corresponding properties, which are used in this paper, are presented next. For a symmetric matrix $Z$ with eigendecomposition $Z = V \Lambda V^T$, where $\Lambda$ is the diagonal matrix of eigenvalues, the matrix function induced by a scalar function $f$ is defined as

$$f(Z) = V\, f(\Lambda)\, V^T, \qquad f(\Lambda) = \mathrm{diag}\!\left(f(\lambda_1), f(\lambda_2), \ldots\right),$$

so that $f$ acts on the eigenvalues of $Z$ while leaving its eigenvectors unchanged.

The criterion function of the manifold-based dimensionality reduction framework replaces the two scatter matrices of the original method with matrix functions of themselves, and the resulting trace-ratio criterion can again be reduced to a generalized eigenvector problem.

According to Section II.C, we can solve the small-sample-size problem and improve the classification performance of MFA by selecting two suitable functions to map the matrices $Z^p$ and $Z$ to the corresponding matrix functions, that is:

$$Z^p \rightarrow f(Z^p), \qquad Z \rightarrow g(Z).$$
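Since $Z$ and $Z^p$ are symmetric, a polynomial matrix function can be evaluated through the eigendecomposition, as in the following sketch (the helper name `poly_matrix_function` is ours):

```python
import numpy as np

def poly_matrix_function(Z, coeffs):
    """Evaluate the polynomial matrix function f(Z) = sum_k coeffs[k] * Z^k.

    Z must be symmetric; f acts on the eigenvalues of Z while keeping its
    eigenvectors, i.e. f(Z) = V diag(f(lambda_1), f(lambda_2), ...) V^T.
    coeffs = (a_0, a_1, ..., a_n), ordered from the constant term upward.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    eigvals, V = np.linalg.eigh(Z)                # Z = V diag(eigvals) V^T
    f_vals = np.polyval(coeffs[::-1], eigvals)    # np.polyval wants highest power first
    return (V * f_vals) @ V.T                     # V diag(f(eigvals)) V^T
```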

After mapping, we get a new criterion function:

$$U^* = \arg\max_{U} \frac{\mathrm{tr}\!\left(U^T f(Z^p)\, U\right)}{\mathrm{tr}\!\left(U^T g(Z)\, U\right)}.$$

According to Theorem 2, a polynomial can approximate any function; that is, any function can be formulated by a polynomial. So, inspired by Theorem 2, we use polynomial functions to implement the above objectives.

We choose an $n$-order polynomial $f(x) = \sum_{k=0}^{n} a_k x^k$ to map the matrix $Z^p$, so the resulting matrix function is $f(Z^p) = a_0 I + a_1 Z^p + \cdots + a_n (Z^p)^n$. Simultaneously, we use a simple linear function $g(x) = b + x$ ($b > 0$) to map the matrix $Z$, and the resulting matrix function is $g(Z) = bI + Z$. Thus, the criterion becomes:

$$U^* = \arg\max_{U} \frac{\mathrm{tr}\!\left(U^T \left(a_0 I + a_1 Z^p + \cdots + a_n (Z^p)^n\right) U\right)}{\mathrm{tr}\!\left(U^T \left(bI + Z\right) U\right)}.$$

It can be reduced to the generalized eigenvector problem:

$$f(Z^p)\, u = \lambda\, (bI + Z)\, u.$$

Thus, a new MFA method has been presented. Since a polynomial is used to reconstruct the criterion of MFA, this new method is named polynomial marginal Fisher analysis (PMFA).

Algorithm 2 (PMFA): construct the intrinsic and penalty graphs, compute $f(Z^p)$ and $g(Z)$, compute the eigenvalues and eigenvectors of the generalized eigenvector problem, and output the final linear projection matrix $U$.

In this section, we theoretically discuss why we choose $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ ($a_k > 0$, $k = 0, 1, \ldots, n$) and $g(x) = b + x$ ($b > 0$) to map the matrices $Z^p$ and $Z$.
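Before turning to that discussion, a minimal sketch of the PMFA projection step is given below, reusing `poly_matrix_function` from the previous snippet; the default coefficient values and $b$ are placeholders for illustration, not values recommended by the paper:

```python
import numpy as np
from scipy.linalg import eigh

def pmfa_projection(X, L, Lp, dim, coeffs=(1.0, 1.0, 1.0), b=1e-3):
    """PMFA: solve f(Z^p) u = lambda (bI + Z) u, keep the top `dim` eigenvectors.

    coeffs = (a_0, ..., a_n) defines f(x) = a_0 + a_1 x + ... + a_n x^n (a_k > 0);
    g(x) = b + x with b > 0, so g(Z) = bI + Z is always positive definite and the
    small-sample-size problem no longer arises.
    """
    Z = X @ L @ X.T
    Zp = X @ Lp @ X.T
    fZp = poly_matrix_function(Zp, coeffs)     # f(Z^p)
    gZ = b * np.eye(Z.shape[0]) + Z            # g(Z) = bI + Z
    eigvals, eigvecs = eigh(fZp, gZ)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:dim]]             # projection matrix U of shape (D, dim)
```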

After mapping the matrix $Z$ with the linear function $g(x) = b + x$ ($b > 0$), the resulting matrix function is $g(Z) = bI + Z$. Let $\lambda_{wi}$ be the eigenvalues of the matrix $Z$. We know that the matrix $Z$ is positive semidefinite according to Section II.A, so $\lambda_{wi} \ge 0$ and $g(\lambda_{wi}) = b + \lambda_{wi} > 0$; hence $g(Z)$ is nonsingular and the SSS problem no longer arises.

The inter-class margin distance $d_b$ and the intra-class distance $d_w$ in the sample space can be expressed through the scatter matrices $Z^p$ and $Z$. Let $\lambda_{bi}$ and $\lambda_{wi}$ be the eigenvalues of $Z^p$ and $Z$, respectively; the two distances can be written as

$$d_b = \sum_i \lambda_{bi}, \qquad d_w = \sum_i \lambda_{wi}.$$

For the PMFA method, the matrix $Z^p$ is mapped into $f(Z^p)$ and the matrix $Z$ into $g(Z)$, so the corresponding distances become

$$d_b' = \sum_i f(\lambda_{bi}), \qquad d_w' = \sum_i g(\lambda_{wi}).$$

Usually, the eigenvalues $\lambda_{bi}$ of the matrix $Z^p$ take large values, so we have $f(\lambda_{bi}) = \sum_{k=0}^{n} a_k \lambda_{bi}^k \gg \lambda_{bi}$ and $g(\lambda_{wi}) = b + \lambda_{wi} \approx \lambda_{wi}$. Then, we have $d_b' \gg d_b$ and $d_w' \approx d_w$.
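A quick numerical check of this argument, using made-up eigenvalues and coefficients purely for illustration:

```python
import numpy as np

# hypothetical eigenvalues of Z^p (inter-class) and Z (intra-class)
lam_b = np.array([50.0, 20.0, 8.0])
lam_w = np.array([6.0, 3.0, 1.0])

a = np.array([1.0, 1.0, 1.0])        # f(x) = 1 + x + x^2, all a_k > 0
b = 1e-3                             # g(x) = b + x with a small b > 0

f_lam_b = np.polyval(a[::-1], lam_b)  # f amplifies the large eigenvalues
g_lam_w = b + lam_w                   # g leaves the small ones nearly unchanged

print(f_lam_b.sum() / lam_b.sum())    # d_b' / d_b  >> 1  (about 39 here)
print(g_lam_w.sum() / lam_w.sum())    # d_w' / d_w  ~  1  (about 1.0003 here)
```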

In this way, with the polynomial mapping, the PMFA method keeps the intra-class distance almost unchanged while greatly enlarging the marginal space between the inter-class samples, which is beneficial to pattern classification.

To illustrate the main idea of the PMFA method visually, we present a geometric interpretation of PMFA in Fig. 1. For convenience, a two-class example is used for illustration. The red circles and the blue circles represent two different classes, and the red square and the blue square represent the centers of the two classes. In Fig. 1, orange lines represent the intra-class distance, and green lines represent the inter-class margin distance. Fig. 1(a) shows the initial sample space. In Fig. 1(b), PMFA uses the polynomial function to map the initial samples into a new space where the intra-class distance is almost unchanged and the inter-class margin distance is enlarged. Fig. 1(c) shows the new space after the projection of the samples.
According to this analysis, the classification ability of PMFA should be better than that of EMFA, and much better than that of MFA.

The inter-class and intra-class distances are computed according to Eqs. (20) and (21), where the first ten largest eigenvalues are used. Table 2 shows the comparison of the results of the MFA and PMFA methods on four face datasets. As can be seen, compared with MFA, PMFA increases the inter-class distance and maintains the intra-class distance.
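Equations (20) and (21) are not reproduced here; assuming, in line with the analysis above, that the two distances are measured as the sums of the ten largest eigenvalues of the corresponding (mapped) scatter matrices, a Table 2-style comparison could be computed as follows (the function name and this reading of the equations are our assumptions):

```python
import numpy as np

def distance_comparison(Z, Zp, fZp, gZ, k=10):
    """Inter-/intra-class distances as sums of the k largest eigenvalues."""
    top = lambda M: np.sort(np.linalg.eigvalsh(M))[-k:].sum()
    return {
        "MFA":  {"inter": top(Zp),  "intra": top(Z)},
        "PMFA": {"inter": top(fZp), "intra": top(gZ)},
    }
```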
In the further experiments, the PMFA method is also compared with other related methods.

The synthetic Gaussian-distributed data form a 3-dimensional dataset, shown in Fig. 3(a). This three-class set contains 600 points, and each class is generated from a single Gaussian.

Fig. 3(b) shows the projection into a 1-D subspace using LDA, Fig. 3(c) and (d) show the 2-D projections of MFA and PMFA, respectively, and Fig. 3(e) shows the projection using PMFA in a 3-D subspace. As we can see, both MFA and PMFA produce projected data with good discriminative ability. We can also see that, compared with MFA, PMFA makes the intra-class samples more compact and the marginal distance between inter-class samples larger.
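A sketch for reproducing a synthetic experiment of this kind, using the helpers above; the class means, covariance, and per-class sample counts are our own choices rather than the exact settings behind Fig. 3:

```python
import numpy as np

rng = np.random.default_rng(0)

# three Gaussian classes in 3-D, 200 points each (600 in total)
means = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
X = np.vstack([rng.normal(m, 1.0, size=(200, 3)) for m in means]).T   # (3, 600)
labels = np.repeat([0, 1, 2], 200)

# project with MFA and PMFA using the earlier sketches and compare the embeddings
L, Lp = build_mfa_graphs(X, labels, k1=5, k2=20)
U_mfa = mfa_projection(X, L, Lp, dim=2)
U_pmfa = pmfa_projection(X, L, Lp, dim=2)
Y_mfa, Y_pmfa = U_mfa.T @ X, U_pmfa.T @ X    # 2-D embeddings to plot side by side
```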

In Fig. 3(f), a binary classification problem shows the classification ability of LDA, MFA, and PMFA in the case of a non-Gaussian distribution. The red solid circles and blue solid circles are two different classes of synthetic data, which do not follow a Gaussian distribution. The solid lines represent the optimal classification lines, and the dotted lines represent the optimal projection directions learned by LDA, MFA, and PMFA, respectively. The results show that: (1) in the case of a non-Gaussian distribution, LDA does not work well, but MFA and PMFA can still find good projection directions; (2) the projection direction learned by PMFA is better than that of MFA, because PMFA not only considers the marginal points but also enlarges the distance between the inter-class samples.

The face datasets used in the experiments are summarized in Table 3. For each experiment, the recognition rate corresponding to the optimal subspace dimension is taken as the optimal recognition rate. Therefore, for the three experiments, there are three optimal recognition rates, and finally the average recognition rate of these methods is reported.

Figure 5: sample pictures taken from the datasets used in the experiments. The first line is from the ORL face dataset, the second line is from the Yale face dataset, the third line is from the Georgia Tech face dataset, and the last line is from the AR face dataset.

We also evaluate the performance of these methods as the subspace dimension varies: for each dimension there is a recognition rate, and when the subspace dimension is between 10 and 100, the recognition rate of each method in each dimension can be obtained. Figs. 6-9 show how the recognition rate varies with the dimension.

Comparisons with the latest methods are reported in Table 8; some of these results are quoted from the original publications because the source code of these methods is not available. The results show that the recognition rate of the PMFA method is better than that of the latest methods.
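For reference, a hedged sketch of the recognition-rate protocol described above (projection to a subspace, a nearest-neighbor classifier, and a sweep over subspace dimensions); the choice of a 1-nearest-neighbor classifier and these helper names are assumptions where the text does not spell them out:

```python
import numpy as np

def recognition_rate(X_train, y_train, X_test, y_test, U):
    """1-NN recognition rate in the subspace spanned by the columns of U."""
    A, B = U.T @ X_train, U.T @ X_test                       # project both sets
    d = ((B[:, :, None] - A[:, None, :]) ** 2).sum(axis=0)   # test-to-train distances
    pred = y_train[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_test))

def sweep_dimensions(X_train, y_train, X_test, y_test, U_full, dims=range(10, 101, 10)):
    """Recognition rate for each subspace dimension, as in Figs. 6-9."""
    return {d: recognition_rate(X_train, y_train, X_test, y_test, U_full[:, :d])
            for d in dims}
```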