Robust Ensemble Manifold Projective Non-Negative Matrix Factorization for Image Representation

Projective non-negative matrix factorization (PNMF), a variant of NMF, has received considerable attention. However, existing PNMF methods can be improved in two respects. On the one hand, the squared loss function that measures the reconstruction error is sensitive to noise. On the other hand, it is non-trivial to estimate the intrinsic manifold of the feature space in a principled manner. This paper therefore proposes a new method, robust ensemble manifold projective non-negative matrix factorization (REPNMF), for image representation. Specifically, REPNMF not only accounts for the influence of noise by introducing a sparse noise matrix into the image reconstruction, but also assumes that the intrinsic manifold lies in the convex hull of a set of pre-given manifold candidates. We aim to remove noise from the data and, simultaneously, to find the optimal combination of candidate manifolds that approximates the intrinsic manifold. We develop iterative multiplicative updating rules for the optimization of REPNMF, together with a convergence proof. Experimental results on four image datasets verify that REPNMF is superior to other related state-of-the-art methods.


I. INTRODUCTION
Matrix factorization is a highly effective strategy for representation learning. It aims to find two or more low-rank factor matrices whose product closely approximates the original high-dimensional data matrix. For one thing, the dimension of the decomposed factor matrices is often lower than that of the original data matrix. This provides a compact representation, which is beneficial to many subsequent learning tasks, such as clustering. For another, because the learned latent representation can be represented by the corresponding basis components, we can purposefully regularize the factor matrices for specific tasks. Non-negative matrix factorization (NMF) [1] is one of the most representative matrix factorization techniques. The non-negativity constraints tend to yield a part-based representation in which a zero value represents the absence and a positive value represents the presence of a component. Therefore, NMF allows a non-subtractive combination of parts to make a whole, which closely resembles the human perception mechanism. NMF has demonstrated superior performance in image representation [2]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Mu-Yen Chen. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
One limitation of NMF is that it fails to consider the out-of-sample problem [3]. That is, NMF cannot compute the coefficients of an unseen instance. To overcome this limitation of standard NMF while inheriting its advantages, Yuan et al. [3] developed an improved NMF, named projective non-negative matrix factorization (PNMF), which approximates the data matrix by its non-negative subspace projection. Specifically, PNMF learns the non-negative basis matrix of the low-rank subspace and regards its transpose as the projection matrix.
Due to the non-negativity constraint on the learned projection matrix, PNMF can obtain non-negative coefficients for any newly arriving instance, because the product of a non-negative matrix and a non-negative vector is a non-negative vector. Moreover, PNMF has fewer parameters and generates a much sparser factor matrix, which is beneficial to subsequent learning tasks such as data clustering.
Both NMF and PNMF attempt to obtain new basis vectors such that the data can be well represented. Data points are typically sampled from a sub-manifold of the ambient Euclidean space [8]-[10]. Therefore, considering the intrinsic manifold structure is favourable for learning new data representations. To this end, graph regularized non-negative matrix factorization [2] and graph regularized projective non-negative matrix factorization [11] were proposed, which add a manifold regularizer on the new data representations derived from NMF and PNMF, respectively. Lu et al. [12] proposed projective robust non-negative matrix factorization (PRNF) to make PNMF more robust. However, in many real-world data analysis problems, data points might be sampled from various distributions; hence, it is crucial to estimate the intrinsic manifold in a principled manner.
To address this problem, we propose a novel method termed Robust Ensemble manifold Projective Non-negative Matrix Factorization (REPNMF) for image representation. On the one hand, a sparse error matrix is introduced into REPNMF to capture the noise, so that the factorization can extract more intrinsic information from the cleaned data. On the other hand, we assume that the intrinsic manifold is embedded in the convex hull of a set of pre-given candidate manifolds [13]. The main purpose is to find the combination of candidate manifolds that best approximates the intrinsic manifold. In this way, the local geometrical structure of the data can be better preserved, because different candidate manifolds carry different structural information about the data. We derive an iterative updating rule that repeatedly updates the projective matrix to optimize the objective function, and we present a convergence proof for this updating rule. Experimental results on three face datasets and one science-quality dataset (Chang'e 3 data: Rover Panoramic Camera images [14], [15]) show that REPNMF significantly outperforms related algorithms.
The rest of the paper is structured as follows. Section II provides a short review of related work. In Section III, we introduce the proposed robust ensemble manifold projective non-negative matrix factorization method. In Section IV, we report the experimental results with analysis. Section V concludes the paper.

II. RELATED WORK
This section gives an overview of related work, beginning with the notation used throughout the paper.

A. COMMON NOTATIONS
In this paper, we use uppercase boldface letters to denote matrices and lowercase boldface letters to denote vectors. $M_{ij}$ indicates the $(i, j)$-th element of matrix $M$. The $i$-th element of a vector $a$ is denoted by $a_i$. In the non-negative data matrix $X \in \mathbb{R}_{+}^{M \times N}$, each column vector represents the feature vector of the corresponding item. Throughout the paper, the Frobenius norm of a matrix $M$ is denoted by $\|M\|_F$.

B. NON-NEGATIVE MATRIX FACTORIZATION
Given a non-negative data matrix $X \in \mathbb{R}_{+}^{M \times N}$, which contains $N$ data points in $M$-dimensional space, and a positive integer $K < \min(M, N)$, NMF aims to obtain two lower-rank non-negative matrices $U \in \mathbb{R}_{+}^{M \times K}$ and $V \in \mathbb{R}_{+}^{K \times N}$ whose product closely approximates the original high-dimensional data matrix $X$:
$$\min_{U \ge 0,\, V \ge 0} \; \|X - UV\|_F^2. \tag{1}$$
Although the problem is jointly non-convex in $U$ and $V$, it is convex in $U$ when $V$ is fixed and convex in $V$ when $U$ is fixed. Thus, the optimization problem in Eq. (1) can be solved by efficient multiplicative update rules [16] as follows:
$$U_{ik} \leftarrow U_{ik}\,\frac{(XV^T)_{ik}}{(UVV^T)_{ik}}, \qquad V_{kj} \leftarrow V_{kj}\,\frac{(U^TX)_{kj}}{(U^TUV)_{kj}}. \tag{2}$$
It should be noted that NMF cannot compute the non-negative coefficients of an unseen data point, because it suffers from the out-of-sample problem.
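As a minimal illustration of the multiplicative update rules of NMF [16], the following NumPy sketch alternates the two updates on a small synthetic matrix. The function name `nmf_multiplicative`, the random initialization, the iteration count and the small constant `eps` added to the denominators are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-10, seed=0):
    """Sketch: minimize ||X - U V||_F^2 over non-negative U (M x K), V (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # eps in the denominators is only a numerical safeguard (assumption).
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X) / (U.T @ U @ V + eps)
    return U, V

rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))   # non-negative, rank at most 4
U, V = nmf_multiplicative(X, K=4)
rel_err = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
```

Because every factor in the updates is non-negative, `U` and `V` stay non-negative without any explicit projection step.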

C. PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION
To overcome the inability of NMF to deal with the out-of-sample problem, Yuan et al. [3] developed projective non-negative matrix factorization (PNMF). PNMF obtains a non-negative projective matrix and projects the original data matrix into a latent subspace of lower dimension. Given the basis matrix $U$, PNMF regards $U^TX$ as the new representation and employs $UU^TX$ to reconstruct the data matrix $X$. The objective function of PNMF is formulated as:
$$\min_{U \ge 0} \; J_{PNMF} = \|X - UU^TX\|_F^2. \tag{3}$$
Since $J_{PNMF}$ in Eq. (3) is not convex in $U$, reaching its global minimum cannot be guaranteed. A local minimum of $J_{PNMF}$ is sought using the following updating rule [3]:
$$U_{ik} \leftarrow U_{ik}\,\frac{2\,(XX^TU)_{ik}}{(UU^TXX^TU)_{ik} + (XX^TUU^TU)_{ik}}. \tag{4}$$
Yuan et al. [3] showed that the updating rule in Eq. (4) leads $J_{PNMF}$ to a local minimum.
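The out-of-sample property of PNMF can be illustrated with a short NumPy sketch. It assumes the multiplicative rule of [3] has the form $U \leftarrow U \circ 2XX^TU / (UU^TXX^TU + XX^TUU^TU)$, and it rescales $U$ by its spectral norm after each step; this rescaling is an added safeguard against scale oscillation and is not described in the text. The function name `pnmf` and all parameter values are hypothetical.

```python
import numpy as np

def pnmf(X, K, n_iter=200, eps=1e-10, seed=0):
    """Sketch of PNMF: learn U >= 0 so that X is approximated by U U^T X."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], K))
    A = X @ X.T                                   # M x M, non-negative
    for _ in range(n_iter):
        AU = A @ U
        denom = U @ (U.T @ AU) + A @ (U @ (U.T @ U)) + eps
        U *= 2.0 * AU / denom                     # assumed form of the rule in [3]
        U /= np.linalg.norm(U, 2)                 # assumption: spectral-norm rescaling
    return U

rng = np.random.default_rng(0)
X = rng.random((30, 50))        # 50 training samples as columns
U = pnmf(X, K=5)
x_new = rng.random(30)          # unseen non-negative sample
z_new = U.T @ x_new             # its coefficients; no refitting is needed
```

Since `U` and `x_new` are both non-negative, the coefficients `z_new` are non-negative as well, which is the property that plain NMF lacks for unseen samples.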

D. AUTOMATED GRAPH REGULARIZED PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION
NMF and PNMF fit the data in Euclidean space; however, the intrinsic geometry of the data remains unexplored. To address this issue, Pei et al. [11] presented automated graph regularized projective non-negative matrix factorization (AGPNMF). AGPNMF reformulates regular PNMF by combining automated graph regularization with the PNMF decomposition. Its main feature is that it simultaneously learns the graph weight matrix and performs dimensionality reduction of the raw data. The AGPNMF model is given in Eq. (5), in which $\tau$, $\alpha$ and $\beta$ are constants. Pei et al. [11] also designed the multiplicative update rules in Eqs. (6) and (7) to solve AGPNMF, and demonstrated that the objective function $J_{AGPNMF}$ reaches a local minimum under these updating rules.

E. ROBUST PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION
NMF, PNMF and AGPNMF are all susceptible to noise and are unfit for feature extraction when the data are contaminated by noise. To increase robustness, projective robust non-negative factorization (PRNF) [12] was developed for robust feature extraction. To capture the geometrical structure of the raw data, a graph regularization term is introduced into PRNF. Additionally, PRNF imposes a sparsity-inducing norm on the noise matrix $E$ as a sparsity constraint. PRNF is formulated in Eq. (8), where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are regularization parameters, $L = D - S$, $S$ is the weight matrix of the nearest neighbour graph, $D$ is the diagonal matrix with $D_{ii} = \sum_j S_{ij}$, and the sparsity-inducing norm $\|E\|_{1/2}$ is defined in Eq. (9). Lu et al. [12] designed the alternating scheme in Eqs. (10) and (11) to optimize the objective function $J_{PRNF}$, and demonstrated that $J_{PRNF}$ reaches a local minimum under these updating rules.

III. ROBUST ENSEMBLE MANIFOLD PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION
In this section, we develop a novel robust ensemble manifold projective non-negative matrix factorization (REPNMF) algorithm, which simultaneously accounts for the impact of noise and better captures the geometrical structure of the data.

A. THE BASIC OBJECTIVE
PNMF [3] is an extension of NMF, but the squared loss used in PNMF is sensitive to noisy data. Typically, there are two ways to deal with this. One is to replace the squared loss of PNMF with a noise-robust loss. The other is to remove the noise from the data. In this article, we choose the second approach. Inspired by Robust Principal Component Analysis (RPCA) [17], the raw data matrix $X$ can be decomposed into a low-rank component and a sparse component. In fact, $U$ and $U^TX$ are both low-rank matrices. Intuitively, we can introduce a sparse component matrix $E$ into the objective function, so that the low-rank component leads to a better reconstruction of the underlying structure. Thus, the basic objective function, Eq. (12), measures the reconstruction error of $X$ by $UU^TX + E$ and penalizes $E$ with $\|E\|_1 = \sum_{ij} |E_{ij}|$, the $\ell_1$ norm, which ensures the sparseness of the matrix.

B. ENSEMBLE MANIFOLD
Recall that PNMF attempts to obtain a set of projective vectors that are optimized for the linear approximation of the original data. The $j$-th column of the matrix $U^TX$, $z_j = [(U^TX)_{1j}, \cdots, (U^TX)_{Kj}]^T$, can be viewed as the new representation of the $j$-th instance with respect to the new basis.
A natural assumption is that if two data points $x_j$, $x_l$ are adjacent in the original feature space, then $z_j$, $z_l$, their representations in the new basis, should also be close to each other. This is known as the local invariance assumption, which plays an essential role in various algorithms, including semi-supervised learning algorithms [18] and dimensionality reduction algorithms [8].
Ensemble manifold theory [13] assumes that the intrinsic manifold lies in the convex hull of pre-defined manifold candidates, each of which reflects one kind of data distribution. Ensemble manifold learning essentially combines the diverse manifold candidates so that their optimal convex combination approximates the intrinsic manifold. Let $L_{En}$ be the intrinsic manifold and $L_i$ be the $i$-th candidate manifold, and suppose there are $q$ manifold candidates corresponding to various data distributions. In our work, we use the $k$-nearest neighbour graph to characterize the data distribution. Specifically, for each data point $x_j$, we find its $k$ nearest neighbours and place an edge between $x_j$ and each of them. There are many schemes for defining the weight matrix $S$. The three most common are as follows:
1) 0-1 Weighting. $S_{jl} = 1$ if and only if node $j$ and node $l$ are connected by an edge.
2) Heat kernel Weighting. If node $j$ and node $l$ are connected by an edge, set $S_{jl} = e^{-\|x_j - x_l\|^2 / \sigma}$, where $\sigma > 0$ is the kernel width.
3) Cosine Similarity Weighting. If node $j$ and node $l$ are connected by an edge, set $S_{jl} = \frac{x_j^T x_l}{\|x_j\|\,\|x_l\|}$.
Each candidate manifold is represented by a candidate graph Laplacian $L_i = D_i - S_i$, where $S_i$ is the $i$-th weight matrix and $D_i$ is the diagonal matrix with $(D_i)_{jj} = \sum_l (S_i)_{jl}$. Thus, the ensemble manifold assumption is equivalent to constraining the search space to the convex hull of the candidate graph Laplacians, i.e.,
$$L_{En} = \sum_{i=1}^{q} \mu_i L_i, \quad \text{s.t.} \;\; \sum_{i=1}^{q} \mu_i = 1, \;\; \mu_i \ge 0.$$
Because $L_{En}$ is a convex combination of the $q$ candidate graph Laplacians, it is also a graph Laplacian.
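The construction of the candidate graph Laplacians and their convex combination can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the helper names (`knn_mask`, `candidate_laplacians`, `ensemble_laplacian`), the symmetrization of the $k$-nearest neighbour graph, the kernel width `sigma` and the example weights `mu` are all assumptions.

```python
import numpy as np

def knn_mask(X, k):
    """Symmetric k-NN adjacency (columns of X are samples) and squared distances."""
    sq = (X * X).sum(axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    n = X.shape[1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), idx.ravel()] = True
    return mask | mask.T, d2                      # assumption: symmetrized graph

def candidate_laplacians(X, k=3, sigma=1.0):
    """One graph Laplacian L = D - S per weighting scheme."""
    mask, d2 = knn_mask(X, k)
    norms = np.linalg.norm(X, axis=0) + 1e-12
    cos = (X.T @ X) / np.outer(norms, norms)
    weights = [
        mask.astype(float),                        # 0-1 weighting
        np.where(mask, np.exp(-d2 / sigma), 0.0),  # heat kernel weighting
        np.where(mask, cos, 0.0),                  # cosine similarity weighting
    ]
    return [np.diag(S.sum(axis=1)) - S for S in weights]

def ensemble_laplacian(laplacians, mu):
    """Convex combination of candidate Laplacians; mu must lie on the simplex."""
    mu = np.asarray(mu, dtype=float)
    if np.any(mu < 0) or not np.isclose(mu.sum(), 1.0):
        raise ValueError("mu must be non-negative and sum to one")
    return sum(m * L for m, L in zip(mu, laplacians))

rng = np.random.default_rng(0)
X = rng.random((5, 12))                            # 12 samples in 5 dimensions
Ls = candidate_laplacians(X, k=3, sigma=1.0)
L_en = ensemble_laplacian(Ls, [0.5, 0.3, 0.2])
```

With non-negative edge weights, each candidate is symmetric with zero row sums and positive semi-definite, and these properties carry over to the convex combination `L_en`.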

C. OBJECTIVE FUNCTION
To model the local geometric structure of the data points, we add the ensemble manifold regularizer $\mathrm{Tr}(U^T X L_{En} X^T U)$ to the basic objective function. The objective function of REPNMF, $J_{REPNMF}$ in Eq. (14), therefore consists of the basic objective, the ensemble manifold regularizer and a regularization term on the combination weights $\mu$, where the parameter $\lambda$ controls the contribution of the ensemble manifold regularizer, $\beta$ controls the regularization term $\|E\|_1$ and $\alpha$ controls the regularization term $\|\mu\|^2$.

D. OPTIMIZATION
In this section, we investigate how to optimize the objective function $J_{REPNMF}$ in Eq. (14). $J_{REPNMF}$ is non-convex in $U$, $E$ and $\mu$ jointly, so a closed-form global minimizer is not available. However, each sub-problem is tractable when the other two variables are fixed. We therefore design an alternating scheme to optimize the objective function. The procedure is depicted in Algorithm 1.
Algorithm 1 (alternating optimization of REPNMF), main steps: optimize problem (14) with respect to $U$ while keeping $E$ and $\mu$ fixed; optimize problem (14) with respect to $E$ while keeping $U$ and $\mu$ fixed; optimize problem (14) with respect to $\mu$ while keeping $U$ and $E$ fixed.
For convenience, we write $L$ for $L_{En}$. It can be easily observed that $L = L^{+} - L^{-}$, where $L^{+}_{ij} = (|L_{ij}| + L_{ij})/2$ and $L^{-}_{ij} = (|L_{ij}| - L_{ij})/2$. With this decomposition, the objective function is rewritten as Eq. (15).

1) OPTIMIZATION U
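Let $\psi_{jl}$ be the Lagrange multiplier for the constraint $u_{jl} \ge 0$, and denote $\Psi = [\psi_{jl}]$. We then write the Lagrangian $\mathcal{L}$ as in Eq. (16) and take its partial derivative with respect to $U$, Eq. (17). Using the KKT condition $\psi_{ik} U_{ik} = 0$ and setting $\partial \mathcal{L} / \partial U = 0$, we obtain the multiplicative updating rule in Eq. (18), which involves the terms $D_1 = EX^TU + XE^TU$, $D_2 = UU^TXX^TU + XX^TUU^TU$ and $D_3 = \lambda XL^{+}X^TU$.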
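One way to read the $U$-step is sketched below in NumPy. The sketch assumes that the part of the objective depending on $U$ is $\|X - UU^TX - E\|_F^2 + \lambda\,\mathrm{Tr}(U^TXLX^TU)$; under that assumption the gradient is $2(D_1 + D_2 + D_3) - 2(2XX^TU + \lambda XL^{-}X^TU)$, which suggests a multiplicative step with the second bracket as numerator and $D_1 + D_2 + D_3$ as denominator. The numerator, the function names and the flooring of the denominator at `eps` (needed because $D_1$ can be negative when $E$ has negative entries) are assumptions, not statements of the paper's Eq. (18).

```python
import numpy as np

def update_terms(X, U, E, L, lam):
    """Numerator and raw denominator of the assumed multiplicative U-step."""
    L_pos = (np.abs(L) + L) / 2.0
    L_neg = (np.abs(L) - L) / 2.0
    XtU = X.T @ U
    XXtU = X @ XtU
    D1 = E @ XtU + X @ (E.T @ U)
    D2 = U @ (U.T @ XXtU) + X @ (X.T @ (U @ (U.T @ U)))
    D3 = lam * (X @ (L_pos @ XtU))
    numer = 2.0 * XXtU + lam * (X @ (L_neg @ XtU))
    return numer, D1 + D2 + D3

def update_U(X, U, E, L, lam, eps=1e-10):
    numer, denom = update_terms(X, U, E, L, lam)
    # Assumption: floor the denominator so that U stays non-negative.
    return U * numer / np.maximum(denom, eps)

rng = np.random.default_rng(0)
X = rng.random((6, 8))
U = rng.random((6, 2))
E = np.zeros((6, 8))
E[1, 2], E[4, 5] = 0.3, -0.2                      # a few sparse noise entries
idx = np.arange(8)
S = np.zeros((8, 8))
S[idx, (idx + 1) % 8] = 1.0
S = np.maximum(S, S.T)                            # ring graph as a toy neighbour graph
L = np.diag(S.sum(axis=1)) - S
lam = 0.5
U_new = update_U(X, U, E, L, lam)
```

At a stationary point the numerator and denominator coincide wherever $U_{ik} > 0$, so the step leaves such entries unchanged.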

2) OPTIMIZATION E
It is easy to see that the optimization problem with respect to $E$ is element-wise decoupled. In other words, each entry of $E$ can be optimized independently. Denote $R = X - UU^TX = [r_{ij}]$. Each sub-problem with respect to $e_{ij}$ can then be written as Eq. (19). Its unique solution is given by the soft-thresholding operator [19], Eq. (20). By substituting Eq. (20) into Eq. (19), we obtain Eq. (21). This result reveals that REPNMF adaptively applies the square loss to minor-error entries for accurate reconstruction and the $\ell_1$ loss to major-error entries to reduce the influence of noise.
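The soft-thresholding step can be sketched in a few lines of NumPy. The sketch assumes each sub-problem has the unscaled form $\min_{e} (r_{ij} - e)^2 + \beta |e|$, for which the threshold is $\beta/2$; with a different scaling of the objective the threshold changes accordingly. The function name `soft_threshold` and the example values are illustrative.

```python
import numpy as np

def soft_threshold(R, tau):
    """Entry-wise soft-thresholding: sign(r) * max(|r| - tau, 0)."""
    return np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)

beta = 2.0
R = np.array([[-3.0, -0.2, 0.0],
              [0.5, 2.0, 1.0]])
E = soft_threshold(R, beta / 2.0)   # threshold beta/2 under the assumed sub-problem
residual = R - E                    # part left to the squared reconstruction loss
```

Entries of `R` with magnitude at most the threshold give `E = 0` and are handled entirely by the squared loss, whereas larger entries are shrunk so that the remaining residual is capped at the threshold. This matches the behaviour described above for minor-error and major-error entries.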

3) OPTIMIZATION µ
After fixing $U$ and $E$, the objective function simplifies to Eq. (22), a problem in $\mu$ alone. This is a classical quadratic programming problem, for which various effective convex programming packages are available. Here, this convex optimization problem is solved with CVX, a Matlab-based modeling framework for convex optimization.
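As an alternative to a general-purpose solver, the $\mu$-step can be sketched with a Euclidean projection onto the simplex. This is a swapped-in technique, not the CVX-based procedure used in the paper. It assumes the sub-problem has the form $\min_{\mu} \lambda \sum_i \mu_i c_i + \alpha \|\mu\|^2$ subject to $\sum_i \mu_i = 1$, $\mu_i \ge 0$, with $c_i = \mathrm{Tr}(U^TXL_iX^TU)$; completing the square then gives $\mu = \mathrm{Proj}_{\Delta}(-\lambda c / (2\alpha))$. All function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def manifold_costs(X, U, laplacians):
    """c_i = Tr(U^T X L_i X^T U) for each candidate Laplacian L_i."""
    Z = U.T @ X
    return np.array([np.trace(Z @ L @ Z.T) for L in laplacians])

def update_mu(costs, lam, alpha):
    """Minimizer of lam * costs @ mu + alpha * ||mu||^2 over the simplex."""
    return project_simplex(-lam * np.asarray(costs, dtype=float) / (2.0 * alpha))

costs = np.array([1.0, 2.0, 3.0])
mu_smooth = update_mu(costs, lam=1.0, alpha=100.0)   # large alpha: near-uniform weights
mu_sparse = update_mu(costs, lam=1.0, alpha=1e-6)    # small alpha: cheapest candidate wins
```

The two calls show the role of $\alpha$ under this assumed form: a large value spreads the weight over all candidates, while a small value concentrates it on the candidate with the smallest regularization cost.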

E. CONVERGENCE ANALYSIS
In the following, we analyze the convergence of the updating rules in Eq. (18) and Eq. (20). $O_{ij}$ attains its minimum when $e_{ij}$ is updated by Eq. (20), so $O_{ij}$ is non-increasing under Eq. (20). It therefore only remains to prove that $O_{ij}$ is non-increasing under the updating rule for $U$ in Eq. (18). For the iterative updating rule (18), we have the following theorem.
Theorem 1: The objective function (14) does not increase under the update rule in (18).
To prove Theorem 1, we introduce an auxiliary function, for which the following definitions and lemmas are required. Let $u_{ab}$ be any element of $U$ and let $F_{u_{ab}}$ denote the part of (15) relevant to $u_{ab}$.
Lemma 2: The function $G(u, u_{ab}^{t})$ defined in (24) is an auxiliary function for $F_{u_{ab}}$, which is the only part of (14) relevant to $u_{ab}$. Proof: Since $G(u, u) = F_{u_{ab}}(u)$, we only need to prove $G(u, u_{ab}^{t}) \ge F_{u_{ab}}(u)$. We compare $G(u, u_{ab}^{t})$ with the Taylor series expansion of $F_{u_{ab}}(u)$, in which $F''_{u_{ab}}$ is the second-order derivative with respect to $u_{ab}$.
Using this expansion, proving $G(u, u_{ab}^{t}) \ge F_{u_{ab}}(u)$ is equivalent to proving the inequality in (27), which can be verified directly. Thus, (27) holds and $G(u, u_{ab}^{t}) \ge F_{u_{ab}}(u)$. Proof of Theorem 1: Replacing $G(u, u_{ab}^{t})$ in (23) by (24) results in the update rule for $u_{ab}$. Because (24) is an auxiliary function, $F_{u_{ab}}$ is non-increasing under this updating rule. Given the current solution $U'$, we approximate $J_{REPNMF}$ by its Taylor-series expansion and construct an auxiliary function $G(U, U')$ of $J_{REPNMF}$. It is easy to verify that $J_{REPNMF}(U') = G(U', U')$. Next, we prove that $J_{REPNMF}(U) \le G(U, U')$ to complete the proof. For any $z > 0$, we have $z \ge 1 + \log z$. Substituting $z = U_{ik} / U'_{ik}$ into this inequality yields (34). By substituting (34) and (33) into (30), we prove that $J_{REPNMF}(U) \le G(U, U')$.
Assuming that $U$ is the minimizer of $G(U, U')$, we get the chain of inequalities in (35). It remains to calculate $U$ and verify its non-negativity. To this end, we set the gradient of $G(U, U')$ to zero, Eq. (36), which gives Eq. (37). Because (37) contains only multiplications and divisions of non-negative entries, $U$ is a non-negative matrix.
It is clear that (37) is equivalent to (18), and thus (35) implies that the update (18) does not increase the objective function of REPNMF. This completes the proof.

IV. EXPERIMENT
We conduct extensive experiments on four image datasets, ORL, YALE, UMIST and the Chang'e 3 image database, to validate the efficacy of REPNMF.

1) DATASETS AND METRICS
Four datasets are chosen in the experiment. Table 1 summarizes the statistical information of these datasets. Chang'e 3 is a Chinese robotic lunar exploration mission launched by the China National Space Administration (CNSA). It is the first Chinese mission to include a robotic lander and a lunar rover. Over the past decades, China has been successful in most of its space and lunar missions. Prof. KL Yung [20]-[22] has led many space missions successfully, and his team at the Hong Kong Polytechnic University designed the camera system installed on the lunar surface lander to acquire images. For the current experiment, 200 images of 10 classes are randomly selected from the Chang'e 3 images. All images are cropped to grey-scale images and then reshaped into 1024-dimensional vectors.
To evaluate the clustering performance, we use two commonly used metrics: clustering accuracy (ACC) and normalized mutual information (NMI). They are defined as follows:
$$ACC = \frac{\sum_{i=1}^{N} \delta(s_i, map(r_i))}{N}, \tag{38}$$
$$NMI(C, C^{\dagger}) = \frac{MI(C, C^{\dagger})}{\max(H(C), H(C^{\dagger}))}, \tag{39}$$
where $N$ is the number of items, $s_i$ is the ground-truth cluster label of item $i$ and $r_i$ is the clustering result for item $i$. If $x$ is equal to $y$, then $\delta(x, y) = 1$; otherwise $\delta(x, y) = 0$. $map(r_i)$ is the permutation mapping function that maps $r_i$ to the equivalent cluster label in the ground truth. $H(C)$ denotes the entropy of the cluster set $C$. $MI(C, C^{\dagger})$ is the mutual information between $C$ and $C^{\dagger}$:
$$MI(C, C^{\dagger}) = \sum_{c_i \in C,\; c_j^{\dagger} \in C^{\dagger}} p(c_i, c_j^{\dagger}) \log \frac{p(c_i, c_j^{\dagger})}{p(c_i)\, p(c_j^{\dagger})}, \tag{40}$$
where $p(c_i)$ is the probability that a randomly selected item belongs to cluster $c_i$, and $p(c_i, c_j^{\dagger})$ is the joint probability that a randomly selected item belongs to both $c_i$ and $c_j^{\dagger}$. If the two cluster sets are identical, $NMI(C, C^{\dagger}) = 1$. If the two cluster sets are fully independent, $NMI(C, C^{\dagger}) = 0$.
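The two metrics can be computed with the following NumPy sketch. It makes two assumptions that the text does not fix: the permutation mapping is found by exhaustive search, which is only practical for a handful of clusters (for larger problems an assignment solver such as `scipy.optimize.linear_sum_assignment` would normally be used instead), and NMI is normalized by the larger of the two entropies. The function names are illustrative.

```python
import itertools
import numpy as np

def clustering_accuracy(truth, pred):
    """ACC under the best one-to-one mapping of predicted clusters to classes."""
    truth, pred = np.asarray(truth).ravel(), np.asarray(pred).ravel()
    t_ids, p_ids = np.unique(truth), np.unique(pred)
    n = max(t_ids.size, p_ids.size)
    cont = np.zeros((n, n), dtype=int)            # padded contingency table
    for i, p in enumerate(p_ids):
        for j, t in enumerate(t_ids):
            cont[i, j] = np.sum((pred == p) & (truth == t))
    # Exhaustive search over mappings: feasible only for small n (assumption).
    best = max(sum(cont[i, perm[i]] for i in range(n))
               for perm in itertools.permutations(range(n)))
    return best / truth.size

def nmi(truth, pred):
    """Mutual information divided by max(H(C), H(C')) (assumed normalization)."""
    truth, pred = np.asarray(truth).ravel(), np.asarray(pred).ravel()
    t_ids, t_inv = np.unique(truth, return_inverse=True)
    p_ids, p_inv = np.unique(pred, return_inverse=True)
    joint = np.zeros((t_ids.size, p_ids.size))
    np.add.at(joint, (t_inv, p_inv), 1.0)
    joint /= truth.size
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log2(joint[nz] / np.outer(pt, pp)[nz]))
    h_t = -np.sum(pt * np.log2(pt))
    h_p = -np.sum(pp * np.log2(pp))
    denom = max(h_t, h_p)
    return mi / denom if denom > 0 else 1.0

truth = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]          # same partition, labels permuted
acc_value = clustering_accuracy(truth, pred)
nmi_value = nmi(truth, pred)
```

Both metrics are invariant to how the clusters are labelled, so the permuted labelling in the example is scored as a perfect clustering.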

2) COMPARED ALGORITHMS
To highlight the benefit of the new method over baseline algorithms, REPNMF is compared with the following methods:
• Traditional k-means clustering method (Kmeans in short).
• Projective robust non-negative factorization [12]-based clustering (PRNF in short).
Table 2 reports the clustering accuracy of all methods on the four datasets, whereas Table 3 reports the normalized mutual information. The observations are analyzed as follows. Firstly, methods that use a part-based representation outperform Kmeans. These results are consistent with previous work on NMF. Secondly, PNMF-based methods outperform NMF-based methods, which indicates that learning a spatially localized, sparse and part-based subspace representation can improve the discriminative structure of the latent space. Thirdly, AGPNMF and PRNF perform better than PNMF. As discussed in Section II, the reason is that AGPNMF and PRNF consider the local manifold structure of the data, which ultimately improves performance. Finally, REPNMF outperforms all the baseline methods in almost all cases. This is because REPNMF integrates the ensemble manifold into PNMF, which better captures the intrinsic information of the dataset. Furthermore, REPNMF imposes an $\ell_1$-norm sparsity constraint on the noise matrix, which effectively weakens the noise.

4) PARAMETER STUDY
REPNMF has four parameters: $\lambda$, $\beta$, $\alpha$ and the number of nearest neighbours $k$. $\lambda$ measures the importance of the ensemble manifold regularization term of REPNMF, $\beta$ controls the sparsity of the noise matrix, and $\alpha$ controls the smoothness of the weight vector $\mu$. We investigate the impact of these parameters on REPNMF by varying one parameter while keeping the others fixed. For each setting, REPNMF is run 10 times and the average performance is recorded. The performance of REPNMF with respect to $k$ is shown in Fig. 1(a). $k$ is the number of nearest neighbours. If $k$ is too small, the local structure will not be fully exploited. If $k$ is too large, the nearest neighbour graph may connect two samples with different labels. REPNMF tends to perform better when $k \in [2, 6]$.
The performance of REPNMF with respect to $\lambda$ is shown in Fig. 1(b). $\lambda$ measures the importance of the ensemble manifold regularization. If $\lambda$ is too small, the regularizer has little effect on the objective function and the geometrical structure of the data cannot be fully exploited. On the contrary, a large $\lambda$ may dominate the objective function and lead to a trivial solution. REPNMF shows superior performance when $\lambda = 100$ for YALE and $\lambda = 1000$ for ORL and UMIST.
The optimization method for REPNMF solves the sub-problems for $U$ and $E$ iteratively to obtain a local minimum of (14). We analyze its empirical convergence on three datasets. Fig. 2(a), Fig. 2(b) and Fig. 2(c) present the objective function values against the number of iterations for ORL, YALE and UMIST, respectively. The results indicate that the objective function value decreases sharply in the first iterations, while the performance increases dramatically. The optimization procedure converges within dozens of iterations on all three datasets.

V. CONCLUSIONS
In this work, we proposed robust ensemble manifold projective non-negative matrix factorization (REPNMF), a novel non-negative latent representation learning algorithm for image representation. REPNMF learns a projective subspace of items by exploiting both the cleaned data and the ensemble manifold. On the one hand, the $\ell_1$ norm is added to REPNMF as a sparsity constraint on the noise, because it can efficiently weaken the influence of noise. On the other hand, the ensemble manifold is incorporated into REPNMF, which better models the local geometrical structure of the data. To optimize the objective function, we provided an alternating method to learn the variables and presented a convergence proof for the optimization scheme. Finally, the experimental results on three face datasets (ORL, YALE, UMIST) and one science-quality dataset (Chang'e 3 data: Rover Panoramic Camera images [14]) have demonstrated that the proposed method achieves more competitive performance than the compared alternatives.