Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review

Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a baseline, we show how to improve NMF by utilizing the main properties of HSIs (e.g., spectral, spatial, and structural information). We categorize three important development directions including constrained NMF, structured NMF, and generalized NMF. Furthermore, several experiments are conducted to illustrate the effectiveness of associated algorithms. Finally, we conclude the article with possible future directions with the purposes of providing guidelines and inspiration to promote the development of hyperspectral unmixing.


MV
H YPERSPECTRAL images (HSIs) acquired by imaging spectrometers, record hundreds or thousands of spectral bands of the observed scene in a single acquisition [1]- [6].Owing to the wealthy spectral information, HSIs have been applied to many applications including agricultural and military defense [7]- [9], food quality control [10], [11], mineralogical mapping of earth surface [12], and pharmaceutical manufacturing industry [13].Due to the complexity of objects and the relatively low spatial resolution, pixels in HSIs are normally composed of mixed spectral responses from multiple ground objects [4], [14].Mixed pixels affect the performance of hyperspectral analysis, such as object classification and identification.To address this problem, hyperspectral unmixing is developed to decompose each pixel of an HSI into a set of endmembers and their corresponding abundances.
In general, unmixing algorithms can be divided into four categories: geometrical, sparse regression-based, statistical, and deep learning (DL)-based methods.Geometrical unmixing algorithms work under the assumption that the endmembers of an HSI are the vertices of a simplex with the minimum volume enclosing the data set or of a simplex with the maximum volume contained in the convex hull of the data set.Pure pixelbased and minimum volume (MV)-based methods belong to this category.The pure pixel-based algorithms assume that there is one pure pixel at least per endmember.The classical methods include the pixel purity index (PPI) [15], N-FINDR [16], the iterative error analysis (IEA) [17], the vertex component analysis (VCA) [18], and the simplex growing algorithm (SGA) [19], [20].The MV-based approaches seek a mixing matrix that minimizes the volume of the simplex defined by its columns, such as the minimum volume enclosing simplex (MVSA) [21] and the simplex identification via variable splitting and augmented Lagrangian (SISAL) [22].To reduce the influence induced by the intrinsic nonlinearity of the geometric manifold of the HSI and extract the endmembers accurately, a novel nonlinear endmember extraction algorithm [23] was proposed by combining the hypergraph frameworkbased manifold representation and fuzzy assessment.In [24], maximum volume inscribed ellipsoid (MVIE) method was presented to attract endmembers effectively.As one of the mainstream methods, the geometrical unmixing approaches have shown their powerful ability in extracting endmembers from HSIs.However, the geometrical algorithms may hardly extract the endmembers from the highly mixed data since pure spectral signatures are not available.
With the increasing availability of spectral libraries for materials measured on the ground, sparse regression-based methods are proposed by expressing each mixed pixel in a scene as a linear combination of a finite set of pure spectral signatures in a spectral library.Sparse regressionbased algorithms avoid estimating the number of endmembers and identifying the endmember signatures in the original data set [25], [26].Owing to these two advantages, more efforts have been dedicated to improving the sparse unmixing performance.For example, double weights were introduced in [27] to improve the sparsity of fractional abundances in both spectral and spatial domains, where one is used to enhance the sparsity of endmembers in the spectral library, and the other is to encourage the sparsity of fractional abundances.To make full use of the spatial-contextual information, a new spectralspatial weighted sparse unmixing (S 2 WSU) framework [28] was developed for hyperspectral unmixing.Besides, the spatial correlation was incorporated to promote the abundance estimation in [29]- [31].Specifically, a superpixel-based reweighted low-rank and total variation (SUSRLR-TV) [29] method was proposed to enhance the performance of the traditional spatialregularization-based sparse unmixing approaches.By using multiview collaborative sparse and spectral-spatial-weights, the new sparse unmixing model [30] took the advantage of spectral information as well as spatial information.In [31], graph Laplacian regularization was utilized to promote the smoothness of abundance maps in the sparse regression framework.These methods have obtained promising unmixing results.However, the spectra in the library have high coherence and are undesirable due to the diverse imaging conditions, which limit the applicability of these approaches.
The statistical algorithms identify the endmembers and their corresponding abundances at the same time by utilizing the statistical properties of the HSI.Popular statistical algorithms include independent component analysis (ICA) [32], [33], nonnegative matrix factorization (NMF) [34], [35], and Bayesian approaches [36]- [38].Among them, NMF provides a good fit for hyperspectral unmixing owing to its nonnegativity and interpretability.Therefore, numerous NMF-based methods have been developed to pursue better unmixing performance.
In the last few years, DL [39], [40] has shown great power and potential in pattern recognition.Therefore, many researchers have focused on hyperspectral unmixing using autoencoder and its variants, achieving more competitive unmixing performance [41]- [47].In addition, Hong et al. [48] proposed an effective guidance for real endmembers with shared weights in the autoencoder-like architecture.In [49], the HSI was processed as sequential data, and a long shortterm memory (LSTM) network was included in autoencoder architecture to capture spectral correlation information.These approaches can learn the abundance fractions from the original data via a series of the hierarchical layers, which are more suitable for coping with a variety of situations.For the availability in practice, [50] proposed a two-stage fully connected selfsupervised DL network for alleviating some practical issues, such as the noise and perturbation.Nevertheless, there are still several drawbacks [51].For instance, current approaches often require a lot of training samples and network parameters to achieve satisfactory unmixing performance.
Compared with the geometrical methods and sparse regression-based methods, the NMF-based methods are powerful to extract simultaneously the endmembers and their associated abundances.Combining the ability to extract hierarchical features as DL-based approaches, multilayer/deep NMF [52], [53] models have been developed to explore hidden information with interpretability power as in classical NMF.
From the perspective of spectral signatures in HSIs, there are two main challenges.One is spectral variability [54]- [57], which is often brought by many factors such as changes in illumination, environmental, atmospheric, and temporal conditions.It may lead to large amounts of errors in abundance estimation.To address this problem, a hierarchical sparse NMF (HSNMF) [58] introduced hierarchical sparsity constraints for describing endmember variability.Besides, endmember variability was considered by building a 4D endmember tensor along with a new low-rank regularization [59] and relying on structured additively-tuned linear mixing model [60].Another issue is the multiple physical interactions in the resulting observed spectrum [14].This challenge will reduce the generalization ability of unmixing methods based on the linear mixture model (LMM) and increase computational complexity.As such, combined with nonlinear mixture model (NLMM), NMF can also be applied to form some novel nonlinear unmixing methods [61], [62].
In this article, we aim to provide a survey on NMFbased hyperspectral unmixing.We take the NMF model as a baseline to show how to improve NMF by utilizing the main properties of HSIs (e.g., spectral information, spatial information, and structural information).We introduce three important development directions for the NMF model and discuss their pros and cons, including • constrained NMF by introducing additional constraints or penalty terms to the cost function, such as sparsity constraints, smooth constraints, and graph constraints.• structured NMF by modifying the structure of the cost function, e.g., weighted NMF, convex NMF, robust NMF, etc. • generalized NMF by extending the decomposition form, involving nonnegative tensor factorization (NTF), multilayer NMF, deep NMF, etc.
In addition, we conduct several experiments to demonstrate the effectiveness of some associated algorithms.The purpose is to give guidelines and inspiration for the future improvement of hyperspectral unmixing.
The layout of this article is as follows.In Section II, we give a general introduction to spectral mixture model and the unmixing problem.Sections III, IV, V review classical NMF unmixing categories according to different constraints, structures of the cost function, and decomposition forms, respectively.Extensive experiments are conducted and the results are discussed in Section VI.Section VII draws comprehensive conclusions and presents a brief outlook on future possible research directions.

II. CLASSICAL NMF FOR HYPERSPECTRAL UNMIXING
Let X ∈ R r×c×B denote an HSI with B bands, r rows, and c columns.Through the unfolding operation, the HSI can be represented by a matrix with the number of pixels P = r × c.The element at (b, p) denoted by x bp represents the reflection value from b-th band of p-th pixel.From the row perspective, where the b-th vector x b is the ground information in the bth band.Generally, is given from the column perspective, where the p-th vector x p is the spectrum of p-th pixel.The LMM assumes that an observed pixel spectrum in an HSI can be produced by a linear combination of endmember signatures and their corresponding abundances.The matrix formulation of the LMM can be described as where A = [a 1 , a 2 , • • • , a M ] ∈ R B×M denotes endmember matrix, the vector a m represents the m-th endmember signature, and M denotes the number of endmembers.S ∈ R M ×P represents the abundance matrix for all endmembers, and G ∈ R B×P represents the noise matrix.Typically, two constraints are imposed on S, i.e., the abundance nonnegative constraint (ANC) and the abundance sum-to-one constraint (ASC), given by S ≥ 0 and 1 T M S = 1 T P , here 1 M and 1 P are all-one column vectors with size M and size P , respectively, and (•) T denotes the transpose operation.
Given a matrix X, NMF [63] focuses on decomposing it into the product of two nonnegative matrices A and S, i.e., X ≈ AS.Obviously, this decomposition form is consistent with LMM.Thus, NMF is attractive for hyperspectral unmixing.In general, Frobenius norm is utilized to measure the approximation between X and AS, and the cost function is expressed as where the operator • F denotes the Frobenius norm.The multiplicative update rules are deduced as in which and stand for the element-wise multiplication and division, respectively.Meanwhile, when abundance matrix S is updated, the ASC requires to be satisfied by redefining the observation and spectral signature matrices as where δ is a parameter to control the impact of the ASC.
For the implementation of NMF, a crucial issue is how to initialize the related variables.The endmember matrix A can be initialized by the various methods such as random values from 0 to 1, vertex component analysis (VCA) [18], and automatic target generation process (ATGP) [64].Abundance matrix S is initialized according to the fully constrained least squares (FCLS) algorithm [65].To speedup the NMF processing, an adaptive projected NMF (APNMF) algorithm was parallelized by introducing its parallel version in [66].
In addition, by applying Nesterov's optimal gradient method, NeNMF [67] was proposed to accelerate the optimization.Nevertheless, the cost function of NMF is non-convex so that it easily falls into local optimal solutions.As such, to improve the unmixing performance, there are three important improvement directions for the NMF model.A considerable number of NMF-based methods address the spectral and spatial information by introducing additional constraints or penalty terms to the cost function.These methods are reviewed in Section III.As discussed in Section IV, many methods enable flexibility to account for more structures and details such as the difference of pixels, bands, and elements.Moreover, a lot of methods extend the decomposition form to acquire more essential characteristics, e.g., nonlinearities, 3D structure information, hidden information.Such methods are considered in Section V.The framework of the NMF-based methods for hyperspectral unmixing is illustrated in Fig. 1.The specifical unmixing methods are mainly summarized in Table I.

III. CONSTRAINED NMF FOR HYPERSPECTRAL
UNMIXING By exploiting the spectral and spatial information in HSIs, additional constraints have been imposed on the endmembers and abundances to obtain better unmixing performance.The constrained NMF model can be integrated as min where J(A) and J(S) are regularization terms for endmembers and abundances, respectively, and α and β are nonnegative parameters to balance the effect of the corresponding constraint terms.Next, we mainly describe the algorithms that incorporate the constraints on endmember matrix in Subsection III-A.By contrast, numerous works are reported in terms of imposing constraints for abundances, given in Subsection III-B to III-G.More details are presented as follows.

A. Endmember constraints
The constraints for endmembers are integrated into NMF by minimizing simplex volume [68]- [72], compacting endmember distance [73]- [77], keeping signature smoothness [78]- [81], introducing prior spectral information [82], and exploring high-level semantic information [83].Motivated by geometric insights, the minimum volume constraint (MVC) is robust without the pure pixel assumption, which can be presented as where Vol(A) is the volume of the simplex whose vertices correspond to the endmembers A. As a typical one, the first minimum-volume NMF was proposed in [68] by incorporating the MVC into the NMF to effectively extract endmembers from highly mixed data.It should be noted that J(A) is calculated with dimensionality reduction since the matrix determinant is valid only for square one.On this basis, Qu et al. [69] further introduced total variation (TV) and reweighted sparse regularizers to form a multiple-priors ensemble constrained NMF (MPEC-NMF) method.Although the above methods do not require the pure-pixel assumption, it is inconvenient to adopt appropriate dimensionality reduction.Therefore, more algorithms were proposed to define the volume of the simplex.
Boldfaced letters represent that L 1/2 regularizer is employed for the corresponding methods.
endmember matrix A. Under the framework of MDC-NMF, [74] reported an unmixing method along with sparsity constraint and graphics processing units (GPU).Yang et al. [75] introduced particle swarm optimization (PSO) to solve the optimization problem, which possessed good global search ability and convergence.Afterwards, the bilinear mixture model (BMM)-based constrained NMF algorithm (BCNMF) was presented in [76] with the EMD constraint for unsupervised nonlinear spectral unmixing, in which pixels were projected into their approximate linear mixture components based on the characteristics of BMM to reduce the collinearity greatly.Similarly, in [77], an inertia constraint was presented so as to promote the homogeneity of estimated spectra from the same class using the trace of the covariance matrix.It can deal with intra-class variability by extracting a separate set of pure material spectra from each observed pixel spectrum.
To promote the smoothness, different metrics are applied on the matrix A. Specifically, an adaptive potential function from discontinuity adaptive Markov random field (MRF) model was adopted in [78].Spectral dispersion function [79] encouraged the variance of each endmember spectrum to be as low as possible.In [80], an endmember dissimilarity function has been defined to make the estimated endmember signatures to be smooth.Besides, a quadratic weighted norm was used as a regularizer term in [81] to exploit spectral smoothness.By incorporating prior spectral information, Tong et al.
[82] introduced a constraint to minimize the differences between the estimated endmember and the known one.It not only considers the discrepancies between the standard signature and the corresponding one in the image, but also shows flexibility in terms of its extensions.
Recently, in order to effectively learn high-level semantic information, a multiple clustering guided NMF (MCG-NMF) unmixing approach was proposed in [83].Specifically, clustering analysis via the k-means method is conducted on the data matrix X.Then, selecting K pixels in each cluster, according to the principle of k-nearest neighbors.Finally, the R k , k = 1, • • • , K + 1 is constructed.Thus, the regularization term is expressed as where W is a diagonal matrix and depends on the similarity relationship between each pixel and their cluster mean.

B. Sparsity constraints
Sparsity regularizer is extensively exploited during the hyperspectral unmixing procedure since the distribution of each endmember is sparse in general.Inspired by [97], we discuss the works on imposing sparsity regularizer from sparsity, collaborative sparsity, and group sparsity perspectives.As shown in Fig. 2, diverse effects can be promoted for the abundances, where zero coefficients are denoted by white squares.
1) Sparsity Jia et al. [78] proposed two algorithms (called PSnsNMF and PSNMFSC) to explicitly represent sparsity.In particular, PSnsNMF controls the sparsity using the parameter of smoothing matrix in nonsmooth NMF (nsNMF) [163].Thus, it is difficult to balance the smoothness and sparsity.PSNMFSC enforces sparsity by setting both L 1 -norm and L 2 -norm.However, the algorithm needs to assign an exact sparsity level which cannot be known a priori, and sparsity levels of different endmembers also vary from each other.In order to encourage the sparsity of abundance matrix, L-norm is widely used in the field of unmixing.Along this line, L 0 regularizer counts the number of non-zero elements in the abundances to yield the sparsest results, while its optimization belongs to an NPhard problem.Therefore, L q regularizer is attractive in real applications, expressed as where s mp represents the element of the matrix S in the m-th row and p-th column.
• By setting q = 1, L 1 regularizer is a popular alternative for achieving a sparse abundance matrix [25], [164].Nevertheless, it is not compatible with the ASC during the unmixing procedure.• L q (0 < q < 1) regularizer achieves sparser results compared with the L 1 counterpart.Especially, Qian et al. [84] showed that q = 1/2 was an optimal choice and proposed an L 1/2 -NMF algorithm for unmixing the hyperspectral data.Due to the simplicity and effectiveness, the L 1/2 regularizer is widely applied to develop novel unmixing approaches (see the boldfaced methods in Table I).To solve the nonconvex optimization caused by the L 1/2 regularizer, a fast and efficient adaptive half-thresholding algorithm was proposed in [85].Considering that the mixed level of each pixel may be different from each other, a data-guided sparsity was provided to adaptively employ L q (0 < q < 1) constraint [86].In detail, a dataguided map (DgMap) was firstly learnt by measuring the uniformity of neighboring pixels, thus obtaining adaptive value for q. • The problem (15) with q = 2 is often considered to improve the unmixing performance as well.For instance, Huang et al. [87] proposed a data-guided constrained NMF (DGC-NMF) model by imposing sparsity with either L 1/2 -norm or L 2 -norm.Specifically, the sparsity of each pixel is measured first.Then, the L 1/2 regularizer is utilized to constrain the abundances of the pixels with a high sparsity level, while the L 2 regularizer is adopted for generating smooth results with a low sparsity level.Nevertheless, an L q (0 < q < 1) regularizer brings challenges as well since it is non-continuous and nondifferentiable.Accordingly, more efforts are reported to achieve better abundances.Specifically, by adopting higherorder norm, NMF with S-measure constraint (NMF-SMC) was constructed [88].Since NMF-SMC is a gradient based algorithm, the convergence speed may be slow for large scale data set.In [74], Wu et al. defined the sparsity constraint for the abundances using the difference between L 1 -norm and L 2norm.Furthermore, arctan function [81] was introduced to explore the sparse property of the abundance matrix.
In order to pursue sparser representation, He et al. [89] proposed to utilize a weighted sparse regularizer for abundances, expressed as where W is a weight matrix whose element at (m, p) is calculated by w mp = 1/(|s mp | + ε), and ε is a positive parameter.
It encourages the sparsity of the abundance matrix from the column perspective (i.e., spectral domain).Similarly, a double reweighted L 1 -norm regularizer is designed to further exploit the sparsity of abundances in spatial domains [27], [28], [145].Moreover, to ensure the sparsity of the learned coefficients, a local coordinate constraint was imposed to develop a robust nonnegative local coordinate factorization (RNLCF) method along with a correntropy induced metric (CIM) [90], where the weight was defined as w mp = a m −x p 2 .The generalization in terms of an L q -norm was presented in [96].By further accounting for local spatial information, the L 1 -norm was added to each non-overlapping subblock [91].
Based on generalized morphological component analysis (GMCA), each s m (the vector of S in m-th row, m = 1, • • • , M ) can be modeled as a linear combination of multiple morphological components (s mk ) that can be sparsely represented by its associated dictionary basis, i.e., s m ≈ k s mk = k D k ϕ mk [92].A concatenated dictionary D acts as a discriminator between different morphological components, and ϕ mk is the sparse representation coefficient constrained by the L 1 -norm.Furthermore, in order to simplify the solution, the sparse constraint ϕ 1 is transferred as Through GMCA, spatial information can be naturally considered in the unmixing process by exploiting the sparsity and morphological diversity of the abundance maps associated with each endmember.
Unlike imposing a sparsity regularizer on abundances directly, a transform domain based sparse regularizer was proposed in [93], expressed as where W is the curvelet transform basis.Very recently, correntropy-based adaptive sparsity constraint [94] was imposed to abundances for each pixel.Besides, a generalized minimax concave (GMC) sparsity regularizer was embedded into NMF [95], which is nonconvex and nonseparable, avoiding systematic underestimation of high components of sparse vector and producing more accurate sparse approximation.

2) Collaborative sparsity
As shown in Fig. 2(b), collaborative sparsity enforces the row sparsity (joint-sparsity), whose formulation is Since it is challenging to correctly identify the number of endmembers, an overestimation for it was studied in [71] and a collaborative sparsity regularizer is imposed to remove the redundant endmembers.Through this regularizer, the sparsity among the endmembers is achieved simultaneously (collaboratively) for all pixels, i.e., collaborative sparsity of the abundance matrix.In addition, a weighted L 2,1 -norm regularizer was applied to local similar abundances (i.e., blocks) to consider both sparsity and spatial information [165].

3) Group sparsity
As shown in Fig. 2(c), by fully exploring the spatial group structure and sparsity of the HSIs, spatial group sparsity regularizer was proposed in [97], defined as where s p (p = 1, • • • , P ) denotes the vector of S in the p-th column, and G is the number of the spatial groups (i.e., superpixels), which was generated by an improved simple linear iterative clustering (SLIC) algorithm.That is, the abundance matrix is divided into G groups as ) is a pixel-wise confidence index for relaxing the group sparsity constraints of heterogeneous pixels, such as boundaries and small targets.W g is updated iteratively to appropriately control the non-zero abundances of ϑ g .This regularizer is expected to promote the same sparse structure for pixels within a local spatial group.By further combining with nonlocal spatial information, Yang et al. [98] developed NLNMF to address the unmixing problem.Compared to the unmixing methods with smooth constraints, it is more reasonable and effective to utilize this local spatial groups with irregular shapes.Furthermore, group sparsity [166] was investigated in unmixing process that accounts for spectral variability through the use of group two-level mixed norm, i.e., L G,p,q = S G,p,q .More concretely, the fractional LASSO with L G,1,q , 0 < q < 1 aims to simultaneously enforce group and within-group sparsity.
As discussed above, the L q regularizers draw much attention to enforce the sparsity of abundances, especially for L 1 -norm and L 1/2 -norm.Compared with the L 1 -norm, the L 1/2 -norm [84] is excellent to obtain a sparser solution, thus enhancing the unmixing performance.Nevertheless, it requires to be combined with other constraints (e.g., graph constraint [103]) such that the intrinsic structures of HSIs can be considered.In addition, many approaches have been devoted to incorporate the spectral and spatial information into the L 1 -norm.For example, the reweighted L 1 -norm regularizer [89] promotes the sparsity of the abundance matrix from spectral domain, and double reweighted L 1 -norm regularizer [145] aims to further describe the sparsity in spatial domains.In this way, the improvement can be obtained greatly in the unmixing process.However, they are sensitive to noise corruption.Collaborative sparsity [71] is helpful to induce the row sparsity, whereas it is incompetent to explore the spatial information.Thus, Huang et al. [165] applied a weighted L 2,1 -norm regularizer on blocks and imposed TV regularizer.Meanwhile, it may be more reasonable to encourage the collaborative sparsity when the endmember matrix is composed of spectral library or bundles.Moreover, the group sparsity regularizer considers sparsity at the group level by integrating the spatial group structure [97], [98], but how to effectively group the abundances deserves more investigation.

C. Smooth constraints
Neighboring pixels are more likely to be constituted by the same materials.Accordingly, it is significant to investigate the spatial correlation among the neighboring pixels.Smooth constraints are often used to preserve the spatial-contextual information.
An adaptive potential function from discontinuity adaptive MRF model [78] was adopted to promote the piecewise smoothness of abundances, given as where S N is the neighborhood matrix of S.However, PSN-MFSC cannot perform well for many real data sets and has high computational complexities [99].Hence, Liu et al. proposed abundance separation and smoothness constrained NMF (ASSNMF) in [99].They defined the smoothness function as where Sm h is a weight matrix corresponding to S m ∈ R r×c , which is obtained via the reshape operation to the m-th row of S. Compared with PSNMFSC where only adjacent elements are considered, ASSNMF is more effective by exploiting its surrounding elements, and is acquired by a linear transform with a low computational cost.Nevertheless, it is not always true that the smoothness levels of two-pixel pairs are the same even if the spatial distances between them are the same.
To describe spatial correlations, a weighted nonnegative matrix factorization (WNMF) algorithm was presented in [100], and the designed weight is expressed as where w ij is a weight to characterize how much the neighboring pixel x j contributes to the considered pixel x i .This regularizer integrates both spectral and spatial information.
Similarly, an adaptive local neighborhood weight constraint was designed to propose a double abundance characteristics constrained NMF (DAC 2 NMF) [101] along with a separation constraint to prevent over-smoothness.In [96], the weight constraint with the L 2 -norm was generalized to use the L qnorm.
Recently, the abundance maps are assumed to be piecewise smooth.Therefore, the TV regularizer is accounted for capturing the piecewise smoothness structure of each abundance map to enhance the unmixing results.He et al. [89] first embedded the TV regularizer into the NMF framework, expressed as where F denotes the reshape operation from a vector with P pixels to a matrix with the size of c × r and • TV is the anisotropic TV norm.Together with collaborative sparsity and endmember constraint, [72] presented an improved collaborative NMF and TV algorithm (ICoNMF-TV) for the unmixing task.Similarly, multiple-priors ensemble constrained NMF (MPEC-NMF) [69] integrated the MVC, reweighted L 1/2 -norm, and TV regularizers.Besides, [102] extended the piecewise smoothness to the non-local smoothness, developing a non-local TV and log-sum regularized NMF (NLTV-LSRNMF) method.
For the above methods, the neighborhood is determined by a set of pixels involved in a predefined regular shape, such as a cross or square window.Consequently, smooth constraints can be utilized to exploit spatial-contextual information adequately, while ignoring the edges and local spatial details.By comparison, it may be more reasonable to use irregular shapes adaptively.

D. Manifold constraints
The above methods exploit the Euclidean structure of the hyperspectral data space.Considering that the data are more likely to lie on a low-dimensional submanifold embedded in the high-dimensional ambient space [167], the intrinsic manifold structure receives increasing attention in the hyperspectral unmixing field.The graph regularizer can be expressed as where Tr(•) denotes the trace of a matrix, W is an affinity matrix, L = D − W, and D is a diagonal matrix with its (i, i) element calculated by D ii = j W ij .Next, we introduce how to construct the affinity matrix.
1) Spectral similarity In [103], a manifold regularizer was incorporated into the sparsity constrained NMF, presenting graph-regularized L 1/2 -NMF (GLNMF) for hyperspectral unmixing, where the affinity matrix of the nearest neighbor graph W ∈ R N ×N is built to model the local structural information, given as which is known as the heat kernel.Here, x i is one of the knearest neighbors of x j .When x i and x j are similar, W ij is relatively large, and σ denotes the standard deviation.This regularizer was leveraged in [82] along with partially known endmembers for unmixing.Inspired by the graph regularizer based on L 2 -Laplacian, Rathnayake et al. [104] developed an L 1 -norm based graph, called GLR l1 .Besides, a Hessian graph [105] was adopted for hyperspectral unmixing.Compared with Laplacian graph, it is more stable and further captures the relationship of the nonlinear mapping.
2) Spectral-spatial similarity By respectively defining spatial geometric distance and spectral geometric distance of these two pixels in local window, dual Laplacian manifold regularizer [106] was established to exploit the geometric structure of the HSI.In order to encode the manifold structures embedded in the hyperspectral data space, a graph Laplacian is incorporated from spatial distance and feature distance perspectives simultaneously [107], where the weight matrix is obtained by using spectral angle distance (SAD) metric and two conditions are adopted to determine the neighbors: one is the nearest spatial distance via a local window, and the other is the nearest feature distance via the SAD similarities in the local window.In this way, the edges can be preserved by avoiding the graph across dissimilar pixels.Likewise, a bilateral filter regularizer based on graph theory was adopted to utilize the correlation information in both spatial and spectral spaces [108].Moreover, GLrNMF [109] defined a spatial-spectral semantic weight based on the intersection (denoted by QG) of spatial neighbor (denoted by Q) and spectral neighbor (denoted by G).Motivated by hypergraph learning, the spectral-spatial joint structure was modeled by a hypergraph to capture the high-order relation among multiple pixels [110].
3) Region-based similarity Considering that the spectra are similar in the same region while different between regions, Tong et al. [111] presented a region-based structure preserving NMF (R-NMF) to explore consistent data distribution in the same region, aiming at discriminating different data structures across regions.Furthermore, in view of the difference in spatial structure, an HSI was partitioned into homogeneous region and detail region based on a sketch map [112], in which the manifold constraint and the L 1/2 regularizer were employed for the homogeneous region, while the L 1 regularizer for detail region.
4) Other weight learning Different from the above perspectives to establish a graph regularizer, [113] was based on the local linear embedding assumption, where the weight matrix W is learned by minimizing the following equation: to exploit the local geometric structure.In addition, the clustering algorithm is also beneficial to characterizing the structure information of the HSIs.In [114], subspace clustering was applied to capture the latent characteristic structure, for which the weight matrix W is constructed by to form a graph regularizer.Here, H is the subspace structure matrix that is learned by minimizing X − XH 2 F , s.t.diag(H) = 0.It is noteworthy that only the largest k values are remained for each column of H.

E. Low-rank constraints
The high spatial correlation of HSIs can be also translated into the low rank of the involved abundance matrices.
Let G denote the number of non-overlapping subimages from the input HSI.By enforcing simultaneously the local low-rank and sparse structures to the abundance matrix S G , DSPLR-NMF [91] was reported for unmixing, in which, the local low-rank constraint is given as where • * denotes the nuclear norm.In order to naturally incorporate spatial priors, superpixels were generated by employing SLIC algorithm to the HSI and constrained using the low-rank penalty [109].For the learning of subspace structure [118], the low-rank constraint was utilized to construct a selfrepresentation matrix.

F. Abundance separation constraints
Taking the spatial correlation into consideration, the unmixing results can be improved and robust to noise generally.However, the over-smoothing problem may occur since dissimilar pixels are often ignored in a local window.Therefore, an abundance separation constraint [99] was imposed based on the KL divergence to minimize the correlation between different endmembers.Likewise, Liu et al. [101] introduced the separation constraint to preserve the inner diversity of the same type of materials.Moreover, in [115], a dissimilarity regularizer constructed by label information was incorporated into the NMF.
As discussed above, numerous works are devoted to obtaining better spatial structure from different insights, such as sparsity, spatial-contextual information, low-dimensional manifold structure, low-rank structure, and global spatial information.Among these constraints, it has considerable potential to make use of the spatial and spectral information simultaneously for HSIs applications.

G. Other spatial constraints
To maintain the structural information, clustering has contributed to the regularization term in [116], [117].A subspace structure learned from the HSIs was introduced to form a subspace regularizer S − SH 2 F + τ H * , thereby capturing the global correlation of all pixels [118].In view of the similarities between the substances, Yuan et al. [119] introduced a substance dependence constraint.To be more specific, the substance dependence is considered in the whole space to describe the global spatial information, which is reflected by the abundance.In [120], spectral-spatial joint sparse NMF (S 2 -NMF) was proposed by combining the global spatial information and local spectral information simultaneously.Based on the label information, Jia et al. [115] developed a similarity regularizer to compensate the dissimilarity regularizer.

IV. STRUCTURED NMF
A. Weighted NMF NMF-based methods achieve the unmixing by utilizing the statistical properties of HSIs.Thus, it is closely related to the number of samples.However, there is a large difference for the number of pixels concerning different endmembers in many scenarios.Therefore, a cluster-wise weighted NMF (CW-NMF) [121] method was provided to explore the information of imbalanced pixels.In particular, a weight matrix based on the result of clustering is integrated into the NMF, expressed as min where W is the diagonal weight matrix.Then, the effects of the pixels involving imbalanced endmembers can be enhanced by giving larger weight values, while the effects of the pixels that involving majority endmembers are reduced by assigning smaller weight values.Similarly, self-paced NMF (SpNMF) [122] was presented by adopting weighted least-squares losses, given as where h is a self-paced function associated with "model age" parameter γ to learn the weights adaptively.There are many self-paced functions such as binary, linear, logarithmic, and mixture functions.Besides, the Sp-NMF can be also extended for pixel weighting (i.e., min A,S P p=1 w p x p − (AS) p 2 2 + h(γ, w p ) ) and element weighting (i.e., min A,S B,P b,p=1 w bp [x bp − (AS) bp ] 2 + h(γ, w bp ) ).Meanwhile, the weighted NMF is also effective to improve the robustness of NMF.

B. Projection-based NMF
Inspired by the sparse regression-based unmixing methods, a projection-based NMF (PNMF) algorithm was proposed by introducing a spectral library into the NMF framework [123].In detail, related spectra in the library U are projected onto a subspace based on a transformation matrix V to obtain the projected endmembers, i.e., A = UV, the cost function is min In this way, the endmembers are not only adaptively generated from the spectral library, but also related to the HSIs.Meanwhile, the number of endmembers, which is a difficulty in sparse regression, is controlled by the dimension of the subspace.

C. Convex NMF
In NMF, some of the computed endmembers in A may be artificial, which do not belong to any real material in the scene, but exist only in the solution space of the problem.To keep the association between extracted and real endmembers, each spectrum a m was assumed to be nonnegative linear combinations of the observed pixels [124], i.e., A = XΞ.Thus, the cost function is min Along with subspace clustering, Lu et al. [114] also proposed to extract endmembers by linearly combining of all pixels in a spectral subspace, avoiding the generation of artificial endmembers.Under this assumption, the hierarchical sparse NMF (HSNMF) [58] introduced hierarchical sparsity constraints to accommodate endmember variability.
The L 2,1 -norm regularizer proposed by Ding et al. [168] can effectively handle noise and outliers.Based on this regularizer, different robust models are designed to reduce the impact of noise.In [62], robust NMF (rNMF) was proposed by introducing a residual term E ∈ R B×P accounting for outliers (i.e., nonlinear effects), whose cost function is written as where D (X|(AS + E)) measures dissimilarity between X and (AS + E), E 2,1 = P p=1 e p 2 , and e p denotes the p-th vector of E. As such, the problem (29) addresses the nonlinearities and improves the robustness against the noisy pixels.Based on the L 1,2 -norm regularizer, He et al. [126] used specific bands of the HSIs and modeled the sparse noise explicitly for significantly improving the robustness of NMF.The L 2,1 -norm was employed to replace Euclidean distance or KL divergence directly in [113], and the spectral-spatial constrained NMF (SS-NMF) model was developed to cope with non-Gaussian noises or outliers, whose reconstruction error is calculated by In this way, the model is robust to noisy pixels by columns.
To further achieve robustness to band noise by rows, the L 1,2norm was also incorporated to form a spectral-spatial robust NMF model for hyperspectral unmixing [127].
Correntropy is a nonlinear similarity measure, which is based on Gaussian kernel Recently, CIM is employed to replace the least-squares loss to develop some robust models.For example, a correntropy based NMF (CENMF) [128] was proposed to suppress the influence of noisy bands efficiently.Considering the diversity of the noise levels of pixels, correntropy-based spatial-spectral robust sparsity-regularized NMF (CSsRS-NMF) was proposed in [94] by adaptive assigning weights to noisy pixels.Furthermore, robustness can be achieved from an element-wise noise perspective [90], [129].By cutting off the large error via the truncation operation, the truncated Cauchy loss [169] exhibits robustness to outliers.Accordingly, the reconstruction error was measured via the truncated Cauchy function [130], which is expressed as where γ and are the scale parameter and truncated coefficient, respectively.Furthermore, a general loss-based NMF (GLNMF) model [131] was developed by introducing a general robust loss function defined in [170], given as to downplay the large noise.It is a generalization of the L 2 loss (υ → 2), Cauchy loss (υ → 0), and Welsch loss (υ → −∞), etc.Through the L 2,1 -norm, the model can effectively address the pixels with noise or outliers.By contrast, CIM can be applied flexibly to relieve the impact of noise from bands, pixels, and elements insights.Meanwhile, the algorithms with CIM are effective to process non-Gaussian and impulsive noise.The truncated Cauchy loss can suppress the outliers effectively.

A. Linear-quadratic NMF
Combined with the linear-quadratic model, NMF can be extended as X = AS = A a S a + A b S b .Subsequently, the following cost function requires to be minimized: Under this framework, the unmixing task can be addressed from the following two perspectives.
• First, Tang et al. [132] proposed an unmixing method by using the prior knowledge of some signatures in the scene.To be specific, A a and A b represent the matrices of known and unknown endmembers with related abundance fractions S a and S b , respectively.• Then, the second-order scattering of light can be considered to improve the performance [61], [133]- [135].In this case, A a and A b are the endmember matrix and bilinear endmember matrix, while S a and S b are abundance matrix and interaction abundance matrix, respectively.Among them, semi-nonnegative matrix factorization (semi-NMF) was used for the optimization to process an entire image in the matrix form [61], [135].

B. Kernelized NMF
Kernel methods can be introduced for nonlinear hyperspectral unmixing without estimating the nonlinear mixture model.In [136], a constrained kernel NMF (CKNMF) was proposed for dealing with nonlinearities.Through kernel mappings (denoted by φ), the observed matrix X and endmember matrix A are transformed to high-dimensional feature space, obtaining φ( As a result, the data are linearly separable in the feature space.The cost function is given as where φ is a nonlinear function.Generally, the Gaussian kernel ) is utilized to achieve dot product operator in a high-dimensional kernel feature space.Taking into account both the input and feature spaces, a biobjective NMF [137] was formulated by combining the linear and kernel-based models.However, the kernel-based methods suffer from computation burden.For dealing with large-scale and streaming dynamic data, Zhu et al. [138] proposed an online KNMF (OKNMF) framework to control the computational complexity via adopting the stochastic gradient descent (SGD), mini-batch, and averaged SGD (ASGD) strategies.In addition, the KNMF was extended to incremental KNMF (IKNMF) and improved IKNMF (IIKNMF) for desired unmixing accuracy and efficiency [34].

C. Transform-domain based NMF
Due to the dense spectra (typically 10 nm) and overlapped information, HSIs are compressible in a suitable transformed domain.Very recently, wavelet transform has been exploited to express the hyperspectral data compactly [139].Specifically, biorthogonal discrete wavelet transform was employed to represent the hyperspectral data in the wavelet domain, denoted as x pw .Accordingly, the LMM in the wavelet domain can be written as X w = A w S w + N w .Hence, the cost function is On this basis, three prior terms (i.e., volume regularizer, spatial smoothness prior, and sparseness constraint) in the wavelet domain are integrated to better handle the ill-posedness.Similarly, a wavelet-based approach was proposed for estimating abundances in [171].Furthermore, considering that the curvelet is capable of characterizing anisotropic singularity such as curves or edges in the image, Xu et al. [93] adopted fast discrete curvelet transform to impose a sparse regularizer on the transformed domain of abundances, thereby enhancing both sparsity and diversity.

D. Nonnegative tensor factorization (NTF)
A third-order tensor, which is the high-dimensional extension of a matrix, can be used to represent the HSI for preserving the intrinsic structure information.Accordingly, nonnegative tensor factorization (NTF) has been successfully applied to HSIs processing such as denoising [172] and classification [173].In [140], the NTF method was used to the unmixing task by using canonical polyadic decomposition (CPD).However, a limitation is the lack of an explicit link with LMM.In [141], a matrix-vector NTF (MV-NTF) unmixing method was proposed based on block term decomposition (BTD).The MV-NTF factorizes the HSI tensor into a sum of several component tensors as where X ∈ R r×c×B is a third-order HSI tensor, a m is regarded as an endmember, S m is the corresponding abundances represented by the product of two low-rank matrices C m ∈ R r×R and B T m ∈ R R×c , and G denotes the noise term.R is related to the rank of abundance matrix and • denotes the outer product.Apparently, this model constructs a straightforward link between LMM and tensor factorization.The cost function is expressed as Although intrinsic structure information is preserved, the local spatial information is not fully exploited due to the strict rank constraint.Subsequently, under this framework, Xiong et al. further incorporated the similarity graph regularizer [142] and the TV regularizer [143] so as to describe the local spatial information.Likewise, three constraints were embedded into MV-NTF, including sparsity, minimum volume, and nonlinearity in [144].In addition, some unmixing methods were also presented by combining additional constraints [145]- [147].Considering that NMF characterizes more local spatial details through dealing with HSI at the pixel level, MV-NTF and NMF were coupled to make full use of their individual advantages [148].In [149], a low-rank tensor regularization was introduced during the learning process, allowing flexibility to the rank of the estimated abundance tensor.Endmember variability was considered based on the 4D endmember tensor that was constrained by a new low-rank regularization [59], [150].Besides, a nonlocal Tucker decomposition method [151] was provided to exploit the spectral-spatial correlations and the nonlocal self-similarity.

E. Multilayer/Deep extensions
The aforementioned approaches explore the information in a single-layer manner, which do not allow for hierarchical refinement of the obtained endmembers and abundances.In order to extract hierarchical features with hidden information, DL [39] has achieved commendable success in pattern recognition [174]- [176].Hence, Rajabi and Ghassemian [52] unfolded NMF into multilayer architecture (i.e., multiple basis matrices and one abundance matrix), and proposed the multilayer NMF (MLNMF) model for hyperspectral unmixing.As shown in Fig. 3 (a), in the first layer, the matrix X can be factorized into A 1 and S 1 .Then, in the next layer, S 1 is further factorized into A 2 and S 2 .A similar process is continued until the factorization of the L-th layer is completed.Here L represents the maximum number of layers.Accordingly, the observation matrix is decomposed into L + 1 nonnegative factors, i.e., The latent factors can be obtained by minimizing the following cost function: with l = 1, • • • , L, and S 0 = X when l = 1.Then, the endmember and abundance matrices are , respectively.Based on this multilayer architecture, a double-constrained multilayer NMF (DCMLNMF) [152] method was proposed to jointly explore the sparsity and the geometric structure.Besides, Tong et al. [153] developed novel unmixing approaches by further imposing constraints, such as a spectral and spatial total variation regularizer, an adaptive graph regularizer [154], and a homogeneous region regularizer [155].Moreover, the classical MLNMF was restructured and improved by integrating the Hoyer's projector in [35].Different from factorizing S mentioned above, A is decomposed in [156] to form constrained multilayer NMF (CMLNMF) along with MVC and sparsity constraints, shown in Fig. 3 (b).Thus, Here, the endmember and abundance matrices are A = A L and S = S L • • • S 2 S 1 , respectively.Furthermore, multilayer factorization was investigated with fast kernel archetypal analysis (KAA) [157] and kernel NMF [158] for unmixing.However, these models are optimized by only minimizing the cost function of each layer, which fail to reduce the total reconstruction error.Trigeorgis et al. [176] formulated deep Semi-NMF, achieving a significant breakthrough.Motivated by this, a deep NMF structure [53] was proposed to address the unmixing task, whose cost function is where denote the endmembers and abundances, respectively.The proposed model consists of pretraining stage and fine-tuning stage, where the former pretrains all factors layer by layer and the latter is used to reduce the total reconstruction error.Likewise, a sparsity-constrained deep NMF (L 1 -DNMF) was proposed for hyperspectral unmixing [159].Multiview concept learning was incorporated to explicitly model the consistent and complementary information [177].
In addition, an asymmetric encoder-decoder framework was presented in [160] for hyperspectral unmixing, where a multilayer nonlinear network was designed to powerfully encode the original data, and the resulting abundances were then decoded by the decoder part with one layer.The cost function is given by where σ(•) is a nonlinear activation function, abundances , and E is introduced in the decoder to characterize sparse noise.
Besides, various deep NMF models [178] have been proposed in an increasing number of applications.
(1) Deep orthogonal NMF [178]- [180] is a variant of deep NMF by imposing orthogonality constraint to S l , where the decomposition is the same as in [156].
(2) Deep convex NMF [175] was developed by extending convex NMF [181] that is also known as archetypal analysis (AA) or concept factorization (CF), where each basis vector (named concept) is modeled as a linear combination of data points, i.e., a m = N n=1 x n w nm .Fig. 3 (c) shows the factorization process of multilayer concept factorization [182].Accordingly, Different from directly using the output of the previous layer as the input of subsequent layer, Zhang et al. proposed a novel deep self-representative concept factorization network (DSCF-Net) [183] and a deep semisupervised coupled factorization network (DS 2 CF-Net) [184].(3) By introducing a smoothing matrix at each layer, deep nsNMF (dnsNMF) [185] was reported to learn features hierarchically in the context of text mining.Let Z l denote the "smoothing" matrix.The matrix X is factorized into A 1 , Z 1 , and S 1 in the first layer.Then, S 1 is factorized into A 2 , Z 2 , and S 2 in the next layer.The same process will be continued until the L-th layer is reached, shown in Fig. 3 (d).Hence, the observation can be represented using (4) Deep autoencoder-like NMF (DANMF) [186] was proposed for community detection, whose cost function is expressed as Similar to deep autoencoder, DANMF consists of an encoder component and a decoder component.This architecture empowers DANMF to learn the hierarchical mappings between the original network and the final community assignment with implicit low-to-high level hidden attributes of the original network learned in the intermediate layers.
Meanwhile, the hierarchical factorization has applied to tensor, developing some multilayer frameworks of tensor decomposition [187], and deep tensor decompositions [188]- [190].In addition, the deep unfolding technique was used for unrolling the iteration inference algorithm into a layerwise structure to obtain novel neural network-like architectures that enjoy the advantages of well-defined interpretability, strong learning power, and little computational cost [51], [161], [162].
Multilayer/deep extensions of NMF combine both interpretability and the extraction of multiple hierarchical features.Nevertheless, it is also an important and challenging research issue such as how to determine the parameters (e.g., the inner ranks and the number of layers) and loss function, and how to choose efficient optimization algorithms and initial conditions.

VI. EXPERIMENTAL RESULTS AND DISCUSSION
Spectral angle distance (SAD) is utilized to assess unmixing performance quantitatively, given as: where A m and Âm are the m-th original and estimated endmember spectral signatures, respectively.We choose the most popular methods in different categories to conduct experiments, such as L 1/2 -NMF [84], SGSNMF [97] 1 , TV-RSNMF [89], L 1/2 -RNMF [126], MV-NTF-TV [143], MLNMF [52] 2 , and SSRDMF [160].It should be noted that all the results are averaged after ten independent runs.All experiments are conducted under the environment of MAT-LAB R2015b software and computer configuration Intel Core i5 CPU at 2.80 GHz and 8.00 GB RAM.For hyperspectral unmixing, the number of endmembers is a crucial factor, which can be set manually or estimated through an effective method, such as virtual dimensionality [191], [192] and HySime [193].
For illustrative purposes, Figs. 5, 7, and 9 plot the reference signatures (green solid line) along with the endmembers identified by different methods (red dash line) on the Cuprite, Samson, and Jasper Ridge data sets, respectively.It can be seen that the endmember signatures are nearly all around correlated in spectral terms with respect to the reference partners.Meanwhile, Figs. 6, 8, and 10 show the abundance maps estimated by different methods on the Cuprite, Samson, and Jasper Ridge data sets, respectively.Due to the ground-truth abundance maps are unavailable, we compare the obtained abundance maps with the real scene image, i.e., Fig. 4. In the abundance maps, the higher value means the proportion of the material is larger.From Fig. 6, it is obvious that similar abundance maps can be achieved in most cases for each material.From Fig. 8, we can observe all estimation results have a good correlation with the geological maps of the Samson data set.For the Jasper Ridge data set, as shown in Fig. 9 and Fig. 10, it is difficult to  obtain desirable estimations in terms of Road for all methods, which may be because there are few pixels for Road so that VCA fails to extract this signature when initializing.These seven methods have their own benefits.Last, we investigate the average running time of ten times on Jasper Ridge data set for L 1/2 -NMF, SGSNMF, TV-RSNMF, L 1/2 -RNMF, MV-NTF-TV, MLNMF, and SSRDMF, which are 4. 99, 60.81, 25.19, 43.31, 107.50, 19.38, and 136.54 s, respectively.Apparently, the L 1/2 -NMF performs the fastest estimation since it is a single-layer factorization with efficiency sparse regularizer.SGSNMF and TV-RSNMF require more time due to the learning of spatial group structure and piecewise smooth structure, respectively.To handle sparse noise, L 1/2 -RNMF models the sparse noise explicitly, increasing the running time.MLNMF is also fast since it only decomposes the observation matrix iteratively layer by layer.MV-NTF-TV and SSRDMF need much more time mainly owing to the complex factorization.VII.DISCUSSION AND FUTURE DIRECTIONS NMF plays an increasingly significant role in the field of hyperspectral unmixing.In particular, the constrained NMF has the capacity of providing more accurate endmembers and abundances by integrating the spectral constraints and the spatial constraints.The structured NMF enables flexibility to account for more structures and details such as the difference of pixels, bands, and elements.By extending the decomposition form, the generalized NMF exhibits great potential in acquiring more essential characteristics, e.g., nonlinearities, 3D structure information, and hidden information.
Nevertheless, there are still several drawbacks.For instance, the constrained NMF generally requires extensive parameter tunning to achieve satisfactory results.The generalized NMF often suffers from time-consuming.Secondly, the NMF-based methods rely on proper guidance or initialization to generate meaningful endmembers.Besides, it is difficult to thoroughly capture the plentiful information of HSIs.To this end, it is quite challenging to obtain excellent unmixing performance.In the future, how to design NMF methods for unmixing deserves further research.We provide some considerations as follows.
• One important research direction is to make use of the spatial and spectral information simultaneously to guarantee a more reliable unmixing performance.The spectral-spatial joint has shown considerable potentials for hyperspectral unmixing.• Many NMF approaches have been reported to exploit the spectral characteristics, such as the corresponding simplex volume, the endmember distance, and the signature smoothness.However, more efforts are required to describe the endmember variability under the NMF model, e.g., constructing a 4D endmember tensor in [59].Combining insight from spectral variability with a mathematical treatment would be valuable to improve the performance significantly.• Most of the current NMF algorithms rely strongly on LMM to obtain unmixing results.In real scenarios, multipath scattering is common due to complex landforms, resulting in nonlinear spectral mixture effects.As shown in [61], [133]- [135], a nonlinear mixture model is closely related to the LMM, indicating NMF methods that aim to solving nonlinear unmixing problems deserve to be further investigated.• To achieve more reliable performance in practical scenarios, there is growing attention on improving the robustness of the methods.Although many robust NMF methods have been proposed, they mainly focus on the robustness to noise.In addition to relieve the effect of different types of noise, it is also important to investigate the robustness to the selections of the initialization methods and the tunable parameters in various application scenarios.
• The computational complexity is also an aspect that brings difficulties to apply most existing methods (e.g., graph regularized algorithms, NTF, and multilayer/deep approaches).Meanwhile, HSIs are very large in general.
Considering that many applications need real-or near real-time processing, it is crucial to develop fast alternatives to reduce the computation time.
Minimum volume NLMM Nonlinear mixture model NMF Nonnegative matrix factorization NTF Nonnegative tensor factorization PPI Pixel purity index PSO Particle swarm optimization SGA Simplex growing algorithm SLIC Simple linear iterative clustering TV Total variation VCA Vertex component analysis I. INTRODUCTION

3 )
3 http://speclab.cr.usgs.gov/spectral.lib064https://www.usgs.gov/labs/spectroscopy-lab2) Samson Data set: As shown in Fig. 4(b), it contains 95 × 95 pixels, and each pixel has 156 bands ranging from 0.401 − 0.889µm.The number of endmembers is set to 3, including Soil, Tree, and Water.Jasper Ridge Data set: As shown in Fig. 4(c), it contains 100 × 100 pixels, and each pixel has 198 bands (i.e., 4 − 107, 113−153, and 167−219) after removal of noisy bands ranging from 0.38 − 2.5µm.The number of endmembers is set to 4, including Tree, Water, Soil, and Road.The experimental results are summarized in Tables II-IV and plotted in Figs.5-10.Tables II, III, and IV list the SAD values between each reference spectrum and the endmembers extracted by each method on Cuprite, Samson, and Jasper Ridge data sets respectively.Figs. 5, 7, and 9 present the correlation of the endmember signatures obtained by these seven methods and the reference signatures.Figs. 6, 8 and 10 show the visual comparison of abundance maps estimated by all algorithms.
Figs. 6(a), 8(a), and 10(a) show that L 1/2 -NMF can obtain a sparse abundance map because L 1/2 -norm regularizer is an optimal choice for the hyperspectral unmixing.Figs.6(b), 8(b), and 10(b) keep the spatial group structure and the sparsity within a local spatial group through utilizing the spatial group sparsity regularizer.The results in Figs.6(c), 8(c), and 10(c) show that it is effective to utilize the piecewise smoothness.The estimated abundances in Figs.6(d), 8(d), and 10(d) are robust to noise, which is mainly because L 1/2 -RNMF describes the sparse noise explicitly.Figs.6(e), 8(e), and 10(e) preserve the local spatial structure and the global spectral-spatial information.The results listed in Figs.6(f-g), 8(f-g), and 10(f-g) display that the abundance maps estimated by MLNMF and SSRDMF are always in accordance with the ground truth, demonstrating the effectiveness of the multilayer/deep architectures.From the results of experiments, we find that sparse regularizer, spectral-spatial information, multilayer/deep architectures are all beneficial to the hyperspectral unmixing.

TABLE II SAD
(AVERAGE OF 10 RUNS) ALONG WITH STANDARD DEVIATION ON THE AVIRIS CUPRITE DATA SET FOR DIFFERENT METHODS.BOLDFACED NUMBER DENOTES THE BEST RESULT UNDER EACH CONDITION.

TABLE III SAD
SCORES (AVERAGE OF 10 RUNS) ALONG WITH THEIR STANDARD DEVIATION ON THE SAMSON DATA SET FOR DIFFERENT METHODS.BOLDFACED NUMBER DENOTES THE BEST RESULT UNDER EACH CONDITION.

TABLE IV SAD
SCORES (AVERAGE OF 10 RUNS) ALONG WITH THEIR STANDARD DEVIATION ON THE JASPER RIDGE DATA SET FOR DIFFERENT METHODS.BOLDFACED NUMBER DENOTES THE BEST RESULT UNDER EACH CONDITION.