Graph-Based Active Learning for Nearly Blind Hyperspectral Unmixing

Hyperspectral unmixing (HSU) is an effective tool to ascertain the material composition of each pixel in a hyperspectral image, which typically has hundreds of spectral channels. In this article, we propose two graph-based semisupervised unmixing methods. The first directly applies graph learning to the unmixing problem, while the second solves an optimization problem that combines the linear unmixing model and a graph-based regularization term. Following a semisupervised framework, our methods require a very small number of training pixels, which can be selected by a graph-based active learning method. We assume that ground-truth information is available at these selected pixels, either as the exact (EXT) abundance vector or as a one-hot (OH) pseudo-label. In practice, the latter is much easier to obtain, as it only minimally involves a human in the loop. Compared with other popular blind unmixing methods, our methods significantly improve performance with minimal supervision. Specifically, the experiments demonstrate that the proposed methods improve on state-of-the-art blind unmixing approaches by 50% or more using only 0.4% of pixels for training.

Hyperspectral unmixing (HSU) aims to estimate the fractional contributions of the constituent end-members at each pixel, also known as the abundance map. The spectral signature of a pure material is called an end-member, which can often be measured in a laboratory setting. Unfortunately, the ground-truth end-member is often unavailable due to its large variability in any real scenario. The blind unmixing process involves estimating all the end-members and the abundance map simultaneously.

A. Literature Review of HSU
In our study, we employ a linear mixing model for unmixing, wherein each pixel's spectral measurement is represented as a linear combination of the constituent end-members. Given the physical interpretation of hyperspectral mixing, we impose nonnegativity constraints on both the end-members and the abundance map. In addition, we apply a sum-to-one constraint, a common practice in HSU, signifying that each pixel's abundance vector resides in the probability simplex. There are extended linear mixing models that consider end-member-wise scaling factors [1] and spectral variability [2]. Nonlinear mixing models [3], [4], in contrast, rely on more complicated assumptions about how light rays interact with end-members. It is also plausible to remove the sum-to-one constraint when the illumination conditions or the topography of the scene changes locally [1].
Specifically for blind HSU, it is natural to apply nonnegative matrix factorization (NMF) [5], [6], which decomposes the data matrix into a product of two matrices with nonnegative entries (one encodes the end-member matrix, and the other the abundance map) [7], [8], [9]. However, even with the nonnegativity and sum-to-one constraints, blind HSU is a highly ill-posed inverse problem, and hence, a variety of regularizations have been proposed to refine the solution space. One classic method is the ℓ2 norm in fully constrained least squares unmixing (FCLSU) [10]. Furthermore, spatial sparsity of abundances is a reasonable assumption, since only a few end-members appear in a single pixel. Popular sparsity-promoting regularizations used in HSU include the ℓ0 norm [11], the ℓ1 norm [12], the ℓ1/2 norm [13], and the mixed ℓp,q norm for group sparsity [14]. By treating the abundance map of each material as an image, total variation (TV) regularization [15] has been applied to HSU for spatial continuity and edge preservation. TV-related approaches include sparse unmixing via variable splitting augmented Lagrangian and TV (SUnSAL-TV) [16],

[This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.]
TV with sparse NMF [17], TV with nonnegative tensor factorization [18], and an improved collaborative NMF with TV (ICoNMF-TV) [19]. Recently, TV has been reformulated as a quadratic regularization that promotes the minimum volume in the NMF framework, referred to as quadratic minimum volume (QMV) [20].
Graph-based approaches [21] also play an important role in HSU. TV has been extended from vectors in Euclidean space to signals defined on a graph. For example, the graph TV (gTV) [22] is a special case of the p-Dirichlet form [23], [24] in graph signal processing. Graph regularization techniques for hyperspectral imaging include structured sparse-regularized NMF (SS-NMF) [25] and graph-regularized ℓ1/2 NMF (GLNMF) [26]. Graph-based approaches, while powerful, can suffer from intensive computation, particularly when computing pairwise similarities between pixels. Strategies for speeding up the weight computation include the use of superpixels [27] rather than the entire hyperspectral image, and the Nyström method [28] to generate low-rank approximations of the graph Laplacian [29], [30]. Another efficient alternative is the use of sparse weight matrices, such as the K-nearest neighbors (KNNs) weight matrix [31], [32].
In recent years, neural networks have been applied to the blind unmixing problem, such as two-stage self-supervised networks [33], [34], a minimal simplex convolutional neural network [35], a two-stream Siamese deep network [36], and attention networks [37]. Furthermore, some advanced semisupervised deep learning methods [38], [39] have integrated the graph Laplacian and exhibited remarkable potential in HSU. None of these methods, however, address the issues discussed next.

B. Motivation and Our Contributions
The existing blind HSU methods have limitations in many real-world scenarios. For example, unmixing often requires estimating the number of end-members [40], and the abundance maps require human experts to assign meaning to each end-member. Meanwhile, acquiring ground-truth abundance maps or end-member spectra is a great challenge [41], [42], making it extremely difficult to train fully supervised models. This drawback motivates us to consider a semisupervised model with pseudo-labels, which are assigned at representative pixels for each end-member and can be easily obtained by an expert's visual inspection of the data. Prior knowledge of the number of end-members is naturally included in the pseudo-labels. We further adopt an active learning [43], [44], [45] approach to reinforce semisupervised machine learning methods by carefully automating the selection of the training set. Active learning has been successfully applied to hyperspectral image classification and segmentation tasks [46], [47], [48], [49], [50]. The core of active learning is to sample data points according to an acquisition function [51], [52] that automates the introduction of new training data during the algorithm. We believe active learning and pseudo-labels are an ideal combination to improve the performance of HSU.
The key problem we solve in this article is to maximize the improvement of the estimated abundance maps and end-member spectra using minimal supervision. With the training pixels selected by active learning, we propose two semisupervised HSU models. We refer to these methods as nearly blind HSU, since both of them require a very small number of training labels and accept either OH pseudo-labels or ground-truth abundance vectors. Our first model, called graph learning unmixing (GLU), takes the output of a graph learning method [21] directly as the abundance map for the HSU, followed by the estimation of the end-member matrix. Our second model, called graph-regularized semisupervised unmixing (GRSU), combines the linear unmixing model, a graph-based regularization term, and a loss function applied to the label set (obtained by active learning) in a joint optimization problem. The flowchart of our proposed models (GLU and GRSU) is illustrated in Fig. 1. We summarize the novelties of this work as follows.
1) We present an effective pipeline (Section III-A) to select labeled pixels by graph-based active learning for the HSU problem.
2) We apply the idea of graph Laplace learning to the HSU problem by taking its output (class probability) as the estimated abundance map. Based on this idea, we develop the GLU model (Section III-B).
3) We develop a novel semisupervised HSU model, GRSU (Section III-C), by combining graph-based regularization terms with the linear mixing model and a small number of labeled pixels.
4) Our proposed methods, GLU and GRSU, bear significant practical implications. By utilizing only a small number of easily obtainable pseudo-labels, our methods markedly improve HSU performance.

The rest of this article is organized as follows. Section II reviews background on graph learning and active learning. Section III introduces our regularized graph Laplace learning together with the semisupervised unmixing algorithms. Section IV presents the experimental results that compare our proposed algorithms with other blind HSU methods. Finally, Section V concludes this article.

II. BACKGROUND AND NOTATION FOR GRAPH ACTIVE LEARNING
This section reviews related work on graph construction, graph learning, and graph-based active learning. These techniques are key ingredients in our proposed semisupervised unmixing models (detailed in Section III).

A. Graph Construction Using KNN Similarity Weight
Given a set of N nodes or vertices X = {x_1, x_2, ..., x_N} with x_i ∈ R^{p×1} for i = 1, ..., N, we construct a graph G = (X, W), in which the similarity weight matrix W is computed by KNNs. Specifically, we define the angular distance function d(x_i, x_j) as the angle between x_i and x_j, i.e.,

d(x_i, x_j) = arccos( ⟨x_i, x_j⟩ / (‖x_i‖_2 ‖x_j‖_2) ).    (1)

[Fig. 1. Flowchart of the proposed framework. Labeled pixels are selected by graph-based active learning (Section III-A). Two red boxes are our proposed models, GLU (Section III-B) and GRSU (Section III-C). GLU applies graph Laplace learning directly to the unmixing task, while GRSU combines the graph-based regularization term with the linear unmixing model from hyperspectral imaging into a joint optimization to be solved. GLU also serves as the initialization of the GRSU optimization process. The blue boxes are the outputs of GLU and GRSU, i.e., estimated end-members and abundance map.]
We only consider edges between x_i and its KNNs according to the angular distance d(x_i, ·), which can be implemented by an approximate nearest-neighbor search algorithm [31]. For each node x_i, we denote {x_{i_k}}, k = 1, 2, ..., K, as the KNNs of x_i (including x_i itself), and define the local scale of x_i as the distance to its Kth nearest neighbor, i.e.,

σ_i = d(x_i, x_{i_K}).    (2)

We define a weight value W^KNN_{ij} between nodes x_i and x_j as follows:

W^KNN_{ij} = exp( −d(x_i, x_j)² / (σ_i σ_j) ) if x_j is among the KNNs of x_i, and W^KNN_{ij} = 0 otherwise

and then set W_{ij} = (W^KNN_{ij} + W^KNN_{ji}) / 2, so that the weight matrix W is symmetric. Note that W is sparse and nonnegative (i.e., W_{ij} ≥ 0).
Given the weight matrix W, the degree matrix D is a diagonal matrix whose ith diagonal entry is d_i = Σ_{j=1}^N W_{ij} for i = 1, ..., N. We further define the graph Laplacian matrix [53], [54] by L = D − W.
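As a concrete illustration, the construction above (angular distances, KNN sparsification, local scaling, symmetrization, and L = D − W) can be sketched in NumPy/SciPy as follows. This is our own minimal sketch: the brute-force neighbor search is for clarity only, and a real pipeline would use an approximate nearest-neighbor library for large N.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

def knn_graph_laplacian(X, K):
    """Sparse KNN graph with locally scaled angular-distance weights.

    X : (N, p) array of pixel spectra (rows are nodes).
    Returns the symmetrized weight matrix W and the Laplacian L = D - W.
    """
    N = X.shape[0]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # angular distance d(x_i, x_j) = arccos(<x_i, x_j> / (|x_i| |x_j|))
    D = np.arccos(np.clip(Xn @ Xn.T, -1.0, 1.0))
    idx = np.argsort(D, axis=1)[:, :K]           # K nearest (includes self)
    sigma = D[np.arange(N), idx[:, -1]]          # distance to the Kth neighbor
    rows, cols, vals = [], [], []
    for i in range(N):
        for j in idx[i]:
            vals.append(np.exp(-D[i, j] ** 2 / (sigma[i] * sigma[j])))
            rows.append(i)
            cols.append(j)
    W = csr_matrix((vals, (rows, cols)), shape=(N, N))
    W = 0.5 * (W + W.T)                          # symmetrize: (W_ij + W_ji) / 2
    L = diags(np.asarray(W.sum(axis=1)).ravel()) - W
    return W, L
```

By construction, W is symmetric and every row of L sums to zero, which is the defining property of a graph Laplacian.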

B. Graph Laplace Learning
With the graph G = (X, W) constructed in Section II-A, we review a graph-based approach for semisupervised learning, referred to as graph Laplace learning [55]. We suppose ground-truth labels of q classes are available on a subset of vertices X̃ ⊂ X. For every x̃_j ∈ X̃, its label is denoted by y†_j ∈ {1, 2, ..., q}, and its corresponding OH vector is defined as y†_j = e_{y†_j}, where e_k is the kth standard basis vector, with all zeros except a 1 at the kth entry. The goal of graph learning is to predict the labels of the unlabeled set X \ X̃.
The classification of nodes is based on a node function u : X → R^q. The predicted label of x_i ∈ X is then arg max_{k ∈ {1,...,q}} u_k(x_i). The graph Laplace learning model [55] obtains the optimal node function u* by minimizing the following objective function:

J_ℓ(u; y†) = (1/2) Σ_{i,j=1}^N W_{ij} ‖u(x_i) − u(x_j)‖²_2 + Σ_{x̃_j ∈ X̃} ℓ(u(x̃_j), y†_j).    (3)

The first term of J_ℓ(u; y†) in (3) quantifies the smoothness of the node function in the sense that u(x_i) is closer to u(x_j) for a larger weight W_{ij} between the two nodes. The second term of J_ℓ(u; y†) uses the loss function ℓ to measure the difference between the prediction u(x̃_j) and the ground truth y†_j for x̃_j ∈ X̃. We follow [55] and choose the hard-constraint penalty as the loss function, that is

ℓ_h(u(x̃_j), y†_j) = 0 if u(x̃_j) = y†_j, and +∞ otherwise.    (4)

The penalty function ℓ_h forces the minimizer to satisfy u*(x̃_j) = y†_j for x̃_j ∈ X̃. The node function u can be stored as an N × q matrix U whose ith row is the output of u at node i. We can rewrite the first term in (3) as follows:

(1/2) Σ_{i,j=1}^N W_{ij} ‖u(x_i) − u(x_j)‖²_2 = ⟨U, LU⟩_F    (5)

where ⟨·,·⟩_F is the Frobenius inner product for matrices. The matrix form (5) makes it easier to find the minimizer of the objective function J_ℓ(u; y†).
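A minimal dense-matrix sketch of graph Laplace learning with the hard-constraint loss: the unlabeled rows of U solve the harmonic linear system obtained by differentiating the quadratic form. This code is our own illustration, not the authors' implementation.

```python
import numpy as np

def laplace_learning(L, labeled_idx, Y):
    """Minimize <U, L U>_F subject to U = Y on the labeled nodes.

    L : (N, N) dense graph Laplacian.
    labeled_idx : indices of the labeled nodes.
    Y : (M, q) one-hot label matrix.
    Solves the harmonic system L_uu U_u = -L_ul Y for the unlabeled rows.
    """
    N = L.shape[0]
    unlabeled = np.setdiff1d(np.arange(N), labeled_idx)
    Luu = L[np.ix_(unlabeled, unlabeled)]
    Lul = L[np.ix_(unlabeled, labeled_idx)]
    U = np.zeros((N, Y.shape[1]))
    U[labeled_idx] = Y                      # hard constraint on labeled nodes
    U[unlabeled] = np.linalg.solve(Luu, -Lul @ Y)
    return U                                # predicted class of node i: argmax of row i
```

On a three-node path graph with the endpoints labeled as two different classes, the middle node receives the value (0.5, 0.5), as expected from the averaging property of harmonic functions.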

C. Graph-Based Active Learning
The performance of the graph learning method depends on the selection of the labeled node set X̃: the classification accuracy varies over different selections with the same number of labeled nodes. Graph-based active learning is an iterative process developed to boost classification accuracy by strategically selecting a series of nodes on a graph G = (X, W) to label. Specifically, one initializes a label set X̃_0, chosen randomly in practice. At the ith iteration with the current label set X̃_{i−1}, one classifies the remaining unlabeled nodes X \ X̃_{i−1} via a certain graph learning scheme. Using the classification result on the set X \ X̃_{i−1}, one calculates an acquisition function A : X \ X̃_{i−1} → R. Sequential active learning selects the query set Q to be the single node with the largest acquisition function value in X \ X̃_{i−1}, i.e., Q = {x_{k*}} with k* = arg max_{x_k ∈ X \ X̃_{i−1}} A(x_k). Batch active learning selects a query set of multiple unlabeled nodes in the graph G. In this article, we adopt the LocalMax [56] batch active learning approach, as it performs similarly to sequential active learning but is more efficient, roughly in proportion to the batch size. It was originally developed for the classification of synthetic aperture radar (SAR) images [56] and has further proved successful in image segmentation [50]. Specifically, in each iteration of the active learning process, batch active learning selects the query set Q_LM of size B > 1, containing the unlabeled nodes that satisfy the local-max condition on the graph G.
This article uses graph Laplace learning [55] as the underlying graph-based classifier in the active learning process, together with two specific acquisition functions: the variance optimality (VOpt) acquisition function [51] and the model-change VOpt (MCVOpt) acquisition function [45].Details about the formulas of these two acquisition functions are provided in the Appendix.

III. SEMISUPERVISED HSU
This section details our semisupervised HSU methods based on the linear mixing model. Specifically, given a hyperspectral data cube I of dimension m × n × p with p spectral channels, we reshape it into a data matrix X = [x_1, x_2, ..., x_N] ∈ R^{p×N} of N = mn pixels. We assume a linear mixing model that generates the data, i.e.,

X = SA + E    (6)

where S ∈ R_+^{p×q} is the end-member spectrum matrix, A = [a_1, a_2, ..., a_N] ∈ Δ^{q×N} is the matrix of abundance maps of q materials, with Δ^{q×N} = {A ∈ R^{q×N} : A_{ij} ≥ 0, Σ_i A_{ij} = 1} the set of matrices whose columns lie on the probability simplex, and the matrix E ∈ R^{p×N} denotes a noise term.
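For concreteness, a toy NumPy example of the linear mixing model: a synthetic cube is flattened to X ∈ R^{p×N} and generated as X = SA + E with nonnegative end-members and simplex-constrained abundance columns. All sizes and distributions below are illustrative choices, not the paper's data.

```python
import numpy as np

# Toy linear-mixing example: an (m, n, p) cube reshaped to X in R^{p x N},
# generated as X = S A + E with nonnegative S and simplex-constrained A.
rng = np.random.default_rng(0)
m, n, p, q = 8, 8, 30, 3
N = m * n
S = rng.uniform(0.0, 1.0, size=(p, q))       # end-member spectra (nonnegative)
A = rng.dirichlet(np.ones(q), size=N).T      # columns on the probability simplex
E = 0.01 * rng.standard_normal((p, N))       # noise term
X = S @ A + E
cube = X.T.reshape(m, n, p)                  # back to the (m, n, p) image cube
assert np.allclose(A.sum(axis=0), 1.0)       # sum-to-one abundance constraint
```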
In Section III-A, we describe the training data selection process by adapting graph-based active learning (Section II-C) to the hyperspectral setting. Then, in Section III-B, we introduce the GLU model, which applies graph Laplace learning (Section II-B) directly to the HSU problem. Finally, in Section III-C, we propose our GRSU model, which combines graph-based regularization terms with the linear mixing model (6).

A. Training Data Selection
We use the notations X and X to denote a data matrix and a set of nodes, respectively.Given X , we can construct a graph G = (X , W ) according to Section II-A.Then, we apply the graph-based active learning process (Section II-C) to select a set of pixels to acquire labels for both of the proposed models, GLU (Section III-B) and GRSU (Section III-C).
Given two positive integers m < M, we begin the active learning process with an initial label set of m random pixels. In each iteration, we apply graph Laplace learning (Section II-B) with the current label set and calculate the acquisition function on the remaining pixels. Based on the values of the acquisition function, we select a query set to be added to the label set, and we terminate this iterative process when the size of the current label set reaches M. Algorithm 1 summarizes the active learning process, and its outputs, the labeled dataset X̃ = {x̃_1, x̃_2, ..., x̃_M} and the set of corresponding labels Ã = {ỹ_1, ỹ_2, ..., ỹ_M}, serve as the training data for our semisupervised unmixing framework.
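Schematically, Algorithm 1 is a loop of classify-score-query steps. The sketch below is a sequential (batch size 1) version with placeholder `classify` and `acquisition` callables standing in for graph Laplace learning and VOpt/MCVOpt; it is our paraphrase of the loop, not the paper's code.

```python
import numpy as np

def active_learning_selection(X, acquisition, init_idx, M, classify):
    """Sequential active-learning loop (schematic version of Algorithm 1).

    X : (N, p) data matrix (rows are pixels).
    acquisition(U, unlabeled) -> scores on the unlabeled indices.
    init_idx : initial labeled indices (m of them).
    M : target label budget.
    classify(labeled_idx) -> (N, q) node function from the current labels.
    """
    labeled = list(init_idx)
    while len(labeled) < M:
        U = classify(np.array(labeled))                       # graph learning step
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        scores = acquisition(U, unlabeled)                    # acquisition values
        labeled.append(int(unlabeled[np.argmax(scores)]))     # sequential query
    return np.array(labeled)
```

Replacing the single-argmax query with the LocalMax rule would yield the batch variant used in the article.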
We want to clarify three aspects of the outputs of the active learning process (Algorithm 1). First, one example of acquiring labels in Algorithm 1 is a human-in-the-loop process. Second, the ground-truth "label" ỹ_i ∈ R^{q×1} for the "labeled" pixel x̃_i ∈ R^{p×1} can be either the ground-truth abundance vector or its OH pseudo-label; the latter can be determined by an expert identifying the most significant end-member. According to the experimental results in Section IV, requiring the ground-truth abundance for active learning is not necessary. Third, we adopt the matrix forms X̃ = [x̃_1, x̃_2, ..., x̃_M] ∈ R^{p×M} and Ã = [ỹ_1, ỹ_2, ..., ỹ_M] ∈ R^{q×M} of the output sets X̃ and Ã, respectively.
Given the training matrix X̃, we estimate the abundance map A for the entire data matrix X. There is indeed an overlap between X̃ and X, but we cannot completely trust the training labels, especially those obtained as OH pseudo-labels. As a result, we update the abundance map for all the pixels, even though a subset of them are selected to acquire some form of ground-truth information. Another rationale for keeping two separate matrices X̃ and X is the option to select the training pixels from one image and perform semisupervised unmixing on another image, which falls outside the scope of this article.

B. Graph Learning Unmixing
Following the training data selection process, we obtain a training data matrix X̃ ∈ R^{p×M} consisting of M labeled pixels. We concatenate X̃ with the original data matrix X and construct a graph G based on the combined data matrix X̄ = [X̃, X], with the corresponding graph Laplacian L̄ according to Section II-A. Graph Laplace learning produces a class probability for each pixel, which can be regarded as its abundance vector. Specifically, we adopt (5) and estimate the abundance map by solving the graph Laplace learning problem

A_GL = arg min_{A ∈ R^{q×N}} ⟨[Ã, A]^T, L̄ [Ã, A]^T⟩_F    (7)

and projecting its solution onto Δ^{q×N}, i.e.,

A_GLU = P_{Δ^{q×N}}(A_GL)    (8)
where the graph Laplacian L̄ of the combined graph admits the block form

L̄ = [ L_ll  L_lu ; L_lu^T  L_uu ]    (9)

in which L_ll ∈ R^{M×M} and L_uu ∈ R^{N×N} are the parts corresponding to the labeled pixels X̃ and the unlabeled pixels X, respectively, and L_lu ∈ R^{M×N} represents the cross interaction of the graph Laplacian between X̃ and X. The minimization problem (7) then has a closed-form solution characterized by the linear system

L_uu A_GL^T = −L_lu^T Ã^T    (10)

which can be solved by the preconditioned conjugate gradient method because of the symmetric and positive semidefinite properties of the block L_uu. Equation (8) projects the output A_GL ∈ R^{q×N} of graph Laplace learning onto the set Δ^{q×N} by a projection operator P_{Δ^{q×N}}, defined by

P_{Δ^{q×N}}(A) = arg min_{B ∈ Δ^{q×N}} ‖B − A‖_F.    (11)

This projection operator can be implemented by a fast algorithm [57].
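The fast projection [57] onto the probability simplex is typically a sort-and-threshold procedure; below is a column-wise NumPy sketch, assuming the standard O(q log q) sort-based variant is the intended algorithm.

```python
import numpy as np

def simplex_projection(V):
    """Project each column of V onto the probability simplex.

    Sort-based algorithm: sort each column in descending order, find the
    largest index rho where the running threshold is still exceeded, and
    shift-then-clip the column by that threshold.
    """
    q, N = V.shape
    U = -np.sort(-V, axis=0)                      # sort each column descending
    css = np.cumsum(U, axis=0) - 1.0
    ind = np.arange(1, q + 1)[:, None]
    cond = U - css / ind > 0                      # threshold condition per row
    rho = q - np.argmax(cond[::-1], axis=0) - 1   # last row where cond holds
    theta = css[rho, np.arange(N)] / (rho + 1)    # per-column shift
    return np.maximum(V - theta, 0.0)
```

Every output column is entrywise nonnegative and sums to one, so the result lies in Δ^{q×N} as required by (8).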
Given A_GLU, we can then find the optimal end-member matrix S_GLU that minimizes the combination of the least-squares errors of the linear mixing model (6) and the misfit of the training data, i.e.,

S_GLU = arg min_{S ≥ 0} (1/2)‖X − S A_GLU‖²_F + (α/2)‖X̃ − S Ã‖²_F    (12)

with a weighting parameter α > 0. Ignoring the nonnegativity constraint, (12) has the closed-form least-squares solution

S⁰_GLU = (X A_GLU^T + α X̃ Ã^T)(A_GLU A_GLU^T + α Ã Ã^T)^{−1}    (13)

and we set

S_GLU = max(0, S⁰_GLU).    (14)
Equation (14) takes the entrywise maximum of S⁰_GLU and 0, i.e., replaces each negative entry of S⁰_GLU with 0. Algorithm 2 presents the pseudo-code of our GLU method, which only involves three steps to find A_GLU and S_GLU (no iteration is needed).
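A small NumPy sketch of the end-member step: solve the stacked least-squares problem over S via its normal equations and clip negative entries to zero. The normal-equation form and the scaling conventions here are our reconstruction.

```python
import numpy as np

def estimate_endmembers(X, A, X_t, A_t, alpha):
    """Least-squares end-member estimate followed by clipping to >= 0.

    Minimizes 0.5*||X - S A||_F^2 + 0.5*alpha*||X_t - S A_t||_F^2 over S
    via the normal equations, then takes the entrywise maximum with zero.
    X, A : full data and abundance matrices; X_t, A_t : labeled columns.
    """
    G = A @ A.T + alpha * (A_t @ A_t.T)   # (q, q) Gram matrix
    R = X @ A.T + alpha * (X_t @ A_t.T)   # (p, q) right-hand side
    S0 = np.linalg.solve(G.T, R.T).T      # S0 = R G^{-1}
    return np.maximum(S0, 0.0)            # entrywise clipping, as in (14)
```

In the noiseless case with consistent labels, the unconstrained minimizer already equals the true nonnegative S, so the clipping step is a no-op.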

[Algorithm 2 (GLU): 1) solve the graph Laplace learning problem (7) for A_GL; 2) projection: A_GLU = P_{Δ^{q×N}}(A_GL); 3) estimate the end-member spectrum matrix S_GLU via (13) and (14).]

C. Graph-Regularized Semisupervised Unmixing

By assuming the linear mixing model (6), it is standard to solve the blind unmixing problem in a regularized least-squares framework:

min_{S ≥ 0, A ∈ Δ^{q×N}} (1/2)‖X − SA‖²_F + λ J(A)    (15)

with a positive weighting parameter λ. The term (1/2)‖X − SA‖²_F is a least-squares misfit between the matrix product SA and the data measurement X, while J(A) is a regularization term on the abundance matrix A. We impose the nonnegativity constraints on S and A, as well as a sum-to-one constraint on A.
Following the graph Dirichlet energy [58], [59], we consider a graph Laplacian regularization, formulated as

J_1(A) = (1/4) Σ_{i,j=1}^N ‖a_i − a_j‖²_2 w(x_i, x_j)    (16)

where the weight function w : R^p × R^p → R is defined by

w(x_i, x_j) = exp( −d(x_i, x_j)² / (σ_i σ_j) )

with the angular distance d(x_i, x_j) between nodes x_i and x_j, and σ_i and σ_j defined in (1) and (2), respectively.
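The Dirichlet-energy regularizer J_1 is evaluated in practice as a quadratic form in the graph Laplacian. The snippet below numerically checks the identity (1/4) Σ_{i,j} ‖a_i − a_j‖² w_ij = (1/2) tr(A L A^T) on a random graph; the 1/4 normalization is our assumption about the omitted formula.

```python
import numpy as np

# Numerical check that the pairwise Dirichlet energy
# (1/4) sum_{i,j} ||a_i - a_j||^2 w_ij equals the quadratic form
# (1/2) tr(A L A^T) with L = D - W, which is how J_1 is evaluated cheaply.
rng = np.random.default_rng(1)
N, q = 6, 3
W = rng.uniform(size=(N, N))
W = 0.5 * (W + W.T)                 # symmetric weights
np.fill_diagonal(W, 0)              # no self-loops
L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
A = rng.uniform(size=(q, N))
pairwise = 0.25 * sum(W[i, j] * np.sum((A[:, i] - A[:, j]) ** 2)
                      for i in range(N) for j in range(N))
quadratic = 0.5 * np.trace(A @ L @ A.T)
assert np.isclose(pairwise, quadratic)
```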
In addition, we further develop a semisupervised graph regularization term J_2(A; Ã, X̃) that includes the label information X̃ and Ã, that is,

J_2(A; Ã, X̃) = (1/2) Σ_{i=1}^N Σ_{j=1}^M ‖a_i − ỹ_j‖²_2 w(x_i, x̃_j).    (17)

We consider the sum of J_1(A) and J_2(A; Ã, X̃) as the regularization J(A). As J_1 and J_2 have the same form, which leads to the same scale, we assign them equal weight when formulating J, i.e., J(A) = J_1(A) + J_2(A; Ã, X̃). Putting (15)-(17) together with the label information (X̃, Ã) obtained by the active learning approach, we propose the GRSU model to simultaneously estimate the abundance map A_GRSU and the end-member matrix S_GRSU, that is,

(A_GRSU, S_GRSU) = arg min_{S ≥ 0, A ∈ Δ^{q×N}} (1/2)‖X − SA‖²_F + (α/2)‖X̃ − SÃ‖²_F + λ J(A)    (18)

with two positive parameters α and λ. We define the indicator function of a set C,

ι_C(Z) = 0 if Z ∈ C, and +∞ otherwise    (19)

to rewrite the minimization problem (18) into an unconstrained formulation as follows:

min_{A, S} (1/2)‖X − SA‖²_F + (α/2)‖X̃ − SÃ‖²_F + λ J(A) + ι_{≥0}(S) + ι_{Δ^{q×N}}(A).    (20)

We apply the alternating direction method of multipliers (ADMM) [60] to solve the unconstrained problem (20). In particular, we introduce two auxiliary variables T ∈ R^{p×q} and B ∈ R^{q×N} to express problem (20) equivalently as follows:

min_{A, B, S, T} (1/2)‖X − SA‖²_F + (α/2)‖X̃ − SÃ‖²_F + λ J(B) + ι_{≥0}(T) + ι_{Δ^{q×N}}(A)  s.t. S = T, A = B.    (21)

The augmented Lagrangian of (21) is written as follows:

L(A, B, S, T; B̂, T̂) = (1/2)‖X − SA‖²_F + (α/2)‖X̃ − SÃ‖²_F + λ J(B) + ι_{≥0}(T) + ι_{Δ^{q×N}}(A) + (ρ/2)‖A − B + B̂‖²_F + (γ/2)‖S − T + T̂‖²_F    (22)

with scaled dual variables B̂ and T̂ and two positive constants ρ and γ. One benefit of ADMM is that it turns the joint minimization problem (21) into four subproblems associated with A, B, S, and T separately. In each iteration, we minimize (22) with respect to A, B, S, and T in turn, followed by the dual updates B̂ ← B̂ + A − B and T̂ ← T̂ + S − T. The A, S, and T subproblems are the same as in the blind unmixing paper [29]; thus, the details are omitted.
As for the B subproblem, we write it explicitly as

B ← arg min_B λ J(B) + (ρ/2)‖A − B + B̂‖²_F.    (23)

Note that we add the term (1/4) Σ_{i,j=1}^M ‖ỹ_i − ỹ_j‖²_2 w(x̃_i, x̃_j) in (23), which does not affect the minimization over B but rather turns the B subproblem (23) into a regularized graph Laplace learning problem (see Section II-B). Specifically, we consider the graph G built on the combined data matrix X̄ = [X̃, X] with the graph Laplacian matrix L̄. Then, (23) is equivalent to

B ← arg min_B (λ/2)⟨[Ã, B]^T, L̄ [Ã, B]^T⟩_F + (ρ/2)‖A − B + B̂‖²_F.    (24)

Using the block representation of L̄ in (9), we have a closed-form solution to (24) as follows:

B^T = (λ L_uu + ρ I_N)^{−1} ( ρ (A + B̂)^T − λ L_lu^T Ã^T )    (25)

where I_N is an N × N identity matrix. The semisupervised unmixing (GRSU) method is summarized in Algorithm 3. Its initial values of A⁰ and S⁰ are obtained by GLU (Section III-B), which outputs A_GLU and S_GLU.
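A dense-matrix sketch of the resulting B update; the sign convention for the scaled dual variable B̂ and the λ/ρ scaling follow our reading of the augmented Lagrangian and may differ from the paper's exact closed form.

```python
import numpy as np

def b_update(L, labeled_idx, A_tilde, A, B_hat, lam, rho):
    """ADMM B-subproblem as a regularized Laplace-learning solve.

    Solves (lam * L_uu + rho * I) B^T = rho*(A + B_hat)^T - lam * L_ul A_tilde^T
    on the combined graph.  Dense solve for clarity; the article uses the
    preconditioned conjugate gradient method instead.
    L : combined-graph Laplacian; labeled_idx : labeled-node indices;
    A_tilde : (q, M) labels; A, B_hat : (q, N) primal and scaled dual.
    """
    N_all = L.shape[0]
    unlabeled = np.setdiff1d(np.arange(N_all), labeled_idx)
    Luu = L[np.ix_(unlabeled, unlabeled)]
    Lul = L[np.ix_(unlabeled, labeled_idx)]
    rhs = rho * (A + B_hat).T - lam * (Lul @ A_tilde.T)
    Bt = np.linalg.solve(lam * Luu + rho * np.eye(len(unlabeled)), rhs)
    return Bt.T
```

Setting lam = 0 recovers B = A + B̂, the proximal point of the quadratic penalty alone, which is a quick sanity check of the formula.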

IV. EXPERIMENTS AND RESULTS
In this section, we conduct extensive experiments to demonstrate the performance of the proposed unmixing models. All code for our proposed methods and the following experiments is available on our GitHub repository. Specifically, in Section IV-A, we compare our semisupervised methods (GLU and GRSU) with the state-of-the-art (unsupervised) blind unmixing methods, followed by a discussion of our semisupervised unmixing methods with respect to different numbers of training pixels in Section IV-B. In Section IV-C, we test the robustness of various methods by adding different amounts of Gaussian white noise to the hyperspectral image. We test on four standard hyperspectral image datasets, described as follows.
1) Jasper Ridge: The Jasper Ridge dataset [61] is a hyperspectral image of size 100 × 100 with 198 channels. Originally, it had 224 spectral channels spanning from 380 to 2500 nm; 26 channels were removed as a preprocessing step due to dense water vapor and atmospheric effects. Four end-members are latent: Tree, Water, Dirt, and Road.
[Algorithm 3 (GRSU): CONSTRUCT a graph G on X̄ = [X̃, X] (by Section II-A) with the graph Laplacian L̄ and its block form (9); INITIALIZE S⁰ = S_GLU and A⁰ = A_GLU (by GLU, Algorithm 2), B⁰ = A⁰, B̂⁰ = 0, T̂⁰ = 0, Err = 1, and i = 0; WHILE i < I_max AND Err > ϵ, DO the ADMM updates of Section III-C.]

2) Samson: The Samson dataset [61] is of size 95 × 95 with 156 channels that span from 401 to 889 nm. Three end-members are latent: Soil, Tree, and Water. Note that a different ground truth is considered in [35], while we use the original ground-truth information.

3) Urban4: The Urban dataset [61] is of size 307 × 307 with 162 channels. Each pixel of this image corresponds to a 2 × 2 m² area. The original 221 channels span from 400 to 2500 nm. There are three versions of the ground truth, which contain four, five, and six end-members. Here, we use the version with four end-members, labeled as Asphalt, Grass, Tree, and Roof.

4) Apex: The Apex dataset [62] is a hyperspectral image of size 111 × 122 with 285 bands spanning from 413 to 2420 nm. Four end-members are latent in these data: Road, Tree, Roof, and Water.

We apply two metrics, root-mean-square error (RMSE) and spectral angle distance (SAD), to evaluate the quality of the abundance matrix A and the end-member spectrum matrix S, respectively. RMSE and SAD are defined as follows:

RMSE(A, A_gt) = sqrt( (1/(qN)) ‖A − A_gt‖²_F )    (26)

SAD(S, S_gt) = (1/q) Σ_{i=1}^q arccos( ⟨s_i, s_i^gt⟩ / (‖s_i‖_2 ‖s_i^gt‖_2) )    (27)

where A and S are the fitted matrices, A_gt and S_gt are the ground truth, and s_i denotes the ith column of the matrix S.
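The two metrics can be computed in a few lines of NumPy; the normalizations below (mean over all q·N abundance entries for RMSE, mean angle in radians over the q columns for SAD) are our reading of the standard definitions.

```python
import numpy as np

def rmse(A, A_gt):
    """Root-mean-square error over all abundance entries."""
    return np.sqrt(np.mean((A - A_gt) ** 2))

def sad(S, S_gt):
    """Mean spectral angle (radians) between matched end-member columns."""
    num = np.sum(S * S_gt, axis=0)
    den = np.linalg.norm(S, axis=0) * np.linalg.norm(S_gt, axis=0)
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))
```

Identical inputs give zero for both metrics, and orthogonal end-member columns give a SAD of π/2, the maximum angle between nonnegative spectra.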
Table I shows the computation times of various methods applied to each dataset. Our semisupervised method GLU is much faster than the neural network approaches (MSC and EGU) and is comparable to traditional unsupervised methods (GLNMF, QMV, and GTVMBO) in terms of computation time. As the GRSU method requires solving the graph Laplace learning problem in each iteration, it does require more computation time compared with the regularization-based methods, while it is still faster than the neural network methods.
For the labels used in our semisupervised framework, we consider the exact (EXT) abundance map and the OH pseudo-label. The latter (OH) can be obtained by thresholding the EXT abundance map or by an expert identifying the most significant end-member, which makes it more practical than the former (EXT) in real circumstances.
For each experiment, the active learning process starts with only one random pixel per material as the initial label set and terminates when around 0.4% of the total pixels have been sampled by the active learning algorithm (Algorithm 1). We fix ϵ = 10⁻³, I_max = 1000, and K = 50 in the KNN graph construction (Section II-A). Note that K is chosen to be relatively small for computational benefits, while ensuring the connectivity of the constructed graph G, which is required in the calculation of the acquisition function (28). We select the optimal combination of the parameters α, λ, γ, and ρ in the ranges α ∈ {10, 20, 50, 100}, λ ∈ {10^i, 5 × 10^i | i = 0, 1, 2, 3}, and γ = ρ ∈ {10^i | i = −2, −1, 0, 1, 2} that yields the smallest sum of RMSE and SAD on a validation set of 10% randomly selected pixels. We list the parameter choices, the amount of training data, and the acquisition function (Acq Fun) in the active learning approach for each dataset in Table II.
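The parameter selection described above is a plain exhaustive grid search; here is a sketch with a placeholder `fit_and_score` callable (standing in for training GRSU and scoring RMSE + SAD on the 10% validation pixels).

```python
import itertools

def grid_search(fit_and_score, alphas, lams, gam_rhos):
    """Exhaustive search over the hyperparameter grid described above.

    fit_and_score(alpha, lam, gamma, rho) -> validation RMSE + SAD.
    gamma and rho are tied (gamma = rho), as in the article's grid.
    Returns the best (alpha, lam, gamma, rho) tuple and its score.
    """
    best, best_params = float("inf"), None
    for a, l, g in itertools.product(alphas, lams, gam_rhos):
        score = fit_and_score(a, l, g, g)
        if score < best:
            best, best_params = score, (a, l, g, g)
    return best_params, best
```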
Tables III and IV report the RMSE(A, A_gt) values and SAD(S, S_gt) values, respectively. For our methods, the "-OH" suffix refers to training on the one-hot pseudo-labels, while the "-EXT" suffix refers to training on the exact abundance maps. In both tables, we highlight the best results among our four semisupervised methods and among the other five unsupervised methods, separately. Comparing the bold results in each row, the best of our methods achieves nearly 50% improvement over the competing unsupervised methods in most cases. There are only a few exceptions. For example, EGU has an outstanding SAD performance on the Apex dataset, which is slightly better than ours.
In addition, we observe that the winner among the five unsupervised methods scatters over Tables III and IV. For example, QMV and GTVMBO perform the best on Jasper Ridge and Urban4, respectively, for both abundance and end-member estimations. For the Samson dataset, GTVMBO attains the best RMSE, while EGU has the best SAD. For Apex, MSC has the best RMSE, while EGU achieves the best SAD. Our methods, on the other hand, yield consistent performance in that GRSU is generally better than GLU. One exception is the Urban4 dataset, in which GRSU-EXT is slightly worse than GLU-EXT. Since GLU is the initialization of GRSU, we can see improvements of GRSU after the ADMM iterations, which are particularly significant in the SAD values of the Jasper Ridge, Samson, and Apex datasets. Furthermore, training on OH pseudo-labels sometimes performs better than training on the EXT abundance map, which implies that the EXT abundance maps are not necessary for the training process of our approaches.

[TABLE IV: SAD between the estimated spectrum matrix and the ground truth.]

[Fig. 2. Abundance maps estimated by different methods on the Jasper Ridge dataset. Each row of the plot matrix corresponds to an end-member of the dataset, including Tree, Water, Dirt, and Road. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods. The last two columns are the ground truth and the label pixels selected by active learning (red dots) for our GLU and GRSU methods. Each red dot corresponds to a labeled pixel, which is enlarged for visual illustration. Notice that the Road abundance map contains fine structures that are easily smeared out by other methods, while active learning can successfully identify pixels with high abundance values for Road to acquire labels.]
We arrange the estimated abundance maps and end-members in pairs for the Jasper Ridge, Samson, Urban4, and Apex datasets, sequentially, shown in Figs. 2-9. Note that we normalize each end-member (column) of the spectrum matrix S to have unit norm, i.e., ‖s_i‖_2 = ‖s_i^gt‖_2 = 1. The labeled pixels that are selected by active learning are indicated by red dots on the ground-truth abundance map.
The active learning approach can identify the distribution of each end-member by selecting a few representatives. Take the Road end-member in the Jasper Ridge dataset as an example. As illustrated in the last row of Fig. 2, the abundance map for the Road contains fine structures that are easily smeared out by other methods, while active learning can successfully identify pixels with high abundance values of this end-member to acquire labels. With those sampled labeled pixels, the quality of the estimated abundance maps increases significantly compared with unsupervised methods. Another evident example is the end-member Roof in the Urban dataset (the last row of Fig. 6). Both GLU-OH and GRSU-OH preserve well the contrast of the rectangular rooftop in the Roof abundance.
In conclusion, our methods demonstrate a significant improvement of approximately 50% in unmixing performance, as measured by the RMSE of the abundance maps and the SAD of the mixing matrices, requiring only a minimal amount of supervision (e.g., 0.4% labeled pixels). It is important to note that EXT abundance maps are not needed during the training process; instead, we leverage the practical and readily obtainable OH labels. Furthermore, while our GRSU methods may exhibit slower computation times due to the iterative nature of our optimization approach, they still maintain a speed advantage over neural network methods, such as MSC and EGU. This not only underlines the efficiency of our methods but also represents a competitive balance between computational speed and unmixing performance.

B. Discussion on the Number of Training Pixels
We discuss the influence of the number of labels sampled by active learning on our semisupervised methods: GLU-OH, GLU-EXT, GRSU-OH, and GRSU-EXT. The active learning process starts with one random pixel per material as an initial label set, followed by Algorithm 1 (the active learning algorithm), until 5% of the total pixels are reached. We conduct experiments only on the Jasper Ridge and the Apex datasets for demonstration purposes. The corresponding parameters are set in terms of a constant C_α and the number of training pixels N_train, where C_α depends on the dataset. In practice, C_α = 50 for the Jasper Ridge dataset and C_α = 10 for the Apex dataset.

[Fig. 9. End-member matrix estimated by different methods (in orange) on the Apex dataset with the ground truth (in blue). All the end-member vectors are normalized to have unit norm. Each row of the plot matrix corresponds to an end-member of the dataset, including Road, Tree, Roof, and Water. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods.]

[Fig. 10. RMSE and SAD curves with respect to the number of labeled pixels for the Jasper Ridge and the Apex datasets. For each plot, the x-axis on the top shows the percentage of labeled pixels, while the bottom one shows the number of labeled pixels. In the active learning process, we apply the MCVOpt and VOpt acquisition functions for the Jasper Ridge and Apex datasets, respectively. Each curve starts with only one random pixel per end-member and samples up to 5% of labeled pixels.]

Fig. 10 shows the changes in the RMSE on abundance and the SAD on the end-members with respect to the number of labeled pixels obtained by active learning. The two abundance RMSE curves in Fig. 10(a) and (c) illustrate that our methods trained on OH pseudo-labels (GLU-OH and GRSU-OH) improve in the very beginning and deteriorate with more labeled pixels (after 1%), which is attributed to the inaccurate information of the OH labels. At the beginning of active learning, the most representative pixels are selected, whose ground-truth abundance vectors are likely close to the OH vectors. When we incorporate more OH labels for unmixing, the OH labels become misleading as opposed to the EXT abundance map. On the other hand, GLU-EXT and GRSU-EXT always benefit from an increasing number of labeled pixels, but with a diminishing gain after around 3% of the label rate. In practice, we prefer the OH pseudo-labels for training, since they are much easier to obtain, but would not exceed a 1% label rate in the active learning process.
Compared with the RMSE curves, the end-member SAD curves are relatively more stable with respect to the increase in the number of training pixels. In practice, after a 2% label rate, the SAD values do not change much for any of our methods, which implies that only a small number of training pixels are needed to estimate the end-members.
Overall, for both GLU and GRSU, increasing the number of training pixels does not always yield improvements in performance. In fact, including more OH pseudo-labels is not beneficial to unmixing. A label rate of around 1% is a good choice experimentally.

C. Robustness Study
To provide a quantitative validation of the robustness of our approach, we undertake an extensive investigation into the performance, evaluated by RMSE (26) and SAD (27), of various methods applied to the Jasper Ridge dataset. We add different amounts of Gaussian white noise to the data matrix, such that the resulting data have a signal-to-noise ratio (SNR) ranging from 5 to 40 dB with a 5-dB increment. Note that a higher SNR value implies less noisy data.
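The noise injection above follows a standard recipe; the following is a minimal NumPy sketch (the function name and matrix shape are our own, not from the article) that scales Gaussian white noise so the corrupted data attain a prescribed SNR in dB.

```python
import numpy as np

def add_noise_at_snr(X, snr_db, rng=None):
    """Corrupt data matrix X with Gaussian white noise at a target SNR (in dB)."""
    rng = np.random.default_rng(rng)
    signal_power = np.mean(X ** 2)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=X.shape)
    return X + noise

# Sweep SNR from 5 to 40 dB in 5-dB steps, as in the robustness study.
X = np.random.rand(198, 100)  # stand-in for a (bands x pixels) data matrix
noisy = {snr: add_noise_at_snr(X, snr, rng=0) for snr in range(5, 45, 5)}
```

With roughly 20,000 entries, the empirical SNR of each corrupted matrix matches the target to within a small fraction of a decibel.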
As evidenced in Fig. 11, the proposed GRSU method achieves the highest accuracy in estimating the abundance map A under all noise levels, no matter whether OH or EXT labels are used. The spectrum matrix S estimated by GRSU is the best in the low-SNR regime and is comparable to QMV when the data are less noisy (SNR > 15 dB). This result is reasonable, as we only impose two simple constraints on S (i.e., nonnegativity and sum-to-one). On the other hand, the GLU method seems sensitive to noise: in circumstances where the SNR is relatively low, the RMSE and SAD values produced by GLU tend to be large. However, we also observe a sharp decline in both the RMSE and SAD of the GLU method as the SNR gradually increases.

V. CONCLUSION
In this article, we propose two semisupervised hyperspectral unmixing models, GLU and GRSU. GLU applies graph Laplace learning directly to solve the HSU problem by regarding the class probability as the abundance map. Initialized by GLU, GRSU estimates the abundance map A and the end-member spectrum matrix S by solving an energy-minimizing problem via an iterative ADMM scheme. We extended graph Laplace learning to a regularized version and derived closed-form solutions for efficient computation. This regularized graph Laplace learning is a subproblem in the ADMM iteration process. We conducted extensive experiments on four standard hyperspectral datasets to compare our semisupervised methods with five state-of-the-art methods in hyperspectral blind unmixing. All the results demonstrate that the proposed GLU and GRSU methods achieve a significant improvement with a minimal amount of supervision. Furthermore, both methods can take either the ground-truth abundance maps or the OH pseudo-labels as the training information, the latter of which is much easier to obtain, since it only requires determining the major end-member of each training pixel. According to our experiments, it is unnecessary to feed the ground-truth abundance maps into the semisupervised framework; sometimes, models trained on pseudo-labels yield even better performance than those trained on the EXT values. We also discussed the influence of the number of training pixels on the model performance, revealing that a label rate of 1% is experimentally sufficient for satisfactory results. In addition, our GRSU method is highly robust to Gaussian white noise. The new methods have great potential for real-world problems, since they do not need a ground-truth abundance map and instead can work with pseudo-labels.
There are several promising directions following this work. First, scaling-factor [1] and spectral-variability [2] terms can be incorporated into our model to further enhance the unmixing performance. Second, Fig. 11(b) suggests a need for regularization of the spectral matrix, in addition to the nonnegativity and sum-to-one constraints, to improve the robustness of the proposed methods. Finally, the current graph Laplace learning solver can be replaced with neural network or graph neural network techniques.

APPENDIX ACQUISITION FUNCTION FORMULAS
To make the paper self-contained, we provide the formulas for the VOpt [51] and MCVOpt [45] acquisition functions. Given a graph G(X, W), the formulas are based on a low-rank approximation of the graph Laplacian matrix L. With the ordered eigenvalues of L as 0 = λ_1 < λ_2 ≤ ··· ≤ λ_N, the truncated decomposition of L with the smallest M < N eigenvalues has the form L ≈ V Λ V^⊤, where Λ ∈ R^{M×M} is a diagonal matrix with diagonal entries λ_1, λ_2, ..., λ_M and V ∈ R^{N×M} is the matrix of corresponding eigenvectors. Furthermore, a Gaussian correlation matrix C is defined by

C = (Λ + (1/γ²) V^⊤ P^⊤ P V)^{−1}

where P ∈ R^{|X_L|×N} is the projection matrix onto the indices corresponding to the label set X_L and γ is a positive constant (we choose γ = 0.1 in this article). When the graph G is connected, such a matrix C exists, i.e., the matrix Λ + (1/γ²) V^⊤ P^⊤ P V is invertible.
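For illustration, the truncated decomposition and the matrix C can be computed as follows. This is a minimal NumPy sketch on a toy path graph (the function name and graph are ours), assuming C = (Λ + (1/γ²) V^⊤ P^⊤ P V)^{−1}, consistent with the invertibility condition stated above.

```python
import numpy as np

def gaussian_correlation(L, labeled_idx, M, gamma=0.1):
    """Truncated decomposition L ~ V Lam V^T and the correlation matrix
    C = (Lam + (1/gamma^2) V^T P^T P V)^{-1} (assumed form)."""
    vals, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    Lam, V = np.diag(vals[:M]), vecs[:, :M]   # smallest M eigenpairs
    mask = np.zeros(L.shape[0])               # P^T P is a 0/1 diagonal mask
    mask[labeled_idx] = 1.0                   # selecting the labeled indices
    A = Lam + (V.T * mask) @ V / gamma ** 2
    return np.linalg.inv(A), V

# Toy connected graph: a path on five nodes, with nodes 0 and 4 labeled.
W = np.diag(np.ones(4), 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
C, V = gaussian_correlation(L, labeled_idx=[0, 4], M=3)
```

Although λ_1 = 0 makes Λ itself singular, the labeled-index term restores invertibility on a connected graph, so C is symmetric positive definite.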
At each iteration of the active learning process with the current label set X_L, graph Laplace learning gives the optimized node function u* (and the corresponding node matrix U*) by minimizing the energy (3). Denoting the kth column of V^⊤ by v_k, the VOpt and MCVOpt acquisition functions are given by

A_VOpt(x_k) = (1 / (γ² + v_k^⊤ C v_k)) ‖C v_k‖²
A_MCVOpt(x_k) = (1 / (γ² + v_k^⊤ C v_k)) ‖C v_k‖ ‖u*(x_k) − y*_k‖

where y*_k is the OH thresholding of u*(x_k).
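As an illustration, here is a minimal NumPy sketch of a VOpt-style acquisition score, assuming the form A(x_k) = ‖C v_k‖² / (γ² + v_k^⊤ C v_k); the toy matrices and the function name are ours, not from the article.

```python
import numpy as np

def vopt_acquisition(C, V, gamma=0.1):
    """VOpt-style scores for all nodes, assuming
    A(x_k) = ||C v_k||^2 / (gamma^2 + v_k^T C v_k),
    where v_k is the kth column of V^T (i.e., the kth row of V)."""
    scores = np.empty(V.shape[0])
    for k, v in enumerate(V):          # v = v_k, shape (M,)
        Cv = C @ v
        scores[k] = (Cv @ Cv) / (gamma ** 2 + v @ Cv)
    return scores

# Toy usage: a random symmetric positive definite C and eigenvector matrix V.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3)); C = B @ B.T + np.eye(3)
V = rng.standard_normal((5, 3))
next_pixel = int(np.argmax(vopt_acquisition(C, V)))  # node to query next
```

In the active learning loop, the pixel maximizing the acquisition score is queried, and C is updated with the enlarged label set.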

Fig. 1. Flowchart of our semisupervised HSU models. The gray box indicates an input hyperspectral image. The orange boxes are the graph construction and graph-based active learning to select labeled nodes (pixels) for the training process (Sections II-A and III-A). Two red boxes are our proposed models, GLU (Section III-B) and GRSU (Section III-C). GLU applies graph Laplace learning directly to the unmixing task, while GRSU combines the graph-based regularization term with the linear unmixing model from hyperspectral imaging into a joint optimization to be solved. GLU also serves as the initialization of the GRSU optimization process. The blue boxes are the outputs of GLU and GRSU, i.e., estimated end-members and abundance map.

Algorithm 2 GLU
INPUT: data matrix X, training data (X_L, A), and α > 0.
OUTPUT: matrices S_GLU and A_GLU.
INITIALIZE: build a graph on X = [X_L, X_U] (labeled pixels first) with corresponding Laplacian matrix L. Segment L into the block form (9).

Fig. 3. End-member matrix estimated by different methods (in orange) on the Jasper Ridge dataset with the ground truth (in blue). All the end-member vectors are normalized to have the unit norm. Each row of the plot matrix corresponds to an end-member of the dataset, including Tree, Water, Dirt, and Road. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods.

Fig. 4. Abundance maps estimated by different methods on the Samson dataset. Each row of the plot matrix corresponds to an end-member of the dataset, including Soil, Tree, and Water. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods. The last two columns are the ground truth and the label pixels selected by active learning (red dots) for our GLU and GRSU methods. Each red dot corresponds to a labeled pixel, which is enlarged for visual illustration.

Fig. 5. End-member matrix estimated by different methods (in orange) on the Samson dataset with the ground truth (in blue). All the end-member vectors are normalized to have the unit norm. Each row of the plot matrix corresponds to an end-member of the dataset, including Soil, Tree, and Water. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods.

Fig. 6. Abundance maps estimated by different methods on the Urban4 dataset. Each row of the plot matrix corresponds to an end-member of the dataset, including Asphalt, Grass, Tree, and Roof. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods. The last two columns are the ground truth and the label pixels selected by active learning (red dots) for the GLU and GRSU methods. Each red dot corresponds to a labeled pixel, which is enlarged for visual illustration. Note that both GLU-OH and GRSU-OH well preserve the contrast of the rectangular rooftop in the Roof abundance.

Fig. 7. End-member matrix estimated by different methods (in orange) on the Urban4 dataset with the ground truth (in blue). All the end-member vectors are normalized to have the unit norm. Each row of the plot matrix corresponds to an end-member of the dataset, including Asphalt, Grass, Tree, and Roof. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods.

Fig. 8. Abundance maps estimated by different methods on the Apex dataset. Each row of the plot matrix corresponds to an end-member of the dataset, including Road, Tree, Roof, and Water. The first five columns are (unsupervised) blind unmixing methods, and the following four columns are our semisupervised methods. The last two columns are the ground truth and the label pixels selected by active learning (red dots) for our GLU and GRSU methods. Each red dot corresponds to a labeled pixel, which is enlarged for visual illustration.
Fig. 10. RMSE and SAD curves with respect to the number of labeled pixels for the Jasper Ridge and the Apex datasets. For each plot, the x-axis on the top shows the percentage of labeled pixels, while the bottom one shows the number of labeled pixels. In the active learning process, we apply the MCVOpt and VOpt acquisition functions for the Jasper Ridge and Apex datasets, respectively. Each curve starts with only one random pixel per end-member and samples up to 5% of labeled pixels. (a) Jasper RMSE. (b) Jasper SAD. (c) Apex RMSE. (d) Apex SAD.

Fig. 11. RMSE (for A) and SAD (for S) curves with respect to the SNR values of noisy input of the Jasper Ridge dataset corrupted by Gaussian white noise. In both panels, the performance of other blind methods is illustrated by dashed lines, whereas our proposed semisupervised methods are represented by solid lines. (a) RMSE versus SNR. (b) SAD versus SNR.
Algorithm 1 Sample Labeled Pixels Through Active Learning
INPUT: dataset X; corresponding graph G = (X, W); initial sample number m; total sample number M.
OUTPUT: the label set X_L ⊂ X and the set of corresponding labels A.
INITIALIZE: randomly sample m pixels as the initial label set X_L; acquire the labels for X_L as A.
WHILE |X_L| < M (| · | means cardinality) DO:
    Apply graph Laplace learning on G based on the labels of X_L to predict labels on X \ X_L.
    Calculate the acquisition function A on X \ X_L based on the Laplace learning outputs.
    Select a query set Q based on the acquisition function values, according to either sequential active learning or LocalMax.
    Update the current label set: X_L → X_L ∪ Q; acquire the labels of Q and update the label set A accordingly.
END WHILE

Problem (7) is the standard graph Laplace learning. Consider the block representation of L as follows:

L = [ L_LL  L_LU ; L_UL  L_UU ]    (9)

where the subscripts L and U index the labeled and unlabeled nodes, respectively.
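Under this block representation, graph Laplace learning admits the standard closed-form harmonic solution on the unlabeled nodes, U_U = −L_UU^{−1} L_UL U_L. The following is a minimal NumPy sketch (the toy graph and function name are ours, not from the article):

```python
import numpy as np

def laplace_learning(W, labeled_idx, Y_l):
    """Graph Laplace learning: harmonic extension of the labels Y_l.
    The unlabeled block solves L_UU U_U = -L_UL Y_l in closed form."""
    N = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # combinatorial Laplacian
    u = np.setdiff1d(np.arange(N), labeled_idx)    # unlabeled indices
    U = np.zeros((N, Y_l.shape[1]))
    U[labeled_idx] = Y_l
    U[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labeled_idx)] @ Y_l)
    return U

# Toy path graph on five nodes with the two endpoints labeled one-hot.
W = np.diag(np.ones(4), 1); W = W + W.T
U = laplace_learning(W, [0, 4], np.array([[1.0, 0.0], [0.0, 1.0]]))
# Each row of U sums to one, so it can be read as an abundance-like vector.
```

On this path graph, the harmonic solution linearly interpolates the two endpoint labels, which is why the class probabilities can be reinterpreted as abundances in GLU.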