Kernel-Based Decomposition Model With Total Variation and Sparsity Regularizations via Union Dictionary for Nonlinear Hyperspectral Anomaly Detection

Many linear approaches have been extensively proposed for the anomaly detection problem in hyperspectral images (HSIs), while nonlinear approaches have been rarely studied although most practical cases are nonlinear. Moreover, these existing nonlinear methods simply nonlinearly map each pixel into a high-dimensional space, which does not describe complex light scattering effects between endmembers. To address the above issues, this article proposes an endmember-kernel-based decomposition model with total variation (TV) and sparsity regularizations via a union dictionary for the nonlinear anomaly detection in HSIs. The proposed decomposition model utilizes endmember-kernel theory to handle nonlinear interactions between atoms in the dictionary, allowing for the effective characterization of complex light scattering effects. By using this endmember-kernel-based decomposition model, hyperspectral imagery can be decomposed into three components: anomaly, background, and noise. To separate these components effectively, the TV and sparsity regularizations are incorporated into the decomposition model to characterize the spatial properties of the background and the anomaly, respectively. Besides, we present a novel construction framework of union dictionary that combines superpixel segmentation and clustering methods sequentially to achieve more accurate dictionary representation capabilities. Finally, the anomalous level of a tested pixel is calculated by the abundances associated with the anomaly dictionary. The experimental results on both synthetic and real hyperspectral datasets demonstrate that the proposed method outperforms several linear and nonlinear state-of-the-art anomaly detectors.


I. INTRODUCTION
IN RECENT decades, advances in remote sensing technology have enabled sensors on airborne or space platforms to collect hyperspectral images (HSIs) more conveniently. HSIs have a high spectral resolution because the sensor acquires a spectral vector with hundreds or thousands of elements for each pixel [1], [2]. Different materials have distinct spectral curves, which has led to several practical applications, such as unmixing and target detection [1], [2], [3], [4]. Target detection is one of the most widely used HSI processing techniques, and according to whether the target spectra are known in advance, the detection problem can be divided into supervised and unsupervised types. When the target spectra are unknown, unsupervised target detection, also known as anomaly detection, has to be applied. It is used to identify uncommon objects that differ significantly from their surrounding background. Because the specific spectral curves of the targets are difficult to obtain, anomaly detection is more suitable for practical applications, such as environmental monitoring and military detection [5], [6].
Many approaches have been proposed to solve the problem of anomaly detection for HSIs. For convenience of calculation and physical interpretation, most of them are built on the hyperspectral linear mixing model (LMM). The most typical anomaly detection method is the Reed-Xiaoli (RX) detector [7], which computes the anomalous level of the tested pixel as the Mahalanobis distance between the tested pixel and the mean of the background under the assumption of a multivariate Gaussian distribution. Several modified anomaly detectors based on RX have emerged, including local RX (LRX) [8], weighted RX [9], segmented RX [10], and regularized RX [11]. However, in real HSIs, the multivariate Gaussian assumption of RX is difficult to satisfy. Besides, the estimate of the background mean is susceptible to noise and anomalies. Compared with the above RX-based detectors, the representation-based methods do not make any specific distributional assumptions about the HSIs. The collaborative-representation-based detector (CRD) proposed in [12] is based on the concept that background pixels can be represented by their spatial neighbors, whereas the anomalies cannot. Unfortunately, there is no general rule for choosing the window size, so it has to be tuned for each HSI, which is not feasible in practical situations. Furthermore, RX and CRD calculate the anomalous level for each tested pixel separately, thus ignoring the global statistical properties of the entire hyperspectral imagery.
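A minimal sketch of the global RX score described above, assuming the data are arranged as an L × M matrix (bands by pixels) and that the background statistics are estimated from the whole scene. The function name `rx_scores`, the toy data, and the use of a pseudo-inverse are illustrative choices, not details taken from [7].

```python
import numpy as np

def rx_scores(Y):
    """Global RX: squared Mahalanobis distance of each pixel to the scene mean.

    Y has shape (L, M): L spectral bands, M pixels.
    """
    mu = Y.mean(axis=1, keepdims=True)           # background mean, shape (L, 1)
    centered = Y - mu
    cov = centered @ centered.T / Y.shape[1]     # sample covariance, shape (L, L)
    cov_inv = np.linalg.pinv(cov)                # pseudo-inverse guards against rank deficiency
    # d_i = (y_i - mu)^T C^{-1} (y_i - mu), evaluated for all pixels at once
    return np.einsum("lm,lk,km->m", centered, cov_inv, centered)

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 200))                    # toy scene: 5 bands, 200 pixels
Y[:, 17] += 8.0                                  # one implanted outlier pixel
scores = rx_scores(Y)
```

Because the mean and covariance are estimated from all pixels, including the outlier itself, strong anomalies inflate the background statistics, which is the sensitivity noted in the paragraph above.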
In recent years, robust principal component analysis (RPCA) [13] has been utilized to decompose HSIs into sparse and low-rank components [14]. However, the spectra in HSIs are generally contaminated by noise, which results in a high false alarm rate for RPCA. To address this issue, the low-rank and sparse matrix decomposition (LRaSMD) [15] method models the anomalies and noise separately, but it still cannot separate the noise and weak anomalies effectively. In [16], [17], and [18], to represent the other anomalies in HSIs, the anomaly dictionaries are constructed from anomalous pixels that are significantly different from the surrounding background. In this way, the anomaly component can be well separated from the noise since the noise cannot be described by the latent anomaly dictionary. Considering the spatial similarity of background pixels in neighboring regions, the total variation (TV) and sparsity regularized decomposition model (TVSDM) [18] incorporates TV regularization to make the abundances of adjacent pixels close. Besides, sparsity regularization is incorporated to enforce the spatial sparsity of anomalies distributed over the whole HSI [18]. Moreover, the anomaly detector with local spatial constraint and TV (LSC-TV) [19] divides the whole hyperspectral imagery into several superpixels and makes the background abundances close within each superpixel. However, the above detectors are all built on the LMM. In reality, the model errors caused by the inherent nonlinear characteristics [20] of real HSIs significantly reduce the performance of LMM-based anomaly detectors. Therefore, nonlinear models must be considered.
Recently, a number of nonlinear mixture models (NMMs) have been presented to characterize the nonlinear interactions in HSIs. The well-known Hapke model [21] assumes that the multiple scattering is isotropic and utilizes a set of physical parameters to accurately calculate the single scattering, and thus, it can be used to solve the nonlinear problem. However, because accurate physical parameters are difficult to obtain in advance, this model is rarely applied in practical situations. Consequently, to avoid presetting physical parameters, more recent NMMs extend the LMM by adding a nonlinear term that characterizes the nonlinearity in spectral mixing. The Fan model (FM) [22] takes the second-order scattering between two different endmembers into consideration, and the generalized bilinear model (GBM) [23] introduces a nonlinear parameter for each interaction component to control the weight of the nonlinear mixing component. Furthermore, the polynomial postnonlinear model (PPNM) [24] considers that the second-order scattering phenomenon also exists between two identical endmembers. Since the nonlinear interactions in real HSIs are mainly generated by second-order scattering, these bilinear mixture models (BMMs) are adequate in some practical applications. In other applications, however, the high-order scattering effect cannot be ignored. In [25], a multilinear mixing (MLM) model is derived by introducing a probability parameter for multiple interactions, which extends the PPNM to an infinite number of degrees.
Kernel theory can be utilized to extend the LMM-based anomaly detectors to their nonlinear versions by mapping the hyperspectral data into a high-dimensional space. Kernel RX (KRX) [26] first executes the nonlinear mapping through a kernel transform, which enhances the separability of the data, and then applies the RX algorithm in this high-dimensional space. Similar to KRX, the nonlinear version of CRD, namely, kernel CRD (KCRD) [12], achieves the collaborative representation implicitly through the kernel function. Considering the fact that anomalies are far from most of the background in the high-dimensional space, a nonlinear anomaly detector with kernel isolation forest (KIFD) [27] has been proposed. Moreover, the support vector data description (SVDD) [28] constructs a minimum hypersphere in the hyperspectral data space, and the pixels falling outside of this hypersphere are regarded as anomalies. The above nonlinear methods map the hyperspectral data into a high-dimensional space to extract nonlinear features. Although such methods can improve the separability of HSIs, the inherent nonlinear interactions in HSIs are ignored. Besides, there is no general rule for selecting the kernel function, which results in very unstable anomaly detection results. To address this issue, K-Hype [29] proposes the endmember-kernel theory to model the nonlinear interactions between endmembers. On this basis, KNUD [30] utilizes the endmember-kernel theory with a dual window to implement anomaly detection. However, the detection result of KNUD is degraded to some extent by the absence of global spatial properties, and the deficiency of the dual window remains unsolved.
In anomaly detection, the specific spectral information of the anomaly and background is unknown. Therefore, to distinguish the background and anomalies effectively, the existing anomaly detection algorithms generally construct a background dictionary or a union dictionary that consists of an anomaly dictionary and a background dictionary [16], [17], [18]. To express other anomalies in the hyperspectral imagery, the anomaly dictionary is composed of anomalies whose spectra differ significantly from their surroundings [17], [18], [31]. Besides, the global clustering methods, such as mean shift [33], k-means [34], and k-means++ [35], consider the spatial homogeneity [32] of background pixels and group all the pixels into several clusters. Then, the background dictionary can be constructed by selecting the representative pixels in each cluster. However, it is unrealistic to assume that the cluster number is known. Moreover, the global clustering methods share a common drawback: they cannot correctly distinguish the sparse backgrounds from anomalies because both occupy similar proportions of the whole hyperspectral imagery.
To sum up, there are several problems with the existing hyperspectral anomaly detectors.
1) The traditional kernel theories map the hyperspectral data into a high-dimensional space, which is inconsistent with the inherent nonlinear interactions of HSIs based on the NMMs.
2) The spatial information of HSIs is not well exploited in nonlinear anomaly detection algorithms, which is very important for effective model decomposition.
3) The existing dictionary construction methods share a common drawback: they cannot correctly distinguish the sparse backgrounds from anomalies, which reduces the accuracy of anomaly detection.
In this article, to solve the above problems, we propose a novel kernel-based decomposition model with TV and sparsity regularizations via a union dictionary for nonlinear anomaly detection in HSIs, named KDM-TVS. The proposed decomposition model utilizes endmember-kernel theory to handle nonlinear interactions between atoms in the dictionary so that the complex light scattering effect can be effectively described. In contrast to KNUD, which applies endmember-kernel theory within a dual window, the proposed KDM-TVS decomposes the whole hyperspectral imagery into three components: anomaly, background, and noise. To separate these components effectively, the spatial sparsity of the anomalies is enforced by means of sparsity regularization. Besides, TV regularization is utilized to make the abundances of adjacent background pixels similar so that the spatial smoothness of the background is realized. Moreover, to enhance the accuracy of model decomposition, we present a novel construction framework of union dictionary, in which the hyperspectral imagery is first divided into several blocks using a superpixel-based segmentation method, and a local clustering method is then applied within each block. Unlike global clustering methods, which cannot distinguish the sparse backgrounds from anomalies, the proposed construction framework extracts background dictionary atoms in each block, and thus, a complete background dictionary can be obtained. Meanwhile, the anomaly dictionary can be constituted easily by picking out some strongly anomalous pixels. By means of the optimal abundances associated with the latent anomaly dictionary, precise detection results can be obtained.
The main contributions of this article can be briefly summarized as follows.
1) A novel endmember-kernel-based decomposition model for nonlinear hyperspectral anomaly detection is proposed to handle the nonlinear interactions between dictionary atoms. By fully considering the inherent nonlinear interactions of hyperspectral imagery based on the NMMs, the decomposition model conforms better to the physical process of light scattering and is more physically interpretable than the traditional kernel methods that map the hyperspectral data into a high-dimensional space. Besides, the sparsity and the TV regularizations are incorporated into the decomposition model to characterize the spatial properties of anomalies and background, respectively, to further achieve the effective separation of the anomaly, background, and noise.
2) In view of the difficulty of distinguishing sparse backgrounds and anomalies in the existing methods, we present a novel construction framework of the union dictionary, which combines the superpixel segmentation and clustering methods sequentially. By means of local clustering, the interference of anomalies can be easily eliminated, and the sparse backgrounds can be effectively singled out to constitute a background dictionary.
The rest of this article is organized as follows. Section II briefly introduces the related works. In Section III, the proposed nonlinear anomaly detection method is described in detail. Section IV presents experiments on both synthetic and real hyperspectral datasets, which demonstrate the superiority of the proposed method. Finally, Section V draws the conclusions of this article.

II. RELATED WORKS

A. Hyperspectral Mixing Models
Because of the low spatial resolution, each pixel in an HSI may contain several endmembers; that is, the spectrum of the pixel results from the interaction of several endmember spectra. Denote by $E = [e_1, e_2, \ldots, e_P] \in \mathbb{R}^{L \times P}$ an endmember matrix with $L$ spectral bands and $P$ endmembers, and let $y \in \mathbb{R}^{L \times 1}$ be a tested pixel. The LMM takes the form
$$y = E\alpha + n = \sum_{i=1}^{P} \alpha_i e_i + n \tag{1}$$
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_P]^T$ denotes the abundance vector of $y$ and $n$ represents the noise. Besides, the abundance vector $\alpha$ needs to satisfy two constraints, i.e., the abundance sum-to-one constraint (ASC) and the abundance nonnegativity constraint (ANC) [36]. However, the LMM is insufficient when multiple light scattering occurs. To characterize the nonlinear interactions in HSIs, the FM shown in (2) considers the second-order scattering effects between two different endmembers
$$y = \sum_{i=1}^{P} \alpha_i e_i + \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \alpha_i \alpha_j \, e_i \odot e_j + n \tag{2}$$
where $\odot$ represents the Hadamard product and $e_i \odot e_j = [e_{i,1} e_{j,1}, e_{i,2} e_{j,2}, \ldots, e_{i,L} e_{j,L}]^T$ models the second-order interaction between two endmembers. To control the weight of the nonlinear mixing component, the GBM shown in (3) introduces the nonlinear parameter $\gamma_{ij}$
$$y = \sum_{i=1}^{P} \alpha_i e_i + \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \gamma_{ij} \alpha_i \alpha_j \, e_i \odot e_j + n. \tag{3}$$
The FM and GBM only consider the nonlinear interactions between different endmembers. The PPNM shown in (4) considers that the second-order scattering phenomenon also exists between two identical endmembers [24]
$$y = \sum_{i=1}^{P} \alpha_i e_i + \xi \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j \, e_i \odot e_j + n \tag{4}$$
where $\xi$ weights the nonlinear component. To sum up, the aforementioned nonlinear mixing models restrict light scattering to second-order interactions, and a general form can be summarized as in (5), where $\tau_{ij}$, $\theta_1$, and $\theta_2$ denote the parameters of each nonlinear mixing model, and their specific definitions are given in Table I. The nonlinear interactions are indeed dominated by second-order scattering in some practical applications. However, high-order scattering should be considered in complex scenes with significant altitude differences.
To account for the high-order scattering effects present in HSIs, the MLM model [25] extends PPNM by introducing a probability parameter $q$. Both the PPNM and MLM demonstrate that modeling the nonlinear interaction between endmembers is more physically interpretable than modeling pixel spectral distortions. Moreover, the high-order scattering phenomenon cannot be ignored in most natural scenes. Therefore, we utilize PPNM and MLM to represent the nonlinear mixing among the endmembers in this article.
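The mixing models of this subsection can be compared numerically with a short sketch. It assumes noise-free pixels, uses the bilinear forms of FM, GBM, and PPNM, and writes the MLM in the closed form commonly attributed to [25], where q is the probability of a further interaction. The function names, the toy endmembers, and the symbol `xi` for the PPNM weight are illustrative assumptions.

```python
import numpy as np

def lmm(E, a):
    """Linear mixing: y = E a (noise omitted)."""
    return E @ a

def gbm(E, a, gamma):
    """Generalized bilinear model: pairwise products of different endmembers,
    each weighted by gamma[i, j]."""
    y = E @ a
    P = E.shape[1]
    for i in range(P - 1):
        for j in range(i + 1, P):
            y = y + gamma[i, j] * a[i] * a[j] * E[:, i] * E[:, j]
    return y

def fm(E, a):
    """Fan model: the GBM with every interaction weight fixed to one."""
    return gbm(E, a, np.ones((E.shape[1], E.shape[1])))

def ppnm(E, a, xi):
    """PPNM: the linear mixture plus xi times its element-wise square, which
    also contains the products of an endmember with itself."""
    lin = E @ a
    return lin + xi * lin * lin

def mlm(E, a, q):
    """MLM in closed form (assumed): y = (1 - q) * lin / (1 - q * lin)."""
    lin = E @ a
    return (1.0 - q) * lin / (1.0 - q * lin)

rng = np.random.default_rng(0)
E = rng.uniform(0.1, 0.9, size=(6, 3))   # 6 bands, 3 endmembers (toy reflectances)
a = np.array([0.5, 0.3, 0.2])            # satisfies ANC and ASC
y_fm, y_ppnm, y_mlm = fm(E, a), ppnm(E, a, 0.3), mlm(E, a, 0.2)
```

Setting `xi = 0` or `q = 0` recovers the LMM, which makes the role of each nonlinear parameter easy to check.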

B. K-Hype
The K-Hype algorithm proposes an endmember-kernel theory to solve the HSI unmixing problem. As discussed in [29], each tested pixel $y \in \mathbb{R}^{L \times 1}$ in HSIs can be modeled as follows:
$$y_l = \psi(e_{\lambda_l}) + n_l \tag{7}$$
where $y_l$ is the $l$th band of the tested pixel $y$, $n_l$ is the $l$th band of the noise $n$, and $e_{\lambda_l} \in \mathbb{R}^{1 \times P}$ is the $l$th row of $E$. $\psi : \mathbb{R}^P \to \mathbb{R}$ is an unknown nonlinear function, which can be an arbitrary real-valued function in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$. Thus, this model is flexible enough to replace most NMMs by adjusting the function $\psi$.
Owing to the Riesz representation theorem [37], the mapping result for each fixed row $e_{\lambda_l}$ of $E$ can be calculated by a reproducing kernel function as
$$\psi(e_{\lambda_l}) = \langle \psi, k_n(\cdot, e_{\lambda_l}) \rangle_{\mathcal{H}}$$
where $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ denotes the inner product in the RKHS $\mathcal{H}$ and $k_n(\cdot, \cdot)$ is the reproducing kernel function used. The nonlinear function $\psi$ can be expressed by the mapping results of other spectral bands. A detailed proof can be found in [29]. This model can represent an NMM by selecting an appropriate nonlinear function $\psi$. The objective function of K-Hype is minimized jointly over $\alpha$ and $\psi$ and can be solved by standard duality theory [38].
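For orientation, the K-Hype problem of [29] is usually stated along the following lines, written here in the notation of this subsection. This is a sketch under assumptions rather than a reproduction of the article's equation: in this form $\psi$ denotes only the nonlinear part that is added to the linear term $e_{\lambda_l}\alpha$, $\varepsilon_l$ is a residual symbol introduced for illustration, and $\mu$ is taken to be the tradeoff parameter that balances the error term.

```latex
\min_{\alpha,\,\psi}\ \frac{1}{2}\left(\|\alpha\|^{2} + \|\psi\|_{\mathcal{H}}^{2}\right)
  + \frac{1}{2\mu}\sum_{l=1}^{L}\varepsilon_l^{2}
\quad \text{s.t.} \quad
\varepsilon_l = y_l - \left(e_{\lambda_l}\alpha + \psi(e_{\lambda_l})\right),\quad
\alpha \succeq 0,\quad \mathbf{1}^{T}\alpha = 1
```

The two inequality and equality constraints are the ANC and ASC, and the quadratic objective is what makes a dual formulation with one multiplier per band convenient.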

III. PROPOSED METHOD
A detailed description of the proposed KDM-TVS is given in this section. We first present a novel kernel-based decomposition model to separate the hyperspectral imagery into three components: anomaly, background, and noise. To separate these components effectively, the TV and sparsity regularizations are incorporated to characterize the spatial properties of the background and the anomaly, respectively. The nonlinear interactions among the endmembers can be represented by the endmember-kernel theory. By means of the alternating direction method of multipliers (ADMM) [39], the abundances associated with the union dictionary can be obtained. Then, the anomalous level of a tested pixel can be calculated by the abundance of the latent anomaly dictionary. In order to obtain a union dictionary precisely, a novel construction framework of union dictionary, which consists of the local clustering method and the global anomaly information, is presented. The graphical description of the proposed KDM-TVS is shown in Fig. 1.

A. KDM-TVS for Anomaly Detection
Let $A \in \mathbb{R}^{L \times R}$ and $B \in \mathbb{R}^{L \times P}$ denote the anomaly dictionary and the background dictionary, respectively. The union dictionary is then obtained by concatenating $A$ and $B$, so that $P + R$ is the number of atoms in the union dictionary. Based on (7), the nonlinear mixing model can be formulated with the union dictionary and the abundance matrix, where $Z \in \mathbb{R}^{R \times M}$ and $X \in \mathbb{R}^{P \times M}$ are the abundance coefficient matrices associated with the anomaly dictionary $A$ and the background dictionary $B$, respectively, and $M$ is the number of pixels. The union abundance coefficient matrix is formed by stacking $Z$ and $X$ accordingly. The natural background is generally spatially smooth, that is, the abundances of adjacent pixels are similar. Thus, the TV regularization [40], [41], which makes the abundances of adjacent pixels close [42], [43], [44], is incorporated to characterize the background component in the anomaly detection of HSIs. Besides, the anomalies in HSIs show spatial sparsity, that is, the abundance matrix $Z$ should be column sparse. Therefore, the $l_{2,1}$-norm [45], which is defined as the sum of the $l_2$-norms of the columns of the abundance matrix, can be used to characterize the anomaly component.
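The two regularizers can be made concrete with a small sketch. It assumes that pixels are stored in row-major order along the columns of the abundance matrices and uses the anisotropic form of TV (absolute differences between horizontal and vertical neighbors); the article does not state which TV variant is used, so that choice and the function names are assumptions.

```python
import numpy as np

def l21_norm(Z):
    """Sum of the l2-norms of the columns of Z (one column per pixel)."""
    return float(np.linalg.norm(Z, axis=0).sum())

def anisotropic_tv(X, height, width):
    """Sum of absolute differences between horizontally and vertically adjacent
    pixels, accumulated over every abundance map (one map per row of X)."""
    maps = X.reshape(X.shape[0], height, width)
    horizontal = np.abs(np.diff(maps, axis=2)).sum()
    vertical = np.abs(np.diff(maps, axis=1)).sum()
    return float(horizontal + vertical)

Z = np.array([[3.0, 0.0],
              [4.0, 0.0]])                 # only the first pixel is anomalous
X = np.array([[0.0, 1.0, 0.0, 1.0]])       # one abundance map on a 2 x 2 grid
sparsity_value = l21_norm(Z)
tv_value = anisotropic_tv(X, 2, 2)
```

A column-sparse Z keeps the first quantity small, and a spatially smooth X keeps the second one small, which is how the two terms separate anomalies from background.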
We then extend (10) to the whole HSI. To make the objective function separable, we introduce two auxiliary variables, $Z_1$ and $X_1$. In this way, the proposed KDM-TVS can be formulated as the constrained optimization problem (12). Problem (12) can be transformed into the corresponding Lagrange function (13), where $\beta_{il}$, $\gamma_{ip}$, and $\lambda_i$ are the Lagrange multipliers, and $\mathbf{1}_{P+R}$ is a $(P + R)$-dimensional column vector of ones. $D_1$ is the auxiliary variable that denotes the difference between the original variable $X$ and the corresponding auxiliary variable $X_1$. Similarly, $D_2$ represents the difference between $Z$ and $Z_1$. Each variable in (13) can be updated iteratively by means of ADMM [39]. The subproblems of the Lagrange function are presented as follows.
1) The abundance coefficient vector $\alpha_i$ associated with the tested pixel $y_i$ is updated by solving subproblem (14). By taking the derivative with respect to the primal variables in (14), the optimality conditions (15) are obtained [29]. The dual problem (16) can then be obtained by substituting (15) into (14). In (16), $K$ denotes the Gram matrix defined as $K_{lj} = k(e_{\lambda_l}, e_{\lambda_j})$.
To handle the nonlinear interactions between endmembers, we introduce the second-order polynomial kernel and the Gaussian kernel, which correspond to the descriptions of nonlinear interactions in PPNM and MLM, respectively. The conjugate gradient algorithm [46] is utilized to solve the convex quadratic programming problem (16). Once the optimal dual variables $\beta_i^*$, $\gamma_i^*$, and $\lambda_i^*$ are obtained, the iterate $\alpha^{(t+1)}$ can be computed according to (15).
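As a rough illustration of the Gram matrix K built over the rows of the dictionary, the sketch below uses textbook forms of the two kernels: (1 + <u, v>)^2 for the second-order polynomial kernel and exp(-||u - v||^2 / (2 sigma^2)) for the Gaussian kernel. The exact scaling used in the article is not reproduced here, and the function names and the bandwidth value are assumptions.

```python
import numpy as np

def gram_polynomial(E):
    """K[l, j] = (1 + <e_l, e_j>)^2, where e_l is the l-th row of E."""
    return (1.0 + E @ E.T) ** 2

def gram_gaussian(E, sigma):
    """K[l, j] = exp(-||e_l - e_j||^2 / (2 sigma^2)) over the rows of E."""
    sq = np.sum(E ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (E @ E.T)
    d2 = np.maximum(d2, 0.0)               # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
E = rng.uniform(0.0, 1.0, size=(6, 3))     # 6 bands, 3 dictionary atoms (toy values)
K_poly = gram_polynomial(E)
K_gauss = gram_gaussian(E, sigma=1.0)
```

Both matrices are L × L, symmetric, and positive semidefinite, which is what keeps the dual problem a convex quadratic program.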
2) The variable $Z_1$ can be updated by solving an $l_{2,1}$-regularized subproblem, whose solution is given by the solving operator for the $l_{2,1}$-minimization problem [45].
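The solving operator for an l2,1-regularized least-squares problem is commonly implemented as column-wise shrinkage. The sketch below assumes the generic problem min_Z tau * ||Z||_{2,1} + 0.5 * ||Z - Q||_F^2; the threshold `tau` and the input `Q` stand in for the quantities of the article's subproblem, which are not reproduced here.

```python
import numpy as np

def l21_shrink(Q, tau):
    """Column-wise shrinkage: each column q becomes max(0, 1 - tau / ||q||_2) * q."""
    norms = np.linalg.norm(Q, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))  # avoid division by zero
    return Q * scale

Q = np.array([[3.0, 0.1],
              [4.0, 0.1]])
Z1 = l21_shrink(Q, tau=1.0)   # first column is shortened, second is set to zero
```

Columns whose norm falls below the threshold vanish entirely, which is how the operator produces the column sparsity expected of the anomaly abundances.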
3) The subproblem for $X_1$ is given in (21), and we use the fast gradient-based algorithm introduced in [47] to solve it. Then, before proceeding to the next iteration, the Lagrange multipliers $D_1$, $D_2$ and the parameters $\rho$, $\eta_1$, and $\eta_2$ are updated. Finally, the detection value of the tested pixel $y_i$ is calculated from the $i$th column of the optimal anomaly abundance matrix $Z^*$. The proposed KDM-TVS is summarized in Algorithm 1.
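One plausible way to turn the optimal anomaly abundances into a detection map is to score each pixel by the l2-norm of its column of Z*. The use of the l2-norm and the row-major pixel ordering are assumptions made for this sketch, chosen to match the column-wise sparsity measure used for the anomaly component.

```python
import numpy as np

def detection_map(Z_star, height, width):
    """Score pixel i by the l2-norm of column i of Z_star, then reshape to the image grid."""
    scores = np.linalg.norm(Z_star, axis=0)
    return scores.reshape(height, width)

Z_star = np.zeros((3, 6))          # 3 anomaly atoms, 6 pixels on a 2 x 3 grid
Z_star[:, 4] = [0.0, 3.0, 4.0]     # only pixel 4 uses the anomaly dictionary
dmap = detection_map(Z_star, 2, 3)
```

Pixels that need no anomaly atoms receive a score of zero, so thresholding the map yields the detected anomalies.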
The computational complexity of Algorithm 1 mainly depends on the update of the abundance $\alpha$, that is, on evaluating (15) and (16) in each iteration. Here, $M$ and $L$ are the number of pixels and spectral bands, respectively, and $P$ and $R$ denote the number of atoms in the background dictionary and the anomaly dictionary, respectively. Therefore, Algorithm 1 has a computational complexity of $O((L + P + R)^2 M)$.

B. Construction of Union Dictionary
Let the 8 × 8 pixel sketch map shown in Fig. 2(a) represent a hyperspectral image. Because sparse backgrounds and anomalies occupy similar proportions of the image, the global clustering-based methods cannot distinguish them effectively. Considering the fact that the spatial distributions of the sparse backgrounds and anomalies are very different, we present a novel construction framework of union dictionary. First, we utilize a superpixel segmentation method to divide the hyperspectral imagery into several superpixel blocks, as shown in Fig. 2(b). Because sparse backgrounds are spatially compact, their proportion within a superpixel block increases significantly compared with their proportion in the original hyperspectral imagery, whereas the proportion of anomalies does not change. Then, the separation of the sparse backgrounds and anomalies can be achieved by a local clustering method in each superpixel block. Simple linear iterative clustering (SLIC) [48] is a superpixel segmentation method that integrates spatial distance and spectral distance and thereby obtains segmentation results that preserve boundaries well. Therefore, it is adopted to execute the superpixel segmentation for the union dictionary construction in this article. Besides, to avoid presetting the number of clusters, the density peak-based clustering (DPBC) method [49], which is based only on the similarity between data points, is utilized to execute the local clustering operation.
To construct a representative latent anomaly dictionary A, which can be used to represent other anomalies in HSIs [17], [18], [31], traditional global anomaly detectors are used to pick out strong anomalies that are significantly different from their surroundings. Considering complexity and efficiency, this article utilizes the R pixels with the highest RX [7] detection values to construct the anomaly dictionary. Finally, superior detection results can be obtained by combining the union dictionary with the proposed method.
The proposed construction framework of union dictionary is summarized in Algorithm 2.

Algorithm 2 Procedure for the Construction of Union Dictionary
Input: Data Y ∈ R L×M , the number of superpixel blocks N s , the number of atoms in the latent anomaly dictionary R.
Background dictionary B:
1) Divide Y into N s superpixel blocks by the superpixel segmentation method.
2) By means of the local clustering method, constitute the background dictionary B from the representative pixels in each superpixel block.
Anomaly dictionary A:
Apply the global anomaly detector RX.
Output: Anomaly dictionary A and background dictionary B.
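The flow of Algorithm 2 can be sketched end to end with simplified stand-ins. In the sketch below, a regular grid replaces SLIC, a reduced density-peak rule (neighbor count times distance to the nearest denser point) replaces the full DPBC of [49], and the anomaly dictionary takes the R highest global RX scores. All function names, the cutoff distance `d_c`, the selection threshold, and the toy scene are assumptions for illustration.

```python
import numpy as np

def rx_scores(Y):
    """Global RX score (squared Mahalanobis distance) for each column of Y."""
    c = Y - Y.mean(axis=1, keepdims=True)
    cov_inv = np.linalg.pinv(c @ c.T / Y.shape[1])
    return np.einsum("lm,lk,km->m", c, cov_inv, c)

def grid_blocks(height, width, n_rows, n_cols):
    """Stand-in for SLIC: assign each pixel to a cell of a regular grid."""
    r = np.arange(height) * n_rows // height
    c = np.arange(width) * n_cols // width
    return (r[:, None] * n_cols + c[None, :]).ravel()

def density_peaks(block, d_c):
    """Simplified density-peak selection over the columns of `block`.

    rho   = number of neighbours closer than d_c
    delta = distance to the nearest point ranked denser (ties broken by index)
    Points whose rho * delta stands out are returned as representatives.
    """
    D = np.linalg.norm(block[:, :, None] - block[:, None, :], axis=0)
    rho = (D < d_c).sum(axis=1) - 1               # exclude the point itself
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(rho))
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, len(order)):
        delta[order[rank]] = D[order[rank], order[:rank]].min()
    gamma = rho * delta
    picked = set(np.flatnonzero(gamma > gamma.mean() + 2.0 * gamma.std()))
    picked.add(int(np.argmax(gamma)))             # always keep the strongest peak
    return sorted(int(i) for i in picked)

def build_union_dictionary(Y, height, width, n_rows, n_cols, R, d_c):
    labels = grid_blocks(height, width, n_rows, n_cols)
    atoms = []
    for b in np.unique(labels):
        block = Y[:, labels == b]
        atoms.extend(block[:, i] for i in density_peaks(block, d_c))
    B = np.stack(atoms, axis=1)                   # background dictionary
    A = Y[:, np.argsort(-rx_scores(Y))[:R]]       # R strongest RX responses
    return A, B

rng = np.random.default_rng(0)
H = W = 8
L = 4
img = np.full((H, W, L), 0.2)
img[:, W // 2:, :] = 0.8                          # two background materials
img += rng.normal(scale=0.01, size=img.shape)
img[1, 2] = [1.5, 0.0, 1.5, 0.0]                  # one implanted anomaly
Y = img.reshape(H * W, L).T                       # (L, M), row-major pixel order
A, B = build_union_dictionary(Y, H, W, 2, 2, R=1, d_c=0.2)
```

In this toy scene the isolated anomaly has no neighbors within `d_c`, so its density is zero and it never becomes a background atom, while the RX step places it in A.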

IV. EXPERIMENTAL RESULTS
In this section, a synthetic hyperspectral dataset is utilized to analyze the specific procedures and the performance of KDM-TVS comprehensively. Then, three real datasets are used to verify the effectiveness of our proposed KDM-TVS in the actual applications.
The receiver operating characteristic (ROC) curve [50] and the area under the ROC curve (AUC) [51] can intuitively evaluate the detection results of an anomaly detector. Therefore, the ROC curve and the AUC are utilized to compare KDM-TVS quantitatively with several other anomaly detection methods. All the experiments are implemented in MATLAB R2018b on a computer with an Intel Core i9-11900K CPU at 3.5 GHz and 32-GB RAM.
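For reference, the AUC of a detection map can be computed without tracing the ROC curve explicitly, using the rank-sum (Mann-Whitney) identity. The sketch below is a generic implementation, not the evaluation code used in the experiments; the function name `roc_auc` and the toy scores are illustrative.

```python
import numpy as np

def roc_auc(scores, truth):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.

    scores: detection values, higher means more anomalous.
    truth:  boolean ground-truth mask of the same length.
    Tied scores receive their average rank.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(scores, kind="stable")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    start = 0
    while start < scores.size:
        stop = start
        while stop + 1 < scores.size and sorted_scores[stop + 1] == sorted_scores[start]:
            stop += 1
        ranks[order[start:stop + 1]] = 0.5 * (start + stop) + 1.0   # average 1-based rank
        start = stop + 1
    n_pos = int(truth.sum())
    n_neg = truth.size - n_pos
    return (ranks[truth].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

auc = roc_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])
```

The value equals the probability that a randomly chosen anomalous pixel scores higher than a randomly chosen background pixel, with ties counted as one half.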

A. Experiments on Synthetic Dataset
The synthetic hyperspectral dataset is produced based on the San Diego dataset, which was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). After removing the water vapor absorption and low signal-to-noise ratio (SNR) bands (i.e., 1-6, 33-35, 94-97, 107-113, 153-166, and 221-224) from the original 224 spectral bands, 186 bands are retained in the following experiments. The original whole scene with 400 × 400 pixels is displayed in Fig. 3(a), and the 100 × 100 pixel subimage in the red square is cropped to generate the synthetic dataset. The anomaly pixel t is selected from an airplane in the original scene, and its signature is shown in Fig. 3(b). The anomalous pixels are generated by the target implantation method [52]. Specifically, a synthetic anomalous pixel z is generated by fractionally embedding the selected anomaly pixel t into a given background pixel based on PPNM, where f denotes the abundance, taken from {0.05, 0.1, 0.2, 0.4, 0.5}. Each abundance value generates three anomaly blocks with sizes of 1 × 1, 1 × 2, and 2 × 2, and the positions of these blocks are completely random, reflecting the distribution of anomalies in real hyperspectral imagery. ξ denotes the tradeoff between linear and nonlinear components in PPNM and is randomly generated in the range of 0-1 for each anomaly. Fig. 3(c) and (d) shows the pseudocolor image and the corresponding ground-truth map of the synthetic hyperspectral dataset, respectively. It is worth pointing out that the synthetic dataset is challenging for anomaly detection in two respects: 1) some sparse backgrounds in the scene may cause a higher false alarm rate in the detection result and 2) all the mixing coefficients f are relatively low in the synthetic anomalous pixels, which makes the weak anomalies difficult to detect.
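The implantation step can be sketched as follows, assuming the PPNM form in which the linear mixture of target and background is augmented by ξ times its element-wise square. The article's exact implantation equation is not reproduced here, so this form, the function name `implant_ppnm`, and the random toy spectra are assumptions.

```python
import numpy as np

def implant_ppnm(t, b, f, xi):
    """Mix target t into background b with abundance f, then add the
    xi-weighted second-order term of the mixture (assumed PPNM form)."""
    mix = f * t + (1.0 - f) * b
    return mix + xi * mix * mix

rng = np.random.default_rng(0)
t = rng.uniform(0.2, 0.9, size=186)      # toy target spectrum, 186 bands
b = rng.uniform(0.05, 0.5, size=186)     # toy background spectrum
fractions = [0.05, 0.1, 0.2, 0.4, 0.5]
# One xi per anomaly, drawn uniformly from [0, 1)
anomalies = [implant_ppnm(t, b, f, rng.uniform(0.0, 1.0)) for f in fractions]
```

With f as low as 0.05, the implanted pixel stays close to the background spectrum, which is why these weak anomalies are hard to detect.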
Next, we utilize this synthetic dataset to analyze the proposed KDM-TVS comprehensively, including the construction framework of union dictionary and the parameter setting. The detection result of the proposed KDM-TVS is then compared with those of other anomaly detectors, and different levels of Gaussian noise are added to the synthetic dataset to obtain a reliable comparison.
1) Dictionary Construction: According to the analysis in Section III-B, the SLIC [48] and the DPBC [49] are adopted to execute the superpixel segmentation and the local clustering in this article, respectively. The number of superpixel blocks N s is set to 9 empirically because the amount of dominant background is in single digits generally. Then, the DPBC is utilized to process each superpixel block, and the final segmentation map is shown in Fig. 4(a). It can be observed that the proposed construction framework of union dictionary avoids the interference of anomalies, and the sparse backgrounds are included in the background dictionary simultaneously, also being reflected in the follow-up experiments of real datasets. Correspondingly, the segmentation map of the global clustering method with a small cluster number is shown in Fig. 4(b). It can be seen that the sparse background at the left bottom is classified as a road. With the increase in the cluster number, the global clustering method is able to separate the sparse background, but many anomalies are also classified separately because of the similar proportion of sparse backgrounds and anomalies, which is shown in Fig. 4(c).
To sum up, the proposed construction framework of union dictionary with the local clustering method can pick up the sparse backgrounds in the background dictionary without the interference of anomalies, which overcomes the imperfection of global clustering methods.
As for the anomaly dictionary, the predetector is only used to pick out several high-purity anomalies. With this latent anomaly dictionary of strong anomalies, the proposed KDM-TVS is able to represent other weak anomalies in HSIs. Here, we compare the performances of RX, LRX, CRD, and KRX in the process of anomaly dictionary construction. The detection maps based on different union dictionaries are shown in Fig. 5, and the ROC curves are shown in Fig. 6. The AUC values of these detectors and the AUC values of KDM-TVS(G) based on these detectors are displayed in Table II. It can be seen that, although the AUC values of these detectors are limited by their own shortcomings, the detection results of the proposed KDM-TVS(G) based on these detectors are satisfactory, which indicates that many efficient anomaly detectors can be applied to pick out strong anomalies. Furthermore, KRX has a kernel parameter σ that needs to be adjusted, and the sizes of the dual windows in LRX and CRD also need to be preset. It can also be observed that building an RX-based union dictionary takes less time than building one with the other detectors. Therefore, to improve the stability and efficiency of the predetection, we use the simple and efficient RX to generate the latent anomaly dictionary.
Fig. 7 shows the variation of the AUC values with R. The strongest anomalies can be picked out by traditional anomaly detectors. These strongest anomalies can then represent other anomalies in the whole HSI, and thus, the proposed KDM-TVS obtains a satisfactory detection performance with only a few anomaly atoms. To guarantee the representation capability of the anomaly dictionary used by the proposed Algorithm 1, R should not be too small; otherwise, a complete anomaly dictionary cannot be constructed. However, the latent anomaly dictionary probably contains some sparse backgrounds when R increases, resulting in the deterioration of the anomaly detection results.
Therefore, R should be neither too small nor too large. It can be observed from Fig. 7 that the proposed KDM-TVS obtains a satisfactory detection result over a relatively wide range of R.
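The RX-based predetection discussed above can be illustrated with a minimal sketch. It assumes that R is the number of highest-scoring pixels kept as anomaly atoms; Algorithm 1 is not reproduced in this section, so this reading, the function names `rx_scores` and `pick_anomaly_atoms`, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rx_scores(cube):
    """Global RX: squared Mahalanobis distance of each pixel to the scene mean."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    scores = np.einsum("ij,jk,ik->i", Xc, cov_inv, Xc)
    return scores.reshape(h, w)

def pick_anomaly_atoms(cube, R=7):
    """Keep the R pixels with the largest RX scores as latent anomaly atoms (assumed reading of R)."""
    scores = rx_scores(cube)
    flat_idx = np.argsort(scores.ravel())[::-1][:R]
    atoms = cube.reshape(-1, cube.shape[2])[flat_idx].T  # shape: bands x R
    return atoms, flat_idx

# Toy scene: Gaussian background with two implanted strong anomalies.
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 10))
cube[5, 5] += 8.0
cube[12, 3] += 8.0
atoms, idx = pick_anomaly_atoms(cube, R=7)
```

With a small R, the dictionary holds mostly the implanted pixels; as R grows, ordinary background pixels with moderately high RX scores start to enter it, which mirrors the tradeoff described for Fig. 7.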
To avoid the effect of sparse backgrounds while covering as many strong anomalous pixels as possible, we set R = 7 in the follow-up experiments.

2) Parameter Setting:
We evaluate the influence of the parameters of the proposed KDM-TVS (Gaussian kernel) on the detection result. η1 and η2 are tradeoff parameters used to balance the regularization terms; they do not need to be adjusted because they get smaller as the iteration proceeds.
Next, we design a parametric experiment on the tradeoff parameter μ, which is used to balance the error term. The candidate values of μ are {0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1}, and the corresponding AUC values are listed in Table III. The results reveal that the AUC value changes only mildly when μ is set within a relatively wide range. Therefore, we empirically fix μ = 0.2 in the follow-up experiments for the sake of simplicity.

3) Ablation on Regularization Terms:
We investigate the contributions of the incorporated regularizations, i.e., the sparsity (l2,1-norm) term and the TV term. The proposed KDM-TVS without only the sparsity regularization term (η1 = 5 and η2 = 0) and without any regularization term (η1 = 0 and η2 = 0) is implemented on the synthetic dataset, respectively. The detection maps are displayed in Fig. 8(a) and (b), respectively. It can be seen that the TV term makes the detection result distinct owing to its superior performance at keeping the background smooth. Another observation is that a number of false alarms appear in the detection result once the sparsity term is removed. The reason is that the l2,1-norm restrains the number of numerically large points in the anomalous abundance map. The complete anomaly detection result with both regularizations incorporated is shown in Fig. 5.
TABLE IV: AUC VALUES AND RUNNING TIMES OF ALL THE DETECTORS ON THE SYNTHETIC DATASET
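To make the roles of the two regularizers more concrete, the following sketch evaluates an anisotropic TV value of a single abundance map and the l2,1-norm of an abundance matrix. The anisotropic form of TV and the column-per-pixel convention for the l2,1-norm are assumptions made for illustration; the exact definitions used by KDM-TVS are given with the model and are not restated in this section.

```python
import numpy as np

def anisotropic_tv(abundance_map):
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    dh = np.abs(np.diff(abundance_map, axis=1)).sum()
    dv = np.abs(np.diff(abundance_map, axis=0)).sum()
    return dh + dv

def l21_norm(A):
    """Sum of the l2 norms of the columns of A (one column per pixel, by assumption)."""
    return np.linalg.norm(A, axis=0).sum()

# A flat background map has zero TV; a single spike is penalized.
smooth = np.ones((4, 4))
spiky = smooth.copy()
spiky[1, 2] = 5.0

# Only the first pixel (column) carries anomalous abundance here.
A = np.array([[3.0, 0.0],
              [4.0, 0.0]])
```

A small TV value favors piecewise-smooth background abundances, while a small l2,1-norm favors abundance matrices in which only a few pixels (columns) are nonzero, which matches the behavior reported for Fig. 8.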

4) Comparison of Detection Results:
Here, we compare the detection performance of the proposed KDM-TVS with that of other anomaly detectors on the synthetic dataset. The competitors include the linear detectors (LRX [8], CRD [12], TVSDM [18], and LSC-TV [19]) and the nonlinear detectors (KRX [26], KCRD [12], KNUD [30], and KIFD [27]). LRX and KRX are two improved versions of RX, which are recognized as benchmarks among the statistical approaches. CRD and its kernel version KCRD are based on the representation in the neighborhood, which accounts for the relationship between adjacent pixels. TVSDM incorporates the TV and sparsity regularizations into the decomposition model based on LMM. LSC-TV also makes the background abundances close in each superpixel. KIFD applies an isolation forest algorithm in the high-dimensional space to detect anomalies. KNUD utilizes the endmember-kernel theory with a dual window to implement anomaly detection, which takes the nonlinear mixing between endmembers into account.
We set the involved parameters according to the values recommended in the literature. Specifically, the dual window sizes (w_in, w_out) in LRX [8], CRD [12], KCRD [12], and KNUD [30] are all set to (3, 9). The tradeoff parameter λ in CRD and KCRD is set to 10^−6 as recommended in [12]. For KNUD, the number of anomaly atoms and the proportion of the selected pixels in the window are set to 15 and 95% according to [30], respectively. Similarly, for TVSDM [18], the number of anomaly atoms and background atoms in each class are set to 20 and 20, respectively. Besides, we set λ = 10^−3, β = 5 × 10^−3, P = 20, and S = 15 for LSC-TV as recommended in [19]. The number of trees and the subsampling size of the tree in KIFD are set to 1000 and 3% according to [27], which are fixed in the follow-up experiments.
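For readers unfamiliar with the dual-window setting (w_in, w_out), the sketch below collects the pixels that lie inside the outer window but outside the inner window around a tested pixel. The function name `dual_window_pixels` and the clipping at image borders are illustrative assumptions; individual detectors may treat borders differently.

```python
import numpy as np

def dual_window_pixels(cube, row, col, w_in=3, w_out=9):
    """Return the spectra between the inner and outer windows centered at (row, col)."""
    h, w, _ = cube.shape
    r_out, r_in = w_out // 2, w_in // 2
    # Clip the outer window at the image borders (an assumed border policy).
    r0, r1 = max(row - r_out, 0), min(row + r_out + 1, h)
    c0, c1 = max(col - r_out, 0), min(col + r_out + 1, w)
    rows, cols = np.meshgrid(np.arange(r0, r1), np.arange(c0, c1), indexing="ij")
    # Keep pixels that fall outside the inner (guard) window.
    outside_inner = (np.abs(rows - row) > r_in) | (np.abs(cols - col) > r_in)
    return cube[rows[outside_inner], cols[outside_inner]]  # shape: (n_pixels, bands)

cube = np.zeros((20, 20, 5))
interior = dual_window_pixels(cube, 10, 10, w_in=3, w_out=9)
corner = dual_window_pixels(cube, 0, 0, w_in=3, w_out=9)
```

For an interior pixel, the (3, 9) setting yields 9 × 9 − 3 × 3 = 72 neighboring spectra, which is the local sample set such detectors use to model the background.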
The detection maps of the different detectors on the synthetic dataset are displayed in Fig. 9. It can be observed that the CRD and LRX cannot identify the anomalies; that is, these two linear detectors perform poorly when the anomalies are weak. Besides, a large number of false alarms are included in the detection results of KRX, KCRD, and KIFD. This is likely due to the inappropriate nonlinear mapping, which ignores the inherent nonlinear interactions among endmembers. Both the KNUD and the proposed KDM-TVS consider the nonlinear interaction between dictionary atoms, but the detection results of KNUD are slightly dim. In contrast, the detection results of the proposed KDM-TVS are distinct, which can be credited to the consideration of spatial relations through the TV and sparsity regularizations.
The ROC curves of each anomaly detector are displayed in Fig. 10 to evaluate the detection performance quantitatively. It is obvious that the detection rate of the proposed KDM-TVS is higher than that of other detectors at nearly all false alarm rates apart from the first 2%. Moreover, the proposed KDM-TVS can achieve a 100% detection rate when the false alarm rate is lower than 2%, which shows the significant improvement of the proposed KDM-TVS over other detectors. Besides, the AUC values of different detectors are shown in Table IV, which indicates the superiority of the proposed KDM-TVS quantitatively. It is worth noting that the detection result of KDM-TVS(G) is slightly worse than that of KDM-TVS(P) because the synthetic dataset is generated on the basis of the PPNM, which only considers the second-order interactions between atoms.
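As a reminder of how the ROC curve and AUC values are obtained from a detection map and a ground-truth mask, a minimal NumPy sketch follows. The function names are illustrative, and tied scores are not grouped, which is a simplification compared with library implementations.

```python
import numpy as np

def roc_curve(scores, labels):
    """Sweep a threshold over the sorted scores; return false alarm and detection rates."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    order = np.argsort(-scores, kind="stable")  # highest anomaly score first
    hits = labels[order]
    tp = np.cumsum(hits)    # detected anomalies at each threshold
    fp = np.cumsum(~hits)   # false alarms at each threshold
    pd = np.concatenate(([0.0], tp / labels.sum()))
    pfa = np.concatenate(([0.0], fp / (~labels).sum()))
    return pfa, pd

def auc(pfa, pd):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(pfa) * (pd[1:] + pd[:-1]) / 2.0))

# Toy example: four pixels, two of which are true anomalies.
pfa, pd = roc_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc(pfa, pd)
```

An AUC of 1 means every anomalous pixel scores above every background pixel, whereas 0.5 corresponds to a detector that ranks pixels at random.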
The running times of different detectors are also provided in Table IV. Compared with the nonlinear detectors, which bear the large computational cost of the kernel matrix, the linear detectors except LSC-TV are time-saving. Here, we analyze the time-consuming steps in each detector. The LSC-TV updates the guided matrix Z in each superpixel block. The KIFD generates 1000 binary trees to detect anomalies. The KCRD computes the kernel matrix in each local region, while the KRX computes it over the whole HSI, so the running time of KCRD is much shorter than that of KRX. Besides, the KNUD utilizes the linear detector to obtain the background dictionary in each of the local regions, which adds to its time cost. The proposed method generates and solves (16) in each iteration, which is time-consuming but still acceptable compared with other detectors.
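The gap between local and global kernel computation can be illustrated with a small Gaussian kernel sketch. The 2σ² scaling in the exponent, the function name, and the example sizes (a 72-pixel dual-window neighborhood versus a 100 × 100 scene) are assumptions chosen for illustration and are not taken from the compared detectors' code.

```python
import numpy as np

def gaussian_kernel_matrix(X, Z, sigma=1.0):
    """K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 sigma^2)) for row-wise spectra X and Z."""
    sq = (X**2).sum(axis=1)[:, None] + (Z**2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    sq = np.maximum(sq, 0.0)  # remove tiny negative values caused by round-off
    return np.exp(-sq / (2.0 * sigma**2))

# Local case: a kernel matrix over one neighborhood of 72 spectra.
rng = np.random.default_rng(0)
local_pixels = rng.normal(size=(72, 10))
K_local = gaussian_kernel_matrix(local_pixels, local_pixels, sigma=2.0)

# Global case: storage alone for a full kernel matrix over 100 x 100 pixels (float64).
n_global = 100 * 100
global_bytes = n_global**2 * 8
```

The local matrix has 72 × 72 entries, whereas a full matrix over all 10^4 pixels would need 10^8 entries, which gives a rough sense of why scene-wide kernel computation dominates the running time.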

5) Robustness to Different Noise Levels:
Here, the synthetic dataset is corrupted by different levels of Gaussian noise (30, 35, and 40 dB, respectively). The SNR can be formulated as
$$\mathrm{SNR} = 10 \log_{10} \frac{E[\mathbf{y}^{T}\mathbf{y}]}{E[\mathbf{n}^{T}\mathbf{n}]} \tag{26}$$
where E[·] computes the expectation of what it contains, and y and n represent the original pixel and the noise, respectively. The ROC curves are displayed in Fig. 11. It can be seen that the proposed KDM-TVS still outperforms other detectors at nearly all false alarm rates. Finally, we perform 20 replicates for each level of noise and then list the mean value and the standard deviation (std) of AUC values in Table V. It can be observed that KDM-TVS(P) and KDM-TVS(G) are more robust to the noise. In particular, KDM-TVS(G) is superior to KDM-TVS(P) in all noisy cases. A possible explanation is that the additive noise alters the original nonlinear mixing properties, while the Gaussian kernel is able to account for high-order nonlinear mixings, which makes KDM-TVS(G) more robust to noise than KDM-TVS(P).
In summary, the proposed KDM-TVS outperforms other linear and nonlinear anomaly detection methods according to the experimental results. Moreover, the proposed KDM-TVS shows superior performance in noisy situations, and its running time is in an acceptable range.
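The SNR definition in (26) can be turned into a short noise-injection sketch. It assumes the data are stored as a bands × pixels matrix and that the noise is zero-mean, i.i.d. Gaussian across bands and pixels; the function names are illustrative and the expectations are replaced by sample means.

```python
import numpy as np

def add_gaussian_noise(Y, snr_db, rng=None):
    """Add white Gaussian noise to Y (bands x pixels) so that (26) gives about snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.sum(Y**2, axis=0))          # sample estimate of E[y^T y]
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)  # target E[n^T n]
    sigma = np.sqrt(noise_power / Y.shape[0])             # per-band std for i.i.d. noise
    N = rng.normal(scale=sigma, size=Y.shape)
    return Y + N, N

def empirical_snr_db(Y, N):
    """Evaluate (26) with sample means in place of expectations."""
    return 10.0 * np.log10(np.mean(np.sum(Y**2, axis=0)) / np.mean(np.sum(N**2, axis=0)))

rng = np.random.default_rng(0)
Y = rng.uniform(0.1, 1.0, size=(50, 2000))
measured = {}
for snr in (30, 35, 40):
    _, N = add_gaussian_noise(Y, snr, rng)
    measured[snr] = empirical_snr_db(Y, N)
```

Each entry of `measured` should lie close to its target level, with small deviations caused by the finite number of noise samples.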

B. Experiments on Real Hyperspectral Dataset
In this section, the anomaly detection performance of the proposed KDM-TVS is evaluated on three widely used real hyperspectral datasets, which were collected by different sensors. The selected areas are urban and airport scenarios, and altitude differences between land covers are common in these real HSIs because of the various man-made structures. These real HSIs are influenced by the light scattering effect, and thus, the nonlinear interaction has to be taken into careful consideration during anomaly detection.
The first real hyperspectral dataset covers an urban residential area and was captured by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. The original hyperspectral imagery displayed in Fig. 12(a) contains 400 × 400 pixels with 174 spectral bands, and the spatial resolution is 1 m/pixel. A subimage (with 80 × 100 pixels) in the red square is adopted in our experiments, and the corresponding pseudocolor image is displayed in Fig. 12(b). The anomalies are roofs and vehicles in this dataset, and a portion of them are subanomalies owing to their small size.
The second dataset used in our experiments is a subimage (with 100 × 100 pixels) captured by AVIRIS, whose pseudocolor image is shown in Fig. 12(c). The anomalous targets to be detected are three airplanes with 72 pixels. It is worth pointing out that there are a number of subpixel anomalies in this dataset due to the mixing between the edge of planes and the background.
The third one was collected over Pavia by the reflective optics system imaging spectrometer (ROSIS) sensor. The original hyperspectral imagery has a size of 1096 × 1096 pixels with 102 spectral bands, and the spatial resolution is 1.3 m/pixel. A subimage with 100 × 100 pixels is adopted in the experiments, whose pseudocolor image is displayed in Fig. 12(d). The anomalies are the vehicles on the bridge, and high-order scattering effects exist in this hyperspectral imagery owing to the prominent altitude differences between the river, bridge, and vehicles.
For the HYDICE dataset, the segmentation map of the proposed construction framework of union dictionary is shown in Fig. 13(a). It can be seen that sparse backgrounds (green regions in the bottom right area) are also included in the background dictionary. The dual window sizes (w_in, w_out) in LRX [8], CRD [12], KCRD [12], and KNUD [30] are set to (3, 7), (3, 9), (3, 9), and (3, 5), respectively. The optimal setting for the parameters in KNUD is R = 15 and η = 0.95. The optimal parameters in TVSDM [18] are P = 20 and r = 20 as recommended in the literature, which are fixed in the follow-up experiments. The detection maps of all detectors on the HYDICE dataset are displayed in Fig. 14. It can be observed that the CRD and LRX cannot identify the anomalies. In contrast, the KRX, KCRD, KIFD, and KNUD can identify most of the anomalies but with a large number of false alarms (as shown in the red circle). The proposed KDM-TVS can not only effectively identify all of the anomalies but also restrict the response of the sparse background. In particular, the detection performance of KDM-TVS(P) is slightly worse than that of KDM-TVS(G) because high-order scattering effects widely exist in this dataset on account of the altitude differences of ground, plants, and buildings. The AUC values and ROC curves are presented in Table VI and Fig. 17(a), respectively. It is obvious that the detection rates of the proposed KDM-TVS(G) are the highest at almost all of the false alarm rates apart from the first 4%. The AUC values also indicate the superiority of the proposed KDM-TVS.
For the real AVIRIS dataset, the segmentation map of the proposed construction framework of union dictionary is shown in Fig. 13(b). It is obvious that the sparse backgrounds (orange regions in the bottom left area and deep blue regions in the upper left area) are also included in the background dictionary. The dual window sizes (w_in, w_out) in LRX [8], CRD [12], KCRD [12], and KNUD [30] are set to (11, 13), (11, 13), (11, 13), and (9, 15), respectively. The detection maps of all methods on the AVIRIS dataset are shown in Fig. 15. It can be observed that the CRD and LRX are still unable to identify the anomalies. It is worth noting that the detection result of TVSDM is significantly affected by the high intensity of nonlinear interactions in this dataset. As shown in the red circle, the anomalous values of structures even exceed those of the true anomalies in the detection maps of these nonlinear detectors. However, these false alarm areas are suppressed to a certain extent in the detection map of the proposed KDM-TVS. The AUC values and ROC curves are presented in Table VI and Fig. 17(b), respectively. It can be observed that the detection rates of the proposed KDM-TVS(P) are the highest at almost all of the false alarm rates, and the AUC values also validate the effectiveness of the proposed KDM-TVS.
For the real Pavia dataset, the segmentation map of the proposed construction framework of union dictionary is shown in Fig. 13(c). After extensive tuning, the dual window sizes (w_in, w_out) in LRX [8], CRD [12], KCRD [12], and KNUD [30] are set to (9, 11), (7, 13), (7, 9), and (3, 9), respectively. The detection maps of different detectors on the Pavia dataset are displayed in Fig. 16. Similarly, the differences between anomalies and backgrounds in the detection results of the nonlinear detectors are more obvious than in those of the linear detectors. The high-order scattering effects exist in this dataset due to the evident altitude differences between rivers, bridges, and vehicles, so the detection performance of KDM-TVS(G) is better than that of KDM-TVS(P). The ROC curves displayed in Fig. 17(c) show that

C. Summary
The superiority of the proposed KDM-TVS has been fully validated by the experimental results on both the synthetic and real hyperspectral datasets. Three main advantages of the proposed method can be briefly summarized as follows.

1) Effectiveness:
By means of the proposed decomposition model with endmember-kernel theory, each tested pixel can be appropriately represented by the nonlinear characteristics between union dictionary atoms based on the NMMs (i.e., PPNM and MLM). This strategy conforms better to the physical process of the light scattering effect in HSIs. In order to achieve better separation performance, the sparsity and TV regularization terms are incorporated into the decomposition model to characterize the spatial properties of anomaly and background, respectively. Besides, a novel construction framework of union dictionary, which combines the superpixel segmentation and clustering methods sequentially, is designed to distinguish sparse backgrounds and anomalies effectively. The experimental results demonstrate that the proposed KDM-TVS can effectively separate the anomalies from the background (even the sparse background) and can achieve a 100% detection rate with a very low false alarm rate.

2) Robustness to the Noise:
In order to evaluate the robustness of the proposed KDM-TVS in noisy cases, the synthetic dataset is corrupted by three levels of Gaussian noise. The incorporated TV regularization term is especially effective for suppressing noise pollution, and the experimental results on the synthetic dataset show that the proposed KDM-TVS remains satisfactory even when the SNR is very low.

3) Convenience in Parameters Setting:
The experimental results on the change in parameters demonstrate that the detection performance of the proposed KDM-TVS is insensitive to the choices of algorithm parameter μ and dictionary parameter R in a relatively large range. Moreover, μ and R are fixed throughout the experiments on all of the datasets and still yield satisfactory detection results.

V. CONCLUSION
In this article, we have proposed a novel endmember-kernel-based nonlinear anomaly detection method (named KDM-TVS). Compared with the traditional nonlinear methods that map the hyperspectral data into a high-dimensional space, the proposed KDM-TVS considers the inherent nonlinear characteristics between dictionary atoms, which is a more physically interpretable strategy and is achieved implicitly by the endmember-kernel theory based on the NMMs. By means of the proposed decomposition model, hyperspectral imagery can be decomposed into three components: anomaly, background, and noise. Moreover, in order to obtain the optimal separation result, the sparsity and the TV regularizations are incorporated into the decomposition model to characterize the spatial properties of anomaly and background, respectively. Besides, we have presented a novel construction framework of union dictionary, which combines the superpixel segmentation method and clustering method sequentially to represent the background and anomaly. This framework includes the sparse background in the background dictionary and eliminates the influence of anomalies when generating the background dictionary. Compared with other state-of-the-art detectors, the effectiveness and robustness of the proposed KDM-TVS have been verified adequately by the experimental results on both synthetic and real hyperspectral datasets.
It can be concluded from the experiments that the choice of kernel leads to differences in the detection results on account of the different mixing modes in different datasets. Therefore, in order to meet the demands of the anomaly detection problem across various datasets, multiple kernel learning is a relevant strategy worth researching. Besides, the accuracy of the constructed dictionary directly affects the accuracy of the subsequent abundance solution. To make dictionaries more adaptive, the study of iterative strategies combining dictionary learning with coefficient learning will be our future work.