Hyperspectral Anomaly Detection via Sparse Representation and Collaborative Representation

Sparse representation (SR)-based approaches and collaborative representation (CR)-based methods have proved effective for detecting anomalies in a hyperspectral image (HSI). Nevertheless, existing methods for hyperspectral anomaly detection (HAD) generally consider only one of them, failing to exploit both comprehensively to further improve the detection performance. To address this issue, a novel HAD method that integrates both SR and CR is proposed in this article. To be specific, an SR model, whose overcomplete dictionary is generated by means of a density-based clustering algorithm and a superpixel segmentation method, is first constructed for each pixel in an HSI. Then, for each pixel, the atoms used in the SR model are sifted to form the background dictionary of the corresponding CR model. To fully exploit both SR and CR information, we further combine the residual features obtained from the SR and CR models through a nonlinear transformation function to generate the response map. Finally, to preserve the contour information of the objects, a postprocessing operation with a guided filter is applied to the response map to obtain the detection result. Experiments conducted on simulated and real datasets demonstrate that the proposed SRCR outperforms state-of-the-art methods.


I. INTRODUCTION
RECENTLY, remote sensing image interpretation methods covering panchromatic images [1], [2], multispectral images [3], [4], and hyperspectral images (HSIs) [5] have made remarkable progress. Among them, HSIs provide rich spectral information (i.e., about 10-nm spectral resolution), which makes it possible to identify ground objects with different characteristics by means of their spectral signatures [6]. Owing to this advantage, HSIs are widely employed in HSI classification [7], [8], hyperspectral unmixing [9], [10], hyperspectral pansharpening [11], [12], band selection [13], [14], hyperspectral anomaly detection (HAD) [15], [16], and hyperspectral target detection [17], [18]. HAD, which aims to search for pixels whose spectral signatures deviate from those of the surrounding background pixels without prior knowledge about the anomalies, has attracted extensive attention in both military and civilian fields. Because no prior knowledge about anomalies is available for HAD, two common assumptions are made: 1) the anomalous pixels are spectrally or spatially different from the surrounding background pixels; and 2) the anomalous pixels occupy only a very small proportion of an HSI relative to the normal (i.e., background) pixels [19].
In the past decades, numerous research works have been proposed to deal with the problem of HAD. The RX detector [20], proposed by Reed and Yu, is the classical statistics-based method for detecting anomalies in an HSI. It hypothesizes that the background conforms to a multivariate Gaussian distribution; under this condition, the statistical characteristics (i.e., the mean vector and the covariance matrix of all pixels in the HSI) are first calculated, and then the Mahalanobis distance of the pixel under test (PUT) with respect to these statistics is computed to measure the abnormal degree of the PUT. However, the RX detector has two shortcomings in practical applications. For one thing, the hypothesis about the background is not applicable to complex scenes. For another, the background modeling (i.e., calculating the statistical characteristics) is vulnerable to contamination by the anomalies, resulting in a suboptimal model. To tackle these issues, variants of the RX detector, such as local RX [21], kernel RX [22], fractional Fourier entropy RX [23], and fractional Fourier transform-based tensor RX [24], have been proposed. However, these methods still fail to fully characterize the background.
In recent years, deep learning (DL)-based methods have shown unprecedented success in image interpretation tasks, such as image classification [25], [26], object detection [27], [28], and cosaliency detection [29], [30], owing to their strong ability to capture nonlinear and deep features. To fully exploit this potential, DL-based methods have been introduced into HAD. Among them, the autoencoder (AE) is widely applied to HAD owing to its outstanding reconstruction capability. Chang et al. [31] proposed a sparse AE-based HAD method that detects anomalies in an HSI using the reconstruction errors of the AE, based on the fact that the reconstruction error of the AE reflects the characteristics of anomalies. Zhao and Zhang [32] proposed a spectral-spatial stacked AE method, which fully considers both spatial deep features from the low-rank part and spectral deep features generated from the sparse component. To reflect the local intrinsic structure of the HSI in the latent features, Lu et al. [33] proposed a manifold-constrained AE architecture for HAD. Similarly, to address the problem that the geometric structure among samples is lost in the latent features, Fan et al. [34] developed a robust graph AE detector for HAD. Different from the above strategies, which involve manual parameter settings, Wang et al. [35] proposed an autonomous anomaly detection network based on a fully convolutional AE to perform HAD. In addition, variational AEs [36], deep belief networks [37], and adversarial AEs [38] have been introduced into HAD to further improve the detection performance.
Different from the methods mentioned above, reconstruction-based methods, which can be roughly divided into three groups, namely low-rank representation (LRR) methods, sparse representation (SR) methods, and collaborative representation (CR) methods, have been widely adopted for HAD. The LRR methods hold that the background is low rank and the anomalies are sparse [39]. Zhang et al. [40] proposed a low-rank and sparse matrix decomposition-based method for HAD, which was an early exploration of the low-rank characteristic of the background and the sparse property of the anomalies. However, this method models the sparse component containing both anomalies and noise as a single distribution, which can confuse weak anomalies with noise. To address this issue, Li et al. [41] proposed a modified low-rank and sparse decomposition model (LSDM) for HAD, which models the sparse component as a mixture of Gaussians (MoG). Further, to comprehensively use the sparse and low-rank components, Feng et al. [42] developed an LSDM with density-peak-guided CR for HAD. Different from the above methods, Xu et al. [43] proposed a low-rank and sparse representation (LRASR) method for HAD, which models the low-rank background as the product of a preconstructed background dictionary and the corresponding coefficient matrix. However, the purity and representativeness of the background dictionary are not ideal in LRASR. To this end, Su et al. [44] proposed a low-rank and CR detector to further improve the background dictionary construction. Additionally, several regularization terms have been imposed on the LRR model, such as graph and total variation regularized low-rank representation [45], local spatial constraint and total variation [46], and dual collaborative constraints regularized low-rank and sparse representation [47].
The SR-based methods assume that the pixels in an HSI can be represented by a few elements from an overcomplete dictionary. The background joint sparse representation detector [48], proposed by Li et al., was a successful exploration of introducing SR into HAD. Further, Ma et al. [49] developed a reliable overcomplete dictionary, termed discriminative feature learning with multiple-dictionary SR, to improve the effect of HAD. In addition, various constraints have been exerted on the SR model, such as the SR and linear mixture model [50] and archetypal analysis with structured sparse representation [51].
Similarly, the CR model [52], proposed by Li and Du, assumes that a background pixel under test can be linearly represented by the surrounding background pixels (also called the background dictionary) within dual windows, while an anomalous pixel under test cannot. However, the background dictionary constructed from the dual windows is affected by anomalous pixels when the dual-window setting is unreasonable, leading to limited detection accuracy. To tackle this problem, several background-dictionary purification strategies for CR have been developed, such as CRD with principal component analysis [53], dual collaborative representation [54], and a CR-based anomaly detector with outlier removal [55]. In order to fully mine the spatial information of HSIs, an HAD method combining total variation regularized low-rank tensor decomposition and CR was proposed by Feng et al. [56]. In the above CR methods, all spectral signatures are constrained to share the same representation coefficient, ignoring the differences among these spectral signatures and thus yielding unsatisfactory results. To resolve this problem, Wu et al. [57] proposed a relaxed CR detector with a nonglobal dictionary, so as to fully consider the differences among the spectral signatures. Although the detection accuracy of the SR-based and CR-based methods is acceptable for conventional scenes with low requirements on detection performance, these methods are hardly suitable for occasions requiring a high detection rate and a low false alarm rate. Therefore, it is advisable to combine the SR and CR models so as to fully exploit the potential of both.
To achieve the aforementioned goal, we develop a novel approach integrating SR and CR for HAD. First, the HSI is divided into several clusters by a density-based clustering algorithm, in which the cluster with the most entries is viewed as a coarse background set. Then, an SR model, whose overcomplete dictionary is constructed from the coarse background set and pregenerated superpixels, is established. Further, the atoms employed in the overcomplete dictionary of the SR model are collected to serve as the background dictionary of the CR model. In order to comprehensively utilize the information of both SR and CR, we combine the residual features acquired from SR and CR through a nonlinear transformation function to generate the response map. Finally, to preserve the contour information of the objects, a postprocessing operation with a guided filter is exerted on the response map to obtain the detection result. The contributions of this article are as follows:
1) A novel method integrating SR and CR is proposed for HAD. Further, a postprocessing operation with a guided filter is executed on the response map obtained by combining the residual features of SR and CR, so as to preserve the contour information of the objects. In this way, a detection result with a highly suppressed background and highlighted anomalies can be achieved.
2) We propose an overcomplete dictionary construction strategy for the SR model with the aid of density-based clustering and superpixel segmentation, which sifts pure and representative background pixels to form the overcomplete dictionary. Similarly, a simple yet effective background dictionary construction strategy, which selects the atoms utilized in the SR model to construct the background dictionary, is proposed for the CR model. In this way, anomaly interference is alleviated in the construction of both dictionaries, yielding better detection accuracy.
The rest of this article is organized as follows. Section II briefly reviews the related work. The proposed SRCR is presented in Section III. Section IV introduces the experiments. Finally, Section V concludes the article.

II. RELATED WORK
A. Sparse Representation Model
For an SR model, it is assumed that a signal can be linearly represented by some atoms from an overcomplete dictionary. A hyperspectral signal (i.e., a pixel) can be represented as x_i ∈ R^b, i = 1, ..., N, in which b and N separately represent the number of spectral bands and the number of pixels in an HSI. Let D^{SR} ∈ R^{b×m} be an overcomplete dictionary and α_i^{SR} ∈ R^m denote a sparse vector, where m is the number of atoms in the overcomplete dictionary. Based on this, the hyperspectral signal x_i can be linearly represented by the multiplication of the overcomplete dictionary D^{SR} and the sparse vector α_i^{SR}. Notably, b ≪ m in D^{SR} and only a few entries in α_i^{SR} are nonzero. The sparse vector α_i^{SR} can be acquired by solving the optimization problem as follows:

$$\hat{\alpha}_i^{SR} = \arg\min_{\alpha_i^{SR}} \left\| x_i - D^{SR} \alpha_i^{SR} \right\|_2^2 \quad \text{s.t.} \quad \left\| \alpha_i^{SR} \right\|_0 \le K \tag{1}$$

where ||·||_2 and ||·||_0 separately stand for the l_2 norm and l_0 norm, and K refers to the upper bound of the sparsity level for α_i^{SR}. Clearly, (1) is an NP-hard problem. To cope with it, there are two frequently used families of methods, which are greedy pursuit-based methods [58], [59] and l_1-norm convex relaxation methods [60], [61].
Once the estimated sparse vector α̂_i^{SR} is acquired, the residual response r_i^{SR} corresponding to x_i can be computed by

$$r_i^{SR} = \left\| x_i - D^{SR} \hat{\alpha}_i^{SR} \right\|_2. \tag{2}$$

After the residual responses of all pixels in the HSI are calculated through (2), an SR-based residual feature F^{SR} = [r_1^{SR}, r_2^{SR}, ..., r_N^{SR}] can be obtained.
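As a concrete illustration of this step, the per-pixel SR residual can be computed with an off-the-shelf greedy solver, as in the following sketch; the function and variable names are ours, and D_sr and K correspond to D^{SR} and the sparsity level above.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def sr_residual(x, D_sr, K):
    """Greedy solution of (1) via OMP and residual response of (2).

    x    : (b,) spectral signature of one pixel
    D_sr : (b, m) overcomplete dictionary, atoms as columns
    K    : upper bound on the sparsity level
    """
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
    omp.fit(D_sr, x)                       # x ~= D_sr @ alpha with at most K nonzeros
    alpha = omp.coef_                      # estimated sparse vector alpha_i^SR
    return np.linalg.norm(x - D_sr @ alpha)
```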

B. Collaborative Representation Model
The CR model assumes that a background pixel under test can be linearly represented by the surrounding background pixels (i.e., the background dictionary) generated from sliding dual windows, while an anomalous pixel under test cannot [52]. Let D_i^{CR} ∈ R^{b×n} denote the collection of pixels located in the sliding dual windows centered on the pixel under test x_i ∈ R^b, in which b and n are the number of spectral bands in the HSI and the number of pixels within the dual windows centered on the pixel under test, respectively. The CR aims to determine the weight vector α_i^{CR} by minimizing both the representation residual and the l_2 norm of the weights:

$$\min_{\alpha_i^{CR}} \left\| x_i - D_i^{CR} \alpha_i^{CR} \right\|_2^2 + \lambda \left\| \alpha_i^{CR} \right\|_2^2 \tag{3}$$

where λ denotes the tradeoff parameter. Taking the derivative of (3) with respect to α_i^{CR} and setting the derivative to zero yields

$$\hat{\alpha}_i^{CR} = \left( D_i^{CR\top} D_i^{CR} + \lambda I^{CR} \right)^{-1} D_i^{CR\top} x_i \tag{4}$$

where I^{CR} represents an identity matrix. Considering the fact that the pixels in the dual windows are of different importance, a diagonal matrix is used to adjust the weight vector α_i^{CR}, which can be expressed as

$$\Gamma_i = \operatorname{diag}\!\left( \left\| x_i - d_1^{CR} \right\|_2, \ldots, \left\| x_i - d_n^{CR} \right\|_2 \right) \tag{5}$$

where d_1^{CR}, ..., d_n^{CR} represent the columns of D_i^{CR}. By imposing the diagonal matrix on (3), the corresponding solution becomes

$$\hat{\alpha}_i^{CR} = \left( D_i^{CR\top} D_i^{CR} + \lambda \Gamma_i^{\top} \Gamma_i \right)^{-1} D_i^{CR\top} x_i. \tag{6}$$

Finally, the residual response r_i^{CR} can be obtained by the following formula:

$$r_i^{CR} = \left\| x_i - D_i^{CR} \hat{\alpha}_i^{CR} \right\|_2. \tag{7}$$

Through (7), the residual responses corresponding to all pixels in the HSI can be computed, yielding the CR-based residual feature F^{CR} = [r_1^{CR}, r_2^{CR}, ..., r_N^{CR}].
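A minimal numpy sketch of the closed-form CR solution and residual follows; the distance-weighted regularizer below reflects the standard CRD formulation and is our reconstruction of (5) and (6), and the names are ours.

```python
import numpy as np

def cr_residual(x, D_cr, lam):
    """Distance-weighted collaborative representation, following (5)-(7).

    x    : (b,) pixel under test
    D_cr : (b, n) background dictionary, atoms as columns
    lam  : tradeoff parameter lambda
    """
    # Diagonal weighting: distance from x to every atom (our reading of (5))
    dist = np.linalg.norm(D_cr - x[:, None], axis=0)
    # Closed-form solution of the weighted problem, as in (6)
    A = D_cr.T @ D_cr + lam * np.diag(dist ** 2)
    alpha = np.linalg.solve(A, D_cr.T @ x)
    # Residual response, as in (7)
    return np.linalg.norm(x - D_cr @ alpha)
```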

III. PROPOSED METHOD
In this section, a novel HAD method, which combines the SR model and CR model, is proposed, as shown in Fig. 1. The proposed SRCR can be roughly categorized into four key parts: SR-based detector, CR-based detector, feature fusion with nonlinear transformation function, and postprocessing with guided filter, as described below.
A. Sparse-Representation-Based Detector
1) Sparse Representation Model: As stated in Section II-A, a hyperspectral pixel can be linearly represented by a few atoms from an overcomplete dictionary. In this case, the SR model can be expressed by (1). By using the orthogonal matching pursuit (OMP) method [59] to solve (1), we can obtain the SR-based residual feature with the help of (2).
2) Overcomplete Dictionary Construction: In the SR model, the overcomplete dictionary is expected to be composed of background pixels, so that a background pixel under test is well represented while an anomalous pixel under test is poorly represented [48]. Obviously, an easy option is to employ all the background pixels in the HSI as the overcomplete dictionary. However, due to the fact that no prior information about the anomalies is available, how to select the background pixels becomes an urgent issue to be resolved.
To resolve the above issue, an overcomplete dictionary construction strategy via density-based clustering and superpixel segmentation is proposed, as illustrated in Fig. 2. To be specific, the density-based clustering method, termed density-based spatial clustering of applications with noise (DBSCAN) [62], is applied to obtain a clustering map, in consideration of its strong ability to discover clusters with different shapes and scales in the presence of noise and outliers, and its low requirement on specifying the number of clusters. The DBSCAN characterizes the compactness of the data distribution by a pair of neighborhood parameters (Eps, MinPts), in which Eps stands for the neighborhood radius (i.e., the maximum distance for two points to be regarded as neighbors), and MinPts refers to the minimum number of points required within that neighborhood for a point to be a core point. Given a set of data points, the detailed clustering process of the DBSCAN algorithm can be summarized as follows.
First, the neighborhood of each point is calculated, and the points whose Eps-neighborhood contains at least MinPts points are collected into a core points set. Then, a core point is randomly selected from the core points set to serve as the seed, and the points within a distance of no more than Eps from this core point are searched to form the first cluster. Subsequently, the core points contained in the cluster formed in the previous step are removed from the core points set, and a new core point is randomly chosen from the remaining core points set to act as the seed for the next cluster. This step is repeated until the core points set is empty, after which the cluster map is generated. Given an HSI and a certain (Eps, MinPts), the cluster map obtained by performing DBSCAN is displayed in Fig. 2. By observing Fig. 2, it is noticed that "cluster 1" accounts for a large proportion of the pixels in the HSI relative to the other clusters, which is consistent with the phenomenon that background pixels occupy a much larger proportion than anomalous pixels in an HSI [19], [63]. Based on this fact, we take the cluster with the most pixels (i.e., "cluster 1" in Fig. 2) as the coarse background set (an illustrative sketch of this coarse background extraction is given below). In this case, it may seem reasonable to directly employ the coarse background set as the overcomplete dictionary. However, a few anomalous pixels may fail to be excluded from the coarse background set, which would lead to missed detections of anomalous pixels.
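The coarse background extraction just described can be sketched with scikit-learn's DBSCAN as below; keeping the largest cluster follows the text, while the spectral normalization and the flattening of the cube are implementation assumptions of ours.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def coarse_background_mask(hsi, eps, min_pts):
    """Return a boolean (H, W) map of the coarse background set.

    hsi     : (H, W, b) hyperspectral cube
    eps     : neighborhood radius Eps
    min_pts : minimum number of points MinPts
    """
    H, W, b = hsi.shape
    X = hsi.reshape(-1, b).astype(float)
    # Assumption: spectra are normalized so that small Eps values are meaningful
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    # The cluster with the most members is taken as the coarse background set
    valid = labels[labels >= 0]
    largest = np.bincount(valid).argmax()
    return (labels == largest).reshape(H, W)
```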
To address the above concern, and considering the fact that anomalous pixels generally lie far from the center of each superpixel because of their distinctive spectral characteristics [46], we exploit the center of each superpixel to serve as an atom, so as to construct an overcomplete dictionary that contains only background pixels. Owing to its fast speed, low memory footprint, and excellent boundary adherence, the simple linear iterative clustering (SLIC) algorithm [64] has been widely used to generate superpixels. In addition, only one parameter (i.e., the presegmented number of superpixels), denoted by n_S, is involved in this algorithm. However, the original SLIC algorithm cannot process an HSI directly. To cope with this, principal component analysis (PCA) [65] is first executed on the HSI to acquire the first three principal components. On the one hand, the predominant information of the HSI is efficiently preserved by the first three principal components; on the other hand, these components can be used to yield the superpixels. Once the superpixels {S_v}_{v=1}^{n_S^*} are formed, in which S_v and n_S^* denote the vth superpixel and the number of generated superpixels, respectively, we select the pure background pixels from the HSI X under the guidance of both the superpixels {S_v}_{v=1}^{n_S^*} and the coarse background set index map B^{cb} to construct the overcomplete dictionary D^{SR}, which can be formulated as

$$D^{SR} = \left\{ x_v^c \;\middle|\; \sum_{p=1}^{n_v} B^{cb}\!\left(S_v^p\right) = n_v,\; v = 1, \ldots, n_S^* \right\} \tag{8}$$

where x_v^c refers to the pixel in X located at the center of the vth superpixel, B^{cb}(S_v^p) signifies the entry in B^{cb} at the position of the pth pixel of the vth superpixel, and n_v stands for the number of pixels in the vth superpixel. In this way, an overcomplete dictionary composed of pure background pixels can be obtained.
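The whole construction can then be sketched end to end as below, assuming our reading of (8) in which a superpixel contributes its center pixel only when every pixel of that superpixel falls into the coarse background set; the function name and the choice of the member pixel closest to the centroid are ours.

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.segmentation import slic

def build_overcomplete_dictionary(hsi, bg_mask, n_segments):
    """Sketch of the overcomplete dictionary construction guided by (8).

    hsi        : (H, W, b) hyperspectral cube
    bg_mask    : (H, W) boolean coarse background map B_cb from DBSCAN
    n_segments : presegmented number of superpixels n_S
    """
    H, W, b = hsi.shape
    pcs = PCA(n_components=3).fit_transform(hsi.reshape(-1, b)).reshape(H, W, 3)
    pcs = (pcs - pcs.min()) / (pcs.max() - pcs.min() + 1e-12)  # scale PCs for SLIC
    labels = slic(pcs, n_segments=n_segments, start_label=0)
    atoms = []
    for v in np.unique(labels):
        rows, cols = np.nonzero(labels == v)
        if bg_mask[rows, cols].all():         # superpixel lies entirely in the coarse background set
            d = (rows - rows.mean()) ** 2 + (cols - cols.mean()) ** 2
            k = int(np.argmin(d))             # member pixel closest to the superpixel center
            atoms.append(hsi[rows[k], cols[k]])
    return np.stack(atoms, axis=1)            # (b, m) overcomplete dictionary D_SR
```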

B. Collaborative-Representation-Based Detector
1) Collaborative Representation Model: As described in Section II-B, a background pixel under test can be represented by the surrounding background pixels in an HSI, while an anomalous pixel under test cannot. Correspondingly, the CR model can be formulated by (3) with the weighting matrix. By calculating the closed-form solution of the objective function, we can acquire the CR-based residual feature F^{CR} = [r_1^{CR}, r_2^{CR}, ..., r_N^{CR}] by virtue of (7).
2) Background Dictionary Construction: For the CR model, the background dictionary is vital to the detection performance of HAD [53], [66]. In existing CR-based methods, numerous researchers attempt to purify the background dictionary constructed from the pixels within the dual windows by excluding the potential anomalous pixels. However, two issues remain to be addressed for better detection performance. For one thing, anomalous pixels can hardly be removed completely from the constructed background dictionary, which causes an unsatisfactory effect for HAD. For another, configuring the scales of the inner and outer windows is challenging, since an inappropriate setting of the dual-window size leads to unreliable results.
To surmount the aforementioned problems, we propose a simple yet effective background dictionary construction strategy, which exploits the atoms used in the SR model described in Section III-A to construct the background dictionary. Compared with the existing dual-window-based CR model, the proposed strategy sifts pure background pixels for the background dictionary without interference from anomalous pixels, and the complicated scale setting of the dual windows is also avoided (a brief sketch of this strategy is given below).
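Under the per-pixel reading stated in the abstract, this strategy amounts to recording the support of each pixel's OMP solution and gathering the corresponding atoms, as in the following sketch (names are ours).

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def cr_background_dictionary(x, D_sr, K):
    """Collect the atoms that the SR model actually uses for pixel x.

    Returns the columns of the overcomplete dictionary D_SR whose OMP
    coefficients are nonzero for this pixel, serving as D_i^CR.
    """
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
    omp.fit(D_sr, x)
    support = np.flatnonzero(omp.coef_)    # indices of the atoms used in the SR model
    return D_sr[:, support]
```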

C. Feature Fusion With Nonlinear Transformation Function
For the SR model, in theory, only a few atoms in the overcomplete dictionary participate in the representation of the test pixel. In contrast, the CR model uses all atoms in the background dictionary to represent the test pixel. Additionally, it is verified in Section IV-F that the CR-based residual feature (i.e., F^{CR}) has a better background suppression effect than that of the SR-based detector. Therefore, a nonlinear transformation function utilizing F^{CR} is applied to F^{SR} to suppress the background and improve the detection performance, yielding the response map D^{RM} that combines both residual features F^{SR} and F^{CR}.
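The exact nonlinear transformation could not be recovered from the damaged equation above, so the following block is only an illustrative stand-in: it shows one way a nonlinear function of F^{CR} can modulate F^{SR} so that background pixels (small CR residuals) are suppressed. The functional form and the parameter gamma are assumptions of ours, not the paper's formula.

```python
import numpy as np

def fuse_features(F_sr, F_cr, gamma=1.0):
    """Illustrative fusion only; NOT the paper's exact transformation.

    F_sr, F_cr : (N,) SR-based and CR-based residual features
    gamma      : assumed steepness of the nonlinear suppression term
    """
    F_cr_n = F_cr / (F_cr.max() + 1e-12)             # normalize CR residuals to [0, 1]
    return F_sr * (1.0 - np.exp(-gamma * F_cr_n))    # small F_CR (background) is strongly suppressed
```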

D. Postprocessing With Guided Filter
Considering that neighboring pixels in an HSI usually have strong correlations with each other, a postprocessing operation with a guided filter [67] is further exploited to smooth the response map while preserving edges. For the guided filter, two images (i.e., the filtering input image and the guidance image) should be given in advance. Clearly, the filtering input image is the response map D^{RM}, and the guidance image is defined by the user. It is worth noting that the informative structure of the HSI should be contained in the guidance image. Based on this, we employ the first principal component of the HSI generated by PCA [65], which is composed of the shape of each class or object and the boundaries among different classes or objects, to act as the guidance image G = [g_1, g_2, ..., g_N], in which N denotes the number of pixels in the guidance image. The detection result can be acquired by imposing the guided filter on the response map D^{RM}, which can be formulated as

$$d_i^{fin} = \bar{a}_i g_i + \bar{b}_i$$

where D^{fin} = [d_1^{fin}, d_2^{fin}, ..., d_N^{fin}] represents the detection result, g_i refers to the ith pixel in the guidance image G, and \bar{a}_i = \frac{1}{|\omega|}\sum_{k \in \omega_i} a_k and \bar{b}_i = \frac{1}{|\omega|}\sum_{k \in \omega_i} b_k refer to the average coefficients of all windows overlapping pixel i, in which |\omega| stands for the number of pixels in a local window \omega_i centered at the position of pixel i with window size r. The window coefficients are given by

$$a_k = \frac{\frac{1}{|\omega|}\sum_{i \in \omega_k} g_i d_i^{RM} - u_k \bar{d}_k^{RM}}{\sigma_k^2 + \beta}, \qquad b_k = \bar{d}_k^{RM} - a_k u_k$$

where β stands for the regularization parameter, u_k and σ_k^2 are, respectively, the mean and variance of the guidance image G in \omega_k, and \bar{d}_k^{RM} refers to the mean of the filtering input image D^{RM} in \omega_k.
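A compact numpy sketch of this postprocessing step follows, using the standard gray-scale guided filter formulation; the box means computed with uniform_filter and the boundary handling are implementation choices of ours.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(G, D_rm, r, beta):
    """Gray-scale guided filter applied to the response map.

    G    : (H, W) guidance image, e.g., the first principal component of the HSI
    D_rm : (H, W) filtering input, i.e., the response map D_RM
    r    : local window size
    beta : regularization parameter
    """
    mean = lambda img: uniform_filter(img, size=r)  # box mean over the local window
    u = mean(G)                                     # u_k: window mean of the guidance image
    m = mean(D_rm)                                  # window mean of the response map
    var = mean(G * G) - u * u                       # sigma_k^2
    cov = mean(G * D_rm) - u * m
    a = cov / (var + beta)                          # a_k
    b = m - a * u                                   # b_k
    return mean(a) * G + mean(b)                    # d_i^fin = a_bar_i * g_i + b_bar_i
```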

IV. EXPERIMENTS
A. Datasets
In this section, six frequently employed datasets, which include one simulated dataset and five real datasets, are used to evaluate the performance of the proposed SRCR. They are described as follows.
1) Salinas Dataset [68]: The first dataset is a simulated dataset generated by exerting the target implantation method [69] on a real dataset, which was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Salinas Valley, CA, USA. The dataset has 204 spectral bands, each containing 120×120 pixels. The spatial resolution of this dataset is 3.7 m per pixel. The composite image and the corresponding ground truth of the dataset are separately displayed in Fig. 3(I-a) and (I-b).
2) San Diego Dataset [70]: The second dataset, collected by the AVIRIS sensor, covers the area of the San Diego airport, CA, USA. The size of the dataset is 100×100×189, with wavelengths ranging from 0.4 to 2.5 μm. Three airplanes, occupying 134 pixels in the dataset, are considered as the anomalies. The composite image and the corresponding ground truth of the dataset are shown in Fig. 3(II-a) and (II-b), respectively.
3) Pavia Dataset [71]: The third dataset was captured by the Reflective Optics System Imaging Spectrometer over the city of Pavia in northern Italy. The dataset consists of 100×100 pixels, with 102 spectral bands covering wavelengths from 0.43 to 0.86 μm. Some vehicles on the bridge are viewed as the anomalies, covering 68 pixels of the dataset. The composite image and the corresponding ground truth of the dataset are separately shown in Fig. 3(III-a) and (III-b).
4) HYDICE Dataset [72]: The fourth dataset was acquired over an urban area in CA, USA, by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) airborne sensor. The dataset is composed of 80×100 pixels in each spectral band, and there are 175 spectral bands in total, with wavelengths ranging from 0.4 to 2.5 μm. Some cars and roofs of different sizes, occupying 21 pixels, are treated as the anomalies. The composite image and the corresponding ground truth of the dataset are illustrated in Fig. 3(IV-a) and (IV-b), respectively.
5) Texas Coast Dataset [73]: The fifth dataset, collected by the AVIRIS sensor, covers the area of the Texas Coast. There are 204 spectral bands with wavelengths ranging from 0.45 to 1.35 μm, and each band contains 100×100 pixels. The buildings embedded in the background, occupying 67 pixels, are deemed as the anomalies. The composite image and the corresponding ground truth of the dataset are separately exhibited in Fig. 3(V-a) and (V-b).
6) SpecTIR Dataset [74]: The sixth dataset was captured by the SpecTIR hyperspectral airborne Rochester experiment with a 5-nm spectral resolution and 1-m spatial resolution. In this dataset, there are 180×180 pixels in each spectral band, with 120 spectral bands in total. Some man-made colorful square fabrics of different sizes are deemed to be anomalies. The composite image and the corresponding ground truth of the dataset are displayed in Fig. 3(VI-a) and (VI-b), respectively.

B. Experimental Setup
1) Comparative Methods:
To prove the superiority of the proposed SRCR, ten public methods for HAD, including RX [20], CRD [52], LSMAD [40], LRASR [43], PAB-DC [75], LSDM-MoG [41], TPCA [76], RGAE [34], GAED [77], and NJCR [78], are utilized for comparison. Among them, the RX detector is the classical statistical-theory-based method. The CRD is the pioneering collaborative-representation-based method. The LSMAD and LRASR are typical low-rank and SR methods. The PAB-DC is a union-dictionary-based low-rank and SR method, which can effectively exclude noise interference. The LSDM-MoG models the sparse component as an MoG, so as to characterize complex distributions more accurately. The TPCA is a typical preprocessing method that separates the background and anomalies before performing HAD. The RGAE and GAED are recently proposed DL-based methods. The NJCR is a union-dictionary-based CR method with a nonnegativity constraint.
2) Evaluation Metrics: To verify the detection performance of the proposed SRCR, we employ the receiver operating characteristic (ROC) curve [79], the area under the curve (AUC) [80], and the separability map [81] as evaluation metrics. It is worth mentioning that there are two types of ROC curves, i.e., the ROC curve of (P_d, P_f) and the ROC curve of (P_f, τ). Correspondingly, there are two kinds of AUC values, namely the AUC value of (P_d, P_f) and the AUC value of (P_f, τ). For the ROC curve of (P_d, P_f), the closer to the upper left corner, the better the detection performance. In contrast, an ROC curve of (P_f, τ) approaching the lower left corner indicates a lower false alarm rate. The AUC value of (P_d, P_f) spans from 0 to 1, and the higher the value, the better the performance. Conversely, the desired AUC value of (P_f, τ) is 0, and the lower the value, the more excellent the performance. The separability map is used to characterize the ability of the detector to extract anomalies from the background: the larger the separation distance between the background and the anomalies, the more outstanding the performance.
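For reference, both AUC values can be computed from a detection map and the binary ground truth as in the following straightforward sketch of ours, which sweeps the threshold τ over the normalized detection scores.

```python
import numpy as np

def auc_metrics(detection, gt, n_tau=1000):
    """AUC of (P_d, P_f) and AUC of (P_f, tau) for one detection map.

    detection : (H, W) real-valued detection map
    gt        : (H, W) binary ground truth (1 = anomaly)
    """
    score = (detection - detection.min()) / (detection.max() - detection.min() + 1e-12)
    taus = np.linspace(0.0, 1.0, n_tau)
    anom, backg = gt.astype(bool), ~gt.astype(bool)
    pd = np.array([(score[anom] >= t).mean() for t in taus])   # detection rate at threshold t
    pf = np.array([(score[backg] >= t).mean() for t in taus])  # false alarm rate at threshold t
    # Integrate P_d over P_f for the (P_d, P_f) AUC, and P_f over tau for the (P_f, tau) AUC
    auc_pd_pf = np.trapz(pd[::-1], pf[::-1])
    auc_pf_tau = np.trapz(pf, taus)
    return auc_pd_pf, auc_pf_tau
```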
3) Implementation Details: The parameter settings of the comparative methods are configured in light of the suggestions of the original authors or the AUC value of (P_d, P_f). The RX detector has no additional parameters to be configured. For CRD, the size of the dual window is configured as (9, 7), (17, 15), (15, 11), (7, 5), (13, 11), and (17, 15) over the datasets, respectively, and the regularization parameter is configured as 0.01 for all datasets. With respect to the LSMAD, the upper bound of the rank of the background component is separately configured as 20, 28, 25, 30, 25, and 25 for the datasets. As for LRASR, the number of clusters and the number of selected atoms in each cluster are separately configured as 15 and 20 for all datasets, and the regularization parameters (i.e., β and λ) are set as (10, 0.01) for most datasets, except for the San Diego dataset, whose regularization parameters are configured as (1, 0.01). The PAB-DC contains seven parameters to be analyzed, which are the window size, the cluster number, the level of sparsity, the percentage of selected atoms used to build the background dictionary, the number of selected atoms used to build the potential anomaly dictionary, and two regularization parameters (i.e., β and λ), respectively; for all datasets, the above parameters are separately set as (1, 10, ...). With respect to the LSDM-MoG, the tunable parameters are the initial rank r_0 and the number of mixture Gaussian noise components K; for all datasets, r_0 is configured as 20, and K is separately set to 10, 8, 7, 9, 7, and 2. Similar to the RX detector, there are no additional parameters to be configured for the TPCA. For the RGAE, three parameters need to be adjusted for better performance, which are the tradeoff parameter λ, the number of superpixels S, and the dimension of the hidden layer n_hid. To obtain better accuracy, the three parameters are separately set as 0.01, 150, and 100 for most datasets, except for the HYDICE and SpecTIR datasets, for which they are configured as (0.1, 150, 20) and (0.1, 500, 100), respectively. In the GAED, the penalty coefficient is set as 1 and the window size is configured as 7 for all datasets. For the Salinas, Texas Coast, and SpecTIR datasets, the dimension of the AE middle-hidden layer, the learning rate, and the number of iterations are set as 25, 0.3, and 500, respectively; these three parameters are configured as (25, 0.1, 300), (25, 0.3, 300), and (30, 0.3, 500) for the San Diego, Pavia, and HYDICE datasets, respectively. With regard to the NJCR, the parameter to be adjusted is the regularized scalar λ; for the datasets mentioned in Section IV-A, λ is set as 1, 100, 0.01, 0.1, 100, and 0.001 in turn. Our experimental environment for the proposed SRCR is Python 3.6.13 on an Intel Core i7-9700T CPU at 2.00 GHz with 16 GB of RAM; similarly, the PAB-DC is also carried out by using Python. With respect to the proposed SRCR, the parameters introduced in Section IV-C are configured as the fixed values listed in Table I when one of them is analyzed.

C. Parameter Sensitivity Analysis
1) Minimum Distance Eps:
To validate the effect of the minimum distance Eps used in DBSCAN on the detection performance, we empirically set Eps from 0.01 to 0.08 at an interval of 0.01, and the corresponding AUC values of (P_d, P_f) on the experimental datasets are displayed in Fig. 4(a). From Fig. 4(a), we can notice that the AUC values of (P_d, P_f) are extremely close to 1 on the Pavia and SpecTIR datasets relative to those on the other datasets. In contrast, the AUC values of (P_d, P_f) fluctuate strongly on the San Diego dataset. As for the remaining three datasets (i.e., Salinas, HYDICE, and Texas Coast), the fluctuation of the AUC values of (P_d, P_f) lies between the two aforementioned cases. For convenience, the optimal Eps on the experimental datasets is listed in the second line of Table I.

2) Minimum Number of Points MinPts:
For the minimum number of points MinPts employed in DBSCAN, we configure the values from 1 to 20 at an interval of 1 on experimental datasets, as shown in Fig. 4(b). By observing Fig. 4(b), it can be seen that the AUC values of (P d , P f ) have little change on HYDICE and SpecTIR datasets, and the variations are also small for Salinas and Pavia datasets when MinPts increases. Contrarily, the AUC values of (P d , P f ) have significant variations for San Diego and Texas Coast datasets. It is noteworthy that the optimal settings regarding MinPts on experimental datasets are listed in the third line of Table I.
3) Number of Superpixels n_S: To analyze the influence of the number of superpixels n_S on the detection performance of the proposed SRCR, a large number of experiments are carried out on the experimental datasets. For the superpixels, a massive number of superpixels means high computational complexity, while too few superpixels fail to construct the overcomplete dictionary employed in the SR. To this end, we set n_S as {200, 300, 400, 500, 600, 700, 800}, and the corresponding AUC values of (P_d, P_f) on the experimental datasets are exhibited in Fig. 4(c). From Fig. 4(c), we can see that the AUC values of (P_d, P_f) remain very stable and stay at a high level on most datasets except for the Salinas and San Diego datasets. Evidently, the fluctuation of the AUC values of (P_d, P_f) on the San Diego dataset is higher than that on the Salinas dataset. Notably, the optimal settings with regard to n_S on the experimental datasets are listed in the fourth line of Table I.
(Figure captions: Fig. 5 compares overcomplete dictionary construction strategies, where "SR_LRASR" uses the background pixel selection method of LRASR and "SR_DBSCAN" is the proposed strategy; Fig. 6 shows 2-D histograms of the detection results with different background dictionary construction strategies, where "CRD" is the regular dual-window strategy and "CRD_SR" is the proposed strategy.)

4) Sparsity Level K:
For the sparsity level K configured in the SR model, the greater the K, the higher the computation burden.
Conversely, a small K leads to an inaccurate representation in the SR model. To this end, we empirically set K as {2, 4, 6, 8, 10, 12}, as shown in Fig. 4(d). From Fig. 4(d), it can be seen that the AUC values of (P_d, P_f) keep rising when K is less than 8 for most datasets, except for the Pavia dataset (where the optimal detection result is achieved when K equals 6). The optimal settings for K on the experimental datasets are given in the fifth line of Table I.

5) Tradeoff Parameter λ:
For the tradeoff parameter λ used in the CR model, to analyze its effect on the detection performance, λ is configured as {10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}} on the experimental datasets, with results displayed in Fig. 4(e). From Fig. 4(e), it is clear that the AUC values of (P_d, P_f) are stable on most datasets, except for the Salinas and San Diego datasets. For the Salinas dataset, the AUC values of (P_d, P_f) fluctuate to a large extent as λ increases. Similarly, the AUC values of (P_d, P_f) improve noticeably when λ ranges from 10^{-4} to 10^{0} on the San Diego dataset, while a slight drop occurs when λ changes from 10^{0} to 10^{1}. For convenience, the optimal λ on the experimental datasets is displayed in the sixth line of Table I.
6) Window Size r: For the parameter r used to determine the local window in the guided filter, to analyze its effect on the AUC values of (P_d, P_f) over the experimental datasets, a group of values ranging from 3 to 19 at an interval of 2 is configured, as exhibited in Fig. 4(f). From Fig. 4(f), it is clearly seen that the AUC values of (P_d, P_f) reach their best point when r equals 5 for the Salinas, San Diego, and SpecTIR datasets. Similarly, for the Pavia and Texas Coast datasets, the optimal AUC values of (P_d, P_f) are obtained when r is equal to 9. Besides, the best AUC value of (P_d, P_f) for the HYDICE dataset is acquired with r = 3.
The aforementioned optimal settings about r on experimental datasets are summarized in the seventh line of Table I.

7) Regularization Parameter β:
For the regularization parameter β employed in the guided filter, we empirically set it as {10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}} to examine the change trend of the AUC values of (P_d, P_f) over the experimental datasets, as displayed in Fig. 4(g). By observing Fig. 4(g), we can see that the AUC values of (P_d, P_f) fluctuate noticeably on the Salinas, San Diego, HYDICE, and Texas Coast datasets, especially on the Salinas and HYDICE datasets, while the change trend is not obvious for the remaining datasets. The optimal β over the experimental datasets is listed in the last line of Table I.

D. Component Analysis
1) Effectiveness Evaluation of Overcomplete Dictionary:
In order to prove the effectiveness of the overcomplete dictionary construction strategy in the SR model of the proposed detector, an SR model that uses all pixels of the HSI to construct the overcomplete dictionary could easily be considered for comparison. However, such a strategy would make the anomaly representation as good as the background representation, owing to the participation of anomalies in the representation process, thus resulting in many missed detections. We therefore instead utilize the frequently used background pixel selection strategy of LRASR [43] to construct the overcomplete dictionary for comparison, and the corresponding comparison results are exhibited in Fig. 5. In Fig. 5, compared with the overcomplete dictionary construction strategy based on LRASR (i.e., SR_LRASR), the AUC values of (P_d, P_f) corresponding to the proposed overcomplete dictionary strategy (i.e., SR_DBSCAN) show obvious improvements on most datasets, except for the SpecTIR dataset, mainly because the detection performance of both strategies approaches the limit value (i.e., 1) there.
2) Effectiveness Evaluation of Background Dictionary: To validate the effectiveness of the background dictionary, the regular background dictionary construction strategy (i.e., sifting the atoms in a dual-window manner) and the proposed background dictionary construction strategy are compared, as displayed in Fig. 6. By observing Fig. 6, we can find that the AUC values of (P_d, P_f) for the proposed background dictionary construction strategy (i.e., CRD_SR) show a gratifying improvement on all experimental datasets compared with those of the regular strategy (i.e., CRD), which indicates that the proposed background dictionary construction strategy effectively improves the detection performance of HAD.
3) Effectiveness Evaluation of Postprocessing: In order to verify the effect of postprocessing, the proposed detector without and with postprocessing is analyzed for comparison, as shown in Fig. 7. In Fig. 7, it can be noticed that the proposed detector with postprocessing (w/PP) achieves significant improvement on the San Diego, Pavia, and Texas Coast datasets compared with the proposed detector without postprocessing (wo/PP). Similarly, the AUC values of (P_d, P_f) corresponding to the w/PP show slight improvements over those of the wo/PP on the Salinas, HYDICE, and SpecTIR datasets. For the above cases, we think there are two possible reasons: 1) the w/PP is already close to the limit value in terms of the AUC values of (P_d, P_f) on the Salinas, HYDICE, and SpecTIR datasets; 2) on the San Diego, Pavia, and Texas Coast datasets, the highlighted anomalies in the response map before postprocessing have an incomplete structure, making it possible to transfer the structure of the guidance image into the filtering output and thus improving the detection performance of the proposed detector to a large extent. In contrast, for the remaining datasets (i.e., Salinas, HYDICE, and SpecTIR), the highlighted anomalies in the response map already have a relatively complete structure, so the structure migration is limited and only a limited performance improvement is achieved.

E. Detection Performance
To validate the effect of the proposed method, detection maps and three evaluation metrics described in Section IV-B-2, which are separately ROC curves (consisting of ROC curve of (P d , P f ) and ROC curve of (P f , τ)), AUC values (including AUC value of (P d , P f ) and AUC value of (P f , τ)) and separability maps, are employed. Fig. 8 displays the detection maps of our method and the comparative methods. The ROC curves of (P d , P f ) and ROC curves of (P f , τ) are plotted in Figs. 9 and 10, respectively. Table II lists both the AUC values of (P d , P f ) and AUC values of (P f , τ).
For the Salinas dataset, the anomalies are effectively detected with strong background suppression in Fig. 8(I)-(k) relative to the comparative methods. In Table II, it can be seen that the proposed method achieves the highest AUC value of (P_d, P_f) (i.e., 0.9963) and the lowest AUC value of (P_f, τ) (i.e., 0.0005), which means that the proposed method performs well in both highlighting the anomalies and suppressing the background.
With regard to the San Diego dataset, from Fig. 8(II) we can easily find that the anomalies (i.e., three airplanes) are completely detected by most of the comparative methods. However, the existence of false alarms (such as the background objects in the bottom right-hand corner of the detection map) results in low AUC values of (P_d, P_f), as shown in the third line of Table II. Different from the above, the proposed method can comprehensively detect the anomalies with quite few false alarms, achieving 0.9946 for the AUC value of (P_d, P_f) and 0.0022 for the AUC value of (P_f, τ). The detection maps of the proposed method and the comparative methods on the Pavia dataset are exhibited in the third line of Fig. 8. By observing these detection maps, it is noticed that most comparative methods, such as RX, LSMAD, PAB-DC, and LSDM-MoG, fail to completely locate the anomalies. Although the anomalies are well detected by the CRD, LRASR, TPCA, and NJCR, the AUC values of (P_d, P_f) corresponding to these methods are not very high, being 0.9650, 0.9889, 0.9905, and 0.9943, respectively, owing to relatively high false alarms. In addition, the RGAE and GAED achieve acceptable performance in both localization and background suppression. In terms of our method, the detection map accurately locates the anomalies appearing in the HSI while preserving a quite low false alarm rate, which is reflected in the high AUC value of (P_d, P_f) and low AUC value of (P_f, τ) of our method.
The fourth line of Fig. 8 displays the detection maps of both the proposed method and the comparative methods on the HYDICE dataset. From Fig. 8(IV)-(k), it can be noticed that the proposed method detects the anomalies comprehensively, which is consistent with its AUC value of (P_d, P_f) in Table II. By observing Table II, we can find that the proposed method achieves the first-rank AUC value of (P_d, P_f) and the suboptimal AUC value of (P_f, τ) relative to the comparative methods. Among these comparative methods, the CRD and PAB-DC are only slightly lower than the proposed method, with AUC values of (P_d, P_f) of 0.9951 and 0.9955, respectively. In addition, the RX, LSMAD, and LSDM-MoG are in the second echelon in terms of AUC values of (P_d, P_f). Although the AUC value of (P_d, P_f) of GAED is unsatisfactory, its AUC value of (P_f, τ) is acceptable. Even worse, the LRASR, TPCA, RGAE, and NJCR perform poorly in terms of both AUC values of (P_d, P_f) and AUC values of (P_f, τ).
As for the Texas Coast dataset, as illustrated in Fig. 8(V)-(k), we can find that the proposed method identifies all anomalies with a quite low false alarm rate. In terms of the comparative methods, the background suppression of the first three methods (i.e., RX, CRD, and LSMAD) and the last three methods (i.e., RGAE, GAED, and NJCR) is slightly superior to that of the four methods in the middle (i.e., LRASR, PAB-DC, LSDM-MoG, and TPCA). As a whole, the proposed method has the optimal AUC value of (P_d, P_f) and the suboptimal AUC value of (P_f, τ) relative to the comparative methods, as can be seen in the sixth line of Tables II and III, respectively. Fig. 8(VI)-(k) exhibits the detection map of the proposed method on the SpecTIR dataset, which fully locates the anomalies with few false alarms. Compared with the proposed method, the comparative methods have lower AUC values of (P_d, P_f) and higher AUC values of (P_f, τ), which means that their detection performance is worse than that of the proposed method. In addition, the average detection performance in terms of both the AUC values of (P_d, P_f) and the AUC values of (P_f, τ) is listed in the last line of Tables II and III, respectively. From the average detection performance, it can be found that the proposed method achieves an excellent AUC value of (P_d, P_f) (i.e., 0.9978) and AUC value of (P_f, τ) (i.e., 0.0063), which far outperform those of the comparative method with the second highest detection performance, i.e., 0.9815 for the AUC value of (P_d, P_f) and 0.0146 for the AUC value of (P_f, τ).
Further, the ROC curves of (P_d, P_f) and ROC curves of (P_f, τ) are plotted in Figs. 9 and 10, respectively, to display the variation of (P_d, P_f) and (P_f, τ) as the threshold varies. Notably, in order to clearly exhibit the change trend of the ROC curves of the proposed method, they are drawn as heavy solid red lines. From Fig. 9, we can notice that the ROC curves of (P_d, P_f) corresponding to the proposed method are higher than those of the comparative methods on the Texas Coast and SpecTIR datasets. Although the ROC curves of (P_d, P_f) of the proposed method are not always optimal on the remaining datasets relative to the comparative methods, their overall trend is leading compared with the others. In Fig. 10, it can be found that the ROC curves of (P_f, τ) of the proposed method are lower than those of the comparative methods on the Salinas and SpecTIR datasets. With respect to the San Diego and Pavia datasets, although the ROC curves of (P_f, τ) of the proposed method are slightly higher than those of RGAE within a specific range, their overall trend is lower than that of the RGAE. For the HYDICE and Texas Coast datasets, the proposed method has an acceptable variation trend in terms of the ROC curves of (P_f, τ) relative to the comparative methods. It is noteworthy that the ROC curves are plotted on a logarithmic scale for better visualization.
In addition, the separability maps of both the proposed method and the comparative methods over the experimental datasets are plotted in Fig. 11 to evaluate the capability of each detector to extract anomalies from the background. In Fig. 11, the anomaly and background for each detector are represented in red and blue, respectively. The upper and lower bounds of each box are the 10% and 90% levels of the corresponding detection result, and the middle line of the box is the mean of the detection result; the lines at the top and bottom for each detector are the maximum and minimum values. The interval between the two boxes (i.e., red box and blue box) represents the separability of the anomaly and background, and the height of the blue box reflects the background suppression ability of the detector. Compared with the comparative methods, the proposed method achieves a good separability between anomaly and background and an extremely outstanding background suppression effect on all datasets. Similarly, the separability effect of the LRASR is good on all datasets, while its background suppression is slightly inferior to that of the proposed method. The RX, CRD, PAB-DC, and NJCR achieve acceptable separability of background and anomaly on most datasets. For the LSDM-MoG, TPCA, RGAE, and GAED, the separability of background and anomaly is poor on most datasets. Even worse, the LSMAD only achieves a good separability of background and anomaly on the Pavia dataset, while its separability performance is extremely poor on the other datasets.
To evaluate the computational burden of both the proposed method and the comparative methods, Table IV lists the average computation time of these methods. In Table IV, for all experimental datasets, it can easily be noted that the RX has the minimum computation time, while the PAB-DC needs more time than the others to achieve HAD. For our method, the computation time is acceptable relative to the other methods except RX on all experimental datasets.

F. Discussion
In this section, the necessity of the strategies combining clustering, superpixels, and fused features is validated on the experimental datasets. Notably, when one of the strategies is analyzed, the remaining strategies are kept unchanged.

1) Clustering-Based Strategy:
If there is no clustering in the proposed method, the centers of all superpixels are sifted to act as atoms to construct the overcomplete dictionary. Obviously, in this case, some anomalous pixels involved in the overcomplete dictionary will affect the SR, allowing the anomalous pixel under test to achieve an excellent representation and thus leading to missed detections of anomalous pixels. To verify the above analysis, an experimental comparison with and without the clustering-based strategy is conducted on the experimental datasets, as shown in Fig. 12. By observing Fig. 12, for all datasets, it can be noticed that the proposed method with clustering performs better than the proposed method without clustering, which is consistent with the above analysis and indicates that the clustering-based strategy is necessary for the proposed method.
2) Superpixels-Based Strategy: In the proposed method, the use of superpixels has advantages in both detection performance and running time. To be specific, for one thing, employing the superpixels further purifies the overcomplete dictionary, enlarging the SR residual of the anomalous pixels and thus yielding better performance. For another, adopting the superpixels reduces the number of atoms in the overcomplete dictionary, thus shortening the running time of the SR. To prove the aforementioned analysis, we conducted a series of experiments on all datasets, as displayed in Fig. 13 and Table V. In Fig. 13, compared with the proposed method without superpixels, it can be noticed that the proposed method with superpixels performs quite well on all datasets, especially on the Salinas, San Diego, and SpecTIR datasets. By observing Table V, it can clearly be found that the proposed method with superpixels requires less running time than the proposed method without superpixels on all datasets. These experiments validate the analysis mentioned above, which proves the necessity of the superpixels-based strategy.
(Figure caption fragment: for the first three rows, the numbers in yellow located in the upper right corner are the AUC value of (P_d, P_f) and the AUC value of (P_f, τ), respectively.)
3) Fused-Feature-Based Strategy: To verify the necessity of fusing the SR-based and CR-based features, a group of experiments is carried out on the experimental datasets, and the corresponding features are exhibited in Fig. 14. It is worth noting that these experiments do not consider the impact of the postprocessing operation on the features. In Fig. 14, it can be seen that the fused features perform better (i.e., higher AUC values of (P_d, P_f) and lower AUC values of (P_f, τ)) than the SR-based or CR-based feature alone on most datasets, except for the Texas Coast and SpecTIR datasets. Although the AUC values of (P_d, P_f) corresponding to the fused features are only comparable to those of the single features there, the AUC values of (P_f, τ) remain at a low level, which proves that the fused feature is essential to boost the performance of the proposed method.

V. CONCLUSION
In this article, a novel HAD method integrating SR and CR is proposed. Concretely, an SR model whose overcomplete dictionary is constructed via density-based clustering and superpixel segmentation is first developed. Then, a CR model is presented, whose background dictionary is constructed from the atoms employed in the SR model instead of the conventional dual-window-based background dictionary construction strategy. To take full advantage of the residual features obtained from both the SR and CR, a feature fusion strategy with a nonlinear transformation function is exploited to yield the response map. Finally, a postprocessing operation with a guided filter is further exerted on the response map so as to preserve the contour information of the objects. To verify the effectiveness of the proposed method, six frequently used datasets, consisting of one simulated dataset and five real datasets, are employed in this article. The experimental results regarding both the component analysis and the detection performance demonstrate the effectiveness and superiority of the proposed method. In future work, we will transfer the idea of combining SR and CR to other HSI processing tasks, such as classification, band selection, and unmixing.