Dual Collaborative Constraints Regularized Low-Rank and Sparse Representation via Robust Dictionaries Construction for Hyperspectral Anomaly Detection

The low rank and sparse representation (LRSR) technique has attracted increasing attention for hyperspectral anomaly detection (HAD). Although a large quantity of research based on LRSR for HAD is proposed, the detection performance is still limited, due to the unsatisfactory dictionary construction and insufficient consideration of global and local characteristics. To tackle the above-mentioned concern, a novel HAD method, termed dual collaborative constraints regularized low-rank and sparse representation via robust dictionaries construction, is proposed in this article. Concretely, a robust dictionary construction strategy, which thoroughly excavates the potential of the density estimation model and local outlier factor, is proposed to yield pure and representative dictionary atoms. To fully exploit the global and local characteristics of hyperspectral images, dual collaborative constraints corresponding to the background and anomaly components are imposed on the LRSR model. Notably, two weighted matrices are further exerted on the representation coefficients to improve the effect of collaborative constraints, considering the fact that the surrounding pixels similar to the testing pixel should be given a large weight, otherwise the weight is expected to be small. In this way, the background and anomaly components can be well modeled. Additionally, a nonlinear transformation operation, which combines the output of the density estimation model and local outlier factor with the detection result derived from the LRSR model, is developed to suppress the background. The experiments conducted on one simulated dataset and three real datasets demonstrate the superiority of the proposed method compared with the four typical methods and four state-of-the-art methods.


I. INTRODUCTION
H YPERSPECTRAL images (HSIs) have been widely used in the field of band selection [1], [2], [3], image classification [4], [5], [6], hyperspectral pansharpening [7], [8], Manuscript [9], hyperspectral unmixing [10], [11], anomaly detection [12], [13], [14], [15], and target detection [16], [17], [18], owing to the rich spectral information. Hyperspectral anomaly detection (HAD), which aims to search for spectral signatures deviating from the background, has become a research hot topic due to its widespread application in military defense, maritime rescue, and mineral exploration. Compared with hyperspectral target detection, HAD brings more challenges because of the absence of prior information about the anomaly. In recent years, many researchers proposed various methods to detect anomalies in the HSIs. The most typical method is the RX detector [19], which is proposed by Reed and Yu. Also, it is regarded as the milestone of the statistic theory-based methods for HAD. The RX detector holds that the background conforms to the multivariate Gaussian distribution, and the mean value vector and covariance matrix of all pixels belonging to the HSI are estimated to calculate the Mahalanobis distance of each testing pixel. Different from the RX detector, which utilizes the global statistic characteristics, the Local RX [20] employs dual windows to acquire the local statistic characteristics for a better background model. Nevertheless, two-aspect challenges, which consist of the improper multivariate Gaussian assumption of background and the inaccurate background model in the interference of anomaly, exist in the aforementioned methods for the real hyperspectral scenes. To resolve the above-mentioned issues, various extensions of RX detector, containing subspace RX [21], kernel RX [22], random-selection-based anomaly detector [23], multiwindow RX [24], [25], and fractional Fourier entropy RX [12], are gradually proposed to promote the detection accuracy. However, these methods can hardly accurately model the background, with the interference of the anomalies and noise.
To model the background as accurately as possible, the representation-based methods, which can well represent the background pixels and poorly express the anomalous pixels, are widely used for HAD. The residual of the representation-based methods can be employed to measure the anomalies. In general, the larger the residual of a testing pixel, the higher the possibility of a testing pixel being an anomaly. The representationbased methods can be separated into three categories: sparse representation-based methods, collaborative representation (CR)-based methods, and low-rank representation-based methods, considering the different regularization constraints imposed This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ on the representation models. The sparse representation assumes that the background pixels can be linearly expressed by the atoms belonging to the overcomplete dictionary, whereas the representation effect for the anomalous pixels is terrible. The background joint sparse representation detector, which is a typical representative to explore the sparse model, models the background by selecting the representative atoms in the local regions [26]. Additionally, constructing a reliable overcomplete dictionary is the other key research direction for the sparse representation-based methods. The discriminative feature learning with multipledictionary sparse representation detector, proposed by Ma et al. [27], is an effective strategy to construct a reliable overcomplete dictionary for HAD. However, due to the lack of prior information, it is hard to completely prevent interference from anomalies, constructing an accurate overcomplete dictionary.
The classical CR detector (CRD), proposed by Li and Du [28], holds that background pixels can be linearly represented by the pixels within the dual windows, whereas the anomalous pixels could not. Unfortunately, the pixels located in the dual windows are vulnerable to being contaminated by the anomalous pixels, resulting in unsatisfactory detection performance. To alleviate this problem, a large number of variants, including dual CR [29], self-weighted CR [30], and outlier removal-based CR [31], are proposed to enhance detection performance. With regard to the CR-based methods, the local characteristics are considered in modeling the background, whereas the global attributes fail to be exploited, leading to limited detection performance.
In recent years, the low-rank representation-based methods play a vital role in HSI processing, such as denoising [32], [33], clustering [34], [35], classification [36], [37], and anomaly detection [38], [39]. Different from the CR-based methods, the low-rank representation-based methods assume that the background has a low-rank attribute and the anomalies are treated as sparse. The low-rank and sparse matrix decomposition (LSMAD) method proposed by Zhang et al. [40] is an early attempt to model the background with the low-rank property of background and sparsity of anomalies. Further, the low-rank and sparse decomposition method with a mixture of Gaussian (LSDM-MoG) [41] is proposed to better characterize data distribution. Furthermore, the background component is converted into the product of the background dictionary and the representation coefficient matrices, which has attracted increasing attention due to the great potential in the construction of the background dictionary. The low-rank and sparse representation (LRASR) method [42] can be regarded as a successful exploration for expressing the background part with the background dictionary and representation coefficient matrices. However, the purity of the background dictionary cannot be guaranteed for LRASR. To tackle this issue, Jiang et al. [43] proposed a low-rank embedded network to detect anomalies under the condition of the sample-free. Xie et al. [44] proposed a weakly supervised low-rank representation method for constructing a representative and discriminative background dictionary. To further effectively detect anomalies and lower the influence of noise, Huyan et al. [45] proposed a novel dictionary construction strategy, termed potential anomaly and background dictionary construction (PAB-DC), to describe the anomaly and background characteristics. Notably, the above low-rank representation-based methods only consider the global information, and the local attributes are ignored. To fully consider the local geometrical structure information and global property, Cheng and Wang [46] imposed the graph and total variation regularized terms on the low-rank representation model. In addition, Su et al. [47] proposed a low-rank and CRD (LRCRD), considering both the global attribute and local property. Although the global attribute and local property are considered in LRCRD, the collaborative constraint related to the anomaly component in the dual dictionaries-based low-rank and sparse representation model fails to be considered. Clearly, the anomalous pixels in HSIs can be represented by the potential anomaly dictionary, indicating that CR is suitable for the anomaly component. Besides, the purity and representativeness of dictionaries are vital to detection performance.
To achieve this goal, a novel HAD method, termed as dual collaborative constraints regularized low-rank and sparse representation via robust dictionaries construction, is proposed in this article, as illustrated in Fig. 1. First, a robust dictionaries construction strategy is proposed, jointly leveraging the density maps generated by density estimation model and local outlier factor (LOF). Dual collaborative constraints are then imposed on the low-rank and sparse representation model, in the comprehensive consideration of the global property and local attribute of the HSI. Furthermore, two weighted matrices are exerted on the representation coefficients considering the fact that different atoms should have different importance. Finally, a nonlinear transformation operation is introduced to perform the background suppression, expanding the difference between the background and anomaly. Compared with the existing HAD methods, the main contributions of our method are as follows.
1) A novel dual dictionaries construction strategy based on two-stream density maps is proposed in this article. With the dictionary construction strategy, the pure and representative pixels can be selected to act as the dictionary atoms. 2) The dual collaborative constraints, which consider both background and anomaly components, are imposed into the low-rank and sparse representation model. In this way, global and local information can be fully exploited. 3) To carry out the background suppression, a nonlinear transformation operation, in consideration of the twostream density maps, is developed to exert on the detection result generated by the low-rank and sparse representation model. The remainder of this article is organized as follows. Section II presents the related work. The proposed method is presented in Section III. Section IV introduces the experiments. Finally, the conclusions are outlined in Section V.

II. RELATED WORK
This section is to introduce the classical models utilized in this article. The detailed description of these classical models, consisting of low rank representation (LRR) model and CR model, is as follows.

A. Low-Rank and Sparse Representation Model
In HSIs, the background is regarded as low rank, whereas the anomalies are sparse [42]. Inevitably, noise with stochastic or deterministic characteristics should be considered due to its corruption for HSIs [40]. Based on the above-mentioned analysis, the HSI Y can be decomposed into three components, i.e., the low-rank component B, sparse component A, and noise component N, which are expressed as follows: Considering the fact that there is a strong correlation between background pixels [42] in HSIs, indicating that each of them can be linearly represented by some representative background pixels. Similarly, each anomalous pixel can be expressed by the potential prior of anomalies in HSIs [45]. In this way, the (1) can be converted into the following equation: where D B W Y and D A S Y separately symbolize the low-rank term and sparse term. D B and D A stand for the background and potential anomaly dictionaries, respectively. W Y and S Y refer to the representation coefficient matrices corresponding to the low-rank term and sparse term, respectively.

B. Collaborative Representation Model
The core idea of CR model is that the background pixels can be represented by the surrounding pixels, whereas the anomalous pixels cannot [28]. Let x denote the testing pixel, D S represent the background dictionary, and ϕ indicate the weight vector. The goal of the CR is to determine the weight vector ϕ by minimizing both x − D S ϕ 2 2 and ϕ 2 2 . As a result, the objective function can be described as where υ indicates the regularization parameter. Generally speaking, the surrounding pixels that are similar to the testing pixel should be given a large weight, whereas the surrounding pixels which are different from the testing pixel should have a smaller weight. Accordingly, the distance-weighted-based optimization problem becomes 2 2 , . . . , x − d s 2 ) indicates the diagonal regularization matrix, each item of which denotes the diagonal entry, and diag( · ) represents the operation to construct a diagonal matrix. d 1 , d 2 , . . . , d S are the columns of D S .

III. PROPOSED METHOD
Generally speaking, the 3-D HSI cubic can be converted into a 2-D matrix denoted as X ∈ R d×N , in which d and N signify the number of spectral bands and pixels in an HSI, respectively. For an HSI, the background and anomalies are separately deemed as low rank and sparse [42]. To distinguish the background and anomaly, the dual dictionaries-based representation method is adopted, which can be expressed as X = X B W + X A S + E, where X B W ∈ R d×N , X A S ∈ R d×N , and E ∈ R d×N separately symbolize the low-rank term, sparse term, and noise term. X B ∈ R d×m and X A ∈ R d×n separately stand for the background and potential anomaly dictionaries, where m and n separately represent the number of atoms in background dictionary and potential anomaly dictionary. W ∈ R m×N and S ∈ R n×N refer to the representation coefficient matrices corresponding to the low-rank term and sparse term, respectively.
In an HSI, the CR theory is widely used to characterize the background [28], [30]. With this situation, to exploit the global low-rank and local collaborative attributes, Su et al. [47] regularized the low-rank representation model with l 2 norm of each column of the representation coefficient matrix corresponding to the low-rank component. Motivated by this, dual collaborative constraints, which are composed of the l 2 norm of each column of representation coefficient matrices belonging to the background and anomaly components, are imposed on the low-rank and sparse representation model to act as the regularization terms. In this case, the optimization objective function in the consideration of dual collaborative constraints can be reformulated as where α > 0, β > 0, λ > 0, and γ > 0 stand for the regularization coefficients to tradeoff each term. w i and s i refer to the ith column of W and S, respectively. Note that (6). Therefore, the optimization objective function can be converted into In general, the CR-based methods assume that the surrounding pixels that are similar to the center pixel should be given a large weight, whereas the surrounding pixels which are different from the center pixel should have a smaller weight [28]. To this end, two weighted matrices are imposed on the background and anomaly representation coefficients to adjust the contribution of each atom in the dictionary. Therefore, (7) can be written as represent the weighted matrices corresponding to background and anomaly terms, respectively, in which w * (i,j) indicates the Euclidean distance between the ith dictionary atom and the jth pixel of HSI. • denotes the Hadamard product.

A. Density Estimation via Gaussian Mixture Components
In the existing HAD task, the HSIs are generally considered to follow a single multivariate Gaussian distribution [19], [22]. Apparently, this assumption is not applicable to the real scene due to the highly complicated characteristics of the background of HSIs [43]. As a result, we consider the mixture of Gaussian distribution to characterize the HSIs. There are twofold effects for Gaussian mixture model (GMM), which can be not only for density estimation but also for clustering. A detailed description of GMM-based density estimation is as follows, and the GMM-based clustering is recounted in Section III-C.
To implement GMM-based density estimation, a density estimation model, which contains n H fully connected layers, is established, as illustrated in Fig. 2. Clearly, there is no direct relation between the density estimation model and GMM. To cope with this concern, the output of the density estimation model is set to the number of Gaussian mixture components. In this way, the density estimation model can be closely combined with GMM.
Let f (x) represent the probability density function (pdf), which can be expressed as where μ and Σ are the mean vector and covariance matrix, respectively. | · | indicates the determinant of a matrix. Clearly, f (x) can be determined by μ and Σ. Therefore, we termed it as f (x|μ, Σ). In this case, the pdf of GMM can be defined as where ω q > 0 denotes the coefficient to balance the Gaussian components, and the sum of all coefficients is 1. μ q and Σ q represent the mean vector and covariance matrix corresponding to the qth Gaussian mixture component, respectively. n M signifies the number of Gaussian mixture components. It is worth noting that ω q , μ q , and Σ q are unknown, indicating that these parameters should be estimated. Generally speaking, the parameters of pdf can be estimated by the maximum-likelihood estimation, which can be formulated as follows: To train the density estimation model, the maximumlikelihood function of GMM is employed to act as the loss function. However, the maximum-likelihood function is to calculate the maximum value, whereas the loss function of the density estimation model should be minimized. To this end, the inverse of the maximum-likelihood function of GMM is adopted to act as the loss function to learn the density estimation model. The detailed loss function can be formulated as where o i = softmax(N dem (x i ; θ dem )) signifies the output of the density estimation model corresponding to the ith training sample, in which N dem represents the mapping operation of the density estimation model, and θ dem symbolizes the parameters of the density estimation model. b indicates the batch size of training samples. Once the training process finished, the HSI can be fed into the density estimation model to obtain the output, which can be employed to calculate the parameters of GMM by (13)- (15).
With this condition, we can acquire the density map through utilizing pdf under the estimated parameters. Nevertheless, there is low density value for the position corresponding to anomalous pixels, which is contrary to our original intension. To tackle this concern, the negative log pdf is employed to act as the indicator for obtaining the density map, which can be formulated as where d i m1 denotes the density value of the ith pixel in HSI. When all pixels of an HSI are considered, a 2-D density map D m1 can be obtained.

B. Density Evaluation Based on Local Outlier Factor
LOF [48], which is first proposed for outlier detection, is an effective strategy to evaluate the anomalous degree of each sample. Clearly, this technique can serve as a performance indicator to determine the probability that each sample belongs to an anomaly. In addition, the LOF can also be regarded as a kind of density evaluation without any distribution assumptions. Based on these aspects, the LOF can be employed to evaluate the density of each pixel in HSI. Taking the testing pixel x t in HSI as an example, the detailed description of LOF can be expressed as follows.
1) Compute the Euclidean distance of testing pixel x t and other pixels and define the kth nearest pixel of x t as kdistance. 2) Set the collection to the pixels whose distance from x t is less than k as N k (x t ).

3) Define the reachability distance of
, which can be defined as Compute the LOF of x t with the following formula: . (19) In the above case, the pixel whose LOF value approaches 1 is more likely to be the same class as k-nearest neighbors. In addition, the pixel whose LOF value is much greater than 1 will be reckoned as an anomaly, owing to the sparse characteristics of the anomalies in HSI. Through the above way, the anomaly degree of each pixel in HSI can be calculated. Eventually, the 2-D density map D m2 is formed. Fig. 3. Example to illustrate the process of anomaly judgment. Here, the points with the same color represent the same cluster, and each cluster obeys the Gaussian distribution.

C. Dual Collaborative Constraints Regularized Low-Rank and Sparse Representation 1) Dual Dictionaries Construction:
In the field of HAD, the GMM-based density estimation approach or LOF-based density evaluation method is generally used to detect anomalies [49], [50], whereas there is a limited capability for each of them to measure the anomalous degree of testing pixels in HSIs. As a result, we jointly consider both GMM and LOF to estimate the anomalous degree of each pixel. Concretely, the elementwise product of D m1 and D m2 is employed to search for the most representative anomalies. In this way, the anomaly degree of anomalous pixels will be highlighted, whereas that of background pixels will be suppressed. Similarly, the addition operation pixel by pixel of D m1 and D m2 is executed to reinforce the potential anomalies. The above description can be expressed as As stated in Section III-A, the Gaussian mixture distribution can be used to cluster the HSIs, which can be expressed as where c i ∈ {1, 2, 3, . . . , n M } indicates the cluster to which the ith sample belongs to. argmax(·) refers to the index of the maximum value. o ik stands for the kth output of the density estimation model for the ith sample. With this circumstance, each cluster can be deemed to obey a Gaussian distribution. In terms of the Gaussian distribution, normal samples fall into the interval of 3σ principle, i.e., (μ 0 − 3 × σ, μ 0 + 3 × σ), whereas the outlier samples are outside of the aforementioned range, where μ 0 and σ are the mean value and standard deviation of the samples, respectively. For better comprehension, an example is exhibited in Fig. 3. Based on the above-mentioned analysis, the output can be represented by "0" and "1," corresponding to the normal samples (i.e., background) and outlier samples (i.e., anomaly), respectively. Nevertheless, it is hard to directly adopt the intensity values of pixels in an HSI to calculate μ 0 and σ, due to the fact that there is a wealth of bands for each pixel in an HSI, so we utilize the value of D 1 and D 2 to act as the intensity value of the HSI. Through the aforementioned way, two binary maps, i.e., B 1 and B 2 , are obtained.
To construct the background and potential anomaly dictionaries as accurately as possible, B 1 and B 2 are jointly combined to assist in selecting the atoms of background and potential anomaly dictionaries. Similar to D 1 and D 2 , the elementwise product of B 1 and B 2 acts as the indicator to sift the potential anomaly atoms, and the sum of B 1 and B 2 is used to be the director to choose the background atoms. Notably, to reduce the computation complexity, the principal component analysis is first utilized to reduce the dimensionality of the HSI, and then the simple linear iterative clustering (SLIC) [51] is performed on the first three principal components to obtain the superpixels. Finally, the background atoms can be acquired through the superpixels and the sum of B 1 and B 2 , whereas the potential anomaly atoms can be obtained via elementwise product of B 1 and B 2 . The details can be formulated as where X A and X B represent the collection of potential anomaly pixels and background pixels, respectively.  (8), our goal is to solve the optimization problem in the consideration of the dual collaborative constraints, with the constructed background dictionary X B and potential anomaly dictionary X A . To tackle the aforementioned low-rank and sparse representation problem, four auxiliary variables P, Q, R, and V are introduced to make the objective function separable. Consequently, the problem (8) can be converted into the following form: min W,S,E,P,Q,R,V The augmented Lagrange function of problem (25) is where Y 1 , Y 2 , Y 3 , Y 4 , and Y 5 are the Lagrange multipliers and η is a penalty parameter. tr [·] represents the trace of the matrix. The problem (26) can be solved with the alternating direction method of multipliers (ADMM) method [52], of which the core idea of this is to fix other variables while updating one variable.
1) Update P while fixing other variables. The objective function can be derived as 2) Update Q while fixing other variables. The objective function can be derived as 3) Update E while fixing other variables. The objective function can be derived as 5) Update V while fixing other variables. The objective function can be derived as 6) Update W while fixing other variables. The objective function can be derived as 7) Update S while fixing other variables. The objective function can be derived as The solution of the optimization process is outlined in Algorithm 1. Note that Φ, Ω, and Θ used in Algorithm 1 separately
Once the optimization process is terminated, the detection result D 3 can be determined by the sparse component X A S as where [·] :,i denotes the ith column of a matrix.
To suppress the background interference in the detection result, a nonlinear transformation, which considers both D 1 and D 2 , is performed on D 3 for better detection performance. The nonlinear transformation can be formulated as where D F denotes the final detection result. τ 1 > 0 and τ 2 > 0 indicate the parameters to adjust the shape of nonlinear transformation.

IV. EXPERIMENTS
In this section, we conducted a flurry of experiments to validate the effectiveness and superiority of the proposed method.

A. Datasets and Evaluation Metrics 1) Data Description:
One simulated dataset and three real datasets, acquired by three different sensors, are used to evaluate the performance of the proposed method. The anomalies existed in the datasets appear in different forms, and the details are described in the following.
Salinas Dataset [56]: The first dataset, generated by exerting the target implantation method [57] on a real dataset captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensors, covering the area of Salinas Valley, CA, USA, is the simulated dataset. There are 204 spectral bands in total for the dataset, and each of them contains 120×120 pixels. The spatial resolution of the dataset is 3.7 m. The pseudocolor image and corresponding reference map are displayed in Fig. 4(I-a) and (I-b), respectively.
Pavia Dataset [40]: The second dataset, collected by the Reflective Optics System Imaging Spectrometer (ROSIS-03) sensor, covers the area of the center of Pavia city located in northern Italy. The number of spectral bands is 102 for the dataset, ranging from 0.43 to 0.86 μm. There are 100×100 pixels and a 1.3-m spatial resolution for the dataset. The pseudocolor image and corresponding reference map are separately shown in Fig. 4(II-a) and (II-b), respectively.
HYDICE Dataset [58]: The third dataset was obtained by the Hyperspectral Digital Imagery Collection Experiment (HY-DICE) airborne sensor, covering an urban area of CA, USA. There are 80×100 spatial pixels and 175 spectral bands ranging from 0.4 to 2.5 μm for the dataset, and its spatial resolution is 1 m. The pseudocolor image and corresponding reference map are exhibited in Fig. 4(III-a) and (III-b), respectively.
Gulfport Dataset [47]: The fourth dataset with a size of 100×100×191 was acquired by the AVIRIS sensor at Gulfport, with a spatial resolution of 3.4 m. The pseudocolor image and corresponding reference map are separately illustrated in Fig. 4(IV-a) and (IV-b).

2) Evaluation Metrics:
To evaluate the performance of the proposed method, three commonly utilized evaluation metrics, containing the receiver operating characteristic (ROC) curve [59], the area under the curve (AUC) [60], and the separability map [61], are employed. The ROC curve depicts the relationship between the true positive rate (P D ) and false positive rate (P F ) at different thresholds. The commonly used ROC curves are (P D , P F ) and (P F , τ), which are separately employed to characterize the detection rate and false alarm rate. For the ROC curve of (P D , P F ), the higher the curve, the better the performance. Inversely, the low ROC curve of (P F , τ) means a low false alarm rate, indicating superior performance. Correspondingly, two kinds of AUC scores are adopted, which can be expressed as AUC score of (P D , P F ) and AUC score of (P F , τ), respectively. A detector whose AUC score of (P D , P F ) closely approaches to 1 is viewed as an excellent one. Conversely, the AUC score of (P F , τ) for a superior detector should be close to 0. The separability map, which is generated from the boxplot, characterizes the ability of the detector to extract anomalies from background. The better the detector, the larger the separation distance between the background and anomalies.

B. Experimental Setup 1) Implementation Details:
The running environment of our method is Python 3.6.13, Pytorch-GPU 1.8.2, and CUDA 10.2, which are executed on an Intel Core i7-9700T CPU 2.00GHz machine with 16GB of RAM and a GeForce RTX 2080Ti graphics card with 11GB memory. Other compared methods are carried out on a running environment with MATLAB R2017b except for PAB-DC, which is conducted on a running environment with Python 3.6.13. We optimize the density estimation model via the Adagrad [62] optimizer with a learning rate of 0.0001, and the number of iterations is 1000. The number of hidden nodes is empirically set to 128. The batch size of the density estimation model is set to N, which is the number of total pixels in HSI. For the proposed method, we empirically set the parameters to fixed values, which are expressed as n H = 1, n M = 8, k = 20, n s = 400, α = 0.1, β = 0.1, λ = 0.1, γ = 0.1, τ 1 = 1, and τ 2 = 1, respectively. In our experiments, the above parameters are configured by default, and the optimal detection results can be obtained by tuning these parameters by the user.

C. Parameter Sensitivity Analysis
This section aims to analyze the performance of the proposed method with different parameters. There are ten parameters to be considered, which are separately the number of hidden layers n H , the number of Gaussian mixture components n M , the number of k-nearest neighbors k, the number of superpixels n s , the regularization coefficients α, β, λ, γ, and transformation coefficients τ 1 and τ 2 . In our experiments, we fix the other parameters when a certain parameter is analyzed, and the other parameters are set to the preconfigured values, as stated in Section IV-B1. The details of parameter analysis are illustrated in Fig. 5, in which the abscissa represents the range of the parameters and the ordinate indicates the AUC scores of (P D , P F ).

1) Number of Hidden Layers n H :
To analyze the influence of the number of hidden layers in the density estimation model on the detection performance, a group of numbers, ranging from 1 to 5 at an interval of 1, are set, as displayed in Fig. 5(a). By observing Fig. 5(a), we can notice that the proposed method performs well when n H is equivalent to 1 or 5 on all experimental datasets. Clearly, the computation burden of the density estimation model with five hidden layers is higher than that of the density estimation model with one hidden layer. Therefore, it is considered to be very wise to set the number of hidden layers of the density estimation model as 1.
2) Number of Gaussian Mixture Components n M : Fig. 5(b) shows the effect of n M with different values on the detection performance. Concretely, n M spans from 1 to 10 with an interval of 1. For the Gaussian mixture components, a proper parameter n M is helpful to characterize the background of HSI and isolate the anomalies. Through Fig. 5(b), we can find that the proposed method has an obvious fluctuation with the change of n M for Pavia dataset, while it keeps stable on the other datasets. Notably, the proposed method can achieve excellent results when n M = 8 for all experimental datasets.
3) Number of K-Nearest Neighbors k: For the number of k-nearest neighbors k, if k is relatively small, the central region of the anomalies occupying a large area will reside in a low-density area. Inversely, the computation burden is quite high when k is particularly large. Therefore, the value of k is set as {10, 20, 30, 40, 50}, as illustrated in Fig. 5(c). From Fig. 5(c), we can see that the detector can achieve satisfactory detection performance when k = 20.

4) Number of Superpixels n s :
The effect of the number of superpixels on the detection performance is analyzed, as exhibited in Fig. 5(d). For the detector, fewer superpixels mean that fewer pixels participate in the construction of the background dictionary, resulting in low detection performance. Contrarily, the more superpixels will lead to heavy calculation consumption. To this end, we set n s as {200, 300, 400, 500, 600} to tradeoff the detection performance and calculation burden. By observing Fig. 5(d), it is can be found that the detection performance of the proposed method maintains stable with the variation of n s for the HYDICE dataset. Different from the above, there is a certain degree of change for the remaining three datasets, especially for the Gulfport dataset. Evidently, it is intuitively plausible to achieve the excellent detection performance when n s is equal to 400.
5) Regularization Coefficients α, β, λ, and γ: For the regularization coefficients, all of them are configured as {10 −4 , 10 −3 , 10 −2 , 10 −1 , 10 0 }, and the corresponding detection results under different values are displayed in Fig. 5(e)-(h), respectively. As a whole, the detector has an obvious change in detection performance for Pavia dataset compared with that of other datasets over all regularization coefficients. Through Fig. 5(e)-(h), we can find that the detector can obtain gratifying detection performance when the overall regularization coefficients are equivalent to 10 −1 , 10 −1 , 10 −1 , and 10 −1 , respectively. 6) Transformation Coefficients τ 1 and τ 2 : With regard to the transformation coefficients, both of them are configured as {1, 5, 10, 15, 20, 50}, considering the fact that the high response values in (38) are expected to be reinforced, while the corresponding low response values should be suppressed. Taking the Pavia dataset as an example, the detection results plotted in a 3-D manner are visualized in Fig. 5(i). By observing Fig. 5(i), it can be found that the detection performance remains stable and at a high level. Notably, the detection performance achieves the optimal result when τ 1 = 1 and τ 2 = 5. To obtain the optimal performance, a great quantity of experiments is executed on all experimental datasets, and the corresponding parameter settings are listed in Table I.

1) Effectiveness Evaluation of Dual Dictionaries Construction Strategy:
To validate the effect of dual dictionaries construction strategy based on two-stream density maps, the proposed method is compared with the typical dual dictionariesbased method (i.e., PAB-DC), as displayed in Fig. 6. Here, the postprocessing operation, i.e., the nonlinear transformation in (39), is ignored in the proposed method for a fair comparison. By observing Fig. 6, we can notice that the detection performance of the proposed method outperforms that of the typical method on all experimental datasets. Concretely, the detection performance of the proposed method is much higher than that of the typical method on Salinas, Pavia, and Gulfport datasets. For the HYDICE dataset, the improvement is limited for the proposed method relative to PAB-DC, owing to the fact that the detection performance of both detectors is quite close to 1. Furthermore to clearly show the superiority of the proposed method, the background and potential anomaly dictionaries belonging to the proposed method and PAB-DC are visualized in    7. Compared with the background dictionary constructed in PAB-DC, that constructed in the proposed method is hardly contaminated by the anomalous pixels. Similarly, the interference degree of background pixels to potential anomaly dictionaries in the proposed method is much lower than that of PAB-DC. This means that the proposed method can acquire more pure and representative dictionary atoms relative to those of PAB-DC, achieving more excellent detection performance. Note that "wo/CC" represents the proposed method without collaborative constraints, "w/BCC" indicates the proposed method with the background collaborative constraint, and "w/DCC" signifies the proposed method with dual collaborative constraints.

2) Effectiveness Evaluation of Dual Collaborative Constraints:
To verify the effect of the dual collaborative constraints imposed on the low-rank and sparse representation model, three aspects, containing the proposed method without collaborative constraints, the proposed method with background collaborative constraint, and the proposed method with dual collaborative constraints, are considered, as plotted in Fig. 8. Similar to Section IV-D1, the nonlinear transformation is also omitted for fully excavating the effect of the dual collaborative constraints. As we can see in Fig. 8, the detection performance of the proposed method far exceeds that of other aspects of Salinas, Pavia, and Gulfport datasets. Clearly, the improvement with regard to the detection performance is not obvious for the HYDICE dataset, which is mainly because the detection performance of the proposed method is nearly close to the limitation. Taking the Salinas dataset for example, the detection maps separately corresponding to the three comparative components are visualized, as illustrated in Fig. 9. Through Fig. 9, it can be found that the detection performance of the proposed method with dual collaborative constraints is remarkably higher than those of other types, indicating the effectiveness of the dual collaborative constraints.

E. Detection Performance
We employ detection maps and three evaluation metrics (i.e., ROC curve of (P D , P F ) and ROC curve of (P F , τ), AUC score of (P D , P F ) and AUC score of (P F , τ), and separability maps) to assess the performance of the proposed method. Fig. 10 displays the detection maps of the proposed method and compared methods. The ROC curves of (P D , P F ) and (P F , τ) with respect to the proposed method and compared methods are separately shown in Figs. 11 and 12. Correspondingly, the AUC scores of (P D , P F ) and (P F , τ) are exhibited in Tables II  and III, respectively. In addition, the separability maps between the proposed method and compared methods are illustrated in Fig. 13. It is worth noting that the bold numbers and underlined numbers indicate the best result and worst result, respectively.
For the Salinas dataset, there is the highest AUC score of (P D , P F ) (i.e., 0.9982) and the lowest AUC score of (P F , τ) (i.e., 0.0003) for the proposed method relative to the compared methods. The KIFD achieves the first rank over AUC score of (P D , P F ) among compared methods, whereas superiority over the AUC score of (P F , τ) is not conspicuous enough for KIFD. Notably, the LSDM-MoG has the lowest AUC score of (P D , P F ), which is much lower than the proposed method, indicating that the localization capability of LSDM-MoG is insufficient for the simulated dataset. In terms of PAB-DC, AUC score of (P F , τ) is the highest among all methods, which means a very high false alarm rate, and the localization ability is unsatisfactory. Similarly, the effect of RX and LRASR is poor for locating anomalies. Additionally, the CRD, LSMAD, and MLW_ LRRSTO can identify anomalies with a high detection rate and a low false alarm rate.
With regard to Pavia dataset, as illustrated in Fig. 10(II-i), we notice that the proposed method can identify the total anomalies, preserving low false alarms. The RX, LRASR, LSMAD, and MLW_LRRSTO are second to the proposed method in terms of the AUC score of (P D , P F ), which are 0.9887, 0.9889, 0.9842, and 0.9886, listed in Table II, respectively. Though the anomalies are well located for CRD, the relatively high false alarm rate results in lower detection accuracy. Notably, PAB-DC and KIFD fail to identify the anomalies comprehensively due to the very high false alarm rate, which are up to 0.3009 and 0.2360, respectively.
For HYDICE dataset, the proposed method still outperforms compared methods both on AUC score of (P D , P F ) and AUC score of (P F , τ), especially for LRASR. The CRD, MLW_ LRRSTO, PAB-DC, and KIFD are slightly lower than the proposed method over the AUC score of (P D , P F ), which are separately 0.9951, 0.9960, 0.9955, and 0.9966. At the same time, the false alarm rate of CRD, MLW_LRRSTO, and PAB-DC is satisfactory relative to KIFD. Besides, it is acceptable in the consideration of both AUC score of (P D , P F ) and AUC score of (P F , τ) for RX and LSMAD.
The detection map of the proposed method on Gulfport dataset is displayed in Fig. 10(IV-i). By observing Fig. 10(IV-i), we can notice that the anomalies (i.e., three airplanes) are identified to a large extent. Though two small airplanes are not very clear from a visual point of view, the detection accuracy (i.e., AUC score of (P D , P F )) is very high in comparison with other methods due to the low false alarm rate. In terms of visual effects, the anomalies are submerged into the background for LRASR, PAB-DC, and KIFD owing to the strong false alarm rate. Remarkably, the AUC      13. Separability maps over compared methods on (a) Salinas, (b) Pavia, (c) HYDICE, and (d) Gulfport. Here, a-"RX," b-"CRD," c-"LRASR," d-"LSMAD," e-"MLW_LRRSTO," f-"PAB-DC," g-"LSDM-MoG," h-"KIFD," and i-"Proposed." scores of (P D , P F ) on LRASR and PAB-DC are much lower than that of KIFD, which is slightly lower than the proposed method. The LSMAD, MLW_ LRRSTO, and LSDM-MoG can well detect the anomalies, while the high false alarm rate leads to a low detection rate. In addition, the RX and CRD achieve encouraging false alarm rates, whereas the detection rate of CRD is poor relative to other methods except for LRASR.
To prove the superiority of the proposed method, both ROC curves (i.e., ROC curve of (P D , P F ) and ROC curve of (P F , τ)) are utilized to judge the performance, as illustrated in Figs. 11 and 12, respectively. By observing the ROC curves of (P D , P F ) in Fig. 11, we can find that the ROC curve of (P D , P F ) of the proposed method is higher than that of the compared methods on HYDICE dataset, indicating that the detection performance of the proposed method is optimal on HYDICE dataset. Additionally, the ROC curves of (P D , P F ) of the proposed method lie nearer the upper left corner relative to the compared methods on Salinas, Pavia, and Gulfport datasets when the false positive rates are greater than 10 −2 , which means that the proposed method performs well on these three datasets when the false positive rates are in the range of [10 −2 , 10 0 ]. Although the performance of the ROC curves of (P D , P F ) of the proposed method is not optimal on the aforementioned three datasets when the false positive rates are in the range of [10 −3 , 10 −2 ], the area under the ROC curves of (P D , P F ) listed in Table II is optimal. In summary, as long as the overall performance is excellent, the local behavior is slightly suboptimal and can be acceptable. With regard to the ROC curve of (P F , τ), all compared methods are strongly higher than the proposed method, meaning that the false alarm rate of the proposed method is completely lower than that of the others. Notably, both ROC curve of (P D , P F ) and ROC curve of (P F , τ) are displayed in a logarithmic-scale way for better visualization.
The separability maps on experimental datasets are further adopted to evaluate the effect of extracting anomalies from background, as shown in Fig. 13. In Fig. 13, two-column boxplots corresponding to each detector separately represent the anomaly (in red) and background (in blue). The bottom bound and upper bound of the box are separately 10% and 90% of the statistical interval of the detection results, respectively, and the line in the middle of the box is the median of the statistical interval. The lines of top and bottom for each column are the extremums, which are separately the maximum and minimum values of the statistical interval, while the 0%∼10% and 90%∼100% intervals are represented by dotted lines. The interval between the red box and the blue box indicates the separability degree of the anomalies and background. The height of the blue box represents the suppression effect for the background. For Salinas dataset, there are overlaps between the red box and blue box to some degree for RX, LRASR, PAB-DC, and LSDM-MoG, indicating that their separability degree is limited. The rest of compared methods and the proposed method perform well in separating the background and anomaly, whereas the background suppression capability of the proposed method is superior to the rest of compared methods, especially for KIFD. Similarly, for Pavia dataset, we can notice that the overlaps with regard to the red box and blue box are obvious for PAB-DC, LSDM-MoG, and KIFD, which means that the separability performance is unsatisfactory. Although the separability of the other compared methods is slightly better than the proposed method, the background suppression effect of the proposed method evidently outperforms all compared methods. Likewise, for HYDICE dataset, we can find that there is a nice tradeoff between the separability (i.e., the ability to extract the anomaly from background) and background suppression in comparison with the compared methods. It is worth noting that the proposed method still performs well for Gulfport dataset, while the separability is poor for most compared methods except for KIFD. In summary, on experimental datasets, the proposed method has the best background suppression capability and acceptable separability between the anomaly and background relative to the compared methods.
Furthermore, the running time of all methods is listed in Table IV to evaluate the computational burden. Notably, the bold numbers and underlined numbers represent the best result and worst result, respectively. The RX achieves the first-rank realtime performance in comparison with other methods. Although the proposed method is more time-consuming relative to most methods, the running time of the proposed method is less than that of PAB-DC, which is the typical dual dictionaries-based low-rank and sparse representation method, indicating that the proposed methods can be acceptable in the consideration of the detection performance and running time.

V. CONCLUSION
In this article, a novel HAD method, which imposes dual collaborative constraints on low-rank and sparse representation models, in conjunction with the robust dictionary construction, is proposed. Concretely, to thoroughly employ the global and local information in the HSI, dual collaborative constraints act as the regularization terms to model the background and anomaly components. Considering the fact that surrounding pixels with high similarity relative to the pixel under test should be given high weight, two weighted coefficient matrices corresponding to background and anomaly components are further introduced. To acquire better detection performance, a robust dictionary construction strategy, which fully exploits the density maps generated by the density estimation model and LOF, is proposed, yielding a pure and representative background and potential anomaly dictionaries. To validate the effectiveness and superiority of the proposed method, the experiments are carried out on four datasets consisting of one simulated dataset and three real datasets with different sensors. The effectiveness of the proposed method is verified by the component analysis of experimental datasets. Similarly, the superiority of the proposed method is also proved by comparing it with the four typical methods and four state-of-the-art methods. In future work, we will think about how to reduce the complexity of the proposed model, reducing the running time.