Hyperspectral Anomaly Detection Method Based on Adaptive Background Extraction

Clustering-based anomaly detection is a classic approach that offers a simplified way to describe a cluttered background. However, traditional clustering methods need to know the number of clusters in advance and attempt to classify all the background pixels at one time. In addition, compared with large background clusters, small clusters are hard to discriminate because of their small populations. In this paper, an anomaly detection method based on adaptive background extraction is proposed. We apply an unsupervised clustering method that determines the cluster centers according only to the similarity of the spectral signatures. To reduce the influence of cluster population, we extract background clusters iteratively. In every iteration, we cluster only the larger clusters and extract them from the data set; in the next iteration, the remaining pixels are clustered again. Without interference from the larger clusters, the centers of smaller clusters become apparent. The clustering process stops when the proportion of remaining pixels approaches the appearance probability of anomalies (generally approximately 10%~20%). At that point, only anomalies and a few background pixels remain to be tested. Finally, every extracted background cluster acts as a viewer that measures the anomaly salience of the test pixels, and a weighted summation is proposed to fuse the salience values from the different viewers. Simulation experiments on two sets of real data are presented to demonstrate the superiority of the proposed method.


I. INTRODUCTION
Hyperspectral technology provides almost continuous spectral information of the ground surface and makes it possible to detect and extract features and targets on the ground [1], [2]. An anomaly is a kind of target whose spectral signature differs markedly from that of the surrounding background. Without atmospheric correction or radiometric calibration, anomaly detection discriminates the target or area of interest based on its distinctive spectral characteristics [3]. As an unsupervised target detection technology, anomaly detectors have been widely applied in environmental protection, precision agriculture, geological exploration, national defense security, and search and rescue in public accidents [4]-[6].
Background model-based anomaly detection has been the most popular method for about thirty years, and it is still a major and effective solution for anomaly detection today. The basic idea is that an anomaly can be separated if the background, or non-anomaly, can be described [7]. Accurate background description has therefore become the key point in these methods. The Gaussian model and the multiple Gaussian mixture model are the typical background description models because of their ease of calculation.
The associate editor coordinating the review of this manuscript and approving it for publication was Prakasam Periasamy.
In 1990, Reed and Yu proposed the local model-based anomaly detector RXAD, which describes the background with a dual window [8]. RXAD constructs a local background model based on the statistical features of an external window and then measures the anomaly saliency of the test pixel in an internal window. Local background modeling provides a simplified method for background description. However, it is easily trapped in a local optimum. In addition, the size of the sliding dual window is hard to choose: too large a window induces contamination from anomalies, and too small a window cannot ensure the accuracy of the background description. For this reason, different local background models were proposed to decrease the influence of the window size, such as the single local area [9], [10] and the multi-local area (MSAD) [11].
Compared with the local model, the global model aims to obtain a homogeneous description of the background [12], [13]. The global model constructs a uniform model based on the features of the whole background. However, the diversity of ground objects in the background increases the difficulty of the global background description. Many solutions have been proposed to reduce the complexity of the global model, such as the multivariate normal-based anomaly detector (MVN) [14], the cluster-based anomaly detector (CBAD) [15], [16] and the subspace model-based anomaly detector (SSM) [11]. The MVN hypothesizes that the background consists of different materials and that the spectral character of every material obeys a Gaussian distribution. The cluster-based background description is a compromise method, which can effectively avoid the contamination problem and still obtain a global background model. CBAD divides the background into multiple clusters so that the spectral signature of every cluster is simple, and the whole background is characterized by a Gaussian mixture model. However, CBAD needs to know the number of clusters in advance, and it is hard to know how many kinds of objects exist in a cluttered background. Another kind of model simplification depends on spatial transformation, such as nonlinear kernel function mapping. For example, Kernel-RXAD simplifies background features by a nonlinear kernel transform [17], [18], so that the cluttered background can be described by a Gaussian distribution in the high-dimensional feature space. However, choosing an appropriate scale parameter is a bottleneck in these methods: a small scale induces discontinuity, and a large scale causes over-smoothing. The random selection anomaly detector (RSAD) [19] constructs multiple background models and calculates the anomaly saliency of test pixels with each of them; the different detection results are then fused to obtain the final result.
A cluster kernel RX algorithm (CKRX) [20] has been proposed as a generalization of kernel RX (KRX). It introduces a clustering step to compute the kernel distance and requires fewer computations than the traditional Kernel-RXAD.
With an accurate background model, anomalies are easy to separate because of their obvious spectral difference. However, pure background pixels are difficult to extract because of the limited spatial resolution and the complex spectra of ground objects. In addition, a statistical model cannot fully describe a cluttered background. Several remedies have therefore been proposed to improve background model-based detectors. A background purification method has been proposed to obtain a purified background set, which improves the performance of LRXD, GRXD and KRXD [21]. Another remedy estimates a weight matrix from the detection result of any AD method; a large weight value means a high probability of being an anomaly [22].
To avoid the extraction of background pixels, some methods exploit low-rank and sparse characteristics to detect anomalies. Assuming that the background is similar in its spectral and spatial features, many recent anomaly detection (AD) methods learn a dictionary and detect anomalies by the reconstruction error [23]-[27]. Other methods apply slowly varying signal analysis [28], matrix decomposition [29], [30] or optimal filters [31], [32] to detect targets.
Without the spectral signature of the anomaly target, it is hard for an algorithm to separate the background and the anomaly accurately. Anomaly detectors based on a background model need pure background pixels to construct a statistical background model. Much work has been done to reduce the purity requirements on background pixels or to enhance the robustness of the model to anomaly contamination. Recently, clustering and iterative calculation have become increasingly popular in anomaly detectors. Unsupervised clustering, as a type of self-organized classification method, extracts most of the background pixels, which supplies a relatively accurate reference for measuring anomaly salience [22] and spatial-spectral similarity [25]. It also supplies a relatively accurate data set for constructing a background dictionary [24], [26], [27]. Iterative computation is another effective way to solve this kind of problem. Because the appearance probability of anomalies is not known precisely, anomaly detection (AD) relies on the relative rareness of anomalies in the spatial distribution and on their difference in spectral features. That means we do not know the correct answer, but we can approach it step by step by iterative calculation.
In this paper, we propose an anomaly detection method that extracts background pixels iteratively by an adaptive clustering process. It makes three contributions: (1) We apply an adaptive clustering method to cluster background pixels. Cluster centers can be identified quickly and adaptively according only to the similarity of the spectral signatures. (2) To reduce the influence of larger clusters, an iterative clustering process is proposed to extract background clusters step by step. After the larger clusters are eliminated, the centers of small clusters appear in the density-distance chart. (3) Considering the confidence level of different clusters, a weighted fusion is proposed to fuse the detection results from different viewers. The weight of every cluster is calculated by a weight function. This paper is arranged as follows. Section II introduces the main idea and processing steps of our method. Section III evaluates the effectiveness of our method by experiment and analysis. Section IV discusses the selection of parameters. The final section concludes the paper.

II. METHOD
VOLUME 8, 2020
The proposed adaptive background extraction-based anomaly detection method contains two parts: iterative clustering and multi-viewer anomaly detection. First, an adaptive clustering method is applied to extract background clusters iteratively. The clustering method of [33] is unsupervised and can determine the cluster centers adaptively and quickly. Based on the spectral similarity of pixels, we construct a density-distance chart of the hyperspectral data. In the chart, cluster centers lie far away from the majority of points. Considering the influence of larger clusters on small clusters, we cluster and extract only the 2 to 4 largest clusters in one iteration. Then, we draw the density-distance chart of the remaining pixels and clusters. The centers of smaller clusters appear once the interference from larger background clusters is removed. Background pixels are extracted by clustering until the proportion of remaining pixels approaches the appearance probability of anomalies (generally approximately 10%).
Second, we use all the extracted background clusters as viewers to measure the anomaly salience of the remaining pixels. Every pixel therefore has several anomaly salience values, one measured by each viewer. The final result is obtained by fusing these values through a weighted summation.

A. ITERATIVE CLUSTERING
As in most AD methods, spectral similarity is the basic criterion for discriminating target and background. In this paper, we apply the clustering method of [33] to extract background pixels according to the similarity of their spectral signatures. It constructs two measures: density and distance. Density is the local population around a point. Distance, the other measure, is determined with reference to density. The method takes as cluster centers the points that have a high local population and are far away from each other. Based on the local density and the relative distance, cluster centers can be found by competition between different clusters.
Hyperspectral data $X = \{x_1, x_2, \ldots, x_N\}$, $X \in \mathbb{R}^{N \times K}$, has $N$ pixels, and every pixel $x_i$ has $K$ bands. The local density $\rho_i$ of pixel $x_i$ is defined as the number of pixels $x_j$ with a similar spectral signature in a local neighborhood:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - T_d), \qquad \chi(u) = \begin{cases} 1, & u < 0 \\ 0, & u \geq 0 \end{cases}$$

where $d_{ij}$ measures the similarity of the spectral signature between the $i$th pixel $x_i$ and the $j$th pixel $x_j$. The area of the neighborhood is confined by the cutoff radius $T_d$. Pixels with a higher local density $\rho_i$ are more likely to be cluster centers. However, many pixels in the same cluster may have a high density. A second piece of knowledge is that cluster centers should be far away from each other. The relative distance $\delta_i$ of pixel $x_i$ is therefore defined according to the values of $\rho_i$ and $\rho_j$:

$$\delta_i = \begin{cases} \min\limits_{j:\, \rho_j > \rho_i} d_{ij}, & \text{if } \rho_j > \rho_i \text{ for some } j \\ \max\limits_{j \neq i} d_{ij}, & \text{otherwise} \end{cases}$$

That is, $\delta_i$ is the maximum distance between $x_i$ and $x_j$ ($i \neq j$) when $\rho_i > \rho_j$. In that case pixel $x_i$ is a cluster center, and the relative distance $\delta_i$ confines the maximum radius of this cluster. When $\rho_i < \rho_j$, $\delta_i$ is the minimum distance between $x_i$ and $x_j$. In that case $x_i$ is a follower of a cluster center at pixel $x_j$, and the relative distance $\delta_i$ is the distance from pixel $x_i$ to its center. A density-distance chart (ρ-δ chart) is drawn to determine cluster centers adaptively. In the ρ-δ chart, cluster centers stand out because they have a large density $\rho_i$ and a large distance $\delta_i$ simultaneously. The centers of the clusters are then chosen manually. Take a random data set as an example and draw its ρ-δ chart to show the clustering process. In Fig. 1(a), five colored points, which have both higher density and higher distance, lie far from the majority. Because density reflects the size of the cluster, the cluster centered at the yellow point has the highest population, and the cluster centered at the brown point has a lower population. In Fig. 1(a), some black points also depart from the majority because of their larger δ and ρ. However, they are not remarkable compared with the colored points.
Therefore, in the same chart, the centers of the small clusters are suppressed by the larger clusters. Taking the colored points as cluster centers, we obtain five clusters (shown in Fig. 1(b)), each colored the same as its center. The remaining black points do not belong to any cluster.
Considering the complexity of hyperspectral data, the background clusters differ in size. To extract as many background clusters as possible, especially small clusters, we propose to cluster the background iteratively. In each iteration, we cluster the larger clusters, extract them from the data set, and then cluster the remaining data. Without the influence of the large clusters, the centers of the smaller clusters emerge in the new ρ-δ chart. The iterative process stops when the number of remaining pixels approaches a lower limit. Fig. 2 shows the iterative clustering process of a random data set.
In Fig. 2(a), there are two pixels far from the majority, which can be considered as cluster centers. The clusters centered at ''point 1'' and ''point 2'' have larger sizes. The other two points, labeled ''point 3'' and ''point 4'', are not remarkable compared with ''point 1'' and ''point 2''. We extract the two clusters centered at ''point 1'' and ''point 2'' from the data set and redraw the ρ-δ chart for the remaining data (Fig. 2(b)). Many points are released from the majority once the inhibition of the largest clusters is removed. Two further points, ''point 5'' and ''point 6'', now stand apart from the majority. This time we extract the two background clusters centered at these two points. Clustering continues iteratively until the number of remaining pixels is lower than the threshold. Assuming that anomalies make up about 10% of the pixels, we set the threshold to 20% to avoid losing anomalies.
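The iterative extraction described above can be illustrated with a short NumPy sketch. It is only a rough illustration under several assumptions that are not taken from the paper: the spectral distance is taken to be Euclidean, the centers are picked automatically as the pixels with the largest product ρ·δ (the paper picks them manually from the ρ-δ chart), and a pixel joins a cluster only if its link to the nearest denser pixel is no longer than the cutoff radius. The names `rho_delta`, `extract_background`, `t_d`, `p` and `n_centers` are hypothetical.

```python
import numpy as np

def rho_delta(D, t_d):
    """Local density rho, relative distance delta and nearest denser pixel."""
    n = D.shape[0]
    rho = (D < t_d).sum(axis=1) - 1          # neighbours within t_d, excluding the pixel itself
    order = np.argsort(-rho, kind="stable")  # decreasing density (ties broken by index)
    delta = np.zeros(n)
    parent = np.full(n, -1)                  # -1 marks the densest pixel
    delta[order[0]] = D[order[0]].max()      # densest pixel: maximum distance
    for k in range(1, n):
        i, higher = order[k], order[:k]      # pixels ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    return rho, delta, parent, order

def extract_background(X, t_d, p=0.2, n_centers=2):
    """Remove n_centers clusters per iteration until at most p*N pixels remain."""
    n_total = X.shape[0]
    remaining = np.arange(n_total)
    clusters = []
    while remaining.size > p * n_total and remaining.size > n_centers:
        Y = X[remaining]
        # Assumption: Euclidean spectral distance (O(N^2) memory, small scenes only).
        D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
        rho, delta, parent, order = rho_delta(D, t_d)
        # Assumption: automatic center choice by the largest rho * delta.
        centers = np.argsort(-(rho * delta), kind="stable")[:n_centers]
        label = np.full(remaining.size, -1)
        label[centers] = np.arange(n_centers)
        for i in order:  # decreasing density: a parent is visited before its followers
            # Assumption: follow the link only if it is no longer than t_d.
            if label[i] < 0 and parent[i] >= 0 and delta[i] <= t_d:
                label[i] = label[parent[i]]
        for c in range(n_centers):
            clusters.append(remaining[label == c])
        remaining = remaining[label < 0]
    return clusters, remaining
```

`extract_background` returns the index sets of the extracted background clusters and the indices of the pixels left for testing. Each iteration removes at least the chosen centers, so the loop always terminates.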

B. MULTI-VIEWER ANOMALY DETECTION
After iterative clustering, most background data have been extracted. To obtain a comprehensive and objective evaluation of anomaly salience, each background cluster is viewed as an independent observer that measures the anomaly salience of the remaining data. For example, the Mahalanobis distance of the $i$th test pixel $x_i$ is defined based on the background cluster $C_1$, which has $N_1$ pixels:

$$D_1(x_i) = (x_i - \mu)^{T} \Sigma^{-1} (x_i - \mu)$$

where $\Sigma$ is the covariance matrix of background cluster $C_1$ and $\mu$ is the mean value of $C_1$. The Mahalanobis distance $D_1(x_i)$ measures the similarity between pixel $x_i$ and background cluster $C_1$. A larger $D_1(x_i)$ indicates a larger difference between $x_i$ and background cluster $C_1$, and hence a higher anomaly salience. If the background $U$ has three clusters, $U = \{C_1, C_2, C_3\}$, every pixel $x_i$ has three anomaly salience values, one for each background cluster.
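A minimal sketch of one viewer's salience measurement is given below. It returns the squared Mahalanobis form $(x - \mu)^T \Sigma^{-1} (x - \mu)$ that RX-type detectors commonly use; whether the paper takes a square root is not stated here, and doing so would not change the ranking of pixels. The function name `mahalanobis_salience` and the small `ridge` term, added so that the covariance stays invertible when a cluster has few pixels relative to the number of bands, are assumptions of this sketch.

```python
import numpy as np

def mahalanobis_salience(test_pixels, cluster, ridge=1e-6):
    """Squared Mahalanobis distance of each test pixel to one background cluster.

    test_pixels: (M, K) array of pixels to score.
    cluster:     (N_c, K) array of pixels in the background cluster.
    ridge:       assumed regularisation added to the covariance diagonal.
    """
    mu = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)            # (K, K) sample covariance
    cov = cov + ridge * np.eye(cov.shape[0])
    diff = test_pixels - mu
    solved = np.linalg.solve(cov, diff.T).T        # Sigma^{-1} (x - mu) for every pixel
    return np.einsum("ij,ij->i", diff, solved)     # row-wise quadratic form
```

Calling the function once per extracted cluster yields one salience vector per viewer, which can then be stacked into a (C, M) array for fusion.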
To obtain the final anomaly salience of pixel $x_i$, we need to fuse all the salience values $D_c(x_i)$, $c = 1, 2, \ldots, C$.
Because a larger background cluster occupies most of the background area, the anomaly salience obtained from it should have higher credibility. In contrast, small background clusters occupy only a small part of the background, and we cannot exclude the possibility that small clusters are anomaly or target clusters. Therefore, the anomaly salience measured by small clusters is adopted cautiously, and a weighted summation is proposed to fuse the results from the different viewers:

$$M(x_i) = \sum_{c=1}^{C} \gamma_c D_c(x_i)$$

where $\gamma_c$ is the weight of cluster $C_c$. The value of $\gamma_c$ is calculated by the normal weight function according to the size of the background cluster $C_c$: a larger cluster has a larger weight, and a smaller cluster has a smaller weight.
In the weight function, $C_c \in \{C_1, C_2, \ldots, C_C\}$, $N_c$ is the number of pixels in cluster $C_c$, and $N$ is the total number of pixels in the data set. The complete procedure is summarized in the Adaptive Background Extraction Algorithm.
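The weighted fusion of viewer results can be sketched as follows. The exact form of the normal weight function is not reproduced in this text, so the sketch substitutes a simple size-proportional weight, normalised so that the weights sum to one. This is an assumption standing in for the authors' function, and the names `size_weights` and `fuse_salience` are hypothetical.

```python
import numpy as np

def size_weights(cluster_sizes):
    """Assumed stand-in weight: proportional to cluster size, summing to one."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    return sizes / sizes.sum()

def fuse_salience(salience, weights):
    """Weighted summation over viewers.

    salience: (C, M) array, row c holds the salience of M test pixels seen by cluster c.
    weights:  length-C vector of cluster weights.
    Returns the fused salience of the M test pixels.
    """
    return np.asarray(weights, dtype=float) @ np.asarray(salience, dtype=float)
```

Any weight function that increases with cluster size can be swapped in for `size_weights` without changing `fuse_salience`.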

Adaptive Background Extraction Algorithm
Input: hyperspectral data matrix $X_{N \times K}$, anomaly percentage $p = 20\%$.
Output: final detection result $M(x_i)$ of every pixel.
Step 1: Normalize the hyperspectral data $X_{N \times K}$;
Step 2: Calculate the density $\rho_i$ and distance $\delta_i$ of every pixel $x_i$ in data set $X$;
Step 2.1: Identify the centers of the larger clusters by the ρ-δ chart;
Step 2.2: Cluster the larger clusters and extract them from the data set;
Step 2.3: Redraw the ρ-δ chart for the remaining data;
Step 2.4: If the number of remaining pixels is more than 20% of the total number, go to Step 2.1;
Step 3: Calculate the anomaly salience $D_c(x_i)$ of pixel $x_i$ with the background cluster $C_c$, $c = 1, 2, \ldots, C$;
Step 4: Obtain the weight $\gamma_c$ of cluster $C_c$ by the normal weight function according to the size of cluster $C_c$;
Step 5: Fuse all the anomaly salience values of pixel $x_i$ by weighted summation.

III. EXPERIMENTS
A. DATA SET DESCRIPTION
Two kinds of hyperspectral data are applied to evaluate the performance of our proposed method.
The first group of experimental data, the HYDICE Urban data set, was downloaded from the U.S. Army Engineer Research and Development Center website. The HYDICE data include 210 spectral bands ranging from 400 nm to 2,500 nm, with a spatial resolution of 3 m. After removing the water vapor absorption and low-SNR bands (1-4, 76, 87, 101-111, 136-153, 198-210), 162 effective bands remain. The size of the original data is 307 × 307 × 210. In this paper, an image block with a size of 80 × 100 is cropped for the simulation experiment. This area contains vegetation, buildings, asphalt, motor vehicles and other ground objects, as shown in Fig. 3(a). These data also provide the ground truth, as shown in Fig. 3(b), where the bright targets are the anomalies, and the size of each anomaly target is between 1 × 2 and 2 × 2 pixels.
The second group of experimental data is HyMap data, acquired by the HyMap imaging spectrometer over Cooke City, Montana. The spatial resolution of the HyMap imager is 3 m, and it provides 126 spectral bands ranging from 450 nm to 2,500 nm. The size of the original data image is 280 × 800 × 126. We take only a part of the data, with a size of 100 × 100 pixels, for the simulation experiment, as shown in Fig. 3(c). There are six anomalous targets in this region, including building areas and vehicles. The corresponding ground truth shows the locations of the anomaly targets, whose sizes are approximately 2 × 2 to 3 × 3 pixels, as shown in Fig. 3(d).
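The HYDICE band removal described above can be checked with a few lines of NumPy. The sketch assumes that the quoted band numbers are 1-based and inclusive. The `cube` array is only a placeholder for a cropped image block, and `band_mask` is a hypothetical helper name.

```python
import numpy as np

# Removed HYDICE band ranges quoted in the text (inclusive, assumed 1-based).
REMOVED = [(1, 4), (76, 76), (87, 87), (101, 111), (136, 153), (198, 210)]

def band_mask(n_bands=210, removed=REMOVED):
    """Boolean mask that is True for the bands to keep."""
    keep = np.ones(n_bands, dtype=bool)
    for first, last in removed:
        keep[first - 1:last] = False   # convert 1-based inclusive range to a slice
    return keep

keep = band_mask()
cube = np.zeros((80, 100, 210))        # placeholder for an 80 x 100 image block
reduced = cube[:, :, keep]
print(int(keep.sum()), reduced.shape)  # prints: 162 (80, 100, 162)
```

The listed ranges remove 4 + 1 + 1 + 11 + 18 + 13 = 48 bands, which agrees with the 162 effective bands stated in the text.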

B. DETECTION PERFORMANCE
To verify the performance of the proposed algorithm, CBAD, LRX, WSCF and 2DCAD are selected as comparison methods. CBAD is the first detection method that introduced clustering into the background description. LRX is the most classic anomaly detector, and WSCF has an accuracy similar to that of LRX. 2DCAD is a recent method that departs from the idea of background description and detects anomalies by the edge features of targets. The performance of the detection methods is evaluated by visual display and by quantitative criteria. The classical receiver operating characteristic (ROC) curve and the area under the curve (AUC) are used as evaluation criteria.
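For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of computing an ROC curve and its AUC from a salience map and a binary ground-truth mask. It is a generic implementation, not the authors' evaluation code. The function name `roc_curve_auc` is hypothetical, and the sketch assumes that the mask contains both anomaly and background pixels.

```python
import numpy as np

def roc_curve_auc(salience, truth):
    """ROC points and AUC for anomaly scores against a binary ground-truth mask."""
    s = np.asarray(salience, dtype=float).ravel()
    t = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-s, kind="stable")      # sweep the threshold from high to low
    s, t = s[order], t[order]
    tp = np.cumsum(t)                          # detected anomalies at each threshold
    fp = np.cumsum(~t)                         # false alarms at each threshold
    # Keep one point per distinct score so that tied scores are thresholded together.
    last = np.r_[np.nonzero(np.diff(s))[0], s.size - 1]
    tpr = np.r_[0.0, tp[last] / t.sum()]       # hit rate
    fpr = np.r_[0.0, fp[last] / (~t).sum()]    # false alarm rate
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoidal rule
    return fpr, tpr, auc
```

An AUC of 1 means that every anomaly pixel scores higher than every background pixel, while 0.5 corresponds to a random ranking.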
The background in the HYDICE data includes roads, water bodies, airports, green land and other ground objects. It can be seen in Fig. 3(a) that the background ground objects are evenly distributed and have similar sizes. The anomalies in these data are cars on the road, which are small and sparse. According to the size of the targets, the sliding windows of the comparison algorithms are set to 3 × 3 for the internal window and 7 × 7 for the external window. CBAD needs the number of clusters to be set in advance; here, it is set to 8, the same as in [15]. The parameter of the WSCF algorithm is set to α = 0.2. In the proposed method, we set the threshold of the iterative process to p = 0.2, which means that the extraction of background clusters stops when the number of remaining pixels falls to 20% of the total number of pixels in the hyperspectral data. Finally, we apply the normal weight function to compute the weights of the different background clusters. For the HYDICE data, 7 background clusters are extracted in total; for the HyMap data, 8 background clusters are extracted. In the simulation experiments, the cluster centers are chosen manually, and at least 2 centers are chosen in every iteration.
The anomaly salience maps of the proposed method and the comparison algorithms are shown in 3-D in Fig. 4(a)-(f). Fig. 4(a) is the ground truth (GT) of the HYDICE data. It can be seen that most anomaly targets appear on the left side of Fig. 4(a). In Fig. 4(b), the locations and number of the anomalous targets are mostly consistent with the GT, and the distribution structure of the ground objects is more obvious than in the other maps. No significant geographical structure can be seen in the other four salience maps, Fig. 4(c)-(f).
We also use 3-D plots to display the anomaly salience of the different detection methods for the HyMap data, as shown in Fig. 5(a)-(f). The first panel is the ground truth of these data; the targets are at the center of the area. In Fig. 5(b), the anomaly salience of most of the background is zero, and the salience values at the center are larger than those of the background.
For the other methods, shown in Fig. 5(c)-(e), the background also has high anomaly salience. In 2DCAD, the background and targets can be separated by their anomaly salience, but the false alarm rate is higher than that of the other methods. The proposed method is effective in suppressing the background and highlighting the anomalies.
Based on the anomaly salience values, ROC curves are used to evaluate the detection performance of each algorithm on the HYDICE and HyMap data. Fig. 6 shows the hit rate of the proposed method: at the same false alarm rate, it is clearly higher than that of the other algorithms. The AUC of the proposed method is also larger than that of the other algorithms. Fig. 7 compares the ROC curves and the AUC values for the HyMap data. It also shows that the proposed method performs better than the other methods.

IV. DISCUSSION
CBAD assumes that the anomalies are too few to form a cluster of their own and that their spectral distribution is obviously different from that of the background. Based on this assumption, the hyperspectral data are divided into a specified number of clusters, from which a global background model is built. The proposed method also applies clustering to extract the background, but in a coarser manner than CBAD. Without requiring the number of clusters, the clustering process stops when most background pixels have been extracted, which avoids contamination from anomalies and reduces the dependence on prior information. The work in [20] proposed a simple filter that is expected to improve anomaly detection performance. In future work, we will improve our method with this filter.
LRX and WSCF are based on local background modeling. Their performance depends on the size of the dual window and on how well the model fits the background. 2DCAD detects anomalies by the abrupt changes at the edges of targets, which implies that the background should be smooth and simple. A simple background is therefore more favorable for LRX, WSCF and 2DCAD.

A. COMPARISON OF DIFFERENT WEIGHT FUNCTIONS
In this paper, we use the normal function to estimate the weight of every background cluster. There are also other weight functions, such as parabolic types, the function and the average method. To check whether the choice of weight function affects the accuracy of the final result, four weight functions are applied to calculate the weight of every background cluster. Fig. 8 compares the detection results obtained with the different weights. The ROC curves (Fig. 8(a)) and the AUC values (Fig. 8(b)) show that the parabolic type function and function have lower AUC values. Overall, however, the differences between the weight functions are small.
Without knowledge of the number and sizes of the clusters, the proposed method extracts background pixels iteratively and adaptively according to spectral similarity. Some limitations remain and need further study. First, in every iteration we need to choose more than one center, because the clustering method obtains cluster centers by competition between different centers. Second, we use an empirical value as the termination condition of the iterative background extraction, and this value is not optimal in complex situations. Third, the cluster centers are chosen manually during the iterative clustering process. These issues will be addressed in our next study.

B. COMPARISON OF RUNNING TIMES
The computational cost is another important factor that affects the efficiency of anomaly detectors. We compare the running times of the proposed method and other classical methods in TABLE 1. Timings are taken on an Intel Core i5-2520M 2.5 GHz CPU with 8 GB RAM. All the experiments are implemented on the MATLAB R12a platform. In this paper, we calculate the spectral distances between pixels in advance and save them as a .mat file; the reported running times do not include this step. Our method needs more time than the other classical methods because iterative computation has a high computational cost. Accelerating the proposed method will be an interesting topic for future research.

V. CONCLUSION
For anomaly detection in hyperspectral data, the accuracy of background extraction is directly related to the accuracy of the subsequent detection. With a cluttered background, background pixels are difficult to identify because of contamination by anomalies and the limited spatial resolution. In this paper, an adaptive background extraction method is applied to obtain most of the background pixels. Without knowing the specific number of clusters, background pixels are extracted iteratively by data-driven clustering. Every background cluster is viewed as an observer that measures the anomaly salience of the tested pixels, and a weighted summation is applied to fuse the results from the different viewers. Experimental results demonstrate that the proposed method has advantages of adaptiveness, robustness and efficiency under cluttered backgrounds. The ROC curves and AUC values also show better performance than the other classical methods. However, the computational cost of the proposed method is higher, and the selection of cluster centers needs human assistance. These will be points for improvement in further research.