Anomaly Detection Based on Tree Topology for Hyperspectral Images

As one of the most important research and application directions in hyperspectral remote sensing, anomaly detection (AD) aims to locate objects of interest within a specific scene by exploiting spectral feature differences between different types of land cover without any prior information. Most traditional AD algorithms are model-driven and describe hyperspectral data with specific assumptions, which cannot cope with the distributional complexity of land covers in real scenes, resulting in degraded detection performance. To overcome the limitations of traditional algorithms, a novel tree topology based anomaly detection (TTAD) method for hyperspectral images (HSIs) is proposed in this article. TTAD departs from the single analytical mode based on specific assumptions and instead directly parses the HSI data itself. It makes full use of the “few and different” characteristics of anomalous data points, which are sparsely distributed and far away from high-density populations. On this basis, topology, a powerful mathematical tool that successfully handles multiple types of data mining tasks, is applied to AD to ensure sufficient feature extraction of land covers. First, the redistribution of HSI data is realized by constructing a tree-type topological space to improve the separability between anomalies and backgrounds. Then, topologically related subsets in this space are utilized to evaluate the abnormality degree of each sample in a dataset, and detection results for the HSI are output accordingly. Abandoning traditional modeling and focusing on mining the data characteristics of the HSI itself enables TTAD to better adapt to different complex scenes and locate anomalies with high precision. Experimental results on a large number of benchmark datasets demonstrate that TTAD achieves excellent detection results with considerable computational efficiency. The proposed method exhibits superior comprehensive performance and is promising for practical applications.


I. INTRODUCTION
HYPERSPECTRAL remote sensing utilizes hyperspectral sensors (i.e., imaging spectrometers) mounted on different space platforms to image specific scenes in continuous and subdivided spectral bands spanning visible light, near-infrared, and short-wave infrared (0.4-2.5 μm) [1], [2], [3]. Compared with traditional images, hyperspectral images (HSIs) contain both image information and spectral information [4], [5], [6]. The abundant information provided by HSIs makes hyperspectral remote sensing a valuable technology with strong comprehensiveness and broad application prospects [7], [8], [9]. Target detection is one of the most important research directions of hyperspectral remote sensing [10], [11], [12], exhibiting excellent performance and unique advantages in many civil and military fields [13], [14], [15]. With the rapid development of remote sensing, the quality of captured observational data has substantially improved [16], [17]. For target detection in real scenes covering multiple types of land covers, the abundant and detailed information contained in images raises higher requirements for data mining and information extraction techniques. Therefore, it is of great practical significance to develop hyperspectral target detection to meet the broad demands of this technology in various fields.
According to whether the prior spectral information of the target is available [18], [19], [20], target detection can be divided into two categories: supervised matching detection and unsupervised anomaly detection (AD) [21], [22]. In practical applications, fully informative spectral databases and accurate reflectance inversion algorithms are very likely to be lacking [23], [24]. Moreover, the subpixel problem and the constraints of measurement conditions also lead to certain limitations in matching detection [25]. In contrast, the operators used in AD methods do not require any prior spectral information of target or background, and are widely used in these cases [10]. Therefore, the research on unsupervised AD is highly practical [26]. Traditional hyperspectral AD algorithms are derived based on signal processing theory [1]. Such methods have been proposed in large numbers since the 1990s, providing a solid foundation for the discipline [27]. The general design process for traditional methods is to obtain statistics from the HSI first, and derive the decision function through specific model assumptions and decision criteria [4], [28]. Then, the test pixel represented by a high-dimensional vector is substituted into the decision function, and the output value is compared with a given threshold to determine whether the anomaly exists [29], [30]. The primitive space model [8], the subspace projection model, and the probability distribution statistics (whitening space) model are the most classic model assumptions in signal processing [10], [31]. The Reed-Xiaoli detector (RXD), the low probability target detector, and the uniform target detector process HSI data based on probability distribution statistical models to achieve AD [10]. Among them, a series of derivative versions developed from RXD have been widely used and have shown stable performance [32], [33], [34].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The development of hyperspectral AD based on signal processing has established a mature theoretical system and a relatively complete variety of algorithms [1], [35]. Such traditional detectors could achieve effective separation between target and background and show good detection effect under reasonable model assumptions [27]. However, most of the hyperspectral remote sensing data in practical applications are captured by imaging real scenes covering multiple types of land covers. Traditional detectors rely heavily on specific assumptions and are limited in their analytical capabilities for complex models, resulting in inapplicability to real data with distributional complexity of land covers [36], [37]. As a result, the detection effect of traditional methods is suboptimal due to the inability to fully exploit the abundant detailed information provided by HSIs [38], [39].
In view of the bottleneck encountered in the development of traditional methods, more and more scholars focus on machine learning (ML) to seek breakthroughs and new vitality for hyperspectral AD. With the rapid development of ML theories [40], [41], the algorithms designed based on them have performed brilliantly in various fields including hyperspectral remote sensing [42], [43]. In recent years, ML-based methods for hyperspectral AD have emerged continuously [44], [45]. Kernel methods [46], sparse representation models [47], discriminative subspace analysis, spectral data self-learning, and deep learning represent several major research directions [23]. The widespread application of ML-based methods in hyperspectral AD exhibits strong analytical ability for complex models. It is fully demonstrated that the unique advantages of ML enable it to parse HSI data with complexity and sufficiently extract information [48], [49]. However, the aforementioned popular ML-based methods specialize in various application problems and satisfy different usage conditions, which means that such methods are usually effective only when certain conditions are met [50], [51]. Moreover, in addition to consuming a huge amount of time and space resources, some methods based on a data-driven mechanism have high requirements on the quantity and quality of training datasets [52], which are quite difficult to collect in practical applications. The strict usage conditions still restrict these methods to a certain extent, whether in implementation, application, or popularization. Different from traditional signal processing based methods and several popular ML-based methods, this article treats AD as a data mining task, focusing on data features of anomaly and background in an HSI, rather than prioritizing, and being constrained by, specific model assumptions.
An HSI is mathematically modeled as a data cloud, in which it is obvious that data points corresponding to the background are densely distributed, while the anomalous data points are sparsely distributed and far away from the high-density populations. Based on such fact, topology, a powerful mathematical tool, is adopted to solve the data mining task of AD. Topology has been proved to be capable of solving various types of tasks in ML involving point set analysis, including AD, information retrieval, classification [53], [54], etc. The general idea of its implementation in these tasks is to map the research object into a certain number of point sets, whose relations are represented by a geometric space to achieve an evaluation purpose associated with the task requirements. In a topological space, cardinality could be simply defined as the number of elements in a specific point set. Basener et al. proposed a topology-based algorithm for AD in dimensionally large datasets, demonstrating the superiority of topology over RXD in separating anomalies from the background [55]. Topology is essentially the mathematicalization of the sets and intuitive properties of those very simple and basic graphs. It is perfectly suitable for problems related to point set analysis, and could provide a brand new and feasible solution for the data mining task of hyperspectral AD.
So far, the demands of the solution for AD turned out to be simple and direct. The design of the proposed method is accomplished by achieving two phased goals: 1) separation of the target and background; and 2) highlighting the target and suppressing the background. Given the core idea of applying topology to point set analysis is to achieve the ultimate evaluation purpose through the geometric deformation of space according to requirements of various tasks, it is crucial to choose the form of mapping to construct the corresponding topological space for AD. Binary tree is a hierarchical structure defined by branch relationship in ML [56], [57], exhibiting immense potential in data mining [58]. It could exploit the numerical differences between the anomaly and the background in different dimensions to achieve the separation between these two [59], which is highly compatible with high-dimensional datasets such as HSI. This article takes full advantage of such compatibility and a tree-structured mapping is chosen to construct a topological space to significantly improve the separability between different types of data points, thereby achieving the aforementioned first phased goal of designing a detection method. On this basis, the detection output is designed to meet the critical requirements of highlighting the anomaly and suppressing the background, so as to achieve the second phased goal. Taken together, this article proposes a tree topology based anomaly detection (TTAD) method for HSIs. The design for TTAD overcomes the limitations encountered by traditional detectors by not relying on any specific model assumptions. The proposed method fully utilizes the simple, direct, and effective measurement strategy provided by point set topology for detection output to meet the essential requirements of the data mining task of AD. 
Briefly, the contributions of related research work on TTAD are summarized as follows:
1) This article successfully applies point set topology to the data mining task of hyperspectral AD. To address the incompatibility of traditional detectors with HSIs in changeable real scenes, the analytical thinking that relies on specific model assumptions is completely abandoned. Instead, the data characteristics of the HSI itself are deeply parsed to guide the algorithm design. The abstract differences between the anomaly and the background in data features are emphasized and highlighted in an intuitive way through the geometric deformation of space, thereby making the anomaly easier to separate.
2) With the construction of topological space, a novel anomaly measurement called "topological cardinality" is developed for detection output with high precision. In the process of geometric deformation, anomalies are sparsely distributed and far away from high-density populations, so the formation order and cardinality of the ultimate subset where they are located are clearly distinguishable. The topological cardinality combining these two terms is well suited to quantifying the abnormality of a sample. Moreover, the simple and basic intuitive properties in topological space enable the acquisition of detection output without any expensive computations that consume a lot of time and memory.
3) The proposed method performs AD tasks in parallel topological spaces to improve robustness. Given the distributional complexity of land covers in real HSIs, randomness is unavoidable when HSI data are redistributed in a single topological space. Hence, data redistribution in multiple parallel spaces is employed to thoroughly reveal the characteristics of various land covers. The topological cardinality of the test pixel in parallel spaces is averaged to achieve the detection output with accuracy and stability.
TTAD is equipped with strong adaptability for various imaging scenes with complexity, providing a reliable solution for locating anomalies in practical applications. The arrangement of this article is described as follows. Section II elaborates the methodology of the proposed TTAD. Section III presents the experimental results and relevant analysis and discussion. Section IV summarizes the research work in this article and gives the conclusions.

II. ANOMALY DETECTION BASED ON TREE TOPOLOGY (TTAD)
The implementation flow of the proposed TTAD is shown in Fig. 1. The methodology is divided into the following three parts, which are explained in Sections II-A, II-B, and II-C, respectively: 1) the topological space based on tree-structured mapping is constructed to realize the redistribution of HSI data, which makes the anomalies easier to separate by improving the separability between different land covers; 2) in the tree-shaped topological space, an exclusive measurement for quantifying the abnormality of test pixels, "topological cardinality," is developed; and 3) the AD task is accomplished by parallel measurements of topological cardinality in parallel topological spaces.

A. Tree-Structured Mapping for Topological Space
Each pixel in an HSI corresponds to a nearly continuous spectrum, and the spectral differences between various types of land covers establish the basis for the realization of target detection, segmentation, and classification. Although there are a variety of land covers contained in the real scene, the only two categories concerned in the design of unsupervised AD methods are anomaly and background. In the absence of prior spectral information, it is crucial to find and fully utilize the data features of these two types of samples. Constructing the topological space through the tree-structured mapping could extract the features of HSI itself and realize the redistribution of entire dataset. Specific subsets are divided accordingly to assign distinct cardinalities to the subsets where the anomaly and the background are located.
Specifically, the redistribution of HSI data is realized by constructing parallel topological spaces through a series of mappings, each of which adopts a dedicated and unique tree-shaped frame. This stage is completely based on random subsampling, so no prior information is required. Assume $X = [x_1, \ldots, x_N]$, $X \in \mathbb{R}^{L \times N}$, represents an HSI dataset, where the numbers of bands and pixels are denoted by $L$ and $N$, respectively. For the construction of each tree-shaped frame, $N_{sub}$ pixels are randomly selected from the original HSI $X \in \mathbb{R}^{L \times N}$ to form a hyperspectral data subset $X_{sub} \in \mathbb{R}^{L \times N_{sub}}$. Given the fact that various imaging scenes correspond to different dataset sizes in practical applications, each subsampling in this stage uses a percentage of the total number of pixels to set the size of the subset

$$N_{sub} = N \cdot sub_{percent} \tag{1}$$

where $sub_{percent}$ represents the percentage of subsampled pixels in $X_{sub}$ to the total number of pixels in $X$. The feature dimension $hierarchy_{use}$, that is, the band assigned to the root node, is selected at random, and a value between the minimum and maximum values of the dataset on this band, $NodeSet_{current}$, is then randomly selected as the newly added node. After a new node is added to the tree, the remaining subset $X_{remain}$ is divided into two parts, $X_{left}$ and $X_{right}$, according to the numerical relationship with the node. The current hierarchy of the tree, $hierarchy_{current}$, is then incremented by 1, the remaining data subset $X_{remain}$ is updated through $X_{left}$ and $X_{right}$, and the updated $X_{remain}$ is utilized to continue constructing the left and right subtrees. For each subsampling, constructing a tree-shaped frame is a recursive process, and the termination condition of the recursion is closely related to two variables: the current hierarchy $hierarchy_{current}$ of the tree and the remaining subset $X_{remain}$.
The recursive process terminates when the hierarchy of the tree $hierarchy_{current}$ reaches the limit $hierarchy_{limit}$ or the remaining subset $X_{remain}$ is indivisible. Specifically, $X_{remain}$ is indivisible when the pixel values of the current dimension $NodeSet_{current}$ in $X_{remain}$ are all equal or only one pixel is left. In addition, to ensure that the sparse points corresponding to the anomaly are stretched away from the dense points corresponding to the background in the tree topology, the hierarchy of the tree is preferably as close as possible to the dimension of the high-dimensional data. Hence, the number of bands $L$ in the original HSI is utilized to limit the hierarchies of a single tree-shaped frame. Algorithm 1 details the implementation steps for the first stage. It is worth noting that the construction of a tree-shaped frame in Algorithm 1 utilizes a randomly selected subset, while the mapping process utilizes the complete set of an HSI. After the HSI $X$ to be processed is put into the root node, the single-band image corresponding to the current hierarchy of the tree is found from $X$ according to the band indexes from root to leaf recorded by $hierarchy_{order}$. This single-band image is divided into left and right subsets according to its magnitude relationship with the current node value. The left and right subsets are further divided according to the band index corresponding to the next hierarchy recorded by $hierarchy_{order}$, as described in the above steps. The process proceeds sequentially from root to leaf until all pixels in the HSI fall into leaf nodes, that is, disjoint leaf sets are generated. As the mapping is implemented, the dataset is redistributed so that a separable anomaly pixel falls earlier into its leaf set with a smaller cardinality, whereas a background pixel falls later into its leaf set with a larger cardinality.
As a result, intuitive differences between the anomaly and background emerge in the tree topology.
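The recursive construction of a single tree-shaped frame described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `build_tree` and `leaves`, the dictionary layout of a node, the rounding of $N \cdot sub_{percent}$ to an integer, and the Gaussian toy data are all hypothetical choices made for the example.

```python
import random

def build_tree(pixels, band_order, depth, rng):
    """Recursively grow one tree-shaped frame from a list of spectra."""
    # Stop at the hierarchy limit (number of bands) or when at most one pixel is left.
    if depth >= len(band_order) or len(pixels) <= 1:
        return {"leaf": True, "depth": depth, "size": len(pixels)}
    band = band_order[depth]
    values = [p[band] for p in pixels]
    lo, hi = min(values), max(values)
    if lo == hi:  # all values equal on this band: the subset is indivisible
        return {"leaf": True, "depth": depth, "size": len(pixels)}
    split = rng.uniform(lo, hi)  # random node value between the minimum and maximum
    left = [p for p in pixels if p[band] <= split]
    right = [p for p in pixels if p[band] > split]
    return {"leaf": False, "band": band, "split": split,
            "left": build_tree(left, band_order, depth + 1, rng),
            "right": build_tree(right, band_order, depth + 1, rng)}

def leaves(node):
    """Yield every leaf of the tree."""
    if node["leaf"]:
        yield node
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])

# Toy data (assumption): 200 pixels with 6 bands drawn from a Gaussian.
rng = random.Random(0)
n_bands, n_pixels, sub_percent = 6, 200, 0.1
hsi = [[rng.gauss(0.5, 0.05) for _ in range(n_bands)] for _ in range(n_pixels)]

n_sub = max(1, round(n_pixels * sub_percent))      # Eq. (1), rounded (assumption)
subset = rng.sample(hsi, n_sub)                    # random subsampling, no prior information
band_order = rng.sample(range(n_bands), n_bands)   # random root-to-leaf band order
tree = build_tree(subset, band_order, 0, rng)
```

Every subsampled pixel ends in exactly one leaf, and no leaf lies deeper than the number of bands, which mirrors the two termination conditions stated in the text.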
Algorithm 1: TopoFrame($X$, $Frame_{size}$, $sub_{percent}$).
Input: $X \in \mathbb{R}^{L \times N}$, the original HSI; $Frame_{size}$, the number of parallel spaces; $sub_{percent}$, the percentage for a single subsampling.
Output: $TopoFrame$, the frame for parallel topological spaces.
Initialization: $hierarchy_{current} = 0$, $hierarchy_{limit} = L$, $N_{sub} = N \cdot sub_{percent}$.
1: Start looping to build each frame according to the number of parallel spaces; for $i = 1 : Frame_{size}$ do
2:   Randomly select $N_{sub}$ pixels to obtain a subset $X_{sub} \in \mathbb{R}^{L \times N_{sub}}$, $X_{remain} = X_{sub}$;
3:   Randomly arrange the band indexes of the original HSI to obtain $hierarchy_{order}$, which are the corresponding bands from root to leaf in a tree topology;
4:   if $hierarchy_{current} < hierarchy_{limit}$ and $X_{remain}$ is divisible then
5:     $hierarchy_{use} = hierarchy_{order}(hierarchy_{current} + 1)$, $NodeSet_{current} = X_{remain}(hierarchy_{use}, :)$;
6:     Randomly sample a value $value_{node}$ from the uniform distribution over the continuous interval between the minimum and maximum of $NodeSet_{current}$ as the node of the current hierarchy, $tree.node = value_{node}$;
7:     $X_{left} = X_{remain}(:, \mathrm{find}(NodeSet_{current} \le value_{node}))$;
8:     $X_{right} = X_{remain}(:, \mathrm{find}(NodeSet_{current} > value_{node}))$;
9:     $hierarchy_{current} = hierarchy_{current} + 1$;
10:    $tree.LeftNode \leftarrow X_{remain} = X_{left}$, re-execute from step 4;
11:    $tree.RightNode \leftarrow X_{remain} = X_{right}$, re-execute from step 4;
12:  end if (step 4)
13:  return $tree$
14:  $TopoFrame = TopoFrame \cup tree$;
15: end for (step 1)
16: return $TopoFrame$

Accordingly, after completing the construction of the framework, the HSI could be mapped into topological spaces to realize data redistribution. The first phased goal of designing detection methods mentioned in the Introduction, the separation of the anomaly and background, has been achieved. Moreover, the abstract differences in data features of anomaly and background are fully extracted and emphasized in an intuitive way through the geometric deformation of space. So far, the formation of the tree topology is fully prepared for the subsequent AD tasks. To achieve the second phased goal, how to design the detection output to highlight the anomaly and suppress the background is the most critical issue.
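The mapping of a complete HSI through a finished tree-shaped frame, as described in this subsection, can be illustrated with a small hand-built tree. The sketch below is only an assumption-laden toy: the two-band tree, its split values, the leaf names `A`, `B`, `C`, and the six pixels are all hypothetical and chosen so that the result can be checked by hand.

```python
from collections import Counter

# Hypothetical frame: internal nodes hold (band, split); leaves hold an id.
tree = {
    "band": 0, "split": 0.5,
    "left": {"band": 1, "split": 0.3,
             "left": {"leaf_id": "A"}, "right": {"leaf_id": "B"}},
    "right": {"leaf_id": "C"},
}

def locate(node, pixel):
    """Follow the root-to-leaf trace of one pixel; return (leaf id, formation order)."""
    depth = 0
    while "leaf_id" not in node:
        # Compare the pixel value on the node's band with the node value.
        node = node["left"] if pixel[node["band"]] <= node["split"] else node["right"]
        depth += 1
    return node["leaf_id"], depth

pixels = [
    [0.20, 0.10], [0.25, 0.15], [0.30, 0.20], [0.22, 0.12],  # dense cluster
    [0.28, 0.40],                                             # a second land cover
    [0.90, 0.80],                                             # isolated spectrum
]

traces = [locate(tree, p) for p in pixels]
cardinality = Counter(leaf for leaf, _ in traces)   # pixels per leaf set
```

In this toy case the isolated spectrum reaches leaf `C` after a single division and is alone there, while the four clustered pixels share leaf `A` at the second hierarchy, which is the qualitative behavior the text attributes to anomalies and background.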

B. Topological Cardinality for Detection Output
Performing AD tasks based on tree topology is essentially a binary classification of anomalies and backgrounds in HSIs containing a large number of pixels, which needs to provide a reliable basis for subsequent judgments on whether the test pixels are anomalous or not. However, the absence of prior information imposes strict requirements on the design of detectors: how to extract data features of anomalies and backgrounds and exploit them to design detection outputs for test pixels in a tree-shaped topological space? This directly determines whether the AD task could achieve high-precision performance in real scenes. To address this problem, this article proposes a novel anomaly measurement called "topological cardinality" for detection output.
The HSI dataset is mapped into a topological space using a tree structure, where each node from the root to the leaf corresponds to a specific point set. The root node holds the entire set of an image, and the leaf nodes correspond to its subsets. Because the anomalous points are sparsely distributed and far away from the high-density populations, they are easier to separate than the background, so their leaf sets are formed earlier. Moreover, the extremely small proportion of anomalies leads to a relatively small cardinality of the leaf set where they are located. In the tree-shaped topological space, the hidden and latent data features in the original space are presented more intuitively and simply. The anomaly and background differ significantly in quantity, spatial distribution, and spectral characteristics, and such differences are emphasized and highlighted in an intuitive way through the geometric deformation of space. For the obtained tree topology, the formation order of a leaf set and its cardinality are suitable for measuring the abnormality degree of a test sample, so these two are combined to develop a novel measurement: topological cardinality. For a test pixel, the earlier the formation order of the leaf set covering it and the smaller the cardinality of that leaf set, the smaller the topological cardinality, indicating that the abnormality degree is higher and the possibility of being judged as anomalous is greater. Based on this, the anomaly and background are assigned distinct scores to achieve high-precision detection results. It is worth mentioning that the aforementioned formation order and cardinality are simple and basic intuitive properties in topological space, which enables the abnormality degree of test pixels to be effectively measured without expensive computations that consume a lot of time and memory.
After the mapping is completed, each pixel of the original HSI is located in a leaf node, and the tree topology makes each leaf node correspond to its unique root-to-leaf trace, which records its formation order. Correspondingly, the trace on the tree for each pixel is unique. Each node of the tree topology is a specific point set, and each trace starting from the root and ending at the leaf connects a series of point sets whose cardinality is decreasing. Considering the distributional characteristics of data points belonging to various land covers in topological space, the formation order and cardinality of leaf sets reflect the abnormality degree of data points in them. In other words, for the test pixel, the topological cardinality for detection output could be calculated by observing the formation order and cardinality of the corresponding leaf set. As shown in Fig. 2, $X \in \mathbb{R}^{L \times N}$ is an HSI dataset. It is assumed that the image to be processed contains 20 pixels, that is, $N = 20$, and $x_i, x_j \in \mathbb{R}^{L \times 1}$ are two test pixels in $X$. In the tree topology, $x_i$ and $x_j$ are located in different leaf subsets, corresponding to two different root-to-leaf traces. Their leaf sets and traces are marked with red circles and arrows, respectively, and the cardinalities of all leaf sets are given in Fig. 2. For the leaf subset where $x_i$ is located, its formation order is represented by $hierarchy_{leaf}(x_i)$, and its cardinality is represented by $cardinality_{leaf}(x_i)$. The topological cardinality of $x_i$ could be obtained by calculating the product of these two

$$\mathrm{TopoCard}(x_i) = hierarchy_{leaf}(x_i) \times cardinality_{leaf}(x_i).$$

Similarly, using $hierarchy_{leaf}(x_j)$ and $cardinality_{leaf}(x_j)$ to represent the formation order and cardinality of the leaf subset where $x_j$ is located, the topological cardinality of $x_j$ could be calculated by the following formula:

$$\mathrm{TopoCard}(x_j) = hierarchy_{leaf}(x_j) \times cardinality_{leaf}(x_j).$$

It can be observed from the tree topology shown in Fig. 2 that since the spectral characteristics of $x_i$ are significantly different from those of most pixels, there are few pixels similar to it, and it is easier to separate. In contrast, $x_j$, like most pixels, is divided into a leaf node at a deeper hierarchy of the tree and has more similar pixels. Obviously, $x_i$ is more in line with the characteristic of "few and different" and is more likely to be judged as the anomaly, while $x_j$ is more likely to be judged as the background. According to the previous calculation results, $\mathrm{TopoCard}(x_i)$ is smaller than $\mathrm{TopoCard}(x_j)$; the smaller the topological cardinality, the higher the abnormality of the sample, and the greater the possibility of it being judged as the anomaly. The above example shows that, as an anomaly measurement, the topological cardinality could effectively quantify the abnormality of a test pixel, and provide a reasonable basis for the subsequent judgment on whether it is an anomaly. For a test pixel $x \in \mathbb{R}^{L \times 1}$, $\mathrm{TopoCard}(x)$ is utilized to indicate its topological cardinality, and the decision criterion compares the detection output $D(x)$, which grows as $\mathrm{TopoCard}(x)$ shrinks, with a comparison threshold $\eta$. If $D(x) > \eta$, $x$ is judged as the anomaly. Conversely, if $D(x) < \eta$, $x$ is judged as the background.
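As a worked illustration of the measurement, the product of formation order and leaf cardinality can be computed directly. The sketch below rests on two labeled assumptions: the numeric values are hypothetical and are not read from Fig. 2, and the detection output is taken to be the reciprocal of the topological cardinality (the form used in Section II-C), which may differ from the exact decision function intended by the authors. The names `topo_card` and `is_anomaly` are likewise illustrative.

```python
def topo_card(hierarchy_leaf, cardinality_leaf):
    """Topological cardinality: formation order times leaf-set cardinality."""
    return hierarchy_leaf * cardinality_leaf

def is_anomaly(card, eta):
    """Assumed decision rule: the reciprocal of TopoCard compared with threshold eta."""
    return 1.0 / card > eta

# Hypothetical leaf statistics (not taken from Fig. 2).
card_i = topo_card(hierarchy_leaf=2, cardinality_leaf=1)   # early, tiny leaf set
card_j = topo_card(hierarchy_leaf=5, cardinality_leaf=6)   # late, crowded leaf set

eta = 0.1  # illustrative threshold
flag_i = is_anomaly(card_i, eta)
flag_j = is_anomaly(card_j, eta)
```

With these assumed numbers, the pixel with the earlier and smaller leaf set has the smaller topological cardinality and is the only one flagged at the chosen threshold.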

C. Hyperspectral Anomaly Detection in Parallel Topological Spaces
During the construction of topological space, the data subsets used to form a tree topology are randomly selected from HSIs. Because real scenes are diverse and complex and this stage does not utilize any prior information, it is difficult to guarantee the effective extraction of anomaly and background features if the selected subset does not contain any anomalous points. This is the least desirable situation that could be encountered during the formation of a tree topology, and it leads to serious missed detection. According to the ensemble learning theory [60], to reduce the influence of such a worst case on subsequent detection results and further improve the robustness of the proposed TTAD algorithm, the AD task is performed in parallel spaces containing multiple tree topologies. In the second stage of performing the AD task, parallel measurements of topological cardinality are implemented in multiple tree topologies, and the detection results are output accordingly. In the traversal of the HSI, for the test pixel $x_i$, the leaf set where it is located is first found in the single tree topology corresponding to a space, and the reciprocal of its topological cardinality, $1/\mathrm{TopoCard}(x_i)$, is calculated. Parallel measurements of $1/\mathrm{TopoCard}(x_i)$ are then implemented in all tree topologies, and the values are accumulated and averaged to get $result_i$, which is utilized as the detection output of the current test pixel. The larger $result_i$ is, the higher the abnormality degree of $x_i$ is, and the more likely it is to be judged as the anomaly. The implementation steps of the second stage are shown in Algorithm 2.
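The accumulate-and-average step can be written in a few lines. The following sketch assumes hypothetical per-tree topological cardinalities (the lists below are invented for illustration) and uses an illustrative function name, `detection_output`; it shows how a single unfavorable tree, for example one built from a subset without anomalous points, is damped by the average.

```python
def detection_output(topo_cards):
    """Average of reciprocal topological cardinalities over parallel trees."""
    return sum(1.0 / c for c in topo_cards) / len(topo_cards)

# Hypothetical TopoCard values of one pixel in four parallel tree topologies.
anomaly_cards = [2, 3, 40, 2]        # the third tree fails to isolate the pixel
background_cards = [30, 45, 40, 36]

result_anomaly = detection_output(anomaly_cards)
result_background = detection_output(background_cards)
```

Even though the third tree assigns both pixels the same topological cardinality, the averaged output of the first pixel remains more than ten times that of the second in this toy case.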
Algorithm 2: TTAD($X$, $TopoFrame$).
Input: $X \in \mathbb{R}^{L \times N}$, the original HSI; $TopoFrame$, the frame for parallel topological spaces.
Output: $result \in \mathbb{R}^{1 \times N}$, detection results.
1: Start looping to process each pixel of the HSI; for $i = 1 : N$ do
2:   $x_i \in \mathbb{R}^{L \times 1} \leftarrow$ $i$-th test pixel of $X$;
3:   $result_{i\_sum} = 0$;
4:   for $j = 1 : Frame_{size}$ do
5:     $TreeTopo_j \leftarrow$ $j$-th tree topology of $TopoFrame$;
6:     Find the leaf set where $x_i$ is located in $TreeTopo_j$, and get its formation order $hierarchy_{leaf}(x_i)$ and cardinality $cardinality_{leaf}(x_i)$;
7:     Compute the topological cardinality for $x_i$ and take the reciprocal, …

In general, the complete processing flow of TTAD includes two stages, corresponding to Algorithms 1 and 2, respectively. In addition to the HSI to be processed, two key parameters are included in the input: the number of parallel spaces $Frame_{size}$ and the subsampling percentage $sub_{percent}$ used to form a subset for a single topological frame. Since the proposed AD method does not adopt a specific detection mode, such as a sliding dual window, the settings of the above two parameters affect both the subsequent detection effect and computational efficiency to a great extent. The experimental part will discuss the parameter settings and find the optimal combination for datasets in different real scenes. Then, TTAD is compared with other widely used and advanced algorithms in terms of detection effect and execution time to fully demonstrate the superiority of the proposed method in comprehensive performance.
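The two stages can be combined into a compact end-to-end sketch. It should be read as one possible interpretation under explicit assumptions, not as the authors' code: the names `build`, `leaf_of`, and `ttad_scores` are hypothetical, a formation order of zero is clamped to one to avoid division by zero, $N \cdot sub_{percent}$ is rounded, and the toy scene uses a single extreme spectrum with $sub_{percent} = 1$ so that every tree sees the anomaly.

```python
import random

def build(pixels, order, depth, rng):
    """Grow one tree; a leaf is marked by band=None."""
    if depth >= len(order) or len(pixels) <= 1:
        return {"band": None}
    band = order[depth]
    vals = [p[band] for p in pixels]
    lo, hi = min(vals), max(vals)
    if lo == hi:                       # indivisible on this band
        return {"band": None}
    split = rng.uniform(lo, hi)
    return {"band": band, "split": split,
            "left": build([p for p in pixels if p[band] <= split], order, depth + 1, rng),
            "right": build([p for p in pixels if p[band] > split], order, depth + 1, rng)}

def leaf_of(tree, pixel):
    """Return (leaf node, formation order) reached by one pixel."""
    node, depth = tree, 0
    while node["band"] is not None:
        node = node["left"] if pixel[node["band"]] <= node["split"] else node["right"]
        depth += 1
    return node, depth

def ttad_scores(hsi, frame_size, sub_percent, seed=0):
    """Average reciprocal topological cardinality over frame_size random trees."""
    rng = random.Random(seed)
    n, n_bands = len(hsi), len(hsi[0])
    n_sub = min(n, max(2, round(n * sub_percent)))   # rounding is an assumption
    totals = [0.0] * n
    for _ in range(frame_size):
        order = rng.sample(range(n_bands), n_bands)  # random band order, root to leaf
        tree = build(rng.sample(hsi, n_sub), order, 0, rng)
        hits = [leaf_of(tree, p) for p in hsi]       # map the complete HSI
        counts = {}
        for leaf, _ in hits:
            counts[id(leaf)] = counts.get(id(leaf), 0) + 1
        for i, (leaf, depth) in enumerate(hits):
            topo_card = max(depth, 1) * counts[id(leaf)]  # clamp is an assumption
            totals[i] += 1.0 / topo_card
    return [t / frame_size for t in totals]

# Toy scene (assumption): 99 background spectra plus one extreme spectrum at index 99.
data_rng = random.Random(1)
hsi = [[data_rng.uniform(0.4, 0.6) for _ in range(4)] for _ in range(99)]
hsi.append([100.0] * 4)
scores = ttad_scores(hsi, frame_size=25, sub_percent=1.0)
top_pixel = max(range(len(scores)), key=scores.__getitem__)
```

In this deliberately easy scene the extreme spectrum is almost always split off at the first hierarchy into a leaf set of cardinality one, so it receives the largest averaged output. Real HSIs are far less clear-cut, which is why the choice of $Frame_{size}$ and $sub_{percent}$ matters.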

III. EXPERIMENTAL RESULTS AND ANALYSIS
This article attaches great importance to both the detection effect and practicability of the proposed method. Therefore, benchmark HSI datasets in both natural and artificial environments are selected for experiments to fully evaluate the performance of TTAD in various scenes. Section III-A describes the evaluation indicators used for detection results. Section III-B introduces HSIs in multiple scenes including natural and artificial environments, which are adopted as experimental datasets. Section III-C shows and discusses the detection results of TTAD under different settings for key parameters. Section III-D compares TTAD with other classical and advanced detectors to demonstrate the effectiveness and efficiency of the proposed method. All experiments are carried out with MATLAB R2014a on a Windows 10 computer with an Intel Celeron CPU N3350 @ 1.10 GHz and 4.00 GB RAM.

A. Evaluation Indicators
The well-recognized and widely used indicators in this research field are employed for qualitative and quantitative analysis of experimental detection results as follows:

1) Receiver Operating Characteristic (ROC) Curve: The ROC curve visualizes the correspondence between the probability of detection (PD) and the false alarm rate (FAR), based on which the detection performance of algorithms could be scientifically analyzed and compared [61]. The ideal detection performance of an algorithm exhibits high PD and low FAR. The more the ROC curve shifts to the upper left, the better the detection performance of the related algorithm.
2) Area Under the Curve (AUC): The AUC provides a quantitative evaluation indicator obtained by integrating the ROC curve [11]. As mentioned above, the trend of the ROC curve reflects the detection performance, which corresponds to the value of AUC. The larger the AUC, the better the performance. The magnitude of AUC thus helps to evaluate the performance of algorithms more accurately.
3) Separability Map: For detection results at anomaly and background locations, the separability map visualizes the separability between these two groups of values [49]. Specifically, two boxes distinguished by color represent the statistical range of the two groups of values. The horizontal bar on the box indicates the median value. The two edges of the box from bottom to top represent the 25th and 75th percentiles, respectively. The whiskers extend from the 0.5th percentile to the 99.5th percentile. This indicator reflects the detector's ability to separate out the anomaly.
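The first two indicators can be made concrete with a short sketch that sweeps a threshold over the detection outputs, collects (FAR, PD) points, and integrates them with the trapezoidal rule. The function name `roc_auc` and the six scores and labels are hypothetical; this is a generic illustration, not the evaluation code used in the experiments.

```python
def roc_auc(scores, labels):
    """Return ROC points as (FAR, PD) pairs and the trapezoidal AUC.

    labels: 1 for anomaly pixels, 0 for background pixels.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Pixels with identical scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc

# Hypothetical detection outputs and ground truth for six pixels.
scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
points, auc = roc_auc(scores, labels)   # auc == 0.875 for this toy input
```

Here one background pixel outranks one of the two anomalies, so seven of the eight anomaly-background pairs are ordered correctly and the AUC is 7/8.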

B. Experimental Datasets
There are five publicly available benchmark hyperspectral experimental datasets used to verify the comprehensive performance of the proposed method, including Hyperspectral Digital Imagery Collection Experiment (HYDICE), Airborne Visible/Infrared Imaging Spectrometer (AVIRIS-1), AVIRIS-2, AVIRIS-World Trade Center (WTC), and Airport-Beach-Urban (ABU). The experimental datasets selected in this article cover HSIs in various complex scenes, such as airports, urban areas, beaches, and towns, each of which contains multiple types of land covers in natural or artificial environments. Both the anomaly and the background meet the verification conditions for AD on HSI data with complexity, which could fully demonstrate both the detection effect and practicability for the proposed TTAD.
1) HYDICE: This dataset was acquired by the HYDICE sensor imaging an area in Michigan, USA, with a spatial resolution of about 2 m. The original image contains 210 bands from the visible to the near-infrared spectrum. After removing the low-quality bands, such as those with a low signal-to-noise ratio (SNR) and the water absorption bands, 175 bands are reserved for experiments. The full image contains 307 × 307 pixels and covers an urban scene, as shown in Fig. 3(a). In the experiment, a subimage of size 80 × 100 containing 21 vehicle pixels as anomalies is cropped [62], as shown in Fig. 3(b). The anomalies are distributed over 10 locations, each with a spatial size of 1-4 pixels. Fig. 4(a) and (b) shows the true-color and ground-truth maps, respectively.
2) AVIRIS-1: This dataset was acquired by the AVIRIS sensor covering an area of San Diego in the spectral range of 370-2510 nm with a spatial resolution of 3.5 m. The water absorption and low-SNR bands are removed from the original 224 spectral channels, and 189 bands are reserved for experiments [7]. As shown in Fig. 5(a), the full image contains 400 × 400 pixels. AVIRIS-1 is a cropped subimage containing 100 × 100 pixels, as shown in Fig. 6(a). Three planes occupying 20, 22, and 22 pixels, respectively, are regarded as anomalies, and the corresponding ground-truth map is shown in Fig. 6(b).
3) AVIRIS-2: As shown in Fig. 5(b), this dataset is also cropped from the abovementioned full AVIRIS image. It has a spatial size of 128 × 128 pixels and contains 189 bands [18]. In this subimage, three planes occupying 34, 42, and 44 pixels, respectively, are regarded as anomalies. The true-color and ground-truth maps of AVIRIS-2 are illustrated in Fig. 7.
4) AVIRIS-WTC: The fire sources in this scene are the anomalies to be detected, occupying 83 pixels in a total of 10 locations [63]. Fig. 8(a), (b), and (c) illustrate the three-dimensional (3-D) data cube of AVIRIS-WTC, the true-color map, and the ground-truth map, respectively.
5) ABU: This dataset was manually cropped after being downloaded from the AVIRIS website [26]. There are 13 images in total, and the corresponding scenes include airports, beaches, and urban areas. For the images used in experiments, the bands heavily disturbed by noise have been removed. Detailed information, such as data size, spatial resolution, and capture location, is listed in Table I. Figs. 9, 10, and 11 show the distribution of land covers and anomaly locations in the three categories of scenes: airports, beaches, and urban areas, respectively.

C. Discussion on Parameters
It can be seen from Algorithms 1 and 2 that the input of the proposed TTAD involves two key parameters, Frame size and sub percent, which indicate the number of parallel topological spaces and the subsampling percentage corresponding to the framework required for a single topology, respectively. Since no prior information is available during the formation of topological frameworks, improper parameter settings would make it difficult to ensure effective extraction of anomaly and background features, resulting in unsatisfactory detection results. In addition, considering that Frame size and sub percent also affect the computational efficiency, the design of the experiments needs to ensure that the search range for the two parameters is large enough to find a suitable combination. Based on the above considerations, in the early stage of experiments, the ranges of Frame size and sub percent are set to [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] and [0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%], respectively. Pairing every value of one parameter with every value of the other gives 100 parameter combinations in total. In order to compare the detection results under different parameter combinations in a fair, reasonable, and convenient way, AUC is adopted as the quantitative indicator to evaluate the performance of the proposed method.
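The parameter sweep described above can be sketched in Python as follows. The names `frame_sizes`, `sub_percents`, and `grid_search` are hypothetical, and `evaluate` is a placeholder callable standing in for "run TTAD with these settings and return the AUC"; the detector itself is not reproduced here.

```python
from itertools import product

# Search ranges stated in the text; sub_percent is stored as a fraction,
# so 0.001 means 0.1% and 0.010 means 1.0%.
frame_sizes = list(range(10, 101, 10))
sub_percents = [k / 1000 for k in range(1, 11)]

def grid_search(evaluate):
    """Score all 100 combinations and return the best one with the full table.

    `evaluate(frame_size, sub_percent)` is a hypothetical callable that is
    assumed to run the detector and return its AUC.
    """
    results = {
        (f, s): evaluate(f, s) for f, s in product(frame_sizes, sub_percents)
    }
    best = max(results, key=results.get)    # combination with the largest AUC
    return best, results
```

The returned dictionary has the same content as one of the AUC tables (one entry per parameter combination), so the surfaces in the parameter discussion correspond to plotting its values over the two parameter axes.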
For the HYDICE dataset, Table II gives the AUC values corresponding to the detection results of TTAD under 100 parameter combinations. The surface in Fig. 12(a) is drawn according to the results in Table II, which more intuitively illustrates the relationship between different parameter combinations and detection performance. The maximum and minimum values of AUC are 0.9888 and 0.4071, respectively. Correspondingly, the optimal parameter combination is Frame size = 80, sub percent = 1.0%. From the surface colored according to the amplitude of AUC in Fig. 12(a), it can be seen that when Frame size and sub percent are small, the fluctuation amplitude of AUC is large. With the increase of these two parameters, the AUC gradually increases and the fluctuation becomes more gentle. It shows that when the subsampling percentage corresponding to the framework for a single topology and the number of parallel topological spaces are large, the proposed TTAD could achieve excellent detection effects and exhibit robustness.
For the AVIRIS-1 dataset, Table III and the corresponding colored surfaces in Fig. 12(b) demonstrate the detection performance of the proposed method within a given range of parameter settings. Under 100 parameter combinations, the minimum AUC value is 0.6636 and the maximum is 0.9890, and the corresponding optimal parameter combination is Frame size = 50, sub percent = 0.6%. Among the four surfaces corresponding to different experimental datasets shown in Fig. 12, the fluctuation of AUC on AVIRIS-1 is the most obvious. However, such drastic fluctuation only occurs when the number of topological spaces and the subsampling percentage are small. With the increase of Frame size and sub percent , the AUC value gradually increases and tends to converge.
For the AVIRIS-2 dataset, the AUC values obtained from the detection results of the proposed method under the variation of Frame size and sub percent are given in Table IV, which corresponds to the surface in Fig. 12(c). The AUC ranges from 0.7913 to 0.9343 and reaches its maximum when Frame size = 30 and sub percent = 0.5%. The colored surface in Fig. 12(c) shows that the overall variation range of AUC is relatively small, indicating that the detection effect of TTAD on AVIRIS-2 is not particularly sensitive to the parameter settings.
For the AVIRIS-WTC dataset, Table V provides the experimental results of TTAD under different Frame size and sub percent. The AUC varies from 0.8448 to 0.9942, and the optimal parameter combination is Frame size = 40, sub percent = 0.4%. In the colored surface of Fig. 12(d), it can be observed that the variation range of AUC is small over a wide range of parameter combinations, and that the proposed method performs well and stably on AVIRIS-WTC.
For the ABU dataset, there are a total of 13 images in the airport (4), beach (4), and urban (5) scenes. Each image is tested with 100 combinations of the parameters Frame size and sub percent, so the proposed TTAD is run a total of 1300 times on this dataset. Figs. 13, 14, and 15 illustrate the detection evaluation results of TTAD on the three major categories of scenes: airports, beaches, and urban areas, respectively. Each of the 13 HSIs has its own maximum AUC value and optimal parameter combination. Preliminary conclusions can be drawn from the overall variation trend of all colored surfaces. When the number of topological spaces Frame size and the subsampling percentage sub percent are small, the AUC values are low and fluctuate widely, representing poor detection effects. As the values of the two parameters increase gradually, the AUC increases, its variation range decreases, and the surface becomes smoother, indicating that the detection performance of TTAD tends to be stable.
TABLE III AUC VALUES OF TTAD UNDER DIFFERENT PARAMETER COMBINATIONS ON THE AVIRIS-1
TABLE IV AUC VALUES OF TTAD UNDER DIFFERENT PARAMETER COMBINATIONS ON THE AVIRIS-2
According to the above experimental results, Table VI provides the optimal settings of the parameters Frame size and sub percent at which AUC reaches its maximum on the different experimental datasets. In the proposed method, these two parameters are the key factors in the implementation of tree topology-based AD tasks: Frame size determines the number of parallel topological spaces, and sub percent determines the number of subsampled pixels required to build a framework for a single topology. The hyperspectral data to be processed are obtained by imaging various real complex scenes, so the number of pixels and the proportion of anomalous points differ from image to image. Therefore, the settings of the two parameters are crucial for the extraction of data features, which in turn affects the subsequent detection results. To avoid erroneous conclusions drawn from only a few experimental results, a range of values was examined for each parameter. First, this expands the search range for the optimal combination. Second, the sensitivity and variation trend of TTAD performance with respect to the parameter settings can be reasonably analyzed through the evaluation results within the given range. On the whole, when the values of the two parameters are small, the total utilization rate of pixels in an image during the redistribution of HSI data by constructing topological spaces in stage 1 is too low to ensure effective extraction of data features, resulting in unsatisfactory detection effects. By observing all 17 colored surfaces in Figs. 12-15, it can be concluded that with the increase of the parameter values, the detection effect of the proposed method improves steadily, and the corresponding fluctuation range of AUC decreases. This demonstrates that TTAD exhibits excellent detection effects and robustness within reasonable ranges of parameter settings. In practical applications, we recommend that Frame size and sub percent be set to at most 100 and 1.0%, respectively, to achieve the desired detection effect. It is worth noting that although the search range of parameter combinations in experiments is large, the upper limits of Frame size and sub percent are 100 and 1.0%, respectively, and such a subsampling scale is quite small relative to the whole image. This further demonstrates that the proposed method, with its low computation and memory consumption, is suitable for HSIs in various real complex scenes as expected.
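The scale of the subsampling can be made concrete with a small arithmetic sketch in Python. Both function names are hypothetical, and two assumptions are made that the text does not state: the per-topology sample count is taken as the rounded product of sub percent and the number of pixels, and the utilization bound assumes that no pixel is drawn twice across topologies.

```python
def pixels_per_topology(height, width, sub_percent):
    """Pixels drawn for one topology; sub_percent is given in percent units.

    Rounding to the nearest integer (and keeping at least one pixel) is an
    assumption made for this sketch.
    """
    return max(1, round(height * width * sub_percent / 100.0))

def max_utilization(frame_size, sub_percent):
    """Upper bound on the fraction of image pixels used over all topologies.

    The bound is reached only if no pixel is drawn twice (an assumption).
    """
    return min(1.0, frame_size * sub_percent / 100.0)
```

For the 80 × 100 HYDICE subimage, sub percent = 1.0% corresponds to 80 pixels per topology under this assumption. With Frame size = 10 and sub percent = 0.1%, at most 1% of the image is ever sampled, which is consistent with the observation that small parameter values leave the total utilization rate of pixels too low.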

D. Comparison of Detection Performance
In this article, TTAD is compared with other classical and advanced hyperspectral AD methods to further demonstrate the effectiveness and superiority of the proposed method. The algorithms selected for comparison in the experiments include 1) RXD (Global-RXD) [32]; 2) local RXD (Local-RXD) [64]; 3) segmented RXD (Segmented-RXD) [33]; 4) robust principal component analysis RXD (RPCA-RXD) [11]; 5) collaborative-representation-based detector (CRD) [47]; and 6) relative-mass-based detector (RMD) [65]. For algorithms using the sliding dual-window detection mode, the inner and outer windows are set according to the spatial size of the anomalies in each experimental dataset to achieve excellent detection results. According to the detailed information on the experimental datasets provided in Section III-B, for HYDICE, the outer and inner window sizes are set to 13 × 13 and 3 × 3, respectively; for AVIRIS-1, they are set to 15 × 15 and 5 × 5; for AVIRIS-2, they are 17 × 17 and 7 × 7; and for AVIRIS-WTC, they are 15 × 15 and 5 × 5. For the ABU dataset, there are 13 images in the scenes of airports, beaches, and urban areas, and each image has its own dual-window setting, chosen from 13 × 13 and 3 × 3, 15 × 15 and 5 × 5, and 17 × 17 and 7 × 7. On all benchmark hyperspectral datasets, the comparison experiments are carried out under the same conditions. Moreover, qualitative and quantitative indicators are used to evaluate the detection results fairly, so as to demonstrate and analyze the comprehensive performance of all algorithms.
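For readers unfamiliar with the sliding dual-window mode used by some of the comparison detectors, the following Python sketch shows how the background samples for one pixel under test can be collected from the ring between the inner and outer windows. The function name `dual_window_background` is hypothetical, and clipping the windows at the image border is an assumption; the compared implementations may treat borders differently.

```python
import numpy as np

def dual_window_background(cube, row, col, outer=13, inner=3):
    """Spectra inside the outer window but outside the inner window.

    `cube` has shape (height, width, bands). Windows are centered on
    (row, col) and clipped at the image border (an assumption).
    """
    h, w, _ = cube.shape
    half_outer, half_inner = outer // 2, inner // 2
    r0, r1 = max(0, row - half_outer), min(h, row + half_outer + 1)
    c0, c1 = max(0, col - half_outer), min(w, col + half_outer + 1)
    rows, cols = np.mgrid[r0:r1, c0:c1]
    # Pixels of the inner window are excluded so that a small anomaly
    # does not contaminate its own background statistics.
    in_inner = (np.abs(rows - row) <= half_inner) & (np.abs(cols - col) <= half_inner)
    return cube[rows[~in_inner], cols[~in_inner]]
```

With the HYDICE setting of 13 × 13 and 3 × 3, an interior pixel is compared against 169 - 9 = 160 surrounding spectra, which is why the inner window is matched to the expected anomaly size.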
For the HYDICE dataset, the processing results of all algorithms are linearly stretched to the range of 0-255. Fig. 16 shows 2-D maps of the detection results for all algorithms involved in the comparison, which are colored according to the magnitude of detection values. Due to the small spatial size of the anomalies in this scene, in addition to the highlighted pixels at the anomaly locations given by the ground truth, other high-brightness pixels appear in the results of different algorithms and may cause false alarms. In the map of Local-RXD, the values at background locations are lower overall, indicating a strong ability to suppress background interference. For the other six algorithms, it is difficult to accurately distinguish the difference in performance by visual effects alone. Fig. 17 provides both the ROC curves and the separability map for comparison. The abscissa display range of the ROC curves is 0-0.1, showing the PDs under low FARs. The curves of Global-RXD, RPCA-RXD, CRD, RMD, and TTAD in Fig. 17(a) are closer to the upper left; Local-RXD is inferior when FAR > 0.004, while Segmented-RXD is outstanding when FAR > 0.048. From Fig. 17(b), it is obvious that the blue box representing the background of Local-RXD is the lowest, indicating that it suppresses the values of the background part to a lower level, which corresponds to the visual effect shown in Fig. 16(d). However, compared with the other six algorithms, the red box representing the anomaly of Local-RXD is too low, indicating a poor ability to highlight the anomaly. Among the other six algorithms, the blue box of RPCA-RXD is slightly lower, while the blue boxes of Global-RXD, Segmented-RXD, CRD, RMD, and TTAD are located at similar positions, indicating that their suppression effects on the background are relatively close. In contrast, the red boxes of CRD, RMD, and TTAD are generally higher, which shows that the separability of anomaly and background is stronger in the detection results of these three algorithms.
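The linear stretch to 0-255 applied to the HYDICE detection maps can be sketched in Python as below. The function name `stretch_to_255` is hypothetical, and a min-max rescaling is assumed; the text states only that the stretch is linear.

```python
import numpy as np

def stretch_to_255(detection_map):
    """Min-max rescale a detection map to the range 0-255 (assumed form)."""
    d = np.asarray(detection_map, dtype=float)
    lo, hi = d.min(), d.max()
    if hi == lo:
        # A constant map carries no contrast; return zeros instead of dividing by 0.
        return np.zeros_like(d)
    return (d - lo) / (hi - lo) * 255.0
```

Because a stretch of this form preserves the ordering of the detection values, it changes only how the maps are displayed and leaves the ROC curve and AUC of each algorithm unchanged.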
For the AVIRIS-1 dataset, Fig. 18 presents the visual detection results colored according to the magnitude of values. Except for Local-RXD, the shapes of three planes are basically preserved in the detection results of the other six methods. Compared with Global-RXD, Segmented-RXD, and RPCA-RXD, the brightness of CRD, RMD, and TTAD at anomaly locations is generally higher, and the ability to highlight anomalous pixels is stronger. Moreover, the proposed method preserves the spatial information of anomaly to the greatest extent, so that the detection map has a superior visual effect. The ROC curves and separability plots in Fig. 19 further evaluate and compare the detection results between the algorithms. Obviously, the red curve corresponding to TTAD in Fig. 19(a) is closest to the upper left, and the PD of TTAD is higher than that of other comparison algorithms, indicating that the proposed method could obtain detection results with high confidence. The curve of RMD is second only to TTAD. And the positions of the curves for Global-RXD, Segmented-RXD, RPCA-RXD, and CRD are similar. The ROC evaluation results of CRD and RPCA-RXD are overall better than Global-RXD and Segmented-RXD, while Local-RXD performs worse. The separability map of Fig. 19(b) visually demonstrates the statistical range of values for anomaly and background locations in all detection results. Compared with the other six algorithms, TTAD has a higher position of the red  box associated with the anomaly, illustrating the superior ability to separate out the anomaly.
For the AVIRIS-2 dataset, Fig. 20 gives the 2-D colored maps for comparison, and the corresponding ROC curves and separability map are shown in Fig. 21. The visual detection results of all the algorithms involved in the comparison highlight the anomaly to varying degrees. Global-RXD, RPCA-RXD, CRD, and TTAD are slightly inferior to Local-RXD, Segmented-RXD, and RMD in the suppression of background interference. In the colored map of TTAD, however, pixels at the anomaly locations are very conspicuous, so its overall visual effect is ahead of the other algorithms. The ROC evaluation results in Fig. 21(a) further demonstrate the superior performance of the proposed method, with the PDs of RMD and TTAD being higher under the same FAR. The ROC curves of Global-RXD, Segmented-RXD, RPCA-RXD, and CRD are close to each other and interleaved. The AUC evaluation results are provided subsequently to distinguish the differences in the detection effects of the algorithms more accurately. Fig. 21(b) visualizes the separability between anomaly and background in the detection results. The intersection area between the red and blue boxes of Local-RXD is quite large, which makes it prone to false alarms. In contrast, the two boxes of Global-RXD, Segmented-RXD, RPCA-RXD, CRD, RMD, and TTAD are farther apart. Among them, the range of values at the anomaly locations of TTAD is at a higher level, which highlights the anomaly to the greatest extent, so its overall detection effect is the best.
For the AVIRIS-WTC dataset, the colored detection maps are shown in Fig. 22. The values of Local-RXD and CRD in the background part are well suppressed, but the highlighting effect for the anomaly is not obvious. In the maps of Global-RXD, Segmented-RXD, RPCA-RXD, RMD, and TTAD, pixels at the anomaly locations are brighter, and their detection results are all affected by background interference to a certain extent. The ROC evaluation results are shown in Fig. 23(a). It can be seen that the curves of RMD and TTAD lie closest to the upper left, leading the other algorithms by a prominent margin. When FAR > 0.008, the performance of RPCA-RXD is second only to RMD and TTAD. The overall trends of the curves of Global-RXD and Segmented-RXD are similar, and the trends of Local-RXD and CRD are similar. In the separability map of Fig. 23(b), although the blue boxes of Local-RXD and CRD have lower positions, the intersection of the two colored boxes is large in both algorithms. This indicates that a large part of the statistical ranges of detection results at the anomaly and background locations overlap, reflecting a poor separation between these two land covers. The detection results of the remaining Global-RXD, Segmented-RXD, RPCA-RXD, and RMD in the background part are at a roughly similar level. The separability degree of anomaly and background in RMD is the highest. TTAD, for its part, exhibits a rather low overall intersection of the two colored boxes despite the slightly higher position of the blue one, which shows that the proposed method can effectively separate the anomaly from the background.
For the ABU dataset, there are a total of 13 images in scenes such as airports, beaches, and urban areas. Figs. 24, 26, and 28 illustrate the colored maps of detection results for all algorithms in these three categories of scenes, respectively. In Fig. 24, the first to fourth rows correspond to Airport-1 to Airport-4, respectively; in Fig. 26, the first to fourth rows correspond to Beach-1 to Beach-4, respectively; in Fig. 28, the first to fifth rows correspond to Urban-1 to Urban-5, respectively. In addition, Figs. 25, 27, and 29 give the ROC curves and separability maps obtained from the detection performance comparison experiments on the datasets of the three scenes, respectively. It is worth mentioning that the ABU dataset is large in scale and contains multiple types of real complex scenes. There are two considerations behind the selection of this dataset: first, the robustness of the proposed method can be analyzed more comprehensively through the discussion on parameters in the previous stage; second, it can be further verified whether the proposed method is suitable for various scenes and exerts its unique advantages through the comparison of detection performance at this stage. The diversity of experimental datasets provides a sufficient basis for the trial and promotion of the research in this article in practical applications. As described above, AUC, as a widely used quantitative evaluation indicator, simultaneously examines the PD and the FAR, and can therefore evaluate the detection performance of an algorithm reasonably. Therefore, in order to compare all detection methods more concisely and intuitively, Table VIII summarizes the AUC evaluation results on all experimental datasets including ABU.
In addition to evaluating the detection ability of algorithms using indicators such as the colored map, ROC curve, and separability map, Table VII records the execution time of the algorithms involved in the comparison on all experimental datasets, so that computational efficiency is included in the analysis of comprehensive performance. In Table VIII, the maximum AUC on each HSI is marked in bold. Due to the large scale of the datasets and the many evaluation results obtained in experiments, this article adopts the Wilcoxon rank sum test to compare the performance of all participating algorithms fairly on a total of 17 HSIs [66]. The Wilcoxon SCORE table is drawn based on Table VIII. In Table IX, the 7 detection algorithms used for comparison are set as the column labels, and the row labels are the 17 experimental HSIs. Each AUC value is compared with every other AUC value in its row, and one point is awarded for each value it exceeds. Finally, all the points in each column are summed to obtain the Wilcoxon SCORE of the corresponding detection algorithm. The larger the Wilcoxon SCORE, the better the detection performance. In the last row of Table IX, the Wilcoxon SCORE achieved by TTAD is the highest among all the detection algorithms participating in the comparison. It can be concluded that the proposed method exhibits the best overall detection effect on all experimental datasets, demonstrating its strong adaptability to various real scenes with complexity. Moreover, a comparison of the total execution times of all algorithms in Table VII shows that TTAD takes longer than Global-RXD, Segmented-RXD, RPCA-RXD, and RMD, but is much faster than Local-RXD and CRD.
In summary, TTAD could achieve the best overall detection effect under the premise of considerable time consumption. The comparative experiments with other algorithms verify
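The scoring rule used to build the Wilcoxon SCORE table, as described in the text (one point for each AUC in the same row that a value exceeds, then summed per column), can be sketched in Python as follows. The function name `wilcoxon_score` is hypothetical, and the numbers in the usage example are made up for illustration; they are not taken from Table VIII.

```python
import numpy as np

def wilcoxon_score(auc_table):
    """Column-wise win counts for an (images x algorithms) AUC table.

    For each image (row), an algorithm earns one point for every other
    algorithm whose AUC it strictly exceeds; ties earn no point.
    """
    a = np.asarray(auc_table, dtype=float)
    # wins[i, j] = number of algorithms beaten by algorithm j on image i
    wins = (a[:, :, None] > a[:, None, :]).sum(axis=2)
    return wins.sum(axis=0)

# Illustrative values only: 2 images (rows) and 3 algorithms (columns).
toy_scores = wilcoxon_score([[0.90, 0.80, 0.70],
                             [0.60, 0.95, 0.60]])
```

With 7 algorithms and 17 HSIs, the largest score an algorithm can reach under this rule is 6 × 17 = 102, obtained only if it has the strictly highest AUC on every image.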

IV. CONCLUSION
In this article, an unsupervised AD method based on tree topology is proposed to address the lack of spectral prior information in practical applications. Point set topology is applied to HSI processing, which avoids relying on specific model assumptions to extract features of land covers whose distributions are complex in real scenes. The proposed method fully exploits the "few and different" characteristics of anomalous data points that are sparsely distributed and far away from high-density populations, and constructs a topological space through tree-structured mapping to realize the redistribution of HSI data. Through the geometric deformation of space, the hidden and potential differences in data features between the anomaly and the background in the original space are presented in an intuitive and simple form. On this basis, a measurement called "topological cardinality" is developed to quantify the abnormality degree of all samples in a dataset for detection output. Finally, the AD task is performed in parallel topological spaces, and parallel measurements of topological cardinality are implemented in multiple tree topologies according to ensemble learning theory to boost the detection performance. Extensive experimental results on HSIs of various scenes in both natural and artificial environments demonstrate that the proposed TTAD exhibits superior detection effects and robustness within a reasonable range of parameter settings. Compared with other classical and advanced AD algorithms on all experimental datasets, TTAD achieves the best overall detection performance as measured by the Wilcoxon SCORE. The proposed method is capable of adapting to various complex scenes without excessive time consumption, and this comprehensive performance makes it promising for practical applications.