Feature Selection for Cross-Scene Hyperspectral Image Classification via Improved Ant Colony Optimization Algorithm

Hyperspectral images (HSIs) generally contain a large number of spectral bands (features), and the redundant information among them causes the Hughes phenomenon during classification. Feature extraction and feature selection are the two main approaches to reducing spectral redundancy in HSI classification. Compared with feature extraction, feature selection preserves most of the original features of the HSI data without losing their valuable details. However, most existing feature selection methods are designed for a single scene (domain) and perform poorly in scenes (domains) with insufficient labeled samples. It therefore remains challenging to select the optimal feature subsets of a source scene and a target scene efficiently, and to use the source-scene samples to assist target-scene classification so that the target-scene classification accuracy is as high as possible. To address this problem, this paper proposes a new cross-scene algorithm: the Improved Ant Colony Optimization Algorithm-Based Cross-Scene Feature Selection Algorithm (IMACO-CSFS). To obtain more accurate feature subsets of the two scenes, IMACO-CSFS uses a priority sorting-based ant colony strategy that makes the subsequent search focus on the global optimal solution (optimal feature subset) found in the previous iteration. In addition, to further accelerate the convergence of the global optimal solution, IMACO-CSFS adopts an elite ant-based ant colony strategy that obtains the optimal feature subsets of the two scenes for training the classifier more efficiently.
Furthermore, this paper simultaneously considers the overall classification accuracies of the optimal feature subsets of both scenes and dynamically adjusts their size to keep the selected features consistent between the two scenes, which attenuates the effect of spectral shift and yields higher classification accuracy in the target scene. Experimental results on three cross-scene HSI data pairs demonstrate that IMACO-CSFS outperforms the compared methods in cross-scene feature selection and cross-scene HSI classification.


I. INTRODUCTION
Hyperspectral images (HSIs) typically comprise dozens or hundreds of spectral bands, which contain rich spatial and spectral information about ground objects [1], so they have been widely used in agriculture, forestry, environment, ecology, and other fields [2], [3]. However, HSIs still suffer from redundant features and high data dimensionality [4], which may cause the Hughes phenomenon during HSI classification. (The Hughes phenomenon refers to classifier performance first increasing and then decreasing as the number of features/bands used in the analysis and processing of HSIs grows.) Therefore, [...] of the target scene, spectral shift is almost always encountered, because the same type of ground object can have a different material composition at different times and scales, and because the sensors that acquire the data of the two scenes have different nonlinear responses. Reducing spectral shift is therefore particularly important. Methods such as cross-perspective learning [27] and transfer learning [28] have been proposed to address this problem. For example, [29] computes the distance between the conditional distributions of two scenes in a reproducing kernel Hilbert space based on a new dataset shift measure, and [30] uses the multi-kernel maximum mean discrepancy (MK-MMD) within a deep adaptation network (DAN) to align the deep features of the source and target scenes.

The associate editor coordinating the review of this manuscript and approving it for publication was Gerardo Di Martino.
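To make the idea of measuring spectral shift concrete, the sketch below estimates the squared maximum mean discrepancy (MMD) between source-scene and target-scene spectra; MK-MMD [30] generalizes this quantity to a combination of kernels. It is a minimal illustration under stated assumptions, not the method of [29] or [30]: it uses a single Gaussian kernel with a hand-picked bandwidth `sigma`, the biased estimator, and toy three-band spectra, and the function names are hypothetical.

```python
import math
from typing import Sequence

Spectrum = Sequence[float]

def gaussian_kernel(x: Spectrum, y: Spectrum, sigma: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs: Sequence[Spectrum], ys: Sequence[Spectrum], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2_biased(source: Sequence[Spectrum], target: Sequence[Spectrum], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between source and target samples."""
    return (mean_kernel(source, source, sigma) + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))

# Toy spectra (hypothetical): identical scenes versus a uniformly shifted target scene.
src = [[0.1, 0.2, 0.3], [0.2, 0.2, 0.4]]
tgt_same = [[0.1, 0.2, 0.3], [0.2, 0.2, 0.4]]
tgt_shifted = [[0.6, 0.7, 0.8], [0.7, 0.7, 0.9]]
print(mmd2_biased(src, tgt_same), mmd2_biased(src, tgt_shifted))
```

The estimate is zero when the two sample sets coincide and grows as the target spectra drift away from the source spectra, which is the behavior a spectral-shift measure needs.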
In [31], I-ReliefF (Iterative ReliefF) was applied to cross-scene classification, and experiments with the resulting CDIRF (Cross-Domain Iterative ReliefF) method on three cross-scene datasets validated its superiority in cross-scene feature selection and in reducing the spectral shift between two scenes.

The ant colony optimization algorithm (ACO) is a classical nature-inspired heuristic random search algorithm. Because its positive feedback mechanism helps the ant colony find the optimal solution in a short time, ACO has been widely used in image processing, feature/band selection for HSI data, and other fields [32], [33]. However, it is not well suited to direct use for feature/band selection because of shortcomings such as a slow convergence rate and a significant loss of population diversity over the iterations [34], [35]. To address these shortcomings, the Improved Ant Colony Optimization Algorithm-Based Band Selection Algorithm (IMACO-BS) recently proposed in [34] builds on ACO and uses a new pre-filter to reduce the number of candidate bands (n), which accelerates the convergence of the standard ant colony optimization algorithm. In addition, IMACO-BS introduces an adaptive information updating strategy to prevent the ant colony from falling into local optima. Although it converges quickly and rarely falls into local optima, this method is only suitable for selecting the optimal band subset of a single scene. With the development of HSI classification technology, using the information of two scenes to handle high-dimensional cross-scene HSI classification is becoming increasingly important. For instance, Zhang et al. [31] extended the iterative ReliefF method to a cross-scene HSI classification method, CDIRF (Cross-Domain Iterative ReliefF), which extracts effective features from redundant spectral bands for subset evaluation and improves the classification accuracy of the target scene.
Therefore, this paper applies and improves IMACO-BS to solve the feature selection and image classification problems of cross-scene HSIs; the new algorithm is called cross-scene IMACO-FS (IMACO-CSFS).
At present, some related works have applied ACO to HSI classification and feature selection with some success. For example, [36] proposes a remote sensing image classification technology based on the optimal support vector machine and [...]
VOLUME 10, 2022
[...] search process can focus on the global optimal solution found so far. This strategy obtains more accurate feature subsets of the source and target scenes than other cross-scene feature selection methods;
2) Because the optimal solution of the original IMACO-FS converges too slowly in each iteration, this paper proposes an elite ant-based ant colony strategy: to make the current optimal solution more attractive to the ants in the next iteration, the strategy gives the optimal solution extra pheromone after each iteration. This strategy further accelerates the convergence of the global optimal solution of IMACO-FS, so that the optimal feature subsets of the source and target scenes for training the classifier are obtained more efficiently;
3) This paper successfully extends IMACO-FS, a feature selection method based on a single scene only, to cross-scene HSI feature selection and cross-scene image classification. It further improves the pheromone update strategy of the ant colony algorithm, weakens the influence of spectral drift on cross-scene image classification, and substantially improves the feature selection accuracy and image classification accuracy of the target scene.

The rest of this paper is organized as follows. Section II introduces the basics of ACO and IMACO-FS. Section III introduces IMACO-CSFS, a cross-scene optimal feature selection algorithm that extends the original single-scene IMACO-FS algorithm to cross-scene feature selection. Section IV compares several algorithms on three cross-scene HSI data pairs and analyzes the experimental results, demonstrating the superiority of the proposed IMACO-CSFS algorithm. Finally, Section V summarizes the work of this paper.

A. ANT COLONY OPTIMIZATION ALGORITHM (ACO)
ACO is a heuristic global optimization algorithm for finding optimal paths. It was first proposed by Marco Dorigo in his doctoral dissertation in 1992. The algorithm features distributed computation, positive feedback of information, and heuristic search [49]. ACO imitates the way ants find the optimal path while foraging; this principle is illustrated in Fig. 1. As ants travel, they leave behind a volatile secretion called pheromone. Ants can sense this substance while foraging and tend to follow paths with higher pheromone concentrations. Each passing ant leaves further pheromone on the path, forming a positive feedback mechanism through which the colony eventually finds the best path [50].
[...]
In IMACO-FS, the set $A = \{A_1, A_2, \ldots, A_n\}$ represents n graph nodes (each node represents an HSI feature), [...] represents a subset of features selected by the ants. Each [...] [40], which is set to n/2 in this paper).

The heuristic expectation $\eta_{ij}(t)$ is computed by formula (1), where $O_{ij}$ is the overall accuracy (OA) obtained by the SVM using feature i and feature j.
The pheromone between feature i and the other n − 1 features is computed by formula (2).
where $O_{ij}$ is the OA achieved by the SVM with feature i and feature j, and $O_{max}$ and $O_{min}$ are the maximum and minimum OA achieved by the SVM between feature i and the remaining n − 1 features. The ants' state transition rule is as follows:
where $q_0 = 1 - e^{-1/s}$ ($s = 1, 2, \ldots, I_M$, where s is the iteration number), q is a random number between 0 and 1, and $p_{ij}^k(t)$ is the probability that ant k selects feature node j as its next node at time t, calculated as follows:
where α and β are the weights of the pheromone and the heuristic information, respectively, T is the total number of iterations ($I_M$), $\tau_{ij}(t)$ and $\eta_{ij}(t)$ are the pheromone concentration and the heuristic expectation on the edge $(A_i, A_j)$ at time t, respectively, and $tabu_k$ stores the feature nodes that ant k has already visited.
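The state transition rule above can be sketched as follows. This is a minimal sketch that assumes the standard ant-colony-system form of the pseudo-random proportional rule: exploit (take the edge maximizing $\tau_{ij}^\alpha \eta_{ij}^\beta$) when $q \le q_0$, otherwise sample j with probability proportional to $\tau_{ij}^\alpha \eta_{ij}^\beta$ over unvisited features. The paper's own formulas also involve T and are not reproduced here, so `next_feature`, `q0_schedule`, and the toy matrices are hypothetical.

```python
import math
import random
from typing import Sequence, Set

def q0_schedule(s: int) -> float:
    """Exploitation threshold q0 = 1 - exp(-1/s) at iteration s = 1, 2, ..., I_M."""
    return 1.0 - math.exp(-1.0 / s)

def next_feature(i: int, tau: Sequence[Sequence[float]], eta: Sequence[Sequence[float]],
                 tabu: Set[int], alpha: float, beta: float, s: int,
                 rng: random.Random) -> int:
    """Pick the next feature node j for an ant at node i (assumed ACS-style rule)."""
    allowed = [j for j in range(len(tau)) if j != i and j not in tabu]
    if not allowed:
        raise ValueError("no unvisited feature nodes left")
    # Attractiveness tau_ij^alpha * eta_ij^beta of each allowed edge (A_i, A_j).
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed]
    if rng.random() <= q0_schedule(s):
        # Exploitation: move along the most attractive edge.
        best = max(range(len(allowed)), key=lambda idx: weights[idx])
        return allowed[best]
    # Exploration: sample j with probability p_ij^k = weight_j / sum(weights).
    return rng.choices(allowed, weights=weights, k=1)[0]

# Hypothetical 4-feature example with uniform pheromone and pairwise-OA heuristics.
tau = [[1.0] * 4 for _ in range(4)]
eta = [[0.0, 0.9, 0.5, 0.7],
       [0.9, 0.0, 0.6, 0.4],
       [0.5, 0.6, 0.0, 0.8],
       [0.7, 0.4, 0.8, 0.0]]
rng = random.Random(0)
print(next_feature(0, tau, eta, tabu={0}, alpha=1.0, beta=1.0, s=1, rng=rng))
```

Under this schedule $q_0 \approx 0.632$ at s = 1 and decreases toward 0 as s grows, so exploitation of the currently most attractive edge becomes rarer in later iterations.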

After each ant completes one cycle (iteration), the pheromone concentration on the path is updated as follows:
where $\tau_{ij}(t+1)$ and $\tau_{ij}(t)$ are the pheromone concentrations on the edge $(A_i, A_j)$ at time t + 1 and time t, respectively, and $\tau_{ij}^k$ is the pheromone newly deposited by ant k on the edge $(A_i, A_j)$, calculated as follows:
where Q is the initial amount of pheromone secreted by the ants at the beginning of an iteration (a constant), $O_{max}$ is the maximum OA achieved by the SVM between feature i and the other o − 1 features in the feature subset $F_k$, t denotes time t, and T is the total number of iterations ($I_M$).
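The update above can be sketched as evaporation followed by deposits. This sketch assumes the common ACO form $\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \sum_k \tau_{ij}^k$ with an evaporation rate ρ (the parameter whose range for IMACO-FS is given in Section IV). It takes each ant's deposit $\tau_{ij}^k$ as an input rather than reproducing the deposit formula above, and treating the feature graph as undirected is also an assumption; `update_pheromone` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]

def update_pheromone(tau: List[List[float]], rho: float,
                     deposits: Sequence[Dict[Edge, float]]) -> List[List[float]]:
    """Return tau(t+1) = (1 - rho) * tau(t) + sum_k tau^k (assumed standard form)."""
    n = len(tau)
    # Evaporation: every edge loses a fraction rho of its pheromone.
    new_tau = [[(1.0 - rho) * tau[i][j] for j in range(n)] for i in range(n)]
    # Deposits: each ant k adds its own amount on the edges of its path.
    for ant_deposit in deposits:
        for (i, j), amount in ant_deposit.items():
            new_tau[i][j] += amount
            new_tau[j][i] += amount  # assumption: undirected feature graph
    return new_tau

tau = [[1.0] * 3 for _ in range(3)]
# Hypothetical deposits of two ants; in the paper they come from the deposit formula.
new_tau = update_pheromone(tau, rho=0.2, deposits=[{(0, 1): 0.5}, {(1, 2): 0.3}])
print(new_tau[0][1], new_tau[0][2])
```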

Common HSI feature selection methods generally select the optimal features based on a single scene only. The [...]

The optimal path found so far by ant k receives additional pheromone $\tau_{ij}^*$, and the edges traversed by the remaining P − 1 ants also receive a certain amount of pheromone $\tau_{ij}^k$ ($k = 1, 2, \ldots, P-1$). The priority sorting-based ant colony strategy can thus search further around the current optimal solution δ and obtain more accurate optimal feature subsets $M_s$ and $M_t$ in the source and target scenes.

In addition, because the optimal solution δ of the original IMACO-FS converges too slowly in each iteration, this paper proposes an elite ant-based ant colony strategy: to make the current optimal solution δ more attractive to the ants in the next cycle, the strategy gives the optimal solution additional pheromone $\tau_{ij}^{**}$ after each iteration. The optimal solution $\delta^*$ obtained after $I_M$ iterations is the global optimal solution, and the ant that finds it is the elite ant. This strategy further accelerates the convergence of the global optimal solution $\delta^*$ in the original IMACO-FS and thereby obtains the optimal feature subsets $M_s$ and $M_t$ in the source and target scenes more efficiently.
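The priority sorting and elite ant strategies can be combined into one update step, sketched below in the spirit of the rank-based and elitist ant systems. This is a minimal sketch under explicit assumptions, not the paper's formulas (8), (9), and (10): the weight $(P-k)/P$ for the rank-k ant, deposits proportional to Q times the OA of each ant's feature subset, the `elite_weight` factor, and the reading of $\tau_{ij}^*$ as the top-ranked ant's deposit and $\tau_{ij}^{**}$ as a further reinforcement of the best-so-far solution δ* are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

Solution = Tuple[Sequence[int], float]  # (feature nodes visited in order, OA of the subset)

def rank_elitist_update(tau: List[List[float]], rho: float, colony: Sequence[Solution],
                        best_so_far: Solution, Q: float = 1.0,
                        elite_weight: float = 1.0) -> List[List[float]]:
    n = len(tau)
    # Evaporation on every edge.
    new_tau = [[(1.0 - rho) * tau[i][j] for j in range(n)] for i in range(n)]

    def deposit(path: Sequence[int], amount: float) -> None:
        # Add pheromone on every consecutive edge (A_i, A_j) of the path (undirected).
        for i, j in zip(path[:-1], path[1:]):
            new_tau[i][j] += amount
            new_tau[j][i] += amount

    # Priority sorting: rank this iteration's ants by OA, best first.
    ranked = sorted(colony, key=lambda sol: sol[1], reverse=True)
    P = len(ranked)
    top_path, top_oa = ranked[0]
    deposit(top_path, Q * top_oa)                      # role of tau_ij^*
    for k, (path, oa) in enumerate(ranked[1:], start=1):
        deposit(path, Q * oa * (P - k) / P)            # role of tau_ij^k, k = 1..P-1
    elite_path, elite_oa = best_so_far
    deposit(elite_path, elite_weight * Q * elite_oa)   # role of tau_ij^**
    return new_tau

colony = [([0, 1], 0.9), ([1, 2], 0.8), ([2, 3], 0.7)]
tau = [[1.0] * 4 for _ in range(4)]
new_tau = rank_elitist_update(tau, rho=0.5, colony=colony, best_so_far=([0, 1], 0.9))
print(round(new_tau[0][1], 4), round(new_tau[1][2], 4), round(new_tau[2][3], 4))
```

Ranking lets lower-ranked ants still contribute a shrinking amount of pheromone, which helps preserve diversity, while the elite term concentrates pheromone on δ* to speed up convergence.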

The newly proposed pheromone update formula is as follows:
where $\tau_{ij}^k$ is the pheromone update on the path contributed by each of the P − 1 ants according to its rank k, calculated by formula (8); $\tau_{ij}^*$ is calculated by formula (9); and $\tau_{ij}^{**}$ is calculated by formula (10).
Here, Q is the initial value of the pheromone secreted [...] (12):

Input:
The labeled samples of the source scene $T_s$; the labeled samples of the target scene $T_t$.

Output:
Objective function $\delta^*$.
1: Train a classifier using a randomly selected 1/10 of the labeled target-scene samples and calculate $D_t$ using equation (12) (see Table 1).
2: Train a classifier using a randomly selected 1/10 of the labeled source-scene samples and calculate $O_s$ (see Table 1).
3: Train a classifier using a randomly selected 1/10 of the labeled target-scene samples and calculate $O_t$ (see Table 1).
4: Use (11) to compute $\delta^*$.
5: return $\delta^*$

Input:
Parameter µ; pheromone α; the labeled samples of the target scene $T_t$; the labeled samples of the source scene $T_s$.

Output:
Objective function $\delta^*$.
1: Initialize pheromone α using (1).
[...]
Table 2, which makes it well suited to cross-scene HSI classification experiments. For a better comparison, we use Houston 2018 and Houston 2013 images of 935 × 209 pixels in our experiments. Fig. 4 shows their false color maps and ground truth maps (the color of each land cover class is marked next to the ground truth map).
The second data pair is University of Pavia (source scene) and Center of Pavia (target scene). The Pavia dataset was acquired by the DAIS hyperspectral sensor. The PaviaU (University of Pavia) image is 243 × 243 pixels with 72 bands, while the PaviaC (Center of Pavia) image is 400 × 400 pixels, also with 72 bands. Although the two images differ in size, they share 6 land cover classes; the distribution of their labeled samples is shown in Table 3.
[...] It performs I-ReliefF on the source scene and the target scene, respectively. CDIRF proposes a cross-scene feature weight update rule and uses two distance measures in the experiments: CDIRF1 uses the absolute distance and CDIRF2 uses the squared Euclidean distance, and CDIRF2 performs better overall. To evaluate feature selection performance, CDIRF uses an SVM with a radial basis function (RBF) kernel as the classifier. The parameters σ ∈ [...]
[...] It performs the hybrid whale optimization algorithm with simulated annealing (WOASA) on the source and target scenes, respectively. CDWOASA selects feature subsets that are discriminative and scene-invariant by using information [...]
[...] it uses a combination of a particle swarm algorithm and a minimum distance classifier to select the optimal OBFs.
PSO-OBFS also uses the chaotic random inertia weight, which has been regarded as one of the best inertia weight strategies [51]. Furthermore, PSO-OBFS linearly decreases the acceleration coefficient $c_1$ from 2.5 to 0.5 and linearly increases the acceleration coefficient $c_2$ from 0.5 to 2.5 to avoid being trapped in local optima [52].
[...] It performs DRB-RESNET on a single scene. DRB-RESNET uses an improved deep residual network to extract the spatial and spectral information of the hyperspectral image after dimensionality reduction, forming a fully connected two-branch feature extraction network. It mainly uses two networks of different depths: the upper network feeds the processed image patches of the same size into a DRB-ResNet of a specified depth (11 layers) for training and sums the extracted features of each region; the lower network first reduces the dimension of the spectral information of each pixel to be classified, reshapes it into an 11 × 11 image, and feeds it into a DRB-ResNet (7 layers) for feature extraction. Finally, the features extracted by the upper and lower branches are combined, and a fully connected network produces the final classification result [53].
7) IMACO-FS (Feature Selection Based on Improved ACO): It selects the optimal features based only on the labeled samples of a single scene (the target scene) to achieve the highest possible overall classification accuracy. A new pre-filtering method is introduced into IMACO-FS to optimize the pheromone initialization of the ant colony system, which speeds up its convergence; IMACO-FS also adopts a pseudo-random rule and an adaptive information update strategy to maintain the diversity of the ant colony.
In IMACO-FS, the P (a user-defined number) features with the highest OA when paired with feature i are selected, and P is set to n/2, where n is the total number of bands [40]. IMACO-FS initializes the control parameters [...] [45], [46], and the initial values of β and ρ are set to 1 and 0.2, respectively. β is an integer between 1 and 10 that changes in steps of 1, while ρ is a decimal between 0.1 and 0.9 that changes in steps of 0.1 [34].
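The candidate pre-filter and the parameter ranges described above can be sketched as follows. This sketch assumes that `oa` is a symmetric n × n matrix of pairwise OA values $O_{ij}$ already computed by the SVM, and it reads the pre-filter as keeping, for each feature i, the P = n/2 partner features j with the highest $O_{ij}$. The section does not say when β and ρ move up or down, so the `direction` argument and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def candidate_lists(oa: Sequence[Sequence[float]], P: Optional[int] = None) -> Dict[int, List[int]]:
    """For each feature i, keep the P partner features j with the highest O_ij."""
    n = len(oa)
    if P is None:
        P = n // 2  # P = n/2 (integer division assumed when n is odd)
    candidates: Dict[int, List[int]] = {}
    for i in range(n):
        partners = [j for j in range(n) if j != i]
        partners.sort(key=lambda j: oa[i][j], reverse=True)
        candidates[i] = partners[:P]
    return candidates

def step_parameter(value: float, step: float, low: float, high: float, direction: int) -> float:
    """Move beta or rho by one step (direction = +1 or -1), clamped to [low, high]."""
    # Rounding avoids drift such as 0.2 + 0.1 == 0.30000000000000004.
    return min(high, max(low, round(value + direction * step, 10)))

# Hypothetical pairwise OA matrix for n = 4 features.
oa = [[0.0, 0.9, 0.5, 0.7],
      [0.9, 0.0, 0.6, 0.4],
      [0.5, 0.6, 0.0, 0.8],
      [0.7, 0.4, 0.8, 0.0]]
print(candidate_lists(oa))
# rho moves within [0.1, 0.9] in steps of 0.1; beta within [1, 10] in steps of 1.
print(step_parameter(0.2, 0.1, 0.1, 0.9, +1), step_parameter(10, 1, 1, 10, +1))
```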

8) Cross-Scene IMACO-FS (IMACO-CSFS): IMACO-CSFS is the algorithm newly proposed in this paper and is executed on the source scene and the target scene, respectively. IMACO-CSFS simultaneously considers the consistency of the optimal feature subsets of the two scenes and the discriminability of different land cover classes in the target scene. In the experiment, we use an SVM with an RBF kernel and the squared Euclidean distance as the classifier. The maximum number of iterations $I_M$ in equation (12) is set to 100; the ant colony size n on each data pair is set to 5; the number of ants P in the colony is set to about 1.5 times the number of bands (the initial value of P is 0.1 times the number of bands, and it is gradually increased in steps of 0.1 times until it [...]
[...] Fig. 7 and Fig. 8. In Fig. 7, (a) [...] are shown in Table 5. Table 5 shows that the $D_t$ in the first [...] Table 1 for details). [...] finally makes the features selected in the right part of Fig. 13 show a "pile"-like shape. Therefore, the features selected by IMACO-CSFS around the optimal solution are more evenly distributed than those selected by other methods: IMACO-CSFS tends to select more dispersed and uniform features, which benefits from the priority sorting-based ant colony strategy. Fig. 14 shows the classification maps of several algorithms on the PaviaC dataset when $F_n = 40$. Combining Fig. 5 and Fig. 14 [...]
Fig. 15 shows the classification accuracy of several algorithms under different feature dimensions on the Hangzhou dataset. As shown in Fig. 15, the OA, AA, and κ of IMACO-FS and IMACO-CSFS almost all reach their maximum at $F_n = 8$. Overall, the OA and κ of IMACO-CSFS are almost always higher than those of the other algorithms; however, the AA of IMACO-CSFS is lower than theirs when $F_n$ is large, and this disadvantage becomes more pronounced as $F_n$ increases.
Note that when $F_n \in \{8, 10, 12\}$, the κ of CDWOASA is slightly higher than that of CDIRF, which is consistent with the slightly higher κ of CDWOASA than CDIRF on the Hangzhou dataset in Table 7 below. This may be because the mutual information used by CDWOASA yields slightly stronger agreement between the classification map and the ground truth map of the Hangzhou dataset than CDIRF does. The features selected by the eight algorithms when $F_n = 20$ are shown in Fig. 16, which shows that IMACO-CSFS tends to choose more uniform and dispersed features. Fig. 17 shows the classification maps of the eight algorithms on the Hangzhou dataset when $F_n = 8$, together with the original ground truth. Combining Fig. 6 and Fig. 17, IMACO-CSFS performs well on the land cover classes Water and Land/Building, but its classification of the class Plant is worse than that of the other seven methods, possibly because IMACO-CSFS has a weaker ability to recognize this class.
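Because the discussion above turns on the gap between OA, AA, and κ, the sketch below shows how the three scores are conventionally computed from a confusion matrix (rows are ground-truth classes, columns are predicted classes). It is a generic illustration, not the authors' evaluation code, and it assumes every class has at least one labeled sample. The toy matrix is hypothetical and shows how a poorly recognized minority class can leave OA high while pulling AA down.

```python
from typing import Sequence, Tuple

def classification_scores(confusion: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (OA, AA, kappa) from a confusion matrix (rows = truth, cols = prediction)."""
    c = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(c))
    oa = correct / total  # overall accuracy
    # Average accuracy: mean of per-class accuracies (assumes no empty class row).
    aa = sum(confusion[k][k] / sum(confusion[k]) for k in range(c)) / c
    # Expected chance agreement from row (truth) and column (prediction) totals.
    pe = sum(sum(confusion[k]) * sum(row[k] for row in confusion) for k in range(c)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Hypothetical two-class example: majority class perfect, minority class half wrong.
oa, aa, kappa = classification_scores([[90, 0], [5, 5]])
print(f"OA={oa:.3f} AA={aa:.3f} kappa={kappa:.3f}")
```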

Firstly, this paper compares IMACO-CSFS with the original IMACO-FS. Table 6 [...] of IMACO-CSFS on the Pavia data pair is 7.9 seconds longer than that of DRB-RESNET, and the computation time of IMACO-CSFS on the Shanghai data pair is 4.7 seconds longer than that of DRB-RESNET. On all three data pairs, DRB-RESNET always has the shortest computation time, followed by IMACO-CSFS (the method proposed in this paper), while the original IMACO-FS takes the longest; the computation times of all eight methods are shown above the bars in Fig. 18. Furthermore, the execution times of these algorithms on the Pavia data pair are generally shorter than those on the Houston and Shanghai-Hangzhou data pairs, possibly because the former has fewer labeled samples than the latter. The reason why the running time of the algorithms on the Houston data pair is greater than that [...]
[...] In this paper, a new cross-scene feature selection algorithm, IMACO-CSFS, is proposed. To make the subsequent search focus on the global optimal solution (optimal feature subset) found in the previous iteration, IMACO-CSFS uses a priority sorting-based ant colony strategy that obtains more accurate feature subsets of the two scenes than other cross-scene feature selection methods. Furthermore, to further accelerate the convergence of the global optimal solution, IMACO-CSFS introduces an elite ant-based ant colony strategy to obtain the optimal feature subsets of the two scenes more efficiently.
Finally, this paper successfully extends the original single-scene IMACO-FS method to cross-scene HSI feature selection and image classification, further improves the pheromone update strategy of the ant colony algorithm, and substantially improves the feature selection accuracy and image classification accuracy of the target scene. The superiority of IMACO-CSFS is demonstrated on three public HSI data pairs: the Houston, Pavia, and Shanghai data pairs. However, the method is not perfect: the improvement in AA in this research was small. Improving AA will be part of our future work, for example by selecting feature subsets M from the candidate feature subset F to find the best combination of AA, OA, and µ.
[...] supervised classification of remotely sensed hyperspectral images," IEEE [...]