Hyperspectral Image Band Selection Based on CNN Embedded GA (CNNeGA)

Hyperspectral images (HSIs) are a powerful source of reliable data in various remote sensing applications. However, because of their large number of bands, HSIs contain redundant information, and methods are often needed to reduce the number of spectral bands. Band selection (BS) is used as a preprocessing step to reduce data volume, increase processing speed, and improve accuracy. However, most conventional BS approaches cannot fully capture the interaction between spectral bands or evaluate the representativeness and redundancy of the selected band subset. This study examines a supervised BS method that allows the required number of bands to be selected. A deep network with 3D-convolutional layers (3D-CNN) is embedded in a genetic algorithm (GA), and the GA uses the embedded 3D-CNN as its fitness function (CNNeGA). The GA also includes a parent check box, which inspects the parent chromosomes (parent subbands) and is designed to make the genetic operators more effective. In addition, the effect of adding an attention layer to the 3D-CNN and of converting the model to a spiking neural network is investigated in terms of accuracy and time complexity. The evaluation of the proposed method gives satisfactory results. Accuracy improved by 6% to 21% over the competing methods, and accuracy between 90% and 99% was obtained in each evaluation mode.


I. INTRODUCTION
REMOTE sensing can be used to gather information about objects or events on the earth's surface [1]. Hyperspectral imaging (HSI), made possible by advances in imaging spectrometer technology, uses spectral differences to separate terrestrial objects. However, HSI creates an enormous amount of data that can be challenging to process quickly. It can therefore be difficult to apply standard image processing methods designed for multispectral images [2], [3]. Without question, HSI data offer scientists and researchers a wealth of knowledge. However, because analyzing these data takes a lot of computing power, it is frequently difficult to make the most of HSI's potential. Large datasets may make common image processing algorithms inefficient, and classification accuracy may decrease. Several important factors can hinder classification in large datasets, including the following. 1) A class imbalance in a large dataset.
2) The existence of noise in the training data.
3) Variety and a high number of classes. 4) Nonintegrated categories (nonintegrated pixels in each class). 5) Closeness and similarity of classes, etc. New dimension reduction techniques and procedures are required to process HSI properly and overcome these obstacles. Dimension reduction minimizes the number of dimensions in the data while keeping most of the information. This helps unlock the full potential of HSI data while minimizing cost and time requirements [4], [5], [6]. HSI dimensions can be reduced by selecting bands or by extracting features [7]. By applying specified criteria, feature extraction transforms the original data into a different attribute space [8], [9]. Most feature extraction approaches combine all of the original bands linearly. Adopting an inappropriate criterion can be troublesome because it makes the findings more difficult to evaluate [10]. Many feature extraction techniques include band selection (BS) as a preprocessing phase. Choosing a subset of desired bands from the original bands can enhance the interpretability of the results. This keeps the spectral meaning intact and boosts efficiency [11]. Feature extraction methods are essential for reducing the amount of data that needs to be processed, since they allow less data to be handled while the most crucial information is retained. However, not all feature extraction methods are equally suitable: they can often compromise or distort vital features in the data, which can significantly affect how the data are interpreted physically [12], [13].
When working with high-dimensional data, such as HSI, feature selection is crucial. Lowering the number of dimensions simplifies the data and makes them easier to understand and work with. Feature selection techniques help maintain the data's physical interpretation while lowering the dimension, so this essential step in HSI analysis should be given careful consideration [14]. In this way, the selected features remain related to the original data and preserve their meaning. Bands that are irrelevant or redundant with respect to the ground cover classes are frequently eliminated when choosing bands for HSI. However, analyzing these data can be difficult owing to the Hughes effect, which arises when the number of samples is significantly smaller than the number of features, as is frequently the case with HSI data. As illustrated in Fig. 1 [(a) basic framework of BS and (b) stages of the BS process], the general framework for BS typically consists of four major steps: 1) generation of band subsets; 2) evaluation of band subsets; 3) stopping criterion; and 4) validation of results [15]. The bands of an HSI play an important part in identification tasks, so they must be chosen carefully. Following these steps carefully helps ensure that the final selection of HSI bands is both relevant and accurate [16], [17]. The two main types of BS methods are supervised and unsupervised. Supervised methods require prior knowledge of the data, whereas unsupervised methods do not. The main difference between the two is how the BS criteria are constructed: supervised methods use predetermined design criteria, whereas unsupervised methods use the data themselves to determine which bands are most important.
Because each type of method has benefits and drawbacks, the one best suited to the application must be chosen [18]. The proposed models are more accurate when a smaller number of bands is processed. The results show that the number of selected spectral bands, the errors obtained, and the trend in the proposed models correspond to local or global minima that determine the performance and accuracy of the optimal model. In addition, the choice of supervised method is addressed by selecting the method that yields the most accurate model [19].

II. RELATED WORK
Principal component analysis (PCA) and spectral eigendecomposition are potent methods for dimensionality reduction and data analysis. PCA finds the linear combinations of variables that best explain the variance in a dataset, which can be used to reduce the number of dimensions while preserving most of the information. In [20], PCA is applied to the spectral dataset to prioritize bands so as to maximize variance or SNR. In [21], supervised and unsupervised methods were combined to create a precise HSI BS system. Instance-based supervised neural computing was integrated with unsupervised techniques, such as information entropy, information divergence, and PCA-based band ranking. This produced better performance than using only supervised or only unsupervised techniques. Combining several methods enables a more precise selection of the bands most relevant to the task at hand, improving classification accuracy. In [22], four linear constraints were utilized to minimize band correlation: spectral angle, normalized mutual information (MI), Pearson's correlation coefficient, and Spearman's rank correlation coefficient. In [23], an unmixing-analysis-based alternative separability index approach was used for HSI BS. With this approach, correlated bands that cannot be effectively separated are removed from the analysis, which results in a more accurate and reliable final image. In [24], the interband redundancy was eliminated using band decorrelation methods based on divergence. The first step was calculating the divergence between each pair of channels. The channels with lower divergence values were then selected as each band's representative channels. Finally, the remaining bands were prioritized according to their significance. This approach resulted in improved image quality and a reduction in processing time.
In [25], the authors used two approaches, a clustering-based method and a ranking-based method, to choose which bands to use for HSI. The "rapid density-peak-based clustering" approach was improved to make it more appropriate for HSI BS, and the "isolated-point-stopping criterion" determines the number of bands that are chosen. According to this study, the methodology surpassed earlier techniques on two independent datasets.
The pointwise-ranking-based HSI BS method proposed in [26] is efficient and can be used to select an appropriate subset of bands for further analysis or classification tasks. This approach ranks spectral bands based on the average correlation of the labels supplied by a trained nonhomogeneous hidden Markov chain model applied to wavelet-processed HSI data. In [27] and [28], the importance of each band in an HSI is evaluated according to the MI between the HSI bands and a common reference band. One of the most efficient approaches is the clustering-based steering-based HSI band reduction method [10]. This method chooses representative bands from the clusters based on information metrics, such as MI or Kullback-Leibler divergence. This lowers the number of dimensions while enabling a more accurate depiction of the data. An unsupervised band reduction method called BandClust was studied in [29]. This approach splits each band interval into two disjoint contiguous subbands by minimizing the MI between the averaged subbands. As a result, there are fewer bands while the majority of the spectral information is preserved. A semisupervised band clustering technique that uses class spectral signatures was presented in [30]. Each cluster center is used as a representative band once outliers have been eliminated from the generated clusters. A system for automatically removing water-absorbed, low discriminating, and high SNR bands was put forth in [31]. In [32], feature mining was described as a useful method for identifying representative bands; it does so by examining the connection between the band clusters. In Zeng et al. [33], convolutional autoencoders are employed to learn the features of each data point in a cluster, and the ideal band of each cluster is chosen using this information.
In [34], the authors addressed the HSI BS problem using graph algorithms. Rough set theory, a mathematical tool for data analysis, was used by Liu et al. [35] to identify the most important bands in an HSI. The number of dimensions in the HSI is first reduced so that it can be analyzed more easily. The significance of each band is then determined using a forward greedy search technique: the algorithm starts with the most significant band and searches for the next most significant band until all of the bands have been evaluated. In [36], the performance of wrapper-based semisupervised HSI BS techniques was improved by introducing guided-filter pseudolabels. This approach is based on a two-stage filtering procedure that first uses unsupervised learning algorithms to automatically identify relevant bands for further analysis and then uses a supervised classifier to label them. In [37], a three-step strategy is proposed. The first step breaks the HSI bands down into band subgroups, which can be done by dividing the spectrum into regions based on wavelength and then assigning each pixel to a band according to its location in the spectrum. The second step selects nonredundant bands from the decomposed subsets, which can be done by ranking the bands by their spectral information content and selecting only the top N bands, where N is a user-defined parameter. The third step uses the selected bands for classification or other analysis tasks. Evolutionary methods are a family of search algorithms that use principles of natural selection and genetics to optimize solutions. Numerous evolutionary techniques have been used for effective band searching, including the genetic algorithm (GA) [38], particle swarm optimization [39], and the firefly algorithm [40]. A supervised filter-based method for BS using neural networks is suggested in [41].
For each class in the dataset, a binary single-layer neural network classifier separates that class from the rest of the data. The band choice is therefore class-oriented, since the bands with the largest and smallest weights are chosen next. This procedure iterates until the predetermined number of bands is reached.
In [42], semisupervised BS using an upgraded Levy flight based variant of the GA is performed. In the suggested semisupervised strategy, both spectral similarity and spatial proximity are used to increase the number of training examples. The metaheuristic hybrid rice optimization (HRO), which has been effectively used in BS, roughly divides its population into three groups with an equal number of members based on self-equilibrium and symmetry. However, the main HRO has significant limits when it comes to the local search for better alternatives, and this could lead to the missing out of a good option. For BS, a modified HRO based on a differential evolution operator and an opposition-based learning technique is proposed in [43].
According to the authors' formulation in [44], the HSI BS problem evaluates the performance of every sparse band combination. It is a multitask sparsity pursuit problem. It is crucial to find the best answer to this issue as it can enhance how well image processing algorithms function. An attention module and an autoencoder are combined to form a neural network module that Dou et al. [45] called an attention-based autoencoder model. The informative band subset is chosen by the attention module, and the autoencoder only uses this subset to reconstruct the input data. When there is a lot of noise in the input data, this method can help an autoencoder perform better.
In [46], a multicriteria semisupervised model is developed for the selection of hyperspectral picture bands. The model is broken down into two separate tasks. The first task evaluates the amount of information and redundancy contained in the chosen bands using unlabeled samples, whereas the second task evaluates the discrimination of the chosen bands using examples that have been labeled. In order to optimize this model, a multitask optimization approach is developed to aggregate the data from the bands and expedite the search for viable bands. In [47], from the labeled data, signature patterns are extracted with minimum and maximum reflectance values for each class, which are then quantized. The quantization process is carried out repeatedly until distinct patterns are found for each class. Finally, to guarantee that the selected bands have the least possible redundancy, bands with the highest correlation and lowest variance are discarded.
Cao et al. [48] propose a supervised BS technique based on the local spatial information of the hyperspectral picture and the wrapper method in light of the special characteristics of HSIs. The suggested technique consistently outperforms the traditional wrapper method by making use of the data from both labeled and unlabeled pixels in the HSI. In [49], unsupervised HSI BS using band grouping and adaptive multigraph constraints was suggested. When using a band grouping strategy to create a global similarity matrix, the problem of disregarding substantial correlations across neighboring bands is resolved. In contrast to prior research work that was limited to fixed graph restrictions, this approach creates a global similarity matrix by dynamically altering the weight of the local similarity matrix.

III. MATERIALS AND METHODS
The methodology section describes the suggested approach, the algorithms used, and the datasets. The BS strategy employed in this investigation is founded on merging algorithms. The evolutionary algorithm, one of the most widely used search techniques, is integrated with artificial intelligence, particularly deep learning, to choose the right number of bands and reduce the dimensionality for problems such as HSI classification. A convolutional network used as the fitness function within the evolutionary algorithm is recommended as a mechanism for obtaining suitable and optimal bands in HSI processing. This mechanism also makes the genetic operators more effective.
The structural modification and transformation of the suggested neural network model are reviewed and assessed in the Results and Discussion section with the goal of increasing accuracy and decreasing processing time.

A. Convolution Neural Network Embedded GA (CNNeGA-Proposed Method)
Reducing the dimensions of the input data is effective in many important problems and processes. In HSI processing, dimension reduction is performed as a preprocessing step by finding appropriate subbands (band subsets) that make the subsequent processing steps efficient. To find suitable subbands, we suggest using a GA together with a convolutional neural network with 3-D layers (3D-CNN), which is embedded in the GA as the fitness function and is responsible for classifying the input data. A population of subbands of the original HSI, together with the images produced from these subbands, is applied to the proposed model, and the classification results are used to rank and choose suitable subbands. With this combination, suitable subbands were found and the dimensions reduced with acceptable success. We name the combination of the GA with the designed 3D-CNN CNNeGA. First, the GA generates a population of solutions. Each solution is then applied to the convolutional neural network, and the classification result is examined to evaluate that solution. Better solutions are combined with existing successful solutions by applying the common GA operators (selection, crossover, mutation, etc.) to produce new generations and superior solutions. This process is repeated until the best solution is found. The pseudocode and its implementation steps are shown in Table I, and the structure and flowchart of the proposed method (CNNeGA) are shown in Fig. 2.
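The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation from Table I: the function `run_ga`, the stand-in `toy_fitness` (which replaces the 3D-CNN classification accuracy), the set `INFORMATIVE`, the pooled recombination step, and the population size, generation count, and mutation rate are all hypothetical choices made so that the example runs with the standard library alone.

```python
import random

# Hypothetical "useful" bands; in CNNeGA the score would instead be the
# classification accuracy returned by the embedded 3D-CNN.
INFORMATIVE = set(range(10, 20))

def toy_fitness(chromosome):
    """Stand-in fitness: share of selected bands that are 'informative'."""
    hits = sum(1 for band in chromosome if band in INFORMATIVE)
    return (hits + 1) / (len(chromosome) + 1)  # always positive

def run_ga(num_bands, subset_size, fitness, pop_size=20, generations=30, seed=0):
    """Search for a band subset (list of distinct band IDs) with high fitness."""
    rng = random.Random(seed)
    population = [sorted(rng.sample(range(num_bands), subset_size))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Fitness-proportional choice of parents.
        parents = rng.choices(population, weights=scores, k=pop_size)
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            pool = sorted(set(p1) | set(p2))
            for _ in range(2):
                # Simplified recombination: draw distinct bands from both parents.
                child = rng.sample(pool, subset_size)
                if rng.random() < 0.1:  # assumed mutation rate
                    unused = [b for b in range(num_bands) if b not in child]
                    child[rng.randrange(subset_size)] = rng.choice(unused)
                children.append(sorted(child))
        population = children
        best = max(population + [best], key=fitness)
    return best

if __name__ == "__main__":
    print(run_ga(num_bands=50, subset_size=5, fitness=toy_fitness))
```

Replacing `toy_fitness` with a function that trains and validates a classifier on the selected bands would give the wrapper structure the text describes, at a much higher cost per evaluation.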
One of the key advantages of 3D-CNNs is their ability to quickly identify patterns in 3-D data. This is important because many real-world problems involve 3-D data, such as recognizing objects in images or videos. Traditional neural networks are not well suited to this type of data and often struggle with it [50], [51], [52], [53], [54]. However, 3D-CNNs are specifically designed to handle this type of information and can achieve much better results than traditional networks. 3-D convolutional neural networks are the future of machine learning and can learn from data much faster than traditional neural networks [55]. Training a traditional, fully connected neural network by back-propagation can take a long time, especially when a lot of data is being processed. 3D-CNNs rely on convolutional layers, which can significantly speed up the learning process. Convolution layers divide the input data into small pieces and process them in parallel. This allows the network to learn faster and more effectively than traditional networks. In addition, 3D-CNNs can represent complex patterns in datasets more accurately [56], [57]. The convolution layer, pooling layer, and fully connected layer are some of the layers that make up the 3D-CNN, a multilayer CNN. The convolution layer, the first layer of the CNN model, executes the convolution operation on the input data. 2D-CNN models can extract only spatial information, whereas the 3D-CNN-based model extracts both spatial and spectral features, which is why it is chosen and used. In comparable circumstances, models based on spectral-spatial feature extractors perform significantly better than models based on spatial extractors alone.
In remote sensing applications, 3D-CNN can be used efficiently as these data include spectral and temporal features [58]. The ReLU activation function is used in the middle layers of the convolution network. ReLU has better behavior and is the most recommended activation function [59], [60]. At the ReLU output, a negative value is filtered to zero. The fast convergence of ReLU is what enables the activation function of CNN layers. This component allows two jumpers for reliable network operation. The problems of vanishing gradients are significantly reduced and saturation is avoided. The structure of the 3-D convolutional neural network designed and embedded in the GA algorithm, which is used here as a fitness and classifier function, and how to apply the data as input and output are shown in Fig. 3.
Each test sample is placed in 3-D form (a cube) in the input layer, and, as mentioned before, the output of each 3-D convolution layer passes through the ReLU function. The dropout technique is also used before the output layers to increase the efficiency and accuracy of the network. Regarding the network input, the spectral channels in each band subset applied to the convolution network are arranged in ascending order. As a result, data from the same channel are sorted into the same row, and the spectral context correlation remains stable in any selected band subset [61]. When the 3D-CNN is applied to the hyperspectral input, the results will be as follows: where K is the spectral size of the 3-D kernel.
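The expression referred to here does not appear in this text. For orientation only, a commonly used form of the 3-D convolution is given below. Apart from K, which the text defines as the spectral size of the kernel, every symbol is an assumption introduced for illustration: l indexes the layer, j the feature map, m the feature maps of the previous layer, H and W the spatial kernel sizes, b the bias, ω the kernel weights, and f the activation function (ReLU in the middle layers).

```latex
v_{l,j}^{x,y,z} = f\!\left( b_{l,j} + \sum_{m} \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} \sum_{k=0}^{K-1}
    \omega_{l,j,m}^{h,w,k} \; v_{l-1,m}^{\,x+h,\; y+w,\; z+k} \right)
```

Here v_{l,j}^{x,y,z} is the value at spatial position (x, y) and spectral position z of the j-th feature map in layer l. The innermost sum over k is what lets the kernel combine neighboring spectral bands, which a 2-D convolution cannot do.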

B. Genetic Algorithm
GAs effectively solve many optimization problems, including constrained optimization problems, scheduling problems, routing problems, design optimization problems, and the extraction of suitable features. They are also more efficient than other search methods, such as hill climbing or simulated annealing, because they can be implemented in software or hardware for real-time applications, such as control systems or machine learning [62], [63]. Fig. 4(a) displays the GA's basic building blocks. The GA starts with a population of potential solutions, or chromosomes. It then evaluates the fitness of each chromosome using a fitness criterion. Chromosomes that are fitter than others are more likely to reproduce and produce offspring. This process is repeated until a satisfactory solution is found. GAs have been used in various fields, such as image processing and HSI BS. In HSI, the goal is often to find an optimal set of bands that can be used for classification or feature extraction tasks. A GA-based approach is effective in finding good band combinations quickly and efficiently. The main steps of using a GA to select HSI bands are as follows [64], [65], [66]. The fundamental building blocks of a GA are the chromosomal representation, the fitness function, and the genetic operators, including crossover, mutation, and selection. After a population of chromosomes is randomly initialized, new chromosomes are produced by updating the genes of the existing chromosomes in accordance with the fitness function. The best chromosomes in the population are selected for reproduction to produce new offspring, with some crossover between them to create diversity. Mutation occasionally introduces new solutions into the population. The GA is iterated until a satisfactory solution is found or the termination criteria are met [67]. The structure of this algorithm and the details of the mutation and crossover operators used in the basic GA are shown in Fig. 4 for a better understanding.
To generate a new population, the GA in this article uses the following operators, whose operation is shown in Fig. 4(b). The crossover operator is of the two-point type, and the mutation operator is of the flipping type. As can be seen in Fig. 5, in the crossover operator, a part of one selected subband is combined with a part of another selected subband to produce a new chromosome. In the flipping mutation operator, two selected bands are exchanged with each other to produce a new chromosome. For the selection block of the GA, the roulette wheel selection operator is used.
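The three operators named above can be illustrated with a short sketch. The function names and the list-of-band-IDs chromosome encoding are assumptions made for the example; the mutation follows the textual description (two selected bands are exchanged), and the exact behavior of the authors' operators in Fig. 4(b) and Fig. 5 may differ.

```python
import random

def roulette_select(population, scores, rng):
    """Pick one chromosome with probability proportional to its fitness score."""
    pick = rng.uniform(0, sum(scores))
    running = 0.0
    for chromosome, score in zip(population, scores):
        running += score
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating-point round-off

def two_point_crossover(p1, p2, rng):
    """Exchange the segment between two cut points of equal-length parents."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    c1 = p1[:i] + p2[i:j] + p1[j:]
    c2 = p2[:i] + p1[i:j] + p2[j:]
    return c1, c2

def swap_mutation(chromosome, rng):
    """Exchange the bands held at two randomly chosen positions."""
    i, j = rng.sample(range(len(chromosome)), 2)
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

if __name__ == "__main__":
    rng = random.Random(1)
    p1, p2 = [3, 17, 42, 88, 120], [5, 17, 60, 88, 151]
    print(two_point_crossover(p1, p2, rng))
    print(swap_mutation(p1, rng))
```

Note that a plain two-point crossover can copy the same band ID into a child twice when the parents share bands at different positions, so some form of duplicate handling is needed on top of these operators.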

C. Check Parent
Genetic operators (crossover, mutation) are used to generate and improve new chromosomes. The repetition and commonality of some elements (bands) in parent chromosomes (subbands) during population production is a well-known and unavoidable problem. This defect makes the process of selecting subbands difficult and renders the crossover and mutation operators of the GA ineffective. To overcome this problem, we propose adding a box that checks the parent chromosomes before they enter the crossover or mutation operator. Fig. 4(a) and (b) shows this box and its location, and Fig. 4(c) depicts the details and implementation steps of the proposed solution. After the initial population is evaluated, P1 and P2 are selected as parent chromosomes. These parents are used in the crossover operator to generate new chromosomes C1 and C2, and, similarly, P and MC are used in the mutation operator to generate C. This process is repeated if P1, P2 for crossover and P, MC for mutation are not equal. If the constituent elements of the selected chromosomes are not the same (P1 ≠ P2, P ≠ MC), none of the genetic operators will be applied to them. The steps of the proposed solution are explained first for the crossover operator; the mutation operator is handled similarly. In step 1, a list of the bands common to P1 and P2 is prepared; the length of this list is denoted L. If L is greater than b/2, where b is the length of P1 or P2 (that is, if more than half of the bands of P1 or P2 are in the list), then the bands in the list are placed directly in C1 and C2 with probability L/b, and the crossover operator is applied to P1 and P2. If this probability (P(L/b)) fails and no band in P1 or P2 is repeated, the process is terminated by applying the crossover operator to P1 and P2.
Because the initial population is produced randomly, some chromosomes will inevitably contain repeated bands. In the second step, a list of the repeated bands in P1, together with their number M, is prepared. Each repeated band, along with its extra copies, is colored in Step 2 of Fig. 4(c). In this step, the extra copies are removed from P1. Each removed copy is then replaced, with probability P(τ) equal to 1/K, by a band from the list of all bands except the repeated bands. This step is repeated in the same way for P2. After this step, the crossover operator is applied to P1 and P2. The same process and stages are repeated before the selected chromosomes enter the mutation operator. This solution was introduced to prevent the repetition of a band and to maintain the effective common bands in each band category of the generated population. It also prevents the crossover and mutation operators from becoming ineffective in the face of similar and repeated bands. Another advantage of the proposed solution is that effective and suitable bands are retained on each selected and produced chromosome throughout this process. All operations in this process deal with the ID of each band, not with the content of the bands, so the check has no significant influence on the computational complexity. Table II shows the pseudocode of the parent check box.
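The two steps of the parent check can be sketched as follows. This is an interpretation of the prose description, not a transcription of Table II: the function names are hypothetical, bands are represented only by their integer IDs (as the text notes), and the replacement band is drawn uniformly from the bands not yet present in the parent, which is one reading of the 1/K probability.

```python
import random

def common_bands(p1, p2):
    """Step 1: band IDs that occur in both parents."""
    return sorted(set(p1) & set(p2))

def keep_common(p1, p2, rng):
    """Decide whether the shared bands are copied straight into the children.

    Only considered when more than half of the b bands are shared (L > b/2);
    the copy then happens with probability L / b.
    """
    shared = common_bands(p1, p2)
    b = len(p1)
    return len(shared) > b / 2 and rng.random() < len(shared) / b

def replace_repeats(parent, num_bands, rng):
    """Step 2: keep the first copy of each band and redraw the extra copies."""
    seen = set()
    repaired = []
    for band in parent:
        if band in seen:
            # Candidates are all bands not already used by this parent.
            candidates = [c for c in range(num_bands)
                          if c not in seen and c not in parent]
            band = rng.choice(candidates)
        seen.add(band)
        repaired.append(band)
    return repaired

if __name__ == "__main__":
    rng = random.Random(0)
    print(common_bands([1, 2, 3, 4], [3, 4, 5, 6]))
    print(replace_repeats([3, 7, 3, 9, 7], num_bands=20, rng=rng))
```

Because every operation works on band IDs with set lookups, the check adds only a small overhead compared with evaluating a chromosome through the classifier.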

D. Dataset
The efficiency of the suggested BS strategy for classifying HSI land cover is assessed using three sets of publicly accessible HSI remote sensing data. The key characteristics of these datasets are as follows.
The first HSI dataset was gathered by the AVIRIS sensor over the Indian Pines (IP) test site in northwestern Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands in the wavelength range 0.4–2.5 × 10⁻⁶ m. The IP scene is made up primarily of agriculture, with the remaining third being either forest or other types of perennial vegetation. Two      Table VII.

IV. RESULTS AND DISCUSSION
In this section, several comparative experiments are planned, developed, and run on four HSI datasets. The outcomes of these comparisons are assessed and analyzed against six competing methodologies. The analyses and discussions are organized into the following categories.
1) Parameters and settings required by the proposed algorithm and method.
2) Comparison of the accuracy obtained by the proposed method with that of the six competing methods.
3) Distribution and number of selected bands in each tested dataset.
4) Selected ideal and recommended bands.
5) Classification results of the images, compared with the ground truth image, using the recommended bands.
6) Distribution of the selected bands along with the spectrum of classes in each experimental dataset.
7) Complexity analysis.
8) Evaluation of the effect of adding attention layers on performance.
9) Conversion of the 3D-CNN model to a spiking neural network (SNN) and examination of the results.
To show and compare the results of the proposed method fairly, three competing methods have been selected. These methods are taken from the supervised category and are named according to their references as follows.
3) SLN_BS (single-layer neural networks BS) [41]. The focus of this article is on the relative comparison of the results with the mentioned methods.

A. Implementation Details
In the experiments, the simulation and modeling process was implemented in Python using the TensorFlow and Keras libraries on the Colaboratory platform, with the freely accessible GPUs offered by this platform. An early stopping strategy was employed during validation to prevent the model from overfitting and to shorten the execution time of the proposed method. We set the number of epochs to 35, the lowest value of the 35 to 50 epoch range in which the 3D-CNN reaches its maximum classification accuracy.
The ReLU and SoftMax activation functions were employed in the middle and output layers of the CNN, respectively. The number of selected bands (BS) was used as one of the parameters of the first-layer filter in the 3D-CNN, (3 × 3 × BS) (see Fig. 3), which means that in every BS mode the filter depth equals the number of bands in the selected band subset. In each iteration, 20% of the samples are used for training the network, and the number of GA iterations is set to 100.
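The early stopping rule can be illustrated independently of any deep learning library. The sketch below is a plain-Python illustration of the idea, not the authors' training code: the function name `early_stop_epoch`, the patience of 3 epochs, and the sample loss values are hypothetical, while the cap of 35 epochs comes from the text.

```python
def early_stop_epoch(val_losses, max_epochs=35, patience=3):
    """Return the epoch at which training would stop.

    `val_losses` stands in for the per-epoch validation loss of the 3D-CNN.
    Training stops once the loss has failed to improve for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

if __name__ == "__main__":
    losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
    print(early_stop_epoch(losses))  # stops after three epochs without improvement
```

Inside a GA, where the network is retrained for every chromosome, stopping a few epochs early on each evaluation adds up to a noticeable reduction in total run time.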

B. Parameters and Settings
This section lists the various settings and the number of parameters of the 3D-CNN in the proposed method. The different parameters and BS modes are shown in Tables VIII and IX. These settings include the initial values of the parameters used in the GA and the number of parameters of the convolution network embedded in the GA. Each operator used in the GA, including the ranking and selection of the optimal chromosome, crossover, and mutation, is applied with a probability; the optimal value of this probability is recorded during the initial settings and used while the algorithm runs. The tables also show the number of parameters (weights) of the convolutional neural network when different numbers of bands, from 5 to 30, are selected.
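To see why the parameter count changes with the number of selected bands, the weights of a single 3-D convolution layer can be counted directly. The sketch below uses the standard formula (kernel volume × input channels × filters, plus one bias per filter). The filter count of 8 and the single input channel are hypothetical values chosen for illustration; they are not taken from Tables VIII and IX.

```python
def conv3d_params(kernel, in_channels, filters, bias=True):
    """Number of trainable parameters in one 3-D convolution layer."""
    kh, kw, kd = kernel
    weights = kh * kw * kd * in_channels * filters
    return weights + (filters if bias else 0)

if __name__ == "__main__":
    # First-layer kernel of size 3 x 3 x BS, as described for the 3D-CNN.
    for bs in (5, 10, 15, 20, 25, 30):
        print(bs, conv3d_params((3, 3, bs), in_channels=1, filters=8))
```

With these assumed values the first layer grows from 368 parameters at 5 bands to 2168 at 30 bands, so the cost of each fitness evaluation rises with the size of the band subset.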

C. Performance and Accuracy Analysis
The first evaluation of the methods considered in this article reports the accuracy attained in classifying the image with the ideal bands found by the proposed approach. The accuracy curve is created by varying the number of selected bands, b, from 5 to 30 in steps of 5, and the average accuracy bar is created by averaging the classification accuracy rates over all of these settings. It should be noted that, in all evaluations performed on the tested data, the classification accuracy for the full-band mode was also obtained, and its result is the same with respect to the number of calculations and the length of time. The accuracy obtained in the 30-band mode is considered instead, and the full-band mode is excluded from further evaluation in the set of selected band numbers. Table X contains the HSI classification accuracies and the numerical comparison of the obtained results with the reference methods. The best (highest) accuracy values in this table belong to the proposed CNNeGA method. The table also shows the maximum and minimum values obtained by the competing methods. Bold font marks the maximum accuracy value for each number of selected bands. Across all cases and tested data, the accuracy of the competing methods differs from that of the proposed method by about 6% to 21%. For the IP dataset, the maximum accuracy values among the competing methods belong to the BS_UQ method and are between 0.85 and 0.9508. These values are lower than those of the proposed method by 1% to 5% in each selected band category. The SLN_BS method has the lowest values on the IP dataset. The accuracy obtained by the proposed method (CNNeGA) ranges from 0.9085 to 0.9661.
The accuracies obtained on the second dataset, SA, also show that the maximum accuracy values among the competing methods belong to the BS_UQ method and are between 0.922 and 0.972. These values are lower than those of the proposed method by 1% to 3% in each selected band category. In this group, the LSI_BS method has the lowest values. The accuracy obtained by the proposed method (CNNeGA) ranges from 0.93007 to 0.982. The accuracies obtained on the third dataset, PU, show, with a slight difference, that the maximum accuracy values among the competing methods belong to the BS_UQ and SLN_BS methods and are between 0.906 and 0.97; the SLN_BS method has the highest accuracy among the competing methods in the 15- and 20-band categories. These values are lower than those of the proposed method by 2% to 8% in each selected band category. Again, the LSI_BS method has the lowest values. The accuracy obtained by the proposed method (CNNeGA) ranges from 0.9807 to 0.9973.
The accuracies obtained on the last dataset, H, also show that the maximum accuracy values among the competing methods belong to the BS_UQ and SLN_BS methods and are between 0.892 and 0.9583; the SLN_BS method has the highest accuracy among the competing methods in the 5- and 15-band categories. These values are lower than those of the proposed method by 2% to 12% in each selected band category. In this group, the LSI_BS method again has the lowest values. The accuracy obtained by the proposed method (CNNeGA) ranges from 0.9206 to 0.9807. The proposed method and the SLN_BS method are both class-based and therefore rely on classifier training. In the proposed method, this training uses a deep network built from CNN layers, which has higher accuracy in learning and predicting the classes. The SLN_BS method takes a machine learning approach. The use of a deep CNN-based network appears to suit this problem well and yields the highest accuracy among the obtained results. However, the SLN_BS method is faster and has lower computational complexity than the proposed method. In the LSI_BS method, the accuracies were obtained using the SVM classifier. The SVM classifier is one of the most common algorithms for classification problems and serves as the basis for comparing the proposed method with this method. Here too, the accuracy of the proposed method proved higher than that of the LSI_BS method owing to the use of a deep neural network as the classifier. In terms of computational complexity and speed, the LSI_BS method lies between the proposed method and the SLN_BS method.
The BS_UQ method has the closest results to the proposed method compared to the other two methods.
This method is based on feature extraction and on statistical methods that select bands in a data-driven way without user intervention. Although data labeling is used in this method, the method does not depend much on it. However, the BS_UQ method used 60% of the samples for training and 40% for testing the classifier to check the accuracy. Compared to the proposed method, more data (about three times) was used for training and testing the classifier. An SVM classifier is also used in this method. Its computational complexity is almost equal to that of the LSI_BS method. In general, the proposed method performs better because it uses fewer training samples and offers higher stability and classification accuracy across various data. The proposed method shows promising results compared to the other methods mentioned, in that the classification accuracy on the PU and IP datasets reaches more than 99% and 98%, respectively, in some cases. However, less than 97% was obtained for some datasets; this may be because the data (classes) are larger and imbalanced, have a relatively lower spatial resolution, and contain a much larger number of cover classes. The selected bands efficiently preserve the separability of the various land cover classes without data loss or spectral distortion. Figs. 5-8 show the four datasets IP, SA, PU, and H separately so that the reader can visually assess the effectiveness and classification accuracy of the selected band sets of 5 to 30 bands obtained with the proposed CNNeGA method, as well as the differences between the predicted maps and the actual image.
The classification maps of this method and the ground truth are shown separately for the IP dataset, with 16 feature categories, in Fig. 5(a)-(f); for the Salinas dataset, with 16 feature categories, in Fig. 6(a)-(f); for the Pavia University dataset, with 9 features, in Fig. 7(a)-(f); and for the Houston dataset, with 9 features, in Fig. 8(a)-(f). As shown in Figs. 5-8, the proposed method gradually achieves better classification results on the four datasets as the number of selected bands increases.
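The comparison between a predicted classification map and the ground truth, and the averaging over the band settings, can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and treating label 0 as "unlabeled" is an assumption based on a common convention for HSI ground-truth maps, not something stated in this article.

```python
def overall_accuracy(predicted, ground_truth, ignore_label=0):
    """Fraction of labeled pixels whose predicted class matches the truth.

    Both maps are 2-D lists of integer class labels with the same shape.
    Pixels whose ground-truth label equals `ignore_label` are skipped
    (assumed to mark unlabeled background).
    """
    correct = 0
    total = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            if t == ignore_label:
                continue
            total += 1
            correct += int(p == t)
    if total == 0:
        raise ValueError("ground truth contains no labeled pixels")
    return correct / total


def mean_accuracy(accuracy_by_band_count):
    """Average accuracy over the band settings (e.g., 5, 10, ..., 30)."""
    values = list(accuracy_by_band_count.values())
    return sum(values) / len(values)
```

The second helper corresponds to the average accuracy bar: it takes one accuracy per band setting and returns their arithmetic mean.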

D. Band Selection
As shown in Table X, the proposed method gradually achieves better classification results on the different datasets as the number of selected bands increases. Figs. 5-8 show the accuracy, quality, and performance of the proposed method, and the improvement in classification, as the number of selected bands increases for each image. Table XI shows the ideal bands proposed by the CNNeGA method, tested and displayed for each dataset separately. In addition to the selected bands, the accuracy values obtained in the classification for each band category are listed again. The distribution of the ideal bands selected by the CNNeGA method is shown in Fig. 9 for each of the IP, SA, PU, and H test datasets separately, following the table of recommended bands for each. If the selected bands contain too much redundant information, they are not suitable for classification tasks. We therefore first investigated the distribution of the selected bands to analyze the redundant information in the bands selected by the proposed BS method. The selected and recommended groups in each category, from 5 to 30 bands, are shown in different colors in Fig. 9. The near-uniform distribution of the selected bands, which are drawn from the entire spectrum, serves as additional evidence that the proposed strategy selects bands well. Choosing bands with this method has several advantages, including the straightforward implementation and coding of the method and access to free resources such as the Colab platform. The selected bands and the class spectrum curves of the four datasets are shown in Figs. 10-13, respectively, for each group of 5 to 30 bands. In these graphs, each vertical line marks the position of a selected band.
According to the experimental data, the proposed BS approach can choose bands with little redundant information and a distribution that is close to uniform. Using the proposed method, a set of influential bands has been identified, and spectral bands that do not contain useful information have been avoided. Fig. 9, together with Figs. 10-13, gives a complete visual representation of how bands are selected by the proposed method. The evaluation criterion for the studied and proposed methods is the classification accuracy obtained for each selected band category on each experimental dataset. From the results in Table X and the classification maps in Figs. 5-8, we can see that our method achieves the highest performance on each dataset, which confirms our hypothesis. The results are consistent across the experiments and show a significant gap between the proposed method and the competing methods in the accuracy achieved in band selection and classification. The proposed method and its implementation also have limitations, chiefly the amount and duration of the calculations, which is one of the major limitations. The number of parameters of the CNN grows with the number of selected bands, which increases the time and computation required.
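One simple way to quantify how much redundant information a selected band subset carries is the mean absolute Pearson correlation between the selected bands. The sketch below is offered only as an illustration of that idea; it is not the criterion used by the proposed method, and the function names and data layout (one spectrum per pixel) are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mean_abs_correlation(pixels, bands):
    """Average |correlation| over all pairs of selected bands.

    pixels : list of spectra, one list of band values per pixel.
    bands  : indices of the selected bands (at least two).
    Values near 1 indicate a highly redundant subset.
    """
    if len(bands) < 2:
        raise ValueError("need at least two bands")
    columns = [[px[b] for px in pixels] for b in bands]
    pairs = list(combinations(columns, 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)
```

A subset drawn from neighboring, strongly correlated bands would score close to 1, whereas a subset spread over the spectrum would generally score lower.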

E. Effectiveness of Increasing Attention Layers in Performance
In this part, we examine one of the additional ideas of this article, namely the effectiveness of adding attention layers to the 3D-CNN structure. The attention mechanism is used in many methods and increases accuracy, especially in classification methods. In this final part, after selecting suitable bands (the recommended band sets in Table XI), we evaluated the performance and classification accuracy of each band category by applying these bands to the new 3D-CNN network structure. This evaluation was done only to confirm the performance of the proposed method and to retest the recommended subbands on each dataset. The performance evaluation showed that the classification accuracy is about 3% higher than the previous value. Although the evaluations show good performance and satisfactory results, the number of parameters increases (to 2 to 3 times that of the 3D-CNN structure and the parameters shown in Table IX), and the resulting computational complexity is one of the serious challenges. Fig. 14(a) shows the change in the 3D-CNN structure with the addition of the attention layer. In this structure, the attention layer is used in two forms, the spectral attention module and the spatial attention module, which are shown in Fig. 14(b) and (c). The spectral and spatial attention modules, positioned after each 3-D CNN layer, adjust the attention weight given to each of the extracted spectral and spatial features. Each channel generates additional channels with distinct information after being processed by various convolution kernels. Assume that a weight is assigned to each channel. A stronger relationship between the channel and the important information implies a heavier weight. As a result, the corresponding channel needs to receive greater focus. The spectral attention module models the relevance of each feature channel and subsequently boosts or suppresses it depending on the task.
A spatial attention map builds up detail by exploiting the relationships between the spatial features of the information in concentrated regions. Spatial attention is distinct from spectral attention. In the structure of the 3D-CNN with attention layers, there are two paths for extracting the features of the input image cube: one path extracts spectral features and the second path extracts spatial features. Finally, after the outputs of these two feature extraction paths are concatenated, the feature map obtained in the last 3-D CNN layer (C4) is passed through the global average pooling (GAP) module to create the feature vector. In networks that use a fully connected layer for classification, the output feature maps are joined together and then given to SoftMax, but in this method, a feature map is generated for each class after the last CNN layer, and a GAP layer is added to the network. By placing a GAP layer on top of the feature maps, each map is averaged and the resulting feature vector is given directly to SoftMax. One characteristic of the GAP layer is that it is not prone to overfitting, because it has no parameters to optimize; in addition, this layer is more robust to local changes and more compatible with convolutional networks [79], [80], [81], [82], [83], [84], [85], [86], [87]. Table XII shows the accuracy obtained using the 3D-CNN network with attention layers on the four datasets IP, SA, PU, and H. It seems that on datasets where the spectral similarity between classes is high, such as the SA dataset, the accuracy obtained with the attention mechanism is not much different from the previous values.
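The GAP-then-SoftMax step described above can be sketched in a few lines. This is a minimal, framework-free illustration, assuming one feature map per class given as nested lists; the function names are hypothetical and the maps may have any nesting depth (2-D or 3-D).

```python
import math

def _flatten(values):
    """Yield all scalar entries of an arbitrarily nested list."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from _flatten(v)
        else:
            yield v


def global_average_pool(feature_maps):
    """Reduce each per-class feature map to its mean value.

    The layer has no trainable parameters: it only averages.
    """
    pooled = []
    for fmap in feature_maps:
        flat = list(_flatten(fmap))
        pooled.append(sum(flat) / len(flat))
    return pooled


def softmax(scores):
    """Numerically stable SoftMax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps):
    """Class probabilities from per-class feature maps via GAP + SoftMax."""
    return softmax(global_average_pool(feature_maps))
```

Since the pooling step only averages, the number of weights in the classification head does not grow with the spatial size of the feature maps, which is the property the text attributes to the GAP layer.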

F. Spike Neural Networks
Spiking neural networks (SNNs) are biologically plausible counterparts of artificial neural networks (ANNs). ANNs are usually trained with stochastic gradient descent, whereas SNNs are trained with spike-timing-dependent plasticity. Training deep convolutional neural networks is a memory- and power-intensive task, and SNNs could potentially help reduce power usage. A large pool of tools is available for training ANNs of any size, whereas the available tools for simulating SNNs are geared toward computational neuroscience applications. SNNs promise to be less computationally intensive and much more energy efficient because they run asynchronously using spikes. SNNs have gained massive attention as a potential energy-efficient alternative to conventional ANNs due to their inherently highly sparse activation. However, most previous SNN methods use ANN-like architectures, which can provide suboptimal performance for the processing of binary information in SNNs. This section focuses on implementing a deep spiking CNN: the proposed 3D-CNN model is converted into an SNN. Our treatment in this section is limited to the conversion method and to some components and techniques used in SNNs [88], [89], [90], [91].
ANNs and SNNs can model the same types of network topologies, but SNNs replace the artificial neuron model with a spiking neuron model. The artificial neuron model, much like the spiking neuron, operates on a weighted sum of inputs [see Fig. 15(a)]. In this work, spiking convolutional neural networks are used for feature extraction. To explain, consider the convolution kernel $W^{SC1}(i, j, k, 1)$. This kernel is used to find spikes at any location of the spiking input image. If there is a spike pattern in the spiking image that matches the kernel, the result is a maximum (maximum correlation of the kernel with the image). The accumulated membrane potential for the neuron at location $(x, y, z)$ of map 1 of the SC1 layer is given by
$$V_m^{(1)}(x, y, z, t, 1) = V_m^{(1)}(x, y, z, t-1, 1) + \sum_{i}\sum_{j}\sum_{k} \big( S(x+i, y+j, z+k, t-1)\, W^{SC1}(i, j, k, 1) \big) \tag{2}$$
where $S$ denotes the spiking input image. The neuron at $(x, y, z)$ of map 1 of the SC1 layer then spikes at time $t$ if
$$V_m^{(1)}(x, y, z, t, 1) \geq \gamma_{sc1} \tag{3}$$
where $\gamma_{sc1}$ is the threshold. If the neuron at $(x, y, z)$ in map 1 of SC1 spikes, then a vertical line of spikes has been detected in the spiking image centered at $(x, y, z)$. Similarly, feature maps are generated in layers SC2 to SC4 [92], [93], [94]. Fig. 15(b) shows the structure of the SNN based on convolutional layers. The structure of the 3D-CNN network model proposed in this article has been transformed into an SNN without any fundamental changes. This conversion has been done using existing frameworks designed in Python.
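The accumulate-and-threshold behavior of a single spiking convolutional neuron can be sketched as follows. This is a simplified illustration, not the network used in the article: the receptive field is flattened to a 1-D list, the function name is hypothetical, and resetting the membrane potential to zero after a spike is an assumption that the text does not specify.

```python
def neuron_spikes(spike_frames, kernel, threshold, reset=True):
    """Simulate one integrate-and-fire neuron over time.

    spike_frames : list over time steps; each entry is the flattened binary
                   patch (0/1 spikes) seen by the neuron at that step.
    kernel       : flattened kernel weights, same length as each patch.
    threshold    : firing threshold (gamma in the text).
    reset        : assumed behavior: reset the potential to 0 after a spike.
    Returns a list with 1 where the neuron spikes and 0 elsewhere.
    """
    potential = 0.0
    output = []
    for patch in spike_frames:
        # Correlate the kernel with the incoming spikes and accumulate.
        potential += sum(s * w for s, w in zip(patch, kernel))
        if potential >= threshold:
            output.append(1)
            if reset:
                potential = 0.0
        else:
            output.append(0)
    return output
```

A neuron whose kernel matches the incoming spike pattern accumulates potential faster and therefore fires earlier, which is how the kernel acts as a feature detector.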
In addition, the spiking-ReLU activation function is used at the output of each layer of the network. The input data are also applied to the network in spiking form. To generate the spiking form of the input data, the image formed by the selected bands is passed through an on-center and an off-center difference-of-Gaussians (DoG) convolution filter,
$$\mathrm{DoG}_{\sigma_1, \sigma_2}(i, j) = \frac{1}{2\pi\sigma_1^2}\, e^{-\frac{i^2 + j^2}{2\sigma_1^2}} - \frac{1}{2\pi\sigma_2^2}\, e^{-\frac{i^2 + j^2}{2\sigma_2^2}} \tag{4}$$
The output of each of the two DoG filters is computed using the same mode of convolution, where $\sigma_1$ and $\sigma_2$ for the on-center and off-center filters are between a and b [95]. The classification accuracy of the modified network has been tested on all datasets. The speed advantage of SNNs was also investigated in the implementation of the proposed network. The results show that, under equal conditions, the processing time is reduced by more than 50% compared with the 3D-CNN network. Table XIII shows the classification accuracy values and the run time (time complexity) of the 3D-CNN and SNN models. The best results are shown in bold.
To simulate the model and evaluate its performance, the Nengo library has been used. Nengo is a Python library for building and simulating large-scale neural models. Nengo can create sophisticated spiking and nonspiking neural simulations with sensible defaults in a few lines of code.
To study and access this library, visit https://www.nengo.ai. In the last stage of this experiment, we considered slightly different conditions. To match the run time of the 3D-CNN model, it was necessary to increase the number of epochs of the SNN model; for the same run time, the SNN model runs for approximately 120 to 160 epochs. In this case, the accuracy of the SNN model increased by up to 4%. The results obtained in this experiment were judged to be suitable compared with those of the 3D-CNN model. The accuracies obtained for equal execution time in the 5BS to 30BS modes are shown in Table XV. Reducing the number of parameters (due to changing the neurons of the network) is effective in improving the accuracy. For equal execution time, the number of network cycles is increased, so an increase in accuracy was expected under these conditions. In addition, lower energy consumption should be counted among the advantages of this type of neural network over ANNs. Finally, Table XIV compares the accuracy values obtained from the classification of subbands by the methods evaluated in this article. The accuracy results of the proposed method are also compared in three different modes (3D-CNN, 3D-CNN with attention layer, and 3D-CNN converted to SNN). As mentioned in the previous sections, the proposed method with the SNN model has the highest classification accuracy compared to the competing methods. This increase in accuracy is consistent with the advantages of SNNs mentioned earlier. Fig. 16 also shows the accuracy comparison curves of the methods evaluated in this article.
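The on-center and off-center DoG filtering of the input can be sketched as below, using the standard difference of two normalized 2-D Gaussians. The kernel size and the sigma values used in the usage note are hypothetical placeholders, not the settings of this article, and the function names are assumptions. The filter is applied in "same" mode with zero padding so that the output has the same shape as the input.

```python
import math

def dog_kernel(size, sigma1, sigma2):
    """Difference-of-Gaussians kernel of odd side length `size`.

    sigma1 < sigma2 gives an on-center kernel (positive center);
    swapping the two gives the off-center kernel.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2

    def gauss(r, c, s):
        return math.exp(-(r * r + c * c) / (2 * s * s)) / (2 * math.pi * s * s)

    return [[gauss(r, c, sigma1) - gauss(r, c, sigma2)
             for c in range(-half, half + 1)]
            for r in range(-half, half + 1)]


def filter_same(image, kernel):
    """Apply `kernel` to a 2-D image in 'same' mode with zero padding.

    This computes a correlation; for the symmetric DoG kernel it is
    identical to a convolution.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    yy, xx = y + i - half, x + j - half
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kv
            out[y][x] = acc
    return out
```

For example, `dog_kernel(7, 1.0, 2.0)` and `dog_kernel(7, 2.0, 1.0)` would give an on-center and an off-center filter, respectively; the responses of the two filters can then be converted to spikes, for instance by letting stronger responses fire earlier.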

G. Computational Complexity Analysis
After accuracy, one of the most significant indices used to assess the quality of an approach is its computational complexity. This index is discussed for the proposed method below.
The complexity of the GA is affected by the population, by the genetic operators and how they are implemented (which may have a considerable impact on the total complexity), and by the fitness function. The GA has a complexity of $O\big(P \cdot G \cdot O(\text{Fitness}) \cdot (P_c \cdot O(\text{crossover}) + P_m \cdot O(\text{mutation}))\big)$, where $P$ is the population size, $G$ is the number of generations, and $P_c$ and $P_m$ are the crossover and mutation probabilities. The complexity therefore depends on the number of individuals, the number of generations, and the time needed to process each generation. Since $P$, $G$, $P_c$, and $P_m$ are constants, the complexity reduces to $O\big(O(\text{Fitness}) \cdot (O(\text{mutation}) + O(\text{crossover}))\big)$. An operator is $O(1)$ when it takes a fixed amount of time and the number of generations and the population size are constant; this holds for the mutation, crossover, and fitness functions as long as each takes a known, fixed amount of time. Because the convolutional neural network is used as the fitness function in the GA, the complexity of the proposed method depends largely on the computational complexity of the 3D-CNN network. Since the complexity of a CNN depends on its individual layers, the number of operations in each layer is calculated first. Then, all of these operations are added, and the time complexity is expressed as a function of the input (and possibly the number of layers). The complexity of a 1-D convolutional layer is $O(k \cdot n \cdot d)$: a 1-D convolution is the sum of the row-wise dot products of a filter $W \in \mathbb{R}^{k \times d}$ with a region matrix $A \in \mathbb{R}^{k \times d}$, where $k$ is the length of the filter and $d$ is the depth dimension (e.g., dimensionality of word embedding space), and at the layer level the filter is applied over the input $n - k + 1$ times (where $n$ is the length of the input), or approximately $n$ times since $n \gg k$. This gives a final complexity of $O(n \cdot k \cdot d)$. Accordingly, the computational complexity of a 3-D convolution for images with N * M * K dimensions and n * m * k filter sizes is $O(NMKnmk)$.
As a result, the complexity of the 3D-CNN network, which has four 3-D CNN layers, will be $O(knmKNM^{\text{Num. Layers}})$. For the proposed method in this article, the complexity of the calculations follows from the GA complexity with the 3D-CNN complexity as the fitness term. In Table XV, the complexity of all methods (BS_UQ, LSI_BS, SLN_BS, CNNeGA) and the complexity of the proposed method with the 3D-CNN with attention layers and with the SNN are displayed. The complexity of the proposed method is also affected by the changed structure of the 3D-CNN model (the added path and attention layers). Therefore, the complexity of 3D-CNN_Att will be the number of additional layers (L) multiplied by the complexity of CNNeGA. Although 3D-CNN_SNN has the same structure as the 3D-CNN, it is less complex due to the change in the nature of the input data and the smaller number of parameters. Based on run time, the complexity of 3D-CNN_SNN is almost half that of CNNeGA.
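The operation counts behind these complexity expressions can be checked with a short calculation. The sketch below counts multiply-accumulate operations for a single 3-D convolution with one filter and the number of fitness evaluations in a GA run; the function names and the input sizes used in the usage note are hypothetical and serve only as an illustration.

```python
def conv3d_macs(input_shape, kernel_shape, same_padding=False):
    """Multiply-accumulate count for one 3-D filter over one input volume.

    input_shape  : (N, M, K) size of the input cube.
    kernel_shape : (n, m, k) size of the filter.
    With 'same' padding the filter is evaluated at all N*M*K positions,
    which gives exactly N*M*K*n*m*k operations, i.e., the O(NMKnmk) bound.
    Without padding it is evaluated at (N-n+1)(M-m+1)(K-k+1) positions.
    """
    N, M, K = input_shape
    n, m, k = kernel_shape
    if same_padding:
        positions = N * M * K
    else:
        positions = (N - n + 1) * (M - m + 1) * (K - k + 1)
    return positions * n * m * k


def ga_fitness_evaluations(population_size, generations):
    """Number of fitness (CNN) evaluations when every chromosome is scored
    once per generation."""
    return population_size * generations
```

For instance, a hypothetical 9 * 9 * 5 input cube with a 3 * 3 * 5 filter needs `conv3d_macs((9, 9, 5), (3, 3, 5))` operations per filter, and a GA with 100 iterations multiplies the cost of one fitness evaluation by the population size times 100, which is why the CNN dominates the overall run time.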

V. CONCLUSION
BS is an effective way to reduce the size of hyperspectral data and to overcome the curse of dimensionality in ground object classification. In this context, this study proposes a supervised BS framework based on a combination of convolutional neural networks and the GA. The fundamental concept and the proposed BS architecture for HSI combine convolutional networks with metaheuristic search techniques. With this framework, a subset of bands that accurately reflects the original bands and has little redundancy can be chosen. The CNNeGA approach was fairly resilient to different noise bands. A group of bands corresponding to a higher level of classification accuracy was selected. In the proposed method, which is an iterative procedure, the bands with the highest correlation to the selected bands were automatically disregarded. In general, the proposed approach can be regarded as a class-based approach. This strategy makes possible a BS criterion that suits the requirements of each class. In other techniques, bands are chosen depending on the statistical characteristics of the dataset. The proposed method has the advantage of being classification-based and uses a linear classification to rank and choose bands. The proposed method (CNNeGA) shows performance improvements of between 6% and 21% compared to its competitors on the experimental data reviewed in this study. According to the experimental data, the band subset chosen by the CNNeGA approach was more successful at classification and had lower redundancy and correlation than the band subsets chosen by other BS methods. Testing on different types of datasets has shown that the proposed method is more stable than the competing methods with respect to scale, number of samples, and number of dataset classes.
Finally, to improve the performance of the proposed neural network model (3D-CNN) integrated with the GA, attention layers were added and the structure of the model was converted to an SNN. Changing the network structure and model increased the classification accuracy by approximately 1% to 3%. In terms of computational complexity, the run time was reduced by more than 50%. The increase in classification accuracy and processing speed (reduction of computational complexity), especially when converting the model to a spiking network, was significant and is promising for future work. Based on the obtained results, we intend to investigate the application of the SNN model to the problem of BS and classification of HSIs in an unsupervised manner.