A Positive Feedback Spatial-Spectral Correlation Network Based on Spectral Slice for Hyperspectral Image Classification

The emergence of convolutional neural networks (CNNs) has greatly promoted the development of hyperspectral image classification (HSIC). However, two serious problems remain: labeled samples in hyperspectral images (HSIs) are scarce, and the spectral characteristics of different classes of objects in HSIs are sometimes similar. These problems hinder the improvement of HSIC performance. To this end, this article proposes a positive feedback spatial-spectral correlation network based on spectral interclass slicing (PFSSC_SICS). First, a spectral interclass slicing (SICS) strategy is designed, which removes spectral signatures that are similar between classes and thus reduces their impact on HSIC performance. Second, to mitigate the impact of the lack of labeled samples on HSIC, a positive feedback (PF) mechanism and a spatial-spectral correlation (SSC) module are introduced to extract more and deeper features. Finally, the experimental results show that the classification performance of the PFSSC_SICS far exceeds that of some state-of-the-art methods.

As one of the most important applications, hyperspectral image classification (HSIC) has gradually become a research hotspot. In the early days, considering that HSIs contain rich spectral information, Pal [12] proposed a pixel-by-pixel classification method based on the support vector machine (SVM), and Chen et al. [13] proposed a sparse representation classification (SRC) method. Although these methods are relatively simple, they mainly focus on spectral-dimension features. To improve classification performance, they tend to mine the high-dimensional spectral information deeply, which sharply increases the number of model parameters and causes convergence failure and over-fitting [14], [15], [16], [17], [18]. This makes it difficult to learn an efficient classification model from high-dimensional data with few samples, which is the so-called Hughes phenomenon [19], [20]. In addition, the spectral information of HSIs exhibits the phenomena of "the same substance with different spectrum" and "different substance with the same spectrum," so it is hard to obtain good classification performance by relying on spectral information alone. To solve this problem, some works introduced spatial texture features into HSIC and proposed classification methods that combine spatial and spectral features [21], [22], [23]. However, both the methods based on spectral information and those based on spatial and spectral information rely on feature extraction [24], [25], [26], [27], [28], [29], [30], [31]. It is difficult for traditional methods to extract deep features from HSI datasets with limited samples, and the emergence of the convolutional neural network (CNN) has brought HSI classification into a new era. CNNs have excellent feature extraction ability [22] and can learn features autonomously for different data. Therefore, a series of CNN-based HSIC methods have been proposed [32].
First, in order to use spectral information for classification, a 1-D-CNN [33] was proposed. However, because the original HSI data contain a large amount of redundant information, it is difficult to obtain satisfactory results by using spectral information alone. Therefore, a 2-D-CNN was used to extract deep features from the spatial domain [34] and make up for the shortcomings of the 1-D-CNN. After that, in order to extract spatial-spectral features directly, Ying et al. [35] constructed a 3-D-CNN with 3-D convolution kernels. Extracting spatial-spectral features directly correlates the spatial and spectral features better and improves the classification accuracy of HSI. To extract deep image features, traditional CNNs build networks by simply stacking convolution kernels. This approach sharply increases the number of training parameters, which burdens the hardware and can even prevent the network from converging. To address these problems, He et al. [18] proposed residual networks (ResNets). ResNet alleviates the degradation problem in training deep networks through residual connection learning, which improves network performance. Subsequently, Zhong et al. [36] proposed the spectral-spatial residual network (SSRN). SSRN is a 3-D-CNN based on residual connections, which can better extract spatial and spectral information and achieves good classification results. After ResNet, the emergence of densely connected convolutional networks (DenseNet) [37], [38], [39], [40], [41] opened up new paths for researchers.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Furthermore, a series of attention mechanism strategies have been proposed [42], [43], [44], [45], [46], [47], [48]. They improve the performance of HSIC by enhancing features of interest and suppressing unimportant features. For example, a dual-branch multi-attention (DBMA) [49] network was proposed. It adopts a dual-branch structure that extracts the spatial features and spectral features of HSIs separately and then uses the attention mechanism to "dynamically weight" the features, thus improving the classification performance. Similarly, the double-branch dual-attention (DBDA) network proposed by Li et al. [50] is also a dual-branch network. DBDA first extracts features through a dual-branch structure, then uses an attention mechanism to focus on features of interest, and finally performs classification. However, some difficulties remain for the HSIC task. First, the direct extraction of high-dimensional spectral features inevitably increases the number of network training parameters, making it difficult for the network to converge. Second, there are local interclass similarities in high-dimensional spectral features [51], and these redundant features make classification more difficult. Third, labeled HSI samples are scarce.
To overcome these problems, this article proposes a positive feedback spatial-spectral correlation network based on spectral interclass slicing (PFSSC_SICS). First, the spectral interclass slicing (SICS) strategy is designed to remove redundant information in spectral features and reduce the spectral dimension. Second, the spatial features are extracted with a spatial positive feedback (Spa_PF) correction mechanism of the spatial branch. Then, the spectral positive feedback (Spe_PF) compensation mechanism and the spatial-spectral correlation (SSC) module are combined on the spectral branch to generate associated spatial-spectral information. Next, the associated spatial-spectral features are fused with the features extracted by the spatial branch. Finally, fully connected capsule layers are used for classification.
The main contributions of this article include the following three parts.
1) In order to extract high-dimensional spectral features effectively and to reduce the disturbance that local interclass similarity in spectral information causes to classification, a dimension reduction strategy based on interclass slicing (SICS) is proposed. SICS finds bands in the HSI whose spectral characteristics are similar between classes and removes these bands by a slicing strategy. This method can effectively remove spectral redundancy and retain more discriminative spectral features, which is beneficial to HSIC.
2) In order to alleviate the impact of the lack of training samples, a positive feedback (PF) mechanism is introduced to the PFSSC_SICS, which can extract more and deeper features to overcome the shortage of labeled samples. The PF mechanism is divided into Spa_PF and Spe_PF, which extract spatial and spectral features, respectively. By constantly adjusting and compensating the long-distance features with the short-distance features, more abundant and refined features can be obtained.
3) The SSC module is designed in the network. SSC closely links the spatial and spectral features and makes the spatial and spectral information correspond one to one, which is more conducive to the fusion of the subsequent spatial and spectral branches. In addition, a multiscale self-weighting strategy is designed in SSC, which not only enlarges the receptive field but also obtains abundant weighted features. In order to make the network lightweight, group convolution is also introduced in SSC to reduce network parameters.
The rest of this article is organized as follows. Section II introduces the overall architecture of PFSSC_SICS and the four modules SICS, Spa_PF, Spe_PF, and SSC. Section III discusses the hyperspectral datasets, the experimental parameter settings, the effectiveness of the proposed strategies, and the validation of the effectiveness of PFSSC_SICS. Section IV summarizes the work of this article.

II. METHODOLOGY
To improve the performance of HSIC, this article proposes the PFSSC_SICS network. The PFSSC_SICS mainly includes four parts: SICS, Spa_PF, Spe_PF, and SSC. This section describes the overall PFSSC_SICS framework and each of these four parts in detail.

A. Overall Framework of PFSSC_SICS
The overall framework of the PFSSC_SICS is shown in Fig. 1. The PFSSC_SICS is mainly divided into two parts. In the first part, the SICS designed in this article is used to process the raw spectral information of the HSI dataset. The redundant spectral features between different classes are removed, and the indexes of the remaining important spectral features are saved. In the second part, the spectral and spatial features of HSI are extracted by using the dual-branch network structure. A Spa_PF mechanism is proposed to extract spatial features. The Spa_PF weights the near-field spatial features and delivers them to the far-field feature extraction for feature correction. On the spectral branch, the Spe_PF mechanism and the SSC module are combined to extract rich spectral information from the HSI. In particular, the spectral information extracted on the spectral branch also contains spatial information. This way of associating spectral features with spatial features facilitates subsequent feature fusion. Finally, the extracted features go through global average pooling (GAP) and compression operations, and then go through a fully connected layer and a Softmax layer for classification.
Fig. 1. Overall framework of the PFSSC_SICS. (The PFSSC_SICS can be divided into four parts. Part 1 uses the SICS to process redundant information. Part 2 executes the Spe_PF and the SSC to extract SSC features. Part 3 uses the Spa_PF to extract spatial features. Part 4 shows the classification. In this figure, $x \in \mathbb{R}^{h \times w \times b}$ represents the HSI raw data. $X \in \mathbb{R}^{h \times w \times (b-n)}$ represents the input data processed by SICS. $x' \in \mathbb{R}^{9 \times 9 \times (b-n),\,1}$ indicates the patch input size. $x_{out1} \in \mathbb{R}^{9 \times 9,\,64}$, $x_0 \in \mathbb{R}^{9 \times 9,\,64}$, and $x_{out2} \in \mathbb{R}^{9 \times 9,\,64}$ are the outputs of the Spa_PF, SSC, and Spe_PF, in turn. The three operator symbols in the figure represent multiplication, addition, and the connection operation, respectively.)

B. SICS Mechanism
In HSI, the spectral information of different object categories exhibits local similarity between categories, which is also called "different substance with the same spectrum." This is because the spectral curves of ground objects are affected by their own water content, density, relative angle to the sun, and even their environment. Fig. 2 shows local spectral curves of the four datasets, Indian Pines (IN), Salinas Valley (SV), Kennedy Space Center (KSC), and Pavia University (UP), respectively. The spectral information of different categories inside the black boxes tends to coincide. This overlapping spectral information can easily cause misclassification and interfere with the classification process. Although a 3-D-CNN can associate spectral features with spatial features, which alleviates this problem to a certain extent, the performance of HSIC is still limited. Therefore, the SICS strategy is proposed in the PFSSC_SICS to address this problem.
In general, the SICS finds and removes the bands with local interclass similarity through operations such as slicing, normalizing, and calculating the coefficient of variation. Specifically, the 3-D HSI data with size $w \times h \times b$ are first reshaped into a 2-D matrix with size $c \times b$, where $w$, $h$, and $b$ correspond to the width, height, and number of bands of the HSI, respectively, and $c = h \times w$. Then, the 2-D matrix is sliced column by column to obtain the set $\theta$, in which each slice corresponds to a band, and each element in a slice corresponds to the spectral information of a different object in this band. Next, each slice is normalized to obtain $\theta'$, which prevents large differences between the averages of the data in different bands from affecting the subsequent comparison. Then, the coefficient of variation of each slice is calculated and stored in the set $\phi$, which is equivalent to a 1-D vector. The dispersion of spectral information in different bands is determined according to the coefficient of variation: the smaller the coefficient of variation, the more aggregated the information, and vice versa. At the same time, the coefficient of variation eliminates the influence of the mean value again. The elements in $\phi$ are then arranged from small to large, and their corresponding indexes are extracted and saved in the set $\phi'$. Since the elements in $\phi$ are the coefficients of variation of the individual bands, sorting them amounts to sorting the bands by the dispersion of their spectral information. Then, $\phi'$ is sliced to remove the first $n$ indexes and retain the last $b - n$ indexes. Finally, the remaining indexes are sorted in ascending order, and the output indexes identify the retained bands in the original spectral information. In particular, slicing the original HSI data according to the output indexes obtained by SICS removes $n$ bands with local interclass similarity. The size of $n$ depends on the ability of the network to acquire spectral features and on the distribution of spectral information of objects in different datasets.
The calculation of SICS is expressed by (1)-(7). Here, reshape(·) is the shaping function, which converts the original 3-D data into 2-D data. $c$ represents the size of the space dimension flattened by SICS, and $c = h \times w$. $\mathrm{slice}^{-1}(\cdot)$ is a column-by-column slicing function. In (2), $\theta$ contains $b$ different elements $\partial$, and each element $\partial$ is a 1-D vector containing $h \times w$ samples. In (3), $\theta$ is normalized to get $\theta'$. In particular, $\partial_i$ represents the $i$th element in $\theta$. Therefore, $\theta'$ also contains $b$ different elements $A$. In (4), $\phi$ is obtained by calculating the coefficient of variation for the elements in $\theta'$ one by one, where $A_i$ represents the $i$th element in $\theta'$, Std(·) represents the standard deviation function, and Mean(·) is the average operation. Similarly, the set $\phi$ also contains $b$ elements. In (5), the arg sort(·) function sorts the coefficients of variation in $\phi$ in ascending order and extracts their indexes to obtain $\phi'$. In (6), $\phi'_n$ is the set consisting of the first $n$ elements of $\phi'$, and $\phi'_{b-n}$ is obtained by removing $\phi'_n$ from $\phi'$. Finally, in (7), the elements in $\phi'_{b-n}$ are sorted in ascending order to obtain the final result. The implementation details of the SICS module are described in Algorithm 1.
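The display equations referred to as (1)-(7) did not survive in this copy of the text. The block that follows is a reconstruction from the symbol definitions in the paragraph above, not the authors' original typesetting. The operator names $\mathrm{Norm}(\cdot)$ and $\mathrm{sort}(\cdot)$ are placeholders introduced here, since the text does not state which normalization is used or name the final sorting function.

```latex
\begin{align}
X &= \mathrm{reshape}(x), \quad x \in \mathbb{R}^{h \times w \times b},\; X \in \mathbb{R}^{c \times b},\; c = h \times w \tag{1} \\
\theta &= \mathrm{slice}^{-1}(X) = \{\partial_1, \partial_2, \ldots, \partial_b\} \tag{2} \\
\theta' &= \{A_1, A_2, \ldots, A_b\}, \quad A_i = \mathrm{Norm}(\partial_i) \tag{3} \\
\phi &= \{\phi_1, \phi_2, \ldots, \phi_b\}, \quad \phi_i = \frac{\mathrm{Std}(A_i)}{\mathrm{Mean}(A_i)} \tag{4} \\
\phi' &= \operatorname{arg\,sort}(\phi) \tag{5} \\
\phi'_{b-n} &= \phi' \setminus \phi'_{n} \tag{6} \\
\varphi &= \mathrm{sort}(\phi'_{b-n}) \tag{7}
\end{align}
```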
SICS reduces the influence of high interclass similarity of spectral features on classification and is conducive to improving the classification performance of HSIs. The module can be applied to different HSI datasets, and as a plug-and-play module, it provides favorable conditions for subsequent research. In particular, SICS not only removes redundant spectral features but also reduces the spectral dimension to some extent. This enables more efficient extraction of high-dimensional features and reduces computing costs.
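To make the band-selection procedure concrete, a minimal pure-Python sketch of a SICS-style selection follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `sics_band_indices`, min-max scaling as the normalization, the population standard deviation, and a coefficient of variation of 0 for constant bands are all choices made here, because the text does not specify them.

```python
from statistics import mean, pstdev


def sics_band_indices(cube, n):
    """Return the ascending indices of the b - n bands kept by a SICS-style selection.

    cube: nested list of shape h x w x b (rows, columns, bands).
    n:    number of low-dispersion bands to drop, with 1 <= n < b.
    """
    h, w, b = len(cube), len(cube[0]), len(cube[0][0])
    if not 1 <= n < b:
        raise ValueError("n must satisfy 1 <= n < b")

    # Steps 1-2: flatten the spatial dimensions (c = h * w) and slice band by band.
    slices = [[cube[r][c][k] for r in range(h) for c in range(w)] for k in range(b)]

    # Step 3: normalize each slice. Min-max scaling is an assumption of this sketch.
    normalized = []
    for s in slices:
        lo, hi = min(s), max(s)
        span = hi - lo
        normalized.append([(v - lo) / span if span else 0.0 for v in s])

    # Step 4: coefficient of variation = Std / Mean for each normalized slice.
    # Population std is assumed; a constant band is given a value of 0 by convention.
    cv = []
    for a in normalized:
        m = mean(a)
        cv.append(pstdev(a) / m if m else 0.0)

    # Step 5: band indices ordered by ascending coefficient of variation.
    order = sorted(range(b), key=lambda k: cv[k])

    # Steps 6-7: drop the first n indices, then restore ascending band order.
    return sorted(order[n:])


# Toy 2 x 2 image with 4 bands; band 0 is constant, so it is dropped first.
demo_cube = [
    [[1, 0, 0, 5], [1, 10, 0, 10]],
    [[1, 0, 0, 10], [1, 10, 10, 10]],
]
kept = sics_band_indices(demo_cube, 2)
```

The returned index list can then be used to slice the band axis of the original data, which is how the text describes removing the $n$ bands.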

C. PF Mechanism
For the classification of HSIs, feature extraction is an essential link. The classification performance of a network depends largely on its feature extraction ability. Therefore, a PF mechanism is designed in the PFSSC_SICS to enhance the feature extraction ability of the network.

Algorithm 1 Implementation Details of the SICS Module
Input: HSI raw data $x \in \mathbb{R}^{h \times w \times b}$.
1: Reduce the dimension through the reshape(·) function. Convert the input 3-D data into 2-D data, and record the result as $X \in \mathbb{R}^{c \times b}$ ($c = h \times w$).
2: Slice $X$ column by column by the $\mathrm{slice}^{-1}(\cdot)$ operation, and obtain the set $\theta$.
3: Perform a normalization operation on $\theta$, and record the result as $\theta'$.
4: Calculate the coefficient of variation for each element in $\theta'$, and represent the result as the set $\phi$.
5: Arrange the elements in $\phi$ in ascending order, and return the indexes of all element positions, which are marked as $\phi'$.
6: Perform a slice operation on $\phi'$. Remove the first $n$ ($1 \le n < b$) values in $\phi'$, and denote the result as $\phi'_{b-n}$.
7: Arrange the elements in $\phi'_{b-n}$ in ascending order.
Output: The band indexes that remain after the redundant bands are removed. The result is recorded as $\varphi$.
In particular, PFSSC_SICS first uses a large-scale convolution kernel to extract a high-resolution weighted feature and then uses this feature to enhance the low-resolution feature, so that the feature output of the upper level can be used to adjust the feature extraction of the lower level, that is, to implement PF adjustment. We divide the PF mechanism into the Spa_PF correction mechanism and the Spe_PF compensation mechanism. Spa_PF is only used to extract HSI spatial features. To better integrate the spatial and spectral branches and enable the network to fully mine the spatial-spectral information of HSI, the proposed Spe_PF is dedicated to the extraction of spectral features. Although Spe_PF mainly focuses on spectral features, it still retains spatial information when extracting them. In this way, the subsequent SSC can better correlate the spatial-spectral features, and the features of the spectral branch and the features extracted by Spa_PF can be better fused in the later feature fusion stage. Thus, although Spa_PF and Spe_PF have different functions, they are connected. The structure of the PF mechanism is shown in Fig. 3.
The Spa_PF contains three types of components: Start Conv Block, Pointwise Conv Block, and Conv Block. First, the Start Conv Block is utilized to compress the spectral dimension to avoid introducing too many parameters in the subsequent spatial feature extraction and SSC work. This also preserves the original spatial features to the greatest extent. Next, a large number of Pointwise Conv Blocks are adopted in the Spa_PF to extract spatial features. The $1 \times 1$ pointwise convolution and the activation function are combined in the Pointwise Conv Block to enhance the nonlinear representation ability of the network. Moreover, different numbers of channels are used in the pointwise convolutions to strengthen the information interaction between channels and establish longer distance channel dependencies. Using pointwise convolution also avoids introducing too many training parameters. Then, the Pointwise Conv Block features are corrected and fused by using the weighted features extracted by the Conv Block. Here, the Conv Block uses a convolution kernel of size $3 \times 3$ to extract high-resolution features, weights them with the sigmoid function, and uses them to perform PF correction on the low-resolution spatial features extracted by the Pointwise Conv Block. Finally, the Spa_PF fuses the Conv-Block-corrected feature $x'_{out}$ with the original feature $x'_{in}$ and weights the result, which further corrects $x'_{out}$. Fusing with the original features for correction preserves the integrity of the extracted spatial features to the maximum extent and compensates for the information loss in the feature extraction process. The feature extraction of Spa_PF is expressed in terms of four component operations, F(·), G(·), R(·), and r(·). F(·) is the Start Conv Block operation. G(·) is the Pointwise Conv Block operation. R(·) is the Conv Block operation. r(·) is the fusion operation in the final stage, including three consecutive operations of addition, ReLU, and sigmoid.
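The parameter savings attributed to pointwise convolution can be checked with simple arithmetic. The sketch below counts the trainable parameters of a 2-D convolution layer using the standard formula (weights plus optional biases). The helper name `conv2d_params` is introduced here for illustration, and the 64-to-64 channel configuration and the group count of 4 are assumptions; the text only states that the branch outputs have 64 channels. The `groups` argument covers the group convolution that the article uses in the SSC module.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Count trainable parameters of a 2-D convolution with a square kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees c_in / groups input channels through a kernel x kernel window.
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


# Assumed 64 -> 64 channel layers, chosen only for illustration.
pointwise = conv2d_params(64, 64, kernel=1)            # 1 x 1 pointwise convolution
standard = conv2d_params(64, 64, kernel=3)             # 3 x 3 convolution
grouped = conv2d_params(64, 64, kernel=3, groups=4)    # 3 x 3 group convolution, 4 groups
```

Under these assumptions, the pointwise layer needs roughly one ninth of the weights of the $3 \times 3$ layer, and splitting the $3 \times 3$ layer into four groups cuts its weights by a factor of four.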
For spectral feature extraction, the Spe_PF mechanism is designed in this article. The structure of the Spe_PF mechanism is shown on the right of Fig. 3. The Spe_PF consists of four types of components: the start layer, the top layer, the middle layer, and the bottom layer. In order to simplify the model, the structures of the start layer, the top layer, and the middle layer components are set to be the same, and a 3-D convolution kernel of size $1 \times 1 \times 7$ is used for feature extraction. First, the initial features of the image are extracted by the start layer, and then, features are extracted by the top layer1, the middle layer1, and the bottom layer1. In particular, top layer1 uses a convolution kernel with a larger receptive field to extract different features and then connects with middle layer1 for feature compensation. Then, middle layer2 extracts features from the compensated features, and the extracted features are connected with the finer features extracted by bottom layer2 to perform further feature compensation. Finally, bottom layer3 performs feature extraction on the final features obtained by bottom layer2. In the expressions that describe the feature extraction process of Spe_PF, $x'$ and $x_0$ represent the input and output of the Spe_PF, respectively. Both $f(\cdot)$ and $f'(\cdot)$ are composite functions of 3-D convolution, batch normalization, and the ReLU activation function. The difference is that $f(\cdot)$ uses a convolution kernel of size $1 \times 1 \times 7$, while $f'(\cdot)$ uses a convolution kernel of size $1 \times 1 \times 9$. $\|$ represents a connection operation.

D. SSC Module
The fusion of spatial and spectral features is important. Most of the commonly used methods directly fuse the extracted features, which cannot correlate the spatial-spectral information well. Therefore, an SSC module is proposed. It associates the spatial features with the spectral features, both of which are extracted by the Spe_PF, and then fuses them with the spatial features extracted by the spatial branch. Inspired by self-attention [46] and the squeeze-and-excitation (SE) network [52], a multiscale self-weighting mechanism is constructed in the SSC module. Different from the SE network, which directly weights features, and from the linear mapping used in self-attention, SSC utilizes multiscale convolution to extract features of different resolutions, then adopts SE to weight the different features separately, and finally self-weights them with the unweighted features. In this way, richer and finer SSC features can be obtained. In addition, in order to reduce the number of parameters of the network model, group convolution is also introduced in the SSC.
Fig. 3. Structure of PF mechanism. (The left is Spa_PF, and the right is Spe_PF. The three operator symbols in the figure represent multiplication, addition, and the connection operation, respectively.)
The structure of SSC is shown in Fig. 4. Specifically, SSC first uses the Association Block to associate the spatial-spectral features extracted by Spe_PF. Next, the Multiscale Group Conv Block is adopted to extract spatial-spectral features of different scales, and SE weighting is performed on these features. Then, the extracted multiscale spatial-spectral features are multiplied with the SE-weighted spatial-spectral features to obtain self-weighted spatial-spectral features. Subsequently, Softmax is adopted to recalibrate the self-weighted spatial-spectral features, which are then multiplied with the multiscale spatial-spectral features. Finally, the Pointwise Conv Block is utilized to fuse and extract the autocorrelated spatial-spectral features. The process of the SSC is expressed by (13)-(17). In (13), $x_o$ is the feature extracted by Spe_PF, re(·) denotes the association module that associates the spatial-spectral information in $x_o$, and $x_{re}$ is the feature extracted by re(·). In (14), MGC(·) is the Multiscale Group Conv Block operation, and $x_{MGC}$ denotes the spatial-spectral features extracted after association. In (15), SE(·) represents the SE weighting operation, and $x_{SE}$ is the SE-weighted feature. In (16), $x_{RE}$ is the autocorrelation feature. Finally, in (17), G(·) is the Pointwise Conv Block operation, and $x_{out}$ represents the final output spatial-spectral features after correlation. $\omega_{re}$, $\omega_{MGC}$, $\omega_{SE}$, and $\omega_{RE}$ are the trainable weights corresponding to re(·), MGC(·), SE(·), and G(·), respectively.
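The display equations (13)-(17) are missing from this copy of the text. The following block is one plausible reconstruction, assembled only from the step-by-step description and symbol definitions in the paragraph above. It should be read as a sketch rather than the authors' exact formulation: the placement of the weights as function arguments and the use of $\otimes$ for the multiplication are assumptions.

```latex
\begin{align}
x_{re} &= re(x_o;\, \omega_{re}) \tag{13} \\
x_{MGC} &= \mathrm{MGC}(x_{re};\, \omega_{MGC}) \tag{14} \\
x_{SE} &= \mathrm{SE}(x_{MGC};\, \omega_{SE}) \tag{15} \\
x_{RE} &= \mathrm{Softmax}(x_{MGC} \otimes x_{SE}) \otimes x_{MGC} \tag{16} \\
x_{out} &= G(x_{RE};\, \omega_{RE}) \tag{17}
\end{align}
```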

III. EXPERIMENTAL ANALYSIS
First, the datasets involved in the experiments are introduced in this section. The experimental setup is then described. Next, in order to prove the validity of the PFSSC_SICS, ablation experiments are performed to analyze the performance of each module. Finally, the whole network is quantitatively analyzed. All experiments are performed on a hardware platform with an AMD Ryzen 7 5800H with Radeon Graphics CPU and an NVIDIA GeForce RTX 3070 GPU. The software environment is PyCharm with CUDA 11.2, PyTorch 1.10.0, and Python 3.7.4. To reduce the randomness of the experimental results, all experiments were performed ten times and the results were averaged.

A. HSI Datasets
All the experiments were carried out on four commonly used datasets: IN, SV, KSC, and UP. The performance of the networks is quantitatively evaluated by three indicators: overall accuracy (OA), average accuracy (AA), and Kappa [53]. In their definitions, $M$ is the total number of samples, $m_{i,j}$ is the number of samples classified by category $j$ as category $i$, and $N$ is the number of categories. The IN dataset retains 200 bands for research. As shown in Table I, IN includes 10 249 pixels and 16 ground object categories, and it is the earliest public dataset for HSIC. The SV dataset, like the IN dataset, was acquired by AVIRIS. As shown in Table II, the SV dataset includes 204 bands, 16 categories, and a total of 54 129 pixels for classification. The KSC dataset was also obtained using AVIRIS and contains 176 bands. As shown in Table III, KSC has a total of 5211 pixels in 13 categories. The UP dataset was obtained by continuous imaging in 115 bands, but only the 103 spectral bands uncontaminated by noise were used for the experiments. As shown in Table IV, UP includes 42 776 pixels and nine ground object categories.
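As an illustration of how the three indicators relate to a confusion matrix, the sketch below computes OA, AA, and Cohen's Kappa using their standard definitions. The function name `classification_metrics` and the index convention (rows are true classes, columns are predicted classes) are assumptions of this sketch and may differ from the convention behind $m_{i,j}$ in the article. It also assumes that every class has at least one sample and that chance agreement is below 1.

```python
def classification_metrics(confusion):
    """Return (OA, AA, Kappa) for a square confusion matrix.

    confusion[i][j] is taken to be the number of samples of true class i
    that were predicted as class j (an assumed convention).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    # Overall accuracy: fraction of all samples that are classified correctly.
    oa = correct / total

    # Average accuracy: mean of the per-class accuracies (per-class recall).
    row_sums = [sum(row) for row in confusion]
    aa = sum(confusion[i][i] / row_sums[i] for i in range(n)) / n

    # Kappa: agreement corrected for the agreement expected by chance.
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Toy two-class example: 17 of 20 samples are classified correctly.
oa, aa, kappa = classification_metrics([[8, 2], [1, 9]])
```

In the toy example, OA and AA coincide because both classes have the same number of samples; with imbalanced classes, as in the four datasets, the two generally differ.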

B. Experimental Setup
The PFSSC_SICS model uses the Adam optimizer, and the experimental batch size and epochs are set to 64 and 200, respectively. In addition, for IN, SV, KSC, and UP datasets, the training samples are 3%, 0.5%, 5%, and 0.5% of the total samples in the dataset, respectively.
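To give a sense of how small these training sets are, the following sketch multiplies the pixel totals reported for each dataset by the stated training percentages. The rounding to the nearest integer over the whole dataset is an assumption made here; the article does not say whether samples are drawn per class or how fractional counts are handled, so the actual numbers of training samples may differ slightly.

```python
# Pixel totals and training fractions as stated in the text.
TOTAL_PIXELS = {"IN": 10249, "SV": 54129, "KSC": 5211, "UP": 42776}
TRAIN_FRACTION = {"IN": 0.03, "SV": 0.005, "KSC": 0.05, "UP": 0.005}


def approx_train_counts(totals, fractions):
    """Approximate number of training samples per dataset (assumed rounding)."""
    return {name: round(totals[name] * fractions[name]) for name in totals}


train_counts = approx_train_counts(TOTAL_PIXELS, TRAIN_FRACTION)
```

Under this assumption, each dataset contributes only a few hundred labeled training pixels, which is the small-sample regime the PF mechanism is meant to address.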
In this article, the input of the PFSSC_SICS is obtained by randomly selecting a sample in the HSI as the center pixel and then cropping a patch around it. However, CNNs are particularly sensitive to the spatial size of the input data. Therefore, the size of the input patch directly affects the classification accuracy of the network. In addition, the learning rate affects how fast the network converges, and a learning rate that is too large or too small may cause the network to fall into a local optimum. Therefore, this article explores the relationship between learning rate and classification accuracy under spatial patches of different sizes. As shown in Fig. 5, the results are similar on all four datasets. Specifically, from the perspective of patch size alone, the classification accuracy first increases and then decreases as the patch size increases, and the higher classification accuracies are all distributed around a patch size of 9 × 9. From the perspective of learning rate, the classification accuracy first increases and then decreases as the learning rate decreases, and the higher classification accuracies are all distributed around a learning rate of 0.0005. Hence, the patch size of the PFSSC_SICS is set to 9 × 9, and the learning rate is set to 0.0005.
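The patch construction described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' data pipeline: the function name `extract_patch` is hypothetical, and zero padding at the image border is an assumption, since the article does not say how border pixels are handled.

```python
def extract_patch(cube, row, col, size=9):
    """Return a size x size x b patch centered on pixel (row, col).

    cube is a nested list of shape h x w x b. Positions that fall outside
    the image are filled with zeros (an assumed border strategy).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that the patch has a center pixel")
    h, w, b = len(cube), len(cube[0]), len(cube[0][0])
    half = size // 2
    patch = []
    for r in range(row - half, row + half + 1):
        patch_row = []
        for c in range(col - half, col + half + 1):
            if 0 <= r < h and 0 <= c < w:
                patch_row.append(list(cube[r][c]))  # copy the spectrum of an in-image pixel
            else:
                patch_row.append([0.0] * b)         # zero padding outside the image
        patch.append(patch_row)
    return patch


# Toy 3 x 3 image with 2 bands, where each pixel stores its own coordinates.
toy_cube = [[[r, c] for c in range(3)] for r in range(3)]
toy_patch = extract_patch(toy_cube, 1, 1, size=9)
```

With the 9 × 9 setting chosen in the article, each input sample covers the center pixel and its four nearest rings of neighbors in every retained band.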
In this section, in order to achieve the best classification performance of the model, the number n of redundant spectral bands cut out by the SICS module is explored. Because the redundancy of spectral information differs among the four datasets, the value of n is explored separately on each of them. For the IN dataset, n is set to {20, 30, 40, 50, 60, 70, 80, 90, 100}; for the SV dataset, n is set to {5, 10, 15, 20, 25, 30, 35, 40, 45}; for the KSC and UP datasets, n is set to {1, 2, 3, 4, 5, 6, 7, 8, 9}. For each dataset, the experimental results of the influence of n on the classification performance are shown in Fig. 6. It can be seen that for these datasets, the classification accuracy first increases and then decreases as n increases. The classification accuracy improves at the beginning because the proposed SICS removes similar spectral information that is difficult for the network to distinguish. As n increases, the similar spectral information gradually decreases, and the classification accuracy declines after reaching the peak. This is because, once the classification performance reaches its peak at a given n, the spectral information that is difficult to distinguish has already been removed; removing further bands with SICS discards information that is beneficial to classification and reduces the classification accuracy. According to the experimental results in Fig. 6, in order to achieve the best classification performance of the model, n is set to 60 for the IN, 25 for the SV, and 5 for both the KSC and the UP. The comparison of the PFSSC_SICS with and without the SICS strategy is shown in Fig. 7. Taking the IN as an example, the OA of the PFSSC_SICS with the SICS strategy is nearly one percentage point higher than that without the SICS strategy.
For the other three datasets, the classification performance of the PFSSC_SICS network with SICS is also higher than that without SICS. This shows that the SICS strategy contributes substantially to the improvement of classification performance. In addition, it shows that the SICS strategy is applicable to different datasets and generalizes well.
Using the SICS strategy to remove redundant spectral information not only improves the classification performance but also effectively reduces the spectral dimension, thus reducing the number of parameters. Table V shows the influence of SICS on the number of parameters. It can be seen that for these datasets, compared with the network without the SICS strategy, the network using the SICS strategy has fewer parameters. This provides a useful direction for research on lightweight networks for HSIC. In particular, it can be seen from Table V that the influence of SICS on the number of parameters varies greatly between datasets. This is because SICS uses different n values on different datasets: the larger the n value, the fewer the network parameters. However, when the n value is too large, it degrades the classification accuracy of the network. Therefore, SICS can only reduce the network parameters within a limited range.
2) Performance Analysis of PF: To verify the validity of PF, the PF-related modules are removed from the proposed PFSSC_SICS network. The similar bands of the spectrum are still cut off using the SICS strategy, and the SSC module then performs feature engineering directly. Finally, the features are sent to Softmax for classification. The results are compared with those of the complete PFSSC_SICS in Fig. 8. Clearly, the PF is beneficial to the performance of PFSSC_SICS. The PF mechanism can extract more features that are conducive to classification, which improves the classification accuracy of the network.
3) Performance Analysis of SSC: For the SSC module, some ablation experiments were also performed to verify its effectiveness. The classification results obtained by the PFSSC_SICS network with and without the SSC module are shown in Fig. 9. Obviously, the classification accuracy of the PFSSC_SICS network with the SSC module has been improved on the four datasets. This proves that SSC can better obtain space and spectral joint features and closely associate these information to improve classification performance than the way of directly fusing space and spectral features.

D. Verification of the Performance of the PFSSC_SICS
To verify the classification performance of PFSSC_SICS, this article compares it with eight methods. These comparison methods cover most of the current mainstream ideas in HSIC: SVM, which only extracts spectral information for pixel-by-pixel classification; going deeper with contextual CNN (GDCNN) [54], a deep convolutional classification method based on context; the fast dense spectral-spatial convolution network (FDSSC) [55], a fast densely connected network combining spatial-spectral information; SSRN, which uses residual connections to simultaneously extract spatial and spectral information for classification; the feedback expansion convolution network (FECNet) [56], which uses dilated convolution to extract features; the two attention networks DBMA and DBDA; and the attention-based adaptive spectral-spatial kernel ResNet (A2S2KResNet) [57], which is based on improved attention. Tables VI-IX show the classification performance of all methods on four different datasets. On the IN dataset, PFSSC_SICS can better distinguish each category. This is because PFSSC_SICS first uses SICS to remove a large amount of redundant information in the HSI and avoid mutual interference between different categories. In addition, PFSSC_SICS also has great advantages on the three indicators of OA, AA, and KAPPA, because it extracts rich spatial-spectral correlation information and makes full use of this information for classification. Similarly, the classification performance of the proposed PFSSC_SICS method on the other three datasets is far superior to that of the other methods.
In particular, as shown in Table IX, for the UP dataset with its complex and dense ground object distribution, PFSSC_SICS still outperforms all the other comparison methods. This provides more possibilities for related applications such as urban planning.
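For reference, the three indicators used in this comparison follow their standard definitions and can be computed from a confusion matrix. The sketch below is a generic illustration; the function name classification_scores and the small 2-class matrix are hypothetical and are not taken from the article's experiments.

```python
def classification_scores(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy (OA): fraction of all samples classified correctly.
    oa = correct / total

    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]

    # Average accuracy (AA): mean of per-class recall, skipping empty classes.
    per_class = [confusion[i][i] / row_sums[i]
                 for i in range(n_classes) if row_sums[i] > 0]
    aa = sum(per_class) / len(per_class)

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = 1.0 if expected == 1 else (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Toy example with made-up counts.
oa, aa, kappa = classification_scores([[8, 2], [1, 9]])
print(oa, aa, kappa)
```

OA is dominated by the large classes, whereas AA weights every class equally, which is why both are usually reported together for imbalanced datasets.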
To demonstrate the effectiveness of PFSSC_SICS visually, the classification results of all methods on the different datasets are visualized in Figs. 10-13. Compared with PFSSC_SICS, the classification maps of the other methods contain more misclassifications, serious cross-contamination between different categories, and a lot of noise. This is particularly evident in the classification map of SVM. These poor results are due to the interference of spectrally similar information and the insufficient feature extraction ability of the networks. In contrast, each classification map obtained by PFSSC_SICS on the four datasets has clear category boundaries and few misclassifications, which shows that the proposed method has excellent classification performance. In addition, PFSSC_SICS achieves the most stable classification accuracy across the datasets, which shows that it generalizes well and can adapt to datasets of different scenarios.
Figs. 14 and 15 show the relationship between training loss and epochs and the relationship between training accuracy and epochs, respectively, on the IN and SV datasets using different methods. To demonstrate the convergence of PFSSC_SICS, we compared it with two other methods that performed well on IN and SV, namely FDSSC and DBDA. In general, the loss and accuracy curves of PFSSC_SICS are smoother on both datasets. Specifically, it can be seen from Fig. 14 that the training loss of all three methods shows an overall downward trend as training progresses, but the training loss of PFSSC_SICS decreases faster and more stably than that of the other two methods. Likewise, Fig. 15 shows that the training accuracy of the proposed method increases faster and is more stable than that of the other methods. In summary, the PFSSC_SICS network converges quickly and stably during training.
The number of training samples determines the amount of prior information used for classification. Generally speaking, the accuracy of a network improves as the number of samples increases. However, for methods with poor performance, too many training samples only increase the running time without a large improvement in accuracy. Therefore, in practical engineering, it is necessary to achieve high-precision classification with small samples. To explore the classification ability of PFSSC_SICS under small samples and verify its stability, the performance of all methods on the four datasets is compared under different training proportions. As shown in Fig. 16, PFSSC_SICS achieves the best results at every training-sample scale on all datasets, and its advantage is most prominent with 1% of the training samples. This advantage meets the need for good classification results when labeled HSI samples are scarce. As the number of samples increases, the classification performance of the network improves steadily, which also demonstrates the stability of PFSSC_SICS.
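A small-sample experiment of this kind requires drawing a fixed proportion of labeled pixels for training. The sketch below shows one common way to do this, per-class random sampling with at least one training sample per class; this sampling rule, the function name stratified_split, and the toy label list are assumptions for illustration, since the sampling procedure is not specified in this section.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into train/test sets, class by class.

    Assumption: each class contributes round(train_fraction * class_size)
    training samples, with a minimum of one sample per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return train, test


# Toy labels: 200 pixels of class 0 and 50 pixels of class 1, 1% for training.
labels = [0] * 200 + [1] * 50
train_idx, test_idx = stratified_split(labels, train_fraction=0.01)
print(len(train_idx), len(test_idx))
```

Keeping at least one sample per class matters at a 1% proportion, because the smallest classes in HSI datasets could otherwise receive no training samples at all.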

IV. CONCLUSION
This article proposes a new network for HSIC, namely, PFSSC_SICS. PFSSC_SICS contains an SICS strategy for removing similar spectral bands of different categories, a PF mechanism for extracting spatial-spectral features, and an SSC module for spatial-spectral feature association. The effectiveness of the proposed modules is demonstrated by a large number of experiments. PFSSC_SICS addresses the problem that the local similarity of spectral features adversely affects the classification of HSIs and effectively alleviates the impact of the lack of labeled samples on HSIC. In particular, PFSSC_SICS shows obvious advantages in the case of small samples and can obtain clear ground object boundaries on different datasets, which provides strong support for practical applications related to dense urban planning. In terms of classification performance, generalization, convergence, and stability, the experiments in this article show that PFSSC_SICS is more advanced than the current mainstream methods.