Hyperspectral Classification of Two-Branch Joint Networks Based on Gaussian Pyramid Multiscale and Wavelet Transform

Owing to its high spectral resolution, hyperspectral remote sensing data provide nearly continuous spectral curves for target objects and reflect the detailed characteristics of ground objects. However, the redundancy produced by the large number of bands complicates feature extraction. We therefore process the spectral data with a wavelet transform to reduce the influence of intra-class spectral variation on classification, and we generate multi-scale image data with a Gaussian pyramid transformation so that a feature extraction network can capture multi-scale spatial information and improve classification accuracy. Specifically, we propose a dual-branch feature extraction network. The first branch applies the Gaussian pyramid multi-scale transformation to obtain multi-scale images and then applies a feature extraction module to obtain multi-scale spatial features. The second branch applies the wavelet transform to the spectral data to reduce the impact of abnormal spectral data on classification and then applies a feature extraction module to obtain spectral features. Finally, the spectral and spatial features from the two branches are fused in a fully connected layer to produce the classification. By combining spectral features with spatial features at different scales, the method captures fine details of hyperspectral images, and joint learning of the two branches captures the interaction between spectral and spatial features. Experimental results on hyperspectral image datasets indicate that the method outperforms both traditional classifiers and other advanced deep learning-based classifiers.


I. INTRODUCTION
A hyperspectral image (HSI) is a three-dimensional image captured by an aerospace platform carrying a hyperspectral imager. Each pixel contains reflectance information at hundreds of wavelengths, which makes HSI suitable for practical applications such as military target detection, mineral exploration, environmental monitoring, and agricultural production. In recent years, hyperspectral remote sensing techniques have received great attention in various Earth observation applications [1]-[6]. HSI provides hundreds of contiguous narrow spectral bands [7]-[10], which can distinguish different substances more accurately than conventional panchromatic and multispectral remote sensing images. With its high spectral resolution, HSI is uniquely advantageous for finer classification [11], [12], because it can detect subtle spectral features that conventional images cannot resolve. In the early stages of HSI classification, many machine learning-based methods were applied, such as nearest neighbor, decision tree, and linear function algorithms. Among these methods, k-nearest neighbor (K-NN) [13] can be considered the simplest classifier; it uses the Euclidean distance to measure the similarity between test and training samples. The support vector machine (SVM) [14] classifies HSI data in a high-dimensional space by finding separating hyperplanes based on the different spatial structures and spectral features of different materials. SVM and related classifiers, such as relevance vector machines, have been widely used for HSI analysis.
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
Recently, sparse representation (SR) has been applied to HSI classification as a powerful image processing tool [15]. SR relies on the assumption that pixels of the same class have similar spectral features, so a test sample can be represented linearly by a small number of training samples from the same class. However, traditional SR methods consider only the spectral information of the test pixel and ignore its spatial neighbors. Based on the assumption that pixels in a local region usually consist of similar materials and have similar spectral features, Chen [16] proposed a joint sparse representation (JSR)-based classification (JSRC) algorithm, which incorporates spatial information by jointly representing the pixels in a local window and thereby achieves better classification performance. In [17] and [18], spatial classifiers such as kernel-based SR and l2-norm regularized sparse subspace clustering were proposed, and these classifiers showed better performance.
However, these local window-based methods share a common limitation: a window of fixed size may contain pixels of different classes. Small windows are appropriate for pixels in edge regions, whereas large windows are suitable for smooth regions. To solve this problem, Fang [19] proposed the multi-scale adaptive SR (MASR) method, which obtained better performance. Other advanced tools, such as the adaptive mean shift analysis method [20], are also effective for this problem.
The main contributions of this paper are as follows: 1) Compared with traditional feature extraction methods, the proposed multi-scale spatial feature extraction method pays more attention to the detailed and local information of the HSI. The local information effectively preserves the local structure inherited from the original data and reduces the loss of key information. The proposed spectral feature extraction method processes the spectral data with a wavelet transform, which reduces the impact of abnormal data on the overall classification results. 2) This paper adopts dual-branch feature extraction and fusion. The spatial and spectral features are extracted by different feature extraction networks and then fused to obtain the classification results.

II. RELATED WORK
HSI contains not only abundant ground image information but also rich spectral information. Therefore, various spectral-spatial classification methods have been developed [21]-[23]. For example, Kang et al. proposed a spectral-spatial classification method based on edge-preserving filtering (EPF) [24]. EPF has been a research hotspot in image processing and computer vision in recent years; it not only smooths the image but also introduces spatial structure information into the input image. Peng et al. proposed a region kernel to measure region-to-region distance similarity for HSI classification [25]. The region kernel was designed as a linear combination of multiscale box kernels, which can handle HSI regions of arbitrary shape and size.
These methods use the spatial context of each pixel and its neighbors in addition to the spectral information of ground objects. They can therefore reduce the influence on classification accuracy of two phenomena, the same object exhibiting different spectra and different objects exhibiting the same spectrum, and can significantly improve classification accuracy.
With the advancement of artificial intelligence and deep learning, various convolutional neural network (CNN)-based HSI classification (HSIC) methods have attracted considerable attention for handling the nonlinear structure of hyperspectral data [8], [26]-[32]. A multilayer CNN for HSIC was first designed in [33] to extract spectral features. Yu [34] proposed a CNN structure embedded with extracted hash features to improve the accuracy of HSIC. In [35], a recurrent neural network (RNN) was proposed to process spectral data by exploiting spectral correlation and band-to-band variability.
Nonetheless, the spectral magnitude or shape can vary widely within the same class of ground objects. As a result, different objects may have similar spectra, while objects within one class may have widely scattered spectra. This is the bottleneck that hinders further improvement of classification accuracy. In addition, spectral variations complicate the statistical distribution of the sample points and exacerbate the problems associated with small sample sizes. Reducing the effects of spectral variation is therefore one of the key issues in HSI classification, and various HSI classification methods have been proposed to address it.
Xue et al. [36] proposed a completely different approach from the perspective of sub-pixel target detection. They used band selection followed by nonlinear expansion (BSNE) and iterative constrained energy minimization to classify HSIs. The approach is easy to implement and has advantages over other methods. Meanwhile, with the development of remote sensing imaging technology, the spatial resolution of HSI has increased, and joint spectral-spatial features have attracted more attention [37]. For example, in [38] multiple features, including texture features, gray-level co-occurrence features, and statistical features, are used as computational parameters to obtain better classification results. Moreover, inspired by tensor learning, spatial and spectral features can also be fused into a 3-D tensor [39], which reduces the loss of the structural information intrinsic to HSI. Guo [40] proposed a tensor-based technique for HSI classification and used multilinear principal component analysis to preprocess the tensor. However, most of these tensor methods build the tensor physically and ignore features that follow a predetermined logical arrangement. In [41], a novel generalized tensor regression (GTR) method, extended from a simple but effective classifier, was used for HSI classification.
VOLUME 10, 2022
Currently, convolutional neural networks (CNNs) have shown excellent performance in HSI classification [8], [32], because they can naturally handle the nonlinear and complex feature space in which HSIs often lie [42]. Some researchers have proposed multi-stream CNNs for HSI classification; see also [43], [44]. In most CNNs, feature extraction and classifier training are separated. To overcome this drawback, a spectral-spatial unified network (SSUN) [45] was designed, which combines shallow and deep convolutional layers to deal with the information loss problem. In [46], a dual-channel deep CNN is proposed in which discriminative information is captured from the spectral domain and the spatial domain, respectively, and is then efficiently utilized and fused. Xu [47] proposed a novel two-branch CNN (MS-CNN) based on multi-source data for the classification and fusion of HSI and data from other sensors, such as Light Detection and Ranging (LiDAR) [48] data. The two branches emphasize different features and obtain excellent classification performance. Cao [49] proposed integrating active learning and deep learning into a unified framework and leveraging Markov random fields (MRF) [50] to enhance the smoothness of class labels and further improve classification performance (CNN-AL-MRF). Han [51] proposed selecting pixel blocks of different scales around the central pixel as the basic processing unit; a spatial augmentation strategy based on spatial rotation and row-column transformation then provides varied spatial location information under limited training samples and yields better accuracy. Dong [52] proposed weighted feature fusion of a convolutional neural network and a graph attention network (WFCG) for HSI classification.
In WFCG, the graph attention network (GAT) is first built with superpixel-based encoder and decoder modules, the CNN is built with an attention mechanism, and the features of the two networks are finally weighted and fused. Ortac [53] proposed using 1-D, 2-D, and 3-D convolutional neural networks to classify samples from widely used hyperspectral datasets by extracting spatial, spectral, and spatial-spectral features.
Research on hyperspectral image classification faces several challenges. Each pixel of a hyperspectral image contains reflectance information from hundreds of bands; the number of bands is large and adjacent bands are highly correlated, so the data are highly redundant. In addition, the large volume of hyperspectral image data requires a large amount of computation. A dimensionality reduction algorithm can reduce both the data redundancy and the amount of computation.
Principal component analysis (PCA) [54] is a technique for analyzing and simplifying data sets. It is often used to reduce the dimensionality of a dataset while retaining the features that contribute most to its variance. This is achieved by retaining the lower-order principal components and discarding the higher-order ones, since the lower-order components tend to capture the most important aspects of the data. PCA is particularly suitable for dimensionality reduction of hyperspectral images, where it reduces data redundancy and computational effort. Moreover, similar neighboring pixels in hyperspectral images are strongly correlated. Objects that are small or have low contrast require a higher resolution to be observed, whereas objects that are large or have strong contrast require only a lower resolution. Both situations occur in hyperspectral images, which therefore call for multi-resolution processing. The bottom of the Gaussian pyramid [55] is a high-resolution representation of the image to be processed, while the top is a low-resolution representation; moving up the pyramid, both size and resolution decrease. In this way, multi-resolution, multi-size images can be computed with the Gaussian pyramid, and better feature extraction and classification performance can be obtained.
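The PCA step described above can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in this paper: the function name `pca_project`, the use of power iteration with deflation to find the leading components, and the toy spectra are all assumptions made for illustration. In practice a numerical library would normally be used instead.

```python
import math
import random


def pca_project(spectra, n_components, n_iter=200, seed=0):
    """Project spectra (a list of equal-length band vectors) onto their
    leading principal components. Hypothetical helper for illustration."""
    n, d = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in spectra]
    # Sample covariance matrix between bands (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration: repeated multiplication by the covariance matrix
        # converges to the direction of largest remaining variance.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove the variance explained by the component just found.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return scores, components


# Toy spectra with three bands: bands 0 and 1 vary together, band 2 is constant.
toy_spectra = [[float(t), float(t), 5.0] for t in range(10)]
toy_scores, toy_components = pca_project(toy_spectra, n_components=1)
```

Here the single retained component points along the direction in which bands 0 and 1 vary together, which is the sense in which the low-order components keep the dominant variation while the redundant band is dropped.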
Wavelet transforms have been well developed in various fields. The discrete wavelet transform (DWT) [56] is a time-frequency analysis method in signal processing that improves on the Fourier transform; it can represent local features in both the time and frequency domains and is an important algorithm in image coding. Barzegar [57] improved a model's prediction accuracy by preprocessing the data with boundary correction (BC) [58] and the maximum overlap discrete wavelet transform (MODWT) [59] and feeding the result to a hybrid convolutional neural network (CNN). Current hyperspectral remote sensing image classification methods do not apply to all types of images. Hyperspectral remote sensing image data can be regarded as a three-dimensional data cube. For the spatial features, we adopt multi-scale, multi-resolution feature extraction to obtain more accurate classification features. For the spectral features, we process the spectral curve of each pixel with transforms from signal processing, namely the Fourier transform and the wavelet transform. The processed spectral data are fed into a long short-term memory (LSTM) neural network, the multi-scale, multi-resolution images are fed into a ResNet [60] network, and the output layers of the two networks are connected to a fully connected network that produces the final classification results.

III. METHODOLOGY
Our HSI classification method consists of two networks whose overall architecture is shown in Fig. 1. In the upper branch, the data are first reduced in dimensionality by PCA and then downsampled through the Gaussian pyramid to obtain images of three scales, which are fed to two ResNet networks with different numbers of layers for feature extraction, as shown in Fig. 2 and Fig. 3. Each module in turn consists of two different residual blocks, as shown in Fig. 4 and Fig. 5. A shallow network is used for small images and a deep network for large images to prevent overfitting. The lower branch applies a discrete wavelet transform to the spectral data of each pixel to obtain denoised spectral data with anomalies removed, which are input to the LSTM network for spectral feature extraction. In other words, the LSTM network requires the data to be processed by the discrete wavelet transform to obtain finer, denoised data, while the ResNet network requires the data to be segmented so that each pixel to be classified lies at the center of a rectangular region; PCA dimensionality reduction and Gaussian pyramid processing are then applied to the segmented image to obtain multi-scale, multi-resolution images. The two separate networks extract spatial and spectral features, respectively, and their outputs are finally combined through a fully connected network to obtain the classification results. Details are described in Sections III-A to III-C.

A. REGION MULTI-SCALE SPATIAL FEATURE EXTRACTION
Based on the above analysis, hyperspectral data are prone to the curse of dimensionality because of their redundancy, heavy computation, and high-dimensional features. As the dimensionality of a dataset increases, the number of samples required for learning increases exponentially. In hyperspectral classification, such large data are a disadvantage, because redundant data and unimportant features reduce classification accuracy, and more memory and processing power are required to learn from large datasets. In addition, the sparsity of the data increases with the dimensionality, and a dataset is much harder to explore in a high-dimensional vector space than in a low-dimensional one. Therefore, this paper first uses PCA to reduce the dimensionality of the HSI data and alleviate the curse of dimensionality and data redundancy. By discarding some information, PCA removes part of the noise and the data that harm classification and makes the samples more spatially correlated. The PCA-reduced data are then partitioned into regions, with each pixel to be classified placed at the center of its region. The segmented data are then downsampled with the Gaussian pyramid algorithm to obtain multi-scale, multi-resolution image data. First, the original image is used as the bottom image G_0 (layer 0 of the Gaussian pyramid) and convolved with a Gaussian kernel; the convolved image is then downsampled to obtain the upper-layer image G_1. This image is used as the next input, and the convolution and downsampling operations are repeated several times to form a pyramid-shaped image data structure, as shown in Fig. 6. L_l denotes the upper layer to be generated, and L_{l−1} denotes the lower layer of the pyramid from which the upper layer is computed with the function F.
Each layer L_x is computed iteratively as the downsampled convolution F * L_{x−1} of the layer below with a Gaussian window, where * is the convolution operation, x ∈ {1, 2, 3, ..., l}, and l is the total number of layers of the Gaussian pyramid. F(u, v) is a (2c + 1) × (2c + 1) Gaussian window, and ϒ is the variance of the Gaussian filter. The Gaussian pyramid consists of the series of images {L_1, L_2, L_3, ..., L_l} generated by this recursion. The data processed by dimensionality reduction and the Gaussian pyramid are fed into the ResNet network for feature extraction. ResNet [60] can extract deep features and allows the number of network layers to be increased without degrading performance. It is especially suitable for extracting hyperspectral image features, because hyperspectral images have many dimensions and the number of dimensions remains high even after PCA. We therefore use the ResNet network to extract the spatial features.
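For the pyramid recursion and the Gaussian window referred to in this subsection, one standard form consistent with the stated definitions is given below as a sketch. The downsampling factor of 2, the normalization constant of the window, and the convention that L_0 is the input image (G_0 above) are assumptions and are not taken from the text.

```latex
\begin{align}
L_x(i, j) &= \sum_{u=-c}^{c} \sum_{v=-c}^{c} F(u, v)\, L_{x-1}(2i + u,\ 2j + v),
  \qquad x \in \{1, 2, \dots, l\}, \\
F(u, v) &= \frac{1}{2\pi\Upsilon} \exp\!\left(-\frac{u^{2} + v^{2}}{2\Upsilon}\right).
\end{align}
```

The first line combines the convolution with F and the downsampling in a single sum; the second defines the (2c + 1) × (2c + 1) window for u, v ∈ {−c, ..., c} with variance ϒ.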
The overall multi-scale structure is obtained by repeating the same downsampling steps on each processed image. The same processing is applied to the HSI so that it can be analyzed at multiple scales, with different spatial features available at different scales, and the different scales are classified with ResNet [60] networks of different depths. As described above, each pyramid layer L_l is obtained from the lower layer L_{l−1} through the function F, and the pixel values of each layer are computed iteratively.
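The blur-then-downsample loop described in this subsection can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function names, the window half-width `c=2`, the unit variance, the normalization of the window to sum to 1, and the replication of edge pixels at the border are all assumptions.

```python
import math


def gaussian_window(c, variance):
    """(2c+1) x (2c+1) Gaussian window, normalised so its weights sum to 1."""
    offsets = range(-c, c + 1)
    raw = [[math.exp(-(u * u + v * v) / (2.0 * variance)) for v in offsets]
           for u in offsets]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def pyramid_down(image, window):
    """Blur with the window (edge pixels replicated) and keep every 2nd pixel."""
    h, w = len(image), len(image[0])
    c = len(window) // 2
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            acc = 0.0
            for u in range(-c, c + 1):
                for v in range(-c, c + 1):
                    y = min(max(i + u, 0), h - 1)  # clamp to the image border
                    x = min(max(j + v, 0), w - 1)
                    acc += window[u + c][v + c] * image[y][x]
            row.append(acc)
        out.append(row)
    return out


def gaussian_pyramid(image, levels, c=2, variance=1.0):
    """Return [layer 0 (input), layer 1, ..., layer `levels`]."""
    window = gaussian_window(c, variance)
    layers = [image]
    for _ in range(levels):
        layers.append(pyramid_down(layers[-1], window))
    return layers


# An 11 x 11 single-band patch, downsampled three times.
patch = [[3.0] * 11 for _ in range(11)]
layers = gaussian_pyramid(patch, levels=3)
```

With an 11 × 11 input and three downsampling steps, this sketch yields four layers with side lengths 11, 6, 3, and 2, one per scale.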

B. SPECTRAL FEATURE EXTRACTION
For the spectral features, we apply a wavelet transform to the spectrum of each pixel of the hyperspectral image. Because of objective factors such as weather, atmosphere, illumination, the satellite sensor imaging process, uncompensated atmospheric effects, uncompensated sensor errors, and the angle of the sun relative to the zenith, the spectral curves of pixels of the same class can differ considerably. This reduces the recognition rate, and pixels cannot be correctly classified from their spectral curves alone. Therefore, this paper uses the wavelet transform to narrow the gap between spectral curves and to reduce noise. The processed spectral data are fed into the LSTM network to learn and extract the spectral features. The wavelet-transformed spectral curves are treated as sequence data and used to train the LSTM network, which remembers previous inputs and lets them influence the next input, so that it can learn long-term dependencies. Moreover, the forget gate determines which information is discarded from the cell state, and discarding unimportant data can improve the accuracy of the model.
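To make the idea of wavelet-based smoothing of a spectral curve concrete, the following sketch denoises a short spectrum with a Haar DWT and soft thresholding of the detail coefficients. The text does not state which wavelet family, decomposition depth, or thresholding rule is used, so the Haar wavelet, the two levels, the threshold value, and all function names here are assumptions chosen for brevity.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise_spectrum(spectrum, threshold, levels=2):
    """Decompose, shrink the detail coefficients, and reconstruct."""
    if len(spectrum) % (2 ** levels) != 0:
        raise ValueError("spectrum length must be divisible by 2**levels")
    approx = list(spectrum)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


# A flat toy spectrum with one anomalous band value.
noisy = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
smoothed = denoise_spectrum(noisy, threshold=100.0)
```

With a threshold large enough to zero every detail coefficient, the anomalous value is averaged into its neighborhood; a threshold of 0 reproduces the input. The resulting sequence is what would then be passed to the sequence model.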

C. THE PROPOSED MODEL
The models we use are a residual network (ResNet) and a long short-term memory (LSTM) neural network. After PCA dimensionality reduction, the data are segmented into regions so that each pixel to be classified lies at the center of a rectangular block of 11 × 11 pixels. The rectangular blocks are then processed by a Gaussian pyramid to obtain multi-scale, multi-resolution hyperspectral region images, and the residual network is used for spatial feature extraction. The data are normalized before being input to the network so that each batch is similarly distributed and vanishing gradients can be avoided. Batch normalization makes the mean of the data close to 0 and the standard deviation close to 1: each batch Z is standardized with its expectation E(Z) and variance Var(Z), and the result is scaled and shifted by γ and β, which are parameters to be learned. We downsampled the data three times to obtain images at four different scales and trained four residual networks with different numbers of layers to prevent overfitting. The wavelet-transformed spectral data are fed into the LSTM network for feature extraction.
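The normalization step can be written, in its usual form, as Ẑ = γ (Z − E(Z)) / sqrt(Var(Z) + ε) + β. The sketch below applies this to a one-dimensional batch; the small constant ε for numerical stability, the default values of γ and β, and the function name are assumptions not stated in the text.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise a batch of scalars, then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((z - mean) ** 2 for z in batch) / len(batch)  # biased variance
    return [gamma * (z - mean) / math.sqrt(var + eps) + beta for z in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With γ = 1 and β = 0 the output has mean close to 0 and standard deviation close to 1; during training, γ and β would be updated together with the network weights.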

IV. EXPERIMENTAL RESULTS
In this section, we demonstrate the effectiveness of the proposed method on three datasets and compare it with current state-of-the-art methods. All programs in the experiments are written in Python, and the network models are built with PyTorch, an open-source Python machine learning library that allows custom deep learning models to be trained and used flexibly. The networks and the dataset processing procedures are also implemented in Python. The computer configuration used in our

A. EXPERIMENTAL DATA
To evaluate our method, we test the proposed dual-branch feature network on three commonly used datasets: Indian Pines, Salinas, and University of Pavia. The datasets are shown in Fig. 7. For each dataset, we randomly selected 200 labeled pixels of each class for training and used all other pixels in the ground truth map for testing. The Indian Pines dataset consists of 145 × 145 pixels and was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over northwestern Indiana. It has 220 spectral channels covering 0.4∼2.5 µm with a spatial resolution of 20 m. The dataset initially has 16 land cover classes; however, from a statistical point of view, we excluded classes with few samples and selected the eight classes with the most samples. The numbers of training and test samples are listed in Table 1.
Fig. 7. Three experimental datasets: (a) pseudo-color image of Indian Pines, (b) ground truth of Indian Pines, (c) pseudo-color image of Salinas, (d) ground truth of Salinas, (e) pseudo-color image of Pavia University, and (f) ground truth of Pavia University.
The second dataset, Salinas, was collected by the AVIRIS sensor and consists of 512 × 217 pixels. The image contains 224 spectral bands. The numbers of training and test samples are listed in Table 2.
The third dataset, University of Pavia (610 × 340 pixels), was collected by the Reflective Optics System Imaging Spectrometer (ROSIS) over the city of Pavia, Italy. The dataset consists of 103 spectral bands covering 0.43∼0.865 µm with a spatial resolution of 1.3 m. In total, 42776 pixels in the ground truth map have been labeled and divided into 9 categories, and the numbers of training and test samples are shown in Table 3.
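The per-class random split used for the three datasets (200 labeled pixels per class for training, all remaining labeled pixels for testing) can be sketched as follows. The function name, the dictionary representation of the ground truth, and the fixed seed are assumptions made for illustration; the toy call uses 4 training pixels per class instead of 200.

```python
import random


def split_per_class(labels, n_train=200, seed=0):
    """labels maps a pixel index to its class id (labelled pixels only).
    Returns (train_indices, test_indices)."""
    rng = random.Random(seed)
    by_class = {}
    for index, cls in labels.items():
        by_class.setdefault(cls, []).append(index)
    train, test = [], []
    for cls in sorted(by_class):
        pixels = sorted(by_class[cls])
        rng.shuffle(pixels)              # random selection within the class
        train.extend(pixels[:n_train])   # first n_train pixels for training
        test.extend(pixels[n_train:])    # everything else for testing
    return train, test


# Toy ground truth: 30 labelled pixels in 3 classes.
toy_labels = {i: i % 3 for i in range(30)}
train_idx, test_idx = split_per_class(toy_labels, n_train=4)
```

Sampling a fixed count per class, rather than a fixed fraction, gives every class the same number of training pixels regardless of how many labeled pixels it has.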

B. LEARNING THE PROPOSED METHOD
For each training pixel, we use the surrounding 11 × 11 pixels; diverse regions are extracted from this square region and then fed into a sequence of convolutional layers. Note that the proposed diverse-region strategy can be viewed as a flexible representation of the square region, so the region size affects the final performance of the proposed GWJ-Net. Here, we empirically set the global region size to 11 × 11.
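Extracting the 11 × 11 neighborhood around a pixel can be sketched as below. The text does not say how pixels near the image border are handled, so replicating the nearest edge pixel is an assumption, as is the function name. Each image entry may be a scalar or a per-pixel vector of PCA components.

```python
def extract_patch(image, row, col, size=11):
    """Return the size x size neighbourhood centred on (row, col).
    Positions outside the image reuse the nearest edge pixel (an assumption)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so that the pixel sits at the centre")
    height, width = len(image), len(image[0])
    half = size // 2
    return [[image[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]
             for c in range(col - half, col + half + 1)]
            for r in range(row - half, row + half + 1)]


# Toy 20 x 20 single-band image whose values encode their own position.
toy_image = [[r * 20 + c for c in range(20)] for r in range(20)]
center_patch = extract_patch(toy_image, 10, 10)
corner_patch = extract_patch(toy_image, 0, 0)
```

The pixel to be classified always ends up at index `[5][5]` of an 11 × 11 patch, which is what places it at the center of the rectangular region.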
Traditional stochastic gradient descent maintains a single learning rate for updating all weights, and this learning rate does not change during training. In contrast, Adam computes independent adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. The authors of Adam describe it as combining the advantages of two extensions of stochastic gradient descent (AdaGrad and RMSProp). In this paper, we set the initial learning rate to 0.001 and the batch size to 200. Fig. 8 shows the classification performance for different window sizes of the rectangular region, from 3 × 3 to 15 × 15. The performance becomes satisfactory at a window size of 11 × 11, although a given window size is not necessarily optimal for all experimental datasets. The red curve indicates that 11 × 11 is the best window size for the Pavia University dataset, the blue curve indicates that it is also the best window size for the Indian Pines dataset, and it is likewise the best window size for the Salinas dataset. When converting spectral data to images, the best image size for the Indian Pines dataset is ... Therefore, we choose a relatively large size within the available hardware resources, which avoids both wasting and exceeding them.
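The Adam update mentioned above, with first-moment and second-moment estimates giving each parameter its own effective step size, can be sketched for a list of scalar parameters. Only the learning rate of 0.001 comes from the text; the decay rates `beta1=0.9` and `beta2=0.999` and `eps=1e-8` are the defaults suggested by the Adam authors and are assumed here, as is the function name.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v hold the running moment estimates and are
    updated in place; t is the 1-based step counter."""
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1.0 - beta2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated


# Two parameters whose gradients differ by a factor of 100.
m_state, v_state = [0.0, 0.0], [0.0, 0.0]
new_params = adam_step([1.0, 1.0], [10.0, 0.1], m_state, v_state, t=1)
```

On the first step both parameters move by almost exactly the learning rate even though their gradients differ by two orders of magnitude, which illustrates the per-parameter scaling that plain stochastic gradient descent lacks.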
Different initial learning rates are tested on the Indian Pines dataset, as shown in Table 8. It can be observed that a larger learning rate may degrade the classification performance; the best performance is achieved when the learning rate is around 0.001. In the follow-up experiments, we set the learning rate to 0.001 for GWJ-Net.

C. CLASSIFICATION PERFORMANCE
The proposed GWJ-Net is compared with current advanced hyperspectral image classification methods, namely SVM-RBF (SVM with a radial basis function kernel) and SVM-RF (SVM based on random feature selection), as well as CD-CNN, SS-CNN, CNN-PPF, CNN, SVM-MRF, and R-PCA CNN. Tables 4-6 report the per-class accuracy, overall accuracy (OA), and average accuracy (AA) of the different methods on the three datasets. The datasets are split randomly: for each class, we randomly select 200 samples for training and use the remaining samples for testing. The experimental results show that classification based on joint spatial-spectral features is more accurate than classification using only one of the two feature types, and that the proposed GWJ-Net achieves higher classification accuracy than the other classifiers. In Table 5, the accuracy of the proposed GWJ-Net is 99.74%, which is 7.35 percentage points higher than that of R-PCA CNN (92.39%) and about 4 percentage points higher than that of CD-CNN (95.42%). Similar behavior is observed in the experiments on the other datasets. On the University of Pavia, Indian Pines, and Salinas datasets, the proposed method is approximately 2%, 1%, and 1% better, respectively, than the other classification methods. Figs. 9-11 show the classification maps of each classifier, which are consistent with the results in Tables 4-6. The maps show that, compared with CNN, CNN-PPF, and CD-CNN, the GWJ-Net classification map has significantly fewer misclassified points in many regions, such as the Bare soil region in Fig. 11 and the Soybean-clean region in Fig. 9. We therefore conclude that GWJ-Net performs better than the other classification methods.
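The two summary metrics reported in the tables follow their usual definitions: OA is the fraction of all test pixels classified correctly, and AA is the unweighted mean of the per-class accuracies. A minimal sketch with a hypothetical function name and toy labels:

```python
def overall_and_average_accuracy(y_true, y_pred):
    """Return (OA, AA, per-class accuracy) for two equal-length label lists."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    oa = correct / len(y_true)
    per_class = {}
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        per_class[cls] = sum(1 for i in indices if y_pred[i] == cls) / len(indices)
    aa = sum(per_class.values()) / len(per_class)
    return oa, aa, per_class


# Toy example: class 0 has four test pixels, class 1 has two.
oa, aa, per_class = overall_and_average_accuracy([0, 0, 0, 0, 1, 1],
                                                 [0, 0, 0, 1, 1, 1])
```

Because AA weights every class equally, it differs from OA whenever class sizes are unbalanced, as in this toy example where one error in the larger class lowers OA and AA by different amounts.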
Table 7 lists the classification performance as the number of training samples per class increases from 50 to 200 in steps of 50. For all methods, the accuracy increases with the number of training samples. The proposed GWJ-Net still outperforms the other methods, namely CNN, CNN-PPF, and CD-CNN. Even with a small amount of training data, such as 50 or 100 samples per class, the proposed network maintains good classification performance. Table 9 lists the computational cost of training and testing for GWJ-Net, CNN, and CNN-PPF. During training, CNN is faster than the other two because its network size and input size are smaller. During testing, GWJ-Net is more time-consuming because of the additional computational burden of multi-scale feature extraction.
In particular, we compare the classification performance on the Indian Pines data when 10% of the samples of each class are randomly selected for training and ResNet networks with different numbers of layers are used. As shown in Table 10, GWJ-Net achieves the highest accuracy owing to the combination of spectral feature extraction and the multi-scale feature extraction module.

V. CONCLUSION
In this paper, we propose a new two-branch network for hyperspectral image classification that combines a multi-scale regional Gaussian pyramid processed by ResNet with a wavelet transform of the spectrum processed by an LSTM. First, the image is segmented so that each pixel to be classified lies at the center of a rectangle, and the segmented region is then processed by a Gaussian pyramid to extract multi-scale features, from which multi-scale spatial information and contextual interaction features in specific directions are obtained to improve the recognition rate of the model. For the spectral data, we apply a wavelet transform to reduce the influence of anomalous spectral segments on the classification and to remove noise that degrades recognition, so that finer spectral information can be captured. The advantages of our method come from two aspects: regional multi-scale feature extraction, and wavelet transformation of the spectral information to reduce the influence of anomalous spectral values on the classification. Experimental results show that the proposed method, based on the regional Gaussian pyramid multi-scale transformation and the wavelet transform of spectral information, outperforms other recent methods on three datasets.