Regional Principal Component Analysis Network With the Rolling Guidance Filter for Classifying the Hyperspectral Images

The conventional principal component analysis network (PCANET) performs the PCA on all the segments of all the training pixel vectors together. Hence, it does not capture the differences between different segments of the same training pixel vector, and the resulting classification accuracy is not high. This paper proposes a regional principal component analysis network with the rolling guidance filter (RPCANET_RGF) for performing the hyperspectral image (HSI) classification with few training samples. The regional principal component analysis network (RPCANET) proposed in this paper performs the PCA on each segment of all the training pixel vectors separately. Besides, the rolling guidance filter (RGF) is used to remove the spatial noise and to enhance the edges of the HSIs. Different from the conventional convolutional neural networks (CNNs), the coefficients of the filters are obtained by performing the PCA on the regional segments of the HSIs. This approach is also different from the conventional PCANET, because different segments of the same pixel vector are processed by different filters. Since the RPCANET_RGF is a general learning method that obtains the filter coefficients directly from the HSIs, back propagation based training is not required. Hence, the RPCANET_RGF requires less computational power for performing the training compared to the CNN. Besides, as the RPCANET_RGF can make use of both the spectral information and the spatial information for performing the classification, the computer numerical simulation results show that the classification accuracy achieved by the RPCANET_RGF is higher than that achieved by the conventional PCANET and other state of the art methods.


I. INTRODUCTION
The classification of the hyperspectral images plays an important role in the economic and social developments [3]-[6]. However, the hyperspectral images consist of many spectral bands. Extracting only the useful information from these bands for performing the classification is crucial, because it reduces the required computational power.
However, it may also result in a reduction of the classification accuracy [13], [14], especially when the total number of the training samples is small [15]. (The associate editor coordinating the review of this manuscript and approving it for publication was Lefei Zhang.)
To address the above difficulties, the multinomial logistic regression was proposed for performing the classification of the HSIs [7]. The SMLR-SpATV classifier using the SpATV regularization with the spectral information was also proposed [19]. Besides, the discriminative spectral spatial margin (DSSM) based method [33] and the spatial spectral regularized local discriminant embedding (SSRLDE) based method [34] made full use of the spatial information and the priori knowledge of the HSIs [18]. Moreover, by using a spatial hypergraph model, the spatial neighbors of the HSIs were combined and the fused spatial spectral features were extracted using a hypergraph embedding method. Furthermore, based on the graph framework, a marginal Fisher analysis (MFA) based method [29], a linear discriminant analysis (LDA) based method [22], a local Fisher discriminant analysis (LFDA) based method [30], a locally linear embedding based method [23], a neighborhood preserving embedding (NPE) based method [25], a regularized local discriminant embedding (RLDE) based method [34], a Laplacian eigenmap based method [24] and a locality preserving projection (LPP) based method [28] were proposed. (VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Recently, a sparse-adaptive hypergraph discriminant analysis (SAHDA) method was proposed in [26] to obtain the embedding features of the HSI, and a hybrid-graph learning method (EHGDL) was proposed in [27] to reveal the complex high-order relationships of the HSI. A convolutional neural network (CNN) [1] was also used for performing the classification of the hyperspectral images. It is worth noting that constructing and training an appropriate CNN model for performing the classification requires a huge training set. Nevertheless, obtaining a large amount of data is challenging for some specific applications. As a result, the developments of the deep learning applications are restricted. Moreover, a robust PCANET was used for performing the detection of the changes of the HSIs [48]. The principal component analysis network (PCANET) is a very simple deep learning network which consists of only very basic data processing modules such as the cascaded PCA module, the binary hashing module and the segment-wise histogram module. Since it can achieve an excellent classification performance, its structure is simple and its hyper-parameters are easy to optimize [2], the PCANET is useful for performing the image classification, the target recognition and other processing tasks. As a result, many classification applications such as the character recognition [10] and the scene character recognition [11] have been developed. However, the PCANET is rarely used for performing the HSI classification.
In this paper, a new scheme based on the regional PCANET and the rolling guidance filter, called the regional PCANET_RGF (RPCANET_RGF), is proposed for performing the HSI classification. In the RPCANET_RGF, the weights are computed based on the regional PCA instead of the back propagation algorithm. Thus, the required time for computing the weights is reduced. Moreover, the RPCANET_RGF exploits the spatial information of the HSIs via the rolling guidance filter. In particular, different segments of the same pixel vector are processed by different filters. In other words, the RPCANET_RGF proposed in this paper performs the PCA on each segment of all the training pixel vectors, while the conventional PCANET performs the PCA on all the segments of all the training pixel vectors together.
The rest of this paper is organized as follows. In Section II, the related works on the PCANET are reviewed. Section III explains the details of the proposed HSI classification approach based on the RPCANET_RGF. In Section IV, the corresponding computer numerical simulations using three widely used HSI datasets are presented. Finally, Section V draws some concluding remarks and discusses some possible future research directions.

II. RELATED WORKS
A. PCANET
The PCA is used for the dimension reduction, the fault diagnosis and the fault tolerant control [12]. It projects the data from the high dimensional space to a low dimensional space such that the variance of the data in the low dimensional space is as large as possible. In this case, it facilitates the classification of the data. Hence, it is required to find the most appropriate projection. To achieve this objective, the PCA algorithm calculates the eigenvalues and the eigenvectors of the covariance matrix of the data and chooses the eigenvectors of the covariance matrix corresponding to the largest eigenvalues to form the projection matrix. On the other hand, the PCANET employs the PCA to learn the convolution kernels of a convolutional network. Then, the binarization and the hashing are applied to the filtered two dimensional signal. In this paper, the regional principal component analysis network with the rolling guidance filter (RPCANET_RGF) is applied to perform the classification of the hyperspectral image.
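The projection described above can be sketched in a few lines of NumPy (an illustrative sketch only, not part of the original implementation; the function name pca_projection and the test data are ours):

```python
import numpy as np

def pca_projection(X, L):
    # project d x N data onto the L-dimensional subspace of largest variance
    Xc = X - X.mean(axis=1, keepdims=True)   # zero-mean each dimension
    C = Xc @ Xc.T / X.shape[1]               # d x d covariance matrix
    w, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    V = V[:, ::-1][:, :L]                    # keep the L leading eigenvectors
    return V, V.T @ Xc                       # projection matrix, projected data
```

The columns of the returned matrix form the orthonormal projection basis, and the projected data live in the L-dimensional space where the variance is maximized.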

1) THE INPUT LAYER
The PCANET model is discussed in detail in [17]. Suppose that the dimensions of the hyperspectral image (HSI) are m × n × h. Here, m is the length of the HSI, n is the width of the HSI and h is the total number of the spectral bands of the HSI, which is also the dimension of each pixel vector. Therefore, the hyperspectral image is composed of m × n pixel vectors. Suppose that there are N training pixel vectors taken from the HSI. They are denoted as T_i for i = 1, ..., N. Obviously, N ≤ mn. Let the segment size, that is, the length of the one dimensional filter, be k_1 for all the layers of the PCANET. Here, only the PCA is employed to operate on the training pixel vectors T_i for i = 1, ..., N. In the following sections, the training procedures of the conventional PCANET are presented in detail.

2) THE FIRST HIDDEN LAYER
To design the first layer of the PCANET, each training pixel vector is divided into segments of length k_1 × 1. More precisely, two consecutive segments of each training pixel vector are shifted by one point from each other. That is, the first segment of T_i is taken from its 1st point to its k_1-th point, and the second segment of T_i is taken from its 2nd point to its (k_1 + 1)-th point. The j-th segment of T_i is denoted as α_{i,j} = [x_{i,j}, ..., x_{i,k_1+j−1}]^T ∈ R^{k_1×1} for i = 1, ..., N and for j = 1, ..., h − k_1 + 1, where x_{i,j} is the j-th value of T_i. Then, ᾱ_{i,j} is obtained by subtracting the mean of α_{i,j} from α_{i,j}. Denote the matrix containing all the zero mean segments of T_i as X̄_i = [ᾱ_{i,1}, ..., ᾱ_{i,h−k_1+1}] ∈ R^{k_1×(h−k_1+1)}. The above procedures are repeated for all the training pixel vectors. Finally, denote the matrix containing all these zero mean segments of all the training pixel vectors as

X = [X̄_1, ..., X̄_N] ∈ R^{k_1×N(h−k_1+1)}.

Assume that the total number of the orthonormal filters in the first layer of the PCANET is L_1. Let these orthonormal filters be V_l for l = 1, ..., L_1. Put the V_l into a matrix and let this matrix be V = [V_1, ..., V_{L_1}]. The PCA minimizes the reconstruction error using these orthonormal filters. Therefore, the design of these filters can be formulated as the following optimization problem:

min_V ‖X − V V^T X‖_F^2 subject to V^T V = I_{L_1},

where I_{L_1} is the L_1 × L_1 identity matrix. The solution of this optimization problem is the matrix with its column vectors being the first L_1 eigenvectors of X X^T. Here, the corresponding eigenvalues are sorted in the descending order. It is worth noting that these leading principal eigenvectors capture the main variations of the processed segments of the training pixel vectors. Let the output of the l-th filter in the first layer of the PCANET be I_i^l for i = 1, ..., N and for l = 1, ..., L_1.
I_i^l is the result of the convolution between the l-th filter W_l^1 = V_l and the i-th pixel vector T_i ∈ R^{h×1}. That is,

I_i^l = W_l^1 * T_i for i = 1, ..., N and for l = 1, ..., L_1.

Therefore, the matrix containing the outputs of the first layer of the PCANET is I = [I_1^1, ..., I_N^{L_1}].
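The first layer training described above can be summarized by the following NumPy sketch (an illustration only; the helper names extract_segments, learn_filters and first_layer are ours, and the convolution is implemented as a valid correlation of each filter with the pixel vector):

```python
import numpy as np

def extract_segments(T, k1):
    # all length-k1 segments of pixel vector T (stride 1), each zero-meaned
    segs = np.stack([T[j:j + k1] for j in range(len(T) - k1 + 1)], axis=1)
    return segs - segs.mean(axis=0, keepdims=True)

def learn_filters(train_pixels, k1, L1):
    # X collects the zero-mean segments of all N training pixel vectors
    X = np.hstack([extract_segments(T, k1) for T in train_pixels])
    w, V = np.linalg.eigh(X @ X.T)     # eigenvectors of X X^T, ascending order
    return V[:, ::-1][:, :L1].T        # the L1 leading eigenvectors as filters

def first_layer(T, W):
    # valid 1-D convolution (correlation) of each filter with the pixel vector
    return np.stack([np.convolve(T, w[::-1], mode="valid") for w in W])
```

For a pixel vector of length h and segment length k_1, each of the L_1 filter outputs has length h − k_1 + 1, matching the segment count in the text.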

3) THE SECOND HIDDEN LAYER
The second layer of the PCANET is almost the same as its first layer. If the total number of the filters in the second layer is L_2, then there are L_2 outputs for each input I_i^l in the second layer of the PCANET. Denote W_ℓ^2 as the ℓ-th filter in the second layer of the PCANET. Each first layer output I_i^l performs the convolution with W_ℓ^2 for ℓ = 1, 2, ..., L_2. Denote O_{i,l}^ℓ as the output of the second layer of the PCANET. O_{i,l}^ℓ can be obtained as follows:

O_{i,l}^ℓ = W_ℓ^2 * I_i^l

for i = 1, 2, ..., N, for l = 1, 2, ..., L_1 and for ℓ = 1, 2, ..., L_2. Here, the total number of the outputs of the second layer of the PCANET for the input T_i is L_1 L_2. Therefore, the matrix containing the outputs of the second layer of the PCANET is O = [O_{1,1}^1, ..., O_{N,L_1}^{L_2}]. The same operations can be repeated to obtain the further layers of the PCANET. Generally speaking, a two tier PCANET is enough for performing most of the classification applications. This is because more layers will increase the required computational power without improving the performance much.
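The second layer simply convolves each first layer output with each second layer filter, which can be sketched as follows (illustrative; the function name second_layer is ours):

```python
import numpy as np

def second_layer(I1, W2):
    # I1: array (L1, len1) of first-layer outputs for one pixel vector
    # W2: array (L2, k1) of second-layer filters
    # produces the L1 * L2 outputs O_{i,l}^{ell} = W2_ell * I1_l
    return np.stack([np.convolve(a, w[::-1], mode="valid")
                     for a in I1 for w in W2])
```

With L_1 first layer outputs and L_2 second layer filters, the result stacks L_1 L_2 output vectors, as stated in the text.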

4) OUTPUT STAGE: HASHING AND HISTOGRAM
Next, each output of the second layer of the PCANET is quantized. More precisely, the elements of O_{i,l}^ℓ are set to 0 if they are less than or equal to 0. Otherwise, they are set to 1. Then, every quantized element of O_{i,l}^1 is multiplied by 2^{L_2−1}. The obtained vector is denoted as H_{i,l}^1. Likewise, every quantized element of O_{i,l}^ℓ is multiplied by 2^{L_2−ℓ} for ℓ = 2, ..., L_2. The obtained vector is denoted as H_{i,l}^ℓ. Then, by summing up all these L_2 weighted quantized outputs together, a vector denoted as T_{i,l} = Σ_{ℓ=1}^{L_2} H_{i,l}^ℓ ∈ R^{(h−2k_1+2)×1} is obtained. Next, put all these vectors together to form a new vector denoted as T_i = [T_{i,1}^T, ..., T_{i,L_1}^T]^T. Then, sort the elements of T_i in the ascending order and divide the sorted T_i into blocks. Denote the lengths of these blocks as q. Put these blocks into the columns of a matrix. After that, count the occurrences of each integer value in each block of the matrix and obtain a vector with its elements being the occurrences of the corresponding integer values. Put these vectors into the columns of another matrix denoted as B_{i,l}. Besides, all the columns of B_{i,l} are connected together to form a new long vector denoted as C_{i,l}. Put the C_{i,l} into the columns of a matrix [C_{i,1}, ..., C_{i,L_1}]. Next, combine all the columns of [C_{i,1}, ..., C_{i,L_1}] together to form a new column vector denoted as P_i. Finally, P_i is used to train the SVM for classifying the objects.
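The hashing and histogram stage can be sketched as follows (a simplified illustration that omits the sorting step and uses zero-based filter indices, so the weight 2^{L_2−ℓ} becomes 2 ** (L2 - 1 - ell); the function name hash_and_histogram is ours):

```python
import numpy as np

def hash_and_histogram(O, L2, q):
    # O: list of the L2 second-layer outputs for one (i, l) pair
    # binary quantize each output, weight by powers of two, and sum
    T = sum((O[ell] > 0).astype(int) * 2 ** (L2 - 1 - ell) for ell in range(L2))
    # split into blocks of length q and histogram the integers 0 .. 2^L2 - 1
    nblocks = len(T) // q
    feats = [np.bincount(T[b * q:(b + 1) * q], minlength=2 ** L2)
             for b in range(nblocks)]
    return np.concatenate(feats)   # block histograms cascaded into one vector
```

Each block contributes a histogram of length 2^{L_2}, and the cascaded histograms form the feature vector that is finally fed to the SVM.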

B. NOISE REMOVAL
For various reasons, a hyperspectral image contains a lot of noise. To suppress the noise effect, the existing works employ various filters. For example, an edge preserving filter is used in [31]. Also, the filter based on the weighted composite features (WCFs) is employed in [32] to derive the optimized output weights and to extract the spatial features.
The rolling guidance filter (RGF) [16] can restore the boundaries among different classes while removing and smoothing the small noisy areas of the HSI.

III. RPCANET_RGF
A. REGIONAL SEGMENTED PCA
The conventional PCANET performs the PCA on all the segments of all the training pixel vectors together. Hence, it does not capture the differences between different segments of the same training pixel vector, and the resulting classification accuracy is not high. On the other hand, the RPCANET proposed in this paper performs the PCA on each segment of all the training pixel vectors separately. This is the main difference between our proposed method and the conventional PCANET approach, and it allows the RPCANET_RGF to perform the HSI classification with few training samples. The training procedures of the RPCANET_RGF are presented in detail below.
Assume that the j-th segment of the i-th training pixel vector is α_{i,j} for j = 1, ..., h − k_1 + 1 and i = 1, ..., N. The j-th segments of all the training pixel vectors are put side by side to form the k_1 × N matrix

Ȳ_j = [ᾱ_{1,j}, ..., ᾱ_{N,j}] ∈ R^{k_1×N}.

Here, ᾱ_{i,j} is obtained by subtracting the mean of α_{i,j} from α_{i,j}. The same operation applies to all the other segments of the training pixel vectors. The filters for the j-th segment are the first L_1 eigenvectors of the covariance matrix Ȳ_j Ȳ_j^T. Hence, the total number of the filters obtained at the first layer is L_1(h − k_1 + 1). On the other hand, the total number of the filters obtained at the first layer of the conventional PCANET is L_1. FIGURE 1 shows the flowchart of the filter parameter acquisition of the regional principal component analysis network. The test samples are input into the network with these parameters to obtain the classification accuracy of the test samples. In the following, denote PCANET as the conventional PCANET, Regional_PCA as the method described in the preceding paragraph, RPCANET as the conventional PCANET with Regional_PCA at the first layer, PCANET_RGF as the conventional PCANET with the RGF, and RPCANET_RGF as the RPCANET with the RGF.
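The regional PCA described above can be sketched as follows (illustrative; the function name regional_filters is ours, and one bank of L_1 filters is learned per segment position j):

```python
import numpy as np

def regional_filters(train_pixels, k1, L1):
    # one filter bank per segment position j: h - k1 + 1 banks of L1 filters
    h = len(train_pixels[0])
    banks = []
    for j in range(h - k1 + 1):
        # the j-th segments of all N training pixel vectors, side by side
        Y = np.stack([T[j:j + k1] for T in train_pixels], axis=1)   # k1 x N
        Y = Y - Y.mean(axis=0, keepdims=True)                       # zero mean
        w, V = np.linalg.eigh(Y @ Y.T)          # eigenvectors, ascending order
        banks.append(V[:, ::-1][:, :L1].T)      # L1 leading eigenvectors
    return banks
```

In contrast to the conventional PCANET, which produces a single bank of L_1 filters, this yields L_1(h − k_1 + 1) filters in total, one bank per segment position.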

B. SPATIAL AND SPECTRAL WEIGHTS IN RGF
This paper uses the RGF to solve the noise problem. The RGF procedures are briefly summarized below:

1) STEP 1: GAUSSIAN FILTERING
The Gaussian filter is a two dimensional convolution operator using a Gaussian kernel for blurring the image (removing details and noise). Assuming that the Gaussian filter (window) size is (2k + 1) × (2k + 1), its elements are given by

G(x, y) = (1/M) exp(−(x² + y²)/(2σ²)) for x, y = −k, ..., k,

where σ is the standard deviation of the Gaussian kernel, k determines the dimension of the kernel matrix, and M = Σ_{x=−k}^{k} Σ_{y=−k}^{k} exp(−(x² + y²)/(2σ²)) is the normalization constant. The filtered pixel value R_{i,j} is obtained by the Gaussian filter and is expressed by (11), where m_{i,j} is the value of the neighboring pixels of R_{i,j} in the HSI. In order to increase the diversity of the classification data, the larger the spatial window, the larger the σ of the Gaussian filter. The basis is the characteristic of the Gaussian distribution: the probability of a value falling in (µ − 3σ, µ + 3σ) is 0.9974. If the kernel matrix is larger, the corresponding σ is also larger. Conversely, if σ is larger, then the coverage of the kernel matrix is also larger. With reference to OpenCV, the relationship between k and σ is determined by (12):

σ = 0.3(k − 1) + 0.8.
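The Gaussian kernel and the relationship between k and σ can be sketched as follows (illustrative; the default σ follows the OpenCV-style relation σ = 0.3(k − 1) + 0.8 assumed above for (12), and the function name gaussian_kernel is ours):

```python
import numpy as np

def gaussian_kernel(k, sigma=None):
    # (2k+1) x (2k+1) Gaussian kernel; if sigma is not given, use the
    # OpenCV-style relation sigma = 0.3*(k - 1) + 0.8 (our reading of Eq. (12))
    if sigma is None:
        sigma = 0.3 * (k - 1) + 0.8
    ax = np.arange(-k, k + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()   # divide by M so that the kernel sums to one
```

Convolving an image with this normalized kernel produces the blurred values R_{i,j} described in the text.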

2) STEP 2: EDGE RECOVERY
The iterative edge recovery step is the key operation of the RGF. In this process, an image is iteratively updated. We denote R^t as the result of the t-th iteration. Initially, R^1 is the output of the Gaussian filtering. The value of R^{t+1} in the (t + 1)-th iteration is obtained in a joint bilateral filtering form given the original input image H:

R^{t+1}(p) = (1/K_p) Σ_{q∈N(p)} exp(−‖p − q‖²/(2σ_s²) − ‖R^t(p) − R^t(q)‖²/(2σ_r²)) H(q),

where K_p = Σ_{q∈N(p)} exp(−‖p − q‖²/(2σ_s²) − ‖R^t(p) − R^t(q)‖²/(2σ_r²)) is the normalization term and N(p) is the set of the neighboring pixels of p.
H remains unchanged throughout all the iterations, ‖R^t(p) − R^t(q)‖² is the squared Euclidean distance between the values at point p and point q, and σ_s and σ_r control the spatial and spectral weights respectively.
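The two steps of the RGF can be sketched as follows for a single band (an illustrative, unoptimized sketch; the function name rgf and the default parameter values are ours):

```python
import numpy as np

def rgf(H, sigma_s=3.0, sigma_r=0.1, iters=3, k=2):
    # Rolling guidance filter on one band (grayscale) image H.
    # Pass 1 is plain Gaussian smoothing (Step 1: small structures removed);
    # later passes are joint bilateral filters guided by the previous R^t
    # (Step 2: edge recovery).
    yy, xx = np.mgrid[-k:k + 1, -k:k + 1]
    w_s = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma_s ** 2))   # spatial weights
    Hp = np.pad(H, k, mode="edge")
    R = None
    for _ in range(iters):
        Rp = Hp if R is None else np.pad(R, k, mode="edge")
        out = np.zeros_like(H, dtype=float)
        for i in range(H.shape[0]):
            for j in range(H.shape[1]):
                win_H = Hp[i:i + 2 * k + 1, j:j + 2 * k + 1]
                win_R = Rp[i:i + 2 * k + 1, j:j + 2 * k + 1]
                if R is None:
                    w = w_s                    # plain Gaussian on the first pass
                else:
                    # range weights taken from the guidance image R^t
                    w = w_s * np.exp(-(win_R - win_R[k, k]) ** 2
                                     / (2 * sigma_r ** 2))
                out[i, j] = (w * win_H).sum() / w.sum()   # normalized by K_p
        R = out
    return R
```

With iters = 3 this corresponds to one Gaussian pass followed by two edge recovery iterations, which matches the per-window setting used in the experiments of this paper.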
In (13) and (14), σ_s is generally set larger. Then, the slope of the Gaussian function is more gradual, the variance between the pixels after Step 1 becomes smaller, and the noise pixels are cleared more thoroughly. In general, the value of σ_r should be as small as possible, so that the slope of the Gaussian function is steeper and the image boundaries can be better recovered after the t-th iteration. However, if σ_r is too small, the noise is also more easily restored. This article uses an adaptive σ_r given by (16) and (17). Through experiments, k is found to work best between 1/20 and 1/35.
The standard deviation between an edge pixel and its surrounding pixels is large, which is advantageous for separating different types of regions. In general, Step 1 of the RGF eliminates the effect of the noise but blurs the boundaries of the hyperspectral image segmentation, while Step 2 of the RGF enhances the details of the hyperspectral image and restores the edges. The conventional PCANET performs the PCA on all the segments of all the training pixel vectors, so V_{1j} = V_{ij} = V_{hj}, while the RPCANET_RGF performs the PCA on each segment of all the training pixel vectors separately, so in general V_{1j} ≠ V_{ij} ≠ V_{hj}.

IV. COMPUTER NUMERICAL SIMULATION RESULTS AND DISCUSSIONS
This paper reduces the noise and enhances the boundaries by the RGF. Then, a small number of training pixels of the HSI are randomly selected from the original data to obtain the filters of the two layers. The binarization is performed on each output matrix of the second layer, so that the obtained result only contains 1s and 0s. The binary results are hashed by the power-of-two weighting and divided into equal length segments, and then the histogram information of each segment is computed. The histograms of all the segments are cascaded, and finally the extended histogram feature vector is obtained. These vectors are passed through the libsvm classifier to obtain the classification result of the training data and the parameters of the SVM classifier. Next, we apply these filters and the trained SVM model to the test pixels of the HSI to get the classification accuracy. This operation is performed 10 times with different training data sets, and the average value obtained is used as the final classification result.
In the subsequent experiments, we used the following settings. In the noise removal phase, the RGF is applied 8 times with different spatial window sizes: the first spatial window is 3×3, the second is 5×5, and so on, up to 17×17 for the eighth. For each spatial window size, 3 filtering passes are performed, that is, 1 Gaussian filtering pass and 2 RGF iterations, and each run produces one m × n × h image. After the 8 runs, eight images of size m × n × h are obtained, and these images are concatenated to obtain an image of size m × n × 8h for the subsequent RPCANET_RGF experiments. The segment sizes of the first and second layers are 40 × 1, the numbers of the filters of the first and second layers are both 8, and the histogram is computed using non-overlapping segments of size 8 × 1 in the histogram generation phase.
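The stacking of the eight filtered cubes described above can be sketched as follows (illustrative; band_filter stands for any per-band smoothing routine such as an RGF implementation, and its signature is our own assumption):

```python
import numpy as np

def build_rgf_cube(hsi, band_filter, windows=(3, 5, 7, 9, 11, 13, 15, 17)):
    # Filter every band once per spatial window size (3x3 up to 17x17),
    # then concatenate the eight filtered cubes along the band axis: m x n x 8h.
    cubes = []
    for w in windows:
        k = (w - 1) // 2                       # window size is 2k + 1
        filtered = np.stack([band_filter(hsi[:, :, b], k)
                             for b in range(hsi.shape[2])], axis=2)
        cubes.append(filtered)
    return np.concatenate(cubes, axis=2)
```

For the Indian Pines data with h = 200 bands, this turns each pixel vector of length 200 into a stacked vector of length 1600, as described later in the experiments.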
In the remaining part of this section, we provide an experimental evaluation of the RPCANET_RGF using 3 real HSIs. In our experiments, the classification results are compared visually and quantitatively, where the quantitative comparisons are based on the class-specific accuracy, the overall accuracy (OA), the average accuracy (AA) and the kappa coefficient [1]. All the experiments are performed using MATLAB R2015a on an Intel Core(TM) i7 7700 2.60-GHz machine with 8 GB RAM.
Indian Pines is the first test data set for the hyperspectral image classification. Indian Pines was imaged in 1992 by the airborne visible infrared imaging spectrometer (AVIRIS) and then cropped to a size of 145 × 145. These areas are labeled for use as a hyperspectral image classification test. The AVIRIS imaging spectrometer has an imaging wavelength range of 0.4-2.5 µm and continuously images the ground objects in 220 consecutive bands. The remaining 200 bands after eliminating the 20 invalid bands are the subject of the research. The spatial resolution of the image formed by the spectral imager is about 20 m, so mixed pixels are easily produced, which makes the classification difficult. The total number of the pixels is 145 × 145 = 21025. 10776 pixels are background pixels, which are painted blue, and 10249 pixels represent the classification data, labeled with different colors. The schematic of the 20 training samples randomly selected for each class and the ground-truth image are shown in FIGURE 3.
The second data set is the Pavia University data set (Pavia_U) [40]. Pavia_U consists of data over Pavia, Italy, acquired by the ROSIS instrument in 2001. The image scene is cropped to a size of 610 × 340 pixels, and the remaining 103 bands after eliminating the 12 invalid bands are the subject of the research. There are nine classes in total in this HSI data set. FIGURE 4 shows the schematic of the 20 samples randomly selected for each category and the ground-truth map.
The third data set is Salinas. Salinas consists of data over the Salinas Valley, California, characterized by a high spatial resolution. The image scene is cropped to a size of 512 × 217 pixels, and the remaining 204 bands after eliminating the 20 invalid bands are the subject of the research. There are 16 classes in total in this HSI data set. The schematic of the 20 samples randomly selected for each category and the ground-truth map are shown in FIGURE 5.

A. COMPARATIVE EXPERIMENTS USING THE INDIAN PINES DATASET
In the experiments from FIGURE 6 to FIGURE 9, we randomly select 10% of the labeled samples per class for training and the rest for testing. The training data are taken randomly 10 times. The scales of the x axes of these figures are the segment lengths, while the scales of the y axes of these figures are the average classification accuracies over the 10 randomly selected test data sets. The comparison of the average classification accuracy between the conventional PCANET and the RPCANET under the different segment lengths of the Indian Pines data is shown in FIGURE 6. It can be seen from FIGURE 6 that the overall classification accuracy of the RPCANET is 6% higher than that of the conventional PCANET without Regional_PCA. The filters in the first layer of the network in this experiment are the first eight eigenvectors corresponding to the largest eight eigenvalues. The results verify that the RPCANET can significantly improve the classification accuracy compared with the PCANET. This is because the RPCANET captures the details of the differences between different segments of the same training pixel vectors, and hence the classification accuracy can be improved.
When the segment length is small, the variance of the data projected onto these eigenvectors is close to 0. Such data are not conducive to the classification. On the other hand, when the segment length is large, a lot of the detailed features are lost. Unless otherwise specified, the rest of this paper chooses the segment length as 40 to tradeoff between these two effects. The methods for obtaining the results shown in FIGURE 6 have not considered the spatial information. Hence, the obtained accuracies are relatively low.
The experimental results also verify that the RPCANET with the RGF can significantly improve the classification accuracy. The classification accuracies on the Indian Pines data of the RPCANET and the RPCANET_RGF are shown in FIGURE 7, where the RPCANET is the regional PCANET without the RGF. It can be seen from FIGURE 7 that the RGF can significantly improve the classification accuracy.
In this paper, the RGF is used in the data initialization phase. The number of the RGF iterations is set to 2, and the spatial windows are set to 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, 15 × 15 and 17 × 17 respectively. By superimposing the data obtained from these different spatial windows, the length of each original HSI pixel vector is changed from 200 to 1600.
When the segment lengths of the first and second layers of the RPCANET_RGF are 15, 20, 25, 30, 35 and 40, the average classification accuracies are shown in FIGURE 8. In FIGURE 8, the decrease of the classification accuracy when the segment length is 35 is explained by the certain randomness of the training samples. Because the units of the ordinate are very small, the differences are very small, and the average classification accuracy is over 99% when the segment lengths are between 30 and 40, so the algorithm is also very stable.
After adding the RGF, FIGURE 9 shows the comparison of the overall classification accuracies between the PCANET_RGF and the RPCANET_RGF. In FIGURE 9, the curve with the sections in the first layer (the blue line) denotes the RPCANET_RGF, and the curve without the sections in the first layer (the green line) denotes the PCANET_RGF. Compared with FIGURE 6, the classification accuracy is significantly improved after adding the RGF. As the segment length increases, although the classification accuracy of the RPCANET_RGF increases first and then decreases, the variation of the classification accuracy is not obvious and stays around 99%. As the segment length of the PCANET_RGF increases, the classification accuracy increases first and then rapidly reduces. This indicates that the conventional PCANET is greatly influenced by the segment length, while the RPCANET_RGF is very little influenced by the segment length, so the algorithm is more robust. The reason for this phenomenon is that when the conventional PCANET performs the singular value decomposition, it uses the set of all the segments of all the training pixel vectors. As the length of the data segment increases, for example, when 100 spectral bands are taken as a segment while only the first 8 singular values are taken, the differences between these 8 singular values are small, and so the differences between the data to be classified are small. When the segment length is small, the differences between the singular values are large, and this leads to large differences between the data to be classified. However, for the RPCANET_RGF, since the PCA is performed separately for each segment, the segment length has less influence and the classification accuracy is higher. We randomly select 10% of the labeled samples per class for training and the rest for testing.
The proposed approaches are benchmarked with the state-of-the-art methods including the kernel-based LORSAL (KLORSAL) [21], the BELM [42], the NLELM [20], the KELM [20], the ASMLKELM, the SVM [20], the SVM with the composite kernel (SVM-CK) [20], the LORSAL [21], the SMLR-SpATV (the KLORSAL with the weighted Markov random field) [35], [36], the PCRC [39], the SMLR [40] and the SR [41]. The quantitative results are the averages over 10 runs for each method. The OAs of the different methods show that the RPCANET_RGF can achieve a 99.18% OA and performs better than the other methods. This is because the RPCANET_RGF processes different segments of the same pixel vector with different filters, while the same filters are used for the segments with the same sequence number of the pixel vectors of different pixels.
Then, we randomly select 20 labeled samples per class for training and the rest are the test samples. The paper compares the RPCANET_RGF with seven state-of-the-art methods including the SVM [1], the SVM with the composite kernel (SVM-CK) [20], the LORSAL [21], the SMLR-SpATV (the KLORSAL with the weighted Markov random field) [35], [36], the PCRC [39], the SMLR [40] and the SR [41]. The LIBSVM software [37] is used for the implementation of the SVM and the SVM-CK. The results are the averages over 10 runs for each method. The class-specific accuracies, the overall accuracies (OAs), the average accuracies (AAs) and the kappa coefficients (k) of the different methods are shown in TABLE 2. FIGURE 10 illustrates the full classification maps obtained by the different methods. We randomly pick a result from the results of the 10 runs to form FIGURE 10. Obviously, the RPCANET_RGF works best, because different segments of the same pixel vector are processed with different filters, and the RPCANET_RGF can better distinguish between different segments. We can also see that the maps obtained by the K-SVM, the PCRC, the SMLR, the SR, the LORSAL and the SVM-CK have a heavily noisy appearance. Although the SMLR_SpTV gets better results, its classification maps still assign the samples from one category to another. When we randomly select 5, 10, 15 and 20 labeled samples from each class to form the training set and the rest forms the test set, FIGURE 11 shows the average accuracies of the different methods. The AA is the average over 10 runs for each method. As the number of the training samples increases, the AAs basically increase. When the training samples are limited, for example, 15 samples from each class, the RPCANET_RGF can obtain an AA over 90%, which is around 7% higher than that of the SMLR_SpTV, the second best method. In addition, the RPCANET_RGF performs well when the number of the training samples in each category is more than or equal to 10. These experiments show that the RPCANET_RGF can achieve a higher average accuracy with fewer training samples.

B. EXPERIMENTS WITH THE Pavia UNIVERSITY DATA SET
For the Pavia University data set, we randomly select 20 labeled samples per class for training and the rest are the test samples. The paper compares the RPCANET_RGF with the same seven state-of-the-art methods as used for the first data set. The results are the averages over 10 runs for each method, and the class-specific accuracies, the overall accuracies (OAs), the average accuracies (AAs) and the kappa coefficients (k) of the different methods are shown in TABLE 3. We randomly pick a result from the results of the 10 runs to form FIGURE 12, which shows that the RPCANET_RGF also gives the best results. We can also see that the maps obtained by the K-SVM, the PCRC, the SMLR, the SR, the LORSAL and the SVM-CK have a heavily noisy appearance. Although the SMLR_SpTV gets better results, its classification maps still assign the samples from one category to another. When we randomly select 5, 10, 15 and 20 labeled samples from each class to form the training set and the rest forms the test set, FIGURE 13 shows the average accuracies of the different methods. The AA is the average over 10 runs for each method. As the number of the training samples increases, the AAs basically increase. When the training samples are limited, for example, 15 training samples from each class, the RPCANET_RGF can obtain an AA over 94%, which is around 7% higher than that of the SMLR_SpTV, the second best method. In addition, the RPCANET_RGF performs well when the number of the training samples in each category is more than or equal to 10. These experiments show that the RPCANET_RGF can achieve a higher average accuracy with fewer training samples.

C. EXPERIMENTS WITH THE SALINAS DATA SET
For the Salinas data set, we randomly select 20 labeled samples from each class for training and use the rest as test samples. We compare RPCANET_RGF with the same seven state-of-the-art methods used for the first data set. All reported results are averaged over 10 runs for each method; the class-specific accuracies, overall accuracies (OAs), average accuracies (AAs), and kappa coefficients (κ) of the different methods are shown in TABLE 4. FIGURE 14, formed from a randomly picked result of the 10 runs, shows that RPCANET_RGF again gives the best results. We can also see that the maps obtained by K-SVM, PCRC, SMLR, SR, LORSAL, and SVM-CK have a heavy noisy appearance. Although SMLR-SpATV obtains better results, its classification maps still assign samples from one category to another.
Next, we randomly select 5, 10, 15, and 20 labeled samples from each class to form the training set, with the rest forming the test set. FIGURE 15 shows the average accuracy of the different methods, averaged over 10 runs. As the number of training samples increases, the AAs generally increase. When the number of training samples is limited, for example, 15 training samples per class, RPCANET_RGF obtains an AA over 96%, which is around 4% higher than that of SMLR-SpATV, the second-best method. In addition, RPCANET_RGF performs well whenever the number of training samples per class is at least 10. This experiment also shows that RPCANET_RGF achieves higher average accuracy with fewer training samples than the conventional PCANET and other state-of-the-art methods.
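The repeated experiment above (re-split, train, classify, and average the AA over 10 runs for each training-set size) can be sketched as a small loop. A nearest-class-mean rule is used here purely as a stand-in classifier so the sketch is self-contained; it is not the paper's method, and the function name is hypothetical:

```python
import numpy as np

def mean_aa_over_runs(X, y, sizes=(5, 10, 15, 20), n_runs=10, seed=0):
    """For each per-class training size, repeat the random split n_runs
    times, classify the test set with a simple nearest-class-mean rule
    (a stand-in for any classifier), and return the mean AA per size."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    results = {}
    for n_train in sizes:
        aas = []
        for _ in range(n_runs):
            tr, te = [], []
            for c in classes:
                idx = rng.permutation(np.flatnonzero(y == c))
                tr.extend(idx[:n_train]); te.extend(idx[n_train:])
            tr, te = np.array(tr), np.array(te)
            # Class means estimated from the training samples only.
            means = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in classes])
            pred = classes[np.argmin(
                ((X[te][:, None, :] - means[None]) ** 2).sum(-1), axis=1)]
            # AA = mean of the per-class accuracies on the test set.
            aas.append(np.mean([(pred[y[te] == c] == c).mean() for c in classes]))
        results[n_train] = float(np.mean(aas))
    return results
```

Plotting `results` against the training-set sizes reproduces the kind of curves shown in FIGURES 11, 13, and 15.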
To increase comparability, we also select 2% of the samples of each class of the Pavia University data set for training. The full classification maps obtained by the different methods are illustrated in FIGURE 16; each map is constructed from a randomly selected result of the 10 runs.
As can be seen from FIGURE 16, RPCANET_RGF gives satisfactory results in smooth homogeneous regions by making use of the regional segmented spectral information and the stacked joint spatial information. We can also see that the maps obtained by K-SVM, PCRC, SMLR, SR, and SVM-CK have a heavy noisy appearance. SVM-CK, LORSAL, and K-SVM obtain similar classification accuracies, but the maps of SVM-CK and K-SVM contain more salt-and-pepper noise than that of LORSAL. One possible reason is that purely spectral-based methods cannot exploit the regional segmented spectral information and the stacked joint spatial information of the HSI. Although TABLE 5 shows that SMLR-SpATV is slightly more accurate than RPCANET_RGF in some ground categories, its classification maps still assign more samples from one category to another. The results in TABLE 5 and FIGURE 16 further prove that the proposed RPCANET_RGF not only reduces noise but also provides higher OA, AA, kappa, and CA than the other methods.
Last, we randomly select 1%, 2%, 3%, and 4% of the labeled samples from each class to form the training set, with the rest forming the test set. FIGURE 17 shows the average accuracy of the different methods, averaged over 10 runs. As the percentage of training samples increases, the AAs generally increase. When the number of training samples is limited, for example, 1% of the samples of each class, RPCANET_RGF obtains an AA over 98%, which is around 2% higher than that of SMLR-SpATV, the second-best method. In addition, RPCANET_RGF performs well whenever the percentage of training samples per class is at least 1%. These experiments once again show that RPCANET_RGF achieves higher average accuracy with fewer training samples.
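The percentage-based selection used in this last experiment differs from the fixed-count split only in how the per-class training size is computed. A minimal sketch (the function name, rounding rule, and one-sample floor are illustrative assumptions, not from the paper):

```python
import numpy as np

def percent_split(labels, frac=0.02, rng=None):
    """Select a fraction `frac` (e.g. 0.02 for 2%) of the labeled
    samples of every class for training, keeping at least one sample
    per class; the remaining samples form the test set."""
    rng = np.random.default_rng(rng)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Per-class training size, floored at one sample (assumption).
        n_train = max(1, int(round(frac * len(idx))))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```

Unlike the fixed-count split, this keeps the training set proportional to each class's population, so large classes contribute more training samples.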

V. CONCLUSION AND FUTURE WORKS
Inspired by PCANET and the spatial relationship between neighboring pixels, we have developed RPCANET_RGF. RPCANET_RGF applies a regional PCA to every segment of the training pixel vectors, so that different segments of the same pixel vector are processed by different filters, while the same filters are applied to the segments with the same index across all pixel vectors. The RGF effectively removes the spatial noise of the HSIs. The experiments have shown that RPCANET_RGF outperforms some state-of-the-art HSI classification methods even with a small number of training samples.
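The core filter-learning step shared by the PCANET family, applied here per regional segment, amounts to taking the leading principal components of the mean-removed training patches as filter coefficients. A minimal sketch of that step (the function name and patch layout are illustrative, and this is a generic PCANET-style computation rather than the paper's exact implementation):

```python
import numpy as np

def regional_pca_filters(patches, n_filters):
    """Learn filters for ONE spectral segment by PCA: remove the patch
    mean, then take the leading eigenvectors of the patch covariance
    as the filter coefficients.
    patches: (n_patches, patch_size) array drawn from one regional
    segment of the training pixel vectors."""
    X = patches - patches.mean(axis=0)      # zero-mean patches
    cov = X.T @ X / len(X)                  # patch covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Leading principal components (largest eigenvalues) form the bank.
    return eigvecs[:, ::-1][:, :n_filters].T
```

In RPCANET, this routine would be run once per segment, giving each segment its own filter bank; the conventional PCANET instead learns a single bank from all segments pooled together.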
In our future work, first, since tensor methods can extract different kinds of features, we can design new learning methods to learn the spectral-spatial features of each segment and combine these methods using optimization techniques. Second, more features, such as those obtained via band selection, will be explored within our structure. Last, we will try to use parallel computing to further improve the computational efficiency.
XIULING LI was born in Harbin, China, in 1980. She received the LL.B. degree from the Mudanjiang Teachers College and the LL.S. degree from South China Normal University, in 2007. She is currently a Teacher with the Guangdong University of Technology.