Hyperspectral Image Classification Based on Active Learning and Spectral-Spatial Feature Fusion Using Spatial Coordinates

In hyperspectral image (HSI) classification, combining spectral information with spatial information has become an effective way to obtain good classification results, but spatial information is generally introduced in an unsupervised or otherwise complicated way. We introduce spatial coordinates as the spatial information in a simple supervised way and propose two HSI classification algorithms in which the spatial coordinates of samples are regarded as their spatial features. The first is a spectral-spatial classification algorithm named HSI Classification Based on Spectral-Spatial Feature Fusion using Spatial Coordinates (SSFFSC). The HSI is divided into multiple small images in the spatial dimension, and samples in each small image are randomly selected as training samples. A support vector machine (SVM) classifies the samples according to the spatial coordinate features and the spectral features separately, yielding the probability of each sample belonging to each class. These probability features are then classified by SVM to obtain the final classification result. Because the performance of SSFFSC relies on the partition of the HSI, SSFFSC is further combined with active learning (AL) in a second method named HSI Classification Based on Active Learning and SSFFSC (SSFFSC-AL). The partition of the HSI is omitted and the training samples are selected adaptively by AL's sampling scheme. We find that spatial coordinates are useful spatial information. SSFFSC and SSFFSC-AL run fast and improve the classification accuracy effectively by using the spatial coordinates as spatial features. Experiments demonstrate that, compared with other algorithms, SSFFSC and SSFFSC-AL obtain higher classification accuracy in less time.


I. INTRODUCTION
Hyperspectral image (HSI) classification is important in hyperspectral remote sensing processing [1], [2]. Unlike traditional color images or multi-spectral images, hyperspectral images have high resolution and rich spectral information [3], which benefits classification. However, this richness also causes the "curse of dimensionality" and the "Hughes phenomenon" [4]-[6]. That is, when the number of training samples is limited, the high dimensionality of hyperspectral data reduces the efficiency of the classifier; as the data dimension increases, the classification accuracy first increases and then decreases. Therefore, to obtain good classification results, one has to increase the number of training samples or improve the performance of the algorithm. However, acquiring enough training samples with true labels is costly in practice, so designing algorithms with high efficiency and high classification accuracy has become the focus of hyperspectral classification.
The associate editor coordinating the review of this manuscript and approving it for publication was Hossein Rahmani.
In research on hyperspectral image classification, it has been found that the "salt and pepper" phenomenon [7] occurs when classification is based only on spectral information. Introducing spatial information into the classification process and combining it with spectral information can overcome this phenomenon and yield a better classification result [8]. How to introduce spatial information is therefore an important issue. The methods of using spatial information generally include morphological profiles [9]-[11], Markov random fields [12], [13], image segmentation [14], [15] and others [16]. In [9], Marpu et al. proposed a method using extended attribute profiles (EAP), which were developed from morphological profiles. This method first reduced the dimension of the data. Then the EAP with the standard deviation attribute was obtained using parameters computed from the training samples. Finally, the EAP were classified to get the classification result. In [14], an approach for hyperspectral image classification using image segmentation was presented. The segmentation map of the hyperspectral image was first obtained by Fractional-Order Darwinian Particle Swarm Optimization (FODPSO), and then an initial classification result based on spectral information was obtained by a support vector machine (SVM). Finally, majority voting was executed in each segmented area to assign the class with the largest number of votes to all samples in the area. In [16], Hadoux et al. proposed another method to use spatial information. Partial Least Squares (PLS) was first used to reduce the dimension of the hyperspectral image. Then anisotropic regularization (AR) was performed on each dimension of the reduced data, so that the information in the neighborhood of a sample was used to correct the sample's features. Finally, the processed data were classified to get the final result. These methods can effectively reduce the "salt and pepper" effect and obtain good classification results with high regional consistency. However, the way these methods introduce spatial information can be complicated, which increases the computational cost to some extent.
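As an illustration of the majority-voting step described for [14], the following is a minimal sketch, not the implementation used in that work. It assumes a segmentation map and a per-pixel label map are already available as integer arrays; the function name `majority_vote` and the tie-breaking rule (the smallest label wins) are assumptions made for this example.

```python
import numpy as np


def majority_vote(segments, pixel_labels):
    """Relabel every pixel with the most frequent label of its segment.

    segments:     integer array, one segment id per pixel.
    pixel_labels: integer array of the same shape with per-pixel class labels.
    Ties go to the smallest label (a choice made for this sketch).
    """
    smoothed = np.empty_like(pixel_labels)
    for seg_id in np.unique(segments):
        members = segments == seg_id
        values, counts = np.unique(pixel_labels[members], return_counts=True)
        smoothed[members] = values[np.argmax(counts)]
    return smoothed
```

Isolated misclassified pixels inside a segment are overwritten by the dominant class, which is how this kind of post-processing suppresses the "salt and pepper" effect.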
For a supervised classification method, building the classifier model relies on labeled samples. To obtain a classifier model with relatively high classification accuracy, it is necessary to construct a training sample set containing many labeled samples. However, manually labeling a large number of samples in a hyperspectral image is very time-consuming and costly in practice. Active learning (AL) [17] was proposed to address this problem. Unlabeled samples differ in their ability to improve the performance of the classifier, and some samples carry more information than others; such samples improve the classification performance more effectively. AL selects informative samples through a sampling scheme, has them labeled manually, and adds them to the training set for the next round of training, so that the performance of the classifier improves rapidly and good classification ability can be obtained with a small training sample set.
Many hyperspectral image classification methods based on active learning have been proposed and have achieved good classification accuracy [18]-[24]. In [24], hierarchical segmentation was combined with AL, which obtained a good result by combining the advantages of spatial information and active learning. Li et al. [25] proposed a method using loopy belief propagation (LBP) and AL for hyperspectral classification. This method established a discriminative random field model to jointly consider the spectral and spatial information, and then used LBP to solve the maximum a posteriori marginal (MPM) problem to find the label of each sample. Finally, a good classification result could be obtained by AL with fewer training samples.
In recent years, some methods [26]-[28] based on deep neural networks have been designed for hyperspectral image classification. Cheng et al. [26] proposed a simple and effective deep spatial feature extraction method and designed a spectral-spatial feature extraction method by imposing metric learning on SVM training. Zhou et al. [27] proposed an effective framework named compact and discriminative stacked autoencoder (CDSAE), which could learn discriminative low-dimensional feature mappings and train an effective classifier progressively. Cheng et al. [28] proposed a novel and effective approach to learn a rotation-invariant CNN (RICNN) model for improving the performance of object detection, which was achieved by introducing and learning a new rotation-invariant layer on the basis of existing CNN architectures.
Spatial coordinates are also a kind of spatial information. In [29], Goovaerts presented a method which used spatial coordinates in classification of hyperspectral image. Maximum likelihood method (ML) was used at first to classify samples by their spectral features, and then indicator kriging (IK) was used to estimate the probability of samples by using the probabilistic information in the neighborhood. These probability values were combined with the probability values based on spectral information and the final result was obtained. In this method, spatial coordinates were used to determine neighbor samples of the central sample. However, in our work we find that spatial coordinates, as features of the spatial distribution of the samples, can be utilized directly for classification.
In this paper, a novel spectral-spatial classification method using spatial coordinate information is proposed, named Hyperspectral Image Classification Based on Spectral-Spatial Feature Fusion using Spatial Coordinates (SSFFSC). In SSFFSC, the spatial coordinates of samples are regarded as their spatial features and utilized for classification. The samples show a certain local consistency in spatial distribution, which makes it possible to classify them by spatial coordinates. Because spatial coordinate features are related to the spatial distribution of samples, the training set is constructed so that training samples are drawn from all spatial areas, making full use of the spatial coordinates. Specifically, the hyperspectral image is divided into multiple small images in the spatial dimension, and samples in each small image are randomly selected, which ensures that training samples come from all spatial areas. The spatial coordinate features and spectral features are then combined to achieve a better classification result. However, the range of values of the spatial coordinate features is quite different from that of the spectral features, so directly concatenating the two kinds of features is not good for classification. Instead, SVM is used to classify the samples according to the spatial coordinate features and the spectral features separately, giving the probability of each sample belonging to each class, so that the spectral features and spatial coordinate features are converted into corresponding probability features. Finally, the probability features are further classified by SVM to achieve the final classification result.
In SSFFSC, the classification performance of the algorithm relies on the partition of the HSI. Considering this, SSFFSC is further combined with AL in a method named Hyperspectral Image Classification Based on Active Learning and Spectral-Spatial Feature Fusion using Spatial Coordinates (SSFFSC-AL). In SSFFSC-AL, the training samples are selected by AL's sampling scheme, which selects the samples that contribute most to the classifier. SSFFSC-AL can achieve good classification accuracy when the number of training samples is small.
The main contributions of this paper can be summarized as follows.
1) Spatial coordinates of samples are directly used to classify samples, and the results are then combined with those based on spectral features. This is a simple way to implement spectral-spatial information fusion. Compared with other methods using spectral-spatial information, this algorithm obtains higher classification accuracy with less running time when using a training sample set of the same size.
2) Because the performance of the algorithm based on spatial coordinates is related to the division of images, we introduce AL into the framework, and the proposed SSFFSC-AL can get a better classification result when the training sample set is small. Compared with other methods using AL, SSFFSC-AL is high in accuracy and fast in running speed.
The remainder of this paper is organized as follows. Section II describes the proposed SSFFSC and SSFFSC-AL. Section III provides the comparison experiments and the analysis of the parameters. Finally, Section IV summarizes the work and outlines further research.

II. PROPOSED APPROACH
A. HYPERSPECTRAL IMAGE CLASSIFICATION BASED ON SPECTRAL-SPATIAL FEATURE FUSION USING SPATIAL COORDINATES
There are n samples in the hyperspectral remote sensing data set $X = \{x_1, x_2, \cdots, x_i, \cdots, x_n\}$. The dimension of the spectral information of each sample is d and the data set contains L categories. Each sample has a corresponding pair of spatial coordinates. In the spatial dimension of a hyperspectral image, samples in a local region tend to belong to the same class, so the classes are separable by location. Therefore, it is feasible to conduct supervised classification using the spatial coordinates of samples.
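To make the notion of "spatial coordinates as features" concrete, the following is a minimal sketch of how an image cube could be flattened into per-sample spectra and coordinate pairs. The function name `cube_to_samples` and the (rows, cols, d) array layout are assumptions made for this illustration and are not specified in the text.

```python
import numpy as np


def cube_to_samples(cube):
    """Flatten an HSI cube into spectral vectors and (row, col) coordinates.

    cube: array of shape (rows, cols, d); this layout is an assumption.
    Returns spectra of shape (rows*cols, d) and coords of shape (rows*cols, 2),
    where sample i in both arrays refers to the same pixel.
    """
    rows, cols, d = cube.shape
    spectra = cube.reshape(rows * cols, d)
    row_idx, col_idx = np.indices((rows, cols))
    coords = np.column_stack([row_idx.ravel(), col_idx.ravel()])
    return spectra, coords
```

Each row of `coords` is then the two-dimensional spatial feature of the sample whose spectrum is the matching row of `spectra`.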
To implement spectral-spatial information fusion, directly concatenating the spectral information and the spatial coordinates into a new feature vector is not feasible, because the range of the samples' spatial coordinates is related to the size of the hyperspectral image and usually spans tens to hundreds, whereas the spectral values of the samples are expected to be much smaller. Therefore, another approach to feature fusion should be considered. After the support vector machine (SVM) obtains the classification result, sigmoid fitting can produce the probability distribution of the samples over the categories [30], which makes it possible to design the following fusion of spatial coordinates and spectral information.
To achieve correct classification of samples in hyperspectral image, the supervised classification method needs to select a part of the samples from the sample set of each category. In SSFFSC, the method of constructing the training sample set is proposed as shown in Fig. 1 to make full use of spatial coordinate information. The hyperspectral image is uniformly divided into 100 small hyperspectral images according to the spatial relationship between pixels in the image, and the 100 small hyperspectral images can be used to obtain 100 small data sets. In each small data set, a certain percentage of samples are randomly selected as training samples and all the training samples selected from the 100 small data sets make up the final training sample set. All the other samples in the hyperspectral image data set are assembled into a testing sample set.

Algorithm 1 SSFFSC Algorithm
Input: Hyperspectral image data set $X = \{x_1, x_2, \cdots, x_i, \cdots, x_n\}$, which contains n samples of L categories. The spectral dimension of each sample is d.
Output: Classification result of the hyperspectral image $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_i, \cdots, \hat{y}_n\}$.
1: Construct the training sample set and the testing sample set as shown in Fig. 1.
2: Get the spatial probability feature $F_a(x_i) = [p_{ai1}, p_{ai2}, \cdots, p_{aik}, \cdots, p_{aiL}]$ of the sample $x_i$.
3: Get the spectral probability feature $F_e(x_i) = [p_{ei1}, p_{ei2}, \cdots, p_{eik}, \cdots, p_{eiL}]$ of the sample $x_i$.
4: Combine the spatial probability feature with the spectral probability feature to get the spectral-spatial probability feature $F_{ea}(x_i) = [p_{ei1}, p_{ei2}, \cdots, p_{eiL}, p_{ai1}, p_{ai2}, \cdots, p_{aiL}]$.
5: Classify the sample $x_i$ based on the spectral-spatial probability feature by SVM and obtain the classification result.
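The training-set construction described for Fig. 1 might be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the 100 small images are taken to be a 10 × 10 grid, label 0 is taken to mean "unlabeled background", sampling is done per class within each block, and at least one pixel is drawn whenever a class is present in a block. The name `blockwise_train_mask` and its parameters are hypothetical.

```python
import numpy as np


def blockwise_train_mask(labels, blocks_per_side=10, ratio=0.1, seed=0):
    """Mark training pixels by sampling inside every spatial block.

    labels: 2-D integer ground-truth map; 0 = unlabeled (assumption).
    The map is cut into blocks_per_side x blocks_per_side blocks
    (10 x 10 = 100 blocks; the grid layout is an assumption), and a
    fraction `ratio` of each class is drawn at random in each block.
    """
    rng = np.random.default_rng(seed)
    rows, cols = labels.shape
    row_edges = np.linspace(0, rows, blocks_per_side + 1, dtype=int)
    col_edges = np.linspace(0, cols, blocks_per_side + 1, dtype=int)
    mask = np.zeros(labels.shape, dtype=bool)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            block = labels[r0:r1, c0:c1]
            for cls in np.unique(block):
                if cls == 0:
                    continue  # skip unlabeled pixels
                rr, cc = np.nonzero(block == cls)
                # Draw at least one pixel per class present (assumption).
                n_pick = max(1, int(round(ratio * rr.size)))
                pick = rng.choice(rr.size, size=n_pick, replace=False)
                mask[r0 + rr[pick], c0 + cc[pick]] = True
    return mask
```

Pixels where the mask is true form the training set, and the remaining labeled pixels form the testing set. Sampling inside every block is what guarantees that the spatial classifier sees training coordinates from all regions of the image.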
After the construction of the training sample set and the testing sample set, spatial coordinates and spectral information are utilized separately in the following classification. First, spatial coordinates are used as features to be classified by SVM. Then the probability values of each sample belonging to each category are obtained, which can be regarded as a new feature of the sample. For the sample $x_i$, $i = 1, 2, \cdots, n$, its probability values can be written as a vector as follows:
$$F_a(x_i) = [p_{ai1}, p_{ai2}, \cdots, p_{aik}, \cdots, p_{aiL}] \qquad (1)$$
where $p_{aik}$ is obtained by the SVM classifier based on the spatial coordinates and indicates the probability that the sample $x_i$ belongs to the class k ($k = 1, 2, \cdots, L$). Thus, $F_a(x_i)$ can be treated as the spatial probability feature of $x_i$. Meanwhile, spectral information is also used as features to be classified by SVM. Principal component analysis (PCA) is first performed on the spectral information and a subset of the principal components is extracted as the spectral features. Then the spectral features are classified by SVM and the probability values of each sample belonging to each category are obtained and recorded as the spectral probability feature vector as follows:
$$F_e(x_i) = [p_{ei1}, p_{ei2}, \cdots, p_{eik}, \cdots, p_{eiL}] \qquad (2)$$
where $p_{eik}$ is obtained by the SVM classifier based on the spectral features and indicates the probability that the sample $x_i$ belongs to the class k ($k = 1, 2, \cdots, L$).
In this way, both spectral information and spatial coordinates are converted into probability feature vectors by SVM classifiers, making the fusion possible. The spectral probability feature vector is combined with the spatial probability feature vector to get the spectral-spatial probability feature as shown in formula (3).
$$F_{ea}(x_i) = [p_{ei1}, p_{ei2}, \cdots, p_{eiL}, p_{ai1}, p_{ai2}, \cdots, p_{aiL}] \qquad (3)$$
The final classification result is then obtained through the SVM classification based on the spectral-spatial probability features.
Algorithm 1 shows the steps of hyperspectral image classification based on spectral-spatial feature fusion using spatial coordinates.
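The fusion steps of Algorithm 1 can be sketched with scikit-learn, whose `SVC(probability=True)` fits a sigmoid to the SVM outputs in the spirit of [30]. This is a minimal illustration, not the authors' Libsvm/Matlab implementation: the function name `ssffsc_sketch`, the default hyperparameters, and the choice to fit PCA on all samples are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC


def ssffsc_sketch(spectra, coords, train_idx, y_train, n_components=15, seed=0):
    """Two-stage probability fusion, loosely following Algorithm 1.

    spectra:   (n, d) spectral vectors; coords: (n, 2) pixel coordinates.
    train_idx: indices of the labeled samples; y_train: their labels.
    Hyperparameters are library defaults (assumption), not tuned values.
    """
    # Spectral features: leading principal components of the spectra.
    spec_feat = PCA(n_components=n_components).fit_transform(spectra)

    # One probabilistic RBF-SVM per feature type.
    svm_a = SVC(kernel="rbf", probability=True, random_state=seed)
    svm_e = SVC(kernel="rbf", probability=True, random_state=seed)
    svm_a.fit(coords[train_idx], y_train)
    svm_e.fit(spec_feat[train_idx], y_train)

    F_a = svm_a.predict_proba(coords)      # spatial probabilities, (n, L)
    F_e = svm_e.predict_proba(spec_feat)   # spectral probabilities, (n, L)
    F_ea = np.hstack([F_e, F_a])           # fused feature, (n, 2L)

    # Final SVM trained on the fused probability features.
    svm_ea = SVC(kernel="rbf").fit(F_ea[train_idx], y_train)
    return svm_ea.predict(F_ea), F_ea
```

Because both halves of the fused feature are probabilities in [0, 1], the scale mismatch between raw coordinates and raw spectral values no longer affects the final classifier.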

B. COMBINED WITH ACTIVE LEARNING
The classification result of SVM based on spatial coordinates is dependent on the division of images. In this section, another version of classification using the active learning method is proposed to select the samples with high uncertainties corresponding to the classifiers, where the division of the image is no longer needed. All these selected samples will be labeled manually to retrain the classifiers, which can improve the classifiers' performance rapidly. Based on this idea, we propose a sampling scheme which selects samples with high uncertainties corresponding to spectral classifier and spatial classifier, where spectral classifier means SVM classifier using spectral information as features and spatial classifier means SVM classifier using spatial coordinates as features. Finally, the image is classified based on the spectral-spatial probability.
After the SVM classification is performed, in addition to the prediction labels of all the samples, the probability value of each sample belonging to the class corresponding to the prediction label can also be obtained. We define the probability value of sample $x_i$ corresponding to the spectral classifier's prediction label as $P_e(x_i)$ and the probability value corresponding to the spatial classifier's prediction label as $P_a(x_i)$, where $x_i \in X$, $i = 1, 2, \cdots, n$.
Calculate the sampling measurement value W of all samples according to (4), where α is the sampling weight parameter, and $P_e(x_i)$ and $P_a(x_i)$ are the probability values that the sample $x_i$ is judged as the corresponding class by the spectral classifier and the spatial classifier, respectively. They also indicate the uncertainty of the sample. The smaller the two values are, the higher the uncertainty of the sample is. In order to simultaneously consider the uncertainty of the sample using these two features, these two probability values are combined with the sampling weight parameter α to calculate the sampling measurement value W. So the smaller the W of a sample is, the higher the uncertainty of the sample is. Each time we select samples in active learning, we should select the samples with the smallest values of W in the testing sample set for manual labeling. Algorithm 2 and Fig. 2 show the implementation steps and the flow chart of SSFFSC-AL. Unlike SSFFSC, SSFFSC-AL no longer needs to obtain the training sample set by dividing the HSI but selects the training samples adaptively through the sampling scheme of AL.

Algorithm 2 SSFFSC-AL Algorithm
Input: Hyperspectral image data set $X = \{x_1, x_2, x_3, \cdots, x_n\}$, which contains n samples of L categories. The spectral dimension of each sample is d.
Output: Classification result of the hyperspectral image $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \hat{y}_3, \cdots, \hat{y}_n\}$.
1: Select a part of the samples randomly in each class to constitute the initial training sample set. All the remaining samples constitute the testing sample set.
2: Perform PCA on the spectral information and extract the first m principal components as spectral features.
3: Classify the samples based on spatial coordinates by SVM to obtain the classification result and the probability value $P_a$ corresponding to the result.
4: Classify the samples based on spectral information by SVM to obtain the classification result and the probability value $P_e$ corresponding to the result.
5: Calculate the sampling measurement value W according to (4) using $P_a$ and $P_e$.
6: Select some samples with the smallest W in the testing sample set and manually label these samples. These samples are removed from the testing sample set and added to the training sample set.
7: Go to step 3 if the termination condition of AL (the preset maximal number of training samples is reached) is not met.
8: Obtain the spatial probability feature $F_a(x_i)$ of $x_i$.
9: Obtain the spectral probability feature $F_e(x_i)$ of $x_i$.
10: Combine the spatial probability feature with the spectral probability feature to get the spectral-spatial probability feature $F_{ea}(x_i) = [p_{ei1}, p_{ei2}, \cdots, p_{eiL}, p_{ai1}, p_{ai2}, \cdots, p_{aiL}]$.
11: Classify the sample $x_i$ based on the spectral-spatial probability feature by SVM and obtain the classification result $\hat{y}_i = \mathrm{SVM}(F_{ea}(x_i))$.
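Steps 5 and 6 of Algorithm 2 might be sketched as below. Equation (4) is not reproduced in this text, so the weighted sum W = α·P_e + (1 − α)·P_a used here is an assumption chosen only to illustrate "combining the two probabilities with a weight α"; which term α multiplies may differ in the original formula. The function name `select_uncertain` and its parameters are likewise hypothetical.

```python
import numpy as np


def select_uncertain(P_e, P_a, candidate_idx, alpha=0.3, batch_size=50):
    """Pick the candidates with the smallest sampling measurement W.

    P_e, P_a:      per-sample probability of the predicted class from the
                   spectral and the spatial classifier, respectively.
    candidate_idx: indices of samples still in the testing (unlabeled) pool.
    ASSUMPTION: W = alpha * P_e + (1 - alpha) * P_a stands in for Eq. (4).
    """
    candidate_idx = np.asarray(candidate_idx)
    W = alpha * P_e[candidate_idx] + (1.0 - alpha) * P_a[candidate_idx]
    # Smallest W = least confident under both classifiers = most uncertain.
    order = np.argsort(W, kind="stable")[:batch_size]
    return candidate_idx[order]
```

The returned indices are the samples that would be sent for manual labeling and moved from the testing set to the training set before both classifiers are retrained.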

III. EXPERIMENTAL RESULTS AND DISCUSSION
To verify the effectiveness of the proposed method which uses spatial information, we compare the proposed SSFFSC with PSVM (spectral classification using PCA and SVM), SCSVM (spatial classification using spatial coordinates and SVM), FODPSO-SVM (SVM classification combined with segmentation maps obtained by the fractional-order Darwinian particle swarm optimization algorithm) [14], and PLS-AR-SVM (spectral-spatial classification that uses partial least squares to reduce the dimension of the hyperspectral image and uses the information in the neighborhood of each sample to correct the sample's features) [16]. PSVM only uses spectral information for classification and SCSVM only uses spatial coordinates for classification. FODPSO-SVM and PLS-AR-SVM use both spectral and spatial information to classify the samples.
Furthermore, to verify the effectiveness of the proposed SSFFSC-AL, we compare it with some methods which also use active learning. These methods are Random-SVM-AL (hyperspectral image classification using active learning based on random sampling), MS-SVM-AL (hyperspectral image classification using active learning based on margin sampling), MCLU-SVM-AL (hyperspectral image classification using active learning based on multiclass-level uncertainty sampling) and MPM-LBP-AL (hyperspectral classification using loopy belief propagation and AL, where LBP is used to solve the maximum a posteriori marginal problem to find the sample labels) [25]. Among them, Random-SVM-AL, MS-SVM-AL and MCLU-SVM-AL only use spectral information during the classification, while MPM-LBP-AL and SSFFSC-AL use both spectral information and spatial information. Random-SVM-AL, MS-SVM-AL and MCLU-SVM-AL were run with the AL Toolbox [31].
All experiments are performed on a personal computer with an i5-3210M central processing unit, 8 GB of memory, and 64-bit Windows 7, using Matlab R2008a and Python 3.6.0. Libsvm [32] is adopted to implement SVM using OAOSVMs [33]. The overall accuracy (OA), average accuracy (AA), kappa coefficient (Kappa) [34] and classification accuracy of each class are employed to evaluate the classification performance. OA is the overall accuracy for all classes and is defined as follows:
$$\mathrm{OA} = \frac{\left|\{x^{(i)}_{\mathrm{test,correct}}\}\right|}{N}$$
where $x^{(i)}_{\mathrm{test,correct}}$ is the i-th correctly classified test sample, so that the numerator counts the correctly classified test samples, and N is the total number of test samples. AA represents the averaged accuracy of each class and is defined as:
$$\mathrm{AA} = \frac{1}{M}\sum_{i=1}^{M}\frac{\left|\{x^{(j)}_{i,\mathrm{correct}}\}\right|}{N_i}$$
where M is the number of classes in a data set, $N_i$ is the total number of test samples of the i-th class, and $x^{(j)}_{i,\mathrm{correct}}$ is the j-th correctly classified test sample of class i. The Kappa coefficient [34] is a statistical measure of the degree of agreement. The higher these metrics are, the better the classification performance is.
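The three summary metrics can be computed from a confusion matrix, as in the following sketch. It assumes integer labels 0, …, M − 1, that every class appears among the test samples, and that Kappa is Cohen's kappa in its usual form; the function name `classification_scores` is hypothetical.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, Kappa) for integer labels in 0..n_classes-1.

    Assumes every class occurs at least once in y_true, so that the
    per-class accuracy is well defined.
    """
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # rows = truth
    n = cm.sum()
    oa = np.trace(cm) / n                      # correct / total
    per_class = np.diag(cm) / cm.sum(axis=1)   # accuracy of each class
    aa = per_class.mean()
    # Chance agreement from the row and column marginals.
    p_chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (oa - p_chance) / (1.0 - p_chance)
    return oa, aa, kappa
```

OA weights every test sample equally, whereas AA weights every class equally, which is why small classes influence AA much more than OA.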
The remainder of this section is organized as follows. Section III-A introduces the data sets, which comprise the AVIRIS Indian Pines and ROSIS Pavia University data sets, two widely used benchmarks for hyperspectral image classification. Section III-B provides the comparison experiments of SSFFSC and four other algorithms. Section III-C shows the comparison experiments of SSFFSC-AL and four other algorithms using AL. Finally, Section III-D conducts experiments to analyze the parameters.

A. HYPERSPECTRAL DATA SETS
In this paper, two frequently-used hyperspectral data sets, Indian Pines data set and Pavia University data set, are tested to validate the effectiveness of the proposed algorithms.
1) Indian Pines data set. This hyperspectral image data set was collected by the AVIRIS sensor over the Indian Pines region in Northwestern Indiana in 1992. This scene has the size of 145 × 145 and the spatial resolution of 20 meters per pixel. The scene comprises 220 spectral channels in the wavelength ranging from 0.4 to 2.5µm. After several spectral bands are removed due to noise and water absorption phenomena, the remaining 200 radiance channels are used in the experiments. The Indian Pines data set contains 10366 samples with 16 mutually exclusive ground-truth classes. Fig. 3 (a) shows the grayscale map of the 170th channel in the 200 channels, while Fig. 3 (b) shows the ground-truth map available for the scene. Table 1 shows the meaning of each class and the number of samples in each class in Indian Pines data set.
2) Pavia University data set. This hyperspectral data set was collected by the ROSIS optical sensor over the urban area of the University of Pavia, Italy. The image size is 610 × 340, with very high spatial resolution of 1.3 meters per pixel. The number of data channels in this image is 115, with a spectral range from 0.43 to 0.86 µm. After 12 channels affected by noise are removed, the remaining 103 channels are used in the experiments. Pavia University data set contains 42776 samples with 9 classes. Fig. 3 (c) shows the grayscale map of the 60th channel in the remaining 103 channels, while Fig. 3 (d) shows the ground-truth map available for the scene. Table 2 shows the meaning of each class and the number of samples in each class in Pavia University data set.
VOLUME 8, 2020

B. COMPARISON OF SSFFSC WITH FOUR OTHER ALGORITHMS
In the experiments comparing SSFFSC with other algorithms, the training samples and the testing samples are selected according to the method described in Fig. 1. The Indian Pines data set has only 26 and 20 samples in categories 7 and 9, whereas the Pavia University data set has no category with a particularly small sample size. Taking into account the different characteristics of the two data sets, we select training samples as follows, in accordance with convention. For the Indian Pines data set, 10% of the samples of each class are selected from each segmented small data set as training samples. The total
The kernel function of SVM in all algorithms is the radial basis function (RBF) kernel, which involves two parameters: the penalty coefficient C and the kernel parameter γ. Both are obtained by 5-fold cross validation. To make the experimental results objective and fair, each algorithm is run 10 times with different training samples and testing samples, and the average results are compared. In every run, all methods (SSFFSC and the four other algorithms) use the same training sample set and testing sample set. TABLE 3 and TABLE 4 show the results of the experiments on Indian Pines and Pavia University. Fig. 4 and Fig. 5 provide the classification maps of the two data sets, respectively.
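Selecting C and γ by 5-fold cross validation could look like the following scikit-learn sketch. The candidate grids are assumptions made for illustration, since the search ranges are not given in the text, and the function name `tune_rbf_svm` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def tune_rbf_svm(features, labels):
    """Grid-search C and gamma of an RBF-SVM with 5-fold cross validation.

    The candidate values below are illustrative assumptions only.
    """
    grid = {"C": [1, 10, 100, 1000], "gamma": [1e-3, 1e-2, 1e-1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

The same routine would be applied separately to the coordinate features, the spectral features, and the fused probability features, because their scales differ and the best (C, γ) pair need not coincide.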
From TABLE 3 and TABLE 4, it can be found that PSVM, which only uses spectral information, is inferior to the other four algorithms in terms of OA, AA and Kappa. Unlike PSVM, FODPSO-SVM, PLS-AR-SVM and SSFFSC all introduce spatial information in classification. These three algorithms achieve better classification results than PSVM, indicating that the introduction of spatial information can effectively improve the classification accuracy. It can also be observed from Fig. 4 and Fig. 5 that the results of these three algorithms have better consistency in local regions and fewer isolated points. Comparing SCSVM with the others, it can be found that SCSVM, which only makes use of spatial coordinates, can obtain a good classification result. SCSVM performs much better than PSVM on both data sets, and it even performs better than FODPSO-SVM and PLS-AR-SVM on the Indian Pines data set. However, on the Pavia University data set, the performance of SCSVM degrades, and its accuracy is lower than that of FODPSO-SVM and PLS-AR-SVM. Why does SCSVM perform differently on the two data sets? The reason can be found by comparing the ground-truth maps of Indian Pines and Pavia University. In the map of Indian Pines, the samples of the same class are generally concentrated in several blocks, which suits classification based on spatial coordinates, while in the map of Pavia University, the samples of the same class may be scattered across different regions, which makes them difficult to classify by spatial coordinates. From TABLE 4, it can be seen that samples of Trees (#4) and Shadows (#9), which are scattered, are classified by SCSVM with low accuracy. In contrast, PSVM obtains good results on these two classes. But PSVM performs poorly on most other classes (#1, 3, 6, 7, 8), while SCSVM can obtain reasonable results on these classes.
As for SSFFSC that combines spectral information with spatial coordinates, it achieves relatively good classification results for each class on both data sets, which validates the effectiveness of the fusion of the spectral information and spatial coordinates.
As for the three algorithms using both spectral and spatial information, the OA, AA and Kappa of SSFFSC are larger than those of FODPSO-SVM and PLS-AR-SVM on both data sets. This indicates that the proposed way of using spatial information in classification can effectively improve the classification accuracy. From Fig. 4 and Fig. 5, it can be seen that, compared with FODPSO-SVM and PLS-AR-SVM, SSFFSC obtains a better classification result in the block-shaped areas of the image and there are fewer misclassified samples in the local regions. This indicates that, compared with FODPSO-SVM using segmentation and PLS-AR-SVM using the anisotropic diffusion regularization method, SSFFSC making use of spatial coordinates performs better, effectively reducing misclassified samples and obtaining higher classification accuracy with better regional consistency. From TABLE 3 and TABLE 4, it can be seen that SSFFSC obtains relatively good classification results for each class. For OA, AA and Kappa, the proposed method obtains the best result, and it has obvious advantages over the others, especially on AA and Kappa.
Comparing the running time of FODPSO-SVM and PLS-AR-SVM with that of SSFFSC, it can be found from TABLE 3 and TABLE 4 that SSFFSC takes the least time, which is reasonable considering that SSFFSC combines PSVM and SCSVM, both of which are simple and take little time.
On the whole, it can be found that SSFFSC is superior to the four other algorithms on both data sets, which demonstrates that spatial coordinates are very useful features for classification. When combined with spectral information, spatial coordinates improve the classification result further.
Because the performance of SSFFSC depends on how the image is divided, active learning is a reasonable way to make SSFFSC perform better; its effect is discussed in Section C below.

C. COMPARISON OF SSFFSC-AL WITH FOUR OTHER ALGORITHMS USING ACTIVE LEARNING
In the experiment comparing SSFFSC-AL with four other algorithms using active learning, the training and testing samples are selected as follows. On the Indian Pines data set, 160 samples are selected initially, 10 from each class. During active learning, 50 unlabeled samples are chosen adaptively by AL's sampling scheme in each iteration to be labeled manually. The 50 samples selected each time may come from various classes, and a total of 900 samples are selected during the iterations. Therefore, 1060 samples in total are used as training samples. On the Pavia University data set, 90 samples are selected initially, 10 from each class. During active learning, 10 unlabeled samples are chosen in each iteration to be labeled manually. The number of manually labeled samples is 250, and the total number of training samples is 340. The SVMs used in the algorithms use the RBF kernel, and their parameters are set empirically. The sampling weight parameter α is set to 0.3. The first 15 principal components extracted by PCA are used as spectral features. TABLE 5 shows the classification results and running time of the five methods on the Indian Pines data set; Fig. 6 shows the overall accuracy curves and Fig. 7 shows the classification maps. TABLE 6 shows the classification results and running time of the five methods on the Pavia University data set; Fig. 8 shows the overall accuracy curves and Fig. 9 shows the classification maps.
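The sample counts above imply the number of classes and of active learning iterations on each data set. The short sketch below just makes that arithmetic explicit; the class and attribute names are hypothetical, and the class counts (16 and 9) are inferred from "160 samples, 10 per class" and "90 samples, 10 per class".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveLearningBudget:
    n_classes: int       # number of classes in the data set
    init_per_class: int  # labeled samples per class before AL starts
    batch_size: int      # samples queried in each AL iteration
    n_queried: int       # samples labeled manually during AL in total

    @property
    def n_initial(self) -> int:
        return self.n_classes * self.init_per_class

    @property
    def n_iterations(self) -> int:
        if self.n_queried % self.batch_size != 0:
            raise ValueError("n_queried must be a multiple of batch_size")
        return self.n_queried // self.batch_size

    @property
    def n_total(self) -> int:
        return self.n_initial + self.n_queried

    def training_sizes(self) -> list:
        """Training-set size before AL and after each AL iteration."""
        return [self.n_initial + i * self.batch_size
                for i in range(self.n_iterations + 1)]


indian_pines = ActiveLearningBudget(
    n_classes=16, init_per_class=10, batch_size=50, n_queried=900)
pavia_university = ActiveLearningBudget(
    n_classes=9, init_per_class=10, batch_size=10, n_queried=250)

for name, budget in [("Indian Pines", indian_pines),
                     ("Pavia University", pavia_university)]:
    print(name, budget.n_initial, budget.n_iterations, budget.n_total)
```

Under these settings, Indian Pines goes through 18 iterations (160 to 1060 samples) and Pavia University through 25 iterations (90 to 340 samples).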
From Fig. 6 and Fig. 8, it can be seen that SSFFSC-AL is superior to Random-SVM-AL, MS-SVM-AL and MCLU-SVM-AL. These three methods use only spectral information for classification and do not introduce spatial information, so their results are not as good. The proposed SSFFSC-AL uses spatial coordinates as spatial features and combines them with spectral features for classification, which gives its classification result an obvious advantage. According to TABLE 5 and TABLE 6, the running times of these three methods are longer than that of SSFFSC-AL. However, the complexity of Random-SVM-AL should be lower than that of SSFFSC-AL; the longer running time is due to the AL Toolbox. So we only compare the running time of SSFFSC-AL with that of MPM-LBP-AL here.
Comparing SSFFSC-AL with MPM-LBP-AL on the Indian Pines data set, it can be observed from Fig. 6 that SSFFSC-AL performs better than MPM-LBP-AL. The advantage of SSFFSC-AL is particularly obvious when the number of training samples is small. When the number of training samples is 610, the accuracy of SSFFSC-AL reaches 99.58%, while the accuracy of MPM-LBP-AL is only 94.10%, which is much lower. After a large number of training samples are selected, both methods eventually achieve good results. According to TABLE 5 and Fig. 7, the overall accuracy of MPM-LBP-AL is 98.56% and the overall accuracy of SSFFSC-AL is 100%. SSFFSC-AL also requires less time: it needs 93.6 seconds, while MPM-LBP-AL needs 2538.9 seconds.
Comparing SSFFSC-AL with MPM-LBP-AL on the Pavia University data set, SSFFSC-AL performs worse than MPM-LBP-AL. When the number of training samples is small, the accuracy of SSFFSC-AL is obviously lower than that of MPM-LBP-AL. When the number of training samples is large, the final results of both algorithms reach a high level. According to TABLE 6 and Fig. 9, the final overall accuracy of MPM-LBP-AL is 99.40% and that of SSFFSC-AL is 98.43%, so the accuracy of the proposed method is slightly lower than that of MPM-LBP-AL. But compared with the other three methods, SSFFSC-AL has an obvious advantage in overall accuracy.
Comparing the overall accuracy curves of SSFFSC-AL and MPM-LBP-AL on the two data sets, it can be seen that SSFFSC-AL has an obvious advantage on the Indian Pines data set but performs worse than MPM-LBP-AL on the Pavia University data set. SSFFSC-AL classifies samples mainly by spectral-spatial feature fusion: the spectral features are combined with the spatial features (spatial coordinates) to obtain the final classification, so improving the coordinate-based result can effectively improve the overall accuracy. However, the spatial coordinates of samples are related to the spatial distribution of samples. In the Indian Pines image, the samples of the same class are generally concentrated in several blocks, which favors classification based on coordinate features. In Pavia University, by contrast, there are many small and disorderly areas and spots, and classification based on coordinate features cannot reach its full advantage. Nevertheless, SSFFSC-AL still obtains a result close to that of MPM-LBP-AL. In addition, SSFFSC-AL has a great advantage in time. According to TABLE 6, SSFFSC-AL requires 96.1 seconds for classification on the Pavia University data set, while MPM-LBP-AL needs 30020 seconds.
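Taking the running times quoted above at face value, the time advantage of SSFFSC-AL over MPM-LBP-AL can be expressed as a simple ratio (rounded to one decimal place; the symbol $t$ for running time is introduced here only for illustration):

```latex
\left.\frac{t_{\text{MPM-LBP-AL}}}{t_{\text{SSFFSC-AL}}}\right|_{\text{Indian Pines}}
  = \frac{2538.9\ \text{s}}{93.6\ \text{s}} \approx 27.1,
\qquad
\left.\frac{t_{\text{MPM-LBP-AL}}}{t_{\text{SSFFSC-AL}}}\right|_{\text{Pavia University}}
  = \frac{30020\ \text{s}}{96.1\ \text{s}} \approx 312.4
```

That is, roughly a 27-fold reduction in running time on Indian Pines and a 312-fold reduction on Pavia University, at a cost of 0.97 percentage points of overall accuracy on the latter (99.40% versus 98.43%).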

D. PARAMETER SENSITIVITY ANALYSIS
1) NUMBER OF DIVISIONS USED IN SSFFSC
As mentioned above, the good performance of SSFFSC mainly benefits from the use of coordinates, which are related to the spatial distribution of samples. Therefore, the number of divisions of the HSI is closely related to the performance of SSFFSC. To verify the impact of the number of divisions on SSFFSC, SSFFSC was run on both data sets with 4×4, 5×5, ..., 16×16 divisions. Fig. 10 shows the overall accuracy versus the number of divisions of the HSI. From the experimental curves in Fig. 10, it can be seen that OA rises gradually while the number of divisions is smaller than 100; when the number of divisions is greater than 100, the OA curve flattens for Indian Pines and even descends for Pavia University. Therefore, 100 divisions (10×10) is a reasonable choice and is adopted in this paper.
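To make the notion of "number of divisions" concrete, the sketch below splits an image into a k×k grid and draws a few random pixels from each block, so that k = 10 gives the 100 divisions adopted here. It is only an illustration under stated assumptions: the function names, the 145×145 image size, and the fixed number of pixels per block are hypothetical, and class labels are ignored for simplicity.

```python
import random


def block_bounds(length, k):
    """Split range(length) into k contiguous, near-equal intervals."""
    return [(i * length // k, (i + 1) * length // k) for i in range(k)]


def sample_training_pixels(height, width, k, per_block, seed=0):
    """Randomly pick up to `per_block` pixel coordinates from each of the
    k x k blocks of a height x width image."""
    rng = random.Random(seed)
    chosen = []
    for r0, r1 in block_bounds(height, k):
        for c0, c1 in block_bounds(width, k):
            pixels = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
            chosen.extend(rng.sample(pixels, min(per_block, len(pixels))))
    return chosen


# Grid settings swept in the sensitivity experiment: 4x4 ... 16x16.
division_counts = [k * k for k in range(4, 17)]
print(division_counts)

# Hypothetical 145 x 145 image, 10 x 10 grid, 5 pixels per block.
pixels = sample_training_pixels(145, 145, k=10, per_block=5, seed=1)
print(len(pixels))
```

Sampling per block spreads the training coordinates over the whole image, which is what lets a coordinate-based classifier cover every region. With too few blocks the coverage is coarse, while with many blocks each block contributes very few pixels for a fixed budget.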

2) WEIGHT PARAMETER OF SSFFSC-AL
To verify the influence of the sampling weight parameter α, SSFFSC-AL is run with 310, 460, 610, 760, 910 and 1060 training samples and different values of α on the Indian Pines data set. Fig. 11 shows the results of this experiment. Similarly, on the Pavia University data set, SSFFSC-AL is run with 140, 190, 240 and 340 training samples and different values of α, and Fig. 12 shows the results.
On the Indian Pines data set, the influence of the sampling weight parameter α on the classification accuracy is approximately the same in these six cases, and α has a very important influence on the final overall accuracy. When α is 0, the algorithm considers only the uncertainty of samples with respect to the classifier using spatial coordinates. Although 100% overall accuracy is obtained when α is 0 and the number of training samples is 1060, the overall accuracy for α = 0 drops more sharply than in other cases as the number of training samples decreases. When α is 1, the method considers only the classifier based on spectral features when selecting samples. The selected samples cannot enhance the performance of the classifier based on spatial coordinates, so the final overall accuracy is lower than in most other cases. According to Fig. 11, when the number of training samples is 1060, SSFFSC-AL obtains a good result when α is between 0 and 0.5.
On the Pavia University data set, the sampling weight parameter α also has a great influence on the final overall accuracy. When α is 0 or 1, the accuracy of the proposed method is low, because the method then considers only the classifier based on spatial features or only the classifier based on spectral features when selecting samples. Samples selected by considering only one classifier cannot enhance the performance of the other classifier. Therefore, the final overall accuracy of SSFFSC-AL in these cases is low. According to Fig. 12, when the number of training samples is 340, SSFFSC-AL obtains a good result for α equal to 0.3, 0.5 and 0.6.
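The role of α described above can be sketched as a weighted combination of two per-sample uncertainty scores, one from the spectral classifier and one from the spatial-coordinate classifier. This is only an illustrative sketch: the margin-based uncertainty measure and the function names are assumptions made here, and the exact criterion used by SSFFSC-AL is the one defined in the method description of the paper.

```python
def margin_uncertainty(probs):
    """Uncertainty of one sample: 1 minus the gap between its two largest
    class probabilities (assumed measure; requires at least two classes)."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def select_queries(spectral_probs, spatial_probs, alpha, batch_size):
    """Return indices of the `batch_size` most uncertain unlabeled samples.

    alpha = 1 uses only the spectral classifier, alpha = 0 only the
    classifier based on spatial coordinates.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scored = []
    for idx, (p_spec, p_spat) in enumerate(zip(spectral_probs, spatial_probs)):
        score = (alpha * margin_uncertainty(p_spec)
                 + (1.0 - alpha) * margin_uncertainty(p_spat))
        scored.append((score, idx))
    # Highest combined uncertainty first; ties broken by sample index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:batch_size]]


# Toy class-probability outputs for three unlabeled samples (hypothetical).
spectral = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.50, 0.30, 0.20]]
spatial = [[0.34, 0.33, 0.33], [0.80, 0.10, 0.10], [0.60, 0.30, 0.10]]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, select_queries(spectral, spatial, alpha, batch_size=1))
```

In this toy example each extreme value of α picks a sample that is uncertain for only one of the two classifiers, whereas an intermediate value picks the sample that is moderately uncertain for both, which mirrors the explanation given for the low accuracy at α = 0 and α = 1.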

IV. CONCLUSION AND FUTURE RESEARCH LINES
In this paper, a hyperspectral image classification method based on spatial coordinate information, named SSFFSC, is proposed. Considering that common methods for introducing spatial information are complicated and time consuming, SSFFSC classifies the samples based on spatial coordinates in a simple supervised way. In addition, active learning is introduced into SSFFSC to improve the performance further, yielding SSFFSC-AL. Compared with other methods using spectral-spatial information, SSFFSC more effectively suppresses the ''salt and pepper'' phenomenon by combining spatial information with spectral information, and it obtains better classification results in less time. Compared with other methods using active learning, SSFFSC-AL obtains higher classification accuracy on images where the samples of the same class are generally concentrated in several blocks. On images where the samples of the same class are scattered across different regions, its performance may degrade slightly, but it still obtains satisfactory classification results. Moreover, the running time of SSFFSC-AL is significantly less than that of other methods.
In the sampling scheme used in SSFFSC-AL, the sampling weight parameter α is fixed: it does not change as the number of training samples increases. In the future, we will make the sampling weight parameter adaptive, so that it changes with the training sample set and thus improves the performance of the classifiers more effectively.
YI LIU received the bachelor's degree in information engineering, the master's degree in circuit and system, and the Ph.D. degree in communication and information system from Xidian University, Xi'an, China, in 1999, 2002, and 2013, respectively. Since 2002, he has been a Teacher with Xidian University. His research interests include computational intelligence and image processing.
YIJIN LIU received the B.S. degree in electronic information science and technology from Xidian University, Xi'an, China, in 2018, where she is currently pursuing the master's degree with the School of Artificial Intelligence. Her research interests include machine learning and hyperspectral image processing. VOLUME 8, 2020