Introduction
Hyperspectral images (HSI) contain rich spectral and spatial information, which makes them widely used in crop monitoring, environmental monitoring, mineral exploration, and other fields [1]–[4]. HSI classification is one of the basic and key technologies of remote sensing for earth surface observation. It aims to infer the class of each pixel based on the spectral and spatial information of the HSI [5]–[7]. Early-stage methods for HSI classification are mostly based on conventional pattern recognition techniques, such as the K-nearest neighbor classifier [8], support vector machine (SVM) [9], random forest [10], and decision tree [11]. In addition, extreme learning machines [12], sparse representation [13], and graph embedding methods [14] have also been used for HSI classification. However, most early-stage HSI classification methods only explored the role of spectral information in classification, and therefore could not obtain high classification accuracy [15]. Since neighboring pixels in an HSI usually carry rich spatial information, many spectral-spatial classification methods have been proposed that exploit the spatial information of HSI to obtain higher classification accuracy. For instance, some researchers applied spatial information to HSI classification via extended morphological profiles and thus achieved satisfactory classification accuracy [16], [17]. The spectral and spatial information contained in the neighborhood of each pixel was merged into the sparse representation model in [18] and [19]. Tu et al. [20] proposed a spectral-spatial HSI classification method, which exploited the comprehensive contextual information of HSI under the weak assumption that the pixels in a superpixel belong to the same class, and achieved excellent classification performance. Sellami et al. [21] proposed an HSI classification approach that makes full use of the spectral-spatial information by automatically selecting relevant spectral bands.
Compared with traditional machine learning algorithms, deep learning techniques can automatically extract high-level and compact features from input data, and in recent years they have been successfully applied to HSI classification tasks. Chen et al. [22] used stacked autoencoders to extract the features of HSI and fed them into a logistic regression model for classification. Liu et al. [23] first used a deep belief network to extract deep spectral features, and then repeatedly selected good-quality labeled samples as training samples with active learning algorithms. Zhang et al. [24] proposed an HSI classification algorithm based on the convolutional neural network (CNN), which utilized diverse region-based inputs to learn discriminative spectral-spatial features. Chen et al. [25] used 1-D, 2-D, and 3-D CNNs to extract features of HSI, respectively. Kong et al. [26] extracted the spectral features of HSI by constructing intraclass and interclass hypergraphs, and extracted spatial features by CNN. Zhu et al. [27] adopted generative adversarial networks to construct a semisupervised feature learning framework for HSI classification. Mou et al. [28] applied the recurrent neural network (RNN) to HSI classification for the first time, and proposed a parametric modified tanh activation function to replace the traditional one.
The impressive feature representation capability of deep learning relies on abundant labeled samples. However, collecting labeled HSI data is difficult and expensive [29]. Therefore, how to learn a classifier with strong generalization at low labeling cost has become a research hotspot in the field of HSI analysis. To address this concern, many methods have been proposed, which fall into four categories. The first one is data augmentation, which synthesizes new examples following the original data distribution [30]. Li et al. [31] constructed a new training set for CNN by using pairwise labeled samples and exploited it to improve the classification accuracy of the model. Wang et al. [32] established a data mixture model to augment the labeled training set quadratically and exploited this set to train the CNN. The second category is domain adaptation, which uses sufficient samples from different but similar domains to solve the problems of another domain [33]. Zhou and Prasad [34] first used deep convolutional RNNs to extract discriminative features for two domains, and then aligned the features with each other layer by layer in common subspaces, thus realizing the classification of HSIs with different distributions by exploiting only part of the labeled samples in the source domain. The third one is active learning, which exploits a small number of labeled samples to train a classifier that actively selects representative unlabeled samples [35]. The fourth category is semisupervised learning, which utilizes abundant unlabeled data and limited labeled samples for classification. Wu and Prasad [36] proposed a semisupervised deep learning network, which effectively alleviated the shortage of labeled samples by combining limited labeled samples with abundant unlabeled samples for HSI classification.
The broad learning system (BLS) is a random vector functional link neural network (RVFLNN) consisting of only three parts: mapped features (MF), enhancement nodes (EN), and the output layer [37]. Compared with deep learning, BLS has the following advantages [37]: 1) BLS can nonlinearly expand the features; 2) BLS has a simple and flexible structure with only three layers; 3) deep learning methods are trained by gradient descent, which requires many iterations, whereas BLS exploits ridge regression to directly calculate the network weights, so the network training is fast; and 4) BLS is easy to integrate with other models. Feng and Chen [38] proposed a fuzzy BLS by combining the Takagi–Sugeno fuzzy system with BLS, which achieved ideal accuracy in regression and classification. Chu et al. [39] proposed a weighted BLS, in which the contribution of each input sample to the BLS was constrained by penalty factors. Kong et al. [40] proposed a semisupervised model by merging the class-probability structure into BLS and achieved good classification performance on HSI. Kong et al. [41] proposed an HSI clustering algorithm based on BLS, and exploited a graph-regularized sparse autoencoder to fine-tune the weights of MF and EN.
As a recent achievement of deep learning, the graph convolutional network (GCN) can aggregate and transform the feature information of the neighbors of each node. Besides, GCN is able to encode the features of graph nodes and the local graph structure through convolutional layers, so as to exhaustively exploit the graph features and flexibly preserve the class boundaries [42]. However, the original GCN only utilizes the spectral information when classifying HSI, that is, it only constructs a spectral adjacency matrix. Qin et al. [43] considered the data structure characteristics of HSI and the advantages of GCN, and completed HSI classification by constructing a spectral-spatial adjacency matrix that uses both spectral and spatial information. Therefore, following [43], we first use GCN to extract the spectral-spatial features of HSI. Then, a combinatorial average method (CAM) is used to expand the data with the spectral-spatial features extracted by GCN. Considering the flexible network structure and the feature broad expansion ability of BLS, a semisupervised graph convolutional broad network (GCBN) is proposed. The main contributions of our work are summarized as follows.
We replace the linear mapped features used in the traditional BLS with the spectral-spatial features extracted from the original HSI by GCN, which enables accurate HSI classification at low labeling cost by exploiting limited labeled samples and abundant unlabeled samples.
In the proposed CAM, some valuable paired samples are selected in a targeted manner and averaged in pairs to generate a sample expansion set much larger than the original training set, which solves the problem that the labeled samples are too few to support the training of a high-precision classification model.
We exploit BLS to perform broad expansion on the spectral-spatial features extracted by GCN and expanded by CAM, which helps further enhance the representation ability of the features and thus improves the classification accuracy of HSI.
The rest of this article is organized as follows. Section II elaborates the semisupervised HSI classification method based on GCBN. Section III presents and analyzes the experimental results on three real HSI datasets, followed by the conclusion in Section IV.
Semisupervised Classification of HSI Based on GCBN
A. Flowchart of GCBN for HSI Classification
The flowchart of the proposed GCBN for HSI classification is shown in Fig. 1, which mainly contains the following five steps:
The principal component analysis (PCA) is applied to the original HSI to reduce dimensionality;
The spectral-spatial graph, constructed from the spectral and spatial information of the limited labeled samples and abundant unlabeled samples, is used for the graph convolution operation. Then, the discriminative spectral-spatial features of HSI are extracted by the trained GCN;
In our proposed CAM, some valuable paired samples are selected in a targeted manner, and averaged in pairs to generate a sample expansion set for GCBN training;
BLS is used to expand the width of spectral-spatial features extracted by GCN and extended by CAM;
The output layer weights can be calculated with the ridge regression theory.
B. Feature Extraction Based on GCN
Since there is redundant information in the original HSI bands, directly feeding the original HSI into the GCN would cause a dramatic increase in the network parameters and affect the classification performance of GCN. Therefore, PCA is first used to reduce the dimensionality of the original HSI data. Based on spectral graph theory, the graph convolution of a signal x with a filter g_θ is defined as
\begin{equation*}
g_{\theta } \star x=\boldsymbol{U} g_{\theta } \boldsymbol{U}^{\mathrm{T}} x \tag{1}
\end{equation*}
where U is the matrix of eigenvectors of the normalized graph Laplacian L, and the filter g_θ can be regarded as a function of the diagonal eigenvalue matrix Λ. Since the eigendecomposition is computationally expensive, g_θ(Λ) is approximated by a truncated expansion of Chebyshev polynomials T_k up to the Kth order
\begin{equation*}
g_{\theta ^{\prime }}(\boldsymbol{\Lambda }) \approx \sum _{k=0}^{K} \theta _{k}^{\prime } T_{k}(\tilde{\boldsymbol{\Lambda }}) \tag{2}
\end{equation*}
where θ'_k are the Chebyshev coefficients and Λ̃ = 2Λ/λ_max − I_N, with λ_max being the largest eigenvalue of L. The graph convolution can then be written as
\begin{equation*}
g_{\theta ^{\prime }} \star x \approx \sum _{k=0}^{K} \theta _{k}^{\prime } T_{k}(\tilde{\boldsymbol{L}}) x \tag{3}
\end{equation*}
where L̃ = 2L/λ_max − I_N. Limiting the expansion to K = 1 and approximating λ_max ≈ 2, the graph convolution is simplified as
\begin{equation*}
\begin{aligned}[b] g_{\theta ^{\prime }} \star x & \approx \theta _{0}^{\prime } x+\theta _{1}^{\prime }\left(\boldsymbol{L}-\boldsymbol{I}_{N}\right) x \\
&=\theta _{0}^{\prime } x-\theta _{1}^{\prime } \boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}} x \end{aligned} \tag{4}
\end{equation*}
where I_N is the identity matrix, D is the degree matrix, μ is a coefficient that balances the spectral and spatial information, and A and P are the spectral and spatial adjacency matrices whose entries are defined as
\begin{align*}
a_{i j}=\left\lbrace \begin{array}{ll}\left\Vert x_{i}-x_{j}\right\Vert _{2}, & i \ne j \\
0, & i=j \end{array}\right. \tag{5}
\\
p_{i j}=\left\lbrace \begin{array}{ll}\left\Vert d_{i}-d_{j}\right\Vert _{2}, & i \ne j \\
0, & i=j \end{array}\right. \tag{6}
\end{align*}
where x_i and d_i denote the spectral feature vector and the spatial position of the ith pixel, respectively.
Since reducing the number of parameters helps alleviate the overfitting problem, we set θ = θ'_0 = −θ'_1 and obtain
\begin{equation*}
g_{\theta ^{\prime }} \star x \approx \theta \left(\boldsymbol{I}_{N}+\boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}}\right) x. \tag{7}
\end{equation*}
Since the eigenvalues of I_N + D^{-1/2}(A + μP)D^{-1/2} lie in the range [0, 2], repeatedly applying this operator in a deep model may cause numerical instability. A renormalization trick is therefore adopted, and the layerwise propagation rule of the GCN is
\begin{equation*}
\boldsymbol{S}^{(l)}=\operatorname{Relu}\left(\tilde{\boldsymbol{A}} \boldsymbol{S}^{(l-1)} \boldsymbol{W}^{(l)}\right) \tag{8}
\end{equation*}
where S^(l) is the output of the lth layer with S^(0) being the input features, W^(l) is the trainable weight matrix of the lth layer, and the entries of the renormalized spectral-spatial adjacency matrix Ã are defined as
\begin{equation*}
\tilde{a}_{i j}=\left\lbrace \begin{array}{ll}e^{-\frac{\left\Vert {x}_{i}-{x}_{j}\right\Vert ^{2}+\mu \left\Vert d_{i}-d_{j}\right\Vert ^{2}}{\sigma }}, & \text{ if } {x}_{i} \in \operatorname{Nei} \left(x_{j}\right) \\
& \text{ or } {x}_{j} \in \operatorname{Nei}\left(x_{i}\right) \\
0, & \text{ otherwise } \end{array}\right. \tag{9}
\end{equation*}
where Nei(x_i) denotes the neighborhood of x_i and σ is the width of the Gaussian kernel.
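For illustration, a minimal NumPy sketch of this spectral-spatial adjacency construction is given below. The values μ = 30 and σ = 6 follow the experimental settings in Section III; defining Nei(·) by k-nearest neighbors with k = 10 is our assumption, and no further degree normalization is performed here.

```python
import numpy as np
from scipy.spatial.distance import cdist

def spectral_spatial_adjacency(x, d, k=10, mu=30.0, sigma=6.0):
    """Sketch of Eq. (9): Gaussian kernel over the combined spectral and
    spatial squared distances, restricted to mutual k-nearest neighbors.
    x: (N, b) spectral features after PCA; d: (N, 2) spatial positions."""
    dist = cdist(x, x, "sqeuclidean") + mu * cdist(d, d, "sqeuclidean")
    # Keep entry (i, j) if x_j is among the k nearest neighbors of x_i
    # or x_i is among the k nearest neighbors of x_j.
    idx = np.argsort(dist, axis=1)[:, 1:k + 1]       # skip self at column 0
    mask = np.zeros_like(dist, dtype=bool)
    mask[np.repeat(np.arange(len(x)), k), idx.ravel()] = True
    mask |= mask.T                                   # symmetrize the graph
    a_tilde = np.where(mask, np.exp(-dist / sigma), 0.0)
    np.fill_diagonal(a_tilde, 0.0)                   # zero diagonal, as in (5)-(6)
    return a_tilde
```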
Only a three-layer GCN is adopted here. The propagation rule of the first two layers is shown in (8), and the propagation rule of the last layer is as follows:
\begin{equation*}
\boldsymbol{S}^{(3)}=\operatorname{softmax}\left(\tilde{\boldsymbol{A}} \boldsymbol{S}^{(2)} \boldsymbol{W}^{(2)}\right) \tag{10}
\end{equation*}
The GCN is trained by minimizing the cross-entropy loss over the labeled samples
\begin{equation*}
L=-\sum _{k \in \boldsymbol{{Y}}_{L}} \sum _{c=1}^{C} {\boldsymbol{{Y}}}_{k c} \ln {\boldsymbol{{S}}}_{k c}^{(3)} \tag{11}
\end{equation*}
where Y_L is the index set of the labeled samples, C is the number of classes, and Y_{kc} indicates whether the kth labeled sample belongs to the cth class.
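A minimal PyTorch sketch of this three-layer GCN and the loss (11) follows. Using a separate trainable matrix in each layer and folding the softmax of (10) into the cross-entropy loss are choices of this sketch rather than details fixed by the article; the 40 hidden nodes match the experimental settings in Section III.

```python
import torch
import torch.nn.functional as F

class SpectralSpatialGCN(torch.nn.Module):
    """Sketch of the propagation rules (8) and (10)."""
    def __init__(self, in_dim, hidden=40, n_classes=16):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hidden, bias=False)
        self.w2 = torch.nn.Linear(hidden, hidden, bias=False)
        self.w3 = torch.nn.Linear(hidden, n_classes, bias=False)

    def forward(self, a_tilde, x):
        s = F.relu(self.w1(a_tilde @ x))    # Eq. (8), first layer
        s = F.relu(self.w2(a_tilde @ s))    # Eq. (8), second layer
        return a_tilde @ self.w3(s)         # logits; the softmax of Eq. (10)
                                            # is applied inside the loss

def train_step(model, opt, a_tilde, x, y, labeled_idx):
    """One optimization step of the cross-entropy loss (11),
    evaluated only on the labeled nodes."""
    opt.zero_grad()
    logits = model(a_tilde, x)
    loss = F.cross_entropy(logits[labeled_idx], y[labeled_idx])
    loss.backward()
    opt.step()
    return loss.item()
```

Under the experimental settings below, such a model would be trained for 200 epochs with a learning rate of 0.01, e.g., with torch.optim.Adam(model.parameters(), lr=0.01); the optimizer choice is our assumption.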
C. Sample Expansion Based on CAM
When the number of input labeled samples is insufficient, the BLS is prone to the problems of insufficient network training and overfitting. Therefore, we propose the CAM to expand the samples after the graph convolution operation. First, the limited labeled samples are fed into the trained GCN, and their spectral-spatial features are extracted as
\begin{equation*}
\boldsymbol{Z}=\tilde{\boldsymbol{A}} \operatorname{Relu}\left(\tilde{\boldsymbol{A}} \boldsymbol{X} \boldsymbol{W}^{(1)}\right) \boldsymbol{W}^{(2)}=\left[\begin{array}{l}\boldsymbol{Z}_{1} \\
\;\vdots \\
\boldsymbol{Z}_{{l}} \\
\;\vdots \\
\boldsymbol{Z}_{C} \end{array}\right] \in \mathrm{R}^{\left({C} \times {n}_{l}\right) \times d_{1}} \tag{12}
\end{equation*}
where Z_l denotes the features of the n_l labeled samples of the lth class and d_1 is the feature dimension.
Second, the center of the features belonging to the lth class is defined as
\begin{equation*}
\boldsymbol{z}_{l}^{0}=\frac{\boldsymbol{z}_{l}^{1}+\boldsymbol{z}_{l}^{2}+\cdots +\boldsymbol{z}_{l}^{n_{l}}}{n_{l}}. \tag{13}
\end{equation*}
Third, the n_x features closest to the center z_l^0 are selected from Z_l and averaged in pairs, and the resulting C_{n_x}^2 combinatorial average samples form the expansion set of the lth class
\begin{equation*}
\boldsymbol{Z}_{l}^{a}=\left[\begin{array}{c}\boldsymbol{z}_{l}^{{a_{1}}} \\
\vdots \\
\boldsymbol{z}_{l}^{a_{C_{n_{x}}^{2}}} \end{array}\right] \tag{14}
\end{equation*}
where z_l^{a_i} denotes the average of the ith selected sample pair.
Finally, the original features and the expansion set of each class are merged as Z_l^K = [Z_l; Z_l^a], and the expanded training set of all C classes is obtained by stacking
\begin{equation*}
\boldsymbol{Z}^{\mathrm{K}}=\left[\begin{array}{l}\boldsymbol{Z}_{1}^{\mathrm{K}} \\
\;\vdots \\
\boldsymbol{Z}_{C}^{\mathrm{K}} \end{array}\right] \in \mathrm{R}^{C\left(n_{l}+C_{n_{x}}^{2}\right) \times d_{1}}. \tag{15}
\end{equation*}
The CAM can be used to extend the sample size of the data with discriminative spectral-spatial features extracted by GCN, which provides more valuable samples for GCBN training. CAM is only used in the training stage, and the test samples are fed into the network without expansion.
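A minimal NumPy sketch of CAM follows. Selecting the n_x features nearest to the class center z_l^0 is our reading of the selection step, and the helper names are hypothetical.

```python
import numpy as np
from itertools import combinations

def cam_expand_class(z_l, n_x):
    """Sketch of Eqs. (13)-(15) for one class: select the n_x features
    closest to the class center and average them in pairs."""
    center = z_l.mean(axis=0)                            # Eq. (13)
    order = np.argsort(np.linalg.norm(z_l - center, axis=1))
    sel = z_l[order[:n_x]]                               # n_x selected samples
    pairs = list(combinations(range(n_x), 2))            # C(n_x, 2) pairs
    z_a = np.stack([(sel[i] + sel[j]) / 2 for i, j in pairs])  # Eq. (14)
    return np.vstack([z_l, z_a])                         # Z_l^K

def cam_expand(z, y, n_x):
    """Stack the per-class expansions into Z^K of Eq. (15)."""
    z_k, y_k = [], []
    for c in np.unique(y):
        z_c = cam_expand_class(z[y == c], n_x)
        z_k.append(z_c)
        y_k.append(np.full(len(z_c), c))
    return np.vstack(z_k), np.concatenate(y_k)
```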
D. Spectral-Spatial Feature Broad Expansion Based on BLS
BLS is a new type of flat network designed based on the idea of RVFLNN [37]. Although the limited linear sparse feature representation ability of BLS may lead to an underfitting problem, BLS still has such advantages as a simple structure, fast calculation speed, and feature broad expansion. Therefore, BLS can be used to expand the width of the nonlinear features extracted by the GCN to further enhance the feature representation ability.
The expanded sample set Z^K is first mapped into the ith group of MF through randomly generated weights W_ei and biases β_ei
\begin{equation*}
\boldsymbol{M}_{i}=\boldsymbol{Z}^{\mathrm{K}} \boldsymbol{W}_{e i}+\boldsymbol{\beta }_{e i}, \quad i=1, \ldots, d^{\mathrm{M}} \tag{16}
\end{equation*}
where d^M is the number of MF groups. Then, the concatenated MF M = [M_1 | ⋯ | M_{d^M}] are nonlinearly transformed into the jth group of EN
\begin{equation*}
\boldsymbol{H}_{j}=\varphi \left(\boldsymbol{M} \boldsymbol{W}_{h j}+\boldsymbol{\beta }_{h j}\right), \quad j=1, \ldots, G^{\mathrm{E}} \tag{17}
\end{equation*}
where φ(·) is a nonlinear activation function, G^E is the number of EN groups, and H = [H_1 | ⋯ | H_{G^E}].
Finally, MF and EN are simultaneously mapped to the output layer, and the output of the GCBN is
\begin{equation*}
\boldsymbol{O}=[\boldsymbol{M} \mid \boldsymbol{H}] \boldsymbol{W}^{\mathrm{O}}. \tag{18}
\end{equation*}
The objective function of the GCBN is defined as
\begin{equation*}
\underset{\boldsymbol{W}^{\mathrm{O}}}{\operatorname{argmin}}\left\Vert \boldsymbol{O}-\boldsymbol{Y}^{\mathrm{K}}\right\Vert _{2}^{2}+\delta \left\Vert \boldsymbol{W}^{\mathrm{O}}\right\Vert _{2}^{2} \tag{19}
\end{equation*}
where Y^K is the label matrix corresponding to the expanded training set Z^K and δ is the regularization coefficient.
According to the ridge regression theory, the output layer weights have the closed-form solution
\begin{equation*}
\boldsymbol{W}^{\mathrm{O}}=\left(\delta \boldsymbol{I}+[\boldsymbol{M} \mid \boldsymbol{H}]^{\mathrm{T}}[\boldsymbol{M} \mid \boldsymbol{H}]\right)^{-1}[\boldsymbol{M} \mid \boldsymbol{H}]^{\mathrm{T}} \boldsymbol{Y}^{\mathrm{K}} \tag{20}
\end{equation*}
where I is the identity matrix.
In the test stage, the predictive output of GCBN is
\begin{equation*}
\boldsymbol{Y}=[\boldsymbol{M} \mid \boldsymbol{H}] \boldsymbol{W}^{\mathrm{O}}. \tag{21}
\end{equation*}
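The broad expansion (16)–(17) and the closed-form solution (20) can be sketched in NumPy as follows. Here, tanh is only an example choice for φ, the group sizes follow the experimental settings in Section III, and generating the random weights once so that the identical mapping can be reapplied to the test features is how (21) is realized.

```python
import numpy as np

def make_broad_weights(in_dim, n_groups=15, group_dim=30, n_en=600, seed=0):
    """Randomly generate the MF weights/biases (W_e, beta_e) and the EN
    weights/biases (W_h, beta_h) once, so the same mapping can be applied
    to both training and test features."""
    rng = np.random.default_rng(seed)
    w_e = [rng.standard_normal((in_dim, group_dim)) for _ in range(n_groups)]
    b_e = [rng.standard_normal(group_dim) for _ in range(n_groups)]
    w_h = rng.standard_normal((n_groups * group_dim, n_en))
    b_h = rng.standard_normal(n_en)
    return w_e, b_e, w_h, b_h

def broad_expand(z, weights):
    """Sketch of Eqs. (16)-(18): linear MF groups, then one nonlinear EN
    transformation (tanh as an example for phi), concatenated as [M | H]."""
    w_e, b_e, w_h, b_h = weights
    m = np.hstack([z @ w + b for w, b in zip(w_e, b_e)])  # Eq. (16)
    h = np.tanh(m @ w_h + b_h)                            # Eq. (17)
    return np.hstack([m, h])                              # [M | H]

def ridge_solve(mh, y_onehot, delta=0.01):
    """Closed-form ridge regression solution of Eq. (20)."""
    gram = mh.T @ mh + delta * np.eye(mh.shape[1])
    return np.linalg.solve(gram, mh.T @ y_onehot)
```

Reusing the same randomly generated weights on the test features is essential, since (21) applies the identical mapping [M | H] at test time.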
The steps of semisupervised HSI classification based on GCBN are summarized as follows.
Algorithm 1: GCBN
Inputs: PCA-based HSI representation X, limited labeled samples, and abundant unlabeled samples.
Initialize the GCBN network parameters.
Calculate the spectral-spatial adjacency matrix Ã by (9).
Pretrain the GCN with the labeled samples by minimizing the loss in (11).
Extract the features Z by (12) and expand them into Z^K with CAM by (13)–(15).
Calculate the network weights W^O by (16), (17), and (20).
Calculate the predictive labels Y by (21).
Outputs: Predictive labels Y.
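To make Algorithm 1 concrete, the following end-to-end sketch glues together the functions from the previous sketches (assumed to be in scope), with PCA taken from scikit-learn. The 30 principal components and the Adam optimizer are our assumptions; n_x = 3 corresponds to the five labeled samples per class used in the experiments (n_x = n_c − 2).

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

def run_gcbn(pixels, coords, y, labeled_idx, n_classes, epochs=200):
    # Step 1: PCA dimensionality reduction of the spectral vectors.
    x = PCA(n_components=30).fit_transform(pixels)
    # Step 2: spectral-spatial graph and GCN training, Eqs. (8)-(11).
    a = torch.tensor(spectral_spatial_adjacency(x, coords), dtype=torch.float32)
    xt = torch.tensor(x, dtype=torch.float32)
    yt = torch.tensor(y, dtype=torch.long)
    idx = torch.tensor(labeled_idx, dtype=torch.long)
    model = SpectralSpatialGCN(x.shape[1], n_classes=n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(epochs):
        train_step(model, opt, a, xt, yt, idx)
    with torch.no_grad():
        z = model(a, xt).numpy()                 # features Z, Eq. (12)
    # Step 3: CAM expansion of the labeled features, Eqs. (13)-(15).
    z_k, y_k = cam_expand(z[labeled_idx], y[labeled_idx], n_x=3)
    # Steps 4-5: broad expansion and ridge regression, Eqs. (16)-(20).
    weights = make_broad_weights(z_k.shape[1])
    w_o = ridge_solve(broad_expand(z_k, weights), np.eye(n_classes)[y_k])
    # Prediction for every pixel, Eq. (21); argmax per row gives the label.
    return np.argmax(broad_expand(z, weights) @ w_o, axis=1)
```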
Experiments
A. HSI Datasets
Three real HSI datasets were selected in our experiments.
The Indian Pines dataset was acquired by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana. It contains 145×145 pixels and 224 bands, and is mainly used for agriculture-related research: about two-thirds of the scene is agricultural land and one-third is forest and other natural perennial vegetation, covering 16 classes.
The Botswana dataset was acquired by the Hyperion sensor over the Okavango Delta, Botswana. It contains 1476×256 pixels and 242 bands and includes 14 classes. After removing noisy bands, atmospheric and water absorption bands, and overlapping bands, the remaining 145 bands are used in the experiments.
The Kennedy Space Center (KSC) dataset was acquired by the AVIRIS sensor over Florida. It contains 614×512 pixels and 224 bands and includes 13 classes. After removing the water absorption and noisy bands, 176 bands are retained for the experiments.
B. Experimental Results
To verify the validity and superiority of the proposed GCBN, the following 11 classifiers are selected for comparison:
traditional classification method: SVM [9];
deep learning methods: 2D-CNN [24], GCN [45], SSGCN [43], MDGCN [42];
broad learning methods: BLS [37], SBLS [40];
GCBN without CAM: GB;
replacing CAM of GCBN with the data augmentation methods in [31] and [32], respectively: GZB, GMB; and
replacing GCN of GCBN with GraphSAGE [46]: GSCB.
The experimental settings are as follows.
Since Wan et al. [42] also selected the Indian Pines and KSC datasets to test the performance of MDGCN, we directly follow [42] to set the hyperparameters of MDGCN. The hyperparameters of the remaining comparison classifiers are set via the grid search method;
A three-layer GCN with 40 hidden nodes is used in GCBN. The epoch is 200, the learning rate is 0.01, μ = 30, σ = 6, and δ = 0.01, where n_x = n_c − 2 and n_c is the number of labeled samples per class. The feature dimension of each group of MF G^M = 30, the number of MF groups d^M = 15, and the number of EN nodes d^E = 600 are set via the grid search method;
All the classification methods are implemented in PyTorch and MATLAB R2017a using a computer with a 3.60 GHz Intel Core i5-6500 CPU and 8 GB of RAM;
We select four evaluation indexes, including per-class accuracy (%), overall accuracy (OA, %), Kappa coefficient, and consumed time (Time, s), where the consumed time means the training and testing time of the classifier (a minimal computation sketch for OA and Kappa follows this list). To eliminate the influence of random factors, each experiment is conducted ten times and the average value of all indexes is reported;
We randomly select five samples from each class of the ground objects in the HSI dataset as labeled samples for experiments.
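As a minimal sketch, OA and the Kappa coefficient can be computed from the confusion matrix as follows; per-class accuracy is the diagonal of the row-normalized confusion matrix.

```python
import numpy as np

def oa_and_kappa(y_true, y_pred, n_classes):
    """Overall accuracy (%) and Cohen's Kappa from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)        # rows: true, columns: predicted
    n = cm.sum()
    oa = np.trace(cm) / n                     # observed agreement
    pe = (cm.sum(0) @ cm.sum(1)) / n ** 2     # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return 100.0 * oa, kappa
```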
In the Indian Pines dataset, the surface objects represented by I1–I16 are: Alfalfa, Corn-notill, Corn-mintill, Corn, Grass-pasture, Grass-trees, Grass-pasture-mowed, Hay-windrowed, Oats, Soybean-notill, Soybean-mintill, Soybean-clean, Wheat, Woods, Buildings-Grass-Tree-Drives, and Stone-Steel-Towers. In the Botswana dataset, B1–B14 represent: Water, Hippo grass, Floodplain grasses1, Floodplain grasses2, Reeds1, Riparian, Firescar2, Island interior, Acacia woodlands, Acacia shrublands, Acacia grasslands, Short mopane, Mixed mopane, and Exposed soils. In the KSC dataset, the surface objects represented by K1–K13 are: Scrub, Willow swamp, CP hammock, Slash pine, Oak, Hardwood, Swamp, Graminoid, Spartina marsh, Cattail marsh, Salt marsh, Mud flats, and Water.
Tables I–III and Figs. 2–4 show the performance comparison results of the different classifiers.
Classification maps on Indian Pines dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.
Classification maps on Botswana dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.
Classification maps on the KSC dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.
It can be observed from Tables I–III and Figs. 2–4 that
Among the 12 methods, 2-D CNN obtains the lowest OAs and Kappa coefficients on all three HSI datasets and consumes the longest time. The reason is that the impressive performance of deep networks relies on abundant labeled samples. When the number of labeled samples is insufficient, 2-D CNN cannot be adequately trained, resulting in low classification accuracy of HSI, even lower than that of the conventional SVM. In addition, 2-D CNN has many network layers and is learned by gradient descent, which requires repeated iterative training, so it consumes a long time.
BLS consumes the least time and achieves high classification accuracy among the 12 methods. The reason is that its structure is simple, and the nonlinear mapping from MF to EN achieves the broad expansion of MF and enhances the classification ability of BLS. Compared with BLS, SBLS achieves higher OAs and Kappa coefficients on all three datasets because SBLS additionally utilizes a large amount of unlabeled sample information.
GCN, GCBN, and MDGCN are all GCN-based methods, among which GCBN achieves the highest classification accuracy, followed by MDGCN. The reason is that both GCBN and MDGCN use the spectral and spatial information of HSI, while GCN only considers the spectral information. In addition, compared with GCN, MDGCN takes the spectral and spatial information of different scales into account and dynamically updates the constructed graph during training.
GCBN achieves the highest OAs and Kappa coefficients on all three datasets with low time consumption. The reasons are as follows. First, GCBN is a semisupervised classification method that uses limited labeled samples and abundant unlabeled samples. Second, GCN helps extract more discriminative spectral-spatial features from the original HSI. Third, the combinatorial average expansion of the spectral-spatial features provides a great quantity of valuable samples for GCBN training. Fourth, the spectral-spatial feature broad expansion further enhances the feature representation ability of GCBN. Moreover, GCBN uses only a shallow GCN and the structure of BLS is simple, so the learning speed of GCBN is fast.
Among the three HSI datasets, the OAs and Kappa coefficients of the 12 methods are the lowest on Indian Pines. This is because the similarity between classes in the Indian Pines dataset is relatively high. For instance, Corn-notill, Corn-mintill, and Corn are essentially the same crop, so they are difficult to distinguish. All the classification models consume the least time on the Botswana dataset. This is because the Botswana dataset has the smallest sample size with only 3268 samples, while the Indian Pines dataset contains 10 249 samples.
Among the three methods using data augmentation (GZB, GMB, and GCBN), GCBN achieves the highest OAs and Kappa coefficients. This is because CAM can increase the number of training samples without losing key information.
Compared with GSCB, GCBN obtains higher OAs and Kappa coefficients. The reason is that GCN integrates the global contextual information by constructing a spectral-spatial adjacency matrix over the entire graph, whereas GraphSAGE in GSCB aggregates information only from sampled local neighborhoods.
Next, the influence of the number of labeled samples on the classification accuracy of HSI is studied. It can be seen from Fig. 5 that: 1) with the increase of the number of labeled samples, the OAs of all classification models show an increasing trend; and 2) when the number of labeled samples is small (5 or 10 per class), the classification accuracy of 2-D CNN on the three datasets is the lowest among all models, while as the number of labeled samples gradually increases, its classification accuracy increases the most.
OAs of various methods under different numbers of labeled samples per class. (a) Indian Pines. (b) Botswana. (c) KSC.
Conclusion
An HSI classification method named GCBN is proposed in this article. First, the deep and nonlinear spectral-spatial features extracted by GCN are used to replace the linear mapped features in the traditional BLS, which helps avoid the underfitting problem caused by the insufficient linear sparse feature representation ability of BLS. Then, CAM is proposed to select valuable paired samples and generate a sample expansion set for GCBN training, which alleviates the poor classification ability caused by the limited labeled samples. Furthermore, BLS is used to perform broad expansion on the spectral-spatial features extracted by GCN and expanded by CAM, which enhances the representation ability of the features and improves the classification ability of GCBN. Finally, the output layer weights are obtained in closed form with the ridge regression theory. Experimental results on three real HSI datasets demonstrate that the proposed GCBN obtains higher classification accuracy than the comparison methods.