Semisupervised Classification of Hyperspectral Image Based on Graph Convolutional Broad Network | IEEE Journals & Magazine | IEEE Xplore

Semisupervised Classification of Hyperspectral Image Based on Graph Convolutional Broad Network


Abstract:

Hyperspectral image (HSI) classification has attracted much attention in the field of remote sensing. However, the lack of sufficient labeled training samples is a huge challenge for HSI classification. To face this challenge, we propose a semisupervised HSI classification method based on graph convolutional broad network (GCBN). First, to avoid the underfitting problem caused by the insufficient linear sparse feature representation ability of broad learning system (BLS), graph convolution operation is applied to extract nonlinear and discriminative spectral-spatial features from the original HSI to replace the linear mapping features in the traditional BLS. Second, to solve the problem of insufficient model classification ability caused by limited labeled samples, the combinatorial average method (CAM) is proposed to use valuable paired samples to generate sample expansion set for GCBN model training. Third, BLS is used to perform broad expansion on spectral-spatial features extracted by GCN and extended by CAM, which further enhances the feature representation ability. Finally, the output weights can be easily calculated by the ridge regression theory. Experimental results on three real HSI datasets demonstrate the effectiveness of our proposed GCBN.
Page(s): 2995 - 3005
Date of Publication: 02 March 2021


CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Hyperspectral images (HSI) contain rich spectral and spatial information, which makes them widely used in crop monitoring, environmental monitoring, mineral exploration, and other fields [1]–[4]. HSI classification is one of the basic and key technologies of remote sensing for earth surface observation. It aims to infer the class of each pixel based on the spectral and spatial information of the HSI [5]–[7]. Early HSI classification methods are mostly based on conventional pattern recognition methods, such as K-nearest neighbor [8], support vector machine (SVM) [9], random forest [10], and decision tree [11]. In addition, extreme learning machines [12], sparse representation [13], and graph embedding methods [14] are also used for HSI classification. However, most early HSI classification methods focused only on the role of the spectral information, and therefore could not obtain high classification accuracy [15]. Since the neighboring pixels in HSI usually carry rich spatial information, many spectral-spatial classification methods have been proposed that use the spatial information of HSI to obtain higher classification accuracy. For instance, some researchers applied the spatial information to HSI classification via extended morphological profiles and thus achieved satisfactory classification accuracy [16], [17]. Spectral and spatial information contained in the neighborhood region of the pixels was merged and added into the sparse representation model in [18] and [19]. Tu et al. [20] proposed a spectral-spatial HSI classification method, which exploited the comprehensive contextual information of HSI under the weak assumption that the pixels in a superpixel belong to the same class, and achieved an excellent classification performance. Sellami et al. [21] proposed an HSI classification approach, which made full use of the spectral-spatial information by automatically selecting relevant spectral bands.

Compared with traditional machine learning algorithms, deep learning techniques can automatically extract high-level and compact features from input data. In recent years, deep learning techniques have been successfully applied to HSI classification tasks. Chen et al. [22] used stacked autoencoders to extract the features of HSI, and entered them into the logistic regression model for classification. Liu et al. [23] first used deep belief network to extract deep spectral features, and then repeatedly selected good-quality labeled samples as training samples with active learning algorithms. Zhang et al. [24] proposed an HSI classification algorithm based on the convolutional neural network (CNN), which utilized diverse region-based inputs to learn discriminative spectral-spatial features. Chen et al. [25] used 1-D, 2-D, and 3-D CNN to extract features of HSI, respectively. Kong et al. [26] extracted the spectral features of HSI by constructing intra-class and inter-class hypergraphs, and extracted spatial features by CNN. Zhu et al. [27] adopted generative adversarial networks to construct a semisupervised feature learning framework for HSI classification. Mou et al. [28] applied recurrent neural network (RNN) to HSI classification for the first time, and proposed a parameter modified tanh activation function to replace the traditional activation function.

The impressive feature representation capability of deep learning is based on abundant labeled samples. However, collecting the labeled HSI data is difficult and expensive [29]. Therefore, how to learn a classifier with strong generalization ability at a low labeling cost has become a research hotspot in the field of HSI analysis. To address this concern, many methods have been proposed, which fall into four categories. The first one is data augmentation, which synthesizes new examples following the original data distribution [30]. Li et al. [31] constructed a new training set for CNN by using pairwise labeled samples and exploited it to improve the model classification accuracy. Wang et al. [32] established a data mixture model to augment the labeled training set quadratically, and exploited this set to train the CNN. The second category is domain adaptation, which uses sufficient samples from different but similar domains to solve the problems for another domain [33]. Zhou and Prasad [34] first used deep convolutional RNNs to extract the discriminative features for two domains, then aligned the features with each other layer-by-layer in the common subspaces, and thus realized the HSI classification of different distributions by exploiting only part of the labeled samples in the source domain. The third one is active learning, which exploits a small number of labeled samples to train a classifier and lets the classifier actively select representative unlabeled samples [35]. The fourth one is the semisupervised method, which utilizes abundant unlabeled data and limited labeled samples for classification. Wu and Prasad [36] proposed a semisupervised deep learning network, which effectively alleviated the shortage of labeled samples by combining limited labeled samples with abundant unlabeled samples for HSI classification.

Broad learning system (BLS) is a random vector functional link neural network (RVFLNN) consisting of only three parts [mapped feature (MF), enhancement node (EN), and output layer] [37]. Compared with deep learning, BLS has the following advantages [37]: 1) BLS can nonlinearly expand the features. 2) BLS has a simple and flexible structure with only three layers. 3) Deep learning methods rely on gradient descent, which requires many iterations, whereas BLS calculates its network weights directly by ridge regression, so training is fast. 4) It is easy to integrate BLS with other models. Feng and Chen [38] proposed a fuzzy BLS by combining the Takagi–Sugeno fuzzy system with BLS, which achieved an ideal accuracy in regression and classification. Chu et al. [39] proposed a weighted BLS, in which the contribution of each input sample to the BLS was constrained by exploiting penalty factors. Kong et al. [40] proposed a semisupervised model by merging the class-probability structure into BLS and achieved good classification performance in HSI classification. Kong et al. [41] proposed an HSI clustering algorithm based on BLS, and exploited the graph-regularized sparse autoencoder to fine-tune the weights of MF and EN.

As a recent research achievement of deep learning, the graph convolutional network (GCN) can aggregate and transform the neighbor feature information from each node. Besides, GCN is able to encode features of graph nodes and local graph structure by convolutional layers, so as to exhaustively exploit the graph features and flexibly preserve the class boundaries [42]. However, the original GCN only utilizes the spectral information when classifying HSI, that is, it only constructs the spectral adjacency matrix. Qin et al. [43] considered the data structure characteristic of HSI and the advantages of GCN, and completed the HSI classification by constructing a spectral-spatial adjacency matrix that uses both spectral and spatial information. Therefore, according to [43], we first use GCN to extract the spectral-spatial features of HSI. Then, the combinatorial average method (CAM) is used to expand the data with spectral-spatial features extracted by GCN. Finally, considering the flexible network structure and the feature broad expansion ability of BLS, a semisupervised graph convolutional broad network (GCBN) is proposed. The main contributions of our work are summarized as follows.

  1. We replace the linear mapping features used in the traditional BLS with the spectral-spatial features extracted from the original HSI by GCN, which can achieve accurate HSI classification at low labeling cost by means of exploiting limited labeled samples and abundant unlabeled samples.

  2. In the proposed combinatorial average method (CAM), some valuable paired samples are selected in a targeted manner, and averaged in pairs to generate a sample expansion set much larger than the original training set. Thus, the problem of the lack of labeled samples to support high-precision classification model training can be solved.

  3. We exploit the BLS to perform broad expansion on spectral-spatial features extracted by GCN and extended by CAM, which is helpful to further enhance the representation ability of features and thus improve the classification accuracy of HSI.

The rest of this article is organized as follows. We elaborate the semisupervised classification method of HSI based on GCBN in Section II. We present experimental results on three real HSI datasets and analyze them in Section III, followed by a conclusion in Section IV.

SECTION II.

Semisupervised Classification of HSI Based on GCBN

A. Flowchart of GCBN for HSI Classification

The flowchart of the proposed GCBN for HSI classification is shown in Fig. 1, which mainly contains the following five steps:

  1. The principal component analysis (PCA) is applied to the original HSI to reduce dimensionality;

  2. The spectral-spatial graph of GCBN constructed based on the spectral and spatial information of limited labeled samples and abundant unlabeled samples is used for graph convolution operation. Then the discriminative spectral-spatial features of HSI are extracted by the trained GCN;

  3. In our proposed CAM, some valuable paired samples are selected in a targeted manner, and averaged in pairs to generate a sample expansion set for GCBN training;

  4. BLS is used to expand the width of spectral-spatial features extracted by GCN and extended by CAM;

  5. The output layer weights can be calculated with the ridge regression theory.

Fig. 1. Flowchart of GCBN for HSI classification.
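
As an illustration of step 1 of the flowchart, the following minimal Python sketch reduces the spectral dimension of an HSI cube with PCA. The function name `pca_reduce`, the toy cube size, and the choice d = 4 are assumptions made for this example and are not taken from the experiments; the SVD-based formulation is one common way to compute PCA.

```python
import numpy as np

def pca_reduce(cube, d):
    """Project an (H, W, B) hyperspectral cube onto its first d principal components."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)   # one row per pixel
    centered = pixels - pixels.mean(axis=0)           # remove the per-band mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:d].T                        # shape (H*W, d)

rng = np.random.default_rng(0)
toy_cube = rng.normal(size=(6, 5, 20))   # hypothetical 6x5 image with 20 bands
X = pca_reduce(toy_cube, d=4)
```

Each row of the result corresponds to one pixel, which is the node-feature layout assumed by the graph convolution described next.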

B. Feature Extraction Based on GCN

Since there is redundant information in the original HSI band, directly entering the original HSI into the GCN will cause a dramatic increase in the network parameters and affect the classification performance of GCN. Therefore, PCA is used to reduce the dimensionality of the original HSI data \boldsymbol{X}_{0}. Define \boldsymbol{X} \in \mathrm{R}^{n \times d}, then \boldsymbol{x} \in \mathrm{R}^{d} is the signal after dimension reduction by PCA, \theta \in \mathrm{R}^{d} is the Fourier coefficient. The graph convolution operation in the spectral domain can be expressed as each frequency of the signal x multiplied by the filter g_{\theta } parameterized by \theta in the Fourier domain \begin{equation*} g_{\theta } \star x=\boldsymbol{U} g_{\theta } \boldsymbol{U}^{\mathrm{T}} x \tag{1} \end{equation*}

where \boldsymbol{U} is the matrix composed of the eigenvectors of the normalized graph Laplacian \boldsymbol{L}=\boldsymbol{I}-\boldsymbol{D}^{-\frac{1}{2}} \boldsymbol{A} \boldsymbol{D}^{-\frac{1}{2}}=\boldsymbol{U} \boldsymbol{\Lambda } \boldsymbol{U}^{\mathrm{T}}, \boldsymbol{\Lambda } is the diagonal matrix that contains the eigenvalues of \boldsymbol{L}, \boldsymbol{D} is the degree matrix with \boldsymbol{D}_{i i}=\sum _{j} \boldsymbol{A}_{i j}, and \boldsymbol{I} is the identity matrix. Then, we can regard g_{\theta } as a function of the eigenvalues of \boldsymbol{L}, i.e., g_{\theta }(\boldsymbol{\Lambda }). In order to reduce computational consumption, Hammond et al. [44] approximated g_{\theta }(\boldsymbol{\Lambda }) by a truncated expansion in Chebyshev polynomials {T}_{k}({x}) up to the Kth order \begin{equation*} g_{\theta ^{\prime }}(\boldsymbol{\Lambda }) \approx \sum _{k=0}^{K} \theta _{k}^{\prime } T_{k}(\tilde{\boldsymbol{\Lambda }}) \tag{2} \end{equation*}
where \theta ^{\prime } is a vector of Chebyshev coefficients, \tilde{\boldsymbol{\Lambda }}=\frac{2}{\lambda _{\max }} \boldsymbol{\Lambda }-\boldsymbol{I}_{N}, and \lambda _{\max } is the largest eigenvalue of \boldsymbol{L}. According to [44], the Chebyshev polynomials are defined recursively as T_{k}(x)=2 x T_{k-1}(x)-T_{k-2}(x), with T_{0}({x})=1 and T_{1}({x})=x. The convolution of the filter g_{\theta ^{\prime }} with the signal x is then defined as \begin{equation*} g_{\theta ^{\prime }} \star x \approx \sum _{k=0}^{K} \theta _{k}^{\prime } T_{k}(\tilde{\boldsymbol{L}}) x \tag{3} \end{equation*}
where \tilde{\boldsymbol{L}}=\frac{2}{\lambda _{\max }} \boldsymbol{L}-\boldsymbol{I}_{N} represents the scaled Laplacian matrix. Equation (3) can be easily verified by exploiting the fact (\boldsymbol{U} \boldsymbol{\Lambda } \boldsymbol{U}^{\mathrm{T}})^{k}=\boldsymbol{U} \boldsymbol{\Lambda }^{k} \boldsymbol{U}^{\mathrm{T}}. It can be seen that this expression is a Kth-order polynomial in the Laplacian, that is, the final filtering result only depends on the nodes at most K steps away from the center node. In this article, we consider the first-order neighborhood, i.e., {K}=1. This means that filtering of the graph signal x only relies on the nodes directly connected to the current node. We further approximate \lambda _{\max }\approx 2, as the parameters of the neural network can adapt to this change in scale during training [45]. Thus, (3) can be simplified to \begin{equation*} \begin{aligned}[b] g_{\theta ^{\prime }} \star x & \approx \theta _{0}^{\prime } x+\theta _{1}^{\prime }\left(\boldsymbol{L}-\boldsymbol{I}_{N}\right) x \\ &=\theta _{0}^{\prime } x-\theta _{1}^{\prime } \boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}} x \end{aligned} \tag{4} \end{equation*}
where \boldsymbol{A} \in \mathrm{R}^{n \times n} is the adjacency matrix of spectral signatures, \boldsymbol{P} \in \mathrm{R}^{n \times n} is the adjacency matrix of spatial signatures, and \mu is the spatial coefficient \begin{align*} a_{i j}=\left\lbrace \begin{array}{ll}\left\Vert x_{i}-x_{j}\right\Vert _{2}, & i \ne j \\ 0, & i=j \end{array}\right. \tag{5} \\ p_{i j}=\left\lbrace \begin{array}{ll}\left\Vert d_{i}-d_{j}\right\Vert _{2}, & i \ne j \\ 0, & i=j \end{array}\right. \tag{6} \end{align*}
where x represents the spectral feature vector of the sample, and d represents the spatial position coordinates of x.
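
The first-order simplification can be checked numerically. The sketch below is an illustration only: it uses a small random symmetric graph with a single adjacency matrix (the spatial term \mu \boldsymbol{P} is omitted), and the helper name `chebyshev_filter` and the coefficient values are hypothetical. It evaluates the Chebyshev filter of (3) with K = 1 and \lambda_{\max} = 2 and compares it with the closed form in the second line of (4) and with the spectral form built from the eigendecomposition of \boldsymbol{L}.

```python
import numpy as np

def chebyshev_filter(L_tilde, x, theta):
    """Evaluate sum_k theta[k] * T_k(L_tilde) @ x with the Chebyshev recurrence."""
    t_prev, t_curr = x, L_tilde @ x          # T_0 x = x, T_1 x = L_tilde x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 * L_tilde * T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
A = (W + W.T) / 2
np.fill_diagonal(A, 0.0)                      # symmetric toy adjacency, no self-loops
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
L = np.eye(n) - A_norm                        # normalized graph Laplacian

x = rng.normal(size=n)
theta = np.array([0.7, -0.3])                 # K = 1: two coefficients
L_tilde = L - np.eye(n)                       # scaled Laplacian with lambda_max = 2

filtered = chebyshev_filter(L_tilde, x, theta)
first_order = theta[0] * x - theta[1] * (A_norm @ x)     # closed form of (4)

lam, U = np.linalg.eigh(L)                    # L = U diag(lam) U^T
spectral = U @ ((theta[0] + theta[1] * (lam - 1.0)) * (U.T @ x))
```

All three vectors agree up to floating-point error, which reflects that the polynomial filter never needs the explicit eigendecomposition.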

Since reducing the number of parameters helps to mitigate overfitting, we set \theta =\theta ^{\prime }_{0}=-\theta ^{\prime }_{1}, and (4) becomes \begin{equation*} g_{\theta ^{\prime }} \star x \approx \theta \left(\boldsymbol{I}_{N}+\boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}}\right) x. \tag{7} \end{equation*}

Since the eigenvalues of \boldsymbol{I}_{N}+\boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}} are within the range [0, 2], repeatedly using this operator in a deep neural network will lead to numerical instabilities and vanishing/exploding gradients [45]. To solve this problem, according to [43], we perform the renormalization trick \boldsymbol{I}_{N}+\boldsymbol{D}^{-\frac{1}{2}}(\boldsymbol{A}+\mu \boldsymbol{P}) \boldsymbol{D}^{-\frac{1}{2}} \rightarrow \tilde{\boldsymbol{D}}^{-\frac{1}{2}}(\boldsymbol{I}_{N}+\boldsymbol{A}+\mu \boldsymbol{P}) \tilde{\boldsymbol{D}}^{-\frac{1}{2}}, where \tilde{\boldsymbol{D}}_{i i}=\sum _{j}(\boldsymbol{I}_{N}+\boldsymbol{A}+\mu \boldsymbol{P})_{i j}. The GCN can be expressed as \begin{equation*} \boldsymbol{S}^{(l)}=\operatorname{Relu}\left(\tilde{\boldsymbol{A}} \boldsymbol{S}^{(l-1)} \boldsymbol{W}^{(l)}\right) \tag{8} \end{equation*}

where \boldsymbol{S}^{(l)} is the output of the lth layer, \operatorname{Relu}(\cdot)=\max (0, \cdot) is selected as the activation function, \boldsymbol{W}^{(l)} denotes the trainable weight matrix contained in the lth layer, and \tilde{\boldsymbol{A}} can be calculated by \begin{equation*} \tilde{a}_{i j}=\left\lbrace \begin{array}{ll}e^{\frac{-\left(\left\Vert {x}_{i}-{x}_{j}\right\Vert ^{2}+\mu \left\Vert d_{i}-d_{j}\right\Vert ^{2}\right)}{\sigma }}, & \text{ if } {x}_{i} \in \operatorname{Nei} \left(x_{j}\right) \\ & \text{ or } {x}_{j} \in \operatorname{Nei}\left(x_{i}\right) \\ 0, & \text{ otherwise } \end{array}\right. \tag{9} \end{equation*}
where \sigma is the spectral-spatial coefficient.
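
A minimal sketch of how the adjacency matrix in (9) could be assembled is given below. The neighborhood \operatorname{Nei}(\cdot) is not specified further in this section, so the sketch assumes it is the set of k nearest samples under the combined spectral-spatial distance; the function name `spectral_spatial_adjacency`, the value k = 3, and the toy data are hypothetical. The defaults \mu = 30 and \sigma = 6 follow the experimental settings reported in Section III.

```python
import numpy as np

def spectral_spatial_adjacency(X, coords, k=3, mu=30.0, sigma=6.0):
    """Gaussian-kernel adjacency in the spirit of (9), restricted to k nearest neighbors."""
    spec = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)            # ||x_i - x_j||^2
    spat = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # ||d_i - d_j||^2
    dist = spec + mu * spat
    n = X.shape[0]
    # Assumption: Nei(x_i) is the k samples closest to x_i (self excluded).
    masked = dist + np.diag(np.full(n, np.inf))
    order = np.argsort(masked, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), order.ravel()] = True
    mask |= mask.T                       # "x_i in Nei(x_j) or x_j in Nei(x_i)"
    return np.where(mask, np.exp(-dist / sigma), 0.0)

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(8, 4))          # 8 samples with 4 PCA features (toy data)
coords_toy = rng.random((8, 2))          # toy spatial coordinates in [0, 1)
A_tilde = spectral_spatial_adjacency(X_toy, coords_toy)
```

The symmetrization step makes the graph undirected, so a pair is connected as soon as either sample lists the other among its neighbors.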

Only a three-layer GCN is used. The propagation rule of the first two layers is shown in (8), and the propagation rule of the last layer is as follows: \begin{equation*} \boldsymbol{S}^{(3)}=\operatorname{softmax}\left(\tilde{\boldsymbol{A}} \boldsymbol{S}^{(2)} \boldsymbol{W}^{(2)}\right) \tag{10} \end{equation*}

where \operatorname{softmax}(z_{i})=\exp (z_{i}) / \sum _{j} \exp (z_{j}) is selected as the activation function of the output layer. The loss function is \begin{equation*} L=-\sum _{k \in \boldsymbol{{Y}}_{L}} \sum _{c=1}^{C} {\boldsymbol{{Y}}}_{k c} \ln {\boldsymbol{{S}}}_{k c}^{(3)} \tag{11} \end{equation*}
where {{{\boldsymbol{Y}}}}_{L} denotes the set of vertex indices corresponding to the labeled samples, {C} is the total number of categories, and {\boldsymbol{{Y}}} is the category matrix. Similar to [45], we employed the gradient descent method to learn the weight parameters.
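
The forward pass and the semisupervised loss can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the training code used in the experiments: the weights are random instead of learned, the small constant added inside the logarithm is a numerical safeguard introduced here, and the names `gcn_forward` and `masked_cross_entropy` are hypothetical. The key point it shows is that all nodes take part in the propagation, while only labeled nodes contribute to the loss in (11).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_tilde, X, W1, W2):
    """Hidden layer as in (8), output layer as in (10)."""
    hidden = relu(A_tilde @ X @ W1)
    return softmax(A_tilde @ hidden @ W2)

def masked_cross_entropy(S, Y_onehot, labeled_idx):
    """Loss (11): cross-entropy summed over labeled nodes only."""
    return -np.sum(Y_onehot[labeled_idx] * np.log(S[labeled_idx] + 1e-12))

rng = np.random.default_rng(0)
n, d, hidden_dim, C = 6, 4, 5, 3
B = rng.random((n, n))
A_toy = (B + B.T) / 2                        # toy symmetric adjacency
X_toy = rng.normal(size=(n, d))
W1 = 0.1 * rng.normal(size=(d, hidden_dim))  # untrained weights, for shape checking
W2 = 0.1 * rng.normal(size=(hidden_dim, C))
Y_onehot = np.eye(C)[[0, 1, 2, 0, 1, 2]]
labeled_idx = np.array([0, 4])               # only two nodes carry labels

S = gcn_forward(A_toy, X_toy, W1, W2)
loss = masked_cross_entropy(S, Y_onehot, labeled_idx)
```

Changing the label rows of unlabeled nodes leaves the loss unchanged, which is how the unlabeled samples influence training only through the graph structure.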

C. Sample Expansion Based on CAM

When the number of input labeled samples is insufficient, the BLS is prone to the problems of insufficient network training and overfitting. Therefore, we propose the CAM to expand the samples after graph convolution operation. First, limited labeled samples \boldsymbol{X} \in \mathrm{R}^{n_{t} \times d_{0}} are fed into the trained GCN to obtain \boldsymbol{Z} with discriminative spectral-spatial features \begin{equation*} \boldsymbol{Z}=\tilde{\boldsymbol{A}} \operatorname{Relu}\left(\tilde{\boldsymbol{A}} \boldsymbol{X} \boldsymbol{W}^{(1)}\right) \boldsymbol{W}^{(2)}=\left[\begin{array}{l}\boldsymbol{Z}_{1} \\ \;\vdots \\ \boldsymbol{Z}_{{l}} \\ \;\vdots \\ \boldsymbol{Z}_{C} \end{array}\right] \in \mathrm{R}^{\left({C} \times {n}_{l}\right) \times d_{1}} \tag{12} \end{equation*}

where n_{l} is the number of labeled samples selected for each class, and l \in [1, C].

Second, the center value of the selected samples belonging to the lth class is defined as \boldsymbol{z}_{l}^0, which can be calculated by \begin{equation*} \boldsymbol{z}_{l}^{0}=\frac{\boldsymbol{z}_{l}^{1}+\boldsymbol{z}_{l}^{2}+\cdots +\boldsymbol{z}_{l}^{n_{l}}}{n_{l}}. \tag{13} \end{equation*}


Third, the n_{x} samples nearest to the center value are averaged in pairs to obtain C_{n_{x}}^{2}=n_{x}(n_{x}-1)/2 samples. The expanded samples belonging to the lth class are defined as {\boldsymbol{Z}_{l}^a} \in \mathrm{R}^{C_{n_{x}}^2 \times d_{1}} \begin{equation*} \boldsymbol{Z}_{l}^{a}=\left[\begin{array}{c}\boldsymbol{z}_{l}^{{a_{1}}} \\ \vdots \\ \boldsymbol{z}_{l}^{a_{{C}_{n_x}^2}} \end{array}\right]. \tag{14} \end{equation*}

Finally, \boldsymbol{Z} is expanded and used as the training set \boldsymbol{Z}^{\mathrm{K}} for GCBN, where \boldsymbol{Z}_{l}^{\mathrm{K}}=[\boldsymbol{Z}_{l}; \boldsymbol{Z}_{l}^{a}] \in \mathrm{R}^{(n_{l}+C_{n_{x}}^{2}) \times d_{1}} contains all samples of the lth class of \boldsymbol{Z}^{\mathrm{K}} \begin{equation*} \boldsymbol{Z}^{\mathrm{K}}=\left[\begin{array}{l}\boldsymbol{Z}_{1}^{\mathrm{K}} \\ \;\vdots \\ \boldsymbol{Z}_{l}^{\mathrm{K}} \\ \;\vdots \\ \boldsymbol{Z}_{C}^{\mathrm{K}} \end{array}\right] \in \mathrm{R}^{C\left(n_{l}+C_{n_{x}}^{2}\right) \times d_{1}}. \tag{15} \end{equation*}

The CAM can be used to extend the sample size of the data with discriminative spectral-spatial features extracted by GCN, which will provide more valuable samples for GCBN training. CAM is only used for model training, and n_{x} can be set according to the specific situation.
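
The CAM steps in (13)–(15) can be sketched for a single class as follows. The distance used to rank samples by closeness to the class center is not stated in the text, so Euclidean distance is assumed here; the function name `cam_expand` and the toy features are hypothetical, and the sketch assumes n_{x} \geq 2.

```python
import numpy as np
from itertools import combinations

def cam_expand(Z_l, n_x):
    """Combinatorial average method for one class, following (13)-(15)."""
    center = Z_l.mean(axis=0)                                    # class center, (13)
    # Assumption: "nearest" means smallest Euclidean distance to the center.
    nearest = np.argsort(np.linalg.norm(Z_l - center, axis=1))[:n_x]
    pairs = combinations(nearest, 2)                             # C(n_x, 2) pairs
    Z_a = np.array([(Z_l[i] + Z_l[j]) / 2 for i, j in pairs])    # pairwise means, (14)
    return np.vstack([Z_l, Z_a])                                 # [Z_l; Z_l^a], (15)

rng = np.random.default_rng(0)
Z_l = rng.normal(size=(5, 3))       # 5 labeled samples of one class, 3 features (toy)
Z_K_l = cam_expand(Z_l, n_x=3)      # 5 original + 3 averaged samples
```

With five labeled samples per class and n_{x} = 3, each class grows from 5 to 8 training samples; all expanded samples inherit the label of their class.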

D. Spectral-Spatial Feature Broad Expansion Based on BLS

BLS is a new type of flat network designed based on the idea of RVFLNN [37]. Although the lack of linear sparse representation ability of BLS could lead to an underfitting problem, it still has such advantages as simple structure, fast calculation speed, and feature broad expansion. Therefore, BLS can be used to expand the width of the nonlinear features extracted by the GCN to further enhance the feature representation ability.

The original input is mapped to feature nodes via random weights, d^{\mathrm{M}} is denoted as the number of feature node groups, and G^{\mathrm{M}} is denoted as the feature dimension of each group. The ith group MFs is \begin{equation*} \boldsymbol{M}_{i}=\boldsymbol{Z}^{\mathrm{K}} \boldsymbol{W}_{e i}+\boldsymbol{\beta }_{e i}, i=1, \ldots, d^{\mathrm{M}} \tag{16} \end{equation*}

where \boldsymbol{W}_{e i} and \boldsymbol{\beta }_{e i} are connection weights and bias from \boldsymbol{Z}^{\mathrm{K}} to MF, respectively. A sparse autoencoder is used to fine-tune the initial \boldsymbol{W}_{e i}, and \boldsymbol{M}=[\boldsymbol{M}_{1}, \boldsymbol{M}_{2}, \ldots, \boldsymbol{M}_{d^{\mathrm{M}}}]. To further enhance the feature representation capability, \boldsymbol{M} is randomly mapped to EN to achieve feature broad expansion by \begin{equation*} \boldsymbol{H}_{j}=\varphi \left(\boldsymbol{M} \boldsymbol{W}_{h j}+\boldsymbol{\beta }_{h j}\right), j=1, \ldots, G^{\mathrm{E}} \tag{17} \end{equation*}
where \boldsymbol{W}_{h j} and \boldsymbol{\beta }_{h j} are connection weights and bias from MF to EN, respectively, and \varphi (\cdot) is tansig function here.

Finally, MF and EN are simultaneously mapped to the output layer, and the output of the GCBN is \begin{equation*} \boldsymbol{O}=[\boldsymbol{M} \mid \boldsymbol{H}] \boldsymbol{W}^{\mathrm{O}}. \tag{18} \end{equation*}


The objective function of the GCBN is \begin{equation*} \underset{\boldsymbol{W}^{\mathrm{O}}}{\operatorname{argmin}}\left\Vert \boldsymbol{O}-\boldsymbol{Y}^{\mathrm{K}}\right\Vert _{2}^{2}+\delta \left\Vert \boldsymbol{W}^{\mathrm{O}}\right\Vert _{2}^{2} \tag{19} \end{equation*}

where \delta is the regularization parameter. Then the network weights of GCBN can be calculated with ridge regression as \begin{equation*} \boldsymbol{W}^{\mathrm{O}}=\left(\delta \boldsymbol{I}+[\boldsymbol{M} \mid \boldsymbol{H}]^{\mathrm{T}}[\boldsymbol{M} \mid \boldsymbol{H}]\right)^{-1}[\boldsymbol{M} \mid \boldsymbol{H}]^{\mathrm{T}} \boldsymbol{Y}^{\mathrm{K}} \tag{20} \end{equation*}
where \boldsymbol{Y}^{\mathrm{K}} is the label matrix corresponding to \boldsymbol{Z}^{\mathrm{K}}. The predicted result can be calculated by \begin{equation*} \boldsymbol{Y}=[\boldsymbol{M} \mid \boldsymbol{H}] \boldsymbol{W}^{\mathrm{O}}. \tag{21} \end{equation*}
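
The broad expansion and the closed-form output weights of (16)–(21) can be sketched as below. Several simplifications are assumptions of this example: the sparse-autoencoder fine-tuning of the mapped-feature weights is omitted, the enhancement nodes are generated as a single group, `np.tanh` stands in for the tansig function (the two are mathematically equivalent), and all sizes and names (`bls_fit`, `bls_predict`) are hypothetical.

```python
import numpy as np

def bls_fit(Z, Y, n_groups=3, group_dim=4, n_enh=10, delta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d = Z.shape[1]
    # Random mapped-feature weights and biases, one pair per group, as in (16).
    We = [rng.normal(size=(d, group_dim)) for _ in range(n_groups)]
    be = [rng.normal(size=group_dim) for _ in range(n_groups)]
    M = np.hstack([Z @ w + b for w, b in zip(We, be)])
    Wh = rng.normal(size=(M.shape[1], n_enh))
    bh = rng.normal(size=n_enh)
    H = np.tanh(M @ Wh + bh)                      # enhancement nodes, as in (17)
    F = np.hstack([M, H])                         # [M | H]
    # Ridge regression solution (20): (delta I + F^T F)^{-1} F^T Y
    W_out = np.linalg.solve(delta * np.eye(F.shape[1]) + F.T @ F, F.T @ Y)
    return (We, be, Wh, bh, W_out)

def bls_predict(Z, params):
    We, be, Wh, bh, W_out = params
    M = np.hstack([Z @ w + b for w, b in zip(We, be)])
    H = np.tanh(M @ Wh + bh)
    return np.hstack([M, H]) @ W_out              # (21)

rng = np.random.default_rng(1)
Z_train = rng.normal(size=(20, 5))                # toy stand-in for Z^K
labels = rng.integers(0, 3, size=20)
Y_train = np.eye(3)[labels]                       # one-hot label matrix Y^K
params = bls_fit(Z_train, Y_train)
pred = bls_predict(Z_train, params)               # class scores; argmax gives labels
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable way to evaluate (20).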

In summary, the steps of semisupervised HSI classification based on GCBN are as follows.

Algorithm 1: GCBN

Inputs: PCA-based HSI representation \boldsymbol{X}, labels of samples \boldsymbol{Y}_{\mathrm{L}}, unlabeled samples \boldsymbol{X}_{\mathrm{U}}, spatial coefficient \mu, spectral-spatial coefficient \sigma, number of samples per class selected for expansion n_{x}, regularization coefficient \delta, feature dimensions of each group G^{\mathrm{M}}, number of nodes in MF per group d^{\mathrm{M}}, and number of nodes in EN d^{\mathrm{E}}.

Step 1. Initialize the GCBN network parameters.

Step 2. Calculate the spectral-spatial adjacency matrix \tilde{\boldsymbol{A}} according to (9).

Step 3. Pretrain GCN with labeled samples \boldsymbol{X}_{\mathrm{L}} and unlabeled samples \boldsymbol{X}_{\mathrm{U}}.

Step 4. Extract features \boldsymbol{Z} according to (12). Calculate expanded samples \boldsymbol{Z}^{\mathrm{K}} according to (12)–(15) and take \boldsymbol{Z}^{\mathrm{K}} as the training set for GCBN.

Step 5. Calculate the network weights \boldsymbol{W}^{\mathrm{O}} of GCBN according to (16)–(20).

Step 6. Calculate the predictive labels \boldsymbol{Y} according to (21).

Outputs: Predictive labels \boldsymbol{Y}.

SECTION III.

Experiments

A. HSI Datasets

Three real HSI datasets were selected in our experiments.

The Indian Pines dataset was acquired by the AVIRIS sensor over the Indian Pines test site in North-western Indiana, containing 145×145 pixels and 224 bands. This image is mainly used for agriculture-related research, with two-thirds agricultural land and one-third forest and other natural perennial vegetation, including 16 classes.

Botswana dataset was acquired by Hyperion sensor over the Okavango Delta, Botswana, containing 1476×256 pixels and 242 bands and including 14 classes. After removing noise, atmospheric and water absorption, and overlapping bands, the remaining 145 bands are used for the experiment.

The Kennedy Space Center (KSC) dataset was acquired by the AVIRIS sensor over Florida, containing 614×512 pixels and 224 bands and including 13 classes. After removing the water absorption and noise bands, 176 bands of the image are reserved for the experiment.

B. Experimental Result

To verify the validity and superiority of the proposed GCBN, the following 11 classifiers are selected for comparison:

  1. traditional classification method: SVM [9];

  2. deep learning methods: 2D-CNN [24], GCN [45], SSGCN [43], MDGCN [42];

  3. broad learning methods: BLS [37], SBLS [40];

  4. GCBN without CAM: GB;

  5. replacing CAM of GCBN with the data augmentation methods in [26] and [32] respectively: GZB, GMB; and

  6. replacing GCN of GCBN with Graphsage [46]: GSCB.

The experimental settings are as follows.

  1. Since Wan et al. [42] also selected the Indian Pines and KSC datasets to test the performance of MDGCN, here we directly refer to [42] to select the hyperparameters of MDGCN. The hyperparameters of the remaining 6 comparison classifiers were set via grid search method;

  2. A three-layer GCN with 40 hidden nodes is used in GCBN. The number of epochs is 200 and the learning rate is 0.01, \mu =30, \sigma =6, \delta = 0.01, and n_x = n_c - 2, where {n_c} is the number of labeled samples. The feature dimensions of each group G^{\mathrm{M}}=30, number of nodes in MF per group d^{\mathrm{M}}=15, and number of nodes in EN d^{\mathrm{E}}=600 are set via grid search method;

  3. All eight classifying methods are implemented in PyTorch and MATLAB R2017a using a computer with a 3.60 GHz Intel Core i5-6500 CPU and 8 GB of RAM;

  4. We select 4 evaluating indexes to evaluate the experimental results, including per-class accuracy (%), overall accuracy (OA, %), Kappa coefficient, and consumed time (Time, s), where the consumed time here means the training and testing time of the classifier. To eliminate the influence of random factors, each experiment is conducted ten times to get the average value of all indexes;

  5. We randomly select five samples from each class of the ground objects in the HSI dataset as labeled samples for experiments.

  6. In the Indian Pines dataset, the surface objects represented by I1–I16 are: Alfalfa, Corn-notill, Corn-mintill, Corn, Grass-pasture, Grass-trees, Grass-pasture-mowed, Hay-windrowed, Oats, Soybean-notill, Soybean-mintill, Soybean-clean, Wheat, Woods, Buildings-Grass-Tree-Drives, and Stone-Steel-Towers. In the Botswana dataset, B1–B14 represent: Water, Hippo grass, Floodplain grasses1, Floodplain grasses2, Reeds1, Riparian, Firescar2, Island interior, Acacia woodlands, Acacia shrublands, Acacia grasslands, Short mopane, Mixed mopane, and Exposed soils. In the KSC dataset, the surface objects represented by K1–K13 are: Scrub, Willow swamp, CP hammock, Slash pine, Oak, Hardwood, Swamp, Graminoid, Spartina marsh, Cattail marsh, Salt marsh, Mud flats, and Water.
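
The accuracy indexes named in setting 4 can be computed from a confusion matrix, as in the minimal sketch below. The function name `overall_accuracy_and_kappa` and the toy label vectors are hypothetical, and the per-class accuracy is taken here as the fraction of each true class that is predicted correctly, which assumes every class appears at least once in the ground truth.

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)            # confusion matrix, rows = ground truth
    total = cm.sum()
    oa = np.trace(cm) / total                     # overall accuracy
    # Agreement expected by chance, from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1 - expected)      # Cohen's kappa coefficient
    per_class = np.diag(cm) / cm.sum(axis=1)      # per-class accuracy
    return oa, kappa, per_class

y_true = np.array([0, 0, 1, 1, 2, 2])             # toy ground-truth labels
y_pred = np.array([0, 0, 1, 2, 2, 2])             # toy predictions
oa, kappa, per_class = overall_accuracy_and_kappa(y_true, y_pred, n_classes=3)
```

For this toy example one of six samples is misclassified, giving an OA of 5/6 and a Kappa coefficient of 0.75.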

Tables I–III and Figs. 2–4 show the performance comparison results of different classifiers.

TABLE I Comparison of Classification Performance on Indian Pines Dataset
TABLE II Comparison of Classification Performance on Botswana Dataset
TABLE III Comparison of Classification Performance on KSC Dataset
Fig. 2. - Classification maps on Indian Pines dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.
Fig. 2.

Classification maps on Indian Pines dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.

Fig. 3. - Classification maps on Botswana dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.

Fig. 4. - Classification maps on the KSC dataset. (a) False-color image. (b) Ground-truth map. (c) SVM. (d) 2D-CNN. (e) GCN. (f) BLS. (g) SBLS. (h) SSGCN. (i) MDGCN. (j) GB. (k) GZB. (l) GMB. (m) GSCB. (n) GCBN.

It can be observed from Tables I–III and Figs. 2–4 that:

  1. Among the 12 methods, 2-D CNN obtains the lowest OAs and Kappa coefficients on all three HSI datasets and consumes the longest time. The reason is that a deep learning network needs abundant labeled samples to reach its impressive performance. When the number of labeled samples is insufficient, 2-D CNN cannot be adequately trained, so its classification accuracy on HSI is low, even lower than that of the conventional SVM. In addition, 2-D CNN has a large number of network layers and is trained by gradient descent, which requires repeated iterations, so it consumes a long time.

  2. BLS consumes the shortest time among the 12 methods while retaining high classification accuracy. The reason is that its structure is simple, and the nonlinear mapping from MF to EN in BLS achieves the broad expansion of MF and enhances the classification ability of BLS. Compared with BLS, SBLS achieves higher OAs and Kappa coefficients on all three datasets because SBLS additionally utilizes a large amount of unlabeled sample information.

  3. GCN, GCBN, and MDGCN are all GCN-based methods, among which GCBN has the highest classification accuracy, followed by MDGCN. The reason is that both GCBN and MDGCN use the spectral and spatial information of HSI, while GCN only considers spectral information. In addition, compared with GCN, MDGCN takes the spectral and spatial information of different scales into account and dynamically updates the constructed graph during training.

  4. GCBN achieves the highest OAs and Kappa coefficients on all three datasets with low time consumption. The reasons are as follows. First, GCBN is a semisupervised classification method that uses limited labeled samples and abundant unlabeled samples. Second, GCN helps extract more discriminative spectral-spatial features from the original HSI. Third, combinatorial average expansion of spectral-spatial features provides a large number of valuable samples for GCBN training. Fourth, the broad expansion of spectral-spatial features further enhances the feature representation ability of GCBN. Moreover, GCBN uses only one graph convolutional layer and the structure of BLS is simple, so the learning speed of GCBN is fast.

  5. Among the three HSI datasets, the OAs and Kappa coefficients of the 12 methods are the lowest on Indian Pines. This is because the classes in the Indian Pines dataset are highly similar to one another. For instance, corn-notill, corn-mintill, and corn are essentially the same crop, so it is difficult to distinguish them. All the classification models consume the least time on the Botswana dataset. This is because the Botswana dataset has the smallest sample size with only 3268 samples, while the Indian Pines dataset contains 10 249 samples.

  6. Among the three data augmentation methods (GZB, GMB, and GCBN), GCBN achieves the highest OAs and Kappa coefficients. This is because CAM can increase the number of training samples without losing key information.

  7. Compared with GSCB, GCBN obtains higher OAs and Kappa coefficients. The reason is that GCN integrates the global contextual information of the graph by constructing a spectral-spatial matrix of the entire graph.
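Observations 4 and 7 attribute part of the speed and accuracy of GCBN to a single graph convolutional layer that aggregates information over the whole graph. A minimal sketch of one such layer is given below. It assumes the widely used symmetric normalization with self-loops and a ReLU activation; the spectral-spatial graph construction of GCBN is not reproduced here, and the function name `graph_conv_layer`, the toy adjacency matrix, and the random weights are illustrative only.

```python
import numpy as np


def graph_conv_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale row i and column j by deg^-1/2.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


# Toy graph with three nodes in a chain (assumed data, not an HSI graph).
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
feats = rng.normal(size=(3, 4))     # 3 nodes, 4 input features
weight = rng.normal(size=(4, 2))    # projects to 2 hidden features
hidden = graph_conv_layer(adj, feats, weight)
```

Because the normalized adjacency multiplies the full feature matrix, every output row mixes the features of a node with those of its neighbors in one matrix product, which is why a single layer is cheap compared with iteratively trained deep networks.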

Then the influence of different labeled sample sizes on the classification accuracy of HSI is studied. It can be seen from Fig. 5 that: 1) with the increase of the number of labeled samples, the OAs of all classification models show an increasing trend; 2) when the number of labeled samples is small (5 or 10 per class), the classification accuracy of 2-D CNN on the three datasets is the lowest among all models. As the number of labeled samples gradually increases, the 2-D CNN classification accuracy increases the most.

Fig. 5. - OAs of various methods under different numbers of labeled samples per class. (a) Indian Pines. (b) Botswana. (c) KSC.

SECTION IV.

Conclusion

A semisupervised HSI classification method, named GCBN, is proposed in this article. First, the nonlinear spectral-spatial features extracted by GCN are used to replace the linear mapping features in traditional BLS, which helps to avoid the underfitting problem caused by the insufficient linear sparse feature representation ability of BLS. Then we propose CAM to select valuable paired samples and generate a sample expansion set for GCBN, which alleviates the problem of poor classification ability caused by the limited labeled samples. Furthermore, we use BLS to perform broad expansion on the spectral-spatial features extracted by GCN and extended by CAM, which enhances the representation ability of the features and improves the classification ability of GCBN. Finally, the output weights can be easily calculated with the ridge regression theory. Experimental results on three real HSI datasets demonstrate that the proposed GCBN obtains higher classification accuracy than several other methods.
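The broad expansion and ridge regression steps summarized above can be illustrated with a short sketch. It assumes the usual BLS form, in which the features Z are concatenated with randomly weighted tanh enhancement nodes to give A = [Z | H], and the closed-form ridge solution W = (A^T A + lambda I)^-1 A^T Y. The function names, the number of enhancement nodes, the regularization value, and the toy data are assumptions made for illustration, not settings from the experiments.

```python
import numpy as np


def broad_expand(z, num_enhance=20, seed=0):
    """Concatenate features Z with random tanh enhancement nodes: A = [Z | H]."""
    rng = np.random.default_rng(seed)
    w_h = rng.normal(size=(z.shape[1], num_enhance))  # random, not trained
    b_h = rng.normal(size=num_enhance)
    return np.hstack([z, np.tanh(z @ w_h + b_h)])


def ridge_output_weights(a, y, lam=1e-3):
    """Closed-form ridge solution W = (A^T A + lam I)^-1 A^T Y."""
    gram = a.T @ a + lam * np.eye(a.shape[1])
    return np.linalg.solve(gram, a.T @ y)


# Toy data standing in for extracted spectral-spatial features (assumed).
rng = np.random.default_rng(1)
z = rng.normal(size=(30, 5))            # 30 samples, 5 features
labels = rng.integers(0, 3, size=30)    # 3 classes
y = np.eye(3)[labels]                   # one-hot targets
a = broad_expand(z)
w = ridge_output_weights(a, y)
pred = (a @ w).argmax(axis=1)           # predicted class per sample
```

Since the enhancement weights are random and fixed, the only learned quantity is W, obtained from one linear solve. The same seed must be reused when expanding test features so that training and test samples pass through identical enhancement nodes.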
