Graph Convolutional Enhanced Discriminative Broad Learning System for Hyperspectral Image Classification

Recently, the broad learning system (BLS) has demonstrated excellent performance in hyperspectral image (HSI) classification. However, due to the complex geometric structure and spatial layout of HSIs, the linear sparse features in a broad learning system have difficulty fully representing hyperspectral data. In addition, the features learned by the broad learning system lack effective discriminative ability, which limits their expressive power. To address these issues, we propose a graph convolutional enhanced discriminative broad learning system (GCDBLS) for HSI classification. GCDBLS aggregates the node information in the adjacency graph through graph convolution and then learns contextual relationships, thereby obtaining rich nonlinear spatial-spectral features of hyperspectral images. To extract more discriminative HSI features, GCDBLS introduces the concepts of local intra-class scatter and local inter-class scatter. By minimizing the local intra-class feature distance and maximizing the local inter-class feature distance, GCDBLS improves the discriminative ability of the features extracted by BLS. Experiments on three HSI datasets, compared with state-of-the-art classification methods, show that the proposed method achieves good results and improves the classification performance on hyperspectral images.


I. INTRODUCTION
Hyperspectral image classification is a research foundation of hyperspectral remote sensing image processing. Its main purpose is to assign each pixel in a hyperspectral remote sensing image to an object category according to its spectral and spatial information [1]. Hyperspectral image classification technology is widely used in environmental monitoring, mineral exploration, military target recognition, and other fields. However, HSI classification faces great challenges due to its high dimensionality, high correlation between bands, spectral mixing, and so on. Therefore, hyperspectral image classification has attracted increasing attention from researchers [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Chao Tong .
In the past decades, traditional machine learning methods, such as support vector machines [4], random forests, and k-nearest neighbors [5], have achieved great success in HSI classification. However, traditional machine learning methods largely depend on human expertise, their feature extraction is insufficient, and their classification results leave room for improvement.
Inspired by the successful application of deep learning (DL) in image processing, DL has also been applied to HSI classification. The main advantage of deep learning is that it can automatically learn effective feature representations of the problem domain, thereby avoiding complex manual feature engineering. Hu et al. [6] used a one-dimensional convolutional neural network (1D-CNN) with five layers to classify hyperspectral images by extracting spectral features. Zhang et al. [7] utilized a generative adversarial network to extract the spectral features of HSI; when training on small sample sets, this method achieves better classification results. Zhu et al. [8] introduced the deep convolutional generative adversarial network [9] (DCGAN) into hyperspectral image classification and proposed a one-dimensional generative adversarial network (1D-GAN) based on spectral features. Recently, the graph convolutional network (GCN) [10], by modeling relations between graph vertices, has been shown to effectively extract, aggregate, and transform the neighborhood information of each graph node. GCN has been successfully used for HSI classification [11], [12], [13], [14], [15]. Qin et al. [16] used GCN to extract HSI spectral-spatial information (S2GCN) for HSI classification, which greatly improved the classification accuracy. However, the spectral-spatial GCN (S2GCN) method treats each pixel as a graph node, which incurs a large amount of computation. Hong et al. [17] proposed a new method combining GCN and CNN, which provided a new idea for HSI classification. Wan et al. [18] applied superpixel segmentation to HSI and adopted a multiscale GCN to extract multi-scale graph features. Ding et al. [19] combined CNN with GCN and proposed a feature fusion hypergraph neural network (F2HNN) for HSI classification.
Although deep learning has achieved good results in HSI classification, deep neural networks often have many hyperparameters, which require repeated training to tune, at a large computational cost. In addition, deep learning demands substantial hardware, and when new training samples arrive, the entire network must be retrained, which wastes time and resources. To alleviate these problems, Chen et al. [20] proposed the broad learning system (BLS). Compared with deep learning networks, which have multiple hidden layers and many hyperparameters, the broad learning system forms the network in a simplified way with only one hidden layer. First, BLS maps the input data to form feature nodes [21]. Then the feature nodes are mapped into enhancement nodes by a nonlinear function. Finally, all feature nodes and enhancement nodes form a hidden layer, which is connected to the output layer for classification. When the BLS needs to expand in width, corresponding incremental learning methods are available. Incremental learning does not retrain on all the data whenever new data are added; the previously trained part is kept, and only the newly added data need to be trained. BLS can thus update the network quickly, which is one of its advantages. The broad learning system also has a wide range of applications in HSI classification. Kong et al. [22] proposed a semi-supervised broad learning system (SBLS) for HSI classification, which combines hierarchical guided filtering and a class probability structure. Zhao et al. [23] adopted the local binary pattern (LBP) to extract the joint spectral-spatial features of hyperspectral images and then used BLS to classify them. Chu et al. [24] utilized multiple filters to obtain the spatial and spectral features of hyperspectral images and then introduced discriminant information and the manifold structure of samples into BLS to improve its classification performance.
Although BLS has greatly improved the accuracy of HSI classification, the existing BLS still has shortcomings in two main aspects. First, the simple network structure of the broad learning system limits its ability to perform complex nonlinear representation, so it cannot fully extract the deep-level features of hyperspectral images. Second, BLS does not take into account the relationship between the local inter-class spacing and local intra-class spacing of data samples, so it is difficult to ensure effective separation between sample classes and sufficient aggregation within classes. Thus, the learned features lack effective discriminative ability.
To address the above problems, this paper proposes a hyperspectral image classification framework based on a graph convolution enhanced discriminative broad learning system (GCDBLS). GCDBLS aggregates the node information in the adjacency graph through graph convolution and learns contextual relationships, thereby obtaining rich nonlinear spatial-spectral features of the hyperspectral image. This makes up for the fact that the linear sparse features in the broad learning network have difficulty fully representing hyperspectral data. GCDBLS introduces the concepts of local intra-class scatter and local inter-class scatter, which respectively reflect the local manifold structure and the discriminative information in the input space. GCDBLS optimizes the projection direction of the BLS output weights by minimizing the local intra-class scatter and maximizing the local inter-class scatter, so as to enhance the discriminative ability of the features extracted by BLS. The main contributions of this paper are as follows: (1) We fuse broad learning and graph convolution to propose a graph convolution enhanced discriminative broad learning system.
(2) GCDBLS uses a graph convolutional network to effectively extract, aggregate, and transform the neighborhood information of each graph node by modeling relations between graph vertices. Through graph convolution operations, GCDBLS fuses graph structure information with node features and effectively extracts the nonlinear spatial features of hyperspectral images.
(3) GCDBLS introduces the concept of local intra-class scatter and local inter-class scatter in the input space into BLS, which can not only maintain the inherent local geometric structure of the data, but also maintain the local difference information of the data, thus enhancing the ability of BLS to distinguish features.

II. RELATED WORKS
The main work of this paper is based on GCN and BLS. This section briefly introduces the graph convolutional network and the broad learning system.

A. GRAPH CONVOLUTIONAL NETWORK
Kipf et al. proposed a scalable graph convolutional (GCN) neural network [10]. GCN combines the characteristics of graph neural networks (GNN) and convolutional neural networks (CNN) and extends the convolutional neural network from Euclidean space to the non-Euclidean spaces suitable for expressing relational networks. The input of a GCN is a graph. After one convolution layer, the neighbors of each node are convolved once and the node is updated with the convolution results; the convolution-activation process is then repeated until the desired depth is reached. Finally, through the output function, the node states are transformed into the corresponding label outputs. The structure of GCN is shown in Figure 1.
The essential purpose of GCN is to use graph convolution to extract the spatial features of graph data with non-Euclidean structure, following the theory proposed in [10]. For a graph G = (V, E, A) with input signal X and output signal Y, the mapping f applied by the graph convolutional network is defined as Y = f(X, A), where V = {v_i}_{i=1}^N is the set of N nodes, E is the set of edges, and A ∈ R^(N×N) is the adjacency matrix of the graph; the element A_ij represents the connection between nodes v_i and v_j in G. The forward propagation rule of graph convolution is

H^(l+1) = τ(D^(−1/2) Ã D^(−1/2) H^l W^l)

where Ã = A + I, I is an identity matrix of size N × N; D is a diagonal matrix with D_ii = Σ_j Ã_ij; H^l ∈ R^(N×D) is the output of the l-th layer, with H^0 = X; τ(·) is the activation function; and W^l is the weight matrix of the l-th layer.
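As a concrete illustration, the propagation rule above can be sketched in a few lines of NumPy. This is a minimal sketch: the ring graph, feature matrix, and weight matrix are toy placeholders, not the configuration used in the paper.

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN propagation step: H' = act(D^-1/2 (A+I) D^-1/2 H W)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                    # add self-loops
    d = A_tilde.sum(axis=1)                    # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return activation(A_hat @ H @ W)

# toy graph: 4 nodes on a ring, 3-dimensional node features
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
H0 = rng.standard_normal((4, 3))
W0 = rng.standard_normal((3, 2))
H1 = gcn_layer(A, H0, W0)
print(H1.shape)  # (4, 2)
```

Stacking several such calls, each with its own weight matrix, reproduces the multi-layer propagation described in the text.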

B. BROAD LEARNING SYSTEM
The input data X of BLS is first mapped into mapped features (MF); the MF are then mapped into enhancement nodes (EN); finally, the MF and EN are connected to the output layer. The structure of BLS is shown in Figure 2.
For input data X ∈ R^(N×M) with label matrix Y ∈ R^(N×C), N is the number of data samples, M is the dimension of the data samples, and C is the number of sample categories.
The MF nodes Z_i are obtained from the input data through the nonlinear mapping of formula (3):

Z_i = φ(X W_ei + β_ei), i = 1, 2, . . ., n    (3)

where φ is the nonlinear activation function; W_ei and β_ei are random weights and biases, respectively; n is the number of groups of feature nodes.
The n groups of feature nodes are concatenated as Z^n ≡ [Z_1, . . ., Z_n], and then Z^n is nonlinearly mapped to the enhancement nodes H_j through formula (4):

H_j = σ(Z^n W_hj + β_hj), j = 1, 2, . . ., m    (4)

where σ is the nonlinear activation function; W_hj and β_hj are random weights and biases, respectively; m is the number of groups of enhancement nodes.
The m groups of enhancement nodes are concatenated as H^m ≡ [H_1, . . ., H_m]. The matrix A = [Z^n | H^m] is directly connected to the output layer, and the weight matrix between A and the output layer can be calculated by ridge regression. The output layer can be expressed as:

Y = AW    (5)

where W is the weight from the hidden layer to the output layer. Since W_ei, β_ei, W_hj, and β_hj are generated randomly and remain unchanged during training, the network only needs to learn the weight W. Therefore, the objective function of BLS is:

min_W ||Y − AW||_2^2 + λ||W||_2^2    (6)

where Y is the label matrix of the data samples; ||Y − AW||_2^2 controls the minimization of the training error; λ||W||_2^2 prevents overfitting of the model; λ is the regularization coefficient. Solving formula (6) by ridge regression gives:

W = (A^T A + λI)^(−1) A^T Y    (7)

where I is the identity matrix and A^T is the transpose of A.
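The BLS construction and ridge-regression readout above can be sketched as follows. The group sizes, tanh activation, and random toy data are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def bls_fit(X, Y, n_groups=3, k_feat=5, m_groups=2, k_enh=8, lam=1e-2):
    """Minimal BLS: random feature groups Z, nonlinear enhancement
    groups H, and the ridge readout W = (A^T A + lam I)^-1 A^T Y."""
    N, M = X.shape
    Zs = []
    for _ in range(n_groups):
        We = rng.standard_normal((M, k_feat))
        be = rng.standard_normal(k_feat)
        Zs.append(X @ We + be)                 # mapped feature group
    Z = np.hstack(Zs)
    Hs = []
    for _ in range(m_groups):
        Wh = rng.standard_normal((Z.shape[1], k_enh))
        bh = rng.standard_normal(k_enh)
        Hs.append(np.tanh(Z @ Wh + bh))        # enhancement group
    A = np.hstack([Z] + Hs)                    # hidden layer [Z^n | H^m]
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return A, W

X = rng.standard_normal((20, 6))
Y = np.eye(3)[rng.integers(0, 3, 20)]          # one-hot labels
A, W = bls_fit(X, Y)
pred = A @ W
print(pred.shape)  # (20, 3)
```

Only W is learned; all random weights and biases stay fixed, which is what makes the closed-form ridge solution possible.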

III. PROPOSED METHOD
In this section, we propose GCDBLS.The motivation of this paper is presented in sub-section A. Next, the GCDBLS framework is given in sub-section B. Finally, we optimize GCDBLS in sub-section C.

A. MOTIVATION
Due to the complex geometric structure and spatial layout of hyperspectral images, and because the mapping from the input to the MF in the original BLS is a linear process, directly applying the original BLS to HSI not only fails to fully consider the spatial information of HSI but also leads to underfitting, because linear features cannot sufficiently represent HSI. In addition, BLS does not take into account the relationship between the local inter-class spacing and local intra-class spacing of data samples, so it is difficult to ensure effective separation between sample classes and sufficient aggregation within classes.

B. FRAMEWORK OF GCDBLS
The proposed HSI classification framework is shown in Figure 4. First, to obtain more structural features of HSI, we extract the attribute profile features of HSI based on principal component analysis and the extended multi-attribute profile. Second, we construct an adjacency graph as the input of the GCN and then extract the deep spatial-spectral features of HSI. Third, we take the features of the GCN fully connected layer as the input of BLS, use them to construct the local intra-class scatter and inter-class scatter as regularization terms, and optimize these terms together with the loss function of BLS. Finally, the optimized GCDBLS predicts the hyperspectral image data to complete the classification task.

1) MULTI-ATTRIBUTE PROFILES FEATURE EXTRACTION
Attribute profiles of hyperspectral images can be obtained by applying a series of attribute filters to grayscale images. The extended multi-attribute profile (EMAP) [26] is an improvement and extension of the traditional morphological profile feature extraction method. Because of the high dimensionality of hyperspectral image data, principal component analysis (PCA) is generally performed on the original image first to reduce the computational complexity; attribute filtering is then applied to several principal components, yielding the extended attribute profile (EAP):

EAP = {AP_1, AP_2, . . ., AP_n}

where AP_i (i = 1, . . ., n) represents the attribute filtering of the i-th principal component and n is the number of retained principal components.
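The PCA pre-step can be sketched with plain NumPy as below. This covers only the dimensionality reduction; the attribute filtering itself (area, standard deviation, moment of inertia, shape) requires morphological attribute operators that are not shown, and the cube dimensions here are toy values.

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Reduce an (H, W, B) hyperspectral cube to its first principal
    components via SVD, as a pre-step before attribute filtering."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)                      # center each band
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = X @ Vt[:n_components].T            # project onto top components
    return pcs.reshape(H, W, n_components)

rng = np.random.default_rng(2)
cube = rng.random((8, 8, 20))                # toy 20-band image
pcs = pca_reduce(cube, 3)
print(pcs.shape)  # (8, 8, 3)
```

Each of the retained component images would then be passed through the attribute filters to build the EAP.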
Since the feature vector extraction of a single attribute profile covers only one aspect, it cannot effectively describe the feature information of hyperspectral images. Mura et al. [26] extracted multiple features using different attribute filters and then stitched these features together, naming this method EMAP. EMAP can make good use of the spatial and spectral information of hyperspectral images. For EMAP feature extraction, four attribute types are generally defined: area a, pixel standard deviation s, moment of inertia i, and shape d:

EMAP = {EAP_a, EAP_s, EAP_i, EAP_d}

2) GRAPH CONSTRUCTION
GCN constructs an adjacency graph G(V, E, A) based on the EMAP features, denoted X_EMAP. Each sample x_i in X_EMAP is treated as a node in the adjacency graph G(V, E, A), so the node set V ∈ R^N comprises all samples. In G(V, E, A), any two points x_i and x_j in X_EMAP may be connected by an edge, and the connections between all nodes are encoded in the adjacency matrix A ∈ R^(N×N) based on the Euclidean distance between data points. For the edge set E, when x_i and x_j are connected by an edge, the weight a_ij in the adjacency matrix A ∈ R^(N×N) is:

a_ij = exp(−||x_i − x_j||^2 / σ^2), if x_j ∈ N_k(x_i); a_ij = 0, otherwise    (10)

where σ is a width parameter and N_k(x_i) represents the k-nearest neighbors of x_i.
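The k-nearest-neighbor Gaussian adjacency described above can be sketched as follows. The symmetrization step and the Gaussian width value are illustrative choices, not details given in the paper.

```python
import numpy as np

def knn_gaussian_adjacency(X, k=3, sigma2=1.0):
    """kNN adjacency with Gaussian weights:
    a_ij = exp(-||x_i - x_j||^2 / sigma2) for the k nearest neighbors of i."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    A = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:k + 1]      # skip self (distance 0)
        A[i, nn] = np.exp(-d2[i, nn] / sigma2)
    return np.maximum(A, A.T)                # symmetrize the graph

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 4))
A = knn_gaussian_adjacency(X, k=3)
print(A.shape)  # (10, 10)
```

The resulting symmetric matrix A is what the GCN normalizes (after adding self-loops) in the graph convolution layers.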

3) GRAPH CONVOLUTIONAL LAYER
After the adjacency graph is constructed, it is input into the GCN to learn the context of the hyperspectral image, and the aggregation of neighborhood higher-order features is realized through multiple graph convolution layers. The GCN includes two graph convolution layers, a global average pooling layer, and a fully connected layer. In the graph convolution layers, to improve the stability of model training, the self-connected adjacency matrix Ã = A + I is adopted, where I is an identity matrix of size N × N. The graph convolution layer realizes the transfer of neighborhood relations by continuously aggregating adjacent nodes, and the propagation rule of the l-th layer is defined as:

H^(l+1) = ReLU(D^(−1/2) Ã D^(−1/2) H^l W^l)

For the meaning of the parameters, see Section II-A. We choose the ReLU(·) function, which converges quickly, as the activation function.
In GCN, a global pooling layer is introduced to aggregate all node features, and the context features are further compressed by a fully connected layer.Finally, the context features of hyperspectral images are extracted by GCN.

4) DISCRIMINATIVE BROAD LEARNING SYSTEM CLASSIFICATION LAYER
We note that the broad learning system focuses only on the separability of the samples, ignores the relative relationships between samples and the discriminative information they contain, and lacks effective discriminative ability. Recently proposed manifold learning methods [27], [28] can effectively reveal the local geometric structures contained within the data points and are widely used in pattern recognition. Their intuitive idea [29] is that nearby points in a high-dimensional space should have similar predicted values and are more likely to share classification labels. That is, the smoothness assumption: the classification or prediction function should be a smooth function of the original high-dimensional data. Manifold learning assumes that all the sample data can be embedded into a low-dimensional manifold and that, along the manifold, nearby points should have the same class label. Therefore, we introduce the concepts of local intra-class scatter and local inter-class scatter into BLS, which we call the discriminative broad learning system. The local intra-class scatter and local inter-class scatter reflect the local manifold structure and the local discriminative information in the input space, respectively. We optimize the projection direction of the BLS output weights by minimizing the local intra-class scatter and maximizing the local inter-class scatter, so as to improve the ability of BLS to discriminate sample features.

Local intra-class adjacency graph G_w:
In order to construct the local intra-class adjacency graph G_w, we first need to calculate its adjacency weight matrix. The elements of the adjacency matrix, defined based on the Gaussian function, are shown in equation (13):

W_w,ij = exp(−||x_i − x_j||^2 / t), if x_j ∈ N_k(x_i) and x_i and x_j belong to the same class; W_w,ij = 0, otherwise    (13)

where t is a constant and N_k(x_i) is the set of k nearest neighbors of x_i.

Definition 1 Local intra-class scatter matrix:
The local intra-class scatter matrix is defined as:

S_w = X^T L_w X    (14)

where D_w is the diagonal matrix with D_w,ii = Σ_j W_w,ij, and L_w = D_w − W_w is the Laplacian matrix of G_w.
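Definition 1 can be sketched as code: build the label-constrained kNN weight matrix W_w and its Laplacian L_w = D_w − W_w. The toy data, neighborhood size, and Gaussian width are placeholders.

```python
import numpy as np

def intra_class_laplacian(X, y, k=2, t=1.0):
    """Intra-class graph: W_w[i,j] = exp(-||x_i - x_j||^2 / t) when x_j is
    among the k nearest neighbors of x_i AND shares its label;
    returns the Laplacian L_w = D_w - W_w."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Ww = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:k + 1]      # k nearest neighbors, no self
        for j in nn:
            if y[j] == y[i]:                 # same-class constraint
                Ww[i, j] = np.exp(-d2[i, j] / t)
    Ww = np.maximum(Ww, Ww.T)                # symmetrize
    Dw = np.diag(Ww.sum(axis=1))             # degree matrix
    return Dw - Ww                           # Laplacian L_w

rng = np.random.default_rng(4)
X = rng.standard_normal((12, 3))
y = rng.integers(0, 2, 12)
Lw = intra_class_laplacian(X, y)
print(Lw.shape)  # (12, 12)
```

The inter-class Laplacian L_b is built the same way, with the same-class condition replaced by a different-class condition.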

Local inter-class adjacency graph G_b:
In order to construct the local inter-class adjacency graph G_b, we first need to calculate its adjacency weight matrix. The elements of the adjacency matrix, defined based on the Gaussian function, are shown in equation (15):

W_b,ij = exp(−||x_i − x_j||^2 / m), if x_j ∈ N_k(x_i) and x_i and x_j do not belong to the same class; W_b,ij = 0, otherwise    (15)

where m is a constant and N_k(x_i) is the set of k nearest neighbors of x_i.

Definition 2 Local inter-class scatter matrix:
The local inter-class scatter matrix is defined as:

S_b = X^T L_b X    (16)

where D_b is the diagonal matrix with D_b,ii = Σ_j W_b,ij, and L_b = D_b − W_b is the Laplacian matrix of G_b.

We introduce the local maximum information difference matrix as a regularization term into the objective function of BLS, which yields the discriminative broad learning system. By minimizing the objective function of the BLS, we can maximize the local inter-class scatter and minimize the local intra-class scatter, so that samples of the same class are clustered as tightly as possible and samples of different classes are kept as far apart as possible. This optimization not only minimizes the classification error but also makes the data features strongly distinguishable, improving the ability of BLS to discriminate features. The GCDBLS objective function based on the local maximum information difference matrix can be expressed as:

F_GCDBLS = ||Y − AW||_2^2 + λ_1 Tr(W^T A^T (L_w − L_b) A W) + λ_2 ||W||_2^2    (17)

where λ_1 Tr(W^T A^T (L_w − L_b) A W) is the manifold discriminative regularization term, Tr(·) represents the trace of a matrix, λ_2 ||W||_2^2 prevents model overfitting, and λ_1 and λ_2 are the regularization parameters. Taking the derivative of formula (17) with respect to W gives:

∂F_GCDBLS / ∂W = 2A^T (AW − Y) + 2λ_1 A^T (L_w − L_b) A W + 2λ_2 W    (18)

Setting ∂F_GCDBLS / ∂W = 0, formula (18) leads to:

(A^T A + λ_1 A^T (L_w − L_b) A + λ_2 I) W = A^T Y    (19)

from which we can calculate the output weight:

W = (A^T A + λ_1 A^T (L_w − L_b) A + λ_2 I)^(−1) A^T Y    (20)

where I is the identity matrix.
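The closed-form solution, as reconstructed above, can be computed directly. In this sketch the Laplacians are scaled-identity placeholders rather than graphs built from real data, and the hidden-layer matrix A is random; it only illustrates the linear-algebra step.

```python
import numpy as np

def gcdbls_weights(A, Y, Lw, Lb, lam1=1e-2, lam2=1e-2):
    """Closed-form readout of the discriminative objective:
    W = (A^T A + lam1 A^T (L_w - L_b) A + lam2 I)^-1 A^T Y."""
    k = A.shape[1]
    M = A.T @ A + lam1 * (A.T @ (Lw - Lb) @ A) + lam2 * np.eye(k)
    return np.linalg.solve(M, A.T @ Y)

rng = np.random.default_rng(5)
N, k, C = 15, 6, 3
A = rng.standard_normal((N, k))                # hidden-layer outputs
Y = np.eye(C)[rng.integers(0, C, N)]           # one-hot labels
Lw = np.eye(N) * 0.10                          # placeholder Laplacians
Lb = np.eye(N) * 0.05
W = gcdbls_weights(A, Y, Lw, Lb)
print(W.shape)  # (6, 3)
```

Setting lam1 = 0 recovers the plain BLS ridge solution, which makes the role of the discriminative term easy to isolate in experiments.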
In the GCDBLS training process, we first use EMAP to extract hyperspectral multi-attribute profile features. Then, we take them as the input of the GCN, train the GCN to convergence, and use the converged GCN to extract the deep spatial-spectral features of the hyperspectral images. Finally, the spatial-spectral features extracted by the GCN are used as the input of BLS, and the local intra-class and inter-class scatter matrices are constructed as regularization terms and optimized together with the objective function of BLS. This trains the BLS model and realizes the hyperspectral image classification architecture shown in Figure 3.
The overall process of the proposed method is as follows:

Algorithm 1 GCDBLS
Input: hyperspectral image, GCN hyperparameters, BLS hyperparameters;
(1): Utilize PCA to reduce the dimension of the hyperspectral image, and use EMAP to extract multi-attribute profile features;
(2): Initialize the hyperparameters contained in the GCN, and calculate the adjacency matrix A by formula (10);
(3): Use the multi-attribute profile features extracted by EMAP to train the GCN to convergence by formulas (11) and (10);
(4): Extract the spatial-spectral features of the hyperspectral image using the converged GCN, and vectorize the features through the fully connected layer;
(5): Use the spatial-spectral features X_GCN extracted by the GCN as the input of the BLS model, and construct the local intra-class and inter-class scatter matrices according to formulas (13), (14), (15), and (16);
(6): According to formula (17), introduce the local intra-class and inter-class scatter matrices into the BLS objective function as regularization terms;
(7): Optimize GCDBLS through formulas (18), (19), and (20), and calculate the output weight W of GCDBLS;
(8): Predict the test data by formula Y = AW;
Output: Predicted labels.

IV. EXPERIMENTS

A. DATASETS
In order to compare the performance of GCDBLS with other models, we conducted experiments on three datasets: Indian Pines, Botswana, and KSC. Dataset information is shown in Table 1.

B. EVALUATION METRICS
In order to quantitatively compare the effects of various methods, we selected overall accuracy (OA), average accuracy (AA) and kappa coefficient (kappa) as evaluation metrics.
(1) OA is the ratio of correctly classified samples to the total number of samples.
(2) AA is the average of the per-class classification accuracies.
(3) The kappa coefficient comprehensively considers both the correctly and the incorrectly classified samples.
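The three metrics can all be computed from a confusion matrix, as in the short sketch below; the tiny label vectors are illustrative.

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa
    computed from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                     # correct / all
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # mean per-class recall
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / total**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
oa, aa, kappa = oa_aa_kappa(y_true, y_pred, 3)
print(round(oa, 3), round(aa, 3), round(kappa, 3))  # 0.667 0.667 0.5
```

Note that AA averages recalls per class, so it penalizes methods that neglect small classes even when OA is high.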
(1) SVMCK: Camps-Valls et al. [30] proposed a composite kernel (CK) support vector machine method integrating a spectral kernel and a spatial kernel. This method can use spectral and spatial features in hyperspectral data, thus improving the accuracy.
(2) ELMCK: Zhou et al. [31] proposed a composite kernel (CK) extreme learning machine method integrating a spectral kernel and a spatial kernel, which can effectively use spectral and spatial features in hyperspectral data.
(3) BLS: Compared with deep learning networks, which have multiple hidden layers and many hyperparameters, broad learning system forms the network in a simplified way with only one hidden layer.
(5) LBP-BLS: Zhao et al. [23] adopted local binary pattern (LBP) to extract the spectral-spatial joint features of hyperspectral images, and then used BLS to classify the spectralspatial joint features.
(7) GCN: The graph convolutional network combines the characteristics of graph neural networks (GNN) and convolutional neural networks (CNN), extends the convolutional neural network from Euclidean space to the non-Euclidean spaces suitable for expressing relational networks, and is scalable in edges and nodes.
The parameter settings of the comparison methods and the proposed method are as follows: the parameter settings suggested in the corresponding references were used. Most comparison algorithms were implemented with the available source codes under the same settings. It should be noted that the results of 2DCNN, SSGCN, MDGCN, SBLS, and GCBN are borrowed from reference [33].
In the proposed method, a three-layer GCN with 40 hidden nodes is used in GCDBLS, and the GCN model is optimized with the Adam optimizer. In the experiments, the batch size is set to 32 and the learning rate to 0.001. Batch normalization (BN) is adopted with a momentum of 0.9, and the number of iterations is set to 200. The number of feature mapping nodes of GCDBLS is set to 100 and the number of feature mapping groups to 10; the number of enhancement nodes is set to 100 and the number of enhancement mapping groups to 1. The regularization parameters λ_1 and λ_2 are selected from {10^−10, 10^−9, . . ., 10^−1, 10^0}. For the three datasets, we randomly select five samples in each category as the training set and the rest as the test set. All experiments were repeated 10 times and the average was used as the final result. Table 2-

GPU. All experiments used the TensorFlow platform and MATLAB 2018b.
Indian Pines dataset classification results: Table 2 shows the quantitative classification results obtained by the different methods on the Indian Pines dataset, where the highest value in each row is highlighted in bold. From Table 2, GCDBLS outperforms the comparison models in terms of OA, AA, and Kappa, which verifies that GCDBLS can obtain the contextual features of hyperspectral images and extract more discriminative features. At the same time, the results show that BLS is better than SVMCK, ELMCK, and GCN in OA, AA, and Kappa. SBLS and LBP-BLS, as improved BLS algorithms, achieve better experimental results than BLS. SBLS, combined with hierarchical guided filtering and a class probability structure, effectively enhances the classification performance on hyperspectral images. LBP-BLS performs local binary pattern operations on the spatial domain of each band to extract grayscale and rotation-invariant local texture features.
Since the codes for SBLS, SSGCN, MDGCN, and GCBN are unavailable, the corresponding results are borrowed from the references. Therefore, Figure 5 shows the classification results of SVMCK, ELMCK, BLS, LBP-BLS, GCN, and GCDBLS on the Indian Pines dataset. As shown in Figure 5, the GCDBLS method can acquire the contextual features of hyperspectral images well and extract more discriminative features. Compared with the other methods, the classification maps of GCDBLS contain fewer errors and exhibit a smoother visual effect.
As shown in Table 3, since the Botswana dataset contains less noise and has higher spatial resolution than the Indian Pines dataset, it is more suitable for landscape classification. Compared with the results on the Indian Pines dataset, the experimental results of the ten methods on the Botswana dataset are greatly improved. Again, the proposed method achieves better results than the comparison methods, verifying the performance of GCDBLS. In addition, the classification results of GCDBLS in categories such as C1, C3, and C4 reached 100%. This is because GCDBLS can effectively extract local features from hyperspectral images, which is important for classifying small local objects. Figure 6 visualizes the classification results generated by all methods.
Table 6 shows the quantitative classification results of the different methods on the KSC dataset. It can be observed from Table 6 that SSGCN, MDGCN, and GCBN are better than GCN. As an improvement of GCN, SSGCN fuses the spatial and spectral information of hyperspectral images and effectively improves the classification results; MDGCN uses multi-scale regions as input, which effectively improves the classification accuracy of HSI; GCBN utilizes graph convolution operations to obtain rich nonlinear spatial-spectral features to enhance the classification performance of BLS. Both the proposed method and GCBN combine a graph convolutional network with BLS; our approach differs from GCBN in that GCBN addresses only the problem that the linear sparse features in the broad learning network cannot fully represent hyperspectral data, while ignoring the ability of BLS to discriminate hyperspectral image features. Since GCBN does not take into account the relationship between the inter-class and intra-class spacing of sample features, it is difficult to ensure effective separation between classes and sufficient aggregation within classes, resulting in misclassification by BLS. Our method introduces the concepts of local intra-class and inter-class scatter; by minimizing the local intra-class feature distance and maximizing the local inter-class feature distance, it improves the feature extraction ability of BLS and reduces the misclassification of hyperspectral images. Finally, compared with the GCN method, the proposed GCDBLS has better classification performance. As shown in Figure 7, GCDBLS has a smaller classification error than the other five comparison methods, which further demonstrates its advantages.

D. EFFECT OF DIFFERENT TRAINING SAMPLES ON METHOD PERFORMANCE
To analyze the classification robustness of GCDBLS, training sets of different sizes are selected from the three datasets for testing. For each of the three benchmark datasets, we randomly select 2, 3, 4, 5, and 6 labeled samples per category as the training set, use the remaining samples as the test set, and use OA as the evaluation metric. The comparative results of the different methods are shown in Figure 8.
From the results in Figure 8, it can be found that as the number of training samples increases, the performance of the different methods on the three datasets improves significantly. In addition, the GCDBLS model outperforms the comparison methods, since GCDBLS obtains the contextual features of hyperspectral images and extracts more discriminative features. Therefore, GCDBLS performs better under varying numbers of training samples, which makes it more robust and adaptive.

E. ABLATION EXPERIMENTS
GCDBLS is composed of three modules: GCN, BLS, and the discriminative regularization term. To verify the effectiveness of the different modules, we conduct ablation experiments on the three datasets. In the experiments, BLS is used as the benchmark model, and two comparison models, GCBLS and GCDBLS, are obtained by gradually adding the GCN module and the discriminative regularization term (intra-class and inter-class scatter) to BLS. For the three datasets, we randomly select four samples per category as the training set and the rest as the test set. We first take spectral features as the input to BLS, GCBLS, and GCDBLS; next, we use EMAP to extract multi-attribute profile features as the inputs. The experimental results of the different models are shown in Tables 5-7.
By analyzing the data in Table 5, the following conclusions can be drawn: 1) The model obtained by introducing the graph convolutional network into the baseline BLS is called GCBLS. Its experimental results on the three datasets are better than those of BLS, which verifies the effectiveness of introducing the graph convolutional network into the baseline model.
2) In the GCBLS model, we construct local intra-class scatter and local inter-class scatter and introduce them into GCBLS as a discriminative regularization term. As Table 5 shows, the resulting model outperforms GCBLS on all three datasets, which verifies the effectiveness of the discriminative regularization term.

V. CONCLUSION
In this paper, we proposed a graph convolution enhanced discriminative broad learning system (GCDBLS) as a new hyperspectral classification method. The method extracts the contextual information of hyperspectral images through GCN, which makes the extracted spatial-spectral features more expressive and effectively overcomes the problem that the linear sparse features in broad learning cannot fully characterize hyperspectral data. Moreover, GCDBLS introduces the concepts of local intra-class scatter and local inter-class scatter into BLS, so it preserves both the internal local geometric structure of the data and its local difference information, enhancing the discriminative ability of the features learned by BLS. Experimental results on three real HSI datasets show that the proposed method performs excellently in both the visual quality and the quantitative metrics of the classification maps.

FIGURE 1. The structure of GCN.

FIGURE 2. The structure of BLS.
Without such constraints it is difficult to ensure effective separation between classes and sufficient aggregation within classes; the learned features then lack discriminative ability, which limits their expressive power. Therefore, we combine broad learning and graph convolution and propose a graph convolution enhanced discriminative broad learning system (GCDBLS). GCDBLS uses graph convolution operations to fuse graph structure information with node features and efficiently extract the contextual features of hyperspectral images. We then introduce the concepts of intra-class scatter and inter-class scatter into BLS. By minimizing the intra-class scatter and maximizing the inter-class scatter while preserving the local nearest-neighbor relationships of the data, similar samples are drawn closer together and samples of different classes are pushed farther apart, which gives the features of the BLS data samples stronger discriminative power and maximizes class separability. The sample visualization after minimizing the intra-class scatter and maximizing the inter-class scatter is shown in Figure 3. On top of minimizing the classification error, GCDBLS incorporates the local intra-class scatter and local inter-class scatter into training, minimizing the intra-class feature distance and maximizing the inter-class feature distance to improve feature discrimination.
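As a rough illustration of the discriminative idea above, the following NumPy sketch accumulates local intra-class and inter-class scatter matrices from the k nearest neighbours of each sample. It assumes plain Euclidean neighbourhoods with unit edge weights, whereas the paper weights edges with a Gaussian function; all function names are illustrative, not from the paper:

```python
import numpy as np

def local_scatter_matrices(X, y, k=5):
    """Build local intra-class (S_w) and inter-class (S_b) scatter matrices.
    Each sample is linked to its k nearest neighbours of the same class
    (for S_w) and of a different class (for S_b); each scatter matrix
    accumulates the outer products of the linked sample differences."""
    n, d = X.shape
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(y != y[i])
        for j in same[np.argsort(dist[i, same])][:k]:
            v = (X[i] - X[j])[:, None]
            S_w += v @ v.T            # pulls same-class neighbours together
        for j in diff[np.argsort(dist[i, diff])][:k]:
            v = (X[i] - X[j])[:, None]
            S_b += v @ v.T            # pushes different-class neighbours apart
    return S_w, S_b

# Discriminative objective: minimise tr(S_w) while maximising tr(S_b),
# e.g. by adding tr(S_w) - tr(S_b) to the training loss as a regulariser.
```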
The hyperspectral image features extracted by GCN are denoted as X_GCN. Next, we construct the local intra-class scatter matrix and the local inter-class scatter matrix. Before doing so, we need to build the local intra-class adjacency graph and the local inter-class adjacency graph. Local intra-class adjacency graph G_w: to construct G_w, we first compute its adjacency weight matrix; the elements of the adjacency matrix, defined by a Gaussian function, are shown in equation (13). The local inter-class adjacency graph G_b is constructed analogously, where D_b is the diagonal matrix with D_b,ii = Σ_j W_b,ij and L_b = D_b − W_b is the Laplacian matrix of G_b. Definition 3 (local maximum information difference matrix): we call the matrix S = S_w − S_b the local maximum information difference matrix.
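A minimal sketch of the graph construction just described, assuming the Gaussian (heat-kernel) weight form commonly used for such adjacency matrices; the exact expression of equation (13) and the bandwidth parameter t are assumptions, not taken from the paper:

```python
import numpy as np

def gaussian_adjacency(X, y, t=1.0, same_class=True):
    """Adjacency weights with a Gaussian (heat) kernel:
    W_ij = exp(-||x_i - x_j||^2 / t) when x_i, x_j share a class label
    (or differ, for the inter-class graph G_b), and 0 otherwise."""
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=2)
    mask = (y[:, None] == y[None, :]) if same_class else (y[:, None] != y[None, :])
    W = np.exp(-d2 / t) * mask
    np.fill_diagonal(W, 0.0)      # no self-loops
    return W

def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

# One standard way to obtain the scatter matrices from these graphs is the
# Laplacian quadratic form, e.g. S_w = X.T @ laplacian(W_w) @ X and
# S_b = X.T @ laplacian(W_b) @ X, giving S = S_w - S_b (Definition 3).
```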
Tables 2-4 show the comparison results of the different methods, and Figures 4-6 show their classification maps. The hardware environment for the experiments is a Windows 10 operating system with an Intel Core i7-8750H CPU, 16 GB of RAM, and an NVIDIA GeForce GTX 1660 Ti GPU with 6 GB of memory.

FIGURE 8. Classification results with different numbers of training samples. (a) Indian Pines, (b) Botswana, (c) KSC.

TABLE 2. Classification results of different methods on Indian Pines (%).

TABLE 3. Classification results of different methods on Botswana (%).

TABLE 4. Classification results of different methods on KSC (%).

TABLE 5. Ablation experiments on Indian Pines.