Diversity-Driven Multikernel Collaborative Representation Ensemble for Hyperspectral Image Classification

Recently, kernel collaborative representation classification (KCRC) has shown outstanding performance in dealing with the problem of linear inseparability in hyperspectral remote sensing image classification. Meanwhile, ensemble learning has attracted great attention for improving on the performance of a single classifier. To address the limitations of a single classifier, a bagging algorithm based on KCRC (KCRC-bagging) is presented in this article. The KCRC-bagging method uses bootstrap sampling to increase the diversity of base classifiers, thus improving classification accuracy and generalization performance. In order to reduce the scale of the ensemble, a diversity-driven multikernel collaborative representation classifier ensemble approach (DIV-KCRC) is proposed. DIV-KCRC verifies the effectiveness of the representation classifiers with pairwise diversity measures, and classifiers with high accuracy and diversity are selected to improve the classification performance and efficiency of the ensemble system. Three real hyperspectral data sets were used to validate the proposed method. The experimental results demonstrate that both KCRC-bagging and DIV-KCRC yield better classification performance than their corresponding base classifiers. In particular, DIV-KCRC provides more reliable classification results than KCRC-bagging.


I. INTRODUCTION
Hyperspectral remote sensing images capture hundreds of contiguous spectral bands of surface objects and provide rich spectral information that enhances the ability to distinguish ground objects [1], [2]. Hyperspectral remote sensing plays an important role in diverse applications, such as national defense, environmental monitoring, and agriculture. Among them, supervised classification is a critical process for hyperspectral remote sensing image applications. However, limited training samples, uneven sample quality, and high dimensionality pose great challenges for hyperspectral image classification [3].
In view of the above-mentioned problems, support vector machine (SVM) and extreme learning machine (ELM) have been applied to hyperspectral image classification and have achieved meaningful performance. However, a single classifier is often limited and cannot achieve the best performance across the complex and diverse scenarios of hyperspectral images. Therefore, how to integrate the advantages of different classifiers and achieve the effect of "1+1>2" is an important research direction. Ensemble learning has attracted much attention for its ability to combine the advantages of multiple classifiers in the final decision, and it has shown more promising performance than individual classifiers. Examples include the classic Bagging and Boosting [4] and improved ensemble methods such as a scalable end-to-end tree boosting system (XGBoost) [5] and a highly efficient gradient boosting decision tree (LightGBM) [6].
At present, ensemble learning is investigated from two aspects: improving the base classifier and taking advantage of the diversity between classifiers. Improving the base classifier means designing a strong classifier suitable for hyperspectral image classification. Some traditional machine learning models have been used for hyperspectral image classification, the most representative of which are SVM and ELM. SVM is one of the representative algorithms of the kernel transformation technique. Its main idea is to map problems that are linearly inseparable in a low-dimensional space into a high-dimensional space, where they can be classified accurately. Melgani and Bruzzone were the first to test the application of SVM in the field of hyperspectral remote sensing image classification [7]. In order to further improve the performance of SVM, SVM-based extensions such as hybrid kernel SVM have been introduced and have achieved certain improvements, but the selection of the kernel function and the optimal parameter combination remains difficult [8]. ELM has proven to be effective for high-dimensional data [9]. The output weights of the learning network can be obtained through a one-shot calculation, which simplifies parameter setting, improves computing efficiency in the training process, and gives stronger generalization ability than SVM. Convolutional neural network methods are also widely used [10], [11]. All of the above methods require complex parameter settings. In recent years, the representation model has attracted wide attention because of its simplicity and small number of parameters [12]. The core idea of this classifier is that a testing sample can be represented linearly by a dictionary of labeled samples, and its class label is inferred from that representation.
The representation coefficients are solved with different regularization functions, namely, sparse representation (SR) with l0-norm or l1-norm minimization constraints [12], [13], and collaborative representation (CR) with l2-norm minimization constraints. CR has attracted more attention due to its lower computational complexity and better classification performance [13], [14]. For example, the CR-based nearest neighbor algorithm proposed in [15] uses the weight coefficients of CR to find the truly closest training sample for each testing sample, which outperforms the traditional Euclidean distance. The tangent distance-based CR classifier (TCRC) in [16] uses manifold learning to project hyperspectral data into a simplified tangent space to achieve better performance. In addition to linear methods, some nonlinear CRC methods that use kernel techniques have also been proposed. For example, the kernel CR with Tikhonov regularization proposed in [17] incorporates neighborhood information into a kernel space to increase separability between classes, a kernel nonlocal joint CR classification method uses the similarity measure between spectral pixels as the mapping feature [18], [19], and a multiple kernel CRC incorporates multiple kernel functions to enhance CRC performance [20], [21]. Liu et al. [22] proposed a probabilistic kernel collaborative representation classifier to handle a problem that the original KCRC could not solve. Karaca et al. [23] designed a spatial aware probabilistic multikernel CR to overcome the problem of small samples in hyperspectral classification. Ma et al. [24] designed a discriminative kernel CR and Tikhonov regularization method to make better use of the correlation between different categories, which effectively improved the separability between samples. Su et al. [25] designed a shape-adaptive neighborhood KCRC model to explore the nonlinear characteristics of spatial features, which effectively improved the performance of the kernel collaborative representation model. Tu et al. [26] used the density peak to improve the superpixel model and extract local spatial information of images, and designed a structural-kernel CR algorithm, which effectively improved the reliability of the extracted spatial information. An optimized single classifier provides more choices of base classifiers for ensemble learning, but there is still room for further improvement.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Another idea is to increase and exploit the diversity among classifiers for ensemble learning [27], [28]. Bagging and boosting, the most typical approaches, create diversity by changing the training sample subset. SVM and ELM are often used as base classifiers. For example, rotation-based SVM ensembles use data transformation and random features to generate multiple SVMs whose outputs participate in the final decision [29]. Bagging-based ELM and Adaboost-based ELM use ELM as the base classifier for bagging and boosting, respectively, which overcomes the problem caused by the randomness of the input weights in the original ELM [9], [21]. Artificial neural networks are also used in ensemble learning as weak classifiers. SR has been used as a base classifier in random subspace (RS) and bagging ensemble methods in the area of signal processing [32], [33]. CRC-bagging subsequently proved to be effective in hyperspectral classification [34]. Tangent space collaborative representation projects hyperspectral data into a simplified tangent space to achieve better performance. Moreover, because TCRC is sensitive to training data and parameters, TCRC combined with bagging and boosting can perform better than CRC-bagging, and it also effectively addresses the limited accuracy and weak generalization performance of a single TCRC [35]. In order to effectively utilize the diversity information of features, Chen et al. combined multiple features with the Adaboost algorithm: Gabor features, gray level co-occurrence matrix features, and extended morphological profile (EMP) features are extracted and stacked, and the use of different spatial features increases the diversity between base classifiers [36], [37]. Chen et al. [38] also proposed an RS-based SVM algorithm that effectively improved the classification accuracy. Feature sequences were obtained by feature selection, then divided into sections and randomly sampled according to a sampling ratio, which improved both the performance and the diversity of the base classifiers. In order to improve the computational efficiency, an adaptive shape neighborhood RS-based k-nearest class CR with Tikhonov algorithm was proposed by combining CR and RS [39], and the spatial structure information of training and testing samples was further explored.
It is worth noting that, in addition to creating diversity, it is also important to quantify the diversity between classifiers [40]-[42]. Bi et al. studied the impact of diversity on the accuracy of classifier ensembles and showed that effective diversity indicators can improve ensemble accuracy while reducing the ensemble scale [41], [42]. Kuncheva et al. [43] used nine measures to test the performance of two classic algorithms, bagging and boosting, and their experiments show that boosting can produce more diversity even for stable classifiers. Nan Li et al. used voting and pairwise difference evaluation indicators to measure the difference between classifiers and to prune the ensemble, which reduced the ensemble scale and improved the computational efficiency [26], [28], [44]. Zhao et al. [45] proposed a classifier ensemble difference measure based on complementary information entropy under a fuzzy relation. It measures the uncertainty contained in the classification data space and gauges the difference between base classifiers according to this information, which solves the problem that fuzzy data cannot be directly processed in the diversity assessment of multiclassifier systems. A heuristic classifier ensemble algorithm considering sparsity and diversity is proposed in [46]. In addition, Tan et al. [47] combined the three diversity metrics of different (D), double fault (DF), and correlation coefficient (ρ) to evaluate the difference between SVM, k-nearest neighbor, multinomial logistic regression, and ELM, and selected the three most diverse classifiers for integration, which effectively improved the classification accuracy.
From the above analysis, two problems emerge. First, the existing diversity-guided classifier ensemble methods are mostly verified on UCI data sets, and their effect on hyperspectral ground object classification is unknown. Moreover, they mainly use traditional machine learning models; in other words, the base classifiers involved in the ensemble are relatively simple. Second, although ensemble learning based on representation model classifiers can increase the diversity of the ensemble system, that diversity has not been evaluated, and there is no relevant research on the diversity of representation model classifiers. Therefore, balancing diversity and accuracy in ensemble learning is very important, and choosing strong and diverse classifiers as base classifiers is also key.
In this article, a bagging ensemble learning algorithm based on KCRC (KCRC-bagging) and a diversity-driven multikernel CR classifier ensemble (DIV-KCRC) method for hyperspectral image classification are proposed. KCRC-bagging selects the KCRC as the base classifier and uses the Bootstrap sampling method to generate training subsets with differences, and then creates a series of classifier combinations with diversity to participate in the ensemble to improve the generalization performance of KCRC. The DIV-KCRC explores the diversity among CR classifiers guided by different kernel functions and effectively utilizes the advantages of KCRC. In this method, the CR classifiers of different kernel functions are combined as base classifiers, and the differences among base classifiers are evaluated by four diversity measures. With the verification accuracy and diversity evaluation results, the optimal combination is selected to participate in the integration for the final classification result. Through the final classification results, the effect of several diversity indicators is analyzed, and then the diversity indicators with good effects are selected, and the validity of the selected indicators is validated on other data sets. The main contributions of this article are as follows.
1) A novel nonlinear ensemble learning method combining KCRC and the bagging ensemble strategy is presented for hyperspectral image classification. KCRC-bagging improves on CRC-bagging by combining the kernel collaborative representation classifier with the bagging ensemble method, and it handles the linear inseparability of the low-dimensional space more effectively.
2) The differences between different KCRCs are evaluated by diversity measures for the first time, and multikernel CRs with more diversity are used in ensemble learning, making full use of the advantages of different KCRCs. Meanwhile, the diversity of the ensemble system is ensured through the guidance of the diversity measures.
3) Prior knowledge of accuracy and diversity is obtained before the ensemble is built in order to exclude base classifiers with poor performance, which reduces the ensemble scale and improves the generalization performance.
The rest of this article is organized as follows. Section II presents a detailed explanation of CRC, KCRC, pairwise diversity measures, and diversity in ensemble learning. Section III introduces the proposed KCRC-bagging and the DIV-KCRC algorithm, which reduces ensemble complexity. Section IV presents the experiments and analysis on three real hyperspectral data sets. Section V discusses the parameter selection of our method. Finally, Section VI concludes this article.

A. CRC and KCRC
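Let the training data be denoted as $X \in \mathbb{R}^{F \times N}$, which contains F bands and N samples from M classes. Let the dictionary D be constructed from M class-specific subdictionaries, $D = [D_1, D_2, \ldots, D_M]$, where $D_m$ collects the training samples (atoms) of class m. In CRC, a testing sample $y \in \mathbb{R}^{F \times 1}$ is approximated by a linear combination of atoms. The class whose subdictionary gives the best approximation is regarded as the class of sample y, which can be expressed as

$$\mathrm{class}(y) = \arg\min_{m=1,\ldots,M} \| y - D_m \alpha_m \|_2 \tag{1}$$

where $\alpha_m$ is the part of the coefficient vector α associated with $D_m$. The coefficient vector α can be estimated by solving the following optimization problem with the l2-norm regularization constraint:

$$\alpha = \arg\min_{\alpha} \| y - D\alpha \|_2^2 + \lambda \| \alpha \|_2^2 \tag{2}$$

Here λ is the regularization parameter that trades off the residual term against the regularization term. The analytical solution to (2) is

$$\alpha = (D^T D + \lambda I)^{-1} D^T y \tag{3}$$

where I is an identity matrix. After obtaining α, the class label of y can be determined according to (1).
However, samples of HSI may be linearly inseparable in the original data space due to the complex scene environment and mixed pixels. A better approach is to use the kernel trick to map data from a lower dimensional space to a higher one to increase class separability. Define the nonlinear mapping $\Phi: y \mapsto \Phi(y)$ from the original space into a higher dimensional feature space. In the KCRC, the optimization problem in the kernel-induced space is rewritten as

$$\alpha = \arg\min_{\alpha} \| \Phi(y) - \Phi(D)\alpha \|_2^2 + \lambda \| \alpha \|_2^2 \tag{4}$$

where Φ(y) is the mapped testing sample y, and the training atoms are mapped into Φ(D). The closed-form solution to α according to (3) is reformulated as

$$\alpha = (\Phi(D)^T \Phi(D) + \lambda I)^{-1} \Phi(D)^T \Phi(y) \tag{5}$$

Using the kernel function $K(d_i, d_j) = \Phi(d_i)^T \Phi(d_j)$, the coefficient α can be written as

$$\alpha = (K(D, D) + \lambda I)^{-1} K(D, y) \tag{6}$$

where K(D, D) is an N × N Gram matrix and K(D, y) is an N × 1 vector of the inner products between Φ(y) and Φ(D). Similar to (1), the label of sample y is determined by

$$\mathrm{class}(y) = \arg\min_{m=1,\ldots,M} \| \Phi(y) - \Phi(D_m) \alpha_m \|_2 \tag{7}$$

Algorithm 1: KCRC-Bagging.
Input: KCRC: base classifier; K: ensemble times; $X \in \mathbb{R}^{F \times N}$: input training set; λ: regularization parameter; y: testing samples.
Calculating the class label class(y) for the testing sample y according to Eq.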
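The CRC and KCRC decision rules of this subsection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian RBF kernel, the value of `gamma`, the function names, and the toy dictionary are assumptions made only for illustration. The coefficient vector is computed in closed form, α = (DᵀD + λI)⁻¹Dᵀy for CRC and α = (K(D, D) + λI)⁻¹K(D, y) for KCRC, and the label is the class with the smallest class-wise residual.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel between the columns of A (F x n) and B (F x m).
    The kernel choice and gamma are illustrative assumptions."""
    sq_dist = (A * A).sum(axis=0)[:, None] + (B * B).sum(axis=0)[None, :] - 2.0 * (A.T @ B)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def crc_predict(D, labels, y, lam=1e-2):
    """CRC: alpha = (D^T D + lam I)^-1 D^T y, then the class with the smallest residual."""
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == m] @ alpha[labels == m]) for m in classes]
    return int(classes[np.argmin(residuals)])

def kcrc_predict(D, labels, y, lam=1e-2, kernel=rbf_kernel):
    """KCRC: alpha = (K(D,D) + lam I)^-1 K(D,y); residuals are evaluated in feature space."""
    K_DD = kernel(D, D)
    k_Dy = kernel(D, y[:, None])[:, 0]
    k_yy = kernel(y[:, None], y[:, None])[0, 0]
    alpha = np.linalg.solve(K_DD + lam * np.eye(D.shape[1]), k_Dy)
    classes = np.unique(labels)
    residuals = []
    for m in classes:
        idx = np.flatnonzero(labels == m)
        a = alpha[idx]
        # ||Phi(y) - Phi(D_m) a||^2 expanded with kernel evaluations only
        residuals.append(k_yy - 2.0 * a @ k_Dy[idx] + a @ K_DD[np.ix_(idx, idx)] @ a)
    return int(classes[np.argmin(residuals)])

# Toy dictionary: 3 bands, two atoms per class (values are illustrative only).
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05, 0.0])
print(crc_predict(D, labels, y), kcrc_predict(D, labels, y))
```

Expanding the feature-space residual as K(y, y) − 2αₘᵀK(Dₘ, y) + αₘᵀK(Dₘ, Dₘ)αₘ avoids ever forming Φ explicitly, which is the usual reason for working with the Gram matrix.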

B. Pairwise Diversity Measures
Generally speaking, diversity measures can be divided into two categories: pairwise and nonpairwise [48].
Suppose the classifier set is $C = \{C_1, C_2, \ldots, C_P\}$, where P is the number of classifiers in the set. For a pair of classifiers $C_i$ and $C_j$, let a, b, c, and d denote the numbers of samples that are classified correctly by both, correctly by $C_i$ only, correctly by $C_j$ only, and incorrectly by both, respectively; the details are shown in Table I. The total number of samples is T = a + b + c + d.
1) Q statistics: For any two classifiers $C_i, C_j \in C$, Q is calculated as

$$Q_{ij} = \frac{ad - bc}{ad + bc} \tag{8}$$

and $Q \in [-1, 1]$. For statistically independent classifiers, Q = 0. The overall diversity $Q_{av}$ of classifier set C is

$$Q_{av} = \frac{2}{P(P-1)} \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} Q_{ij} \tag{9}$$

The diversity of the classifier set decreases as Q increases [49].
2) Correlation Coefficient: For classifiers $C_i, C_j \in C$, the correlation coefficient (Cor) is defined as

$$Cor_{ij} = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \tag{10}$$

Cor and Q always have the same sign, and it can be proved that $|Cor| \le |Q|$. The overall diversity $Cor_{av}$ of classifier set C is

$$Cor_{av} = \frac{2}{P(P-1)} \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} Cor_{ij} \tag{11}$$

The diversity of the classifier set decreases as Cor increases [40].
3) Disagreement Measurement: The disagreement measurement (Dis) between $C_i$ and $C_j$ can be computed as

$$Dis_{ij} = \frac{b + c}{T} \tag{12}$$

and the Dis [40] of the entire classifier combination can be obtained as

$$Dis_{av} = \frac{2}{P(P-1)} \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} Dis_{ij} \tag{13}$$

4) Double Fault Measurement: The probability that both classifiers misclassify a sample can be measured by the double fault measurement (DF) as

$$DF_{ij} = \frac{d}{T} \tag{14}$$

The overall diversity DF of the classifier system is expressed as

$$DF_{av} = \frac{2}{P(P-1)} \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} DF_{ij} \tag{15}$$

The diversity of the classifier set increases as DF decreases [49].
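As an illustration of the four pairwise measures, the following sketch computes Q, Cor, Dis, and DF from boolean correctness vectors and averages them over all classifier pairs. It assumes the conventional reading of the counts (a: both correct, b: only the first correct, c: only the second correct, d: both wrong); the function names and the convention of returning 0 when a denominator vanishes are assumptions of this sketch, not part of the original method.

```python
import numpy as np
from itertools import combinations

def pairwise_counts(ci, cj):
    """ci, cj: boolean arrays, True where the classifier labels the sample correctly."""
    a = int(np.sum(ci & cj))      # both correct
    b = int(np.sum(ci & ~cj))     # only the first classifier correct
    c = int(np.sum(~ci & cj))     # only the second classifier correct
    d = int(np.sum(~ci & ~cj))    # both wrong
    return a, b, c, d

def pairwise_measures(ci, cj):
    a, b, c, d = pairwise_counts(ci, cj)
    T = a + b + c + d
    q_den = a * d + b * c
    cor_den = float(np.sqrt((a + b) * (c + d) * (a + c) * (b + d)))
    return {
        "Q": (a * d - b * c) / q_den if q_den else 0.0,       # assumption: 0 if undefined
        "Cor": (a * d - b * c) / cor_den if cor_den else 0.0,  # assumption: 0 if undefined
        "Dis": (b + c) / T,
        "DF": d / T,
    }

def averaged_measures(correct):
    """Average each measure over all P(P-1)/2 pairs; correct is a (P x T) boolean matrix."""
    pairs = list(combinations(range(correct.shape[0]), 2))
    per_pair = [pairwise_measures(correct[i], correct[j]) for i, j in pairs]
    return {name: sum(m[name] for m in per_pair) / len(pairs)
            for name in ("Q", "Cor", "Dis", "DF")}

# Example with a = 5, b = 2, c = 2, d = 1 (T = 10).
ci = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=bool)
cj = np.array([1, 1, 1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)
print(pairwise_measures(ci, cj))
```

For this example the counts give Q = 1/9, Cor = 1/21, Dis = 0.4, and DF = 0.1, which also illustrates that |Cor| does not exceed |Q|.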

C. Diversity in Ensemble Learning
The diversity between classifiers is a necessary condition to improve the generalization performance of ensemble learning.
Obviously, if the errors of the individual classifiers participating in the ensemble are highly correlated, then the ensemble is meaningless. Taking binary classification as an example, and assuming that the error rates of the individual classifiers are independent of each other, it follows from the Hoeffding inequality [49] that, as the ensemble size K increases, the error rate of the ensemble decreases exponentially and eventually tends to zero, namely

$$P(EL(x) \neq Y(x)) = \sum_{k=0}^{\lfloor K/2 \rfloor} \binom{K}{k} (1-\varepsilon)^{k} \varepsilon^{K-k} \le \exp\left(-\frac{1}{2} K (1 - 2\varepsilon)^{2}\right) \tag{16}$$

where ε is the error rate of the base classifier, EL(x) represents the decision result of the ensemble system, and Y(x) represents the true value. However, (16) holds only on the premise that the error rates of the base classifiers are independent of each other, which is not the case in real-world tasks. In the actual learning process, the base classifiers participating in the ensemble share a unified learning objective, which inevitably relates them to one another. In fact, there is a certain conflict between the accuracy and diversity of base classifiers in ensemble learning. Generally, once the base classifiers are already accurate, increasing diversity costs some accuracy; conversely, in order to preserve accuracy, part of the diversity is often sacrificed. This phenomenon can be explained more intuitively from the perspective of the error-ambiguity decomposition, which was proposed by Krogh and Vedelsby [28], [48], [49]. Assume the classifier set $C = \{c_1, c_2, \ldots, c_K\}$. In a regression learning task, for a sample x, the error-ambiguity decomposition of an ensemble system that makes its decision by weighted averaging can be defined as

$$Err(EL) = \overline{Err} - \overline{AM} \tag{17}$$

where $\overline{Err} = \sum_{i=1}^{K} \omega_i Err_i$ and $\overline{AM} = \sum_{i=1}^{K} \omega_i AM_i$ are the weighted averages of the generalization errors $Err_i$ and the ambiguity (divergence) values $AM_i$ of the individual $c_i$.
Since the definition of Err(EL) is based on the entire sample space and $\overline{AM}$ cannot be obtained a priori, Err(EL) is difficult to optimize directly. It should also be noted that the above derivation is only applicable to regression learning, and it is difficult to generalize directly to classification learning tasks.
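Both statements in this subsection can be checked numerically. The sketch below is illustrative only and makes two assumptions: the ensemble decides by majority vote over K independent base classifiers with a common error rate ε below 0.5 (so its exact error is a binomial tail bounded by exp(−K(1 − 2ε)²/2)), and the regression errors and ambiguities are squared differences at a single point x, with weights that sum to one. The function names are hypothetical.

```python
from math import comb, exp, floor

def majority_vote_error(K, eps):
    """Exact error of a majority vote over K independent base classifiers.
    The ensemble errs when at most floor(K/2) of them are correct."""
    return sum(comb(K, k) * (1 - eps) ** k * eps ** (K - k)
               for k in range(floor(K / 2) + 1))

def hoeffding_bound(K, eps):
    """Upper bound exp(-K (1 - 2 eps)^2 / 2) on the majority-vote error (eps < 0.5)."""
    return exp(-0.5 * K * (1 - 2 * eps) ** 2)

def error_ambiguity(h, w, f):
    """Error-ambiguity terms at one point for predictions h, weights w, and target f.
    Returns (ensemble error, weighted mean error, weighted mean ambiguity)."""
    H = sum(wi * hi for wi, hi in zip(w, h))                  # weighted-average ensemble output
    mean_err = sum(wi * (hi - f) ** 2 for wi, hi in zip(w, h))
    mean_amb = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))
    return (H - f) ** 2, mean_err, mean_amb

for K in (5, 11, 21, 51):
    print(K, majority_vote_error(K, 0.3), hoeffding_bound(K, 0.3))

ens, mean_err, mean_amb = error_ambiguity([1.0, 2.0, 4.0], [0.2, 0.3, 0.5], 2.5)
print(ens, mean_err - mean_amb)
```

In the second check, the ensemble error equals the weighted mean error minus the weighted mean ambiguity, so for a fixed average error a larger ambiguity (more disagreement among the members) yields a smaller ensemble error.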

A. KCRC Bagging
In this work, KCRC-bagging is proposed, using KCRC as the base classifier within the bagging framework. With the Bootstrap sampling method, a series of training sample sets $\{X_1, X_2, \ldots, X_K\}$ is generated randomly from the raw training set X, where K is the ensemble size. Each training subset $X_k$ then forms its own dictionary $D_k = \{D_{1k}, D_{2k}, \ldots, D_{Tk}\}$, which differs from the dictionaries of the other subsets. Therefore, each training subset trains a KCRC model, and the testing samples are classified using the K representation coefficients α obtained from these models (see Fig. 1).
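The bagging procedure described here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the base KCRC uses a Gaussian RBF kernel with a hypothetical `gamma`, the bootstrap draws N training columns with replacement without stratifying by class, and the K base decisions are fused by majority voting. All function and parameter names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel between the columns of A and B (illustrative kernel choice)."""
    sq_dist = (A * A).sum(axis=0)[:, None] + (B * B).sum(axis=0)[None, :] - 2.0 * (A.T @ B)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kcrc_predict(D, labels, y, lam=1e-2, gamma=1.0):
    """Base classifier: alpha = (K(D,D) + lam I)^-1 K(D,y), smallest class-wise residual wins."""
    K_DD = rbf_kernel(D, D, gamma)
    k_Dy = rbf_kernel(D, y[:, None], gamma)[:, 0]
    alpha = np.linalg.solve(K_DD + lam * np.eye(D.shape[1]), k_Dy)
    classes = np.unique(labels)
    residuals = []
    for m in classes:
        idx = np.flatnonzero(labels == m)
        a = alpha[idx]
        # squared feature-space residual up to the constant K(y, y), which is the same for every class
        residuals.append(-2.0 * a @ k_Dy[idx] + a @ K_DD[np.ix_(idx, idx)] @ a)
    return int(classes[np.argmin(residuals)])

def kcrc_bagging_predict(X, labels, y, n_rounds=20, lam=1e-2, gamma=1.0, seed=0):
    """Train n_rounds KCRC models on bootstrap resamples of X and fuse them by majority vote."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    votes = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)   # bootstrap: n columns drawn with replacement
        votes.append(kcrc_predict(X[:, idx], labels[idx], y, lam, gamma))
    values, counts = np.unique(votes, return_counts=True)
    return int(values[np.argmax(counts)])
```

Duplicated atoms in a bootstrap dictionary make the Gram matrix rank deficient, but the λI term keeps the linear system solvable, so no special handling is needed in this sketch.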

B. Kernel CR Classifiers Ensemble Based on Diversity
Although bagging increases the overall diversity of the integrated system by generating different training samples, it lacks a diversity evaluation between the base classifiers, which can lead to low computational efficiency and poor integration effects.
In this section, a heterogeneous integration method for different kernel collaborative representation classifiers with diversity evaluation is proposed. The classifier pool $C = \{C_1, C_2, \ldots, C_P\}$ consists of CRC models guided by different kernel functions. Meanwhile, the diversity measures are used to evaluate the classification similarity among different classifier models.
The higher the similarity, the smaller the diversity of the ensemble. Accuracy and diversity are considered together to determine the combination of base classifiers. First, the EMP feature of the spectral data is extracted, and the spectral feature and EMP feature are divided into training and testing sets. Then, a fixed number of training samples is randomly selected from each class. The k-nearest pixels similar to the testing sample are assumed to belong to the same category and are taken as the verification set. Prior knowledge of the accuracy and diversity of the base classifiers is obtained through the verification set. Next, using the diversity measures and the overall classification accuracy on the verification set, the best group is found. Finally, the classification result is obtained by the majority voting rule. It is worth noting that the most diverse combination is determined jointly by the four indicators. If the results of the four diversity indexes are inconsistent, the combination with the highest accuracy is taken as the final output. Algorithm 2 shows the key steps of DIV-KCRC, and its flow diagram is shown in Fig. 2.
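One possible reading of the selection step is sketched below. It is an interpretation, not Algorithm 2 itself: each of the four measures nominates its most diverse combination on the verification set (lowest averaged Q, Cor, and DF, highest averaged Dis), and when the nominations disagree, the nominee with the highest verification accuracy under majority voting is kept. The function names, the default combination size, and the assumption that labels are non-negative integers are all illustrative.

```python
import numpy as np
from itertools import combinations

def pair_measures(ci, cj):
    """Q, Cor, Dis, DF for two boolean correctness vectors (True = correct)."""
    a = int(np.sum(ci & cj)); b = int(np.sum(ci & ~cj))
    c = int(np.sum(~ci & cj)); d = int(np.sum(~ci & ~cj))
    T = a + b + c + d
    q_den = a * d + b * c
    cor_den = float(np.sqrt((a + b) * (c + d) * (a + c) * (b + d)))
    q = (a * d - b * c) / q_den if q_den else 0.0        # assumption: 0 if undefined
    cor = (a * d - b * c) / cor_den if cor_den else 0.0  # assumption: 0 if undefined
    return q, cor, (b + c) / T, d / T

def majority_vote(preds):
    """Column-wise majority vote over an (n_classifiers x T) matrix of integer labels."""
    return np.array([np.bincount(col).argmax() for col in preds.T])

def select_combination(val_preds, val_labels, size=3):
    """Pick the base-classifier combination using diversity first, accuracy as tie-breaker."""
    correct = val_preds == val_labels[None, :]
    stats = {}
    for combo in combinations(range(val_preds.shape[0]), size):
        div = np.mean([pair_measures(correct[i], correct[j])
                       for i, j in combinations(combo, 2)], axis=0)
        acc = float(np.mean(majority_vote(val_preds[list(combo)]) == val_labels))
        stats[combo] = (div, acc)
    nominees = {
        min(stats, key=lambda k: stats[k][0][0]),  # lowest averaged Q
        min(stats, key=lambda k: stats[k][0][1]),  # lowest averaged Cor
        max(stats, key=lambda k: stats[k][0][2]),  # highest averaged Dis
        min(stats, key=lambda k: stats[k][0][3]),  # lowest averaged DF
    }
    return max(nominees, key=lambda k: stats[k][1])  # highest verification accuracy

def ensemble_predict(test_preds, combo):
    """Final labels: majority vote of the selected base classifiers."""
    return majority_vote(test_preds[list(combo)])
```

With five kernel-specific base classifiers and `size=3`, the function returns a tuple of zero-based indices; for instance, `(0, 2, 3)` would correspond to the combination written as 134 in the experiments, under the assumption that those digits index the base classifiers.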

IV. EXPERIMENTS
A. Hyperspectral Data Sets
1) Pavia University: The University of Pavia data set provides 103 bands in the 0.43-0.86 μm wavelength range after noise and water absorption bands are excluded. The scene is located in Pavia, Italy, and was acquired in 2003 by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. The size of the data is 610 × 340 pixels. The detailed information of each class is described in Table II, and images of this data set are shown in Fig. 3.
2) Purdue Campus: The second data set, Purdue Campus, was obtained by an airborne hyperspectral mapper (HYMAP) sensor. After bad band removal, this data set provides 126 bands in the spectral range of 0.45-2.48 μm. The data size is 377 × 512 pixels at a spatial resolution of 3.6 m. The six classes contained in the HYMAP data set are listed in Table III, and the false color image and ground-truth image are shown in Fig. 4(a) and (b).

3) Yellow River Delta: The third data set is Yellow River Delta, which contains 330 spectral bands from visible light to shortwave infrared. After the 45 water absorption bands were removed, the remaining 285 bands were used in the experiment. The Yellow River Delta hyperspectral data was collected on January 7, 2019, by the Gaofen-5 satellite with a spatial resolution of 30 m, and the image size of the experimental area is 1185 × 1324. The scene is a typical coastal wetland area in Dongying City, Shandong Province, China (36°55'-38°16'N, 117°31'-119°18'E) [50], [51]. The ground cover is complex, comprising 21 types of features mainly composed of artificial vegetation, wet vegetation, halophyte vegetation, and complex water bodies, and the spatial distribution of the various features presents a block structure, which makes the scene highly representative [52], [53]. The detailed information of each class is described in Table IV, and images of this data set are shown in Fig. 5.

B. Experimental Setup
In order to evaluate the performance of the two proposed methods, several related classifiers are chosen for comparison, including the classical single classifier SVM; the typical ensemble methods RF [4], XGBoost, and LightGBM; the single classifiers CRC and KCRC based on the representation model framework; the CRC-bagging algorithm, which combines CR and bagging; and the ensemble method KCRC-All, which combines all classifiers without diversity evaluation. CRC-bagging and KCRC-bagging are run for 20 iterations.

C. Classification Performance
In this paper, the commonly used indices such as overall accuracy (OA), average accuracy (AA), and Kappa coefficient (KAPPA) are selected to measure the performance of classifier.
1) Overall Accuracy:

$$P(C_i)_{OA} = \frac{1}{N} \sum_{m=1}^{M} P_{mm}$$

where $P(C_i)_{OA}$ represents the overall classification accuracy of classifier $C_i$, M represents the total number of ground object classes, and N represents the total number of test samples. P represents a confusion matrix of size M × M. $P_{mm}$ represents the number of correctly classified ground objects of class m, and this element is located on the main diagonal of matrix P.
2) Average Accuracy: AA is obtained by dividing the sum of the per-class accuracies by the total number of classes.
3) Kappa Coefficient: Unlike OA and AA, KAPPA integrates all the information in the confusion matrix P. As a quantitative evaluation index for the consistency between the classification results of the algorithm and the distribution of the real ground object classes, KAPPA can reflect the overall classification accuracy of the classifier more comprehensively and accurately.
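The three indices can be computed from a confusion matrix as in the following sketch. It uses the standard textbook definitions and assumes, since the orientation of P is not stated here, that rows index the true class and columns the predicted class; the function name and the example matrix are illustrative only.

```python
import numpy as np

def classification_indices(P):
    """Return (OA, AA, KAPPA) from an M x M confusion matrix P.
    Assumption: rows are true classes, columns are predicted classes."""
    P = np.asarray(P, dtype=float)
    N = P.sum()                                     # total number of test samples
    oa = np.trace(P) / N                            # fraction of correctly classified samples
    aa = float(np.mean(np.diag(P) / P.sum(axis=1))) # mean of the per-class accuracies
    pe = float(np.sum(P.sum(axis=0) * P.sum(axis=1))) / N ** 2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return float(oa), aa, float(kappa)

# Illustrative two-class confusion matrix (not taken from the experiments).
oa, aa, kappa = classification_indices([[40, 10],
                                        [5, 45]])
print(oa, aa, kappa)
```

For this matrix, OA and AA are both 0.85, while KAPPA is 0.7 because half of the agreement would already be expected by chance.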
The optimum parameters for the proposed methods are listed in Table V. Table VI shows the corresponding numbers of base classifiers. The value of n denotes the number of neighboring pixels. The classification accuracies, such as OA, AA, and KAPPA, are included in Tables VII-XII. Boldface represents the best result.
For the Pavia University data set, Fig. 6(a)-(j) visually displays the ten classification maps. Comparing the classification maps of the algorithms shows that KCRC-bagging and DIV-KCRC classify the 4th, 6th, 7th, 8th, and 9th classes better; in particular, the DIV-KCRC algorithm handles building boundaries better. Table VII shows the classification accuracy of the classifier combinations selected by the four diversity measures (Cor, Q statistics, DF, and Dis). The classifier combinations selected by the four diversity measures were all 134, whose OA is 3.3% higher than that of all classifiers combined together. It is 0.28%, 3.36%, 3.81%, 3.36%, 3.54%, 3.81%, 3.81% higher than that of 123, 124, 125, 145, 234, 235, 245, 345, respectively, and only 0.10% lower than that of 135, and the DF value difference between combination 134 and combination 135 is only 0.0005. The experimental results validate the four diversity evaluation indexes. As can be seen from Table VII, compared with the single classifier KCRC and KCRC-All, the OA of KCRC-bagging is improved by 1.53% and 1.02%, respectively. The OA of the DIV-KCRC algorithm is 13.81%, 14.48%, 14.38%, 5.81%, 3.81%, 2.07%, 3.81%, and 3.3% higher than that of RF, SVM, XGBoost, LightGBM, CRC, CRC-bagging, KCRC, and KCRC-All, respectively. In particular, DIV-KCRC improves the OA by 3.3% over KCRC-All, which integrates all base classifiers, and by 2.28% over the classic ensemble algorithm KCRC-bagging. In addition to the overall classification accuracy, the AA and KAPPA of the DIV-KCRC ensemble method are the best among the 10 methods, followed by KCRC-bagging.
The experimental results demonstrate the accuracy and effectiveness of the ensemble learning method based on difference evaluation for multikernel classifier, while reducing the ensemble scale.
In order to verify the reliability of the proposed method, the Purdue data sets with grassland, roads, and buildings as main ground object types were used for the experiment. For this data, 7 or 8 training samples were randomly selected for each class by the 10 algorithms. Experimental results of Purdue data show that both KCRC-bagging algorithm and the DIV-KCRC algorithm achieve good results. By comparing the classification results of 10 algorithms for this data as shown in Fig. 7(a)-(j), it is found that the two proposed algorithms have significantly improved their performance in distinguishing buildings and roads compared with the other eight comparison algorithms, especially DIV-KCRC algorithm, which has better fitting effect than KCRC-bagging algorithm.
As illustrated in Table IX, the classifier combinations selected by Cor, Q, DF, and Dis were all 134, and the accuracy of the selected combination is the highest. From Table X, similar to the ROSIS data set, DIV-KCRC also provides the best performance with an OA of 93.81%, which is 3.58% and 2.83% higher than KCRC-All and KCRC-bagging, respectively.
However, CRC-bagging and KCRC-bagging perform unexpectedly poorly here, not even matching a single classifier. This shows that ensemble learning that does not consider accuracy and diversity may not achieve ideal results.
The above results show that the four diversity indexes can be used to select suitable classifier combinations for different data types, but the ground object types of the first two data sets are relatively simple. In order to further verify the validity of the results, the Yellow River Delta data set, with its complex ground object types, was selected to verify the reliability of the diversity indexes again.
For the Yellow River Delta, Fig. 8(a)-(j) shows the classification maps. The detailed classification results and diversity values are recorded in Table XI, and the best choice is 135. In addition, the selections of the four diversity indicators are consistent, and the selected combination has higher accuracy than the other, less diverse combinations. It is worth noting that KCRC-bagging improves by less than 1% over KCRC and KCRC-All, while DIV-KCRC improves by about 2%. Our methods improve on KCRC-All by nearly 1.2%, and the improvement over the original KCRC reaches almost 1.62%. As with the first two data sets, DIV-KCRC achieves better results than KCRC-bagging. Especially for classes 3, 5, 7, and 18, the discrimination of DIV-KCRC is significantly better than that of the KCRC-bagging algorithm and the other comparison algorithms. According to the experimental results on the Yellow River Delta, the DIV-KCRC ensemble method with diversity assessment can still achieve a good classification effect on complex data types; that is, prior diversity assessment can improve the generalization performance of the ensemble system.

A. Parameters Analysis for KCRC-Bagging
For the KCRC-bagging algorithm, the parameter λ and the ensemble size K significantly impact the algorithm performance.
In the experiments, λ is set in the range of 1e-9−1e-1. The relationship between parameter λ and the OA is shown in Fig. 9.
In the three data, CRC algorithm performs the best at 1e-2, rapidly declines between [1e-2, 1e-4], and then tends to be stable.
Although CRC-bagging also shows a decreasing trend in Pavia University data between [1e-1, 1e-4], its OA is significantly higher than that of CRC after it becomes stable. In the Yellow River Delta, the regularization parameters increase between [1e-1, 1e-3] and show a slight decreasing trend between [1e-6, 1e-9]. KCRC was less affected by parameters, and the results of the three sets of data showed that KCRC achieved the best effect at 1e-2, and then tended to be stable. On the other hand, KCRC-bagging performs better than CRC and CRC-bagging in Pavia University as a whole, but its accuracy is not as good as that of KCRC except for λ= 1e-1, and its accuracy decreases with the decrease of regularization parameters. In terms of Purdue data, KCRC-bagging is greatly affected by parameters, and its accuracy is slightly higher than KCRC at 1e-3,1e-4,1e-8. In terms of Yellow River Delta, the OA of KCRC-bagging is higher than that of CRC and CRC-bagging as the first two groups, but the overall performance is slightly better than that of KCRC.
According to Fig. 10(a), the OA of KCRC-bagging on the Pavia University data set first increases with the ensemble size K. Once K reaches 30, the OA fluctuates for K between 30 and 50. At K = 40, the accuracy is close to 88%. On the second data set, Purdue Campus, the OA of KCRC-bagging shows fluctuating growth as K increases, while the OA of CRC-bagging increases markedly between 5 and 15 and then decreases, increases, and decreases again, as shown in Fig. 10(b).
The results for the Yellow River Delta are shown in Fig. 10(c). The overall performance of KCRC-bagging is relatively stable. Once K reaches 30, the OA of KCRC-bagging remains between 95.35% and 96%, while the OA of CRC-bagging increases significantly for K between 15 and 25 and tends to be stable once K reaches 25. Fig. 10(b) and (c) show that the KCRC-bagging algorithm has advantages in both classification accuracy and stability. In addition, the three groups of experiments show that increasing the ensemble size does not necessarily improve the accuracy and may even lower it. Once the ensemble size reaches 15-25, the accuracy gradually tends to be stable.
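The dependence on the ensemble size K can be illustrated with a short sketch of bagging with majority voting. This is a generic illustration under stated assumptions, not the KCRC-bagging code used in the experiments: `nearest_centroid` is a stand-in base learner chosen only to keep the example small, and `bagging_predict` and its parameters are hypothetical names.

```python
import numpy as np

def nearest_centroid(X_tr, y_tr, X_te):
    """Stand-in base learner (NOT KCRC): assign each test row to the closest class mean."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def bagging_predict(base_learner, X_tr, y_tr, X_te, n_estimators=30, seed=0):
    """Train n_estimators base learners on bootstrap samples and combine by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n draws with replacement
        votes.append(base_learner(X_tr[idx], y_tr[idx], X_te))
    votes = np.stack(votes)               # (n_estimators, n_test)
    classes = np.unique(y_tr)
    # Count the votes per class for every test sample, then take the most voted class.
    counts = (votes[:, :, None] == classes[None, None, :]).sum(axis=0)
    return classes[counts.argmax(axis=1)], votes
```

Repeating the call for several values of `n_estimators` and recording the OA gives a curve of the kind shown in Fig. 10. Because each additional member is trained on a resample of the same data, the gain from adding members is expected to level off, which matches the plateau reported above.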

B. Parameters Analysis for DIV-KCRC
For the DIV-KCRC algorithm, the performance is mainly affected by the diversity values. As Fig. 11 shows, there is no obvious linear relationship between the four diversity measures and the OA. Judging from their joint distribution, however, the evaluation results of the four measures are consistent.
As can be seen from Fig. 11, the trends of Cor and Q are relatively consistent, and the distributions of DF and DIS are similar. When the classification accuracy is high, the values of Q, Cor, and DF are low and the value of DIS is high, that is, the classifier combination is highly diverse. In addition, the values of Cor and Q are more widely dispersed, so the diversity of the ensemble system can be judged more intuitively from them. For example, Fig. 11(c) shows that the classifier combinations with accuracy in (0.96, 0.97) are more diverse than those with accuracy in (0.925, 0.95). In other words, higher diversity is associated with higher accuracy. However, when the number of classifiers in a combination is too high, the OA decreases. This indicates that the benefit of diversity between classifiers is limited in ensemble learning. In addition, Fig. 11(c) clearly shows that the closer the overall distribution of DF is to 1, the worse the classifier performs, with the OA at a lower level.
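The four pairwise measures can be computed from the per-sample correctness of two classifiers. The sketch below assumes the standard definitions of the Q statistic, the correlation coefficient, the disagreement measure, and the double-fault measure from the ensemble learning literature; the definitions used in this article are given elsewhere and may differ in detail, and the function name `pairwise_diversity` is hypothetical.

```python
import numpy as np

def pairwise_diversity(correct_i, correct_j):
    """Pairwise diversity of two classifiers from boolean 'was correct' vectors."""
    ci = np.asarray(correct_i, dtype=bool)
    cj = np.asarray(correct_j, dtype=bool)
    n = float(ci.size)
    n11 = float(np.sum(ci & cj))     # both correct
    n00 = float(np.sum(~ci & ~cj))   # both wrong
    n10 = float(np.sum(ci & ~cj))    # only classifier i correct
    n01 = float(np.sum(~ci & cj))    # only classifier j correct
    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    cor_den = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return {
        "Q": cross / q_den if q_den > 0 else float("nan"),
        "Cor": cross / cor_den if cor_den > 0 else float("nan"),
        "DIS": (n01 + n10) / n,      # higher means more diverse
        "DF": n00 / n,               # lower means fewer shared errors
    }
```

Under these definitions, low Q, Cor, and DF together with high DIS indicate a diverse pair, which is the pattern reported for the high-accuracy combinations in Fig. 11.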
By comparing the results on the three data sets, it is found that more complex data exhibit more diversity. Specifically, provided that the accuracy of the base classifiers is maintained, increasing the diversity among individual classifiers can effectively improve the ensemble accuracy while reducing the number of individual classifiers in the ensemble. Analyzing the relationship between the diversity and the accuracy of different combinations shows that the classifier combinations selected by the four diversity measures are consistent and have the highest accuracy among all combinations. The experimental results on the three data sets all support the effectiveness of the diversity measures in ensemble learning.
Moreover, the final trends of the four diversity measures are consistent, which helps ensure that the selected combination is optimal. Among them, the trends of Cor and DF are more pronounced than those of the other two measures, and their accuracy is higher.
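One way to picture the selection step is to score every candidate combination of base classifiers by its average pairwise diversity and rank the combinations. The sketch below is only an assumed illustration of that idea using the disagreement measure; the exact selection rule of DIV-KCRC is not restated in this section, and `mean_disagreement` and `rank_combinations` are hypothetical helpers.

```python
from itertools import combinations
import numpy as np

def mean_disagreement(correct, members):
    """Average pairwise disagreement among the chosen rows of `correct`.

    `correct` is a boolean array of shape (n_classifiers, n_samples) whose
    entry [i, s] is True when classifier i labels sample s correctly.
    """
    pairs = list(combinations(members, 2))
    return float(np.mean([np.mean(correct[i] != correct[j]) for i, j in pairs]))

def rank_combinations(correct, size):
    """Rank all `size`-member combinations from most to least diverse."""
    scored = [
        (mean_disagreement(correct, members), members)
        for members in combinations(range(len(correct)), size)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In practice the ranking would be combined with an accuracy check on the base classifiers, since the discussion above stresses that diversity helps only when the accuracy of the individual classifiers is maintained.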

D. Time Complexity Analysis
According to Tables VIII, X, and XII, DIV-KCRC significantly reduces the running time on the three data sets compared with the traditional bagging algorithm, which improves the computational efficiency. In other words, DIV-KCRC with diversity evaluation can effectively reduce the ensemble size and achieve better classification results with as few ensemble members as possible.

VI. CONCLUSION
In this article, an ensemble learning method with kernel CR and a multiple-kernel CR classifier ensemble based on diversity measures, i.e., KCRC-bagging and DIV-KCRC, are proposed for hyperspectral image classification. In the proposed methods, the diversity of the ensemble system is increased by using different training sets. In addition, a similarity evaluation index is used to screen the classifiers for the ensemble and reduce its size. Experimental results show that KCRC-bagging obtains better performance than a single classifier in most cases, and that the multiclassifier ensemble based on diversity measures (DIV-KCRC) not only improves the performance but also greatly improves the computing efficiency, which demonstrates that generating diversity and evaluating diversity are equally important in ensemble learning.