A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data

With the help of machine learning (ML) techniques, possible errors made by pathologists and physicians, such as those caused by inexperience, fatigue or stress, can be avoided, and medical data can be examined in a shorter time and in a more detailed manner. However, while conventional ML classification techniques achieve excellent accuracy when applied to medical diagnosis, they perform poorly on imbalanced datasets, especially in detecting the minority category. To tackle this shortcoming of conventional classification approaches, this study proposes a novel ensemble learning paradigm for medical diagnosis with imbalanced data, which consists of three phases: data pre-processing, base-classifier training and final ensemble. In the data pre-processing phase, we extend the Synthetic Minority Oversampling Technique (SMOTE) by integrating it with the cross-validated committees filter (CVCF) technique, which not only synthesizes minority samples and thereby balances the input instances, but also filters out noisy examples so that the subsequent classification performs well. In the classification phase, we introduce an ensemble support vector machine (ESVM) classification technique, which is constructed from SVM classifiers with multiple diverse structures and thus offers strong generalization performance and classification precision. In the final ensemble phase, we adopt a weighted majority voting strategy and introduce the simulated annealing genetic algorithm (SAGA) to optimize the weight vector and thereby enhance the overall classification performance.
The efficiency of the proposed ensemble learning method was tested on nine imbalanced medical datasets, and the experimental results clearly indicate that it outperforms other state-of-the-art classification models. We envision that the proposed ensemble learning paradigm can effectively facilitate medical decision making for physicians.

cancer cases are diagnosed too late; with accurate and early detection, however, more than 30% of these patients could be assured of long-term survival [4]. Consequently, it is of great significance to design an effective approach for the early detection of diseases so as to improve the healthcare of our society.
Generally, because machine learning (ML) techniques can effectively extract useful knowledge from large, complex, heterogeneous and hierarchical time-series clinical data, they have been widely utilized for medical diagnosis [5]-[8]. Moreover, with the help of ML techniques, possible medical errors of pathologists and physicians caused by inexperience, fatigue, stress and so on can be avoided, and medical data can be examined in a shorter time and in a more detailed manner [9]-[11]. To the best of our knowledge, the problem of medical diagnosis has commonly been cast as a classification problem: previous studies have applied various classification methods for medical diagnosis, such as neural networks, Naïve Bayes, KNN and SVM, and most of these models achieved excellent performance. However, these state-of-the-art classification models focus only on classification accuracy while neglecting the imbalanced character of the original input data. More specifically, when the input data is imbalanced, the classifier will be biased toward the majority class and the decision boundary will be pushed toward the minority-class samples [12], [21], [56], [57], as can be observed from Figure 1. Therefore, classification performance deteriorates dramatically, which is especially troublesome in medical diagnosis. Motivated by the above deficiency, in this work we concentrate on the binary classification problem and propose a novel ensemble learning paradigm for medical diagnosis with imbalanced data, which consists of three phases. In the first phase, we introduce the Synthetic Minority Oversampling Technique (SMOTE) integrated with the cross-validated committees filter (CVCF) technique for resampling the examples.
To the best of our knowledge, SMOTE has been proven to be superior to under-sampling; it increases the number of instances in the minority class by creating new synthetic instances rather than relying on replication [13], [14]. However, SMOTE focuses only on synthesizing minority instances while neglecting the presence of class noise. Motivated by this drawback, we introduce the CVCF noise-filtering technique to remove noisy examples and thereby construct the integrated SMOTE-CVCF technique for data pre-processing. CVCF is a committee-based filter technique that achieves excellent noise-filtering performance [15], [16]. In the second phase, we introduce an ensemble learning technique for classification. It is worth noting that ensemble learning, as one of the state-of-the-art technologies in machine learning, can generate more accurate classification results than a single classifier because it benefits from both the performance of the different classifiers and the diversity of their errors [17]-[20]. Nevertheless, when applying ensemble learning techniques, two main challenges must be considered: one is how to select diverse classifier members to form an ensemble, and the other is how to fuse the individual decisions of the base classifiers into a single decision result [21]. It is worth pointing out that the SVM is a widely utilized classifier that has been proven to be one of the most effective approaches for binary classification problems, with low algorithmic complexity and high robustness [9]. Owing to these advantages, SVMs have been widely utilized for classification. However, previous studies focused only on either tuning the SVM classifier's parameters or performing feature selection [22], which may lead to overfitting and cannot produce optimal results.
Motivated by this deficiency, in this work we apply SVM classification models with multiple diverse structures to construct the ensemble members. In the final phase, we introduce a weighted fusion strategy, which not only overcomes the shortcomings of majority voting but also measures the importance of each ensemble member in the final classification. To find the optimal weight vector, we apply the hybrid simulated annealing genetic algorithm (SAGA) for optimization. To the best of our knowledge, no study has been performed that diagnoses clinical diseases utilizing an ensemble of SVM classifiers with multiple diverse structures on imbalanced datasets. To fill this gap, we propose a novel ensemble learning paradigm for medical diagnosis based on imbalanced data and envision that it can serve as a useful intelligent diagnosis tool for medical decision makers. The main contributions of this work can be summarized as follows:
• A novel multi-stage ensemble learning paradigm is proposed for medical diagnosis based on imbalanced data. To the best of our knowledge, this is the first comprehensive ensemble learning technique employing SVMs with multiple diverse structures for classification. In addition, this is the first study in which the SAGA algorithm has been employed to explore the optimal weight vector for the weighted fusion strategy.
• We propose a novel data preprocessing strategy that introduces the SMOTE-CVCF integrated technique for data resampling. It not only overcomes the noise-related deficiency of SMOTE by removing noisy examples effectively, but also synthesizes minority instances to rebalance the input dataset, and thus performs well in the process of classification.
The rest of our paper is organized as follows. Section 2 presents related works on medical diagnosis. Section 3 introduces preliminaries of our study. Section 4 proposes the framework of our proposed method. Section 5 presents experimental analysis to validate the effectiveness of our proposed method. Finally, the conclusions of this research are summarized in Section 6.

II. RELATED WORK
In this section, we briefly review related work on imbalanced learning and on classification approaches in medical diagnosis.

A. RELATED WORK ON THE IMBALANCED LEARNING
Imbalanced learning techniques have drawn a lot of attention from both the pattern recognition and the machine learning communities [68], [69]. Reference [70] proposed a novel ensemble method that first converts the imbalanced dataset into multiple balanced datasets and then constructs a number of classifiers on these datasets with a specific classification algorithm. Reference [71] proposed a two-stage algorithm for imbalanced data classification: in the first stage, the algorithm generates a set of IGs utilizing meta-heuristic approaches, namely dynamic clustering with particle swarm optimization, genetic-algorithm K-means and artificial-bee-colony K-means together; in the second stage, a classifier is applied to classify the data. Reference [59] introduced a novel over-sampling technique that utilizes the real-valued negative selection method to generate artificial minority data; the generated minority data and the actual minority data are then combined with the original majority data as the input for classification. Reference [58] introduced a new self-adaptive cost-weights-based SVM cost-sensitive ensemble for imbalanced data classification, which not only applies cost-sensitive SVMs as basic weak learners but also modifies the standard boosting scheme into a cost-sensitive one; extensive experimental results verified the efficacy of their approach. In 2019, Reference [72] introduced a novel SMOTE-based class-specific extreme learning machine approach, which exploits the benefits of both minority oversampling and class-specific regularization. Reference [73] presented an effective oversampling method that combines k-means clustering and SMOTE; this combination avoids the generation of noise and effectively overcomes imbalance between and within classes.
In another study, Reference [74] investigated the use of heterogeneous ensembles for imbalanced learning, and the experimental results showed that heterogeneous ensembles provide significantly higher AUC and F1 scores than ensembles utilizing a single classification method. In these approaches, the imbalance problem is dealt with from two aspects: at the data level, related methods such as under-sampling or over-sampling are applied; at the algorithm level, improved algorithms, such as adjusting the weights, are adopted. However, they may not consider the noise problem before performing classification. Consequently, to deal with this issue, we apply the CVCF technique after rebalancing the data, which removes noise and thereby improves the final results.

B. RELATED WORK ON THE CLASSIFICATION APPROACHES IN MEDICAL DIAGNOSIS
1) MEDICAL DIAGNOSIS APPROACHES BASED ON A SINGLE CLASSIFIER
Owing to the advantages of ML techniques, many basic data mining models, such as artificial neural networks (ANNs), decision tree analysis, support vector machines (SVMs), Naïve Bayes and KNN, have been utilized for medical diagnosis. To the best of our knowledge, neural networks have the advantage of capturing correlations between attributes, so they have been widely utilized for medical diagnosis. Reference [23] developed a novel decision support algorithm to determine the most appropriate treatment method for rectal cancer survivors based on the Analytic Hierarchy Process (AHP) and sequential decision trees; the priorities of the sub-criteria were calculated by the AHP method, and a sequential decision tree was constructed for the best treatment decision. In 2016, Reference [24] proposed a novel SVM parameter tuning scheme that uses the fruit fly optimization algorithm (FOA). In their model, the FOA algorithm adjusts the parameters of the SVM effectively and thus outputs optimized results, which reduces the computational time and improves computational efficiency. In 2018, Reference [25] introduced a probabilistic neural network as a classification approach for lung carcinomas. This method utilizes a simple segmentation method and a probabilistic neural network to classify lung carcinoma, and is also capable of detecting low-contrast nodules and lung cancers of 20 mm or less in diameter. In another study, Reference [26] introduced a novel intelligent classification model for breast cancer diagnosis, which employs an information gain directed simulated annealing genetic algorithm wrapper (IGSAGAW) for feature selection and a cost-sensitive support vector machine (CSSVM) learning algorithm for classification. Reference [27] presented a Convolutional Neural Network Improvement for Breast Cancer Classification (CNNI-BCC).
Their algorithm can classify mammographic medical images into benign, malignant and healthy patients without prior information about the presence of a cancerous lesion. Reference [28] constructed a hybrid model based on an artificial neural network and fuzzy logic for cardiac arrhythmia classification. The hybrid model consists of two basic module units, each of which includes three different classifiers: the fuzzy KNN algorithm, a multilayer perceptron with gradient descent and momentum (MLP-GDM), and a multilayer perceptron with scaled conjugate gradient back propagation (MLP-SCG); the outputs of the classifiers are then combined by a fuzzy system to integrate the results. In another study, Reference [29] presented a novel heartbeat classification technique based on a deep convolutional neural network and a batch-weighted loss function. Their model not only performs the classification task well without noise removal or feature extraction, but also effectively quantifies the loss caused by imbalanced data through the batch-weighted loss function.

2) MEDICAL DIAGNOSIS APPROACHES BASED ON CLASSIFIER ENSEMBLE
To overcome the weaknesses of a single classifier, in recent years studies have increasingly focused on constructing ensemble models for medical diagnosis, and the empirical results demonstrate that ensemble models perform better than single models. In 2009, Reference [30] introduced a neural network ensemble classifier for heart disease diagnosis, which creates new models by combining the posterior probabilities or the predicted values from multiple predecessor models. Reference [12] proposed an integrated sampling technique, which incorporates both under-sampling and over-sampling to resample the input dataset, and then employed an ensemble of SVMs for classification. Reference [31] presented a boosted SVM approach to predict post-operative life expectancy in lung cancer patients; the weights of the SVM learning criterion were determined by the ensemble learning approach, which minimizes the external sequential error in the boosting loop. Reference [32] proposed a novel dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) method for breast cancer diagnosis, based on a mixture ensemble of convolutional neural networks, which is a modular, image-based ensemble that can stochastically partition the high-dimensional image space through simultaneous and competitive learning of its modules. Reference [33] proposed a novel nested ensemble learning technique for the automated diagnosis of breast cancer, which utilizes a combination of Stacking and Voting to distinguish benign breast tumors from malignant cancers; each nested ensemble classifier contains "Classifiers" and "Meta-Classifiers", and each "Meta-Classifier" comprises two or more different classification algorithms. In another study, Wang et al.
[11] designed a Weighted Area Under the Receiver Operating Characteristic Curve Ensemble (WAUCE) learning model for breast cancer diagnosis based on twelve different structures of SVM classifiers, obtaining the final results by weighting the area under the receiver operating characteristic curve. Reference [10] presented a novel technique for predicting types of kidney stones; the proposed ensemble learning method included different individual classifiers, each assigned a weight calculated by a genetic algorithm (GA), and finally introduced weighted majority voting to fuse the results of the classifiers. In 2019, Reference [34] proposed a two-stage model based on tree ensembles to predict the survival of colorectal cancer patients: in the first stage, an ensemble classification model predicts whether the patient survives, and in the second stage, another ensemble regression model predicts the remaining lifespan of those patients predicted to die in the first stage. This two-stage approach predicts patients' survival time precisely and overcomes the deficiencies of traditional prediction methods. In another study, Reference [35] proposed a novel stacking-based ensemble learning model for prostate cancer detection, which simultaneously constructs the diagnostic model and extracts interpretable diagnostic rules. To maximize classification accuracy and minimize the ensemble complexity of the model, they constructed a multi-objective optimization function and adopted the non-dominated sorting genetic algorithm-II (NSGA-II) to find the Pareto optimal solution.
In summary, numerous previous studies have demonstrated that ensemble learning achieves superior performance to single classifiers, as it leverages the strengths of the individual classifiers and thereby outputs optimal results [36]-[38]. Owing to these advantages, in this work we introduce an ensemble learning technique for medical diagnosis.

III. PRELIMINARIES OF OUR STUDY
This section presents some preliminaries of our study, including the SMOTE-CVCF integrated filter technique, different types of kernel-based SVMs and the SAGA algorithm.

A. SMOTE-CVCF INTEGRATED FILTER METHOD
As a simple and effective oversampling method, SMOTE demonstrates performance that is superior to random oversampling [3], [6], [39]. By means of SMOTE, the number of instances in the minority category is increased by creating new synthetic instances [40]. The basic assumption of SMOTE is that synthetic data points are generated on the line connecting a minority sample to one of its k nearest neighbors. The procedure is as follows:
Step 1: Calculate the distance between a feature vector in the minority category and one of its k nearest neighbors.
Step 2: Multiply the distance obtained in Step 1 by a random number between 0 and 1.
Step 3: Add the value obtained from Step 2 to the feature value of the original feature vector. Then, a novel feature vector is created by formula (1):

x_n = x_o + δ × (x_oi − x_o), (1)

where x_n denotes the novel synthetic minority sample, x_o denotes the feature vector of a sample in the minority category, x_oi denotes the i-th selected nearest neighbor of x_o, and δ denotes a random number between 0 and 1. As shown in Figure 2, SMOTE can synthesize minority instances and thereby rebalance the training dataset. However, when faced with noisy examples, SMOTE cannot perform well and may even reinforce the noise. To overcome this deficiency, in this work we first introduce the CVCF technique to filter noisy examples. CVCF is an effective noise-filtering technique proposed by Verbaeten and Assche [15]; it trains multiple classifiers with a single classification algorithm in a cross-validation scheme, and any example misclassified in all (or most) of the cross-validation rounds is regarded as noise and removed from the dataset [41]. The pseudo-code for the CVCF noise filter technique is shown in the following.
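As an illustration, the three SMOTE steps above can be sketched in a few lines of numpy. This is a minimal sketch with function names of our own choosing, not the authors' implementation, and it assumes the k-nearest-neighbor search has already been performed:

```python
import numpy as np

def smote_sample(x_o, x_oi, rng=None):
    """One SMOTE synthetic sample: interpolate between a minority
    point x_o and one of its k nearest minority neighbours x_oi,
    i.e. x_n = x_o + delta * (x_oi - x_o) with delta ~ U(0, 1)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    delta = rng.uniform(0.0, 1.0)       # Step 2: random number between 0 and 1
    return x_o + delta * (x_oi - x_o)   # Steps 1 and 3: scaled difference added

# the synthetic point always lies on the segment between the two originals
x_o = np.array([1.0, 1.0])
x_oi = np.array([3.0, 5.0])
x_n = smote_sample(x_o, x_oi)
```

Repeating this for randomly chosen minority points and neighbors until the classes are balanced reproduces the oversampling step; the CVCF filtering would then be applied to the enlarged dataset.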

B. KERNEL-BASED SVMs
In this work, our ensemble learning technique is constructed from two different structures of SVM classifiers (i.e., C-SVM and v-SVM) and five types of kernel functions (i.e., linear, polynomial, RBF, Laplacian and sigmoid kernels). Additional details are as follows. Given a training dataset D = {(x_i, y_i) : x_i ∈ χ^n, y_i ∈ γ, i = 1, 2, 3, . . . , τ}, where χ^n denotes the n-dimensional feature space and γ denotes the category label (in this work γ ∈ {−1, +1}), the dual form of the C-SVM model is presented below:

max_α Σ_{i=1}^{τ} α_i − (1/2) Σ_{i=1}^{τ} Σ_{j=1}^{τ} α_i α_j y_i y_j κ(x_i · x_j), s.t. Σ_{i=1}^{τ} α_i y_i = 0, 0 ≤ α_i ≤ C, (2)

where α_i denotes the Lagrange multiplier, κ(x_i · x_j) denotes the kernel function, and C denotes the regularization term, which is used to balance the structural risk and the empirical risk [11], [42], [43]. Additionally, in this work we also introduce another SVM structure, the v-SVM, whose dual form is

max_α −(1/2) Σ_{i=1}^{τ} Σ_{j=1}^{τ} α_i α_j y_i y_j κ(x_i · x_j), s.t. Σ_{i=1}^{τ} α_i y_i = 0, 0 ≤ α_i ≤ 1/τ, Σ_{i=1}^{τ} α_i ≥ v, (3)

where α_i denotes the Lagrange multiplier and κ(x_i · x_j) denotes the kernel function.
Herein v ∈ [0, 1] is the parameter that controls the upper bound on the fraction of margin errors, and it also determines the lower bound on the fraction of support vectors [44], [45]. From formulas (2) and (3), we can clearly observe that the structures of these two SVM-based models differ greatly.
Additionally, in order to construct diverse classifiers to form an ensemble, we also introduce five different types of kernel functions, as demonstrated in Table 1.
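For reference, the five kernel functions can be written directly in numpy; the hyper-parameter defaults below (gamma, degree, coef0) are illustrative assumptions of ours, not the values in Table 1. Pairing each kernel with the two SVM formulations yields the ten ensemble members used in this work:

```python
import numpy as np

# the five kernels, written for two sample vectors x and z
def linear(x, z):
    return x @ z

def polynomial(x, z, gamma=1.0, degree=3, coef0=1.0):
    return (gamma * (x @ z) + coef0) ** degree

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def laplacian(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum(np.abs(x - z)))

def sigmoid(x, z, gamma=0.5, coef0=0.0):
    return np.tanh(gamma * (x @ z) + coef0)

kernels = {'linear': linear, 'poly': polynomial, 'rbf': rbf,
           'laplacian': laplacian, 'sigmoid': sigmoid}

# pairing each kernel with the two SVM formulations gives 2 x 5 = 10 members
members = [(form, name) for form in ('C-SVM', 'v-SVM') for name in kernels]
```

Note that the RBF and Laplacian kernels differ only in the norm used (squared Euclidean versus L1), which is one source of the diversity exploited by the ensemble.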

C. SAGA HYBRID ALGORITHM
In this work, to find the optimal weight vector for the final ensemble, we introduce the SAGA hybrid algorithm. As an improved meta-heuristic approach, the hybrid SAGA algorithm overcomes the shortcomings of the simulated annealing (SA) algorithm and the genetic algorithm (GA) taken individually, and converges to the global optimum solution. GA has the advantage of searching for the optimal solution quickly, but it has the fatal shortcoming of being susceptible to getting trapped in local optima. Motivated by this deficiency, we introduce the SA algorithm to improve the GA, which effectively adjusts the annealing temperature during the iteration process and avoids becoming trapped in local optima [46]. Numerous empirical results have demonstrated that the hybrid SAGA algorithm exhibits performance superior to that of single algorithms such as Particle Swarm Optimization (PSO) or SA [13], [46]. Owing to these advantages, in this work we introduce this hybrid algorithm to optimize the weight vector. The detailed steps of the SAGA algorithm are as follows:
Step 1: Set the initial parameters of SAGA: maxgen = 200; sizepop = 50; crossover probability = 0.7; mutation probability = 0.05. Set the initial annealing temperature T_0 = 100, T_end = 1 and ξ = 0.8.
Step 2: Set T = T_0; create the initial population, and calculate the fitness of each individual.
Step 3: Set the initial generation to 0.
Step 4: Select the chromosome with the largest fitness for replication, then perform crossover and mutation.
Step 5: Then, generate a new population, and evaluate the fitness for each individual.
Step 6: Replace the least-fit individual with the new best individual, then judge whether gen < maxgen; if yes, return to Step 4; otherwise, execute the annealing operation.
Step 7: Judge whether T_i < T_end; if yes, output the optimal weight vector; otherwise, carry out the cooling operation T_{j+1} = ξ × T_j and return to Step 3. The overall flow chart of the SAGA algorithm is shown in Figure 3.
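Steps 1-7 can be sketched as a compact SA-GA hybrid. This is a toy illustration of the control flow (elitist GA inner loop, Metropolis-style acceptance, geometric cooling), not the authors' implementation; the fitness function at the end is a made-up stand-in for the ensemble accuracy:

```python
import numpy as np

rng = np.random.default_rng(42)

def saga(fitness, n_weights=10, pop_size=50, gens=200,
         cx_prob=0.7, mut_prob=0.05, t0=100.0, t_end=1.0, xi=0.8):
    """Toy SA-GA hybrid: an elitist GA inner loop (Steps 3-6)
    wrapped in a geometric cooling schedule (Step 7)."""
    pop = rng.random((pop_size, n_weights))          # Step 2: initial population
    t = t0
    while t > t_end:
        for _ in range(gens):
            fit = np.array([fitness(w) for w in pop])
            best = pop[fit.argmax()].copy()          # Step 4: elitist selection
            mates = pop[rng.integers(pop_size, size=pop_size)]
            cross = rng.random((pop_size, 1)) < cx_prob
            children = np.where(cross, 0.5 * (best + mates), mates)  # crossover
            mut = rng.random(children.shape) < mut_prob
            children[mut] += rng.normal(0.0, 0.1, mut.sum())         # mutation
            # SA-style acceptance: a worse child survives with prob exp(d / t)
            child_fit = np.array([fitness(w) for w in children])
            d = child_fit - fit
            accept = (d > 0) | (rng.random(pop_size) < np.exp(np.minimum(d, 0.0) / t))
            pop = np.where(accept[:, None], children, pop)
            pop[fit.argmin()] = best                 # Step 6: keep the elite
        t *= xi                                      # Step 7: T_{j+1} = xi * T_j
    fit = np.array([fitness(w) for w in pop])
    return pop[fit.argmax()]

# toy fitness whose optimum is the weight vector of all 0.8
w_opt = saga(lambda w: -np.sum((w - 0.8) ** 2), gens=30)
```

Early on, the high temperature lets many worse children survive (broad exploration); as T decays by the factor ξ, the acceptance rule becomes stricter and the search settles around the best weight vector, while elitism guarantees the best-so-far individual is never lost.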

IV. THE DESIGN OF THE PROPOSED METHOD
As noted before, an effective ensemble should consist not only of a set of models that are highly accurate, but also of models that make their errors on different parts of the input space. Thus, in our study, varying structures of SVMs are utilized by the members of the ensemble to promote this necessary diversity. In general, our proposed ensemble learning paradigm is structured in three consecutive stages, as can be observed from Figure 5.

B. ENSEMBLE PRUNING FOR CLASSIFICATION
In this work, we design SVM classifiers with multiple diverse structures to form an ensemble. Owing to their strong generalization performance and classification precision, SVMs have demonstrated performance superior to other conventional classification methods [12], [47]. Based on the different structures of the C-SVM and v-SVM models, we fully consider five different types of kernel functions and thereby form an ensemble classifier. As can be observed from Figure 4, our proposed SVM ensemble learning paradigm consists of two different structures of SVM classifiers and five different kernel functions. In particular, the diversity of the ensemble members relies mostly on the different choices of kernel function and SVM structure. For each SVM classifier, we employ the grid search approach to obtain the penalty parameter C and the kernel function parameter g. Furthermore, 10-fold cross-validation with five replications is utilized for training, yielding the final results of each classifier.
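The 10-fold split used to train each ensemble member can be sketched as follows. This is a minimal index-splitting helper of our own, not the authors' code; a full run would repeat it five times with different seeds for the five replications:

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

# ten folds over a toy dataset of 100 samples; each fold in turn serves
# as the validation set while the remaining nine are used for training
folds = kfold_indices(100, k=10)
```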

C. ENSEMBLE FUSION STRATEGY
It is well known that conventional ensemble fusion strategies, such as majority voting, weight the decision of each classifier equally and thereby neglect the influence of classifiers with low accuracy [11]. Consequently, to overcome this deficiency of majority voting, we introduce a weighted fusion strategy, which also has the advantage of accounting for the contribution of each classifier. The weighted fusion strategy is formulated as follows [10], [11]:

F[h_1^j(x), h_2^j(x), . . . , h_N^j(x)] = Σ_{i=1}^{N} w_i h_i^j(x), (4)

H(x) = sign( Σ_{i=1}^{N} w_i h_i^j(x) ). (5)

In formula (4), w_i represents the weight of each base classifier, and h_i^j(x) represents the decision result of the i-th classifier for the j-th pattern; F[·] denotes the ensemble fusion strategy. Formula (5) represents the weighted ensemble fusion strategy, where H(x) denotes the final result of the ensemble classifiers. Given the set (h_1^j(x); h_2^j(x); . . . ; h_N^j(x)) of output results of the classifiers, where N represents the number of classifiers and w_i is the corresponding weight of each classifier, the final ensemble learning result can be obtained from formulas (4) and (5). As noted before, one of the most critical issues in the weighted fusion strategy is determining the weight of each classifier [10]; how to determine the optimal weight vector and thereby output the optimal results is an imperative issue to solve. To the best of our knowledge, the improved meta-heuristic SAGA algorithm has the advantage of fast convergence to the global optimal solution [46]. In this regard, we introduce the SAGA algorithm to optimize the weight vector: SAGA finds an optimal weight vector w that measures the importance of each SVM classifier in the final classification.
The final classification results can be obtained by a simple linear combination of the decision values of the SVMs with the weight vector. In this way, each individual of the GA population is represented as a vector containing the weights of the classifier members:

w = (w_1, w_2, . . . , w_n), (6)

where n represents the number of SVMs; in our study, we take full account of the two different structures (C-SVM and v-SVM) and the five different types of kernel functions, so we set n = 10. When applying the SAGA algorithm to optimize the weight vector, the most important consideration is the fitness function. For this task, the fitness function of our SAGA is the accuracy of the combined classifiers under the given weights:

fitness(w) = (1/m) Σ_{j=1}^{m} I( sign( Σ_{i=1}^{n} w_i h_i^j(x) ) = y_j ), (7)

where w_i is the weight of the i-th classifier, h_i^j(x) represents the output of the i-th classifier for the j-th pattern, m is the number of patterns, y_j is the true label of the j-th pattern, and I(·) is the indicator function. In the GA, a population of 300 individuals is initially created using random weights for each individual. In each generation, the fitness function is evaluated for each individual of the population, and the population is sorted. Figure 6 shows the crossover and mutation steps of the GA, where each column represents the weight corresponding to one classifier.
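For the binary case with labels in {−1, +1}, the weighted fusion and the accuracy-based fitness described above can be sketched as follows. This is a minimal sketch of ours; the decision matrix H below is made-up illustrative data, not results from the paper:

```python
import numpy as np

def weighted_vote(H, w):
    """Weighted fusion for binary labels: H is an (n_classifiers,
    n_samples) matrix of +/-1 decisions and w the weight vector;
    the ensemble label is the sign of the weighted sum of decisions."""
    return np.sign(w @ H)

def fitness(w, H, y):
    """SAGA fitness: accuracy of the weighted ensemble under weights w."""
    return float(np.mean(weighted_vote(H, w) == y))

# three classifiers, four samples; the third classifier (weight 0)
# is effectively ignored by the fusion
H = np.array([[ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [-1, -1,  1,  1]])
y = np.array([1, 1, -1, -1])
w = np.array([0.7, 0.3, 0.0])
```

Passing `lambda w: fitness(w, H, y)` to the SAGA optimizer would then search for the weight vector maximizing ensemble accuracy.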

V. EXPERIMENTAL ANALYSIS
To examine the efficacy and rationality of our proposed ensemble learning paradigm, we conduct an empirical analysis on the MATLAB 2016a platform. The experiments were executed on a Windows 10 host with an Intel(R) Core(TM) i5-8250U CPU at 1.80 GHz (x64) and 16 GB of RAM.

A. DATASETS
To evaluate the performance of the proposed ensemble learning approach on imbalanced medical datasets, we introduce nine imbalanced datasets with different imbalance ratios (IR). The value of IR is computed as the ratio between the number of instances belonging to the minority category and the number of samples belonging to the majority category. These imbalanced medical datasets come from the UCI machine learning repository and the KEEL dataset repository. A detailed description of these datasets is presented in Table 2.
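Computing the IR as defined above is a one-liner; the label convention below (minority class coded as 1) is an assumption for illustration:

```python
import numpy as np

def imbalance_ratio(y, minority_label=1):
    """IR as defined above: minority count divided by majority count."""
    n_minority = int(np.sum(y == minority_label))
    n_majority = y.size - n_minority
    return n_minority / n_majority

# toy label vector: 90 majority (0) and 10 minority (1) samples
y = np.array([0] * 90 + [1] * 10)
```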

B. EVALUATION MEASURES
To evaluate the performance of our proposed hybrid algorithm, precision, recall, G-mean, F-measure and AUC are utilized as evaluation measures for the imbalanced data. The AUC represents the area under the ROC curve; it provides a single-number summary of the performance of a learning algorithm and is one of the best methods for comparing classifiers in two-class problems, especially in imbalanced learning [48]-[50]. The G-mean [51] is the geometric mean of the true positive rate (TPR) and the true negative rate (TNR), and has been widely utilized as an evaluation measure for imbalanced data. The calculation formulas are as follows:

Precision = TP / (TP + FP)
Recall = TPR = TP / (TP + FN)
TNR = TN / (TN + FP)
F-measure = 2 × Precision × Recall / (Precision + Recall)
G-mean = sqrt(TPR × TNR)

The evaluation measures listed above are based on the confusion matrix, which is shown in Table 3.
In Table 3, class 0 denotes the absence of disease and class 1 denotes the presence of disease. TP is the number of true positives, i.e., cases correctly categorized in the 'positive' class; FN is the number of false negatives, i.e., 'positive' cases classified as 'negative'; TN is the number of true negatives, i.e., cases correctly categorized in the 'negative' class; and FP is the number of false positives, i.e., 'negative' cases classified as 'positive'. In this work, we set positive cases as the majority category and negative cases as the minority category. Additionally, to evaluate the performance of the ensemble classification models, we employ 10-fold cross-validation, and the overall performance of a classifier is calculated by averaging its performance over the 10 subsets.
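The confusion-matrix-based measures can be computed directly from the four counts of Table 3; the counts below are made-up illustrative numbers:

```python
from math import sqrt

def imbalance_metrics(tp, fn, tn, fp):
    """Precision, recall, F-measure and G-mean from the four
    confusion-matrix counts of Table 3."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate (TPR)
    tnr = tn / (tn + fp)               # true negative rate (TNR)
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = sqrt(recall * tnr)        # geometric mean of TPR and TNR
    return precision, recall, f_measure, g_mean

p, r, f, g = imbalance_metrics(tp=40, fn=10, tn=45, fp=5)
```

Because the G-mean collapses whenever either class rate approaches zero, it penalizes classifiers that sacrifice the minority class, which is exactly why it is favored for imbalanced data.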

C. EXPERIMENTAL DESIGN
To demonstrate the superior performance of our proposed ensemble learning paradigm, we design the experiment from two aspects. In the first aspect, we compare the effect of our proposed ensemble learning approach with that of other state-of-the-art classification models; to the best of our knowledge, these models have been considered state-of-the-art classifiers and have proved excellent in previous studies. The details of these classification models can be found in Table 4. In the second aspect, in order to verify the robustness of our proposed approach, we comprehensively compare the results it obtains on the nine imbalanced medical datasets with those of the other state-of-the-art classification models. In our comparative experiments, all SVM classifiers employ the RBF kernel function, and the penalty parameter C and the kernel parameter g are determined by grid search: C varies from 2^(-5) to 2^(15) with an exponent step of 0.2, and g varies from 2^(-9) to 2^(3) with an exponent step of 0.1. For each comparative experiment, we repeat the experiment ten times and compute the average values.
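The grid search described above can be sketched as an exhaustive scan over the exponents of C and g. The evaluation function below is a toy stand-in for the real mean cross-validated accuracy of an RBF SVM, so the sketch stays self-contained; in practice it would train and validate a classifier at each grid point:

```python
import itertools

def grid_search(evaluate, c_exps, g_exps):
    """Exhaustive search over RBF-SVM hyper-parameters C = 2**ce, g = 2**ge.
    `evaluate(C, g)` is assumed to return a score to maximize."""
    best = None
    for ce, ge in itertools.product(c_exps, g_exps):
        C, g = 2.0 ** ce, 2.0 ** ge
        score = evaluate(C, g)
        if best is None or score > best[0]:
            best = (score, C, g)
    return best

# exponent grids matching the ranges in the text:
# C in [2^-5, 2^15] with step 0.2, g in [2^-9, 2^3] with step 0.1
c_exps = [x / 10 for x in range(-50, 151, 2)]
g_exps = [x / 10 for x in range(-90, 31, 1)]

# toy stand-in score, peaked at C = 2, g = 0.5
toy = lambda C, g: -((C - 2) ** 2 + (g - 0.5) ** 2)
score, C, g = grid_search(toy, c_exps, g_exps)
print(C, g)  # 2.0 0.5
```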

D. EXPERIMENTAL RESULTS AND ANALYSIS
The confusion matrices produced by our proposed ensemble learning paradigm on the different imbalanced medical datasets are presented in Table 5. Additionally, a working example explaining the detailed steps of our proposed method is provided in the Appendix.

1) COMPARISON WITH OTHER STATE-OF-THE-ART CLASSIFICATION METHODS
In this subsection, we compare our proposed novel ensemble learning paradigm with other state-of-the-art classification models. For each imbalanced medical dataset, we conduct a comparison experiment, and the results are presented in Table 6.
The ensemble model runs on each test instance individually, and each instance in the test set is classified into either absence or presence of the disease. We employed 10-fold cross-validation, and the average performance over the folds is calculated and analyzed to verify the superior performance of our proposed approach. As can be seen in Table 6, precision, recall, F-measure, G-mean and AUC are used as the evaluation measures. Table 6 compares the results of the proposed ensemble approach with those of the other state-of-the-art methods on the different imbalanced medical datasets. It can be observed that the proposed ensemble learning paradigm achieves higher precision, recall, F-measure, G-mean and AUC on all the imbalanced medical datasets in comparison with the other state-of-the-art classification models.

2) ROBUSTNESS ANALYSIS OF DIFFERENT CLASSIFICATION APPROACHES
In order to further verify the superiority of our proposed ensemble learning paradigm, we also compared the robustness of our proposed ensemble learning method with the other state-of-the-art models.
According to ref. [75], the relative performance of algorithm n (n represents one of the eleven algorithms) on a certain benchmark dataset can be measured by the following ratios:

a_n = MPrecision_n / max_n MPrecision_n (12)
b_n = MGMean_n / max_n MGMean_n (13)
c_n = MFmeasure_n / max_n MFmeasure_n (14)
d_n = MRecall_n / max_n MRecall_n (15)
e_n = MAUC_n / max_n MAUC_n (16)

where MPrecision_n, MGMean_n, MFmeasure_n, MRecall_n and MAUC_n denote the mean precision, G-mean, F-measure, recall and AUC obtained by algorithm n with different imbalance ratios and classifiers on a certain dataset, respectively. According to formulas (12)∼(16), the relative performance of the best-performing algorithm n* on a certain medical dataset equals 1, while the relative performance of the other algorithms is less than 1. Larger a_n, b_n, c_n, d_n, e_n values indicate better relative performance of algorithm n. Based on the above analysis, we can conclude that the larger the sum of a_n, b_n, c_n, d_n, e_n is, the better the robustness of the algorithm is [59]. We calculated the a_n, b_n, c_n, d_n, e_n values of the eleven algorithms on all medical datasets, and the results are shown in Figure 7∼Figure 11, where the numerical value labeled at the top of each histogram represents the sum of the a_n (b_n, c_n, d_n, e_n) values of the corresponding algorithm over the imbalanced medical datasets. From these figures we can clearly see that our proposed ensemble learning paradigm always has the maximum sum values, which are 7.847, 7.396, 7.787, 7.968 and 7.833, respectively. Based on the above analysis, we can draw the conclusion that our proposed novel ensemble learning paradigm obtains superior performance compared with the other state-of-the-art classification models in terms of precision, recall, F-measure, G-mean and AUC.
Moreover, it also obtains the best robustness among the eleven compared algorithms.
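The relative-performance ratios of formulas (12)∼(16) amount to normalizing each algorithm's mean score by the best score over all algorithms. A minimal sketch, with hypothetical algorithm names and scores:

```python
def relative_performance(scores):
    """Normalize each algorithm's mean score by the best score, as in
    formulas (12)-(16): the best algorithm gets 1.0, all others less."""
    best = max(scores.values())
    return {name: s / best for name, s in scores.items()}

# hypothetical mean G-means for three algorithms on one dataset
b = relative_performance({"ours": 0.92, "A": 0.85, "B": 0.80})
print(b["ours"])  # 1.0 for the best-performing algorithm
```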

VI. DISCUSSION
The aim of this study is to propose a novel ensemble learning paradigm for medical diagnosis that performs at the same level as or better than the other state-of-the-art comparison methods. In this section, we provide an in-depth discussion of the performance of our proposed ensemble learning paradigm.
In this regard, an extensive empirical analysis has been carried out on nine imbalanced medical datasets. In order to evaluate the performance of our proposed ensemble learning paradigm, we employ five evaluation measures (i.e., precision, recall, G-mean, F-measure and AUC). In Table 6, we report the comparison of the proposed ensemble approach with the other state-of-the-art methods on the different imbalanced medical datasets. In general, we draw the conclusion that selecting SVM classifiers with diverse structures to construct the ensemble learning paradigm outperforms the other state-of-the-art classification models. From the results we can infer that adopting two types of SVM structures with five different kernel functions in our ensemble increases not only the structural diversity of the ensemble model but also its parameter diversity. Moreover, we applied an improved weighted ensemble mechanism that also considers the contribution of good base classifiers and thereby improves the final results. Finally, the experimental analysis indicates that our proposed method achieves higher precision, recall, F-measure, G-mean and AUC on all the imbalanced medical datasets in comparison with the other state-of-the-art classification models. This indicates that the proposed ensemble classifier can improve the classification accuracy of medical disease diagnosis, which confirms the initial objective of this research.
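The weighted fusion step can be illustrated with a small sketch. The weights below are fixed by hand purely for illustration, whereas in our method they are tuned by SAGA; the function name and the example predictions are hypothetical:

```python
def weighted_vote(predictions, weights):
    """Weighted majority vote over base-classifier outputs (labels 0/1).
    Each classifier's vote is scaled by its weight; the sign of the
    weighted sum decides the final label."""
    score = sum(w * (1 if p == 1 else -1) for p, w in zip(predictions, weights))
    return 1 if score > 0 else 0

# five base SVMs disagree; the higher-weighted classifiers dominate the vote
preds = [1, 0, 1, 0, 0]
weights = [0.30, 0.10, 0.35, 0.15, 0.10]
print(weighted_vote(preds, weights))  # 1
```

Note that a plain (unweighted) majority vote over the same predictions would return 0, which is why weighting by base-classifier quality can change, and ideally improve, the final decision.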
Moreover, regarding the robustness of the different algorithms, we calculated the a_n, b_n, c_n, d_n, e_n values of the eleven algorithms on all medical datasets, and the results are shown in Figure 7∼Figure 11. As can be observed from these results, our proposed ensemble learning paradigm achieves the best robustness among the eleven compared algorithms. From the results, we can infer that our proposed ensemble learning paradigm can effectively avoid generating the noisy instances that often occur with other oversampling methods, and can effectively rebalance the input data. From the empirical results, we can conclude that, on the one hand, our proposed approach effectively solves the problems encountered by other over-sampling methods and obtains the best robustness; on the other hand, as the results in Table 6 show, it also achieves the best performance in terms of precision, recall, G-mean, F-measure and AUC. The analysis described above thus verifies the effectiveness of our proposed ensemble learning paradigm.
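A simplified sketch of SMOTE-style oversampling, which synthesizes new minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. This is a bare-bones illustration of the oversampling idea only; our SMOTE-CVCF pipeline additionally filters noisy examples with the cross-validated committees filter, a step omitted here:

```python
import random

def smote(minority, n_new, k=3, rng=random.Random(0)):
    """Simplified SMOTE: for each synthetic sample, pick a random minority
    point, find its k nearest minority neighbours (squared Euclidean
    distance), and interpolate toward a randomly chosen neighbour."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: dist(x, m))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + lam * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# four hypothetical 2-D minority samples, doubled by synthesis
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new = smote(minority, n_new=4)
print(len(new))  # 4
```

Because each synthetic point lies on a segment between two existing minority points, it stays inside the minority region, which is the property that helps rebalance the data without scattering samples into the majority class.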

VII. CONCLUSION
Medical diagnosis plays an important role in the healthcare system of our society, and the most important aspect is that the diagnostic results directly affect the patient's treatment and safety. Extracting valuable knowledge for medical decision making can make our healthcare community better [64]-[67]. In this regard, we proposed a novel ensemble learning paradigm for medical diagnosis with imbalanced data, which consists of three phases: data pre-processing, training base classifiers and final ensemble. First, we introduce the integrated SMOTE-CVCF filter technique for data pre-processing, which can not only filter the noisy examples but also rebalance the input dataset, and thus performs well in the process of classification. Then, in the next phase, the C-SVM and the ν-SVM with five kernel functions are utilized to increase the diversity of the ensemble model. In the last phase, we adopt the weighted fusion strategy and, in order to obtain the optimum weight vector, introduce the SAGA algorithm to optimize the weight vector and improve the reasonableness of our ensemble fusion strategy. To evaluate the performance of our proposed method, nine benchmark imbalanced medical datasets are introduced, and the empirical results on these datasets demonstrate that our proposed ensemble learning paradigm achieves superior performance compared with the other state-of-the-art classification models. To the best of our knowledge, this is the first study that employs multiple diverse structures of SVM classifiers to form an ensemble for medical diagnosis with imbalanced data. The main objective of this work is to apply our proposed ensemble learning paradigm in a clinical disease diagnostic system and thereby facilitate clinicians in making high-quality and effective decisions in the future.

APPENDIX A WORKING EXAMPLE OF OUR PROPOSED ENSEMBLE CLASSIFIER
The working of the proposed ensemble learning paradigm can be demonstrated by the following example:

She has published more than 30 articles in international journals, including the Journal of Cleaner Production and the Journal of General Management. She has presented her research at several national and international conferences, including IEEE, POMS, and INFORMS. Her current research interests include innovation management, knowledge management, and intelligent decision. She is a member of AOM and IACMR.

ERSHI QI received the Ph.D. degree from the College of Management Science, Tianjin University, Tianjin, China, in 1992. He is currently a Professor with the Department of Management Science and Engineering, College of Management and Economics, Tianjin University. He is also the Dean of the College of Management Science. He has published several books/textbooks in the broader area of production management. He is often invited to national and international conferences for keynote addresses on topics related to intelligent decision, production system optimization, healthcare analytics, and decision support systems. He has published more than 100 articles in international journals, including Information Processing & Management, Mathematical Problems in Engineering, EJOR, Omega, Industrial Management and Data Systems, Kybernetes, the Journal of Applied Sciences, Information Technology Journal, and Sustainability (Switzerland). He has presented his research at several national and international conferences, including CIE, POMS, and INFORMS annual meetings. His current research interests include intelligent decision, machine learning, imbalanced data learning, production system optimization, ensemble models, and application of intelligent decision methods in the healthcare domain. He regularly serves and chairs tracks for various production management and analytics conferences, and serves on several academic journals as editor-in-chief, senior editor, associate editor, and editorial board member.

MAN XU
University. She has presided over a number of national, provincial, and ministerial projects and has published more than 20 articles. Her current research interests include knowledge management and intelligent decision.

BO GAO is currently pursuing the master's degree in intelligent decision and machine learning with the School of Computer Science and Technology, Anhui University, Hefei, China. As a Research Assistant, he has participated in research projects in a variety of fields, including database modeling and analysis, data visualization, intelligent decision, and application of intelligent decision methods in the healthcare domain. His current research interests include machine learning, imbalanced data learning, and ensemble models.