Selective Ensemble Learning Method for Belief-Rule-Base Classification System Based on PAES

Abstract: Traditional Belief-Rule-Based (BRB) ensemble learning methods integrate all of the trained sub-BRB systems to obtain better results than a single belief-rule-based system. However, as the number of BRB systems participating in ensemble learning increases, a large number of redundant sub-BRB systems are generated because of the diminishing difference between subsystems. This drastically decreases the prediction speed and increases the storage requirements for BRB systems. In order to solve these problems, this paper proposes BRBCS-PAES: a selective ensemble learning approach for BRB Classification Systems (BRBCS) based on Pareto-Archived Evolutionary Strategy (PAES) multi-objective optimization. This system employs the improved Bagging algorithm to train the base classifiers. To increase the degree of difference among the base classifiers to be integrated, the training sets are constructed by repeated sampling of the data. In the base classifier selection stage, the trained base classifiers are binary coded, and the number of base classifiers participating in the integration and the generalization error of the base classifiers are used as the objective functions for multi-objective optimization. Finally, the elite retention strategy and the adaptive grid algorithm are adopted to produce the PAES optimal solution set. Three experimental studies on classification problems are performed to verify the effectiveness of the proposed method. The comparison results demonstrate that the proposed method can effectively reduce the number of base classifiers participating in the integration and improve the accuracy of BRBCS.


Introduction
In 2006, for the modelling of data characterized by incompleteness, fuzzy uncertainty, probability uncertainty, and non-linearity, Yang et al. [1] extended the evidence-based reasoning algorithm to propose their belief Rule-base Inference Methodology using the Evidential Reasoning approach (RIMER). RIMER is composed of a knowledge base and a reasoning machine, and was developed on the basis of fuzzy logic theory [2], the Dempster-Shafer theory [3,4], and traditional If-then rules [1]. A Belief-Rule-Base (BRB) system is an expert system that adds a confidence distribution to the If-then rule. After the construction of the BRB, quantitative information or qualitative knowledge is input for reasoning and analysis, ultimately to provide an informative basis for decision making.

Wanling Liu, Weikun Wu, Yanggeng Fu, and Yanqing Lin are with the School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China. E-mail: 380509981@qq.com; wwk91@qq.com; ygfu@qq.com; 765305442@qq.com. Yingming Wang is with the Institute of Decision Sciences, Fuzhou University, Fuzhou 350116, China. E-mail: ymwang@fzu.edu.cn. To whom correspondence should be addressed. Manuscript received: 2019-03-24; revised: 2019-04-09; accepted: 2019-04-11.
Present research on BRB systems is mainly focused on the use of a single BRB system. However, the use of a single BRB system has some limitations. Its reasoning performance is affected by the parameter values, and where the training set is unevenly distributed or the amount of data is small, parameter training can be insufficient. Therefore, the decision information provided by the reasoning results suffers from locality. In 2016, Wu et al. [5] introduced the Bagging and AdaBoost algorithms. Their approach uses the accelerated gradient method [6] to train the parameters of a single BRB system, and then integrates multiple sub-BRB systems with ensemble learning methods to improve the reasoning ability. In ensemble learning, a common approach is to integrate all of the trained learning machines in order to obtain better results; ensemble learning produces better results than using a single BRB system in isolation. However, as the number of individuals participating in ensemble learning increases, the sub-BRB systems begin to include a large number of redundant base learning machines because of the decrease in individual differences. This results in a noticeable decrease in prediction speed and a dramatic increase in storage overhead, ultimately reducing the effective generalization ability of the system.
In response to these deficiencies, this paper proposes BRBCS-PAES, a selective ensemble learning method for Belief-Rule-Base Classification Systems (BRBCS) based on the Pareto-Archiving Evolution Strategy (PAES). The improved Bagging algorithm is used to train the base classifiers, and the training sets are constructed by repeated sampling of the data, thereby increasing the degree of difference among the base classifiers to be integrated. In the base classifier selection stage, the trained base classifiers are binary coded (with 1 meaning participation in the integration and 0 meaning no participation), and the number of base classifiers participating in the integration and the generalization error of the base classifiers are used as the objective functions for multi-objective optimization. An elitist retention strategy and an adaptive grid archiving strategy are employed to iteratively arrive at the PAES optimal solution set. Three sets of classification data from UCI (University of California, Irvine) are used to verify the effectiveness of the proposed method.
The rest of the paper is organized as follows: Section 2 briefly reviews the basics of BRB, multi-objective optimization, and selective ensemble learning, and reviews some related works; Section 3 introduces the belief rule-base classification system of selective ensemble learning; Section 4 discusses three case studies to demonstrate the efficiency of the proposed method; and Section 5 concludes the paper.

Belief rule-base and RIMER method
The belief rules in the BRB are extensions of If-then rules [7], adding the distributed confidence frame, the antecedent attribute weight, and the rule weight. The k-th belief rule [8] can be written as follows:

$$R_k:\ \text{if } A_1^k \wedge A_2^k \wedge \cdots \wedge A_{T_k}^k,\ \text{then } \{(D_1, \beta_{1,k}), (D_2, \beta_{2,k}), \ldots, (D_N, \beta_{N,k})\} \tag{1}$$

where $R_k\ (k = 1, 2, \ldots, L)$ represents the k-th rule and $L$ represents the total number of rules; $A_i^k\ (i = 1, 2, \ldots, T_k)$ represents the antecedent attribute reference value of the i-th attribute of the k-th rule, and $T_k$ represents the number of attributes in the k-th rule; $D_j\ (j = 1, 2, \ldots, N)$ represents a set of rule result evaluation levels, where $N$ is the set size; and $\beta_{j,k}\ (j = 1, 2, \ldots, N;\ k = 1, 2, \ldots, L)$ represents the belief degree of the result of the k-th rule on the j-th evaluation level $D_j$. If $\sum_{j=1}^{N} \beta_{j,k} = 1$, then the k-th rule contains complete information; otherwise, the information in the k-th rule is incomplete. $\theta_k\ (k = 1, 2, \ldots, L)$ is the rule weight of the k-th rule, the antecedent attribute weight is $\delta_{k,i}\ (k = 1, 2, \ldots, L;\ i = 1, 2, \ldots, T_k)$, and "$\wedge$" expresses the logical conjunction (And operator).
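To make the structure of Eq. (1) concrete, the following minimal sketch stores one belief rule as a small record and checks the completeness condition on its belief degrees. The class name `BeliefRule` and all field names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BeliefRule:
    """One belief rule R_k (illustrative field names, not from the paper)."""
    antecedents: List[float]                    # reference values A_1^k ... A_{T_k}^k
    beliefs: List[float]                        # belief degrees beta_{1,k} ... beta_{N,k}
    rule_weight: float = 1.0                    # theta_k
    attr_weights: Optional[List[float]] = None  # delta_{k,1} ... delta_{k,T_k}

    def is_complete(self, tol: float = 1e-9) -> bool:
        """A rule carries complete information when its beliefs sum to 1."""
        return abs(sum(self.beliefs) - 1.0) <= tol


complete_rule = BeliefRule(antecedents=[0.2, 0.5], beliefs=[0.7, 0.3, 0.0])
incomplete_rule = BeliefRule(antecedents=[0.2, 0.5], beliefs=[0.6, 0.3, 0.0])
print(complete_rule.is_complete(), incomplete_rule.is_complete())
```

The tolerance in `is_complete` is an assumption added to absorb floating-point rounding when the belief degrees are summed.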
The RIMER method is at the core of a BRB system [9], and consists of three main steps [10]: (1) calculate the activation weight; (2) amend the belief degree; and (3) use an Evidential Reasoning (ER) algorithm to synthesize the activated rules [11].
Big Data Mining and Analytics, December 2019, 2(4): 306-318

The calculation of the activation weight depends on the input data, the antecedent attribute weight, and the rule weight [12]. Before calculating the activation weight, we need to calculate the individual matching degree of the antecedent attribute for each reference value. Assuming that the BRB input $x_i\ (i = 1, 2, \ldots, M)$ is in numeric form, then according to utility information conversion [7], the matching degree $\alpha_i^j$ of the i-th input relative to the reference value in the k-th rule is calculated from $x_i$ and $A_i^k\ (i = 1, 2, \ldots, T_k)$. The activation weight $\omega_k$ of the k-th rule is then calculated, with $\omega_k \in [0, 1],\ k = 1, 2, \ldots, L$. When the input data contains fuzzy or uncertain data, we need to amend the belief degree of each evaluation level of the result portion, which yields a corrected belief degree $\bar{\beta}_{i,k}$ for the i-th evaluation grade $D_i$ of the k-th rule. If the input data is complete, then $\bar{\beta}_{i,k} = \beta_{i,k}$. For the ER algorithm, Wang et al. [13] proposed the ER analytical algorithm to combine all the rules in a BRB; the output $f(x)$ of the BRB can be expressed as $f(x) = \{(D_j, \beta_j),\ j = 1, 2, \ldots, N\}$, where $\beta_j$ represents the belief degree relative to the evaluation result $D_j$. Assuming $\mu(D_n)$ is the utility value of the n-th evaluation level $D_n$, the final numerical output of the BRB system is obtained by weighting these utility values with the belief degrees $\beta_n$.
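The three inference steps can be sketched in code. The functions below follow the forms of matching degree, activation weight, and analytical ER combination that are commonly given in the RIMER literature; they are an assumption about the general shape of these formulas and may differ in detail from the equations of the original paper. All function names are hypothetical, and the belief-amendment step for incomplete input is omitted.

```python
from math import prod


def matching_degrees(x, refs):
    """Distribute a numeric input x over the sorted reference values refs."""
    alpha = [0.0] * len(refs)
    if x <= refs[0]:
        alpha[0] = 1.0
    elif x >= refs[-1]:
        alpha[-1] = 1.0
    else:
        for j in range(len(refs) - 1):
            lo, hi = refs[j], refs[j + 1]
            if lo <= x <= hi:
                alpha[j] = (hi - x) / (hi - lo)
                alpha[j + 1] = 1.0 - alpha[j]
                break
    return alpha


def activation_weights(alpha, theta, delta):
    """alpha[k][i]: matching degree of the i-th antecedent of rule k.

    Attribute weights are normalized by the largest weight in each rule
    (assumed to be positive) before being used as exponents.
    """
    raw = []
    for a_k, th_k, d_k in zip(alpha, theta, delta):
        d_max = max(d_k)
        raw.append(th_k * prod(a ** (d / d_max) for a, d in zip(a_k, d_k)))
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


def er_combine(omega, beta):
    """Analytical ER combination; beta[k][j] is the belief of rule k in D_j."""
    n = len(beta[0])
    rest = prod(1.0 - w * sum(b) for w, b in zip(omega, beta))
    terms = [prod(w * b[j] + 1.0 - w * sum(b) for w, b in zip(omega, beta))
             for j in range(n)]
    mu = 1.0 / (sum(terms) - (n - 1) * rest)
    denom = 1.0 - mu * prod(1.0 - w for w in omega)
    return [mu * (t - rest) / denom for t in terms]


# Two rules over one attribute with reference values 0 and 1.
degrees = matching_degrees(0.25, [0.0, 1.0])
omega = activation_weights([[degrees[0]], [degrees[1]]], [1.0, 1.0], [[1.0], [1.0]])
beta = er_combine(omega, [[1.0, 0.0], [0.0, 1.0]])
utilities = [0.0, 1.0]                      # assumed utility values of D_1, D_2
output = sum(u * b for u, b in zip(utilities, beta))
print(omega, beta, output)
```

In this toy setting both rules are complete, so the combined belief degrees sum to one; the final line computes a numerical output as the utility-weighted sum of the belief degrees, using utility values invented for the example.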

Multi-objective optimization
Multi-Objective Optimization Problems (MOPs) have two or more objective functions [14,15]; they can be stated as follows:

$$y = F(x) = (f_1(x), f_2(x), \ldots, f_n(x)),$$
$$\text{s.t. } g_i(x) \le 0,\ i = 1, 2, \ldots, k_g;\quad h_j(x) = 0,\ j = 1, 2, \ldots, k_h,$$

where $x = (x_1, x_2, \ldots, x_m) \in X \subseteq \mathbb{R}^m$ and $y = (y_1, y_2, \ldots, y_n)$. In the above formula, $x = (x_1, x_2, \ldots, x_m)$ is the m-dimensional vector of decision parameters, $X$ is the decision space, $y = (y_1, y_2, \ldots, y_n)$ is the n-dimensional target variable, and $F(x)$ is the objective function mapping the m-dimensional decision space to the n-dimensional target space. $g_i(x) \le 0$ comprises $k_g$ inequality constraints and $h_j(x) = 0$ comprises $k_h$ equality constraints; let $X_f$ denote the set of decision parameters $x$ that satisfy all the constraints.

Definition 1: Pareto dominance. Suppose $x_A, x_B \in X_f$ are two solutions of $F(x) = (f_1(x), f_2(x), \ldots, f_n(x))$. For $x_B$ to Pareto-dominate $x_A$ ($x_B \succ x_A$), two conditions need to be satisfied: (1) $\forall i = 1, 2, \ldots, n$: $f_i(x_B) \ge f_i(x_A)$, that is, in all objective functions, $x_B$ is not worse than $x_A$; (2) $\exists i = 1, 2, \ldots, n$: $f_i(x_B) > f_i(x_A)$, that is, $x_B$ is better than $x_A$ in at least one objective function.
Definition 2: Pareto optimal solution. If a solution $x^* \in X_f$ satisfies $\neg \exists x \in X_f: x \succ x^*$, it is called a Pareto optimal solution. In an MOP, a solution is actually an approximation set of candidate solutions which offer trade-offs between the multiple objectives, where an improvement in one objective value will result in a decline in one or more of the others [15].
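Definitions 1 and 2 translate directly into a dominance test and a filter for non-dominated points. The sketch below follows the convention of Definition 1, in which a larger objective value is better; for objectives that are to be minimized, the values can be negated first. The function names are hypothetical.

```python
def dominates(fb, fa):
    """True if objective vector fb Pareto-dominates fa (larger is better)."""
    not_worse = all(b >= a for b, a in zip(fb, fa))
    strictly_better = any(b > a for b, a in zip(fb, fa))
    return not_worse and strictly_better


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(points))
```

Here `(2, 2)` and `(1, 1)` are both dominated by `(3, 3)`, so only the first three points remain; none of them can be improved in one objective without losing in the other.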
MOPs were previously solved by being treated as single-objective problems using the weighted sum approach, but recent years have witnessed significant progress in the development of Evolutionary Algorithms (EAs) for MOPs [15,16]. The majority of existing Multi-Objective Evolutionary Algorithms (MOEAs) are based on Pareto dominance. In MOEAs, the utility of each individual solution is mainly determined by its Pareto dominance relations with other solutions visited in the previous search. Since using Pareto dominance alone can reduce the diversity of a search, certain techniques such as fitness sharing and crowding have often been used to compensate [15,17]. Arguably, PAES, proposed by Corne et al. [18] in 2000, is one of the most popular Pareto-dominance-based MOEAs.

Pareto Archiving Evolution Strategy (PAES)
PAES [14,19] is a classical evolutionary multi-objective optimization algorithm. PAES uses a (1+1) evolution strategy, in which a single parent solution is used to create a single offspring solution using a mutation operator. The dominance relationship of the offspring and parent solutions is compared, with the elite retention strategy used to retain the better solution and a Non-Dominated Solutions (NDS) archive established to retain the solutions of previous generations.
The PAES algorithm consists of three parts: (1) generation of candidate solutions; (2) selection of candidate solutions; and (3) construction of the NDS. The algorithm (shown in Fig. 1) randomly generates an initial solution, calculates the target value corresponding to the initial solution, and adds it to the NDS. A candidate solution is obtained by the mutation of a parent solution or the recombination of multiple parent solutions, and the target value of the candidate solution is calculated. If the candidate solution is dominated by the parent solution, a new candidate solution is generated by a mutation or recombination operation applied with a certain probability; otherwise, the dominance of the offspring solution is compared with the other solutions in the NDS. The NDS is updated by the adaptive grid archiving strategy, and a mutated or recombined solution in the NDS is selected as the new parent solution. The process iterates until it reaches the end condition.
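The loop described above can be sketched as follows for binary-coded solutions whose objectives are both minimized. This is a simplified illustration, not the authors' implementation: the archive is unbounded, recombination is omitted, and the grid-based crowding comparison is replaced by a coin flip when the candidate and the parent do not dominate each other. The function names and the toy objective function are hypothetical.

```python
import random


def dominates(fa, fb):
    """True if fa dominates fb when every objective is minimized."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return tuple(b ^ (rng.random() < rate) for b in bits)


def paes(evaluate, n_bits, iterations=2000, seed=0):
    rng = random.Random(seed)
    parent = tuple(rng.randint(0, 1) for _ in range(n_bits))
    f_parent = evaluate(parent)
    archive = {parent: f_parent}            # the NDS, unbounded in this sketch
    for _ in range(iterations):
        child = mutate(parent, 1.0 / n_bits, rng)
        f_child = evaluate(child)
        if child in archive or dominates(f_parent, f_child):
            continue                        # candidate rejected by the parent
        if any(dominates(f, f_child) for f in archive.values()):
            continue                        # candidate dominated by the NDS
        # Drop archive members the candidate dominates, then store it.
        archive = {s: f for s, f in archive.items()
                   if not dominates(f_child, f)}
        archive[child] = f_child
        # Simplification: a full PAES compares grid crowding at this point.
        if dominates(f_child, f_parent) or rng.random() < 0.5:
            parent, f_parent = child, f_child
    return archive


def toy_objectives(bits):
    """Hypothetical stand-in: (number of selected bits, pseudo-error)."""
    return sum(bits), 3 - sum(bits[:3])


front = paes(toy_objectives, n_bits=8, iterations=500, seed=1)
for solution, objectives in sorted(front.items(), key=lambda kv: kv[1]):
    print(solution, objectives)
```

Whatever the random seed, the returned archive contains only mutually non-dominated solutions, because a candidate is stored only if no archive member dominates it and every member it dominates is removed first.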

Adaptive grid archiving strategy
The PAES algorithm uses the adaptive grid archiving strategy to maintain diversity in the Pareto-optimal set. The main purpose of the adaptive grid archiving strategy is to make a choice between the parent solution and the offspring when updating the NDS. If the candidate solution is dominated by any solution in the NDS, the candidate solution is deleted. The basic idea is to divide the target space into many grids and assign a grid to each individual. The crowded comparison operator is used in various stages of PAES to guide the selection [15].
Reference [20] points out that when appending candidate solutions to the NDS, three things need to be considered: (1) the size of the NDS is limited; (2) the algorithm produces a new non-dominated sub-solution after each iteration; and (3) the distribution of solutions in the NDS must be more uniform than the distribution of solutions in the target space. Based on the above three aspects, a candidate solution joins the NDS as follows. As shown in Fig. 2, if the candidate solution dominates any solutions in the NDS, all solutions in the NDS that are dominated by the candidate solution are deleted, the candidate solution is added into the NDS, and the grid is updated. Otherwise, if the NDS is full and adding the candidate solution increases the crowding coefficient of the grid in the NDS, a solution in the grid with the largest crowding coefficient will be deleted. When making the judgement in this step, the crowding coefficient of the grid where the candidate solution is located is compared with that of the parent solution. If the candidate solution is less crowded, the candidate solution is added to the NDS and the grid is updated; otherwise, the candidate solution is discarded and the process moves to the next iteration.
The size of the NDS will increase or decrease in the iterative process, with the size of the grid automatically adjusted through the adaptive algorithm. Figures 3 and 4 show two common scenarios for adding a candidate solution to the NDS and re-dividing the grid. In Fig. 3, the NDS file is full, so points in the NDS that have a large grid crowding coefficient are randomly removed, and the new solution is then added to the NDS. In Fig. 4, when the NDS file is not full, we need to repartition the grid and add the new solution to the NDS.

Fig. 4 When the archived set is not full, the grid is re-divided and the new solution is added to the NDS.
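The two scenarios can be illustrated with a small grid-based archive update. The sketch below re-divides the grid from the current bounds of the objective values every time a candidate arrives, which is one simple way to make the grid adaptive; the number of divisions, the rule for rejecting a candidate that falls in a most crowded cell, and all function names are assumptions made for illustration, and the candidate is assumed to be non-dominated with respect to the archive.

```python
import random
from collections import Counter


def grid_cell(f, lows, highs, divisions):
    """Map an objective vector to the index tuple of its grid cell."""
    cell = []
    for v, lo, hi in zip(f, lows, highs):
        width = (hi - lo) / divisions or 1.0    # guard against a flat axis
        cell.append(min(int((v - lo) / width), divisions - 1))
    return tuple(cell)


def add_with_grid(archive, candidate, max_size, divisions=4, rng=random):
    """Return the updated list of objective vectors held in the NDS."""
    members = archive + [candidate]
    # Re-divide the grid so that it spans the current members.
    lows = [min(col) for col in zip(*members)]
    highs = [max(col) for col in zip(*members)]
    if len(members) <= max_size:
        return members                          # archive not full: just add
    cells = [grid_cell(f, lows, highs, divisions) for f in members]
    crowd = Counter(cells)
    worst_cell, worst_count = crowd.most_common(1)[0]
    if crowd[cells[-1]] >= worst_count:
        return archive                          # candidate is too crowded
    # Remove one random member of the most crowded cell, keep the candidate.
    victims = [i for i, c in enumerate(cells[:-1]) if c == worst_cell]
    drop = rng.choice(victims)
    return [f for i, f in enumerate(members) if i != drop]


full_archive = [(0, 10), (1, 9), (1.2, 8.8), (10, 0)]
updated = add_with_grid(full_archive, (5, 5), max_size=4)
print(updated)
```

In the usage example the first three archive members share one grid cell, so one of them is removed at random to make room for the candidate, which lies in an otherwise empty cell.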

Selective ensemble learning
Selective ensemble learning is a learning algorithm that trains a number of base classifiers and selects some of them to form an ensemble [17]. As shown in Fig. 5, a certain measure is used to select a number of pretrained base learning machines to form an ensemble, with each candidate base learning machine being equivalent to a different solution to the problem.

Related works and challenges
A common approach in ensemble learning is to combine all of a set of trained learning machines to obtain better results. Wu et al. [5] put forward an ensemble rule-based learning method that is combined with AdaBoost and Bagging. It produces better results than a single rule-based system, but as the number of rule-based systems in the ensemble increases, individual differences become increasingly difficult to obtain. Furthermore, as a large number of redundant base learning machines are generated, the prediction speed decreases significantly and the storage space requirement increases significantly, thereby reducing the effective generalization ability. Therefore, Zhou et al. [21] proposed the concept of the selective ensemble in 2002, by which some selection criterion is applied so that only selected base learning machines are involved in the ensemble.
In order to solve the existing problems, this paper introduces the multi-objective optimization algorithm PAES to select the base classifiers for ensemble learning. The improved Bagging algorithm is used as the training strategy to construct the training sets, thereby increasing the diversity of the base classifiers. In the base classifier selection phase, the trained base classifiers are binary-coded, an elitist retention strategy is adopted to obtain the PAES optimal solution set, and the solution set is updated using the adaptive grid archiving strategy. Because belief rules are traditionally constructed by traversal combination, the RIMER method suffers from a "combinatorial explosion" problem during rule building. Taking the Breast Cancer classification dataset from UCI as an example, which contains 30 antecedent attributes, if we assume that each antecedent attribute has 3 candidate values, then the number of BRB rules constructed by traversal combination is $3^{30} = 205\,891\,132\,094\,649$. With traversal combination, the number of combinations grows exponentially with the number of antecedent attributes, and most real classification problems are multi-attribute. For this reason, Chang et al. [22] proposed the linear combination method of BRBCS, as shown in Fig. 6.
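The growth described above is easy to reproduce. In the short sketch below, `traversal_rule_count` is the number of rules obtained by combining every candidate value of every attribute; `linear_rule_count` assumes, for illustration only, that the linear combination method builds one rule per candidate value, so that its rule count does not depend on the number of attributes. Both function names are hypothetical.

```python
def traversal_rule_count(n_attributes, n_candidates):
    """Rules needed when every combination of candidate values is a rule."""
    return n_candidates ** n_attributes


def linear_rule_count(n_candidates):
    """Assumed rule count for linear combination: one rule per candidate value."""
    return n_candidates


for n_attributes in (4, 9, 30):
    print(n_attributes,
          traversal_rule_count(n_attributes, 3),
          linear_rule_count(3))
```

For 30 attributes with 3 candidate values each, the traversal count equals the figure quoted above for the Breast Cancer dataset.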
The linear combination method proposed by Chang et al. [22] overcomes the problem of "combinatorial explosion" in the rule construction process, but does not provide a specific solution as to how many rules need to be generated. When the number of rules is too small, the classification performance of the BRB classifier will be reduced; when the number of rules is too high, storage space requirements will soar. Therefore, Ye et al. [23] proposed correlating the number of categories of results in the classification problem with the number of BRB rules, setting the number of rules equal to the number of categories, and performing a rationality analysis. At the same time, the BRB result evaluation level is mapped to the classification result.
In order to overcome the problem of "zero activation" in the process of rule activation, Ref. [23] improved the calculation method for the individual matching degree when seeking the activation weight, by returning the normalized value of the inverse of the distance from the input parameter to the candidate value in the rule as the individual matching degree. With $x_i$ representing the i-th attribute value of the input data, the individual matching degree of the k-th rule is computed from this normalized inverse distance, and a new rule weight is calculated accordingly.

The constructed belief rule-base is trained by the Differential Evolution (DE) algorithm. This algorithm first initializes the population size NP and the number of iterations T for the population $P^t = \{p_1^t, p_2^t, \ldots, p_{NP}^t\}$. With $t = 0$, the NP individuals in the initial population $P^0$ are initialized randomly. Each individual $x_i^t\ (i = 1, 2, \ldots, NP)$ in the current population is then mutated to produce a variant individual $V_i^{t+1}$ from three individuals $x_{r_1}^t, x_{r_2}^t, x_{r_3}^t$, which are chosen at random such that $r_1, r_2, r_3 \in \{1, 2, \ldots, NP\}$ and $r_1 \ne r_2 \ne r_3 \ne i$, where $F$ is the scaling factor. $V_i^{t+1}$ and $x_i^t$ are then recombined by crossover to produce the trial individual $U_i^{t+1}$, where $U_{i,j}^{t+1}$ represents the j-th dimensional element of the individual $U_i^{t+1}$ after crossover, $j = 1, 2, \ldots, Dim$, with Dim representing the number of dimensions of the optimization problem; rand() represents a random number in $[0, 1]$; CR is a crossover factor in the range $[0, 1]$; and rb is a random integer in $\{1, 2, \ldots, Dim\}$. The fitness function is used to calculate the fitness values of the individuals $U_i^{t+1}$ and $x_i^t$. Based on the greedy strategy, the individual with the better fitness value is selected as the individual of the new population. For the fitness function $f(\cdot)$, the Mean-Square Error (MSE) or Cross Entropy (CE) [24] can be selected.
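The DE training loop described above can be sketched with the widely used DE/rand/1/bin scheme, which matches the ingredients named in the text (three random individuals, scaling factor F, crossover factor CR, forced index rb, greedy selection). The exact update formulas of the original paper are not reproduced here, so the code should be read as an assumption about their standard form. Clipping parameters to [0, 1], updating the population in place, and the toy fitness function are further assumptions, and all names are hypothetical.

```python
import random


def differential_evolution(fitness, dim, np_=100, f=0.5, cr=0.9,
                           iterations=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(np_)]
    fit = [fitness(x) for x in pop]
    for _ in range(iterations):
        for i in range(np_):
            others = [r for r in range(np_) if r != i]
            r1, r2, r3 = rng.sample(others, 3)
            # Mutation: V = x_r1 + F * (x_r2 - x_r3), clipped to [0, 1].
            v = [min(1.0, max(0.0, pop[r1][j] + f * (pop[r2][j] - pop[r3][j])))
                 for j in range(dim)]
            # Binomial crossover; position rb always comes from the mutant.
            rb = rng.randrange(dim)
            u = [v[j] if (rng.random() <= cr or j == rb) else pop[i][j]
                 for j in range(dim)]
            # Greedy selection: keep the trial if it is at least as good.
            fu = fitness(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = min(range(np_), key=fit.__getitem__)
    return pop[best], fit[best]


def toy_mse(x):
    """Hypothetical fitness: squared distance to the point (0.3, 0.3, 0.3)."""
    return sum((xi - 0.3) ** 2 for xi in x) / len(x)


best_x, best_f = differential_evolution(toy_mse, dim=3, np_=30,
                                        iterations=100, seed=1)
print(best_x, best_f)
```

In a BRB setting, each individual would hold the rule weights, attribute weights, and belief degrees to be trained, and `fitness` would be the MSE or CE of the resulting classifier on the training data.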

Multi-objective optimization based on PAES for BRBCS selective ensemble learning
The selective ensemble learning process of the BRBCS consists of two steps: (1) Base classifier generation; and (2) Base classifier selection.
In order to generate different base classifiers, several training datasets are obtained by using the Bagging data re-sampling technique. Based on the training datasets, a BRB classifier is constructed by linear combination. The DE algorithm is used to train the parameters of the base classifier, then the trained BRB base classifiers are binary-coded (with 1 meaning that the base classifier is involved in the ensemble, and 0 meaning that it is not), and PAES multi-objective optimization [19] is used to find the optimal solution.
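The two steps can be illustrated with plain Bagging and a bit mask. The sketch below draws bootstrap training sets by sampling with replacement and decodes a binary string into the subset of base classifiers it selects; it does not include the improvement to Bagging mentioned in the paper, whose details are not given in this section. The function names and the placeholder classifier labels are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one Bagging training set)."""
    return [rng.choice(data) for _ in data]


def decode(mask, classifiers):
    """Return the base classifiers whose bit in the mask is 1."""
    return [c for bit, c in zip(mask, classifiers) if bit == 1]


rng = random.Random(0)
data = list(range(10))                       # stand-in for training samples
training_sets = [bootstrap_sample(data, rng) for _ in range(3)]
classifiers = ["brb_0", "brb_1", "brb_2"]    # placeholders for trained BRBs
selected = decode((1, 0, 1), classifiers)
print(training_sets, selected)
```

Each bit string of length NUM is one candidate solution for PAES; its first objective is simply the number of ones it contains.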
For the multi-class classification problem, when there are multiple base classifiers involved in the ensemble, the calculation of the integrated generalization error can be deduced as follows.
Assume that $C$ is the number of categories, in which case the actual class label $d_j$ of the j-th sample satisfies $d_j \in \{1, 2, \ldots, C\}$ and the output class label $f_{ij}$ of the i-th base classifier on the j-th sample satisfies $f_{ij} \in \{1, 2, \ldots, C\}$. The generalization error of the i-th base classifier on the $m$ samples of the training set is then defined by comparing $f_{ij}$ with $d_j$ for each sample. The general meaning of $sum_j$ can be expressed as follows: for the j-th sample, the class with the highest number of votes is obtained from the voting results of all base classifiers, that is, the mode of the class labels. The output of all base classifiers on the j-th sample is represented by this voting result, and the generalization error of the integrated base classifier for multi-class classification problems is defined from it in the same way. In Eq. (20), mode denotes the number of modes in the class labels. For example, if there were 10 base classifiers participating in the integration, and the output class label set is {1, 1, 1, 2, 7, 7, 7, 8, 8}, then $\hat{f}'_j = 1, 7$ and mode = 2.
Assuming that the k-th base classifier does not participate in the integration and is therefore removed, the output of the new ensemble on the j-th sample is obtained by voting over the remaining base classifiers, and the generalization error of the new ensemble is expressed accordingly. Algorithm 1 and Fig. 7 show the steps of the selective ensemble learning method for the BRBCS based on the PAES algorithm. The initial dataset is S, the test dataset is SI, NUM is the number of constructed BRBCSs, the parameter training method is the DE algorithm, the classifier selection algorithm is PAES multi-objective optimization, and M is the number of BRBs participating in the ensemble.
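The voting scheme and the effect of removing base classifiers can be sketched as follows. The function `vote_modes` returns every label that ties for the highest vote count, so its length plays the role of mode in the example above. How a tie enters the error is not recoverable from this section, so the code assumes, purely for illustration, that a tie among several modes earns a proportional share of credit when the true label is one of them. All names are hypothetical, and at least one classifier is assumed to be selected.

```python
from collections import Counter


def vote_modes(labels):
    """Return every class label that ties for the highest vote count."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)


def ensemble_error(outputs, truth, mask):
    """outputs[i][j]: label from classifier i on sample j; mask[i] in {0, 1}."""
    chosen = [row for row, bit in zip(outputs, mask) if bit]
    wrong = 0.0
    for j, d in enumerate(truth):
        modes = vote_modes([row[j] for row in chosen])
        # Assumption: a tie among len(modes) labels earns 1/len(modes) credit.
        wrong += 1.0 - (1.0 / len(modes) if d in modes else 0.0)
    return wrong / len(truth)


outputs = [[1, 2, 3],     # classifier 0
           [1, 2, 1],     # classifier 1
           [2, 2, 3]]     # classifier 2
truth = [1, 2, 3]
for mask in [(1, 1, 1), (0, 1, 0), (1, 1, 0)]:
    print(mask, sum(mask), ensemble_error(outputs, truth, mask))
```

Each printed line shows a binary code together with the two quantities that PAES trades off: the number of participating base classifiers and the error of the resulting vote.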

Experimental Evaluation
In order to evaluate the performance of the proposed BRBCS selective ensemble learning method based on PAES, three sets of classification problems were studied. This section analyzes the classification accuracy and spatial complexity of the proposed method and compares it with a single expert system, with ensemble learning using the simple voting method, and with the data-driven ensemble learning of BRBCS, and then reports on the results of tests run on classification datasets. The experimental environment is an Intel Core i5-4570 CPU @ 3.20 GHz with 8 GB memory running the Windows 10 operating system, and the algorithm is implemented in Visual Studio 2013.

Algorithm 1 BRBCS-PAES (continued)
return BRBCS(t').
6: end for
7: Load test set SI, calculate the classification Result(t) of BRBCS(t') (t = 1, 2, ..., NUM) on SI.
8: Binary encoding of BRBCS-based classifiers.
9: Set the NDS size, number of iterations, and initial adaptive grid in PAES to obtain the Pareto optimal solution set.
10: Select an appropriate Pareto optimal solution according to the generalization error and the number of selected base classifiers.

Experimental design
The three test datasets used for the experiment were selected from the UCI public test datasets. The three datasets are made up of Breast Cancer data, Iris trait data, and Glass type data. Table 1 lists the number of antecedent attributes and classification categories and the data size of the three datasets.
In the experiment, assuming that the number of categories in the classification problem is $C$ and the number of antecedent attributes is $T_k$, it follows from the linear combination of BRBs that each antecedent attribute in the BRB contains $C$ candidate values, the result evaluation has $C$ levels, the number of rules is $L$, and the number of training datasets is ND.
The initial settings for each parameter in the BRB were set as follows.
(1) $\theta_k$ is the weight of the k-th rule; the initial value of $\theta_k$ is $\theta_k = rand_k(),\ k = 1, 2, \ldots, L$.
(2) $\delta_{k,i}$ is the weight of the i-th antecedent attribute in the BRB; the initial value of $\delta_{k,i}$ is $\delta_{k,i} = rand_i(),\ i = 1, 2, \ldots, T_k$.
(3) $A_i^k$ is the referential set of values for the i-th antecedent attribute in the k-th rule.
(5) $\beta_{c,k}$ is the belief degree of the c-th result in the k-th rule.

In the DE algorithm, the population size was NP = 100, the scale factor F = 0.5, and the crossover factor CR = 0.9; the NDS size in the PAES algorithm was set to 20.

Selective ensemble for classification problem
To verify the effectiveness of the proposed method, experiments were run three times for each of the three datasets, with a different number of base classifiers generated for each run: 25, 100, and 200. The BRBCS-PAES method was then used for selective ensemble experiments. Tables 2-4 respectively show the average generalization error and the average classification accuracy on the Breast Cancer, Iris, and Glass datasets after applying the PAES selective ensemble with the three different numbers of base classifiers.
Table 2 shows that the average classification accuracy rates on the Breast Cancer dataset after PAES selective ensemble for the three different numbers of base classifiers were higher than 97%, and that the number of base classifiers had little effect on the classification accuracy. Table 3 shows that the average classification accuracy rates on the Iris dataset after PAES selective ensemble for the three different numbers of base classifiers were higher than 98%. In this case, the classification accuracy increased as the number of base classifiers increased, reaching 99.38% with 200 base classifiers. Table 4 shows that the average classification accuracy rates on the Glass dataset after PAES selective ensemble for the three different numbers of base classifiers were about 70%, with the classification accuracy again increasing as the number of base classifiers increased. From these experimental results, we can see that the proposed method obtains a lower generalization error and a higher classification accuracy when solving classification problems on the Breast Cancer, Iris, and Glass datasets.
To verify that the number of base classifiers participating in the integration is in fact reduced by this method, Tables 5-7 show the number of classifiers actually participating in the integration when generating different numbers of total classifiers on the three datasets. Indeed, these results show that as the number of generated classifiers increases, this method selects a smaller percentage of them for the ensemble.
In order to more intuitively display the classification accuracy and the number of base classifiers that actually participate in integration, some selected experimental results were plotted as shown in Figs. 8-10. Figure 8 shows a scatter plot of the classification accuracy of the experiments on the Breast Cancer dataset along with the number of base classifiers participating in the integration. The plotted results show that BRBCS-PAES can effectively reduce the number of classifiers participating in the integration while ensuring the classification accuracy of the integrated system and reducing its space complexity. Some notable results are as follows. When the number of base classifiers was set to 25, the highest classification accuracy on the Breast Cancer dataset was 98.07%, with 5 classifiers participating; the highest classification accuracy on Iris was 99.33%, with a single classifier; and the highest classification accuracy on Glass was 70.16%, with 7 classifiers participating in the integration. When the number of base classifiers was set to 100, the highest classification accuracy on the Breast Cancer dataset was 98.24%, with 5 classifiers participating; the highest classification accuracy on Iris was 99.33%, with 4 classifiers; and the highest classification accuracy on Glass was 70.79%, with 7 classifiers participating in the integration. When the number of base classifiers was set to 200, the highest classification accuracy on the Breast Cancer dataset was 97.89%, with 3 classifiers participating; the highest classification accuracy on Iris reached 100%, with 3 classifiers; and the highest classification accuracy on Glass was 73.13%, with 17 classifiers participating in the integration. These experimental results show that the proposed method can reduce the number of classifiers involved in the ensemble and ensure the generalization ability of the ensemble system.

Performance comparison
In order to further verify the effectiveness of this method, Table 8 provides a comparison between this method and alternative approaches [23], with various numbers of base classifiers. Of these alternatives, Naive Bayes, C4.5, SMO, Fuzzy gain measure, Fallahnezhad, and YE-BRBCS use the average effect achieved by a single classifier, and EBRB-Vote is based on the data-driven EBRB using the AdaBoost algorithm and integrated using the simple voting method. BRBCS-Vote is based on the linear approach and uses a simple voting method to form the ensemble. BRBCS-PAES adopts the selective ensemble learning method based on PAES. The classification accuracies of BRBCS-Vote and BRBCS-PAES were obtained by averaging multiple experiments.
From Table 8, we can see that BRBCS-PAES can achieve a higher classification accuracy on the three test datasets with the number of base classifiers set to 25, 100, and 200. Its classification accuracy on the Iris dataset is the highest among the comparison methods, and on the Breast Cancer dataset it is bettered only by the Fuzzy gain measure method. BRBCS-PAES has a lower classification accuracy on the Glass dataset than the EBRB-Vote method, but EBRB-Vote depends on the data for reasoning. As a result, when dealing with large-scale data the number of rules in the EBRB will become very large, leading to higher storage costs and time consumption while searching for rules. It can be seen from Table 8 that the accuracy of BRBCS-PAES is significantly higher than that of the BRBCS-Vote method. When the number of participating classifiers increases, the BRBCS-Vote classification accuracy decreases. In particular, in the Iris dataset experiment, the BRBCS-Vote classification accuracy fell by 11.33% when the number of classifiers was increased from 25 to 200. In contrast, BRBCS-PAES has a higher classification accuracy on the three datasets and is less affected by the number of base classifiers.

Conclusion
In the present study, we proposed BRBCS-PAES, a selective ensemble learning method for BRBCS based on PAES. The proposed method is effective in solving the problem of a system's prediction speed decreasing and storage space usage increasing rapidly when the number of base classifiers participating in an ensemble is increased. Experimental studies on classification problems demonstrated that the proposed method can effectively improve the performance of BRBCS. Two main conclusions can be drawn from the study, as follows. (1) BRBCS lacks methods to improve its effective generalization ability as the number of sub-BRBs participating in ensemble learning is increased. The use of selective ensemble learning methods is a good approach to dealing with this problem. (2) The proposed selective ensemble learning method for BRBCS using the multi-objective optimization model as the selective strategy is effective. In this method, sub-BRBs are binary coded in the base classifier selection stage. The number of base classifiers involved in the ensemble and the generalization error of the base classifier are taken as the objectives of the multi-objective optimization. An elite retention strategy and an adaptive grid archiving strategy are used to produce the Pareto optimal solution set. Comparing the proposed method with existing methods on three classification datasets shows that it improves the accuracy of BRBCS and reduces the number of classifiers participating in the ensemble. In future work, we will investigate how to obtain the optimal number of classifiers to participate in the ensemble.

Fig. 2 Process of adaptive grid archiving strategy.

Fig. 3 When the archived set is full, the candidate solutions are added and the grid is updated. (Blue indicates the point to be deleted, red indicates the newly added solution.)

Fig. 5 Basic idea of selective ensemble learning.

Fig. 6 Different ways of building the rules. (a) Traversal combination and (b) linear combination.

Fig. 7 Process of the selective ensemble learning method for the belief rule-base classification system based on PAES.

Fig. 8 Classification accuracy over multiple tests on Breast Cancer and the number of base classifiers participating in the integration.

Fig. 9 Classification accuracy over multiple tests on Iris and the number of base classifiers participating in the integration.

As shown in Fig. 2, if the candidate solution dominates any solutions in the NDS, all solutions in the NDS that are dominated by the candidate solution are deleted, and the candidate solution is added into the NDS.

Algorithm 1 BRBCS-PAES
Input: initial dataset S, test dataset SI, NUM, M.
Output: Prediction results f.
1: for t = 1 to NUM do

Table 2 Average generalization errors and accuracy of classification in Breast Cancer experiments.

Table 3 Average generalization errors and accuracy of classification in Iris experiments.

Table 5 Involved classifiers of two ensemble learning methods in Breast Cancer experiment.

Table 8 Classification accuracy of the proposed method compared to other methods for different numbers of base classifiers.