A Balance Adjusting Approach of Extended Belief-Rule-Based System for Imbalanced Classification Problem

The extended belief-rule-based (EBRB) system has become a widely recognized and effective rule-based system in decision-making. The system uses a data-driven method to generate the rule base by transforming each training sample into a rule. Hence, when an EBRB system is applied to an imbalanced classification dataset, the imbalance of the training dataset is retained in the generated rule base. More specifically, the number of rules transformed from majority classes is far greater than the number transformed from minority classes. This issue usually leads to a sharp decrease in the accuracies of the minority classes. This study analyses how the imbalance of the training dataset persists in the generated EBRB and then proposes a Balance Adjusting (BA) approach to eliminate the influence of imbalance in the rule base. The BA approach adjusts the activation weights of all activated rules, thereby enhancing the competitiveness of rules with higher activation weights during the rule aggregation process of the EBRB system. Several case studies on imbalanced benchmark classification datasets from UCI demonstrate how the BA approach improves the performance of the EBRB system. This study also conducts a series of experiments to validate the improvement of the proposed approach over some conventional and recent existing works. The comparison results illustrate that the BA approach is feasible, effective and robust, and that it performs especially well on large-scale datasets. Moreover, the BA approach can be combined with various rule activation weight calculation methods, which means it may be worth applying as a generic step before the rule aggregation process of the EBRB system.


I. INTRODUCTION
Data imbalance is a common problem in classification. An imbalanced dataset consists of majority classes and minority classes: the former have more samples while the latter have fewer. Almost every existing decision model has a bias towards majority classes when applied to imbalanced classification datasets, unless the model has a special mechanism to address them. However, the accuracies of majority classes are usually less important than the accuracies of minority classes in data-imbalanced fields. For example, in disease surveillance, assume that 95% of potential patients are healthy and the other 5% are ill. If a decision model infers that all the potential patients are healthy, it will obtain a high average accuracy of 95% without finding any true patient. Such a result is obviously meaningless and should be improved.
The rule-based system is a rapidly developing and powerful kind of decision support system, and it has become an important branch of artificial intelligence [1]. Generally, a rule-based system uses rules as its knowledge representation scheme, and uses a reasoning approach to infer the result of queries by activating and aggregating rules. The belief rule-based (BRB) system is a kind of rule-based system that embeds belief degrees in the consequent term of each rule, and it has been widely applied in many fields [2]-[4]. It can handle both quantitative and qualitative information, and is considered to be more interpretable than deep-learning-based tools. The inference methodology of the BRB system is the belief rule-base inference methodology using the evidential reasoning (RIMER) approach [5]. In recent years, a novel BRB system, called the extended belief-rule-based (EBRB) system, was proposed and studied [6]-[8]. This system extends the conventional BRB system by embedding belief degrees in all the antecedent terms of each rule. Generally, the EBRB system uses the data-driven method to automatically generate rules from numerical data, without involving a complex learning procedure to set the parameters of the system. So the construction of the EBRB system is steadier and less time-consuming compared with other forms of BRB systems.
Although the data-driven method has many advantages, it brings an issue when applied to imbalanced datasets. Since each training sample is transformed into a rule by the data-driven method, the number of rules belonging to each class will be the same as the number of training samples belonging to that class, which means that the number of rules belonging to majority classes will be far greater than the number of rules belonging to minority classes. Such a rule base is called an imbalanced rule base in this study. Besides, the behavior of the RIMER methodology is similar to that of the weighted averages methodology when handling a large number of rules. Hence, even with higher activation weights, the rules belonging to minority classes in an imbalanced rule base will always be at a disadvantage during the rule aggregation process, so the reasoning result of an imbalanced EBRB system always has a bias towards majority classes. In order to fix this issue, this study proposes a Balance Adjusting (BA) approach. The approach increases the ratio of higher activation weights to lower activation weights, balances the sums of rule activation weights belonging to the majority and the minority classes during the rule aggregation process, and thus further enhances the competitiveness of rules with higher activation weights. The BA approach improves the performance of the EBRB system applied to imbalanced classification problems, and also shows that the system itself is a powerful tool for classification but is limited by the imbalance in its rule base.
The remainder of this paper is organized as follows: Section II contains an overview of the BRB system and the EBRB system. Section III analyses the issue of the EBRB system applied to imbalanced classification datasets and then proposes the BA approach. Section IV demonstrates the effect of the BA approach through case studies, and how to apply it to an EBRB system. Section V validates the effectiveness of the BA approach on various benchmark datasets compared with some recent existing works. Section VI concludes this paper.

II. OVERVIEW OF BRB SYSTEM AND EBRB SYSTEM
This section is a brief overview of the BRB system and the EBRB system. The overview of the EBRB system will be explained in three parts, i.e., how the systems represent knowledge, generate rules and obtain reasoning results. Some examples are provided in this section to help understand the overview.

A. OVERVIEW OF BRB SYSTEM
The BRB system is a kind of rule-based system, developed from the conventional IF-THEN rule-based system [9]. It can handle quantitative and qualitative knowledge, and deal with fuzzy uncertainty and probabilistic uncertainty. The reasoning methodology of the BRB system is the RIMER approach, which is based on D-S evidence theory, decision theory [10] and fuzzy theory [11].
Assume a BRB has $L$ rules, $T$ antecedent attributes and $N$ referential values for its consequent attribute. The $k$th ($k = 1, \dots, L$) rule in this BRB can be written as:

$$R_k:\ \text{IF } U_1 \text{ is } A_1^k \wedge U_2 \text{ is } A_2^k \wedge \dots \wedge U_T \text{ is } A_T^k,\ \text{THEN } \{(D_1, \beta_1^k), \dots, (D_N, \beta_N^k)\} \quad (1)$$

In (1), $U_i$ is the $i$th antecedent attribute of this BRB, and $A_i^k$ is one of the referential values $A_{ij}$ ($j = 1, \dots, J_i$) of $U_i$. $D$ is the consequent attribute of this BRB, $D_n$ is the $n$th ($n = 1, \dots, N$) referential value of $D$, and $\beta_n^k$ is the belief degree to which $D$ is evaluated to be $D_n$ in the $k$th rule. For example, a belief rule for the Iris [12] classification may be written in a linguistic form (2a) or a numerical form (2b). In all existing kinds of BRB systems, if $\sum_{n=1}^{N}\beta_n^k = 1$, the $k$th rule is called complete, and if $\sum_{n=1}^{N}\beta_n^k < 1$, the $k$th rule is called incomplete. For any rule in a BRB, $\sum_{n=1}^{N}\beta_n \le 1$ and $\beta_n \ge 0$ for all $n$. To distinguish the importance of different antecedent attributes and different rules, the BRB system uses $\delta_i$ to represent the weight of the $i$th antecedent attribute and $\theta_k$ to represent the weight of the $k$th rule. These parameters are called the system parameters of the BRB system.
The researches of the BRB system can be divided into three part: (1) parameter training optimization: mainly about seeking the optimal value of the system parameters [13]- [15]; (2) structure optimization: mainly about improving the rule retrieval efficiency and reasoning efficiency of the BRB system [16], [17]; (3) reasoning approach optimization: mainly about improving the rule activation method and reasoning approach of the BRB system [18], [19]. This study is a reasoning approach optimization that aims to improve the effectiveness of the EBRB system applied in imbalanced classification datasets.

B. EXTENDED BELIEF RULE
The extended belief rule is an extension of the conventional belief rule [6], in which each antecedent attribute carries a belief distribution rather than a single referential value. Assume an EBRB has $L$ rules, $T$ antecedent attributes and $N$ referential values for its consequent attribute. The $k$th ($k = 1, \dots, L$) extended belief rule in this EBRB can be written as:

$$R_k:\ \text{IF } U_1 \text{ is } \{(A_{1j}, \alpha_{1j}^k),\ j = 1, \dots, J_1\} \wedge \dots \wedge U_T \text{ is } \{(A_{Tj}, \alpha_{Tj}^k),\ j = 1, \dots, J_T\},\ \text{THEN } \{(D_1, \beta_1^k), \dots, (D_N, \beta_N^k)\} \quad (3)$$

In (3), $U_i$ is the $i$th antecedent attribute of this EBRB, $A_{ij}$ is the $j$th referential value of $U_i$, and $\alpha_{ij}^k$ is the belief degree to which $U_i$ is evaluated to be $A_{ij}$ in the $k$th rule. $D$ is the consequent attribute of this EBRB, $D_n$ is the $n$th ($n = 1, \dots, N$) referential value of $D$, and $\beta_n^k$ is the belief degree to which $D$ is evaluated to be $D_n$ in the $k$th rule. An extended belief rule for the Iris [12] classification may be written in the same form (4). Due to the data-driven construction method, when an EBRB system is applied to classification problems, the belief degrees of the consequent attribute's referential values in its extended belief rules will be either 1 or 0.
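As a concrete illustration of this knowledge representation, an extended belief rule can be held in a small data structure (a minimal sketch; the field names and the example numbers are illustrative, not taken from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExtendedBeliefRule:
    # antecedents[i][j] = alpha_ij^k: belief degree that antecedent
    # attribute U_i takes its j-th referential value A_ij
    antecedents: List[List[float]]
    # consequent[n] = beta_n^k: belief degree that the consequent D
    # takes its n-th referential value D_n
    consequent: List[float]
    rule_weight: float = 1.0  # theta_k

# in classification, the consequent belief degrees are either 1 or 0
rule = ExtendedBeliefRule(
    antecedents=[[0.0, 0.3, 0.7], [0.5, 0.5, 0.0]],
    consequent=[0.0, 0.0, 1.0],
)
```

Each antecedent attribute carries a full belief distribution over its referential values, which is exactly what distinguishes the extended rule from the conventional belief rule.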

C. THE DATA-DRIVEN CONSTRUCTION METHOD OF EBRB SYSTEM
The most frequently used data-driven construction method of EBRB system is the utility-based transformation method. By this method, each numerical training sample will be transformed into a rule. The following example illustrates how the method works.
The first step of the method is to determine the referential values of each antecedent and consequent attribute. Generally, the referential values of each antecedent attribute are arranged incrementally, i.e., for the $i$th antecedent attribute, all the $J_i$ referential values are arranged as $A_{i1} < A_{i2} < \dots < A_{iJ_i}$. Assume the value of the $i$th antecedent attribute $U_i$ in a training sample is $x_i$, and let $\alpha_{ij}$ be the belief degree to which $x_i$ is evaluated to $U_i$'s $j$th referential value $A_{ij}$. Then $x_i$ can be represented using the following equivalent expectation [6]:

$$S(x_i) = \{(A_{ij}, \alpha_{ij}),\ j = 1, \dots, J_i\}$$

Using utility-based equivalence transformation techniques [20], the belief degrees are generated as follows:

$$\alpha_{ij} = \frac{A_{i,j+1} - x_i}{A_{i,j+1} - A_{ij}},\qquad \alpha_{i,j+1} = 1 - \alpha_{ij},\qquad \text{if } A_{ij} \le x_i \le A_{i,j+1};\qquad \alpha_{il} = 0 \text{ for } l \ne j, j+1$$

All these belief degrees form the belief distribution of $U_i$. The belief distribution of the consequent attribute can be generated in the same way according to the output of the training sample. In classification problems, the consequent attribute's referential values of the EBRB system are arranged as the possible results of classification. As the output of each sample is one of those results, the belief degree for that output will be 1 and the others will be 0.
For example, suppose an EBRB for the Iris classification is arranged with referential values for each antecedent attribute and the three Iris classes as the consequent referential values. Then a training sample x = (6.1, 2.6, Virginica) will be transformed into a rule by applying the transformation above to each antecedent value and assigning a belief degree of 1 to Virginica in the consequent.
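The utility-based transformation above can be sketched in a few lines of code (a minimal illustration; the referential values used here are hypothetical, not the paper's actual arrangement):

```python
def to_belief_distribution(x, referential_values):
    """Transform a numerical input x into a belief distribution over
    incrementally arranged referential values (utility-based method):
    only the two referential values bracketing x receive nonzero belief."""
    rv = referential_values
    beliefs = [0.0] * len(rv)
    if x <= rv[0]:
        beliefs[0] = 1.0
    elif x >= rv[-1]:
        beliefs[-1] = 1.0
    else:
        for j in range(len(rv) - 1):
            if rv[j] <= x <= rv[j + 1]:
                beliefs[j] = (rv[j + 1] - x) / (rv[j + 1] - rv[j])
                beliefs[j + 1] = 1.0 - beliefs[j]
                break
    return beliefs

# e.g. 5 referential values arranged uniformly and incrementally
# (a hypothetical range for one Iris attribute)
print(to_belief_distribution(6.1, [4.3, 5.2, 6.1, 7.0, 7.9]))
```

The resulting belief degrees always sum to 1, so the distribution is complete by construction.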

D. THE EVIDENTIAL REASONING APPROACH OF EBRB SYSTEM
The reasoning approach of the EBRB system is similar to that of the conventional BRB system. First, the antecedent attributes of the testing sample are also transformed into belief distributions using the data-driven method, and then the distance between the testing sample and each rule in the EBRB is calculated. The Euclidean distance between the input belief distribution and the $k$th rule's distribution on the $i$th attribute is:

$$d_i^k = \sqrt{\sum_{j=1}^{J_i}\left(\alpha_{ij} - \alpha_{ij}^k\right)^2} \quad (9)$$

Based on (9), the individual matching degree of $x_i$ and $U_i$ in the $k$th rule is:

$$S_i^k = 1 - d_i^k \quad (10)$$

Based on the individual matching degrees, the conventional method to calculate the activation weight $\omega_k$ for the $k$th rule is:

$$\omega_k = \frac{\theta_k \prod_{i=1}^{T}\left(S_i^k\right)^{\bar{\delta}_i}}{\sum_{l=1}^{L}\theta_l \prod_{i=1}^{T}\left(S_i^l\right)^{\bar{\delta}_i}},\qquad \bar{\delta}_i = \frac{\delta_i}{\max_{t=1,\dots,T}\delta_t} \quad (11)$$

It is apparent that $0 \le \omega_k \le 1$ and $\sum_{k=1}^{L}\omega_k = 1$. After calculating the activation weight of each rule, all activated rules are aggregated using the RIMER analytical algorithm [21]. The reasoning conclusion of the EBRB system is also a belief distribution, represented as:

$$D(x) = \{(D_n, \beta_n(x)),\ n = 1, \dots, N\} \quad (13)$$

The belief degree $\beta_n(x)$ in $D(x)$ is generated as follows:

$$\beta_n(x) = \frac{\mu\left[\prod_{k=1}^{L}\left(\omega_k\beta_n^k + 1 - \omega_k\sum_{j=1}^{N}\beta_j^k\right) - \prod_{k=1}^{L}\left(1 - \omega_k\sum_{j=1}^{N}\beta_j^k\right)\right]}{1 - \mu\prod_{k=1}^{L}\left(1 - \omega_k\right)} \quad (14)$$

where

$$\mu = \left[\sum_{n=1}^{N}\prod_{k=1}^{L}\left(\omega_k\beta_n^k + 1 - \omega_k\sum_{j=1}^{N}\beta_j^k\right) - (N-1)\prod_{k=1}^{L}\left(1 - \omega_k\sum_{j=1}^{N}\beta_j^k\right)\right]^{-1} \quad (15)$$

For classification problems, the referential value with the highest belief degree is generally regarded as the expected output of the EBRB system, i.e.:

$$f(x) = D_{n^*},\qquad n^* = \arg\max_{n=1,\dots,N}\beta_n(x) \quad (16)$$
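The distance, matching-degree, and activation-weight steps can be sketched as follows (a simplified sketch: it assumes matching degrees are clamped to be nonnegative, and all function and variable names are illustrative):

```python
import math

def activation_weights(input_beliefs, rules, attr_weights, rule_weights):
    """Conventional activation-weight calculation: Euclidean distance
    between belief distributions, individual matching degree 1 - d,
    then a weighted product normalized over all rules."""
    rel = [d / max(attr_weights) for d in attr_weights]  # normalized attribute weights
    raw = []
    for rule, theta in zip(rules, rule_weights):
        prod = 1.0
        for i, (xb, rb) in enumerate(zip(input_beliefs, rule)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(xb, rb)))
            matching = max(0.0, 1.0 - dist)  # clamped so the degree stays in [0, 1]
            prod *= matching ** rel[i]
        raw.append(theta * prod)
    total = sum(raw)
    return [w / total for w in raw]

# input identical to rule 0 and orthogonal to rule 1
x = [[1.0, 0.0], [0.0, 1.0]]
rules = [x, [[0.0, 1.0], [1.0, 0.0]]]
print(activation_weights(x, rules, [1.0, 1.0], [1.0, 1.0]))  # -> [1.0, 0.0]
```

A rule identical to the input receives the whole activation weight, while a rule at maximum distance is not activated at all.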

III. THE BALANCE ADJUSTING APPROACH
This section analyses the issue of a concrete imbalanced rule base and then proposes the BA approach to address the issue.

A. SIMILARITY BETWEEN RIMER METHODOLOGY AND WEIGHTED AVERAGES METHODOLOGY
Evidential reasoning (ER) approach was proposed by Yang and Singh [22] based on D-S evidence theory [23], [24].
In the D-S evidence theory, a proposition is represented by a frame of discernment $\Theta$, which is a finite set of mutually exclusive elements. Each subset of the frame corresponds to an event that may be the solution to the proposition. The basic belief assignment function $M$ is a mapping from each subset to $[0, 1]$, and $M(A)$ is the belief degree to which event $A$ is the solution. For function $M$, there is:

$$M(\emptyset) = 0,\qquad \sum_{A \subseteq \Theta} M(A) = 1$$

In BRB, the referential set of the consequent attribute can be regarded as a special frame of discernment in which belief is assigned only to the singleton events $\{D_n\}$. For a certain frame of discernment, several sources of information may provide different values of function $M$, e.g., $M_1 \ne M_2$; $M_1$ and $M_2$ are called different evidences. In BRB, the rules can be regarded as the evidences, and the consequent belief distribution can be regarded as the value of function $M$. The aggregation of rules is based on the D-S evidence combination approach:

$$M(A) = K\sum_{B \cap C = A} M_1(B)M_2(C),\qquad K = \left[1 - \sum_{B \cap C = \emptyset} M_1(B)M_2(C)\right]^{-1} \quad (18)$$

The RIMER recursive algorithm [25] can be regarded as aggregating rules one by one using (18), and this recursive algorithm has been equivalently transformed into the analytical algorithm (14) in [21].
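For mass functions focused on singleton events, as in classification BRBs, the combination rule (18) reduces to element-wise multiplication followed by renormalization over the non-conflicting mass. A minimal sketch (the helper name and the numbers are illustrative):

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions defined on
    the singleton events of the same frame of discernment."""
    # B ∩ C = {D_n} is non-empty only when B = C for singleton events
    joint = [a * b for a, b in zip(m1, m2)]
    k = sum(joint)  # 1 minus the conflicting mass
    if k == 0:
        raise ValueError("totally conflicting evidence")
    return [x / k for x in joint]

print(combine([0.7, 0.1, 0.2], [0.2, 0.5, 0.3]))  # ≈ [0.56, 0.2, 0.24]
```

Note that this plain combination does not weight the evidences; RIMER additionally discounts each rule's distribution by its activation weight before combining, which is why its results differ from this raw Dempster combination.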
The RIMER methodology has something in common with the weighted averages methodology. As the weighted averages methodology is quite well-known, its introduction is omitted.
Assume two belief distributions: $D_1$ = (0.7, 0.1, 0.2) and $D_2$ = (0.2, 0.5, 0.3). The aggregation result of $D_1$ and $D_2$ is $D_{er}$ = (0.462, 0.289, 0.249) using the RIMER methodology and $D_{wa}$ = (0.45, 0.3, 0.25) using the weighted averages methodology. Moreover, the aggregation result of 100 copies of $D_1$ and 100 copies of $D_2$ using the RIMER methodology is $D_{er}$ = (0.473, 0.291, 0.236). These examples indicate that the effect of the RIMER methodology is similar to that of the weighted averages methodology; however, the former has a bias towards some of the belief degrees, and such a bias depends only on the aggregated belief distributions. Besides, the aggregation result of one $D_1$ and 100 copies of $D_2$ using the RIMER methodology is $D_{er}$ = (0.187, 0.527, 0.286), which indicates that the result of the RIMER methodology is also changed by the quantity difference of rules. Fig. 1 shows how the aggregation results change as the number of $D_2$ increases.
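The first of these numbers can be reproduced with a short implementation of the analytical ER algorithm, restricted to complete belief distributions (a sketch; it assumes every distribution's belief degrees sum to 1, so the general terms simplify):

```python
def er_aggregate(weights, distributions):
    """RIMER analytical ER aggregation of complete belief distributions.
    Since each distribution sums to 1, the general terms reduce to
    w*beta + 1 - w and 1 - w."""
    n_grades = len(distributions[0])
    prod_n = [1.0] * n_grades
    prod_rest = 1.0  # product of (1 - w_k) over all rules
    for w, d in zip(weights, distributions):
        for n in range(n_grades):
            prod_n[n] *= w * d[n] + 1.0 - w
        prod_rest *= 1.0 - w
    mu = 1.0 / (sum(prod_n) - (n_grades - 1) * prod_rest)
    denom = 1.0 - mu * prod_rest
    return [mu * (prod_n[n] - prod_rest) / denom for n in range(n_grades)]

d1, d2 = [0.7, 0.1, 0.2], [0.2, 0.5, 0.3]
print([round(b, 3) for b in er_aggregate([0.5, 0.5], [d1, d2])])
# -> [0.462, 0.289, 0.249]
```

With equal activation weights of 0.5 for the two distributions, this reproduces the RIMER result quoted above, slightly biased relative to the weighted average (0.45, 0.3, 0.25).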
Increasing the number of rules is equivalent to increasing the weight of a rule in the weighted averages methodology, i.e., one rule whose weight is 100 is equivalent to 100 rules whose weights are 1. This property also applies to the RIMER methodology. Fig. 2 shows the aggregation results of one $D_1$ whose weight is 1 and $n$ copies of $D_2$ whose weights are $100/n$, and it can be noticed that those aggregation results differ only slightly.

B. ISSUE OF EBRB SYSTEM WITH AN IMBALANCED RULE BASE
Since the data-driven method transforms each training sample into an extended belief rule, the number of generated rules belonging to each class equals the number of training samples belonging to that class. A rule base generated from an imbalanced dataset is therefore an imbalanced rule base.
Take the classification dataset Thyroid [12] as an example. Thyroid is a series of datasets about thyroid disease research provided by the Garvan Institute in Sydney, Australia, and has 10 sub-datasets; this section selects one of its classification datasets. This dataset contains 215 samples, and each sample has five numerical antecedent attributes and one consequent attribute. The three classes in this dataset are Normal, Hyper, and Hypo, and the numbers of samples belonging to each class are 150, 35 and 30, respectively. Thyroid is an imbalanced dataset, where Normal is the majority class and Hyper and Hypo are minority classes.
Uniformly and incrementally arrange 5 referential values for each antecedent attribute of the data, and then use stratified sampling to divide the dataset into 10 folds, where 9 folds are used for training and the remaining fold is used for testing. For a testing sample of Hypo, the activation weights and consequent belief distributions of the rules in this rule base are listed in Table 1, Table 2 and Table 3, respectively.
These rules have been sorted by their activation weights. Because Thyroid is an imbalanced dataset, the training datasets obtained by stratified sampling are also imbalanced, and the rule base generated by the data-driven method will certainly be imbalanced. In such an imbalanced rule base, although the rules of Hypo do have higher activation weights, their quantity is much smaller than that of Normal. For this sample, the sums of rule activation weights belonging to the three classes are 0.570, 0.031 and 0.399, respectively, and the final belief distribution obtained is (0.590, 0.024, 0.386), which means the testing sample is misclassified as Normal. In the entire case study, the average accuracies of the three classes are 100.00, 31.91 and 43.67, respectively. The imbalanced rule base brings high accuracies for majority classes but low accuracies for minority classes.
As discussed above, the RIMER methodology can be influenced by both quantity and weights of rules like the weighted averages methodology, so the reasoning result of an EBRB system will also be misled when the rule base is imbalanced.

C. ADJUSTING THE RULE BASE TO BE BALANCED
Consider the rule base of a binary imbalanced classification problem. Suppose the activation weight of a rule belonging to the majority class is $\omega_{majority}$ and the activation weight of a rule belonging to the minority class is $\omega_{minority}$, and let $\sum\omega_{majority}$ and $\sum\omega_{minority}$ denote the sums of activation weights of the two classes. Ideally, when $\omega_{majority} \ge \omega_{minority}$ there shall be $\sum\omega_{majority} \ge \sum\omega_{minority}$, and when $\omega_{majority} \le \omega_{minority}$ there shall be $\sum\omega_{majority} \le \sum\omega_{minority}$. But if the rule base is imbalanced, when $\omega_{majority} \le \omega_{minority}$ there may still be $\sum\omega_{majority} \ge \sum\omega_{minority}$. The condition $\omega_{majority} \le \omega_{minority}$ is likely to occur when the testing sample belongs to the minority class. For ease of understanding, simplify the RIMER methodology to the weighted averages methodology; then $\sum\omega_{minority}$ is equal to $\beta_{minority}$ (because the values in the consequent belief distribution are either 0 or 1 in classification problems), and likewise for the majority class. Thus there may be $\beta_{majority} \ge \beta_{minority}$, which indicates a misclassification. In order to make the rule base balanced, a reasonable way is to simultaneously adjust the activation weights of all rules so that $\sum\omega_{majority} \le \sum\omega_{minority}$ when the testing sample belongs to the minority class.
It can be noticed from (11) that $0 \le \omega_k \le 1$ for any rule. Therefore, for any two rules with unequal activation weights $\omega_a$, $\omega_b$ and a real number $p$, if $\omega_a > \omega_b$ and $p > 1$, there must be:

$$\frac{\omega_a^p}{\omega_b^p} > \frac{\omega_a}{\omega_b} \quad (20)$$

Let $\tilde{\omega}_k$ be the activation weight of the $k$th rule before normalization. To address the issue above, the operation of the BA approach based on (20) can be represented as follows:

$$\omega_k = \frac{\tilde{\omega}_k^p}{\sum_{l=1}^{L}\tilde{\omega}_l^p} \quad (21)$$

The real number $p$ is called the balance parameter of the BA approach. Increasing the value of $p$ reduces the gap $\sum\omega_{majority} - \sum\omega_{minority}$. Fig. 3 shows the effect of $p$ applied to the example of Section III-B. For that testing sample, setting $p = 1.25$ is enough.
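The BA operation itself, raising each unnormalized activation weight to the power p and renormalizing, is only a few lines. The sketch below also illustrates how increasing p can flip which class dominates the weight sums (the weight values are hypothetical, chosen so that two minority rules have higher individual weights than eight majority rules):

```python
def ba_adjust(weights, p):
    """Balance Adjusting: raise each unnormalized activation weight
    to the power p, then renormalize so the weights sum to 1."""
    powered = [w ** p for w in weights]
    total = sum(powered)
    return [w / total for w in powered]

# hypothetical imbalanced activation: 8 majority-class rules with
# weight 0.09 each, 2 minority-class rules with weight 0.14 each
weights = [0.09] * 8 + [0.14] * 2
for p in (1, 4):
    adj = ba_adjust(weights, p)
    print(p, round(sum(adj[:8]), 3), round(sum(adj[8:]), 3))
```

At p = 1 the majority-class sum dominates (0.72 vs 0.28) despite the minority rules' higher individual weights; at p = 4 the two minority rules' combined weight exceeds the majority's, which is exactly the rebalancing the approach aims for.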
If the testing sample belongs to the majority class, the condition $\omega_{majority} \ge \omega_{minority}$ is likely to occur, and thus $\beta_{majority} \ge \beta_{minority}$ is almost certain. Since the BA approach always increases the relative weight of the minority-class rules with high activation weights, it will inevitably bring some possibility of misclassification for majority classes. But, as demonstrated in the following section, such a possibility is quite small and acceptable.

IV. CASE STUDIES
The case studies in this section demonstrate the performance of EBRB system optimized by the BA approach (BA-EBRB) with 10-fold cross-validation. After that, an algorithm is proposed to set the value of the balance parameter p.
For each antecedent attribute in the following case studies, 5 referential values are arranged uniformly and incrementally. The number of samples, the accuracy of each class and the average accuracy are shown in the tables of reasoning results. Note that the average accuracy of the whole dataset does not equal the weighted average of the class accuracies in 10-fold cross-validation, because the distribution proportions of samples in different folds may not be precisely equal.

A. PERFORMANCE OF EBRB SYSTEM OPTIMIZED BY THE BA APPROACH
Take the Thyroid dataset as the first case study. Table 4 lists the reasoning results of the EBRB system as the value of p gradually increases, and Fig. 4 shows the changing trend more clearly. For each class, its quantity is attached after the label. Note that p = 1 means the EBRB system is the conventional one, namely Liu-EBRB.
It can be concluded from the results that as the value of p increases, the accuracies of the minority classes (Class 2 and Class 3) increase significantly while the accuracy of the majority class (Class 1) decreases slightly. Both finally tend to be stable.
The accuracy of the majority class Normal seems excellent in Liu-EBRB, but at a cost: many samples are misclassified into the majority class, although this does not ruin the average accuracy because the minority classes are at a disadvantage in quantity. Is the result of Liu-EBRB acceptable? As the average accuracy is not the most reasonable evaluation metric for imbalanced classification problems, this study uses the Macro F1-score to evaluate the reasoning results of multi-class classification. Fig. 5 shows the Macro F1-scores of this case study.
The Macro F1-score of Liu-EBRB is only 61.70, which is too low to be acceptable, whereas that of BA-EBRB is 96.76 at p = 9. The BA approach successfully increases the average accuracy by 20% and the Macro F1-score by 56% on the Thyroid dataset. To further illustrate the effectiveness of the BA approach and to seek the optimal value of p, two more case studies are conducted on the Glass dataset and the Bupa dataset. The detailed results are listed in Table 5 and Table 6, and the line charts are shown in Fig. 6 and Fig. 7, respectively. For each class, its quantity is attached after the label. In both case studies, the average accuracies are increased by about 7% while the Macro F1-scores are increased by more than 14%. These results prove that the BA approach is an effective and reliable tool to improve the performance of the EBRB system on imbalanced datasets.
The relationship between the value of p and either the average accuracy or the Macro F1-score is hard to characterize. It seems to be a convergent function in Fig. 6 but a unimodal function in Fig. 7, so neither a ternary search algorithm nor simply setting p to a very large value fits the problem.

B. HOW TO DETERMINE THE VALUE OF P
In fact, the only confirmed conclusion is that increasing the value of p leads to a decrease in the accuracies of majority classes but an increase in the accuracies of minority classes. As a result, the relationship will certainly be a unimodal function in binary classification problems but may become any kind of function in multi-class problems. Besides, the sum of rule activation weights depends not only on the number of rules but also on each single rule activation weight. Since the latter depends on the testing samples and the generated rules (i.e., the training samples), it is impossible to calculate the value of p based only on the class distribution of a dataset.
As the optimal value of p depends on the information of whole dataset, it can hardly be calculated by a mere mathematical formula. This section proposes an approximate iterative algorithm to solve the problem.
Define E(p) as the evaluation function of p. E(p) can be measured by average accuracy, Macro F1-score, the accuracy of a minority class or other evaluation metrics of a model.
And define $[1, s]$ as the value range of p, $p_{opt}$ as the optimal value of p in $[1, s]$, and $\varepsilon$ as the threshold of the step length. Then the objective function of the algorithm is represented as:

$$p_{opt} = \arg\max_{p \in [1, s]} E(p) \quad (22)$$

$E(p)$ in (22) also depends on the generated EBRB and the testing samples, i.e., the information of a whole dataset, so it has to be calculated by 10-fold cross-validation rather than by a mathematical formula. The detailed procedure is given as Algorithm 1. Once the value of p is determined, it need not be changed during the reasoning process. Before applying the algorithm, some problems are worth considering: 1) What is the time complexity of this algorithm? Let $n$ represent the number of antecedent attributes in $S$ and $T(n|S|)$ represent the time complexity of one iteration; then (23) shows that the time complexity of Algorithm 1 mainly depends on $T(n|S|)$, i.e., on the size of $S$ ($|S|$). 2) How to apply this algorithm to large scale datasets?
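The detailed listing is given as Algorithm 1 in the paper; the sketch below is one plausible reading of such an iterative search, which repeatedly evaluates E(p) on a grid over [1, s] and shrinks the step around the best candidate until the step drops below ε (the function name and the refinement strategy are assumptions, not the paper's exact algorithm):

```python
def find_balance_parameter(evaluate, s=20.0, eps=0.01):
    """Approximate iterative search for the balance parameter p in [1, s]
    maximizing the evaluation function E(p), supplied as a callback
    (e.g. Macro F1-score measured by cross-validation)."""
    lo, hi = 1.0, s
    step = (hi - lo) / 10.0
    best = lo
    while step >= eps:
        k = int(round((hi - lo) / step))
        candidates = [lo + i * step for i in range(k + 1)]
        best = max(candidates, key=evaluate)
        # narrow the search window around the best candidate
        lo, hi = max(1.0, best - step), min(s, best + step)
        step /= 10.0
    return best
```

In practice each call to `evaluate` would run a full 10-fold cross-validation of the BA-EBRB system, which is why keeping |S| small (e.g. via stratified sampling) matters for large datasets.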
As discussed in 1), the time complexity of Algorithm 1 increases along with $T(n|S|)$ and becomes unacceptable when the latter is very large. Nevertheless, the information of a dataset can also be derived from its stratified samples, so $|S|$ is not recommended to be too large. Before applying this algorithm to a large scale dataset, a stratified sampling of the complete dataset will help.

3) How to determine the value of s?
Suppose a worst case: $|S|$ is $n$, and there are $n - 1$ rules belonging to the majority class whose weights are all equal to $\omega_{majority}$, and one rule belonging to the minority class whose weight is $t \cdot \omega_{majority}$ ($t > 1$). To adjust the rule base to be balanced, the value of $s$ should satisfy:

$$\left(t \cdot \omega_{majority}\right)^s \ge (n - 1) \cdot \omega_{majority}^s,\qquad \text{i.e.,}\qquad s \ge \frac{\log(n - 1)}{\log t}$$

As discussed in 2), $|S|$ is not recommended to be too large. Assuming $|S|$ is 2000 and $t = 1.5$, setting $s = 20$ is enough.
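This worst-case bound follows from requiring the single minority rule's powered weight to outweigh the combined majority weights, which cancels ω_majority and leaves s ≥ log(|S| − 1)/log t. A quick numerical check (the helper name is hypothetical):

```python
import math

def min_balance_bound(n_samples, t):
    """Smallest s making one minority rule of weight t*w outweigh
    n_samples - 1 majority rules of weight w after the BA power p = s:
    (t*w)**s >= (n_samples - 1) * w**s  =>  s >= log(n_samples - 1) / log(t)."""
    return math.log(n_samples - 1) / math.log(t)

print(min_balance_bound(2000, 1.5))  # ≈ 18.7, so s = 20 suffices
```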

V. EFFECTIVENESS VALIDATION OF THE PROPOSED APPROACH
To further validate the effectiveness of the proposed BA approach, a series of benchmark classification datasets from UCI are used in this section to test the performance of BA-EBRB, and the derived results are compared with both conventional classification approaches and recent works on the EBRB system.

A. COMPARE WITH CONVENTIONAL CLASSIFICATION APPROACHES
Several public classification datasets from UCI, including balanced and imbalanced ones, are used in this section. The class distributions of the datasets are shown in Fig. 8.
The reasoning results of BA-EBRB in this comparison are all derived from 10 independent runs with 10-fold cross-validation. Table 7 lists the comparison results. The approaches for comparison are Decision Tree [27], Naive Bayes [28], Fuzzy Set [29], KNN [23], SVM and LDA [31], the conventional Liu-EBRB, DRA-EBRB [32], SRA-EBRB [19] and VP-EBRB [33]. Since the F1-score was not used as a metric in those papers, the results for comparison have to be measured by average accuracy. The rank of each approach is attached after its reasoning result. BA-EBRB is not always the best on all datasets, but its rank never falls out of the top 4, and its average rank is the highest. The comparison illustrates that the BA approach is an effective and robust tool applicable to various datasets.
It can be noticed that BA-EBRB not only performs well on imbalanced datasets like Thyroid and Wine, but also gets the highest rank on the balanced dataset Seeds. Such a result is reasonable. To a certain extent, the BA approach may also resolve data inconsistency like the DRA approach, because of their similarity in mathematical tricks when using the conventional method to calculate rule activation weights. But the former focuses on the issue of the imbalanced rule base and does not change the visitation rule rate (VRR, usually used as an efficiency metric) of the EBRB system. As a result, it may sometimes be better than the conventional DRA approach. Furthermore, unlike the DRA approach, the BA approach is also able to combine with some novel rule activation weight calculation methods that are not based on the multiplication of individual matching degrees (see [34]).

B. COMPARE WITH RECENT WORKS OF EBRB SYSTEM
To demonstrate the advancement of the BA approach, this study compares it with CABRA-EBRB [35] and NP-EBRB [36], both of which are novel and significant works proposed in the last two years. The statistics on the extra datasets used in this section are summarized in Table 8. Table 9 lists the comparison results on several small scale classification datasets. The reasoning results of BA-EBRB are derived from 10 independent runs with 10-fold cross-validation and are still measured by average accuracy, because the F1-score was not mentioned in those papers. BA-EBRB gets the highest rank on three datasets, as does NP-EBRB; the latter performs better on Glass while the former performs better on a larger imbalanced dataset, Yeast. On the other datasets, there is not much difference.
Additionally, the performance of the BA approach on large scale datasets whose sizes are greater than 5000 is compared with the results listed in [36]; CABRA-EBRB was not involved in that comparison because it is too time-consuming. The comparison results are listed in Table 10. These reasoning results are all obtained from 2-fold cross-validation.
The values of the balance parameter p for these datasets are determined using Algorithm 1 with |S| = 1000. It can be seen from Table 10 that BA-EBRB has the best reasoning results on all 3 datasets. Compared with NP-EBRB, the misclassification rates of BA-EBRB on these datasets are decreased by 21.9%, 22.9% and 30.0%, respectively. This result illustrates that the BA approach is more effective and has greater potential on large scale datasets. It also demonstrates that seeking the approximate optimal value of p from stratified samples is an effectual and reliable way to apply Algorithm 1 to large scale datasets.

VI. CONCLUSION
In this study, the BA approach is proposed to improve the performance of EBRB systems applied to imbalanced classification datasets. The presented analysis and case studies show why the reasoning result of the conventional RIMER methodology can be affected by an imbalanced rule base and how the BA approach adjusts the rule activation weights to balance a rule base. A series of benchmark classification datasets validate the effectiveness of BA-EBRB compared with several conventional classification methods and novel studies of the EBRB system. The further conclusions of this study are summarized as follows: 1) The rule aggregation process of the EBRB system using the conventional RIMER methodology attaches more importance to the weighted combination of activated rules but neglects the importance of rules with higher activation weights. This issue is magnified when the data-driven method generates an imbalanced rule base for an EBRB system applied to imbalanced classification datasets. Since the rules belonging to minority classes will always be at a disadvantage in quantity, the reasoning ability of the EBRB system, especially the accuracies of minority classes, will be heavily decreased by an imbalanced rule base. 2) To illustrate the issue above, this study provides an example of the counterintuitive reasoning process of an EBRB system applied to an imbalanced classification dataset about thyroid disease research, and then proposes the BA approach. The BA approach addresses the issue by simultaneously adjusting the activation weights of all rules in the rule base using a balance parameter. To determine the optimal value of the balance parameter, an approximate iterative algorithm is proposed. The BA approach can combine with various rule activation weight calculation methods and may be worth applying as a generic process of the EBRB system.
3) The case studies on imbalanced datasets demonstrate that the BA approach can effectively increase the accuracies of the minority classes, and thus leads to an increase in the average accuracy and the Macro F1-score over the whole dataset, without compromising the effectiveness of the system. The effectiveness and robustness of BA-EBRB are further demonstrated by the comparisons with several conventional classification methods and recent works on the EBRB system. Moreover, the comparison results illustrate that the BA approach is more effective and has greater potential on large scale datasets, and also prove that using the proposed approximate algorithm to seek the optimal value of the balance parameter is effectual.
Future research will concentrate on a more refined balance adjusting method, e.g., adjusting the rule activation weights of each class with different balance parameters. An exact algorithm to seek the optimal values of these balance parameters will also be considered.
WEIJIE FANG received the B.S. degree from Fuzhou University, in 2017, where he is currently pursuing the master's degree. His research interests include multiobjective optimization, intelligent decision technology, rule-based inference, big data analysis, and data mining.

YANGGENG FU received the Ph.D. degree from Fuzhou University, Fuzhou, China, in 2013. He is currently an Associate Professor of computer science with Fuzhou University. His research interests include decision theory and methods, data mining and machine learning, and intelligent systems.