Information Entropy-Based Attribute Reduction for Incomplete Set-Valued Data

This paper investigates attribute reduction for incomplete set-valued data based on information entropy. The similarity degree between information values on a conditional attribute of an incomplete set-valued decision information system (ISVDIS) is first proposed. Then, the tolerance relation on the object set with respect to a conditional attribute subset in an ISVDIS is obtained. Next, λ-reduction in an ISVDIS is presented. Moreover, connections between the proposed attribute reduction and uncertainty measurement are exhibited. Furthermore, an attribute reduction algorithm based on λ-information entropy in an ISVDIS is provided. Finally, experiments to evaluate the performance of the proposed algorithm are carried out, and the Friedman and Nemenyi statistical tests are conducted. The experimental results indicate that the proposed algorithm is more effective for an ISVDIS than some existing algorithms.


I. INTRODUCTION
A. RESEARCH BACKGROUND
Rough set theory (RST) [26], [27] is a significant approach for managing uncertainty. An information system (IS) based on RST can model large databases and the knowledge discovery process mathematically. Plenty of applications of RST involve an IS [1], [2], [7], [10], [16], [19], [23], [33], [35]. An incomplete information system (IIS) is an IS containing missing values. A set-valued information system (SVIS) is an IS whose information values are sets. An incomplete set-valued information system (ISVIS) is an IIS whose information values are sets. An ISVIS with decision attributes is said to be an incomplete set-valued decision information system (ISVDIS). These ISs have been investigated by a great many scholars. For example, Yao et al. [44] proposed an RST model for a SVIS with upper and lower approximations and introduced generalized decision logic; based on the process of knowledge induction, Leung et al. [20] devised a rough set decision rule selection method based on a minimum attribute set in a SVIS; Qian and Liang [28], [29] presented dominance relations for a SVIS and a set-valued ordered IS.
(The associate editor coordinating the review of this manuscript and approving it for publication was Francisco J. Garcia-Penalvo.)

Couso and Dubois [4] examined statistical reasoning with a SVIS from the ontic and epistemic views, respectively. Chen et al. [5] measured the uncertainty of an ISVIS and considered the optimal selection of subsystems using a Gaussian kernel.
Uncertainty measurement may provide a new viewpoint for analyzing data. We briefly review work on uncertainty measurement by other scholars. For instance, Dai and Tian [8] considered entropy and granularity measures of a SVIS; Wang and Yue [41] discussed entropy and granularity measures of interval- and set-valued ISs; Duntsch and Gediga [6] applied Shannon's entropy to the measurement of decision rules in RST; Liu and Zhong [22] used four different kinds of entropy to gauge uncertainty in a fuzzy relation IS.
Attribute reduction in an ISVDIS deletes some irrelevant attributes while maintaining the classification ability of the ISVDIS. As one of the core topics of RST, attribute reduction has received wide attention. For instance, Guan et al. [15] studied attribute reduction in an ISVDIS and proposed decision rules; Song and Zhang [31] proposed attribute reduction in a set-valued decision information system (SVDIS); Liu and Zhong [22] presented attribute reduction in a SVDIS based on a dominance relation; Chen and Qin [3] discussed attribute reduction in a SVDIS based on a tolerance relation; Li et al. [34] put forward attribute reduction in an ISVDIS.

B. COMPARISON AND CONTRIBUTION
To show the innovation and contribution of this paper more clearly, we compare and discuss this paper against some related literature. The comparison is shown in TABLE 1.

C. ORGANIZATION AND STRUCTURE
The workflow of the article is displayed in FIGURE 1, and the rest of the paper is organized as follows. Section 2 reviews some essential notions of binary relations and ISVDISs. Section 3 constructs the similarity degree and tolerance relation in an ISVDIS, and considers rough approximations based on this tolerance relation. Section 4 proposes entropy measurement for an ISVDIS. Section 5 presents an attribute reduction algorithm based on λ-information entropy in an ISVDIS. Section 6 gives an illustrative example. Section 7 evaluates the performance of the presented algorithm and performs statistical hypothesis tests. Section 8 concludes this article.

II. PRELIMINARIES
In this section, we recall some basic notions about binary relations and ISVDISs.
Throughout this paper, O denotes a finite set, 2^O denotes the family of all subsets of O, and |X| denotes the cardinality of a set X.

III. THE TOLERANCE RELATIONS INDUCED BY AN ISVDIS AND ROUGH APPROXIMATIONS BASED ON THEM
In this part, we give the tolerance relations induced by an ISVDIS and define rough approximations based on them.
Obviously, R_F^λ and T_d are a tolerance relation and an equivalence relation on O, respectively.
Then R_F^λ(o) and T_d(o) are said to be the λ-tolerance class and the equivalence class of o under R_F^λ and T_d, respectively. Moreover, T_d(o) is said to be the decision class of o in C, and T_d is said to be the decision in C.
An algorithm for computing the λ-tolerance class is designed as follows.
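The paper's exact similarity degree and its Algorithm are not fully reproduced in this excerpt. Purely as an illustration, the sketch below computes λ-tolerance classes using a hypothetical Jaccard-style similarity in which a missing value (represented as `None`) is treated as fully similar to anything; the real definition in the paper may differ.

```python
# Hypothetical sketch of lambda-tolerance classes for an ISVDIS.
# The similarity degree (a Jaccard ratio, with missing values treated as
# similar to everything) is an illustrative assumption, not the paper's
# exact definition.

def similarity(a, b):
    """Similarity between two set values; None marks a missing value."""
    if a is None or b is None:
        return 1.0
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def tolerance_class(table, obj, attrs, lam):
    """R_F^lambda(obj): objects whose similarity to obj is >= lam on every attribute in F."""
    return {
        o for o in table
        if all(similarity(table[obj][e], table[o][e]) >= lam for e in attrs)
    }

# Toy ISVDIS: objects o1..o3, set-valued attributes e1, e2; None is missing.
table = {
    "o1": {"e1": {1, 2}, "e2": {0}},
    "o2": {"e1": {2},    "e2": None},
    "o3": {"e1": {3},    "e2": {1}},
}
print(tolerance_class(table, "o1", ["e1", "e2"], 0.5))  # {'o1', 'o2'}
```

With λ = 0.5, o2 falls into o1's tolerance class because its missing e2 value is treated as similar and its e1 value overlaps half of o1's, while o3 is excluded.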
∂_F^λ(o) is said to be the λ-generalized decision of o in C, and ∂_F^λ is said to be the λ-generalized decision in C. Then R_F^λ(X) and R̄_F^λ(X) are said to be the λ-lower and λ-upper approximations of X, respectively.
Proof: It can be obtained by Theorem 2.
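The λ-lower and λ-upper approximations named above follow the standard rough-set pattern: an object belongs to the lower approximation when its tolerance class is contained in X, and to the upper approximation when its tolerance class meets X. A minimal sketch, written against a hypothetical mapping from each object to its precomputed λ-tolerance class:

```python
# Sketch of lambda-lower and lambda-upper approximations from tolerance
# classes; these are the standard rough-set definitions, applied to a
# hypothetical mapping obj -> R_F^lambda(obj).

def lower_approx(tol, X):
    """{o : R_F^lambda(o) is a subset of X}"""
    return {o for o, cls in tol.items() if cls <= X}

def upper_approx(tol, X):
    """{o : R_F^lambda(o) intersects X}"""
    return {o for o, cls in tol.items() if cls & X}

tol = {
    "o1": {"o1", "o2"},
    "o2": {"o1", "o2"},
    "o3": {"o3"},
}
X = {"o1", "o3"}
print(lower_approx(tol, X))  # {'o3'}: only o3's class is inside X
print(upper_approx(tol, X))  # all three objects: every class meets X
```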

IV. ENTROPY MEASUREMENT FOR AN ISVDIS
In this section, we propose entropy measurement for an ISVDIS. Stipulate 0 log₂ 0 = 0.
Then f(x, y) increases with respect to x and with respect to y, respectively.
Since p_ij ≤ q_ij^(2) and f is increasing, the stated inequality holds.
Proof: Note that {D_1, D_2, …, D_r} is a partition of O. Then the result follows from Definition 13.
This is a contradiction. Hence R_F^λ ⊆ T_d.
Proof: It can be proved by Theorems 1 and 6.
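The closed-form definition of λ-information entropy is not reproduced in this excerpt. As an illustration only, the sketch below uses an entropy form common in the rough-set literature, H(F) = -(1/|O|) Σ_o log₂(|R_F^λ(o)|/|O|); the paper's exact definition may differ, but the boundary behavior (maximal for the finest relation, zero for the universal one) is the same idea the section relies on.

```python
import math

# Illustrative lambda-information entropy computed from tolerance classes.
# The closed form H(F) = -(1/|O|) * sum_o log2(|R_F^lambda(o)| / |O|) is an
# assumption drawn from common rough-set entropy definitions, not the
# paper's exact Definition 13.

def lam_entropy(tol):
    n = len(tol)
    return -sum(math.log2(len(cls) / n) for cls in tol.values()) / n

# Finest tolerance relation (every class a singleton) gives maximal
# entropy log2(n); the universal relation gives entropy 0.
objs = ("o1", "o2", "o3", "o4")
singletons = {o: {o} for o in objs}
universal = {o: set(objs) for o in objs}
print(lam_entropy(singletons))      # 2.0 = log2(4)
print(lam_entropy(universal) == 0)  # True: coarsest relation, no information
```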

V. ATTRIBUTE REDUCTION IN AN ISVDIS
In this section, we study attribute reduction in an ISVDIS.

Definition 14: Let (O, C ∪ {d}) be an ISVDIS, and let F ⊆ C and λ ∈ [0, 1]. Then F is said to be a λ-coordination subset of C relative to d if
In this paper, the set of all λ-coordination subsets of C relative to d is denoted by co λ (C).
In this paper, the set of all λ-reducts of C relative to d is denoted by red_λ(C).
(1) F ∈ co_λ(C). This implies that R_F^λ ⊆ T_d. By Corollary 1 and Theorem 6, the remaining conditions follow.
Given F ⊆ C and λ ∈ [0, 1], the following conditions are equivalent: (1) F ∈ red_λ(C);
Proof: It can be obtained by Proposition 4 and Theorem 8.
Based on the discussion above, we propose the following algorithm for attribute reduction based on similarity measurement and entropy measurement for an ISVDIS.
Thus e_4 is the first member of the reduct. We add the other attributes to {e_4} one by one and calculate the corresponding λ-conditional entropies. The corresponding tolerance classes are easily found first; these results are shown in TABLE 14. Hence we obtain the entropies below. Then we add the other attributes to {e_4, e_7} one by one, compute the corresponding tolerance classes (see TABLE 15), and calculate the corresponding λ-conditional entropies as follows. Similarly, we add the other attributes to {e_4, e_6, e_7} one by one, compute the corresponding tolerance classes (see TABLE 16), and calculate the corresponding λ-conditional entropies as follows. Since a λ-conditional entropy cannot be less than 0, {e_4, e_6, e_7, e_8} is the reduct of the ISVDIS given in TABLE 2.
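The greedy loop in this example can be sketched in a few lines: start from the empty set, repeatedly add the attribute that most reduces the λ-conditional entropy, and stop once the entropy reaches 0. The similarity degree and the conditional-entropy form below (H(d|F) = -(1/|O|) Σ_o log₂(|R_F^λ(o) ∩ T_d(o)| / |R_F^λ(o)|)) are illustrative assumptions, not the paper's exact definitions.

```python
import math

# Greedy forward selection driven by a lambda-conditional entropy, mirroring
# the worked example's loop. Similarity and entropy forms are assumptions.

def similarity(a, b):
    if a is None or b is None:          # missing value: fully similar
        return 1.0
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def tol_class(table, o, attrs, lam):
    return {p for p in table
            if all(similarity(table[o][e], table[p][e]) >= lam for e in attrs)}

def cond_entropy(table, dec, attrs, lam):
    """Assumed H(d|F); 0 means every tolerance class respects the decision."""
    n = len(table)
    total = 0.0
    for o in table:
        R = tol_class(table, o, attrs, lam)
        D = {p for p in table if dec[p] == dec[o]}
        total -= math.log2(len(R & D) / len(R))
    return total / n

def greedy_reduct(table, dec, attrs, lam):
    reduct, remaining = [], list(attrs)
    while remaining and cond_entropy(table, dec, reduct, lam) > 0:
        best = min(remaining,
                   key=lambda e: cond_entropy(table, dec, reduct + [e], lam))
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Toy run: e1 separates the two decisions, e2 does not, so e1 is chosen.
table = {
    "o1": {"e1": {1}, "e2": {5}},
    "o2": {"e1": {2}, "e2": {5}},
}
dec = {"o1": "yes", "o2": "no"}
print(greedy_reduct(table, dec, ["e1", "e2"], 0.5))  # ['e1']
```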

VII. EXPERIMENTAL RESULTS AND ANALYSIS
This section evaluates the performance of the proposed algorithm against existing methods. The framework of the experiment is displayed in FIGURE 2.
We compare the proposed algorithm with three other representative attribute reduction algorithms, based on the fuzzy similarity-based rough set approach (FSRS) [32], the fuzzy rough set model (FRSM) [9] and the dominance relation method (DRM) [22]. To verify the performance of the proposed attribute reduction algorithm for an ISVDIS, we carry out experiments on a personal PC with an Intel 3.0-GHz CPU and 64-GB RAM. We execute our experiments on six real datasets selected from the University of California, Irvine (UCI) Machine Learning Repository. These six real datasets are incomplete decision systems (a special case of ISVDIS) given in TABLE 17. We randomly remove 5% of the information values from all datasets in TABLE 17 to obtain ISVDISs. For the experimental work, we use the WEKA tool with the ten-fold cross-validation technique [17]. To evaluate these attribute reduction approaches, we employ two frequently used learning mechanisms to create classifiers: PART and J48.

A. REDUCT SIZE AND CLASSIFICATION ACCURACY
For the selected datasets, the average size of the attribute subsets after reduction by each method is reported. On the whole, classification accuracies based on the CEIS method were higher than those of the other three approaches in most cases. Therefore, our algorithm is superior to the other algorithms, and we may conclude that the CEIS method is more effective for attribute reduction in an ISVDIS.
More detailed trend lines of each method on the six datasets are shown in FIGUREs 3-5. FIGURE 3 compares the average attribute subset size after reduction for all four algorithms on the six datasets. It can be seen that the CEIS algorithm proposed in this article selects the fewest attributes as elements of the reduction set. FIGUREs 4-5 show in more detail how classification accuracy varies with the number of selected attributes on all selected datasets. It can be seen from the figures that the proposed method provides higher or almost equal classification accuracy on all six datasets. Comparing the above tables and charts, we can conclude that the proposed CEIS algorithm is an acceptable method for selecting the optimal attribute subset in an ISVDIS.

B. FRIEDMAN TEST AND NEMENYI TEST
Based on the classification accuracies obtained in the previous section, two statistical tests are used in this part to further verify the stability of the proposed approach.
The Friedman test, a nonparametric test in statistics, is used to compare the overall performance of k algorithms on N datasets and to determine whether the algorithms differ in performance. If they do, we carry out a more detailed test, namely the Nemenyi test, which determines which algorithms differ statistically in performance.
Suppose that N and k denote the numbers of datasets and algorithms, respectively, and for each i let r_i be the mean rank of the i-th algorithm. The Friedman statistic F_F is defined as

χ²_F = (12N / (k(k + 1))) (Σ_{i=1}^{k} r_i² − k(k + 1)²/4),   F_F = ((N − 1) χ²_F) / (N(k − 1) − χ²_F),

where F_F follows an F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. In the Friedman test, if the calculated value of F_F is greater than F_α(k − 1, (k − 1)(N − 1)), then the null hypothesis does not hold. The Nemenyi test can then be used to inquire into which algorithm is better. The critical difference, denoted CD_α, is defined as

CD_α = q_α √(k(k + 1) / (6N)),

where q_α and α are the critical value and the significance level of the test, respectively. The difference between the average ranks of each pair of algorithms is compared with CD_α; if it is greater than CD_α, the algorithm with the higher average ranking is statistically superior to the algorithm with the lower average ranking; otherwise, there is no statistically significant difference between the two. Below, we select all datasets in TABLE 17. The Friedman test is first used to assess the influence of the parameter λ on the classification accuracy of the different algorithms; the results show that λ has no significant effect. Thus we choose the classification accuracies at λ = 0.1 to test whether the classification accuracies of the four algorithms are significantly different. From TABLEs 19-20, we obtain the rankings of classification accuracies of the reduced data with PART and J48, respectively. Note that k − 1 = 3, (k − 1)(N − 1) = 27 and F_0.05(3, 27) = 2.96. Then F_F = 4.344 with PART and F_F = 20.978 with J48. Therefore, both values of F_F exceed F_0.05(3, 27); that is, the null hypothesis is rejected at α = 0.05 under the Friedman tests.
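The Friedman statistic and the Nemenyi critical difference can be computed directly from the average ranks. The ranks below are hypothetical (they are not the paper's TABLE values); q_0.05 ≈ 2.569 is the standard Nemenyi critical value for k = 4.

```python
import math

# Friedman statistic F_F and Nemenyi critical difference, following the
# standard formulas:
#   chi2_F = 12N/(k(k+1)) * (sum_i r_i^2 - k(k+1)^2/4)
#   F_F    = (N-1)*chi2_F / (N*(k-1) - chi2_F)
#   CD     = q_alpha * sqrt(k(k+1)/(6N))

def friedman_F(avg_ranks, N):
    k = len(avg_ranks)
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                     - k * (k + 1) ** 2 / 4)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

def nemenyi_cd(k, N, q_alpha):
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

# Hypothetical average ranks of k = 4 algorithms over N = 10 rankings.
ranks = [1.5, 2.5, 3.0, 3.0]
print(round(friedman_F(ranks, 10), 3))       # 3.857
print(round(nemenyi_cd(4, 10, 2.569), 3))    # 1.483
```

An F_F value above the tabulated F_α(k − 1, (k − 1)(N − 1)) rejects the null hypothesis; pairs of algorithms whose average-rank difference exceeds CD are then declared significantly different.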
Therefore, there are significant differences among the four algorithms. From FIGUREs 18-19, we can draw the following conclusions: the classification accuracy of CEIS is significantly higher than that of FSRS and DRM; there is no significant difference between CEIS and FRSM; and there is no significant difference among FSRS, FRSM and DRM.

VIII. CONCLUSION
This article has studied attribute reduction for an ISVDIS by using entropy measurement. Connections between the proposed attribute reduction and λ-information entropy have been investigated. An attribute reduction algorithm in an ISVDIS based on information entropy has been proposed. Moreover, we have shown that the λ-reducts obtained from λ-information entropy are equivalent to those obtained from λ-rough approximations. In future work, we will investigate some applications of λ-reduction in an ISVDIS.

THE APPENDIX OF SYMBOLS
For convenience, the symbols used in this paper are listed in the appendix (see TABLE 23).