This paper presents a robust and efficient method for discovering knowledge for classification problems through data summarization. The method first discretizes continuous features and then summarizes the data in a contingency table, from which the inconsistency rate of any feature subset can be calculated easily. A sequential search over feature subsets then identifies the best subset. Once the number of features has been sufficiently reduced, easy-to-understand knowledge can be derived directly from the data summary. Another desirable property of the method is its capability to learn incrementally: knowledge can be updated quickly whenever new data are obtained. Moreover, the method can handle missing values when used for prediction. Applied to two benchmark data sets, the method proves effective at selecting discriminative features, and an application to welding fault diagnosis demonstrates its practical usefulness in manufacturing.
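The abstract does not spell out its definitions, so the following is only a minimal sketch of how the described pipeline could fit together, under stated assumptions. It assumes the features are already discretized. It also assumes the common definition of inconsistency rate: for each pattern of values on the chosen features, the instances outside that pattern's majority class count as inconsistent, and the total is divided by the number of instances. It uses sequential backward elimination as one possible form of "sequential search". The class `ContingencyTable`, its methods, the `tolerance` parameter, and the rule of matching only on observed features when values are missing are all hypothetical illustrations, not the paper's own implementation.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Optional, Sequence


class ContingencyTable:
    """Hypothetical summary: counts of (discretized feature vector, class label)."""

    def __init__(self, n_features: int) -> None:
        self.n_features = n_features
        self.counts: Counter = Counter()
        self.total = 0

    def update(self, rows: Iterable[Sequence[Hashable]], labels: Iterable[Hashable]) -> None:
        # Incremental learning: new data only increment cell counts,
        # so the summary never has to be rebuilt from scratch.
        for row, label in zip(rows, labels):
            self.counts[(tuple(row), label)] += 1
            self.total += 1

    def _project(self, features: Sequence[int]) -> dict:
        # Marginalize the table onto a feature subset:
        # pattern of values on `features` -> Counter(class -> count).
        table = defaultdict(Counter)
        for (row, label), c in self.counts.items():
            table[tuple(row[i] for i in features)][label] += c
        return table

    def inconsistency_rate(self, features: Sequence[int]) -> float:
        # Assumed definition: instances outside each pattern's majority class,
        # summed over all patterns, divided by the total number of instances.
        table = self._project(features)
        inconsistent = sum(sum(cls.values()) - max(cls.values()) for cls in table.values())
        return inconsistent / self.total

    def backward_search(self, tolerance: float = 0.0) -> List[int]:
        # Sequential backward elimination (an assumed search strategy):
        # repeatedly drop the feature whose removal gives the lowest rate,
        # as long as the rate stays within `tolerance` of the full-set rate.
        selected = list(range(self.n_features))
        limit = self.inconsistency_rate(selected) + tolerance
        while len(selected) > 1:
            best = min(
                selected,
                key=lambda f: self.inconsistency_rate([g for g in selected if g != f]),
            )
            candidate = [g for g in selected if g != best]
            if self.inconsistency_rate(candidate) > limit:
                break
            selected = candidate
        return selected

    def predict(self, row: Sequence[Optional[Hashable]], features: Sequence[int]):
        # Hypothetical missing-value rule: None entries are skipped and the
        # prediction uses only the observed values among the selected features.
        observed = [i for i in features if row[i] is not None]
        cls = self._project(observed).get(tuple(row[i] for i in observed))
        return cls.most_common(1)[0][0] if cls else None
```

For example, if the class is fully determined by feature 0 while features 1 and 2 are noise, `backward_search()` reduces the subset to `[0]`, and `predict((1, None, None), [0])` still returns a class even though two values are missing. Keeping the full joint counts, rather than only the counts for the current subset, is what allows any subset's inconsistency rate to be recomputed cheaply and new data to be merged without retraining.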