Feature selection is a key problem in pattern recognition and machine learning, and finding the optimal feature subset is difficult because the problem is NP-hard. In many current applications, such as information retrieval, the dimensionality of the feature set or instance set is very high, so feature selection from high-dimensional data is an urgent task for researchers. This paper presents a new approach, a two-level filter model that integrates Relief with a newly developed feature clustering algorithm, to reduce the dimensionality of large-scale feature sets by means of feature correlation (relevance), including both feature-feature correlation and feature-class correlation. Our major contributions are: (1) to present a system for feature selection from high-dimensional data; (2) to analyze how the system architecture changes according to the time cost of the parts of the system; (3) to summarize and comment on the calculations of feature correlation; and (4) to perform experiments that show the effectiveness of the proposed approach. The experiments show that the full system efficiently achieves a better compromise between dimensionality reduction and classification accuracy than only part of the system does. In many cases, it improves both the accuracy rate and the dimensionality reduction.
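To make the two-level idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a two-class problem with features scaled to [0, 1], uses a basic Relief weight as the feature-class relevance, and substitutes a simple greedy grouping by absolute Pearson correlation for the paper's feature clustering algorithm, which the abstract does not specify. The function names, the thresholds, and the order of the two levels are all hypothetical choices made for illustration.

```python
from math import sqrt


def relief_weights(X, y):
    """Basic Relief weights (feature-class relevance) for a two-class problem.

    Assumption: features are scaled to [0, 1]. Every instance is visited once
    instead of being sampled at random, so the result is deterministic.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two instances.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))      # nearest same-class instance
        miss = min(misses, key=lambda j: dist(X[i], X[j]))   # nearest other-class instance
        for f in range(d):
            # Reward features that differ on the miss and agree on the hit.
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def two_level_select(X, y, relevance_threshold=0.0, redundancy_threshold=0.9):
    """Hypothetical two-level filter: Relief first, then redundancy grouping."""
    weights = relief_weights(X, y)

    # Level 1 (feature-class correlation): drop features with low Relief weight.
    relevant = [f for f in range(len(weights)) if weights[f] > relevance_threshold]
    relevant.sort(key=lambda f: weights[f], reverse=True)

    # Level 2 (feature-feature correlation): keep one representative per group
    # of highly correlated features, preferring the higher Relief weight.
    columns = {f: [row[f] for row in X] for f in relevant}
    selected = []
    for f in relevant:
        if all(abs(pearson(columns[f], columns[g])) < redundancy_threshold
               for g in selected):
            selected.append(f)
    return sorted(selected)


# Toy data: feature 0 separates the classes, feature 1 duplicates feature 0,
# and feature 2 is unrelated to the class label.
X = [
    [0.0, 0.0, 0.5], [0.1, 0.1, 0.9], [0.2, 0.2, 0.1],
    [0.8, 0.8, 0.4], [0.9, 0.9, 1.0], [1.0, 1.0, 0.2],
]
y = [0, 0, 0, 1, 1, 1]

print(two_level_select(X, y))
```

On this toy data the first level discards the unrelated feature, because its Relief weight is negative, and the second level discards the duplicate, leaving a single feature. Placing the cheaper relevance filter first is only one possible arrangement; the paper itself analyzes how the architecture should change with the time cost of each part.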