Data mining is a key step in the process of knowledge discovery from data (KDD). The performance of data mining algorithms depends largely on the effectiveness of the preprocessing algorithms, in which dimensionality reduction plays an important role. Many methods have been proposed for dimensionality reduction; among them, feature subset selection and feature-ranking methods achieve significant reduction by removing irrelevant and redundant features from high-dimensional data. This improves the prediction accuracy of the classifier, lowers the false prediction ratio, and reduces the time and space complexity of building the prediction model. This paper presents an empirical analysis of the feature subset evaluators CFS, Consistency, and Filtered, and the feature rankers Chi-squared and Information Gain. The performance of these methods is analyzed with a focus on dimensionality reduction and improvement of classification accuracy, using a wide range of test datasets and classification algorithms, namely the probability-based Naive Bayes, the tree-based C4.5 (J48), and the instance-based IB1.
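To illustrate the pipeline the abstract describes (rank features, keep the most informative ones, then train a classifier), the following is a minimal sketch using scikit-learn as a stand-in for the evaluators and classifiers named above; the dataset, the number of retained features, and the use of chi-squared ranking with Gaussian Naive Bayes are illustrative assumptions, not the paper's actual experimental setup.

```python
# Sketch: chi-squared feature ranking followed by Naive Bayes
# classification (scikit-learn analogues of the evaluators and
# classifiers discussed in the abstract; dataset and k are
# illustrative choices, not the paper's setup).
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Rank features by the chi-squared statistic and keep the top 2,
# then classify with Naive Bayes on the reduced feature set.
pipe = make_pipeline(SelectKBest(chi2, k=2), GaussianNB())
scores = cross_val_score(pipe, X, y, cv=10)
print(round(scores.mean(), 3))
```

Swapping `chi2` for `mutual_info_classif` gives an Information Gain-style ranker, and replacing `GaussianNB` with a decision tree or k-nearest-neighbour classifier mirrors the C4.5 (J48) and IB1 comparisons.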