The relevance of the measured features that describe labeled patterns in a problem domain affects classifier performance. Feature subset selection algorithms employing a wrapper approach typically assess the fitness of a candidate feature subset simply as the accuracy a given classifier achieves over the available patterns using that subset. For datasets with many patterns in some classes and few in others, relatively high accuracy can be achieved merely by labeling unknown patterns as members of the largest class, and feature selection wrappers that emphasize accuracy alone tend to inherit this bias. Class bias can be mitigated by instead emphasizing well-balanced accuracy during optimization. This paper proposes adding selective pressure for balanced accuracy to mitigate class bias during feature set evolution. Experiments compare the selection performance of genetic algorithms under fitness functions that weight accuracy, class balance, and feature parsimony differently. Several feature selection algorithms, including greedy, genetic, filter, and hybrid filter/GA approaches, are then compared using the best of these fitness functions. The experiments employ a naive Bayes classifier and public domain datasets. The results suggest that class balance and feature subset size can be improved without compromising overall accuracy or run-time efficiency.
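As an illustration of the idea behind the abstract, the sketch below shows how a fitness function might combine balanced accuracy (the mean of per-class recalls, so each class counts equally regardless of size) with a parsimony bonus for smaller feature subsets. The function names, the additive form, and the parsimony weight are assumptions for exposition, not the paper's exact formulation.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally,
    regardless of how many patterns it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

def fitness(y_true, y_pred, n_selected, n_total, w_parsimony=0.05):
    """Hypothetical GA fitness: reward balanced accuracy and add a
    small bonus for selecting fewer features (w_parsimony is an
    assumed weight, not taken from the paper)."""
    parsimony = 1.0 - n_selected / n_total
    return balanced_accuracy(y_true, y_pred) + w_parsimony * parsimony
```

On a 9-to-1 imbalanced dataset, a classifier that always predicts the majority class scores 0.9 plain accuracy but only 0.5 balanced accuracy, so a GA maximizing this fitness is not rewarded for the trivial majority-class strategy.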