Abstract:
Real-world datasets tend to be large and complex, and may contain many irrelevant features. Eliminating such irrelevant features can significantly improve the performance of a data mining algorithm. In this paper, we propose a multi-objective genetic algorithm that works as a wrapper around a standard back-propagation algorithm and finds a set of Pareto-optimal feature subsets. We also introduce a novel mechanism, called the least-crowded selection algorithm, that maximizes the diversity of the solutions returned by the algorithm. We justify the proposed method by comparing it, both theoretically and empirically, to the back-propagation neural network and the simple genetic algorithm for feature selection.
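To make the notion of a Pareto-optimal feature subset concrete, the following is a minimal sketch of non-dominated filtering over candidate subsets. It is not the paper's algorithm: the abstract does not state the objectives, so the two used here (validation error and number of selected features, both minimized) are assumptions, as are the names `objectives`, `dominates`, and `pareto_front` and the hard-coded error values, which stand in for errors a wrapped back-propagation network would report. The least-crowded selection mechanism is not specified in the abstract and is not sketched.

```python
from typing import List, Sequence, Tuple

Mask = Tuple[int, ...]          # 1 = feature kept, 0 = feature dropped
Candidate = Tuple[Mask, float]  # (feature mask, validation error)


def objectives(mask: Mask, error: float) -> Tuple[float, int]:
    """Assumed objectives, both minimized: error and subset size."""
    return (error, sum(mask))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Mask]:
    """Return the masks that no other candidate dominates."""
    scored = [(mask, objectives(mask, err)) for mask, err in candidates]
    return [
        mask
        for mask, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


# Hypothetical candidates; the errors are illustrative placeholders,
# not results from the paper.
candidates: List[Candidate] = [
    ((1, 1, 1, 1), 0.10),
    ((1, 0, 1, 0), 0.12),
    ((1, 1, 0, 0), 0.15),
    ((0, 0, 1, 0), 0.30),
    ((1, 1, 1, 0), 0.12),
]
front = pareto_front(candidates)
```

In this toy input, `(1, 1, 0, 0)` and `(1, 1, 1, 0)` are each dominated by `(1, 0, 1, 0)`, which reaches an equal or lower error with no more features, so they drop out of the front. The remaining three subsets represent different trade-offs between error and subset size.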
Date of Conference: 31 July 2005 - 04 August 2005
Date Added to IEEE Xplore: 27 December 2005
Print ISBN: 0-7803-9048-2