Class imbalance tends to cause inferior performance in data mining learners. Evolutionary sampling is a technique that seeks to counter this problem by using genetic algorithms to evolve a reduced sample of a complete dataset, which is then used to train a classification model. Evolutionary sampling works to remove noisy and duplicate instances so that the sampled training data will produce a superior classifier. We propose this novel technique as a method to handle severe class imbalance in data mining. This paper presents our research into the use of evolutionary sampling with C4.5 decision trees and compares the technique's performance with random undersampling.
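To make the idea concrete, the following is a minimal sketch of how a genetic algorithm might evolve a reduced training sample. It is an illustration under stated assumptions, not the procedure used in this paper. Each chromosome is a boolean mask over the training instances. A nearest-centroid learner stands in for C4.5 so that the example needs only the standard library. The fitness measure (geometric mean of per-class recall on the full dataset), the truncation selection, the one-point crossover, and all function names and parameter values are hypothetical choices.

```python
import random
from statistics import mean

def centroid_fit(X, y):
    """Stand-in learner (NOT C4.5): store the mean feature vector of each class."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents

def centroid_predict(cents, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def fitness(mask, X, y):
    """Assumed fitness: train on the masked subset, then score the geometric
    mean of per-class recall over the full dataset."""
    Xs = [x for x, m in zip(X, mask) if m]
    ys = [t for t, m in zip(y, mask) if m]
    classes = sorted(set(y))
    if sorted(set(ys)) != classes:
        return 0.0  # a sample that drops a whole class is useless
    cents = centroid_fit(Xs, ys)
    score = 1.0
    for c in classes:
        idx = [i for i, t in enumerate(y) if t == c]
        hits = sum(centroid_predict(cents, X[i]) == c for i in idx)
        score *= hits / len(idx)
    return score ** (1 / len(classes))

def evolve_sample(X, y, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Evolve a boolean mask selecting which instances to keep for training."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation adds or removes individual instances.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, X, y))
```

The returned mask can be applied as `[x for x, keep in zip(X, mask) if keep]` to obtain the reduced training set. In a faithful experiment the stand-in learner would be replaced by a C4.5 implementation, and fitness would be measured on data held out from the final evaluation.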