Abstract:
Imbalanced data sets originating from real-world problems, such as medical diagnosis, are pervasive. Learning from imbalanced data sets poses its own challenges, as common classifiers assume a balanced class distribution in the data. Sampling techniques overcome the imbalance by modifying the class distribution of the examples. Unfortunately, selecting a sampling technique together with its parameters is still an open problem. Current solutions include the brute-force approach (try as many techniques as possible) and the random search approach (choose the most appropriate from a random subset of techniques). In this work, we propose a new method to select sampling techniques for imbalanced data sets. It uses meta-learning and works by recommending a technique for an imbalanced data set based on solutions to previous problems. Our experiments compared the proposed method against the brute-force approach, all techniques with their default parameters, and the random search approach. The results show that the proposed method is comparable to the brute-force approach, outperforms the techniques with their default parameters most of the time, and always surpasses the random search approach.
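The recommendation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual method: the meta-features (log data set size, number of features, imbalance ratio), the nearest-neighbour vote, the function names, and the knowledge base entries are all hypothetical and chosen only to show how a technique might be recommended from solutions to previous problems.

```python
import math
from collections import Counter


def meta_features(labels, n_features):
    """Describe a data set with three hypothetical meta-features."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    # (log10 of the number of examples, dimensionality, imbalance ratio)
    return (math.log10(len(labels)), float(n_features), majority / minority)


def recommend(query, knowledge_base, k=3):
    """Return the technique that won most often among the k most similar past data sets."""
    # Plain Euclidean distance on unscaled meta-features; a real system
    # would likely normalise each meta-feature first.
    nearest = sorted(knowledge_base, key=lambda entry: math.dist(entry[0], query))[:k]
    votes = Counter(technique for _, technique in nearest)
    return votes.most_common(1)[0][0]


# Made-up knowledge base: (meta-features, best technique found on that data set).
knowledge_base = [
    ((3.0, 10.0, 9.0), "SMOTE"),
    ((3.1, 12.0, 8.5), "SMOTE"),
    ((2.0, 5.0, 2.0), "random_undersampling"),
    ((4.0, 50.0, 30.0), "random_oversampling"),
]

# A new imbalanced data set: 900 majority and 100 minority examples, 11 features.
labels = [0] * 900 + [1] * 100
query = meta_features(labels, n_features=11)
print(recommend(query, knowledge_base, k=3))
```

In this sketch the cost of a recommendation is a single lookup over past results, in contrast to the brute-force approach, which would evaluate every technique on the new data set.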
Date of Conference: 09-12 October 2016
Date Added to IEEE Xplore: 02 February 2017