Learning from imbalanced data has attracted great interest in the machine learning community because the problem arises in many practical applications and affects the reliability of learning algorithms. A dataset is imbalanced if the number of observations differs greatly between classes. Classification methods that do not account for this phenomenon are prone to produce decision boundaries strongly biased towards the majority class. Today, ensemble methods such as DataBoost-IM combine sampling strategies with Boosting and oversampling methods. However, when the input data are very noisy, the performance of these algorithms tends to degrade. This work presents a new method for dealing with imbalanced data, called SwarmBoost, which combines Boosting, oversampling, and subsampling based on optimization criteria to select samples. The results show that SwarmBoost performs better than DataBoost-IM and SMOTE on several databases.
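To make the notions of class imbalance and oversampling concrete, the following is a minimal sketch of SMOTE-style interpolation on toy data. It is not the SwarmBoost method, whose details are not given here; the function names `imbalance_ratio` and `smote_like_oversample`, the parameter `k`, and the toy points are all assumptions introduced for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE-style).
    Assumes the minority class has at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 8 majority points (label 0) and 3 minority points (label 1).
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]
labels = [0] * len(majority) + [1] * len(minority)

ratio_before = imbalance_ratio(labels)
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
labels_after = labels + [1] * len(new_points)
ratio_after = imbalance_ratio(labels_after)
```

In this sketch the synthetic points always lie on segments between existing minority points, which is also why noisy minority samples can propagate into the oversampled set.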