The performance of SVM is greatly limited when it is applied to imbalanced datasets, in which the classification categories are not approximately equally represented. Real-world datasets are often composed of "normal" examples with only a small percentage of "abnormal" examples. Under-sampling the majority class and over-sampling the minority class are two obvious ways to balance a dataset before training. The SMOTE algorithm is a simple and effective over-sampling technique, but it ignores data distribution and density information, which is important for synthesizing minority examples, and it cannot effectively eliminate the influence of noise. A novel over-sampling algorithm, SMOBD, is proposed and shows better performance in our experiments. We also combine this algorithm with SVM using different error costs. We compare the performance of our algorithm against regular SVM, SMOTE, SMOTE-ENN, and SDC (SMOTE with different costs in SVM); the experimental results show that our algorithm outperforms all of them.
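The SMOTE interpolation step referred to above can be sketched as follows. This is a minimal illustration of the general technique, not the paper's SMOBD algorithm; the function and variable names are ours:

```python
import random

def smote_sample(minority, k=5, n_new=100, seed=0):
    """Generate synthetic minority examples by interpolating between a
    randomly chosen sample and one of its k nearest minority-class
    neighbours (the core idea of SMOTE; names here are illustrative)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (squared Euclidean)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

# Toy 2-D minority class; synthetic points land on segments between members.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority, k=3, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority examples, SMOTE places new points regardless of local density or nearby noise, which is exactly the limitation that density-aware variants such as the proposed SMOBD aim to address.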