This paper presents an experimental study of the impact of low class prevalence on the performance of neural-network-based classifiers, as measured by receiver operating characteristic (ROC) analysis. Two methods of dealing with the problem, oversampling and undersampling, are investigated while varying the class prevalence and the size of the training datasets, with both uncorrelated and correlated features. The results show that class imbalance can significantly decrease classifier performance, especially when the training datasets are small. Furthermore, oversampling is shown to be more effective than undersampling in compensating for class imbalance. Statistically significant differences, however, are observed only in cases with a large total number of samples and very low prevalence.
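The abstract does not say exactly how the training sets were resampled. The sketch below assumes the simplest variants. Random oversampling duplicates randomly chosen minority-class samples until the classes are balanced 1:1. Random undersampling randomly discards majority-class samples down to the minority count. It also computes the ROC area under the curve (AUC) via the Mann-Whitney statistic, which is equivalent to the area under the empirical ROC curve. The function names `random_oversample`, `random_undersample` and `roc_auc` are hypothetical helpers for illustration, not the authors' code.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Assumed scheme: duplicate randomly chosen samples of each smaller class
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):  # zero iterations for the largest class
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Assumed scheme: keep a random subset of each class, of the size
    of the smallest class, and discard the rest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sampling without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy training set with prevalence 0.2: 2 positives, 8 negatives.
    X = [[float(i)] for i in range(10)]
    y = [1, 1] + [0] * 8
    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over))   # 8 samples of each class
    print(Counter(y_under))  # 2 samples of each class
    print(roc_auc([0.9, 0.8, 0.3, 0.7], [1, 0, 0, 1]))
```

Oversampling keeps every original sample but repeats minority examples. Undersampling discards majority examples, which shrinks an already small training set. In either case, resampling is applied only to the training data. The ROC evaluation is carried out on data with the original prevalence.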