Skip to Main Content
The existing fault-proneness prediction methods are based on unsampling and the training dataset does not contain the information on the number of faults of each module and the fault distributions among these modules. In this paper, we propose an oversampling method using the number of faults to improve fault-proneness prediction. Our method uses the information on the number of faults in the training dataset to support better prediction of fault-proneness. Our test illustrates that the difference between the predictions of oversampling and unsampling is statistically significant and our method can improve the prediction of two probability models, i.e. logistic regression and naive Bayes with kernel estimators.