A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction

Abstract:

Context: Software defect prediction (SDP) is an important challenge in software engineering, and much research has addressed it, most notably through machine learning algorithms. However, class imbalance, typified by few defective components and many non-defective ones, is common and causes difficulties for these methods. Imbalanced learning aims to deal with this problem and has recently been deployed by some researchers, unfortunately with inconsistent results.
Objective: We conduct a comprehensive experiment to explore (a) the basic characteristics of this problem; and (b) the effect of imbalanced learning and its interactions with (i) data imbalance, (ii) type of classifier, (iii) input metrics and (iv) imbalanced learning method.
Method: We systematically evaluate 27 data sets, 7 classifiers, 7 types of input metrics and 17 imbalanced learning methods (including doing nothing) using an experimental design that enables exploration of interactions between these factors and individual imbalanced learning algorithms. This yields 27 × 7 × 7 × 17 = 22491 results. The Matthews correlation coefficient (MCC) is used as an unbiased performance measure (unlike the more widely used F1 and AUC measures).
Results: (a) A large majority (87 percent) of 106 public domain data sets exhibit moderate or low levels of imbalance (imbalance ratio < 10; median = 3.94); (b) anything other than low levels of imbalance clearly harms the performance of traditional learning for SDP; (c) imbalanced learning is more effective on data sets with moderate or higher imbalance, although negative results are always possible; (d) the type of classifier has the most impact on the improvement in classification performance, followed by the imbalanced learning method itself, while the type of input metrics is not influential; (e) only 52 percent of the combinations of imbalanced learner and classifier have a significant positive effect.
Conclusion: This paper offers two practical ...
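The quantities the abstract relies on can be made concrete with a small sketch. It assumes the imbalance ratio is the number of non-defective instances divided by the number of defective ones, and it uses the standard MCC formula, returning 0 when a marginal total is zero (a common convention, assumed here). The toy data set and the "flag everything as defective" predictor are hypothetical and not taken from the paper; they only illustrate one simple reason to prefer MCC: F1 never uses true negatives, so it can reward a trivial predictor that MCC scores as no better than chance.

```python
import math


def imbalance_ratio(labels):
    """Non-defective count divided by defective count (assumes 1 = defective)."""
    defective = sum(1 for y in labels if y == 1)
    if defective == 0:
        raise ValueError("data set has no defective instances")
    return (len(labels) - defective) / defective


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating 1 (defective) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp, fp, fn):
    """F1 for the defective class; note that true negatives never appear."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical toy data set: 10 defective and 30 non-defective modules.
y_true = [1] * 10 + [0] * 30
# Hypothetical trivial predictor that flags every module as defective.
y_all_defective = [1] * len(y_true)

tp, tn, fp, fn = confusion_counts(y_true, y_all_defective)
print(imbalance_ratio(y_true))    # 3.0
print(f1(tp, fp, fn))             # 0.4
print(mcc(tp, tn, fp, fn))        # 0.0

# Size of the full factorial design stated in the abstract.
print(math.prod([27, 7, 7, 17]))  # 22491
```

The trivial predictor earns F1 = 0.4 on this data set, whose imbalance ratio of 3.0 is close to the reported median of 3.94, while its MCC is 0, matching the intuition that it has learned nothing.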

Published in: IEEE Transactions on Software Engineering (Volume: 45, Issue: 12, 01 December 2019)

Page(s): 1253-1269

Date of Publication: 15 May 2018

DOI: 10.1109/TSE.2018.2836442