Skip to Main Content
This paper describes a study performed in an industrial setting that attempts to build predictive models to identify parts of a Java system with a high fault probability. The system under consideration is constantly evolving as several releases a year are shipped to customers. Developers usually have limited resources for their testing and inspections and would like to be able to devote extra resources to faulty system parts. The main research focus of this paper is two-fold: (1) use and compare many data mining and machine learning techniques to build fault-proneness models based mostly on source code measures and change/fault history data, and (2) demonstrate that the usual classification evaluation criteria based on confusion matrices may not be fully appropriate to compare and evaluate models.