Abstract:
Background: The SZZ algorithm was proposed to identify bug-introducing changes, i.e., changes that are likely to induce bugs. Previous studies improved its implementation and evaluated its results. Aims: We aim to address existing limitations of SZZ and thereby improve the maturity of the algorithm. We also aim to verify whether the improvements that have been proposed for the SZZ algorithm hold in different datasets. Method: We re-evaluate two recent SZZ implementations using an adaptation of the Defects4J dataset, which serves as a preprocessed dataset that SZZ can use as input. Furthermore, we revisit the limitations of RA-SZZ (refactoring-aware SZZ) to improve the precision and recall of the algorithm. Results: We observe that a median of 44% of the lines that are flagged by the improved SZZ are very likely to introduce a bug. We manually analyze the SZZ-generated data and observe that there exist refactoring operations (31.17%) and equivalent changes (13.64%) that are still misidentified by the improved SZZ. Conclusion: By preprocessing the dataset that is used as input by SZZ, the accuracy of SZZ may be considerably improved. For example, we observe that SZZ implementations are approximately 40% more accurate if only valid bug-fix lines are used as the input for SZZ.
Published in: 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
Date of Conference: 19-20 September 2019
Date Added to IEEE Xplore: 17 October 2019