Software projects often use bug-tracking tools to keep track of reported bugs and to provide a communication platform for discussing possible solutions or ways to reproduce failures. The goal is to reduce the testing effort for the development team. Often, however, multiple bug reports are submitted for the same bug, which, if not recognized as duplicates, can cause the development team to do the same work several times. Recognizing duplicates is, in turn, tedious, because it requires examining large numbers of bug reports. Previous work addresses this problem by employing natural-language processing and text-similarity measures to automate bug-report duplicate detection. The downside of these techniques is that, to be applicable, they require the reporting user to go through the time-consuming process of describing the problem, only to be informed that the bug is already known. To address this problem, we propose an approach that uses only stack traces and their structure as input to machine-learning algorithms for detecting bug-report duplicates. The key advantage is that stack traces are available without a written bug report. Experiments on bug reports from the Eclipse project show that our approach performs as well as state-of-the-art techniques, but without requiring the whole text corpus of a bug report to be available.
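To make the idea of using stack traces and their structure as machine-learning input more concrete, the following is a minimal sketch and not the feature set used in this work. It assumes Java-style stack traces, and the names `parse_frames` and `pair_features` as well as the three features are hypothetical choices made for illustration only.

```python
import re

# Assumption: Java-style frames such as "at org.example.Foo.bar(Foo.java:42)".
FRAME_RE = re.compile(r"^\s*at\s+([\w$.<>]+)\(", re.MULTILINE)


def parse_frames(trace: str) -> list[str]:
    """Return the fully qualified method names, topmost frame first."""
    return FRAME_RE.findall(trace)


def pair_features(trace_a: str, trace_b: str) -> dict[str, float]:
    """Illustrative structural similarity features for two stack traces."""
    a, b = parse_frames(trace_a), parse_frames(trace_b)

    # Order-insensitive overlap of the methods that appear in both traces.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Order-sensitive overlap: identical frames counted from the top down.
    prefix = 0
    for frame_a, frame_b in zip(a, b):
        if frame_a != frame_b:
            break
        prefix += 1
    shortest = min(len(a), len(b))

    return {
        "frame_jaccard": jaccard,
        "top_prefix_ratio": prefix / shortest if shortest else 0.0,
        "same_top_frame": float(bool(a) and bool(b) and a[0] == b[0]),
    }
```

Feature dictionaries of this kind, computed for pairs of reports that are labeled as duplicates or non-duplicates, could then serve as training input for an off-the-shelf classifier. Line numbers are deliberately ignored in this sketch so that traces from slightly different versions of the same code can still match.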