Skip to Main Content
Software end-users are the best testers, who keep revealing bugs in software that has undergone rigorous in-house testing. In order to leverage their testing efforts, failure reporting components have been widely deployed in released software. Many utilities of the collected failure data depend on an effective failure indexing technique, which, at the optimal case, would index all failures due to the same bug together. Unfortunately, the problem of failure proximity, which underpins the effectiveness of an indexing technique, has not been systematically studied. This article presents the first systematic study of failure proximity. A failure proximity consists of two components: a fingerprinting function that extracts signatures from failures, and a distance function that calculates the likelihood of two failures being due to the same bug. By considering different instantiations of the two functions, we study an array of six failure proximities (two of them are new) in this article. These proximities range from the simplest approach that checks failure points to the most sophisticated approach that utilizes fault localization algorithms to extract failure signatures. Besides presenting technical details of each proximity, we also study the properties of each proximity and tradeoffs between proximities. These altogether deliver a systematic view of failure proximity.