Skip to Main Content
Mutation testing measures the adequacy of a test suite by seeding artificial defects (mutations) into a program. If a test suite fails to detect a mutation, it may also fail to detect real defects-and hence should be improved. However, there also are mutations which keep the program semantics unchanged and thus cannot be detected by any test suite. Such equivalent mutants must be weeded out manually, which is a tedious task. In this paper, we examine whether changes in coverage can be used to detect non-equivalent mutants: If a mutant changes the coverage of a run, it is more likely to be non-equivalent. In a sample of 140 manually classified mutations of seven Java programs with 5,000 to 100,000 lines of code, we found that: (a) the problem is serious and widespread-about 45% of all undetected mutants turned out to be equivalent; (b) manual classification takes time-about 15 minutes per mutation; (c) coverage is a simple, efficient, and effective means to identify equivalent mutants-with a classification precision of 75% and a recall of 56%; and (d) coverage as an equivalence detector is superior to the state of the art, in particular violations of dynamic invariants. Our detectors have been released as part of the open source JAVALANCHE framework; the data set is publicly available for replication and extension of experiments.