By Topic

Predicting the location and number of faults in large software systems

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Ostrand, T.J. ; AT&T Labs., Florham Park, NJ, USA ; Weyuker, E.J. ; Bell, R.M.

Advance knowledge of which files in the next release of a large software system are most likely to contain the largest numbers of faults can be a very valuable asset. To accomplish this, a negative binomial regression model has been developed and used to predict the expected number of faults in each file of the next release of a system. The predictions are based on the code of the file in the current release, and fault and modification history of the file from previous releases. The model has been applied to two large industrial systems, one with a history of 17 consecutive quarterly releases over 4 years, and the other with nine releases over 2 years. The predictions were quite accurate: for each release of the two systems, the 20 percent of the files with the highest predicted number of faults contained between 71 percent and 92 percent of the faults that were actually detected, with the overall average being 83 percent. The same model was also used to predict which files of the first system were likely to have the highest fault densities (faults per KLOC). In this case, the 20 percent of the files with the highest predicted fault densities contained an average of 62 percent of the system's detected faults. However, the identified files contained a much smaller percentage of the code mass than the files selected to maximize the numbers of faults. The model was also used to make predictions from a much smaller input set that only contained fault data from integration testing and later. The prediction was again very accurate, identifying files that contained from 71 percent to 93 percent of the faults, with the average being 84 percent. Finally, a highly simplified version of the predictor selected files containing, on average, 73 percent and 74 percent of the faults for the two systems.

Published in:

Software Engineering, IEEE Transactions on  (Volume:31 ,  Issue: 4 )