Using simulated data sets to compare data analysis techniques used for software cost modelling

3 Author(s)
Pickard, L. ; Dept. of Comput. Sci., Keele Univ., UK ; Kitchenham, B. ; Linkman, S.J.

The goals of the study presented were to compare different data analysis methods and to demonstrate the viability of simulation as a mechanism to allow such comparisons. Simulation was used to create data sets with a known underlying model and with non-Normal characteristics that are frequently found in software data sets: skewness, unstable variance, outliers, and combinations of these characteristics. Three data analysis approaches were investigated: residual analysis, multiple regression, and classification and regression trees (CART). In addition to the standard statistical 'least squares' version of each method, robust and non-parametric versions of the techniques were also investigated. It was found that standard multiple regression techniques were best if the data only exhibited moderate non-Normality. As might be expected, under more extreme conditions such as severe heteroscedasticity, the non-parametric techniques performed best. It was more surprising to find that under strongly non-Normal conditions the robust and non-parametric residual analysis techniques performed as well as the conventional robust and non-parametric versions of multiple regression. However, the most important result of the study is to demonstrate the value of simulation as a technique for evaluating different data analysis techniques under controlled conditions.
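The general idea of simulation-based comparison can be illustrated with a minimal Python sketch. It is not the authors' experimental design: the linear model, the noise distribution, the outlier mechanism, and all parameter values are assumptions chosen for illustration, and the Theil-Sen estimator stands in for "a non-parametric technique" in general. The sketch generates data from a known slope with skewed errors, variance that grows with the predictor, and occasional outliers, then measures how far a least-squares fit and a non-parametric fit land from the known slope.

```python
import random
import statistics


def simulate(n, intercept, slope, rng, outlier_rate=0.1):
    """Generate (x, y) pairs from a known linear model with non-Normal errors.

    All distributions and constants here are illustrative assumptions.
    """
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        # Skewed, zero-mean noise whose spread grows with x (unstable variance).
        noise = (rng.expovariate(1.0) - 1.0) * 0.3 * x
        y = intercept + slope * x + noise
        # Occasionally shift y upward to mimic an outlying observation.
        if rng.random() < outlier_rate:
            y += 30.0
        xs.append(x)
        ys.append(y)
    return xs, ys


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def theil_sen_slope(xs, ys):
    """Non-parametric slope: the median of all pairwise slopes."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    return statistics.median(slopes)


def compare(trials=200, n=30, true_slope=2.0, seed=1):
    """Mean absolute slope error of each estimator over repeated simulations."""
    rng = random.Random(seed)
    ls_err, ts_err = [], []
    for _ in range(trials):
        xs, ys = simulate(n, 5.0, true_slope, rng)
        ls_err.append(abs(least_squares_slope(xs, ys) - true_slope))
        ts_err.append(abs(theil_sen_slope(xs, ys) - true_slope))
    return statistics.fmean(ls_err), statistics.fmean(ts_err)


if __name__ == "__main__":
    ls, ts = compare()
    print(f"least squares mean abs error: {ls:.3f}")
    print(f"Theil-Sen mean abs error:     {ts:.3f}")
```

Because the true slope is fixed in advance, the error of each estimator can be measured directly, which is what makes a controlled comparison possible. Varying the hypothetical `outlier_rate` or the noise scaling changes how severe the departure from Normality is.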

Published in:

Software, IEE Proceedings (Volume: 148, Issue: 6)