Skip to Main Content
This paper investigates two essential questions related to data-driven, software cost modeling: (1) What modeling techniques are likely to yield more accurate results when using typical software development cost data? and (2) What are the benefits and drawbacks of using organization-specific data as compared to multi-organization databases? The former question is important in guiding software cost analysts in their choice of the right type of modeling technique, if at all possible. In order to address this issue, we assess and compare a selection of common cost modeling techniques fulfilling a number of important criteria using a large multi-organizational database in the business application domain. Namely, these are: ordinary least squares regression, stepwise ANOVA, CART, and analogy. The latter question is important in order to assess the feasibility of using multi-organization cost databases to build cost models and the benefits gained from local, company-specific data collection and modeling. As a large subset of the data in the multi-company database came from one organization, we were able to investigate this issue by comparing organization-specific models with models based on multi-organization data. Results show that the performances of the modeling techniques considered were not significantly different, with the exception of the analogy-based models which appear to be less accurate. Surprisingly, when using standard cost factors (e.g., COCOMO-like factors, Function Points), organization specific models did not yield better results than generic, multi-organization models.