To have general validity, empirical results must converge. To be credible, an experimental science must understand the limitations of its empirical results and be able to explain disagreements among them. We describe an experiment that replicates previous studies claiming that estimation by analogy outperforms regression models. In the experiment, 68 experienced practitioners each estimated a project from a dataset of 48 industrial COTS projects. We applied two treatments, an analogy tool and a regression model, and used the estimating performance when aided by the historical data as the control. Our results do not converge with previous results. The reasons are that previous studies used other datasets and partially different data analysis methods and, not least, validated the tools in isolation from the tool users. This implies that the results are sensitive to the experimental design: the characteristics of the dataset, the norms for removing outliers and other data points from the original dataset, the test metrics, the significance levels, and the use of human subjects and their level of expertise. Thus, neither our results nor previous results are robust enough to claim any general validity.