Skip to Main Content
Current software quality estimation models often involve using supervised learning methods to train a software quality classifier or a software fault prediction model. In such models, the dependent variable is a software quality measurement indicating the quality of a software module by either a risk-based class membership (e.g., whether it is fault-prone or not fault-prone) or the number of faults. In reality, such a measurement may be inaccurate, or even unavailable. In such situations, this paper advocates the use of unsupervised learning (i.e., clustering) techniques to build a software quality estimation system, with the help of a software engineering human expert. The system first clusters hundreds of software modules into a small number of coherent groups and presents the representative of each group to a software quality expert, who labels each cluster as either fault-prone or not fault-prone based on his domain knowledge as well as some data statistics (without any knowledge of the dependent variable, i.e., the software quality measurement). Our preliminary empirical results show promising potentials of this methodology in both predicting software quality and detecting potential noise in a software measurement and quality dataset.