In recent years, there has been increasing interest in clustering as a technique for recovering the architecture of software systems. The efficacy of clustering depends not only on the clustering algorithm, but also on the choice of entities, features, and similarity measures used during clustering. It is also important to understand the characteristics of the domain in which clustering is applied, since the performance of different measures and algorithms may vary with these characteristics. In the software domain, the Jaccard similarity measure gives better results than other similarity measures for binary features. In this paper, we highlight cases where the Jaccard measure may fail to capture the similarity between entities appropriately. We propose a new similarity measure that overcomes these deficiencies. Our experimental results indicate that the new measure performs better for software systems exhibiting the defined characteristics.
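For readers unfamiliar with the Jaccard measure on binary features, the following minimal sketch shows the standard definition: the number of features present in both entities, divided by the number of features present in at least one of them. The function name `jaccard` and the example feature vectors are hypothetical and chosen only for illustration. The new measure proposed in the paper is not defined in this abstract, so it is not shown.

```python
def jaccard(x, y):
    """Standard Jaccard similarity of two equal-length binary feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    # a: features present in both entities (1-1 matches)
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    # b + c: features present in exactly one of the two entities
    mismatches = sum(1 for xi, yi in zip(x, y) if bool(xi) != bool(yi))
    denom = a + mismatches
    # Features absent from both entities (0-0 matches) are ignored entirely.
    # Returning 0.0 when neither entity has any feature is a convention
    # assumed here, not something stated in the abstract.
    return a / denom if denom else 0.0


# Hypothetical entities (e.g. source files) described by six binary features
# (e.g. whether each file uses a given function or global variable).
e1 = [1, 1, 0, 0, 1, 0]
e2 = [1, 0, 0, 0, 1, 1]

print(jaccard(e1, e2))  # 2 shared features / 4 features in either -> 0.5
```

Because the measure depends only on shared and mismatched features, two entities with no features in common score 0 regardless of how many features they both lack.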