Skip to Main Content
Evaluation of how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametrical shapes. Prototype-based clustering (finding of clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype-based clusterings, previously proposed indexes-mostly devised for the evaluation of point-based clusterings-usually perform poorly. The poor performance is made worse when the validity indexes are applied to large data sets with complicated cluster structure. In this paper, we propose a new index, Conn_Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn_Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a “connectivity matrix”, which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn_Index outperforms existing validity indices, used in this paper, for the evaluation of prototype-based clustering results.