When clustering produces more than one candidate partition of a finite set of objects O, there are two approaches to validation (i.e., selection of a “best” partition and, implicitly, a best value for c, the number of clusters in O). First, we may use an internal index, which evaluates each partition separately. Second, we may compare pairs of candidates with each other, or with a reference partition that purports to represent the “true” cluster structure in the objects. This paper generalizes many of the classical indices that have been used with outputs of crisp clustering algorithms so that they are applicable to candidate partitions of any type (i.e., crisp or soft, with soft comprising the fuzzy, probabilistic, and possibilistic cases). Space prevents inclusion of all of the possible generalizations that can be realized this way. Here, we concentrate on the Rand index and its modifications. We compare our fuzzy-Rand index with those of Campello, Hullermeier and Rifqi, and Brouwer, and show that our extension of the Rand index is O(n), while the other three are all O(n²). Numerical examples are given to illustrate various facets of the new indices. In particular, we show that our indices can be used even when the partitions are probabilistic or possibilistic, and that our method of generalization is valid for any index that depends only on the entries of the classical (i.e., four-pair types) contingency table for this problem.
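
To make the role of the four pair types concrete, the following is a minimal sketch, not the paper's own implementation. It assumes each partition is given as a membership matrix with one row per cluster and one column per object, forms the contingency matrix as N = U Vᵀ, and applies the standard crisp formulas for the pair counts a, b, c, d to N. The function names (`contingency`, `pair_counts`, `rand_index`) are hypothetical, and treating the sum of the entries of N as the effective number of objects is an assumption of this sketch; the normalization the paper uses for soft partitions may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def contingency(U: Matrix, V: Matrix) -> Matrix:
    """Return N = U V^T for membership matrices U (r x n) and V (c x n)."""
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V]
            for u_row in U]


def pair_counts(N: Matrix) -> Tuple[float, float, float, float]:
    """Four pair-type counts computed from a contingency matrix.

    a: pairs together in both partitions
    b: pairs together in the first partition only
    c: pairs together in the second partition only
    d: pairs separated in both partitions
    """
    n = sum(sum(row) for row in N)                   # assumed effective object count
    sq = sum(x * x for row in N for x in row)        # sum of squared entries
    rows = sum(sum(row) ** 2 for row in N)           # sum of squared row totals
    cols = sum(sum(col) ** 2 for col in zip(*N))     # sum of squared column totals
    a = 0.5 * (sq - n)
    b = 0.5 * (rows - sq)
    c = 0.5 * (cols - sq)
    d = 0.5 * (n * n + sq - rows - cols)
    return a, b, c, d


def rand_index(U: Matrix, V: Matrix) -> float:
    """Rand index (a + d) / (a + b + c + d) from membership matrices."""
    a, b, c, d = pair_counts(contingency(U, V))
    return (a + d) / (a + b + c + d)


if __name__ == "__main__":
    # Two crisp partitions of four objects, written as 0/1 membership matrices.
    U = [[1, 1, 0, 0], [0, 0, 1, 1]]
    V = [[1, 0, 1, 0], [0, 1, 0, 1]]
    print(rand_index(U, U))  # identical partitions
    print(rand_index(U, V))

    # A fuzzy partition (columns sum to 1) can be passed in the same way.
    F = [[0.9, 0.8, 0.3, 0.1], [0.1, 0.2, 0.7, 0.9]]
    print(rand_index(U, F))
```

For crisp inputs this reduces to the classical Rand index. In this sketch the only step that touches all n objects is the product U Vᵀ, which costs on the order of n·r·c operations, so the work grows linearly in n when the numbers of clusters r and c are fixed; no explicit loop over object pairs is needed.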