Skip to Main Content
Cluster analysis is concerned with the problem of partitioning a given set of entities into homogeneous and well-separated subsets called clusters. The concepts of homogeneity and of separation can be made precise when a measure of dissimilarity between the entities is given. Let us define the diameter of a partition of the given set of entities into clusters as the maximum dissimilarity between any pair of entities in the same cluster and the split of a partition as the minimum dissimilarity between entities in different clusters. The problems of determining a partition into a given number of clusters with minimum diameter (i.e., a partition of maximum homogeneity) or with maximum split (i.e., a partition of maximum separation) are first considered. It is shown that the latter problem can be solved by the classical single-link clustering algorithm, while the former can be solved by a graph-theoretic algorithm involving the optimal coloration of a sequence of partial graphs, described in more detail in a previous paper. A partition into a given number of clusters will be called efficient if and only if there exists no partition into at most the same number of clusters with smaller diameter and not smaller split or with larger split and not larger diameter. Two efficient partitions are called equivalent if and only if they have the same values for the split and for the diameter.