Many important applications in biology rest on relational datasets, in which only the (dis)similarity between biological objects (amino acid sequences, gene expression profiles, etc.) is known, not their feature values in some feature space. Examples of such relational datasets are the gene similarity matrices obtained from BLAST, gene expression data, or gene ontology (GO) similarity measures. Once a relational dataset is obtained, a common question is how many groups of objects it contains. The answer is usually obtained by combining a clustering algorithm with a cluster validity measure. In this article we describe a cluster validity measure for non-Euclidean relational fuzzy c-means that is based on the correlation between a relation induced on the data by the cluster memberships and the original relational data. The measure can be applied to partitions produced by any fuzzy relational clustering algorithm. We illustrate it by validating clusters in several dissimilarity matrices for a set of 194 gene products, computed from BLAST and GO similarities.
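The idea of correlating a membership-induced relation with the original relational data can be illustrated with a minimal sketch. The specific choices here are assumptions made for illustration and are not taken from the article: the induced similarity is the inner product of membership columns, it is turned into a dissimilarity by scaling with its maximum, and Pearson correlation is computed over the object pairs i < j. The function names and the toy 4-object matrix are hypothetical.

```python
from itertools import combinations
from math import sqrt


def induced_dissimilarity(U):
    """Dissimilarity induced by a fuzzy partition.

    U[k][i] is the membership of object i in cluster k.
    Assumption: similarity is the inner product of membership columns,
    rescaled by its maximum and subtracted from 1.
    """
    c, n = len(U), len(U[0])
    S = [[sum(U[k][i] * U[k][j] for k in range(c)) for j in range(n)]
         for i in range(n)]
    s_max = max(max(row) for row in S)
    return [[1.0 - S[i][j] / s_max for j in range(n)] for i in range(n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_validity(U, D):
    """Correlate the induced dissimilarity with D over all pairs i < j."""
    R = induced_dissimilarity(U)
    pairs = list(combinations(range(len(D)), 2))
    return pearson([R[i][j] for i, j in pairs],
                   [D[i][j] for i, j in pairs])


# Toy dissimilarity matrix (hypothetical): objects {0, 1} and {2, 3} are close.
D = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.10],
    [0.80, 0.90, 0.10, 0.00],
]
# A partition that matches the structure of D, and one that does not.
U_good = [[0.9, 0.9, 0.1, 0.1],
          [0.1, 0.1, 0.9, 0.9]]
U_bad = [[0.9, 0.1, 0.9, 0.1],
         [0.1, 0.9, 0.1, 0.9]]

score_good = correlation_validity(U_good, D)
score_bad = correlation_validity(U_bad, D)
print(score_good, score_bad)
```

Under these assumptions, a partition whose memberships agree with the structure of the dissimilarity matrix scores higher than one that cuts across it, so the score could be compared across candidate numbers of clusters. A rank correlation could be swapped in for Pearson without changing the rest of the sketch.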