In this paper, we introduce a semisupervised approach for clustering and aggregating relational data (SS-CARD). We assume that the data is available in relational form, that is, only the degrees to which pairs of objects in the dataset are related are known. Moreover, we assume that the relational information is represented by multiple dissimilarity matrices. These matrices could have been generated using different features, different mappings, or even different sensors. SS-CARD is designed to simultaneously aggregate pairwise distances from multiple relational matrices, partition the data into clusters, and learn a relevance weight for each matrix in each cluster. These weights have two main advantages. First, they help in partitioning the data into more meaningful clusters. Second, they can be used as part of a more complex learning system to enhance its learning behavior. SS-CARD uses partial supervision information that consists of a small set of constraints specifying which instances should (should-link) or should not (should-not-link) reside in the same cluster. This additional information can guide the algorithm in learning the optimal relevance weights and in generating a better partition. The performance of the proposed algorithm is illustrated by using it in two different applications. The first one consists of categorizing the discrete nominal-valued mushroom data. The second application consists of categorizing a collection of images where each image is represented by several continuous features. For both applications, we represent the pairwise dissimilarities by multiple relational matrices extracted from different feature sets. The results are compared with those obtained by three traditional relational clustering methods. We show that the partial supervision information and the learned aggregation weights can improve the results significantly.
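To make the two ingredients described above more concrete, the following is a minimal sketch, not the SS-CARD algorithm itself. It assumes that aggregation within one cluster is a weighted sum of the dissimilarity matrices, and that supervision is given as lists of should-link and should-not-link index pairs. The function names, the toy matrices, and the weight values are all hypothetical and chosen only for illustration; the abstract does not specify the objective function or how the weights are learned.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]


def aggregate_dissimilarity(matrices: Sequence[Matrix],
                            weights: Sequence[float]) -> Matrix:
    """Weighted sum of several n x n dissimilarity matrices for ONE cluster.

    Assumption: one relevance weight per matrix; the exact aggregation
    rule used by SS-CARD is not given in the abstract.
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    n = len(matrices[0])
    aggregated = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        for j in range(n):
            for k in range(n):
                aggregated[j][k] += weight * matrix[j][k]
    return aggregated


def count_violations(labels: Sequence[int],
                     should_link: Sequence[Pair],
                     should_not_link: Sequence[Pair]) -> int:
    """Count pairwise constraints that a hard partition fails to satisfy."""
    violated = sum(1 for a, b in should_link if labels[a] != labels[b])
    violated += sum(1 for a, b in should_not_link if labels[a] == labels[b])
    return violated


# Hypothetical toy data: 4 objects described by two feature sets.
# The first matrix separates {0, 1} from {2, 3}; the second is uninformative.
r_first = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]
r_second = [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 2], [2, 2, 2, 0]]

# Hypothetical relevance weights for one cluster, favoring the first matrix.
cluster_weights = [0.75, 0.25]
aggregated = aggregate_dissimilarity([r_first, r_second], cluster_weights)

# Hypothetical partial supervision: objects 0 and 1 belong together,
# objects 1 and 2 do not.
should_link = [(0, 1)]
should_not_link = [(1, 2)]
good_partition = [0, 0, 1, 1]
bad_partition = [0, 1, 1, 1]
```

In this toy setting, `count_violations(good_partition, should_link, should_not_link)` finds no violated constraints, while `bad_partition` violates both. A semisupervised method could use such a count (or a soft version of it) as a penalty while adjusting the per-cluster weights, which is the general role the abstract attributes to the supervision information.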