A methodology for clustering multi-relational data is proposed. First, tuple linkages in the database schema are leveraged to virtually organize the available relational data into transactions, i.e., sets of feature-value pairs, one for each multi-relational entity. The resulting transactions are then partitioned into homogeneous groups. Each discovered cluster is equipped with a representative, which explains the corresponding group of transactions in terms of the feature-value pairs most likely to appear in a transaction belonging to that group. Outlier data are placed into a trash cluster, which is finally partitioned to mitigate the dissimilarity between the trash cluster and the previously generated clusters.
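The pipeline outlined above can be illustrated with a minimal sketch. It is not the proposed algorithm: the two-table schema (`customers`, `orders`), the greedy leader-style assignment, the Jaccard similarity, and the `threshold` and `min_support` parameters are all assumptions introduced for illustration, and the final partitioning of the trash cluster is left out.

```python
from collections import Counter


def build_transactions(customers, orders):
    """Flatten each customer tuple and its linked order tuples into one transaction."""
    transactions = {}
    for cid, attrs in customers.items():
        items = {(f"customer.{k}", v) for k, v in attrs.items()}
        for order in orders:
            # Follow the (hypothetical) foreign key orders.customer_id -> customers.
            if order["customer_id"] == cid:
                items |= {(f"order.{k}", v)
                          for k, v in order.items() if k != "customer_id"}
        transactions[cid] = frozenset(items)
    return transactions


def jaccard(a, b):
    """Jaccard similarity between two sets of feature-value pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def representative(members, min_support=0.5):
    """Feature-value pairs occurring in at least min_support of the transactions."""
    counts = Counter(item for t in members for item in t)
    return frozenset(i for i, c in counts.items()
                     if c / len(members) >= min_support)


def cluster(transactions, threshold=0.4):
    """Greedy assignment (an assumed stand-in for the partitioning step)."""
    clusters = []
    for tid, t in transactions.items():
        best, best_sim = None, threshold
        for members in clusters:
            rep = representative([transactions[m] for m in members])
            sim = jaccard(t, rep)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([tid])  # no cluster is similar enough: start a new one
        else:
            best.append(tid)
    # Singleton clusters are treated as outliers and collected in the trash cluster.
    groups = [c for c in clusters if len(c) > 1]
    trash = [tid for c in clusters if len(c) == 1 for tid in c]
    return groups, trash


# Toy data, invented for this example only.
customers = {1: {"city": "Rome"}, 2: {"city": "Rome"}, 3: {"city": "Oslo"}}
orders = [
    {"customer_id": 1, "product": "book"},
    {"customer_id": 2, "product": "book"},
    {"customer_id": 3, "product": "bike"},
]

transactions = build_transactions(customers, orders)
groups, trash = cluster(transactions)
reps = [representative([transactions[m] for m in g]) for g in groups]
```

On this toy data the two Rome customers who bought a book end up in one cluster, whose representative lists exactly those two feature-value pairs, while the remaining customer is placed in the trash cluster.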