Clustering is a prominent data mining task that attempts to partition a set of unlabelled data instances into distinct clusters, such that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Several clustering techniques have been proposed in the literature, including stand-alone as well as ensemble techniques. Most of them lack robustness and share an important drawback: they cannot effectively visualize clustering results to aid knowledge discovery and constructive learning. Recently, clustering techniques based on visualization of the data have been proposed. These rely on building a Self-Organizing Map (SOM), originally proposed by Kohonen. Although the Kohonen SOM preserves the topology of the input data, it is widely observed that the clustering accuracy achieved by SOM is poor. To perform robust and accurate clustering using SOM, this paper proposes a cluster ensemble framework based on input constraints. A cluster ensemble is a set of clustering solutions obtained by individually clustering subsets of the original high-dimensional data. The final consensus matrix derived from the ensemble is fed to a neural network, which transforms the input data into a lower-dimensional output map. The map clearly depicts the distribution of input data instances into clusters.
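To make the notion of a consensus matrix concrete, the following is a minimal sketch of one common construction, the co-association matrix, in which entry (i, j) is the fraction of base clusterings that place instances i and j in the same cluster. It is an illustration under stated assumptions, not the framework proposed in the paper: the abstract does not specify how the subsets are drawn, which base clusterer is used, or how the input constraints enter, so random feature subsets, plain k-means, and the names `kmeans`, `consensus_matrix`, `n_runs`, and `subset_size` are all hypothetical choices made here.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(data, k, n_runs=10, subset_size=2, seed=0):
    """Co-association matrix: fraction of runs in which i and j share a cluster."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        # Assumption: each base clustering sees a random subset of features.
        feats = rng.sample(range(d), subset_size)
        projected = [[row[f] for f in feats] for row in data]
        labels = kmeans(projected, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Toy data: two loosely separated groups of three 3-dimensional instances.
data = [[0, 0, 0], [1, 0, 1], [0, 1, 1],
        [100, 100, 100], [101, 100, 101], [100, 101, 101]]
M = consensus_matrix(data, k=2)
for row in M:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, so each row can be read as a similarity profile of one instance against all others. In a pipeline like the one outlined in the abstract, such rows could serve as the input vectors to a SOM, although how the paper actually feeds the consensus matrix to the network is not stated here.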