Skip to Main Content
Clustering is unsupervised learning where ideally class levels and number of clusters (K) are not known. K-clustering can be categorized as semi-supervised learning where K is known. Here we have considered K-Clustering with simultaneous feature selection. Feature subset selection helps to identify relevant features for clustering, increase understandability, better scalability and improve accuracy. Here we have used two measures, intra-cluster distance (Homogeneity, H) and inter-cluster distances (Separation, S) for clustering. Measures are using mod distance per feature suitable for categorical features (attributes). Rather than combining H and S to frame the problem as single objective optimization problem, we use multi objective genetic algorithm (MOGA) to find out diverse solutions near to Pareto optimal front in the two-dimensional objective space. Each evolved solution represents a set of cluster modes (CMs) build by selected feature subset. Here, K-modes is hybridized with MOGA. We have used hybridized GA to combine global searching powers of GA with local searching powers of K-modes. Considering context sensitivity, we have used a special crossover operator called “pairwise crossover” and “substitution”. The main contribution of this paper is simultaneous dimensionality reduction and optimization of objectives using MOGA. Results on 3 benchmark data sets from UCI Machine Learning Repository containing categorical features shows the superiority of the algorithm.