This article presents an improved clustering technique for categorical data, based on identifying points that have significant membership in multiple classes. Assigning such points to clusters is difficult, and they often distort the actual partitioning of the data. It may therefore be more effective to first identify the points whose cluster assignments are most ambiguous and exclude them from consideration in the first stage of the algorithm; in the second stage, these points are assigned to one of the identified clusters by an ANN classifier. For the first stage, we use our genetic algorithm based and simulated annealing based fuzzy clustering methods, as well as the well-known fuzzy C-medoids algorithm, with the number of clusters known a priori. The performance of the proposed clustering algorithms is compared with that of the average linkage hierarchical clustering algorithm, as well as genetic algorithm based fuzzy clustering, simulated annealing based fuzzy clustering, and fuzzy C-medoids with ANN, on a variety of artificial and real-life categorical data sets. Statistical significance tests have also been performed to establish the superiority of the proposed algorithm.
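The first-stage idea of setting aside ambiguous points can be illustrated with a minimal sketch. It assumes a fuzzy membership matrix has already been produced by some fuzzy clustering method, and it treats a point as ambiguous when the gap between its two largest memberships falls below a margin. That criterion, the `margin` value, and the function names `split_by_confusion` and `hard_labels` are illustrative assumptions and are not taken from the article. The second-stage ANN classifier is not shown.

```python
def split_by_confusion(memberships, margin=0.2):
    """Split point indices into (confident, ambiguous).

    memberships[i][k] is the fuzzy membership of point i in cluster k
    (at least two clusters are required). A point counts as ambiguous
    when the gap between its two largest memberships is below `margin`.
    This criterion is an assumption made for illustration only.
    """
    confident, ambiguous = [], []
    for i, row in enumerate(memberships):
        top, second = sorted(row, reverse=True)[:2]
        if top - second < margin:
            ambiguous.append(i)
        else:
            confident.append(i)
    return confident, ambiguous


def hard_labels(memberships, indices):
    """Assign each listed point to its highest-membership cluster."""
    return {
        i: max(range(len(memberships[i])), key=memberships[i].__getitem__)
        for i in indices
    }


# Hypothetical membership matrix: 4 points, 3 clusters.
U = [
    [0.90, 0.05, 0.05],
    [0.45, 0.40, 0.15],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
confident, ambiguous = split_by_confusion(U, margin=0.2)
labels = hard_labels(U, confident)
```

In a two-stage scheme of this kind, the points in `confident` together with `labels` would serve as training data for the classifier that later assigns the points in `ambiguous`.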