Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is well suited to this operation because it clusters large data sets efficiently. In this paper we present a comparative study of nonhierarchical and hierarchical clustering algorithms, including Self-Organizing Map (SOM) neural network methods, measured against k-means clustering on large data sets. Data were simulated with correlated and uncorrelated variables and with non-overlapping and overlapping clusters, both with and without outliers. Tested on the Telecommunication Users and Iris Flower data sets, the compared algorithms demonstrated very good classification performance. Experiments on a telecommunication data set of 1000 records with 32 categorical attributes and on the Iris Flower data set of 150 samples show that SOM clustering, compared with k-means and hierarchical clustering, is scalable in terms of both the number of clusters and the number of records.
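The abstract does not spell out the k-means procedure used as the baseline. As a point of reference, here is a minimal sketch of standard Lloyd-style k-means in pure Python, assuming numeric feature vectors (such as the four Iris measurements); categorical attributes like those in the telecommunication data would first need a numeric encoding. The function names (`kmeans`, `init_centroids`, `squared_distance`, `mean`) and the deterministic farthest-first seeding are illustrative assumptions, not the authors' configuration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def init_centroids(points: List[Point], k: int) -> List[Point]:
    """Deterministic farthest-first seeding (an illustrative assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd-style k-means. Returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = init_centroids(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[c])
        centroids = updated
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The farthest-first seeding keeps the output reproducible for this illustration; in practice k-means is usually seeded randomly (for example with k-means++) and restarted several times, keeping the run with the lowest within-cluster sum of squares, since a single run can settle in a local optimum.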
Published in:
Methods and Models in Computer Science (ICM2CS), 2010 International Conference on
Date of Conference: 13-14 Dec. 2010