In this paper, we present an efficient algorithm, called the pattern reduction (PR) algorithm, to reduce the time required for data clustering by iterative clustering algorithms. Conceptually similar to a lossy data compression scheme, the algorithm removes at each iteration those data patterns that are close to the centroid of a cluster or that have remained in the same cluster for a certain number of consecutive iterations, and that are thus unlikely to move to another cluster at later iterations; it then computes a new pattern to represent all the data patterns removed. Our simulation results, on data sets ranging from 2 to 1,000 dimensions and from 150 to 6,000,000 patterns, indicate that the proposed algorithm can reduce the computation time of k-means, the genetic k-means algorithm (GKA), and k-means with genetic algorithm (KGA) by 10% up to about 80%, and that for high-dimensional data sets it can even reduce the computation time by more than 70%.
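The idea described above can be illustrated with a minimal sketch of k-means with a reduction step. This is not the authors' implementation: the function name `kmeans_pr`, the parameter `stable_iters`, the fixed iteration count, and the choice of a weighted mean as the representative pattern are all assumptions made for illustration. Only the "same cluster for several consecutive iterations" criterion is shown; the distance-to-centroid criterion is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    """Weighted mean of a list of points."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def kmeans_pr(data, init_centroids, max_iters=10, stable_iters=2):
    """k-means with a hypothetical pattern reduction step.

    Returns the final centroids and the working-set size seen at the
    start of each iteration, so the reduction is visible.
    """
    k = len(init_centroids)
    centroids = [list(c) for c in init_centroids]
    # Each working item: [point, weight, last_label, streak]
    work = [[list(p), 1, -1, 0] for p in data]
    sizes = []
    for _ in range(max_iters):
        sizes.append(len(work))
        # Assignment step: nearest centroid; count consecutive stays.
        for item in work:
            label = min(range(k), key=lambda c: sq_dist(item[0], centroids[c]))
            item[3] = item[3] + 1 if label == item[2] else 1
            item[2] = label
        # Update step: weighted mean, so merged patterns keep their mass.
        for c in range(k):
            members = [it for it in work if it[2] == c]
            if members:
                centroids[c] = weighted_mean([m[0] for m in members],
                                             [m[1] for m in members])
        # Reduction step (assumed form): per cluster, replace all stable
        # patterns with one representative carrying their total weight.
        reduced = []
        for c in range(k):
            stable = [it for it in work if it[2] == c and it[3] >= stable_iters]
            others = [it for it in work if it[2] == c and it[3] < stable_iters]
            if len(stable) > 1:
                rep = weighted_mean([s[0] for s in stable],
                                    [s[1] for s in stable])
                weight = sum(s[1] for s in stable)
                reduced.append([rep, weight, c, stable_iters])
            else:
                reduced.extend(stable)
            reduced.extend(others)
        work = reduced
    return centroids, sizes


# Two well-separated groups of four 2-D patterns each.
data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
centroids, sizes = kmeans_pr(data, [[0, 0], [11, 11]], max_iters=4)
print(centroids, sizes)
```

In this sketch the saving comes from the shrinking working set: once patterns are merged, later assignment steps compute distances for one representative instead of for every removed pattern. Because each representative carries the total weight of the patterns it replaces, the centroid update is unchanged at the moment of merging; any later reassignment moves the whole group at once, which is where the lossy nature of the scheme shows.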
Published in:
Systems, Man and Cybernetics, 2007. ISIC. IEEE International Conference on
Date of Conference: 7-10 Oct. 2007