In this paper, we present an efficient algorithm, called the pattern reduction (PR) algorithm, to reduce the time required for data clustering with iterative clustering algorithms. Conceptually similar to a lossy data compression scheme, the algorithm removes, at each iteration, those data patterns that are either close to the centroid of a cluster or have remained in the same cluster for a certain number of consecutive iterations, because such patterns are unlikely to move from one cluster to another in later iterations. A new pattern is then computed to represent all the removed patterns. Our simulation results (on data sets ranging from 2 to 1,000 dimensions and from 150 to 6,000,000 patterns) indicate that the proposed algorithm can reduce the computation time of k-means, the genetic k-means algorithm (GKA), and k-means with genetic algorithm (KGA) by 10% to about 80%, and that for high-dimensional data sets it can reduce the computation time by more than 70%.
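To make the pattern-reduction idea concrete, here is a minimal sketch of k-means with a PR-style step, written under several assumptions that the abstract does not settle. The removed patterns of each cluster are assumed to be folded into one compressed pattern per cluster (a running sum and count) that still contributes to that cluster's centroid but is never reassigned, which is what makes the scheme lossy. The parameter names `stable_iters` (consecutive iterations in the same cluster) and `radius` (distance to the centroid) are hypothetical, as is the function name `kmeans_pr`, and the deterministic farthest-point initialization is an illustrative choice rather than the authors' method.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pr(points, k, max_iter=100, stable_iters=3, radius=None):
    """Illustrative k-means with a pattern-reduction step (hypothetical API)."""
    dim = len(points[0])
    # Assumed farthest-point initialization: deterministic, avoids bad seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    active = list(range(len(points)))  # patterns still being reassigned
    label = [-1] * len(points)         # current cluster of each pattern
    streak = [0] * len(points)         # consecutive iterations in same cluster
    # Compressed pattern per cluster: sum and count of removed patterns.
    frozen_sum = [[0.0] * dim for _ in range(k)]
    frozen_cnt = [0] * k
    for _ in range(max_iter):
        # Assignment step: only active patterns cost distance computations.
        for i in active:
            j = min(range(k), key=lambda c: dist2(points[i], centroids[c]))
            streak[i] = streak[i] + 1 if j == label[i] else 1
            label[i] = j
        # Update step: mean over active patterns plus compressed patterns.
        sums = [row[:] for row in frozen_sum]
        cnts = frozen_cnt[:]
        for i in active:
            j = label[i]
            cnts[j] += 1
            for d in range(dim):
                sums[j][d] += points[i][d]
        new_centroids = [
            [s / cnts[j] for s in sums[j]] if cnts[j] else centroids[j]
            for j in range(k)
        ]
        converged = all(dist2(a, b) < 1e-12
                        for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if converged:
            break
        # Pattern reduction: fold stable or near-centroid patterns into the
        # compressed pattern of their cluster and drop them from `active`.
        keep = []
        for i in active:
            j = label[i]
            near = radius is not None and \
                dist2(points[i], centroids[j]) <= radius * radius
            if streak[i] >= stable_iters or near:
                frozen_cnt[j] += 1
                for d in range(dim):
                    frozen_sum[j][d] += points[i][d]
            else:
                keep.append(i)
        active = keep
    return centroids, label
```

For example, `kmeans_pr([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)], 2)` separates the two groups of four points. Lowering `stable_iters` removes patterns sooner and saves more distance computations, at the cost of a higher risk of freezing a pattern in the wrong cluster.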
Date of Conference: 7-10 Oct. 2007