The k-means clustering is one of the most popular schemes to solve the problem of clustering. This paper investigates the approximate algorithm for the k-means clustering by means of selecting the k initial points used as centers from the original point set. It is proved that an expected 2-approximation factor can be obtained, if k centers belong to one of the optimal sub cluster points respectively. To find these k points, a randomized algorithm is proposed which obtain an expected 2-approximation factor with high probability. This algorithm selects some points from the original points to be used as candidate centers, and the size of the sample is based on having at least points of each cluster. At last, some-examples are selected to verify our algorithm and get good results.
Published in:
Intelligent Control and Automation, 2008. WCICA 2008. 7th World Congress on
Date of Conference: 25-27 June 2008