An efficient k-means clustering algorithm: analysis and implementation
Kanungo, T.
Mount, D.M.
Netanyahu, N.S.
Piatko, C.D.
Silverman, R.
Wu, A.Y.
Almaden Research Center, San Jose, CA
This paper appears in: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jul 2002
Volume: 24
Issue: 7
On page(s): 881-892
ISSN: 0162-8828
References Cited: 50
CODEN: ITPIDJ
INSPEC Accession Number: 7324832
Digital Object Identifier: 10.1109/TPAMI.2002.1017616
Current Version Published: 2002-08-07
Abstract
In k-means clustering, we are given a set of n data points in
d-dimensional space R^d and an integer k, and the problem is to
determine a set of k points in R^d, called centers, so as to minimize the
mean squared distance from each data point to its nearest center. A
popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We
present a simple and efficient implementation of Lloyd's k-means
clustering algorithm, which we call the filtering algorithm. This
algorithm is easy to implement, requiring a kd-tree as the only major
data structure. We establish the practical efficiency of the filtering
algorithm in two ways. First, we present a data-sensitive analysis of
the algorithm's running time, which shows that the algorithm runs faster
as the separation between clusters increases. Second, we present a
number of empirical studies both on synthetically generated data and on
real data sets from applications in color quantization, data
compression, and image segmentation.
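As a point of reference for the problem statement above, here is a minimal brute-force sketch in Python of the k-means objective (mean squared distance to the nearest center) and of Lloyd's heuristic. It is not the authors' code. The function names `sq_dist`, `distortion`, and `lloyd` are hypothetical. The initialization (k distinct data points sampled at random) and the stopping rule (stop when the centers no longer change, or after `iters` rounds) are illustrative assumptions. Each iteration costs O(nkd) time because every point is compared against every center.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(points, centers):
    """k-means objective: mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def lloyd(points, k, iters=100, seed=0):
    """Brute-force Lloyd iteration (hypothetical helper, not the paper's code):
    assign every point to its nearest center, then move each center to the
    centroid of the points assigned to it."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    d = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Nearest center by exhaustive search over all k centers: O(kd) per point.
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        # A center that received no points is left where it is.
        new_centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:  # assumed stopping rule: no center moved
            break
        centers = new_centers
    return centers
```

For example, take the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) with k = 2. The iteration settles on centers (1/3, 1/3) and (31/3, 31/3), and the mean squared distance is 4/9.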
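The abstract describes the filtering algorithm only at a high level: it is Lloyd's algorithm with a kd-tree as the only major data structure. The Python sketch below shows one plausible way a kd-tree can speed up a single Lloyd iteration. It is an illustrative assumption, not the authors' implementation.

- Each hypothetical `Node` stores the bounding box of its points, their count, and their vector sum.
- A list of candidate centers is pushed down the tree. At each node, `z_star` is the candidate nearest the box midpoint.
- Any other candidate `z` is discarded when the box vertex lying furthest in the direction `z - z_star` is no closer to `z` than to `z_star`. In that case no point in the box can prefer `z`.
- When only one candidate survives, the whole node is assigned at once from its stored count and sum, avoiding the per-point O(kd) search.

The names `Node`, `is_farther`, `filter_node`, and `filtering_step`, and the leaf size of 4, are all hypothetical.

```python
import random

# Hypothetical sketch of a kd-tree-based Lloyd step; not the paper's code.


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class Node:
    """kd-tree node holding the bounding box, count, and vector sum of its points.
    Count and sum let a whole node be assigned to one center in O(d) time."""

    def __init__(self, points, leaf_size=4):
        d = len(points[0])
        self.lo = [min(p[t] for p in points) for t in range(d)]
        self.hi = [max(p[t] for p in points) for t in range(d)]
        self.count = len(points)
        self.wsum = [sum(p[t] for p in points) for t in range(d)]
        self.points, self.left, self.right = None, None, None
        if len(points) <= leaf_size:
            self.points = points  # leaf: keep the points themselves
            return
        # Assumed split rule: median along the widest side of the bounding box.
        axis = max(range(d), key=lambda t: self.hi[t] - self.lo[t])
        pts = sorted(points, key=lambda p: p[axis])
        mid = len(pts) // 2
        self.left = Node(pts[:mid], leaf_size)
        self.right = Node(pts[mid:], leaf_size)


def is_farther(z, z_star, lo, hi):
    """True if every point of the box [lo, hi] is at least as close to z_star as to z.
    Checking the box vertex lying furthest in the direction z - z_star suffices."""
    v = [h if zt > st else l for zt, st, l, h in zip(z, z_star, lo, hi)]
    return sq_dist(z, v) >= sq_dist(z_star, v)


def filter_node(node, cands, centers, sums, counts):
    """Add to counts/sums the points of `node`, each credited to its nearest center
    among the candidate indices in `cands`."""
    if node.points is not None:
        for p in node.points:  # leaf: brute force over the surviving candidates
            j = min(cands, key=lambda i: sq_dist(p, centers[i]))
            counts[j] += 1
            for t, x in enumerate(p):
                sums[j][t] += x
        return
    mid = [(l + h) / 2 for l, h in zip(node.lo, node.hi)]
    z_star = min(cands, key=lambda i: sq_dist(mid, centers[i]))
    kept = [z_star] + [
        i for i in cands
        if i != z_star and not is_farther(centers[i], centers[z_star], node.lo, node.hi)
    ]
    if len(kept) == 1:  # every point in this node is nearest to z_star
        counts[z_star] += node.count
        for t, x in enumerate(node.wsum):
            sums[z_star][t] += x
        return
    filter_node(node.left, kept, centers, sums, counts)
    filter_node(node.right, kept, centers, sums, counts)


def filtering_step(root, centers):
    """One Lloyd iteration computed with the kd-tree; returns (new_centers, counts)."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    filter_node(root, list(range(k)), centers, sums, counts)
    new = [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
           for j in range(k)]
    return new, counts


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
           for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(200)]
    root = Node(pts)  # built once and reused: only the centers change per iteration
    centers = [list(p) for p in rng.sample(pts, 3)]
    for _ in range(20):
        centers, counts = filtering_step(root, centers)
    print(centers, counts)
```

The tree is built once because only the centers change between Lloyd iterations. Apart from ties and floating-point rounding, each step should give the same assignment counts and centroids as a brute-force pass.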