This paper considers k-Means clustering of incomplete data sets, that is, data sets with missing values. Although the main purpose of k-Means clustering is to partition samples into several homogeneous clusters by minimizing within-cluster errors, it has been shown that a relaxed solution of k-Means can be recovered in a PCA-guided manner. In this paper, the PCA-guided k-Means procedure is extended to the situation in which some observations are missing. Principal component scores, which can be identified with a rotated solution of the cluster indicators of k-Means clustering, are estimated in an iterative process without imputation. Besides solving the eigenvalue problem of covariance matrices, k-Means-like partitions are derived through lower-rank approximation of the data matrix, ignoring missing elements. Several experimental results demonstrate that the PCA-guided process is more robust to initialization than the k-Means procedure, even though both are based on iterative optimization.
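The idea of a lower-rank approximation that ignores missing elements can be illustrated with a minimal sketch. The code below is not the paper's update rule; it assumes a generic alternating least squares fit in which only observed entries contribute to the error, so no imputation is needed. The function name `masked_pca_scores`, the `ridge` stabilizer, and the iteration count are hypothetical choices made for this illustration. It also assumes every column has at least one observed entry.

```python
import numpy as np

def masked_pca_scores(X, observed, rank, n_iter=100, ridge=1e-8, seed=0):
    """Fit X - mu ~ F @ A.T by alternating least squares on observed entries.

    X        : (n, d) data; missing entries may hold NaN or any placeholder.
    observed : (n, d) boolean mask, True where X is observed.
    Returns scores F (n, rank), loadings A (d, rank), column means mu (d,).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = observed.astype(float)
    # Column means over observed entries only (assumes no empty column).
    mu = np.where(observed, X, 0.0).sum(axis=0) / W.sum(axis=0)
    # Centered data; missing cells are set to 0 and always carry zero weight.
    Xc = np.where(observed, X - mu, 0.0)
    F = rng.standard_normal((n, rank))
    A = rng.standard_normal((d, rank))
    reg = ridge * np.eye(rank)  # small ridge term keeps the solves well posed
    for _ in range(n_iter):
        # Update each sample's scores from the features observed for it.
        for i in range(n):
            Aw = A * W[i][:, None]
            F[i] = np.linalg.solve(Aw.T @ A + reg, Aw.T @ Xc[i])
        # Update each feature's loadings from the samples observing it.
        for j in range(d):
            Fw = F * W[:, j][:, None]
            A[j] = np.linalg.solve(Fw.T @ F + reg, Fw.T @ Xc[:, j])
    return F, A, mu

if __name__ == "__main__":
    # Two synthetic groups with a few entries removed (illustrative data only).
    rng = np.random.default_rng(1)
    X = np.vstack([np.full((20, 4), 3.0), np.full((20, 4), -3.0)])
    X = X + 0.3 * rng.standard_normal(X.shape)
    observed = np.ones(X.shape, dtype=bool)
    for i in range(0, 40, 3):
        observed[i, i % 4] = False
    X[~observed] = np.nan
    F, A, mu = masked_pca_scores(X, observed, rank=1)
    # For two clusters, the sign of the first score gives a k-Means-like split.
    print((F[:, 0] > 0).astype(int))
```

For two clusters, thresholding the first score at zero is the usual way to read a partition off the PCA relaxation. For more clusters, the scores would still need a rotation or a further clustering step, which this sketch does not attempt.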