Skip to Main Content
Traditional clustering techniques are inapplicable to problems where the relationships between data points evolve over time. Not only is it important for the clustering algorithm to adapt to the recent changes in the evolving data, but it also needs to take the historical relationship between the data points into consideration. In this paper, we propose ECKF, a general framework for evolutionary clustering large-scale data based on low-rank kernel matrix factorization. To the best of our knowledge, this is the first work that clusters large evolutionary data sets by the amalgamation of low-rank matrix approximation methods and matrix factorization-based clustering. Since the low-rank approximation provides a compact representation of the original matrix, and especially, the near-optimal low-rank approximation can preserve the sparsity of the original data, ECKF gains computational efficiency and hence is applicable to large evolutionary data sets. Moreover, matrix factorization-based methods have been shown to effectively cluster high-dimensional data in text mining and multimedia data analysis. From a theoretical standpoint, we mathematically prove the convergence and correctness of ECKF, and provide detailed analysis of its computational efficiency (both time and space). Through extensive experiments performed on synthetic and real data sets, we show that ECKF outperforms the existing methods in evolutionary clustering.
Knowledge and Data Engineering, IEEE Transactions on (Volume:24 , Issue: 6 )
Date of Publication: June 2012