To address the problem of classifying large-scale document collections, a large-scale data clustering algorithm based on an improved CURE algorithm is proposed. The data are first partitioned, each partition is clustered, and the initial classes obtained from the partitions are then clustered again; combined with data tracking, this achieves hierarchical clustering and sample classification of large-scale data and strikes a better balance between clustering quality and clustering efficiency. Experiments on the processing of real large-scale network document data show that the algorithm is efficient.
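The partition-then-merge idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' improved algorithm, whose details the abstract does not give. It assumes centroid distance in place of CURE's shrunken representative points, and the function names and the `n_partitions` and `reduce_factor` parameters are hypothetical choices made only for this illustration.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def agglomerate(clusters, k):
    """Repeatedly merge the two clusters with the closest centroids
    until at most k clusters remain (naive search, for illustration)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def partitioned_clustering(points, k, n_partitions=4, reduce_factor=3, seed=0):
    """Assumed three-stage pipeline:
    1. split the data into random partitions,
    2. pre-cluster each partition into initial classes,
    3. merge the initial classes across partitions down to k clusters,
    then label every sample by its nearest final centroid."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_partitions] for i in range(n_partitions)]

    initial_classes = []
    for part in parts:
        if not part:
            continue
        # Shrink each partition by reduce_factor, but never below k clusters.
        target = max(k, len(part) // reduce_factor)
        initial_classes.extend(agglomerate([[p] for p in part], target))

    final = agglomerate(initial_classes, k)
    cents = [centroid(c) for c in final]
    labels = [min(range(len(cents)), key=lambda i: dist(p, cents[i]))
              for p in points]
    return cents, labels
```

Pre-clustering inside small partitions keeps the quadratic pairwise search cheap, and only the much smaller set of initial classes is merged globally. That is the general source of the quality versus cost trade-off that partition-based variants of CURE aim for.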
Published in:
IT in Medicine and Education (ITME), 2011 International Symposium on (Volume: 1)
Date of Conference: 9-11 Dec. 2011