This paper presents CPCluster Map Reduce, an algorithm that exploits parallelism on a cloud computing platform to cluster large, high-dimensional datasets. Built on the MapReduce paradigm, the algorithm improves on the traditional clustering algorithm by parallelizing it. It is scalable and has good acceleration capability: adding compute nodes yields speedup. Experimental results show that the CPCluster Map Reduce algorithm performs much better than the traditional clustering algorithm, especially as the number of samples in the dataset increases.
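The abstract does not describe CPCluster's internals. As a hedged illustration of how a clustering algorithm can be recast in the MapReduce paradigm, the sketch below expresses one k-means iteration as a map step over data splits and a reduce step that merges per-cluster partial sums. k-means is swapped in here as a familiar example and is not claimed to be the CPCluster algorithm. The function names `map_split`, `reduce_centroids` and `kmeans_mapreduce` are hypothetical, and the map calls run sequentially rather than on separate compute nodes.

```python
from math import dist  # Python 3.8+


def map_split(split, centroids):
    # Map phase (illustrative, not CPCluster): assign each point in one data
    # split to its nearest centroid and accumulate a partial (sum, count) per
    # centroid id. Doing this locally plays the role of a MapReduce combiner.
    partial = {}
    for p in split:
        cid = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        s, n = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())


def reduce_centroids(mapped_outputs, centroids):
    # Shuffle + reduce phase: group partial results by centroid id, then
    # divide the summed coordinates by the total count to get new centroids.
    sums, counts = {}, {}
    for output in mapped_outputs:
        for cid, (s, n) in output:
            if cid not in sums:
                sums[cid] = list(s)
            else:
                sums[cid] = [a + b for a, b in zip(sums[cid], s)]
            counts[cid] = counts.get(cid, 0) + n
    new = []
    for cid, old in enumerate(centroids):
        # A centroid that attracted no points keeps its previous position.
        if counts.get(cid, 0) == 0:
            new.append(tuple(old))
        else:
            new.append(tuple(x / counts[cid] for x in sums[cid]))
    return new


def kmeans_mapreduce(splits, centroids, iterations=10):
    # Driver: one MapReduce "job" per iteration. On a real cluster each
    # split would be mapped on a different node; here they run in a loop.
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        mapped = [map_split(split, centroids) for split in splits]
        new = reduce_centroids(mapped, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids


splits = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
    [(1.0, 0.0), (11.0, 10.0)],
]
print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Because each mapper emits only one small (sum, count) record per cluster, the data shuffled to the reducer is independent of the number of samples in a split. This is one reason such a formulation can benefit from adding compute nodes as datasets grow.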