With the rapid development of storage technologies, large-scale, high-dimensional datasets are increasingly stored in a distributed way, and distributed clustering algorithms are usually applied to cluster them. This paper presents DPA-CLIQU, a distributed clustering algorithm based on CLIQUE and dimensionality reduction. The efficiency, accuracy, and extensibility of the clustering analysis are further improved by self-adapting algorithms and by exploiting data and task parallelism on the master and child nodes. Our experiments show that DPA-CLIQU efficiently finds accurate clusters in large high-dimensional datasets stored in a distributed system.
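The abstract does not specify how work is divided between the master and child nodes. The sketch below shows one common way to parallelize CLIQUE's grid-density step. It assumes the data are horizontally partitioned and that every node shares the same fixed grid. Each child node counts points per grid cell on its own partition. The master sums those counts, keeps the cells whose share of all points exceeds a threshold, and joins face-adjacent dense cells into clusters. The function names (cell_of, local_counts, merge_counts, dense_units, clusters) and the parameters xi (intervals per dimension) and tau (density threshold) are hypothetical. The sketch covers only the full-dimensional grid. It does not include CLIQUE's bottom-up subspace search or the paper's dimensionality-reduction and self-adapting steps, and it should not be read as DPA-CLIQU itself.

```python
from collections import Counter, deque
from typing import Iterable, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


def cell_of(point: Sequence[float], lo: Sequence[float], hi: Sequence[float], xi: int) -> Cell:
    """Map a point to its grid cell; each dimension is split into xi equal intervals."""
    idx = []
    for x, a, b in zip(point, lo, hi):
        k = int((x - a) / (b - a) * xi)
        idx.append(min(max(k, 0), xi - 1))  # clamp so x == hi lands in the last interval
    return tuple(idx)


def local_counts(partition: Iterable[Sequence[float]], lo, hi, xi: int) -> Counter:
    """Child node (hypothetical role): count points per cell on its local partition only."""
    return Counter(cell_of(p, lo, hi, xi) for p in partition)


def merge_counts(all_counts: Iterable[Counter]) -> Counter:
    """Master node (hypothetical role): sum the per-cell counts reported by every child."""
    total: Counter = Counter()
    for c in all_counts:
        total.update(c)
    return total


def dense_units(total: Counter, n_points: int, tau: float) -> Set[Cell]:
    """A cell is dense if its fraction of all points exceeds the threshold tau."""
    return {cell for cell, c in total.items() if c / n_points > tau}


def clusters(units: Set[Cell]) -> List[Set[Cell]]:
    """Connected components of dense cells that differ by 1 in exactly one dimension."""
    seen: Set[Cell] = set()
    result: List[Set[Cell]] = []
    for start in sorted(units):
        if start in seen:
            continue
        component: Set[Cell] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            component.add(cell)
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in units and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        result.append(component)
    return result


if __name__ == "__main__":
    # Two child nodes, each holding part of a 2-D dataset in [0, 1] x [0, 1].
    node_a = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)]
    node_b = [(0.12, 0.18), (0.3, 0.1), (0.92, 0.88)]
    lo, hi, xi = (0.0, 0.0), (1.0, 1.0), 4
    total = merge_counts([local_counts(p, lo, hi, xi) for p in (node_a, node_b)])
    units = dense_units(total, n_points=6, tau=0.1)
    print(sorted(sorted(c) for c in clusters(units)))  # [[(0, 0), (1, 0)], [(3, 3)]]
```

In this design only the small per-cell count tables travel to the master, not the raw points. That property is what makes grid-based methods such as CLIQUE a natural fit for distributed data.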