Nowadays, data collections are huge and in most cases do not reside in a centralised location. This complicates the use of traditional data mining techniques, as the datasets are distributed and often heterogeneous. In this paper we propose a distributed approach based on the aggregation of models produced locally. Each node first processes its own dataset to produce local clusters; global clusters are then constructed hierarchically from these local models. The aim of this approach is to minimise communication, maximise parallelism, balance the load among the nodes of the system, and reduce the overhead of extra processing during the hierarchical clustering. The technique is evaluated against the sequential version on benchmark datasets, and the results are very promising.
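The abstract does not name the local clustering algorithm or the rule used to merge models. The Python sketch below illustrates the two phases under stated assumptions only: plain k-means on each node, and greedy fusion of the two closest weighted centroids at each level of a binary aggregation tree. The function names (`local_kmeans`, `merge_models`, `hierarchical_aggregate`) and the toy data are hypothetical. Only weighted centroids travel between levels, never raw points, which is one way communication can stay small.

```python
from math import dist

Point = tuple[float, float]
# A model is a list of (centroid, number of points that centroid summarises).
Model = list[tuple[Point, int]]


def nearest(p: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))


def local_kmeans(points: list[Point], k: int, iters: int = 10) -> Model:
    """Phase 1 (assumed): plain k-means on one node, seeded with its first k points."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep empty ones unchanged.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    # Only the weighted centroids leave the node, not the raw data.
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]


def merge_models(a: Model, b: Model, k: int) -> Model:
    """Assumed merge rule: fuse the two closest centroids until k remain."""
    model = a + b
    while len(model) > k:
        i, j = min(
            ((i, j) for i in range(len(model)) for j in range(i + 1, len(model))),
            key=lambda ij: dist(model[ij[0]][0], model[ij[1]][0]),
        )
        (ci, ni), (cj, nj) = model[i], model[j]
        n = ni + nj
        # Weighted mean, so the fused centroid reflects how many points each side covered.
        fused = ((ci[0] * ni + cj[0] * nj) / n, (ci[1] * ni + cj[1] * nj) / n)
        model = [m for t, m in enumerate(model) if t not in (i, j)] + [(fused, n)]
    return model


def hierarchical_aggregate(models: list[Model], k: int) -> Model:
    """Phase 2: combine local models pairwise, level by level (binary reduction tree)."""
    while len(models) > 1:
        merged = [merge_models(models[t], models[t + 1], k) for t in range(0, len(models) - 1, 2)]
        if len(models) % 2:
            merged.append(models[-1])  # an odd model out is carried up unchanged
        models = merged
    return models[0]


# Four hypothetical nodes, each holding points from two groups near (0, 0) and (10, 10).
nodes = [
    [(0, 0), (10, 10), (1, 0), (10, 11)],
    [(0, 1), (11, 10), (1, 1), (9, 10)],
    [(10, 9), (0, 0.5), (11, 11), (0.5, 0)],
    [(0, 0), (10, 10), (0, 1), (10, 10)],
]
local_models = [local_kmeans(pts, k=2) for pts in nodes]
result = hierarchical_aggregate(local_models, k=2)
print(result)  # → [((10.125, 10.125), 8), ((0.3125, 0.4375), 8)]
```

Pairing models in a binary tree means the merges within one level are independent and could run in parallel on different nodes, with roughly log2(number of nodes) levels in total.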