Abstract:
Compared to traditional centralized clustering, distributed clustering offers the advantage of processing data from different sites in parallel, enhancing the efficiency of clustering while preserving the privacy of the data at each site. However, most existing distributed clustering techniques require manual tuning of several parameters or hyperparameters, which can pose challenges for practical applications. This paper introduces a novel parameter-free distributed clustering framework known as distributed torque clustering (DTC). When clustering data or subdata distributed across multiple sites, DTC executes two main steps: the first is data reduction at each site using torque clustering, and the second performs global clustering with weighted torque clustering. We compare DTC against six state-of-the-art distributed clustering algorithms and automatic centralized clustering techniques on ten large-scale or medium-scale datasets. The results show that the average rank of DTC is at least three times better than that of the other algorithms across all the datasets. Additionally, DTC accurately predicts the ground-truth number of clusters in nine out of ten datasets, further demonstrating its competitive performance and practical potential.
Published in: IEEE Transactions on Emerging Topics in Computational Intelligence (Volume: 9, Issue: 2, April 2025)
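The abstract does not give the internals of torque clustering, so the two-step pattern it describes can only be sketched with stand-ins: below, plain k-means substitutes for the per-site reduction and weighted k-means for the global step. What the sketch preserves is the structure of the framework — each site compresses its raw data into weighted representatives, and only those representatives (not the raw, private data) are pooled for a global weighted clustering. All function names are illustrative, not the paper's API.

```python
import numpy as np

def _farthest_init(X, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def local_reduce(X, k, iters=20):
    """Step 1 stand-in (the paper uses torque clustering): Lloyd k-means
    that compresses a site's data into k representative points plus
    weights counting how many raw points each representative covers."""
    centers = _farthest_init(X, k)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    weights = np.bincount(labels, minlength=k).astype(float)
    return centers, weights

def global_weighted_cluster(reps, weights, k, iters=20):
    """Step 2 stand-in (the paper uses weighted torque clustering):
    weighted k-means over the pooled representatives, so each summary
    point counts in proportion to the raw data it stands for."""
    centers = _farthest_init(reps, k)
    for _ in range(iters):
        labels = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            m = labels == j
            if weights[m].sum() > 0:
                centers[j] = np.average(reps[m], axis=0, weights=weights[m])
    return centers, labels
```

Note that only `(centers, weights)` pairs leave each site, which is the mechanism behind both the parallelism and the privacy claim in the abstract; the choice of k at each step is fixed here for illustration, whereas DTC itself is parameter-free.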