Clustering is the process of discovering groups within multidimensional data, based on similarities, with minimal, if any, knowledge of their structure. Distributed data clustering is a recent approach to dealing with geographically distributed databases, since traditional clustering methods require centralizing all databases into a single dataset. Moreover, current privacy requirements for distributed databases demand algorithms that can perform clustering securely. Among unsupervised neural network models, the self-organizing map (SOM) plays a major role. SOM features include information compression while trying to preserve the topological and metric relationships of the original data space. This paper presents a strategy for efficient cluster analysis in geographically distributed databases using SOM networks. Local datasets corresponding to vertical partitions of the database are applied to distinct maps in order to obtain partial views of the existing clusters. Units of each local map are chosen to represent the original data and are sent to a central site, which fuses the partial results. Experimental results are presented for different datasets.
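The local-map/central-fusion pipeline can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a 1-D SOM at each site and relies on two assumptions about the protocol. First, each site reports, for every record, the index of that record's best-matching unit (BMU) on its local map, which is enough to align records across vertical partitions. Second, the central site fuses results by concatenating the BMU prototypes from all sites and sending each distinct combination once, with a count. It then clusters these fused vectors with another small SOM. The functions `train_som`, `bmu`, `local_site`, and `fuse` are hypothetical names introduced only for this illustration.

```python
import math
import random

def bmu(units, x):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((u - v) ** 2 for u, v in zip(units[j], x)))

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM (a chain of n_units prototypes) on a list of vectors."""
    rng = random.Random(seed)
    # Initialise prototypes from randomly chosen samples.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            radius = radius0 * (1.0 - frac) + 0.5    # shrinking neighbourhood, never zero
            b = bmu(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood on the 1-D grid distance |j - b|.
                h = math.exp(-((j - b) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return units

def local_site(columns, n_units, seed):
    """Hypothetical local step: train a map on one vertical partition and
    return its prototypes plus each record's BMU index (assumed protocol)."""
    units = train_som(columns, n_units, seed=seed)
    return units, [bmu(units, x) for x in columns]

def fuse(sites):
    """Hypothetical central step (assumed fusion rule): concatenate each
    record's BMU prototypes across sites; identical combinations kept once."""
    counts = {}
    n_records = len(sites[0][1])
    for r in range(n_records):
        key = tuple(site_bmus[r] for _, site_bmus in sites)
        counts[key] = counts.get(key, 0) + 1
    fused = []
    for key, count in counts.items():
        vec = []
        for (units, _), j in zip(sites, key):
            vec.extend(units[j])
        fused.append((vec, count))
    return fused

# Toy data: two well-separated groups, 4 attributes, 40 records.
rng = random.Random(42)
records = ([[rng.gauss(0, 0.3) for _ in range(4)] for _ in range(20)] +
           [[rng.gauss(5, 0.3) for _ in range(4)] for _ in range(20)])
# Vertical partition: site A holds attributes 0-1, site B holds attributes 2-3.
site_a = local_site([r[:2] for r in records], n_units=4, seed=1)
site_b = local_site([r[2:] for r in records], n_units=4, seed=2)
fused = fuse([site_a, site_b])
# The central site clusters the (at most 4 * 4 = 16) fused prototype vectors.
central_units = train_som([v for v, _ in fused], n_units=2, seed=3)
print(len(fused), "fused vectors from", len(records), "records")
```

In this sketch, the central site never receives raw attribute values, only local prototypes and unit indices. The number of fused vectors is bounded by the product of the local map sizes rather than by the number of records, which mirrors the compression role the abstract attributes to SOM units.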