Skip to Main Content
Data mining is the process of knowledge discovery in databases (centralized or distributed); it consists of different tasks associated with them different algorithms. Nowadays the scenario of one centralized database that maintains all the data is difficult to achieve due to different reasons including physical, geographical restrictions and size of the data itself. One approach to solve this problem is distributed databases where different parities have horizontal or vertical partitions of the data. The data is normally maintained by more than one organization, each of which aims at keeping its information stored in the databases private, thus, privacy-preserving techniques and protocols are designed to perform data mining on distributed data when privacy is highly concerned. Cluster analysis is a frequently used data mining task which aims at decomposing or partitioning a usually multivariate data set into groups such that the data objects in one group are the most similar to each other. It has an important role in different fields such as bio-informatics, marketing, machine learning, climate and healthcare. In this paper we introduce a novel clustering algorithm that was designed with the goal of enabling a privacy preserving version of it, along with sub-protocols for secure computations, to handle the clustering of vertically partitioned data among different healthcare data providers.