Secure and Evaluable Clustering based on a Multifunctional and Privacy-Preserving Outsourcing Computation Toolkit

Although cloud computing technologies have advanced tremendously, privacy has become a major concern in outsourced computation. Homomorphic encryption has been proposed to preserve data privacy while computational tasks are executed on ciphertext. However, many existing schemes support only a limited set of homomorphic operations, which barely satisfies complex computing tasks such as machine learning that demand massive computing resources and a rich variety of functions. To address this problem, this paper proposes a novel multifunctional and privacy-preserving outsourcing computation toolkit that supports several homomorphic computing protocols, including division and power, on ciphertexts of integers and floating-point numbers. Specifically, we first implement homomorphic mutual conversion protocols between integer and floating-point ciphertexts to balance efficiency and feasibility, since high-precision ciphertext operations on floating-point numbers cost about 100x the computational overhead of those on integers. Second, we implement a homomorphic K-means algorithm based on our toolkit for clustering and design a homomorphic silhouette coefficient as the evaluation index, thereby providing an informative cluster assessment for local users with limited resources. Then, we simulate the protocols of our toolkit to explore parameter sensitivity in terms of computational efficiency. Last, we present a security analysis showing that our toolkit leaks no private information to unauthorized parties. Comprehensive experiments further demonstrate the efficiency and utility of our toolkit.


I. INTRODUCTION
CLOUD computing lets a third-party provider deliver on-demand services such as data storage, computing resources and function execution without direct management by local users. It surpasses local servers in storage capacity, computational resources and availability. Therefore, individuals and companies with limited resources turn to cloud services by outsourcing their computational tasks to the cloud. Meanwhile, sensitive information exists in countless domains, such as medical health monitoring [1], [2], financial transaction records and location services [3]. Although cloud services are powerful and flexible, privacy preservation on the cloud remains a primary concern, which may undermine the functionality and performance of cloud services. For example, the personal information of over 77 million accounts was stolen during the Sony PlayStation Network outage due to an illegal and unauthorized intrusion, which damaged the reputation of the enterprise and caused huge economic loss. The leak of celebrity photographs from Apple's iCloud service drew widespread public condemnation, which forced the company to improve both its data protection policy and its technologies. As a consequence, the security issues exposed in such incidents have prompted an upsurge of research in both industry and academia.

A. MOTIVATIONS
In general, privacy preservation approaches rely on the following assumption: the cloud servers are honest-but-curious, i.e. they do not act maliciously against the users and will follow the protocols as expected. However, the cloud server may not always be trusted. On the one hand, it might be vulnerable to attackers who intend to gain access to sensitive information in the data outsourced to the cloud. On the other hand, some cloud servers can be curious and sometimes even malicious, and may deduce information about the original data from the model and intermediate calculation results [4]. Therefore, how to design effective and suitable privacy-preserving protocols to achieve secure outsourcing computation on the cloud remains an open challenge.
In recent years, machine learning technology has been widely applied in natural language processing, computer vision, intrusion detection, etc. Consider a clustering task requested by a local user who uploads the resource-demanding task to the cloud. The privacy concerns of this user are threefold. First, the private training data is at risk of leakage through deduction from model parameters or predictions. Second, existing machine learning algorithms may involve complex operations such as division, power, exponentiation and logarithm, which few secure outsourcing computation protocols are able to support. Previous frameworks [5], [6] support only homomorphic addition, multiplication and comparison on ciphertexts of floating-point numbers (FPNs), which cannot meet the requirements of clustering. Third, secure outsourcing computation protocols on integer ciphertext cannot ensure the convergence and precision of these machine learning algorithms. If the entire task must run on ciphertexts of floating-point numbers, it incurs huge computational overhead and time cost, since the computational overhead on FPNs is hundreds of times higher than that on integers. Furthermore, some previous outsourced clustering algorithms [7]-[9] only implemented the clustering algorithm on ciphertext and did not consider how to evaluate the clustering performance on ciphertext. In practice, data owners usually do not know the number of clusters in advance, so candidate values need to be compared through multiple experimental evaluations.
Based on the above problems, we summarize the main contributions of this paper in the next section.

B. MAIN CONTRIBUTIONS
In this paper, we propose a multifunctional and privacy-preserving outsourcing computation toolkit and implement a secure and evaluable homomorphic clustering algorithm. The main contributions of our paper are summarized as follows:
• A multifunctional and privacy-preserving outsourcing computation toolkit (MPOCT) is proposed, which provides several protocols on ciphertexts of both integers and floating-point numbers (FPNs). Specifically, our protocols achieve homomorphic operations such as division, power, logarithm, etc.
• In MPOCT, mutual ciphertext conversion protocols between FPNs and integers are provided to balance calculation efficiency and precision. During outsourced computation, when a high-precision calculation task is encountered, the integer ciphertext can be transformed into FPN ciphertext for calculation. If a task does not require high precision, integer ciphertext can be used to improve calculation efficiency and reduce computational overhead.
• Based on MPOCT, we implement a homomorphic K-means algorithm. Furthermore, a homomorphic silhouette coefficient is designed on the ciphertext of the clustering results, providing an informative clustering assessment to the local data owners.
The rest of this paper is organized as follows. We discuss related work in Section II. Then we introduce the preliminaries of this paper in Section III. Our toolkit is described in Section IV and its application to clustering is introduced in Section V. We analyze the security of our toolkit in Section VI. The experimental evaluation is reported in Section VII. Section VIII concludes the paper and outlines future work.

II. RELATED WORK
In this section, we briefly review the most relevant research on homomorphic encryption, secure outsourcing computation frameworks and previous secure outsourcing computation solutions for K-means.
Cloud computing drives the migration of local on-demand tasks to cloud servers for storage and calculation [10]. In recent years, privacy preservation in cloud security has become an increasingly important concern [11]. Homomorphic encryption, which allows simple calculations to be performed directly on ciphertext, is a powerful cryptographic primitive for securing outsourced computation against an untrusted third-party provider. For example, the Paillier cryptosystem [12], which is widely used in industry, supports additive homomorphism, while ElGamal [13] supports multiplicative homomorphism. However, a partially homomorphic cryptosystem supports only addition or only multiplication, so its application scenarios are very limited. In 2009, Gentry [14] first proposed a fully homomorphic encryption (FHE) scheme based on ideal lattices from a theoretical perspective. However, FHE is difficult to apply in the real world due to its large computational overhead. To address the efficiency problem of FHE, researchers have proposed more efficient FHE schemes such as BGV [15], BFV [16] and CKKS [17]. Based on the above-mentioned cryptographic schemes, many scholars have also proposed secure outsourcing computation frameworks. In 2016, Liu not only designed operations such as homomorphic multiplication and comparison using dual servers and the Paillier cryptosystem, but also proposed secure outsourcing computation frameworks that support rational numbers [6] and floating-point numbers [5].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Liu et al. [18] implemented a secure k-nearest neighbor based multi-label classification scheme based on the Paillier cryptosystem, which guarantees the security of both the training data and the classification model. Liu et al. [19] proposed a secure outsourcing computation framework based on FHE and key packaging technology, which requires only a single server to perform outsourced computing tasks. Li et al. [20] proposed a novel homomorphic encryption framework over non-abelian rings and defined homomorphic operations in the ciphertext space, which support real-number encryption and privacy-preserving machine learning training and classification on ciphertext. Zong et al. [21] investigated secure outsourcing computation of the matrix determinant based on BGV and proposed an efficient matrix encoding technique that packs a matrix into a single ciphertext and can easily be applied as a submodule in high-level applications. Nevertheless, the above works suffer to varying degrees from low efficiency, accuracy degradation, lack of scalability or ciphertext expansion.
As a common data mining algorithm, K-means [25] has abundant application scenarios in daily life. Therefore, many scholars have studied how to perform clustering on cloud servers while protecting data privacy. Rao et al. [7] implemented the outsourcing computation of K-means based on Paillier in 2015. Then, Rong et al. [9] improved the computing efficiency by adding a trusted third party with limited computing capabilities. Almutairi et al. [8] also implemented the outsourcing computation of K-means based on homomorphic encryption. This solution greatly improves the computing efficiency through a third party and limited interaction between user and server. However, none of the above solutions provides evaluation indicators on the clustering ciphertext. Therefore, the user can only obtain the clustering result, but not an evaluation of it.
VOLUME 4, 2016
Besides, the emergence of differential privacy [26] opened up new areas of secure outsourcing computation. Xia et al. [22] efficiently completed outsourced clustering tasks through local differential privacy. However, the choice of privacy budget has a great impact on the results, which means that stronger privacy may lead to poorer precision. At the same time, the frequent interaction between user and server also puts considerable pressure on the user side. Mohassel et al. [23] presented a scalable privacy-preserving clustering algorithm and designed a modular approach for multi-party clustering. Although this scheme is computationally efficient, it relies on frequent collaboration and interaction between sender and receiver. Zou et al. [24] proposed a highly secure privacy-preserving outsourced k-means clustering scheme under multiple keys in cloud computing, which applies both AES encryption and BCP encryption to provide privacy preservation against semi-honest adversaries. While the scheme guarantees strong security, it does not provide users with an evaluation metric for clustering. Table 1 summarizes the characteristics of the solutions proposed in the aforementioned studies.

III. PRELIMINARIES
In this section, we introduce the preliminaries of this paper. Our toolkit is mainly based on the privacy-preserving outsourcing computation framework (POCF) [5], which we extend and improve. The following subsections give an overview of our system model, the problem statement, the basic framework of POCF, and its protocols on integers and floating-point numbers.

A. SYSTEM MODEL
In our system, we mainly focus on how the cloud server can fulfill the computing tasks requested by the users while meeting the requirements of privacy protection. As shown in Figure 2, the simple system comprises a Key Generation Center (KGC), a Cloud Platform (CP), a Computation Service Provider (CSP) and Data Owners (DOs), and it can be extended to a distributed system with multiple pairs of CP and CSP. In this paper, we only use the simple model. The participants in this model are as follows.
• KGC: KGC is a trusted third party whose task is to distribute and manage the keys in the system.
• DOs: DOs own the data and are the requesters of the computing tasks. DOs can encrypt their data with the public key (pk). They then outsource the encrypted data to CP for storage and request CP to perform calculations on the outsourced encrypted data. In addition, DOs can use the private key (sk) to decrypt the encrypted result calculated by CP and CSP.
• CP: CP has 'unlimited' data storage space. CP can store and manage the outsourced data of all registered data owners. CP holds the public key (pk) and the partially private key $sk_1$. CP always stores the encrypted results of the computational tasks.
• CSP: CSP assists CP in completing the computing tasks requested by the DOs. CSP holds the public key (pk) and the partially private key $sk_2$. CSP can partially decrypt the ciphertext sent by CP and perform some operations on the decrypted results. Then CSP encrypts the calculated results and returns them to CP.

B. PROBLEM STATEMENT
Consider DOs that hold a large amount of data and a clustering task, but whose limited local resources make it impossible to complete the clustering task. The data therefore needs to be encrypted before being uploaded to CP for storage. DOs can submit the input parameters required for the clustering task to CP. After CP and CSP cooperate to complete the calculation task, the clustering result and its evaluation are returned to DOs. In this process, we face the following challenges:
• Clustering algorithms and evaluation indicators involve division and comparison operations, but many homomorphic encryption schemes support neither homomorphic division nor homomorphic comparison. Therefore, more calculation protocols need to be constructed.
• Traditional K-means outsourcing computation schemes based on homomorphic encryption usually normalize the original data, scale it up, and then encrypt it after rounding. However, plaintext that is simply scaled up is likely to exceed the limit of the plaintext space after a finite number of multiplications.
• From previous experiments, we have learned that the calculation efficiency of floating-point ciphertext is low and its communication overhead is high. However, in the process of clustering and evaluation, we may encounter situations where floating-point numbers must be used. If the whole process is based on floating-point ciphertext, it will take a lot of time to complete the clustering task.

C. ATTACK MODEL
In our attack model, DOs, CP and CSP are honest-but-curious. They strictly follow the protocols, but they are also interested in the uploaded data. We assume that CP and CSP cooperate but do not collude. Meanwhile, we assume that the communication between KGC and all parties is secure.
Here, we introduce the adversary $A^*$ in our model. The purpose of adversary $A^*$ is to use the following capabilities to decrypt the ciphertext of the challenger DO:
1) Adversary $A^*$ may eavesdrop on all communication channels to obtain the transmitted encrypted data.
2) Adversary $A^*$ may compromise CP, thereby guessing the plaintext corresponding to the encrypted data outsourced from DO. In addition, it can perform interactive protocols with CSP to guess the plaintext corresponding to ciphertext sent from CSP.
3) Adversary $A^*$ may compromise CSP. It can then perform interactive protocols with CP to guess the plaintext corresponding to ciphertext sent from CP.
4) Adversary $A^*$ may compromise DOs, with the exception of the challenger DO, to obtain their decryption capabilities. Adversary $A^*$ may then try to guess all the plaintext belonging to the challenger DO through the decryption capabilities of the other DOs.
However, Adversary A * cannot compromise CP and CSP at the same time. Moreover, adversary A * is restricted from compromising challenger DO. The above restrictions are common in adversary models in cryptographic protocols [27].

D. BASIC FRAMEWORK
The toolkit proposed in this paper is based on the Paillier Cryptosystem with Partial Decryption (PCPD) [5]. The basic functions and the key splitting algorithm (KeyS) of Paillier are described in PCPD, so their description is omitted here. In this paper, we use the following notation: $[x]$ represents the ciphertext of integer $x$, and $\langle x \rangle = ([s_x], [m_x], [t_x])$ represents the ciphertext of a floating-point number (FPN) $x$, where $s_x$ represents the sign of $x$, $m_x$ represents its $\eta$ significant digits and $t_x$ represents the exponent of base $\beta$. For the sake of convenience, we use $\eta = 16$ and $\beta = 10$ for protocols on FPNs in this paper. Given $[x]$, $[y]$ and public key $N$, the additive homomorphism of Paillier gives the following properties: $[x] \cdot [y] = [x + y]$, $[x]^{a} = [a \cdot x]$ for an integer $a$, and in particular $[x]^{N-1} = [-x]$.
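The additive properties of Paillier ciphertexts can be checked with a toy implementation. The following is a minimal sketch, assuming the textbook Paillier scheme with generator $g = N + 1$ and tiny, insecure primes; the function names `keygen`, `encrypt` and `decrypt` are illustrative and are not taken from PCPD, and key splitting is not modeled.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation (tiny primes, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """[m] = (1 + n)^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


n, sk = keygen()
n2 = n * n
cx, cy = encrypt(n, 20), encrypt(n, 22)

print(decrypt(n, sk, cx * cy % n2))        # plaintext sum x + y
print(decrypt(n, sk, pow(cx, 3, n2)))      # plaintext scalar multiple 3 * x
print(decrypt(n, sk, pow(cx, n - 1, n2)))  # -x, represented as n - x
```

Negative values appear as residues modulo $N$, so $-20$ is decrypted as $N - 20$.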

E. PREVIOUS WORK ON INTEGERS & FLOATING-POINT NUMBERS
Before presenting our toolkit, we briefly review the previous study [5] of secure outsourcing computational protocols on integers and floating-point numbers respectively. Besides, the construction of floating point number ciphertext is also introduced. Note that our toolkit is an extension based on this previous work. For simplicity, here we define '@' as 'performed by'. For example, '@CP' means 'performed by CP'.

1) Protocols on Integers
Protocols on integers enable the cloud platform CP to cooperate with the computation service provider CSP to implement multiplication, comparison, xor and other operations of integer plaintext on the ciphertext.

A. Revised Secure Multiplication Protocol (RSM):
It enables CP and CSP to cooperatively perform secure multiplication of integer plaintexts on ciphertext, i.e. $[x \cdot y] = \mathrm{RSM}([x], [y])$.
Details are as follows:
• Step-1 (@CP): CP randomly selects two numbers $r_x, r_y \in \mathbb{Z}_N^*$ and computes: After the computation, CP sends $X$, $Y$, $X_1$ and $Y_1$ to CSP.
• Step-2 (@CSP): CSP partially decrypts $X_1$ and $Y_1$ received from CP with its partially private key $sk_2$, namely: Then, CSP encrypts $h$ under the public key $pk$ to obtain the ciphertext $H = [h]$ and sends $H$ to CP.
• Step-3 (@CP): When CP receives $H$, CP first calculates:
Details are as follows:
Then, CP randomly tosses a coin $s \in \{0, 1\}$ and generates a random number $r$ satisfying $r \neq 0$ and $\|r\| \leq \|N\|/4$. If $s = 1$, then CP computes: After that, CP uses its partially private key $sk_1$ to compute $K = PD_{sk_1}([l])$ and sends it to CSP.
• Step-2 (@CSP): CSP decrypts $K$ with its partially private key $sk_2$ to obtain $l$. If $\|l\| \leq \|N\|/2$, CSP sets $u$ to 1; otherwise, it sets $u$ to 0. Next, CSP encrypts $u$ with the public key $pk$ to obtain $[u]$ and sends $[u]$ to CP.
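The blinding idea behind the three RSM steps can be sketched as follows. This is an illustration under stated assumptions, not the protocol as specified in [5]: the blinding formulas are a common additive-blinding construction filled in here because the equations are not reproduced above, the helper names `enc`, `dec` and `rsm` are hypothetical, and the single function `dec` stands in for the two partial decryptions with $sk_1$ and $sk_2$.

```python
import math
import random

P, Q = 1009, 1013  # toy primes, far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def enc(m):
    """Textbook Paillier encryption with generator N + 1."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    """Full decryption; stands in for PD_sk1 followed by PD_sk2."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def rsm(cx, cy):
    # Step-1 (@CP): blind both operands additively.
    rx, ry = random.randrange(1, N), random.randrange(1, N)
    X = cx * enc(rx) % N2  # [x + rx]
    Y = cy * enc(ry) % N2  # [y + ry]
    # Step-2 (@CSP): decrypt the blinded values, multiply, re-encrypt.
    h = dec(X) * dec(Y) % N
    H = enc(h)
    # Step-3 (@CP): remove the cross terms x*ry, y*rx and rx*ry.
    S1 = pow(cx, N - ry, N2)               # [-ry * x]
    S2 = pow(cy, N - rx, N2)               # [-rx * y]
    S3 = pow(enc(rx * ry % N), N - 1, N2)  # [-rx * ry]
    return H * S1 % N2 * S2 % N2 * S3 % N2


print(dec(rsm(enc(6), enc(7))))  # product of the two plaintexts
```

CSP only ever sees $x + r_x$ and $y + r_y$, which is why the random masks must be fresh for every call.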

C. Secure Exclusive OR Calculation Protocol(SXOR):
It securely computes the exclusive OR of two integer plaintexts on ciphertext, i.e. $[f] = \mathrm{SXOR}([x], [y])$ with $f = x \oplus y$.
CP and CSP jointly calculate as follows:

E. Secure Exponent Calculation Protocol(SEXP):
It securely computes an integer plaintext exponentiation on ciphertext, i.e. $[f] = \mathrm{SEXP}(x, [y])$, where $x$ is a public integer, $x > 0$, $y \geq 0$ and $f = x^y$. Details are as follows:
• Step-1 (@CP): CP selects a random number $r \in \mathbb{Z}_N^*$ and computes: Then, CP uses its partially private key $sk_1$ to compute $Y = PD_{sk_1}([y_1])$ and sends $[y_1]$ and $Y$ to CSP.
• Step-2 (@CSP): CSP decrypts $Y$ using its partially private key $sk_2$ to obtain $y_1$. Then, CSP computes $h = x^{y_1} \bmod N$, encrypts it into $[h]$ using the public key $pk$, and sends $[h]$ to CP. Here $h = x^{y+r} \bmod N$.
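The effect of the blinding in SEXP can be followed on plaintext values. The sketch below assumes that $y_1 = y + r$ does not wrap around modulo $N$ and that $x$ is invertible modulo $N$; the unblinding step (multiplying by the inverse of $x^r$) is an assumption about the omitted final step, and `sexp_plain` is a hypothetical name.

```python
import math


def sexp_plain(x, y, r, N):
    """Plaintext walk-through of the exponent blinding used by SEXP."""
    assert math.gcd(x, N) == 1, "x must be invertible modulo N"
    assert 0 <= y and y + r < N, "sketch assumes y + r does not wrap modulo N"
    y1 = y + r                           # value CSP sees after decryption
    h = pow(x, y1, N)                    # @CSP: h = x^(y + r) mod N
    unblind = pow(pow(x, r, N), -1, N)   # @CP: (x^r)^(-1) mod N
    return h * unblind % N               # equals x^y mod N


N = 1009 * 1013
print(sexp_plain(3, 10, 12345, N))
```

On ciphertext, CP would apply the same unblinding factor as an exponent, $[h]^{(x^r)^{-1} \bmod N}$, using the scalar-multiplication property of Paillier.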
Details are as follows: Then, CP computes $Y = PD_{sk_1}([y_1])$ using its partially private key $sk_1$ and sends $[y_1]$ and $Y$ to CSP.
• Step-2 (@CSP): CSP decrypts $Y$ using its partially private key $sk_2$ to obtain $y_1$. Then, CSP computes $h = y_1 \bmod x$, encrypts it as $[h]$ using the public key $pk$, and sends $[h]$ to CP. Here $h = (y + r) \bmod x$.
• Step-3 (@CP): When CP receives $[h]$, it computes $[y \bmod x]$ by the following process:
• Step-4 (@CP and CSP in collaboration): From the above formula, we have $-x \leq U \leq x$. However, $((y + r) \bmod x) - (r \bmod x)$ and $y \bmod x$ are not always equal. Therefore, the following two cases are discussed.
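The two cases for $U = ((y + r) \bmod x) - (r \bmod x)$ can be verified on plaintext. This is a small sketch assuming $y + r$ does not wrap around modulo $N$; the name `mod_via_blinding` is hypothetical.

```python
def mod_via_blinding(y, x, r):
    """Recover y mod x from the blinded remainder (y + r) mod x."""
    h = (y + r) % x        # @CSP: remainder of the blinded value
    U = h - (r % x)        # @CP: remove the remainder of the mask
    # Case U >= 0: no wrap occurred, U is already y mod x.
    # Case U < 0: the sum of remainders exceeded x, so add x back.
    return U if U >= 0 else U + x


print(mod_via_blinding(17, 5, 9))
```

Only the sign of $U$ has to be tested, which is why a single secure comparison with 0 suffices.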
So it is only necessary to compare $U$ with 0: by the following process:
In a computer, a floating-point number is expressed as follows.
Given base $\beta$ and extreme exponent values $e_{min}$, $e_{max}$, there exists at least one triple $(s, m, t)$ such that $x = (-1)^s \cdot m \cdot \beta^t$. To facilitate the storage of floating-point ciphertext and the construction of secure outsourcing computation protocols, it is stipulated that all floating-point numbers in the system use the same number of significant digits $\eta$ and the same base $\beta$, with sign bit $s \in \{0, 1\}$, a significand $m$ that is an $\eta$-digit integer, and an exponent $t$ satisfying $(e_{min} - \eta + 1) \leq t \leq (e_{max} - \eta + 1)$. For example, when $\eta = 5$ and $\beta = 10$ are fixed in the system, $-31$ can be expressed as $(-1)^1 \cdot 31000 \cdot 10^{-3}$, i.e. it is stored as $(1, 31000, -3)$, and $1.024$ can be expressed as $(-1)^0 \cdot 10240 \cdot 10^{-4}$, i.e. it is stored as $(0, 10240, -4)$. These examples show that the triple is uniquely determined for a given finite floating-point number. When the integer triple $(s_x, m_x, t_x)$ representing the floating-point number $x$ is uploaded to the cloud platform, $s_x$, $m_x$ and $t_x$ must be encrypted separately. On the cloud platform, the ciphertext storage form of the floating-point number $x$ is $\langle x \rangle = ([s_x], [m_x], [t_x])$.
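The triple encoding can be reproduced with a few lines of code. The sketch below handles base $\beta = 10$ only, truncates significands longer than $\eta$ digits, and encodes zero as $(0, 0, 0)$; the truncation rule and the zero encoding are assumptions, and `encode_fpn` and `decode_fpn` are hypothetical names.

```python
from decimal import Decimal


def encode_fpn(value, eta=5):
    """Return (s, m, t) with value = (-1)^s * m * 10^t and m having eta digits."""
    sign, digits, exp = Decimal(value).as_tuple()
    m = int("".join(map(str, digits)))
    if m == 0:
        return (0, 0, 0)           # assumed encoding of zero
    shift = eta - len(digits)      # > 0: pad with zeros, < 0: truncate
    if shift >= 0:
        m *= 10 ** shift
    else:
        m //= 10 ** (-shift)
    return (sign, m, exp - shift)


def decode_fpn(s, m, t):
    """Inverse mapping, returned as an exact Decimal."""
    return (-1) ** s * m * Decimal(10) ** t


print(encode_fpn("-31"))    # (1, 31000, -3)
print(encode_fpn("1.024"))  # (0, 10240, -4)
```

Each component of the triple is an integer, so it can be encrypted with the integer cryptosystem without any change.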

3) Protocols on Floating-Point Numbers
Based on the above two parts (i.e. the protocols on integers and the construction of FPN ciphertext), we now introduce the protocols on floating-point numbers [5], which enable CP and CSP to collaboratively implement addition, multiplication, comparison and other operations of floating-point plaintexts on ciphertext.

IV. OUR TOOLKIT
In this section, we will introduce the protocols in our toolkit concretely. To make the relationship and interdependence of the protocols clearer, a call graph is presented in Fig 1. Here, our implemented protocols are colored in blue, while the previous protocols in [5] are colored in grey. Similar to the secure outsourcing computation protocols proposed in POCF [5], our extended protocols can handle outliers(NaN) in the same way. For the sake of simplicity, we only describe the process that excludes these outliers. Below we will describe the details of these secure computation protocols.
Based on the above idea, the overall steps of the SDIV protocol are as follows:
Step-1: First, CP and CSP jointly determine whether $x, y \geq 0$,
Step-3: Through the above steps, CP and CSP can jointly compute

2) Secure Integer Ciphertext Divided by Integer Plaintext Protocol(SCDP)
Given one encrypted number $[x]$ and a public integer $y$ ($y \neq 0$), the goal of the SCDP protocol is to calculate the result $[f]$, s.t. $f = \lfloor x \cdot y^{-1} \rfloor$. The idea of SCDP is as follows: based on the definition of $\lfloor x \cdot y^{-1} \rfloor$, we have $\lfloor x \cdot y^{-1} \rfloor = (x - (x \bmod y)) \cdot y^{-1}$. Since $x - (x \bmod y)$ is divisible by $y$, we can compute the ciphertext of $y^{-1}$ through the Secure Inverse Calculation Protocol (SINV), and then compute $[f]$ through the Revised Secure Multiplication Protocol (RSM).
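The SCDP idea can be checked on plaintext with modular arithmetic. This sketch assumes that SINV yields the inverse of $y$ modulo $N$ (which requires $\gcd(y, N) = 1$) and that $y > 0$; `scdp_plain` is a hypothetical name, and each line only mirrors the protocol named in its comment.

```python
def scdp_plain(x, y, N):
    """Compute floor(x / y) as (x - (x mod y)) * y^(-1) mod N."""
    rem = x % y                    # secure modulus step: x mod y
    y_inv = pow(y, -1, N)          # SINV: inverse of y modulo N (assumed)
    return (x - rem) * y_inv % N   # RSM: exact because y divides x - rem


N = 1009 * 1013
print(scdp_plain(100, 7, N))       # floor(100 / 7)
print(scdp_plain(-100, 7, N) - N)  # floor(-100 / 7), read as a signed residue
```

Because $x - (x \bmod y)$ is an exact multiple of $y$, multiplying by the modular inverse gives the true integer quotient rather than an unrelated residue.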
The overall steps of the SCDP protocol are as follows:
The idea of SPOW is as follows: since $y$ is a positive integer, we can calculate the power within $O(\log_2 y)$ time complexity through the exponentiation by squaring algorithm. Compared with cumulative multiplication, this largely reduces the number of calls to RSM, which improves the computation efficiency and reduces the communication overhead.
The overall process of the SPOW protocol is as follows:
Step-1:(@CP) Let $B = (b_n \cdots b_1)$ be the binary string of $y$, where the bit length of $y$ is $n_y = \lfloor \log_2 y \rfloor + 1$. Step
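Exponentiation by squaring, as referred to above, can be sketched on plaintext while counting the multiplications that would each become one RSM call. The bit order (least significant bit first) and the name `spow_plain` are choices made for this sketch, not details taken from the protocol.

```python
def spow_plain(x, y):
    """Square-and-multiply; returns (x**y, number of multiplications)."""
    result, base, rsm_calls = 1, x, 0
    for bit in reversed(bin(y)[2:]):  # b_1 first, b_n last
        if bit == "1":
            result *= base            # one RSM call on ciphertext
            rsm_calls += 1
        base *= base                  # one RSM call on ciphertext
        rsm_calls += 1
    return result, rsm_calls


value, calls = spow_plain(3, 13)
print(value, calls)
```

For $y = 13$ (binary 1101, $n_y = 4$) the loop performs 4 squarings and 3 multiplications, compared with 12 multiplications for cumulative multiplication.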

5) Secure Integer Logarithm Protocol(SLOG)
Given one encrypted number $[x]$ and a public integer $y$, where $y \geq 2$ and $0 < x < y^{\delta}$, the goal of the SLOG protocol is to calculate the result $[f]$, s.t. $f = \lfloor \log_y x \rfloor$. The overall process of the SLOG protocol is as follows: Then, we have $[\lfloor \log_y x \rfloor] = \mathrm{SLOG}([x], y)$.
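One way to obtain $\lfloor \log_y x \rfloor$ using only comparisons and additions is to count the public powers $y^i$ that do not exceed $x$. The steps of SLOG are not reproduced above, so the following plaintext sketch is an assumption about a possible realization rather than the protocol itself; `slog_plain` is a hypothetical name.

```python
def slog_plain(x, y, delta):
    """floor(log_y x) for 0 < x < y**delta, via delta - 1 comparisons."""
    assert y >= 2 and 0 < x < y ** delta
    f, power = 0, 1
    for _ in range(1, delta):
        power *= y                     # public value y^i
        f += 1 if power <= x else 0    # one secure comparison per power
    return f


print(slog_plain(1000, 10, 5))
```

Since $y$ and $\delta$ are public, the powers can be precomputed in plaintext, and only the comparisons with $[x]$ need the interactive protocols.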

6) Secure Integer Array Maximum(SIAX)
Given an integer ciphertext array
The overall process of the SFPO protocol can be executed by CP alone: Then, we have ⟨−x⟩ = SFPO(⟨x⟩).

2) Secure Floating Point Number Subtraction(SFPU)
Given two encrypted FPNs ⟨x⟩ and ⟨y⟩, the goal of the SFPU protocol is to calculate the result ⟨f⟩, s.t. $f = x - y$. Since subtraction is the inverse of addition, the overall process of the SFPU protocol is as follows:
Then $m_f$ is the maximal $m$ that satisfies the above inequality. Next, we discuss this problem in two cases according to the value of $m$ Then we confirm the value of $t_f$, we have:
Based on the above analysis, the overall steps of the SFPR protocol are as follows:
Step-1: CP and CSP first jointly handle the special cases. We determine whether ⟨x⟩ is equal to ⟨0⟩; if so, the result is ⟨0⟩. This can be achieved by the following calculations: [Z] = FPEQ(⟨x⟩, ⟨0⟩);
Step-2: Then CP and CSP can jointly evaluate the significand of $f$. Initialize
Step-3: Finally, CP and CSP can jointly evaluate the exponent of $f$ by combining the two cases in the analysis. Then we calculate:
Then, CP and CSP can jointly compute $\langle x^{-1} \rangle = \mathrm{SFPR}(\langle x \rangle)$.

C. CIPHERTEXT MUTUAL CONVERSION PROTOCOLS

1) Secure Integer to Floating Point Number(SITF)
Given one encrypted number $[x]$ where $|x| < 10^{2\eta}$, the goal of the SITF protocol is to calculate the result ⟨x⟩. First, we need to determine the sign of $x$, which is easy to achieve. Then the question can be discussed in two cases:
Case I: If $|x| \geq \beta^{\eta}$, then $t_x > 0$. We should keep the first $\eta$ significant digits of $x$ as $m_x$.
Case II: If $|x| < \beta^{\eta}$, then $t_x \leq 0$. We should expand $x$ to $\eta$ digits.
Based on the above idea, the overall steps of the SITF protocol are as follows:
Step-1: First, CP and CSP jointly determine whether $x$ is equal to 0. This can be achieved by the following calculations:
Step-2: Then, CP and CSP jointly determine whether $x$ is greater than 0.
Step-3: CP and CSP jointly determine whether $|x|$ is greater than or equal to $\beta^{\eta}$.
Step-4: Then, CP and CSP jointly solve the first case, $|x| \geq \beta^{\eta}$.
Step-5: After that, CP and CSP jointly solve the second case, $0 < |x| < \beta^{\eta}$.
Step-6: Then CP and CSP jointly calculate:
Step-7: Finally, CP and CSP jointly calculate the final result:
Then, CP and CSP can jointly compute ⟨x⟩ = SITF([x]).
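The two cases of SITF correspond to the following plaintext conversion. This is a reference sketch of the result the protocol should produce, not of the encrypted steps; the zero encoding $(0, 0, 0)$ and the name `int_to_fpn` are assumptions.

```python
def int_to_fpn(x, eta=16, beta=10):
    """Convert an integer to the triple (s, m, t) with an eta-digit significand."""
    if x == 0:
        return (0, 0, 0)               # assumed encoding of zero
    s = 0 if x > 0 else 1
    m, t = abs(x), 0
    assert m < beta ** (2 * eta), "input outside the supported range"
    if m >= beta ** eta:
        # Case I: keep only the first eta significant digits, so t > 0.
        while m >= beta ** eta:
            m //= beta
            t += 1
    else:
        # Case II: expand to eta digits, so t <= 0.
        while m < beta ** (eta - 1):
            m *= beta
            t -= 1
    return (s, m, t)


print(int_to_fpn(-31, eta=5))
print(int_to_fpn(1234567, eta=5))
```

With $\eta = 5$ the first call reproduces the stored form $(1, 31000, -3)$ from the FPN construction, and the second drops the two least significant digits.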

2) Secure Floating Point Number to Integer(SFTI)
Given one encrypted number ⟨x⟩, the goal of the SFTI protocol is to calculate the result $[f]$, s.t. $f = \lfloor x \rfloor$. For a floating-point number $x$, we can easily obtain the sign of $x$ from $s_x$. Then the question can be discussed in three cases.
Case I: if $t_x \geq 0$, then $|x| = m_x \cdot \beta^{t_x}$ is already an integer.
Case II: if $t_x \leq -\eta$, then $|x| < 1$ and its integer part is 0.
Case III: if $-\eta < t_x < 0$, we just keep the first $\eta + t_x$ significant digits as $|x|$.
Based on the above analysis, the overall steps of the SFTI protocol are as follows:
Step-1: First, CP and CSP jointly determine whether $m_x$ is equal to 0. This can be achieved by the following calculations:
Step-2: Then, CP and CSP jointly determine whether $s_x$ is equal to 0. This can be achieved by the following calculations:
Step-3: CP and CSP jointly divide $t_x$ into three conditions. First, we determine whether $t_x$ is greater than or equal to 0. Let Then, we determine whether $t_x$ is less than or equal to $-\eta$. Let
Step-4: Now, CP and CSP jointly solve the first condition, $t_x \geq 0$.
Step-5: Next, CP and CSP jointly solve the second condition, $t_x \leq -\eta$. We have $[x_2] = [0]$.
Step-6: Finally, CP and CSP jointly solve the third condition, $-\eta < t_x < 0$.
Step-7: CP and CSP jointly combine the above cases.
Step-8: CP and CSP jointly calculate the final result.
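The three conditions on $t_x$ map to a short plaintext reference. The sketch computes the integer part of $|x|$ and then applies the sign, as the cases above describe; for negative non-integers this truncates toward zero, which may differ from $\lfloor x \rfloor$ by one, so how the protocol treats that case is left as an open assumption. The name `fpn_to_int` is hypothetical.

```python
def fpn_to_int(s, m, t, eta=16, beta=10):
    """Integer part of the FPN (s, m, t), with the sign applied afterwards."""
    if m == 0:
        return 0
    if t >= 0:
        magnitude = m * beta ** t       # first condition: already an integer
    elif t <= -eta:
        magnitude = 0                   # second condition: |x| < 1
    else:
        magnitude = m // beta ** (-t)   # third condition: keep eta + t digits
    return magnitude if s == 0 else -magnitude


print(fpn_to_int(0, 10240, -4, eta=5))   # 1.024
print(fpn_to_int(1, 31000, -3, eta=5))   # -31
```

Together with the integer-to-FPN conversion, this lets a computation switch to floating-point ciphertext only for the steps that need the extra precision.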

V. SECURE AND EVALUABLE CLUSTERING
In this section, we first introduce Square Euclidean Distance, which will be used as a measure of the distance between samples. Next, we give the implementation of the homomorphic algorithm of K-means. To evaluate the performance of the clustering results on ciphertext, we design and implement the homomorphic silhouette coefficient.

A. SECURE SQUARE EUCLIDEAN DISTANCE(SSED)
Given two arrays of the same length, $x = \{x_0, \cdots, x_{n-1}\}$ and $y = \{y_0, \cdots, y_{n-1}\}$, the square Euclidean distance is $\mathrm{SED}(x, y) = \sum_{i=0}^{n-1} (x_i - y_i)^2$.
The K-means algorithm partitions the samples into $K$ disjoint subsets $S = \{S_1, S_2, ..., S_K\}$ so as to minimize the within-cluster sum of squares. Here, we introduce the K-means algorithm module by module. Then, we introduce the implementation of secure K-means clustering based on our toolkit MPOCT.

1) Secure Sample Partition
For each sample $x_i$ ($0 \leq i < m$) in the dataset, the K-means algorithm calculates the distance $D_{i,k}$ from sample $x_i$ to each cluster center $C_k$, and assigns $x_i$ to the cluster of the nearest center. Let $c_i$ denote the index of the cluster to which sample $x_i$ belongs.
The implementation of the above partition of samples on ciphertext is shown in Algorithm 2.
For each partitioned cluster set $S_k$, the K-means algorithm recalculates its cluster center $C_k = (C_k^0, C_k^1, \cdots, C_k^{n-1})$, where the $j$-th feature of the $k$-th center is the mean of the $j$-th feature over the samples in $S_k$, i.e. $C_k^j = \frac{1}{|S_k|} \sum_{x \in S_k} x^j$. The implementation of the above cluster updating approach on ciphertext is shown in Algorithm 3.

3) Secure K-means
Based on the above two modules, for a given number of iterations $E$, the implementation of the K-means clustering algorithm on ciphertext is shown in Algorithm 4.
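The two modules and the outer loop can be summarized by a plaintext reference implementation. The sketch below uses integer arithmetic with floor division to mimic integer ciphertext; keeping the old center for an empty cluster and breaking ties toward the lowest index are assumptions, and the function names are hypothetical rather than those of Algorithms 2 to 4.

```python
def sed(x, y):
    """Square Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def partition(samples, centers):
    """Assign each sample to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: sed(x, centers[k]))
            for x in samples]


def update(samples, labels, centers):
    """Recompute each center as the (floored) mean of its members."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, c in zip(samples, labels) if c == k]
        if not members:
            new_centers.append(old)   # assumption: empty cluster keeps its center
            continue
        n = len(members)
        new_centers.append(tuple(sum(col) // n for col in zip(*members)))
    return new_centers


def kmeans(samples, centers, E):
    """Run E rounds of partition followed by center update."""
    labels = []
    for _ in range(E):
        labels = partition(samples, centers)
        centers = update(samples, labels, centers)
    return labels, centers


data = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(kmeans(data, [(0, 0), (10, 10)], E=3))
```

On ciphertext, `sed` would be built from RSM calls, the `min` from secure comparisons, and the floored mean from a secure division.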

C. EVALUABLE CLUSTERING ON CIPHERTEXT
Since the previous schemes did not provide an evaluation index on the ciphertext of the clustering result, DOs can only obtain the clustering results returned by the server, and their limited local resources make it impossible to evaluate these results themselves. Here, we choose the silhouette coefficient [28] as the evaluation index of clustering. We describe the silhouette coefficient module by module and give the corresponding implementation on ciphertext based on our toolkit MPOCT.

1) Secure Distance Calculation between Samples
First, the silhouette coefficient needs the distance $d_{i,j}$ between two samples. This is easy to implement on ciphertext, as shown in Algorithm 5.

2) Secure Distance Calculation between Sample and Cluster
Next, the silhouette coefficient needs the distance between each sample $x_i$ and each cluster $S_k$, which is defined as follows: Let $N_k$ be the number of samples contained in cluster $k$, let $E_{i,k} \in \{0, 1\}$ indicate the relationship between sample $x_i$ and cluster $c_k$, and let $T$ be the product of all $N_k$, i.e. $T = \prod_{k=0}^{K-1} N_k$. The implementation of the above process on ciphertext is shown in Algorithm 6.

3) Secure Cohesion Calculation
Then, we come to the calculation of cohesion. For sample x i , its cohesion A i is defined as follows: where S ci is the cluster to which x i belongs and |S ci | is the number of samples in S ci . The implementation of the above formula on ciphertext is shown in Algorithm 7.

4) Secure Separation Calculation
Then we come to the calculation of separation b i . For sample x i , separation b i is defined as follows: The implementation process of the above formula on the ciphertext is shown in Algorithm 8.

5) Secure Silhouette Coefficient
The silhouette coefficient combines cohesion and separation. Based on the above modules, we can calculate the silhouette coefficient of each sample x i and the overall silhouette coefficient sc. The silhouette coefficient can thus be implemented on ciphertext based on our toolkit MPOCT, as shown in Algorithm 9.
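For comparison with the ciphertext modules above, the following Python sketch computes the silhouette coefficient on plaintext using the standard definition: cohesion is the mean distance to the other samples of the same cluster, separation is the smallest mean distance to any other cluster, and the per-sample score is (b − a) / max(a, b). The Euclidean metric, the score of 0 for singleton clusters, and the function names are assumptions. The normalization used in the paper's own formulas may differ.

```python
import math


def distance(x, y):
    """Euclidean distance between two samples (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_distance(x, samples, indices):
    """Mean distance from x to the samples listed in indices."""
    return sum(distance(x, samples[j]) for j in indices) / len(indices)


def silhouette(samples, labels):
    """Return (per-sample scores, overall score) for a labelled clustering."""
    clusters = {}
    for j, c in enumerate(labels):
        clusters.setdefault(c, []).append(j)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, x in enumerate(samples):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            # Assumed convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_distance(x, samples, own)  # cohesion
        b = min(  # separation
            mean_distance(x, samples, members)
            for c, members in clusters.items()
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores, sum(scores) / len(scores)


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(silhouette(data, [0, 0, 1, 1]))
```

Every sample is compared against every other sample, which is the source of the quadratic dependence on the number of samples m.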

VI. SECURITY ANALYSIS
In this section, we first analyze the security of our outsourcing computation protocols. Then, we demonstrate the security of our toolkit MPOCT.

A. THE SECURITY OF PROTOCOLS
Before we explain the security of our protocols, we first give the definition of the semantic security of a public key encryption system that supports partial decryption. As the semantic security of PCPD has been proven in [5], we will use it to demonstrate that our outsourcing computation protocols are secure.
Definition 1 (Semantic Security). Let E = (KeyGen, Enc, Dec) be a public-key encryption scheme supporting partial decryption. We say that E is semantically secure if any polynomial-time adversary A has only a negligible advantage (in the security parameter) in the following experiment between the challenger and the adversary:
1) The challenger runs KeyGen(1 k ) to obtain a public and private key pair (pk, sk) and splits the private key sk into two parts sk 1 and sk 2 . Then the challenger sends the public key pk as well as one part of the secret key, e.g. sk 1 , to the adversary A.
2) The adversary A chooses two equal-length messages m 0 and m 1 and sends them to the challenger.
The adversary's advantage in the above experiment is defined as
Here, we discuss the security model for securely implementing an ideal functionality in the presence of non-colluding semi-honest adversaries. For simplicity, we operate according to the specific scenario of our functionality, which involves three parties: the challenger DO (i.e. D 0 ), CP (i.e. S 1 ) and CSP (i.e. S 2 ). Therefore, we need to construct three simulators Sim = (Sim D0 , Sim S1 , Sim S2 ) against three kinds of adversaries (A D0 , A S1 , A S2 ) that respectively corrupt D 0 , S 1 and S 2 .
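For reference, the usual indistinguishability form of the experiment in Definition 1 can be written as follows. This is a sketch under assumptions: the challenge bit b, the guess b', the challenge ciphertext c*, and the symbol Adv are standard notation and are not taken from the text above.

```latex
\begin{aligned}
&b \xleftarrow{\$} \{0,1\}, \qquad
 c^{*} = \mathrm{Enc}(pk, m_b), \qquad
 b' \leftarrow \mathcal{A}(pk, sk_1, c^{*}),\\
&\mathrm{Adv}_{\mathcal{A}}(k) = \left|\, \Pr[\,b' = b\,] - \tfrac{1}{2} \,\right| .
\end{aligned}
```

Under this formulation, a scheme is semantically secure when the advantage is negligible in k for every polynomial-time adversary, even though the adversary holds one share of the split private key.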
Theorem 2. The SCDP protocol described in Section IV securely calculates the ciphertext of the division result, given the ciphertext of the dividend and the plaintext of the divisor, in the presence of semi-honest (non-colluding) adversaries A = (A D0 , A S1 , A S2 ).
Proof. We now demonstrate how to construct three independent simulators Sim D0 , Sim S1 , Sim S2 . Sim D0 receives x and y as input and simulates A = A D0 as follows: it generates a public integer y and ciphertext [x] = Enc(x) of x. Finally, it returns [x] and y to A D0 and outputs A D0 's entire view.
The view of A D0 consists of the encrypted data, and the views of A D0 in the real and ideal executions are indistinguishable due to the semantic security of PCPD.
Sim S2 is analogous to Sim S1 .
The security proofs of SDIV, SIMF, SPOW and SLOG are similar to that of SCDP under the semi-honest (non-colluding) adversaries A = (A D0 , A S1 , A S2 ). For the encrypted floating point number calculations (including SFPR, SFPD, SFPU, SFPO, SFTI and SITF), the security relies on the basic encrypted integer calculations, whose security has been proven (the proof method is similar to that of SCDP). Due to the semantic security of PCPD, it is secure for all calculations to be performed in the ciphertext domain.

B. THE SECURITY OF MPOCT
Here, we give an analysis to show that our toolkit MPOCT can resist the system attackers defined in Section 2. The specific analysis is as follows:
• If adversary A * eavesdrops on the transmission between DO and CP, then A * can obtain the original encrypted data and the final result. In addition, A * can obtain the encrypted results transmitted between CP and CSP through eavesdropping. However, these data are encrypted during transmission. Based on the semantic security of the PCPD cryptosystem, A * cannot decrypt the ciphertext without knowing the private key of DO. Since the public and private keys in the system are distributed securely to each participant by KGC, our system model is not affected by man-in-the-middle attacks.
• Suppose adversary A * has compromised the CP (or CSP) to obtain the challenge RU's partial private key. As the private key is randomly split by executing the KeyS algorithm of PCPD, A * is unable to recover the private key of the challenger DO to decrypt the ciphertext. In addition, A * cannot obtain useful information even if the CSP is compromised, because our protocols use the known technique of "blinding" the plaintext [29]: given the ciphertext of a message, we use the additive homomorphism of the PCPD cryptosystem to add a random message to it, so that the original plaintext becomes blinded.
• Suppose adversary A * has a private key belonging to another DO (i.e. not the private key of the challenger DO). Since the private keys of different DOs in our system are unrelated, A * is still unable to decrypt the ciphertext of the challenger DO.
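The blinding step mentioned above can be illustrated with a toy additive-homomorphic scheme. The sketch below uses textbook Paillier with tiny hard-coded primes as a stand-in. It is not PCPD, it has no partial decryption, and the key size is far too small for any real use. All function names and parameters are hypothetical.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with g = n + 1 (insecure demo primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a random unit r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(n, c1, c2):
    """Additive homomorphism: multiplying ciphertexts adds plaintexts."""
    return (c1 * c2) % (n * n)


def blind(n, c):
    """Add a uniformly random message r to the hidden plaintext."""
    r = secrets.randbelow(n)
    return add(n, c, encrypt(n, r)), r


def unblind(n, blinded_plaintext, r):
    """Remove the random mask from a decrypted blinded value."""
    return (blinded_plaintext - r) % n


if __name__ == "__main__":
    n, sk = keygen()
    c = encrypt(n, 4242)
    blinded_c, r = blind(n, c)
    seen_by_decryptor = decrypt(n, sk, blinded_c)  # equals (4242 + r) mod n
    print(unblind(n, seen_by_decryptor, r))
```

Because r is uniform modulo n, the value seen by the decrypting party is itself uniform and reveals nothing about the original plaintext. Only the party that chose r can remove the mask.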

VII. EXPERIMENTAL EVALUATION
In this section, we first evaluate the performance of protocols in MPOCT. Then we analyze the performance of SKM and SSC.

A. EXPERIMENTS OF MPOCT
First, we let N be 1024 bits to achieve 80-bit security [30]. The computation cost and communication overhead of the proposed MPOCT are evaluated using a Python program, and the experiments are performed on a single server with a 2.3 GHz one-core processor and 8 GB of RAM. The performance of the protocols in MPOCT for integers and FPNs is shown in Table 2 and Table 3, respectively, where some protocols are evaluated under specific parameters: SPOW (y = 31), SLOG (δ = 20), SDIV (β 0 = 3, η 0 = 32), SITF (η max = 32). The protocols on integer ciphertext are faster and incur lower communication overhead than the protocols on FPNs.
Then, we discuss the factors that affect the performance of these protocols. Figure 3(a)-(h) shows that both the running time and the communication overhead of the protocols increase with N . This is because the running time required for the basic operations (modular multiplication and modular exponentiation) increases as N increases, and more bits need to be transmitted. In addition, the running time of some protocols also depends on specific parameters, as shown in Figure 3(i)-(l). SDIV has the highest computational efficiency when β 0 is equal to 3. The running time of SLOG grows linearly with δ, and that of SITF grows linearly with η max . Since SPOW is based on the idea of binary exponentiation, its running time grows only logarithmically with y.
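The logarithmic dependence of SPOW on y follows from the square-and-multiply pattern. The Python sketch below shows the pattern on plaintext integers and counts the multiplications. In a ciphertext setting each multiplication would be replaced by a secure multiplication protocol. The `mul` hook and the function name are hypothetical and only mark where such a protocol would plug in.

```python
def power_by_squaring(x, y, mul=lambda a, b: a * b, one=1):
    """Compute x**y for an integer y >= 0 and count the multiplications used."""
    result = one
    base = x
    count = 0
    while y > 0:
        if y & 1:
            # Current bit of y is set: fold the current power into the result.
            result = mul(result, base)
            count += 1
        y >>= 1
        if y > 0:
            # Square the base for the next bit of the exponent.
            base = mul(base, base)
            count += 1
    return result, count


if __name__ == "__main__":
    value, used = power_by_squaring(2, 31)
    print(value, used)
```

The multiplication count is at most twice the bit length of y, compared with y − 1 multiplications for the naive repeated product.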

B. EXPERIMENTS OF SKM AND SSC

1) Computational Efficiency Analysis
According to the algorithm process of SKM and SSC, the main factors affecting the computational efficiency are the bit size of the public key |N |, the number of clusters k, the number of samples m and the number of features n. Figure 4(a)(b)(c) shows that the calculation time of SKM is linearly related to k, m and n. Figure 4(d)(e)(f) shows that the calculation time of SSC is linearly related to k and n but non-linearly related to m. Since SSC needs to calculate the distance between every pair of samples, its calculation time is related to m^2 and therefore increases significantly as the number of samples m increases. In addition, we compare SKM with the previous schemes PPODC [7] and PPCOM [9]. We fix the bit size of the public key N in our Paillier cryptosystem to 1024. When m = 10000, n = 10 and k = 4, the running time per iteration in Python is 13360 minutes for PPODC and 1856 minutes for PPCOM, while our solution SKM needs only 577 minutes, which is more than 20 times faster than PPODC and about 3 times faster than PPCOM.
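The scaling argument and the reported speedups can be checked with a few lines of arithmetic. In the Python sketch below, counting one sample-to-center distance per (sample, cluster) pair and one distance per unordered pair of samples is an assumption about how the work is organized. An implementation that evaluates all ordered pairs would perform m^2 distance computations instead.

```python
def skm_distances_per_iteration(m: int, k: int) -> int:
    """Sample-to-center distances in one K-means iteration (linear in m)."""
    return m * k


def ssc_sample_pairs(m: int) -> int:
    """Unordered sample pairs needed by the silhouette coefficient (quadratic in m)."""
    return m * (m - 1) // 2


def speedup(baseline_minutes: float, ours_minutes: float) -> float:
    """Ratio of a baseline running time to the SKM running time."""
    return baseline_minutes / ours_minutes


if __name__ == "__main__":
    m, k = 10_000, 4
    print(skm_distances_per_iteration(m, k))
    print(ssc_sample_pairs(m))
    # Running times per iteration reported in the text (minutes).
    print(round(speedup(13360, 577), 1), round(speedup(1856, 577), 1))
```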

2) Performance Analysis
To demonstrate the effectiveness of SKM and SSC, we conduct extensive experiments over three public datasets, which are described in Table 4. First, we compare the performance of three different K-means algorithms: SKM, K-means on plaintext, and K-means using differential privacy (DP-K-means [22]). As the parameter configuration, the number of iterations E is set to 10 and the privacy budget ϵ of DP-K-means is set to 1. Since the datasets provide label information, we conduct our experiments in two settings: with and without labels. Table 5 shows the classification accuracy of the three methods with labels. K-means on plaintext achieves the highest accuracy. Our scheme SKM ranks second and is competitive with K-means on plaintext. The remaining gap arises because SKM runs for a fixed number of iterations E and thus may not fully converge. Convergence can be ensured by setting a larger number of iterations, but the computational cost increases linearly with E. Therefore, to balance computational efficiency and usability, we choose a small number of iterations E for the experiments. DP-K-means performs relatively poorly due to the noise added to the data.
Since ground truth labels are hard to obtain in a real application scenario, we also use the silhouette coefficient for evaluation. As an effective index for evaluating clustering performance, the silhouette coefficient ranges from −1 to 1. The closer it is to 1, the better the clustering performance; conversely, the closer it is to −1, the worse the clustering performance. Table 6 shows the silhouette coefficient of the three methods. The situation is similar to that with labels. K-means on plaintext achieves the highest silhouette coefficient, and the performance of our scheme SKM is competitive with it. DP-K-means, however, performs relatively poorly for the same reason as above.
Last, we validate the effectiveness of the secure silhouette coefficient. Based on the clustering results obtained by SKM, we compare the silhouette coefficient computed on the decrypted plaintext (SC) with the secure silhouette coefficient computed on ciphertext (SSC). Table 7 shows that the proposed SSC achieves the same usability as the computation on plaintext.

VIII. CONCLUSION AND FUTURE WORK
In this paper, a novel multifunctional and privacy-preserving outsourcing computation toolkit (MPOCT) is proposed to support several homomorphic computing protocols, including division and power on ciphertext of integers and floating-point numbers. Concretely, we first extend several homomorphic operations on the ciphertext of floating-point numbers based on the previous framework POCF. However, due to the low efficiency and high communication overhead on the ciphertext of floating-point numbers, we further extend the secure outsourcing computation protocols to the ciphertext of integers. After that, homomorphic mutual conversion protocols between integer and floating-point ciphertext are proposed to balance the efficiency and feasibility of computation. Next, we implement a homomorphic K-means algorithm based on MPOCT for clustering and design the homomorphic silhouette coefficient as the evaluation index, providing an informative cluster assessment for local users with limited resources. Comprehensive experimental results and security analysis show that the proposed MPOCT achieves efficiency and utility without privacy leakage to unauthorized parties. In the future, MPOCT is expected to be extended with homomorphic neural network modules such as homomorphic convolution and homomorphic pooling. In addition, MPOCT can currently only be applied to algorithms with fixed termination conditions (i.e. a homomorphic while loop cannot be implemented), since the servers cannot learn the comparison results of ciphertext. An improved scheme that solves this problem is to be explored in the future.

JIALIN LI received her bachelor's degree in Software Engineering from Jilin University in 2019. She is currently pursuing a master's degree in Software Engineering at East China Normal University. Her research interests include semantic segmentation, graph data processing and privacy-preserving related topics.
PENGHAO LU received his bachelor's degree in Science from East China University of Science and Technology. He is currently pursuing a master's degree in Software Engineering at East China Normal University, Shanghai, China. His research interests mainly include machine learning and privacy preservation.

XUEMIN LIN is a UNSW distinguished Professor-Scientia Professor, and the head of the database and knowledge research group in the School of Computer Science and Engineering at UNSW. He is a concurrent professor in the School of Software, East China Normal University. He is also a distinguished visiting Professor at Tsinghua University and visiting Chair Professor at Fudan University. He is a fellow of IEEE. Xuemin Lin is working in the areas of scalable processing and mining of large-scale data, including graph, spatial-temporal, streaming, text and uncertain data. Xuemin Lin currently serves as the editor-in-Chief of