Secure Data De-Duplication Based on Threshold Blind Signature and Bloom Filter in Internet of Things

Within the cloud environment, the availability of storage as well as bandwidth can be effectively preserved by virtue of data de-duplication. However, removing redundancy from storage or communication is not trivial due to security concerns. Though intensive research has addressed convergent cryptosystems for secure data de-duplication, the conflicts among functionality, confidentiality, and authority remain unbalanced. More concretely, although data are obfuscated under convergent encryption, a brute-force dictionary attack is still efficacious, since the whole pseudorandom process relies heavily on the plaintext. As for data ownership, the download privilege, which depends on a hash value, may also be infringed for the same reason. To dispose of these problems, we present a conspiracy-free data de-duplication protocol based on a threshold blind signature in this article. With the help of multiple key servers, the outsourced file and de-duplication label are computationally indistinguishable from random strings. We use a Bloom filter to implement a proof of ownership (PoW), ensuring that the ownership claims made by users are genuine; this effectively prevents an attacker from using a stolen tag to obtain the whole file and gain unauthorized access. The most significant innovation of this article is to use homomorphic computation to aggregate and generate partial signature tags, and to introduce a secret sharing mechanism based on the Chinese Remainder Theorem to hide signature keys, thus balancing the security concerns of cloud and client. Compared with existing schemes, both communication and computation performance are preferable in our protocol. As far as we know, our scheme is the only data de-duplication scheme that satisfies semantic security of both ciphertext and label.


I. INTRODUCTION
With the development of cloud computing and big data, more and more enterprises and clients choose to outsource their files to the cloud for convenient storage and management, making the occupancy of cloud disks grow exponentially. According to the ''White Paper on Forecasting and Methods 2015-2020'' released by the Cisco Global Cloud Index [1], the number of users registered on personal cloud storage services will increase from 1.3 billion in 2015 to 2.3 billion in 2020, with global data rising to 40 ZB. Faced with such a massive amount of data, how to leverage the economy and efficiency of cloud resources has become an inevitable challenge for cloud service providers. With the popularization of the Internet of Things, the exposure of private data in the Internet of Things has gradually come to light. There is a large amount of duplicate private data in the Internet of Things; if these duplicate data can be reasonably removed, the risk of privacy leakage can be reduced.

The associate editor coordinating the review of this manuscript and approving it for publication was Weizhi Meng. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

To improve storage availability and reduce management costs, cloud service providers are prone to draw support from de-duplication technology, using randomly sampled strings or hash values as labels to avoid redundantly uploading identical data. According to the phase in which de-duplication is carried out, relevant techniques can be divided into two categories. (1) Server-side de-duplication: users trivially upload their data to the cloud, and the server is responsible for detecting and striking out reduplicated data on its own. Though such an idea preserves the functionality of de-duplication, massive bandwidth is consumed transmitting unnecessary data.
(2) Client-aided de-duplication: Before uploading, the client calculates the hash value of the data for cloud retrieval. Once the hash value is received, the cloud server checks whether the same label exists within its local storage system. If so, the server instructs the client to cancel the actual upload and marks the client as an owner of the corresponding data. Since hash values are much shorter than files, this method can significantly cut down both storage space and transmission overhead.
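As a toy illustration of the client-aided flow, the following Python sketch models the label check and the skipped upload. The `CloudServer` class and its method names are hypothetical constructs for this sketch, not part of any system described in the article.

```python
import hashlib

class CloudServer:
    """Toy server that stores data blobs keyed by their de-duplication labels."""
    def __init__(self):
        self.store = {}    # label -> data
        self.owners = {}   # label -> set of client ids

    def check_duplicate(self, label):
        return label in self.store

    def upload(self, client_id, label, data):
        self.store[label] = data
        self.owners.setdefault(label, set()).add(client_id)

    def mark_owner(self, client_id, label):
        # Duplicate detected: record ownership without receiving the data again.
        self.owners[label].add(client_id)

def client_upload(server, client_id, data):
    """Client-aided de-duplication: send the label first, upload only if new."""
    label = hashlib.sha256(data).hexdigest()
    if server.check_duplicate(label):
        server.mark_owner(client_id, label)   # no data transfer needed
    else:
        server.upload(client_id, label, data)
    return "link:" + label
```

Note that the label here is a plain hash of the data, which is exactly the weakness the article targets: anyone holding the hash can claim the file.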
Concerning the security of client-aided data de-duplication, cloud servers are supposed to be honest but curious, and may take an interest in the private data outsourced by users [2], [3]. Therefore, clients are apt to upload their data in the form of ciphertexts. Nevertheless, since distinct users choose their keys independently, the same file will be encrypted into different ciphertexts, which makes redundancy detection infeasible.
Aiming at the two problems of privacy disclosure and unauthorized access in cloud data de-duplication schemes based on convergent encryption, this article proposes a ciphertext de-duplication scheme combining blind signatures and threshold secret sharing. By introducing a series of auxiliary key servers, not only can the encrypted file be protected from dictionary attacks, but the system can also tolerate partial corruption of cloud nodes. Since the basic primitive we use is the combination of the Chinese Remainder Theorem and the discrete logarithm problem, unconditional security is realized in the proof to solve the problem of privacy disclosure. To ensure the authenticity of the ownership claimed by customers, we use a Bloom filter to realize the proof of ownership, which effectively prevents an attacker from using a stolen label to obtain the complete document and thereby gain unauthorized file access. The simulation results show that the scheme has low computation and communication overhead. The most significant innovation of this article is to use homomorphic computation to aggregate and generate partial signature tags, and to introduce a secret sharing mechanism based on the Chinese Remainder Theorem to hide signature keys, thus balancing the security concerns of cloud and client. After a formal security proof, our scheme is compared with some recent methods [4], [5], showing that fewer costs regarding uploading and de-duplication are required.

II. RELATED WORKS
The concept of convergent encryption (CE) was first proposed by Douceur et al. [6] in 2002, in which the key generation algorithm is deterministic, ensuring that the same data produces the same key (for example, by taking its hash output as the key). Over the next decade or so, many researchers studied CE, as shown in Table 1. Among them, Li's scheme [7] achieved remarkable results. Directing at clear security objectives with a formal description model, Li et al. [7] combined the CE algorithm with a convergent dispersal mechanism and proposed the CDStore scheme to achieve secure data de-duplication. Their experiments show that the proposed scheme saves nearly 70% of the storage cost. It is worth mentioning, however, that neither the ciphertexts nor the de-duplication labels of the schemes discussed above are semantically secure.
As for the authority of file downloading, PoW is commonly used to ensure the user's data ownership, and is mainly established on a Merkle hash tree (MHT), random sampling, or generalized hash functions. In 2011, Halevi et al. [11] proposed the concept of PoW for the first time. The core idea is to compress the file on the client side and build a Merkle tree over it. When authentication is needed, the server launches a challenge, and the client is responsible for returning a series of paths from the root to designated leaves. Only if all of the paths are valid will the server confirm that the user owns the file and grant access to its storage. Many researchers subsequently conducted extensive studies on PoW, as shown in Table 2.
It's worth noting that Blasco et al. [41] then proposed a flexible, scalable, and provably secure data de-duplication scheme based on a Bloom filter. Briefly speaking, the server divides the received file into blocks of the same size, calculates their corresponding tokens through a pseudo-random function (PRF), and inserts the tokens into a Bloom filter, which is stored as a triple data structure. During the challenge phase, the server merely asks the client to upload a certain number of tokens to validate his ownership.
In 2016, Xiong et al. [42] surveyed the development of secure cloud data de-duplication, analyzed and compared different schemes, and pointed out the problems of each.

III. PRELIMINARIES
In this section, we present some building blocks that underpin our final scheme.

A. CONVERGENT ENCRYPTION
For the sake of secure de-duplication in the cloud environment, a convergent key, together with a label, is generally derived from the file to be uploaded. To preserve privacy, the user exploits the convergent key for data encryption. When a file is to be outsourced, the user submits its corresponding label in advance, which can be used to check the presence of a duplicate on the server side.
It is evident that, once a duplicate is found, real uploading is unnecessary, and only a link needs to be sent back to the user. During this process, two tough but significant problems must be addressed. One is that the convergent key and label must be statistically independent, since the server may be interested in deducing the key from the label. The other is how to make the same key available to all data owners for correct decryption.
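The basic convergent-encryption pipeline described above can be sketched as follows. The hash-counter keystream stands in for whatever symmetric cipher a real deployment would use, so this is a minimal illustration of the idea, not the article's exact construction; note how it exhibits exactly the weakness discussed later, since both key and label are deterministic functions of the plaintext.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudorandom bytes from key via SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(m: bytes):
    k = hashlib.sha256(m).digest()        # convergent key: derived from the plaintext
    c = bytes(a ^ b for a, b in zip(m, keystream(k, len(m))))
    label = hashlib.sha256(c).hexdigest() # dedup label: derived from the ciphertext
    return k, c, label

def convergent_decrypt(k: bytes, c: bytes) -> bytes:
    # XOR stream cipher is its own inverse.
    return bytes(a ^ b for a, b in zip(c, keystream(k, len(c))))
```

Because every step is deterministic, any two owners of the same file derive the same key, ciphertext, and label, which is what makes ciphertext-level de-duplication possible in the first place.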

B. DISCRETE LOGARITHM PROBLEM
Based on a cyclic group G of order q with generator g, the computational discrete logarithm problem can be described as follows: for any a ∈ Z*_q, it is difficult to recover a from g^a ∈ G, even if the generator g is given. A related problem is the decisional Diffie-Hellman problem (DDH), which aims to distinguish g^{ab} ∈ G from a randomly sampled group element g^c ∈ G when both g^a ∈ G and g^b ∈ G are given; its computational counterpart (CDH) asks to compute g^{ab} itself. The DDH assumption is stronger than the CDH assumption: hardness of DDH implies hardness of CDH, but not vice versa.
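The asymmetry can be seen even at toy sizes: computing g^a mod p is fast via square-and-multiply (Python's built-in three-argument `pow`), while recovering the exponent from the result is reduced to exhaustive search. The parameters below are illustrative only; real deployments use groups of cryptographic size.

```python
# Toy cyclic group: powers of g modulo a prime p.
p = 2**61 - 1          # a Mersenne prime (toy parameter; real systems use far larger moduli)
g = 5
a = 1234567890123      # the secret exponent
ga = pow(g, a, p)      # easy direction: O(log a) modular multiplications

def brute_force_dlog(target, g, p, limit):
    """Recover a from g^a by exhaustive search -- infeasible at real parameter sizes."""
    x = 1
    for exponent in range(limit):
        if x == target:
            return exponent
        x = (x * g) % p
    return None
```

Exhaustive search succeeds only when the exponent is small; for a 61-bit (let alone 2048-bit) exponent the loop is hopeless, which is precisely the hardness the scheme relies on.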

C. BLIND SIGNATURE
The blind signature was first proposed by Chaum in 1982, requiring that the signer must remain ignorant of any information about what he signs. The reason we draw support from a blind signature for duplication inspection is to conceal the relevance between the file and its hash value, so that brute-force dictionary attacks can be avoided.
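Chaum's original RSA-based construction illustrates the blinding idea: the client multiplies the message hash by a random blinding factor r^e, the signer exponentiates without ever seeing the hash, and the client divides r back out. The tiny parameters below are for illustration only; this is the classic RSA blind signature, not the threshold scheme of this article.

```python
import hashlib
import math
import secrets

# Toy RSA parameters (illustrative only; real deployments use >= 2048-bit moduli).
p, q = 1009, 1013
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: e*d = 1 mod phi(n)

def blind(msg_hash: int):
    """Client: blind the hash with a random r coprime to n."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    return (msg_hash * pow(r, e, n)) % n, r   # signer sees only this blinded value

def sign_blinded(blinded: int) -> int:
    """Signer: raise to d; learns nothing about msg_hash."""
    return pow(blinded, d, n)

def unblind(blinded_sig: int, r: int) -> int:
    """Client: strip the blinding factor, leaving msg_hash^d mod n."""
    return (blinded_sig * pow(r, -1, n)) % n

def verify(sig: int, msg_hash: int) -> bool:
    return pow(sig, e, n) == msg_hash % n
```

Correctness follows from (h·r^e)^d · r^{-1} = h^d · r · r^{-1} = h^d mod n, so the unblinded value is an ordinary RSA signature on h.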

D. CRT-BASED THRESHOLD SECRET SHARING
Considering that some key servers may collude with the cloud to violate users' privacy, we resort to a CRT (Chinese Remainder Theorem) threshold secret sharing scheme to introduce an auxiliary property: signing can be executed only if at least t key servers are available.
For n key servers represented as KS = {KS_1, · · · , KS_n}, the (t, n)-threshold secret sharing scheme is realized as below.

1) SECRET SHARING
The distribution center (DC) selects a prime q that is larger than any secret SK, an integer A, and a sequence d = {d_1, d_2, · · · , d_n} of pairwise coprime integers with d_1 < d_2 < · · · < d_n and nq^2 + nq·d_n < N, where N = ∏_{j=1}^{t} d_j. It then distributes L_i = (SK + Aq) mod d_i as a share of the secret SK to the corresponding key server KS_i.

2) SECRET RESTORATION
Denote KS′ = {KS_1, KS_2, · · · , KS_t} ⊆ KS as a subset of key servers who are willing to reveal the secret SK. For clarity, we denote |KS′| as t in the rest of this article. All shares (L_j, d_j) conserved by these participants comply with the congruence SK + Aq ≡ L_j (mod d_j). Once at least t secret pieces are available, it is trivial to restore the shared value by computing SK + Aq ≡ ∑_{j=1}^{t} (D/d_j) e_j L_j (mod D), and then SK = (SK + Aq) mod q, where D = ∏_{j=1}^{t} d_j and (D/d_j) e_j ≡ 1 (mod d_j). It is worth noting that the secret SK should not be exposed to any participant in our scheme; thus, each server is supposed to sign the message privately with its secret share L_j as a preconditioning step.
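A minimal sketch of this CRT-based (t, n) sharing and restoration follows. The concrete prime, moduli, and masking integer are illustrative assumptions chosen only to satisfy the range condition, not parameters from the article.

```python
from math import prod

def crt(residues, moduli):
    """Chinese Remainder Theorem: solve x ≡ r_i (mod m_i) for pairwise coprime m_i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # (M/m_i)*e_i ≡ 1 (mod m_i)
    return x % M

# Toy (t, n) = (3, 5) parameters (illustrative sizes only).
q = 257                          # public prime, larger than any secret
d = [263, 269, 271, 277, 281]    # pairwise coprime moduli, increasing
t = 3
SK = 123                         # the secret signing key
A = 50                           # masking integer; SK + A*q must stay below prod(d[:t])
y = SK + A * q
assert y < prod(d[:t])           # range condition so any t shares determine y exactly

shares = [(y % di, di) for di in d]   # (L_i, d_i) handed to key server KS_i

# Any t shares restore the secret via CRT, then reduction modulo q.
subset = shares[1:4]
recovered = crt([s[0] for s in subset], [s[1] for s in subset]) % q
```

With fewer than t shares the CRT only pins y down modulo a product smaller than y's range, so many candidates remain consistent, which is the intuition behind the unconditional-security claim of Lemma 2.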

E. BLOOM FILTER
The Bloom filter is employed as a PoW to authenticate a user's ownership of a specific file. Under the random oracle model, the filter is a 2^l-dimensional vector BF ∈ {0, 1}^{2^l} associated with an assembly of hash functions {H_k}. Given a binary string m, the Bloom filter is constructed from a zero vector by setting all of its H_k(m)-th elements to one. When checking whether a client possesses an alleged file, the cloud can request his response for those hash values. According to the structure of BF, if at least one position specified by the user's reply is not set, the authentication is taken as invalid.
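A straightforward sketch of the 2^l-bit Bloom filter with T hash positions per item is given below; SHA-256 with a counter prefix emulates the hash assembly {H_k}, which is an assumption of this sketch rather than the article's instantiation.

```python
import hashlib

class BloomFilter:
    """2^l-bit Bloom filter; each item sets T hash-derived bit positions."""
    def __init__(self, l=16, T=4):
        self.size = 2 ** l
        self.T = T
        self.bits = bytearray(self.size // 8)

    def _positions(self, item: bytes):
        # Emulate T independent hash functions H_k via a counter prefix.
        for k in range(self.T):
            h = hashlib.sha256(k.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: bytes) -> bool:
        # Membership fails as soon as any one of the T positions is unset.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

False negatives never occur, while a false positive requires all T queried positions to be set by other items, a probability that shrinks rapidly with the filter size.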

IV. PROBLEM DESCRIPTION
The main objective of our scheme is to prevent the client from uploading reduplicative files. However, considering that the cloud and key servers may conspire to usurp the user's privacy, and that a malicious attacker may cheat to gain illegal access rights, such a task is not trivial. To alleviate the conflicts between functionality and security, we first define the system and security models below.

A. SYSTEM MODEL
As shown in Figure 1, a group of key servers is introduced besides a cloud server and several clients. The reason behind this is to ensure that, as long as the number of corrupted servers is less than a threshold, nothing about the user's plaintext can be learned from the uploaded information.
The roles of the three kinds of participants are defined as follows.

1) CLIENTS
Clients upload their encrypted files to the cloud. However, if replication is detected, they save the corresponding link and PoW locally without actually uploading.

2) KEY SERVER (K-CSP)
Each key server is provided with part of the master key to cooperatively generate convergent keys for clients. It is noteworthy that some of them may be semi-malicious and in league with the cloud server out of interest in users' privacy.

3) CLOUD SERVER (S-CSP)
It acts as an agency that conserves users' data. Moreover, it is also responsible for saving file labels for retrieval and access control. The cloud is assumed to be semi-malicious.

B. THREAT MODEL
Any probabilistic polynomial-time (PPT) adversary may statically compromise a series of nodes to launch a coordinated attack. To avoid being detected and subsequently prosecuted, the attacker is prone to be semi-malicious, falling into one of the two following categories.

1) COLLUDED SERVERS
It is reasonable to regard the cloud and part of the key servers as honest but curious, meaning that they follow the protocol (except for their random coins) but try to extract information from uploaded data beyond their privileges.

2) UNAUTHORIZED USERS
To access files without the corresponding PoWs, an illegal or revoked client may deceitfully claim file ownership by intercepting the communication channel or exploiting known hash values. We also consider attackers outside the system; this kind of adversary does not have to comply with the established protocol and is deemed entirely malicious.

C. SECURITY OBJECTIVES
Following directly from the threat model above, two security objectives are defined, concerning confidentiality and authenticity.

1) CONFIDENTIALITY
Suppose that the ciphertext of the uploaded file m is c. For at most t − 1 corrupted key servers colluding with the cloud server, any PPT adversary that, on input a message m′, outputs a ciphertext c′ must satisfy Pr[c′ = c | m′ = m] < neg(λ), where neg(·) is a negligible function of the security parameter λ.
This implies that the adversary cannot learn any information about uploaded files if the threshold property is guaranteed.

2) AUTHENTICITY
For any malicious client who is not provided with the file m at the very beginning, the probability of passing the ownership verification is negligible. That is to say, the client is incapable of acquiring the file even if he has obtained its corresponding hash value.

V. THE PROPOSED SCHEME
The main idea behind our scheme is to exploit a series of key servers to cooperatively sign the hash value of the file to generate a convergent key. Thanks to the CRT-based threshold secret sharing described above, such a key cannot be reproduced by a minority of key servers conspiring with the cloud. Meanwhile, since the signature is blindly generated via a semantically secure cryptographic algorithm, no information about the file is exposed to any signer. Considering that the convergent key and duplication label are statistically independent of the hash value, dictionary attacks can thus be averted. Moreover, we also take the threat of ownership deception into account; the solution is to build a PoW on a Bloom filter, which is capable of excluding unauthorized users with overwhelming probability.
For clarity, our scheme is presented in three parts, namely System Initialization, File Uploading, and File Download.

A. SYSTEM INITIALIZATION
As shown in Figure 1, the system is configured as one cloud and n key servers, serving numerous clients who expect to outsource their files without privacy infringement.
To distribute the shares of the master signing key, an auxiliary DC could trivially be employed. However, the distribution center may itself be a weak point where the key could be divulged, so the shares are instead generated cooperatively by the key servers themselves.
Each key server KS_i stochastically chooses a key share SK_i, together with a private integer A_i, satisfying 0 < SK_i < q/n. Then it calculates a_ij = (SK_i + A_i q) mod d_j for j = 1, 2, · · · , n and delivers each a_ij to the corresponding server.
After receiving all a_ij from the other nodes, the share L_j for server KS_j can be directly computed as L_j = ∑_{i=1}^{n} a_ij mod d_j. Denoting SK = ∑_{i=1}^{n} SK_i and A = ∑_{i=1}^{n} A_i, since SK + Aq ≡ L_j (mod d_j), it is obvious that the secret SK is securely shared without being exposed to any key server. At this point, each member takes L_j as its signing key and releases PK_j = g^{((D/d_j) e_j L_j − A_j) mod D} mod q, so as to jointly publish the system public key PK = ∏_{j=1}^{n} PK_j mod q.
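The dealer-free share generation above can be checked numerically: each server's additive contributions a_ij are aggregated per modulus, and the resulting L_j are congruent to SK + Aq even though no party ever sees SK itself. The toy prime and moduli below are illustrative assumptions, not the article's parameters.

```python
import secrets

# Toy parameters (illustrative only).
q = 257                          # public prime
d = [263, 269, 271, 277, 281]    # pairwise coprime CRT moduli, one per key server
n = len(d)

# Each key server KS_i picks its own additive piece SK_i and masking integer A_i.
SK_i = [secrets.randbelow(q // n - 1) + 1 for _ in range(n)]   # 0 < SK_i < q/n
A_i = [secrets.randbelow(100) + 1 for _ in range(n)]

# KS_i sends a_ij = (SK_i + A_i * q) mod d_j to server KS_j.
a = [[(SK_i[i] + A_i[i] * q) % d[j] for j in range(n)] for i in range(n)]

# KS_j aggregates everything it received into its signing share L_j.
L = [sum(a[i][j] for i in range(n)) % d[j] for j in range(n)]

# Sanity check (never computed in the real protocol, where SK stays implicit):
SK = sum(SK_i)
A = sum(A_i)
assert all((SK + A * q) % d[j] == L[j] for j in range(n))
```

The check passes because modular reduction distributes over the sum: ∑_i (SK_i + A_i q) ≡ SK + Aq (mod d_j) for every modulus d_j.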

B. FILE UPLOADING
When a file m is to be uploaded, a convergent key k_c is computed by blindly signing its hash value with the help of the key servers. Then the client transmits the label TF to the cloud for duplication inspection. If such a file already exists, the cloud hands a link back to the user without real uploading. Specifically, the protocol is described as follows.
Step 1) The client permutes and divides the file into n blocks as m = f_p(m) = m_1 ‖ m_2 ‖ · · · ‖ m_n, according to a fixed permutation function f_p(·). Then it calculates H(m_i) for i = 1, 2, · · · , n in terms of a pre-defined hash function H(·) : {0, 1}* → Z*_q and secretly delivers them to the corresponding key servers.
Step 2) Once H(m_i) is received, key server KS_i computes u_i = H(H(m_i) ‖ k_{KS_i}) according to its private but fixed random string k_{KS_i} and broadcasts r_i = g^{u_i} mod q to the client, as well as to a subgroup KS′ containing at least t servers. The servers that will not participate in the following procedure should also send their u_i secretly to the client.
Step 3) The client randomly samples ϕ from Z*_q and broadcasts the blinded value to all servers in KS′, where h = H(m).
Step 4) After all the aforementioned information is collected, each participant KS_j signs h with its private key and returns the partial signature to the client. That is: 1) KS_j calculates r in the same way as formula (5).

2) Then it figures out the partial signature s_j, for D = ∏_{j=1}^{t} d_j and (D/d_j) e_j ≡ 1 (mod d_j).
3) Finally, it delivers a triad (r, D, s_j) as a reply to the user.
Step 5) The client halts if the inputs regarding r or D are not consistent. Otherwise, it combines the partial signatures and de-blinds the result, for U = ∑_{i=1}^{n} u_i and r = g^U mod q.
Step 6) Concerning the requirement of de-duplication, we devise the replication inspection label as TF = (tf_maj, tf_sub). Based on TF, a secure de-duplication sub-protocol can be carried out as follows.
1) Without loss of generality, assume TF = (tf_maj, tf_sub) stands for the label submitted to check duplication. Before real uploading, the client transmits TF to the cloud, which inspects whether tf_maj / tf′_maj is congruent with PK^{tf_sub − tf′_sub} over the group G for any existing label TF′ = (tf′_maj, tf′_sub). If such an equivalence is found, the file need not be uploaded again, since the cloud already holds a copy of it.
2) Depending on whether or not replication is detected, our protocol will proceed in two different ways.
A. As for a fresh file, the client takes the following actions.
(1) Divides the file into K blocks, marked as B_1, B_2, · · · , B_K respectively, and computes all p_k = f_PRF(H(B_k) ‖ k) in terms of a pseudo-random function f_PRF(·) : {0, 1}^{⌈log_2 q⌉ + ⌈log_2 K⌉} → {0, 1}^l. Then it sets all of the p_k-th bits in the Bloom filter BF to 1.
(2) Exploits the pre-defined symmetric cryptosystem to compute c = E_{k_c = H(s)}(m), where s = s′/ϕ mod q.
(3) Conserves all p_k corresponding to the K blocks and uploads the index A = (TF, BF) as well as the ciphertext c to the cloud.
B. However, if duplication is found in the cloud, the client has to prove his ownership of the file and acquire the symmetric key k_c.
(1) The cloud randomly selects K′ blocks and sends their indexes k′ to the client. Once the corresponding hash values h_k′ = H(B_k′) are figured out, the client calculates p_k′ = f_PRF(h_k′ ‖ k′) and sends them back to the cloud. For all K′ received values, the cloud checks whether all of the p_k′-th bits in BF are 1 and terminates once an inconsistency comes to light. It is worth noting that the client should calculate and preserve all p_k for the K blocks for further retrieval. (2) If the ownership is validated, the cloud simply transmits (tf_sub − tf′_sub) mod q to t key servers. On receiving it, each key server KS_j computes ξ_j = ((tf_sub − tf′_sub) mod q)·v_j and privately sends it to the client. By combining these ξ_j, the client can correctly obtain the convergent key k_c.
(3) The cloud sends the file link to the client.
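The challenge-response of Step 6) B (1) can be sketched as follows. For brevity the cloud here compares raw tokens instead of Bloom-filter bits, and HMAC-SHA256 stands in for the PRF f_PRF; both substitutions, and the block size, are assumptions of this sketch.

```python
import hashlib
import hmac
import secrets

BLOCK = 1024  # toy block size

def block_tokens(data: bytes, key: bytes):
    """p_k = PRF(H(B_k) || k): one token per file block (PRF emulated by HMAC)."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return [hmac.new(key, hashlib.sha256(b).digest() + k.to_bytes(4, "big"),
                     hashlib.sha256).hexdigest()
            for k, b in enumerate(blocks)]

def pow_challenge(num_blocks: int, sample: int):
    """Cloud side: pick random block indexes to challenge."""
    return secrets.SystemRandom().sample(range(num_blocks), sample)

def pow_respond(data: bytes, key: bytes, indexes):
    """Client side: recompute the tokens for the challenged blocks."""
    tokens = block_tokens(data, key)
    return [tokens[k] for k in indexes]

def pow_verify(stored_tokens, indexes, response):
    """Cloud side: accept only if every answered token matches the stored one."""
    return all(stored_tokens[k] == r for k, r in zip(indexes, response))
```

A client who holds only the file's hash value cannot derive the per-block tokens, so each challenged index is an independent chance to catch the impostor.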

C. FILE DOWNLOAD
Under the circumstances of file retrieval, the cloud must authenticate the client's PoW. That is to say, once a user launches a retrieval request, the cloud should verify his ownership against the Bloom filter of the file. The process is similar to Step 6) B (1), and the client is then allowed to download the ciphertext, which can be decrypted with k_c.

D. CORRECTNESS
We now prove that any replication can be correctly detected and that the convergent key obtained by the client is accurate.
Lemma 1: Based on the de-duplication labels TF = (tf_maj, tf_sub), the cloud is able to determine the equality of the corresponding data.
Proof: After system initialization, each key server KS_j ∈ KS′ is provided with a share L_j related to the secret key SK. Respectively, they sign the deformed hash of a message as s_j = u_j h + r·v_j, in accordance with equations (4), (5) and (6). By collecting t shares of the signature, the client combines them together with ∑_{KS_i ∈ KS\KS′} u_i h and obtains s = Uh + r·SK mod q, in terms of the secret restoration of Section III-D and nq^2 + nq·d_n < N. Since the client is aware of r = g^U mod q and ϕ, he can then compute the label as tf_maj = g^{Uh + SK(g^{Uϕ}/ϕ)} and tf_sub = g^{Uϕ}/ϕ.
Denote TF and TF′ as two labels whose corresponding files are m and m′. If the two files are duplicates, we have U = U′ and h = h′, since h = H(m) and U = ∑_{i=1}^{n} H(H(m_i) ‖ k_{KS_i}), where m_i is deterministically extracted from m and k_{KS_i} is fixed for each key server. It follows that tf_maj / tf′_maj = PK^{tf_sub − tf′_sub} mod q, because tf_sub = g^{Uϕ}/ϕ mod q. Once a duplication is found, the client no longer needs to upload his file. However, when he retrieves the ciphertext of his data, it should be guaranteed that he can decode it. Therefore, we claim the correctness of retrieval in Theorem 2.
Theorem 2: For any valid user, he is capable of downloading and deciphering his file.
Proof: As for the PoW of a file, any legal client must be aware of all p_k = f_PRF(H(B_k) ‖ k) corresponding to file blocks B_1, B_2, · · · , B_K. Therefore, he is able to respond to every challenge launched for ownership authentication, and thus to download the ciphertext of his file.
In order to decrypt the ciphertext, valid clients must be provided with the convergent key k_c. It is obvious that the client who uploaded the file for the first time is conscious of k_c = H(Uh + SK·r/ϕ mod q). As for other clients, they receive ξ_j = ((tf_sub − tf′_sub) mod q)·v_j from t key servers and know s′ = ϕUh + SK·g^{Uϕ} mod q. Thus, they can compute the key in light of Lemma 1 and q + nq·d_n < nq^2 + nq·d_n < N.

VI. SECURITY ANALYSIS
In light of the aforementioned security objectives, we prove that semi-malicious servers are unable to violate users' privacy and that fraudulent clients cannot cheat their way into unauthorized files. To certify the security of our scheme, we first state an impossibility.
Impossibility: Disguised as a valid user who is provided with a file m′, the cloud can tell whether it coincides with any outsourced file m, implying that the ciphertexts can no longer be fully indistinguishable.
The proof of this impossibility is straightforward and follows from the functionality of de-duplication. That is to say, any valid user is entitled to compute a file label TF, which the cloud can use to detect whether any uploaded file corresponds to it. Owing to this conflict between de-duplication and user privacy, we propose a weaker definition of ciphertext indistinguishability, as below. Furthermore, our scheme also achieves a security level such that if fewer than t key servers collude with the cloud, the ciphertext is even indistinguishable under the chosen-plaintext attack, where the security of CRT-based secret sharing is exploited.
Lemma 2: The private key SK is unconditionally secure if fewer than t key servers collude.
Proof: Suppose only t′ < t key servers participate in the process of key recovery; they compute D′ = ∏_{j=1}^{t′} d_j. Since gcd(D′, d) = 1 and q < N/D′, α can take any integer value ranging from 0 to at least q, meaning that the probability of revealing the exact Z = ∑_{i=1}^{n} SK_i + ∑_{i=1}^{n} A_i q is no better than 1/q after reduction modulo D′. That is to say, they are incapable of acquiring SK = ∑_{i=1}^{n} SK_i except by coin guessing. Based on the definition of indistinguishability under chosen hash attack, we first consider the situation when only the hash value of the plaintext is available.
As for users' privacy, we claim that the cloud is unable to relate a hash value h to any outsourced ciphertext c, even if it impersonates a valid user.
Lemma 3: On hash value h, the probabilistic polynomialtime cloud cannot determine whether it corresponds to an uploaded ciphertext c under chosen hash attack.
Proof: In order to decide whether a copy c of file m, where H(m) = h, already exists on the disk, the cloud has to compute TF = (tf_maj, tf_sub) for duplication inspection. However, since it knows nothing about m but its hash value h, the key servers cannot help it compute the correct value of U = ∑_{i=1}^{n} H(H(m_i) ‖ k_{KS_i}). Denoting U′ as the specious value it derives by coin tossing, the de-duplication label should be rewritten as tf′_maj = g^{U′h + SK(g^{U′ϕ}/ϕ)} and tf′_sub = g^{U′ϕ}/ϕ. Even if a ciphertext really is encrypted as c = E_{k_c}(m) for m′ = m, and thus h′ = h = H(m), the value tf_maj/tf′_maj = g^{h(U − U′) + SK(g^{Uϕ}/ϕ − g^{U′ϕ}/ϕ)} mod q has only a 1/q chance of being equal to PK^{tf_sub − tf′_sub} = g^{SK(g^{Uϕ}/ϕ − g^{U′ϕ}/ϕ)} mod q. In other words, the cloud cannot carry out de-duplication if the file itself is unknown. Note that, since the convergent key is computed as k_c = H(U′h + SK·g^{U′ϕ}/ϕ mod q) according to the label, it is also impossible for the cloud to decrypt the file even if such a duplication is detected.
Without the file beforehand, the following conclusion can also be reached concerning a malicious client.
Lemma 4: A user is incapable of using a hash value h to obtain unauthorized download permission by deception.
Proof: Assume that the user is aware of the link corresponding to h. Before downloading, he has to show the PoW of the file. However, since he is ignorant of the values p_k = f_PRF(H(B_k) ‖ k) corresponding to the file blocks, the probability that he passes the challenged verifications of the Bloom filter check is only (1 − (1 − 1/(l − 1))^{KT})^T, which is negligible, where T represents the number of independent hash functions in BF.
Lemma 5: On hash value h, a malicious user is unable to decrypt its corresponding ciphertext c even if he conspires with the cloud.
Proof: The ciphertext c is encrypted under k_c = H(Uh + SK·g^{Uϕ}/ϕ), which was computed by the original uploader. Without knowing the file m, they can only guess U′, as in Lemma 3. After computing s′ = U′h + SK·g^{U′ϕ} mod q in terms of Step 6) B (2), they obtain k′_c = H(U′h + SK·g^{U′ϕ}/ϕ mod q), which equals k_c with only a negligible probability of 1/q. Beyond the authenticity objective, an attractive security property can also be achieved, as below.
Theorem 6: Taking advantage of a known hash value h, a malicious user is not able to prevent the real file m from being uploaded.
Proof: Without knowledge of m, the adversary is only capable of forging a label TF′ = (tf′_maj, tf′_sub), as in the proof of Lemma 3. Therefore, even if he outsources a counterfeit of the ciphertext, the valid label TF = (tf_maj, tf_sub) uploaded by a subsequent user will not coincide with TF′, and thus a real copy can be outsourced independently.
Then we consider the circumstance in which the file is known to the cloud, and examine the user's privacy according to Lemma 2.
Theorem 7: Colluding with fewer than t key servers, the cloud is unable to determine which ciphertext corresponds to a given file m.
Proof: Assume that m′ = m and c = E_{k_c}(m); the cloud is capable of launching a request in terms of the correct file. However, since only t′ < t key servers will cooperate with it, the de-duplication label it obtains would be TF′ = (tf′_maj, tf′_sub), where tf′_maj = g^{Uh + SK′(g^{Uϕ′}/ϕ′)} and tf′_sub = g^{Uϕ′}/ϕ′. According to Lemma 2, there is at most a 1/q probability that SK′ = ∑_i SK_i, even if U is accurately generated. Therefore, after the cloud computes tf_maj/tf′_maj = g^{SK·g^{Uϕ}/ϕ − SK′·g^{Uϕ′}/ϕ′} mod q, PK^{tf_sub − tf′_sub} = g^{SK(g^{Uϕ}/ϕ − g^{Uϕ′}/ϕ′)} mod q will be congruent with it only negligibly. Since the convergent key is derived from the label, this also means that the cloud is unable to decrypt the corresponding ciphertext.
To sum up, the user's privacy is protected because the cloud is unable to inspect any duplication, let alone decrypt the ciphertexts, if the file is not available, or if fewer than t key servers collude with it even when the file is known. As for authenticity, a malicious user is incapable of downloading or decrypting any unauthorized file, or even of preventing the file from being uploaded, if he is not provided with it beforehand.

VII. PERFORMANCE ANALYSIS
In this section, to demonstrate the efficiency of the scheme, we choose two schemes [4], [5] based on blind signatures for comparison. We also compare our scheme with several other schemes that are not based on blind signatures. In our performance evaluation, for convenience of discussion, some notation is introduced: we denote by Exp a modular exponentiation, by Hash a convergent-encryption hash, by n the total number of K-CSPs, and by t the number of K-CSPs participating in convergent key creation. Please note that we omit the general file transfer and file encryption/decryption modules for simplification. Besides, the file downloading process involves only symmetric encryption, which is very efficient and does not influence the efficiency of the system, so we omit it from our discussion.
Our experimental environment was a computer with an Intel(R) Core(TM) i5-3470 CPU running at 3.2 GHz and 8 GB of memory, running Linux. Here we set the reliability level n − t = 2 and vary the number of uploaded files from 10 to 70.

A. SYSTEM SETUP COST
In the establishment phase of the system, the primary computation cost is dominated by modular exponentiation operations, which are used to generate the public/private pair for each K-CSP and the system public/private pair. In Miao's [4] scheme, each K-CSP needs n + 2 Exp operations to publish its public and private keys by interacting with the other K-CSPs. In the DupLESS [5] scheme, because a single key server is used, we omit this comparison. In our scheme, each K-CSP requires only three Exp operations to publish its public and private keys by interacting with the other K-CSPs. In the experiment, we set the number of K-CSPs from 5 to 10, as shown in Figure 2.

B. FILE UPLOAD COST
In the file upload phase, the user generates the convergent key by interacting with the K-CSPs; the time cost includes all signatures and labels. We denote by m the number of uploaded files, ranging from 10 to 70. Please note that we only consider the situation of the first upload. When the number of key servers is n = 10: in Miao's [4] scheme, the total computation consists of 2t + 1 Exp and 2m Hash operations; in the DupLESS [5] scheme, the total computation consists of 2 Exp and 3m Hash operations; in our scheme, the total computation consists of t + 6 Exp and m Hash operations, as shown in Figure 3.

C. COMPARED TO OTHER SCHEMES
Then, we compare the performance of our solution with other major techniques. For the necessity of a third party, the de-duplication level, the necessity of participation, the necessity of key fusion, and other features, see Table 3 for comparison. Next, we compare the computational overhead of PoW on the client, third-party, and cloud sides, respectively; the results are shown in Table 4. In order to achieve security and resist brute-force dictionary attacks, this scheme has no obvious advantage in time cost over schemes without blind signature technology, but its security is guaranteed. Compared with other schemes using blind signature technology, this scheme is more efficient and more secure.

VIII. CONCLUSION
Aiming at brute-force dictionary attacks, we proposed a conspiracy-free data de-duplication protocol based on a threshold blind signature in this article. We argued that the schemes [4], [5] are defective in both security and efficiency. We therefore used CRT secret sharing, blind signatures, and a Bloom filter to construct our scheme; the experimental results show that the computation cost of system establishment and file upload is relatively small. We also used homomorphic computation to aggregate and generate partial signature tags, and introduced a secret sharing mechanism based on CRT to hide signature keys, thus balancing the security concerns of cloud and client.