Certificateless Provable Data Possession Protocol for the Multiple Copies and Clouds Case

For extremely important user data, storing multiple copies on one or more clouds may be a good option: even if the integrity of one or more copies is broken, the data can still be recovered from the intact ones, which increases the availability and durability of the outsourced data. Several provable data possession (PDP) protocols guaranteeing the integrity of multiple copies have been proposed in recent years. However, almost all of them consider storing the copies on a single cloud, and their reliance on certificate management and PKI greatly decreases their efficiency. Therefore, in recent work, Li et al. proposed an identity-based PDP protocol that not only avoids tedious certificate management and PKI, but also supports multiple copies stored on multiple clouds. However, it is well known that identity-based protocols suffer from the key-escrow attack. In this paper, we consider a certificateless multi-copy multi-cloud protocol. Specifically, we first present its security model and then construct a concrete protocol whose security can be proven under the classical CDH assumption. Finally, the performance analysis demonstrates that our protocol yields better efficiency and is hence practical.


I. INTRODUCTION
Outsourcing data to a remote cloud service provider (CSP) instead of keeping it on private computers allows individuals and organizations to save local storage space and concentrate on innovation or other aspects of their work, relieving them of the burden of constant server updates and other computing issues. Meanwhile, authorized users can access the remotely stored data from different geographic locations, which is convenient for them. Hence, cloud storage has become a popular trend for individuals and organizations in recent years, and many well-known corporations, such as Google, Microsoft, Amazon, Huawei and IBM, provide this service to people worldwide [4].
After uploading their data to a CSP, data owners (DOs) usually delete it from their local computers to save space, and thus lose direct control over their own data. If the remote CSP is not trustworthy, the user's data is at risk. For example, the CSP may secretly delete rarely accessed user data to save its own space and hence attract more users to store data on it. How to guarantee the integrity of users' data is therefore an important problem, and in recent years many provable data possession (PDP) schemes have been designed [2], [11], [15], [18]-[20], [25].
The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.
Generally speaking, in a PDP scheme, the verifier sends a challenge message to the CSP, which must return an integrity proof computed from the DO's stored data and this challenge message. If the proof passes the verifier's check, the DO believes that its data is intact; otherwise, the data is considered broken and the DO can claim compensation from the CSP. A resource-constrained DO may prefer to delegate the verification task to a third-party auditor (TPA), who has more professional knowledge and more powerful resources. Hence, TPA-based PDP schemes are a useful tool for helping DOs check the integrity of their data in cloud storage.
However, once the CSP loses a DO's data, the DO may lose it forever, even though it learns of the loss through the TPA's verification. If the data is extremely important, such as bank account or daily transaction records, the DO would rather recover its data than receive compensation from the CSP. To improve the availability and durability of its data, a good choice for the DO may be to generate multiple copies and store all of them on the CSP.
In this situation, when one copy is tampered with, the DO can still recover its data from the other copies. To further decrease the risk, the DO can also store its copies on different CSPs, so that it can still recover its data from other CSPs even if all the copies on some CSP are lost or tampered with. Therefore, traditional PDP schemes should be adapted or extended so that the integrity of the DO's data can also be checked in this case. We call such a PDP scheme a multi-copy multi-cloud (MCMC) PDP protocol.
When defining or designing an MCMC-PDP protocol, we must note the following facts. (a) The copies of the data should differ from each other. Otherwise, if two copies on one CSP were identical, the CSP could store only one of them and delete the other, while claiming that both copies are stored intact. In this sense, we somewhat abuse the word ''copy'', since copies are in general identical. (b) All copies should be checked simultaneously through a single challenge-response interaction; otherwise, there is little point in defining and designing an MCMC-PDP protocol. (c) From the security point of view, the TPA should not gain any useful information about the DO's data beyond honestly conducting the verification.
In recent work, Li et al. proposed an efficient MCMC-PDP protocol that satisfies all of these requirements [12]. As they noted, their protocol is also identity-based (IB), which avoids tedious certificate management and dependence on a public key infrastructure (PKI); traditional PDP schemes with public verifiability face the problem of binding a user's true identity to his/her public key. Their IB-MCMC-PDP protocol introduces an additional entity, the key-generation center (KGC), which is responsible for issuing keys to different users (or DOs).
It is also well known that, although identity-based primitives avoid certificate management and the use of PKI, they suffer from the key-escrow attack: the KGC knows all the private keys of its users and hence can do everything, such as signing or decrypting, on behalf of any user [9], [13], [19]. Therefore, it is important and urgent to design an MCMC-PDP protocol that not only avoids certificates and PKI but also does not suffer from the key-escrow problem.

A. OUR CONTRIBUTIONS
In this paper, we apply the certificateless technique [3] to the MCMC-PDP protocol and thus propose a CL-MCMC-PDP protocol. More precisely, our contributions include the following aspects.
• For the first time, we define the security model for the CL-MCMC-PDP protocol. In our security model, adversaries are divided into three types, which respectively model a malicious KGC, a general adversary allowed to replace users' public keys, and a dishonest CSP.
• We propose a concrete CL-MCMC-PDP protocol and, in the random oracle model, prove its security based on the computational Diffie-Hellman (CDH) assumption.
• We analyze the performance of our proposed protocol in terms of communication and computational costs. By comparing it with Li et al.'s IB-MCMC-PDP protocol and another traditional multi-copy PDP protocol [26], we show that our protocol is efficient and hence practical.

B. RELATED WORKS
In 2007, Ateniese et al. presented the notion of PDP, which enables DOs to check the integrity of their data stored on an untrusted CSP without downloading the whole file, and realized it with the so-called ''spot-checking'' technique [2]. This scheme is also the first one supporting public auditing. In the same period, Juels et al. considered a stronger model, proof of retrievability (PoR), and proposed a concrete implementation [11]. In the following year, Shacham and Waters improved Juels et al.'s model and designed a new compact scheme that can be proven secure under the CDH assumption [15]. To achieve better efficiency and additional properties, several further PDP protocols have been proposed, such as [8], [17], [21], [22]. These protocols use traditional public key cryptography and hence require a certificate, issued by a trusted certificate authority, to bind a DO's identity to its public key. The main problem in this case is that, as the number of users grows, certificate management becomes extremely difficult. Therefore, certificate-based PDP protocols become very inefficient in practical scenarios.
In fact, as early as 1984, Shamir proposed the notion of identity-based cryptography to resolve this problem [16]. Hence, many researchers applied identity-based cryptographic techniques to PDP protocols, and many IB-PDP schemes have been proposed [18], [23], [24], [27].
However, researchers later found that IB-PDP protocols are vulnerable to the key-escrow attack, since the KGC is too powerful. To help PDP protocols defend against this attack, many certificateless PDP protocols [9], [10], [13], [19] have been proposed, which combine certificateless cryptography [3] with PDP schemes.

C. ORGANIZATION
The remainder of this paper is organized as follows. In Section 2, we introduce some basic notations and notions. In Section 3, we present our proposed CL-MCMC-PDP protocol and its security analysis. In Section 4, we discuss other properties of this protocol. Performance analysis can be found in Section 5. Finally, we conclude this paper in Section 6.

II. PRELIMINARIES
Basic Notations: In this paper, λ always denotes the security parameter and $1^\lambda$ its unary form. For a prime $q$, $\mathbb{Z}_q$ denotes the finite field $\{0, 1, \ldots, q-1\}$ and $\mathbb{Z}_q^* = \mathbb{Z}_q \setminus \{0\}$; for a positive integer $k$, $[k]$ denotes the index set $\{1, 2, \ldots, k\}$. $|\mathbb{Z}_q^*|$ denotes the bit-length of an element of $\mathbb{Z}_q^*$. PPT abbreviates ''probabilistic polynomial time''. A function $f(\lambda)$ is negligible if for every $c > 0$ there exists some $\lambda_0 \in \mathbb{Z}$ such that $f(\lambda) < \lambda^{-c}$ for all $\lambda > \lambda_0$. Other symbols used in this paper and their definitions are listed in Table 1.

A. BILINEAR MAP
Assume that $G_1$ and $G_2$ are two multiplicative cyclic groups of the same prime order $q$. A map $e: G_1 \times G_1 \to G_2$ is called bilinear if it satisfies the following properties.
• Non-Degeneracy. For any generator $P \in G_1$, it holds that $e(P, P) \neq 1_{G_2}$.
• Computability. The map $e$ can be computed efficiently.
• Bilinearity. For any $P, Q \in G_1$ and $a, b \in \mathbb{Z}_q$, it holds that $e(P^a, Q^b) = e(P, Q)^{ab}$.
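To make the three properties concrete, the following Python sketch uses a deliberately insecure toy pairing, which is an assumption for illustration only: an element $P^x$ of $G_1$ is stored as its exponent $x \bmod q$, and $e(P^x, P^y)$ is computed as $g^{xy}$ in the order-$q$ subgroup of $\mathbb{Z}_p^*$ with $p = 2q + 1$. Because discrete logarithms are exposed, it only checks the algebra; a real instantiation would use a pairing over a pairing-friendly elliptic curve.

```python
import secrets

# Toy parameters (INSECURE, illustration only): q = 1019 and p = 2q + 1 = 2039
# are both prime, and g = 4 (a nonidentity square) generates the order-q subgroup of Z_p^*.
q, p, g = 1019, 2039, 4

# Hypothetical encoding: the G1 element P^x is stored as x mod q, so P itself is 1
# and the group operation P^x * P^y corresponds to (x + y) mod q.
P = 1


def g1_pow(a: int, k: int) -> int:
    """Exponentiation in G1: (P^a)^k = P^(a*k)."""
    return (a * k) % q


def pair(a: int, b: int) -> int:
    """Toy bilinear map e(P^a, P^b) = g^(a*b) mod p."""
    return pow(g, (a * b) % q, p)


# Non-degeneracy: e(P, P) is not the identity 1 of G2.
assert pair(P, P) != 1

# Bilinearity: e(X^a, Y^b) == e(X, Y)^(ab) for random X, Y in G1 and a, b in Z_q.
X, Y = secrets.randbelow(q), secrets.randbelow(q)
a, b = secrets.randbelow(q), secrets.randbelow(q)
assert pair(g1_pow(X, a), g1_pow(Y, b)) == pow(pair(X, Y), a * b, p)
```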

B. CDH ASSUMPTION
For a group $G$ of prime order $q$ with generator $P$, the CDH assumption states that, for any PPT adversary $\mathcal{A}$, it is computationally infeasible to output the group element $P^{ab}$ when given the tuple $(P, P^a, P^b)$. In other words, the probability that $\mathcal{A}$ outputs $P^{ab}$ is negligible, where the probability is taken over the random choice of $a, b$ and the internal coins of $\mathcal{A}$.
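A consequence worth keeping in mind for the security proofs is that, in a group equipped with a bilinear map, a candidate answer $Z$ to a CDH instance can be checked through $e(Z, P) = e(P^a, P^b)$, even though computing $P^{ab}$ is assumed to be hard. The sketch below reuses the insecure toy pairing as an assumption; in that encoding computing $P^{ab}$ is trivial, so only the check itself is meaningful.

```python
import secrets

# Same INSECURE toy pairing: the G1 element P^x is stored as x mod q, and P = 1.
q, p, g = 1019, 2039, 4
P = 1


def pair(x: int, y: int) -> int:
    """Toy bilinear map e(P^x, P^y) = g^(x*y) mod p."""
    return pow(g, (x * y) % q, p)


def is_cdh_solution(A: int, B: int, Z: int) -> bool:
    """Accept Z as P^(ab) for A = P^a, B = P^b iff e(Z, P) == e(A, B)."""
    return pair(Z, P) == pair(A, B)


a = 1 + secrets.randbelow(q - 1)
b = 1 + secrets.randbelow(q - 1)
A, B = a * P % q, b * P % q      # the instance (P, P^a, P^b)
Z = a * b % q                    # the correct answer P^(ab)
assert is_cdh_solution(A, B, Z)
assert not is_cdh_solution(A, B, (Z + 1) % q)   # any other element is rejected
```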

C. SYSTEM MODEL
In the CL-MCMC-PDP system model, there are five kinds of entities: KGC, DO, CO, CSPs and TPA, which work as follows.
• The KGC issues partial-private keys for different DOs after getting their identities.
• A DO obtains its partial private key from the KGC and generates its full private key. Using this key, it can generate authenticated tags for multiple copies of the outsourced file, and then store these copies together with their tags on different CSPs.
• A CSP provides storage services to DOs and responds to challenges from TPAs, which may check the integrity of DOs' data.
• The CO is a manager of CSPs. When storing the (authenticated) file copies, a DO first sends them to CO. Then CO assigns different copies to the target CSPs according to DO's request. When checking the integrity, TPA first gives the challenge message to CO, who will distribute this message to the corresponding CSPs. After receiving all the proofs from those CSPs, this CO aggregates them into the final proof and returns it to TPA.
• The TPA verifies the integrity of all outsourced copies on behalf of the DO.
A graphical description of these entities and their interactions is given in Fig. 1. Generally speaking, a concrete CL-MCMC-PDP protocol consists of the following PPT algorithms, which run across the above entities: Setup, PPGen, Set-Secret-Value, Set-Private-Key, Set-Public-Key, RepGen, TagGen, TagVer, Chal, ProofGen, ProofAggr and Verify.
• Setup$(1^\lambda) \to (params, msk)$. This algorithm is run by the KGC and outputs the system parameters params and the KGC's master secret key msk. Note that params is public to all other entities in the system, whereas msk is kept private.
• PPGen$(msk, ID) \to PP_{ID}$. This algorithm is run by the KGC to generate a DO's partial private key $PP_{ID}$ for the queried identity ID.
• Set-Secret-Value$(ID) \to x_{ID}$. This algorithm is run by the DO and generates the DO's secret value $x_{ID}$, which is used to construct its full private key.
• Set-Private-Key$(PP_{ID}, x_{ID}) \to sk_{ID}$. This algorithm is also run by the DO and outputs the DO's full private key $sk_{ID}$ from its partial private key $PP_{ID}$ and secret value $x_{ID}$.
• Set-Public-Key$(sk_{ID}) \to pk_{ID}$. The DO uses this algorithm to generate its public key $pk_{ID}$ from $sk_{ID}$.
• RepGen$(F, N, n) \to \mathbf{F}$. The DO uses this algorithm to generate N copies (denoted by $\mathbf{F}$) of the data file F to be stored on the CSPs. Denote the i-th copy by $F_i$ ($1 \le i \le N$). Each copy $F_i$ is split into n blocks: $F_i = \{m_{i,1}, m_{i,2}, \ldots, m_{i,n}\}$.
• TagGen$(ID, sk_{ID}, pk_{ID}, F, C)$. This algorithm is run by the DO to generate authenticated tags (denoted by $T_i$ for the i-th copy) with respect to $(ID, pk_{ID})$ for the initial file F according to some policy C. Here, the policy specifies how the DO will store its copies and their tags on the CSPs. For example, for the copies $F_1, F_2, \ldots, F_N$, the DO may store the tuples $(F_1, T_1)$, $(F_3, T_3)$ and $(F_8, T_8)$ on the first CSP, $(F_2, T_2)$ and $(F_4, T_4)$ on the second, and so on. We remark that if multiple copies $(F_i, T_i)$ are stored on the same cloud, then their cloud identities $Cid_i$ are identical.
where Fid is the unique filename of the initial file F and $Cid_i$ is the identity of the target CSP that will store the i-th copy.
• Chal( ) → chal. This algorithm is run by TPA to generate challenge message chal. Here, the input denotes the number of blocks to be challenged (for single copy file).
• ProofGen$(Fid, (F_i, T_i), chal)$. This algorithm is run by the CSP with identity $Cid_i$ that stores the authenticated copy $(F_i, T_i)$. Based on the challenge message chal, it returns a proof for the i-th copy.
• ProofAggr. This algorithm is run by a CSP or the CO to generate an aggregated proof. If multiple copies are stored on one cloud, that cloud uses this algorithm to aggregate the proofs generated by ProofGen into one. Alternatively, if it is run by the CO, it generates an aggregated proof from the proofs of different CSPs.
• Verify. On input ID, $pk_{ID}$, chal, the aggregated proof, Fid and $\{Cid_i\}$, this algorithm outputs 1 or 0. It is run by the TPA to check the validity of the proof returned by the CO. Since this proof aggregates the proofs from different CSPs, the algorithm takes all of those clouds' identities as inputs.
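The twelve algorithms and the entity that runs each of them can be summarized as a typed interface. The Python `Protocol` below is a hypothetical sketch: the method names are snake_case renderings of the algorithm names above, and `Any` stands in for group elements, keys and proofs, whose concrete encodings are left to an instantiation.

```python
from typing import Any, Protocol, Sequence, Tuple


class CLMCMCPDP(Protocol):
    """Hypothetical interface; comments name the entity that runs each algorithm."""

    def setup(self, security_parameter: int) -> Tuple[Any, Any]: ...  # KGC -> (params, msk)

    def pp_gen(self, msk: Any, identity: str) -> Any: ...  # KGC -> PP_ID

    def set_secret_value(self, identity: str) -> Any: ...  # DO -> x_ID

    def set_private_key(self, pp: Any, x: Any) -> Any: ...  # DO -> sk_ID

    def set_public_key(self, sk: Any) -> Any: ...  # DO -> pk_ID

    def rep_gen(self, data: bytes, num_copies: int, num_blocks: int) -> Any: ...  # DO -> N copies

    def tag_gen(self, identity: str, sk: Any, pk: Any,
                data: bytes, policy: Any) -> Any: ...  # DO -> tags for every copy

    def tag_ver(self, identity: str, pk: Any, fid: bytes, i: int, j: int,
                cid: str, block: int, tag: Any) -> bool: ...  # CO checks one block tag

    def chal(self, num_challenged: int) -> Any: ...  # TPA -> chal

    def proof_gen(self, fid: bytes, copy: Any, challenge: Any) -> Any: ...  # CSP holding copy i

    def proof_aggr(self, proofs: Sequence[Any]) -> Any: ...  # a CSP or the CO

    def verify(self, identity: str, pk: Any, challenge: Any, proof: Any,
               fid: bytes, cids: Sequence[str]) -> bool: ...  # TPA -> 1/0
```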

D. AN EXAMPLE
For readability, in this subsection we use an example to describe how the CL-MCMC-PDP protocol works when the algorithms are combined with the entities.
Concretely, the KGC first runs Setup$(1^\lambda)$ to obtain params and msk, broadcasts params and keeps msk secret. When a DO with identity ID requests its partial private key by submitting ID, the KGC runs $PP_{ID} \leftarrow$ PPGen$(msk, ID)$ and returns $PP_{ID}$ to this user. The DO then runs Set-Secret-Value(ID) to obtain $x_{ID}$ and Set-Private-Key$(PP_{ID}, x_{ID})$ to obtain its full private key $sk_{ID}$, obtains its public key $pk_{ID}$ by running Set-Public-Key$(sk_{ID})$, and broadcasts $pk_{ID}$ to the other entities. This completes the setup of the DO's keys.
If DO wants to store its initial file F on different CSPs, he first randomly chooses Fid from some set, such as $\{0,1\}^\lambda$, and runs $\mathbf{F} \leftarrow$ RepGen$(F, N, n)$ to obtain N copies of F, each consisting of n blocks. For example, let N = 5 and n = 4, so that $\mathbf{F} = \{F_1, \ldots, F_5\}$ with $F_i = \{m_{i,1}, \ldots, m_{i,4}\}$. Suppose that, according to his policy C, he intends to store $F_1, F_3, F_4$ on the first CSP and $F_2, F_5$ on the second CSP. Thus, $Cid_1 = Cid_3 = Cid_4$ and $Cid_2 = Cid_5$. He then runs TagGen$(ID, sk_{ID}, pk_{ID}, F, C)$ to generate the corresponding tags. After receiving DO's data, the CO first runs TagVer$(ID, pk_{ID}, Fid, i, j, Cid_i, m_{i,j}, T_{i,j})$ on the received blocks and then assigns the copies to the two CSPs according to C.
If TPA needs to check the integrity of DO's data stored on the two CSPs after receiving DO's information (Fid, C), he runs chal ← Chal( ) and gives chal to the CO, who forwards it to the two CSPs. Now, the first CSP runs ProofGen$(Fid, (F_i, T_i), chal)$ for $i = 1, 3, 4$ and aggregates the three resulting proofs into one by running ProofAggr. The second CSP likewise computes a proof by aggregating the proofs generated from $(F_2, T_2)$ and $(F_5, T_5)$.
After obtaining the two proofs returned by the CSPs, the CO aggregates them into a single proof by running ProofAggr again and returns it to TPA, who checks its validity by running Verify on ID, $pk_{ID}$, chal, this proof, Fid and $\{Cid_i\}_{1 \le i \le 5}$. If the result is 1, TPA accepts the proof; otherwise, it informs DO that the integrity of his data is broken.
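The routing in this example depends only on the policy C. The following sketch, with hypothetical cloud names, groups copy indices by cloud identity; this grouping determines which per-copy proofs each CSP aggregates locally before the CO combines the per-CSP results.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_cloud(policy: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each cloud identity Cid to the sorted copy indices it stores."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, cid in sorted(policy.items()):
        groups[cid].append(i)
    return dict(groups)


# Policy C from the example: copies 1, 3, 4 on the first CSP and copies 2, 5 on the second.
policy = {1: "CSP-1", 2: "CSP-2", 3: "CSP-1", 4: "CSP-1", 5: "CSP-2"}
routing = group_by_cloud(policy)
print(routing)  # {'CSP-1': [1, 3, 4], 'CSP-2': [2, 5]}

# Each CSP aggregates the proofs for its own copies and the CO aggregates the
# per-CSP results; tracking only copy indices shows every copy is covered exactly once.
covered = sorted(i for copies in routing.values() for i in copies)
assert covered == [1, 2, 3, 4, 5]
```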

E. REQUIREMENTS
The following fundamental requirements are needed for a CL-MCMC-PDP protocol.
• Correctness. If the CSPs honestly generate their proofs from the TPA's challenge message, then the aggregated proof passes the TPA's verification.
• Security. If some CSP loses one or more blocks of the DO's data file, the TPA can detect it with very high probability. Moreover, a certificateless PDP protocol should also be secure against a malicious KGC and against general adversaries who may replace a DO's public key and try to forge valid signatures.
• Blockless. The verifier can correctly verify the data's integrity without downloading the original data file.
• Public Verifiability. The auditing process should be public and hence suitable for outsourcing to a curious TPA, which honestly performs the auditing task but may also be interested in the contents of the DO's data.
• Privacy Preserving. The TPA should not obtain any useful information about the contents of the DO's stored data during auditing.
• Certificateless. No certificates are needed anywhere in the auditing process to bind a DO's identity to its public key.
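The Security requirement above asks that lost blocks be detected ''with very high probability''. A common way to quantify this is the spot-checking analysis of Ateniese et al. [2]. As a sketch, assume (this is not specified in the text) that Chal samples c distinct block indices uniformly at random from the n blocks of a copy and that t of those blocks are corrupted. The corruption goes unnoticed only if all challenged blocks are intact, so the detection probability is $1 - \binom{n-t}{c} / \binom{n}{c}$.

```python
from math import comb


def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that c distinct, uniformly challenged blocks out of n
    include at least one of the t corrupted blocks (sampling without replacement)."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("need 0 <= t <= n and 0 <= c <= n")
    # math.comb(n - t, c) is 0 when c > n - t, which gives probability 1.
    return 1.0 - comb(n - t, c) / comb(n, c)


# With 1% of 10,000 blocks corrupted, challenging 300 or 460 blocks
# already detects the corruption with probability above 95% or 99%.
for c in (300, 460):
    print(c, round(detection_probability(10_000, 100, c), 4))
```

These values match the familiar observation from [2] that, at 1% corruption, a few hundred challenged blocks suffice largely independently of the file size.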

F. SECURITY MODEL
In this subsection, we define the security model of the CL-MCMC-PDP protocol. Concretely, we consider three types of adversaries, Type I, II and III, which model a malicious KGC, a general adversary attacking the signature of a single block, and a malicious CSP, respectively.
Type I Adversary. This kind of adversary has access to the master secret key msk but is not allowed to replace users' public keys (otherwise, it would trivially win the following security game).
Game I. This is a game played between a challenger $CH_{I}$ and an adversary $\mathcal{A}_{I}$.
• Phase-I-1. First, the challenger runs Setup$(1^\lambda)$ to obtain the system parameters params and a master secret key msk, and gives both to $\mathcal{A}_{I}$. Note that this adversary can generate any user's partial private key because it holds msk.
• Phase-I-2. $\mathcal{A}_{I}$ can adaptively make the following queries, which are recorded in a list L.
- Private-Key-Query. $\mathcal{A}_{I}$ submits ID together with the corresponding partial private key $PP_{ID}$ (generated by itself) to the challenger, who runs $x_{ID} \leftarrow$ Set-Secret-Value(ID) and $sk_{ID} \leftarrow$ Set-Private-Key$(PP_{ID}, x_{ID})$. Finally, the challenger records $(ID, PP_{ID}, x_{ID}, sk_{ID})$ in L and returns $sk_{ID}$ to $\mathcal{A}_{I}$.
- Public-Key-Query. $\mathcal{A}_{I}$ submits $(ID, PP_{ID})$ to the challenger, who checks whether $sk_{ID}$ exists in L. If so, it retrieves it; otherwise, it generates $sk_{ID}$ as in the previous step. It then runs $pk_{ID} \leftarrow$ Set-Public-Key$(sk_{ID})$, gives $pk_{ID}$ to $\mathcal{A}_{I}$ and records $pk_{ID}$ as ID's public key.
- Tag-Generation-Query. $\mathcal{A}_{I}$ submits an identity ID, its partial private key $PP_{ID}$, a data file F and a policy C to $CH_{I}$, who first checks whether $sk_{ID}$ and $pk_{ID}$ exist in L. If so, it retrieves them; otherwise, it generates them as in the previous two steps. It then randomly chooses Fid from a proper set, runs $\mathbf{T} \leftarrow$ TagGen$(ID, sk_{ID}, pk_{ID}, F, C)$ and returns $\mathbf{T}$ to $\mathcal{A}_{I}$.
- Transcript-Query. This query returns the verification transcript among the TPA, the CO and the CSPs for any data file F, policy C and DO identity ID chosen by $\mathcal{A}_{I}$. Concretely, $\mathcal{A}_{I}$ submits an identity ID, its partial private key $PP_{ID}$, C and a data file F to the challenger, who first randomly chooses the filename Fid, generates N copies $F_1, F_2, \ldots, F_N$ by running RepGen$(F, N, n)$ and computes the tags $T_1, T_2, \ldots, T_N$ of all the copies by running TagGen. The challenger then stores the files and their tags according to C. Next, $\mathcal{A}_{I}$ runs Chal( ) and submits the resulting chal to $CH_{I}$. The challenger computes the proof of each copy and the aggregated proof by running ProofGen and ProofAggr, and returns all the generated proofs to $\mathcal{A}_{I}$.
• Phase-I-3. Finally, $\mathcal{A}_{I}$ outputs a tuple $(ID^*, pk_{ID^*}, Fid^*, i^*, j^*, Cid_{i^*}, m^*, T^*)$ (1). If TagVer accepts this tuple and the following conditions hold:
1) $ID^*$ has not been submitted in a private-key-query;
2) $T^*$ is not a tag of $m^*$ with respect to $(ID^*, pk_{ID^*}, Fid^*, i^*, j^*, Cid_{i^*})$ generated by a tag-generation-query,
then we say that $\mathcal{A}_{I}$ wins the above game.
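The winning condition of Game I is a conjunction of checks over the challenger's query log. The sketch below is hypothetical bookkeeping, not part of the protocol: it assumes that the forgery must also pass TagVer (modeled as a caller-supplied function) and that each tag-generation record stores the context tuple, block and tag it produced.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, Set, Tuple

# Hypothetical encoding of the forgery context (ID, pk_ID, Fid, i, j, Cid_i).
Context = Tuple[Hashable, ...]
TagVer = Callable[[Context, int, Hashable], bool]


@dataclass
class GameILog:
    """Challenger-side record of the queries made by A_I (hypothetical structure)."""
    private_key_queries: Set[Hashable] = field(default_factory=set)
    issued_tags: Set[Tuple[Context, int, Hashable]] = field(default_factory=set)


def a1_wins(log: GameILog, ctx: Context, m: int, tag: Hashable, tag_ver: TagVer) -> bool:
    """True iff the forgery (m, tag) for ctx satisfies the Game I win conditions."""
    identity = ctx[0]
    return (tag_ver(ctx, m, tag)                           # the forgery must verify
            and identity not in log.private_key_queries   # condition 1)
            and (ctx, m, tag) not in log.issued_tags)     # condition 2)


def accept_all(ctx: Context, m: int, tag: Hashable) -> bool:
    return True  # placeholder for TagVer, used only in this demo


log = GameILog()
ctx = ("alice", "pk", "fid-1", 1, 1, "CSP-1")
log.issued_tags.add((ctx, 7, 42))
print(a1_wins(log, ctx, 7, 42, accept_all))  # False: tag came from a tag-generation query
print(a1_wins(log, ctx, 8, 99, accept_all))  # True
```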
Type II Adversary: This kind of adversary, a general adversary against the signature of a single block, is allowed to replace users' public keys but does not have the master secret key. We consider the following security game.
Game II: This is a game played between a challenger $CH_{II}$ and an adversary $\mathcal{A}_{II}$.
• Phase-II-1. First, the challenger runs Setup$(1^\lambda)$ to obtain the system parameters params and a master secret key msk, and gives params to $\mathcal{A}_{II}$.
• Phase-II-2. $\mathcal{A}_{II}$ can adaptively make the following queries, which are recorded in a list L.
- Partial-Private-Key-Query. The adversary submits ID to the challenger for its partial private key. $CH_{II}$ runs $PP_{ID} \leftarrow$ PPGen$(msk, ID)$, returns $PP_{ID}$ to $\mathcal{A}_{II}$ and adds $(ID, PP_{ID})$ to L.
- Private-Key-Query. $\mathcal{A}_{II}$ submits ID to the challenger, who first checks whether $PP_{ID}$ exists in L. If so, it retrieves it; otherwise, it makes a partial-private-key-query. It then runs $x_{ID} \leftarrow$ Set-Secret-Value(ID) and $sk_{ID} \leftarrow$ Set-Private-Key$(PP_{ID}, x_{ID})$, updates the item $(ID, PP_{ID}, x_{ID}, sk_{ID})$ in L and returns $sk_{ID}$ to $\mathcal{A}_{II}$.
- Public-Key-Query. $\mathcal{A}_{II}$ submits ID to the challenger, who checks whether $sk_{ID}$ exists in L. If so, it retrieves it; otherwise, it generates it as in the previous step. It then runs $pk_{ID} \leftarrow$ Set-Public-Key$(sk_{ID})$, gives $pk_{ID}$ to $\mathcal{A}_{II}$ and records $pk_{ID}$ as ID's public key in L.
- Replace-Public-Key-Query. $\mathcal{A}_{II}$ submits $(ID, pk_{ID}, pk'_{ID})$ to the challenger in order to replace $pk_{ID}$ with $pk'_{ID}$. $CH_{II}$ then updates ID's public key to $pk'_{ID}$.
- Tag-Generation-Query. $\mathcal{A}_{II}$ submits an identity ID, a data file F and a policy C to $CH_{II}$, who first checks whether $sk_{ID}$ and $pk_{ID}$ exist in L. If so, it retrieves them; otherwise, it generates them as in the previous steps. It then randomly chooses Fid from a proper set, runs $\mathbf{T} \leftarrow$ TagGen$(ID, sk_{ID}, pk_{ID}, F, C)$ and returns $\mathbf{T}$ to $\mathcal{A}_{II}$.
- Transcript-Query. This query is the same as the transcript-query of Game I, except that $\mathcal{A}_{II}$ does not need to submit the partial private key $PP_{ID}$ when making it.
2) $ID^*$ must not be an identity whose public key has been replaced and whose partial private key has been extracted (otherwise, the adversary could trivially win the game).
3) $T^*$ is not a tag of $m^*$ with respect to $(ID^*, pk_{ID^*}, Fid^*, i^*, j^*, Cid_{i^*})$ generated by a tag-generation-query,
then we say that $\mathcal{A}_{II}$ wins this game.
Type III Adversary: This kind of adversary models a malicious CSP that tries to pass the TPA's verification when the integrity of the DO's stored data file is broken. We consider the following security game.
Game III: This is a game played between a challenger $CH_{III}$ and an adversary $\mathcal{A}_{III}$.
• Phase-III-1. Same as Phase-II-1.
• Phase-III-2. This phase is the same as Phase-II-2 of Game II, except that in the transcript-query the challenger $CH_{III}$ runs chal ← Chal( ) and submits chal to $\mathcal{A}_{III}$, who computes the per-copy proofs and the aggregated proof by running ProofGen and ProofAggr, respectively. The challenger then checks the validity of the returned proof and gives the result to $\mathcal{A}_{III}$.
Consider the proofs returned by the other, honest CSPs for the same challenge message that $\mathcal{A}_{III}$ uses to compute its proof, and let the final proof be the output of ProofAggr on $\mathcal{A}_{III}$'s proof together with these honest proofs. If Verify, run on $ID^*$, $pk_{ID^*}$, chal, this final proof, $Fid^*$ and $\{Cid_i\}$, outputs 1, while the proof returned by $\mathcal{A}_{III}$ differs from the correct proof for the challenge message, then we say that $\mathcal{A}_{III}$ wins this game.
If no PPT adversary $\mathcal{A}_{I}$, $\mathcal{A}_{II}$ or $\mathcal{A}_{III}$ can win Game I, II or III, respectively, with non-negligible probability, then the CL-MCMC-PDP protocol is secure.

III. THE PROPOSED PROTOCOL AND SECURITY ANALYSIS
A. THE PROPOSED PROTOCOL
In this subsection, we propose a concrete CL-MCMC-PDP protocol, which consists of the following algorithms.
• Setup. On input the security parameter λ, this algorithm chooses two cyclic groups $G_1$ and $G_2$ of the same prime order q ($|q| \ge \lambda$) and a bilinear map $e: G_1 \times G_1 \to G_2$. It chooses a generator P of $G_1$ and three hash functions $h: \{0,1\}^* \to \mathbb{Z}_q^*$ and $H, \bar{H}: \{0,1\}^* \to G_1$. It then randomly chooses s from $\mathbb{Z}_q^*$ and computes $P_{pub} = P^s$. Finally, it sets msk = s as the master secret key and $params = (q, G_1, G_2, e, h, H, \bar{H}, P, P_{pub})$ as the system parameters, which are publicly available to the other algorithms, and outputs (msk, params).
• PPGen: On input msk and ID, this algorithm randomly chooses $t_{ID}$ from $\mathbb{Z}_q^*$ and computes $T_{ID} = P^{t_{ID}}$ and $s_{ID} = t_{ID} + s \cdot h(ID, T_{ID}) \bmod q$. It outputs the partial private key $PP_{ID} = (T_{ID}, s_{ID})$ for ID, where $T_{ID}$ is public and $s_{ID}$ is private.
• Set-Secret-Value: On input ID, this algorithm randomly chooses $x_{ID}$ from $\mathbb{Z}_q^*$ and outputs it.
• Set-Private-Key: On input $PP_{ID}$ and $x_{ID}$, this algorithm sets $sk_{ID} = (PP_{ID}, x_{ID})$ and outputs it.
• Set-Public-Key: On input $sk_{ID}$, this algorithm parses it as $(PP_{ID}, x_{ID}) = (T_{ID}, s_{ID}, x_{ID})$, computes $X_{ID} = P^{x_{ID}}$ and outputs the public key $pk_{ID} = (T_{ID}, X_{ID})$ for ID.
• RepGen: On input a file F and the numbers N and n, this algorithm first splits F into n blocks $m_1, m_2, \ldots, m_n$, each smaller than q, and, for all $i \in [N]$ and $j \in [n]$, encrypts each block as $m_{i,j} = E_K(i \| m_j)$, where E is a symmetric encryption algorithm (e.g., AES) and K is the corresponding encryption key. The symmetric encryption guarantees that the copies differ from each other. Define $F_i = \{m_{i,1}, m_{i,2}, \ldots, m_{i,n}\}$ as the i-th copy of F. Finally, output $\mathbf{F} = \{F_1, F_2, \ldots, F_N\}$.
• TagGen: On input ID, its key pair $(sk_{ID}, pk_{ID})$, a data file F and a policy C, this algorithm first randomly chooses Fid from $\{0,1\}^\lambda$ and runs $\mathbf{F} \leftarrow$ RepGen$(F, N, n)$, where $\mathbf{F} = \{F_1, F_2, \ldots, F_N\}$ and each $F_i = \{m_{i,1}, m_{i,2}, \ldots, m_{i,n}\}$ for $1 \le i \le N$. Then compute:
• TagVer: On input the tuple $(ID, pk_{ID}, Fid, i, j, Cid_i, m_{i,j}, T_{i,j})$, this algorithm checks whether
$$e(T_{i,j}, P) = e\big(\bar{H}(Fid, i, j, Cid_i) \cdot Q_{ID}^{m_{i,j}},\ \bar{P}_{ID}\big),$$
in which $\bar{P}_{ID}$ denotes the element $X_{ID} \cdot T_{ID} \cdot (P_{pub})^{h_{ID}}$. If it holds, output 1; otherwise, output 0.
• ProofAggr: On input ξ proofs with indices $i_1, i_2, \ldots, i_\xi$, this algorithm parses the proof with index $i_j$ as $(\sigma_{i_j}, M_{i_j})$ for $1 \le j \le \xi$ and computes:
Note that this algorithm can be run by any CSP or by the CO. Consider a concrete example. Assume that the N copies are stored on r different CSPs and, without loss of generality, that the copies $(F_{i_{j-1}+1}, T_{i_{j-1}+1}), \ldots, (F_{i_j}, T_{i_j})$ are stored on the j-th CSP ($1 \le j \le r$), where $i_0 = 0$ and $i_r = N$. Then $Cid_{i_{j-1}+1} = \cdots = Cid_{i_j}$. If the algorithm is run by the j-th CSP, the returned proof is:
If it is run by the CO, the returned proof is:
• Verify: On input ID, $pk_{ID}$, Fid and all the clouds' identities $\{Cid_j\}_{j=1}^{N}$, this algorithm checks the following equation (2):
If it holds, output 1; otherwise, output 0. The correctness of (2) can be verified as follows:
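To see how the pieces could fit together, the following Python sketch runs the whole flow for the five-copy, two-CSP policy used in the example of Section II. It is a hedged illustration, not the paper's implementation, and several parts are assumptions: (i) an insecure toy encoding replaces a real pairing group (the element $P^x$ of $G_1$ is stored as $x \bmod q$, and $e(P^x, P^y) = g^{xy}$ in a subgroup of $\mathbb{Z}_p^*$); (ii) PPGen sets $s_{ID} = t_{ID} + s \cdot h(ID, T_{ID})$, the relation used in the simulation of Lemma 2; (iii) $Q_{ID} = H(ID, T_{ID}, X_{ID})$ and tags have the form $T_{i,j} = (\bar{H}(Fid, i, j, Cid_i) \cdot Q_{ID}^{m_{i,j}})^{x_{ID}+s_{ID}}$, so that the check $e(T_{i,j}, P) = e(\bar{H}(Fid, i, j, Cid_i) \cdot Q_{ID}^{m_{i,j}}, \bar{P}_{ID})$ holds; (iv) a hypothetical HMAC-derived mask modulo q stands in for $E_K$; and (v) since the ProofGen, ProofAggr and Verify equations are not reproduced in this text, the challenge-response part follows the Shacham-Waters style of aggregation [15] and omits any privacy-preserving masking. All helper names and cloud names are hypothetical.

```python
import hashlib
import hmac
import secrets

# INSECURE toy encoding for checking algebra only: the G1 element P^x is stored as
# x mod q (so P = 1), and e(P^x, P^y) = g^(xy) in the order-q subgroup of Z_p^*.
q, p, g = 1019, 2039, 4
P = 1


def pair(x: int, y: int) -> int:
    return pow(g, (x * y) % q, p)


def rand_zq_star() -> int:
    return 1 + secrets.randbelow(q - 1)


def hash_zq_star(*parts: object) -> int:
    """Hypothetical SHA-256 stand-in for h and, with a label, for H and H-bar."""
    d = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return 1 + int.from_bytes(d, "big") % (q - 1)


# Setup (KGC): msk = s, P_pub = P^s.
s = rand_zq_star()
P_pub = s * P % q

# PPGen (KGC), assumed form: T_ID = P^t, s_ID = t + s * h(ID, T_ID) mod q.
ID = "alice"
t = rand_zq_star()
T_id = t * P % q
h_id = hash_zq_star("h", ID, T_id)
s_id = (t + s * h_id) % q

# Set-Secret-Value and Set-Public-Key (DO); P-bar_ID and Q_ID are computable by anyone.
x_id = rand_zq_star()
X_id = x_id * P % q
P_bar = (X_id + T_id + h_id * P_pub) % q            # X_ID * T_ID * P_pub^{h_ID}
Q_id = hash_zq_star("H", ID, T_id, X_id)            # assumed Q_ID = H(ID, T_ID, X_ID)


def rep_gen(blocks, num_copies: int, key: bytes):
    """Hypothetical RepGen: an HMAC-derived mask mod q stands in for E_K(i || m_j)."""
    return [[(m + int.from_bytes(hmac.new(key, f"{i}|{j}".encode(),
                                          hashlib.sha256).digest(), "big")) % q
             for j, m in enumerate(blocks, 1)]
            for i in range(1, num_copies + 1)]


def tag(fid, i, j, cid, m):
    """Assumed tag T_{i,j} = (H-bar(Fid, i, j, Cid_i) * Q_ID^m)^(x_ID + s_ID)."""
    return (hash_zq_star("Hbar", fid, i, j, cid) + Q_id * m) * (x_id + s_id) % q


def tag_ver(fid, i, j, cid, m, T) -> bool:
    base = (hash_zq_star("Hbar", fid, i, j, cid) + Q_id * m) % q
    return pair(T, P) == pair(base, P_bar)


# Policy C: copies 1, 3, 4 on CSP-1 and copies 2, 5 on CSP-2.
cids = {1: "CSP-1", 2: "CSP-2", 3: "CSP-1", 4: "CSP-1", 5: "CSP-2"}
fid, data = "file-001", [17, 42, 99, 5]
copies = dict(zip(cids, rep_gen(data, len(cids), secrets.token_bytes(32))))
tags = {i: [tag(fid, i, j, cids[i], m) for j, m in enumerate(copies[i], 1)] for i in cids}
assert all(tag_ver(fid, i, j, cids[i], m, tags[i][j - 1])
           for i in cids for j, m in enumerate(copies[i], 1))   # the CO's TagVer pass


# Assumed Shacham-Waters-style challenge, proof and aggregation (not the paper's equations).
def chal(c: int):
    idx = secrets.SystemRandom().sample(range(1, len(data) + 1), c)
    return [(j, rand_zq_star()) for j in idx]


def proof_gen(i, ch):
    return (sum(v * tags[i][j - 1] for j, v in ch) % q,      # prod_j T_{i,j}^{v_j}
            sum(v * copies[i][j - 1] for j, v in ch) % q)    # sum_j v_j m_{i,j}


def proof_aggr(proofs):
    return sum(a for a, _ in proofs) % q, sum(b for _, b in proofs) % q


def verify(ch, proof) -> bool:
    sigma, M = proof
    hs = sum(v * hash_zq_star("Hbar", fid, i, j, cids[i]) for i in cids for j, v in ch)
    return pair(sigma, P) == pair((hs + Q_id * M) % q, P_bar)


ch = chal(2)
per_csp = [proof_aggr([proof_gen(i, ch) for i in cids if cids[i] == c])
           for c in ("CSP-1", "CSP-2")]
assert verify(ch, proof_aggr(per_csp))
```

With honestly stored data the final check passes; changing any challenged block while keeping its tag makes it fail, because $Q_{ID}$ and the coefficients $v_j$ are nonzero modulo the prime q.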

B. SECURITY ANALYSIS
In this subsection, we analyze the security of the protocol proposed above and present the detailed proof. Concretely, we have the following theorem.
Theorem: If the CDH assumption holds in the group $G_1$ and h, H, $\bar{H}$ are modeled as random oracles, then our proposed CL-MCMC-PDP protocol is secure.
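The reductions in the proofs of Lemmas 1 and 2 below answer random-oracle queries by lazy sampling and, in the Partial-Private-Key-Query of Lemma 2, by programming h. The following Python sketch illustrates both techniques in the insecure toy encoding (an assumption; class and function names are hypothetical). Its final check shows that a simulated $(T_{ID}, s_{ID})$ satisfies the same relation $P^{s_{ID}} = T_{ID} \cdot P_{pub}^{h(ID, T_{ID})}$ as a real partial key, even though the simulator never uses the discrete logarithm of $P_{pub} = \Upsilon$.

```python
import secrets
from typing import Dict, Hashable

q = 1019  # toy prime order; the G1 element P^x is stored as x mod q (INSECURE), P = 1


class LazyRandomOracle:
    """Random oracle into Z_q^*: fresh points get uniform answers (lazy sampling),
    and the reduction may program a point before anyone queries it."""

    def __init__(self) -> None:
        self.table: Dict[Hashable, int] = {}

    def query(self, x: Hashable) -> int:
        if x not in self.table:
            self.table[x] = 1 + secrets.randbelow(q - 1)
        return self.table[x]

    def program(self, x: Hashable, value: int) -> None:
        if x in self.table:
            raise RuntimeError("point already defined; the reduction would abort")
        self.table[x] = value


# P_pub = Upsilon = P^b. In this toy encoding b is visible, but the simulation
# below only uses group operations on Upsilon, never b itself.
upsilon = 1 + secrets.randbelow(q - 1)
h = LazyRandomOracle()


def simulate_partial_key(identity: str):
    """B_II's Partial-Private-Key-Query answer, produced without knowing msk."""
    s_id = 1 + secrets.randbelow(q - 1)
    h_id = 1 + secrets.randbelow(q - 1)
    T_id = (s_id - h_id * upsilon) % q       # T_ID = P^{s_ID} / Upsilon^{h_ID}
    h.program((identity, T_id), h_id)        # implicitly define h(ID, T_ID) := h_ID
    return T_id, s_id


T_id, s_id = simulate_partial_key("alice")
# Same relation as a real key: P^{s_ID} = T_ID * P_pub^{h(ID, T_ID)}.
assert s_id == (T_id + upsilon * h.query(("alice", T_id))) % q
```

In a real group, programming fails only if $(ID, T_{ID})$ was queried before, which happens with negligible probability because $T_{ID}$ is uniformly distributed.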
To prove this main theorem, we split it into the following three lemmas.
Lemma 1: If the CDH assumption holds in the group $G_1$ and h, H, $\bar{H}$ are modeled as random oracles, then, for any PPT Type I adversary $\mathcal{A}_{I}$, the probability that it wins Game I against the CL-MCMC-PDP protocol is negligible.
Proof: Consider an adversary $\mathcal{B}_{I}$ that attacks the CDH assumption and uses $\mathcal{A}_{I}$ as a subroutine. Assume that $\mathcal{A}_{I}$ queries p different identities in the whole process. Given a CDH instance $(P, P^a, \Upsilon = P^b) \in G_1^3$, $\mathcal{B}_{I}$ needs to compute and output the group element $P^{ab}$. It randomly chooses $\hat{j}^* \in [p]$ as its guess of the identity output by $\mathcal{A}_{I}$ in the final phase, and simulates the environment for $\mathcal{A}_{I}$ as follows.
• Phase-I-1. $\mathcal{B}_{I}$ chooses another group $G_2$ of the same order q as $G_1$ and defines e as the bilinear map from $G_1 \times G_1$ to $G_2$. It then randomly chooses $s \leftarrow \mathbb{Z}_q^*$, computes $P_{pub} = P^s$, sets msk = s and gives the public parameters $(q, P, G_1, G_2, e, P_{pub})$ as well as s to $\mathcal{A}_{I}$. Next, it chooses a symmetric encryption key K. The hash functions are simulated by $\mathcal{B}_{I}$. Finally, it initializes an empty list L.
• Phase-I-2. $\mathcal{B}_{I}$ answers $\mathcal{A}_{I}$'s queries as follows.
- Hash Queries. The oracle h is simulated by lazy sampling. Queries to $\bar{H}$ are handled in the same way, except in the response to tag generation, where $\mathcal{B}_{I}$ uses a special method described there. For a query $(ID, T_{ID}, X_{ID})$ to the H-oracle, $\mathcal{B}_{I}$ first randomly chooses β, γ from $\mathbb{Z}_q^*$ and then checks whether ID is the $\hat{j}^*$-th new identity appearing in L. Return $pk_{ID} = (T_{ID}, X_{ID})$ to $\mathcal{A}_{I}$ and store $pk_{ID}$ as ID's public key in L.
- Tag-Generation-Query. When $\mathcal{A}_{I}$ submits a query $(ID, pk_{ID}, F, C)$ to this oracle, $\mathcal{B}_{I}$ runs $\mathbf{F} \leftarrow$ RepGen$(F, N, n)$. Then the tag of $m_{i,j}$ is:
Denote by $\mathbf{T}$ the set of all tags $\{T_{i,j}\}_{1 \le i \le N, 1 \le j \le n}$. Return $(Fid, \mathbf{F}, \mathbf{T})$ to $\mathcal{A}_{I}$.
- Transcript-Query. This oracle is answered normally, since $\mathcal{B}_{I}$ can simulate the tags for any data file submitted by $\mathcal{A}_{I}$.
• Phase-I-3. Finally, the adversary outputs a forgery $(m^*, T^*)$ with respect to $(ID^*, Fid^*, i^*, j^*, Cid_{i^*})$. Since the forgery passes TagVer, we know that
$$e(T^*, P) = e\big(\bar{H}(Fid^*, i^*, j^*, Cid_{i^*}) \cdot (Q_{ID^*})^{m^*},\ X_{ID^*} T_{ID^*} (P_{pub})^{h_{ID^*}}\big). \qquad (3)$$
If $ID^*$ is exactly the $\hat{j}^*$-th identity appearing in L, the analysis can be divided into two cases: 1) the tuple $(Fid^*, i^*, j^*, Cid_{i^*})$ appeared in the tag generation for some message $m_{i^*,j^*}$; 2) the tuple $(Fid^*, i^*, j^*, Cid_{i^*})$ did not appear in the tag generation for any queried message. For case 1), according to the simulation process, we know that:
Therefore:
Note that $m^* \neq m_{i^*,j^*}$: otherwise, since tags are deterministic, we would have $T^* = T_{i^*,j^*}$, which condition 2) of Game I excludes. Hence, combining this with (3), $\mathcal{B}_{I}$ obtains a pairing equation from which it can easily obtain the solution of the CDH problem.
For case 2), since $\bar{H}$ is simulated by lazy sampling, $\bar{H}(Fid^*, i^*, j^*, Cid_{i^*}) = P^{r^*}$ for some randomly chosen $r^* \in \mathbb{Z}_q^*$. As a result, combining this with (3), we know that
$$e(T^*, P) = e\big(P^{r^*} \cdot (Q_{ID^*})^{m^*},\ X_{ID^*} T_{ID^*} (P_{pub})^{h_{ID^*}}\big) = e\big(P^{r^*} \cdot (Q_{ID^*})^{m^*},\ \Upsilon \cdot T_{ID^*} (P_{pub})^{h_{ID^*}}\big).$$
Here $Q_{ID^*}$ is expressed through the values β and γ that $\mathcal{B}_{I}$ chose when simulating H. From this equation, $\mathcal{B}_{I}$ can also easily obtain the solution of the CDH problem. This ends the proof of Lemma 1.
Lemma 2: If the CDH assumption holds in the group $G_1$ and h, H, $\bar{H}$ are modeled as random oracles, then, for any PPT Type II adversary $\mathcal{A}_{II}$, the probability that it wins Game II against the CL-MCMC-PDP protocol is negligible.
Proof: Consider an adversary $\mathcal{B}_{II}$ that attacks the CDH assumption and uses $\mathcal{A}_{II}$ as a subroutine. Assume that $\mathcal{A}_{II}$ queries p different identities in the whole process. Given a CDH instance $(P, P^a, \Upsilon = P^b) \in G_1^3$, $\mathcal{B}_{II}$ needs to compute and output the group element $P^{ab}$. It randomly chooses $\hat{j}^* \in [p]$ as its guess of the identity output by $\mathcal{A}_{II}$ in the final phase, and simulates the environment for $\mathcal{A}_{II}$ as follows.
• Phase-II-1. Choose another group G 2 , which has the same order q as G 1 , and define e as the bilinear map from G 1 × G 1 to G 2 . Then set P pub = ϒ and give the public parameters (q, P, G 1 , G 2 , e, P pub ) to A II . Next, choose a symmetric encryption key K . The hash functions are simulated by B II . Finally, initialize an empty list L.
• Phase-II-2. $B_{II}$ answers $A_{II}$'s queries as follows.
- Hash Queries. The simulations of $h$, $H$, and $\tilde{H}$ are the same as in the construction of $B_I$, except that the answers of the $h$-oracle used in generating users' partial private keys differ, as described in the next step.
- Partial-Private-Key-Query. $A_{II}$ submits $ID$ to $B_{II}$, who randomly chooses $s_{ID}, h_{ID} \in Z_q^*$ and computes $T_{ID} := P^{s_{ID}} / \Upsilon^{h_{ID}}$. Here, $B_{II}$ implicitly defines $h(ID, T_{ID}) := h_{ID}$. It returns $PP_{ID} = (T_{ID}, s_{ID})$ to $A_{II}$ and stores $(ID, PP_{ID})$ in $L$.
- Private-Key-Query. $A_{II}$ submits $ID$ to $B_{II}$, who first checks whether $PP_{ID}$ exists in $L$. If so, it retrieves it; otherwise, it makes a partial-private-key query. It then randomly chooses $x_{ID} \in Z_q^*$ and sets $sk_{ID} = (PP_{ID}, x_{ID})$. Finally, it updates the entry $(ID, PP_{ID}, x_{ID}, sk_{ID})$ in $L$ and returns $sk_{ID}$ to $A_{II}$.
- Public-Key-Query. $A_{II}$ submits $ID$ to $B_{II}$, who checks whether $sk_{ID}$ exists in $L$. If so, it retrieves it; otherwise, it generates it as in the previous step. It then computes $X_{ID} = P^{x_{ID}}$, sets $pk_{ID} = (T_{ID}, X_{ID})$, gives $pk_{ID}$ to $A_{II}$, and records $pk_{ID}$ as $ID$'s public key in $L$.
- Replace-Public-Key-Query. $A_{II}$ submits $(ID, pk_{ID}, pk'_{ID})$ to $B_{II}$ in order to replace $pk_{ID}$ with $pk'_{ID}$. $B_{II}$ then updates $ID$'s public key to $pk'_{ID}$.
- Tag-Generation-Query. The simulation of this oracle is the same as in $B_I$, except for the computation of the tag $T_{i,j}$ of $m_{i,j}$.
- Transcript-Query. $B_{II}$ can answer this oracle honestly, since it can simulate the tags for any data file submitted by $A_{II}$.
• Phase-II-3. Finally, $A_{II}$ outputs a forgery $(m^*, T^*)$ with respect to $(ID^*, pk_{ID^*}, Fid^*, i^*, j^*, Cid_{i^*})$. By an analysis similar to that of Lemma 1, $B_{II}$ can obtain the solution of the CDH problem. This ends the proof of Lemma 2.
Lemma 3: If the CDH assumption holds in the group $G_1$ and $h$, $H$, $\tilde{H}$ are modeled as random oracles, then, for any PPT Type III adversary $A_{III}$, the probability that it wins Game III against the CL-MCMC-PDP protocol is negligible.
Proof: Consider an adversary $B_{III}$ that attacks the CDH assumption and uses $A_{III}$ as a subroutine. $B_{III}$ is constructed in the same way as $B_{II}$, except for the Transcript-Query. In particular, $B_{III}$ runs $chal \leftarrow Chal()$ and gives it to $A_{III}$, who computes and aggregates the proof according to $chal$.
In the final phase, $A_{III}$ outputs a tuple $(ID^*, pk^*, Fid^*, (\sigma^*, M^*))$ in response to $B_{III}$'s challenge $chal = \{(\nu_\tau, a_\tau)\}_{\tau=1}^{\ell}$, where $Fid^*$ is the filename of a data file $F$ previously queried to the tag-generation oracle. Without loss of generality, assume that the first $\mu$ copies $F_1, F_2, \ldots, F_\mu$ are stored on $A_{III}$, so that $Cid_1 = Cid_2 = \cdots = Cid_\mu := Cid$. The forged proof $(\sigma^*, M^*)$ satisfies the verification equation (4). Let $(\sigma, M)$ be the expected response (i.e., the one that an honest prover would return), which satisfies the corresponding equation (5). If $m_{i,\nu_\tau} = m^*_{i,\nu_\tau}$ for all $\tau$, then $M = M^*$. Thus, combining (4) with (5), $\sigma = \sigma^*$, which contradicts the validity of $A_{III}$'s forgery. Therefore, $m_{i,\nu_\tau} \neq m^*_{i,\nu_\tau}$ for at least one $\tau$. Now, dividing (4) by (5), $B_{III}$ can easily obtain the solution of the CDH problem. This ends the proof of Lemma 3. Combining Lemmas 1, 2, and 3, the main theorem holds.
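The partial-private-key simulation in the proof of Lemma 2 relies on the relation $P^{s_{ID}} = T_{ID} \cdot P_{pub}^{h(ID, T_{ID})}$, which follows from $B_{II}$'s choice $T_{ID} := P^{s_{ID}} / \Upsilon^{h_{ID}}$ with $P_{pub} = \Upsilon$. The sketch below checks that a key produced in this way, without ever using the discrete logarithm of $\Upsilon$, passes the same check as an honestly generated key. It is a minimal sketch under assumptions: the honest KGC algorithm ($T_{ID} = P^t$, $s_{ID} = t + msk \cdot h(ID, T_{ID})$) is inferred from this relation rather than quoted from the construction; because the check needs no pairing, a toy Schnorr subgroup of $Z_{2039}^*$ (insecure parameters) stands in for $G_1$; and `RandomOracle` with its `program` method is a hypothetical helper modelling how $B_{II}$ defines $h(ID, T_{ID}) := h_{ID}$.

```python
import secrets

# Toy Schnorr group (INSECURE parameters, illustration only):
# p = 2q + 1 is a safe prime and P = 4 generates the subgroup of prime order q.
p, q, P = 2039, 1019, 4


def rand_zq_star() -> int:
    return 1 + secrets.randbelow(q - 1)


class RandomOracle:
    """Hypothetical lazily sampled random oracle h(ID, T_ID) -> Z_q^*."""

    def __init__(self):
        self.table = {}

    def __call__(self, identity: str, T: int) -> int:
        if (identity, T) not in self.table:
            self.table[(identity, T)] = rand_zq_star()
        return self.table[(identity, T)]

    def program(self, identity: str, T: int, value: int) -> None:
        # The reduction fixes h(ID, T_ID) := h_ID; it must abort if the point is already defined.
        if (identity, T) in self.table:
            raise RuntimeError("h(ID, T_ID) already defined: simulation aborts")
        self.table[(identity, T)] = value


def check_partial_key(h, P_pub, identity, T, s) -> bool:
    """Accept iff P^{s_ID} == T_ID * P_pub^{h(ID, T_ID)}."""
    return pow(P, s, p) == T * pow(P_pub, h(identity, T), p) % p


def kgc_partial_key(h, msk, identity):
    """Assumed honest KGC: T_ID = P^t and s_ID = t + msk * h(ID, T_ID) mod q."""
    t = rand_zq_star()
    T = pow(P, t, p)
    return T, (t + msk * h(identity, T)) % q


def simulated_partial_key(h, upsilon, identity):
    """B_II's answer when P_pub = upsilon; the exponent of upsilon is never used."""
    s, h_id = rand_zq_star(), rand_zq_star()
    T = pow(P, s, p) * pow(upsilon, q - h_id, p) % p  # P^{s_ID} / upsilon^{h_ID}
    h.program(identity, T, h_id)
    return T, s


# Honest run: the KGC knows msk and P_pub = P^msk.
h = RandomOracle()
msk = rand_zq_star()
T, s = kgc_partial_key(h, msk, "alice")
print(check_partial_key(h, pow(P, msk, p), "alice", T, s))  # True

# Simulated run: the challenger only holds upsilon = P^b, not b itself.
h_sim = RandomOracle()
upsilon = pow(P, rand_zq_star(), p)
T_sim, s_sim = simulated_partial_key(h_sim, upsilon, "bob")
print(check_partial_key(h_sim, upsilon, "bob", T_sim, s_sim))  # True
```

Programming the oracle is what lets the reduction answer partial-private-key queries without the master secret; the abort branch mirrors the usual requirement that the programmed point has not been queried before.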

IV. DISCUSSIONS ON OTHER PROPERTIES
The correctness and security of our proposed CL-MCMC-PDP protocol have been proved in Section III. In this section, we discuss other properties that our protocol satisfies.
• Blockless. In our CL-MCMC-PDP protocol, after sending the challenge message, the TPA receives a proof that aggregates the proofs generated by all the CSPs. Each CSP's proof consists only of a pair $(\sigma_i, M_i) \in G_1 \times Z_q$, and the aggregated proof $(\sigma, M)$ has the same form. Hence, the original data blocks need not be returned to check their integrity.
• Public Verifiability. The algorithm Verify does not take DO's private key as input; it needs only DO's identity and public key, the challenge message, the filename, and the identities of the CSPs. All of these can be publicly known, and hence our protocol achieves public verifiability.
• Privacy Preserving. A curious TPA obtains the returned proof $(\sigma, M)$, in which $M$ is a linear combination of the blocks $m_{i,j}$. However, recall that these blocks are generated as $m_{i,j} = E_K(i||m_j)$, where $E$ is a symmetric encryption scheme, $K$ is its key, and $m_j$ is the $j$-th block of the original data file. Since a secure encryption scheme ''masks'' the $m_j$'s, the returned value $M$ leaks no useful information about DO's original file.
• Certificateless. In our security model and the analysis of our protocol, the Type II adversary models a general adversary that is allowed to replace DO's public key with any value it chooses. Since our scheme is secure against this kind of adversary, no certificate binding DO's identity to its public key is needed.
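Both privacy preservation and the distinctness of the stored copies rest on the masking step $m_{i,j} = E_K(i||m_j)$. The concrete scheme $E$ is not fixed in this section, so the sketch below substitutes, as a labelled assumption, a toy wide-block cipher: a 4-round Feistel network whose round functions are HMAC-SHA256 (the Luby-Rackoff construction, which gives a strong pseudorandom permutation when the round functions are pseudorandom). Because the copy index $i$ is encrypted together with $m_j$, each copy of the same block looks unrelated, while DO can still decrypt to recover $(i, m_j)$. The 64-byte block size, the length-byte encoding, and all helper names are hypothetical; a deployment would use a standard, vetted cipher instead.

```python
import hashlib
import hmac

HALF = 32             # bytes per Feistel half (SHA-256 output size)
BLOCK = 2 * HALF      # 64-byte wide block
MAX_DATA = BLOCK - 5  # 4 bytes for the copy index i, 1 byte for the data length


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    """Round function: HMAC-SHA256 keyed with K over (round number || half)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def feistel_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):  # undo the rounds in reverse order
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def make_copy_block(key: bytes, i: int, m_j: bytes) -> bytes:
    """m_{i,j} = E_K(i || m_j): pack i, the length, and m_j into one padded block."""
    if len(m_j) > MAX_DATA:
        raise ValueError("block too long for this toy encoding")
    plain = i.to_bytes(4, "big") + bytes([len(m_j)]) + m_j
    return feistel_encrypt(key, plain + bytes(BLOCK - len(plain)))


def recover_block(key: bytes, c: bytes):
    """Invert make_copy_block and return (i, m_j)."""
    plain = feistel_decrypt(key, c)
    return int.from_bytes(plain[:4], "big"), plain[5:5 + plain[4]]


K = bytes(range(32))  # placeholder key, illustration only
m_j = b"original block j"
copies = [make_copy_block(K, i, m_j) for i in (1, 2, 3)]
print(len(set(copies)))             # 3: the copies are pairwise distinct
print(recover_block(K, copies[1]))  # (2, b'original block j')
```

Since the cipher is a permutation, distinct plaintexts $i||m_j$ always give distinct copies, so a CSP cannot serve one stored copy in place of another.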

V. PERFORMANCE ANALYSIS
In this section, we analyze the performance of our proposed CL-MCMC-PDP protocol. Concretely, we compare it with the two previous PDP protocols in [12] and [26] in terms of communication costs, computational costs, and security. Note that Li et al.'s protocol [12] uses a traditional signature scheme as a building block, which makes the whole protocol not identity-based. One solution is to replace this traditional signature scheme with an IB-signature, such as Galindo's lightweight scheme [7]. For a fair comparison, we use Galindo's IB-signature in place of their standard signature scheme throughout the following comparison.
Communication Costs: In our protocol, communication mainly takes place from KGC to DO, from DO to CO/CSPs, and from CO/CSPs to TPA. We compute the communication cost of each as follows; the overall comparison is listed in Table 2.
• KGCtoDO: This is the communication cost from KGC to DO. On receiving DO's identity $ID$, the KGC computes the partial private key $PP_{ID} = (T_{ID}, s_{ID}) \in G_1 \times Z_q^*$ for $ID$. Therefore, the required bandwidth is $|PP_{ID}| = |T_{ID}| + |s_{ID}| = |G_1| + |Z_q^*|$. Li et al.'s protocol [12] is an IB-MCMC-PDP protocol, and from its construction, the communication cost from KGC to DO is the length of the user's private key, $|sk_{ID}| = 2|G_1| + |Z_q^*|$. Since Zhu et al.'s protocol [26] is a traditional certificate-based multi-cloud protocol, it has no KGC and hence no communication from KGC to DO; this is denoted by ''−'' in Table 2.
• DOtoCO: This is the communication cost from DO to CO. Consider a data file $F_i$ split into $n$ blocks $m_{i,1}, m_{i,2}, \ldots, m_{i,n}$, each with one sector. In our protocol, the tag $T_{i,j}$ generated for each block is a single element of $G_1$, so the communication overhead is $n \cdot |G_1|$. Similarly, the overheads of the protocols in [12] and [26] are $n \cdot |G_1| + |Sig|$ and $(2+n) \cdot |G_1|$, respectively, where $Sig$ is the signature used in [12]. If it is replaced by Galindo's IB-signature, then $|Sig| = 2|G_1| + |Z_q^*|$.
VOLUME 8, 2020
• COtoTPA: This is the communication cost from CO to the verifier. In our protocol, when the TPA submits the challenge message $chal$, CO returns the aggregated proof $(\sigma, M) \in G_1 \times Z_q^*$. Hence, the cost is $|G_1| + |Z_q^*|$. For the protocol in [12], the returned proof is $(\sigma, M, R, u, Sig)$, so the cost equals $|\sigma| + |M| + |R| + |u| + |Sig| = 3|G_1| + |Z_q^*| + |Sig|$. With Galindo's IB-signature, the total cost is $5|G_1| + 2|Z_q^*|$. The communication cost from CO to TPA for the protocol in [26] equals $2|G_1| + |Z_q^*|$.
Computational Costs: Denote by $T_p$, $T_{exp}$, and $T_{mul}$ the time of a pairing, an exponentiation in $G_1$, and a multiplication in $G_1$, respectively. Other operations, such as hashing and addition and multiplication in $Z_q^*$, are omitted because their costs are negligible in comparison. We consider $N$ copies of the initial data file stored on $r$ CSPs, where the $i$-th CSP stores $|CT_i|$ copies ($1 \le i \le r$), and the TPA challenges $\ell$ blocks.
In our protocol, the KGC needs one $T_{exp}$ to generate a user's partial private key, and the user needs one $T_{exp}$ to generate its public key. Hence, generating a user's full private key and public key costs $2 \cdot T_{exp}$. Similarly, in [12] and [26], generating a user's private key costs $T_{exp}$ and $2 \cdot T_{exp}$, respectively.
Computing the tags of all $N$ copies costs $N \cdot n \cdot (T_{mul} + 2 \cdot T_{exp})$ in our protocol, whereas the same process costs $N \cdot n \cdot (4 \cdot T_{exp} + 2 \cdot T_{mul})$ in Li et al.'s protocol and $N \cdot n \cdot (T_{mul} + 3 \cdot T_{exp})$ in Zhu et al.'s protocol.
In proof generation, the $i$-th CSP runs ProofGen to construct a proof for one copy, which costs $\ell \cdot (T_{exp} + T_{mul})$. If $|CT_i|$ copies are stored in this cloud, the cost is $|CT_i| \cdot \ell \cdot (T_{exp} + T_{mul})$.
The CO aggregates the $r$ returned proofs, which costs $r \cdot T_{mul}$.
The CO obtaining the aggregated proof costs r 2 · T mul + r · T mul + 2 · T exp .
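The per-CSP ProofGen and the CO-side aggregation counted above, together with the public Verify check, can be sanity-checked with a toy sketch. It rests on assumptions that are not quoted from the construction: the tag is taken to be $T_{i,j} = (\tilde{H}(Fid, i, j, Cid_i) \cdot Q_{ID}^{m_{i,j}})^{x_{ID}+s_{ID}}$, read off the verification relation used in the proof of Lemma 1; each CSP's proof is assumed to be $\sigma_i = \prod T_{i,\nu_\tau}^{a_\tau}$ and $M_i = \sum a_\tau m_{i,\nu_\tau}$ over the copies it stores, which matches one exponentiation and one multiplication per challenged block; and $Q_{ID}$, `h_tilde`, and all helper names are hypothetical. The "pairing" is an exponent-level model (a $G_1$ element $P^u$ is stored as $u \bmod q$, and $e(P^u, P^v)$ as $uv \bmod q$), so the sketch checks the algebra only and offers no security.

```python
import hashlib
import secrets

# Toy exponent model (INSECURE, algebra check only): P^u is stored as u mod q,
# multiplication in G_1 becomes addition, and e(P^u, P^v) becomes u*v mod q.
q = 2**61 - 1  # a Mersenne prime, used here only as the toy group order


def rand_zq_star() -> int:
    return 1 + secrets.randbelow(q - 1)


def pair(u: int, v: int) -> int:
    """Toy bilinear map e(P^u, P^v), represented by u*v mod q."""
    return u * v % q


def h_tilde(fid: str, i: int, j: int, cid: str) -> int:
    """Hypothetical hash H~(Fid, i, j, Cid_i) into G_1, returned as a toy exponent."""
    d = hashlib.sha256(f"{fid}|{i}|{j}|{cid}".encode()).digest()
    return int.from_bytes(d, "big") % q


def keygen():
    msk, t, x, h_id = (rand_zq_star() for _ in range(4))
    s = (t + msk * h_id) % q          # s_ID of the partial private key
    secret = (x + s) % q              # DO's tagging exponent x_ID + s_ID
    y_pub = (x + t + msk * h_id) % q  # X_ID * T_ID * P_pub^{h_ID}, built from public values
    return secret, y_pub


def tag(secret, fid, i, j, cid, m, Q):
    """T_{i,j} = (H~(Fid, i, j, Cid_i) * Q_ID^{m_{i,j}})^{x_ID + s_ID}."""
    return (h_tilde(fid, i, j, cid) + Q * m) * secret % q


def proof_gen(tags, data, copy_ids, chal):
    """One CSP: sigma_i = prod T^{a_tau} and M_i = sum a_tau * m over its copies."""
    sigma = sum(a * tags[i][v] for i in copy_ids for v, a in chal) % q
    M = sum(a * data[i][v] for i in copy_ids for v, a in chal) % q
    return sigma, M


def aggregate(proofs):
    """CO: multiply the sigma_i (add toy exponents) and add the M_i."""
    return sum(s for s, _ in proofs) % q, sum(M for _, M in proofs) % q


def verify(y_pub, fid, placement, chal, Q, sigma, M):
    """e(sigma, P) == e(prod H~(...)^{a_tau} * Q_ID^M, X_ID T_ID P_pub^{h_ID}); P is exponent 1."""
    base = sum(a * h_tilde(fid, i, v, cid) for i, cid in placement.items() for v, a in chal)
    return pair(sigma, 1) == pair((base + Q * M) % q, y_pub)


# Usage: N = 3 copies of an 8-block file on r = 2 CSPs, challenge on 3 block positions.
secret, y_pub = keygen()
Q = rand_zq_star()
fid, n = "file-1", 8
placement = {1: "csp-A", 2: "csp-A", 3: "csp-B"}  # copy index i -> Cid_i
blocks = {i: [secrets.randbelow(q) for _ in range(n)] for i in placement}
tags = {i: [tag(secret, fid, i, j, placement[i], blocks[i][j], Q) for j in range(n)]
        for i in placement}
chal = [(v, rand_zq_star()) for v in (0, 3, 5)]  # pairs (nu_tau, a_tau)


def run_round(data):
    proofs = [proof_gen(tags, data, [i for i, c in placement.items() if c == cid], chal)
              for cid in ("csp-A", "csp-B")]
    return aggregate(proofs)


sigma, M = run_round(blocks)
print(verify(y_pub, fid, placement, chal, Q, sigma, M))  # True
tampered = {i: list(b) for i, b in blocks.items()}
tampered[3][3] = (tampered[3][3] + 1) % q  # corrupt one challenged block of copy 3
print(verify(y_pub, fid, placement, chal, Q, *run_round(tampered)))  # False, except with probability about 1/q
```

The aggregated pair has the same size whatever the number of copies or challenged blocks, which is the blockless property; only public values enter `verify`.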
The total comparisons of the computational costs are listed in Table 3.
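The communication-cost formulas and the key-generation, tag-generation, and proof-generation counts stated above can be tabulated with a small helper. This is a sketch: the byte sizes passed for $|G_1|$ and $|Z_q^*|$ are placeholder assumptions to be replaced by the sizes of the chosen curve, $|Sig|$ is instantiated with Galindo's IB-signature as in the text, verification costs are left out, and the labels "ours", "[12]" and "[26]" refer to the three compared protocols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizes:
    g1: int  # |G_1| in bytes (placeholder, curve-dependent)
    zq: int  # |Z_q^*| in bytes (placeholder, curve-dependent)


# Exponentiations needed to produce one user's key material.
KEYGEN_EXPS = {"ours": 2, "[12]": 1, "[26]": 2}


def comm_costs(n: int, sz: Sizes) -> dict:
    """Bandwidth in bytes per transfer; None marks a transfer that does not exist."""
    sig = 2 * sz.g1 + sz.zq  # |Sig| for Galindo's IB-signature
    return {
        "ours": {"KGCtoDO": sz.g1 + sz.zq, "DOtoCO": n * sz.g1, "COtoTPA": sz.g1 + sz.zq},
        "[12]": {"KGCtoDO": 2 * sz.g1 + sz.zq, "DOtoCO": n * sz.g1 + sig,
                 "COtoTPA": 3 * sz.g1 + sz.zq + sig},
        "[26]": {"KGCtoDO": None, "DOtoCO": (2 + n) * sz.g1, "COtoTPA": 2 * sz.g1 + sz.zq},
    }


def tag_gen_ops(N: int, n: int) -> dict:
    """(T_exp count, T_mul count) to tag all N copies of n blocks."""
    return {"ours": (2 * N * n, N * n),
            "[12]": (4 * N * n, 2 * N * n),
            "[26]": (3 * N * n, N * n)}


def proof_gen_ops(copies_per_csp: list, ell: int) -> tuple:
    """Ours: each CSP spends |CT_i| * ell * (T_exp + T_mul); the CO then adds r * T_mul."""
    work = sum(ct * ell for ct in copies_per_csp)
    return work, work + len(copies_per_csp)


costs = comm_costs(n=600, sz=Sizes(g1=128, zq=20))
print(costs["[12]"]["COtoTPA"])          # 680, i.e. 5|G_1| + 2|Z_q^*|
print(tag_gen_ops(N=20, n=600)["ours"])  # (24000, 12000)
print(proof_gen_ops([4] * 5, ell=30))    # (600, 605)
```

The last call uses the experimental setting below (r = 5 CSPs with 4 copies each and 30 challenged blocks); changing `Sizes` is enough to redo Table 2 for another curve.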
Experimental Results: To give an intuitive comparison of the three protocols, we implement them on a laptop with an Intel Core i5-6200U CPU @ 2.3 GHz and 2 GB RAM running Ubuntu 14.04 LTS 64-bit and Python 3.4. The experiments use the ''Charm'' framework [1], and we choose the 512-bit SS elliptic curve from the pairing-based cryptography (PBC) library [14] as the basis of these protocols.
We evaluate the computational costs of tag generation, proof generation, and verification. In particular, we first choose a 10M data file and generate $N = 20$ copies. Each copy consists of 600 blocks. We change the block count from 1000 to 6000 with an increment of 1000 in each test. Each reported time is the average over 100 runs. The time consumption of tag generation is depicted in Fig. 2. The response is generated by CO, which aggregates the proofs generated by the CSPs according to TPA's challenge message. Here, the number of challenged blocks varies from $\ell = 30$ to $\ell = 180$ with an increment of 30. Each CSP stores 4 copies, and hence $r = 5$. The time consumption of generating the responses is presented in Fig. 3. Finally, we implement the verification algorithm Verify, which TPA uses to check the validity of the proof returned by CO. The experimental results for different numbers of challenged blocks are depicted in Fig. 4.
Comparisons on Other Aspects: Finally, we compare the three protocols in other aspects, as summarized in Table 4. Since there is no KGC in [26], the key-escrow problem does not arise for it, which is denoted by ''−'' in Table 4.

VI. CONCLUSION
In this paper, we consider the certificateless multi-copy-multi-cloud setting and design an efficient CL-MCMC-PDP protocol. In particular, we introduce a concrete security model for it and, based on the classical CDH assumption, construct a provably secure protocol. The performance analysis shows that our protocol is practical.