Dynamic Multiple-Replica Provable Data Possession in Cloud Storage System

In cloud storage scenarios, data security has received considerably more attention than before. To ensure the reliability and availability of outsourced data and to improve disaster resilience and data recovery capability, important data files possessed by users must be stored on multiple cloud service providers (CSPs). However, CSPs are not always reliable. In this situation, to verify the integrity of replica files stored by users on multiple CSPs simultaneously, a new dynamic multiple-replica provable data possession (DMR-PDP) scheme is proposed. In addition, because of the importance of the tag set, we utilize vector dot products instead of the modular exponentiation used in traditional PDP schemes, which greatly reduces calculation time and storage space usage. Moreover, a novel dynamic data structure, the divided address-version mapping table (DAVMT), is presented and used to support dynamic data operations. Finally, a practical experiment validates the effectiveness of the proposed scheme.


I. INTRODUCTION
Cloud computing [1], [2], as the new generation of distributed computing, parallel computing and grid computing, has achieved rapid development and wide application since it was proposed. The advantages of cloud computing, such as scalability, on-demand service, high reliability, and flexibility, have attracted a large number of users. However, increasingly serious security issues have become the primary obstacle restricting the development of cloud computing.
As mentioned above, cloud storage [3], [4] is an extension of the concept and application of cloud computing and has developed continuously as a new type of network storage technology since it was put forward. Cloud storage is essentially a cloud computing system with data storage and management at its core. Therefore, ensuring the security of outsourced data is also the primary problem restricting the development of cloud storage systems. The security of data storage consists of three aspects: confidentiality, integrity and availability (CIA). Confidentiality guarantees that unauthorized users cannot access the data, integrity guarantees that unauthorized users cannot modify the data illegally, and availability guarantees that legally authorized users can access the data in a timely, accurate and uninterrupted manner. In our study, the integrity aspect of data storage security is our main focus, and the details are discussed as follows.

(The associate editor coordinating the review of this manuscript and approving it for publication was Jun Huang.)
In the cloud storage scenario, when a user uploads local data to the cloud, control over the outsourced data may be totally lost; thus, data integrity becomes a problem. To verify the integrity of outsourced data, the PDP scheme [5] was proposed in 2007. In the PDP scheme, the DO (data owner) calculates a set of homomorphic tags for the outsourced data, uploads them together with the encrypted file to the CSP, and deletes the local file while keeping the secret key. When the DO needs to verify the integrity of the data stored on the cloud, he/she sends a challenge to the CSP, the CSP responds to the challenge, and the DO verifies the response. Unlike traditional integrity verification schemes, the PDP scheme uses a probabilistic sampling method in the integrity verification phase. However, as the PDP scheme in [5] is only applicable to static data and cannot support dynamic data operations (such as update and append), several dynamic PDP schemes have been proposed [6]-[11].
To verify the integrity of replica files stored by users on multiple CSPs at the same time, MR-PDP [12] was proposed, which shows that verifying the integrity of multiple replica files simultaneously is much more efficient than running integrity verification on each single replica separately. In the MR-PDP scheme, the DO calculates the tag set for the source file. This scheme provides a method to recover the data of a corrupted replica: if the CSP discards some data for some reason (e.g., discarding uncommon files to save space, or losing files due to server downtime), the DO can recover the discarded data with the help of the remaining replicas. However, this scheme cannot support dynamic data operations and does not support public verification. In [13], the authors proposed the MB-PMDDP scheme for dynamic operations on replica files, in which the tag generation method is completely different from [12]. With the help of the MVT table, the MB-PMDDP scheme realizes dynamic data operations. However, because the tag set is closely related to the replicas, if some replicas are discarded or damaged, integrity verification inevitably fails, and the damaged replicas cannot be located. The scheme in [14] combines chaotic mapping with an AVL tree to realize dynamic data operations. With the help of the AVL tree, this scheme greatly improves the performance of data block search and reduces the difficulty and overhead of dynamic data operations. However, its tag generation method is similar to that of [13].
In the above three schemes, the tag generation methods all require complex modular exponentiation, which greatly increases the system's computational cost. In [15], [16], the authors propose schemes based on lattice vectors and algebraic signatures, and further schemes have been put forward for other problems [17]-[21]. Therefore, how to design a tag generation method that saves system calculation time and storage space, and how to realize dynamic operations on data blocks, have become problems worthy of consideration.

A. MAIN CONTRIBUTION
In this paper, we propose a new dynamic multiple-replica provable data possession (DMR-PDP) scheme that can verify the integrity of replica files stored by users on multiple CSPs simultaneously. Compared with previous schemes, our scheme has lower computational cost and saves system storage space. The main contributions can be highlighted as follows:
(1) We present a dynamic multiple-replica provable data possession scheme, named DMR-PDP, which can simultaneously verify the integrity of replica files stored on multiple CSPs. In this scheme, tag generation is based on vector dot products instead of modular exponentiation, which greatly reduces calculation time and storage space.
(2) To realize dynamic operations on data blocks, we propose a novel dynamic data structure, the divided address-version mapping table (DAVMT). With the help of the DAVMT, the problem of data block update operations can be solved.
(3) By comparing experimental results with MB-PMDDP, we verify the validity of the proposed scheme.

B. PAPER ORGANIZATION
The rest of the paper is organized as follows. In Section II, we describe the system model and system components. The proposed scheme is explained in Section III. The security analysis is presented in Section IV, and the experiment in Section V. Section VI concludes the paper.

II. PROBLEM STATEMENT
Here, the architecture and components of the system are described in detail.

A. SYSTEM MODEL
The system architecture is shown in Fig. 1 and consists of three entities: (1) Data Owner (DO): Individuals/institutions/organizations who hold private data and need to store these data in the cloud.
(2) Cloud Service Provider (CSP): Corporation that provides data storage services for DOs.
(3) Authorized Users: Users who are authorized by DOs and have access to the private data stored with the CSP.
The DMR-PDP scheme consists of the following algorithms:
- KenGen: this algorithm is run by the DO to generate the public key (pk) and the secret key (sk).
- CopyGen: this algorithm is run by the DO to generate replica files, where the source file is F = {c_j}_{1≤j≤m}, the encrypted file is F̃ = {b_j}_{1≤j≤m}, the number of replicas is n, and the replica set is {F̃_i}_{1≤i≤n}.
- TagGen: this algorithm is run by the DO to create the tag set. It takes F̃ and sk as input and outputs T = {σ_j}_{1≤j≤m}.
- PrepareUpdate: this algorithm is run by the DO to update the replica files.
- ExecUpdate: this algorithm is run by the CSP. After receiving an update instruction from the DO, the CSP performs the update operation on the specified data block.
- Challenge: this algorithm is run by the DO/Verifier to verify the integrity of the replica files.
- GenProof: this algorithm is run by the CSP. It takes F̃, T and chal as input and outputs the proof P.
- VerifyProof: this algorithm is run by the DO/Verifier. It takes P, pk and chal as input and outputs {0/1}, where 1 indicates that validation is passed and 0 indicates a failure.

B. DIVIDED ADDRESS-VERSION MAPPING TABLE
In this paper, we propose a novel divided address-version mapping table (DAVMT), a dynamic data structure used to implement update operations. With the DAVMT, the DO can perform update operations on data blocks.

The DAVMT contains two columns: the logic index (LI_j), which is the logical index of a data block, and the block version (Ver_j), which is the current version of that block. The initial Ver_j of every data block is 1, and each time a block is updated, its Ver_j is incremented by one. According to the total number of data blocks, the DAVMT can be divided into several child tables. Note that the DAVMTs are stored by the DO and are independent of the number of replicas. For example, for a file with 18 data blocks, we can divide the table into 3 child tables (DAVMTs) to support dynamic update operations, as shown in Fig. 2.
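To make the structure concrete, the following Python sketch models a DAVMT split into child tables. The class and method names (DAVMT, child_of, bump) are our own illustration; the paper only specifies the two columns (LI_j, Ver_j) and the split into child tables.

```python
# Illustrative sketch of a divided address-version mapping table (DAVMT).
# The split rule (contiguous equal-size ranges) is an assumption for
# illustration; the paper only shows the division into child tables.

class DAVMT:
    def __init__(self, num_blocks, num_children):
        # Every block starts at version 1.
        self.versions = {j: 1 for j in range(1, num_blocks + 1)}
        self.chunk = -(-num_blocks // num_children)  # ceiling division

    def child_of(self, j):
        """Index of the child table holding logic index j."""
        return (j - 1) // self.chunk

    def bump(self, j):
        """Record an update of block j: Ver_j <- Ver_j + 1."""
        self.versions[j] += 1
        return self.versions[j]

# 18 blocks split into 3 child tables, as in the paper's example (Fig. 2).
t = DAVMT(18, 3)
t.bump(7)
print(t.child_of(7), t.versions[7])  # block 7 lives in child table 1, now at version 2
```

Because the child tables partition the logic indices, an update only touches one small table, which keeps the DO-side bookkeeping cheap regardless of the number of replicas.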

C. SPECIFIC ALGORITHMS
KenGen: Let ê : G × G → G_T be a bilinear pairing map, and let g be a generator of G. The DO selects a group of random numbers α = (α_1, α_2, ..., α_s), where s is the number of sectors in a block, and defines g_k = g^{α_k}, k ∈ [1, s]. K_s, K_1 and s_o are keys of the PRP: K_s is used in the replica generation phase, K_1 is used in the challenge phase, and s_o is used to generate the block tags. K_2 is the key of the PRF and is used in the challenge phase. The DO runs KenGen to generate the public key pk = (g_1, g_2, ..., g_s, K_1, K_2) and the corresponding secret key sk = (α, K_s, s_o). The DO keeps sk secret and publishes pk.
CopyGen: The DO runs this algorithm to generate replica files. For a file F = {c_j}_{1≤j≤m} consisting of m data blocks (every block consists of s sectors, c_j = {c_jk}_{1≤k≤s}), the DO encrypts the file to obtain F̃ = {b_j}_{1≤j≤m} (b_j = E_K(c_j||j)) and uses F̃ to generate the replica files F̃_i = {u_ij}_{1≤i≤n,1≤j≤m}, where the minimum unit of every replica is the sector u_ijk = b_jk + r_ijk, with r_ijk = ψ_{K_s}(i||j||k), 1≤k≤s. Fig. 3 shows how the replica files are generated.
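The replica generation step can be sketched as follows. The modulus P and the use of HMAC-SHA256 as the pseudorandom function ψ are our own illustrative choices; the paper leaves the PRF and the sector arithmetic abstract.

```python
import hmac, hashlib

P = 2**61 - 1  # illustrative modulus for sector arithmetic (our choice)

def prf(key: bytes, i: int, j: int, k: int) -> int:
    # Stand-in for psi_{K_s}(i||j||k); the paper leaves the PRF abstract.
    msg = f"{i}|{j}|{k}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % P

def make_replicas(blocks, n, K_s):
    """blocks: list of m blocks, each a list of s sectors b_jk (integers).
    Returns n replicas with u_ijk = b_jk + r_ijk mod P."""
    return [
        [[(b_jk + prf(K_s, i, j, k)) % P for k, b_jk in enumerate(block, 1)]
         for j, block in enumerate(blocks, 1)]
        for i in range(1, n + 1)
    ]

K_s = b"replica-key"
blocks = [[11, 22, 33], [44, 55, 66]]        # m = 2 blocks, s = 3 sectors
replicas = make_replicas(blocks, n=2, K_s=K_s)
# Any replica can be stripped back to the encrypted file: b_jk = u_ijk - r_ijk
b = (replicas[0][0][0] - prf(K_s, 1, 1, 1)) % P
print(b == 11)  # True
```

Because the mask r_ijk depends on the replica index i, the n replicas are pairwise distinct even though they all encode the same encrypted file.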
TagGen: The DO runs this algorithm to generate the tag set. Note that the tag set is generated from F̃ and is independent of the number of replicas. We use vector dot products to generate the block tags, which saves tag generation time compared with modular exponentiation. First, the DO applies the PRP to α and obtains β_j = PRP_{s_o}(j, α). Then, the DO computes σ̂_j = Σ_{k=1}^{s} β_jk · b_jk, and the block tag for b_j is σ_j = H(LI_j||Ver_j||j) · σ̂_j; thus, the tag set is T = {σ_j}_{1≤j≤m}. Finally, the DO uploads all replicas F̃_i and the tag set T to the CSP, keeps sk secret and deletes the local file.
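A minimal sketch of the dot-product tag computation follows. A deterministically seeded shuffle stands in for the PRP, and SHA-256 (mapped to an integer) stands in for H; the modulus and all constants are illustrative assumptions, not the paper's concrete instantiation.

```python
import hashlib, random

P = 2**61 - 1  # illustrative prime-like modulus (our choice)

def H(LI, Ver, j):
    # Stand-in for H(LI_j || Ver_j || j), mapped to an integer.
    d = hashlib.sha256(f"{LI}|{Ver}|{j}".encode()).digest()
    return int.from_bytes(d, "big") % P

def tag(alpha, b_j, LI, Ver, j, s_o):
    # beta_j = PRP_{s_o}(j, alpha): a keyed permutation of alpha; we model
    # the PRP with a deterministically seeded shuffle (an assumption).
    beta = alpha[:]
    random.Random(f"{s_o}|{j}").shuffle(beta)
    sigma_hat = sum(bk * bb for bk, bb in zip(beta, b_j)) % P  # dot product
    return (H(LI, Ver, j) * sigma_hat) % P

alpha = [3, 7, 11]          # secret vector, s = 3 sectors
b_1 = [5, 6, 8]             # one encrypted block
sigma_1 = tag(alpha, b_1, LI=1, Ver=1, j=1, s_o=42)
print(0 <= sigma_1 < P)     # one multiply-and-add per sector, no exponentiation
```

The cost per tag is s multiplications and additions plus one hash, which is where the saving over per-block modular exponentiation comes from.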
PrepareUpdate: The DO runs the PrepareUpdate algorithm to update data blocks. The detailed steps are given in Subsection D (Update Operations).

ExecUpdate: After receiving the update instruction for a data block, the CSP runs the ExecUpdate algorithm. The detailed steps are given in Subsection D (Update Operations).

Challenge: The DO/Verifier runs the Challenge algorithm to verify the integrity of the replica files. The DMR-PDP scheme provides two data integrity verification methods: the first verifies the integrity of all replica files; the second verifies the integrity of specified data blocks in the replica files and is the one discussed in the proposed scheme. The DO/Verifier sends chal = (R, c, K_1, K_2) to the CSP, where R is the set of challenged replicas and c is the number of challenged blocks. K_1 and K_2 are two fresh keys selected by the DO/Verifier in every challenge phase: K_1 is a key of the PRP (π) used to generate random indices, and K_2 is a key of the PRF (ψ) used to generate random values, where j = π_{K_1}(l) and ω_j = ψ_{K_2}(l), 1≤l≤c. Both the DO and the CSP use K_1, K_2 to generate the challenge query set Q = {(j, ω_j)}.
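The shared derivation of the query set Q can be sketched as follows, with a keyed-hash ranking standing in for the PRP π and HMAC standing in for the PRF ψ; both choices are our own illustrative assumptions.

```python
import hmac, hashlib

def prp_indices(K1: bytes, c: int, m: int):
    # Stand-in for pi_{K1}: pick c distinct block indices from [1, m]
    # by ranking all indices under a keyed hash (the paper's PRP is abstract).
    ranked = sorted(range(1, m + 1),
                    key=lambda j: hmac.new(K1, str(j).encode(), hashlib.sha256).digest())
    return ranked[:c]

def prf_coeff(K2: bytes, l: int) -> int:
    # Stand-in for psi_{K2}(l): a pseudorandom coefficient omega_j.
    return int.from_bytes(hmac.new(K2, str(l).encode(), hashlib.sha256).digest()[:8], "big")

def challenge_query(K1: bytes, K2: bytes, c: int, m: int):
    """Both the DO/Verifier and the CSP derive the same Q = {(j, omega_j)}."""
    return [(j, prf_coeff(K2, l)) for l, j in enumerate(prp_indices(K1, c, m), 1)]

Q_do = challenge_query(b"k1", b"k2", c=3, m=10)
Q_csp = challenge_query(b"k1", b"k2", c=3, m=10)
print(Q_do == Q_csp)  # True: the two fresh keys alone determine Q on both sides
```

Transmitting only (R, c, K_1, K_2) instead of the full index list keeps the challenge message constant-size in c.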
GenProof: The CSP runs the GenProof algorithm to prove the integrity of the data blocks. After receiving chal from the DO/Verifier, the CSP uses K_1, K_2 to generate the challenge query set Q = {(j, ω_j)}. Then, the CSP generates the proof

µ = Π_{k=1}^{s} g_k^{ Σ_{i∈R} Σ_{(j,ω_j)∈Q} ω_j · u_ijk · H(LI_j||Ver_j||j) },  σ = Σ_{(j,ω_j)∈Q} ω_j · σ_j,

and returns P = (µ, σ) to the DO/Verifier. The challenge-response process between the DO/Verifier and the CSP is given in Fig. 4.
VerifyProof: After receiving the response from the CSP, the DO/Verifier runs the VerifyProof algorithm to check whether the following equation holds:

e(g, g^{|R|·σ}) · e(g, Π_{k=1}^{s} g_k^{ Σ_{i∈R} Σ_{(j,ω_j)∈Q} ω_j · r_ijk · H(LI_j||Ver_j||j) }) = e(g, µ),  (1)

where |R| is the size of the challenged replica set. If equation (1) holds, the output ''1'' indicates that the CSP passes the check; otherwise, the output is ''0''. The correctness of equation (1) follows by expanding u_ijk = b_jk + r_ijk in the exponent of µ and applying the bilinearity of ê.
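The pairing check in equation (1) ultimately rests on the linearity of the dot-product tags. The following toy check, carried out over the integers with β_j taken equal to α for every j (the case in which the correctness argument goes through directly), illustrates the exponent-level identity; it is a sanity check of the aggregation, not the pairing-based protocol itself, and all values are made up.

```python
# Exponent-level identity behind equation (1): the aggregated tag
# sigma = sum omega_j * sigma_j equals the verifier-side recomputation
# sum_k alpha_k * (sum omega_j * H_j * b_jk), assuming beta_j = alpha.

alpha = [3, 7, 11]                      # secret vector, s = 3 sectors
blocks = {1: [5, 6, 8], 2: [2, 9, 4]}   # encrypted blocks b_j
h = {1: 13, 2: 17}                      # stand-ins for H(LI_j||Ver_j||j)
Q = [(1, 21), (2, 35)]                  # challenge set {(j, omega_j)}

# CSP side: fold the per-block tags sigma_j = H_j * (alpha . b_j)
sigma = sum(w * h[j] * sum(a * b for a, b in zip(alpha, blocks[j])) for j, w in Q)

# Verifier side: recompute sector by sector, as the g_k exponents would
lhs = sum(a_k * sum(w * h[j] * blocks[j][k] for j, w in Q)
          for k, a_k in enumerate(alpha))

print(sigma == lhs)  # True: aggregation and recomputation agree
```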

D. UPDATE OPERATIONS
To update a data block, the DO sends an instruction to the CSP, and the CSP performs the update operation. We denote the update operation by BU.
For an encrypted file F̃ = {b_j}_{1≤j≤m}, assume that the DO wants to update a block b_j with a new block b'_j. The DO runs the PrepareUpdate algorithm with the following steps:
(1) The DO searches for the logic index LI_j of b_j in the DAVMTs and updates Ver_j = Ver_j + 1;
(2) The DO creates the new block b'_j, where b'_j = E_K(c'_j||j) has s sectors; then, the DO computes u'_ij for every F̃_i, where u'_ijk = b'_jk + r_ijk, and r_ijk = ψ_{K_s}(i||j||k), 1≤k≤s, remains unchanged;
(3) The DO computes the tag σ'_j of b'_j as σ'_j = H(LI_j||Ver_j||j) · σ̂'_j, where σ̂'_j = Σ_{k=1}^{s} β_jk · b'_jk;
(4) The DO sends the update instruction <BU, j, {u'_ij}_{1≤i≤n}, σ'_j> to the CSP (this instruction means that the DO wants to modify b_j: j indicates the location of b_j, the u'_ij are the new masked blocks, and σ'_j is the tag of the new block).
After receiving the instruction, the CSP performs the ExecUpdate algorithm to update the block:
(1) The CSP replaces u_ij with u'_ij (u_ij → u'_ij) and σ_j with σ'_j (σ_j → σ'_j);
(2) The CSP returns a ''done'' instruction to the DO.
An example of update operations with the help of the DAVMTs is shown in Fig. 5.
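The PrepareUpdate / ExecUpdate round trip for one block can be sketched as follows, using plain dictionaries in place of the DO's DAVMT and the CSP's storage. HMAC again stands in for ψ, and the modulus is an illustrative choice; tag recomputation is omitted to keep the sketch short.

```python
import hmac, hashlib

P = 2**61 - 1

def prf(key, i, j, k):
    # Stand-in for psi_{K_s}(i||j||k).
    return int.from_bytes(hmac.new(key, f"{i}|{j}|{k}".encode(),
                                   hashlib.sha256).digest(), "big") % P

K_s = b"replica-key"
n, j = 2, 1                                     # two replicas, updating block 1
ver = {1: 1}                                    # DAVMT: Ver_j per logic index
csp = {(i, 1): None for i in range(1, n + 1)}   # CSP-side storage of u_ij

# PrepareUpdate (DO side): bump Ver_j, then re-mask the new block per replica.
new_b = [9, 9, 9]                               # b'_j with s = 3 sectors
ver[j] += 1
update = {i: [(bk + prf(K_s, i, j, k)) % P
              for k, bk in enumerate(new_b, 1)] for i in range(1, n + 1)}

# ExecUpdate (CSP side): replace u_ij with u'_ij.
for i, u in update.items():
    csp[(i, j)] = u

print(ver[j], (csp[(1, 1)][0] - prf(K_s, 1, 1, 1)) % P)  # 2 9
```

Because the masks r_ijk are unchanged, only the new block and its tag travel to the CSP; nothing else in the replicas needs to be touched.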

E. FIND AND RECOVER CORRUPTED REPLICA
In the challenge phase, if the CSP fails to pass the integrity verification, it indicates that there are corrupted blocks in the replica files.
How are corrupted replicas found? When the CSP fails to pass the integrity verification in the challenge phase, the proposed scheme makes it possible to locate the corrupted replicas and corrupted blocks. Because our scheme can challenge any number of replicas, we only need to adjust chal = (R, c, K_1, K_2) several times and use a recursive divide-and-conquer method to pinpoint the corrupted replicas and corrupted blocks.
How can corrupted replicas be recovered? The DO retrieves a correct replica and then recovers the corrupted replica from it. Assume that the corrupted replica is replica q and that u_ijk is a sector of a correct replica i. The steps are as follows: (1) The DO computes b_jk = u_ijk − r_ijk; (2) The DO computes u_qjk = b_jk + r_qjk; (3) The DO sends the corrected replica to the CSP.
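Steps (1)-(2) can be sketched per sector as follows; HMAC stands in for ψ and the modulus is illustrative, as before.

```python
import hmac, hashlib

P = 2**61 - 1

def prf(key, i, j, k):
    # Stand-in for psi_{K_s}(i||j||k).
    return int.from_bytes(hmac.new(key, f"{i}|{j}|{k}".encode(),
                                   hashlib.sha256).digest(), "big") % P

def recover_sector(u_ijk, K_s, i, q, j, k):
    """Rebuild replica q's sector from a correct replica i's sector:
    strip replica i's mask, then apply replica q's mask."""
    b_jk = (u_ijk - prf(K_s, i, j, k)) % P      # step (1)
    return (b_jk + prf(K_s, q, j, k)) % P       # step (2)

K_s = b"replica-key"
b_jk = 123
u_1 = (b_jk + prf(K_s, 1, 1, 1)) % P            # sector of healthy replica 1
u_2 = recover_sector(u_1, K_s, i=1, q=2, j=1, k=1)
print((u_2 - prf(K_s, 2, 1, 1)) % P)  # 123: replica 2's sector is rebuilt
```

Only the secret key K_s is needed for recovery, so the DO can rebuild any replica from any other without re-encrypting the source file.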

IV. SECURITY ANALYSIS
In this section, we analyze the correctness and security of our proposed scheme. In this paper, we assume that the DO is fully trusted, but the CSP is not trusted and may maliciously corrupt data blocks.

Theorem 1 (Correctness): If the CSP stores all challenged replicas intact, the proof P = (µ, σ) generated by GenProof satisfies equation (1).

Proof: The Verifier checks the proof received from the CSP. Based on the properties of bilinear mapping, if the output of equation (1) is ''1'', the correctness of our scheme is established.

Theorem 2 (Resisting Collusion Attack):
The CSP cannot convince the DO that it stores all replicas when it actually stores only one replica.

Proof: In our scheme, the DO generates n replicas and stores them on the CSP, but the CSP cannot know the content of the replicas and only executes the storage service. If the CSP stores only one replica, it cannot pass the verifier's validation in the verification phase. That is, the CSP cannot convince the DO that it stores all replicas when it actually stores only one.
Theorem 3 (Resisting Replace Attack): In the challenge phase, if the CSP uses another valid and uncorrupted data block instead of a challenged block to generate P, it cannot pass the Verifier's check in the verification phase.

Proof: In the replica generation phase, the minimum unit of every replica is u_ijk = b_jk + r_ijk, where r_ijk = ψ_{K_s}(i||j||k), 1≤k≤s, is a random value related to the indices i, j, k. In the challenge phase, assume that the CSP replaces the challenged block index j with another index j'; then r_ijk changes into r_ij'k and u_ijk changes into u_ij'k correspondingly. In the proof generation phase, the CSP runs the GenProof algorithm to generate P' = (µ', σ') and responds to the Verifier, but this proof cannot pass the verifier's check of equation (1).
Theorem 4 (Detectability): Our verification scheme is (m/n, 1 − ((n−1)/n)^c)-detectable if the CSP stores a file with n blocks, including m bad blocks (corrupted or discarded blocks), and c blocks are challenged.

Proof: Assume that the CSP stores a file with n blocks, including m bad blocks, and that the number of challenged blocks is c. The bad blocks can be detected if and only if at least one of the challenged blocks chosen by the verifier matches a bad block. Let the discrete random variable X denote the number of challenged blocks that match bad blocks, and let P_X denote the probability that at least one bad block in the challenge set is detected. Since m ≥ 1, we obtain P_X ≥ 1 − ((n−1)/n)^c. Therefore, our scheme is (m/n, 1 − ((n−1)/n)^c)-detectable if the CSP stores a file with n blocks including m bad blocks, and c blocks are challenged.
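The bound above can be checked numerically. With m bad blocks out of n and c challenge blocks drawn uniformly with replacement, the exact detection probability is P_X = 1 − ((n − m)/n)^c, which is at least the theorem's bound 1 − ((n − 1)/n)^c whenever m ≥ 1; the concrete numbers below are our own illustration.

```python
# Detection probability for sampling-based verification.

def detection_prob(n, m, c):
    # Exact probability of hitting at least one of m bad blocks in c
    # uniform draws (with replacement) from n blocks.
    return 1 - ((n - m) / n) ** c

def paper_bound(n, c):
    # The theorem's lower bound, i.e., the m = 1 worst case.
    return 1 - ((n - 1) / n) ** c

n, c = 10_000, 460
for m in (1, 10, 100):
    assert detection_prob(n, m, c) >= paper_bound(n, c)
print(round(detection_prob(n, 100, c), 3))  # challenging ~460 of 10,000 blocks
# detects 1% corruption with probability about 0.99
```

This is the usual argument for why a constant-size challenge (a few hundred blocks) suffices regardless of file size.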

V. EXPERIMENT
All experiments use OpenSSL (1.1.1d) and are conducted on a Windows 10 operating system with a 3.30 GHz Intel(R) Core(TM) i5 processor and 16 GB RAM. We use the type A elliptic curve of the PBC library with a 160-bit group order, whose security level is equivalent to 1024-bit RSA. This section demonstrates the feasibility of our DMR-PDP scheme through experiments. For better evaluation, we choose the MB-PMDDP [13] scheme for comparison.
In the replica and tag processing phase, the tag set is generated from the encrypted file and is independent of the number of replicas. Fig. 6 shows the difference between DMR-PDP and MB-PMDDP [13] in the cost of tag generation as the number of replicas increases. Due to the different tag generation methods, our DMR-PDP scheme takes approximately the same time regardless of the number of replicas. In the MB-PMDDP scheme, however, the time cost of tag generation increases linearly with the number of replicas. Therefore, our DMR-PDP scheme is more efficient than the MB-PMDDP scheme in terms of tag generation. Fig. 7 displays the verification time cost of the DMR-PDP and MB-PMDDP schemes for different numbers of replicas. Our scheme has a slightly higher time cost than MB-PMDDP, but the increase is tolerable.
By comparing the experimental results with those of the MB-PMDDP scheme, we verify the validity of the scheme proposed in this paper.

VI. CONCLUSION
At present, data security has become a hot research topic in cloud storage scenarios, and data integrity has become increasingly worthy of attention. To verify the integrity of replica files stored by users on multiple CSPs simultaneously, we proposed a new DMR-PDP scheme in this paper. Furthermore, we provided a new method for generating the tag set by utilizing vector dot products. Then, with the help of the dynamic data structure DAVMT, the problem of dynamic data operations was solved. Finally, the effectiveness of our proposed scheme was validated. In future work, we will consider designing a better data structure to achieve fully dynamic operations. In addition, how to generate the tag set so as to save calculation time and storage space remains a problem to be considered.