An Authorized Public Auditing Scheme for Dynamic Big Data Storage in Cloud Computing

Nowadays, to utilize the abundant resources of cloud computing, most enterprise users prefer to store their big data on cloud servers for sharing and utilization. However, data stored on remote cloud servers is out of the user's control and is exposed to many security problems, such as data availability, unauthorized access and data integrity, among which ensuring data integrity is a challenging and urgent task in cloud computing. Many auditing schemes have been proposed to check the integrity of data in the cloud, but these schemes usually have two disadvantages. One is that they cannot identify which block is corrupt when the data integrity check fails. The other is that there is no efficient authenticated data structure to support accurate auditing when the data needs to be updated frequently. To solve these problems, we propose a public auditing scheme for dynamic big data storage in cloud computing. Firstly, we design a dynamic index table in which no elements need to be moved during insertion or deletion update operations. Secondly, when the data in the cloud fails the integrity check, the third-party auditor can detect which block is corrupt. Finally, an authorization mechanism is employed between the third-party auditor and the cloud servers to prevent denial-of-service attacks. The theoretical analysis and the simulation results demonstrate that our scheme is more secure and efficient.


I. INTRODUCTION
Cloud computing is a fast-developing business computing model with many advantages, such as large storage capacity, low cost, and scalability. More and more enterprise users are apt to outsource their big data to cloud servers for storage and processing. After outsourcing, enterprise users usually delete the original big data from their local storage servers to save storage space. Despite the convenience of cloud computing, storing data on cloud servers without possessing the original copy may bring about many security problems. Cloud servers are occasionally subject to hardware or software failures and malicious attacks [1]-[3], and for their own benefit, cloud service providers (CSP) are reluctant to tell users the truth when such failures or attacks occur. Even worse, the CSP might discard data that users rarely or never access to save maintenance costs or cloud storage space [4]-[6]. So the CSP must provide proof that the data is correctly stored on the cloud servers before the data is utilized by users. How to ensure the integrity of the data on cloud servers is a challenging and urgent task in cloud computing. Data auditing schemes enable users to verify the integrity of their data on remote cloud servers without retrieving the data. Based on the role of the verifier, data auditing schemes can be divided into two categories: private auditing and public auditing. In private auditing schemes [7]-[9], the data integrity is verified only by the users themselves, which imposes an overhead that users often cannot afford. Public auditing schemes, in contrast, allow any public verifier who holds the user's public key to execute the auditing process. Commonly, a third-party auditor (TPA) with the necessary expertise and capabilities is involved to perform the verification.
(The associate editor coordinating the review of this manuscript and approving it for publication was Adnan Abid.)
Many auditing schemes have been proposed to check data integrity in the cloud. However, these schemes cannot identify which block is corrupt when the data has been tampered with. Furthermore, there is no efficient authenticated data structure that helps to achieve accurate auditing when the data needs to be updated frequently. Therefore, it is essential to propose an efficient public auditing scheme for dynamic big data storage in cloud computing.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this paper, we propose an authorized dynamic public auditing scheme by introducing a new data structure named the dynamic index table (DIT). Through the DIT, our scheme can achieve dynamic updating without adjusting the table's elements. Additionally, our scheme can judge which block is lost or corrupt when the integrity check fails. Our contributions are summarized as follows.
1) We propose an authorized dynamic public auditing scheme that can identify which block is corrupt.
2) We design an efficient authenticated data structure named the dynamic index table (DIT), which stores block properties to help the TPA carry out data auditing and can be updated without moving elements.
3) We prove the security of the proposed scheme and evaluate its computation and communication costs. The results show that our scheme is more efficient than existing ones.
The rest of the paper is organized as follows. Section II introduces the related works on integrity verification. Section III describes the system model, threat model and design goals of the scheme. Section IV addresses the preliminaries of the scheme. In Section V, we present the proposed scheme in detail. Sections VI and VII present the security analysis and the performance of the scheme in terms of computation and communication costs. Finally, we conclude this paper in Section VIII.

II. THE RELATED WORKS
So far, many typical public auditing schemes have been proposed to verify the integrity of data stored on remote untrusted servers. In 2007, Ateniese et al. [10] proposed the first public auditing scheme, which introduced provable data possession (PDP). This scheme allows any public verifier to check data integrity without retrieving the data, but it can only verify static data. Later, Ateniese et al. [11] proposed another scheme, based on symmetric-key PDP, to audit dynamic data on cloud servers. This scheme supports dynamic modification and deletion operations, but does not support insertion. To improve update efficiency, an authenticated data structure is usually introduced. Erway et al. [12] introduced an authenticated skip list in their dynamic provable data possession (DPDP) scheme. Later, Wang et al. [13] proposed a dynamic public auditing scheme based on the Merkle Hash Tree (MHT). The scheme supports dynamic data operations, but it incurs considerable computation and communication overhead during the verification process. In scheme [14], Zhu et al. introduced an index-hash table (IHT), stored at the TPA side, to support dynamic verification. Compared with other schemes, it is more efficient in computation and communication costs. However, during updates, since the IHT is a sequential data structure, on average half of its elements must be adjusted, which decreases system efficiency. In 2013, Yang and Jia [15] proposed an index table (ITable) to store the abstract information of blocks, including the current and original index number, current version number and timestamp of each block. It efficiently prevents replay attacks, but in insert and delete operations, the tags of all blocks after the deleted or inserted one need to be recomputed, because the indexes of these blocks change. Liu et al.
[16] put forward an authorized public auditing scheme for big data with efficient verifiable fine-grained updates. Later, in 2017, Tian et al. [17] proposed a dynamic-hash-table based auditing scheme for cloud storage. In 2018, Gan et al. [18] designed an efficient and secure auditing scheme for outsourced big data based on algebraic signatures. Zhang et al. [19] proposed a cloud storage auditing scheme for shared big data. In 2020, Lu et al. [20] proposed an integrity verification scheme for Internet of Things (IoT) mobile terminal devices. In that scheme, block-tag generation and integrity verification operations are executed at the third-party auditor (TPA) side, which makes the data owners' operations lightweight. However, the data structures employed in all these schemes cannot prevent replay attacks during the integrity verification process. Therefore, it is crucial to develop a more secure auditing scheme that achieves dynamic integrity verification. Table 1 compares our scheme with other typical schemes in terms of dynamic auditing, batch auditing, data structure and authorized auditing. Many other integrity verification schemes have also been proposed and have promoted the security development of cloud computing. Because data stored in the cloud for sharing faces many privacy challenges, such as identity privacy and shared-data privacy, schemes [2], [21]-[28] proposed privacy-preserving auditing protocols to prevent privacy leakage. At the same time, with the development of the Internet of Things and mobile devices, lightweight schemes [29]-[35] have been proposed to satisfy the efficiency needs of the auditing process. In recent years, many schemes [36]-[40] based on identity-based encryption and attribute-based encryption have been put forward to realize data sharing with other authorized users in the cloud.

III. SYSTEM MODEL, SECURITY REQUIREMENTS AND DESIGN GOALS
A. SYSTEM MODEL
We describe the system model as illustrated in fig. 1. It involves the enterprise user (user), the cloud service provider (CSP) and the third-party auditor (TPA). The user generates and outsources a massive amount of data to the cloud servers (CS), which have a large capacity to maintain the user's data. The CSP manages the cloud servers and gives the user access anywhere over an Internet connection. The TPA is an entity authorized by the user that has the expertise and resources to verify data integrity efficiently. In the system, we assume that both the TPA and the CSP are semi-trusted. The TPA is semi-trusted because it may be curious about the user's data, so the scheme must preserve the privacy of the outsourced data from the TPA. The CSP is semi-trusted because, when some data on the cloud servers is corrupt or lost, the CSP may launch a forge attack or a replace attack against the TPA for economic reasons.

B. SECURITY REQUIREMENT
Public auditing. TPA can publicly verify the integrity of outsourced data for user.
Authorized auditing. Only the authorized TPA can launch auditing challenge to avoid replay attack.
Data Privacy. TPA cannot learn the content of data stored in cloud servers in public auditing process.
Unforgeability. Only the user can generate the block tags for auditing.
Storage integrity. The integrity verification can succeed only if the CSP correctly stores the data blocks and the corresponding block tags.

C. DESIGN GOALS
Based on the system model and security requirements, our scheme should achieve the following properties.
Security requirements. The scheme should satisfy the security requirements including data privacy, authorization and unforgeability during integrity verification process.
Lightweight operations. Both the computation and communication costs of user are greatly reduced in our auditing scheme because TPA is responsible for generating block tags and managing DIT.
Effectiveness. The scheme should effectively achieve data auditing process under user's authorization.

IV. PRELIMINARIES
A. NOTATIONS
The notations used in this paper are described in Table 2.

B. BILINEAR MAPS
Suppose G_1 and G_2 are two multiplicative groups with the same large prime order q, and g is a generator of G_1. A bilinear map is a function e: G_1 × G_1 → G_2 with the following properties: i) Computability: for all u, v ∈ G_1, there exists an efficient algorithm to compute e(u, v). ii) Bilinearity: for all a, b ∈ Z_q, e(u^a, v^b) = e(u, v)^{ab}. iii) Nondegeneracy: e(g, g) ≠ 1. iv) Security: it is hard to compute the Discrete Logarithm (DL) in G_1.
C. COMPLEXITY ASSUMPTIONS
1) Discrete Logarithm (DL) Assumption. Suppose g is a generator of a multiplicative cyclic group G with prime order q. On input y ∈ G, no probabilistic polynomial-time algorithm can output a value x ∈ Z_q^* such that g^x = y with non-negligible probability.

2) Computational Diffie-Hellman (CDH) Assumption.
Suppose g is a generator of a multiplicative cyclic group G with prime order q. On input g^x, g^y ∈ G, no probabilistic polynomial-time algorithm can output g^{xy} ∈ G with non-negligible probability.

V. CONSTRUCTIONS OF SECURE DATA SHARING SCHEME
A. DYNAMIC INDEX TABLE
To achieve public integrity verification efficiently, we design an authenticated data structure named the dynamic index table (DIT), whose element for each block consists of Bid_i, Hash_i, T_i, V_i and Next_i. Hash_i is mainly used to check which block is corrupt when the integrity check fails. T_i and V_i are used to avoid attacks from adversaries. Next_i points to the successor of the current block, connecting the file together. For example, Next_1 = 2 means that the block following m_1 is m_2, and Next_n = 0 means that m_n is the final block of the file. The initial DIT is described in Table 3. When block m_i is deleted, Next_{i−1} is set from its previous value i to the value i+1, which means the block following m_{i−1} is now m_{i+1}. Moreover, Next_i is set to −1, indicating that m_i has been deleted from the file F and that a new element can be stored in this position. Table 4 describes the modified DIT after m_i is deleted. When a new block m_i is inserted after m_{i−1}, Next_{i−1} is changed to i and the previous Next_i is changed to i+1. If there are no invalid rows, the information of m_i is appended in the last position, with only the corresponding pointers updated. Table 5 describes the modified DIT after m_i is inserted after block m_{i−1}.
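The pointer logic above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the field names mirror Bid_i, Hash_i, T_i, V_i and Next_i, the hash/version values are placeholders, and the sketch assumes the deleted or insertion-anchor block is not the head of the chain.

```python
# Minimal sketch of the Dynamic Index Table (DIT) pointer logic.
# Rows never move: delete marks a row free (nxt = -1), insert reuses a
# free row or appends, and only the affected pointers change.

class DIT:
    def __init__(self, n):
        # Row i (1-based) initially holds block i; nxt stores the index
        # of the next block, 0 marks the last block, -1 marks a free row.
        self.rows = [{"bid": i + 1, "hash": None, "t": 0, "v": 1,
                      "nxt": i + 2 if i + 1 < n else 0} for i in range(n)]
        self.head = 1  # index of the first block in the logical order

    def _row(self, i):           # 1-based row access helper
        return self.rows[i - 1]

    def delete(self, i):
        # Unlink row i (i > 1 assumed): the predecessor now points past
        # it; no rows are moved, the row is just marked reusable.
        prev = next(r for r in self.rows if r["nxt"] == i)
        prev["nxt"] = self._row(i)["nxt"]
        self._row(i)["nxt"] = -1

    def insert_after(self, i, bid):
        # Reuse a free row if one exists, otherwise append at the end.
        free = next((r for r in self.rows if r["nxt"] == -1), None)
        if free is None:
            free = {"bid": bid, "hash": None, "t": 0, "v": 1, "nxt": 0}
            self.rows.append(free)
        free.update(bid=bid, nxt=self._row(i)["nxt"])
        self._row(i)["nxt"] = self.rows.index(free) + 1

    def chain(self):
        # Walk the pointer chain, returning block ids in logical order.
        out, i = [], self.head
        while i != 0:
            out.append(self._row(i)["bid"])
            i = self._row(i)["nxt"]
        return out
```

For a four-block file, deleting block 3 and then inserting a new block 5 after block 2 reuses the freed row, and in both operations only pointers change.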

B. DETAILED INTEGRITY VERIFICATION SCHEME
The efficient and secure auditing scheme consists of three phases including setup phase, integrity verification phase and dynamic update phase. The three phases are described in detail as follows.

1) Setup phase
In this phase, the KGC generates system parameters and keys for the user and the TPA in algorithm Initial. The user is responsible for dividing the big data into blocks and blinding each block in algorithm BlockBlind. The TPA is in charge of generating block tags in algorithm TagGen and deriving the DIT in algorithm DITGen. The user computes the challenge authority for the TPA in algorithm AuthorityGen.
The dataflow in each algorithm of this phase is described in fig. 2.
Initial(1^k) → (G_1, G_2, p, g, e), where G_1, G_2 are multiplicative groups with prime order p, g is a generator of G_1 and e is a bilinear map e: G_1 × G_1 → G_2. The KGC selects sk ∈ Z_p^* as the user's private key and computes pk = g^sk as the corresponding public key. Then the KGC selects α ∈ Z_p^* as the TPA's private key and computes w = g^α as the corresponding public key. Next, the KGC chooses secure one-way hash functions π, h and H. Finally, the KGC sends sk to the user with identity Uid and α to the TPA over secure channels, and makes {G_1, G_2, p, g, e, π, h, H, w, pk} public.
BlockBlind(F, sk) → M. The user first divides the file F with identifier Fid into n data blocks m_i^* using an erasure code algorithm. To keep the data private from others, the user blinds each block before outsourcing F. The user randomly selects r_i ∈ Z_p^*, i ∈ [1, n], and computes δ_i = g^{r_i}. Then the user blinds each block as m_i = m_i^* + π(δ_i) and denotes M = {m_i}, i ∈ [1, n]. Further, the user divides each block m_i into s sectors, i.e., m_i = {m_ij}, j ∈ [1, s], and sends F_info = (m_ij, t_i, v_i) to the TPA.
TagGen. The TPA computes a block tag σ_i for each block m_i with his private key α. Then the TPA sends W = (Fid, M, σ_i) to the CSP.
DITGen(F_info) → DIT. The TPA generates the DIT, including Bid_i, Hash_i, T_i, V_i and Next_i, and stores it locally for later dynamic updates. To save space, the TPA then deletes m_i from its local server.
AuthorityGen(sk) → sig. Only the authorized TPA can launch an auditing challenge, which prevents malicious attackers from mounting denial-of-service attacks on the CSP. The user with identity Uid randomly selects x ∈ Z_p^* and computes y = g^x. Then the user generates the authorization sig for the TPA to launch auditing challenges as follows.
Finally, the user sends sig to the TPA.

2) Integrity verification phase
In this phase, the TPA first generates a challenge and sends it to the CSP in algorithm ChallGen. Next, the CSP computes the integrity proof and sends it to the TPA for verification in algorithm ProofGen. Then the TPA verifies whether the data is intact through the proof in algorithm ProofVerify. The dataflow in each algorithm of this phase is described in fig. 3.
ChallGen(F_info) → Chall. When the TPA gets the verification delegation from the user, he selects some blocks to construct a random c-element subset C of the set [1, n] and generates random numbers l_i ∈ Z_p^*, i ∈ C. Then the TPA sends the challenge Chall = {sig, (i, l_i), Fid, Uid}, i ∈ C, to the CSP.
ProofGen(F, T, Chall) → P. On receiving the challenge, the CSP verifies the equation sig = pk · y^{H(Uid)}. If the check fails, it outputs NO; otherwise, the CSP computes the tag proof S and the data proof D = ∏_{j=1}^{s} u_j^{∑_{i∈C} l_i·m_ij}, and sends the proof P = {S, D} to the TPA.
ProofVerify(P, w) → {1, 0}. After receiving the proof P from the CSP, the TPA verifies the proof P as follows.
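The authorization check that the CSP runs before answering a challenge can be illustrated with a Schnorr-style sketch. This is an assumption-laden toy, not the paper's construction: a small prime modulus P and generator g stand in for the pairing group G_1, and sig = sk + x·H(Uid) (mod P−1) is one concrete form consistent with the check sig-versus-pk·y^{H(Uid)} described above.

```python
import hashlib
import secrets

# Toy sketch of AuthorityGen and the CSP-side authorization check.
# sig = sk + x*H(Uid) (mod P-1) satisfies g^sig = pk * y^{H(Uid)} (mod P).
# P and g are illustrative stand-ins for the pairing group (assumption).
P = 2**31 - 1          # toy prime modulus
g = 7                  # toy generator

def H(uid):
    # H: one-way hash of the user identity into the exponent space.
    return int.from_bytes(hashlib.sha256(uid.encode()).digest(), "big") % (P - 1)

def authority_gen(sk, uid):
    # The user picks random x, publishes y = g^x, and signs its identity.
    x = secrets.randbelow(P - 2) + 1
    y = pow(g, x, P)
    sig = (sk + x * H(uid)) % (P - 1)
    return sig, y

def authorized(sig, y, pk, uid):
    # CSP-side check: reject challenges that lack a valid authorization.
    return pow(g, sig, P) == (pk * pow(y, H(uid), P)) % P
```

A tampered sig fails the check, which is what blocks unauthorized (denial-of-service) challenges in the scheme.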

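The core idea of the challenge-response above — random coefficients l_i let the verifier check a single aggregated proof instead of every block — can be shown without pairings. The sketch below is a simplified stand-in (an assumption, not the paper's pairing-based equation): toy MAC tags σ_i = x·m_i mod p replace the real block tags, and the check S ≡ x·D plays the role of eq. (5).

```python
import secrets

# Illustrative stand-in for ChallGen / ProofGen / ProofVerify using toy
# MAC tags sigma_i = x * m_i mod p. All parameters are assumptions made
# only to demonstrate the linear-combination auditing idea.
p = 2**31 - 1
x = secrets.randbelow(p - 1) + 1          # verifier-side secret (toy)

def tag_blocks(blocks):
    # Toy per-block tags standing in for the pairing-based sigma_i.
    return [(x * m) % p for m in blocks]

def chall_gen(n, c):
    # Random c-element subset of [0, n) with coefficients l_i in Z_p*.
    idx = secrets.SystemRandom().sample(range(n), c)
    return [(i, secrets.randbelow(p - 1) + 1) for i in idx]

def proof_gen(blocks, tags, chall):
    S = sum(l * tags[i] for i, l in chall) % p     # tag proof
    D = sum(l * blocks[i] for i, l in chall) % p   # data proof
    return S, D

def proof_verify(S, D):
    # The aggregated check: corrupting any challenged block breaks it.
    return S == (x * D) % p
```

If a challenged block is altered while its stored tag is not, the data proof D shifts but the tag proof S does not, and the check fails.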
3) Dynamic update phase
The user can update the data outsourced to the cloud whenever needed, executing insertion, deletion and modification operations at the block level. Algorithm BlockInsert executes block insertion, BlockDelete realizes block deletion, and block modification is executed with algorithm BlockModify. The dataflow in this phase is described in fig. 4. After each update, the user delegates the TPA to verify the updated block. When the verification passes, the user can choose to delete the local data.

C. BATCH AUDITING FROM MULTIUSERS
Batch auditing can concurrently process multiple verifications from different users. Suppose U is a collection of k different users. When receiving k challenges from the k users, the CSP computes the tag proofs S_i, i ∈ [1, k], and the data proofs D_i, i ∈ [1, k]. Then the CSP obtains S_U and D_U by aggregating the S_i and D_i respectively according to the following equations. When receiving the proof S_U and D_U from the CSP, the TPA checks the proof through the following verification equation. If the equation holds, it outputs YES, meaning that all the files of the k users are correctly stored on the cloud servers. Otherwise, it outputs NO, meaning that one or more files are corrupt.
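The aggregation step can be sketched with the same toy MAC model as before (an illustrative assumption, not the paper's pairing equation): per-user proofs are folded into one aggregated tag proof S_U, and a single equation checks all k users at once.

```python
import secrets

# Toy sketch of batch auditing: the CSP aggregates k per-user tag proofs
# into S_U and the TPA checks them in one equation. MAC-style tags
# sigma = x_k * m mod p stand in for the pairing-based tags (assumption).
p = 2**31 - 1

def make_user():
    # Each user k contributes an independent toy key x_k.
    return secrets.randbelow(p - 1) + 1

def user_proof(x, blocks, chall):
    # Per-user tag proof S_k and data proof D_k over challenged blocks.
    S = sum(l * ((x * blocks[i]) % p) for i, l in chall) % p
    D = sum(l * blocks[i] for i, l in chall) % p
    return S, D

def batch_verify(keys, proofs):
    # One aggregated check: sum_k S_k == sum_k x_k * D_k (mod p).
    S_U = sum(S for S, _ in proofs) % p
    return S_U == sum(x * D for x, (_, D) in zip(keys, proofs)) % p
```

If any single user's data proof is wrong, the aggregated equation fails, so the TPA knows at least one of the k files is corrupt.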

VI. SECURITY ANALYSIS
In this section, the security of the proposed scheme, including correctness, unforgeability and privacy, is analyzed.
Theorem 1: An authorized public verifier can correctly verify the integrity of the file stored in cloud servers in our scheme.
Proof: Theorem 1 can be proved through verifying the correctness of eq. (5). The proof is as follows: From the proof of eq. (5), TPA can verify the integrity of the file outsourced to the CSP.
Theorem 2: It is computationally impossible for CSP to forge an integrity proof to pass the public verification, if the Computational Diffie-Hellman (CDH) problem is hard in bilinear group.
Proof: After the CSP receives the challenge Chall = {sig, (i, l_i), Fid}, i ∈ C, from the TPA, it should send the correct proof P = {S, D} to the TPA. Instead, suppose the CSP generates an incorrect proof P′ = {S′, D′}, where the difference between the forged and the genuine data proofs defines the values λ_j, j ∈ [1, s], and at least one λ_j is nonzero. If the CSP can pass the verification with P′, the CSP wins the game; otherwise, it fails.
Suppose the CSP can win the game; then the following equation can be inferred according to eq. (5).
where µ j , ν j ∈ Z p . Then we have the following.
Obviously, a solution to the DL problem can be found: the value x can be computed as follows, unless ν_j is zero.
However, at least one λ_j is nonzero by definition, and ν_j is a random element of Z_p, which means the probability that ν_j equals zero is 1/p. Therefore, we can find a solution to the DL problem with probability 1 − 1/p, which conflicts with the assumption that the DL problem is hard in G_1. This completes the proof of Theorem 2.
Theorem 3: As long as the DL assumption holds, it is computationally infeasible for TPA to get any private data during the integrity verification.
Proof: After the CSP gets the challenge Chall from the TPA, it sends D = ∏_{j=1}^{s} u_j^{∑_{i∈C} l_i·m_ij} to the TPA as the data proof. Because ∑_{i∈C} l_i·m_ij appears only in the exponent of D, by the DL assumption the TPA cannot obtain any information about the user's private data.

VII. PERFORMANCE EVALUATION
A. COMMUNICATION COSTS
According to the proposed scheme, in the setup phase the main communication cost arises between the user and the TPA and between the TPA and the CSP. Suppose the size of an element of Z_p is |p|. In algorithm BlockBlind, after the user blinds each block, he sends F_info = (m_ij, t_i, v_i) to the TPA. Therefore, the communication cost is n|p| + n(|t_i| + |v_i|), where |t_i| and |v_i| are the sizes of t_i and v_i. In algorithm TagGen, the TPA sends W = (Fid, M, σ_i) to the CSP, so the communication cost is 2n|p| + 1. In the integrity verification phase, the main communication cost is generated between the TPA and the CSP. When launching a challenge in algorithm ChallGen, the TPA sends Chall = {sig, (i, l_i), Fid, Uid} to the CSP, and the main communication cost is c(|i| + |p|) bits, where |i| is the size of a block index. In algorithm ProofGen, the CSP sends P = {S, D} to the TPA and the communication cost is 2|p|, which is constant and can be ignored. In the updating phase, the communication cost between the user and the TPA and between the TPA and the CSP is constant. We compare our scheme with schemes [12]-[14], [17] in terms of communication complexity in Table 6. From the table, it can be concluded that the communication cost of our scheme is lower than Wang's and the same as Zhu's.

B. STORAGE COSTS
Our scheme consists of three phases, and the storage costs are mainly generated in the setup phase. Suppose the file outsourced to the cloud, named F, includes n data blocks and each block's size is |p|. In algorithm BlockBlind, the user transfers F_info = (m_ij, t_i, v_i) to the TPA. In algorithm DITGen, the TPA generates the DIT, including Bid_i, Hash_i, T_i, V_i and Next_i. To save space, the TPA deletes m_i from the local server. Therefore, the total storage cost of the TPA in the setup phase is n·l_3, where l_3 = |Bid_i| + |t_i| + |v_i| + |Hash_i| + |Next_i| indicates the bit size of each element of the DIT. In algorithm TagGen, the TPA sends W = (Fid, M, σ_i) to the CSP and the CSP stores W. Therefore, the main storage cost of the CSP in the setup phase is 2n|p|, which is mainly generated by the block data M and the block tags σ_i. In scheme [14], an index hash table (IHT) is used to record the changes of blocks and to generate hash block values during the integrity verification process. In the IHT, B_i, V_i and R_i respectively represent the block number, version number and a random value. Therefore, the total storage cost of the TPA is n·l_1, where l_1 = |B_i| + |V_i| + |R_i| indicates the bit size of each element of the IHT. In scheme [17], each block element is one node of the file list, including the block version v_i, the timestamp t_i and a pointer next_i indicating the next node. Accordingly, the total storage cost of the TPA is n·l_2, where l_2 = |v_i| + |t_i| + |next_i| represents the bit size of each element of the dynamic hash table (DHT). The storage costs of the scheme are evaluated and compared with schemes [12]-[14], [17] in Table 7. Although l_3 is somewhat larger than l_1 and l_2, the DIT is more secure than the IHT and DHT because it stores the hash value of each block.
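The per-block index sizes l_1, l_2 and l_3 can be made concrete with illustrative field widths. The bit sizes below are assumptions chosen only to show the trade-off (e.g., a 256-bit hash for Hash_i and 160 bits for the random value R_i); the paper does not fix these widths.

```python
# Illustrative comparison of the per-block index sizes l_1 (IHT), l_2
# (DHT) and l_3 (DIT). All field bit-widths are assumptions for the
# sketch, not values specified by the scheme.
BITS = {"index": 32, "version": 32, "random": 160, "timestamp": 32,
        "pointer": 32, "hash": 256}

l1 = BITS["index"] + BITS["version"] + BITS["random"]            # IHT [14]
l2 = BITS["version"] + BITS["timestamp"] + BITS["pointer"]       # DHT [17]
l3 = (BITS["index"] + BITS["timestamp"] + BITS["version"]
      + BITS["hash"] + BITS["pointer"])                          # DIT (ours)

n = 1_000_000  # one million blocks
for name, l in (("IHT", l1), ("DHT", l2), ("DIT", l3)):
    # Total index storage at the TPA in MiB for n blocks.
    print(f"{name}: {n * l / 8 / 2**20:.1f} MiB")
```

Under these assumed widths, l_3 exceeds l_1 and l_2 mainly because of the per-block hash, which is exactly the field that gives the DIT its extra security.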

C. COMPUTATION COSTS
In this section, we evaluate the computation time of the scheme with experiments and compare it with Zhu's scheme [14]. The simulation runs on a Linux system with an Intel Core i5 1.60 GHz processor and 1 GB of RAM. The Pairing-Based Cryptography (PBC) library, version 0.5.14, is used to implement the simulation. Further, the experiment uses an MNT d159 curve with a 160-bit group order. All the experimental results represent the average of 20 trials.

1) Computation time of the user in setup phase
In our experiment on the setup phase, the computation time for different numbers of blocks is tested with a maximum block size of 1 KB. From fig. 5, it can be concluded that the user's computation time is proportional to the number of blocks, and the computation cost of our protocol is lower than Zhu's.

2) Computation Cost in Verification Phase
In the verification phase, the relationship between computation cost and block size is tested with the same file size of 1 MB. In the simulation, the challenged blocks account for 20% of the total number of blocks. From fig. 6, it can be concluded that as the block size increases, the verification cost of our scheme decreases. However, the verification time in Zhu's scheme increases, because the verification equation in Zhu's scheme depends on the number of sectors in each block.

3) Computation cost in update phase
In the experiment on the update phase, the maximum block size is assumed to be 1 KB. The update time is tested with file sizes from 1 MB to 50 MB. From Fig. 7 and Fig. 8, it can be concluded that in both the insertion and the deletion operation, our scheme is more efficient. In Zhu's scheme, since the IHT is a sequential data structure, on average half of its elements must be adjusted, which decreases update efficiency. In our scheme, only the pointer needs to be changed, without any movement of elements.

VIII. CONCLUSION
This paper proposes an efficient dynamic auditing scheme for outsourced data on cloud servers. In the scheme, a dynamic index table (DIT), in which no elements need to be moved during insertion or deletion update operations, is designed to improve data update efficiency. Further, when a file in the cloud fails the integrity check, the TPA can detect which block is corrupt. Moreover, an authorization mechanism is used between the users and the cloud servers to prevent denial-of-service attacks. The scheme achieves authorized and efficient integrity verification for big data in clouds, and the simulation results demonstrate that the scheme incurs lower communication and computation costs than previous schemes.
For future work, we point out that the efficiency and security of the integrity verification scheme can be further improved, as these are the most important issues in cloud storage of big data. For efficiency, we should minimize the communication costs between users and cloud servers to improve the integrity verification speed; the storage cost on the cloud server should also be considered. For security, the privacy of user data should be emphasized, because privacy is another key point in the data security of cloud computing. Efficiency and security are thus two important directions for our future work.