Data Integrity Audit Scheme Based on Blockchain Expansion Technology

Increasing numbers of users are outsourcing data to the cloud, which makes data integrity an important issue. Because blockchain is decentralized and immutable, more and more researchers use it to replace third-party auditors. This paper proposes a data integrity audit scheme based on blockchain expansion technology. It targets a weakness of existing blockchain-based data integrity audit schemes: the rapid growth of blocks makes both maintaining the blockchain network and creating new blocks expensive. Users and cloud service providers (CSPs) deploy smart contracts on the main chain and sub-chains. Intensive and frequent computation is moved to the sub-chain, and the sub-chain's results are submitted to the main chain periodically or when needed to ensure their finality. The concept of non-interactive audit is introduced so that communication with the CSP during the audit process does not degrade the user experience. To ensure data security, a reward pool mechanism is introduced. Analysis of storage, batch auditing and data consistency supports the correctness of the scheme. Experiments on the Ethereum blockchain platform demonstrate that this scheme can effectively reduce storage and computational overhead.


I. INTRODUCTION
Cloud computing is a distributed computing model based on a large shared pool of virtualized computing resources, which gives users access to powerful computing and storage resources. It can greatly reduce the hardware and software burden of data storage for users, which encourages many enterprises and individuals to store their data on cloud servers [1].
Despite the great success of cloud storage, it also faces various challenges [2]-[4], and its security, reliability and privacy have always been serious issues [6], [7]. After the user stores data on the cloud server, the service provider may damage or delete it for various reasons [12], so verifying the integrity of outsourced data becomes a crucial issue in cloud storage. Remote data integrity audit technology gives users a convenient and safe way to check the integrity of outsourced data [5], [28]. The essence of cloud data security is therefore how cloud storage providers (CSPs) can establish trust with users. Cloud device failures, illegal attacks, and CSPs that are bribed to view user data can all lead to illegal infringement of user data. Furthermore, even if the user data is damaged, the user may not be able to hold the CSP accountable effectively, since the CSP may evade responsibility and deny it [16]. This is due to the lack of trust between the two parties: the party being questioned cannot produce evidence that would convince the other party. In addition, current cybersecurity law is not sound, which makes it difficult for users to obtain due compensation [18].
The associate editor coordinating the review of this manuscript and approving it for publication was Thanh Ngoc Dinh.
In traditional cloud auditing schemes, an entity called the auditor (often referred to as a third-party auditor, or TPA) implements public audits [8], [21]. The TPA accepts audit mandates from data owners and performs them as instructed. Each of these methods requires a trusted TPA to assist the user in auditing, but in reality it is difficult to find a fully trusted third-party auditor. For example, a TPA may collude with the CSP to hide data corruption, or with data owners to avoid penalties.
The emergence of blockchain can solve this problem well. Blockchain has the properties of decentralization, tamper resistance, consistency and traceability, so information stored on the blockchain is open and transparent. In recent years, more and more researchers have used blockchain to replace third-party auditors [9], [10]. Although using blockchain as a trusted third-party auditor addresses users' concerns in cloud computing environments, the rapid growth of blocks leads to high costs for blockchain network maintenance and for user creation of new blocks [17].
To solve the above problems, a data integrity verification scheme based on blockchain expansion technology is proposed. It slows the growth of the blockchain and thereby reduces storage and computation costs. In particular, our contribution can be summarized in three aspects:
1) A data integrity audit protocol based on Plasma smart contracts is proposed. By introducing Plasma sub-chains and deploying smart contracts on the main chain and sub-chains, the protocol reduces the storage pressure on the main chain and slows its growth. The TPA audit protocol can be executed with low computational and communication overhead.
2) A batch auditing scheme is proposed that can process multiple audit tasks at the same time. The concept of non-interactive audit is introduced so that communication with the CSP during the audit process affects the user experience as little as possible. To ensure the correctness of the audit, a reward pool mechanism is adopted, through which verification nodes obtain reasonable rewards.
3) An analysis of the security of the scheme shows that it can achieve the expected security objectives. Numerous experiments on the Ethereum blockchain also show the efficiency and effectiveness of the scheme.
This paper is organized as follows: Related work is presented in Section II. The system model and design objectives are described in Section III. The detailed description of the scheme is in Section IV. Section V contains the security analysis. Section VI discusses the experimental performance of the scheme. Section VII concludes the paper.

II. RELATED WORK
A. BLOCKCHAIN
In view of the decentralization, tamper-proofness and traceability of blockchain technology, some blockchain-based data integrity auditing methods have been explored [18], [19]. Fan et al. [11] replaced the TPA with a smart contract, and the user signed an agreement with the CSP to prevent either party from denying it. The data owner obtains the hash of the remote data through the block identifier and compares it with the hash value previously stored in the blockchain ledger. Obviously, this scheme cannot resist a replay attack carried out by the CSP. Yu et al. [13] decentralize the data without any TPA in their scheme. Their solution is effective against replay attacks because a random challenge set is generated in each audit request. To defend against dishonest provers and verifiers, Xu et al. [20] proposed an arbitrable data audit protocol that supports exchange hashing. Existing cloud storage service providers (CSPs) may not fairly compensate users even if they damage data, and a CSP may store redundant and duplicate data. Yuan et al. [24] proposed a deduplication scheme with public audit and fair arbitration.
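The replay weakness of a fixed hash comparison, and the effect of a fresh random challenge, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, and the nonce-prefixed hash with owner-side precomputed answers is a simple stand-in for a random challenge set, not the construction used in [11] or [13].

```python
import hashlib
import secrets


def digest(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of the given byte strings."""
    return hashlib.sha256(b"".join(parts)).digest()


class HonestCSP:
    """Keeps the data and answers every query from the real blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def static_proof(self, i):
        return digest(self.blocks[i])

    def fresh_proof(self, i, nonce):
        return digest(nonce, self.blocks[i])


class ReplayingCSP:
    """Caches the static hashes once, then discards the data."""

    def __init__(self, blocks):
        self.cached = [digest(b) for b in blocks]

    def static_proof(self, i):
        return self.cached[i]  # replayed answer, the data is no longer needed

    def fresh_proof(self, i, nonce):
        # Without the block, the best it can do is hash what it kept.
        return digest(nonce, self.cached[i])


def precompute_challenges(blocks, count):
    """Owner side: build (index, nonce, expected) triples before outsourcing."""
    table = []
    for _ in range(count):
        i = secrets.randbelow(len(blocks))
        nonce = secrets.token_bytes(16)
        table.append((i, nonce, digest(nonce, blocks[i])))
    return table


def static_audit(csp, ledger, i):
    return csp.static_proof(i) == ledger[i]


def fresh_audit(csp, challenge):
    i, nonce, expected = challenge
    return csp.fresh_proof(i, nonce) == expected


blocks = [b"block-%d" % n for n in range(8)]
ledger = [digest(b) for b in blocks]           # hashes stored on the ledger
challenges = precompute_challenges(blocks, 4)  # kept secret by the data owner
honest, cheater = HonestCSP(blocks), ReplayingCSP(blocks)
```

Both providers pass `static_audit`, because the expected answer never changes, while only `HonestCSP` passes `fresh_audit`. Schemes based on homomorphic signatures avoid the precomputed table used here.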

B. BLOCKCHAIN EXPANSION
Although blockchain has many advantages in data integrity auditing, as the number of users increases, the transaction throughput of the blockchain system becomes seriously insufficient and the storage burden on the blockchain is bound to increase. To address these issues, the authors of [23], [25] conduct an extensive classification and comparison of blockchain scalability solutions, and Zhou et al. [26] proposed a solution for blockchain scalability. Existing expansion schemes target different layers and are divided into layer-0 expansion, on-chain expansion, and off-chain expansion. Layer-0 expansion improves blockchain scalability by changing the underlying data transmission protocol of the blockchain. On-chain expansion improves the efficiency of the blockchain by changing the basic protocol. It includes data layer, consensus layer and network layer improvement schemes, whose basic idea is to increase the block size (either directly or indirectly) or to reduce the block verification and propagation time and the consensus formation time. Off-chain expansion does not change the basic protocol; changes are made at the application layer to improve scalability. It mainly includes four methods: state channel, side chain, cross-chain and off-chain computation. The idea is to move some on-chain transactions off-chain for execution, in order to reduce the processing pressure on the chain and improve overall efficiency. While improving the performance of the blockchain, off-chain scaling technology also takes decentralization and security into account.

III. MODEL AND SAFETY GOALS
A. SYSTEM MODEL
Based on the blockchain expansion technology, we propose a new audit scheme. Our scheme consists of three entities: data owner, cloud service provider and verifier. The system model is shown in Figure 1.
Data Owner (DO): The owner of the data, who can authorize other users to access and use the data.
Cloud Service Provider (CSP): Generally composed of multiple servers. It can provide users with massive data storage service.
Verifier: Audits the proof provided by the CSP and informs the DO of the result. In this scheme, the smart contracts deployed on the blockchain and the consensus nodes cooperate to perform audit tasks. The concept of non-interactive audit is introduced so that communication with the CSP during the audit process affects the user experience as little as possible. The scheme also mitigates the vulnerability to replacing attacks and replay attacks.
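This section does not spell out how the challenge is produced in a non-interactive audit. One common approach, shown here purely as an assumption, is to derive the challenge deterministically from a public value that both sides can read, such as a recent block hash, so that no challenge message has to be exchanged. The function name `derive_challenge`, the seed and the prime `P` are hypothetical.

```python
import hashlib


def derive_challenge(seed: bytes, n_blocks: int, c: int, p: int):
    """Derive c distinct (block index, coefficient) pairs from a public seed.

    Indices are 1-based; coefficients lie in [1, p - 1]. The modulo
    reductions introduce a small bias that a real design would remove.
    """
    if not 0 < c <= n_blocks:
        raise ValueError("c must satisfy 0 < c <= n_blocks")
    chosen = {}
    counter = 0
    while len(chosen) < c:
        d = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
        index = int.from_bytes(d[:16], "big") % n_blocks + 1
        if index in chosen:
            continue  # keep indices distinct
        chosen[index] = int.from_bytes(d[16:], "big") % (p - 1) + 1
    return sorted(chosen.items())


P = 2**127 - 1  # illustrative prime, not a parameter taken from the paper
seed = hashlib.sha256(b"hypothetical block header").digest()
challenge = derive_challenge(seed, n_blocks=1000, c=10, p=P)
```

Because the output depends only on `seed`, the CSP and any verifier that read the same seed compute the same index and coefficient pairs independently.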

B. SECURITY DEFINITION (SOUNDNESS)
The soundness of the scheme is captured by the following game between adversary A and challenger C:
1) C calls the key generation algorithm KeyGen($1^k$) to generate a key pair (pk, sk) and gives pk to A.
2) A tries to obtain a signature set $\Phi \leftarrow$ SigGen(F, pk, sk) through multiple interactions and queries with C.
3) A outputs the audit proof P of the file F, the signature set $\Phi$, and the current state $\tau$.
Let the probability of A forging a proof and passing the verification be $Adv_A = \Pr[\mathrm{Verify}(pk, P) = 1]$. The adversary wins if $Adv_A$ is non-negligible.
Definition 1: The proposed scheme is sound if there exists an efficient extraction algorithm. Adversary A outputs proof P based on file F, state $\tau$ and signature set $\Phi$. If A can win the game with non-negligible probability, then there is an extraction algorithm that can recover the file F from the signature set $\Phi$ and the proof P, i.e., Extract(pk, P, $\Phi$) = F.

C. DESIGN GOALS
Our scheme should achieve the following goals:
1) Correctness: For all key pairs (pk, sk) ← KeyGen($1^k$), and for any file F and state $\tau$, the verification algorithm outputs 1 ← Verify(pk, ProofGen(SigGen(F, sk), F, $\tau$)).
2) Soundness: No forged proof can pass the verification with non-negligible probability.
3) Batch auditing: Both multi-user single-task auditing and multi-user multi-task auditing are supported, to ensure the efficiency of auditing.
4) Non-interactive: The number of interactions between the CSP and the user during the audit process is reduced.
5) Public auditing: Any user, including the data owner, can challenge the CSP to verify the integrity of the data based on the certificate generated by the CSP.

IV. SCHEME CONSTRUCTION
A. MAIN IDEA
In existing blockchain-based data integrity audit schemes, an independent block is often created to store each uploaded file [14]. However, as the number of users of CSP services and the amount of data they upload continue to increase, the cost of executing contracts and creating new blocks keeps growing, which reduces the effectiveness of the solution. At the same time, since a block is created separately for each file and the audit algorithm is written into it, the signatures cannot be aggregated, so batch auditing is not possible and audit efficiency is much lower than in traditional solutions. To address the high cost of blockchain network maintenance and of creating new blocks caused by excessive block growth, a data integrity audit scheme based on blockchain expansion technology is proposed, and a reward pool mechanism is introduced to ensure that verification nodes get a fair reward. The main idea is as follows. The data owner executes the KeyGen algorithm and the SigGen algorithm, and uploads the file F and the corresponding signature set to the CSP. At the same time, the data owner deploys the Plasma contract (Figure 2) and the contract T (Figure 3) to the main chain and sub-chain respectively.

B. SMART CONTRACT BASED ON PLASMA EXPANSION TECHNOLOGY
Plasma is a blockchain expansion technology that has been widely researched and applied. Introducing it slows the growth of the blockchain and reduces storage and computational overhead. The characteristics of smart contracts ensure that the blockchain network automatically executes smart contract code, and the efficiency and accuracy of auditing can be [...]
[...] to send an exit request to the Plasma contract. An "exit bond" is included in the request. If the exit request is successfully challenged, the withdrawal is cancelled and the exit bond is sent to the challenging user.
5) Challenge(): Each exit request has a dispute period during which validators can challenge the exit in progress. A dispute is raised by calling the challenge() function. If the dispute succeeds, the ongoing withdrawal is cancelled, and the bond frozen when the exit request was submitted is sent to the challenger.
The deployment and execution of the contract include the following 5 steps:
1) Contract deployment: The DO and the CSP jointly confirm the content of the contract and deploy the Plasma contract to the main chain. The DO sends a certain amount of main-chain tokens to the Plasma contract through the deposit algorithm to join the Plasma chain, and deploys the verification contract to the sub-chain.
2) Proof verification: The operator processes the contracts deployed by users on the Plasma chain, executes the verify algorithm to obtain the audit results, and generates blocks for them to add to the chain.
3) State commitment: The operator periodically submits the hash value of the block on the Plasma chain to the main chain as proof of the state update of the sub-chain.
4) Status monitoring: Users monitor the update status through the on-chain wallet.
5) User exit: When the user executes the exit algorithm to request to exit the sub-chain, the verifier can challenge the request by executing the challenge algorithm; when the operator uploads the recent verification to the main chain, the user or a consensus node can also challenge the request.
VOLUME 10, 2022
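The deposit, state commitment, exit and challenge flow described above can be mocked up off-chain to make the bookkeeping concrete. The following is a minimal Python sketch, not the paper's Solidity contract: the class `ToyPlasmaContract`, its method names, the `height` counter standing in for main-chain block height, and the boolean `fraud_shown` standing in for an actual fraud proof are all assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ExitRequest:
    owner: str
    amount: int
    bond: int
    deadline: int          # last height at which a challenge is accepted
    pending: bool = True


@dataclass
class ToyPlasmaContract:
    dispute_period: int = 10
    height: int = 0                                   # stand-in for chain height
    deposits: dict = field(default_factory=dict)      # user -> tokens locked
    commitments: list = field(default_factory=list)   # sub-chain block hashes
    exits: list = field(default_factory=list)
    paid_out: dict = field(default_factory=dict)      # user -> tokens released

    def deposit(self, user: str, amount: int) -> None:
        self.deposits[user] = self.deposits.get(user, 0) + amount

    def submit_block(self, block: bytes) -> str:
        """Operator commits only the hash of a sub-chain block."""
        commitment = hashlib.sha256(block).hexdigest()
        self.commitments.append(commitment)
        return commitment

    def start_exit(self, user: str, amount: int, bond: int) -> int:
        """Reserve the funds and record the bond posted with the request."""
        if amount <= 0 or bond <= 0 or self.deposits.get(user, 0) < amount:
            raise ValueError("invalid exit request")
        self.deposits[user] -= amount
        deadline = self.height + self.dispute_period
        self.exits.append(ExitRequest(user, amount, bond, deadline))
        return len(self.exits) - 1

    def challenge(self, exit_id: int, challenger: str, fraud_shown: bool) -> bool:
        req = self.exits[exit_id]
        if not req.pending or self.height > req.deadline or not fraud_shown:
            return False
        req.pending = False
        self.deposits[req.owner] += req.amount   # withdrawal cancelled
        self._pay(challenger, req.bond)          # bond goes to the challenger
        return True

    def finalize(self, exit_id: int) -> bool:
        req = self.exits[exit_id]
        if not req.pending or self.height <= req.deadline:
            return False
        req.pending = False
        self._pay(req.owner, req.amount + req.bond)
        return True

    def _pay(self, user: str, amount: int) -> None:
        self.paid_out[user] = self.paid_out.get(user, 0) + amount


contract = ToyPlasmaContract(dispute_period=10)
contract.deposit("DO", 100)
commitment = contract.submit_block(b"audit results for sub-chain block 1")
bad_exit = contract.start_exit("DO", 60, bond=5)
challenged = contract.challenge(bad_exit, "validator", fraud_shown=True)
good_exit = contract.start_exit("DO", 40, bond=5)
contract.height += 11                       # dispute period has elapsed
finalized = contract.finalize(good_exit)
```

In this sketch the challenged exit returns its funds to the deposit balance and forfeits the bond, while the unchallenged exit pays out the amount plus the bond once the dispute period has passed.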
C. AUDIT SCHEME
1) SETUP PHASE
1) KeyGen($1^k$) → (pk, sk): The data owner chooses the hash function $H(\cdot): \{0,1\}^* \to G$ and randomly chooses a signing key pair (ssk, spk). The data owner then randomly selects $x \leftarrow Z_p$ and $\mu \leftarrow G$, and calculates $h = g^x$. The private key is sk = (ssk, x), and the public key is pk = (spk, g, h, e(a, b), $H(\cdot)$).
2) SigGen(F, pk, sk) → $\Phi$: Taking file F as an example, first divide the file into n blocks, i.e., $F = \{m_i\}_{1 \le i \le n}$, where $m_i \in Z_p$, and let name be the identifying name of the file F selected by the data owner. The signature set is $\Phi = \{\sigma_i\}_{1 \le i \le n}$. To ensure the correctness of the file name, sign name with the signing private key ssk and generate the tag $t = name \| Sig_{ssk}(name)$ for F. Finally, the data owner uploads the file F, the signature set $\Phi$ and the tag t to the cloud.
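The block-splitting step of SigGen can be sketched as follows. The text only states that each block $m_i$ lies in $Z_p$; how large files are mapped into $Z_p$ is not described here, so the fixed chunk width, the prime `P` and the helper names `split_file` and `join_blocks` are assumptions chosen so that every chunk is smaller than the modulus.

```python
P = 2**255 - 19      # a well-known prime, used only as a stand-in for p
CHUNK_BYTES = 31     # 2**(8 * 31) < P, so every chunk is already in Z_p


def split_file(data: bytes, chunk_bytes: int = CHUNK_BYTES, p: int = P):
    """Split data into blocks m_1..m_n, each an integer in Z_p."""
    if 2 ** (8 * chunk_bytes) > p:
        raise ValueError("chunk too large for the chosen modulus")
    blocks = []
    for offset in range(0, len(data), chunk_bytes):
        chunk = data[offset:offset + chunk_bytes]
        blocks.append(int.from_bytes(chunk, "big"))
    return blocks


def join_blocks(blocks, total_len: int, chunk_bytes: int = CHUNK_BYTES) -> bytes:
    """Inverse of split_file, given the original length in bytes."""
    out = bytearray()
    for k, m in enumerate(blocks):
        remaining = total_len - k * chunk_bytes
        out += m.to_bytes(min(chunk_bytes, remaining), "big")
    return bytes(out)


data = b"outsourced file contents " * 10
blocks = split_file(data)
```

Keeping chunks strictly below the modulus makes the mapping reversible, which is what an extraction argument implicitly relies on; reducing larger chunks modulo p would lose information.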
If the verification is passed, the audit information and results will be saved to the sub-chain; otherwise, the data owner and the cloud should be notified that the verification fails, and the cloud will be punished.

V. SECURITY ANALYSIS
A. CORRECTNESS
If the CSP correctly stores the user's data, the proof generated by it can be verified by the ProofVerify algorithm.
Equation (1) is verified as follows:
$$e(\sigma, g) = e\Big(\prod_{i \in [1,c]} \big(H(name\|i) \cdot \mu^{m_i}\big)^{x \cdot s_i},\, g\Big) = e\Big(\prod_{i \in [1,c]} H(name\|i)^{s_i} \cdot \mu^{\lambda},\, h\Big), \quad \lambda = \sum_{i \in [1,c]} s_i m_i \qquad (1)$$
1) Anti-replacing attack: During the auditing process, if the CSP does not store the holder's data correctly, a signature generated from non-challenged block data cannot pass the auditor's verification.
2) Anti-replay attack: Similar to the anti-replacing attack. During the auditing process, if the CSP does not store the holder's data correctly, a signature generated by the CSP from previous data block information cannot pass the auditor's verification.

Proof:
1) The CSP forges signatures using non-challenge blocks in an attempt to pass verification. Assume that the CSP uses a non-challenge block to forge the signature $\sigma^-$. If
$$e(\sigma^-, g) = e(\sigma, g) \qquad (2)$$
holds, the replacing attack works. The proof is given below: substituting the forged signature into (2) yields $m_j^- = m_j$, which contradicts the previous assumption. Therefore, signatures forged by the CSP using non-challenge blocks cannot pass the auditor's verification.
2) The CSP uses the information of previous challenge blocks to forge the signature and try to pass the verification. Assume that the CSP uses previous challenge blocks' information to forge the signature $\sigma^*$. If
$$e(\sigma^*, g) = e(\sigma, g) \qquad (3)$$
holds, the replay attack works. The proof is similar to that of the anti-replacing attack and is not repeated here.
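The algebra behind the correctness check and the replacing attack can be sanity-checked numerically with a toy model. The sketch below is an assumption-laden illustration, not the JPBC implementation used in the experiments: every group element $g^a$ is represented by its exponent $a$ modulo a prime `Q`, so the pairing $e(g^a, g^b)$ becomes the product $ab \bmod Q$. This model exposes all discrete logarithms (the public $h$ equals the secret $x$), so it has no security and only verifies the identities. The signature form $\sigma_i = (H(name\|i)\cdot\mu^{m_i})^x$ is read off the verification equation, and all function names are hypothetical.

```python
import hashlib
import random

Q = 2**127 - 1  # prime order of the toy group


def hash_exp(name: str, i: int) -> int:
    """Exponent of H(name||i) in the toy group."""
    d = hashlib.sha256(f"{name}||{i}".encode()).digest()
    return int.from_bytes(d, "big") % Q


def pair(a: int, b: int) -> int:
    """Toy pairing: e(g^a, g^b) = gT^(a*b), tracked by its exponent."""
    return a * b % Q


def sig_gen(name, blocks, x, u):
    """sigma_i = (H(name||i) * mu^{m_i})^x, with mu = g^u."""
    return [x * (hash_exp(name, i) + u * m) % Q
            for i, m in enumerate(blocks, start=1)]


def proof_gen(blocks, sigs, picks):
    """picks: list of (index of the block actually used, coefficient s_i)."""
    sigma = sum(s * sigs[j - 1] for j, s in picks) % Q
    lam = sum(s * blocks[j - 1] for j, s in picks) % Q
    return sigma, lam


def verify(name, challenge, sigma, lam, u, h):
    """Check e(sigma, g) == e(prod H(name||i)^{s_i} * mu^lam, h)."""
    agg = (sum(s * hash_exp(name, i) for i, s in challenge) + u * lam) % Q
    return pair(sigma, 1) == pair(agg, h)    # g has exponent 1


rng = random.Random(2022)                     # fixed seed, not secure
x, u = rng.randrange(1, Q), rng.randrange(1, Q)
h = x                                         # h = g^x in exponent form
blocks = [rng.randrange(Q) for _ in range(20)]
sigs = sig_gen("file-F", blocks, x, u)
challenge = [(i, rng.randrange(1, Q)) for i in (2, 5, 11)]

honest = proof_gen(blocks, sigs, challenge)
# Replacing attack: answer with blocks 3, 6, 12 instead of the challenged ones.
swapped = proof_gen(blocks, sigs, [(i + 1, s) for i, s in challenge])
```

The honest proof verifies because both sides reduce to $x(\sum s_i H_i + u\lambda)$. The swapped proof fails because the verifier recomputes $H(name\|i)$ for the challenged indices, which no longer match the hashes embedded in the substituted signatures.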

C. BATCH AUDITING
There are usually two cases for batch auditing: 1) Multi-user single-task, that is, the auditor handles a single audit request from multiple users at the same time; 2) Multi-user multi-task, that is, the auditor handles multiple audit requests from multiple users at the same time.
Proof:
1) If
$$\prod_{u_k \in [1,k]} e(\sigma, g) = \prod_{u_k \in [1,k]} e\Big(\prod_{i \in [1,c]} H(name\|i)^{s_i} \cdot \mu^{\lambda},\, h\Big) \qquad (4)$$
holds, the proposed scheme supports multi-user single-task auditing. Multi-user single-task auditing is a special case of multi-user multi-task auditing, so a detailed proof is given under multi-user multi-task batch auditing.
2) If
$$\prod_{f_s \in [1,s]} \prod_{u_k \in [1,k]} e(\sigma, g) = \prod_{f_s \in [1,s]} \prod_{u_k \in [1,k]} e\Big(\prod_{i \in [1,c]} H(name\|i)^{s_i} \cdot \mu^{\lambda},\, h\Big) \qquad (5)$$
holds, the proposed scheme supports multi-user multi-task auditing. The proof is as follows:
$$\prod_{f_s \in [1,s]} \prod_{u_k \in [1,k]} e(\sigma, g) = \prod_{f_s \in [1,s]} \prod_{u_k \in [1,k]} e\Big(\prod_{i \in [1,c]} \big(H(name\|i) \cdot \mu^{m_i}\big)^{x \cdot s_i},\, g\Big) = \prod_{f_s \in [1,s]} \prod_{u_k \in [1,k]} e\Big(\prod_{i \in [1,c]} H(name\|i)^{s_i} \cdot \mu^{\lambda},\, h\Big)$$
Table 1 shows the comparison between the proposed scheme and the schemes in [24], [27], [22], and [20] in four aspects: public verification, batch auditing, fair arbitration, and blockchain expansion. All schemes support public auditing. The schemes in [24] and [20] and our proposed scheme all use the blockchain network as the auditor, so they all support fair arbitration.
As shown in Table 1, our scheme is the only solution that supports all four aspects. Table 2 shows the formal computational costs of the schemes in [24], [27], [22], [20] and our proposed scheme. Map represents a bilinear pairing operation, Mul represents a multiplication operation on an elliptic curve, and Exp represents an exponentiation operation on an elliptic curve; c indicates the number of challenged data blocks.
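The batch condition of Section V-C, equation (5), can be illustrated in the same kind of toy exponent model, where a product of pairing values becomes a sum of exponents modulo a prime `Q`. This is a sketch under stated assumptions rather than the paper's implementation: group elements are represented by their exponents (which removes all security), each task is given its own key pair and its own $\mu$ (the equation in the text writes a single $h$), and the names `make_task` and `batch_verify` are hypothetical.

```python
import hashlib
import random

Q = 2**127 - 1  # prime order of the toy group


def hash_exp(name, i):
    """Exponent of H(name||i) in the toy group."""
    d = hashlib.sha256(f"{name}||{i}".encode()).digest()
    return int.from_bytes(d, "big") % Q


def make_task(rng, name, n_blocks=16, c=4):
    """One audit task: a key, a file, a challenge and the CSP's response."""
    x, u = rng.randrange(1, Q), rng.randrange(1, Q)
    blocks = [rng.randrange(Q) for _ in range(n_blocks)]
    sigs = [x * (hash_exp(name, i) + u * m) % Q
            for i, m in enumerate(blocks, start=1)]
    challenge = [(i, rng.randrange(1, Q))
                 for i in rng.sample(range(1, n_blocks + 1), c)]
    sigma = sum(s * sigs[i - 1] for i, s in challenge) % Q
    lam = sum(s * blocks[i - 1] for i, s in challenge) % Q
    return {"name": name, "u": u, "h": x, "challenge": challenge,
            "sigma": sigma, "lam": lam}


def batch_verify(tasks):
    """Compare the product of all left sides with the product of all right sides."""
    lhs = rhs = 0
    for t in tasks:
        agg = (sum(s * hash_exp(t["name"], i) for i, s in t["challenge"])
               + t["u"] * t["lam"]) % Q
        lhs = (lhs + t["sigma"]) % Q      # exponent of e(sigma, g)
        rhs = (rhs + agg * t["h"]) % Q    # exponent of e(agg, h)
    return lhs == rhs


rng = random.Random(7)                    # fixed seed, not secure
tasks = [make_task(rng, f"user{k}-file{s}") for k in range(3) for s in range(2)]
ok = batch_verify(tasks)
tasks[4]["lam"] = (tasks[4]["lam"] + 1) % Q   # corrupt one response
bad = batch_verify(tasks)
```

A single comparison covers all six tasks, and one corrupted response makes the batch fail. The plain product mirrors equation (5); practical batch verifiers often add random per-task weights so that errors in different tasks cannot cancel, which goes beyond what this section describes.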

VI. PERFORMANCE ANALYSIS A. COMPUTATIONAL OVERHEAD
Experiments are conducted using the Java Pairing-Based Cryptography library (JPBC) and Solidity in a Linux environment with an Intel Core i9-9800HK CPU at 2.3 GHz and 32 GB of RAM. The experimental results of our proposed scheme are compared with those of the schemes in [20], [22], [24], [27]. Since the I/O delay mainly depends on factors such as hardware conditions and scheduling algorithms, and has little to do with the scheme itself, the experiment mainly tests the signature generation time, the proof generation time and the gas cost of integrity verification. Proof generation time is the time the CSP takes to produce the integrity proof. The integrity verification gas cost is the cost of executing the smart contract that produces the integrity verification result. Figure 5 shows the comparison of the signature generation time between our proposed scheme and the schemes in [20], [22], [24], [27]. In order to be closer to the actual use case, the file is divided into n data blocks, where n ranges from 100 to 1000, and the data block size is 5MB.
It can be seen from the figure that when the number of data blocks increases from 100 to 1000, the signature generation time of our proposed scheme increases from 0.41s to 4.14s; the time of scheme [24] increases from 0.54s to 6.3s; the time of scheme [20] increases from 1.55s to 8.01s; the time of scheme [22] increases from 2.34s to 20.55s; the time of scheme [27] increases from 3.51s to 24.94s. Figure 6 shows a comparison of the proof generation time between our proposed scheme and the schemes in [20], [22], [24] and [27].
In order to make the experimental results more accurate, the number of challenge blocks ranges from 100 to 1000 in the experiment. It can be seen from the figure that when the number of challenge blocks increases from 100 to 1000, the audit proof generation time of our proposed scheme increases from 0.26s to 3.4s; the time of scheme [24] increases from 0.4s to 3.9s; the time of scheme [20] increases from 0.47s to 4.1s; the time of scheme [22] increases from 0.6s to 4.8s; the time of scheme [27] increases from 1s to 6.3s.
Figure 7 shows the comparison of gas cost for integrity verification between our proposed scheme, scheme [24], and scheme [20]. Gas represents the amount of resources that must be consumed and cannot be refunded in the Ethereum system. The EVM must consume gas when executing code, so gas reflects the amount of resources consumed by the EVM when working. All EVM transactions and smart contract executions require gas fees. In our scheme, most operations are performed on the blockchain, so the gas consumption reflects the cost of the scheme. According to the standard of the Ethereum Association, we set the gas price to 2.5 Gwei (a denomination of ether, the currency of Ethereum; $10^9$ Gwei = 1 ether) in our experiments. The smart contracts are written in Solidity and the experiments are run on the Ethereum blockchain. Since our solution only stores the user's state information, sub-chain ID and proof aggregation information on the main chain, the gas cost of our scheme is much lower than that of scheme [24] and scheme [20]. We set the number of challenge blocks to 300 and 400, and the data size ranges from 1MB to 10MB. The results of the performance evaluation are the average of 20 experiments. It can be seen from Figure 7 that when the number of randomly selected data blocks is constant, the gas cost does not grow as the data size increases.
When validating 300 randomly selected data blocks, the gas consumption value is $0.93 \times 10^7$ Gwei (0.013 ether), and when validating 460 randomly selected data blocks, the gas consumption value is $1.55 \times 10^7$ Gwei (0.020 ether).
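The relationship between gas used, gas price and the fee expressed in Gwei or ether is plain arithmetic, sketched below with Python's `decimal` module to avoid floating-point rounding. The helper names are hypothetical, and the 21,000 gas figure is the intrinsic cost of a plain ether transfer on Ethereum, used only as an illustrative input; it is not a measurement from these experiments.

```python
from decimal import Decimal

GWEI_PER_ETHER = Decimal(10) ** 9   # 1 ether = 10^9 Gwei


def tx_fee_gwei(gas_used: int, gas_price_gwei) -> Decimal:
    """Fee in Gwei = gas used * gas price (in Gwei per unit of gas)."""
    return Decimal(gas_used) * Decimal(str(gas_price_gwei))


def gwei_to_ether(amount_gwei) -> Decimal:
    """Convert an amount in Gwei to ether."""
    return Decimal(str(amount_gwei)) / GWEI_PER_ETHER


fee_gwei = tx_fee_gwei(21_000, "2.5")   # illustrative gas amount at 2.5 Gwei
fee_ether = gwei_to_ether(fee_gwei)
```

With the 2.5 Gwei gas price stated above, any measured gas amount can be converted the same way by replacing the illustrative 21,000.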

VII. CONCLUSION
As cloud computing and cloud storage technologies evolve faster and faster, the amount of data in cloud storage grows explosively, and how to ensure the integrity of the data that users store on cloud servers has become an important topic. This article proposes a data integrity audit scheme based on blockchain expansion technology. In our scheme, we use the blockchain network to overcome some of the shortcomings of traditional auditing, improving the efficiency and security of the scheme. In addition, we introduce Plasma sub-chains and deploy smart contracts on the main chain and sub-chain respectively. Through this protocol, the storage pressure on the main chain is greatly reduced, its growth is slowed, storage and computational overhead are reduced, and system performance is improved. At the same time, the reward pool mechanism and the concept of non-interactive audit are introduced to ensure the correctness of the audit and to avoid interaction between the smart contract platform and the CSP during contract execution, and the scheme achieves the expected security goals.
ZHENPENG LIU received the B.S. and M.S. degrees from the School of Cyberspace Security and Computer, Hebei University, and the Ph.D. degree from Tianjin University. He is currently a Professor with the School of Cyberspace Security and Computer, Hebei University. His research interests include big data, cloud computing, and information security research.
YONGJIANG FENG is currently pursuing the M.S. degree with the School of Cyberspace Security and Computer, Hebei University, Baoding, Hebei. His research interests include cloud computing, cloud storage, and data integrity verification.
LELE REN is currently pursuing the M.S. degree with the School of Cyberspace Security and Computer, Hebei University, Baoding, Hebei. His research interests include cloud computing security, privacy protection, and data integrity verification.
WEIHUA ZHENG received the M.S. degree from the Hebei University of Engineering. He is currently an Associate Professor with the School of Information Engineering, Handan University, and the School of Information Science and Engineering, Xinjiang University of Science and Technology. His research interests include computer application, big data, cloud computing, and information security research.