A Collaborative Auditing Blockchain for Trustworthy Data Integrity in Cloud Storage System

Cloud storage system provides data owners with remote storage service, which allows them to outsource data without local storage burden. Nevertheless, the cloud storage service is not fully trustworthy since it may not be honest and remote data would be corrupted. One way to ensure trustworthy preservation of cloud data is the remote data auditing method, through which data owners can check storage reliability of cloud system on demand and avoid potential data corruption in time. However, private auditing methods fail to promise the mutual trust in auditing results. Thus, public auditing methods are introduced, in which traditionally a third party auditor is delegated to interact with cloud service providers for auditing tasks. Although the third party auditor serves as a medium to exchange trust, a centralized third party is hard to stay neutral, which exposes the remote data auditing to some threats such as collusion attacks. To address the trust problem between data owners and cloud service providers, we propose a collaborative auditing blockchain framework for cloud data storage. In this framework, all consensus nodes substitute for the single third party auditor to execute auditing delegations and record them permanently, thereby preventing entities from deceiving each other. Security analysis shows that the proposed framework has advantage of preserving remote data integrity from various attacks. Performance analysis demonstrates that the framework is more functional and resource-friendly than existing schemes.


I. INTRODUCTION
W ITH the volume of data becoming larger and larger rapidly, cloud storage has made outsourcing data an inevitable trend for resource-constraint data owners including individuals and even organizations. Such cloud service provides not only outsourcing function but also ubiquitous network access and location independence [1], [2], which make data owners not worry about local operating system failures. In the meantime, overheads resulting from data management are greatly reduced, enabling data owners and local devices to focus on data processing.
However, data stored in cloud storage system is out of strong control of its owners, thus suffering from trust problem mainly caused by misbehaviors of the cloud service provider (CSP) [3], [4]. In particular, malicious CSPs may attempt to delete data which is accessed infrequently to save storage space without authorization, or pretend nothing happens when stored data is corrupted. They may even tamper with some data to deceive its owners for other financial profits. The integrity and availability of cloud data are being challenged. Many researchers have developed the remote data auditing (RDA) method for outsourced data to enable data owners to measure the credibility of CSP without further loss. In other words, a data owner who has deleted local copy can still verify the correctness and integrity of remote stored data through the RDA. To implement RDA, a mechanism of "challenge-proofverify" is generally adopted between data owner and CSP in private auditing, where the data owner generates data challenges and verifies corresponding proofs from the CSP to track the state of remote data. Nevertheless, the verification process is only executed by the data owner in private auditing. On the one hand, the auditing result may be unfavorable to the CSP intentionally. On the other hand, such private methods bring a lot of burdens on the data owner as data volume and auditing requests increase. To resolve doubt about auditing result, and to make verification process more energy efficient for data owners, a new entity named third party auditor (TPA) is introduced to achieve public auditing, who accepts auditing delegations from data owners and executes them as directed. But such solutions must assume that the TPA behaves in an honest way, which is not a reliable premise in practice. For example, a TPA may collude with either the CSP to hide data corruption or the data owner to deceive for penalty. In addition, an obvious weakness of a single centralized TPA is the single point of failure.
To tackle the mutual trust problem in public auditing, the emerging blockchain enables a decentralized way to track state changes of a system. This technique first proposed by Satoshi Nakamoto [5] is with the features of decentralization, tamper-proof, consistency and traceability. Recent years, a few schemes [6], [7] have combined integrity checking and blockchain-based storage. However, they only regard the blockchain as an immutable ledger, and data owners still verify proofs by itself. In some cases, data owners have to maintain the whole blockchain, which increases much storage burdens.
Hence, inspired by this technique as well, we design a collaborative auditing blockchain (CAB) to enhance mutual trust between data owners and CSPs in the cloud storage system, while reducing as much resource overheads over data owners as possible. In this paper, our main contributions are summarized as follows: • We design a hierarchical auditing framework to combine RDA and blockchain, which makes all consensus nodes verify data operation records collaboratively and releases data owners from verification cost. • We propose a credit-based consensus protocol and an incentive mechanism intended to quantify behaviors of entities. • We extend our work to support some auditing properties such as batch auditing and dynamic auditing. • We conduct a comprehensive comparison between existing schemes and the proposed scheme. Security analysis and simulation results can meet our design goals.
The remainder of this paper is organized as follows. Section II presents some works related to our work. Section III introduces some preliminaries which serves for auditing process. Section IV is the problem statement including system framework and design goals of the proposed scheme. Section V describes the core of the CAB and auditing protocol. Sec-tion VI gives correctness, security, and simulation analysis respectively. In the end, we draw conclusions in Section VII.

II. RELATED WORK
The RDA can be categorized into private auditing and public auditing. The former only contains two types of entities, namely the data owner and the CSP. The auditing process is mainly performed by the data owner. For example, Ateniese et al. [8] first proposed the concept of provable data possession (PDP), which adopted the mechanism of "challengeproof-verify" to verify the integrity of remote stored data, and has become the basis of many auditing protocols ever since. However, such private protocols increase the burden on the data owner who lacks computing resources. Furthermore, the data owner and the CSP distrust each other in this context. Hence, auditing result may be harmful for the CSP since it can only be obtained from the data owner. To remove the above doubts, Wang et al. [9] first introduced a trusted TPA into PDP to challenge the CSP on behalf of data owners, which implemented public auditing. In this context, delegation of auditing tasks means that the verification process no longer requires the participation of the data owner, thus reducing a lot of computation overheads. Nevertheless, such public auditing schemes did not consider that the single TPA is not always trustworthy and may be bribed by some entity in practice. And it is even more dangerous that the TPA can derive some information about delegators, since all relevant information for verification are transmitted to it.
To address the above problems brought by the TPA in public auditing, several blockchain-based solutions were proposed. Liu et al. [10] replaced TPA with smart contract, where the data owner and the CSP signed an auditing commitment to resist repudiation. Then the data owner could get the hash result of remote data through block identifier, which was compared to the hash previously stored in the blockchain ledger. Obviously, this scheme is not able to resist replay attacks arose from the CSP. Yu et al. [6] presented a fully decentralized data auditing solution without any TPA. They employed homomorphic verifiable tag (HVT) to perform RDA and their solution could effectively resist replay attacks due to random challenge set generated in every auditing request. But the data owner had to traverse the blockchain ledger to find the specific proof for every challenge, which brought extra cost. Yang et al. [11] combined merkle hash tree (MHT) and timestamp server to ensure that all behaviors of data owners and CSPs satisfied accountable traceability. In their design, the data owner should store all proofs and compute the tree root from leaf nodes during every verification process. Qi et al. [12] suggested reputation to quantify the reliability of a CSP, and improved its blockchain with a two-step validation. But more details were not given in this scheme. Li et al. [13] separated operation behaviors and file information within a block. They also introduced a proxy node to efficiently search specific blocks. However, the data owner needed to download the whole file to verify the integrity and could not afford to require auditing from time to time. Xue et al. [7] utilized the nonce in blockchain to construct unpredictable challenges, thereby preventing malicious TPA from forging auditing results. The TPA would insert all proofs and auditing results into a log file, which is uploaded to the blockchain afterwards. Nevertheless, it still required the data owner to review the log from blockchain, where a subset should be reverified by the data owner to discover malicious TPA.
In conclusion, existing blockchain-based auditing schemes mostly focus on solving the collusion problem in TPA-based schemes through recording entities' behaviors or removing the TPA. However, they only take the blockchain as an immutable ledger rather than a distributed trust network, and still involve data owners in nearly the whole auditing process, which causes much burden especially on resourceconstrained ones. Therefore, to make use of trust that consensus nodes can contribute to the auditing process, we redescribe relationships among entities in the cloud storage system based on the blockchain, and design a CAB for resource-constrained data owners. There is no more need for them to traverse the CAB ledger to obtain auditing results, while various security threats cannot work on the CAB. We also introduce a credit score mechanism to encourage all participants to keep honest and maintain the stability of the CAB. In addition, a new entity called group manager is set up to separate data owners from consensus process, enabling our design to accommodate more data owners.
Additionally, data dynamics support is also a hot spot attracting scholars' attention these years. In 2008, Ateniese et al. [14] improved the PDP and achieved partially dynamics. Later, Wang et al. [15] developed the MHT which has been employed widely to support full data dynamics. And Zhu et al. [16] introduced a structure called as index hash table (IHT), which recorded the changes of data blocks. However, the above schemes required large communication resources during the updating and verification processes. Then in 2016, Tian et al. [17] constructed a single linked sequence table DHT, which reduced the computation cost of the CSP and communication overheads in the updating process. Inspired by the DHT, we also introduce an auxiliary chain table (ACT) to support data dynamics and help data owners and consensus nodes search records in the CAB efficiently.

III. PRELIMINARIES
Let G be an elliptic curve subgroup, and G T be a multiplicative subgroup. They are of a large prime order p, and g is a generator of G.

A. COMPUTATIONAL DIFFIE-HELLMAN PROBLEM
A computational problem generated with a security parameter λ is hard if, given as input a problem instance, the probability of finding a correct solution to this problem instance in polynomial time is a negligible function of λ. The security of our scheme is based on the hardness of CDH problem, which is at least as hard as discrete logarithm problem (DLP).
Definition 1 (CDH problem) Given g a and g b , where a, b ∈ R Z * p , compute g ab .

B. SYMMETRIC BILINEAR PAIRING
Bilinear pairing for scheme construction is built from a pairing-friendly elliptic curve where it should be easy to find an isomorphism from the elliptic curve group to the multiplicative group. It is a relatively mature and efficient method and has been employed in cloud auditing. The definition of symmetric pairing is stated as follows.
Definition 2 (Symmetric Bilinear Pairing) A map function e : G × G → G T is a symmetric bilinear pairing only when it satisfies three properties below: g) is a generator of G T . • Computability: For ∀u, v ∈ G, there exists efficient algorithms to compute e(u, v).

C. BLS-BASED HOMOMORPHIC VERIFIABLE TAG
BLS-HVT is based on the BLS signature algorithm, which can be efficiently aggregated and verified without disclosing private key.
Definition 3 (BLS-HVT) Given a data block blk, and a cryptographic hash function H : {0, 1} * → G. Select a random secret key sk = a ∈ Z * p , and compute the corresponding public key pk = g a . Then the BLS-HVT for blk is σ blk = H(blk) a . For verification, the verifier will simply need to check whether e(σ blk , g) = e(H(blk), pk) holds.

A. SYSTEM FRAMEWORK
As shown in Fig. 1, the proposed framework contains four entities: private key generator (PKG), data owners (DOs), group managers (GMs) and CSPs. In our framework, the PKG is governed by a fully-trusted authority which is responsible for setting public parameters for the whole system and generating key pairs for the GM. The DO is assumed to have limited communication, computation, and storage resources. It generates and sends auditing challenges to the CSP on demand, meanwhile maintaining ACT to tracking changes of data blocks. As a member of DOs, the GM is collectively designated by a certain group of DOs. However, it is assumed to possess more resources than common DOs. The GM is responsible for maintaining the blockchain ledger for managed DOs, and returning results to them when ledger is updated. The CSP provides DOs with significant storage space and computation capability. It is also responsible for storing the chain of blocks, while responding proofs to auditing challenges. In our assumption, the CSP may not be honest, and there exists business competition among different CSPs. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.  To support public auditing, the CAB will store auditing requests and proofs, which can be accessed and verified by any entity if need be. In addition, the CAB works more like a permissioned blockchain, since only node who meets some requirements can participate in the consensus process.
As for relationships among the above entities, a brief exposition will be provided here. The DO outsources its data files to a CSP and can retrieve them on demand. When the DO sends an operation request to the CSP through GM, the GM simultaneously broadcasts related auxiliary information to other consensus nodes. Once the CSP confirms request and responds to it, the CAB performs consensus process, and then records this operation and its result in a new block. Eventually the DO is able to obtain final response from the GM after the CAB ledger is updated.

B. THREAT MODEL AND SECURITY ASSUMPTIONS
Generally, an malicious entity in our framework may attack the CAB in three ways: • by returning positive results to DOs at all times to deceive for rewards, no matter what real responses are. • by denying requests or operations done to the remote data for further compensation. • by colluding with other entity to directly interfere consensus judgement.
Since all behaviors are signed and traceable in the CAB, repudiation to what have happened is meaningless. Then we mainly consider four types of attacks in this paper: • Replacement attack. The CSP attempts to pass auditing by replacing the challenged data block and tag with combination of other uncorrupted data blocks and tags. • Forgery attack. The CSP forges proofs to deceive other verifiers. • Replay attack. The CSP replays previous proof which has passed verification to bypass present challenge, or the GM replays an existed and valid auditing result to deceive the DO.
• Collusion attack. The CSP colludes with the GM to change challenges and proofs, leading other verifiers to wrong judgement.
We also make some standard cryptographic assumptions. For example, the adversary is not able to forge signatures without owning signer's private key, and the one-way hash function is secure. In addition, we assume that all entities are rational, and no entity can control more than 30% consensus nodes in the CAB.

C. DESIGN GOALS
In this paper, we target public and trustworthy integrity auditing for cloud storage system. The following properties are intended to achieve considering functionality, security and Efficiency respectively. Functionality: • Decentralization. All operation requests and responses are maintained by all consensus participants so that records can be accessed publicly. Through credit score mechanism, each consensus node has a probability to be in charge of generating new block and writing it into the CAB ledger. • Collaboration. The reliability of auditing result requires all verifiers to devote some resources to execute the verification process. • Storage correctness and freshness. The CSP must pass auditing only if it is storing DO's data files intactly. Furthermore, the CSP keeps the latest version of data blocks and corresponding tags, while the DO and verifiers hold the latest auxiliary information used to verify proofs. • Dynamic operations support. The DO is able to perform remote data update operations, i.e., insertion, deletion, and modification, with necessary cost. Security: • Identity-privacy preservation. It is impossible for entities except the GM to get the knowledge of DO's identity. • Blockless verification. It enables verifiers to verify proofs without original data, which ensures the security of data to some extent. • Unforgeability. This property indicates that the probability of forging a proof able to pass verification of honest verifier in polynomial time is negligible. • Collusion resistance. It offers the CAB with the ability of resisting against collusion attacks under certain circumstance. Efficiency: • Batch auditing. Verifiers can check multiple auditing challenges corresponding to various data blocks of different files from different DOs in a management domain simultaneously. • Efficient traceability. All operation histories related to a certain data block can be quickly retrieved by any entity.  Notation Description λ a security parameter G an elliptic curve group G T a multiplicative group p the prime order of G, G T g 1 , g 2 two random generators of G e : a pseudo-random function π(·) a pseudo-random permutation Zp a field of residue classes modulo p σ k|ε (·) a signature signed with key k or by entity ε IDε the unique identicator of entity ε • Stability. The sharp increase in the number of DOs and GMs will not substantially reduce verification efficiency.

V. THE PROPOSED SCHEME
In this section, we first introduce a dynamic structure to support various operations on data blocks. Then the core of our public auditing scheme, which is based on the hierarchical CAB, is illustrated in detail. Afterwards a description of our auditing protocol is presented in subsection V-C. What follows is the extended property of batch auditing. Finally, the interaction between dynamic data operations and the CAB are described briefly and formally. Table 1 shows some basic notations and their descriptions referred to in our scheme.

A. AUXILIARY CHAIN TABLE
In this part, based on the DHT designed in [17], we develop a new data structure ACT employed by each DO, which provides fast retrieval in the CAB and some metadata that helps verification. The ACT is two-dimensional and its specific structure is shown in Fig. 2. Metadata information in the ACT is categorized into file information and data block information. The left part is the identifier of each file owned by a DO, which forms an array since they are independent in storage order. However, due to the correlation among operations on data blocks within the same file, the right part is designed to be a double linked list, where each item records the current version number, time stamp, last operation height, last operation type, and last operation state of corresponding data block. In our scheme the version number is denoted as v m,n , which is updated when a given data block is inserted or modified. Likewise, the time stamp, last operation height, last operation type, and last operation state are respectively abbreviated to ts m,n , loh m,n , lot m,n and los m,n . We should note that ts m,n and lot m,n are updated when operation request is generated, but loh m,n and los m,n are updated after corresponding responses are obtained from the CAB. Here in the above symbols, m represents the file ID and n represents the index of data block.
With such a double linked array, the insertion and deletion of a data block will cause no change in other items    within the same file. Moreover, the ACT provides with local convenience when retrieving state of a certain element or performing forward traceability in the CAB. For further operation convenience, we manually categorize the ACT into existing ACT (EACT) and deleted ACT (DACT). When a data block has been deleted, corresponding item would be deleted from the EACT and inserted into the DACT. In this way, the order of data blocks within the same file would not be chaotic when a new data block is to be inserted afterwards.

B. COLLABORATIVE AUDITING BLOCKCHAIN
The main function of the CAB is to encourage participants with credit rewards in packaging operation information and verifying proofs to make auditing results more reliable. In this part, we present the designed CAB from aspects of block structure, consensus process, and incentive mechanism, which are the basis of a blockchain.

1) Block Structure
We should note that every practical blockchain system has a strong requirement for application scenario, and existing security architectures based on the blockchain have their own security goals. Therefore, information stored in a block are different in these systems. In practice, considering that the capacity of a block is limited for the reason of efficiency, we only store the most important metadata which reflects entities' behaviors and corresponding results in the form of hash. VOLUME 4, 2016 As shown in Fig. 3, a block is divided into two parts: block header and block body. Different from cryptocurrency, we adapt the block to remote auditing context by reconstructing transaction structure. Details about important fields are showed as follows: • height: the hight of current block, which is responded to the DO to update the ACT. • from, to: identifiers of delegated GM and requested CSP. • type: an operation type among insertion, modification, deletion and auditing. • sig: the signature of GM who is responsible for transmitting the transaction to the CAB network. • result: the result to operation request signed by the representative. • last op heights: a list of the heights of previous blocks which track last operations done to the same data blocks. It is filled with the field loh obtaining from the ACT. • extension field: if operation type is auditing, this field is filled with challenges and corresponding proofs. If operation type is insertion or modification, this field is written with a list of v and ts obtaining from the ACT.
Otherwise it is empty.

2) Consensus Process
Apart from the construction of block structure for specific scenario, another crucial part of a blockchain is the design of its consensus process. Generally, the purpose of a consensus protocol is to solve problems of storage consistency and process reliability. Based on [18]- [20], which are known as PoS and PBFT, we define credit score as measurement of CSPs' and GMs' reliability and a factor affecting selection of consensus nodes and representative. Before describing consensus process, we give definitions about some concepts.
Definition 4 (Credit Score) Credit score cr is employed to express the degree at which a CSP or a GM can be trusted via evaluating their behaviors.
On the whole, the credit-based consensus process is run by all CSPs and selected GMs over a sequence of rounds in our framework. Before establishing the CAB, through negotiating, CSPs and GMs get their initial credit scores according to the percentage of DOs that they provides service for or they manages. Then the representative election proceeds as follows: 1) Initialization: When system starts, CSPs and GMs broadcast their public keys and credit scores (pk, cr, σ P KG (pk cr)). Then all CSPs and GMs get the credit block to initialize the CAB. 2) Consensus nodes selection: As the number of DO rises, there would be more GMs for load balancing. Considering efficiency and stability of consensus pro- cess, we first sort GMs by the number of request source in current transaction pool in descending order and get {GM 1 , GM 2 , . . . }. Then GM i is selected to be a consensus node in current round with a probability p i , where p i = cri j crj . Consensus nodes selection process flips a p 1 -biased coin to check whether GM 1 is selected; then for all j 2 if exists, it flips a (1 − p 1 ) · · · (1 − p j−1 )p j -biased coin to check whether GM j is selected. Once a GM is picked out, the sorted set of GMs would remove it. The selection process would be executed 2f + 1 times, where f ∈ Z is the maximum number of possible malicious GMs in the CAB. Additionally considering that the number of clouds is limited in reality, all CSPs will participate in the consensus process in our scheme.
3) Representative election: Similar to consensus nodes selection in GMs, except that we first sort CSPs by the number of request destination in current transaction pool in ascending order. After rearranging the set of consensus nodes in the order of {{CSP s}, {GM s}}, the p-biased coin selection process will be only executed once to determine current representative. For every round after the representative has been elected, the abstract flow from sending requests to the end of consensus is as Fig. 4 shows.
As a prerequisite, a DO sends an operation request req on remote stored data to the GM, and GM broadcasts req to the CAB network after re-signing it. Upon receiving req, the designative CSP signs response res and broadcasts to the network as well. When the representative collects enough (req, res) to form a block, it adds its verification result to each (req, res) and launches a three-stage consensus proposal. Firstly, the representative broadcasts a pre-prepare message and a new packaged block. When a consensus node receives the message, it checks correctness of current round information. If passed, this consensus node broadcasts a prepare message to claim its ready state. Once a consensus node obtains more than 2N/3 prepare messages, where N is the total number of consensus nodes in current round, it begins verification process with relevant information stored in the CAB ledger to check whether the result is the same as what the representative gets or not. If the same, this node will broadcast a commit message to other peers. Finally, if a consensus node gets more than 2N/3 consensus commits, it will accept the new block and append it to the end of ledger. Then all consensus GMs reply to other unselected GMs in current round, each of which will accept the new block if more than f + 1 same blocks are received. Eventually, the GM responds to the DO with operation result after ledger is updated.

3) Incentive Mechanism
Another important part to a blockchain is the incentive mechanism, which can encourage consensus nodes to repeat some tasks honestly and get their rewards, thus enhancing the distributed trust of the whole system and the reliability of results. Based on the assumption that all entities are rational, we roughly present a feasible incentive mechanism that takes economic benefit and misbehaviors into consideration for further stability: • A DO needs to buy credits to pay GMs and CSPs for operation requests. • Each communication in consensus process will cost GMs and CSPs a portion of credit. • A CSP who fails DO's request will decrease exponentially on credit score, whereas the DO gets extra compensation apart from request fee which has been paid. • A GM can only get credit rewards when managed DO's request passes, otherwise the GM makes neither profit nor loss. • A representative will get large but linear and other consensus nodes will get small but also linear increase on credit scores if current round successfully reaches final consensus. • A DO has the freedom to choose any CSP to store data or any GM to transfer messages according to the credit score. If the credit score falls below a threshold, existed consumers will leave for another CSP or GM.
It is obvious that the credit score not only affects the probability of consensus nodes selection and representative election, but also determines practical economical benefit of all entities, since it is like a token to some extent. If a CSP keeps data intact, performs DO's requests honestly, or reports other CSPs' misbehaviors, it will be rewarded with some credit scores. In other words, if a CSP is detected having some misbehaviors, which eventually leads to failing to pass auditing in our scheme, it will be punished hard in credit score. Any DO prefers choosing a CSP which has a relatively high credit score to store data, and such CSPs can also be elected to be the representative with a high probability. Likewise, for a rational GM, though its credit score will not decrease even if DOs' requests fail, a DO tends to send requests to a GM with a relatively high credit score when there are other GMs within a big domain, which further influence the probability of a GM to be selected as a consensus node. Therefore, the proposed incentive mechanism guarantees the honesty of entities to a considerable extent.

C. HIGH DESCRIPTION OF THE AUDITING PROTOCOL
In this part, we focus on our public auditing protocol with the help of the CAB described in subsection V-B. The construction of the proposed protocol is divided into two phase: setup phase and audit phase. The former is responsible for system parameters initialization and auxiliary information generation. The latter is the core process of remote data auditing. Both phases need collaboration among all participants, that is, the interaction with the CAB.
Assuming that a PKG, a DO, a GM which the DO chooses, and all CSPs participate in our auditing protocol. In addition, the specific signature algorithm for message authentication is out of our consideration.

1) Setup Phase
In this phase, the PKG is in charge of system parameters initialization, while the DO and GM need to pre-process files. Firstly, the PKG utilizes security parameter to generate a public-secret key pair for the GM, and the DO also generates its own key pair in KeyGen. Then the DO initializes EACT and DACT in FileToAux to store metadata in the form of as described in subsection V-A. Finally, the DO pre-processes files and uploads them to the CSP in FileToCS.
KeyGen. With a security parameter λ, the PKG first selects an elliptic curve group G and a multiplicative group G T of the same large prime order p, a field Z p of residue classes modulo p, and a symmetric bilinear pairing e : G × G → G T . Two random generators g 1 , g 2 ∈ G are also picked. Additionally, the PKG defines a one-way hash function H : {0, 1} * → G, a pseudo-random function (PRF) f , and a pseudo-random permutation (PRP) π. Thus, the set of system public parameters is SP = {G, G T , p, Z p , g 1 , g 2 , e, H, f, π}. (1) Then the PKG chooses random element α ∈ Z * p = Z p \{0} as the secret key gsk of GM, and let its public key be gpk = g α 1 . The GM also chooses a signing key pair (gssk, gspk). As for the DO, it randomly chooses sk = β ∈ Z * p as private key, and computes pk = gpk β to be public key. Afterwards the DO calculates the inverse inv satisfying inv · β ≡ 1 (mod p), and a parameter γ = gpk inv . Likewise, the DO chooses a signing key pair (ssk, spk). Furthermore, we assume that the CSP which is designated to offer storage service to the DO has signing keys (cssk, cspk).
FileToAux. Initially the DO creates the EACT for all files to be uploaded. For i-th data block in a file, we set corresponding item as FileToCS. For simplicity, we suppose that the DO needs to upload a file F . The DO splits F into data blocks such as: VOLUME 4, 2016 where b i is a general name of data block. Then the DO generates a BLS-HVT for each b i : of which the aggregated set is σ = {σ i } i∈ [1,n] .
In (2), v i and ts i are obtained from the EACT. Afterwards the DO constructs an insertion instruction {insert, ts ins , ID F , F, σ, H(F ), [1,n] , σ DO } and sends to the GM, where In the meantime, the GM creates and broadcasts a insertion request to the CAB network: where σ GM = σ gssk (ts ins {(v i , ts i , loh i )} i∈ [1,n] ).
Upon receiving insertion instruction, ID CSP verifies σ GM firstly and then checks integrity of file F through H(F ). If passed, it stores F and broadcasts a response where σ CSP = σ cssk (1 ts ins ).
When the representative receives both T X ins2CAB and T X res2ins , it verifies GM's and CSP's signatures, and writes (1, σ rep (1 ts ins ID GM ID F ) into field result of a transaction in the new packaged block if passed. Once the representative collects enough transactions, the consensus process will be performed as described in V-B2. In the end, the GM obtains the latest ledger and returns to the DO, where rep is the abbreviation of the representative and h is set as the hight of the latest block in the updated ledger. The DO verifies the representative's signature with local stored ts ins , and overwrites items in the EACT as (v i = 1, ts i , loh * i = h, lot i = insert, los * i = 1) if passed, which means that the insertion request is performed successfully. At this moment the DO can delete the local copy of file F .

2) Audit Phase
This phase performs remote data integrity auditing. Specifically, challenges are generated from the DO and sent to the CSP in ChalGen, after which the CSP computes integrity proofs and broadcasts them to the CAB network in ProofGen. Then in ProofAudit, consensus nodes check the correctness of proofs with the help of auxiliary metadata previously stored in the CAB ledger. For simplicity, supposing that the DO wants to audit aforementioned file F which has been uploaded.
ChalGen. The DO randomly chooses two generation keys k 1 , k 2 ∈ Z * p , and picks z ∈ Z + random items from the EACT, eventually obtaining a challenge set chal = {i, r i } i∈ [1,z] (4) by calculating Meanwhile, the DO updates these items as where ts aud is current time. Then the DO signs chal and other auxiliary information, after which it constructs an auditing instruction and sends it to the GM: where σ DO = σ ssk (ts aud γ chal {loh i } i∈ [1,z] ).
Finally it will be broadcast to all participants for future verification.
ProofGen. Once ID CSP gets the T X aud2CAB , it verifies GM's signature and checks whether chal are out of range or not. If valid, ID CSP generates a tag proof T P = i∈ [1,z] σ ri i (6) and a data block proof DP = i∈ [1,z] b i · r i (7) according to chal. Upon completion, the above proofs T X res2aud = {ID CSP , ts aud , ID GM , ID F , T P, DP, σ CSP } will be broadcast, where σ CSP = σ cssk (ts aud T P DP ).
ProofAudit. The representative collects T X aud2CAB and T X res2aud . Based on loh i , the representative retrieves field last op heights of a transaction in corresponding blocks recursively till the type of transaction is "insert" or "modify", and gets all v and ts of challenged data blocks from the CAB ledger. Then the representative computes aggregated data block information DBI = e( i∈ [1,z] H(v i ts i ) ri , gpk) (9) for the whole chal. And it continues to check whether e(T P, γ) = DBI · e(g DP 2 , gpk) holds. If holding, the representative writes (1, σ rep (1 ts aud chal ID GM ID F ) into field result of corresponding transaction, otherwise it writes (0, σ rep (0 ts aud chal ID GM ID F ). When enough transactions are collected and processed in such way, the representative performs the consensus process. Finally, the GM receives new block and updates local ledger, after which it returns result to the DO. The DO verifies it with stored ts aud , k 1 and k 2 , and updates corresponding items in the EACT as: (v i , ts * i , loh * i = h, lot i = audit, los i = state). If state = 1, the challenged data blocks are complete; otherwise if state = 0, there is at least one of these outsourced data blocks suffering corruption.

D. BATCH AUDITING
There are many different DOs to be served and multiple auditing tasks may be waiting for processing simultaneously. The batch auditing in our work supports aggregating challenges of multiple files from various DOs managed by the same GM, which can reduce computation overheads of verifiers in ProofAudit compared with individual auditing.
Supposing that the u i ∈ [1, u z ] DOs delegate auditing tasks of i ∈ If it holds, the completeness and correctness of all verified files can be ensured, otherwise at least one file of a certain DO is corrupted.

E. DYNAMIC AUDITING
Data flow during insertion and auditing processes has been respectively described in detail. Hence, we proceed to introduce how the other two operations generate trails in the CAB, which are modification (T X mod2CAB ) and deletion (T X del2CAB ). In the following, we suppose that a remote data block b i in file ID F of a DO will be manipulated. In addition, corresponding item in current EACT is assumed to be: Firstly, the DO updates corresponding item in the EACT to: , and generates a new BLS-HVT 2 ) sk for the modified data block. Then a modification instruction , σ GM } is sent to the CSP with the help of the GM, where ts mod is the generation time of this instruction. After the GM broadcasts and the CSP modifies b i as indicated and broadcasts response T X res2mod = {ID CSP , ts mod , ID GM , ID F , 1, σ CSP }, the CAB collects them to verify. If their information matches and the CAB ledger is updated after all processes, will be returned from the GM. The last step for the DO is to validate the result, and overwrite corresponding item to loh * i = h i and los * i = 1. 2) Deletion. Similarly, after setting ts * i = ts del , lot * i = delete and los * i = −1, where ts del is the deletion time, a deletion request is broadcast. Once the CSP accepts the request, it makes a response T X res2del = {ID CSP , ts del , ID GM , ID F , 1, σ CSP }.
Then after T X del2CAB and T X res2del have been written into the CAB, the DO finally gets and updates the EACT as loh * i = h i and los * i = 1, after which corresponding item will be moved to the DACT. VOLUME 4, 2016 9 We can note that los in relevant items would be reset and overwritten at the beginning and the end of each request. This field keeps the state of lot and it has three different values. When los equals to -1, it means that request lot has not gained a response yet. If los is set to 0, which indicates request lot fails, there exists two possible cases: 1) when lot = audit, then the outsourced data is not complete or tampered with; 2) otherwise the asked CSP refused to perform request lot for some reasons such as illegal remote data operation. In this case, the CSP should present relevant evidence to other consensus nodes to validate. If los = 1, it shows that request lot has been successfully admitted by the CSP or the auditing result is positive.

VI. EVAULATION
In this part, analysis of correctness, security and performance are showed.

A. CORRECTNESS ANALYSIS
There are two primary equations employed in our verification process: (10) for a single auditing task from a single DO and (11) for several auditing tasks from multiple DOs. Supposing that each consensus node in our protocol has received challenges and valid proofs, we will demonstrate the equality of these two equations to prove the correctness of audit phase.
We should note that only DOs who are managed by the same GM are able to reduce pairing operations through information aggregation, compared to individual auditing.

B. SECURITY ANALYSIS
In this subsection, we analyze how the CAB resist four types of attacks which could happen during the whole audit phase, including replacing attacks, forgery attacks, replay attacks and collusion attacks. It is noteworthy that in our assumption all other entities apart from the mentioned in each kind of security threat will comply with auditing regulations.

Theorem 1
The CAB can resist replacing attacks generated by the CSP. A malicious CSP is not able to get the correct combination of intact data block information to replace corrupted data block when generating proofs and deceive verifiers.
Proof: Supposing that a data block b j to be checked has been corrupted but that two data blocks b j1 and b j2 , whose HVT are σ j1 and σ j2 respectively, are well maintained in the CSP. To retrieve the HVT of b j , the CSP should find out the correct combination of σ j1 and σ j2 . Since then the CSP sets that where α j1 , α j2 ∈ Z p . Comparing σ * j and σ j , it follows that must be satisfied simultaneously. Solution to the second equation refers to the DLP on elliptic curve, which means it is hard to work out the desired α j1 and α j2 . Thus the probability that the CSP performs such replacing attacks successfully is negligible.

Theorem 2
The CAB can resist forgery attacks from the CSP. A malicious CSP cannot directly forge a tag proof to make (10) hold.
Proof: If the CSP has modified data block b i to b i + off i for i ∈ [1, z], where off i represents the modification part. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Then in ProofGen, the CSP should have computed the new T P * which adapts to the new DP * as follows: Since the CSP only owns T P , it must continue to multiply the rest to get T P * . Note that sk is just the private key of the DO, which cannot be known by any other entities in our assumption. Hence, the CSP could not forge proofs to deceive other verifiers. Proof: For the former case, the replay attack may be executed by a malicious CSP when it has illegally changed the state of a data block b j . We assume that b j 's proof information is replaced by its former data block b − j . Then the CSP calculates the new proof (T P * , DP * ) as follows: Upon receipt, verifiers will analyze proofs according to (10). The left part concerning the tag proof can be written as: · e(g i∈ [1,z]\{j} bi·ri+b j − ·rj 2 , gpk).
Comparing the above two expanded equations, we conclude that only when H(v j ts j ) = H(v j − ts j − ), i.e., v j = v j − and ts j = ts j − , can (10) hold. To our knowledge, version number and timestamp of two different data blocks can never be the same. Moreover, it is impossible for the CSP to tamper with DBI, since this variable is calculated by all verifiers with version number and timestamp recorded in the CAB. Hence, replay attacks cannot work in this case.
For the latter case, the replay attack may happen when a malicious GM is greedy for rewards and attempts to deceive the DO with previous result even if the CSP has failed auditing request. However, this evil intention can be easily prevented owing to the representative's signature of ts aud , ID GM , and ID F contained in the result. These three variables act as a random and unique identifier set by the DO and thus the signature of representative changes every request. Therefore, previous auditing result cannot work any more in this case.
In conclusion, our scheme can resist replay attacks.

Theorem 4
The CAB can resist collusion attacks from CSPs and GMs. Colluding CSP and GM cannot deceive the DO by tampering with challenges and adapting proofs to bypass corrupted data blocks.
Proof: In this situation we consider that the CSP and GM have more power to control a part of auditing process. Since the GM is responsible for transferring challenges to the CAB network for DOs, there is a possibility that the GM negotiates with the challenged CSP in advance and then the GM tampers with the range of a challenge set, i.e., chal → chal * = {(i * , r i )} and {loh i → loh * i }, to pick out those still intact data blocks so as to avoid auditing failure. Obviously, altered challenge and proof can easily pass other verifiers' verification afterwards. Nevertheless, similar to the method of preventing replay attacks, in ProofAudit, our design requires a signed auditing result from the GM, which contains the CAB's acknowledgement of current challenge set. Hence, the DO can judge whether current auditing request has been suffered from collusion attacks based on k 1 and k 2 stored temporarily, which can be used to re-generate chal. So our scheme can resist collusion attacks.
Theorem 5 Apart from the GM managing a DO, all other verifiers cannot get information about the relation between the DO and challenged data blocks. VOLUME 4, 2016 Proof: Before broadcast to the CAB network, each request from the DO will be re-constructed by the GM, i.e., removing signature of the DO. What other entities can get is some auxiliary information about a data block. During the audit phase, an extra variable other verifiers will get is γ = gpk inv sent along with chal. Although there holds an equation gpk inv·sk = gpk, verifiers still cannot solve out gpk sk = pk with the help of γ and gpk due to the CDH problem. That is, verifiers are not able to determine which DO in the domain sends the request. Hence, the CAB can protect DO's identity.
With respect to common data operations, our CAB satisfies the properties of non-repudiation and accountable traceability. On the one hand, each behavior is signed with unique file information and timestamp, which means that no entity can deny what has happened or reuse previous information. On the other hand, all related records in the CAB ledger can be found rapidly and can directly reflect modification history of a data block.
Lastly, Table 2 summarizes security comparison results of our scheme with several blockchain-based auditing schemes.

C. PERFORMANCE ANALYSIS
In this subsection, we evaluate the performance of the CAB from two aspects: property comparison and computation cost comparison.
Considering functionality and efficiency, Table 3 lists out comparison result in several properties mentioned in subsection IV-C. Our scheme provides all verifiers with common knowledge, where no sensitive information is involved, and they all have to cost some resources to check auditing proofs, which contributes to the reliability of results. As for compared schemes, their consensus members own no prior knowledge and are only responsible for the verification of signatures, which means that the DO still has to execute auditing process by itself. The CAB supports dynamic data due to operations on data blocks instead of on the whole file. In addition, the proposed ACT helps the DO and consensus paticipants efficiently locate related blocks. With respect to storage freshness, [6], [10], [11] cannot prevent all types of mentioned attacks to ensure the integrity of remote data, thus failing this property's requirement. Apart from [6] and ours, other schemes can only process auditing requests and responses in serial order, which means that they do not support batch auditing to reduce the number of calculation. Moreover, blockchain-based designs generally allow permissible entities to search records in the ledger. We add field last op heights bound to operation history of a data block to accelerate retrieving, whereas other schemes must iterate through the whole ledger to find out the correct transaction in the worst case. The last property that proposed scheme provides is described as stability, which is influenced by the number of DOs in our consideration. Our CAB meets this target due to the hierarchical structure which is introduced into the consensus process, where the GM is delegated to participate in verification process. Therefore, a sharp increase of DOs will not substantially reduce the efficiency.
The comparison of entities' computation cost with [6] and [7], which also employ HVT for auditing, is shown in Table 4. Computation overheads are mainly distributed in FileToCS, ProofGen and ProofAudit in the comparison. Firstly in FileToCS, the DO generates tags for all data blocks to be uploaded. Then in ProofGen, the CSP computes proofs according to challenges. Finally in ProofAudit, the DO in [6], the TPA and DO in [7], or each consensus node in the proposed scheme verifies the correctness of proofs. From Table 4, we can see that the DO in the CAB costs much less, since the burdens caused by local trust in auditing result is spread across the CAB network, which is adapted for resource-constrained DOs.
Additionally, we evaluate performance of the CAB by conducting several experiments using JAVA SE 8.0 on Ubuntu 16.04 Virtual Machine equipped with Intel Core i5 CPU at 2.3GHz and 4GB RAM. All pairing related calculations are implemented with JPBC library v2.0.0 and type A pairing parameters, in which the group order is set to 160 bits and the base field order is 512 bits. We divided a file into 10,000 shards and the size of each data block is set to 4KB, while the proportion of corrupted data blocks is set to 0.001. According to the loss function theory described in [21], overall considering the optimal balance of a high detection probability and low validation cost, only a limited number of data blocks need to be challenged. Hence, the sample size is changed from 50 to 500 data blocks in our experiments. Fig. 5 shows the computation cost on the DO side during the whole auditing process. It is obvious that as the sample size increases, the growth rate of DO's computation time in our scheme is almost half that of other two schemes. In fact apart from tag generation, DOs in comparative schemes have to search blockchain ledger to get proofs and execute verification, whereas the DO in proposed scheme does not concern about these processes and can obtain trustworthy results at last. Fig. 6 shows the average verification time which the DO in [6], the DO and the TPA in [7] and the consensus node in our scheme spend respectively. Thanks to less expensive operations such as map-to-point hash and pairing, and quick search for specific authentication information stored in the CAB with the help of field last op height, our verification cost is more acceptable to each involved verifier compared with other schemes.
In Fig. 7, on the one hand when the size of a challenge set is fixed, the total consensus time in a round, which includes ProofGen and ProofAudit, does not vary a lot as the number of GMs increases. This is because that the actual number of GMs which are to participate in consensus process depends on the stability of current framework, i.e., only a few when most of GMs are honest, which means the total number of consensus nodes is limited and thus the CAB can support more domains without affecting efficiency greatly. On the other hand, when the size of a challenge set becomes bigger,     it takes more time to get a final auditing result due to the growing waiting latency caused by more proof generation and verification cost. However, it also means that when the amount of data grows, more resources are needed to exchange for more reliable results. In our scheme, challenges and proofs are only relevant to the GM, and all DOs are separated from consensus process. Therefore, the burdens of consensus nodes barely change when the number of DOs in a group increases from 50 to 500 in Fig. 8, where a challenge is set to the size of 250 data blocks. Lastly in Fig. 9, we present the comparison of batch VOLUME 4, 2016 auditing with [6] under the condition that each DO executes auditing tasks on 10 different files, of which each challenge set contains 250 data blocks, and these DOs are from the same group. We can see that as auditing tasks come from more DOs, our average auditing efficiency improves whereas [6] stays nearly unchanged. This is because that the total time cost does not increase linearly when different proofs from multiple DOs are aggregated into (9). However, in the compared work, each DO verifies proofs with local information itself and there is no way to aggregate proofs from various DOs.

VII. CONCLUSION
In this paper, we propose a collaborative auditing scheme based on blockchain to mainly achieve trustworthy data integrity in cloud storage system. By introducing the CAB, the RDA process and its results can be more reliable with the help of interested entities, instead of relying on a single third party. Meanwhile, the proxy role played by the GM separates the DO from consensus process, which means that the DO is free from verification burdens and its identity can be hidden. Moreover, our distributed consensus protocol avoids centralization, and incentive mechanism enhances security and stability of the CAB with the credit score. Security analysis demonstrates that our scheme can handle with multiple security threats. Performance evaluation indicates that the CAB is more resource friendly towards DOs and more functional compared with other blockchain-based auditing schemes. Both show that our CAB is capable of solve the mutual trust problem between DOs and CSPs in cloud storage system practically.