GasFuzzer: Fuzzing Ethereum Smart Contract Binaries to Expose Gas-Oriented Exception Security Vulnerabilities

Ethereum is a blockchain platform on which developers may develop and run programs called smart contracts. It inherently relies on gas consumption within a specified allowance to constrain code execution, which makes every instruction along an execution path a potential location for raising an exception. In this paper, we present GasFuzzer, the first work to explore the effects of gas allowance manipulation to expose gas-oriented exception security vulnerabilities. GasFuzzer consists of two phases. The first phase introduces a gas-greedy strategy that favors transactions with higher gas consumption for mutation, in order to obtain test transactions with different gas consumptions. The second phase introduces a novel notion of fractional gas consumption coverage and a novel gas-leveling strategy. It applies them to mutate the gas allowances of some of the transactions with the highest gas consumptions produced in the first phase, and then applies these allowance-mutated transactions, together with those that remained non-mutated, to fuzz test the smart contract. We report an evaluation of GasFuzzer via an experiment on 3170 real-world smart contracts deployed on the public Ethereum blockchain between October 2017 and July 2019. The findings show that GasFuzzer with the gas-greedy strategy can detect more Exceptions Disorder security vulnerabilities (7 more cases) than the previous state-of-the-art black-box fuzzer, and that GasFuzzer with the gas-leveling strategy and the gas coverage criterion can detect 6 additional cases of Exceptions Disorder security vulnerabilities, which is significant.


I. INTRODUCTION
Since the inception of Bitcoin, the first cryptocurrency that took advantage of decentralization, both industry and academia have taken an interest in blockchain technology, which had reached a market capitalization of more than a quarter trillion USD [5] at the time of submission of this work. Ethereum is another major decentralized platform, which not only allows transactions with tokens but also offers storage and execution of code, known as smart contracts. Smart contracts can be written in a high-level procedural language named Solidity. In addition to Solidity, other languages such as Serpent and LLL are also available; however, these have not been as popular as Solidity. In this work, we use the term smart contract exclusively for an Ethereum smart contract (written in Solidity for illustration purposes) and blockchain for an Ethereum blockchain.
The associate editor coordinating the review of this manuscript and approving it for publication was Lo'ai A. Tawalbeh.
A key feature of Ethereum is that it uses a mechanism of gas allowance to constrain each external call to any smart contract (i.e., a transaction in Ethereum terminology) to execute within a given gas allowance, where each execution of any instruction consumes a certain amount of gas. This inherent reliance on gas consumption and allowance to execute code makes the execution of smart contracts different from the execution of traditional programs (e.g., Java programs) on traditional platforms in terms of control flow: in executing a smart contract, any executing instruction can be a code location that raises an exception.
In this paper, we investigate whether gas consumption and allowance may play an effective role in exposing exception-oriented security vulnerabilities in smart contracts. Transactions encoded with different input parameters may require different amounts of gas to be executed. Our insight is that by controlling the amount of gas allowance of a transaction and manipulating the parameter values of the function associated with a transaction when the transaction is issued, we can indirectly select not only a particular function but also a particular statement in the function (inside the function call sequence induced by the transaction) to raise the first out-of-gas exception. This allows a new kind of testing technique to be developed to (fuzz) test how well those gas-oriented exceptions are handled in selected statements and functions and to what extent the exceptions are back-propagated to the corresponding callers along the function call sequence.
In this paper, we present GasFuzzer, which is, to our knowledge, the first work that manipulates the gas allowance of a transaction to expose security vulnerabilities through the dimension of gas consumption and allowance. We also note that the idea of gas allowance manipulation is general, and it is orthogonal to techniques that impose no particular constraints on gas allowance and gas consumption.
The basic idea of GasFuzzer is as follows. Like typical fuzzers [1], [28], GasFuzzer starts with a pool of seed transactions. It assigns the same amount of energy to every such transaction. Given a transaction in a seed pool of transactions, GasFuzzer mutates it with two original strategies: gas-greedy and gas-leveling.
In the gas-greedy strategy, the input parameters of the given transaction are mutated to produce a mutated transaction. The gas consumption of the mutated transaction is collected. If the mutated transaction consumes more gas than the given transaction, it is placed into the seed pool for potential further mutation, and the energy of the given transaction is reduced according to a power law.
In the gas-leveling strategy, the seed pool is first filled with a small number of transactions that consume the most gas among those generated by the gas-greedy strategy. They form a sequence of transactions. From the sequence, transactions are randomly picked for mutation. For every picked transaction, GasFuzzer divides the gas consumption of the picked transaction into a number of intervals, mutates the gas allowance of the picked transaction to fall within a randomly selected interval, and substitutes the mutated transaction for the original picked transaction. It applies the resultant sequence of transactions to test the corresponding smart contract. GasFuzzer also includes a novel coverage-based test data adequacy criterion (referred to as the gas coverage criterion) to terminate fuzz testing: it repeats the process until the overall coverage on the set of the above-mentioned gas allowance intervals of these picked transactions has reached a predefined threshold.
In the experiment, we collected a set of 3170 real-world smart contracts from Etherscan [9], deployed between Oct 2017 and Jul 2019, as our smart contract dataset. The empirical results show that the gas-greedy strategy detects 28% more Exceptions Disorder security vulnerabilities than ContractFuzzer [17] (the current state-of-the-art in black-box fuzzing). It was also found that both techniques had similar detection ability on the other kinds of security vulnerabilities detectable by the full set of ContractFuzzer's test oracles in the experiment. Furthermore, the gas-leveling strategy detected 6 additional smart contracts incurring the Exceptions Disorder vulnerability (an additional increase of 24% over ContractFuzzer), which is significant because they are real bugs. We also observe that some gas-related security vulnerabilities can only be detected under certain pre-conditions, which we will report in Section V.
This work makes the following contributions:
1. This paper is the first work that proposes gas allowance and consumption as a guiding dimension to provide feedback for smart contract fuzzing.
2. It presents a novel technique, called GasFuzzer, to realize the above proposal and shows its feasibility by implementing it as a tool. GasFuzzer also includes a novel gas-leveling strategy and a novel coverage-based test data adequacy criterion.
3. It presents the first empirical study that compares black-box fuzzing (ContractFuzzer) and gas consumption driven fuzzing (GasFuzzer) for security vulnerability detection on Ethereum smart contracts. It shows the effectiveness of GasFuzzer.
The rest of this paper is organized as follows. Section II introduces a running example, while Section III provides a background on EVM, smart contracts, gas architecture and fuzz testing. Section IV is an overview of ContractFuzzer. Section V presents our proposed method, GasFuzzer, which is built on top of ContractFuzzer. We present our experiments and results in Section VI. Related work is discussed in Section VII and, finally, Section VIII concludes this work.

II. RUNNING EXAMPLE
A simplified scenario of interactions between two smart contracts, tokenHolder and txManager, is presented in Fig. 1. The function receiveToken() receives an unsigned integer t as input and updates a storage variable token. receiveToken() also uses the address m of contract txManager to call the function manageTx(), which increments the storage variable tx by 1 to maintain its transaction count. Unlike a variable labeled as "memory", a storage variable in Ethereum will cause the storage area of the blockchain to permanently keep the value assigned to that variable, provided that the transaction is executed successfully. In Ethereum, owing to the need for blockchain space to keep a value for a storage variable in a smart contract, the gas consumptions to keep different values may not be identical. For instance, suppose that the smart contracts shown in Fig. 1 are newly deployed on a blockchain. Table 1 shows the gas consumption (i.e., transaction cost) of each transaction (i.e., each call) in the calling sequence to receiveToken(t) with t = 0, 7, 6, 0, 4, 7 and 3. The gas allowance provided for each transaction is listed in the second column of Table 1. Considering the first five transactions, calling receiveToken(0) sets token to 0 in the 1st and 4th calls, and the gas consumptions of these function calls are considerably less than those of invoking the same function with a non-zero parameter. On the other hand, updating the value of the storage variable from zero to non-zero consumes more gas than all the other cases, e.g., receiveToken(7) and receiveToken(4).
VOLUME 8, 2020
When a transaction is issued to a blockchain for execution, the transaction must come with a gas allowance that implicitly constrains the total number of execution steps allowed to complete the function call. The transaction can only be completed before exhausting this gas allowance.
The last two rows in Table 1 summarize transactions whose gas allowances are less than those of the first five transactions. receiveToken(7) and receiveToken(3) are issued with gas limits of 35,000 and 20,000, respectively. Similar to the transaction receiveToken(6), the full execution cost for receiveToken(7) should also be 35,256. However, this transaction was only allowed to spend 35,000 units of gas to execute. As such, not every instruction can be completed: the value of the token variable in tokenHolder is updated from 4 to 7, but the value of tx remains 5 without any exception being reported. If the gas allowance is further reduced to 20,000 units with transaction receiveToken(3), no variable in these two contracts is updated.
The reason for the first case in the last paragraph is that the statement tx = tx + 1; has an insufficient gas allowance to be executed, and thus the function call on manageTx() aborts the update of tx and results in an exception. Nonetheless, its calling function (i.e., receiveToken()) neither catches the exception nor determines whether the effect of calling manageTx() is properly in place. It goes on to complete its execution. Therefore, the state of the smart contract becomes inconsistent and the transaction is marked as successful. In the second case in the last paragraph, the amount of gas allowance is so small that the transaction cannot even complete the update of the variable token. This type of bug can also be viewed as a breach of the atomicity region (i.e., an atomicity violation that arises through an exception which corrupts the memory state). We will use this running example in illustrating GasFuzzer.
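The two failure cases can be mimicked with a small Python toy model. This is a sketch only: the single `STORE_COST`, the `Meter` class and the `low_level_call` helper are hypothetical stand-ins chosen for illustration, and they do not reproduce the EVM gas schedule or the figures in Table 1.

```python
class OutOfGas(Exception):
    """Raised when a step needs more gas than remains."""


class Meter:
    """Tracks the gas remaining for one transaction."""

    def __init__(self, allowance):
        self.remaining = allowance

    def charge(self, cost):
        if cost > self.remaining:
            raise OutOfGas()
        self.remaining -= cost


STORE_COST = 5_000  # hypothetical cost of one storage update


class TxManager:
    def __init__(self):
        self.tx = 0

    def manage_tx(self, meter):
        meter.charge(STORE_COST)  # pay before the update takes effect
        self.tx += 1


def low_level_call(fn, meter):
    """Models a call that reports failure as a flag instead of propagating it."""
    try:
        fn(meter)
        return True
    except OutOfGas:
        return False


class TokenHolder:
    def __init__(self, manager):
        self.token = 0
        self.manager = manager

    def receive_token(self, t, meter):
        meter.charge(STORE_COST)
        self.token = t
        # The returned flag is ignored, mirroring the unchecked call
        # described in the running example.
        low_level_call(self.manager.manage_tx, meter)


def run(holder, t, allowance):
    """Executes one transaction; an uncaught OutOfGas reverts all state."""
    snapshot = (holder.token, holder.manager.tx)
    try:
        holder.receive_token(t, Meter(allowance))
        return "success"
    except OutOfGas:
        holder.token, holder.manager.tx = snapshot
        return "reverted"
```

In this model, an allowance that covers the first storage update but not the second yields a transaction reported as successful with token updated and tx unchanged, whereas a smaller allowance reverts the whole transaction.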

III. BACKGROUND
In this section, we present an overview of Ethereum. After that, the Ethereum Virtual Machine (EVM), smart contracts, and the gas architecture of Ethereum are described. In the last subsection, an overview of typical fuzz testing is provided.

A. ETHEREUM VIRTUAL MACHINE
The Ethereum Virtual Machine (EVM) is the platform for all smart contracts to be deployed, maintained and executed in a decentralized architecture. It is the only execution environment for Ethereum smart contracts to carry out their operations.
EVM is a clean stack-based implementation and a light-weight execution environment. Each element on the stack consists of 256 bits and is also referred to as a word. EVM is responsible for handling all the state changes that happen to the blockchain in accordance with the predetermined execution phases and environment, e.g., exception handling, transaction reversion and verification of jump target locations.
EVM performs operations on the bytecode of a smart contract after compilation and deployment on a blockchain. It handles tasks such as running the bytecode, computing and keeping a record of the amount of gas consumed/remaining, and halting the execution once all the gas offered has been consumed (including but not limited to throwing an out-of-gas exception). In case of a successful state transaction, all the remaining gas is returned to the caller of the transaction. Further details can be found in [8].

B. SMART CONTRACTS
Smart contracts are programs. A simplified source code listing for a smart contract is shown in Fig. 1. Once compiled successfully, a smart contract's bytecode can be deployed on a blockchain. After a successful deployment, the bytecode of the smart contract is publicly visible and its functions can be invoked. Similarly, every transaction calling a public function of the smart contract is also publicly visible.
If a malicious user somehow manages to execute the public functions of a deployed smart contract in a manner that leaves the smart contract in a state that it is not designed to handle, unintended consequences can arise. The malicious user may exploit such loopholes in the code to carry out attacks such as locking all the digital tokens (e.g., ether or other user-defined cryptocurrency, or data) inside the contract, or withdrawing tokens from the smart contract in ways that the smart contract developer did not intend.
Any smart contract deployed on an Ethereum blockchain is immutable and its address cannot be allocated to another smart contract. Hence, if a smart contract has to upgrade its version to handle some issues (such as logical bugs or security vulnerabilities), additional deployments need to take place, which increase execution costs. Forwarding each transaction received by the old smart contract to a newer version, or transferring the data from the older version to the newer version, are the remedies that are often used in such scenarios. But even after the application of such remedies, transferring data from older versions to the newer version remains a problem.
Effective and efficient techniques to find security vulnerabilities before the deployment of new smart contracts are highly desirable. In this sense, even for fuzz testing, one should not aim for a one-size-fits-all technique to expose higher average security vulnerability instances (or bug locations). In Section V, we will present GasFuzzer, which targets detecting gas-related exception security vulnerabilities.

C. THE GAS ARCHITECTURE
When initiating a transaction, the initiator (i.e., the caller of the transaction) has to define a gas allowance to pay for running the transaction. The EVM will deduct a certain amount of gas from this given gas allowance after every execution step. Any transaction that exceeds the given gas allowance will be reverted, and all the gas consumed is transferred to a miner (if a proof-of-work consensus protocol is used). A miner may choose to include or exclude a transaction in its computational task of the required consensus protocol. In the current state of the practice, a transaction with a higher unit gas price has an advantage over a transaction with a lower unit gas price in being processed earlier, and a transaction with a very low unit gas price may never be processed.
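The metering rule can be illustrated with a minimal Python sketch. It assumes each execution step has a fixed, known cost (the `step_costs` list is hypothetical input), treats the whole allowance as consumed on an out-of-gas abort, and models a miner's preference simply as sorting by unit gas price; real miners and the real EVM are more involved.

```python
def execute(step_costs, allowance):
    """Runs steps in order, deducting gas until done or out of gas."""
    remaining = allowance
    for cost in step_costs:
        if cost > remaining:
            # Out of gas: state changes are reverted and, in this model,
            # the entire allowance counts as consumed.
            return {"status": "reverted", "gas_used": allowance, "gas_refunded": 0}
        remaining -= cost
    return {
        "status": "success",
        "gas_used": allowance - remaining,
        "gas_refunded": remaining,  # unused gas goes back to the caller
    }


def miner_order(pending):
    """Orders pending transactions by unit gas price, highest first."""
    return sorted(pending, key=lambda tx: tx["gas_price"], reverse=True)
```

For example, `execute([3, 5, 2], 9)` stops at the third step because only one unit of gas remains, while `execute([3, 5], 10)` succeeds and refunds two units.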

D. FUZZ TESTING
Many different kinds of fuzzing techniques have been used in the past. In black-box fuzzing, random inputs are generated to test applications without any knowledge of the implementation of the system. In general, fuzzing often starts from a small set of seed inputs and then incrementally and randomly mutates them to generate new inputs without referring to the application details. An example is ContractFuzzer [17], which we will review in Section IV.
To make the fuzz testing process program-aware, grey-box fuzzers such as AFL [28] and AFLFast [1] have been proposed. Grey-box fuzzers generally follow the methodology depicted in Fig. 2. A program is provided to the fuzzer along with some seed inputs. Such an input is executed on an instrumented version of the program, and inputs are mutated based on feedback such as whether new code-based artifacts (e.g., a new branch or new branch subsequences) have been discovered by the applied input. These mutations can be random or guided by heuristics such as the frequency with which an input has previously been used for mutation. The program under test is executed on a mutated input, and if there is an increase in path coverage, the mutated input is added to the original input queue, making it eligible for further mutation. Harvey [27] is an example of a grey-box smart contract fuzzer that uses input prediction to improve coverage on program paths.
GasFuzzer, to be presented in Section V, is built on ContractFuzzer. Therefore, we review ContractFuzzer in greater detail. ContractFuzzer consists of two sub-components. One is an offline instrumented EVM and the other is an online fuzzer. The EVM instrumentation component is responsible for instrumenting the EVM so that the fuzzer can examine the execution of smart contracts and retrieve data for the discovery of bugs. The fuzzing process begins with exploring the bytecode of a smart contract using static analysis and an ABI analysis. In this phase, the data types of ABI parameters, the addresses of the smart contracts, and the signatures of functions in these smart contracts are obtained. ContractFuzzer analyzes the ABI signatures of the deployed smart contracts from the blockchain. After performing these two tasks, a phase of random input data generation begins. These generated inputs conform to the ABI specifications. The fuzzing process of ContractFuzzer is shown in Algorithm 1.
ContractFuzzer works as follows. For each contract $c$ in the pool of contracts to be tested, ContractFuzzer extracts the set $F_c = \{f_1, f_2, f_3, \ldots, f_m\}$ of all the public functions, along with the data type of each input parameter required by each function. To achieve this, ContractFuzzer utilizes the ABI of each smart contract, from which all the public functions and their input types can be identified, i.e., the parameters $\{i_1, i_2, i_3, \ldots, i_n\}$ of $f$, where $f \in F_c$. Once all the necessary information to generate a transaction sequence is obtained, ContractFuzzer randomly assigns a value to each input parameter $i_j$ (for $j = 1, \ldots, n$) of $f$ and constructs a transaction to represent an invocation of $f$ with these parameter values. A chain of such transactions is then applied to the blockchain to test the set of deployed smart contracts in the blockchain.
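The input generation step can be sketched in Python as follows. This is an illustration rather than ContractFuzzer's implementation: it assumes the ABI is given in the standard JSON form (a list of entries with `type`, `name` and `inputs`), handles only a few elementary types, and the helper names `random_value` and `build_transactions` are hypothetical.

```python
import random


def random_value(abi_type, rng):
    """Draws a random value for a small subset of elementary ABI types."""
    if abi_type.startswith("uint") and (abi_type == "uint" or abi_type[4:].isdigit()):
        bits = int(abi_type[4:] or "256")  # plain "uint" means uint256
        return rng.randrange(0, 2 ** bits)
    if abi_type == "bool":
        return rng.choice([True, False])
    if abi_type == "address":
        return "0x" + "".join(rng.choice("0123456789abcdef") for _ in range(40))
    raise ValueError(f"type not handled in this sketch: {abi_type}")


def build_transactions(abi, rng):
    """Builds one randomly parameterized call per function listed in the ABI."""
    transactions = []
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip events, constructors, and so on
        args = [random_value(param["type"], rng) for param in entry["inputs"]]
        transactions.append({"function": entry["name"], "args": args})
    return transactions
```

Applied to an ABI containing `receiveToken(uint256)`, `build_transactions` returns a single call with one unsigned integer argument.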
The instrumented version of EVM analyzes the execution traces of all the invoked smart contracts through its implemented test oracles. For brevity, we do not review the set of test oracles and how they are formulated in ContractFuzzer. Interested readers may refer to the work of Jiang et al. [17].

V. OUR PROPOSAL: GAS-AWARE FUZZING
In this section, we present GasFuzzer. It includes two strategies to increase the effectiveness of security vulnerability detection. In the gas-greedy strategy, GasFuzzer prioritizes transactions that consume more gas than others for input parameter mutation. The insight behind this strategy is that if a transaction consumes more gas, more opcodes are likely to have been exercised or more important blockchain-related operations are likely to have been performed. In the gas-leveling strategy, the gas allowance of a gas-expensive transaction is mutated with the aim of assessing whether exceptions generated due to gas unavailability have been properly back-propagated to preceding function calls in the call chain. To the best of our knowledge, no strategy similar to our gas-leveling strategy exists in the literature. Moreover, the gas coverage criterion is the first black-box coverage criterion proposed to facilitate the gas-leveling strategy in testing smart contracts. The remainder of this section describes these strategies in detail.

A. GAS-GREEDY STRATEGY
An overview of how GasFuzzer fuzz-tests smart contracts with the gas-greedy strategy is depicted in Fig. 3. In this strategy, GasFuzzer initially generates transactions randomly, similar to ContractFuzzer, and executes the smart contracts under test with these transactions. These transactions are then mutated and sent to the blockchain for execution. If these newly generated transactions consume more gas, they are added to the inputs queue for further possible mutations. This process continues until all the testing time has been used. Finally, the execution logs are analyzed for security vulnerability detection. Algorithm 2 presents the gas-greedy strategy of GasFuzzer. A list of smart contracts $c_i \in C$ under test is provided along with a set of seed transactions $t_i \in T$ for each function, with priority 1, in each smart contract $c_i$ in $C$. A seed prioritization factor $\theta$ (which is in the range of 0 to 1) and a threshold $\rho$ to choose between generating a new transaction or selecting one from the queue are also provided.

Algorithm 2 GasFuzzer Algorithm for Gas-Greedy Strategy
Input: list of smart contracts to test ($c_i \in C$)
       list of seed transactions: $\langle c_i, t_j, gas_j, \text{priority}: 1 \rangle \in Q$
       seed prioritization factor: $\theta$
       new transaction generation threshold: $\rho$
Output: tuple $M \langle c_i, \text{vulnerability } v_i \rangle$
        list of transactions and gas: $c_i, t_j, \ldots$

The output is a list of tuples $\langle c_i, v_i \rangle$ that contains a detected security vulnerability $v_i$ for the vulnerable smart contract $c_i$. (We note that seed transactions can be obtained from ContractFuzzer.) For each public or external (in the context of smart contracts written in Solidity) function in $c_i$, a list of input parameters is obtained randomly and then represented as a transaction. Two sets $Q_{out}$ and $M$ are initialized as empty sets (lines 1-2). In lines 3-4, a smart contract $c_i$ is chosen from $C$ and transactions relevant to $c_i$ are extracted from $Q$ into $Q_c$. Then, a loop is initiated which selects a function $f$ of $c$ (lines 5-6) and, in each iteration, a transaction $t_j$ is either obtained from $Q_c$ or generated randomly based on the parameter $\rho$ (lines 7-11). If a new transaction is generated, its gas consumption ($gas_j$) is recorded after executing it through the blockchain (line 9). After that, $t_j$ is mutated to obtain a mutated transaction $t_m$ (line 12), which is then executed to get its gas consumption $gas_m$ (line 13). If the gas consumption of $t_m$ is greater than that of $t_j$, the mutated transaction $t_m$ is added to $Q_c$ along with its gas consumption $gas_m$ and a priority of 1 (lines 14-15). The priority of the chosen transaction $t_j$ is reduced for further selection by multiplying it by $\theta$ (line 16).
Any security vulnerabilities found by the test oracles are recorded in set $M$ (line 20). In the end, all the transactions in $Q_c$ are added to $Q_{out}$ for usage in the next phase of gas-leveling.
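The loop described for Algorithm 2 can be approximated by the following Python sketch for a single contract and function. Several details are assumptions made for illustration and are not taken from the algorithm listing: a fixed iteration budget replaces the testing-time budget, $\rho$ is interpreted as the probability of generating a fresh transaction, queued transactions are sampled in proportion to their priority, freshly generated transactions are not themselves queued, and the test oracles are omitted. The callables `execute`, `generate` and `mutate` are hypothetical hooks supplied by the caller.

```python
import random


def gas_greedy(seeds, execute, generate, mutate,
               theta=0.5, rho=0.2, iterations=100, rng=None):
    """Keeps mutants that consume more gas than the transaction they came from.

    seeds: iterable of (transaction, gas consumption) pairs.
    theta: priority decay factor, assumed to lie strictly between 0 and 1.
    """
    rng = rng or random.Random()
    queue = [{"tx": tx, "gas": gas, "priority": 1.0} for tx, gas in seeds]
    for _ in range(iterations):
        if not queue or rng.random() < rho:
            # Generate a fresh transaction and record its gas consumption.
            tx = generate(rng)
            entry = {"tx": tx, "gas": execute(tx), "priority": 1.0}
        else:
            # Pick a queued transaction, favoring higher priority.
            weights = [e["priority"] for e in queue]
            entry = rng.choices(queue, weights=weights, k=1)[0]
        mutant = mutate(entry["tx"], rng)
        mutant_gas = execute(mutant)
        if mutant_gas > entry["gas"]:
            queue.append({"tx": mutant, "gas": mutant_gas, "priority": 1.0})
        entry["priority"] *= theta  # make the parent less likely to be re-picked
    return queue
```

The returned queue plays the role of $Q_c$; in the full technique its contents would be handed to the gas-leveling phase.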
In comparison to ContractFuzzer, GasFuzzer, with this strategy, keeps a record of gas consumption for each selected transaction. It mutates transactions in the inputs queue to change input parameters in a manner that tends to increase gas consumption, and it employs mutation operators similar to those used in [28]. However, unlike traditional gas-unaware fuzzers, mutations in GasFuzzer are only applied to transactions that call functions requiring input parameters to be provided for successful execution. Moreover, as presented in [20], seed prioritization is important in making fuzz-testing increasingly cost-effective. A seed prioritization scheme to improve the diversity of input transactions has also been adopted in GasFuzzer. We further explain these strategies in Section VI.
Consider the exemplified smart contract presented in Fig. 1 in the context of Algorithm 2. For the tokenHolder smart contract, the transaction at index $t_1$ in Table 1 is considered as the seed transaction with priority 1 in $Q_c$ (line 4). This transaction is picked up as $t$ for mutation (line 10) and input 0 is mutated to 7 (i.e., $t_2$ from Table 1), producing $t_m$ (line 12). The transaction receiveToken(7) increases gas consumption from 45,992 to 50,256 and thus, according to line 14, $t_m$ is added to $Q_c$ with priority 1 (line 15) while the priority of $t$ is reduced by a factor of $\theta$ (line 16). In the next iteration, a random transaction ($t_3$) is generated (line 8), which is receiveToken(6), and its gas consumption is recorded as 35,256 (line 9). This transaction is also mutated and input 0 is generated, which reduces the gas consumption, so the mutant is not included in $Q_c$. Security vulnerability analysis is performed at the end, using test oracles to analyze the execution logs.

B. GAS-LEVELING STRATEGY
In the gas-leveling strategy, as depicted in Fig. 4, the gas allowances of gas-expensive transactions are manipulated to find out whether an insufficient gas allowance leads to storage changes in the blockchain that should not take place, as described through our example in Fig. 1. It is also important to point out that, in the experiment, only successful transactions from the gas-greedy strategy (i.e., transactions that neither revert nor expose a security vulnerability) were considered when extracting the gas-expensive transactions for the gas-leveling strategy. If all transactions were to be considered, then transactions that have already triggered a security vulnerability in the first strategy could produce duplicate results.
Algorithm 3 explains the gas-leveling strategy of GasFuzzer. The set of transactions from the gas-greedy fuzzing ($Q_{out}$) is provided for each contract along with the values $\gamma$, $k$, $m$ and $\epsilon$, which are the number of expensive transactions to extract, the number of sections into which gas consumption is divided, the number of transactions for gas allowance manipulation, and the coverage threshold, respectively. In the beginning, an empty set $M$ of security vulnerabilities is initialized (line 1). Then a contract $c$ is chosen for testing and relevant transactions for $c$ are extracted from $Q_{out}$ to $Q_c$ (lines 2-3). After this, a queue of gas-expensive transactions $Q_e$ is initialized and the top $\gamma$ gas-expensive transactions for each function $f$ are extracted into it (lines 4-9). For this purpose, an empty queue is initialized (line 4), followed by a function-wise iteration over the contract $c$ (line 5). In each iteration, the transactions for a function $f$ are extracted into $Q_f$ (line 6) and a cutoff point for gas consumption is decided by first choosing the top $\gamma$ gas-expensive transactions and then taking the minimum gas allowance among them (line 7). Each transaction in $Q_f$ that has a gas consumption greater than the cutoff is extracted into $Q_e$ (line 8).
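The extraction of gas-expensive transactions can be sketched in Python as below. The dictionary layout of a transaction (`function`, `gas`) and the name `extract_expensive` are hypothetical. The sketch also makes one interpretive choice that should be read as an assumption: it keeps transactions whose gas consumption is greater than or equal to the cutoff, so that the $\gamma$-th most expensive transaction itself is retained.

```python
from collections import defaultdict


def extract_expensive(q_c, gamma):
    """Returns, per function, the transactions at or above the top-gamma cutoff.

    q_c: list of dicts with at least the keys "function" and "gas".
    gamma: number of most expensive transactions used to set the cutoff (>= 1).
    """
    by_function = defaultdict(list)
    for tx in q_c:
        by_function[tx["function"]].append(tx)

    q_e = []
    for txs in by_function.values():
        top = sorted(txs, key=lambda tx: tx["gas"], reverse=True)[:gamma]
        cutoff = min(tx["gas"] for tx in top)
        # Assumption: ">=" rather than ">" so the cutoff transaction is kept.
        q_e.extend(tx for tx in txs if tx["gas"] >= cutoff)
    return q_e
```

Ties at the cutoff are all kept, so a function can contribute more than `gamma` transactions when several share the same gas consumption.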
As stated in Section I, GasFuzzer includes a novel coverage-based test data adequacy criterion (referred to as the gas coverage criterion), which works as follows. A gas coverage map $G$ is created for each transaction in the sequence $Q_e$ over $k$ gas intervals (lines 10-11). Lines 13-20 iterate over the transaction sequence $Q_e$ by mutating allowances for $m$ transactions in each iteration, which transforms $Q_e$ into $Q_m$. $G$ is updated in each iteration for gas interval coverage (line 17) and the mutated transaction sequence $Q_m$ is executed (lines 19-20). The loop executes until a coverage of $\epsilon$ is achieved over $G$. At the end, the execution log is analyzed to detect security vulnerabilities, which are added to $M$ if detected.
Consider the same exemplified smart contract from Fig. 1, for which $Q_e$ contains entries such as $(t_2, 50256), (t_5, 50256), \ldots$. It needs to be pointed out that $Q_e$ will contain transactions for various functions, but we focus on only one function here for the sake of brevity. Next, a gas coverage map $G$ is created. To fill $G$, each $g_i$ is divided into 3 intervals (the value of $k$). The gas map $G$ is a 2-dimensional array in which each entry has the input data and 2 gas values (the low and high bounds of the interval) as the key, and a boolean (cov) as the value that records whether this particular interval has been covered. For instance, the entries corresponding to $(t_2, 50256)$ in $Q_e$ will be of the form $(t_2, 0, 16752) = \text{false}$, $(t_2, 16752, 33504) = \text{false}$, $(t_2, 33504, 50256) = \text{false}$, where false signifies that the interval has not been covered. Next, $Q_e$ is mutated into $Q_m$ by modifying the gas allowance of some (the value of $m$) transactions. For each transaction in $Q_m$ that gets modified, for example $(t_2, 50256)$ to $(t_2, 35000)$, the entry in $G$ for $t_i$ whose interval contains the mutated allowance $g_m$ is updated and the boolean cov is set to true. This corresponds to the last interval stated above, which will now read $(t_i, 33504, 50256) = \text{true}$. Then $Q_m$ is executed by GasFuzzer and the execution logs are analyzed at the end, where the Exceptions Disorder security vulnerability in $t_6$ is successfully detected. This process is repeated until a coverage of $\epsilon$ is achieved over $G$.
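The gas coverage map can be sketched in Python as a dictionary keyed by (transaction, low, high). The function names are hypothetical, and the treatment of interval boundaries (an allowance belongs to the interval whose upper bound it does not exceed) is an assumption made for this illustration.

```python
def build_coverage_map(q_e, k):
    """Creates k equal gas intervals per transaction, all initially uncovered."""
    coverage = {}
    for tx_id, gas in q_e:
        for i in range(k):
            low = i * gas // k
            high = (i + 1) * gas // k
            coverage[(tx_id, low, high)] = False
    return coverage


def mark_covered(coverage, tx_id, allowance):
    """Marks the interval of tx_id that contains the mutated gas allowance."""
    for (tid, low, high) in coverage:
        if tid == tx_id and low < allowance <= high:
            coverage[(tid, low, high)] = True
            return (tid, low, high)
    return None  # allowance lies outside every interval of tx_id


def coverage_ratio(coverage):
    """Fraction of intervals covered so far, to be compared with the threshold."""
    return sum(coverage.values()) / len(coverage) if coverage else 0.0
```

With the values from the example, `build_coverage_map([("t2", 50256)], 3)` yields the intervals (0, 16752), (16752, 33504) and (33504, 50256), and `mark_covered(..., "t2", 35000)` sets the last one to true, giving a coverage ratio of one third.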

VI. EVALUATION
In this section, we present an evaluation of GasFuzzer. We first describe our dataset, experimental setup and procedure. Then, in the experiment, we aim to answer the following three research questions:
RQ1: Is the gas-greedy strategy of GasFuzzer more effective than ContractFuzzer in exposing the Exceptions Disorder security vulnerability?
RQ2: Is the effectiveness of the gas-greedy strategy of GasFuzzer on par with ContractFuzzer in exposing other types of security vulnerabilities?
RQ3: Can Exceptions Disorder vulnerability detection be further improved by GasFuzzer through its gas-leveling strategy with the use of the gas coverage criterion?
In the end, we discuss the threats to validity and future work.

A. DATASET
To set up our experiment, we obtained a dataset of 3170 smart contracts from Etherscan.io [9] deployed on the Ethereum Main-Net between October 2017 and July 2019. The standard of smart contracts has evolved quickly. The dataset of smart contracts has been made available online.¹ In the past two years, Etherscan.io² has imposed a limit on the number of smart contracts that can be downloaded. To operate under this constraint, we searched for verified contracts [10] with popular keywords in their names and then extracted their Solidity source code, bytecode, ABI and constructor arguments to be used in our experiments.
As shown in Table 2, each contract in our dataset consists of around 158 lines of code on average, with a median of 134. The lower and upper quartiles are 53 and 196, with the largest contracts reaching almost 1500 lines of code. Each contract has 23 functions on average, where the lower and upper quartiles are 8 and 41, respectively. The number of pure³ functions is much lower than the total number of functions, which indicates that most functions in these contracts affect the state of the blockchain. Each contract makes about 4 calls to other contracts on average. The median, lower and upper quartiles are 2, 0 and 6, respectively. The mean number of payable⁴ functions per contract is 1.57, with a median of 1. The lower and upper quartiles are 0 and 2, respectively.

B. SETUP OF GASFUZZER
We ran our experiment on a desktop computer equipped with 64GB of RAM and an 8-core Intel Xeon 2.2 GHz processor, running Ubuntu 18.04. Our experimental setup was inspired by ContractFuzzer, so we also used the same instrumented GETH client (version 1.7.0) to interact with a private Ethereum Blockchain. We deployed all the smart contracts in our dataset to this private chain (i.e., our TestNet). The mining difficulty was set very low so that our transactions could be mined easily and without any unnecessary delay. We carried out our experiment with only one miner, since none of the smart contract security vulnerabilities that we were considering should be affected by the number of miners.

C. EXPERIMENTAL PROCEDURE

1) PRE-PROCESSING
GasFuzzer started with a phase of light-weight static analysis of each smart contract in the dataset. This helped in performing realistic message calls upon real contracts from the Ethereum Main-Net. In contrast to ContractFuzzer, which provided random inputs from a pre-written file for address type inputs, our implementation of GasFuzzer provided the addresses of actual smart contracts deployed on the TestNet, and these contracts were downloaded from the actual Ethereum Main-Net in an automated manner. To accomplish this task, the light-weight static analysis was performed upon the binary files of the smart contracts in the dataset. During this process, GasFuzzer looked for function signatures that appeared right after an external message call to other contracts. Once a set of these function signatures was obtained, GasFuzzer searched the Main-Net for any contracts containing functions that matched the function signatures in these message calls. Fortunately, a public dataset for the Ethereum Blockchain was available on Google BigQuery [13] for GasFuzzer to perform this task automatically. For each function signature identified, GasFuzzer downloaded the bytecode of the 20 latest contracts containing matching functions and deployed them on the TestNet. We refer to these smart contracts as the dependency contracts. The addresses at which these downloaded contracts were deployed were then provided as candidate address type inputs for fuzzing wherever an address value was required to produce a valid message call. Following the deployment of all the smart contracts in our dataset and their dependency contracts, the fuzz testing process was ready to start.
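To make the pre-processing step more concrete, the following Python sketch shows one possible way to collect candidate function signatures from a contract binary. It is only an illustration under stated assumptions and not the analysis implemented in GasFuzzer: it assumes that a 4-byte function selector is loaded with a PUSH4 instruction a few instructions before a CALL opcode, and the function name `candidate_callee_selectors` and the `window` parameter are hypothetical.

```python
# Illustrative heuristic only (assumption): a callee's 4-byte selector is
# pushed with PUSH4 shortly before the CALL opcode that uses it.
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F  # EVM PUSH opcode range
CALL = 0xF1


def candidate_callee_selectors(bytecode_hex, window=40):
    """Return 4-byte constants pushed at most `window` instructions before a CALL."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    recent = []  # (instruction index, selector) seen since the last CALL
    found = []   # selectors associated with some CALL, in order of discovery
    pc = 0       # byte offset into the code
    index = 0    # instruction counter
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1              # number of immediate data bytes
            data = code[pc + 1: pc + 1 + size]
            if op == PUSH4 and len(data) == 4:
                recent.append((index, data.hex()))
            pc += 1 + size                     # skip the push data, it is not code
        else:
            if op == CALL:
                for seen_at, selector in recent:
                    if index - seen_at <= window and selector not in found:
                        found.append(selector)
                recent = []
            pc += 1
        index += 1
    return found
```

Skipping the immediate data of every PUSH instruction matters here, because a data byte equal to 0xF1 would otherwise be mistaken for a CALL. The selectors returned by such a scan could then be used as search keys for matching contracts, as described above.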

2) EXPERIMENT DETAILS
We obtained ContractFuzzer from [18] and built on top of it to implement GasFuzzer. Like ContractFuzzer, GasFuzzer also called each smart contract with the following three types of accounts. The first one was the account that was used to deploy these contracts, making it the owner account. The second one was an Externally Owned Account (EOA) that had never interacted with the contract before. The last one was an Agent account, which was an attacking contract used to look for any Re-entrancy bugs. The same Attack Agent contract employed by ContractFuzzer was used by GasFuzzer as well. However, different from ContractFuzzer, GasFuzzer tested each contract in our dataset on an individual basis. After deploying all the contracts at once, a separate fuzzing process was started for each contract, generating separate logs for each smart contract, so that each smart contract could easily be analyzed individually.
For each contract in our experiment regarding the gas-greedy strategy, each function was called about 30 times on average with a variety of inputs and a gas allowance of 80,000. As all the smart contracts in our dataset were deployed at the beginning of the fuzzing process, the state of each smart contract was set to default and no prior transactions existed for that particular smart contract. Transactions were sent to the blockchain in batches after generation, and a batch consisted of about 30 transactions on average. These transactions were picked up and mined. We would like to point out that some transactions never got picked up for mining, and in our opinion this problem is related to the Geth version being used. Overall, this number was very small and had no significant effect on the experiments. It is necessary to highlight at this point that the experiments conducted for the gas-leveling strategy are not yet fully automated. For the gas-leveling strategy, a value of 5 was chosen for γ, k and m, while ε was set to 70% (from Algorithm 3).
The test oracles in the ContractFuzzer tool [18] were used to detect security vulnerabilities. In particular, the oracle for Exceptions Disorder checks a call chain for any case where the root call throws no exception while one of the nested message calls throws one that is never handled properly.
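The check performed by the Exceptions Disorder oracle can be sketched in Python as follows. This is a simplified illustration and not the oracle code of ContractFuzzer: the `Call` record and its fields are hypothetical, and the sketch treats any exception in a nested call under a non-throwing root call as not properly handled, which is coarser than what an instrumented EVM can observe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """One message call in a recorded call chain (hypothetical trace format)."""
    target: str
    threw: bool = False
    children: List["Call"] = field(default_factory=list)


def any_nested_exception(call):
    """True if any direct or indirect nested call threw an exception."""
    return any(child.threw or any_nested_exception(child) for child in call.children)


def exceptions_disorder(root):
    """Flag a call chain whose root succeeded although a nested call threw."""
    return (not root.threw) and any_nested_exception(root)
```

For example, a chain in which contract C_1 completes normally while its nested call to C_2 runs out of gas is flagged, whereas a chain whose root call also throws is not.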

3) MUTATION STRATEGY
In the gas-greedy strategy, GasFuzzer needs to mutate transactions. Two operators were implemented for this purpose in the experiment to demonstrate the feasibility of GasFuzzer. The first mutation operator randomly flips the bits of an input parameter, and the second one adds random bits to the input parameter. Furthermore, these mutations were applied selectively to certain input types. For an address type input, no mutation operation was applied and the parameter was provided as it was before. For bools, uints, ints and other fixed-length input parameters, only the first mutation operator was applied, and for other input parameters such as bytes and strings, either one or both of the mutation operators were applied. Even though only two basic mutation operators were used in the experiment, the experimental results have already shown that GasFuzzer can be effective.
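The two mutation operators and their type-dependent use can be sketched in Python as follows. The sketch is illustrative only: the function names are hypothetical, inputs are assumed to be already encoded as byte strings, a single bit is flipped per mutation, and random bits are appended in whole bytes, none of which is prescribed by the description above.

```python
import random


def flip_random_bits(data, rng, n_flips=1):
    """First operator: flip randomly chosen bits, keeping the length unchanged."""
    if not data:
        return data
    out = bytearray(data)
    for _ in range(n_flips):
        position = rng.randrange(len(out) * 8)
        out[position // 8] ^= 1 << (position % 8)
    return bytes(out)


def append_random_bits(data, rng, max_extra_bytes=4):
    """Second operator: add random bits (here, 1 to max_extra_bytes random bytes)."""
    extra = rng.randrange(1, max_extra_bytes + 1)
    return data + bytes(rng.randrange(256) for _ in range(extra))


def mutate(abi_type, data, rng):
    """Apply the operators selectively according to the input type."""
    if abi_type == "address":
        return data                      # address inputs are never mutated
    if abi_type in ("bytes", "string"):  # dynamic types: one or both operators
        choice = rng.choice(("flip", "append", "both"))
        if choice in ("flip", "both"):
            data = flip_random_bits(data, rng)
        if choice in ("append", "both"):
            data = append_random_bits(data, rng)
        return data
    return flip_random_bits(data, rng)   # fixed-length types: bit flips only
```

Passing an explicit `random.Random` instance keeps a fuzzing run reproducible from its seed, which is convenient when a reported vulnerability has to be replayed.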

4) SEED PRIORITIZATION
In the gas-greedy strategy, GasFuzzer needs seed prioritization. It started by assigning a uniform priority to each seed input, and every time a seed input was used to generate a new gas-expensive transaction, the priority of the original seed was reduced with θ = 0.5 in Algorithm 2. In addition, GasFuzzer kept introducing new randomly generated transactions into the process. Whenever a transaction was to be picked for fuzzing, GasFuzzer chose between extracting a transaction from the seed input queue and generating one from scratch. In the experiment, GasFuzzer was configured to maintain a balance between these two options by giving a 70% probability (ρ = 0.7 in Algorithm 2) to generating a new random transaction and the remaining probability (i.e., 30%) to reusing an existing transaction from the seed queue.
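The interplay of ρ and θ can be sketched in Python as follows. This is a minimal illustration rather than Algorithm 2 itself: the class and method names are hypothetical, the priority reduction is assumed to be a multiplication by θ, and seeds are assumed to be drawn with probability proportional to their priority.

```python
import random

RHO = 0.7    # probability of generating a new random transaction
THETA = 0.5  # factor applied to a seed's priority after it yields a new transaction


class SeedQueue:
    """Seed inputs with priorities (hypothetical data structure)."""

    def __init__(self, rng):
        self.rng = rng
        self.seeds = []  # list of [transaction, priority] entries

    def add(self, transaction, priority=1.0):
        """Register a seed; all seeds start with the same priority."""
        self.seeds.append([transaction, priority])

    def next_transaction(self, generate_random):
        """Return (transaction, seed index); the index is None for a fresh one."""
        if not self.seeds or self.rng.random() < RHO:
            return generate_random(), None
        weights = [priority for _, priority in self.seeds]
        index = self.rng.choices(range(len(self.seeds)), weights=weights)[0]
        return self.seeds[index][0], index

    def demote(self, index):
        """Reduce the priority of a seed that produced a gas-expensive transaction."""
        self.seeds[index][1] *= THETA
```

With these settings, roughly seven out of ten picks are freshly generated transactions, and a seed that has already produced a gas-expensive transaction becomes half as likely to be chosen again relative to an unused seed.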

D. DATA ANALYSIS
In this section, we report our findings and answer the RQs. We use GF and CF to signify the GasFuzzer and ContractFuzzer implementations, respectively.

1) ANSWERING RQ1
To answer RQ1, we compare the number of Exceptions Disorder security vulnerabilities detected by CF and GF. Table 3 summarizes the results. From the table, GasFuzzer was more effective than ContractFuzzer in the detection of Exceptions Disorder by 28%. The Exceptions Disorder security vulnerability surfaces when two or more contracts interact with each other via lower-level message calls, e.g., address.call(). Suppose an EOA initiates a transaction for contract C_1, followed by C_1 interacting with C_2 via a later message call. If an exception occurs at any step in the call chain, then that exception should be passed back to its caller function. However, if a necessary reversion in those contracts does not take place, unexpected states may result.
GF produced transactions that tended to be more gas-expensive than those of CF. If a transaction produced an Out-of-Gas exception, which it normally would not, and that exception was not handled properly by the involved smart contracts, unexpected states may be reached as a result. From the results reported in Table 3, GF was shown to be more effective than CF in finding such unhandled cases.
Answer to RQ1: From the experimental results, GasFuzzer is likely more effective in finding Exceptions Disorder security vulnerabilities than ContractFuzzer.

2) ANSWERING RQ2
We compare the effectiveness of GF with that of CF to see whether there is any significant decline in the effectiveness of GF on the other types of security vulnerabilities that CF can detect.
The results in Table 4 show that there is no significant performance degradation of GF in finding other security vulnerabilities detectable by CF. Neither GF nor CF found any cases of Dangerous DelegateCall or Freezing Ether on our dataset. As for Gasless Send, Re-entrancy and Block Number Dependency, both tools found the same number of security vulnerabilities. As for Timestamp Dependency, which is very similar to Block Number Dependency, we observed a slight drop in the number of cases identified by GF. Upon closer inspection of these cases, it was found that the block timestamp was used under certain assertion conditions. For example, in a contract named lockEtherPay, hardcoded timestamps were used to control when an Ether transfer can take place. By the time the reported experiments were conducted using GasFuzzer, the end time to allow the Ether transfer had passed. Since no Ether transfer could take place, GasFuzzer did not identify this smart contract as vulnerable to Timestamp Dependency.
Answer to RQ2: We find no significant performance degradation in the ability of GasFuzzer to detect other security vulnerabilities in comparison with ContractFuzzer.

3) ANSWERING RQ3
The Exceptions Disorder vulnerability has been discussed under RQ1. The results from RQ1 gave us the insight that a fuzzer could further improve its detection ability on this type of security vulnerability not only by manipulating the input values provided in a transaction but also by allocating a varying gas allowance. Our experiments show that manipulating gas allowances for expensive transactions has a positive effect on security vulnerability detection.
As a result of this experiment, GasFuzzer found 6 additional Exceptions Disorder vulnerabilities, and the vulnerable contracts are listed in Table 5. Most of the smart contracts listed there did not check whether a message call to an external smart contract had been executed successfully. CoinContract is the exception among these: it received the value returned from a call but never verified whether the value was false in order to revert the parent calls as well. Among these contracts, only X2ETH has been self-destructed while the others are still live on the Ethereum network, which is why the addresses of these smart contracts are not disclosed in this paper. These results show that certain security vulnerabilities can only be detected under certain pre-conditions, which do not necessarily have to be triggered at specific states of the blockchain. It was also found that even initiating transactions with a wrong gas allowance can lead to unexpected results. These security vulnerabilities are very hard for current fuzzing tools to identify, since a high gas allowance is usually reserved in a testing environment.
We also analyzed the execution of the transactions by both GF and CF to look for cases where Out-of-Gas exceptions were thrown due to increased gas requirements. For RQ3, 0.27 million transaction sequences were executed, and almost all of the transactions with revised gas allowances threw exceptions, except for the cases we reported and those where the maximum gas allowance was provided.
Answer to RQ3: The results show that six new Exceptions Disorder security vulnerabilities can be identified by GasFuzzer, and all of them are previously unknown real bugs.

E. FURTHER DISCUSSION
From the data analysis reported in the previous sub-section, we believe that gas-related security vulnerabilities were mostly hiding in plain sight, but they can only be exercised under special conditions. The gas allowance assigned to a transaction was critical to detecting those security vulnerabilities, as revealed through answering RQ3 (gas-leveling strategy). A simplified version of the AirDrop smart contract is provided in Fig. 5, where the function transfer expects two addresses and an unsigned integer. After performing some internal state changes, this function makes an external call to a smart contract deployed at the first input address with a function that matches the signature calculated in id. As explained earlier in Section VI, GasFuzzer initially performs a light-weight static analysis looking for such calls to download and deploy matching realistic smart contracts, so that a realistic message call can be made. The gas-leveling strategy makes sure that such calls are made with varied gas allowances so that security vulnerabilities can be identified even if the underlying logic in the target smart contract is error free. This AirDrop smart contract updates its internal storage for some transfer, and then an external call is made to transfer some assets to an address. If, due to an insufficient gas allowance, the message call throws an out-of-gas exception that is not checked (such as in this case), an inconsistency will arise.
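The inconsistency described above can be reproduced with a small Python model. It is a deliberately simplified sketch and not the contract of Fig. 5: the classes `Token` and `AirDrop`, the helper `low_level_call`, the gas threshold of 30,000 and the account names are all hypothetical, and gas is modeled as a single number compared against a fixed requirement.

```python
class OutOfGas(Exception):
    """Models an out-of-gas exception raised inside a nested message call."""


class Token:
    GAS_NEEDED = 30_000  # hypothetical gas requirement of the callee

    def __init__(self):
        self.balances = {}

    def credit(self, to, amount, gas):
        if gas < self.GAS_NEEDED:
            raise OutOfGas()  # the callee's own state stays unchanged
        self.balances[to] = self.balances.get(to, 0) + amount


def low_level_call(function, *args, gas):
    """Like a low-level call: failure is reported as False, not propagated."""
    try:
        function(*args, gas=gas)
        return True
    except OutOfGas:
        return False


class AirDrop:
    def __init__(self, token, check_call_result):
        self.token = token
        self.check_call_result = check_call_result
        self.recorded = {}  # internal bookkeeping of transferred amounts

    def transfer(self, to, amount, gas):
        snapshot = dict(self.recorded)
        self.recorded[to] = self.recorded.get(to, 0) + amount  # state change first
        ok = low_level_call(self.token.credit, to, amount, gas=gas)
        if self.check_call_result and not ok:
            self.recorded = snapshot  # models reverting the parent call
            raise OutOfGas("nested call failed")
        return ok
```

With `check_call_result=False` and a low gas allowance, `recorded` reports a transfer that the token contract never performed, which is the kind of disagreement between a non-throwing root call and a failed nested call that the Exceptions Disorder oracle looks for. With a sufficiently high allowance, the same call sequence is consistent, which is why a fixed high allowance hides the problem.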
Without these pre-conditions being met, it is quite difficult to hunt out these gas-related security vulnerabilities. Moreover, from our experiments, we observed that transactions with higher gas consumption as well as transactions with a wrongful gas allowance can lead smart contracts into dangerous state transitions.
In the future, we will further generalize GasFuzzer so that it can better perform gas-aware fuzzing by manipulating the gas allowance for gas-expensive transactions. We will work on making this process more generic and automated so that it can be performed effectively.

F. THREATS TO VALIDITY
In this section, we present the threats to validity of the experiment.
The evaluation is based on a set of smart contracts deployed on the public blockchain within a particular period. Using other periods will produce a different dataset, and we will see a different number of vulnerabilities detected in the corresponding experiment. However, since we did not know in advance which contracts contained the detected vulnerabilities, the experiment is still fair in comparing CF and GF. We tend to believe that GF will still detect more Exceptions Disorder cases than CF on other datasets due to its intrinsic ability to distinguish transactions with insufficient gas and/or drive the mutations toward the high end of the gas consumption spectrum.
Algorithms 2 and 3 require some configuration parameters to be initialized. We only evaluated CF and GF on one set of parameters, which already took weeks to complete due to the large number of transactions produced, and which is on par with the scale of the experiment reported in the original paper of ContractFuzzer. Having said that, further generalization is necessary.
The experiment used the test oracles of CF. There may be bugs in their implementation, and the set of test oracles of CF is limited. The use of other test oracles and their implementations may produce different results.
There may be implementation errors in GF. To alleviate this issue, we have tested GF on a small dataset of self-crafted smart contracts.
The experiment only implemented two mutation operators for GF. We tend to believe that the use of more mutation operators will increase the diversity of mutated transactions. In the literature on traditional mutation testing, the general trend is that a more diverse test suite tends to detect more failures. The effectiveness of GF could thus plausibly be further improved if more mutation operators are used, which requires further experimentation to confirm.
We measured the effectiveness of GF and CF by the number of instances detected by each kind of test oracle. The use of other criteria may produce different results. Moreover, due to the need to mutate transactions and maintain the data structure, GF is less efficient than CF in generating transactions for fuzz testing. One may consider that, by allowing the same time budget for either tool to test the same smart contracts, CF will generate more transactions than GF. In the current experiments, we set the same timeout limit to run both tools.

VII. RELATED WORK
Oyente, proposed by Luu et al. in [21], was one of the first tools aimed at smart contract verification. Oyente is a symbolic execution based smart contract verification tool that uses the control flow graphs (CFGs) of the smart contracts under test to perform symbolic execution on them. Oyente looks for vulnerable patterns that may lead to the discovery of some types of security vulnerabilities, including transaction order dependency, re-entrancy and timestamp dependence. Using similar techniques based on symbolic execution, another tool called MAIAN was introduced in [23]. MAIAN looks to find out whether a contract can be classified as greedy (contracts that tend to lock Ether under certain state conditions), prodigal (contracts that can be made to transfer Ether to an address they never transacted with), suicidal (contracts whose code can be removed from the blockchain by addresses that do not own them) or a combination of these. A major drawback of both these approaches is a high number of false positives. In [16], Jiang et al. present Artemis, which is an improved smart contract security vulnerability detection tool that is based on Oyente. Artemis can effectively find four types of security vulnerabilities, namely Freezing Ether, Block Number Dependency, Expensive Fallback and Dangerous Delegatecall. Securify [26] and Vandal [2] are both static analysis based smart contract verification tools. Securify establishes security patterns in a domain-specific language that are then verified for accordance or defiance. For contract analysis, a stackless static-single assignment form of the bytecode is used to deduce predefined semantic facts (data-flow/control-flow dependencies). Vandal, on the other hand, decompiles the bytecode and collects features of the smart contract under test as a datalog. These tools perform well on the re-entrancy vulnerability.
ZEUS [19] and SmartCheck [25] also try to verify smart contracts for security vulnerabilities but require the provision of source code. ZEUS uses a Solidity based Abstract Syntax Tree (AST) to gather policies, which can be edited by smart contract developers. Different from ZEUS, SmartCheck converts Solidity code into an XML-based Intermediate Representation (IR). This IR is checked for patterns, the violation of which leads to the detection of security vulnerabilities. Both these approaches operate on a wide range of security vulnerabilities but lose accuracy if the properties are not well defined.
Echidna [7] and Harvey [27] use fuzzing to concretely verify smart contracts. The tests of Echidna need to be written by developers inside unit tests, which again puts the strain on contract developers. Harvey, on the other hand, uses classic greybox fuzzing techniques like AFL [28] and AFL-Fast [1] to generate inputs but with a modification of input prediction. ContractFuzzer [17] is a black-box fuzzer that generates random transactions to find security vulnerabilities in Ethereum smart contracts. It uses the Application Binary Interfaces (ABIs) of the smart contracts to generate transactions without any feedback from the execution and employs an instrumented EVM to execute these transactions. Execution logs are then analyzed for security vulnerability detection. In [15], He et al. propose a fuzz testing tool which tries to behave like a symbolic execution engine for smart contracts. Imitation learning is used to train a fuzzer on a large number of inputs generated through a symbolic execution engine proposed in [24]. The fuzzer is basically a set of neural networks trained on the dataset of generated transactions. A recent work by Nguyen et al. [22] presents an adaptive fuzzer which applies a light-weight multi-objective strategy to target difficult-to-reach branches in Ethereum smart contracts. EvmFuzzer [12] is also a tool that uses fuzz testing, not to verify smart contracts for security vulnerabilities, but to identify discrepancies among various implementations of the EVM in different programming languages.
The only previous works that discuss gas consumption as an important entity are Gasper [3] and GasReducer [4], but these tools only aim at finding patterns in smart contracts that lead to gas wastage. The focus there is not to consider gas consumption for security verification but to reduce the amount of gas consumed to make the process more cost-efficient. These techniques construct CFGs from bytecode to perform symbolic execution, employing an SMT solver to discover possible execution paths.

VIII. CONCLUSION
In this paper, we have presented a novel technique, GasFuzzer. It consists of two strategies. The gas-greedy strategy has been formulated based on the insight that the gas consumption of executed transactions provides lightweight information about the executed program code in dealing with the blockchain states of the involved smart contracts. GasFuzzer uses this information to single out transactions for the further generation of mutated transactions. The experiment has shown that GasFuzzer can detect 28% more Exceptions Disorder security vulnerabilities than the previous state-of-the-art black-box technique ContractFuzzer, while it does not compromise the ability to detect other kinds of security vulnerabilities. The gas-leveling strategy is novel in that it formulates a new test data adequacy criterion and uses it to guide the generation of mutated transactions with lower gas allowances. The experiment has shown that this strategy is effective in detecting Exceptions Disorder security vulnerabilities that were missed in the experiment above. Through this work, we believe that by focusing on gas-expensive transactions and the manipulation of gas allowance, one can significantly improve the fuzz testing process for some of the most serious security vulnerabilities that can be induced in smart contracts. We plan to further explore this research direction in the future. A version of GasFuzzer has been deployed in FUSE, an online fuzz testing service for Ethereum smart contracts under the HKSAR ITF (project no. ITS/378/18).

FIGURE 5. A simplified version of AirDrop smart contract.
FIGURE 1. An exemplified smart contract written in Solidity.
1 but this time in the context of Algorithm 3. Suppose that Q_out in Algorithm 3 contains all the transactions from the gas-greedy strategy. The transaction sequence Q_c contains transactions t_1 to t_5 from Table 1. The 2 transactions (value of γ), in the context of function receiveToken(t), that consume the most gas are used to form a sequence Q_e containing transactions t_2 and t_5 from Table 1 (i.e., ..., (t_2, 50256),

Algorithm 3 GasFuzzer Algorithm for Gas-Leveling Strategy
Input: list of smart contracts to test (c_i ∈ C)
list of transactions: c_i, t_j, gas_j ∈ Q_out
number of gas-expensive transactions for each function: γ
number of sections to divide gas consumption: k
number of transactions for gas manipulation: m
coverage threshold: ε
Output: tuple

TABLE 2. Descriptive statistics of our dataset.

TABLE 3. Summary of Exceptions Disorder vulnerabilities detected.

TABLE 4. Effectiveness comparison of GF and CF on the other five types of security vulnerabilities.

TABLE 5. Detected smart contracts with security vulnerabilities in RQ3.