Runtime Evolution of Bitcoin’s Consensus Rules

Abstract: The runtime evolution of a system concerns the ability to make changes during runtime without disrupting the service. Blockchain systems need to provide continuous service and integrity. Similar challenges have been observed in centrally controlled distributed systems or mobile applications that handle runtime evolution, mainly by supporting compatible changes or running different versions concurrently. However, these solutions are not applicable in the case of blockchains, and thus, new solutions are required. This study investigates Bitcoin consensus evolution by analysing over a decade of data from Bitcoin’s development channels using Strauss’ grounded theory approach and root cause analysis. The results show nine deployment features which form nine deployment techniques and ten lessons learned. Our results illustrate how different deployment techniques fit different contexts and pose different levels of consensus failure risks. Furthermore, we provide guidelines for risk minimisation during consensus rule deployment for blockchain in general and Bitcoin in particular.

Jakob Svennevik Notland, Mariusz Nowostawski, and Jingyue Li, Senior Member, IEEE

Index Terms: Bitcoin, blockchain, consensus, grounded theory, root cause analysis, runtime evolution.

I. INTRODUCTION
The deployment of consensus changes is the most important yet controversial [1], [2] and error-prone [3], [4], [5] activity in a blockchain. These changes redefine the fundamental behaviour of a blockchain, which can affect its security and the value of its currency. Even trivial changes could cause disruption, which may result in suspended services [6], loss of mining revenue [4] and theft [7], [8].
Runtime evolution concerns the deployment of changes in a system while it is running. The concept is most relevant for critical systems that cannot afford to halt their services during an upgrade [9]. The critical system in the case of blockchains is a payment system which must be continuously operational. Challenges in runtime evolution have traditionally been handled in distributed systems that can be centrally controlled. A single person can practically halt or roll back the system if any exceptions occur. Furthermore, distributed systems can handle different versions running concurrently [10].
In contrast, Bitcoin and other decentralised blockchains are autonomous systems that, by design, cannot be controlled centrally [11]. The design of blockchain enables immutability and constant uptime, which are principles conflicting with the ability to correct flaws. Blockchains also require all operational nodes to run compatible versions to be part of the same consensus agreement and consistently evaluate state changes. The differences between distributed and decentralised systems introduce challenges regarding technical and governmental aspects of runtime evolution.
This research is conducted to understand the implications of consensus rule changes, techniques for a safe transition, and crisis management. There are two research questions, focusing on the current practice for consensus changes in blockchains.
• RQ1: What techniques have been applied to deploy consensus changes?
• RQ2: What are the lessons learned from deploying consensus changes?
The motivation of this work is to collect knowledge on system evolution techniques from the Bitcoin financial system and to gather unknown known security requirements. Unknown known security requirements may have appeared as security incidents, but they are unknown to requirements engineers [12]. Therefore, our proposed techniques and security requirements may be well-known to a seasoned Bitcoin developer. However, any other blockchain engineer may have to sift through thousands of unstructured data samples to realise these security requirements. Rashid et al. suggest combining grounded theory (GT) analysis and incident fault trees to discover the root of these unknown known security requirements [12]. Similarly, we apply the Straussian GT [13] approach in combination with Ishikawa diagrams [14] for root cause analysis.
This study has been conducted as a qualitative analysis covering 34 consensus rule changes over more than a decade of Bitcoin development and comprising 1700 samples. Samples correspond to email threads, forum threads, GitHub issues/pulls, IRC days, and improvement proposals.
The results from RQ1 suggest nine features as building blocks for consensus evolution, e.g. a feature for the flag-day-like triggering of deployments. Additionally, we propose how the building blocks can be combined into nine deployment techniques, e.g. a blockchain community can use the Miner Activated Reduction Fork (MARF) technique to coordinate the super-majority deployment of backwards-compatible changes.
The results demonstrate that deployment techniques pose a trade-off between changing the functionality of a blockchain and maintaining the consistency of the transaction data and the corresponding community cohesion. This means that blockchain engineers might have to choose between realising consensus changes and maintaining compatibility with legacy implementations and consensus participation for all actors in the system. Sometimes participants in a blockchain community cannot agree on how to evolve functionality consistently. In these cases, the project might end up with forked chains such as Bitcoin Core (BTC) versus Bitcoin Cash (BCH).
With the experience and the analysis of Bitcoin's consensus rule evolution, we have identified ten lessons learned:
• Missing transformation assurance
The main contributions of the study are:
• We propose unique features and deployment techniques to enable safer evolution of consensus rules.
• We propose theories about the trade-offs between consistent consensus evolution and evolving functionality of the blockchains.
• We show which issues lead to consensus failure and suggest how to avoid and handle different crisis scenarios during deployment.
The document structure is as follows: Section II explains the background of blockchain and describes software and system evolution. Section III outlines the research design and implementation. Section IV presents the deployment features and techniques. Section V presents the lessons learned. Section VI discusses the results. Section VII concludes the study and proposes future work. Throughout the article, we include inline quotes and related quotes in Appendix Section A (see the supplementary material) to provide evidence and strengthen our claims.

A. The Bitcoin Consensus Protocol
Bitcoin is "a peer-to-peer electronic cash system" [11] consisting of a chain of blocks containing transaction history, see Fig. 1. Any new block must abide by the consensus rules to be regarded as valid by the nodes in the network. Miners attempt different values for the nonce variable in a brute-force manner to produce a SHA-256 hash based on the entire block. The miners find a valid nonce for the block header when the resulting hash meets the required target difficulty. The difficulty indicates that the resulting hash must have a certain number of leading zeros. This mechanism is known as Proof-of-Work (PoW), first proposed to prevent email spam [15] and later applied for cryptocurrencies [16]. In the case of Bitcoin, PoW prevents Sybil attacks [17] and provides immutability of transaction data. The strength of these principles is preserved by the difficulty adjustment algorithm [11], ensuring that the network will produce blocks at an average interval of around ten minutes. Miners must comply with further consensus rules. Generally, blocks and transactions must be in a valid format. Miners are incentivised to follow these rules by collecting fees and the block reward generated in a special coinbase transaction with no inputs from previous transactions.
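The brute-force nonce search described above can be sketched as follows. This is a simplified illustration, not Bitcoin's actual header serialisation: the `header` bytes, the `leading_zero_bits` parameter, and the function name `mine` are assumptions made for the example, and the hash is compared as a big-endian integer for readability. The double application of SHA-256 mirrors how Bitcoin hashes block headers.

```python
import hashlib
from typing import Optional


def mine(header: bytes, leading_zero_bits: int, max_nonce: int = 2**32) -> Optional[int]:
    """Return the first nonce whose hash has at least `leading_zero_bits` zero bits.

    `header` is a hypothetical stand-in for the serialised block header
    without its nonce field.
    """
    # A hash with k leading zero bits is numerically below 2**(256 - k).
    target = 1 << (256 - leading_zero_bits)
    for nonce in range(max_nonce):
        payload = header + nonce.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None  # no valid nonce in the searched range
```

Raising `leading_zero_bits` by one doubles the expected number of attempts, which is the lever a difficulty adjustment can use to keep the block interval stable.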
Block collisions occur when two new valid blocks are produced at the same height at approximately the same time. Collisions are resolved by the longest (valid) chain rule as specified in Nakamoto's whitepaper: "The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it" [11]. The notion of a valid chain is important because validity is subjective from the implementation's point of view. Any collision should quickly resolve when another block is appended on top of one of the colliding blocks. This behaviour implies that blocks have a slight chance of being orphaned (discarded) as illustrated in Fig. 2. The informal recommendation to prevent financial loss from orphaned blocks is to wait until the block containing the relevant transaction has at least six blocks built on top [18]. Collisions happen naturally, by an attack [19], or by inconsistent consensus validation [4].
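A minimal sketch of the longest (valid) chain rule, assuming each block carries a numeric proof-of-work weight. The `Block` type, its fields, and the `is_valid` callback are hypothetical names introduced for illustration; the callback stands in for a node's own consensus rules, which is where the subjectivity of validity enters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Block:
    block_id: str  # hypothetical identifier
    work: int      # hypothetical proof-of-work weight of this block


def best_chain(
    chains: Sequence[List[Block]],
    is_valid: Callable[[Block], bool],
) -> Optional[List[Block]]:
    """Pick the chain with the most accumulated work among locally valid chains."""
    # Validity is judged by this node's rules; other nodes may disagree.
    valid_chains = [c for c in chains if all(is_valid(b) for b in c)]
    if not valid_chains:
        return None
    # Ties are resolved in favour of the chain listed first.
    return max(valid_chains, key=lambda c: sum(b.work for b in c))
```

Two nodes given the same candidate chains but different `is_valid` rules can select different chains, which is how inconsistent validation turns into a chain split.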

B. Bitcoin Consensus Evolution
An inherent ideology within Bitcoin, especially BTC, indicates which changes are viable and which are controversial. One quote shown in Listing 1 from the creator(s) can be seen as a cornerstone of this ideology (additional related quotes are in Section A.1, available online).
The nature of Bitcoin is such that once version 0.1 was released, the core design was set in stone for the rest of its lifetime.
The statement from Nakamoto explains that the system itself, as well as the original specification [11], defines Bitcoin's fundamental behaviour. Moreover, it highlights the importance of non-disruptive changes, no matter how insignificant they seem. Our paper distinguishes between Bitcoin, Bitcoin Core (BTC) and Bitcoin Cash (BCH). Bitcoin is the idea of peer-to-peer electronic cash envisaged by Satoshi Nakamoto. BTC implements Bitcoin, preserves most compatibility with Nakamoto's original implementation, and has the highest value [20] and consensus participation [21]. BCH is a minority fork that created an alternative chain by introducing a backwards-incompatible consensus rule in 2017 [22].
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
The BTC and BCH communities have different approaches to consensus changes. Consensus rule changes in BTC should preferably allow backward compatibility such that legacy nodes can accept any new behaviour, keeping the network consistent (so-called soft forks). Backwards-incompatible changes (so-called hard forks) are preferred in BCH and sometimes required in general if the fundamental implementation does not work as intended. Such a change could be essential to prevent exploits or allow for the adaptation and survival of the system. As shown in Listing 2, the Bitcoin community is sceptical of consensus changes and of making them a habit because bugged or ill-intended code may be deployed and disrupt the integrity and stability of the system (additional related quotes are in Section A.2, available online).
Listing 2. 2011-08-10 Gavin Andresen Email: bitcoin-dev
5. Testing. I don't have time to personally test every PULL request, but if a pull involves more than trivial code changes I'm not going to pull it unless it has been thoroughly tested. We had a very good rule at a company I used to work for - programmers were NOT allowed to be the only ones to test their own code. Help finding money and/or people for a dedicated "core bitcoin quality assurance team" is welcome. More unit tests and automated testing is also certainly welcome.
If this was open source blogging software I'd be much less uptight about testing and code review and bugs.But it's not, it is software for handling money.
We refer to conflicting blocks as chain splits. Chain splits can be temporary (fewer than six blocks are orphaned), persistent (six or more blocks are orphaned), or permanent (both chains are extended independently for the foreseeable future). An accidental chain split caused by inconsistent validation among the nodes will be referred to as a consensus failure. The longest chain rule is the most fundamental factor in deciding whether a rule change is successfully adopted in Bitcoin. The longest chain rule implies that the majority of miners can apply a network-wide backwards-compatible rule change. Relying on the majority is the preferred technique to deploy changes in BTC.
In Bitcoin, one may rely on the majority (> 50%) of block-producing nodes (n) to perform a backwards-compatible consensus change. The reason for this to work is that the Nakamoto consensus model has a Byzantine Fault Tolerance (BFT) [23] threshold of 50% (f). Other consensus models might have different fault tolerance, such as 33% or 20% [24]. We use the term super-majority (SM) to generalize the minimal threshold requirement for different deployment techniques and to describe the required threshold where faulty nodes are equal to or less than the tolerated threshold. A super-majority of abiding nodes (h) is denoted as h > (n - f).
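The super-majority condition h > (n - f) and the chain split categories can be expressed as a short sketch. The function names are hypothetical, and the fault tolerance is passed as a fraction of all nodes (an assumption about how f is applied), using exact rational arithmetic to avoid rounding at the boundary.

```python
from fractions import Fraction


def is_super_majority(abiding: int, total: int, fault_tolerance: Fraction) -> bool:
    """Check h > n - f, where f is `fault_tolerance` as a share of `total`."""
    tolerated_faulty = fault_tolerance * total
    return abiding > total - tolerated_faulty


def classify_chain_split(orphaned_blocks: int, both_chains_extended: bool = False) -> str:
    """Classify a chain split using the six-block boundary described in the text."""
    if both_chains_extended:
        return "permanent"
    return "temporary" if orphaned_blocks < 6 else "persistent"
```

With a 50% tolerance, 51 of 100 abiding nodes satisfy the condition while 50 do not; with a 33% tolerance the bar rises to more than two thirds.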

C. Software Evolution
Software evolution is a field entailing processes and models for changing software. Within this domain, there is the subfield of runtime evolution [9]. Relevant to blockchain and this study is mainly the challenge of avoiding service outages while changing the running system. In the case of blockchain, this also relies on consistency in the network, combined with the strict requirements of partition tolerance. Hence, minimizing the impact of a change on the consistency, availability, and partition tolerance of the system (CAP) [28] is challenging.
To distinguish and characterise consensus changes, we initially categorise consensus changes by different types of maintenance [9]:
• Adaptive maintenance: Change the deployed software to adapt to a changing environment.
• Perfective maintenance: Change the deployed software to perfect existing functionality by improving the user experience or performance.
• Preventive maintenance: Change the deployed software to avoid faults before they occur.
• Corrective maintenance: Change the deployed software to correct a fault discovered.

D. Evolution and Maintenance in Distributed Systems
Before the decentralised networks, there were, and still are, distributed networks dominating sectors that blockchains now target, such as finance [29], [30], logistics [31], [32], and healthcare [33], [34]. Researchers in these areas use frameworks for maintenance where updates are deployed with central control and rely on either being compatible [35] or running different versions in parallel [36]. In these cases, changes were deployed with fast reboot, rolling upgrade, and big flip [10]. However, we argue that these methods are not directly applicable when deploying consensus rule changes on a blockchain. Firstly, known techniques are hard to coordinate without central administration. Secondly, a blockchain consensus rule change conflicts with legacy rules, changing the set of valid actions. Some techniques may drastically affect the stability of a blockchain, e.g. total network hash power. Thirdly, running different versions in parallel is problematic as it may cause consensus failures.

A. Research Motivation and Research Questions
Blockchain systems must evolve to meet current and future requirements. Although there have been studies on different kinds of consensus code changes in blockchains [37], there continues to be a considerable lack of understanding of how these changes should be deployed safely, and how to handle failure modes. A flawed deployment can result in financial loss for invested participants or loss of faith in the system, undermining the value of the contained cryptocurrency. Throughout Bitcoin's history, there have been cases of suspended services [6], lost mining revenue [4], and theft [7], [8]. One example shown in Listing 3 shows that consensus changes can cause critical incidents in a blockchain (additional related quotes are in Section A.3, available online). Thus, timing and correctness are vital to minimize these incidents' damage. Therefore, we investigate: What techniques have been applied to deploy consensus changes? (RQ1) and What are the lessons learned from deploying consensus changes? (RQ2).
Listing 3. 2020-08-03 Eric Lombrozo [38]
Every time that you open up the door to changing the rules, you are opening yourselves up to attack

B. Research Method
To answer RQ1, we used a qualitative and inductive approach to achieve an insightful and holistic view of consensus changes in blockchains. We chose Strauss' approach of grounded theory (GT) as it is well-suited to studies with predefined research questions [39]. The GT approach is an iterative and recursive approach where the researchers must go back and forth until they achieve theoretical saturation, i.e. when new samples stop expanding the developing theories. The observations throughout the study are covert [40], where a researcher can get the most authentic experience of how the actors conduct a process. Although covert observations can be ethically questionable, it is crucial to consider that the public archives of Bitcoin were created for this purpose and these archives aid transparency and accountability. To answer RQ2, root cause analysis has been utilized to address lessons learned by drawing an Ishikawa diagram [14]. The approach is similar to "Discovering unknown known security requirements" [12] as it also uses concepts from GT and root cause analysis with incident fault trees.

C. Data Collection and Filtering
The study started with purposive sampling [40] of data archives, specifically Bitcoin Core's development channels.
Samples were discovered in these archives by purposive sampling and filtering relevant to consensus rule changes. The selected samples were efficiently sorted using flexible coding [41]. Further, we applied snowball sampling and triangulation [13] to avoid limitations from the initial samples. A graph of the research method is outlined in Fig. 3.
1) Purposive Sampling: The initial step in collecting data related to the consensus changes in Bitcoin was to identify the consensus-changing events in Bitcoin. An exhaustive list of all the consensus changes throughout Bitcoin Core's history is listed in the Bitcoin Wiki [42] and in Table I, highlighting eventual issues and maintenance types. Further inspection reveals an overview of the main events and information such as the time, block height, version number, and deployment techniques. Development channels were identified as the most fruitful sources, and their data was extracted to initiate the purposive selection of samples and the filtering process.
The channels selected as initial data were the Bitcoin improvement proposals (BIPs) [43], the bitcoin-dev emails [44], the Development & Technical Discussion topic in the Bitcoin forum [45], the code repository (pull requests [46] and issues [47]), and IRC channels (#bitcoin-dev and #bitcoin-core-dev [48]). The total sample count is 1700 and is available online [49]. One sample corresponds to one proposal, one thread (email, forum and GitHub), or one day of IRC messages. Figs. 9 and 10 in Appendix Section A (available online) depict the distribution of samples over archives and events.
However, the sample boundary was fluid, so samples from other domains were considered whenever they appeared. These could be domains such as other Bitcoin channels, announcements, news articles, magazines, videos, or other cryptocurrencies. The additional sources primarily strengthened the theories rather than expanding them, which shows the relevance of the development channels initially selected by purposive sampling.
2) Data Filtering: After acquiring an overview of consensus change events and corresponding BIP specifications, filtering was applied to the initial data. The first data considered were emails from the mailing list. With a relatively compact overview of all the threads made, it was considered viable to traverse through the titles to purposefully select relevant samples.
After the email samples were included, the next step was the analysis of the forum texts. However, the forum source proved large and challenging to sort through. Therefore, a search was conducted in two stages: first, all the threads leading up to and surrounding the dates of each consensus incident were checked for relevance and purposively sampled. Second, the forum was filtered with the help of a search tool targeting the Bitcoin forum [50]. This filtering approach was also applied to the GitHub search. The relevant BIPs could be found as they directly correlate to consensus changes and deployment techniques.
We used the search strings in Table I to filter the Bitcoin forum and GitHub. We based our initial search on Bitmex's overview of consensus changes [51], which did not contain the consensus changes after 2017. We used unique identifiers to search for samples related to consensus changes: Title, BIP number, changed code, and block height. The consensus-changing events after 2017 were found later by snowball sampling.
Regarding the IRC channels, we discovered samples by looking for scheduled developer meetings and used a few search strings to find other relevant discussions. Only four meetings were found before weekly meetings were established in the fall of 2015. The search was conducted by applying a few search strings that could indicate conversations about issues arising from consensus changes. These strings were "fork," "chain split," "stuck," and "reorg". Using grep [52], all the files were traversed together, showing each matching sentence. The whole log of a sample was collected whenever a match indicated some value or relevance.
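The grep-based search over the IRC logs can be approximated with a short script. This is a sketch under stated assumptions rather than the procedure actually used in the study: it assumes one `.log` file per IRC day in a local directory, performs case-insensitive substring matching, and the function name `find_matches` is hypothetical. Only the four search strings come from the text above.

```python
import re
from pathlib import Path
from typing import Dict, List, Sequence

# The four search strings named in the text.
SEARCH_STRINGS = ["fork", "chain split", "stuck", "reorg"]


def find_matches(log_dir: Path, terms: Sequence[str] = SEARCH_STRINGS) -> Dict[str, List[str]]:
    """Map each log file name to the lines that contain any search string."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits: Dict[str, List[str]] = {}
    # Assumed layout: one *.log file per IRC day somewhere under log_dir.
    for path in sorted(log_dir.rglob("*.log")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if pattern.search(line):
                hits.setdefault(path.name, []).append(line)
    return hits
```

Each key in the result identifies a candidate sample (one IRC day) whose full log can then be inspected manually for relevance.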
The search strings used for IRC samples are fewer and more general than those used for the other archives. That is because almost as many IRC samples are available as there are days since the first conversations from 2010. Additionally, a single IRC sample often contains discussions on many different topics. Therefore, the search strings in Table I would provide too many samples, making them infeasible to sort out. We focused on search strings indicating how developers assessed concerns on consensus failures during deployment.
The size of the resulting data set of IRC samples indicates that the search strings were accurate and broad enough to catch relevant samples. For instance, it revealed samples explaining how the deployment techniques were initially implemented and further developed to avoid failure. Additionally, the logs around important dates were purposively sampled and inspected. Finding relevant samples also became simpler from the autumn of 2015 as the weekly meetings could be collected. A challenge with the IRC logs has been that the logs are somewhat dispersed between different and inconsistent archives [53], [54], [55], [56]. Therefore, it became clear that all the different archives had to be considered when searching.

D. Data Analysis
Although Strauss' approach [13] was promising to answer the research questions, it became apparent that the grounded theory approaches say little about how modern data analysis tools should be utilized most effectively when performing analysis with a large number of samples. To cope with this, we applied flexible coding proposed in [41], which explains ways that large sets of data can be collected and coded through qualitative data analysis software (QDAS) such as Atlas.ti [57], NVIVO [58], or MaxQDA [59].
1) Open Coding: The indexing approach was applied as a specific form of open coding to analyse large datasets using QDAS according to the guidelines on flexible coding [41]. The purpose is to get an overview of the initial data and effectively define and evaluate codes and categories. In practice, the data was initially indexed on a sample basis to highlight the essence of each sample. The process of indexing was continuously evaluated as new codes and categories emerged. Memos were created to understand the correlation between codes, categories, and events. One specific code was used to highlight notable quotes that considerably impacted the results [41]. This code always overlaps with some other code and is labelled "aha" to signal the aha experience that these quotes represent. Quotes labelled with this code were continuously revisited and were candidates to present and support the content of this paper.
2) Axial Coding: The axial coding phase was conducted by revising, combining, and splitting codes and categories. Some codes were combined with decreasing levels of granularity, and others were split to increase granularity and enhance insights. This stage also developed patterns and relations within the codes and categories, as shown in Fig. 4.
The open and axial coding processes also revealed issues highlighted in Table I. A few issues in Table I are related to well-known general software engineering practices, such as introducing bugs caused by insufficient reviewing and testing. Other issues are blockchain-specific and should be analysed in depth. These issues include chain split, network partitioning, adoption, controversial changes, stuck nodes, conflicting proposals, and malicious behaviour.
3) Selective Coding: The process of selective coding focused on saturating the main categories of the data, their correlation, and the root causes of blockchain-specific issues. We identified the causal links between where deployment issues are rooted and where they surface. The lessons learned were revised to reflect this, and the measures to address these issues are summarized.

4) Constant Comparison and Theoretical Saturation:
The codes, categories and emerging theories were constantly compared by controlling whether they made sense regarding the research questions and the objective domain. It was assumed that the samples collected in the initial search could lead to an adequate theory in this paper. However, there was always a possibility that the current data set was too narrow. Therefore, the techniques of data saturation, snowball sampling [60], and data triangulation [13] were systematically applied.
The data saturation process led to additional samples that were not found in the initial data collection phase. The findings could, for instance, be missing links from one of the resources used in the data collection, or it could be an article from the Bitcoin Project's website [61], a blog post, or a video. These new samples were systematically collected by applying the snowball sampling technique. The essence of the snowball sampling technique is to include samples found by references in the collected samples.
Data triangulation was applied between the different resources to see the same phenomena from different perspectives. When something important happened in one place, other samples usually offered more to read about it. These techniques gave a rich data set with multiple perspectives. For instance, the IRC chat had inconsistencies on the 2015 event (version 0.10.0), where the samples of 2015-06-03, 2015-06-04, and 2015-06-05 were missing. Many other samples also revealed how this specific consensus failure was caused by custom and lazy validation and spy-mining [62].

5) Root Cause Analysis:
The application of Ishikawa diagrams [14], as seen in Fig. 11 in Appendix Section B (available online), further enhanced the analysis for RQ2 (lessons learned). The identified root causes were classified within the categories of human errors [63] to gain further insight. The different errors were applied as codes during the GT analysis as seen in Section V. Complementing the grounded theory approach with root cause analysis gave higher confidence in covering relevant issues and gaining in-depth understanding.

IV. RESULTS OF RQ1 (DEPLOYMENT TECHNIQUES)
The data analysis revealed a chain of events, shown in the timeline in Fig. 5, which summarizes 34 consensus-changing events, including 24 Bitcoin Core changes and 10 Bitcoin Cash changes. The figure indicates some issues where the deployment was performed as an emergency, caused a chain split, or encountered other problems. These issues are further assessed in Section V on lessons learned.
Fig. 6. A minimal overview of analytical categories highlighting the pattern in the development cycle: An issue sparks the development processes, which eventually is realised by the deployment processes. The human error category could negatively impact both development and deployment, as further discussed in Section V. The social/political aspects decide whether a consensus change fork will be deployed and which deployment techniques are utilised.
The categories discovered and applied through grounded theory also revealed the process of defining, implementing, and deploying consensus rules, as shown in Fig. 6. In Bitcoin's case, these changes are usually motivated by some issues which prevent the implementation from providing the full service envisaged in Nakamoto's white paper or code. The discovery of an issue leads to development before moving on to deployment. The nine features for deployment were derived from the codes applied during analysis. Fig. 7 illustrates how the deployment features were derived. These features are used in nine combinations to define different deployment techniques that answer RQ1.

A. Deployment Features
The features for deployment are Deployment strategy, Fork type, Chain split risk, Parallel, Standard, Signal, Inclusive, Threshold and Trigger.
1) Deployment Strategy: The deployment strategy feature defines whether 1) nodes depend on each other to coordinate the timing of an upgrade, i.e. a miner-activated strategy; 2) the upgrade is forced regardless of miners' promised support, i.e. a user-activated strategy; or 3) deployment must be forced due to an imminent issue, i.e. an emergency-activated strategy.
2) Fork Type: Common terminology [64] describing different consensus rule changes in a blockchain distinguishes between hard and soft forks. However, these concepts fail to provide an accurate description when considering the low-level details of a rule change. For instance, a hard fork has been established as a term for rule changes that result in a permanent chain split. However, this can also happen in a soft fork if a minority deploys the change. Therefore, Zamyatin et al.'s terminology [37] was adopted to accurately distinguish relevant fork types:
• Expanding: Changes that make previously illegal actions legal (commonly referred to as a hard fork).
• Reducing: Changes that restrict the set of valid actions (commonly referred to as a soft fork).
• Bilateral: Changes that deem all previous legal actions illegal and expand the rule set (commonly referred to as a hard fork).
Whenever deploying rule changes to a blockchain, one must consider the compatibility between new and old versions by understanding the fork type of the implementation. The main difference is that a reducing fork will be backward-compatible, allowing it to be enforced by the network with a super-majority of supporting hash power. Therefore, a reducing fork can be desirable as miners can keep the network consistent without relying on the whole network to perform the deployment. As Listing 4 indicates, BTC developers usually look for ways to implement changes as reducing forks since they have desirable compatibility attributes and are easier to digest for the network and the community (additional related quotes are in Section A.4, available online).
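The three fork types can be illustrated by comparing the set of actions valid under the old rules with the set valid under the new rules. This is a toy model with finite sets; the names `ForkType`, `classify_fork`, and `legacy_accepts_all` are hypothetical, and the `OTHER` case (partial overlap) falls outside the three categories listed above.

```python
from enum import Enum
from typing import Set


class ForkType(Enum):
    EXPANDING = "expanding"
    REDUCING = "reducing"
    BILATERAL = "bilateral"
    NONE = "no change"
    OTHER = "unclassified"  # partial overlap, outside the three listed types


def classify_fork(old_valid: Set[str], new_valid: Set[str]) -> ForkType:
    """Classify a rule change by the relation between old and new valid sets."""
    if new_valid == old_valid:
        return ForkType.NONE
    if new_valid < old_valid:   # strict subset: rules were tightened
        return ForkType.REDUCING
    if new_valid > old_valid:   # strict superset: rules were loosened
        return ForkType.EXPANDING
    if old_valid.isdisjoint(new_valid):
        return ForkType.BILATERAL
    return ForkType.OTHER


def legacy_accepts_all(old_valid: Set[str], new_valid: Set[str]) -> bool:
    """Legacy nodes accept every new-rule action only if nothing was added."""
    return new_valid <= old_valid
```

Only the reducing case keeps `legacy_accepts_all` true while still changing the rules, which is the compatibility property that makes reducing forks attractive for BTC.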

3) Chain Split Risk:
The different fork types and deployment strategies imply different levels of chain split risk. This feature indicates how likely a prolonged chain split is. Applying additional deployment features can limit the potential risk, length, and impact of chain splits.
4) Parallel: Another compatibility issue is whether performing several deployments in parallel is possible. Deployments can be conducted in parallel if the rule changes are isolated and the deployment attributes are independent. Listing 5 shows an example of Bitcoin adopting parallel deployments after realising that non-parallelism could become a problem (additional related quotes are in Section A.5, available online).
BIP 34 introduced a mechanism for doing soft-forking (...). As it relies on comparing version numbers as integers however, it only supports one single change being rolled out at once, requiring coordination between proposals, and does not allow for permanent rejection: as long as one soft fork is not fully rolled out, no future one can be scheduled.
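The difference between integer version comparison and independent signalling bits can be sketched as follows. This is an illustration only: the constants follow our reading of BIP 9 (top three version bits set to 001, deployment bits 0 to 28) and should be checked against the specification, and the function names are hypothetical.

```python
TOP_MASK = 0xE0000000   # top three bits of the 32-bit block version
TOP_BITS = 0x20000000   # pattern 001 marks a version-bits block (per BIP 9)


def signals(version: int, bit: int) -> bool:
    """True if the block version signals the deployment assigned to `bit`."""
    if not 0 <= bit <= 28:
        raise ValueError("deployment bits are limited to positions 0..28")
    return (version & TOP_MASK) == TOP_BITS and bool(version & (1 << bit))


def integer_version_supports(version: int, required: int) -> bool:
    """Integer comparison: upgrades form one ordered sequence."""
    return version >= required
```

With separate bits, a block can signal one deployment without implying support for another, whereas `integer_version_supports(3, 2)` is necessarily true for any block that announces version 3.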

5) Standard:
In addition to the consensus rules, nodes can utilize relay policies to specify what transactions they will include in their blocks and whether they are relayed to other nodes. This can be seen as softly enforced rules that can be applied at any rate before activating new consensus rules, allowing individual miners to avoid unwanted or experimental behaviour. A deployment conducted by gradually changing the policies before or after the rule deployment will be recognized by the standard feature.

6) Signal:
A signal is a feature used to signal the intention to upgrade and enforce new consensus rules. In Bitcoin, signals are usually represented by the version bits in the block header or a string in the coinbase transaction message. Other implementations of on-chain signals observed are multi-signature commitments to the chain, as implemented in Dash [65]. The signals make deployment more predictable and allow the measurement of total hash power support on the network. Listing 6 is one example describing the first approach to signalling to deploy pay-to-script-hash (additional related quotes are in Section A.6, available online).
Listing 6. Luke-jr 2011-10-02 IRC: #bitcoin-dev when 50% of the last N coinbases contain "I support FOOFEATURE", it's enabled
On-chain signalling provides confidence that a significant portion of miners will behave according to the new rules. However, a signal can also come in other forms, such as a verbal agreement. This happened once in Bitcoin's history during the deployment of BIP 30, as illustrated in Listing 7 (additional related quotes are in Section A.7, available online).
7) Inclusive: The signals are most helpful on-chain, where they can be interpreted by validating nodes to coordinate an upgrade. They can also be used to exclude blocks mined by non-signalling nodes to persuade them to upgrade and avoid unwanted behaviour. This behaviour is defined by the inclusive feature. An inclusive fork will continue to append blocks from miners that do not intend to validate by the new rules. In contrast, an exclusive fork will stop accepting blocks from miners who do not show intention to validate by the new rules. This restriction can be lifted after a certain time or if the deployment fails.
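A minimal sketch of coinbase-string signalling in the spirit of Listing 6, combined with the inclusive and exclusive behaviour, is given here. The `Block` type and all function names are hypothetical, and the sketch models only the signalling aspect, not the validation of the new rules themselves.

```python
from dataclasses import dataclass


@dataclass
class Block:
    coinbase_message: str


def signals_support(block: Block, tag: str) -> bool:
    """True if the coinbase message carries the agreed support tag."""
    return tag in block.coinbase_message


def support_share(recent: list[Block], tag: str) -> float:
    """Fraction of the given recent blocks whose coinbase carries the tag."""
    if not recent:
        return 0.0
    return sum(signals_support(b, tag) for b in recent) / len(recent)


def accept_block(block: Block, tag: str, exclusive: bool) -> bool:
    """Inclusive forks keep non-signalling blocks; exclusive forks reject them."""
    return signals_support(block, tag) or not exclusive
```

The share returned by `support_share` is what a threshold would be compared against, while `accept_block` captures the only difference between the inclusive and exclusive variants in this simplified model.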
8) Threshold: Activation and enforcement of consensus rule changes can happen in stages, which can be controlled by the threshold feature. One deployment technique may implement several thresholds. For instance, the first threshold enforces rules for all signalling nodes. The second threshold enforces the rules for all nodes. Having several thresholds is a trade-off. On the one hand, the first threshold incentivises miners to stay true to their intention of validating by the new rules. On the other hand, every threshold is a potential trigger for consensus failure. This is possibly one of the reasons that Bitcoin ceased using two-stage activation by ISM (IsSuperMajority) [25]. The activation thresholds are most relevant for miner-activated strategies because they rely on coordination with other nodes. The thresholds should be at least the super-majority (> 50% in Bitcoin) to preserve consistency, ensuring enforcement on the longest chain.
9) Trigger: When the threshold is reached, the trigger feature will enforce activation. The activation is triggered dynamically, statically, or instantly. Using a rolling window to decide the timing for miner-activated strategies dynamically can be desirable. A rolling window trigger will determine the amount of support based on a number of recent blocks. The most primitive trigger can be based on a static flag day (FD) or block height (BH), as used for user-activated strategies and demonstrated by Nakamoto, as shown in Listing 8 (additional related quotes are in Section A.8, available online). Instant triggers are used in emergencies and are adopted as soon as they are rolled out to prevent or resolve exploits or consensus failures. The static and instant triggers have a higher risk, as they do not guarantee that a super-majority of miners will prevent a chain split when the rules activate.
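The dynamic and static triggers can be contrasted in a short sketch. The function names and the window and threshold parameters are hypothetical; the concrete values in the usage note are chosen for illustration only.

```python
def rolling_window_reached(signals: list[bool], window: int, threshold: float) -> bool:
    """Dynamic trigger: enough of the last `window` blocks signal support."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(signals) < window:
        return False                      # not enough history yet
    recent = signals[-window:]
    return sum(recent) / window >= threshold


def static_trigger_reached(height: int, activation_height: int) -> bool:
    """Static trigger: activate at a fixed height regardless of support."""
    return height >= activation_height
```

A two-stage activation can be expressed by calling `rolling_window_reached` twice with different thresholds, for example 0.75 for the first stage and 0.95 for the second, which also shows why each additional threshold is one more point at which nodes may disagree.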

B. Deployment Techniques
Our first contribution is to define nine possible consensus change deployment techniques. These techniques have different qualities indicated by the inherited chain split risk. Our findings do not necessarily indicate that one technique is better than the other. Instead, some techniques are more viable than others depending on the context. Therefore, we suggest the theory that choosing a deployment technique is a trade-off between the functionality and consistency of the blockchain and its community. This is a spectrum. On the one hand, if the whole community supports a fork, then it can be deployed without causing a chain split. On the other hand, if a fork does not have unanimous support, then different techniques can be used to persuade the network to move together or split the chain if the new functionality is more important than maintaining consistency.
Furthermore, deployment techniques preserve consistency and predictability for the parties involved in deployment. We summarize nine deployment techniques, shown in Table III. The techniques are defined as a combination of two deployment features: 1) the fork type (expanding, reducing, or bilateral) and 2) the deployment strategy (miner, user, or emergency). For each deployment technique, the optimal combination to reduce the risk and impact of a chain split is shown using the remaining deployment features.
An overview of Bitcoin's evolution over time and the deployment features utilized are depicted in Table II. This section presents the nine deployment techniques (Table III) and three special cases. The features in Table III are highlighted as essential, useful, or insignificant (-) for the corresponding technique. The only feature with consistent behaviour across all the deployment techniques is parallel because it can always be useful to allow several deployments in flight simultaneously.
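Since each technique is the combination of one deployment strategy and one fork type, the nine acronyms used in the remainder of this section can be enumerated mechanically. The sketch below is purely illustrative; the dictionary and function names are hypothetical.

```python
from itertools import product

STRATEGIES = {"miner": "M", "user": "U", "emergency": "E"}
FORK_TYPES = {"reduction": "R", "expansion": "E", "bilateral": "B"}


def technique_names() -> list[str]:
    """Build acronyms of the form <strategy>A<fork type>F, e.g. MARF."""
    return [f"{s}A{f}F" for s, f in product(STRATEGIES.values(), FORK_TYPES.values())]
```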

1) Miner-Activated Reduction Fork (MARF):
The MARF deployment technique is the only technique with a low chain split risk when deployed using all the available deployment features. Like all miner-activated techniques, it should rely on coordination between miners in the network. The coordination is achieved by relying on signals from other miners to reach at least super-majority support. However, a higher threshold is desirable to reduce the chain split risk. A dynamic trigger ensures that the threshold is reached before triggering the activation.
2) Miner-Activated Expansion Fork (MAEF): Expansion forks will cause legacy nodes to deviate from patched nodes when the behaviour of new rules appears in blocks. Thus, the chain split risk is high, and the network will only stay consistent with 100% adoption. A super-majority threshold can reduce the impact of a split by keeping patched nodes together on the new chain from the time of activation. With less than super-majority adoption, the patched nodes will continue to follow the legacy chain, as new blocks violating the legacy rules will be discarded due to the longest chain rule. This can cause frequent chain splits depending on the adoption percentage, as shown in Fig. 8. The issue of low adoption can also be avoided by exclusiveness to enforce the discarding of all legacy blocks, causing patched nodes to follow their own path of the valid longest chain.
3) Miner-Activated Bilateral Fork (MABF): Bilateral forks carry the property that patched nodes will never create valid blocks according to legacy nodes and vice versa. Therefore, it is only possible to avoid a chain split with 100% adoption. Choosing any activation threshold less than 100% carries less utility in limiting the impact of a chain split, and there can only be one split. However, it can be helpful to demand a certain amount of support to ensure that the patched nodes can provide sufficient security and reliable service for the patched network.
4) User-Activated Reduction Fork (UARF): When moving over to the domain of user-activated forks, the deployment has a different objective. Rather than keeping the network consistent, it is more important that the fork activates regardless. Therefore, such a fork does not require any threshold and should activate by a static trigger. Furthermore, exclusion is essential since the updated consensus rules cannot be expected to reach a super-majority. Exclusion ensures that the deployment may only cause a single chain split. In contrast, including legacy nodes will cause chain splits every time the new rules are violated. The standard feature can be essential for UARF to discriminate against upcoming rule-breaking transactions. As shown in Listing 9, feather-forking can be a viable technique to persuade other nodes to upgrade by actively attempting to orphan legacy blocks (additional related quotes are in Section A.9, available online).
Listing 9. Socrates1024 2013-10-17 Forum, ID: 312668 A feather-fork is when a miner refuses to mine on any chain that includes a transaction it doesn't like in the most recent several blocks.

5) User-Activated Expansion Fork (UAEF):
UAEF requires all nodes to upgrade to avoid a chain split and is mainly applied when expecting full network adoption with high confidence. UAEF was observed to be the preferred deployment technique for consensus changes in both BCH and Ethereum. In these cases, the forks have usually held high or unanimous support from the community. Changes are implemented in different node distributions and are expected to be adopted by the time of activation. Exclusion must be applied if there is any doubt of super-majority adoption. Otherwise, the upgraded nodes might follow the legacy chain even after adoption, as it might be the longest valid chain, as illustrated in Fig. 8. However, if there is doubt, implementing a bilateral fork will be more beneficial in avoiding influence from legacy nodes, just like when BCH forked off BTC by UABF [22].

6) User-Activated Bilateral Fork (UABF):
The most outstanding example of a UABF is the activation of BCH. The fork became bilateral by demanding that the first block produced after activation was larger than 1 MB. Hence, legacy nodes would never accept the patched chain, and patched nodes would never accept the legacy chain. Bilateral forks ensure that a permanent chain split will commence and there will be two different cryptocurrencies.
7) Emergency-Activated Reduction Fork (EARF): Emergency-activated deployment strategies are required when the implementation does not work as specified, and a consensus failure has already occurred or might occur. One of the earliest cases in which the implementation did not work as intended was seen in BTC 0.3.10 with the overflow bug, where a seemingly valid transaction could be created to generate additional bitcoins. An example of a consensus failure was when BTC 0.8.0 deployed a new database, and it caused a chain split. The third case, a potential exploit, can be illustrated by the inflation bug in version 0.14.0, which was discovered before being exploited. All of these deployments were EARFs. It is useful for EARF deployments to be inclusive to allow unpatched nodes to reorganize and generate blocks on the valid chain originating from the reduction fork when it becomes the longest.
An interesting observation in BTC 0.3.10 and 0.8.0 is that miners performed a rollback of blocks, deviating from the longest valid chain rule and Bitcoin's immutability property to reach consensus. First, the overflow bug was so severe that the consensus rules had to be changed such that the chain containing the malicious transaction would be rejected. During the consensus failure caused by the database deployment (BTC 0.8.0), there was a need to downgrade nodes even though the new chain was the longest and valid according to the specification. That was because it was the most conservative approach to keep compatibility with old nodes and because many merchants and users were likely to follow the legacy chain. Listing 10 shows that it was not obvious to downgrade and deviate from the longest chain rule. Furthermore, Listing 11 highlights that users and services depended on the legacy chain (additional related quotes are in Section A.10, available online).
Listing 10. Luke-Jr & gavinandresen 2013-03-12 IRC: #bitcoin-dev
<Luke-Jr> gavinandresen: sipa: jgarzik: can we get a consensus on recommendation for miners to downgrade? (...)
<gavinandresen> the 0.8 fork is longer, yes? So majority hashpower is 0.8....
<Luke-Jr> gavinandresen: but 0.8 fork is not compatible
Listing 11. nevafuse 2013-03-13 Forum, ID: 152470 Doesn't matter which chain is longer if a majority of the people aren't on it. Breaking changes need to be given lots of warning to be effective. Trying to force everyone to use 0.8 would have only made the situation worse. From the chat discussion, I don't think mtgox was using 0.8. So trading at the largest exchange would be halted until it could be upgraded. If that doesn't sound disastrous, I'm not sure what does.
There is no point in signalling or waiting for a certain threshold in the urgency of emergency activation. If the flaw is exploited, the triggering will happen naturally as soon as the bug in question triggers a consensus failure. The exclusion happens naturally because legacy nodes violating the rules of the EARF will be orphaned. In addition, gradual soft enforcement by standard relay policies becomes unnecessary since the rule changes of an emergency fork require instant consensus enforcement.
8) Emergency-Activated Expansion Fork (EAEF): Deploying EAEF alone can be risky, as it might not become widely adopted on a network basis. Miners might be reluctant to deploy a hasty and radical expansion of the consensus rules. Anything less than 100% adoption could cause a permanent chain split if never fully adopted. This encourages miners to perform an emergency-activated reduction fork instead, if feasible, as it is the safer alternative.
Fig. 8 illustrates the problem that patched nodes might keep jumping back to the legacy blockchain as long as that is the longest. This will eventually be resolved as soon as the super-majority of miners work on the expanded blocks, and that chain will become the longest. However, legacy nodes will still work on the legacy chain, as they do not see the expansion blocks as valid. The BIP50 consensus failure caused by BTC 0.8.0 might have looked somewhat like the EAEF figure before making a persistent chain split, although that cannot be assessed without access to the orphaned blocks.
9) Emergency-Activated Bilateral Fork (EABF): This deployment technique has not been observed in any known upgrade. However, one could imagine the BCH fork being deployed with EABF as a reaction to revert the SegWit deployment. In that case, the patched chain would have to roll back to a block before the first SegWit block was created and create a conflicting block.
10) Special Cases: In addition to the nine deployment techniques, there are three special cases. These cases fit into more than one of the defined techniques:
• Temporary reduction forks (related to MARF, UARF, and EARF)
• Hybrid deployment (related to all deployment techniques)
• Non-deterministic forks (related to all deployment techniques)
In addition to ordinary activation at time T, a temporary reduction fork has a predefined deactivation time. BTC 0.8.1 demonstrated a temporary reduction fork, shown in Listing 12. The code defines a temporary reduction that was activated on 2013-03-21 and deactivated on 2013-05-15 (lines 2057 & 2058). The code for the temporary reduction counts transaction IDs (TxIDs) (lines 2062-2069) and enforces the limitation (lines 2071 & 2072). The limit of 4,500 TxIDs was assumed low enough to avoid reaching the database lock limit of 10,000.
Temporary reduction forks can also be illustrated by an example from Bitcoin's legacy: The 1 MB limit was initially applied as a reduction fork. However, expanding that limit would not have required an expansion fork if the limit had been given a predefined deactivation time. The community would then have had years to find a solution or delay the issue by another temporary reduction fork before the end time. Legacy nodes can still accept all blocks created under the reduction fork, while patched nodes will know the start and end time.
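The idea of a reduction with a built-in expiry can be sketched as follows. This is not the code of Listing 12: the dates and the limit are taken from the description above, midnight UTC is an assumption, and `block_allowed` and its `txids` argument are hypothetical.

```python
from datetime import datetime, timezone

# Dates and limit as reported in the text; midnight UTC is an assumption.
ACTIVATION = datetime(2013, 3, 21, tzinfo=timezone.utc)
DEACTIVATION = datetime(2013, 5, 15, tzinfo=timezone.utc)
MAX_TXIDS = 4_500


def block_allowed(block_time: datetime, txids: set[str]) -> bool:
    """Apply the extra limit only inside the activation window."""
    if ACTIVATION <= block_time < DEACTIVATION:
        return len(txids) <= MAX_TXIDS
    return True   # outside the window the temporary rule does not apply
```

Because the rule only rejects blocks that legacy nodes would have accepted, and stops doing so after the deactivation time, no later expansion fork is needed to lift it.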
A hybrid deployment is another special case where different fork types are deployed together. This technique inherits the attributes of the most disruptive fork type regarding chain split risk, in the following order: BF > EF > RF. Hybrid deployment with combinations of expansion and reduction forks has become a relatively common practice in BCH, which performed hybrid deployments in BCHN 0.16.0, BCHN 0.18.0, and BCHN 0.19.12. Combining several forks into one deployment is practical because it limits the number of deployments where the network is exposed.
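The inheritance rule for hybrid deployments amounts to taking the maximum over the stated ordering. A minimal sketch, with hypothetical names:

```python
DISRUPTIVENESS = {"RF": 0, "EF": 1, "BF": 2}   # ordering BF > EF > RF from the text


def hybrid_fork_type(parts: list[str]) -> str:
    """Return the most disruptive fork type among the bundled changes."""
    if not parts:
        raise ValueError("a hybrid deployment needs at least one change")
    return max(parts, key=DISRUPTIVENESS.__getitem__)
```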
The non-deterministic forks are best explained by the example of BIP50 and the upgrades deployed with BTC 0.8.0 and BTC 0.8.1. The implementations contained a MAX_BLOCK_SIZE of 1 MB. However, this rule was often overrun by the default database locks setting in pre-0.8.0 nodes, which was too small to handle certain large blocks containing many transactions. The problem would surface long before the consensus failure because blocks that used too many locks led to reorganisation. This caused many nodes to run custom configurations. Listing 13 shows the problem surfacing one year before the consensus failure and that some miners had to set custom lock limits (additional related quotes are in Section A.11, available online). Furthermore, the Berkeley Database would behave inconsistently depending on the underlying hardware. The result is that chain splits and stuck nodes appear non-deterministic. Listing 14 describes how nodes running identical code would result in a non-deterministic fork depending on how the blockchain is stored on disk (additional related quotes are in Section A.11, available online).
Listing 14. Gavin Andresen 2013-03-20 BIP50 (...) contents of each node's blkindex.dat database is not identical, and the number of locks required depends on the exact arrangement of the blkindex.dat on disk (locks are acquired per-page).
When the database was changed in BTC 0.8.0, the new implementation would handle the locks differently and always be able to handle the edge-case blocks. The legacy nodes with custom lock limits would also handle these blocks. On the contrary, a non-deterministic set of the legacy node implementations would regard these blocks as invalid, causing a chain split. The fork was non-deterministic because of the inconsistent compatibility with blocks among legacy nodes running the same protocol.

V. RESULTS OF RQ2 (LESSONS LEARNED)
All consensus rule changes in a blockchain can be a liability, as they increase the attack surface. The deployment process itself can disrupt the community as conflicts arise. Lessons learned from Bitcoin deployments are synthesised to minimise the risk of future deployments in any blockchain. These lessons are derived from the GT and root cause analysis, as seen in the Ishikawa diagram in Fig. 11 in Appendix Section B (available online).
The human error categories [63] were used as codes in the GT analysis to classify the issues discovered in the root cause analysis. These are 1) skill-based errors, i.e. execution failures: slips and lapses; 2) mistakes, i.e. planning failures: rule-based (RB) mistakes and knowledge-based (KB) mistakes; 3) violations: routine violations, e.g. laziness; and 4) exceptional violations, e.g. sabotage. Table IV shows the lessons derived, the impacted deployment features, their corresponding error categories, and the affected Bitcoin versions.

A. Missing Transformation Assurance
The most dangerous forks are those deployed by accident. They occur either because existing consensus rules are exploitable or new rules are deployed by accident. Accidental forks are not safely deployed using the deployment features and will have a high risk of a chain split. This error is seen as an RB mistake because developers misclassify the fork-type feature of the given code change. As proposed in Listing 15, the most obvious remedy is to perform extensive testing and review, although further assurance is required (additional related quotes are in Section A.12, available online).
Listing 15. 2020-06-23 Luke-Jr [67] The review process is definetly a good idea, I dont know if it provides as much security as people assume it does. One thing that slip past one person may as well slip past ten people or whatever.
The lack of transformation assurance in Bitcoin has caused an accidental chain split on one occasion (BTC 0.8.0) and allowed a serious bug to enter the code (BTC 0.14.0). However, Bitcoin has never had an accidental chain split caused by compatibility issues due to cross-node implementation, although the split in Ethereum's Berlin UAEF [6] demonstrates that such splits can happen. To avoid bad code from entering deployment and accidental chain splits, techniques to provide assurances [68] of consensus rule transformation in blockchain must become widely adopted and further developed.
Having several implementations can both cause and detect invalid transformations. Although BTC mainly relies on a single implementation, many cryptocurrencies, such as BCH, use multiple different implementations, all of which should follow the same consensus rules. Running tests on a test network with different implementations increases the chance of discovering transformation issues before deployment. However, as pointed out in Listing 16, having different implementations increases the risk of causing transformation issues (additional related quotes are in Section A.13, available online).
Listing 16. Gmaxwell 2012-10-28 Forum, ID: 120836 Diversity is good and may help discover issues. But as Gavin was saying and as I like to point out: The most dangerous kind of failure in bitcoin isn't an implementation bug - any blockchain validation inconsistencies in widely deployed implementations are significantly worse than pretty much anything other than a full private key leak or remote root exploit... and are even harder to avoid.

B. Improper Reorganisation
In case of an accidental split, nodes must be prepared to handle reorganisation to coordinate everyone to work on the same chain. Some nodes have been forced to re-download the whole blockchain. However, the original slow initial block download (IBD) [69] made it troublesome (BTC 0.3.10). Moreover, the database lock limit caused stuck nodes during reorganisations before BTC 0.8.0 (see explanation in Section IV.B.10). Some nodes would also wipe the existing mempool on reboot, making it harder to detect double-spend attempts (BTC 0.8.0). Measures should be taken to keep the current state of valid blocks and pending transactions when performing an emergency fork, enabling a swift recovery. The error leading to slow reorganisation could be a lapse in the case where node operators, in a weak moment, delete the whole blockchain on reboot and patch. It can also be a KB mistake where developers defining the code for reorganisations did not have the knowledge and experience to handle them properly. Listing 17 shows the issue of quickly reorganising the blockchain during the BTC 0.3.10 EARF (additional related quotes are in Section A.14, available online).

C. Improper Human Interference
Bitcoin's early history shows improper handling of deployment features. This happened in the pay-to-script-hash upgrade. The BTC developers manually set and moved the flag day trigger depending on whether the threshold was reached (BTC 0.6.0). The threshold was not met in time for the first flag day, and nodes had to update to adopt the new flag day. Some nodes did not catch this in time and lost track of the correct chain, as an invalid pay-to-script-hash transaction was mined after the first flag day. The error can be seen as an RB mistake from the developers' side, who had false expectations for node operators. It could also be a KB mistake from the node operators' side if they were unaware of the changed flag day, or a lapse in case they forgot to update in time. The error in the pay-to-script-hash deployment demonstrates that dynamic thresholds must be incorporated into the software, not changed manually.
Deployment features should not be changed during deployment because each change acts as a fork by itself and is a liability. In addition, developers should not alter an ongoing deployment without giving time to review changes. That can be severe, as it can allow the inclusion of flawed or ill-intended changes at the last minute.

D. Too High Thresholds
High thresholds are crucial to onboard hash power during deployment. The threshold feature is relevant in combination with the deployment strategy feature since miner-activated strategies utilize thresholds. The SegWit deployment (BTC 0.13.1) showed that high thresholds, such as 95%, are troublesome because they allow a > 5% minority veto, as shown in Listing 18 (additional related quotes are in Section A.15, available online). The error can be seen as an RB mistake because SegWit was falsely considered non-controversial. The slow adoption of SegWit prompted the use of the less safe user-activated strategy to overrule non-signalling nodes, using the inclusive feature to exclude the opponents.
Listing 18. Shaolinfry 2017-04-06 Email: bitcoin-dev Activation is dependent on near unanimous hashrate signalling which may be impractical and is also subject to veto by a small minority of non-signalling hashrate.
BTC demonstrated some changes to avoid issues with high thresholds in the most recent Taproot upgrade (BTC 0.21.1). They changed the activation threshold to 90% to reduce the decision-making time and intended to use user-activated deployment if the miner-activated deployment failed. Another method to cope with high thresholds is gradually decreasing the threshold towards the lower limit of the super-majority. Dash's dynamic activation thresholds utilize this technique, where the initial limit is 80% and is gradually reduced to 60% [70]. However, there is a trade-off in that lower thresholds are more likely to disrupt consensus.
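A gradually decreasing threshold can be sketched as a function of the signalling period. The sketch assumes a linear decay, which is our simplification and not necessarily the schedule used by Dash; only the 80% start and 60% floor come from the text, and `periods_to_floor` is a hypothetical parameter.

```python
def decaying_threshold(period: int, start: float = 0.80, floor: float = 0.60,
                       periods_to_floor: int = 10) -> float:
    """Threshold for signalling period `period` (0-based), assuming linear decay."""
    if period < 0 or periods_to_floor <= 0:
        raise ValueError("period must be >= 0 and periods_to_floor > 0")
    step = (start - floor) / periods_to_floor
    return max(floor, start - step * period)   # never drops below the floor
```

The share of hash power able to veto activation in a given period is one minus the returned threshold, so the veto power of a minority shrinks as the deployment drags on.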

E. Deploying 'Irreversible' Changes
Changes conducted with reduction forks should be applied carefully, as they might only be reverted if the nodes are willing to adopt future expansion forks. Nakamoto could probably never have imagined the fuss caused by his 1 MB block size reduction, created as a remedy for denial-of-service (BTC 0.3.12). So far, this reduction is nearly irreversible in practice for the expansion-reluctant BTC community. Therefore, it can be valuable to consider the fork type of temporary reduction forks when there is any doubt whether such a reduction fork should be permanent. The error is classified as a KB mistake because the developer did not foresee the future challenges of expanding the consensus rules. Listing 19 shows frustration with the 'irreversible' block size limit since the early days of Bitcoin (additional related quotes are in Section A.16, available online).
Listing 19. Caveden 2010-11-20 Forum, ID: 1347 I'm very uncomfortable with this block size limit rule. This is a "protocol-rule" (not a "client-rule"), what makes it almost impossible to change once you have enough different softwares running the protocol. Take SMTP as an example... it's unchangeable.

F. Not Prepared for Forward Compatibility
Nakamoto implemented support for Bitcoin to be forward-compatible. As shown in Listing 20, he created domains of undefined behaviour by initially defining block versions, transaction versions, and later OP_NOP opcodes (BTC 0.3.6) (additional related quotes are in Section A.17, available online). This facilitates specifying future changes as reduction forks, which are safer. Most of the planned reduction forks in Bitcoin have depended on forward compatibility. To ensure forward compatibility, a blockchain should be defined with domains of undefined functionality, such as version numbers and empty opcodes.
Listing 20. Sipa 2011-10-02 IRC: #bitcoin-dev (...) OP_EVAL == OP_NOP1 can be safely rolled out as soon as 50% of the miners upgraded
Forward compatibility was further adopted when SegWit was created. The developers defined a 4-byte nVersion field to allow future changes to the script specification to be created as reduction forks (BTC 0.13.1, BTC 0.21.1). The error of not preparing for forward compatibility can be seen as a KB mistake, as developers might not be aware of future compatibility issues. However, some people in the BCH community and many other blockchain projects (e.g. Ethereum and Dash) do not value compatibility between versions. They instead perform less safe expansion forks if that makes the end product more elegant. This can be seen as choosing functionality over consistency.
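Why redefining a no-op opcode yields a reduction fork can be shown with a toy script interpreter. The interpreter, its validity rule, and the check attached to OP_NOP1 are all hypothetical simplifications and do not reproduce Bitcoin Script or the OP_EVAL proposal.

```python
def run(script: list, nop1_redefined: bool) -> bool:
    """Toy validity check: a script is valid if it ends with a non-zero top item."""
    stack = []
    for op in script:
        if op == "OP_NOP1":
            if nop1_redefined and (not stack or stack[-1] != 1):
                return False      # new rule: only ever adds a failure case
            continue              # the stack is left untouched either way
        stack.append(op)          # anything else is treated as a data push
    return bool(stack) and stack[-1] != 0
```

Because the redefined opcode never changes the stack and can only reject, every script that patched nodes accept is also accepted by legacy nodes, which is the backward compatibility that makes the change a reduction.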
The standard feature can be used to facilitate forward compatibility. This was done for all consensus rules dealing with malleability (BTC 0.10.0, BTC 0.13.1, BTC* 0.14.0, BTC* 0.14.1, BCHN 0.16.0, and BTC 0.21.1). All the rule changes related to malleability were already softly enforced by standardness. The standard nodes would minimize the success of malleability attacks before activating the consensus change by not including or relaying those transactions.
Additionally, the parallel feature is relevant to forward compatibility, as it makes it possible to perform several deployments simultaneously or sequentially. This was not the case for the first established deployment method, ISM, used in BTC 0.7.0, BTC 0.10.0, and BTC 0.11.2. These versions were deployed without forward compatibility and would not allow other deployments in parallel. Furthermore, this technique permanently consumed versionBits, so they could never be used in a reduction fork again.

G. Lacking Knowledge Regarding Network Dynamics
Some rule changes may require nodes to broadcast additional information to other nodes in the network. The worst-case outcome of this behaviour could be that the network would create partitions of nodes that could only validate blocks made within that partition. Therefore, changes in the peer-to-peer network must be handled to enable compatibility with legacy nodes and avoid network partition.
A potential error of network partitioning would be a KB mistake because of the lack of knowledge regarding network dynamics. For instance, the extension blocks introduced for SegWit contain signature data that the legacy nodes would not recognise or relay. As seen in Listing 21, the peer-to-peer network relied on the signal feature by using a service bit for a node to signal the ability to provide the witness data (BTC 0.13.1) (additional related quotes are in Section A.18, available online).
To ensure that the network is not partitioned and that segwit blocks are being passed to segwit enabled nodes, a Core 0.13.1 node will use its outgoing connection slots to connect to as many nodes with the NODE_WITNESS service bit as possible (...)
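Peer selection by service bit can be sketched as follows. The value of `NODE_WITNESS` follows our reading of BIP 144 and should be verified against it; the function `prefer_witness_peers` and the peer map are hypothetical and do not reflect the actual connection logic of any node implementation.

```python
NODE_WITNESS = 1 << 3   # service bit advertised by witness-capable peers (per BIP 144)


def prefer_witness_peers(peers: dict[str, int], slots: int) -> list[str]:
    """Pick peers for the outgoing slots, witness-capable ones first."""
    # sorted() is stable, so peers keep their original order within each group.
    ranked = sorted(peers, key=lambda p: not (peers[p] & NODE_WITNESS))
    return ranked[:slots]
```

Legacy peers are still used when too few witness-capable peers are known, which keeps patched nodes connected to the rest of the network rather than partitioning it.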

H. Insufficient Damage Control
Forks are necessary for the evolution of blockchains. As history has proven and Murphy's law will ensure, consensus failures will occur in the future. End-users and miners should take measures to perform damage control. Past failures to take these measures would be a KB mistake because the actors did not know about or did not have experience with chain splits.
These measures would be to detect a chain split and then suspend transactions or increase the number of confirmations required. In the past, merchants have been subject to double-spend attacks, and pool funds have been drained by miners working on the chain that was eventually orphaned (BTC 0.8.0, Listing 22; other related quotes are in Section A.19, available online). In Bitcoin, there are mechanisms for detecting chain splits. Additionally, one can run nodes with different versions to monitor that they stay on the same chain. Listing 23 discusses some ways to perform damage control in case of a consensus failure. The simplest solution in case of a consensus failure is to stop accepting and processing transactions (Listing 24). Replay attacks can be performed by re-broadcasting transactions from one chain to another in the event of a permanent chain split. To prevent this attack, one of the chains should implement replay protection [8]. BCH implements replay protection for all planned consensus changes [71].
Some damage control can be prepared up front, e.g., by incorporating a kill-switch mechanism [72] to activate an emergency rollback. However, this kill-switch mechanism increases the risk of centralisation and foreign interference if a single person or a closed community holds it.
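The monitoring approach in this section, running nodes with different versions and reacting when they disagree, can be sketched in Python as follows. The sketch assumes the operator can query each node for the block hash at a common reference height. The function names, the policy enum, and the confirmation counts (6 and 100) are hypothetical values for illustration, not recommendations from the analysed data.

```python
from enum import Enum
from typing import Dict, Optional


class SplitPolicy(Enum):
    SUSPEND = "suspend"            # stop accepting transactions entirely
    RAISE_CONFIRMATIONS = "raise"  # keep accepting, but wait longer


def split_detected(hash_at_height: Dict[str, str]) -> bool:
    """Return True if monitored nodes report different block hashes
    for the same height, i.e., they follow different chains.

    `hash_at_height` maps a node label (e.g., its software version)
    to the block hash that node reports at the reference height.
    """
    return len(set(hash_at_height.values())) > 1


def confirmations_required(
    split: bool,
    policy: SplitPolicy,
    normal: int = 6,      # assumed everyday confirmation depth
    elevated: int = 100,  # assumed depth while a split is unresolved
) -> Optional[int]:
    """Return the confirmation depth to require, or None to signal
    that transaction processing should be suspended."""
    if not split:
        return normal
    if policy is SplitPolicy.SUSPEND:
        return None
    return elevated
```

Suspending corresponds to the simplest response mentioned above, while raising the confirmation depth trades availability against the risk of accepting a payment on a chain that is later orphaned.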

I. Improper Miner Incentives to Enforce New Rules
Even though miners signal an upgrade in their blocks, this does not guarantee that these miners will enforce the new rules. Simplified payment verification (SPV) mining has become popular because less validation gives an advantage in the block race. The incentive mechanism in Bitcoin rewards the first valid block, and the tradeoff between the risk of not being first and the risk of being invalid may favour being first, as was seen in BTC 0.10.0 [4] (Listing 25; other related quotes are in Section A.20, available online). The grace time between reaching the first threshold and activation, added through BIP9 [26], was likely included because of this incident to give miners some time to ensure proper validation before activation.
"If there is a cost to verifying transactions in a received block, then there is an incentive to *not verify transactions*. However, this is balanced by the a risk of mining atop an invalid block."
This error is caused by routine or exceptional violations where miners generate blocks without performing validation. Measures should be taken to incentivise validation. For instance, Ethereum's slashing mechanism [73] discourages reckless behaviour. Alternatively, Dash incentivises validation by requiring collateral for masternodes [74] and giving them extra rewards.
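The BIP9 grace time mentioned in this section can be illustrated with a simplified state machine in Python. The sketch assumes the mainnet parameters of a 2016-block period and a 1916-block threshold, and it deliberately omits the DEFINED and FAILED states as well as the start time and timeout. The function names and the example signal counts are hypothetical.

```python
from enum import Enum
from typing import List

PERIOD = 2016     # blocks per evaluation period (one retarget interval)
THRESHOLD = 1916  # signalling blocks needed within one period


class State(Enum):
    STARTED = "started"      # miners may signal readiness
    LOCKED_IN = "locked_in"  # threshold met; grace period is running
    ACTIVE = "active"        # new rules are enforced


def next_state(state: State, signalling_blocks: int) -> State:
    """Advance the deployment by one period.

    The transition is evaluated once per period, so a deployment spends
    a full period in LOCKED_IN before the new rules become ACTIVE.
    """
    if state is State.STARTED and signalling_blocks >= THRESHOLD:
        return State.LOCKED_IN
    if state is State.LOCKED_IN:
        return State.ACTIVE
    return state


def run(period_counts: List[int], state: State = State.STARTED) -> List[State]:
    """Return the state after each period, starting with the initial state."""
    history = [state]
    for count in period_counts:
        state = next_state(state, count)
        history.append(state)
    return history
```

With signal counts of 1500 and then 1950, the deployment stays in STARTED after the first period, locks in after the second, and becomes active one period later, regardless of further signalling. That final period is the window miners can use to make sure they actually validate under the new rules.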

J. Insufficient Incentives to Review the Code
Another incentive issue concerns reviewing code. Most actors in Bitcoin benefit from having bug-free code deployed in the network to secure the currency's value. However, testing and reviewing can be tricky, costly, and tedious. The average Bitcoin participant (e.g., end-users and miners) may not have the skills to perform that task. The stakes might be high for anyone pushing code that affects the network badly, as it may severely damage their reputation. At the same time, there needs to be more incentive to encourage spending substantial time and resources on secure development and code review. Without such incentives, developers may become careless, and errors arise from routine violations.
Many of the critical bugs contained in Bitcoin have been fixed since its conception, and new ones arise as developers make mistakes (BTC 0.8.0, BTC 0.14.0, and BCHN 0.17.0). However, these mistakes are not for developers alone to bear; they also fall on those who naively adopt flawed code. A project directly incentivising its development is Dash, where 10% of block rewards are allocated to development [75]. The takeaway for this lesson is that blockchain communities should allocate incentives to review code and minimise the chance of bugs being accepted into production.

A. Comparison With Related Work
The available literature on the evolution of Bitcoin mainly concerns socio-technical aspects of governance [76], [77] and their economics [78]. For instance, [76] addresses "how and when code development practices combine into a pattern of self-organizing." Deployment of consensus rules is briefly mentioned in that paper as a practice that can result in competing infrastructures. Our research builds upon their work by showing how to tame the evolution of the infrastructure.
Kiffer et al. [8] evaluate the event of the Decentralised Autonomous Organization (DAO) hack in Ethereum and replay attacks. The case of Ethereum's chain split is relevant for damage control. Our paper provides a broader picture by showing how chain splits appear and how to resolve them.
Abortable and adaptable consensus [79], [80], [81] are similar because they look at how and when the network should switch to an arbitrary consensus algorithm. The main goal for switching the consensus protocol is to gain performance when needed and increase fault tolerance when the network fails. However, work on abortable and adaptable consensus says little about how the switching mechanism should work to perform deployment in a decentralised environment. Our paper encompasses any consensus change, including changes to the consensus algorithm.
The similarities between open source software and blockchain evolution are seen in the decision-making on a code repository level [82], [83]. Anyone is free to propose a change, and it is up to the repository's maintainers whether they want to include that change. However, in blockchain, the decision is based not only on the maintainers' preference but also on the opinion of the community, miners, and possibly developers of other implementations of the same protocol. Even when a change is included in a code repository, it does not mean the network will adopt it.
The adoption of a new network protocol spans several years, does not require a specific threshold for adoption, and can tolerate different versions running in parallel [84]. In contrast, blockchain relies on abandoning old consensus rules by activating new ones. BTC and TLS are similar in that they value backward compatibility to allow nodes running old protocols to be part of the network during and after an upgrade.
The most significant difference when comparing blockchain deployment techniques to techniques in distributed systems, such as fast reboot, rolling upgrade, and big flip [10], is that blockchain is decentralised, and the network must reach a consensus before changes can be activated. The system reaches consensus through unique deployment features, such as standardness, signals, thresholds, and inclusiveness.

B. Implications
To our knowledge, this research is the first to show a holistic overview of deployment techniques for runtime evolution in blockchains. The deployment processes of consensus rule changes in the blockchain are vital as they can cause and remedy consensus failures. By generalizing the adoption logic of blockchain, this paper can contribute to the field of self-adaptive systems where processes for runtime evolution of new domain logic require further exploration [85].
Our findings apply to any decentralised blockchain. Regardless of how practitioners perform consensus rule evolution today, their method is a variant of those covered by our paper. Our work can provide guidance on how to strengthen the existing practices of consensus evolution to avoid failures. Further, we summarise deployment techniques and best practices for different scenarios, such as planned consensus deployment or emergency deployment.
Practitioners can utilize the contributions of this paper to perform consensus rule changes predictably and safely. Our results present a comprehensive overview of measures to avoid and handle consensus failure. The lessons learned in Bitcoin are valuable to prevent history from repeating itself. These lessons can strengthen the security of blockchains, hinder direct financial loss, and preserve blockchains' value as cryptocurrencies.
Interestingly, BTC is not necessarily a trendsetter in the space yet. Its conservative approach, favouring miner-activated reduction forks (MARFs) and predictable consensus support through signals and thresholds, is unique. Most other blockchains are willing to drastically change the consensus rules by expansion forks and make older nodes obsolete through user-activated expansion forks (UAEFs) and a flag day to set the activation time. UAEF deployment can be reasonably safe to avoid chain splits as long as the network and community act unanimously. However, as blockchains mature and communities become content, they may prefer more conservative and predictable approaches to consensus evolution, similar to BTC.

C. Threats to Validity
The sheer amount of data and the limited number of researchers dedicated to this project may raise questions about missing data or analysis. Regular cross-author discussions have evaluated the analysis and results to address this. All the samples used for the analysis are also available at [49]. We have gathered a holistic picture of Bitcoin evolution and other blockchains outside the initial domain by conducting rigorous data collection through snowball sampling and triangulation. The results are also strengthened by combining GT with root cause analysis.
This study seeks to avoid bias by looking at other viable projects with similar attributes, such as BCH, Ethereum, and Dash. Different actors have different concerns, and the results represent diverse perspectives provided by those who have worked on Bitcoin over the last decade. We included additional quotes in Appendix Section A (available online) to show different people from different times supporting our claims. The thoroughness applied in this study gives confidence that the results apply to different blockchain architectures.

VII. CONCLUSION AND FUTURE WORK
Safe deployment of consensus rules in blockchains is vital to hinder failures causing financial losses for miners and end-users. This paper presents an extensive study using the grounded theory approach, flexible coding, and root cause analysis to address these issues. This study specifies nine deployment techniques for blockchain with nine different features. Additionally, the study shows how contention may arise during consensus rule changes in Bitcoin, resulting in ten lessons learned. The findings bring novel insights to promote a safe evolution of blockchains.
Decision-making and governance of the consensus rules were intentionally left out of scope for this study to focus solely on the technical approaches and the implications of consensus change. However, the decision-making process in blockchain communities is exciting and differentiates itself from typical open-source projects. Therefore, these aspects should be explored further.
The greatest challenge in blockchain evolution is compatibility and transformation assurance [86]. Like adaptive systems, blockchain systems would benefit from identifying whether a change is a fork and what type of fork it will be. One direction for future work is to study transformation assurance to significantly reduce the risk of deployment failures. Another is to handle deployments in a multichain environment. Various studies indicate that these environments will rely on middleware to relay actions across chains. We believe that this middleware must be responsible for listening to and triggering changes based on the deployment techniques used in the attached chains.

Fig. 1. A Bitcoin block that includes the header and the body.

Fig. 2. Choosing the longest chain in the event of a collision.

Fig. 3. Research method implementation. The oval shapes indicate processes, the lines show the process flow, the rectangular shapes indicate data objects and the cylinders indicate archives. Different colours highlight whether the concepts relate to data sampling, grounded theory, or root cause analysis.

Fig. 4. Analytical codes and categories. The relation between the categories indicates a pattern in the cycle of consensus evolution.

Fig. 5. The timeline of consensus changes in Bitcoin Core and Bitcoin Cash. The date and order of these changes are based on either the flag day/block for the activation, the date where the changes were activated based on signal thresholds, or when these versions were released. The red boxes indicate some issues where the deployment was performed by emergency, caused a chain split, or other issues.

Fig. 7. The feature-code relations. The features were derived from low-level analytical codes and showed consensus evolution's essential technical building blocks.

Listing 4. Gmaxwell 2015-11-04 IRC: #bitcoin-dev
I belief we shold flesh out luke-jr's idea for cleanly deploying segregated witness in bitcoin as a soft fork and see what that looks like.

Listing 7. gavinandresen 2012-02-17 IRC: #bitcoin-dev
<gavinandresen> luke-jr: you're a mining pool operator, would you be willing to coordinate with the other big pools to get this fixed quickly[?]

Fig. 8. Inclusive expansion forks can cause frequent chain splits before gaining super-majority adoption.

Listing 17. Insti 2010-08-15 Forum, ID: 823
knightmb, do you still have any of your monster network available to turn on to help build the new valid chain?

Listing 22. Eleuthria 2013-03-12 IRC: #bitcoin-dev
I've lost way too much money in the last 24 hours

Listing 23. Erisian 2015-12-18 Email: bitcoin-dev
So I think the only way Mallory gets free beer from you with segwit soft-fork is if:
- you're running out of date software and you're ignoring warnings to upgrade (block versions have bumped)
- you've turned off standardness checks
- you're accepting low-confirmation transactions
- you're not using any double-spend detection service

Listing 24. Pieter Wuille 2013-03-12 Email: bitcoin-dev
If you're unsure, please stop processing transactions.

Listing 25. nathan 2015-07-11 Email: bitcoin-dev
It can start being in versions way ahead, so by the time it reaches that block number and goes into effect, the older versions that don't have it are already obsolete.

TABLE II
DEPLOYED RULE CHANGES IN BTC AND BCH. STAR (*) = FORKED REPOSITORY. TRIGGERS: BH = BLOCK HEIGHT, FD = FLAG DAY, AND RW = ROLLING WINDOW

TABLE III
DEPLOYMENT TECHNIQUES. THE NOTATIONS INDICATE WHETHER THE FEATURES ARE REQUIRED TO REDUCE THE CHAIN SPLIT RISK AND THE IMPACT OF A CHAIN SPLIT: BOLD: ESSENTIAL, ITALIC: USEFUL, -: INSIGNIFICANT. ABBREVIATIONS: MINER-ACTIVATED (MA), USER-ACTIVATED (UA), EMERGENCY-ACTIVATED (EA), REDUCTION FORK (RF), EXPANSION FORK (EF), BILATERAL FORK (BF), AND SUPER-MAJORITY (SM)

TABLE IV
LESSONS LEARNED, IMPACTED FEATURES, CORRESPONDING ERROR CATEGORIES AND IMPACTED BITCOIN VERSIONS