SoK: Network-Level Attacks on the Bitcoin P2P Network

Over the last decade, Bitcoin has revolutionized the global economic and technological landscape, inspiring a new generation of blockchain-based technologies. Its protocol is today among the most influential for cryptocurrencies and distributed networks. In particular, the P2P layer represents a reference point for all permissionless blockchains, which often implement its solutions in their network layer. Unfortunately, the Bitcoin network protocol lacks a strong security model, leaving it exposed to several threats. Attacks at this level can affect the reliability and trustworthiness of the consensus layer, undermining the credibility of the whole system. It is therefore of utmost importance to properly understand and address the security of the Bitcoin P2P protocol. In this paper, we give a comprehensive and detailed overview of known network-level attacks in Bitcoin, as well as the countermeasures that have been implemented in the protocol. We propose a generic network adversary model and an objective-based taxonomy of the attacks. Finally, we identify the core weaknesses of the protocol and study the relationship between different types of attack. We believe our contribution can help both new and experienced researchers gain a broader and deeper understanding of the Bitcoin P2P network and its threats, and allow for a better modeling of its security properties.

• We review in detail all known network-based attacks in the literature, including their procedures, impact, and implemented countermeasures;
• We identify weak spots of the Bitcoin P2P protocol as inferred from the reviewed attacks;
• We show and study the relationship between attacks;
• We summarize the current status of the Bitcoin P2P protocol security, and indicate future research directions.

c: METHODOLOGY
In order to perform an exhaustive search of network-level attacks and their solutions, we gathered information from the following sources:

• Scholar-indexed research papers: we queried Google Scholar with related keywords, and retrieved all papers describing a network-level attack; we then recursively followed citations from and to retrieved papers to have a comprehensive list of published works;
• Bitcoin Core Github repository: we studied the official Bitcoin source code and explored Issues, BIPs (Bitcoin Improvement Proposals), and Pull Requests to understand if and how network-level threats have been addressed in the protocol;
• We complemented the information retrieved from the above-listed sources with online blogs and forums, which were often used to disclose and discuss network-level threats in the early years of Bitcoin.
From each retrieved source, we also extracted information about the Bitcoin P2P protocol and verified it against the official source code.

First, in Section II, we discuss related work and motivate the need for this work. In Sections III and IV, we describe Bitcoin and its P2P protocol, focusing on the aspects involved in security attacks. Then, we introduce our Network Adversary model in Section V, and our attack taxonomy in Section VI. In Sections VII to IX, we systematically review network-level attacks. Then, in Section X, we discuss all attacks and study their relation with each other. Finally, in Section XI, we summarize the current status of the Bitcoin network security, and indicate future research directions. Section XII concludes the paper.

Numerous surveys in the literature covered the security of blockchain networks. These works often provide similar contents but differ in their perspective and level of detail. For instance, Saad et al. [9] explore the attack surface of public blockchains dividing attacks into three broad categories: blockchain, P2P network, and application context. For each attack, the authors give a generic description and report real-life case studies. Similarly, Wen et al. [10] review attacks on blockchain, based on a six-layer framework: data, network, consensus, incentive, smart contracts, and application. For each attack, the authors provide a description and a list of countermeasures. With a different approach,

VOLUME 10, 2022

called Bitcoin) without the use of a central authority. Users can participate freely by joining a P2P network with compatible client software [18].

To operate with Bitcoin, users make use of asymmetric keys, of which the public part is used, in the form of a derived Bitcoin address, to receive coins, and the private part is used to spend them.
In particular, coins are transferred by means of transactions, which specify the recipient by its address. In turn, the recipient can spend such coins by digitally signing a new transaction with its private key. To verify the validity of new transactions, clients maintain a ledger in the form of a blockchain, which stores all past transfers.

Transactions are added to the ledger by special users called miners, who compete against each other to create new blocks. To generate a valid block, miners have to find (by brute-force search) a nonce value that, hashed with the block header, yields an output below a certain threshold. This threshold, known as difficulty, is periodically adjusted to have an average production rate of one block every ten minutes. Since creating a block requires a non-trivial amount of computation, the nonce is regarded as a Proof of Work (PoW). Network nodes independently store a copy of the ledger,¹ and verify the validity of blocks as they are received.

Due to mining competition, it is possible that two different blocks are created approximately at the same time. This generates a so-called block race, in which the two blocks compete in being accepted as the new chain tip. When this occurs, two separate blockchain branches, or forks, will co-exist for some time. As nodes can only accept one of the two blocks, the network will split into two parts, each using a different fork. To resolve this situation, a consensus mechanism is used, which allows the nodes to agree on a single fork. In particular, Bitcoin nodes always choose the fork with the most cumulative PoW (typically the longest one). This mechanism, known as PoW consensus, allows the network to automatically solve conflict situations, and have all nodes use a single branch (known as the main chain).
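The nonce search and the fork-choice rule described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not Bitcoin's actual block format: the header is an arbitrary byte string, the nonce is simply appended to it, and the names `block_hash`, `mine`, `best_fork`, and `TOY_TARGET` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def block_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header || nonce, read as a little-endian integer."""
    data = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "little")


def mine(header: bytes, target: int, max_nonce: int = 1 << 20) -> Optional[int]:
    """Brute-force search for a nonce whose hash falls below the target."""
    for nonce in range(max_nonce):
        if block_hash(header, nonce) < target:
            return nonce
    return None  # no valid nonce in the searched range


def best_fork(forks: Dict[str, List[int]]) -> str:
    """Return the fork with the most cumulative work (sum of per-block work)."""
    return max(forks, key=lambda name: sum(forks[name]))


# Toy target (assumption): on average one hash in 2**12 qualifies,
# which is far easier than any real Bitcoin difficulty.
TOY_TARGET = 1 << (256 - 12)
nonce = mine(b"toy-header", TOY_TARGET)

# Fork "b" is shorter but carries more cumulative work, so it is preferred.
winner = best_fork({"a": [1, 1, 1], "b": [2, 2]})
```

The second example shows why "most cumulative PoW" and "longest" only typically coincide: a shorter branch made of harder blocks can still win.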
As we will see in § VIII-E, the occurrence of a block race can be exploited by an attacker to perform double spending (§ IX-B) and selfish mining (§ IX-D1).

When a transaction is included in a block of the main chain, it is said to be confirmed. Similarly, each new block subsequently appended on top of it acts as an extra confirmation (because it implicitly accepts the previous block as valid). Specifically, a transaction is said to have n confirmations if there are n − 1 blocks appended on top of the confirming block. In contrast, a transaction that has not yet been included in the blockchain is said to be unconfirmed.

¹ For the sake of simplicity, we only consider full nodes in this work, excluding clients that do not download or store the entire blockchain.

II. RELATED WORK
Unlike reachable nodes, unreachable ones only establish outgoing connections, thus having much less varied connectivity compared to their peers. This characteristic makes them more vulnerable to deanonymization attacks (§ IX-C). On the other hand, as discussed in § X, their inability to accept connections from other peers protects these nodes from a large number of other attacks.

l: TOPOLOGY
Bitcoin nodes choose their peers in a pseudo-random way, with the goal of creating a random-looking topology graph [26]. However, for security reasons, information on the actual Bitcoin network topology is hidden in the protocol. At the same time, several techniques have been found to infer connections among reachable nodes (we describe such techniques in § VIII-B). Thanks to these techniques, an approximation of the actual network graph has been calculated, which proved to be relatively centralized. In particular, studies revealed the presence of highly connected nodes (probably used as gateways by mining pools) [26] and highly interconnected communities [27].

As we will see in § VIII-B, knowledge of the topology graph can facilitate attacks like Double Spending (§ IX-B) and Deanonymization (§ IX-C), and enable Partitioning attacks (§ VIII-D).

The Bitcoin P2P protocol defines how nodes choose their peers and spread information through the network. In this section, we review the most relevant details of the protocol, with particular emphasis on those related to network-level attacks.

While a few resources are available to learn how the Bitcoin protocol works at a high level [28], [29], no official documentation has ever been released. Therefore, the protocol is only specified by its reference implementation, that is, the Bitcoin Core client sources [30].
As a consequence, obtaining up-to-date information on how Bitcoin nodes behave can be challenging.

In this section, we provide a detailed description of the Bitcoin P2P protocol, focusing on the aspects relevant to network-level attacks. Our description is based on the Bitcoin Core source code and related documents on the official Github repository, and complemented with information retrieved from the literature. We refer to the latest version of the client, v0.22. Moreover, in the rest of the paper, we will refer to previous versions, with the notation "v0.x", to highlight important changes to the protocol.

To understand the details of network-level attacks, it is often important to be familiar with the specific protocol messages. Here, we give a list of all the relevant ones.

The VERSION message includes information about the node (client name, current protocol version, services provided, etc.) as well as the current tip of the blockchain (i.e., the last known block), which is used to synchronize the ledger. Additionally, a timestamp is included, which is used to synchronize a network clock.

The connection is said to be outbound for the node that initiates the handshake, and inbound for the other node. Similarly, a peer is outbound or inbound depending on its connection type.

To keep the connection alive, nodes send a PING message every 30 minutes, to which the peer replies with a PONG message. If no message is received during 90 minutes, the connection is closed.

Bitcoin nodes usually maintain 8 outbound peers and, if reachable, accept up to 117 inbound peers. In addition, since v0.19, two extra block-relay-only outbound connections are also maintained by each node [31]. This feature mitigates Topology Inference attacks (§ VIII-B), thus protecting from network partitioning (§ VIII-D). Since v0.21, these nodes are also used as anchors: when restarting, the node will try to re-connect to the same block-relay-only nodes it was connected to before [32]. This feature mitigates the risk of Eclipse attacks (§ VIII-C).

Since v0.12, if all inbound slots are occupied, and a new connection request is received, an eviction mechanism is used to disconnect one of the current inbound peers [33]. This mechanism mitigates the risk of an attacker controlling all inbound connections at the same time (see § VIII-C). In particular, the evicted peer is chosen so as to minimize the attacker's chances of avoiding disconnection.

When a Bitcoin client runs for the first time, a bootstrapping procedure is executed to discover other nodes. This procedure queries a list of DNS servers, called seeds, which maintain an up-to-date list of known IP addresses where a reachable Bitcoin node is running. When queried, DNS seeds return a random subset from such list. The client will then choose 8 random addresses as outbound peers. If the DNS method fails, a hard-coded list of known stable nodes is used.

Once the client has joined the network, it learns new addresses from other peers. Specifically, when a new connection is opened, the client sends a GETADDR message, to which the peer replies with an ADDR message with a list of known addresses. All the addresses learned by the client are stored in a local database, which is periodically saved to file to be used in subsequent executions. This means DNS seeds are usually only needed in the first execution of a client.

In fact, as discussed in § VII-D, the use of DNS seeds is undesirable, as it can lead to node eclipsing (§ VIII-C).

ADDR message, which is stored by the requesting node into the address database. When replying to a GETADDR message, up to 23% of the addresses stored in the database are sent. However, since 2020 (v0.21), to prevent Topology Inference attacks (§ VIII-B), nodes only send up to 1000 addresses per day, in response to GETADDR messages [34]. Additionally, only one GETADDR message will be answered for each connection.
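The limits on GETADDR replies described above can be sketched as follows. This is a simplified model, not Bitcoin Core's code: the 23% and 1000 figures come from the text, while the class and function names are hypothetical and the per-day accounting of the 1000-address budget is deliberately omitted (the cap is applied to a single reply).

```python
import random
from typing import List

MAX_PCT_ADDR_TO_SEND = 23   # share of the address database, from the text
MAX_ADDR_TO_SEND = 1000     # absolute cap, from the text


def getaddr_reply(address_db: List[str], rng: random.Random) -> List[str]:
    """Return a random subset of the database, bounded by both limits."""
    limit = min(MAX_ADDR_TO_SEND, len(address_db) * MAX_PCT_ADDR_TO_SEND // 100)
    return rng.sample(address_db, limit)


class Connection:
    """Tracks whether this connection has already had a GETADDR answered."""

    def __init__(self) -> None:
        self.getaddr_answered = False

    def on_getaddr(self, address_db: List[str], rng: random.Random) -> List[str]:
        if self.getaddr_answered:
            return []  # only one GETADDR is answered per connection
        self.getaddr_answered = True
        return getaddr_reply(address_db, rng)


rng = random.Random(0)
small_db = [f"198.51.100.{i}:8333" for i in range(100)]
large_db = [f"node-{i}" for i in range(10_000)]

conn = Connection()
first = conn.on_getaddr(small_db, rng)    # 23 addresses (23% of 100)
second = conn.on_getaddr(small_db, rng)   # empty: already answered
capped = Connection().on_getaddr(large_db, rng)  # capped at 1000
```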

GETADDR messages are typically only used at the beginning of a connection. Afterwards, newly discovered nodes can be advertised through unsolicited ADDR messages. When the client receives an ADDR message with up to 10 addresses, it forwards it to a limited number of peers. Specifically, reachable addresses² are advertised to two peers, while unreachable ones to a single one. For each peer, a node remembers which addresses have been forwarded, so as to avoid repeating their advertisement. This information is kept for each connection (i.e., per session, not per IP) and cleared every 24 hours. Recipient peers are also fixed during 24 hours, so that each address is typically propagated once per day. This prevents a peer from increasing the propagation of its own address by repeatedly advertising it.
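The forwarding rules above (fan-out of two or one, recipients fixed for 24 hours, no repeated advertisement to the same peer) can be sketched as follows. The way recipients are chosen here, by ranking peers with a salted hash that changes once per day, is an assumption made for illustration; the names `relay_targets` and `AddrRelay` are hypothetical, and the daily reset of the per-peer memory is omitted.

```python
import hashlib
from typing import Callable, Dict, List, Set

DAY = 24 * 60 * 60  # seconds


def relay_targets(addr: str, peers: List[str], reachable: bool,
                  now: int, secret: bytes) -> List[str]:
    """Pick the relay peers for `addr`; the choice is stable within a day."""
    day = (now // DAY).to_bytes(8, "little")
    fanout = 2 if reachable else 1

    def rank(peer: str) -> bytes:
        return hashlib.sha256(secret + addr.encode() + peer.encode() + day).digest()

    return sorted(peers, key=rank)[:fanout]


class AddrRelay:
    def __init__(self, peers: List[str], secret: bytes = b"node-local-salt") -> None:
        self.peers = list(peers)
        self.secret = secret
        # Per-connection memory of addresses already forwarded to each peer.
        self.known: Dict[str, Set[str]] = {p: set() for p in self.peers}

    def on_addr(self, addrs: List[str], now: int,
                reachable: Callable[[str], bool] = lambda a: True) -> Dict[str, List[str]]:
        """Return the addresses to forward, grouped by recipient peer."""
        if len(addrs) > 10:
            return {}  # only small ADDR messages are forwarded
        out: Dict[str, List[str]] = {}
        for addr in addrs:
            for peer in relay_targets(addr, self.peers, reachable(addr), now, self.secret):
                if addr not in self.known[peer]:
                    self.known[peer].add(addr)
                    out.setdefault(peer, []).append(addr)
        return out


relay = AddrRelay(["p1", "p2", "p3", "p4"])
first = relay.on_addr(["203.0.113.7:8333"], now=1_000)
repeat = relay.on_addr(["203.0.113.7:8333"], now=2_000)  # same day: nothing new
```

Because the recipients are fixed within the day and already know the address, re-advertising it does not widen its propagation, which is the property the text describes.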

Note that, for privacy concerns, the transmission of ADDR messages is implemented using the Diffusion protocol, described in § IV-B1.a.

Outbound peers are generally considered safer than inbound ones. This is due to the fact that an attacker can easily open multiple inbound connections towards the same node (when reachable). In contrast, she has limited control over outbound connections besides deploying multiple nodes. As shown in § IV-B, this fact is taken into account in several choices throughout the protocol, and is at the base of various countermeasures against network-level attacks (see § VIII).

Some protocol choices are based on the assumption that a single adversary, even when controlling numerous IP addresses, is not likely to control nodes in many different geographical areas. For example, this is one of the reasons why buckets in the address database are based on the prefix or AS. Another limitation is on the number of outbound connections that can be established towards the same network area, which is limited to one per group (e.g., /16 for IPv4).

² In this context, reachability is simply determined by the network family of the address (i.e., IPv4, IPv6, OnionCat, I2P). Specifically, a node considers an address reachable if they both belong to the same family.

timeout expires, the transaction is requested from another peer that announced it, if any.

To mitigate the risk of Delay attacks (described in § VIII-E), the request queue is currently randomized, that is, it does not follow the order in which peers advertised the transaction. With the same purpose, outbound nodes are currently preferred over inbound ones when requesting a transaction.

a: TRANSACTION RELAY
Currently, Bitcoin transactions are spread using Diffusion. In this protocol, when the client receives a new transaction, it does not relay it immediately. Instead, it delays its transmission to mitigate the risk of Deanonymization attacks (§ IX-C). In particular, nodes maintain a queue for each peer, containing the transactions to be forwarded. This queue is periodically flushed by sending an INV message to the peer, announcing all pending transactions. Each queue is flushed after an independent random delay drawn from an exponential distribution, so that flush events follow a Poisson process [36]. This delay is halved for outbound peers, since these are less risky for deanonymization.
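The per-peer queues of Diffusion can be sketched as follows. This is an illustrative model only: the 5-second mean interval is an assumed value, and `next_flush_time` and `PeerQueue` are hypothetical names, not Bitcoin Core identifiers.

```python
import random
from typing import List, Optional


def next_flush_time(now: float, mean_interval: float, outbound: bool,
                    rng: random.Random) -> float:
    """Schedule the next flush after an exponentially distributed delay."""
    mean = mean_interval / 2 if outbound else mean_interval  # halved for outbound
    return now + rng.expovariate(1.0 / mean)


class PeerQueue:
    """Pending transaction announcements for a single peer."""

    def __init__(self, outbound: bool, rng: random.Random,
                 mean_interval: float = 5.0) -> None:
        self.outbound = outbound
        self.rng = rng
        self.mean_interval = mean_interval
        self.pending: List[str] = []
        self.next_flush = next_flush_time(0.0, mean_interval, outbound, rng)

    def enqueue(self, txid: str) -> None:
        if txid not in self.pending:
            self.pending.append(txid)

    def poll(self, now: float) -> Optional[List[str]]:
        """Return the TXIDs to announce in one INV, or None if not due yet."""
        if now < self.next_flush:
            return None
        self.next_flush = next_flush_time(now, self.mean_interval,
                                          self.outbound, self.rng)
        if not self.pending:
            return None
        inv, self.pending = self.pending, []
        return inv


queue = PeerQueue(outbound=False, rng=random.Random(1))
queue.enqueue("tx1")
queue.enqueue("tx2")
due = queue.next_flush
too_early = queue.poll(due - 1e-9)   # None: timer has not fired
inv = queue.poll(due)                # both transactions in a single INV
```

Since every peer has its own independently drawn timer, an observer connected to the node cannot rely on the order of announcements to identify where a transaction originated.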

Diffusion was introduced in 2015 (v0.12) to replace the Trickle protocol [37]. This protocol implemented the so-called trickling, that is, the addition of random delays to the relay of a message. In particular, in Trickle, at each loop iteration, a random node was selected, and its queue flushed. The protocol switch was done to mitigate Deanonymization attacks (§ IX-C).

As a protection against DoS attacks, all transactions are also required to pay a minimum relay fee (MRF) to be propagated.

If a transaction is received before its parent transactions, it is considered an orphan. Since such transactions cannot be validated, they are not immediately relayed to peers. Instead, they are stored in an orphan pool so they can be validated if the parent is received. Similar to the UTXO set, the size of this buffer is also limited (currently to 100 entries).

A compact block can be sent in reply to a GETDATA message using a CMPCTBLOCK message, which includes the block header, the list of TXIDs in the block, and a small set of full-data transactions that the sender believes have not been seen by the receiver. When a compact block is received, nodes check the list of TXIDs to verify whether they are known. Missing transactions are then requested (by their index in the list) using a GETBLOCKTXN message. The peer will then send the requested transactions with a BLOCKTXN message.

Compact blocks also introduced the concept of High Bandwidth (HB) neighbors, which can be instructed to relay blocks unsolicited, that is, without previously announcing them. HB nodes send a compact block unsolicited if they believe the receiver knows the previous block but not the new one. This strategy allows nodes both to receive blocks faster and to mitigate DoS/Delay attacks (see § VIII-E).
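The receiver side of the compact block exchange described above can be sketched as follows. This is a simplified model under stated assumptions: TXIDs and transactions are plain strings, the three dictionaries stand for the local mempool, the full transactions shipped with the compact block, and the later BLOCKTXN reply, and the function names are hypothetical.

```python
from typing import Dict, List


def missing_indexes(block_txids: List[str], mempool: Dict[str, str],
                    prefilled: Dict[str, str]) -> List[int]:
    """Indexes (positions in the block) to request with GETBLOCKTXN."""
    known = mempool.keys() | prefilled.keys()
    return [i for i, txid in enumerate(block_txids) if txid not in known]


def reconstruct(block_txids: List[str], mempool: Dict[str, str],
                prefilled: Dict[str, str], blocktxn: Dict[str, str]) -> List[str]:
    """Rebuild the ordered transaction list of the block."""
    txs = []
    for txid in block_txids:
        for source in (prefilled, mempool, blocktxn):
            if txid in source:
                txs.append(source[txid])
                break
        else:
            raise KeyError(f"transaction {txid} is still missing")
    return txs


block_txids = ["a", "b", "c", "d"]
mempool = {"a": "TX_A", "c": "TX_C"}   # already known locally
prefilled = {"b": "TX_B"}              # sent in full with the compact block

to_request = missing_indexes(block_txids, mempool, prefilled)
block = reconstruct(block_txids, mempool, prefilled, blocktxn={"d": "TX_D"})
```

Only the transaction at index 3 has to travel over the network, which is where the bandwidth and latency savings come from.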

Each node maintains an HB Neighbor (HBN) list, composed of the last three nodes that relayed a valid block. To that purpose, when a new valid block is received, the sender is added to the HBN list. If there already are three nodes in the list, the one that sent a block least recently is evicted. To instruct a peer in the HBN list to send blocks unsolicited, nodes send a SENDCMPCT(true) message. If such a node is later evicted, a SENDCMPCT(false) message is sent. The new block propagation scheme is depicted in Figure 3, along with the old relay protocol.

Like transactions, new blocks are validated by each node when they are received (via BLOCK messages). This process involves verifying all the transactions included in the block. In particular, when a block is received, all included transactions are expected to be in the mempool (missing transactions are requested from the node that advertised the block). After the block is validated, the mempool is emptied. In other words, the mempool acts as a cache of unconfirmed transactions, which remain in memory until they are included in a block. This structure can represent a bottleneck for Bitcoin nodes, since if the arrival rate of transactions exceeds that of mining, the mempool size increases, slowing down the verification process (and the overall performance of the network) [

To allow an easier and quicker specification of the threat model for each attack, we will also introduce a uniform notation, which will be described in the rest of this section.

When considering the entities affected by an attack, we distinguish between network-level devices, referred to as targets, and real-life users, referred to as victims. In particular, from a network perspective, the target T of an attack can be:

[46]. In particular, we define A by specifying the following characteristics:
• Goals;
• Assumptions: environment, knowledge, and resources;
• Capabilities: interaction, and connectivity.

Additionally, we introduce the following notation to specify the adversary model for each attack:

denotes an AS-level adversary, with observing capabilities, who knows the IP of the target, controls a botnet, has mining equipment, and connects to the target's node. To complement the security framework, we also define the possible targets of a network-level attack.

• Double Spending (DS): A aims at spending the same bitcoins twice, typically once to a merchant to obtain a good or service, and the other one to herself to revert the payment.
• Deanonymization (D): A aims at distinguishing the user who created a specific transaction, or at recognizing transactions from a specific user.
• Unfair Revenue (UR): A, participating as a miner, aims at gaining more revenues than its due share according to the computation done.

interacts with other nodes according to the protocol; this is the most common type of adversary, which only exploits weaknesses in the Bitcoin protocol to achieve her objective; since this is the most common case, we omit this notation.

• Resources:
  - Server: A has unlimited bandwidth, memory, and storage; we implicitly assume this resource and hence omit its notation.

In the following sections, we will review all the network-level

Note that a missing paragraph in the description indicates the lack of relevant information for the attack. For instance, if no countermeasure has been implemented, the Mitigation paragraph will be omitted.

In addition to the notation introduced for the Adversary Model, we will use the following object identifiers:

We combine this notation with A, T, and V to specify whether the object belongs to, or is intended for, the adversary, the target, or the victim, respectively. For instance, in double-spending attacks, A generates a transaction tx_V for V, and a conflicting one, tx_A, for herself. Similarly, A will use her own node N_A to connect to the target node N_T.

In Bitcoin, MitM attacks allow A to perform DoS (by dropping or delaying messages) and deanonymization (e.g., in the PERIMETER attack, § IX-C4).

In a Spoofing attack [52], A sends messages on behalf of another entity. To do so, she sets the IP address in her packets to the address of another device (real or fake).

In Bitcoin, spoofing can be used to perform node-level DoS attacks (e.g., in Tapsell's attack, § IX-A1). Moreover, it can help an AS-level adversary to perform Eclipse attacks (see § VIII-C2). Finally, spoofing can be used for Tor DoS attacks (§ VII-F1).

As explained in § IV-A1.a, to join the network, nodes make use of seed DNS servers to discover other peers.

By compromising a seed, A could tamper with its DNS records, and replace legitimate addresses with others under her control. To that purpose, A can use known weaknesses of the DNS protocol, such as cache poisoning [54]. As a result, joining nodes would likely connect to A's nodes. MitM attacks on the connection with the DNS server could also achieve the same result.

Due to its lack of geographical diversification, Bitcoin is particularly vulnerable to AS-level attacks. In fact, most nodes concentrate in just a few ASes [58], [59], making it easier for A to intercept a large share of connections. For instance, three of the major ASes together would be able to intercept more than 60% of all possible Bitcoin connections [58]. Thanks to the lack of encryption, and the use of a standard port (8333), A_AS can quickly identify all Bitcoin connections.

Tor is a popular anonymity network based on onion routing that allows clients to connect to a server without revealing their IP address. Each Tor connection goes through three relay nodes: Guard, to which the client connects, Middle, which acts as an intermediate relay, and Exit, which actually connects to the server. Therefore, a Tor connection is seen as coming from the Exit node. At each hop, the connection is encrypted with a pair of asymmetric keys, negotiated, during the setup of the connection, between the client and the corresponding Tor relay.

Bitcoin supports Tor to allow users to improve their anonymity. However, some studies revealed that combining the two protocols can create new attack vectors, such as DoS and even MitM, which we review in the following.

2) For each Exit node N_E, send a malformed message causing a penalty score of 100. As a result, T disconnects N_E and bans it for 24 hours.

When all Tor Exit relays have been banned, T will be unreachable via Tor.

By repeating the procedure for all reachable nodes, A can completely isolate Tor from the Bitcoin network. When this occurs, all Bitcoin nodes are forced to connect directly (i.e., using their own address).
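The effect of the penalty mechanism on the target's side can be modelled with a small sketch. The threshold of 100 and the 24-hour ban come from the text; the `BanManager` class, the relay names, and the simulation itself are hypothetical and only meant to show why banning by source address also bans every honest user behind the same Exit relay.

```python
from typing import Dict, Iterable

BAN_THRESHOLD = 100           # penalty score that triggers a ban (from the text)
BAN_DURATION = 24 * 60 * 60   # ban length in seconds (from the text)


class BanManager:
    """Per-address misbehavior scores, as seen by the target node T."""

    def __init__(self) -> None:
        self.score: Dict[str, int] = {}
        self.banned_until: Dict[str, int] = {}

    def misbehaving(self, addr: str, penalty: int, now: int) -> None:
        self.score[addr] = self.score.get(addr, 0) + penalty
        if self.score[addr] >= BAN_THRESHOLD:
            self.banned_until[addr] = now + BAN_DURATION
            self.score[addr] = 0

    def is_banned(self, addr: str, now: int) -> bool:
        return self.banned_until.get(addr, 0) > now


def reachable_via_tor(bans: BanManager, exit_relays: Iterable[str], now: int) -> bool:
    """T is reachable via Tor as long as at least one Exit relay is not banned."""
    return any(not bans.is_banned(relay, now) for relay in exit_relays)


exits = ["exit-1", "exit-2", "exit-3"]   # hypothetical Exit relays
bans = BanManager()
before = reachable_via_tor(bans, exits, now=0)

# One malformed message per Exit relay is enough to have it banned.
for relay in exits:
    bans.misbehaving(relay, penalty=100, now=0)

during = reachable_via_tor(bans, exits, now=1)
after = reachable_via_tor(bans, exits, now=BAN_DURATION + 1)
```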

In this section, we describe Auxiliary attacks, which are not harmful per se, but enable, ease, or improve other attacks. In other words, these attacks can be used by A as a preliminary or intermediate step to achieve her primary goals.

This attack can have two objectives: (1) to establish multiple (and diversified) connections to the same target, or (2) to increase the chances of a target connecting to A (the larger the fraction of Net_R controlled by A, the higher the probability the target connects to her).

The attack is based on the fact that Bitcoin nodes advertise their own address upon establishing a new connection.

When T connects to a node N_E, it sends an ADDR message containing its own address a_T; in turn, N_E forwards a_T to N_A with a certain probability (which depends on how many connections N_A maintains with N_E).

To infer a subset of the entry nodes of T, A simply tracks which nodes advertise T's address a_T.

The attack allows determining whether T is connected to a peer N_P, and it consists of two phases.

In the first phase, A performs the following steps:

2) Send all transactions to their peers at the same time.
3) Monitor incoming INV messages: if tx_X is received, mark N_X as peer of T.

To infer multiple links, the procedure is repeated several times.

To reduce the number of false positives, A can repeat the procedure. For instance, a connection can be considered valid after the number of transactions confirming it reaches a certain threshold.

In particular, when connected to all reachable nodes, the recall reaches 95% after 100 runs. Implementing all improvements results in a recall of 96% after 25 runs, with a precision of 94% when N_M is connected to all peers.
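The thresholding step mentioned above, together with the precision and recall metrics used to report the results, can be sketched as follows. The data are made up for illustration, and `infer_links` and `precision_recall` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def infer_links(runs: Iterable[Set[str]], threshold: int) -> Set[str]:
    """Accept a link once it has been confirmed in at least `threshold` runs.

    Each run is the set of nodes N_X whose marker transaction tx_X came back.
    """
    counts: Counter = Counter()
    for run in runs:
        counts.update(set(run))
    return {node for node, hits in counts.items() if hits >= threshold}


def precision_recall(inferred: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision: share of inferred links that are real.
    Recall: share of real links that were inferred."""
    true_positives = len(inferred & actual)
    precision = true_positives / len(inferred) if inferred else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Made-up observations over three runs of the procedure.
runs = [{"n1", "n2"}, {"n1"}, {"n1", "n3"}]
links = infer_links(runs, threshold=2)
metrics = precision_recall(links, actual={"n1", "n2"})
```

Raising the threshold trades recall for precision: spurious one-off observations such as n3 are filtered out, at the cost of missing real links that were seen only once, such as n2.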

Experimental results showed, after 50 runs, 60% recall and 97% precision.

In particular, it leverages the following fact: when a node receives an INV message advertising a transaction previously stored as orphan, it omits such transaction from subsequent GETDATA messages.

The attack allows inferring links between a source set, containing i nodes, and a sink set.

For the attack, A performs the following steps:

To eclipse T, A aims at filling its address database with her addresses, so that, when the node restarts, it only connects to A's nodes.

To do so, A follows these steps:

(see § IV-A1), which make eclipsing much harder.

Additionally, it is worth noting that miners, merchants,

The objective of the procedure is to force T to connect to other (honest) nodes so that all its connections traverse AS_A.

In this condition, A can perform MitM attacks against T.

A uses shadow IP addresses, that is, addresses whose route from T traverses AS_A. Therefore, any attempt of T to connect to a shadow IP is intercepted by A.

The attack is divided in two phases. In phase 1, A determines the shadow IPs for T. To do so, she follows these steps:
1) Enumerate all the ASes whose route from T would traverse AS_A.
2) Enumerate all the available IP addresses in the selected ASes and tag them as shadow IPs.
3) Test whether the packets from T to the shadow IPs actually traverse AS_A.
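Steps 1 and 2 of phase 1 amount to a filter over AS-level routes followed by a prefix expansion, which can be sketched offline as below. The input format is an assumption: `paths_from_target` maps each destination AS to the AS path starting at T's own AS, and `prefixes` maps each AS to the prefixes it announces. The AS numbers and prefixes are documentation-range placeholders, and step 3 (checking with real traffic) is not modelled.

```python
import ipaddress
from typing import Dict, List, Set


def shadow_ases(paths_from_target: Dict[int, List[int]], adversary_as: int) -> Set[int]:
    """Destination ASes whose route from T crosses the adversary's AS in transit."""
    return {
        destination
        for destination, path in paths_from_target.items()
        if adversary_as in path[1:-1]   # intermediate hops only
    }


def shadow_ips(shadow: Set[int], prefixes: Dict[int, List[str]]) -> List[str]:
    """Expand the prefixes announced by the shadow ASes into host addresses."""
    ips: List[str] = []
    for asn in sorted(shadow):
        for prefix in prefixes.get(asn, []):
            ips.extend(str(ip) for ip in ipaddress.ip_network(prefix).hosts())
    return ips


# Placeholder topology: T sits in AS 64496, the adversary controls AS 64500.
paths = {
    64497: [64496, 64500, 64497],   # crosses the adversary
    64498: [64496, 64501, 64498],   # does not
}
announced = {64497: ["198.51.100.0/30"], 64498: ["203.0.113.0/30"]}

candidates = shadow_ases(paths, adversary_as=64500)
addresses = shadow_ips(candidates, announced)
```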

In phase 2, A creates the connections between T and the shadow peers, until T is only connected to shadow IPs. To do so, she follows these steps:
1) During several weeks, connect to T using (spoofed) shadow IPs: this slowly fills up T's address database.
2) Flood T with a large number of ADDR messages, containing the shadow IPs, until the new table is filled.

Upon rebooting, T connects all outgoing connections to shadow IPs with high probability, thus eclipsing the node.

To reduce the duration of the attack, A keeps track of T's outgoing connections going through AS_A, and triggers a reboot whenever she believes it would be beneficial.

According to their experiments, A can easily fill up the new table in 30 days, and control a large portion of the tried table after 40 days. In particular, the success probability mostly depends on the duration of the attack and the age of the victim. For example, young nodes (up to 30 days old) proved to be more vulnerable to the attack, with a success rate of around 30% after 50 days of attack. Nevertheless, a 50-day attack against older nodes (40 to 50 days old) still has around a 20% probability of success. Notably, AS ranking does not seem to influence the results.

The EREBUS attack also has a very low bandwidth cost for A (around 520 bit/s), which makes it highly scalable to several nodes.

In a Partitioning attack, A tries to isolate a portion of nodes from the rest of the network. In other words, the attack aims at splitting the network in two parts that cannot communicate with each other (hence the name partitioning). In this respect, the attack can be seen as an Eclipse attack against multiple nodes.

To partition the network, A follows these steps:

The attack can continue until either a block is created by an unaffected miner, or T resets its clock.

[26], can be used to prevent a node from receiving a specific transaction for an arbitrary amount of time, and, by extension, to delay the propagation of such a transaction through the network. The attack exploits the timeout set by nodes after requesting a transaction with GETDATA (see § IV-B1). At the time of publication, nodes waited 2 minutes after sending a GETDATA(tx) message, before requesting tx from another peer. Moreover, the queue of peers from which to request each tx was ordered by the time each INV(tx) message was received, and allowed the same peer to advertise the same tx multiple times. This enabled A to delay the delivery of transactions for an indefinite amount of time.

This procedure prevents T from receiving tx for 2 minutes. To delay the reception of tx for τ minutes, A sends τ/t INV messages advertising tx (possibly from different nodes), where t is the request timeout (2 minutes here), without replying to the corresponding GETDATA requests. As a consequence, T will wait τ minutes without receiving tx.
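The arithmetic behind the τ/t figure can be checked with a small model of the pre-mitigation request queue described above, in which announcers are tried in the order their INV arrived and each unanswered GETDATA costs one full timeout. The function names are hypothetical, and the 2-minute default is the timeout reported in the text.

```python
import math
from typing import List, Optional


def invs_needed(delay_minutes: float, timeout_minutes: float = 2.0) -> int:
    """Number of unanswered announcements needed to stall for `delay_minutes`."""
    return math.ceil(delay_minutes / timeout_minutes)


def reception_delay(announcers_answer: List[bool],
                    timeout_minutes: float = 2.0) -> Optional[float]:
    """Minutes the target waits before a peer finally delivers the transaction.

    `announcers_answer` lists the announcers in request order; True means
    the peer replies to GETDATA, False means it stays silent until timeout.
    Returns None if no announcer ever delivers.
    """
    waited = 0.0
    for answers in announcers_answer:
        if answers:
            return waited
        waited += timeout_minutes
    return None


needed = invs_needed(10)                              # stall for 10 minutes
delay = reception_delay([False] * needed + [True])    # honest peer queued last
```

With a randomized request queue and a preference for outbound peers, the adversary can no longer guarantee that her silent announcements are all tried before an honest one.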
To delay the delivery of a block to T, A follows these steps:

2) If traffic from T is intercepted, corrupt block-requesting GETDATA messages, so that the requested block is not sent; to have the message accepted by the recipient and keep the connection alive, preserve the message length and structure, and update the TCP and Bitcoin checksums.

3) If traffic towards T is intercepted, corrupt BLOCK messages, so that the block is considered invalid; after discarding the block, T will not request it again until the timeout expires.
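The requirement to update the Bitcoin checksum follows from how P2P messages are framed: the header carries the payload length and the first four bytes of a double SHA-256 of the payload. The sketch below illustrates that this checksum is unkeyed, so it detects accidental corruption but not deliberate modification by an on-path party. The helper names are hypothetical, and the all-zero block hash is a placeholder.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic of Bitcoin mainnet
MSG_BLOCK = 2  # inventory type for blocks


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of SHA-256(SHA-256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_message(command: str, payload: bytes) -> bytes:
    """Header (magic, 12-byte command, length, checksum) followed by payload."""
    header = (
        MAINNET_MAGIC
        + command.encode("ascii").ljust(12, b"\x00")
        + struct.pack("<I", len(payload))
        + checksum(payload)
    )
    return header + payload


def is_well_formed(message: bytes) -> bool:
    """Check the declared length and checksum against the actual payload."""
    (length,) = struct.unpack("<I", message[16:20])
    payload = message[24:]
    return len(payload) == length and message[20:24] == checksum(payload)


def getdata_for_block(block_hash: bytes) -> bytes:
    """GETDATA carrying a single block inventory entry."""
    payload = b"\x01" + struct.pack("<I", MSG_BLOCK) + block_hash
    return build_message("getdata", payload)


original = getdata_for_block(bytes(32))           # placeholder all-zero hash
tampered = bytearray(original)
tampered[-1] ^= 0xFF                              # alter the requested hash in place
stale_ok = is_well_formed(bytes(tampered))        # checksum no longer matches
tampered[20:24] = checksum(bytes(tampered[24:]))  # recompute the Bitcoin checksum
fixed_ok = is_well_formed(bytes(tampered))        # same length, valid checksum again
```

Because the checksum depends only on the payload, anyone able to rewrite the payload can also rewrite the checksum; integrity against an active adversary would require authenticated transport rather than a plain hash.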

In both cases, the corresponding block is not received by T for the duration of the timeout (20 minutes, at the time of publication).

The attack proceeds in two phases.

In Phase 1, A's nodes follow these steps:

To that purpose, A typically infers or injects information that allows her to recognize the target in a subsequent session. In particular, fingerprinting techniques can be divided into block-based, which identify nodes by the blocks they store, and address-based, which inject a set of addresses in a node's database and later retrieve them to recognize the node.

The attack exploits the fact that Bitcoin clients accept unsolicited ADDR messages, and the fact that it is possible to query items from a node's address database using GETADDR messages. To set a cookie for T, A follows these steps:

[103], or through Bitcoin-specific network-level attacks, such as Eclipse (§ VIII-C), Delay (§ VIII-E), or Direct attacks such as the Tapsell attack, described in this section.

On the other hand, network-scale attacks typically aim at decreasing network performance and increasing costs. These attacks are carried out by means of spamming, or flooding, which we describe in this section.

7 times, from 0.33 to 2.67 hours [104], with peaks of almost 24 hours [108]. Similarly, transaction fees increased from 45 to 68 Satoshis per byte [104]. Curiously, in the attempt to clean up these spam transactions, the biggest transaction in Bitcoin history (999 KB) was also produced (by a mining pool), causing even further distress to the network [109].

To waste network bandwidth, A generates transactions that will not be included in the blockchain, such as double-spending or orphan transactions. In a more sophisticated approach, A can exploit transaction malleability (§ VII-A) to create multiple copies of a transaction (possibly from another user), which get validated and propagated, but only one of which will eventually be mined.

To waste CPU, A creates very large transactions (e.g., with lots of inputs and lots of outputs). As validation cost depends on the transaction size, these cause heavy computational loads.

To waste memory, A targets the UTXO set and the mempool. As these structures are typically stored in memory, their size directly impacts validation speed [111]. To bloat the UTXO set, A creates transactions that split few inputs into many outputs [104]. To bloat the mempool, A creates either large transactions or multiple dust transactions (i.e., spending trivial amounts). In § IX-A4, we show a specific instance of this attack.

Memory-wasting attacks are particularly worrisome: since the contents of both the UTXO set and the mempool are typically shared by all nodes, an increase in their size can affect the efficiency and speed of the whole network.
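The effect of fan-out transactions on the UTXO set can be shown with a toy model. The sketch below is an illustration only: the UTXO set is reduced to a Python set of (txid, output index) pairs, amounts and scripts are ignored, and the transaction names are hypothetical.

```python
def apply_tx(utxo_set, txid, spent, n_outputs):
    """Spend the outpoints in `spent` and create n_outputs new ones.

    Returns the size of the UTXO set after the transaction is applied.
    """
    for outpoint in spent:
        utxo_set.remove(outpoint)  # raises KeyError if the outpoint is not unspent
    for index in range(n_outputs):
        utxo_set.add((txid, index))
    return len(utxo_set)


utxo = {("funding", 0)}

# One input split into 1000 outputs: the set grows by n_outputs - len(spent).
after_fanout = apply_tx(utxo, "fanout", [("funding", 0)], 1000)

# Consolidating 10 of those outputs into one shrinks the set again.
after_merge = apply_tx(utxo, "merge", [("fanout", i) for i in range(10)], 1)
```

Each transaction changes the set size by the number of outputs created minus the number of inputs spent, so a single input can add an arbitrary number of entries that every node then has to keep until they are spent.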

Again in 2015, an attack brought the mempool size to almost 1 GB, causing a thousand nodes to crash [112]. In 2017, the mempool exceeded 115k due to unconfirmed transactions, resulting in 700 million USD stalled [113]. Other attacks reported in 2015 include a malleability-based attack [114] and a money drop, where a number of private keys were released to the public to trigger a race to spend them, generating multiple double-spending transactions [104].

In this type of attack, the adversary A deceives a victim V (usually a merchant) by tricking him into accepting a transaction in exchange for a service or good, while having a double-spending one (typically sending the coins back to A) accepted by the rest of the network and mined into the blockchain (thus invalidating the payment). To this purpose, A targets V's node, whose IP address must be public and known to A, and manipulates its view of transactions or blocks. The attack occurs in a limited time frame, in which A obtains the desired service or good from V before he discovers the scam.

In the rest of this section, we will denote the paying transaction with tx_V, and the double-spending one with tx_A.

In this attack, first described by Karame et al. in 2012 [118], A takes advantage of a scenario in which V cannot wait for the paying transaction to be mined, due to the long confirmation times (around 20 minutes on average [119], [120]).

In this situation, V relies on receiving the paying transaction from the network, which supposedly proves that the expected payment has been broadcast, and is considered sufficient to believe it will be confirmed. However, A can deceive V by sending him the paying transaction while broadcasting a double-spending one to the rest of the network.

Two conditions have to be met for this attack to succeed: (1) T has to receive tx_V before tx_A, and (2) tx_A has to be mined into the blockchain.

For the attack, A sends tx_V to T, and, after a short time, tx_A to the helper nodes, which broadcast it to their peers.

tx_V to all its neighbors, none of these will accept the double-spending tx_A, which hence will never reach T;

By adopting these measures, the success rate of the attack is reduced to just 0.09% [124].

The attack succeeds if no other block is found between the broadcast of tx_V and the broadcast of B_A. On the other hand, if this occurs, A loses both tx_V's money and B_A's reward. The attack works even if V waits a few seconds to verify that the network accepts tx_V.

Nevertheless, the success probability of this attack heavily depends on the mining power A controls. In particular, given the current hashrate of the Bitcoin network, the success probability of an adversary that does not control a considerable fraction of the mining power is negligible.

This attack, proposed in 2011 [126], is a combination of the race and Finney attacks, and can be used for 1-confirmation double spending.

2) Create tx_V for the deposit, without broadcasting it.

3) Start mining a block B_A containing tx_V.
4) When the block is found, withhold it.
5) As soon as a new block is mined in the network, send B_A to T, which will then believe tx_V has one confirmation.
6) Request a withdrawal, create tx_A, and broadcast it.

If B_A is accepted into the blockchain, then A has simply made a deposit and a withdrawal, thus losing nothing. Otherwise, if the withdrawal did not use tx_V's outputs, then A gains the amount of the deposit.

2) Isolate these subgroups by disrupting the delivery of blocks during a time τ.

3) As soon as a block B is mined in a subgroup G_i, send tx_V to T in a different group G_j: this way tx_V will be confirmed in G_j's chain.

In most cases, protecting from malleability attacks is just a matter of best practices [47], [130], such as checking confirmed transaction outputs instead of relying on TXIDs [14]. Moreover, Bitcoin Core protects users from this attack by checking transactions by their outputs instead of their TXID [14].

At the network level, a Bitcoin transaction can be deanonymized by identifying the node that generated it, or, more precisely, its IP address. This strategy is based on the assumption that (autonomous) users broadcast their transactions from their own devices. As such, all transactions generated from a given device are likely to belong to the same owner. In other words, a transaction tx belongs to a user U if its propagation (via INV) started from U's node.

Network-wide deanonymization was first proposed by Dan Kaminsky in 2011 [132]. This attack is based on the observation that the node that creates a transaction will be the first one in the network to announce it. In the basic attack, A follows these steps:

The switch from Trickle to Diffusion [37] should have improved the protection against this attack. However, the effectiveness of such a measure is unclear [70].
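The first-announcer observation behind network-wide deanonymization can be sketched as a simple estimator. This is a minimal illustration, not the procedure of any cited work: it assumes the adversary's monitoring nodes have already logged INV announcements as (timestamp, peer IP, txid) tuples, and the function name and sample addresses (taken from the documentation range 192.0.2.0/24) are hypothetical.

```python
def first_announcers(observations):
    """Map each txid to the peer that announced it earliest.

    observations: iterable of (timestamp, peer_ip, txid) tuples, one per
    INV message seen by the monitoring nodes.
    """
    earliest = {}
    for timestamp, peer_ip, txid in observations:
        if txid not in earliest or timestamp < earliest[txid][0]:
            earliest[txid] = (timestamp, peer_ip)
    return {txid: peer_ip for txid, (_, peer_ip) in earliest.items()}


log = [
    (10.20, "192.0.2.7", "tx1"),
    (10.05, "192.0.2.3", "tx1"),  # earliest announcement of tx1
    (11.40, "192.0.2.7", "tx2"),
    (11.90, "192.0.2.3", "tx2"),
]
guess = first_announcers(log)
```

Randomized relay delays make the earliest observed announcer a noisier indicator of the true origin, which is the intuition behind using propagation schemes such as Diffusion as a countermeasure.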

To prevent the fingerprinting method, a periodic randomization of the outbound peers was proposed in 2014 [140]. However, this was eventually discarded due to possible new vectors for Partitioning attacks.

In this attack, described in 2017 by Wang et al. [25], A deanonymizes unreachable nodes by using separate monitoring nodes.

In the previous sections, we reviewed and classified network-level attacks against the Bitcoin P2P protocol. In particular, we organized available information in a structured format, based on a generalized Network Adversary model and an objective-focused attack taxonomy.

In this section, we concisely summarize such information, and take a broad-spectrum look over it to infer useful insights on the Bitcoin network security.

In Table 1, we summarize all the attacks reviewed in this paper (except for Infrastructure attacks, which operate outside the Bitcoin protocol). For each attack, we specify target and adversary model, the Auxiliary attacks it depends on or can be helped by, the type of weakness exploited, and the type of mitigation.

By looking at this table, it is possible to infer common characteristics of network-level attacks, such as their typical setting (adversary and targets) and which components of the protocol are involved (weaknesses and mitigation). In the following, we report our observations.

Most attacks require the adversary to be connected to the target. For this reason, the target is often a reachable node (or multiple ones), whose IP address is known. Similarly, the adversary often connects to all reachable nodes to observe or manipulate the propagation of data. To that purpose, she typically deploys multiple nodes, through which she actively interacts by injecting, modifying, or dropping messages. Less commonly, the adversary targets unreachable nodes and miners. Unreachable nodes are both more vulnerable, when connected to an adversary, and less exposed, because she cannot ensure such a connection. On the other hand, miners are typically protected by the use of relay networks. Attacks targeting the whole Bitcoin network are rare and typically limited to DoS and Partitioning attacks.

While the above-mentioned characteristics give a generic overview of network-level attacks, it is interesting to also study the threat model in the different attack categories. In particular, by looking at the table, we can observe the following settings:
• Direct attacks:
  - DoS: in these attacks, the adversary typically targets the whole network by injecting transactions; less commonly, she targets specific nodes, whose IP is known.
  - DS: in these attacks, the adversary is typically a miner, injecting blocks, and targeting a merchant connected with a reachable node, whose IP is known.
  - D: in these attacks, the adversary typically connects to the target and all other reachable nodes, and observes transactions.

Auxiliary attacks are a prominent threat for the Bitcoin network, as they enable the adversary to achieve one or more primary goals, or to improve the effectiveness of other attacks. In this section, we study how Auxiliary attacks relate to each other and to Direct attacks. We summarize this relationship in Table 2 and graphically represent it in Figure 4.

These resources allow us to infer interesting information about Auxiliary attacks. First of all, it is clear that the Sybil attack is the most influential, enabling all other Auxiliary attacks (except Fingerprinting), and facilitating Double Spending and Deanonymization. The Eclipse attack is arguably the most powerful, enabling all Direct attacks as well as Delay and Partitioning. Topology Inference is also relevant, influencing Double Spending, Deanonymization, and Unfair Revenue, while also facilitating Eclipse and Partitioning attacks. On the other hand, Fingerprinting is only related to Deanonymization.
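The relationships stated in this paragraph can be encoded as a small directed graph to explore indirect influence. The sketch below is only an illustration: it contains just the edges named in the paragraph above (not the full content of Table 2), and the two attack sets are an assumption inferred from the attack names used in this section.

```python
# Assumed category membership, inferred from the names used in this section.
DIRECT = {"DoS", "Double Spending", "Deanonymization", "Unfair Revenue"}
AUXILIARY = {"Sybil", "Eclipse", "Partitioning", "Delay",
             "Topology Inference", "Fingerprinting"}

# Edges stated in the paragraph: attack -> attacks it enables or facilitates.
INFLUENCES = {
    "Sybil": (AUXILIARY - {"Sybil", "Fingerprinting"})
             | {"Double Spending", "Deanonymization"},
    "Eclipse": DIRECT | {"Delay", "Partitioning"},
    "Topology Inference": {"Double Spending", "Deanonymization",
                           "Unfair Revenue", "Eclipse", "Partitioning"},
    "Fingerprinting": {"Deanonymization"},
}


def reachable(start):
    """All attacks influenced directly or transitively by `start`."""
    seen, stack = set(), [start]
    while stack:
        for target in INFLUENCES.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


sybil_reach = reachable("Sybil")
```

Under these assumptions, following edges transitively shows that Sybil reaches every Direct attack through Eclipse, even those it does not influence directly, which is consistent with its description as the most influential Auxiliary attack.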

Another interesting observation is the fact that Eclipse, Partitioning, and Delay attacks have an almost overlapping set of influence relationships, indicating they are closely related to each other. In fact, these attacks can be conceptually seen as variations of a MitM-like attack where the adversary affects the communications between a target and the rest of the network. In Figure 4, we reflect this commonality by grouping the three attacks in a single block. In addition, we introduce the term EPD attacks (from the initials of their names) to refer to these attacks collectively.

TABLE 2. Relationship between network-level attacks: an x in a cell indicates the attack in the corresponding row influences (i.e., enables, facilitates, or improves) the attack in the corresponding column.

Similarly, for Direct attacks, we can observe a close relation between Double Spending and Unfair Revenue, which share most of the influences, suggesting that they might belong to a common threat category. On the other hand, using the same criteria, DoS and Deanonymization seem to belong to a separate category.

As evidenced from our references, most Bitcoin network issues have been exposed, and in many cases addressed,

Double-spending attacks have been extensively addressed in the Bitcoin protocol, and are currently unlikely to be performed successfully. To achieve this objective in a direct way, the adversary is required to control a large portion of the network nodes and to have powerful mining resources.

The Balance attack [128] has no explicit mitigation in the protocol. However, its implementation is very complex.

Transaction deanonymization is likely the most realistic issue in the current Bitcoin protocol. In fact, it has been shown that the current relay protocol offers very poor anonymity properties for reachable nodes [45]. While some solutions have been proposed in research, no effective measure has been implemented so far. Unreachable nodes can also be easily deanonymized [25]. Furthermore, the PERIMETER attack [141] can be used against all nodes in the network. No mitigation has yet been introduced in the protocol.

Selfish-mining [66] and other akin strategies [67], [144] are currently another open problem and a prominent topic in Bitcoin security research. However, very few real-world cases have been reported so far, probably due to the resources needed for the attack, the implementation complexity of the strategies, and the relatively low profits for the attacker.

At the same time, not much can be done at the network level without introducing changes to prevent block-withholding and similar malicious behaviors.

Many of these attacks have been exposed during the early stages of the Bitcoin network. However, due to the difficulty in finding proper solutions, and partly to the lack of encrypted communications, most of them are still a realistic threat.

exist. However, these attacks require a highly-privileged adversary (e.g., an ISP), which can be potentially exposed by the attack itself. This acts as a potential deterrent for the attacker, making them less likely to occur.

Except for timejacking [86], most of these attacks have been mitigated in the Bitcoin protocol, which currently implements privileged channels to rapidly receive blocks from the network.

At a more abstract level, it would also be interesting to study the intrinsic relation between Eclipse, Delay, and Partitioning attacks: is it possible to subsume the three attacks under a single category? Similarly, the relation between these and Sybil and Topology Inference could also be analyzed.

Another important research direction is towards a formalization of the Bitcoin P2P protocol and its security. The information provided in this paper could be used as a basis for the creation of a network model and to define the fundamental threats to the security of its nodes and users.

In that respect, another useful task for future research would be to gather and organize information on the Bitcoin P2P protocol, with both high-level documentation and low-level details. Today, both new and experienced researchers have to face the challenge of learning how the protocol actually works. Having a single, complete source of information would be immensely beneficial for the community, and would eventually strengthen Bitcoin security.

Bitcoin network-level security is a complex subject. Lack of official protocol documentation and scattered information on related threats and solutions make it hard to properly study the subject. In turn, this hampers the identification of structural issues in the protocol, and the design of long-term solutions.

In an effort to help the community fill this gap, and move towards a more formal network security model, we provided

We believe this work can help researchers and developers better understand the Bitcoin P2P protocol and eventually pursue a more formally-secure blockchain network.

Since 2022, he has been a Research Engineer at QPQ. His research interests include IT security, blockchain and P2P networks, operating systems and virtualization, malware and botnets, and memory forensics.

VANESA DAZA was born in Barcelona, Spain. She received the B.Sc. degree in mathematics from the Universitat de Barcelona, Barcelona, Spain, in 1999, and the Ph.D. degree in mathematics from the Universitat Politècnica de Catalunya, Barcelona, in 2004.

She worked as a Researcher in industry (Scytl, Spain) as well as in academia (Rovira i Virgili University, Spain). Among other positions serving at UPF, she chaired the Information and Communication Technologies Department, Pompeu Fabra University. She has coauthored more than 30 papers, including international journals and top conferences of cryptography and cybersecurity. Her research interests include the use of distributed cryptographic techniques to enhance security and privacy to secure emerging technologies, with special emphasis on blockchain technology.