A Refined Analysis of Zcash Anonymity

With the continuous development and growing popularity of blockchain technology, the anonymity of cryptocurrencies has attracted wide attention. Zcash is an altcoin of Bitcoin that aims to protect blockchain anonymity, which it largely guarantees through zero-knowledge proofs. However, it is still feasible to reduce Zcash's anonymity. In this paper, we provide a refined empirical analysis of Zcash anonymity. We improve current address clustering methods and increase the clustering rate by 9%. We also analyze the whole process of distributing mining rewards and identify 87.5% of addresses and 25.7% of transactions. In addition, we simplify the Zcash transaction network and then pick out the nodes (edges) that play important roles in network connectivity. We show that these nodes are mostly mining pools. In particular, users participating in shieldedpool are mostly founders, miners and mining pools, although shieldedpool itself is designed to protect the anonymity of users with high privacy requirements. Our results are, to an extent, contrary to the original intention of Zcash.


I. INTRODUCTION
Bitcoin [1] is a peer-to-peer digital cash system proposed by Nakamoto in 2008. The entire transaction history of Bitcoin is stored in a distributed public ledger called the blockchain. The Bitcoin system guarantees the pseudonymity [2] of transactions in two respects. First, the addresses used for sending and receiving BTCs, which take the form of hashed cryptographic keys, are created pseudo-randomly. Second, a user can create any number of Bitcoin addresses in order to protect his identity. However, a series of previous studies [3]-[7] indicates that anonymity in the Bitcoin system can be greatly reduced. It is possible to track Bitcoin transaction flows, cluster different Bitcoin addresses belonging to the same user, and match Bitcoin addresses to users' real identities.
Several techniques have been proposed to improve the anonymity of Bitcoin, such as mixing services [8] and joint transactions [9]. A series of altcoins has also been created to improve anonymity, such as Dash [10], Monero [11] and Zcash [12]. Among these altcoins, Zcash has its own unique advantage. Zcash's anonymity relies on shieldedpool,¹ where part of the transaction information, such as input/output addresses and transaction value, is no longer directly available from the blockchain, in contrast to Bitcoin. The theoretical basis for shieldedpool is a practical form of zero-knowledge proof called zk-SNARKs [12].
Several researchers [13]-[15] consider Zcash anonymity in practice. On the one hand, they apply methods similar to those used for Bitcoin to analyze Zcash, mainly targeting transactions between t-addresses (addresses not related to shieldedpool). They cluster and tag addresses, then match them with actual identities. On the other hand, researchers study how to use shieldedpool for deanonymization. After establishing some clustering heuristics related to shieldedpool, they investigate how coins are deposited into and withdrawn from shieldedpool. However, current address clustering methods consider only some of the transaction types. Some other Bitcoin deanonymization methods that have not been applied to Zcash before are also suitable for investigating Zcash anonymity. Thus, in this paper we improve the deanonymization results by considering more transaction types and using more deanonymization methods, such as user behavior identification and complex network analysis.
¹ Users may choose whether or not to use shieldedpool in a transaction. If not, then the transaction is purely in valuepool. Details are shown in Section II.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiafeng Xie.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

A. OUR CONTRIBUTION
In this paper, we give a refined analysis of Zcash anonymity and improve current Zcash deanonymization results. The main contributions of this paper are as follows:
1) We improve address clustering methods and take more transaction types into account. We propose a refined address clustering heuristic and coalesce 36% of addresses into multiple entities. The clustering rate is increased by 9% compared to previous research [15].
2) We study the whole process of mining reward distribution. We focus on intermediate transactions that have not been thoroughly studied before and obtain improved results. We identify 87.5% of all addresses as being involved in this process and 25.7% of all transactions as serving it.
3) We build a transaction network and analyze its basic topological properties such as degree, clustering coefficient and Pagerank. We conclude that it is a heterogeneous and sparse network, which is consistent with the actual trading situation. Furthermore, we simplify this transaction network according to the results in 1) and 2). We compare relevant properties and pick out important nodes (edges) using the new simplified network. Note that few studies focus on the topological properties of the transaction network itself or on deanonymization at the level of complex networks.
4) We find that users participating in shieldedpool are mostly founders, mining pools and miners. This is contrary to the intention of Zcash, where shieldedpool is designed to transfer coins with a high privacy need. To the best of our knowledge, this is the first detailed research on the identity and proportion of users inside shieldedpool.

B. ORGANIZATION
In Section II, we introduce how Zcash works. In Section III, we analyze the general statistics of the Zcash blockchain. We give our deanonymization methods and results in Section IV. Section V concludes the paper.

C. RELATED WORK
1) DEANONYMIZATION OF BITCOIN
There have been multiple studies focusing on deanonymizing Bitcoin transactions. Reid and Harrigan [6] were the first to analyze Bitcoin anonymity. They built two types of networks (i.e., a transaction network and a user network) and analyzed their topological characteristics. However, research on network topological properties alone lacks practical significance. Several other researchers explored the development of the transaction network and tracked the flow of transactions [3], [6], [7]. For instance, Ron and Shamir [7] tracked 364 transactions of over 50,000 BTCs and gave a detailed transaction flow analysis. However, the research above lacks analysis of the connections among different addresses. Thus, some researchers applied clustering heuristics [5], [7], [16], which cluster different addresses belonging to the same user. One common assumption is the multi-input heuristic, which states that all the input addresses of one transaction belong to the same user. Another assumption is the change heuristic, which states that the input addresses and change addresses in a single transaction also belong to the same user. Androulaki et al. [16] applied these two heuristics to build an anonymity attack model and ran an experiment in a college. They found that the clustering simulation results of the model were close to the actual situation.
In addition, there is also TCP/IP layer analysis. Koshy et al. [K:AAB:14] analyzed the matching relationship between Bitcoin addresses and IP addresses. The main idea is that the first node to inform the receiving node of a transaction is the source of the transaction.

2) DEANONYMIZATION OF ZCASH
Since Zcash was proposed as an altcoin of Bitcoin, many Zcash researchers borrow techniques from Bitcoin research. For instance, Kappos et al. [15] ran the multi-input heuristic to analyze the deanonymization of Zcash, although this heuristic is only appropriate for transparent transactions. Several studies are unique to Zcash, aiming at the deanonymization of transactions related to shieldedpool. Jeffrey [14] found a common regularity in transactions related to shieldedpool, which takes the form of round-trip transactions (RTT for short). First, coins are sent from a t-address to a z-address. Shortly afterwards, coins with the same or a very similar value (usually differing by a common fee value) are moved from shieldedpool back to valuepool. Jeffrey believes that the two t-addresses of an RTT are likely to be controlled by one entity. It is found that 31.5% of the coins sent to shieldedpool may be involved in RTT, and that this regularity is likely due to the behavior of miners and mining pools.
Alex and Daniel [13] analyzed how mining rewards are distributed from mining pools to miners. In Zcash, mining rewards need to be put into shieldedpool before being given to miners. They found two main patterns of paying mining rewards to miners. In the first pattern, called Pattern T, the reward is moved from a z-address to a t-address of the mining pool, and then distributed to miners. In the second pattern, called Pattern Z, the reward is distributed directly from z-addresses to miners' t-addresses. By analyzing Pattern T and Pattern Z, Alex and Daniel identified 96% of mining reward transactions.
Kappos et al. [15] proposed a new heuristic in Zcash, linking t-addresses and z-addresses. The main idea of this heuristic comes from the change heuristic. They identified and classified various participants in Zcash, analyzed the transaction characteristics and gave an in-depth analysis of all interactions with (and within) shieldedpool.
TABLE 1 note: In-address and out-address represent input and output address, respectively. In-infor and out-infor refer to input and output information, respectively. The notation • represents that the information of the current grid is attainable, and × represents that the information is unattainable.

II. BACKGROUND
In this section, we introduce how Zcash works. Zcash was launched on 29th October, 2016 [17]. The currency in the Zcash blockchain is called ZEC. Since the original version of Zcash was planned as a fork of Bitcoin, the structure of transactions in Zcash is similar to that in Bitcoin. There are two types of addresses in Zcash: transparent addresses and shielded addresses. Transparent transactions (i.e., those whose sending and receiving addresses are all transparent addresses) are nearly the same as transactions in Bitcoin. That is, one can easily obtain transaction information such as value, fee, number of inputs (outputs) and senders' (receivers') addresses from the blockchain. As the public keys of these transparent addresses always start with the letter t, we denote them as t-addresses.
In order to protect anonymity, shielded addresses are used in the Zcash system. As the public keys of these shielded addresses always start with the letter z, we refer to them as z-addresses below. Next we explain how transactions with z-addresses become "shielded". z-addresses are not exposed in the blockchain, and the coins sent to or received by a z-address are also not revealed. In addition, any number of t-addresses and z-addresses are permitted in one transaction.
In Table 1, we distinguish several kinds of transactions in Zcash system. The t-t transactions, as mentioned before, are nearly the same as those in Bitcoin. However, in a z-t transaction where all the input addresses are z-addresses and all the output addresses are t-addresses, one can only collect little input information from the blockchain as the input address of this transaction is ''null'', the number of input addresses is ''zero'' and the value of unspent inputs is also ''zero''. Similarly, in a t-z transaction, output information is hard to attain and in a z-z transaction both input and output information is unattainable. A simple view of transaction types in Zcash is shown in Figure 1.
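The distinction between transaction kinds drawn above can be illustrated with a minimal sketch. The `Tx` record and its three fields are hypothetical simplifications introduced only for this example; they are not fields of the real Zcash transaction format, and coinbase transactions are assumed to be filtered out beforehand.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical, simplified view of one non-coinbase Zcash transaction."""
    t_inputs: int       # number of transparent inputs visible on chain
    t_outputs: int      # number of transparent outputs visible on chain
    has_shielded: bool  # True if the transaction has a shielded component


def classify(tx: Tx) -> str:
    """Label a transaction by which of its sides are transparent."""
    if not tx.has_shielded:
        return "t-t"    # both sides attainable, as in Bitcoin
    if tx.t_inputs and tx.t_outputs:
        return "mixed"  # t-addresses on both sides plus a shielded part
    if tx.t_inputs:
        return "t-z"    # output information hidden
    if tx.t_outputs:
        return "z-t"    # input information hidden
    return "z-z"        # both sides hidden
```

The "mixed" label is an assumption of this sketch for transactions that combine transparent and shielded parts; the four labels t-t, t-z, z-t and z-z follow the text.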
Although the value of ZECs sent to or received by a particular z-address is not attainable, the change in the amount of coins after a transaction can be obtained. This is why valuepool and shieldedpool are introduced. Shieldedpool describes the value variation of z-addresses and valuepool describes the value variation related to t-addresses. In detail, in a t-t or z-z transaction, the values of valuepool and shieldedpool do not change. In a t-z transaction, the value of valuepool decreases and the value of shieldedpool increases. Thus, a t-z transaction may be pictured as moving ZECs from valuepool to shieldedpool. Similarly, a z-t transaction can be thought of as transferring ZECs from shieldedpool into valuepool. Two parameters, $V_{pub}^{old}$ and $V_{pub}^{new}$, are used in the blockchain script to describe the value in valuepool and shieldedpool. $V_{pub}^{old}$ ($V_{pub}^{new}$) means the value of valuepool before (after) the operation of the current transaction. Then the variation of shieldedpool's value $V_{sld}$ can be obtained by Equation (1):
$V_{sld} = V_{pub}^{old} - V_{pub}^{new}$    (1)
Zcash's main users include founders, miners, and mining pools formed by a number of miners. Each coinbase transaction generates about 12.5 ZECs, of which 2.5 ZECs are returned to founders and 10 ZECs are distributed to the miners or mining pools as rewards for generating blocks. Block rewards will be halved every several years. We emphasize that in Zcash's protocol [17], new coins must be put into shieldedpool before subsequent transactions are executed, which, to an extent, strengthens the anonymity.
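The pool bookkeeping described above can be sketched as follows. This is a minimal illustration that assumes the variation of shieldedpool equals the old public value minus the new public value, that fees are ignored, and that amounts are integers in zatoshi (1 ZEC = 10^8 zatoshi). The function names are hypothetical.

```python
from itertools import accumulate


def shielded_delta(v_pub_old: int, v_pub_new: int) -> int:
    """Change in shieldedpool value caused by one transaction (in zatoshi)."""
    return v_pub_old - v_pub_new


def shielded_pool_series(pairs):
    """Running shieldedpool total after each (v_pub_old, v_pub_new) pair."""
    return list(accumulate(shielded_delta(old, new) for old, new in pairs))
```

Under these assumptions a t-z transaction yields a positive delta, a z-t transaction a negative one, and t-t or z-z transactions a delta of zero, which matches the description in the text.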

III. ANALYSIS OF BLOCKCHAIN DATA: POOR USE OF SHIELDEDPOOL
In this section, we give a general analysis of the Zcash blockchain. We download the Zcash blockchain and process the data mainly with Python. We collect blockchain data from Oct 29th, 2016 to Feb 28th, 2019. The total value of Zcash at the end of that period is 5,258,353 ZECs. There are more than 474,822 blocks, about 4,000,000 transactions and about 200 GB of transaction data. We pay special attention to shieldedpool, as this is the main difference between Bitcoin and Zcash. Recall that all coins in Zcash are held in either valuepool or shieldedpool. A comparison of the total value in valuepool and the total value in shieldedpool is shown in Figure 2. The total value of valuepool increases at a basically linear rate due to the continuous generation of new blocks. However, the total value of shieldedpool peaks at only 366,417 ZECs, which accounts for only 6.9% of the total value. Therefore, we believe that few users use shieldedpool.
We then further investigate the different types of transactions in Zcash. The total number of each transaction type is listed in Table 2. We find that the majority of transactions are transparent transactions and coinbase transactions, accounting for 76.3% and 10.3% of all transactions, respectively. Neither type is related to z-addresses or shieldedpool. Only 13.6% of transactions include a z-address. This is further illustrated in Figure 3 and Figure 4. It seems that most transactions including z-addresses are shielded and deshielded transactions rather than private transactions.
Note that the shielded transactions and the deshielded transactions are close in terms of number, percentage and total input value. This may indicate that after ZECs are moved into shieldedpool, they are withdrawn within a few hours. A similar analysis is available in previous research [14], where this kind of "deposit and withdrawal" mode is called RTT (round-trip transactions). Figure 5 gives details of RTT. According to [18], 31.5% of all the transactions related to shieldedpool belong to RTT.
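One way to look for such round trips is sketched below. It is only an illustration: the greedy matching, the six-hour window and the fee tolerance are assumptions made for this example and are not thresholds taken from [14] or [18]. Deposits and withdrawals are given as hypothetical `(timestamp, t_address, value)` tuples with integer values.

```python
def find_round_trips(deposits, withdrawals, max_gap_s=6 * 3600, max_fee=10_000):
    """Pair each deposit with the earliest unused matching withdrawal.

    deposits / withdrawals: lists of (timestamp_s, t_address, value_zat).
    A withdrawal matches if it follows the deposit within max_gap_s seconds
    and its value is lower than the deposit by at most max_fee.
    """
    ordered_withdrawals = sorted(withdrawals)
    used = set()
    pairs = []
    for d_time, d_addr, d_val in sorted(deposits):
        for i, (w_time, w_addr, w_val) in enumerate(ordered_withdrawals):
            if i in used:
                continue
            in_window = 0 < w_time - d_time <= max_gap_s
            similar = 0 <= d_val - w_val <= max_fee
            if in_window and similar:
                pairs.append((d_addr, w_addr))  # candidate same-entity pair
                used.add(i)
                break
    return pairs
```

Each returned pair links the depositing t-address with the withdrawing t-address, which is the pair that an RTT analysis would treat as likely controlled by one entity.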
However, the identities of users inside shieldedpool lack further research. Previous work paid more attention to "who is in the shieldedpool", whereas we focus on "the proportion of different members in shieldedpool." This question is naturally raised by the following experiment. In Figure 6, we observe the total value of shieldedpool over time (the blue and thicker line). The red and thinner line represents the total value operated by founders according to the heuristic in previous research [15]. We find that at the early stage of Zcash, the ZECs in shieldedpool are almost all operated by founders. However, as time goes by, the disparity between these two lines gradually widens. This means there are other entities contributing to the value of shieldedpool. We give a possible explanation of this "disparity" in Section IV-E.
FIGURE 7. The general structure of TN. The size of each node is proportional to its degree. The largest node represents shieldedpool and connects with many addresses with large degree due to frequent deposits and withdrawals.

IV. OUR WORK
We present our deanonymization results in this section. In Section IV-A, we build a transaction network and analyze its topological properties. We introduce the new clustering heuristic in Section IV-B and analyze the whole process of mining reward in Section IV-C. We simplify the transaction network in Section IV-D and give conclusions in Section IV-E.

A. BUILDING TRANSACTION NETWORK
We choose 1,000 blocks from height 29400 to 29500 and build a transaction network using Gephi. Note that we do not use the whole Zcash blockchain, as the data processing would be greatly slowed down, particularly in the construction and visualization of our transaction network. Besides, there is also deanonymization research on partial blockchain data [19]. In fact, due to the large number of users and transactions in Zcash, a sample of the data is also quite representative.
The transaction network is built as follows. Every t-address is seen as a node. If one node acts as input and another node acts as output in a transaction, then a directed edge is established between these two nodes. Considering that one transaction may include multiple nodes, there may be multiple edges in one transaction. Due to the invisibility of z-addresses, it is hard to refer to these addresses as nodes. However, the in&out information of shieldedpool itself is available. As all z-addresses are in shieldedpool, we consider shieldedpool as a unique node, representing the set of z-addresses.
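The construction rules above can be sketched in a few lines. The input format, a tuple `(t_inputs, t_outputs, shielded_in, shielded_out)` per transaction, is a hypothetical simplification, and dropping self-loops is an assumption of this sketch rather than something stated in the text.

```python
from collections import defaultdict
from itertools import product

SHIELDED = "shieldedpool"  # single node standing for all z-addresses


def build_network(txs):
    """Return the set of directed edges (source, target) of the network."""
    edges = set()
    for t_inputs, t_outputs, shielded_in, shielded_out in txs:
        sources = set(t_inputs) | ({SHIELDED} if shielded_in else set())
        targets = set(t_outputs) | ({SHIELDED} if shielded_out else set())
        # one edge from every input node to every output node
        edges.update((s, d) for s, d in product(sources, targets) if s != d)
    return edges


def degrees(edges):
    """Degree of each node, counted as its number of distinct neighbours."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}
```

Because every z-address is folded into the one `SHIELDED` node, that node accumulates an edge for each distinct t-address that deposits into or withdraws from shieldedpool.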
By applying the rules in the last paragraph, we obtain a transaction network with 98,554 nodes and 183,771 edges. We call this network TN. In Figure 7, we show the general structure of TN, in which the biggest node is shieldedpool. The degree of a node in a network is the number of connections to other nodes. TN has an average degree of 3.7, which indicates that one address is connected with 3-4 addresses on average. Pagerank³ is used to measure the relative importance of network nodes [20]. Figure 8 shows the top 10 nodes by degree and by Pagerank in TN. Shieldedpool has a degree of over 70,000, indicating an important role in connectivity. Meanwhile, the nodes with the top 10 degrees and the nodes with the top 10 Pageranks are the same. Besides, although these 10 nodes only account for 0.01% of all nodes, they contribute 33.3% of all edges in the network. These two observations both imply that in TN, the connectivity and importance of nodes are positively correlated to some extent. That is, nodes with high connectivity tend to be more important.
A clustering coefficient is a measure of the degree to which nodes in a network tend to cluster together [21], [22]. TN has an average clustering coefficient of 0.185, which means it is a sparse network. Roughly speaking, even if an address $addr_A$ is involved in two transactions, there is often no transaction between the addresses connected with $addr_A$ in those two transactions. For example, in transaction $t_1$, $addr_A$ and $addr_{t1}$ connect. In transaction $t_2$, $addr_A$ and $addr_{t2}$ connect. A low clustering coefficient means that there is often no transaction between $addr_{t1}$ and $addr_{t2}$. The cumulative degree distribution of TN is shown in Figure 9. We find that the degree distribution of TN basically follows a power-law distribution ($P(X \geq x) \propto x^{\lambda}$), which is very similar to many complex social networks [23].
³ We use a simplified version of Pagerank adapted from [20]. Let $u$ be a node in a complex network and $R(u)$ be the Pagerank of node $u$. Let $Deg^{out}_u$ be the set of nodes $u$ points to and $Deg^{in}_u$ be the set of nodes which point to $u$. Let $N_u = |Deg^{out}_u|$ be the number of links from $u$ and let $c$ be a factor used for normalization. Then $R(u) = c \sum_{v \in Deg^{in}_u} R(v) / N_v$.
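The simplified Pagerank of the footnote can be sketched as a plain fixed-point iteration. This sketch assumes the usual simplified form, in which each node shares its rank equally among the nodes it points to, and it chooses the normalization factor so that the ranks sum to one after every round. The iteration count is arbitrary, and the iteration is only guaranteed to settle on graphs that are strongly connected and aperiodic.

```python
def simple_pagerank(edges, iterations=100):
    """Iterate R(u) = c * sum(R(v) / N_v for v pointing to u).

    edges: set of directed (source, target) pairs.
    """
    nodes = {node for edge in edges for node in edge}
    out_degree = {node: 0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        raw = {u: sum(rank[v] / out_degree[v] for v in incoming[u])
               for u in nodes}
        total = sum(raw.values())
        if total == 0:  # no node passes any rank on; keep the last estimate
            break
        rank = {u: value / total for u, value in raw.items()}  # c = 1 / total
    return rank
```

On real transaction data, nodes without outgoing edges leak rank in each round, which is what the normalization factor compensates for in this simplified form.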
In conclusion, TN is a heterogeneous network with a power-law degree distribution and a low clustering coefficient, in which a few major addresses play crucial roles. However, we can only obtain network topological properties from the current TN, and it is hard to use these properties to link Zcash addresses with users' identities. Therefore, other deanonymization methods are needed to simplify the network, extract important nodes (edges) and deanonymize users. We present them in the following sections.

B. ADDRESS CLUSTERING
In Zcash, there are two main address clustering methods. One is the multi-input heuristic (Heuristic 1) and the other is the change heuristic (Heuristic 2). Heuristic 1 holds because a sender, who must know the private key corresponding to the public key of each input, would not reveal his private keys to others [5], [16]. Therefore, the input addresses in a transaction might be linked. Heuristic 2 reflects that when a sender does not put all his ZECs into shieldedpool, he might transfer part of them to a t-address. So the sending address and this t-address might be linked.
Heuristic 1 (Multi-Input Heuristic) [15]: If two or more t-addresses are inputs in the same transaction (whether that transaction is transparent, shielded, or cross), then they are controlled by the same entity.

Heuristic 2 (Change Heuristic): If one (or more) address is an input t-address and a second address is an output t-address in the same tz-tz transaction, then if this is the only transparent output address, the second address belongs to the same user who controls the input addresses.
We apply these two heuristics and get 735 entities including 26,406 addresses. So the clustering rate is 27%, close to the 26% reported in previous research [15]. We emphasize that Heuristic 1 contributes the majority of entities, whereas Heuristic 2 only contributes 11 entities and 34 nodes. In fact, Heuristic 2 only involves shielded transactions. If we take other transaction types into account, the clustering rate might be improved.
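Heuristics 1 and 2 can be applied together with a disjoint-set structure, as in the minimal sketch below. The tuple format `(t_inputs, t_outputs, shielded_out)` is hypothetical, and reading the condition of Heuristic 2 as "the transaction has a shielded output and exactly one transparent output" is an assumption of this sketch.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(txs):
    """txs: iterable of (t_inputs, t_outputs, shielded_out) tuples."""
    uf = UnionFind()
    for t_inputs, t_outputs, shielded_out in txs:
        for addr in t_inputs[1:]:
            uf.union(t_inputs[0], addr)          # Heuristic 1
        if t_inputs and shielded_out and len(t_outputs) == 1:
            uf.union(t_inputs[0], t_outputs[0])  # Heuristic 2
    groups = defaultdict(set)
    for addr in list(uf.parent):
        groups[uf.find(addr)].add(addr)
    return [group for group in groups.values() if len(group) > 1]
```

Only groups with more than one address are returned, so addresses that were never linked to another address do not count as clustered.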
We improve Heuristic 2 based on observations of transaction data. For example, consider a transparent transaction with one input address and two output addresses.⁴ All of them are t-addresses and the transaction value is 566.355519 ZECs. Of the two output addresses, one received 566.2228206 ZECs and the other received 0.1326927 ZECs. That is to say, the fee of this transaction is about $6 \times 10^{-6}$ ZECs. This strongly indicates that the second address is a change address, because the two output values differ hugely and it is hard for one single account to meet the above two conditions at the same time. The transaction has only one input and two outputs. This strengthens our guess, as two outputs usually mean that the transaction is a pure value transfer rather than a functional one, such as a mixing service, or a procedural one, such as mining reward distribution. Our variable change heuristic (Heuristic 3) is based on this special circumstance.
⁴ Txid of this transaction is "fffacc59f3dc6e48b50bcc79199ef96ee135fbbddb261f4639fefa1069260136".
Heuristic 3 (Variable Change Heuristic): If a transparent transaction has one input and two outputs, and the value of one of the two outputs is more than 20 times that of the other, then the address with the smaller value is the change address.
After applying Heuristic 3, we obtain a group of 4,472 entities, among which 580 entities (13% of all) are the same as those obtained by applying Heuristic 1. This result, to some extent, reflects the reliability of Heuristic 3.
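The test in Heuristic 3 is simple enough to state directly in code. The following is a minimal sketch; the function name and the argument format are hypothetical, and the returned pair is the input address together with the address presumed to be its change address.

```python
def variable_change(t_inputs, outputs, ratio=20):
    """Return (input_address, change_address) if Heuristic 3 applies.

    t_inputs: list of input t-addresses of a transparent transaction.
    outputs:  list of (t_address, value) pairs.
    Returns None when the transaction does not fit the heuristic.
    """
    if len(t_inputs) != 1 or len(outputs) != 2:
        return None
    (small_val, small_addr), (large_val, _) = sorted(
        (value, addr) for addr, value in outputs
    )
    if large_val > ratio * small_val:
        return t_inputs[0], small_addr
    return None
```

For the example transaction discussed in the text, the two outputs of 566.2228206 and 0.1326927 ZECs differ by far more than a factor of 20, so the address receiving the smaller amount is returned as the change address.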
We then merge the two sets of entities obtained by applying Heuristics 1 and 2 and by applying Heuristic 3, respectively. If two entities have a common node, then the nodes in these two entities are merged into one entity. We repeat this process until no two entities share a node. Finally, we get 593 entities with 36,169 nodes, increasing the clustering rate from 27% to 36%. The top 10 largest entities are shown in Table 3. They include 23,990 nodes, accounting for 66% of all clustered nodes, which once again indicates that the Zcash transaction network is highly heterogeneous.
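The merging step can be sketched as an incremental procedure: each entity is added in turn and absorbs every already-merged entity it overlaps with, which leaves the same result as repeating pairwise merges until no two entities share a node. The function name is hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
def merge_entities(*entity_lists):
    """Merge entities (sets of addresses) that share at least one address."""
    merged = []  # invariant: sets in this list are pairwise disjoint
    for entity in (set(e) for entities in entity_lists for e in entities):
        overlapping = [m for m in merged if m & entity]
        for m in overlapping:
            entity |= m       # absorb the overlapping entity
            merged.remove(m)
        merged.append(entity)
    return merged
```

For larger inputs, a disjoint-set structure would give the same partition more efficiently.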

C. IDENTIFYING THE WHOLE TRANSACTION PROCESS OF MINING REWARD
Currently, every coinbase transaction generates 12.5 ZECs. Of these, 2.5 ZECs are transferred to founders and 10 ZECs are allocated to miners as a reward. Similar to Bitcoin, miners often gather together to form a mining pool, and mining rewards are distributed to its miners. According to the Zcash protocol, the mining reward must be put into shieldedpool before use [17]. This actually gives us another way to identify related transactions.
Previous studies present two possible patterns for mining pools to distribute mining rewards [13]. One is Pattern T, in which, after coins are put into shieldedpool (z-addresses of mining pools), they are first transferred to t-addresses of mining pools and then allocated to t-addresses of miners. The other is Pattern Z, in which ZECs are allocated directly to miners from the mining pools' z-addresses. By applying these two patterns, transactions with miners as receivers can be found [13].
The process of reward distribution is as follows. First, in a coinbase transaction, mining rewards are transferred to mining pools' addresses. Second, rewards are deposited into shieldedpool as required by the Zcash protocol [17]. Finally, after a series of intermediate procedures, rewards are distributed to miners. In fact, not only transactions in the last procedure (transactions in which miners act as receivers) but also intermediate transactions can be explored and used for deanonymization. The whole process is shown in Figure 10. Mining rewards are transmitted to the mining pool's t-addresses and then to the mining pool's z-addresses. From there they go one of two ways: either directly to miners' t-addresses, or to the mining pool's t-addresses and then to miners' t-addresses. Numbers above arrows give the number of transactions we identify.
We build two lists, a miners list $L_m$ and a mining pools list $L_p$, and then present Heuristic 4. $L_p$ is updated according to the first to third items and $L_m$ is updated according to the second and third items. We repeat the update process until no new address is added to these two lists.
Heuristic 4 (Mining Heuristic):
1. In a coinbase transaction, if the output address with a value of about 10 ZECs belongs to a mining pool, then $L_p$ can be updated.
2. If the input of a transaction is a t-address, and its output contains 50 or more t-addresses (the output may have some addresses belonging to $L_p$), then the output t-addresses except mining pools' t-addresses belong to miners, and the input t-address belongs to the mining pool. Thus, $L_p$ and $L_m$ can be updated. This is a part of Pattern T.
3. If the input of a transaction is a z-address, and its output contains 50 or more t-addresses (the output may have some addresses belonging to $L_p$), then the output t-addresses except mining pools' t-addresses belong to miners. Thus, $L_m$ can be updated. This is a part of Pattern Z.
4. If the input of a transaction is a z-address, and its output is a t-address in $L_p$, then this transaction is a part of Pattern T.
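The repeated update of the two lists can be sketched as a fixed-point loop. This is an illustration under several assumptions that are not in the text: transactions are hypothetical dicts with keys `coinbase`, `t_inputs`, `z_input` and `outputs`; "about 10 ZECs" is read as within `tol` of `reward`; every coinbase output of that size is treated as a pool address; and any address that ends up in the pool list is removed from the miner list.

```python
def mining_heuristic(txs, min_outputs=50, reward=10.0, tol=0.5):
    """Return (pools, miners, pattern_t), where pattern_t holds tx indices."""
    pools, miners, pattern_t = set(), set(), set()
    changed = True
    while changed:  # repeat until no list grows any further
        before = (len(pools), len(miners), len(pattern_t))
        for index, tx in enumerate(txs):
            out_addrs = [addr for addr, _ in tx["outputs"]]
            if tx["coinbase"]:
                # item 1: the ~10 ZEC coinbase output goes to a mining pool
                pools.update(addr for addr, value in tx["outputs"]
                             if abs(value - reward) <= tol)
            elif len(out_addrs) >= min_outputs and (tx["t_inputs"] or tx["z_input"]):
                # item 2 (t-address input) and item 3 (z-address input)
                pools.update(tx["t_inputs"])
                miners.update(a for a in out_addrs if a not in pools)
            elif tx["z_input"] and out_addrs and all(a in pools for a in out_addrs):
                # item 4: shieldedpool pays a known pool t-address
                pattern_t.add(index)
        changed = before != (len(pools), len(miners), len(pattern_t))
    return pools, miners - pools, pattern_t
```

The loop matters because a pool address discovered late by item 2 can reclassify an earlier transaction under item 4 on the next pass.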
After running Heuristic 4, we obtain $L_p$ with 44 addresses and $L_m$ with 86,176 addresses. At first glance, so many $L_m$ addresses are surprising, since the total number of addresses is 98,554 and 36,169 addresses are clustered. This suggests that many entities such as exchanges and services also participate in mining. Besides, we also obtain 1,261 coinbase transactions, 648 transactions from mining pools' t-addresses to shieldedpool, 532 transactions from shieldedpool to t-addresses belonging to miners, 135 transactions from shieldedpool to t-addresses of mining pools and 60 transactions from t-addresses of mining pools to t-addresses of miners. These results are shown in Figure 10.

D. A SIMPLIFIED TRANSACTION NETWORK $TN_s$
In this section, we neglect secondary nodes (edges), extract primary nodes (edges) and construct a simplified network $TN_s$. We compare the two networks and show that shieldedpool, mining pools and miners, rather than common users, play a crucial role in Zcash trading. The construction of $TN_s$ is as follows. First, as there is no need to handle transactions inside one entity, we represent an entity as one node. For other users who transact with the same entity, it makes no difference which of its addresses is the target. Second, the miner addresses in the miner list $L_m$ that do not appear in the clustering results can also be treated as one node, since transactions among miners are sparse and less important. Based on these two rules, we get a simplified transaction network ($TN_s$). The general structure of $TN_s$ is shown in Figure 13. We find that shieldedpool, mining pools and miners, rather than common users, play a crucial role in the network. Besides, the connection between miners and shieldedpool is far less important than the other crucial parts. This indicates that most mining rewards are distributed by mining pools.
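The two simplification rules can be sketched as a relabelling of nodes followed by a rebuild of the edge set. The labels `entity-0`, `entity-1`, ... and `miners` are hypothetical names chosen for this example, and edges whose two endpoints collapse into the same node are dropped.

```python
def simplify_network(edges, entities, miners):
    """Collapse clustered entities and unclustered miners into single nodes.

    edges:    set of directed (source, target) address pairs.
    entities: list of address sets from the clustering step.
    miners:   set of addresses in the miner list.
    """
    label = {}
    for index, entity in enumerate(entities):
        for addr in entity:
            label[addr] = f"entity-{index}"
    for addr in miners:
        label.setdefault(addr, "miners")  # only miners not already clustered
    simplified = set()
    for src, dst in edges:
        new_src, new_dst = label.get(src, src), label.get(dst, dst)
        if new_src != new_dst:            # skip edges inside one node
            simplified.add((new_src, new_dst))
    return simplified
```

Addresses that are neither clustered nor in the miner list keep their own node, so the remaining structure of the network is preserved around the collapsed nodes.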
A comparison of the properties of TN and $TN_s$ is shown in Table 4. $TN_s$ is a network with 6,636 nodes and 8,674 edges, accounting for 6% and 4% of those in TN. This suggests a great extent of simplification. $TN_s$ has a lower average degree and a higher average clustering coefficient. This means that after our replacement and simplification, the correlation among nodes is strengthened, which is the goal of our deanonymization. Figure 11 describes the cumulative degree distribution of $TN_s$. The largest degree in $TN_s$ is 2,519, which is much smaller than that in TN. As shown in Figure 12, the nodes in $TN_s$ with the top 10 degrees and Pageranks are still highly consistent: nodes with the top 10 degrees also have the top 10 Pageranks. Most of them consist of addresses belonging to mining pools and miners. That is to say, other entities are less involved. Compared to TN, the much smaller Pagerank indicates that we do highlight the important nodes. Note that 6 of the largest-degree nodes in TN are regarded as entities in $TN_s$, which to some extent reflects the superiority of our address clustering hypothesis. $TN_s$ highlights the important users and transactions and provides a reference for the analysis of the whole transaction network.

E. USERS IN SHIELDEDPOOL
In Section IV, we identify 87.5% of nodes as mining pools or miners, which implies that pure users who do not participate in mining only make up a small fraction. We also identify 25.7% of transactions as part of the whole process of mining reward distribution. Next we discuss the participation of various entities in shieldedpool. Figure 14 and Figure 15 respectively show the deposits and withdrawals of shieldedpool operated by founders and mining pools (miners). Figure 16 shows statistics of the actual shieldedpool. We identify 95% of deposits and 87.5% of withdrawals. This means that even among the very low proportion of transactions involving z-addresses, the majority of users are founders, mining pools and miners, rather than exchanges, services or individual users. This further shows that few users actually use shieldedpool.

V. CONCLUSION
In this paper, we give a refined deanonymization analysis of Zcash. We first build a transaction network TN. Then we use deanonymization methods to simplify TN. These methods include an improved address clustering heuristic (Heuristic 3) and a mining heuristic (Heuristic 4). We compare and analyze several characteristics of TN and the simplified network $TN_s$. In particular, by investigating the regular behaviors of participants, we find that users participating in shieldedpool are mostly founders, miners and mining pools. Future work may improve our heuristics (Heuristics 3 and 4) from a more rigorous perspective. Other methods of studying complex networks, such as community division and dynamic analysis, might also be useful for deanonymization.