A Node Ranking Method Based on Multiple Layers for Dynamic Protein Interaction Networks

Constructing dynamic protein interaction networks (DPIN) is a common way to improve the identification accuracy of essential proteins. Existing methods usually aggregate the DPIN into a single-layer network in which all nodes are sorted by their importance. This treatment loses the dynamic information about proteins that is carried by the multiple layers, and thus affects the identification accuracy of essential proteins. This paper proposes a node ranking method based on multiple layers for the DPIN to address the problem. First, we calculate the centrality values of all nodes for each time-specific layer; then we work out the centrality score of each node by dividing the total of its centrality values across all layers by its layer activity; finally, we sort all nodes by their centrality scores. Unlike the methods based on a single layer, our method makes full use of the centrality values of each protein in the time-specific layers, and thus utilizes the dynamic information of proteins more effectively. To evaluate the effectiveness of the node ranking method based on multiple layers, we apply ten network-based centrality methods on multiple layers and compare the results with those on a single layer. The predictive performance of the ten centrality methods is then validated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure and accuracy. The experimental results for the identification of essential proteins show that the node ranking method based on multiple layers is superior to those based on a single layer and can help to identify essential proteins more accurately.

Studies have shown that essential proteins are also related to human disease genes [3]. Therefore, the identification of essential proteins is of great significance for discovering drug targets, developing new drugs, and promoting the development of the biomedical industry. Many experimental techniques have been proposed to identify essential proteins, such as gene knockouts [4], RNA interference [5] and conditional knockouts [6]. However, these experimental methods are time-consuming and expensive.

The associate editor coordinating the review of this manuscript and approving it for publication was Vincenzo Conti.

The rapid development of modern high-throughput technologies has accumulated large quantities of omics data [7], [8], [9], which provide new opportunities for predicting essential proteins, protein complexes and functional modules from large molecular networks. A series of computational methods have been developed to predict essential [...]

[...] [22], [23], [24], [25], [26], [27]. Xiao et al. [22] constructed an active PIN based on the SPIN and gene expression data for the identification of essential proteins. Li et al. [24] integrated gene expression data, subcellular localization and the SPIN to identify essential proteins. Zhang et al. [25] [...]

[...] [29], LID [30], TP [31], ClusterC [32] and MNC [33]) on multiple layers of the DPIN (M-DPIN) and compare the results with those on a single layer (i.e. the SPIN and the aggregated single layer of the DPIN (A-DPIN)). The predictive performance of the ten centrality methods is evaluated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure and accuracy [30], [31], [32], [33], [34]. The experimental results for the identification of essential proteins show that the node ranking method based on multiple layers is superior to those based on a single layer, and that the proposed method can help to identify essential proteins more accurately.

II. METHODS
where $\mu_i$ and $\sigma_i$ are respectively the mean and standard deviation of the gene expression values of protein $v_i$, and the parameter $k$ is not fixed but can be adjusted in the range (0, 3).

The activity $a_i^h$ of protein $v_i$ at time $t_h$ can be computed by

$$a_i^h = \begin{cases} 1, & g_i^h \ge \tau_i \\ 0, & \text{otherwise} \end{cases}$$

where $a_i^h = 1$ (i.e. the protein $v_i$ is active at time $t_h$) if its gene expression value $g_i^h$ is not less than the activity threshold $\tau_i$; otherwise $a_i^h = 0$.
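The activity rule above can be sketched in a few lines of Python. This is only an illustration: the exact threshold formula is not reproduced in this text, so the function `activity_threshold` uses the stand-in $\tau_i = \mu_i + k\sigma_i$ as a labeled assumption, and the expression values are hypothetical.

```python
from statistics import mean, pstdev


def activity_threshold(expr, k):
    """Stand-in threshold mu + k * sigma.

    ASSUMPTION: this is not necessarily the paper's formula; it only
    depends on the same quantities (mean, standard deviation, k).
    """
    return mean(expr) + k * pstdev(expr)


def activity_profile(expr, k):
    """Return a_i^h for every time point: 1 if g_i^h >= tau_i, else 0."""
    tau = activity_threshold(expr, k)
    return [1 if g >= tau else 0 for g in expr]


# Hypothetical expression values of one protein over six time points.
expression = [0.2, 0.3, 1.9, 2.1, 0.4, 0.1]
profile = activity_profile(expression, k=0.5)
print(profile)
```

Only the comparison `g >= tau` is taken from the text; any other threshold function can be swapped in without changing `activity_profile`.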
For any edge $(v_i, v_j)$ in the SPIN, if there is a time point $t_h \in T$ such that both $v_i$ and $v_j$ are active at $t_h$, then the edge ( [...]

The layer activity of a node $v_i$, denoted by $l_i$, represents the [...]

3) According to 1) and 2), derive the subnetworks $G_1, G_2, \ldots, G_m$ from $G_S$;
4) Calculate the centrality value of each node in each subnetwork by using a centrality method;
5) Calculate the layer activity of each node by formula (3), and its centrality score for the M-DPIN by formula (4);
6) Sort the centrality scores of all nodes in descending order, take the top 100 to top 600 ranked nodes as essential proteins, and finally calculate the identification accuracy.
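Steps 3) to 6) can be sketched as follows, using degree centrality on a toy network. The sketch assumes that the layer activity $l_i$ is the number of layers in which $v_i$ is active and that the centrality score is the sum of the per-layer centrality values divided by $l_i$ (as described in the abstract). Formulas (3) and (4) are not reproduced here, and all function names and data are hypothetical.

```python
from collections import defaultdict


def build_layers(static_edges, activity, m):
    """Layer h keeps a static edge only if both endpoints are active at t_h."""
    return [
        [(u, v) for u, v in static_edges if activity[u][h] and activity[v][h]]
        for h in range(m)
    ]


def degree_centrality(edges):
    """Degree of every node that has at least one edge in this layer."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def multilayer_scores(static_edges, activity, m, centrality=degree_centrality):
    """Sum per-layer centrality values and divide by the layer activity."""
    totals = defaultdict(float)
    for edges in build_layers(static_edges, activity, m):
        for node, value in centrality(edges).items():
            totals[node] += value
    scores = {}
    for node, profile in activity.items():
        layer_activity = sum(profile)  # ASSUMPTION: number of active layers
        scores[node] = totals[node] / layer_activity if layer_activity else 0.0
    return scores


def rank_nodes(scores):
    """Sort nodes by descending score; ties are broken by node name."""
    return sorted(scores, key=lambda n: (-scores[n], n))


# Toy static network and activity profiles over m = 3 time points.
static_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
activity = {"A": [1, 1, 0], "B": [1, 0, 0], "C": [1, 1, 1], "D": [0, 1, 1]}

scores = multilayer_scores(static_edges, activity, m=3)
ranking = rank_nodes(scores)
print(ranking)
```

Any other centrality method can be passed through the `centrality` parameter, provided it maps a layer's edge list to a dictionary of node values.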

As shown in Table 1, we calculate the degree centrality values (for the A-DPIN) and degree centrality scores (for the M-DPIN) of proteins YGL004C, YDR335W, YER179W, YPR181C, YJR093C and YKR002W. These protein nodes have the same degree in the A-DPIN, but their degrees in the individual layers (subnetworks) usually differ. Therefore, the centrality values in the A-DPIN cannot effectively reflect the variation across layers that is caused by the dynamics of gene expression. In the M-DPIN, by contrast, the degree centrality scores of these nodes tend to be distinguishable, because they are obtained by averaging the centrality values over the layers in which the nodes are active.

[...] cycles ration (CR) [29], local interaction density centrality (LID) [30], topological centrality (TP) [31], cluster coefficient centrality (ClusterC) [32] and maximum neighbor component centrality (MNC) [33], which are defined respectively as follows: [...]

It can be seen from Figure 1 that, compared with the SPIN and A-DPIN, the M-DPIN increases the number of essential proteins identified by these centrality methods, especially for the path-based and degree-based centrality methods (such as BC, CC and DC). This indicates that the node ranking method based on multiple layers is superior to the existing methods based on a single layer.

[...] Table 2, among the ten centrality methods, [...] Table 3.
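The six statistical measures used to validate the centrality methods (sensitivity, specificity, positive predictive value, negative predictive value, F-measure and accuracy) can be computed from a ranked list as sketched below. The sketch assumes the standard confusion-matrix definitions, with the top-ranked nodes treated as predicted essential proteins; the paper's own formulas are not reproduced in this text, and the data are hypothetical.

```python
def evaluate_ranking(ranked, essential, top_n):
    """Treat the top_n ranked proteins as predicted essential proteins."""
    predicted = set(ranked[:top_n])
    rest = set(ranked[top_n:])
    tp = len(predicted & essential)   # essential and predicted essential
    fp = len(predicted) - tp          # non-essential but predicted essential
    fn = len(rest & essential)        # essential but not predicted
    tn = len(rest) - fn               # non-essential and not predicted

    def ratio(a, b):
        return a / b if b else 0.0

    sn = ratio(tp, tp + fn)
    sp = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    return {
        "SN": sn,
        "SP": sp,
        "PPV": ppv,
        "NPV": npv,
        "F": ratio(2 * sn * ppv, sn + ppv),
        "ACC": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical ranking of ten proteins, four of which are essential.
ranked = [f"P{i}" for i in range(1, 11)]
essential = {"P1", "P2", "P4", "P8"}
result = evaluate_ranking(ranked, essential, top_n=4)
print(result)
```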

It can be seen from Table 3 that the sensitivity [...]

In Figure 3, the green line is always straight because the SPIN is independent of $k$, so we use it only as a reference.

It is easy to find that in most cases, the curve of the number [...]

In order to further validate the effectiveness of our method, we conducted experiments on the BioGRID [38] database, which contains 5,616 proteins and 52,833 protein interactions; 1,199 of the proteins are known to be essential. We apply ten network-based centrality methods on the M-DPIN and compare the results with those of the same methods applied on the static network (SPIN) and the aggregated single-layer network (A-DPIN). The number of essential proteins identified in the SPIN, A-DPIN and M-DPIN is counted for the top 1%, top 5%, top 10%, top 15%, top 20% and top 25% of ranked proteins, as shown in Figure 4.

It can be seen from Figure 4 that the number of essential proteins identified at the top 1%, top 5%, top 10%, top 15%, top 20% and top 25% in the M-DPIN is almost always higher than in the SPIN and A-DPIN, which indicates the effectiveness of our method based on multiple layers for identifying essential proteins.
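Counting the essential proteins among the top-ranked percentages can be sketched as follows. The rounding of the cutoff (here rounded up with `math.ceil`) is an assumption, since the text does not state how fractional cutoffs are handled, and the ranking and essential set are hypothetical.

```python
import math


def essential_in_top(ranked, essential, percent):
    """Count known essential proteins among the top `percent` % of the ranking."""
    # ASSUMPTION: a fractional cutoff is rounded up to the next whole protein.
    cutoff = math.ceil(len(ranked) * percent / 100)
    return sum(1 for node in ranked[:cutoff] if node in essential)


# Hypothetical ranking of 20 proteins and a hypothetical essential set.
ranked = [f"P{i}" for i in range(1, 21)]
essential = {"P1", "P3", "P4", "P9", "P15"}

counts = {p: essential_in_top(ranked, essential, p) for p in (5, 10, 25)}
print(counts)
```

Running the same function on the rankings obtained from the SPIN, A-DPIN and M-DPIN yields the per-network counts that a comparison such as Figure 4 is built from.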

Constructing the DPIN by integrating gene expression data into the SPIN is a common way to improve the identification accuracy of essential proteins. This approach usually aggregates all layers of the DPIN into a single-layer network, and then calculates centrality values in the single-layer network by using a centrality method. Since the different states in the different layers of the DPIN are aggregated into one state in the single-layer network, the dynamic information about a node (or an edge) in the different layers is lost in the single layer. This makes the aggregated single-layer network (A-DPIN) less accurate than the M-DPIN, and hence affects the estimation accuracy of centrality methods.
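The loss of information caused by aggregation can be illustrated with a toy example. The sketch below builds the A-DPIN as the union of all layer edges (an edge is kept if both endpoints are active at some common time point) and compares its degree values with the multi-layer degree scores. It assumes that the layer activity is the number of layers in which a node is active; the network and the activity profiles are hypothetical.

```python
def layer_edges(static_edges, activity, h):
    """Edges whose two endpoints are both active at time point h."""
    return [(u, v) for u, v in static_edges if activity[u][h] and activity[v][h]]


def degree(edges, node):
    """Number of edges in `edges` that touch `node`."""
    return sum(1 for u, v in edges if node in (u, v))


# Hypothetical static network and activity profiles over m = 2 time points.
static_edges = [("X", "A"), ("X", "B"), ("Y", "C"), ("Y", "D")]
activity = {
    "X": [1, 1], "A": [1, 1], "B": [1, 1],
    "Y": [1, 1], "C": [1, 0], "D": [0, 1],
}
m = 2

layers = [layer_edges(static_edges, activity, h) for h in range(m)]
aggregated = sorted(set().union(*layers))  # A-DPIN: union of all layers


def multilayer_degree_score(node):
    """Sum of per-layer degrees divided by the number of active layers."""
    total = sum(degree(edges, node) for edges in layers)
    return total / sum(activity[node])  # ASSUMPTION: layer activity = active layers


for node in ("X", "Y"):
    print(node, degree(aggregated, node), multilayer_degree_score(node))
```

In this toy case X and Y have the same degree in the aggregated network, while their multi-layer scores differ because Y never interacts with both of its partners at the same time point.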

In this paper, we propose a node ranking method based on multiple layers for the DPIN to address this problem. Unlike the existing node ranking methods, our method calculates the centrality score of each node by averaging its centrality values over all layers in which it is active, which ensures that the time-specific biological knowledge inherent in each layer is used effectively when calculating centrality scores. To evaluate whether the node ranking method based on multiple layers for the DPIN is more suitable for the identification of essential proteins, we apply ten network-based essential protein discovery methods on multiple layers of the DPIN and compare the results with those on a single layer (i.e. the SPIN and the aggregated single layer of the DPIN). The experimental results for the identification of essential proteins show that the node ranking method based on multiple layers is superior to those based on a single layer. This shows that our method utilizes the dynamic information of proteins more effectively than the methods based on a single layer.

As future work, it would be interesting to evaluate the importance of a node in the DPIN by utilizing the dynamic information of each time-specific layer more effectively.