Identification of Bridging Centrality in Complex Networks

Bridging nodes are critical for maintaining information, material, and energy exchanges throughout a complex network. However, the importance of bridging nodes has often been ignored in previous studies, which have instead focused on hub nodes. Here, we propose a novel approach named Bridging Node Centrality (BNC) to identify bridging nodes. BNC is based on different levels of network paths, and it combines the traffic flow and positional properties of nodes, which greatly diminishes the effect of node degree. The performance of BNC was tested on many synthetic and real-world networks, including LFR benchmark networks, social networks, biological networks, and collaboration networks. Comparison with other methods indicated that, whether measured by accuracy or approximate accuracy, BNC was consistently accurate and robust across different types of complex networks.


I. INTRODUCTION
Nodes are essential elements of network topology. In a real-world network, different nodes play different roles or functions in controlling and maintaining the complex system. Thus, investigating the properties of crucial nodes is very important for understanding the topology and function of a complex network; one example is the study of lethality and centrality in protein networks [1]. The crucial nodes of a network mainly include hub nodes and bridging nodes (Figure 1a). Until now, most research has focused on hub nodes, while little work has been done to identify bridging nodes.
A hub node plays an important role in network topology, and is normally located inside a module. In the last decade, many methods have been proposed to identify the hub nodes of networks. Several widely used methods utilize centrality measures that are based on the concepts of degree, shortest path, or positional information. For example, degree centrality [2] is a metric based on the degrees of nodes, and is the simplest method to identify hub nodes. Betweenness centrality [3] is a metric computed from shortest paths, and is used to quantify the capacity of a node to act as a bridge connecting any node pair in a network. K-core decomposition [4] is based on the positional information of nodes, and evaluates the importance of the location of a node by its k-core score. In general, these centrality measures can be classified as global or local methods based on their topological properties. Information centrality [5], closeness centrality [6] and eigenvector centrality [7]-[12] are global methods, while degree centrality, subgraph centrality [13] and clustering coefficient [14] are local methods. Although there are many methods for identifying hub nodes, bridging nodes in complex networks are generally ignored. However, bridging nodes play an important role in maintaining the exchange of information, material and energy across modules throughout a network. Bridging nodes partly overlap with hub nodes, but there are still distinct differences between them, such as their topological positions and their indices in the network.
The associate editor coordinating the review of this manuscript and approving it for publication was Sun Junwei.
VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
To date, only a few studies have focused on identifying bridging nodes [15]-[18]. One example is the bridging centrality proposed by Hwang et al. [17], [18], which combines betweenness centrality and the bridging coefficient to predict bridging nodes. Although bridging centrality takes into account both global and local properties, it is still dominated by the degrees of nodes, as are the centrality measures for hub nodes.
In this study, we developed a definition of bridging nodes and proposed a novel ''Bridging Node Centrality'' (BNC) metric for identifying them. BNC combines the traffic and positional information of each node, and significantly diminishes the influence of node degree. Most other centralities are based on a single factor, whereas BNC, like Bridging Centrality (BrC) [17], combines two factors. However, the two factors of BNC are completely different from those of BrC (see Methods, Section II-B). We tested the performance of BNC on many different types of networks, including LFR benchmark networks [19], social networks [20], [21], biological networks and collaboration networks [22], [23]. In the priori networks, most of the bridging nodes identified by BNC are consistent with our expectations, while in the posteriori networks, most of them are located between modules and mediate module-module communication. The comparison of BNC with other centrality measures demonstrates its robustness and higher accuracy across different network topologies. Furthermore, we applied BNC to the E. coli transcriptional regulation network, and found that all the bridging nodes identified by BNC are located between existing functional modules, based on network topology and biological functional annotations. Thus, BNC offers an effective tool for predicting the crucial nodes that maintain material and energy exchanges between modules in a network.

II. METHODS
Three steps are used here to define and evaluate bridging nodes in a complex network. First, we propose a hierarchical structure of network pathways. Then, based on this hierarchical structure, a mathematical model is defined to identify bridging nodes. Finally, an effective metric is developed to assess the performance of the identification.

A. HIERARCHICAL ELEMENTS OF NETWORK PATHWAY
Some basic concepts of pathways, all built on the shortest paths in the network, are defined in this study. The network pathways are divided into five levels: the first level is shortest paths, the second is routes, the third is trunk routes, the fourth is bridges, and the fifth is bridging nodes.
(1) Shortest path (level one): A path with the minimal distance between two nodes in a network.
(2) Route (level two): The shortest path between a pair of nodes whose distance is larger than 1, excluding the start and end vertices. For example, the colored paths in Figure 1b are the routes in the network.
(3) Trunk route (level three): A kind of route that accounts for most of the information exchange within a complex network. The trunk routes are a subset of the routes, for example, the colored solid lines in Figure 1b.
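The first two levels can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an undirected, unweighted network stored as an adjacency dictionary, and it reads "route" as the interior of every shortest path between two nodes that are more than one hop apart. The function names `bfs_distances`, `shortest_paths` and `routes` are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (level one)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_paths(adj, s, t):
    """Every shortest path from s to t, each as a list of nodes."""
    dist = bfs_distances(adj, s)
    if t not in dist:
        return []
    paths = []

    def walk_back(node, suffix):
        # Step from t towards s, always moving one BFS level closer to s.
        if node == s:
            paths.append([s] + suffix)
            return
        for prev in adj[node]:
            if dist.get(prev) == dist[node] - 1:
                walk_back(prev, [node] + suffix)

    walk_back(t, [])
    return paths


def routes(adj):
    """Level two (assumed reading): interiors of shortest paths longer than 1."""
    found = []
    for s, t in combinations(sorted(adj), 2):
        for path in shortest_paths(adj, s, t):
            if len(path) - 1 > 1:  # distance between the end points exceeds 1
                found.append(tuple(path[1:-1]))  # drop start and end vertices
    return found
```

For the four-node path graph 1-2-3-4, `routes` returns the interiors (2), (2, 3) and (3). Trunk routes are not sketched because the text does not state the rule that selects them.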

B. BRIDGING NODE CENTRALITY
(1) Route-Betweenness (BeR): For each node i, its route-betweenness is the sum of weights of routes that pass through this node.
where $d_i$ is the degree of node $i$, $\delta_j$ is a Dirichlet function, and $N$ is the number of routes.
and $\omega_j$ is the weight of route $j$, where $P_j$ is the probability of information flow through route $j$, and $L_j$ is the length of route $j$.
(2) Bridgeness-Coefficient (BrCoe): The bridgeness-coefficient of node $i$ is the reciprocal of the sum of the distances from it to all other nodes, excluding its neighbors and indirectly adjacent nodes ($dis_{ik} = 1, 2$).
where $n$ is the number of nodes, and $dis_{ik}$ is the distance between nodes $i$ and $k$. The bridgeness-coefficient measures the importance of a node's position based on shortest paths longer than two. It is an important indicator for evaluating the positional centrality of a node.
(3) Bridging Node Centrality (BNC): BNC is a novel metric to identify bridging nodes, and it is defined as the product of the normalized route-betweenness given by formulas (1) and (2) and the bridgeness-coefficient given by formula (3). Thus, based on formulas (4), (5) and (6), the bridging node centrality score can be calculated for each node, and bridging nodes can be identified based on a selected threshold. The specific steps are shown in Figure 1d.
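The verbal definition of the bridgeness-coefficient is specific enough to sketch directly. The snippet below is an illustration under assumptions, not the authors' code: the network is undirected and unweighted, distances are hop counts obtained by breadth-first search, and a node with no other node farther than two hops away (an empty sum, which the text does not cover) is assigned an infinite coefficient as the limit of the reciprocal. The function names are hypothetical.

```python
from collections import deque
from math import inf


def bfs_distances(adj, source):
    """Hop distance dis_ik from source to every reachable node k."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bridgeness_coefficient(adj, node):
    """Reciprocal of the summed distances to nodes more than two hops away."""
    dist = bfs_distances(adj, node)
    # Neighbors (distance 1) and indirectly adjacent nodes (distance 2) are excluded.
    far_total = sum(d for d in dist.values() if d > 2)
    # ASSUMPTION: the empty-sum case is not defined in the text; inf is used here.
    return 1.0 / far_total if far_total else inf
```

On the five-node path graph 1-2-3-4-5, node 1 has two nodes beyond two hops (at distances 3 and 4), so its coefficient is 1/7, while node 2 has only one (at distance 3) and scores 1/3.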

C. ALGORITHM PROCEDURES
As shown in Figure 2, the main procedures of the BNC algorithm are as follows:
Step 1: Input the adjacency list of the complex network;
Step 2: Calculate $dis_{ik}$ and BrCoe for each node;
Step 3: Find all the routes and calculate the corresponding weights for each node;
Step 4: Calculate BeR for each node;
Step 5: Calculate BNC for each node;
Step 6: Output the bridging nodes based on the threshold.
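The six steps can be sketched end to end as follows. Because the formulas for route-betweenness are not reproduced in this text, several choices below are assumptions made only so that the sketch runs: the flow probability $P_j$ is taken as one over the number of shortest paths between the route's end points, the route weight as $\omega_j = P_j / L_j$, each contribution is divided by the node degree $d_i$, and BeR is normalized by its maximum. The network is assumed to be undirected and unweighted, and all function names and the parameter `k` are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import inf


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_paths(adj, s, t, dist_from_s):
    """All shortest s-t paths, found by walking back from t along BFS levels."""
    paths = []

    def walk_back(node, suffix):
        if node == s:
            paths.append([s] + suffix)
            return
        for prev in adj[node]:
            if dist_from_s.get(prev) == dist_from_s[node] - 1:
                walk_back(prev, [node] + suffix)

    walk_back(t, [])
    return paths


def bnc_scores(adj):
    """Steps 1-5: return a BNC-like score for every node of adj."""
    nodes = sorted(adj)                                # Step 1: adjacency list
    dist = {u: bfs_distances(adj, u) for u in nodes}   # Step 2: dis_ik
    br_coe = {}
    for u in nodes:                                    # Step 2: BrCoe
        far = sum(d for d in dist[u].values() if d > 2)
        br_coe[u] = 1.0 / far if far else inf          # ASSUMPTION: empty sum -> inf
    ber = dict.fromkeys(nodes, 0.0)
    for s, t in combinations(nodes, 2):                # Step 3: routes and weights
        length = dist[s].get(t, 0)
        if length <= 1:                                # adjacent or disconnected pair
            continue
        paths = shortest_paths(adj, s, t, dist[s])
        weight = (1.0 / len(paths)) / length           # ASSUMPTION: omega = P / L
        for path in paths:
            for i in path[1:-1]:                       # Step 4: BeR of interior nodes
                ber[i] += weight / len(adj[i])         # ASSUMPTION: scaled by 1 / d_i
    top = max(ber.values(), default=0.0) or 1.0        # ASSUMPTION: max-normalization
    # Step 5: product of normalized BeR and BrCoe; 0 if no route passes through u.
    return {u: (ber[u] / top) * br_coe[u] if ber[u] else 0.0 for u in nodes}


def bridging_nodes(adj, k):
    """Step 6: the k highest-scoring nodes, ties broken by node label."""
    scores = bnc_scores(adj)
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]
```

On a toy network of two triangles joined by a two-node path, the two path nodes receive the highest scores and the triangle nodes that do not touch the path score zero, which matches the intuition that nodes lying between modules should rank first. Enumerating every shortest path is exponential in the worst case, so this sketch is meant for small examples only.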

D. EVALUATION METRICS
In the priori networks, accuracy can be calculated directly from the known bridging nodes. However, in the posteriori networks, the true bridging nodes are unknown, so it is hard to find an effective metric for evaluating the measures that identify them. To solve this problem, we propose a metric called approximation accuracy to assess these different measures. The approximation accuracy is an approximate value of the true accuracy.
The formulas for the accuracy and the approximation accuracy are as follows:
$$\mathrm{Accuracy} = \frac{N_p}{N_r} \tag{7}$$
where $N_p$ denotes the number of predicted bridging nodes, and $N_r$ denotes the number of real bridging nodes.
$$\mathrm{Accuracy}_{app} = \frac{N_p}{N_e} \tag{8}$$
where $N_p$ denotes the number of predicted bridging nodes, $N_e$ denotes the number of estimated bridging nodes, and $\mathrm{Accuracy}_{app}$ denotes the approximation accuracy.
That is to say, we took the common bridging nodes as an estimate of the real bridging nodes, and then calculated the approximation accuracy based on these common bridging nodes. Approximation accuracy can be used in place of accuracy to evaluate the performance of different measures in the posteriori networks.
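Both metrics can be illustrated with a short sketch. Two readings are assumed here and are not confirmed by the text: $N_p$ is counted as the number of predicted nodes that also belong to the reference set (in line with statements such as "11 out of 13 nodes are correct"), and the "common bridging nodes" are taken to be the nodes predicted by at least `min_votes` of the compared measures. The names `accuracy`, `common_nodes`, `approximation_accuracy` and `min_votes` are hypothetical.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Share of the reference bridging nodes that appear among the predictions."""
    reference = set(reference)
    hits = len(set(predicted) & reference)
    return hits / len(reference) if reference else 0.0


def common_nodes(predictions, min_votes):
    """ASSUMED consensus rule: nodes predicted by at least min_votes measures."""
    votes = Counter(node for nodes in predictions.values() for node in set(nodes))
    return {node for node, count in votes.items() if count >= min_votes}


def approximation_accuracy(measure, predictions, min_votes=2):
    """Accuracy of one measure against the consensus estimate of bridging nodes."""
    estimated = common_nodes(predictions, min_votes)
    return accuracy(predictions[measure], estimated)
```

Here `predictions` maps a measure name such as "BNC" or "BC" to the list of nodes that the measure ranks as bridging nodes. With 13 reference nodes and 11 of them recovered, `accuracy` gives 11/13, or about 0.846.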

III. RESULTS
We proposed an index named bridging node centrality (BNC) to quantify the significance of a node in maintaining the connectivity of the whole network. BNC may be determined by multiple network properties, but here we consider only two main factors: the traffic flow of each node and its topological position. That is, we combined route-betweenness and the bridgeness-coefficient, calculated the BNC score for each node, and then took the nodes with a score larger than a particular threshold as the predicted bridging nodes. The BNC index significantly diminishes the effect of high degree and improves the prediction accuracy of bridging nodes. To evaluate BNC in empirical networks, we compared its performance with seven centrality measures: Degree Centrality (DC) [2], Betweenness Centrality (BC) [3], Closeness Centrality (CC) [6], K-Core decomposition (KC) [4], Eigenvector Centrality (EC) [7]-[12], Bridging Centrality (BrC) [17], [18] and Information Centrality (IC) [5]. To further evaluate the performance of BNC in the identification of bridging nodes, six priori networks were analyzed, including two random synthetic networks (LFR benchmark networks [29]) and four empirical real-world networks: karate (Zachary [30]), dolphins (Lusseau [31]), polbooks (Newman [32]) and jazz [33].

A. PERFORMANCE ANALYSIS ON SYNTHETIC NETWORKS
Two typical synthetic networks are shown as representatives of networks with and without overlapping modules. The first synthetic network (bench_o) consists of 256 nodes (Figure 3a) that belong to three modules without any overlapping nodes. All 16 bridging nodes identified by BNC are correct, and thus the prediction accuracy is 100% in this network (Figure 3b). The other benchmark network consists of 128 nodes that are divided into four overlapping modules (Figure 3c). In total, there are eight overlapping nodes among the four modules in this network. As shown in Figure 3c, 11 of the 12 bridging nodes identified by BNC are correct, so the accuracy is 91.7% in this network. Of these 12 nodes, five are overlapping nodes among modules. Furthermore, we compared the accuracy of BNC with that of seven other centrality measures. As shown in Figures 3b and 3d, BNC had the best performance in both synthetic networks, with and without overlapping modules, which indicates that BNC is an effective metric for identifying bridging nodes in synthetic networks.

B. PERFORMANCE ANALYSIS ON REAL-WORLD NETWORKS
The Karate network is a classical real-world benchmark network, which split into two factions because of a conflict between the club manager and the coach. As shown in Figure 4a, there are 10 bridges, involving 13 bridging nodes. Checking the top 13 bridging nodes identified by BNC, we find that 11 of the 13 are correct. In addition, the two nodes with the highest centrality identified by the other measures are nodes 1 and 34, which indicates that these measures are dominated by node degree. By contrast, the top $n^{1/2}$ bridging nodes identified by BNC are, in order, nodes 32, 3, 20, 9, 14 and 1, and they are all real bridging nodes, indicating that BNC greatly diminishes the effect of node degree.
Similar results were observed on the dolphins, polbooks and jazz networks. The dolphins network consists of two large groups that communicate with each other through several individuals (bridging nodes). Figure 4b shows the eight bridging nodes identified by BNC in this network, six of which are real ones. The polbooks network contains three kinds of books: liberal, conservative and neutral. Here we took only the liberal and conservative books into account. BNC identified 7 real bridging nodes among the top 10 predicted nodes (Figure 4c). Similarly, between the two major modules of the jazz network, BNC identified 13 real bridging nodes among the top 14 predicted nodes. To better evaluate performance and robustness, we further compared the results of BNC with those of seven other centrality measures. We found that BNC has the highest accuracy (Figure 5) in all these empirical real-world networks, indicating that BNC is a powerful measure for identifying bridging nodes in real-world networks.

C. APPROXIMATE EVALUATION
For the priori networks, we can analyze the accuracy of different measures as shown above. However, most real-world networks are posteriori ones whose true topological structure is hard to know, which means that the accuracy cannot be computed. Therefore, the approximate accuracy (see Methods) is proposed to replace the accuracy as an evaluation indicator.
The approximation accuracy is an effective approximation of the true accuracy. Using this indicator, we compared BNC with the other centrality measures in both the priori and posteriori networks. As Table 1 shows, 17 networks were analyzed in total, including 11 priori networks and 6 posteriori networks. We compare the approximate accuracy with the accuracy obtained on the priori networks shown above. More test results are given in the supplementary materials.

1) VALIDITY OF APPROXIMATE ACCURACY BASED ON PRIORI NETWORKS
In the synthetic networks shown above, the ranking of the measures by approximation accuracy fits well with their ranking by accuracy. For the networks with overlapping modules, comparing Figure 6a with Figure 3b shows that, except for IC, whose approximation accuracy is abnormally higher than its accuracy, the ranking of the measures by approximation accuracy coincides with their ranking by accuracy. For the networks without overlapping modules, comparing Figure 6b with Figure 3d shows that, except for BrC, whose approximation accuracy is worse than its accuracy, the ranking of the measures by approximation accuracy is consistent with their ranking by accuracy. We obtained similar results (see supplementary materials) on the other synthetic networks (see Table 1), which indicates that approximate accuracy is an effective tool for testing the performance of different methods on synthetic networks. The results show that, whether evaluated by accuracy or approximation accuracy, BNC consistently performs well in identifying bridging nodes in synthetic networks.
In the four real-world networks shown above, the ranking of the measures by approximation accuracy is also similar to their ranking by accuracy. For the karate network, comparing Figure 7a with Figure 5 shows that, except for BC and BrC, whose approximation accuracies are slightly worse than their accuracies, the ranking of the measures by approximation accuracy agrees with their ranking by accuracy.
For the dolphins network (Figure 7b), polbooks network (Figure 7c) and jazz network (Figure 7d), we obtained similar results: the ranking of most measures by approximation accuracy agrees with their ranking by accuracy. However, comparing these figures with Figure 5 reveals some abnormal fluctuations between the accuracy and the approximation accuracy of certain measures. For example, the approximation accuracies of DC, CC and EC are abnormally higher than their accuracies in the dolphins network, the approximation accuracies of CC and BrC are worse than their accuracies in the polbooks network, and the approximation accuracy of BC is abnormally higher than its accuracy in the jazz network. Similar results (see supplementary materials) were observed on other real-world networks, which indicates that approximate accuracy is an effective tool for testing the performance of different methods on real-world networks. Overall, whether evaluated by accuracy or approximation accuracy, BNC consistently performs well in real-world networks.

2) PERFORMANCE ANALYSIS IN POSTERIORI NETWORK
The aim of developing BNC is to precisely identify bridging nodes in complex networks, especially biological networks. Therefore, we use the transcriptional regulation network of E. coli as an example to show the usability of BNC. In the E. coli transcriptional regulation network [34], vertices represent operons and edges represent the regulation of an operon by a transcription factor. As shown in Figure 8, 21 modules were detected in this network with the NeTA method [22]. Functional annotation with DAVID [35], [36] indicated that all 21 modules have significant biological functions. All bridging nodes identified by BNC are located between functional modules and appear to mediate their communication (Figure 8). For example, node 155 is a bridging node connecting the pink module and the light cyan module. The genes of the pink module are enriched in cellular macromolecule metabolic process (BP: 1.80E-33), while the genes of the light cyan module are enriched in xenobiotic metabolic process (BP: 9.1E-6), which indicates the importance of node 155 in connecting these two functional modules. This shows that BNC is an effective metric for identifying bridging nodes based on the approximate accuracy (see supplementary materials). Similar results were obtained on other posteriori networks (see supplementary materials), which indicates that BNC performs well in both priori and posteriori networks.

D. COMPLEXITY ANALYSIS
BNC consists of two factors: BeR and BrCoe. The time complexity of calculating BeR is $O(n^3 + m + mn)$, and that of BrCoe is $O(n^2 + n)$, where $n$ is the number of nodes and $m$ is the number of routes in the network. Thus, the total time complexity of BNC is $O(n(n^2 + n + 2) + m(n - 1))$. This is almost the same as that of BC, KC and BrC, whose time complexity is $O(n^3)$. Among these measures, DC has the lowest time complexity, only $O(n)$, but in most cases it gives poor results.

IV. DISCUSSION AND CONCLUSION
Crucial nodes play important roles in a network [42]-[48]. For example, the failure of a hub node often leads to the functional failure of a local module, and the cascading failure of bridging nodes often leads to the communication failure of the whole network. That is to say, bridging nodes are crucial for maintaining the integrity of a network, and their failure could be much more disruptive than expected. Although bridging nodes and hub nodes are both crucial nodes of networks and overlap with each other to a certain extent, significant differences exist between them. Hub nodes are always located inside a module, while bridging nodes are normally located between modules. Moreover, bridging nodes are important for traffic flow, while hub nodes may not be.
Unlike hub nodes, bridging nodes have been the focus of little work. The reason is that bridging nodes are harder to identify and evaluate than hub nodes. In this paper, we defined the hierarchical elements of network pathways, and proposed an effective model named ''Bridging Node Centrality'' to identify bridging nodes. We tested BNC on synthetic and real-world networks, including social networks, LFR benchmark networks, biological networks and collaboration networks. In comparison with seven other centrality measures, BNC showed high robustness and good performance in all the networks we tested, whether evaluated by accuracy or approximation accuracy. Therefore, we conclude that BNC is an effective method for identifying bridging nodes in complex networks.
It is hard to determine the number of bridging nodes in a posteriori network. The usual approach is to select a threshold and take the nodes with a score larger than this threshold as the target nodes. However, how to choose a reasonable threshold is still an open problem. By testing on a large number of networks, we observed that, in general, the number of bridging nodes drops rapidly beyond the top $n^{1/2}$ ranked nodes (where $n$ is the number of nodes in a network). Therefore, the performance analysis of the different measures focuses mainly on the top $n^{1/2}$ nodes in all the above examples.
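The top $n^{1/2}$ cut-off can be written as a small helper. The sketch assumes the centrality scores are already available as a dictionary from node to score; rounding $n^{1/2}$ up is also an assumption, chosen because it agrees with the six nodes reported for the 34-node karate network. The name `top_sqrt_n` is hypothetical.

```python
from math import ceil, sqrt


def top_sqrt_n(scores):
    """Return the ceil(sqrt(n)) highest-scoring nodes, best first."""
    k = ceil(sqrt(len(scores)))  # ASSUMPTION: n^(1/2) is rounded up
    # Sort by descending score; ties are broken by node label for reproducibility.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:k]
```

For a network of 34 nodes this keeps 6 nodes, and for 128 nodes it keeps 12.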
We should point out that the identification of bridging nodes has its own limitation. If there are too many bridges in a network, identifying bridging nodes is not meaningful. For example, in the US football network almost every node has a link that crosses modules, which means that most of the nodes in this network are potential bridging nodes. It is therefore meaningless to explore bridging nodes in such a network.