A Survey on Centrality Metrics and Their Network Resilience Analysis

Centrality metrics have been studied widely in network science research. They have been applied to various networks, such as communication, social, biological, geographic, and contact networks, across different disciplines. In particular, centrality metrics have been used to study and analyze targeted attack behaviors and to investigate their effect on network resilience. Although a rich volume of centrality metrics has been developed since the 1940s, only a few (e.g., degree, betweenness, or clustering coefficient) are in common use. This paper introduces various existing centrality metrics and discusses their applicability in various networks. In addition, we conducted an extensive simulation study to demonstrate and analyze network resilience under targeted attacks using the surveyed centrality metrics on four real network topologies. We also discuss the algorithmic complexity of the centrality metrics surveyed in this work. Through the extensive experiments and discussions of the surveyed centrality metrics, we encourage their use in solving various computing and engineering problems in networks.


Motivation
Identifying central nodes in a network is critical to designing a network that is resilient against faults or attacks. However, identifying which nodes are vital in a network is a nontrivial task. Centrality metrics have been studied since the 1940s and began being more formally incorporated into graph theory in the 1970s [60]. Although many of these early studies had particular applications and language in the social sciences, a more interdisciplinary approach emerged in the late 1990s and the early 2000s in the nomenclature of Network Science [144]. In the resilience context, there is an extensive literature studying the effect of targeted attacks, or attacks on nodes that have high centrality [10,136]. A typical scenario includes an intelligent attacker that selects a target node or nodes to disrupt or compromise the network. Since the 2000s, centrality metrics have grown in significance in communication networks as network resilience and cybersecurity concerns have become more prominent. The most common centrality metrics used in this area of research are degree and betweenness [81,193], but they are often chosen because they are popular, without justification of their relevance to the particular scenario. Other studies have used other metrics [2,97,137], such as eigenvector, closeness, PageRank, and so forth. However, given the rich volume of existing centrality metrics that have been studied in other scientific fields for decades, their merits and relevant usages have been insufficiently appreciated and leveraged in various communication and network domains. In this survey, we aim to present this rich volume of centrality metrics and how they can be used in various network and communication research. In addition, we demonstrate the performance of each centrality metric and examine which metrics are more relevant than others, based on a comparative performance analysis using four different real network topologies.
Due to the space constraint, we placed these experimental results in the supplement document.
• Based on the extensive survey and experimental performance comparison of the centrality metrics, we share what we have learned, providing insights, limitations, and promising future research directions.

MULTIDIMENSIONAL CONCEPTS OF CENTRALITY AND ITS APPLICATIONS IN DIVERSE DOMAINS
The multi-disciplinary development of concepts of node or network centrality has generated multifaceted interpretations of the subject. In this section, we discuss how centrality has been described and applied in several different disciplines.

Multidimensional Concepts of Centrality
A fundamental motivation for the study of centrality is the belief that one's position in the network impacts their access to information [113,175], status [93], power [17], prestige [159], and influence [63]. We categorize these concepts into three classes as follows: (1) communication activity based on individual characteristics; (2) influence based on both individual and network characteristics; and (3) communication control based primarily on network characteristics. Individual characteristics refer to the way an individual node (i.e., user) interacts with other nodes such as the frequency of interactions (e.g., posting or sending information in online social networks, OSNs, or sending signals or packets in communication networks), the degree of information sharing with others, and the quality of the signals (e.g., posted comments). Network characteristics predominantly indicate the manner in which the node is connected with other nodes; it is these characteristics which can be captured by centrality.
Communication Activity. This aspect of centrality covers the amount and type of activity an individual node participates in as part of its communications with other nodes. The relative activity, compared with other nodes, can ultimately affect its power or influence. Klein et al. [104] demonstrated a connection between the communication activity and the influence of a user in an OSN. In OSNs, influential users tend to more easily spread information they choose to communicate. However, such well-connected users are less likely to disseminate information received from their extensive network. Hence, this characteristic in terms of frequency or type of interactions of information sharing is a critical factor related to centrality [7].
Influence. The term influence has been used to interpret what centrality may represent in networks. In addition, a number of terms are used to characterize and study the 'influence' of a node as follows:
• Power: Friedkin [63] examined the relationships between network centrality and the mutual influence of members in a group. An individual member's centrality affects other members' opinions and informs a dynamic process of updating their opinions.
• Status: Katz [93] proposed the idea that a member's centrality within a network depends upon not only the number of adjacent neighbors but also the status of each neighbor, i.e., the highest-status member who obtains the majority of choices in a network becomes the most influential. Katz introduced an advanced metric to calculate the status of each member in a network based on the total number of choices, implying the edges in a directed graph, toward each member via a single step up to multiple steps that entail attenuation in a connection of a series [161].
• Prestige: Bonacich [17] and Katz [93] defined a vertex's prestige in a network based on its neighbors. For example, eigenvector centrality is used to derive the prestige of each vertex [159].
• Resources: How much resource one can obtain from their network has been discussed within the context of an exchange network [36]. In an exchange network, consisting of a set of members exchanging opportunities, each member needs to decide whether to connect with others to increase their opportunities or resources even when unaware of members outside of its own set of exchange opportunities [51]. This feature facilitates the analysis of the power distribution as related to the position in the network [36,51]. In exchange networks, a node's power is not necessarily aligned with the number of connections [36], while most centrality metrics that are more relevant to quick spreading or mitigating influence (e.g., information diffusion or disease transmission) are more reliant on the number of direct or indirect connections with other nodes. Bonacich [17] reflected this belief in his eigenvector-type centrality where a node's power is measured based on the power of its neighbors. Laumann and Pappi [111] discussed community elite, a set of necessary members in exchange networks in which their position and other attributes determine the structure of influence.
• Bridging: Saito et al. [162] introduced the concept of super-mediators as the set of nodes that transfer information between nodes. The capability of a certain node to receive information from numerous nodes and propagate this information to others indicates its influence [113,175]. Betweenness [59,136] is an example of a metric representing a bridging role in a network, where the node with high betweenness can connect other nodes as a key mediator. This concept of a broker in sociology is commonly described as a node with high betweenness that can play a key role in bridging two separate groups [136].
Communication Control. A node's communication control describes how the node can control communications with others, which can naturally affect the node's centrality. The two common factors affecting this communication control are [27,113]:
• Communicability: With respect to group performance and individual behavioral patterns, Leavitt [113] stressed the importance of a network topology because it determines information accessibility that can affect successful task executions.
• Network size: A network can be viewed as a resource, as each individual gathers information via connections within networks [27]. A node's network size is a typical measure of the node's centrality in terms of the resources available to it, including both the quality and the quantity of information in its network [125].

Centrality Metrics Research in Multidisciplinary Domains
Centrality metrics have been studied since the 1940s. By the late 1970s, a rich volume of studies discussing and experimenting with centrality already existed [60]. Hence, in the late 1970s, Freeman [60] tried to clarify the concepts and utility of existing metrics. In this section, we survey how centrality has been studied in various disciplines, including mathematics, chemistry, anthropology, geography, economics, psychology, sociology, biology, management, computer science, political science, and psychiatry. Due to the space constraint, our discussions on this are placed in Section 1 of the supplement document. Because 'Network Science' is a highly multidisciplinary academic field, we discuss how it has studied centrality from a graph theoretical perspective. Fig. 1 summarizes the evolution of centrality across diverse disciplines along with the emergence of the Network Science discipline. The origin of developing centrality metrics is linked with the birth of graph theory [44]. Although many fields have used centrality metrics for a variety of purposes, the visibility of their usefulness increased considerably once the Network Science field officially formed in the 2000s. In particular, in 2006, the US National Research Council (NRC) defined Network Science as an academic field [144]. In 2009, the Department of Defense (DoD) initiated a research effort on Network Science for developing battlefield platforms with advanced technology reflecting the theme of network-centric warfare. The US Army Research Laboratory (US-ARL) initiated a collaborative research program, the Network Science Collaborative Technology Alliance (NS CTA), in order to encourage the development of advanced network science-based technologies to support ground soldiers in network-centric warfare [156], which has further triggered the advancement and maturity of network science research. Now we discuss a variety of centrality metrics in various disciplines.
We categorize the types of centrality metrics into three classes: point centrality metrics, graph centrality metrics, and group selection centrality metrics. The following three sections address these three classes of centrality metrics.

POINT CENTRALITY METRICS
In this section, we introduce various types of centrality metrics using the following common notations. A network is represented by a graph $G$ with a set of vertices, $V = \{v_1, v_2, \ldots, v_n\}$, representing nodes and a set of edges, $E = \{e_1, e_2, \ldots, e_m\}$, representing connections (links or relationships) between pairs of nodes, where $n$ is the number of nodes in the network and $m$ is the number of edges [19]. An adjacency matrix $\mathbf{A}$ captures the links between nodes by the value in its entries, e.g., $a_{ij} \neq 0$ only when an edge exists between nodes $v_i$ and $v_j$, with the value 1 in a simple, undirected graph. We classify point centrality metrics in terms of three classes: local centrality metrics, iterative centrality metrics, and global centrality metrics.

Local Centrality Metrics
Local centrality metrics measure the centrality of a node based on its local neighborhood topology. Each of these metrics is a variation of the degree of a node, sometimes in combination with the degrees of nodes in the local neighborhood.
Degree centrality. The simplest and most well-known centrality metric is the node degree, or the number of links or edges incident to the node. The degree of vertex $v_i$ is defined, mathematically, by: $C_{\text{deg}}(v_i) = \sum_{j=1}^{n} a_{ij}$. In the social network context, the degree indicates the amount of activity of the actor [60,184]. Hanneman and Riddle [76] describe the degree as measuring the opportunity and alternatives for the actor. In social, communication, and computer networks, degree represents a measure of the number of channels for information exchange (i.e., sending and receiving data) [24,136]. A standardization or normalization of degree is given by $C_{\text{deg}}(v_i)/(n-1)$. This form is useful for comparison across networks [60]. Nodes with high degree are called hubs. In a directed network, the in-degree and the out-degree of the node may be unequal because the adjacency matrix need not be symmetric. For in-degree, $C_{\text{in-deg}}(v_i) = \#\{\text{edges directed toward } v_i\} = \sum_{j=1}^{n} a_{ji}$, with the out-degree defined analogously based on nonzero entries in the $i$th row of the adjacency matrix. A node with significantly higher in-degree than out-degree or higher in-degree on average compared with other nodes is considered to have prestige [184]. A popular example exists in citation networks where the directed edges correspond to one document citing another. Documents with many citations have high in-degree. Other modified examples corresponding to in-degree in citations include the number of citations of a given author and journal impact factor [65].
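As a minimal sketch of these definitions, the following Python snippet computes in-degree, out-degree, and normalized degree from a 0/1 adjacency matrix. The function names and the small citation network are hypothetical, and the convention that entry [i][j] marks an edge from node i to node j is an assumption made for illustration.

```python
def degree_centralities(adj):
    """Return (in_degree, out_degree) lists for a 0/1 adjacency matrix.

    Assumed convention: adj[i][j] == 1 means an edge from node i to node j.
    """
    n = len(adj)
    out_deg = [sum(adj[i]) for i in range(n)]                      # row sums
    in_deg = [sum(adj[j][i] for j in range(n)) for i in range(n)]  # column sums
    return in_deg, out_deg


def normalized_degree(adj):
    """Degree divided by (n - 1), for a simple undirected graph."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


# Hypothetical citation network: documents 0, 1, 3 cite document 2; 2 cites 0.
citations = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
in_deg, out_deg = degree_centralities(citations)
print(in_deg)   # [1, 0, 3, 0]
print(out_deg)  # [1, 1, 1, 1]
```

Document 2 has the highest in-degree, which matches the prestige interpretation of heavily cited documents.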
Semi-Local centrality. While a hub node has immediate access to a large number of neighbors, the hub may exist on the periphery of the network where most of those neighbors have little to no access to the rest of the network. Hence, hubs may not be the ideal nodes for measuring influence, i.e., the capability of spreading (information or disease) with efficacy. Seeking a middle ground between hub nodes and nodes that have high betweenness (see Eq. (26)), Chen et al. [28] developed semi-local centrality, sometimes called local centrality, as a low-complexity approach that takes into account the neighbor degrees of the node. This semi-local centrality of a node $v_i$ is defined as: $C_{\text{s-local}}(v_i) = \sum_{v_j \in N(v_i)} \sum_{v_k \in N(v_j)} N_2(v_k)$, where $N(v_i)$ is the set of neighbors of $v_i$ and $N_2(v_k)$ is the number of nearest and next nearest neighbors of $v_k$. This metric compares favorably to ranks generated from an SIR process.
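The nested sum in semi-local centrality can be sketched as follows. This is an illustrative implementation under stated assumptions: the graph is undirected and stored as a dictionary mapping each node to its neighbor set, and the helper names and example graph are hypothetical.

```python
def two_hop_count(graph, w):
    """Number of nearest and next-nearest neighbors of w (w itself excluded)."""
    reach = set(graph[w])
    for u in graph[w]:
        reach |= graph[u]
    reach.discard(w)
    return len(reach)


def semi_local_centrality(graph, v):
    """Sum over neighbors u of v of the two-hop counts of u's neighbors."""
    return sum(
        sum(two_hop_count(graph, w) for w in graph[u])
        for u in graph[v]
    )


# Hypothetical example: a triangle 0-1-2 attached to a small star centered at 3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(semi_local_centrality(example, 2))  # 27
print(semi_local_centrality(example, 3))  # 21
```

Nodes 2 and 3 have the same degree in this example, yet the semi-local score separates them because node 2 sits next to the denser triangle.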
Hybrid degree centrality. In the context of spreading processes, whether for information sharing or disease transmission, the spreading probability $p$ can determine the difference between the influence of the local and near-local neighborhood topology. A small $p$ would intuitively favor a measure like degree centrality, while a larger $p$ would favor a more global measure. Ma and Ma [122] incorporated the influence of the scale of $p$ into centrality by adapting degree centrality and semi-local centrality [28] to create the hybrid degree centrality of node $v_i$. It combines the degree centrality $C_{\text{deg}}$ and the modified local centrality $C_{\text{m-local}}$ using the spreading probability $p$, a normalizing factor $\alpha$ that scales the degree centrality to the magnitude of the modified local centrality, and an optimization parameter $\beta$. The modified local centrality is defined in terms of $N(v_i)$, the set of neighbors of $v_i$; $N_2(v_j)$, the number of nearest and next nearest neighbors of node $v_j$; and $d(v_j) = C_{\text{deg}}(v_j)$, the number of neighbors of $v_j$.
Volume centrality. If the spreading process dies out, has a limited reach from its initial source, or has a time-out component, then it makes sense that this might be entirely captured by the topology in the local neighborhood of the source node. Let $N_h(v_i)$ denote the set of neighbors within a distance $h$ of $v_i$. Then, the volume centrality of the node $v_i$ for a given $h$ is defined as [99]: $C_{\text{vol}}^{h}(v_i) = \sum_{v_j \in N_h(v_i)} C_{\text{deg}}(v_j)$. This is a slight modification of the original definition [187], which uses the set $\tilde{N}_h(v_i) = N_h(v_i) \cup \{v_i\}$. With this latter definition, when $h = 0$, volume centrality is degree centrality. However, this is already captured when calculating the degrees of nodes in $N_1(v_i)$. Kim and Yoneki [99] showed that larger $h$ correlates well with closeness centrality (see Eq. (32)). However, as $h$ increases, the complexity of the method increases. Hence, Wehmuth and Ziviani [187] demonstrated that $h = 2$ results in a good trade-off between identifying nodes that diffuse information well and the cost of this identification.
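A small sketch of volume centrality, assuming an undirected graph stored as a dictionary of neighbor sets. The function names and the `include_self` flag (which switches to the variant that also counts the source node) are hypothetical choices made for illustration.

```python
from collections import deque


def nodes_within(graph, v, h):
    """Nodes at distance 1..h from v (v excluded), found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond distance h
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    del dist[v]
    return set(dist)


def volume_centrality(graph, v, h=2, include_self=False):
    """Sum of the degrees of all nodes within distance h of v."""
    members = nodes_within(graph, v, h)
    if include_self:
        members.add(v)  # variant that also counts the source node
    return sum(len(graph[u]) for u in members)


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(volume_centrality(example, 0, h=1))  # 5
print(volume_centrality(example, 0, h=2))  # 8
```

With `include_self=True` and `h=0`, the function returns the degree of the node, which mirrors the remark about the original definition.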
Clustering coefficient. One of the characterizations of small-world networks is the increased likelihood of neighbors of a node to be connected. Social networks tend to exhibit this property, and an early characterization of this high clustering property is the density of an ego network (i.e., as described by Burt [26], the network of the neighbors of a given node excluding that node). Watts and Strogatz [186] proposed the same metric independently as a way to quantify the clustering of nodes in a given graph and characterize the position of the graph within the spectrum of random to small-world graphs. Their definition has proven incredibly popular. It is expressed by: $C_{\text{clustering}}(v_i) = \frac{\sum_{v_j, v_k \in N(v_i)} a_{jk}}{|N(v_i)|\,(|N(v_i)| - 1)}$. Note that each edge is counted twice in the summation for an undirected graph, so the number of unique edges (half the sum) is normalized by $\binom{|N(v_i)|}{2}$, which is the number of possible edges between the neighbors of $v_i$. For a directed network, there are twice as many possible directed edges as in the undirected case since the adjacency matrix is no longer symmetric, i.e., $a_{jk}$ may not equal $a_{kj}$, and the set $N_{\text{out}}(v_i)$ of neighbors that $v_i$ links to is used. This measure is often called the local clustering coefficient to distinguish it from a global measure of transitivity.
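The local clustering coefficient for an undirected graph can be sketched as below. The dictionary-of-sets representation, the function name, and the convention of returning 0.0 for nodes with fewer than two neighbors are assumptions for illustration.

```python
def local_clustering(graph, v):
    """Fraction of possible edges among the neighbors of v that actually exist."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases are reported as 0
    # Ordered pairs (u, w) of adjacent neighbors: each edge is counted twice.
    links_twice = sum(1 for u in nbrs for w in graph[u] if w in nbrs)
    return links_twice / (k * (k - 1))


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(local_clustering(example, 0))  # 1.0
print(local_clustering(example, 3))  # 0.0
```

Dividing the doubled edge count by k(k - 1) is equivalent to dividing the number of unique edges by the binomial coefficient "k choose 2".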
Redundancy. Burt [26] introduced the notion of redundancy in social networks to describe the concept of neighborhood overlap of a node and its neighbors within the node's ego network. Burt demonstrated redundancy's detriment to social capital within socio-economic networks. This is defined as: $\text{redundancy}(v_i) = \sum_{v_j \in N(v_i)} \sum_{v_q \in N(v_i) \cap N(v_j)} p_{iq}\, m_{jq}$, where $p_{iq} = \frac{a_{iq} + a_{qi}}{\sum_{v_k \in N(v_i)} (a_{ik} + a_{ki})}$ and $m_{jq} = \frac{a_{jq} + a_{qj}}{\max_{v_k \in N(v_i) \cap N(v_j)} (a_{jk} + a_{kj})}$. Burt uses redundancy to calculate the effective size (or degree) of a node's ego (or neighborhood) network, taking redundancy into account, as $|N(v_i)| - \text{redundancy}(v_i)$. Borgatti [18] reformulated these expressions to show that for a simple undirected graph, the redundancy is simply $\text{redundancy}(v_i) = 2t / \text{degree}(v_i)$, where $t$ is the number of links between the neighbors of $v_i$, and the effective size of $v_i$ is $\text{degree}(v_i) - 2t / \text{degree}(v_i)$.
Entropy-based measures. In the thermodynamics context, entropy is a measure of the order of systems. In the information theory context, entropy measures the amount of information absent in a given process. These concepts of entropy have been used in networks, either in characterizing systems or processes [4]. Nie et al. [140] adapted the concept of entropy to centrality. They constructed two variants to measure the entropy: local entropy, as the node's contribution to network entropy, and mapping entropy, which incorporates a consideration of the neighbors of the node.
ClusterRank. As noted with the redundancy measure, high clustering can have an adverse effect on information propagation or spreading. With this insight, Chen et al. [29] proposed ClusterRank, incorporating both the degree and the interactions among the neighbors via the clustering coefficient [186]. The ClusterRank of node $v_i$ is defined as: $C_{\text{ClusterRank}}(v_i) = f\big(C_{\text{clustering}}(v_i)\big) \sum_{v_j \in N_{\text{out}}(v_i)} \big(C_{\text{out-deg}}(v_j) + 1\big)$, where Chen et al. choose $f\big(C_{\text{clustering}}(v_i)\big) = 10^{-C_{\text{clustering}}(v_i)}$, $N_{\text{out}}(v_i)$ is the set of directed edges emanating from $v_i$ (i.e., the "followers" of $v_i$), and $C_{\text{clustering}}(v_i)$ is the local clustering coefficient defined for directed networks. The summation also adds the degree of the node $v_i$ through the unity term. The coefficient $f$ acts as a damping weight where higher clustering is penalized for having fewer unique links to different parts of the network. This damping weight is mitigated if many of the neighbors of $v_i$ have large numbers of additional neighbors.
H-index. Hirsch [78] introduced the h-index to measure the impact of the scientific output of a researcher. A researcher has index $h$ if $h$ is the largest integer $\ell$ such that the researcher has at least $\ell$ papers each having at least $\ell$ citations. Korn et al. [108] adapted the $h$-index (calling it the lobby index) to discover important nodes in networks. A node has index $h$ if the node has at least $h$ neighbors, each having at least degree $h$, with the rest of the neighbors having at most degree $h$. Extending this concept, Lü et al. [120] defined the operator $\mathcal{H}$ that, for any node $v_i$, takes the degrees of the set of its neighbors as an input and returns the maximum number $h$ such that at least $h$ inputs have value at least $h$. This can be expressed as: $h(v_i) = \mathcal{H}\big(d(v_{j_1}), d(v_{j_2}), \ldots, d(v_{j_{d(v_i)}})\big)$ (9), where $v_{j_1}, v_{j_2}, \ldots, v_{j_{d(v_i)}}$ are the neighbors of $v_i$ and $d(\cdot)$ denotes the degree. If the zero-order $h$-index of node $v_i$ is its degree, i.e., $h^{(0)}(v_i) = d(v_i)$, then the value in Eq. (9) can be called the first-order $h$-index. Then the $r$-order $h$-index is defined as $h^{(r)}(v_i) = \mathcal{H}\big(h^{(r-1)}(v_{j_1}), h^{(r-1)}(v_{j_2}), \ldots, h^{(r-1)}(v_{j_{d(v_i)}})\big)$; this sequence converges to the coreness as the order increases, i.e., $k_s(v_i) = \lim_{r \to \infty} h^{(r)}(v_i)$.
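The operator and its repeated application can be sketched as follows. This is an illustrative implementation assuming an undirected graph stored as a dictionary of neighbor sets and a synchronous update of all nodes; the function names and the iteration cap are hypothetical.

```python
def h_operator(values):
    """Largest h such that at least h of the given values are >= h."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def iterated_h_index(graph, max_order=100):
    """Apply the operator repeatedly, starting from degrees, until values stop changing."""
    current = {v: len(nbrs) for v, nbrs in graph.items()}  # zero-order: degree
    for _ in range(max_order):
        updated = {v: h_operator([current[u] for u in graph[v]]) for v in graph}
        if updated == current:
            break
        current = updated
    return current


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(h_operator([10, 8, 5, 4, 3]))  # 4
print(iterated_h_index(example))     # {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
```

In this example the fixed point assigns 2 to the triangle nodes and 1 to the rest, which agrees with the coreness obtained by pruning degree-1 nodes.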
Curvature. The success of hyperbolic models for networks [109] in reproducing observations from real networks has spurred some interest in measuring the intrinsic geometry of complex networks. Curvature in networks is a particularly interesting aspect to measure since the models typically presume a constant curvature, but the reality (and data) is rarely that convenient. There are several competing approaches for curvature. One early measure by Eckmann and Moses [50] derives a curvature that is identical to the local clustering coefficient of Watts and Strogatz [186] and is used to reveal a connection between high curvature and common topics in the World Wide Web. A popular approach is derived from a Gaussian curvature on planar graphs [77,94] that has been generalized for complex networks [107] as: $C_{\text{curv}}(v_i) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{Q_k(v_i)}{k}$, where $Q_k(v_i)$ is the number of $k$-cliques incident to $v_i$. A truncated version of this is used in [189] to compare a network model with data. A third approach of recent interest adapts a notion of Ricci curvature to networks via the transfer of a mass distribution from one vertex to another, and hence can be defined on an edge [88,145]. The curvature at a vertex is then a weighted sum of the curvature of the incident edges, $C_{\text{Ricci-curv}}(v_i) = \frac{1}{d(v_i)} \sum_{v_j \in N(v_i)} \kappa(v_i, v_j)$, where $\kappa(v_i, v_j) = 1 - W(v_i, v_j)$, $W(v_i, v_j)$ is the optimal mass transport cost, and the mass is typically a unit weight distributed proportionally by an edge weight to the neighbors of the vertices. Curvature has been shown to have relevance to network fragility [165] and network congestion [181]. An alternate adaptation of Ricci curvature [58,174] has also received some interest.

Iterative Centrality Metrics
Iterative centrality metrics are calculated through iterative processes. In some cases, the number of iterations is fixed and determined by a characteristic of the network (e.g., maximum degree), and these metrics still incorporate mostly local information of the network. However, in most cases, the number of iterations depends on the convergence rate of values at each node. Global information is incorporated into the metric at the node via these iterative processes.
$k$-shell index or coreness. The most efficient spreaders have been found to reside in the core of the network [103], which can be determined by the process of assigning each node an index (or a positive integer) value derived from the $k$-shell decomposition. The decomposition and assignment are as follows: Nodes with degree $k = 1$ are successively removed from the network until all remaining nodes have degree strictly greater than 1. All the removed nodes at this stage are assigned to be part of the $k$-shell of the network with index $k_s = 1$, or the 1-shell. This is repeated with the increment of $k$ to assign each node to distinct $k$-shells. The $k$-shell index of node $v_i$ is: $k_s(v_i) = \max\{k : v_i \in H_k\}$, where $H_k$ is the maximal subgraph of $G$ with all nodes having degree at least $k$ in $H_k$. The coreness and $k$-shell of networks have been used to characterize network structure, determine network degeneracy, and identify clusters [184].
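The pruning procedure described above can be sketched as follows, assuming an undirected graph stored as a dictionary of neighbor sets. The function name is hypothetical, and assigning index 0 to isolated nodes is a convention chosen here for completeness.

```python
def k_shell_index(graph):
    """Assign each node its shell index by successively pruning low-degree nodes."""
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}  # working copy
    shell = {}
    k = 0
    while remaining:
        peel = [v for v, nbrs in remaining.items() if len(nbrs) <= k]
        if not peel:
            k += 1  # nothing left to prune at this level; move to the next shell
            continue
        for v in peel:
            shell[v] = k
            for u in remaining.pop(v):
                if u in remaining:
                    remaining[u].discard(v)
    return shell


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
shells = k_shell_index(example)
print(sorted(shells.items()))  # [(0, 2), (1, 2), (2, 2), (3, 1), (4, 1), (5, 1)]
```

Node 3 starts with degree 3 but lands in the 1-shell, because removing the leaves 4 and 5 reduces its residual degree to 1 before the level is increased.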
Mixed degree decomposition. $k$-shell decomposition methods ignore differences in the degree of nodes within the same shell. Zeng and Zhang [194] developed a mixed degree decomposition that retains elements of the degree mixed with the $k$-shell index; for node $v_i$, this is given by: $k^{(m)}(v_i) = k^{(r)}(v_i) + \lambda\, k^{(e)}(v_i)$ (12), where each node starts with mixed degree equal to the residual degree $k^{(r)}(v_i)$ (i.e., the $k$-shell index) and the nodes with the smallest mixed degree $k^{(m)} = M$ are removed and assigned to the $M$-shell. Via Eq. (12), the mixed degrees of the remaining nodes are updated by the current residual degree $k^{(r)}(v_i)$ and the exhausted degree $k^{(e)}(v_i)$ (i.e., removed edges from $v_i$ due to the nodes in the $M$-shell), and nodes with updated mixed degree not larger than $M$ are also removed and assigned to the $M$-shell. This is repeated iteratively for the next smallest remaining mixed degree to determine each node's mixed degree. When $\lambda = 0$, the mixed degree is simply the $k$-shell index; on the other hand, when $\lambda = 1$, the mixed degree is simply the degree.
Neighborhood coreness. This metric adapts the notion of the $k$-shell (or $k$-core) of vertices that, although linked to efficient spreaders in networks [103], lacks sufficient diversification for ranking. The $k$-shell is a maximal connected subgraph where all vertex degrees are at least $k$. The core of a network consists of nodes with high $k$-shell index. Unfortunately, many nodes have the same $k$-shell index. Bae and Kim [6] introduce more diversity by considering the $k$-shell of neighbors. The neighborhood coreness and the extended neighborhood coreness are defined as: $C_{\text{nc}}(v_i) = \sum_{v_j \in N(v_i)} k_s(v_j)$ and $C_{\text{nc+}}(v_i) = \sum_{v_j \in N(v_i)} C_{\text{nc}}(v_j)$. These metrics introduce a more distinguishable monotonicity than using the $k$-shell.
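A brief sketch of how neighborhood coreness breaks ties among nodes in the same shell. It assumes the shell indices have already been computed and are passed in as a dictionary; the function names, the example graph, and the hand-supplied shell values are hypothetical.

```python
def neighborhood_coreness(graph, shell):
    """Sum of the shell indices of each node's neighbors."""
    return {v: sum(shell[u] for u in graph[v]) for v in graph}


def extended_neighborhood_coreness(graph, shell):
    """Sum of the neighborhood coreness values of each node's neighbors."""
    cnc = neighborhood_coreness(graph, shell)
    return {v: sum(cnc[u] for u in graph[v]) for v in graph}


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
# Shell indices for this example (triangle nodes in the 2-shell, the rest in the 1-shell).
shell = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
print(neighborhood_coreness(example, shell))
# {0: 4, 1: 4, 2: 5, 3: 4, 4: 1, 5: 1}
print(extended_neighborhood_coreness(example, shell))
# {0: 9, 1: 9, 2: 12, 3: 7, 4: 4, 5: 4}
```

Nodes 0, 1, and 2 share the same shell index, but node 2 receives a higher score; node 3 is likewise separated from the leaves 4 and 5.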
Eigenvector centrality. This metric is occasionally called Bonacich's degree centrality [16,17,76]. Bonacich supported a claim of Cook [36] that centrality is not the same as power and a node with high centrality (e.g., degree) is not necessarily powerful or influential. Accordingly, Bonacich developed an eigenvector centrality, which incorporates notions of both centrality and power, where a node's centrality is determined from its direct connections with other nodes and its power is from the centralities of these neighbors directly and other nodes in the network indirectly. The eigenvector centrality of node $v_i$ is defined as [16]: $C_{\text{eig}}(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} a_{ij}\, C_{\text{eig}}(v_j) = \frac{1}{\lambda} \big(\mathbf{A}\, \mathbf{C}_{\text{eig}}\big)_i$, where $a_{ij}$ is an entry of the adjacency matrix $\mathbf{A}$ and $\lambda$ is an eigenvalue associated with the eigenvector. Note that the second equality makes clear that the ranking of centralities is determined by the eigenvector of the adjacency matrix.
Katz centrality. Katz [93] proposed a new status measure by considering the number of direct connections to a node and the statuses of nodes connected to the node. Katz centrality is well-defined in vector notation [136] as: $\mathbf{C}_{\text{katz}}(\alpha, \beta) = \alpha\, \mathbf{A}\, \mathbf{C}_{\text{katz}}(\alpha, \beta) + \beta\, \mathbf{1}$, where $\alpha$ is a weight that determines the relative influence of the centrality of the node's neighbors to other nodes in the network by their distances and $\beta$ is a 'free part' representing a constant extra credit all nodes receive. This can be reformulated with $\beta = 1$ as $\mathbf{C}_{\text{katz}}(\alpha) = (\mathbf{I} - \alpha \mathbf{A})^{-1} \mathbf{1}$. Newman [136] indicates that Katz centrality resolves a problem of zero-valued eigenvector centrality of nodes not in strongly connected components of directed graphs.
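One way to evaluate Katz centrality without inverting a matrix is to iterate the defining relation until it stops changing. The sketch below assumes a symmetric 0/1 adjacency matrix and a weight smaller than the reciprocal of the largest eigenvalue, so that the iteration converges; the function name, tolerance, and iteration cap are hypothetical.

```python
def katz_centrality(adj, alpha, beta=1.0, tol=1e-12, max_iter=10000):
    """Fixed-point iteration of x = alpha * A x + beta * 1."""
    n = len(adj)
    x = [0.0] * n
    for _ in range(max_iter):
        new_x = [
            alpha * sum(adj[i][j] * x[j] for j in range(n)) + beta
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence; alpha may be too large for this graph")


# Path graph 0 - 1 - 2; its largest eigenvalue is sqrt(2), so alpha = 0.5 is admissible.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_centrality(path, alpha=0.5)
print([round(s, 6) for s in scores])  # [3.0, 4.0, 3.0]
```

Solving the three linear equations by hand gives exactly 3, 4, and 3, so the iteration agrees with the closed form for this small case.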
Authority and Hub centralities. For a directed network, the in-degree of node $v_i$ alone does not provide any notion of the relevant nodes to node $v_i$. Moreover, the out-degree of node $v_i$ does not provide any notion of the important nodes to node $v_i$. Kleinberg [89] introduced an iterative process in the context of hyperlinked web pages to determine which pages are authoritative and which pages are hubs to authoritative pages to assist in web search queries. In this process, each page $v_i$ is assigned two non-negative weights, one corresponding to its relevance as an authority, $x_i$, and another corresponding to its relevance as a hub, $y_i$. Each set of weights is normalized so that the sum of their squares is unity, i.e., $\sum_i x_i^2 = 1$ and $\sum_i y_i^2 = 1$. The update process is given by $x_i \leftarrow \sum_{j : (j,i) \in E} y_j$ and $y_i \leftarrow \sum_{j : (i,j) \in E} x_j$, subject to the normalization invariance. A page's authority depends on the hub weights of the pages linking to it. Similarly, a page's hub weight is determined by the authority weights of the pages it links to. In matrix terms, where $\mathbf{x}$ and $\mathbf{y}$ are vector collections of the authority and hub weights of the nodes, respectively, the update equations can be expressed as $\mathbf{x} \leftarrow \mathbf{A}^{T}\mathbf{y} / \|\mathbf{A}^{T}\mathbf{y}\|$ and $\mathbf{y} \leftarrow \mathbf{A}\mathbf{x} / \|\mathbf{A}\mathbf{x}\|$. Some simple linear algebra can be used to show that these converge to the principal eigenvectors of the matrices $\mathbf{A}^{T}\mathbf{A}$ and $\mathbf{A}\mathbf{A}^{T}$, respectively, provided the initial weights in the process are not orthogonal to the principal eigenvectors. Thus, the authority and hub centrality of the node $v_i$ is given by: $C_{\text{auth}}(v_i) = \big[\omega_1(\mathbf{A}^{T}\mathbf{A})\big]_i$ and $C_{\text{hub}}(v_i) = \big[\omega_1(\mathbf{A}\mathbf{A}^{T})\big]_i$, where $\omega_1(\cdot)$ denotes the principal eigenvector. Kleinberg proposed stopping the process after 10,000 iterations, as convergence may be slow for large networks.
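The mutual update of authority and hub weights can be sketched as follows. The snippet assumes a 0/1 adjacency matrix in which entry [i][j] marks a link from page i to page j, and a fixed number of iterations chosen only for illustration; the function names and example network are hypothetical.

```python
import math


def _normalize(weights):
    """Scale weights so that the sum of their squares is 1 (left unchanged if all zero)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else weights


def hits(adj, iterations=50):
    """Return (authority, hub) weight lists after a fixed number of update rounds."""
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority: sum of hub weights of the pages linking to the page.
        auth = [sum(adj[j][i] * hub[j] for j in range(n)) for i in range(n)]
        # Hub: sum of authority weights of the pages the page links to.
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        auth = _normalize(auth)
        hub = _normalize(hub)
    return auth, hub


# Hypothetical web graph: pages 0, 1, and 3 all link to page 2.
web = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
auth, hub = hits(web)
print([round(a, 3) for a in auth])  # [0.0, 0.0, 1.0, 0.0]
print([round(h, 3) for h in hub])   # [0.577, 0.577, 0.0, 0.577]
```

Page 2 collects all the authority weight, while the three pages that point to it share the hub weight equally.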
PageRank. PageRank is a modern-day variant of Katz centrality that was developed by Brin and Page [25], the founders of Google. PageRank measures the importance of websites by the number of links to the website, and is defined by [136]: $C_{\text{pagerank}}(v_i, \alpha, \beta) = \alpha \sum_{j=1}^{n} a_{ji}\, \frac{C_{\text{pagerank}}(v_j, \alpha, \beta)}{C_{\text{out-deg}}(v_j)} + \beta$, where $C_{\text{out-deg}}(v_j)$ refers to the out-degree of node $v_j$. The interpretations of $\alpha$ and $\beta$ are similar to the ones described for the Katz centrality in that $\alpha$ is a weight damping the influence of nodes further away from $v_i$, while $\beta$ represents a weight for the free part or credit that each node receives. The key difference is the relative weighting of links to $v_i$ by the out-degree of the nodes linking to $v_i$. In vector form, PageRank can be expressed, with $\beta = 1$, as $\mathbf{C}_{\text{pagerank}}(\alpha) = \big(\mathbf{I} - \alpha\, \mathbf{A}^{T} \mathbf{D}_{\text{out}}^{-1}\big)^{-1} \mathbf{1}$, where $\mathbf{D}_{\text{out}}$ is the diagonal matrix of out-degrees.
Contribution centrality. Alvarez-Socorro et al. [3] refined the eigenvector centrality to account for the similarity of the neighbors that link to a node. The concept presumes that nodes with greater dissimilarity, in the sense of Jaccard [9], should have a greater contribution weight than more similar nodes. Dissimilar nodes may provide different information than similar nodes. This contribution centrality is given by: $C_{\text{cont}}(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} w_{i,j}\, C_{\text{cont}}(v_j)$ (18), where $w_{i,j} = a_{i,j}\, d_{i,j}$ is the contribution of node $v_j$ to node $v_i$, $\mathbf{A}$ is the adjacency matrix, $d_{i,j} = 1 - \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$ is a dissimilarity coefficient, and $N(v_i)$ refers to the set of $v_i$'s neighbors. This measure can also be considered as the eigenvector centrality of a weighted network, where the weights are informed by the structural dissimilarity coefficient. The weighted adjacency matrix can be expressed as $\mathbf{W} = \mathbf{A} \circ \mathbf{D}$, where $\circ$ is the Hadamard or element-wise product. The $\lambda$ in Eq. (18) is the maximum eigenvalue of $\mathbf{W}$.
Diffusion centrality. This metric approximates communication centrality (i.e., a fraction of the number of participating nodes, such as buyers of a product, after being informed over the total number of informed nodes), and is given in vector form by [8]: $\mathbf{C}_{\text{diff}}(q, T) = \sum_{t=1}^{T} (q\mathbf{A})^{t}\, \mathbf{1}$, where $\mathbf{A}$ is the adjacency matrix, $\mathbf{1}$ is a vector of ones, $q$ is the passing probability, and $T$ is the number of iterations. The diffusion centrality of the $i$th node is the $i$th entry. This centrality captures a number of different measures depending on the value of $T$, the number of iterations of passing. When $T = 1$, diffusion centrality is proportional to degree centrality. When $T \to \infty$, $\mathbf{A}$ is diagonalizable (this is always true for real symmetric matrices, thus true for undirected network adjacency matrices), and $q \geq 1/\lambda_1$ (where $\lambda_1$ is the maximum eigenvalue of $\mathbf{A}$), diffusion centrality is proportional to eigenvector centrality. But when $q < 1/\lambda_1$, this is a type of Katz-Bonacich centrality.
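The finite sum over powers of the scaled adjacency matrix can be accumulated one step at a time, as in the following sketch. It assumes a 0/1 adjacency matrix given as nested lists; the function name and the example values for the passing probability and number of iterations are hypothetical.

```python
def diffusion_centrality(adj, q, T):
    """Accumulate (qA)^t applied to the all-ones vector for t = 1..T."""
    n = len(adj)
    walk = [1.0] * n   # holds (qA)^t 1 for the current t
    total = [0.0] * n
    for _ in range(T):
        walk = [q * sum(adj[i][j] * walk[j] for j in range(n)) for i in range(n)]
        total = [s + w for s, w in zip(total, walk)]
    return total


# Path graph 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(diffusion_centrality(path, q=0.5, T=1))  # [0.5, 1.0, 0.5]
print(diffusion_centrality(path, q=0.5, T=2))  # [1.0, 1.5, 1.0]
```

With a single iteration the result is the degree scaled by the passing probability, which illustrates the proportionality to degree centrality noted above.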
Subgraph centrality. Subgraph centrality measures the weighted sum of the closed walks starting and ending at node $v_i$ in the network, where the contribution or weight of each walk in the sum decreases as its length increases [55]. Thus, this metric measures the inclusion of the node in all connected subgraphs of the network but is characterized significantly by the inclusion of the node in motifs. Subgraph centrality is given by:
$$C_{\text{subgraph}}(v_i) = \sum_{k=0}^{\infty} \frac{\mu_k(v_i)}{k!} = \sum_{j=1}^{n} (u_j^i)^2 e^{\lambda_j},$$
where $\mu_k(v_i) = (\mathbf{A}^k)_{ii}$, $\lambda_j$ is the $j$th eigenvalue of $\mathbf{A}$, and $\mathbf{u}_j$ is its corresponding eigenvector ($u_j^i$ is the $i$th element of this vector). Inclusion in smaller subgraphs (closed walks) is given more significance due to the $1/k!$ scaling, which is also necessary for convergence of the sum. The measure is useful to distinguish between nodes with equivalent values of degree centrality, betweenness, closeness, or eigenvector centrality. The authors conjecture that if the subgraph centrality is identical for all nodes, then these other measures will also be identical. Note that the average centrality of all the nodes is straightforward to determine: $\langle C_{\text{subgraph}} \rangle = \frac{1}{n} \sum_{j=1}^{n} e^{\lambda_j}$.
LeaderRank. Lü et al. [119] proposed LeaderRank to find prominent members, or leaders, and rank them by their influence, particularly in a social network context. Given a leadership network, i.e., a directed graph of leaders and fans in which a directed edge signifies the subscription of a fan to a leader, LeaderRank generates a supplemental network by adding a ground node $v_g$ with bidirectional edges to all the nodes in the leadership network. This ensures a strongly connected graph with $n + 1$ nodes and $m + 2n$ directed edges containing, as a subgraph, the original leadership network of $n$ nodes and $m$ directed edges. Each node, except the ground node, is assigned an initial unit score. In each unit of time or iteration, the current score of each node is distributed equally among the neighbors the node links to, until equilibrium. The proportion of score allocated from node $v_i$ to node $v_j$ in one unit of time is $a_{ij}/\text{out-deg}(v_i)$, where $\mathbf{A}$ is the adjacency matrix, so $a_{ij} = 1$ if $v_i$ points to (is a fan of) $v_j$ and $a_{ij} = 0$ otherwise. At time $t$, the amount of score allocated at node $v_i$ is
$$s_i(t) = \sum_{j=1}^{n+1} \frac{a_{ji}}{\text{out-deg}(v_j)}\, s_j(t-1),$$
where $s_i(0) = 1$ for all non-ground nodes and $s_g(0) = 0$. At the equilibrium time $t_c$, the score of the ground node is distributed equally among the other nodes, which ensures no loss of value in the distribution scheme for the leadership network. Hence, the final LeaderRank score of node $v_i$ is:
$$C_{\text{LeaderRank}}(v_i) = s_i(t_c) + \frac{s_g(t_c)}{n}.$$
Dynamical influence. Klemm et al. [106] proposed dynamical influence as a centrality measure that quantifies the influence of a node's dynamical state on the collective system behavior, based on the interplay between dynamics and structure in complex networks. Given a system with $n$ time-dependent real variables, $\mathbf{x} = [x_1, \cdots, x_n]$, associated with linear dynamics denoted by an $n \times n$ real matrix $\mathbf{M}$, we have the update function $\dot{\mathbf{x}} = \mathbf{M}\mathbf{x}$. The largest eigenvalue $\mu_{\max}$ of $\mathbf{M}$ is considered to obtain a first classification of the dynamics. When $\mu_{\max}$ is negative, $\mathbf{x}(t)$ converges to the null vector as a stable, fixed solution. When $\mu_{\max}$ is positive, $\mathbf{x}(t)$ will grow indefinitely from the initial state $\mathbf{x}(0)$. Assuming that $\mu_{\max}$ is non-degenerate, we define the scalar product $\phi = \mathbf{c} \cdot \mathbf{x}$ as a conserved quantity, where $\mathbf{c}$ is a left eigenvector of $\mathbf{M}$ for $\mu_{\max}$ governed by $\mathbf{c}\mathbf{M} = \mu_{\max}\mathbf{c}$. When the conserved quantity exists, the final state can be calculated from the initial state $\mathbf{x}(0)$ by:
$$\lim_{t \to \infty} \mathbf{x}(t) = (\mathbf{c} \cdot \mathbf{x}(0))\, \mathbf{e},$$
where $\mathbf{e}$ refers to a right eigenvector of $\mathbf{M}$ for $\mu_{\max}$. The above equation means that the final state is a projection of $\mathbf{x}(0)$, so the dynamical influence $C_{\text{dynamic-influence}}(v_i) = c_i$ represents the effect of $x_i(0)$ on the final state $\mathbf{x}(\infty)$.
Cumulative nomination. Poulin et al. [154] introduced cumulative nomination, whereby the reputation of a node is derived from the nominations of its neighbors and, hence, a node located at the center of the network is nominated more frequently than a node located on the periphery. Initially, a unit of nomination is provided to each node in the network. Then, in each nomination round or iteration, the nomination value of each node is increased by the sum of the nominations of its neighbors, i.e., for node $v_i$, $\tilde{p}_i(t+1) = \tilde{p}_i(t) + \sum_{j} a_{ij}\, \tilde{p}_j(t)$. It is convenient to normalize this process at each step: $p_i(t+1) = \tilde{p}_i(t+1) / \sum_{j} \tilde{p}_j(t+1)$. At equilibrium, at time $t_c$, the cumulative nomination of node $v_i$ is given by $C_{\text{cumulative-nomination}}(v_i) = p_i(t_c)$. This metric is analogous to the one proposed by Bonacich [16], but it has been shown empirically to converge faster to the steady state [154].
SALSA. Lempel and Moran [115] developed a Stochastic Approach for Link Structure Analysis (SALSA) as an alternative to the hubs and authorities approach of Kleinberg [89] for web links. The given directed graph $G$ is converted into an undirected bipartite graph $\tilde{G}$ between a hub side $V_h$ and an authority side $V_a$. Each node $v_i$ in $G$ is represented by two nodes, $i_h$ on the hub side and $i_a$ on the authority side. Each directed edge from $v_i$ to $v_j$ in $G$ is represented by an undirected edge between $i_h$ and $j_a$ in $\tilde{G}$. Two random walks of path length two, each starting from one side of $\tilde{G}$, construct Markov chains that reveal a ranking of nodes as hubs and authorities in the network. The transition matrices of these Markov chains can be defined by a hub matrix $\tilde{\mathbf{H}}$, with entries
$$\tilde{h}_{i,j} = \sum_{\{k \,\mid\, (i_h, k_a), (j_h, k_a) \in \tilde{G}\}} \frac{1}{\deg(i_h)} \cdot \frac{1}{\deg(k_a)},$$
and an authority matrix $\tilde{\mathbf{A}}$, with entries
$$\tilde{a}_{i,j} = \sum_{\{k \,\mid\, (k_h, i_a), (k_h, j_a) \in \tilde{G}\}} \frac{1}{\deg(i_a)} \cdot \frac{1}{\deg(k_h)},$$
where the degree is taken in $\tilde{G}$. The updates for these transition matrices are $\mathbf{h}_t = \tilde{\mathbf{H}}\mathbf{h}_{t-1}$ and $\mathbf{a}_t = \tilde{\mathbf{A}}\mathbf{a}_{t-1}$, where the initial value assigned to each node is 1. As with the mutual reinforcement approach of Kleinberg's hubs and authorities, the principal eigenvectors of the transition matrices are the convergent points of the iterations, i.e., $\mathbf{C}_{\text{SALSA-hub}} = \mathbf{v}_1(\tilde{\mathbf{H}})$ and $\mathbf{C}_{\text{SALSA-authority}} = \mathbf{v}_1(\tilde{\mathbf{A}})$, where $\mathbf{v}_1(\cdot)$ denotes the principal eigenvector.

Global Centrality Metrics
Global centrality metrics require a measurement using possibly the entire network topology. These approaches involve the measurement of path lengths between nodes that are separated (nonadjacent) in the network. The calculations of shortest paths often do not scale well with network size; hence, these metrics are generally more computationally expensive.
Improved method. As observed in the prior subsection, the $k$-shell method [103] does not discriminate between nodes within the same $k$-shell, leading to approaches like mixed degree decomposition and neighborhood coreness. Liu et al. [117] introduced an improved method as an alternative approach to distinguish these intra-$k$-shell nodes, whereby each node $v_i$ with $k$-shell index $k_s$ is further ranked by
$$\theta(v_i \mid k_s) = (k_s^{\max} - k_s + 1) \sum_{v_j \in J} d(v_i, v_j),$$
where $k_s^{\max}$ is the largest $k$-shell index in the network, $J$ is the network core (the subset of nodes with the largest $k$-shell index), and $d(v_i, v_j)$ is the length of the geodesic (shortest path) between nodes $v_i$ and $v_j$. This centrality can be considered as a two-element vector $(k_s, \theta(v_i \mid k_s))$, where nodes are sorted first by large $k_s$ and then, for the same $k_s$, by small $\theta(v_i \mid k_s)$. Essentially, nodes within the same $k$-shell are distinguished by how close they are to all nodes in the network core.
Betweenness centrality. One of the earliest concepts of centrality, learned from studies on human interactions in a laboratory setting [12,113], was developed from the observation that certain nodes control the communication between pairs of other nodes by virtue of their position in the network. The ability of a node to control this communication grants it a position of influence as a broker or enabler. Locally, a node with high degree has potential for fulfilling such a role, depending on the level of clustering (links) between the neighbors of the node, but this would be true only for its immediate neighbors. Degree does not capture the control the node has on the communication between a pair of nodes that are distant from each other. A centrality that encapsulates this concept was formally described by Freeman [59] as betweenness centrality and is mathematically defined for node $v_i$ by:
$$C_{\text{bet}}(v_i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(v_i)}{\sigma_{st}},$$
where $\sigma_{st}$ is the number of shortest paths between $v_s$ and $v_t$ and $\sigma_{st}(v_i)$ is the number of those shortest paths that include $v_i$. For comparing the relative betweenness of nodes in different networks, the centrality can be scaled or normalized by $\binom{n-1}{2} = (n-1)(n-2)/2$ [60], the number of pairs of other nodes that node $v_i$ can lie between. This extreme value is attained only by the center node of a star network. Betweenness centrality has received significant interest in applications in information flow [191], network resilience [81], and network classification [68]. A variant of this centrality adapted for edges is popularly used to detect community structure [66]. This interest has led to a number of algorithms for faster computation [21], although for large and dense networks, the measure can become computationally prohibitive.
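The definition above can be sketched with a Brandes-style accumulation over breadth-first searches. This is a minimal illustration for undirected, unweighted graphs, not the implementation used in the surveyed experiments; the function name `betweenness`, the dictionary-of-neighbors representation, and the star example are assumptions made for the sketch.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []                    # nodes in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical example: a star with center 0 and four leaves
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
bc_star = betweenness(star)
```

For the five-node star, the center lies on the single shortest path of every pair of leaves, so its unnormalized betweenness equals $\binom{4}{2} = 6$, the normalizing value quoted above.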
$k$-betweenness centrality. Betweenness centrality is often an expensive calculation, especially for large networks. Ercsey-Ravasz and Toroczkai [53] formalized a notion of betweenness, originally described by Borgatti and Everett [19], that considers only shortest paths of length at most $k$, i.e.,
$$C_{k\text{-bet}}(v_i) = \sum_{\substack{s \neq i \neq t \\ d(v_s, v_t) \leq k}} \frac{\sigma_{st}(v_i)}{\sigma_{st}}.$$
If $k$ is at least the diameter of the network, then $k$-betweenness is equivalent to betweenness centrality. Ercsey-Ravasz and Toroczkai [53] explicitly express this quantity as the summation of betweenness centralities at each vertex for shortest paths of fixed length $\ell$ over the range $\ell = 1, \ldots, k$. That construction is particularly useful for their analysis, which demonstrates a scaling factor with respect to $k$ and shows that, for relatively small values of $k$, the $k$-betweenness centrality is a good indicator of the true betweenness centrality in terms of ranking the nodes with highest centrality. For small $k$, this metric straddles the boundary between the classes of global and local centrality metrics.
Flow betweenness centrality. Freeman et al. [61] proposed a variant of betweenness to capture the capacity of information that can flow in a valued or weighted graph. The concept borrows from maximum flow-minimum cut theory [57]. Given the maximum flow $m_{st}$ between vertices $v_s$ and $v_t$, denote by $m_{st}(v_i)$ the portion of this flow that passes through node $v_i$. Then the flow betweenness for node $v_i$ is given by:
$$C_{\text{flow-bet}}(v_i) = \sum_{s \neq i \neq t} m_{st}(v_i).$$
This expression can be normalized by replacing each summand $m_{st}(v_i)$ with $m_{st}(v_i)/m_{st}$. This metric can be used to estimate the mean difference between the highest centrality and the centralities of the other nodes as a graph centrality metric, as discussed in Section 4.
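Random-walk betweenness centrality. Like flow betweenness, this metric captures a notion of betweenness beyond shortest paths. Newman [135] introduced random-walk betweenness to incorporate the contribution from all paths (short and long), with more weight given to shorter paths. Newman first defined the measure via a current flow analogy and showed it to be equivalent to random walks. Formally, this measure is defined by:
$$C_{\text{rw-bet}}(v_i) = \frac{\sum_{s < t} I_{st}(v_i)}{\frac{1}{2} n (n-1)}, \qquad I_{st}(v_i) = \frac{1}{2} \sum_{j} a_{ij} \left| T_{is} - T_{it} - T_{js} + T_{jt} \right| \text{ for } i \neq s, t,$$
where $\mathbf{T}$ is obtained by inverting $\mathbf{L}_{-v}$ and re-inserting a row and column of zeros at position $v$, and $\mathbf{L}_{-v}$ is the Laplacian with the $v$-th row and column removed (e.g., the last column and row). Note that $I_{st}(v_s) = I_{st}(v_t) = 1$.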
Load centrality. In the context of the transportation of data over a network, high centrality nodes encounter a heavy load in terms of the data packets that may be transmitted over shortest paths. Goh et al. [67] defined the load centrality of node $v_i$ as the total quantity of data packets traversing node $v_i$ after every node in the network sends a single packet to every other node along the shortest path. When more than one shortest path exists between two nodes, the quantity is divided evenly at each branching point. Explicitly,
$$C_{\text{load}}(v_i) = \sum_{s \neq t} \theta_{s,t}(v_i),$$
where $\theta_{s,t}(v_i)$ is the amount of the unit quantity sent from node $v_s$ to node $v_t$ that passes through node $v_i$, such that the quantity is split uniformly at each branch encountered in the shortest paths from $v_s$ to $v_t$. There has been some confusion that this load centrality is equivalent to the betweenness centrality (even in the original paper by Goh et al. [67]). However, the quantity in betweenness is split evenly among the shortest paths and not at the branching points. For this reason, it is often the case that even in simple graphs the load due to a pair of vertices is not symmetric at every vertex, i.e., $\theta_{s,t}(v_i) \neq \theta_{t,s}(v_i)$. A simple algorithm for the calculation of load is provided in [22].
Routing betweenness centrality. Considering the traffic load on the network like load centrality [67], Dolev et al. [49] defined a variant of betweenness based on the routing strategy. This routing betweenness centrality measures the expected number of packets passing through a given vertex. For the vertex $v_i$, the routing betweenness is calculated by:
$$C_{\text{routing-bet}}(v_i) = \sum_{s, t} \delta_{s,t}(v_i) \cdot T(s, t),$$
where $\delta_{s,t}(v_i)$ is the probability that a packet will go through $v_i$ when it is sent from $v_s$ to $v_t$, and $T(s, t)$ is the total number of paths from $v_s$ to $v_t$. This probability is dependent on the particular routing protocol.
Closeness centrality. Bavelas [13] was interested in distinguishing between different positions in small group networks. One approach was closeness centrality, defined as the reciprocal of farness, i.e., inversely proportional to the average distance to all other nodes in the network. Formally, this can be expressed as:
$$C_{\text{closeness}}(v_i) = \frac{1}{\sum_{v_j \in V} d(v_i, v_j)}.$$
Often, this quantity is normalized for comparisons across networks by multiplying by $n - 1$ (or $n$ for large networks). Another approach to compare the relative position of nodes with the same farness in different structure groups is given by [13], $C_{\text{bavelas}}(v_i) = \frac{\sum_{v_j, v_k \in V} d(v_j, v_k)}{\sum_{v_j \in V} d(v_i, v_j)}$, which equals $C_{\text{closeness}}(v_i) \sum_{v_j \in V} 1/C_{\text{closeness}}(v_j)$, since each inner sum of distances is the reciprocal of a closeness value.
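A minimal sketch of closeness as the reciprocal of farness is given below, assuming a connected, undirected, unweighted graph so that breadth-first search yields all shortest-path distances. The helper names `bfs_distances` and `closeness` and the path example are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v, normalized=False):
    """Reciprocal of farness; assumes the graph is connected."""
    farness = sum(bfs_distances(adj, v).values())
    c = 1.0 / farness
    # Optional scaling by (n - 1) for comparisons across networks
    return (len(adj) - 1) * c if normalized else c


# Hypothetical example: the path graph 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}
c_mid = closeness(path, 1)
c_end = closeness(path, 0)
c_mid_norm = closeness(path, 1, normalized=True)
```

In the three-node path, the middle node has farness 2 and the end nodes have farness 3, so the middle node is the most central; after scaling by $n - 1$ its closeness is exactly 1.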

Information centrality. Stephenson and Zelen [175] developed a centrality measure that uses all paths between pairs of nodes to incorporate the notion of the potential transmission of information. This information centrality borrows from the statistical estimation perspective that the noise in a signal transmission is captured by the variance of the signal passing through a path, so that the information decreases as the distance between nodes grows. Treating this variance as unity for each link, the information for node $v_i$ is then defined as the harmonic mean of the information between $v_i$ and every other node, that is,
$$C_{\text{information}}(v_i) = \frac{n}{\sum_{j} 1/I_{ij}},$$
where $I_{ij}$ is the information along all paths from $v_i$ to $v_j$, weighted by the length of each path. This quantity is ultimately given by $I_{ij} = (c_{ii} + c_{jj} - 2c_{ij})^{-1}$, where $\mathbf{C} = (c_{ij}) = (\mathbf{D} - \mathbf{A} + \mathbf{1}\mathbf{1}^{T})^{-1}$, $\mathbf{D}$ is a diagonal matrix of node degrees, and $\mathbf{1}$ is a vector of ones. Hence, the information centrality can be rewritten as $C^{-1}_{\text{information}}(v_i) = c_{ii} + \frac{\text{tr}(\mathbf{C})}{n} - \frac{2}{n^2}$.
Current-flow betweenness and closeness. An alternative notion of flow, similar to the max-flow min-cut approach for flow betweenness, is to model information spread over a network as an electric current [23]. Current-flow betweenness is defined, up to a normalizing constant, as:
$$C_{\text{cf-bet}}(v_i) = \sum_{s, t} \tau_{st}(v_i),$$
where $\tau_{st}(v_i)$ is the electrical current that passes through node $v_i$ given a unit supply entering the source node $v_s$ and exiting the terminus node $v_t$. More formally, $\tau_{st}(v_i) = \frac{1}{2} \left( -|b_{st}(v_i)| + \sum_{e : v_i \in e} |x(\vec{e})| \right)$, where the supply $b_{st}$ satisfies $b_{st}(v_s) = 1$, $b_{st}(v_t) = -1$, and is zero elsewhere, and the current $x$ satisfies Kirchhoff's Current and Potential Laws. This is equivalent to random-walk betweenness [135]. This approach with current can be extended to other path-based centralities. For example, current-flow closeness is defined, up to a normalizing constant, as:
$$C_{\text{cf-closeness}}(v_s) = \frac{1}{\sum_{t \neq s} \left( p_{st}(v_s) - p_{st}(v_t) \right)},$$
where the potentials $p_{st}$ satisfy $x(\vec{e}) = \hat{p}(\vec{e})/r(e)$ by Ohm's Law, with $\hat{p}(\vec{e})$ the potential difference across edge $e$, and where the conductance $c(e)$ is the inverse of the resistance $r(e)$ or length of an edge. This variant of closeness has been shown to be equivalent to information centrality [175].
Residual closeness. Dangalchev [41] developed residual closeness to determine the vulnerability of a graph using a variation of closeness. This is defined by:
$$C_{\text{residual}}(v_i) = \sum_{j \neq i} \frac{1}{2^{d(v_i, v_j)}}.$$
Rather than taking the reciprocal of the sum of distances, residual closeness uses a weighting scheme. A generalization of this idea already exists in the literature [84], although it was not explicitly expressed as a centrality metric until later [83]. Jackson [83] calls this metric decay centrality, expressed as $C_{\text{decay}}(v_i) = \sum_{j \neq i} \delta^{d(v_i, v_j)}$ for a decay parameter $\delta \in (0, 1)$. Recently, Tsakas [180] has shown that the node with maximum decay centrality often coincides with the node of maximum degree centrality when $\delta < \frac{1}{2}$ and with the node of maximum closeness centrality when $\delta > \frac{1}{2}$, at least on Erdős-Rényi graphs. This ordering is expected because, as $\delta \to 0$, only immediate neighbors contribute appreciably to the sum.
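The following sketch computes decay centrality as the sum of a decay parameter raised to the hop distance, and obtains residual closeness as the special case with parameter $1/2$. It assumes an unweighted, undirected graph; the names `bfs_distances`, `decay_centrality`, and `residual_closeness` and the path example are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def decay_centrality(adj, v, delta):
    """Sum of delta ** d(v, u) over all reachable nodes u != v."""
    dist = bfs_distances(adj, v)
    return sum(delta ** d for u, d in dist.items() if u != v)


def residual_closeness(adj, v):
    """Decay centrality with delta = 1/2."""
    return decay_centrality(adj, v, 0.5)


# Hypothetical example: the path graph 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}
rc_end = residual_closeness(path, 0)  # 1/2 + 1/4
rc_mid = residual_closeness(path, 1)  # 1/2 + 1/2
```

Unreachable nodes simply contribute nothing to the sum, which is one reason this weighting scheme remains well defined on disconnected graphs.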
Spatial centrality. In spatial networks, the distance between neighbors is not uniform (i.e., edges are weighted by length). Crucitti et al. [38] applied and developed generalizations of some common metrics that account for the network's embedding in space. The spatial versions of closeness and betweenness are identical to their weighted-distance versions [184], i.e., the distance between two nodes is the true distance (or weight) from one node to the other. The new metric developed by Crucitti et al. [38] is straightness centrality, which is given for node $v_i$ by:
$$C_{\text{straightness}}(v_i) = \frac{1}{n-1} \sum_{j \neq i} \frac{d_{\text{Euclidean}}(v_i, v_j)}{d(v_i, v_j)},$$
where $d_{\text{Euclidean}}(v_i, v_j)$ is the Euclidean distance in the real or embedded space. Straightness centrality measures the efficiency of the routes between node $v_i$ and the other nodes, i.e., how little the network paths deviate from straight lines.
AHP-based centrality. Bian et al. [15] applied the Analytic Hierarchy Process (AHP), a decision-making process, to identify influential nodes. The steps of the process are as follows: (1) Calculate centrality values (e.g., degree, betweenness, closeness) for each node and combine them in an $n \times 3$ matrix. (2) Calculate weights. Bian et al. [15] appended another vector to the above matrix, derived from the results of SI (Susceptible-Infected) processes run on the nodes [82]. The matrix is normalized and the weights are determined by matching the attributes to the SI column, where $\mathbf{s}$ is the $n \times 3$ matrix with columns $\mathbf{s}_j$ for $j = 1, 2, 3$ and $\mathbf{w}^{T}$ is the transpose of $\mathbf{w}$, the vector of weights $w_j$ for $j = 1, 2, 3$, respectively. The presumption is that the SI scores in the above process are based on short time horizons, whereas the results of the AHP may have value for longer time horizons. Thus, AHP combines three classic centrality metrics and weights them via a short-run epidemic compartmental model process.
Generalized degree and shortest paths. For weighted networks, extensions to the usual centrality measures already exist for degree [11], closeness [138], and betweenness [21]. In incorporating weights, these measures ignore the number of ties or intermediaries. Opsahl et al. [146] sought to remedy this by creating generalized measures that encompass both the traditional measures and the weighted versions. For degree,
$$C_{\text{deg}}^{w\alpha}(v_i) = \deg(v_i)^{1-\alpha} \cdot s_i^{\alpha},$$
where $s_i$ is the strength of node $v_i$ (the sum of the weights of its edges). For closeness and betweenness, the shortest-path weighted distances given by $d^{w}(v_i, v_j) = \min\left( \frac{1}{w_{ih}} + \cdots + \frac{1}{w_{hj}} \right)$ are replaced with $d^{w}(v_i, v_j, \alpha) = \min\left( \frac{1}{(w_{ih})^{\alpha}} + \cdots + \frac{1}{(w_{hj})^{\alpha}} \right)$. For each generalization, when $\alpha = 0$, the measures are the usual (unweighted) centrality measures; when $\alpha = 1$, the measures are the common weighted measures. When $\alpha \in (0, 1)$, having many weak ties correlates with higher generalized centrality; and when $\alpha > 1$, having fewer weak ties correlates with higher generalized centrality.
Weight neighborhood centrality. Wang et al. [182] included a notion of the diffusion importance of links based on the power-law property found in the distribution of many measures (e.g., degree, betweenness) in real networks. Their weight neighborhood centrality is defined in terms of edge weights $w_{ij} = (\deg(v_i) \cdot \deg(v_j))^{\alpha}$ and a benchmark centrality $\varphi$ (e.g., degree, betweenness, $k$-shell), where $N(v_i)$ is the set of neighbors of node $v_i$, $\alpha$ is a tunable parameter between 0 and 1, and $\langle w \rangle$ is the average weight over edges. This metric can be classified as a local or iterative centrality metric provided $\ell$ is small and the benchmark centrality is also local or iterative; otherwise it is a global centrality.
Percolation centrality. Piraveenan et al. [150] developed percolation centrality to capture the dynamic changes of a network topology based on the percolation process. Typically, the percolation state of a node $v_i$ at time $t$ is denoted by $x_i^t$ and takes discrete values, where a value of 0 indicates that $v_i$ is not percolated (e.g., infected) at time $t$ and a value of 1 indicates that it is percolated. When $0 < x_i^t < 1$, $v_i$ is said to be in the process (or probability) of being percolated. Hence, a higher value of $x_i^t$ implies that $v_i$ is closer to (has a greater chance of) being percolated. Piraveenan et al. defined percolation centrality as the proportion of percolated paths passing through a node, which for node $v_i$ is measured by:
$$C_{\text{percolation}}^{t}(v_i) = \frac{1}{n-2} \sum_{s \neq i \neq r} \frac{\sigma_{sr}(v_i)}{\sigma_{sr}} \cdot \frac{x_s^t}{\left[ \sum_{v_j \in G} x_j^t \right] - x_i^t},$$
where $\sigma_{sr}$ is the total number of shortest paths between $v_s$ and $v_r$ and $\sigma_{sr}(v_i)$ is the total number of shortest paths between $v_s$ and $v_r$ passing through $v_i$. When only a single source node is (partially) percolated, the average of the percolation centrality for every node over all possible sources (excluding itself) is proportional to betweenness centrality (see Eq. (26)), as $x_s^t / ([\sum_{v_j \in G} x_j^t] - x_i^t) = 1$ only when $v_s$ is the source, thereby contributing a $1/(n-1)$ factor. If all nodes are (partially) percolated at the same level, all shortest paths are percolated paths, so percolation centrality is again proportional to betweenness centrality.
Eccentricity. Based on the idea that the centrality of a node depends on its distance, i.e., the shortest path length, to other nodes in the network, H. and H. [74] introduced the concept of eccentricity, which is the maximum distance between a node and any other node in the network. Lower eccentricity indicates higher centrality. Eccentricity centrality can be mathematically expressed as:
$$C_{\text{eccentricity}}(v_i) = \max_{v_j \in V} d(v_i, v_j),$$
where $d(v_i, v_j)$ is the distance between the nodes $v_i$ and $v_j$.
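As a brief illustration, eccentricity can be computed as the largest breadth-first-search distance from a node, assuming a connected, unweighted graph. The function name `eccentricity` and the path example are illustrative assumptions.

```python
from collections import deque


def eccentricity(adj, source):
    """Maximum hop distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())


# Hypothetical example: the path graph 0 - 1 - 2 - 3 - 4
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ecc = {v: eccentricity(path5, v) for v in path5}
most_central = min(ecc, key=ecc.get)  # lower eccentricity = more central
```

In this path, the middle node has the smallest eccentricity and is therefore the most central under this metric.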

GRAPH CENTRALITY METRICS
In Section 3, we surveyed centrality metrics for individual nodes. We now turn to the centrality of a graph as a whole, which characterizes the degree of centralization of the entire network rather than of individual points (or vertices). We discuss 14 existing graph centrality (GC) metrics below.
Distance-based GC. This metric uses the distances between all pairs of vertices to measure the compactness of a network. The distance-based GC is defined by [60,171]:
$$C_{\text{dist-GC}}(G) = \sum_{v_i \in V} \sum_{v_j \in V} d(v_i, v_j),$$
where $d(v_i, v_j)$ refers to the distance between vertices $v_i$ and $v_j$. Shimbel [171] used this same metric but called it dispersion, as it is interpreted in terms of the accessibility of the vertices of $G$. The average shortest path length [186] is a similar metric, used to compare the breadth of networks at different scales.
Degree-based GC. This metric measures the relative dominance of a single vertex in a network. Nieminen [141] measured this metric by:
$$C_{\text{deg-GC}}(G) = \frac{\sum_{i=1}^{n} \left[ \deg(v^*) - \deg(v_i) \right]}{\max \sum_{i=1}^{n} \left[ \deg(v^*) - \deg(v_i) \right]},$$
where $G$ has the degree set $\{d_1, d_2, \ldots, d_n\}$ and $v^*$ denotes a vertex of maximum degree in the graph $G$. The maximum sum of the differences between the largest centrality and all other centralities can be derived as follows. The maximum degree of a vertex, $\deg(v^*)$, is $n - 1$. If the graph is a star or wheel, the other $n - 1$ vertices have only one neighbor, so $\deg(v_i) = 1$ and the maximum sum is $(n-1)(n-2) = n^2 - 3n + 2$.
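A small sketch of degree-based graph centralization follows, assuming the Freeman-style normalization by $(n-1)(n-2)$, the value attained by a star. The function name `degree_centralization` and the two example graphs are assumptions made for illustration, and the graph is assumed to have at least three vertices.

```python
def degree_centralization(adj):
    """Sum of (max degree - degree) over vertices, divided by (n-1)(n-2)."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical examples: a 5-node star and a 4-node cycle
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
gc_star = degree_centralization(star)
gc_cycle = degree_centralization(cycle)
```

The star is maximally centralized and the cycle, where every vertex has the same degree, is not centralized at all, which brackets the range of the metric.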
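Betweenness-based GC. This metric is calculated as the mean difference between the maximum betweenness and all other betweenness values [59]:
$$C_{\text{bet-GC}}(G) = \frac{\sum_{i=1}^{n} \left[ C'_{\text{bet}}(v^*) - C'_{\text{bet}}(v_i) \right]}{n - 1},$$
where $C'_{\text{bet}}(v_i)$ and $C'_{\text{bet}}(v^*)$ are determined based on the normalized betweenness [60].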
Flow betweenness-based GC. This metric determines the centrality of a weighted (or valued) graph based on the difference between the flow betweenness of the most central node and the flow betweenness of the other nodes. This is computed by [61]:
$$C_{\text{flow-bet-GC}}(G) = \frac{\sum_{i=1}^{n} \left[ C'_{\text{flow-bet}}(v^*) - C'_{\text{flow-bet}}(v_i) \right]}{n - 1},$$
where $C'_{\text{flow-bet}}(v^*)$ refers to the normalized flow centrality of the most central node and $C'_{\text{flow-bet}}(v_i)$ is the normalized flow centrality of node $v_i$ based on Eq. (28).
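Closeness-based GC. Freeman [60] generalized the closeness-based graph centrality measure based on previous attempts [113,160]. This metric can be derived from the normalized closeness metric, $C'_{\text{closeness}}(v_i) = (n - 1)\, C_{\text{closeness}}(v_i)$, from Eq. (32) by:
$$C_{\text{closeness-GC}}(G) = \frac{\sum_{i=1}^{n} \left[ C'_{\text{closeness}}(v^*) - C'_{\text{closeness}}(v_i) \right]}{(n^2 - 3n + 2)/(2n - 3)},$$
where $C'_{\text{closeness}}(v^*)$ is the largest normalized closeness among $v_i \in G$ and $C'_{\text{closeness}}(v_i)$ is the normalized closeness of $v_i$.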
Reciprocity. Newman et al. [137] measured network reciprocity based on the number of reciprocated edges over the total number of edges in a network. In directed networks, for an edge from node $v_i$ to node $v_j$, if there is also an edge from node $v_j$ to node $v_i$, the edge from $v_i$ to $v_j$ is said to be reciprocated; such pairs are also called co-links in the World Wide Web context [50]. Formally, the reciprocity can be denoted by:
$$r = \frac{1}{m} \sum_{i,j} a_{ij}\, a_{ji},$$
where $m$ is the number of edges.
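The reciprocity of a directed network can be sketched as the fraction of directed edges whose reverse edge is also present. The function name `reciprocity`, the edge-list representation, and the example edges are illustrative assumptions; self-loops are assumed to be absent.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) also exists."""
    edge_set = set(edges)  # drop duplicate edges
    m = len(edge_set)
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / m


# Hypothetical example: a <-> b is reciprocated; b -> c and c -> a are not
example_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")]
r = reciprocity(example_edges)
```

Both directions of a mutual link count as reciprocated edges, so one mutual pair among four directed edges gives a reciprocity of one half.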
$k$-component. This metric refers to a maximal subset of nodes in which each node is reachable from each of the others by at least $k$ vertex-independent paths. Two paths are said to be vertex-independent if they share no vertices other than their endpoints [136]. A variant of the $k$-component can be identified based on edge-independent paths, implying that removing fewer than $k$ edges cannot disconnect the component [136].
$k$-clique. A clique refers to a maximal subset of vertices in an undirected network in which every member is directly connected to every other member [168,179]. A large clique indicates a highly cohesive group whose members are closely connected to one another [136].
$k$-plex. This metric relaxes the condition of the clique, since perfect cliques are rare in real networks. A $k$-plex refers to a maximal subset of $n$ vertices in a network in which each vertex is connected to at least $n - k$ other vertices in the subset [168]. A 1-plex ($k = 1$) is indeed a clique.
$k$-core. This metric is closely related to the $k$-plex. It refers to a maximal subset of vertices in which each vertex has at least $k$ connections to other vertices in the subset. In this sense, a $k$-core of $n$ vertices is an $(n - k)$-plex. However, for a given value of $k$, the set of all $k$-cores is not the same as the set of all plexes of any single order, because $n$ differs from one $k$-core to another. Further, unlike $k$-plexes, $k$-cores are distinct (non-overlapping): when two $k$-cores share one or more vertices, they form a single, larger $k$-core [136,168].
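The vertices belonging to $k$-cores can be found by repeatedly deleting vertices with fewer than $k$ remaining neighbors. The sketch below returns the union of all $k$-cores of an undirected graph without self-loops; the function name `k_core` and the example graph (a triangle with one pendant vertex) are assumptions made for illustration.

```python
def k_core(adj, k):
    """Return the set of vertices left after peeling vertices of degree < k."""
    # Work on a copy so that the caller's graph is not modified
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        low = [v for v, nbrs in alive.items() if len(nbrs) < k]
        for v in low:
            for w in alive[v]:
                alive[w].discard(v)  # remove v from its surviving neighbors
            del alive[v]
            changed = True
    return set(alive)


# Hypothetical example: triangle {0, 1, 2} with pendant vertex 3 attached to 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
core2 = k_core(graph, 2)
core3 = k_core(graph, 3)
```

Peeling cascades: removing the pendant vertex lowers the degree of vertex 0, which is why the procedure iterates until no low-degree vertex remains.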
Global clustering coefficient. Based on the mean of the (local) clustering coefficients for a given graph, Watts and Strogatz [186] also defined the global clustering coefficient (GCC) as:
$$C_{\text{GCC}}(G) = \frac{1}{n} \sum_{i=1}^{n} C_{\text{clustering}}(v_i),$$
where $C_{\text{clustering}}(v_i)$ is the local clustering coefficient of node $v_i$ [186]. Network transitivity is often defined based on GCC using the concept of transitivity among three nodes in a network [80,136].
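A minimal sketch of the mean of local clustering coefficients is shown below. It assumes an undirected graph stored as a dictionary of neighbor sets and, as a convention adopted only for this sketch, assigns a local coefficient of zero to nodes with fewer than two neighbors; the function names and the example graph are illustrative.

```python
def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for degree-0 and degree-1 nodes
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def mean_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical example: triangle {0, 1, 2} with pendant vertex 3 attached to 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
gcc = mean_clustering(graph)  # (1/3 + 1 + 1 + 0) / 4
```

Here vertex 0 has three neighbors of which only one pair is linked, giving a local coefficient of 1/3, while vertices 1 and 2 each have a fully linked neighborhood.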
Degree assortativity. Newman [133] first defined the assortativity of a network as a graph measure representing the extent to which nodes are associated with other nodes that are similar in terms of network structural characteristics, such as degree, betweenness, node weight, or node coreness, as well as node characteristics, such as ethnicity, language, and/or culture. In [133], given a simple undirected, non-weighted network, assortativity is defined as a scalar value $r$. For example, degree assortativity is defined as the linear correlation coefficient between the excess degrees of the two nodes at the ends of an edge, which are random variables, and is given by:
$$r = \frac{1}{\sigma_q^2} \left[ \sum_{j,k} jk \left( e_{j,k} - q_j q_k \right) \right],$$
where $e_{j,k}$ refers to the joint excess degree probability for excess degrees $j$ and $k$. $q_k$ is the normalized excess degree distribution of a randomly selected node, given by $q_k = \frac{(k+1)\, p_{k+1}}{\sum_{j} j\, p_j}$, where $p_k$ is the degree distribution and $\sigma_q$ is the standard deviation of $q_k$ in Eq. (50). Newman [134] further defined degree assortativity in non-weighted, directed networks as
$$r = \frac{\sum_{j,k} jk \left( e_{j,k} - q_j^{\text{out}} q_k^{\text{in}} \right)}{\sigma_{\text{out}}\, \sigma_{\text{in}}},$$
where $e_{j,k}$ indicates the probability that a node with out-degree $j$ and a node with in-degree $k$ are connected for $j, k \in \mathbb{N}$, $q_k^{\text{in}}$ is the normalized excess in-degree distribution where $k$ is the in-degree of a randomly selected node, $q_j^{\text{out}}$ is defined similarly, and $\sigma_{\text{in}}$ and $\sigma_{\text{out}}$ are the standard deviations of $q_k^{\text{in}}$ and $q_j^{\text{out}}$, respectively. Noldus and Van Mieghem [143] discussed multi-layered assortativity to be applied in directed networks, including: (1) in-degree assortativity, measuring the tendency of a node with a particular in-degree to be connected to nodes with the same or a different in-degree; (2) out-degree assortativity, estimating the tendency of a node with a particular out-degree to be connected to nodes with the same or a different out-degree; and (3) overall assortativity, calculated based on both in-degree assortativity and out-degree assortativity.
Local assortativity. Piraveenan et al. [151] defined local assortativity to measure an individual node's assortativity based on its degree and its neighbors' degrees. The local assortativity is measured by:
$$\rho_v = \frac{j\, (j + 1)\, (\bar{k} - \mu_q)}{2 m\, \sigma_q^2},$$
where $j$ is the excess degree of node $v$ (i.e., $\deg(v) - 1$), $\bar{k}$ is the average excess degree of node $v$'s neighbors (i.e., $[\sum_{u \in N(v)} (\deg(u) - 1)] / \deg(v)$, where $N(v)$ is the set of $v$'s neighbors), $\sigma_q$ is the standard deviation of the excess degree distribution $q_k$ over all nodes in the network, $\mu_q$ is the average of $q_k$, and $m$ is the number of edges in the network. Note that the sum of all local assortativities is the network assortativity, $r = \sum_{v} \rho_v$.
Graph curvature. One hypothesis to explain the phenomenon, observed in many large networks, of traffic congestion occurring at a core set of nodes is that the network as a whole is negatively curved. Evidence supporting this hypothesis includes the success in embedding networks in hyperbolic space or deriving various properties using hyperbolic network models [109]. If the network is negatively curved, then routing paths influenced by shortest path selection are somewhat forced to traverse this core, leading to congestion. Point centralities are potentially useful for identifying this core set, but they do not measure the curvature of the graph as a whole. To address this problem, Narayan and Saniee [131] developed a large-scale curvature measure by adapting to graphs the "$\delta$-thin triangle condition" [71] that defines negative curvature. For any triple of nodes $v_i, v_j, v_k$, we define the distance function from any other node $v$ to the triangle of nodes by $d(v; i, j, k) = \max\{ d(v; i, j), d(v; j, k), d(v; i, k) \}$, where $d(v; i, j)$ is the minimum distance from the node $v$ to the geodesic between $v_i$ and $v_j$. Then, the curvature of a network with respect to the triple can be defined as: $\delta_{i,j,k} = \min_{v} d(v; i, j, k)$.
An infinite network is negatively curved (hyperbolic) if $\delta = \max_{i,j,k} \delta_{i,j,k} < \infty$. Obviously, finite networks always satisfy this condition trivially, hence $\delta_{i,j,k}$ is instead compared to the perimeter length of the triangle formed from the geodesics among the triple $(v_i, v_j, v_k)$. This ratio does not exceed 3/2 for constant nonpositively curved Riemannian manifolds [87]. To relax the constraint that every triple satisfies this condition, and for computational reasons, Narayan and Saniee [131] considered a random sampling of triples and determined whether the ratio $\delta_{\Delta}/\ell$ converges for large $\ell = \min\{ d(v_i, v_j), d(v_j, v_k), d(v_i, v_k) \}$, where $\delta_{\Delta}$ denotes the curvature $\delta_{i,j,k}$ of the sampled triangle.

GROUP SELECTION METRICS
When a group of nodes is selected for many of the problems in the application space (e.g., influence maximization, network destruction), simply selecting the top-ranked nodes is a naïve approach. Many networks exhibit assortativity, with respect to degree or another centrality, or redundant clustering. A simple example demonstrating the problem with the top-selection strategy is the observation of the importance of the $k$-shell (certainly for influence maximization), as the top-$k$ nodes may all reside in the same $k$-shell and be neighbors. $k$-shell based centrality approaches would only push the selected nodes to the edge of the top $k$-shell, which may be highly localized instead of distributed throughout the network. One approach to resolving this issue is to iteratively select a single node and recalculate the centrality measure for the remaining network excluding the selected node(s). This strategy has been studied for network robustness [81], and the recalculation can be trivial for certain measures (e.g., degree, coreness). For other measures, this recalculation may be expensive. Hence, less costly approaches have been developed, seeking to discover a more optimal set of nodes.
DegreeDistance. Sheikhahmadi et al. [170] introduced a degree-distance metric to ensure the selected nodes are well-dispersed in the network. The strategy first computes the degree of each node and selects the node with highest degree. It then excludes from selection all nodes within a chosen threshold distance $d_{\text{th}}$ of any of the previously selected nodes and selects the remaining node with highest degree. Hence, given a current set of selected seed nodes $S$, the next selected node is chosen to be
$$v = \operatorname*{argmax}_{u \in V \setminus S \,:\, d(u, s) > d_{\text{th}}\ \forall s \in S} \deg(u).$$
Since this threshold distance can omit from potential selection high-degree nodes that are within the threshold distance but have limited common neighbors (or neighbors of neighbors) with the previously selected nodes, the authors introduced two improvements to DegreeDistance. The first improvement of DegreeDistance (FIDD) does not exclude a node within the threshold distance provided the number of common neighbors and common neighbors of neighbors with previously selected nodes in $S$ is below a chosen threshold. The second improvement of DegreeDistance (SIDD) adds another check to determine an influence score $P(u, v) + \sum_{w \in CN(u, v)} (P(u, w) \cdot P(w, v))$, where $P(u, v)$ is the activation probability that $u$ will influence $v$, and $CN(u, v)$ is the set of common neighbors of $u$ and $v$. Nodes within the threshold distance with influence above some threshold are excluded from being selected for inclusion in $S$ even when the number of common neighbors is below the FIDD threshold. Essentially, sufficient pathways exist for the node to be affected by a seed node indirectly.
SingleDiscount. This is essentially the iterative recalculation of degree. Chen et al. [30] used this basic heuristic for comparison with several greedy approaches under the cascade models of [96]. The node with maximum degree is selected for the seed set (ties broken randomly), and each neighbor of a selected node has its degree reduced by one. This selection can be represented by $v = \arg\max_{u \in V \setminus S} \left( \deg(u) - |N(u) \cap S| \right)$, where $\deg(u) - |N(u) \cap S| = |N(u)| - |N(u) \cap S|$ is the degree of node $u$ excluding the current links to the seed set $S$.
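The SingleDiscount heuristic is simple enough to sketch directly. The code below is an illustrative sketch, not the implementation of Chen et al.: it assumes an undirected graph stored as an adjacency dictionary, and it breaks ties by smallest node label rather than randomly so that the output is reproducible. The names `build_adj` and `single_discount_seeds` are hypothetical.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def single_discount_seeds(adj, k):
    """Repeatedly select the node with the most neighbors outside the seed set."""
    # discounted[v] = degree of v minus the number of its neighbors already chosen
    discounted = {v: len(nbrs) for v, nbrs in adj.items()}
    seeds = []
    for _ in range(min(k, len(adj))):
        # Assumption: deterministic tie-break by label (the source uses random ties).
        best = min(discounted, key=lambda v: (-discounted[v], v))
        seeds.append(best)
        del discounted[best]
        for nbr in adj[best]:
            if nbr in discounted:
                discounted[nbr] -= 1  # unit reduction for each selected neighbor
    return seeds


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (6, 7), (7, 8), (7, 9)]
adj = build_adj(edges)
print(single_discount_seeds(adj, 3))  # [0, 7, 4]
```

On this toy graph, node 4 starts with the same degree as node 7 but loses one unit once its neighbor 0 is selected, so node 7 is chosen second.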
DegreeDiscount. The SingleDiscount approach ignores the probability that a node may be affected by a neighbor in the seed set. Chen et al. [30] constructed an alternate heuristic to account for this and better match the independent cascade model of [96]. Under the assumptions of a small propagation probability $p$, that $t_v$ neighbors of $v$ are already in the seed set, and that $\deg(v) = O(1/p)$ and $t_v = o(1/p)$, the expected number of additional vertices in the closed neighborhood $N(v) \cup \{v\}$ that will be influenced by the selection of $v$ can be shown to be $1 + \left( \deg(v) - 2t_v - (\deg(v) - t_v)\, t_v\, p + o(t_v) \right) \cdot p$. This is derived via the probability $(1-p)^{t_v}$ that $v$ would not be influenced by nodes already in the seed set and the expected number of vertices, $1 + (\deg(v) - t_v) \cdot p$, influenced when $v$ activates itself and its neighbors that are not in the seed set. This ignores indirect influences, which would be expected to be minimal for small $p$. Hence, the selection criterion using DegreeDiscount is $v = \arg\max_{u \in V \setminus S} \left( \deg(u) - 2t_u - (\deg(u) - t_u)\, t_u\, p \right)$, where $S$ is the current seed set.
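A short sketch may help make the DegreeDiscount score concrete. It is an illustration under assumptions rather than the code of Chen et al.: the graph is undirected, `p` is a single propagation probability shared by all edges, and ties are broken by smallest node label. The names `build_adj` and `degree_discount_seeds` are hypothetical. Each non-seed node keeps a count t of its neighbors already in the seed set and is scored by deg - 2t - (deg - t) * t * p.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_discount_seeds(adj, k, p):
    """Greedy seed selection using the discounted degree deg - 2t - (deg - t) * t * p."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    seed_nbrs = {v: 0 for v in adj}  # t: number of neighbors already in the seed set
    score = {v: float(d) for v, d in degree.items()}
    seeds = []
    for _ in range(min(k, len(adj))):
        # Assumption: deterministic tie-break by label.
        best = min(score, key=lambda v: (-score[v], v))
        seeds.append(best)
        del score[best]
        for nbr in adj[best]:
            if nbr in score:
                seed_nbrs[nbr] += 1
                d, t = degree[nbr], seed_nbrs[nbr]
                score[nbr] = d - 2 * t - (d - t) * t * p
    return seeds


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (6, 7), (7, 8), (7, 9)]
adj = build_adj(edges)
print(degree_discount_seeds(adj, 3, p=0.01))  # [0, 7, 5]
```

On this toy graph the third seed is the leaf node 5 (score 1.0) rather than hub 4 (score 0.98), because node 4 is discounted twice per seed neighbor; a unit discount would instead keep node 4 ahead.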
DegreePunishment. To account for indirect influence from nodes in the seed set, Wang et al. [183] introduced a strategy that punishes nodes near the seed set. The punishment is determined by the number of short paths connecting the node to a seed, and the penalty is more severe when the node is closer to the seed on those paths. The punishment that seed $s$ imposes on node $v$ is $p_{s \to v} = \deg(s) \sum_{h=1}^{r-1} (\mathbf{A}^h)_{sv}\, w^h$, where $\mathbf{A}$ is the adjacency matrix, $w$ is a weaken factor (typically assigned to be the propagation probability), and $r$ is the radius of influence or length of the considered paths. Then, given the current seed set $S$, the DegreePunishment selection of the next node is given by $v = \arg\max_{u \in V \setminus S} \left( \deg(u) - \sum_{s \in S} p_{s \to u} \right)$. The complexity of this process grows with the radius of the paths from the seed set, so Wang et al. limited the radius to $r = 2$ in their simulations.
Collective influence. Morone and Makse [129] introduced a scheme to capture the collective influence (CI) of a set of nodes using the concept of optimal percolation. The influence of a single node is determined by its corona, defined in a similar manner as volume centrality (see Eq. (4)). The influence of a node $v$ is $\text{collective-inf}(v, \ell) = (\deg(v) - 1) \sum_{u \in N(v,\ell)} (\deg(u) - 1)$, where $N(v, \ell)$ is the set of nodes within the distance of $\ell$ from node $v$. Hence, given the current seed set $S$, the next node selected is $v = \arg\max_{u \in V \setminus S} \text{collective-inf}(u, \ell)$, where the collective influence is computed in the remaining graph with the nodes in $S$ removed. Morone et al. [130] also provided a stopping criterion for their approach by updating an estimate of a lower bound on the minimum eigenvalue of the non-backtracking matrix when a fraction $q$ of nodes is removed. This estimate is given by $\lambda(\ell; q) = \left( \frac{\sum_{v} \text{collective-inf}(v, \ell)}{|V| \langle k \rangle} \right)^{1/(\ell+1)}$, where $\langle k \rangle$ is the mean degree of the original network. When $\lambda(\ell; q) = 1$, the selection process is finished.
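The collective-influence score and its greedy selection can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: the graph is undirected, distances are hop counts, and the sum by default runs over all nodes within distance ℓ of the node (excluding the node itself), following the description above. The flag `frontier_only` (a hypothetical parameter) restricts the sum to nodes at distance exactly ℓ, which is how the ball boundary is used in Morone and Makse's formulation. The stopping criterion is not implemented.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ball(adj, source, radius, removed=frozenset()):
    """Map each reachable node within `radius` hops to its distance, ignoring removed nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nbr in adj[node]:
            if nbr not in dist and nbr not in removed:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def collective_influence(adj, v, radius, removed=frozenset(), frontier_only=False):
    """(deg(v) - 1) times the sum of (deg(u) - 1) over the ball around v."""
    def deg(u):
        return sum(1 for w in adj[u] if w not in removed)

    dist = ball(adj, v, radius, removed)
    if frontier_only:
        members = [u for u, d in dist.items() if d == radius]
    else:
        members = [u for u, d in dist.items() if 0 < d <= radius]
    return (deg(v) - 1) * sum(deg(u) - 1 for u in members)


def ci_seeds(adj, k, radius, frontier_only=False):
    """Greedily remove the node with the highest score, recomputing after each removal."""
    removed, seeds = set(), []
    for _ in range(min(k, len(adj))):
        candidates = [v for v in adj if v not in removed]
        best = min(
            candidates,
            key=lambda v: (-collective_influence(adj, v, radius, removed, frontier_only), v),
        )
        seeds.append(best)
        removed.add(best)
    return seeds


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (6, 7), (7, 8), (7, 9)]
adj = build_adj(edges)
print(collective_influence(adj, 0, 1), collective_influence(adj, 4, 1))  # 6 8
print(ci_seeds(adj, 1, radius=1))  # [4]
```

With radius 1 on this toy graph, node 4 scores higher than the highest-degree node 0, because most neighbors of node 0 are leaves that contribute nothing to the sum.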
Based on our comprehensive survey of centrality metrics in Sections III-V, we summarized the metrics by publication year in order to capture their overall evolution; due to the space constraint, this summary is given in Table 1 of the supplement document. Here, we instead show in Fig. 2 how many metrics were studied in each period from the 1960s or earlier until the 2010s. From Table 1 of the supplement document, we observed that the centrality metrics developed from the 1960s or earlier until the 1980s (e.g., degree, betweenness, closeness, eigenvector centrality) are still commonly used in research across various network domains. However, Fig. 2 also clearly shows that various types of centrality metrics have been studied extensively since the 2000s, and even more actively in the 2010s.

APPLICATIONS OF CENTRALITY METRICS IN VARIOUS NETWORK TYPES
In this section, we give an overview of how centrality metrics have been applied in various types of networks, including social networks, contact networks, computer communication networks, and biological networks.

Social Networks
Information Diffusion. This problem involves determining the initial set of nodes that efficiently propagates information throughout the network. Kim and Yoneki [99] and Kim et al. [98] investigated this selection process under different information diffusion strategies. They found that when the initial set of seed propagators are high-degree nodes, then the choice of which neighboring nodes to spread the information does not affect the long-term propagation significantly.
Network structure features, such as network topology, node in-degree, out-degree, edge weight, and clustering coefficient, have also been considered in studies of false information propagation [31,110,155,188]. Cho et al. [31] built an uncertainty-based subjective opinion model using a belief model called Subjective Logic. They developed different types of agents that propagate false information intentionally (i.e., disinformers) or mistakenly (i.e., misinformers), where true information is also propagated to counter the false information. Kumar et al. [110] developed four feature sets, including network features, to identify hoaxes in Wikipedia. The network features measure the relation between the references of the article in the Wikipedia hyperlink network. Ratkiewicz et al. [155] built the 'Truthy' system to enable the detection of 'astroturfing' (fake grassroots campaigning with hidden sponsors) on Twitter. Wu et al. [188] summarized false information spreader detection based on network structures.
Kimura et al. [101,102] treated the problem of identifying the most influential nodes in a large-scale social network as a combinatorial optimization problem. Tang et al. [177] investigated an email dataset as a dynamic social network in order to study dynamic interactions using a proposed 'temporal centrality metric.' Kandhway and Kuri [91] used an epidemic model to maximize information diffusion over the fixed duration of a campaign running in a social network.
Influence Maximization. Bae and Kim [6] focused on ranking influential nodes by their spreading ability, avoiding the assignment of multiple nodes to the same rank, using neighborhood coreness centrality. Bian et al. [15] adopted the SI (Susceptible-Infected) model to identify influential nodes spreading a disease in complex networks by using the AHP decision-making strategy, which combines different centrality metrics, typically degree, closeness, and betweenness. Chen et al. [28] introduced a semi-local centrality metric and used a modified version of the SIR model to verify its correctness. Bavelas [13] indicated that a central position in small groups influences the perceptions of leadership (as well as morale). Newman [135] demonstrated that random-walk betweenness is a better measure than degree in the Florentine families intermarriage network [147]. Mochalova and Nanopoulos [128] examined the relationship between the influence of key members and the attitude the remaining members have towards information, and how this relationship impacts information diffusion and its outcome.
A key goal in marketing or information diffusion research is to identify influentials, a small set of nodes that can significantly affect a large portion of their network. Watts and Dodds [185] questioned this hypothesis and studied if the size of influence cascades is truly caused by the information propagated from the influentials. Saito et al. [163] studied the identification of supermediators, nodes playing a significant role in receiving or passing information between other nodes in social networks. Goyal et al. [70] studied a fundamental problem in terms of where or how the input parameters to study an influence model in social networks can be obtained.
Influence Minimization. Kimura et al. [100] solved an influence minimization problem by blocking a limited number of links that spread false information or rumors, where betweenness and out-degree are used to identify links or nodes to remove. This study found that removing high out-degree nodes is not necessarily as effective as blocking a limited number of links for maximizing containment. Dey and Roy [45] also studied which nodes to block in order to minimize information propagation. This work used betweenness, edge betweenness, degree, and closeness to select influential nodes to block. Similarly, Yao et al. [192] solved the same problem by blocking a limited number of nodes, where the centrality metrics considered are out-degree and betweenness. Luo et al. [121] proposed an algorithm that identifies a set of critical nodes to minimize disinformation in time-varying online social networks. The authors conducted a comparative performance analysis and demonstrated that their proposed algorithm outperforms centrality-based heuristic counterparts, particularly those using degree and closeness.
Behavior Adoption for Marketing. Centrality metrics have also been studied as a way to identify initial target populations as a marketing strategy. In the adoption of technological innovations or the purchase of products, word-of-mouth processes are also modeled using information diffusion models [39]. In particular, which population to target with advertising is a major concern for marketing tools, and centrality metrics are adopted to identify the target populations [96]. Many marketing applications aim to leverage social networks or media by targeting populations using simple centrality metrics, such as degree [47,190], betweenness [169,190], and closeness [164,169].
To study the spreading process of technology adoption, various influence maximization algorithms have been proposed and applied to investigate the effect of word of mouth in markets or game-theoretic strategies [96]. Kempe et al. [96] showed that the influence maximization problem is NP-hard and that a greedy algorithm can provably guarantee a solution within about 63% of the optimal solution, i.e., a performance guarantee close to $1 - 1/e$.
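The 63% figure is the value of 1 - 1/e, which is approximately 0.632. The guarantee for greedy selection holds for monotone submodular objectives; the sketch below swaps in a plain set-coverage objective for the influence spread (an assumption made purely to keep the example small and deterministic, since estimating spread under a cascade model requires simulation). The names `coverage`, `greedy_select`, and `best_subset` and the toy sets are hypothetical.

```python
import math
from itertools import combinations


def coverage(sets, names):
    """Number of distinct elements covered by the chosen sets."""
    covered = set()
    for name in names:
        covered |= sets[name]
    return len(covered)


def greedy_select(sets, k):
    """Pick k sets one at a time, each time taking the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(sets))):
        remaining = [n for n in sets if n not in chosen]
        base = coverage(sets, chosen)
        # Largest gain first; ties broken by name for reproducibility.
        best = min(remaining, key=lambda n: (base - coverage(sets, chosen + [n]), n))
        chosen.append(best)
    return chosen


def best_subset(sets, k):
    """Exhaustive search for the best k sets (feasible only for tiny inputs)."""
    return max(combinations(sorted(sets), k), key=lambda names: coverage(sets, names))


# Toy instance where greedy is suboptimal: Z looks best first but overlaps X and Y.
sets = {"X": {1, 2, 3}, "Y": {4, 5, 6}, "Z": {1, 2, 4, 5}}
greedy = greedy_select(sets, 2)
optimal = best_subset(sets, 2)
ratio = coverage(sets, greedy) / coverage(sets, optimal)
print(greedy, list(optimal), round(ratio, 3))  # ['Z', 'X'] ['X', 'Y'] 0.833
print(ratio >= 1 - 1 / math.e)  # True
```

Here greedy covers 5 of the 6 elements that the optimal pair covers, which is comfortably above the 1 - 1/e bound.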
Community Detection. Nikolaev et al. [142] developed a variant of entropy centrality to understand 'the entropy of flow destination' in networks and showcased how the new entropy centrality is more useful than the original entropy centrality in community detection applications. Jiang et al. [86] proposed an efficient centrality measure, called $K$-rank, designed for selecting the top-$K$ nodes with the highest centrality. The top-$K$ nodes are used as the initial seeding nodes and updated based on $K$-means iterations.

Contact Networks
Christley et al. [33] attempted to identify the risk of disease infection of nodes using centrality metrics, such as degree, random-walk betweenness, shortest-path betweenness, and farness. Dekker [43] also used six different centrality metrics, including degree, betweenness, two types of closeness, distance-based centrality, and eigenvector centrality, in order to identify the super spreaders of infectious diseases. Bell et al. [14] investigated the correlations between various types of centrality metrics and their variants, such as degree, betweenness, closeness, eigenvector centrality, information centrality, and power prestige. Gómez et al. [69] studied high-risk hosts for emerging infectious diseases based on various centrality metrics (e.g., strength, degree, betweenness, closeness, eigenvector centrality) for their control and surveillance. The authors used network tools to predict parasitism and the hosts that may spread future infectious diseases.

Communication Networks
Centrality metrics have also been used to make decisions to solve various problems in communication networks. Centrality metrics have been used to select critical nodes to prevent or mitigate computer virus or malware spreads. Newman et al. [137] conducted an empirical study of investigating the email network structure to examine what nodes can significantly contribute to spreading computer viruses. Kim [97] measured the risk of websites exposing security vulnerability (e.g., malware, fake infectious sites) based on degree, betweenness, eigenvector, and closeness.
Albert et al. [2] showed scale-free networks, following a power-law degree distribution, are highly robust to random attacks while highly vulnerable to targeted attacks on high degree nodes. Holme et al. [81] also investigated the network resilience in complex networks when targeted attacks are applied based on degree or betweenness. Yoon et al. [193] developed a scalable centrality-based traffic measurement based on software defined networking functionalities.

Geographic Networks
Crucitti et al. [38] analyzed spatial networks based on different centrality metrics to characterize the geographic properties of cities as networks. Gao et al. [64] used the betweenness centrality to measure urban traffic flow with GPS-enabled taxi trajectory information in Qingdao, China. This study demonstrated that betweenness is not necessarily a good metric to measure the traffic flow distributions. Porta et al. [153] developed a 'Multiple Centrality Assessment (MCA)' framework that uses centrality metrics to understand why the current design features of a city do not attract more people or increase social life. Guimerá et al. [73] examined the impact of a city's global role based on degree and betweenness. Li et al. [116] examined how centrality of each shipping area, with 25 geographical areas, plays a key role in changing the centrality of the global shipping networks (GSNs) during the years 2011-2012.

Biological Networks
Estrada and Rodríguez-Velázquez [55] used centrality to study the removal of proteins from the yeast S. cerevisiae. The lethality of protein removal has been shown to correlate with the degree of the protein. Jeong et al. [85] ranked proteins by their degree and tested the consequences of removing each protein. Dirk and Falk [48] analyzed the structure of gene regulatory networks based on the ranks of nodes, which are measured by centrality metrics. Karabekmez and Kirdar [92] proposed a new centrality metric called weighted sum of loads eigenvector centrality (WSL-EC) in order to identify critical nodes in biological networks. Mistry et al. [127] developed a new centrality metric to predict central and critical genes and proteins based on a protein-protein interaction network.
We summarized which centrality metrics have been used in the various network types discussed in this work in Table 2 of the supplement document. Although our discussion of the applicability of centrality metrics is limited, this table shows that centrality metrics have been utilized more substantially in contact and biological networks than in other network domains. Despite the large volume of centrality metrics studied in the literature (see Sections 3, 4, and 5), we clearly observe that the use of centrality metrics has been mostly limited to several common ones, such as degree (including in/out-degree), betweenness, closeness, and eigenvector centrality.

CONCLUDING REMARKS
In this section, we discuss what we learned from this present study and how to improve the limitations of the existing centrality metrics by suggesting future research directions.
In particular, we implemented over 60 centrality metrics surveyed in this work under the three centrality metric categories (i.e., point, graph, and group selection centrality metrics). We tested their effect on network resilience, measured by the size of the giant component, when each centrality metric is used to model targeted attacks. We evaluated the performance of each metric under two undirected real network datasets and two directed real network datasets. Due to the space constraint, the details and experimental results, along with explanations of the observed trends, are given in Section 4 of the supplement document. In this section, we also discuss some insights learned from the extensive simulation results.
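The kind of experiment described above can be illustrated with a toy sketch. The code below is not the simulation used in this work; it assumes an undirected graph, a non-infectious attack that removes one node per step, and degree (recalculated after every removal) as the attacking centrality. The names `build_adj`, `giant_component_size`, and `degree_attack` and the example graph are hypothetical.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def giant_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def degree_attack(adj, n_removals):
    """Remove the current highest-degree node each step; return giant component sizes."""
    removed = set()
    sizes = [giant_component_size(adj, removed)]
    for _ in range(min(n_removals, len(adj))):
        alive = [v for v in adj if v not in removed]
        # Degree counts only surviving neighbors; ties broken by label.
        target = min(alive, key=lambda v: (-sum(1 for w in adj[v] if w not in removed), v))
        removed.add(target)
        sizes.append(giant_component_size(adj, removed))
    return sizes


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (6, 7), (7, 8), (7, 9)]
adj = build_adj(edges)
print(degree_attack(adj, 2))  # [10, 6, 3]
```

Replacing the degree-based choice of `target` with another centrality score gives the corresponding targeted attack, and the resulting size curves can then be compared across metrics.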

Limitations, Insights, and Lessons Learned
We have found limitations of the existing centrality metrics surveyed in this work, and we learned lessons and obtained insights from them as follows:
• The meaning of centrality is not limited to how a node is connected to other nodes; it also implies how actively the node communicates with other nodes and how it can control or influence other nodes in their centrality or vulnerability. In brief, node centrality determines influence in terms of connectivity, communicability, and controllability in a given network. However, node connectivity is not commonly aligned with the capacity to deal with traffic (e.g., communicability) because nodes with high connectivity are often congested.
• Centrality metrics can be applied in various disciplines for different purposes. In addition, there is a rich volume of centrality metrics available that can be used for various design goals. For example, we may want to investigate how to balance traffic loads, how to set edges between nodes to make a network robust against faults or attacks, what types of targeted attacks to develop, how to identify vital nodes based on various criteria, or what is the most (least) influential or vulnerable node in a given network.
• We investigated the effect of each centrality metric on network resilience in terms of the size of the giant component. We found that if a centrality metric measures how well a node is connected with its close neighborhood (i.e., locally well connected), the impact of removing a node with high centrality tends to be limited. For example, removing nodes with high clustering coefficient or volume centrality is less damaging to network resilience (i.e., the size of the giant component) than the random removal of nodes. However, if the centrality metric refers to how well the node is globally linked with other nodes that may belong to another cluster of the network (e.g., another community), the network is highly impacted when the node fails.
• We found that when an attack using a given centrality metric is non-infectious, the choice of metric is critical because the effects of different centrality metrics can differ vastly. However, when the attack is infectious, using different centrality metrics does not introduce a significantly different impact on network resilience, as the infectious attack itself may be powerful. In addition, we found that how nodes are connected in a given network (i.e., network topology characteristics such as network density) is a more important factor influencing network resilience (i.e., the size of the giant component).
• Although a large volume of centrality metrics has been developed so far, only common centrality metrics developed several decades ago, such as degree, betweenness, closeness, clustering coefficient, or pagerank, have been widely used. Although degree is a simple metric, other metrics, such as betweenness or clustering coefficient, incur high complexity and long running times. It was interesting to observe that although many centrality metrics were developed in the 2010s, few of them have been used in existing network applications, while the metrics developed from the 1970s to the 1990s have commonly been used in the literature.
• Unlike centrality metrics that are applicable in undirected networks, centrality metrics in directed networks may not be appropriate for studying their effect on network resilience. This is because even the failure of a node with high centrality (e.g., hub, authority, or leaderrank) in sparse networks may not introduce any significant impact when centrality is mainly measured based on in-degree, not out-degree.
• In terms of running time, the overall trend is that centrality metrics tested under directed networks (e.g., SALSA authorities, SALSA hubs, leaderrank, clusterrank) tend to show higher running times than centrality metrics tested under undirected networks. This may be because undirected networks innately have higher connectivity than directed networks. Recall that many centrality metrics rely on the (shortest) path distances between two nodes as part of the metric calculation.
• The running time of each metric (see Figs. 10-14 of the supplement document) is mainly influenced by network size and by network or node density. In addition, we optimized the code for some metrics to reduce the running time but not for others. Therefore, there may be some inaccuracy in the running times of the centrality metrics reported in this work. However, we believe that this imperfect code optimization does not significantly affect the order of running time performance of the centrality metrics compared in this work.
• Most point centrality metrics are extensions of notions of the degree of the node or its neighbors (e.g., semi-local, $k$-shell, $h$-index), connections between neighbors of the node (e.g., Burt's redundancy, clustering coefficient), path-finding processes involving the node (e.g., betweenness, closeness), or iterative processes between the node and its neighbors (e.g., eigenvector, pagerank). The extensions attempt to capture something missing or ignored in a fundamental metric; e.g., the degree of the node by itself ignores the degree of its neighbors, whereas semi-local centrality aggregates that information, and both $k$-shell and $h$-index consider threshold effects on that information. New centrality metrics can be designed by supplementing an existing approach with missing information that may be relevant to the particular problem criteria.
• For an insightful comparison of network resilience under infectious attack using different centrality metrics, note that the variability of the infection rate is highly dependent on the characteristics of the network (e.g., network or node density or network topology). Infection spreads more easily in a dense network, in which all the nodes are more easily accessible. On the other hand, a sparse network has a structural insulation that protects it from an infectious attack.

Future Research Directions
• More efficient centrality metrics are needed: Since there are many centrality metrics that suffice to meet certain tasks but require less complexity (i.e., low running time), we can leverage these or perhaps modify them to enhance their effectiveness for the task (e.g., increasing the effect of removing a node with high centrality) or efficiency (e.g., running time). Some metrics are representative of a broader meaning of centrality, such as communicability or controllability (e.g., load centrality in Eq. (30)), in addition to simple connectivity. However, their high complexity hinders applicability in various domains.
• More meaningful metrics are needed to measure network resilience: The size of the giant component, as a common metric to measure network resilience, does not reflect a broader concept of network resilience. Network resilience can be defined in terms of how adaptable a network is in dealing with sudden changes or attacks/failures (i.e., adaptability), how tolerant the network is in preventing its failure against attacks or failures (i.e., fault tolerance), and how easily the network can recover from attacks or failures (i.e., recoverability) [32]. As a future work direction, we need to develop metrics that can measure network resilience embracing adaptability, fault tolerance, and recoverability, or other properties based on system requirements.
• Graph centrality metrics can be enhanced as a novel measure of network resilience: Graph centrality metrics measure certain characteristics of a given network, such as the distances between nodes, connections between neighbors, or redundant paths between nodes. However, as we observed in Tables 4 and 5 of the supplement document, some graph centrality metrics are not necessarily correlated with the size of the giant component, which is a conventional metric for measuring network resilience. We can improve the existing graph centrality metrics or invent new ones that can be used as indicators of the key properties of network resilience. For example, when a certain graph centrality value is high, it may indicate that the network has the ability to easily recover from attacks or failures.
• Centrality metrics embracing a broader concept of influence need to be developed: Although a rich volume of centrality metrics has been explored in the literature, most of them rely on the concept of centrality based on connectivity. However, in reality, being connected with less critical nodes does not introduce a high impact on network resilience, as long as a small set of critical nodes is still kept safe and operating in a reliable manner. In addition, although controllability is one of the key centrality concepts as discussed in Section 2.1, not many centrality metrics have been developed that explicitly consider a node's controllability over a given network. There should be more efforts to develop centrality metrics that fully consider a node's ability to control the network.
• Enhancement of the infection process for modeling infectious attacks: In the infection process considered in this work, a node is infected with a given probability. If the node is not infected, we simply assumed that it is immune to the attack and cannot be infected later. However, in real-world scenarios, various types of attacks spread through a network, and a node can be attacked by multiple or different types of attackers, which allows the same node to be infected multiple times. Hence, as a future research direction, a more realistic infection process can be considered in which an infected node can recover and be reinfected.
• In-depth analysis of network resilience under various network conditions is important: Due to the space constraint, we have not presented further sensitivity analyses investigating the effect of using different centrality metrics under various network conditions in terms of network density (i.e., the number of edges), node density (i.e., the number of nodes in a given area), or the variance of node degree (e.g., for a scale-free network or a random graph). Such an in-depth analysis of network resilience would help identify which metric is more powerful under which network conditions. In addition, more comprehensive, diverse, larger, and real network topologies can be considered to obtain more meaningful findings and to provide generalizable guidelines for selecting useful centrality metrics in a given application.

Appendix A CENTRALITY METRICS RESEARCH IN MULTIDISCIPLINARY DOMAINS
As the extensions of Section II.B of the main paper, we discuss how different disciplines have studied centrality metrics and applied them to solve critical problems in their domains.
Mathematics. The study of networks has its origins in the analysis of data with certain relations in various disciplines. For mathematics, this dates back to the 1730s with Leonhard Euler's solution to the Seven Bridges of Königsberg problem, which is the foundation of graph theory. Centrality metrics are explored based on graph theory, which has been described as the study of networks [44].
Chemistry. Graph theory has been applied in Chemistry since the 1870s [176]. Chemical process plants can be represented by networks in which centrality metrics are used to identify more important units and controllers [105,157].
Anthropology. Network centrality was first investigated in Anthropology by studying human behaviors in groups [12]. Many research communities studying human organizations or group-based decision making have used centrality metrics to measure the influence and/or power of a group or organization [34]. In recent Anthropology research, Collins and Durington [35] discussed 'networked anthropology' using diverse multimedia and OSN platforms. In addition, how community centrality affects scholarly activities in social science has been studied in Anthropology [40].
Physics. Network centrality metrics have been heavily studied in the area of complex networks/systems by physicists [136]. In particular, physicists have been major players in the area of Network Science, which has been studied in multiple disciplines, including all these disciplines discussed above. Network science is defined as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena" [144].
Geography. Historical geographers were interested in how the centrality of a region (e.g., Moscow) can affect dominance and evolution of the region in which the area can be described based on graph theory [152]. Taras et al. [178] studied urban street networks based on graph theory in order to identify important areas in terms of the influence of topology and geo-referenced data extracted from the network.
Economics. Souma et al. [172] studied business networks to investigate the probability of business networks becoming scale-free and the effect of the merger among banks on the cliquishness of companies or the separation between two companies. Mayer [124] also investigated how social and economic factors (e.g., economic incentives or socioeconomic background) can introduce the changes in social network structure and its composition which were measured by centrality metrics (e.g., Bonacich centrality).
Psychology. Centrality metrics have been used to measure socio-cognitive aspects of human behavior in various contexts. Kameda et al. [90] defined a person's power in a group based on his/her centrality measured by the degree of information the person shares with others. The person's influence based on network centrality has been shown to be critical to forming consensus in the decision making process. Lee et al. [114] looked at how a person's centrality in a network position affects consumer influence as well as susceptibility to the influence of others. Epskamp et al. [52] also provided how to measure centrality in psychological networks.
Sociology. Centrality metrics have long been used in Sociology to examine various types of social networks. The Bonacich centrality metric has been studied in [16,17] to measure status and power in society. Borgatti's centrality metrics have been used to investigate the relationships between a person's centrality and other significant factors [18,19,20]. Metrics measuring social relationships have also been developed in the mathematical sociology domain, such as social proximity [61], which is based on the betweenness measure, and a faster betweenness algorithm [21].
Biology. Centrality metrics have been used in Biology in selecting central nodes, such as pathogen-interacting, cancer, ageing, HIV-1 or disease-related or immune-related proteins [48,92,127] in gene regulatory networks, protein-interaction networks, and metabolic networks.
Management. Centrality metrics have been investigated to identify the key factors for success in business management. Management research has investigated how a founder's centrality affects the top management group and the group's culture and vision [95,139], and how network centrality is critical to increasing financial performance [79].
Computer Science. Centrality metrics have been highly leveraged and investigated for diverse applications in the computer science domain. For example, centrality metrics are used in mobile social network applications [196], visual reasoning in online social networks [37], water network distribution [132], or traffic management for space satellite network [195].
Political Science. Graph centrality measures have been considered in identifying power and/or influence of individuals and/or attracting resources in political networks since the 2010s [75]. As social media and social network services (SNSs) become more and more popular, the availability of social network data allowed the analysis of political views and/or attitudes with respect to various centrality measures [112,126].
Psychiatry. Network science has been applied in Psychiatry under the name of Network Psychiatry [166], which uses computational models to investigate the structure of psychiatric disorders treated as complex systems. Zuo et al. [197] considered centrality metrics to measure 'functional connectivity' in a brain connectome. They investigated the relationship between the extent of centrality and certain diseases or body conditions/characteristics (e.g., age and sex). Their findings supported how centrality in the brain connectome can be used to study the physiological mechanisms underlying 'neurodegenerative and psychiatric disorders.' Fried et al. [62] also used centrality metrics to determine the centrality of the Diagnostic and Statistical Manual of Mental Disorders (DSM) symptoms and non-DSM symptoms, where a network consists of 28 depression symptoms. In this work, centrality is used as an indicator of the relationships between different depression symptoms.

Appendix B EVOLUTION OF CENTRALITY METRICS
Based on our comprehensive survey on centrality metrics conducted in Sections III-V of the main paper, we summarized them based on their published years in order to capture the overall evolution of centrality metrics in Table 1. As discussed in the main paper, we observed that the centrality metrics developed in the 1960s or earlier until the 1980s are still commonly used in the research literature under various network domains. However, we can also clearly notice that various types of centrality metrics have been developed since the 2000s and more in the 2010s.
D.1.1 Datasets. We selected the following real datasets for network topologies used in the performance demonstration of the surveyed centrality metrics:
• Directed Network Topologies: (1) The UCI Social Network [148] is a collection of interactions from private messages sent over an online social network at The University of California, Irvine. (2) The Rocketfuel Network [173] is a snapshot of router connections on an Internet Service Provider (ISP) topology from measurements.
The practical examples include partial physical destruction of a system [1], non-critical nodes that are not functioning due to denial-of-service (DoS) attacks [123], or a node accessed by an unauthorized party aiming to illegally obtain credentials [123]. The fraction of removed nodes corresponds to the number of attackers, since infections do not propagate.
• Infectious attacks: Unlike the above non-infectious attack, this attack propagates infections towards other nodes. Common examples are the spread of malware or viruses. Botnets can propagate malware or viruses through mobile devices, which can use mobile malware such as a Trojan horse that acts as a botclient to obtain command and control from a remote server [123]. We model infectious attacks by selecting a fraction of nodes as the initial seeding attackers. We assume that the infectious attackers follow the Susceptible-Infected-Removed (SIR) epidemic model [136]. Nodes in the susceptible state (S) are healthy nodes that have not yet been infected by the attackers. Nodes in the infected state (I) are compromised nodes that become inside attackers and can also replicate infections to their neighboring nodes. Nodes in the removed state (R) are nodes that have been detected and isolated from the network by cutting all of their edges. The compromised and detected nodes are treated as failed nodes. A susceptible node (S) can become infected (I) and later recover or be removed (R).
When the size of the giant component is captured, we only consider healthy nodes, which are still in the susceptible (S) state. We consider the probability that a node is infected as the infection rate.
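The infectious attack model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator used in the paper: the function names `simulate_sir` and `giant_component_fraction`, the adjacency-dictionary representation, and the rule that every infected node is detected and removed after one round of infection attempts are all assumptions made for the sketch. The giant component is computed over susceptible nodes only and is normalized by the original network size.

```python
import random
from collections import deque


def simulate_sir(adj, seeds, beta, rng):
    """Run a discrete-time SIR process and return each node's final state.

    adj   : dict mapping node -> iterable of neighbors it can infect
    seeds : initial attackers (start in state "I")
    beta  : infection probability per (infected, susceptible) contact
    rng   : random.Random instance, passed in for reproducibility
    Assumption: an infected node tries each neighbor once, then is removed.
    """
    state = {v: "S" for v in adj}
    infected = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    for s in infected:
        state[s] = "I"
    while infected:
        newly_infected = []
        for u in infected:
            for w in adj[u]:
                if state[w] == "S" and rng.random() < beta:
                    state[w] = "I"
                    newly_infected.append(w)
            state[u] = "R"  # detected and isolated after its infection round
        infected = newly_infected
    return state


def giant_component_fraction(adj, state):
    """Largest connected set of susceptible nodes, as a fraction of all nodes."""
    alive = {v for v, s in state.items() if s == "S"}
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)


# Usage on a hypothetical 6-node ring with one seeded attacker.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
final_state = simulate_sir(ring, seeds=[0], beta=0.05, rng=random.Random(1))
print(giant_component_fraction(ring, final_state))
```

With an infection probability of 0 the process reduces to the non-infectious attack, since only the seeded nodes are removed.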

D.1.4 Centrality Metrics Tested and Parameter Settings.
For the volume and flow betweenness centrality metrics, we set the number of hops (ℎ) to 2. For the group selection metrics, we used a parameter value of 4 in the degree distance metric, and each group is defined with 10 nodes. Due to the high complexity of some metric computations (i.e., too slow even for one simulation run), we excluded the following point centrality metrics: random-walk betweenness, routing betweenness, dynamical influence, load centrality, and curvature. Among the point centrality metrics, we did not show communicability centrality, as it is the same as subgraph centrality when used to measure node centrality. Among the graph centrality metrics, we excluded reciprocity since it was the only metric that must be measured in a directed network.

D.2 Network Resilience Analysis of Point Centrality Metrics
D.2.1 Under Non-Infectious Attacks. Fig. 4 shows the size of the giant component in the URV Email Network and UCI Social Network when varying the fraction of removed nodes (i.e., attacked nodes) selected via different point centrality metrics. This models a targeted attack based on the given point centrality metric where the attack is not infectious. From Fig. 4 (a)-(f), we found the following: (i) Most targeted attacks are stronger than random attacks (denoted 'random' in black), yielding a significantly smaller giant component; (ii) Betweenness in (a) and GDSP betweenness in (e) show the best performance (i.e., in the sense of reducing the size of the giant component), with the network dissolved after a little more than four-tenths of the nodes are removed; and (iii) Although most targeted attacks with given point centrality metrics outperform a random attack, the attack with clustering coefficient in (c) performs close to the random attack, without a higher impact in disconnecting the network. We conjecture the reason as follows. Since the clustering coefficient measures the number of triangle relationships among a node's adjacent nodes, removing a node with a high clustering coefficient still allows neighboring nodes to remain connected. The impact of removing a node is lessened if the selection criterion (or centrality) has a more local, rather than global, scope. Therefore, removing a node with a high clustering coefficient does not dramatically reduce the size of the giant component. The general trends observed from the results shown in Fig. 5 are highly similar to the results in Fig. 4.
The key observations are already discussed above for Fig. 4. Note that this attack is not infectious, so an attacked node cannot compromise adjacent nodes. In undirected networks, most centrality metrics showed a larger size of the giant component in a dense network, which is the EU Email Network. On the other hand, in the URV Email Network, which is a sparse network, we observe a smaller size of the giant component. Diffusion, percolation, and volume centrality metrics performed relatively poorly, perhaps indicating that these metrics are less informative for sparser networks. Except for the clusterrank metric, all metrics evaluated under directed networks performed better (i.e., a smaller size of the giant component from the attacker perspective) under the UCI Social Network than the Rocketfuel Network. The key observations from Fig. 6 are: (i) Katz and dynamic influence centrality metrics show a weaker impact on the size of the giant component, compared to other centrality metrics. This is because both metrics are derived based on eigenvalues and measure the influence of a node based on the influence of its neighbors. Even if the node itself is removed, the adjacent nodes remain connected in the giant component of the network. Hence, the impact of removing nodes with high Katz or dynamic influence centrality is not stronger than that of removing nodes with high centrality of other types; (ii) The effect of the point centrality on the degradation of the network also depends on the network topology. For example, with volume centrality, node removals in the EU Email Network result in a significantly larger size of the giant component than node removals in the URV Email Network.
In addition, all point centrality metrics tested in the right side of the plot (e.g., from eigenvector centrality to contribution centrality) show a larger size of the giant component for the URV Email Network compared to the EU Email Network; and (iii) In the metrics evaluated under directed networks, we can clearly see poor performance of authorities, SALSA hubs, and SALSA authorities on a sparse network such as the Rocketfuel Network. This is because an attack only infects in the direction of its directed edges, whereas these three centrality metrics measure centrality based on incoming edges, which prevents the infection from spreading over the network.
[Figure panels: (a) Infectious attacks with degree, closeness, betweenness, pagerank, eigenvector, local entropy and mapping entropy; (b) Infectious attacks with local betweenness, volume, redundancy, kshell, improved kshell, percolation and hybrid degree; (c) Infectious attacks with neighborhood coreness, flow betweenness, katz, diffusion centrality, subgraph and clustering coefficient; (d) Infectious attacks with information centrality, residual closeness, semi local, mixed degree decomposition, dynamic influence and weight neighborhood; (e) Infectious attacks with GDSP degree, GDSP closeness, GDSP betweenness, eccentricity, cumulative nomination, h index, -betweenness and contribution; (f) Infectious attacks with hubs, authorities, clusterrank, SALSA authorities, SALSA hubs, leaderrank in the (directed) UCI Social Network.]
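The contrast between a global-impact criterion and the clustering coefficient in a non-infectious targeted attack can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names (`build_graph`, `targeted_attack`, and so on), the toy topology of two 4-cliques joined through a bridge node, and the tie-breaking by node id are not taken from the paper, and degree is used only as a simple stand-in for the centrality metrics evaluated above.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def giant_component_fraction(adj, removed):
    """Largest connected component after removal, over the original node count."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)


def degree(adj, v):
    return len(adj[v])


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def targeted_attack(adj, score, num_removed):
    """Remove the top-ranked nodes under `score`; ties broken by node id."""
    ranked = sorted(adj, key=lambda v: (-score(adj, v), v))
    removed = ranked[:num_removed]
    return removed, giant_component_fraction(adj, removed)


# Hypothetical topology: cliques {0,1,2,3} and {4,5,6,7} joined via node 8.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 8), (8, 4)]
g = build_graph(edges)
print(targeted_attack(g, degree, 1))
print(targeted_attack(g, clustering_coefficient, 1))
```

In this toy graph the degree-based attack removes node 3 and splits the network, leaving a giant component of 5/9 of the nodes, whereas the clustering-coefficient attack removes node 0, whose neighbors stay connected to each other, leaving 8/9.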

D.2.2 Under Infectious Attacks.
We also evaluated the performance of the point centrality metrics surveyed in this work under infectious attacks. As discussed in Section D.1.3, a seeded attacker can infect neighboring nodes with a given infection probability. Fig. 7 shows the size of the giant component under targeted attacks on the URV Email Network and the UCI Social Network for 39 point centrality metrics. Here, we varied the fraction of the initial attackers from 0.01 to 0.1 in increments of 0.01. A node is immune to the attack if the node is attacked but is not infected based on the given infection probability. Note that we report results over a smaller fraction of initial attackers because of the stronger impact of infectious attacks on the size of the giant component. We observed the following from the results shown in Fig. 7. First, overall, the decrease in the size of the giant component is linear, and most targeted attacks reduce the size of the giant component compared to random attacks. Second, curiously, three point centrality metrics tested in this work resulted in a comparable or larger size of the giant component than random attacks: clustering coefficient, flow betweenness, and redundancy. For the clustering coefficient, as discussed for Fig. 4 (c), removing a node with a high clustering coefficient has a limited effect on its local network due to high connectivity. More generally, when local neighborhoods are well connected, which is the case for nodes with a high clustering coefficient, the reduction of the network is tempered. Similarly, since redundancy captures the overlap of a node's neighborhood with that of other nodes, the network is less likely to be dismantled because the nodes in the neighborhood remain connected. Volume centrality is estimated based on a given hop ℎ which is set to 3 in our work.
This means that even when a node with high volume centrality is removed, an infectious propagation of the attack may be limited in scope depending on the immunity of the immediate neighbors. Lastly, the performances of betweenness and pagerank in (a) and GDSP betweenness and -betweenness in (e) are impressive compared to other centrality metrics, resulting in a significantly smaller size of the giant component for the undirected URV Email Network. In addition, in the (directed) UCI Social Network, clusterrank, leaderrank, hubs, and SALSA authorities perform impressively, resulting in a significantly smaller size of the giant component compared to other centrality metrics. Fig. 8 shows the size of the giant component under targeted infectious attacks on the EU Email Network and the Rocketfuel Network. Again, the infection probability is 0.05, and there are 39 point centrality metrics tested. The overall trends are similar to Fig. 7. However, some differences are as follows. First, seeding attackers based on flow betweenness in Fig. 8(c) performs better in the EU Email Network as the fraction of initial infectious attackers increases, whereas in the URV Email Network, selection based on flow betweenness performed no better than random selection, as shown in Fig. 7(c). Second, volume centrality-based seeding did not perform as well in the EU Email Network (Fig. 8(b)) as in the URV Email Network (Fig. 7(b)). This could be for the reason discussed earlier regarding the clustering coefficient, which also did not perform better than the random attack. That is, removing a node with high volume centrality may only collapse the local network of the node. This means that under dense networks, the removal of nodes with a highly connected local neighborhood does little to separate the network into smaller components. Third, the resulting size of the giant component is similar in the EU Email Network for all centrality metrics in Fig.
8(d), while the performances are more distinctive in the URV Email Network, as shown in Fig. 7(d). Based on these observations, we can say that the network topology strongly affects the performance of centrality metrics. In particular, the key difference between these two datasets (i.e., the URV Email Network in Fig. 7 and the EU Email Network in Fig. 8) is that the EU Email Network is a denser network than the URV Email Network. This can explain why flow betweenness can perform significantly better than random in the EU Email Network, compared to its performance in the URV Email Network.
[Figure panels: (a) Infectious attacks with degree, closeness, betweenness, pagerank, eigenvector, local entropy and mapping entropy; (b) Infectious attacks with local betweenness, volume, redundancy, kshell, improved kshell, percolation and hybrid degree; (c) Infectious attacks with neighborhood coreness, flow betweenness, katz, diffusion centrality, subgraph and clustering coefficient; (d) Infectious attacks with information centrality, residual closeness, semi local, mixed degree decomposition, dynamic influence and weight neighborhood; (e) Infectious attacks with GDSP degree, GDSP closeness, GDSP betweenness, eccentricity, cumulative nomination, h index, -betweenness and contribution; (f) Infectious attacks with hubs, authorities, clusterrank, SALSA authorities, SALSA hubs, leaderrank in the directed Rocketfuel Network.]
That is, since a higher network density (i.e., more edges between nodes) can increase the impact of infectious attacks, the flow betweenness-based attacks can take advantage of the network density to increase their effect in compromising other nodes in the network. In addition, higher network density can also make the performances of targeted attacks less distinctive because the opportunities for infection are more relevant than the marginal benefits of optimizing the selection of initial attackers. Fig.
9 shows the effect of point centrality-based targeted attacks in the undirected networks (EU Email Network, URV Email Network) and directed networks (UCI Social Network, Rocketfuel Network) in terms of the size of the giant component, as an indicator of network resilience, when the single top-ranked node based on a given metric is selected as an infectious attacker. The trends are very similar to Fig. 6 in terms of the performance under different networks. Repeating the trends observed in Fig. 6, the effect of targeted attacks based on point centrality metrics is greater (i.e., smaller size of the giant component) in the sparse URV Email Network than in the dense EU Email Network. It is not surprising that the dense network can absorb the impact of removing nodes and better maintain a connected network. Interestingly, however, in directed networks, the sparsity of the Rocketfuel Network can mitigate the infection process, leading to a larger size of the giant component, while the higher density of the UCI Social Network allows attacks to spread more easily.
Fig. 10 shows the mean fraction of nodes infected by a single initial attacker when the fraction of initial attackers varies from 0.001 to 0.01 with an increment of 0.01, using 38 point centrality metrics to determine the initial selection for the undirected EU Email Network (i.e., Fig. 10(a)-(e)) and the directed Rocketfuel Network (i.e., Fig. 10(f)). Most metrics evaluated in this work showed higher rates of infection spread per initial attacker. However, some metrics, such as flow betweenness, clustering coefficient, diffusion centrality, mixed degree decomposition, and SALSA authorities, showed lower rates per initial attacker. Note that an attack resulting in a smaller size of the giant component does not necessarily mean there are more infected nodes, because there may exist many uninfected nodes in smaller components.
Conversely, lower infection rates under a given centrality-based selection do not imply that the network is resilient to that particular attack.
Fig. 10. Mean fraction of infected nodes after infectious, initial targeted attackers are selected from 0.001 to 0.01 with the increment of 0.01 based on 38 centrality metrics in the undirected EU Email Network (i.e., (a)-(e)) and in the directed Rocketfuel Network (i.e., (f)). Panels: (a) Infectious attacks with degree, closeness, betweenness, pagerank, and eigenvector; (b) Infectious attacks with local betweenness, volume, redundancy, kshell, improved kshell, percolation and hybrid degree; (c) Infectious attacks with neighborhood coreness, flow betweenness, katz, diffusion centrality, subgraph and clustering coefficient; (d) Infectious attacks with information centrality, residual closeness, semi local, mixed degree decomposition, dynamic influence and weight neighborhood; (e) Infectious attacks with GDSP degree, GDSP closeness, GDSP betweenness, eccentricity, cumulative nomination, h index and contribution; (f) Infectious attacks with hubs, authorities, clusterrank, SALSA authorities, SALSA hubs and leaderrank in the directed Rocketfuel Network.

D.3 Network Resilience Analysis of Graph Centrality Metrics
We surveyed 14 graph centrality metrics in Section IV of the main paper. Since the range of each metric varies, we cannot compare their maximum values. However, we can at least investigate whether the value of each metric increases or decreases depending on how many nodes are removed at random and, accordingly, on the size of the giant component. To observe this easily, we report an RGC value computed from two quantities: the value of a given graph centrality (GC) in the original network, where the size of the giant component is 1, and the GC value after a certain percentage of nodes is removed at random. If the RGC value increases as the giant component becomes smaller, the GC value decreases as the giant component becomes smaller. Conversely, if the RGC value decreases as the giant component becomes smaller, the GC value increases.

D.3.1 Under Non-Infectious Attacks.
For the validation of graph centrality (GC) metrics, we considered two sets of random attacks with 30% removal and 70% removal of nodes in two undirected network datasets (EU Email Network, URV Email Network). Since we considered random attacks in this case to investigate how the GC values are affected under two different scenarios, we observed that the size of the giant component was similar, at approximately 0.3 and 0.7 for the respective cases. Since k-plex, k-clique, and k-core return a set and reciprocity needs to be applied in a directed network, we omitted the discussions of those metrics. In Table 4, we summarized the RGC values.
The key observations are as follows: (i) Overall, the size of the giant component under different GC metrics is similar because the attacks are random; and (ii) The effects of random attacks on the GC values differ by GC metric. We found that increasing the number of initial attackers reduces the GC value in the following graph centrality metrics: distance-based GC, degree-based GC, k-component, degree assortativity, local assortativity, and global clustering. On the other hand, we observed greater GC values when increasing the number of attackers in the following GC metrics: betweenness-based GC, closeness-based GC, and graph curvature. The different trends can be explained as follows. If the GC metric measures how a node is locally connected with its close neighbors, then the GC value decreases due to the breakdown of local connections when random attacks are performed. However, if the GC metric estimates how a node is globally connected with other nodes, its value can increase because the normalization of the GC calculation depends on the size of the network. Therefore, we cannot simply infer whether a network is dense or sparse from the GC metric, because a higher GC value does not necessarily imply a denser network.
Table 5 shows the RGC values of the graph centrality (GC) metrics when random infectious attacks are performed. Again, the network is seeded with 30% or 70% of infected nodes and the results are for the two undirected networks (EU Email Network, URV Email Network). Due to the infectious nature of this attack, the size of the giant component is observed to be smaller than that under non-infectious attacks. But similar to what we observed in Table 4, some GC metrics (e.g., the top 6 GC metrics in Table 4) show a similar tendency, with decreasing GC under a graph with a smaller size of the giant component.
However, other GC metrics (e.g., the bottom 4 GC metrics in Table 4) do not show a consistent trend. For example, for degree assortativity, the GC value decreases in the dense EU Email Network while it increases in the sparse URV Email Network. In addition, GC does not always keep increasing or decreasing depending on the size of the giant component even for the same network, as observed in the closeness-based metric. Therefore, the scale of some GC metrics can be used to predict the size of the giant component.
Fig. 11 shows the size of the giant component in both undirected networks (EU Email Network and URV Email Network) as the indicator of network resilience when a set of groups (where a group is defined as 10 nodes) chosen based on a given group selection metric is removed as a targeted attack. Under non-infectious attacks, each metric's performance is more distinct. In particular, attacks on denser networks (with more edges), as in the EU Email Network, are less severe when degree punishment is the selection criterion, while attacks on larger networks (with more nodes) are less severe with degree distance. Under infectious attacks, the results are more interesting. First, for a less dense network like the URV Email Network, the effect of the four metrics on the size of the giant component is similar, although degree discount seems to be the best selection strategy. However, under a denser network like the EU Email Network, the degree punishment strategy outperforms the others because high network density mitigates the effect of the penalty. From this observation, we found that under infectious attacks, higher network density can significantly mitigate the effect of the targeted attacks. If a network is not sufficiently dense, regardless of what metric is used to select targets to attack, the network can more easily collapse.
Thus, it is more important to select the right group selection metric for developing more powerful attacks under dense networks than under sparse networks.

Appendix E RUNNING TIME ANALYSIS
Fig. 12 shows the running time in log10 seconds to show the efficiency of 39 point centrality metrics surveyed in this work using the undirected URV Email Network and the UCI Social Network. Degree, pagerank, and GDSP degree exhibit the best efficiency among the point centrality metrics considered in this work. This is one reason why, even though a large volume of centrality metrics was created in the 2000s and 2010s (see Fig. 2 of the main paper), simple degree-based or similar centrality metrics still dominate in practice due to their efficiency in calculation. We also observe high running time from contribution centrality to leaderrank centrality in the right side of Fig. 12. Although these metrics offer certain useful features in capturing insightful centrality concepts in terms of power or influence, their high running time may not be attractive, particularly in sizable or resource-constrained, distributed environments. We also display the running time analysis of the point centrality metrics using the undirected EU Email Network and the directed Rocketfuel Network in Fig. 13. Comparing the results here with the other networks in Fig. 12, we find there are only slight differences in the performance order. This is because the characteristics of a network dataset affect each centrality metric's running time. However, the trends are similar since the performance order is still dependent on the inherent complexity of each metric. Fig. 14 shows the running time of 13 graph centrality (GC) metrics per simulation run on the undirected URV Email Network. We found most k-metrics, except k-core, are fairly slow, while common metrics such as degree-based metrics are faster, which is one reason for their common utilization in various domain applications.
However, it seems there is no clear relationship between algorithmic complexity and the nature of the GC metrics, such as local or global metrics, in the process of their calculation. Similarly, Fig. 15 shows the running time of 13 graph centrality (GC) metrics per simulation run, but using the other undirected dataset, the EU Email Network. We found a slightly different performance order compared to Fig. 14, which uses the URV Email Network. However, the overall trend is similar. As discussed regarding Fig. 14, it seems there is no relationship between algorithmic complexity and the local or global nature of the GC metrics. Fig. 16 shows the running time of the four group selection metrics per simulation round. We found that degree distance is more expensive than its counterparts, which are enhanced versions that use heuristics to improve on the complexity of degree distance. We also found a longer running time for calculating the metrics using the URV Email Network than using the EU Email Network. Although the EU Email Network has five times higher network density (i.e., more edges) than the URV Email Network, the URV Email Network has more nodes. This implies that the complexity of a group selection centrality is affected more by the number of nodes than by network density.
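To give a sense of why the heuristic group selection metrics are cheap to compute, the following sketch implements degree discount in one commonly used formulation, where a candidate's score is dd(v) = d(v) - 2t(v) - (d(v) - t(v)) t(v) p, with d(v) the degree, t(v) the number of already selected neighbors, and p a propagation probability. This formulation, the function name `degree_discount`, the parameter names, and the toy graph are assumptions for illustration and may differ from the exact variant timed in Fig. 16.

```python
def degree_discount(adj, k, p):
    """Select k nodes by the degree discount heuristic.

    adj : dict mapping node -> set of neighbors (undirected graph)
    k   : group size to select
    p   : assumed propagation probability used in the discount term
    Ties are broken by the smaller node id (an arbitrary choice here).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    discounted = {v: float(d) for v, d in degree.items()}
    selected_neighbors = {v: 0 for v in adj}
    candidates = set(adj)
    selected = []
    for _ in range(min(k, len(adj))):
        best = min(candidates, key=lambda v: (-discounted[v], v))
        candidates.remove(best)
        selected.append(best)
        # Only the neighbors of the chosen node need their scores updated.
        for u in adj[best]:
            if u in candidates:
                selected_neighbors[u] += 1
                d, t = degree[u], selected_neighbors[u]
                discounted[u] = d - 2 * t - (d - t) * t * p
    return selected


# Hypothetical graph: hub 0 is adjacent to the second-highest-degree node 1.
toy = {
    0: {1, 2, 3, 4}, 1: {0, 5, 6}, 2: {0}, 3: {0}, 4: {0},
    5: {1}, 6: {1}, 7: {8, 9}, 8: {7}, 9: {7},
}
print(degree_discount(toy, 2, 0.1))
```

In this toy graph a plain degree ranking would select nodes 0 and 1, which overlap heavily, while the discount lowers node 1's score to 0.8 after node 0 is chosen, so node 7 is selected instead. Each selection step only touches the neighbors of the chosen node, which is the source of the heuristic's low cost.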

Appendix F ALGORITHMIC COMPLEXITY OF CENTRALITY METRICS
In Tables 6, 7, 8, and 9, we summarized asymptotic complexities of all centrality metrics surveyed in this paper.
Table 6. Point centrality's meaning, metric, and complexity
Table 7. Point centrality's meaning, metric, and complexity
Table 8. Graph centrality's meaning, metric, and complexity