Effective Edge-Based Approach for Promoting the Spreading of Information

Real-world systems, ranging from social to infrastructural, can be abstracted into complex networks. Promoting the spreading of certain kinds of information (for instance, commercial messages, vaccination guidance, innovations, and political movements) on these networked systems can benefit many aspects of society. In this study, we propose an effective edge-based approach for promoting the spreading of information on complex networks. Specifically, we first quantify the potential influence that the addition of each latent edge (that is, an edge that does not exist in the original network) could have on the information spreading dynamics. Then, we strategically add the latent edges to the original network according to the potential influence of each latent edge. Numerical simulations verify the effectiveness of our strategy and demonstrate that it outperforms several static strategies, namely, adding the latent edges between nodes with the largest degree or eigenvector centrality. This study provides an effective way to promote the spreading of information by modifying the network structure slightly and helps in understanding which network structures better support the spreading dynamics. In addition, the theoretical framework established in this study can inspire further investigations of edge-based promoting strategies for other spreading dynamics.


I. INTRODUCTION
The problem of promoting information spreading in networked systems is attracting substantial attention from multiple disciplines, for instance, computer science, statistical physics, and network science [1], [2]. Maximizing the spreading prevalence of certain kinds of information, including vaccination guidance, innovations, commercial messages, and political movements, can benefit many aspects of socio-economic systems [3]-[7]. The study of promoting these spreading dynamics is therefore of great importance from both theoretical and practical perspectives.
Understanding the evolutionary mechanisms of information spreading dynamics in real life and building suitable models to describe them play essential roles in developing promoting strategies. Various spreading models have been proposed for information with different evolutionary mechanisms. For instance, for simple contagions (e.g., news diffusion and innovation spreading), in which informed individuals can transmit the information to susceptible ones through a single contact, the classic susceptible-informed-susceptible (SIS) model [8], [9], the susceptible-informed-recovered (SIR) model [10], [11], and many of their extensions [12]-[16] have been widely applied. For complex contagions (e.g., behavior adoption [17] and political information spreading [18], [19]), researchers have proposed the threshold model, which incorporates the social reinforcement mechanism (i.e., susceptible individuals receive the information with a probability that increases with the cumulative number of contacts with informed ones) [20], [21]. More spreading models with other complex mechanisms can be found in [22]-[24].
The associate editor coordinating the review of this manuscript and approving it for publication was Chao-Yang Chen.
Based on these spreading models, researchers have gone further to develop strategies to promote or enhance information spreading. Some researchers focus on designing effective transmission strategies [25]-[30], such as developing smart protocols to avoid invalid contacts (for instance, the contact between two informed nodes). Others are committed to identifying vital nodes [31]-[38] with high centralities (for instance, degree, betweenness, and closeness centrality), and they suggest that selecting these vital nodes as the initial seeds can maximize the spreading prevalence. Recently, several researchers have found that structural perturbations (that is, slight modifications of the network structure) can be used for promoting spreading dynamics as well [39]-[41].
Nevertheless, despite all of these efforts, to the best of our knowledge, no previous study has investigated how to effectively promote the information spreading dynamics described by the SIR model using structural perturbations. To fill this research gap, we propose an effective edge-based strategy for promoting the SIR spreading dynamics. The SIR model was first proposed to study epidemic transmission. Later on, researchers extended it to describe various kinds of information spreading, including innovation spreading, the promotion of commercial products, and the spread of political movements [42]-[49]. Our strategy enhances the spreading dynamics of the SIR information spreading model by adding edges that do not exist in the original network. To be specific, we first develop a mathematical model to quantify the influence that the addition of each latent edge (i.e., each edge that does not exist in the original network) could have on the spreading dynamics. This mathematical model also facilitates the determination of the spreading prevalence of the SIR model. Then, we strategically add the latent edges to the original network according to the influence of each latent edge. Note that our strategy incorporates information about both the network structure and the spreading dynamics. This study will show that our strategy is effective and outperforms static approaches, such as adding the latent edge between nodes with the highest degree or eigenvector centrality.
We organize this paper as follows. First, Sec. II describes the information spreading model and our strategy in detail. Then, Sec. III gives the theoretical framework for determining the influence of each latent edge. Further, Sec. IV presents the numerical simulations to verify the effectiveness of our strategy. Finally, Sec. V concludes the paper.

II. MODEL DESCRIPTION
In this study, we consider discrete-time SIR information spreading dynamics running on a complex network $G$ with adjacency matrix $A$. The numbers of nodes and edges of $G$ are denoted by $N$ and $M$, respectively. Each node is in one of three states: the susceptible state (S), the informed state (I), or the recovered state (R). Denote the state of node $i$ by $\varepsilon_i$; thus, $\varepsilon_i \in \{S, I, R\}$. Initially, all the nodes are in the S state, and then a small fraction of nodes are selected to be in the I state. At every time step, every node in the I state informs each of its neighbors in the S state with the transmission probability $\lambda$. After the transmission process, each node in the I state turns to the R state with the recovery probability $\gamma$. We refer to $\beta = \lambda/\gamma$ as the effective transmission probability. The spreading dynamics terminates once no node is in the I state, and the fraction $\rho$ of nodes in the R state after termination is referred to as the information spreading prevalence.
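The rules above can be sketched as a single Monte Carlo realization. This is a minimal illustration rather than the authors' simulation code: the function name `simulate_sir`, the adjacency-list input `neighbors`, and the choice that a node informed in a given step can only recover from the next step onward are assumptions made here.

```python
import random


def simulate_sir(neighbors, seeds, lam, gamma, rng=None):
    """Run one realization of the discrete-time SIR dynamics.

    neighbors: dict mapping each node to a list of its neighbors (assumed format).
    seeds:     nodes that start in the I state.
    Returns the fraction of nodes in the R state at termination.
    Requires gamma > 0, otherwise the loop never ends.
    """
    rng = rng or random.Random()
    state = {node: "S" for node in neighbors}
    for s in seeds:
        state[s] = "I"
    informed = set(seeds)
    while informed:
        newly_informed = set()
        # Transmission: every I node tries to inform each S neighbor with probability lam.
        for i in informed:
            for j in neighbors[i]:
                if state[j] == "S" and rng.random() < lam:
                    newly_informed.add(j)
        # Recovery: every node that was I during this step recovers with probability gamma.
        recovered = {i for i in informed if rng.random() < gamma}
        for i in recovered:
            state[i] = "R"
        for j in newly_informed:
            state[j] = "I"
        informed = (informed - recovered) | newly_informed
    return sum(1 for s in state.values() if s == "R") / len(state)
```

Averaging the returned value over many realizations and many seed choices gives an empirical estimate of the prevalence $\rho$ for a given pair $(\lambda, \gamma)$.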
According to the evolutionary rules of the SIR information spreading model described above, we can obtain the probabilities of nodes and edges being in different states when the dynamics terminates, for instance, the probability $P(\varepsilon_i = R)$ of node $i$ being in the R state or the joint probability $P(\varepsilon_i = R, \varepsilon_j = S)$ of edge $(i, j)$ being in the RS state. Our objective is to maximize the spreading prevalence of the information by adding a fraction of latent edges, i.e., edges that do not exist in the original network $G$. To determine which latent edge should be added first, we need a measure to rank the influence of each latent edge.
Suppose we add a latent edge $(i, j)$ to the original network $G$. If the final states of nodes $i$ and $j$ are $\varepsilon_i = \varepsilon_j = S$, then the added edge makes no difference to the information spreading prevalence, since both nodes remain in the S state and influence no other node. Similarly, if the final states are $\varepsilon_i = \varepsilon_j = R$, then adding an edge between them will barely bring new nodes into the I state, because nodes $i$ and $j$ are informed regardless of whether they are directly connected. Therefore, the spreading prevalence is increased by adding an edge between nodes $i$ and $j$ only when the final states are $\varepsilon_i = R$ and $\varepsilon_j = S$ (or $\varepsilon_i = S$ and $\varepsilon_j = R$). Take the former situation as an example. In this case, if we add an edge between nodes $i$ and $j$, and node $i$ gets informed at time $t_0$, then node $i$ can bring node $j$ into the I state with probability $\lambda$ at time $t_0 + 1$. At time $t_0 + 2$, as a new node in the I state, node $j$ goes on to influence its neighbors in the S state. If node $j$ has a large expected number of neighbors whose final states are S, then adding the edge $(i, j)$ can bring a large number of new nodes into the I state and increase the final spreading prevalence. Therefore, we only consider node $j$ and its neighbors whose final states are S. For convenience, we refer to node $j$ and its neighbors that have a final state of S as the candidate nodes. Then, the expected number $\sigma_{ij}$ of new informed nodes that come from the candidate nodes after adding the latent edge $(i, j)$ is given by Eq. (1), where $P(\varepsilon_r = S \mid \varepsilon_j = S)$ is the conditional probability that node $r$ is in the S state when node $j$ is in the S state. Similarly, we can obtain the expected number $\sigma_{ji}$ when the final states of nodes $i$ and $j$ are $\varepsilon_i = S$ and $\varepsilon_j = R$, respectively.
Taking both cases, $\sigma_{ij}$ and $\sigma_{ji}$, into consideration, we define the influence of the latent edge $(i, j)$ as in Eq. (2). Our approach to effectively promoting the information spreading is based on adding the latent edge with the highest influence $\sigma_{ij}$. Thus, we refer to our strategy as the latent-edge-influence (LEI) strategy. Hereafter, the problem reduces to evaluating Eq. (2), that is, finding the probabilities of nodes being in different states and the conditional probabilities.
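The verbal definition above can be turned into a score in several ways; the sketch below shows one plausible reading and should not be taken as the paper's Eqs. (1) and (2). It assumes that, because $i$ and $j$ are not connected, the joint probability of ($i$ ends in R, $j$ ends in S) factorizes into the product of the two node probabilities, and that the candidate nodes are counted as node $j$ itself plus its neighbors expected to end in S. All names (`directed_influence`, `latent_edge_influence`, `p_r`, `p_s`, `cond_s`) are hypothetical.

```python
def directed_influence(i, j, p_r, p_s, neighbors, cond_s):
    """Assumed form: P(i ends in R) * P(j ends in S) * expected number of candidate nodes.

    p_r[i], p_s[i]: probabilities that node i ends in the R or S state.
    cond_s[(j, r)]: P(node r ends in S | node j ends in S) for each neighbor r of j.
    """
    # Candidate nodes: j itself plus its neighbors that would otherwise stay in S.
    expected_candidates = 1.0 + sum(cond_s[(j, r)] for r in neighbors[j])
    return p_r[i] * p_s[j] * expected_candidates


def latent_edge_influence(i, j, p_r, p_s, neighbors, cond_s):
    """Sum of the two directed contributions (i informs j, and j informs i)."""
    return (directed_influence(i, j, p_r, p_s, neighbors, cond_s)
            + directed_influence(j, i, p_r, p_s, neighbors, cond_s))
```

Under this reading, an edge scores highly when one endpoint is very likely to be informed and the other is likely to stay uninformed while sitting next to many nodes that would also stay uninformed.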

III. THEORETICAL ANALYSIS
In this section, we develop a new theoretical framework to study the discrete-time SIR information spreading dynamics on complex networks. Based on this framework, Eq. (2) can be evaluated. Inspired by the epidemic link equations (ELE) model proposed by Matamalas et al. [50], we first define a set of discrete-time equations for the probabilities of edges being in different states and then solve the equations at the final state. For the sake of simplicity, we denote the joint probabilities as $XY_{ij}(t) = P[\varepsilon_i(t) = X, \varepsilon_j(t) = Y]$ with $X, Y \in \{S, I, R\}$. The evolution of these joint probabilities depends on each other according to the evolutionary rules of the SIR information spreading model.
For instance, the iteration of $II_{ij}(t)$ depends on $SS_{ij}$, $SI_{ij}$, and $IS_{ij}$. Specifically, the iteration formula of $II_{ij}(t)$ is given by Eq. (3), in which $q_{ij}(t)$ represents the probability that node $i$ (in the S state) is not brought into the I state by any of its neighbors (excluding $j$). Note that Eq. (3) takes into account all the possible state changes of nodes $i$ and $j$. Given the states of nodes $i$ and $j$ at time $t + 1$ as $\varepsilon_i(t + 1) = \varepsilon_j(t + 1) = I$, the first term of Eq. (3) covers the case $\varepsilon_i(t) = \varepsilon_j(t) = S$, in which both nodes are brought into the I state by their neighbors at time $t$. The second term covers the case $\varepsilon_i(t) = I$ and $\varepsilon_j(t) = S$, in which node $i$ holds its state while node $j$ is brought into the I state by its neighbors. The third term covers the case $\varepsilon_i(t) = S$ and $\varepsilon_j(t) = I$, in which node $i$ is brought into the I state while node $j$ holds its state. Last, the fourth term covers the case in which nodes $i$ and $j$ are both in the I state at time $t$ and remain in the I state at time $t + 1$. Similarly, the iteration formulas of the joint probabilities $SS_{ij}(t)$ and $RR_{ij}(t)$ are given by Eqs. (4) and (5), respectively. Note that for a joint probability $XY_{ij}(t)$ with $X \neq Y$, $XY_{ij}(t)$ and $XY_{ji}(t)$ may have different values. That is to say, we should calculate $XY_{ij}(t)$ and $XY_{ji}(t)$ separately for a single edge $(i, j)$ when $X \neq Y$. The expressions of these asymmetric joint probabilities, i.e., $SI_{ij}(t)$, $SR_{ij}(t)$, and $IR_{ij}(t)$, are given by Eqs. (6), (7), and (8), respectively. In addition, $q_{ij}(t)$ in Eqs. (3)-(8) is expressed by Eq. (9) in terms of $h_{ij}(t) = P[\varepsilon_j(t) = I \mid \varepsilon_i(t) = S]$, the probability that node $j$ is in the I state when node $i$ is in the S state. By the definition of conditional probability, $h_{ij}(t)$ can be expressed as $h_{ij}(t) = SI_{ij}(t) / [SS_{ij}(t) + SI_{ij}(t) + SR_{ij}(t)]$ [Eq. (10)]. Iterating Eqs. (3)-(8) from any meaningful initial condition gives the probability of any possible state of edge $(i, j)$ at the final state.
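The iteration described in this paragraph can be sketched in a few lines. The code below is an illustration built from the verbal description of the terms, not a transcription of Eqs. (3)-(9): it assumes that, given the state of an edge, its two endpoints change state independently, that $q_{ij}(t)$ is the product of $1 - \lambda h_{ir}(t)$ over the neighbors $r \neq j$ of node $i$, and that every node starts in the I state independently with probability `rho0`. The function names and the stopping rule are hypothetical, and the network is assumed to have no isolated nodes.

```python
from itertools import product

STATES = "SIR"


def node_transition(state, partner, q, lam, gamma):
    """Next-state distribution of one endpoint, given the state of its partner on the edge."""
    if state == "S":
        # Stay S if neither the partner (when informed) nor any other neighbor informs the node.
        stay = q * ((1.0 - lam) if partner == "I" else 1.0)
        return {"S": stay, "I": 1.0 - stay}
    if state == "I":
        return {"I": 1.0 - gamma, "R": gamma}
    return {"R": 1.0}


def sir_edge_step(neighbors, joint, lam, gamma):
    """One synchronous update of joint[(i, j)][X + Y] for every directed pair (i, j)."""
    h = {}
    for (i, r), p in joint.items():
        p_s = p["SS"] + p["SI"] + p["SR"]          # P(i is S)
        h[(i, r)] = p["SI"] / p_s if p_s > 0 else 0.0

    def q(i, j):
        out = 1.0
        for r in neighbors[i]:
            if r != j:
                out *= 1.0 - lam * h[(i, r)]
        return out

    new_joint = {}
    for (i, j), p in joint.items():
        q_ij, q_ji = q(i, j), q(j, i)
        new_p = {x + y: 0.0 for x, y in product(STATES, repeat=2)}
        for x, y in product(STATES, repeat=2):
            t_i = node_transition(x, y, q_ij, lam, gamma)
            t_j = node_transition(y, x, q_ji, lam, gamma)
            for (x2, px), (y2, py) in product(t_i.items(), t_j.items()):
                new_p[x2 + y2] += p[x + y] * px * py
        new_joint[(i, j)] = new_p
    return new_joint


def sir_edge_prevalence(neighbors, rho0, lam, gamma, tol=1e-12, max_steps=10000):
    """Iterate to the final state; return (prevalence, dict of P(node ends in R))."""
    init = {"S": 1.0 - rho0, "I": rho0, "R": 0.0}
    joint = {(i, j): {x + y: init[x] * init[y] for x, y in product(STATES, repeat=2)}
             for i in neighbors for j in neighbors[i]}
    for _ in range(max_steps):
        joint = sir_edge_step(neighbors, joint, lam, gamma)
        # Stop once no node has a noticeable probability of still being informed.
        if max(p["IS"] + p["II"] + p["IR"] for p in joint.values()) < tol:
            break
    p_r = {}
    for i in neighbors:
        p = joint[(i, neighbors[i][0])]            # any incident edge gives the marginal of i
        p_r[i] = p["RS"] + p["RI"] + p["RR"]
    return sum(p_r.values()) / len(p_r), p_r
```

On a two-node network with $\lambda = \gamma = 1$ and `rho0 = 0.5`, the sketch returns a prevalence of 0.75, which matches the exact value: each node ends in R unless neither node is seeded.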
For a network made up of $N$ nodes and $M$ edges, we have $9M$ equations in total for determining the probabilities of the states of all the edges. We refer to the approach of using these $9M$ equations to solve the SIR information spreading model as the SIR-edge-equations (SIR-ee) approach. Denote the final value of $XY_{ij}(t)$ as $XY_{ij}$. Then, the probability $P(\varepsilon_i = R)$ of node $i$ being in the R state is obtained from these final values as in Eq. (11). Thus, the spreading prevalence can be computed as the average of these probabilities over all nodes, $\rho = \frac{1}{N} \sum_{i=1}^{N} P(\varepsilon_i = R)$ [Eq. (12)]. In addition, the conditional probability $P(\varepsilon_r = S \mid \varepsilon_j = S)$ in Eq. (2) is obtained as in Eq. (13). With the short notations defined in Eq. (14), substituting Eqs. (11) and (13) back into Eq. (2) yields the expression of the latent edge influence $\sigma_{ij}$ given in Eq. (15). Eq. (15) reveals that the influence of each latent edge depends on both the network structure (e.g., the adjacency matrix $A$) and the spreading dynamics (e.g., $\lambda$ and $\gamma$). As described in Sec. II, our strategy for promoting the spreading of information is based on iteratively adding the latent edge with the highest influence $\sigma_{ij}$. To ensure that we always add the latent edge that currently has the highest influence, we need to re-solve Eqs. (3)-(8) and recalculate Eq. (15) after adding every single edge, because the network structure changes after each edge addition.
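The iterative procedure in the last two sentences amounts to a greedy loop that rescores all latent edges after every addition. The sketch below shows only that loop; `score_edges` is a hypothetical callable standing in for "re-solve Eqs. (3)-(8) and evaluate Eq. (15)", and the dict-of-neighbors representation is an assumption made here.

```python
from itertools import combinations


def latent_edges(neighbors):
    """All unordered node pairs that are not connected in the current network."""
    return [(i, j) for i, j in combinations(sorted(neighbors), 2)
            if j not in neighbors[i]]


def greedy_add_edges(neighbors, budget, score_edges):
    """Add `budget` latent edges one at a time, rescoring after every addition.

    score_edges(net, candidates) must return a dict mapping each candidate
    edge to its score on the current network `net`.
    """
    net = {i: set(nbrs) for i, nbrs in neighbors.items()}  # work on a copy
    added = []
    for _ in range(budget):
        candidates = latent_edges(net)
        if not candidates:
            break
        scores = score_edges(net, candidates)  # recomputed on the updated network
        best = max(candidates, key=lambda e: scores[e])
        i, j = best
        net[i].add(j)
        net[j].add(i)
        added.append(best)
    return net, added
```

Because every latent edge is rescored in each round, the cost grows quickly with network size, which is consistent with the paper's later remark that all the strategies can be computationally expensive on large networks.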

IV. SIMULATION RESULTS
This section will present extensive numerical simulations on both synthetic and real-world networks to verify the effectiveness of our approach in promoting the information spreading.
To begin with, we test the agreement between the SIR-ee numerical approach proposed in Sec. III and empirical simulations of the SIR model. Figs. 1 (a) and (b) show the information spreading prevalences predicted by Eq. (12) and obtained by Monte Carlo simulations on two synthetic scale-free (SF) networks $G_1$ and $G_2$, respectively. These two SF networks have the same degree exponent $\alpha = 2.3$ but different average degrees. Specifically, $G_1$ has an average degree of $k_1 = 5$, while $G_2$ has an average degree of $k_2 = 3$. More information about these two synthetic networks can be found in Tab. 1. As can be seen, there is a marked agreement between the results of the SIR-ee numerical approach and the Monte Carlo simulations over the full range of the effective transmission probability $\beta$ on both synthetic networks. Thus, it is valid to use the SIR-ee approach to determine the global impact of the SIR model. Then, we go further to test the performance of our strategy in promoting the spreading of information on the two synthetic SF networks. As described in Sec. III, our strategy is to iteratively add the latent edge $L$ that has the highest influence $\sigma_{ij}$ calculated by Eq. (15). After the addition of every single edge, we re-solve Eqs. (3)-(8) and recalculate Eq. (15) to ensure that we always add the latent edge that currently has the highest influence. For comparison, we also test three additional strategies. First, we consider the approach of adding the latent edge $L_D$ that has the largest degree product $f_d$, that is, the product of the degrees of the two nodes connected by the latent edge. This strategy is referred to as the degree-product (DP) strategy in the rest of the paper.
Similarly, we also consider the strategy of adding the latent edge $L_E$ that has the largest eigenvector centrality product $f_e$, that is, the product of the eigenvector centralities of the two nodes connected by the latent edge. We refer to this strategy as the eigenvector-centrality-product (ECP) strategy. Last, we carry out the strategy of adding a latent edge $L_R$ selected at random and refer to this strategy as the random (RD) strategy. Note that we recalculate all the measures in the three strategies after the addition of every single edge, as in the case of our strategy.
[Figure caption, partially recovered: The Spearman's rank correlation coefficient $m_s$ between the theoretical edge ranks scored by strategy LEI (pink solid line), strategy DP (orange dashed line), or strategy ECP (green dotted line) and the numerical edge ranks on the SF networks with average degree (b) $k_1 = 5$ or (d) $k_2 = 3$. The degree exponents of both synthetic networks are $\alpha = 2.3$. More information about these two synthetic networks is presented in Tab. 1. The recovery probability of the SIR model is set to $\gamma = 0.5$.]
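The two static baselines can be sketched without any external library. The code below is an illustrative implementation under stated assumptions: the network is an undirected dict of neighbor lists, eigenvector centrality is obtained by power iteration on the adjacency matrix shifted by the identity (a common way to avoid oscillation on bipartite graphs; the paper does not say how it computes the centrality), and all function names are hypothetical.

```python
from itertools import combinations


def degree_product(neighbors, i, j):
    """DP score f_d of the latent edge (i, j): product of the two node degrees."""
    return len(neighbors[i]) * len(neighbors[j])


def eigenvector_centrality(neighbors, iters=1000, tol=1e-10):
    """Leading eigenvector of the adjacency matrix, normalized to unit Euclidean length."""
    x = {i: 1.0 for i in neighbors}
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, but the iteration does not oscillate.
        new = {i: x[i] + sum(x[r] for r in neighbors[i]) for i in neighbors}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {i: v / norm for i, v in new.items()}
        if max(abs(new[i] - x[i]) for i in neighbors) < tol:
            return new
        x = new
    return x


def centrality_product(centrality, i, j):
    """ECP score f_e of the latent edge (i, j)."""
    return centrality[i] * centrality[j]


def best_latent_edge(neighbors, score):
    """Latent edge (i, j) with the largest score(i, j); the first one wins on ties."""
    pairs = [(i, j) for i, j in combinations(sorted(neighbors), 2)
             if j not in neighbors[i]]
    return max(pairs, key=lambda e: score(*e))
```

A typical call would be `best_latent_edge(net, lambda i, j: degree_product(net, i, j))` for DP, or the same with `centrality_product` and a freshly computed centrality dict for ECP; both need to be recomputed after each added edge, as the paragraph above notes.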
Denote by $\tilde{\rho}$ the incremental spreading prevalence obtained by the SIR-ee numerical approach after adding the selected latent edge. We then rank all the latent edges according to their values of $\tilde{\rho}$. We call this kind of edge rank the numerical edge rank $\tilde{r}$ and denote the normalized numerical edge rank as $\zeta = \tilde{r}/M_u$, where $M_u$ is the number of all the latent edges. Fig. 2 presents the correlations between the theoretical edge ranks scored by the different strategies and the numerical edge ranks. Specifically, Figs. 2 (a) and (b) demonstrate that the normalized edge rank of the optimal latent edge $L$ selected by our strategy is close to $1/M_u$ over the full range of the effective transmission probability $\beta$ on both networks $G_1$ and $G_2$. The results indicate that our strategy performs well in finding the optimal latent edge, which is the key step in promoting strategies. However, the normalized edge ranks of the optimal edges $L_D$ and $L_E$ become large when $\beta$ is large. Figs. 2 (c) and (d) show the Spearman rank correlations $m_s$ between the theoretical edge ranks scored by the different strategies and the numerical edge ranks, that is, $m_s = 1 - 6 \sum_{l} (r_l - \tilde{r}_l)^2 / [M_u (M_u^2 - 1)]$, where $r_l$ and $\tilde{r}_l$ denote the theoretical edge rank and the numerical edge rank of edge $l$, respectively. It can be seen that the Spearman rank correlation between the theoretical edge ranks scored by our strategy and the numerical edge ranks is close to 1 over the full range of the effective transmission probability $\beta$ on both networks. This suggests that our strategy can predict the overall numerical ranks of the latent edges well. However, the Spearman correlation between the theoretical edge ranks scored by the DP or ECP strategy and the numerical edge ranks is close to 1 only for small values of $\beta$. This can be explained by the fact that, when $\beta$ is small, nodes with a high degree or eigenvector centrality are informed with a larger probability than nodes with small centralities.
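The rank comparison in this paragraph can be reproduced with a few lines of code. The sketch below uses the standard Spearman formula for tie-free ranks; the helper names and the convention that rank 1 goes to the highest score are assumptions made here for illustration.

```python
def ranks_from_scores(scores):
    """Map each edge to its rank, with rank 1 for the highest score (ties not treated specially)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: k + 1 for k, edge in enumerate(ordered)}


def normalized_rank(rank, num_latent_edges):
    """Normalized edge rank: rank divided by the number of latent edges."""
    return rank / num_latent_edges


def spearman_from_ranks(theory_rank, numeric_rank):
    """Spearman coefficient 1 - 6 * sum(d^2) / (n * (n^2 - 1)); needs at least two edges."""
    n = len(theory_rank)
    d2 = sum((theory_rank[e] - numeric_rank[e]) ** 2 for e in theory_rank)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Here `theory_rank` would come from ranking the latent edges by a strategy's score (LEI, DP, or ECP) and `numeric_rank` from ranking them by the incremental prevalence each edge produces when added alone.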
If we add latent edges between them, then these high-centrality nodes together with their neighbors can form an informed cluster that facilitates the spreading. Thus, the DP and ECP strategies perform well in finding the optimal latent edge and in predicting the overall numerical ranks when $\beta$ is small. However, when $\beta$ becomes large, a global spreading outbreak occurs; connecting nodes with high centralities then becomes unnecessary, whereas additional connections to nodes with low centrality are required to promote the spreading. Therefore, both the DP and ECP strategies fail. Note that the random strategy is of no use in finding the optimal latent edge or predicting the numerical ranks of the latent edges; thus, we have not included its results here. All in all, Fig. 2 provides strong evidence for the potential superiority of our strategy in promoting the spreading of information. Figs. 3 and 4 then give intuitive demonstrations of the performance of the different strategies on the two synthetic networks from two perspectives. On the one hand, Fig. 3 compares the original spreading prevalence with the spreading prevalence after adding $N/2$ edges (that is, increasing the average degree of the network by 1) using the different strategies. The results indicate that our strategy performs the best in promoting the spreading of information over the full range of the effective transmission probability $\beta$ on both networks. Meanwhile, the DP and ECP strategies perform well only when $\beta$ is small, and the RD strategy performs well only for large values of $\beta$. It should also be mentioned that, after adding the same number of edges by our strategy, the incremental spreading prevalences are much larger in the sparser network $G_2$. That is to say, the effectiveness of our strategy is more pronounced in sparse networks, which are common in the real world. On the other hand, Fig. 4 demonstrates that our strategy brings about the fastest full-blown outbreak of information. In the numerical simulations, we set the recovery probability to $\gamma = 0.5$ and choose the transmission probability $\lambda$ such that the original information spreading prevalence is about $\rho = 0.8$ for both synthetic networks, that is, $\lambda = 0.252$ and $\lambda = 0.487$ for $G_1$ and $G_2$, respectively. It can be observed that our strategy performs the best in increasing the spreading prevalence to $\rho = 1$ on both networks. Moreover, the DP and ECP strategies both perform worse than the RD strategy, since the values of the effective transmission probability $\beta$ are relatively large on both networks. These results for the three strategies (i.e., the DP, ECP, and RD strategies) coincide with the findings obtained from Fig. 3. In sum, Figs. 3 and 4 give direct evidence of the effectiveness and superiority of our strategy.
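The statement that adding $N/2$ edges raises the average degree by 1 follows from the relation between the number of edges and the average degree of an undirected network. As a short check, writing $\langle k \rangle$ for the average degree before the additions and $\langle k \rangle'$ for the average degree afterwards (notation assumed here):

```latex
\langle k \rangle = \frac{2M}{N}
\quad\Longrightarrow\quad
\langle k \rangle' = \frac{2\,(M + N/2)}{N} = \frac{2M}{N} + 1 = \langle k \rangle + 1 .
```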
Finally, we test our strategy on 9 real-world networks: (a) ca-CSphd [51]; (b) 1138-bus [51]; (c) Air traffic control [52]; (d) web-EPA [51]; (e) tech-routers-rf [51]; (f) Physicians [52]; (g) inf-USAir97 [51]; (h) econ-wm1 [51]; and (i) Jazz musicians [52]. Detailed information about these real-world networks is presented in Tab. 1. They cover a wide range of average degrees (between 2.035 and 27.697). Note that all the strategies could have high computational complexity on large networks, since every latent edge needs to be traversed to pick out the optimal one. Thus, all the real-world networks employed in this manuscript have fewer than 5000 nodes. We plot the incremental spreading prevalence $\tilde{\rho}$ after increasing the average degree by 1 (that is, adding $N/2$ edges) as a function of the effective transmission probability $\beta$ in Fig. 5. It can be seen that our strategy leads to the largest incremental spreading prevalence $\tilde{\rho}$ over the full range of the effective transmission probability $\beta$ on all 9 real-world networks. In addition, the DP and ECP strategies perform better than the random strategy only for small values of $\beta$. Moreover, the incremental spreading prevalence $\tilde{\rho}$ is larger in networks with a smaller average degree. The results on these real-world networks are in concordance with the conclusions we drew on the synthetic networks $G_1$ and $G_2$. It is also worth mentioning that we come to the same conclusion regardless of the number of edges added to these real-world networks.

V. CONCLUSIONS
Promoting the spreading of some typical information (for instance, the vaccination guidance, commercial message, innovation, and political movements) in networked systems can be of both theoretical and practical importance. In this study, we proposed an effective edge-based strategy for promoting the information spreading dynamics on complex networks.
To be specific, we first quantified the potential influence that the addition of each latent edge could cause to the information spreading dynamics by a mathematical model. This mathematical model could also facilitate the determination of the information spreading prevalence. Then, we strategically added the latent edges to the original networks according to the potential influence of each latent edge. Note that previous approaches for promoting the information spreading dynamics on complex networks mostly only consider either the structure of networks or spreading dynamics. However, our strategy incorporates both the information of network structure and spreading dynamics. Extensive numerical simulations verified the effectiveness of our strategy and demonstrated that our strategy outperforms those static approaches, such as adding the latent edge between nodes with the highest degree or eigenvector centrality.
This study provides an effective approach for promoting information spreading by modifying the network structure slightly and helps in understanding which network structures better support the spreading dynamics. In addition, the theoretical framework developed in this study can inspire further investigations of edge-based promoting strategies for other spreading dynamics.
LIMING PAN received the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2019. He is currently an Assistant Professor with Nanjing Normal University, Nanjing. His current research interests include investigating the spreading mechanisms of information, epidemic, rumor, and associated critical phenomena in complex networks.
WEI WANG received the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2017. He is currently an Associate Professor with Sichuan University, Chengdu. He has published more than 60 articles in the field of network science and spreading dynamics. His research interests include investigating the spreading mechanisms of information, epidemic, rumor, and associated critical phenomena in complex networks.
TAO ZHOU received the B.S. degree in physics from the University of Science and Technology of China and the Ph.D. degree in physics from the University of Fribourg. He is currently a Professor with the University of Electronic Science and Technology of China. He has published many research articles in prestigious journals. His works have been reported by several academic media outlets such as Nature News, PNAS News, and MIT Technology Review. His main research interests include data mining, network science, and collective dynamics.
VOLUME 8, 2020