Evolutionary Divide-and-Conquer Algorithm for Virus Spreading Control Over Networks

The control of virus spreading over complex networks with a limited budget has attracted much attention but remains challenging. This article aims at addressing the combinatorial, discrete resource allocation problems (RAPs) in virus spreading control. To meet the challenges of increasing network scales and improve the solving efficiency, an evolutionary divide-and-conquer algorithm is proposed, namely, a coevolutionary algorithm with network-community-based decomposition (NCD-CEA). It is characterized by the community-based dividing technique and cooperative coevolution conquering thought. First, to reduce the time complexity, NCD-CEA divides a network into multiple communities by a modified community detection method such that the most relevant variables in the solution space are clustered together. The problem and the global swarm are subsequently decomposed into subproblems and subswarms with low-dimensional embeddings. Second, to obtain high-quality solutions, an alternative evolutionary approach is designed by promoting the evolution of subswarms and the global swarm, in turn, with subsolutions evaluated by local fitness functions and global solutions evaluated by a global fitness function. Extensive experiments on different networks show that NCD-CEA has a competitive performance in solving RAPs. This article advances toward controlling virus spreading over large-scale networks.

Evolutionary Divide-and-Conquer Algorithm for Virus Spreading Control Over Networks Tian-Fang Zhao , Student Member, IEEE, Wei-Neng Chen , Senior Member, IEEE, Sam Kwong , Fellow, IEEE, Tian-Long Gu, Hua-Qiang Yuan, Jie Zhang, Member, IEEE, and Jun Zhang , Fellow, IEEE V IRUS spreading, including the pervasion of infec- tious diseases and the diffusion of computer malwares, has caused great panic and a huge economic loss in past centuries [1].Necessary resource interventions have been proved to be effective in virus control.For example, Africa malaria has posed a great threat to the life of people in decades, especially in developing countries, but it has fallen by 40% between 2000 and 2015 due to the wide distribution of preventive resources (insecticide-treated nets) [2].The recent worldwide cyberattack-WannaCry ransomware attack-has affected more than 150 countries and caused billions of economic losses, and anti-malware programs are the key to ensure the security of networking devices.
Since virus spreading is hard to be simulated in the real world due to huge expenditure and high risk of control failure, mathematical simulation and analog control become effective ways to help determine the public policies in the real-world crises.In virus spreading control, relevant studies can be divided into three complementary lines of research.

A. Topology Adaption
Some early studies based on complex network theory suggested that network topologies took significant impacts on spreading dynamics [3], [4].As a result, a fair number of topology adaption strategies were developed, for example, removing certain connections by using topology-manipulative algorithms [3], [5] removing a fraction of nodes with highdegree centrality [5], [6] and isolating the nodes at high infection risk [7]- [10].These strategies could efficiently eradicate virus diffusion to some extent, but they usually ignored the cost of adaption [3], [6], [8] or caused a great loss on network connectivity [3], [5]- [7].

B. Node Intervention
In contrast to the topology adaption, the node intervention emphasized transitioning the node state or reducing the infection rates instead of cutting off contacts in networks.
This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/ The measures to prevent epidemic spread includes vaccinating the healthy nodes by the zero-determinant strategy [11] or convex-based optimization [12]; protecting nodes from infection by distributing timely information [13], [14]; and healing the top-q infected nodes by contact tracing [15] or priority planning [16].The measures to defense malware diffusion includes deploying malware detectors by genetic algorithms [17] and genetic programming [18], and developing anti-malware software by evolutionary computation techniques [19].Besides, some studies focused on auditing anti-malware tools through the evolving android malware [20], [21], which provided new insights for malware defense.Though the above strategies have shown their advantages in the allocation of single-category resources, it is still a difficulty to integrate the allocation of these different resources.

C. Combinatorial Resource Allocation
Recent studies have paved the way for the allocation of two or more resource categories [22]- [23].Convex or quasiconvex optimization strategies, such as geometric programming [22], [24], [25] and gradient-based optimization [23], [26], became very popular.These strategies performed well on the combinatorial allocation of continuous resources.But empirically, the real-world resources in virus spreading control are some concrete goods, tools, or services, which have the discrete attribute.Allocating such discrete resources becomes typical subset selection optimization, which is nondeterministic polynomial hard (NP-hard) [27].The solutions to NP-hard problems cannot be obtained in polynomial time.Therefore, approximate optimization strategies, instead of convex or quasiconvex optimization, become the most promising choice [7].Besides, since nonlinear objectives in virus spreading control have high computing complexity, most existing strategies were investigated in small-scale networks, such as the networks with N = 20 nodes [22], N = 50 nodes [23], or N = 100 nodes [1], [24], [26].To conquer the challenge of large-scale networks, divide-and-conquer policies should be considered, which can effectively reduce the dimensions of the problems.
Therefore, the motivation of this article is to design an approximate optimization strategy with a graph-based divide-and-conquer policy, to solve the combinatorial, discrete resource allocation problems (RAPs) in large-scale networks.
As an efficient approximation optimization policy, evolutionary algorithms (EAs) have demonstrated their advantages in solving traditional scheduling problems, such as the knapsack problems [28], cloud resource scheduling [29], task allocation [30], etc.But so far, there are still few cases of EAs for the scheduling of discrete resources in the networkbased virus control.One challenge lies in that existing EAs are designed for continuous optimization problems which may be trapped in a local optima in dealing with the discrete optimization problems [28], [31].Another challenge is that the fitness functions of virus spreading problems are complicated, nonlinear, and high dimensional.As the network scale increases linearly, the dimensionality of the solution space increases exponentially.In large-scale networks with more than 1000 nodes (large-scale optimization), existing EAs lose their effectiveness and efficiency, and thus there is an urgent need to develop novel EAs suitable for such situations [32], [33].
The major contribution of this article is to propose a coevolutionary algorithm with network-community-based decomposition (NCD-CEA) so that the aforementioned issues can be alleviated.The major highlights are as follows.
1) A network-community-based decomposition strategy is designed to help the divide-and-conquer problem.In the strategy, the community structure of a network is detected by a modified Louvain algorithm (MLA).Compared to the original Louvain algorithm (Louvain) [34], an MLA includes a specific control mechanism to control the number of communities and the size of each community, so that super communities and the mini-communities can be avoided.Then, based on the neighborhood propagation features of the problem and community structure characteristics of the network, the strategy divides the problem into multiple subproblems with low-dimensional solution space.In contrast to the existing spreading control policies, the proposed decomposition strategy can effectively reduce the time complexity and improve solving efficiency.2) An alternative evolution process is designed to coordinate the solving of subproblems and the global problem, corresponding to the local evolution of subswarms and the global evolution of the swarm in coevolutionary algorithms (CEAs).But in existing CEAs, the local fitness evaluations are the same as the global fitness evaluation.If there are intractable objectives in RAPs, the global fitness evaluation becomes especially time consuming.In NCD-CEA, the local fitness functions only refer to the subproblems which have low-dimensional solution space, thereby reducing the execution time of the algorithm.Besides, to avoid trapping into the local optima, we start the evolution of the global swarm after certain intervals to correct and guide the evolution of subswarms.The alternative evolution way can enhance the searching diversity of subswarms and promote global exploration.
3) The proposed NCD-CEA launches a new attempt for solving the combinatorial, discrete RAPs in largescale networks.As far as we know, this is the first attempt for solving virus spreading control problems by CEAs.Extensive experiments on various complex networks show that NCD-CEA produces high-quality solutions to the problems than its competitors.The remainder of this article is organized as follows.Section II introduces preliminary knowledge about the cooperative coevolution (CC) approach and community structure detection.Section III formulates the problems.Section IV elaborates on the proposed NCD-CEA.Section V presents experimental results and discussions.Section VI concludes this article.

II. PRELIMINARY
In this section, we provide the background of the CC approach and community structure detection.

A. Cooperative Coevolution Approach
The CC approach provides a user-specified decomposition possibility to divide the population into multiple interacting subpopulations [35], resulting in cooperative coevolution algorithms (CCEAs).CCEAs take advantage of maintaining the solution diversity due to the parallel evolution of multiple subpopulations.For a full review of CCEAs, refer to [36].We introduce the most relevant studies in the following.
The original CC was used to improve the performance of genetic algorithms (GAs), resulting in cooperative coevolution GAs (CCGAs) [35].The CCGA was proven to be with lower computational costs than the original GA in solving function optimization problems [35].Then, Van Den Bergh and Engelbrecht used the CC approach to improve the particle swarm optimizers (PSOs) in 2004 [37], and called the new algorithm as CPSO-S K .The experimental results showed CPSO-S K performed better than the original PSOs in increasing the solution diversity.Later, Li and Yao [38] developed a new cooperative coevolving particle swarm optimization (CCPSO2) to solve the highdimensional, nonseparable, and many-variable problems.In recent years, an increasing number of CCEAs was developed to solve the large-scale optimization and acquired favorable effect [39].These studies demonstrated that CCEAs had potential in large-scale optimization.Therefore, we consider CCEAs as a good choice for controlling large-scale virus spreading.
Though CCEAs have been receiving much attention for the large-scale optimization, there are still some important challenges unsolved.The first is the problem decomposition and variable linkage learning.Existing nongraph-based CCEAs tend to calculate each element in the solution space independently [35] or calculate the element-cluster gathered by random grouping methods [38] and differential grouping methods [40].These CCEAs have been tested to be effective and efficient in CEC benchmark sets, but they lost their effectiveness and efficiency in solving network-based optimization problems.That is because variables in networkbased optimization problems are not independent of each other but influenced by their neighboring variables, with the neighboring relationships closely connected to network topologies.The second challenge lies in the cooperation way among subpopulations and the fitness evaluation way.After dividing the solution space, how to effectively coordinate multiple interacting subswarms so that the high-quality solutions can be obtained becomes the focus [38].However, existing fitness evaluation ways take high time and space expenditure for maintaining frequent cooperation, especially in solving large-scale optimization problems.To address the above two challenges, we explore a novel swarm decomposition method and a new fitness evaluation way in this article which is enlightened by the technique of community structure detection.

B. Community Structure Detection
Community structure is very common in the real-world networks, such as population contact networks, social networks, and Internet [41].It is manifested in dense connections among nodes in same communities and sparse connections among nodes in different communities [42].Community structure characterizes the topology structure of the network and helps understand how the substructures affect each other [43].So far, Newman-Girvan's modularity [44] is one of the most popular criteria in the community structure detection, which evaluates how well the network is divided into communities.Concretely, the modularity function, marked as Q, is defined as the difference between the number of edges within communities and the expected number of randomly distributed edges between communities.Given an undirected and unweighted network with N nodes (identified by their number) and m edges, which can be divided into L communities, let C i represent the ith community, i = 1, . . ., L. The modularity is calculated by where E i represents the number of edges existing in C i , and K i is the sum of degree of nodes inside the community C i .
A larger value of Q corresponds to the better community structures.By using the modularity function, we test the classical community structure detection algorithms by conducting preliminary experiments on complex networks.The test algorithms include: the clique percolation method (CPM) [45], the expectation-maximization (EM) algorithm [46], the Girvan-Newman (GN) algorithm [47], the Louvain method (Louvain) [34], the Lancichinetti-Fortunato method (LFM) [48], the label propagation algorithm (LPA) [49], and the structural clustering algorithm for networks (SCANs) [50].Finally, we have found that the Louvain algorithm outperforms other algorithms in maximizing modularity.
The Louvain algorithm is a two-phase algorithm [34].It starts with an undirected and unweighted network, and is followed by two phases of operation.The first phase is to divide the network into communities by maximizing the gain of the modularity.The second phase is to build a new network by transforming the communities into nodes, and redefining the multiple edges between two communities as a new weighted edge which takes the sum of weights of former edges as the new weight.The two phases will not stop until there is no more gain of modularity ( Q j ).Q j is calculated by using where j,in is the sum of weights of edges within the community C j .j,tot is the sum of the weights of the weighted edges connected to nodes in C j .K i,j is the sum of the weights of the edges which connect the node i and the nodes in C j .K i is the sum of weights of edges linked to node i. m is the sum of the weights of all the edges in the new network.The Louvain algorithm is a stochastic algorithm with a greedy searching strategy, in other words, the results of community structure detection are affected by the sequence of the nodes.In the following, we do not change this greedy strategy in the original Louvain algorithm, but add some extra functions to control the community number and the size of each community.

III. PROBLEM FORMULATION
In this section, we introduce the models for virus spreading dynamics and resource allocation.Based on the models, two RAPs in virus spreading control are formulated.

A. Virus Spreading Model
The most commonly used spreading models are susceptibleinfected (SI) models [51], [52]; SI-susceptible (SIS) models [30]; SI-recovery (SIR) models [14], [53]; and SIR-susceptible (SIRS) models [54].As a variant of SIRS models, the susceptible-exposed-infected-vigilant (SEIV) model takes advantages on three aspects [22]: 1) it further considers the exposed state E, referring to the asymptomatic state of biological virus infection or the undetectable state of computer virus; 2) it replaces the recovery state R in SIR and SIRS models by the vigilant state V that covers more immune situations such as the information-based immunization; and 3) it can be generalized to SI, SIS, SIR, and SIRS models by adjusting parameters.Therefore, we base this article on the SEIV model.
A general view of the SEIV model is shown in Fig. 1, with N representing the total number of nodes in the network.The capital letters S, E, I, and V represent the susceptible, exposed, infected, and vigilant states, respectively, which are marked with different colors.The states S and V are two healthy states, and the states E and I are two infectious states which can spread the virus.More details about parameters, equations, and optimization objectives in the SEIV model are shown in Table I.By following the equations of state transition process and node infection process in Table I, the spreading dynamics, referring to the state probabilities of all nodes, can be gained.Based on the two optimization objectives of the SEIV model, two optimization problems are formulated in this article.

B. Resource Allocation Model
Generally, control measures to suppress virus spreading correspond to the allocation of specific resources [22].If we consider biological viruses and computer viruses, respectively, their control measures can be listed as follows.As to biological viruses, control measures include rapid vaccination of uninfected population [12], [55]; isolating the individuals at high infection risks [56]; reducing contacts between individuals [3]; etc.The defense measures of computer viruses are similar to those of biological viruses [57], for instance, allocating detectors to prevent malwares [17], [18]; immunizing nodes [58]; blocking network topology [59]; recovering the attacked systems; enhancing the anti-malware tools [19]- [21]; etc.As a whole, we classify the control resources into three groups: 1) the infection-free resource r 1 (e.g., biological vaccines and system bug fixes), which helps the node i transition from state S to state V by adjusting parameter θ i ; 2) the infection-prevented resource r 2 (e.g., antivirus masks, virus warning messages), which hinders the transition of the node i from state S to state V by adjusting the parameters β E i and β I i ; and 3) the infection-removed resource r 3 (e.g., hospital isolation wards, curative medicines, and antivirus programs), which helps the node i transition from state I to state V by adjusting the parameter δ i .The allocation model of resources is formulated in Table II.
To simplify the resource allocation process, we assume that when an individual is allocated with some specific resources, its basic attributes (infection rate, recovery rate, immunization rate, etc.) will be consistently changed.For example, when a healthy individual is allocated with the medical drug, it can recover from the infection quickly whenever it is infected.This may cause some resource waste in simulation experiments, but will not affect the practical effects.In reality, individual states are not probabilistic but definite, so the individual state will be binarized according to the corresponding probabilities with detailed operation process shown in [60].

C. Two Optimization Problems
Based on the two optimization objectives in the SEIV model, two RAPs in virus spreading control are formulated as follows.
The RAP with an easy objective (Easy-RAP), which aims to minimize the average infection rate of nodes with limited resource cost, is formulated by The RAP with a hard objective (Hard-RAP), which aims to minimize the maximum real part of eigenvalues of matrix L with limited resource cost, is formulated by min .λ(L ) In both the problems, the optimization vector is the resource allocation matrix R = [τ ri |τ ri ∈ {0, 1}], r = 1, 2, 3, i = 1, . . ., N. Each variable τ ri in the matrix represents the allocation situation of the resource r to the node i, as described in Table II.Note that the optimization objectives of the two problems are the fitness functions of EAs in this article and the limited resource budgets are the constraint conditions.The solving procedure is as follows.A virus spread environment is first built based on a network topology and the SEIV model is shown in Table I.Then, based on the qualified solutions provided by EAs, the resources are allocated to the nodes accordingly.These resources take effect by the function Effect(R) in Table II, namely, changing the parameters of the SEIV model.Finally, the fitness values are calculated by averaging over the infection rate of each node (for Easy-RAP) or taking the maximum real part of eigenvalues of the matrix L (for Hard-RAP).In summary, the optimization problems combine the resource allocation model and the SEIV model.The total resource cost is related to the state probabilities in virus spreading dynamics, and the resource effect functions are embodied in the two objectives of the optimization problems.

IV. ALGORITHM FRAMEWORK
In this section, the proposed NCD-CEA is introduced.It can be used to solve not only virus spreading control problems but also other network-based optimization problems.

A. General Framework
From a systematic perspective, NCD-CEA is an evolutionary divide-and-conquer control policy for virus control.From the perspective of algorithm design, it is a general framework, in which the basic optimizer can be customized.In our preliminary experiments, we find that a majority-voting binary PSO (termed as MVBPSO) [61] can produce the best solutions.Besides, MVBPSO is simply implemented and is very robust to the change of resource cost.Therefore, MVBPSO is selected as the basic optimizer in NCD-CEA.The major principle of MVBPSO is introduced as follows.Given a swarm with N s particles, for the kth particle, its position is represented by ] denotes its local best position.Gbest = [gbest (r,c) ] represents the global best solution of the whole swarm.Then, the position X k is updated according to the majority voting of two variants of Pbest k and Gbest.These variants can effectively avoid early maturing, which are generated as follows.We will sample n random elements from Gbest and Pbest k , and store their indices (r, c) into V G k and V P k , respectively.The updating rules of MVBPSO are formulated by where random(0, 1) generates a random number within the range of (0, 1).Sampling(Gbest, n) is the sampling function, where n is obtained by rounding down the Euclidean distance between Gbest and X k , namely, ] and XG = [xg (r,c) ] are the variants of Pbest k and Gbest, respectively.
The framework of NCD-CEA is shown in Fig. 2, in which the parameters of NCD-CEA are defined as follows.flag is a self-adaptive parameter which decides the alternating of subswarm evolution or swarm evolution.The parameter NC denotes the number of subnetworks/communities (also the number of subswarms).The parameter n in represents the The first phase (aggregate the two smallest community-nodes): 5 Select the two community-nodes with smallest size.6 Merge the two community-nodes to build a new community.7 The second phase (build a new community-node): 8 Transform the community found in the first phase into a new community-node, whose size is the number of nodes in the community, and whose weight is the sum of the weights of edges inside the community.9 The new edge denotes the links between communities, whose weight is the sum of the weights of edges between nodes in the two communities.10 until the number of communities is not larger than NC.11 Dichotomy Function: Select the community of largest size.14 Build a new graph based on the largest community.15 Divide the graph into communities by the Louvain algorithm.16 Merge the communities by Aggregation Function until there are 2 communities in the graph.17 until the number of communities is not smaller than NC 18 return the independent communities generated from G N iteration times of each flag turn.Iters defines the terminal condition of the algorithm, that is, the execution of the algorithm is terminated when the algorithm runs for Iters iterations.The pseudocode of NCD-CEA is shown in Algorithm 1, There are three main processes.
1) Initialization: First, a population of solutions are randomly generated within the scope of {0, 1} (3,N) .If a randomly generated solution fails to satisfy the constraint, a repair mechanism will be applied to generate qualified solutions.The repair mechanism works by repeatedly removing some resources from an unqualified solution, namely, changing the corresponding bits from 1 to 0, until the constraint is satisfied.To accelerate the repairing process, we randomly remove two resources at each repairing step.2) Decomposition: The network is decomposed into NC communities by Algorithm 2.Then, the global swarm can be decomposed to NC subswarms according to the mapping between the network and communities.Each subswarm solves an independent subproblem, which is extracted from the global problem.To achieve this, we first build an independent subnetwork from the community.Then, the local fitness functions are built on the subnetwork, whose calculation is the same as the global fitness function, and the constraint cost C is decomposed into NC parts according to the ratios between nodes in subnetworks and nodes in the network.Due to the good locality of the problem, such kind of decomposition is feasible.More details about the decomposition processes are introduced in Section IV-B.3) Conquering: During the evolution of subswarms, each subswarm generates a subsolution to the corresponding subproblem.The NC subsolutions generated by different subswarms form a complete solution to the entire problem.In contrast, the evolution of the global swarm directly generates a complete solution to the problem, but it takes a much longer time than the evolution of subswarms.Similar to the initialization process, a complete solution may not satisfy the constraint, thus we also need to use the repair mechanism to make the complete solution become feasible.In the proposed NCD-CEA, the evolution of subswarms and that of the global swarm work cooperatively, which will be introduced in detail in Section IV-C.

B. Network-Community-Based Decomposition
An MLA is presented in Algorithm 2 to divide the network into communities.Compared to the original Louvain algorithm, MLA includes two additional control mechanisms: 1) The Community Number Control: The number of communities in the original Louvain is uncertain due to the greedy strategy.Too many communities in the network decrease the influence of subsolutions, while too few communities increase the computational cost.Therefore, effective control to the number of communities is required.
2) The Community Size Control: The original Louvain algorithm may generate communities with very uneven sizes.The super communities become the bottleneck of the algorithm and slow down the algorithm execution time, while the mini-communities waste the machine resources.In fact, the community size and the community number depend on each other.Therefore, a specific control mechanism is designed to exert both kinds of control implemented by aggregation function and dichotomy function in Algorithm 2. We use the aggregation function to reduce the number of communities by aggregating the two smallest communities, dichotomy function to increase the number of communities by dimidiating the super communities by following the principle of modularity maximization.
After the community structure detection, each community is formulated as an independent undirected-and-unweighted subgraph, which serves as background environments of the subproblems.Synchronously, the global swarm is decomposed into multiple subswarms, where each subswarm inherits a proportion of information of the global swarm.

C. Alternative Evolution Process
In our preliminary experiments, we found that the global swarm evolution promotes the exploitation, but the global fitness evaluation is very time consuming especially in large scale networks.Though the fitness evaluation and the evolution of subswarms were time saving, it might be trapped into local optima due to the loss of global evaluations.To balance the exploitation and exploration, an alternative evolution process is designed in NCE-CEA.
Details about the alternative evolution process are embodied in Algorithm 1, which can be described as follows.We use a self-adaptive parameter flag ∈ {0, 1} to determine the specific evolution way in the next generation.If flag = 1, the offspring is generated by the cooperation of subswarms, otherwise it is generated by the evolution of the global swarm.Only when Gbest is not updated, flag is updated to 1−flag.We set flag = 1 by default.Whatever the turn is the subswarms or global swarm, they will evolve for n in times to fully explore the located solution space.Multiple subsolutions provided by the corresponding subswarms will constitute a complete global solution that can be used to update Gbest.
The two evolution processes play different roles in the algorithm: 1) the evolution of subswarms enhances the local searching, for the variables in each subswarm are closely connected to each other, which facilitate the local fitness evaluations and 2) the evolution of the global swarm promotes the global searching and improves the solution diversity by learning from its own best experience and the entire swarm's best experience.Therefore, to some extent, the evolution of subswarms acts as the early accelerator and the evolution of the global swarm acts as an engine of jumping out of local optima.

V. EXPERIMENTS
In this section, we first provide the parameter configuration used in the experiments.Then, we verify the competitive performance of NCD-CEA by comparing it with different variants of BPSO and some state-of-the-art EAs.The parameter sensitivity analysis is provided later.Finally, the model stability analysis shows that NCD-CEA can effectively prevent virus spread.

A. Experimental Configuration
Network Configuration: Detailed information about the networks is shown in Table SI in the supplementary material, including nine networks with different characteristics and size.All the artificial networks, respectively, regular (RG) networks [60], Barabasi-Albert (BA) scalefree networks [62], and Watts-Strogatz (WS) small world networks [63], are generated by the Python-Networkx package under the environment of Anaconda 3. The degree of nodes in RG networks is fixed, thereby RG networks have ordered structure but poor scalability.The degree of nodes in BA networks presents power-law distributions due to the preferential attachment, thereby BA networks are robust to random failures but weak to deliberate attacks due to such scale-free property [62].WS networks are generated by first initializing a neighbor coupled network with fixed node degree, and then randomly rewiring each edge by a probability p. WS networks are characterized by RG networks and random networks.The real networks are two sets of human contact network data and a computer network data, namely, Primary School Temporal Network Data in BMC Infectious Diseases 2014 (Ps-contact)1 [64], Face-to-Face Behavior Data of People during the Exhibition Infectious in 2009 at the Science Gallery in Dublin (Ex-contact)2 [65], and DNC emails corecipients network in the 2016 Democratic National Committee email leak (Email)3 [66].
Problems Configuration: Before allocating the resources, we first build a virtual virus spread scenario by evolving the SEIV model for several times with the time interval called as resource intervention time t.Originally, two fixed nodes act as the source of infection (state E).At time t, there has been a proportion of nodes are infected, shown by p E + p I .In the right part of Table SI in the supplementary material, we provide the optimal solutions for Hard-EAP and Easy-EAP based on the assumption of infinite resource budget, namely, the lower bound of u and lower bound of λ.The values of the optimal solutions differ in the network environment and spread dynamically.The lower bound of λ is very close but not equal to −0.01.In reality, the resource budget cannot be infinite, so we set it as C = 0.3C max in the following experiments, where C max representing the upper bound of resource cost when resources r 1 , r 2 , and r 3 are allocated to all nodes in the network.
Comparison Algorithms: The proposed algorithm is compared with multiple versions of BPSO [67] and some stateof-the-art EAs.The parameters of all algorithms used in the comparison have been carefully adjusted to the best.The detailed description and parameter configurations for the comparison algorithms are shown in Table SII in the supplementary material.The BPSO variants include BPSOS2, BPSOS3, BPSOS4, BPSOV1, BPSOV2, and BPSOV3.Thereinto, the classical particle swarm optimization (BPSO or BPSOS1) can be represented by where w is the inertia coefficient, c 1 and c 2 are two acceleration coefficients, and r 1 and r 2 are two random parameters uniformly distributed within [0, 1].The discretization method is in BPSO is a standard sigmoid function, called as "S1(•)."Besides, there are still other S-shape and V-shape discretization methods can replace "S1(•)," which are listed as follows: Table SIII in the supplementary material shows that MVBPSO is the best algorithm in above BPSO variants, so we only take MVBPSO as the comparison algorithm in the following.Besides, some popular EAs and some state-of-the-art algorithms are compared.Two GAs are included: 1) the GAs with elitist expected value selection, partially matched crossover, and bit-reverse mutation (called as GA1) [68] and 2) the GA with tournament selection, basic crossover rate pc, and basic mutation rate pm (called as GA2) [69].The above GAs have been successfully used in solving malware prevention problems [19], [20].Some state-of-the-art EAs include: ant colony system for solving 0/1 knapsack problems (ACS) [70], self-adaptive differential evolution with neighborhood search (SaNSDE) [71], the competitive swarm optimizer (CSO) [72], and the level-based learning swarm optimizer (LLSO) [32].Besides, the comprehensive learning PSO (CLPSO) [73] proposed by Liang et al. is also investigated.
Algorithm Configuration: The NCD-CEA and the algorithms used in comparison share the same initialization and solution repairing methods.The final solutions are obtained by averaging over some independent runs of algorithms called as Runs.In each run, their basic settings also stay the same.For simplicity, we use a fixed swarm size Ns = 20 for all experiments.In small-scale networks (RG100, BA100, and WS100), we set Iters = 500 and Runs = 50.In larger-scale networks (WS500, WS1000, Ps-contact, Ex-contact, and Email), we set Iters = 1000 and Runs = 30.
In addition, NCD-CEA and the comparison algorithms are implemented by Python language.All the experiments are executed on a PC with 8 Intel Core i5-3470 3.20-GHz CPUs, 8-Gb memory.The software platform is Anaconda3 for Ubuntu 12.04 LTS 64-b system.

B. Parameter Analysis
In NCD-CEA, two parameters are introduced: 1) the number of subswarms (equal to the number of communities NC) in the strategy of network-community-based decomposition and 2) the local iteration times (n in ) in the alternative evolution process.NCD-CEA with higher value of NC can be used to deal with higher dimensional optimization problems for the parallel evolution of the subswarms that contributes a lot to the speedup of computational time in NCD-CEA.But due to the limitations of computational resources, we set the maximum value of NC only to 8 in this article.Finally, we set NC ∈ {1, 2, 4, 8} and n in ∈ {2, 5, 10, 20} to test the algorithm performance.
The experimental results are shown in Tables SIV and SV in the supplementary material.The best parameter settings depend on the network topologies and differ on the optimization objectives.Overall, for solving Easy-EAP, NCD-CEA is not sensitive to parameter NC and a large value of parameter n in (n in = 10, n in = 20) can produce better solutions as the network scale increases.For solving Hard-EAP, the higher values of n in contribute to better solutions (see Fig. 3).The influence of NC depends on network topology.In smallscale networks (N = 100), the original MVBPSO (equal to NCD-CEA with NC = 1) is mostly recommended, because the local evolution of subswarms contribute less to the global optimization.While in networks with a larger scale (N > 100), there is a wider solution space, and the global evolution cannot locate the better solution quickly.In this situation, the local search conducted by the separable subswarm can accelerate the updating process.Meanwhile, since the subsolutions are evaluated by local fitness functions which consume less time than the global fitness evaluation, especially when the objective calculation is computationally complex.
It is worth mentioning that a fixed decomposition operation in NCD-CEA is better than the dynamic and repetitive network decomposition, according to our preliminary experimental results.Therefore, we keep the initially generated communities unchanged in the later stage.Moreover, for the networks with large scale, NCD-CEA with a large NC will further reduce the time complexity due to the lower cost of fitness evaluation, and it may produce better solutions, for example, solutions referring to NCD-CEA with NC = 8 in Ps-contact and Ex-contact networks are better than the those about NC = 4 (see Table SV).But to verify the competitiveness of NCD-CEA, we set NC = 4 for solving the two problems in all networks.

C. Comparison Experiments
In this section, we compare NCD-CEA with different EAs.Table III shows the comparison results among NCD-CEA and eight popular EAs for solving Easy-EAP and Hard-EAP in networks with N < 500.Thereinto, we find the top-three algorithms are NCD-CEA, MVBPSO, and CLPSO.Comparison results for the three algorithms are shown in Table IV, with the network size N ≥ 500.
For effective comparisons, the Kruskal-Wallis nonparametric statistical test [74] is first used for multiple comparisons, followed by pairwise comparisons using the Wilcoxon ranksum test [75], and the critical values are finally corrected using Holm's method [76].The null hypothesis (H0) assumes that there is no significant difference between the compared objects.For the Kruskal-Wallis test, if the p-value is smaller than 0.05, then the null hypothesis is refused, namely, the algorithms provide significantly different solutions.For the Wilcoxon rank-sum test, we choose the algorithm with the best "mean" value as the control group, and others as the experimental group by turn.When the p-value is larger than the corrected critical value, the solution of the experimental group  is regarded as being as good as the one in the control group marked by a superscript "=".As a corrected version of the Bonferroni method, Holm's method is used to generate the corrected critical values.As a corrected version of the Bonferroni method, it is less strict and becomes easier to detect significant differences.
Table IV shows the experimental results in networks with N < 500.For Easy-EAP which has a linearly separable object, NCD-CEA wins the first place according to the results obtained by averaging over multiple independent runs.The p-value obtained by the Kruskal-Wallis test shows that there is a significant difference among all comparison algorithms.The p-values of Wilcoxon rank-sum test demonstrate that NCD-CEA is significantly distinguished from other algorithms.Since NCD-CEA has best values of "mean", "best", and "std" in most networks, its first place in solving Easy-EAP is solid.For Hard-EAP which has a nonlinearly nonseparable objective, NCD-CEA defeats MVBPSO and other algorithms in three of five networks (BA100, Ps-contact, and Ex-contact), and has equal solution quality with MVBPSO in the RG100 network, and lose out to MVBPSO in the WS100 network.For both the problems, NCD-CEA, MVBPSO, and CLPSO-S1 defeat most of the other algorithms and win the first three places.Then, the places are SaNSDE-V1 > LLSO-V3 > CSO-S1 > (GA1, GA2).ACS ranks the last for Easy-EAP, but it defeats LLSO-V3, CSO-S1, GA1, and GA2 in four of five networks for Hard-EAP.
Table IV shows the experimental results in networks with N ≥ 500.Only the solutions provided by the first three algorithms (NCD-CEA, MVBPSO, and CLPSO) are shown.In these networks with larger size, NCD-CEA gives the best solutions in all the three network and both the problems.Results of Kruskal-Wallis test show the differences among the three algorithms.Results of Wilcoxon rank-sum test show that there is a significant difference between NCD-CEA and other algorithms in solving Easy-EAP and Hard-EAP.In other words, the leading performance of NCD-CEA is not accidental.
The above observations produce a crucial conclusion: NCD-CEA can produce the highest quality solutions in all comparison algorithms.It is naturally suitable for solving the problems with the linearly separable object and large-scale discrete optimization due to its divide-and-conquer strategy.
Convergence Comparisons: The convergence behavior of several representative algorithms in solving Easy-EAP (Fig. S1) and Hard-EAP (Fig. S2) in the supplementary material.From the algorithm comparison results in Figs.S1(a)-(b) and S2(a)-(b), three conclusions can be drawn: 1) NCD-CEA is able to preserve the highest quality solutions and competitive convergence performance than its competing algorithms.It conserves this superiority and stable performance in most of the networks; 2) MVBPSO and CLSPO are second to NCD-CEA, which have good ability to jump out of local optima, but the convergence speed of them is still lower than NCD-CEA on most networks; and 3) for Hard-EAP, ACS can produce good solutions at early stage due, but it becomes easy to be trapped into local optima at the later stage [see Fig. S2(a) and (b) in the supplementary material].This phenomenon is caused by the introduction of heuristic information and the utilization of an aggressive pseudorandom proportion selection rule in ACS.This conclusion is consistent with previous studies [77], [78].
In Figs.S1(c)-(e) and S2(c)-(e) in the supplementary material, we depict the convergence trends of the first three algorithms, with the original lines with square points to depict the trends of NCD-CEA.In the network with larger scales, such as WS500, Email, and WS1000 networks, MVBPSO and CLPSO significantly lose their competitiveness.Time Comparisons: To validate the competitive efficiency of NCD-CEA, we compare the execution time of NCD-CEA and the comparison algorithms.Results are obtained by averaging over all independent runs, where each execution process is not disturbed by other programs to make sure the execution time is accurate.The platform and programming language keep the same for all comparison algorithms, which have been introduced in Section V-A.All results are shown in "Average time" fields in Tables III and IV.Since the elapsed time of algorithms in RG100, BA100, and WS100 are almost equal, we only showed the average time for WS100, Ps-contact, Excontact, WS500, Email, and WS1000 networks.There are two statistical results.
1) As shown in Table III, NCD-CEA elapses the least time for solving Hard-EAP happening in all the three networks, and wins over other algorithms in two of three networks for solving Easy-EAP.In Table IV, as the network scale increases from 500 to 1000, the advantages of NCD-CEA in reducing the time complexity become more and more clear.2) In small networks, the time costs of NCD-CEA for Easy-EAP and Hard-EAP are close to each other, but as the network size increases, the elapsed time of Hard-EAP is increased exponentially, while the elapsed time of Easy-EAP is increased linearly.The solution space of both Easy-EAP and Hard-EAP is O(2 N ), with N representing the network size.As shown in Fig. 4, different from Easy-EAP whose objective can be calculated in polynomial time, the solving time complexity of the objectives of Hard-EAPs is O(N 2 ).It illustrates that the hard objective of Hard-EAP is still a bottleneck for the application of algorithms on the super-large-scale networks with tens of thousands of nodes.Solving such a bottleneck needs further investigations in the effective methods for computing maximum eigenvalues.
To conclude, compared to most EAs, NCD-CEA can provide higher quality solutions using less computational cost.Those advantages are especially significant for the problems happening in large-scale networks.

D. Effectiveness in Virus Control
The final goal of this article is to prevent virus spread.Existing studies [79] have provided both the theoretical and experimental evidences that as long as the algorithm takes priority in comparison experiments, it would take advantage of epidemic control.Since NCD-CEA performs better than the other algorithms in comparison experiments, it should perform best in actual virus control theoretically.
To verify this assumption, we investigate the virus spread dynamically after resource intervention in three practical networks (Ps-contact, Ex-contact, and Email).Three algorithms with good performance are compared: 1) CLPSO; 2) MVBPSO; and 3) NCD-CEA, taking λ as the solution evaluation criterion.Besides, we also consider a random solution (called as "Random") and an empty solution (called as "None").A population of feasible solutions is generated through the initialization process of NCD-CEA, and then the best one in them is selected as the random solution.The none solution represents the situation of no resource intervention.As the solutions determine the resource allocation, the variable of SEIV model in the network will be changed.Then, we evolve the SEIV model for 300 times to observe the virus spread dynamically.The experimental results are obtained by averaging over 20 independent runs.As is shown in Fig. 5, the number of infectious nodes is at a high level if without any resource intervention.The resource distribution, even with a random solution, can significantly decrease the number of infectious nodes.The higher quality is the solution, the better is the virus prevention effect.According to Tables III and IV, for Hard-EAP, NCD-CEA performs best in minimizing the optimization objective.Accordingly, the algorithm which leads to the fewest infectious individuals is NCD-CEA, followed by MVBPSO and CLPSO.
It can be observed from the experimental results that a better solution with a smaller λ can usually achieve a better control effect.Since NCD-CEA outperforms the other algorithms used in terms of minimizing λ, it also has the best performance in virus control.

VI. CONCLUSION
In this article, we consider the discrete resources to imitate the real-world resources.The virus spreading control, therefore, becomes typical subset selection optimization problems which are nondifferentiable, nonconvex, and NP-hard.To solve the problems, we propose an evolutionary divide-andconquer algorithm (namely, NCD-CEA), in which a networkcommunity-based decomposition strategy and an alternative evolution process are designed to improve its efficiency and produce high-quality solutions.
The major topic of this article is dividing and conquering the virus spreading control problems, so we only focus on the theoretical framework of the algorithm.There are still many aspects waited to undergo.First, as a general algorithm framework, NCD-CEA utilizes community detection techniques to divide the problem and the solution space.But if the network is highly coupled, namely, the community structure is not significant, NCD-CEA will devolve into its basic optimizer inside and lose the superiority.Second, the default optimizer of NCD-CEA is originally designed for discrete optimization.Facing continuous optimization, who is the most suitable optimizer is not tested and discussed in this article.
In the future, we hope to explore a more collaborative way in coordinating the evolution of subswarms, to adapt the highly coupled networks.Thereafter, the basic continuous optimizers for NCD-CEA will be explored to widen its application fields.

Algorithm 2 MLA
Input: the network G N , the adjacency matrix A, the number of communities NC.Output: communities 1 Divide G N into communities by the Louvain algorithm.2 Aggregation Function: 3 repeat 4

Fig. 3 .
Fig. 3. Convergence speed of NCD-CEA with different n in .

Fig. 4 .
Fig. 4. Time comparison of NCD-CEA for Easy-EAP and Hard-EAP happening in different size of networks.

TABLE I MAIN
PARAMETERS AND EQUATIONS OF THE SEIV MODEL TABLE II MAIN PARAMETERS AND FUNCTIONS OF THE RESOURCE ALLOCATION MODEL

the terminal condition is satisfied. 22 return Gbest.
Algorithm 1 NCD-CEA Algorithm parameters: swarm size N s , parameter n ic , local iteration times n in , number of communities NC, Problem parameters: the network G N , the cost constraint C.

TABLE III ALGORITHM
COMPARISON ON SMALL-SCALE NETWORKS WITH N < 500.(a) FOR EASY-EAP.(b) FOR HARD-EAP.

TABLE IV ALGORITHM
COMPARISON ON LARGER-SCALE NETWORKS WITH N >= 500