Research on Dynamic Community Detection Method Based on an Improved Pity Beetle Algorithm

In the last decade, community detection in dynamic networks has received increasing attention, because it can not only uncover the community structure of the network at any time but also reveal the regularity of dynamic network evolution. Although methods based on the framework of evolutionary clustering are promising for dynamic community detection, there is still room for further improvement in the snapshot quality and the temporal cost. In this study, a dynamic community detection algorithm based on the optional pathway guide pity beetle algorithm (DYN-OPGPBA), a novel dynamic community detection method based on the framework of evolutionary clustering, is proposed. We propose an improved PBA for community detection of the network at the first time step, including a discrete search strategy based on adjacent nodes, a closeness-based community modification strategy and a crowded community split strategy. Compared with many representative static community detection methods, the proposed method achieves superior detection accuracy. A neighbour vector competition-based individual update strategy and an external population size restriction mechanism are also proposed for community detection at subsequent time steps. Results show that DYN-OPGPBA achieves a better balance between snapshot quality and temporal cost than two representative dynamic community detection methods.


I. INTRODUCTION
Complex networks exist widely in many fields, including biological science, computer science, and social science [1]. Community structure is one of the most important characteristics of complex networks; a community can be intuitively regarded as a group of nodes with dense intraconnections and sparse interconnections. Many networks derived from the real world are dynamic in nature. For instance, some members might join or leave a team, or a team could even dissolve. To further utilize, transform and predict networks, the dynamics of the community structure in dynamic networks need to be traced, a task called dynamic community detection. In recent decades, we have experienced increasing demand for dynamic community detection from a wide range of real-world applications, such as analyzing and observing changes in news focus from news headlines, analyzing and discovering stock trends from the stock market, and analyzing and discovering community development trends from the blog space.
The associate editor coordinating the review of this manuscript and approving it for publication was Feiqi Deng.
In general, dynamic networks can be represented as a sequence of snapshots of static networks at each time step. Dynamic community detection aims to accurately uncover the community structure of the snapshot at each time step and reveal the regularity of dynamic networks' evolution. Dynamic community detection algorithms can be roughly divided into two categories according to the methods used.
The first category consists of improved traditional community detection algorithms for static networks that are applied directly to dynamic community detection. For instance, Bilal S et al. proposed a community detection algorithm based on an evolutionary algorithm and modularity [2]. This approach first uses an evolutionary algorithm to find an initial community structure that maximizes the modularity, and then finds the community structure with the highest modularity value by merging communities. H. Papadakis et al. proposed synthetic coordinate community detection (SCCD) [3], which finds the entire community structure of a network on the basis of local interactions between neighboring nodes and an unsupervised distributed hierarchical clustering algorithm. Zhang et al. proposed seed expansion with generative adversarial learning (SEAL) [4], a framework for learning heuristics for community detection. Marya et al. proposed a community detection method for temporal multilayer networks [5]. Mahsa S et al. introduced a new Louvain-based dynamic community detection algorithm that relies on the knowledge derived from the previous steps of network evolution [6]. Kamal B et al. proposed a local community detection algorithm based on the detection and expansion of core nodes [7], text-associated DeepWalk spectral clustering (TADW-SC) [8], and attributed spectral clustering (ASC) [9].
The second category consists of algorithms based on evolutionary algorithms. This category originates from the concept of temporal smoothness of network evolution proposed by Chakrabarti et al., which further revealed the characteristic regularity of network changes [10]. On the basis of temporal smoothness, Pizzuti et al. proposed a dynamic community detection method based on evolutionary clustering, which not only improves the detection efficiency but is also more in line with actual network changes [11]. This method mainly contains two steps. First, the snapshot quality of the static network is optimized at the first time step. Second, the snapshot quality and the temporal cost are optimized simultaneously at the subsequent time steps. Mathematically, these two steps can be viewed as a single-objective and a multi-objective optimization problem, respectively, both of which are NP-hard. In the past few decades, many experiments have confirmed that swarm intelligence and evolutionary algorithms (EAs) are among the most competitive and effective methods for solving optimization problems. Since then, a considerable number of dynamic community detection approaches based on the framework of evolutionary clustering have been proposed, which adopt EAs as optimization strategies, such as the genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE). Examples include the consensus community-based particle swarm optimization for dynamic community detection (CCPSO) proposed by Zeng et al. [12], FaceNet [13], DYN-MOGA [14], DYN-DMLS [15], MOEA-SA [16], DYN-MODPSO [17], L-DMGAPSO [18] and ECD [19]. Up to now, dynamic community detection approaches based on the framework of evolutionary clustering have become the most competitive and widely utilized approaches for uncovering the community structure in dynamic networks.
However, dynamic community detection approaches based on the framework of evolutionary clustering have two disadvantages. First, the detection accuracy of the snapshot at the first time step has not received enough attention. If the community division of the snapshot at the first time step is not accurate, then the community divisions at the subsequent time steps will become increasingly inaccurate, that is, the problem of error accumulation occurs. Second, the modularity and normalized mutual information (NMI) obtained at each time step can still be improved, because the EAs adopted as optimization strategies in these approaches are usually traditional ones, such as GA, PSO and DE. These EAs generally have drawbacks such as slow convergence and a tendency to fall into local optima.
To address these issues, this article proposes a dynamic community detection method based on the framework of evolutionary clustering, which introduces a novel bioinspired meta-heuristic algorithm called the pity beetle algorithm (PBA) [20] as the evolutionary strategy. PBA is chosen because many experimental results on various complex continuous numerical optimization problems show that it outperforms many widely used meta-heuristic methods (such as GA, DE, PSO, and ABC) in terms of accuracy, stability and speed. The proposed algorithm is hereafter called DYN-OPGPBA. The major innovations and contributions of this article can be summarized as follows:
1) To improve the community detection accuracy of the snapshot at the first time step and avoid the error accumulation problem, a static community detection algorithm based on the optional pathway guide pity beetle algorithm is proposed, hereafter called CD-OPGPBA. The main idea of CD-OPGPBA is to optimize the modularity function with an improved PBA that has excellent optimization performance. In CD-OPGPBA, a discrete search strategy uses adjacent nodes and network topological information as the crucial factors to enhance its global optimization performance. To guarantee that the number of divided communities is equal to that of real communities, a closeness-based community modification strategy and a crowded community split strategy are designed.
2) To improve modularity and NMI at each subsequent time step, a dynamic community detection algorithm based on the multi-objective optional pathway guide pity beetle algorithm is proposed. The initial community structure is obtained by choosing the optimal community division at the previous time step with our proposed identification method based on the module density. An improved multi-objective pity beetle algorithm based on decomposition (MOEAD/PBA), with a neighbour vector competition-based individual updating strategy and an external population size restriction mechanism, is then introduced to optimize modularity and NMI simultaneously.
In the experiments, two main scenarios were considered. 1) In the first scenario, the effectiveness of CD-OPGPBA in terms of community quality is verified, because the detection quality of the network at the first time step seriously affects the community detection at the subsequent time steps. Tested on static networks, including LFR and real-world networks, the superior community quality of CD-OPGPBA is further validated through comparison with other well-known community detection methods. 2) In the second scenario, DYN-OPGPBA is compared with two well-known algorithms on the Infectious Social Patterns dynamic networks and dynamic LFR networks. At most time steps of the datasets, DYN-OPGPBA achieved a high level of modularity and NMI. Moreover, both CD-OPGPBA and DYN-OPGPBA work without prior knowledge of the total number of communities and require only a few specific optimization parameters.
The rest of this article is organized as follows. Section II introduces definitions and concepts about the dynamic community detection problem and related work. Section III describes the details of the proposed DYN-OPGPBA. Section IV shows the empirical results to illustrate the effectiveness of the proposed method. Finally, Section V concludes this article. Table 1 gives the main formal notations involved in this article, including graphs, membership matrix, and PBA.

B. DESCRIPTION OF DYNAMIC NETWORK
In dynamic networks, the nodes and the edges generally change with time. For instance, some nodes and edges might be added, and others could be deleted. Fig. 1 gives an example of a dynamic network that changes from time step $t = 1$ to $t = 4$. Compared with the network at $t = 1$, node 9 and edges (9,5) and (9,6) are newly added at $t = 2$. At $t = 3$, edge (9,5) is removed, and edge (9,3) is newly added. At $t = 4$, node 7 is removed, node 10 is newly added, edge (3,5) is removed, and edges (10,3) and (10,4) are newly added. Generally, a dynamic network can be described as a sequence $G = \{G_1, G_2, \cdots, G_t, \cdots, G_T\}$, where $t \in \{1, 2, \cdots, T\}$ represents the $t$-th time step, $G_1$ represents the original network, and $G_t$ is a snapshot of the nodes and the connections among these nodes at time step $t$, denoted as $G_t = (V_t, E_t)$, where $V_t$ and $E_t$ are the node set and the edge set of $G_t$, respectively, and every edge connects two distinct nodes ($i \neq j$). Generally, the topology information of $G_t$ can also be represented by the adjacency matrix $A^t = (A^t_{ij})_{n_t \times n_t}$, where $A^t_{ij}$ indicates whether there is an edge between nodes $i$ and $j$ at the $t$-th time step. In this paper, we merely consider unweighted and undirected networks, i.e., $A^t_{ij} = A^t_{ji}$ and $A^t_{ij} \in \{0, 1\}$.
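The snapshot representation described above can be illustrated with a minimal sketch. The two toy snapshots below are hypothetical and are not the network of Fig. 1; the helper name `adjacency_matrix` is likewise an assumption made for illustration.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def adjacency_matrix(nodes: List[int], edges: Set[Edge]) -> List[List[int]]:
    """Build the symmetric 0/1 adjacency matrix of one snapshot G_t = (V_t, E_t)."""
    index = {v: k for k, v in enumerate(sorted(nodes))}
    n = len(index)
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue  # self-loops are excluded (i != j)
        A[index[i]][index[j]] = 1
        A[index[j]][index[i]] = 1  # undirected: A_ij = A_ji
    return A


# Hypothetical toy dynamic network with T = 2 snapshots (not the one in Fig. 1).
snapshots = [
    ([1, 2, 3], {(1, 2), (2, 3)}),
    ([1, 2, 3, 4], {(1, 2), (2, 3), (3, 4), (4, 1)}),
]
matrices = [adjacency_matrix(v, e) for v, e in snapshots]
```

Each snapshot keeps its own node set, so the matrix size $n_t$ may differ from one time step to the next.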

C. DESCRIPTION OF DYNAMIC COMMUNITY DETECTION
Dynamic community detection often involves two objectives: to optimize the snapshot quality at each time step, thus enabling us to gain as much insight as possible into the network topological features; and to guarantee slow changes between the network structures at consecutive time steps. After dynamic community detection, we finally obtain a sequence of community structures, denoted as $C = \{C^1, C^2, \ldots, C^t, \ldots, C^T\}$, where $C^t$ is the community division of $G_t$ at the $t$-th time step, and $C^t = \{C^t_1, C^t_2, \ldots, C^t_{k_t}\}$ has $k_t$ divided subgraphs. This article focuses only on nonoverlapping community structures, i.e., $C^t$ needs to satisfy the following conditions:

$$\bigcup_{i=1}^{k_t} C^t_i = V_t, \qquad C^t_i \neq \emptyset, \qquad C^t_i \cap C^t_j = \emptyset \quad (i \neq j). \tag{1}$$

Briefly, dynamic community detection needs to optimize the snapshot quality and the temporal cost at each time step simultaneously. Generally, we adopt the normalized mutual information (NMI) [19] to estimate the temporal cost, while the snapshot quality can be measured by the following fitness functions: community score (CS) [22], modularity density (D) [23] and modularity (Q) [24].
(1) Normalized Mutual Information (NMI)
In the NMI definition of Eq. (2), $C^t = (c^t_1, \ldots, c^t_{m_t})$ ($C^{t-1} = (c^{t-1}_1, \ldots, c^{t-1}_{m_{t-1}})$) represents the community structure of the snapshot at the $t$-th time step (the previous time step), $m_t$ ($m_{t-1}$) represents the number of divided subgraphs in $G_t$ ($G_{t-1}$), $Num^t$ ($Num^{t-1}$) is the number of nodes in $G_t$ ($G_{t-1}$), $L$ is the confusion matrix whose element $L_{ij}$ represents the number of nodes in both the community $C^t_i \in C^t$ and the community $C^{t-1}_j \in C^{t-1}$, and $L_{i\cdot}$ and $L_{\cdot j}$ represent the sum of the elements of $L$ in row $i$ and column $j$, respectively. $NMI(C^t, C^{t-1}) \in [0, 1]$, and a high value of $NMI(C^t, C^{t-1})$ indicates greater similarity between $C^t$ and $C^{t-1}$, e.g., $NMI(C^t, C^{t-1}) = 0$ means that $C^t$ and $C^{t-1}$ are completely different.
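As an illustration of how NMI is computed from the confusion matrix $L$, the following sketch uses the widely used confusion-matrix form of NMI. It is an assumption rather than a reproduction of Eq. (2): it uses a single node count $N$ (the nodes present in both snapshots) instead of separate $Num^t$ and $Num^{t-1}$, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable


def nmi(part_t: Dict[Hashable, int], part_prev: Dict[Hashable, int]) -> float:
    """NMI between two partitions given as {node: community id}.

    Assumption: only nodes present in both partitions are compared, so a
    single node count N is used for both marginals.
    """
    nodes = part_t.keys() & part_prev.keys()
    n = len(nodes)
    if n == 0:
        raise ValueError("the two partitions share no nodes")
    joint = Counter((part_t[v], part_prev[v]) for v in nodes)  # confusion matrix L
    row = Counter(part_t[v] for v in nodes)                    # row sums L_i.
    col = Counter(part_prev[v] for v in nodes)                 # column sums L_.j
    numerator = -2.0 * sum(
        l_ij * math.log(l_ij * n / (row[i] * col[j])) for (i, j), l_ij in joint.items()
    )
    denominator = sum(r * math.log(r / n) for r in row.values()) + sum(
        c * math.log(c / n) for c in col.values()
    )
    if denominator == 0.0:
        return 1.0  # both partitions consist of a single community
    return numerator / denominator


same = nmi({1: 0, 2: 0, 3: 1, 4: 1}, {1: 5, 2: 5, 3: 7, 4: 7})
independent = nmi({1: 0, 2: 0, 3: 1, 4: 1}, {1: 0, 2: 1, 3: 0, 4: 1})
```

Community identifiers only need to be consistent within one partition, which is why relabelled but otherwise identical partitions still give the maximum value.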
(2) Modularity (Q)

$$Q = \frac{1}{2M} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2M} \right) \delta(i, j) \tag{3}$$

where $M$ is the number of edges in the network, and $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, respectively. $\delta(i, j)$ represents the community relationship between nodes $i$ and $j$, i.e., if nodes $i$ and $j$ belong to the same community, $\delta(i, j) = 1$; otherwise, $\delta(i, j) = 0$. The higher the value of $Q$, the better the quality of the community division.
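A minimal sketch of the modularity computation follows. It uses the equivalent per-community form $Q = \sum_c \left[ l_c / M - (d_c / 2M)^2 \right]$, where $l_c$ is the number of internal edges and $d_c$ the total degree of community $c$; the function name and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def modularity(edges: Iterable[Tuple[int, int]], community: Dict[int, int]) -> float:
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for i, j in edges:
        degree_sum[community[i]] += 1
        degree_sum[community[j]] += 1
        if community[i] == community[j]:
            internal[community[i]] += 1
    return sum(internal[c] / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# Hypothetical toy graph: two triangles joined by the single edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_communities = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
q_split = modularity(toy_edges, two_communities)
q_single = modularity(toy_edges, {v: 0 for v in range(1, 7)})
```

Placing all nodes in one community gives $Q = 0$, whereas the natural two-triangle split gives a positive value.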
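(3) Modularity density (D)

$$D = \sum_{i=1}^{m} \frac{L(V_i, V_i) - L(V_i, \overline{V_i})}{|V_i|} \tag{4}$$

where $m$ represents the number of communities in the network, $|V_i|$ represents the number of nodes in the $i$-th community, and $L(V_i, V_i)$ and $L(V_i, \overline{V_i})$ represent the number of internal edges of the $i$-th community and the number of edges with other communities, respectively.
(4) Community score (CS)

$$CS = \sum_{k=1}^{m} \left( \frac{\sum_{i \in C_k} (\mu_i)^r}{|C_k|} \times \sum_{i,j \in C_k} A_{ij} \right) \tag{5}$$

$$\mu_i = \frac{1}{|C_k|} \sum_{j \in C_k} A_{ij} \tag{6}$$

where $r$ is the adjustment parameter (usually, $r = 2$), $\sum_{i,j \in C_k} A_{ij}$ represents the number of edges in the $k$-th community $C_k$, and $\mu_i$ is computed as in (6).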
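The modularity density can be sketched in the same style. This sketch follows the wording above and counts each internal edge once; some formulations of $L(V_i, V_i)$ count each internal edge twice, so the absolute values may differ by that convention. The function name and the toy graph are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def modularity_density(edges: Iterable[Tuple[int, int]], community: Dict[int, int]) -> float:
    """Modularity density D: sum over communities of (internal - external) / size."""
    size = defaultdict(int)
    for c in community.values():
        size[c] += 1
    internal = defaultdict(int)  # edges inside a community (counted once, an assumption)
    external = defaultdict(int)  # edges leaving a community
    for i, j in edges:
        ci, cj = community[i], community[j]
        if ci == cj:
            internal[ci] += 1
        else:
            external[ci] += 1
            external[cj] += 1
    return sum((internal[c] - external[c]) / s for c, s in size.items())


# Hypothetical toy graph: two triangles joined by the single edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
d_split = modularity_density(toy_edges, {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1})
d_bad = modularity_density(toy_edges, {1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1})
```

Because each term is normalized by the community size, small but well-separated communities are not penalized, which is the property used later when a single solution is selected from a nondominated set.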

D. EVOLUTIONARY CLUSTERING-BASED METHODS FOR DYNAMIC COMMUNITY DETECTION
As shown in Fig. 2, evolutionary clustering-based methods for dynamic community detection contain two steps. At the first time step $t = 1$, to obtain excellent snapshot quality, the criterion function (i.e., Q, D, CS, etc.) is optimized by using a single-objective evolutionary algorithm. Afterwards, the snapshot quality and the temporal cost at the subsequent time steps are optimized simultaneously by a multi-objective evolutionary algorithm.
Here, $C^{t-1}_{set}$ is assumed to be the selected best community division of the snapshot at the previous time step. The snapshot quality at the current time step can be optimized by maximizing Q, D, CS, etc., and the temporal cost can be optimized by maximizing $NMI_t(C^t, C^{t-1}_{set})$, because a higher NMI indicates a smaller change between consecutive community divisions. Finally, a sequence of community divisions $C = \{C^1, C^2, \ldots, C^t, \ldots, C^T\}$ will be obtained.

E. PITY BEETLE ALGORITHM
The pseudo-code of the PBA is shown in Algorithm 1, where $RST(\cdot)$ represents a random sampling technique. Here, $X_0 = RST(B, A, D, N_{pop})$ is generated as follows. First, each dimension of the $D$-dimensional search space is evenly divided into $N_{pop}$ segments, and a value is randomly generated in each segment according to Eq. (7). Then, the $N_{pop}$ samples obtained in the $k$-th dimension are randomly paired with the $k$-th dimension of the $N_{pop}$ individuals in the population.
where k,i represents the sample value of the i-th segment in the k-th dimensional space, and rand represents a random number in the range of [0, 1].
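The sampling procedure described above resembles stratified (Latin hypercube style) sampling and can be sketched as follows. Eq. (7) is not reproduced here, so the sketch assumes that the value in each segment is drawn uniformly; the function name `rst` and the parameters `lower` and `upper` (per-dimension bounds) are hypothetical.

```python
import random
from typing import List


def rst(lower: List[float], upper: List[float], n_pop: int, rng: random.Random) -> List[List[float]]:
    """Stratified initial population: one sample per segment in every dimension."""
    dim = len(lower)
    columns = []
    for k in range(dim):
        width = (upper[k] - lower[k]) / n_pop  # length of one segment in dimension k
        # Assumption: a uniform draw inside the i-th segment.
        samples = [lower[k] + (i + rng.random()) * width for i in range(n_pop)]
        rng.shuffle(samples)  # random pairing of samples with individuals
        columns.append(samples)
    # Individual p takes the p-th shuffled sample of every dimension.
    return [[columns[k][p] for k in range(dim)] for p in range(n_pop)]


population = rst([0.0, -2.0], [1.0, 2.0], n_pop=4, rng=random.Random(0))
```

Every segment of every dimension is covered exactly once, which spreads the initial individuals more evenly than independent uniform draws would.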

III. PROPOSED DYNAMIC COMMUNITY DETECTION METHOD

A. PROCEDURE OF THE PROPOSED ALGORITHM
In this article, we contribute to the framework of evolutionary clustering for dynamic networks and propose a dynamic community detection method based on the optional pathway guide pity beetle algorithm (DYN-OPGPBA). As shown in Fig. 3, the proposed DYN-OPGPBA contains two stages. The first stage aims to obtain the optimal snapshot quality at the first time step by using an improved single-objective PBA (described in Section III.B), while the second stage uncovers communities by optimizing the snapshot quality and the temporal cost simultaneously at the subsequent time steps by using a multi-objective PBA (described in Section III.C). The objectives to be optimized and the optimization algorithms adopted in the two stages are therefore different. In the first stage, we optimize the modularity in Eq. (3) by using an improved PBA with a discrete search strategy based on adjacent nodes, a closeness-based community modification strategy and a crowded community split strategy. In the second stage, we simultaneously optimize two objectives, the modularity in Eq. (3), which evaluates the snapshot quality, and the NMI in Eq. (2), which evaluates the temporal cost, by using MOEAD/PBA (Algorithm 3), which employs the framework of decomposition-based approaches, a neighbour vector competition-based individual update strategy and an external population size restriction mechanism. To identify the optimal community division from the set of community divisions obtained by MOEAD/PBA at each subsequent time step, we propose a novel identification method based on the module density, as described in Section III.C.(2).

B. COMMUNITY DETECTION FOR THE FIRST TIME STEP
In this section, the proposed CD-OPGPBA for community detection at the first time step is described. As shown in Fig. 3, CD-OPGPBA mainly contains four parts: population initialization, the optimization process, the closeness-based community modification strategy and the crowded community split strategy.

1) POPULATION INITIALIZATION
Our proposed method uses a popular and straightforward adjacent node-based encoding method [21] to generate a set of initial individuals. An individual is generated according to the adjacent node-based encoding method as follows. Assume that a network contains n nodes; then the size of an individual X is equal to n, and each dimension X j is a random integer between 1 and n, which represents the node to which the j-th node is connected. Fig. 4 illustrates how an individual is encoded and decoded. The network G contains 7 nodes. Assume that X = {2, 3, 2, 5, 6, 5, 4} is generated according to the adjacent node-based encoding method. It indicates that nodes 1 to 7 are connected to nodes 2, 3, 2, 5, 6, 5 and 4, respectively. Generally, we call two connected nodes a two-node set with an adjacency relation. Therefore
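Decoding an individual amounts to finding the connected components induced by the links j to X j. The following sketch decodes the example individual X = {2, 3, 2, 5, 6, 5, 4} with a small union-find structure; the function name `decode` is an assumption, and nodes are numbered from 1 as in the text.

```python
from typing import Dict, List


def decode(individual: List[int]) -> List[List[int]]:
    """Decode an adjacent node-based individual into communities (1-based node labels)."""
    n = len(individual)
    parent = list(range(n + 1))  # index 0 is unused

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each dimension j links node j to node X_j; linked nodes share a community.
    for node, neighbour in enumerate(individual, start=1):
        root_a, root_b = find(node), find(neighbour)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for v in range(1, n + 1):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


communities = decode([2, 3, 2, 5, 6, 5, 4])
```

For this individual the links 1-2, 2-3 and 3-2 form one component and the links 4-5, 5-6, 6-5 and 7-4 form another, so two communities are obtained.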

2) ADJACENT NODE-BASED DISCRETE SEARCH STRATEGY
In this article, a discrete search strategy for updating individuals is designed based on adjacent nodes, because the original PBA was proposed to solve continuous optimization problems and cannot be applied directly to the discrete community detection problem. The details of our adjacent nodes-based discrete search strategy are shown in Algorithm 2, which contains five search modes. Note that all the offspring individuals in each brood population are generated by the same search mode.
The details of the cutoff operation and search modes 4 and 5 are as follows.

Search Mode 4:
A node ir is selected randomly, and all the neighbour nodes that are not in the community to which the node ir belongs are found. Then, node jr is selected according to the roulette wheel selection method from those found neighbour nodes. Finally, the ir-th dimension of the candidate individual X (ir) is replaced by jr.
Search Mode 5: Search mode 5 is similar to search mode 4. The difference is that the offspring individual is generated by changing the jr-th dimension of the candidate individual, X (jr), to ir.
Cutoff Operation: Given an individual X and its offspring individual XX, assume that the value of some dimension X(r) of the individual X is not equal to XX(r). First, the connected vector to which node r belongs is found from the two-node sets obtained by decoding individual X. Then, all the nodes of that connected vector whose dimensions in individual XX are equal to r are selected, and their dimensions in individual X are randomly changed to the labels of nodes connected to them.

Pseudo-code excerpt (PBA search modes):
for k = 1:N_broods do
    if k == 1  %% neighborhood search mode
    ...  %% medium-scale search mode
    elseif rand < Pr  %% large-scale search mode
    else  %% memory store search mode
        y ← MEM(randint([1,Npop],1,1),:)
        for jj = 1:D do
            y1 ← y;
    ...
    fit_xx ← max(F(MEM)), xx ← arg max(F(MEM))

For example, consider an individual X = {4, 4, 4, 1, 6, 7, 6, 9, 11, 8, 8, 11}, which represents that nodes {1, 2, 3, 4} are in the same community with community identifier (commID) equal to 1 (community 1), nodes {5, 6, 7} are in the same community with commID equal to 2 (community 2), and nodes {8, 9, 10, 11, 12} are in the same community with commID equal to 3 (community 3). Assume that when search mode 4 is performed on individual X, X (9) is changed from 11 to 7. Thus, an offspring individual XX is generated, i.e., XX = {4, 4, 4, 1, 6, 7, 6, 9, 7, 8, 8, 11}, which represents that nodes {1, 2, 3, 4} are all in community 1 and nodes {8, 9, 10, 11, 12} are all in community 2 together with nodes {5, 6, 7}. Notably, although only one dimension is changed, the commIDs of all the nodes of the original community 3 are changed. When the cutoff operation is applied, only the commID of some nodes is changed, and the others still belong to their original communities.
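The merging effect in the worked example can be checked by decoding the individual before and after the change of X (9). This is a minimal sketch that only reproduces the example; it does not implement the roulette wheel selection of search mode 4 or the cutoff operation, and the function name `decode_components` is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def decode_components(individual: List[int]) -> List[List[int]]:
    """Decode an adjacent node-based individual via breadth-first search (1-based labels)."""
    n = len(individual)
    links: Dict[int, Set[int]] = {v: set() for v in range(1, n + 1)}
    for node, neighbour in enumerate(individual, start=1):
        links[node].add(neighbour)
        links[neighbour].add(node)
    seen: Set[int] = set()
    communities = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            v = queue.popleft()
            members.append(v)
            for w in links[v] - seen:
                seen.add(w)
                queue.append(w)
        communities.append(sorted(members))
    return communities


X = [4, 4, 4, 1, 6, 7, 6, 9, 11, 8, 8, 11]
XX = list(X)
XX[9 - 1] = 7  # the change of X(9) from 11 to 7 described in the example

before = decode_components(X)
after = decode_components(XX)
```

A single changed dimension links node 9 to node 7, which is enough to merge two whole communities; this is the side effect that the cutoff operation is meant to limit.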

3) CLOSENESS-BASED COMMUNITY MODIFICATION STRATEGY
Though the randomness of the adjacent node-based encoding method can help the search modes of Algorithm 2 provide more alternative community divisions, the nodes that should belong to the same community are assigned to different communities with high probability. Therefore, the number of communities obtained by Algorithm 2 might be much larger than the number of real communities.
To address the above issues, we propose a new closeness-based community modification strategy to correct the unreasonable divisions and enhance their quality. We first calculate the closeness between each node and each community according to Eq. (8) and then assign each node to the community with the highest closeness.
$$Cl_i^k = s_k \times \sum_{j \in j_k} D(j) \tag{8}$$

where $Cl_i^k$ is the closeness of node $i$ to the $k$-th community $C_k$; $J_i$ is the set of neighbour nodes of node $i$; $D(j)$ represents the degree of node $j$; $j_k$ is the set of nodes of $J_i$ belonging to $C_k$; and $s_k$ is the size of $j_k$. Fig. 6 gives the community division obtained by performing Algorithm 2 on the Karate network. Compared with the real community division shown in Fig. 7, node 3 is wrongly assigned to an unreasonable community. We perform our closeness-based community modification strategy on node 3 as shown in Fig. 8. The details are as follows. First, we identify all the neighbour nodes of node 3, i.e., nodes {1, 2, 4, 8, 9, 10, 14, 28, 29, 33}, where nodes {2, 4, 8, 9, 10, 28, 33} are in community 1, node 1 is in community 2 and nodes {14, 29} are located in another community with commID equal to 3. Then, we calculate the closeness between node 3 and each community according to Eq. (8). We take the closeness between node 3 and community 1 as an example. Community 1 contains seven neighbour nodes of node 3, i.e., nodes {2, 4, 8, 9, 10, 28, 33}, whose degrees are 9, 6, 4, 5, 2, 4 and 12, respectively. Thus, according to Eq. (8), we obtain $Cl_3^1 = 7 \times (9+6+4+5+2+4+12) = 294$. The closeness between node 3 and the other two communities is obtained similarly. Fig. 9 shows the community division after our closeness-based community modification strategy is performed. The network in Fig. 6 has 5 communities, while Fig. 9 contains 4 communities, equal to the number of real communities shown in Fig. 7. Node 3 is moved from the original community 2 to the real community 1, and we obtain an improved partition with a larger modularity value.
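The worked example for node 3 can be reproduced with a short sketch. The closeness is taken as $s_k$ times the sum of the degrees of the neighbours of the node that lie in community $k$, which is the form implied by the calculation $7 \times 42 = 294$. The degrees of nodes 1, 14 and 29 are not stated in the text; the values below are taken from the standard Zachary karate club graph and should be read as an assumption, as should the function name.

```python
from typing import Dict, List


def closeness(node: int, neighbours: Dict[int, List[int]],
              degree: Dict[int, int], community: Dict[int, int]) -> Dict[int, int]:
    """Closeness of `node` to every community that contains at least one of its neighbours."""
    by_community: Dict[int, List[int]] = {}
    for j in neighbours[node]:
        by_community.setdefault(community[j], []).append(j)
    # Cl = s_k * sum of degrees of the neighbours of `node` inside community k
    return {k: len(js) * sum(degree[j] for j in js) for k, js in by_community.items()}


neighbours = {3: [1, 2, 4, 8, 9, 10, 14, 28, 29, 33]}
community = {2: 1, 4: 1, 8: 1, 9: 1, 10: 1, 28: 1, 33: 1, 1: 2, 14: 3, 29: 3}
degree = {2: 9, 4: 6, 8: 4, 9: 5, 10: 2, 28: 4, 33: 12,  # stated in the text
          1: 16, 14: 5, 29: 3}                           # assumed (standard karate club graph)

cl = closeness(3, neighbours, degree, community)
best_community = max(cl, key=cl.get)
```

Node 3 is then reassigned to the community with the highest closeness, which is community 1 in this example.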

4) CROWDED COMMUNITY SPLIT STRATEGY
Two undesirable cases may occur in community detection: insufficient community division and excessive community division. Insufficient community division means the number of communities obtained is greater than the real one, while excessive community division means the number of communities obtained is less than the real one. Insufficient community division can be addressed by Algorithm 2 and our closeness-based community modification strategy. To solve excessive community division, we propose the following crowded community split strategy. The concepts defined in our crowded community split strategy are listed in Table 2, including the crowded community, reference node, comparison node, reference node set, comparison node set and central node set.

Pseudo-code excerpt (Step 2: generate brood population):
for i = 1 to N_pop do
    if Search mode == 1 then
        ir ← a random integer within [1, n]
        Neighbor_set ← all the neighbor nodes of node ir
        X_i ← only re-encode nodes ir and Neighbor_set using adjacent node encoding
    elseif Search mode == 2 then
        X_i ← regenerate individual by adjacent node encoding
    elseif Search mode == 3 then
        ir ← a random integer within [1, n]
        X_i ← only re-encode node ir using adjacent node encoding
    ...
Our crowded community split strategy is performed only when the best individual with the optimal modularity in the population does not change in successive iterations. The concrete steps are as follows. First, we identify the best individual $X_{birth}$ with the optimal modularity, and a community structure $C$ is identified from $X_{birth}$ by performing the decoding operation. Then, we split the crowded community identified from $C$ into two new split sub-communities as follows. In the first step, the reference node (i.e., node $a$) is identified and assigned to split sub-community 1. In the second step, for each comparison node (i.e., node $b$), the numbers of nodes in its reference node set, comparison node set and central node set (denoted as $\mu$, $\eta_b$ and $\theta_{ab}$, respectively) are calculated according to Eqs. (9)-(11). In the third step, if $\theta_{ab}/\mu \geq \rho$ or $\theta_{ab}/\eta_b \geq \rho$ ($\rho$ is equal to 0.6) is satisfied, then most neighbour nodes of node $a$ are also neighbours of node $b$, and node $b$ is assigned to split sub-community 1, to which node $a$ belongs; otherwise, node $b$ is assigned to split sub-community 2.
where card() represents the number of elements in a set; $E$ is the set of edges; and $Cr$ is the crowded community. $X_{birth}$ should be updated because the connection relationships between the nodes of the original crowded community are changed by the crowded community split strategy. The details are as follows. First, we identify each node in split sub-community 1 whose dimension in $X_{birth}$ is equal to the identifier of a node that belongs to split sub-community 2, and change its dimension to the identifier of one of its neighbour nodes that belongs to split sub-community 1. Second, we identify each node in split sub-community 2 whose dimension in $X_{birth}$ is equal to the identifier of a node that belongs to split sub-community 1, and change its dimension to the identifier of one of its neighbour nodes that belongs to split sub-community 2.
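The split rule can be sketched as follows. Eqs. (9)-(11) and Table 2 are not reproduced here, so the sketch rests on assumptions: the reference node set is taken as the neighbours of node a inside the crowded community, the comparison node set as the neighbours of node b inside it, and the central node set as their intersection. How the reference node is chosen is left to the caller, and all names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def split_crowded_community(crowded: Iterable[int], adjacency: Dict[int, Set[int]],
                            a: int, rho: float = 0.6) -> Tuple[Set[int], Set[int]]:
    """Split a crowded community into two sub-communities around reference node `a`."""
    crowded = set(crowded)
    ref_set = adjacency[a] & crowded           # assumed reference node set of a
    sub1, sub2 = {a}, set()
    for b in crowded - {a}:
        cmp_set = adjacency[b] & crowded       # assumed comparison node set of b
        central = ref_set & cmp_set            # assumed central node set
        mu, eta_b, theta_ab = len(ref_set), len(cmp_set), len(central)
        similar = (mu > 0 and theta_ab / mu >= rho) or (eta_b > 0 and theta_ab / eta_b >= rho)
        (sub1 if similar else sub2).add(b)
    return sub1, sub2


def clique(nodes):
    return {(u, v) for u in nodes for v in nodes if u < v}


# Hypothetical crowded community: two 4-cliques joined by the single edge (4, 5).
edges = clique([1, 2, 3, 4]) | clique([5, 6, 7, 8]) | {(4, 5)}
adjacency: Dict[int, Set[int]] = {v: set() for v in range(1, 9)}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

sub1, sub2 = split_crowded_community(range(1, 9), adjacency, a=1)
```

With the threshold of 0.6, nodes that share most of their in-community neighbours with the reference node stay with it, and the remaining nodes form the second sub-community.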

C. COMMUNITY DETECTION FOR THE SUBSEQUENT TIME STEPS
As shown in Fig. 3, the community detection method designed in this paper for the subsequent time steps consists of two main parts: a decomposition-based multi-objective pity beetle algorithm (MOEAD/PBA) and the identification method of the optimal community partition at the current time step.

Algorithm 3 MOEAD/PBA for Community Detection at a Single Time Step
Input: N_pop: the number of the brood populations; λ^1, ..., λ^N: N uniformly distributed weight vectors; wn: the number of neighborhood weight vectors for each weight vector
Output: EP: external population
Initialize parameters (N_pop, N, wn)
P = {X_1, X_2, ..., X_N} ← Population_initialization according to the adjacent node-based encoding method
FV = {FV(X_1), ..., FV(X_N)} ← Calculate Q and NMI for each X ∈ P
z* ← Initialize the ideal point
λ = {λ^1, ..., λ^N} ← Generate N uniformly distributed weight vectors
EP = ∅, and t = 1
REPEAT
    P1 ← Perform neighbor vectors competition-based individual update strategy on P
    EP = EP ∪ y_best  // y_best is the individual with the optimal module density in population P1
    y_best ← Perform closeness-based community modification strategy on y_best
    if FE > FE_un
        y_best ← Perform crowded community split strategy on y_best
    end if
    P = {P_1, P_2, ..., P_Npop} ← Perform adjacent nodes-based discrete search strategy on y_best
    // update EP
    for i = 1 to N_pop do
        x_best ← the individual with the optimal module density in the i-th brood population
        update z* by comparing the original z* and FV(x_best)
        for j = 1 to |B(x_best)| do  // B(x_best) represents the set of individuals in the neighborhood of x_best
            if g^te(x_best | λ^B(j), z*) ≤ g^te(x^B(j) | λ^B(j), z*)
                x^B(j) = x_best, FV^B(j) = FV(x_best)
            end if
            EP ← EP \ xx, if xx ∈ EP, and ∀ xx ≺ x_best
            EP ← EP ∪ x_best if there is no xx ∈ EP making xx ≺ x_best
        end for
    end for
    Perform external population size restriction operation on EP
UNTIL termination criterion met
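The comparison on $g^{te}$ in Algorithm 3 can be illustrated with the Tchebycheff scalarization commonly used in decomposition-based multi-objective algorithms, $g^{te}(x \mid \lambda, z^*) = \max_i \lambda_i |f_i(x) - z_i^*|$. The sketch below assumes that both objectives (Q and NMI) are maximized and that $z^*$ holds the best value seen for each objective; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def tchebycheff(objectives: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Tchebycheff value: the largest weighted distance to the ideal point (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def update_ideal(ideal: Sequence[float], objectives: Sequence[float]) -> Tuple[float, ...]:
    """Component-wise best value, assuming every objective is maximized."""
    return tuple(max(z, f) for z, f in zip(ideal, objectives))


def neighbours_to_replace(candidate: Sequence[float],
                          neighbour_values: List[Sequence[float]],
                          neighbour_weights: List[Sequence[float]],
                          ideal: Sequence[float]) -> List[int]:
    """Indices of neighbouring subproblems for which the candidate is at least as good."""
    return [
        j for j, (fv, lam) in enumerate(zip(neighbour_values, neighbour_weights))
        if tchebycheff(candidate, lam, ideal) <= tchebycheff(fv, lam, ideal)
    ]


ideal = update_ideal((0.5, 0.9), (0.45, 1.0))          # -> (0.5, 1.0)
g = tchebycheff((0.4, 0.8), (0.5, 0.5), ideal)
replaced = neighbours_to_replace(
    candidate=(0.45, 0.95),
    neighbour_values=[(0.40, 0.80), (0.49, 0.99)],
    neighbour_weights=[(0.5, 0.5), (0.5, 0.5)],
    ideal=ideal,
)
```

Each weight vector defines one scalar subproblem, so a candidate may replace some neighbours and not others depending on the weights.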

1) MOEAD/PBA FOR COMMUNITY DETECTION AT THE CURRENT TIME STEP
The detailed procedure of our decomposition-based multi-objective pity beetle algorithm (MOEAD/PBA) for community detection at each subsequent time step is shown in Algorithm 3, which contains two main operations: the neighbour vector competition-based individual update strategy and the external population size restriction mechanism.

a: NEIGHBOUR VECTOR COMPETITION-BASED INDIVIDUAL UPDATE STRATEGY
In multi-objective evolutionary algorithms based on decomposition, the neighborhood of an individual generally contributes to generating its candidate individual. On this basis, we propose the following neighbour vector competition-based individual update strategy to generate the candidate individuals. First, the weight vector $\lambda^i$ associated with the individual $x^i$ and the individuals associated with the $wn$ weight vectors nearest to $\lambda^i$ are identified. Then, these individuals and the individual $x^i$ are combined to constitute the competition population of the individual $x^i$, i.e., $X^i = [x^i, x^i_1, \ldots, x^i_{wn}]^T$. Thus, the $j$-th dimension of a new individual $x^i$ can be generated by selecting one of the elements of $X^i$ in column $j$ according to the roulette selection method, where $P^i_{k,j}$ represents the probability of selecting element $k$ of $X^i$ in column $j$. In the definition of $P^i_{k,j}$, $D(X^i_{k,j})$ represents the degree of node $X^i_{k,j}$, and its in-community counterpart represents the number of nodes connected to node $X^i_{k,j}$ in the community to which node $X^i_{k,j}$ belongs.
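The column-wise roulette selection can be sketched as follows. The probability formula is not reproduced here, so the sketch takes the selection weight of each element from a caller-supplied function `weight_fn`, which is a placeholder for that formula; all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def roulette_pick(values: Sequence[int], weights: Sequence[float], rng: random.Random) -> int:
    """Pick one value with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        return rng.choice(list(values))  # fall back to a uniform choice
    r = rng.random() * total
    acc = 0.0
    for value, weight in zip(values, weights):
        acc += weight
        if r < acc:
            return value
    return values[-1]


def compete(competition: List[List[int]], weight_fn: Callable[[int, int], float],
            rng: random.Random) -> List[int]:
    """Build a new individual by roulette selection in every column of the competition population.

    competition[0] is x^i and the remaining rows are its wn neighbours;
    weight_fn(j, node) is a placeholder for the selection weight of `node` in column j.
    """
    child = []
    for j in range(len(competition[0])):
        column = [individual[j] for individual in competition]
        child.append(roulette_pick(column, [weight_fn(j, v) for v in column], rng))
    return child


competition = [[2, 3, 2, 5], [2, 1, 4, 3], [3, 3, 2, 3]]
rng = random.Random(1)
child = compete(competition, lambda j, node: 1.0, rng)
keep_first = compete(competition, lambda j, node: 1.0 if node == competition[0][j] else 0.0, rng)
```

Every dimension of the new individual is inherited from some member of the competition population, so only encodings that already appear in the neighborhood are recombined.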

b: EXTERNAL POPULATION SIZE RESTRICTION MECHANISM
In Algorithm 3, the external population (EP) is used to store all the nondominated solutions found up to the current iteration. At the end of the iterations, we can obtain a large number of local optimal community partitions, especially for large networks, which are mutually nondominated. Some solutions are relatively dense, whereas others are relatively dispersed. The relatively dense individuals have been searched in their neighborhoods many times, which is why they have difficulty generating candidate individuals with higher values of modularity. The relatively dispersed individuals can generate excellent candidate individuals because they have not been explored deeply. On the basis of the above considerations, we propose the following external population size restriction mechanism. When the size of EP exceeds the threshold $EP_{um}$, only the $EP_{um}$ individuals with the largest crowding distances are preserved in EP.
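A minimal sketch of such a size restriction is given below. It assumes that the crowding distance is computed in objective space in the NSGA-II manner, with boundary solutions receiving an infinite distance; the text does not specify the exact definition, and the function names are hypothetical.

```python
from typing import List, Sequence


def crowding_distances(points: List[Sequence[float]]) -> List[float]:
    """NSGA-II style crowding distance of each objective vector."""
    n = len(points)
    distance = [0.0] * n
    if n == 0:
        return distance
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        low, high = points[order[0]][m], points[order[-1]][m]
        distance[order[0]] = distance[order[-1]] = float("inf")  # keep boundary solutions
        if high == low:
            continue
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][m] - points[order[pos - 1]][m]
            distance[order[pos]] += gap / (high - low)
    return distance


def restrict_external_population(ep: List[Sequence[float]], limit: int) -> List[Sequence[float]]:
    """Keep only the `limit` members of EP with the largest crowding distances."""
    if len(ep) <= limit:
        return list(ep)
    distance = crowding_distances(ep)
    keep = sorted(range(len(ep)), key=lambda i: distance[i], reverse=True)[:limit]
    return [ep[i] for i in sorted(keep)]


# Hypothetical (Q, NMI) pairs of mutually nondominated solutions.
ep = [(0.0, 1.0), (0.1, 0.9), (0.12, 0.88), (0.5, 0.5), (1.0, 0.0)]
restricted = restrict_external_population(ep, limit=4)
```

The solution removed first is the one lying in the densest region of the front, which matches the motivation given above.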

2) FINAL OPTIMAL COMMUNITY DIVISION IDENTIFICATION SCHEME
In general, after Algorithm 3 is applied to community detection at each time step, a set of nondominated solutions (PS) is obtained. The community partitions of the networks at the current and previous time steps are required to change slowly, so the optimal solution in the PS obtained at the previous time step must be identified as the starting point for community detection at the current time step. Modularity is used to evaluate the snapshot quality in Algorithm 3. It reveals the community structure by comparing the actual density of links in a subgraph with the density one would expect if edges were placed at random without regard to the community structure. Modularity is the most competitive and widely used criterion function. Unfortunately, it can only detect communities larger than a certain size, which is known as the resolution limit problem. The modularity density described in (4) reveals the community structure by comparing the link density within communities with the link density between different communities, which alleviates the resolution limit problem of modularity. We select the solution with the optimal modularity density from PS as the optimal solution at the previous time step.
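The selection step can be sketched as follows. Equation (4) is not reproduced in this section, so the sketch assumes a commonly used definition of modularity density, $D = \sum_c \left( L(V_c, V_c) - L(V_c, \bar{V}_c) \right) / |V_c|$, where $L(V_c, V_c)$ sums the adjacency entries inside community $c$ and $L(V_c, \bar{V}_c)$ counts the links leaving it. The function names and data layout are hypothetical.

```python
def modularity_density(adjacency, membership):
    """Modularity density of a partition (assumed definition, see lead-in).

    adjacency: {node: set of neighbouring nodes} for an undirected graph.
    membership: {node: community label}.
    """
    communities = {}
    for node, label in membership.items():
        communities.setdefault(label, set()).add(node)
    total = 0.0
    for members in communities.values():
        internal = 0  # each internal edge is counted twice, like a sum over A_uv
        external = 0  # edges leaving the community
        for node in members:
            for neighbour in adjacency[node]:
                if neighbour in members:
                    internal += 1
                else:
                    external += 1
        total += (internal - external) / len(members)
    return total

def select_by_modularity_density(adjacency, pareto_set):
    """Return the partition in `pareto_set` with the largest modularity density."""
    return max(pareto_set, key=lambda part: modularity_density(adjacency, part))
```

For two triangles joined by a single edge, splitting the graph into the two triangles scores 10/3, whereas placing all six nodes in one community scores 7/3, so the two-community partition is selected.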
The optimal solution at the previous time step cannot be used directly to detect the community structure of the new network at the current time step, because the two networks differ, for example in the number of nodes and in community membership. The optimal solution at the previous time step therefore has to be modified to fit the network at the current time step. The details are as follows. First, the nodes that have been deleted from the network at the current time step are removed from the optimal solution. Then, the newly added nodes, and the nodes that are no longer connected to the node stored in their corresponding dimension of the optimal solution, are identified in the new network and subjected to adjacent node-based encoding.
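A minimal sketch of this adaptation step is given below. It assumes that the adjacent node-based encoding stores, for every node, one of its neighbours (a locus-based adjacency representation), and that re-encoding a node means drawing a random current neighbour; both points are assumptions, and the function name is hypothetical.

```python
import random

def adapt_previous_solution(previous, adjacency, rng=None):
    """Adapt the previous optimal solution to the current network.

    previous: {node: neighbour stored in that node's dimension} at time t-1.
    adjacency: {node: set of neighbours} of the network at time t.
    """
    rng = rng or random.Random()
    adapted = {}
    # Iterating over the current nodes drops deleted nodes automatically.
    for node, neighbours in adjacency.items():
        gene = previous.get(node)
        if gene is not None and gene in neighbours:
            adapted[node] = gene  # the stored link still exists
        elif neighbours:
            # New node, or stored neighbour no longer connected: re-encode.
            adapted[node] = rng.choice(sorted(neighbours))
        else:
            adapted[node] = node  # assumption: an isolated node points to itself
    return adapted
```

Only the dimensions invalidated by the network change are redrawn, so most of the previous partition is carried over, which is consistent with the requirement that consecutive partitions change slowly.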

D. TIME COMPLEXITY
In this work, n represents the number of nodes, T denotes the size of EP, k denotes the average number of neighbour communities of each node and NP denotes the population size. The proposed DYN-OPGPBA contains two separate community detection stages, one for the first time step and one for the subsequent time steps, and the time cost of the latter is greater than that of the former. Therefore, only the time complexity of the community detection algorithm for the subsequent time steps is analyzed in this section. This algorithm itself contains two separate stages. The first stage is Algorithm 3, and the second stage is the final optimal community division identification scheme.
(2) The final optimal community division identification scheme needs $O(NP^2)$ computation time.
Therefore, the worst-case overall time complexity of the algorithm in each iteration simplifies to $O(NP^2) + O(T \cdot NP \cdot n)$.

IV. EXPERIMENTAL STUDY
In this section, two experiments are conducted to evaluate the performance of the proposed DYN-OPGPBA. We first verify the effectiveness of CD-OPGPBA, because the detection quality at the first time step has a greater effect on dynamic community detection than the detection quality at subsequent time steps. The effectiveness of the proposed DYN-OPGPBA is then verified. The proposed algorithm is implemented in Matlab 2016. All the tests are conducted on a personal computer equipped with a Core i5 CPU (1.6 GHz) and 8.0 GB of memory.

A. VERIFICATION OF THE DETECTION EFFECT OF CD-OPGPBA
In this section, the performance of the proposed CD-OPGPBA is verified on artificial synthetic networks, small-scale real-world networks and large-scale real-world networks.
In our experiments, CD-OPGPBA uses the same parameter settings on all test networks. The maximum number of iterations $T_{max}$ is set to 800, the population size NP is set to 20, the number of randomly selected nodes is set to 10 (nod = 10), pr = 0.6, and ρ = 0.6 in the crowded community split strategy.

1) RESULTS ON ARTIFICIAL SYNTHETIC NETWORKS
In this section, a number of synthetic networks produced by the Lancichinetti-Fortunato-Radicchi (LFR) [25] model are generated to further evaluate the performance of CD-OPGPBA and the compared approaches. In the LFR model, the distributions of node degrees and community sizes both follow power laws with tuneable exponents, which makes the generated networks much closer to real-world networks. The mixing parameter mu determines the fraction of edges between each node and its neighbours in other communities; a larger value of mu corresponds to a less clear community structure and thus to more difficult community detection. In our experiments, six different LFR networks are generated with the following settings: each network contains 128 nodes, the average degree is fixed to 16, the community size is set to 32, and the value of mu varies from 0.1 to 0.6 with an interval of 0.1.
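To make the role of mu concrete, the sketch below generates a simplified planted-partition network with the settings listed above (128 nodes, communities of 32 nodes, average degree 16). It is not the LFR generator: degrees and community sizes are not drawn from power laws, and mu is matched only in expectation. All names are hypothetical.

```python
import random

def planted_partition(n_nodes=128, community_size=32, avg_degree=16, mu=0.1, seed=0):
    """Generate an undirected graph with equal-sized planted communities.

    Each node has, in expectation, (1 - mu) * avg_degree neighbours inside
    its own community and mu * avg_degree neighbours outside it.
    """
    rng = random.Random(seed)
    membership = [node // community_size for node in range(n_nodes)]
    p_in = (1 - mu) * avg_degree / (community_size - 1)
    p_out = mu * avg_degree / (n_nodes - community_size)
    edges = set()
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            p = p_in if membership[u] == membership[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return edges, membership

def mixing_fraction(edges, membership):
    """Fraction of edges that join nodes in different communities."""
    between = sum(1 for u, v in edges if membership[u] != membership[v])
    return between / len(edges)
```

As mu grows, a larger share of each node's links leaves its community, so the measured mixing fraction rises and the planted structure becomes harder to recover.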
We choose six typical high-performance modularity optimization algorithms for comparison: ECSD [26], FN [27], GN [28], Meme-net [29], MAGA-net [30], and walktrap [31]. The parameters of the compared algorithms are set as in the original literature. For each network, the performance of CD-OPGPBA is evaluated and compared with that of the six competitors. The experimental results are shown in Fig. 10, where each data point represents the best value of Q achieved by each algorithm on each LFR network in a random run. Fig. 10 shows that CD-OPGPBA obtains the best value of Q on all six LFR networks. For the networks with small values of mu (mu ≤ 0.3), the performance of CD-OPGPBA is the same as or slightly better than that of the competitors. As mu grows, however, the superiority of CD-OPGPBA over the other competitors becomes much more significant. The performance of the algorithms differs most when mu is equal to 0.5. When mu = 0.6, the performance of CD-OPGPBA is the same as or slightly better than that of FN and walktrap, because the community structure of the network is too vague for communities to be detected.

2) RESULTS ON SMALL-SCALE REAL-WORLD NETWORKS
To further investigate the performance of CD-OPGPBA, we verify its effectiveness on four widely used small-scale real-world networks: Zachary's karate club network (Karate), the Dolphin network (Dolphins), the American political books network (polBooks) and the American college football network (Football). Their topological information [7] is shown in Table 3.

3) RESULTS ON LARGE-SCALE REAL-WORLD NETWORKS
To further investigate the detection ability of CD-OPGPBA on the large-scale real-world networks, three well-known large-scale real-world networks are tested: the NetScience network, the Euroroad network and the PGP network. Their unique topological information [55] is shown in Table 5.
Experiments have shown that most community detection methods perform excellently on some networks but lose their effectiveness on others. Therefore, different methods are chosen as the competitors of CD-OPGPBA on different networks.
For the Euroroad network, seven well-known community detection methods are chosen as the competitors: ECSD [26], FN [27], GN [28], Meme-net [29], MAGA-net [30], walktrap [31], and MDP [56]. Table 7 records $Q_{best}$ and $Q_{mean}$ obtained by each algorithm. The results of MDP were collected from the literature [56], and the other results were obtained by running the code of each algorithm 15 times. Table 7 shows that CD-PBA obtains a partition with a high $Q_{best}$, lower only than that of FN and MDP, which obtain the highest $Q_{best}$ value of 0.8722. This indicates that CD-PBA outperforms Meme-net, ECSD, GN, MAGA-net, and walktrap on the Euroroad network.
To further investigate the detection ability of CD-OPGPBA on large-scale real-world networks with more than 10000 nodes, the PGP network is tested. Seven well-known community identification methods are chosen as the competitors: LPA [55], CC_GA [42], FSA [60], CDEP [58], FUC [59], SOSCD [54] and MDP [56]. Table 8 records $Q_{best}$ and $Q_{mean}$ obtained by each algorithm. The results of LPA, CC_GA, FSA and SOSCD are collected from the literature [54], the results of CDEP, MDP and FUC are respectively taken from the literature [58], [55], [56], and the results of CD-OPGPBA are obtained by running its code with the maximum number of iterations set to 800, 900 and 1000, respectively.
Two conclusions can be drawn from Table 8. First, the $Q_{best}$ of the proposed CD-OPGPBA is higher only than that of LPA and CDEP, which indicates that CD-OPGPBA outperforms only LPA and CDEP and is worse than the other competitors. Second, the best value of Q obtained by CD-OPGPBA keeps increasing as the number of iterations increases. The best values of Q obtained in each iteration change rapidly at the beginning of the evolution and increase slowly towards the end. In particular, the best value of Q obtained by CD-OPGPBA would continue to increase if the number of iterations were set to a value higher than 1000.
In sum, the proposed CD-OPGPBA is an effective community detection method for small-scale networks and outperforms many state-of-the-art community detection methods on them, whereas its performance decreases as the network scale increases. However, CD-OPGPBA is only the first stage of the proposed dynamic community detection method, which adopts the framework of evolutionary clustering, and evolutionary clustering-based methods for dynamic community detection are usually applied to small-scale dynamic networks. Therefore, the performance of the proposed CD-OPGPBA on small-scale networks is the more important one.

B. VERIFICATION OF THE DETECTION EFFECT OF THE DYNAMIC NETWORK
In this section, the performance of the proposed DYN-OPGPBA is validated by comparing it with that of two state-of-the-art dynamic community detection approaches: L-DMGAPSO (Label-based Dynamic Multi-objective Genetic Algorithm Particle Swarm Optimization) [8] and ECD (Evolutionary Community Detection) [9]. The comparisons are conducted on artificial synthetic dynamic networks and well-studied real-life dynamic networks.
To ensure a fair comparison, all community detection approaches use the same population size NP = 20 and the same termination criterion, i.e., the maximum number of iterations $T_{max}$ = 800. The other parameters of each algorithm are set as follows. In DYN-OPGPBA, the number of randomly selected nodes is set to 10 (nod = 10), pr = 0.6, ρ = 0.6 in the crowded community split strategy, and the number of neighbourhood weight vectors in Algorithm 3 is $w_n$ = 4. In L-DMGAPSO, following the authors' suggestion, the crossover probability and the mutation probability are set to 0.9 and 0.1, respectively, the inertia weight is ω = 0.7298 and the learning factors are s1 = s2 = 1.4961. In ECD, following the authors' suggestion, the community connection threshold R is set to 0.2, and the selection probability, the mutation probability and the immigration probability are set to 0.5, 0.2 and 0.5, respectively.

1) RESULTS ON REAL-LIFE DYNAMIC NETWORKS
In this section, the real-life Infectious Social Patterns dynamic network is adopted to test the performance of our algorithm. This network was built by Isella et al. by tracking science museum visitors [30], where the vertices represent the visitors and the links represent close encounters between visitors. Our experiment focuses on visitors over five days from 28 April to 2 May 2009. To perform daily analysis, the data are divided into five subsets, one for each day. Some statistical information regarding the network used in our experiments is given in Table 9. Fig. 11 depicts the performance of our algorithm and the compared algorithms on the dynamic Infectious Social Patterns network, where each data point represents the value obtained by each algorithm at each time step in a random run. Fig. 11(a) plots the modularity values of the competing algorithms. Compared with L-DMGAPSO, the modularity obtained by DYN-OPGPBA is slightly smaller at the fifth time step, slightly larger at the second time step and significantly larger at the other time steps. In addition, the values of modularity obtained by DYN-OPGPBA are higher than those of ECD. According to the statistical values of modularity, DYN-OPGPBA is slightly better than L-DMGAPSO and significantly better than ECD in terms of snapshot quality. Fig. 11(b) plots the NMIs of the competing algorithms. The community structures discovered by DYN-OPGPBA have higher NMIs at each time step than those of ECD, which indicates that DYN-OPGPBA can discover smoother community structures than ECD. Compared with L-DMGAPSO, the NMIs obtained by DYN-OPGPBA are higher at three of the four time steps, the exception being the second time step.
This exception arises because the optimal community division at the second time step differs markedly from the original community structure at the third time step: compared with the network at the second time step, many nodes and edges are deleted at the third time step. In general, DYN-OPGPBA contributes more to smoothing the community structure between two consecutive time steps than ECD and L-DMGAPSO.
In sum, DYN-OPGPBA is slightly better than L-DMGAPSO and significantly better than ECD for dynamic community detection on the Infectious Social Patterns network.
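For reference, the NMI between two partitions can be computed as in the following sketch. It uses the common normalisation $2I(A;B) / (H(A) + H(B))$ and assumes that both label lists describe the same nodes in the same order; the handling of nodes that appear or disappear between time steps is not covered, and the function name is hypothetical.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalised mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information I(A; B) from the joint label counts.
    mutual = 0.0
    for (a, b), n_ab in joint.items():
        mutual += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    # Entropies H(A) and H(B) of the two partitions.
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mutual / (entropy_a + entropy_b)
```

Identical partitions give a value of 1 regardless of how the communities are labelled, and statistically independent partitions give 0, so a higher NMI between consecutive time steps corresponds to a smoother evolution.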

2) RESULTS ON ARTIFICIAL SYNTHETIC DYNAMIC NETWORKS
In this section, we build five dynamic synthetic networks to further evaluate the performance of the proposed DYN-OPGPBA and the compared algorithms. Each dynamic synthetic dataset contains 10 time steps. The original synthetic networks at the first time step are generated by LFR with mu = 0.1, 0.2, 0.3, 0.4 and 0.5, respectively, while the networks at the other time steps are obtained by evolution events that may characterize the evolution of dynamic networks. The evolution events can be described as follows: 20% of the nodes in each community are selected randomly and each is assigned to another randomly chosen community, and the edges between these nodes and the other nodes in the new community are regenerated according to the mixing parameter mu.
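The membership part of this evolution event can be sketched as below. Only the reassignment of nodes is shown; how edges are regenerated from mu is not specified in enough detail here to reproduce, so it is left out. The function name and the rounding of the 20% share are assumptions.

```python
import random

def reassign_nodes(membership, fraction=0.2, seed=None):
    """Move a fraction of each community's nodes to another random community.

    membership: {node: community label}.
    Returns the updated membership and the list of moved nodes.
    """
    rng = random.Random(seed)
    communities = {}
    for node, label in membership.items():
        communities.setdefault(label, []).append(node)
    labels = sorted(communities)
    updated = dict(membership)
    moved = []
    for label in labels:
        # Sample from the original members, so newly arrived nodes stay put.
        members = sorted(communities[label])
        n_move = round(fraction * len(members))
        targets = [other for other in labels if other != label]
        if not targets:
            continue  # a single community leaves nowhere to move to
        for node in rng.sample(members, n_move):
            updated[node] = rng.choice(targets)
            moved.append(node)
    return updated, moved
```

With four communities of 32 nodes, this moves 6 nodes out of each community (24 in total) per time step; applying it repeatedly yields the sequence of ground-truth partitions for the remaining time steps.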
Figs. 12-16 depict the performance of our algorithm and the compared algorithms on the five dynamic synthetic networks, where each data point represents the value obtained by each algorithm at each time step in a random run.
As mentioned above, the community structures of the networks with mu = 0.1, 0.2 and 0.3 are obvious, whereas those of the networks with mu = 0.4 and mu = 0.5 are vague. As shown in Figs. 12(a), 13(a) and 14(a), the values of modularity obtained by DYN-OPGPBA on the networks with mu = 0.1 at the sixth and eighth time steps are slightly larger than those obtained by the other two algorithms. Moreover, DYN-OPGPBA outperforms the compared algorithms in terms of snapshot quality according to the statistical values of modularity. As shown in Figs. 12(b), 13(b) and 14(b), DYN-OPGPBA obtains the best NMIs at all 10 time steps. As shown in Fig. 15(a), the statistical values of modularity obtained by all algorithms decrease as mu grows. However, compared with its competitors, DYN-OPGPBA achieves the best performance at nine of the 10 time steps, the exception being the first time step. Fig. 15(b) demonstrates that, apart from performing slightly worse than L-DMGAPSO at the seventh and eighth time steps, DYN-OPGPBA obtains the best NMIs. As shown in Fig. 16, the performance of DYN-OPGPBA is statistically similar to that of ECD in terms of snapshot quality according to the statistical values of modularity, while it is statistically significantly better than that of ECD in terms of temporal cost according to the NMIs. Moreover, the temporal cost and snapshot quality of DYN-OPGPBA are statistically significantly better than those of L-DMGAPSO.

V. CONCLUSION
This paper proposes a novel dynamic community detection method, called DYN-OPGPBA, that is based on the framework of evolutionary clustering. It consists of two steps. In the first step, the PBA is improved to maximize the modularity for community detection at the first time step. With the influence of the community affiliation of nodes and of their neighbouring nodes on the division quality fully considered, the evolutionary strategy of PBA is discretized, and a discrete search strategy based on neighbouring nodes is proposed. A closeness correction strategy and a crowded community splitting strategy are proposed to further refine the community division of nodes. Tested on 12 general networks and compared with many representative static community detection methods, the proposed algorithm obtains greater modularity, which indicates its superior detection accuracy and reduces the adverse effects caused by error accumulation. In the second step, for community detection at subsequent time steps, modularity and NMI are chosen as the optimization objectives. Modularity ensures the community quality at the current time step, and NMI ensures the temporal smoothness of the network. For the decomposition-based multi-objective framework, a neighbourhood vector competition strategy and an external population size restriction mechanism are then proposed, along with the calculation of modularity and NMI. A method is also proposed to determine the final community division for each time step according to the modularity density, and this division is used as the starting point for community division at the next time step. Tested on one real dynamic network and five artificial synthetic dynamic networks, DYN-OPGPBA provides a better balance between snapshot quality and temporal cost than two representative dynamic community detection methods.

DATA AVAILABILITY
The codes and the data used to support the findings of this study are available from the corresponding author upon request.