A Multi-Label Propagation Algorithm with the Double-Layer Filtering Strategy for Overlapping Community Detection

Overlapping community structures are universal in complex networks. The multi-label propagation approach, as a method for detecting overlapping communities, has received widespread attention for its near-linear time complexity. In recent years, several algorithms based on multi-label propagation have been proposed, but there is still room to improve their accuracy in detecting overlapping communities. In this paper, building on previous multi-label propagation algorithms, a double-layer filtering strategy is proposed to help nodes filter labels, and the node centrality of each node, computed from the number of its neighbors and the number of edges between those neighbors, is used to fix the node update sequence. Combining the double-layer filtering strategy, the new fixed node update sequence and the previous multi-label propagation framework, a new algorithm called MLPA-DF is proposed. Theoretical analysis shows that MLPA-DF retains near-linear time complexity, and experimental results on synthetic and real-world networks show that MLPA-DF detects overlapping communities more accurately than previous multi-label propagation algorithms (COPRA, SLPA and LPANNI) without reducing stability.

update strategy, some algorithms based on COPRA have been proposed [12], [14]-[18]. Compared with COPRA, these algorithms all improve the accuracy of detecting overlapping communities, but each has its own shortcomings: the stability of SLPA [14], DLPA [15] and LPA-E [16] is not good, the time complexity of BMLPA [12] and CLBLPA [17] is higher, and the performance of LPANNI [18] on some small-scale networks is not ideal. Therefore, there is still room for improvement in this type of algorithm. In order to further improve the stability and accuracy of detecting overlapping communities, a novel MLPA called MLPA-DF is proposed in this paper. MLPA-DF is an improved version of LPANNI that adopts a double-layer filtering strategy and a new fixed node update sequence. The main contributions of this paper are summarized as follows: 1) A double-layer filtering strategy is proposed to help each node select appropriate labels and classify the selected labels into two categories. 2) A new fixed node update sequence is proposed based on node centrality. The centrality of a node is determined by the number of its neighbors and the number of edges between those neighbors. 3) The double-layer filtering strategy replaces the label filtering method and the dominant label in the label update strategy of LPANNI, and the new fixed node update sequence replaces LPANNI's node update sequence. These changes allow the algorithm to achieve better accuracy and stability in detecting overlapping communities and solve the problem of LPANNI's poor performance on some networks. The rest of this paper is organized as follows. Section II introduces some popular MLPAs. Section III presents the details of the proposed algorithm. Experimental results on synthetic and real-world networks are shown in Section IV. The conclusion is given in Section V.

II. RELATED WORK
MLPA is one of the effective methods for detecting overlapping communities in complex networks and is an extension of LPA. In LPA and MLPA, labels are community identifiers. Each node is initially given a unique label and updates its labels according to the label update strategy. At the beginning, there are many different labels in the network. As nodes are iteratively updated, closely connected nodes come to share the same label, so the number of distinct labels in the network is greatly reduced. After all nodes stop updating, the labels of the nodes determine the communities they join. The main difference between LPA and MLPA is whether the label update strategy allows each node to hold multiple labels. When a node updates its labels, in LPA it selects only one label as its new label, while in MLPA it may select more than one. That is, in MLPA some nodes hold multiple labels; these are overlapping nodes, and they belong to multiple communities.
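The contrast can be illustrated with a minimal sketch. The function names and the simple frequency-based coefficients below are ours, not taken from any specific algorithm: an LPA-style update keeps the single most frequent neighbor label, while an MLPA-style update keeps every label whose frequency share exceeds a threshold, so a node may end up in several communities.

```python
from collections import Counter

def lpa_update(neighbor_labels):
    """LPA-style update: keep only the most frequent neighbor label."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def mlpa_update(neighbor_labels, threshold):
    """MLPA-style update: keep every label whose frequency share exceeds
    `threshold`, so the node may hold several labels (overlapping membership).
    Falls back to the single most frequent label if nothing passes."""
    counts = Counter(neighbor_labels)
    total = len(neighbor_labels)
    kept = [c for c, n in counts.items() if n / total > threshold]
    return kept or [lpa_update(neighbor_labels)]
```

With neighbor labels `['a', 'a', 'b', 'c']`, the LPA update returns only `'a'`, while the MLPA update with a threshold of 0.2 keeps all three labels.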
COPRA is the first MLPA. It improves the label update strategy of LPA and solves the problem that LPA cannot detect overlapping communities. COPRA is also the first to define the belonging coefficient, which indicates how strongly a node belongs to the community identified by a label. The main procedure of COPRA is as follows: 1) Initialization. Every node is given a unique label, and the belonging coefficient of the label is set to 1. 2) Updating labels. Repeatedly, all nodes update their labels at the same time according to the labels of their neighbors, until the termination criterion proposed by Gregory [10] is satisfied. When a node updates its labels, it first clears its current labels and receives all the labels of all its neighbors. Then it recalculates the belonging coefficients of all received labels. According to a global parameter v, the node selects the labels whose belonging coefficients are larger than the threshold 1/v as its new labels and deletes the others. If no label's belonging coefficient is larger than 1/v, the node randomly selects one of the labels with the largest belonging coefficient as its new label. The belonging coefficients of the new labels are then normalized. 3) Detecting communities. All nodes with the same label join the same community. If a node has more than one label, it is an overlapping node and joins more than one community. Communities that are totally contained by others are removed. In sparse networks, the time complexity of COPRA is near O(n).
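The update step above can be sketched in Python. This is a simplified illustration (variable names are ours): the full COPRA also updates all nodes synchronously and checks Gregory's termination criterion.

```python
from collections import defaultdict
import random

def copra_update(node, labels, neighbors, v):
    """One COPRA-style label update for `node`.

    labels:    dict node -> {label: belonging coefficient} (each sums to 1)
    neighbors: dict node -> list of neighbor nodes
    v:         global parameter; 1/v is the filtering threshold
    """
    received = defaultdict(float)
    # Receive all labels of all neighbors and accumulate their coefficients.
    for nb in neighbors[node]:
        for c, b in labels[nb].items():
            received[c] += b
    total = sum(received.values())
    received = {c: b / total for c, b in received.items()}

    # Keep labels whose belonging coefficient exceeds the threshold 1/v.
    kept = {c: b for c, b in received.items() if b > 1.0 / v}
    if not kept:
        # Otherwise keep one label with the largest coefficient at random.
        best = max(received.values())
        c = random.choice([c for c, b in received.items() if b == best])
        kept = {c: best}

    # Normalize the belonging coefficients of the new labels.
    s = sum(kept.values())
    return {c: b / s for c, b in kept.items()}
```

For a node with neighbors carrying labels `a`, `a` and `b`, setting v=2 keeps only `a`, while v=4 keeps both `a` (2/3) and `b` (1/3), showing how v controls the maximum number of communities per node.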
Although COPRA can detect overlapping communities, the randomness of its label update strategy leads to its poor stability. Therefore, some algorithms have been proposed based on COPRA to improve both its accuracy and stability.
SLPA [14] uses the label update history to improve COPRA. In SLPA, every node has a memory for recording its label update history. When a node updates its label, only one label is selected as the new label, and this label is then recorded in its memory. The more frequently a label occurs in the memory of a node, the more likely it is to be passed to the node's neighbors as their new label. After all nodes have updated several times, many labels are recorded in their memories. A global parameter r is used as a threshold, and labels whose occurrence frequency in a memory is lower than r are deleted. The remaining labels determine which communities the nodes join. In sparse networks, the time complexity of SLPA is also near O(n).
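The speaker-listener mechanics of SLPA can be sketched compactly. Names and tie-breaking here are ours, and the original paper specifies the speaking and listening rules more carefully; this is an illustration of the memory-based idea only.

```python
import random
from collections import Counter

def slpa(neighbors, T, r, seed=None):
    """A compact sketch of SLPA's speaker-listener propagation.

    neighbors: dict node -> list of neighbors
    T:         number of iterations
    r:         post-processing frequency threshold
    Returns dict node -> set of kept labels (community identifiers).
    """
    rng = random.Random(seed)
    memory = {x: [x] for x in neighbors}  # each node starts by remembering its own id
    for _ in range(T):
        for listener in neighbors:
            # Each neighbor "speaks" one label drawn from its memory;
            # more frequent labels are more likely to be spoken.
            spoken = [rng.choice(memory[nb]) for nb in neighbors[listener]]
            # The listener records the most frequent label it heard.
            memory[listener].append(Counter(spoken).most_common(1)[0][0])
    # Keep labels whose observed frequency in memory is at least r.
    return {x: {c for c, n in Counter(mem).items() if n / len(mem) >= r}
            for x, mem in memory.items()}
```

Because label selection is random, the output varies between runs unless a seed is fixed, which is exactly the stability issue discussed below.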
Each node is initially given a unique label in COPRA, and BMLPA [12] and CLBLPA [17] improve this initialization. CLBLPA uses the LeaderRank algorithm to select core nodes in the network, and only the core nodes are initially given unique labels [17]. BMLPA uses the Rough Core algorithm to classify nodes, and nodes in the same category are initially given the same label [12]. In addition, BMLPA uses a balanced belonging coefficient instead of COPRA's belonging coefficient to improve the accuracy of detecting overlapping communities. The time complexity of both BMLPA and CLBLPA in sparse networks is O(n log n), higher than that of COPRA. DLPA [15] and LPA-E [16] redefine the belonging coefficient and set the threshold for filtering labels in different ways from COPRA. LPA-E and DLPA use information entropy and the Jaccard index, respectively, to calculate the belonging coefficients. When a node x updates labels in LPA-E or DLPA, the threshold used to filter labels is set to 1/|Nb(x)|, where |Nb(x)| is the number of x's neighbors. Moreover, in DLPA, each node selects a label with the largest belonging coefficient from its own labels as the dominant label, and when a node updates labels, only the dominant labels of its neighbors can be received. The time complexity of LPA-E and DLPA in sparse networks is near O(n).
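To make the Jaccard-based idea concrete, the following sketch weights each neighbor's labels by the Jaccard index between the two neighborhoods. The exact weighting used in DLPA may differ; the function names and the normalization here are ours.

```python
def jaccard(adj, x, y):
    """Jaccard index between the neighborhoods of nodes x and y."""
    a, b = set(adj[x]), set(adj[y])
    return len(a & b) / len(a | b)

def belonging_coefficients(adj, labels, x):
    """Belonging coefficients of node x, weighting each neighbor's labels by
    neighborhood similarity (a sketch of the idea behind DLPA's Jaccard-based
    coefficients, not its exact formula)."""
    raw = {}
    for nb in adj[x]:
        for c in labels[nb]:
            raw[c] = raw.get(c, 0.0) + jaccard(adj, x, nb)
    s = sum(raw.values())
    return {c: w / s for c, w in raw.items()}
```

A label carried by a structurally similar neighbor thus receives a larger coefficient than one carried by a loosely attached neighbor, and the resulting coefficients can then be filtered with the 1/|Nb(x)| threshold described above.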
LPANNI [18] uses a series of methods to improve COPRA. It has two phases. In the first phase, it calculates NI (Node Importance) and NNI (Neighbor Node Influence) and sorts all nodes in ascending order of NI; the calculation of NI and NNI follows Eq. (1) and Eq. (6) in reference [18]. The second phase is label propagation similar to COPRA, including three steps: 1) Initialization. Each node is given a unique label, which is also its dominant label, and the belonging coefficient of this label is set to 1. 2) Updating labels. Repeatedly, each node updates labels in ascending order of NI. When a node updates labels, it first clears its current labels and receives the dominant label of each neighbor. Then it recalculates the belonging coefficients of the received labels according to Eq. (8) and Eq. (9) in reference [18]. The node x selects the labels whose belonging coefficients are larger than the threshold 1/|Lr(x)| as its new labels and deletes the others, where |Lr(x)| is the number of different labels received from neighbors by x. At the same time, it selects one of the labels with the largest belonging coefficient as its dominant label; if a label with the largest belonging coefficient was the dominant label of the node in the previous iteration, it is preferred as the dominant label in this iteration. The termination criteria are as follows: the label set sizes and dominant labels of all nodes do not change in two consecutive iterations, or the number of iterations reaches the upper limit. 3) Detecting communities. In this step, labels are no longer divided into dominant and non-dominant labels, and the step is the same as in COPRA. LPANNI has good accuracy and stability on large-scale complex networks, and its time complexity in sparse networks is near O(n). Table 1 lists the different methods the above algorithms use to improve COPRA and their time complexity in sparse networks.
Clearly, the belonging coefficient is important to MLPAs. Most algorithms redefine the belonging coefficient, and researchers are still improving its calculation method.
For example, Kouni et al. [19] and Gao et al. [20] each proposed new methods for calculating the belonging coefficient, and Attal et al. [21] compared the effects of different belonging coefficients in complex networks. However, the threshold for filtering labels usually does not receive enough attention in MLPAs. In the algorithms described above, when a node x updates labels, the threshold for filtering labels is set to a global parameter (SLPA and BMLPA), 1/|Nb(x)| (LPA-E and DLPA) or 1/|Lr(x)| (LPANNI). In general, when a node updates labels, if it receives more different labels, or if the belonging coefficients of the received labels are closer to each other, its threshold should be smaller.
Meanwhile, when nodes update labels, nodes in DLPA and LPANNI receive only their neighbors' dominant labels, while nodes in the other algorithms receive all labels of their neighbors. Compared with these two approaches, it is a better choice for nodes to receive only the subset of neighbor labels with large belonging coefficients.
In LPANNI, the node update sequence is fixed by the ascending order of node importance. This fixed node update sequence greatly improves the stability of the algorithm, but a better fixed node update sequence could improve stability and accuracy at the same time. Thus, a new fixed node update sequence needs to be designed.
Based on the above analysis, a new multi-label propagation algorithm is proposed in Section III.

III. THE MULTI-LABEL PROPAGATION ALGORITHM WITH THE DOUBLE-LAYER FILTERING STRATEGY
Compared with LPANNI, MLPA-DF makes two main changes. First, it replaces the label receiving and filtering method in LPANNI's label update step with the double-layer filtering strategy. Second, it uses a fixed node update sequence different from that of LPANNI.

A. THE DOUBLE-LAYER FILTERING STRATEGY
In the first layer of the double-layer filtering strategy, a threshold th1 is designed to help nodes filter labels. This layer takes effect after a node receives labels from its neighbors and recalculates their belonging coefficients. The node selects a label as a new label only if the label's belonging coefficient is larger than th1; therefore, the smaller th1 is for a node, the more labels the node tends to select. The th1 of node x is defined as

th1(x) = g · Σ_{c ∈ Lr(x)} b(c, x)^2

where Lr(x) is the set of different labels received from neighbors by x, b(c, x) is the belonging coefficient between node x and label c, and g is a manually set parameter. Because MLPA-DF uses the same method as LPANNI to recalculate the belonging coefficients of the received labels, these coefficients are normalized, so the value of Σ b(c, x)^2 ranges from 1/|Lr(x)| to 1. The smaller this sum of squares, the closer the belonging coefficients of the labels are to each other, and the smaller th1 is. g is a global parameter that affects the th1 of all nodes and usually takes different values on different networks. Except for the threshold, the first layer of the double-layer filtering strategy is the same as the label filtering method in LPANNI.
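The first filtering layer can be sketched as follows. The th1 formula matches our reading of the definition above; the fallback to the largest-coefficient label when nothing passes is an assumption carried over from COPRA-style updates, not stated in this section.

```python
def first_layer_filter(received, g):
    """First layer of the double-layer filtering strategy (a sketch).

    received: dict {label: normalized belonging coefficient}, summing to 1
    g:        global parameter scaling the threshold
    Returns the kept labels with renormalized coefficients.
    """
    # th1 shrinks as the coefficients become more evenly spread.
    th1 = g * sum(b * b for b in received.values())
    kept = {c: b for c, b in received.items() if b > th1}
    if not kept:
        # Assumed fallback: keep the label(s) with the largest coefficient.
        best = max(received.values())
        kept = {c: b for c, b in received.items() if b == best}
    s = sum(kept.values())
    return {c: b / s for c, b in kept.items()}
```

With g=0.8, the skewed input {a: 0.5, b: 0.3, c: 0.2} keeps only `a`, while three equal coefficients of 1/3 all survive: closer coefficients lower the threshold and let more labels through, as the text argues.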
In MLPA-DF, the second layer of the double-layer filtering strategy divides labels into active labels and inactive labels; only the active labels of neighbors are received by the node that is updating labels. Another threshold th2 is used to divide the labels into these two categories after a node selects its new labels and normalizes their belonging coefficients. If a label's belonging coefficient is larger than th2, the label is an active label; the other labels are inactive labels. The th2 of node x is set slightly below 1/|Ln(x)|, where |Ln(x)| is the number of new labels selected by node x.
The average belonging coefficient of the new labels is 1/|Ln(x)|, and labels with belonging coefficients below the average are usually not important to x. However, if the belonging coefficients of all new labels are different but close, they are all usually close to 1/|Ln(x)| as well, and all labels may have similar importance to x. If th2(x) were set exactly to 1/|Ln(x)|, some labels would necessarily fall below the average and be classified as inactive. For example, suppose a node has three new labels with belonging coefficients 0.34, 0.34 and 0.32; it would be inappropriate to mark the label with coefficient 0.32 as inactive. This problem is avoided by making th2(x) slightly smaller than 1/|Ln(x)|. In fact, the active labels in MLPA-DF play a role similar to the dominant labels in LPANNI.
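The second layer can be sketched with the example above. The margin `eps` below is our illustrative way of making th2 "slightly smaller" than 1/|Ln(x)|; the paper does not fix a specific margin here.

```python
def split_active_inactive(new_labels, eps=0.05):
    """Second layer of the double-layer filtering strategy (a sketch).

    new_labels: dict {label: normalized belonging coefficient}
    eps:        illustrative margin pushing th2 slightly below 1/|Ln(x)|
    Returns (active, inactive) label dicts.
    """
    th2 = (1.0 - eps) / len(new_labels)
    active = {c: b for c, b in new_labels.items() if b > th2}
    inactive = {c: b for c, b in new_labels.items() if b <= th2}
    return active, inactive
```

For the coefficients 0.34, 0.34 and 0.32, th2 is about 0.317, so all three labels stay active, exactly the behavior the example calls for; a clearly minor label such as 0.1 among {0.5, 0.4, 0.1} is still classified as inactive.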
In order to use the double-layer filtering strategy, the initialization step and the termination criterion are changed accordingly in MLPA-DF. In the initialization step, the initial label of each node is set as an active label. And the termination criterion for the label updating step is simply that the number of iterations reaches the upper limit.

B. FIXED NODE UPDATE SEQUENCE
In MLPA, nodes update their labels according to the labels of their neighbors. If a node has a large number of neighbors, it sends its labels to many nodes, and these labels are then selected as new labels by more nodes. In addition, if two neighbors of a node are connected to each other, the node and these two neighbors form a triangle, and the three nodes are likely to select the same labels as their new labels in the same iteration. Therefore, the influence of a node's labels can be measured by the number of its neighbors and the number of edges between those neighbors. The greater the influence of a node's labels, the more likely the node is to become the core of a community, so the node centrality (NC) of a node x is computed from |Nb(x)|, the number of neighbors of x, and the number of edges between the nodes in Nb(x), where Nb(x) denotes the neighbor set of node x.

NC is used to fix the node update sequence in MLPA-DF. Because the labels of a node with a larger NC are propagated to more nodes, the new labels of a node usually come from neighbors whose NC is larger than its own. If a node with a small NC updates labels first in an iteration, its new labels may be the old labels of a neighbor with a large NC; unless the new labels of that neighbor remain the same as its old labels, the node with a small NC makes an invalid update. Based on this, nodes in MLPA-DF are updated in descending order of NC.

Figure 1 shows a sample network on which MLPA-DF is used to detect overlapping communities. Table 2 lists the NC values of the sample network and Table 3 lists its NNI matrix. According to the descending order of NC, the fixed node update sequence is 7-1-5-4-2-6-3-8. Figure 4 shows the label propagation phase of MLPA-DF on the sample network. The label c of a node and its belonging coefficient b are recorded in the form (c, b), with b kept to three decimal places. The parameter g is set to 0.8.
Figure 4(a) shows the initialization: each node is given a unique label, and the label's belonging coefficient is set to 1. Figure 4(b) shows the result after the first iteration; the labels of each node have changed. In node 5, the b of label 5 is 0.330, which is less than the th2 of node 5 (1/3 at this time), so label 5 is an inactive label and is marked by strikethrough. Figure 4(c) shows the result after the fourth iteration. The nodes have been clearly divided into two communities. Node 1 is an overlapping node and joins both communities.
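The node centrality and the resulting fixed update sequence can be sketched as follows. The exact formula combining the two quantities is not reproduced here; as an illustration we simply add the neighbor count and the number of edges among the neighbors.

```python
def node_centrality(adj):
    """Node centrality from neighbor count and edges among neighbors
    (illustrative combination: the two quantities are simply added)."""
    nc = {}
    for x, nbs in adj.items():
        nb_set = set(nbs)
        # Count each edge between two neighbors of x exactly once.
        tri_edges = sum(1 for u in nbs for w in adj[u] if w in nb_set) // 2
        nc[x] = len(nbs) + tri_edges
    return nc

def update_sequence(adj):
    """Nodes sorted in descending order of NC, as MLPA-DF requires."""
    nc = node_centrality(adj)
    return sorted(adj, key=lambda x: -nc[x])
```

On a toy graph where node 1 sits in a triangle and also has a pendant neighbor, node 1 gets the largest NC and updates first, while the pendant node updates last.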

C. THE PROCEDURE OF MLPA-DF AND AN EXAMPLE ON THE SAMPLE NETWORK
2  Calculate NNI according to Eq. (6) in reference [18]
3  Generate a node sequence Nsort according to the NC in descending order
4  Initialize each node x with a label cx = x and its b = 1. Set cx as an active label
5  for it = 1 to T do
6      foreach node x ∊ Nsort do
7          Lx ← ∅
8          Add the active labels of all neighbors of x to Lx
9          Update the b of all labels in Lx according to Eq. (8) and Eq. (9) in reference [18]
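The procedure above, combined with the double-layer filtering strategy, can be sketched end to end in Python. This is an illustration of the control flow only: the belonging coefficients are plain neighbor averages instead of Eq. (8) and Eq. (9) of reference [18], th1 and th2 take the simplified forms discussed earlier, and the NC formula is an illustrative choice.

```python
from collections import defaultdict

def mlpa_df(adj, T=25, g=0.8, eps=0.05):
    """A simplified sketch of the MLPA-DF main loop (not a faithful
    implementation; see the lead-in for the simplifications made)."""
    # Fixed update order: descending node centrality (degree + neighbor edges).
    def nc(x):
        nbs = set(adj[x])
        tri = sum(1 for u in adj[x] for w in adj[u] if w in nbs) // 2
        return len(adj[x]) + tri
    nsort = sorted(adj, key=lambda x: -nc(x))

    labels = {x: {x: 1.0} for x in adj}   # label -> belonging coefficient
    active = {x: {x} for x in adj}        # initial label is active
    for _ in range(T):
        for x in nsort:
            received = defaultdict(float)
            for nb in adj[x]:             # receive only active labels
                for c in active[nb]:
                    received[c] += labels[nb][c]
            s = sum(received.values())
            received = {c: b / s for c, b in received.items()}
            # First layer: keep labels above th1 = g * sum of squares.
            th1 = g * sum(b * b for b in received.values())
            kept = {c: b for c, b in received.items() if b > th1}
            if not kept:
                best = max(received.values())
                kept = {c: b for c, b in received.items() if b == best}
            s = sum(kept.values())
            labels[x] = {c: b / s for c, b in kept.items()}
            # Second layer: split into active / inactive labels.
            th2 = (1.0 - eps) / len(labels[x])
            active[x] = {c for c, b in labels[x].items() if b > th2}
    # Detecting communities: all nodes sharing a label form a community.
    comms = defaultdict(set)
    for x, ls in labels.items():
        for c in ls:
            comms[c].add(x)
    return labels, dict(comms)
```

On two triangles joined by a single edge, the sketch settles into two communities of three nodes each, with normalized coefficients at every node.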

D. TIME COMPLEXITY ANALYSIS
Given a network G(N, E), let n = |N|, let k be the average number of neighbors per node, let l be the average number of labels per node, and let T be the maximum number of iterations. The time complexity of MLPA-DF is analyzed as follows: 1) Calculating the NC of a node requires knowing how many edges there are between the neighbors of the node, so the time complexity of calculating the NC of one node is O(k^2), and of all nodes O(k^2 n). 2) A node needs to consider all labels of all its neighbors to update labels once, so the time complexity for one node to update labels once is O(lk), and for all nodes to update labels T times O(Tlkn). The time complexity of the other parts of MLPA-DF is the same as the corresponding parts of LPANNI. In sparse networks, n is much larger than k, l and T, and the time complexity of LPANNI is near O(n); therefore, the time complexity of MLPA-DF in sparse networks is also near O(n).

IV. EXPERIMENTS

A. DATASET
Experiments are conducted on synthetic and real-world networks. The LFR benchmark [22] is used to generate the synthetic networks; each LFR benchmark network has a known correct overlapping community structure. In recent years, more than half of the label propagation algorithms have been evaluated on LFR benchmark networks [13]. Several parameters are needed to generate the LFR benchmark networks; these parameters and their meanings are listed in Table 4. A larger u makes overlapping communities more difficult to detect, and On and Om determine the overlap degree of the network. Two different types of LFR benchmark networks are generated. In the first type, there are many communities, each containing fewer than 10% of all nodes. In the second type, the number of communities is small, but each community contains between 10% and 25% of all nodes. There are 26 different LFR benchmark networks in total. For these networks, k=10 and maxk=50; the other parameters are listed in Table 5. For different experimental purposes, these networks are divided into four groups. The networks in Groups A and B are of the first type, and the networks in Groups C and D are of the second type. The networks in Groups A and C are used to test the performance of the algorithms on the two types of networks with the same scale and different parameters, and the networks in Groups B and D are used to test the performance on the two types of networks with different scales and the same parameters. Table 6 lists the real-world networks used in the experiments, where |E| is the number of edges.

B. METHODOLOGY
MLPA-DF is compared with COPRA, SLPA and LPANNI. COPRA and SLPA are popular MLPAs, and LPANNI is the basis of MLPA-DF.

TABLE 6. The real-world networks used in the experiments.
Network            |N|     |E|
Karate [23]        34      78
Dolphins [24]      62      159
Books [25]         105     441
Football [1]       115     613
Facebook [26]      4,039   88,234
Ca-HepPh [27]      12,006  118,489
Email-Enron [28]   36,692  183,831

The parameters of these algorithms are set as follows. For COPRA, v varies from 2 to 8. For SLPA, r varies from 0.05 to 0.5 with an interval of 0.05. For LPANNI, a is set to 3, the recommended value in reference [18]. For MLPA-DF, g varies from 0.3 to 1.5. The maximum number of iterations T of all algorithms is 25. For synthetic networks, the performance of the four algorithms is compared from two aspects: the accuracy of detecting overlapping communities and the stability of the algorithm. NMI [29] and NMImax [30] are used to evaluate the accuracy of the algorithms in detecting communities; both reach their best and worst values at 1 and 0, respectively. Considering the randomness of these algorithms, the experiments are repeated 50 times under each parameter setting, and the average value and standard deviation of NMI and NMImax are computed. NMIavg and NMImax_avg denote the averages of NMI and NMImax, and NMIstd and NMImax_std denote their standard deviations. For every algorithm, the largest NMIavg and NMImax_avg on each LFR benchmark network are used to evaluate its accuracy in detecting overlapping communities, and the corresponding NMIstd and NMImax_std are used to evaluate its stability; smaller NMIstd and NMImax_std indicate better stability.
For real-world networks, the accuracy and stability of the four algorithms in detecting overlapping communities are also compared. Because real-world networks lack correct overlapping community structures, Qov [31] is used to evaluate the accuracy of the algorithms. The definition of Qov in reference [31] is for directed networks and needs to be modified to apply to undirected networks. In the modified definition used in this paper, Avw denotes whether nodes v and w are connected (Avw is 1 if v and w are connected and 0 otherwise), kv denotes the number of neighbors of node v, and bvc denotes the belonging coefficient of label c to node v. The function f is the same as the function recommended in reference [32]. Qov reaches its best and worst values at 1 and 0, respectively. As with the synthetic networks, the experiments are repeated 50 times under each parameter setting on each real-world network, and the average and standard deviation of Qov, denoted Qov_avg and Qov_std, are computed. The largest Qov_avg is used to evaluate the accuracy of detecting overlapping communities, and the corresponding Qov_std is used to evaluate the stability; a smaller Qov_std means better stability.
All the experiments are conducted on a PC with a 1.70 GHz AMD A8-5545M APU and 16 GB of RAM. All algorithms are implemented in Python 3.7.

C. EXPERIMENTAL RESULTS ON THE SYNTHETIC NETWORKS

1) COMPARISON OF ACCURACY ON THE LFR BENCHMARK NETWORKS
The detailed results on the LFR benchmark networks are listed in Table 7. In Figure 5, LPANNI, SLPA and COPRA have similar performance in terms of the largest NMIavg and NMImax_avg on the LFR benchmark networks in Group A. When Om=6, SLPA performs slightly better than the other two algorithms; when Om=4, LPANNI performs slightly better. Meanwhile, the performance of MLPA-DF is clearly better than that of the other three algorithms. This shows that MLPA-DF detects overlapping communities more accurately than the other three algorithms on the first type of LFR benchmark networks, which have a large number of communities with few nodes.
In Figure 6, although the scale of the experimental networks keeps increasing, the largest NMIavg and NMImax_avg of the four algorithms do not change much. This shows that on the first type of LFR benchmark network, the network scale has little effect on accuracy.
In Figure 7, MLPA-DF performs best except on the networks with Om=2, and LPANNI performs worst on most of these networks. This means that MLPA-DF also achieves good accuracy on the second type of LFR benchmark networks, where communities have a large number of nodes, while LPANNI is not suitable for this type of network.
In Figure 8, the change of the network scale has some influence on the NMIavg and NMImax_avg of the algorithms, but MLPA-DF still has the best performance in most cases.
Combining Figure 5 and Figure 7, in most cases, the NMIavg and NMImax_avg of the algorithms decrease with the increase of Om. And the NMIavg and NMImax_avg of MLPA-DF obviously decrease more slowly than other algorithms. This shows that for LFR benchmark networks where overlapping nodes tend to join many communities, MLPA-DF has a greater advantage than other algorithms.
In general, MLPA-DF can detect overlapping communities more accurately than other algorithms on most LFR benchmark networks, regardless of the type of networks.

2) COMPARISON OF STABILITY ON THE LFR BENCHMARK NETWORKS
NMIstd and NMImax_std are used to analyze the stability of the algorithms. Figure 9 shows the distribution of NMIstd and NMImax_std when the algorithms achieve their largest NMIavg and NMImax_avg on the 26 LFR benchmark networks in Table 5. On most LFR benchmark networks, the NMIstd and NMImax_std of SLPA and COPRA are distributed between 0 and 0.05, while all NMImax_std of LPANNI and MLPA-DF are close to 0. Therefore, on LFR benchmark networks, COPRA and SLPA have poor stability, LPANNI and MLPA-DF have good stability, and the stability of SLPA is better than that of COPRA but worse than that of LPANNI and MLPA-DF. This also means that the new fixed node update sequence and the double-layer filtering strategy do not reduce the stability of the algorithm.

3) EXPERIMENTAL RESULTS ON THE SAMPLE LFR BENCHMARK NETWORK
The following shows the results of four algorithms for detecting overlapping communities on a sample LFR benchmark network.
The sample LFR benchmark network has 90 nodes, 3 communities and 9 overlapping nodes, and each overlapping node joins two communities. About 1/3 of all nodes belong to each community. The correct community structure of the sample LFR benchmark network (|N|=90, u=0.1, Om=2, On=9, minc=25, maxc=35, k=5 and maxk=15) is shown in Figure 10, and Figure 11 shows the detection results of the four algorithms. In Figure 10 and Figure 11, the color of each node indicates the community it joins, and different colored areas indicate different communities. If a node joins multiple communities, it has multiple colors.
In Figure 11, MLPA-DF and SLPA detect 3 communities, while COPRA and LPANNI detect 4 and 2 communities, respectively. The detection result of LPANNI differs the most from the correct overlapping community structure. This shows that LPANNI is not suitable for this type of network, which is consistent with the conclusion above. Meanwhile, the detection of overlapping communities usually suffers from three problems: one community is mistakenly divided into several communities, several communities mistakenly merge into one, and nodes join the wrong community. On this network, MLPA-DF avoids the first two problems and handles the last one better than the other algorithms. Compared with the other algorithms, MLPA-DF obtains the correct number of communities and actively looks for overlapping nodes, although there are some errors.

D. EXPERIMENTAL RESULTS ON THE REAL-WORLD NETWORKS
Table 8 lists the performance of the algorithms on the real-world networks. COPRA and SLPA have large Qov_std on all real-world networks, so the stability of these two algorithms is poor on these networks. The Qov_std of LPANNI and MLPA-DF are close to 0, so they have good stability on the real-world networks.

Except on the Karate network, MLPA-DF has the largest Qov_avg, and on the Karate network the Qov_avg of MLPA-DF is not much lower than the largest Qov_avg. This shows that MLPA-DF has good accuracy on the real-world networks. In addition, the overlapping community structure of the real-world networks can be further analyzed based on the detection results with the largest Qov_avg in Table 8. Consistent with the experiments on the synthetic networks, the communities in the real-world detection results can be divided into large-communities (containing no less than 10% of all nodes) and small-communities (containing less than 10% of all nodes). The Nl and Ns of the detection result on each real-world network are listed in Table 9, where Nl denotes the proportion of nodes joining large-communities and Ns denotes the proportion of nodes joining small-communities.
In the small-scale networks, almost all communities are large-communities. This is probably because the number of nodes is small and there is no need to split them into many communities. In the large-scale networks, both types of communities exist, and when the scale is large enough, the proportion of nodes in small-communities may exceed the proportion of nodes in large-communities. This also explains why LPANNI performs better on large-scale networks. MLPA-DF does not have this limitation, so it solves the problem of LPANNI's insufficient accuracy on small-scale real-world networks.
Considering both accuracy and stability on these real-world networks, COPRA has poor accuracy and stability, SLPA has good accuracy but poor stability, LPANNI has good stability, and MLPA-DF achieves good accuracy and stability at the same time. Therefore, MLPA-DF performs well on real-world networks.

V. CONCLUSION
In this paper, a new multi-label propagation algorithm, MLPA-DF, is proposed to detect overlapping communities in complex networks. MLPA-DF is an improved version of LPANNI. In MLPA-DF, the double-layer filtering strategy and node centrality are used to help nodes filter labels and to fix the node update sequence, respectively.
Theoretical analysis shows that MLPA-DF has near-linear time complexity. MLPA-DF and three popular MLPAs (COPRA, SLPA and LPANNI) are tested on both synthetic and real-world networks. Experimental results show that MLPA-DF improves the accuracy of detecting overlapping communities without reducing stability, and that it solves the problem of LPANNI's insufficient accuracy on some networks.
However, MLPA-DF still has two limitations: it requires a different parameter g on different networks, set based on experience, and it cannot detect overlapping communities in directed or weighted networks. Addressing these two problems will be important future work.