Community Detection Algorithm Based on Nonnegative Matrix Factorization and Improved Density Peak Clustering

Community detection is a critical issue in the field of complex networks. Recently, nonnegative matrix factorization (NMF) has been used successfully to uncover the community structure of complex networks. However, this approach has a significant drawback: most NMF-based community detection methods require the number of communities to be preassigned, or determine it by searching for the best community structure among all candidates. To address this problem, in this paper, we use density peak clustering (DPC) to obtain the number of centers as the pre-defined parameter for nonnegative matrix factorization. However, due to the sparse and high-dimensional characteristics of complex networks, DPC cannot be used to detect communities directly. To overcome this issue, we employ the degree and the hop count of nodes as the density and distance indexes, respectively, and we use NMF and Symmetric NMF to deal with linearly separable and non-linearly separable data, respectively. Experimental results show that the proposed methods exhibit excellent performance on artificial and real-world networks and are superior to the state-of-the-art methods most commonly used for community detection in complex networks.


I. INTRODUCTION
Community structure is ubiquitous in networks such as social networks [1], biological networks [2] and citation networks [3], which makes community detection especially crucial for better understanding the organization of networks and extracting useful information. Although no common definition has yet been agreed upon, a community, also called a module or a cluster, is typically regarded as a group of nodes that are densely interconnected but loosely connected to the other communities of the network [4].
Since the seminal work by Girvan and Newman [1], a number of algorithms for community detection in complex networks have been proposed, such as modularity-based algorithms [5], clustering-based algorithms [6], [39], random-walk-based algorithms [7] and matrix-decomposition-based algorithms [8]-[11]. Interested readers can refer to comprehensive surveys of community detection [12]-[14].
The associate editor coordinating the review of this manuscript and approving it for publication was Rentao Gu.
As a hot research topic, nonnegative matrix factorization (NMF) has been broadly adopted in community detection; see, to name a few, [10], [15]-[17]. Despite the many methods proposed, determining the number of communities remains an important and thorny issue in practice. Most methods adopt maximum modularity (Q) as the criterion to determine the number of communities, such as SNMF-SS [18] proposed by Ma et al. and SBMF [17] proposed by Zhang et al., which is very time-consuming. Other methods rely on external algorithms, such as that of Shi et al. [19], or need the number of communities as prior information, such as [10], [15]. However, determining the number of communities in advance is often infeasible in concrete applications, and it is also very inefficient to decide the number of communities (K) by searching all possible candidates.
To address the above issues, inspired by the DPC algorithm [20] and Chen's method [21], we propose DPCNMF (Density Peak Clustering with Nonnegative Matrix Factorization) for community detection on linearly separable data and DPCSNMF (Density Peak Clustering with Symmetric Nonnegative Matrix Factorization) for non-linearly separable data. The main advantage of this scheme is that it identifies the number of communities (K) in a single run and can find hub points accurately. Unfortunately, because complex networks are sparse and high dimensional, the DPC algorithm cannot be applied to them directly; the similarity between nodes becomes meaningless due to the curse of dimensionality. Therefore, we adopt the degree of a node as the density index and the length of the shortest path between two nodes as the distance index. The advantage of using the node degree as the density index is that it avoids the influence of the cut-off parameter $d_c$ on the clustering results. We then use the number of density peaks as the number of communities, instead of requiring it as an input in advance as is usual for NMF. In summary, the key contributions of this paper are:
1) We propose a parameter-free algorithm that relies only on the topological structure to perform unsupervised community detection.
2) We provide a method to derive an appropriate number of communities with improved density peak clustering, and combine it with an NMF-based method to obtain the community structure. To the best of our knowledge, this is the first work to address NMF's input parameter K by density peak clustering.
3) The improved density peak clustering enables us not only to obtain the hubs of the network, but also to analyze communities at different scales by choosing the number of centers. In addition, unlike DPC, which selects cluster centers manually, our method chooses centers automatically.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
The rest of the paper is organized as follows. Section 2 introduces the related work on NMF based algorithms to identify community and briefly reviews DPC and several improved DPC methods. Section 3 elaborates our DPCNMF and DPCSNMF methods. Section 4 provides several experimental evaluations between our approaches and some representative algorithms on synthetic and real-world networks. Finally, Section 5 summarizes the paper.

II. RELATED WORK
A. RELATED NMF BASED COMMUNITY DETECTION ALGORITHMS
Owing to its innate interpretability and good performance in practice, NMF has received extensive attention in community detection. Many researchers have been devoted to improving the performance of NMF in community detection, and some representative work is as follows. Psorakis et al. [22] proposed an approach for finding overlapping communities using a Bayesian NMF model. This method determines the number of communities automatically and does not suffer from the resolution limit. Unfortunately, its built-in estimation of the number of communities can mislead the factorization and return an incorrect solution. Wang et al. [23] investigated how to apply NMF to community detection in directed, undirected and hybrid networks and developed an iterative optimization algorithm. Besides, Chen et al. [16] proposed the NMFOSC method, which can detect network structure with an unknown number of communities by feature matrix preprocessing and ranking optimization. Zhang and Yeung [24] proposed a constrained NMF triple decomposition model, named BNMTF. However, BNMTF requires the number of communities to be set in advance. Zhang et al. [17] applied an NMF model to overlapping community detection. Like [22], this method requires an estimate of the maximum number of communities in advance. However, it is very difficult to set a reasonable value for networks without ground truth. Recently, Li et al. [15] proposed a method based on semi-supervised matrix factorization and random walk to perform community partition. Wu et al. [10] introduced hypergraphs into NMF and utilized the higher-order relationships among the points to improve the clustering performance. However, both methods use the number of communities as prior information.
Nevertheless, determining the number of communities is still an open issue. Traditional NMF-based community detection algorithms can obtain the number of communities by optimizing an embedded objective function. However, these methods are susceptible to various factors, such as the initial matrices and the objective function being optimized, so it is difficult to determine the number of communities accurately.
To address the aforementioned issues, we use the improved density peak clustering to obtain the number of centers as an NMF input parameter.
Rodriguez and Laio [20] proposed the density peak clustering (DPC) method, which was published in the journal Science. The key idea of DPC is that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. Here, $\rho_i$ and $\delta_i$ denote the local density of node i and the distance from i to points of higher density, respectively.
There are two ways to calculate the local density $\rho_i$. The first one is defined as:
$$\rho_i = \sum_{j \neq i} \chi\big(d(i,j) - d_c\big)$$
with
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$
where d(i, j) is the Euclidean distance between nodes i and j, $d_c$ is a cutoff distance, and $\rho_i$ is the local density of node i. The second way uses a Gaussian kernel function, as follows:
$$\rho_i = \sum_{j \neq i} \exp\left(-\left(\frac{d(i,j)}{d_c}\right)^2\right)$$
The minimum distance between node i and any other node with higher density, denoted by $\delta_i$, is defined as:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d(i,j)$$
Only those nodes with relatively high $\rho_i$ and $\delta_i$ are regarded as cluster centers. After the cluster centers are chosen, each remaining node is assigned to the same cluster as its nearest neighbor with a higher density.
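As a minimal sketch of the two quantities defined above, the following Python code computes the cut-off-kernel density and $\delta$ for a handful of 2-D points. The function name `dpc_rho_delta`, the toy points and the value of `d_c` are illustrative assumptions. The treatment of the densest point (it receives its largest distance to any other point) follows the usual DPC convention and is an assumption here, since the text above does not state it.

```python
import numpy as np

def dpc_rho_delta(points, d_c):
    """Cut-off-kernel local density and delta for classic DPC (sketch)."""
    # Pairwise Euclidean distances d(i, j).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # rho_i: number of other points closer than d_c (the point itself is excluded).
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        # Assumption: a point with no denser point takes its largest distance.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

# Toy data: a tight group of three points and a tight pair far away.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0]])
rho, delta = dpc_rho_delta(points, d_c=0.5)
```

In this toy example the three points of the first group tie for the highest density, which also illustrates why ties and the choice of `d_c` make center selection delicate in practice.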
The DPC algorithm has several desirable advantages: clusters of arbitrary shape can be detected, the number of clusters need not be set in advance, and the selection of centers can be visualized via a decision graph. However, DPC still has deficiencies. First, the cut-off distance $d_c$ has a strong impact on the clustering results. Second, DPC requires human intervention to select appropriate cluster centers.
To tackle the above issues, researchers have made considerable efforts [25]-[30], [38]. Xie et al. [31] presented an improved DPC that adopts fuzzy K-nearest neighbors to assign the remaining points to the most probable cluster. Chen et al. [32] proposed the CLUB algorithm, which identifies the density backbones and can automatically select cluster centers. Liang and Chen [29], motivated by a divide-and-conquer strategy and the DPC algorithm, proposed the 3DC algorithm, which can also automatically detect the correct number of clusters. Mehmood et al. [28] proposed the nonparametric CFSFDP-HD algorithm, which adopts a heat equation to estimate the density and reduce the sensitivity to $d_c$. Moreover, Bai et al. [33] proposed the OCDDP algorithm, which is the first method to apply the idea of DPC to community detection.
As far as we know, there is no existing work for community detection that combines the DPC with NMF-based method and automatically selects cluster centers.

III. METHODS
We introduce the DPCNMF and DPCSNMF algorithms in this section. Starting with a brief explanation of the improved DPC method, we then describe the clustering process of DPCNMF and DPCSNMF in detail. Finally, we analyze the time complexity of the proposed algorithms.
The core idea of the DPC algorithm is the premise that community centers, which reside far apart from each other, are high-density nodes surrounded by neighbors of lower density. The DPC algorithm includes two steps: first, detecting the density peaks, namely the cluster centers; second, assigning the remaining nodes to their corresponding clusters. Though DPC is theoretically efficient, the datasets of complex networks are usually high-dimensional and sparse, which makes the similarity between nodes meaningless. Moreover, the two main indexes of DPC, $\rho_i$ and $\delta_i$, depend mainly on the distances between nodes, whereas in such data the distances between points are nearly uniform or even identical. Hence the DPC method cannot be employed directly for community detection. To solve this problem, we adopt the improved DPC method to obtain the number of communities as the NMF input parameter.

A. IMPROVED DPC
In this section, we elaborate on our DPCSNMF model. Let G = (V, E) be an undirected and unweighted network. The vertex set V contains N nodes and the edge set E contains M edges. The adjacency matrix of the graph G is A, where $a_{ij} = 1$ if there is a link between vertices i and j, and $a_{ij} = 0$ otherwise.
The local density of a node i, denoted by $\rho_i$, is defined as:
$$\rho_i = \sum_{j=1}^{N} a_{ij} \tag{5}$$
that is, the degree of node i. The benefit of employing the degree of a node as its local density is that it avoids the disadvantages caused by the setting of the cut-off parameter $d_c$ and by high time complexity.
The calculation of $\delta_i$ is quite simple: it is the minimum distance between node i and the nodes of higher density. It is defined as follows:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij} \tag{6}$$
where $d_{ij}$ refers to the shortest path distance between nodes i and j, namely the hop count.
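The degree-based density and hop-based distance described above can be sketched as follows. Breadth-first search is used here as one simple way to obtain hop counts; this choice, the helper names `hop_distances` and `degree_density_and_delta`, the toy graph, and the rule that a node of maximal degree takes its largest finite hop distance are assumptions of this sketch rather than details given in the text.

```python
from collections import deque
import numpy as np

def hop_distances(A):
    """All-pairs hop counts by breadth-first search from every node."""
    n = A.shape[0]
    neighbors = [np.flatnonzero(A[i]) for i in range(n)]
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[s, v] == np.inf:      # not visited yet
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def degree_density_and_delta(A):
    rho = A.sum(axis=1)                       # node degree as local density
    d = hop_distances(A)
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = d[i, higher].min()     # nearest node of higher degree
        else:
            # Assumption: maximal-degree nodes take their largest finite hop count.
            delta[i] = d[i][np.isfinite(d[i])].max()
    return rho, delta

# Toy graph: two small stars (hubs 0 and 4) linked through node 3.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (4, 6)]
A = np.zeros((7, 7), dtype=int)
for u, v in edges:
    A[u, v] = A[v, u] = 1
rho, delta = degree_density_and_delta(A)
```

In the toy graph the two hubs obtain both the largest degree and the largest $\delta$, while every other node sits one hop from a node of higher degree.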
The community centers are considered to be nodes with relatively high $\rho_i$ and high $\delta_i$. Hence, for each node i, we calculate $\gamma_i$ using Eq. (7):
$$\gamma_i = \rho_i \times \delta_i \tag{7}$$
Afterward, we sort the elements of $\gamma$ in descending order and denote the result as $\gamma^*$. We put into the core list those elements that satisfy $\gamma(hubs(i)) > mean(\gamma) + std(\gamma)$, where hubs(i) represents the potential community centers, and $mean(\gamma)$ and $std(\gamma)$ are the mean and the standard deviation of all $\gamma$, respectively. Finally, we compute $\gamma^*_{ij}$ using Eq. (8) as follows:
$$\gamma^*_{ij} = \gamma^*_i - \gamma^*_j \tag{8}$$
where $i \in [1, \lceil 20\% N \rceil]$, $j = i + 1$, and $\lceil \cdot \rceil$ is the ceiling function. As a rule of thumb, we consider only the top 20% of nodes as candidates for cluster centers. Next, we sort the differences $\gamma^*_{ij}$ in descending order and use this list to decide the density peaks, viz., the number of communities. To help readers better understand how to choose the number of centers, we present a visualization of selecting peaks in the next section.
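The selection procedure above can be illustrated with a short sketch. It assumes that $\gamma$ is the product of density and distance, that the gap list holds differences between consecutive sorted $\gamma$ values, and that the standard deviation is the population version returned by `np.std`; the function name `select_cores`, the parameter `top_fraction` and the toy values are hypothetical.

```python
import math
import numpy as np

def select_cores(rho, delta, top_fraction=0.2):
    gamma = rho * delta
    order = np.argsort(-gamma, kind="stable")     # candidate hubs, best first
    gamma_sorted = gamma[order]
    # Core list: nodes whose gamma exceeds mean + one standard deviation.
    threshold = gamma.mean() + gamma.std()
    cores = [int(i) for i in order if gamma[i] > threshold]
    # Gaps between consecutive sorted gammas among the top candidates.
    top = min(math.ceil(top_fraction * len(gamma)), len(gamma) - 1)
    gaps = gamma_sorted[:top] - gamma_sorted[1:top + 1]
    # A large gap after position i (1-based) suggests K = i communities.
    ranked_k = (np.argsort(-gaps, kind="stable") + 1).tolist()
    return cores, ranked_k

# Toy values for a 7-node graph with two hubs (nodes 0 and 4).
rho = np.array([3.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0])
delta = np.array([3.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0])
cores, ranked_k = select_cores(rho, delta)
```

Here the two hubs pass the threshold and the largest gap follows the second sorted value, so the first recommended number of communities is 2.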
The pseudocode of the DPCNMF and DPCSNMF algorithms is presented in Algorithm 1. There are two key steps in this algorithm. The first is the choice of core nodes based on the improved density peak clustering, and the second is to adopt NMF and SNMF to achieve the community partition. The first key step has three sub-steps: 1) calculate $\rho$ by Eq. (5); 2) calculate $\delta$ by Eq. (6); 3) select the community core nodes. In the algorithm output, NMI is an abbreviation for normalized mutual information, which is defined in Eq. (14). We execute 10 independent experiments and evaluate the results by NMI, and we record the maximum, the mean and the corresponding standard deviation of NMI as
$$NMI_{max} = \max(NMI(1), NMI(2), \ldots, NMI(10)), \qquad NMI_{mean} = \frac{1}{n}\sum_{i=1}^{n} NMI(i),$$
and $NMI_{std}$, the standard deviation of $NMI(i)$, where $i \in [1, 10]$ and n = 10.

Algorithm 1 DPCNMF and DPCSNMF Algorithms
Require: The adjacency matrix A
Ensure: label, NMI, $NMI_{max}$, $NMI_{mean}$, $NMI_{std}$
Step 1: Calculate $\rho_i$ via Eq. (5);
Step 2: $d_{ij}$ ← shortest path distances;
Step 3: Calculate $\delta_i$ via Eq. (6);
Step 4: Calculate $\gamma_i$ via Eq. (7);
Step 5: [$\gamma^*_i$, hubs] ← sort $\gamma_i$ in descending order;
Step 6: $\gamma^*$ ← diff($\gamma^*$);
Step 7: [y, Ind] ← sort $\gamma^*$ in descending order;
Step 8: cores ← selection of the community cores;
Step 9: Carry out NMF;
Step 10: Carry out SNMF;

B. NMF AND SNMF COMMUNITY DETECTION
Given a nonnegative data matrix $A \in \mathbb{R}^{M \times N}$, the basic idea of NMF [34] is to find two nonnegative latent factor matrices $F \in \mathbb{R}_+^{M \times K}$ and $G \in \mathbb{R}_+^{N \times K}$, where K is the dimension of the representation. The matrix A can be approximated by the product of these two matrices, namely,
$$A \approx F G^T$$
To find such an approximate decomposition, an objective function must be defined to measure the quality of the approximation. Usually, the Frobenius norm is used, and NMF minimizes the following objective function:
$$\min_{F \ge 0,\, G \ge 0} \|A - F G^T\|_F^2$$
where $\|\cdot\|_F$ is the matrix Frobenius norm. $F \ge 0$ and $G \ge 0$ indicate that the elements of F and G are nonnegative, and usually $K \ll \min\{M, N\}$. F can be interpreted as a basis matrix and G as the coefficient matrix. Hence, we can interpret the result as a hard clustering by assigning the ith data point to the index of the largest element in the ith row of G.
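To make the factorization and the hard assignment concrete, the sketch below minimizes the Frobenius objective with the standard Lee-Seung multiplicative updates and then labels each node by the largest entry in its row of G. The function name `nmf_communities`, the iteration count, the small constant `eps` and the toy graph are assumptions chosen for illustration, not settings taken from the paper.

```python
import numpy as np

def nmf_communities(A, K, n_iter=500, eps=1e-10, seed=0):
    """Approximate A by F @ G.T with nonnegative factors (sketch)."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    F = rng.random((M, K))
    G = rng.random((N, K))
    for _ in range(n_iter):
        # Multiplicative updates keep F and G nonnegative; eps avoids 0/0.
        F *= (A @ G) / (F @ (G.T @ G) + eps)
        G *= (A.T @ F) / (G @ (F.T @ F) + eps)
    labels = G.argmax(axis=1)     # hard clustering from the rows of G
    return F, G, labels

# Toy graph: two 4-cliques joined by a single edge.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
F, G, labels = nmf_communities(A, K=2)
```

Because the factors are initialized randomly, different seeds can lead to different local minima, which is consistent with running several independent experiments and reporting summary statistics.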
The multiplicative updating rules (MUR) for the Euclidean distance objective function are as follows:
$$F \leftarrow F \odot \frac{A G}{F G^T G}, \qquad G \leftarrow G \odot \frac{A^T F}{G F^T F}$$
where $\odot$ and the fraction bar denote element-wise multiplication and division.
SNMF is a variant of NMF for symmetric matrices, and it is needed for clustering non-linearly separable data [35]. However, we do not know whether the data are linearly or non-linearly separable before carrying out the experiment. Hence, in this paper we also adopt the SNMF model for the community detection task. The objective function of SNMF can be expressed as:
$$\min_{G \ge 0} \|A - G G^T\|_F^2$$
Similar to the logic of the MUR for solving NMF, an alternating updating rule can be used to solve the SNMF.

C. COMPLEXITY ANALYSIS
Our algorithm consists of two key steps. The first step is to adopt the improved DPC to get the number of communities and the second step is to carry out the NMF and the SNMF. The complexity of the algorithm is as follows.

IV. EXPERIMENTS AND RESULTS
To assess our method, we use real and synthetic data that are used in related work or available from public sources, taken from http://www-personal.umich.edu/~mejn/Netdata/ and http://snap.stanford.edu/data/. The compared methods are as follows: Newman's fast algorithm [36], denoted as FN, the Louvain method [5], denoted as BGLL, BNMF method [22]

A. EVALUATION METRICS
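In this paper, we apply normalized mutual information (NMI) to evaluate the performance of different algorithms. NMI [37] is a widely used similarity measure that originates from information theory and has proved to be reliable. NMI characterizes the similarity between the true community partition and the partition obtained by the algorithm. Let A be the real partition, B the detected partition, and C the confusion matrix whose element $C_{ij}$ denotes the number of vertices shared by real community i and detected community j. NMI(A, B) is defined as:
$$NMI(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left(\frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}}\right)}{\sum_{i=1}^{C_A} C_{i\cdot} \log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} C_{\cdot j} \log\left(\frac{C_{\cdot j}}{N}\right)} \tag{14}$$
where $C_A$ ($C_B$) is the number of real (detected) communities, $C_{i\cdot}$ ($C_{\cdot j}$) is the sum of the elements of C in row i (column j), and N is the number of nodes. In order to evaluate performance on networks without ground truth, we use the widely accepted modularity function Q [36] as the metric in the experiment. The modularity is defined as:
$$Q = \frac{1}{2m} \sum_{i,j} \left(A_{ij} - \frac{k_i k_j}{2m}\right) \theta(c_i, c_j)$$
where $A_{ij}$ is the element of the adjacency matrix for nodes i and j, $k_i$ and $k_j$ are the degrees of nodes i and j, m is the total number of edges in the graph, and $c_i$ is the community to which vertex i is assigned. $\theta(c_i, c_j)$ is defined as follows:
$$\theta(c_i, c_j) = \begin{cases} 1, & c_i = c_j \\ 0, & \text{otherwise} \end{cases}$$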
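Both metrics are straightforward to compute from label vectors and an adjacency matrix. The following sketch is one possible implementation under the usual confusion-matrix form of NMI and the Newman modularity; the function names `nmi` and `modularity`, and the convention of returning 1 when both partitions consist of a single community, are assumptions made for this example.

```python
import numpy as np

def nmi(true_labels, found_labels):
    """Normalized mutual information from the confusion matrix."""
    _, t = np.unique(np.asarray(true_labels).ravel(), return_inverse=True)
    _, f = np.unique(np.asarray(found_labels).ravel(), return_inverse=True)
    C = np.zeros((t.max() + 1, f.max() + 1))
    np.add.at(C, (t, f), 1)                    # C[i, j]: shared vertices
    N = C.sum()
    row, col = C.sum(axis=1), C.sum(axis=0)
    nz = C > 0                                 # skip empty cells (0 * log 0 = 0)
    num = -2.0 * (C[nz] * np.log(C[nz] * N / np.outer(row, col)[nz])).sum()
    den = (row * np.log(row / N)).sum() + (col * np.log(col / N)).sum()
    # Assumption: two single-community partitions are treated as identical.
    return 1.0 if den == 0 else num / den

def modularity(A, labels):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    labels = np.asarray(labels)
    k = A.sum(axis=1)                          # node degrees
    two_m = k.sum()                            # twice the number of edges
    same = labels[:, None] == labels[None, :]  # theta(c_i, c_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Toy graph: two disjoint triangles, partitioned by triangle.
A = np.zeros((6, 6))
A[:3, :3] = 1
A[3:, 3:] = 1
np.fill_diagonal(A, 0)
q = modularity(A, [0, 0, 0, 1, 1, 1])
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to relabeling, so the first comparison scores a perfect match even though the label values are swapped, whereas the second pair of partitions shares no information.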

B. PARAMETER SETTING
For fair comparison, the parameters in compared methods were set according to the original papers. For BGLL, FN, DPCNMF, DPCSNMF methods which are parameter-free, community detection is implemented based on the topological information. For BNMF and SBMF methods, the maximum number of communities is required as an input parameter, so the max K is predefined according to the result of BGLL and FN.

C. SYNTHETIC NETWORKS AND PERFORMANCE COMPARISON
First, we run our approach on two types of synthetic networks: the GN benchmark networks [1] and the H134 benchmark network. Each GN benchmark network consists of four equally sized non-overlapping communities with 32 nodes each. Each node has $Z_{in}$ edges connecting it to other nodes in the same community and $Z_{out}$ edges connecting it to the other three communities. On average, the total degree of each node is $Z_{in} + Z_{out} = 16$. As expected, the community structure becomes less clear and the community detection task becomes more challenging as $Z_{out}$ increases.
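A network with these properties can be approximated by a planted-partition model in which the degrees match the stated values only on average. The sketch below is such an approximation and is not necessarily the generator used for the experiments; the function name `gn_benchmark` and its parameters are hypothetical.

```python
import numpy as np

def gn_benchmark(z_out, n_groups=4, group_size=32, total_degree=16, seed=0):
    """Planted-partition approximation of a GN-style benchmark graph."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    labels = np.repeat(np.arange(n_groups), group_size)
    z_in = total_degree - z_out
    # Edge probabilities chosen so the expected degrees are z_in and z_out.
    p_in = z_in / (group_size - 1)
    p_out = z_out / (n - group_size)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    # Sample each unordered pair once, then mirror to keep A symmetric.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

A, labels = gn_benchmark(z_out=8)
mean_degree = A.sum(axis=1).mean()
```

Raising `z_out` while keeping the total degree fixed moves edges from inside the communities to between them, which is what makes the detection task harder.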
To further demonstrate the effectiveness of our methods, we adopt the H134 benchmark graph. H134 has a hierarchical community structure with 256 nodes and two hierarchical levels. At the higher level, there are 4 communities, each with 64 nodes. Each high-level community contains four small communities of 16 nodes each. Fig. 1 visualizes the sparsity patterns of the GN8 network and the H134 network.
In Fig. 2, the elements of $\gamma$ are plotted. It can be seen from Fig. 2 $\gamma(5) = 25 < mean(\gamma) + std(\gamma) = 26.2578$, so the cores list of H134 is [14, 82, 68, 213, 131, 200]. Now, we can use the amended recommended number of communities as the input parameter of DPCNMF and DPCSNMF. Fig. 3 shows two evaluation metrics to compare the six algorithms' performance on the GN8 and H134 networks. For probabilistic algorithms such as BNMF, SBMF, DPCNMF and DPCSNMF, we execute 10 independent experiments and record the mean and the corresponding standard deviation of NMI and Q. For deterministic algorithms such as BGLL and FN, only one run is executed. As can be seen from Fig. 3, DPCSNMF and BGLL perform neck and neck on the GN8 network and better than the other four algorithms: $NMI_{BGLL} = 0.5728 > NMI_{DPCSNMF} = 0.5605$, and $Q_{BGLL} = 0.2471 < Q_{DPCSNMF} = 0.2828$. Likewise, on the H134 network, DPCSNMF, FN and BGLL rank first, with NMI = 1 and Q = 0.6702, superior to the other three algorithms. Note that although the average performance score of DPCNMF is not high, its maximum NMI reaches 1 and its Q value is the best among all six algorithms.

D. REAL WORLD NETWORKS PERFORMANCE COMPARISON
To further evaluate our algorithm, we choose some representative networks with different sizes, tabulated in Table 1.
Here, n denotes the number of nodes, m denotes the number of edges, k denotes the average node degree, K denotes the number of communities, and − indicates that the number of communities is unknown. Fig. 4 shows the decision graphs of 12 real-world networks. The decision graph of the Egonet network is omitted due to limited space. We sort the $\gamma$ values in descending order and obtain hubs and Ind, namely, the density peaks and the recommended number of communities. Due to space constraints, the automatic selection of centers is discussed in detail for only three networks. Karate_hubs = [34, 1, 33, 3, 2, 4], Karate_Ind = [2, 5, 3, 4] and ($\gamma_{Karate}(33) = 12$) < ($mean(\gamma_{Karate}) + std(\gamma_{Karate}) = 12.9369$). Therefore, the Karate cores list includes only [34, 1] and the amended recommended number of communities is 2. Dolphins_hubs = [15, 18, 21, 38, 46, 34, 52], Dolphins_Ind = [2, 3, 5, 7] and ($\gamma_{Dolphins}(38) = 11$) < ($mean(\gamma_{Dolphins}) + std(\gamma_{Dolphins}) = 13.6775$). Therefore, the Dolphins cores list includes only [15, 18, 21] and the amended recommended numbers of communities are 2 and 3. Risk_hubs = [5, 12, 23, 16, 26, 33, 36], Risk_Ind = [2, 3, 7, 10], $mean(\gamma_{Risk})$ < ($\gamma_{Risk}(16, 26, 33, 36) = 6$) < ($mean(\gamma_{Risk}) + std(\gamma_{Risk}) = 9.5586$), and $\rho_{Risk}(16, 26, 33, 36) = \rho_{max}$ > ($mean(\rho_{Risk}) + std(\rho_{Risk}) = 5.1201$). So, the Risk cores list includes [5, 12, 23, 16, 26, 33, 36] and the amended recommended numbers of communities are 2, 3 and 7. The superiority of this method lies in the fact that cluster centers can be extracted automatically, which eliminates the shortcoming of determining cluster centers by manual selection. Table 2 and Table 3 show the performance of the six algorithms on real-world networks in terms of the Q and NMI values, respectively. In the two tables, the number in brackets in each cell represents the number of communities detected by the corresponding algorithm. The entries corresponding to the best detection result with respect to the ground truth are in bold and the second-best are underlined.
For BNMF, SBMF, DPCNMF and DPCSNMF which are probabilistic algorithms, we execute 10 independent experiments and record the maximum Q and NMI . Owing to space constraints, we omit the mean and the corresponding standard deviation of Q and NMI . For BGLL and FN which are deterministic algorithms, only one run is executed. Since BNMF and SBMF have to scan K in a large range, we predefined maximum K for BNMF and SBMF according to the result of BGLL or FN for the sake of computation efficiency.
As can be seen from Table 2, although the Q value of our methods is not the highest among the six algorithms for some datasets, the result of DPCNMF and DPCSNMF is consistent with reality. For example, on the Karate network, DPCNMF and DPCSNMF achieve NMI = 1 but only Q = 0.3715. However, according to Table 3, Q = 0.3715 is indeed the value for the true partition of the Karate network. Based on the NMI and modularity Q results in Table 3 and Table 2, we find that a higher modularity value does not always correspond to a better partition of the network. From Table 2, we also note that the Q value of DPCNMF and DPCSNMF is slightly inferior to that of the compared algorithms on the School7, Jazz and Egonet networks. However, this does not suggest that our methods perform poorly, because they tend to detect large community structures. The School6 and School7 networks are the same network with different labels, so the detection results of our two methods are the same, and the number of communities is six. For the Jazz network, the Q value of DPCNMF is better than that of DPCSNMF, and the detected number of communities is less than or equal to that of the compared methods. The reason may be that the data of the Jazz network are linearly independent. For the Egonet network, the number of communities detected by both DPCNMF and DPCSNMF is less than that of the compared methods.
TABLE 2. Comparison in terms of Q on real-world networks (the entries corresponding to the best detection result with respect to the ground truth are in bold and the second-best are underlined; networks without ground truth are not marked).
In addition, from Table 3, we notice that the maximum NMI obtained by the DPCNMF method on the Karate, Polbooks, Adjnoun and Polblogs networks is 1, 0.5979, 0.3742 and 0.7276, respectively. Moreover, DPCNMF is second-best on the Dolphins network with an NMI of 0.7532. Therefore, DPCNMF achieves four best and one second-best NMI values out of the nine real-world networks with ground truth. The maximum NMI obtained by the DPCSNMF method on the Karate, Dolphins, Football, Polbooks and Risk networks is 1, 0.8141, 0.9336, 0.5979 and 0.9453, respectively. Moreover, DPCSNMF is second-best on the Polblogs network with an NMI of 0.7197. So, among the six algorithms, DPCSNMF achieves five best and one second-best NMI values out of the nine real-world networks with ground truth. This result indicates that DPCNMF and DPCSNMF outperform the compared algorithms and can obtain superior community structure in real-world networks. Note that we assign the ground-truth value of 12 to BNMF and SBMF on the Football network because the number of communities obtained by BGLL and FN is 10 and 6, respectively. Consequently, with the ground truth as an input parameter, SBMF not only achieves the best Q, but also attains the second-best NMI on the Football network. Furthermore, from Table 3, we also find that for the School6, School7, Adjnoun and Polblogs networks, the maximum NMI of DPCNMF is better than that of DPCSNMF. This is probably because these four networks contain linearly independent data, in which case asymmetric DPCNMF should outperform symmetric DPCSNMF. In a nutshell, each clustering method has its suitable scenario. Kuang et al. [35] found that asymmetric NMF performs well when different clusters correspond to linearly independent data, while symmetric NMF can capture nonlinear cluster structure.
In many cases, the distribution of the data is hard to know in advance, which is exactly why we propose both DPCNMF and DPCSNMF to deal with different networks.
The time costs are shown in Table 4, where the fastest algorithms are in bold. From Table 4, we make the following observations. The BNMF, SBMF, DPCNMF and DPCSNMF methods have a much higher time cost than BGLL and FN. In addition, both BNMF and SBMF consume more running time owing to searching for the best community structure among all candidates. Although our methods suffer from a higher computational cost than BGLL and FN, they generally achieve the best detection performance among all compared methods. It should be noted that the runtime of DPCSNMF is slightly inferior to that of BGLL for small networks, while for larger networks such as the Polblogs and Egonet networks, BGLL is nearly 7 times and 35 times faster than DPCSNMF, respectively. The proposed methods are not suitable for large-scale networks because we use Floyd's algorithm to calculate the shortest distances, and its complexity is $O(N^3)$. This bottleneck motivates us to improve our methods to address larger-scale networks in the future.
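The cubic cost mentioned above is easy to see in a direct sketch of Floyd's algorithm on hop counts: the outer loop runs N times and each pass relaxes all $N^2$ node pairs. The function name `floyd_warshall_hops` and the toy graph are assumptions for illustration, and the inner two loops are vectorized with NumPy, which changes the constant factor but not the $O(N^3)$ growth.

```python
import numpy as np

def floyd_warshall_hops(A):
    """All-pairs hop counts for an unweighted graph (Floyd's algorithm)."""
    n = A.shape[0]
    dist = np.where(A > 0, 1.0, np.inf)   # direct links cost one hop
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through the intermediate node k.
        dist = np.minimum(dist, dist[:, k, None] + dist[None, k, :])
    return dist

# Toy graph: a path 0-1-2-3 plus an isolated node 4.
A = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
dist = floyd_warshall_hops(A)
```

Unreachable pairs keep an infinite distance, so disconnected networks need a convention for such entries before $\delta$ is computed.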
For the sake of clarity, we choose the Karate network and the Dolphins network as case studies to visualize the detected communities. The nodes are drawn in different colors corresponding to different communities. The first sub-figure of Fig. 5 shows the result of our methods on the Karate network (Q = 0.3715, NMI = 1). Both DPCNMF and DPCSNMF correctly divide the Karate network into two communities. The second sub-figure of Fig. 5 shows the original Dolphins network. The third and fourth sub-figures of Fig. 5 show the results of our methods on the Dolphins network with 2 and 3 communities. As can be seen from the third sub-figure, only node 40 is misclassified by our method.
From Fig. 4, Fig. 5 and the above experimental analysis, we can conclude that our methods are capable of detecting communities at different resolutions. If we want smaller communities, we can adopt a bigger K, and vice versa. Adopting different K values is like viewing a network from a coarse-grained view (lower K) to a fine-grained view (higher K). Therefore, our methods can provide insight into the interior structure of communities in networks.

V. CONCLUSION
In this paper, we put forward the DPCNMF and DPCSNMF algorithms for community detection in complex networks, which combine the NMF algorithm with an improved DPC algorithm. Compared with traditional NMF-based methods, our methods mainly have three advantages. First, our methods are parameter-free. Second, the proposed methods detect the number of communities automatically rather than by manual selection. Third, the improved DPC method identifies the community centers; according to the recommended center nodes, we can also detect communities at different scales. Finally, our methods have been tested on a series of benchmark graphs and real-world networks. The experimental results show that the DPCNMF and DPCSNMF methods perform better in uncovering indistinct community structure than other state-of-the-art methods.
However, there is still room for improvement in terms of performance. Potential avenues for future work include extending our methods to handle much larger networks and integrating constraint information into our model; we will mainly focus on semi-supervised community detection algorithms.