Community Detection based on Fish School Effect (February 2022)

Community structure has a more intuitive, physical meaning than has traditionally been recognized; it plays an important role in networks and is accordingly being widely adopted as a representation. Consequently, community detection has attracted increasing attention. Although scholars have proposed many community detection methods from different perspectives, efficient community detection in many real networks remains a challenge because of the complexity, diversity and dynamic characteristics of networks. Inspired by the fish school effect in real life, this paper envisions a network as an ecosystem and proposes a novel dynamic model that aims to reveal communities in a more intuitive way. Based on the new model, we design a community detection algorithm called community detection based on fish school effect (CDFSE). CDFSE has several desirable properties: high-quality community detection, no parameters to set, and notable scalability. To evaluate CDFSE, this paper employs two widely used evaluation metrics and eleven representative algorithms to test its effectiveness on both synthetic and real-world networks. The experimental results show that, in most cases, CDFSE is superior to the comparison methods in terms of the quality of community detection.


I. INTRODUCTION
With the rapid development of computer and information technology, modeling the complex networks found in nature has attracted increasing attention and has become a popular research topic. In the real world, a large number of composite systems in fields such as biology, computer science, sociology, economics, transportation and medicine can be abstractly described as complex networks [1][2][3][4]. A complex network consists of nodes and edges: nodes represent the elements of a complex system, and edges represent specific relationships between pairs of nodes. Community detection provides a way to analyze the structural characteristics of complex networks in order to study their organizational functions and explore their potential connections. For example, community detection can elucidate the mechanism behind human migration and social structure formation [5]; support disease-transmission dynamics models that predict and mitigate the spread of infectious diseases [6]; help investigate brain networks to reveal potential links between various functions and discover treatments for major diseases [7]; optimize the processing of massive amounts of data for monitoring public sentiment online and for improving intelligent recommendation and precision marketing [8][9]; characterize the interdependence and hidden networks behind the economy to predict the occurrence and evolution of financial crises [10][11]; and detect roads with frequent traffic accidents and congestion, helping to keep transportation networks safe and smooth. The study and design of efficient and accurate community detection methods is therefore crucial.
Currently, representative community discovery algorithms in the field of complex networks primarily include graph-based segmentation algorithms, hierarchical clustering algorithms and partition-based optimization algorithms [12]. The classical Kernighan-Lin (KL) algorithm is based on bisection. Its basic idea is to introduce a trial function Q for the network, where Q is the number of edges within the two quasi-communities minus the number of edges between them, and then to find the partition that maximizes Q. However, this algorithm and many others need to know the number of communities and the number of nodes in each community in advance. Hierarchical clustering algorithms primarily include splitting algorithms and aggregation algorithms. The GN algorithm [13] is one of the most popular splitting algorithms and is based on edge betweenness. During partitioning, the algorithm repeatedly calculates edge betweenness and removes the edge with the highest value. The fast greedy algorithm [14] is a representative aggregation algorithm that uses the modularity index as its objective function and selects the structure corresponding to the optimal modularity value as the final community partition [15]. Both algorithms have high time complexity and low effectiveness for community discovery in large-scale networks. The main idea of partition-based optimization algorithms is to constantly adjust the community of each node until the best community partition is obtained. Although the partition-based k-means algorithm [16] has the advantages of simplicity, low complexity and fast convergence, its random selection of initial community centers affects the resulting partition to a certain extent. The label propagation algorithm (LPA) [17][18] has approximately linear time complexity.
On the other hand, the outcomes of community detection are often unstable. Roy U K proposed a modified local random walk method to capture fuzzy communities based on neighbors' similarity [19]. Xiaodong Li surveyed recent theoretical advances in convex optimization approaches for community detection [20]. Zhang Y proposed the SEAL algorithm to detect communities using generative adversarial networks [21].
As complex networks grow in scale, traditional community discovery algorithms exhibit shortcomings that limit their efficiency in practice, such as high complexity, the need to specify the number of communities, community sizes or other prior information, or the need to optimize a predefined objective function.
This paper proposes a community detection method based on fish school effect (CDFSE) to characterize network community structure. The network is regarded as an ecosystem, and its dynamic processes over time are explored. Inspired by this principle, a new dynamic model is designed to simulate the fish school effect. Compared with traditional community detection methods, CDFSE offers several advantages. Firstly, it can achieve high-quality community division. Secondly, it does not require parameter setting. Thirdly, it does not require prior knowledge of the network structure; more importantly, it has low time complexity and works efficiently when identifying community structure in large-scale networks. In the next section, we explore the central principle behind this method.

A. Basic idea
In the physical world, common interests or hobbies attract people to each other. These people may then attract other members, or be attracted to and join a larger group. Once a group is large enough, it has stronger resistance to risk [22][23]. More importantly, such a group is likelier to attract other potential members. This behavior can be seen in nature and in all aspects of human life; for example, when two people like basketball, they may set up a basketball team to attract other members, or join an existing basketball team. People are attracted by activities according to their interests. This phenomenon is similar to the fish school effect in the ocean [24]. The fish school effect refers to the phenomenon of groups of fish swimming in the sea in a manner that is both chaotic and orderly. The fish move in a uniform manner according to the current and food availability; when attacked by predators, they quickly gather and disperse, similar to an organization with a strict division of labor and cooperation. The fish school effect has mainly been applied in the field of intelligent transportation: by studying the consistency of fish schools and their swimming behavior, a mathematical model of fish school movement is constructed to simulate vehicle trajectories and thus to realize cooperative control of multiple vehicles in traffic [24][25]. A community can be regarded as a collection of users with strong common attributes. As time goes on and an increasing number of members with a common attribute join the community, they form a potent organization, similar to fish schooling in the sea, and produce a force far greater than that of any individual. The members can not only resist risks but also attract other members to join, forming a larger community. In complex networks, nodes that are similar attract each other, especially in scale-free networks, where a few nodes have a much larger degree than the others while the majority of nodes have a small degree.
Within a community structure, the majority of edges lie inside communities, while only a few cross between communities. Based on this feature, we examine whether community structure can be characterized by reproducing the fish school effect for nodes in a complex network, and we propose a new method that divides community structure on this basis. The key notion is to regard the network as an adaptive dynamic system and to investigate how its dynamics vary over time. Specifically, in an ocean network there are numerous kinds of fish, which is the initial state; fish of the same kind attract each other because of their common characteristics (they share great similarity), forming an initial group. The group itself, and a small number of fish in it that have greater attraction, will attract and gather other fish of the same kind, forming a large-scale group that is more attractive and more stable and that may even attract other groups to form a still larger group; this is the motion state of the fish group. Over time, different kinds of fish are attracted to different groups according to their distinct characteristics, which is the steady state of the fish group. The whole process is shown in Fig. 1. Why do all kinds of fish, starting from the initial state, move regularly and finally reach a steady state? At the beginning, each individual in the network has its own characteristics (in this paper, the traits that differ from those of other individuals, such as its charm or resources) and commonness (in this paper, the shared traits that distinguish groups from each other, such as common interests and common goals). These characteristics and commonalities drive the individuals so that, over time, each is attracted by a different group (community) and the network reaches a stable state.
We will formally define the fish school effect model in Section II. Here, following the three states of the fish school effect, a small fish school network in the sea is used as an example to illustrate the idea. The nodes in Fig. 2 can be regarded as individual fish in the sea. An edge between two nodes indicates that the two fish have common characteristics, and the value on the edge is their similarity. Community detection based on the fish school effect proceeds in three stages. In the first stage, each fish has its own characteristics and commonness, and all fish are scattered. The similarity between individuals is calculated by the Adamic-Adar (AA) index [26]. The AA index considers the degrees of the common neighbors of two nodes; the idea is that a common neighbor with a small degree contributes more than one with a large degree. For example, the similarity between individuals 2 and 3 is 0.53, and the similarity between individuals 4 and 6 is 0.32, as illustrated in Fig. 2 (a). In the second stage, fish with more pronounced characteristics attract each other to form subgroups because of their greater similarity. For example, individuals 2 and 3, individuals 4 and 6, individuals 8, 9 and 10, and individuals 11 and 15 have greater similarity than other individuals, so they are the first to form subgroups A, B, C and D respectively, as presented in Fig. 2 (b). In the third stage, an increasing number of fish are attracted to join the subgroups and form larger groups because of the fish school effect. For example, the similarity between individuals 1 and 2 is 0.21, the similarity between individuals 1 and 3 is 0.39, and the similarity between individuals 1 and 7 is 0.22. Therefore, compared with group B, group A is more attractive to individual 1.
In the same way, the similarity between individuals 7 and 1 is 0.22, and the similarity between individuals 7 and 6 is 0.30, so individual 7 is more attracted to group B. In the end, individual 1 is attracted by group A, individuals 7, 5 and 12 are attracted by group B, and individuals 13 and 14 are attracted by group D. From this perspective, each individual is attracted to a different group, in other words a different community. The community structure is therefore exposed naturally, and this procedure is demonstrated in Fig. 2 (c).

B. Contributions
By simulating the fish school effect, the CDFSE algorithm achieves several advantageous community detection characteristics for complex networks. The main advantages are as follows:
Novelty: The "fish school effect" reflects the behavior of fish in the ocean and is here introduced into complex networks for community detection. It also illustrates how natural laws in different complex systems are interlinked.
High efficiency: Because the CDFSE algorithm derives from a natural law, it is closer to real networks. In our experiments, the algorithm is more efficient and achieves better community detection quality than several other representative community detection methods.
Simplicity: Compared with traditional classical methods, the CDFSE method is simpler and more convenient in that it does not require parameter settings. Through experiments, we further verify the feasibility of applying the fish school effect to community detection in complex networks.
Scalability: The time complexity of CDFSE is $O(n \cdot k^2)$, because attractiveness is calculated only between adjacent individuals. Here $k$ is the average node degree, which is numerically small. For this reason, large-scale networks can be processed efficiently with the CDFSE algorithm.
The first section of this paper introduces the main ideas and innovations of CDFSE. The following section describes the CDFSE method and the fish school effect model in more detail, including the corresponding algorithm. In the third section, CDFSE is compared with several typical algorithms on synthetic and real networks. Finally, the conclusion is given in the fourth section.

A. Relevant Definitions
Before describing the CDFSE method in detail, we first formalize some basic notation. TABLE I lists the key symbols used in this paper with a brief description of each. We denote an undirected, unweighted network as G = (V, E), where V is the set of nodes and E is the set of edges.
where $CN_{v_1 v_2}$ is the set of common neighbor nodes of $v_1$ and $v_2$. In fact, any two nodes in a community may have some similarity. Generally, the similarity of two nodes is relatively large only when they are directly connected. If two nodes are only indirectly connected, their similarity is small.
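As a minimal sketch of the AA similarity described here, the following Python function uses the standard Adamic-Adar form, $S^{AA}_{v_1 v_2} = \sum_{z \in CN_{v_1 v_2}} 1 / \log D_z$. The natural logarithm, the adjacency-dictionary representation, the function name `aa_similarity` and the toy graph are assumptions made for illustration and are not taken from this paper.

```python
import math


def aa_similarity(adj, v1, v2):
    """Adamic-Adar similarity of v1 and v2 in an undirected graph.

    adj maps each node to the set of its neighbours (assumed layout).
    """
    common = adj[v1] & adj[v2]
    # A common neighbour of two distinct nodes has degree >= 2,
    # so log(degree) is strictly positive.
    return sum(1.0 / math.log(len(adj[z])) for z in common)


# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(aa_similarity(adj, 0, 1))  # one common neighbour (node 2, degree 3)
print(aa_similarity(adj, 0, 2))  # one common neighbour (node 1, degree 2)
```

In this toy graph the pair (0, 2) scores higher than the pair (0, 1), because its common neighbor has the smaller degree, which matches the idea that low-degree common neighbors contribute more.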
There are many kinds of fish in the sea. Fish of the same kind share the same lifestyle and attract each other to form a school. Likewise, in reality individuals have their own interests and characteristics, and people who share related interests are mutually attracted into the same group. For example, both A and B like basketball, and both B and C like music, but B likes music more than basketball. The similarity between A and B is then smaller than that between B and C, so B and C attract each other through music rather than A and B through basketball. Because the attraction between individuals is related not only to their similarity but also to the degree of the individuals themselves, the attraction between individuals is defined as the product of the degree of an individual and the similarity between the individuals. Definition 2. (individual attraction) Given the undirected network G = (V, E), the attractiveness of node $v_1$ to node $v_2$ is defined as the product $D_{v_1} \cdot S^{AA}_{v_1 v_2}$ (formula (2)), where $S^{AA}_{v_1 v_2}$ is the AA similarity coefficient between nodes $v_1$ and $v_2$, and $D_{v_1}$ is the degree of node $v_1$.
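The definition of individual attraction can be illustrated with a short sketch that multiplies a node's degree by its AA similarity to another node. The standard Adamic-Adar form with a natural logarithm, the function name `individual_attraction` and the toy graph are assumptions for illustration only.

```python
import math


def individual_attraction(adj, v1, v2):
    """Attractiveness of v1 to v2: degree of v1 times their AA similarity."""
    common = adj[v1] & adj[v2]
    s_aa = sum(1.0 / math.log(len(adj[z])) for z in common)
    return len(adj[v1]) * s_aa


# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(individual_attraction(adj, 2, 0))  # degree 3 times S_AA(2, 0)
print(individual_attraction(adj, 0, 2))  # degree 2 times S_AA(0, 2)
```

Because the similarity is symmetric while the degree factor is not, the measure is directional: in this toy graph the higher-degree node 2 is more attractive to node 0 than node 0 is to node 2.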
Once the fish in the sea form a group, the group produces the fish school effect. After joining a fish group, fish have to accept the constraints of the organization, but they also obtain stronger resistance to risk and richer food. Thus, the group attains a greater ability to attract new members. Correspondingly, a community is itself a collection of users with strong common attributes and also exerts a strong attraction on new members. For example, an excellent research team will attract more researchers to join. In this research, we use the connection strength of the individuals in a group to describe the attraction of the group to individuals.
where $d^c_v$ denotes the internal degree of node $v$ in community $c$. According to the number of edges from node $v$ to the different communities, the community attraction $CA_{v \to c}$ can be divided into two cases: one in which the values of $d^c_v$ differ between communities, and one in which they are equal.
To better understand how formula (3) is used to calculate group attractiveness, consider an example network of 14 nodes divided into two communities, A and B, as shown in Fig. 3. There are two cases. (1) The node's internal degrees in the two communities are not equal. For node 12 in Fig. 3 (a), the internal degree in community A is $d^A_{12} = 1$, while the internal degree in community B is $d^B_{12} = 2$. Because the internal degree of node 12 in community A is less than that in community B, community B may attract node 12 more strongly than community A does. To determine this attraction more precisely, we also consider the influence of indirect neighbors on node 12. In community A, node 3 is connected with node 12 and has degree 5, so $d^A_3 = 4$; according to formula (3), the attraction of community A to node 12 is $CA_{12 \to A} = 1^2 + 4 = 5$. In community B, nodes 7 and 8 are connected with node 12 and each has degree 4, so $d^B_7 = 3$ and $d^B_8 = 3$; the attraction of community B to node 12 is $CA_{12 \to B} = 2^2 + 3 + 3 = 10$. Since the attraction of community B to node 12 is stronger than that of community A, node 12 is more likely to join community B, as shown in Fig. 3 (a). (2) The node's internal degrees in the two communities are equal. For node 12 in Fig. 3 (b), $d^A_{12} = d^B_{12} = 2$, so it is difficult to judge directly whether community A or community B is more attractive to node 12. Consequently, the influence of indirect neighbors on node 12 must be considered carefully. In community A, nodes 5 and 6 are connected with node 12 and each has degree 5, so $d^A_5 = 4$ and $d^A_6 = 4$; according to formula (3), the attraction of community A to node 12 is $CA_{12 \to A} = 2^2 + 4 + 4 = 12$. In community B, nodes 9 and 10 are connected with node 12; node 9 has degree 3, so $d^B_9 = 2$, and node 10 has degree 4, so $d^B_{10} = 3$; according to formula (3), the attraction of community B to node 12 is $CA_{12 \to B} = 2^2 + 2 + 3 = 9$. Because community A is more attractive to node 12, node 12 is more likely to join community A, as shown in Fig. 3 (b).
FIGURE 3. Example of group attraction. The community that ultimately attracts node 12 is determined by the node's internal degree and the degrees of its adjacent nodes, calculated by formula (3). (a) When the internal degrees of node 12 in the two communities are not equal, the calculated attraction of community B is greater than that of community A, and node 12 is finally attracted to community B. (b) When node 12 has the same internal degree in the two communities, the attraction of community A is greater than that of community B, and node 12 is finally attracted to community A.
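The arithmetic of case (a) can be reproduced with a short sketch. The form used below, $CA_{v \to c} = (d^c_v)^2 + \sum_{u \in N(v) \cap c} d^c_u$, is inferred from the worked numbers rather than quoted from formula (3). The toy edge list is hypothetical: it reproduces only the degrees stated for nodes 3, 7, 8 and 12, not the full network of Fig. 3.

```python
from collections import defaultdict


def community_attraction(adj, members, v):
    """Inferred community attraction of the community `members` to node v.

    Squares the number of v's neighbours inside the community, then adds
    each such neighbour's internal degree (edges to v itself not counted).
    """
    nbrs_in_c = [u for u in adj[v] if u in members]
    internal = sum(
        sum(1 for w in adj[u] if w in members and w != v) for u in nbrs_in_c
    )
    return len(nbrs_in_c) ** 2 + internal


# Hypothetical edges chosen so that node 3 has degree 5 and
# nodes 7 and 8 have degree 4, as stated for case (a).
edges = [(3, 1), (3, 2), (3, 4), (3, 5), (3, 12),
         (7, 6), (7, 8), (7, 9), (7, 12),
         (8, 6), (8, 9), (8, 12)]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

community_a = {1, 2, 3, 4, 5}
community_b = {6, 7, 8, 9}
print(community_attraction(adj, community_a, 12))  # 1**2 + 4 = 5
print(community_attraction(adj, community_b, 12))  # 2**2 + 3 + 3 = 10
```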

B. Fish School Effect Model
On the basis of the relevant definitions in Section II-A, the fish school effect model is constructed. The model consists of three stages: network initialization, subgroup formation and fish school effect, as shown in Fig. 4.

1) NETWORK INITIALIZATION
In the sea, there are a variety of fish, and each fish is a single individual with its own living habits and characteristics. Therefore, we can regard each individual fish in the ocean as a node in the network. In this paper, we use the degree and similarity of nodes to describe their characteristics. Initially, each individual has its own characteristics and is regarded as an independent group, as shown in the first stage of Fig. 4.

2) SUBGROUP FORMATION
In the sea, each fish is an independent individual with its own personality, so there are differences among individuals. Some individuals have more pronounced characteristics and attract other individuals, while others with less pronounced characteristics are attracted by other individuals. Likewise, each node in the network holds a certain amount of resources. The more resources a node has, the more attractive it is to other nodes; conversely, nodes with fewer resources are attracted by other nodes. In the second stage, as shown in Fig. 4, when $D_v > \max D_{v_1}$, $v_1 \in N_v$, node $v$ still belongs to its atomic group $C_v$. Otherwise, node $v$ joins a new subgroup $C_{v_1}$. Here $C_v$ is the community to which node $v$ belongs and $D_v$ is the degree of node $v$. Through a series of iterations, nodes with more resources are likely to attract more neighbors into their communities, building subgroups.
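A single round of the degree rule in this stage might be sketched as follows. This is an illustration under stated assumptions, not the paper's Algorithm 1: the text does not say which neighbor $v_1$ is chosen or how ties are broken, so the sketch assumes the highest-degree neighbor with ties going to the smallest node id, and it assumes integer node ids. The function name `form_subgroups` is hypothetical.

```python
def form_subgroups(adj):
    """One synchronous round of subgroup formation by node degree.

    adj maps each integer node id to the set of its neighbours.
    Returns a dict mapping each node to the id of the group it joins.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    label = {}
    for v, nbrs in adj.items():
        if not nbrs or degree[v] > max(degree[u] for u in nbrs):
            # v has strictly the largest degree in its neighbourhood,
            # so it keeps its own atomic group.
            label[v] = v
        else:
            # Assumption: join the highest-degree neighbour,
            # breaking ties in favour of the smallest node id.
            label[v] = max(nbrs, key=lambda u: (degree[u], -u))
    return label


# Hypothetical star graph: hub 0 with leaves 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(form_subgroups(star))  # every leaf joins the hub's group
```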

3) FISH SCHOOL EFFECT
After subgroups have formed, an increasing number of nodes are attracted by the various subgroups. The structure and attractiveness of a subgroup may change when a node joins it. As long as there are nodes in the network that have not joined a group, a new round of iteration is carried out. Driven by the network topology, the attractiveness of all nodes may change over many iterations. Once the nodes have joined their groups and the group to which each node belongs no longer changes, the network structure reaches the equilibrium state, as shown in the third stage of Fig. 4. In this way, the community structure of the network is formed naturally.
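The iteration toward equilibrium can be sketched as a loop in which each node moves to the neighboring group that attracts it most, stopping when no label changes. This is a hedged illustration rather than the paper's Algorithm 1: the attraction form (squared internal degree plus the internal degrees of neighbors in the group) is inferred from the worked example for formula (3), and the update order, the tie-breaking rule (keep the current group on ties), the `max_rounds` cap and both function names are assumptions.

```python
def community_attraction(adj, label, v, c):
    """Inferred attraction of group c to node v (edges to v not counted)."""
    nbrs_in_c = [u for u in adj[v] if label[u] == c]
    internal = sum(
        sum(1 for w in adj[u] if w != v and label[w] == c) for u in nbrs_in_c
    )
    return len(nbrs_in_c) ** 2 + internal


def fish_school_rounds(adj, label, max_rounds=20):
    """Move nodes to their most attractive neighbouring group until stable."""
    label = dict(label)  # work on a copy
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            candidates = sorted({label[u] for u in adj[v]})
            if not candidates:
                continue  # isolated node keeps its group
            # Highest attraction wins; on ties the current group is kept.
            best = max(
                candidates,
                key=lambda c: (community_attraction(adj, label, v, c),
                               c == label[v]),
            )
            if best != label[v]:
                label[v] = best
                changed = True
        if not changed:
            break  # equilibrium: no node changed its group
    return label


# Hypothetical graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 2: 2, 3: 3, 4: 3, 5: 3}  # node 2 is still on its own
print(fish_school_rounds(adj, start))
```

In this toy run node 2 is pulled into the group of nodes 0 and 1 (attraction 6) rather than the group of node 3 (attraction 3), after which no label changes.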

C. CDFSE Algorithm
This part describes the CDFSE algorithm in detail.

1) NETWORK INITIALIZATION
Initially, we characterize each node as an independent individual or group. To distinguish different nodes, we label each individual or group with a number for initialization.

2) CORE GROUP CALCULATION
Before calculating the core group, we need to compute the individual attraction first. Individual attractiveness is related to resources and similarities. In the network structure, we apply the degree of nodes to represent the resources owned by nodes and use the AA similarity coefficient to represent the similarity between nodes. According to formula (2), we can measure the attraction between individuals. It is because of the existence of individual attraction that the nodes in the network attract each other and form a subgroup (core group). We can calculate the core group through a round of iteration.

3) FISH SCHOOL EFFECT SIMULATION
According to the fish school effect model proposed in this paper, the interaction between subgroups constantly attracts other individuals to join, forming larger subgroups. The fish school effect can be simulated because the subgroups tend to become stable after repeated iterations. We employ the normalized mutual information index to evaluate the quality of each community partition. With the passage of time and under the influence of topology, the network structure reaches a steady state, and the best division is obtained. The CDFSE algorithm is given in Algorithm 1.

D. Algorithm Complexity Analysis
The CDFSE algorithm is divided into three stages, and the time complexity of each stage is shown in TABLE II. Overall, the computational complexity of CDFSE is $O(k^2 \cdot n)$. Because the average degree $k$ of a network is generally small, this complexity is relatively low, and the algorithm can handle large-scale networks. The network reaches a stable state after $l$ iterations, where $l$ is usually between 3 and 10.

III. Experiments
To evaluate the performance of the CDFSE algorithm, we compare CDFSE with eleven representative community detection algorithms and conduct experiments on synthetic and real networks respectively. Firstly, we briefly describe these algorithms.
Ncut (Normalized cuts) [27] is a typical spectral clustering method. First, an affinity matrix describing the similarity between pairs of data points is built from the sample data set, and the eigenvalues and eigenvectors of the matrix are computed. Next, appropriate eigenvectors are selected to cluster the data points. The essence of Ncut is to transform the clustering problem into an optimal graph partitioning problem.
Infomap [28] is a community detection algorithm combining random walk and information theory. It regards community detection as a coding problem and obtains the optimal community structure according to the principle of minimum description length.
LPA (Label Propagation Algorithm) [29] is a graph-based semi-supervised learning method that uses the relationships between samples to establish a complete graph model. Its essential idea is to use the label information of labeled nodes to predict the labels of unlabeled nodes. The algorithm is simple to implement and has a short execution time, but the result obtained from each run is unstable.
WT (WalkTrap) [30] is a random walk algorithm, which divides communities by calculating the similarity among nodes using random walk.
FG (Fast Greedy) [31] is a hierarchical aggregation approach for community detection. It greedily optimizes modularity to reveal the community structure.
Louvain [32] is a well-known multilevel modularity optimization community detection algorithm that allows hierarchical community detection. Its time complexity is lower compared with the algorithm proposed by Newman.
MCL (Markov Clustering) [33] is a fast and scalable clustering algorithm. It is based on the simulation of flow in graphs and is widely used in different fields, mainly in the life sciences.
FluidC (Fluid Communities) [34] is a scalable and diverse community detection algorithm based on propagation. It simulates the expansion and contraction of a fluid until equilibrium is found.
EDCD (Edge-Deleting Community Detection) [35] is a modularity optimization algorithm that uses a restriction strategy to iteratively delete edges in order to find strongly connected communities.
SCD (Silhouette Community Detection) [36] is an embedding-based clustering method that reveals community structure by optimizing the silhouette measure; in particular, it extracts a real-valued representation of each node from its neighborhood.
ASOCCA (Adjacent node Similarity Optimization Combination Connectivity Algorithm) [37] is a combination connectivity algorithm that optimizes the similarity of adjacent nodes to achieve accurate community detection. It uses a local similarity measure based on the clustering coefficient to identify the nearest neighbor of each node. It then obtains multiple groups of connected components by combining different node pairs, which form the initial communities. Finally, it uses a community merging strategy to further optimize the community structure.
NBCD (Neighbour Based Community Detection) [38] is a neighborhood-based community detection algorithm. It relies on two novel similarity measures that use a similarity parameter α, together with a set of ground rules proposed in that work. The similarity parameter α lets the user select the tightness of the nodes within the communities.

A. Evaluation Index
To comprehensively investigate the performance of the various algorithms, this paper uses two popular evaluation metrics to assess the accuracy of community detection. The following is a brief introduction to these two indicators.
Normalized mutual information (NMI) [39] is generally used in clustering to measure the similarity of two clustering results. It can objectively evaluate the accuracy of a community partition against the ground-truth partition. NMI is defined in terms of the following quantities: I(a;b) represents the mutual information between a (the true partition) and b (the estimated partition), and H(a) denotes the entropy of a. The value of NMI ranges from 0 to 1. The larger the value, the more accurate the division. NMI = 1 means that the predicted community partition is completely consistent with the real one, while NMI = 0 indicates that the two are completely different.
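The quantities I(a;b) and H(a) can be computed directly from two label lists, as in the following minimal sketch. The normalization used here, $2 I(a;b) / (H(a) + H(b))$, is one common convention and is an assumption, since the defining formula is not reproduced in this text; the function name `nmi` is likewise hypothetical.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information of two label lists of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mutual / (h_a + h_b)  # assumed normalization


truth = [0, 0, 1, 1]
print(nmi(truth, [1, 1, 0, 0]))  # same partition under different label names
print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

The first call shows that NMI depends only on the grouping and not on the label names, which is why it suits comparisons against ground-truth communities.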
Another indicator is the adjusted Rand index (ARI) [40], which is often used to calculate the similarity between two partitions of the same sample. ARI is given as follows:
$$\mathrm{ARI} = \frac{x_{11} - \dfrac{(x_{11} + x_{01})(x_{11} + x_{10})}{x_{00}}}{\dfrac{(x_{11} + x_{01}) + (x_{11} + x_{10})}{2} - \dfrac{(x_{11} + x_{01})(x_{11} + x_{10})}{x_{00}}} \qquad (5)$$
where $x_{11}$ is the number of pairs of points that belong to the same community in both the real and the predicted community partition, $x_{00}$ is the number of pairs of points that belong to different communities in both the real and the predicted partition, $x_{10}$ is the number of pairs that belong to the same community in the real partition but not in the predicted partition, and $x_{01}$ is the number of pairs that belong to the same community in the predicted partition but not in the real partition. The quality of community detection is thus evaluated by counting the sample pairs on which the predicted community partition and the real community partition agree and disagree.
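The four pair counts can be obtained by enumerating all pairs of points, as in the sketch below. Note one assumption: the chance term here is divided by the total number of pairs, $x_{11} + x_{10} + x_{01} + x_{00}$, which is the usual pair-counting convention for the adjusted Rand index and may differ from a literal reading of formula (5). The function names are hypothetical, and the quadratic pair enumeration is meant for small illustrations only.

```python
from itertools import combinations


def pair_counts(true, pred):
    """Return (x11, x10, x01, x00) over all unordered pairs of points."""
    x11 = x10 = x01 = x00 = 0
    for i, j in combinations(range(len(true)), 2):
        same_true = true[i] == true[j]
        same_pred = pred[i] == pred[j]
        if same_true and same_pred:
            x11 += 1
        elif same_true:
            x10 += 1  # together in the real partition only
        elif same_pred:
            x01 += 1  # together in the predicted partition only
        else:
            x00 += 1
    return x11, x10, x01, x00


def ari(true, pred):
    """Adjusted Rand index, standard pair-counting form (assumed)."""
    x11, x10, x01, x00 = pair_counts(true, pred)
    total = x11 + x10 + x01 + x00
    if total == 0:
        return 1.0  # fewer than two points
    expected = (x11 + x01) * (x11 + x10) / total
    max_index = ((x11 + x01) + (x11 + x10)) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one community
    return (x11 - expected) / (max_index - expected)


truth = [0, 0, 1, 1]
print(pair_counts(truth, [0, 1, 0, 1]))
print(ari(truth, [1, 1, 0, 0]))
print(ari(truth, [0, 1, 0, 1]))
```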

B. Network Generation
In this paper, the well-known LFR model [41] is used to create synthetic benchmark networks. The LFR benchmark assumes that both the degree distribution and the community size distribution of the network follow power laws, with exponent $\tau_1$ for the degree sequence and exponent $\tau_2$ for the community size distribution. Each node shares a fraction $1 - \mu$ of its edges with nodes in its own community and a fraction $\mu$ with nodes in other communities; $\mu$ ($0 < \mu < 1$) is called the mixing parameter and controls the difficulty of community division. In addition, the model has several other parameters: the number of nodes in the generated network $N$; the average degree of the generated network $k$; the minimum community size $C_{min}$; the maximum community size $C_{max}$; and the average clustering coefficient $cc$ of the generated network.
To evaluate and compare the performance of the algorithms, the following parameters are set for the generated network: $N = 1000$, $k = 15$, $C_{min} = 10$, $C_{max} = 50$, $\tau_1 = 2$ and $\tau_2 = 1$, while $\mu$ varies from 0.1 to 0.8 to increase the complexity of the generated network. Fig. 5 shows the performance of the methods on the two indexes. For the NMI metric, when $\mu$ ranges from 0.1 to 0.5, all algorithms except FG, which seems unsuitable for these networks, divide the generated network almost perfectly. As $\mu$ changes from 0.5 to 0.6, the performance of the LPA, FG and EDCD algorithms declines significantly and their NMI values fall below 0.75, while the Infomap, CDFSE and Ncut algorithms are more robust, with NMI values all above 0.95. The other algorithms, such as Louvain, WT, MCL, ASOCCA, FluidC and SCD, have NMI values between 0.85 and 0.95 on average. When $\mu > 0.6$, the performance of all algorithms except MCL, which remains strong, is significantly reduced. For the ARI index, when $\mu$ ranges from 0.1 to 0.4, all algorithms except FG divide the generated network well. When $\mu$ ranges from 0.4 to 0.5, the performance of the EDCD and MCL algorithms begins to decline significantly, and their ARI values drop below 0.6. When $\mu$ ranges from 0.5 to 0.6, the performance of the LPA algorithm declines sharply, and its ARI value tends to 0. Infomap, CDFSE and Ncut are still robust, with ARI values above 0.95. The performance of the WT, Louvain, SCD, ASOCCA and FluidC algorithms also declines to varying degrees, with ARI values between 0.5 and 0.7. When $\mu > 0.6$, the performance of all algorithms decreases rapidly, but SCD and CDFSE still perform better than the others.
Moreover, to evaluate the effectiveness of the comparison algorithms on networks with different community densities, we fix μ and vary the average degree k when generating the networks, where μ = 0.1 and k changes from 5 to 25. The performance of each comparison algorithm on the two indexes is displayed in Fig. 6. It can be clearly observed that our CDFSE algorithm achieves the best performance in terms of both the NMI index and the ARI index. When k = 5, the NMI value of CDFSE is above 0.95 and its ARI value is above 0.92, while the NMI and ARI values of the other algorithms are not greater than 0.8. When k ≥ 10, the NMI and ARI values of the CDFSE algorithm reach 1, whereas the highest NMI and ARI values of the other algorithms do not exceed 0.95. Additionally, the FG algorithm performs worst because it is sensitive to the community density of the generated network. This effect may be caused by the resolution limit of modularity [42].
Finally, to evaluate the effectiveness of the comparison algorithms on large-scale networks, we generate networks with an increasing number of nodes. When the number of nodes changes from 1000 to 50000, the performance of each comparison algorithm on the two indexes is shown in Tables V and VI. It can be clearly observed that, compared with the other algorithms, our CDFSE algorithm achieves ideal community detection results in terms of both the NMI index and the ARI index.
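Both indexes used above compare a detected partition with the ground-truth partition. As a reference, the following Python sketch computes them from two label lists using only the standard library. It assumes the common definitions: NMI normalized by the arithmetic mean of the two entropies, and the pair-counting form of ARI. The text does not state which NMI normalization the experiments use, so this choice is an assumption, as are the function names `nmi` and `ari`. The inputs are assumed to contain at least two nodes.

```python
from collections import Counter
from math import comb, log


def nmi(true_labels, pred_labels):
    """Normalized mutual information (arithmetic-mean normalization)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    true_sizes = Counter(true_labels)
    pred_sizes = Counter(pred_labels)
    mutual = sum(
        (c / n) * log(c * n / (true_sizes[a] * pred_sizes[b]))
        for (a, b), c in joint.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in true_sizes.values())
    h_pred = -sum((c / n) * log(c / n) for c in pred_sizes.values())
    if h_true + h_pred == 0:
        return 1.0  # both partitions place every node in a single community
    return 2 * mutual / (h_true + h_pred)


def ari(true_labels, pred_labels):
    """Adjusted Rand index computed from pair counts."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: the two partitions are trivially identical
    return (sum_joint - expected) / (maximum - expected)


truth = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
one_moved = [0, 0, 1, 1, 1, 1]   # one node assigned to the wrong community

print(nmi(truth, relabeled), ari(truth, relabeled))
print(nmi(truth, one_moved), ari(truth, one_moved))
```

Both indexes equal 1 only when the detected partition matches the ground truth up to a renaming of the labels, which is why a result of NMI = 1 and ARI = 1 indicates an exact recovery of the real communities. Moving a single node already lowers both scores.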

C. Real Network
To test the effectiveness of the algorithm on real data sets, this paper also compares the algorithms on several representative real networks of different sizes, all of which have known ground-truth community structure. Because the real community partitions are known, the NMI and ARI indexes are again used to evaluate the community detection quality of each algorithm. TABLE VII describes the basic information of these real networks. The real data sets are available at the following websites: http://snap.stanford.edu/data/, http://networkrepository.com/networks.php, and https://networkdata.ics.uci.edu/index.php. First, we provide a brief description of these networks.
Football network: This network originated from American college football games. It consists of 115 nodes and 613 edges. The nodes represent football teams, and the edges represent regular games between the two connected teams. The network consists of 12 leagues (i.e., conferences), each of which contains approximately 8-12 teams.
Zachary's karate network: This is a well-known network of a karate club in the United States that represents the friendships among the 34 members of the club. The network is divided into two communities because of a disagreement between the leaders of the club.
Dolphin network: This network was constructed by D. Lusseau et al. after seven years of observation of a group of bottlenose dolphins living in Doubtful Sound, New Zealand. The nodes represent individual dolphins, and the edges represent frequent contact between two dolphins. The network has 62 nodes and 159 edges, and it is divided into two communities.
Political book network: This is a network of books about American politics sold by the online bookseller Amazon. Its nodes represent the books, and its edges indicate that the two connected books were frequently purchased by the same readers. The nodes are divided into three types, l, n and c, which represent "liberal", "neutral" (centrist) and "conservative", respectively. Therefore, the network is divided into three communities. The network consists of 105 nodes and 441 edges.
Amazon network: This network is based on the online sales data of products on the Amazon website. The nodes represent products, and the edges connect products that are frequently purchased together. Each product belongs to a category, which is regarded as a real community. The network consists of 334863 nodes and 925872 edges.
Generally, the CDFSE algorithm's performance on the football network is not outstanding, but it performs very well on the karate club network, political book network and dolphin network. The algorithm also achieved good community detection quality.
On the American college football network, most algorithms, such as Infomap, MCL, Ncut, EDCD, FluidC, CDFSE, WT and Louvain, obtained outstanding clustering results because of the high average degree of the network, as shown in Fig. 7. The Infomap and MCL methods achieved high NMI and ARI values, and both detected 12 high-quality communities. Compared with these two algorithms, the CDFSE algorithm is slightly inferior (NMI=0.90, ARI=0.81) and detected 10 high-quality communities, as shown in Fig. 11. The performance of the FG algorithm on this network is not ideal, and it achieved low index values (NMI=0.76, ARI=0.47). These results are consistent with the experimental results on the synthetic networks.
On the karate club network, the CDFSE algorithm achieved the highest index scores (NMI=1, ARI=1), with the EDCD algorithm taking second place (NMI=0.93, ARI=0.95). The index scores of the other algorithms are no higher than 0.9, as shown in Fig. 8. Since the NMI and ARI indexes of the CDFSE algorithm are both 1, the community structure detected by the algorithm is completely consistent with the real network partition: CDFSE detected exactly the two real communities, as shown in Fig. 12. CDFSE attained this result because it considers not only the attractiveness of individuals but also the attractiveness of groups. The MCL, ASOCCA, EDCD, LPA, Ncut and FluidC methods also achieved good performance on this network. It is worth noting that MCL and ASOCCA also detected two communities.
In the dolphin network, the CDFSE algorithm once again outperformed the other algorithms, with an NMI index value of 0.89 and an ARI index value of 0.93. Among the other algorithms, the highest NMI index value does not exceed 0.65 and the highest ARI index value does not exceed 0.57, which is evident from the bar chart shown in Fig. 9. CDFSE detected two communities, which is consistent with the real network, as presented in Fig. 13, whereas the number of communities detected by the other algorithms ranges from 4 to 12. The quality of community detection of some algorithms is not high; in particular, the NMI value of the EDCD algorithm is the lowest, which indicates that the community division of most nodes is incorrect.
On the political book network, the CDFSE algorithm again achieves the highest index scores (NMI=0.58, ARI=0.68). With the exception of the MCL algorithm, the index scores of the other algorithms are also good, all of them above 0.50. Therefore, most algorithms achieved good clustering, as shown in Fig. 10.
This paper employs the Amazon data set to assess the performance of the comparison algorithms on large-scale networks. Because of the large number of communities, this paper selects the top 5000 highest-quality actual communities with the largest number of members for comparison. On the Amazon network, the CDFSE, Infomap and WT algorithms achieve the best performance, with NMI index values above 0.95; furthermore, the ARI index value of Infomap is as high as 0.94, which is far higher than that of the other algorithms, as shown in Fig. 15. The CDFSE algorithm detected 20057 communities and the Infomap algorithm detected 17296 communities. However, because of their high time complexity, the Ncut, EDCD, SCD and ASOCCA algorithms cannot be run on the Amazon network.
According to the experimental results, both the Infomap and CDFSE algorithms are appropriate for networks of different sizes. They perform better than the other algorithms on the generated benchmark networks and obtain higher index values. However, CDFSE is better than Infomap on the real networks, where it obtains the best community division.

D. Execution Time Analysis
To evaluate the scalability of the CDFSE algorithm with respect to network size, this paper uses the LFR model to generate benchmark networks of different sizes with a fixed average degree of k = 15. When the number of nodes changes from 2K to 50K, the CDFSE algorithm runs faster than the other comparison algorithms, as shown in Fig. 16. The CDFSE algorithm is advantageous when dealing with small and medium-sized networks because its time complexity is O(k²·n). When the number of nodes changes from 1K to 100M, the CDFSE algorithm is faster than the Ncut, EDCD, WT, FG and MCL algorithms; additionally, because the value of k is relatively small, the CDFSE algorithm also performs well when dealing with large-scale networks. However, the CDFSE algorithm is slower than LPA, Louvain and FluidC, as shown in Fig. 17. In view of the lower community detection quality of these three algorithms and the stability problems of the LPA and FluidC algorithms, the CDFSE algorithm is preferable.
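As a rough illustration of what the stated O(k²·n) bound implies, the following Python sketch evaluates k²·n for the fixed average degree k = 15 and several network sizes. It ignores constant factors and lower-order terms, so the values are proportional estimates rather than measured run times; the function name `estimated_operations` is an illustrative assumption.

```python
def estimated_operations(n, k):
    """Operation count suggested by O(k^2 * n), ignoring constant factors."""
    return k * k * n


k = 15  # fixed average degree used for the generated benchmark networks
for n in (2_000, 10_000, 50_000):
    print(n, estimated_operations(n, k))
```

With k fixed, the estimate grows linearly in n: it rises from 450000 at n = 2K to 11250000 at n = 50K, a factor of 25 for a 25-fold increase in nodes. The quadratic dependence on k matters only for dense networks, since doubling k quadruples the estimate.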

FIGURE 16. Run time comparison of all algorithms when the number of nodes ranges from 2K to 50K.

FIGURE 17. Run time comparison of all algorithms when the number of nodes ranges from 1K to 100M.

IV. CONCLUSION
Community detection has long been a topic of interest. Inspired by fish schools found in nature, this paper proposes a community detection model based on the fish school effect and presents a community detection algorithm, named CDFSE, built on this model. The central idea behind CDFSE is to regard the network as a dynamic system and to study how it evolves over time. Individuals in the network form subgroups because of their shared characteristics, and the subgroups in turn attract other individuals to form larger groups. As time goes on, all individuals are eventually attracted to different groups, forming a stable community structure. We compare CDFSE with several other representative community detection algorithms on the Football, Karate, Dolphin, Polbooks and Amazon networks. The experimental results show that the CDFSE algorithm performs well on different networks and, in most cases, achieves better performance than the comparison algorithms. Its main advantages are as follows: first, the algorithm has higher efficiency and better community detection quality than the algorithms mentioned above; second, the algorithm requires no parameters to be set, which makes it simpler and more convenient; third, the time complexity of the algorithm is low, so it can be applied to large-scale networks; finally, the algorithm is derived from nature and follows natural laws, which brings it closer to real networks and makes it better suited to community detection. A limitation of this paper is that only time complexity was analyzed when comparing CDFSE with the classical methods on the actual data sets; space complexity remains to be explored in future work.