Detecting and Evolving Microblog Community Based on Structure and Gravity Cohesion

Microblogs are open, real-time online social network platforms on which people post about their moods, experiences, and interests. Grouping microblog users who have similar interests and hobbies into the same community is valuable. In this paper, we propose novel approaches for detecting and evolving dynamic microblog communities. First, inspired by the law of universal gravitation, we redefine the gravitation relationships among microblog users. Based on the structure of the microblog social network, we define the base nodes and their gravity tendency and propose the microblog community detection algorithm. Second, we determine the community changes in the microblog social networks at times $t$ and $t+1$ and propose a microblog community evolution algorithm. Third, we define the mutual transformation probability between communities at times $t$ and $t+1$ and propose the microblog community evolution behavior algorithm. The experiments compare and evaluate the microblog community detection, evolution, and behavior extraction algorithms and identify the optimal ranges of the parameters involved in these algorithms. The experimental results indicate that our proposed algorithms perform well compared with other benchmark methods.


I. INTRODUCTION
Social networks are among the most popular applications of modern times. Famous social networking websites include Facebook, Twitter, MySpace, Microblog, WeChat, Baidu tieba, Zhihu, etc. A microblog is an open and real-time online social networking platform that is built on users and their relationships and allows people to post about daily events, their own moods and so on. The most popular microblog service platforms are Twitter and Sina microblog. Twitter, the world's most popular online social network platform, allows registered users to post ''tweets'' of no more than 140 characters. The number of active Twitter users has grown rapidly in the past year, jumping 11% in the first quarter of 2019. The most popular online social networking platform in China is Sina microblog, whose daily active users reached 165 million, up 25% from a year earlier. Microblogging has thus become an indispensable part of people's lives: it is a daily topic of discussion, a way to access information such as current political commentary, and a way for people to make friends.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhe Xiao.
In the era of big data, microblog users can generate an enormous amount of data every day. These data include microblog posts, thumbs up, retweets, and comments. Microblog features include originality, innovative interaction, immediacy and viral transmission [1].
Microblogs allow people to post about daily events, their moods and more. Among the large number of microblog users, some have similar interests and hobbies. Gathering these users into the same community is therefore valuable for applications such as accurate advertising and friend recommendation. A community is a group of nodes, and community detection is the detection of such a group with a special relationship. Prior approaches to community detection analyze the community structure. However, a community is not static over time, so the study of community evolution is also indispensable. Existing community evolution methods mainly study explicit evolution behaviors, such as ''split'', ''merge'' and ''shrink''. For the microblog social network, we can not only analyze the community from its structure but also extract user interest features from the microblog content. Moreover, microblog communities also exhibit implicit evolution behavior, which some existing community evolution methods do not analyze. To address these shortcomings of existing community detection and community evolution methods, we make the following main contributions:
• We redefine the microblog social network and the gravity relationship in it. By using the random walk method combined with game theory, we mine the gravity tendency of microblog users.
• We propose the microblog community detection algorithm to find the community. We determine the community changes in microblog social networks at times t and t + 1 and propose a microblog community evolution algorithm.
• We define the interconversion probability between communities at times t and t + 1 and propose the community evolution behavior extraction algorithm for microblog social networks.
• The experimental results show that our proposed community detection and evolution algorithms outperform the CNM (Clauset-Newman-Moore) algorithm, COPA (community overlap propagation algorithm), Infomap (information map), and NRW (nonhomogeneous random walk) in average gravity, modularity, and F-score.
The outline of the paper is as follows. Section II presents the related works. In Section III, we discuss the detection of the microblog dynamic community based on structure and gravitational cohesion. Section IV discusses evolution behavior extraction. The experiment and result analysis are presented in Section V. We provide conclusions and future works in Section VI.

II. RELATED WORKS
A microblog community is a common structure in a network; it indicates that its users are similar in one or more aspects. As an important structure in social networks, the community has attracted extensive attention.

A. COMMUNITY DETECTION
Kernighan and Lin [2] first proposed the graph partitioning method in 1970, when the concept of community was still unknown. Community detection has developed rapidly since the community concept was proposed [3]. In the following decades, researchers proposed a number of classic community detection (CCD) algorithms [4].

1) MODULARITY-DEGREE-BASED CCD ALGORITHMS
The main idea of these algorithms is to find communities by optimizing the modularity Q. We classify the algorithms into three classes: (1) agglomerative (bottom-up) hierarchical clustering approaches, which repeatedly merge nodes or small groups: CNM [5], FN (Fast-Newman) [6], and Msg-mv [7] are classic algorithms; (2) divisive (top-down) hierarchical clustering approaches, which split the social network into different communities: the GN (Girvan-Newman) algorithm [3] is a typical representative; and (3) direct optimization approaches: the EO (Extremal Optimization) algorithm [8] is representative. Cao et al. [9] found vulnerable and fuzzy communities in undirected and unweighted networks by maximizing a weighted modularity. Newman [10] proved the exact equivalence between modularity optimization and the maximum likelihood method.
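As a concrete reference for the quantity these methods optimize, the following minimal Python sketch computes Newman's modularity Q for an undirected, unweighted graph and a given partition, using the per-community form Q = Σ_c [L_c/m − (d_c/2m)²]. The function name `modularity` and the edge-list input format are illustrative assumptions and are not taken from any of the cited algorithms.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of L_c / m - (d_c / 2m)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two disconnected triangles, placing each triangle in its own community gives Q = 0.5, whereas placing all six nodes in one community gives Q = 0.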

2) SPECTRAL-ANALYSIS-BASED CCD ALGORITHMS
In this type of algorithm, the components of the eigenvectors of a matrix associated with the network are used as the spatial coordinates of the nodes. The network nodes are thereby mapped to a multidimensional vector space and then grouped into communities by a traditional clustering algorithm. Chin et al. [11] proposed and analyzed a simple and robust spectral algorithm for the stochastic block model, aimed mainly at sparse graphs. Bordenave et al. [12] confirmed the ''spectral redemption conjecture'': community detection can be carried out on the basis of the leading eigenvectors above the feasibility threshold. Feng et al. [13] proposed a local spectral method to find small community structures in large social networks.

3) INFORMATION-THEORY-BASED CCD ALGORITHMS
Rosvall and Bergstrom [14] used random walks together with a simulated annealing optimization algorithm to detect communities effectively. Lancichinetti and Fortunato [15] showed through comparative tests that this method was the most accurate of the nonoverlapping community discovery algorithms available at the time. Banks et al. [16] presented upper and lower bounds on the information-theoretic threshold for community detection.

4) LABEL-PROPAGATION-BASED CCD ALGORITHMS
Raghavan et al. [17] proposed the label propagation algorithm (LPA). Each node repeatedly updates its label to the label that occurs most frequently among its neighbors, and nodes with the same label are grouped into a community. Based on random walks, Chang et al. [18] proposed the RWLPA algorithm, which detects communities by using the location probability distribution of random walkers. Hu et al. [19] observed that nodes playing different roles in a community affect the spread of labels differently and proposed a role-based label propagation algorithm. Fang et al. [20] proposed structural balance to measure the balance of edges in the local network and a new label propagation algorithm for signed networks based on structural balance.
From the community detection perspective, the above four classes of methods consider only the community structure of the microblog network to extract user or user-group features. In contrast, our research utilizes a large amount of microblog data to extract the interest characteristics of microblog users, which enables us to study both microblog content and community structure. We propose a microblog community discovery algorithm that combines user features from microblog content and from community structure in the gravity relationship. Our proposed community detection algorithm is similar to the spectral-analysis-based, information-theory-based, and label-propagation-based CCD algorithms in that it uses a bottom-up strategy to find the center point through the gravity relationship. However, we use the random walk method combined with game theory to find the gravity tendency relationships among microblog users. The modularity-degree-based CCD algorithm adopts the top-down strategy.

B. COMMUNITY EVOLUTION
Community detection is used to find the regularities of communities in a static network. In fact, social networks evolve continuously over time [21], [22]. Recently, a large number of approaches for evolving communities have been proposed.

1) DYNAMIC CLUSTERING APPROACHES
Evolutionary clustering (EC) approaches for communities take the timestamp as the clustering sample unit and calculate the node distribution at times t and t + 1. Chi et al. [23] proposed the preserving cluster quality (PCQ) and preserving cluster membership (PCM) algorithms based on EC. Lin et al. [24] proposed the FacetNet approach based on EC, which establishes a cost function for each timestamp based on the community distribution at time t. Kim and Han [25] addressed the disadvantages that FacetNet requires the number of communities to be preset and does not allow that number to change. Hong et al. [26] divided the community at every moment based on the ideas of timetables and overlapping communities.

2) OBJECTIVE FUNCTION OPTIMIZATION APPROACHES
These approaches estimate the structural changes in evolving communities by using community density functions or modularity optimization. Blondel et al. [27] applied the FN algorithm [4] to detect dynamic communities, combining local optimization and hierarchical clustering. To detect dynamic communities, they designed strategies for four dynamic events (node+, node−, link+, and link−) to improve the community density function and modularity. Zhou et al. [28] proposed multiobjective optimization approaches for dynamic community detection, which modify the new or changed nodes in the community during network updating. To overcome the problems of modularity optimization, such as the difficulty of finding small communities and the high sensitivity to the order of network increments, Guo et al. [29] proposed a new dynamic community discovery algorithm based on distance dynamics.

3) REPRESENTATIVE NODE/COMMUNITY DETECTION APPROACHES
Duan et al. [30] used the transfer probability matrix to establish a correlation measure between nodes and proposed a compact model to assess the closeness of the regional community. Chen et al. [31] proposed an evolution tracking method based on representative communities to evaluate the detected communities and find stable communities; when the network structure changes, it tracks only the stable groups. Liu et al. [32] proposed a fast incremental community evolution tracking (FICET) framework to find communities and track evolving communities, in which a core subgraph can quickly capture community evolution events, including formation, decomposition, and division. Hu et al. [33] found the dynamic community by exploring the local view of changing nodes.

4) DYNAMIC PROBABILISTIC MODEL APPROACHES
These approaches assume that the community distribution at each timestamp is a sample of a latent community distribution. Sarkar and Moore [34] modeled the probability of a relation between nodes and then used the nodes at each timestamp, together with the probability of the relationship between nodes and communities, to build a Bayesian model between nodes and the latent communities. Sun et al. [35] constructed a Bayesian model for each node in the timestamp sampling of DMM (Dirichlet mixture model) and in the adjacent timestamp between nodes, which can be found by the LDA (latent Dirichlet allocation) community. Yu et al. [36] proposed the ART model to adjust the community structure adaptively according to the changes in the semantic cohesion of the community.
For community evolution, starting from the communities found at time t, we find the changes in the social network by using the microblog data at time t + 1. According to these communities and the changes in microblog users and their relationships, we propose the microblog community evolution algorithm to find the communities at time t + 1. We define the interconversion probability between the communities at times t and t + 1 and propose the microblog community evolution behavior extraction algorithm to extract the evolution behavior of the communities in the microblog social network. Our proposed algorithm considers both implicit and explicit evolution behaviors and identifies the communities that evolve implicitly and explicitly. In contrast, the above dynamic clustering, objective function optimization, representative node/community detection, and dynamic probabilistic model approaches focus on explicit community evolution: they design different clustering algorithms and objective functions to track community evolution through changes in community structure features. Our proposed algorithm focuses on community evolution caused by changes in community evolution behaviors.

III. DETECTING A MICROBLOG DYNAMIC COMMUNITY BASED ON STRUCTURE AND GRAVITATIONAL COHESION
In microblog social networks, large numbers of microblog users, microblog posts, and the interactions between them are available. Since microblog social networks can change at any time [22], we use multiple static microblog social networks to define dynamic microblog social networks.

Definition 1: The static microblog social network $MSN_t$ (Eq. (1)) represents the status of the microblog social network at time $t$:
$$MSN_t = \langle U_t, E_t, C_t \rangle \tag{1}$$
where $U_t$, $E_t$, and $C_t$ are the set of microblog users, the set of interactions between any two microblog users, and the set of microblog communities at time $t$, respectively. Because microblog social networks are timely and highly interactive, they can change at any time, so we use multiple static microblog social networks to define the dynamic microblog social network. A dynamic microblog social network is a representation of multiple static microblog social networks; namely, a time series is added on the basis of the static microblog social network.
Definition 2: The dynamic microblog social network $MSN$ (Eq. (2)) refers to the social network of microblog users, microblog user relationships and communities as they change over time.
where $MSN_{t\to t+1}$ represents the changes in the microblog social network between times $t$ and $t+1$, namely the changes $U_{t\to t+1}$, $E_{t\to t+1}$ and $C_{t\to t+1}$ in the microblog users, the relationships between microblog users, and the microblog communities, respectively. Newman [3] pointed out that a community is a dense group in the network: the connections among users within each community are relatively dense, whereas those between communities are relatively sparse. We use the gravity relationship between microblog users to measure the degree of connection between them.

A. GRAVITY RELATIONSHIP OF THE MICROBLOG COMMUNITY
In the universe, a large number of celestial bodies have formed stable structures under the action of universal gravitation $f$. Given two objects with masses $m_i$ and $m_j$ separated by a distance $r$, and with $g$ denoting the gravitational constant, the gravitational force $f$ between the two objects is computed by Eq. (3):
$$f = g\,\frac{m_i m_j}{r^2} \tag{3}$$
The dynamic community evolving algorithm (GCEA) [37] reveals gravity relationships by considering only the degree of each node in a dynamic social network. It sets $g$ to 1 and takes $m_i$ and $m_j$ to be the degrees of nodes $i$ and $j$, respectively. The degree of a node is an intrinsic property and reflects the local influence of the node: the greater the degree, the more nodes it connects to, which indicates that the node has some influence. This is similar to the role of mass in the law of universal gravitation. If an edge exists between two nodes, the distance $r_{ij}$ between them is 1. If no edge exists between two nodes, then the distance $r_{ij}$ between them is 0. Yin et al. [37] redefined the gravity between two objects as in Eq. (4).
When an edge exists between nodes $i$ and $j$, $a_{ij} = 1$; otherwise, $a_{ij} = 0$. In microblog social networks, however, the connection strengths of different edges differ, so a uniform value $a_{ij} = 1$ is not suitable for every pair of connected nodes. Inspired by the ideas of Yin et al., we reconsider gravity cohesion relationships and apply them to microblog social networks.
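To make the degree-based gravity concrete, the following minimal Python sketch assumes the form that the description above implies, namely $f_{ij} = a_{ij} \cdot d_i \cdot d_j$ with $g = 1$, masses equal to node degrees, and unit distance on existing edges. This form, the function name `degree_gravity`, and the edge-list input are assumptions made for illustration; the exact expression of Eq. (4) in [37] may differ.

```python
from collections import defaultdict


def degree_gravity(edges):
    """Degree-based gravity on each edge, assuming f_ij = a_ij * d_i * d_j.

    Assumes g = 1, masses equal to node degrees, and distance 1 on every
    existing edge; node pairs without an edge get no entry (a_ij = 0).
    """
    edges = [tuple(e) for e in edges]
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): degree[u] * degree[v] for u, v in edges}
```

In a star with hub `a` and three leaves, every edge receives gravity 3 × 1 = 3, which illustrates why a purely degree-based definition cannot distinguish between edges of different connection strength.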

B. RECONSTRUCTING THE GRAVITY RELATIONSHIP IN THE MICROBLOG SOCIAL NETWORK
We build a vector for each microblog user and use it to represent that user.
To construct the gravity relationship in the microblog social network, we redefine the distance $r_{ij}$ in the gravity law based on two observations about the gravity relationship in complex networks: the greater the similarity between two objects is, the stronger the link between them, and the smaller the similarity is, the weaker the link. Considering microblog user interests, we improve the Tanimoto coefficient to compute the similarity between users by Eqs. (5), (6), and (7).
where $w_{ij}$ denotes the similarity between two microblog users at time $t$, $\vec{u}_i^t$ is the representation of microblog user $u_i$ on the Boolean dimension, and $\vec{u}_i^t \cdot \vec{u}_j^t$ is the number of elements that $\vec{u}_i^t$ and $\vec{u}_j^t$ have in common.
where $count(\vec{u}_i^t \cup \vec{u}_j^t)$ is the total number of interest features in the microblog user vectors $\vec{u}_i^t$ and $\vec{u}_j^t$, and $u_{ik}^t$ is an element of the microblog user vector $\vec{u}_i^t$ represented on the Boolean dimension.
If the element $u_{ik}^t$ of $\vec{u}_i^t$ is 1, then microblog user $u_i$ has the corresponding interest; if it is 0, then microblog user $u_i$ does not have that interest.
Example 1: We illustrate how to calculate the similarity between two microblog users $u_i$ and $u_j$. Let (1, 1, 0, 0, 1, 1). Finally, we calculate the similarity $w_{ij} = 0.0667$ between the two microblog users by Eq. (5).
The greater the similarity between two microblog users is, the shorter the distance between them. We measure the relationship among the similarity $w_{ij}^t$, $a_{ij}^t$ and the distance $r_{ij}^t$ by Eq. (8).
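For orientation, the following minimal Python sketch computes the standard Tanimoto coefficient on Boolean interest vectors, that is, the number of shared interests divided by the number of interests held by at least one of the two users. It is not the improved coefficient of Eqs. (5)-(7), so it is not expected to reproduce the value in Example 1; the function name `tanimoto` and the sample vectors in the usage note are illustrative assumptions.

```python
def tanimoto(u_i, u_j):
    """Standard Tanimoto coefficient of two equal-length Boolean vectors."""
    if len(u_i) != len(u_j):
        raise ValueError("interest vectors must have the same length")
    both = sum(1 for a, b in zip(u_i, u_j) if a and b)    # shared interests
    either = sum(1 for a, b in zip(u_i, u_j) if a or b)   # interests of either user
    return both / either if either else 0.0
```

For the hypothetical vectors (1, 1, 0, 0, 1, 1) and (1, 0, 0, 1, 1, 0), the users share 2 interests out of 5 held by at least one of them, so the standard coefficient is 0.4.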
After obtaining the similarity $w_{ij}^t$ between two microblog users, we combine it with the gravity relationship between two nodes and reconstruct the gravity relationship through the combination of community structure and interest characteristics.
where $f_{ij}^t$ denotes the gravitational relationship between users $u_i$ and $u_j$ at time $t$.
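The qualitative behavior described here, where higher similarity means shorter distance and therefore stronger gravity, can be sketched as follows. The sketch is purely illustrative and is not the paper's Eq. (8) or Eq. (9): it assumes $g = 1$, node degrees as masses, and a hypothetical distance $r_{ij} = 1 / w_{ij}$ on existing edges. The function name `similarity_gravity` and its parameters are likewise assumptions.

```python
def similarity_gravity(deg_i, deg_j, w_ij, a_ij=1):
    """Illustrative gravity between two users (NOT the paper's Eqs. (8)-(9)).

    Assumptions made only for this sketch: masses are node degrees, g = 1,
    and the distance is r_ij = 1 / w_ij, so more similar users are closer.
    """
    if a_ij == 0 or w_ij <= 0:
        return 0.0  # no edge or no shared interest: no attraction
    r_ij = 1.0 / w_ij
    return deg_i * deg_j / (r_ij ** 2)
```

Under these assumptions, two connected users with degrees 3 and 2 attract each other with gravity 6.0 when their similarity is 1.0 but only 1.5 when it is 0.5, so edges between users with similar interests carry larger weights.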

C. CONSTRUCTING A MICROBLOG SOCIAL NETWORK GRAPH
After establishing the gravity relationship between any two microblog users, we can set up a social network graph of microblogs to express the real social network accurately. We use each microblog user as a node of the microblog social network graph. An edge exists between two microblog user nodes in any of the following three cases:
• Microblog user i follows microblog user j;
• Microblog user j follows microblog user i; and
• Microblog users i and j follow each other.
We regard the gravity relationship between two microblog users as the gravity weight of the edge between them. Relative to the microblog social network graph at time t, the changed nodes and edges of the graph at time t + 1 are represented by $U_{t\to t+1}$ and $E_{t\to t+1}$, respectively.
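The edge rule above can be sketched in a few lines of Python. The input format, a list of directed `(follower, followee)` pairs, and the function name `build_edges` are assumptions made for illustration.

```python
def build_edges(follows):
    """Undirected edge set from directed follow pairs (follower, followee).

    An edge {i, j} exists if i follows j, j follows i, or both; a mutual
    follow therefore yields a single edge, and self-follows are ignored.
    """
    return {frozenset((i, j)) for i, j in follows if i != j}
```

Representing each edge as a `frozenset` makes the graph undirected by construction, so a gravity weight can later be stored once per user pair.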

D. GRAVITY TENDENCY
We use $Ne_i^t$ to represent the neighbor set of microblog user $u_i^t$ at time $t$. Fig. 1 shows a social network graph that consists of two communities, $C_1^t$ and $C_2^t$; microblog user nodes $u_4^t$, $u_7^t$ and $u_8^t$ lie on the community boundaries. We assume that user node $u_7^t$ is the initial node of a random walk, so the walking user at $u_7^t$ can choose among three edges, $e_{7\to 4}^t$, $e_{7\to 8}^t$ and $e_{7\to 9}^t$. Because nodes $u_7^t$ and $u_9^t$ belong to the same community $C_2^t$, and microblog users in the same community are more similar (that is, the gravity relationship between them is stronger), the probability that the walk follows $e_{7\to 9}^t$ is greater than the probability that it follows $e_{7\to 4}^t$ or $e_{7\to 8}^t$. While the walking user wanders in community $C_2^t$, it has a very small probability of leaving $C_2^t$ for community $C_1^t$. Conversely, once the walking user has traveled to community $C_1^t$ by edge $e_{7\to 4}^t$ or edge $e_{7\to 8}^t$, it has a very small probability of moving back to community $C_2^t$. Moving from node $u_7^t$ to node $u_1^t$ by edges $e_{7\to 4}^t$, $e_{4\to 5}^t$, $e_{5\to 2}^t$ and $e_{2\to 1}^t$ does not indicate a strong gravity relationship between microblog users $u_7^t$ and $u_1^t$.
Definition 3: Assume that $u_i^t$ is the initial node of the walking user. In $N$ random walks, the walking user returns to node $u_i^t$ $n$ times and returns to node $u_i^t$ through edge $e_{j\to i}^t$ $n'$ times. The returning probability of node $u_i^t$ is $n/N$, and the returning probability of edge $e_{j\to i}^t$ is $n'/N$.
To determine the number of random walk steps, Yin et al. [37] used a probability function. They set up the random walk as follows: $a \in (0, 1)$ is a randomly drawn number, $b = f(x)$, $x$ is the current step number of the walking user, and $f(x)$ is a probability function used to determine whether to move on. At the current node, if $a < b$, then the walking user continues; otherwise, the walk terminates.
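The returning probabilities of Definition 3 and the continuation rule $a < b$ can be estimated by simulation, as in the following minimal Python sketch. Several choices here are assumptions made for illustration: $N$ is interpreted as the number of independent walks, a walk counts as returning if it comes back to the start at least once, the next node is drawn in proportion to the edge weight, and the callable `continue_prob` stands in for $f(x)$ rather than reproducing it. The names `returning_probabilities`, `neighbors`, and `seed` are hypothetical.

```python
import random


def returning_probabilities(neighbors, start, num_walks, continue_prob, seed=0):
    """Estimate the returning probabilities of Definition 3 by simulation.

    neighbors: dict node -> dict of neighbor -> positive edge weight (gravity).
    continue_prob: callable x -> probability of taking step number x; it is a
        stand-in for f(x), whose exact form is not reproduced in this sketch.
    Returns (n / N, {j: n'_j / N}), where n counts walks that come back to
    `start` at least once and n'_j counts walks that do so via edge j -> start.
    """
    rng = random.Random(seed)
    returned = 0
    via_edge = {}
    for _ in range(num_walks):
        current, x = start, 1
        back_edges = set()
        # Continue while a random a in [0, 1) stays below b = f(x).
        while rng.random() < continue_prob(x):
            nbrs = list(neighbors[current])
            if not nbrs:
                break
            weights = [neighbors[current][v] for v in nbrs]
            nxt = rng.choices(nbrs, weights=weights, k=1)[0]
            if nxt == start:
                back_edges.add(current)
            current, x = nxt, x + 1
        if back_edges:
            returned += 1
        for j in back_edges:
            via_edge[j] = via_edge.get(j, 0) + 1
    return returned / num_walks, {j: c / num_walks for j, c in via_edge.items()}
```

On a graph with only two connected nodes, every walk of at least two steps must come back through the single edge, so both estimated probabilities equal 1; on larger graphs, edges with higher gravity are chosen more often and therefore obtain higher returning probabilities.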
The random walk probability function $f(x)$ is given by Eq. (10), where parameters $\delta$ and $\beta$ control the migration distribution and the farthest node distribution of the walking user, respectively. According to the principle of ''six degrees of separation'', the diameter of a community is not greater than 6. Thus, $\beta$ is set to 6. x − 1 ηd ω +1 is a structural gain function based on the logistic function, which takes the degree of the node as the input parameter of the probability function $f(x)$. $d$ is the degree of the node, and $\omega$ and $\eta$ are adjustment parameters of the logistic function. The physical meaning of Eq. (10) is that when the walking user moves to a node with a high degree, the structural gain function assigns the walking user a greater probability of continuing to travel.
In the random walk process, we propose the weighted random walk benefit maximization algorithm to measure the importance of other nodes to the initial node and to extract the gravity tendency of microblog user nodes. The algorithm captures the nodes that are more useful to the initial node: even if the initial node and a neighboring node do not have a strong gravity relationship, the initial node may reach more useful nodes through that neighbor. As shown in Fig. 1, the benefit of $u_5^t$ choosing node $u_8^t$ is lower than that of $u_5^t$ choosing node $u_2^t$. Thus, node $u_5^t$ may choose $u_2^t$ because through it $u_5^t$ can reach nodes $u_1^t$ and $u_3^t$ in the same community with a larger probability. In our proposed algorithm, each time the walking user takes a step, we use the gravity value between the two nodes as the profit of that step and the average normalized negative gravity of all nodes as the loss. The weighted random walk benefit maximization is presented in Eq. (11).
where $S_{u_k^t}$ represents the actual maximum benefit of a neighbor node $u_k^t$ of $u_0^t$, $\beta$ represents the actual number of walking steps, and the loss of each random walk is the average normalized negative gravity of all nodes. Nodes $u_0^t$ and $u_{ter}^t$ represent the initial node and the terminal node of the walking user, respectively. To exclude remote nodes in the microblog social network, we use the term $1 - \frac{Path_{min}(u_0^t, u_{ter}^t)}{\beta}$: when the shortest distance between $u_0^t$ and $u_{ter}^t$ is very large, the probability that the walking user returns to $u_0^t$ is very small. Parameter $\phi$ weights the microblog user's gravity against the social network structure. Because the two are equally important in our proposed approach, the optimal value of $\phi$ is 0.5, which is verified in the experiment section.
We find the gravity tendency of $u_0^t$ after the walking user has performed $N$ random walks from $u_0^t$. First, we take the node $u_k^t$ with the maximum average benefit as the first gravity tendency of the initial node $u_0^t$ by Eq. (12).
where $R(u_0^t)$ denotes the set of gravity tendency nodes of the initial node $u_0^t$, and $S_{u_k^t}$ is the average benefit gained over the $n$ ($n \le N$) random walks that pass through node $u_k^t$. Thus, we use the node with the maximum average benefit as the first gravity tendency of $u_0^t$. If more than one node attains the maximum average benefit, then all of these nodes are gravity tendencies of the initial node $u_0^t$. If the average benefit of a node is close to the maximum average benefit, then this node is also a gravity tendency of the initial node $u_0^t$. The gravity tendency node set of $u_0^t$ is updated by Eq. (13).
where $\varepsilon$ is a threshold. That is, the nodes whose average benefits are similar to that of node $u_k^t$ are also gravity tendencies of $u_0^t$. Algorithm 1 shows the whole process of the weighted random walk benefit maximization algorithm. Algorithm 2 shows the process of the random walk algorithm.
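The selection rule described by Eqs. (12) and (13) can be sketched as follows: take the node with the maximum average benefit, then also keep every node whose average benefit is within the threshold of that maximum. Measuring closeness as an absolute difference, as well as the names `gravity_tendency`, `avg_benefit`, and `epsilon`, are assumptions of this sketch; the paper's Eq. (13) may define closeness differently.

```python
def gravity_tendency(avg_benefit, epsilon):
    """Select gravity-tendency nodes from average walk benefits.

    avg_benefit: dict neighbor -> average benefit over the walks through it.
    epsilon: threshold; nodes whose benefit is within epsilon of the maximum
        are kept as well (absolute difference is an assumption of this sketch).
    """
    if not avg_benefit:
        return set()
    best = max(avg_benefit.values())
    return {node for node, s in avg_benefit.items() if best - s <= epsilon}
```

With hypothetical benefits of 0.9, 0.85, and 0.4 for three neighbors and a threshold of 0.1, the first two neighbors form the gravity tendency set; with a threshold of 0, only the best neighbor (or all tied best neighbors) is kept.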
In the algorithms, ← and ⇐ represent assignment and accumulation (adding a value to a variable), respectively.
In Algorithm 1, steps 10-12 perform $n$ random walks from the initial node $u_i^t$. Step 13 finds all neighbor nodes of node $u_i^t$. Step 14 calculates the average score $S_{u_k^t}$ of the neighboring nodes of $u_i^t$. Step 15 finds the node $u_j^t$ with the highest gravity tendency by Eq. (12). Steps 16-20 determine further gravity tendency nodes by Eq. (13): if there is a node $u_j^t$ whose average benefit is very close to the maximum average benefit, then we put $u_j^t$ into the gravity tendency set of $u_0^t$. Step 21 stores the gravity tendency nodes of $u_0^t$ in $FRs^t$.
In Algorithm 2, steps 9-13 perform the random walk. Step 9 uses Eq. (10) to decide whether to take the next step. If the walk continues, then step 10 takes one step. Step 11 obtains the neighbor nodes of the first walk. Step 12 calculates the current walk score by Eq. (11). Step 14 calculates the shortest distance between the initial node $u_0^t$ and the destination node $u_{ter}^t$. Step 15 calculates the final score of the walk. Step 14 saves gravity tendency.

E. COMMUNITY DETECTION
In physics, gravitational attraction exists between two planets, and a planet with a large mass is likely to attract a smaller planet. In a social network, a node with a large degree has edges to many other nodes, so such nodes are very important for the social network. Finding these important nodes is a key part of our proposed microblog community detection algorithm, and we identify them as the base nodes. ... becomes the base node.
After obtaining the base nodes in the microblog social network, we detect the initial communities by using the gravity tendency. We start with a base node as an initial community. If several base nodes connect with each other, then we consider them a base node chain of one initial community. Then, we incorporate the remaining nodes into the corresponding initial community according to their gravity tendency. The gravity tendency directions of two nodes fall into three cases:
• Nonbase node $u_j^t$ and base node $u_i^t$ tend to each other. We add node $u_j^t$ to the community in which the base node $u_i^t$ is located.
• Nonbase node $u_j^t$ does not tend to base node $u_i^t$, but base node $u_i^t$ tends to nonbase node $u_j^t$. We add $u_j^t$ to the community in which $u_i^t$ is located: because $u_i^t$ tends to $u_j^t$, the nonbase node $u_j^t$ has great significance for the base node $u_i^t$.
• Nonbase node $u_j^t$ tends to base node $u_i^t$, but base node $u_i^t$ does not tend to nonbase node $u_j^t$. This case is handled as described below.
For case (3) above, we use the gravity relationship between the community and the user node to allocate the remaining nodes to communities. If node $u_i^t$ is attracted by only one community $C_k^t$, then $u_i^t$ is put into community $C_k^t$. If node $u_i^t$ is attracted by multiple communities, then $u_i^t$ is placed in the community with the largest attraction to $u_i^t$.
We measure the gravity tendency relationship between node $u_i^t$ and community $C_k^t$ by calculating the average gravity between $u_i^t$ and each node in community $C_k^t$ by Eq. (14).
where $f(u_i^t, C_k^t)$ represents the gravity relationship between node $u_i^t$ and community $C_k^t$. By traversing the remaining nodes, we obtain their communities. Microblog community detection is shown in Algorithm 3. In Algorithm 3, step 5 finds the base node set Bnodes. Steps 6-12 find communities for the base nodes in Bnodes. If two or more base nodes connect with each other, then steps 7-8 build a new community $C_k^t$ for them. If a base node $u_i^t$ does not connect to other base nodes, steps 7-8 create a community $C_k^t$ for $u_i^t$. Steps 13-15 process the nodes that are not included in Bnodes.

Algorithm 3 Microblog Community Detection
Step 14 uses Eq. (14) to calculate $f(u_i^t, C_k^t)$ for each of these nodes and assigns the node to the community with the largest gravity.
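The assignment of a remaining node to the community with the largest attraction can be sketched as follows. The sketch assumes that the attraction of a community is the gravity between the node and each community member averaged over all members, with missing edges counting as zero; this reading of Eq. (14), the pairwise `gravity` dictionary keyed by `frozenset`, and the function names are assumptions made for illustration.

```python
def community_attraction(node, members, gravity):
    """Average gravity between `node` and the members of one community.

    gravity: dict frozenset({i, j}) -> gravity of edge {i, j}; pairs without
    an edge are treated as having gravity 0 (an assumption of this sketch).
    """
    members = [m for m in members if m != node]
    if not members:
        return 0.0
    total = sum(gravity.get(frozenset((node, m)), 0.0) for m in members)
    return total / len(members)


def assign_to_community(node, communities, gravity):
    """Return the label of the community with the largest attraction to `node`."""
    return max(communities,
               key=lambda c: community_attraction(node, communities[c], gravity))
```

For example, if a node `x` has gravity 4.0 and 2.0 to the two members of community `C1` and gravity 5.0 to only one of the two members of `C2`, the averages are 3.0 and 2.5, so `x` is assigned to `C1` even though its single strongest edge leads into `C2`.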

F. COMMUNITY EVOLUTION
On the basis of detecting the community, we develop the community C t→t+1 evolution algorithm to determine the structural changes U t→t+1 and E t→t+1 of MSN t →MSN t+1 . Two microblog social networks at adjacent times t and t + 1 undergo some changes, such as user nodes increasing (nodes + ), user node reducing (nodes − ), edge increasing (edges + ) and edge reducing (edges − ). We analyze the community evolution by using these changes.
To analyze the social network community at time t + 1, we find MSN t→t+1 =< U t→t+1 , E t→t+1 > by combining MSN t with MSN t+1 . Then, we update the gravity relationship of the changed nodes U t→t+1 . Finally, we analyze these four cases nodes + , nodes − , edges + , and edges − of community evolution.
• Microblog user nodes increasing ($nodes^{+}$). We first judge whether a new microblog user node $u_i^{t+1}$ affects the base nodes of $MSN^t$. If the base nodes do not change, then we place node $u_i^{t+1}$ into the community $C_j^{t+1}$ for which the gravity relationship $f(u_i^{t+1}, C_j^{t+1})$ is the largest. If the base nodes change, then we find the community of the new microblog user node by algorithm 3.
• Microblog user nodes reducing ($nodes^{-}$). If the deleted node $u_i^{t+1}$ is a base node, then we reassign the affected nodes to the community $C_j^{t+1}$ that has the largest average gravity for these affected nodes. If the deleted node is not a base node, then the affected nodes are redistributed directly to the community for which their average gravity is the largest.
• Microblog user edges increasing ($edges^{+}$). We judge whether the new edge causes changes in the base nodes. If the base nodes do not change, then the nodes affected by the new edge are put into the community for which their average gravity is the largest. Otherwise, we use algorithm 3 to find the communities of the affected nodes.
• Microblog user edges reducing ($edges^{-}$). We judge whether the removed edge causes changes in the base nodes. If the base nodes do not change, then the nodes affected by the removed edge are placed in the community for which their average gravity is the largest. Otherwise, we use algorithm 3 to find the communities of the affected nodes.
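The four kinds of change between two adjacent snapshots can be obtained with simple set differences. The following Python sketch is only an illustration under the assumption that edges are undirected; the function name `snapshot_changes` and the toy snapshots are hypothetical.

```python
def snapshot_changes(nodes_t, edges_t, nodes_t1, edges_t1):
    """Compare two network snapshots and collect the four kinds of change.

    Assumption: edges are undirected, so each edge is stored as a frozenset.
    """
    def normalise(edges):
        return {frozenset(edge) for edge in edges}

    e_t, e_t1 = normalise(edges_t), normalise(edges_t1)
    changes = {
        "nodes+": set(nodes_t1) - set(nodes_t),  # users that appear at t+1
        "nodes-": set(nodes_t) - set(nodes_t1),  # users that disappear at t+1
        "edges+": e_t1 - e_t,                    # relationships that appear
        "edges-": e_t - e_t1,                    # relationships that disappear
    }
    # Changed node set: added/removed nodes plus endpoints of changed edges.
    changed = changes["nodes+"] | changes["nodes-"]
    for edge in changes["edges+"] | changes["edges-"]:
        changed |= edge
    changes["U"] = changed
    return changes


# Toy snapshots, for illustration only.
result = snapshot_changes(
    nodes_t={1, 2, 3}, edges_t=[(1, 2), (2, 3)],
    nodes_t1={2, 3, 4}, edges_t1=[(2, 3), (3, 4)],
)
print(result["nodes+"], result["nodes-"], result["U"])
```

The set stored under `"U"` plays the role of the changed nodes $U^{t \to t+1}$ whose gravity relationships have to be updated before communities are reassigned.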
In algorithm 4, $U^{t \to t+1}$ represents the nodes that change between $MSN^t$ and $MSN^{t+1}$.
Step 7 updates the gravity relationship of the nodes in $U^{t \to t+1}$. Steps 8-14 analyze community evolution in the case of node reduction ($nodes^{-}$). If base nodes are removed, then steps 9-10 update Bnodes and assign the nodes in $U^{t \to t+1}$ to the communities with the largest gravity. If the base nodes do not change, then steps 11-12 assign the nodes in $U^{t \to t+1}$ to the community with the largest gravity. Steps 16-24 process community evolution in the case of $nodes^{+}$, $edges^{+}$ or $edges^{-}$. Step 17 updates the base node set Bnodes. If Bnodes changes, then steps 18-19 find the communities for the nodes in $U^{t \to t+1}$ by calling algorithm 3. If Bnodes does not change, then steps 20-21 assign the nodes in $U^{t \to t+1}$ to the community with the largest gravity.

IV. EVOLUTION BEHAVIOR EXTRACTION
Extracting the evolution behaviors from community $C_i^t$ to $C_i^{t+1}$ helps us analyze the evolving trends of communities. Previous research [33], [37] defined seven explicit evolution behaviors of communities.
• Continuing. Two communities $C_i^t$ and $C_i^{t+1}$ at adjacent times are highly similar.
• Shrinking. When some nodes leave a community $C_i^t$, the size of $C_i^{t+1}$ is smaller than that of the previous community $C_i^t$.
• Growing. When some nodes join a community $C_i^{t+1}$, the size of $C_i^{t+1}$ becomes larger than that of $C_i^t$, and the community differences at times $t$ and $t+1$ are not very large.
• Splitting. A community $C_i^t$ at time $t$ is split into two or more communities at time $t+1$. The nodes of community $C_i^{t+1}$ at time $t+1$ come from community $C_i^t$ at time $t$.
• Merging. Two or more communities at time $t$ combine to form a new community $C_i^{t+1}$ at time $t+1$.
• Dissolving. A community $C_i^t$ at time $t$ vanishes and does not appear at time $t+1$.
• Forming. At time $t+1$, $C_i^{t+1}$ is a novel community that does not exist at time $t$.
Based on these explicit evolution behaviors, we propose the implicit evolution behaviors of social network communities. In an explicit evolution behavior such as ''continuing'', although the nodes and edges of the communities at times $t$ and $t+1$ do not change, the gravity relationship between the nodes in the communities may change. We mainly analyze the implicit evolution behaviors of ''continuing'', ''narrowing'', and ''growing''. We do not discuss the other four kinds of implicit evolution behaviors because the gravity relationship of the corresponding communities also changes greatly. Therefore, we present Eq. (15) to express the gravity changes between $C_i^t$ and $C_j^{t+1}$,
where $f(C_i^t)$ represents the gravity in community $C_i^t$, and
where $f(C_i^t, C_j^{t+1})$ represents the average gravity change from $C_i^t$ to $C_j^{t+1}$. $u_m^t$ and $u_n^t$ are nodes in $C_i^t$, and $u_x^{t+1}$ and $u_y^{t+1}$ are nodes in $C_j^{t+1}$. $a_{mn}^t$ and $a_{xy}^{t+1}$ express the total number of edges of communities $C_i^t$ and $C_j^{t+1}$ at times $t$ and $t+1$, respectively.
For convenience of discussion, we use $(x^t, x^{t+1})$ to denote the change function of $x$ from $t$ to $t+1$.
Definition 5: Social network community evolution can be described as community transformation from time $t$ to $t+1$. If the corresponding change function is greater than 0, then we consider the community to be evolving.

A. COMMUNITY RELATIONSHIP GRAPH
To extract the evolutionary behavior of social network communities, we define a community relationship graph at two adjacent times $t$ and $t+1$. Social network changes are represented through the relationship between communities and users. Fig. 2 indicates the changes from time $t$ to $t+1$ and the relationship between the communities of the adjacent times $t$ and $t+1$. The left part and the right part show the social network communities at times $t$ and $t+1$, respectively. The middle part represents the nodes that move from communities at time $t$ to communities at time $t+1$. If two communities share the same microblog user node, then an edge exists between the two communities. In Fig. 3, an edge connects community $C_1^t$ and community $C_1^{t+1}$. In the same way, two edges connect community $C_2^t$ with $C_2^{t+1}$ and $C_3^{t+1}$.
The weights on these edges represent the mutual transformation probability between two communities at times $t$ and $t+1$. We define $(P(C_1^t \to C_1^{t+1}), P(C_1^{t+1} \to C_1^t))$ as the pair of transforming probabilities from $C_1^t$ to $C_1^{t+1}$ and from $C_1^{t+1}$ to $C_1^t$. At time $t$, microblog user nodes $u_1^t$, $u_2^t$, and $u_3^t$ belong to community $C_1^t$, whereas at time $t+1$, only the microblog user nodes $u_2^t$ and $u_3^t$ belong to $C_1^{t+1}$. Since two of the three nodes of $C_1^t$ remain in $C_1^{t+1}$, the transforming probability $P(C_1^t \to C_1^{t+1})$ is 2/3. The microblog user nodes $u_2^t$ and $u_3^t$ in $C_1^{t+1}$ are both in $C_1^t$, so the transforming probability $P(C_1^{t+1} \to C_1^t)$ is 1.
Table 1 shows the types, conditions, reasons, and possible factors for different community evolution behaviors. If communities $C_1^t$ and $C_1^{t+1}$ satisfy the corresponding conditions, then the corresponding evolution behavior occurs. We present the reason numbers and concrete reasons in Table 2. $\kappa$ and $\lambda$ are the thresholds for the mutual transforming probability and the gravity change, respectively. We redefine the community evolution behaviors in social networks as follows:

B. COMMUNITY EVOLUTION BEHAVIOR EXTRACTION
• (1) Explicit continuing. This evolution behavior satisfies $\min\{P(C_i^t \to C_j^{t+1}), P(C_j^{t+1} \to C_i^t)\} \ge \kappa$. The possible reason for ''explicit continuing'' is that the microblog user nodes and their associated edges in community $C_i^t$ remain consistent with those in community $C_j^{t+1}$.
• (2) Implicit continuing. On top of the explicit continuing evolution behavior, the gravity of the microblog user nodes has changed considerably. That is, the mutual transforming probability between the two adjacent communities $C_i^t$ and $C_j^{t+1}$ is greater than $\kappa$ and, at the same time, their gravity change $f(C_i^t, C_j^{t+1})$ is greater than $\lambda$.
• (3) Explicit narrowing. In this case, a community has lost some microblog user nodes. The transforming probability $P(C_j^{t+1} \to C_i^t)$, which is at most 1, is greater than the transforming probability $P(C_i^t \to C_j^{t+1})$. The reasons for this evolution behavior are ''deleting microblog user nodes ($nodes^{-}$) or their associated edges ($edges^{-}$)''.
• (4) Implicit narrowing. On top of the explicit narrowing evolution behavior, the gravity change of the microblog user nodes in the community is more than $\lambda$. The transforming probability $P(C_j^{t+1} \to C_i^t)$ is greater than the transforming probability $P(C_i^t \to C_j^{t+1})$ and, at the same time, the gravity change of the microblog user nodes between communities $C_i^t$ and $C_j^{t+1}$ is greater than $\lambda$. The reasons are ''deleting microblog user nodes ($nodes^{-}$) or their associated edges ($edges^{-}$)''.
• (5) Explicit growing. Some new microblog user nodes join a community. The transforming probability $P(C_i^t \to C_j^{t+1})$ is greater than the transforming probability $P(C_j^{t+1} \to C_i^t)$. The reasons for the explicit growing evolution behavior are ''increasing microblog user nodes ($nodes^{+}$) or their associated edges ($edges^{+}$)''.
• (6) Implicit growing. On top of the explicit growing evolution behavior, the gravity change of the microblog user nodes in the community is more than $\lambda$. The transforming probability $P(C_i^t \to C_j^{t+1})$, which is at most 1, is greater than the transforming probability $P(C_j^{t+1} \to C_i^t)$, and the gravity change of the microblog user nodes from $C_i^t$ to $C_j^{t+1}$ is greater than $\lambda$. The reasons for the implicit growing evolution behavior are ''increasing microblog user nodes ($nodes^{+}$) or their associated edges ($edges^{+}$)''.
• (7) Explicit/implicit splitting. In this case, a community at time $t$ is split into several communities at time $t+1$, and the microblog user nodes in these communities at time $t+1$ come from the previous community at time $t$. The transforming probability $P(C_j^{t+1} \to C_i^t)$ is greater than $\kappa$, and the transforming probability that $C_i^t$ transforms into the communities $C_{j_1}^{t+1}, C_{j_2}^{t+1}, \cdots$ is more than $\kappa$. The reasons for this evolution behavior are ''increasing or reducing the microblog user nodes ($nodes^{+}$/$nodes^{-}$) or their associated edges ($edges^{+}$/$edges^{-}$)''.
• (8) Explicit/implicit merging. Several communities $C_{i_1}^t, C_{i_2}^t, \cdots$ merge into one community $C_j^{t+1}$, and the microblog user nodes of $C_j^{t+1}$ come from these communities $C_{i_1}^t, C_{i_2}^t, \cdots$. The transforming probability that $C_{i_1}^t, C_{i_2}^t, \cdots$ transform into $C_j^{t+1}$ is greater than $\kappa$, and the transforming probability that $C_j^{t+1}$ transforms into $C_{i_1}^t, C_{i_2}^t, \cdots$ is greater than $\kappa$. The reasons for this evolution behavior are ''increasing or reducing microblog user nodes ($nodes^{+}$/$nodes^{-}$) or their associated edges ($edges^{+}$/$edges^{-}$)''.
• (9) Explicit/implicit dissolving. In this case, a community $C_i^t$ disappears at time $t+1$. The evolution behavior satisfies $\max\{P(C_i^t \to C_j^{t+1}), P(C_j^{t+1} \to C_i^t)\} < \kappa$, that is, the mutual transforming probabilities between $C_i^t$ and $C_j^{t+1}$ are less than $\kappa$. The reasons for this evolution behavior are ''reducing microblog user nodes ($nodes^{-}$) and their associated edges ($edges^{-}$)''.
• (10) Explicit/implicit forming. Some new communities $C_{j_1}^{t+1}, C_{j_2}^{t+1}, \cdots$ appear at time $t+1$. Their microblog user nodes do not come from any community at time $t$. The evolution behavior satisfies $\max\{P(C_i^t \to C_j^{t+1}), P(C_j^{t+1} \to C_i^t)\} < \kappa$ at time $t+1$, that is, the mutual transforming probabilities between $C_i^t$ and $C_j^{t+1}$ are less than $\kappa$. The reasons for the explicit/implicit forming evolution behavior are ''increasing microblog user nodes ($nodes^{+}$) and their associated edges ($edges^{+}$)''.
Considering the above 10 cases of community evolution behaviors, we propose the community evolution behavior extraction algorithm 5. $C^{t,t+1}$ indicates the communities belonging to $U^{t \to t+1}$. Step 7 establishes the community relationship graph for these communities. Steps 8-11 judge whether the communities belonging to $C^{t \to t+1}$ satisfy the evolution conditions in Table 1 and output the corresponding evolution behaviors. If the gravity change in a community is more than $\lambda$, then steps 12-15 output the implicit community evolution behaviors.
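The mutual transforming probability of Section IV-A and a simplified version of the threshold tests can be sketched in Python. The sketch assumes that $P(A \to B)$ is the fraction of nodes of $A$ that also belong to $B$, which agrees with the worked example (2/3 and 1). The order in which the tests are applied, the label `unrelated`, and the names `transform_prob` and `classify` are assumptions made for illustration; splitting and merging, which involve several communities at once, are left out.

```python
def transform_prob(src, dst):
    """P(src -> dst): share of the members of `src` that also belong to `dst`."""
    return len(src & dst) / len(src) if src else 0.0


def classify(c_t, c_t1, kappa, gravity_change=0.0, lam=float("inf")):
    """Label one pair of communities from times t and t+1 (simplified sketch).

    Assumptions: narrowing/growing are decided by comparing the two
    probabilities; equal probabilities above kappa count as continuing.
    """
    forward = transform_prob(c_t, c_t1)    # P(C^t -> C^{t+1})
    backward = transform_prob(c_t1, c_t)   # P(C^{t+1} -> C^t)
    if max(forward, backward) < kappa:
        return "unrelated"                 # candidate for dissolving or forming
    if backward > forward:
        kind = "narrowing"                 # the community lost members
    elif forward > backward:
        kind = "growing"                   # the community gained members
    else:
        kind = "continuing"
    # A gravity change above lambda turns the explicit label into an implicit one.
    prefix = "implicit" if gravity_change > lam else "explicit"
    return f"{prefix} {kind}"


# Worked example from Section IV-A: u1, u2, u3 at time t; u2, u3 at time t+1.
c1_t = {"u1", "u2", "u3"}
c1_t1 = {"u2", "u3"}
p_forward = transform_prob(c1_t, c1_t1)
p_backward = transform_prob(c1_t1, c1_t)
label = classify(c1_t, c1_t1, kappa=0.5)
print(p_forward, p_backward, label)
```

The thresholds `kappa=0.5` and the gravity values passed to `classify` are placeholders; in practice they would come from Table 1 and from the gravity change between the two communities.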

V. EXPERIMENT AND RESULT ANALYSIS
In this section, we carry out experiments and analyze the performance of our proposed algorithms. The experimental environment is a 64-bit Windows operating system with an Intel Core i7-5600U processor (3.40 GHz/L3 4M) and 16 GB of memory. We use the Python programming language with the Anaconda distribution. We save the experimental data (TXT or CSV format) on a local PC. We extract the features of microblog users from microblog posts by using the Language Technology Platform (LTP) system developed by the Research Center for Social Computing and Information Retrieval of Harbin Institute of Technology.

A. MICROBLOG DATASET
Two microblog datasets are used in our experiments. We use the Sina API open platform to extract microblog user data (the relationships between microblog users and their published microblog texts) and arrange them into dataset 1. Dataset 2 comes from the Fifth National Social Media Conference 2016 (SMP2016); it is the microblog dataset of the user portrait competition. In Table 3

B. EVALUATION INDEX
We evaluate the quality of community detection through the relationship between community structures and their node gravities. In terms of community structure, we use the modularity proposed by [3]. In terms of the gravity relationship, we define the average community gravity. Modularity [3] is a commonly used measure of the strength of network community structure. In a network with high modularity, nodes connect densely within their own community and sparsely with nodes in other communities. Therefore, we use modularity (Eq. (17)) to evaluate community detection quality,
where $Q_{SN}^t$ represents the average modularity in the social network $SN$ at time $t$. $e_{ii}$ refers to the ratio of the number of internal edges of community $C_i$ to the total number of edges in $SN$, and $a_i$ indicates the ratio of the number of edges connected to the internal nodes of community $C_i$ to the total number of edges in $SN$. $n_C$ refers to the number of communities in $SN$ at time $t$.
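As an illustration of these quantities, the following Python sketch computes the per-community term $e_{ii} - a_i^2$ of the standard Newman-Girvan modularity and averages it over the $n_C$ communities. It assumes an undirected edge list and takes $a_i$ as the fraction of edge endpoints attached to $C_i$, which is the usual convention and may differ in detail from Eq. (17); the function name `modularity_terms` and the toy graph are hypothetical.

```python
def modularity_terms(edges, communities):
    """Per-community modularity contribution e_ii - a_i**2.

    Assumptions: undirected edges given as (u, v) pairs; a_i is the share of
    edge endpoints that fall inside community C_i (standard convention).
    """
    m = len(edges)
    terms = {}
    for name, members in communities.items():
        internal = sum(1 for u, v in edges if u in members and v in members)
        endpoints = sum((u in members) + (v in members) for u, v in edges)
        e_ii = internal / m
        a_i = endpoints / (2 * m)
        terms[name] = e_ii - a_i ** 2
    return terms


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = {"A": {0, 1, 2}, "B": {3, 4, 5}}

terms = modularity_terms(edges, communities)
total_q = sum(terms.values())            # classical modularity Q
average_q = total_q / len(communities)   # averaged over n_C communities
print(terms, total_q, average_q)
```

For this toy network each triangle contributes $3/7 - (1/2)^2 = 5/28$, so the summed modularity is $5/14$ and the average over the two communities is $5/28$.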
In general, microblog users in the same community share the same interests, so the gravity between microblog user nodes in the same community is relatively large. Therefore, we use the average community gravity (Eq. (18)) to evaluate community detection quality,
where $f_{SN}^t$ represents the average gravity at time $t$ in $SN$.

C. COMPARISONS OF COMMUNITY DETECTION
To evaluate the performance of our proposed community detection algorithm, we conduct experiments on two real datasets and compare it with other classic community detection methods, namely CNM [38], COPRA [39], Infomap [14] and NRW [40]. Table 4 shows the number of communities $Num(C^t)$ in the social network at time $t$. $Num(U_m^t)$ represents the number of microblog users in the largest community of $C^t$. $f_{SN}^t$ represents the average gravity at time $t$ in $SN$, and $s(f_{SN}^t)$ is the statistical deviation of $f_{SN}^t$. Our proposed method, NRW, and Infomap find similar numbers of communities because all three methods are based on random walks. CNM has the largest number of microblog users in a single community. Its $s(f_{SN}^t)$ indicates that the number of microblog users in the communities found by the CNM algorithm has an imbalanced distribution: the number of microblog users in one community is far greater than that in the other communities.
In contrast, the $s(f_{SN}^t)$ of our proposed method is relatively low, which indicates that the communities detected by our proposed method have a balanced distribution. In addition, the $f_{SN}^t$ of our proposed method is higher than that of the other algorithms because our proposed method combines community structure and gravity. In Table 5, $Max(Q_{SN}^t)$ shows the largest modularity of a community in the social network at time $t$, and $Q_{SN}^t$ represents the average modularity of the social network $SN$ at time $t$. CNM finds the communities with the largest average modularity; microblog users in the communities detected by the CNM algorithm connect very closely. However, the average modularity of our proposed method is similar to that of the other methods, so our proposed method also performs well in regard to community structure.

D. COMPARISONS OF COMMUNITY EVOLUTION
We compare our method with other classic community evolution methods, namely LDM-CET (Local Dynamic Method for Community Evolution Track) [33], GCEA [37] and iLCD (intrinsic Longitudinal Community Detection) [41]. Fig. 4 shows the number of communities at each time on the two datasets. The iLCD algorithm discovers a large number of duplicated communities. Compared with our proposed algorithm and the LDM-CET algorithm, the number of communities detected by GCEA is relatively stable. Our proposed algorithm discovers duplicated and stable communities.
Figs. 5 and 6 show the average gravities $f_{SN}^t$ and modularities $Q_{SN}^t$ for the four algorithms, respectively. The community qualities of these four algorithms are maintained at a high level. As shown in Fig. 5, compared with the other three algorithms, our proposed algorithm has high average community gravities. The GCEA algorithm has low average community gravities, as clearly shown on dataset 2. The higher the average community gravities are, the higher the similarities between microblog users. The average community gravities in dataset 1 are higher than those in dataset 2. Fig. 6 indicates that the communities discovered by these four algorithms are similar in modularity. The average community modularities in dataset 1 are higher than those in dataset 2. Compared with the other three algorithms, the average community modularities of the iLCD algorithm are relatively low.
In Table 6, ''O'' represents our proposed algorithm, ''G'' represents GCEA, ''L'' represents LDM-CET, and ''I'' represents iLCD. Table 6 shows the recall of the different algorithms for extracting community evolution behaviors. Our proposed algorithm outperforms the other three algorithms at most adjacent times. GCEA follows closely, and iLCD performs the worst. The number of communities that the iLCD algorithm finds at each time is too large, and the number of microblog users in many of these communities is very small. Figs. 7 and 8 show the F values of the four algorithms for extracting community evolution behavior on dataset 1 and dataset 2, respectively. Our proposed algorithm and the GCEA algorithm perform best on dataset 1. The recall difference between our proposed algorithm and the two other algorithms is not obvious on dataset 2. The F values of the iLCD algorithm on dataset 2 are slightly higher than those on dataset 1. The F values of our proposed algorithm on dataset 2 are slightly lower than those on dataset 1. However, the average F values of our proposed algorithm are higher than those of the other three algorithms.
To evaluate the evolution behavior extraction algorithm, we focus only on the explicit and implicit ''continuing'', ''shrinking'' and ''growing'' evolution behaviors. As shown in Fig. 9, implicit evolution behaviors occur between every pair of adjacent times. In adjacent times 7-8, many communities show implicit evolution behaviors, indicating that the interests of many microblog users have changed during this time period. However, in adjacent times 9-10, only a small portion of the communities display implicit evolution behaviors, which indicates that only a few microblog users have changed their interests in this time period.
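For reference, recall and the F value of an extraction result can be computed as in the following Python sketch. It assumes that the F value is the usual harmonic mean of precision and recall and that both the extracted and the reference behaviors are given as sets of (community at $t$, community at $t+1$, behavior) tuples; this data layout, the function name `precision_recall_f` and the toy labels are assumptions for illustration only.

```python
def precision_recall_f(extracted, reference):
    """Precision, recall and F value (harmonic mean) of extracted behaviors."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    denominator = precision + recall
    f_value = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_value


# Toy labels: (community at t, community at t+1, behavior).
extracted = {(1, 1, "continuing"), (2, 2, "growing"), (3, 3, "shrinking")}
reference = {(1, 1, "continuing"), (2, 2, "growing"),
             (4, 4, "shrinking"), (5, 5, "growing")}

precision, recall, f_value = precision_recall_f(extracted, reference)
print(precision, recall, f_value)
```

In this toy case two of the three extracted behaviors are correct and two of the four reference behaviors are found, which gives a precision of 2/3, a recall of 1/2 and an F value of 4/7.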

E. PARAMETER ANALYSIS IN THE ALGORITHM
Our proposed algorithms 1 and 2 involve several parameters that need to be set. In this section, we discuss the two important parameters $\phi$ and $\varepsilon$, whose ranges are determined by experiments.
In Eq. (11), $\phi \in [0, 1]$ is a weight that balances the gravities of microblog users against the characteristics of the microblog network structure. Fig. 10 shows how the average gravity and modularity change with $\phi$. When $\phi$ increases from 0 to 3.5, the average gravity of the social network increases. For $\phi > 3.5$, the average gravities in the social network remain stable. When $\phi$ increases from 0 to 4.5, the average modularities in the social network remain almost stable. For $\phi > 4.5$, the average modularities in the social network begin to decrease. The changes in average gravities and modularities with $\phi$ are similar on datasets 1 and 2. For our proposed algorithm to perform well, $\phi$ should be approximately 3.5-4.5.
In Eq. (13), $\varepsilon$ is a relatively small positive number. When $\varepsilon$ is equal to 0.0, each microblog user has only one gravity orientation. When $\varepsilon$ is equal to 1.0, all neighbor nodes of a microblog user are gravity orientations of this microblog user. Fig. 11 shows the number of communities in the social network for different values of $\varepsilon$. When $\varepsilon$ is close to 0.0, the microblog users in the same community have high similarity, but the number of microblog users in a community is very small. Conversely, if $\varepsilon$ is close to 1.0, some of the microblog users in the same community have a lower similarity, which leads to poor results of the algorithm. The optimal interval of $\varepsilon$ needs to be obtained according to the average gravity of the social network.

VI. CONCLUSION AND FUTURE WORKS
Community discovery refers to finding a group with a special relationship. Microblog community discovery means finding a group of microblog users with the same interests. Existing community detection methods only adopt the community structure features and ignore the features of the users' microblog content. Our proposed community detection algorithms not only analyze the community structure but also extract the microblog users' interest features (community gravity) from the users' microblog content. The existing community evolution methods mainly study the explicit evolution behaviors of the community. Our proposed microblog community evolution algorithm finds both explicit and implicit evolution behaviors.
We define static and dynamic microblog social networks based on microblog data and reconstruct gravity relationships in complex networks by combining microblog user interest features. We use random walk and game theories to find the gravity tendencies of microblog users. We propose a microblog community discovery algorithm that detects base nodes and communities in the microblog social network by combining the microblog social network structure with the gravity tendencies of its nodes.
Based on communities at time t, we determine changes in microblog communities, users and their relationships in the adjacent time t + 1 and propose a microblog community evolution algorithm.
By redefining the mutual transformation probability of communities between times t and t + 1, we develop a microblog community evolution behavior extraction algorithm.
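Finally, the experimental results show that our proposed community detection algorithm achieves a higher average gravity than CNM, COPRA, Infomap, and NRW with a similar average modularity, and that our proposed evolution behavior extraction algorithm achieves higher average F values than LDM-CET, GCEA, and iLCD. Of the parameters involved in our proposed algorithms, $\phi$ should remain at 3.5∼4.5 and $\varepsilon$ should converge toward 0.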
However, our proposed algorithms are verified only by experiments, and the experiments are carried out on only two real microblog datasets. Thus, the experimental results may not be sufficiently persuasive, and this needs to be improved in future work. In addition, in the evaluation of community detection and evolution, the time granularity is one month. Daily, seasonal, or annual granularities may affect the experimental results. Our future work will focus on these topics.
He is currently a Professor in computer science with Xihua University. He has published several articles in the fields of information retrieval and search engines. His experience and research work focus on information retrieval, search engines, Web mining, and computer networks. He is serving on the committees of Chinese information and a PC Member of some international conferences such as WISE, ICIC, CCIR, and WMSE.
JIA LIU received the M.S. degree in computer science and technology from the School of Computer and Software Engineering, Xihua University, in 2018. He has published two articles in the field of information retrieval such as search engines, focused crawlers, and knowledge. His experience and research work focus on social networks and software engineering. VOLUME 8, 2020