Community Clustering Routing Algorithm Based on Information Entropy in Mobile Opportunity Network

To address the difficulty that traditional clustering routing algorithms have in handling the differing characteristics of communities and the inefficient nodes left after community clustering, this paper proposes a Community clustering Routing protocol based on information Entropy in mobile opportunity Networks (CREN). The proposed protocol uses the unsupervised K-Modes algorithm, combined with pre-selected initial cluster center nodes, to divide the network nodes into initial clustering communities. Communities with similar characteristics are then clustered and merged according to the change of information entropy. In the end, a number of different types of communities are formed in the network, and the nodes within each community have a high degree of similarity, which improves the efficiency of message forwarding. In addition, to eliminate the inefficient nodes in a community, this paper proposes a mechanism that dynamically updates the community based on information entropy and the social attributes of the nodes, so as to ensure the efficiency of the nodes in the community. Simulation results show that the proposed algorithm achieves a higher transmission success rate than other classic routing algorithms, together with lower transmission delay and routing overhead.


I. INTRODUCTION
With the rapid development of network technology and mobile communication applications [1], [2], various smart portable devices, such as tablet computers, smart watches, and smart bracelets, have emerged and are widely used. Through these devices, users can communicate anywhere and anytime to share information. However, because these devices are mobile, in some extreme environments frequent network disconnections degrade network performance or even make communication impossible. These problems make traditional wireless network technologies no longer applicable [3]. The mobile opportunistic network, a current research hotspot, can effectively deal with these issues [4]-[8].
The associate editor coordinating the review of this manuscript and approving it for publication was Songwen Pei.
The mobile opportunistic network is a network model suitable for wireless communication. Compared with the traditional wireless sensor network, its biggest feature is that information transmission between nodes relies on finding "opportunities" for communication: because a persistent connection may not exist between two nodes, each node stores the data that needs to be forwarded and then sends it to a node it meets. Moreover, because nodes in a mobile opportunistic network move frequently, the network structure is constantly changing, and the nodes can organize themselves into local self-organizing networks [9], [10], such as community structures and group structures. Such networks may then add certain fixed facilities to support communication [11].
In mobile opportunistic networks, network devices are generally carried by people, so each node has certain social attributes. Some research works [12]-[14] therefore use the historical data carried by network nodes to predict the possibility of node encounters and then design community routing algorithms for message forwarding. A number of research works [15], [16] divide the nodes in the network into communities according to their time characteristics and then propose community-based message forwarding mechanisms. However, although many community-based routing protocols exist, how to divide communities effectively is still a valuable problem. Most community division methods take into account the interests of nodes, the similarity of nodes, and social relations in real scenes, but once the communities have been divided, the existing algorithms do not consider re-merging or re-clustering communities with the same properties. At the same time, because not every node in a community can forward messages effectively, nodes with low transmission efficiency and high transmission delay should be removed from the community in time, so as to reduce the routing overhead and improve the success rate of message transmission.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To address the problems in the above work, we formulate a community clustering algorithm, explore a dynamic community update strategy, and finally propose a community clustering routing algorithm based on information entropy. The main contributions of this paper are summarized as follows.
(1) We first analyze the related community routing algorithms and then use the idea of unsupervised learning to define how communities are divided.
(2) After the communities are established, we use information entropy [18] to measure the amount of node information, and further merge and cluster the initially divided communities to ensure the integrity and unity of the communities in the entire network.
(3) We propose a dynamic community update strategy based on information entropy. This strategy comprehensively considers the various social attributes of nodes and assigns different weights to each node's social attributes, so as to ensure that all nodes in the community are efficient nodes for message forwarding.
The remainder of the paper is organized as follows. In Section II, we describe and analyze the related community routing algorithms. Section III presents the model of community clustering routing based on information entropy in detail. Simulations are presented and discussed in Section IV. Finally, we conclude the paper in Section V.

II. RELATED WORK
In mobile opportunistic networks, how to effectively forward the data carried by mobile smart devices has become a hot issue. Researchers in this field have proposed various store-and-forward routing algorithms. Among them, the Epidemic protocol [12] transmits messages with a flooding strategy: when two nodes meet, the node carrying the message sends a copy to the node it meets, and after forwarding by a series of intermediate nodes the message finally reaches the destination node. Because each node has a high probability of obtaining a copy of the message, the destination node also has a high probability of meeting a node that carries a copy, so the message is likely to be delivered successfully. In theory, the Epidemic protocol therefore has a higher message transmission success rate but also a larger routing overhead. The Direct Transmission protocol [20] requires the source node to store the message and to forward it only to the destination node. The source keeps the message until it encounters the destination node or the message lifetime expires. Therefore, the transmission overhead of the Direct Transmission protocol is low, but its transmission delay is high and its success rate is low. In [21], the authors proposed a multicopy routing algorithm based on a single-branch diffusion strategy to dynamically control the number of replicas of a packet. The algorithm comprehensively takes into account the energy consumption of the nodes, the probability of the message arriving at the destination node, the time to reach the destination node, and the real-time status of the packets. However, the algorithm has a high latency, and its performance is relatively low in areas where network services are not available.
In [22], the authors explained the influence of node caching on routing algorithms and suggested that a node that lacks a connection and cannot forward a message should buffer the message according to a queuing strategy and transmit it after the connection is established again. In the case of congestion, the node discards messages according to a discarding strategy. The scheduling and discarding strategies together constitute the management of the node's buffer, which can affect routing performance.
In addition to the traditional meet-and-forward message model, there are also routing protocols based on probability, among which the PROPHET algorithm [23] is a classical one. In this algorithm, the authors proposed a transitive probability prediction method, which uses historical data of node encounters and transmission records to calculate the forwarding probability of messages between nodes and to judge the probability of a message being delivered to the destination node. Building on probabilistic routing protocols that use the historical information of encounters and transitivity, the authors of [24] used the encounter interval to estimate the probability of encounter and then selected the retransmission node according to this probability. To address the imbalanced data problem, a 3D Augmented Convolutional Network algorithm for extracting time series information was proposed in [25]. A novel routing algorithm based on mobility prediction was proposed in [26], which calculates the probability of message transmission to a node in a certain area based on a semi-Markov model. However, in actual situations, the performance of this algorithm is affected by the movement of the device, as it is difficult to update the mobility data immediately. In [27], the authors proposed an efficient routing protocol based on machine learning. This protocol uses decision trees and neural networks to predict the success rate of packet delivery.
Since smart devices are often carried by people, the movement trajectory of a node represents the behavior trajectory of a person, so the movement of the node has certain social attributes. Some research works have designed efficient routing algorithms based on the sociality of nodes. For example, the Bubble Rap routing protocol [28] is one of the classical algorithms that apply node social relationships to the design of forwarding strategies. The authors use the centrality and community of nodes to calculate the global ranking of nodes in the entire network and the local ranking of nodes in the community. A message is forwarded only to a higher-ranked node, in order to increase the success rate of message transmission. However, because this algorithm uses centrality as the only indicator in the routing process, a few nodes with higher centrality may be overloaded while the remaining nodes are idle. Considering the correlation between cluster sets and specific times, a cluster-based message forwarding routing algorithm named DRAFT was proposed in [29]. In [30], a clustering routing protocol based on the social attributes of nodes is proposed. This protocol defines the dominant and recessive social relationship characteristics of nodes and assigns the weights of the dominant and recessive features according to the historical data of the nodes. Finally, using the proposed feature information, a node selects the nodes that have a close social relationship with itself to form a cluster, so as to forward the message to its destination node. By taking into account the dominant and recessive social characteristics of nodes, the protocol increases the probability of message forwarding in the network, but it also increases the routing overhead and places a larger load on the network.
In [31], the authors introduced dynamic social features to collect the contact behavior information of nodes, and also considered the multiple social relationships of nodes. Then, based on multicast comparison (that is, when a message meets multiple community structures that qualify for forwarding, a node more similar to the target community is selected for forwarding), the community structure is used to select the best relay node and improve the routing transmission efficiency. In [28], the authors proposed a routing algorithm based on social relations. This algorithm uses an improved median centrality to evaluate the heterogeneity of nodes and introduces a forwarding judgment factor in the community to speed up message forwarding. Finally, it selects forwarding nodes according to a simple community identification algorithm with a decay mechanism. This scheme balances the two routing evaluation metrics of transmission success rate and average transmission delay well, but it has a higher routing overhead. A distributed algorithm based on a social graph has also been proposed to detect overlapping communities and to handle community evolution, where the social graph is generated from the frequency and duration of encounters between nodes. Based on this, a community-based routing scheme is designed in [31].
Different from the existing research, we propose a method that selects the initial cluster center nodes by comprehensively considering a variety of node attributes. Then, starting from the selected initial cluster center nodes, we use the K-Modes algorithm to divide the initial cluster communities. Since some of the divided communities may be of similar types, we re-cluster and merge the preliminarily divided communities according to the change of information entropy, to ensure that the communities divided in the network have a high degree of unity. At the same time, we propose a dynamic community updating method based on information entropy to reduce the inefficient nodes in a community and ensure the efficient transmission of messages within it.

III. SYSTEM MODEL DESIGN
With the rapid development of mobile network technology and the widespread use of mobile devices and software, acquiring and forwarding information between different people has become more convenient. As shown in Figure 1, the exchange of information between users does not always require face-to-face contact; text, voice, and video communication through software on mobile devices is very convenient, especially for long-distance communication. Therefore, in order to improve the QoS of wireless communication, it is necessary to study network routing with a high transmission rate and low latency. In a mobile opportunistic network, the distribution of nodes is clustered within a certain period of time, which is similar to humans living in communities. The characteristics of communities can therefore be used as a reference indicator in the mobile opportunistic network: during this period, nodes in the same community are in a state of long-term cooperation and forward messages more successfully than isolated nodes. As a result, community-based routing algorithms perform better than traditional routing algorithms in opportunistic social networks. This section defines how communities are divided and updated, and explains the process of message transmission across the network.

A. CLUSTER-BASED COMMUNITY PARTITION METHOD
The cluster-based community division method proposed in this paper can be divided into three stages. The first stage pre-processes the data: the nodes contain many kinds of information, each with its own unit of measurement, so the data is pre-processed first to simplify the calculation and improve the accuracy of the results. The second stage selects the initial clustering centers in the entire network, which determines the number of initial communities to a certain extent, and then uses the K-Modes algorithm to cluster nodes and divide communities. The third stage determines the final number of clustered communities: the communities identified in the second stage are further clustered and merged using information entropy, which fixes the number of communities finally divided and ensures that the nodes within each community are as similar as possible. The process of dividing communities described above is shown in Figure 2. Through this clustering process, the nodes in the entire network are divided into communities as follows.
(1) Node data preprocessing
In the mobile opportunistic network, each node represents a user or a mobile device and contains a variety of attribute information, such as the number of times the nodes meet Nm, the time required to deliver the message Tm, the number of times the message was successfully delivered Nd, and so on. Normally, when a message needs to be forwarded, the node forwards it to an encountered node that meets the forwarding criteria. However, if the node carrying the message encounters more than one node that meets the forwarding criteria at the current time t, the transmission latency is lowest only if the message is forwarded to the node that has the higher probability of meeting the destination node.
Therefore, how to select the encountered nodes that satisfy the criteria is very important. This selection can be made according to the priority of the node, which is an important factor affecting the success rate of message forwarding. The priority is defined by an equation in which $NPR_i^t$ represents the priority value of node $i$ at the current time $t$. As mentioned in the previous section, $SeN_i^t$ represents the number of times that node $i$ has successfully forwarded a message at the current time $t$, and $T_{ij}^{start}$ and $T_{ij}^{end}$ respectively represent the sending time at node $i$ and the receiving time at node $j$, where the range of nodes $j$ is determined by the node $i$ that carries the message. The meaning of this formula is that the priority of a node consists of two parts: the time the node takes to forward a message and the number of times it successfully forwards a message. The less time a node takes to forward a message and the more messages it forwards successfully, the higher its priority.
To present the attribute characteristics of nodes more intuitively and to simplify later processing, we build an evaluation matrix from the node characteristics mentioned above:

$$Q = (q_{ij})$$

In the matrix $Q$, the node set is defined as $V = \{v_i \mid i = 1, 2, 3, \ldots\}$ and the standard set of node attributes is defined as $NF = \{Nm, Tm, Nd, NPR^t\}$. Because the attributes are measured in different units, their values also have different meanings, and using them directly may lead to large errors. Therefore, before the calculation, the data is standardized to eliminate the dimensional effect between the variables and give each variable the same expressive power:

$$R_{ij} = \frac{q_{ij} - \bar{q}_j}{S_j}$$

where $q_{ij}$ represents the $j$-th attribute value of the current node $i$, $\bar{q}_j$ represents the average value of the $j$-th attribute over all nodes in the network, $S_j$ represents the standard deviation of the $j$-th attribute over all nodes in the network, and $R_{ij}$ represents the new data obtained after processing. For a network of $n$ nodes, the average is $\bar{q}_j = \frac{1}{n}\sum_{i=1}^{n} q_{ij}$, and $S_j$ is the standard deviation of the same column around $\bar{q}_j$.
(2) The selection of cluster centers and the division of cluster communities
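Before turning to cluster-center selection, the standardization from preprocessing step (1) can be illustrated with a short sketch. It is a minimal example, not the authors' implementation: the function name `standardize`, the sample attribute rows, and the choice of the population standard deviation (`pstdev`) are assumptions made for illustration, since the text does not say whether the population or the sample form is used.

```python
from statistics import mean, pstdev

def standardize(Q):
    """Column-wise z-score of a matrix given as a list of rows (one row per node)."""
    columns = list(zip(*Q))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation; the source does not specify the form.
    stds = [pstdev(col) for col in columns]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(q - mu) / s if s > 0 else 0.0 for q, mu, s in zip(row, means, stds)]
        for row in Q
    ]

# Hypothetical attribute rows in the order [Nm, Tm, Nd, NPR].
Q = [
    [12, 30.0, 5, 0.40],
    [7, 45.0, 2, 0.10],
    [20, 15.0, 9, 0.75],
]
R = standardize(Q)
```

After this step every attribute column has zero mean and unit spread, so attributes measured in different units contribute on the same scale.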

1) DETERMINATION OF THE NUMBER OF NODE CLUSTERING COMMUNITIES
In a real environment, human activity shows certain social patterns and concentrates in places such as residential communities, shopping malls, cinemas, and hospitals. Compared with other areas, communication between people in these areas is more frequent, which is more conducive to the forwarding of messages. This feature is reflected in the activity of nodes in the mobile opportunistic network. Relatively active nodes may forward more messages in a certain period of time and have a higher probability of forwarding messages to the destination node, while less active nodes have a higher transmission delay and a lower transmission success rate. The clustering process is in effect the process of other nodes grouping around nodes with high activity. If nodes with low activity are used as the nodes around which others cluster, message forwarding may suffer failure, delay, and loss. The activity of a node is measured by its message counts. When calculating the activity of a node, an initial value is assigned to it; then, every time node $i$ successfully forwards a message, its forwarding count $SeN_i$ is increased by one. In addition, the activity value of a node depends not only on the number of times the node successfully forwards messages, but also on the number of times it receives messages, $ReN_i$. In general, the activity value of a node is the number of times it successfully forwards a message plus the number of times it receives a message:

$$A_i^t = SeN_i^t + ReN_i^t \tag{6}$$

In (6), $SeN_i^t$ represents the number of messages forwarded by node $i$ at time $t$, with an initial default value of 1, $ReN_i^t$ represents the number of messages received by node $i$ at time $t$, and $A_i^t$ represents the activity value of node $i$ at time $t$.
To compare the activity values of the nodes at the current time more conveniently, we normalize them as follows, writing the normalized activity value as $\hat{A}_j^t$:

$$\hat{A}_j^t = \frac{A_j^t - \min}{\max - \min}, \quad j = 1, 2, 3, \cdots, n \tag{7}$$

In (7), $\max$ represents the maximum activity value among the nodes at the current moment, and $\min$ represents the minimum activity value among the nodes at the current moment. The size of the node activity value preliminarily determines the number of node clusters: for every node whose activity is greater than the average activity value $Ap^t$ of all nodes in the network at the current moment, the number of clusters $N_k$ in the current network is increased by one. The average activity value of the nodes in the network at the current moment is:

$$Ap^t = \frac{1}{n}\sum_{j=1}^{n} A_j^t \tag{8}$$

From (8) we can obtain the number of clusters. We first use the K-Modes algorithm to cluster the network nodes and divide the communities directly, and then perform hierarchical clustering on these communities instead of starting hierarchical clustering from single nodes. The purpose of the hierarchical clustering is to merge communities with similar properties.
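The activity value, its min-max normalization, and the rule for the initial cluster count described above can be sketched as follows. This is an illustrative example under stated assumptions: the function names and the sample counters are hypothetical, and the average is taken over the normalized values. Because min-max normalization is an affine map, counting nodes above the average gives the same result for raw and normalized values.

```python
def activity(sent, received):
    """Activity value: messages forwarded (initial default 1) plus messages received."""
    return sent + received

def min_max(values):
    """Min-max normalization to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def initial_cluster_count(values):
    """Count the nodes whose activity exceeds the network-wide average."""
    avg = sum(values) / len(values)
    return sum(1 for v in values if v > avg)

# Hypothetical (forwarded, received) counters for four nodes.
counters = [(5, 3), (1, 0), (9, 6), (2, 2)]
raw = [activity(s, r) for s, r in counters]
norm = min_max(raw)
n_k = initial_cluster_count(norm)
```

In this example two of the four nodes lie above the average activity, so the sketch proposes two initial clusters.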

2) SELECTION OF INITIAL CLUSTER CENTER
As defined in the preprocessing step, the node set is $V = \{v_i \mid i = 1, 2, 3, \ldots\}$. The value range of attribute $nf$ is denoted $U_{nf}$, and $U$ represents the union of all attribute ranges. Then $f: V \times NF \rightarrow U$ is an information function that assigns an information value to each attribute of each node, namely $f(x, nf) \in U_{nf}$ for all $nf \in NF$ and $x \in V$, and the distance between two nodes $x_i$ and $x_j$ is defined in terms of these attribute values. When selecting a cluster center node, if we only consider the distance between the candidate node and the cluster center nodes already selected, a remote isolated node may be selected as a cluster center, while a node that is really suitable will not be selected. Therefore, in addition to the distance, we also need to consider the surroundings of the node, that is, the density $ND(x_i)$ of nodes around the candidate. The greater the density $ND(x_i)$ of node $i$, the more data nodes are distributed around node $i$, and the higher the probability that node $i$ will be selected as an initial cluster center node. Therefore, the node with the largest density is selected as the first initial cluster center node $x_1$. After the first cluster center node is determined, it is put into the set $CN = CN \cup \{x_1\}$. When selecting the remaining cluster centers $x_j$, in addition to the density of the node, it is also necessary to consider the distance between the node and the cluster center nodes already selected.
Therefore, the traditional maximum and minimum distance algorithm is further extended so that both factors enter the selection criterion. Each time a cluster center node is selected, it is put into the set CN, and the size of CN is compared with the number of clusters $N_k$. If the size is smaller than $N_k$, the selection of the remaining cluster centers continues; otherwise, the algorithm outputs the final cluster center node set CN.
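The selection procedure above can be sketched in code. Because the distance, density, and extended max-min expressions are not reproduced in this text, the sketch swaps in common choices from the K-Modes literature and labels them as assumptions: simple matching dissimilarity as the distance, the average share of nodes agreeing with each attribute value as the density, and "density times minimum distance to the chosen centers" as the extended max-min criterion. All function names and sample data are hypothetical.

```python
def matching_distance(x, y):
    """Assumed distance: number of attributes on which two nodes differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def density(x, nodes):
    """Assumed density: average fraction of nodes sharing each attribute value of x."""
    n, m = len(nodes), len(x)
    shared = sum(sum(1 for y in nodes if y[l] == x[l]) for l in range(m))
    return shared / (n * m)

def select_centers(nodes, n_k):
    """Pick the densest node first, then extend with a density-weighted max-min rule."""
    dens = [density(x, nodes) for x in nodes]
    chosen = [max(range(len(nodes)), key=lambda i: dens[i])]
    while len(chosen) < n_k and len(chosen) < len(nodes):
        remaining = [i for i in range(len(nodes)) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: dens[i]
            * min(matching_distance(nodes[i], nodes[c]) for c in chosen),
        )
        chosen.append(best)
    return [nodes[i] for i in chosen]

# Hypothetical categorical attributes per node: (activity, delay, deliveries).
nodes = [
    ("high", "short", "many"),
    ("high", "short", "many"),
    ("high", "short", "few"),
    ("low", "long", "few"),
    ("low", "long", "few"),
    ("low", "short", "few"),
]
centers = select_centers(nodes, 2)
```

Weighting the distance by density keeps a remote but isolated node from being chosen as a center, which is the concern raised in the text.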

3) DIVISION OF INITIAL CLUSTERING COMMUNITIES
After the initial clustering center nodes have been selected, we use the K-Modes algorithm to divide the entire network into clustering communities. The detailed steps are as follows:
Step 1: Select the initial cluster center node set CN, and assume that $CN = \{cn_1, cn_2, cn_3, \ldots, cn_k\}$.
Step 2: For each node in the node set $V = \{v_i \mid i = 1, 2, 3, \ldots, n\}$, we compare the difference between the node and each of the $k$ initial clustering center nodes, and then add the current node $i$ to the community of the initial clustering center node with the smallest difference. The difference is computed by (14), in which $v_{il}$ represents the $l$-th feature value of node $i$, $cn_{jl}$ represents the $l$-th feature value of the initial cluster center node $j$, and $NDF_i$ represents the minimum difference value between the current node $i$ and the initial cluster center nodes.
Step 3: Repeat Step 2 to add eligible nodes to the cluster community where the corresponding initial cluster center is located, until all nodes in the network are traversed.
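Steps 1 to 3 can be sketched as a single assignment pass. The sketch assumes the simple matching dissimilarity that K-Modes conventionally uses for the node-to-center difference, and it covers only the assignment described in the steps; the mode-update iteration of the standard K-Modes algorithm is deliberately left out. Function names and sample data are hypothetical.

```python
def dissimilarity(v, cn):
    """Assumed K-Modes difference: count of feature values that do not match."""
    return sum(1 for a, b in zip(v, cn) if a != b)

def assign_to_communities(nodes, centers):
    """Put every node into the community of its least-different center."""
    communities = [[] for _ in centers]
    for v in nodes:
        diffs = [dissimilarity(v, cn) for cn in centers]
        # Ties go to the center that appears first in the center set.
        communities[diffs.index(min(diffs))].append(v)
    return communities

# Hypothetical nodes and initial centers with categorical features.
nodes = [
    ("high", "short", "many"),
    ("high", "short", "many"),
    ("high", "short", "few"),
    ("low", "long", "few"),
    ("low", "long", "few"),
    ("low", "short", "few"),
]
centers = [("high", "short", "few"), ("low", "long", "few")]
communities = assign_to_communities(nodes, centers)
```

The last node differs from both centers in exactly one feature, so the tie-breaking rule decides its community; a real deployment would need an explicit policy for such ties.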
(3) Cluster and merge communities and then determine the final community size
As described in (2), the K-Modes algorithm finally yields a set of initial cluster center nodes and the corresponding set of divided cluster communities, each of which contains at least one node. However, K-Modes is a clustering algorithm based on the selected cluster center nodes. Within one area, the algorithm groups nodes with the same attributes around the selected cluster center, but nodes in different areas may be clustered on the same attribute index, which produces several communities with the same attributes. Therefore, we can merge clustered communities with the same attributes. If the two communities chosen for merging are more similar in structure, the disorder caused by the merger is smaller; if the differences between them are larger, the merger brings greater disorder. Since information entropy reflects the distribution characteristics of data well, we use the change in entropy after the merger of two communities to measure the dissimilarity between them.
Information entropy is a quantitative index that measures the information content of a system and describes the uncertainty of events. The amount of information is related to the probability of the event: the smaller the probability of an event, the greater the amount of information, and the larger the probability, the smaller the amount of information. In the process of community clustering, the nodes that are divided into the same community have a closer relationship. Compared with nodes in different communities, their probability of transmitting information to each other is greater, and therefore the amount of information is smaller; conversely, for nodes in different communities the probability is smaller and the amount of information is greater. Information entropy is expressed by (15), in which $E(nf_i)$ represents the information entropy value of each feature $nf_i$ of the node and $p(x_i)$ represents the probability function of the feature $nf_i$. When $p(x_i) = 0$, the term $p(x_i)\log_2 p(x_i)$ is taken to be 0, that is, the information entropy value of feature $i$ is 0. It can be seen from the formula that the function $E(nf_i)$ satisfies monotonicity, non-negativity, and additivity. When entropy is used to analyze node clustering characteristics, these properties must also be satisfied. At the same time, because computing information entropy requires many logarithm evaluations, it often leads to data overflow and is abnormally time-consuming, so we reconstruct equation (15) according to the above principles as equation (16). In this expression, the probability function of the feature $nf_i$ is $p(x_i)$, with value range $p(x_i) \in [0, 1]$. The smaller the probability $p(x_i)$ obtained for the feature $nf_i$ of the node, the larger the value $(1 - p(x_i))^2$ and the larger the resulting entropy value $E(nf_i)$, so the more information is generated.
Conversely, the larger the value $p(x_i)$, the smaller the value $(1 - p(x_i))^2$ and the smaller the entropy value $E(nf_i)$, so the amount of information is smaller. Therefore, equation (16) satisfies the entropy-related properties defined by Shannon, the father of information theory, namely monotonicity, non-negativity, and additivity.
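For reference, the classical Shannon form that the discussion starts from, including the convention that a zero-probability term contributes nothing, can be sketched as below. This shows only the standard logarithmic entropy, not the reconstructed log-free expression (16), whose exact form is not reproduced here; the function names are hypothetical.

```python
import math

def self_information(p):
    """Information content of one event in bits; rarer events carry more information."""
    return -math.log2(p)

def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability terms (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

rare = self_information(0.125)
common = self_information(0.5)
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
certain = shannon_entropy([1.0, 0.0])
```

The two self-information values illustrate the monotonicity discussed above, and the entropy of a certain outcome is zero, consistent with non-negativity.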
For the community set $C = \{C_1, C_2, C_3, \ldots, C_k\}$ divided by the K-Modes algorithm, let $C_i, C_j \in C$ with $i \neq j$, $C_i \neq \emptyset$, $C_j \neq \emptyset$, and $C_i \cap C_j = \emptyset$. Assuming that the number of nodes in the subset $C_i$ is $h$, the additivity of information entropy gives

$$E(C_i) = \sum_{x=1}^{h} E_x$$

where $E_x$ is the information entropy of the $x$-th node. The amount of information entropy change between $C_i$ and $C_j$ is given by equation (17), in which $|C_i|$ and $|C_j|$ respectively represent the number of nodes in communities $i$ and $j$, $E(C_i)$ and $E(C_j)$ respectively represent the information entropy values of communities $i$ and $j$, and $E(C_i \cup C_j)$ represents the information entropy value of the community obtained by merging $i$ and $j$. $IE(C_i, C_j)$ represents the change of information entropy after merging communities $i$ and $j$ compared with that before merging.
The change of information entropy defined by the above expression reflects the difference in clustering structure between the two communities. If the amount of change is zero, no impurities are brought into the clustering result during the merging process. Similarly, a small amount of change indicates that the clustering structure has not changed significantly, and a large amount of change indicates that the clustering result has changed significantly. Because each merger takes place between a selected community and a newly generated community, we further express the change rule of the community structure during clustering through the relationship of entropy change, in which $I(k-1)$ represents the minimum change in entropy in the network when going from the current $k$ clustered communities to $k-1$ clustered communities. Related research shows that when $I(k)$ is approximately equal to $I(k-1)$, the change of entropy in the process of community merging can be considered approximately constant. In other words, as long as the structure of the cluster communities does not undergo major changes, the communities can be merged into one to increase the effective transmission probability of nodes.
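The merging step can be illustrated with a small sketch. Since equation (17) is not reproduced in this text, the sketch assumes a size-weighted form commonly used in entropy-based categorical clustering, $(|C_i| + |C_j|)\,E(C_i \cup C_j) - |C_i|\,E(C_i) - |C_j|\,E(C_j)$, and it computes each community entropy as the sum of per-attribute Shannon entropies. Both choices, along with all names and data, are assumptions for illustration rather than the paper's definitions.

```python
import math
from collections import Counter
from itertools import combinations

def community_entropy(nodes):
    """Assumed community entropy: sum of Shannon entropies of each attribute column."""
    n = len(nodes)
    total = 0.0
    for column in zip(*nodes):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total

def entropy_change(ci, cj):
    """Assumed size-weighted entropy increase caused by merging ci and cj."""
    merged = ci + cj
    return (
        len(merged) * community_entropy(merged)
        - len(ci) * community_entropy(ci)
        - len(cj) * community_entropy(cj)
    )

def best_merge(communities):
    """Index pair of the two communities whose merger changes entropy the least."""
    pairs = combinations(range(len(communities)), 2)
    return min(
        pairs, key=lambda ij: entropy_change(communities[ij[0]], communities[ij[1]])
    )

# Hypothetical communities: a and b share a structure, c differs from both.
a = [("high", "short"), ("high", "short")]
b = [("high", "short"), ("high", "short")]
c = [("low", "long"), ("low", "long")]
pair = best_merge([a, c, b])
```

Merging the two structurally identical communities changes the entropy by zero, so they are merged first, while merging either of them with the dissimilar community would introduce disorder.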
The flowchart of clustering and dividing communities is shown in Figure 3. The community division method adopted in this paper uses the idea of unsupervised clustering: the number of communities to be divided is obtained through pre-processing before clustering, so that a community structure with similar characteristics is finally formed.

B. DYNAMIC UPDATING STRATEGY OF COMMUNITY NODES
The network community is composed of multiple closely connected nodes, but not every node in the community has the ability to become a relay node. Nodes in the community that do not meet the transmission conditions can be considered inefficient nodes. Deleting these inefficient nodes reduces transmission overhead and improves community transmission efficiency. For each community in the network, we analyze the social attributes of each node and measure the impact of these attributes on information transmission. By weighting and combining these attributes, we obtain a comprehensive indicator that can be used to evaluate the nodes in the community and reduce the number of inefficient nodes.
As described in Section A.(1), we have performed preprocessing operations on the node data and, according to equation (2), normalized the data matrix $Q$. On the processed matrix, we then calculate the probability $P_{ij}$ of each attribute value of the current node $i$ in the entire network. According to the reconstructed information entropy expression (16), the information entropy value of each attribute is then calculated as in equation (20).
In mobile opportunistic networks, the importance of each attribute is different. It is usually necessary to describe this importance quantitatively, that is, to determine the weight of each attribute. Therefore, we normalize the weight of each index and define the weight of the $j$-th attribute of the current node $i$. The evaluation value of the current node $i$ is then estimated by equation (22).
According to equation (22), we can judge from the evaluation value whether the current node $i$ meets the requirements of a relay node. This helps to delete inefficient nodes from the current community and to control the size of the community so that all nodes in the community have high intimacy. Through this node-reduction scheme, we filter the nodes and delete those that do not meet the transmission requirements in the community. After the nodes that do not meet the transmission conditions are removed, the relationship between nodes in the community is closer, and all remaining nodes have higher transmission capabilities.
Figure 4 shows the process of clustering nodes in the network and merging and dividing communities. Some inefficient nodes are deleted to improve the performance of the algorithm. Compared with the original single-node algorithm, it not only improves the transmission success rate and transmission speed, but also reduces the transmission delay and routing cost.
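The weighting and pruning steps described in this section can be sketched with the standard entropy weight method. This is an assumed stand-in, not the paper's exact equations: the column-wise share used for $P_{ij}$, the $1/\ln n$ normalization, the weighted-sum score, and the `threshold` parameter are all hypothetical choices, as are the function names.

```python
from math import log


def entropy_weights(q):
    """Return one weight per attribute (column) of the normalized matrix q.

    q is a list of rows, one per node, holding non-negative attribute values.
    """
    n, m = len(q), len(q[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in q]
        total = sum(column)
        if total <= 0 or n < 2:
            divergences.append(0.0)  # attribute carries no usable information
            continue
        # p_ij: share of node i in attribute j; zero entries contribute nothing.
        probs = [v / total for v in column if v > 0]
        entropy = -sum(p * log(p) for p in probs) / log(n)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / m] * m  # every attribute is equally uninformative
    return [d / norm for d in divergences]


def node_scores(q):
    """Weighted sum of each node's attribute values (assumed evaluation value)."""
    weights = entropy_weights(q)
    return [sum(w * v for w, v in zip(weights, row)) for row in q]


def prune_community(node_ids, q, threshold):
    """Keep only nodes whose score reaches the hypothetical threshold."""
    return [nid for nid, s in zip(node_ids, node_scores(q)) if s >= threshold]


# Hypothetical normalized attributes for three nodes (rows) and two attributes.
q = [[0.5, 0.9], [0.5, 0.1], [0.5, 0.5]]
kept = prune_community(["a", "b", "c"], q, threshold=0.4)
```

In this example the first attribute is identical across nodes, so it receives almost no weight, and the second attribute decides which nodes stay in the community.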

C. THE COMMUNITY-BASED MESSAGE FORWARDING STRATEGY
In summary, the procedures in Sections A and B yield several closely connected communities in the network, and the nodes in these communities are more favorable for message forwarding. The CREN algorithm proposed in this paper can effectively improve the success rate of data transmission and reduce the routing overhead. When two nodes meet, message forwarding occurs only when the encountered node is in the same community as the target node or is the target node itself. The specific forwarding process is as follows. When node $V_i$ carrying a message meets node $V_j$, node $V_i$ first determines whether node $V_j$ is the destination node. If it is, node $V_i$ forwards the message directly to node $V_j$ and deletes the message from its own cache. If $V_j$ is not the destination node, node $V_i$ forwards the message according to the following two situations:
(1) Message forwarding within a community. If node $V_i$ and the target node are in the same community, and node $V_j$ and the destination node are also in the same community, then node $V_i$ forwards a message copy to node $V_j$. Otherwise, the message is not forwarded.
(2) Message forwarding between different communities. If the target node and node $V_i$ are not in the same community, node $V_i$ asks node $V_j$ whether $V_j$ and the target node are in the same community. Node $V_j$ determines this according to its own community information. If they are in the same community, node $V_i$ forwards the message directly to node $V_j$. Otherwise, the message is not forwarded.
The purpose of this method is to prevent useless messages from flooding the whole network and to avoid increasing the network load to the point of collapse. Before node $V_i$ forwards a message, it saves the message in its own buffer for a period of time, the length of which depends on the size of the buffer of node $V_i$. During this period, if node $V_i$ encounters a qualified forwarding target, it forwards the message. Otherwise, when the message times out, node $V_i$ deletes the message from the buffer. Therefore, the size of the node's cache is also crucial for message forwarding. In order to have a deeper understanding of the message forwarding process, the detailed algorithm flow is as follows:
    Judge whether node V_i can join set CN
    If (node V_i is suitable for joining community C_j)
        Join node V_i to community C_j
    End if
    End for
    For each community
        Calculate E(C_i), E(C_j), E(C_i ∪ C_j) of communities C_i, C_j
        Calculate the combined entropy difference IE(C_i, C_j)
        If (I(k) ≈ I(k−1))
            Merge communities C_i, C_j
        End if
    End for
    Forward message from source node S
    If (destination node D and S are in the same community)
        Forward the message to nodes in the community where S is located
    Else
        If (the encountered node V_i is in the community where D is located)
            Forward the message to node V_i
        End if
    End if
    End
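As a rough illustration of the forwarding rule described in this section, the sketch below decides what a carrier node does when it meets another node, and drops buffered messages that have timed out. The `community_of` mapping, the `Action` names, and the fixed `ttl` parameter are hypothetical; in particular, the text ties the holding time to buffer size without giving a formula, so a plain time-to-live is assumed here.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver and delete local copy"
    FORWARD_COPY = "forward a copy within the community"
    HAND_OVER = "forward into the destination's community"
    KEEP = "keep in buffer"


def forwarding_action(carrier, encountered, destination, community_of):
    """Decide what the carrier does with one message when it meets a node.

    `community_of` maps a node id to a community id (hypothetical structure).
    """
    if encountered == destination:
        return Action.DELIVER
    dest_community = community_of.get(destination)
    if dest_community is None:
        return Action.KEEP  # destination's community is unknown
    encountered_in_dest = community_of.get(encountered) == dest_community
    carrier_in_dest = community_of.get(carrier) == dest_community
    if carrier_in_dest and encountered_in_dest:
        return Action.FORWARD_COPY   # situation (1): within the community
    if not carrier_in_dest and encountered_in_dest:
        return Action.HAND_OVER      # situation (2): between communities
    return Action.KEEP


def expire_messages(buffer, now, ttl):
    """Drop messages held for `ttl` or longer; `buffer` maps id -> stored time."""
    return {mid: t for mid, t in buffer.items() if now - t < ttl}


# Hypothetical network: nodes s and r in community 1, nodes d, g, h in community 2.
communities = {"s": 1, "r": 1, "d": 2, "g": 2, "h": 2}
```

With this mapping, `forwarding_action("s", "g", "d", communities)` yields `Action.HAND_OVER`, while meeting `r` instead leaves the message in the buffer, so no copy spreads outside the destination's community.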

IV. SIMULATIONS
The experiments in this paper are carried out on the opportunistic network simulator ONE [32], through which the performance of the proposed CREN algorithm is analyzed. The performance of CREN is demonstrated by comparison with the Epidemic (EPIC) [12], BUBBLE [27] and DRAFT [28] algorithms. Among them, the Epidemic algorithm is a classic routing algorithm that transmits messages based on a flooding strategy. The BUBBLE algorithm is a routing algorithm that divides the network into communities and passes messages by comparing the centrality between communities. The DRAFT algorithm is also based on clustering; it mainly considers the correlation between the clustering set and specific time, and forwards messages based on the clustering set. After considering the data required by the algorithm, four real datasets, Infocom5, Infocom6, Cambridge and Intel, are selected for the simulation experiments. The specific information of each dataset is shown in the corresponding table. In the simulations, we use iMote devices as experiment nodes; each node moves at a speed in the range 0.5-1.5 m/s and has a 5M cache. To verify the CREN algorithm, we use three performance evaluation indexes: transmission success ratio, routing overhead and average end-to-end delay. The transmission success ratio Tr is the ratio of the total number of data packets that successfully arrive at the destination node to the total number of data packets sent by the source node in a given time.

$$Tr = \frac{\text{success\_number}}{\text{created\_number}}$$
This metric describes the ability of the routing algorithm to correctly forward data packets to the destination node. Routing overhead refers to the total number of data packets forwarded by nodes in a certain period of time. It is usually evaluated by the overhead ratio, that is, the number of forwarded data packets beyond those delivered, relative to the number of delivered data packets. The formula is expressed as:
$$\frac{\text{forward\_number} - \text{delivered\_number}}{\text{delivered\_number}}$$
Transmission delay refers to the time required for data packets to travel from the source node to the target node, and is usually evaluated by the average transmission delay. A routing algorithm with a small transmission delay has strong transmission ability and high transmission efficiency. It also means that data packets occupy fewer resources during transmission.
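The three evaluation indexes can be computed from simple counters, as in the minimal sketch below. The function names and the handling of empty inputs are assumptions for illustration; the counts in the usage lines are made-up values, not results from the paper.

```python
def delivery_ratio(success_number, created_number):
    """Tr: delivered messages divided by created messages."""
    if created_number == 0:
        return 0.0
    return success_number / created_number


def overhead_ratio(forward_number, delivered_number):
    """(forwarded - delivered) / delivered; undefined when nothing is delivered."""
    if delivered_number == 0:
        return float("nan")
    return (forward_number - delivered_number) / delivered_number


def average_delay(delays):
    """Mean end-to-end delay over delivered messages, in the caller's time unit."""
    return sum(delays) / len(delays) if delays else float("nan")


# Illustrative counts only: 100 created, 80 delivered, 500 forwarding events.
tr = delivery_ratio(80, 100)
overhead = overhead_ratio(500, 80)
delay = average_delay([10.0, 20.0, 30.0])
```

Here 80 deliveries out of 100 created messages give a success ratio of 0.8, and 500 forwarding events for 80 deliveries give an overhead ratio of 5.25, meaning about five extra transmissions per delivered message.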

A. TRANSMISSION SUCCESS RATIO
The delivery ratio is defined as the ratio of the total number of messages that successfully reach their destination to the total number of messages. The higher the delivery ratio, the greater the probability that a message reaches the destination node successfully. Figure 5 shows the impact of simulation time on the packet transmission success rate of the CREN, BUBBLE, DRAFT, and Epidemic algorithms. When the simulation time is less than one day, the advantage of CREN is not obvious: forming communities in the network takes a certain amount of time, so in a relatively short period there are not enough nodes in the communities for effective data forwarding. As the simulation time increases, the transmission success rate of the CREN algorithm is consistently the highest among these algorithms. In the CREN algorithm, the nodes in the network are divided into several communities, and the nodes in a community are likely to communicate with each other often. At the same time, the CREN algorithm uses a dynamic community update strategy that calculates the entropy value from node attributes, which minimizes the number of unqualified nodes in the community.
The BUBBLE algorithm is a community-based routing algorithm, but it considers the time relationship of the nodes without considering social attributes, whereas the CREN algorithm takes into account a variety of related node attributes and therefore performs better. The Epidemic algorithm does not take node characteristics into account and forwards messages according to probability, which introduces considerable randomness, so it is not suitable for scenarios that require high stability. The DRAFT algorithm does not consider node priority, so a node that is more conducive to message forwarding may not receive the message, which causes message transmission to fail.

B. TRANSMISSION DELAY
The average end-to-end transmission delay of each algorithm is shown in Figure 6. Compared with the other algorithms, the CREN algorithm has the lowest average end-to-end delay. Because the CREN algorithm applies a dynamic community update strategy based on an analysis of the relevant node attributes, it reduces the number of inefficient nodes in the community and thereby the average end-to-end delay. In contrast, the Epidemic algorithm forwards the message directly to any node it encounters, which may cause a sharp increase in routing and forwarding delay. The BUBBLE algorithm combines the characteristics of community and centrality and can delete message copies in time when forwarding messages, so its transmission delay is lower than that of traditional algorithms. The DRAFT algorithm takes into account the time range attributes of nodes when forming a cluster set to forward messages, so it is a traditional routing algorithm with low latency. Among the above algorithms, the average end-to-end transmission delay of the CREN algorithm is the smallest.

C. ROUTING OVERHEAD
Figure 7 shows the comparison of routing costs between the above algorithms. Since the CREN algorithm adopts the idea of dynamic communities and fully considers the relevant attributes of the nodes, the average cost of CREN is always kept to a minimum. In the CREN algorithm, instead of blindly forwarding messages to any node, nodes forward messages selectively, using community membership to distinguish suitable relays. As a result, the number of message copies in the network is low, and the cost of routing is greatly reduced. The Epidemic algorithm, because of its arbitrary message forwarding mechanism, spends a lot of time and resources on redundant message copies, resulting in very high routing overhead. Although the DRAFT algorithm takes into account the characteristics of the clustering set, it may cause excessive waiting time for message forwarding, which contributes to its routing overhead. The BUBBLE algorithm takes into account the centrality of the nodes, which reduces the resources consumed by some unavailable nodes, so its routing overhead is optimized to some extent. In short, compared with the other algorithms, the routing overhead of the CREN algorithm is the lowest.
Based on the experimental results in Fig. 5, Fig. 6 and Fig. 7, when the simulation time is less than one day, the proposed CREN algorithm has no obvious advantage over the other algorithms. However, when the network simulation time is extended to at least 3 days, the CREN algorithm can increase the transmission success rate by 34.6%, reduce the transmission delay by 31%, and reduce the routing overhead by 26%. CREN combines a high transmission success rate with low transmission delay and low routing overhead, so it is a routing protocol with high performance.

V. CONCLUSION
This paper proposes a community clustering routing algorithm based on information entropy and the social relationship between nodes in the mobile opportunistic network. According to the experimental results, the algorithm performs well compared with other routing algorithms. The algorithm first proposes a method for selecting initial clustering nodes based on the density and distance of nodes and their attribute relationships. Then, the K-Modes algorithm, combined with the selected initial clustering nodes, is used to divide the entire network into communities. To ensure that the divided communities have a high degree of unity, a scheme is proposed to merge and cluster the communities in the network using the change of information entropy. At the same time, to ensure the efficiency of information transmission by nodes in the community, a dynamic community update strategy based on information entropy is proposed. This method analyzes various attributes of nodes, comprehensively evaluates the nodes, and deletes inefficient ones. Combining the above methods, a network node forwards messages according to the community state, so that a message can reach the destination node in an efficient and stable manner. Because the algorithm in this paper requires messages to be forwarded as copies, it places higher requirements on the cache of the network nodes. In future research, we will therefore further improve the performance of the algorithm in terms of caching and security.