Node Similarity Measure in Directed Weighted Complex Network Based on Node Nearest Neighbor Local Network Relative Weighted Entropy

Node similarity is a significant basis for analyzing features in complex networks. For complex networks with directed weighted edges, the complexity of the relationships among nodes and the diversity of relationship weights make it complicated to measure node similarity across a huge number of nodes. Therefore, a novel node similarity measure is proposed based on the design of node nearest neighbor local network relative weighted entropy. Firstly, a degree and strength based directed weighted complex network model is constructed; on this basis, the Node Nearest Neighbor Local network is defined. The structural features of each node in the local network are quantized into a set of decision probabilities with multiple indicators. Furthermore, Node Connection Tightness is given to compute the influence of the complex structural relationships in the local network on node similarity. Then, to evaluate how the outward and inward degree and strength of nodes in the local network affect node similarity, the concept of Node Nearest Neighbor Local Network Relative Weighted Entropy is designed to define the Node Relative Difference, which measures the structural difference between any two nodes. Accordingly, a novel node similarity measure is designed to measure the similarity between any pair of nodes in a directed weighted complex network, and the Similar Node Mining Algorithm is proposed to obtain the most similar nodes. To demonstrate the availability and effectiveness of the proposed measure, two sets of experiments were conducted on real-world complex networks. The results show that the measure can mine not only the most similar nodes within the same module, but also the most similar nodes from different modules.

In the study of complex networks, one of the most important directions is the node similarity measure, and measuring node similarity is fundamental work. Many studies on complex networks are based on node similarity measures, for example the classification of nodes [10], importance assessment of nodes [11], link prediction [12] and community division [13]-[15].
Researchers have proposed various methods to measure node similarity. For example, node similarity is measured based on path and distance [16], and most existing node similarity measures are based on nodes and their neighbor nodes [17], [18]. To simplify measurement, more studies focus on undirected unweighted networks [19]. Although this is faster and simpler to measure, much node information and many complex relationships between nodes are lost, resulting in incomplete decision information for a large number of nodes. As a result, the degree of discrimination of most node similarity values obtained by these measures is small, and node similarity cannot be accurately computed. With the development of research, new studies have addressed weighted complex networks [20]. For weighted complex networks, the weights of a node itself and of its neighbors are introduced. Even when the weights are considered, the influence of the nodes' complex relationship information on node similarity in the local network topology cannot be effectively measured.
Since most nodes in a network usually look similar in their structural characteristics and are simple to measure, quantifying structures and internal features has become the focus of measuring node similarity. Compared with undirected unweighted complex networks, directed weighted complex networks involve additional complicating factors, such as the complexity of the relationships between nodes and the diversity of relationship weights, which make node similarity measurement more complicated.
Hence, in this paper, in order to accurately measure node similarity in complex networks with directed weighted edges, a degree and strength based directed weighted complex network model is constructed. On the basis of the model, a Nearest Neighbor Local network can be identified, which takes into account both the nodes' local structure and the details of the relationships among nodes. In the stage of measuring node similarity, the weighted entropy and relative weighted entropy are both introduced, the Connection Tightness is given, and the Node Nearest Neighbor Local Network Relative Weighted Entropy is proposed to achieve the Node Similarity Measure. Then, the Similar Node Mining Algorithm is designed to measure node similarity and accurately obtain the most similar nodes in a complex network. The processing diagram of node similarity measurement and similar node mining in a directed weighted complex network is shown in Figure 1.
The rest of the paper is organized as follows. Section II presents a directed weighted complex network model. Section III introduces the Nearest Neighbor Local network with some primary definitions and describes the design of the Node Nearest Neighbor Local Network Relative Weighted Entropy. In Section IV, building on the above work, a novel Similar Node Mining Algorithm is proposed to measure node similarity in complex networks. The experimental evaluation verifying the feasibility and effectiveness of the proposed algorithm is given in Section V. The conclusion and future work are discussed in Section VI.

II. DEGREE AND STRENGTH BASED DIRECTED WEIGHTED COMPLEX NETWORK MODEL
In order to analyze the similarity of any two nodes in a complex network, a degree and strength based directed weighted complex network model is designed and constructed as $G(V, E, W, K, S)$. In a given network, the set of nodes is denoted as $V = \{v_i\}$. $E = \{e_{ij}\}$ denotes the set of directed edges from $v_i$ to $v_j$, namely $e_{ij} = \langle v_i, v_j \rangle$; the set of weights can correspondingly be expressed as $W = \{w_{ij}\}$, where $w_{ij}$ is the weight value of $e_{ij}$. $K$ is the set of node degrees, and $S$ is the set of node strengths, which includes the in-strength $s_{in}(v_i)$, out-strength $s_{out}(v_i)$ and strength $s(v_i)$ of $v_i$. Moreover, the edges between two nodes may have different directions and weights, which can be expressed as follows.
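As an illustration of the model, the following minimal sketch builds the five components from a list of weighted directed edges. It assumes the usual definitions: in-/out-degree counts incoming/outgoing edges, in-/out-strength sums their weights, and the total degree and strength are the sums of the in and out parts. The function name `build_model` and the dictionary layout are hypothetical, and each ordered node pair is assumed to appear at most once.

```python
from collections import defaultdict

def build_model(edges):
    """Build (V, E, W, K, S) from (source, target, weight) triples.

    Assumption: each ordered pair (source, target) appears at most once.
    """
    V = set()
    W = {}
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        V.update((u, v))
        W[(u, v)] = w      # w_ij is the weight of the directed edge e_ij
        k_out[u] += 1      # one more outgoing edge for u
        k_in[v] += 1       # one more incoming edge for v
        s_out[u] += w      # out-strength: total weight leaving u
        s_in[v] += w       # in-strength: total weight entering v
    # Assumed convention: total = in + out, for both degree and strength.
    K = {n: {"in": k_in[n], "out": k_out[n], "total": k_in[n] + k_out[n]} for n in V}
    S = {n: {"in": s_in[n], "out": s_out[n], "total": s_in[n] + s_out[n]} for n in V}
    E = set(W)
    return V, E, W, K, S

# Toy example (made-up edges, not data from the paper).
edges = [("a", "b", 2.0), ("b", "a", 0.5), ("a", "c", 1.0), ("c", "b", 3.0)]
V, E, W, K, S = build_model(edges)
```

Note that the edges `("a", "b")` and `("b", "a")` coexist here with different weights, which is the situation a directed weighted model has to represent.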
Taking a Gene network as an example, its directed weighted complex network model is built as shown in Figure 2. The weight in the model is a faithful expression of the actual complex network, and the arrow size in the figure represents the weight of the directed edge, which indicates the amount of chemical signals exchanged between genes.
Definition (Node Nearest Neighbor Local Network): In a degree and strength based directed weighted complex network model, the local network formed by the structural relationship between each node and its nearest neighbor nodes is the Node Nearest Neighbor Local network. This structure contains all the characteristics and information of the node, its nearest neighbor nodes, and the edges between these nodes.
Taking node 360 of the Gene network as an example, the Node Nearest Neighbor Local network of node 360 is shown in Figure 3.
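The definition can be illustrated with a short sketch that extracts the local network of one node from an edge list. It assumes that the "nearest neighbors" are the nodes joined to the center by an edge in either direction, and that the local network keeps every edge whose two endpoints both lie in this node set; the function name and the toy edges are hypothetical.

```python
def nearest_neighbor_local_network(edges, center):
    """Return (nodes, local_edges) for the local network around `center`.

    edges: iterable of (source, target, weight) triples.
    """
    edges = list(edges)
    nodes = {center}
    for u, v, _ in edges:
        if u == center:
            nodes.add(v)   # out-neighbor of the center
        elif v == center:
            nodes.add(u)   # in-neighbor of the center
    # Assumption: keep all edges among the center and its direct neighbors.
    local_edges = [(u, v, w) for u, v, w in edges if u in nodes and v in nodes]
    return nodes, local_edges

# Toy example (made-up edges, not the Gene network).
edges = [(1, 2, 1.0), (3, 1, 2.0), (2, 3, 0.5), (3, 4, 1.0), (4, 5, 1.0)]
nodes, local_edges = nearest_neighbor_local_network(edges, center=1)
```

Here node 4 is two hops away from node 1, so it and its edges are left out of the local network.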

III. NODE SIMILARITY MEASURE BASED ON NODE NEAREST NEIGHBOR LOCAL NETWORK RELATIVE WEIGHTED ENTROPY
As an asymmetric measure of the difference between two probability distributions, the relative entropy [21] is an important concept in probability theory and information theory.
The relative entropy of two probability distributions R and T can be described in the following form.
$$D_{KL}(R \| T) = \sum_{i=1}^{N} r_i \log \frac{r_i}{t_i}$$
where $r_i$ and $t_i$ are the $i$-th elements of R and T, and N is the number of elements in the probability sets R and T. In order to measure node similarity in a complex network, the Node Nearest Neighbor Local network is first defined for each node in Section II; it includes the node itself, its direct neighbor nodes and the corresponding relationships among these nodes. With the local network, in addition to the information of nodes and edges, the information of the local structure is also retained for each node.
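The asymmetry of the relative entropy is easy to see numerically. The sketch below implements the standard Kullback-Leibler form for two discrete distributions of equal length; the natural logarithm and the convention that terms with a zero first argument contribute nothing are assumptions of this example, not choices stated in the text.

```python
import math

def relative_entropy(R, T):
    """Standard relative entropy D(R || T) of two discrete distributions."""
    if len(R) != len(T):
        raise ValueError("R and T must have the same number of elements")
    total = 0.0
    for r, t in zip(R, T):
        if r == 0:
            continue          # convention: 0 * log(0 / t) = 0
        if t == 0:
            return math.inf   # R puts mass where T has none
        total += r * math.log(r / t)   # natural logarithm (assumed base)
    return total

R = [0.9, 0.1]
T = [0.5, 0.5]
d_rt = relative_entropy(R, T)   # D(R || T)
d_tr = relative_entropy(T, R)   # D(T || R)
```

The two directions give different values, which is why a similarity measure built on relative entropy has to combine both.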
Therefore, on the basis of the nearest neighbor local network of a node, the concept of relative entropy is introduced to propose the Node Nearest Neighbor Local Network Relative Weighted Entropy by using the following Definitions 1 to 6. Then, based on the Relative Weighted Entropy, the Node Relative Difference can be designed to define a novel Node Similarity Measure.
(12)
where $K_{D_i}$ and $S_{D_i}$ represent the degree set and strength set of $V_{D_i}$, respectively. And sum( , which express the node information probability of the transmitting node and the receiving node, respectively. Then, the above divided probability elements of $v_i$ can be calculated on the basis of the node degree $k(v_i)$ and the node strength $s(v_i)$, respectively.
With a comprehensive application of formula (12) to formula (14), the set $Q_i$ can be obtained and expressed as follows.
Definition 4: In the directed weighted complex network model, the weight values of edges are transformed to reflect how tightly nodes are connected in the local network topology. Therefore, the Node Connection Tightness of $v_i$ and $v_j$ can be defined and calculated by the following formula.
Because the weight $w_{ij}$ between $v_i$ and $v_j$ is symmetric, and $s_{in}$ and $s_{out}$ are positive values, the node connection tightness $con_{ij}$ in a directed weighted complex network is symmetric: $con_{ij} = con_{ji}$.
Definition 5: Owing to the complexity of the relationship between one node and its nearest neighbor nodes at different scales, and on the basis of Definition 3 together with the probability of node degree and strength in Definition 1, the Node Multiple Indexes Decision Probability Set $R_i$ can be defined as follows.
Determining the Node Multiple Indexes Decision Probability Set helps to calculate the Node Nearest Neighbor Local network Entropy accurately.
VOLUME 8, 2020
With the Node Nearest Neighbor Local Network Relative Weighted Entropy value of each node, the difference between any two nodes can be compared by using the two nodes' Node Relative Difference, which is defined as follows.
Definition 7: According to the decision property values of nodes, the complexity of the relationship and the diversity of relationship weights between two nodes $v_i$ and $v_j$ are captured by the Node Relative Difference $RD_{KL}(H(v_i) \| H(v_j))$, which can be obtained as follows. As can be seen from formula (26), $RD_{KL}(H(v_i) \| H(v_j))$ and $RD_{KL}(H(v_j) \| H(v_i))$ are not equal. Therefore, the similarity between the two nodes $v_i$ and $v_j$ needs to be calculated jointly from $RD_{KL}(H(v_i) \| H(v_j))$ and $RD_{KL}(H(v_j) \| H(v_i))$. By utilizing the node relative difference between $v_i$ and $v_j$, a novel Node Similarity Measure based on Node Nearest Neighbor Local Network Relative Weighted Entropy (LRWE-SNM) can be designed and expressed as follows.
In summary, the above formulas are used to obtain the Node Similarity Measure based on Node Nearest Neighbor Local Network Relative Weighted Entropy (LRWE-SNM). The similarity between any two nodes in a complex network can then be computed; a larger measure value between two nodes means a greater similarity between them.
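To make the idea of combining the two directed differences concrete, here is a minimal sketch under loudly stated assumptions. It is not the LRWE-SNM formula: each node is described by a plain, strictly positive probability vector (a stand-in for its entropy-based description), the directed difference is the ordinary relative entropy, the two directions are simply added, and the sum is mapped to a similarity by dividing by the largest observed difference. All names are hypothetical.

```python
import math

def kl(p, q):
    """Ordinary relative entropy; assumes strictly positive entries."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def similarity_matrix(H):
    """H maps node -> probability vector. Returns {(i, j): similarity}.

    Assumed construction: symmetric difference d_ij = kl(i, j) + kl(j, i),
    similarity = 1 - d_ij / max(d), so larger values mean more similar.
    """
    nodes = sorted(H)
    diff = {}
    for i in nodes:
        for j in nodes:
            if i != j:
                diff[(i, j)] = kl(H[i], H[j]) + kl(H[j], H[i])
    d_max = max(diff.values())
    if d_max == 0:
        return {pair: 1.0 for pair in diff}   # all nodes are identical
    return {pair: 1.0 - d / d_max for pair, d in diff.items()}

# Toy descriptions (made-up values).
H = {"a": [0.5, 0.5], "b": [0.6, 0.4], "c": [0.9, 0.1]}
sim = similarity_matrix(H)
```

Because both directions are added, `sim[("a", "b")]` equals `sim[("b", "a")]`, and the pair with the largest difference receives similarity 0.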

IV. SIMILAR NODE MINING ALGORITHM
To mine highly similar nodes in a directed weighted complex network, a Similar Node Mining Algorithm is proposed by using the Node Similarity Measure based on Node Nearest Neighbor Local Network Relative Weighted Entropy. With the algorithm, the similarity of each pair of nodes in the network can be measured, and then the most similar nodes in the network can be mined.

Algorithm
] by using formula (12) to formula (15);
7. Calculate the Node Connection Tightness $con_{ij}$ by using formula (16);
10. Calculate the node outward influence measure $H_{out}$ and the node inward influence measure $H_{in}$ by using formula (23) and formula (24), respectively; the Node Nearest Neighbor Local network Entropy $H$ can then be obtained with Definition 6;
11. For each $v_j$ ($i \neq j$) in $V$, do steps 12 to 13:
12. Using the above Node Nearest Neighbor Local network Entropy, obtain the Node Relative Difference $RD_{KL}(H(v_j) \| H(v_i))$ of nodes $v_i$ and $v_j$;
13. Calculate the node similarity measure value $Sim_{ij}$ between $v_i$ and $v_j$;
14. Rank all nodes in $V$ by their measure values in ascending order;
15. Obtain the similar nodes set.
As can be seen from the Similar Node Mining Algorithm, it is a five-stage process. The first stage is step 1, in which the directed weighted complex network model is constructed; the model is used to obtain the structural information needed to measure node similarity in the network. In the second stage, steps 2 to 3, the Node Nearest Neighbor Local network is identified for each node on the basis of the above model. The third stage is steps 4 to 8, in which each node's Node Properties Set and Neighborhood Node Information Set are calculated and the value of the Node Connection Tightness of each node is obtained. By considering the complexity of the relationships between nodes within the local network, the Node Multiple Indexes Decision Probability Set is then obtained. The fourth stage is steps 9 to 10, in which the Node Nearest Neighbor Local network Entropy of each node is calculated from the above information. The last stage is steps 11 to 15, in which the Node Relative Difference and the node similarity measure are computed in order, and then the set of highly similar nodes $V_{Sim}$ is obtained, which contains the top-t similar node subsets.
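The final stage (ranking and selecting the most similar nodes) can be sketched independently of how the similarity values are computed. The example below assumes that the pairwise similarities are already available as a dictionary and that, for each node, the top-t candidates are those with the largest values; ties are broken by node identifier. The function name, the input layout and the tie-breaking rule are assumptions of this sketch, not details taken from the algorithm listing.

```python
def mine_similar_nodes(sim, t=1):
    """Return {node: [t most similar nodes]} from {(i, j): similarity}."""
    by_node = {}
    for (i, j), value in sim.items():
        by_node.setdefault(i, []).append((value, j))
    result = {}
    for i, candidates in by_node.items():
        # Largest similarity first; equal values are ordered by node id.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        result[i] = [j for _, j in candidates[:t]]
    return result

# Toy similarity values (made up for illustration).
sim = {
    (1, 2): 0.9, (2, 1): 0.9,
    (1, 3): 0.2, (3, 1): 0.2,
    (2, 3): 0.5, (3, 2): 0.5,
}
top1 = mine_similar_nodes(sim, t=1)
top2 = mine_similar_nodes(sim, t=2)
```

With these values, nodes 1 and 2 are each other's most similar node, while node 3 is closest to node 2.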

V. EXPERIMENTAL ANALYSIS
In this section, to verify the feasibility and rationality of the Similar Node Mining Algorithm based on Node Nearest Neighbor Local Network Relative Weighted Entropy, the algorithm is applied to two groups of real-world complex networks of varying sizes. (i) The E. coli transcription network [22] with 95 nodes and 213 edges. In this network, each 'node' is an operon, and each 'edge' is directed from an operon that encodes a transcription factor to an operon that it directly regulates (an operon is one or more genes transcribed on the same mRNA). There are also three gene networks of different sizes. (ii) The Gene network with 636 nodes and 3959 edges [23]. In this network, each 'node' represents a gene, and the weight values of the 'edges' do not differ much from each other.
Two sets of experiments were conducted. One set is the overall similar-node scatter plots obtained under four different similarity measures, which reflect the distribution of similarity values between nodes in the complex network. The other set is the top similar node tables mined under the different similarity measures, which are analyzed to compare the accuracy of the proposed measure with that of the other three.

A. NODES SIMILAR SCATTER PLOTS
In order to evaluate the feasibility of the proposed Similar Node Mining Algorithm in measuring node similarity and mining the top similar nodes of a network, LRWE-SNM, Local relative entropy (LRE) [19], and two typical measures, Eigenvector (EI) [24] and Weighted Degree (WD) [25], are used for measuring node similarity. The overall similar-node scatter plots of the E. coli and Gene networks under the four measures are shown in Figure 4 to Figure 7, respectively.
Figure 4(a) and 4(b) show the node similarity scatter plots under the measures LRWE-SNM and LRE, respectively; each node has only one most similar node.
In Figure 5(a) and 5(b), different colors represent the different most similar nodes of each node. These two scatter plots are symmetric about the line y = x, which is consistent with the symmetry of the proposed measure. Furthermore, one node may have multiple most similar nodes, which gives the similarity measurement a lower degree of differentiation and makes it less accurate.
As shown in Figure 6(a) and 6(b), although each node corresponds to one most similar node, from an overall view the nodes in Figure 6(a) are more dispersed and the degree of discrimination is larger. This makes the similarity values between nodes in the network easier to distinguish, and each node has its own corresponding most similar nodes. Figure 7(a) shows the disparity in node dispersion under WD, and Figure 7(b) shows a partial enlargement for nodes No. 1 to 100, in which each node has more similar nodes. The remaining nodes in Figure 7(a) have fewer similar nodes.
For the LRWE-SNM and LRE methods, comparing the networks of different sizes in Figure 4 to Figure 7 shows that the most similar node found by LRE is close to nodes in its own local network, and few most similar nodes lie in other parts of the network. As the scale of the network increases, this distribution phenomenon becomes more and more obvious. In contrast, the most similar nodes found by LRWE-SNM are more dispersed over the whole network, and the most similar nodes can be mined from the entire network structure. Moreover, this is not affected by the network size.

B. TOP SIMILAR NODES UNDER DIFFERENT MEASURES
In order to illustrate the effectiveness of the proposed similarity measure, the top similar nodes under different measures for the Gene network with 636 nodes and 3959 edges were analyzed in combination with the above scatter plots. First, the similarity values between every two nodes are arranged in descending order. When several nodes have the same similarity value, the first 10 nodes are selected according to the serial numbers of the nodes; the nodes corresponding to the top 10 and the last 10 values are used as the objects of analysis. The similarity values between these nodes and their corresponding most similar nodes are listed in Table 1 and Table 2, respectively.
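The selection rule described above (descending similarity, ties resolved by node serial number) can be sketched as a single sort with a compound key. The input layout, a mapping from each node to its most similar node and the corresponding similarity value, and the function name are assumptions of this example; the values are made up and are not those of Table 1 or Table 2.

```python
def top_and_last(best, n=10):
    """Split nodes into the n highest and n lowest by similarity value.

    best: {node: (most_similar_node, similarity)}.
    Ties in similarity are ordered by ascending node serial number.
    """
    ranked = sorted(best.items(), key=lambda item: (-item[1][1], item[0]))
    return ranked[:n], ranked[-n:]

# Toy input (made-up values).
best = {3: (1, 0.9), 1: (3, 0.9), 2: (1, 0.4), 4: (2, 0.1)}
top, last = top_and_last(best, n=2)
top_nodes = [node for node, _ in top]
last_nodes = [node for node, _ in last]
```

Nodes 1 and 3 share the same similarity value, so the lower serial number comes first in the top list.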
Next, the nodes in Table 3 were extracted from Table 1 and Table 2 together, with duplicate values removed. The most similar nodes measured by LRWE-SNM, LRE, EI and WD are then listed in Table 3. Columns LRWE-SNM and LRE show the most similar node. Since the number of most similar nodes obtained by the measures EI and WD is large, not all of them are listed in the table: columns EI and WD show only the nodes that also appear in the previous two columns, and the remaining most similar nodes are indicated by an ellipsis. The number in parentheses is the number of most similar nodes mined by the measure. ''/'' means that there is no most similar node for the corresponding node in the first column.
According to the results shown in Table 3, in the LRE column, the most similar node for many nodes in the first column is 76, while under the measures EI and WD the corresponding nodes in the first column have either many similar nodes or none.
For the Gene network model, the similar nodes of the Gene network are shown together with its structural characteristics in Figure 8. Take the nodes whose most similar node is 76 as an example. As can be seen from Figure 8, node 76 is in the red area. The nodes whose most similar node under the LRE measure is 76 are nodes 5, 9, 18, 23, 24, etc., which lie in the same area and are shown in green in the figure. However, the most similar nodes mined by the method designed in this paper are distributed throughout the network. For example, node 76 and node 9 (in the green area) are most similar to each other under the LRE method, whereas node 9 and node 434 (in the yellow area) are most similar to each other under the LRWE-SNM method. This shows that the method proposed in this paper can mine not only nodes with large similarity in the same module, but also nodes with large similarity across different modules.

VI. CONCLUSION
In order to accurately measure node similarity in directed weighted complex networks, a complex network model with a defined Node Nearest Neighbor Local network is designed and constructed. According to this model, the characteristics of nodes and edges in the structure are digitized. Using the forms of weighted entropy and relative entropy, the Node Nearest Neighbor Local Network Relative Weighted Entropy is designed on the basis of the Node Nearest Neighbor Local network, so that the difference in structural information between nodes can be evaluated. On this basis, a Similar Node Mining Algorithm based on Node Nearest Neighbor Local Network Relative Weighted Entropy is designed to measure the similarity of any two nodes in a directed weighted complex network. In addition, two sets of comparison experiments with three other classic measures were conducted on real-world complex networks of different scales. The experimental results show that the most similar nodes in the same module can be obtained, and that the most similar nodes from different modules can also be mined by the measure.
In future work, to measure the heterogeneity of complex directed networks more precisely, the structure of the Node Nearest Neighbor Local network will be used together with entropy, and the difference between local networks will be considered to quantify the structural heterogeneity of directed weighted complex networks.