Signed Network Node Embedding via Dual Attention Mechanism

In signed networks, graph neural networks (GNNs) obtain node embeddings by aggregating information from each node's neighbors. Most existing methods aggregate neighbor information only at the node level, yet the different paths between a node and its neighbors also affect the node embedding. The links between a target node and its neighbors differ in sign (positive or negative) and in direction, and together these form different paths that contribute differently to the target node. Based on structural balance theory and status theory, this paper divides the paths between nodes and their neighbors into 20 kinds of motifs, which are used to capture the different effects of paths on target nodes. Considering both the node level and the path level, SNEDA (Signed Network Embedding via Dual Attention mechanism) is proposed based on the graph attention network. The model has two attention mechanisms: node-level attention captures the different influences between nodes, and path-level attention captures the different influences between motifs. The final vector representation of each node is obtained by selectively aggregating neighbor information based on important motifs, and this representation is applied to link prediction. Experiments on a signed link prediction task over four real-world signed network datasets show that the representation learned by the model improves link prediction accuracy, demonstrating the effectiveness of the proposed framework.

A signed network is a network with two kinds of opposing relations, positive and negative. Positive relations include friendship, support, and liking, while negative relations include enmity, opposition, and dislike [9]. When embedding nodes in signed networks, positive edges, negative edges, and edge directions all pose great challenges [10], [11]. To address this, researchers use the sociological theories specific to signed networks (structural balance theory and status theory) to model the network structure so that positive and negative edges are correctly distinguished, and combine them with graph neural networks or deep learning to learn node embeddings [12], [13]. Methods of this kind achieve state-of-the-art performance in signed network embedding. However, they focus on the influence of neighbors at the node level and ignore path information. The different directions and signs of the links between a node and its neighbors form complex paths, and these different paths also affect the node embedding.
In this paper, the SNEDA method is proposed. It captures path information (the different directions and signs of links) by dividing neighbors into motifs, learns the influence between nodes under each motif with node-level attention, and learns the weights of the different motifs with path-level attention. The vector representation of each node is obtained by aggregating neighbor information based on the important motifs, so it integrates both node neighbor information and network structure information. Experiments show that the SNEDA model improves the quality of node embeddings.
The major contributions of this paper are as follows: 1) We propose a path-attentional layer, which estimates the importance coefficients of different motifs for the embedding aggregation process. 2) We introduce the HAN to model the signed network and design a new motif-based GNN model for signed networks named SNEDA. 3) We evaluate the effectiveness of the proposed SNEDA framework on several real-world signed network datasets through the signed link prediction task.
The rest of this paper is organized as follows. In Section II, related works are given. Section III introduces sociological theory and motifs. Section IV introduces the SNEDA framework. The experimental studies are shown in Section V. Finally, the conclusions are given in Section VI.

II. RELATED RESEARCH
Network representation learning methods include those based on matrix decomposition [14], random walks, and deep learning. DeepWalk [15] and Node2Vec [16] are random-walk-based representation learning methods that work well on simple graph structures. However, real networks are sparse and highly nonlinear, and these methods can neither capture the structural information of the network nor retain both its local and global information. The SDNE [5] model uses an autoencoder to optimize first-order and second-order similarity at the same time. First-order similarity serves as supervised information to retain the local structure of the network, while second-order similarity, as the unsupervised part, captures the global structure; SDNE is therefore a semi-supervised deep learning model. GCN learns node embeddings by combining the graph topology with node attribute information. It is a transductive learning method and cannot be directly generalized to nodes that did not appear during training [10], [17]. GraphSage therefore proposed that node embeddings be obtained by aggregating neighbor information through a shared aggregation function [18]. Training only needs to learn this aggregation function, which can then be generalized to unseen nodes. GCN combines the features of neighboring nodes with the structure of the graph but cannot assign different weights to neighbors during convolution, which limits the generalization ability of the trained model on other network structures [19]. The graph attention network (GAT) was therefore proposed to assign different attention weights between nodes independently of the graph structure and to aggregate neighbor information with these weights, greatly improving the expressive power of graph neural network models [20].
However, the above methods are designed for unsigned networks, and signed networks have different properties (negative links) compared with unsigned networks. Therefore, representation learning methods for unsigned networks cannot be directly applied to signed networks.
In recent years, research on signed network embedding has mainly been based on structural balance theory and status theory, using deep learning to learn network representations. Shuhan [21] first proposed the SNE model for signed networks using deep learning. SNE combines the edge sign information and the representations of the nodes along a path with a log-bilinear model. After that, Shuhan [22] extended structural balance theory and proposed the SiNE model for signed network modeling, which uses triplets containing both a positive and a negative relationship to ensure that the distance between positively related node pairs is far smaller than that between negatively related node pairs, and which gives a similarity measure that better reflects the node representations. Tyler Derr [23] uses balance theory to aggregate and propagate multi-layer information in a GCN model on signed networks (SGCN), but this method ignores the different effects between nodes. Li Yu [24] and others then introduced the attention mechanism into the SGCN method, assigned different weight coefficients to node pairs, and aggregated the neighbor information of nodes according to structural balance theory. Huang Junjie [25] and others divided the network structure into different motifs and aggregated the node neighbor information under each motif with GAT. Subsequently, Huang Junjie [26] designed a new information propagation and aggregation mechanism using structural balance theory, and used an average sampling theory to learn node embeddings. Most of the above methods consider only the influence between nodes, although different edge directions also affect the nodes to varying degrees. Therefore, this paper proposes the SNEDA method, which uses a dual attention mechanism to capture both the influence between nodes and the influence of edge direction, making the vector representation of target nodes richer and more complete.

III. THEORETICAL KNOWLEDGE
A. STRUCTURAL BALANCE THEORY
1) STRUCTURALLY BALANCED TRIANGLE
Whether a triangle is structurally balanced can be judged by the product of the signs of its three edges: if the product is positive, the triangle is balanced; otherwise, it is unbalanced. Structural balance [27] is defined from the perspective of sociology and psychology and can be summarized in four intuitive rules: the friend of my friend is my friend; the enemy of my friend is my enemy; the friend of my enemy is my enemy; the enemy of my enemy is my friend. Research shows that in real signed networks the number of structurally balanced triangles is much larger than the number of unbalanced ones, and that an unbalanced network gradually evolves into a balanced network over time [28].

2) STRUCTURAL BALANCE CIRCLES
If an L-cycle ($L \geq 3$) contains an even number of negative edges, the structure is balanced; otherwise, the structure is unbalanced.
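The balance rules for triangles and for longer cycles both reduce to the sign of a product, which the following minimal Python sketch illustrates. The function name `is_balanced_cycle` and the encoding of edge signs as +1 and -1 are assumptions made for illustration, not part of the original method.

```python
from math import prod


def is_balanced_cycle(signs):
    """Return True if a cycle whose edges carry the given signs is balanced.

    `signs` is a sequence of +1 / -1 values, one per edge of the cycle
    (hypothetical encoding chosen for this sketch).
    """
    if len(signs) < 3:
        raise ValueError("a cycle needs at least three edges")
    if any(s not in (1, -1) for s in signs):
        raise ValueError("edge signs must be +1 or -1")
    # An even number of negative edges is equivalent to a positive product.
    return prod(signs) > 0


# "The enemy of my enemy is my friend": two negative edges, one positive.
print(is_balanced_cycle([1, -1, -1]))
# A triangle with a single negative edge.
print(is_balanced_cycle([1, 1, -1]))
# A 4-cycle with two negative edges.
print(is_balanced_cycle([-1, -1, 1, 1]))
```

The triangle rule is the special case of three edges, so one function covers both definitions.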

B. SOCIAL STATUS THEORY
Structural balance theory provides a theoretical basis for the analysis of undirected signed networks, but it deviates considerably when applied to directed signed networks. Subsequently, Leskovec et al. [29] proposed a social status theory for directed signed networks, which holds that a positive edge from A to B means that A regards B as having a higher social status than A, whereas a negative edge from A to B means that A regards B as having a lower social status than A, and that this status relation is transitive.
In a signed network composed of three nodes, whether a triangle conforms to social status theory is determined as follows. First, reverse the direction of all negative links in the triangle and convert their signs to positive. If the resulting triangle does not form a directed cycle, the triangle conforms to social status theory; otherwise, it does not. If every member of a system follows the same status ordering and there is no conflict in status, then the sign of an edge can be inferred as soon as its direction is known.
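The triangle test described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `conforms_to_status` and the `(source, target, sign)` edge encoding are assumptions, and the input is assumed to be a proper triangle (three edges, one per node pair).

```python
def conforms_to_status(edges):
    """Check a directed signed triangle against social status theory.

    `edges` is a list of three (source, target, sign) tuples with sign in
    {+1, -1}; the tuple layout is a hypothetical encoding for this sketch.
    """
    if len(edges) != 3:
        raise ValueError("expected exactly three edges")
    # Step 1: reverse every negative link and treat all links as positive.
    positive = [(u, v) if sign > 0 else (v, u) for u, v, sign in edges]
    nodes = {n for edge in positive for n in edge}
    if len(nodes) != 3:
        raise ValueError("expected exactly three distinct nodes")
    # Step 2: a directed triangle is a cycle exactly when every node has
    # one outgoing and one incoming edge.
    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for u, v in positive:
        out_deg[u] += 1
        in_deg[v] += 1
    forms_cycle = all(out_deg[n] == 1 and in_deg[n] == 1 for n in nodes)
    return not forms_cycle


# A -> B (+), B -> C (+), A -> C (+): a consistent status ordering exists.
print(conforms_to_status([("A", "B", 1), ("B", "C", 1), ("A", "C", 1)]))
# A -> B (+), B -> C (+), A -> C (-): reversing the negative link closes a cycle.
print(conforms_to_status([("A", "B", 1), ("B", "C", 1), ("A", "C", -1)]))
```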

C. SIGNED MOTIFS
According to the status theory discussed above, negative links have different properties from positive links, and different link directions carry different meanings [30], [31].
To distinguish different types of node neighbors, the network structure is divided into different signed motifs according to the directions and signs of the links, and each motif is learned separately, as shown in Figure 3. When node $v_j$ is a first-order neighbor of node $v_i$, there are 4 different motifs in total. When node $v_k$ is a second-order neighbor of node $v_i$, there are 16 different motifs in total.
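The counts of 4 and 16 follow from combining two link directions with two link signs per hop. The short Python sketch below enumerates the motifs under that assumption; the labels `"out"`/`"in"` and `"+"`/`"-"` are hypothetical names, and the exact motif shapes of Figure 3 may be drawn differently.

```python
from itertools import product

DIRECTIONS = ("out", "in")  # link leaves or enters the node nearer the target
SIGNS = ("+", "-")

# One hop: 2 directions x 2 signs = 4 link types.
EDGE_TYPES = list(product(DIRECTIONS, SIGNS))

# First-order motifs are single links; second-order motifs are pairs of links.
FIRST_ORDER = [(edge,) for edge in EDGE_TYPES]
SECOND_ORDER = [tuple(pair) for pair in product(EDGE_TYPES, repeat=2)]

MOTIFS = FIRST_ORDER + SECOND_ORDER
MOTIF_INDEX = {motif: index for index, motif in enumerate(MOTIFS)}

print(len(FIRST_ORDER), len(SECOND_ORDER), len(MOTIFS))
```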

IV. MODEL INTRODUCTION
This section introduces the SNEDA model in detail. As shown in Figure 4, different motifs are first generated from the different types of node neighbors. Node-level attention then learns the weights of the neighbors under each motif and aggregates them to obtain a motif-specific node embedding. Path-level attention learns the importance of the different motifs. The vector representation of each node is obtained by aggregating neighbor information based on the important motifs. Finally, the quality of the node embeddings is evaluated through the downstream task of link prediction. The notations used are listed in Table 1.

A. NODE NEIGHBOR SAMPLING
In real social networks, a user's link behavior is affected by neighboring nodes, and each type of neighbor has a different impact on the target node. To separate the effects caused by different edge directions and link signs, neighbors are divided into motifs, and the influence between a node and its neighbors is learned within each motif. Motifs are therefore used to sample node neighbor information.
Node neighbor sampling refers to sampling the first-order and second-order neighbors of the target node. The first-order neighbors can be divided into four motifs. The first-order neighbor set of $v_i$ is $F(v_i) = \bigcup_{m=1}^{4} F_m(v_i)$ with $F_m(v_i) \subseteq N(v_i) \subseteq V$, where $F_m(v_i)$ is the first-order neighbor set under motif $m$, $V$ is the set of all nodes in the network, $\cup$ denotes union, and $N(v_i)$ is the set of first-order neighbors of $v_i$. The sampling of $F$ follows the idea of a random walk:
a) starting from the target node $v_i$, walk to each node $v_j$ in turn, where $v_j$ is a neighbor of $v_i$, i.e., $v_j \in N(v_i)$;
b) determine the value of $m$ from the direction and sign of the link between $v_i$ and $v_j$, and add $v_j$ to the set $F_m(v_i)$.
Steps a) and b) are repeated until all first-order neighbors have been visited.
Similarly, the second-order neighbors can be divided into 16 kinds of motifs. $S_m(v_i)$ denotes the second-order neighbor set under motif $m$, and a hyperparameter $p$ controls the number of second-order neighbors. The second-order neighbor sets are sampled as follows:
c) starting from the target node $v_i$, walk to each node $v_j$ in turn, where $v_j$ is a first-order neighbor of $v_i$;
d) starting from $v_j$, walk to each node $v_k$ in turn, where $v_k$ is a first-order neighbor of $v_j$; determine the motif from the directions and signs along the path, and add $v_k$ to $S_m(v_i)$.
Steps c) and d) are repeated until all nodes have been visited. After the walk, the nodes in each set are sorted in descending order of degree centrality and only the top $p$ nodes are retained, which completes the sampling of the second-order neighbor set $S$.
Through node neighbor sampling, the first-order neighbor sets $F_m(v_i)$ and second-order neighbor sets $S_m(v_i)$ of each node are finally obtained.
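Steps a) to d) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a dictionary mapping directed pairs `(u, v)` to a sign, motifs are keyed by tuples of `(direction, sign)` link types, plain node degree stands in for degree centrality, and the helper names are hypothetical.

```python
from collections import defaultdict


def link_types(edges, a, b):
    """(direction, sign) of every link between a and b, seen from a."""
    types = []
    if (a, b) in edges:
        types.append(("out", edges[(a, b)]))
    if (b, a) in edges:
        types.append(("in", edges[(b, a)]))
    return types


def neighbors(edges, v):
    """All nodes sharing a link with v, ignoring direction."""
    return {b if a == v else a for (a, b) in edges if v in (a, b)}


def sample_neighbors(edges, target, p):
    """Return (F, S): first- and second-order neighbor sets per motif."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    first = defaultdict(set)
    second = defaultdict(set)
    for vj in neighbors(edges, target):                  # steps a) and c)
        for t1 in link_types(edges, target, vj):
            first[(t1,)].add(vj)                         # step b)
            for vk in neighbors(edges, vj):              # step d)
                if vk == target:
                    continue
                for t2 in link_types(edges, vj, vk):
                    second[(t1, t2)].add(vk)
    # Keep the top-p nodes of each second-order set, highest degree first.
    top = {m: sorted(nodes, key=lambda n: (-degree[n], n))[:p]
           for m, nodes in second.items()}
    return dict(first), top


edges = {("a", "b"): 1, ("b", "c"): -1, ("c", "e"): 1,
         ("b", "f"): -1, ("d", "b"): 1}
F, S = sample_neighbors(edges, "a", p=1)
print(F)
print(S)
```

Whether second-order sets should also exclude first-order neighbors is not stated in the text, so the sketch only excludes the target node itself.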

B. NODE LEVEL ATTENTION
Based on the above discussion, we choose node-level attention to learn the influence between nodes and their neighbors. Node-level attention learns the weight coefficients between a node and its neighbors under the same motif and aggregates the weighted neighbor vectors to obtain the vector representation of the node under that motif. This section describes how node-level attention is used to learn the influence between node pairs, aggregate the learned neighbor representations, and generate a new embedded representation of the target node.
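The self-attention mechanism can learn the weight of each node pair. The importance of a node pair under a given motif is expressed as

$$e^{\phi}_{ij} = att_{node}(h_i, h_j; \phi) \tag{1}$$

Here $att_{node}$ denotes the deep neural network that performs node-level attention, and the attention coefficient $e^{\phi}_{ij}$ represents the importance of node $v_j$ to node $v_i$ under a specific motif; $h_i$ and $h_j$ are the feature vectors of nodes $v_i$ and $v_j$; $\phi$ denotes the motif, taken from the set $\{\phi_1, \phi_2, \ldots, \phi_{20}\}$.
The coefficients are then normalized with softmax to obtain the attention coefficient:

$$\alpha^{\phi}_{ij} = \mathrm{softmax}_j\left(e^{\phi}_{ij}\right) = \frac{\exp\left(\sigma\left(\mathbf{a}_{\phi}^{\top}\,[h_i \,\|\, h_j]\right)\right)}{\sum_{k \in N^{\phi}_i} \exp\left(\sigma\left(\mathbf{a}_{\phi}^{\top}\,[h_i \,\|\, h_k]\right)\right)} \tag{2}$$

Here $\sigma$ is the activation function, $\alpha^{\phi}_{ij}$ is the attention coefficient of the node pair under the specific motif, $\mathbf{a}_{\phi}$ is the node-level attention vector of motif $\phi$, $N^{\phi}_i$ is the neighbor set of node $v_i$ under the specific motif, and $[\cdot \,\|\, \cdot]$ denotes the concatenation operation.
After that, the neighbor information of the target node $v_i$ is aggregated from the feature vectors of its neighbors and the corresponding coefficients:

$$h^{\phi}_i = \sigma\left(\sum_{j \in N^{\phi}_i} \alpha^{\phi}_{ij}\, h_j\right) \tag{3}$$

After node-level attention learning, each node $v_i$ has a new feature vector under each motif. Given the motif set $\{\phi_1, \ldots, \phi_{20}\}$, the embeddings $\{h^{\phi_1}_i, \ldots, h^{\phi_{20}}_i\}$ of node $v_i$ under the different motifs are finally learned by node-level attention.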
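To make the node-level step concrete, the following pure-Python sketch computes GAT-style attention coefficients for one node under one motif and aggregates the neighbor features. It is a sketch under stated assumptions: the scoring function (a LeakyReLU applied to the dot product of an attention vector with the concatenated features), the name `node_level_attention`, and the omission of an output activation are all illustrative choices, not details confirmed by the text.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def node_level_attention(h_i, neighbor_feats, att_vec):
    """Return (coefficients, aggregated vector) for one node and one motif.

    h_i            -- feature vector of the target node (list of floats)
    neighbor_feats -- feature vectors of its neighbors under the motif
    att_vec        -- attention vector of length 2 * len(h_i) (assumed form)
    """
    if not neighbor_feats:
        raise ValueError("the node has no neighbors under this motif")
    scores = []
    for h_j in neighbor_feats:
        concat = h_i + h_j  # list concatenation plays the role of [h_i || h_j]
        scores.append(leaky_relu(sum(a * x for a, x in zip(att_vec, concat))))
    # Softmax over the neighbors, shifted by the maximum for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of neighbor features (output activation omitted here).
    dim = len(h_i)
    aggregated = [sum(a * h_j[d] for a, h_j in zip(alphas, neighbor_feats))
                  for d in range(dim)]
    return alphas, aggregated


alphas, h_new = node_level_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 1.0, 0.0])
print(alphas, h_new)
```

Running the function once per motif yields the motif-specific embeddings that the path-level stage consumes.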

C. PATH LEVEL ATTENTION
In signed networks, each node has many types of neighbors, and the node embedding produced by node-level attention reflects the node from only one aspect. As discussed above, node-level attention helps pick out important neighbor information. To learn a more comprehensive node embedding, richer path information must also be integrated, and path-level attention helps choose the important paths. In this way, the model selectively aggregates neighbor information based on important relationship paths, which enhances its ability to represent features. Note that node-level attention and path-level attention are not specific techniques; rather, we consider the influence of neighbors and paths on the target node at the node level and the path level, and use the attention mechanism to capture this influence. This section describes how path-level attention is used to learn the impact of different motifs.
Taking the node embeddings learned by node-level attention as input, the weight of each motif is learned as follows:

$$(\beta_{\phi_1}, \ldots, \beta_{\phi_{20}}) = att_{path}\left(h^{\phi_1}, \ldots, h^{\phi_{20}}\right) \tag{4}$$

Here $att_{path}$ denotes the deep neural network that performs path-level attention, and $\beta_{\phi_m}$ is the weight of motif $\phi_m$. The node embeddings learned by node-level attention are first passed through a nonlinear transformation (for example, a one-layer MLP), and the importance of each motif is obtained by averaging over all nodes:

$$w_{\phi_m} = \frac{1}{|V|} \sum_{v_i \in V} q^{\top} \sigma\left(W h^{\phi_m}_i + b\right) \tag{5}$$

Here $W$ is the weight matrix, $b$ is the bias vector, $q$ is the path-level attention vector, and $\sigma$ is a nonlinear activation. The embeddings of all motifs share these parameters. After the importance of each motif is obtained, it is normalized with the softmax function:

$$\beta_{\phi_m} = \frac{\exp\left(w_{\phi_m}\right)}{\sum_{n=1}^{20} \exp\left(w_{\phi_n}\right)} \tag{6}$$

Here $\beta_{\phi_m}$ is the weight coefficient of motif $\phi_m$. The higher the weight, the more important the motif. Different motifs contribute differently to the target nodes. With the learned weight coefficients of the motifs, the final target node embedding $Z$ is obtained as

$$Z = \sum_{m=1}^{20} \beta_{\phi_m}\, h^{\phi_m} \tag{7}$$
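The path-level stage can be sketched in pure Python as below. The sketch assumes a tanh nonlinearity inside the one-layer transformation and uses small hand-written matrices; the function name `path_level_attention` and the dictionary layout (motif name mapped to a list of node vectors in a fixed node order) are hypothetical choices for illustration.

```python
import math


def path_level_attention(motif_embeddings, W, b, q):
    """Return (beta, Z): motif weights and fused node embeddings.

    motif_embeddings -- dict: motif -> list of node vectors (same node order)
    W, b, q          -- shared weight matrix, bias vector, attention vector
    """
    def importance(vectors):
        total = 0.0
        for h in vectors:
            # One-layer transformation with an assumed tanh nonlinearity.
            hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                      for row, bias in zip(W, b)]
            total += sum(qk * hk for qk, hk in zip(q, hidden))
        return total / len(vectors)  # average over all nodes

    w = {m: importance(vs) for m, vs in motif_embeddings.items()}
    # Softmax over motifs, shifted by the maximum for numerical stability.
    top = max(w.values())
    exps = {m: math.exp(val - top) for m, val in w.items()}
    norm = sum(exps.values())
    beta = {m: e / norm for m, e in exps.items()}
    # Fuse the motif-specific embeddings with the learned motif weights.
    first = next(iter(motif_embeddings.values()))
    n_nodes, dim = len(first), len(first[0])
    Z = [[sum(beta[m] * motif_embeddings[m][i][d] for m in motif_embeddings)
          for d in range(dim)] for i in range(n_nodes)]
    return beta, Z


embeddings = {"m1": [[1.0, 0.0], [1.0, 0.0]], "m2": [[0.0, 1.0], [0.0, 1.0]]}
beta, Z = path_level_attention(
    embeddings, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], q=[1.0, 0.0])
print(beta)
print(Z)
```

Because the parameters are shared across motifs, the weights depend only on how each motif's embeddings respond to the same transformation.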

D. LINK PREDICTION
The quality of the learned node embeddings is verified through a link prediction experiment: the vector representation of each node is learned by the SNEDA model, and the node vector representations in the training set are fed as node features into a binary logistic regression classifier for link prediction. To better learn the model parameters, the SNEDA model adopts cross-entropy as the loss function, given by formula (8). In this loss, C adjusts the proportion between the numbers of positive and negative links, and the two neighbor sets that appear in it are the set of all neighbors connected to a node by positive links and the set of all neighbors connected to it by negative links. The function reflects that the embeddings of friends should be similar, while the embeddings of enemies should not. The model parameters are updated by continuously reducing the cross-entropy loss. After several rounds of optimization, when the loss becomes stable, the final vector representation of node $v_i$ is obtained. The specific algorithm process is summarized in the algorithm listing.
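As an illustration of a loss with the properties described above (friends pulled together, enemies pushed apart, and a factor C balancing the two terms), the sketch below uses an inner-product, log-sigmoid form. This particular form is an assumption chosen for illustration and is not claimed to be the paper's formula (8); the names `signed_link_loss`, `pos_neighbors`, and `neg_neighbors` are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def signed_link_loss(Z, pos_neighbors, neg_neighbors, C=1.0):
    """Cross-entropy style loss over signed neighbor pairs (assumed form).

    Z             -- dict: node -> embedding vector
    pos_neighbors -- dict: node -> nodes linked by positive edges
    neg_neighbors -- dict: node -> nodes linked by negative edges
    C             -- balance factor for the negative term
    """
    loss = 0.0
    for i, z_i in Z.items():
        for j in pos_neighbors.get(i, ()):
            loss -= log_sigmoid(dot(z_i, Z[j]))       # friends: similar
        for k in neg_neighbors.get(i, ()):
            loss -= C * log_sigmoid(-dot(z_i, Z[k]))  # enemies: dissimilar
    return loss


Z = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [-1.0, 0.0]}
good = signed_link_loss(Z, {0: [1]}, {0: [2]})
bad = signed_link_loss(Z, {0: [2]}, {0: [1]})
print(good, bad)
```

With embeddings that agree with the link signs the loss is lower than with the signs swapped, which is the behavior the text attributes to the objective.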

V. EXPERIMENT
In the experiment, all code was written in Python with PyTorch. The computer is configured with an i7-6700 CPU (six cores and twelve threads), 1GB of memory, and an AMD R7 graphics card with 2GB of memory. The experimental process is shown in Figure 5. First, the vector representations of the nodes are obtained through representation learning: 80% of the edges are randomly selected as the training set, and these edges produce the vector representations of the nodes through the SNEDA model. The remaining 20% of the edges are used as the test set, and the node vector representations learned from the training set are used as node features, which are fed into the binary logistic regression model. Finally, the link sign prediction performance of the model on the test set is measured.
Algorithm (SNEDA training, excerpt):
Get the set of neighbors of the node
Calculate the weight coefficient $\alpha^{\phi}_{ij}$ with formula (2)
Calculate the node embedding h
The loss value is calculated by formula (8)
The gradient is calculated by backpropagation to update the model parameters
end for
end procedure
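The 80%/20% protocol described above can be sketched as follows. The helper names `split_edges` and `edge_features` are hypothetical, and building an edge feature by concatenating the two endpoint embeddings is an assumption; the text only states that node vectors are used as features for the logistic regression classifier.

```python
import random


def split_edges(edges, train_ratio=0.8, seed=0):
    """Randomly split signed edges into a training and a test list."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def edge_features(embeddings, edge_list):
    """Build classifier inputs: concatenated endpoint embeddings (assumed)."""
    X = [embeddings[u] + embeddings[v] for u, v, _ in edge_list]
    y = [1 if sign > 0 else 0 for _, _, sign in edge_list]
    return X, y


# Toy signed edge list: (source, target, sign).
edges = [(i, i + 1, 1 if i % 2 else -1) for i in range(10)]
train, test = split_edges(edges)
embeddings = {i: [float(i), 1.0] for i in range(11)}
X_train, y_train = edge_features(embeddings, train)
print(len(train), len(test), len(X_train[0]))
```

The resulting `X_train` and `y_train` can be passed to any binary logistic regression implementation.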

A. DATASET INTRODUCTION
In the experiment, four real social network datasets are used: Bitcoin-Alpha, Bitcoin-OTC, Slashdot, and Epinions. These datasets can be downloaded from the Stanford Large Network Dataset Collection¹ website. Most experimental research on signed networks is based on these four datasets. Every edge in these datasets is labeled as either positive or negative.
Bitcoin-Alpha² and Bitcoin-OTC³ are who-trusts-whom networks of users who trade with bitcoin on the Alpha and OTC platforms. Bitcoin users on these platforms are anonymous. To maintain users' reputation records and prevent transactions with fraudulent and risky users, members of Bitcoin-Alpha and Bitcoin-OTC rate other members on a scale from −10 (complete distrust) to +10 (complete trust). Scores greater than zero are regarded as positive links and scores less than zero as negative links, which together form a signed network. Slashdot⁴ is a technology news website. News can be submitted by all users of the website. Users can tag other users as ''friends'' or ''enemies''; friends are regarded as positive relations and enemies as negative relations.
Epinions⁵ is a consumer review website. Users can decide whether to trust another user by evaluating the quality of that user's product reviews, which helps users make better choices. All trust and distrust relationships constitute this network, in which a trust relationship is regarded as positive and a distrust relationship as negative.

B. BASELINES
The SNEDA model proposed in this paper is compared with the following baseline methods:
•DeepWalk⁶: a network embedding method based on random walks, designed for unsigned networks. Here, the negative links in the network are treated as positive links, and the whole signed network is regarded as an unsigned network.
•SiNE⁷: uses the characteristics of the signed network to sample and model nodes with a random walk method to obtain node embeddings.
•SiGAT⁸: defines specific structural patterns (motifs) and uses GAT to learn the nodes in each pattern.
•SGCN⁹: uses structural balance theory to aggregate and propagate information through graph convolution and generate node embeddings.
•SNEA¹⁰: learns the weight coefficients between nodes with the self-attention mechanism and aggregates high-order information according to balance theory to learn node representations.
•SDGNN¹¹: designs a new information propagation and aggregation mechanism based on the structural balance theory in sociology, and uses an average sampling theory to learn node embeddings.

C. ASSESSMENT METRICS AND EXPERIMENTAL SETUP
In this paper, Accuracy, Macro-F1, F1, and AUC are used as evaluation metrics to validate the link sign prediction results. Higher values of these metrics indicate more accurate prediction of link signs. The ratio of positive to negative relations is unbalanced in these social network datasets, and the number of neighbors varies greatly among nodes across datasets; Table 3 gives the optimal parameter settings for each of the four datasets.
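For reference, the four metrics named above can be computed from predicted labels and scores as in the following dependency-free Python sketch. The function names are hypothetical; positive links are assumed to be labeled 1 and negative links 0, and F1 is taken to be the F1 score of the positive class.

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-class F1 scores, so the rare (negative)
    # class counts as much as the frequent (positive) class.
    return (f1_for_class(y_true, y_pred, 1) + f1_for_class(y_true, y_pred, 0)) / 2


def auc(y_true, scores):
    """Probability that a random positive link outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), f1_for_class(y_true, y_pred, 1),
      macro_f1(y_true, y_pred), auc(y_true, scores))
```

Macro-F1 and AUC are less sensitive than accuracy to the imbalance between positive and negative links, which is why they are reported alongside it.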

D. ANALYSIS OF EXPERIMENTAL RESULTS
The experimental comparison between the SNEDA method and the baseline methods is shown in Table 4 and Table 5. The tables show that SNEDA is superior to the baseline methods on all evaluation metrics on the four datasets, which indicates that the dual attention mechanism helps to improve the quality of node embeddings. The specific analysis is as follows:
a) The DeepWalk method, which ignores the sign of negative links during representation learning, performs worst. This shows that negative links strongly affect the representation quality of signed networks and that representation methods for unsigned networks are not suitable for signed networks. SiNE, which is modeled on sociological theory, performs better than the unsigned network method, which shows the feasibility of modeling signed networks based on structural balance theory.
b) SNEA introduces an attention mechanism to capture the different importance of nodes. Its results are better than those of SGCN (which gives the same weight to all node neighbors), which shows that assigning different weights to nodes through an attention mechanism can improve the quality of node embeddings in signed networks.
c) SiGAT uses multi-head attention to aggregate the information of first-order neighbors to obtain the embedding of target nodes, while SNEA aggregates the high-order neighbor information of nodes based on balance theory and uses the attention mechanism to learn the different weights between node pairs. According to the experimental results, SNEA performs better on networks with fewer nodes but not as well as SiGAT on networks that have more nodes and are more complex. This may be because SiGAT only aggregates first-order neighbor information, while SNEA can aggregate higher-order neighbor information.
d) Compared with SiGAT, SDGNN can capture high-order structure information with multiple layers. Experiments show that two-layer convolution achieves the best effect. Both SDGNN and SNEA can process high-order structure information: SDGNN aggregates it based on GraphSage, while SNEA aggregates high-order neighbor information through structural balance theory. According to the experimental results, SDGNN is more effective than SNEA at aggregating high-order structure information.
e) Compared with SiGAT and SNEA, which only focus on the attention weights between nodes, SNEDA adds path attention and considers the influence of different motifs. The experimental results show a significant improvement. Selectively aggregating neighbor information based on important relationship paths improves the accuracy of link prediction, indicating that aggregating node neighbor information with dual attention improves the quality of node embeddings.
¹⁰ https://github.com/liyu1990/SNEA
¹¹ https://github.com/huangjunjie95/SDGNN

E. ABLATION EXPERIMENTAL ANALYSIS
1) ATTENTION MECHANISM
To verify the necessity of node-level attention and path-level attention, ablation experiments are carried out on the core components of the model in this section. The SNEDA-S method does not use path-level attention and directly uses node-level attention to aggregate neighbor information, whereas SNEDA uses both node-level attention and path-level attention to obtain node embeddings. The experimental results are shown in Table 6. The AUC and the other metrics of SNEDA are significantly improved compared with SNEDA-S, which shows that node embeddings are better when the dual attention mechanism is used. The combination of node-level attention and path-level attention effectively improves the quality of node embeddings and the accuracy of link prediction.

2) NODE NEIGHBOR ORDER
SNEDA-1 aggregates only the information of first-order neighbors, SNEDA-2 aggregates only the information of second-order neighbors, and SNEDA aggregates the information of first-order and second-order neighbors at the same time, as shown in Figure 6. The experimental results show that aggregating first-order and second-order neighbor information together improves the link prediction results, which indicates that adding second-order neighbor information effectively improves the embedding quality of nodes.

F. HYPERPARAMETER ANALYSIS
The SNEDA model has two important hyperparameters that control the experimental results: the node embedding dimension d and the dimension of the path attention vector q. This section analyzes the influence of these hyperparameters on the performance of SNEDA. Bitcoin-Alpha is selected as the experimental dataset. When a specific hyperparameter is analyzed, the other parameters are set to their default values.
Figure 7 shows the AUC and F1 values of the signed link prediction performance of the SNEDA model under different parameters. Figure 7 (a) and Figure 7 (b) show that as the number of training epochs increases, the loss gradually decreases and the AUC increases, then gradually converges and finally becomes stable. Figure 8 (a) shows the influence of the node embedding dimension d on the experimental results. The best effect is reached at a dimension of about 20. As the dimension increases further, the effect decreases, which may be caused by overfitting. Figure 8 (b) shows the impact of the path attention vector dimension q on experimental performance. The best effect is reached when the vector dimension is about 64, and the effect decreases as the vector dimension increases further.

VI. SUMMARY
In this paper, SNEDA, a model based on the graph attention network, is proposed to learn node representations of signed networks. The model consists of two layers of attention: node-level attention captures node-level information and learns the different weights between nodes, and path-level attention captures path-level information and learns the importance of different motifs. Through the learning of the SNEDA model, the feature information of the target node contains both neighbor information and structure information, which makes the vector representation of the node richer and more complete. Compared with the baseline methods, SNEDA achieves better results in the link prediction task, which shows that incorporating path-level information helps to improve the quality of node embeddings and demonstrates the effectiveness of the SNEDA model. In future work, we will consider sampling higher-order neighbors, for example with random walks or GraphSage, to make the node vectors more complete.