Network Representation Based on the Joint Learning of Three Feature Views

Abstract: Network representation learning plays an important role in the field of network data mining. By embedding network structures and other features into a low-dimensional representation vector space, network representation learning algorithms can provide high-quality feature input for subsequent tasks, such as network link prediction, network vertex classification, and network visualization. Existing network representation learning algorithms can be trained on structural features, vertex texts, vertex tags, community information, etc. However, no existing algorithm uses the future evolution results of the networks to guide network representation learning. Therefore, this paper models the future evolution of networks with a link prediction algorithm and introduces the future link probabilities between vertices without edges into the network representation learning task. To make the network representation vectors contain more feature factors, the text features of the vertices are also embedded into the network representation vectors. Based on these two optimization approaches, we propose a novel Network representation learning algorithm based on the joint optimization of Three Features (TFNR). Based on Inductive Matrix Completion (IMC), the TFNR algorithm introduces the future link probabilities between vertices without edges and the text features into the procedure of modeling network structures, which avoids the problem of network structure sparsity. Experimental results show that the proposed TFNR algorithm performs well in network vertex classification and visualization tasks on three real citation network datasets.

words. Subsequently, Refs. [23, 24] showed that SGNS in the network model is equivalent to implicitly factorizing a transition probability matrix M between network vertices, with M = (P + P^2)/2. Consequently, the TADW [25] and MMDW [26] algorithms optimize the network representation learning procedure based on the theory of matrix factorization. TADW first introduces Inductive Matrix Completion (IMC) [27] to jointly learn network representations from network structure features and vertex text features. Both TADW and MMDW adopt the idea of matrix factorization to optimize the network representation learning task. The difference is that MMDW adopts Singular Value Decomposition (SVD) [28] to factorize the transition probability matrix M between network vertices, while TADW adopts the IMC algorithm for the factorization. In addition, TADW uses text features to compensate for sparse network structures, while MMDW uses network vertex tags for this purpose based on max-margin theory [29].
Many existing network representation learning algorithms optimize the learning task based on network structure features, vertex texts, tags, communities, and other features. However, some important indexes and conclusions of link prediction have not yet been introduced into network representation learning algorithms to optimize the learning task. Link prediction algorithms can predict the future link probabilities between vertices without edges in a network, and they can also evaluate the link certainty degrees of existing edges, which are also known as the link weights of the edges. In addition, the vertices in a network contain a large amount of text content. In social networks, the texts of vertices are personal information, comments, published contents, and so forth; in citation networks, the texts of vertices are usually the titles and abstracts of the papers. To intuitively show the principle of the algorithm proposed in this paper, we give the explanatory diagram in Fig. 1.
In Fig. 1, we show a simple network structure consisting of multiple vertices and edges. We enlarge the link relations of one area and number the vertices of the local network as 1, 2, 3, and 4. These four vertices have three edges, and we calculate the corresponding weights of these three edges. The weights of existing edges in the network can be considered as the similarity or correlation between vertices. In addition, there are three dotted lines in the local network, which do not exist in the original network in Fig. 1. We define the dotted lines as the future links of vertices and give the weights of the three dotted lines, where each weight is the future link probability between the two vertices without an edge. For example, the weights of the existing edges between vertex 1 and vertex 3 and between vertex 1 and vertex 4 are both 0.8; accordingly, the future link probability between vertex 3 and vertex 4 is as high as 0.9. Similarly, the future link probability between vertex 2 and vertex 3 is 0.2, and that between vertex 2 and vertex 4 is 0.4. Finally, we give the text contents (paper titles) of these four vertices, and we find that vertex 1 and vertex 3 are both papers related to "max-margin". To introduce the future link probabilities between vertices without edges and the text features into the network representation learning task, we put forward a novel Network representation learning algorithm based on the joint optimization of Three Features, named TFNR for short. As can be seen from Fig. 1, both possible future links and existing edges exist between vertices in the network. The TFNR algorithm predicts the future evolution of the network through the link prediction algorithm and calculates the future link probabilities between vertices without edges. Although this kind of probability is obtained from the existing network structures, it can implicitly guide the network representation learning model to train in the direction of the future evolution results, so that the learnt network representation vectors contain future influence factors. In addition, the TFNR algorithm integrates the text features of network vertices into the network representation vectors, so that vertices with more common words have a closer distance in the network representation vector space. To embed the above two feature optimizations into the network representation learning task at the same time, the TFNR algorithm introduces the IMC algorithm; IMC is in essence a matrix factorization algorithm that learns constraint features from two auxiliary feature matrices while factorizing the target feature matrix. Consequently, the network representation vectors obtained by the TFNR algorithm contain the network structure feature factors, the future link probability factors, the text feature factors, etc.

Formalization
In this paper, we define the network as G = (V, E), where V is the set of vertices v and E is the set of edges e. The input of NRL is the network G, and the input of some NRL algorithms based on deep neural networks is the spectrum or adjacency matrix of the network. The output of NRL algorithms is usually a low-dimensional network representation vector r_v ∈ R^k, where k is the column dimension of the network representation vectors. In this paper, we use the network representation vectors r_v ∈ R^k obtained by NRL algorithms to conduct network vertex classification, visualization, and case analysis tasks, which verify the network representation learning performance of the proposed TFNR algorithm.

Future link probabilities between vertices without edges
Link prediction algorithms are mainly used to predict the future link probabilities between vertices without edges [30, 31]. By sorting all the future link probabilities, we can obtain the vertex pairs that are most likely to build edges at the next moment. Link prediction is mainly used in social networks to predict future interactions between friends, and it can also be used in recommendation systems for commodity recommendation [32, 33]. The most popular link prediction methods are based on matrix factorization. The TFNR algorithm proposed in this paper introduces the future link probabilities between vertices without edges into the network representation learning task. Therefore, we first need to consider how to measure the future link probabilities between vertices without edges, and which algorithm to use for this measurement.
For the above two problems, we use a link prediction algorithm to calculate the future link probabilities between vertices without edges. In computing these probabilities, we only consider the existing structures of the networks, and we do not combine structures with text features. The main reason is that TFNR already incorporates the text features of the network vertices into the network representations; in this way, we can isolate the performance impact of the future link probabilities themselves, independent of text features, on the network representation learning task. To find a desirable link prediction algorithm, we implement 21 existing link prediction algorithms in the experimental sections and evaluate the prediction performance of each algorithm on three real citation network datasets. We finally adopt the Matrix-Forest Index (MFI) algorithm to measure the future link probabilities between vertex pairs without edges, because MFI shows excellent prediction performance on all three citation network datasets. The MFI algorithm obtains the future link probabilities and link weights between vertices by the following matrix operation:

M_MFI = (I + L)^{-1}    (1)

where I denotes the identity matrix of size |V| × |V| and L is the Laplacian matrix of the network G. Note that Eq. (1) simultaneously calculates the weights of existing links and the future link probabilities between vertex pairs without edges; thus, the matrix M_MFI is composed of two different kinds of property values, and we need to separate the weights of existing edges from the future link probabilities between vertices without edges in M_MFI.
Suppose that the adjacency matrix of network G is A, and let C be the adjacency matrix of the complement graph of G. The weights of existing edges can then be calculated by

M_weight = M_MFI .* A    (2)

and the future link probabilities between vertices without edges by

M_probability = M_MFI .* C    (3)

where the symbol ".*" denotes element-wise (Hadamard) multiplication, as in MATLAB programming grammar, i.e., the values at the same positions of the two matrices are multiplied.
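To make Eqs. (1)-(3) concrete, the following is a minimal sketch of the MFI computation, assuming the network is held in a NetworkX graph; the function and variable names are illustrative, not from the paper.

```python
# A minimal sketch of Eqs. (1)-(3), assuming a simple undirected graph.
import numpy as np
import networkx as nx

def mfi_scores(G):
    """Matrix-Forest Index: M_MFI = (I + L)^{-1}, split into existing-edge
    weights (Eq. (2)) and future link probabilities (Eq. (3))."""
    A = nx.to_numpy_array(G)            # adjacency matrix of G
    L = np.diag(A.sum(axis=1)) - A      # graph Laplacian L = D - A
    I = np.eye(A.shape[0])
    M_MFI = np.linalg.inv(I + L)        # Eq. (1)

    C = (1 - A) - I                     # adjacency of the complement graph
    M_weight = M_MFI * A                # Eq. (2): weights of existing edges
    M_probability = M_MFI * C           # Eq. (3): probabilities for non-edges
    return M_weight, M_probability
```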

Structure feature matrix construction
DeepWalk can use CBOW or Skip-Gram to model the relationships between vertex pairs, and it can use Negative Sampling or Hierarchical Softmax to accelerate model training. Therefore, the DeepWalk algorithm can be realized with two models and two optimization approaches. The Skip-Gram model with Negative Sampling optimization is called SGNS for short, and its objective function can be written as

L = Σ_{v_i ∈ V} Σ_{-t ≤ j ≤ t, j ≠ 0} log p(v_{i+j} | v_i)    (4)

log p(v_j | v_i) = log σ(v_j · v_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(-v_n · v_i)]    (5)

In Eqs. (4) and (5), t denotes the number of context vertices before and after the current central vertex v_i, v_i denotes the network representation vector of the current vertex, and v_j denotes the network representation vector of a context vertex; σ is the sigmoid function, K is the number of negative samples, and P_n(v) is the noise distribution. The symbol "·" denotes the dot product between two network representation vectors.
In Ref. [24], Yang and Liu found that the essence of DeepWalk based on the SGNS model is to implicitly factorize a structural feature matrix of the network, in which the value of each element is

M_{ij} = log ([e_i (P + P^2 + ... + P^t)]_j / t)    (6)

where e_i is an indicator vector whose i-th term is 1 and whose remaining terms are 0. The structural feature matrix M constructed by Eq. (6) has a high computation complexity. Moreover, after the logarithmic operation, the matrix M calculated by Eq. (6) contains a large number of non-zero elements. Therefore, Yang and Liu [24] suggested replacing Eq. (6) with

M = (P + P^2)/2    (7)
We can even define M = P in some dense networks. In the TFNR algorithm, Eq. (7) is used to construct the structural feature matrix M of the network G. Because P can be regarded as the first-order feature matrix of the network G and P^2 as its second-order feature matrix, the structural feature matrix constructed by Eq. (7) contains both first-order and second-order features between network vertices.
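The construction of the structural feature matrix in Eq. (7) can be sketched as follows, assuming A is the adjacency matrix as above; the names are again illustrative.

```python
# A sketch of Eq. (7): M = (P + P^2)/2, mixing first- and second-order features.
import numpy as np

def structure_matrix(A):
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1)          # row-stochastic transition matrix
    return (P + P @ P) / 2              # Eq. (7)
```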

Text feature matrix construction
Various kinds of data can be transformed into network form for display and mining. A network reflects the relationships between different objects through its edges, which are its most important feature. However, besides the edge relationships, the vertices of a network also contain rich text features. For example, in social networks the text contents of a vertex include the contents published by the user and the comments made by other users, while the followee and follower relationships among users form the edge relationships. In this paper, we mainly use citation networks to verify the network representation learning performance of TFNR; in citation networks, the texts of a vertex are mainly the title and abstract of the paper. The title of a paper is a condensed summary of the whole paper, and the abstract describes the techniques and algorithms used in the paper. Therefore, even if the relationships between vertices are analyzed only from the vertex contents, important structural features of the citation network can still be mined.
In the TFNR algorithm, we first delete all stopwords in the texts of the citation network vertices, and then we delete all words whose frequency is less than 10. We put the remaining words into an array that serves as the text feature dictionary. The words in this dictionary are the column headers of the text feature matrix T, and the row headers are the vertices of the network G. The construction rule is as follows: if the word of a column header appears in the text of a network vertex, we set the value at that position of the text feature matrix to 1; otherwise, we set it to 0. For example, the element values of the first row of the text feature matrix are the text feature transformation of the first vertex.
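A possible implementation of this construction rule is sketched below, assuming each vertex's text is already tokenized and a stopword list is given; `docs` and `stopwords` are illustrative names, not from the paper.

```python
# A sketch of the binary text feature matrix T described above.
from collections import Counter
import numpy as np

def build_text_matrix(docs, stopwords, min_freq=10):
    counts = Counter(w for doc in docs for w in doc if w not in stopwords)
    vocab = [w for w, c in counts.items() if c >= min_freq]  # feature dictionary
    index = {w: j for j, w in enumerate(vocab)}
    T = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for w in doc:
            if w in index:
                T[i, index[w]] = 1       # word appears in vertex i's text
    return T, vocab
```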
The column dimension of the text feature matrix constructed here equals the size of the text feature dictionary. Therefore, the text feature matrix is high dimensional and contains a large number of zero elements, which leads to a large computation cost in the matrix factorization procedure. It is well known that dimensionality reduction algorithms based on matrix factorization can remove redundant features between different objects while retaining the most discriminative features in a lower-dimensional space. Therefore, in the TFNR algorithm, the text feature matrix is used only after dimensionality reduction.
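The dimensionality reduction step can be sketched with a truncated SVD, assuming the binary matrix T from the construction above; keeping the top d singular directions is one common choice rather than the paper's prescribed method.

```python
# A sketch of reducing T to d dimensions via truncated SVD.
import numpy as np

def reduce_text_features(T, d=200):
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :d] * np.sqrt(S[:d])    # |V| x d low-dimensional text features
```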

TFNR algorithm
We have seen that the essence of DeepWalk based on the SGNS model is to factorize the network structure feature matrix M. To explain the factorization process in detail, we give the diagram in Fig. 2.
According to Eq. (7), DeepWalk implicitly factorizes the network structure feature matrix M, where M is a transition probability matrix whose elements are composed of the reciprocals of the degree values of network vertices. As shown in Fig. 2, DeepWalk aims to factorize the matrix M ∈ R^{|V| × |V|} into two independent matrices W ∈ R^{k × |V|} and H ∈ R^{k × |V|} that satisfy M ≈ W^T H. Therefore, the objective function of DeepWalk based on matrix factorization is

min_{W,H} ||M - W^T H||_F^2 + (λ/2)(||W||_F^2 + ||H||_F^2)    (8)

In Eq. (8), ||·||_F is the Frobenius norm, and λ/2 weights the trade-off between the factorization error and the regularization terms ||W||_F^2 and ||H||_F^2. The minimization of ||W||_F^2 and ||H||_F^2 adds a low-rank constraint on the matrices W and H. Figure 2 shows the procedure of modeling network vertex relations. In practical applications, we can use a commonly used matrix factorization algorithm, such as SVD, to directly factorize the matrix M.
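As a concrete illustration of factorizing M directly with SVD (the strategy behind the MFDW baseline described later), a minimal sketch might look as follows; splitting the singular values evenly between W and H is an assumption, not the paper's prescription.

```python
# A sketch of obtaining W and H with M ~ W^T H via truncated SVD.
import numpy as np

def factorize_structure(M, k=200):
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    W = (U[:, :k] * np.sqrt(S[:k])).T          # W in R^{k x |V|}
    H = np.sqrt(S[:k])[:, None] * Vt[:k]       # H in R^{k x |V|}
    return W, H                                # M ~ W^T H
```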
The TFNR algorithm adopts the IMC method applied by the TADW algorithm [25] to ensemble the structure features, the future link probabilities between vertices without edges, and the text features into the network representation vectors. The objective function of IMC is

min_{W,H} ||M - X^T W^T H Y||_F^2 + (λ/2)(||W||_F^2 + ||H||_F^2)    (9)

where the matrices X ∈ R^{p × m} and Y ∈ R^{q × n} are adopted as auxiliary features to factorize the network structure feature matrix M. IMC aims to find matrices W ∈ R^{k × p} and H ∈ R^{k × q} that meet the factorization condition M ≈ X^T W^T H Y.
However, the TFNR algorithm sets the matrix X to the identity matrix E; thus, the objective function of TFNR is

min_{W,H} ||M - E^T W^T H Y||_F^2 + (λ/2)(||W||_F^2 + ||H||_F^2)    (10)
To intuitively understand Eq. (10), we give its detailed factorization diagram in Fig. 3.
As shown in Fig. 3, there are three matrices M ∈ R^{|V| × |V|}, E ∈ R^{|V| × |V|}, and Y ∈ R^{s × |V|}. TFNR aims to find matrices W ∈ R^{k × |V|} and H ∈ R^{k × s} that satisfy the factorization condition M ≈ E^T W^T H Y, where k is the column size of matrix W.
In Fig. 3, we set the matrix Y as the auxiliary feature matrix for factorizing the network structure feature matrix M; that is, we encode the future link probabilities between vertices without edges and the text features into the matrix Y. Importantly, we also tried other feature integration methods: for example, we replaced the identity matrix E with the future link probability matrix and replaced the matrix Y with the text feature matrix, but the experimental results showed that this kind of feature integration performs worse on network vertex classification than even DeepWalk. Therefore, we first integrate the text feature matrix and the future link probability matrix in matrix multiplication form, i.e., F = M_probability T. We then factorize F by SVD, F ≈ U S V^T, and finally use the matrix (U √S)^T to replace the parameter Y in the IMC algorithm. Note that the matrix M_probability has a size of |V| × |V| and the matrix T has a size of |V| × d, where d is the column size of the text feature matrix; generally, d is set equal to k. Consequently, we denote W^T ⊕ Y^T H^T as the final network representation vectors, where ⊕ denotes column-wise concatenation; the final representation has a column size of 2k.
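The feature fusion described above can be sketched as follows, assuming M_probability from Eq. (3), the text matrix T built earlier, and matrices W and H obtained by minimizing Eq. (10); the exact transpose layout of (U √S)^T is our reading of the construction, so treat it as an assumption.

```python
# A sketch of the TFNR feature fusion and final representation.
import numpy as np

def build_Y(M_probability, T, d=200):
    F = M_probability @ T                      # fuse future links with text features
    U, S, Vt = np.linalg.svd(F, full_matrices=False)
    return (U[:, :d] * np.sqrt(S[:d])).T       # Y in R^{d x |V|}, assumed layout

def tfnr_embedding(W, H, Y):
    # W (k x |V|) and H (k x s) are assumed to come from minimizing Eq. (10);
    # the final representation concatenates W^T and (HY)^T, column size 2k.
    return np.hstack([W.T, (H @ Y).T])
```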

Experiment and Analysis
In our experiment, we conduct vertex classification tasks on three real-world datasets to evaluate the proposed model. Meanwhile, we visualize the learnt representations of the three networks to verify whether TFNR can learn discriminative representations. We also report the results of the parameter sensitivity analysis.

Dataset setup
Network vertex classification, visualization, link prediction, and other tasks are generally used to measure the performance of network representation learning algorithms. Case studies are also used to compare and analyze the properties of the network representation vectors. In addition, the performance of network representation learning algorithms is mainly measured on real-world citation network and social network datasets. To find the best parameter combination, parameter sensitivity analysis is also conducted on different datasets.
It can be found from Table 1 that the average clustering coefficients of the Citeseer, DataBase systems and Logic Programming (DBLP), and Cora datasets are almost the same. According to the average path length, DBLP is a dense network compared with Cora and Citeseer, and Citeseer is the sparsest. Cora is denser than Citeseer but still sparse compared with DBLP. Thus, by using network datasets with different sparsity, we can measure the network representation learning performance of TFNR and the baseline algorithms under different conditions.

Baseline algorithms
DeepWalk. DeepWalk [5] is commonly used as a performance baseline for improved network representation learning algorithms. We use the Skip-Gram model and Hierarchical Softmax to construct the DeepWalk algorithm, setting the window size to 5 and the random walk length to 80.
LINE. LINE [7] only considers the first-order and second-order similarities between vertices, so LINE is faster than DeepWalk in modeling the relationships between network vertices. However, experimental results show that LINE improves the training speed at the cost of some accuracy of NRL. Here, we use the second-order LINE model to learn the representation vectors of the different networks.
MFDW. The DeepWalk algorithm essentially factorizes the network structure feature matrix M = (P + P^2)/2. Thus, we directly factorize the matrix M to obtain the vertex representations.
MMDW. MMDW first factorizes the network structure feature matrix M = (P + P^2)/2 by SVD and uses the matrix W as the vertex representations. MMDW then introduces max-margin theory to optimize the learnt vertex representation vectors.
TEXT. We reduce the dimension of the text feature matrix T to 200 and use the result as the vertex representation vectors.
TADW. TADW factorizes the network structure feature matrix M with the help of the text features. TADW adopts the same factorization algorithm as TFNR.

Classifiers and experiment setup
In Section 3.2, we introduced the various network representation learning algorithms used in this paper. In this section, we describe the parameter settings of each algorithm. For each network representation learning algorithm, we set the dimension of its network representation vectors to 200. The TADW algorithm and the proposed TFNR algorithm adopt the same text features. In the network vertex classification experiment, we vary the proportion of the training set from 10% to 90%, and we report the network vertex classification accuracies of the TFNR algorithm when the training ratio is 0.1, 0.4, or 0.7. In the visualization and case study, we set the training ratio to 0.7. We repeat each experiment 10 times and take the average accuracy as the final result. Finally, we use LIBLINEAR [34] as the classifier for the network vertex classification tasks.
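The evaluation protocol can be sketched as follows; scikit-learn's liblinear-backed logistic regression stands in here for the LIBLINEAR package used in the paper, and the variable names are illustrative.

```python
# A sketch of the classification protocol: split, train, average over 10 runs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classify(X, y, train_ratio=0.7, runs=10):
    accs = []
    for seed in range(runs):                   # repeat 10 times, average accuracy
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_ratio, random_state=seed)
        clf = LogisticRegression(solver="liblinear").fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return float(np.mean(accs))
```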

Experimental results and analysis
In order to use the future link probabilities between vertices without edges, we need to decide how to weight the probabilities and which link prediction algorithm to choose. Consequently, we adopt the 21 link prediction algorithms introduced in Ref. [35] and evaluate their prediction performance on the Citeseer, DBLP, and Cora datasets. We set the proportion of the training set to 0.7, 0.8, or 0.9 and measure the performance of each link prediction algorithm with the Area Under the Curve (AUC). The detailed link prediction results are shown in Table 2.
It can be observed from Table 2 that the MFI algorithm achieves the best prediction performance on the Citeseer, DBLP, and Cora datasets. Therefore, the TFNR algorithm uses MFI to calculate the future link probabilities between vertices without edges; the specific calculation is given in Eq. (3).
We measure the network representation learning performance of TFNR through network vertex classification tasks. Therefore, we conduct the network vertex classification tasks on the Citeseer, DBLP, and Cora datasets to measure the performance of the various NRL algorithms. The detailed results are shown in Tables 3, 4, and 5.
Based on the results in Tables 3, 4, and 5, we draw the following conclusions. DeepWalk is the most classical network representation learning algorithm and is representative of random-walk-based network representation learning; its matrix factorization counterpart MFDW achieves better network vertex classification performance on the sparser Citeseer and Cora datasets, while MFDW and DeepWalk achieve almost the same performance on the dense DBLP dataset. LINE improves the training speed of NRL algorithms at the cost of some learning accuracy, which makes it well suited to large-scale network learning tasks; for example, LINE is slightly inferior to DeepWalk on the dense DBLP dataset, but its training speed is much faster. MMDW is also a matrix-factorization-based network representation learning algorithm, and it adopts the node labels to optimize the network representation vectors. Consequently, the performance of MMDW is better than that of DeepWalk, LINE, MFDW, and the other baselines. Specifically, MMDW further optimizes the network representation vectors trained by MFDW, and the experimental results show that these optimizations are feasible and effective.
On the DBLP and Cora datasets, the network vertex classification performance of TEXT is worse than that of DeepWalk and MFDW. However, when the target factorization matrices of MFDW and TEXT are integrated by the IMC algorithm, yielding the TADW algorithm, the classification performance of TADW is better than that of both MFDW and TEXT. On the Citeseer dataset, TEXT performs better than MFDW, and TADW is again superior to both MFDW and TEXT after integrating the structure feature matrix and the text feature matrix. These results show that integrating feature matrices of different properties produces features that more fully reflect the structural properties of the network. In addition, network representation learning algorithms based on multi-view feature integration achieve excellent network vertex classification performance, superior to that of single-view learning algorithms.
Inspired by TADW, the TFNR algorithm proposed in this paper tries multiple feature integration methods. We find that the best feature integration method for the network representation learning task integrates the text features of the network vertices and the future link probabilities between vertices without edges in matrix multiplication form; TFNR then replaces the text feature matrix in the TADW framework with this fused feature matrix. Experimental results show that TFNR achieves excellent network representation learning performance under the three different parameter settings. Because TFNR introduces the future link probabilities between vertices without edges on top of TADW, its performance on the classification tasks is better than that of TADW. These results show that the classification performance of network representation learning algorithms can be effectively and stably improved by introducing the link probabilities between vertices without edges.
These observations demonstrate that TFNR can learn high-quality network representations. Moreover, the classification accuracy of TFNR is competitive even though we do not perform task-specific optimization for classification.

Parameter sensitivity
In the previous sections, we evaluated and analyzed the network vertex classification performance of TFNR on the Citeseer, DBLP, and Cora datasets. To make a fair comparison with the baseline algorithms, we uniformly set the network representation vector length k and the trade-off factor λ of the low-rank constraint in Eq. (10). In this section, we use the network vertex classification task to analyze the effects of different sizes of k and λ. Note that the size of the network representation vector is equivalent to the column dimension of W^T ⊕ Y^T H^T. The detailed results are shown in Fig. 4.
As shown in Fig. 4, we set the size of the network representation vector to 50, 100, 150, 200, and 300. The experimental results show that TFNR achieves poor network representation learning performance on the Citeseer, DBLP, and Cora datasets when the size is 50, and better performance when the size is 300. These results show that the classification performance of TFNR increases with the network representation vector size.
In addition, we set λ to 0.1, 0.2, 0.3, 0.5, 0.8, and 1.0. Although the value of λ varies from 0.1 to 1.0, the network vertex classification performance of TFNR remains almost stable. Therefore, λ has a negligible effect on the network vertex classification of TFNR.

Visualization
Network representation vector visualization is another evaluation method for NRL, in which the network representation vectors are projected into a 2D visualization space. If network vertices of the same category show strong internal cohesion and network vertices of different categories show clear classification boundaries, we can conclude that the network representation learning algorithm generates understandable and discriminative network representation vectors. Therefore, we visualize the learnt network representation vectors trained by DeepWalk and TFNR on the Citeseer, DBLP, and Cora datasets. The detailed results are shown in Fig. 5.
As shown in Fig. 5, we randomly select 4 categories from each of the Citeseer, DBLP, and Cora datasets, and then randomly select 200 representation vectors from the selected categories for visualization with the t-SNE algorithm. The network representation vectors trained by DeepWalk show the worst visualization results on the Citeseer dataset, but good cohesion and obvious classification boundaries on the DBLP and Cora datasets. The visualization results of the network representation vectors obtained by TFNR are obviously better than those of DeepWalk on Citeseer, while on Cora and DBLP the visualization results of TFNR and DeepWalk are almost the same. Therefore, there is little difference between the algorithms on dense network datasets, while TFNR generates better visualization results on sparse network datasets; that is, TFNR can learn discriminative network representation vectors on sparse networks.
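The visualization protocol can be sketched as follows, assuming the learnt embeddings X and integer labels y; whether 200 vectors are drawn per category or in total is ambiguous above, so the per-category sampling here is an assumption.

```python
# A sketch of the t-SNE visualization: sample vertices from 4 classes and plot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(X, y, classes, per_class=200):
    idx = np.hstack([np.random.choice(np.where(y == c)[0], per_class, replace=False)
                     for c in classes])        # sample vertices of the chosen classes
    Z = TSNE(n_components=2).fit_transform(X[idx])
    plt.scatter(Z[:, 0], Z[:, 1], c=y[idx], s=8)
    plt.show()
```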

Case study
In the above sections, we verified the performance of TFNR through network vertex classification and network visualization tasks, and we discussed the impact of the network representation vector size and λ in Eq. (10) on the classification performance of TFNR. In this section, we analyze the properties of the network representation vectors trained by the TADW and TFNR algorithms. We first choose a target vertex whose paper title is "Maximum margin planning", and then adopt the cosine similarity to find the five most relevant vertices of the target vertex. This case study is conducted on the DBLP citation network; we analyze the properties of the learnt network representation vectors by showing the paper titles of the most relevant vertices. The specific results are shown in Table 6. We first analyze the papers that have citation relationships with "Maximum margin planning", namely, the vertices that have link relationships with the target vertex in the DBLP network. By checking the references of "Maximum margin planning", we find that it cites the papers "Solving large scale linear prediction problems using stochastic gradient descent algorithms" and "Apprenticeship learning via inverse reinforcement learning" among the five most relevant papers in Table 6, and that the paper "Learning for control from multiple demonstrations" cites "Maximum margin planning". The papers "Robot learning from demonstration", "Algorithms for inverse reinforcement learning", and "Policy invariance under reward transformations: Theory and application to reward shaping" have no link relationships or text feature word co-occurrence with "Maximum margin planning", yet they are among its most relevant papers. Therefore, we consider it likely that one or two of these three papers will be cited together with "Maximum margin planning" by new papers in the future. In other words, introducing the future link probabilities between vertices without edges makes the network representation vectors learnt by TFNR reflect the future evolution structures of the networks.
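The case-study query itself reduces to a cosine similarity ranking, sketched below with illustrative names; X holds the learnt representation vectors and titles maps vertex indices to paper titles.

```python
# A sketch of retrieving the five most relevant vertices by cosine similarity.
import numpy as np

def top5_similar(X, target, titles):
    v = X[target]
    sims = (X @ v) / (np.linalg.norm(X, axis=1) * np.linalg.norm(v) + 1e-12)
    sims[target] = -np.inf                     # exclude the query vertex itself
    order = np.argsort(-sims)[:5]
    return [(titles[i], float(sims[i])) for i in order]
```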

Conclusion
To introduce the future link probabilities between vertices without edges as well as text features into the network representation learning framework, we propose a novel network representation learning algorithm, TFNR. TFNR tries a wide variety of feature integration methods between the future link probabilities and the text features, and we eventually find the best feature integration method. To embed features of different properties into the NRL framework, TFNR introduces the inductive matrix completion algorithm. Experimental results show that TFNR achieves excellent network vertex classification performance on the Citeseer, DBLP, and Cora datasets. The comparison between TFNR and TADW shows that introducing the future link probabilities between vertices without edges can greatly improve the performance of network representation learning. In addition, the visualization results show that the network representation vectors obtained by TFNR give vertices of different categories clearer classification boundaries and vertices of the same category stronger internal cohesion. In conclusion, the proposed TFNR algorithm is a network representation learning algorithm with excellent performance and stability.