Heterogeneous Information Network Embedding for Mention Recommendation



I. INTRODUCTION
With the exponential growth of social networks such as Twitter and Weibo, a large amount of information is created and spread by millions of users every day on these platforms. Furthermore, information flowing among users readily forms large-scale diffusion cascades through social connections. This feature has attracted significant interest from different social application tasks, such as market promotion [1]-[3], influencing political elections [4]-[6], and detecting fake news [7]-[9].
Social networks naturally consist of communities corresponding to certain social circles or interest groups. The propagation of information is therefore hindered by the effect of structural trapping, as shown in Figure 1(a). Fortunately, social networks offer a key function named Mention, which allows a user to place other users in a message in the form of @username. For example, Figure 1(b) illustrates mentions on a social network, where the user @David posts the message "Welcome to San Francisco, USA!" and mentions two other users, @Alice and @Bob, and the user @John expresses affection for @Lucy with the message "@Lucy awe i love you too!!!! 1 am here!!!!!!". The mentioned users receive a notification in the Mentions tab. From this, we can see that mentions can break through the structural trapping of a social network and expand the scale of diffusion of the messages that contain them. Mentions also play an important role in strengthening relationships. Therefore, it is important and necessary to explore whom to mention in a message on social networks.
The associate editor coordinating the review of this manuscript and approving it for publication was Fabrizio Messina.
Many studies on whom to mention in a message on social networks have been reported in recent years. A simple but powerful strategy is to model whom-to-mention recommendation with learning-to-rank methods based on the factors that influence mention behavior [10]-[13]. Whom-to-mention can also be viewed as a binary classification problem on the basis of link prediction algorithms [14]. Given a large number of candidate users, mentioning the optimal users at the appropriate time is also an unbalanced assignment problem [15]. Taken together, these studies suggest that the factors influencing mention behavior can be summarized as the content of the message [11], [13], [16], the strength of social influence between users [13], [17], spatiotemporal information [11], [15], and the topic interests of users [17], [18]. However, none of the above models jointly considers multiple object types (e.g., users, messages and link relationships) in the mention task.
The study of mentions on social networks relies on fusing heterogeneous data, including network structure, user profiles and historical behaviors, as shown in Figure 1(c). In the figure, the links capture following relationships, the posts denote publishing actions, and the mentions indicate mentioning behaviors. This is a typical heterogeneous network that connects users and messages through behavioral relationships. In addition, data sparsity is a common problem on social networks, and it directly prevents social models from achieving good results on specific tasks. To address these issues, network representation learning has been proposed in recent research. Network embedding algorithms not only encode the topological structure around each node, but also take the rich information attached to nodes into account and alleviate data sparsity [19]-[22].
Motivated by network embedding, we propose a novel Network Embedding Mention recommendation model, called NEM, to recommend the right users to mention in a message. The model first constructs a heterogeneous mention network based on the following relationships, publishing actions and mention behaviors among different entities. Second, we use the network embedding model DeepWalk [19] to generate homogeneous, heterogeneous and correlation relationships, respectively. Third, to learn a unified low-dimensional embedding vector for entities, we model these relationships from the user and message dimensions by considering both network structure and vertex content information. Finally, we calculate the relevance scores of the candidate users for a message to recommend the right users. We conduct extensive experiments on a social network dataset to evaluate the effectiveness of the proposed model. The experimental results demonstrate that it outperforms previous state-of-the-art methods in mention recommendation.
This work makes the following three contributions:
• A heterogeneous mention network is constructed to model different entities, such as authors, messages and users, and their relationships in a unified low-dimensional embedding vector space, so that this heterogeneous information can be computed in the same embedding space.
• We propose a novel mention recommendation model based on a deep random walk algorithm on the heterogeneous mention network. The approach models both network structure and vertex content information. To the best of our knowledge, this is the first work on mention recommendation to exploit network embedding techniques.
• Comprehensive experiments on a social network dataset show that the proposed model outperforms state-of-the-art methods, which supports the effectiveness of network embedding learning.
The rest of this paper is organized as follows. Section II reviews related work. Section III gives several necessary definitions used in this paper and presents the formal definition of the mention recommendation problem. Section IV presents a heterogeneous mention network embedding recommendation model for heterogeneous information networks, together with its learning and inference procedures. Section V empirically evaluates the proposed method on a real-world dataset, including a comparison with baseline methods. Finally, Section VI concludes this work.

II. RELATED WORK
In this section, we review the related work from two aspects: network embedding methods and mention behavior modeling.

A. NETWORK EMBEDDING METHODS
A great number of network embedding methods have been proposed from different perspectives. For example, Wang et al. [23] use a deep neural network to represent the nonlinear relations among nodes. Ou et al. [24] employ a graph embedding algorithm to preserve high-order proximities of large-scale graphs and capture asymmetric transitivity. Tu et al. [25] consider user preference and social influence to improve the accuracy of social recommendation. In addition, incorporating the text content and label information of network nodes can boost the quality of the embedding and improve learning performance. For instance, Yang et al. [26] incorporate text features of vertices into network representation learning. Tu et al. [27] learn vertex representations that reflect both network structure and labeling information by jointly optimizing a max-margin classifier and the targeted social representation learning model. Tu et al. [28] also design context-aware embeddings for vertices with a mutual attention mechanism over their neighbors. Grover and Leskovec [29] propose a semi-supervised algorithm for learning continuous feature representations that uses a second-order random walk to generate network neighborhoods for nodes. Rich interaction information, such as retweets and replies, also plays an important role in learning node embeddings. Tu et al. [30] model the interactions between vertices with a translation mechanism to extract social relations in social networks. Ren et al. [31] combine the node2vec method with an incremental strategy to improve efficiency and recommendation accuracy in financial news recommendation. Yin et al. [32] propose an attention-based graph neural network that predicts links on a bipartite user-item graph using information propagation. In summary, network embedding can alleviate data sparsity and improve the performance of node learning.

B. MENTION BEHAVIOR MODELING
A number of studies have analyzed whom to mention in a message on social networks. Naturally, the problem can be viewed as a ranking task. Wang et al. [10] use ranking support vector regression to recommend candidates with features including user interest match, content-dependent user relationship and user influence. Tang et al. [11] use a ranking support vector machine with content, social, location and time based features. To solve the mention overload problem, Zhou et al. [12] propose a personalized ranking model that considers multi-dimensional relations among users and tweets to generate a personalized mention list. Li et al. [13] use a probabilistic factor graph model, with mention relationships as edges and candidates as nodes, to help users deal with overwhelming information. Gong et al. [16] propose a topical translation-based method that predicts the mentioned users by considering both the content of a microblog and the histories of candidate users. Recently, neural network-based methods have been applied to the problem. Huang et al. [18] design an end-to-end memory network architecture that incorporates users' interests with external memory. Ma et al. [33] propose a cross-attention memory network that combines users' interests in external memory with a cross-attention mechanism to extract both textual and visual information and improve mention performance. Gui et al. [34] propose a cooperative multi-agent reinforcement learning method for mention recommendation that incorporates dozens of times as many historical tweets as previous approaches. These models aim to find the right users to mention in a message for information diffusion. Whom to mention can also be cast as other tasks. For instance, Jiang et al. [14] use link prediction to predict the mentioned users with user, textual, link and temporal features. Bao et al. [35] formulate it as a binary classification task using three factors: structure, influence, and content. Ding et al. [15] model the task as an unbalanced assignment problem and use the Hungarian method to find the optimal users. However, existing approaches have not yet solved the problem well because of data sparsity and rich heterogeneous information. To the best of our knowledge, no existing study has tackled the problem of whom to mention in a message using network embedding techniques. Our study aims to fill this gap.

III. HETEROGENEOUS MENTION NETWORK
In this section, we first introduce basic concepts and then present the notation used in the remainder of the paper.

A. HETEROGENEOUS MENTION NETWORK
A heterogeneous mention network can be represented as $G = (V, E, W, C)$, where $V$ is a set of nodes and $E$ is a set of directed relationships between nodes, which include following, posting and mentioning. $W$ is a weight matrix that represents the link weights among nodes, and $C$ denotes the set of meta information associated with the nodes. Figure 1(c) gives an example of a heterogeneous mention network on a social network. The heterogeneous network not only includes three types of nodes (authors, messages and users), but also holds three types of links: following denotes that a user follows another user, posting denotes that a message is posted by its author, and mentioning denotes that a user is mentioned in a message.
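As a minimal sketch of the definition above, the four components of $G$ can be held in a small typed multigraph. The class name `MentionNetwork`, its methods, the edge directions (author to message for posting, message to user for mentioning) and the unit edge weights are illustrative assumptions, not part of the original model description.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MentionNetwork:
    """Toy container for G = (V, E, W, C); all names here are illustrative."""
    node_type: dict = field(default_factory=dict)   # V: node id -> "user" or "message"
    weight: dict = field(default_factory=dict)      # W: (src, relation, dst) -> weight
    meta: dict = field(default_factory=dict)        # C: node id -> meta information
    out_edges: dict = field(default_factory=lambda: defaultdict(list))  # E, by source

    def add_node(self, node, kind, **info):
        self.node_type[node] = kind
        self.meta[node] = info

    def add_edge(self, src, relation, dst, w=1.0):
        # relation is assumed to be one of "follow", "post", "mention"
        key = (src, relation, dst)
        if key not in self.weight:
            self.out_edges[src].append((relation, dst))
        # repeated edges accumulate weight instead of being duplicated
        self.weight[key] = self.weight.get(key, 0.0) + w

    def neighbors(self, node, relation=None):
        return [dst for rel, dst in self.out_edges[node]
                if relation is None or rel == relation]


def build_example():
    """Rebuild the @David example from the introduction as a tiny network."""
    g = MentionNetwork()
    for user in ("David", "Alice", "Bob"):
        g.add_node(user, "user")
    g.add_node("m1", "message", text="Welcome to San Francisco, USA!")
    g.add_edge("David", "follow", "Alice")
    g.add_edge("David", "post", "m1")
    g.add_edge("m1", "mention", "Alice")
    g.add_edge("m1", "mention", "Bob")
    return g
```

Calling `build_example().neighbors("m1", "mention")` lists the users mentioned in the message, which is the relation the recommendation task tries to predict.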

B. META-PATH
A meta-path is defined as a path linking two different nodes on the network schema. In this paper, we formally define a meta-path as $O_1 \xrightarrow{R_1} O_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} O_{l+1}$, where $O_l$ denotes the type of node and $R_l$ denotes the type of relationship. Obviously, from the definition, $G$ is a heterogeneous network.
Figure 1(c) also shows different types of meta-paths. Table 1 describes the types of all relationship rules in our heterogeneous mention network.

IV. HETEROGENEOUS MENTION NETWORK EMBEDDING BASED MENTION RECOMMENDATION MODEL
In this section, we present a novel network embedding mention recommendation model, named NEM. This model is a unified learning framework based on random walk, which uses the network structure and vertex content information when learning vertex representations. We optimize the objective function by using hierarchical softmax, and rank the relevance scores of mention candidates.

A. HETEROGENEOUS MENTION NETWORK EMBEDDING
The goal of the NEM model is to learn a unified low-dimensional embedding vector for users and messages in the constructed heterogeneous mention network. In the resulting embedding space, vertices that share common edges or have similar text content are close to each other and can be directly compared and computed on. In this paper, we divide the relationships among the vertices of the heterogeneous mention network $G$ into three categories: homogeneous relationship, heterogeneous relationship and correlation relationship, as shown in Figure 2. In the following, we discuss how each relationship is modeled and then present the heterogeneous mention network embedding.

1) HOMOGENEOUS RELATIONSHIP MODELING
A homogeneous network consists of target nodes of the same type, and the relationships among the nodes are obtained from the heterogeneous mention network along the user and message dimensions.

a: USER-USER RELATIONSHIP MODELING
There are typically multiple paths between two users in the heterogeneous mention network. For example, the path "user A follows user B" in the follow network indicates how interested A is in B, and the path "user A interacts with user B" in the interaction network indicates how well A knows B. Therefore, user-user relationships need to be aggregated over these two paths.
Following the idea of DeepWalk [19], we generate random walks over the heterogeneous mention network $G$. Specifically, we treat a random walk path $s = \{u_1, u_2, \cdots, u_l\}$ as a sentence, in which each vertex $u_i$ corresponds to a word in neural language models. We then use DeepWalk to train a Skip-Gram model on the generated random walk collection $S_u$ of users, and obtain a distributed vector representation for each user vertex. Given a vertex $u_i$ and all random walks $s \in S_u$, the objective function of user-user relationship modeling is defined in Eq. (1), where $M$ is the number of users and $d$ is the length of the contextual window around $u_i$. Given the current user vertex $u_i$, the probability of finding the neighboring vertices $\{u_{i-d} : u_{i+d}\}$ is calculated as in Eq. (2), where $v_{u_i}$ and $\hat{v}_{u_j}$ are the input and output representation vectors of the vertices $u_i$ and $u_j$, respectively.
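The walk-generation step can be sketched as follows, assuming uniform transition probabilities and a plain adjacency dictionary. The function names, the `walks_per_node` parameter and the seeding are illustrative choices; the original text does not state how walks are sampled beyond following DeepWalk.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` vertices starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1], [])
        if not nbrs:          # dead end: stop the walk early
            break
        walk.append(rng.choice(nbrs))
    return walk


def generate_walks(adj, walks_per_node, length, seed=0):
    """Collect walks from every vertex; this plays the role of S_u."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)    # visit start vertices in a fresh order each pass
        walks.extend(random_walk(adj, n, length, rng) for n in nodes)
    return walks


def skipgram_pairs(walk, d):
    """(center, context) pairs with the context inside a window of size d."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - d), min(len(walk), i + d + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs
```

Each walk is then treated as a sentence, and the pairs returned by `skipgram_pairs` are the training examples whose log-probability the Skip-Gram objective maximizes.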

b: MESSAGE-MESSAGE RELATIONSHIP MODELING
According to the results in [11], messages posted by different users with similar topic distributions are likely to mention the same users. Hence, we model the message-message relationship to measure the strength of semantic dependence among message vertices. The topic distribution satisfies the hypothesis that message similarities in the observed space are consistent with those in the latent space. Similar to the user-user relationship modeling approach, the objective function of message-message relationship modeling is defined in Eq. (3), where $N$ is the number of messages and $S_m$ is the generated random walk collection of messages. The probability of observing the neighboring vertices $\{m_{i-d} : m_{i+d}\}$ given a message vertex $m_i$ is calculated using the softmax function in Eq. (4), where $v_{m_i}$ and $\hat{v}_{m_j}$ are the input and output representation vectors of the vertices $m_i$ and $m_j$, respectively.
Figure 2. The categories of mention relationships on a social network. We classify the pairwise relationships among the vertices in the heterogeneous mention network $G$ into three categories: homogeneous relationship, heterogeneous relationship and correlation relationship.
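For reference, under the standard Skip-Gram formulation used by DeepWalk, the two homogeneous objectives and their softmax probabilities would take the following form. This is a sketch written from the symbol definitions in the text ($M$, $N$, $d$, $S_u$, $S_m$, input vectors $v$, output vectors $\hat{v}$); the names $\mathcal{L}_{uu}$ and $\mathcal{L}_{mm}$, the summation index $k$ and the exact index conventions are assumptions.

```latex
\begin{align}
\mathcal{L}_{uu} &= \sum_{i=1}^{M} \sum_{s \in S_u} \sum_{\substack{-d \le j \le d \\ j \ne 0}} \log P(u_{i+j} \mid u_i), \\
P(u_{i+j} \mid u_i) &= \frac{\exp\left(v_{u_i}^{\top} \hat{v}_{u_{i+j}}\right)}{\sum_{k=1}^{M} \exp\left(v_{u_i}^{\top} \hat{v}_{u_k}\right)}, \\
\mathcal{L}_{mm} &= \sum_{i=1}^{N} \sum_{s \in S_m} \sum_{\substack{-d \le j \le d \\ j \ne 0}} \log P(m_{i+j} \mid m_i), \\
P(m_{i+j} \mid m_i) &= \frac{\exp\left(v_{m_i}^{\top} \hat{v}_{m_{i+j}}\right)}{\sum_{k=1}^{N} \exp\left(v_{m_i}^{\top} \hat{v}_{m_k}\right)}.
\end{align}
```

The denominators sum over all $M$ users or all $N$ messages, which is the cost that the hierarchical softmax in the optimization step is meant to avoid.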

2) HETEROGENEOUS RELATIONSHIP MODELING
Similarly, a heterogeneous network consists of target nodes of different types, and the relationships are extracted from the interactions between users and messages in the heterogeneous mention network.

a: USER-MESSAGE RELATIONSHIP MODELING
Two types of relationships in the heterogeneous mention network connect users and messages, referred to as posting and mentioning. Posting indicates that a user posts a message on the social network, and mentioning indicates that a user is mentioned in the body of a message. We treat these two relationships as a random walk path and model the user-message relationship to measure the strength of the interactions between users and messages. In this way, we leverage the valuable relationship information between users and messages. In particular, we first collect all the messages posted by the same user. Then we use the user vector as input and simultaneously learn the input user vectors and the output message vectors. This is formalized by the objective function in Eq. (5), and the probability of observing a message given a user $u_i$ is calculated as in Eq. (6).
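To illustrate how a conditional probability of a message given a user can be obtained from an input user vector and output message vectors, the sketch below applies a plain softmax over dot products, as in Skip-Gram. The function name and the toy vectors are hypothetical, and the dot-product scoring is an assumption carried over from the Skip-Gram model.

```python
import math


def softmax_probabilities(input_vec, output_vecs):
    """P(key | input) proportional to exp(input . output[key])."""
    scores = {k: sum(a * b for a, b in zip(input_vec, v))
              for k, v in output_vecs.items()}
    top = max(scores.values())            # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Toy example: one input user vector and three output message vectors.
user_vec = [1.0, 0.0]
message_vecs = {"m1": [2.0, 0.0], "m2": [0.0, 2.0], "m3": [-1.0, 0.0]}
probs = softmax_probabilities(user_vec, message_vecs)
```

In this toy setting the message whose output vector is most aligned with the user vector (`m1`) receives the highest probability, and the probabilities sum to one over all messages.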

3) CORRELATION RELATIONSHIP MODELING
A correlation network captures the attribute information of the target nodes, and the relationships are extracted from the heterogeneous mention network on the basis of the content dimension.

a: USER-CONTENT CORRELATION MODELING
User interests can affect the overall effectiveness of a mention recommender, and the interests of a user can be inferred from user-generated content. Therefore, we also model user interests on the basis of message content. We assume that similar users have similar patterns of mention behavior in both the observed space and the hidden space; we also observe that users with similar topic interests are likely to be mentioned in the same message. Specifically, we collect all the text content of the messages posted by one user and represent the texts with a bag-of-words model. The objective function is formulated in Eq. (7), and the probability of observing the words given a user $u_i$ is defined in Eq. (8), where $\hat{v}_{w_j}$ is the output representation of word $w_j$ and $L$ is the number of distinct words in the whole heterogeneous information network.
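The collection step described above can be sketched as follows. The function names are hypothetical, and the whitespace tokenizer is a simplification: Weibo text would need a proper word segmenter, which the original text does not specify.

```python
from collections import Counter


def user_bag_of_words(posts):
    """posts: iterable of (user, text). Returns user -> Counter of lower-cased tokens."""
    bags = {}
    for user, text in posts:
        tokens = text.lower().split()      # whitespace tokenizer, a simplification
        bags.setdefault(user, Counter()).update(tokens)
    return bags


def user_word_pairs(bags):
    """Flatten the bags into (user, word) training pairs, repeated by count."""
    return [(user, word) for user, bag in bags.items() for word in bag.elements()]
```

Each `(user, word)` pair corresponds to one term of a user-content objective in which the user vector is the input and the word vector is the output.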

b: MESSAGE-CONTENT CORRELATION MODELING
The goal of a mention is to expand the spread of information, and the message content directly influences the diffusion effect. Thus, the correlation between a message and its content should be considered. We collect all the text content corresponding to one message, and model the message-content correlation relationship as the contextual information of words within the message. The objective is to maximize the log-likelihood function in Eq. (9), and the probability of observing the contextual words given vertex $m_i$ is defined in Eq. (10).

B. HETEROGENEOUS MENTION NETWORK EMBEDDING
In this section, we explain the proposed NEM model, which is used to jointly capture the network structure and the content information associated with each vertex in the network.

c: MODEL ENSEMBLE
We incorporate the homogeneous, heterogeneous and correlation relationships into an integrated framework. The objective of the integrated framework is to maximize the log-likelihood function in Eq. (11), where $\lambda$ is a tunable parameter that balances the weight of network structure information against text content information, $d$ is the window size of a sequence and $w_j$ is the $j$-th word in a contextual window. The first three terms in Eq. (11) capture the mutually reinforcing information between users and messages, and the last two terms indicate that the text information and user information jointly affect $\hat{v}_{w_j}$, the output representation of word $w_j$. In turn, the word vector representation propagates back to influence the input representations of $u_i$ and $m_j$ in the network. As a result, the vertex representations (i.e., the input vectors of vertices) are enhanced by both network structure and content information.
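One reading of the combined objective that is consistent with the description of $\lambda$ is sketched below. Here $\mathcal{L}_{uu}$, $\mathcal{L}_{mm}$, $\mathcal{L}_{um}$, $\mathcal{L}_{uc}$ and $\mathcal{L}_{mc}$ are shorthand introduced only for this sketch, naming the user-user, message-message, user-message, user-content and message-content log-likelihoods in the order they are presented. Placing $\lambda$ on the three structure terms follows the later remark that the network topology has no effect when $\lambda$ is close to 0; the complementary weight $(1 - \lambda)$ on the two content terms is an assumption.

```latex
\begin{equation}
\mathcal{L} = \lambda \left( \mathcal{L}_{uu} + \mathcal{L}_{mm} + \mathcal{L}_{um} \right)
            + (1 - \lambda) \left( \mathcal{L}_{uc} + \mathcal{L}_{mc} \right)
\end{equation}
```

Under this reading, $\lambda = 0$ reduces the model to content-only learning and $\lambda = 1$ to structure-only learning.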

d: MODEL OPTIMIZATION
Stochastic gradient descent can be used directly to solve Eq. (11), but computing the gradient of the conditional probabilities in Eq. (2), Eq. (4), Eq. (6), Eq. (8) and Eq. (10) is time-consuming and expensive, because the softmax probability has to be calculated over all words. To reduce the computational complexity, hierarchical softmax [36] can be used to build three Huffman trees, with user vertices, message vertices and distinct words as the leaves of the respective trees. Given a target vertex $u_i$, the conditional probability is then computed as the product of the terms $P(s_{2k} \mid u_i)$ along a tree path, where $(s_{20} \to s_{21} \to \cdots \to s_{2c})$ represents the path from the root $s_{20}$ to the leaf $s_{2c}$ for vertex $u_i$. Each $P(s_{2k} \mid u_i)$ is further modeled by a binary classifier built from the sigmoid function $\sigma(x)$ and $\hat{v}_{s_{2k}}$, the representation of the parent of tree vertex $s_{2k}$. The same technique can be used to compute the conditional probabilities in Eq. (4), Eq. (6), Eq. (8) and Eq. (10).
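The following sketch shows how a hierarchical softmax over a Huffman tree replaces the full normalization: a leaf probability is the product of sigmoid decisions along the root-to-leaf path. The function names, the use of frequencies to build the tree, and the convention that bit 0 uses $\sigma(x)$ and bit 1 uses $1 - \sigma(x)$ are assumptions for illustration.

```python
import heapq
import itertools
import math


def huffman_paths(freq):
    """Return leaf -> list of (inner_node_id, bit) from the root to the leaf."""
    counter = itertools.count()
    # heap items: (frequency, tie_breaker, node); the unique tie_breaker keeps
    # the heap from ever comparing the node tuples themselves
    heap = [(f, next(counter), ("leaf", k)) for k, f in freq.items()]
    heapq.heapify(heap)
    children = {}
    inner_ids = itertools.count()
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        node = ("inner", next(inner_ids))
        children[node] = (left, right)
        heapq.heappush(heap, (f0 + f1, next(counter), node))
    root = heap[0][2]
    paths = {}

    def walk(node, prefix):
        if node[0] == "leaf":
            paths[node[1]] = prefix
            return
        left, right = children[node]
        walk(left, prefix + [(node[1], 0)])
        walk(right, prefix + [(node[1], 1)])

    walk(root, [])
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(input_vec, path, inner_vecs):
    """Product of binary decisions along the root-to-leaf path."""
    prob = 1.0
    for inner_id, bit in path:
        score = sum(a * b for a, b in zip(input_vec, inner_vecs[inner_id]))
        prob *= sigmoid(score) if bit == 0 else 1.0 - sigmoid(score)
    return prob
```

Because the two branches at every inner node have probabilities that sum to one, the leaf probabilities form a valid distribution without an explicit sum over all leaves, and frequent leaves get shorter paths.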

e: NETWORKING EMBEDDING MENTION RECOMMENDATION
We finally treat mention recommendation as a ranking problem over a set of candidate users $p_i \in P$ ($i = 1, 2, \cdots, L$). The proposed model returns the top-ranked candidate users by measuring the relevance scores between users and the message. Specifically, given an incoming message $m$ posted by $u$, the relevance score is calculated from the heterogeneous embedding vectors, where $V_m = [v_{m_1}; v_{m_2}; \cdots; v_{m_N}]$ is the vector representation of the training messages, $v_m$ is the vector representation of the incoming message $m$, $V_u = [v_{u_1}; v_{u_2}; \cdots; v_{u_M}]$ is the vector representation of the users related to the training messages, and $v_u$ is the vector representation of user $u$. Once training is finished, the candidates are ranked by their relevance scores, and the top-ranked candidates form the final user list for mention recommendation. Algorithm 1 summarizes the whole process of determining the relevance scores of the training users for a given message.
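The ranking step can be sketched as follows. The exact relevance formula is not reproduced in the text above, so this example substitutes a plain dot product between the message embedding and each candidate embedding; that scoring function, the function name and the toy vectors are all assumptions.

```python
def rank_candidates(message_vec, candidate_vecs, top_k):
    """Score each candidate by a dot product with the message vector and keep the top k."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = [(dot(message_vec, vec), user) for user, vec in candidate_vecs.items()]
    # Sort by descending score; break ties by user id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:top_k]]


# Toy example with hypothetical two-dimensional embeddings.
message_vec = [1.0, 0.5]
candidate_vecs = {"Alice": [0.9, 0.1], "Bob": [0.2, 0.8], "Lucy": [-0.5, 0.3]}
top_two = rank_candidates(message_vec, candidate_vecs, top_k=2)
```

Any other similarity that operates on vectors in the shared embedding space, such as cosine similarity, could be swapped in without changing the ranking logic.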

V. EXPERIMENTS AND ANALYSIS

A. DATASET AND METRICS

1) DATA DESCRIPTION
Weibo is one of the most popular social network platforms in China and, like Twitter, allows users to follow each other and to mention any user in a message. In this paper, we use a publicly available Weibo dataset [37], which consists of user profiles, tweets and a snapshot of the network structure, among other data. In particular, many of the tweets contain mention behaviors, which makes the dataset well suited to our experiments.
To obtain high-quality results, we preprocess the dataset by extracting all interactions expressed via the @ symbol. Since a publisher may mention multiple users in a message, one message can yield multiple mention instances. In addition, we sample as publishers the users whose number of historical interactions is between 10 and 1,000, to avoid the influence of inactive users and extremely active spammers. Table 2 summarizes the detailed information of the resulting dataset.
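A minimal sketch of this preprocessing is given below. The username pattern (word characters after "@"), the record layout `(publisher, text)` and the inclusive interpretation of the 10 to 1,000 bounds are assumptions; the dataset's actual format is not described here.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")   # assumed username pattern: word characters after "@"


def extract_mentions(records):
    """records: iterable of (publisher, text). Yields one (publisher, text, mentioned) per @."""
    for publisher, text in records:
        for name in MENTION_RE.findall(text):
            yield publisher, text, name


def filter_publishers(instances, low=10, high=1000):
    """Keep instances whose publisher has between `low` and `high` interactions (inclusive)."""
    instances = list(instances)
    counts = Counter(publisher for publisher, _, _ in instances)
    return [inst for inst in instances if low <= counts[inst[0]] <= high]
```

A message that mentions two users yields two mention instances, which matches the statement that one message can contribute multiple instances.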

2) EVALUATION METRICS
We use Precision, Recall and F-Score to evaluate the performance of user recommendation methods for the highest

Algorithm 1
Require: The heterogeneous mention network $G = (V, E, W, C)$, the user $A$, the incoming messages and all the candidates, the expected dimension $k$ of the vector representation, the window size $d$, the iteration number $T$, the random walk length $l$ and the number of recommended users $U$.
Ensure: Recommended mentioned user list.
1: Generate the random walk collection $S_u$ from the user and the authors of the training messages, and generate the random walk collection $S_m$ from the incoming message and the training messages;
2: Generate a user binary tree $T_u$, a message binary tree $T_m$ and a vocabulary binary tree $T_w$;
3: Get the initial input vector representations $v_u$, $v_m$ and output vector representations $\hat{v}_u$, $\hat{v}_m$ for the user and each author of the training messages, the incoming message and all the training messages, respectively;
4: Get the initial output vector representation $\hat{v}_{w_j}$ for each word $w_j \in W$;
5: for iter = 1, 2, 3, ..., $T$ do
6:   Fix $\hat{v}_{w_j}$, solve Eq. (11) to update $v_u$, $v_m$, $\hat{v}_u$, $\hat{v}_m$;
7:   Fix $\hat{v}_u$, $\hat{v}_m$, solve Eq. (11) to update $v_u$, $v_m$, $\hat{v}_{w_j}$;
8: end for
9: Calculate the relevance score $r_q$ for the given message and rank the candidate mentioned users according to $r_q$;
10: Select the top-ranking $Q$ candidate users as the recommendation list.
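The Precision, Recall and F-Score metrics named in this subsection can be computed per message as in the following sketch, using their standard top-k definitions. The function name and the per-message formulation are assumptions; how the scores are averaged over messages is not specified in the text above.

```python
def precision_recall_f(recommended, relevant, k):
    """Standard top-k precision, recall and F-score for one message.

    recommended: ranked list of candidate users
    relevant: collection of users actually mentioned in the message
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

For a ranked list `["Alice", "Lucy", "Bob"]` with true mentions `{"Alice", "Bob"}`, moving from k = 2 to k = 3 lowers precision relative to a perfect list but raises recall, which is the trade-off examined when the number of recommended users varies.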

B. BASELINE METHODS
We compare the proposed model with the following state-of-the-art methods:
• Frequency Descending (FD): The ranked list depends on how frequently the candidates appear in the history. A candidate user mentioned more frequently by the author has a higher rank in the recommendation list.
• WTM: This method uses support vector regression as ranking algorithm by measuring user interest match, content-dependent user relationship and user influence [10].
• PMPR: This model considers the mention recommendation as a probabilistic ranking problem to find the maximal possibility candidate by using probabilistic factor graph model in the heterogeneous social network [13].
• CAR: This method uses a ranking support vector machine model by considering content, social, location and time based features to recommend the target users [11].
• A-UUTTM: This method considers not only the content of a microblog but also the histories of candidates based on translation-based model [16].
• AU-HMNN: This model incorporates the textual information of query tweets and the history interests of the author and candidate users. The history interests encoder is a memory network architecture with a hierarchical attention mechanism [18].
In our experiments, all of the above baseline methods use their default parameter settings. For example, the number of mentions in CAR is 4, A-UUTTM sets µ = 0.8 and the number of topics to 30, and AU-HMNN fixes the number of hops to 6 and the number of embedding dimensions to 300. For our model, we empirically set λ = 0.2, the embedding dimension to 30, the window size to 2, the length of random walk to 20, and the number of iterations to 100.

C. COMPARISON WITH OTHER NETWORK EMBEDDING APPROACHES
To evaluate the performance of the heterogeneous information network embedding based recommendation approach, we compare it with five other embedding based recommendation approaches:
• DeepWalk: It learns network representations by utilizing network structure information [19].
• LINE: It preserves local and global network structure to learn network representations [20].
• Node2Vec: It learns continuous feature representations of nodes by using a second-order random walk to generate network neighborhoods [29].
• Author2Vec: It embeds variable-length text into a fixed-length distributed vector using neural network models [38].
• TriDNR: It simultaneously considers network structure and vertex content to learn network representations [39].
In this paper, the following default parameter values are used in our experiments. For DeepWalk, we set the number of walks numwalk=80, the length of walk legwalk=40, and the embedding dimension embedding=128. For the LINE model, the embedding dimension is 100, the number of negative samples is 5, and the starting value of the learning rate is 0.025. For Node2Vec, we set the length of walk to 80, the number of walks to 10, and the upper bound on the number of neighbors to 30. For Author2Vec, we set the dimension of the author embedding to 100 and the window size to 10. For TriDNR, the weight of text is 0.8, the number of features is 100, and the random state is 2.

D. PERFORMANCE AND ANALYSIS
In this section, we compare the recommendation performance of all baseline methods and variants in Table 3, and show how performance changes with the number of recommended users in Figure 3. The baselines category of Table 3 compares the proposed method with the state-of-the-art methods. From the results, we can draw the following observations: (1) Our proposed model outperforms the baseline methods, and its advantage becomes more obvious as the number of mentions increases. (2) By using an end-to-end memory network, AU-HMNN achieves better mention recommendation performance than most conventional methods (e.g., CAR, PMPR, WTM, FD). (3) A-UUTTM performs clearly better than the other methods, which demonstrates that topic interest is an important factor for mention. (4) CAR performs clearly better than the aforementioned baselines. (5) Reinforcement learning methods such as CROMA are a good research direction for mention recommendation; the experimental results show that considering a large number of historical tweets can capture mention behaviors successfully. In particular, the best results of our proposed model for MRR and Hits@5 are higher than those of the other methods. Hence, by incorporating homogeneous, heterogeneous and correlation relationships, our proposed model performs well in mention recommendation based on network embedding techniques.

2) OTHER NETWORK EMBEDDING APPROACHES FOR EVALUATION
We apply the different network embedding methods to the heterogeneous mention network, and the experimental results are shown in the embeddings category of Table 3. From the comparison, DeepWalk-based and LINE-based mention recommendation methods perform poorly across all results. The main reason is that both approaches only consider the structural features of sparse vertices and ignore node information. Node2Vec achieves better performance than the DeepWalk-based and LINE-based methods; one explanation is that Node2Vec designs a smart search strategy for nodes. Author2Vec performs better than all of the above methods, because author nodes hold richer information than the network structure alone. Compared with the above methods, which use topological information and author information, the TriDNR-based mention recommendation approach performs best among them by exploiting inter-node relationships and node-content correlation to learn an optimal representation for each node. Finally, the proposed NEM approach outperforms all the other network embedding approaches by utilizing homogeneous, heterogeneous and correlation relationships in the network embedding representation.

3) RESULT COMPARISON ON DIFFERENT RELATIONSHIP FEATURES
We implement different variants of our model to show the effectiveness of the proposed factors, reported in the features category of Table 3:
• NEM-UU: Eliminating the effect of user-user relationship modeling by removing the first term in Eq. (11).
• NEM-MM: Eliminating the effect of message-message relationship modeling by removing the second term in Eq. (11).
• NEM-UM: Eliminating the effect of user-message relationship modeling by removing the third term in Eq. (11).
• NEM-UC: Eliminating the effect of user-content correlation modeling by removing the fourth term in Eq. (11).
• NEM-MC: Eliminating the effect of message-content correlation modeling by removing the fifth term in Eq. (11).
From the features category of Table 3, we can conclude that (1) the performance of NEM degrades when any of these relationship features is eliminated, and (2) the order of importance of the features is basically NEM-UM > NEM-UU > NEM-UC > NEM-MM > NEM-MC. More specifically, user-message relationship modeling is the most important feature, which indicates that the degree of topical match between the incoming message and the target users is the most significant factor when a mention occurs. Second, user-user relationship modeling is more important than the other three features, which indicates that the strength of social influence among users affects future interaction behaviors. Third, message-message relationship modeling contributes relatively little, primarily because messages in the social network dataset are highly sparse. In summary, the topic interests of users and the strength of historical interaction are the primary factors in the mention recommendation task.
Figure 3 shows the performance of our model and the baseline models with different numbers of recommended users, varying from 1 to 5. From the figure, we can see that (1) our proposed model consistently performs better than the other methods for every number of recommendations. (2) As the number of recommended users increases, Precision decreases and Recall increases gradually, indicating that the recommendation performs best when the number of mentions in a message is neither too large nor too small. A reasonable explanation is that a message mentioning a small number of users (e.g., one to three persons) would be viewed as an intimate chat with close friends, and as spam otherwise. (3) The best F-Score is obtained when recommending the top one user.

E. PARAMETER SENSITIVITY
We analyze the influence of several critical parameters of the proposed model from the following perspectives.

1) EFFECT OF λ
The parameter λ is a tunable weight that balances the contribution of the network structure against that of the message content. Here, we report the experimental results of our proposed model when λ varies from 0.0 to 1.0, as shown in Figure 4. From the figure, we observe that our model obtains the best performance when λ = 0.2. When λ is close to 0, the network topology has almost no effect on the final result. When λ is sufficiently large, the performance decreases, because the model relies more heavily on the sparse network structure.

2) EFFECT OF EMBEDDING DIMENSION k
The embedding dimension determines the representational power of the vertex vectors. Figure 5 plots the performance of our proposed model for various values of k. From the figure, we can observe that the performance first rises gradually and then drops gradually as k increases. In particular, the best performance is achieved when k is around 30. Therefore, we choose k = 30 as the latent embedding dimension.

3) EFFECT OF WINDOW SIZE d
The window size determines how much surrounding context is used for prediction around each vertex. Figure 6 plots the performance of our proposed model for various values of d. From the figure, we can observe that the performance rises gradually as the window size d increases, and the F-Score becomes stable at d = 2. Balancing effectiveness against time efficiency, we choose d = 2 as the window size.
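To make the role of d concrete, the following minimal sketch enumerates the (center, context) pairs that a window of size d produces over a vertex sequence. It assumes a skip-gram-style symmetric window applied to a random-walk sequence; the function name and the toy vertex labels are hypothetical, and the paper's own context definition may differ.

```python
def context_pairs(sequence, d):
    """Return all (center, context) pairs within a symmetric window of size d."""
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - d)
        end = min(len(sequence), i + d + 1)
        for j in range(start, end):
            if j != i:  # a vertex is not its own context
                pairs.append((center, sequence[j]))
    return pairs


# Toy walk alternating between user and message vertices.
walk = ["u1", "m1", "u2", "m2"]
print(len(context_pairs(walk, 1)))  # number of training pairs with d = 1
print(len(context_pairs(walk, 2)))  # number of training pairs with d = 2
```

A larger d yields more training pairs per sequence, which increases both the available context and the training cost, consistent with choosing a small window once the F-Score has stabilized.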

4) EFFECT OF LENGTH OF RANDOM WALK l
We analyze the contribution of different random walk lengths to mention recommendation. Specifically, we set the length of the random walk to each value in {5, 10, 20, 50} and evaluate the resulting recommendation performance. Table 4 shows the accuracy of our proposed model for the different lengths. From the table, we observe a clear improvement in performance as the walk length grows. Although longer walks can boost performance further, they also raise the time cost; we therefore set l = 20, which offers the best trade-off between performance and time cost.
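As a rough illustration of what the length l controls, the sketch below generates a truncated random walk of at most l vertices over an adjacency list. It assumes a uniform choice among neighbors and a toy user-message graph; the function name and graph are hypothetical, and the walk strategy actually used over the heterogeneous network may be different.

```python
import random


def random_walk(adjacency, start, length, rng=None):
    """Generate a walk of at most `length` vertices, choosing neighbors uniformly."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


# Toy undirected graph linking users (u*) and messages (m*).
graph = {
    "u1": ["m1", "u2"],
    "u2": ["u1", "m1", "m2"],
    "m1": ["u1", "u2"],
    "m2": ["u2"],
}
for l in (5, 10, 20, 50):
    print(l, random_walk(graph, "u1", l, random.Random(0)))
```

Each walk costs time proportional to l, so longer walks expose more co-occurring vertices per start node at a proportionally higher sampling and training cost.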

VI. CONCLUSION
In this paper, we propose a novel network embedding model for mention recommendation, which recommends the right users to mention in a message by incorporating homogeneous relationships, heterogeneous relationships, and correlation relationships. Based on network embedding, we use these mention relationship models to construct the objective function under the random walk framework. To demonstrate the effectiveness of the proposed model, we conduct extensive experiments. The experimental results show that our proposed method outperforms the state-of-the-art baseline methods. As future work, we will consider how to apply graph neural network methods to better incorporate the embeddings of multiple social contexts.
BO JIANG received the Ph.D. degree in computer science from the University of Chinese Academy of Sciences, China, where he is currently an Associate Researcher with the Institute of Information Engineering, Chinese Academy of Sciences, China. His main research interests include data mining, knowledge graph, social network mining, and recommendation system. He has served as a Reviewer for the IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS and Elsevier.
JIANJUN WU received the Ph.D. degree in computer science from the University of Chinese Academy of Sciences, China. He is currently an Associate Professor with the Computer Science College, Beijing College of Politics and Law. His main research interests include social network mining, data mining, and machine learning.