TPS: A Topological Potential Scheme to Predict Influential Network Nodes for Intelligent Communication in Social Networks

The growing popularity of Online Social Networks (OSN) has prompted an increasing number of companies to promote their brands and products through social media. This paper presents a topological potential scheme for predicting influential nodes in large-scale OSNs to support more intelligent brand communication. We first construct a weighted network model of the users and their relationships extracted from brand-related content in OSNs. We quantitatively measure the individual value of each node from both the network structure and brand engagement aspects. Moreover, we address the problem of influence decay during information propagation in social networks and use topological potential theory to evaluate the importance of a node by its individual value as well as the individual values of its surrounding nodes. The experimental results show that the proposed method is able to predict influential nodes in large-scale OSNs. We investigate in detail the top-k influential nodes identified by our method, which are quite different from those identified by using pure network structure or individual value alone. Compared with existing typical approaches, our method obtains an identification result with a higher ratio of verified users and higher user coverage.


I. INTRODUCTION
Online Social Networks (OSN) have become increasingly popular in recent years. With the emergence of the mobile Internet, users are able to enjoy OSNs such as Facebook, Twitter, and Weibo at all times and in all places. Extensive online User-Generated Content (UGC) has been produced on social media and has become an important aspect of Electronic Word of Mouth (eWOM). Social media has become an important channel through which companies can release information to and maintain contact with their customers. Therefore, eWOM via social media has become a key driver of brand marketing towards consumers, prompting an increasing number of companies to promote their brands and products through OSNs. From the marketing perspective, the nodes in a large-scale OSN are not equally important. There exist some active users in the network who have a certain influence and are also very concerned about some brands. These influential nodes can help companies perform brand communication through social media by affecting other nodes. Therefore, if the influential nodes can be identified within large-scale OSNs, companies can rely on them for brand communication. Those influential nodes will act as 'bridges' between companies and other consumers.
Although there have been a number of previous studies on identifying or predicting influential nodes in OSNs [1], [2], few have addressed the potential significance to brand communication or how to identify influential nodes that are more suitable for promoting brands through social media. Moreover, massive brand-related data have become a kind of big data [3] in OSNs. Therefore, predicting influential nodes within a large-scale OSN for brand communication is still a problem worthy of further study. In this paper, we propose a topological potential scheme for predicting influential nodes in OSNs by considering both the network structure and brand engagement factors. The preliminary results of this study can help companies analyze and discover the characteristics and rules of OSNs to provide decision support for data-driven or intelligent brand communication in social media. In particular, we consider the problem of influence decay in OSNs and apply a topological potential model to identify influential nodes more suitable for brand communication. The major contributions of this study are summarized as follows:
1) We propose to measure the importance of nodes in an OSN by considering both network structural and content-related metrics, and quantitatively represent it as an individual value.
2) An intelligent topological potential scheme (TPS) is proposed to determine node influence and predict influential nodes in OSNs for brand communication.
3) We collected a real-world dataset from SMZDM.com including more than 40000 users and 60000 social relations. Comprehensive experiments are conducted to validate the effectiveness of our method.
The rest of the paper is organized as follows. In Section II, the motivations are introduced and the related works are reviewed. Section III describes the process of predicting influential nodes with the proposed TPS. Details about the performance evaluation are presented in Section IV. Finally, conclusions and future work are presented in Section V.

II. RELATED WORKS
Currently, many efforts have been made to identify or predict influential nodes in OSNs. In this section, we briefly review the existing works in several categories.

A. Structural Methods
Social network analysis mostly relies on topological metrics [4] such as centrality and community concepts, and many of the terms used to measure these metrics are a reflection of their sociological origin [5]. For example, Freeman [6], [7] illustrates that the centrality of a node indicates the connection ability of the node in the social network structure and can be used as a criterion for measuring the importance of the node. Corley & Sha [8] address the n-most vital nodes problem and propose an algorithm for evaluating node importance. Currently, many efforts have been made to discover the most influential nodes for maximizing influence in social networks [9]-[11]. These studies of influence maximization aim to discover nodes that can activate as many other nodes as possible, so that their influence is propagated as extensively as possible.
For example, Zareie et al. [12] introduce two influential node ranking algorithms that use the diversity of the neighbors of each node [13] to obtain its ranking value. Kumar & Panda [14] propose a coreness-based method to find influential nodes by voting. They also compare the performance of their method with some existing popular methods. Salavati & Abdollahpouri [15] take into account the interactions between users and network topology in weighted and directed graphs and consider target users' profit and similarity in identifying influential nodes. Zhang et al. [16] introduce a trust-based method for identifying influential nodes in social networks. However, their notion of trust between nodes is still based on the topological information of the network. Salavaty et al. [17] develop a formula that integrates the most significant network centrality measures in order to synergize their effects and simultaneously remove their biases to identify the most influential nodes in a complex network. However, their method is mainly used in biological systems. Zhou et al. [18] aim to find the influential nodes that are able to initiate large-scale spreading processes in a limited amount of time. Amnieh and Kaedi [19] estimate two personality characteristics, openness and extroversion, for network members and use them to find influential nodes. However, these personality characteristics are still computed based on the network structure.
There are also a few methods that take into account the influence of community (or group) structure [20], [21] in the network. Jain & Katarya [22] identify the community structure within the social network and then the opinion leader in each community by using a modified firefly algorithm. Srinivas & Rajendran [23] propose an integer linear programming model to detect community structure in real-life networks and also identify the most influential node within each community. Zhao et al. [24] propose an algorithm for identifying influential nodes in social networks with community structure based on label propagation. The proposed algorithm can find the core nodes of different communities in the network through the label propagation process. Generally, these methods identify globally influential users regardless of domain-specific information.

B. Hybrid Methods
The spreading influence of a node on a network depends on a number of factors, including its location on the network, the content of exchanged messages [25], and the character and amount of activity of the node [12]. Therefore, pure network structural methods are insufficient for identifying influential nodes in OSNs. In contrast, hybrid methods combining network structure and content seem to be more suitable for this problem. For example, Aleahmad et al. [1] detect the main topics of discussion in a given domain, calculate a score for each user, and then calculate a probability of being an opinion leader by using the scores. Liu et al. [26] take into account the dimensions of trust, domain, and time, and propose a product review domain-aware approach to identify effective influencers in OSNs. Advertising cost has also been taken into account, in addition to node influentiality, to determine influential users [27], [28]. Zareie et al. [2] measure the interest of users in marketing messages and then propose an algorithm to obtain the set of the most influential users in social networks. Weng et al. [29] propose an extension of the PageRank algorithm, called TwitterRank, to measure the influence of users on Twitter. They measure the influence by taking both the topical similarity between users and the link structure into account.
Moreover, many researchers have used ranking models like PageRank for opinion leader detection, especially in combination with topic models, e.g., Influence Rank [30], OpinionRank [31], Dynamic OpinionRank [32], TopicSimilarRank [33] and others. SuperedgeRank [34] is a mixed framework for finding influential users based on supernetwork theory, composed of network topology analysis and text mining. Li et al. [35] develop a ranking framework to automatically identify topic-specific opinion leaders. The score for opinion leadership is computed from four measures: expertise, novelty, influence, and activity. Topic-based methods can also be used to mine influential users in OSNs. For example, Hamzehei et al. [36] propose a topic-based influence measurement approach that integrates the user-topic relationships, topic content information, and social connections between users. Fang et al. [37] address topic-level influence and develop a topic-sensitive influencer mining framework for interest-based OSNs.
Although these hybrid methods may achieve better performance by combining network structural features and content-related features, most of them have not addressed the problem of influence decay during information propagation [38]. In other words, we should consider the influence of users from the perspective of the dynamics of information propagation, rather than treating each user as a single, static entity.

C. Brand Marketing in Social Media
In addition to the methods mentioned above, many efforts have been made in the field of marketing to study how social media can be used to support brand communication or how brands can be promoted in social media. For example, Hajikhani et al. [39] investigate the overall polarity of public sentiment regarding specific companies' products by analyzing content from Twitter. Kabadayi & Price [40] study the factors affecting consumers' liking and commenting behaviors on Facebook brand pages. Schivinski & Dabrowski [41] survey 504 Facebook users with a standardized online questionnaire in order to observe the impact of firm-created and user-generated social media communication on brand equity, brand attitude and purchase intention. Jiménez-Castillo & Sánchez-Fernández [42] study how effective digital influencers are in recommending brands via electronic word-of-mouth by examining whether the potential influence they have on their followers may affect brand engagement. Gao & Feng [43] examine the differences in Chinese users' gratifications of different social media and the impact of brand content strategies on the quality of brand-consumer communication via social media. Godey et al. [44] study how social media marketing activities influence brand equity creation and consumers' behavior towards a brand. Veirman et al. [45] explore marketing through Instagram influencers and assess the impact of the number of followers and product divergence on brand attitude through two experiments with fictitious influencer accounts on Instagram. Although many studies have been done on brand communication in social media, few existing studies have addressed how to identify and make use of influential nodes for brand marketing on social media.

III. OUR PROPOSED TPS
In this section, we mainly present an intelligent method for predicting influential nodes in OSN for intelligent brand communication.

A. Weighted Network Model
An OSN can be formally represented as a graph $G = (V, E, W)$, where $V$ denotes the set of people or users that belong to the network and $E$ represents the set of relations between the users. There is an edge between two nodes if they have a social relation. Given two nodes $u_i$ and $u_j$, if $u_j$ follows $u_i$, then there is an edge directed from $u_i$ to $u_j$. Moreover, if the post of a user is commented on by another user, we consider this interaction as another kind of social relation between the two users. For example, if $u_j$ comments on a post generated by $u_i$, then there is an edge directed from $u_i$ to $u_j$. If $u_j$ follows $u_i$ or comments on $u_i$'s post, it means that $u_i$ is able to affect $u_j$ or that information can spread from $u_i$ to $u_j$. $W$ indicates the set of weights for the directed edges in $E$. The value of a weight in $W$ denotes the number of relations and interactions between the corresponding users.
For a specific brand (e.g., a cell phone or cosmetics), we can extract all posts related to it from an OSN and construct a corresponding weighted network model before we start to identify the influential nodes. Then, the task of mining influential nodes can be constrained in a limited space or community.
The detailed process of network model construction is as follows:
1) First, we crawl the posts about the brand within a period of time (e.g., one month) from an OSN; the set of posts is denoted as $P$.
2) Then we extract the authors of the posts in $P$ and obtain a set of users, denoted as $U$.
3) The relations between the users in $U$ are further extracted and added to a set $R$. Each $r$ in $R$ can be denoted as $\langle u_i, u_j \rangle$, where $u_i$ and $u_j$ are two users that have a social relation $r$ in the OSN. For each $r$ in $R$, we create a corresponding weight $w$, set $w = 1$, and add $w$ to a weight set $W$.
4) For each user $u$ in $U$, we get the users who follow $u$ in the OSN as a set $U_{uf}$. We also get all of his/her posts in $P$, marked as $P_u$, so that $P_u \subseteq P$. For each post $p$ in $P_u$, we get all the users who have commented on $p$ as a set $U_{uc}$. Then we have an extended user set $U_{uf} \cup U_{uc}$.
5) For each user $u_i$ in the extended user set, if $\langle u, u_i \rangle$ is not yet in $R$, it means that this is a newly-found social relation. In this case, we add $\langle u, u_i \rangle$ to $R$, create a new weight $w_i = 1$ for $\langle u, u_i \rangle$, and add $w_i$ to $W$. Moreover, we add $u_i$ to a temporary set $U'$.
6) We update the user set $U$ by performing $U \cup U'$.
7) Finally, we get the weighted network model $G = (U, R, W)$ for a specific brand in the OSN. Here $U$ and $R$ can also be represented by $V$ and $E$.
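The construction steps can be sketched in Python as follows. This is a minimal illustration, not the crawler used in the study: the input dictionaries (`posts`, `followers`, `comments`) and the function name are hypothetical, and the sketch assumes that every repeated follow or comment interaction adds 1 to the weight of an existing edge, in line with the statement that a weight counts the relations and interactions between two users.

```python
from collections import defaultdict

def build_weighted_network(posts, followers, comments):
    """Build a directed weighted network for one brand (illustrative sketch).

    posts:     dict post_id -> author (the brand-related post set P)
    followers: dict user -> iterable of users who follow that user
    comments:  dict post_id -> iterable of users who commented on the post
    Returns (users, weights), where weights[(u, v)] is the weight of edge u -> v.
    """
    users = set(posts.values())      # U: authors of brand-related posts
    weights = defaultdict(int)       # W, keyed by the relation <u, v> in R

    # An edge u -> v means that information can spread from u to v.
    for author in users:
        for follower in followers.get(author, ()):
            if follower != author:
                weights[(author, follower)] += 1
    for post_id, author in posts.items():
        for commenter in comments.get(post_id, ()):
            if commenter != author:
                # Assumption: each further interaction adds 1 to the weight.
                weights[(author, commenter)] += 1

    # U <- U union U': add the newly found followers and commenters.
    for u, v in weights:
        users.update((u, v))
    return users, dict(weights)

# Hypothetical toy data
posts = {"p1": "alice", "p2": "alice", "p3": "bob"}
followers = {"alice": ["bob", "carol"]}
comments = {"p1": ["bob"], "p2": ["bob", "dave"], "p3": ["alice"]}
users, weights = build_weighted_network(posts, followers, comments)
```

In this toy example, the edge from `alice` to `bob` accumulates one follow relation and two comment interactions.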

B. Network Structure Characteristics
In this article, we take into account two typical and frequently-used structural metrics to support our method, namely, outdegree and betweenness centrality. These two metrics can be used to measure the scope of nodes' influence and their ability to control the community in the network.
Given a network $G = (V, E, W)$, the outdegree of a node can be formally denoted by the following equation:

$$od(u_i) = \sum_{u_j \in N} w_{i,j}, \quad r(u_i, u_j) \in E$$

where $u_i$ and $u_j$ represent two nodes in the network, $r(u_i, u_j) \in E$ represents a directed edge from $u_i$ to $u_j$, $w_{i,j} \in W$ represents the weight of the edge, and $N \subseteq V$ represents the adjacent node set of $u_i$. The outdegree of a node is mainly related to the behaviors of following and commenting. Users can follow others in whom they are interested. For an active user $u_i$, the more users who follow $u_i$, the more attractive $u_i$ is and thus the greater the ability he/she has to influence others. Users can also comment on the posts about which they are concerned. Given a post $p_j$ generated by user $u_i$, the more comments that $p_j$ gets, the wider the scope of influence of $p_j$ is. The more times that $u_i$'s posts are commented on, the greater the influence of the information generated by $u_i$.
Given three nodes $u_i$, $u_j$, and $u_k$, the control ability of $u_i$ over the communication between $u_j$ and $u_k$ is computed by the following equation:

$$b_{jk}(u_i) = \frac{g_{jk}(u_i)}{g_{jk}}$$

where $g_{jk}$ represents the total number of shortest paths between $u_j$ and $u_k$, and $g_{jk}(u_i)$ represents the total number of shortest paths between $u_j$ and $u_k$ passing through $u_i$. Note that we only consider the case in which there exists at least one path between the two nodes $u_j$ and $u_k$. We can calculate the sum of the control capability of $u_i$ with respect to all node pairs in the network and finally obtain the betweenness centrality of $u_i$ as follows:

$$bc(u_i) = \sum_{j} \sum_{k} b_{jk}(u_i)$$

where the sum runs over all node pairs $(u_j, u_k)$ that do not include $u_i$. The betweenness centrality of a node counts how often the node occurs on the straight (or shortest) paths between other nodes. That is, if a node is the only way for other nodes in the network to connect with each other, it has a more important position in the network. Given an active user $u_i$, the larger the betweenness centrality of $u_i$ is, the more important the location he/she has in the network.
As we have mentioned before, the weight of an edge represents the closeness of the relationship between the two nodes. To simplify the calculation of the distance between nodes, we first determine the maximum edge weight $w_{max}$ in the original network and then update the original weight of each edge according to (4). In this way, we obtain an updated weight set $W'$ for the network. For any node pair $u_i$ and $u_j$ in the network, we use an improved Floyd algorithm to calculate all the shortest paths and the corresponding shortest distances between the two nodes. Then, we can calculate the betweenness centrality for each node.
To avoid the impact of an excessive difference between the two metrics, we perform a maximum-minimum normalization on the two metrics as follows, such that both metrics are mapped to the interval [0, 1]:

$$od_{norm}(u_i) = \frac{od(u_i) - od_{min}}{od_{max} - od_{min}}, \qquad bc_{norm}(u_i) = \frac{bc(u_i) - bc_{min}}{bc_{max} - bc_{min}}$$

The overall network structure score $score_{network}(u_i)$ of a node $u_i$ is then computed from $od_{norm}(u_i)$, the normalized value of outdegree, and $bc_{norm}(u_i)$, the normalized value of betweenness centrality. A larger $score_{network}(u_i)$ value implies that node $u_i$ has a more important location in the network from the structural perspective.
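As an illustration of how shortest distances and betweenness centrality can be obtained together, the following Python sketch uses the plain Floyd-Warshall algorithm, then counts shortest paths and accumulates $g_{jk}(u_i) / g_{jk}$ over ordered node pairs. It is not the improved Floyd algorithm mentioned above. The function names are hypothetical, and the edge lengths passed in are assumed to be already-converted positive distances; the conversion from the original weights is not reproduced here.

```python
import math

def floyd_warshall(nodes, lengths):
    """All-pairs shortest distances; lengths[(u, v)] is a positive edge length."""
    d = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for (u, v), w in lengths.items():
        d[u][v] = min(d[u][v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def count_shortest_paths(nodes, lengths, d):
    """g[s][t]: number of distinct shortest paths from s to t."""
    g = {s: {t: 0 for t in nodes} for s in nodes}
    for s in nodes:
        g[s][s] = 1
        # Visit targets by increasing distance so predecessors are final.
        for t in sorted(nodes, key=lambda x: d[s][x]):
            if t == s or math.isinf(d[s][t]):
                continue
            for (u, v), w in lengths.items():
                if v == t and math.isclose(d[s][u] + w, d[s][t]):
                    g[s][t] += g[s][u]
    return g

def betweenness(nodes, lengths):
    """Sum of g_st(v) / g_st over ordered pairs (s, t) joined by a path."""
    d = floyd_warshall(nodes, lengths)
    g = count_shortest_paths(nodes, lengths, d)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        for t in nodes:
            if s == t or g[s][t] == 0:
                continue  # only pairs with at least one connecting path
            for v in nodes:
                if v in (s, t):
                    continue
                if math.isclose(d[s][v] + d[v][t], d[s][t]):
                    bc[v] += g[s][v] * g[v][t] / g[s][t]
    return bc

# Toy example: two equally short routes from a to d, via b and via c.
nodes = ["a", "b", "c", "d"]
lengths = {("a", "b"): 1, ("b", "d"): 1, ("a", "c"): 1, ("c", "d"): 1}
bc = betweenness(nodes, lengths)
```

Here `b` and `c` each carry half of the shortest paths from `a` to `d`, so each receives a betweenness of 0.5. The cubic cost of this approach is only practical for small networks or subcommunities.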
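The weighted outdegree and the maximum-minimum normalization can be sketched as follows. The function names are hypothetical, and mapping a constant metric to 0 is an assumption made here to avoid division by zero; the text does not say how that case is handled.

```python
def weighted_outdegree(weights):
    """Sum the weights of outgoing edges; weights[(u, v)] is the weight of u -> v."""
    od = {}
    for (u, v), w in weights.items():
        od[u] = od.get(u, 0) + w
        od.setdefault(v, 0)   # nodes without outgoing edges get outdegree 0
    return od

def min_max_normalize(values):
    """Map a dict node -> metric value onto the interval [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: a constant metric carries no ranking information.
        return {node: 0.0 for node in values}
    return {node: (v - lo) / (hi - lo) for node, v in values.items()}

# Toy example
toy_weights = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
od = weighted_outdegree(toy_weights)
od_norm = min_max_normalize(od)
```

For the toy weights, the outdegrees 4, 2, and 0 are mapped to 1.0, 0.5, and 0.0.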

C. Brand Engagement-Based Value
In the context of brand communication, considering only the network structural metrics is insufficient to discover the truly influential nodes. We should also take into account content-related metrics to measure the individual value of the nodes in an OSN. To identify influential nodes that are suitable for the communication and marketing of a specific brand, we should check whether a user is concerned about the brand. Therefore, we measure the value of nodes from the perspective of brand loyalty [46] or brand engagement [47] in addition to network structure, and we quantitatively measure the brand engagement-based value of a node. As brand engagement is directly related to users' behaviors [48] in OSNs, we mainly consider the following four behaviors:
1) Publishing: A user writes or shares posts.
2) Commenting: A user comments on the posts by others.
3) Liking: A user presses the 'like' button below a post.
4) Adding to favorites: A user adds a post to his/her favorites.
It is not difficult to quantify the above behaviors. Given a brand $b_j$ and a user $u_i$, we can obtain the number of posts related to $b_j$ that $u_i$ has actively published on his/her personal page. As a potential influential node, he/she should publish and share information related to a certain brand (product, event, etc.) frequently. Moreover, we can also obtain the percentage of positive posts related to $b_j$ published by $u_i$ that are positively commented on, liked, and added to favorites by other users. If many users respond positively to the posts, it reflects that $u_i$ is able to evoke the emotional resonance of other users or obtain their support for $b_j$. We measure brand engagement quantitatively by the following steps:
1) Mark the polarity of posts: If the post content is negative about the brand, we mark the post as negative or with '-'. Similarly, if the post content is nonnegative about the brand, we mark the post as nonnegative or with '+'.
2) Calculate the support rate of posts: A semantic analysis approach based on a sentiment dictionary is used to evaluate the opinions of other users on specific posts. We evaluate the sentiment polarity of each comment on a post and classify it as negative or nonnegative. Then we calculate the support rate of a post ($p_{support}$) from four counts: $N_{pos\_com}$, the number of nonnegative comments; $N_{neg\_com}$, the number of negative comments; $N_{favorite}$, the number of times the post is added to favorites; and $N_{like}$, the number of likes.
3) Obtain the brand engagement-based value for a user: Then, we can obtain the overall brand engagement score for node $u_i$ over the brand-related posts published by $u_i$, where $i$ indexes the $i$-th brand-related post published by $u_i$, $post^{i}_{polar}$ represents the polarity of the $i$-th post, and $p^{i}_{support}$ represents the support rate of the $i$-th post.

D. Measuring a Node's Individual Value
After evaluating each node's characteristics, we can obtain the individual value of each node as the weighted sum of the scores of each factor. We can use entropy theory to determine the weights for the two scores of a node, the so-called entropy weights, and then make a comprehensive and objective evaluation of the individual value of the node.
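A minimal Python sketch of such an entropy weighting is given below. It assumes the standard entropy weight method: each score column is normalized to proportions $f_{ij}$, the column entropy is $H_j = -\frac{1}{\ln n}\sum_i f_{ij} \ln f_{ij}$ with $f_{ij} \ln f_{ij} = 0$ when $f_{ij} = 0$, and the weight is $w_j = (1 - H_j) / (k - \sum_j H_j)$ for $k$ score columns. The function names and toy scores are hypothetical. The sketch further assumes more than one node, non-negative scores, a positive sum in every column, and at least one non-uniform column.

```python
import math

def entropy_weights(rows):
    """rows: one [score_network, score_brand] list per node (non-negative)."""
    n, k = len(rows), len(rows[0])
    m = 1.0 / math.log(n)
    entropies = []
    for j in range(k):
        col_sum = sum(r[j] for r in rows)
        h = 0.0
        for r in rows:
            f = r[j] / col_sum
            if f > 0:                      # convention: f * ln(f) = 0 when f = 0
                h -= m * f * math.log(f)
        entropies.append(h)
    denom = k - sum(entropies)
    return [(1.0 - h) / denom for h in entropies]

def individual_values(rows, weights):
    """Weighted sum of the scores of each node."""
    return [sum(s * w for s, w in zip(r, weights)) for r in rows]

# Toy scores: the first column is uniform, so it carries no information.
rows = [[1.0, 1.0], [1.0, 3.0], [1.0, 0.0]]
w = entropy_weights(rows)
values = individual_values(rows, w)
```

A column in which all nodes have the same score has maximal entropy and therefore receives a weight close to zero, so the ranking in this toy example is driven almost entirely by the second column.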
Given $n$ nodes in a network with two scores each, we can construct an $n \times 2$ matrix $R$. Each row in $R$ represents a node, each column represents a score, and item $r_{ij}$ in $R$ represents the $j$-th influence value of the $i$-th node. Let $f_{ij} = r_{ij} / \sum_{i=1}^{n} r_{ij}$ and $m = 1 / \ln n$, with $f_{ij} \ln(f_{ij}) = 0$ when $f_{ij} = 0$. The entropy value of the $j$-th influence value is defined as follows:

$$H_j = -m \sum_{i=1}^{n} f_{ij} \ln(f_{ij})$$

Then, the entropy weight of the $j$-th influence value is defined as follows:

$$w_j = \frac{1 - H_j}{2 - \sum_{j=1}^{2} H_j}$$

We can further measure the individual value of the node by the following equation:

$$value_{indv} = score_{network} \times w_1 + score_{brand} \times w_2 \tag{12}$$

As we can calculate the individual value for each node in the network, the individual values of the users can be represented as the weights of the nodes. Therefore, we can obtain a dual-weighted network model for brand communication. The corresponding formal representation for the dual-weighted model is as follows:

$$G' = (V, E, W', A)$$

where $W'$ represents the updated weight set for $E$ according to (4), and $A = \{a_1, a_2, \cdots, a_n\}$ represents the set of individual values for the nodes in $V$.
The ultimate purpose of mining influential nodes in big data is to support more intelligent brand marketing. Thus, influential nodes should have a stronger ability to disseminate marketing information for a brand. Although we have proposed to use individual value to measure the importance of each node in an OSN, we still cannot guarantee that a node with a high individual value always disseminates information efficiently. For example, suppose that $u$ is a node with a high individual value, but the individual values of the nodes around $u$ are very low. In this case, the marketing information originating from $u$ may not spread well in the network, as the information dissemination capacity of its surrounding nodes is not strong enough. In other words, although the individual value of $u$ is high, we still cannot consider it an influential node due to the low individual values of its surrounding nodes.
Therefore, when we determine whether a node is an influential node, we should consider not only the individual value of the node but also the individual values of its surrounding nodes. Nodes with high individual values can obviously affect their surrounding nodes, but this effect will decay as the distance increases [49]. Therefore, we need more relay nodes with high individual values to support more efficient information spreading or dissemination in the network [50]-[52].
To address this issue, we further make use of topological potential theory to determine influential nodes in our method. According to topological potential theory, a node is affected by the other nodes in the network. We improve the typical topological potential equation and calculate the topological potential value as follows:

$$\varphi(u_i) = \sum_{j=1}^{n} v_i \, v_j \, e^{-\left(\frac{d_{ij}}{\sigma}\right)^2} \tag{13}$$

where $d_{ij}$ denotes the shortest distance between nodes $u_i$ and $u_j$; the influence factor $\sigma$ is a parameter used to depict the influence range of each node; $v_i$ refers to the individual value of node $u_i$; $v_j$ refers to the individual value of node $u_j$; and $\varphi(u_i)$ is the topological potential value of $u_i$. The potential entropy can be calculated as follows:

$$H = -\sum_{i=1}^{n} \frac{\varphi(u_i)}{Z} \log\left(\frac{\varphi(u_i)}{Z}\right) \tag{14}$$

where $Z = \sum_{i=1}^{n} \varphi(u_i)$ is a normalization factor. If we put (13) into (14), the potential entropy $H$ is a function of $\sigma$, as illustrated in Fig. 1. According to entropy theory, when the potential entropy is maximum, the uncertainty is also maximum and the network distribution tends to be uniform. In that case, we have $\varphi(u_i) / Z = 1/n$. Therefore, we take the value of $\sigma$ at which the potential entropy is minimum in our method (see Fig. 1). According to the definition of potential entropy, we have:
1) When $\sigma \to 0^{+}$, $\varphi(u_i \to u_j) \to 0$, there will be no interaction between nodes $u_i$ and $u_j$, and we have $\varphi(i) = (m_i)^2 = M^2$. Thus, the potential entropy will approach the maximum value $\log(n)$.
2) When $\sigma \to +\infty$, $\varphi(j \to i) \to m_j$, then no matter what the distance between two nodes is, their interaction force will be the same, and we have $\varphi(i) = nM^2$. If we normalize by $Z$, the potential entropy will still approach the maximum value $\log(n)$.
Therefore, the potential entropy is a function of $\sigma$. The range of $\sigma$ is $(0, +\infty)$ and the range of potential entropy is $(0, \log(n))$. The value of potential entropy first decreases monotonically as $\sigma$ increases. After the minimum value is reached, the value of potential entropy increases monotonically as $\sigma$ increases. The potential entropy reaches its maximum value at both ends of the curve over $\sigma$.
Thus, we can further identify influential nodes in OSNs according to their topological potential values. The details of predicting influential nodes are illustrated in Algorithm I. With the algorithm, we can finally get the top n% items as the recommended influential nodes for brand communication.
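The overall ranking procedure can be sketched in Python as follows. This is an illustrative sketch, not a transcription of Algorithm I. It assumes a Gaussian-style potential of the form $\varphi(u_i) = \sum_j v_i v_j \exp(-(d_{ij}/\sigma)^2)$, which is consistent with the two limiting cases discussed above, and it picks $\sigma$ from a hypothetical candidate grid by minimizing the potential entropy. All function names, the grid, and the toy data are assumptions.

```python
import math

def topological_potential(values, dist, sigma):
    """phi[i] = sum_j v_i * v_j * exp(-(d_ij / sigma)^2)  (assumed form)."""
    n = len(values)
    phi = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if math.isinf(dist[i][j]):
                continue  # unreachable nodes contribute nothing
            total += values[i] * values[j] * math.exp(-(dist[i][j] / sigma) ** 2)
        phi.append(total)
    return phi

def potential_entropy(phi):
    """Shannon entropy of the normalized potential distribution."""
    z = sum(phi)
    return -sum((p / z) * math.log(p / z) for p in phi if p > 0)

def best_sigma(values, dist, candidates):
    """Candidate sigma with the smallest potential entropy."""
    return min(candidates,
               key=lambda s: potential_entropy(topological_potential(values, dist, s)))

def top_percent(phi, percent):
    """Indices of the top `percent`% nodes by topological potential."""
    k = max(1, math.ceil(len(phi) * percent / 100))
    return sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)[:k]

# Toy example: three nodes on a path, all with individual value 1.
values = [1.0, 1.0, 1.0]
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
sigma = best_sigma(values, dist, [0.001, 0.5, 1.0, 2.0, 1e6])
phi = topological_potential(values, dist, sigma)
influential = top_percent(phi, 30)
```

With equal individual values, very small and very large values of `sigma` both give a uniform potential distribution with entropy close to `log(3)`, while intermediate values favor the middle node, which is closest to the others.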

IV. PERFORMANCE EVALUATION AND INDUSTRIAL APPLICATIONS
In this section, we present various experiments to evaluate the performance of the proposed TPS method on a real-world dataset from SMZDM.com.

A. Experimental Setup
To evaluate the proposed TPS, we collected a real dataset from SMZDM.com to carry out the experiments. SMZDM.com is an online shopping guide website in China that also integrates product review services similar to Yelp and social network services similar to Facebook and Twitter. We implemented a Python-based crawling program to crawl brand-related content from the website automatically. The data we extracted are all related to Xiaomi, which is a well-known and typical mobile phone brand in China. We extracted the posts about Xiaomi within a period of time (until August 25, 2019). Thus, we obtained a brand communication dataset for Xiaomi from SMZDM.com to evaluate the performance of our method.
We also processed the original dataset by following the steps illustrated in Section III-A. An open-source Chinese language segmentation tool was used to process the posts from the OSN. The number of nodes in the extracted dataset is approximately 40181, and the number of edges is about 60000. Among them, the number of edges with weights greater than or equal to 2 is approximately 37812, accounting for 63% of the total number of edges in the dataset; in addition, the network density is $3.72 \times 10^{-5}$. The noise-reduced dataset has 15895 nodes and 37812 edges in total.

B. Network Characteristics Analysis
We first divide the dataset into several subcommunities and verify the scale-free and small-world properties of these subcommunities. We used the Gephi software to generate an interaction network diagram for the brand communication dataset, as shown in Fig. 2. There exist many subcommunities in this network. We use the modularity function of Gephi to divide the subcommunities. By setting the three parameters Randomize, Use edge weights, and Resolution in the software, we find that the modularity of the network and the modularity with resolution are both 0.757, and the number of subcommunities is 1155 (see Fig. 3). As shown in Fig. 3, the number of nodes in most communities is too small; therefore, we only analyze the eight largest subcommunities. As illustrated in Table I, the sum of the internal degree of each community is much larger than the sum of the external degree.
We further analyze the small-world property of the network for brand communication from an empirical perspective. Table II shows the network statistical properties of the eight largest subcommunities; the maximum value of the average path length in the eight subcommunities is 2.5, which means that, on average, one node can reach any other node within 2.5 hops in a subcommunity. We also obtain the clustering coefficients $C \in (0.008, 0.038)$ for the eight subcommunities. In contrast, the clustering coefficients $C_{rand}$ of random networks at the same scale are relatively small. Therefore, it can be concluded that the eight subcommunities demonstrate the characteristics of a small world, and the information in a subcommunity can be quickly spread to each part of the subcommunity.
The scale-free characteristics of the network are also analyzed through experiments. Fig. 4 and Fig. 5 show the Complementary Cumulative Distribution Function (CCDF) graphs of node indegree and node outdegree, respectively, for the eight subcommunities. By performing a least squares fit on the node set, we obtain the fitted curve given by (15). According to (15), we have the power-law exponent a > 0 for the indegree and outdegree distributions of the eight subcommunities (see Table III), which indicates that there are fewer nodes with a larger degree and more nodes with a smaller degree. This is consistent with the scale-free feature of social networks. In other words, only a few members participate deeply in the network for brand communication, and they are the ones who promote the development of the brand community. The statistical results show a correlation coefficient g < 0 for the eight subcommunities; that is, the nodes with higher degrees are mostly connected with the nodes with lower degrees. In other words, in the process of information spreading, the information tends to flow from influential nodes to common nodes in the network.
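A least squares fit of a degree CCDF can be sketched as follows. The sketch assumes a power-law curve of the form $y = c \, x^{-a}$, fitted by ordinary least squares in log-log space; the exact form of the fitted expression used in the study is not reproduced here, and the function names are hypothetical.

```python
import math

def ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for the positive degrees."""
    n = len(degrees)
    points = []
    for k in sorted(set(deg for deg in degrees if deg > 0)):
        points.append((k, sum(1 for deg in degrees if deg >= k) / n))
    return points

def fit_power_law(points):
    """Least squares fit of log y = log c - a * log x; returns (a, c)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)

# Synthetic points that follow y = 2 * x^(-1.5) exactly
synthetic = [(x, 2.0 * x ** -1.5) for x in (1, 2, 4, 8)]
a, c = fit_power_law(synthetic)

# CCDF of a small hypothetical degree sequence
toy_ccdf = ccdf([1, 1, 1, 2, 2, 4])
```

On the synthetic points, the fit recovers the exponent 1.5 and the constant 2 up to floating-point error. A positive exponent `a` corresponds to a decreasing CCDF, that is, few nodes with large degree.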

C. Influential Node Identification
By using the proposed method, we selected the top 20 nodes from the candidate set as the influential nodes, as shown in Table IV. We further divide the top 20 nodes into two groups. The first group of nodes has high individual values. According to 13, the nodes with high individual values are more likely to be identified as influential nodes. For example, it can be seen from Table IV that nodes 9339612697 and 6390492327 have the highest topological potential values among the 20 nodes. Their brand engagement scores are also larger than those of other nodes. This means that they have published many posts related to the Xiaomi brand, which are supported by many other users in the network. The second group of nodes does not have high individual values, and some of them even have a low individual value. After investigating these nodes further, we find that they have published few posts about the brand but have often commented on brand-related content. For example, the brand engagement score of 6195251507 is 0, which means that the user has not published any brand-related content or that the content has not received any positive comments. These kinds of nodes are usually ignored by the existing methods and thus will not be identified as influential nodes. Although these nodes rarely publish brand-related content directly, they are very concerned about the brand, and their comments can also be an important part of brand marketing in OSNs.
We have also identified the top 20 nodes by using two other metrics separately instead of the topological potential values (see Table V). The influential nodes identified by using network structure scores and individual values are quite similar, as the first and second columns of Table V have 14 nodes in common. Moreover, we can see that the influential nodes identified by using their topological potential values are quite different from those identified by purely using network structure scores or individual values. The first and third columns have 6 nodes in common, while the second and third columns have 8 nodes in common. It makes sense that a node with a high individual value is more likely to be identified as an influential node. However, it is insufficient to consider the individual value of a single node alone. The proposed method also considers the individual values of the surrounding nodes by using the topological potential model and can thus obtain a more accurate result than using pure individual values.
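To make the role of the surrounding nodes concrete, the following Python sketch evaluates a generic topological potential with Gaussian decay over hop distance, phi(v) = sum_u m(u) * exp(-(d(v,u)/sigma)^2). This is the commonly used textbook form and is stated here as an assumption; the improved scheme of this article (edge weights, direction, and the choice of the impact factor) may differ. The names topological_potential, adj, mass, sigma and max_hops are hypothetical.

```python
import math
from collections import deque


def topological_potential(adj, mass, sigma=1.0, max_hops=2):
    """Generic Gaussian-decay topological potential (an assumed form).

    adj:  dict mapping each node to an iterable of neighbor nodes
    mass: dict mapping each node to its individual value
    """
    potential = {}
    for source in adj:
        # Breadth-first search gives hop distances up to max_hops.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue
            for nbr in adj.get(node, ()):
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        # Each reachable node contributes its mass, damped by distance.
        potential[source] = sum(
            mass.get(u, 0.0) * math.exp(-((d / sigma) ** 2))
            for u, d in dist.items()
        )
    return potential


# Toy path graph a - b - c, where node "a" has an individual value of 0.
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_mass = {"a": 0.0, "b": 1.0, "c": 1.0}
toy_potential = topological_potential(toy_adj, toy_mass)
```

In this toy graph, node "a" has an individual value of 0 but still receives a positive potential from its neighbors, which mirrors how a node with a low individual value can rank among the influential nodes when it is surrounded by valuable nodes.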

D. Performance Evaluation
We also compare the performance of the proposed TPS with three existing methods for measuring node importance, namely, Weighted PageRank [53], Weighted HITS [54] and IMUD [2]. The top 20 influential nodes identified by the four different methods are listed in Table VI. The result sets of Weighted PageRank, Weighted HITS and IMUD have 10 nodes in common. The result sets of Weighted PageRank and IMUD each share only 6 influential nodes with that of TPS. The result set of Weighted HITS shares 8 influential nodes with that of TPS, which is slightly more than for Weighted PageRank and IMUD.
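The overlap counts reported above can be obtained with a simple set intersection over the top-k lists. The following Python sketch shows one way to do this; the function name pairwise_overlap and the node identifiers in the example are hypothetical placeholders, not data from Table VI.

```python
from itertools import combinations


def pairwise_overlap(rankings, k=20):
    """Count shared nodes among the top-k lists of every pair of methods.

    rankings: dict mapping a method name to its ranked list of node IDs
    """
    tops = {name: set(ids[:k]) for name, ids in rankings.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(tops, 2)
    }


# Placeholder rankings with made-up node IDs, for illustration only.
example_rankings = {
    "TPS": ["u1", "u2", "u3", "u4"],
    "Weighted PageRank": ["u3", "u5", "u1", "u6"],
    "Weighted HITS": ["u7", "u3", "u8", "u9"],
}
example_overlap = pairwise_overlap(example_rankings, k=3)
```

With k=3, the placeholder lists of TPS and Weighted PageRank share two nodes (u1 and u3), while each of the other two pairs shares one node (u3).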
We have also checked the top 20 influential nodes identified by the other three methods and found that few of them had published or shared much content about mobile phones or the Xiaomi brand. For example, the first influential user identified by Weighted PageRank, Weighted HITS and IMUD is the same user, 4077360552, while the first one identified by TPS is 9339612697. Although both users have published many posts about mobile phones, the p support value of 9339612697 is much larger than that of 4077360552. The posts of 9339612697 receive more positive comments than those of 4077360552. Moreover, 9339612697 has also written more comments on others' posts than 4077360552. Therefore, 9339612697 is more suitable for promoting a mobile phone brand like Xiaomi on social media. In addition, 9339612697 is the third influential user identified by Weighted PageRank, and the second one identified by Weighted HITS and IMUD. This also indicates that the influential users identified by TPS are reliable. Both Weighted PageRank and Weighted HITS only address the relationships between nodes and do not take into account the content features of users' posts. Therefore, most influential nodes identified by simply using either Weighted PageRank or Weighted HITS are not very valuable for brand communication. Although IMUD takes into account content-related features such as the topics and the related messages exchanged by users, it neglects factors such as sentiment and the impact of surrounding nodes.
According to our investigation, there are no widely accepted metrics for evaluating the performance of influential node mining. In this article, we use the ratio of verified users and the ratio of user coverage to evaluate the performance. The ratio of verified users refers to the proportion of verified users among the collection of influential users. The ratio of user coverage refers to the proportion of the users that can be covered or affected by the top n% influential nodes among the complete set of users. As seen from Table VII, the ratio of verified users of TPS is much higher than that of the other three methods. Using the proposed TPS, 1461 out of 2000 influential users are verified. The comparison of the user coverage ratio is illustrated in Fig. 6. The curves of the three methods begin to flatten when n ≥ 1. Therefore, if the top 1% of the influential nodes identified by the four methods are considered separately, the proposed method can directly cover more than 60% of users in the network. However, we can see that TPS can cover more users than the other three methods when n ≥ 1. Additionally, it can be seen that the proposed method can cover almost 100% of users in the sample set when n ≥ 40, while IMUD, Weighted PageRank and Weighted HITS can only cover 89.1%, 86.4% and 86.6%, respectively, with the same n. Therefore, we can see that TPS performs better than the other three methods from the perspectives of both the ratio of verified users and the ratio of user coverage.
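The two evaluation metrics can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a user is counted as covered if the user is one of the top n% nodes or a direct follower of one of them, and the top n% is rounded up to at least one node. The original evaluation may define coverage differently. The names verified_ratio, coverage_ratio, ranked and followers are hypothetical.

```python
import math


def verified_ratio(influential, verified):
    """Proportion of verified users among the influential users."""
    return sum(1 for u in influential if u in verified) / len(influential)


def coverage_ratio(ranked, followers, n_percent):
    """Share of all users covered by the top n% of the ranking.

    ranked:    list of all users, most influential first
    followers: dict mapping a user to the users directly reached by it
    Assumption: the seed users themselves count as covered.
    """
    all_users = set(ranked)
    k = max(1, math.ceil(len(ranked) * n_percent / 100))
    covered = set(ranked[:k])
    for u in ranked[:k]:
        covered.update(followers.get(u, ()))
    return len(covered & all_users) / len(all_users)


# Toy data with five users, for illustration only.
toy_ranked = ["a", "b", "c", "d", "e"]
toy_followers = {"a": ["b", "c"], "b": ["d"]}
toy_cov_20 = coverage_ratio(toy_ranked, toy_followers, 20)
toy_cov_40 = coverage_ratio(toy_ranked, toy_followers, 40)
```

For the toy data, the top 20% (one node) covers 3 of 5 users and the top 40% (two nodes) covers 4 of 5 users. Applied to the figure reported above, 1461 verified users out of 2000 corresponds to a ratio of 1461/2000 = 0.7305.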

E. Industrial Applications
With the popularity of OSNs in our daily life, mining and discovering key opinion leaders or influential nodes from large-scale social networks has become a research hotspot. Currently, an increasing number of companies tend to promote their brands or products (especially newly released ones) through social media instead of traditional media. The method proposed in this article can be applied to support intelligent brand communication or marketing in real-life industrial applications.
Traditionally, when carrying out social media marketing (or viral marketing), companies' marketing information is pushed to a group of consumers in OSNs. This group of consumers is usually selected by the platform. Although some of them may forward the material to others when they receive marketing information, the effect of information spreading is limited by their own limited influence on OSNs. Therefore, companies want to choose a set of customers to market to that will maximize their net profits (profits from sales minus the costs of marketing).
TABLE IV: TOP 20 INFLUENTIAL NODES FOR BRAND COMMUNICATION
TABLE V: TOP 20 INFLUENTIAL NODES BY USING DIFFERENT METRICS
The metrics and algorithm proposed in this article can be used to mine and identify influential nodes or users in OSNs (see Fig. 7). Once a collection of influential users has been mined from OSNs, these users are treated as seed users. What companies need to do next is to establish trust relationships with these influential users and engage them to promote their brands or products spontaneously on social media. In social media or online communities, eWOM can be generated by influential users after they have positive consumption experiences. As an influential user can affect a number of common consumers, marketing information can spread quickly through social networks. In this way, companies can promote their brands or products on social media at a lower cost and with better effect. Moreover, companies can even perform personalized recommendations on OSNs through influential users.
For consumer-oriented industries [55], companies increasingly rely on social media to promote their brands and products. The technology presented in this article is able to mine influential users from large-scale OSNs, and companies can then improve their marketing strategies with the help of those influential users.

V. CONCLUSION AND FUTURE WORK
In this article, we mainly address the problem of predicting influential nodes from OSNs for brand communication. We quantitatively measure the individual value of nodes by considering both the network structure and content-related factors. Moreover, an improved topological potential scheme is proposed for predicting influential nodes in OSNs. In the process of mining influential nodes from OSNs, network structure, brand engagement, and topological potential are combined in our method to overcome the limitations of the existing methods. The computational results suggest that the proposed method is able to predict influential nodes for brand communication in OSNs. We can identify the nodes that have published few posts about the brand but have commented extensively on brand-related content, which are usually ignored by the existing methods. Moreover, compared with three existing methods, the proposed method obtains identification results with a higher ratio of verified users and a higher ratio of user coverage.
We also consider some possible future directions of this study. For example, we only used the follow and comment relationships between users to model the weighted network. In fact, there exist deeper or latent relationships among users, which can be discovered by using more complex mining algorithms. Therefore, in the future, we can build a more complex network model for predicting influential nodes. Additionally, we only investigated the characteristics of users in OSNs statically and have not considered how they change over time. If we take the time factor into account and study the time-dependent trend of user behaviors in OSNs, we can obtain more characteristic information about influential nodes.