A Random Walk Approach for Avoiding Unwanted Users in Competitive Social Network

How to detect influential customers who can broadcast advertisements to the widest possible audience is a key issue in effective viral marketing. If multiple companies sell the same product in viral marketing campaigns, there is competition between them. There may exist some unwanted users who hold hostile opinions, and these users may have a negative effect upon receiving promotional information. The company does not want the advertising information to reach such unwanted users over a period of time. In such competitive advertising, how to propagate the positive influence of a product while preventing it from reaching the unwanted users of competitors within a limited time is a critical problem for business product promotion. Motivated by this phenomenon, we study the influence maximization problem with limited unwanted users (IML) in the independent cascade (IC) model. To accelerate the simulation of influence propagation, we present a path sampling approach based on the random walk. To avoid the unwanted users, only the influences on the paths reaching the wanted users are calculated at each time step, and the influences reaching the unwanted users are ignored. To reduce the computation time, the paths of the random walk are recorded to avoid repeated random walks in the subsequent seed selection. To find the most influential customers, we employ a greedy scheme to select the top-$k$ most influential nodes as seed nodes. Experimental results on real-world datasets show that the presented algorithm achieves wider effective influence spreading than other algorithms.


I. INTRODUCTION
Recently, with the booming development of social networks, network analysis [1] has become a research hotspot in many areas, such as bioinformatics, political elections, viral marketing, economics, public opinion monitoring, network security, and epidemic spreading control. For example, a merchant on Amazon may promote a new product through a trade network by providing discounts to some influential customers in the network to maximize the influence of the product. The goal of the merchant is to use those influential customers to influence as many of their direct and indirect friends as possible. After this cascade propagation process, a large number of customers begin to adopt and buy this product. This marketing method is called viral marketing. As another example, to prevent the spreading of an epidemic disease such as COVID-19, we must analyze its spreading model in the social network, such as the SIS model or the SIR model. In the SIR model, a patient has lifelong immunity after being cured. In the SIS model, a person can get sick again after recovering. By network analysis, we can find the key paths of virus propagation. By blocking such key paths, we can minimize the scope of virus spreading.
The associate editor coordinating the review of this manuscript and approving it for publication was Justin Zhang.
In viral marketing, a business usually promotes its products by advertising through social or trade networks. In such a marketing strategy, merchants often select some influential customers and use their influence to promote the products in the social network. Through the ''word of mouth'' effect, sales can be increased as advertisements propagate between customers through the social and trade networks. People usually decide to buy a product when they are frequently exposed to positive information about it from their friends or relatives in the network. In particular, if a customer buys a product, she may explicitly or implicitly influence her friends to purchase the same product, and her friends will in turn influence their own friends, and so forth. This influence spreading process forms a cascade effect throughout the whole social network. Therefore, a small number of initial influential customers can trigger large-scale information diffusion in the network. Based on this phenomenon, a seller on Amazon can sell a new commodity via a distribution network by offering discounts to several active customers in the hope that they can persuade more customers to buy the commodity. We call such influential customers the seeds of influence. Once the overall discount is determined by the business based on its budget, it should select the influential customers so as to maximize the number of final buyers.
The problem of how to detect such influential customers to effectively broadcast advertisements to the maximum range is the key issue in marketing strategy design. Recently, it has been extensively studied, and the technique of influence maximization (IM) has been widely applied to detect the influential customers in viral marketing [2]-[7].
A commonly used method for influence maximization is based on the centrality measures of the nodes in the network. Assuming that the nodes with larger centrality are probably the influential ones, the approach identifies the seed nodes according to the centralities of the nodes. There are several classical centrality measures [8], such as closeness centrality, degree centrality, betweenness centrality, sub-graph centrality, eigenvector centrality, local average connectivity, information centrality, and neighborhood centrality.
Domingos and Richardson [9] treated influence spreading as a Markov random field and proposed a heuristic method for maximizing the influence. Kempe et al. [10] observed that the IM problem was NP-hard and proposed a greedy framework that yields $(1-1/e)$-approximate solutions.
In viral marketing, it often happens that one product is promoted by different companies at the same time. In such competitive sales, there will be opposite views on the same commodity. Consumers purchase the same product sold by different companies, and they may hold different opinions about it. Such competing influence spreading mainly comes from the rival companies participating in marketing campaigns to attract customers' attention. For instance, in the mobile phone market, there are two competing companies, Apple and Samsung. Suppose Apple wants to promote iPhone XS on some distribution networks and attempts to conceal its marketing strategy from the members of Samsung for a given period. The advertising information will eventually be received by the members of Samsung. Thus, Apple must spread the advertising content to as many of its customers as possible in a short period of time. The customers whom we do not wish to influence are referred to as unwanted users. Usually, unwanted users have opposite viewpoints on the product, and they may cause negative consequences when they receive the advertising information. To sell a new commodity to a large portion of the customers, the company might initially activate several influential customers by offering them discounts so that it can maximize the number of final buyers and avoid the influence being propagated to the unwanted users from its opponents. Therefore, it is necessary to design an efficient method for this problem. Unfortunately, most existing research on influence maximization only considers social networks with positive influence and without unwanted users.
According to the reality of trade networks and viral marketing practices, we focus on two commonly used models for influence propagation: the Independent Cascade (IC) model and the Linear Threshold (LT) model.
In this work, we study the problem of detecting influential customers in competitive viral marketing with unwanted users in the IC model. We define this issue as influence maximization in networks with limited unwanted users (IML). In IML, the overall influence spreading of a given seed set is the number of buyers influenced by the seeds on the premise of restricting the influence reaching the unwanted users. An algorithm IMUU_RW (Influence Maximization with Unwanted Users based on Random Walk) is proposed in this paper for solving the problem. To avoid the time-consuming task of emulating the influence spreading in greedy methods, we present a random walk-based algorithm to simulate the influence propagation. To reduce the computation time, the paths of the random walk are recorded and reused. Empirical results on real network datasets demonstrate that IMUU_RW can obtain higher quality results than the state-of-the-art methods in much less time. The main contributions of our work are summarized as follows:
1. We model the issue of detecting influential customers for competitive viral marketing as influence maximization with limited unwanted users (IML) in the IC model.
2. We propose a path sampling algorithm using a random walk to simulate the influence propagation. To reduce the calculation, we record the paths of the random walk so that they can be used in subsequent seed selection. To avoid the unwanted users, only the influences on the paths reaching the wanted users are calculated at each time step, and the influences reaching the unwanted users are ignored.
3. We present an algorithm IMUU_RW for finding the $k$ most influential customers in competitive viral marketing with unwanted users.
4. We propose a pruning technique for the seed node selection which can greatly reduce the computation time.
5. Extensive experiments have been conducted on real network datasets. Empirical results demonstrate that IMUU_RW can obtain wider positive influence propagation in less time than the state-of-the-art methods.
The remainder of this paper is structured as follows. Related work on influential customer selection is reviewed in Section II. In Section III, we model the problem of detecting influential customers for competitive viral marketing as influence maximization with limited unwanted users in the IC model. Section IV presents a random walk-based path sampling approach to estimate the propagation increment. Section V gives the framework of the IMUU_RW algorithm. Section VI illustrates and analyzes the experimental results of IMUU_RW on real networks. Section VII gives the conclusions and indicates further research work.

II. RELATED WORK
A key issue in viral marketing is how to detect influential customers to maximize the reach of advertising. This issue can be modeled as the problem of influence maximization (IM) in social networks. Recently, IM has been applied in viral marketing strategy design, several propagation models have been advanced, and methods have been designed to solve the various problems in viral marketing.
In viral marketing campaigns, if two retailers or merchants sell the same product, there is competition between them. In such competitive marketing, there may exist two opposite influences in the trade network. The goal of a business is to spread the influence of its products to the maximal range of its customers and to restrict the impact of its competitors. This is essentially the issue of influence maximization in competitive networks. Recently, some approaches have been proposed to tackle this issue. Carnes et al. [11] and Bharathi et al. [12] first studied competitive influence spreading in networks. They showed that influence maximization in a competitive network with prior knowledge of the competitor was NP-hard and sub-modular. Therefore, they proposed several hill-climbing algorithms for competitive influence diffusion with known opponent strategies. Borodin et al. [13] proposed several versions of the linear threshold model for competitive scenarios. However, for a large number of competitive influence models, they showed that the propagation function in the proposed models is non-submodular and thus cannot be solved by the greedy approach. They also proved that calculating the spreading function was NP-hard under these models. Afterward, a long stream of techniques for variants of competitive IM has been proposed. Wang et al. [14] presented an influence spreading model that uses fluid dynamics theory to describe the evolution of influence spreading. They also designed the Fluid spread greedy approach to tackle the issue of maximizing useful influence spreading.
In competitive marketing, there may exist different points of view on a product, which may lead to two opposite influences in the trade market. To block the negative influence, the merchant tries to select influential users who can propagate the positive influence to offset the negative influence to a great extent. Budak et al. [15] and He et al. [16] modeled this issue as the problem of influence blocking maximization (IBM), which focuses on selecting positive seed nodes to reduce the negative influence spreading, or to prevent as many users as possible from being affected by negative influence. They showed that IBM was NP-hard and that the spreading function was sub-modular. They provided greedy-based approaches for this issue in the IC and LT models. Subsequently, Wu and Pan [17] designed two efficient heuristic algorithms, CMIA-H and CMIA-O, using the maximum influence arborescence approach to effectively solve the issue of IBM in different competitive independent cascade spreading models. Their experimental results demonstrated that their approaches can achieve the same influence blocking effect as greedy methods in much less time.
In real-world competitive marketing, it is impractical to assume that the rival's strategy is known in advance. For a company, how to maximize its influence without knowing its competitors' market strategy is a key issue in competitive marketing. To solve this problem, Li et al. [18] employed game theory for solving influence maximization in competitive networks. Ou et al. [19] also investigated the issue of comparative influence maximization. They proposed the interacting influence maximization game to solve the problem by broadening the existing model of the competitive influence maximization game. They also proposed a heuristic method, TOPBOSS, for the second mover to beat the first mover given knowledge of the first mover's strategy in the comparative environment. Ben-Ishay et al. [20] proposed a strategy game called ''Spread-It'', which simulates the process of spreading information through social network structures. Their experimental analysis showed that, among all heuristic methods, the Monte Carlo Tree Search (MCTS) method achieved relatively good results while satisfying the budget required for the game.
In competitive marketing, one product may be promoted by more than two companies, which leads to a multi-party competition. Yu et al. [21] modeled this issue as the IM problem in multi-influence competition and studied how to maximize a company's influence spreading according to its budget. They also designed a greedy method, MG, for solving the problem. Yang et al. [22] defined this issue as relative influence maximization (RIM) under the competitive IC model by extending the classical IC model. They adopted a greedy approach named GreedyRIM to solve the problem by utilizing the monotonicity and sub-modularity of the spreading function. Lu et al. [23] used the CIC model to depict the propagation of multiple influences with competition. Under this model, they addressed two problems named SELFINF-MAX (Self Influence Maximization) and COMPINF-MAX (Complementary Influence Maximization). They showed that those problems were NP-hard and thus designed approximation approaches to tackle them. In [24], Lin et al. presented a machine learning-based method for solving the multi-round multi-party competitive influence maximization problem.
Time is a key factor in viral marketing. For example, if the expiration date for a certain food is September 1, marketers want to sell the food to as many customers as possible before September 1. Furthermore, the influence between customers might be time-dependent. For instance, some customers might propagate the information of a product to their friends and relatives only after a certain amount of time. Therefore, the influential customers selected might not propagate the advertising immediately. Chen et al. [25] and Shi et al. [26] modeled this problem as maximizing the influence spread within a limited time. Chen et al. [25] showed that this problem under the time-delayed IC model was sub-modular and could be solved by a greedy strategy. To avoid the inefficiency of the greedy method, they proposed two heuristic methods for solving the problem efficiently. Shi et al. [26] showed that detecting the seed set that achieves the maximal positive influence within a time limit was NP-hard, and advanced a greedy-based heuristic approach for the problem. Zhang et al. [27] studied the problem of maximizing influence for influential customers in a dynamic competitive environment. They presented a model named DICE (Dynamic Influence in Competitive Environments) and proved that detecting the candidate seed with the largest utility was NP-hard under DICE. They provided a solution with approximation ratio $1/3 - \varepsilon/n$ for the problem.
In competitive marketing, a merchant may have some target customers who are reliable and have large purchasing power. In order to promote a product, the merchant wants to influence as many of these target customers as possible to buy it. Guo et al. [28] modeled the problem as maximizing the influence propagation to a given group of users. They proved that this problem was NP-hard and showed the submodularity of the objective function. Guo et al. also presented a scalable algorithm with a guaranteed approximation ratio, so that it can be applied to larger networks.
However, in the research works for IM in competitive networks, few works investigate the problem of influence maximization which limits influence spreading to unwanted users. It is necessary to design an efficient method to tackle this issue. In this work, we present a highly efficient method to spread the effective influence on maximal number of wanted users while limiting the influence propagating to the unwanted ones.

III. DEFINITION OF THE PROBLEM AND THE BASIC IDEA
To detect the influential customers in competitive viral marketing with unwanted users, in this section, we model this issue as influence maximization with limited unwanted users (IML). Then we describe the propagation model and the influence propagation function under the model. Major symbols appearing in this section are shown in Table 1.

A. PROBLEM DEFINITION
The trade network of viral marketing can be modeled by a directed graph $G = (V, E, P)$. Here, $V$ and $E$ respectively represent the set of customers and the set of directed edges in $G$. Each directed edge $(u, v)$ is attached with a real number $p(u, v) \in (0, 1)$ which indicates the probability of influence propagation from customer $u$ to $v$. We define two states for each customer: inactive and active. The inactive state indicates that the corresponding customer has not been influenced, namely, it has not received the information of the product promoted. The active state indicates that the customer has been influenced and is ready to buy the product.
In competitive viral marketing, a business wants to propagate the positive influence of its product to reach a large portion of the customers and to limit the influence on some unwanted users. Such unwanted users may have an adverse record in transactions, or they could produce a negative effect on the marketing campaign. Under such circumstances, the disseminators try to detect a set of influential customers, which are called seeds, to propagate the advertising to as many customers as possible except the unwanted users.
Definition 1 (Spreading of a Seed Set): Denote the set of customers influenced by the seed set $S$ as $I(S)$. The influence spreading of $S$, denoted as $|I(S)|$, is the expected number of nodes influenced by $S$.

Definition 2 (Effective Spreading of a Seed Set): Let $T \subset V$ denote the set of unwanted users. For a seed set $S$, its effective spreading $\delta(S)$ is the influence spreading excluding the unwanted users in $T$:

$$\delta(S) = |I(S) \setminus T| - |I(S) \cap T|. \tag{1}$$

We define the problem of detecting influential customers in competitive viral marketing with unwanted users as influence maximization with limited unwanted users (IML).

Definition 3 (IML Problem): In a directed graph $G = (V, E, P)$, given a set of unwanted users $T \subset V$ and a seed set size $k$, the problem of IML is to find a seed set $S^*$ with the largest effective spreading:

$$S^* = \arg\max_{S \subseteq V,\ |S| = k} \delta(S). \tag{2}$$

From the NP-hardness of the IM problem, it can easily be seen that IML is also NP-hard. Thus, we present a random walk-based greedy algorithm for the problem.
VOLUME 8, 2020

B. PROPAGATION INCREMENT
In this work, we adopt the Independent Cascade (IC) model [10] to emulate the information spreading in viral marketing. In this model, a node stays in an active or inactive state at every moment of the influence spreading. At the beginning of influence spreading, all the seed nodes are in the active state, and all the other nodes are inactive. After a node $v$ is activated, it tries to activate each inactive neighbor $w$ exactly once, with probability $p_{vw}$. The activation result of node $v$ on its neighbor $w$ does not depend on the results for the other neighbors of $v$, namely, the activations of $v$ on its neighbors are mutually independent. The propagation process ends when no more nodes are activated.
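The IC process described above can be sketched as a short Monte-Carlo simulation. This is only an illustrative sketch, not the algorithm proposed in this paper: the adjacency format (a dict mapping each node to a dict of neighbor probabilities) and the names `simulate_ic` and `estimate_spread` are assumptions introduced for the example.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one IC diffusion and return the set of active nodes.

    graph: dict mapping node v -> {neighbor w: p_vw} (assumed format).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p_vw in graph.get(v, {}).items():
                # A newly activated node gets exactly one chance per inactive neighbor.
                if w not in active and rng.random() < p_vw:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier  # stops when no node was activated in this round
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the number of active nodes over repeated simulations."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs
```

Repeating such a simulation for every candidate seed in every greedy step is what makes the classical approach expensive, which motivates the path sampling approach of Section IV.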
Under this model, the influence propagation function of $S$ can be calculated by:

$$|I(S)| = \sum_{v \in V} F_V(v, S), \tag{3}$$

where $F_V(v, S)$ is the probability that the seed set $S$ activates $v$ via the paths consisting of nodes in set $V$. We exploit the greedy approach to detect the seed set consisting of the most influential customers. The approach first initializes $S$ as an empty set. In each time step, the node $x \in V \setminus S$ that maximizes the influence propagation increment $\Delta(x)$ is selected as the new seed and added to the set $S$. This process is repeated until the size of $S$ is equal to $k$. We define the influence propagation increment of each candidate node according to the effective spreading function.

Definition 4 (Propagation Increment): Let $T$ be the set of unwanted users, $S$ be the current seed set, and $x \in V \setminus S$ be a candidate seed. The influence increment $\Delta(x)$ of $x$ is defined as:

$$\Delta(x) = \delta(S \cup \{x\}) - \delta(S). \tag{4}$$

From the definition, we can see that $\Delta(x)$ is the growth of effective spreading when $x$ joins the seed set $S$.
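The greedy selection driven by the propagation increment can be illustrated with a small sketch. It makes a simplifying assumption that is not part of the paper: the influenced set $I(S)$ is given by a deterministic, hypothetical callable `spread_of` rather than by an expectation under the IC model. The function names are likewise illustrative.

```python
def effective_spread(influenced, unwanted):
    """delta(S) = |I(S) \\ T| - |I(S) & T| for a given influenced set."""
    return len(influenced - unwanted) - len(influenced & unwanted)


def greedy_seeds(candidates, k, spread_of, unwanted):
    """Pick k seeds, each time adding the candidate with the largest increment."""
    seeds = []
    current = effective_spread(spread_of(seeds), unwanted)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for x in candidates:
            if x in seeds or x in unwanted:
                continue  # unwanted users are never chosen as seeds
            gain = effective_spread(spread_of(seeds + [x]), unwanted) - current
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break
        seeds.append(best)
        current += best_gain
    return seeds


# Toy example: each node's (hypothetical) influenced set; "t" is unwanted.
reach = {"a": {"a", "b", "t"}, "c": {"c", "d"}, "e": {"e"}}
spread_of = lambda S: set().union(*(reach[s] for s in S))
print(greedy_seeds(["a", "c", "e", "t"], 2, spread_of, {"t"}))  # ['c', 'a']
```

In the toy example, node `a` reaches two wanted users but also the unwanted user `t`, so its first-round increment is 1, and node `c` with increment 2 is selected first.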

C. THE BASIC IDEA
Due to the exclusion of unwanted users in the influenced scope, the problem of influence maximization with unwanted users is quite different from classical IM problem. In the classical problem of influence maximization, we usually only consider the influence propagating to the whole network.
In estimating the influence scope of a seed set, we only consider simulating the spreading on the paths which can reach the most nodes. But in the problem of influence maximization with unwanted users, since the unwanted users should be avoided, we must simulate the spreading on the paths which do not lead to the unwanted users. However, it is difficult to detect such paths in advance.
To tackle this difficulty, we propose a new method for estimating the influence scope of a seed set. In classical influence maximization, the estimation is based on the spreading path starting from the seed nodes, which are the sources of influence propagation. In our method, to avoid the unwanted users, the spreading is estimated from the aspect of the influenced nodes, which are the destinations of influence spreading. In this way, only the influences on the paths reaching the wanted users are calculated at each time step, and the influences reaching the unwanted users are ignored.
To reach our goal of maximizing the influence spreading excluding the unwanted users, $\delta(S) = |I(S) \setminus T| - |I(S) \cap T|$, we use the greedy approach to sequentially select the node $x \in V \setminus S$ that is not an unwanted user and maximizes the influence propagation increment $\Delta(x)$, and add it to the set $S$ as the new seed. From (4), we can see that computing $\Delta(x)$ requires the influence spreading $I(S)$ of the seed set $S$. To this end, we use a random walk to simulate the propagation of influence among the nodes excluding the unwanted users.

IV. THE PATH SAMPLING APPROACH BASED ON RANDOM WALK

A. CALCULATE THE PROPAGATION INCREMENT
In the traditional greedy methods for influence maximization, the influence propagation must be simulated by the Monte-Carlo method for each candidate seed $x$ in $V \setminus S$ in each time step. However, the calculation of the propagation increment is a #P-hard problem, which requires a huge amount of computation time. Therefore, we estimate $\Delta(x)$ efficiently using a random walk to simulate the propagation of influence in the IC model. First, we give the following theorems.
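Theorem 1: Let $S$ be the current seed set, and $G' = (V \setminus S, E')$ be the subgraph of $G$ induced by $V \setminus S$. For a candidate node $x \in V \setminus S$ and a node $v \in V \setminus S$,

$$F_V(v, S \cup \{x\}) - F_V(v, S) = (1 - F_V(x, S))\, F_{V \setminus S}(v, \{x\}).$$

Here, $F_{V \setminus S}(v, \{x\})$ denotes the probability for seed set $\{x\}$ activating node $v$ in the subgraph $G' = (V \setminus S, E')$.

Proof: According to the definition of $F_V(v, S)$, the probability $F_V(v, S \cup \{x\})$ can be expressed in terms of $F_{V \setminus \{x\}}(v, S)$, which denotes the probability for seed set $S$ activating node $v$ in the subgraph where $V \setminus \{x\}$ is the set of vertices. The statement of the theorem then follows.

Based on Theorem 1, the influence increment $\Delta(x)$ of the influence spreading on the nodes excluding the unwanted ones can be calculated according to the following theorem:

Theorem 2: Let $S$ be the current seed set and $T$ be the set of unwanted users. For a candidate node $x$ in $V \setminus S$, $\Delta(x)$ can be calculated as:

$$\Delta(x) = (1 - F_V(x, S)) \Big( \sum_{v \in V \setminus T} F_{V \setminus S}(v, \{x\}) - \sum_{v \in T} F_{V \setminus S}(v, \{x\}) \Big).$$

Proof: According to the definition of $\Delta(x)$ together with (1) and (3), we have:

$$\Delta(x) = \delta(S \cup \{x\}) - \delta(S) = \sum_{v \in V \setminus T} \big[ F_V(v, S \cup \{x\}) - F_V(v, S) \big] - \sum_{v \in T} \big[ F_V(v, S \cup \{x\}) - F_V(v, S) \big].$$

Applying Theorem 1 to each term gives the statement of the theorem.

To compute the influence spreading by (3), we need to estimate all the possible spreading from nodes in $S$ to $v$. Let $u \in S$ be a seed node and $v \in V \setminus S$ be a node activated by $u$ through a path $L$ connecting $u$ and $v$. Based on the probability of such a path $L$, the probability for seed $u$ to activate $v$ can be calculated. Suppose a path $L$ from $u$ to $v$ consists of a sequence of edges $e_1, e_2, \ldots, e_l$, and $p_i$ is the activating probability on edge $e_i$. The probability of path $L = (e_1, e_2, \ldots, e_l)$ from $u$ to $v$ can be estimated as

$$P(L) = \prod_{i=1}^{l} p_i.$$

In case there is more than one path connecting $u$ and $v$, the probability for seed $u$ to activate $v$ can be computed over independent paths. We call paths $L_1$ and $L_2$ independent paths if they do not share common edges.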
From (3), we know that the spreading of a given seed set $S$ can be estimated by adding the values of $F_V(v, S)$ of all the nodes in $V$, and $F_V(v, S)$ can be calculated by adding the probabilities of all the independent paths from nodes of $S$ to $v$. Therefore, it can be obtained by iterations based on Eq. (7), in which $N(v)$ is the node set consisting of the neighbors of node $v$. Eq. (7) calculates the probability for seed set $S$ to activate $v$ by recursively enumerating all the independent paths connecting nodes in $S$ with $v$ and adding up their probabilities.
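As a small worked illustration of the path-based estimate, consider a single seed $u$ and a node $v$ connected by two independent paths. The edge probabilities below are hypothetical values chosen for the example, not values from the experiments.

```latex
% Path L_1 has two edges with probabilities p_1 = 0.5 and p_2 = 0.4;
% path L_2 has a single edge with probability p_3 = 0.3 (hypothetical values).
P(L_1) = p_1 \, p_2 = 0.5 \times 0.4 = 0.2, \qquad
P(L_2) = p_3 = 0.3, \qquad
F_V(v, \{u\}) \approx P(L_1) + P(L_2) = 0.5 .
```

The longer path contributes less, which is why long paths can later be truncated with only a small error.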

C. RANDOM WALK FOR PATH SAMPLING
In (6), $F_V(v, S)$ is the probability that node $v$ can be activated by seed set $S$ under the IC model, and it can be calculated by iterations using formula (7). To select the candidate node in every step of the greedy method, we must compute $F_V(v, S)$ for all the nodes in $V$. Such computations need to be repeated for every candidate in all steps and require a large amount of computation time, which makes the greedy methods unable to solve real-world viral marketing problems. To avoid this time-consuming process, we use the path sampling approach based on random walk to simulate the propagation process, and we record the random walk paths so that they can be reused in the subsequent seed selection to reduce the time cost.

In order to calculate $F_V(v, S)$, we assume that a particle starts from $v$ and performs random walks according to the propagation probability on each edge. In the reversed direction, such a random walk simulates the influence propagating from $S$ to $v$. If the particle walks through a node of $S$ after $t$ steps of the random walk, $v$ is likely to be activated at time $t$. The summation of the probabilities of these vertices belonging to $S$ gives $F_V(v, S)$. For a node $v \in V \setminus S$, we use $f_V(v, S, t)$ to denote the probability that the particle starts from $v$ and arrives at a node in $S$ at step $t$, then:

$$F_V(v, S) = \sum_{t=1}^{\infty} f_V(v, S, t). \tag{8}$$

In (8), we need to sum up $f_V(v, S, t)$ over infinitely many values of $t$. This is not feasible in real-world applications. However, we can ignore the long paths since their probabilities are too small and can be omitted. Therefore, we can limit the length of the paths to a certain range $L$, so that the error of the result is less than a given threshold $\varepsilon$. Let

$$F_V^L(v, S) = \sum_{t=1}^{L} f_V(v, S, t)$$

be the truncated sum. The length $L$ is chosen to ensure that the truncation error $F_V(v, S) - F_V^L(v, S)$ is less than $\varepsilon$. Since a valid propagation path must not contain cycles, its length $L$ should also satisfy $L < |V|$, and the value of $L$ is set subject to both conditions.

Suppose the particle participating in the random walk starts at $v$ and reaches the node $u_t$ at step $t$. We define

$$g(t) = \begin{cases} 1, & u_t \in S, \\ 0, & \text{otherwise,} \end{cases}$$

so that $E[g(t)] = f_V(v, S, t)$. Subsequently, we employ the Monte-Carlo method to calculate $E[g(t)]$ by multiple random walks. Let $R$ be the number of repeated random walks. Suppose in the $r$-th random walk, the particle starts from $v$ and reaches the node $u_t^r$ at step $t$. We define

$$g_r(t) = \begin{cases} 1, & u_t^r \in S, \\ 0, & \text{otherwise.} \end{cases}$$

Then $E[g(t)]$ can be estimated by $\frac{1}{R} \sum_{r=1}^{R} g_r(t)$, and $F_V^L(v, S)$ can be estimated by:

$$\hat{F}_V^L(v, S) = \frac{1}{R} \sum_{r=1}^{R} \sum_{t=1}^{L} g_r(t). \tag{12}$$

From (12), we can see that each time a random walking particle starting from $v$ passes through a seed node in $S$, the value of $\hat{F}_V^L(v, S)$ is increased by $1/R$. Based on this observation, we present an algorithm PS_RW (Path Sampling Random Walk) to compute the effective spreading of each node $x$ by the path sampling random walk. Table 2 shows the variables used in algorithm PS_RW.
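The estimator in (12) can be sketched as follows. The sketch assumes a hypothetical callable `step(u, rng)` that returns the next node of the walk, or `None` when the walk terminates. How the next node is chosen is left to that callable, and the function name `estimate_activation` is introduced only for this example.

```python
import random


def estimate_activation(v, seeds, step, max_len, runs, rng):
    """Estimate F^L_V(v, S): every visit to a seed node adds 1/R, as in (12)."""
    hits = 0
    for _ in range(runs):              # R repeated random walks
        u = v
        for _ in range(max_len):       # at most L steps per walk
            u = step(u, rng)
            if u is None:              # the walk failed to move on and stops
                break
            if u in seeds:
                hits += 1              # g_r(t) = 1 at this step
    return hits / runs


# Deterministic toy walk v -> a -> s, used only to show the bookkeeping.
chain = {"v": "a", "a": "s", "s": None}
step = lambda u, rng: chain.get(u)
print(estimate_activation("v", {"s"}, step, max_len=3, runs=4, rng=random.Random(0)))  # 1.0
```

With `max_len=1` the same walk never reaches the seed and the estimate is 0, which shows the effect of truncating the path length to $L$.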
The framework of algorithm PS_RW is as follows.
In line 6 of Algorithm 1, the particle at node $u$ randomly selects a neighbor $w$ by the ''Roulette'' method. Let the neighbors of $u$ be $v_1, v_2, \ldots, v_d$, and the probabilities for $u$ to activate them be $p_1, p_2, \ldots, p_d$, respectively. The ''Roulette''-based selection has two steps:
(i) Determine whether the random walk to the next node is successful. The probability for the particle being transferred to one of its neighbors is $p_{succ} = 1 - \prod_{i=1}^{d} (1 - p_i)$. In this step, a random number $r_1 \in (0, 1)$ is generated. If $r_1 < p_{succ}$, the next step is performed, that is, one of its neighbors $w$ is selected and the particle moves to $w$. Otherwise, the selection fails and the random walk is terminated.
(ii) If $r_1 < p_{succ}$, the particle selects one of its neighbors according to the probabilities on the edges. In this step, we first calculate $p_{sum} = \sum_{i=1}^{d} p_i$ and $p(j) = \sum_{i=1}^{j} p_i / p_{sum}$ $(j = 1, 2, \ldots, d)$. Let $p(0) = 0$. A random number $r_2 \in (0, 1)$ is generated. If $p(i-1) < r_2 < p(i)$, $v_i$ is selected as the next node.
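The two-step ''Roulette'' selection can be sketched as below. The success probability is taken here as $1 - \prod_i (1 - p_i)$, which is an assumption about the garbled expression in step (i). The list-of-pairs input format and the name `roulette_step` are also illustrative.

```python
import random


def roulette_step(neighbors, rng):
    """Two-step roulette selection.

    neighbors: list of (node, p_i) pairs (assumed format).
    Returns the selected node, or None if the walk terminates.
    """
    if not neighbors:
        return None
    # Step (i): decide whether the walk continues at all.
    p_fail = 1.0
    for _, p in neighbors:
        p_fail *= 1.0 - p
    p_succ = 1.0 - p_fail              # assumed form of the success probability
    if rng.random() >= p_succ:
        return None
    # Step (ii): pick a neighbor with probability proportional to p_i.
    p_sum = sum(p for _, p in neighbors)
    r2 = rng.random() * p_sum          # same as comparing r2 with the normalized p(j)
    cumulative = 0.0
    for node, p in neighbors:
        cumulative += p
        if r2 < cumulative:
            return node
    return neighbors[-1][0]            # guard against floating-point round-off


rng = random.Random(0)
print(roulette_step([("a", 0.5), ("b", 0.5)], rng))
```

Scaling `r2` by `p_sum` avoids computing the normalized cumulative values $p(j)$ explicitly. The selection probabilities are unchanged.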
Suppose $|V| = n$. The time complexity of the above algorithm is $O(nLR)$, since each of the $n$ nodes starts $R$ random walks of at most $L$ steps. Here, $L$ and $R$ can be considered as constants. Therefore, the overall complexity of Algorithm 1 is $O(n)$.
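The recording performed by Algorithm 1 can be sketched in the same spirit. The sketch follows the three nested loops (nodes, repetitions, steps) and again relies on a hypothetical `step(u, rng)` callable in place of the ''Roulette'' selection. The dictionary layout is an assumption, and only the path, visitor, and counter bookkeeping is shown.

```python
import random
from collections import defaultdict


def ps_rw(nodes, unwanted, step, max_len, runs, rng):
    """Sample `runs` walks of at most `max_len` steps from every node and record them."""
    paths = {}                         # (v, r) -> visited nodes, i.e. Path(v, r, .)
    visitors = defaultdict(set)        # H(w): start nodes whose walks reached w
    delta_T = defaultdict(float)       # contribution of walks started in T
    delta_wanted = defaultdict(float)  # contribution of walks started in V \ T
    for v in nodes:
        for r in range(runs):
            u, path = v, []
            for _ in range(max_len):
                w = step(u, rng)
                if w is None:          # unsuccessful selection ends this path
                    break
                path.append(w)
                visitors[w].add(v)
                if v in unwanted:
                    delta_T[w] += 1.0 / runs
                else:
                    delta_wanted[w] += 1.0 / runs
                u = w
            paths[(v, r)] = path       # Len(v, r) is len(path)
    return paths, visitors, delta_T, delta_wanted


# Deterministic toy walk used only to show what is recorded.
chain = {"a": "b", "b": "c", "c": None, "t": "b"}
step = lambda u, rng: chain.get(u)
paths, visitors, d_T, d_W = ps_rw(["a", "b", "c", "t"], {"t"}, step, 3, 2, random.Random(0))
print(paths[("a", 0)], sorted(visitors["c"]), d_W["c"], d_T["b"])
```

Because the walks are stored once, later seed-selection steps can reuse them instead of sampling again.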

D. ESTIMATING THE SIZE OF PATH SAMPLING
In algorithm PS_RW, using $\bar{F}^{L}_{V}(v, S)$ to approximate $F^{L}_{V}(v, S)$ introduces an error. Increasing the sample size R reduces the error, but a larger R consumes more computation time. To achieve a good balance between computation time and accuracy, we use the Hoeffding bound to determine a sample size R that keeps the error $|\bar{F}^{L}_{V}(v, S) - F^{L}_{V}(v, S)|$ within a given threshold ξ.

Theorem 3 (Hoeffding Bound): Let $x_1, x_2, \ldots, x_R$ be mutually independent random variables with $a_r \le x_r \le b_r$. Then, for an error bound ξ > 0, the following holds:

$$P\left( \left| \frac{1}{R} \sum_{r=1}^{R} x_r - E\left[ \frac{1}{R} \sum_{r=1}^{R} x_r \right] \right| \ge \xi \right) \le 2 \exp\left( - \frac{2 R^2 \xi^2}{\sum_{r=1}^{R} (b_r - a_r)^2} \right)$$

In algorithm PS_RW, the result for node v in the r-th sampling is $\sum_{t=1}^{L} g_r(t)$, and

$$\bar{F}^{L}_{V}(v, S) = \frac{1}{R} \sum_{r=1}^{R} \sum_{t=1}^{L} g_r(t)$$

Since $F^{L}_{V}(v, S) \in (0, 1)$, we set $b_r = 1$ and $a_r = 0$. By Theorem 3, we have

$$P\left( \left| \bar{F}^{L}_{V}(v, S) - F^{L}_{V}(v, S) \right| \ge \xi \right) \le 2 \exp\left( -2 R \xi^2 \right)$$

Suppose the probability that the error is larger than ξ must be less than a given threshold θ > 0, i.e.

$$2 \exp\left( -2 R \xi^2 \right) \le \theta \qquad (15)$$

From formula (15), we can see that the sample size R must satisfy:

$$R \ge \frac{1}{2 \xi^2} \ln \frac{2}{\theta} \qquad (16)$$

Algorithm 1: PS_RW
1: for each node v ∈ V do
2:   for r = 1 to R do
3:     u ← v;
4:     Len(v, r) = 0;
5:     for l = 1 to L do
6:       Selecting a neighbor w of u by the "Roulette" method;
7:       if the "Roulette" method selection is unsuccessful then
8:         Break (end the path and exit the for l loop);
9:       end if
10:      Path(v, r, l) = w;
11:      H(w) = H(w) ∪ {v};
12:      Loc(v, r, w) = l;
13:      if v ∈ T then
14:        δ_T(w) = δ_T(w) + 1/R;
15:      else
16:        δ_{V\T}(w) = δ_{V\T}(w) + 1/R;
17:      end if
18:      Len(v, r) = Len(v, r) + 1;
19:      u ← w;
20:    end for l;
21:  end for
22: end for
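The sample-size condition of this subsection can be evaluated numerically. The sketch below assumes the two-sided Hoeffding inequality with $a_r = 0$ and $b_r = 1$, which gives $R \ge \ln(2/\theta) / (2\xi^2)$; the function name `sample_size` is hypothetical.

```python
import math


def sample_size(xi, theta):
    """Smallest integer R with 2 * exp(-2 * R * xi**2) <= theta.

    xi:    error threshold (> 0)
    theta: upper bound on the probability that the error exceeds xi, in (0, 1)
    """
    if xi <= 0 or not 0 < theta < 1:
        raise ValueError("need xi > 0 and 0 < theta < 1")
    return math.ceil(math.log(2.0 / theta) / (2.0 * xi ** 2))


# With xi = 0.1 and theta = 0.3, the bound asks for at least 95 samples per node.
print(sample_size(0.1, 0.3))
```

Halving ξ multiplies the required R by four, while tightening θ only adds a logarithmic term, so the error threshold dominates the sampling cost.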

V. SEED NODE SELECTION
In this section, we propose an algorithm for influence maximization (IM) in competitive networks with unwanted users. The algorithm first calculates the influence spreading of each node v in V\T and in T by calling algorithm PS_RW, which also records the propagation paths. Based on the path information obtained by PS_RW, the effective influence spreading δ(S) of a seed set S and the propagation increment $\Delta(v)$ of each node can be approximately estimated.

A. APPROXIMATION OF THE SPREADING INCREMENT
According to Theorem 2, the propagation increment $\Delta(x)$ of a node x in V\S is given by (17). To estimate $\Delta(x)$, we use the random-walk result $\bar{F}^{L}_{V}(v, S)$ to replace $F_{V}(x, S)$ in (17). Based on the recorded random-walk paths, this estimate is computed by (18), where the indicator function I[·] equals 1 if its condition holds and 0 otherwise. From formula (18) we can see that $\bar{F}^{L}_{V}(v, S)$ is the expectation of the number of seeds in S that the particle starting from node v walks through in the random walk.
To estimate the summations over v ∈ V\T in (17), we also use the results of the random walk. We define $\delta_{V \setminus T}(x, S)$ and $\delta_{T}(x, S)$ as in (19), and the resulting estimate of $\Delta(x)$ is given by (20).

FIGURE 1. An example in which node x can activate w during the random walk.

FIGURE 2. An example in which node x cannot activate w during the random walk.

To avoid activating the unwanted users in T, we select the node x with the largest value of $\Delta(x)$ defined in (20), which depends on the value of $\delta_{V \setminus T}(x, S) - \delta_{T}(x, S)$. It can be seen from (19) that $\delta_{V \setminus T}(x, S)$ and $\delta_{T}(x, S)$ are, respectively, the numbers of vertices outside and inside T that x can activate. Maximizing $\Delta(x)$ in (20) therefore chooses the seed x such that the influence spreading of the new seed set S ∪ {x} in V\T is maximized while its influence spreading in T is minimized.
We use the random-walk result $\bar{F}^{L}_{V \setminus S}(v, \{x\})$ to replace $F_{V \setminus S}(v, \{x\})$ in (19), and get (21). To compute $\delta_{V \setminus T}(x, S)$ by (21), we need to enumerate all the paths that start from nodes in V\T and pass through x. Suppose one such path starts from a node w ∈ V\T. The influence propagates from x in the inverse direction of the path, as shown in Fig. 1. If there is no node u ∈ S between x and w on this path, x can activate w along it. Otherwise, if there is a node u ∈ S between x and w, as shown in Fig. 2, selecting x as the new seed node cannot increase the probability of activating w, since u can activate w before x.
Therefore, the location of x in the path must satisfy $loc(w, r, x) < \min_{u \in S} loc(w, r, u)$. With this condition, (21) can be rewritten as (22), and $\delta_{T}(x, S)$ can be computed similarly by (23).
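The location condition can be made concrete with a small Python sketch that counts, for a candidate x, the recorded paths on which x appears before every current seed. The data layout is hypothetical (`paths` maps `(start, sample)` to the list of visited nodes, as a path sampler might record them), and two choices are assumptions of this sketch rather than statements from the paper: walks starting from a current seed or from x itself are skipped, and a seed that is absent from a path is treated as having location infinity.

```python
import math


def delta_for_candidate(paths, T, S, x, R):
    """Estimate (delta_out, delta_in) for candidate x from recorded paths.

    A path starting at w contributes 1/R when loc(w, r, x) < min_{u in S} loc(w, r, u).
    """
    d_out = d_in = 0.0
    for (w, _r), path in paths.items():
        if w in S or w == x or x not in path:
            continue  # assumption: such walks bring no gain for x
        loc_x = path.index(x) + 1  # 1-based location of the first visit to x
        loc_seed = min(
            (i + 1 for i, u in enumerate(path) if u in S),
            default=math.inf,      # no seed on this path
        )
        if loc_x < loc_seed:
            if w in T:
                d_in += 1.0 / R    # an unwanted user would be reached
            else:
                d_out += 1.0 / R   # a wanted user would be reached
    return d_out, d_in


paths = {("a", 0): ["x", "b"], ("b", 0): ["s", "x"], ("t", 0): ["x"]}
print(delta_for_candidate(paths, T={"t"}, S={"s"}, x="x", R=1))
```

In the example, the walk from b meets the seed s before x, so it corresponds to the situation of Fig. 2 and does not count for x.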

B. UPDATING THE VARIABLE VALUES IN THE ITERATIONS
To detect the optimum seed set S, we first call Algorithm 1, which simulates the random walk of the influence propagation by sampling and records the random-walk paths. Then a greedy method is applied to construct the seed set. The seed set S is initially empty, and the algorithm iteratively selects nodes to join S. In each iteration, the node x in V\S with the highest propagation increment $\Delta(x)$ is chosen as a new seed and added to S. For calculating $\Delta(x)$ using (20), the initial values of $\delta_{V \setminus T}(v, S)$ and $\delta_{T}(v, S)$ are, respectively, the outputs $\delta_{V \setminus T}(v)$ and $\delta_{T}(v)$ of Algorithm 1. Also, we set $\bar{F}^{L}_{V}(v, S) = 0$ for all v ∈ V, since S is initially empty.
Then the algorithm selects one node x with the maximal $\Delta(x)$ to join the seed set in each iteration. After the node x is added to S, the values of $\bar{F}^{L}_{V}(v, S)$, $\delta_{V \setminus T}(v)$ and $\delta_{T}(v)$ must be updated according to the change of S. Since the random-walk paths were recorded in Algorithm 1, it is not necessary to repeat the random walk for this update. The values under S ∪ {x} can be obtained by incremental calculation on the basis of $\bar{F}^{L}_{V}(v, S)$, $\delta_{V \setminus T}(v)$ and $\delta_{T}(v)$, and the value of $\Delta(v)$ under the new seed set can then be calculated according to (20). The rules for updating $\bar{F}^{L}_{V}(v, S)$, $\delta_{V \setminus T}(v)$ and $\delta_{T}(v)$ are as follows:
(1) Updating $\bar{F}^{L}_{V}(v, S)$
According to (18), we have

$$\bar{F}^{L}_{V}(v, S \cup \{x\}) = \bar{F}^{L}_{V}(v, S) + \frac{1}{R} \sum_{r=1}^{R} I\left[ \exists\, l:\ path(v, r, l) = x \ \wedge\ l < \min_{u \in S} loc(v, r, u) \right] \qquad (24)$$

The condition $(path(v, r, l) = x)$ and $(l < \min_{u \in S} loc(v, r, u))$ in (24) ensures that there is no node u ∈ S between x and v on the path, as shown in Fig. 1, so that x can activate v along this path. Therefore, for each such path, the value of $\bar{F}^{L}_{V}(v, S)$ is increased by 1/R. If there is a node u ∈ S between x and v on the path, as shown in Fig. 2, adding x to the seed set cannot increase the probability of activating v.
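(2) Updating $\delta_{V \setminus T}(v, S)$
According to (19), we have

$$\delta_{V \setminus T}(v, S \cup \{x\}) = \delta_{V \setminus T}(v, S) - \frac{1}{R} \sum_{w \in V \setminus T} \sum_{r=1}^{R} I\left[ loc(w, r, x) < loc(w, r, v) < \min_{u \in S} loc(w, r, u) \right] \qquad (25)$$

Equation (25) states that, after x is added to the seed set, the paths on which v is located between x and a node u in S must be removed when computing $\delta_{V \setminus T}(v, S \cup \{x\})$. If the node v is located between x and u ∈ S, in the red region shown in Fig. 3, adding x to the seed set does not increase the probability of activating v. When calculating $\delta_{V \setminus T}(v, S \cup \{x\})$, the contribution of these paths must be subtracted from $\delta_{V \setminus T}(v, S)$. Therefore, for each such path, the value of $\delta_{V \setminus T}(v, S)$ is decreased by 1/R.
(3) Updating $\delta_{T}(v, S)$
Similarly, we get the rule for updating $\delta_{T}(v, S)$:

$$\delta_{T}(v, S \cup \{x\}) = \delta_{T}(v, S) - \frac{1}{R} \sum_{w \in T} \sum_{r=1}^{R} I\left[ loc(w, r, x) < loc(w, r, v) < \min_{u \in S} loc(w, r, u) \right] \qquad (26)$$

Therefore, for each path that starts from a node w ∈ T and satisfies $loc(w, r, x) < loc(w, r, v) < \min_{u \in S} loc(w, r, u)$, the value of $\delta_{T}(v, S)$ is decreased by 1/R.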
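The three update rules can be sketched as a single pass over the recorded paths. The sketch below is an illustration under assumptions, not the authors' code: `paths` is a hypothetical dictionary from `(start, sample)` to the list of visited nodes, the counters are plain dictionaries, every path is scanned instead of using an index of the paths through x, and each path is pruned at x once it has been processed so that later updates only see the part before x.

```python
def add_seed(x, paths, T, R, F, d_out, d_in):
    """Apply the incremental updates after x joins the seed set (in place)."""
    for (w, _r), path in paths.items():
        if x not in path:
            continue
        l = path.index(x)  # 0-based position of the first visit to x
        # Rule (1): the walk from w now reaches a seed on this path.
        F[w] = F.get(w, 0.0) + 1.0 / R
        # Rules (2) and (3): nodes located after x lose this path's contribution.
        counter = d_in if w in T else d_out
        for u in path[l + 1:]:
            counter[u] = counter.get(u, 0.0) - 1.0 / R
        # Prune x and everything after it, i.e. Len(w, r) = l - 1 in 1-based terms.
        del path[l:]


paths = {("a", 0): ["b", "x", "c"], ("t", 0): ["x", "c"]}
F, d_out, d_in = {}, {"b": 1.0, "x": 1.0, "c": 1.0}, {"x": 1.0, "c": 1.0}
add_seed("x", paths, T={"t"}, R=1, F=F, d_out=d_out, d_in=d_in)
print(F, d_out, d_in, paths)
```

Because pruning removes every earlier seed from the remaining part of a path, the condition "x precedes all seeds in S" holds automatically for any node still on that path.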

C. FRAMEWORK OF THE ALGORITHM IMUU_RW
In our algorithm, we use the variables F(v), $\delta_{V \setminus T}(v)$ and $\delta_{T}(v)$ to represent, respectively, the values of $\bar{F}^{L}_{V}(v, S)$, $\delta_{V \setminus T}(v, S)$ and $\delta_{T}(v, S)$ under the current seed set S. From the updating rules (24), (25) and (26), we can see that once x is selected into the seed set, the vertices after x in a path are no longer likely to be selected as seeds, and they are not useful for updating $\bar{F}^{L}_{V}(v, S)$, $\delta_{V \setminus T}(v, S)$ and $\delta_{T}(v, S)$. Therefore, we prune this part of the path when x is selected to join the seed set.
To illustrate our method in a simplified manner, Fig. 4 shows the main workflow of the proposed algorithm. In Fig. 4, there are 4 steps, which can be described as follows:
1. Select the node x with the maximum $\delta_{V \setminus T}(x) - \delta_{T}(x)$ as a new seed to maximize the influence excluding the unwanted user set T.
3. Update $\Delta(x)$, the influence increment excluding the unwanted user set T, according to (20).
4. Select the node x with the maximum $\delta_{V \setminus T}(x) - \delta_{T}(x)$ as a new seed to maximize the influence excluding the unwanted user set T.
The framework of algorithm IMUU_RW (Influence Maximization with Unwanted Users based on Random Walk) is described as follows.

Algorithm 2: IMUU_RW
  ...
  ... Loc(v, r, u), // Update δ_{V\T}(u) and δ_T(u) of node u after x according to (25) and (26)
  end for j;
  Len(w, r) = l − 1; // Prune the path after x
  F(w) = F(w) + 1/R; /* Update F(w) of the activated node w according to (24) */
  end for r;
  end for w;
  ... (20) */
  end for;
  Select the node x with the maximal Δ(x);
  S = S ∪ {x};
  end for i;
  End

Let |V| = n. Algorithm IMUU_RW executes loop i k times, loop w up to n times, loop r R times, and loop l up to L times. Hence, the time complexity of IMUU_RW is O(n · L · R · k). Because k, L and R can be considered constants, the total complexity of Algorithm 2 is O(n).
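The greedy selection over recorded paths can be illustrated with the following Python sketch. It is a simplification under stated assumptions, not the algorithm as published: the score of a candidate is taken to be the difference between its visit counts from walks starting outside T and inside T (each weighted by 1/R), scores are recomputed from the pruned paths in every round instead of being updated incrementally, and walks starting from an already selected seed are ignored. The name `greedy_seeds` and the `paths` layout are hypothetical.

```python
from collections import defaultdict


def greedy_seeds(paths, nodes, T, k, R):
    """Pick k seeds greedily from recorded random-walk paths."""
    paths = {key: list(p) for key, p in paths.items()}  # work on a copy
    S = []
    for _ in range(k):
        # Score = (visits from wanted starts - visits from unwanted starts) / R.
        score = defaultdict(float)
        for (w, _r), path in paths.items():
            if w in S:
                continue  # assumption: a seed needs no further activation
            step = (-1.0 if w in T else 1.0) / R
            for u in path:
                score[u] += step
        candidates = [v for v in nodes if v not in S]
        if not candidates:
            break
        x = max(candidates, key=lambda v: score[v])  # ties: first in `nodes`
        S.append(x)
        # Prune x and everything after it on every path that visits x.
        for path in paths.values():
            if x in path:
                del path[path.index(x):]
    return S


paths = {("a", 0): ["c"], ("b", 0): ["c"], ("t", 0): ["b"], ("c", 0): []}
print(greedy_seeds(paths, ["a", "b", "c", "t"], T={"t"}, k=2, R=1))
```

In the example, node c is chosen first because two wanted users reach it, while node b is never chosen because the only walk that reaches it starts from the unwanted user t.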

VI. EXPERIMENT
We test IMUU_RW for the IML problem under the IC model on six real-world networks to evaluate its performance. The algorithms are implemented in C++ and run under Win-

Epinions [29] is a website where customers can give or obtain reviews of commodities. They can also comment on the reviews written by other customers. Based on these comments, positive or negative relations between the customers are observable.
BlogCatalog [30] is a social blog website which can be represented by a network with nodes representing the blogs and links reflecting their relations. The relations between the blogs can be observed from the information attached to the blogs.
Gnutella [31] is a large-scale peer-to-peer network. In Gnutella network, nodes stand for the hosts, and the edges represent the connections between the hosts. The dataset includes 9 snapshots of the evolving network.
Slashdot [32] is a social news website containing all types of news stories. The users can post news stories on the website and can also comment on the stories posted by others.

ArXiv-Collaboration [31] is a website hosting scientific research papers to be published. The ArXiv-Collaboration dataset can be represented by a network in which the nodes represent the authors. If two authors collaborate on a paper, there is an edge linking their nodes.
NetHEPT [31] is a co-author network. It describes the cooperative relationships among scientists in different domains. Table 3 lists the features of the networks including the size of the node set and edge set of the network, and its average degree.
Over the six datasets, the activation probability on the edges is set by two models: the random IC model (RICM) and the weighted IC model (WCM). Under the RICM model, a random real number p ∈ (0, 0.09] is assigned to each edge (u, v), indicating the probability for u to activate v. In the WCM model, each edge (u, v) is assigned the real number $1/d_v$ as the probability for activating v, where $d_v$ denotes the degree of node v.

Compared Methods: we test and compare the performance of our IMUU_RW algorithm with the following five methods:

Random: It is a randomized algorithm for the IM problem. In the algorithm, k seed nodes are selected at random. This stochastic seed set construction is repeated 1000 times to obtain 1000 seed sets. The k nodes that appear most frequently in the 1000 seed sets form the final seed set.
MaxDegree: This method [33] uses a heuristic approach based on each node's degree. In each step, the node with the largest degree is selected as the new seed. The method repeatedly adds the nodes with the largest degrees to the seed set until the size of the seed set reaches k.
Greedy: In each step of this method [34], the influence spreading increment of each node is estimated by Monte-Carlo simulations. The node which has the highest influence spreading increment will be selected and added to the seed set.
MG: Yu et al. [21] presented the algorithm MG for IM in social networks with multiple influences. The method aims to give the companies proper influence spreading at a cost within their budgets. It chooses the node with the lowest EBR (Earning-to-Budget Ratio) and adds it to the seed set.
DCIM_CELF: As an improvement of the CELF algorithm, Li et al. [35] presented DCIM_CELF algorithm to tackle the problem of DCIM (Dominated Competitive Influence Maximization) under the new competitive IC model CIC-M. Each time DCIM_CELF chooses a node which has the highest marginal increment of spreading and adds it into seed set.
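The two edge-probability models and the MaxDegree baseline described above can be sketched in a few lines of Python. The sketch rests on assumptions that the text does not settle: the graph is a hypothetical list of directed edges `(u, v)`, the degree $d_v$ in WCM is taken to be the in-degree of v (as is common for the weighted cascade model), MaxDegree uses the total degree, and ties are broken by node label.

```python
import random
from collections import Counter


def assign_ricm(edges, rng, p_max=0.09):
    """RICM: draw each edge probability uniformly from (0, p_max]."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    return {(u, v): p_max * (1.0 - rng.random()) for (u, v) in edges}


def assign_wcm(edges):
    """WCM: edge (u, v) gets probability 1 / d_v (assumption: d_v = in-degree)."""
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for (u, v) in edges}


def max_degree_seeds(edges, k):
    """MaxDegree baseline: the k nodes with the largest total degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [node for node, _ in ranked[:k]]


edges = [("a", "c"), ("b", "c"), ("c", "a")]
print(assign_wcm(edges))
print(max_degree_seeds(edges, 2))
```

Under WCM the incoming probabilities of every node sum to 1, which is why high-degree nodes are individually harder to activate than under RICM.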

B. THE EFFECT OF PARAMETERS ON PERFORMANCE
The default parameters of IMUU_RW are set as follows: ε = 0.2, ξ = 0.1, θ = 0.3. Therefore, we can derive the range of L from equations (9) and (10). Existing research shows that, in the process of influence spreading, the seed nodes have a great influence on their one-hop and two-hop neighbors, whereas their influence beyond three hops is not large. Therefore, in our experiments, we set L = 3. Figs. 5-8 show the effect of parameter R on the performance of algorithm IMUU_RW. It can be observed that IMUU_RW achieves the best performance over all datasets when R = 130.
In the experiments, we test on different sets T of unwanted users with sizes |T| = 50, 100, 150. The unwanted users in T are selected from the node set V uniformly at random. The experimental results are shown in Section C.
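The experimental setting above (a uniformly sampled unwanted set T and a propagation limit of L steps) can be reproduced in outline with a Monte Carlo simulation of the IC model. The following Python sketch is illustrative only: the adjacency format and function names are hypothetical, seeds are counted as activated, unwanted users propagate influence like any other node, and the counts of wanted and unwanted activated nodes are reported separately instead of being combined into a single score.

```python
import random


def sample_unwanted(nodes, size, rng):
    """Draw the unwanted user set T uniformly at random from the node set."""
    return set(rng.sample(sorted(nodes), size))


def simulate_ic(adj, seeds, steps, rng):
    """One IC cascade limited to `steps` time steps; returns the activated set."""
    active = set(seeds)
    frontier = list(seeds)
    for _ in range(steps):
        newly_active = []
        for u in frontier:
            for v, p in adj.get(u, []):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        if not newly_active:
            break
        frontier = newly_active
    return active


def estimate_spread(adj, seeds, T, steps, runs=1000, seed=0):
    """Average numbers of activated wanted and unwanted nodes over `runs` cascades."""
    rng = random.Random(seed)
    wanted = unwanted = 0
    for _ in range(runs):
        for v in simulate_ic(adj, seeds, steps, rng):
            if v in T:
                unwanted += 1
            else:
                wanted += 1
    return wanted / runs, unwanted / runs


adj = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": [("d", 1.0)]}
print(estimate_spread(adj, ["a"], T={"c"}, steps=2, runs=10))
```

With all probabilities equal to 1 and a limit of two steps, the cascade from a reaches b and c but not d, so the example reports two wanted nodes and one unwanted node.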

C. TEST ON DIFFERENT NUMBER OF UNWANTED USERS
We test the effective influence spreading of the algorithms in the WCM model with different numbers of unwanted users. Figs. 9 to 12 show the results on the ArXiv-Collaboration, BlogCatalog, Gnutella and NetHEPT datasets. From these four figures, it can be observed that IMUU_RW achieves higher effective influence spreading than all the other methods on all datasets and for all numbers of unwanted users. This indicates that IMUU_RW obtains higher quality results of seed selection. DCIM_CELF and MG achieve the second and third highest effective influence spreading, respectively, with DCIM_CELF always slightly better than MG over these four datasets. The Random method shows the lowest influence spreading on all the datasets. It can also be observed from Figs. 9 to 12 that all the algorithms except Random achieve their best performance when |T| = 100. IMUU_RW obtains higher effective influence spreading than the other methods because the effective spreading increment used for selecting the seed nodes is based on analyzing the spreading paths, which takes into account avoiding the unwanted users. After a node is selected and added to the seed set, the effective spreading increment is also updated by analyzing the spreading paths. As a result, the selected seeds achieve larger effective influence spreading than those of the other methods.

D. TEST ON DIFFERENT SEED SIZES
Subsequently, we conduct experiments to compare the effective influence spreading of the different methods on the datasets ArXiv-Collaboration, BlogCatalog, Gnutella and NetHEPT in the RICM and WCM models with different seed sizes. In the test, we fix the number of unwanted users at 100, since all the algorithms except Random achieve their best performance when |T| = 100.

Fig. 13 depicts the effective influence spreading of the six methods on ArXiv-Collaboration. As shown in Figs. 13(a) and 13(b), IMUU_RW significantly outperforms the other methods on this dataset. DCIM_CELF is superior to the other four algorithms under both models. MG achieves the third highest influence spreading in the WCM model, but in the RICM model it obtains less effective influence spreading than Greedy, which demonstrates that the performance of MG is unstable. Random gets the lowest quality result among all the methods under both models. Under the RICM model, Random and MaxDegree achieve similar effective influence spreading if the seed set size is less than 25, and Random achieves lower effective influence spreading than MaxDegree if k > 25. Thus, MaxDegree obtains slightly higher quality results than Random in the RICM model. On the whole, IMUU_RW obtains higher quality results than the five other methods on ArXiv-Collaboration in both models.

Fig. 14 depicts the effective influence spreading of the different methods on the BlogCatalog dataset. IMUU_RW achieves the best performance among the six algorithms under both models, and its performance under the WCM model is higher than that under the RICM model. Except for MaxDegree and Random, the performance of the other four algorithms is very similar when k ≤ 10. When k > 10, IMUU_RW achieves higher effective influence spreading than DCIM_CELF, MG, and Greedy. Overall, DCIM_CELF, MG, and Greedy have roughly the same performance in both models. It can also be seen from Fig. 14 that the performance of Random is the lowest in the WCM model, and the performance of MaxDegree is the lowest in the RICM model.

Fig. 15 depicts the effective influence spreading of the methods on the Gnutella dataset. IMUU_RW is significantly superior to the other five methods under both models. As to the other five algorithms, the influence spreadings of DCIM_CELF, MG and Greedy are very close if the number of seeds is less than 20 in the WCM model. In the RICM model, MG obtains almost the same effective spreading as DCIM_CELF if the number of seeds is less than 20. As the seed size increases, DCIM_CELF obtains much larger influence spreading than the other four algorithms. Except for Random, the algorithms perform better in the WCM model than in the RICM model.

Fig. 16 depicts the effective influence spreading obtained by the different methods on the NetHEPT dataset. IMUU_RW still gets the highest quality results and Random the lowest under both models. As to the other four algorithms, DCIM_CELF, MG, and Greedy obtain almost equal influence spreading if the number of seeds is less than 30 in the WCM model. In the RICM model, MG always has nearly the same effective spreading as DCIM_CELF, and Greedy performs slightly better than MaxDegree. As the seed set size increases, DCIM_CELF achieves higher quality results than the remaining baseline methods. Moreover, the performance of MaxDegree in the WCM model is better than that in the RICM model.

From the experimental results illustrated in the figures, it can be seen that IMUU_RW obtains the best influence spreading among all six methods on all the datasets in both models.
The results demonstrate that IMUU_RW achieves the highest performance in terms of effective influence spreading. It can also be observed that IMUU_RW performs better in the WCM model than in the RICM model. Among the other methods, DCIM_CELF achieves the second-best results under both models on all datasets. MG exceeds the other three algorithms in all cases except on the ArXiv-Collaboration dataset in the RICM model, where it obtains lower effective influence spreading than Greedy. MaxDegree has the lowest performance on the BlogCatalog dataset under the RICM model, and Random has the lowest influence spreading in the other cases. IMUU_RW achieves the highest quality results because it takes not only the wanted users but also the unwanted ones into consideration when calculating the influence increment. This allows the influence to be propagated to the wanted users while keeping it from reaching the unwanted ones.

E. TEST ON LARGE-SCALE NETWORKS
To test the performance of IMUU_RW in more depth, we run it on the large-scale networks Slashdot and Epinions, each with more than 20,000 nodes, under the WCM model. We measure the effective spreading at different propagation times t and compare it with that of the other algorithms. We set k = 800, 1500, 5000 on Slashdot and k = 800, 5000 on Epinions, and set |T| = 100 on both datasets. Fig. 17 shows the results on Slashdot with different seed set sizes. From Fig. 17 we can see that the curve of IMUU_RW is above those of the other algorithms at almost all of the time steps. This indicates that IMUU_RW achieves higher effective influence spreading at almost every step of the spreading. The reason is that the other algorithms, such as Greedy, MaxDegree, and Random, only focus on single-influence maximization. They are not effective for a large-scale competitive network with wanted and unwanted influences. As for DCIM_CELF and MG, since they need a huge number of Monte Carlo simulations, they do not perform well on large networks. In addition, we observed an interesting phenomenon. If we set k = 800 or 1500 for the tests on the Slashdot data, Greedy performs better than MaxDegree. However, when k = 5000, the performance of MaxDegree exceeds that of Greedy. This is because MaxDegree chooses the node with the highest degree as the most influential one, and the spreading increases with the growth of the seed set. However, MaxDegree consumes a large amount of computation time if the seed set size becomes very large. In addition, with a large seed size, for instance k = 5000, the curves of DCIM_CELF and MG fluctuate strongly. This indicates that the two algorithms are not stable for large values of k. However, the performance of DCIM_CELF is much better than that of MG when k = 5000. The reason is that DCIM_CELF is more capable of detecting hidden and influential users than MG when the seed set grows.
In conclusion, IMUU_RW obtains much more positive influence spreading than the other algorithms on Slashdot under the WCM model. Fig. 18 shows the test results on Epinions with 800 and 5000 seeds. It can be seen from the figure that the six algorithms obtain almost the same effective spreading. Since the components of the Epinions network are compactly connected, no matter how large the seed set is, the number of wanted activated nodes is always about 40,000. By an in-depth examination of the topological structure of the network, we observed that the spreading covers most of the connected component. This is why the different methods obtain almost the same spreading results. Nevertheless, as we can see from Fig. 18, IMUU_RW has the largest positive influence spreading among all the algorithms tested at almost all of the time steps.

F. TEST ON RUNNING TIME
In our experiments, the computation time of IMUU_RW is tested and compared with that of the other methods. In the experiment, the number of seed nodes is set to 50, and the number of unwanted users is set to 100. Fig. 19 depicts the running times required by the six algorithms on four datasets. From Fig. 19 we can see that Random always runs faster than the other five algorithms over the four datasets. Although Random runs fastest, it achieves the lowest influence spread, as indicated in Figs. 9 to 18. As to the other five algorithms, MaxDegree requires the second least time, and IMUU_RW consistently runs faster than MG, DCIM_CELF and Greedy over the four datasets. Greedy takes the largest computation time among all the methods tested: it requires more than one hour to detect 50 seeds on every dataset in both models. From the figure, we can also see that IMUU_RW, MaxDegree and Random run faster in the WCM model than in the RICM model.
Comparing the test results above, it can be seen that IMUU_RW consumes the third least running time under both models, after Random and MaxDegree. This shows that IMUU_RW is stable and is the third fastest of the six methods. IMUU_RW runs faster than MG, DCIM_CELF and Greedy because it uses a path sampling based random walk method to simulate the influence propagation and records the paths of the random walk so that they can be reused in subsequent seed selections. In addition, it uses pruning to remove the parts of paths that are unlikely to contain new seed nodes. This is useful for efficiently detecting the seeds with the maximum effective influence spreading.
Considering both the effective influence spreading and the computation time in the experimental results, our random walk-based algorithm IMUU_RW can detect influential customers who are capable of activating more buyers in less computation time. IMUU_RW obtains the largest effective influence spreading on all the datasets under both models. Also, IMUU_RW runs the fastest on all the datasets among all algorithms except Random and MaxDegree. However, Random has the worst performance and MaxDegree the second worst performance in influence spreading, and both of them are unstable. Therefore, IMUU_RW achieves the most effective influence spreading efficiently in the two models in solving the IML problem.

VII. CONCLUSION AND FUTURE WORK
In viral marketing, there may exist competition among retailers or merchants, and consumers may take different views of the products of different merchants. To promote a new product to a large portion of the customers, a company might initially activate several influential customers so that they can activate a maximum number of buyers. In the meantime, the company does not want the unwanted users to receive information about the product, since such users may create a barrier to product sales and reduce the benefits of the company. In this paper, we model this issue as influence maximization with limited unwanted users (IML) in competitive viral marketing. To reduce the computation time, we propose a path sampling based random walk algorithm to simulate the influence propagation and record the paths of the random walk so that they can be reused in subsequent seed selections. To select the best seeds, we adopt the greedy method and present an algorithm IMUU_RW which can efficiently select the seeds without the time-consuming Monte-Carlo simulations to estimate the influence spreading. Empirical results on several social networks show that IMUU_RW obtains much more effective influence spreading within less time than the other algorithms.
In viral marketing, the customers may have positive and negative relations. These relations can be observed from their comments on the merchants and their products. It is important for a business to utilize the two opposite relationships among the customers to sell its products over the maximum range of the trade network. To tackle this problem, as our further study, we will attempt to extend our algorithm to networks with such opposite relationships. In addition, we intend to study the IML problem in multiple networks. In such networks, each customer may participate in several social and trade networks, and the influence of influential customers can be propagated through different networks.
JIE HE was born in Huaian, Jiangsu, China, in 1995. He is currently pursuing the master's degree with the College of Information Engineering, Yangzhou University, Yangzhou, China. His research interest includes influence blocking maximization.
LING CHEN was born in 1951. He graduated from the Mathematics Department, Yangzhou Teachers' College. He is currently a Professor of computer science with the Information Technology College, Yangzhou University. He is a member of ACM. His research interests include artificial intelligence, data mining, system optimization, and complex network analysis. He has published more than 200 articles in journals and conferences. He has also authored/coauthored six books. He has 15 research projects supported by the Chinese Natural Science Foundation and other organizations. He has received five Awards of Progress in Science and Technology from the Government of Anhui and Jiangsu Province. He was awarded the Government Special Allowance by the State Council. His research interests include data mining, artificial intelligence, bioinformatics, machine learning, and computational optimization.