Influence Minimization With Node Surveillance in Online Social Networks

Diffusion dynamics is the transfer of information from one node to another. Information diffusion has two goals: to maximize or to minimize the spread of information over the network. Attractive information such as innovations, awareness campaigns, branding, and advertising benefits people. However, awful information such as rumors, malicious viruses, pornography, and revenge content disturbs people. Negative information contributes to chaos; therefore, it must be blocked and inhibited from further diffusion. This motivates us to study the problem of Influence Minimization. New information alters the energy level, or entropy, associated with a node. Entropy quantifies the influence propagation rate across the network. This article proposes two reduction policies that reduce the influence of repulsive information through entropy. We validate the proposed system by considering the user response and surveillance on real-world networks.


I. INTRODUCTION
A Complex Network [1] is an explicit model for describing real-world complex systems. It helps to understand the structure and dynamics of complex systems ranging from technological networks to biological networks and from social networks to business networks [2], [3]. Social networks act as a medium for propagating information across the world. Examples of social networks are Facebook, Twitter, Instagram, and LinkedIn. Influence maximization occurs when positive information propagates along the length and breadth of the network and impacts most of the entities [4], [5]. On the other hand, influence minimization [6] aims to keep the spread of negative information as small as possible. Influence maximization and minimization are two sides of the same coin.
Influence Minimization attempts to block nodes that spread negative information in each social community, thereby minimizing the impact. Minimizing the propagation of awful information in a networked graph is challenging. The dynamic social network seems to lose energy. In this context, entropy is the energy needed to maintain a steady state [7]. An entropy rate indicates how randomly or quickly information is spread over the network. When a node receives the same message again, the message carries no new information, and hence the entropy value does not change. The change in entropy is the amount of new information a node receives, and new information is considered capable of inflating the spread. In general, the influence minimization approach cuts the edges that lead to disjoint nodes, no matter where the nodes are or how much capacity they have to carry the information [8]. Thus we can attain our primary objective of protecting society from untruthful facts and troubles.
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Guidi.
Influence minimization finds application in many domains such as social media, epidemiology, security, and public health. Here we introduce a novel formulation of the problem, which we term entropy-based Influence Minimization (e-IM).
The e-IM model selects m nodes that tend to propagate negative information. The nodes that propagate awful information are called malicious nodes. Temporarily blocking the set of malicious nodes partitions the network into sub-graphs. Isolating the infected nodes is an effective way of controlling the spread, because it confines the rumors to a limited area.
The surveillance nodes are the optimal subset of nodes that advertise the spreading of information. An outbreak of rumors or viruses requires the social network to track the spread as it would an epidemic outbreak. The optimal subset of surveillance nodes is a function of the energy level of an infected node. Hence, it affects the estimation of the source and the outbreak path. Furthermore, it helps to detect the spreading of awful information. Therefore, designing a minimization algorithm ensures the accuracy of the surveillance function and promotes social well-being by preventing unwanted information.
In this article, we study the influence minimization problem through node surveillance. We consider the structure of the social network as a multilayered network. We design this study as an information propagation function of the source(s). When a node receives information, its threshold increases, and we measure this increase as entropy. We generate the seed set (k) for maximizing the information explicitly. The e-IM algorithm proposed here maximizes the threshold function and forms a subset of nodes called malicious nodes (m). We design an m-k policy that blocks all common nodes from the graph to make the two sets disjoint.
The (m-k) policy and the ((m-k)-p) policy improve our algorithm to reduce the influence to a greater extent. In both cases, the principle of the disjoint subset helps to improve the e-IM problem and helps to attain a running time linear in the network size. To show the reliability and effectiveness of the e-IM algorithm, we conduct experiments with the help of a Python-based simulator on real-world networks. We observed that the two information blocking policies are effective in minimizing the influence spread. To summarize, the major contributions of the article are:
- We propose a novel influence minimization algorithm called e-IM.
- We formulate two policies, (m-k) and ((m-k)-p), for establishing the e-IM algorithm.
- We validate the algorithm on real networks to show the effectiveness of e-IM.
The rest of this paper is organized as follows. Section II reviews related work in the field of influence minimization and gives an insight into the literature. The problem and the proposed procedure to reduce the influence of information diffusion in social networks are described in Section III, its validation in Section IV, and the results in Section V. The article is concluded in Section VI.

II. RELATED WORK
Information propagated across social networks may be either reliable or malicious. Positive information is beneficial to the community and hence demands a wide propagation. On the other hand, malicious information needs to be minimized [9]. Entropy is the term used to quantify information diffused in the network that changes when nodes receive new information. Moreover, whether the message is highly informative or malicious can be identified based on the entropy value change. The entropy level differs while receiving messages like information, surprise, or uncertainty. For flaws or rumors, this entropy decreases while forwarded from one node to another.
Influence maximization and minimization are leading research topics under information cascades in the social network domain [10]. One formulation of the influence minimization problem reduces rumor disinformation by blocking nodes from a topic modeling perspective [11]. When undesirable events propagate in a social network, the approach reduces the size of the infected volume by blocking some nodes outside the infection area. This optimization problem uses heteroscedastic discriminant analysis (HDA), linear discriminant analysis (LDA), and divergence measures such as KL divergence to analyze the influence of topic modeling in the independent cascade model. The topic-aware influence minimization approach is based on betweenness centrality and the concept of out-degree [12], [13], [14]. We observed that this approach is better than any centrality-based approach, especially at the beginning of the contamination.
Targeted influence minimization cuts down the influence of negative information on particular user groups in social networks [15], [45]. For example, YouTube enables age-restricted content delivery to its users. The algorithm focuses on two cases of the influence minimization problem: the first is the impact of budget, and the second is robust sampling [16]. The algorithm provides an optimal solution and a greedy approximation. Neither is appropriate for large dynamic networks such as Online Social Networks (OSN). The robust sampling method applies to real social networks and guarantees an effective solution. The sampling-based solution covers the maximum area where the information spreads. However, the method is less efficient when incrementally added nodes carry awful information.
Another approach is the minimization of rumor spreading by considering user experience dynamically [17], [18]. The method disconnects a subset of the nodes to block malicious rumors. The minimization problem also takes into account the constraint of user experience. Based on the entropy threshold, the node propagating rumors is blocked temporarily for a fixed time-stamp known as the blocking time. If the blocking time of a user exceeds the threshold, the service of the network system declines. The node is reinstated later.
These assumptions lead to designing a problem based on survival theory and the maximum likelihood principle [19]. The popularity of a rumor and the user inclination towards it are analyzed using the Ising model [20]. The algorithm is inaccurate when the structure of the network becomes more complex. Each node is examined closely, and its surveillance can be incorporated to enhance the effectiveness of influence minimization. The approach suggested in [21] considers the structure of the social network as single-layered and omits the rumor spread at the source node. The influence minimization problem is studied in the Linear Threshold (LT) model [22], [23].
Studying Influence Minimization in online social networks is challenging. When real-world networks are considered as an arrangement of layers, the information diffusion across the layers has to be minimized. Few methods are available to minimize the influence spread in such a representation of complex networks. This motivates us to design an effective method that blocks widespread rumors and awful information.
VOLUME 10, 2022

III. ENTROPY-BASED INFLUENCE MINIMIZATION
This section defines influence minimization in complex real-world networks using the Linear Threshold (LT) model. The physiological changes associated with emotion change the energy level of nodes, and the LT model analyzes these emotional changes. The goal of the model is to minimize the influence of awful information. The activated nodes that spread the rumor or awful information are identified. The proposed model aims to minimize the number of activated nodes at the final stage of the information diffusion under node surveillance. In our survival model, we assume that the probability of an activated node is the sum of all probabilities of the previously activated nodes. The proposed methodology intends to block the nodes that were active in the previous state $t_{n-1}$.

A. PRELIMINARIES
We assume that the complex network is a multilayer network, mathematically defined as a quadruple $M = (V_M, E_M, V, \mathbf{L})$, where $V$ is the set of vertices and $\mathbf{L} = \{L_a\}_{a=1}^{d}$ is the set of elementary layers [24], [25], [26]. In $L_a$, $a$ indexes the layers and $d$ is the aspect of the network. The notation $\mathbf{L} = \{L_a\}_{a=1}^{d}$ stands for a stack of layers in the network described through $d$ aspects. A multilayer network is formed by integrating the set of all possible combinations of elementary layers $L_1 \times \cdots \times L_d$. The notation $V_M$ denotes the set of vertices in each layer of the network. The set of edges between layers $\alpha$ and $\beta$ is expressed in Eq. 2.
A matrix is the elementary representation of a graph in network science. Since a multilayer network is a stack of layers, an aggregation of the adjacency matrices of the individual layers is required to represent such a complex network [27]. We consider that the edges encompass pairwise connections between all possible combinations of nodes and layers; i.e., a node $u$ in layer $\alpha$ can be connected to any node $v$ in any other layer $\beta$. The interconnection between layers, i.e., the set of edges between $\alpha$ and $\beta$, is expressed as $E_{\alpha\beta}$, with the corresponding matrix representation $A$. The supra-Laplacian matrix is $\mathcal{L} = D - A$, where $D$ is the diagonal matrix whose elements are the degrees of the nodes, and $A$ is the adjacency matrix [28], [29]. Some properties, like coupling strength in multilayer networks and diffusion methods, are discovered with the help of the supra-Laplacian matrix [30].
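The aggregation of layer adjacency matrices and the relation $\mathcal{L} = D - A$ can be illustrated with a minimal sketch. It assumes an undirected, unweighted network in which every layer holds the same n nodes; the function names `supra_adjacency` and `supra_laplacian`, the index mapping, and the toy edge lists are illustrative choices and are not taken from the article.

```python
def supra_adjacency(layers, interlayer, n):
    """Build the supra-adjacency matrix of a multilayer network.

    layers     : list of per-layer edge lists [(u, v), ...] with 0 <= u, v < n
    interlayer : list of (u, alpha, v, beta) edges joining node u in layer
                 alpha to node v in layer beta
    n          : number of nodes per layer (assumed equal for all layers)
    Node u in layer alpha is mapped to row/column alpha * n + u.
    """
    size = n * len(layers)
    A = [[0] * size for _ in range(size)]
    for alpha, edges in enumerate(layers):
        for u, v in edges:
            i, j = alpha * n + u, alpha * n + v
            A[i][j] = A[j][i] = 1  # undirected intra-layer edge
    for u, alpha, v, beta in interlayer:
        i, j = alpha * n + u, beta * n + v
        A[i][j] = A[j][i] = 1  # undirected inter-layer edge
    return A


def supra_laplacian(A):
    """Return L = D - A, where D holds each node's degree on the diagonal."""
    size = len(A)
    L = [[-A[i][j] for j in range(size)] for i in range(size)]
    for i in range(size):
        L[i][i] += sum(A[i])
    return L


# Toy example: two layers of three nodes, each node coupled to its own replica.
layers = [[(0, 1), (1, 2)], [(0, 2)]]
interlayer = [(0, 0, 0, 1), (1, 0, 1, 1), (2, 0, 2, 1)]
A = supra_adjacency(layers, interlayer, n=3)
L = supra_laplacian(A)
```

Every row of the resulting matrix sums to zero, which is a quick sanity check on any Laplacian built this way.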

B. PROBLEM STATEMENT
Consider a directed multilayer network $G$, where $(V_{live})_0$ is the initial infected set of vertices and the optimal subset of infected nodes is $\sigma((V_{live}) \mid V)$. We aim to minimize the infection spread across the network by temporarily blocking $m$ infected nodes. The final infected volume, denoted $(V_{live})^*$, is computed from $\sigma((V_{live}) \mid V \setminus B)$, the number of infected nodes from $V$ after ignoring $B$, the set of blocked nodes. $(V_{live})$ is computed by a maximization function, and $B$ is computed based on the entropy change $\Delta H$.
A Linear Threshold (LT) model is scaled to a multilayer network [31]. The parameter $\theta$ denotes the threshold value, with $\theta_i \in (0, 1)$, and the function $W$ assigns an influence weight $W(i_\alpha, j_\beta) \in (0, 1)$ to each edge $(i, j) \in E_{\alpha\beta}$; the edges are either within a layer or between layers. The weighted supra-adjacency matrix represents the LT model. The in-neighbor set and out-neighbor set of each node are defined. An edge $(i_\alpha, j_\beta)$ represents that node $i$ in layer $\alpha$ influences node $j$ in layer $\beta$, where $i$ and $j$ belong either to the same layer or to different layers [32]. The diagonal elements are represented as $\mathrm{diag}\{[\theta_1, \theta_2, \ldots, \theta_n]\}$.
The propagation of negative information causes the social contagion process. People's interest in such information tends to shrink with time. When a node receives a rumor, its energy level increases [33]. The difference in energy level is measured in terms of entropy, computed from $\Delta T$, the change in information diffusion, $c$, a centrality constant, and $m_p$, the m-PageRank that shows the energy distribution capability [34], [35]. Once the node is activated, it transmits the information to its neighbors. Through the proposed work, we aim to identify the set of source seeds $S$ that is blocked when there is an entropy hike $\Delta H$. Any path from $i_\alpha$ to $j_\beta$ is to be blocked, so that further spreading of information is restricted. The resulting instance graph does not have a path from node $i$ to node $j$.
Assume that $(V_{live})_{t=0}$ is the active set of nodes at time $t = 0$, and that nodes change from the inactive state to the active state at every step. The function continues until all nodes reach the active state.
Definition 1: Given a multilayer network $G$, the set of seed nodes $S$ exists such that $S \subseteq V$, $S = \{S_t \cup S_{t+\Delta t}\}$, and $|S| \le k$ with $1 \le k \le n$; the set of target nodes is $V \setminus S$.
Definition 2: Given a multilayer network $G$, the set of target nodes $X$ exists such that $X \subseteq \{V \setminus S\}$ and $X$ has the highest $\Delta H$.
Influence minimization is achieved either by blocking the nodes from further transmission through the annealing process or by removing the link between infected nodes [36]. Removal of a link isolates a subset of nodes completely. The influence minimization process is depicted in Fig. 2.

Definition 3: Given a multilayer network $G$ and its set of active nodes, let $p$ be the set of influencers in the network and $m$ the set of infected nodes, with $|p| \le |m|$.
An edge removal is possible only when the infected node is an influencer [37], [38], [39]. If the blocked time exceeds a threshold, the node either leaves the social networks or intimates its status. The minimization process has to unblock or retain the link after a certain time-stamp. The time-stamp depends on how long the negative information lives in the system. The tolerance for latency is a significant issue in the influence minimization problem.

C. SOLUTION APPROACH
We propose a greedy algorithm for influence minimization. We find the optimal solution to either maximize or minimize the threshold. The algorithm works on the LT model. The entropy of the node changes when a piece of new information is received at the node. The energy of information spreading is computed by considering the entropy change and the node's position. Since our objective is to regulate the spread of information, we compute the influencing capability of a node.
The centrality metric helps to find the most influential node and the set of nodes that can influence others in the network. When nodes influence other nodes, energy distribution occurs. Equation 5 gives a node's energy to spread information to neighboring nodes. We apply the minimization technique to optimize the spread. Our proposed process blocks the significant nodes with the highest capability to share information further across the network. The result of the entropy change following the change in centrality on real-world networks is captured in Figure 1.
Since the intended network is a multilayer network, m-PageRank is the most suitable centrality metric for each node in different layers. We use m-PageRank as a metric for finding the most significant nodes for information diffusion. The formation of seed sets and malicious nodes depends on the propagation probability.
The algorithm picks an edge randomly with the weight function $p : E_M \to [0, 1]$. Let $(i, j)$ be a live edge with weight $W_{ij}$; all live edges form a subset $E_{live}$. Each node incident to an edge in $E_{live}$ has a threshold. Let the threshold of $j$ be $\theta_j$; the node $j$ is infected if $W_{ij} \ge \theta_j$. Let $B$ be the subset of nodes blocked for a duration $t$. Let $\sigma((V_{live}) \mid B)$ be the expected number of infected nodes that $(V_{live})$ can infect under the LT model. The e-IM function selects a subset of blocked nodes $B$ with a size equal to the size of the seed set. The e-IM function also minimizes $\sigma(V_{live} \mid \emptyset) - \sigma(V_{live} \mid B)$.
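The effect of a blocked set B on the spread can be sketched with a small deterministic simulation. The sketch assumes the standard Linear Threshold rule, in which a node becomes infected once the summed weight of its active in-neighbors reaches its threshold; the single-edge condition stated above is the special case of one active in-neighbor. Thresholds are fixed inputs rather than random draws, and the function name `lt_spread` and the toy weights are hypothetical.

```python
def lt_spread(weights, thresholds, seeds, blocked=frozenset()):
    """Count nodes activated under a Linear Threshold process.

    weights    : dict mapping a directed edge (i, j) to its weight W_ij
    thresholds : dict mapping node j to its threshold theta_j
    seeds      : initially infected nodes, (V_live)_0
    blocked    : nodes in B; they never activate and never forward influence
    """
    active = set(seeds) - set(blocked)
    changed = True
    while changed:
        changed = False
        incoming = {}
        # Sum the influence each inactive, unblocked node receives this round.
        for (i, j), w in weights.items():
            if i in active and j not in active and j not in blocked:
                incoming[j] = incoming.get(j, 0.0) + w
        for j, total in incoming.items():
            if total >= thresholds[j]:
                active.add(j)
                changed = True
    return len(active)


# Toy network; node labels could equally be (node, layer) pairs.
weights = {("a", "b"): 0.6, ("b", "c"): 0.5, ("a", "c"): 0.2, ("c", "d"): 0.9}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.6, "d": 0.5}

baseline = lt_spread(weights, thresholds, seeds={"a"})
with_block = lt_spread(weights, thresholds, seeds={"a"}, blocked={"b"})
reduction = baseline - with_block
```

In this toy case, blocking the single relay node b stops the cascade at the seed, so the difference between the unblocked and blocked spreads is three nodes.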
Algorithm 1 illustrates the minimization of the influence spread under the (m-k) policy. The m-PageRank centrality metric identifies the spread maximization and forms a subset of influential nodes in the given network.
• m-PageRank. It works on a biased random walk. A node's rank is computed by considering the interconnection strength, not only from the same layer but also from the neighboring layers. An inbound link into a node disturbs the steady state of each node and changes the centrality value. The versatility of PageRank allows it to be used for ranking by adding a personalization vector: layer rank and number of active spreaders in a layer. Thus m-PageRank accepts the links from the interlayer and computes them by considering the layer weight; the rank of each layer acts as a significant element in the computation of centrality. The strength of the inbound link makes the node more influential in the network. PageRank combined with intra-layer links makes the nodes distinguishable by their centrality values. The performance of this metric on the proposed algorithm is analyzed through experiments, and its analysis is shown in Section V.
Algorithm 1 ((m-k) policy, partial listing)
    for w ← 1 to n do
        for i ← 1 to k do
            if ...
    ...
    for j ← 1 to m do
        ...
    end for
    return (V*_live)
end procedure
The first policy states that the set of live nodes is formed by performing the set difference operation between live nodes and seed.
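The idea of biasing PageRank with a personalization vector derived from layer rank can be sketched as follows. This is a generic personalized PageRank computed by power iteration, offered only as an approximation of the idea; the article's exact m-PageRank formula is not reproduced here. The function name, the `layer_weight` values, and the convention of labelling nodes as (name, layer) pairs are all assumptions.

```python
def personalized_pagerank(out_links, personalization, damping=0.85, iters=100):
    """Power iteration for PageRank with a personalization (teleport) vector.

    out_links       : dict node -> list of successors (intra- or inter-layer);
                      every successor must also be a key of the dict
    personalization : dict node -> non-negative weight, e.g. the weight of
                      the node's layer; normalized internally to sum to 1
    """
    nodes = list(out_links)
    total = sum(personalization[v] for v in nodes)
    teleport = {v: personalization[v] / total for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        # Rank held by dangling nodes is redistributed via the teleport vector.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping + damping * dangling) * teleport[v]
                    for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


# Two layers (0 and 1); layer 1 is assumed to carry twice the layer weight.
out_links = {
    ("a", 0): [("b", 0), ("a", 1)],
    ("b", 0): [("a", 0)],
    ("a", 1): [("b", 1)],
    ("b", 1): [("a", 1), ("b", 0)],
}
layer_weight = {0: 1.0, 1: 2.0}
personalization = {v: layer_weight[v[1]] for v in out_links}
rank = personalized_pagerank(out_links, personalization)
```

The returned scores sum to one, so they can be sorted directly to pick the k highest-ranked nodes as a candidate seed set.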
The influence minimization problem is further optimized by the ((m-k)-p) policy. The ((m-k)-p) policy also takes a multilayer complex network as input and produces a subset of nodes to block in order to minimize the spread of negative information across the network. The algorithm finds the live nodes $V_{live}$ from the network graph. The m-PageRank centrality helps to form the set of seed nodes. The input to Algorithm 2 is the set of live nodes identified by Algorithm 1 for further optimization. The algorithm constructs a DFS tree and chooses a random node u. Any path, if one exists, is looked for to reach u through the p selected nodes. If any path exists, the node u is added to a previously defined set, P. The process continues to discover all the nodes in the live graph. The algorithm performs a traversal through the identified live nodes iteratively to identify the influencers in the network. When the live seed nodes are identified to be influencers, the algorithm blocks them temporarily and iterates further.
Algorithm 2 (((m-k)-p) policy)
    Initialize P ← ∅
    for i ← 1 to n do
        while j = n do
            construct a DFS_Tree
            consider an edge (u, v) ∈ E_live
            for p ← 1 to m do
                if (v, p) ∪ (p, u) ∉ E then
                    continue
                end if
            end for
            P* ← P ∪ u
        end while
    end for
    return (V*_live)
end procedure
Algorithm 1 has a running time complexity of $O(m^2 k)$, where $m$ is the cardinality of the live nodes and $k$ is the size of the seed set. The algorithm has $m$ iterations in the worst case.
The ((m-k)-p) policy obtains an optimized result over the (m-k) policy by blocking the influencers from the set of live nodes. The subset of blocked nodes is formed by the intersection of live nodes and seed using Algorithm 1. It is further optimized by the union of the blocked nodes with the set of influencers to minimize the diffusion across the network. An algorithm to incorporate the ((m-k)-p) policy is illustrated in Algorithm 2.
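The set operations behind the two policies can be written out directly. The sketch below follows the description above: the blocked set is the intersection of live and seed nodes, the remaining live set is their difference, and the second policy adds the influencers. Restricting the influencers to those that are live is an interpretation, and the function names and sample sets are hypothetical.

```python
def m_k_policy(live, seed):
    """(m-k) policy: block nodes that are both live and in the seed set."""
    blocked = live & seed      # intersection of live nodes and seed
    remaining = live - seed    # set difference: live nodes that stay active
    return blocked, remaining


def m_k_p_policy(live, seed, influencers):
    """((m-k)-p) policy: additionally block the live influencers."""
    blocked, remaining = m_k_policy(live, seed)
    blocked = blocked | (influencers & live)  # union with live influencers
    remaining = remaining - influencers
    return blocked, remaining


live = {1, 2, 3, 4, 5, 6}
seed = {2, 3, 9}
influencers = {3, 5}

blocked_mk, remaining_mk = m_k_policy(live, seed)
blocked_mkp, remaining_mkp = m_k_p_policy(live, seed, influencers)
```

With these sample sets, the second policy blocks one more node (node 5) than the first, and the blocked and remaining sets stay disjoint in both cases.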

IV. EXPERIMENT AND VALIDATION
The competency of our proposed e-IM algorithm is validated through experiments. We have developed a Python-based simulator using Python 3.10.0 and NetworkX 2.4 [40]. The proposed methodology is compared with major algorithms to show its effectiveness. A series of experiments is carried out using three real networks that are commonly used for social network analysis: NetScience [41], Hep-Th [42], and Facebook [43]. The dataset details are given in Table 1.
The proposed method is compared with two state-of-the-art algorithms, DRIMUX and DGMT.
• DRIMUX: It is a rumor spreading minimization model called dynamic rumor influence minimization with user experience [44]. It reduces rumor propagation by blocking a certain subset of nodes in the network. If the blocking period exceeds the time limit, the utility of the whole system is affected. DRIMUX considers the characteristics of the rumor and the experience of the user.
• DGMT: It applies a linear restriction on the seed set and finds the optimal solution for minimizing diffusion using an integer linear programming problem [45].

A. PERFORMANCE PARAMETER
The system's objective is to block k susceptible nodes from the set of live nodes by identifying the central nodes according to the m-PageRank ranking metric. The set of seed nodes ($k = |seed|$) is to be blocked from the set of live nodes, $m = |V_{live}|$. Let $p$ be the set of influencers that connect two or more connected components, where $|p| \le k$. The optimal set of live nodes is formed by deducting the $p$ nodes, which minimizes the spread rate by isolating graph components.
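One way to read "influencers that connect two or more connected components" is as the articulation points (cut vertices) of the live graph: nodes whose removal splits the graph. The sketch below finds them with a depth-first search using discovery and low-link times. Treating influencers as articulation points of an undirected graph is an assumption made for illustration, and the recursive search is intended for small graphs only.

```python
def articulation_points(adj):
    """Return nodes whose removal disconnects an undirected graph.

    adj : dict node -> set of neighbours (must be symmetric)
    """
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a cut vertex if no back edge from the
                # subtree of v reaches above u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex only if it has two or more children.
        if parent is None and children > 1:
            points.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points


# Two triangles (1-2-3 and 4-5-6) joined by the bridge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
influencers = articulation_points(adj)
```

Here nodes 3 and 4 are the only ones whose removal separates the two triangles, so they are the candidates for the set p under this reading.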

V. RESULT
The algorithms are evaluated on the aforementioned datasets. The minimization factor ($\mu$) is defined as the ratio of the number of blocked nodes to the number of live nodes. The algorithm gives an optimized solution with minimal spread when the ((m-k)-p) policy is applied. The higher the value of $\mu$, the better the minimization capability of the algorithm. The comparison of $\mu$ between e-IM, DRIMUX, and DGMT is shown in Table 2. Comparisons with DRIMUX and DGMT are shown in Fig. 3. We observed that as the number of blocked nodes increases, the number of nodes receiving negative information decreases. The algorithm runs iteratively by continuously blocking the infected nodes until no new susceptible nodes are found. Fig. 4 shows the correlation between blocked nodes and influence spread on Hep-Th and NetScience. As the number of blocked nodes increases, information diffusion decreases. If the seed set contains any influencer, the system is further improved by blocking such nodes for a while to maintain homogeneity.
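The minimization factor defined above is a simple ratio, sketched here for concreteness. Returning 0.0 when there are no live nodes is a convention chosen for this sketch, and the sample sets are hypothetical rather than drawn from the reported experiments.

```python
def minimization_factor(blocked, live):
    """mu = |blocked| / |live|; returns 0.0 when there are no live nodes."""
    if not live:
        return 0.0
    return len(blocked) / len(live)


live = {1, 2, 3, 4, 5, 6}
mu_mk = minimization_factor({2, 3}, live)       # two of six live nodes blocked
mu_mkp = minimization_factor({2, 3, 5}, live)   # three of six live nodes blocked
```

Blocking one additional influencer raises the factor from one third to one half in this toy case, which matches the stated reading that a higher value indicates stronger minimization.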
The experimental results on the real datasets were compared with existing algorithms to show the significance of the research. The comparison is tabulated in Table 2, which shows that the proposed minimization policies outperform the state-of-the-art influence minimization algorithms. Our experiments indicate that the proposed minimization policies are suitable for many social network applications.

VI. CONCLUSION
This article formulates the (m-k) policy to minimize the influence spread across the network. We identified the influencer nodes in the network through the m-PageRank metric. Blocking such nodes prevents awful information from spreading further. The algorithm is optimized further by adding the ((m-k)-p) policy, which isolates nodes from further propagation. A Python-based test-bed was developed and evaluated on real networks. The results support our claims. The work finds application in the current pandemic environment. The approach may be implemented as an effective method for isolating people from viral diseases.