Trace Me If You Can: An Unlinkability Approach for Privacy-Preserving in Social Networks

Privacy in social networks has been an active area of research due to the sharp increase in privacy concerns surrounding social networking services. Social networks contain sensitive information about individuals, which could be leaked through insecure data sharing. To enable secure social network data publication, several privacy schemes have been proposed that build upon the anonymity of users. In this paper, we incorporate unlinkability in the context of weighted network data publication, which has not been addressed in prior work. Two key privacy models are defined, namely edge weight unlinkability and node unlinkability, to prevent auxiliary information from being linked to a targeted individual with high probability. Two new schemes satisfying these unlinkability notions, namely MinSwap and δ-MinSwapX, are proposed to address the edge weight disclosure, link disclosure and identity disclosure problems in publishing weighted network data. The edge weight is modified based on minimum value change in order to preserve the usefulness and properties of the edge weight data. In addition, edge randomization is performed to minimally modify the structural information of a user. Experimental results on real data sets show that our schemes efficiently achieve data utility preservation and privacy protection simultaneously.


I. INTRODUCTION
In recent years, social networks such as Facebook, TikTok, WeChat, LinkedIn, Netflix, Google and Instagram have gained tremendous popularity, as these networks support a variety of attractive features and services that help to connect people. The rapid growth of such networks generates huge amounts of sensitive individual data, which are valuable for research and development. Network data are digitally collected, and the aggregated data are often published, shared or sold to third parties (such as analytics companies, marketing companies or commercial data brokers) for further analysis. Some applications of network data include analyzing the formation of communities [1], marketing and advertising [2], [3], opinion modeling [4], network information spread [5], criminal analysis [6], [7], shortest path analysis [8]-[11] and spanning trees [12], [13]. Privacy in the applications of ad hoc social networks [14] and non-ad hoc social networks [15] is also gaining public concern. There are laws and guidelines that restrict the types of publishable data, as well as agreements on the usage and storage of network data, such as the General Data Protection Regulation (GDPR) [16], [17] and the Personal Data Protection Act [18], [19]. However, a privacy breach could still occur if the data are not released under a strong privacy scheme [20].
The associate editor coordinating the review of this manuscript and approving it for publication was Kuo-Hui Yeh.

A. MOTIVATION
A typical data publishing scenario involves three parties: social network users, data publisher and data recipients, as shown in Figure 1. The data publisher is a trusted entity who collects information provided by the social network users and releases the collected data to third party recipients, such as research institutes, companies and public communities. The trust relationship is not transitive to the data recipients. Some data recipients (adversaries) are not honest and attempt to infer sensitive information of a user from the published data.
Therefore, a privacy breach could occur if personal information that a user intends to keep private is disclosed in published data to an entity that is not authorized to access or hold the information. In this paper, we address three privacy leaks, namely edge weight disclosure, link disclosure and identity disclosure.
The first scheme, called MinSwap, is proposed based on edge weight unlinkability to address edge weight disclosure by breaking the association between the edge weights and their values. The edge weight data are modified based on the idea of data swapping to fully preserve their statistical properties, including the distribution, mean, standard deviation and other statistics.
We propose another new scheme, called δ-MinSwapX, based on node unlinkability to address edge weight disclosure, link disclosure and identity disclosure simultaneously. The edge weight data are perturbed to other nearby values from the same data set to preserve the shortest path length. Randomization, which includes selective edge deletion and random edge and node addition, is deployed to prevent identity disclosure and link disclosure that rely upon edge weight and structural data as the adversary's background knowledge. Selective edge deletion allows the data publisher to minimize the distortion of the network structure, as the important edges can be well preserved in the published data. Randomness is inserted during the edge and node addition phases to increase the uncertainty of an adversary in reidentifying the true identity and link, regardless of the background knowledge the adversary may possess. This efficiently protects a user against privacy leaks, as auxiliary structural data provide little useful information about the true nodes in the published data.
In summary, we make the following contributions:
1) We define edge weight unlinkability and design a greedy algorithm, namely MinSwap, to generate anonymized data that resist edge weight disclosure.
2) We define node unlinkability and design a greedy algorithm, namely δ-MinSwapX, to generate anonymized data that resist identity disclosure, link disclosure and edge weight disclosure simultaneously.
3) We deploy data swapping, perturbation and randomization to minimally modify original network data to enhance the data utility preservation.
4) We provide a thorough analysis of the anonymization strength of the proposed work and present extensive experimental results on scalable real data sets to validate the efficiency of our schemes.
The rest of this paper is organized as follows. Section II discusses the research scope of our work. Section III gives a brief review of related work on privacy-preserving edge weight anonymization and structural anonymization schemes in social networks. Section IV defines two new privacy models, namely edge weight unlinkability and node unlinkability. Sections V and VI elaborate on the proposed schemes for anonymizing network data. Section VII presents an extensive evaluation of the proposed algorithms using scalable real data sets in terms of security, efficiency and utility. Finally, Section VIII concludes the paper.

II. RESEARCH SCOPE
In this section, we discuss the problem setting of a weighted network data publication. We present a non-directed and weighted network model. We also define the capability of an adversary and how the adversary would utilize the auxiliary background knowledge to attack the privacy of users. In addition, we elaborate upon the desired privacy and utility objectives of the data publication.
VOLUME 9, 2021

A. NON-DIRECTED AND WEIGHTED SOCIAL NETWORK
We present a non-directed and weighted graph $G = (V, E, W)$ using Figure 2 as an illustrative example. The nodes of the graph, $V = \{\nu_1, \nu_2, \nu_3, \ldots, \nu_n\}$, denote meaningful entities from the real world such as individuals, organizations and communities. An edge $e_{i,j} \in E$ is an association between two nodes $(\nu_i, \nu_j) \in V \times V$, such as friendship, partnership, co-authorship, co-workership and transaction between any two entities.
A non-directed graph consists of edges that do not have a direction (for instance, a mutual friendship). In a weighted network, each edge $e_{i,j}$ is associated with a weight $w_{i,j} \in W$ which represents the strength of connection between nodes $\nu_i$ and $\nu_j$, such as the communication frequency between individuals, degree of friendship, trustworthiness and transaction amount.
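The model above can be illustrated with a minimal Python sketch. The function names and the toy edge list are hypothetical and chosen only for illustration; the data are not the graph of Figure 2.

```python
def build_weighted_graph(weighted_edges):
    """Build an adjacency map {node: {neighbour: weight}} for a
    non-directed, weighted graph from (u, v, weight) triples."""
    adj = {}
    for u, v, w in weighted_edges:
        # Non-directed: store the edge under both endpoints.
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def degree(adj, node):
    """Number of edges connected to a node."""
    return len(adj[node])


def weight_sequence(adj, node):
    """Sorted list of the weights on the edges incident to a node."""
    return sorted(adj[node].values())


# Hypothetical toy data (assumption, not taken from the paper).
toy_edges = [(1, 2, 1), (2, 3, 4), (3, 4, 2), (1, 4, 2), (1, 3, 5)]
toy_graph = build_weighted_graph(toy_edges)
```

Storing each edge under both endpoints keeps degree and weight-sequence lookups simple, at the cost of duplicating every weight.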

B. ADVERSARY'S BACKGROUND KNOWLEDGE
An adversary requires some background knowledge to attack the privacy of a target user in the published network. In this paper, we assume that an adversary may possess partial or complete edge weight and structural information of some real-world target individuals.

1) Edge weight information. The value or weightage attached to an edge, which represents the intensity and strength of the connection.
2) Structural information. The information about the neighbours of the target node and how these neighbours are connected, which includes:
a) Degree of node A, $D_A$: The number of edges connected to node A.
We focus on these two types of background knowledge as they are commonly deployed in the current literature [23], [26], [27], [32], [37]-[39], [50]-[57]. It is less difficult to collect accurate edge weight information and the structural graph of a targeted individual [32], [55] than to collect other types of implicit information (such as eigenvector, betweenness and closeness centrality).

C. LINKAGE ATTACK
A linkage attack is one of the major privacy attack models in network data publication [15], where an adversary attempts to match auxiliary background knowledge obtained from external resources to the published data in order to learn useful information about a target victim. As shown in Figure 3, linkage attacks can be categorized as edge weight attacks and structural attacks according to the types of background knowledge summarized in Section II.B. As the published data consist of edge weight and structural data only, other auxiliary information (such as the node label and edge label) provides very little additional information about the nodes in the published data. Figure 4 shows a naively anonymized network of Figure 2, where the identities of all nodes are hidden. However, this is insecure: when an adversary learns that node X has two connections with edge weights 1 and 4, X's true identity (node 2 in Figure 2) is revealed. In some cases, edge weight and structural information are combined to reidentify the target. For instance, although nodes Y and Z in Figure 4 have similar degrees, 1-neighbourhood graphs and subgraphs (and are thus invulnerable to the respective structural attacks), these nodes can be distinguished if an adversary possesses additional background knowledge of edge weight data.
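The matching step of a linkage attack can be sketched as follows. This is a minimal illustration under stated assumptions: the pseudonymous graph `naive_published` is hypothetical (it is not Figure 4), and the two helper functions are names introduced here.

```python
def candidates_by_degree(published_adj, known_degree):
    """Structural attack: pseudonymous nodes whose degree matches."""
    return [n for n, nbrs in published_adj.items() if len(nbrs) == known_degree]


def candidates_by_weights(published_adj, known_weights):
    """Edge weight attack: pseudonymous nodes whose incident weights match."""
    target = sorted(known_weights)
    return [n for n, nbrs in published_adj.items()
            if sorted(nbrs.values()) == target]


# Hypothetical naively anonymized graph: labels replaced, weights unchanged.
naive_published = {
    "X": {"P": 1, "Y": 4},
    "P": {"X": 1, "Z": 2, "Y": 5},
    "Y": {"X": 4, "Z": 2, "P": 5},
    "Z": {"Y": 2, "P": 2},
}

# Degree alone leaves two candidates; the weight sequence isolates one node.
same_degree = candidates_by_degree(naive_published, 3)
unique_match = candidates_by_weights(naive_published, [2, 4, 5])
```

In this toy graph, two nodes share the same degree, yet knowledge of the incident edge weights reduces the candidate set to a single node, which is the situation described above.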

D. PRIVACY AND UTILITY GOALS
We consider the data publishing problem where a publisher attempts to release a secure anonymized version of $G$, denoted by $G'$, to serve a variety of data analyses.
The published data are said to be privacy-preserved if an adversary cannot infer the identity, link and edge weight values of a network user from the released data with high probability. The user's privacy is protected by limiting the ability of an adversary to infer this information, given that the adversary has full access to the published data $G'$ and some available background knowledge.
Given an arbitrary query to an original database and its anonymized database, the outputs of the query to both databases should be nearly identical, that is, the difference between the outputs should be less than a parameter. Utility-preserved anonymized data could be produced by minimally modifying the edge weight data and network structure so that the published data remain accurate and meaningful in the data mining process. In this paper, we assume that the published data are utilized for several analyses, which include statistical analysis, shortest path length analysis and network centrality analysis [21], [23], [24], [29], [32], [42].
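The utility requirement can be written compactly as below. The query symbol $Q$ and the tolerance $\epsilon$ are notation introduced here for illustration only; $G'$ denotes the published version of $G$.

```latex
\[
  \left| Q(G) - Q(G') \right| < \epsilon
\]
```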

III. RELATED WORK
In this section, we present a comprehensive literature review on topics related to privacy-preserving data publishing (PPDP) in social networks. In particular, we focus on relevant structural and edge weight anonymization schemes that address identity disclosure, link disclosure and edge weight disclosure in social networks.

A. STRUCTURAL ANONYMIZATION
Structural anonymization schemes modify the structure of a network to prevent identity disclosure that is based on structural information as adversary's background knowledge and to address link disclosure. The schemes can be grouped under three main classifications: graph modification, clustering based method and differential privacy.

1) GRAPH MODIFICATION
Graph modification anonymizes a network by adding, deleting or switching edges or nodes in the original graph. Although the schemes in the literature deploy similar techniques, these techniques are the basic tools used to generate different publishable graphs that satisfy different privacy and utility requirements. Graph modification can be further classified into randomization, which performs graph modification randomly, and the k-anonymization method, which performs graph modification to meet some desired constraints.

i) Randomization:
Different randomization approaches have been proposed to protect the identity and link privacy of a user [33]-[35], [37], [39]. In [33], a randomization scheme was proposed to preserve the spectrum of a graph (the set of eigenvalues of the graph's adjacency matrix), which is important to some topological properties of the graph. [34] focused on link privacy protection and presented a neighbourhood randomization scheme which randomizes an edge by restricting the randomization to the neighbouring nodes. Hence, the network structure could be preserved to a greater extent when the structural proximity of nodes is considered. A k-candidate anonymity [37] was proposed to tackle the node reidentification attack such that there exist at least k different nodes that match every structural query over the graph. [39] proposed using a Bernoulli distribution to modify the edges instead of random edge addition and deletion: a Bernoulli trial is deployed to determine which edge should be added to or removed from the network.
Randomization does not focus on the adversary's background knowledge, as the sensitive information of a user in the randomized graph is protected through the random process that modifies the graph. Thus, an adversary cannot utilize the structural information to reidentify an individual from the published data, as the association rules between the background knowledge and the sensitive information are obscured. Furthermore, the presence of a link cannot be inferred with high probability, as randomness is deployed in the published data. The confidence level in inferring the identity, link and sensitive information of a user is bounded by a privacy level, which is affected by the amount of randomization. The data utility after randomization can only be evaluated empirically.
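A generic form of edge randomization driven by Bernoulli trials can be sketched as follows. This is only an illustration of the idea and not the scheme of [39]; the function name and the probabilities `p_del` and `p_add` are assumptions introduced here.

```python
import itertools
import random


def bernoulli_perturb(nodes, edges, p_del, p_add, seed=None):
    """Return a randomized edge set for a non-directed graph.

    Each existing edge is deleted with probability p_del and each
    non-existing edge is added with probability p_add, using one
    independent Bernoulli trial per node pair.
    """
    rng = random.Random(seed)
    original = {frozenset(e) for e in edges}
    published = set()
    for pair in itertools.combinations(sorted(nodes), 2):
        e = frozenset(pair)
        if e in original:
            if rng.random() >= p_del:   # trial fails: keep the true edge
                published.add(e)
        elif rng.random() < p_add:      # trial succeeds: add a fake edge
            published.add(e)
    return published
```

Larger values of `p_del` and `p_add` increase the adversary's uncertainty about any single link but move the published structure further from the original, which is the privacy and utility trade-off noted above.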
ii) k-anonymization Method:
In the k-anonymization method, the proposed schemes modify the edges and nodes in the network to produce multiple indistinguishable nodes and edges with respect to certain privacy requirements. Different assumptions about the adversary's background knowledge lead to different privacy criteria.

a) Degree Based Anonymization:
A graph-anonymity model called k-degree anonymity was proposed in [52] to guarantee that there are at least k nodes with the same degree in the published graph. Meanwhile, $k^2$-degree anonymity [53] requires that for every node with an incident edge of degree pair $(D_A, D_B)$, there exist at least k-1 other nodes with the same degree pair in the published network. The degree of a node provides only limited structural information about a target victim. An adversary with such background knowledge is weak, as the degree information can be modified easily by adding or deleting nodes and edges from the original graph. Although the schemes are invulnerable to degree attacks, they are insecure against other stronger structural attacks.
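A check for k-degree anonymity can be sketched in a few lines. The sketch assumes the usual reading of the condition, namely that every degree value occurring in the graph is shared by at least k nodes; the function name and the adjacency-map input format are assumptions made here.

```python
from collections import Counter


def is_k_degree_anonymous(adj, k):
    """True if every degree that occurs in the graph occurs at least k times.

    adj maps each node to a collection of its neighbours.
    """
    degree_counts = Counter(len(nbrs) for nbrs in adj.values())
    return all(count >= k for count in degree_counts.values())


# A 4-cycle: every node has degree 2.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
# A 3-node path: the middle node is the only one with degree 2.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

The check only inspects the degree sequence, which reflects why this model offers no protection against adversaries holding richer structural knowledge.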

b) Neighbourhood Based Anonymization:
A k-neighbourhood anonymity model [54] was proposed to guarantee that there exist at least k indistinguishable nodes in the published graph, such that the 1-neighbourhood graphs of each of the k nodes are all similar. Moreover, [55] combined both conventional k-anonymity [58] and $\ell$-diversity [59] in anonymizing the social network data, such that the published graph satisfies k-neighbourhood anonymity and contains at least $\ell$ different node labels. Hence, it renders a stronger privacy level to the users.

c) Complete Structural Based Anonymization:
In [56], a k-automorphism was proposed to defend against reidentification attacks using the structural information of node, which include node's degree, 1-neighbourhood graph, subgraph and hub fingerprint. Hub fingerprint is the distance between a hub (a node with high degree exceeding the average degree of the network) and other nodes. The k-automorphism is a strong privacy model as it guarantees that there are at least k indistinguishable nodes in the network in terms of their structural information. Hence, an adversary cannot reidentify any individual with a confidence level of higher than 1/k using the structural information as background knowledge. In [38], a k-isomorphism was proposed to enhance the ability of k-automorphism in link protection. The scheme creates k isomorphic subgraphs through edge additions. Two graphs are said to be isomorphic if the graphs contain the same number of nodes and the nodes are connected in the same pattern.
The k-anonymization method incurs unnecessary information loss when the privacy parameter k is high, as more edge modifications are performed to achieve k indistinguishable nodes. This would significantly compromise the network properties as well as the data usefulness. If the privacy parameter is low, the schemes would provide insufficient privacy protection to the users. An optimized parameter is required to provide sufficient privacy and utility levels. However, the computation of the optimized privacy parameter is shown to be NP-hard in k-anonymization [60]. Therefore, producing k indistinguishable nodes with respect to the structural graph is practically infeasible due to the high cost and high computational complexity of finding an optimal solution, especially when the network is large.

2) CLUSTERING BASED METHOD
The clustering based method involves clustering nodes and edges into groups called supernodes and superedges, subject to some constraints on the characteristics of the nodes and edges [31], [45], [51], [57]. This approach achieves a high privacy level. However, it provides low utility, as the data are changed extensively and become useless for certain studies. The graph shrinks after anonymization and most of the local structures are difficult to analyze.

3) DIFFERENTIAL PRIVACY
Differential privacy [61] provides a formal privacy guarantee to the nodes of a database, regardless of the auxiliary information available to an adversary. It guarantees that an adversary in possession of the released results is not able to determine the existence of an individual in the original database. Therefore, the released results provide meaningful interpretations about the underlying population statistics of the database but obscure the presence of any individual. The notion of differential privacy was adapted to network data and several new privacy definitions were formalized.
In edge differential privacy [40], two graphs $G$ and $G'$ are said to be edge neighbours if $G'$ can be obtained from $G$ by deleting or adding k arbitrary edges from $G$. Hence, edge differential privacy guarantees that an adversary is not able to infer the existence of a particular edge in an original database $G$ with high probability. A local differential privacy model was proposed to preserve community structure information of a centralized and decentralized social graph with higher accuracy [41], [42].
In node differential privacy [40], two graphs $G$ and $G'$ are said to be node neighbours if $G'$ can be obtained from $G$ by deleting or adding a single node including all its adjacent edges from $G$. Hence, node differential privacy assures that an adversary is not able to infer the existence of a target node in an original database $G$ with high probability. Research on node differential privacy mainly focused on improving the accuracy of publishing the degree distribution of a graph [43], [44].
A degree-differential privacy graph generation model with field theory was presented to preserve the true edges of a graph [46]. Differential privacy was deployed to add Laplace noise to the nodes' degree. The edges are then reconstructed using the proposed field theory model. A fake edge between existing nodes is generated with high probability when the interaction force between the nodes is relatively large. Hence, the impact on the structure of the graph is reduced.
Meanwhile, a random matrix approach that achieves differential privacy was proposed to publish eigenvector of a graph [47]. Two Gaussian random matrices are added to the adjacency matrix of a graph to introduce a small amount of random projection and random perturbation. Then, the projected and perturbed matrices are released as published data.
A differential privacy scheme based on graph abstraction models was proposed [48], which utilizes the dK-1, dK-2 and dK-3 series. The dK-1 level represents the degree distribution, the dK-2 level is the joint degree distribution and the dK-3 level contains the number of wedges and triangles. A differentially private noise is added to the dK-2 level of an original graph to obtain a perturbed dK-2 level, which is then used to compute the corresponding new dK-1 and dK-3 levels. Hence, a new graph is generated by combining the structural information of the three dK levels.
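The first two dK levels mentioned above can be computed directly from an adjacency map, as in the following sketch. It assumes a simple non-directed graph (no self-loops or parallel edges) and does not reproduce the noise addition or graph reconstruction of [48]; the function names are introduced here.

```python
from collections import Counter


def dk1(adj):
    """dK-1 level: degree distribution as {degree: number of nodes}."""
    return Counter(len(nbrs) for nbrs in adj.values())


def dk2(adj):
    """dK-2 level: joint degree distribution as
    {(smaller degree, larger degree): number of edges}."""
    # Collect each non-directed edge exactly once.
    edges = {frozenset((u, v)) for u, nbrs in adj.items() for v in nbrs}
    counts = Counter()
    for edge in edges:
        u, v = tuple(edge)
        counts[tuple(sorted((len(adj[u]), len(adj[v]))))] += 1
    return counts


# Hypothetical 3-node path a - b - c.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```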
Differential privacy, randomization and clustering were combined to propose PBCN (Privacy Preserving Approach Based on Clustering and Noise) [49]. The nodes are clustered into groups based on the similarity of their degrees, followed by the addition of Laplace noise to the degree sequence of each group. A new graph is reconstructed using the perturbed degree sequences. However, the true nodes with low degree are likely to be deleted and a number of fake nodes are injected into the graph for fake edge addition.
Differential privacy is a strong model as it does not depend on the background knowledge of an adversary. However, the main drawbacks of the differential privacy model lie in the utility aspect. Randomization and k-anonymization methods release a privacy-preserved graph which can be studied in place of the original database, allowing a broader range of analyses. In contrast, the results released under a particular differential privacy model can only serve a specific query. Furthermore, differential privacy is highly inaccurate for queries with high sensitivity. The sensitivity of a query is the largest possible difference that one data point can make to the result of that query, for any data set. Instances of high sensitivity queries include the computation of clustering coefficient, path length distribution, betweenness distribution and closeness distribution.
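The effect of sensitivity on accuracy can be seen in a minimal sketch of the standard Laplace mechanism, where the noise scale is the sensitivity divided by the privacy budget. The function name and parameters are assumptions made here for illustration; the sketch is not taken from any of the cited schemes.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon.

    The noise is drawn as the difference of two independent exponential
    variates, which follows a Laplace distribution centred at zero.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


# Same random draws, different sensitivities: the noise grows in proportion.
low = laplace_mechanism(0.0, 1.0, 1.0, random.Random(7))
high = laplace_mechanism(0.0, 10.0, 1.0, random.Random(7))
```

With identical random draws, a tenfold increase in sensitivity yields tenfold noise, which is why high sensitivity queries are answered inaccurately.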

4) OVERALL DISCUSSION ON STRUCTURAL ANONYMIZATION
The privacy protection is guaranteed in the structural anonymization schemes above. However, important nodes and edges are not guaranteed to be preserved in the published data. Our work fills the gap by proposing a new randomization technique that incurs a lower utility loss. This is achieved by considering edge deletion based on the importance of edges in the original network, such that essential edges are preserved in the published data. Hence, this may preserve network centrality to a greater extent.

B. EDGE WEIGHT ANONYMIZATION
The edge weight anonymization schemes modify the edge weight data to prevent edge weight disclosure and identity disclosure, which can be categorized under three classifications: perturbation, differential privacy and generalization.

1) PERTURBATION
Perturbation is commonly used to modify the edge weight values to prevent the edge weights from being utilized for node reidentification while, at the same time, maintaining the shortest path characteristic between node pairs in the network. A pioneering work was presented in [21], which developed two privacy strategies for different types of network. The first is a Gaussian Random Multiplication Perturbation (GRMP) developed for dynamic networks, which adds Gaussian noise to the original edge weights to achieve shortest path preservation. However, the edge weight is power-law distributed in most real-life scenarios. Hence, the introduction of Gaussian noise may not guarantee the desired privacy and utility preservation of network data if the edge weights are not normally distributed. The second strategy is a greedy perturbation algorithm developed for static networks. However, as some edge weights are left unmodified, it is highly possible for an adversary to reidentify the correct individual by linking the edge weight information to the associated node. A linear programming model was proposed to anonymize the edge weight while preserving the properties of the graph that are expressible as linear functions of the edge weight [22]. The edge weight is modeled as a matrix and the anonymization is formulated as a linear optimization problem. However, there may be no feasible solution to the optimization problem for large systems, which undermines the practicability of this method in large-scale social networks.
A k-anonymous path privacy model was presented to protect the sensitive shortest path between two nodes in a weighted graph [23]. It prevents the true shortest path from being revealed by ensuring that there exist at least k shortest paths with the same shortest path distance. Thus, this limits the sensitive path disclosure to a maximum probability of 1/k. [24] extended the k-anonymous path model and modified the edge weights by considering network centrality such as PageRank and nodes' degree. As the edge weights can only be modified once, k-anonymous path privacy cannot be guaranteed when multiple node pairs are involved.
The k-anonymous path model was further improved in [25] with additional background knowledge of the nodes' degree on the shortest path. A $(k_1, k_2)$-shortest path privacy was proposed to ensure that there are at least $k_1$ indistinguishable shortest paths between the source and target nodes. In addition, for the nonoverlapping nodes on the $k_1$ shortest paths, there exist at least $k_2$ nodes that have the same degree and lie on more than one shortest path. There are more restrictions on the modification of edge weight, which lead to a greater information loss than that in [23].
The works of [26] and [52] were combined to propose a k-weighted-degree anonymous model [27]. The edge weights and nodes' degree were assumed to be an adversary's background knowledge. This model ensures that in the anonymized graph, there are at least k indistinguishable nodes having the same degree and the distance between the weight sequences of those nodes is within a predefined constant. After obtaining a new degree sequence that is k-degree anonymous using the algorithm proposed in [52], new edge weight values are assigned to the newly created edges. The edge weights are adjusted using a linear programming model based on three distance functions (absolute distance, relative distance and rate distance) to ensure that the generated edge weights are close in value to the other edge weights associated with the node.

2) DIFFERENTIAL PRIVACY
Differential privacy is a relatively new approach to modifying edge weight data by adding Laplace noise. It guarantees that the statistical properties of a database are insensitive to a change in a single record. Thus, the output probability of the same results will not change significantly, whether a record is in the data set or not. [28] deployed differential privacy to preserve the privacy of social recommendation. It first clusters the nodes into supergroups, then Laplace noise is added to the average edge weight of each supergroup to modify all edge weights.
In [29], differential privacy was applied to protect the edge weights of social networks and preserve shortest path. The scheme assumed edge weight sequence as an unattributed histogram. Barrels with the same count are merged into one group to reduce the amount of injected noise. Then, Laplace noise is added to edge weight to guarantee k-indistinguishability between groups so that the number of groups with the same amount of barrels is at least k.
A Variational Bayes-Weighted Network Differential Privacy (VB-WNDP) scheme was proposed with consideration of the structural role [30]. VB-WNDP establishes a probability model of weighted network through Variational Bayes. Noises are added to the parameters of the probability model instead of the edge weights to enhance the data accuracy.
Differential privacy is a strong privacy model as it makes no assumption about the background knowledge of any potential adversary. However, it generates inaccurate results to queries with high sensitivity (for example, kurtosis and correlation). Furthermore, the original data could be estimated with high accuracy from repeated queries.

3) GENERALIZATION
A generalization approach was deployed in [31], where the edge weights are recalculated as the ratio per total edge weight. Particularly, the new edge weight provides very little information about the original network. [32] adopted generalization to generalize the edge weights in an edge group into a range of values. For example, if edge weights 3, 4, 8 and 10 are categorized into a group, then range of values [3,10] is reassigned to the four edge weights. The larger the range, the higher the information loss.
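The range-based generalization in the example above can be sketched as follows. The function names are introduced here for illustration, and the width of the range is used only as a simple stand-in for information loss, not as the measure used in [32].

```python
def generalize_group(weights):
    """Replace every weight in a group with the group's (min, max) range."""
    low, high = min(weights), max(weights)
    return [(low, high)] * len(weights)


def range_width(interval):
    """Width of a generalized range; a larger width means less precise data."""
    low, high = interval
    return high - low


generalized = generalize_group([3, 4, 8, 10])
```

Grouping 3 and 4 separately from 8 and 10 would give two narrower ranges, which illustrates how the choice of groups controls the information loss.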

4) OVERALL DISCUSSION ON EDGE WEIGHT ANONYMIZATION
While anonymity has been addressed in the schemes presented, the aspect of unlinkability has not been considered. The schemes discussed do not consider the weight linkability property of network data, as the association rules between the original value and the published value are retained in the released data. Hence, the published data leak some useful information about a user and the injected noise could be estimated, provided the association rules are clearly defined to an adversary. Our work fills this gap in the literature by addressing unlinkability in a social network. Unlinkability requires that an adversary cannot sufficiently infer the association between the adversary's background knowledge and the sensitive information of a user. Therefore, no auxiliary edge weight data could be utilized to infer the original edge weight data and the identity of a user with high probability.
From the utility aspect, the aforementioned works were not designed to preserve the statistical properties of the original data, such as the distribution, mean and standard deviation. Our work contributes a new edge weight anonymization scheme that fully preserves the statistical properties of a data set based on the idea of data swapping.
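The reason data swapping preserves these statistics is that it only reassigns existing values to different positions, so the multiset of weights is unchanged. The sketch below illustrates this with a simple rotation; it is not the MinSwap algorithm, and the function names and sample weights are assumptions made here.

```python
import statistics


def rotate_swap(weights):
    """Reassign the existing values by shifting each one position to the left."""
    return weights[1:] + weights[:1]


def summary(weights):
    """Mean, population standard deviation and sorted values of a sequence."""
    return (statistics.mean(weights),
            statistics.pstdev(weights),
            sorted(weights))


# Hypothetical edge weight sequence.
original_weights = [3, 4, 8, 10, 4]
swapped_weights = rotate_swap(original_weights)
```

Any statistic that depends only on the multiset of values, such as the distribution, mean or standard deviation, is identical before and after such a reassignment.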

IV. EDGE WEIGHT UNLINKABILITY AND NODE UNLINKABILITY
In this section, we present the definition of some key terms and notation used in this work. We then define edge weight unlinkability and node unlinkability as two new privacy models in weighted social networks. Edge weight unlinkability prevents the inference of true edge weights of a user while node unlinkability prevents the linkability of edge weight information to its associated users in the original data.
A. NOTATION

B. EDGE WEIGHT UNLINKABILITY
We define edge weight unlinkability as below.

Definition 1 (Edge Weight Unlinkability):
Given an edge weight $w \in W$ with value $X$ in an original network $G$, $w$ is said to be unlinkable if $w$ is perturbed to $w'$ with value $Y$ in a published network $G'$, where $X \neq Y$ and there does not exist an injective function $f(Y) \to X$ that maps value $Y$ in the published data to value $X$ in the original data. An anonymized data set is said to be edge weight unlinkable if all edge weights in the perturbed network $G'$ satisfy edge weight unlinkability, such that the perturbed edge weight value does not equal the original edge weight value for all edge weights in the weight sequence and there does not exist an injective function $f$ between the original and published data. In mathematical notation, $w_p \neq w'_p$, $\forall w_p \in W$, $\forall w'_p \in W'$, $\forall p = 1, 2, 3, \ldots, m$ and $f(Y) \to X$ is not an injective function.
Here, we provide the proof that edge weight unlinkability addresses edge weight disclosure.

Proposition 1:
Suppose an adversary possesses full access to published data that satisfy edge weight unlinkability. Then the adversary cannot infer the true edge weights of an arbitrary node in the published data with high probability.
Proof: From the definition of edge weight unlinkability, the mapping function $f$ between the original data and the published data is not injective. This implies that there exist $w_1 \neq w_2$ with $f(w_1) = f(w_2)$. That is, different original edge weight values are mapped to the same published edge weight value. Furthermore, $w'_p \neq w_p$ implies that $w'_p$ could be selected from $W - \{w_p\}$. Hence, an adversary cannot sufficiently infer the relationship between the original data and the published data, as the association rule is not well-defined. In real-world scenarios, the size of $W - \{w_p\}$ is large. This prevents an adversary from making a definite estimate of the original edge weight with high probability. This completes the proof.
We further evaluate the probability of edge weight disclosure in section VII.
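Definition 1 can be checked mechanically on a pair of weight sequences. The following is a minimal Python sketch under one possible reading of the definition: every position must change, and the value-level relation between published and original values must not be one-to-one. The function name `is_edge_weight_unlinkable` and the sample sequences are illustrative assumptions, not part of the original scheme.

```python
def is_edge_weight_unlinkable(original, published):
    """Check Definition 1 under an assumed reading.

    original, published: equal-length sequences where position p holds
    w_p and w'_p respectively.
    """
    if len(original) != len(published):
        raise ValueError("weight sequences must have the same length")

    # Condition 1: w'_p != w_p for every position p.
    if any(x == y for x, y in zip(original, published)):
        return False

    # Condition 2 (assumed reading): the relation between published and
    # original values is not one-to-one, so no injective map recovers X from Y.
    originals_of = {}   # published value -> set of original values
    published_of = {}   # original value -> set of published values
    for x, y in zip(original, published):
        originals_of.setdefault(y, set()).add(x)
        published_of.setdefault(x, set()).add(y)
    many_to_one = any(len(xs) > 1 for xs in originals_of.values())
    one_to_many = any(len(ys) > 1 for ys in published_of.values())
    return many_to_one or one_to_many


swapped = is_edge_weight_unlinkable([1, 2, 3, 4, 2, 3], [2, 1, 2, 3, 3, 4])
shifted = is_edge_weight_unlinkable([1, 2, 3], [2, 3, 1])
unchanged = is_edge_weight_unlinkable([1, 2], [1, 3])
```

In this sketch a cyclic shift is rejected even though every value changes, because each published value still identifies exactly one original value.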

C. NODE UNLINKABILITY
We define node unlinkability as below.
Definition 2 (Node Unlinkability): Given a node $a$ with associated edge weight sequence $W(a)$ in $G$ and $W'(a)$ in $G'$, the node is said to be unlinkable if $\forall w \in W(a) \Rightarrow w \notin W'(a)$ and there does not exist an injective function $f$ mapping an original value $X$ to a new value $Y$. An anonymized data set is said to be node unlinkable if all nodes in $G'$ satisfy node unlinkability, such that $\forall v \in V \wedge \forall w \in W(v) \Rightarrow w \notin W'(v)$, and there does not exist an injective function $f(Y) \to X$ that maps value $Y$ in the published data to value $X$ in the original data.
The edge weight data are modified such that the associations between the edge weight values and its nodes are broken. Node unlinkability implies edge weight unlinkability but not vice versa. The proof is direct from the definition and is omitted. Hence, node unlinkability addresses edge weight disclosure.
Here, we prove that node unlinkability addresses identity disclosure that relies on edge weight as background knowledge. Particularly, we prove that there does not exist a mapping function that links associated edge weights to its corresponding node in the perturbed data as shown in proposition 2. Moreover, no linkage attack is possible to reidentify a target node in the published data with high probability using edge weight information as background knowledge, as proven in proposition 3.

Proposition 2:
Given that there exists a function $g$ that maps a set of edge weights $W(a)$ to a node $a$ in the original data, such a function $g$ does not exist in perturbed data that satisfy node unlinkability.
Proof: We prove by contradiction. Given that $\forall w \in W(a)$ are associated (mapped) to a node $a \in V$, we have a function $g$ such that $g(w) \to a$. The existence of the function $g$ indicates that node $a$ is associated with some edge weights $w$. First, we assume that such a function $g$ exists in the perturbed data $W'(a)$. However, based on the definition of node unlinkability, $\forall w \in W(a) \Rightarrow w \notin W'(a)$, we know that there does not exist a function $g$ that maps $w \in W(a)$ to the node $a$ in the perturbed data, as all the associated edge weights of node $a$ are modified such that $w \notin W'(a)$. We have arrived at a contradiction, so our original assumption (function $g$ exists in perturbed data that satisfy node unlinkability) cannot be true. This completes the proof.

Proposition 3: Given that an adversary possesses complete edge weight information of a known target node $a$ that exists in the network, the adversary fails to correctly reidentify node $a$ in published data that satisfy node unlinkability using a linkage attack.
Proof: There are only three possible outcomes of the reidentification. Let $b$ denote an arbitrary node in the network and $W'(b)$ the associated edge weights of $b$ that are published.
Outcome 1: There is no exact match of $W(a)$ and $W'(b)$. Thus, $\forall a, b \in V$, $W(a) \neq W'(b)$. ∴ No identity is inferred from the published data.
Outcome 2: There is at least one exact match of $W(a)$ and $W'(b)$. From the definition of node unlinkability, $W(a) \neq W'(a)$. This implies that $W'(b) \neq W'(a)$. ∴ Although there is an exact match, $a$ is not the true identity of node $b$.
Outcome 3: There is at least one partial match of $W(a)$ and $W'(b)$. Thus, $\exists w \in W(a)$ and $\exists w' \in W'(b)$ such that $w = w'$. However, from node unlinkability, we have $\forall w \in W(a) \Rightarrow w \notin W'(a)$, which implies that $w$ must not be an edge weight of node $a$ in the published data. Hence, if $w \in W(a)$ is an edge weight of node $b$ in the published graph, then nodes $a$ and $b$ must not be the same individual. ∴ Node $a$ cannot be reidentified by linking the edge weight information to the published data.
Therefore, although an adversary possesses a complete edge weight data of a known target node a, the adversary fails to correctly reidentify node a from the published data using a linkage attack. This completes the proof.
We further evaluate the probability of identity disclosure in section VII.
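The disjointness condition of Definition 2 and the exact-match linkage attack of Proposition 3 can be illustrated with a short Python sketch. It checks only the condition $w \in W(a) \Rightarrow w \notin W'(a)$ and omits the injectivity condition. The helper names and the toy six-node data are hypothetical.

```python
def is_node_unlinkable(original, published):
    """True if no original weight of a node survives among its published weights.

    original, published: dicts mapping node -> iterable of incident edge weights.
    """
    return all(
        set(weights).isdisjoint(published.get(node, ()))
        for node, weights in original.items()
    )


def exact_match_attack(known_weights, published):
    """Nodes whose published weight set equals the adversary's background knowledge."""
    target = set(known_weights)
    return sorted(node for node, weights in published.items() if set(weights) == target)


# Toy path graph a-b-c-d-e-f; weights are illustrative.
original = {"a": [1], "b": [1, 2], "c": [2, 3], "d": [3, 4], "e": [4, 5], "f": [5]}
published = {"a": [3], "b": [3, 4], "c": [4, 1], "d": [1, 2], "e": [2, 3], "f": [3]}

unlinkable = is_node_unlinkable(original, published)
# The adversary knows W(d) = {3, 4} and looks for an exact match (Outcome 2).
suspects = exact_match_attack(original["d"], published)
```

Here the attack returns node b rather than the true target d, which corresponds to Outcome 2 of the proof: an exact match exists, but it does not identify the target.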

V. MinSwap
In this section, we design MinSwap which deploys edge weight unlinkability model to address edge weight disclosure. This scheme consists of edge weight modification via data swapping to preserve the edge weight distribution and therefore its statistical properties.

A. MinSwap ALGORITHM
MinSwap consists of two main phases, namely possible set determination and candidate selection. The edge weight data is perturbed by exchanging edge weight values among data tuples to achieve privacy preservation. Data swapping is a value-invariant method where the edge weight distribution is not changed during program execution; only the edge weight sequence is altered. It preserves the univariate statistics such as mean, variance and distribution, and lower-order multivariate statistics such as covariance, reasonably well. A pseudo algorithm of MinSwap is presented in Algorithm 1.

Algorithm 1 MinSwap
4 if $N(Z_p) \neq \emptyset$, then
5   Calculate Prox($w'$) for each $w' \in Z_p$.
6   Determine max of Prox($w'$).
7   Find corresponding $w'$.
8   Update $N(Z_T)$.
9 else
10   Select a value $w'$ from $Z_p$ randomly.
11   Record $w'$ in $U(Z_T)$.
12 Assign the value $w'$ to $w'_p$.
13 return $W'$.

Possible set determination (line 1-3 in Algorithm 1):
During the first phase, possible candidates that satisfy edge weight unlinkability are determined from the original data. We denote $Z_T$ as the universal set containing all distinct values of $W$ and $N(Z_T)$ as the complete frequency set recording the frequency of values in $W$. The set $Z_T$ is separated into $Z_p \cup \{w_p\}$. The new edge weight (qualified candidate) is selected from the possible set $Z_p$ to ensure that the anonymized data satisfy edge weight unlinkability.

Candidate selection (line 5-7 in Algorithm 1):
The new edge weight, w is selected from Z p based on the maximum of the proximity function (Prox(w)), which we define as: This function serves two purposes: it allows a nearer value to be selected (a lower information loss) and over the iterations in greedy Algorithm 1, one value could be mapped to different new values (injective function does not exist). This increases the uncertainty of an adversary in inferring the original edge weight value.
Example: An example is demonstrated using data in Figure 2. The original data, W, Z T and N (Z T ) are shown in Table 2  original data distribution. However, randomness is applied to provide a higher privacy protection. U (Z T ) is utilized to record the frequency of the overused w. This scenario only occurs when there is a dominant value in the original data (> 50% of the edge weight data). Edge weight data are big data with high diversity, which ensure the availability of Z T . Therefore, the existence of a solution for Algorithm 1 is guaranteed, regardless of the types of distribution of the original data.
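The two phases of MinSwap can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation. In particular, the proximity function is assumed here to be Prox(w') = −|w' − w_p|, so that the nearest unused value wins; the names `min_swap`, `remaining` and `overused` are illustrative stand-ins for the procedure, $N(Z_T)$ and $U(Z_T)$.

```python
import random
from collections import Counter


def min_swap(weights, seed=0):
    """Greedy swap-based perturbation of a weight sequence (illustrative)."""
    rng = random.Random(seed)
    remaining = Counter(weights)   # N(Z_T): unused frequency of each value
    overused = Counter()           # U(Z_T): values reused beyond their frequency
    distinct = sorted(remaining)   # Z_T: all distinct values
    published = []
    for w in weights:
        # Possible set Z_p: every distinct value except w_p itself.
        possible = [z for z in distinct if z != w]
        if not possible:
            raise ValueError("all weights are identical; no candidate exists")
        available = [z for z in possible if remaining[z] > 0]
        if available:
            # Assumed proximity: prefer the value closest to w_p.
            choice = max(available, key=lambda z: -abs(z - w))
            remaining[choice] -= 1
        else:
            # No unused candidate left: pick randomly and record the overuse.
            choice = rng.choice(possible)
            overused[choice] += 1
        published.append(choice)
    return published, overused


original = [1, 2, 3, 4, 2, 3]
published, overused = min_swap(original)
```

For this small input every published value differs from the original one at the same position, and the published sequence is a permutation of the original, so the mean and variance are unchanged. With a dominant value in the data, the random branch is taken and the permutation property no longer holds exactly.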

B. DISCUSSION
It is unlikely that an adversary can reverse-engineer the published data and discover the true edge weight, as there does not exist an injective mapping between the published data and the original data. The association rules between the original data and the published data are not well-defined. From the utility aspect, the statistical properties of the edge weight data are highly preserved, as the anonymized data are a permuted version of the original data. This scheme is designed for networks where the identities of nodes are public knowledge but the edge weight values are sensitive. No node anonymization is required, so more utility can be preserved. Examples include research communities (ResearchGate and DBLP) and professional sites (LinkedIn and JobStreet).

VI. δ-MinSwapX
In this section, we design another scheme based on node unlinkability to address edge weight disclosure, link disclosure and identity disclosure simultaneously. This scheme consists of edge weight modification using perturbation and structural modification using randomization.

A. EDGE WEIGHT MODIFICATION
Perturbation is deployed to prevent edge weight disclosure and node reidentification using edge weight data as the background knowledge. It consists of two main phases, namely candidate set determination and minimal candidate selection.

Candidate set determination (Algorithm 2):
The universal set that contains all the edge weight values ($Z_T$) is separated into two mutually exclusive sets, namely the candidate set ($S$) and the associated edge weight set ($W(a \cup b)$). The candidate set collects all the possible candidates, such that a candidate $s \in S$ is not associated with nodes $a$ and $b$. The candidate set is given by $S(a, b) = \{s \mid s \in Z_T - W(a \cup b)\} = Z_T \setminus W(a \cup b)$. This ensures that $S$ contains all the qualified candidates that satisfy edge weight unlinkability and node unlinkability, as shown in proposition 4.

$W(b)$.
4 Find candidate set, $S = \{s \mid s \in Z_T - W(a \cup b)\}$.

Minimal candidate selection (line 5 in Algorithm 3):
The candidate is selected based on the least value change to guarantee minimum information loss, as shown in proposition 5. The new edge weight is computed as $w'_p = \min |s - w_p| + w_p$, for $\forall s \in S$.

Algorithm 3 Edge Weight Modification
Input: The original edge weight data, $W(a,b)$
Output: The perturbed edge weight data, $W'(a,b)$
1 Determine the weight sequence, $W$.
2 Find candidate sets for all edge weights.
  {Call Algorithm 2 to determine the candidate set, $S$.
5 Assign $w'_p = \min |s - w_p| + w_p$, for $\forall s \in S$.}
6 return $W'$.

Proposition 4: Anonymized edge weight data postimplementation of Algorithm 3 satisfy node unlinkability.
Proof: From Definition 2, node unlinkability requires $\forall w \in W(a) \Rightarrow w \notin W'(a)$. Given that $Z_T = S \cup W(a \cup b)$, where $S$ and $W(a \cup b)$ are mutually exclusive, this implies $\forall w \in W(a) \Rightarrow w \notin S$. Since the new edge weight is selected from $S$ only, we have $w' \in W'(a) \subseteq S$, which means that $\forall w \notin S \Rightarrow w \notin W'(a)$. ∴ $\forall w \in W(a) \Rightarrow w \notin W'(a)$. Hence, node unlinkability is satisfied, which further implies edge weight unlinkability. This completes the proof.

Proposition 5: The information loss due to Algorithm 3 is minimum.
Proof: The information loss occurs during minimal candidate selection. At each iteration, the information loss is $|w'_p - w_p|$. This is the noise injected. The total information loss is $\sum_{p=1}^{m} |w'_p - w_p|$, where $m$ is the number of original data. Since $w'_p$ is selected based on the lowest value change ($\min |s - w_p|$), the total information loss due to Algorithm 3 is minimum. This completes the proof.
Example: Using the same data set from Figure 2, an example is demonstrated using Algorithm 3 in Table 3.
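Candidate set determination and minimal candidate selection can be sketched together in Python. This is a hedged illustration, not the authors' code: the new weight is taken to be the candidate $s \in S$ nearest to $w_p$ (ties broken toward the smaller value, which is an assumption), and the function name `perturb_edge_weights` and the toy path graph are hypothetical.

```python
def perturb_edge_weights(edges):
    """edges: dict mapping an undirected edge (a, b) to its weight w_p.

    Returns a dict with the perturbed weight w'_p for each edge, or None
    when the candidate set is empty (no new weight is published).
    """
    # W(v): weights of all edges incident to node v.
    incident = {}
    for (a, b), w in edges.items():
        incident.setdefault(a, set()).add(w)
        incident.setdefault(b, set()).add(w)

    universe = set(edges.values())  # Z_T: all distinct edge weight values
    perturbed = {}
    for (a, b), w in edges.items():
        # Candidate set S(a, b) = Z_T \ W(a ∪ b).
        candidates = universe - incident[a] - incident[b]
        if candidates:
            # Minimal candidate selection: least value change |s - w_p|.
            perturbed[(a, b)] = min(candidates, key=lambda s: (abs(s - w), s))
        else:
            perturbed[(a, b)] = None
    return perturbed


edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3, ("d", "e"): 4, ("e", "f"): 5}
new_weights = perturb_edge_weights(edges)
total_loss = sum(abs(new_weights[e] - w) for e, w in edges.items())
```

Because each new weight is drawn only from $S(a, b)$, no published weight of a node coincides with one of its original weights, which is the property established in proposition 4.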

Discussion:
The perturbed data satisfy both edge weight unlinkability and node unlinkability. A user cannot be retraced using the edge weight data of the targeted victim, as the associations between the edge weights and the nodes have been broken completely. From the utility perspective, we have minimally changed the data so that no excessive utility is lost due to the edge weight modification. If there does not exist a candidate set for a particular edge weight, then no new edge weight is published for that particular edge weight to secure the privacy of the user. However, this is not common in a scalable network that contains a high diversity of edge weight values.

B. STRUCTURAL MODIFICATION
Randomization is deployed to modify the network structure to prevent node reidentification using structural data as background knowledge and to prevent link disclosure. It consists of four phases, namely edge deletion, fake node addition, fake edge addition and fake edge weight addition. A pseudo algorithm for structural modification is presented in Algorithm 4.

Edge deletion from existing edges (line 1-5):
Most of the prior work modified the graph based on the network centrality of nodes, which measures the influence of the nodes in a graph [24], [27], [31], [34]. In our work, the graph is modified based on the edge betweenness, which represents the importance of an edge in a graph. Edge betweenness is the number of shortest paths between pairs of nodes that run along an edge. An edge should not be removed if the edge is important in the network (high edge betweenness). A user-defined parameter δ is selected to remove a fraction δ of the existing edges in ascending order of edge betweenness. A checker C is defined to record the change of structural information. If an edge has been removed, the associated nodes are removed from C.

Algorithm 4 Structural Modification
Input: The perturbed edge weight data, $W'(a,b)$
Output: Perturbed data that resist edge weight disclosure, link disclosure and identity disclosure
1 Define a parameter, δ, where 0 ≤ δ ≤ 1.
2 Edge betweenness is calculated for each edge using original edge weight data.
3 Denote C as a checker set containing all nodes in the network.
4 Remove δ of the existing edges according to the ascending order of edge betweenness.
5 Record the edge (a, b) that has been removed.
  * Remove the corresponding nodes a and b from C.
6 Add $n_{Add}$ fake nodes d into the network.
  * $n_{Add} = \max\left(\frac{|C|}{D_{mode}}, 1\right)$, where $D_{mode}$ is the mode of degree. If there is more than one mode, choose the maximum mode.
7 while C ≠ ∅,
8   Add edges between the remaining nodes c in C and the fake nodes d randomly until C is empty.
  * Randomly select $D_{mode}$ of the remaining nodes c from C to form edges with a fake node d.
9   Record the edge (c, d) that has been formed.
  * Remove the corresponding nodes c from C.

Fake node addition (line 6): Some fake nodes d are added into the network to conceal the existence of a target victim in the published data. We determine the minimum number of fake nodes required to be added as $n_{Add} = \max\left(\frac{|C|}{D_{mode}}, 1\right)$. After line 5 of Algorithm 4, |C| represents the number of nodes with intact structural information. Hence, fake edges are formed to modify the structural information of the intact nodes. By considering the degree mode, $D_{mode}$ (the degree that appears most often) in the original network, all the fake nodes are likely to possess approximately the same degree as the majority of nodes in the network (the presence of fake nodes is hidden). Furthermore, important nodes are preserved in the anonymized network, as no true node is removed from the network.
Fake edge addition from non-existing edges (line 7-9): D mode of the remaining nodes c are selected randomly from C to form edges with a fake node d until C is empty. An empty C indicates that all nodes' structural information have been changed. Due to the randomness property of the newly added edges, an adversary could not confidently infer the structural properties of the target victim from the published graph. Furthermore, the structure of the graph is changed without compromising the important nodes and edges in the original network.
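The edge deletion, fake node addition and fake edge addition phases can be sketched in Python as below. This is a minimal sketch under stated assumptions: edge betweenness scores are taken as precomputed input, the number of deleted edges is rounded down to ⌊δm⌋, and $|C| / D_{mode}$ is rounded up so that every remaining node receives a fake edge. All function names and the toy scores are hypothetical.

```python
import math
import random
from collections import Counter


def delete_low_betweenness_edges(betweenness, delta):
    """Split edges into (removed, kept), removing the floor(delta * m) lowest scores."""
    ordered = sorted(betweenness, key=lambda e: (betweenness[e], e))
    k = math.floor(delta * len(ordered))  # rounding rule is an assumption
    return ordered[:k], ordered[k:]


def degree_mode(degrees):
    """Most frequent degree; the maximum mode is chosen when there are several."""
    counts = Counter(degrees)
    top = max(counts.values())
    return max(d for d, c in counts.items() if c == top)


def add_fake_nodes(checker, d_mode, seed=0):
    """Connect every node left in the checker set C to a fake node."""
    if d_mode < 1:
        raise ValueError("degree mode must be at least 1")
    rng = random.Random(seed)
    remaining = list(checker)
    rng.shuffle(remaining)  # random assignment of real nodes to fake nodes
    n_add = max(math.ceil(len(remaining) / d_mode), 1)  # ceiling is an assumption
    fake_edges = []
    for i in range(n_add):
        group = remaining[i * d_mode:(i + 1) * d_mode]
        fake_edges.extend((c, f"fake_{i}") for c in group)
    return n_add, fake_edges


# Toy path graph a-b-c-d-e with illustrative betweenness scores.
scores = {("a", "b"): 4.0, ("b", "c"): 6.0, ("c", "d"): 6.0, ("d", "e"): 4.0}
removed, kept = delete_low_betweenness_edges(scores, delta=0.5)
checker = {"a", "b", "c", "d", "e"} - {v for edge in removed for v in edge}
mode = degree_mode([1, 2, 2, 2, 1])
n_add, fake_edges = add_fake_nodes(sorted(checker), mode)
```

In this toy run the two low-betweenness end edges are removed, only node c keeps its original structure, and a single fake node is attached to it. Fake edge weights (lines 10-12) are not covered by this sketch.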
Fake edge weight addition (line 10-12): New weight is inserted to each fake edge, which is selected from candidate set of the original node c so that it satisfies node unlinkability, such that w ∈ S(c). Furthermore, to minimize the influence of these fake edges on the shortest paths of the original network, the new edge weight must satisfy one of the following conditions:

1) If there exists a set of values such that
δ-MinSwapX algorithm: The pseudo algorithm of δ-MinSwapX is a combination of Algorithms 2, 3 and 4. Following from Table 3, Figures 2 and 5 show the network before and after edge weight modification, while Figures 6, 7 and 8 show the network representation after each phase of structural modification, using δ = 0.25.

Discussion:
The overall edge modification algorithm is flexible and random. During the edge deletion process, a parameter δ is defined to determine the portion of edges in the network that should be removed. Important edges could be preserved as the edges are deleted according to the influence of edges (edge betweenness). During the edge addition process, the new edges are randomly inserted between the fake nodes and existing nodes in the original network to conceal the true nodes and edges. The δ is used to control the balance between privacy level and utility level. Higher value of δ implies more deletions of true link and thus the probability of link disclosure is reduced. This further implies the larger amount of distortion on the network structure.
Regardless of the value of δ defined, the structural information of all real nodes is modified post-implementation of δ-MinSwapX, including the degree of node, degree sequence, subgraph and 1-neighbourhood graph of the real nodes. In addition, the edge weight values of the fake edges do not affect the shortest paths in the original network, as the assigned values are slightly larger than or equal to the edge weights involved in that particular shortest path. Hence, the background knowledge of an adversary cannot be mapped to the published data for node reidentification, as the edge weight and structural information are unlinkable and randomized.
We assume the parameter δ is available to both data miners and attackers [33], [37]. Although δ is known, the identity and link of a user is still protected through the edge randomization process. Note that if δ = 1, the published graph is a null graph (graph with no edge) with n+1 nodes, which clearly contains almost no information about the original graph. We intend to have δ to be a small value.
δ-MinSwapX is a scheme designed for networks where the identity, the links and the edge weight data of a user are sensitive information. Edge weight anonymization and structural anonymization are applied simultaneously to fully protect a network user. Examples of such networks include healthcare networks (Doctor On Demand, HelloMD and LiveHealth Online) and social media networks (Facebook, Twitter and Instagram).

VII. SECURITY AND PERFORMANCE ANALYSIS
In this section, we analyze the security level of our schemes theoretically and evaluate the performance of our schemes on three real data sets. All the experiments were conducted on a machine running the Microsoft Windows 10 Home Single Language operating system, with an Intel Core i7-8750H 2.20 GHz CPU and 16 GB RAM. All the algorithms were implemented in Python 3.7.

A. DATA SETS
Three real data sets are used in the experiments to study the performance of our schemes on the data quality in terms of security, efficiency and utility. We extracted subsets of Bitcoin Alpha, Facebook Artist and Youtube to validate the proposed schemes. All the data considered were weighted and non-directed. The details of the data sets are shown in Table 4. The data size is comparable to or larger than that of other relevant work [29], [32], [41]- [43].

B. SECURITY EVALUATION
In this paper, we proposed two schemes that address edge weight disclosure, link disclosure and identity disclosure. We compare our work with some related literature discussed in section III in terms of the privacy components and summarize the comparisons in Table 5. We further analyze the privacy level rendered by our work in proposition 1, 2, 3, 6, 7 and 8.
In the previous work shown in Table 5, anonymity of edge weight is achieved through data perturbation, k-anonymization, differential privacy and generalization, such that an edge weight cannot be reidentified with high probability. As shown in our gap analysis, these schemes do not provide unlinkability for the edge weight data. In contrast, our schemes provide anonymity and unlinkability, such that there does not exist an injective mapping between the original and the published data. The edge weight protection rendered in our schemes is higher since the distinct values in network data are diverse, as shown in proposition 6. All the edge weights are modified in MinSwap, providing a certain amount of node protection to the users.
In addition, δ-MinSwapX is proposed to provide additional link and node protection. Randomization is deployed to randomly modify the structural information according to the edge betweenness. Random fake edge addition hides the presence of true links in the published graph, and thus prevents link disclosure, regardless of the background knowledge an adversary may possess, as shown in proposition 7. Furthermore, fake node addition hides the true nodes in the published data. The number of fake nodes and fake edges added is affected by the original data itself, which cannot be inferred by an adversary with a high confidence level. Since the edges are randomized, the change of structural information is randomized. An adversary cannot simply map the auxiliary structural information to attack the published data to infer the link and identity of a user. Moreover, node unlinkability further guarantees that the edge weight information cannot be linked to its corresponding user in the published data, as shown in proposition 3. The probability of identity disclosure is proven in proposition 8.
$(1-\delta)m + m_{Add}$. This is the same privacy level rendered in [34]. This completes the proof.
values with the reconstructed edge weights, in the worst case, there is an exact match.
c) Reconstruct the original structural graph from the published data and deploy linkage attack: Every node is subjected to either edge deletion or edge addition. The change of degree of node $A$ is $[-n_1, 0) \cup (0, n_2]$. Given that an adversary has a confidence level of $\sigma$ that a node undergoes edge deletion, the probability of inferring the correct degree is $P(D_A) = \frac{\sigma}{n_1} + \frac{1-\sigma}{n_2}$. If there are $n_3$ nodes with $D_A$ in the published data, $P(A) = \frac{\sigma(n_2 - n_1) + n_1}{n_1 n_2 n_3}$. This completes the proof.

C. EFFICIENCY EVALUATION
Figure 9 demonstrates the running time of both MinSwap and δ-MinSwapX for δ = 0, 0.2, 0.4, 0.6 and 0.8. When δ = 0, only edge weight modification is applied on the data. When δ = 1, a null graph is obtained and hence the time taken is zero. Thus, 1-MinSwapX is not considered in the evaluations. The times taken for Bitcoin Alpha under both schemes are less than 52.5 s.
The time complexity (also commonly referred to as computational overhead [62]) of MinSwap is $O(m)$, which is lower than that of the other models in Table 6. The linear complexity implies the feasibility of MinSwap in anonymizing scalable data. The running time increases linearly with the data size. On the other hand, δ-MinSwapX has a higher time complexity of $O(n^2 \log n + mn)$ due to the heavy computation of edge betweenness [63] during the structural modification. Nevertheless, the time complexity of δ-MinSwapX is lower than that of [27] and comparable to that of [46], [49]. Hence, δ-MinSwapX is usable for real-world implementation.

D. UTILITY EVALUATION
We study a set of statistical aggregate queries, shortest path analysis and several important graph metrics, which were similarly adopted in [21], [24], [32], [37], [38], to validate the utility of the anonymized graph.

1) STATISTICAL ANALYSIS
The impacts of MinSwap and δ-MinSwapX on the statistical properties of edge weight data are measured using the Kolmogorov-Smirnov test and statistical aggregate queries. The Kolmogorov-Smirnov test at a significance level of 0.05 is utilized to verify the distribution preservation. As shown in Figures 10, 11 and 12, the distributions of Bitcoin Alpha, Facebook Artist and Youtube are preserved at a 100% rate under MinSwap, as all the original data are inter-swapped within the same data set. However, δ-MinSwapX does not preserve the distribution of all three data sets, as the edge weight data are modified to satisfy stronger privacy constraints with a minimal utility loss. As the value of δ increases, the level of distortion on the data distribution increases. [32] in terms of the edge weight statistical properties preservation. Under δ-MinSwapX, most of the query results of each data set remain useful even when the value of δ increases. The maximum deviation of the query results is observed in the mode of Facebook Artist when δ is 0.6. Although δ-MinSwapX is not designed to preserve the statistical properties of the edge weight data, it shows an acceptable preservation rate given that it guarantees a higher privacy level than MinSwap. This is a reasonable trade-off between privacy and utility level.
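The distribution check described above relies on the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions. The sketch below computes only that statistic with the standard library; it does not compute a p-value, and the function name `ks_statistic` and the sample data are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic: max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in points
    )


original = [1, 2, 3, 4, 2, 3]
swapped = [2, 1, 2, 3, 3, 4]   # a permutation of the original weights
shifted = [2, 3, 4, 5]         # compared against [1, 2, 3, 4] below

d_swapped = ks_statistic(original, swapped)
d_shifted = ks_statistic([1, 2, 3, 4], shifted)
```

Any permutation of the original weights yields a statistic of exactly zero, which is why a swap-based scheme preserves the distribution regardless of the data set.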
We further analyze the mean absolute error (MAE) of the statistical aggregate query results to measure the average difference between the original data and the published data. The mean absolute error is one of the common statistical metrics, which is defined as $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|$, where $y_i$ is the simulated result, $x_i$ is the original result and $n$ is the number of observed results. The smaller the MAE, the higher the utility of the published data. As shown in Table 10, the MAE of the statistics of all three data sets under MinSwap is 0. This implies that there is no difference between the statistics of the original data and those of the published data generated by MinSwap. On the other hand, Table 10 shows low MAE under δ-MinSwapX for all three data sets, except for the mode and sample variance of Facebook Artist and the mode, standard deviation and sample variance of Youtube. Nevertheless, the trade-off is affordable and within reasonable bounds, as δ-MinSwapX assures additional privacy protection compared to MinSwap.
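The MAE computation is straightforward to reproduce. The following Python sketch assumes the query results are given as two equal-length lists; the function name and the sample values are illustrative and are not taken from Table 10.

```python
def mean_absolute_error(original_results, published_results):
    """MAE between original results x_i and simulated (published) results y_i."""
    if len(original_results) != len(published_results) or not original_results:
        raise ValueError("result lists must be non-empty and of equal length")
    n = len(original_results)
    return sum(abs(y - x) for x, y in zip(original_results, published_results)) / n


# Identical query results give an MAE of zero, as expected for a pure permutation.
mae_identical = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# Absolute errors 1, 0 and 2 average to 1.0.
mae_perturbed = mean_absolute_error([1.0, 2.0, 4.0], [2.0, 2.0, 2.0])
```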

2) SHORTEST PATH ANALYSIS
The Dijkstra algorithm is used to determine the shortest paths between all reachable node pairs and evaluate the corresponding shortest path length. We consider the change of shortest path length of the most influential nodes, as it is infeasible to evaluate the shortest paths of all reachable nodes in scalable networks. Figure 13 shows the change of average shortest path length. All data sets show a low change of average shortest path length compared to the range of the edge weight values. This indicates an acceptable preservation rate of average shortest path length rendered by our work.
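The measurement above can be sketched with a heap-based Dijkstra implementation. This is an illustrative sketch only: the graph is assumed to be a dict of adjacency dicts, the average is taken over nodes reachable from a single source, and the two small triangles stand in for an original and a published network.

```python
import heapq


def dijkstra(adj, source):
    """Shortest path lengths from source; adj maps node -> {neighbour: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_shortest_path_length(adj, source):
    """Mean shortest path length from source to every other reachable node."""
    dist = dijkstra(adj, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else 0.0


original = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 2}, "c": {"a": 5, "b": 2}}
published = {"a": {"b": 2, "c": 4}, "b": {"a": 2, "c": 3}, "c": {"a": 4, "b": 3}}

before = average_shortest_path_length(original, "a")
after = average_shortest_path_length(published, "a")
change = abs(after - before)
```

Comparing `change` with the range of the edge weight values gives the kind of relative measure reported in Figure 13.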

3) NETWORK CENTRALITY ANALYSIS
We used Cytoscape 3.7.2 as a tool to examine some important graph metrics to evaluate the information loss in δ-MinSwapX. MinSwap preserves the network structure as no structural modification is applied, and thus is omitted.
The clustering coefficient is a measure of the extent to which nodes in a graph tend to cluster together. The closeness is the inverse of average shortest path length. The normalized connectivity centralization measures the degree to which a graph resembles a star graph topologically. The average degree of a graph is the average number of edges per node in the graph. The diameter of a graph is the maximum distance between all node pairs. The radius of a graph is the minimum among all the maximum distances between a node to all other nodes. The network heterogeneity measures the variance of the degree distribution.
As shown in Figure 14, 15 and 16, the number of fake nodes added decreases as δ increases. Furthermore, the number of fake nodes added is random and depends on the original data itself. All the original nodes are preserved in the published data.
In Figures 17, 18 and 19, as δ increases, the total numbers of edges, true edges and fake edges added decrease due to the increasing number of edges deleted. Note that the number of fake edges added is random and depends on the data itself. Regardless of the value of δ (0 < δ ≤ 1), all the structural information of a node is modified. The higher the value of δ, the larger the amount of edge modification and hence the higher the privacy level rendered.
As δ increases, the edge deletion process offsets the effect of edge addition, which eventually turns the original graph into a null graph. Therefore, the clustering coefficient, closeness, normalized connectivity centralization and average degree decrease, as shown in Figures 20, 21, 22 and 23. Nevertheless, most of the clustering coefficient, closeness, normalized connectivity centralization and average degree values remain accurate for 0 < δ ≤ 0.8.
In Figure 24, the network diameter changes steadily and slightly as δ increases. The network radius of Facebook Artist is preserved for 0 < δ ≤ 0.6, as shown in Figure 25. Furthermore, the network radius of Bitcoin Alpha and Youtube fluctuates with large magnitude as δ increases. This indicates that the network radius is not preserved in both data sets. In Figure 26, the change of network heterogeneity of all three data sets is consistent and minor over the value of δ. Therefore, the preservation of network centrality is relatively high and δ-MinSwapX can be efficiently deployed in scalable real data to provide a high privacy level with a low utility trade-off.

4) DISCUSSION
Most of the existing schemes apply only to unweighted social networks, which do not consider edge weights [33]- [49], [51]- [57]. A weighted graph is a generalization of the unweighted graph. Therefore, our schemes are more practical, in that they provide a higher privacy level to the users in terms of edge weight, link and identity protection by rendering unlinkability in a social network. From the utility aspect, MinSwap preserves the statistical properties of edge weight data at a rate of 100%. Regardless of the value of δ defined, all the structural information of each node is changed, providing a considerable amount of protection to the users. Furthermore, the average shortest path length and network centrality are well-preserved, considering the degree of privacy protection provided.

VIII. CONCLUSION
In this paper, we studied the problem of privacy-preserving weighted social network data publication. In particular, our work contributes two secure anonymization schemes based on two new privacy models to efficiently address edge weight disclosure, link disclosure and identity disclosure. Edge weight unlinkability and node unlinkability are defined to address sensitive edge weight disclosure and node reidentification that rely upon edge weight data as the background knowledge. In addition, edge randomization is deployed to modify the structure of a graph to protect the link and identity of a user against structural attacks. The privacy-preserving ability of our work is evaluated extensively. The empirical study shows that our work maintains high data utility while protecting the privacy of users.
Overall, we re-emphasize that our work provides the following unique features which are not rendered in other work: 1) Our schemes address three existing privacy problems, namely edge weight disclosure, link disclosure and identity disclosure, by achieving anonymity and unlinkability to provide stronger privacy protection. 2) Our schemes efficiently preserve the statistical properties of edge weight data to assure high data utility post-anonymization. 3) Our schemes minimally modify the structural data without eliminating the important edges, thus resulting in lower information loss compared to other randomization schemes.
For future work, the schemes could be improved for dynamic social networks, where the data are collected and published continuously. Furthermore, another possible direction is to integrate the schemes with differential privacy to further protect the data privacy in an interactive publishing environment.