Protecting Social Network With Differential Privacy Under Novel Graph Model

Online social networks (OSNs) contain sensitive information about individuals, so it’s important to anonymize network data before releasing it. Recently, researchers introduced differential privacy to give a strict privacy guarantee. Graph abstraction models are essential to transform graph structural information into numerical type data, and the choice of models may influence the utility preservation of the published graph. In this paper, we propose a comprehensive differentially private graph model which combines the dK-1, dK-2, and dK-3 series together. The dK-1 series stores the degree frequency, the dK-2 series adds the joint degree frequency, and the dK-3 series contains the linking information between edges. In our scheme, low dimensional series data makes the regeneration process more executable and effective, while high dimensional data preserves additional utility of the graph. As the higher dimensional data is more sensitive to the noise, we carefully design the executing sequence and add three levels of rewiring algorithms to further preserve the structural information. The final released graph increases the graph utility under differential privacy. We also experimentally evaluate our approach on real-world OSNs and show that our scheme produces ready-to-be-shared graphs that are closely matched with the originals, while achieving differential privacy.


I. INTRODUCTION
Studying online social networks (OSNs) through graph analysis can produce knowledge of human social relationships, help target advertisements and recommendations, and evaluate the effectiveness of applications. Since OSN data contains personal information, releasing it without sufficient anonymization alarms the users of social media. Various anonymization techniques have been proposed. Differential privacy is one of the most remarkable techniques, since it can theoretically achieve a strong privacy guarantee [4].
Differential privacy requires graph abstraction models to convert the graph structure into numerical data. Figure 1 gives an example of the dK model. The dK model is separated into different dimensions: the dK-N model captures the degree distribution of connected components of size N.
The associate editor coordinating the review of this manuscript and approving it for publication was Vijay Mago.
Sala et al. employed the dK-2 series as the graph abstraction model to achieve differential privacy [22]. However, deploying one abstraction model can only capture some aspects of information, while other utilities are lost in the published graph. For example, because the dK-2 graph model is the record of edges, it may not preserve information involving more than two nodes, e.g., the clustering coefficient. The limitations in the models restrict their ability to achieve structural similarity under differential privacy. Therefore, choosing the right abstraction model becomes an important issue. Mahadevan et al. proved that dK models in higher dimensions have more information than the ones in lower dimensions, e.g., the dK-3 model is more precise than the dK-2 model [17].
Since different models have different advantages, choosing the abstraction model becomes an important issue. After studying the differences of abilities between dK-1, dK-2, and dK-3 series, we find that low dimensional models, e.g., dK-1, are less sensitive to noise and can easily regenerate a graph. However, high dimensional models can preserve more structural information. Our initial idea is to preserve differential privacy with the dK-3 model. To the best of our knowledge, there is no systematic regeneration algorithm for the dK-3 model because of its complexity [17], [22]. In our study, we also find that it is hard to reconstruct the graph with only the dK-3 series. However, dK-3 information can still be embedded in the published graph. We find that low dimension dK series, i.e., dK-1 and dK-2, can help the regeneration process. And we can use some rewiring algorithms to inject the dK-3 series in our published graph.
Hence, we absorb the benefits of different models and design a comprehensive model that combines three levels of dK graph models together. To achieve differential privacy, we introduce noise on the dK-2 level, which causes less distortion than with the dK-3 level. Then we use the perturbed dK-2 series to get the corresponding dK-3 and dK-1 series. After that, we use three levels of dK series together in our scheme to construct a new graph.
The impact of noise is the major challenge in the graph regeneration process. Although the three models in our scheme are closely related, they may have conflicts with each other because of noise. Hence, we first use some dK information to regenerate an intermediate graph, then use the remaining information to rewire the edges. In particular, we propose two sub-schemes, namely consider all together (CAT) and lower to higher (LTH), with different executing sequences in the dK series. The CAT scheme uses all three kinds of dK information in the regeneration phase. It aims to reduce the errors of the dK-2 and dK-3 series. Because the CAT intermediate graph preserves some dK-3 information, it is easy to apply the dK-3 rewiring in the following phases. By contrast, the LTH scheme just focuses on the dK-1 series in the regeneration phase. The intermediate graph fits the dK-1 information extremely well.
In our previous work, we demonstrated the algorithms for building the intermediate graphs [7]. In this paper, our general purpose of graph regeneration is minimizing the error between the target dK series and the one in the published graph under all three levels. Hence, we develop three dK rewiring algorithms to reduce the errors graphically. These rewiring algorithms also help us inject the remaining dK information into the graph.
The major technical contributions are the following: (1) We are the first to build the systematic regeneration algorithm for embedding dK-3 information in graph anonymization, which helps to preserve more utility than existing dK models. (2) We combine the dK-3 model with both dK-1 and dK-2 models in sampling and graph regeneration, which mitigates the high sensitivity and complexity in the dK-3 model and makes the design practical. (3) We design two different routes, CAT and LTH, to generate the graph efficiently and effectively, even under the impact of noise. (4) We use three levels of rewiring algorithms to actively reduce the errors between the desired dK series and the published graph. (5) We reveal the insights and challenges of using different levels of dK abstraction models jointly to enhance the utility under differential privacy.

II. RELATED WORK
Researchers have proposed several anonymization techniques to preserve privacy in OSN data sharing. Naive ID removal, which removes users' identities, is the simplest approach, but it is vulnerable to structural information attacks [13], [19]. k-anonymity is another family of anonymization techniques [28], [29]. It requires at least k elements in each category, so that an attacker can hardly differentiate these k elements. However, k-anonymity schemes are often designed for specific structural semantics, e.g., neighbors of each node [29] or persistent structures [23], and they can be defeated through other semantics, e.g., nodes' hierarchies or users' attributes [27].
On the attackers' side, various structure-based de-anonymization attacks have been proposed [9], [14], [16], [21], [25]. They revealed the vulnerability of previous anonymization techniques from different angles. Some attacks used seeds in their de-anonymization to achieve better accuracy and efficiency [9], [14]. In [20], Qian et al. also introduced semantic knowledge to support the structure-based attack.
Hence, researchers utilized differential privacy to provide a strong privacy guarantee [4]. Nowadays, differential privacy has been widely adopted in privacy preservation for research and commercial purposes, e.g., by Apple and Google [2], [24]. However, differential privacy was originally proposed for numerical data in databases. Researchers need a graph abstraction model to transform an OSN from graph-type data into numerical-type data. For example, our work is inspired by existing work on the degree sequence model (dK-1) and the joint degree model (dK-2) [3], [11]. Researchers also apply other graph abstraction models such as the hierarchical random graph model and the adjacency matrix model [1], [8], [10].
In [6], we compared the dK series model with other graph abstraction models. Our evaluation results show that the dK series model captures and stores more degree-related information. As shown in [8], it is hard to combine different aspects of structural information, e.g., degree information and clustering information, into one model. Our scheme introduces different levels of dK series as an attempt.
Recovering the graph-type data after noise injection is another challenge in OSN anonymization. Mahadevan et al. proposed the randomized rewiring concept, in which a rewiring step is accepted if it reduces the dK-3 error. Gjoka et al. use a semi-active rewiring algorithm which deploys the sequence of dK-2 series to simulate the clustering coefficient [11]. Inspired by these studies, we propose the active rewiring algorithm in our scheme.

A. THE DK GRAPH MODEL
In this paper, the OSN is modeled as an undirected graph $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. We denote $|V|$ as the cardinality of the set $V$, and $d_v$ as the degree of the vertex $v$. $e_{u,v}$ means that there exists an edge between nodes $u$ and $v$.
Since differential privacy is applied to the query result, typically numerical data, the dK graph model is chosen as the graph abstraction model to transform the graph structure into a set of structural statistics. The dK graph model is better than most of the other graph abstraction models because the dK series can be used to reconstruct a new graph. This graph has structural information similar to that of the original graph, so it can be used as the released OSN, which protects private information while preserving useful information.
The dK-N model captures the degree distribution of size-N-connected-components in the target graph [17]. For example, the dK-1 model, known as the degree distribution, counts the number of nodes in each degree value. The dK-2 model, known as the joint degree distribution, counts the number of edges in the combination of two degree values. The dK-3 model counts the number of 3-node subgraphs in the combination of three degree values. Specifically, there are two kinds of 3-node subgraphs with different structures, wedges and triangles. In this paper, we also define the dimension of dK information as the subgraph size N, i.e., dK-1 series has lower dimension than dK-2.
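The dK-1 and dK-2 definitions above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the graph is given as a list of undirected edges, isolated nodes are ignored, dK-2 keys are stored as sorted degree pairs, and the function names and the toy graph are hypothetical rather than taken from the paper.

```python
from collections import Counter


def degrees(edges):
    """Return {node: degree} for an undirected simple graph given as edge pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def dk1_series(edges):
    """dK-1: number of nodes having each degree value."""
    return dict(Counter(degrees(edges).values()))


def dk2_series(edges):
    """dK-2: number of edges for each unordered pair of endpoint degrees."""
    deg = degrees(edges)
    series = Counter()
    for u, v in edges:
        # Sort the pair so that (2, 3) and (3, 2) fall into the same entry.
        series[tuple(sorted((deg[u], deg[v])))] += 1
    return dict(series)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(dk1_series(toy_edges))
print(dk2_series(toy_edges))
```

In the toy graph, nodes a and b have degree 2, c has degree 3 and d has degree 1, so the dK-2 series contains one ⟨2, 2⟩ edge, two ⟨2, 3⟩ edges and one ⟨1, 3⟩ edge.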

1) THE WEDGE dK-3 ENTRY
The dK-3 entry $\langle \vee, d_u, d_v, d_w \rangle = k$ means that there are $k$ 3-node wedges whose node degree values equal $d_u$, $d_v$ and $d_w$, and any two of these subgraphs differ in at least one node. To prevent double counting, $d_u$ should be less than or equal to $d_w$. If the nodes $u$, $v$ and $w$ form such a subgraph, then $w$ should not be a neighbor of $u$. The node set of the subgraph should be
The error between two dK-3 series is defined as the sum of all absolute differences in each corresponding dK-3 entry:
$$err_3 = \sum_{t} \left| \text{dK-3}(t) - \text{dK-3}'(t) \right|$$
where $t$ ranges over all dK-3 entries and dK-3, dK-3' denote the two series being compared.
Similarly, $err_1$ and $err_2$ measure the error in the dK-1 and dK-2 series. Our work focuses on minimizing the error between the dK series of the published graph and the target dK series calculated under differential privacy.
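As a hedged illustration of the wedge entry and of the error measure, the following Python sketch counts 3-node subgraphs and computes the sum of absolute differences between two series. Several details are assumptions made for the sketch only: it uses the basic (unadjusted) counting in which each triangle is counted once, the labels "wedge" and "triangle" and the sorted ordering of triangle degrees are illustrative, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dk3_series(edges):
    """Count 3-node wedges and triangles keyed by the degrees of their nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    series = Counter()
    for v, nbrs in adj.items():
        # v is the centre; u and w are two of its neighbours with u < w.
        for u, w in combinations(sorted(nbrs), 2):
            if w in adj[u]:
                # Each triangle is seen from all three centres; keep one copy.
                if v < u:
                    key = ("triangle",) + tuple(sorted((deg[u], deg[v], deg[w])))
                    series[key] += 1
            else:
                # Wedge entry: end-node degrees ordered so the first <= the last.
                lo, hi = sorted((deg[u], deg[w]))
                series[("wedge", lo, deg[v], hi)] += 1
    return dict(series)


def dk_error(series_a, series_b):
    """Sum of absolute differences over all entries of two dK series."""
    keys = set(series_a) | set(series_b)
    return sum(abs(series_a.get(k, 0) - series_b.get(k, 0)) for k in keys)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(dk3_series(toy_edges))
```

The same `dk_error` helper applies unchanged to dK-1 and dK-2 series, since it only relies on the series being dictionaries of counts.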

B. DIFFERENTIAL PRIVACY
Differential privacy is designed to protect the privacy between neighboring databases which differ in only one element. In the model of OSNs, the adversary should not be able to tell whether two users are linked in the original network. In this paper, a neighbor database/graph refers to an OSN with one edge added or deleted. Then we have the notion of sensitivity.
Definition 1 (Sensitivity): The sensitivity $\Delta f$ of a function $f$ is the maximum distance, in the $\ell_1$ norm, between the outputs of $f$ on any two neighbor databases $D_1$ and $D_2$:
$$\Delta f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_1$$

Definition 2 ($\epsilon$-Differential Privacy): A randomized algorithm $A$ achieves $\epsilon$-differential privacy if for all neighbor datasets $D_1$, $D_2$ and all $S \subseteq Range(A)$
$$\Pr[A(D_1) \in S] \le e^{\epsilon} \cdot \Pr[A(D_2) \in S] \tag{3}$$
Equation (3) bounds how much the probability of obtaining any given result can differ between two neighbor databases under the same algorithm. Based on the definition, researchers designed the Laplace mechanism to achieve $\epsilon$-differential privacy when the entries have real values. It adds Laplace noise, scaled according to the sensitivity $\Delta f$ and the desired privacy parameter $\epsilon$, to the result. In particular, the noise is drawn from a Laplace distribution with the density function $p(x \mid \lambda) = \frac{1}{2\lambda} e^{-|x| / \lambda}$, where $\lambda = \Delta f / \epsilon$.
Theorem 1 (Laplace Mechanism): For a function $f: D \to \mathbb{R}^d$, the mechanism
$$A(D) = f(D) + Lap(\Delta f / \epsilon)^d \tag{4}$$
achieves $\epsilon$-differential privacy [18].
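The Laplace mechanism can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the paper: it relies on the standard fact that the difference of two independent exponential variables with mean $\lambda$ follows a Laplace distribution with scale $\lambda$, and the function names and parameters are hypothetical.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to every value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical query result with three real-valued entries.
print(laplace_mechanism([3, 0, 5], sensitivity=1.0, epsilon=1.0,
                        rng=random.Random(42)))
```

A smaller $\epsilon$ or a larger sensitivity yields a larger scale and therefore noisier output, which is the trade-off that motivates the choice of perturbation level later in the scheme.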

IV. SCHEME
Given an OSN, our goal is to publish an anonymized network which preserves the structural utility as much as possible while satisfying $\epsilon$-differential privacy. The general idea is to add sufficient noise to the dK model and reconstruct a graph G based on the perturbed dK series. As mentioned in [17], a model of higher dimension is more precise. Compared with previous research which advocates the dK-2 graph model [11], [22], the dK-3 model could preserve more information under differential privacy. It contains not only the information of nodes and edges, but also the linking information between edges. However, the sensitivity of the dK-2 model is lower than that of the dK-3 model. In the scheme, we inject noise into the dK-2 series, which preserves more utility under the same privacy level. Then we use the perturbed dK-2 series to construct the corresponding dK-3 and dK-1 series. Our purpose of graph regeneration is to publish a graph whose dK series is similar to the perturbed result.
We propose two sub-schemes, namely consider all together (CAT) and lower to higher (LTH), shown in Figure 2. The shared steps are marked 'both'. The two sub-schemes have the same perturbation process on the dK-2 series. The main difference lies in the regeneration algorithms: CAT uses all three kinds of dK series (mainly the dK-2 series), while LTH uses dK-1 only. As a result, the two schemes specialize in reducing the dK-2 or the dK-1 error, respectively. After the regeneration part, both sub-schemes run an active rewiring procedure to mitigate the remaining errors, e.g., those in the dK-2 and dK-3 series, which LTH has not used.
In the following sections, we discuss these components, which are also shown in Figure 2.

A. dK-2 PERTURBATION
After the analysis in Section V-A, we find that achieving dK-3 differential privacy requires much more information distortion, which largely reduces the benefits of the dK-3 model. Moreover, as the dK-2 series is the record of edges, we can make it indistinguishable to achieve edge differential privacy. Hence, we choose to inject noise at the dK-2 level. In particular, after counting the dK-2 entries, we add sufficient Laplace noise to achieve differential privacy. According to Equation 4, the noise level is determined by the sensitivity $\Delta f$ and the privacy parameter $\epsilon$. The sensitivity shows the impact of adding or deleting an edge in the model. For a given entry $\langle d_x, d_y \rangle = k$, the sensitivity is upper bounded by $2 \cdot d_x + 2 \cdot d_y + 1$ (Theorem 2 in Section V-A).
Example: Figure 1 shows a running example which is also used in the following sections. Figure 3 has the perturbed dK-2 series. If the value of an entry changes, it is marked in red. Some dK-2 entries such as $\langle 2, 3 \rangle$, which do not exist in the original example (they have a value of 0), are created. Because of the differential privacy requirement, all entries in the range between $\langle 1, 1 \rangle$ and $\langle d_{max}, d_{max} \rangle$ are perturbed.
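The perturbation step can be sketched as follows. The sketch assumes that each entry $\langle d_x, d_y \rangle$ receives Laplace noise with scale $(2 d_x + 2 d_y + 1) / \epsilon$, that every entry between $\langle 1, 1 \rangle$ and $\langle d_{max}, d_{max} \rangle$ is perturbed, and that results are rounded and clamped to non-negative integers; the function name and the input values are hypothetical.

```python
import random


def perturb_dk2(dk2, d_max, epsilon, rng=random):
    """Return a noisy copy of a dK-2 series.

    dk2 maps sorted degree pairs (dx, dy) to edge counts. Entries that are
    absent from the input are treated as 0 and are perturbed as well.
    """
    noisy = {}
    for dx in range(1, d_max + 1):
        for dy in range(dx, d_max + 1):
            # Assumed per-entry sensitivity bound: 2*dx + 2*dy + 1.
            scale = (2 * dx + 2 * dy + 1) / epsilon
            noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            value = dk2.get((dx, dy), 0) + noise
            # Round and clamp so the entry is a usable edge count.
            noisy[(dx, dy)] = max(0, round(value))
    return noisy


# Hypothetical dK-2 series with maximum degree 4.
original = {(1, 4): 2, (2, 4): 3, (4, 4): 1}
print(perturb_dk2(original, d_max=4, epsilon=1.0, rng=random.Random(1)))
```

Rounding and clamping only post-process the noisy values, so they do not weaken the privacy guarantee, but they do introduce the mismatch between dK levels that the rewiring steps later address.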

B. dK-3 CONSTRUCTION
Given the dK-2 model, we construct the dK-3 model to preserve edge linking information. In particular, if one dK-2 entry is perturbed, its corresponding dK-3 entries are also perturbed, which leaks no edge information beyond differential privacy. Hence, we examine the influence of the dK-2 perturbation on the dK-3 model using the example of one edge $e_{u,v}$, and then make the modification.
First, there is a simple case in which all three-node subgraphs in the graph are wedges. Besides $e_{u,v}$, there are $d_u - 1$ edges connected with the node $u$ and $d_v - 1$ edges connected with the node $v$. Then the edge produces $d_u + d_v - 2$ dK-3 entries.
Second, we extend the case to graphs that have some triangles. When adding an edge $e_{u,v}$ between nodes $u$ and $v$, if they have a common neighbor $x$, the original entry $\langle \vee, d_u, d_x, d_v \rangle$ is changed to $\langle \triangle, d_u, d_x, d_v \rangle$. However, if they do not have a common neighbor, some new entries are added as in the case before. Therefore, the total number of dK-3 entries containing the edge $e_{u,v}$ is also affected by the number of triangles.

1) ADJUSTED dK-3 MODEL
We find that if we deploy a specific counting method for triangles, the wedges and triangles can be treated equally. Thus, the adjusted dK-3 model is proposed to simplify the calculation of the dK-3 series. The adjusted model is completely based on the basic dK-3 series. Using the adjusted model does not increase or decrease the ability of the dK-3 series to represent or reconstruct the graph. The new model does not change the wedge entries. In the following sections, all dK-3 series are sampled in the adjusted dK-3 model. After deploying the adjusted dK-3 model, deleting or adding an edge $e_{u,v}$ always changes $d_u + d_v - 2$ dK-3 entries. In the following sections, a wildcard character $*$ is used to match $\vee$ and $\triangle$, so a dK-3 entry is written as $\langle *, d_u, d_v, d_w \rangle$.
In the above section, the dK-2 series is perturbed for privacy. Each unit of increment or decrement in a dK-2 entry can be viewed as adding or deleting one edge. Then we make the corresponding modification to the dK-3 series. Specifically, there are three possible changes in dK-3 entries.
First is replacement. If $\langle d_u, d_v \rangle$ decreases by one and $\langle d_u, d_w \rangle$ increases by one, the graph replaces the edge $e_{u,v}$ by $e_{u,w}$. So we pick $\min(d_w, d_v) + d_u - 2$ dK-3 entries and use $d_w$ to replace $d_v$ in these entries.
Second is subtracting. For each unit of decrement in $\langle d_u, d_v \rangle$, the graph deletes the edge $e_{u,v}$. So we reduce the dK-3 entries containing $\langle d_u, d_v \rangle$ by a total value of $d_u + d_v - 2$.
Third is adding. For each unit of increment in $\langle d_u, d_v \rangle$, the graph adds an edge $e_{u,v}$. This case is a little special because there is no original record of the neighbors of $u$ or $v$. So we randomly pick a structure, wedge or triangle, and a degree number $d_x$ in the range $[1, d_{max}]$. Then we add a total value of $d_u + d_v - 2$ to the dK-3 entries containing $\langle d_u, d_v \rangle$.
Example: In the example of Figure 1, the dK-2 entry $\langle 4, 4 \rangle$ has in total $4 + 4 - 2 = 6$ corresponding dK-3 entries $\langle \vee, 1, 4, 4 \rangle$ and $\langle \triangle, 2, 4, 4 \rangle$ in the adjusted model. In Figure 3, the corresponding dK-3 series is constructed. Taking the dK-2 entry $\langle 1, 4 \rangle$ as the example, because the dK-2 perturbation causes 1 unit of decrease, the corresponding dK-3 series has $1 + 4 - 2 = 3$ units of decrease. In the constructed dK-3 series, the first three modifications are '−3', '+1' and '−1', while the total amount of decrease is 3.
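The count $d_u + d_v - 2$ used throughout this section can be checked with a small Python sketch. It assumes a simple undirected graph given as an edge list and interprets the count as the number of other edges sharing an endpoint with $e_{u,v}$; the function names and the toy graph are hypothetical.

```python
def adjacent_edges(edges, u, v):
    """Edges other than (u, v) that share an endpoint with it."""
    target = frozenset((u, v))
    return [e for e in edges if frozenset(e) != target and (u in e or v in e)]


def dk3_units(dk2_changes):
    """Units of dK-3 change implied by signed changes of dK-2 entries.

    Each unit of change in <dx, dy> corresponds to dx + dy - 2 units of
    change in the dK-3 series of the adjusted model.
    """
    return {(dx, dy): delta * (dx + dy - 2)
            for (dx, dy), delta in dk2_changes.items()}


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
# Edge (a, c): d_a = 2 and d_c = 3, so 2 + 3 - 2 = 3 adjacent edges.
print(len(adjacent_edges(toy_edges, "a", "c")))
# One unit of decrease in <1, 4> and one unit of increase in <4, 4>.
print(dk3_units({(1, 4): -1, (4, 4): 1}))
```

The second call mirrors the arithmetic of the example: one unit of decrease in ⟨1, 4⟩ gives 3 units of dK-3 decrease, and one unit in ⟨4, 4⟩ corresponds to 6 units.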

C. dK-1 RECOVERY
The dK-1 series is also important in the generation of the graph. Unlike the dK-3 series, it can be recovered directly from the dK-2 series. It is calculated by the following equation, in which every edge of an entry $\langle d_x, d_x \rangle$ contributes two endpoints of degree $d_x$:
$$\langle d_x \rangle = \frac{1}{d_x} \Big( 2 \cdot \langle d_x, d_x \rangle + \sum_{d_y \ne d_x} \langle d_x, d_y \rangle \Big)$$
The recovery process shows that the high dimensional data, e.g., dK-2, contains all the information of the low dimensional data, e.g., dK-1.
Example: In the example of Figure 3, as the number '4' appears $1 + 3 + 1 \cdot 2 + 1 = 7$ times in total in the dK-2 series, the dK-1 entry should be $\langle 4 \rangle = 7 / 4 = 1.75 \approx 2$. Although the perturbed dK-2 values are integers, the recovered dK-1 values may not be integers. Here we can only round the value to an integer because it represents a number of nodes and we have no information besides the dK-2 series. The round-off error causes a mismatch between the two levels of dK series, dK-1 and dK-2. We discuss the mismatch problem in the rewiring section.
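The recovery of the dK-1 series from a dK-2 series can be sketched as below. The dK-2 values are hypothetical and were chosen only so that the degree value 4 appears $1 + 3 + 1 \cdot 2 + 1 = 7$ times, matching the arithmetic of the example; they are not the values of Figure 3. Rounding half up is also an assumption of the sketch.

```python
import math


def recover_dk1(dk2):
    """Recover a rounded dK-1 series from a dK-2 series.

    dk2 maps sorted degree pairs (dx, dy) to edge counts. A degree value d
    appears once per edge endpoint, so an entry (d, d) counts twice.
    """
    appearances = {}
    for (dx, dy), count in dk2.items():
        appearances[dx] = appearances.get(dx, 0) + count
        appearances[dy] = appearances.get(dy, 0) + count
    # A node of degree d owns d endpoints; round half up to a node count.
    return {d: math.floor(total / d + 0.5) for d, total in appearances.items()}


# Hypothetical perturbed dK-2 series.
perturbed = {(1, 4): 1, (2, 4): 3, (4, 4): 1, (3, 4): 1}
print(recover_dk1(perturbed))
```

Here the degree value 4 yields $7 / 4 = 1.75$, which rounds to 2 nodes, while the degree value 3 yields $1 / 3$, which rounds to 0 nodes even though a ⟨3, 4⟩ edge exists. This is the kind of mismatch between dK-1 and dK-2 that rounding introduces.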

D. GRAPH REGENERATION
Given the target dK-2, dK-3 and dK-1 series, we need to regenerate the corresponding graph. Focusing on different levels of dK series, we propose two sub-schemes, namely CAT and LTH, with different regeneration algorithms.
The LTH scheme starts from the dK-1 series because of the idea that the dK-1 series is the base of the graph. If the degree of a node has an error, there will be large distortion in the corresponding dK-2 and dK-3 series. Hence, LTH only needs the dK-1 information and generates a graph with the least $err_1$. It leaves the task of mitigating $err_2$ and $err_3$ to the rewiring procedure.
By contrast, the CAT scheme considers the dK-2 and dK-3 series in regeneration because of the idea that rewiring cannot guarantee the lowest $err_2$ and $err_3$. It aims to reduce $err_2$ the most while preserving some dK-3 information as well.
In both schemes, we call a node 'saturated' if it has as many neighbors as its label (dK-1 information) requires, and call it 'unsaturated' otherwise. If the value of a dK entry in the graph reaches the target value, we call it 'full'.

Algorithm 1 dK-1 Graph Regeneration (LTH)
Input: dK-1
Output: $G_1(V_1, E_1)$: the perturbed graph
    $V_1$ ← dK-1    ▷ add nodes with degree labels
    $\{d_1, d_2, \ldots, d_{|V|}\}$ ← dK-1
    for $i = 1$; $i \le |V|$; $i$++ do
        pick a node $u$ with degree $d_i$
        while $u$ is unsaturated do
            if all nodes are connected with $u$ then break    ▷ the dK-1 is non-graphical
            pick $v$ with the highest degree among all unsaturated nodes unconnected with $u$

1) LTH
Algorithm 1 first sorts the degree sequence into non-increasing order, which means $d_1 \ge d_2 \ge \ldots \ge d_{|V|}$. Each number in the sequence also represents the target degree value of a corresponding node. Then, beginning from the first node with degree $d_1$, the algorithm links it with $d_1$ nodes. These nodes are chosen from the set of nodes that are unconnected with the first node, and they have the highest degree values in the set. According to [3], a graph can be reconstructed with the exact dK-1 information if and only if every node $v$ is connected to all $d_v$ nodes in the leftmost part of the degree sequence (having the highest degree values).
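The greedy construction described above can be sketched in Python. This is a sketch under stated assumptions rather than a transcription of Algorithm 1: nodes are numbered 0 to $|V| - 1$, 'highest degree' is interpreted as highest remaining (unfilled) degree in the style of Havel-Hakimi, ties are broken by node index, and the function name is hypothetical.

```python
def regenerate_from_dk1(target_degrees):
    """Greedily build a simple graph whose node i should get target_degrees[i].

    Returns (edges, residual), where residual[i] is the number of neighbours
    node i still lacks; an all-zero residual means err_1 = 0.
    """
    n = len(target_degrees)
    # Process nodes in non-increasing order of their target degree.
    order = sorted(range(n), key=lambda i: -target_degrees[i])
    residual = list(target_degrees)
    adj = {i: set() for i in range(n)}
    for u in order:
        while residual[u] > 0:  # u is unsaturated
            candidates = [v for v in adj
                          if v != u and residual[v] > 0 and v not in adj[u]]
            if not candidates:
                break  # no valid partner left: the sequence is non-graphical
            # Assumption: prefer the largest remaining degree, then lowest index.
            v = max(candidates, key=lambda node: (residual[node], -node))
            adj[u].add(v)
            adj[v].add(u)
            residual[u] -= 1
            residual[v] -= 1
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    return edges, residual


# One node of degree 6 and six nodes of degree 2.
edges, residual = regenerate_from_dk1([6, 2, 2, 2, 2, 2, 2])
print(edges)
print(residual)
```

When the input sequence is not graphical, the loop stops early for the affected node and the leftover demand stays in `residual`, which corresponds to unsaturated nodes that a later rewiring step would have to handle.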

2) CAT
In each iteration, Algorithm 2 picks one dK-3 entry and tries to add one edge to the graph if it can find two nodes that have the corresponding degrees in the dK-3 entry and pass the edge check. Here, the edge check means that there are two unsaturated nodes with the correct degrees, the two nodes are not connected, and the corresponding dK-2 entry is not full. When an edge is added to the graph, its corresponding dK-2 and dK-3 entries are updated. The regeneration process stops when no node pair passes the edge check. Also, in the edge check process, it may happen that the only pair of unsaturated nodes is already connected. Simply connecting them again forms multi-edges in the graph, which is forbidden in OSNs. Algorithm 3 switches one neighbor from a saturated node to an unsaturated node with the same label.
There are two phases in which Algorithm 2 chooses dK-3 entries and adds edges. In the beginning phase, it randomly picks a dK-3 entry and adds two or three edges to the graph correspondingly if the node pairs pass the edge check. In the continuing phase, we use the last chosen dK-3 entry, denoted as $\langle *, d_u, d_v, d_w \rangle$, to find a new dK-3 entry $\langle *, d_v, d_w, d_x \rangle$. Assuming the node $w$ could pass the edge

Algorithm 2 dK-2+ Graph Regeneration (CAT)
Input: dK-1, dK-2, dK-3
Output: $G_1(V_1, E_1)$: the perturbed graph
    $V_1$ ← dK-1    ▷ add nodes with degree labels
    dK-2 ← 0, dK-3 ← 0    ▷ initialize the dK-2 and dK-3
    while exists dK-2 entry not full do
        -------- beginning phase --------
        randomly pick $\langle *, d_u, d_v, d_w \rangle$ not full in dK-3
        if $\langle d_u, d_v \rangle$ not full in dK-2 then
            if exists $u$ and $v$ unconnected and unsaturated
                if $* = \vee$, add edge $e_{u,v}$
                if $* = \triangle$, add edge $e_{u,v}$, $e_{u,w}$
                update dK-2 and dK-3 entries
            else if exists $u$ and $v$ connected and unsaturated    ▷ adding edge causes multi-edges
                NeighborSwitch($u$, $v$)
            else mark $\langle d_u, d_v \rangle$ full, continue

Algorithm 2 distinguishes between wedges and triangles. It builds a triangle if the three users were originally linked with each other, and forces no edge between $u$ and $w$ if the dK-3 entry is in the form of $\langle \vee, d_u, d_v, d_w \rangle$. The dK-3 information used in Algorithm 2 preserves more structural information on the triangles and wedges, which helps to reconstruct a network with similar clustering information.
Example: Figure 4 and Figure 5 give examples of graphs regenerated from the perturbed dK series in Figure 3. The numbers on the nodes represent the requested degrees, and the LTH result satisfies the dK-1 series. However, it carries no dK-2 information, e.g., $\langle 2, 4 \rangle = 4$ in the graph but the desired value is 3. Compared with the given dK series in Figure 3, we have $err_1 = 0$, $err_2 = 4$, $err_3 = 18$. By contrast, the CAT result seems to satisfy the dK-2 requirement perfectly. However, one node marked '3' and one marked '4' have not reached the required degree, and all the dK-2 entries are exhausted. The $err_1$ arises because of the mismatch, and it has an impact on $err_2$ and $err_3$. Compared with the given dK series, we have $err_1 = 2$, $err_2 = 4$ and $err_3 = 13$. Comparing the two results of the example, each sub-scheme has its own advantage in preserving information.
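The edge check can be sketched as a small Python helper. The data layout is an assumption made for illustration: `target_degree` holds each node's degree label, `adj` holds current neighbor sets, and the two dK-2 dictionaries hold target and current counts keyed by sorted degree pairs. The function name is hypothetical and the sketch does not cover the neighbor switch of Algorithm 3.

```python
def find_edge_candidate(target_degree, adj, dk2_target, dk2_current, du, dv):
    """Return a node pair (u, v) passing the edge check, or None.

    The pair must consist of two unsaturated, unconnected nodes labelled
    du and dv, and the dK-2 entry <du, dv> must not be full yet.
    """
    key = (min(du, dv), max(du, dv))
    if dk2_current.get(key, 0) >= dk2_target.get(key, 0):
        return None  # the dK-2 entry is already full
    for u in adj:
        if target_degree[u] != du or len(adj[u]) >= du:
            continue  # wrong label or already saturated
        for v in adj:
            if v == u or target_degree[v] != dv or len(adj[v]) >= dv:
                continue
            if v in adj[u]:
                continue  # already connected: adding would create a multi-edge
            return (u, v)
    return None


# Hypothetical intermediate state: nodes 0 and 1 (label 2) are linked,
# nodes 2 and 3 (label 1) are still isolated.
labels = {0: 2, 1: 2, 2: 1, 3: 1}
adjacency = {0: {1}, 1: {0}, 2: set(), 3: set()}
target = {(2, 2): 1, (1, 2): 2}
current = {(2, 2): 1}
print(find_edge_candidate(labels, adjacency, target, current, 2, 2))
print(find_edge_candidate(labels, adjacency, target, current, 1, 2))
```

In this state the ⟨2, 2⟩ entry is full, so no further edge of that kind is admitted, while a ⟨1, 2⟩ edge between an isolated node and a half-filled node still passes the check.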

E. TARGET REWIRING
As mentioned in the last section, no dK-2 or dK-3 information is preserved in the LTH intermediate graphs. LTH needs to compare the graph with the target dK-2 and dK-3 series and apply rewiring. Intuitively, the CAT intermediate graph only needs the dK-3 rewiring because it does not consider all dK-3 entries. However, after analyzing the impact of noise in Section V-B, we find that a result which satisfies the dK-2 series may have non-trivial error in the dK-1 information. As a result, the CAT scheme needs dK-1, dK-2 and dK-3 rewiring, from lower to higher. Here we propose three levels of dK rewiring algorithms. Each level of the rewiring preserves the lower dimensional information but may change the higher dimensional information.
20: do the rewiring check between $e_{u,v}$, $e_{x,y}$
21: end while
22: return $G_2$

1) dK-1 REWIRING
Given $G_1$ as the input, we build edges between pairs of unsaturated nodes. Building each edge reduces $err_1$ by two. There are two special cases in the dK-1 rewiring shown in Algorithm 4. First, there are just two unsaturated nodes and they are already linked. A neighbor switch process should be applied to these two nodes. Second, there is just one unsaturated node, but it needs at least two edges. Then the neighbor switch should also be applied to this node. Here the neighbor switch process in the dK-1 rewiring differs slightly from the one in Algorithm 3: it has no limitation on the degree of $v$ and only needs $v$ and $u$ to be unconnected.

2) dK-2 REWIRING
In this step, $err_2$ is reduced while keeping the result from the first step. Figure 6 shows the dK-2 rewiring process described in Algorithm 4. The dK-2 series of the intermediate graph is compared with the target dK-2 series. The algorithm applies the rewiring procedure if the prerequisites are satisfied. We define the dK-2 rewiring prerequisites as follows: $\langle d_v, d_w \rangle$ and $\langle d_y, d_z \rangle$ are higher than the target, and $\langle d_v, d_z \rangle$ and $\langle d_y, d_w \rangle$ are lower than the target. When at least three of the four prerequisites are satisfied, we admit the rewiring pair, which reduces $err_2$ by at least two.
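The prerequisite test can be sketched as follows. The sketch assumes that edges $e_{v,w}$ and $e_{y,z}$ are replaced by $e_{v,z}$ and $e_{y,w}$, that dK-2 series are dictionaries keyed by sorted degree pairs, and that the four affected entries are distinct; the function names and the sample series are hypothetical.

```python
def _key(a, b):
    """Sorted degree pair used as a dK-2 dictionary key."""
    return (min(a, b), max(a, b))


def dk2_rewiring_admitted(current, target, dv, dw, dy, dz):
    """Admit the rewiring if at least three of the four prerequisites hold."""
    def surplus(k):
        return current.get(k, 0) - target.get(k, 0)

    satisfied = sum([
        surplus(_key(dv, dw)) > 0,  # <dv, dw> is higher than the target
        surplus(_key(dy, dz)) > 0,  # <dy, dz> is higher than the target
        surplus(_key(dv, dz)) < 0,  # <dv, dz> is lower than the target
        surplus(_key(dy, dw)) < 0,  # <dy, dw> is lower than the target
    ])
    return satisfied >= 3


def apply_dk2_rewiring(current, dv, dw, dy, dz):
    """Return the dK-2 series after swapping the two edges."""
    updated = dict(current)
    for a, b, delta in ((dv, dw, -1), (dy, dz, -1), (dv, dz, 1), (dy, dw, 1)):
        updated[_key(a, b)] = updated.get(_key(a, b), 0) + delta
    return updated


def err2(current, target):
    keys = set(current) | set(target)
    return sum(abs(current.get(k, 0) - target.get(k, 0)) for k in keys)


# Hypothetical series: two entries have one edge too many, two are missing one.
current = {(1, 2): 2, (3, 4): 2}
target = {(1, 2): 1, (3, 4): 1, (1, 4): 1, (2, 3): 1}
if dk2_rewiring_admitted(current, target, dv=1, dw=2, dy=3, dz=4):
    rewired = apply_dk2_rewiring(current, 1, 2, 3, 4)
    print(err2(current, target), "->", err2(rewired, target))
```

Because every node keeps its number of neighbors under this swap, the dK-1 series is unchanged, which is why the step can run after the dK-1 rewiring without undoing it.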

3) dK-3 REWIRING
$err_3$ is reduced with a similar procedure. Figure 6 shows two kinds of rewiring on the same six nodes. Notably, the two different solutions lead to the same direct dK-3 changes, which we denote as the direct impact of dK-3 rewiring. However, the rewiring process may also have an indirect impact on dK-3, e.g., some entries involving nodes $u$ and $v$ are also changed. Hence, in each iteration, Algorithm 4 calculates the dK-3 series of the new graph, called dK-3$_i$, and finds the dK-3 rewiring pairs. We admit a step of dK-3 rewiring only if the dK-3 error is decreased.
Numerically, in the example of Figure 6, the rewiring changes the dK-3 series directly but keeps the dK-2 series unchanged if and only if $d_u = d_x$, $d_v = d_y$ and $d_w = d_z$. Hence, we define the dK-3 rewiring prerequisites as follows: $\langle *, d_u, d_v, d_w \rangle$ and $\langle *, d_x, d_v, d_z \rangle$ are higher than the target, and $\langle *, d_u, d_v, d_z \rangle$ and $\langle *, d_x, d_v, d_w \rangle$ are lower than the target. We also admit the pair when at least three requirements are satisfied. Here the rewiring can directly reduce $err_3$ by at least two. Structurally, the two types of dK-3 entries have additional prerequisites on the existence of edges. For example, if the value of the entry $\langle \triangle, d_u, d_v, d_z \rangle$ is lower than the target value, there should be an edge between $u$ and $z$ before rewiring; then the rewiring builds a triangle automatically.
Example: Figure 7 shows the example of dK-1 rewiring when the original graph is the one in Figure 5. When the original graph has two unsaturated nodes but the two nodes are linked, a neighbor switch process involving the right node marked '2' helps all nodes satisfy the dK-1 series. Figure 8 shows the example of dK-2 rewiring when the original graph is the one in Figure 4. In this simple example, Figure 7 and Figure 8 are the same graph, which shows that the CAT result after dK-1 rewiring can reach the same graph as the LTH result after dK-2 rewiring. Both graphs have $err_2 = 1$, $err_3 = 2$, which shows that the rewiring algorithms can significantly reduce the error in the dK series.

A. SENSITIVITY ANALYSIS
The sensitivity shows the impact of adding or deleting an edge in the dK-2 model.
Theorem 2: Given an entry $\langle d_x, d_y \rangle$ in the dK-2 model, the sensitivity $\Delta f$ is upper bounded by $2 \cdot d_x + 2 \cdot d_y + 1$.
Proof: Let $e_{x,y}$ be a new edge added to the graph $G$ between nodes $x$ and $y$. There is one new dK-2 entry $\langle d_x, d_y \rangle$ which is incremented by 1. Also, the degrees of $x$ and $y$ increase from $d_x$ and $d_y$ to $d_x + 1$ and $d_y + 1$ respectively. In the original dK-2 model, there are $d_x$ entries related to the node $x$. They are in the form of $\langle d_u, d_x \rangle$ and $\langle d_x, d_u \rangle$. They are deleted, and new entries $\langle d_u, d_x + 1 \rangle$, $\langle d_x + 1, d_u \rangle$ are added, which gives $2 \cdot d_x$ changes. The same holds for the $d_y$ entries related to the node $y$. Hence, in total $2 \cdot d_x + 2 \cdot d_y + 1$ dK-2 entries are changed when the new edge is added.
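The bound of Theorem 2 can be checked empirically on a small graph. The Python sketch below is only a sanity check under stated assumptions: the graph is a simple undirected edge list, the change is measured as the $\ell_1$ distance between the dK-2 series before and after adding one edge, and the function names and the toy graph are hypothetical.

```python
from collections import Counter


def dk2_series(edges):
    """dK-2 series keyed by sorted pairs of endpoint degrees."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    series = Counter()
    for u, v in edges:
        series[tuple(sorted((deg[u], deg[v])))] += 1
    return series


def dk2_change_after_adding(edges, x, y):
    """Return (l1_distance, bound) for adding the edge (x, y).

    The bound 2*dx + 2*dy + 1 uses the degrees of x and y before the addition.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    before = dk2_series(edges)
    after = dk2_series(edges + [(x, y)])
    keys = set(before) | set(after)
    distance = sum(abs(before[k] - after[k]) for k in keys)
    return distance, 2 * deg[x] + 2 * deg[y] + 1


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(dk2_change_after_adding(toy_edges, "a", "d"))
```

In this toy case the observed change stays below the bound, since some deleted and added entries coincide; the bound is reached only when all affected entries are distinct.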
Similarly, given the dK-3 series * , d x , d v , d z , the sensitivity of the adjusted model is 2 where d max is the max degree value in the graph. And we can compare the noise amount that added on the dK-2 model with the one on the dK-3 model under the same privacy criteria.
Property 1: Under the same level of differential privacy, there is more noise added in the adjusted dK-3 perturbation than the dK-2 perturbation.
Proof: If we denote the total size of the dK-2 model as $|\text{dK-2}|$, the dK-2 entries range from $\langle 0, 0 \rangle$ to $\langle d_{max}, d_{max} \rangle$, so $|\text{dK-2}| \le d_{max}^2$. The expected randomization in the dK-2 model then depends on $|\text{dK-2}|$ and on $\Delta f(\text{dK-2})$, the sensitivity of the dK-2 model. Similarly, $|\text{dK-3}| \le 2 \cdot d_{max}^3$, and the expected randomization in the dK-3 model depends on $|\text{dK-3}|$ and on the sensitivity of the dK-3 model. Hence, under the same privacy level, the noise in the dK-2 model is much less than the noise in the dK-3 model. If we choose to add noise on the dK-3 level, the information distortion may exceed the benefits of using the dK-3 model.
VOLUME 8, 2020

B. PERFORMANCE ANALYSIS
In this section, we analyze the impact of noise on the dK graph models and then show the ability of our schemes to reduce dK error under noise.
If the dK series is graphical, which means it can build a graph, it must obey the following rules: 1) the values of dK entries are non-negative integers, 2) the dK-1 information, in non-increasing degree sequence form, follows the Erdös-Gallai theorem [5], 3) the dK-2 entries having d − d x , d y
In most real cases, the perturbed dK-2 and dK-3 series are non-graphical, so we need to fix the dK-2 and dK-1 values to non-negative integers. However, the approximation makes the dK series conflict with each other. For instance, if the degree value 3 appears 4 times in the dK-2 series, we can only set $\langle 3 \rangle = 4 / 3 \approx 1$. Hence, we introduce the rewiring algorithms and apply the approximation graphically from lower to higher.
When considering the ability of reducing $err_1$, our LTH scheme has such property.
Proof: Because the dK-1 regeneration algorithm is directed by the Erdős-Gallai theorem [5], it always builds a graph when the dK-1 information is graphical. If the dK-1 information is not graphical, the dK-1 regeneration algorithm adds as many links to high-degree nodes as possible, and it never forms a forbidden link that would enlarge $err_1$ [3].
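The Erdős-Gallai condition referred to above can be tested directly on a degree sequence. The following is a minimal sketch of that test only, not of the dK-1 regeneration algorithm itself; the function name `is_graphical` is an assumption made for illustration.

```python
def is_graphical(degrees):
    """Erdős-Gallai test: can a simple graph realize this degree sequence?"""
    d = sorted(degrees, reverse=True)
    # Degrees must be non-negative and their sum must be even.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    # For every k, the k largest degrees must be absorbable by the graph.
    for k in range(1, len(d) + 1):
        left = sum(d[:k])
        right = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if left > right:
            return False
    return True

# One node of degree 6 and six nodes of degree 2 form a graphical sequence.
print(is_graphical([6, 2, 2, 2, 2, 2, 2]))
# Two degree-3 nodes cannot be served by only two degree-1 nodes.
print(is_graphical([3, 3, 1, 1]))
```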
We also illustrate a shortcoming of the CAT scheme with an example. Assume the target dK-1 information is $\langle 6 \rangle = 1$, $\langle 2 \rangle = 6$, that is, one node of degree 6 and six nodes of degree 2. Figure 4 is a correct solution of the dK-1 regeneration algorithm. Although Figure 5 violates the dK-2 series of the graph, it is still a possible intermediate graph produced by the dK-2+ regeneration algorithm when noise is injected. The dK-1 rewiring algorithm then tries to add neighbors to the unsaturated node $v$. Nodes $u$ and $w$ are the possible candidates. Both nodes are saturated, which means the dK-1 rewiring needs to apply the neighbor switch process. However, all of their neighbors are already linked with $v$, which is the only unsaturated node in the graph. Hence, it is impossible to switch a neighbor of $u$ (or $w$) to $v$. As a result, the dK-1 rewiring cannot reach the correct graph, while the LTH scheme can.
As the example shows, when the dK series do not match each other, the ability of the dK-1 rewiring algorithm is limited. We also analyze the errors in the dK-2 and dK-3 rewiring algorithms. Here, we say that a rewiring result is a local minimum if no neighbor graph (a graph that differs by one edge change) has a lower error than the rewiring result.
Property 3: Given a graph $G$ with a fixed dK-1 series, the dK-2 rewiring algorithm can achieve a local minimum in $err_2$.
Proof: By contradiction, assume there is a neighbor graph $G'$ that has lower $err_2$ than the rewired graph $G$ and is obtained by deleting edge $e_{u,w}$. In order to preserve the dK-1 series, nodes $u$ and $w$ should each be linked with a new node, and the two new nodes should each delete one existing edge. As a result, deleting $e_{u,w}$ makes $\langle d_u, d_w \rangle$ lower, makes $\langle d_u, d_z \rangle$ and $\langle d_y, d_w \rangle$ higher, and causes at least one other change in the dK-2 entries. Because this edge change violates the prerequisites of dK-2 rewiring, which means at least two of the four prerequisites are not satisfied, $err_2$ is not reduced by the 'illegal' edge change. So the assumption must be wrong, and the rewiring result has the minimum $err_2$ among all its neighboring graphs.
The dK-3 rewiring algorithm has a similar property. If the indirect impact of dK-3 rewiring is ignored, it can also achieve a local minimum in $err_3$. Each rewiring pair reduces the error by two or four in the dK-2 and dK-3 rewiring algorithms. However, some particular pairs may trap the error in a local minimum. Deploying a weighted mapping algorithm can help us design the order in which rewiring pairs are picked and solve this problem, at the cost of efficiency.
All three kinds of rewiring algorithms can be trapped in a local minimum when searching for the global minimum, so it is important to choose a good starting graph before rewiring. LTH starts from a graph with the best dK-1 series, which is the most basic information. CAT starts from a graph with some dK-3 information, which restricts the level of $err_3$. Our two schemes thus take two different routes to deal with the noise and the conflict problem, and each has its own advantages in reducing the error.

A. EXPERIMENT SETTINGS
In this section, we evaluate our anonymization scheme on three real-world datasets, namely ca-HepTh, Facebook and Enron [15]. The statistics of the three OSNs are shown in Table 1.
$\epsilon$ is a privacy parameter that measures the ability to hide the existence of edges. A smaller $\epsilon$ means a stricter privacy guarantee as well as more noise injected into the model. We generate $\epsilon$-private graphs with $\epsilon \in [5, 100]$ to evaluate the performance under different privacy levels. For comparison purposes, we implement one state-of-the-art technique as the reference method, which is the differential privacy algorithm with only the dK-2 model [22]. In the following figures, results of this scheme are marked as 'reference', while our two sub-schemes are marked as 'CAT' and 'LTH', respectively.
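To make the role of $\epsilon$ concrete, the sketch below perturbs a small dK-2 series with Laplace noise of scale sensitivity / $\epsilon$ and then rounds the result to non-negative integers. It is a simplified illustration and not the reference implementation: the function names, the example series and the sensitivity value are hypothetical, and a complete mechanism would also have to perturb entries whose true count is zero.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_series(series, sensitivity, epsilon, seed=0):
    """Add Laplace noise to every entry, then round and clamp at zero."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {key: max(0, round(count + laplace(scale, rng)))
            for key, count in series.items()}

# Hypothetical dK-2 series and sensitivity (4 * d_max + 1 with d_max = 2).
dk2 = {(1, 2): 10, (2, 2): 4}
for eps in (5, 20, 100):
    print(eps, perturb_series(dk2, sensitivity=9, epsilon=eps))
```

Because the noise scale is inversely proportional to $\epsilon$, the run with $\epsilon = 5$ draws from a distribution twenty times wider than the run with $\epsilon = 100$.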
To evaluate the three different anonymization schemes, we compare their published graphs under the same privacy level, i.e., the same $\epsilon$. The similarity between the published graphs and the original graph is compared under four graph topological utility metrics: the average shortest path length, the clustering coefficient, the average degree and the betweenness centrality. We also evaluate the three levels of errors and one application utility metric, the influence maximization.

B. TOPOLOGICAL UTILITY METRICS

1) CLUSTERING COEFFICIENT
The clustering coefficient is a measure of how nodes in a graph tend to cluster together. While the dK-2 model may break cluster features, a scheme with the dK-3 series is expected to preserve part of the clustering information because structural information such as triangles and wedges is included. Figure 11 shows the clustering coefficient distribution under different $\epsilon$. Figure 12 shows the distribution in different datasets. In the original ca-HepTh graph, 28% of the nodes have a mid-range clustering coefficient (0.2 to 0.8). However, when $\epsilon = 20$, such nodes account for only 9% of all nodes in the reference result, 12% in the CAT graph and 13% in the LTH graph. All three dK anonymization methods lose some clustering information.
The original ca-HepTh dataset has an average clustering coefficient of 0.47. When $\epsilon = 5$, the average clustering coefficient is 0.21 for the reference result, 0.24 for CAT and 0.26 for LTH. When $\epsilon = 100$, the average clustering coefficient is 0.12, 0.25 and 0.27 for the reference, CAT and LTH results, respectively. The figures show that the clustering coefficient distributions of our two schemes are always closer to the original distribution than that of the reference result. The dK-3 series in our scheme preserves the structural information of triangles and wedges, which determines the clustering coefficient. Hence, the reference scheme shows more randomness, while our schemes preserve more clustering information.
In the Facebook dataset, the average clustering coefficient is 0.55, 0.17, 0.34 and 0.69 for the original, reference, CAT and LTH data, respectively. In the Enron dataset, it is 0.36, 0.13, 0.16 and 0.37. This shows that CAT and LTH consistently outperform the reference method in preserving clustering information in all three cases.
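For reference, the average clustering coefficient discussed above can be computed as in the following sketch. It assumes the common convention in which a node's local coefficient is the fraction of its neighbor pairs that are themselves linked, and nodes with fewer than two neighbors contribute zero; the exact convention used in the experiments is not stated here, and the toy graph is hypothetical.

```python
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes of an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # no neighbor pairs, so the local coefficient is 0
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy))
```

In the toy graph, nodes 0 and 1 have coefficient 1, node 2 has 1/3 (one closed pair out of three) and node 3 has 0, so a single triangle already dominates the average. This is why preserving triangle and wedge counts matters for this metric.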

2) AVERAGE SHORTEST PATH LENGTH
The average shortest path length measures the average length of the shortest path from one node to every other node. Figure 13 shows the average shortest path length distribution in the three datasets when $\epsilon = 20$. Taking the result on the Enron dataset as an example, the overall average shortest path length of the original data is 3.61, while the reference, CAT and LTH schemes yield 4.20, 3.54 and 11.35, respectively. The figure shows that the reference scheme and the CAT scheme preserve the shortest path length information well in all three datasets. However, the nodes in the LTH anonymized graph are farther apart from each other than in the original graph.
As shown in the analysis in Section V-B, the dK-2 rewiring algorithm cannot help LTH to minimize $err_2$. The LTH scheme uses only the dK-1 information in graph regeneration, so the probability of high-degree nodes linking with each other is much higher than in the original graph. We observe that the shortest path length is closely related to the dK-2 series.
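The metric in this subsection can be computed with one breadth-first search per node, as sketched below. The sketch averages over ordered pairs of distinct nodes that can reach each other, which is an assumption about how disconnected pairs are handled; the two toy graphs are hypothetical and only show that graphs with the same number of nodes and edges can differ in this metric depending on how degrees are arranged.

```python
from collections import deque

def average_shortest_path_length(adj):
    """Average BFS distance over ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Two hypothetical 4-node graphs with 3 edges each.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(average_shortest_path_length(path), average_shortest_path_length(star))
```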

3) BETWEENNESS CENTRALITY
Betweenness centrality measures the number of shortest paths that pass through each node. The betweenness centrality of the original graph and each released graph is shown in Figure 14. The experiment result shows that all three anonymization schemes can preserve some of the shortest paths from the original OSN. Our two sub-schemes outperform the reference scheme on some OSNs. For example, the average betweenness is 0.0041 for the original graph, 0.0014 for the reference result, 0.0019 for the CAT result, and 0.0030 for the LTH result on the Facebook dataset. The average betweenness is 0.0119 for the original graph, 0.0100 for the reference result, 0.0120 for the CAT result, and 0.0202 for the LTH result on the Enron dataset.
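As a point of reference, betweenness centrality on an unweighted, undirected graph can be computed with Brandes' algorithm, sketched below. Normalizing by the number of ordered node pairs that exclude the node itself, $(n-1)(n-2)$, is an assumption made here; the normalization used in the experiments is not stated in this section.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm; adj maps each node to the set of its neighbors."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                        # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        paths = dict.fromkeys(adj, 0)     # number of shortest paths from s
        paths[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        dep = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1.0 + dep[w])
            if w != s:
                score[w] += dep[w]
    n = len(adj)
    scale = (n - 1) * (n - 2) if n > 2 else 1  # assumed normalization
    return {v: value / scale for v, value in score.items()}

# Hypothetical toy graph: the path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bc = betweenness_centrality(path)
print(bc, sum(bc.values()) / len(bc))
```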

4) DEGREE
The degree of a node in the network is the number of edges incident to the node. Figure 15 shows the average degree in the ca-HepPh dataset. In all three schemes, the published graph is more similar to the original when the noise level is lower. When $\epsilon = 5$, the average degree difference of the reference result is 0.37, while CAT has a difference of 0.24 and LTH has a difference of 0.19. Compared with the reference method, the CAT scheme has an additional dK-1 rewiring algorithm which effectively reduces $err_1$, so the CAT result is better than the reference result. The LTH scheme achieves the best result in reducing $err_1$. In the regeneration part, it uses only the dK-1 information, while the other schemes begin with the dK-2 series. We observe that LTH performs better than CAT even though CAT has the dK-1 rewiring algorithm, which is consistent with Property 3.

C. APPLICATION UTILITY METRIC
Information Maximization: Information maximization measures the proportion of users influenced by other users [12]. At the very beginning, we set a subset of users as seeds, i.e., the persons who hold the information. Then we measure how many users in the OSN are affected, i.e., have received information from other users, after a certain number of rounds. In the evaluation, we use the greedy algorithm to choose the seed users and implement the independent cascade model to propagate information [26]. The propagation probability is 0.02 in this experiment, and the experiment terminates after 20 rounds.
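The propagation step described above can be sketched as follows. This is a minimal illustration of the independent cascade model with the stated probability and round limit as defaults; the greedy seed selection is omitted and the seed set is simply passed in. The function name, the `seed` argument for the random generator and the toy graph are assumptions.

```python
import random

def independent_cascade(adj, seeds, prob=0.02, rounds=20, seed=0):
    """Return the set of nodes holding the information after the cascade.

    Each newly activated node gets one chance to activate each inactive
    neighbor, succeeding independently with probability `prob`.
    """
    rng = random.Random(seed)
    active = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        newly = set()
        for u in frontier:
            for v in adj[u]:
                if v not in active and v not in newly and rng.random() < prob:
                    newly.add(v)
        if not newly:
            break  # the cascade has died out
        active |= newly
        frontier = newly
    return active

# Hypothetical toy graph: the path 0-1-2-3-4 with node 0 as the only seed.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
reached = independent_cascade(path, {0})
print(len(reached) / len(path))  # proportion of influenced users
```

Since the outcome is random, the proportion would normally be averaged over many runs with different generator seeds.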
The result is shown in Figure 17. The number of influenced users in our two sub-schemes is closer to that of the original graph than in the reference result. For example, on the ca-HepPh network, when compared with the percentages on the original graph, the Root Mean Square Error is 3.52 for the reference result, 1.41 for the CAT result and 0.63 for the LTH result. The information maximization evaluation shows that our two sub-schemes outperform the reference dK-2 scheme in simulating the information transmission result. The published graphs of our anonymization schemes are more useful when analyzing the information propagation properties of OSNs.
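For clarity, the Root Mean Square Error quoted above has the standard form shown below. The symbols are assumptions introduced for illustration: $p_t$ is the percentage of influenced users on the original graph at measurement point $t$, $\hat{p}_t$ is the corresponding percentage on the published graph, and $T$ is the number of compared measurement points.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( p_t - \hat{p}_t \right)^2}
```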

D. dK SERIES ERRORS
We also compare the dK series of the original graph with those of the different regeneration results. Figure 16 gives the three levels of error in the Facebook dataset. The reference result has 284 units of $err_1$, while the CAT result has 96 and the LTH result has no $err_1$. This shows that the dK-1 rewiring algorithm can remove a large amount of $err_1$, but using dK-1 in graph regeneration is better. Reducing $err_1$ also helps the CAT result, which has a smaller $err_2$ (2.6K) than the reference result (4.8K). Because of the cumulative error in the dK series and the lack of a dK-3 rewiring algorithm, the reference result has 0.19M $err_3$, while CAT has 0.12M and LTH has 0.11M. The smaller dK-3 distance between our results and the original graph is one reason that our schemes preserve more structural information in the published graph.
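To show what a dK-3 comparison involves, the sketch below counts wedges and triangles by the degrees of their nodes and measures the distance between two such series. The key layout, the helper names and the use of an L1 distance as $err_3$ are assumptions for illustration; the exact definitions of the adjusted dK-3 series and of the errors are given elsewhere in the paper.

```python
from collections import Counter
from itertools import combinations

def dk3_series(adj):
    """Count wedges and triangles, keyed by the degrees of their nodes."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    series = Counter()
    for center, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                key = ("triangle",) + tuple(sorted((deg[a], deg[center], deg[b])))
            else:
                low, high = sorted((deg[a], deg[b]))
                key = ("wedge", low, deg[center], high)
            series[key] += 1
    # Every triangle was counted once from each of its three corners.
    for key in list(series):
        if key[0] == "triangle":
            series[key] //= 3
    return series

def series_error(a, b):
    """L1 distance between two series (assumed form of the dK error)."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))

# Hypothetical graphs: a triangle with a pendant node, and the same graph
# after deleting the edge (0, 1).
original = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
published = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2}}
print(series_error(dk3_series(original), dk3_series(published)))
```

Deleting one edge changes six dK-3 counts in this toy case, which illustrates why dK-3 errors are much larger in magnitude than dK-1 or dK-2 errors.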

E. EVALUATION SUMMARY
From the above experiments, we can conclude that the CAT and LTH schemes perform better than the reference scheme in most measurements. The LTH scheme better preserves degree information, but it lacks the ability to preserve average shortest path information. The CAT scheme generally produces better results in all other graph metrics we evaluated. Using the comprehensive dK graph model helps us achieve better graph utility than the reference scheme, which depends solely on the dK-2 model.

VII. CONCLUSION
In this paper, we propose a uniform scheme which combines three levels of dK graph models to publish a perturbed social network. We design two different sub-schemes, CAT and LTH, and three levels of rewiring algorithms to regenerate the graph and reduce the error under differential privacy noise. The empirical study indicates that our two schemes have different merits in preserving graph utility. The design, analysis and comparison also reveal more insights into, and challenges of, using multiple levels of graph abstraction models together in differentially private graph release for OSNs.