Multi-View Low-Rank Coding-Based Network Data De-Anonymization

Social networks are extensively exploited by third-party consumers such as researchers and advertisers to understand user characteristics and behaviors. In general, before network data is published, sensitive relationships should be anonymized to prevent the compromise of individual privacy. To quantify the guarantee level of privacy-preserving mechanisms and mitigate users’ privacy concerns, numerous studies concerning network data de-anonymization have been carried out. However, most existing studies focus on single-view data


C. MOTIVATION
Typically, in the era of big data, data are richly diverse, and the same object can be observed from different viewpoints or captured by distinct apparatus. Different views that are complementary to each other form the basis of multi-view learning. Specifically, each view of the data contains some specific information that the others do not have. Thus, multi-view models can be applied to represent the data comprehensively [32]. For example, for authors in the scientific community, their relationships can be characterized based on co-authorship or citations. Another example is the real social network of individuals, where various social media applications (e.g., Sina Weibo, WeChat) capture the interactions between users from different viewpoints.
In this study, we explore the problem of network data de-anonymization from the perspective of multi-view learning and determine whether anonymized links can be inferred. In multi-view networks, existing anonymization techniques assume that it is enough to anonymize each view independently. Let us consider the simplest case, in which the published data includes only two views. We then have the options shown in Table 1 for anonymizing the data: no anonymization for either view, anonymization for one view, and anonymization for both. With two views as shown in Table 1, case 4 is the backbone of the anonymization techniques for data publishing and is clearly the strongest protection of privacy. Here, we are interested in answering the following research question: Is case 4, the strongest among the four cases, sufficient for anonymizing a multi-view network?

D. CONTRIBUTIONS
In this study, we seek to answer the question by taking an adversarial approach to assess the privacy level of anonymized multi-view networks. The idea behind this is that the anonymized links of existing methods are generated without considering the structural patterns of the target network, and the resulting local subgraphs have structural patterns that are inconsistent with the normal ones [33]. In addition, we assume that the auxiliary networks have structural information consistent with the target network, so they can be utilized for learning structural patterns. Thus, we propose a novel network data de-anonymization framework, called Multi-view Low-rank Coding (MVLRC), to model the target network and auxiliary network together. Based on low-rank theory [34], we define a low-rank constrained network representation model and uncover the anonymized links by exploring the representation relationship among elemental subgraphs. The key contributions of this study include:
• We answer the question of whether the traditional privacy protection methods are still valid for the anonymization of multi-view data and formulate the privacy-preserving oriented multi-view network de-anonymization framework. To the best of our knowledge, this is the first time that the privacy protection of multi-view network data has received attention.
• We develop the multi-view low-rank coding method MVLRC for network de-anonymization, in which the auxiliary network can be incorporated naturally with the target network for inferring anonymized links.
• The promising accuracy of MVLRC demonstrates that the representative network anonymization approaches cannot be directly applied to multi-view network data. This observation will help researchers design multi-view network data anonymization methods that take network structural patterns into consideration.

E. ORGANIZATION
The remainder of this paper is organized as follows: In Section 2, we introduce the related work. Section 3 presents the preliminaries and problem formulation, followed in Section 4 by the proposed MVLRC algorithm and the derivation of the optimal solutions. Experiments are reported in Section 5. The last section concludes the study.

II. RELATED WORK

A. NETWORK PRIVACY PRESERVATION
When releasing data for analysis, preserving the privacy of individuals has recently raised great concern in the data mining field. The main concern is that sensitive information should not be disclosed. Different types of privacy models have been proposed for preserving data privacy, such as K-anonymity, L-diversity, T-closeness, and differential privacy. The methods should limit the disclosure risks while maintaining the utility of the data. More details about existing research outputs and achievements of the privacy field can be found in [35]. However, most existing studies on preserving privacy in data publishing have focused on tabular data. Owing to the dependency and complexity of network data, preserving the privacy of social network data is much more challenging than anonymizing conventional tabular data, and the anonymization techniques for tabular data cannot be adapted to network data [12]. Generally, for network data publication, there are three important privacy risks: content disclosure risk, identity disclosure risk, and link disclosure risk [13], [14]. A large portion of studies on social network privacy has concentrated on identity disclosure, which reveals users' identifiable personal information (such as names and social security numbers) based on structural features or descriptive attributes [22], [36]-[38]. Although a privacy-protection mechanism for social network data publishing should consider content, identity, and link disclosure threats, link disclosure often leads to both content and identity disclosures. Therefore, limiting link disclosure is more fundamental than the others.
Many anonymization methods have been proposed, and they can be categorized into two groups, i.e., generalization-based approaches and perturbation-based approaches. Specifically, the basic idea of generalization-based methods is to replace the sensitive information with a less specific, but semantically consistent value [39]. Perturbation-based methods include the link modification strategy and the randomization strategy: the former uses link addition and deletion mechanisms to meet the desired constraints, such as k-degree anonymity [40] and k-automorphism anonymity [41]; the latter attempts to change the network structure by randomly adding and removing links. In addition, differential privacy methods [42], [43] have also been proposed for network data anonymization.
Compared with the large number of generalization-based approaches [19], [44] and k-anonymity-based methods [16], [41], [45] proposed for user identity anonymization, the research on link privacy protection is insufficient. Initially, Zheleva and Getoor [18] focused on the problem of preserving the privacy of sensitive relationships in network data and defined the sensitive relationship inference problem based on the anonymized network. Thereafter, the most conservative approach for link privacy protection was proposed, which removes the sensitive relationships altogether, thus preserving any privacy that these relationships may compromise [18]. Moreover, the works [20], [46] presented random perturbation and random switching strategies for link privacy protection. Along this line, considering the structural proximity of nodes, some structure-aware randomized perturbation methods have been proposed, including the local perturbation-based methods [14], [15] and the random-walk-based method [47]. Furthermore, the Gaussian-noise-based method [48] and differential privacy methods [49]-[51] were developed for link privacy protection.

B. NETWORK DATA DE-ANONYMIZATION
Network de-anonymization techniques are actively studied to explore the vulnerabilities of current network data privacy protection mechanisms. Most of the existing de-anonymization attacks focus on user identity de-anonymization. Typically, these approaches can be classified into two categories: attribute-based identity de-anonymization and structure-based identity de-anonymization. Recently, there has been a surge of interest in identity de-anonymization that involves the attribute information of users. Most of the methods extract features from public profile fields, such as user ID and location, and from content information, e.g., timestamps and geo-tags. Thereafter, these methods adopt classifiers to infer whether the node pairs correspond to the same identity [52]. Important studies on this topic are reviewed by Shu et al. [53].
The structure-based identity de-anonymization, also called vertex re-identification, assumes that the accounts belonging to the same user across social networks have similar local structures. Thus, the subgraphs associated with target nodes can be used as background knowledge for user identification. Specifically, Nilizadeh et al. [37] matched the anonymized network with the auxiliary network and identified user identity by considering the community structure. Lee et al. [54] incorporated multi-hop neighbors' information in network structures as novel features and optimized the matching of users between the anonymized network and the auxiliary network by leveraging a machine learning technique. Narayanan and Shmatikov [55] proposed a network-topology-based de-anonymization method that first identifies some seed nodes and then propagates the mapping to new nodes based on structural similarity. To match the nodes accurately, Ji et al. [56] defined a unified similarity measurement and proposed a de-anonymization framework based on it. Ji et al. [57] comprehensively quantified the de-anonymizability of networks with seed information and provided a theoretical foundation for structure-based de-anonymization attacks. Zhou et al. [58] proposed a cross-platform unsupervised user identification algorithm based on friend relationships. The above works demonstrate that preserving the privacy of the network structure is necessary for the anonymization of user identity.
VOLUME 8, 2020
Besides identity de-anonymization, link de-anonymization, i.e., link disclosure or link re-identification, which aims to identify sensitive relationships among users from anonymized networks, is also an important issue in the field of network privacy protection. Specifically, Ying and Wu [59] investigated the sensitive relationship protection problem and verified the value of similarity measures for breaching link privacy. To mitigate the vulnerability of network anonymization mechanisms, Zhang et al. [28] developed an enhanced network anonymization method that generates fake edges that are as plausible as possible. Fire et al. [29] presented a classifier-based link reconstruction attack to identify sensitive relationships. Wu et al. [30] defined a low-rank-approximation-based de-anonymization algorithm to reconstruct a network from link-randomized observations. Vuokko and Terzi [31] proposed a maximum-likelihood-estimation-based method to reconstruct the original networks. These methods are mostly based only on the structural features of the anonymized network. In our work, we investigate how to utilize multi-view features from the target network and auxiliary network for network link de-anonymization.

C. MULTI-VIEW LEARNING AND PUBLISHING
Owing to the diverse domains and various feature extractors, multiple groups of features are currently available for specific learning problems, and each of them can be regarded as a particular view. Accordingly, the multi-view learning paradigm was developed to exploit the useful information from different views. The existing multi-view learning algorithms can be classified into three groups [60]: 1) co-training, 2) multiple kernel learning, and 3) subspace learning. Co-training algorithms enhance the learning performance in different views by using the information from one another; multiple kernel learning algorithms define a kernel function for each view and thereafter combine the kernels to improve learning performance; subspace learning algorithms aim to find a meaningful low-dimensional embedding or latent subspace shared by all feature sets. Generally, existing multi-view learning methods owe their success to either maximizing the agreement among multiple distinct views or exploiting their complementary information. Recently, many multi-view algorithms have been proposed that take into consideration the complementary information from different views, such as clustering [61] and subspace learning [62].
For privacy-preserving data publishing, Dou and Coulondre [63] presented a formal analysis of privacy violation in the context of multi-view tabular data. Yao et al. [64] defined k-anonymity based on relational view and concentrated on how to detect whether or not a given set of releasing views violates k-anonymity. In our study, we also analyze the privacy risk of multi-view data. However, our work aims to explore the network structure de-anonymization problem, which is different from the existing ones that mainly focus on tabular data.

III. PRELIMINARIES AND PROBLEM
Typically, to anonymize networks for publication, the sensitive relationships contained in the original graphs are first removed, and then anonymization strategies, such as sparsification, perturbation, and switching, are applied to add or remove network links. To quantify the guarantee level and assess the privacy risk of state-of-the-art anonymization strategies for multi-view networks, the core task of this study reduces to recovering the original social graph and identifying anonymized links as accurately as possible based on the target graph. Based on the recovered graph, the sensitive relationships can be accurately inferred with subgraph attacks, similarity measures, etc. For simplicity, this paper assumes that only one auxiliary graph is available for structure de-anonymization.

Definition 1 (Original Graph):
A social network can be modeled as a graph $SG = \{U, R\}$, in which $U$ denotes the set of users and $R \subseteq U \times U$ indicates the set of relationships between the users. If the associated parties prefer to keep link $R_{i,j}$ in graph $SG$ hidden, then $R_{i,j}$ is a sensitive relationship, and the graph $SG$ can be regarded as the original graph that needs privacy protection.
Definition 2 (Target Graph):
For a social graph $SG = \{U, R\}$ containing sensitive relationships, its structure is always modified by a certain anonymization strategy $I(\cdot)$ to preserve privacy before publishing. We refer to the published social graph as the target graph $SG_T = \{U_T, R_T\}$, where $U_T$ is the user set, $U_T = U$, and $R_T$ is the relationship set, $R_T \neq R$.
Definition 3 (Anonymized Link Set):
For a social graph $SG$, the difference between the link sets of $SG$ and its target graph $SG_T$, i.e., $R \setminus R_T$, is defined as the anonymized link set.
Problem Statement:
For an original graph $SG$, given the target graph $SG_T$ and its related auxiliary graph $SG_H$, the goal of network structure de-anonymization is to develop an algorithm that takes $SG_T$ and $SG_H$ as input and generates a de-anonymized graph $SG_D$ that approximates the original graph $SG$ as closely as possible.
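As a small illustration of Definition 3, the following sketch computes the set difference $R \setminus R_T$ for undirected graphs stored as edge lists. The function names and the toy edge lists are assumptions made for this example only.

```python
def normalize(links):
    """Store each undirected link as a sorted tuple so that (i, j) equals (j, i)."""
    return {tuple(sorted(link)) for link in links}


def anonymized_link_set(original_links, target_links):
    """Return R minus R_T: links of the original graph that are absent from the target graph."""
    return normalize(original_links) - normalize(target_links)


# Toy example (hypothetical data): R has four links, R_T keeps two of them
# and contains one link, (1, 3), that does not exist in R.
R = [(0, 1), (1, 2), (2, 3), (0, 3)]
R_T = [(1, 0), (2, 3), (1, 3)]
print(anonymized_link_set(R, R_T))
```

Note that, as defined, the set contains only links that were removed from the original graph; links that an anonymization strategy adds are not part of $R \setminus R_T$.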
The description of the notations used in this study is presented in Table 2.

B. NETWORK ANONYMIZATION MODEL
To evaluate the performance of the de-anonymization attack, we consider popular anonymization techniques that have been most widely used in structure anonymization works [12], [20], [36], [54], [65]. Specifically, the selected anonymization strategies are introduced as follows:
(1) Densification. This method ensures the anonymity of graph $SG = \{U, R\}$ by only adding $k|R|$ links randomly, where $k$ is the anonymization coefficient.
(2) Sparsification. This method anonymizes social graph $SG = \{U, R\}$ by randomly eliminating $k|R|$ links.
(3) Perturbation. This method first removes $k|R|$ links from a social graph $SG = \{U, R\}$ in the same way as the sparsification method does. Thereafter, it adds random false links until the number of links in the anonymized graph is the same as in the original one.
(4) Switching. This method selects two random edges $(i_1, j_1)$ and $(i_2, j_2)$ from social graph $SG = \{U, R\}$ such that $(i_1, j_2) \notin R$ and $(i_2, j_1) \notin R$. Thereafter, it switches the pair of links, i.e., removes links $(i_1, j_1)$ and $(i_2, j_2)$ and adds new links $(i_1, j_2)$ and $(i_2, j_1)$ instead. This step is repeated $k|R|/2$ times, which results in $k|R|$ link removals/additions.
In this study, the original graph with sensitive links can be anonymized by the various anonymization strategies, which generates different anonymized graphs on the same node set. Among the resulting networks, all except the target graph can be selected as the auxiliary graph. Thus, for original social graph de-anonymization, the target graph and the auxiliary graph can be assumed to be views corresponding to the same group of users [66].
Example 1: Given an original social graph with sensitive links, data publishers always remove the sensitive links first and then apply anonymization strategies for privacy preservation. Data publishers perform privacy protection operations independently, resulting in multiple views. Specifically, as shown in the middle of Fig. 1, two publishers anonymize the original social graph $SG$ with the perturbation strategy, in which the two links indicated by the red dotted lines are deleted and the two links indicated by the red solid lines are added. Similarly, with the switching anonymization strategy, the publishers anonymize $SG$ by switching two randomly selected links, thereby generating the graphs shown on the right of Fig. 1. Thus, for original social graph de-anonymization, the above graphs generated by the publishers can be viewed as the target social graph $SG_T$ and the auxiliary social graph $SG_H$.

IV. NETWORK STRUCTURE DE-ANONYMIZATION
In this section, we introduce a principled and explainable network representation model. To utilize the complementary information from the auxiliary network for network structure de-anonymization, the proposed model is then extended to a multi-view scenario.

A. NETWORK REPRESENTATION MODELING
Based on empirical analysis, real-world networks have been shown to have some common topological characteristics, such as small-world, scale-free, and core-periphery features [67]. Hence, networks are always assumed to have specific structural patterns for structure modeling [68]. Moreover, Koutra et al. [69] found that network structures can be summarized and compressed by using an enriched set of representative subgraphs as building blocks, such as cliques, stars, chains, and bipartite cores. Inspired by these works, in this study, networks are viewed as the linear summation of a set of elemental subgraphs with a specific interaction pattern, i.e., networks can be represented by using the elemental subgraphs as structural bases. Specifically, let $A \in \mathbb{R}^{n \times m}$ denote the adjacency matrix of an anonymized graph that consists of $m$ neighborhood structures, i.e., $[A_{:,1}, A_{:,2}, \ldots, A_{:,m}]$. Given a complete basis matrix $D = [D_{:,1}, D_{:,2}, \ldots, D_{:,m}] \in \mathbb{R}^{n \times m}$, each neighborhood structure $A_{:,i}$ can be represented as a linear combination of the bases, which is defined as follows:

$$A_{:,i} = \sum_{k=1}^{m} D_{:,k} X_{k,i}, \tag{1}$$

where $X_{k,i}$ denotes the weight corresponding to the structural basis $D_{:,k}$. Consequently, the adjacency matrix $A$ can be represented by $A = DX$, in which the representation matrix $X \in \mathbb{R}^{m \times m}$ captures the structural patterns of networks, and the matrix $D$ indicates the set of representative subgraphs.
To recognize the structural patterns of the network, the best candidate for the basis matrix $D$ is the adjacency matrix $A$ itself, and the network can be represented by the following equation:

$$A = AX. \tag{2}$$

Since social networks often contain frequent subgraphs, the columns of the representation matrix $X$ corresponding to the subgraphs should be correlated. Thus, $X$ is expected to be low-rank. Moreover, individuals may have different interaction patterns in reality. Thus, the modeling of real-world networks should be node-oriented. Because each column of the adjacency matrix $A$ represents the interactions between a node and the rest of the nodes, the $\ell_{2,1}$ norm, i.e., $\|\cdot\|_{2,1}$, is adopted in our model to characterize the node-specific corruptions in networks and learn the representation matrix; it captures the difference between the adjacency matrix $A$ and the graph representation $AX$ node by node. Based on the above observations, social networks can be modeled via the following structured low-rank representation:

$$\min_{X, E} \operatorname{rank}(X) + \lambda \|E\|_{2,1} \quad \text{s.t. } A = AX + E, \tag{3}$$

where $E$ is the noise term and $\lambda \geq 0$ is a trade-off parameter used to balance the low-rank and noise terms.
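To make the self-representation idea concrete, the sketch below builds a toy adjacency matrix with repeated neighborhood structures and shows that it admits a low-rank representation $X$ with $A = AX$ and zero residual. The toy graph (two cliques with self-loops), the use of the pseudoinverse to obtain one feasible $X$, and the function names are assumptions for illustration; this is not the optimization procedure of the paper.

```python
import numpy as np


def l21_norm(E):
    """l2,1 norm: sum of the Euclidean norms of the columns of E."""
    return float(np.linalg.norm(E, axis=0).sum())


def self_representation(A):
    """Return X = pinv(A) @ A, one matrix that satisfies A @ X == A."""
    return np.linalg.pinv(A) @ A


# Toy graph (assumption): two 3-node cliques, with self-loops so that the
# nodes of a clique share exactly the same neighborhood column.
A = np.kron(np.eye(2), np.ones((3, 3)))
X = self_representation(A)
E = A - A @ X  # node-wise residual of the representation

print("rank(X) =", np.linalg.matrix_rank(X))
print("||E||_2,1 =", round(l21_norm(E), 6))
```

Here the six neighborhood columns reduce to two structural bases, one per clique, which is the kind of redundancy the low-rank constraint is meant to exploit.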
In this study, we assume that the anonymization process does not alter the network structure significantly. Thus, the original network $SG$ can be inferred based on the structural patterns learned from the anonymized network. Fig. 2 provides an intuitive illustration of our low-rank representation-based network structure de-anonymization method. For a given network, by solving the structured low-rank representation model, three structural bases $a_1$, $a_2$, and $a_3$ are identified, and the network can be represented based on them, as shown on the left of Fig. 2 (a). Thus, for an anonymized social graph in which some neighborhood structures, such as $a_4$ and $a_5$, are perturbed for privacy preservation, the original network structures $a_4^*$ and $a_5^*$ can be recovered based on the identified structural bases and the learned representation relationships. For networks of the same dimension, we argue that the fewer structural bases a network has, the higher the proportion of redundant structure in it and the more likely it is that the anonymized structure can be recovered, as shown in Fig. 2 (b).

B. REGULARIZED MULTI-VIEW LOW RANK REPRESENTATION
Most of the existing network structure anonymization strategies for privacy preservation do not take into account the underlying structural characteristics of networks. Consequently, the difference between the original network $SG$ and the target network $SG_T$, i.e., the anonymized link set, follows structural patterns that differ from those of the original network $SG$. Thus, we argue that the anonymized link set can be identified via a network representation model centered on structural patterns.
In the previous section, the network representation model is proposed based on the assumption that only the single view data is available, i.e., the target network, for structural patterns learning. Nevertheless, in reality, multiple related social networks from different viewpoints on the same set of users contain valuable information and can be adopted as auxiliary networks for structural patterns characterization, i.e., optimizing the accuracy of the identified structural bases and the learned representation relationships, thereby improving the performance of network structure de-anonymization.
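Let $A^{(i)}$, $i = 1, 2$, represent the relationship sets of the target network $SG_T$ and the auxiliary network $SG_H$, respectively. They can be represented as follows:

$$A^{(1)} = A^{(1)} X^{(1)} + E^{(1)}, \tag{4}$$

$$A^{(2)} = A^{(2)} X^{(2)} + E^{(2)}, \tag{5}$$

where $X^{(1)}$ and $X^{(2)}$ are representation matrices and $E^{(1)}$ and $E^{(2)}$ are noise terms. Because the target network $SG_T$ and the auxiliary network $SG_H$ capture the interactions between the same group of users from different viewpoints, we expect the multiple view-specific networks to embody consistent structural patterns and be complementary to each other, which can be used for the de-anonymization of any one of them. Consequently, we define a regularizer that pushes the representation matrices closer to ensure consistency, i.e., minimizing the following problem:

$$\min_{X^{(1)}, X^{(2)}} \|X^{(1)} - X^{(2)}\|_{2,1}. \tag{6}$$

Based on this regularizer term, the view divergence between the published anonymized networks can be well mitigated. Therefore, based on the structured low-rank representation in Equation (3), the regularized multi-view low-rank representation problem can be formulated as follows:

$$\min_{X^{(i)}, E^{(i)}} \sum_{i=1}^{2} \left( \operatorname{rank}(X^{(i)}) + \lambda \|E^{(i)}\|_{2,1} \right) + \alpha \|X^{(1)} - X^{(2)}\|_{2,1} \quad \text{s.t. } A^{(i)} = A^{(i)} X^{(i)} + E^{(i)},\ i = 1, 2, \tag{7}$$

where $\|E^{(i)}\|_{2,1}$ is the $\ell_{2,1}$ norm of the noise term of the $i$-th view, and $\lambda$ and $\alpha$ are trade-off parameters. Because the $\operatorname{rank}(\cdot)$ minimization problem in objective function (7) is difficult to solve, the nuclear norm $\|X\|_*$, i.e., the sum of the singular values of $X$, was proposed as a good surrogate for the rank [70], and we subsequently arrive at the following problem formulation:

$$\min_{X^{(i)}, E^{(i)}} \sum_{i=1}^{2} \left( \|X^{(i)}\|_* + \lambda \|E^{(i)}\|_{2,1} \right) + \alpha \|X^{(1)} - X^{(2)}\|_{2,1} \quad \text{s.t. } A^{(i)} = A^{(i)} X^{(i)} + E^{(i)},\ i = 1, 2. \tag{8}$$

To solve the optimization problem in objective function (8), we adopt the inexact augmented Lagrange multiplier (inexact ALM) algorithm proposed in [71]. To facilitate the optimization, the auxiliary variables $Q^{(i)}$ (with $X^{(i)} = Q^{(i)}$) and $V$ (with $V = X^{(1)} - X^{(2)}$) are introduced to make the objective function separable:

$$\begin{aligned} \mathcal{L} = {} & \sum_{i=1}^{2} \Big( \|Q^{(i)}\|_* + \lambda \|E^{(i)}\|_{2,1} + \langle Y^{(i)}, A^{(i)} - A^{(i)} X^{(i)} - E^{(i)} \rangle + \langle K^{(i)}, X^{(i)} - Q^{(i)} \rangle \\ & + \frac{\mu}{2} \big( \|A^{(i)} - A^{(i)} X^{(i)} - E^{(i)}\|_F^2 + \|X^{(i)} - Q^{(i)}\|_F^2 \big) \Big) \\ & + \alpha \|V\|_{2,1} + \langle W, V - (X^{(1)} - X^{(2)}) \rangle + \frac{\mu}{2} \|V - (X^{(1)} - X^{(2)})\|_F^2, \end{aligned} \tag{9}$$

where $Y^{(i)}$, $K^{(i)}$, and $W$ are the Lagrange multipliers, and $\mu > 0$ is a penalty parameter. The initializations for each variable and the complete optimization algorithm for solving the problem (9) are shown in Algorithm 1.

Consequently, the optimal value of $X^{(1)}$ can be combined with the target graph $SG_T$ for network structure de-anonymization.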
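In inexact ALM schemes for nuclear-norm problems, the subproblem in the auxiliary low-rank variable is typically solved in closed form by singular value thresholding (SVT). The sketch below shows that operator; the function name `svt` is an assumption, and the exact update used for $Q^{(i)}$ in Algorithm 1 should be taken from the original paper.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding.

    Returns the minimizer of tau * ||Q||_* + 0.5 * ||Q - M||_F^2,
    obtained by shrinking every singular value of M by tau.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


# Usage: singular values 3 and 1 shrink to 1 and 0, so the result has rank 1.
M = np.diag([3.0, 1.0])
print(svt(M, 2.0))
```

Under the usual ALM convention, the threshold would be $1/\mu$ and the input would be the current representation matrix shifted by the scaled multiplier; both are stated here as assumptions about the standard scheme rather than as details confirmed by the text.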

C. MULTI-VIEW LOW-RANK CODING METHOD
The proposed regularizer in Equation (6) models the correlation between the representation matrices $X^{(1)}$ and $X^{(2)}$ via the $\ell_{2,1}$ norm to utilize the complementary information. However, the values of $X^{(1)}$ and $X^{(2)}$ learned from Algorithm 1 are approximate representations of the structural patterns of the anonymized social networks and are not accurate enough.
To model the shared structural patterns of multi-view networks directly, we define a common representation matrix $\tilde{X}$ and propose a novel method called MVLRC. The architecture of the proposed multi-view learning model for network structure de-anonymization is given in Fig. 3. Here, we present the details of MVLRC.

1) ALGORITHM EXPLANATION
Algorithm 1 Solving Problem (9) by Inexact ALM
Input: The adjacency matrices $A^{(1)}$, $A^{(2)}$ of the target and auxiliary networks, trade-off parameters $\lambda$ and $\alpha$.
Output: The representation matrices $X^{(i)}$ and error matrices $E^{(i)}$, $i = 1, 2$.
1: Initialize the variables and multipliers to zero (including $W = 0$), $\mu = 10^{-6}$, $\rho = 1.1$, $\varepsilon = 10^{-8}$, $\max_\mu = 10^{10}$;
2: while not converged do
3: Fix the other variables and update $Q^{(i)}$;
4: Fix the other variables and update $X^{(1)}$;
5: Fix the other variables and update $X^{(2)}$;
6: Fix the other variables and update $V$;
7: Fix the other variables and update $E^{(i)}$;
8: Update the multipliers $Y^{(i)}$, $K^{(i)}$, and $W$, with $W = W + \mu(V - (X^{(1)} - X^{(2)}))$;
9: Update the parameter $\mu$ by $\mu = \min(\rho\mu, \max_\mu)$;
10: Check the convergence conditions;
11: end while

The optimal value of $X^{(1)}$ in Algorithm 1, i.e., the estimation of the structural patterns of the target network, plays an essential role in network structure de-anonymization. To characterize the structural patterns effectively, the regularization term $\|X^{(1)} - X^{(2)}\|_{2,1}$ is introduced to encourage consistent structural information and restrain the discrepancy between the anonymized and the auxiliary networks. Consequently, the optimal values of $X^{(1)}$ and $X^{(2)}$ are robust to the corruptions coming from anonymization manipulations and inclined toward the common knowledge of the multi-view networks. However, they are still not an accurate reflection of the networks' structural patterns. To solve the problem, we define the representation matrix $\tilde{X}$ to characterize the common structural patterns and modify the regularized multi-view low-rank representation as follows:

$$\min_{\tilde{X}, E^{(i)}} \|\tilde{X}\|_* + \lambda \sum_{i=1}^{2} \|E^{(i)}\|_{2,1} \quad \text{s.t. } A^{(i)} = A^{(i)} \tilde{X} + E^{(i)},\ i = 1, 2. \tag{10}$$
Lemma 1: Solving the network representation model defined in the objective function (10) can accurately characterize the common structural patterns of multi-view networks.
Proof: For the network representation models $A^{(1)} = A^{(1)}\tilde{X} + E^{(1)}$ and $A^{(2)} = A^{(2)}\tilde{X} + E^{(2)}$, the low-rank pursuit of $\|\tilde{X}\|_*$ and the noise minimization of $\|E^{(i)}\|_{2,1}$, $i = 1, 2$, collectively require the models to reconstruct the networks with as few neighborhood structures, i.e., structural bases, and as little noise as possible. Because of the consistency between $A^{(1)}$ and $A^{(2)}$, the networks have similar structural patterns. Consequently, the structural bases and their contributions to network reconstruction are basically the same. Thus, the networks $A^{(1)}$ and $A^{(2)}$ can be inferred approximately based on a common representation matrix $\tilde{X}$, i.e., $A^{(1)}\tilde{X}$ and $A^{(2)}\tilde{X}$, with the sparse differences being modeled by $E^{(i)}$, $i = 1, 2$. Finally, the common structural patterns of multi-view networks can be captured accurately by the optimal value of the matrix $\tilde{X}$.
Example 2: Here, we consider multi-view networks that contain correlated subgraphs, as shown in Fig. 4 (a) and Fig. 4 (b). For the nonzero regions of their adjacency matrices, we present three different network representations with various constraints. Specifically, the first line in Fig. 4 (a) and Fig. 4 (b) is the network representation in which the representation matrices are full rank and the error matrices are empty. The second line is the network representation in which the rank of the representation matrices is reduced to 2, and the error matrices still remain empty. According to the third line, the network representations of the two subgraphs have the same representation matrix with the lowest rank value, i.e., 1, and sparse error matrices. By comparing the three cases, we can conclude that the low-rank and sparse constraints collectively propel the network representation model to represent the multi-view networks with a common representation matrix.
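To solve the MVLRC model, we first introduce the auxiliary variable $\tilde{Q}$ to make the objective function (10) separable.
The problem can subsequently be transformed as follows:

$$\min_{\tilde{X}, \tilde{Q}, E^{(i)}} \|\tilde{Q}\|_* + \lambda \sum_{i=1}^{2} \|E^{(i)}\|_{2,1} \quad \text{s.t. } A^{(i)} = A^{(i)} \tilde{X} + E^{(i)},\ i = 1, 2,\ \tilde{X} = \tilde{Q}. \tag{11}$$

Thus, the augmented Lagrangian function of the objective function (11) is defined below:

$$\begin{aligned} \mathcal{L} = {} & \|\tilde{Q}\|_* + \lambda \sum_{i=1}^{2} \|E^{(i)}\|_{2,1} + \sum_{i=1}^{2} \langle Y^{(i)}, A^{(i)} - A^{(i)} \tilde{X} - E^{(i)} \rangle + \langle K, \tilde{X} - \tilde{Q} \rangle \\ & + \frac{\mu}{2} \Big( \sum_{i=1}^{2} \|A^{(i)} - A^{(i)} \tilde{X} - E^{(i)}\|_F^2 + \|\tilde{X} - \tilde{Q}\|_F^2 \Big), \end{aligned} \tag{12}$$

where $Y^{(i)}$ and $K$ are Lagrangian multipliers and $\mu > 0$ is a penalty parameter.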
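Update $E^{(i)}$: Here we show how to update $E^{(i)}$ ($i = 1, 2$) with $\tilde{Q}$ and $\tilde{X}$ fixed. After dropping the terms irrelevant to $E^{(i)}$, the function (12) can be transformed as follows:

$$\min_{E^{(i)}} \lambda \|E^{(i)}\|_{2,1} + \frac{\mu}{2} \Big\| E^{(i)} - \Big( A^{(i)} - A^{(i)} \tilde{X} + \frac{Y^{(i)}}{\mu} \Big) \Big\|_F^2. \tag{13}$$

The solution to the problem is presented in [34]. Specifically, let $G = A^{(i)} - A^{(i)} \tilde{X} + \frac{Y^{(i)}}{\mu}$; the $k$-th column of $E^{(i)}$ is given as follows:

$$E^{(i)}_{:,k} = \begin{cases} \dfrac{\|G_{:,k}\|_2 - \lambda/\mu}{\|G_{:,k}\|_2}\, G_{:,k}, & \text{if } \|G_{:,k}\|_2 > \lambda/\mu, \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

Update $\tilde{X}$: After dropping the terms independent of $\tilde{X}$, Equation (12) can be transformed as follows:

$$\min_{\tilde{X}} \sum_{i=1}^{2} \frac{\mu}{2} \Big\| A^{(i)} - A^{(i)} \tilde{X} - E^{(i)} + \frac{Y^{(i)}}{\mu} \Big\|_F^2 + \frac{\mu}{2} \Big\| \tilde{X} - \tilde{Q} + \frac{K}{\mu} \Big\|_F^2.$$

The above process is repeated until convergence. The details of the MVLRC algorithm for finding the common structural patterns are presented in Algorithm 2.

Algorithm 2 MVLRC
1: Initialize the variables, multipliers, and parameters;
2: while not converged do
3: Fix the other variables and update $\tilde{Q}$;
4: Fix the other variables and update $\tilde{X}$;
5: Fix the other variables and update $E^{(i)}$ via (14);
6: Update the multipliers $K$ and $Y^{(i)}$;
7: Update the parameter $\mu$ by $\mu = \min(\rho\mu, \max_\mu)$;
8: Check the convergence conditions $\|A^{(i)} - A^{(i)} \tilde{X} - E^{(i)}\|_\infty < \varepsilon$ and $\|\tilde{X} - \tilde{Q}\|_\infty < \varepsilon$;
9: end while
10: Output $\tilde{X}$, $E^{(i)}$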
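The following is a minimal sketch of an inexact ALM loop for the common-representation model, assuming the objective $\|\tilde{X}\|_* + \lambda \sum_i \|E^{(i)}\|_{2,1}$ subject to $A^{(i)} = A^{(i)}\tilde{X} + E^{(i)}$ and the splitting $\tilde{X} = \tilde{Q}$. The function names, the zero initialization, the default value of `lam`, the iteration cap, and the closed-form $\tilde{X}$ update (obtained here by setting the gradient of the quadratic subproblem to zero) are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: shrink the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def shrink_columns(G, tau):
    """Column-wise shrinkage, i.e. the proximal operator of tau * ||.||_{2,1}."""
    norms = np.linalg.norm(G, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return G * scale


def mvlrc(views, lam=0.1, mu=1e-6, rho=1.1, eps=1e-8, mu_max=1e10, max_iter=1000):
    """Learn a common representation X and per-view errors E (sketch)."""
    m = views[0].shape[1]
    X, Q, K = np.zeros((m, m)), np.zeros((m, m)), np.zeros((m, m))
    E = [np.zeros_like(A, dtype=float) for A in views]
    Y = [np.zeros_like(A, dtype=float) for A in views]
    gram = sum(A.T @ A for A in views) + np.eye(m)  # constant across iterations
    for _ in range(max_iter):
        # Q-update: nuclear-norm subproblem solved by SVT.
        Q = svt(X + K / mu, 1.0 / mu)
        # X-update: least-squares subproblem solved in closed form.
        rhs = sum(A.T @ (A - Ei + Yi / mu) for A, Ei, Yi in zip(views, E, Y))
        X = np.linalg.solve(gram, rhs + Q - K / mu)
        # E-update: column-wise shrinkage with threshold lam / mu.
        E = [shrink_columns(A - A @ X + Yi / mu, lam / mu) for A, Yi in zip(views, Y)]
        # Multiplier and penalty updates.
        R = [A - A @ X - Ei for A, Ei in zip(views, E)]
        Y = [Yi + mu * Ri for Yi, Ri in zip(Y, R)]
        K = K + mu * (X - Q)
        mu = min(rho * mu, mu_max)
        # Convergence check on both constraint residuals (infinity norm).
        if max(np.abs(Ri).max() for Ri in R) < eps and np.abs(X - Q).max() < eps:
            break
    return X, E


# Usage on two hypothetical views of the same 6-node graph.
A1 = np.kron(np.eye(2), np.ones((3, 3)))
A2 = A1.copy()
A2[0, 3] = A2[3, 0] = 1.0  # one spurious link in the second view
X, E = mvlrc([A1, A2])
print(X.shape, [round(float(np.abs(Ei).sum()), 3) for Ei in E])
```

Because each view shares the same $\tilde{X}$, the Gram matrix of the $\tilde{X}$ update is fixed and could be factorized once; the sketch recomputes the solve each iteration for brevity.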

D. DE-ANONYMIZATION ALGORITHM
Given a target graph $SG_T$, the goal of our study is to generate a de-anonymized graph $SG_D$ based on the topology of $SG_T$ and the learned optimal structural patterns, thereby inferring the anonymized links that are perturbed for privacy preservation.
According to the network representation model, the target network can be inferred based on the linear combination of the basis matrix with the representation weights. To maintain the data utility for subsequent data analysis, the anonymization operations for privacy preservation are always limited, and the anonymized network largely retains the intrinsic structural features of the original network. Thus, the adjacency matrix $A^{(1)}$ of the target network is the strongest candidate to be the basis matrix. In addition, the representation matrix $X^{(*)}$, learned from the anonymized networks by Algorithm 1 or Algorithm 2, captures the structural patterns and can be used for target network reconstruction. Accordingly, we can infer the target network structure by $A^{(1)} X^{(*)}$.
The whole procedure for network structure de-anonymization is shown in Algorithm 3. Specifically, in the first step, the structural patterns, captured by the representation matrix of the target network, can be efficiently learned from the multi-view social networks by Algorithm 1 (i.e., $X^{(1)}$) or Algorithm 2 (i.e., $\tilde{X}$). Then, the adjacency matrix $O$ of the target network can be calculated in Step 2.
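The reconstruction step $O = A^{(1)} X^{(*)}$ can be sketched as below. Two parts are assumptions made only for this illustration: the representation matrix is a stand-in built from the top singular vectors of the target adjacency matrix (it is not learned by Algorithm 1 or Algorithm 2), and `rank_candidate_links` is a hypothetical post-processing step that ranks absent links by their reconstructed score.

```python
import numpy as np


def deanonymize(A_target, X):
    """Reconstruct the target adjacency matrix as O = A^(1) X."""
    return A_target @ X


def rank_candidate_links(A_target, O, top=5):
    """Hypothetical post-processing: absent links with the highest scores in O."""
    n = A_target.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    missing = A_target[iu, ju] == 0
    scores = (O[iu, ju] + O[ju, iu]) / 2.0  # symmetrize the scores
    order = np.argsort(-scores[missing])[:top]
    return [(int(i), int(j)) for i, j in zip(iu[missing][order], ju[missing][order])]


# Toy target graph (assumption): two 4-node cliques with self-loops,
# from which the link (0, 1) has been removed by the publisher.
A = np.kron(np.eye(2), np.ones((4, 4)))
A[0, 1] = A[1, 0] = 0.0

# Stand-in for the learned representation: projection onto the two
# dominant right singular vectors of A (one per clique).
_, _, Vt = np.linalg.svd(A)
V = Vt[:2].T
X = V @ V.T

O = deanonymize(A, X)
print(rank_candidate_links(A, O, top=1))
```

In this toy case the removed link receives a clearly positive reconstructed score, while links between the two cliques score near zero, which illustrates why low-rank structure makes removed links recoverable.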

E. COMPUTATIONAL COMPLEXITY
In this section, we provide a detailed complexity analysis of the proposed algorithms. The computational cost is dominated by nuclear norm computation (via SVD), matrix inversion, and matrix multiplication. Specifically, the exact SVD of an n × m matrix has time complexity O(min{nm^2, n^2 m}). For an m × m matrix, the time complexity of the SVD is O(m^3), which is time-consuming when m is large, i.e., when the number of data samples is large. Fortunately, the SVD of an m × m matrix can be accelerated to O(r^2 m) according to [72], where r is the rank of the low-rank matrix. In addition, matrix inversion and matrix multiplication each cost O(m^3).
Suppose A^(1) ∈ R^{n×m} and A^(2) ∈ R^{n×m}. The main time-consuming components of Algorithm 1 are solving for Q^(1) and Q^(2) in Step 3 and updating X^(1) and X^(2). Since Q^(1), Q^(2) ∈ R^{n×m}, the time complexity of the SVDs on Q^(1) and Q^(2) is O(l_1 m^3) and O(l_2 m^3), respectively, where l_1 and l_2 are the total numbers of SVD computations. Meanwhile, in Algorithm 1, the matrix inversions and multiplications for X^(1) and X^(2) cost O(l_3 m^3) and O(l_4 m^3), respectively, where l_3 and l_4 are the total numbers of matrix inversion and multiplication operations. Therefore, the total computational complexity of Algorithm 1 is approximately O(t l_1 m^3 + t l_2 m^3 + t l_3 m^3 + t l_4 m^3), assuming that there are t iterations. Moreover, the main time-consuming processes of Algorithm 2 are the SVD computation used in solving Q and the matrix inversion and multiplication in solving X; the total complexity of Algorithm 2 is O(t f_1 m^3 + t f_2 m^3), where f_1 is the total number of SVD computations, f_2 is the total number of matrix inversion and multiplication operations, and t is the number of iterations. In Algorithm 3, the main complexity comes from the de-anonymization of the target network in Step 2, where the matrix multiplication costs O(nm^2) for A^(1) ∈ R^{n×m} and X ∈ R^{n×m}. Thus, the total complexity of the de-anonymization method combining Algorithm 1 and Algorithm 2 is O((l_1 + l_2 + l_3 + l_4 + f_1 + f_2) t m^3). Because l_1, l_2, l_3, l_4, f_1, f_2, and t are all small constants, the complexity is O(m^3). Similarly, the total complexity of the de-anonymization method combining Algorithm 1 and Algorithm 3 is O(m^3) + O(nm^2).
It is worth noting that the cost of the proposed method is incurred only once and the computation may be performed off-line. Therefore, the method is feasible for graphs with a few thousand nodes. For very large graphs with millions of nodes, randomized algorithms may be used to compute the SVD. Furthermore, compared with existing graph data de-anonymization algorithms, the complexity of the proposed methods is competitive. For example, the method proposed by Narayanan et al. [73] costs O(n^4), the method in [74] costs O(n^3), and the methods in [75] cost O(n^3).
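As an illustration of the remark on randomized algorithms for the SVD, the sketch below implements a basic randomized range finder with power iterations. The paper does not say which randomized algorithm is intended, so this technique is an assumed example, and the function name `randomized_svd` and its parameters `oversample` and `n_iter` are hypothetical choices made for this sketch.

```python
import numpy as np

def randomized_svd(M, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of M by random projection."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    k = min(rank + oversample, n, m)
    # Sample the column space of M with a Gaussian test matrix.
    Y = M @ rng.standard_normal((m, k))
    # Power iterations sharpen the captured subspace; QR keeps it well conditioned.
    for _ in range(n_iter):
        Y, _ = np.linalg.qr(Y)
        Y = M @ (M.T @ Y)
    Qb, _ = np.linalg.qr(Y)
    # Exact SVD of the small k x m projected matrix.
    B = Qb.T @ M
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Qb @ Ub)[:, :rank], s[:rank], Vt[:rank]

# Toy usage on an exactly rank-5 matrix.
rng_demo = np.random.default_rng(1)
M_demo = rng_demo.standard_normal((200, 5)) @ rng_demo.standard_normal((5, 200))
U_demo, s_demo, Vt_demo = randomized_svd(M_demo, rank=5)
s_exact = np.linalg.svd(M_demo, compute_uv=False)[:5]
```

The dominant cost is the handful of products with M, so the expensive full decomposition is applied only to the small projected matrix.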

V. EXPERIMENTS
In this section, the performance of the proposed MVLRC algorithm is evaluated on two synthetic networks and three real-world datasets obtained from the Stanford Network Analysis Project (SNAP). We adopt Reliability and AUC as performance metrics, and verify the superiority of the MVLRC algorithm over the comparison methods under various anonymization techniques. In addition, we evaluate the robustness of the MVLRC algorithm under diverse parameter settings, and visually compare the true and identified anonymized links to demonstrate the effectiveness of the MVLRC algorithm.

A. EXPERIMENTAL SETTING
Our approach is compared to three state-of-the-art single-view and multi-view structure de-anonymization methods. The details of the methods are given as follows:
• RPCA-based Recovery Approach. RPCA [76], [77] is applied for subspace segmentation and link prediction. Thus, we identify the true network structure and infer the anonymized link set by conducting RPCA on the anonymized network (referred to as "RPCA").
• LRR-based Recovery Approach. LRR [34] is a representative method to recover the original row space from a set of corrupted observations. Here, we utilize LRR to model the anonymized network, where the anonymized links are viewed as noise, outliers, and sample-specific corruptions (referred to as "LRR").
• MVLRR. The method recovers the target network by combining Algorithm 1 and Algorithm 3, where the anonymized auxiliary network is incorporated by regularization.
• MVLRC. The method recovers the target network by combining Algorithm 2 and Algorithm 3, where the common structural patterns are characterized by a specific representation matrix.
To measure the accuracy of the proposed MVLRC method for network structure de-anonymization and anonymized link identification, we adopt reliability as the evaluation metric, defined as follows:
where AN is the number of added links accurately found, DN is the number of deleted links accurately found, and TAN and TDN are the total numbers of added links and deleted links used for network structure anonymization, respectively. The more anonymized links a de-anonymization algorithm identifies, the higher the value of the reliability metric. Moreover, the AUC (Area Under the Receiver Operating Characteristic curve) metric is also adopted for the performance evaluation of network structure de-anonymization; a higher AUC value means better de-anonymization performance. Algorithm 4 details the overall evaluation process of MVLRC. First, based on the target network and the auxiliary network, we infer the target network using the de-anonymization methods. Then, we compare the inferred target network with the original target network and obtain the difference between them. Next, we sort the possible links and compare them with the real anonymized link set to calculate the evaluation metrics.

Algorithm 4 Experimental Evaluation Process
Input: The adjacency matrix A of the target network, the adjacency matrices A^(1), A^(2) of the target network and auxiliary network, and the real anonymized link set I generated by various anonymization techniques in the target network.
Output: The values of the evaluation metrics Reliability and AUC.
1: Estimate the target network O based on A^(1) and A^(2) using all candidate structure de-anonymization methods;
2: Calculate the discrepancy link set by S = A − O;
3: Rank the entries of S based on their absolute values |s_{i,j}|, and select the top |I| entries as the inferred anonymized link set P, in which the negative items correspond to the deleted links and the positive items correspond to the added links in the anonymization process;
4: Calculate the evaluation metrics Reliability and AUC by comparing P and I;
5: Return the values of Reliability and AUC.
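Steps 2 to 4 of Algorithm 4 can be sketched as follows for an undirected network. Two points are assumptions made for this illustration and are not confirmed by the text: A is read as the published (anonymized) target adjacency, which matches the sign convention of step 3, and Reliability is taken to be (AN + DN) / (TAN + TDN). The function names are hypothetical, and AUC is omitted.

```python
import numpy as np

def infer_anonymized_links(A, O, num_links):
    """Rank |A - O| over node pairs and split the top entries by sign."""
    S = np.asarray(A, dtype=float) - np.asarray(O, dtype=float)
    # Undirected graph: each node pair is scored once (upper triangle).
    rows, cols = np.triu_indices_from(S, k=1)
    scores = S[rows, cols]
    top = np.argsort(-np.abs(scores), kind="stable")[:num_links]
    # Positive discrepancy: link present in A but not in O, i.e. an added link.
    added = {(int(rows[t]), int(cols[t])) for t in top if scores[t] > 0}
    # Negative discrepancy: link missing in A but present in O, i.e. a deleted link.
    deleted = {(int(rows[t]), int(cols[t])) for t in top if scores[t] < 0}
    return added, deleted

def reliability(added_pred, deleted_pred, added_true, deleted_true):
    """Assumed form of the metric: (AN + DN) / (TAN + TDN)."""
    an = len(added_pred & added_true)
    dn = len(deleted_pred & deleted_true)
    return (an + dn) / (len(added_true) + len(deleted_true))

# Toy usage: a path 0-1-2-3 is anonymized by adding (0, 3) and deleting (1, 2).
O_demo = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
A_demo = np.array([[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]])
added_demo, deleted_demo = infer_anonymized_links(A_demo, O_demo, num_links=2)
score_demo = reliability(added_demo, deleted_demo, {(0, 3)}, {(1, 2)})
```

Here O_demo plays the role of a perfect reconstruction, so both perturbed links are recovered.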

B. RESULTS ON SYNTHETIC NETWORKS
In this section, we conduct experiments on synthetic networks to confirm the expectation that MVLRC performs well in this case. Based on the Newman-Watts-Strogatz model [78], we generate a small-world network in which the number of nodes is set to 1000, the average node degree is set to 8, and the probability of random reconnection is set to 0.3. Meanwhile, we produce a community network with the Lancichinetti-Fortunato-Radicchi (LFR) model [79], where the number of nodes is set to 1000, the average node degree is 5, and the mixing ratio is 0.2. Table 3 and Table 4 show the reliability and AUC results of RPCA, LRR, MVLRR, and MVLRC under different anonymization techniques on the small-world network. It can be clearly seen that MVLRC outperforms the other methods for anonymized link inference, because the structural patterns of the small-world network are better captured by MVLRC. Moreover, a similar conclusion can be drawn from the experimental results on the LFR network, as shown in Table 5 and Table 6. Therefore, the proposed MVLRC method is effective for the de-anonymization of synthetic networks.
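The small-world network described above could be generated along the following lines. This is a standard-library sketch of the Newman-Watts-Strogatz construction, not the generator used in the experiments. Mapping the stated average degree of 8 to the lattice parameter k = 8 is an assumption (the added shortcuts raise the mean degree slightly above k), and the function name and seed are hypothetical.

```python
import random

def newman_watts_strogatz(n, k, p, seed=None):
    """Return the undirected edge set of a Newman-Watts-Strogatz graph."""
    rng = random.Random(seed)
    edges = set()
    # Ring lattice: link every node to its k // 2 nearest neighbours on each side.
    for u in range(n):
        for d in range(1, k // 2 + 1):
            v = (u + d) % n
            edges.add((min(u, v), max(u, v)))
    # For each lattice edge, add one shortcut with probability p (no edge is removed).
    for u, _ in sorted(edges):
        if rng.random() < p:
            w = rng.randrange(n)
            # Redraw until the shortcut is neither a self-loop nor a duplicate.
            while w == u or (min(u, w), max(u, w)) in edges:
                w = rng.randrange(n)
            edges.add((min(u, w), max(u, w)))
    return edges

edges_demo = newman_watts_strogatz(n=1000, k=8, p=0.3, seed=42)
```

Unlike the original Watts-Strogatz model, this variant only adds edges, so the ring lattice stays intact and the graph remains connected.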

C. RESULTS ON EMAIL-EuAll DATABASE
The Email-EuAll Database [80] is obtained from a large, undisclosed European research institution, and contains 3,038,531 emails between 287,755 different email addresses. Nodes represent individual persons who sent or received email messages, and links denote emails sent from one person to another. We view the database as a simple, undirected graph. Because the large scale of real-world networks makes experiments conducted directly on them time-consuming and sometimes impractical, researchers in social network analysis often sample the real-world networks first and then conduct experiments on the sampled networks. Similarly, in our study, all algorithms are tested on networks of 1000 nodes randomly sampled from the database. The sampled networks are anonymized with the techniques presented in Section 2.2 to generate the target and auxiliary networks. For each sampled network, we repeat our experiments 10 times and report the average result. Table 7 shows the results of RPCA, LRR, MVLRR, and MVLRC under different anonymization techniques. We can observe that the multi-view approaches MVLRR and MVLRC outperform the single-view methods RPCA and LRR in terms of reliability. The results demonstrate that the auxiliary network is valuable for de-anonymization, and that the proposed multi-view framework is effective for modeling complementary information. In addition, Table 7 shows that LRR outperforms RPCA when only the target network is available. This can be explained by the stronger expressive capability of LRR compared with RPCA. For multi-view modeling-based network structure de-anonymization, Table 7 illustrates that MVLRC outperforms MVLRR. Here, the advantages of MVLRC are mainly due to its methodology.
Specifically, MVLRC directly targets learning the common structural patterns through a specific representation matrix, which determines the de-anonymization results. In contrast, MVLRR learns the structural patterns of the target network and those of the auxiliary network separately, coupled by a regularization term. Moreover, according to Table 7, the values of the reliability metric under the densification anonymization strategy clearly increase as the anonymization coefficient k increases, which indicates that the densification strategy tends to be ineffective for network structure anonymization. Meanwhile, Table 8 shows the de-anonymization results in terms of the AUC metric. The results show that MVLRC generally obtains the best performance among the de-anonymization methods under various anonymization techniques and coefficients, and that the multi-view de-anonymization algorithms perform better than the single-view ones.
Besides its superior de-anonymization accuracy, another advantage of MVLRC is that it works well under a wide range of parameter specifications, as shown in Fig. 5. MVLRC is better than the other methods as the parameter λ varies from 0.10 to 0.18, as shown in Fig. 5 (a). Moreover, MVLRC is not sensitive to the parameter λ on this dataset. As the size of the sampled network grows from 500 to 1,500, MVLRC is better than the other methods in all cases, as shown in Fig. 5 (b). It is worth noting that similar results are obtained when the other anonymization techniques are adopted.
To test the effectiveness of MVLRC for network structure de-anonymization, we visually compare the true anonymized link set and the identified anonymized links, as shown in Fig. 6. In detail, the added and deleted links in the anonymized network are presented in Fig. 6 (a). By comparison, in Fig. 6 (b), the orange lines indicate the identified anonymized links while the blue lines represent the undetected ones. It is worth noting that most of the anonymized links are identified correctly.

D. RESULTS ON FACEBOOK DATABASE
The Facebook Database [81] contains information on nearly 10 million pairs of users on Facebook. The website aims to promote and facilitate interactions among friends, colleagues, etc. For example, if user A and user B are friends, or they share the same political leanings and hobbies, the network would create a link between them with high probability. Here we randomly sample the dataset into a network with 1,000 nodes and run RPCA, LRR, MVLRR, and MVLRC on it. Table 9 shows the de-anonymization results under different anonymization strategies and various anonymization coefficients. It can be seen that our proposed MVLRC algorithm performs better than the others in terms of the reliability metric. The experimental results agree with the discussion in Section 4: the auxiliary network does contain valuable information for de-anonymization, and the MVLRC method performs well in capturing common structural information. Table 10 shows the performance of the de-anonymization methods in terms of AUC. The results demonstrate that MVLRC outperforms the other methods.
To examine the robustness of the proposed method, we perform experiments with various trade-off parameters and different network sizes. We present the reliability values corresponding to the de-anonymization methods in Fig. 7. The results verify the superiority of the MVLRC method under various conditions, which confirms the results obtained in the previous subsection.
After reconstructing the target network, we infer the anonymized links by comparing the recovered target network with the original target network. Then, we estimate the consistency between the inferred anonymized links and the true anonymized links to measure the de-anonymization accuracy. The detailed results presented in Fig. 8 show that MVLRC performs well for network structure de-anonymization.

E. RESULTS ON BITCOIN-ALPHA DATABASE
To further verify the de-anonymization performance of MVLRC, we adopt the Bitcoin-Alpha dataset [82] for evaluation. The data is collected from Bitcoin Alpha, an online trust platform for users who trade using Bitcoin. Since anonymity makes transactions risky, many researchers use Bitcoin-Alpha and Bitcoin-OTC to verify the effectiveness of de-anonymization algorithms. We transform the Bitcoin-Alpha dataset into an undirected graph in which every edge has a weight of 1, and then use the sampled subgraphs as experimental networks.
Consistent with the experimental results on the Email-EuAll and Facebook networks, our algorithm performs better than the other methods on the Bitcoin-Alpha network. Specifically, Table 11 and Table 12 show the evaluation results in terms of reliability and AUC for different anonymization strategies and various anonymization coefficients on networks of 1000 nodes. Compared with the single-view methods, the multi-view learning algorithms show significant performance improvements. Furthermore, among the multi-view learning algorithms, MVLRC achieves better performance than MVLRR for network structure de-anonymization. Moreover, the experimental results in Fig. 9 under different parameter settings support the robustness of our MVLRC approach. In addition, the results in Fig. 10 illustrate the effectiveness of our MVLRC method for anonymized network recovery.
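The transformation of the dataset into an undirected, unit-weight graph can be sketched as below. The input layout (source, target, rating) with node ids in 0..n-1 is a hypothetical format chosen for this example. Keeping every rated pair regardless of the rating's sign and dropping self-loops are also assumptions, since the text does not specify them.

```python
import numpy as np

def to_undirected_unweighted(edges, n):
    """Build a symmetric 0/1 adjacency matrix from directed, rated edges."""
    A = np.zeros((n, n), dtype=int)
    for src, dst, _rating in edges:
        # Assumption: self-loops are dropped to keep the graph simple.
        if src == dst:
            continue
        # Direction and rating are discarded; each pair gets weight 1.
        A[src, dst] = 1
        A[dst, src] = 1
    return A

# Toy usage: two opposite ratings between 0 and 1 collapse into one link.
edges_in = [(0, 1, 10), (1, 0, -3), (2, 1, 5), (2, 2, 1)]
A_btc = to_undirected_unweighted(edges_in, n=3)
```

The resulting matrix can then be sampled into subgraphs in the same way as the other datasets.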

F. TIME CONSUMPTION
We measure the average running time of the proposed approaches, RPCA, and LRR over 10 runs on the two synthetic networks and three real-world databases, as shown in Fig. 11. The results show that the multi-view low-rank learning methods MVLRR and MVLRC generally cost more time than the single-view low-rank learning methods RPCA and LRR. However, in exchange for this higher computational cost, the multi-view methods achieve better link inference performance than the single-view methods. Moreover, among the multi-view low-rank learning methods, the running time of MVLRC is much lower than that of MVLRR, which demonstrates the efficiency of the proposed multi-view network structural learning procedure.

VI. CONCLUSION AND DISCUSSION
Data publication has become a vital foundation for big data analysis and applications; however, inappropriate sharing and usage of data could threaten users' privacy. In this study, we propose the MVLRC algorithm for network structure de-anonymization on multi-view network data, which is common in practice. The method models the target network and the auxiliary network together to learn the common structural patterns, thereby identifying the anonymized link set of the target network. Our empirical results on real-world networks show highly promising improvements in the accuracy of anonymized link inference compared with methods that utilize only single-view data. Therefore, besides the target network, auxiliary networks collected from different viewpoints could also be exploited to strengthen privacy inference attacks, which challenges traditional privacy protection methods.
Three problems are worth considering in future research: 1) investigating the performance of network de-anonymization algorithms in the face of more sophisticated privacy-preserving techniques, 2) exploring theoretically the influence of the auxiliary network on de-anonymization accuracy, and 3) developing effective anonymization methods for multi-view network data.