Identifying the Topology of Undirected Networks from Diffused Non-stationary Graph Signals

We address the problem of inferring an undirected graph from nodal observations, which are modeled as non-stationary graph signals generated by local diffusion dynamics that depend on the structure of the unknown network. Using the so-called graph-shift operator (GSO), which is a matrix representation of the graph, we first identify the eigenvectors of the shift matrix from realizations of the diffused signals, and then estimate the eigenvalues by imposing desirable properties on the graph to be recovered. Different from the stationary setting where the eigenvectors can be obtained directly from the covariance matrix of the observations, here we need to estimate first the unknown diffusion (graph) filter -- a polynomial in the GSO that preserves the sought eigenbasis. To carry out this initial system identification step, we exploit different sources of information on the arbitrarily-correlated input signal driving the diffusion on the graph. We first explore the simpler case where the observations, the input information, and the unknown graph filter are linearly related. We then address the case where the relation is given by a system of matrix quadratic equations, which arises in pragmatic scenarios where only the second-order statistics of the inputs are available. While such quadratic filter identification problem boils down to a non-convex fourth order polynomial minimization, we discuss identifiability conditions, propose efficient algorithms to approximate the solution and analyze their performance. Numerical tests illustrate the effectiveness of the proposed topology inference algorithms in recovering brain, social and urban transportation networks using synthetic and real-world signals.


I. INTRODUCTION
Consider a network represented as a weighted and undirected graph G, consisting of a node set N of cardinality N, an edge set E of unordered pairs of elements in N, and edge weights A_ij ∈ R such that A_ij = A_ji = 0 for all (i, j) ∉ E. The edge weights A_ij are collected in the symmetric adjacency matrix A ∈ R^{N×N}. More broadly, one can define a generic graph-shift operator (GSO) S ∈ R^{N×N} as any matrix having the same sparsity pattern as that of G [2]. Although the choice of S can be adapted to the problem at hand, it is often chosen as either A, the Laplacian L := diag(A1) − A, or their normalized counterparts [3].
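As a concrete illustration, the GSO choices above can be formed directly from an adjacency matrix; a minimal numpy sketch on a toy 4-node graph of our own choosing:

```python
import numpy as np

# A small undirected, weighted graph on N = 4 nodes (toy example)
A = np.array([[0., 1., 0., 2.],
              [1., 0., 3., 0.],
              [0., 3., 0., 1.],
              [2., 0., 1., 0.]])

# Common choices of graph-shift operator (GSO)
d = A @ np.ones(4)                               # weighted degrees
L = np.diag(d) - A                               # Laplacian L = diag(A1) - A
L_norm = np.eye(4) - np.diag(d**-0.5) @ A @ np.diag(d**-0.5)  # normalized Laplacian

assert np.allclose(L, L.T) and np.allclose(L_norm, L_norm.T)  # symmetric GSOs
assert np.allclose(L @ np.ones(4), 0)            # Laplacian rows sum to zero
```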
Work in this paper is supported by the Spanish MINECO grants OMICRON (TEC2013-41604-R) and KLINILYCS (TEC2016-75361-R), and MIT IDSS seed grant. R. Shafipour and G. Mateos are with the Dept. of Electrical and Computer Eng., Univ. of Rochester. S. Segarra is with the Inst. for Data, Systems, and Society, Massachusetts Inst. of Technology. A. G. Marques is with the Dept. of Signal Theory and Comms., King Juan Carlos Univ. Emails: rshafipo@ur.rochester.edu, segarra@mit.edu, antonio.garcia.marques@urjc.es, and gmateosb@ece.rochester.edu. Part of the results in this paper were presented at the 2017 ICASSP Conference [1].
Our focus in this paper is on identifying graphs that explain the structure of a random signal. Formally, let x = [x_1, ..., x_N]^T ∈ R^N be a zero-mean graph signal with covariance matrix C_x = E[xx^T], in which the ith element x_i denotes the signal value at node i of an unknown graph G with shift operator S. We say that the graph S represents the structure of the signal y ∈ R^N if there exists a diffusion process in the GSO S that produces the signal y from the input signal x, that is

y = α_0 ∏_{l=1}^{∞} (I_N − α_l S) x = ∑_{l=0}^{∞} β_l S^l x,   (1)

for some set of parameters {α_l} or, equivalently, {β_l}. While S encodes local one-hop interactions, each successive application of the shift in (1) percolates x over G; see e.g. [4]. The product and sum representations in (1) are common equivalent models for the generation of random network processes. Indeed, any process that can be understood as the linear propagation of an input signal through a static graph can be written in the form in (1), subsuming heat diffusion [5], consensus and the classic DeGroot model of opinion dynamics [6], as well as symmetric structural equation models (SEMs) [7] as special cases. When x is white so that C_x = I_N, (1) is equivalent to saying that the graph process y is stationary in S; see e.g., [8, Def. 1], [9], [10] and Section II-A for further details. Here though, we deal with more general non-stationary signals y that adhere to linear diffusion dynamics as in (1), but where the input covariance C_x can be arbitrary. This is for instance relevant to (geographically) correlated sensor network data, urban population mobility patterns, or to models of opinion dynamics, where (even before engaging in discussion) the network agents can be partitioned into communities according to their standing on the subject matter. The justification for saying that S represents the structure of y is that we can think of the edges of S as direct (one-hop) relations between the elements of the signal.
The diffusion described by (1) modifies the original correlation by inducing indirect (multi-hop) relations. In this context, our goal is to recover the fundamental relations dictated by S from a set of independent samples of a non-stationary random signal y, as well as realizations of x or, more pragmatically, knowledge of C_x. This additional information on the input x is the price paid to accommodate the more general non-stationary generative models for y, and is not needed when identifying the structure of stationary graph signals [11], since C_x = I_N in that case.

Relation to prior work. Under the assumption that the signals are related to the topology of the graph where they are supported, the goal of graph signal processing (GSP) is to develop algorithms that fruitfully leverage this relational structure, and can make inferences about these relationships when they are only partially observed [3], [2]. Most GSP efforts to date assume that the underlying network is known, and then analyze how the graph's algebraic and spectral characteristics impact the properties of the graph signals of interest. However, such an assumption is often untenable in practice, and arguably most graph construction schemes are largely informal, distinctly lacking an element of validation.
Network topology inference is a prominent task in Network Science [7, Ch. 7]. It is also central to predicting emergent complex behavior as well as monitoring the operation of (networked) critical infrastructures, including the smart power grid, transportation systems and the Internet backbone [12], [13], [14]. Since networks typically encode similarities between nodes, several topology inference approaches construct graphs whose edge weights correspond to nontrivial correlations or coherence measures between signal profiles at incident nodes [7], [15]. Acknowledging that the observed correlations can be due to latent network effects, alternative methods rely on inference of full partial correlations [7, Ch. 7.3.2]. Under Gaussianity assumptions, there are well-documented connections with covariance selection [16] and sparse precision matrix estimation [17], [18], [19], [20], [21], as well as high-dimensional sparse linear regression [22]. Extensions to directed graphs include SEMs [23], [24], [25], Granger causality [26], [15], or their nonlinear (kernelized) variants [27], [28]. Recent GSP-based network inference frameworks postulate instead that the network exists as a latent underlying structure, and that observations are generated as a result of a network process defined on such a graph [11], [29], [30], [31], [32], [33]. Different from [30], [32], [31], [33], which operate on the graph domain, the goal here is to identify graphs that endow the given observations with desired spectral (frequency-domain) characteristics. Two works have recently explored this approach by identifying a GSO given its eigenvectors [11], [29], but both rely on observations of stationary graph signals. Different from [32], [33], [34], [35], which infer structure from signals assumed to be smooth over the sought graph, here the measurements are related to the graph via filtering (e.g., modeling the diffusion of an idea or the spread of a disease).
Smoothness models are subsumed as special cases, obtained with diffusion filters having a low-pass frequency response.

Paper outline. In Section II we formulate the problem of identifying a GSO that explains the fundamental structure of a random signal diffused on a graph. While for stationary y the sought GSO shares its eigenvectors with the signal's covariance matrix [8], [9], [11], in the general (non-stationary) setting dealt with here this no longer holds, and we elaborate on the ensuing challenges (Section II-A). Still, the graph's eigenvectors are preserved by the polynomial graph filter that governs the underlying diffusion dynamics (1). This motivates a novel two-step network topology inference approach whereby we: i) identify the GSO's eigenbasis from a judicious graph filter estimate; and ii) rely on these spectral templates to estimate the GSO's eigenvalues such that the inferred graph exhibits desirable structural characteristics (e.g., sparsity or minimum-energy edge weights; see also Section II-B). The estimation of the diffusion filter in step i), which is not required when the signals are stationary [11], has merit on its own and is of interest beyond topology inference. Feasibility of this additional system identification task requires extra information on the excitation signal x. Section III addresses the (simpler) scenario where direct observations of the inputs are available, so that the unknown filter matrix and the input-output signal pairs are linearly related. The focus in Section IV shifts to scenarios where second-order statistical information is used, and the relationship between the input-output covariances and the filter is quadratic. Identifiability conditions for the noise-free case are discussed, and particular cases for which the problem can be recast as a convex quadratic optimization are described.
Section V develops projected gradient descent and semidefinite relaxation-based algorithms, with complementary strengths, to deal with the (non-convex) fourth-order polynomial minimization associated with the recovery problem in Section IV. Numerical tests with synthetic and real-world data corroborate the effectiveness of the novel approach in recovering the topology of social, brain and transportation networks (Section VI). Concluding remarks are given in Section VII.

Notation. The entries of a matrix X and a (column) vector x are denoted by X_ij and x_i, respectively. Sets are represented by calligraphic capital letters. The superscripts T and † stand for transpose and pseudo-inverse, respectively; 0 and 1 refer to the all-zero and all-one vectors; while I_N denotes the N × N identity matrix. For a vector x, diag(x) is a diagonal matrix whose ith diagonal entry is x_i. The operators ⊗, ⊙, and vec(·) stand for Kronecker product, Khatri-Rao (columnwise Kronecker) product, and matrix vectorization, respectively. Lastly, ‖X‖_p denotes the ℓ_p norm of vec(X) and ker(X) refers to the null space of X. The spectral radius of matrix X is denoted by λ_max(X).
II. PROBLEM STATEMENT
Consider the generative model in (1), whereby the properties of the graph signal y depend on those of the excitation input x and the underlying graph G represented by the GSO S. Given realizations of the output and prior information on the input, the goal is to infer a parsimonious graph representation that explains the structure of y. Alternatively, we can say that the goal is to recover the GSO, which encodes direct relationships between the elements of y, from observable indirect relationships generated by a diffusion process.
To formally state the problem, consider the symmetric GSO S associated with the undirected graph G. Define the eigenvector matrix V := [v_1, . . . , v_N] and the eigenvalue matrix Λ := diag(λ_1, . . . , λ_N) to write S = VΛV^T. Now observe that while the diffusion expressions in (1) are polynomials in the GSO of possibly infinite degree, the Cayley-Hamilton theorem asserts they are equivalent to polynomials of degree smaller than N. Upon defining the vector of coefficients h := [h_0, . . . , h_{L−1}]^T ∈ R^L and the symmetric graph filter H := ∑_{l=0}^{L−1} h_l S^l [2], the generative model in (1) can be rewritten as

y = Hx = (∑_{l=0}^{L−1} h_l S^l) x,   (2)

for some particular h and L ≤ N. Given a set Y := {y^(p)}_{p=1}^{P} of P independent samples of a non-stationary random signal y adhering to the network diffusion model (2), the problem is to identify the GSO S which is optimal in some sense, as described in Section II-B. Fundamental to the topology inference approach developed here is to note that because H is a polynomial in S: i) all such graph filters (spanned by the unknown coefficients h) have the same eigenvectors; and ii) these eigenvectors coincide with those of the shift, namely V. In other words, while the diffusion implicit in H obscures the eigenvalues of the GSO, the eigenvectors V are preserved as spectral templates of the underlying network topology. Next, Section II-A describes how to leverage (2) to obtain the GSO eigenbasis from a set of nodal observations Y, by first estimating the unknown graph filter H. We show that the information in Y is in general not enough to uniquely recover H. Hence, we will resort to additional knowledge on the input signal x (either realizations, sparsity properties, or second-order statistical information) and possibly also on the structure of the graph filter H. Section II-B outlines how to use the spectral templates V to recover the desired GSO by estimating its eigenvalues Λ and, as a byproduct, the graph shift S = VΛV^T itself.
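The key property invoked here, that any polynomial graph filter shares the eigenbasis V of S, is easy to verify numerically; a small numpy sketch with an arbitrary symmetric shift and filter taps of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
S = rng.standard_normal((N, N)); S = (S + S.T) / 2   # random symmetric GSO
lam, V = np.linalg.eigh(S)

# Polynomial graph filter H = sum_l h_l S^l with L = 4 taps
h = rng.standard_normal(4)
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(4))

# V^T H V is diagonal: the filter's eigenvalues are the polynomial at lam
D = V.T @ H @ V
assert np.allclose(D, np.diag(np.diag(D)), atol=1e-9)
assert np.allclose(np.diag(D), sum(h[l] * lam**l for l in range(4)))
```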

A. Stationary versus non-stationary observations
Consider that the P observations in Y correspond to independent realizations of a process y adhering to the generative model in (2). The goal is to use Y to estimate the spectral templates V of the filter H that governs the diffusion in (2).
To gain insights, suppose first that x is white so that C_x = I_N [11]. Then the covariance matrix of y = Hx is

C_y = E[Hxx^T H^T] = H C_x H^T = H H^T = H²,   (3)

where in obtaining the last equality we used that H is symmetric, because it is a polynomial in the symmetric GSO S. Using the spectral decomposition of S = VΛV^T to express the filter as H = V(∑_{l=0}^{L−1} h_l Λ^l)V^T, we can diagonalize the covariance matrix as

C_y = V(∑_{l=0}^{L−1} h_l Λ^l)² V^T.   (4)

Such a covariance expression is precisely the requirement for a graph signal to be stationary in S [8, Def. 2.b]. Remarkably, if y is graph stationary, or equivalently if x is white, (4) shows that the eigenvectors of the shift S, the filter H, and the covariance C_y are all the same. As a result, to estimate V from the observations {y^(p)}_{p=1}^{P} it suffices to form the sample covariance Ĉ_y = (1/P) ∑_{p=1}^{P} y^(p) (y^(p))^T and use its eigenvectors as spectral templates to recover S [11], [29]; see also Section II-B.
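The contrast between the white and non-white cases can be checked numerically; in the sketch below (illustrative sizes and taps of our own choosing), V diagonalizes C_y = H² but not H C_x H for a generic input covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
S = rng.standard_normal((N, N)); S = (S + S.T) / 2
lam, V = np.linalg.eigh(S)
h = np.array([1.0, 0.5, 0.25])
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(3))

# White input (C_x = I): C_y = H H^T = H^2 shares the eigenbasis V of S
Dw = V.T @ (H @ H.T) @ V
assert np.allclose(Dw, np.diag(np.diag(Dw)), atol=1e-9)

# Generic (non-white) C_x: C_y = H C_x H is no longer diagonalized by V
B = rng.standard_normal((N, N)); C_x = B @ B.T + np.eye(N)
Dn = V.T @ (H @ C_x @ H) @ V
off = Dn - np.diag(np.diag(Dn))
assert np.linalg.norm(off) > 1e-3                # off-diagonal energy remains
```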
In this context, the broader focus of the present paper is on identifying the GSO S that is considered to be the best possible description of the structure of a non-stationary signal y = Hx [cf. (2), where x is not white]. For a generic (non-identity) input covariance matrix C_x, we face the challenge that the signal covariance

C_y = H C_x H   (5)

[cf. (3)] is no longer simultaneously diagonalizable with S. This rules out using the eigenvectors of the sample covariance Ĉ_y as spectral templates of S. Still, as argued following (2), the eigenvectors of the GSO coincide with those of the graph filter H that governs the underlying diffusion dynamics. This motivates using realizations of observed signals together with additional information on the excitation inputs x (either realizations of the graph signals, sparsity assumptions, or the covariance matrix C_x [25]) to identify the filter H, with the ultimate goal of estimating its eigenvectors V. This system identification task in the graph setting is the subject of Sections III and IV, but before that we close the loop by showing how to recover S given its estimated eigenbasis V̂.
B. Using the spectral templates to recover the shift

Given estimates V̂ of the filter eigenvectors, recovery of S amounts to selecting its eigenvalues Λ, and to that end we assume that the shift of interest is optimal in some sense. At the same time, we should account for the discrepancies between V̂ and the actual eigenvectors of S, due to finite sample size constraints and unavoidable errors in estimating the filter H. Accordingly, we build on [11] and seek the shift operator S that: (a) is optimal with respect to (often convex) criteria f(S); (b) belongs to a convex set S that specifies the desired type of shift operator (e.g., the adjacency A or Laplacian L); and (c) is close to V̂ΛV̂^T as measured by a convex matrix distance d(·, ·). Formally, one can solve

min_{S, Λ} f(S)   subject to   S ∈ S,  d(S, V̂ΛV̂^T) ≤ ε,   (6)

which is a convex optimization problem provided f(S) is convex, and ε is a tuning parameter chosen based on a priori information on the imperfections. Within the scope of the signal model (1), the formulation (6) entails a general class of network topology inference problems parametrized by the choices in (a)-(c) above. The spectrum of alternatives is briefly outlined next, while concrete choices are made for the numerical tests in Section VI.
Criteria. The selection of f(S) allows us to incorporate physical characteristics of the desired graph into the formulation, while being consistent with the spectral templates V̂. For instance, the matrix (pseudo-)norm f(S) = ‖S‖_0, which counts the number of nonzero entries in S, can be used to minimize the number of edges towards identifying sparse graphs (e.g., of direct relations among signal elements); f(S) = ‖S‖_1 is a convex proxy for the aforementioned edge cardinality function. Alternatively, the Frobenius norm f(S) = ‖S‖_F can be adopted to minimize the energy of the edges in the graph, or f(S) = ‖S‖_∞ can be chosen to obtain shifts S associated with graphs of uniformly low edge weights. This can be meaningful when identifying graphs subject to capacity constraints.
Constraints. The constraint S ∈ S in (6) incorporates a priori knowledge about S. If we let S = A represent the adjacency matrix of an undirected graph with non-negative weights and no self-loops, we can explicitly write S as follows

S_A := {S | S_ij ≥ 0;  S ∈ H_N;  S_ii = 0, i = 1, . . . , N;  ∑_j S_j1 = 1},   (7)

where H_N denotes the set of real and symmetric N × N matrices. The first condition in S_A encodes the non-negativity of the weights, whereas the second condition incorporates that G is undirected, hence S must belong to H_N. The third condition encodes the absence of self-loops, thus each diagonal entry of S must be null. Finally, the last condition fixes the scale of the admissible graphs by setting the weighted degree of the first node to 1, and rules out the trivial solution S = 0. Other GSOs (e.g., the Laplacian L and its normalized variants) can be accommodated in our framework via minor modifications to S; see [11].
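To make the role of the spectral templates concrete, consider the idealized specialization of (6) with exact templates and ε = 0: the constraints in S_A (zero diagonal plus the degree normalization) become a linear system in the unknown eigenvalues, which for generic graphs pins Λ down uniquely. A minimal numpy sketch under these assumptions (synthetic adjacency, no estimation errors):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5
# Ground-truth adjacency in S_A: symmetric, nonnegative, zero diagonal,
# first node's weighted degree normalized to 1
A = np.triu(rng.random((N, N)), k=1); A = A + A.T
A = A / A[0].sum()
lam_true, V = np.linalg.eigh(A)          # pretend V are the known templates

# S = V diag(lam) V^T: stack the constraints S_ii = 0 and sum_j S_j1 = 1
Phi = np.vstack([V**2,                   # row i gives S_ii as a function of lam
                 V[0] * V.sum(axis=0)])  # first node's weighted degree
b = np.concatenate([np.zeros(N), [1.0]])
lam_hat = np.linalg.lstsq(Phi, b, rcond=None)[0]
S_hat = V @ np.diag(lam_hat) @ V.T
assert np.allclose(S_hat, A, atol=1e-7)  # exact recovery in this ideal setting
```

With noisy templates (ε > 0) one would instead solve the full convex program (6), e.g., with an off-the-shelf solver.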
Distance. The form of the convex matrix distance d(·, ·) depends on the particular application. For instance, if d(S, V̂ΛV̂^T) = ‖S − V̂ΛV̂^T‖_F is chosen, the focus is more on the similarities across the entries of the shifts, while ‖S − V̂ΛV̂^T‖_2 focuses on their spectrum.

III. (BI)LINEAR GRAPH FILTER IDENTIFICATION
Consider m = 1, . . . , M diffusion processes on G, and assume that the observed non-stationary signal y_m corresponds to an input x_m diffused by an unknown graph filter H = ∑_{l=0}^{L−1} h_l S^l, which encodes the structure of the network via S. In this section we show how additional knowledge about realizations of the input signals x_m can be used to identify H and, as a byproduct, its eigenvectors V. We consider settings in which this extra information comes either from direct observation of {x_m}_{m=1}^{M}, or through an assumption on input signal sparsity.

A. Input-output signal realization pairs
Suppose first that realizations of M output-input pairs {y_m, x_m}_{m=1}^{M} are available, which can be arranged in the data matrices Y = [y_1, ..., y_M] and X = [x_1, ..., x_M]. The goal is to identify a symmetric filter H ∈ H_N such that the observed signal y_m and the predicted one Hx_m are close in some sense. In the absence of measurement noise, this simply amounts to solving the system of M linear matrix equations

Y = HX.   (8)

When noise is present, using the workhorse least-squares (LS) criterion the filter can be estimated as

H* = argmin_{H ∈ H_N} ‖Y − HX‖²_F.   (9)

Because H is symmetric, the free optimization variables in (9) correspond to, say, the lower-triangular part of H, meaning the entries on and below the main diagonal. These N_H := N(N + 1)/2 non-redundant entries can be conveniently arranged in the so-termed half-vectorization of H, i.e., a vector vech(H) ∈ R^{N_H} from which one can recover vec(H) ∈ R^{N²} via duplication. Indeed, there exists a unique duplication matrix D_N ∈ {0, 1}^{N²×N_H} such that one can write D_N vech(H) = vec(H) [36]. The Moore-Penrose pseudoinverse of D_N, denoted as D_N†, possesses the property vech(H) = D_N† vec(H). With this notation in place, several properties of the solution H* of (9) are stated next.
Proposition 1: Let M_r denote the rank of matrix X ∈ R^{N×M}. Then, it holds that: a) the entries of the symmetric filter H* that solves (9) are given by

vech(H*) = ((X^T ⊗ I_N) D_N)† vec(Y);   (10)

and b) the minimizer of (9) is unique if and only if M_r = N. Proposition 1 asserts that if the input is sufficiently rich (i.e., if M ≥ N and the excitation signals are linearly independent), the entries of the diffusion filter H can be found as the solution of an LS problem. Interestingly, the fact that H has only N(N + 1)/2 distinct entries cannot be exploited to reduce the number M of input signals required to identify the filter. The reason is that the matrix (X^T ⊗ I_N)D_N is rank deficient if X^T has a non-trivial null space. In other words, when using input-output pairs to estimate the filter H one needs the same number of pairs, regardless of whether the graph is symmetric or not. Symmetry, however, can be exploited to enhance performance in overdetermined scenarios with noisy observations; see also the tests in Section VI.
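The half-vectorization machinery above can be exercised end to end on synthetic data; in the numpy sketch below, the duplication matrix follows the standard column-wise lower-triangle vech convention, and the sizes and filter taps are illustrative choices:

```python
import numpy as np

def duplication_matrix(N):
    """D_N in {0,1}^{N^2 x N(N+1)/2} with D_N vech(H) = vec(H) for symmetric H
    (column-wise lower-triangle convention for vech)."""
    cols, k = {}, 0
    for j in range(N):
        for i in range(j, N):
            cols[(i, j)] = k; k += 1
    D = np.zeros((N * N, k))
    for j in range(N):                  # vec stacks full columns (column-major)
        for i in range(N):
            D[j * N + i, cols[(max(i, j), min(i, j))]] = 1.0
    return D

rng = np.random.default_rng(2)
N, M = 4, 8                             # M >= N linearly independent inputs
S = rng.standard_normal((N, N)); S = (S + S.T) / 2
H_true = sum(c * np.linalg.matrix_power(S, l) for l, c in enumerate([1.0, 0.6, 0.3]))
X = rng.standard_normal((N, M))
Y = H_true @ X                          # noise-free observations

D_N = duplication_matrix(N)
A = np.kron(X.T, np.eye(N)) @ D_N       # (X^T kron I_N) D_N
vech_H = np.linalg.pinv(A) @ Y.flatten(order="F")   # LS closed form
H_hat = (D_N @ vech_H).reshape(N, N, order="F")
assert np.allclose(H_hat, H_true, atol=1e-8)
```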
As explained in Section II-B, once H* is estimated using (10), the next step is to decompose the filter as H* = V̂ΛV̂^T and use V̂ as input for the GSO identification problem (6). Note that obtaining such an eigendecomposition is always possible since filter estimates H* ∈ H_N are constrained to be symmetric.

B. Sparse input signals
It is not uncommon to encounter application domains in which the diffused graph signals adhere to linear network dynamics y = Hx and the input x is sparse, having only a few nonzero entries. Sparsity in x is well-motivated due to its practical relevance and modeling value; network processes such as y are oftentimes the diffused version of a few localized sources, hereby indexed by supp(x) := {i : x_i ≠ 0} [37], [30]. For instance, opinion formation processes in social networks have been modeled using graph filters (see e.g., [4], [6]), and a sparse x could represent the initial opinion of those few influential actors that instilled the observed status quo. Similar ideas are naturally relevant to linear network dynamics encountered with rumor spreading, adoption of new technologies, and epidemic outbreaks [37], as well as with identification of heat, pollutant, or seismic localized sources [30].
Given realizations of M diffusion processes {y_m}_{m=1}^{M} arranged as columns of matrix Y ∈ R^{N×M}, a possible formulation of the graph filter identification problem amounts to finding H ∈ H_N such that Y is close to HX, where the unobserved matrix X = [x_1, ..., x_M] is assumed to be sparse. Different from Section III-A, input realizations are now unavailable and the resulting bilinear problem entails finding the decomposition factors up to unavoidable scaling and permutation ambiguities. In the absence of noise, recent fundamental results and accompanying algorithms developed in [38] can be brought to bear here. Regarding identifiability, it is established therein that M = O(N log N) samples are sufficient to uniquely determine the decomposition with high probability, under the assumption that X is generated by a sparsity-inducing Bernoulli-Gaussian or Bernoulli-Rademacher process [38, Theorem 1]. From an algorithmic standpoint, an efficient dictionary learning procedure called Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD) is proposed, which solves a sequence of linear programs with varying constraints. Under the aforementioned assumptions, the algorithm exactly recovers H and X with high probability. This result holds when: (i) the sparsity level, measured by the expected number of nonzero elements in each column of X, is at most of order √N; and (ii) the number of samples M is at least of order N² log² N.
Related blind identification problems on graphs were formulated in [37], where H = ∑_{l=0}^{L−1} h_l S^l is assumed to be a polynomial graph filter with unknown coefficients h but where the GSO S is known. Given the particular structure of H, it should not be surprising that the sparse inputs in X along with the filter coefficients h can be identified from few measurements of Y = HX. Instead, the remarkable results in [38] are valid for arbitrary H ∈ R^{N×N}; hence, they are useful when the graph is unknown, as in the topology inference setting dealt with here.

IV. QUADRATIC GRAPH FILTER IDENTIFICATION
In a number of applications, realizations of the excitation input process x_m may be challenging to acquire, but information about the statistical description of x_m could still be available. To be specific, assume that the excitation input processes are zero mean and their covariances C_{x,m} are known. Further suppose that for each input x_m, we have access to a set of output observations {y_m^(p)}_{p=1}^{P_m}, which are then used to estimate the output covariance as Ĉ_{y,m} = (1/P_m) ∑_{p=1}^{P_m} y_m^(p) (y_m^(p))^T. Since C_{y,m} = H C_{x,m} H for the symmetric filters considered here, the aim is to identify a filter H such that the matrices Ĉ_{y,m} and H C_{x,m} H are close.
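A quick numerical sanity check of this covariance-matching rationale, with synthetic Gaussian inputs and illustrative sizes: the empirical output covariance approaches its ensemble counterpart H C_x H as the number of samples grows.

```python
import numpy as np

rng = np.random.default_rng(6)
N, P = 4, 200000
B = rng.standard_normal((N, N))
C_x = B @ B.T + np.eye(N)               # known (full-rank) input covariance
S = rng.standard_normal((N, N)); S = (S + S.T) / 2
H = np.eye(N) + 0.5 * S + 0.25 * S @ S  # some symmetric diffusion filter

X = rng.multivariate_normal(np.zeros(N), C_x, size=P).T  # P input realizations
Y = H @ X                                                 # diffused observations
C_y_hat = Y @ Y.T / P                   # empirical output covariance
C_y = H @ C_x @ H                       # ensemble counterpart
err = np.linalg.norm(C_y_hat - C_y) / np.linalg.norm(C_y)
assert err < 0.05                       # small relative error for large P
```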
Assuming for now perfect knowledge of the signal covariances, the above rationale suggests studying the solutions of the system of matrix quadratic equations

C_{y,m} = H C_{x,m} H,  m = 1, . . . , M.   (11)

To gain some initial insights, consider first the case where M = 1 and henceforth drop the subindex m, so that we can write (11) as C_y = H C_x H [cf. (5)]. Given the eigendecomposition of the symmetric and positive semidefinite (PSD) covariance matrix C_y = V_y Λ_y V_y^T, the principal square root of C_y is the unique symmetric and PSD matrix C_y^{1/2} which satisfies C_y = C_y^{1/2} C_y^{1/2}, and is given by C_y^{1/2} = V_y Λ_y^{1/2} V_y^T, where Λ_y^{1/2} stands for a diagonal matrix with the nonnegative square roots of the eigenvalues of C_y.
With this notation in place, introduce the matrices C_xyx := C_x^{1/2} C_y C_x^{1/2} and H_xx := C_x^{1/2} H C_x^{1/2}. Clearly, C_xyx is both symmetric and PSD. Regarding the transformed filter H_xx, note that by construction it is symmetric. Moreover, if H is assumed to be PSD, then so will be H_xx. These properties will be instrumental towards characterizing the solutions of the matrix quadratic equation C_y = H C_x H in (5), which can be equivalently recovered from the solutions H_xx of

C_xyx = H_xx H_xx.   (12)

A. Positive semidefinite graph filters
Let us suppose first that H is PSD (henceforth denoted H ∈ H_{++}^N), so that H_xx in (12) is PSD as well. Such filters arise, for example, with heat diffusion processes of the form y = (∑_{l=0}^{∞} β^l L^l) x with β > 0, where the Laplacian shift L is PSD and the filter coefficients h_l = β^l are all positive. In this setting, the solution of (12) is unique and given by the principal square root

H_xx = C_xyx^{1/2}.   (13)

Consequently, if C_x is nonsingular (so that the excitation inputs are not degenerate), the definition of H_xx can be used to recover H via

H = C_x^{−1/2} C_xyx^{1/2} C_x^{−1/2}.   (14)

The previous arguments demonstrate that the assumption H ∈ H_{++}^N gives rise to a strong identifiability result. Indeed, if {C_{y,m}}_{m=1}^{M} are known perfectly, a PSD graph filter is identifiable even for M = 1.
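The PSD recovery (13)-(14) translates directly into code; a numpy sketch using an eigendecomposition-based principal square root (the PSD filter here is synthetic, not a particular heat kernel):

```python
import numpy as np

def psd_sqrt(C):
    """Principal (PSD) matrix square root via eigendecomposition."""
    lam, V = np.linalg.eigh(C)
    return V @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ V.T

rng = np.random.default_rng(3)
N = 5
B = rng.standard_normal((N, N))
H_true = B @ B.T + N * np.eye(N)        # a synthetic PSD filter
A = rng.standard_normal((N, N))
C_x = A @ A.T + np.eye(N)               # nonsingular input covariance

C_y = H_true @ C_x @ H_true             # noise-free matrix quadratic relation
Cx_h = psd_sqrt(C_x)
Cx_ih = np.linalg.inv(Cx_h)
C_xyx = Cx_h @ C_y @ Cx_h               # C_xyx = C_x^{1/2} C_y C_x^{1/2}
H_hat = Cx_ih @ psd_sqrt(C_xyx) @ Cx_ih # recovery as in (14)
assert np.allclose(H_hat, H_true, atol=1e-6)
```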
However, in pragmatic settings where only empirical covariances are available, observing multiple (M > 1) diffusion processes can improve the performance of the system identification task. Given empirical covariances {Ĉ_{y,m}}_{m=1}^{M}, respectively estimated with enough samples P_m to ensure they are of full rank, for each m define Ĉ_{xyx,m} := C_{x,m}^{1/2} Ĉ_{y,m} C_{x,m}^{1/2}. The quadratic equation (12) motivates solving the LS problem

H* = argmin_{H ∈ H_N} ∑_{m=1}^{M} ‖Ĉ_{xyx,m}^{1/2} − C_{x,m}^{1/2} H C_{x,m}^{1/2}‖²_F.   (15)

Whenever the number of samples P_m (and accordingly the accuracy of the empirical covariances Ĉ_{y,m}) differs significantly across diffusion processes m = 1, . . . , M, it may be prudent to introduce non-uniform coefficients to downweigh those residuals in (15) with inaccurate covariance estimates. The following proposition offers insights on the solution to (15); extensions to weighted LS criteria are straightforward.

Proposition 2: Define the matrices Θ := [(C_{x,1}^{1/2} ⊗ C_{x,1}^{1/2})^T, . . . , (C_{x,M}^{1/2} ⊗ C_{x,M}^{1/2})^T]^T and ĉ := [vec(Ĉ_{xyx,1}^{1/2})^T, . . . , vec(Ĉ_{xyx,M}^{1/2})^T]^T. Then, the filter H* that solves (15) can be found as

vech(H*) = (Θ D_N)† ĉ.   (16)

Moreover, if M = 1 and matrix C_{x,1} is nonsingular, the minimizer H* is unique and given by

H* = C_{x,1}^{−1/2} Ĉ_{xyx,1}^{1/2} C_{x,1}^{−1/2}.   (17)

Proof: To show (16) one can follow steps similar to those used to derive (10) in Proposition 1. The identifiability result for M = 1 follows from the arguments leading to (14), noting that the cost in (15) vanishes if H* is selected to satisfy Ĉ_{xyx,1}^{1/2} = C_{x,1}^{1/2} H* C_{x,1}^{1/2}.

B. Symmetric graph filters
Consider now a more general setting whereby H is only assumed to be symmetric, and once more let M = 1 first to simplify notation. With the unitary matrix V_xyx denoting the eigenvectors of C_xyx = V_xyx Λ_xyx V_xyx^T and with b ∈ {−1, 1}^N being a binary (signed) vector, one can conclude that solutions to (12) have the general form

H_xx = V_xyx diag(b) Λ_xyx^{1/2} V_xyx^T.   (18)

If the input covariance matrix C_x is nonsingular, all symmetric solutions H ∈ H_N of (12) [and hence of (5)] are given by

H = C_x^{−1/2} V_xyx diag(b) Λ_xyx^{1/2} V_xyx^T C_x^{−1/2}.   (19)

In the absence of the PSD assumption, the problem for M = 1 is non-identifiable. Inspection of (19) shows there are 2^N possible solutions to the quadratic equation (5), which are parametrized by the binary vector b. For the PSD setting in Section IV-A the solution is unique and corresponds to b = 1.
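The 2^N-fold sign ambiguity in (18) is easy to reproduce numerically: every signed square root below squares back to the same C_xyx (synthetic PSD matrix, a few sample sign vectors):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4
M_ = rng.standard_normal((N, N))
C_xyx = M_ @ M_.T + np.eye(N)            # symmetric PSD, full rank
lam, V = np.linalg.eigh(C_xyx)

# Every signed square root H_xx = V diag(b) Lam^{1/2} V^T solves H_xx^2 = C_xyx
for b in ([1, 1, 1, 1], [1, -1, 1, -1], [-1, 1, 1, 1]):
    H_xx = V @ np.diag(np.array(b) * np.sqrt(lam)) @ V.T
    assert np.allclose(H_xx @ H_xx, C_xyx, atol=1e-9)
    assert np.allclose(H_xx, H_xx.T)     # symmetric, but PSD only for b = 1
```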
For M > 1, the set of feasible solutions to the system of equations (11) is naturally given by

H_M^sym := {H ∈ H_N | C_{y,m} = H C_{x,m} H,  m = 1, . . . , M}.   (20)

It is thus conceivable that as M grows and, therefore, the number of equations increases, the cardinality of H_M^sym shrinks and the problem is rendered identifiable (up to an unavoidable sign ambiguity, because if H ∈ H_N is a solution of (11), so is −H). Next, we show that even with two excitation inputs having covariances C_{x,1} and C_{x,2} with identical eigenvectors, uniqueness can be established as long as their eigenvalues are sufficiently different.
Proposition 3: Consider the system of quadratic equations (11) for M = 2 and suppose C_{x,1} = U diag(λ_1) U^T and C_{x,2} = U diag(λ_2) U^T. Then (11) has a unique symmetric solution, i.e., H = VΛV^T is identifiable up to a sign ambiguity, if the following conditions hold: A-1) All eigenvalues in λ_1 are distinct; A-2) λ_{1,i} λ_{2,j} ≠ λ_{1,j} λ_{2,i} for all i ≠ j; A-3) V and U do not share any eigenvector; and A-4) rank(H) = N. Proof: See Appendix B. Conditions A-1) and A-2) encode a notion of richness of the excitation signals. In fact, condition A-2) is the specification for M = 2 of a generalizable requirement based on the Kruskal rank of a matrix related to the eigenvalues of the excitation processes (see Appendix B). Under this perspective, it becomes immediate that larger values of M facilitate the fulfillment of this more general requirement, leading to the expected conclusion that the more input processes we consider, the easier it becomes to identify H. Moreover, from the proof arguments it follows that symmetry of H is essential (see Lemma 2 in Appendix B). In fact, if one lifts the symmetry assumption and all input covariances have the same eigenvectors, the problem remains non-identifiable even for large values of M (regardless of the input covariance eigenvalues).

V. ALGORITHMS
Building on the findings in Section IV-B, here we propose two algorithms with complementary strengths to tackle the quadratic filter identification problem when the only assumption is for H to be symmetric (undirected graph), but not (necessarily) PSD.

A. Projected gradient descent
Going back to the beginning of Section IV, given the empirical output covariances {Ĉ_{y,m}}_{m=1}^{M} and knowledge of the input covariances {C_{x,m}}_{m=1}^{M}, consider estimating the filter as

H* = argmin_{H ∈ H_N} ∑_{m=1}^{M} ‖Ĉ_{y,m} − H C_{x,m} H^T‖²_F,   (21)

where the transpose of the symmetric H is written explicitly to simplify subsequent mathematical derivations. Weighted variants of the criterion could also be pertinent here, as discussed following (15). Problem (21) is a non-convex fourth-order polynomial optimization, which can potentially have multiple solutions. Since finding H* is challenging, we seek computationally efficient algorithms capable of finding stationary solutions. A viable approach is to rely on projected gradient descent (PGD) [39], which embedded as part of a feasible direction method boils down to the following update rule (k = 1, 2, . . . denotes iterations)

H̄_k = P_{H_N}(H_k − η ∇ε(H_k)),   H_{k+1} = H_k + α_k (H̄_k − H_k).   (22)

Here α_k and η are step-size parameters and ε(H) := ∑_{m=1}^{M} ‖Ĉ_{y,m} − H C_{x,m} H^T‖²_F is the objective function in (21), whose gradient is given by ∇ε(H) = 4 ∑_{m=1}^{M} (H C_{x,m} H^T H C_{x,m} − Ĉ_{y,m} H C_{x,m}). In addition, the operator P_{H_N}(·) denotes the Euclidean projection onto the set H_N of symmetric matrices, defined as P_{H_N}(X) = (X + X^T)/2. Thus, to obtain matrix H̄_k we take a gradient descent step and project the result onto H_N. Finally, we take a step along the feasible direction H̄_k − H_k with step size α_k to yield H_{k+1}. One possible choice of the step size α_k is via the Armijo rule over the interval [0, 1]. For scalars µ, β ∈ (0, 1), one sets α_k = β^{m_k}, where m_k is the first integer m ≥ 0 for which

ε(H_k) − ε(H_k + β^m (H̄_k − H_k)) ≥ −µ β^m tr[∇ε(H_k)^T (H̄_k − H_k)].   (23)

Putting all the pieces together, the PGD update rule in (22) results in the iterations tabulated under Algorithm 1. The updates entail multiplications and additions of N × N matrices, and accordingly the computational complexity per iteration is O(MN³). For the adopted step size selection rule, convergence to a stationary point can be guaranteed [39].

[Algorithm 1 (PGD): iterate the update (22), with α_k = β^{m_k} chosen via the Armijo rule (23), until convergence.]

Since multiple stationary points exist, we typically run Algorithm 1 for I random initializations. Among the I estimated filters we select the one whose eigenvectors lead to, e.g., the sparsest graph shift S when solving (6) with f(S) = ‖S‖_1; see also the numerical tests in Section VI.
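As an illustration, the PGD iterations of Algorithm 1 can be sketched in a few lines of Python. The function names (`epsilon`, `grad`, `pgd`) and the step-size defaults are illustrative choices, not the paper's; the gradient and the projection P_{H_N}(H) = (H + H^T)/2 follow the expressions above:

```python
import numpy as np

def epsilon(H, Cy, Cx):
    # objective in (21): sum_m ||Cy_m - H Cx_m H^T||_F^2
    return sum(np.linalg.norm(Cym - H @ Cxm @ H.T, 'fro')**2
               for Cym, Cxm in zip(Cy, Cx))

def grad(H, Cy, Cx):
    # gradient given in the text: 4 sum_m (H Cx_m H^T H Cx_m - Cy_m H Cx_m)
    return 4 * sum(H @ Cxm @ H.T @ H @ Cxm - Cym @ H @ Cxm
                   for Cym, Cxm in zip(Cy, Cx))

def pgd(Cy, Cx, H0, eta=0.1, beta=0.5, mu=1e-4, iters=200):
    """Sketch of Algorithm 1 (PGD); eta, beta, mu are illustrative defaults."""
    H = H0.copy()
    for _ in range(iters):
        G = grad(H, Cy, Cx)
        step = H - eta * G
        Hbar = 0.5 * (step + step.T)   # projection onto symmetric matrices
        D = Hbar - H                   # feasible direction
        e0 = epsilon(H, Cy, Cx)
        alpha = 1.0
        # Armijo rule over [0, 1]: shrink alpha until sufficient decrease
        while (epsilon(H + alpha * D, Cy, Cx) > e0 + mu * alpha * np.sum(G * D)
               and alpha > 1e-10):
            alpha *= beta
        H = H + alpha * D
    return H
```

Since the initial iterate and the feasible direction are both symmetric, every iterate remains in H_N, and the Armijo backtracking enforces a monotonically non-increasing objective.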
Before moving on to algorithms based on semidefinite relaxations, a remark is in order.

Remark 1 (Combining multiple sources of information):
The formulations in (8) and (21) can be combined if, e.g., both input covariances and pairs of input-output realizations are available. Such a combination is also relevant in scenarios where the inputs are not zero mean but their first and second moments are known.

B. Semidefinite relaxation
Here we show that the graph filter identification task can also be tackled using semidefinite relaxation (SDR) [40], a convexification technique which has been successfully applied to a wide variety of non-convex quadratically-constrained quadratic programs (QCQP) in applications such as MIMO detection [41] and transmit beamforming [42]. To that end, we first cast the filter identification problem as a Boolean quadratic program (BQP).
Proposition 4: For m = 1, . . . , M, consider matrices A_m built from the covariance pairs {C_y,m, C_x,m} as specified in the statement of the proposition, and binary vectors b_m ∈ {−1, 1}^N. Based on (24), the equations in (26) can be compactly and equivalently rewritten as Ψb = 0, where b := [b_1^T, . . . , b_M^T]^T ∈ {−1, 1}^{NM}. Under the assumption that the covariances {C_y,m}_{m=1}^M are perfectly known, the filter H can be uniquely identified (up to a sign ambiguity) provided that rank(Ψ) = NM − 1.
Even though the BQP is a classical NP-hard combinatorial optimization problem [40], via SDR one can obtain near-optimal solutions with provable approximation guarantees. To derive the SDR of (25), note that b^T Ŵ b = tr(Ŵ bb^T), so (25) can be equivalently expressed in terms of the lifted variable B := bb^T; the binary constraint b ∈ {−1, 1}^{NM} translates to diag(B) = 1, B ⪰ 0, and rank(B) = 1, yielding problem (29). The only source of non-convexity in (29) is the rank constraint, and dropping it yields the convex SDR

min_B tr(ŴB)  s. to  diag(B) = 1, B ⪰ 0,   (30)

which coincides with the bidual (dual of the dual) problem of (25). Problem (30) is a semidefinite program (SDP) and can be solved using an off-the-shelf interior-point method [43]. It is immediate that a rank-one optimal solution B* = b*(b*)^T of (30) solves the original BQP as well; in general, however, rank(B*) > 1. To generate a feasible solution of (25) from B*, we adopt the so-termed Gaussian randomization procedure [44], [40]. Specifically, interpreting B* as a covariance matrix we draw L i.i.d. samples z_l ∼ N(0, B*) and form b̂_l := sgn(z_l), where sgn(·) is the element-wise sign function. The retained feasible solution corresponds to the best one among the L generated, namely b̂_{l*} where l* = argmin_{l=1,...,L} b̂_l^T Ŵ b̂_l (31).

Algorithm 2 (SDR with Gaussian randomization):
1: Input Ŵ and number of samples L.
2: Solve the SDP (30) to obtain B*.
3: for l = 1, . . . , L do
4:   Draw z_l ∼ N(0, B*).
5:   Round b̂_l = sgn(z_l).
6: end for
7: Determine l* by solving (31).
8: Return b̂_{l*}.

The overall procedure is tabulated under Algorithm 2, and the quality of the rounded solutions is evaluated via computer simulations in Section VI. The computational complexity is discussed under Remark 2. Interestingly, it is also possible to obtain theoretical approximation guarantees for the feasible solutions generated via the SDR scheme in Algorithm 2. We will leverage a result in [44], stated as Theorem 1 below, which offers guarantees for an extension of the maximum cut problem [45] in which the weight matrix W′ ⪰ 0 may have negative entries. As we show next, (32) can be rendered equivalent to the BQP (25).
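The Gaussian randomization steps of Algorithm 2 can be sketched as follows (Python; solving the SDP (30) itself is not shown, and the helper name is ours). Sampling from N(0, B*) is done through an eigenvalue factorization of B*, which also handles a singular, e.g., rank-one, B*:

```python
import numpy as np

def gaussian_randomization(B, W, L=100, rng=None):
    """Rounding stage of Algorithm 2: draw z_l ~ N(0, B*), round to
    b_l = sgn(z_l), and keep the candidate minimizing b^T W b (cf. (31))."""
    rng = np.random.default_rng(rng)
    # factor B = F F^T via its eigendecomposition (B is PSD up to numerics)
    w, V = np.linalg.eigh(B)
    F = V * np.sqrt(np.clip(w, 0.0, None))
    best_b, best_val = None, np.inf
    for _ in range(L):
        z = F @ rng.standard_normal(B.shape[0])  # z ~ N(0, B)
        b = np.where(z >= 0, 1.0, -1.0)          # element-wise sign (0 -> +1)
        val = b @ W @ b
        if val < best_val:
            best_b, best_val = b, val
    return best_b, best_val
```

As a sanity check, when the SDP returns a rank-one B* = bb^T every draw rounds to ±b, so the randomization recovers the exact BQP solution.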
Corollary 1: Let b* be the solution of problem (25) and b̂_{l*} be the output of Algorithm 2. For γ = 1 − (2/π) λ_max NM > 0, the bounds in (33) hold. Notice that although the bounds in (33) offer guarantees in terms of the expected objective value, particular realizations b̂_{l*} tend to fall within those bounds with high probability if L is chosen sufficiently large. All in all, the recipe to estimate the graph filter via the SDR approach entails the following steps. First, we calculate {Â_m}_{m=1}^M from {Ĉ_y,m, C_x,m}_{m=1}^M using the expression in the statement of Proposition 4, and form Ψ̂ as in (24) to finally obtain Ŵ = Ψ̂^T Ψ̂. Next, a feasible solution b̂_{l*} to the BQP is obtained after running Algorithm 2 with Ŵ and an appropriate choice of L as inputs. Finally, Ĥ is estimated using (28).
Remark 2 (SDR versus PGD): Although SDR has well-documented merits when dealing with non-convex BQPs (and other quadratic problems), the relaxation entails dropping a rank constraint after "lifting" a (binary) vector-valued problem with NM variables to a matrix-valued one with NM(NM + 1)/2 variables. This entails an increase in memory and computational cost, since the complexity of a general-purpose interior-point method to solve the resulting SDP is O(N⁷M⁷ log(1/ε)), for a prescribed solution accuracy ε > 0 [40]. Such additional cost could hinder applicability of the SDR approach in Algorithm 2 to problems involving very large networks. In those scenarios, the PGD iterations in Algorithm 1 can still find stationary solutions with lower memory requirements and O(MN³) complexity per iteration. While nothing can be said a priori on the quality of the aforementioned stationary points, the more costly SDR-based solutions offer quantifiable approximation guarantees, as asserted in Theorem 1.

VI. NUMERICAL TESTS
We study the recovery of real-world graphs to assess the performance of the proposed network topology inference algorithms. To this end, we specialize the problem in (6) to recover a sparse adjacency matrix (thus f(S) = ‖S‖_1 and S = S_A) that is close, in the Frobenius-norm sense, to being diagonalized by the estimated eigenbasis V̂. Accordingly, given V̂ we solve the following convex optimization problem

S* = argmin_{S∈S_A, λ} ‖S‖_1  s. to  ‖S − V̂ diag(λ) V̂^T‖_F ≤ ε,   (34)

for a small tolerance ε ≥ 0. We first analyze the inference performance in controlled synthetic settings (Section VI-A), and then use this framework to gain insights about urban mobility patterns in New York City from data of Uber pickups in 2015 (Section VI-B).

A. Inferring real-world networks from synthetic signals
Throughout this section, we infer real-world networks from the observation of diffusion processes that are synthetically generated via graph filtering as in (2). Denoting by S = A the (possibly weighted) adjacency matrix of the sought undirected graph G, we consider two types of filters: an FIR filter H_1 = h_0 I + h_1 S + h_2 S² and an IIR filter H_2, with filter coefficients drawn uniformly on [0, 1].

Fig. 1. (top) Recovery error as a function of M for noiseless observations. (bottom) Counterpart of the top figure but for noisy observations (noisy FIR and IIR filters, with and without the symmetry constraint). We discriminate two cases: one in which the symmetric structure of the filters is leveraged for recovery and one in which it is not.

Notice that when inferring PSD graph-shift operators, we consider a modified S to ensure positive semi-definiteness of the filter. As a measure of recovery error, we consistently adopt ‖S* − S‖_F / ‖S‖_F (averaged over multiple independent realizations of the experiment), where S* is obtained after solving (34) and S denotes the ground-truth GSO.
Inference from input-output realization pairs. We illustrate the method proposed in Section III-A by recovering a geographical network of the United States (US). More specifically, we consider a graph with N = 49 nodes given by the 48 states in the continental US plus the District of Columbia, and with unweighted edges indicating pairs of states that share a border (dataset from http://konect.uni-koblenz.de/networks/contiguous-usa). We then generate M i.i.d. random input-output pairs {x_m, y_m}_{m=1}^M as explained in Section III-A, where signals (with uniformly-distributed entries in [0, 100]) are diffused on the US graph via either the FIR filter H_1 or the IIR filter H_2. Given such data, the respective graph filters are estimated via (10). Problem (34), with V̂ given by the eigenvectors of the estimated filter, is then solved in order to infer the graph.

In Fig. 1 (top) we plot the recovery error as a function of M for both types of filters. First, notice that the performance is roughly independent of the filter type. More importantly, for M ≥ N the optimal filter minimizing (10) is unique (cf. Proposition 1) and leads to perfect recovery. We also consider the case where observations of the output signals y_m are corrupted by additive Gaussian noise with −10 dB power; see Fig. 1 (bottom). For this latter case, we also plot the error when the symmetry of the filter is ignored and, thus, not leveraged to improve recovery performance (cf. the discussion prior to Proposition 1). We first notice that, even though the estimation improves with increasing M, a larger number of observations is needed to guarantee successful recovery of the graph. Moreover, exploiting symmetry reduces the degrees of freedom in the filter to be inferred, thus markedly improving the recovery performance for the US graph adjacency matrix. Even though not shown in Fig. 1, the performance was observed to degrade gracefully as the noise level increases.

Inference of PSD graph shifts. We consider the karate club social network studied by Zachary [46], which is represented by a graph G consisting of N = 34 nodes or members of the club and undirected edges symbolizing friendships among them. Denoting by L_n the normalized Laplacian of G, we define the graph-shift operator S = I − ηL_n with η = 1/λ_max(L_n), modeling the diffusion of opinions between the members of the club. A graph signal y can be regarded as a unidimensional opinion of each club member regarding a specific topic, and each application of S can be seen as an opinion update. Our goal is to recover L_n, and hence the social structure of the karate club, from the observation of opinion profiles.
We consider M different processes on the graph (corresponding, e.g., to opinions on M different topics) and assume that an opinion profile y_m is obtained by diffusing an initial belief x_m through the network. More precisely, for each topic m = 1, . . . , M, we model x_m as a zero-mean process with known covariance C_x,m. We are then given a set {y_m^(p)}_{p=1}^P of independent opinion profiles generated from different sources {x_m^(p)}_{p=1}^P diffused through the filter H_1 with unknown nonnegative coefficients. From these P opinion profiles we first form an estimate Ĉ_y,m of the output covariance. Leveraging that S is PSD and h_l ≥ 0 for l = 0, 1, 2 (cf. Section IV-A), we then estimate the filter H_1 via (16) and solve (34) using the eigenbasis V̂ of the estimated filter. The set S is modified accordingly for the recovery of a normalized Laplacian instead of an adjacency matrix; see [11].

Fig. 3. Error in recovering a brain network as a function of the number M of graph processes observed, for Algorithm 1 (PGD), Algorithm 2 (SDR), and the algorithm in [11]. We assume perfect second-order statistical knowledge of the processes. SDR outperforms PGD at the expense of a higher computational complexity (cf. Remark 2). The mismatched stationarity assumption in [11] explains its worse performance.
In Fig. 2 we plot the recovery error as a function of the number of observations P for three different values of M ∈ {1, 5, 10}. As P increases, the estimate Ĉ_y,m becomes more reliable, entailing a better estimate of the underlying filter and, ultimately, more accurate eigenvectors V̂. Hence, we observe a decreasing error with increasing P. Moreover, for a fixed number of observations P, the error in the estimation of Ĉ_y,m can be partially overcome by observing multiple diffusion processes; thus, larger values of M lead to smaller graph recovery errors. The results also confirm that, when only second-order statistical information is available, and so the relationship between the input and output covariances is quadratic, more observations are needed for reliable network inference relative to those required by the linear graph-filter identification task in the previous test case (cf. Fig. 1).

Inference of general symmetric graph shifts. We consider a symmetric brain graph with N = 66 nodes or neural regions, with edge weights given by the density of anatomical connections between regions [47]. We generate input-output covariance pairs {C_x,m, C_y,m}_{m=1}^M and evaluate the performance of the two algorithms presented in Section V. Once more, signals are generated using the FIR diffusion filter H_1, so the covariances satisfy C_y,m = H_1 C_x,m H_1 for each m. We first estimate the graph filter using either the PGD approach in Algorithm 1 or the SDR approach in Algorithm 2, and then solve (34) to recover the adjacency matrix of the structural brain network. Fig. 3 depicts the recovery error versus M for Algorithms 1 and 2, as well as for a baseline GSP-based algorithm that assumes the observed graph processes are stationary [11]. First, we notice that as M increases, the recovery error associated with the PGD algorithm decreases monotonically; for instance, we can achieve perfect recovery with M = 10 observed covariances.
The SDR approach, on the other hand, exhibits superior performance, with almost perfect recovery even for M = 2. Recall that this gain in performance comes at the price of a higher computational complexity, as explained in Remark 2. Finally, both proposed methods outperform the algorithm in [11] for all M. This is expected due to the model mismatch suffered when incorrectly assuming that the observed graph processes are stationary, and accordingly using the output covariance eigenvectors as spectral templates of the recovered graph. While not shown in Fig. 3 to avoid hindering clarity of presentation, large errors were also obtained for the workhorse statistical approach termed graphical lasso [17]. This is expected since the graph recovered by graphical lasso corresponds to the maximum-likelihood estimate of the precision matrix C_y^{-1}, which is mismatched to the non-stationary diffusion model considered here.

Fig. 4. Error in recovering a collaboration network using SDR for varying budgets (number of processes M times the signals observed per process P). For a fixed budget it is better to allocate the sensing resources to learning M = 2 processes as accurately as possible, rather than having a coarser estimate of more (M > 2) processes.

Inference with a fixed signal budget. In the previous test case we saw that, when perfect second-order statistical knowledge of the processes is available, SDR can typically recover the unknown graph with M ≥ 2. In practice, however, we estimate output covariances from observed signals via sample averaging. In particular, assume that we estimate the covariance of each of the M processes by observing P independent graph signals. For the cases where the total budget of signals M × P is fixed, this numerical test studies the trade-off between M and P as it pertains to recovery performance. Is it better to have accurate estimates of the covariances of a few processes (larger P and smaller M) or, instead, coarser estimates of more processes (smaller P and larger M)?
In order to answer this question, we consider the collaboration network of N = 31 scientists working in the field of Network Science presented in [11] (see [48] for more information), and model the diffusion of ideas among the scientists via simple linear dynamics as per H_1. We implement Algorithm 2 (SDR) with L = 10 random draws to recover the collaboration graph from M processes, each inferred from P signals such that M × P is fixed. Fig. 4 depicts the recovery error as a function of the total budget (M × P) of observed signals, parametrized by M ∈ {2, 3, 4, 5}. As expected, the error decreases with increasing budget for all values of M, since we can better estimate the covariances of all the processes. More interestingly, for a fixed budget it is better to consider only M = 2 processes. Given that SDR can often recover the filter with perfect knowledge of M = 2 covariances, it is better to focus on just two processes and obtain the most accurate estimates of the associated covariances.

B. Unveiling urban mobility patterns from Uber pickups
We implement our SDR graph topology inference method (Algorithm 2) to detect mobility patterns in New York City from Uber pickup data. We have access to the times and locations of pickups from January 1st to June 29th, 2015, for 263 known location IDs. For simplicity, we cluster the locations into N = 30 zones based on their geographical proximity; these are shown as red pins in Fig. 5 and represent the nodes of the graph to be recovered. The total number of pickups aggregated by zone during a specific time horizon can be regarded as a graph signal defined on the unknown graph. More specifically, we consider M = 2 graph processes: weekday (m = 1) and weekend (m = 2) pickups. Moreover, we consider the pickups from 6am to 11am to constitute the inputs of our process, whereas the pickups from 3pm to 8pm comprise the outputs. To be more precise, for a specific day we aggregate all the pickups within 6-11am to form an input signal x, and similarly we group all the pickups within 3-8pm to generate the associated output signal y. If the day considered is a weekday, we regard this pair as generated by process m = 1; if it is a weekend day, we consider the pair as coming from process m = 2. We repeat this procedure for all the days included in the period of study, and estimate input-output covariance pairs {Ĉ_x,m, Ĉ_y,m}_{m=1}^2. We then run Algorithm 2 to infer an underlying graph filter Ĥ and solve (34), given the estimated eigenbasis of Ĥ, to find a sparse mobility pattern. The modeling presumption is that throughout the day the population diffuses over an unknown graph of mobility patterns that we seek to identify.
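The aggregation pipeline just described can be sketched as follows, with synthetic Poisson counts standing in for the actual Uber data and the function name `covariance_pairs` being our own:

```python
import numpy as np

def covariance_pairs(morning, afternoon, is_weekend):
    """Form input/output sample-covariance pairs (Cx_m, Cy_m) for m = 1
    (weekday) and m = 2 (weekend) from daily pickup-count vectors.
    `morning` and `afternoon` are D x N arrays (days by zones)."""
    pairs = []
    for weekend in (False, True):
        X = morning[is_weekend == weekend]    # inputs: 6-11am pickups
        Y = afternoon[is_weekend == weekend]  # outputs: 3-8pm pickups
        X = X - X.mean(axis=0)                # center before averaging
        Y = Y - Y.mean(axis=0)
        P = X.shape[0]
        pairs.append((X.T @ X / P, Y.T @ Y / P))
    return pairs  # [(Cx_1, Cy_1), (Cx_2, Cy_2)]
```

The resulting pairs would then be fed to Algorithm 2 exactly as in the synthetic experiments of Section VI-A.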
The recovered graph is depicted in Fig. 5, where the weights of the recovered edges are represented by the line widths in the figure. Given the nature of the input and output processes considered, the graph obtained is a sparse description of the mobility pattern of people throughout the day. Notice that most connections occur between Manhattan and the other boroughs (Queens, Bronx, Staten Island, Brooklyn and Newark), while only a few edges connect zones within Manhattan. This indicates that people use Uber to commute from their homes in the suburbs to their work (or leisure activities on the weekends) in the city. These findings are consistent with exploratory research on this same dataset [49], as well as with a recent New York Times article that writes: "Uber has deployed thousands of black cars across Manhattan, going bumper-to-bumper with yellow taxis for passengers and fares in lucrative commercial and tourist areas. But the ride-hail app has increasingly shifted its focus to the city's other four boroughs, where frustration over subway overcrowding and delays and fewer taxi options have made it the ride of choice for many. As a result, Uber is booming in the other boroughs, with half of all Uber rides now starting outside Manhattan. . . " [50]. Lastly, observe that the JFK, Newark and LaGuardia airports are strongly connected with Manhattan and the other boroughs, as expected.

VII. CONCLUDING SUMMARY
We studied the problem of inferring an undirected network from observations of non-stationary signals diffused on the unknown graph. Relative to the stationary setting, the main challenge is that the GSO eigenvectors differ from those of the signal covariance matrix. To overcome this hurdle, we leverage the fact that the sought eigenbasis is preserved by the polynomial graph filter that governs the diffusion process. As a result, the novel approach is to first identify the GSO eigenvectors from a judicious graph filter estimate, and then rely on these spectral templates to estimate the eigenvalues by imposing desirable properties (such as edge sparsity) on the graph to be recovered. We propose different estimators of the symmetric diffusion filter, depending on whether i) explicit realizations or ii) second-order statistics of the input-output graph signal pairs are available. These estimators arise as solutions of systems of linear and quadratic matrix equations, respectively. We investigate identifiability properties of the resulting problems, and develop PGD and SDR algorithms with complementary strengths to compute near-optimal solutions of the said system of quadratic equations. The overall network topology inference pipeline is validated via comprehensive numerical tests on real social, author collaboration, and structural brain networks. Moreover, we show how the developed graph inference tools can be utilized to unveil mobility patterns in New York City, from data of Uber pickups across Manhattan and the outer boroughs.
As future work, it is of interest to expand the scope of the proposed topology inference framework to accommodate directed graphs that represent the structure of signals (possibly) generated via nonlinear network interactions. Adaptive algorithms that can track slowly-varying graphs, and that effect memory and computational savings by processing the signals on-the-fly, are the subject of ongoing investigation.

A. Proof of Proposition 1
To show that a) is true, note first that the cost in (9) can be compactly rewritten as ‖Y − HX‖²_F. Using the Kronecker product and the matrix vectorization operator, we can further rewrite it as ‖Y − HX‖²_F = ‖vec(Y) − (X^T ⊗ I_N) vec(H)‖²_2. Moreover, the redundant entries in vec(H) can be removed using the duplication matrix D_N, to yield ‖Y − HX‖²_F = ‖vec(Y) − (X^T ⊗ I_N) D_N vech(H)‖²_2. This LS cost can be minimized using the Moore-Penrose pseudoinverse as vech(H*) = [(X^T ⊗ I_N) D_N]^† vec(Y), so (10) follows.
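The closed form (10) is straightforward to implement; the sketch below (Python, with an explicit construction of the duplication matrix D_N; helper names are ours) recovers a symmetric H exactly from noiseless pairs when rank(X) = N, in line with part c) of the proposition:

```python
import numpy as np

def duplication_matrix(N):
    # D_N maps vech(H) (column-wise lower triangle) to vec(H) (column-major)
    D = np.zeros((N * N, N * (N + 1) // 2))
    k = 0
    for j in range(N):
        for i in range(j, N):
            D[i + j * N, k] = 1.0
            D[j + i * N, k] = 1.0  # when i == j this rewrites the same entry
            k += 1
    return D

def ls_filter(X, Y):
    """Closed form (10): vech(H*) = [(X^T kron I_N) D_N]^dagger vec(Y)."""
    N = X.shape[0]
    A = np.kron(X.T, np.eye(N)) @ duplication_matrix(N)
    h = np.linalg.pinv(A) @ Y.flatten(order='F')
    # unpack vech(H*) back into a symmetric N x N matrix
    H, k = np.zeros((N, N)), 0
    for j in range(N):
        for i in range(j, N):
            H[i, j] = H[j, i] = h[k]
            k += 1
    return H
```

When X is rank deficient, the same code returns the minimum-norm solution among the (then non-unique) LS minimizers, cf. parts b) and c).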
In order to prove that b) holds true, we denote by {v_i}_{i=1}^{N−M_r} an orthonormal basis of the null space ker(X^T), where M_r := rank(X). We use these vectors to form all non-repeated symmetric matrices of the form V_ij = v_i v_j^T + v_j v_i^T, and then collect the N_H distinct entries of each such symmetric matrix into the vector ṽ_ij := vech(V_ij); the resulting vectors are gathered in the set V_ker := {ṽ_ij}_{i ≤ j}.

Lemma 1: The set V_ker satisfies: i) every ṽ_ij ∈ V_ker lies in the null space of (X^T ⊗ I_N) D_N; and ii) the vectors in V_ker are linearly independent.

Proof: To establish i), we need to show that if ṽ_ij ∈ V_ker, then [(X^T ⊗ I_N) D_N] ṽ_ij = 0. Indeed, we have that [(X^T ⊗ I_N) D_N] ṽ_ij = (X^T ⊗ I_N) vec(V_ij) = vec(V_ij X) = vec(v_i (X^T v_j)^T + v_j (X^T v_i)^T) = 0, where the last equality follows because v_i, v_j ∈ ker(X^T).
To prove ii), we first define a matrix Ṽ_ker whose columns correspond to the vectors in V_ker, and write V_ker = D_N Ṽ_ker. Notice that if V_ker is full column rank, then all the columns in Ṽ_ker must be linearly independent, since D_N is just a replication operator. Hence, we need to show that the columns of V_ker associated with V_ij and V_i′j′ are orthogonal unless both i = i′ and j = j′. To see why this is true, note that ⟨V_ij, V_i′j′⟩ = (v_j^T v_i′)(v_i^T v_j′) + (v_j^T v_j′)(v_i^T v_i′) + (v_i^T v_i′)(v_j^T v_j′) + (v_i^T v_j′)(v_j^T v_i′); by orthonormality of the basis, in all four summands there is at least one inner product that is zero whenever {i, j} ≠ {i′, j′}, hence their sum vanishes as well. From Lemma 1, we conclude that the dimension of the null space of (X^T ⊗ I_N) D_N is at least the cardinality of V_ker. Since |V_ker| = (N − M_r + 1)(N − M_r)/2, it follows that rank[(X^T ⊗ I_N) D_N] is at most N_H − (N − M_r + 1)(N − M_r)/2.
Finally, to see that c) is true, first notice that whenever M_r < N, then b) guarantees that rank[(X^T ⊗ I_N) D_N] < N_H. Consequently, the Moore-Penrose minimizer is not the unique LS minimizer. By contrast, if M_r = N then X becomes full rank and, consequently, (X^T ⊗ I_N) is also full rank. Given that D_N is a full column-rank matrix, (X^T ⊗ I_N) D_N is also full column rank and the Moore-Penrose minimizer becomes the unique LS minimizer, as wanted.
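The rank bound in part b) can also be verified numerically. The sketch below (helper names ours) builds a rank-deficient X and compares rank[(X^T ⊗ I_N) D_N] against N_H − (N − M_r + 1)(N − M_r)/2:

```python
import numpy as np

def duplication_matrix(N):
    # D_N maps vech(H) (column-wise lower triangle) to vec(H) (column-major)
    D = np.zeros((N * N, N * (N + 1) // 2))
    k = 0
    for j in range(N):
        for i in range(j, N):
            D[i + j * N, k] = 1.0
            D[j + i * N, k] = 1.0
            k += 1
    return D

def rank_bound_check(N, Mr, M, rng):
    """Return (rank[(X^T kron I_N) D_N], upper bound from Proposition 1 b))
    for a randomly drawn N x M matrix X with rank(X) = Mr."""
    X = rng.standard_normal((N, Mr)) @ rng.standard_normal((Mr, M))
    A = np.kron(X.T, np.eye(N)) @ duplication_matrix(N)
    NH = N * (N + 1) // 2
    bound = NH - (N - Mr + 1) * (N - Mr) // 2
    return np.linalg.matrix_rank(A), bound
```

For generic draws the bound is met with equality in the rank-deficient case, and full column rank N_H is recovered when M_r = N.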

B. Proof of Proposition 3
From (11) it follows that C_y,1 = H C_x,1 H = H U diag(λ_1) U^T H = Q diag(λ_1) Q^T, where we have implicitly defined Q := HU. Notice that the basis U is completely determined since all eigenvalues in λ_1 are distinct [cf. A-1)]. Similarly, for the second diffusion process we obtain C_y,2 = Q diag(λ_2) Q^T. Furthermore, if we define the matrix R_x = [λ_1, λ_2]^T ∈ R^{2×N} and the N × N × 2 tensor C_y (with slices along the third mode given by C_y,1 and C_y,2), then the partially symmetric PARAFAC decomposition of C_y factors into the matrices Q, Q, and R_x; see [25], [51].
Recall that the Kruskal rank of a matrix A ∈ R^{N×M}, denoted by kr(A), is defined as the maximum number k such that any combination of k columns of A forms a full-rank submatrix. In this way, from condition A-2) it follows that kr(R_x) = 2, and from the invertibility of H [cf. A-4)] it follows that kr(Q) = N. Leveraging established results on the uniqueness of PARAFAC tensor decompositions (see [25, Theorem 1]), it follows that a PARAFAC decomposition of C_y recovers Q and R_x up to scaling and permutation ambiguities. However, given that we know R_x a priori, part of those ambiguities can be resolved; see, e.g., [25, Lemma 1]. To be more precise, it follows that we can recover Q′ = Q diag(b̃) for some unknown b̃ ∈ {−1, 1}^N. The following lemma establishes how to uniquely recover b̃, and uniqueness of H = QU^T = Q′ diag(b̃) U^T follows.
Lemma 2: The vector b̃ ∈ {−1, 1}^N can be found as the only vector (up to sign) such that Q′ diag(b̃) U^T is symmetric. Proof: Combining the facts that H = Q′ diag(b̃) U^T for the true b̃ and that H is symmetric, it follows that Q′ diag(b̃) U^T is indeed symmetric. Thus, we are left to show that no other b′ ∈ {−1, 1}^N leads to a symmetric Q′ diag(b′) U^T. To show this, begin by defining the symmetric matrix P := U diag(b̃) diag(b′) U^T, and notice that Q′ diag(b′) U^T = H U diag(b̃) diag(b′) U^T = HP. Hence, it follows that if HP is indeed symmetric, then HP = (HP)^T = PH. (36) This means that H and P must commute, which requires them to be simultaneously diagonalizable. However, since U and V (the eigenbasis of H) do not share any eigenvector [cf. A-3)], this can only happen if P = I or P = −I. Hence, we must have b′ = b̃ or b′ = −b̃, and identifiability of b̃ is guaranteed.
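Lemma 2 can be verified numerically for small N by exhaustively enumerating the sign patterns; in the sketch below (helper name ours) exactly the two vectors ±b̃ yield a symmetric Q′ diag(b) U^T:

```python
import numpy as np
from itertools import product

def symmetric_sign_patterns(Qp, U, tol=1e-8):
    """Enumerate b in {-1, 1}^N and return those for which Q' diag(b) U^T
    is symmetric (exponential enumeration; small N only)."""
    N = U.shape[0]
    hits = []
    for b in product((-1.0, 1.0), repeat=N):
        S = Qp @ np.diag(b) @ U.T
        if np.allclose(S, S.T, atol=tol):
            hits.append(np.array(b))
    return hits

# small example: build Q' = H U diag(b_tilde) from a generic symmetric H
rng = np.random.default_rng(3)
N = 4
A = rng.standard_normal((N, N))
H = (A + A.T) / 2                                  # generic symmetric H
U = np.linalg.qr(rng.standard_normal((N, N)))[0]   # generic orthonormal U
b_tilde = np.array([1.0, -1.0, -1.0, 1.0])
Qp = H @ U @ np.diag(b_tilde)                      # Q' = Q diag(b_tilde), Q = HU
hits = symmetric_sign_patterns(Qp, U)
```

For generic H and U, only b = b̃ (recovering H) and b = −b̃ (recovering −H) survive the symmetry test, matching the sign ambiguity of Proposition 3.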