Inferring Topology of Networks With Hidden Dynamic Variables

Inferring the network topology from the dynamics of interacting units constitutes a topical challenge that drives research on its theory and applications across physics, mathematics, biology, and engineering. Most current inference methods rely on time series data recorded from all dynamical variables in the system. In applications, often only some of these time series are accessible, while other units or variables of all units are hidden, i.e. inaccessible or unobserved. For instance, in AC power grids, frequency measurements often are easily available whereas determining the phase relations among the oscillatory units requires much more effort. Here, we propose a network inference method that allows to reconstruct the full network topology even if all units exhibit hidden variables. We illustrate the approach in terms of a basic AC power grid model with two variables per node, the local phase angle and the local instantaneous frequency. Based solely on frequency measurements, we infer the underlying network topology as well as the relative phases that are inaccessible to measurement. The presented method may be enhanced to include systems with more complex coupling functions and additional parameters such as losses in power grid models. These results may thus contribute towards developing and applying novel network inference approaches in engineering, biology and beyond.


I. INTRODUCTION
Many complex real world systems, ranging from technical supply networks to biological processes such as gene regulation, consist of a multitude of interacting units and thus represent network dynamical systems. To understand, predict and control the full network dynamics typically requires the The associate editor coordinating the review of this manuscript and approving it for publication was Ludovico Minati . knowledge of the topology defining how the units interact. However, the topology underlying the current system dynamics might be unavailable, e.g. because it is difficult or expensive to observe. In addition, several dynamical variables may not be easily accessible to measurement and thus hidden.
The inference of network topology from time series recordings or other network features [1]- [3] and inverse problems in general [4] are active fields of interdisciplinary research on complex systems. Inference methods are applied for example to obtain gene and protein interaction networks [5], observe epidemic spreading on networks [6] or determine the topology of coupled oscillator networks [7]- [9]. Depending on the field of interest, the available data and the (mathematical) understanding of the underlying system, different inference methods prove viable. Inference methods range from Bayesian networks and information theoretical methods based on mutual information or maximal entropy [10], [11] over compressed sensing [12], [13] and deep learning approaches [14] to network inference from response dynamics [7], see [2], [3] for comprehensive reviews. Such methods are useful to gain a better understanding of the structure and function of the network at hand, whether it is natural or engineered. The procedure for many inference approaches is very similar: Observing a system out of equilibrium reveals the interplay of its different units and may enable us to infer the underlying topology. Mathematically, time series of the system are combined with prior knowledge, e.g. of the intrinsic dynamics at each node, to estimate the interaction matrix via pseudo-inverses [2].
Recent advances in network inference have focused for example on revealing the physical interactions, in contrast to pure correlations or adopted model free inference approaches without any knowledge of the intrinsic dynamics, see, e.g., [7], [15]- [20]. Recent work also indicates how to infer the number of nodes in the network from observing only a few accessible nodes [21]. However, most theoretical approaches still require the full knowledge of the time series from all variables of each given unit. Uncovering the interacting topology of networks from time series in the presence of hidden dynamical variables has been successful under certain conditions by heuristic means, see for instance [18]. Aside from applications of inference methods on biological networks, such as gene-regulation [22] or neuronal networks [23], or communication networks [24], [25], there are also several methods for parameter estimation in power systems, see [13] for an overview.
As paradigmatic example systems, we consider models of electrical power grids [26] as these constitute particularly interesting applications for inference methods, for instance because they may get damaged or experience line failures. Inference techniques then serve to detect these changes in topology quickly and help localizing the fault [27], [28]. Furthermore, the power grid topology is changing rapidly, both in industrialized as well as developing countries, due to the introduction of distributed generators, charging infrastructure for electric vehicles and other factors. Combined with the fact that different parts of one grid are often operated and administered by different grid operators, the precise topology is not always fully known, specifically also of lower-voltage distribution grids [29]. These lower voltage levels become increasingly relevant for grid operators and power dispatch strategies because the number of generators connected to the low-voltage grid is growing, e.g. by the increasing installation of solar panels [30]. The additional feed-in of power on lower voltage levels thereby supports power flow from lower levels to higher ones, in addition to the traditional, reverse direction. Hence, in particular for power grids, it is essential to have access to their full topology on all scales to ensure stable planning and operation [31].
Applying inference on electrical power grids poses a special challenge. We typically have access to the voltage amplitude and frequency at each unit, since measurements are easy and cheap [32]. In contrast, determining the voltage angle (the oscillatory phase) via a PMU (phasor measurement unit) between different units [33] in a network is more expensive. Therefore, in many power grid applications, time series of voltage angles are not available. To put it in the general perspective of network dynamical systems, the time series of some variables of each unit are hidden. Simultaneously, knowledge about the voltage angle is valuable since the power flow along a transmission line is a direct function of the difference of voltage angles (phase differences), not the frequency [34].
Here we propose a method for revealing the interaction topology of networks with hidden dynamical variables as well as time series of the hidden variables themselves (relative to a base line). For instance, for an AC power grid model, the method only requires grid frequency measurements, not phase (voltage angle) or phase difference measurements, at all N nodes. We provide estimators for the network's underlying physical connectivity (which node is connected to which others) as well as power flows. We find that for a moderate number of measurements we obtain high prediction accuracy of both the network topology and the precise state of the network in terms of flows. Overall, our method is immediately applicable to power grid simulations and should also be useful for real-world power grid recordings if they follow a given model approximately. Furthermore, the same procedure may be extended to other applications in which certain variables are not directly accessible to measurements.
This article is structured as follows. We first introduce a class of model systems with two variables per unit, as well as a specific model approximately characterizing aspects of power grid dynamics. Further, we present the theoretical foundation to infer the network topology from time series available for one of the two variables (frequency-only measurements for the power grid model). We evaluate network reconstruction on differently sized network topologies and close with a discussion.

II. THEORY
We start by explaining the theoretical concept and resulting procedures underlying the inference of network topology and the network state for systems where not all of the dynamical variables are recorded. We base our approach on the direct method, which has been introduced in [15] and is further discussed in [2], for inference tasks where the time series of all variables of all units are available. The direct method allows to infer a network's topology by separating the known and unknown terms of the dynamical system, evaluating the possibly nonlinear unit dynamics and coupling functions at VOLUME 10, 2022 FIGURE 1. Network inference with hidden variables. Each node i in the network consists of two interacting dynamic variables (x i and y i ), one of which is hidden (x i ). Nodal dynamics may differ between nodes, as long as they satisfy equations (1) and (2). In addition, the interactions between nodes are hidden as well. By recording the transient relaxation of the observable variable of each node after a small perturbation, we infer the topology as well as the differences between each pair of hidden variables, both in their initial conditions and in the fixed point. a number of time points, and reformulating the problem as an optimization task of systems of linear equations. This linear system is overdetermined by recording state variables (or components of them) at sufficiently many time points at each unit and is then solved for the coupling matrix as a least squares problem (by calculating the Moore-Penrose pseudo-inverse). Fig. 1 illustrates the concept. For a network dynamical system with each node i consisting of two interacting dynamical variables x i and y i , hidden topology and hidden time series x i (t) for all i ∈ {1 . . . N }, the proposed method aims to infer both network topology and the time series of the hidden dynamic variables based on the observed time series of y i and known dynamics, as detailed below.

A. CLASS OF MODEL SYSTEMS
Throughout this work, we focus on a class of simple network dynamical systemsẋ with two variables (x i and y i ) per unit i ∈ {1, . . . , N }. The vectors x(t) = (x 1 (t), . . . , x N (t)) T and y(t) = (y 1 (t), . . . , y N (t)) T together make up the state of the entire systems at time t. The (possibly nonlinear) functions f i and g i describe the nodal dynamics, whereas the h i mediate the coupling between units and the two variables x i and y i at node i. Here, we take the f i and g i to depend on y and not on x, such that inferring the topology becomes possible without access to x-measurements. We remark that the third class of functions, the h i , still may depend on x. We expand the function in terms of Fourier series with constant matrices A k , B k ∈ R N ×N of coupling strengths in Fourier modes k, k .
In the examples illustrated below, we restrict ourselves to the coupling function for all i ∈ {1, . . . , N }, i.e., consider only the contribution of one Fourier term, A 1 = A, all other A k = 0 and all B k = 0. The extension to general h is conceptually identical but much more cumbersome. As for the direct method mentioned above, we take all functions g i (y, t) and f i (y, t) as well as the time series of y i (t) to be known for all i ∈ {1, . . . N }.

B. INFERENCE OF INTERACTION TOPOLOGY
The task is to find for which pairs (i, j), the coupling matrix is non-zero, i.e. A ij = 0, and to infer state differences x j − x i for the non-observed part of the state vector x. If we were to apply the original direct method in a network with N units with two variables each, both the time series of x and y had to be recorded at each unit and at M > 2N measurement points t m , n ∈ {1, . . . , M }. In contrast, the method we introduce in the following is applicable without knowledge of the time series of the x i (t). Given we have no access to x(t), we aim at eliminating the x(t) from the dynamical system. Substituting the x i (t) by integrating the first differential equation (1) that defines the dynamics yields where we now introduce the N initial condition components which are unknown real parameters. Substituting these integral expressions into equations (2) (for our choice (4) of the h i ) giveṡ with f ji (y, t) := f j (y, t) − f i (y, t) and x 0 ji := x 0 j − x 0 i . Separating known and unknown variables results in a linear system of equations, which then is overdetermined by recording sufficiently many data of the partial state vectors y(t). Exploiting sin (α + β) = sin α cos β + cos α sin β, the system's coupling terms become We approximate bothẏ i (t) and t t 0 f ji (y, t ) dt by numerically differentiating and integrating the recorded time series of y i (t), respectively, compare [2]. Such approximation is faithful if the temporal sampling resolution of the measured trajectory y(t) allows for robust numerical integration and differentiation. For and where N n = {1, . . . , n} for n ∈ N. Moreover, we abbreviate and The unknown, temporally constant terms that result from a combination of the unknown initial condition components x 0 i and the missing knowledge of the A ij (coupling strength from unit j to unit i) to be inferred are substituted by We hence rewrite both left-hand side and right-hand side of the dynamical system defined by (1) and (2), sampled at times t m as a system of linear equations, For each unit i ∈ {1, . . . , N }, the quantities L i and R i are known, whereas C i is unknown and searched for. The system (13) is overdetermined if the number M of time points at which state data of y is recorded exceeds 2N , twice the number of units. Due to inexact measurement data and numerical inaccuracies, we are searching for a robust solution of the overdetermined system of equations (13). Hence, we minimize the L 2 norm L i − C i R i 2 and obtain the solution C i by multiplying (13) with the Moore-Penrose-Pseudoinverse R i+ [2], By definition (12) the matrix C i already hints which of the A ij might be zero, indicating which of the potential interactions from a unit j to another i are not present. Beyond this qualitative binary information, we may quantitatively estimate the elements of the coupling matrix via for j ∈ {1, . . . , N }.

C. FEATURES OF HIDDEN STATE VARIABLES
Given the dynamics, the values of the hidden variables x cannot be obtained entirely. However, we can determine their differences for all times t, i.e., find estimates for all x i up to a uniform additive shift for all units i. We identify the initial difference between x j and x i using Euler's formula and obtain their difference where ı is the imaginary unit. The complex valued logarithm specifies the phase except for an additive term of an integer multiple of 2π. Here, we take the terms m = 0, and restrict the phase differences to the interval [−π, π). We calculate the phase differences at any where t 0 is the time at start of observation and t ∞ chosen such that t ∞ − t 0 is much larger than the relevant relaxation time scales in the system. Finally, combining (5) for two units (i and j) yields the fixed point differences, From the phase differences x ji (t) we thus obtain all the phases x i (t) up to one overall (and arbitrary) phase shift.

D. UNDIRECTED NETWORKS
For undirected networks, i.e. systems with symmetric coupling matrices A ij = A ji , we modify the equations to assure symmetry in the coupling matrix and antisymmetry in the coupling functions by requiring that Taking symmetrizes the reconstructed matrix, where the quantities on the right hand side are computed via (12). For the differences in initial conditions we obtain

A. APPLICATION ON A POWER GRID MODEL
We illustrate and discuss the proposed reconstruction method by applying it to a simple autonomous dynamical network model for power grid dynamics, namely the coupled second order swing equations. As opposed to static load-flow models, the coupled swing equations considered as an example here model power grid dynamics that include electromechanical dynamics [37]. Here, the voltage angle x i (t) = θ i (t) and the angular frequency deviation from a reference frequency (e.g., 50 Hz) y i (t) = ω i (t) are the components of the state variables, f i (ω, t) = ω i and g i (ω, t) = P i − γ i ω i both do not explicitly depend on time and h i (θ ) = N j=1 K ij sin θ j − θ i is a coupling function exhibiting one Fourier mode. We thereby consider a lossless transmission grid for clarity of presentation. Including losses would introduce a cosine Fourier term in the coupling function. As discussed in section II A, including additional terms in the coupling function would make the network inference problem technically more cumbersome without contributing any qualitative novelty. The resulting dynamical system is given by [34], [38], [39] where I i denotes the moment of inertia, P i reflects the power consumed (P i < 0) or provided (P i > 0) at unit i and K ij is the matrix of effective coupling strengths among units of the power grid. The term −γ i ω i in (23) includes both damping and primary (proportional) control. In this manuscript, we assume homogeneous damping, γ i ≡ γ for all i ∈ {1, . . . , N }, and do not include secondary (integrative) control, again because this would make the presentation more cumbersome without adding novelty. Drawing moments of inertia I i from a normal distribution with mean 1 and standard deviation 0.1 (almost certainly yielding I i > 0 for reasonable N ) creates a family of sample networks. Dividing (23) by I i > 0 gives the functional form demanded in (2). Note that we thus infer the scaled matrix elements K ij = K ij /I i , and hence have to correct for the normalization afterwards. According to the conditions under which we aim to illustrate inference, in this system, we take γ , P i and I i to be known for each unit i ∈ {1, . . . , N }. Measuring the transient frequencies at each node i as the system relaxes after a small perturbation yields the frequency time series ω i (t). We take all the θ i (t) time series to be unknown.
Following the steps described in Section II, for each measured time instant t m , m ∈ {1, . . . M }, we define   Fig. 2), we reconstructed the phase differences θ * ij according to (29) from the recorded frequency time series, yielding reasonably agreement for all edges (i , j ), see (b). The scaling of the MAE and its 95% confidence interval with the number of measurements of an ensemble of 100 networks for each network size N is illustrated in (c).
Calculating C i = L i R i+ gives the L 2 -norm minimizing solution C i of the system L i = C i R i , which subsequently yields the elements of the connectivity matrix, and the initial phase difference at the instant t = t 0 of the perturbation, where − θ 0 ij = θ 0 ji . Next, we infer the phase differences in the fixed point via As follows from linear response theory, the amplitude of the perturbation decays in first order approximation proportional to exp (Re[λ 1 ]t), where λ 1 denotes the eigenvalue of the system's Jacobian with largest non-zero real part. In the simulations, we employ this feature and take t ∞ as the time, after which the perturbation amplitude has decayed to 1% of its initial amplitude.

B. TECHNICAL DETAILS
To test our approach on a large number of networks with an adjustable number of nodes, which still display core characteristics of real power grid topology, we utilize the power grid topology generation algorithm introduced by Schultz et al. [35]. The unit locations are drawn randomly from the uniform distribution on [0, 1) × [0, 1) and the distance between them is defined via the L 2 norm. The edge weights K ij are drawn from a normal distribution with mean 13 and standard deviation 2.0. The damping coefficient is chosen as γ = 0.1. In each network with N nodes, N 4 units act as power generators, whereas the remaining N C = N − N 4 units act as consumers with P C = −1. Since the system operates at a stable fixed point, the power of the generators satisfies N i=0 P i = 0, hence P G = −N C P C /(N − N C ). We consider the transient dynamics of the system after a small perturbation. In real systems, perturbations occur even without experimental interference; here we externally perturb the system randomly. Specifically, we perturb each unit with a phase shift drawn from a normal distribution with mean zero and standard deviation π/8 and simulate the nodal frequencies ω i (t) at each unit i ∈ {1, . . . , N } as the system relaxes to it's stable fixed point by integrating the dynamical system (22), (23). We assume that the frequencies are sampled in time steps of 0.05. We use these frequency time series to reconstruct both the network topology and phase differences following the procedure described above.
An exemplary reconstruction of a modeled power grid based on a frequency time series with 1500 data points is illustrated in Fig. 2.

C. QUANTIFICATION AND ERROR MEASURES
Understanding and quantifying systematic errors occurring in the proposed reconstruction scheme, requires a systematic analysis of the initially obtained raw results. How many data points have to be measured to obtain a reliable reconstruction? How does this requirement scale with the number of nodes in the network? Answering these questions requires different methods and quantifiers, as detailed below.
The area under the receiver operating characteristic curves (AUC score) and the average precision (AP) quantify the overall reconstructability of the network given a recorded frequency time series, whereas the mean absolute error (MAE) indicates the precision of the inferred topology and edge weights. The receiver operating characteristic (ROC) curve follows from comparing the reconstructed (RM) and original (OM) adjacency matrix. Edges present both in RM and OM count as true positives. If an edge is in RM but not in OM, it counts as a false positive. The ROC curve results from calculating the true positive rates and false positive rates of RM with respect to OM. Each unique value in RM successively is taken as a threshold, and all edges with weights below this threshold value are set to zero, values above are set to 1. Comparison with the original adjacency matrix yields the true and false positive rates. The ROC curve then follows from plotting the true positive rate against the false positive rate. The area below the ROC curve is denoted the AUC score and serves as a good indicator of the ability to reconstruct the network topology. An AUC score of 1 is equivalent to perfect reconstructability, with the true positive rate equaling 1 and the false positive rate equals zero. An AUC score of 0.5 indicates, that a detected edge may be true positive or false positive with equal probability [40], equivalent to randomly guessing the outcome. Similarly, the area under the precision-recall (PR) curve, denoted AUC-PR or average precision (AP), follows from plotting precision, defined as the ratio of true positives over the sum of true positives and false positives, as a function of recall, defined as the ratio of true positives over the sum of true positives and false negatives, for varying thresholds. In contrast to the area under the receiver operating characteristic curves, AP is less sensitive to unbalanced data, where positives occur much less frequently than negatives [41]. Fig. 2 (b) and (c) display exemplary ROC and PR curves of the displayed network at various sample lengths M , respectively. For the subsequent images in Fig. 2, M = 1500 measurements have been used, leading to an AUC score and an AP of 1.
Still, calculating AP and the AUC score requires the knowledge of the original coupling matrix, which we take to be unknown. To identify correct edges and discard numerical artifacts, a different approach is needed. Plotting the distribution of the raw reconstructed edge weights (black) and the distribution of the real edge weights (blue), see Fig. 2 (d), reveals incorrectly detected edges with edge weights close to zero. Applying Otsu's method [36] yields a threshold to classify edges and artifacts. All edges with weights below this threshold are assumed to be false positives and hence discarded.
Whereas AP and the AUC score indicates the reconstructability and Otsu's method serves to distinguish between correct and incorrect edges, the mean absolute error (MAE) quantifies the accuracy of the inferred edge weights. The MAE between two matrices A, B ∈ R N ×N is defined by To take into account network sparsity, we do not set the normalization constant N to N 2 as often is the default but choose N as the number of non-zero matrix elements in the original matrix. Removing the numeric artifacts (discarding edges below a threshold, see Fig. 2 (d)), before calculating the MAE neglects their contribution to the error. Adjusting N accordingly prevents scaling of the MAE with the matrix dimensions. Fig. 2 illustrates step by step the numerical results obtained by using the introduced method to infer a medium sized synthetic power grid based on its frequency time series induced by a perturbation. When recording a sufficient amount of data (sample length M = 1500 in this example), the reconstructed matrix coincides very well with the original matrix, see Fig. 2 (e). More quantitatively, plotting the reconstructed against the original edge weights (Fig. 2 (f)) reveals a good agreement between the original and reconstructed coupling matrix. Although the method tends to slightly underestimate the edge weights with increasing sample length, most values lie very close to the diagonal that indicates perfect inference. For a systematic analysis of the reconstruction quality and the algorithm's scaling with the number of units in the network and the measurement count, we applied the method to an ensemble of 100 networks for each network size and evaluated the reconstructed matrix at different measurement counts. Fig. 3 (a) to (c) show the AUC score, AP and the average MAE as a function of the number of measurements for different network sizes. With the given parameter settings, both AUC and AP approach 1 for all tested networks. The average MAE also appears to converge to a fixed value, although larger networks require significantly more measurements before reaching this limit. Before slowing down at this tail, the required measurements for reaching a fixed AUC score, AP and MAE increase linearly with the network size, as Fig. 3 (d) to (f) illustrate. The cutoff MAE value of 1 corresponds to an average deviation between predicted and original edge weights of approximately 7.7% with the given simulation parameters. Sufficiently long measurements yield a relative edge deviation of approximately 2% with the chosen parameters. Rarely, we detected numerical outliers in the MAE due to a single very large inferred matrix element. Data of the corresponding network have been omitted in Fig. 3. The maximal reconstructable network size based on data of a single perturbation is limited, since the amplitude of the initial perturbation decays exponentially with the largest real part of the linearized system's eigenvalues not equal to zero, which is dependent on the system's parameters. The reconstructability of a dynamical system's coupling matrix hence improves with larger initial perturbations and lower damping.

D. NETWORK INFERENCE
Comparing the introduced method with the direct method to infer network topology from time series measurements [15] reveals that our proposed method requires more samples to achieve a comparable inference quality in terms of F1-Score (the harmonic mean of the precision and recall value) and MAE, see Fig. 5. However, the direct method is incapable of inferring network topology from incomplete information such as time series missing for any variables. We therefore used both the frequency and phase angle time series for inference and hence approximately twice the information available to our proposed method. With sufficient data available, both methods achieve a similarly high F1-Score and MAE.

E. PHASE RECOVERY
As discussed in Section II, the hidden phase differences θ ij along all lines in the network may be recovered from measurements of the local angular velocities ω i , see Fig. 4.
Given sufficiently many data points, the reconstructed phase differences in the fixed point θ * ij lie close to the diagonal when plotted against the measured phase differences, i.e. the measured and reconstructed θ * ij match well, see Fig. 4 (b) for the inferred phase difference from the network topology shown in Fig. 4 (a) (same power grid realization as in Fig. 2).
The scaling of the mean MAE for ensembles of 100 networks each for different measurement counts and network sizes is illustrated in Fig. 4 (c). For all networks, the MAE starts at a constant value, since phase angles differences are restricted to the interval [0, π), leading to an average initial MAE of π/2. After the number of measurements M increases, the MAE decreases very fast and converges to an exponential decay afterwards. The initially large MAE is dominated by the reconstruction error of θ 0 ij . As this error decreases, the accuracy still is restricted by the exponential damping of the dynamical system. Only as the system's state has reached its stable fixed point again, the inference error is minimal.
The amplitudes of the perturbed frequencies decay approximately exponentially and so does the error of the reconstructed phases. Adding the reconstructed initial phase difference to the integral over the frequency difference yields the phase differences in the fixed point, see (29). Note that we also could recover the fixed point phases by using the reconstructed edges to calculate the roots of the dynamical system given by (22) and (23). However, by using the introduced method we additionally obtain the unknown initial perturbation θ 0 ij and also compute phase difference trajectories θ ij (t) for all edges, purely based on frequency measurements.

IV. DISCUSSION AND OUTLOOK
We have presented an inference method to obtain the network connectivity matrix without monitoring all nodal quantities, i.e., inference with hidden (not directly monitored) variables. Using frequency recordings of a simulated synthetic power grid, both the coupling matrix and the entire phase difference time series may be recovered. Thereby, we have demonstrated by an example model application that the theoretically derived method of reconstruction from partial information works well in a simulated dynamical system. The presented inference method may prove to be a powerful tool for sys-tems with established dynamical models of appropriate functional forms (2). The inference quality is high already for a moderate number of measurements. Keeping the inference quality constant, the number of required measurements scales approximately linearly with the network size. Since we are not aware of other methods that infer topology from partial measurements, we compare the proposed method with the well-established direct method [15], which requires access to all time series (no hidden variables). As illustrated in Fig. 5, compared to the direct method, the method proposed here requires a larger number of measurements from the observable units to achieve similarly large F1-Score (the harmonic mean of the precision and the recall value) and MAE, respectively. However, since the direct method uses effectively twice as much information (hidden variable and observable variable), the total information required in both approaches is approximately equal.

FIGURE 5. Comparison with direct method
Comparing the required measurements of our proposed method (blue) and the direct method (orange) [15] to achieve a given F1-Score (a) and a given MAE (b) reveals that the direct method (assuming full knowledge of all time series) infers more accurately at the same number of measurements. However, both methods achieve similarly good scores eventually.
Our advancement is twofold: First, we have extended general network inference methods by providing an approach that does not require all nodal quantities to be monitored directly. This further moves the boundary of what quantities and information of a dynamical system can be recovered. We add to model-free inference [19] and hidden node detection [21] a method of inference when facing incomplete variable observation. This could be useful to many other applications where monitoring all quantities is either impossible or very costly, as in many biological systems, such as metabolic or neuronal systems [17]- [19], [42], [43].
Second, we have applied network inference methods to basic dynamical systems models of power grids with its dynamics given by the second order model (coupled swing equations). We thereby provide first steps towards a tool that may be useful for engineers inferring the network topology of real-world systems. This tool could for example be applied in distribution grids, e.g., to locate a faulty line or determine the status of switches if documentation is unavailable.
The proposed method requires knowledge about the full model of the underlying dynamical system, making it applicable to systems whose dynamics can at least be approximately modeled by the required functional forms. One major open question is how to infer topology from time series sampled from a subset of variables in the absence of any nonlinear dynamical systems' model. Furthermore, although there is no hard limit, in practice both finite memory and the exponential decay of the perturbation amplitude limit the maximal reconstructable network size (and thus number of variables). As discussed in section II, the dimensionality of the linear system which has to be optimized scales linearly with the number of measurements. Simultaneously, the perturbation amplitudes decay exponentially with the largest non-zero eigenvalue real part. Above some network size (or number of variables), the information contained in a single perturbation hence may not suffice to reconstruct the topology accurately. Assessing the theoretical approach with experimentally obtained data also remains a task for future research.
There remain many open questions. From the application point of view, it would be interesting to set up a laboratory experiment or see this inference methodology applied to a real distribution power grid. The method itself should be advanced to cover a broader spectrum of local dynamics f and g, and by overcoming the limit of effectively reconstructable network sizes, e.g., by using measurements of multiple perturbations. Finally, a long-term project could combine model-free approaches [19], the inference of hidden nodes [21] and our approach of determining inaccessible variables into one coherent and powerful inference framework. However, the introduced method may already contribute to the analysis of coupled dynamical systems whose topology previously could not be retrieved due to the inaccessibility of some dynamical variables.

AVAILABILITY OF CODE
Python code to infer the network topology and differences in the hidden variable and a short discussion of numerical properties and challenges is made available at https://github.com/Network-Dynamics/network-inferencewith-hidden-variables.