Foundations and modelling of dynamic networks using Dynamic Graph Neural Networks: A survey

Dynamic networks are used in a wide range of fields, including social network analysis, recommender systems, and epidemiology. Representing complex networks as structures that change over time allows network models to leverage not only structural but also temporal patterns. However, because the dynamic network literature stems from diverse fields and uses inconsistent terminology, it is challenging to navigate. Meanwhile, graph neural networks (GNNs) have gained a lot of attention in recent years for their ability to perform well on a range of network science tasks, such as link prediction and node classification. Despite the popularity of graph neural networks and the proven benefits of dynamic network models, there has been little focus on graph neural networks for dynamic networks. To address the challenges that arise because this research crosses diverse fields, and to survey dynamic graph neural networks, this work is split into two main parts. First, to address the ambiguity of dynamic network terminology, we establish a foundation of dynamic networks with consistent, detailed terminology and notation. Second, we present a comprehensive survey of dynamic graph neural network models using the proposed terminology.


I. INTRODUCTION
The bulk of network science literature focuses on static networks, yet every network existing in the real world changes over time. In fact, dynamic network structure has frequently been seen as a complication to be suppressed, to ease progress in the study of networks [1]. Since networks have been used as representations of complex systems in fields as diverse as biology and social science, advances in dynamic network analysis can have a large and far-reaching impact on any field using network analytics [2].
Dynamic networks add a new dimension to network modelling and prediction: time. This new dimension radically influences network properties, which enables a more powerful representation of network data, which in turn increases the predictive capabilities of methods using such data [3], [4]. In fact, dynamic networks are not mere generalizations of static networks; they exhibit different structural and algorithmic properties [5].
This work is both broader and narrower in scope than previous works. The first part of this survey (section II) is broader in scope than related surveys and introduces dynamic networks and dynamic network models (referring to the 'foundations and modelling of dynamic networks' part of the title). The second part of this survey (section III and section IV) is narrower in scope and more detailed than related surveys, and is a survey on dynamic graph neural networks (referring to the 'using Dynamic Graph Neural Networks' part of the title).
Foundations of dynamic networks: Dynamic networks suffer from a known terminology problem [6]. Complex networks which change over time have been referred to as, among others, dynamic networks [7], [8], temporal networks [2], [9], evolutionary networks [3] or time-varying networks [10]. With models often working only on specific types of networks, a clear and more detailed terminology for dynamic networks is necessary. We describe the foundations of dynamic networks and propose and develop an associated taxonomy of dynamic networks to contextualize the models in this survey and enable a more thorough comparison between them. We are unaware of any work with a comprehensive taxonomy of dynamic networks, and we therefore consider it the first major contribution of this paper.
The study of dynamic networks is a vast and interdisciplinary field. Models of dynamic networks are designed by researchers from different disciplines, who usually use modelling methods from their own fields. This survey provides a cross-disciplinary overview of dynamic network models. This overview is not intended to be a survey of dynamic models, but rather to provide context for dynamic graph neural networks and a reference point for further exploration of the field of dynamic network modelling.
We consider a dynamic network to be a network where nodes and edges appear and/or disappear over time. Due to the terminology problem, establishing a terminology and a clear definition of a dynamic network is a necessity for a survey of any kind of dynamic network model, such as dynamic graph neural networks. In the process, we introduce a specific and comprehensive terminology that enables future works to forego the extensive definition process and simply apply our terminology.
Related surveys [2], [6], [11] focus either on specific kinds of dynamic networks, for example temporal networks [2], [6], or on specific types of models, for example representation learning [11]-[13]. We are unaware of any work which gives as complete a picture of dynamic networks and dynamic network models as we do. The first section is thus broader in scope than other surveys that focus on only one network type or one type of network model.
Modelling dynamic networks using Dynamic Graph Neural Networks: A dynamic graph neural network (DGNN) is considered to be a neural network architecture that can encode a dynamic network and where the aggregation of neighbouring node features is part of the neural network architecture. DGNNs encode both structural and temporal patterns in dynamic networks. To encode structural patterns, DGNNs often make use of a graph neural network (GNN), and for temporal patterns they tend to use time series modules such as recurrent neural networks (RNNs) or positional attention. Spatio-temporal networks (graphs where the topology is static and only node or edge features change [14]) are out of the scope of this survey, and thus so are spatio-temporal graph neural networks [14], [15].
DGNNs, like GNNs and other representation learning models, are versatile in the tasks they can be applied to. With different decoders and different data, different tasks are possible. In practice, DGNNs have so far been applied to tasks similar to those of GNNs. The most common of these tasks are node classification [16]-[19] and link prediction [16], [18]-[20], which both have diverse and interesting applications across many disciplines. Link prediction may, for example, be applied in knowledge graph completion [21], [22] or by recommender systems [18], [19]. DGNNs have also been used for novel tasks such as predicting path failure in dynamic graphs [23], quantifying scientific impact [24], and detecting dominance, deception and nervousness [25].
There are several surveys on graph neural networks [8], [26], [27] as well as surveys on network representation learning [28], [29]. Our work differs from theirs in that we cover GNNs which encode dynamic networks. Kazemi et al. [11], Xie et al. [12] and Barros et al. [13] are the works most similar to this paper, as they survey dynamic network representation learning. The distinction is that they survey the broader topic of representation learning on dynamic networks, whereas we survey dynamic graph neural networks, which are a subset of representation learning on dynamic networks. We thus survey a narrower scope than the dynamic representation learning surveys and a different network type from the GNN surveys, which focus on static networks [8], [26], [27]. Wu et al. [27] and Zhou et al. [8] also survey spatio-temporal graph neural networks, which encode spatio-temporal networks (static networks with dynamic node attributes). This survey's contributions are: (i) a conceptual framework and a taxonomy for dynamic networks, (ii) an overview of dynamic network models, (iii) a survey of dynamic graph neural networks, and (iv) an overview of how dynamic graph neural networks are used for prediction of dynamic networks (dynamic link prediction).
This work follows the encoder-decoder framework used by Hamilton et al. [28] and is split into three distinct sections each building upon the previous one.
1) Section II is a discussion on dynamic networks. It serves as a foundation to the following sections. In this section we explore different definitions of links and introduce a novel dynamic network taxonomy. We also give a brief overview of the dynamic network model landscape, which contextualizes the rest of the survey.
2) Section III is a survey of the deep learning models for encoding dynamic network topology. This covers dynamic network encoders.
3) Section IV is an overview of how the encoders from section III are used for prediction. This includes dynamic network decoders, loss functions and evaluation metrics.

II. DYNAMIC NETWORKS
A complex network is a representation of a complex system. A network that changes over time can be represented as a dynamic network. A dynamic network has both temporal and structural patterns, and these patterns are described by a dynamic network model. The definition of a link is essential to any network representation. It is even more essential in dynamic networks, as it dictates when a link appears and disappears. Different link definitions affect network properties, which in turn affect which models are capable of representing the dynamic network.
Dynamic networks are complex networks that change over time. Links and nodes may appear and disappear. With only this insight we can form a general definition for dynamic networks. Our definition is inspired by Rossetti and Cazabet [30].
Definition 1 (Dynamic Network): A dynamic network is a graph $G = (V, E)$ where $V = \{(v, t_s, t_e)\}$, with $v$ a vertex of the graph and $t_s$, $t_e$ respectively the start and end timestamps for the existence of the vertex (with $t_s \leq t_e$), and $E = \{(u, v, t_s, t_e)\}$, with $u, v \in V$ and $t_s$, $t_e$ respectively the start and end timestamps for the existence of the edge (with $t_s \leq t_e$).
This definition and all later definitions represent unlabeled and undirected networks, but they can trivially be extended to take both direction and labels into account.
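Definition 1 can be sketched as a small data structure. The class and method names below (`TimedNode`, `TimedEdge`, `DynamicNetwork`, `snapshot_at`) are hypothetical and chosen only for illustration. Treating the existence intervals as closed is also an assumption, since the definition does not specify it.

```python
from dataclasses import dataclass, field
from typing import Hashable, List


def _check_interval(t_s, t_e):
    # Definition 1 requires t_s <= t_e for both vertices and edges.
    if t_s > t_e:
        raise ValueError("start timestamp must not exceed end timestamp")


@dataclass(frozen=True)
class TimedNode:
    v: Hashable
    t_s: float
    t_e: float

    def __post_init__(self):
        _check_interval(self.t_s, self.t_e)


@dataclass(frozen=True)
class TimedEdge:
    u: Hashable
    v: Hashable
    t_s: float
    t_e: float

    def __post_init__(self):
        _check_interval(self.t_s, self.t_e)


@dataclass
class DynamicNetwork:
    nodes: List[TimedNode] = field(default_factory=list)
    edges: List[TimedEdge] = field(default_factory=list)

    def snapshot_at(self, t):
        """Return (node set, undirected edge set) existing at time t.

        Assumption: intervals are closed, i.e. t_s <= t <= t_e.
        """
        alive_nodes = {n.v for n in self.nodes if n.t_s <= t <= n.t_e}
        alive_edges = {
            frozenset((e.u, e.v)) for e in self.edges if e.t_s <= t <= e.t_e
        }
        return alive_nodes, alive_edges


g = DynamicNetwork(
    nodes=[TimedNode("a", 0, 10), TimedNode("b", 0, 10), TimedNode("c", 5, 10)],
    edges=[TimedEdge("a", "b", 1, 4), TimedEdge("b", "c", 6, 9)],
)
```

Calling `g.snapshot_at(t)` for increasing `t` shows both node and edge sets changing over time, which is the behaviour that the more specific network types in the following sections restrict in different ways.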
Whereas dynamic networks are defined as complex networks where links and nodes may appear and disappear, dynamic network models are often designed to work on specific kinds of dynamic networks and specific dynamic network representations. It therefore makes sense to distinguish between different kinds of dynamic networks and how they are represented.
Table 7 is an overview of the notation and Table 8 is an overview of the abbreviations used in this work.
There are several surveys on dynamic network methods [2], [3], [6], [11], [30]-[35]. These surveys focus either on specific kinds of dynamic networks or on a specific discipline, limiting the scope of the survey to models in that discipline. To the best of our knowledge there is no comprehensive survey of dynamic networks, nor does any dynamic network model survey present a complete foundation or framework for dynamic networks. The aim of this section is to set the stage for the dynamic graph neural network survey by creating a conceptual framework for dynamic networks with more precise terminology, and to add context by giving an overview of methods used for modelling dynamic network topology.

A. DYNAMIC NETWORK REPRESENTATIONS
Dynamic networks can be represented in different ways and there are advantages and disadvantages inherent to the different representation types.
Dynamic network representations can be grouped into four distinct levels ordered by temporal granularity: (i) static, (ii) edge-weighted, (iii) discrete, and (iv) continuous networks [36]. Fig. 1 shows those four representations with increasing model complexity as the model becomes more temporally fine-grained:
• Static networks have no temporal information.
• Edge-weighted networks have temporal information included as labels on the edges and/or nodes of a static network. The most straightforward example of this is a static network with the edges labelled with the time they were last active.
• Discrete networks are represented in discrete time intervals. These can be represented by multiple snapshots of the network at different time intervals.
• Continuous networks have no temporal aggregation applied to them. This representation carries the most information but is also the most complex.
Static and edge-weighted networks are used to model stable patterns or the actual state of the network, whereas discrete and continuous methods are used for more dynamic modelling [30]. This work focuses on dynamic networks and will therefore only cover discrete and continuous representations.
Fine-grained representations can be trivially aggregated to produce coarser representations. For example, links in a continuous representation can be aggregated into snapshots (or time-windows), which is a discrete representation. Any discrete representation can combine the snapshots, yielding an edge-weighted representation, and any edge-weighted representation can discard the weights, thus yielding a static network.
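The aggregation chain from continuous to discrete to edge-weighted to static can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, links are assumed to be instantaneous timestamped triples `(u, v, t)`, and the edge weight is assumed to be the number of snapshots in which an edge occurs (the time of last activity would be an equally valid choice).

```python
from collections import Counter


def to_snapshots(events, t_start, window, num_windows):
    """Continuous -> discrete: bucket links (u, v, t) into equal time-windows."""
    snapshots = [set() for _ in range(num_windows)]
    for u, v, t in events:
        k = int((t - t_start) // window)
        if 0 <= k < num_windows:  # links outside the observed range are dropped
            snapshots[k].add(frozenset((u, v)))
    return snapshots


def to_edge_weighted(snapshots):
    """Discrete -> edge-weighted: weight = number of snapshots containing the edge."""
    weights = Counter()
    for snapshot in snapshots:
        weights.update(snapshot)
    return dict(weights)


def to_static(edge_weights):
    """Edge-weighted -> static: discard the weights."""
    return set(edge_weights)


events = [("a", "b", 0.5), ("a", "b", 1.2), ("b", "c", 1.7), ("a", "c", 3.1)]
snapshots = to_snapshots(events, t_start=0.0, window=1.0, num_windows=4)
edge_weights = to_edge_weighted(snapshots)
static_edges = to_static(edge_weights)
```

Each step discards information that cannot be recovered afterwards, which is why the aggregation only runs in one direction.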

1) Discrete Representation
Discrete representations use an ordered set of graphs (snapshots) to represent a dynamic graph:
$DG = \{G^1, G^2, \ldots, G^T\}$,
where $T$ is the number of snapshots. Discrete representations, often simply referred to as "snapshots", are common for dynamic networks [2], [3], [9]. Using a discrete representation of the dynamic network allows for the use of static network analysis methods on each of the snapshots. Repeated use of the static methods on each snapshot can then collectively give insight into the network's dynamics.
There are other approaches that effectively use snapshots as well. Overlapping snapshots such as sliding time-windows [37] are also used in dynamic network analysis to obtain less radical change from one network snapshot to the next [38]. Discrete dynamic networks need not be represented as an ordered set of graphs; they may also be represented as a multi-layered network [39] or as a tensor [40].
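Overlapping snapshots can be illustrated with a short sketch. The function name and its parameters (`width`, `step`) are hypothetical, and links are again assumed to be timestamped triples `(u, v, t)`. Whenever `step` is smaller than `width`, consecutive windows overlap, so neighbouring snapshots share links.

```python
def sliding_window_snapshots(events, t_start, t_end, width, step):
    """Build snapshots over half-open windows [left, left + width)."""
    snapshots = []
    k = 0
    # Window positions are computed from k to avoid accumulating rounding error.
    while t_start + k * step + width <= t_end:
        left = t_start + k * step
        right = left + width
        snapshots.append(
            {frozenset((u, v)) for u, v, t in events if left <= t < right}
        )
        k += 1
    return snapshots


events = [("a", "b", 0), ("b", "c", 1), ("c", "d", 2), ("a", "d", 3)]
windows = sliding_window_snapshots(events, t_start=0, t_end=4, width=2, step=1)
```

With `width=2` and `step=1`, every link that is not at the boundary of the observation period appears in two consecutive snapshots, which is what smooths the change between them.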

2) Continuous Representation
Continuous network representations are the only representations that have exact temporal information. This makes them the most complex but also the representations with the most potential. We cover three continuous representations: (i) the event-based representation; (ii) the contact sequence; and (iii) the graph stream. The first two representations are taken from the temporal network literature and are suitable for networks where links do not persist for long [2], [6], [9]. The third representation, the graph stream, is used in dynamic networks where edges persist for longer [3]. The focus in these representations is on when edges are active, with no mention of change on nodes. All three representations are described in more detail below:
1) The event-based representation includes the time interval during which an edge of the graph is active [9]. An event is synonymous with a link in this case. It is a representation for dynamic networks focusing on link duration. The network is given by a time-ordered list of events, each of which includes the time at which the event appeared and the duration of the event.
$EB = \{(u_i, v_i, t_i, \Delta_i);\ i = 1, 2, \ldots\}$,
where $u_i$ and $v_i$ are the node pair on which the $i$-th event occurs, $t_i$ is the timestamp at which the event starts and $\Delta_i$ is the duration of the event. This is very similar to, and serves the same purpose as, the interval graph [2]. The difference is that the interval graph stores the time at which the event ends, while the event-based representation stores the duration of the event.
2) The contact sequence representation is a simplification of the event-based representation. In a contact sequence, the link is instantaneous and thus no link duration is provided:
$CS = \{(u_i, v_i, t_i);\ i = 1, 2, \ldots\}$.
It is common to consider event times in real systems instantaneous if the duration of the event is short or not important [2], [9]. Examples of systems where this representation is suitable include message networks such as text message and email networks.
3) The graph stream representation is used to represent static graphs that are too large to fit in memory, but it can also be used as a representation of a dynamic network [32]. It is similar to the event-based representation; however, it treats link appearance and link disappearance as separate events:
$GS = \{e_1, e_2, \ldots\}$,
where $e_i = (u_i, v_i, t_i, \delta_i)$, $u_i$ and $v_i$ are the node pair on which the $i$-th event occurs, $t_i$ is the time at which the event occurs, and $\delta_i \in \{-1, 1\}$, where $-1$ represents an edge removal and $1$ represents an edge addition. The original representation (used for large graphs) does not include timestamps for when an edge is added or removed [32]. Timestamps have to be added in order to retrieve temporal information. Since graph streams are mostly used to circumvent hardware limitations rather than a limitation of network representations, we will not survey them in detail here. For a more in-depth discussion of graph streams, we refer the interested reader to [3], [32], [34].
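The relationship between the three continuous representations can be sketched in a few lines. The function names are hypothetical, and two conventions are assumptions made for this illustration: an event of duration $\Delta_i$ is active on the half-open interval $[t_i, t_i + \Delta_i)$, and overlapping events on the same node pair are handled by counting additions and removals rather than by toggling a flag.

```python
from collections import Counter


def to_contact_sequence(events):
    """Event-based (u, v, t, duration) -> contact sequence (u, v, t)."""
    return [(u, v, t) for u, v, t, _duration in events]


def to_graph_stream(events):
    """Event-based -> timestamped graph stream (u, v, t, delta)."""
    stream = []
    for u, v, t, duration in events:
        stream.append((u, v, t, 1))              # link appearance
        stream.append((u, v, t + duration, -1))  # link disappearance
    return sorted(stream, key=lambda e: e[2])    # order by timestamp


def edges_at(stream, t):
    """Replay a time-ordered stream up to and including t; return active edges."""
    active = Counter()
    for u, v, t_i, delta in stream:
        if t_i > t:
            break
        active[frozenset((u, v))] += delta
    return {edge for edge, count in active.items() if count > 0}


events = [("a", "b", 0, 5), ("b", "c", 2, 1), ("a", "c", 4, 10)]
contacts = to_contact_sequence(events)
stream = to_graph_stream(events)
```

Replaying the stream with `edges_at(stream, t)` recovers the network state at any time `t`, whereas the contact sequence has lost the durations and can no longer answer that question.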
Which of the above representations is suitable for a given network depends on the link duration. The intricacies of link duration are covered in the next section.

B. LINK DURATION SPECTRUM
Dynamic networks go by many names, and sometimes these names indicate specific types of dynamic networks. There is substantial literature on 'temporal networks' [2], [6], [9] which focuses on highly dynamic networks where links may represent events such as human interactions or a single email. On the other hand, there is also literature that refers to slowly evolving networks, where links represent persistent relations [3]. To the best of our knowledge, there are only two works that take note of this distinction: Rossetti and Cazabet [30], and Holme [6].
Rossetti and Cazabet [30] refer to temporal interaction and relational networks (our temporal and evolving networks respectively), but they do not categorize or make a formal distinction between the different networks.
Holme [6] suggests that temporal networks can be distinguished by two requirements: (i) the dynamics on the network being at the same or at a similar time scale as the dynamics of the network; and (ii) the dynamic network is non-trivial at any given time (an instantaneous snapshot yields little to no network structure).
The distinction manifests itself in networks even when not considering dynamics on the networks, and this work is limited to the dynamics of the network. Therefore we distinguish temporal networks purely based on network topology. We use the second requirement noted by Holme [6].
This work not only provides a way to distinguish between temporal networks and dynamic networks, but it also proposes a framework in which all networks of dynamic topology fit. We do this by introducing the link duration spectrum.
FIGURE 2: Temporal and evolving networks on the link duration spectrum. The spectrum goes from 0 (links have no duration) to infinity (links last forever).
Fig. 2 shows different types of networks on the link duration spectrum. The scale goes from interactions with no link duration to links that have infinite link duration. No link ever disappears in a network with infinite link duration. Temporal networks reside on the lower end of the link duration spectrum, whereas evolving networks reside on the higher end. The distinction is as follows:
• Temporal networks. Highly dynamic networks which are too dynamic to be represented statically. The network is at any given time non-trivial. These networks are studied in the temporal network literature [2], [9]. Network properties such as degree distribution and clustering coefficient cannot be adopted directly from static networks and are non-trivial to define. It is more natural to think of a link as an event with a duration.
• Evolving networks. Dynamic networks where events persist for long enough to establish a network structure. An instantaneous snapshot yields a well-defined network. Network properties such as degree distribution and clustering coefficient can be adopted from static networks and gradually updated. These are the networks most often referred to when the term dynamic network is used. Links persist for so long that it is more natural to think of link appearance as an event and link disappearance as another event.
Furthermore, there is one notable special case for each of the dynamic network types. These are types of networks that reside on the extreme ends of the link duration spectrum:
• Interaction networks. A type of temporal network where links are instantaneous events. These networks are studied in the temporal network literature and often represented as contact sequences [2], [9].
• Strictly evolving networks. A type of evolving network where events have infinite duration. This implies that the links never disappear.
Fig. 3 shows examples of networks on the link duration spectrum.
• An email is a nearly instantaneous event; an email network can therefore be considered an interaction network.
• Proximity networks are used as an example of a temporal network in [2]. The link is defined by who is close to whom at what time. Links require maintenance and do not typically last very long.
• Employment networks are social networks where links are formed between employees and employers. The link requires an action after it has been established (termination of contract) to change its state, but also maintenance (continued work from the employee). This network resides in the fuzzy area between temporal and evolving networks and can be treated as either.
• The Internet is an example of a network in which we consider nodes linked if data packets can flow between them. A link tends to persist for a long time once established, and thus the Internet can be thought of as an evolving network.
• Citation networks, where links are defined as one paper citing another, have the most persistent links. Once a paper cites another paper, the link lasts forever. This leads to a strictly growing network where no edges disappear. These networks have the additional special characteristic that edges only appear when new nodes appear.
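Only the two extreme ends of the link duration spectrum have a sharp criterion, which the following sketch makes explicit. The function name is hypothetical. It deliberately returns `None` for everything between the extremes, because the boundary between temporal and evolving networks is fuzzy (as the employment network example shows) and cannot be decided from durations alone.

```python
import math


def link_duration_special_case(durations):
    """Detect the special cases at the ends of the link duration spectrum."""
    if not durations:
        raise ValueError("at least one link duration is required")
    if all(d == 0 for d in durations):
        return "interaction network"        # every link is instantaneous
    if all(math.isinf(d) for d in durations):
        return "strictly evolving network"  # no link ever disappears
    return None  # somewhere in between: temporal or evolving, not decidable here


email_like = link_duration_special_case([0, 0, 0])
citation_like = link_duration_special_case([math.inf, math.inf])
in_between = link_duration_special_case([0, 3.5, math.inf])
```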
Link definitions influence link duration, which in turn influences the network type. Links can be modified in ways that alter their link duration (also known as time to live, TTL [30]). An email network could define a link as: actors have once sent an email between each other. This would modify the email link, which is usually nearly instant in duration, into a link that will never disappear. This modification moves the network all the way to the right on the spectrum shown in Fig. 2. It transforms an interaction network into a strictly evolving one. Another example of a modification is to use a time-window to force forgetting. A time-window can be applied to a citation network such that only citations which occurred during the time-window appear as links. This will move the network to the left on the link duration spectrum. Depending on the size of the time-window, the modified network may be either an evolving or a temporal network.
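Both modifications can be sketched on a contact sequence of triples `(u, v, t)`. The function names and the half-open window convention are assumptions made for illustration only.

```python
import math


def persist_forever(contacts):
    """Redefine a link as 'have once been in contact'.

    Each node pair gets the interval (first contact time, infinity), which
    turns an interaction network into a strictly evolving one.
    """
    first_contact = {}
    for u, v, t in contacts:
        pair = frozenset((u, v))
        first_contact[pair] = min(t, first_contact.get(pair, math.inf))
    return {pair: (t, math.inf) for pair, t in first_contact.items()}


def active_with_ttl(contacts, now, ttl):
    """Force forgetting: a link is active if a contact fell in (now - ttl, now]."""
    return {frozenset((u, v)) for u, v, t in contacts if now - ttl < t <= now}


contacts = [("ann", "bob", 1), ("bob", "ann", 4), ("bob", "cat", 6)]
everlasting = persist_forever(contacts)
recent = active_with_ttl(contacts, now=7, ttl=2)
remembered = active_with_ttl(contacts, now=7, ttl=10)
```

A small `ttl` keeps the network close to the temporal end of the spectrum, while a large one lets links accumulate and moves it towards the evolving end.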
An additional theoretical special case that is not covered by this concept is a network where links may only disappear. This special case may justify another dimension along which dynamic networks should be distinguished.

C. NODE DYNAMICS
Another distinguishing factor among dynamic networks is whether nodes may appear or disappear. When modelling networks, it is sometimes simpler to assume that the number of nodes may not change, so that the only possible new links are links between already existing nodes.
Many evolving network models assume that edges appear as a new node appears. These models include pseudo-dynamic models such as preferential attachment [41], forest fire [42] and GraphRNN [43]. This is fitting for a citation network, where every node is a paper and the edges are citations. In many real-world networks, however, edges can appear and disappear regardless of whether nodes appear.
With respect to node change, we can distinguish between two kinds of networks:
• Static, where the set of nodes does not change over time.
• Dynamic, where the nodes may appear and disappear.
A notable special case of node-dynamic networks is the network where nodes may only appear:
• Growing networks are those where nodes may only appear. We consider this a special case of node-dynamic networks.
We are unaware of any real-world networks where nodes may only disappear, but this should be noted as at least a theoretical special case. Node-growing networks, on the other hand, are rather common.
Any kind of node dynamics can be combined with any kind of link duration network. We can thus have a growing evolving network or a node-static temporal network. Similarly to the link duration spectrum, a node duration spectrum could theoretically be established, but it has no direct impact on dynamic network structure and we therefore chose to keep node dynamics a discrete distinction.
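The node dynamics categories can be read off the node existence intervals of Definition 1. The sketch below assumes a fixed observation period `[t_begin, t_end]` and uses a hypothetical function name. Networks in which nodes only disappear are reported as plain node-dynamic, since the text treats them as a theoretical case only.

```python
def node_dynamics(node_intervals, t_begin, t_end):
    """Classify node dynamics from (v, t_s, t_e) triples over [t_begin, t_end]."""
    appears = any(t_s > t_begin for _v, t_s, _t_e in node_intervals)
    disappears = any(t_e < t_end for _v, _t_s, t_e in node_intervals)
    if not appears and not disappears:
        return "node-static"
    if appears and not disappears:
        return "growing"  # special case of node-dynamic: nodes only appear
    return "node-dynamic"


static_case = node_dynamics([("a", 0, 10), ("b", 0, 10)], 0, 10)
growing_case = node_dynamics([("a", 0, 10), ("b", 3, 10)], 0, 10)
dynamic_case = node_dynamics([("a", 0, 4), ("b", 3, 10)], 0, 10)
```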
Node dynamics are an important consideration when modelling the network. Some models support node dynamics whereas others do not.

D. THE DYNAMIC NETWORK CUBE
Many models assume that nodes disappear when there are no longer any links connected to such nodes. This scheme can work for evolving networks, but in temporal networks it is common that nodes have no links for the majority of the time. Thus for a temporal network, it makes sense to model node dynamics separately from link dynamics.
Different aspects of dynamic network representation have been covered in the previous sections. Section II-A defined different dynamic representations ordered by temporal granularity, section II-B defined network types by link duration and section II-C defined network types by node dynamics. This section will consider these previous sections jointly and discuss how the different network types fit together.
Table 3 includes a comprehensive list of the different dynamic network types. The types are grouped by node dynamics, temporal granularity and link duration type. Types of networks in each group can generally be combined; thus we can have a continuous node-static temporal network. The three groups can be thought of as dimensions of a space where different points in the space represent different types of dynamic networks.
The 3D network type space resulting from excluding special cases is visualised in Fig. 4. When excluding special cases there are two types of networks along each dimension. The nodes are organised along three dimensions: temporal granularity (discrete and continuous) from Section II-A, the link duration spectrum (temporal and evolving) from Section II-B and node dynamics (node-dynamic and node-static) from Section II-C.
Additionally, Table 2 presents the suggested terminology for each of the dynamic network types. The precise dynamic network term column shows the suggested terms for the different network types. These eight types represent domain-independent types of dynamic networks.
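The eight types that remain when special cases are excluded are simply the corners of the cube, so they can be enumerated mechanically. The naming pattern below (granularity, then node dynamics, then link duration) is an assumption based on the example "continuous node-static temporal network" given above. The exact terms listed in Table 2 are not reproduced here and may be worded differently.

```python
from itertools import product

GRANULARITY = ("discrete", "continuous")          # Section II-A
NODE_DYNAMICS = ("node-static", "node-dynamic")   # Section II-C
LINK_DURATION = ("temporal", "evolving")          # Section II-B

# One term per corner of the dynamic network cube.
network_types = [
    f"{granularity} {nodes} {links} network"
    for granularity, nodes, links in product(
        GRANULARITY, NODE_DYNAMICS, LINK_DURATION
    )
]
```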

E. DYNAMIC NETWORK MODELS
This brief discussion on dynamic network models is intended to give a high-level overview of the dynamic model landscape without discussing different kinds of models in detail. For a detailed discussion, we refer to dedicated works. The aim of this section is to give the reader the background and context needed to navigate the field of dynamic network models.
A network model may model a variety of different network characteristics or dynamics. In this work, we focus on models of dynamic network structure. Many models define rules for how links are established [41], [42]. The rules are defined such that a network evolved with those rules expresses some desired features. These features are often observed in real-world networks and then included in models as a rule. The search for a good dynamic network model is thus also a search for accurate rules on link formation.
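As an illustration of a link formation rule, the sketch below grows a network in which each new node attaches to one existing node chosen with probability proportional to its degree. This is a deliberately simplified variant of preferential attachment written for this example (one edge per new node, a single seed edge, a hypothetical function name) and should not be read as the exact model of [41].

```python
import random


def preferential_attachment(num_nodes, seed=0):
    """Grow a network where new nodes prefer high-degree targets."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes for the seed edge")
    rng = random.Random(seed)
    edges = [(0, 1)]      # seed network: a single edge
    endpoints = [0, 1]    # every node appears once per incident edge
    for new_node in range(2, num_nodes):
        # Sampling uniformly from the endpoint list picks a node with
        # probability proportional to its current degree.
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


grown = preferential_attachment(50, seed=42)
```

Because edges only appear together with new nodes and never disappear, the result is a growing, strictly evolving network in the terminology of Sections II-B and II-C.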
Network models might aim to replicate characteristics like node degree distribution or average shortest path between nodes [44]. The models define probabilistic rules for how links form such that the emerging network has certain distributions of given characteristics observed in real-world networks [44]. Some dynamic network models, particularly temporal network models, focus on temporal aspects. An example of a temporal characteristic is the distribution of inter-event times [9].
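The inter-event time distribution can be computed directly from a contact sequence. The sketch below assumes that inter-event times are measured per node pair; measuring them per node is an equally common choice. The function name is hypothetical.

```python
from collections import defaultdict


def inter_event_times(contacts):
    """Return the sorted gaps between consecutive contacts of each node pair."""
    times = defaultdict(list)
    for u, v, t in contacts:
        times[frozenset((u, v))].append(t)
    gaps = []
    for timestamps in times.values():
        timestamps.sort()
        gaps.extend(b - a for a, b in zip(timestamps, timestamps[1:]))
    return sorted(gaps)


contacts = [
    ("a", "b", 1), ("a", "b", 4), ("b", "a", 9),
    ("b", "c", 2), ("b", "c", 3),
]
gaps = inter_event_times(contacts)
```

A temporal network model that focuses on this characteristic would be judged by how closely the gaps it generates follow the empirical distribution.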
There are several use cases for network models. They may be used as reference models [2], [6] or as realistic models [45]-[47], and depending on their purpose there are several tasks the models can be used for:
• Reference models are used in the analysis of static networks to study the importance and role of structural features of static networks. Reference models aim to preserve some characteristic such as node degree distribution and otherwise create maximally random networks. The goal is to determine how the observed network is different from a completely random network with the same characteristics. This approach has been adapted to temporal networks [2].
• Realistic models aim to replicate the change in the network as closely as possible. They can be used for several tasks such as network prediction [11], [47], [48] and community detection [30]. Examples include probabilistic models such as the dynamic stochastic block model [49] and representation learning based models such as E-LSTM-D [47]. Some realistic models aim to generate (simulate) realistic networks [43], [50].
We establish a typology of models for dynamic network topology. The typology is based on the type of method used to model the network (see Fig. 5).
We group models intended for inference or identifying statistical regularities under statistical models. These include dynamic random graph models, probabilistic models, activity-driven models and relational event models. Random graph models (RGM) and exponential random graph models (ERGM) are random graph models which produce randomly connected graphs while following known common network topology [44]. Activity-driven models are fit to interaction networks by modelling the activity of each node [51]. Relational event models are continuous-time models for interaction networks; they define the propensity for a future event to happen between node pairs. Latent space models and stochastic block models are generative probabilistic models. Latent space models require the fitting of parameters with Markov chain Monte Carlo (MCMC) methods and are very flexible, but scale to only a few hundred nodes [52]. Stochastic block models, on the other hand, scale to networks an order of magnitude larger, at a few thousand nodes [52].
Stochastic actor oriented models (SAOM) are continuous-time models which consider each node an actor and model actor behaviour. SAOMs learn to represent the dependencies between a network structure, the position of the actor and the actor behaviour [53].
Dynamic network representation learning includes a diverse set of methods that can be used to embed the dynamic graph in a latent space. Representation learning on dynamic networks includes models based on tensor decomposition, random walks and deep learning. Since latent space models and stochastic block models also generate variables in a latent space, they are closely related to dynamic network representation learning. Tensor decomposition is analogous to matrix factorization where the extra dimension is time [11]. Random walk approaches for dynamic graphs are generally extensions of random walk based embedding methods for static graphs, or they apply temporal random walks [9]. Deep learning models use deep learning techniques to generate embeddings of the dynamic network. Deep models can be contrasted with the other network representation learning models, which are shallow models. We distinguish between two types of deep learning models: (i) temporal restricted Boltzmann machines and (ii) dynamic graph neural networks. Temporal restricted Boltzmann machines are probabilistic generative models which have been applied to the dynamic link prediction problem [4], [54]-[56]. Dynamic graph neural networks combine deep time series encoding with the aggregation of neighbouring nodes. Discrete versions of these models often take the form of a combination of a GNN and an RNN. Continuous versions of dynamic graph neural networks cannot make direct use of a GNN, since a GNN requires a static graph. Continuous DGNNs must therefore modify how node aggregation is done.
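The idea of combining neighbourhood aggregation with a recurrent update over snapshots can be illustrated with a toy sketch. It is not any published DGNN: features are scalars, the weights `w_in` and `w_rec` are fixed hypothetical constants rather than learned parameters, the node set is assumed static, and the function names are invented for this example.

```python
import math


def aggregate(features, edges):
    """Structural step: mean over a node's own feature and its neighbours'."""
    neighbourhood = {v: [v] for v in features}
    for u, v in edges:
        neighbourhood[u].append(v)
        neighbourhood[v].append(u)
    return {
        v: sum(features[n] for n in members) / len(members)
        for v, members in neighbourhood.items()
    }


def encode(snapshots, features, w_in=1.0, w_rec=0.5):
    """Temporal step: carry a hidden state per node through the snapshots."""
    hidden = {v: 0.0 for v in features}
    for edges in snapshots:
        structural = aggregate(features, edges)
        hidden = {
            v: math.tanh(w_in * structural[v] + w_rec * hidden[v])
            for v in features
        }
    return hidden


features = {"a": 1.0, "b": 0.0, "c": -1.0}
snapshots = [[("a", "b")], [("b", "c")]]
embedding = encode(snapshots, features)
reversed_embedding = encode(snapshots[::-1], features)
```

Reversing the snapshot order changes the resulting embeddings, which shows that the encoding depends on the temporal order of the structure and not only on which links occurred.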
A detailed survey of all kinds of dynamic network models is too broad a topic to cover in one survey. Deep learning based models for dynamic networks are a rapidly growing and exciting field; however, no existing survey focuses exclusively on dynamic graph neural networks (Kazemi et al. [11], Xie et al. [12] and Barros et al. [13] being the closest).
For the models not discussed in section III there are several works describing and discussing them in detail. Random reference models for temporal networks are surveyed in [2] and [6]. For activity-driven models see Perra et al. [51] and for an introduction to the Relational Event Model (REM) see Butts [57]. See Hanneke et al. [58] for Temporal ERGMs (TERGM) on discrete dynamic networks. Block et al. [59] provide a comparison of TERGM and SAOM. Fritz et al. [33] provide a comparison of a discrete-time model, based on the TERGM, and the REM, a continuous-time model. Goldenberg et al. [60] survey dynamic network models, and their survey includes dynamic random graph models and probabilistic models. Kim et al. [31] survey latent space models and stochastic block models for dynamic networks. For an introduction to SAOM see Snijders et al. [53]. For surveys of representation learning on dynamic networks see Kazemi et al. [11], Xie et al. [12] and Barros et al. [13], and for a survey of dynamic link prediction, including Temporal restricted Boltzmann machines, see Divakaran et al. [54].

F. DISCUSSION AND SUMMARY
We have given a comprehensive overview of dynamic networks. This establishes a foundation on which dynamic network models can be defined and thus sets the stage for the survey on dynamic graph neural networks. Establishing this foundation included the introduction of a new taxonomy for dynamic networks and an overview of dynamic network models.
Section II-A presents representations of dynamic networks and distinguishes between discrete and continuous dynamic networks. In section II-B we introduce the link duration spectrum and distinguish between temporal and evolving networks. In section II-C node dynamics is discussed, and we distinguish between node-static and node-dynamic networks. Section II-D brings together the previous sections to arrive at a comprehensive dynamic network taxonomy.
Discrete representations have seen great success in use on evolving networks with slow dynamics. Graph streams are used on evolving networks that update too frequently to be represented well by snapshots [3]. Both discrete and continuous representations are used to represent temporal networks [2], [9]. Table 4 combines information from section II-A and section II-B and summarizes the existing representations in terms of temporal granularity and link duration. Discrete representations have several advantages. A model which works on the static network case can be extended to dynamic networks by applying it on each snapshot and then aggregating the results of the model [11], [31]. This makes it relatively easy, compared to the continuous representation, to design dynamic network models. Furthermore, the distinction between an evolving and a temporal network is less important. If modelling a temporal network, one only needs to make sure that the time-window size is large enough that the network structure emerges in each snapshot. However, the discrete representations have their disadvantages too. Chief among them is coarse-grained temporal granularity. When modelling a temporal network the use of a time-window is a must. By using a time-window, the appearance order of the links and temporal clustering (links appearing frequently together) are lost.
Reducing the size of the time-window or the interval between snapshots is a way to increase temporal granularity. There are, however, some fundamental problems with this. In the case of a temporal network, a small time-window will eventually yield a snapshot with no network structure. In the case of an evolving network, we will have a sensible network no matter how small the time-window; however, there is a trade-off with run-time complexity. Discrete models tend to process the entire graph in each snapshot, in which case the run-time will increase linearly with the number of snapshots. The run-time problem is exacerbated by the fact that a lot of real-world graphs are huge, which makes the run-time on each snapshot significant.
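The time-window trade-off can be illustrated with a small sketch. The function below is a hypothetical helper, not taken from any surveyed work: it bins timestamped interaction events into consecutive windows and returns one edge set per window. The event list and window sizes are made up for illustration.

```python
def events_to_snapshots(events, window):
    """Bin (u, v, t) interaction events into consecutive time-windows.

    Returns one edge set per window. The order of links inside a window
    is discarded, which is the loss of granularity discussed above.
    """
    if not events:
        return []
    t_min = min(t for _, _, t in events)
    t_max = max(t for _, _, t in events)
    n_windows = int((t_max - t_min) // window) + 1
    snapshots = [set() for _ in range(n_windows)]
    for u, v, t in events:
        snapshots[int((t - t_min) // window)].add((u, v))
    return snapshots


# Toy interaction network (hypothetical data).
events = [("a", "b", 0.0), ("b", "c", 0.4), ("a", "b", 1.2), ("c", "d", 3.1)]
coarse = events_to_snapshots(events, window=2.0)  # few, well-populated snapshots
fine = events_to_snapshots(events, window=0.5)    # many snapshots, several empty
empty_fine = sum(1 for s in fine if not s)
```

With the larger window every snapshot contains some structure, while the smaller window produces more snapshots to process and leaves several of them without any links.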
Continuous representations offer superior temporal granularity and thus theoretically a higher potential to model dynamic networks. However, continuous-time models tend to be more complex and require either completely new models or significant changes to existing ones to work on the continuous representation. Continuous models are less common than discrete-time models [3], [11], [30]. This is likely due to continuous methods being significantly more difficult to develop than discrete methods [3].
When modelling dynamic networks in continuous time it is essential to specify which kind of network is being modelled. This matters because many models work only on specific types of networks, even though models for temporal and evolving networks need not be mutually exclusive. In these cases, it might be possible to modify the link duration of a network to run a model on the network. This modification may come at the loss of information; for example, when modifying an interaction network to a strictly evolving network, any reappearing link will be removed.
This entire background section establishes a foundation and a conceptual framework in which dynamic networks can be understood. By providing an overview of dynamic network models, it maps out the landscape around deep learning on dynamic graphs, thus providing the necessary context. The following sections will explore dynamic graph neural networks in detail.

III. DYNAMIC GRAPH NEURAL NETWORKS
Network representation learning and Graph Neural Networks (GNN) have seen rapid progress recently and they are becoming increasingly important in complex network analysis. Most of the progress has been made in the context of static networks, with some advances being extended to dynamic networks. GNNs in particular have been used in a wide variety of disciplines such as chemistry [61], [62], recommender systems [63], [64] and social networks [65], [66].
GNNs are deep neural network architectures that encode graph structures. They do this by aggregating features of neighbouring nodes together. One might think of this node aggregation as similar to the convolution of pixels in convolutional neural networks (CNN). By aggregating features of neighbouring nodes together, GNNs can learn to encode both local and global structure.
Several surveys exist of works on static graph representation learning [29], [67] and static graph neural networks [8], [26], [27]. Time-series analysis is relevant for work on dynamic graphs, thus recent advances in this domain are of relevance. For an up-to-date survey of deep learning on time series we refer to Fawaz et al. [68]. If dealing with an evolving graph, a static graph algorithm can be used to maintain a model of the graph. Minor changes to the graph would most likely not change the predictions of a static model too much, and the model can then be updated at regular intervals to avoid getting too outdated. We suspect that a spatial GNN is likely to stay accurate for longer than a spectral GNN, since the spectral graph convolution is based on the graph Laplacian, which will go through more changes than the local changes in a spatial GNN.
It is important to define what we mean by a dynamic graph neural network (DGNN). Informally we can say that a DGNN is a neural network that encodes a dynamic graph. However, there are some representation learning models for dynamic graphs using deep methods which we do not consider dynamic graph neural networks. A key characteristic of a graph neural network is an aggregation of neighbouring node features (also known as message passing) [8]. Thus, if a deep representation learning model aggregates neighbouring nodes as part of its neural architecture, we call it a dynamic graph neural network. In the discrete case, a DGNN is a combination of a GNN and a time series model, whereas in the continuous case we have more variety, since the node aggregation can no longer be done using traditional GNNs. Given this definition, network representation learning models where RNNs are used but network structure is learned using methods other than node aggregation (temporal random walks, for example) are not considered DGNNs.
The previous section (Section II) introduced a framework for dynamic networks and an overview of dynamic network models. The overview presented in Fig. 5 shows dynamic graph neural networks to be a part of deep representation learning, which in turn is part of dynamic network representation learning. We further extend the overview in Fig. 5 to show a hierarchical overview of dynamic graph neural networks in Fig. 6.
An overview of the types of DGNN encoders is seen in Fig. 6. The encoders are grouped first by which type of network they encode, then by model type. The pseudo-dynamic approaches model a network with changing topology, but not time. Discrete DGNNs model discrete networks and continuous DGNNs model continuous networks. A discrete DGNN encodes the network snapshot by snapshot and encodes a snapshot all at once, similar to how a GNN encodes a static network. A continuous DGNN iterates over the network edge by edge and is thus completely independent of any snapshot size.
Common to all DGNNs is that the encoders aim to capture both structural and temporal patterns and store these patterns in embeddings. Stacked DGNNs separate the encoding of structural and temporal patterns into separate layers, having one layer for structural patterns (using a static GNN) and one layer for temporal patterns (often using some form of an RNN). These models often make use of existing layers and combine them in new ways to encode dynamic networks. Integrated DGNNs combine structural and temporal patterns in one layer. This means that integrated DGNNs require the design of new layers, not just a combination of existing layers. The continuous DGNNs consist of RNN, temporal point process (TPP) and time embedding based methods.
A timeline of dynamic network models with a focus on DGNNs is shown in Fig. 7. The timeline includes the first appearance of each of the models found in Fig. 5, significant network embedding models preceding DGNNs and DGNNs.
We consider the Barabási-Albert model [45] the first dynamic network model, although it is only a pseudo-dynamic model (see section III-A). The "Dynamic Social Network in Latent space" (DSNL) model [70] is the first dynamic latent space model [31]. The Temporal Exponential Random Graph Model (TERGM) [58], a type of dynamic random graph model, was introduced in 2009. Snijders et al. introduced Stochastic Actor Oriented Models (SAOM) [53] for dynamic networks in 2010. The first dynamic stochastic block model (DSBM) was introduced by Yang et al. [71]. The first restricted Boltzmann machine (RBM) for static social networks [56] in 2013 was shortly followed by the first RBM for dynamic networks, the Temporal Restricted Boltzmann Machine (TRBM), in 2014.
Prior to DGNNs there were several influential static embedding methods and graph neural networks. The first GNN [72] was introduced in 2008. DeepWalk [73], a highly influential node embedding method fueled by random walks, was introduced in 2014. Some Graph Convolutional Neural networks (GCN) [74], [75], which function as building blocks and inspiration for several DGNNs, were released in 2016.
The first DGNNs were discrete DGNNs. First, GCRN-M1 and GCRN-M2 were introduced by Seo et al. [69], followed by Manessi et al. [76] a few months later. Know-Evolve [21], a TPP based model, was the first continuous model, which in turn directly inspired DyREP [48] by the same author. JODIE [77] is notable as an RNN based DGNN, and it was quickly followed by Streaming GNN [78], which was the first DGNN for continuous strictly evolving networks. DySAT [17] introduced the first discrete DGNN which was based solely on attention, thus not using an RNN. EvolveGCN [16] introduced the first design that had an RNN feed into a GCN, rather than having a GCN feed into an RNN as the previous models did. The first pseudo-dynamic GNN, G-GCN, was introduced in early 2019. TGAT [18] is the first DGNN to encode inter-event time as a vector, while TGN [19] adds a memory module to TGAT. HDGNN showed how to use DGNNs for encoding discrete heterogeneous dynamic networks, and TDGNN, although simple, was the first GNN to explicitly weight the edges to enable interaction network encoding.
This section surveys DGNNs, identifies different types of DGNNs and covers how embeddings are encoded. The next section (Section IV) covers decoding of the embeddings.

A. PSEUDO-DYNAMIC MODELS
Goldenberg et al. [60] refer to network models as "pseudo-dynamic" when they contain dynamic processes, but the dynamic properties of the model are not fit to the dynamic data. A well-known example of a non-DGNN pseudo-dynamic model is the Barabási-Albert model [45]. G-GCN [79] can be seen as an extension of the Variational Graph Autoencoder (VGAE) [80] which is able to predict links for nodes with no prior connections, the so-called cold start problem. It uses the same encoder and decoder as VGAE, namely a GCN [75] for encoding and the inner product between node embeddings as a decoder. The resulting model learns to predict links of nodes that have only just appeared.
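The inner-product decoder mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the VGAE-style decoder that scores a node pair by the sigmoid of the dot product of the two embeddings; the function name and the toy embeddings are hypothetical.

```python
import math


def inner_product_decoder(z):
    """Score every node pair (i, j) as sigmoid(z_i . z_j)."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


# Toy embeddings (hypothetical): nodes 0 and 1 point the same way,
# node 2 is orthogonal to both.
embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
scores = inner_product_decoder(embeddings)
```

Aligned embeddings receive a score above 0.5 and orthogonal ones a score of exactly 0.5, so the decoder itself has no trainable parameters; all learning happens in the encoder.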

B. EDGE-WEIGHTED MODELS
As noted earlier in Section II-A, dynamic network representations can be simplified. One way to simplify the modelling is to convert the dynamic network to an edge-weighted network and then use a static GNN on the edge-weighted network. This is exactly what Temporal Dependent GNN (TDGNN) does [81]. The authors convert an interaction network to an edge-weighted network by using an exponential distribution. An edge which appeared more recently gets a high weight and one that appeared long ago gets a low weight. After the conversion a standard GCN [75] is applied to the edge-weighted network. While the conversion from interaction network (a continuous network) to edge-weighted network is done as part of the model in the original work, there appears to be no reason why it cannot be done as a pre-processing step, and thus we classify it as an edge-weighted model.
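The conversion to an edge-weighted network can be sketched as a pre-processing step. The snippet below is only an illustration of the idea: the exact weighting used by TDGNN is not reproduced here, and the exponential decay rate `decay`, the reference time `t_now` and the choice to sum repeated interactions are all assumptions.

```python
import math


def edge_weights(events, t_now, decay=1.0):
    """Collapse (u, v, t) events into weights exp(-decay * (t_now - t)).

    Assumption: repeated interactions on the same node pair are summed.
    """
    weights = {}
    for u, v, t in events:
        w = math.exp(-decay * (t_now - t))
        weights[(u, v)] = weights.get((u, v), 0.0) + w
    return weights


# Toy interaction network (hypothetical data).
events = [("a", "b", 1.0), ("b", "c", 9.0), ("a", "b", 10.0)]
w = edge_weights(events, t_now=10.0, decay=0.5)
```

The resulting dictionary can be turned into a weighted adjacency matrix and handed to any static GNN that accepts edge weights.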

C. DISCRETE DYNAMIC GRAPH NEURAL NETWORKS
Modelling using discrete graphs has the advantage that static graph models can be used on each snapshot of the graph. Discrete DGNNs use a GNN to encode each graph snapshot. We identify two kinds of discrete DGNNs: Stacked DGNNs and Integrated DGNNs.
Autoencoders use either static graph encoders or DGNN encoders, however since they are trained a little differently from DGNNs and generally make use of (and thus extend) a DGNN encoder they are here distinguished from other models.
A discrete DGNN combines some form of deep time-series modelling with a GNN. The time-series model often comes in the form of an RNN, but self-attention has also been used.
Given a discrete graph $DG = \{G^1, G^2, \ldots, G^T\}$, a discrete DGNN using a function $f$ for temporal modelling can be expressed as:
$$z_1^t, \ldots, z_n^t = \mathrm{GNN}(G^t)$$
$$h_j^t = f(h_j^{t-1}, z_j^t) \quad \text{for } j \in [1, n]$$
where $f$ is a neural architecture for temporal modelling (in the methods surveyed $f$ is almost always an RNN but can also be self-attention), $z_i^t \in \mathbb{R}^l$ is the vector representation of node $i$ at time $t$ produced by the GNN, and $l$ is the output dimension of the GNN. Similarly, $h_i^t \in \mathbb{R}^k$ is the vector representation produced by $f$, where $k$ is the output dimension of $f$. This can also be written as:
$$Z^t = \mathrm{GNN}(G^t)$$
$$H^t = f(H^{t-1}, Z^t)$$
Informally we can say that the GNN is used to encode each network snapshot and $f$ (the RNN or self-attention) encodes across the snapshots.
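The composition of a per-snapshot GNN with a temporal function can be sketched without any deep learning library. In the toy below, a parameter-free mean aggregator stands in for the GNN and an Elman-style update with fixed scalar weights stands in for f; both are simplifications chosen for illustration, and all names and values are hypothetical.

```python
import math


def gnn_layer(adj, x):
    """Mean-aggregate the features of each node and its neighbours."""
    n, dim = len(adj), len(x[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(x[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


def temporal_step(h_prev, z, w_h=0.5, w_z=0.5):
    """Elman-style stand-in for f: h^t = tanh(w_h * h^{t-1} + w_z * z^t)."""
    return [[math.tanh(w_h * hp + w_z * zz) for hp, zz in zip(h_row, z_row)]
            for h_row, z_row in zip(h_prev, z)]


def encode(snapshots, features):
    """Apply the GNN to each snapshot, then fold the outputs through f."""
    h = [[0.0] * len(features[0][0]) for _ in features[0]]
    for adj, x in zip(snapshots, features):
        h = temporal_step(h, gnn_layer(adj, x))
    return h


# Two snapshots of a 3-node graph; node 2 only connects in the second one.
adj_1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
adj_2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
h_final = encode([adj_1, adj_2], [x, x])
```

Swapping `gnn_layer` for a trained GCN and `temporal_step` for an LSTM or self-attention block yields the architectures discussed in this section; the loop in `encode` stays the same.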
Seo et al. [69] introduce two deep learning models which encode a static graph with dynamically changing attributes. Whereas the modelling of this kind of graph is outside the scope of the survey, the two models they introduced are, to the best of our knowledge, the first DGNNs. They introduce both a stacked DGNN and an integrated DGNN: (i) Graph Convolutional Recurrent Network Model 1 (GCRN-M1) and (ii) GCRN Model 2 (GCRN-M2) respectively. Very similar encoders have been used in later publications for dynamic graphs.

1) Stacked Dynamic Graph Neural Networks
The most straightforward way to model a discrete dynamic graph is to have a separate GNN handle each snapshot of the graph and feed the output of each GNN to a time series component, such as an RNN. We refer to a structure like this as a stacked DGNN.
There are several works using this architecture with different kinds of GNNs and different kinds of RNNs. We'll use GCRN-M1 [69] as an example of a stacked DGNN. This model stacks the spectral GCN from [74] and a standard peephole LSTM [82]. The gates which are normally vectors in the LSTM are now matrices. Also, $z^t \in \mathbb{R}^{nl \times 1}$ is a vector and not a matrix. Even though the GNN used by Seo et al. [69] can output features with the same structure as the input, they reshaped the matrix into a vector. This allows them to use a one-dimensional LSTM to encode the entire dynamic network.
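A sketch of the stacked update, assuming the standard peephole LSTM formulation applied to the reshaped GNN output $z^t$. The input symbol $X^t$, the weight names $W$, $U$, the peephole weights $w$ and the biases $b$ are notation introduced here for illustration:

```latex
\begin{aligned}
z^t &= \mathrm{GNN}(X^t) \\
i^t &= \sigma(W_i z^t + U_i h^{t-1} + w_i \odot c^{t-1} + b_i) \\
f^t &= \sigma(W_f z^t + U_f h^{t-1} + w_f \odot c^{t-1} + b_f) \\
c^t &= f^t \odot c^{t-1} + i^t \odot \tanh(W_c z^t + U_c h^{t-1} + b_c) \\
o^t &= \sigma(W_o z^t + U_o h^{t-1} + w_o \odot c^t + b_o) \\
h^t &= o^t \odot \tanh(c^t)
\end{aligned}
```

The graph structure enters only through the first line; the remaining lines are an ordinary LSTM over the sequence of GNN outputs.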
Whereas [69] use a spectral GCN and a peephole LSTM, this is not a limitation of the architecture, as any GNN and RNN can be used. Other examples of stacked DGNNs are RgCNN [83], which uses the spatial GCN PATCHY-SAN [84] stacked with a standard LSTM, and DyGGNN [85], which uses a gated graph neural network (GGNN) [86] combined with a standard LSTM.
Manessi et al. [76] present two stacked DGNN encoders: Waterfall Dynamic-GCN (WD-GCN) and Concatenated Dynamic-GCN (CD-GCN). These architectures are distinct in that they use a separate LSTM per node (although the weights across the LSTMs are shared). The GNN in this case is a GCN [75] stacked with an LSTM per node. The WD-GCN encoder with a vertex level decoder is shown in Fig. 8. WD-GCN and CD-GCN differ only in that CD-GCN adds skip-connections past the GCN. The equations below are for the WD-GCN encoder.
$$Z_1, \ldots, Z_t = \mathrm{GNN}(A_1, X_1), \ldots, \mathrm{GNN}(A_t, X_t)$$
Let $A \in \mathbb{R}^{n \times n}$ be the adjacency matrix, $n$ be the number of nodes, $d$ be the number of features per node and $X_t \in \mathbb{R}^{n \times d}$ be the matrix describing the features of each node at time $t$. $Z_t \in \mathbb{R}^{n \times l}$, where $l$ is the output size of the GNN, and $H \in \mathbb{R}^{k \times n \times t}$, where $k$ is the output size of the LSTMs.
$$H = \text{v-LSTM}_k(Z_1, \ldots, Z_t) = \begin{bmatrix} \mathrm{LSTM}_k(V_1^\top Z_1, \ldots, V_1^\top Z_t) \\ \vdots \\ \mathrm{LSTM}_k(V_n^\top Z_1, \ldots, V_n^\top Z_t) \end{bmatrix}$$
where LSTM is a normal LSTM [87] and $V_p \in \mathbb{R}^n$ is defined as $(V_p)_i = \delta_{pi}$, where $\delta$ is the Kronecker delta, so that $V_p^\top Z_t$ selects the row of $Z_t$ belonging to node $p$. Due to the v-LSTM layer the encoder can store a hidden representation per node.
Since a set of snapshots is a time-series, one is not restricted to the use of RNNs, and other works have stacked GNNs with other types of deep time-series models. Sankar et al. [17] present a stacked architecture that consists completely of self-attention blocks. They use attention along the spatial and temporal dimensions. For the spatial dimension, they use the Graph Attention Network (GAT) [88] and for the temporal dimension, they use a transformer [89]. Wang et al. [25], [90] stack a GNN with 1D temporal convolution (TNDCN) similar to the dilated convolution in WaveNet [91].
Stacked DGNN architectures also exist for specific types of dynamic networks. There is HDGNN [24] for heterogeneous dynamic networks and TeMP [22] for knowledge networks.
When encoding graphs, one option is to split the graph into sub-graphs and use a GNN to project each sub-graph, as done by Zhang et al. [92] for static GNNs. This approach has also been applied to DGNNs by Cai et al. [93], where they split each snapshot into sub-graphs and use a stacked DGNN for anomaly detection.

2) Integrated Dynamic Graph Neural Networks
Integrated DGNNs are encoders that combine GNNs and RNNs in one layer and thus combine modelling of the spatial and the temporal domain in that one layer.
Inspired by convLSTM [94], Seo et al. [69] introduced GCRN-M2. GCRN-M2 amounts to convLSTM where the convolutions are replaced by graph convolutions. ConvLSTM uses a 3D tensor as input whereas here we are using a two-dimensional signal since we have a feature vector for each node.
Here $x_t \in \mathbb{R}^{n \times d}$, $n$ is the number of nodes and $x_t^i$ is the signal for the $i$-th node at time $t$. $W \in \mathbb{R}^{K \times k \times l}$ and $U \in \mathbb{R}^{K \times k \times k}$, where $k$ is the size of the hidden layer and $K$ is the number of Chebyshev coefficients. $W_f *_{\mathcal{G}} x_t$ denotes the graph convolution on $x_t$.
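Assuming GCRN-M2 keeps the gate structure of a peephole convLSTM and only swaps the convolutions for graph convolutions, the layer can be sketched as follows. The peephole weights $w_i, w_f, w_o$ and the biases $b$ are notation introduced here for illustration:

```latex
\begin{aligned}
i &= \sigma(W_i *_{\mathcal{G}} x_t + U_i *_{\mathcal{G}} h_{t-1} + w_i \odot c_{t-1} + b_i) \\
f &= \sigma(W_f *_{\mathcal{G}} x_t + U_f *_{\mathcal{G}} h_{t-1} + w_f \odot c_{t-1} + b_f) \\
c_t &= f \odot c_{t-1} + i \odot \tanh(W_c *_{\mathcal{G}} x_t + U_c *_{\mathcal{G}} h_{t-1} + b_c) \\
o &= \sigma(W_o *_{\mathcal{G}} x_t + U_o *_{\mathcal{G}} h_{t-1} + w_o \odot c_t + b_o) \\
h_t &= o \odot \tanh(c_t)
\end{aligned}
```

Compared with a stacked model, the graph convolution here acts on both the input $x_t$ and the hidden state $h_{t-1}$ inside every gate, which is what makes the layer integrated.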
EvolveGCN [16] integrates an RNN into a GCN. The RNN is used to update the weights $W$ of the GCN. The authors of [16] name their layer the Evolving Graph Convolution Unit (EGCU) and present two versions of it: (i) EGCU-H, where the weights $W$ are treated as the hidden layer of the RNN, and (ii) EGCU-O, where the weights $W$ are treated as the input and output of the RNN. In both EGCU-H and EGCU-O, the RNN operates on matrices rather than vectors as in the standard LSTM. The EGCU-H layer is given by the following equations, where $(l)$ indicates the neural network layer:
$$W_t^{(l)} = \mathrm{RNN}(H_t^{(l)}, W_{t-1}^{(l)})$$
$$H_t^{(l+1)} = \mathrm{GCN}(A_t, H_t^{(l)}, W_t^{(l)})$$
And the EGCU-O layer is given by the equations:
$$W_t^{(l)} = \mathrm{RNN}(W_{t-1}^{(l)})$$
$$H_t^{(l+1)} = \mathrm{GCN}(A_t, H_t^{(l)}, W_t^{(l)})$$
The RNN in both layers can be replaced with any other RNN, and the GCN [75] can be replaced with any GNN given minor modifications.
Other integrated DGNN approaches are similar to GCRN-M2. They may differ in which GNN and/or which RNN they use, the target use case or even the kind of graph they are built for, but the structures of the neural architecture are similar. Examples of these include GC-LSTM [20], LRGCN [23], RE-Net [95] and TNA [96].
Chen et al. [20] present GC-LSTM, an encoder very similar to GCRN-M2. GC-LSTM takes the adjacency matrix $A_t$ at a given time as an input to the LSTM and performs a spectral graph convolution [74] on the hidden layer. In contrast, GCRN-M2 runs a convolution on both the input and the hidden layer.
LRGCN [23] integrates an R-GCN [97] into an LSTM as a step towards predicting path failure in dynamic graphs. RE-Net [95] encodes a dynamic knowledge graph by integrating an R-GCN [97] in several RNNs. Other modelling changes enable the authors to encode dynamic knowledge graphs, thus extending the use of discrete DGNNs to knowledge graphs.
A temporal neighbourhood aggregation (TNA) layer [96] stacks a GCN, a GRU and a linear layer. Bonner et al. design an encoder that stacks two TNA layers to achieve a 2-hop convolution and employ variational sampling for use on link prediction. This architecture is arguably a stacked DGNN, but since the authors define the TNA as one layer, we classify it as an integrated DGNN, despite the layer itself being stacked.

3) Dynamic graph autoencoders and generative models
The Dynamic Graph Embedding model (DynGEM) [98] uses a deep autoencoder to encode snapshots of discrete node-dynamic graphs. Inspired by an autoencoder for static graphs [99], DynGEM makes some modifications to improve computation on dynamic graphs. The main idea is to have the autoencoder initialized with the weights from the previous snapshot. This speeds up computation significantly and makes the embeddings stable (i.e. no major changes from snapshot to snapshot). To handle new nodes, the Net2WiderNet and Net2DeeperNet approaches from [100] are used to add width and depth to the encoder and decoder while the embedding layer stays fixed in size. This allows the autoencoder to expand while approximately preserving the function the neural network is computing.
Dyngraph2vec [101] is a continuation of the work done on DynGEM. Dyngraph2vec considers the last $l$ snapshots in the encoding and can thus be thought of as a sliding time-window. The adjacency matrices $A_t, \ldots, A_{t+l}$ are used to predict $A_{t+l+1}$; it is assumed that no new nodes are added. The architecture comes in three variations: (1) dyngraph2vecAE, an autoencoder similar to DynGEM except that it leverages information from the past to make the future prediction; (2) dyngraph2vecRNN, where the encoder and decoder consist of stacked LSTMs; (3) dyngraph2vecAERNN, where the encoder has first a few dense feed-forward layers followed by LSTM layers and the decoder is similar to dyngraph2vecAE, namely a deep feed-forward network.
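The sliding time-window used to form training examples can be sketched as below. The helper is hypothetical and only shows how histories $A_t, \ldots, A_{t+l}$ are paired with the target $A_{t+l+1}$; letters stand in for adjacency matrices to keep the example short.

```python
def sliding_windows(snapshots, lookback):
    """Yield (history, target) pairs over a list of snapshots.

    Each history holds lookback + 1 consecutive snapshots A_t..A_{t+l};
    the target is the snapshot that immediately follows, A_{t+l+1}.
    """
    for start in range(len(snapshots) - lookback - 1):
        history = snapshots[start:start + lookback + 1]
        target = snapshots[start + lookback + 1]
        yield history, target


# Placeholder snapshots (hypothetical); real inputs would be adjacency matrices.
pairs = list(sliding_windows(["A", "B", "C", "D", "E"], lookback=2))
```

Five snapshots with `lookback=2` give two training pairs; a sequence no longer than `lookback + 1` gives none, since there is no following snapshot to predict.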
E-LSTM-D [47], like DynGEM, encodes and decodes with dense layers; however, it runs an LSTM on the encoded hidden vector to predict the new embeddings. Although trained like an autoencoder, the model aims to perform dynamic link prediction. Hajiramezanali et al. [102] introduce two variational autoencoder versions for dynamic graphs: the Variational Graph Recurrent Neural Network (VGRNN) and Semi-implicit VGRNN (SI-VGRNN). They can operate on node-dynamic graphs. Both models use a GCN integrated into an RNN as an encoder (similar to GCRN-M2 [69]) to keep track of the temporal evolution of the graph. VGRNN uses a VGAE [80] on each snapshot that is fed the hidden state $h_{t-1}$ of the RNN. This is to help the VGAE take into account how the dynamic graph changed in the past. Each node is represented in the latent space and the decoding is done by taking the inner product of the embeddings [80]. By integrating semi-implicit variational inference [103] with VGRNN, the authors create SI-VGRNN. Both models aim to improve dynamic link prediction.
Generative adversarial networks (GAN) [104] have proven to be very successful in the computer vision field [105]. They have subsequently been adapted for dynamic network generation as well. GCN-GAN [106] and DynGraphGAN [107] are two such models. Both models are aimed towards the dynamic link prediction task. The generator is used to generate an adjacency matrix and the discriminator tries to distinguish between the generated and the real adjacency matrix. The aim is to have the generator produce realistic adjacency matrices which can be used as a prediction for the next time step.
GCN-GAN uses a stacked DGNN as a generator and a dense feed-forward network as a discriminator [106], and DynGraphGAN uses a shallow generator and a GCN [75] stacked with a CNN as a discriminator [107].

D. CONTINUOUS DYNAMIC GRAPH NEURAL NETWORKS
Currently, there are three DGNN approaches to continuous modelling: RNN based approaches, where node embeddings are maintained by an RNN based architecture; temporal point process (TPP) based approaches, where temporal point processes are parameterized by a neural network; and time embedding approaches, where positional embedding of the time is used to represent time as a vector.

1) RNN based models
These models use RNNs to maintain node embeddings in a continuous fashion. A common characteristic for these models is that as soon as an event occurs or there is a change to the network, the embeddings of the interacting nodes are updated. This enables the embeddings to stay up to date continuously. There are two models in this category: Streaming graph neural networks (SGNN) [78], which encode directed strictly evolving networks, and JODIE [77], which encodes interaction networks.
The Streaming graph neural network [78] maintains a hidden representation in each node. The architecture consists of two components: (i) an update component; and (ii) a propagation component. The update component is responsible for updating the state of the nodes involved in an interaction and the propagation component propagates the update to the involved node's neighbours.
The update and propagation components consist of 3 units each: (i) the interact unit; (ii) the update / propagate unit; and (iii) the merge unit. The difference between the update component and the propagation component is thus the second unit, where the update component makes use of the update unit and the propagation component makes use of the propagate unit.
The model maintains several vectors for each node. Among them are: (i) a hidden state for the source role of the node; and (ii) a hidden state for the target role of the node. This is required to treat source and target nodes differently. The model also contains a hidden state which is based on both the source and target state of the node. The interact and merge units can be thought of as wrappers that handle many node states. The interact unit generates an encoding based on the interacting nodes and this can be thought of as an encoding of the interaction. The merge unit updates the combined hidden state of the nodes based on the change done to the source and target hidden states by the middle unit.
The middle units and core of the update and propagation components are the update and the propagate units. The update unit generates a new hidden state for the interacting nodes. It is based on a Time-aware LSTM [108], which is a modified LSTM that works on time-series with irregular time intervals. The propagate unit updates the hidden states of the neighbouring nodes. It consists of an attention function $f$, a time decay function $g$ and a time based filter $h$. $f$ estimates the importance between nodes, $g$ gauges the magnitude of the update based on how long ago it was and $h$ is a binary function which filters out updates when the receiving node has too old information. $h$ has the effect of removing noise as well as making the computation more efficient.
By first running the update component and afterwards propagating, information of the edge update is added to the hidden states of the local neighbourhood.
The second method is JODIE [77]. JODIE embeds nodes in an interaction network. It is, however, targeted towards recommender systems and built for user-item interaction networks. The intuition is that with minor modifications this model can work on general interaction networks.
JODIE uses an RNN architecture to maintain the embeddings of each node, with one RNN for users ($\mathrm{RNN}_u$) and one RNN for items ($\mathrm{RNN}_i$). The formula for each RNN is identical except that they use different weights. When an interaction happens between a user and an item, each of the embeddings is updated according to equation 13:
$$u(t) = \sigma(W_1^u u(\bar{t}) + W_2^u i(\bar{t}) + W_3^u f + W_4^u \Delta_u)$$
$$i(t) = \sigma(W_1^i i(\bar{t}) + W_2^i u(\bar{t}) + W_3^i f + W_4^i \Delta_i) \qquad (13)$$
where $u(t)$ is the embedding of the interacting user, $i(t)$ the embedding of the interacting item, $u(\bar{t})$ the embedding of the user just before the interaction and similarly $i(\bar{t})$ is the embedding of the item just before the interaction. The superscript on the weights indicates which RNN they are parameters of, so $W_1^u$ is a parameter of $\mathrm{RNN}_u$. $f$ is the feature vector of the interaction, $\Delta_u$ is the time since the user last interacted with an item, and $\Delta_i$ is defined similarly for the item.
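The mutually recursive update of user and item embeddings can be sketched as follows, assuming an update of the form sigmoid(W1 own + W2 other + W3 f + W4 Δ), where "own" and "other" are the two embeddings just before the interaction. The weights, embeddings and features below are hypothetical toy values, and the same weights are reused for both RNNs only to keep the example short.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def jodie_update(w1, w2, w3, w4, own_prev, other_prev, feat, delta):
    """Return sigmoid(W1 own(t-) + W2 other(t-) + W3 f + w4 * delta)."""
    pre = [a + b + c + d * delta
           for a, b, c, d in zip(matvec(w1, own_prev), matvec(w2, other_prev),
                                 matvec(w3, feat), w4)]
    return [1.0 / (1.0 + math.exp(-p)) for p in pre]


# Hypothetical 2-d embeddings and a 1-d interaction feature.
user_prev, item_prev, feat = [0.2, -0.1], [0.5, 0.3], [1.0]
eye = [[1.0, 0.0], [0.0, 1.0]]
w3 = [[0.1], [0.1]]
w4 = [0.05, 0.05]
# JODIE uses separate weights per RNN; they are shared here for brevity.
user_new = jodie_update(eye, eye, w3, w4, user_prev, item_prev, feat, delta=2.0)
item_new = jodie_update(eye, eye, w3, w4, item_prev, user_prev, feat, delta=0.5)
```

Both calls read the embeddings from just before the interaction, so the user update does not see the new item embedding and vice versa.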
An additional functionality of JODIE is the projection component of the architecture. It is used to predict the trajectory of the dynamic embeddings. The model predicts the future position of the user or item embedding and is trained to improve this prediction.

2) Temporal point process based models
Know-Evolve [21] is the precursor to the rest of the dynamic graph temporal point process models discussed in this section. It models knowledge graphs in the form of interaction networks by parameterizing a temporal point process (TPP) by a modified RNN. With some minor modifications, the model should be applicable to any interaction network, but since the original model is specifically for knowledge graphs we will rather focus on its successor, DyREP [48].
DyREP uses a temporal point process model which is parameterised by a recurrent architecture [48]. The temporal point process can express both dynamics "of the network" (structural evolution) and "on the network" (node communication). By modelling this co-evolution of both dynamics the authors achieve a richer representation than most embeddings.
The temporal point process (TPP) is modelled by events (u, v, t, k) where u and v are the interacting nodes, t is the time of the event and k ∈ {0, 1} indicates whether the event is a structural evolution, k = 0 (edge added) or a communication k = 1.
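The conditional intensity function $\lambda$ describes the instantaneous rate at which an event is expected to happen. $\lambda$ is parameterised by two functions $f$ and $g$:
$$\lambda_k^{u,v}(t) = f_k\big(g_k^{u,v}(\bar{t})\big)$$
$$g_k^{u,v}(\bar{t}) = \omega_k^\top \cdot [z^u(\bar{t}); z^v(\bar{t})]$$
$$f_k(x) = \psi_k \log\big(1 + \exp(x / \psi_k)\big)$$
where $\bar{t}$ is the time just before the current event, $g$ is a weighted concatenation of node embeddings $z$, $f$ is a modified softplus, and $\omega_k$ and $\psi_k$ are four parameters (two for each value of $k$) which enable the temporal point process to be modelled on two different time scales.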
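A scaled softplus is one way to turn an unbounded compatibility score into a strictly positive intensity. The sketch below assumes the form ψ log(1 + exp(g/ψ)) with g a weighted sum over the concatenated node embeddings; the function names and all numeric values are hypothetical.

```python
import math


def score(omega, z_u, z_v):
    """Weighted sum over the concatenation [z_u; z_v]."""
    return sum(w * z for w, z in zip(omega, z_u + z_v))


def intensity(g, psi):
    """Scaled softplus psi * log(1 + exp(g / psi)), computed stably."""
    x = g / psi
    # max(x, 0) + log1p(exp(-|x|)) equals log(1 + exp(x)) without overflow.
    return psi * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))


# Hypothetical embeddings and parameters for one event type k.
z_u, z_v = [0.3, -0.2], [0.1, 0.4]
omega_k = [1.0, 0.5, -0.5, 2.0]
psi_k = 0.5
lam = intensity(score(omega_k, z_u, z_v), psi_k)
```

A small ψ makes the function behave like a rectifier and a large ψ flattens it, which is how a separate ψ per event type lets the two kinds of events evolve on different time scales.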
The TPP is parameterised by an RNN. The RNN incorporates aggregation of local node embeddings, the previous embedding of the given node and an exogenous drive:
$$z^v(t_p) = \sigma\big(W^{struct} h^u_{struct}(\bar{t}_p) + W^{rec} z^v(\bar{t}_p^v) + W^t (t_p - \bar{t}_p^v)\big)$$
where $t_p$ is the time of the current event, $\bar{t}_p^v$ is the time of the previous event involving node $v$, and $h^u_{struct}$ is given by an attention mechanism that aggregates embeddings of neighbours of $u$. The attention mechanism uses an attention matrix $S$ which is calculated and maintained by the adjacency matrix $A$ and the intensity function $\lambda$. In short, $\lambda$ parameterises the attention mechanism used by the RNN, which in turn is used to parameterise $\lambda$. Thus $\lambda$ influences the parameterisation of itself.
With λ well parameterised it serves as a model for the dynamic network and its conditional intensity function can be used to predict link appearance and time of link appearance.
Latent dynamic graph (LDG) [109] uses Kipf et al.'s Neural Relational Inference (NRI) model [110] to extend DyRep. The idea is to re-purpose NRI to encode the interactions on the graph and to generate a temporal attention matrix, which is then used to improve upon the self-attention originally used in DyRep.
Graph Hawkes Network (GHN) [111] is another method that parameterizes a TPP through a deep neural architecture. Similarly to Know-Evolve [21], it targets temporal knowledge networks. A part of the architecture, the Graph Hawkes Process, is an adapted continuous-time LSTM for Hawkes processes [112].
3) Time embedding based models
Some continuous models rely on time embedding methods. This includes using positional encoding to represent the time dimension as introduced by Vaswani et al. [89]. An example of a time embedding method is time2vec [113]. This is a positional encoding, similar to that of the transformer but especially focused on encoding temporal patterns. Another example is the functional time embedding introduced by Xu et al. [114], which casts the learning of temporal patterns as a kernel learning problem and learns the kernel function. They apply classical functional analysis to enable functional learning. These time embedding methods are particularly aimed at capturing the temporal difference t_i − t_j, which is of substantial benefit when modelling interaction networks since it enables them to effectively capture inter-event time.
Temporal Graph Attention (TGAT) [18] was the first continuous DGNN to use a time embedding. The authors use the functional time embedding they introduced separately [114]; however, when comparing different versions of the embedding, they end up using a non-parametric version (Equation 16) which is nearly identical to time2vec [113].
where ω_i and ϕ_i are learned weights and d is the size of the time embedding.
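As an illustration, a time embedding of this kind can be sketched as follows. The sketch assumes a cosine encoding scaled by sqrt(1/d) with one frequency ω_i and one phase ϕ_i per dimension; the exact form used in Equation 16 may differ, and the frequency and phase values are hypothetical stand-ins for learned weights.

```python
import math

def time_encoding(t, omega, phi):
    """Map a scalar time (or time difference) t to a d-dimensional vector.

    Assumed form: sqrt(1/d) * [cos(omega_1 * t + phi_1), ..., cos(omega_d * t + phi_d)].
    """
    d = len(omega)
    scale = math.sqrt(1.0 / d)
    return [scale * math.cos(w * t + p) for w, p in zip(omega, phi)]

# Hypothetical "learned" weights for an embedding of size d = 4.
omega = [1.0, 0.1, 0.01, 0.001]   # frequencies covering several time scales
phi = [0.0, 0.5, 1.0, 1.5]        # phase shifts

example = time_encoding(2.5, omega, phi)   # e.g. the encoding of an inter-event time
```

The scaling factor keeps the Euclidean norm of the embedding at or below 1 regardless of d, and spreading the frequencies over several orders of magnitude lets the embedding respond to both short and long inter-event times.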
A TGAT layer concatenates the node features, edge features (optional) and time features of each neighbouring node as well as the target node. It then applies masked attention similar to the attention in GAT [88]. Each additional layer adds one more hop of neighbours. The authors found 2 layers (2 hops) to be optimal, as additional hops exponentially increase run-time.
Z(t) is an entity-temporal feature matrix which includes features of nodes, edges and inter-event time. l is the layer. In line with self-attention, Z(t) is linearly projected to obtain the 'query', 'key' and 'value'.
[Z(t)]_0 contains the features of the target node (the node we want to compute the embedding for) and [Z(t)]_{1:N} contains the features of its neighbours. TGAT applies its attention to Z(t) to obtain h(t), the hidden representation of the node.
Finally, the hidden representation is concatenated with the (static) node embedding of the target node, x_0, and passed to a feed-forward network.
Temporal Graph Networks (TGN) [19] extends TGAT by adding a memory module. The memory module embeds the history of the node. The memory vector is added to Z(t) in Equation 17.

E. DISCUSSION AND SUMMARY
Deep learning on dynamic graphs is still a new field; however, there are already promising methods that show the capacity to encode dynamic topology. This section has provided a comprehensive and detailed survey of deep learning models for dynamic graph topology encoding.
The encoders are summarised and compared in Table 6. Models are listed based on their encoders and the encoders' capacity to model link and node dynamics. Any model which cannot model link deletion or link duration can only model strictly evolving networks or interaction networks (see section II-B).
Although Table 6 lists many models as not supporting link deletion, it is possible to model link deletion through link deletion events, and thus an interaction network can model a persistent link disappearing. Any continuous model should also be able to model node deletion by removing the node from node neighbourhood aggregation to effectively delete it. However, while these ways of modelling dynamics have been discussed by earlier works [19], to the best of our knowledge, they have not been implemented in practice.
Most methods focus on discrete graphs, which enables them to leverage recent advances in graph neural networks. This allows for modelling of diverse graphs, including node-dynamic graphs and dynamic labels on nodes; due to the use of snapshots, temporal networks can also be handled. Continuous models currently exist for strictly growing networks and interaction networks. This leaves many classes of dynamic graphs unexplored. Since continuous models have some inherent advantages over discrete models (see section II-F), expanding the repertoire of dynamic network classes for continuous models is a promising future direction.
All discrete DGNNs use a GNN to model graph topology and a deep time-series model, typically an RNN, to model the time dependency. Two types of architectures can be distinguished: (i) the stacked DGNN and (ii) the integrated DGNN. Different stacked DGNNs only differ in which spatial and temporal layers are stacked (which GNN they use and which time-series layer), while the integrated DGNNs may differ not only in how they model spatial and temporal patterns but also in how they integrate the spatial and temporal modules. Given the same graph, a stacked DGNN would generally have fewer parameters than a typical integrated DGNN (such as GCRN-M2 [69]). Both approaches offer great flexibility in terms of which GNN and RNN can be used. They are also rather flexible in that they can model networks with both appearing and disappearing edges as well as dynamic labels.
Discrete models tend to treat every snapshot as a static graph; thus the complexity of the model is proportional to the size of the graph in each snapshot and the number of snapshots. The complexity of a continuous model, in contrast, is generally proportional to the number of changes in the graph. If a discrete approach creates snapshots using time-windows, then it can trade off temporal granularity (and thus theoretically modelling accuracy) for faster computation by using larger time-windows for each snapshot.
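The trade-off between temporal granularity and the number of snapshots can be made concrete with a small sketch. It assumes the dynamic network is given as a list of timestamped edge events (u, v, t); the function name and the toy events are hypothetical.

```python
from collections import defaultdict

def to_snapshots(events, window):
    """Group (u, v, t) events into consecutive snapshots of length `window`.

    Returns one edge set per time-window, including empty windows.
    """
    if not events:
        return []
    t0 = min(t for _, _, t in events)
    buckets = defaultdict(set)
    for u, v, t in events:
        buckets[int((t - t0) // window)].add((u, v))
    n_snapshots = max(buckets) + 1
    return [buckets.get(i, set()) for i in range(n_snapshots)]

# Toy interaction network: (source, target, timestamp).
events = [(0, 1, 0.0), (1, 2, 0.4), (0, 1, 1.2), (2, 3, 3.9)]

fine = to_snapshots(events, window=1.0)     # 4 snapshots, one of them empty
coarse = to_snapshots(events, window=2.0)   # 2 snapshots, order within a window is lost
```

A discrete model processes `len(fine)` or `len(coarse)` graphs, whereas a continuous model processes `len(events)` changes. Note that the repeated interaction between nodes 0 and 1 collapses into a single edge in the coarse representation.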
Table 6 shows that every continuous DGNN is aimed at a special type of continuous network. This is reflected in Table 5, which shows that there is, as of yet, no continuous DGNN encoder for any general-purpose dynamic network.
So which one should you choose? Converting the dynamic network to an edge-weighted network is a simple and, depending on the application, possibly "good enough" approach. A practitioner only needs to come up with some scheme to weight edges, and then feed that to an optimized implementation of a standard GNN, e.g. GCN [75] or GAT [88]. TDGNN [81] shows a good example of such a scheme by weighting the edges using an exponential distribution, which weights more recent edges higher than old edges.
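A minimal sketch of such a weighting scheme is given below. It is not the exact scheme of TDGNN [81]; it simply assumes that every interaction contributes exp(−rate · age) to the weight of its edge, where the decay rate is a hypothetical hyperparameter.

```python
import math
from collections import defaultdict

def decay_weighted_edges(events, t_now, rate):
    """Collapse timestamped interactions (u, v, t) into a static weighted edge list.

    Each interaction adds exp(-rate * (t_now - t)), so recent edges weigh more.
    """
    weights = defaultdict(float)
    for u, v, t in events:
        weights[(u, v)] += math.exp(-rate * (t_now - t))
    return dict(weights)

# Toy example: edge (0, 1) has one old and one recent interaction.
events = [(0, 1, 9.0), (0, 1, 1.0), (1, 2, 1.0)]
weighted = decay_weighted_edges(events, t_now=10.0, rate=0.5)
```

The resulting dictionary can be converted into a weighted adjacency matrix and passed to any static GNN implementation that accepts edge weights.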
Before trying any large DGNN model, one should also consider whether applying a static GNN on a discrete representation might yield good enough results. Given the same number of features and layer size, it will train faster and generally be a simpler model.
The choice between discrete and continuous depends on the data and the intended problem. If temporal granularity and performance are not a concern, then one of the advanced discrete approaches such as DySAT or EvolveGCN will likely be a great fit for most dynamic network problems. Since they naturally support link deletion, node addition and node deletion, they provide good general-purpose functionality.
The discrete DGNNs covered in this work all iterate over snapshots to encode, while the continuous DGNNs iterate edge by edge. The continuous models therefore tend to take longer to train than the discrete models. This is especially true if the network is rather dense.
Evolving networks are well served by any discrete approach; however, with the recent dominance of attention architectures [89], we would expect DySAT to do well in a comparative test. EvolveGCN is expected to train fast on an evolving network with little change between snapshots. The discrete methods are also suited for temporal networks, given that the length of the time-windows covered by snapshots is well selected.
If node dynamics is an important feature of the network you wish to model, then you should choose a model that can encode node dynamics such as DySAT [17], EvolveGCN [16] or HDGNN [24].
If you have an interaction network with detailed timestamps, then TGAT [18] or TGN [19] are likely good fits. If run-time complexity and time granularity are essential to the dynamic complex network at hand (for example in the case of a temporal network), then non-deep learning methods that are not covered by this survey are recommended. Those methods can be explored in the literature referred to in section II-E.

IV. DEEP LEARNING FOR PREDICTION OF NETWORK TOPOLOGY
Any embedding method can be thought of as a concatenation of an encoder and a decoder [28]. Until now, we have discussed encoders, but the quality of embeddings depends on the decoder and the loss function as well. While the encoders in Section III can be paired with a variety of decoders and loss functions depending on the intended task, we focus in this section on one of the most commonly tackled problems: link prediction.
Prediction problems can be defined for many different contexts and settings. In this survey, we refer to the prediction of the future change to the network topology. Much work has been done on the prediction of missing links in networks, which can be thought of as an interpolation task. This section explores how dynamic graph neural networks can be used for link prediction and deals exclusively with the extrapolation (future link prediction) task.
Predictions can be done in a time-conditioned or time-predicting manner [11]. Time-predicting means that a method predicts when an event will occur, and time-conditioned means that a method predicts whether an event will occur at a given time t. For example, if the method predicts the existence of a link in the next snapshot, it is a time-conditioned prediction. If it predicts when a new link between nodes will appear, it is a time-predicting prediction.
Prediction of links often focuses only on the prediction of the appearance of a link. Link disappearance is less explored but is also important for the prediction of network topology. We refer to link prediction based on a dynamic network as dynamic link prediction.
For embedding methods, what is predicted and how it is predicted is decided by the decoder. You can have both time-predicting and time-conditioned decoders. The prediction capabilities will depend on the information captured by the embeddings. Thus, an embedding that captures continuous-time information has a higher potential to model temporal patterns. Well modelled temporal and structural embeddings offer a better foundation for a decoder and thus potentially better predictions.
If dealing with discrete data and few timestamps, a time-conditioned decoder can be used for time prediction. This can be done by applying the time-conditioned decoder to every candidate timestamp t and then selecting the t at which the link has the highest probability of appearing.
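This procedure can be sketched in a few lines. The sketch assumes a time-conditioned decoder is available as a function decoder(u, v, t) returning a link probability; the toy decoder below is a hypothetical stand-in for a trained model.

```python
import math

def predict_time(decoder, u, v, candidate_times):
    """Use a time-conditioned decoder for time prediction.

    Scores the link (u, v) at every candidate timestamp and returns the
    timestamp with the highest predicted probability, together with it.
    """
    scores = {t: decoder(u, v, t) for t in candidate_times}
    best_t = max(scores, key=scores.get)
    return best_t, scores[best_t]

def toy_decoder(u, v, t):
    """Hypothetical decoder whose link probability peaks at t = 3."""
    return 1.0 / (1.0 + math.exp((t - 3) ** 2))

best_t, best_p = predict_time(toy_decoder, u=0, v=1, candidate_times=[1, 2, 3, 4, 5])
```

The cost grows linearly with the number of candidate timestamps, which is why the approach is only practical when there are few of them.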
The rest of this section is a description of how the surveyed models from the previous section can be used to perform predictions. This mainly includes a discussion on decoders and loss functions. Since the surveyed models aim to predict the time-conditioned existence of links, the focus will be on the dynamic link prediction task.
Autoencoders can use the same decoders and loss functions as other methods. Their aim is typically a little different. The decoder is targeted at the already observed network and tries to recreate the snapshot. A prediction for a snapshot at time t + 1 is marginally different from the decoder of an autoencoder which is targeted at already observed snapshots.

A. DECODERS
Of the surveyed approaches which apply a predicting decoder, almost all apply a time-conditioned decoder. A prediction is then often an adjacency matrix Â_τ which indicates the probabilities of an edge at time τ. Often τ = t + 1.
We consider decoders to be the part of the architecture that produces Â_τ from Z, the dynamic graph embeddings.
Since encoders produce node embeddings and predicting a link involves two nodes, decoders tend to aggregate two node embeddings to predict a link. The simplest way to aggregate is to apply an operator, e.g. the inner product [80] (shown in Equation 21), concatenation, mean or Hadamard product [81]. This combines the node embeddings and gives a probability of a link appearing. These simple approaches require that the encoder is able to embed the nodes in a space such that nodes that are likely to connect are close to each other, or are otherwise decodable by the simple decoder.

TABLE 6 (excerpt):
Model | Encoder | Link addition | Link deletion | Node addition | Node deletion | Network type
DySAT [17] | GAT [88] & temporal attention [89] | Yes | Yes | Yes | Yes | Any
TNDCN [25], [90] | Spectral GCN [25] & TCN [25] | Yes | Yes | No | No | Any
StrGNN [93] | Spectral GCN [75] & GRU | Yes | Yes | No | No | Any
HDGNN [24] | Spectral GCN [75] & a variety of RNNs | Yes | Yes | Yes | Yes | Heterogeneous
TeMP [22] | R-GCN [97] stacked with either GRU or attention | Yes | Yes | No | No | Knowledge
Integrated DGNN
GCRN-M2 [69] | GCN [74] integrated in an LSTM | Yes | Yes | No | No | Any
GC-LSTM [20] | GCN [74] integrated in an LSTM | Yes | Yes | No | No | Any
EvolveGCN [16] | LSTM integrated in a GCN [75] | Yes | Yes | Yes | Yes | Any
LRGCN [23] | R-GCN [97] integrated in an LSTM | Yes | Yes | No | No | Any
RE-Net [95] | R-GCN [97] integrated in several RNNs | Yes | Yes | No | No | Knowledge
TNA [96] | GCN [75]
Another simple decoder is a feed-forward network. As before, the decoder receives two node embeddings and outputs a probability for whether the link appeared or did not appear. This approach is used by several models for link prediction [16], [47], [101]. While this requires more parameters, the decoder can easily be dwarfed in size by the encoder, and it enables decoding of non-linear relationships between node embeddings.
where z_k is the node embedding of node k. An inner product decoder works well if we only want to predict or reproduce the graph topology. If we would like to decode the feature matrix, then a neural network should be used [102].
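The operator-based decoders can be sketched as follows. The sketch assumes the inner product is passed through a sigmoid to obtain a probability; the other operators only combine the two embeddings and would need a further scoring step (for example a feed-forward layer). The embedding values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inner_product_decoder(z_u, z_v):
    """Probability of a link between u and v: sigmoid(z_u . z_v)."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))

# Alternative aggregation operators. They return a combined vector,
# not a probability, so they need a downstream scoring function.
def hadamard(z_u, z_v):
    return [a * b for a, b in zip(z_u, z_v)]

def mean(z_u, z_v):
    return [(a + b) / 2.0 for a, b in zip(z_u, z_v)]

def concat(z_u, z_v):
    return list(z_u) + list(z_v)

def decode_adjacency(Z):
    """Predicted adjacency matrix from a list of node embeddings Z."""
    n = len(Z)
    return [[inner_product_decoder(Z[i], Z[j]) for j in range(n)] for i in range(n)]

# Hypothetical embeddings: nodes 0 and 1 are close, node 2 points the other way.
Z = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
A_hat = decode_adjacency(Z)
```

With the inner product, the predicted matrix is necessarily symmetric, so this decoder cannot distinguish the direction of a link; concatenation followed by a learned scorer does not have this restriction.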
Wu et al. [115] use GraphRNN, a deep sequential generative model, as a decoder [43]. What is unique about GraphRNN is that it reframes the graph generation problem as a sequential problem. The GraphRNN authors claim increased performance over feed-forward auto-encoders.
In general, there are many options for how decoding can be done. A decoder might be viable as long as the probability for each edge is produced from the latent variables and the architecture can be efficiently optimized with backpropagation.
The only surveyed method using a time-predicting decoder is DyRep [48]. DyRep uses the conditional intensity function of its temporal point process to model the dynamic network.
While the focus in this section is on decoders that are used directly for the forecasting task, it is important to note that downstream learning can also be used. In this case, the DGNN is trained on one task and the resulting node embeddings are used for a different task. For example, the DGNN can be trained on node classification and the node embeddings are then used later for link prediction. An example of this is seen in [17], where a logistic regression classifier is trained on the node embeddings of snapshot t to predict links at t + 1.

B. LOSS FUNCTIONS
The loss function is central to any deep learning method, as it is the equation that is being optimized. Regarding loss functions, we can make a distinction between (i) link prediction optimizing methods; and (ii) autoencoder methods. Whereas the prediction methods optimize towards link prediction directly, an autoencoder optimizes towards the recreation of the dynamic graph. Despite having slightly different aims, both approaches have been used for link prediction and have been shown to perform well.

1) Link prediction
Prediction of edges is seen as a binary classification task. Traditional link prediction is well known for being extremely unbalanced [52], [116]. For predicting methods the loss function is often simply the binary cross-entropy [16], [17], [85]. Some models use negative sampling [16], [17]. This transforms the problem of link prediction from a multiple output classification (a prediction for each link) to a binary classification problem (is the link a "good" link or a "bad" link). This speeds up computation and deals with the well-known class imbalance problem in link prediction. The rate of negative samples used varies from work to work: EvolveGCN [16] uses 1 to 100 for training, while TGAT [18] and TGN [19] use 1 to 1.
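A binary cross-entropy loss with negative sampling can be sketched as below. The sketch assumes directed links without self-loops and uniform sampling of negatives; the helper names, the sampling ratio and the constant scorer are hypothetical, and the surveyed models may sample negatives differently.

```python
import math
import random

def bce(p, y, eps=1e-12):
    """Binary cross-entropy of a predicted probability p against label y."""
    p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def sample_negatives(n_nodes, positives, ratio, rng):
    """Uniformly sample `ratio` non-links per observed link."""
    pos = set(positives)
    max_negatives = n_nodes * (n_nodes - 1) - len(pos)
    target = min(ratio * len(pos), max_negatives)
    negatives = set()
    while len(negatives) < target:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v and (u, v) not in pos:
            negatives.add((u, v))
    return sorted(negatives)

def link_loss(score, positives, negatives):
    """Mean binary cross-entropy over positive and sampled negative pairs."""
    losses = [bce(score(u, v), 1.0) for u, v in positives]
    losses += [bce(score(u, v), 0.0) for u, v in negatives]
    return sum(losses) / len(losses)

rng = random.Random(0)
positives = [(0, 1), (1, 2)]
negatives = sample_negatives(n_nodes=5, positives=positives, ratio=1, rng=rng)
loss = link_loss(lambda u, v: 0.5, positives, negatives)   # uninformative scorer
```

With a ratio of 1, the loss is computed over 2 · |positives| pairs instead of all n(n − 1) candidate pairs, which is where both the speed-up and the rebalancing come from.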
Equation 22 is an example of a binary cross entropy loss adapted from [20].
DySAT [17] sums the loss function only over nodes that are in the same neighbourhood at time t. The neighbourhoods are extracted by taking nodes that co-occur in random walks on the graph. The inner product is calculated as a part of the summation in the loss function. This means that the inner product will be calculated only for the node pairs that the loss is computed on. Together, this reduces the number of node pairs that are summed over and should result in a training speed-up. Any accuracy trade-off is not discussed by the authors.

2) Autoencoders
Autoencoder approaches [47], [98], [101] aim to reconstruct the dynamic network. All surveyed autoencoders operate on discrete networks. Therefore the reconstruction of the network is reduced to the reconstruction of each snapshot. This entails creating a loss function that penalizes wrong reconstruction of the input graph. Variational autoencoder approaches [79], [102] also aim to be generative models. To be generative, they need to enable interpolation in latent space. This is achieved by adding a term to the loss function which penalizes the learned latent variable distribution for being different from a normal distribution. It is also common to add regularization to the loss functions to avoid overfitting.
Equation 23 is the reconstruction penalizing component of E-LSTM-D's loss function [47]. P is a matrix which increases the focus on existing links.
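The effect of P can be illustrated with a minimal sketch. It assumes the loss is the squared Frobenius norm of (Â − A) ⊙ P, where an entry of P equals β > 1 for existing links and 1 otherwise; this is our reading of [47], and the value of β below is hypothetical.

```python
def weighted_reconstruction_loss(A, A_hat, beta):
    """Squared Frobenius norm of (A_hat - A) * P, computed element-wise.

    P[i][j] = beta (> 1) where a link exists in A and 1 elsewhere, so a
    missed link costs more than a falsely predicted one.
    """
    total = 0.0
    for row, row_hat in zip(A, A_hat):
        for a, a_hat in zip(row, row_hat):
            p = beta if a != 0 else 1.0
            total += ((a_hat - a) * p) ** 2
    return total

A = [[0, 1], [1, 0]]                  # observed snapshot
A_hat = [[0.5, 0.5], [0.5, 0.5]]      # uninformative reconstruction
loss = weighted_reconstruction_loss(A, A_hat, beta=2.0)
```

In this example the two existing links contribute 1.0 each, while the two absent links contribute 0.25 each, so the sparse positive entries are not drowned out by the many zeros.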

3) Temporal Point Processes
DyRep [48] models a dynamic network by parameterising a temporal point process. Its loss function influences how the temporal point process is optimized.
where P is the set of observed events, λ is the intensity function and $\Lambda(\tau) = \sum_{u=1}^{n} \sum_{v=1}^{n} \sum_{k \in \{0,1\}} \lambda_k^{u,v}(\tau)$ is the total intensity over all possible events, whose integral gives the survival term for all events that did not happen. Survival probability indicates the probability of an event not happening [117]. The first term thus rewards a high intensity when an event happens, whereas the second term rewards a low intensity (high survival) of events that do not happen.
Trivedi et al. [48] further identify that calculating the integral of Λ is intractable. They get around that by sampling non-events and estimating the integral using Monte Carlo estimation. This is done for each mini-batch.
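The idea behind the Monte Carlo estimate can be sketched as follows. This is a generic uniform-sampling estimator rather than the exact sampling procedure of DyRep [48]; the intensity function signature intensity(u, v, k, tau) and the constant toy intensity are assumptions made for illustration.

```python
import math
import random

def mc_survival_integral(intensity, pairs, t_start, t_end, n_samples, rng):
    """Monte Carlo estimate of the integral of the total intensity.

    Samples a time, a node pair and an event type uniformly at random and
    rescales the mean sampled intensity by the size of the sampled domain.
    """
    total = 0.0
    for _ in range(n_samples):
        tau = rng.uniform(t_start, t_end)
        u, v = rng.choice(pairs)
        k = rng.choice((0, 1))
        total += intensity(u, v, k, tau)
    volume = (t_end - t_start) * len(pairs) * 2   # time span x pairs x event types
    return volume * total / n_samples

def tpp_loss(event_intensities, survival):
    """Negative log-likelihood: -sum(log lambda at observed events) + survival term."""
    return -sum(math.log(lam) for lam in event_intensities) + survival

rng = random.Random(0)
pairs = [(0, 1), (0, 2), (1, 2)]
constant = lambda u, v, k, tau: 0.1   # hypothetical intensity, for checking only
survival = mc_survival_integral(constant, pairs, 0.0, 10.0, n_samples=100, rng=rng)
loss = tpp_loss([0.1, 0.1], survival)
```

For a constant intensity the estimate is exact (10 · 3 · 2 · 0.1 = 6), which gives a simple sanity check; for a learned intensity the variance decreases as the number of samples grows.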

4) Regularization
There are several different approaches for adding regularization to loss functions to avoid overfitting. The total loss function (Equation 25) is composed of the reconstruction loss and the regularization, with an optional constant α to balance the terms. Here we cover the methods that use regularization; however, many models choose not to use regularization as they find that they do not have a problem with overfitting [16], [18], [19], [48].
A common way to regularize is to add a penalty on the magnitude of all the weights of the model, thus keeping the weights small and the model less likely to overfit. The L2 norm is commonly used for this [20], [47].
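As a minimal sketch, the total loss with an L2 penalty can be written as below. It assumes the penalty is the sum of squared weights over all layers; the weight values and the value of α are hypothetical.

```python
def l2_penalty(layers):
    """Sum of squared weights over all layers (squared L2 norm)."""
    return sum(w * w for layer in layers for w in layer)

def total_loss(reconstruction_loss, layers, alpha):
    """Reconstruction loss plus the regularization term, balanced by alpha."""
    return reconstruction_loss + alpha * l2_penalty(layers)

# Hypothetical flattened weights of a two-layer model.
layers = [[0.5, -1.0], [2.0]]
loss = total_loss(reconstruction_loss=1.0, layers=layers, alpha=0.01)
```

Squaring the weights makes the penalty independent of their sign, so large negative weights are penalized just like large positive ones.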
The variational autoencoder methods use a different regularizer. They normalize the node embeddings compared to a prior. In traditional variational autoencoders, this prior is a Normal distribution with mean 0 and standard deviation 1. In dynamic graph autoencoders [79], [102], the prior is still a Gaussian, but it is parameterised by previous observations. Equation 26 is the regularization term from [102].
where q is the encoder distribution and p is the prior distribution. KL is the Kullback-Leibler divergence, which measures the difference between two distributions. A_{<t} indicates all adjacency matrices up to, but not including, t, and similarly for the other matrices. We can see that the prior is influenced by previous snapshots but not by the current one, whereas the encoder is influenced by both the previous and the current snapshot.
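When both q and p are Gaussians with diagonal covariance, the KL term has a closed form, which can be sketched as follows. The sketch assumes per-dimension means and standard deviations are given; in a dynamic graph autoencoder the prior parameters would be produced from previous snapshots, whereas the values here are hypothetical.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for two Gaussians with diagonal covariance.

    Per dimension: log(sigma_p / sigma_q)
                   + (sigma_q^2 + (mu_q - mu_p)^2) / (2 * sigma_p^2) - 1/2.
    """
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return kl

# Encoder output for one node versus a (hypothetical) learned prior.
kl_same = kl_diag_gaussians([0.3, -0.2], [1.0, 0.5], [0.3, -0.2], [1.0, 0.5])
kl_shift = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
```

The term is zero when the encoder distribution matches the prior and grows as the two drift apart, which is what pulls the embedding of snapshot t towards what the previous snapshots suggest.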

C. EVALUATION METRICS
Link prediction is plagued by high class imbalance. It is a binary classification: a link either exists or not, and most links will not exist. In fact, actual links tend to constitute less than 1% of all possible links [118]. AUC and precision@k are two commonly used evaluation metrics in static link prediction [116], [119]. If dynamic link prediction requires the prediction of both appearing and disappearing edges, the evaluation metric needs to reflect that. Furthermore, traditional link prediction metrics have shortcomings when used in a dynamic setting [52].
For a detailed discussion on the evaluation of link prediction, we refer to Yang et al. [116] for static link prediction and Junuthula et al. [52] for dynamic link prediction evaluation.
1) Area under the curve (AUC). The area under the curve (AUC) is used to evaluate a binary classification and has the advantage of being independent of the classification threshold. The AUC is the area under the receiver operating characteristic (ROC) curve. The ROC is a plot of the true positive rate against the false positive rate.
The AUC evaluates predictions based on how well the classifier ranks the predictions; this provides a measure that is invariant to the classification threshold.
In link prediction, there has been little research into finding the optimal threshold [120]; using the AUC for evaluation avoids this problem. Yang et al. [116] note that AUC can show deceptively high performance in link prediction due to the extreme class imbalance. They recommend the use of PRAUC instead.
2) PRAUC. The PRAUC is similar to the AUC except that it is the area under the precision-recall curve. The metric is often used in highly imbalanced information retrieval problems [52]. PRAUC is recommended by Yang et al. [116] as a suitable metric for traditional (static) link prediction due to the deceptive nature of the ROC curve and because PRAUC shows a more discriminative view of classification performance. It is recommended for the same reasons by Li et al. [118] for dynamic link prediction. One way of estimating the PRAUC is by using Mean Average Precision (MAP). MAP is the mean of the average precision (AP) per node.
3) Fixed-threshold metrics. One of the most common fixed-threshold metrics in traditional link prediction is Precision@k. It is the ratio of items that are correctly predicted. From the ranked predictions the top k predictions are selected; precision is then the ratio k_r / k, where k_r is the number of correctly predicted links in the top k predictions. While a higher precision indicates a higher prediction accuracy, it is dependent on the parameter k.
While k might be given in web-scale information retrieval, where we care about the accuracy of the k highest-ranked articles, in link prediction it is difficult to find the right cut-off [120]. A fixed threshold can be applied to other common metrics including accuracy, recall and F1, among others [116]. These metrics suffer from instability in their predictions, where a change of threshold can lead to contradictory results [116]. This problem is also observed in dynamic link prediction [52]. Fixed-threshold metrics are not recommended unless the targeted problem has a natural threshold [116].
4) Sum of absolute differences (SumD). Li et al. [118] pointed out that models often have similar AUC scores and suggested SumD as a stricter measurement of accuracy. It is simply the number of mispredicted links. The metric has different meanings depending on how many values are predicted, since it is not normalized according to the total number of links. Chen et al. consider SumD misleading for this reason [47]. The metric strictly punishes false positives: since there are so many links not appearing, a slightly higher rate of false positives will have a large impact on this metric.
5) Error rate. Since SumD suffers from several drawbacks, an extension is suggested by Chen et al. [47].
Error rate normalizes SumD by the total number of existing links.
$$\text{Error Rate} = \frac{N_{\text{false}}}{N_{\text{true}}} \qquad (27)$$
where N_false is the number of mispredicted links and N_true is the number of existing links. The error rate is very similar to recall, except that recall focuses on true positives, whereas the error rate focuses on false positives. Another difference between recall and error rate is that recall is normalized between 0 and 1, while the error rate may be above 1 if the number of mispredicted links exceeds the number of existing links. The error rate is a good metric if the number of false positives is a major concern. In dynamic link prediction, false positives become a major issue due to the massive class imbalance of the prediction problem.
6) GMAUC. After a thorough investigation of evaluation metrics for dynamic link prediction, Junuthula et al. suggest GMAUC as an improvement over other metrics [52]. The key insight is that dynamic link prediction can be divided into two sub-problems: (i) predicting the disappearance of links that already exist or the appearance of links that have once existed; and (ii) predicting links that have never been seen before.
When the problem is divided in this way, each of the sub-problems takes on different characteristics. Prediction of links that have never been seen before is equivalent to traditional link prediction, for which PRAUC is a suitable metric [116]. Prediction of already existing links is both the prediction of once seen links appearing and existing links disappearing. This is a more balanced problem than traditional link prediction, thus AUC is a suitable measure. Junuthula et al. [52] note that both the mean and the harmonic mean will lead to either the AUC or the PRAUC dominating, thus the geometric mean is used to form a unified metric. The authors note the advantages of GMAUC:
• Based on threshold curves, thus avoids the pitfall of fixed-threshold metrics.
• Accounts for differences between predicting new and already observed edges without having the metric be dominated by either sub-problem.
• Any predictor that predicts only new edges or only previously observed edges gets a score of 0.
However, it does hinge on the assumption that reoccurring edges is a balanced enough prediction problem that AUC is suitable, and that is not necessarily the case. Many real-world networks are much more sparse than the two networks used by Junuthula et al. [52].
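The metrics discussed above can be sketched in plain Python as follows. The sketch makes several assumptions: PRAUC is approximated by average precision, the error rate counts both false positives and false negatives as mispredicted links, and GMAUC follows our reading of Junuthula et al. [52], where the PRAUC of new links is rescaled by its chance baseline P/(P + N) and the AUC of previously observed links by its chance level of 0.5. All function names and toy values are hypothetical.

```python
import math

def roc_auc(scores, labels):
    """Probability that a random positive is ranked above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true link."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def precision_at_k(scores, labels, k):
    """Fraction k_r / k of true links among the k highest-scored pairs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k

def error_rate(predicted_links, existing_links):
    """SumD (number of mispredicted links) divided by the number of existing links."""
    n_false = len(set(predicted_links) ^ set(existing_links))
    return n_false / len(existing_links)

def gmauc(prauc_new, auc_prev, n_pos_new, n_neg_new):
    """Geometric mean of baseline-adjusted PRAUC (new links) and AUC (seen links)."""
    baseline = n_pos_new / (n_pos_new + n_neg_new)
    new_part = max((prauc_new - baseline) / (1.0 - baseline), 0.0)
    prev_part = max(2.0 * (auc_prev - 0.5), 0.0)
    return math.sqrt(new_part * prev_part)

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
p_at_2 = precision_at_k(scores, labels, k=2)
err = error_rate({(0, 1), (1, 2), (2, 3)}, {(0, 1), (3, 4)})
combined = gmauc(prauc_new=ap, auc_prev=auc, n_pos_new=2, n_neg_new=2)
```

In the toy example the error rate is above 1 because three links are mispredicted while only two links exist, and GMAUC drops to 0 as soon as either sub-problem is predicted no better than chance.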

D. DISCUSSION AND SUMMARY
In this section we have provided an overview of how, given a dynamic network encoder, one can perform network topology prediction. The overview includes how methods from section III use their embeddings for prediction. This completes the journey from establishing a dynamic network, to encoding the dynamic topology, to predicting changes in the topology.
Prediction using a deep model requires decoding and the use of a loss function that captures temporal and structural information. Prediction is largely focused on time-conditioned link prediction and the two main modelling approaches are (1) an architecture directly aimed at prediction; and (2) an architecture aimed at generating node embeddings which are then used for link prediction in a downstream step. Most dynamic network models surveyed fall into the second category, including all autoencoder approaches. All else being equal, we would expect an architecture directly aimed at prediction to perform better than a two-step architecture. This is because the first case allows the entire architecture to optimize itself towards the prediction task.
The massive class imbalance makes the evaluation of dynamic link prediction non-trivial. If the target problem has a natural fixed threshold, then adding a fixed threshold to a common metric such as F1 is likely a good fit. PRAUC (MAP) and Error rate are good metrics that avoid the class imbalance problem and are suitable for both link prediction and dynamic link prediction. The GMAUC metric incorporates the observation that reappearing and disappearing links are not an imbalanced classification. Usage of GMAUC, however, hinges on the assumption that reoccurring links are a reasonably balanced classification; this is not necessarily true and depends on the data. An evaluation of new methods should report the PRAUC of newly appearing links and the PRAUC or AUC of reappearing links separately. The combined score should also be reported as either the PRAUC, Error rate or GMAUC.
Prediction on dynamic networks is in its infancy. Deep models are largely focused on unattributed time-conditioned discrete link appearance prediction. This leaves opportunities for future work in a large range of prediction tasks, with some types of prediction still unexplored. Prediction based on continuous-time encoders is a particularly interesting frontier due to the representation's inherent advantages and due to the limited number of works in that area.

V. CHALLENGES AND FUTURE WORK
There are plenty of challenges and multiple avenues for the improvement of deep learning for both modelling and prediction of network topology.
Expanding modelling and prediction repertoire.In this work we have exclusively focused on dynamic network topology.However, complex networks are diverse and not only topology may vary.Topology dynamics can be represented as a 3-dimensional cube (Section II-D).However, real networks can be much more complex.Complex networks may have dynamic node and edge attributes, they may have directed and/or signed edges, be heterogeneous in terms of nodes and edges and be multilayered or multiplex.Each of these cases can be considered another dimension in the dynamic network hypercube.Designing deep learning models for encoding these network cases expand the repertoire of tasks on which deep learning can be applied.Which types of networks can be encoded can be expanded as well as an expansion of what kind of predictions can be made on those networks.For example, most DGNN models (and most GNN models) encode attributed dynamic networks but predict only graph topology without the node attributes.
Adoption of advances in closely related fields.Dynamic graph neural networks are based on GNNs and thus advances made to GNNs trickle down and can improve DGNNs.Challenges for GNNs include increasing modelling depth as GNNs struggle with vanishing gradients [121] and increasing scalability for large graphs [67].As advancements are made in deep neural networks for time series and in GNNs these advancements can be applied to dynamic network modelling and prediction to improve performance.Similarly, improvements in deep time-series modelling can easily be adapted to improve DGNNs.
Continuous DGNNs.Modelling temporal patterns is what distinguishes modelling dynamic graphs from modelling static graphs.Capturing these temporal patterns is key to making accurate predictions.However, most models rely on snapshots which are coarse-grained temporal representations.Methods modelling network change in continuous time will offer fine-grained temporal modelling.Future work is needed for modelling and prediction of continuous-time dynamic networks.
Scalability. Large-scale datasets are a challenge for dynamic network modelling. Real-world datasets tend to be so large that modelling becomes prohibitively slow. Dynamic network models use either a discrete representation in the form of snapshots, in which case the processing of each snapshot is the bottleneck, or continuous-time modelling, which scales with the number of interactions. A snapshot model needs frequent snapshots in order to achieve high temporal granularity. In addition, frequent snapshots might undermine the capacity to model a temporal network. Improvements in continuous-time modelling are likely to improve the performance of deep learning on dynamic networks, both in terms of temporal modelling capacity and the ability to handle large networks.
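The trade-off between snapshot frequency and temporal granularity can be illustrated with a small sketch. The helper `to_snapshots`, the toy event list and the window widths below are assumptions made for illustration only; they are not taken from any of the surveyed models.

```python
from collections import defaultdict


def to_snapshots(events, window):
    """Group timestamped edge events (u, v, t) into fixed-width snapshots.

    Returns the edge sets of the non-empty windows in temporal order.
    Repeated interactions inside one window collapse into a single edge.
    """
    snapshots = defaultdict(set)
    for u, v, t in events:
        snapshots[int(t // window)].add((u, v))
    return [snapshots[k] for k in sorted(snapshots)]


# Hypothetical continuous-time interaction stream: (source, target, timestamp).
events = [
    ("a", "b", 0.5),
    ("b", "c", 1.2),
    ("a", "b", 1.9),
    ("c", "d", 3.1),
    ("a", "c", 3.4),
]

coarse = to_snapshots(events, window=4.0)  # one wide window
fine = to_snapshots(events, window=1.0)    # narrow windows

print(len(coarse), len(fine))
```

With the wide window, all five events fall into a single snapshot, so the ordering of interactions and the repeated a-b contact are lost. With the narrow window, more of the temporal ordering is preserved, but a snapshot model has more snapshots to process, whereas a continuous-time model would operate on the five events directly.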
Dynamic graph neural networks are a new and exciting research direction with a broad range of applications. With these opportunities, the field is ripe with potential for future work.

FIGURE 1: Network representations ordered by temporal granularity. Static networks are the most coarse-grained and continuous representations are the most fine-grained. With increasing temporal granularity comes increasing model complexity. The figure is inspired by Fig. 5.1 from Rossetti [36].

FIGURE 3: Examples of networks on the link duration spectrum.

FIGURE 4: The dynamic network cube. The cube is a novel framework that succinctly represents different kinds of dynamic networks. Each node represents a specific type of dynamic network. The nodes are organised along three dimensions: temporal granularity (discrete and continuous) from Section II-A, the link duration spectrum (temporal and evolving) from Section II-B, and node dynamics (node-dynamic and node-static) from Section II-C. The complete list of terminology from the cube is presented in Table 2.

FIGURE 5: An overview of dynamic network models with dynamic graph neural networks outlined. Statistical models are models intended for inference or for identifying statistical regularities in dynamic networks. Representation learning models are models which automatically detect the features needed for the intended task. Stochastic actor-oriented models are agent-based models. Dynamic network representation learning consists of shallow (tensor decomposition and random walk based) methods and deep learning based methods. This work explores dynamic graph neural networks in detail.

FIGURE 6: An overview of the different types of dynamic graph neural networks. This is an extension of Fig. 5 where we zoom in on graph neural networks. Different models are first grouped by which type of network they encode (pseudo-dynamic, edge-weighted, discrete or continuous). Discrete models are grouped by whether the structural layers and temporal layers are stacked or integrated into one layer. Continuous models are grouped by how they encode temporal patterns.

FIGURE 7: Timeline of dynamic graph models and dynamic graph neural networks. The timeline shows the first dynamic network model of each type from Fig. 5 and significant representation learning models leading up to the first DGNN. After the first DGNNs (GCRN-M1 and GCRN-M2 [69]) in Dec 2016, only DGNNs are marked on the timeline. DGNNs are marked by the month they were first publicised, as they appeared in tight succession. The timeline indicates when a model was first publicised (the timeline may therefore show a different year than that in the citation if the paper was pre-published).

FIGURE 8: Stacked DGNN structure from Manessi et al. [76]. The graph convolution (GC) layers encode the graph structure in each snapshot, while the LSTMs encode temporal patterns.
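The stacked structure described in this caption can be written compactly. The following is a generic sketch of a stacked DGNN rather than the exact formulation of Manessi et al. [76]; the symbols $A^t$, $X^t$, $Z^t$, $z_j^t$ and $h_j^t$ are notation assumed here for illustration, and the LSTM cell state is omitted for brevity.

```latex
\begin{aligned}
Z^t   &= \mathrm{GC}(A^t, X^t),             && t = 1, \dots, T \\
h_j^t &= \mathrm{LSTM}(z_j^t,\, h_j^{t-1}), && j = 1, \dots, n
\end{aligned}
```

Here $A^t$ and $X^t$ are assumed to be the adjacency matrix and node features of snapshot $t$, $z_j^t$ is the row of $Z^t$ belonging to node $j$, and $h_j^t$ is the hidden state of node $j$ after snapshot $t$. The structural encoding is computed independently per snapshot, and only the recurrent layer carries information across time.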

FIGURE 9: Integrated DGNN structure of EvolveGCN with an EGCU-O layer [16]. The EGCU-O layer consists of the GC (graph convolution) and the W-LSTM (LSTM for GC weights). The W-LSTM is used to initialize the weights of the GC.

$\ldots \cdot 2\,(\mathrm{AUC}_{prev} - 0.5) \qquad (28)$

where $\mathrm{PRAUC}_{new}$ is the PRAUC score of new links and $\mathrm{AUC}_{prev}$ is the AUC score of previously observed links.

TABLE 1: Dynamic network types by node dynamics and link duration, excluding special cases.

TABLE 2: Terminology of the dynamic network cube.

| Node | Temporal granularity | Node dynamics | Link duration | Precise dynamic network term |
|------|----------------------|---------------|---------------|------------------------------|
| 1 | Discrete | Node-static | Evolving | Discrete node-static evolving network |
| 2 | Discrete | Node-static | Temporal | Discrete node-static temporal network |
| 3 | Discrete | Node-dynamic | Evolving | Discrete node-dynamic evolving network |
| 4 | Discrete | Node-dynamic | Temporal | Discrete node-dynamic temporal network |
| 5 | Continuous | Node-static | Evolving | Continuous node-static evolving network |
| 6 | Continuous | Node-static | Temporal | Continuous node-static temporal network |
| 7 | Continuous | Node-dynamic | Evolving | Continuous node-dynamic evolving network |
| 8 | Continuous | Node-dynamic | Temporal | Continuous node-dynamic temporal network |
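The eight terms of the cube are the Cartesian product of its three binary dimensions. The short sketch below makes that construction explicit; the constant names and the function `cube_terms` are hypothetical and exist only for this illustration.

```python
from itertools import product

# The three dimensions of the dynamic network cube, each with two values.
TEMPORAL_GRANULARITY = ("Discrete", "Continuous")
NODE_DYNAMICS = ("node-static", "node-dynamic")
LINK_DURATION = ("evolving", "temporal")


def cube_terms():
    """Map each cube node (1 to 8) to its precise dynamic network term."""
    combos = product(TEMPORAL_GRANULARITY, NODE_DYNAMICS, LINK_DURATION)
    return {
        node: f"{granularity} {dynamics} {duration} network"
        for node, (granularity, dynamics, duration) in enumerate(combos, start=1)
    }


terms = cube_terms()
for node, term in terms.items():
    print(node, term)
```

Because `product` varies the last dimension fastest, the numbering follows the same order as the table: link duration alternates within each node-dynamics group, which in turn alternates within each temporal granularity.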

TABLE 3: Types of dynamic networks along three dimensions. Static networks and edge-weighted networks are not dynamic networks, but they are included for completeness. If we exclude special cases, we are left with two elements in each dimension.

TABLE 4: Suitable dynamic network representations for temporal and evolving networks.

TABLE 5: DGNN model types and network types. All continuous DGNNs work on specific types of networks, such as directed or knowledge networks; therefore, there are no continuous DGNNs for general-purpose dynamic networks.

TABLE 6: Deep encoders for dynamic network topology. While we note which GNN is used in each of the discrete models, it is usually trivial to replace it with another GNN.