Gaussian Embedding of Temporal Networks

Representing the nodes of continuous-time temporal graphs in a low-dimensional latent space has wide-ranging applications, from prediction to visualization. Yet, analyzing continuous-time relational data with timestamped interactions introduces unique challenges due to its sparsity. Merely embedding nodes as trajectories in the latent space overlooks this sparsity, emphasizing the need to quantify uncertainty around the latent positions. In this paper, we propose TGNE (\textbf{T}emporal \textbf{G}aussian \textbf{N}etwork \textbf{E}mbedding), an innovative method that bridges two distinct strands of literature: the statistical analysis of networks via Latent Space Models (LSM)\cite{Hoff2002} and temporal graph machine learning. TGNE embeds nodes as piece-wise linear trajectories of Gaussian distributions in the latent space, capturing both structural information and uncertainty around the trajectories. We evaluate TGNE's effectiveness in reconstructing the original graph and modelling uncertainty. The results demonstrate that TGNE generates competitive time-varying embedding locations compared to common baselines for reconstructing unobserved edge interactions based on observed edges. Furthermore, the uncertainty estimates align with the time-varying degree distribution in the network, providing valuable insights into the temporal dynamics of the graph. To facilitate reproducibility, we provide an open-source implementation of TGNE at \url{https://github.com/aida-ugent/tgne}.


Introduction
Continuous-time temporal networks arise from various sources. They have successfully been used to study communication patterns, epidemic spreading, and neuron firing, to name a few. Moreover, the interaction data is often available with a high level of detail, making it possible to model the time dimension as continuous. In that setting, any temporal interaction network can be viewed as the realization of a collection of edge-specific point processes, enabling the use of statistical methods to study the dynamics of the interactions [18].
Latent Space Models for graphs [9] are an important class of probabilistic models, where each node in the graph is embedded into a latent space, and the probabilities of links between nodes are independently distributed according to a notion of distance between node embeddings. Such dyad-independent models allow one to reliably infer unobserved node-level information based on the observation of links between the nodes. The learned node embeddings can then be used directly for downstream tasks such as clustering or link prediction [11]. Latent Space Models have been extended to make them applicable to a variety of network types, and publicly available packages allow analyzing a broad range of relational data [17].
For continuous-time temporal graphs, however, where each interaction is allowed to occur at any time stamp, the translation of Latent Space Models has not been fully explored yet. Indeed, for this type of data, the point process nature of the edge-level variables that generate the data does not allow parameterizing the nodes with simple embedding vectors.

Related Work

Dynamic Graph Layout and Diachronic Embedding. Dynamic Graph Layout [24] aims to find embedding configurations that not only represent the structural information of the graph but also maintain coherence over time. Similarly, Goel et al. [7] propose Diachronic Embedding, which embeds the nodes of a knowledge graph into a coherent sequence of latent embeddings for temporal knowledge graph completion. As detailed in the method section, we also enforce temporal coherence by specifying a Gaussian Random Walk prior distribution over the latent trajectories.
Gaussian Graph Embedding. Recent work has explored the idea of embedding nodes in a graph as Gaussian-distributed points in a latent space, with extensions to dynamic graphs [3,25]. However, the main focus of this line of research has primarily been on forecasting in discrete-time temporal graphs. In contrast, our work aims to provide a Bayesian dimensionality reduction method specifically tailored for temporal graphs in continuous time.
Point Process Models for graphs. Point Process Modeling of Temporal Graphs, particularly using Hawkes Process models, has emerged as a vibrant field of research [1,10,19,26]. These models characterize the changing rates of events in a network based on hidden representations. However, in existing models, the interaction rates are typically modulated by static representations of the nodes. Few efforts have been dedicated to combining these Point Process decoders with continuous-time representations, which is a key aspect of our work.
Temporal Graph Neural Networks. Automatically learning time-varying node feature vectors from time-stamped relational data through encoder-decoder architectures is a very active field of research, as surveyed in [12]. Such architectures are evaluated on two classes of tasks. Interpolation aims at reconstructing past events, and is mostly evaluated on knowledge graphs at a typically low time resolution. On the other hand, Temporal GNNs are typically evaluated on their ability to extrapolate to the future. In contrast, the method proposed in this paper is a dimensionality-reduction method aimed at capturing both the structure of the graph at a user-specified resolution and the uncertainty on the latent node representations.

Preliminaries
In this section, Temporal Networks are defined from a Point Process point of view. Then the Poisson Process is defined: a particular type of point process that is used in this paper as the generative distribution of the data. Finally, we summarize the Continuous Latent Position Model (CLPM) in light of this theoretical background.

Notations
Continuous-time Temporal Networks. Let U denote a set of nodes, and E ⊆ U × U a set of possible edges. In the current work, a Temporal Network is defined as a time-ordered sequence of relational events T([0, 1]) = {w_m = (i_m, j_m, t_m) | m = 1, ..., M}, where M is the number of events, 0 < t_1 < ... < t_M < 1 is an ordered sequence of pairwise distinct, positive time stamps, and i_m and j_m are the source and destination nodes, respectively. The time stamps are normalized to the interval [0, 1]. For any node pair (i, j) ∈ E, and for any 0 ≤ a < b ≤ 1, we denote by T_ij([a, b]) the set of interaction times between i and j that occur in the interval [a, b]. For each (i, j) ∈ E we define the counting function t ↦ Y_ij(t) ∈ N, equal to the number of interactions between i and j before time t. We assume that the edge-level counting functions are samples from simple point processes, and we denote by Y_ij the point process generating the counting function t ↦ Y_ij(t).
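To make the counting function concrete, here is a minimal sketch (the helper name and the toy event times are ours, not from the paper) that evaluates Y_ij(t) for one node pair from its sorted list of interaction times:

```python
import bisect

def counting_function(event_times, t):
    """Y_ij(t): number of interactions of one node pair strictly before time t.

    event_times is the sorted list of interaction times T_ij([0, 1]) for that pair,
    so Y_ij(t) = sum_i 1_{t_i < t} is just a binary search.
    """
    return bisect.bisect_left(event_times, t)

# Toy pair history with pairwise distinct, normalized time stamps.
times_ij = [0.1, 0.25, 0.4, 0.7]
print(counting_function(times_ij, 0.5))  # -> 3
```

Because the counting function only jumps at event times, `bisect_left` gives the left-continuous convention 1_{t_i < t} used in the text.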
Poisson Processes. A Poisson Point Process on the interval [0, 1] is a random variable that, when sampled from, yields a set of arrival times t_1, ..., t_m. Such a random variable is governed by its rate function λ : [0, 1] → R*_+, defined such that for any interval [a, b] ⊆ [0, 1], the expected number of arrival times that fall into [a, b] is given by the rate measure

    Λ([a, b]) = ∫_a^b λ(t) dt.

In other words, λ(t) dt can be viewed as the expected number of events occurring in the interval [t, t + dt[. For a given rate function λ : [0, 1] → R*_+, we will write Y ∼ PP(λ) to express that Y follows a Poisson Process distribution with rate function λ. The likelihood of observing the arrival times t_1, ..., t_m under a Poisson Process with rate function λ is

    p(t_1, ..., t_m) = exp(−∫_0^1 λ(s) ds) ∏_{i=1}^m λ(t_i).

Remark 1. This can also be written in the following exponential family form:

    p(t_1, ..., t_m) = exp(∫_0^1 log λ(s) dY(s) − ∫_0^1 λ(s) ds),

where Y(t) = ∑_{i=1}^m 1_{t_i < t} is the counting function representing the arrival times and ∫_0^1 log λ(s) dY(s) is the Stieltjes integral of the log rate with respect to Y. While the natural parameter of this exponential family is the function s ↦ log λ(s), Y can be interpreted as the sufficient statistic, so the canonical link function is the log in this case. The second term in the exponential is in turn the log-partition function of the distribution. This exponential form makes the Poisson Process a natural candidate as a generative model in a continuous-time extension of the Latent Space Distance Model [9].
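The likelihood above can be evaluated numerically for any rate function. The sketch below (the function name and grid size are ours) computes the Poisson Process log-likelihood on [0, 1], approximating the integral of the rate with a midpoint rule:

```python
import math

def poisson_log_likelihood(arrival_times, rate, n_grid=10_000):
    """Log-likelihood of arrivals under PP(rate) on [0, 1]:
    sum_i log rate(t_i) - integral_0^1 rate(s) ds (midpoint-rule integral)."""
    integral = sum(rate((r + 0.5) / n_grid) for r in range(n_grid)) / n_grid
    return sum(math.log(rate(t)) for t in arrival_times) - integral

# Sanity check with a homogeneous rate of 2: the log-likelihood of two
# arrivals is 2*log(2) - 2, whatever the arrival times are.
print(poisson_log_likelihood([0.3, 0.8], lambda t: 2.0))  # -> about -0.614
```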

The Continuous Latent Position Model
General Summary. The Continuous Latent Position Model (CLPM) can be summarized as follows. Let M be an embedding space (typically M = R^d with d a small latent space dimension), Z = C([0, 1], M) the set of continuous trajectories in that latent space, and C([0, 1], R*_+) the set of positive continuous functions on [0, 1]. Let p_z be a prior distribution over Z. The model supposes that the edge-level interaction times are generated independently conditioned on the latent trajectories, using

    Y_ij | z_i, z_j ∼ PP(f_{z_i, z_j}),

where f : Z × Z → C([0, 1], R*_+) is a similarity function that maps any two trajectories z, z′ to a positive rate function. Examples of such a model include the distance model, where the similarity function is given by f_{z,z′}(t) = exp(β − ||z(t) − z′(t)||²), and the dot product model, corresponding to f_{z,z′}(t) = exp(β + ⟨z(t), z′(t)⟩).
A Piecewise Linear Assumption. Rastelli and Corneli [22] propose to constrain the trajectories to be piece-wise linear to make the model tractable. The observation interval [0, 1] is partitioned into a set of K intervals I_1, ..., I_K, with change points 0 = η_0 < η_1 < ... < η_K = 1, and the latent trajectories are assumed to be linear on each interval I_k. Formally, for each node i, interval k, and coefficient s ∈ [0, 1], the latent position satisfies

    z_i((1 − s) η_{k−1} + s η_k) = (1 − s) z_i(η_{k−1}) + s z_i(η_k).

Thus, the i-th trajectory is fully determined by its successive positions at the change points {z_i(η_k) | k = 0, ..., K}, which means that only (K + 1) × d variables are needed to describe it. The positions at the cut points are referred to as critical points in the following, and denoted z_i^(k) = z_i(η_k).

Log-Likelihood of the CLPM. For each node pair (i, j) and each interval I_k, let Y_ij^(k) be the number of interactions between i and j that occur in the interval I_k. The associated random variables Y_ij^(k) are independent across node pairs and intervals, conditioned on the latent trajectories. Moreover, Y_ij^(k) only depends on the latent positions of i and j at the boundaries of the interval I_k, namely {z_i^(k−1), z_i^(k), z_j^(k−1), z_j^(k)}. The independence structure of the CLPM is summarized in Figure 1. The negative log-likelihood conditioned on the latent positions thus decomposes as follows:

    −log p(T | z) = ∑_{(i,j) ∈ E} ∑_{k=1}^K ℓ_ij^(k).

The terms in this decomposition are the following Poisson Process negative log-likelihoods:

    ℓ_ij^(k) = Λ_ij(I_k) − ∑_{t ∈ T_ij(I_k)} log λ_ij(t),   with λ_ij = f_{z_i, z_j} and Λ_ij(I_k) = ∫_{I_k} λ_ij(s) ds.

While the second term can be calculated directly from the parameters, the cumulative rate Λ_ij(I_k) is more difficult to evaluate. We describe two options for calculating this term: the closed form already described in [22], and an approximate form based on a Riemann sum.
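Under the piecewise linear assumption, a trajectory can be evaluated at any time from its critical points alone. A minimal sketch (the helper name and toy values are ours):

```python
def trajectory_position(critical_points, eta, t):
    """z_i(t) by linear interpolation between the critical points z_i(eta_0..eta_K).

    critical_points: list of K+1 positions (each a list of d floats);
    eta: change points with eta[0] = 0 and eta[-1] = 1.
    """
    for k in range(1, len(eta)):
        if t <= eta[k]:  # t falls in I_k = [eta[k-1], eta[k]]
            s = (t - eta[k - 1]) / (eta[k] - eta[k - 1])
            z0, z1 = critical_points[k - 1], critical_points[k]
            return [(1 - s) * a + s * b for a, b in zip(z0, z1)]
    return list(critical_points[-1])

eta = [0.0, 0.5, 1.0]                       # K = 2 intervals
cps = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]  # (K + 1) x d critical points
print(trajectory_position(cps, eta, 0.25))  # -> [0.5, 0.0]
```

Only the (K + 1) × d critical-point coordinates are stored; every intermediate position follows by interpolation, which is exactly why the model is tractable.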
Closed form [22]. In the particular case of the Euclidean distance model, the cumulative rate Λ_ij(I_k) admits a closed form expressed in terms of Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du, the standard Normal N(0, 1) cumulative distribution function.
While a proof of this theorem is provided in [22], in the supplementary material we provide an alternative proof that applies to any case where the log of the rate can be expressed as a second-order spline in time, i.e., such that its expression on each subsequent interval is a second-order polynomial.
Riemann approximation of the cumulative rate. In the case where the rate function is not a second-order spline, we propose to approximate the cumulative rate simply using a Riemann sum:

    Λ_ij(I_k) ≈ (|I_k| / R) ∑_{r=0}^{R−1} λ_ij(η_{k−1} + r |I_k| / R),

where R is a pre-specified resolution parameter. This approximation allows implementing an inference procedure agnostic to the type of similarity function used. For instance, using this approximation makes it easy to consider different latent geometries, such as hyperbolic or spherical embeddings.
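The Riemann approximation is straightforward to implement for the distance model. The sketch below (a left Riemann sum; the function name and signature are our choices) takes the two trajectories as callables, so any similarity function or latent geometry could be swapped in:

```python
import math

def cumulative_rate_riemann(z_i, z_j, a, b, beta, R=100):
    """Left Riemann approximation of Lambda_ij([a, b]) under the distance model
    lambda_ij(t) = exp(beta - ||z_i(t) - z_j(t)||^2), with z_i, z_j callables
    returning the latent position (a list of floats) at time t."""
    step = (b - a) / R
    total = 0.0
    for r in range(R):
        t = a + r * step
        d2 = sum((u - v) ** 2 for u, v in zip(z_i(t), z_j(t)))
        total += math.exp(beta - d2) * step
    return total

# Two nodes sitting at the same point with beta = 0: the rate is 1 everywhere,
# so the cumulative rate over [0, 1] is 1.
print(cumulative_rate_riemann(lambda t: [0.0], lambda t: [0.0], 0.0, 1.0, 0.0))
```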

Method
In this section, we provide an overview of the proposed TGNE approach for performing Bayesian inference on the latent critical points, given a history of interactions.

Prior distribution
To reflect time continuity in the latent trajectories of the CLPM, a prior distribution is needed. The Gaussian Random Walk prior introduced in [22] biases the inference towards time-coherent and reasonably scaled configurations, promoting slowly evolving trajectories while faithfully representing the network's structure. This prior is defined, for any node i ∈ {1, ..., n} and time step k ∈ {0, ..., K}, as the cumulative sum of independent Gaussian increments:

    z_i^(0) = τ_0 ε_i^(0),    z_i^(k+1) = z_i^(k) + τ √(η_{k+1} − η_k) ε_i^(k+1),

where ε_i^(k) ∼ N(0, I_d). The initial scale τ_0 controls the overall spread of the latent trajectories in the embedding space.
The transition scale parameter τ governs the amount of allowed variation between consecutive time steps. Additionally, the variance of the Gaussian increments increases linearly with the step size η_{k+1} − η_k. Note that by taking infinitely small step sizes, this prior converges to a Brownian Motion in the embedding space. In our implementation, we choose a constant step size η_{k+1} − η_k = 1/K, where K is the number of steps. Moreover, we select an initial scale equal to the transition scale: τ_0 = τ. This yields two hyperparameters: the scale τ and the number of change points (ticks) K.
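The prior can be sampled directly as a cumulative sum of Gaussian increments. A minimal sketch for one node's trajectory (function name ours; constant step size 1/K as in the text):

```python
import math
import random

def sample_grw_prior(K, d, tau0, tau, seed=0):
    """One latent trajectory z^(0), ..., z^(K) under the Gaussian Random Walk prior:
    z^(0) = tau0 * eps, z^(k+1) = z^(k) + tau * sqrt(step) * eps, eps ~ N(0, I_d),
    with a constant step size 1/K."""
    rng = random.Random(seed)
    step = 1.0 / K
    z = [[tau0 * rng.gauss(0.0, 1.0) for _ in range(d)]]
    for _ in range(K):
        z.append([zk + tau * math.sqrt(step) * rng.gauss(0.0, 1.0) for zk in z[-1]])
    return z

trajectory = sample_grw_prior(K=15, d=2, tau0=1.0, tau=1.0)
print(len(trajectory))  # -> 16 critical points (K + 1)
```

A larger τ lets consecutive critical points drift further apart, which is exactly the behavior the hyperparameter section describes.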

Variational Inference on the Critical Points
The objective of TGNE is to approximate the intractable posterior distribution p(z | Y) ∝ p(Y | z) p(z) given the data Y. To achieve this, we use a mean-field variational approach, where the variational distribution factorizes over nodes and change points as a product of independent Normal distributions:

    q_ϕ(z) = ∏_{i=1}^n ∏_{k=0}^K N(z_i^(k); μ_i^(k), diag(σ_i^(k))²).

We aim to minimize the Kullback-Leibler divergence KL(q_ϕ || p(· | Y)) between the variational distribution and the posterior. This is equivalent to minimizing the negative Evidence Lower Bound (ELBO):

    −ELBO(ϕ) = KL(q_ϕ || p_z) − E_{q_ϕ}[log p(Y | z)].

The KL divergence term can be written as shown in Theorem 4.1, proved in Appendix A.4.
Theorem 4.1 The KL divergence KL(q_ϕ || p_z) between the mean-field Gaussian variational distribution and the Gaussian Random Walk prior admits a closed form, derived in Appendix A.4.

Following common practices in variational inference, the expected log-likelihood term is approximated using a single Monte-Carlo sample z ∼ q_ϕ. Reparameterization [15] allows backpropagating through the latter sampling operation, by mapping standard Normal-distributed samples to the latent space through an invertible function of the variational parameters. It is used here to obtain a differentiable loss, which can be optimized using standard gradient descent algorithms such as ADAM [14].
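The reparameterization trick amounts to writing each critical point as a deterministic function of the variational parameters and an independent standard Normal draw. A minimal sketch (function name ours; in the actual implementation this is handled by Pyro/PyTorch so that gradients flow through μ and σ):

```python
import random

def reparameterized_sample(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I_d). Because the randomness enters
    only through eps, the sample is a differentiable function of (mu, sigma)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)
z = reparameterized_sample([1.0, -2.0], [0.5, 0.5], rng)
print(len(z))  # -> 2, one coordinate per latent dimension
```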

Effect of the hyperparameters
The proposed method has four hyperparameters: the dimension d, the number of change points K, the initial scale τ_0, and the scale τ. The number of change points K controls the time resolution of the latent trajectories. It should be adapted to how frequently we expect the nodes' states to change in our dataset. The initial scale τ_0 controls the scale of the initial latent positions z_i^(0). Finally, the scale τ is a temperature parameter that controls the deviation of the latent positions between frames, namely ||z_i^(k+1) − z_i^(k)||. To illustrate its effect, in Figure 2 it can be seen that for τ = 50.0 the frames are not constrained to be close to each other, and the latent positions can change drastically between frames. On the other hand, for τ = 1.0, the latent positions are constrained to be close to each other, and the frames are more similar to each other.

Implementation
We implemented our method in Pyro, a PyTorch-based probabilistic programming language [2]. This effect-handler-oriented programming language allows one to define the model as a Python function. The execution trace of the function can then be read and decorated by effect handlers, allowing one to define high-level probabilistic operations such as conditioning, or to perform Stochastic Variational Inference. To optimize the variational parameters ϕ and the bias term β, we use the ADAM algorithm [14] with learning rates γ = 0.01 and γ = 0.00001, respectively.

Scalability
We discuss two strategies to scale the method to networks with a large number of nodes: node-batching and negative sampling.
As the log-likelihood term is a sum of terms over all source nodes, node-batching can be implemented by computing the loss and gradients on a subset of the nodes at each iteration, then averaging the gradients over the whole dataset.
The log-likelihood decomposes as a sum of contributions from positive node pairs (interacting at least once) and negative node pairs that never interact. However, most of the node pairs in the network never interact, and thus the information conveyed by the negative pairs is redundant. This opens up the possibility of negative sampling, which may dramatically speed up inference on networks with many nodes. We propose the following strategy, akin to the case-control approximate likelihood introduced in [21]: for each node i, we sample a fixed number of nodes j such that (i, j) never interact in the network. We denote by P(i) the set of nodes that connect with i at least once in the event history, and by N(i) a random subset of the set of nodes that never connected with i. The log-likelihood can then be approximated as

    ℓ ≈ ∑_i [ ∑_{j ∈ P(i)} ℓ_ij + ((n − 1 − |P(i)|) / |N(i)|) ∑_{j ∈ N(i)} ℓ_ij ],

where ℓ_ij denotes the contribution of the pair (i, j), and the reweighting factor compensates for the subsampling of the negative pairs. The efficacy of the negative sampling approach can be further enhanced by tailoring the selection of negative samples to each specific interval. This refinement is intended to increase the difficulty of the negative samples, leading to a more accurate estimation of the gradient. Nevertheless, in this work we employ a single set of negative samples for all intervals, as this approach already yields satisfactory results on the considered datasets.
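The construction of P(i) and N(i) can be sketched as follows (function names and the toy event list are ours; in practice the sets would be built once from the full event history):

```python
import random

def interaction_sets(n_nodes, events):
    """P(i): the set of nodes that interact with i at least once in the history.

    events is a list of (source, destination, time) triples."""
    pos = {i: set() for i in range(n_nodes)}
    for i, j, _t in events:
        pos[i].add(j)
        pos[j].add(i)
    return pos

def sample_negatives(n_nodes, pos, i, n_neg, rng):
    """N(i): a random subset of the nodes that never connected with i."""
    candidates = [j for j in range(n_nodes) if j != i and j not in pos[i]]
    return rng.sample(candidates, min(n_neg, len(candidates)))

events = [(0, 1, 0.2), (0, 2, 0.5), (1, 2, 0.9)]
pos = interaction_sets(5, events)
neg = sample_negatives(5, pos, 0, 2, random.Random(0))
print(sorted(neg))  # a subset of [3, 4]
```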

Experiments
We performed various experiments to answer the following research questions. First, we evaluate the uncertainty of the latent positions on simulated data and on real-world datasets. Next, we qualitatively examine how the parameters of the model affect the resulting latent positions. Finally, we assess to what extent the TGNE method allows reconstructing the events of unobserved edges, based on the event history of the observed edges.

Datasets
In our experiments, we used a simulated dataset as well as four real-world datasets, which we briefly describe below. The HighSchool dataset [6] is a contact network of students in a French preparatory-class high school in Marseille. Their interactions were recorded using wearable devices over 9 days. The resulting embeddings are shown in Figure 5a. The MIT Reality Mining dataset [5] is a dataset of face-to-face contacts between participants of an experiment run by members of the MIT Media Lab. The data was collected over the course of around 9 months, between 2004 and 2005. The obtained embeddings for this dataset are shown in Figure 5b. The Workplace dataset [8] is a dataset of face-to-face contacts between employees in a workplace. Their interactions were recorded over 11 days (2013/06/24 to 2013/07/05). In this work we focus on the first day of interactions. The UCI dataset is a Facebook-like, unattributed online communication network among students of the University of California, Irvine, with timestamps at the temporal granularity of seconds. We used the preprocessed version from the recent DGB benchmark [20]. A summary of the datasets is shown in Table 1, along with the associated runtimes of the TGNE method.

Table 1: Statistics on the datasets, and associated runtime of TGNE for 500 epochs.

Example on data simulated using a Stochastic Block Model
We evaluate the estimated uncertainty of the interactions in an example simulated using a Stochastic Block Model (SBM), where one node changes community over time while all the other nodes stay in the same community. This data generation procedure is adapted from [22], but here we focus on the uncertainty aspect. A detailed explanation of the simulation procedure is provided in the Appendix. The resulting Gaussian embeddings are shown in Figures 2a and 2b, for two sets of hyperparameters. Using a low scale parameter, the positions are located with more precision, and the trajectories evolve more smoothly between time stamps. This is to be expected, since the regularization term is stronger in that case. However, the estimated uncertainty is then uniform across nodes. In the high-scale regime, the trajectories evolve more freely between frames, as the between-frame regularization is weaker. However, the uncertainty of the node that changes community is higher than the uncertainty of the other nodes, as expected. In Figure 2c we show the evolution of the uncertainty of the node that changes community over time, for different values of the scale parameter.

Uncertainty evaluation
We leverage the probabilistic nature of the TGNE method to analyze the model uncertainty.
Node-level uncertainty. The TGNE method outputs a Gaussian distribution for each node at each individual time stamp. Thus, the uncertainty around the latent positions can be naturally measured through the scale of the variational Normal distribution. While there are multiple potential sources of uncertainty, we empirically assess the impact of the node degree on the uncertainty of the latent positions. Namely, for each node i and each sub-interval I_k, we measure the uncertainty u(i, k) of node i on the interval I_k, and conversely calculate the number of interactions N_i(I_k) of the node on this interval. Moreover, we relate the uncertainty associated with a node on a given interval to the average Euclidean distance to its neighbors on the same interval. In order to display how these different values are related, in Figure 3 we represent the average uncertainty u(i, k) as a function of the average distance to the neighbors within the same interval.
Edge-level uncertainty. The uncertainty about the nodes' latent positions can be propagated into a notion of uncertainty on the distribution generating the temporal graph, materialized by the posterior predictive distribution defined by the Poisson Processes PP(λ̃_ij), with the random variable λ̃_ij defined as λ̃_ij = f_{z_i, z_j} for z ∼ q_ϕ. Note that here we get a distribution over the set of joint Poisson Process distributions. While evaluating this posterior predictive distribution is intractable, we can approximate it by sampling B i.i.d. samples from the latent code, i.e. z^(b) ∼ q_ϕ for b = 1, ..., B. Then, for each sample z^(b), we denote by λ_ij^(b) (respectively Λ_ij^(b)(I_k)) the rate function (respectively the cumulative rate) obtained by plugging z^(b) into the similarity function. We calculate, for each unique value of N_ij(I_k), the average uncertainty of the cumulative rate over all the edges that have N_ij(I_k) interactions in I_k. In our experiments, we found that the model uncertainty on Λ_ij(I_k) decreases with the number of interactions between i and j in I_k, denoted here N_ij(I_k).
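The Monte-Carlo propagation of uncertainty into the cumulative rate can be sketched as follows. The function name and the simplifying assumption that positions are held fixed within an interval are ours; the paper integrates the rate along the piecewise linear trajectories instead:

```python
import math
import random

def cumulative_rate_std(mu_i, sig_i, mu_j, sig_j, beta, interval_len, B=500, seed=0):
    """Monte-Carlo std of Lambda_ij(I_k) under the distance model, propagating
    the variational Gaussians of both endpoints. Simplifying assumption (ours):
    positions are constant within I_k, so Lambda = |I_k| * exp(beta - d^2)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(B):
        zi = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_i, sig_i)]
        zj = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_j, sig_j)]
        d2 = sum((a - b) ** 2 for a, b in zip(zi, zj))
        vals.append(interval_len * math.exp(beta - d2))
    mean = sum(vals) / B
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / B)

# Wider variational scales yield a larger spread of the cumulative rate.
print(cumulative_rate_std([0.0], [1.0], [1.0], [1.0], 0.0, 1.0) > 0.0)  # -> True
```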
In Figure 4b, we observe that the linear regression slope decreases with the prior scale, suggesting that a less informative prior leads to a stronger correlation between uncertainty and the number of interactions. There is no generic best choice of the regularization parameter; it depends on the task. It may, for example, be tuned using cross-validation for predictive tasks, while for unsupervised tasks it may be less straightforward to choose well. Its effect is nonetheless evident: it introduces a bias-variance trade-off between concentrating the trajectories in the latent space over time (which would increase the bias) and modeling the observed interactions in time more closely (thus increasing the variance).
Relationship between the uncertainty and the Poisson rate. In order to visualize the relationship between the Poisson rate and the learned notion of uncertainty, we select a negative sample for each positive event by swapping the destination node with a random node in the network. Then we calculate the Poisson rate for each positive event and its associated negative event, and compare it with the uncertainty propagated from the latent positions to the rate function. The results are shown in Figure 4c. In general, more extreme Poisson rates appear to be associated with lower uncertainty.

Temporal Network Reconstruction
In this experiment we assess to what extent the TGNE method can reconstruct the interactions of unobserved edges.
Task. The usual setting for evaluating Temporal Graph Embedding methods involves splitting the dataset into a train and a test set using a cutoff point in time, such that the interactions before the cutoff are the train data and those after the cutoff are the test data. This does not apply here, as the TGNE model is not inductive on the time axis. However, TGNE is well suited for Temporal Network Reconstruction: reconstructing missing interactions from the continuous-time temporal graph. Here we empirically evaluate it on this task. We split the edges in the network into train, validation, and test sets. Then, for each split and each interval I_k, we predict whether each edge e = (i, j) interacts in the interval I_k, i.e. whether there exists an interaction (i, j, t) in the history such that t ∈ I_k. For each interval and each positive edge we sample a single negative edge, thus casting the problem into a binary classification task on the node-pair/interval triplets (i, j, k).
We compare four different approaches for scoring the triplets (i, j, k). For our method, a score is calculated based on a fitted TGNE model, as the expected amount of interaction on the interval: score(i, j, k) = Λ_ij(I_k). A first baseline is derived by postulating a binary, Euclidean Latent Space Distance Model (LSDM) [9] on the interactions occurring on each interval: score(i, j, k) = σ(β − ||z_i^(k) − z_j^(k)||²), where the latent positions z_i^(k), z_j^(k) are optimized using Maximum Likelihood Estimation. A second baseline is popularity-based prediction, also named Preferential Attachment (PA): score(i, j, k) = deg(i, k) · deg(j, k), where deg(i, k) = ∑_j N_ij(I_k) is the degree of node i on interval k. Finally, we include a Random baseline (Random) that assigns a random score to each triplet. In order to discuss the regularizing effect of the prior for network reconstruction more precisely, techniques such as tensor decomposition could have been explored; however, since TGNE is derived from Latent Space Models, we stick with this class of models in this work.
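The Preferential Attachment baseline is simple enough to sketch directly (the function names and the toy dictionary layout counts[k][(i, j)] for N_ij(I_k) are ours):

```python
def degree(counts, i, k):
    """deg(i, k) = sum_j N_ij(I_k), from per-interval interaction counts stored
    as counts[k][(i, j)] with undirected, unordered pairs (a toy layout, ours)."""
    return sum(n for (u, v), n in counts[k].items() if u == i or v == i)

def pa_score(counts, i, j, k):
    """Preferential Attachment score for the triplet (i, j, k)."""
    return degree(counts, i, k) * degree(counts, j, k)

# Toy interval k = 0: pair (0,1) interacts twice, (0,2) once, (1,2) three times.
counts = {0: {(0, 1): 2, (0, 2): 1, (1, 2): 3}}
print(pa_score(counts, 0, 1, 0))  # -> 3 * 5 = 15
```

Because PA ignores the latent space entirely, it provides a useful sanity check on whether the embedding-based scores exploit more than raw popularity.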
Results. For the HighSchool and UCI datasets we use 10% of the edges as test edges and the rest as train edges. For the Workplace dataset we use 30% of the edges as test edges and the rest as train edges. The results are provided in Table 2. On the HighSchool dataset, it can be seen that while a binary Latent Space Distance Model can be used to predict the presence or absence of links, its resulting configuration of node embeddings overfits the training data, and thus does not perform well on the test data. In contrast, the embeddings obtained using TGNE perform worse on the training set but much better on the test set. This showcases the benefit of the regularization term for the predictive abilities of the model.

Discussion and Future Work
In this paper, we discussed the performance of TGNE: its ability to capture the uncertainty of the estimated latent positions, and the ability of the obtained locations to predict the occurrence of edges in successive intervals. However, some open questions remain, which we detail here as future work.
Adapt the changepoints to the density of interactions. The number K and the positions of the changepoints η_0, ..., η_K are fixed in TGNE. As all interaction times are re-scaled to lie between 0 and 1, the constant step size is fixed to η_{k+1} − η_k = 1/K. However, adapting the step size to the observed rate of events would naturally produce a more fine-grained representation of the temporal network structure in sub-intervals where more events happen. This appears to be a promising avenue for improving model efficiency.
Identifiability. In the high-scale regime (see for instance Figure 2b), there is significant rotation of the latent configurations from one frame to the next. This is because the model fails to identify the rotations of the configurations. Although this issue is partly mitigated by the effect of the prior, it could potentially be resolved through the use of a Procrustes transform applied to the configurations of trajectories.
Node Inductivity. The proposed model is transductive, meaning it is limited to the set of nodes that are provided in advance and cannot embed unobserved nodes. In contrast, an alternative approach would be to use amortization, as in seminal works such as [16], to map nodes and their context to Gaussian parameters using a parametric function. This approach would allow predicting trajectories for unobserved nodes, and would allow the resulting model to scale to millions of nodes.

Time Inductivity. As mentioned earlier, the TGNE model could be used to learn dynamics (or distributions thereof) in the embedding space, instead of directly learning a sequence of latent distributions. This would enable extrapolating the dynamics to future unobserved links. One-step-ahead Link Prediction would be a key metric to evaluate the success of such an approach.

Continuous-Time Encoder. Finally, related to the previous point, one limitation of the proposed approach is that it relies on a discrete-time encoder, since each node is essentially mapped to a sequence of Gaussian parameters. An alternative approach would be to build on [4] to embed the nodes into parameters of a joint stochastic process on the node state and the network state, using a Point Process Model as a decoder.

Conclusion
In the present work, we have presented a principled approach to Temporal Graph Embedding that leverages Variational Inference to infer latent distributions over node trajectories from an observed temporal network. This is in contrast with traditional temporal graph embedding methods, where only a trajectory of points per node is usually estimated. Our results show that when the prior distribution is not too restrictive, the uncertainty coming from this greater degree of freedom in the latent space can be partially captured in the scale parameters of the estimated Normal distributions. On top of that, the reconstruction experiment showcases the need for regularization in temporal graph embedding: it makes the obtained trajectories more easily readable visually, but also leads to better reconstruction results. Finally, estimating uncertain node positions through time could lead to other applications, such as anomaly detection.
The log of the rate writes

    log λ_ij(s) = γ_ij(t),   where s = (1 − t) η_{k−1} + t η_k,   (3)

where, denoting Δ_ij(η_k) = z_i(η_k) − z_j(η_k), γ_ij is defined as

    γ_ij(t) = β − ||(1 − t) Δ_ij(η_{k−1}) + t Δ_ij(η_k)||².

In particular, γ_ij is a second-order polynomial in t. Our goal now is to express γ_ij as the log of the density of a Normal distribution.
More precisely, let us try to write it in the form

    γ_ij(t) = a − (s − μ)² / (2σ²),   where s = (1 − t) η_{k−1} + t η_k,   (4)

for some coefficients a, μ, σ. On the one hand, developing expression (3) yields a second-order polynomial in t; on the other hand, so does developing equation (4). Identifying the coefficients of the two polynomials, we get a system of equations; solving the system for a, μ, and σ yields the closed form. We can then conclude by using two changes of variables.

A.4 KL divergence between a Gaussian Markov Chain and a product of independent Gaussians

Let x_1, ..., x_T be some random variables, and define the two distributions q and p as

    q(x) = ∏_{t=1}^T q_t(x_t),    p(x) = p_1(x_1) ∏_{t=2}^T p(x_t | x_{t−1}).

In particular, when q_t(x_t) = N(x_t; μ_t, σ_t² I_d) and p_1(x_1) = N(x_1; ν_1, τ_0² I_d), the KL divergence admits a closed form.

Figure 1: Probabilistic Graphical Model summarizing the CLPM. T^(k) = {(i_m, j_m, t_m) ∈ T | t_m ∈ I_k} is the history of interactions happening in the time interval I_k = [η_{k−1}, η_k]. z^(k) are the snapshots of latent positions at time η_k. The chunks of history T^(k) are conditionally independent given the latent positions at the boundaries of the interval I_k, namely z^(k−1) and z^(k).

Figure 2: Resulting latent positions on the Stochastic Block Model. Uncertainty is represented by the size of the markers. In the first period, the nodes are divided into two communities (circles and crosses). Then, in the second one, node 0 becomes a triangle and forms its own community. During that transition, node 0's uncertainty increases, especially when using a less informative prior (τ = 50.0). Finally, the same node 0 becomes a cross.

Figure 3: Log-log plot of the node-level uncertainty u(i, k) as a function of the average distance to the neighbors within the same interval, with (τ = 50.0, K = 15).
Figure 4: (a) Edge-level uncertainty Std(Λ̃_ij(I_k)) as a function of N_ij(I_k), with (τ = 1.0, K = 15). (b) Linear regression slope of Std(Λ̃_ij(I_k)) against N_ij(I_k), for different values of the prior scale. (c) Uncertainty vs Poisson rate on the HighSchool dataset, with (K = 15, τ = 1.0). The Poisson rate is calculated for each positive event and its associated negative event. Each event (u, v, t) is colored by the number of interactions between (u, v) in the interval I_k such that t ∈ I_k. Events with extreme Poisson rate values are associated with a low uncertainty, while intermediate Poisson rates are associated with a higher uncertainty.

Figure 5: Latent positions obtained on the Toy dataset, the HighSchool dataset, and the MIT Reality Mining dataset.

Table 2: AUC results on the different datasets, for K = 15 intervals.