Embedding and trajectories of temporal networks

Temporal network data are increasingly available in various domains, and often represent highly complex systems with intricate structural and temporal evolutions. Due to the difficulty of processing such complex data, it may be useful to coarse grain temporal network data into a numeric trajectory embedded in a low-dimensional space. We refer to such a procedure as temporal network embedding, which is distinct from procedures that aim at embedding individual nodes. Temporal network embedding is a challenging task because we often have access only to discrete time-stamped events between node pairs, and, in general, the events occur with irregular intervals, making the construction of the network at a given time a nontrivial question already. We propose a method to generate trajectories of temporal networks embedded in a low-dimensional space given a sequence of time-stamped events as input. We realize this goal by combining the landmark multidimensional scaling, which is an out-of-sample extension of the well-known multidimensional scaling method, and the framework of tie-decay temporal networks. This combination enables us to obtain a continuous-time trajectory describing the evolution of temporal networks. We then study mathematical properties of the proposed temporal network embedding framework. Finally, we showcase the method with empirical data of social contacts to find temporal organization of contact events and loss of them over a single day and across different days.


Introduction
Many complex systems can be modeled as networks that consist of nodes and edges connecting nodes [42,53]. Network analysis has enabled us to extract useful information from data that represent relationships between elements constituting a system and provided concepts and methods with which to understand and intervene into phenomena occurring on networks such as epidemic spreading, cascading failures and synchronization. In fact, the structure of many types of networks changes over time on a timescale that is comparable with the timescale of dynamical processes occurring on networks such as epidemic processes. Examples of such networks, called temporal networks [19,27,39,65], include social contact networks whose structure evolves owing to mobility of individuals, e-mail communication networks among users, and air transportation networks in which direct commercial flights connecting a pair of airports are time-stamped and also vary over months. Research on temporal networks has revealed that the temporality of the networks changes various structural and dynamical properties of networks relative to those of static networks. For example, communities in temporal networks show much richer features than static-network counterparts, such as birth, death, split, and merge of communities, and heterogeneous distributions of lifetimes of individual communities, which are concepts that are relevant only to temporal networks [12,20]. Such dynamic community structure of temporal networks has been useful for understanding evolution of scientific research fields [22] and that of groups of senators over the history in the US [21], to name a few.
Temporal networks are intimately related to time series data. However, it seems that temporal network data analyses that have been developed in the past two decades or so have not exploited a full potential of general time series analysis, which has a long history and provides an ample repertoire of methods. Important exceptions include change-point and anomaly detection in temporal networks [17,36,58] and extraction of the discrete state of the temporal network and its transitory dynamics [32,44,57,60], which are clearly hinted on the same analyses of general time series data. We propose that this gap is because most methods for analyzing time series assume numeric data, whereas the temporal network data is network-valued by definition. Such methods include the Fourier transform, autocorrelation, curve fitting, fractal analysis, and the Lyapunov exponent, to name a few. Although one could transform a temporal network data into a numeric time series by, for example, vectorizing the adjacency matrix at each time, such a transformation is usually very high-dimensional because the number of entries of the adjacency matrix scales with O(N 2 ), where N is the number of nodes. Furthermore, temporal network data are often provided in the form of timestamped event sequences, and how one should transform such data into numeric vectors in a computationally feasible and informative manner is nontrivial.
To fill this knowledge gap, in the present study, we propose a method for generating a continuous-valued trajectory lying in a Euclidean space given temporal network data as input in the form of time-stamped event sequences; the proposed method is also directly applicable to the input given as a discrete-time sequence of adjacency matrices, which is another major representation of temporal networks [65].
Network embedding is commonly understood as embedding of nodes of a network into a low-dimensional space by preserving their pairwise similarity as much as possible [35,37,46]. Alternatively, an edge embedding method embeds edges of the given network into a low-dimensional space [43,66]. An obvious application of network embedding is visualization of networks. In addition, if network embedding preserves sufficient information of the original network data, one may be able to efficiently carry out downstream data mining tasks by working on the network represented in the embedding space, possibly exploiting machine learning algorithms. Such downstream tasks include link prediction [45], visualization [13], and anomaly detection [16]. There are various node embedding methods for temporal networks, with which nodes move over time in the embedding space [47,51,52,54,56].
In contrast to such methods, the network embedding for temporal networks that we develop in this paper does not belong to the traditional classes of network embedding such as node or edge embedding. Instead, we aim at embedding temporal networks into a Euclidean space as a single trajectory such that the network at a given point in time is mapped to a point in the embedding space. The proposed method combines elements of the mathematical modeling framework of tie-decay temporal network [69] and the landmark multidimensional scaling (LMDS), which is a dimension reduction method for general data [6]. The LMDS is an out-of-sample extension of multidimensional scaling (MDS) [23]. This particular combination of methods enables us to generate continuous-time trajectories describing structural and dynamical features of temporal network data given in the form of list of time-stamped events on network edges.
2 The proposed temporal network embedding algorithm 2.1 Tie-decay networks Consider a set of N nodes and a sequence of time-stamped contact events. A contact event, which we simply refer to as the event in the following text, is assumed to occur between a pair of individuals. Therefore, each event is a tuple (i e , j e , t e ), where i e and j e are the pair of nodes, and t e is the time of the event. We allow the event to be directed from i e to j e . We seek to embed such a data set into a low-dimensional space as a network trajectory using tie-decay networks. A tie-decay network is a continuous-time temporal network model in which the weight of the edge (i.e., its tie) increases when an event occurs on the edge, and continuously decays over time [69]. Let where H is the Heaviside function, andt is the time of the th event on edge (i, j), with 0 ≤t 1 ≤t 2 ≤ · · · . Equation (1) implies that an event increases the edge weight by 1 and that the edge weight exponentially decreases with rate α. We show in Fig. 1 an example time course of b ij (t). Next, we consider a sequence of time steps 0 ≤ t 1 < t 2 < · · · , where t is the th time when at least one event occurs in the entire network. We denote by A the adjacency matrix of the network constructed from all the events occurring at time t . The input data is then identified with a sequence of adjacency matrices {A 1 , A 2 , . . .} and the associated times {t 1 , t 2 , . . .}. We obtain (2)

Classical and landmark multidimensional scaling
The classical multidimensional scaling embeds a set of high-dimensional data points into a low-dimensional vector space while preserving the dissimilarity between any pair of data points as much as possible [8,9,23]. We consider a set of M data points, denoted by S = {x 1 , . . . , x M } in a metric space (X, d), where d is a distance measure. The MDS embeds each x i ∈ S into R k , where k is relatively small, by carrying out the following steps. First, we construct the squared distance matrix ∆ = [∆ ij ], where ∆ ij = d 2 (x i , x j ). Second, we calculate the double mean-centered dot product matrix identity matrix, and J M is the M × M matrix in which all the elements are equal to 1. Finally, we obtain the embedding coordinates in R k for x i as the ith column of the following k × M matrix: where λ i is the ith largest positive eigenvalue of D, vector v is the associated right eigenvector normalized in terms of the 2 -norm, and represents the transposition. MDS has a time complexity of O(M 3 ) because it requires eigenvalues and eigenvectors of D [38]. Therefore, it is difficult to apply MDS for large data sets. An out-of-sample extension of a dimension reduction method calculates the exact coordinates in the embedding space only for a small fraction of data points and approximates the coordinates for the remaining data points [5,18]. The LMDS is an out-of-sample extension of the classical MDS [4,6,10]. The LMDS applies the MDS to a subset of the given data points called the landmarks and then uses the landmarks to calculate the embedding coordinates for the other data points. We use S = {x 1 , . . . , x M } as the landmarks and therefore first apply MDS to S. The embedding coordinates for S are given by Eq. (3). Then, we determine the embedding coordinates of the remaining data as follows.
First, we define a k × M matrix by Second, we calculate where δ i is the ith column of ∆. Third, we define the embedding coordinate for an arbitrary out-of-sample data point z / ∈ S by where

Embedding of tie-decay networks by landmark multidimensional scaling
We propose a method to embed tie-decay networks in continuous time into a low-dimensional Euclidean space using LMDS. The key idea is to use the networks at all the times at which the input event happens, i.e., t ∈ {t 1 , t 2 , . . .}, as landmarks and embed the networks at all the other times, i.e., t / ∈ {t 1 , t 2 , . . .}, using LDMS. The algorithm proceeds as follows. First, for a given value of α (i.e., rate of tie decay), we construct the tie-decay network from the input adjacency matrices, {A 1 , . . . , A M }, using Eq. (2), where M is the number of the unique time points at which events occur on any edge. Second, we apply classical MDS to {B(t 1 ), . . . , B(t M )}. Third, we run LDMS with {B(t 1 ), . . . , B(t M )} as landmarks as follows. We need to embed an arbitrary adjacency matrix {B(t) : t ≥ 0}, which we call interpolating points when t / ∈ {t 1 , . . . , t M }, into R k to obtain a continuous trajectory. Using the exponential tie-decay property, we obtain where ∈ {1, . . . , M } satisfies t ≤ t < t +1 . Then, we use Eq. (6) to obtain the embedding coordinates for B(t) for any t as follows: where

Distance measures for nonnegative matrices
To calculate ψ(B(t)), one needs the squared distance between the matrix B(t) and each of the M landmark matrix representations. Because d(B(t), B(t m )) = d(e −α(t−t ) B(t ), B(t m )), we should use a distance measure for networks for which we can easily calculate d(cM 1 , M 2 ) given d(M 1 , M 2 ), where M 1 and M 2 are general weighted adjacency matrices and 0 < c < 1. Here we provide three eligible distance measures.

Frobenius distance
Let ·, · be the Frobenius inner product for real-valued matrices given by where P = [p ij ] and Q = [q ij ] are a × b matrices. We define the induced Frobenius norm · by P = P, P .

Laplacian distance
In this and next subsections, we assume undirected networks. The Laplacian matrix L(t) of a tie-decay network at time t is defined by where D * (t) is the diagonal matrix whose ith diagonal entry is given by b ij (t). The unnormalized Laplacian distance between two Laplacian matrices, which we refer to as the Laplacian distance for short in the following text, is defined by where ρ k (P ) is the kth smallest eigenvalue of symmetric matrix P [15,49,60]. It should be noted that one can replace N k=1 in Eq. (15) by N k=2 because the smallest eigenvalue of a Laplacian matrix is equal to 0. For any t satisfying t ≤ t < t +1 , we obtain L(t) = e −α(t−t ) L(t ) and therefore

Bregman distance
For a pair of networks G 1 and G 2 each with N nodes, we consider the distance measure given by where f : R 2 → R ≥0 is symmetric with respect to the two arguments and satisfies the triangle inequality; L i (with i ∈ {1, 2}) is the Laplacian matrix of G i ; u k (L i ) is an eigenvector of L i that is normalized in terms of the 2 -norm and associated with eigenvalue ρ k (L i ); ·, · E is the Euclidean inner product of two vectors; m is a positive integer [30]. Using the tie-decay networks at times t and t m as the input to F , where t ≤ t < t +1 , we obtain We assume m = 2, which is known to make Eq. (17) independent of the choice of the eigenvectors [30]. An example of F is given through the Bregman divergence as follows. Let g : Ω → R be a continuously differentiable and strictly convex function, where Ω ⊆ R n . We define the Bregman divergence associated with g, which we denote by d g , by where x, y ∈ R n , and ∇g(y) denotes the gradient of g(y). If g(x) = x, x E , then d g reduces to the squared 2 -norm because where |·| 2 denotes the 2 -norm [7]. By setting f = d ·,· E , n = 1, and m = 2 in Eq. (18), we obtain the Bregman distance, denoted by d B , given by where t ≤ t < t +1 . A modification of the Bregman distance that better captures some global dissimilarities between two networks such as connectivity is given by [30]

Goodness of temporal network embedding
MDS better preserves the pairwise dissimilarity between data points if the embedding dimension, k, is larger. Furthermore, the quality of the embedding produced by MDS may depend on the distance measure. To select a distance measure and k for MDS, we use the following goodness of fit criteria. The first criterion uses the fact that embedding into the Euclidean space preserves the pairwise dissimilarity for all pairs of data points if and only if matrix D = − 1 2 H M ∆H M is positive semidefinite [2,3]. The existence and magnitude of negative eigenvalues indicate the deviation from the Euclidean behavior. Therefore, as goodness of fit, we use the ratio of the k most positive eigenvalues over the sum of the absolute values of all eigenvalues [62]. Specifically, we assume that D has at least k positive eigenvalues and define where λ 1 > · · · > λ M are the eigenvalues of D. Index g ranges between 0 and 1 and it is close to 1 when the embedding is quasi-Euclidean. Another goodness of fit criterion for MDS is given by the normalized stress function [1,8,9], denoted by σ, which is defined as Note that |ψ(x ) − ψ(x )| 2 is the distance between the two data points in the embedding space. A small σ value indicates that MDS is accurate at representing the dissimilarity structure of the original data set. Values of σ less than 0.15 have been suggested to be acceptable [8].

Time complexity of the temporal network embedding
Given a squared distance matrix ∆, the computation of the embedding coordinate, given by (9) In the case of the Laplacian distance, the time complexity for computing the Laplacian matrices of all landmarks is O(M N 2 ). To compute the distance between two landmarks, we need the eigenvalues of L(t )∀ ∈ {1, . . . , M }. Because the time complexity for calculating the eigenvalues is Finally, we consider the case of the Bregman distance. In addition to the time complexity of O(M N 2 ) for computing the Laplacian matrices of all landmarks, it takes O(M N 3 ) time to compute all the eigenvalues and eigenvectors of all the Laplacian matrices, i.e., ρ k (L(t )) and u k (L(t )), ∀k, ∀t . Given them, it takes O(M 2 N 3 ) time to compute the squared Bregman distance matrix using (21) or the squared modified Bregman distance using (22). Then, the time complexity for computing δ B (t) is O(M N 3 ). Therefore, the total time complexity of embedding

Theoretical results
In this section, we mathematically investigate two properties of the proposed embedding algorithm for temporal networks: periodicity and boundedness.

Periodic input events
We first examine the convergence of the trajectory in the embedding space when the input events periodically arrive over time. Let {A } ∞ =1 be an infinite sequence of adjacency matrices with each A encoding the new network structure at time t . We assume that {A } ∞ =1 is periodic such that A qT +k =Ã k and t qT +k = qT +t k for each q ∈ N ∪ {0} and k ∈ {1, . . . , T }, where T ∈ N is the number of time points in one cycle at which at least one event occurs, andT is the period in terms of absolute time.
In particular, the tie-decay network with the periodic input converges to the limit cycle of periodT given by where 0 ≤ t <T .
Theorem 3.2. The embedding coordinates of tie-decay networks, given by Eq. (9), with periodic input converge to a limit cycle.
Proof. Under the periodic input, Lemma 3.1 guarantees that lim q→∞ B(qT + t ) converges to the limit cycle X(t ). Furthermore, because of the continuity of the distance metric function, d, the lim q→∞ operation and d are interchangeable. Therefore, we obtain for any ∈ {1, . . . , T }. Using Eq. (29), we obtain By combining Eqs. (9) and (30), we obtain Therefore, the limit cycle in the embedding space is given by ψ(X(t )), where 0 ≤ t <T .
To illustrate Theorem 3.2 by an example, we consider a temporal network on N = 3 nodes. We give a periodic input with T = 4 input matrices defined bỹ We set t = for each ∈ N and α = 0.15. The network trajectory in the two-dimensional embedding space is shown in Fig. 2. Figures 2(a) and 2(b) show a trajectory obtained with the Frobenius and the Laplacian distance measure, respectively. As expected, both trajectories approach a limit cycle with period 4. Under the Frobenius distance measure and α = 0.15, we obtain g = 0.617 and σ = 0.32 with k = 1, g = 0.951 and σ = 0.06 with k = 2, and g > 0.999 and σ = 9.35 × 10 −15 with k = 3. We remind that g and σ are goodness of embedding measures defined by Eqs. (23) and (24), respectively. Under the Laplacian distance measure, D has only two positive eigenvalues. Therefore, we consider g and σ when k = 1, 2. When α = 0.15, we obtain g = 0.820 and σ = 0.26 with k = 1 and g > 0.999 and σ = 5.34 × 10 −15 with k = 2. Therefore, for both distance measures, we conclude that k = 2 is sufficient for embedding this tie-decay network.

Boundedness of trajectories
Next we investigate boundedness of the network trajectory, i.e., the property of the trajectory that it remains within a particular bounded domain. We assume that adjacency matrices {A } ∞ =1 arrive with regular intervals, i.e., t = τ , where τ (> 0) is a constant. However, the extension of the following results to the case of aperiodic inputs, with the minimal interval between two consecutive input adjacency matrices being equal to τ , is straightforward. We also assume that each adjacency matrix A is selected from a finite set, which we denote by {Ã m } M m=1 . In the periodic input case discussed in the previous section, we obtain M = T and A =Ã (( −1) mod T )+1 .
Now we show a boundedness of the network trajectory constructed with the Frobenius distance.
Theorem 3.4. If we use the Frobenius distance for MDS, the embedding coordinate ψ(B(t)) for any t ∈ [0, ∞) is bounded as follows: Proof. Using Eqs. (33) and (34), we obtain Next, we evaluate the distance of the embedding coordinates of B(t) from the origin. Using Eq. (9), we obtain We obtain By applying Cauchy-Schwarz inequality to Eq. (38) and using the fact that the eigenvectors of D are normalized, we obtain By combining Eqs. (36) and (39), we obtain By substituting Eq. (40) in Eq. (37), we obtain Eq. (35).
Next we show the boundedness when we use the Laplacian distance for the MDS. Recall that L(t) denotes the Laplacian matrix of the tie-decay matrix B(t).
Using Eq. (41), we obtain which proves the linearity of the map.
By applying Lemma 3.5 to the tie-decay adjacency matrix B(t) = M m=0c m (t)Ã m , we express L(t) as follows: where LÃ is the Laplacian matrix ofÃ . In the next theorem, we use the same notations as those defined in section 2.4.
Theorem 3.6. If we use the Laplacian distance for the MDS, the embedding coordinate for any t ∈ [0, ∞) is bounded as follows: where Proof. We obtain Because any eigenvalue of a Laplacian matrix is non-negative, we obtain Similarly, we obtain Similarly to the derivation of Eq. (40), we obtain By substituting Eq. (49) in Eq. (37), we obtain Eq. (44).

Trajectories of empirical contact networks
In this section, we apply the proposed temporal network embedding method to two empirical data sets of time-stamped proximity events between human individuals. Both data sets are provided by the Sociopatterns project. The recording was made every 20 seconds. We use the Laplacian distance measure for computing the distance between networks. We set α = 10 −2 . The results of the two data sets with α = 10 −3 , shown in Appendix A, are similar to those with α = 10 −2 , supporting the robustness of our method with respect to the α value.

Primary school
The Primary School data set is a temporal contact network recorded from students and teachers in a primary school in Lyon, France over two days, i.e., 10 Figs. 3(a) and 3(b) the trajectories in the two-dimensional embedding space for the first and the second day, respectively. We set α = 10 −2 . In each day, the tie-decay network is initially empty, which corresponds to (x, y) ≈ (−51.42, 14.93) in the embedding space. In both days, the trajectory resides in the first quadrant during the morning break (see the points labeled 10:30 AM in Fig. 3) and in the fourth quadrant during the lunch break (i.e., 12 PM to 2 PM). The segments of the trajectories during these time windows are relatively, while not completely, distinguishable from those for the rest of the day.
To compare between the two days, we calculate the average coordinate of the trajectory in the embedding space in each of the consecutive 30-minute windows. We show the average coordinate for the two days in Fig. 4(a). The point labeled 10 AM, for example, represents the (x, y) coordinate averaged between 10 AM and 10:30 AM. Figure 4(a) suggests that the two networks at the same point of time on the two days tend to be located close to each other in the embedding space. For example, the networks at 10:30 AM, corresponding to the morning break, on the two days are both located at (x, y) with large x and y values. As another example, the networks during lunch breaks (i.e., 12 PM to 2 PM) tend to linger around (x, y) with −10 ≤ x ≤ 20 and −3 ≤ y ≤ 0. To quantitatively examine the day-to-day consistency of the network trajectory, we compute the distance between the 30-minute average of the network at the same time between the two days, which we show in Fig. 4(b). First, we observe that the networks during the lunch and afternoon breaks are in particular similar between the two days compared to other times of the daytimes. Second, the distance between the two networks at the same time of the day tends to decrease over the course of the day. Third, the coordinate at the same time of the day is in fact close between the two days. Specifically, the average distance between an arbitrarily chosen pair of times indicated by the dots in Fig. 4(a) is equal to 16.87, which is substantially larger than the distance between the two networks at the same time of the day shown in Fig. 4(b), except at 8:30 AM. To justify our choice of the dimension of the embedding space (i.e., k = 2) and the distance measure (i.e., Laplacian distance measure), we carry out the following analysis. We show in Table 1 the values of g and σ for k = 1, 2, and 3 for the two distance measures. We find that the Laplacian distance measure yields much larger g values and much smaller σ values than the Frobenius distance measure does. This result justifies the use of the Laplacian distance. Moreover, Table 1 shows that, with the Laplacian distance, the embedding dimension of k = 2 is enough, but not k = 1. Note that the σ value with k = 2 is equal to 0.06, which is smaller than a suggested criterion value of 0.15 [8]. With k = 3, the goodness of embedding further improves compared with the case of k = 2. However, the gain with k = 3 is not large because the embedding is already good with k = 2. Therefore, we conclude that the combination of the Laplacian distance measure and k = 2 is a reasonable choice for this data set.

Hospital
The second data set is time-stamped contact network collected from a hospital ward in Lyon, France [29]. We refer to this data set as Hospital. The network is composed of 75 nodes, of which 11 nodes represent medical doctors, 35 nurses, and 29 patients. The network has 1, 139 edges and 32, 424 contacts. The data span the weekdays of a week, from 1 PM on Monday, 12/6/2010 to 2 PM on Friday, 12/10/2010.
To compare the network trajectory across days, we show in Fig. 5(a) the hourly average of the embedding coordinate. For example, the coordinate labeled 10 AM represents the coordinate of the trajectory averaged between 10 AM and 11 AM. Figure 5(a) apparently indicates that, similar to the case of the Primary School data, the network trajectory for the different days tends to visit a similar location at the same time of the day. To quantitatively investigate this property, we first compute the distance between the coordinate at a certain time and day and the coordinate at the same time of a different day. Then, given the time of the day, we average the obtained distance value over all possible pairs of the day of the week. For example, There are four tie-decay networks at 10 AM, corresponding to Tuesday, Wednesday, Thursday, and Friday. Therefore, we average the pairwise distance over the 6 possible pairs of days. We show the average distance at each time of the day in Fig. 5(b). The figure suggests that the tie-decay networks at the same time of the day are apparently close to each other across the days in the morning, but less so in the afternoon, which is opposite to the case of the Primary School data (see Fig. 4(b)). We note that all the day-to-day distance values shown in Fig. 5(b) are substantially less than 7.61, which is the average distance between an arbitrary pair of hours shown in Fig. 5(a). Therefore, we conclude that the tie-decay network in different days recurs to a similar network at the same time of the day, similar to the case of the Primary School data.
The previous study showed that contacts among nurses and patients are more abundant in the morning than the afternoon and that those among medical doctors are similar in frequency between the morning and the afternoon [29]. We verify this observation in Fig. 6(a), which shows the normalized number of contacts of different types in each hour. For normalization, we first divided the number of contacts of each type in each hourly window by the number of the days available. For example, data between 8 AM and 9AM was recorded from Tuesday to Friday. Therefore, we divided the number of contacts between 8 AM and 9 AM by four. Second, we further normalized the obtained count so that the average over the hours of the day is equal to 1, separately for each type of contact. Similar to the previous study [29], we consider three types of contacts, i.e., contacts between medical doctors, labeled MD-MD, those between a medical doctor and a nondoctor, i.e., nurse or patient, which we label MD-nonMD, and those between individuals that are not medical doctors, labeled nonMD-nonMD. Figure 6(a) indicates that the number of MD-MD contacts is relatively stable over the course of the day. In contrast, the MD-nonMD and nonMD-nonMD contacts are roughly more abundant in the morning than the afternoon.
Because the entire network trajectory is more diverse and the MD-MD contacts are relatively more frequent in the afternoon than in the morning, we hypothesize that the temporal subnetwork composed only of the MD-MD contacts varies more considerably across the days of the week than the MD-nonMD and nonMD-nonMD subnetworks. To examine this hypothesis, we analyze the trajectory of the MD-MD, MD-nonMD, and nonMD-nonMD temporal subnetworks. The union of the sets of contacts from the three subnetworks is the set of all contacts in the original data set. For each subnetwork, we embed the corresponding tie-decay network and compute its hourly average coordinate in the embedding space. Then, for each subnetwork, similar to the calculation for Fig. 5(b), we compute the average distance between pairs of coordinates at the same time of the day. Finally, we divide this number by the average distance over all the possible pairs of hours. If this normalized distance is smaller than 1, then the trajectory of the subnetwork tends to visit similar locations at the same time of the day across the days.
We show the normalized distance for the three subnetworks from 6 AM to 8 PM in Fig. 6(b). We find Day-to-day distance between the networks at the same time of the day, which is averaged over all pairs of days. We omitted the tie-decay networks before 6 AM and after 8 PM because most of the contacts (i.e., 95.6%) occur between 6 AM and 8 PM. We set α = 10 −2 .
that the day-to-day difference of the MD-MD subnetwork at the same time of the day is not particularly small, with the normalized day-to-day distance value being close to 1 across the different hours of the day (shown in blue in Fig. 6(b)). The day-to-day difference of the MD-nonMD and nonMD-nonMD subnetworks at the same time of the day is smaller than that of the MD-MD subnetwork in most times of the day. In particular, the normalized day-to-day distance for the nonMD-nonMD subnetwork is subsantially smaller than 1 throughout the day. Figure 6(a) implies that the fraction of the MD-MD contacts among all contacts is higher in the afternoon than the morning. Figure 6(b) indicates that the medical doctors form more dissimilar networks across the days than nurses and patients. These two phenomena in combination explain why the day-to-day variation in the entire tie-decay network at the same time of the day is larger in the afternoon than the morning.
We end this section by justifying the choice of the distance measure (i.e., Laplacian distance) and embedding dimension (i.e., k = 2). We show the g and σ values for the two distance measures and three values of k in Table 1. Similar to the case of the Primary School data, we find that the Laplacian distance is much superior to the Frobenius distance, that k = 2 is necessary, and that the accuracy of the embedding does not improve much with k = 3 compared to k = 2. The combination of the Laplacian distance and k = 2 is also satisfactory for embedding the three subnetworks, as we show in Table 2.

Graph classification
To demonstrate the utility of our embedding method, we apply the Support Vector Machine (SVM) to the embedding coordinates of the Primary School data to carry out a graph classification task [61,64]. We define four groups of the time of the school day depending on the activities in the school: morning break, lunch break, afternoon break, and class time. We apply the SVM to the set of the embedding coordinates of tie-decay networks averaged over each 30-minute time window (see Fig. 4) to estimate the label of the group to which each 30-minute network has. There are 36 such averaged coordinates, 18 from each of the   [13], and MDS. In the PCA and t-SNE, we compute an adjacency matrix by aggregating the events in each 30-minute time window. We then vectorize the upperdiagonal part of the adjacency matrices into N (N − 1)/2-dimensional vectors and apply the PCA or t-SNE, obtaining the embedding coordinates in a two-dimensional vector space. In the MDS, we use the same set of aggregated adjacency matrices to compute the distance between each pair of the aggregated adjacency matrices, on which we run the MDS to generate a two-dimensional representation of the adjacency matrices.
To learn the SVM, we first randomly choose 80% of the coordinates in the embedding space as the training set. The remaining 20% of the coordinates are the test set. Then, we measure the accuracy of the classifier as the number of the correctly classified 30-minute networks divided by the total number of the 30-minute networks in the test set. After repeating this procedure 10 4 times, we compute the average and the standard deviation of the accuracy. We show the accuracy values for the tie-decay networks with α = 10 −2 and 10 −3 , PCA, t-SNE, and MDS in Table 3. The mean accuracy for the different classifiers is between 0.8 and 0.9 except for the t-SNE, for  Table 3: Mean and standard deviation of accuracy of the different embedding methods in the graph classification task. We set the embedding dimension to two. We calculated the mean and standard deviation on the basis of 10 4 iterations of the graph classification task. which the mean accuracy is substantially lower (i.e., 0.65) than for the other classifiers. The accuracy for the tie-decay networks with α = 10 −3 (i.e., 0.81) is also notably lower than that for the tie-decay networks with α = 10 −2 , PCA, and MDS. These results remain similar when the embedding dimension is three, as we show in Appendix B. Then, for the embedding dimension equal to two, we ran a repeated measures ANOVA to test the null hypothesis that the mean accuracy is not different among the remaining three embedding methods, i.e., tie-decay networks with α = 10 −2 , PCA, and MDS. The critical value of the F-statistics corresponding to the significant level of p = 0.05 is approximately 1.03. Our test yielded a F-statistics value of 0.99, which is less than 1.03. Therefore, we accept the null hypothesis to conclude that our method based on the tie-decay networks with α = 10 −2 performs on par with PCA and MDS in this graph classification task.

Discussion
In the present study, we have developed a method to embed temporal network data into an embedding space as a continuous trajectory. To this end, we combined the framework of tie-decay networks and the LMDS. We then proved some mathematical properties of the proposed embedding method and applied the method to two empirical data sets of temporal social contact networks among human individuals. For both data sets, we find that the networks at the same time of the day are relatively similar across the different days compared to those at different times of the day. In addition, for the Primary School data set, the coordinates of the morning break and the lunch break in the embedding space are in particular distinguishable from the rest of the day. For the Hospital data set, we have found that the day-to-day variation of the network at the same time of the day is larger in the afternoon than the morning and that this owes to the networking behavior of the medical doctors rather than the nurses and patients.
The network embedding problem that we have addressed in this study is distinct from the more popular node embedding, which aims at mapping individual nodes into a low-dimensional space such that their pairwise similarity is preserved as much as possible. Examples of node embedding include DeepWalk [35], node2vec [43], and LINE [41]. Node embedding methods for temporal networks are also available [11,24,25].
See [50,68] for reviews of node embedding for static and temporal networks. With node embedding for temporal networks, each node moves in the embedding space as the input network varies over time. In contrast, we have proposed to embed the entire network structure at any given time into a single point in the embedding space. Thus, the sequence of networks observed over time generates a continuous-time trajectory lying in the embedding space. Node embedding for temporal networks and the embedding of a temporal network into a trajectory in the embedding space have different goals. With the latter, we drastically coarse grain the original temporal network by representing the snapshot network at each time point, B(t), by a single point in the embedding space. In this manner, we are not focusing on the individual nodes and edges, or their relationships. Rather, we aim at capturing both structural and dynamical features of the evolving networks. By construction, the distance between two points on the trajectory reflects how different the networks at two times are. This property is potentially useful for downstream tasks such as identifying macroscopic states and dynamics of the system state of temporal networks [60,72], change-point analysis [17,36,55], anomaly detection [40], systematic identification of recurrence using recurrence quantification analysis [70,72], and link prediction [11,45].
The LMDS is one among many out-of-sample extension methods for dimension reduction, and it can be regarded as an application of the Nyström approximation of the eigenvalues and eigenvectors of matrices [10]. The Nyström approximation is useful for our temporal network embedding framework because, with a suitably chosen network distance measure, we can efficiently approximate the elements of the distance matrix that are unobserved. We specifically exploited the exponentially decaying nature of B(t) in the absence of input to calculate the distance between pairs of networks at arbitrary continuous times in a manner compatible with the Nyström approximation. In fact, the LMDS is not the only Nyström-type out-of-sample extension of the MDS, and MDS is not the only dimension reduction method that allows the Nyström algorithm [10]. Therefore, our method should also be generalizable to other Nyström-type out-of-sample extensions. Examples include the landmark Isomap algorithm [4,5] (see [14] for the Nyström algorithm for the Isomap) and the landmark diffusion map based on the Nyström approximation [59].
Our temporal network embedding method has arbitrary parameters and allows methodological choices. The tie-decay rate, α, is defined by the user. The embedding results were qualitatively the same between α = 10 −2 and α = 10 −3 for the two empirical temporal networks used in this study. If one uses a large α value, the tie-decay network decays fast such that the network trajectory would quickly approach and linger near the coordinate to which the null network is mapped. In contrast, if α is small, the tie-decay network at time t reflects the input events that are far in the past. Then, the network trajectory may change too slowly and be insensitive to new input events, potentially compromising the interpretation of the results and performance of downstream tasks. Automatic determination of the α value warrants future work. The choice of the embedding dimension, k, and the network distance measure plays an important role. To this end, we evaluated two indices to assess the embedding quality, concluding that the combination of k = 2 and the Laplacian network distance is our recommendation. Generalizing this result to a larger set of goodness of fit indices [48], network distance measures [49,63,65,67], and various data of time-stamped contact events also warrants future work.
Although we focused on contact network data, one can apply the proposed framework and methods of temporal network embedding to various temporal network data. For example, dynamic functional connectivity [28,31] and chronnectome [33] aim to construct and interpret temporal networks from time-dependent correlational measurements of brain activity. Our temporal network embedding method and its possible extensions are expected to give new insights into such functional network data in neuroscience. In particular, the location and dynamics of the network trajectory in the embedding space may give us a compact summary of brain dynamics without the need of explicitly defining and detecting discrete dynamical states of the brain network or their transient dynamics [32,44,57]. State-transition dynamics has also been applied to data of cell cycle dynamics [71]. It may be interesting to apply our methods to these types of data to characterize periodicity and fluctuations around periodic dynamics in cell cycles.    3). We show in Fig. 8(a) the 30-minute average of the embedding coordinate over the two days. The figure suggests that the coordinate at the same time of the day tends to be close between the two days. We show in Fig. 8(b) the distance between the time-averaged tie-decay networks at the same time of the two days in the embedding space. We find that, at any time of the day, the distance is substantially smaller than 156.07, which is the average distance between all the pairs of 30-minute windows. These results are similar to those with α = 10 −2 (see Fig. 4). We carried out the same analysis for the Hospital temporal network, with the window size for averaging being an hour. The results shown in Fig. 9 are similar to those with α = 10 −2 (see Fig. 5). In particular, the average day-to-day distance between the tie-decay networks at the same time of the day, shown in Fig. 9(b), is substantially smaller than the average distance between an arbitrary pair of networks (i.e., 61.01), and this tendency is stronger in the morning than in the afternoon. The variability of the day-to-day distance over the course of the day for each subnetwork of the Hospital temporal network is shown in Fig. 10. The results are similar to those with α = 10 −2 (see Fig. 6).
We conclude that our temporal network embedding method yields qualitatively similar trajectories between α = 10 −3 and α = 10 −2 .  Table 6: Mean and standard deviation of accuracy of the different embedding methods in the graph classification task when the embedding dimension is three. We calculated the mean and standard deviation on the basis of 10 4 iterations of the graph classification task.

Method
Mean Standard deviation Tie-decay with α = 10 −2 0.88 0. Day-to-day distance between the networks at the same time of the day. We omit tie-decay networks before 6 AM and after 8 PM. Normalized day-to-day distance at the same time of the day for the different subnetworks. In (a), we use the same normalization as that used in Fig. 6(a).