Merge Tree Geodesics and Barycenters with Path Mappings

Comparative visualization of scalar fields is often facilitated using similarity measures such as edit distances. In this paper, we describe a novel approach for similarity analysis of scalar fields that combines two recently introduced techniques: Wasserstein geodesics/barycenters as well as path mappings, a branch decomposition-independent edit distance. Effectively, we are able to leverage the reduced susceptibility of path mappings to small perturbations in the data when compared with the original Wasserstein distance. Our approach therefore exhibits superior performance and quality in typical tasks such as ensemble summarization, ensemble clustering, and temporal reduction of time series, while retaining practically feasible runtimes. Beyond studying theoretical properties of our approach and discussing implementation aspects, we describe a number of case studies that provide empirical insights into its utility for comparative visualization, and demonstrate the advantages of our method in both synthetic and real-world scenarios. We supply a C++ implementation that can be used to reproduce our results.


A GEODESIC PROOF
In this section, we provide a formal proof that for two input merge trees the interpolation defined in Sec. 3 indeed forms a geodesic between the input trees.We now give a formal description of the interpolated geodesic trees to make formal arguments easier.Let (T0, f0), (T1, f1) be two merge trees, M ⊆ P(T0) × P(T1) a path mapping between T0 and T1 and let (Tα, fα) α∈(0,1) as follows.We write ℓ0, ℓ1, ℓα for ℓ f 0 , ℓ f 1 , ℓ fα .
Intuitively, for a given α, we interpolate the two trees with coefficients α and (1−α).We first interpolate all mapped paths and move the nodes on these paths such that their relative position on the path remains constant.Then, we interpolate the inserted/deleted edges from/to length zero.Note that the problem of contradicting paths described before does not arise here, as only one mapping is considered.Structurally, the interpolated tree is the supertree induced by the path mapping.To define scalars, we first interpolate the labels of the matched nodes, i.e. the start and end vertices of the matched paths.Then, we move the nodes on these paths such that their relative position stays the same.Furthermore, we contract all deleted or inserted edges to 1 − α or α of their original lengths.An example is shown in Fig. 1.
For a formal definition, recall that we say a vertex v ∈ V (T0) is present in M (and write v ∈ M ) if there is a pair of paths (p, p ′ ) ∈ M such that v is in p (and analogously for vertices of T1).Furthermore, we assume that V (T0) and V (T1) are disjoint and denote the scalar function on the union V (T0) ∪V (T1) of nodes by f .Now we define V (Tα) to be the set In the example in Fig. 1, the ellipse nodes form the first set, nodes C0, C1, F0 form the second set and E0, E1, H0 form the last set.Next, we define the edge set of Tα.For a pair of mapped paths (v1 . . .v k , u1 . . .u k ′ ) ∈ M (an example path is highlighted in Fig. 1), let s1, s2, . . ., s k+k ′ −4 be the sorted union of the inner nodes of the two paths, i.e. f (s1 ∈ M and the same way for edges of T1.In the case where v ′ is the start or end vertex of a mapped path, we have to replace it by its corresponding node in V (Tα) (the resulting path in Tα is also highlighted in the example).
As a last step, we need to define the scalar function on the new nodes.For the matched nodes (v, v ′ ) (i.e.v, v ′ are the start or end nodes on two mapped paths), we define fα((v, v ′ )) = (1−α)•f0(v)+α•f1(v ′ ).For easier notation, we also write fα(v) or fα(v ′ ) instead of fα((v, v ′ )).For a node pi on a path p1 . . .p k matched to p  Fig. 1: Merge trees T 0 and T 1 with barycenter T 0.5 .The optimal path mapping between T 0 and T 1 is illustrated by the dotted lines.Two mapped paths and interpolated path in the geodesic tree are highlighted in cyan.Scalar values and edge lengths can be read from the grid.
we define fα(pi) Note that at least one pair of paths containing the roots of both trees is in the optimal mapping and thus the recursive definition above is well-defined.Furthermore, the described tree is indeed the result of the first iteration barycenter computation in Sec. 3 when the number of inputs is two.
Based on this definition, we can now show that the merge trees Tα (0 ≤ α ≤ 1) define a geodesic between T0 and T1.Clearly, the tree Tα can be created from T0, T1 in linear time.
Recall that, for a metric d, a continuous path P = (Tα) 0≤α≤1 between two trees T0, T1 is a geodesic if its length is exactly the distance d(T1, T2) between T1 and T2.Now consider two time points s, t ∈ [0, 1].With the above definition, we can derive a path mapping Ms,t between Ts and Tt from M .We have to make a case distinction on whether the one of the two time points is 0 or 1.For 0 < s ≤ t < 1, the two trees are structurally the same (only the labels differ).We define Ms,t to be the identity mapping on the edges, which is obviously a valid path mapping.Thus, δ(Ts, Tt) ≤ c(Ms,t).Next, we determine the cost of Ms,t.
Let I and D be the inserted and deleted edges of M .We have Now consider the rest of the edges in Ts and Tt.They are constructed from a mapped path as described above.A pair of mapped paths (we assume leaf-to-root direction in a split tree) (p1..

Invalid edge
In total, we get that for each deleted or inserted edge as well as each mapped path that contributes x to c(M ) contributes (t−s)x to c(Ms,t).Thus, c(Ms,t) = (t − s)c(M ).Now consider the case where 0 = s < t.Since Tt is structurally a supertree of T0, we can define the mapping M0,t to be the embedding from T0 in Tt.Consider some inserted (in M ) edge e ∈ I. Since e / ∈ E(T0), it is also inserted in the embedding M0,t.Thus, it contributes c(0, ℓt(e)) = t • ℓ1(e) = (t − s) • ℓ1(e) to M0,t, whereas it contributes ℓ1(e) to M .A deleted (in M ) edge e ∈ D is mapped to itself in the embedding M0,t.Thus, it contributes c(ℓ0(e), ℓt(e)) = ℓ0(e) − (1 − t) • ℓ0(e) = t • ℓ0(e) = (t − s) • ℓ0(e) to M0,t, whereas it contributes ℓ0(e) to M .
For mapped paths, all arguments are analogous to the previous case.Thus, in total we again have that for each deleted or inserted edge as well as each mapped path that contributes x to c(M ) contributes (t − s)x to c(Ms,t), and therefore c(Ms,t) = (t − s)c(M ).The same holds for the case where s < t = 1 with analogous arguments.

Original
Path mapping Wasserstein Fig. 4: Exemplary time steps of the temporal reduction and reconstruction.The original time series is shown in the middle row.The top row shows the result of the path mapping geodesic, the bottom row of the Wasserstein geodesics.The path mapping reconstruction produces merge trees with a diiferent branch decomposition (to the original series) in the time steps highlighted in yellowm, which is not the case for the Wasserstein geodesics.In particular, the original trees and their Wasserstein reconstructions have the long fork structure as main branch, whereas the path mapping reconstruction has a different one.However, this does not change the fact the path mapping reconstructions are very similar to the original series (when ignoring the branch order).

B STARTING VORTEX BARYCENTERS
We now provide further screenshots of the barycenters computed on the starting vortex ensemble.Fig. 2 shows the barycenter merge trees for each possible initial candidate and both methods.For five out of six initial candidates, the Wasserstein barycenter contains two fork structures of high persistence, which is not the case in the member trees (see Fig. 4), whereas only one contains a long, non-forking edge.In contrast, all six path mapping barycenters are a good summary of the ensemble.

C COMPARISON TO CONTOUR TREE ALIGNMENTS
Next, we quickly illustrate the advantages of path mapping and Wasserstein barycenters over contour tree alignments.We computed barycenter merge trees, the contour tree alignment and the fuzzy contour tree layout for an ensemble consisting of a fixed time steps (in the late phase of the simulation, see Fig. 10) from different runs of the heated cylinder dataset.We applied topological simplification with a threshold of 2% of the scalar range.Fig. 3 shows both barycenters, the fuzzy contour tree rendering (see [30] for details) as well as a ParaView rendering of the alignment tree.While the branch decomposition layout of the fuzzy contour tree summarizes the ensemble well, the ParaView rendering reveals that the alignment tree is not a valid merge tree.This is due to a different averaging technique based on the nodes instead of arcs, branches or paths.It is therefore harder to use for further analysis tasks.

D TEMPORAL REDUCTION
In this section, we provide more detailed results for the temporal reduction on the ionization front time series.In Sec. 4, we compared the reconstructed series of the path mapping geodesics and the Wasserstein geodesics with three key frames for both methods.The quantitative comparison in terms of actual distances between the original and reconstructed trees are given in Table 1.Furthermore, the path mapping geodesics yield good reconstructions already for two key frames, since it can compute meaningful mappings even between the first and last time step.The Wasserstein geodesics need four key frames to produce a good reconstruction, since it can not map the first tree to the last one in a meaningful manner.It therefore needs the time step right before the maximum swap to correctly interpolate the first half of the series and the time step right after the maximum swap to correctly interpolate the second half.We illustrate this behavior on example time steps in Fig. 4. The path mapping reconstruction with two key frames is of similar quality to the one with three key frames (rated on a purely visual basis), whereas the Wasserstein reconstruction is significantly improved with four keyframes (visually bad interpolation as highlighted in red in in Fig. 11 do no longer happen).

Fig. 3 :
Fig. 3: Comparison of merge tree barycenters and contour tree alignments: The top left image shows the path mapping barycenter, the top right image the Wasserstein barycenter.The bottom row shows the fuzzy contour tree on the left and the ParaView rendering of the alignment tree on the right.The latter illustrates the problem of the fuzzy contour tree summarization: the ensemble representative is not a valid merge tree.

Table 1 :
Comparison of the temporal reduction results based on path mapping geodesics and Wasserstein geodesics.The first table shows the path mapping distance between the original and reconstructed merge trees for each time point and both methods.The second table depicts the Wasserstein distances.Keyframes are again highlighted in bold.