Geodesic Learning With Uniform Interpolation on Data Manifold

Recently, with the development of deep learning for data representation and generation, how to sample on a data manifold has become a crucial research problem. In this paper, we propose a method to learn a minimizing geodesic within a data manifold. Along the learned geodesic, our method is able to generate high-quality uniform interpolations along the shortest path between two given data samples. Specifically, we use an autoencoder network to map data samples into the latent space and perform interpolation in the latent space via an interpolation network. We add prior geometric information to regularize our autoencoder for a flat latent embedding. The Riemannian metric on the data manifold is induced by the canonical metric of the Euclidean space in which the data manifold is isometrically immersed. Based on this Riemannian metric, we introduce a constant-speed loss and a minimizing geodesic loss to regularize the interpolation network to generate uniform interpolations along the learned geodesic on the manifold. We provide a theoretical analysis of our model and use image interpolation as an example to demonstrate the effectiveness of our method.

However, few works in the literature focus on interpolation along geodesics, which can be very helpful for downstream tasks such as clustering [9], classification [10] and segmentation [11]. For geodesic learning, Arvanitidis et al. [9], [10] and Chen et al. [12], [13] both present a magnification factor [14] to help find, as the learned geodesic, the shortest path that follows regions of high data density in the latent space. But these methods explore geodesics in the latent space rather than on the data space, and they lack strict mathematical guarantees for geodesics.

In this paper, we propose a method to capture the geometric structure of a data manifold and to find smooth geodesics between data samples. First, motivated by DIMAL [15], we introduce a framework combining an autoencoder with traditional manifold learning methods to explore the geometric structure of a data manifold. In contrast to existing combinations of autoencoders and GANs, we resort to classical manifold learning algorithms to obtain an approximation of geodesic distance as a constraint for the autoencoder. Using this method, our encoder can unfold the data manifold to obtain a flat latent representation.

Second, we propose a geodesic learning method that interpolates on the data manifold to establish smooth geodesics between data samples. The Riemannian metric on the data manifold can be induced by the canonical metric of the Euclidean space in which the manifold is isometrically immersed. The interpolation between data samples has [...]

[...] make the network as simple as possible, resulting in non-flat representations.

For our method, we add prior geometric information obtained by traditional manifold learning approaches to encode a flat latent representation.
Traditional non-linear manifold learning approaches such as Isomap [19], LLE [20] and LTSA [21] are classical algorithms that obtain a flat embedding by unfolding curved surfaces. We apply them to our method by adding a regularizer to the autoencoder. The loss function of the autoencoder can be written as:

L_AE = Σ_i ||x_i − D(E(x_i))||^2 + λ Σ_{i,j} ( ||E(x_i) − E(x_j)|| − D^KS_ij )^2,   (1)

where x is the input sampled from the data manifold and D^KS_ij is the approximated geodesic distance between x_i and x_j; for details we refer the interested reader to [19], [20], [21]. The encoder E and decoder D of the autoencoder are trained to minimize the above loss L_AE. With D^KS_ij as an expected approximation, the encoder is forced to train towards obtaining a flat latent representation, while the decoder is forced to learn the lost curvature information from the latent embeddings. The behaviors induced by the L_AE loss and four other autoencoder-based methods can be observed in Fig. 1: only by using our L_AE loss can the swiss-roll be flattened onto the 2-dimensional latent space. For our experiments, we choose LTSA to compute the approximated geodesic distance because its learned local geometry views the neighborhood of a data point as a tangent space to flatten the manifold. The parameter settings for the compared methods are the same as in their original papers.
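As a concrete illustration, the regularized autoencoder loss above can be sketched in NumPy. The function name and the exact weighting are our own illustrative choices, not the paper's implementation; `d_geo` stands in for the precomputed geodesic-distance estimates D^KS_ij:

```python
import numpy as np

def ae_loss(x, z, x_recon, d_geo, lam=1.0):
    """Reconstruction loss plus a regularizer pulling pairwise latent
    distances toward precomputed geodesic-distance estimates d_geo
    (e.g. from Isomap/LTSA). Illustrative sketch only."""
    # Reconstruction term: ||x - D(E(x))||^2 averaged over the batch.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Pairwise Euclidean distances between latent codes z = E(x).
    diff = z[:, None, :] - z[None, :, :]
    z_dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Penalize deviation from the approximated geodesic distances.
    geo = np.mean((z_dist - d_geo) ** 2)
    return recon + lam * geo
```

Minimizing the second term pushes the encoder toward an embedding whose Euclidean distances match the manifold's geodesic distances, i.e. a flat unfolded representation.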

In our model, we denote by X a data manifold. The Riemannian metric on X can be induced from the canonical metric on the Euclidean space R^N, which guarantees that the immersion is an isometric immersion. Thus, to obtain a geodesic on the manifold X, we can use the Riemannian geometry of R^N and the characteristics of the isometric immersion.
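For reference, the induced (pullback) metric of an isometric immersion i: X → R^N can be written explicitly; this is the standard formula, stated here for completeness:

```latex
g_p(u, v) \;=\; \big\langle \mathrm{d}i_p(u),\, \mathrm{d}i_p(v) \big\rangle_{\mathbb{R}^N},
\qquad u, v \in T_pX .
```

Under this metric, lengths and angles measured on X agree with those measured in the ambient Euclidean space, so the immersion is isometric by construction.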

We produce geodesics on the manifold X by interpolating in the latent space and decoding the interpolants into the data space. The simplest interpolation is linear interpolation, z = (1 − t) · z_1 + t · z_2. For geodesic learning, however, linear interpolation is not applicable in most situations. Yang et al. [9] propose to use the restricted class of quadratic functions, and Chen et al. [12] employ a neural network to parameterize the geodesic curves. We use polynomial functions similar to Yang's approach as our interpolation network. The difference is that we employ cubic functions to parameterize the interpolants, considering the diversity of latent representations, i.e., c(t) = at^3 + bt^2 + ct + d. Therefore, a curve generated by our interpolation network has four m-dimensional free parameter vectors a, b, c and d, where m is the dimension of the latent coordinates. In practice, we train a geodesic curve c(t) that connects two pre-specified points z_0 and z_1, so the function is constrained to satisfy c(0) = z_0 and c(1) = z_1. We initialize our interpolation network by setting a = 0 and b = 0, so that the initial interpolation is a linear interpolation, and perform the optimization using gradient descent. More details can be found in Yang's paper [9].

Theorem 1: Suppose γ : I → X is a geodesic on X. Then ||γ′(t)||^2 is a constant, for all t ∈ I.
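The endpoint constraints c(0) = z_0 and c(1) = z_1 can be enforced by eliminating two of the four coefficient vectors. A minimal NumPy sketch (variable names are ours; the paper's network may differ in implementation details):

```python
import numpy as np

def cubic_curve(t, z0, z1, a, b):
    """c(t) = a t^3 + b t^2 + c t + d with the endpoint constraints
    enforced by eliminating d = z0 and c = z1 - z0 - a - b.
    a and b are the free m-dimensional parameters; a = b = 0
    recovers plain linear interpolation."""
    d = z0
    c = z1 - z0 - a - b
    t = np.asarray(t)[..., None]  # broadcast t over the latent dimension m
    return a * t**3 + b * t**2 + c * t + d
```

Because the constraints are built into the parameterization, gradient descent over a and b can never move the curve's endpoints.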

As stated in Theorem 1, the length of the tangent vector along a geodesic is constant. Let G(t) = D(c(t)) denote the output of the decoder taking the interpolation curve as input. We design the following constant-speed loss:

L_cs = (1/n) Σ_{i=1}^{n} ( ||G′(t_i)||^2 − s̄ )^2,

where n denotes the number of sampling points and s̄ = (1/n) Σ_{j=1}^{n} ||G′(t_j)||^2 is the average squared speed over the sampled points.
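A finite-difference sketch of such a constant-speed penalty is below. The exact functional form in the paper may differ; this version penalizes the variance of the squared speeds along the sampled curve:

```python
import numpy as np

def constant_speed_loss(G, dt):
    """G: (n, N) array of decoded curve samples G(t_i) at uniform steps dt.
    Penalizes deviation of the squared speed ||G'(t_i)||^2 from its mean
    (an illustrative variance-style penalty)."""
    G_dot = (G[2:] - G[:-2]) / (2 * dt)        # central differences for G'(t_i)
    speed_sq = np.sum(G_dot ** 2, axis=1)      # ||G'(t_i)||^2 at interior points
    return np.mean((speed_sq - speed_sq.mean()) ** 2)
```

A curve traversed at constant speed incurs zero loss, while a curve that accelerates or decelerates is penalized.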

After guaranteeing that the output curve G(t) of our decoder has a constant speed, we need the curve G(t) to be a geodesic. The following theorem gives a condition ensuring that a curve is a geodesic.

Theorem 2: Suppose X is isometrically immersed in R^N and γ : I → X is a curve on X. Then γ(t) is a geodesic on X if and only if its second derivative γ″(t) in R^N is orthogonal to the tangent space T_{γ(t)}X.

Theorem 2 shows that a curve is a geodesic if and only if its second derivative with respect to the parameter t is orthogonal to the tangent space. In practice, we can assume our encoder maps a point on the manifold X to its local coordinates.
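Theorem 2 can be checked numerically on a familiar example of our own choosing: a great circle on the unit sphere is a known geodesic, and its second derivative points radially inward, hence orthogonal to the sphere's tangent plane:

```python
import numpy as np

def great_circle(t):
    """Unit-speed great circle in the equatorial plane of the unit sphere."""
    return np.array([np.cos(t), np.sin(t), 0.0])

def second_derivative(t):
    """Analytic second derivative of the great circle: -gamma(t) (radial)."""
    return -great_circle(t)

def tangent_basis(t):
    """Two vectors spanning the sphere's tangent plane at gamma(t)."""
    u = np.array([-np.sin(t), np.cos(t), 0.0])  # along the circle
    v = np.array([0.0, 0.0, 1.0])               # towards the pole
    return u, v
```

At every t, the second derivative has zero inner product with both tangent directions, exactly the orthogonality condition of Theorem 2.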

So, based on Theorem 2, we optimize the following problem as the geodesic loss:

L_geo = Σ_{i=1}^{n} || D_{c(t_i)}^T G″(t_i) ||^2,

where D is the decoder function and D_{c(t)} is an N × m matrix corresponding to ∂h/∂z|_{γ(t)} in Theorem 2. The geodesic loss and the constant-speed loss jointly force the curve G(t) to have zero acceleration; that is, G(t) is a geodesic on the data manifold. But the geodesic connecting two points may not be unique, as with geodesics on a sphere. The minimizing geodesic is the curve of minimal length connecting two points. Thus, we add a minimizing length constraint to ensure that G(t) is a minimizing geodesic. We approximate the curve length by summing the velocities at the t_i. The minimizing loss minimizes the curve length:

L_min = Σ_{i=1}^{n} ||G′(t_i)|| Δt.

For implementation, we use the following difference approximations to reduce the computational burden:

G′(t_i) ≈ ( G(t_i + Δt) − G(t_i − Δt) ) / (2Δt),
G″(t_i) ≈ ( G(t_i + Δt) − 2G(t_i) + G(t_i − Δt) ) / Δt^2.   (6)

For D_{c(t)}, we can use the Jacobian of the decoder as implemented in PyTorch, or a difference approximation similar to that for G′(t). To summarize this part, the overall loss function of our interpolation network is:

L_total = λ_1 L_cs + λ_2 L_geo + λ_3 L_min,   (7)

where λ_1, λ_2 and λ_3 are weights balancing the three losses. Under this loss constraint, we can generate interpolations moving along the minimizing geodesic with constant speed and thus fulfill a uniform interpolation.
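The finite-difference computation of the geodesic and minimizing terms can be sketched as follows. The loss forms are paraphrased from the text, not verbatim from the paper; `J` stands in for the decoder Jacobians D_{c(t_i)}:

```python
import numpy as np

def interpolation_losses(G, J, dt):
    """G: (n, N) decoded samples G(t_i); J: (n, N, m) decoder Jacobians
    D_{c(t_i)}; dt: uniform time step. Returns (geodesic, length) terms."""
    G_dot = (G[2:] - G[:-2]) / (2 * dt)              # G'(t_i), central difference
    G_ddot = (G[2:] - 2 * G[1:-1] + G[:-2]) / dt**2  # G''(t_i), second difference
    # Geodesic term: G''(t_i) should be orthogonal to the tangent space,
    # i.e. to the column space of the decoder Jacobian -> penalize D^T G''.
    proj = np.einsum('inm,in->im', J[1:-1], G_ddot)
    geo = np.mean(np.sum(proj ** 2, axis=1))
    # Minimizing term: approximate curve length as a sum of speeds.
    length = np.sum(np.sqrt(np.sum(G_dot ** 2, axis=1))) * dt
    return geo, length
```

In a full training loop these two terms would be combined with the constant-speed loss using the weights λ_1, λ_2, λ_3.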

The overall geodesic learning algorithm is given in Algorithm 1.

In this section, we present experiments on geodesic generation and image interpolation to demonstrate the effectiveness of our method. First, we conduct experiments on 3-dimensional datasets, since their geodesics can be better visualized. We choose the semi-sphere and the swiss-roll as our data manifolds.

1) SEMI-SPHERE DATASET
We randomly sample 4,956 points from the uniform distribution on the semi-sphere. In Fig. 2, we compare our approach with other interpolation methods, i.e., [...]

Algorithm 1 Geodesic Learning
Input: the number of iterations of the interpolation network n_iter; the weight parameters λ_1, λ_2, λ_3; the time step Δt.
1: Train an autoencoder by optimizing Eq. (1);
2: Obtain latent embeddings with the encoder E of the trained autoencoder;
[...]: Obtain latent representations of the interpolations c(t_i) using the interpolation network, and decode them;
[...]: Based on Eq. (6), compute G′(t_i) and G″(t_i) with Δt;
8: Compute the overall loss L_total using Eq. (7);
[...]

[...] VAE network and their stochastic Riemannian metric. Chen's method can generate interpolations along a geodesic, but it cannot fulfill a uniform interpolation. In Fig. 2, we observe that our method can generate uniform interpolations along a fairly accurate geodesic on the semi-sphere based on the defined Riemannian metric.

We also conduct an ablation study on the semi-sphere dataset to investigate the effect of the different losses proposed in our method, including the constant-speed loss, the geodesic loss, and the minimizing loss. The estimated curve lengths are shown in Table 1. In Table 1, we set the corresponding weight of the removed loss term to 0 and compare the estimated geodesic lengths. From Table 1 we can see that even without the geodesic loss, our network can generate a shorter path, close to the real geodesic, compared with linear interpolation.

[...] method [5]. All the methods are reproduced using the same autoencoder-based network architecture as ours. Parameters are chosen as in the original papers. Fig. 3 shows the experimental results. We observe that, except for our approach, the other methods fail to generate interpolations within the data manifold.
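The semi-sphere dataset described above can be generated, for example, by normalizing Gaussian samples onto the unit sphere and folding them into the upper hemisphere. This is an illustrative sampler of our own; the paper does not state its exact sampling procedure:

```python
import numpy as np

def sample_semisphere(n, seed=0):
    """Sample n points uniformly on the unit upper semi-sphere.
    Normalized Gaussian vectors are uniform on the sphere; reflecting
    z to |z| folds them uniformly onto the upper hemisphere."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    p[:, 2] = np.abs(p[:, 2])  # fold the lower hemisphere onto the upper one
    return p
```

The reflection preserves uniformity because the hemispheres are exchanged by an isometry of the sphere.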

AAE just forces the latent distribution to be a Gaussian distribution, even though there is a trade-off between latent distribution matching and data reconstruction [23]. So there is no guarantee of a useful latent interpolation, let alone one along a geodesic. ACAI, GAIA and Chen's method try to make decoded points from linear latent interpolations remain within the data manifold, but from Fig. 3 we can see that, for a manifold with large curvature, it is hard to obtain a flat unfolded latent embedding without strong prior geometric information, which is further verified in Fig. 1. Besides, the interpolation curves of these three methods are not related to geodesics, which means we cannot find a [...]

[...] changes. This demonstrates that our geodesic learning method can fulfill a uniform interpolation along a geodesic.

We further verify the shortest-path characteristic of the geodesic for our method on both datasets above. In Fig. 6, we present a comparison of the interpolant trajectory's length with linear and geodesic interpolation for both VAE and AAE. We randomly choose 250 pairs of endpoints on the data manifold for each evaluation and approximate the trajectory's length using the summation of velocities at the t_i, as described in Section II-B3. Fig. 6 shows that our geodesic interpolation has a smaller average length and variance on both the MNIST and Fashion-MNIST datasets. This demonstrates that, compared with linear interpolation, our interpolation method makes the interpolation curve traverse a shorter path, which is the main characteristic of a geodesic.

IV. CONCLUSION
We explore the geometric structure of the data manifold by proposing a geodesic learning algorithm with uniform interpolation. We add prior geometric information to regularize our autoencoder to generate a flat unfolded latent embedding.
We also propose a constant-speed loss and a minimizing geodesic loss to interpolate along geodesics on the underlying data manifold given two endpoints. Different from existing methods, in which a geodesic is defined as the shortest path on a graph connecting data points, our model defines geodesics consistently with the definition of a geodesic in Riemannian geometry. Experiments demonstrate that our model can fulfill a uniform interpolation along the minimizing geodesics both on 3-D curved manifolds and in high-dimensional image space.

APPENDIX
PROOF OF THEOREM 1
Proof: Suppose the Riemannian metric on X is induced from R^N by the identity mapping i. Let [x^1, x^2, ..., x^N] be the Cartesian coordinate system of R^N, and let (x^1(t), x^2(t), ..., x^N(t)) be the Cartesian coordinates of γ(t). Based on the characteristics of di, we obtain the corresponding tangent vector of γ(t) in T_{γ(t)}R^N:

γ′(t) = Σ_{i=1}^{N} (dx^i(t)/dt) ∂/∂x^i |_{γ(t)}.   (10)

According to the canonical metric on R^N, we have the squared length of the tangent vector as follows:

⟨γ′(t), γ′(t)⟩ = Σ_{i=1}^{N} (dx^i(t)/dt)^2.

Combined with Eq. (10), we can deduce [...]

Therefore, based on the definition of a geodesic, γ : I → X is a geodesic on X if and only if:

∇_{γ′(t)} γ′(t) = 0.

That is,

⟨γ″(t), ∂h/∂z_j |_{γ(t)}⟩ = 0, for j = 1, ..., m.

We write it in the form of matrix multiplication:

( ∂h/∂z |_{γ(t)} )^T γ″(t) = 0.