Geodesic-Based Bayesian Coherent Point Drift

Coherent point drift is a well-known algorithm for non-rigid registration, i.e., a procedure for deforming a shape to match another shape. Despite its prevalence, the algorithm has a major drawback that remains unsolved: It unnaturally deforms the different parts of a shape, e.g., human legs, when they are neighboring each other. The inappropriate deformations originate from a proximity-based deformation constraint, called motion coherence. This study proposes a non-rigid registration method that addresses the drawback. The key to solving the problem is to redefine the motion coherence using a geodesic, i.e., the shortest route between points on a shape's surface. We also propose the accelerated variant of the registration method. In numerical studies, we demonstrate that the algorithms can circumvent the drawback of coherent point drift. We also show that the accelerated algorithm can be applied to shapes comprising several millions of points.


INTRODUCTION
N ON-RIGID registration is a procedure for deforming a shape to match another shape. Automated registration methods have actively been researched because registered shapes become a basis for organizing, reconstructing, and synthesizing a broader class of shapes [1], [2], [3].
A key to improving registration quality is to impose coherent moves on a group of points during registration. The coherent moves, also called motion coherence, effectively shrink a shape deformation space; therefore, non-rigid registration algorithms reasonably avoid superfluous deformations. The most common type of motion coherence is based on the proximity between points. The motion coherence of this type is typically introduced by imposing local rigidity on a shape surface [4], [5] and the penalty on a nonsmooth deformation field [6], [7], [8], [9], [10], [11], [12]. Apart from the coherent moves based on proximity, nonrigid registration techniques impose the coherence based on prior knowledge to register shapes involving more specific deformations, such as human face [13], [14], human hand shape and pose [15], and human body pose [16], [17], [18], [19], [20]. These types of motion coherence improve the registration accuracy for the shapes of interest; however, they limit the classes of shapes that can be registered.
Unlike registration, shape matching directly identifies corresponding points or regions without shape deformation. The prevalent methods of this type [21], [22], [23], [24] are typically based on a geodesic, i.e., the shortest route between points on a shape surface. These methods assume the isometry of a deforming object, implying that the geodesic distance between points is invariant under shape deformation. Isometry helps solve shape correspondence problems, such as the body shapes of the same person with different postures. A drawback of these methods is their scalability; isometry computation typically involves the geodesic distance between every pair of points for each shape, as suggested in [25]. Shape matching can also be performed using functional map, i.e., finding a map between functions, where each function represents a shape. A functional map implicitly encodes shape correspondence; therefore, post-processing algorithms are essential to recover point-to-point correspondences [26], [27], [28], [29].
Despite the key role of a geodesic in shape matching, geodesic-based motion coherence has not received adequate attention in shape registration; this is probably because the geodesic kernel is typically indefinite [30], and typical objective functions become ill-posed when applying the indefinite kernel to non-rigid registration. Very recently, Jousse et al. [31] proposed incorporating the geodesic exponential kernel into a non-rigid registration algorithm called Gaussian process morphable model [32]. To circumvent the indefiniteness of the kernel, they approximated the corresponding Gram matrix using a low-rank approach. Their approach provided a promising direction for future research; however, its theoretical basis is still unknown.
We propose a non-rigid registration method incorporating geodesic information into motion coherence. This modification aims to address a drawback of coherent point drift (CPD) [7]; the algorithm unnaturally deforms a shape if different parts of the shape are neighboring each other. Our registration algorithm is based on Bayesian coherent point drift (BCPD) [33], which improved CPD in terms of scalability and robustness. The main contributions of this paper are as follows: We propose geodesic-based BCPD, which utilizes geodesic exponential and Gaussian kernels. We provide a theoretical basis of Jousse's approach that converts the geodesic exponential kernel into a positive-semidefinite alternative.
We accelerate the conversion using the Nystr€ om method and the parallel runs of Dijkstra's algorithm. We accelerate geodesic-based BCPD by extending it to the BCPD++ framework [34]. The remainder of this paper is organized as follows. Section 2 summarizes related work regarding the acceleration of non-rigid registration. Section 3 describes our theoretical and practical contributions to non-rigid registration. Section 4 reports the performance of the proposed algorithms using numerical experiments. Section 5 concludes this study. The implementation of the algorithm is available at the author's GitHub repository: https://github.com/ ohirose/bcpd.

RELATED WORK
In this section, we review the acceleration methods for nonrigid registration. We classified non-rigid registration algorithms into hard-matching and soft-matching methods based on how the algorithms decide corresponding points between shapes. The hard-matching algorithms are typically based on iterative closest point (ICP) [35], which estimates point-to-point correspondences in a one-to-one manner. These algorithms have actively been developed in the field of mesh registration [4], [16], [18], [19], [36]; a 3D mesh typically comprises at least several thousands of points, and ICP's scalability is compatible with mesh registration. In contrast, soft-matching methods estimate corresponding points in a one-to-many manner [6], [7], [8], [33], [34]. Soft-matching methods often register shapes more accurately than hard-matching methods. However, a naive computation of soft matching is computationally expensive because it requires the affinity computation of every pair of points across the shapes.
Subsequently, we review the acceleration of soft-matching methods because they have recently approached hardmatching methods in terms of scalability. CPD accelerates soft-matching procedure using fast Gauss transform (FGT) [7]. However, CPD often suffers from slow convergence, particularly for the shapes comprising more than 10 5 points. This is because it switches from FGT to truncated Gauss computation near convergence to avoid the inaccuracy originating from FGT. CPD's truncated Gauss computation requires roughly the same computational cost as the naive method, as suggested by [10].
To address this issue, Golyanik et al. proposed a simulation-based registration method based on physical laws [37], [38]. A key idea of their method is to replace the soft-matching procedure with the gravity computation between points. They accelerated the gravity computation using the Barnes-Hut octree [39], in which the gravity from distant points was roughly computed. Hirose improved the scalability of CPD by accelerating the truncated Gaussian computation using a neighbor-search method with a k-D tree [33]. However, these tree-based methods become inefficient when a shape contains densely distributed points; neighboring points rapidly increase as the point distribution of a shape becomes dense.
Several approaches can be used to avoid the issue with neighbor-based methods because they only register candidates of corresponding points extracted by feature descriptors [40], [41], [42], [43]. These methods sometimes produce sparse regions of candidate points and reduce the registration quality for such areas. Hirose proposed an acceleration method called BCPD++ that is scalable to more than 10 7 points. The method divides non-rigid registration into three steps: downsampling input shapes, registering the downsampled shapes, and interpolating deformation vectors corresponding to the points removed during downsampling. The method extracts points following the uniform distribution on a shape surface, and therefore, it circumvents the issue with sparse regions. In this paper, we accelerate geodesic-based BCPD using the framework of BCPD++.

METHODS
In this section, we propose a shape registration algorithm, geodesic-based Bayesian coherent point drift (GBCPD), which incorporates a geodesic into motion coherence. In Section 3.1, we review BCPD and geodesic-based motion coherence. In Section 3.2, we propose an improved variant of motion coherence using a geodesic. In Section 3.3, we propose an efficient algorithm for computing the improved motion coherence. In Section 3.4, we propose an acceleration scheme of GBCPD, called GBCPD++. In Section 3.5, we propose a hierarchical registration strategy regarding GBCPD/ GBCPD++. In Section 3.6, we list the tuning parameters and summarize their effects.

Background
This section first reviews BCPD and motion coherence prior and then introduces the motion coherence based on the geodesic distance. We use the following notation throughout this manuscript. Notation 1.
D -the dimensionality of the space wherein a point set is embedded. N; M -the numbers of points in the target and source shapes, respectively.
. . . ; v M Þ T 2 R MÂD -a matrix that collects deformation vectors of a source shape. f sim ðÁÞ -similarity transformation. I D -the identity matrix of size D.

Definition of Non-Rigid Point Set Registration
We define the non-rigid point set registration problem as an optimization problem of finding a similarity transformation f sim and a deformation matrix V 2 R MÂD corresponding to a non-rigid deformation as follows: where L is a loss function evaluating the dissimilarity between X 2 R NÂD and T ¼ f sim ðY þ V Þ 2 R MÂD . Some formulations of non-rigid registration include soft-matching weights as additional arguments to be optimized. We omitted the weights because they can typically be represented as a function of ðV; f sim ; X; Y Þ. We note that typical definitions of L satisfy the permutation invariance, i.e., L is unaffected by the order of input points.

Bayesian Coherence Point Drift
BCPD is a non-rigid point set registration algorithm that improves CPD regarding computation time and robustness against outliers and target shape rotation [33]. BCPD formulates non-rigid registration as a Bayesian inference problem, which can also be interpreted as the optimization problem defined as Eq. (1) with the permutation invariance. We base the registration algorithm proposed herein on the BCPD algorithm owing to its efficiency and robustness.

Motion Coherence Prior
One important building block of BCPD is the prior distribution of deformation vectors. The prior distribution is called motion coherence prior because it imposes coherent moves on neighboring source points during the deformation. Motion coherence prior is defined as the Gaussian distribution of augmented deformation vectors, as follows: where > 0 is a positive constant that controls the expected length of deformation vectors, G YY 2 R MÂM is the matrix encoding motion coherence, and the symbol is the Kronecker product. Motion coherence prior also plays a key role in deriving its acceleration scheme called BCPD++ [34].

Coherence Matrix
The definition of G YY characterizes how the motion coherence prior tends to deform a shape. We call G YY a coherence matrix, which is focused in this paper. Typically, the coherence matrix G YY is defined using a similarity function K : R D Â R D ! R, as follows: The matrix G YY is guaranteed to be positive definite if K is a positive-definite kernel. The similarity function K plays a central role in motion coherence.
To confirm this, we decompose the covariance matrix in Eq. (2) into block matrices of size D Â D. The ijth block matrix among them corresponds to the covariance matrix between v i and v j , represented as follows: This equation indicates that the angle between v i and v j tends to decrease with increasing similarity Kðy i ; y j Þ. Therefore, the choice of K strongly affects registration performance. In the next section, we discuss a drawback of the Gaussian kernel, a typical kernel function of BCPD.

Drawback of a Gaussian Kernel
BCPD often fails to register specific shapes, such as two humans with different poses. Figs. 1a and 1b show two body surfaces before and after the BCPD registration, respectively. We observe that BCPD incorrectly deformed the hip and thighs of the source shape. The incorrect deformation originates from the Gaussian kernel incorporated into BCPD.
To investigate the cause, we rewrite the similarity function K as the combination of a function f : R ! R and a distance function D : R D Â R D ! R, as follows: Typically, f is defined as a bell-shaped function that has the unique global maximum at zero and that asymptotically approaches zero with increasing argument value. Then, Eqs. (3) and (4) claim that a pair of source points becomes more coherent as their distance decreases. The kernel function K is called the Gaussian kernel when D is the Euclidean distance and f is the Gaussian function, which is denoted by where b > 0 is a parameter controlling the width of the Gaussian function. We use the Gaussian function f b ðtÞ as an example of the bell-shaped function f throughout this manuscript, and hereafter we focus on the definition of a distance function D.
The Gaussian kernel often encodes inappropriate motion coherence when different body parts are neighboring each other. The blue-circled points in Fig. 1a should independently move; however, they coherently moved with each other, as shown in Fig. 1b. This inappropriate deformation is due to the small Euclidean distance across different body parts. The proximity between different body parts, such as the hip and arms, caused their incorrect deformation.

Geodesic-Based Motion Coherence
The geodesic distance is a promising alternative to the Euclidean distance; a geodesic is defined as the shortest route between two points on a shape surface. Fig. 1c shows the Euclidean and geodesic distances between blue-circled points. We observe that the geodesic distance is significantly larger than the Euclidean distance. The motion coherence between the points decreases if we replace the Euclidean distance with the geodesic distance.
Hereafter, we denote Euclidean and geodesic distances by D ðeucÞ and D ðgeoÞ , respectively. We also denote Euclidean and geodesic coherence matrices for arbitrary point set matrices A and B as follows: corresponding to the blue-circled point, where we set the parameter b to 1.0. We observe that Euclidean motion coherence spans multiple body parts, whereas geodesic motion coherence only spans the left arm.

Improved BCPD Coherence Matrix
In this section, we define an improved coherence matrix based on the geodesic distance, and propose an approach to circumventing the indefiniteness of the coherence matrix.

Surface Coherence Matrix
We propose an improved coherence matrix sharing the merits of geodesic and Euclidean distances. The Euclidean distance can encode the motion coherence across bumps and cracks on a shape surface, whereas the geodesic distance can encode the motion coherence restricted to a local surface. To benefit from both distances, we combine geodesic and Euclidean coherence matrices as follows:

YY ;
where t 2 ½0; 1 is a weight parameter. We call G YY a surface coherence matrix. Unfortunately, the matrix G YY defined above can be indefinite unless t ¼ 0. because G ðgeoÞ YY is typically an indefinite matrix. Table 1 summarizes the eigenvalues of nine surface coherence matrices, computed from TOSCA shapes [44] under b ¼ 1:0. We observe that all the minimum eigenvalues are negative, meaning that the coherence matrices are indefinite for all cases. In Section 4.1, we further discuss the indefiniteness of the matrices.
The indefiniteness of a coherence matrix violates the definition of motion coherence prior. This implies that the corresponding loss function L is unbounded, and the optimization problem represented as Eq. (1) is ill-posed. In the next section, we discuss converting the indefinite matrix into a definite one.

Optimal Positive-Semidefinite Approximation
To address the indefiniteness issue of the coherence matrix, we approximate it using a low-rank, positive-semidefinite matrix that minimizes the approximation error. That is, we solve the following optimization problem: where G 2 R MÂM is a symmetric matrix, jj Á jj F is the Frobenius norm, and S þ M;K is the set of positive-semidefinite matrices of size M satisfying rankðAÞ K. Fortunately, the optimum of the above problem is known to be represented as follows [45], [46]:Ĝ where K 0 is the number of positive eigenvalues, L þ K 0 2 R K 0 ÂK 0 is a diagonal matrix comprising the positive eigenvalues, and Q 2 R MÂK 0 is a matrix comprising corresponding eigenvectors.
Using this formula, we can reasonably convert the indefinite coherence matrix into a positive-semidefinite alternative. Furthermore, Q and L þ K 0 can directly be applied to the acceleration scheme inside the BCPD algorithm, which requires the eigendecomposition of a coherence matrix [33]. We note that this formula provides a theoretical basis for Jousse's approach [31].

Approximation Accuracy
Here, we evaluate the approximation strategy experimentally. First, we define an accuracy measure of the approximation. From the result in Section 3.2.2, we can decompose jjGjj 2 F into the fitting and error terms as follows: The listed shapes were obtained from the TOSCA dataset. Each column represents a distinct attribute, such as the shape name, the smallest eigenvalue, the largest eigenvalue, the sum of negative eigenvalues, the sum of positve eigenvalues, and the accuracy of the optimal positive-semidefinite approximation. Each shape is composed of 3,400 points except for gorilla, composed of is 2,046 points.
The decomposition justifies the following definition of an approximation accuracy measure: where L 2 R MÂM represents a diagonal matrix collecting the eigenvalues of G. The definition guarantees A G ðĜÞ 2 ½0; 1. The last column in Table 1 shows A G ðĜÞ for each shape under ðt; bÞ ¼ ð0; 1Þ. All the accuracies are greater than 0.9992, suggesting the approximation strategy is promising.

Computational Issues
When we apply the optimal positive-semidefinite approximation to the surface coherence matrix G YY 2 R MÂM , we face the following bottleneck computations: (1) finding the shortest path between every pair of points in Y and (2) solving the eigendecomposition of G YY . Both computational costs are at least OðM 2 Þ, meaning that the computations are intractable for a moderately large M. In the next section, we propose a method for circumventing these computational issues.

Fast Positive-Semidefinite Approximation
We propose an efficient method for a low-rank, positivesemidefinite approximation that is nearly optimal. The Nystr€ om method and Dijkstra's algorithm play a central role in this task for the following reasons: The eigendecomposition of G YY 2 R MÂM is unnecessary. Instead, the method eigendecomposes a significantly smaller submatrix G UU 2 R KÂK , created by a The calculation of G YY is also unnecessary. The method evaluates a significantly smaller matrix G YU 2 R MÂK instead of G YY . Dijkstra's algorithm efficiently computes G YU containing G UU . Fig. 2 illustrates how we accelerate the approximation. We refer to the BCPD algorithm involving the accelerated positive-semidefinite approximation as GBCPD. Fig. 3 shows an example application of GBCPD. We see that the BCPD issue is successfully relaxed.
Hereafter, we describe the acceleration method by dividing it into the following steps: Finally, we describe how we circumvent the numerical instability that originates from applying the Nystr€ om method to an indefinite matrix.

Graph Construction for Geodesic Computation
We define a geodesic as the shortest path between points on a surface mesh to efficiently calculate geodesics. If the surface mesh is unavailable, we construct a k-nearest-neighbor (k-NN) graph using a point set to define candidate paths. If a point belongs to the k-NNs of another point, we add an undirected edge between them, unless duplicated. We define an edge weight as the Euclidean distance between points connected by the edge.
We note that an undirected graph guarantees the symmetry of the shortest-path distance, i.e., D ðgeoÞ ½y i ; y j ¼ D ðgeoÞ ½y j ; y i for all i and j, which consequently guarantees the symmetry of the surface coherence matrix G YY . We also note that the graph construction procedure satisfies the permutation invariance, i.e., the constructed graph is unaffected by the order of input points. The computational cost of the graph construction is OðMlog MÞ because of the k-D tree search algorithm if we assume that k is a constant.

Surface Coherence Submatrix Computation
We use Dijkstra's algorithm to efficiently compute G ðgeoÞ YU involved in G YU . A single run of Dijkstra's algorithm efficiently finds the shortest paths from a point to all the other points. Substituting t in the Gaussian function f b ðtÞ with the shortest-path distances, we obtain a column of G ðgeoÞ YU . Thereafter, we compute G YU according to its definition, i.e.,  The computational cost of G YU is OðMlog MÞ if we assume the number of neighbors k and the number of points in U are constants. This is because the computational cost of Dijkstra's algorithm is OððE þ MÞlog MÞ, where E is the number of edges, and the number of edges in the k-NN graph that we construct is OðMÞ. We note that the computational cost of G UU can be ignored because of G UU & G YU . In practice, we further reduce the runtime required for G YU by parallelizing the multiple runs of Dijkstra's algorithm.

Eigendecomposition with the Nystr € om Method
We approximately eigendecompose G YY using the submatrices G YU and G UU to accelerate the positive-semidefinite approximation. Suppose K is the number of points in U.
The Nystr€ om method computes the reduced-rank eigendecomposition G YY % Q Ã L Ã Q ÃT as follows: whereQ 2 R KÂK andL 2 R KÂK are matrices, both of which are obtained from the exact eigendecomposition of G UU 1 QLQ T . We note that the Nystr€ om method can eigendecompose an indefinite matrix [47], [48], although numerical instability arises unless we carefully avoid it [49]. In Section 3.3.5, we discuss how we avoid the numerical instability. The computational cost of the Nystr€ om eigendecomposition is OðMÞ if we assume G YU and G UU are already computed and K is a constant.

Positive-Semidefinite Approximation
We extract positive eigenvalues and corresponding eigenvectors from Q Ã and L Ã according to the optimal positive-semidefinite approximation described in Section 3.2.2. That is, we compute a low-rank, positive-semidefinite approximation of G YY % QL þ K 0 Q T as follows: where K 0 is the number of positive eigenvalues inL, and L þ K 0 2 R K 0 ÂK 0 andQ 1:K 0 2 R KÂK 0 are submatrices ofL and Q corresponding to positive eigenvalues. We note that this procedure requires no additional computational cost after the Nystr€ om eigendecomposition. Therefore, we can perform the positive-semidefinite approximation of G YY in OðMlog MÞ time in total.

Implementation Issue
The approximation scheme can be inaccurate unless it is carefully implemented. As suggested in [49], when applying the Nystr€ om method to an indefinite matrix, (1) positive eigenvalues of approximately zero lead to numerical instability, and (2) the approximation error tends to increase unless the sampled points uniformly spread over the support of the data distribution.
To numerically stabilize the approximation, we impose a condition number constraint on the approximated G YY . We remove positive eigenvalues satisfying k = 1 < , where k is the kth leading eigenvalue inL, and > 0 is an acceptable condition number of G YY . We note that the condition number is scale-invariant; therefore, we can choose without considering the scale of input shapes.
Furthermore, we use the voxel-grid resampling algorithm [34] to extract U because it guarantees that sampled points uniformly distribute over the input shape. We summarize the approximation procedure that we call the fast positivesemidefinite approximation (FPSA) in Algorithm 1.

Geodesic-Based BCPD++ Algorithm
In this section, we propose GBCPD++, incorporating the surface coherence matrix G YY into the BCPD++ algorithm [34], i.e., a key technique for accelerating BCPD. To this end, we revise BCPD++ because it originates from the motion coherence prior with a positive-definite coherence matrix. For convenience, we define the following notation: N 0 ; M 0 -the numbers of points after downsampling X and Y .
In this section, we review the original BCPD++, and subsequently we propose GBCPD++, i.e., a variant of BCPD++ accepting the surface coherence matrix.

BCPD++ Acceleration
BCPD++ accelerates the non-rigid registration by dividing it into three steps: (1) downsampling input point sets, (2) registering downsampled point sets, and (3) interpolating missing deformation vectors, as shown in Fig. 4. More formally, the non-rigid point set registration defined as Eq. (1) was replaced by the following subproblems: where ' : R M 0 ÂD ! R MÂD is an interpolation function. The first subproblem is to register downsampled point sets, whereas the second subproblem is to interpolate V fromV Z such that V minimizes the loss function LðV;f sim ; X; Y Þ. The BCPD algorithm was used for the first subproblem. In the next section, we review the interpolation function used for the second subproblem.

Deformation Vector Interpolation
BCPD++ interpolates the deformation vectors of source points Y on the basis of the motion coherence prior. This prior distribution assumes that the deformation vectors of neighboring source points correlate with one another. Owing to the correlation, we can estimate the deformation vector of a removed point from the neighboring source points that remain after downsampling.
Here, we review the interpolation function ' incorporated into BCPD++. Suppose y 2 R D is a point in Y and v y 2 R D is the corresponding deformation vector. Since the motion coherence prior is a Gaussian process prior, the predictive distribution pðv y jy; Z;V Z Þ can be derived from the theory of Gaussian process regression [34]. This distribution can estimate any deformation vector v y as its predictive mean given ðy; Z;V Z Þ. The interpolation function ' is defined as a collection of the predictive mean deformation vectors regarding Y as follows: where C 2 R M 0 ÂM 0 is a diagonal matrix that controls the smoothness ofV Z , and E 2 R M 0 ÂD is a matrix that collects non-smooth displacement vectors. We can compute C and E by reformulatingV Z inside the BCPD algorithm aŝ

Low-Rank Approximation in BCPD++
The computation of Eq. (5) is still time-consuming if M 0 is relatively large. To accelerate the computation, the original BCPD++ performs low-rank approximations of G YZ and G ZZ using the Nystr€ om method as follows: where Q Ã Z and L Ã Z are matrices obtained from the Nystr€ om eigendecomposition of G ZZ . These low-rank approximations cannot be applied to GBCPD because of the indefiniteness of G UU and G ZZ , which violates the definition of the motion coherence prior. In the next section, we propose an approach to avoiding this issue.

Modified Low-Rank Approximation
We avoid the BCPD++ indefiniteness issue using the positive-semidefinite approximation described in Section 3.3. Here, we suppose Z is a subset of source points Y . By changing the order of points, we can rewrite the positivesemidefinite approximation G YY % QL þ K 0 Q T as a blockmatrix expression: where Q Z and Q Y nZ are the submatrices that divides Q into the rows corresponding to Z and the remaining rows, respectively. We note this re-ordering is always valid because the BCPD loss function L satisfies the permutation invariance, as described in Sections 3.1.1 and 3.1.2. From this expression, we obtain the following approximations: Applying these approximations and the Woodbury identity to Eq. (5), we obtain the following formula: where S ¼ Q T Z C À1 Q Z is a matrix of size K 0 Â K 0 , and I K 0 is the identity matrix of size K 0 . The computational cost ofV becomes OðMÞ if we already compute the positive-semidefinite approximation G YY % QL þ K 0 Q. Moreover, the approximation G ZZ % Q Z L þ K 0 Q T Z accelerates the BCPD algorithm during the first subproblem. Therefore, we perform the positive-semidefinite approximation at the beginning of the modified BCPD++ algorithm, i.e., GBCPD++, summarized in Algorithm 2.

Algorithm 2. Geodesic-Based BCPD++
Input: X 2 R NÂD ; Y 2 R MÂD ; N 0 ; M 0 . Output: T ¼f sim ðY þV Þ. a) Eigendecomposition: Á Compute Q 2 R MÂK 0 and L þ K 0 2 R K 0 ÂK 0 that satisfies G YY % QL þ K 0 Q T using the FPSA algorithm. b) Downsampling: Á Generate X 0 2 R N 0 ÂD and Z 2 R M 0 ÂD by applying a downsampling method to X and Y , respectively. Á Set Q Z to the rows of Q corresponding to Z. c) Registration: Á Find T Z ¼f sim ðZ þV Z Þ that matches X 0 by applying BCPD to X 0 and Z under G ZZ % Q

GBCPD++ runs in
OðMlog MÞ time if we assume N 0 ; M 0 < < M and N % M. To address this, we discuss the computational cost of each step in Algorithm 2. In the eigendecomposition step, the positive-semidefinite approximation requires OðMlog MÞ time, as described in Section 3.3. In the downsampling step, we use the voxel-grid resampling algorithm [34] to extract ðZ; X 0 Þ from ðY; XÞ, which runs in OðMÞ time if N % M. In this step, we can use the farthest point sampling [50], a more standard downsampling technique. We typically use the voxel-grid resampling because it is often much faster than the farthest point resampling in practice, especially if M; N ! 10 6 and M 0 ; N 0 ! 10 4 . In the registration step, the acceleration scheme inside the BCPD algorithm reduces the computational cost to OððN 0 þ M 0 Þlog ðN 0 þ M 0 ÞÞ, which is at most OðMlog MÞ if N % M. In the interpolation step, the computational cost is OðMÞ as described in the previous section. Therefore, the computational cost of GBCPD++ algorithm results in OðMlog MÞ.

Hierarchical Registration
We propose a hierarchical registration strategy focussing on BCPD. The BCPD algorithm contains the motion coherence parameters Q that remain fixed during a registration. Therefore, BCPD often fails a global registration under weak motion coherence or a local registration under strong motion coherence. To reduce such failures, we repeat BCPD in a coarse-to-fine manner if needed; we gradually decrease motion coherence during the repetition. More formally, we recursively define the hierarchical registration as follows: where T ðX; Y; QÞ is the BCPD registration function that generates a deformed source shape T , and the index l ! 0 reprensents how many times we repeat registration. This recursive formula becomes a coarse-to-fine registration by setting fQ ðlÞ g L l¼1 to a sequence decreasing motion coherence. We note that BCPD++/GBCPD/GBCPD++ can employ the hierarchical registration strategy.

Tuning Parameters and Shape Normalization
GBCPD/GBCPD++ requires the tuning parameters of BCPD [33] because it uses BCPD as a subroutine. In this section, we list the tuning parameters of GBCPD/GBCPD++ and describe their effects on shape registration. Table 2 summarizes the tuning parameters of the proposed algorithms and their recommended ranges, and Fig. 5 shows the relation between related algorithms and the parameters. We omitted the parameter k used for BCPD from the table because k ¼ 1 typically works the best. We also omitted the parameter L used for BCPD++ [34]; L is the number of Nystr€ om samples approximating G YY at the interpolation step. This is because L ¼ K typically works; the choice is reasonable in that the numbers of Nystr€ om The number of source points after downsampling. The smaller, the faster (although less accurate). N 0 ½2000; 50000

List of Tuning Parameters
The number of target points after downsampling. The smaller, the faster (although less accurate).
The first six parameters must be predefined. The last four parameters are acceleration parameters, and BCPD runs without them. In contrast, GBCPD and GBCPD++ always require K to avoid the indefiniteness of the coherence matrix G YY .
samples approximating G YY are the same in the registration and interpolation steps.

Remarks on Tuning Parameters
Here, we remark on the effect of the tuning parameters, leading to an insight into how to set the parameters.
Rough and Fine Registrations. The proposed algorithms can be applied to both rough and fine registrations. As shown in Fig. 6, the parameter g changes the registration trajectory; the registration becomes more dependent on the initial alignment with decreasing g, and vice versa. Therefore, we use either g ! 1 or g ¼ 0:1 according to rough or fine registration, respectively.
Surface Coherence Matrix. The parameter t > 0 activates GBCPD and avoids the issue of interest, described in Section 3.1.5. In this case, we recommend the width parameter b 2:0. Unless the issue arises, we recommend t ¼ 0, i.e., BCPD, because it is numerically stable irrespective of . In this case, we use either b ! 1 or b < 1 according to rough or fine registration, respectively. We also recommend the rank parameter K is at least 70 regardless of t and b.
Acceleration. The parameters J accelerates the computation inside GBCPD's iterative optimization, whereas the parameters N 0 and M 0 activate GBCPD++, accelerating the computation outside the iterative optimization. For 3D registration, we recommend activating the internal acceleration; J ¼ 300 typically works if we switch the acceleration method to the k-D tree search near convergence. If N and M are more than 10 5 , we recommend activating both accelerations. The acceleration methods reduce memory consumption down to the space complexity OðM þ NÞ, and therefore, the acceleration methods avoid a memory allocation error for the shapes with dense points.

Shape Normalization
The choice of shape normalization schemes affects the registration quality. We design a shape normalization having the following characteristics: (1) the normalization pre-aligns input shapes unless they are pre-aligned, and (2) it changes the shape size to ease the setting of scale-dependent parameters b and . Here, we describe how we normalize input shapes. For convenience, we define the following normalization function of a shape A: where 1 is the vector of all 1s, m B 2 R D is the mean point in a shape B, and s B is the standard deviation of the elements in B. We calculate the normalized shapes ðX;Ỹ Þ as follows: This normalization strategy prevents the initial alignment from being spoiled by a standard normalization technique when we perform fine registration.

EXPERIMENTS
In this section, we analyze the indefiniteness of G YY and evaluate the performances of the FPSA algorithm, GBCPD, and GBCPD++.

In-Depth Analysis of Indefiniteness
We analyzed the indefiniteness of geodesic coherence matrices to investigate whether the optimal positive-semidefinite approximation is crucial to our approach. The positive definiteness of the matrices might be sufficiently probable in practice; it has been reported that geodesic exponential kernels tend to be positive definite within a broad range of b [51]. Furthermore, it has also been suggested that the following extension can relax the indefiniteness issue [52], [53]: i.e., the geodesic coherence matrix parameterized by the exponent of geodesic distance q, which satisfies 0 q 2. Fig. 7 shows how b and q affect the indefiniteness of geodesic coherence matrices. Here, we computed the exact minimum eigenvalue for each of nine TOSCA shapes [44] without using the FPSA algorithm. The left graph shows that G ðgeoÞ YY is positive definite for b < 0:0125. We typically use b ! 0:1 for shape registration, and therefore, shape registration fails under a typical choice of b. Also, the right graph shows G ðgeo;qÞ YY is positive definite for q < 0:3 under b ¼ 1:0. These results suggest an advantage of the optimal positivesemidefinite approximation; the positive definiteness is independent of b, and the extension G ðgeo;qÞ YY is unnecessary.

Performance of the FPSA Algorithm
We evaluated the FPSA algorithm in terms of accuracy and runtime. We set the parameters of the surface coherence matrix G YY to ðt; bÞ ¼ ð1; 1Þ; the matrix indefiniteness was most severe at t ¼ 1.

Computational Environment
We used a Mac Mini (2018, macOS 10.14.6) with a 3.2 GHz Intel Core i7 CPU (six cores) and 64 GB RAM. We implemented  GBCPD and GBCPD++ in C and used Apple clang 11.0.0 as a C compiler. We parallelized GBCPD and GBCPD++ using OpenMP and measured the wall-clock time and CPU time.

Demonstration
We compared the exact and approximate computations of G YY using a shape containing 6,890 points in the FAUST dataset. We visualized a column of G YY corresponding to the blue-circled point by setting K ¼ 200 and ¼ 10 À3 .
The human shapes in Fig. 8 show the motion coherence values of the blue-circled point. We observe the similarity between the red regions corresponding to the large motion coherence. This similarity suggests that the FPSA algorithm adequately approximated the motion coherence values. The histogram in Fig. 8 shows the error distribution of the motion coherence values. The maximum absolute error was less than 0.04, and 96:5% of the absolute errors were less than 0.01, suggesting that the FPSA algorithm appropriately approximated the motion coherence values.

Effect of Tuning Parameters
We evaluated how K and affect approximation errors using the same shape as that in Section 4.2.2. We defined the approximation error as follows: whereĜ YY represents an approximated G YY , defined aŝ G YY ¼ QL þ K 0 Q T . We averaged the errors over ten trials for each ðK; Þ set because the FPSA algorithm depends on random numbers. We also computed the standard deviation (SD) to evaluate the numerical stability of the algorithm.
Figs. 9a and 9b show the effect of K and on the approximation errors. The average errors and SDs decreased as K increased for ¼ 10 À5 and 10 À3 . We observe that ¼ 10 À3 achieved the best for all K. In contrast, ¼ 10 À7 typically resulted in the largest error and SD for each K. These results suggest that a relatively large increases the accuracy and numerical stability of the FPSA algorithm. We also observe that ¼ 10 À1 resulted in large errors, although SDs were relatively small, suggesting that an excessively large leads to a stable but inaccurate approximation.

Computational Time
We measured the runtime of the FPSA algorithm with and without parallelization using the Lucy dataset [34]. To evaluate the scalability, we executed the FPSA algorithm for 10 point sets, downsampled from the dataset. We set ðK; Þ ¼ ð200; 10 À3 Þ. Fig. 9c shows the runtime and registration accuracy of the FPSA algorithm. The runtime slopes were close to be linear, supporting that the computational cost of the FPSA algorithm is OðMlog MÞ, described in Section 3.3. The parallelization increased CPU times owing to the multithreading overhead; however, it roughly reduced wall-clock times to 25%, suggesting that the FPSA algorithm can be parallelized efficiently.

2D Point Set Registration
In this section, we evaluate the 2D registration performance of GBCPD, CPD [7], BCPD [33], PR-GLS [9], and RPM-L2E [54]. We used the Leaf dataset [56] and the Tools 2D dataset [44]. From these data, we created an input point set by randomly sampling the black pixels for each black/white binary image. We sampled 500 and 2,000 pixels for Leaf data and Tools 2D data, respectively. For PR-GLS and RPM-L2E, we further sampled 500 pixels among the 2,000 pixels because they were more time-consuming than the other methods.

Parameter Setting
We set the registration parameters ðb; ; v; g; t; Þ to ð0:7; 500; 0:0; 3:0; 0:5; 10 À3 Þ and ð2:0; 50; 0:0; 0:1; 0:5; 10 À3 Þ for the Leaf data and Tools 2D data, respectively. We used the same acceleration parameters ðJ; KÞ ¼ ð300; 100Þ for both datasets. To compute geodesics, we constructed the 10-NN  graph from an input point cloud and removed the edges longer than 0.15 after shape normalization. PR-GLS, CPD, and BCPD share some parameters with GBCPD. For these methods, we set the shared parameters to the same values as those used for GBCPD. For RPM-L2E, we used the same parameters as those described in the article [54]. Fig. 10 shows the registration results for the Leaf data. We see that all methods registered the leaf shapes correctly; the registration quality of RPM-L2E is slightly lower than the other methods. Fig. 11 shows the registration results for the Tools 2D data. We observe the typical registration errors originating from the Gaussian motion coherence, e.g., (BCPD, B2) and (RPM-L2E, B1). In contrast, GBCPD successfully registered these shapes without the typical errors.

Registration Runtime Analysis
This section analyzes the scalability and registration accuracy of GBCPD/GBCPD++ using a synthetic dataset.

Accuracy Measure
We evaluate registration accuracy based on the root-meansquare distance (RMSD). For point sets A 2 R MÂD and B 2 R MÂD with the ground-truth point-to-point correspondences, we define the RMSD, denoted by rðA; BÞ, as follows: Then, we define an accuracy measure as follows: where T 2 R MÂD is a deformed shape. The accuracy becomes one for perfect registration, and it becomes zero if the source shape Y and the deformed shape T are identical. The accuracy becomes negative if rðX; T Þ ! rðX; Y Þ. We use this registration accuracy measure throughout this manuscript.

Test Data Generation
For a shape denoted by Y 2 R MÂD , we created a deformed shape X ¼ Y þ V 2 R MÂD preserving point-to-point correspondences, where V 2 R MÂD is a deformation matrix sampled from the motion coherence prior with the covariance matrix À1 G YY I D . Here, we defined G YY as the surface coherence matrix of the shape Y with ðt; bÞ ¼ ð1; 1:5Þ, and we set ¼ 50.  10. 2D shape registration using leaf data. Fig. 11. 2D shape registration using Tools 2D data.
Because the prior is a Gaussian distribution, we can sample a deformation matrix V 2 R MÂD via the eigendecomposition of the covariance matrix and the generation of random numbers following the standard normal distribution [55]. To accelerate the eigendecomposition, we performed the FPSA algorithm with the remaining parameters ðK; Þ ¼ ð200; 10 À3 Þ. Fig. 12 shows the original and corresponding deformed shapes.

Results
We executed GBCPD and GBCPD++ using the parameters ðb; ; v; g; t; Þ ¼ ð0:0; 1:0; 200; 0:1; 0:5; 10 À3 Þ and ðJ; K; M 0 ; N 0 Þ ¼ ð300; 200; 50000; 50000Þ. To compute geodesics, we constructed the 10-NN graph from an input point cloud and removed the edges longer than 0.15 after shape normalization. Table 3 shows the results of the runtime analysis. GBCPD++ reduced computation times for all pairs of shapes; GBCPD++ finished the computation within 200s for Asian Dragon, whereas GBCPD continued to run for over 60 minutes. Furthermore, the accuracy differences between GBCPD++ and GBCPD were small, suggesting that GBCDP++ is competitive with GBCPD regarding registration accuracy.

Parameter Setting
For all experiments in Section 4.5, we used the parameters ðb; ; v; g; t; Þ ¼ ð0:7; 100; 0; 3; 0:5; 10 À3 Þ and the acceleration parameters ðJ; K; M 0 ; N 0 Þ ¼ ð300; 200; 3000; 3000Þ. We set the parameters shared by the competitors to the same values as those used for the GBCPD. For Fast RNRR and AMM NRR, we selected the best parameters for each dataset through trial and error.

Large-Scale Analysis
We evaluated the registration accuracies using the human shapes extracted from the FAUST data [19] and the bear shapes included in the Bear3EP Aggression category of the DeformingThings4D data [59]. We randomly extracted 20 shapes for each dataset and divided them into ten source and ten target shapes, some of which are shown in Fig. 13a. Then, we registered a hundred pairs of the shapes, i.e., all combinations of source and target shapes.
Figs. 13b and 13c show the accuracy distributions for the FAUST and DeformingThings4D data, respectively. We summarize the results as follows: GBCPD, GBCPD++, CPD, BCPD, and BCPD++ outperformed Fast RNRR and AMM NRR.
BCPD worked the best for the bear shapes, suggesting that the surface coherence matrix does not always improve the registration accuracy. The registration of the human shapes was less accurate than that of the bear shapes for all methods. We will further discuss these observations in Section 4.8.

Focused Analysis
We registered the shapes whose different parts are neighboring each other to verify whether the proposed methods relax the BCPD issue. We extracted several shapes from DFAUST [60] and Space-Time Faces [61] data. Fig. 14 shows the registration results. From the figure, we see that GBCPD and GBCPD++ successfully registered the shapes unlike the other methods, suggesting an advantage of GBCPD and GBCPD++ in these cases.

Ablation Study
We performed an ablation study to investigate how the tuning parameters of G YY affect registration accuracy; we changed each of the parameters K, t, b, and , setting the other parameters to those described in Section 4.5.1. We used the human shapes in the previous section; we denote the female and male shapes by Data A and Data B, respectively. Fig. 15 shows the effect of tuning parameters on registration accuracy; We computed the median accuracy over 50 trials for each set of parameters because GBCPD uses random numbers. In the first figure, we observe the registration accuracy reached the plateau at around K ¼ 100, suggesting that a relatively small K is sufficient, at least for these shapes. In the second figure, we observe the peaks, especially for GBCPD, suggesting the effectiveness of combining  the Euclidean and geodesic coherence matrices. We note that t ¼ 0 corresponds to BCPD; we also observe that GBCPD is more accurate than BCPD for these data. In the third figure, we observe the clear peaks for all combinations of data and methods, suggesting the need for tuning b to achieve better performance. In the last figure, we observed the weak peaks for Data B, i.e., the female shapes, suggesting the approximate eigenvalues close to zero affect the registration accuracy.

Point Cloud Registration
We investigated the point cloud registration performance of GBCPD; the surface mesh of a shape is often unavailable, especially for the shapes captured by a 3D scanner. We applied GBCPD to the point clouds shown in Fig. 16a. To apply GBCPD to the point clouds, we constructed the 10-NN graph from an input point cloud and removed the edges longer than 0.15 after shape normalization. We used the same parameters described in Section 4.5.1. Fig. 16b shows the registration results. We see that GBCPD failed to register Data 1, i.e., the same shapes as those used in Section 4.5.3 except for the point connectivity. This result suggests that the lack of point connectivity reduces the registration accuracy of GBCPD. From the figure, we also see that GBCPD successfully registered Data 2, whose adjacent parts are slightly longer than those of Data 1. The computational time of the FPSA algorithm was 0.063 s for Data 2, suggesting that GBCPD is advantageous to BCPD even if we consider the extra time required for the FPSA algorithm.
The red curves in Fig. 16c show the example geodesics required for computing the surface coherence matrix G YY . We see that the geodesic computation is inappropriate for Data 1, suggesting the method failed to reconstruct the shape surface between legs. For Data 2, the method computed a more appropriate geodesic. Fig. 16d represents a column of G ðgeoÞ YY that corresponds to the blue-circled point for each data. We see that the motion  coherence spans both legs for Data 1, whereas, for Data 2, it spans the left leg only. This suggests that the neighbor graph of a point cloud helps prevent the motion coherence from spanning different body parts unless the distance between adjacent parts is extremely small. Fig. 16e shows the registration results of the related algorithms for Data 2; the corresponding result of GBCPD is shown at the bottom of Fig. 16b. These results suggest that GBCPD and GBCPD++ can outperform CPD, BCPD, and BCPD++ in terms of point cloud registration.

Hierarchical Registration
We incorporated GBCPD and GBCPD++ into the hierarchical registration described in Section 3.5 and applied them to the DFAUST [60] data and TOSCA [44] high-resolution data, respectively. We designed the three-level hierarchy, defined as the parameter sequence decreasing motion coherence, as follows: We used t ð1Þ ¼ 0:5 and t ð1Þ ¼ 0:1 for the DFAUST and TOSCA data, respectively. For the remaining parameters, we used the parameters described in Section 4.5.1. Fig. 17 shows the hierarchical registration using GBCPD. We used the same target shapes as those in Fig. 14. At L ¼ 1, GBCPD roughly registered the input shapes, although it incorrectly deformed several details, e.g., heads, fingers, and toes. At L ¼ 2, the method improved the shape details, such as the female shape's fingers and the male shape's toes. At L ¼ 3, the method slightly improved the details, e.g., the female shape's thumb. Moreover, the hierarchical registration improved the deformation of both shapes' heads for each level. Fig. 18 shows the hierarchical registration using GBCPD++. We see that GBCPD++ successfully registered cat and horse shapes. Furthermore, the deformed shapes approached the target shapes as L increased. These results suggest that the hierarchical registration technique expands the potential of GBCPD and GBCPD++.

Limitations and Future Work
In this section, we summarize the limitations of GBCPD and GBCPD++.

Large Deformation
The proposed methods often fail when the difference between source and target shapes is large. Fig. 19a shows a typical failure example of the proposed method. GBCPD failed to find the target shape's arms because of the large distance between corresponding arms. This example explains why the registration accuracy for the FAUST human shapes was lower than that for the DeformingTh-ings4D bear shapes, as described in Section 4.5.2. The deformation space of the human shapes is considerably larger than those of the bear shapes, as we see from the shapes shown in Fig. 13a.

Partial Shape Registration
The proposed methods often suffer from registering the shapes with large missing regions. Figs. 19b and 19c show the registration examples of the shapes with missing regions. As shown in Fig. 19b, GBCPD can register the shapes if the missing region of a shape is small. In contrast, GBCPD often fails if the missing region is large, as shown in Fig. 19c.

Parameter Setting
Another limitation of the proposed methods is that they lacks the automation technique that optimizes the tuning parameters. Therefore, we need to optimize them manually, which sometimes requires trial and error.

Future Work
One approach to avoiding the issue of registering largely deformed shapes is to use the functional map [26] based on shape descriptors. It can avoid the issue because several shape descriptors are preserved regardless of the distance between corresponding points. Moreover, the functional map can filter out noisy shape features using the Laplace-Beltrami basis functions, which can convert the shape features into the ones sorted according to their frequency.
To refine the shape matching, several methods based on the functional map utilize registration algorithms, e.g., the iterative closest point algorithm [26] and CPD [27], in a post-processing manner. The proposed methods can also be applied to the shape matching refinement. However, the registration performance of the proposed methods is unknown in this scenario. This is because the refinement requires the registration in a high-dimensional space, whose dimensionality is typically higher than three. Applying the proposed methods to highdimensional data can be a good future direction.
A promising approach to solving partial matching problems is to find the partiality mask representing the overlapped regions of input shapes. Pioneering work in this direction can be found in [62], [63]. Incorporating the proposed methods into these partial matching methods is another interesting future work. In addition, the proposed methods will be more practical if the tuning parameters are automatically optimized; automating parameter tuning is also a future research direction.

CONCLUSION
BCPD often fails to register shapes when the different parts of a source shape are neighboring each other. This is because of the Euclidean distance incorporated into the motion coherence of BCPD; BCPD imposes strong coherent moves on the points in a shape surface during the registration if the points are close in terms of the Euclidean distance.
An approach to mitigating this issue is to impose coherent moves based on a geodesic. The geodesic distance between a shape's different parts is typically larger than the corresponding Euclidean distance; therefore, it diminishes the motion coherence between them. A drawback of the geodesic distance is the lack of robustness against surface cracks; it becomes infinity if the two regions in the shape  surface are disjoint. In contrast, the Euclidean distance is less affected by such cracks.
This study proposed GBCPD, i.e., a BCPD algorithm based on geodesic-based motion coherence. The motion coherence is based on the surface coherence matrix, utilizing geodesic and Euclidean distances. To circumvent the indefiniteness of the matrix, we approximated it using a low-rank, positive-semidefinite matrix. We accelerated GBCPD using the following algorithms: (1) the FPSA algorithm, accelerating the optimal approximation of the surface coherence matrix; and (2) GBCPD++, extending GBCPD to the BCPD++ framework.
In numerical studies, we demonstrated that the FPSA algorithm efficiently approximates the surface coherence matrix. We also evaluated the registration performance of GBCPD/GBCPD++. The results showed that GBCPD can circumvent the drawback of CPD and BCPD. We also showed that GBCPD++ was scalable to more than 3M points, maintaining registration accuracy. These results suggest that GBCPD/GBCPD++ can be a useful technique for non-rigid registration.

ACKNOWLEDGMENTS
The author would like to thank the anonymous referees who provided insightful, constructive comments on the first version of the manuscript.