A Fast Solution for the Generalized Radial Basis Functions Interpolant

In this paper, we propose a fast solution method of the generalized radial basis functions interpolant for global interpolation. The method can be used to efficiently interpolate large numbers of geometry constraints for implicit surface reconstruction. Our approach is based on the far field expansion of the kernel and a preconditioned Krylov iteration using the domain decomposition method as a preconditioner. We present a fast evaluation method of the matrix-vector product for the linear system. To minimize the number of iterations for large numbers of constraints, the multi-level domain decomposition method is applied to improve overlap using a nested sequence of levels. The implemented solution algorithm generally achieves $O(N \log N)$ complexity and $O(N)$ storage. It is kernel independent in both the evaluation and solution processes, requiring no analytical expansions, so various types of RBF kernels can be applied in different applications without excessive modifications to the existing process. Numerical results show that the fast evaluation method performs well for the evaluation of the matrix-vector product and that the preconditioned Krylov subspace iterative method converges with a small number of iterations.


I. INTRODUCTION
Radial basis functions (RBFs) are widely used in large-scale scattered data interpolation and approximation. Applications include surface reconstruction of dense point clouds, solution of partial differential equations and particle interactions in computational physics.
The generalized radial basis functions (GRBF) interpolant [1]-[3] was initially developed to exactly interpolate Hermite data (a set of points with normals). Various types of interpolation constraints, including gradient constraints and tangent constraints, have been developed to satisfy different application needs [4]-[6]. As a special case of the GRBF interpolant, the Hermite radial basis functions (HRBF) [7] can exactly interpolate the gradient directions of an implicit surface. However, the direct solution methods of the interpolation equation cost $O(N^3)$ operations and $O(N^2)$ storage. For a large number of constraints (more than a few thousand points), it is too expensive to solve the interpolation equation using a globally supported basis function.

The associate editor coordinating the review of this manuscript and approving it for publication was Jiju Poovvancheri.
In recent years, great progress has been made in efficiently solving the large-scale linear systems of RBF-based methods. The domain decomposition-based iteration method (e.g., the Fast RBF method [8]) is one efficient class of solution methods, and an optimal domain decomposition strategy helps improve the convergence of the iteration. However, as the evaluation process differs, the existing global interpolation methods cannot be applied directly to the generalized radial basis functions interpolant. Previous works focus on solving the radial basis functions interpolant only with domain constraints (constraints with specified function values). Besides the domain constraints, the directional constraints (e.g., the gradient constraints and the tangent constraints) should also be interpolated efficiently. It is therefore necessary to develop efficient methods for the evaluation and solution of GRBF.
In this paper, based on a multilevel domain decomposition method, we develop a fast global solution method for the GRBF interpolant with globally supported radial basis functions. It is an extension of the fast radial basis functions interpolation method: the idea is based on the far field expansion of the RBF kernel and a preconditioned Krylov iteration using the domain decomposition method as a preconditioner.
To solve the dense linear system efficiently, a fast evaluation method of the matrix-vector product for GRBF is presented. We expand the evaluation of the domain constraints and the difference constraints into several sums with different source points and evaluation points, respectively. Each sum is efficiently evaluated using the black-box fast multipole method (FMM) [9], which applies to a wide class of RBF kernels. The whole evaluation of the matrix-vector product then consists of the evaluations of several sub-matrix blocks. Based on the fast evaluation method, the preconditioned Krylov subspace method is used to solve the linear system iteratively. The flexible generalized minimal residual (FGMRES) method [10] with variable preconditioners is used as the outer iterative method, and the result of the domain decomposition method at each iteration step serves as the variable preconditioner. To minimize the number of iterations for large numbers of constraints, the multi-level domain decomposition method is applied to improve overlap using a nested sequence of levels.
We apply this method to efficiently recover an implicit surface using the GRBF interpolant with large numbers of geometry constraints. The interpolation constraints are converted into geometry constraints by constructing a signed distance field (SDF) [11]. As an example, the Hermite data can be approximated using the domain constraints and the difference constraints of the gradient [5]. Numerical results show that the solution process converges within a small number of iterations for a specified precision.
The paper is organized as follows. Previous works are reviewed in Section II. Section III gives a brief description of the GRBF interpolant. In Section IV, the evaluation of GRBF is decomposed into several fast summations with different source points. Based on the fast evaluation method, we use a multilevel domain decomposition method as a preconditioner to efficiently solve the linear system in Section V. In Section VI, the fast solution method is implemented to validate the performance and the optimal parameters of the algorithm. The limitations and extensions of the method are discussed in Section VII.

II. RELATED WORKS
In the past two decades, a number of efficient numerical methods based on radial basis functions have been proposed for scattered data interpolation. One common approach is to adjust the support radius as a function of the data density using compactly supported basis functions (e.g., Wendland's CSRBF [12]). There are several ways to perform local interpolation. One extension is to decompose the set of interpolation centers into a nested sequence of subsets using a multilevel method [13]. The advantage of this approach is that the compactly supported basis functions lead to a sparse linear system by ignoring the effects of farther interpolation centers, and it is easy to implement [14]. Moreover, these local interpolation methods can be applied directly to the GRBF interpolant. As an example, Liu et al. [15] proposed a fast HRBF interpolation method that automatically adjusts the support sizes of the radial basis functions. However, the compactly supported bases are inferior to the global bases in some applications: the approximation error introduced by ignoring the effects of farther interpolation centers is difficult to estimate, especially for sparse and uneven data. In fact, as a more general approach, the compactly supported bases can also be used directly in a global interpolation method.
For the global bases, it is necessary to implement a fast evaluation for computing the matrix-vector product. Based on the idea of low rank approximation, a kernel-based summation can be efficiently evaluated using far field expansion, which is known as the fast multipole method [16], [17]. The fast multipole method was originally developed for the fast summation of potential fields. At present, a number of expansion methods have been proposed for RBF kernels, including polyharmonic splines [18], generalized multiquadrics [18] and thin-plate splines [20]. The fast Gauss transform (FGT) [21] is used to compute the summation for Gaussian-type kernels. The expansions of these classical methods depend on the kernel in an analytic way. In recent years, kernel independent FMM methods have been developed to expand a wide class of kernels. Ying [22] proposed an FMM method which can expand various types of RBF kernels in both two and three dimensions. More recently, the black-box FMM method proposed by Fong and Darve [9] can expand the kernel using only the kernel values, without analytical expansions.
In the past two decades, several efficient solution methods for RBF interpolation with globally supported radial basis functions have been proposed. These methods usually utilize a preconditioned Krylov subspace iterative method (e.g., the conjugate gradient method or the generalized minimal residual method). Note that many traditional preconditioners cannot be used directly for GRBF: the preconditioner must work in a matrix-free environment, as the linear system is not stored explicitly. The development of preconditioning strategies for RBF has greatly improved the convergence of the iteration. There are two kinds of preconditioning techniques for RBF. One is to change to a better basis using the approximate cardinal basis functions (ACBFs) [23]; related research can be found in [24]-[26]. The other is to decompose the whole domain into subdomains using the alternating projection method. As an example, Beatson et al. [27] proposed a two-level domain decomposition method to improve the convergence. Recently, Yokota et al. [28] developed a parallel algorithm for radial basis functions interpolation using the restricted additive Schwarz method (RASM) as a preconditioner. The strategy of domain decomposition can also be applied in the meshfree method [29].

VOLUME 8, 2020

As the efficient global evaluation of the GRBF interpolant is difficult, the common approach is to solve the interpolation equation using compactly supported basis functions. Wendland [30] studied the fast evaluation of large generalized interpolation problems based on the far field expansion. However, the derivation of the kernel expansion is complex and kernel dependent. Considering that the difference constraints can be used to approximate the differential constraints, we try to evaluate the GRBF interpolant efficiently by expanding the corresponding constraints into several sums. This provides a novel approach for the fast global solution of the generalized radial basis functions interpolant.

III. GRBF INTERPOLANT
The GRBF interpolant is built upon the theory of Hermite-Birkhoff interpolation with RBFs [3]. To exactly interpolate differences between unknown function values, the difference operator can be used to construct the difference constraints. We first briefly review this method and then describe the fast solution method of the GRBF interpolant.
To simplify the problem, we only consider the domain constraints and the difference constraints of the gradient. Given a set of scattered data points $P = \{x_1, x_2, \ldots, x_N\}$ with $N = \mu + \sigma$ interpolation constraints ($\mu$ domain constraints and $\sigma$ difference constraints of the gradient), the GRBF interpolant approximates an unknown function $f$ by the interpolant

$s(x) = \sum_{j=1}^{\mu} a_j \varphi(x, x_j) + \sum_{k=1}^{\sigma} b_k \Delta_{n'} \varphi(x, x_{\mu+k}) + p(x),$

where $a_j$ and $b_k$ are weight coefficients to be determined.
$\varphi(x, x')$ can be viewed as a common radial basis function $\varphi(x): \mathbb{R}^3 \rightarrow \mathbb{R}$. The superscript and subscript are used to distinguish the different effects of the difference operator [5]: $\Delta^{n}$ acts on the first variable $x = (x, y, z)$ of $\varphi(x, x')$ and $\Delta_{n'}$ acts on the second variable $x' = (x', y', z')$ of $\varphi(x, x')$. The subscript $n$ represents the normal direction at the corresponding variable.
The low degree polynomial part $p(x)$ consists of monomials:

$p(x) = \sum_{s=1}^{Q} g_s p_s(x),$

where $g_s$, $1 \le s \le Q$ are weight coefficients to be determined and $p_s(x)$, $1 \le s \le Q$ are monomials. The number of terms $Q$ is determined by the degree of the polynomials. All the unknown coefficients can be computed from the interpolation conditions and the orthogonality conditions. For the domain constraints,

$s(x_i) = f_i, \quad 1 \le i \le \mu.$

For the difference constraints of the gradient,

$\Delta^{n} s(x_{\mu+k}) = h_k, \quad 1 \le k \le \sigma.$

For the orthogonality constraints, the GRBF interpolant satisfies

$\sum_{j=1}^{\mu} a_j p_s(x_j) + \sum_{k=1}^{\sigma} b_k \Delta_{n'} p_s(x_{\mu+k}) = 0, \quad 1 \le s \le Q.$

The above constraints lead to a linear system

$\begin{bmatrix} A & P \\ P^{T} & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ g \end{bmatrix} = \begin{bmatrix} f \\ 0 \end{bmatrix}.$

The direct solution methods (e.g., LU decomposition) cost $O(N^3)$ operations and $O(N^2)$ memory. When the number of domain constraints and difference constraints becomes large (more than a few thousand constraints), both the evaluation and the solution of the interpolant are unbearable in the case of globally supported basis functions.
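As a concrete illustration, the interpolation system above can be assembled and solved directly for a toy configuration. The sketch below is a minimal assumption-laden example (all data, sizes, and the value of the offset $\delta$ are hypothetical): it uses the triharmonic spline $\varphi(r) = r^3$ with a degree-1 polynomial tail and realizes each difference constraint through two offset points $x^+$ and $x^-$ along the normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: mu surface points with values f_i = 0 and sigma
# unit normals turned into difference constraints of the gradient.
mu, sigma, delta = 8, 5, 0.05
X = rng.random((mu, 3))                        # domain-constraint centers x_j
Xc = rng.random((sigma, 3))                    # difference-constraint centers
nrm = rng.normal(size=(sigma, 3))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
Xp, Xm = Xc + delta * nrm, Xc - delta * nrm    # offset points x+ and x-

# Triharmonic spline phi(r) = r^3 (conditionally positive definite of
# order 2, hence the degree-1 polynomial tail below).
def phi(E, S):
    return np.linalg.norm(E[:, None, :] - S[None, :, :], axis=2) ** 3

def rows(E):
    """Kernel part of s evaluated at points E: columns are the mu plain
    centers plus the sigma difference kernels phi(., x+) - phi(., x-)."""
    return np.hstack([phi(E, X), phi(E, Xp) - phi(E, Xm)])

def poly(E):                                   # monomials 1, x, y, z (Q = 4)
    return np.hstack([np.ones((len(E), 1)), E])

A = np.vstack([rows(X), rows(Xp) - rows(Xm)])  # domain + difference rows
P = np.vstack([poly(X), poly(Xp) - poly(Xm)])
M = np.block([[A, P], [P.T, np.zeros((4, 4))]])

# SDF-style right-hand side: s = 0 on the surface, s(x+) - s(x-) = 2*delta.
rhs = np.concatenate([np.zeros(mu), 2 * delta * np.ones(sigma), np.zeros(4)])
coef = np.linalg.solve(M, rhs)
print(np.linalg.norm(M @ coef - rhs))
```

For a few thousand constraints this dense direct solve is exactly the $O(N^3)$ bottleneck that the fast method in the remainder of the paper is designed to avoid.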

IV. FAST EVALUATION
A. FAR FIELD EXPANSION
To solve the linear system efficiently, we first consider a fast evaluation method for GRBF. Taking the 1-D kernel function as an example, the evaluation of the matrix-vector product involves a sum of $N$ kernel functions

$f(x) = \sum_{j=1}^{N} \omega_j K(x, y_j),$

where $\omega_j$, $1 \le j \le N$ are coefficients of the source/center points and $K(x, y)$ is a kernel function. For convenience, we distinguish between the two variables of the kernel function $K(x, y)$: the first variable $x$ is viewed as an evaluation point (or target point) and the second variable $y$ is viewed as a source point. Every evaluation of such a sum costs $O(N)$ operations, and the evaluation of $f(x)$ at $N$ evaluation points obviously costs $O(N^2)$ operations.
To compute the matrix-vector product efficiently, fast summation methods are required. Among them, the fast multipole method, which evaluates each sum in $O(\log N)$ or even constant time, is a promising method. It is based on a far field expansion of the kernel function: we can expand the kernel $K(x, y)$ using a low-rank approximation of the form

$K(x, y) \approx \sum_{n=1}^{p} \phi_n(x) \psi_n(y),$

where $p$ is the number of terms in the expanded series $\phi_n$ and $\psi_n$.

This expansion into a finite series serves as a separation of variables. Such kernels are also known as degenerate kernels or kernels of finite rank [16].
First, we pre-compute the moments

$M_n = \sum_{j=1}^{N} \omega_j \psi_n(y_j), \quad 1 \le n \le p.$

Second, we approximate $f(x)$ at each evaluation point efficiently using the pre-computed moments:

$f(x) \approx \sum_{n=1}^{p} \phi_n(x) M_n.$

Then the evaluation of $f(x)$ at $N$ evaluation points costs $O(pN)$ operations if they are well separated from the source points. Since $p$ is a small constant, the complexity of the evaluation is linear in the best case. In practical applications, the source points should be decomposed into well separated points (far field) and not well separated points (near field) using the adaptive multilevel FMM method [31], as shown in Figure 1. The source points in the near field are evaluated using direct summation.
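The two-step moment evaluation can be demonstrated with an exactly degenerate toy kernel; the kernel $(1 + xy)^2$ and all sizes below are illustrative and unrelated to the RBF kernels used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
ys = rng.random(1000)          # source points y_j
w = rng.random(1000)           # source coefficients omega_j
xs = rng.random(50)            # evaluation points

# A toy degenerate kernel: K(x, y) = (1 + x*y)^2 = sum_n c_n x^n y^n,
# i.e. phi_n(x) = c_n x^n and psi_n(y) = y^n with c = [1, 2, 1] (p = 3).
K = lambda x, y: (1.0 + x * y) ** 2
coeff = np.array([1.0, 2.0, 1.0])

# Step 1: pre-compute the moments M_n = sum_j w_j * psi_n(y_j)   -- O(pN)
moments = np.array([np.sum(w * ys ** n) for n in range(3)])

# Step 2: evaluate f(x) = sum_n phi_n(x) * M_n at each target    -- O(pN)
f_fast = sum(coeff[n] * xs ** n * moments[n] for n in range(3))

# Reference: direct O(N^2) summation
f_direct = np.array([np.sum(w * K(x, ys)) for x in xs])
```

Because this toy kernel is exactly degenerate, the moment-based result matches the direct summation to machine precision; for a genuine RBF kernel the finite series is only an approximation valid in the far field.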

B. KERNEL INDEPENDENT FMM
To expand an arbitrary kernel of the GRBF interpolant conveniently, it is necessary to use a kernel independent or black-box fast multipole method. The black-box FMM proposed by Fong and Darve [9] is a promising method that constructs a low-rank approximation of a kernel function without analytical expansions.
The basic idea of the black-box FMM is to interpolate the kernel function using interpolation bases (e.g., Chebyshev polynomials). Taking the 1-D black-box FMM as an example, the first variable of the kernel function can be interpolated by fixing the second variable:

$K(x, y) \approx \sum_{n=1}^{p} S_p(\bar{x}_n, x) K(\bar{x}_n, y),$

where $\{\bar{x}_n\}_{n=1}^{p}$ are the $p$ interpolation nodes corresponding to the first variable and $S_p(\bar{x}_n, x)$ is the interpolating function at the node $\bar{x}_n$. The second variable of $K(\bar{x}_n, y)$ can be interpolated in the same way:

$K(\bar{x}_n, y) \approx \sum_{m=1}^{p} S_p(\bar{y}_m, y) K(\bar{x}_n, \bar{y}_m),$

where $\{\bar{y}_m\}_{m=1}^{p}$ are the $p$ interpolation nodes corresponding to the second variable and $S_p(\bar{y}_m, y)$ is the interpolating function at the node $\bar{y}_m$. Then we can approximate $f(x)$ efficiently by changing the order of summation:

$f(x) \approx \sum_{n=1}^{p} S_p(\bar{x}_n, x) \sum_{m=1}^{p} K(\bar{x}_n, \bar{y}_m) \sum_{j=1}^{N} S_p(\bar{y}_m, y_j)\, \omega_j.$

The interpolation-based scheme achieves the same effect as the far field expansion. As the expansions are defined by kernel values only, this method is very useful for expanding kernels with complex analytical expressions.
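A minimal sketch of this interpolation-based scheme, assuming Chebyshev nodes for $\bar{x}_n$, $\bar{y}_m$ and Lagrange interpolating polynomials for $S_p$, on a well-separated pair of 1-D intervals (the kernel $1/|x-y|$ and all sizes are illustrative):

```python
import numpy as np

def cheb_nodes(a, b, p):
    """p Chebyshev nodes on the interval [a, b]."""
    k = np.arange(p)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * k + 1) * np.pi / (2 * p))

def lagrange_matrix(nodes, pts):
    """S[i, n] = n-th Lagrange basis polynomial on `nodes` evaluated at pts[i]."""
    S = np.ones((len(pts), len(nodes)))
    for n in range(len(nodes)):
        for m in range(len(nodes)):
            if m != n:
                S[:, n] *= (pts - nodes[m]) / (nodes[n] - nodes[m])
    return S

K = lambda x, y: 1.0 / np.abs(x[:, None] - y[None, :])  # example kernel

rng = np.random.default_rng(2)
ys, w = rng.uniform(0.0, 1.0, 500), rng.random(500)     # sources in [0, 1]
xs = rng.uniform(3.0, 4.0, 400)                         # well-separated targets

p = 10
xn, ym = cheb_nodes(3.0, 4.0, p), cheb_nodes(0.0, 1.0, p)
Sx, Sy = lagrange_matrix(xn, xs), lagrange_matrix(ym, ys)

# f(x) ~= sum_n S(xbar_n, x) sum_m K(xbar_n, ybar_m) sum_j S(ybar_m, y_j) w_j
f_fast = Sx @ (K(xn, ym) @ (Sy.T @ w))
f_direct = K(xs, ys) @ w
```

Because the source and target intervals are well separated, ten nodes per variable already reproduce the direct sum to high accuracy; in the full FMM this low-rank approximation is applied only to far field interactions, with near field sums done directly.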
To utilize the fast summation method, we should first expand the constraints. The evaluation of the domain constraints can be simply decomposed into three sums with different evaluation points and source points:

$s(x_i) = \sum_{j=1}^{\mu} a_j \varphi(x_i, x_j) + \sum_{k=1}^{\sigma} b_k \varphi(x_i, x^{+}_{\mu+k}) - \sum_{k=1}^{\sigma} b_k \varphi(x_i, x^{-}_{\mu+k}) + p(x_i), \quad 1 \le i \le \mu,$

where $x^{+}_{\mu+k}$ and $x^{-}_{\mu+k}$ are the two offset points obtained by projecting along the gradient direction at $x_{\mu+k}$ via the definition of the difference operator [5]. The polynomial part can be calculated in linear time.
Similarly, the evaluation of the difference constraints of the gradient can be decomposed into six sums with different evaluation points and source points by expanding $s(x^{+}_i) - s(x^{-}_i)$ in the same way, where $x^{+}_i$ and $x^{-}_i$ are the two offset points obtained by projecting along the gradient direction at $x_i$ via the definition of the difference operator [5]. Table 1 shows the nine summations corresponding to the different kernels. Compared to the differential operator, the difference operator changes the evaluation points and source points instead of the kernel type. This ensures that the RBF kernel retains its original properties (e.g., strictly positive definite or conditionally positive definite), and we can expand each of the nine kernels using a low-rank approximation.
It is worth noting that RBF kernel sums sharing the same source points share the same pre-computed moments $M_n$. According to the analysis of the expansions in Table 1, the source points can be classified into three categories: $x_j$, $x^{+}_{\mu+k}$ and $x^{-}_{\mu+k}$. Therefore, we can represent the nine summations as three kinds of RBF kernel sums with different evaluation points, which reduces the amount of computation.

V. FAST SOLUTION
A. PRECONDITIONING
Having addressed the fast evaluation of the matrix-vector product for GRBF, we can now solve the linear system efficiently using Krylov subspace iterative methods. The common iterative methods used for RBF are the conjugate gradient method and the generalized minimal residual method. Several preconditioning techniques can be used to improve the convergence rate of the linear system. Since the evaluation process still takes considerable time per iteration for large problems, the number of iterations must be reduced as much as possible, so a preconditioning technique is required. The preconditioning strategy is crucial for satisfactory convergence.
Different from the existing methods, we use the flexible GMRES iterative method with variable preconditioners to solve the linear system. Like GMRES, the FGMRES method stores an orthonormal basis of the Krylov subspace to form the conjugate vectors at each iteration step. One difference is that in FGMRES the approximate solution is formed from a linear combination of the differently preconditioned vectors. For a linear system $Ax = f$ with right preconditioner $M$, taking right preconditioned GMRES as an example, the Arnoldi iteration recursively constructs an orthonormal basis of the right preconditioned Krylov subspace

$\mathrm{span}\{r_0, AM^{-1}r_0, \ldots, (AM^{-1})^{m-1}r_0\}.$

One of the benefits is that any iterative method can be used as a preconditioner, which is known as an inner-outer preconditioned GMRES method. We can therefore take the result of each inner iteration step as a different preconditioner, and we can expect a more accurate preconditioner as the outer iteration proceeds. It is a useful way to reduce the number of iterations of the Krylov subspace method.
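A compact sketch of the outer iteration in the spirit of [10] may clarify why the preconditioned vectors must be stored. This is not the paper's implementation: the inner Jacobi sweeps below merely stand in for the domain decomposition preconditioner described next, and the test matrix is hypothetical.

```python
import numpy as np

def fgmres(A, b, precond, tol=1e-10, max_iter=60):
    """Right-preconditioned flexible GMRES. Because precond(r, k) may be a
    different operator at every outer step k, the preconditioned vectors
    Z[:, k] are stored alongside the Arnoldi basis V."""
    n = b.size
    V = np.zeros((n, max_iter + 1))
    Z = np.zeros((n, max_iter))
    H = np.zeros((max_iter + 1, max_iter))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for k in range(max_iter):
        Z[:, k] = precond(V[:, k], k)           # variable preconditioner M_k
        w = A @ Z[:, k]
        for i in range(k + 1):                  # Arnoldi, modified Gram-Schmidt
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)                    # small least-squares problem:
        e1[0] = beta                            # min || beta*e1 - H_k y ||
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        res = np.linalg.norm(H[:k + 2, :k + 1] @ y - e1)
        if res < tol * beta or H[k + 1, k] < 1e-14:
            return Z[:, :k + 1] @ y, k + 1      # x = Z_m y (x0 = 0)
        V[:, k + 1] = w / H[k + 1, k]
    return Z[:, :max_iter] @ y, max_iter

# A small stand-in system; the inner "preconditioner" applies k+1 Jacobi
# sweeps, so it genuinely varies from one outer iteration to the next.
rng = np.random.default_rng(6)
n = 40
A = np.diag(np.linspace(1.0, 10.0, n)) + 0.02 * rng.random((n, n))
b = rng.random(n)
d = np.diag(A)

def jacobi_precond(r, k):
    z = np.zeros_like(r)
    for _ in range(k + 1):
        z = z + (r - A @ z) / d
    return z

x, iters = fgmres(A, b, jacobi_precond)
print(iters, np.linalg.norm(A @ x - b))
```

As the inner preconditioner becomes more accurate with $k$, the preconditioned vectors $z_k$ approach $A^{-1}v_k$ and the outer residual drops quickly, which mirrors how the increasingly converged domain decomposition result acts in the paper's scheme.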

B. DOMAIN DECOMPOSITION METHOD
Now we can use an iterative algorithm that is more suitable for radial basis functions as a preconditioner. As a variant of the two-level domain decomposition method for RBF proposed by Beatson et al. [27], a multilevel domain decomposition method is analyzed to improve overlap using a nested sequence of levels.
The basic idea of the domain decomposition method is to divide the whole domain $X$ into overlapping subdomains $X_i$, $i = 1, \ldots, D$ such that $X = \cup_{i=1}^{D} X_i$ and $X_i \cap X_j \neq \emptyset$ for neighboring subdomains, until the subdomains become simple enough to be solved directly. Note that the domain refers to a set of interpolation centers. The number of interpolation centers in $X_i$ is denoted by $N_i = |X_i|$. The interpolation centers in each subdomain are classified as inner points (the non-overlapping part) and outer points (the overlapping part), as shown in Figure 2. The solutions on the subdomains are then combined into a solution on the whole domain using the alternating projection method [3]. The alternating projection method interpolates a subdomain to correct the associated local residual and then interpolates the next subdomain with the corrected residual in sequence. The global residual converges to within a specified accuracy in finitely many cycles provided the subdomains are weakly distinct. Details about the convergence theory of the alternating projection method can be found in [3], [27].
The number of iterations is strongly affected by the numbers of inner points and outer points in each subdomain. To further improve the convergence, a two-level method [27] can be employed to reduce the spectral radius of the iteration. The divided subdomains $X_i$ are viewed as a fine level. In addition to the correction on the fine level, this method adds a correction on a coarse level $Y$ formed by randomly choosing some inner points from each subdomain $X_i$ in a certain ratio $\rho_1$. The results of the fine level and the coarse level are combined into a solution on the whole domain using the alternating projection method. The coarse level improves the overlap between subdomains globally. Numerical experiments show that the added coarse level correction can significantly improve the convergence rate, which becomes better as the ratio of coarse level points, or the amount of overlap, is increased.
Since the coarse level is selected in a fixed ratio over the whole domain $X$, the number of interpolation centers in the coarse level would be too large to solve directly. If the coarse level is viewed as a new whole domain, it is natural to apply the two-level method to multiple levels recursively. This allows us to turn a large domain into subdomains that are easy to solve. The multilevel domain decomposition method decomposes the whole domain $X$ into a nested sequence of levels $X_l$, $l = 1, \ldots, L$ such that $X_l \subset X_{l+1}$, $l = 1, \ldots, L-1$ and $X_L = X$. Each level $X_l$ is divided into overlapping subdomains $X_{i,l}$, $i = 1, \ldots, D_l$ such that $X_l = \cup_{i=1}^{D_l} X_{i,l}$ and $X_{i,l} \cap X_{j,l} \neq \emptyset$ for neighboring subdomains. $X_l$ is constructed by randomly choosing some inner points from each subdomain $X_{i,l+1}$ in a certain ratio $\rho_1$ recursively. The coarse level $Y$ is constructed by randomly choosing some inner points from each subdomain $X_{i,1}$ in the ratio $\rho_1$. The number of interpolation centers in $X_{i,l}$ is denoted by $N_{i,l} = |X_{i,l}|$ and in $X_l$ by $N_l = |X_l|$. The results of the multiple levels are then combined into a solution on the whole domain using the alternating projection method.
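The recursive coarsening described above can be sketched as follows; index blocks stand in for the spatial subdomains, the inner/outer distinction is ignored, and all parameter values ($\rho_1$, block size, coarse-level size) are illustrative rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(5)

def build_levels(idx, rho1=0.25, max_block=512, n_coarse=64):
    """Nested levels X_L = X, X_{L-1}, ...: each level is split into blocks
    (a stand-in for the spatial subdomains) and a fixed ratio rho1 of points
    is drawn from every block to form the next, coarser level."""
    levels = [idx]
    while len(levels[-1]) > n_coarse:
        cur = levels[-1]
        blocks = np.array_split(cur, max(1, len(cur) // max_block))
        nxt = np.concatenate([rng.choice(b, size=max(1, int(rho1 * len(b))),
                                         replace=False) for b in blocks])
        levels.append(np.sort(nxt))
    return levels[::-1]            # coarsest level first

levels = build_levels(np.arange(20000))
print([len(l) for l in levels])    # sizes shrink by roughly rho1 per level
```

Sampling a fixed ratio from every block keeps the coarser levels spatially spread over the whole domain, while the per-block cap keeps each subdomain small enough for a direct solve.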
The simplified pseudo code of the extension to the multilevel domain decomposition method is given below. $s_g$, $x_g$ and $f_g$ denote the current approximate GRBF interpolant, the current solution coefficients and the current residual; $s_l$, $x_l$ and $f_l$ denote the same quantities at each level.
Input: The whole domain $X$ and the divided subdomains $X_l$, $l = 1, \ldots, L$, accuracy $\epsilon > 0$, right-hand side of the linear system $f$.

C. SOLUTION PROCEDURE
According to the above analyses, the fast solution method of GRBF includes three key parts: a fast evaluation method for computing the matrix-vector product and the residuals, an inner iteration method for preconditioning and an outer iteration method. In this paper, we use the black-box FMM as the fast evaluation method, the multilevel domain decomposition method as the inner iteration method and the Flexible GMRES iterative method as the outer iteration method. As the number of interpolation centers at each level is large, the fast evaluation method is also used to evaluate the residuals in the process of inner iteration.
For a given linear system of GRBF and the desired accuracy $\epsilon$, a simplified procedure of the fast solution method is given below. $r_k$ and $x_k$ denote the sequences of residual vectors and solution vectors of FGMRES.

VI. NUMERICAL RESULTS
We have implemented the algorithm of the fast solution method using some open source libraries, especially the ScalFMM library [32]. The black-box FMM in ScalFMM was used to implement the fast evaluation process. To speed up the process of iteration, the solution of subdomains was implemented with Intel Math Kernel Library (Intel MKL) in parallel.
We tested the solution method on several data sets composed of Hermite points. The data sets were randomly sampled from several real objects, as shown in Figure 4. Based on the method of implicit surface reconstruction [5], the sampling points can be converted into the domain constraints and the sampling normals can be converted into the difference constraints of the gradient. To recover the implicit surface, we can efficiently interpolate these constraints (including the domain constraints and the difference constraints of the gradient) in the solution process of implicit surface reconstruction.

The solution method utilizes several parameters: the number of levels $L$, the overlapping ratio $\rho_1$, the number of interpolation centers $N_Y$ in the coarse level, the maximum number of interpolation centers $N_{i,l}$ in $X_{i,l}$ and the ratio of inner points in one subdomain $\rho_2$. Most of them are used to divide the multilevel subdomains. To ensure each subdomain is solved efficiently using a direct method, the value of $N_{i,l}$ should be small enough; however, a small $N_{i,l}$ leads to a large number of subdomains. To balance efficiency and convergence, a good choice is to divide the subdomains automatically. We use a fixed set of parameter values for all the numerical examples. The number of levels $L$ can be approximated from the values of $\rho_1$, $\rho_2$ and $N_Y$; the actual values are adjusted according to the distribution of the data, which is related to the uniformity of the data distribution. In the following examples, we used the triharmonic spline as the basis function. For the far field expansion, the number of terms in the expanded series was set to 10.
To validate the viability and performance of this method, we compared the results with the direct solution method (LU decomposition) and tested on a Windows 64-bit PC with a 3.00 GHz Intel(R) Core(TM) i5-7400 and 8 GB RAM. The performance of the solution method mainly depends on the number of constraints and the desired accuracy. Figure 5 and Figure 6 report the timings of the solution process for the direct method and the fast method. As expected, the efficiency improvement is significant in both the evaluation and solution processes. For problems with more than 10,000 interpolation centers, the direct solution method becomes very time consuming. Though the direct solution method cannot be used to solve large-scale problems, it performs better for smaller data sets, especially problems with fewer than 1000 interpolation centers. This is also the basis for the choice of the subdomain size $N_{i,l}$.
The convergence rate of the fast solution method mainly depends on the division of the whole domain $X$. The method converges with a small number of iterations as long as sufficient overlap is created, as shown in Figure 7 and Figure 8. More importantly, the solution failure of a subdomain interrupts the whole solution process; the method therefore requires that each subdomain is solvable. Failures occur for trivial solutions, when the right-hand side of a local linear system is zero, for example a subdomain that only contains domain constraints with zero function values. To avoid trivial solutions, we should ensure that the domain constraints and the difference constraints are uniformly distributed among the subdomains at each level.

FIGURE 6. Comparison of the performance of the direct method with the fast method. Both the number of domain constraints and difference constraints varied from 1000 to 5000 ($\mu = \sigma$). The direct method ran out of memory when $\mu = \sigma = 5000$. The desired accuracy for the fast method is $10^{-4}$.

Table 2 shows that the correction of the coarse level has a great impact on the convergence of the two-level domain decomposition method: the number of iterations increases remarkably as the ratio of coarse level points is reduced. However, a coarse level with a large number of interpolation centers would be too large to solve directly. The multilevel domain decomposition method converges with a coarse level of roughly 1024 ∼ 2048 points, as shown in Table 3 and Table 4. The multilevel method improves the overlap using a nested sequence of levels instead of increasing the ratio of coarse level points, which prevents the number of interpolation centers in the coarse level from becoming too large to solve directly. As mentioned earlier, the solution method is kernel independent in both the evaluation and solution processes.
It is very convenient to implement the solution method with different RBF kernels when selecting the most suitable kernel [33]. Table 5 shows the performance of the solution method with different RBF kernels, including the thin-plate spline (TPS), biharmonic spline (BIS) and triharmonic spline (TRS).

VII. DISCUSSION
The method still has several limitations that await further investigation and improvement. One of the main limitations is that the subdomains are solved using the natural basis instead of a better basis. For the radial basis function interpolation problem, it is well known that the interpolation equation is frequently ill-conditioned [34], even when the number of interpolation centers is small. For larger data sets, the ill-conditioning can influence the accuracy of evaluation and reduce the convergence rate of the iterative method; for example, the algorithm may not converge within a small number of iterations. To improve the robustness of the iterative method, the natural basis should be changed to reduce the condition number of the interpolation equation. Extending the approximate cardinal basis function preconditioning technique to the GRBF interpolant is a promising direction. Another limitation of the fast solution method is that it can only be used to interpolate the domain constraints and difference constraints.
An important extension of the fast solution method is to improve the performance of the evaluation process. The same number of difference constraints costs more time than domain constraints. In the evaluation process, we evaluate the several summations separately. In fact, there is a certain relationship between these summations; if we can evaluate them in a more unified way, we can further improve the efficiency of the evaluation. To further improve the solution performance, the optimum division of subdomains can be investigated to reduce the number of iterations.
Although we only consider the fast solution of the GRBF interpolant with domain constraints and difference constraints, the solution of the differential constraints (including the gradient constraints and the tangent constraints) can be studied in the same way. As the derivatives of the RBF kernel hinder the fast summation, the main problem that needs to be solved is to evaluate the differential constraints efficiently.

VIII. CONCLUSION
We have presented a fast solution method of the generalized radial basis functions interpolant for global interpolation. The method can be used in the GRBF interpolant with large numbers of domain constraints and difference constraints. One of its main features is that the implemented solution algorithm generally achieves $O(N \log N)$ complexity and $O(N)$ storage. Moreover, it is kernel independent in both the evaluation and solution processes, so various types of RBF kernels can be applied in different applications without excessive modifications to the existing process. The preconditioning technique is the key to the numerical stability of the solution. The optimum division of subdomains and an appropriate overlap rate between subdomains can improve the convergence significantly. Compared to the two-level method, the multilevel domain decomposition method has a better convergence rate for larger data sets by improving the overlap through the nested coarse levels. Numerical results showed that the fast evaluation method performs well for the evaluation of GRBF and that the preconditioned Krylov subspace iterative method converges with a small number of iterations.