A Preconditioned Fast Collocation Method for a Linear Nonlocal Diffusion Model in Convex Domains

Recently, there are many papers dedicated to develop fast numerical methods for nonlocal diffusion and peridynamic models. However, these methods require the physical domain where we solve the governing equations is rectangular. To relax this restriction, in this article, we develop a fast collocation method for a static linear nonlocal diffusion model in convex domains via a volume-penalization approach. Based on the analysis of the structure of the coefficient matrix, we also present a fast solution technique of the linear system arising from the collocation discretization by accelerating the matrix-vector multiplications in the usual Krylov subspace iteration method. This technique helps to reduce the computational work in each Krylov subspace iteration from <inline-formula> <tex-math notation="LaTeX">$O(N^{2})$ </tex-math></inline-formula> to <inline-formula> <tex-math notation="LaTeX">$O(N\log N)$ </tex-math></inline-formula> and the storage requirement of the coefficient matrix from <inline-formula> <tex-math notation="LaTeX">$O(N^{2})$ </tex-math></inline-formula> to <inline-formula> <tex-math notation="LaTeX">$O(N)$ </tex-math></inline-formula> without any lossy compression, where <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> is the number of unknowns. Moreover, an efficient preconditioner was also provided in this work to accelerate the convergence of the Krylov subspace iteration method. Numerical experiments show the applicability of this method.


I. INTRODUCTION
In the last decades, nonlocal models have been extensively studied in many fields such as fluid dynamics [1], materials science [2], thermodynamics [3], fracture mechanics [4], biology [5] and image processing [6]. One of the nonlocal models is fractional partial differential equations (FPDEs). The other one that arouse researcher's interests is nonlocal diffusion and peridynamic models. These nonlocal models all contain integral operators. For the new developments in usage of integral euations (especially the usage in FPDEs), we can refer to [7]- [11]. For nonlocal diffusion and peridynamic models, they have been applied and showed the effectiveness in the fields of machanics of materials, including failure and damage in composite materials [12], [13], multiscale materials modeling [14], phase transformations in solids [15], crack nucleation [16] and so on. Nonlocal diffusion and peridynamic models can often provide a more natural description of physical problems especially near The associate editor coordinating the review of this manuscript and approving it for publication was Guido Lombardi . singularities and discontinuities than the PDE models [17], [18]. In contrast to the classical PDEs, the nonlocal diffusion and peridynamic models are free of spatial derivatives and feature interactions that occur between material points separated by a finite distance.
However the nonlocality increases the computational cost and the storage compared to the PDE models. It is mainly because the coefficient matrices arising from some numerical methods such as finite element method and the collocation method are often dense or full matrices, which usually require O(N 2 ) storage and O(N 2 ) operations in each Krylov subspace iteration. Nowadays, several numerical methods have been developed and analyzed for the nonlocal diffusion and peridynamic models to reduce the computational work and the storage requirement of the coefficient matrices. References [19] and [20] developed a fast Galerkin method and a discontinuous Galerkin method for a peridynamic model, respectively. References [21], [22] developed a fast collocation method for one-dimensional and two-dimensional peridynamic models, respectively. A fast finite difference method was developed for peridynamic model in [23]. A fast and accurate implementation of Fourier spectral approximation was developed for nonlocal diffusion model in [24]. In particular, [22] developed a fast collocation method for a two-dimensional linear nonlocal diffusion model. This method successfully got O(N ) storage of the coefficient matrix and O(N log N ) operations per Krylov subspace iteration. However, this fast collocation method only holds for problems on rectangular domain.
To ease the similar restriction in developing the fast methods for fractional PDEs, [25] developed a fast finite volume method for conservative space-fractional diffusion equations in convex domains via a penalization approach. In addition, In [26], a fast finite difference method is developed for solving space-fractional diffusion equations with variable coefficient in convex domains using a volume penalization approach. The numerical experiments in [25] and [26] show the utility of this approach.
In this article, we follow the work started in [22] and develop a fast collocation method for a linear nonlocal diffusion model on convex domain via a volume-penalization technique. We extend the original nonlocal diffusion model on convex domain to a new model on standard rectangular domain by introducing a penalization term to the original model. The latter domain is required to contain the former physical convex domain. Furthermore, based on the analysis of the structure of the coefficient matrix, we present a fast solution technique of the linear system arising from the collocation discretization of the extended model by accelerating the matrix-vector multiplications in the usual Krylov subspace iteration method. This technique helps to reduce the computational work in each Krylov subspace iteration from O(N 2 ) to O(N log N ) and the storage requirement of the coefficient matrix from O(N 2 ) to O(N ) without any lossy compression. In addition, to further accelerate the convergence of the Krylov subspace iteration method and reduce the CPU time, we introduce an efficient preconditioner which is easy to invert.
The rest of the paper is organized as follows. In Sect. 2, we extend the model problem on convex domain to the penalized problem via a volume penalization method and give the bilinear collocation method of the penalized problem. In Sect. 3, we analyze the structure of coefficient matrix obtained from Sect. 2. Based on this analysis, the operations in each Krylov subspace iteration and the storage of the coefficient matrix can be reduced. In Sect.4, an efficient preconditioner is provided. In Sect. 5 and Sect. 6, we carry out some numerical experiments and draw some conclusion remarks, respectively.

II. A VOLUME-PENALIZED COLLOCATION METHOD FOR THE LINEAR NONLOCAL DIFFUSION MODEL IN CONVEX DOMAIN
In this work, we consider the following two-dimensional steady-state linear nonlocal diffusion model: Here = (a 1 (y), b 1 (y)) × (a 2 (x), b 2 (x)) is a (not necessarily rectangular) convex domain; σ (x, y) is the kernel function; δ is the horizon parameter of the material which determines the range of interactions; c denotes a boundary zone surrounding with width δ; u(x, y) and f (x, y) represent the diffusing material density and the prescribed source term, respectively. B δ (x, y) represents an open disk centered at (x, y) with radius δ, i.e., For clarity, we present the sketch of computational domain as follows: We then follow the idea in [27] to adopt a volume penalization method to extend problem (1) to the following problem on an extended rectangular domain r = (â 1 Here η represents the penalization parameter; 1 (x, y) is the character function of , which equals to 1 if (x, y) ∈ or zero otherwise; rc is a boundary zone surrounding r with width δ;f (x, y) andĝ(x, y) are extensions of f (x, y) and g(x, y) from to r , and from c to rc , respectively. For example, zero extensions are used in the numerical experiments of this article. Heuristically, we anticipate to have Therefore, the nonlocal diffusion model (2) reduces to (1) if (x, y) ∈ . Let the positive integers N x and N y denote the number of mesh grids in the x and y directions, respectively. We define a spatial partition on¯ r by x i :=â 1 + ih x for i = 0, 1, · · · , N x and y j :=â 2 + jh y for j = 0, 1, · · · , N y , where h x := (b 1 −â 1 )/N x and h y := (b 2 −â 2 )/N y be the mesh sizes in the x and y directions, respectively. To handle the nodes on rc , we extend the partition to (x i , y j ) for i = −K + 1, · · · , −1, 0, 1, · · · , N x , N x + 1, · · · , N x + K − 1 and j = −L + 1, · · · , −1, 0, 1, · · · , N y , N y + 1, · · · , N y + L − 1. Here, are the ceilings of δ/h x and δ/h y , respectively. Let ψ(ξ ) = 1−|ξ | for ξ ∈ [−1, 1] and zero otherwise. The two dimensional pyramid function φ ij (x, y) centered at (x i , y j ) can be given by Then the trial function u η can be expressed as Because for the spatial nodes ( are known, we choose the spatial nodes (x i , y j ) for i = 1, 2, · · · , N x − 1 and j = 1, 2, · · · , N y − 1 as collocation points. This gives the following collocation formulation for problem (2) Substituting from (6) into (7) results in Here be the number of collocation points at which the values of u η need to be computed. Let u η and f be N -dimensional vectors defined by and Then, The collocation scheme (8) can be expressed as a matrix form Here, A ∈ R N ×N and P ∈ R N ×N are associated with the first term and the second term of the left-hand side of (8), respectively. We consider the matrix P firstly. The matrix P is referred to as the penalization matrix. From (8), we can obtain that the matrix P is a diagonal matrix, with the entries P m,m , m = 1, · · · , N being defined by For the matrix A, each entry A m,n for m = 1, · · · , N and n = 1, · · · , N in it is defined as Here, the global indices m and n are related to the local indices (i, j) and (i , j ) by δ m,n = 1 if m = n and zero otherwise. The entries f m , m = 1, · · · , N in the right-hand side vector f are given by where the indices m and n as well as the indices (i, j) and (i , j ) are related by (14) and the indices (i , j ) refer to the nodes on the boundary zone rc . Because of the nonlocality of the nonlocal diffusion model, the matrix A in (11) is usually dense since K and L tend to infinity as h x and h y tend to zero, for which the widely used direct solvers require O(N 2 ) memory and O(N 3 ) computational work to solve the linear system (11). In the next section, we will analyze the structure of the matrix B. Accordingly, we can significantly reduce the memory requirement of the coefficient matrix B and the computational work for solving the linear system (11).

III. STRUCTURE OF THE COEFFICIENT MATRIX
Next we analyse the structure of the matrix A.
Theorem 1: The matrix A can be expressed as a N y − 1-by-N y − 1 banded-block Toeplitz matrix with a block bandwidth (2L + 1) Proof: First of all, A has the following form Here each matrix block A j,j of order N x − 1, with 1 ≤ j ≤ N y − 1, 1 ≤ j ≤ N y − 1 represents the interaction of row j and row j in the split mesh and can be expressed as Here each entry A j,j i,i of A j,j represents the interaction between the i-th node in the j-th row and the i -th node in the j -th row.
Combining (12) and (13), we have A m,n = 0 if and only if We observe from (18) and (20) that all the matrix blocks A j,j with |j−j | > L vanish. Thus A is a block-banded matrix with a block bandwidth 2L + 1 From (19) and (20), we also observe that all the entries A By the transformations the pyramid function given in (5) can be written as Then, equation (12) can be deduced to VOLUME 8, 2020 Let j 1 − j 1 = j 2 − j 2 = l, −L ≤ l ≤ L. That is, the matrix blocks A j 1 ,j 1 and A j 2 ,j 2 are on the same diagonal. Let Then we observe that for According to (25), we have proved in each matrix block A j,j are on the same diagonal. Let Then we observe that According to (27), we obtain that each block matrix A j,j is a banded Toeplitz matrix. Combining (25) and (27), we conclude that the matrix A is a block-Toeplitz-Toeplitz-block (BTTB) matrix and can be expressed as the form showed in (16) and (17).

Theorem 2: The coefficient matrix B can be stored in O(N ) memories.
Proof: From Theorem 3.1 we can find that the matrix A can be stored only by storing the following (2K + 1)-by-(2L + 1) entries Hence, A can be stored in O((2K + 1) * (2L + 1)) = O(N ) memories. Together with the fact that the penalization matrix P is diagonal, we can find that the stiffness matrix B can be stored in O(N ) memories. Theorem 3: For any vector u ∈ R N , the matrix-vector multiplication Bu can be carried out in O(N log N ) operations.
Proof: On the one hand, because the BTTB structure of the matrix A, the matrix-vector multiplication Au for any u ∈ R N can be computed in O(N log N ) operations [28], [29]. On the other hand, the matrix-vector multiplication Pu can be computed in O(N ) operations since the matrix P is diagonal. In conclusion, the total matrix-vector multiplication Bu = Au + Pu can be evaluated in O (N log N )

IV. A PRECONDITIONED FAST KRYLOV SUBSPACE METHOD
In the Krylov subspace iteration method, each iteration consists of matrix-vector multiplications and related vector operations. Because of the nonlocal property of the nonlocal diffusion model, the stiffness matrix B is usually dense. Hence, the evaluation of the matrix-vector multiplication in each iteration requires O(N 2 ) computational work. All other computations in each iteration still require O(N ) computational work. Thus, a fast Krylov subspace method can be developed due to the fast matrix-vector multiplication proved in the previous section. It can reduce the computational work from O(N 2 ) to O (N log N ) per Krylov subspace iteration. Furthermore, from Theorem 3.1, we can see that the memory requirement can also be reduced from O(N 2 ) to O(N ). However, the number of iterations still be large due to the singularity of the kernel function σ . This would in turn increases the overall computational cost and might even affect the convergence behavior of a Krylov subspace iterative solver.
Accordingly, we need to introduce an effective preconditioner to reduce the number of iterations. We consider the following preconditioner.
To begin with, because of the BTTB structure of the matrix A, we can apply the definition of the T. Chan's circulant matrix to T j for −L ≤ j ≤ L to construct a block-Toeplitz-Circulant-block(BTCB) matrix [28], [29]. More precisely, we obtain the matrix as follows Here each C l with −L ≤ l ≤ L is defined as where the diagonals of C l are given by Then using (29) and the definition of T.Chan's circulant matrix, we obtain the following block-circulant-circulantblock(BCCB) matrix where each circulant matrix blockC −k with 2 − N y ≤ k ≤ N y − 2 is defined as The matrixC is N y − 1-by-N y − 1 block circulant matrix with N x − 1-by-N x − 1 circulant blocks. It can be efficiently stored in O (N ) memory and inverted in O(N log N ) operations.
A straightforward choice of preconditioner for the coefficient matrix B is where P is the penalization matrix whose entries equal to 0 or 1/η, depending on the geometric structure of the domains and r . However, it's difficult to inverse the preconditioner B.
We then follow the idea in [30] to construct an efficient preconditioner.
Let := {1, 2, · · · , N }, c := {i|i ∈ , P i,i = 0} and r := {i|i ∈ , P i,i = 1/η}. Define In fact, and By (34) and (35), we can obtain in which the vector e i represents the i-th column of the identity matrix. Thus we have Then we obtain the practical preconditioner B 1 whose inverse can be expressed as Note that K c and K r are BCCB matrices, each inverse of these two matrices can be computed in O(N log N ) operations by fast Fourier transform method(FFT). Thus the inverse of the preconditioner B 1 can be computed in O(N log N ) operations. VOLUME 8, 2020

V. NUMERICAL EXPERIMENTS
In this section, we carry out several numerical experiments to investigate the performance of the fast collocation method for the nonlocal diffusion model in convex domains. In the numerical runs as follows, we consider the nonlocal diffusion model (1) with the kernel function In the following numerical experiments, the convex domain in (1) is chosen to be a circle = {(x, y) : The horizon parameter is set to be fixed as δ = 1/8. The true solution u(x, y) = x(1−x)y(1−y). The right-hand term f (x, y) can be computed using polar coordinate transformation as follows: The true solution is also used to define g(x, y) on the boundary zone c . We use MATLAB to run all the numerical experiments on a 4G-memory laptop. The code is home-made and the pick speed of CPU is 1.8 GHz.
Example 1: In this numerical example, we investigate the convergence behavior of the volume-penalized fast collocation method by choosing different kernel functions. The value of s in (42) is chosen to be 3/8, 0 and −1/2, respectively. The penalization parameter η is fixed as η = 10 −5 .
As discussed in Section 2, in order to solve the problem (1), we turn to solve the penalized problem (2). Therefore, we extend the physical convex domain to a rectangular domain r = (0, 1)×(0, 1) and letf (x, y) = 0 if (x, y) ∈ r \ . Letĝ(x, y) = 0, (x, y) ∈ rc . We partition r by uniform rectangular meshes, i.e., N x = N y . We solve the resulting linear system (11) by Gauss elimination method(Gauss), the conjugate gradient squared method(CGS), the fast conjugate gradient squared method(FCGS), and the preconditioned fast conjugate gradient squared method(PFCGS), respectively. We present the L 2 errors of the numerical solutions in the original convex domain , the corresponding convergence rates, the number of iterations in the corresponding conjugate gradient methods and the CPU time consumed by the above solvers for (11) in the Tables 1-3. Furthermore, we use a linear regression to fit the convergence rate α and the associated constant C α in the error estimate From Tables 1-3, we can obtain the following results: (i) The L 2 errors of the numerical solutions in the original convex domain become small as s decreases. This phenomenon is also observed in the work [22]. (ii) FCGS and PFCGS need less memory than the traditional solvers like Gauss and CGS. We can observe from Tables 1-3 that when N is equal to or bigger than 2 8 , the memory is not enough. However, FCGS and PFCGS do not have this trouble because of less memory requirement. (iii) From Tables 1-3, we can observe that FCGS and PFCGS need less CPU time than   To further clearly illustrate the time reduction for the fast collocation method, we present the comparison of the CPU time consumed by Gauss, CGS, FCGS, and PFCGS in the Figure 2 (the top one). Moreover, to illustrate the effectiveness of the preconditioner developed in this article, we also show the the CPU time consumed by FCGS and PFCGS in the Figure 2 (the bottom one). We can observe from Figure 1 that the CPU time consumed by the fast solvers FCGS and PFCGS is significantly reduced, and the PFCGS solver is more effective than FCGS solver. To show the effectiveness of the preconditioner introduced in previous section, we present the distributions of eigenvalues before and after precondition in Figure 3. From Figure 3, we can observe the eigenvalues vary dramatically before precondition, but they fluctuate around 1 after introducing the preconditioner.
Example 2: In this numerical experiment, we run the example with s = −1/2 considered in Example 1 and we investigate the influence of the penalization parameter η introduced in (2). The value penalization parameter η is  chosen to be 10 −5 , 10 −3 , 10 −1 , and 1, respectively. We only use PFCGS to run this numerical example. The L 2 errors of the numerical results are shown in Table 4.
From Table 4, we can observe that the L 2 errors of the numerical results become small while η reduces. Moreover, the convergence rate of the errors improves when η reduces.

VI. CONCLUSION AND FUTURE WORK
The fast methods for nonlocal models developed before require the physical domain where we numerically solve the governing equations is rectangular. To relax this restriction, In this article, we develop a fast collocation method for a static linear nonlocal diffusion model in convex domains via a volume-penalization approach. The main contributions of this article are listed as follows: (i) By introducing a penalization term to the original nonlocal diffusion model in convex domain, we extend the original model to a new problem on a rectangle which contains the physical convex domain. Then we develop a fast collocation method based on the extended problem by carefully analysing the structure of the coefficient matrix. (ii) We introduce an efficient preconditioner by carefully analysing the Toeplitz-like coefficient matrix. This method can be used when the physical domain is convex. Compared to the traditional collocation method, our method can significantly reduce the simulation time and computer storage when numerically solve the linear nonlocal diffusion model. However, because the coefficient matrix obtained from collocation discretization is no longer a BTTB matrix, this method does not apply to the nonlocal diffusion model with variable coefficients. We will investigate this problem in the future.