D3M: A Deep Domain Decomposition Method for Partial Differential Equations

A state-of-the-art deep domain decomposition method (D3M) based on the variational principle is proposed for partial differential equations (PDEs). The solution of PDEs can be formulated as the solution of a constrained optimization problem, and we design a hierarchical neural network framework to solve this optimization problem. Through decomposing a PDE system into components parts, our D3M builds local neural networks on physical subdomains independently (which can be implemented in parallel), so as to obtain efficient neural network approximations for complex problems. Our analysis shows that the D3M approximation solution converges to the exact solution of the underlying PDEs. The accuracy and the efficiency of D3M are validated and demonstrated with numerical experiments.

1. Introduction.Partial differential equations (PDEs) are among the most ubiquitous tools employed in describing computational science and engineering problems.When modeling complex problems, the governing PDEs are typically expensive to solve through transitional numerical methods, e.g., the finite element methods [9].While principal component analysis [44,35] (PCA), proper orthogonal decomposition [5,43] (POD) and reduced basis methods [40,31,8,6,19,13] are classical approaches for model reduction to reduce the computational costs, deep learning [16] currently gains a lot of interests for efficiently solving PDEs.There are mathematical guarantees called universal approximation theorems [12] stating that a single layer neural network can approximate most functions in Soblev spaces.Although there is still a lack of theoretical frameworks for explaining the effectiveness of multilayer neural networks, deep learning has become a widely used tool.Marvelous successful practices of deep neural networks encourages their applications to different areas, where the curse of dimensionality is a tormenting issue.
New approaches are actively proposed to solve PDEs based on deep learning techniques.E et al. [41,42] connect deep learning with dynamic system and propose a deep Ritz method (DRM) for solving PDEs via variational methods.Raissi et al. [33,34,32] develop physics-informed neural networks which combine observed data with PDE models.By leveraging a prior knowledge that the underlying PDE model obeys a specific form, they can make accurate predictions with limited data.Long et al. [25] present a feed-forward neural network, called PDE-Net, to accomplish two tasks at the same time: predicting time-dependent behavior of an unknown PDE-governed dynamic system, and revealing the PDE model that generates observed data.Later, Sirignano et al. [37] propose a deep Galerkin method (DGM), which is a meshfree deep learning algorithm to solve PDEs without requiring observed data (solution samples of PDEs).When a steady-state high-dimensional parametric PDE system is considered, Zhu et al. [47,48] propose Bayesian deep convolutional encoder-decoder networks for problems with high-dimensional random inputs.
When considering computational problems arising in practical engineering, e.g.aeronautics and astronautics, systems are typically designed by multiple groups along disciplinary.The complexity of solving large-scale problems may take an expensive cost of hardware.The balance of accuracy and generalization is also hard to trade off.For this reason, decomposing a given system into component parts to manage the complexity is a strategy, and the domain decomposition method is a traditional numerical method to achieve this goal.Schwarz [36] proposes an iterative method for solving harmonic functions.Then this method is improved by S.L.Sobolev [38], S.G.Michlin [27], P.L.Lions et al. [22,23].Domain decomposition is also employed for optimal design or control [2], for decomposing a complex design task (e.g., decomposition approaches to multidisciplinary optimization [21,15]), and for uncertainty analysis of models governed by PDEs [20,7].
In this work, we propose a variational deep learning solver based on domain decomposition methods, which is referred to as the deep domain decomposition method (D3M) to implement parallel computations along physical subdomains.Especially, efficient treatments of complex boundary conditions are developed.Solving PDEs using D3M has several benefits: Complexity and generalization.D3M manages complexity at the local level.Overfitting is a challenging problem in deep learning.The risk of overfitting can be reduced by splitting the physical domain into subdomains, so that each network focuses on a specific subdomain.
Mesh-free and data-free.D3M constructs and trains networks under variational formulation.So, it does not require given data, which can be potentially used for complex and high-dimensional problems.
Parallel computation.The computational procedures of D3M are in parallel for different subdomains.This feature helps D3M work efficiently on large-scale and multidisciplinary problems.
In this work, D3M is developed based on iterative domain decomposition methods.The development of using domain decomposition leads to an independent model-training procedure in each subdomain in an "offline" phase, followed by assembling global solution using pre-computed local information in an "online" phase.Section 2 reviews iterative overlapping domain decomposition methods.Section 3 presents the normal variational principle informed neural networks, and our D3M algorithms.A convergence analysis of D3M is discussed in Section 4, and a summary of our full approach is presented in Section 5. Numerical studies are discussed in Section 6.Finally, Section 7 concludes the paper.
2. Overlapping domain decomposition.The Schwarz method [36] is the most classical example of domain decomposition approach for PDEs, and it is still efficient with variant improvements [18,45,46].
Given a classical Poisson's equation We divide Ω into two overlapping subdomains Ω i , i = 1, 2 (see Figure 2.1), where We introduce the original formula named Schwarz alternating method here.Let u 0 be an initial guess defined in Ω and vanishing on ∂Ω.For k ≥ 0, we define sequences u k i where u k i denotes u k in Ω i .The u k+1 i is determined from an iteration algorithm: 3. Deep domain decomposition method with variational principle.Before introducing D3M, we first give a brief introduction of variational principle.In this section, we consider the Poisson's equation and reformulate (2.1) as a constrained minimization problem, and then we introduce the D3M algorithm.
3.1.Variational principle.The Poisson's equation with the homogeneous Dirichlet boundary condition is (2.1), and we consider the situation that f ∈ L 2 (Ω) and Ω is a square domain in this section.The idea of the standard Deep Ritz method is based on the variational principle.That is, the PDE can be derived by a functional minimization problem as described in the following proposition.
Proposition 1. Solving the Poisson's equation (2.1) is equivalent to an optimization problem The Lagrangian formula of (3.1) is given by where q is the Lagrange multiplier.Definition 1. H(div) denotes symmetric tensor-fields in H 1 space, in which functions are square integrable and have square integrable divergence.
We employ a mixed residual loss [48] following Hellinger-Reissner principle [3].With an additional variable τ ∈ H(div), which represents flux, we can turn Equation (2.1) into The mixed residual loss is 3.2.Variational principle informed neural networks.Though the Poisson's equation (2.1) is reformulated as an optimization problem, it is intractable to find the optimum in an infinite-dimensional function space.Instead, we seek to approximate the solution u(x, y) by neural networks.We utilize N u (x, y; θ u ), N τ (x, y; θ τ ) to approximate the solution u and the flux τ in domain Ω, where θ u and θ τ are the parameters to train.The input is the spatial variable in Ω, and the outputs represent the function value corresponding to the input.With these settings, we can train a neural network by variational principle to represent the solution of Poisson's equation.The functional minimization problem (3.4) turns into the following optimization problem (3.5) min Remark 1.In practical implementation, N u and N τ are embedded in one network N parameterized with θ, and the two outputs of N denote the function values of N u and N τ respectively.
Therefore, the infinite-dimensional optimization problem (3.1) is transformed into a finite-dimensional optimization problem (3.5).Our goal is to find the optimal (or sub optimal) parameters θ to minimize the loss in (3.5).To this end, we choose a mini-batch of points randomly sampled in Ω.These data points can give an estimation of the integral in (3.5) and the gradient information to update the parameters θ.For example, a mini-batch points {(x i , y i )} m+n i=1 are drawn in Ω randomly, where {(x i , y i )} m i=1 in Ω and {(x i , y i )} n i=1 on ∂Ω.Then the parameters can be updated by using optimization approaches 3.3.Implementation details for neural networks.This section provides details for the architecture of our neural networks.
For giving a direct-viewing impression, we show the implementation with a plain vanilla densely connected neural network to introduce how the structure works in Figure 3.1.For illustration only, the network depicted consists of 2 where φ is called the activation function.There are some commonly used activation functions such as the sigmoidal function, the tanh function, the rectified linear units (ReLU) function [28], and the Leaky ReLU function [26].We employ the tanh function, and the automatic differentiation is obtained by using PyTorch [30].The total loss function comprises the residual loss terms L 1 , L 2 and the Lagrangian term which guarantees the constraint conditions.The parameters are trained with backpropogating gradients of the loss function and the optimizer is L-BFGS [24] where the learning rate is 0.5.In practice, the model architecture of neural networks is the residual network (ResNet) [11].These residual networks are easier to optimize, and it can gain accuracy from considerably increased depth.The structure of ResNet improves the result of deep networks, because there are more previous information retained.A brief illustration of ResNet is in Figure 3.2.

Deep domain decomposition method.
In this part, we propose the main algorithms of our D3M.Because the physics-constrained neural network is mesh-free, we improve Schwarz alternating method with a better inhomogeneous D3M sampling method at junctions Γ i to accelerate convergence.We note the performance of normal deep variational networks and mixed residual networks can deteriorate when the underlying problem has inhomogeneous boundary conditions.Our treatment to overcome this weakness is to introduce the following boundary function.
Definition 2. (Boundary function) A smooth function g(x, y) is called a boundary function associated with Ω if where a 1 is a coefficient, the notation d(x, y, ∂Ω) denotes the shortest Euclidean distance between (x, y) and ∂Ω.If the point (x, y) is on the boundary, g(x, y) = u(x, y).If not, the value of g(x, y) decreases to zero sharply.And we define v := u − g, where v satisfies Letting v i = u i − g i on each subdomain, Equation (3.3) can be represented as The mixed residual loss is (3.11) It should be noted that, the integration is completed by Monte Carlo, such that the domain decomposition reduces the variance of samples significantly with the same number of data because the area of samples becomes smaller.
Run Algorithm (2) in each subdomain in parallel.

6:
Update the parameters θ i by descending its stochastic gradient: The procedure of D3M is as follows.We first divide the domain Ω into d subdomains, and each two neighboring subdomains are overlapping.The local solution of PDEs on each subdomain is replaced by neural networks which can be trained through the variational principle, where the global solution on the whole domain consists of these local solutions on subdomains.To be more precise, let Γ i denote decomposed junctions, θ is initial weights of neural networks, η is the threshold of accuracy, S i and g i are the samples generated in Ω i and on interface Γ i to evaluate the output of networks in each iteration, Ŝ i are training samples in subdomain Ω i , ĝi are training samples on Γ i , n is training time in each iteration, m 1 and m 2 are batch sizes, N u is the neural networks for u, N τ is the neural networks for τ , k is the iteration time, and S ol (k+1) i is the output of networks for subdomain Ω i in (k + 1)-th iteration.The formal description of D3M is presented in Algorithm 1.

Analysis.
While mixed residual formulation is a special case, we consider the basic functional formulation first, (4.1) Definition 3. Given m closed subspaces {V i } m i=1 and V = m i=1 V i ∈ C 2 (Ω), for any R < ∞, and a proper, lower semi-continuous, coercive convex functional J : V → , we denote K R := {u ∈ V|J(u) < R}.
where J' is uniformly continuous on K R .Proposition 2. (P.L. Lions, 1989) The alternating Schwarz method (2.3) and (2.4) converges to the solution of u of Equation (3.10).The error bound of û1 k+1 and û2 k+1 can be estimated via maximum principle [14,23], ∃ ρ ∈ (0, 1) where constants ρ is close to one if the overlapping region Ω i, j is thin.Lemma 1. (K.Hornik, 1991) On each subdomain Ω i , neural network N i with continuous derivatives up to order K are universal approximators in Sobolev space with an order K, which means N i ∈ H 1 (Ω i ).
Lemma 2. (P.L. Lions, 1988) If the variational formulation (4.1) satisfies the assumption (1), then it follows that there exists a sequential u n ∈ V i obtained by Schwarz alternating method converges to the minimum u * i of J(u i ) on each subdomain Ω i .
Theorem 1. J i (N i ) denotes the objective function on the subdomain Ω i .Under above assumptions, for ∀ > 0, ∃M > 0, while iteration times k > M, N k i converges to optimal solution u * i of J(u i ) in subdomain Ω i for a constant C > 0 (4.4) For concision, we use N i to represent N k i in the following part.Proof.u n ∈ V i denotes the sequential in Lemma 2, there exist (4.5) Then with Lemma 1 [12], in each subdomain Ω i , neural network N i ∈ H 1 .If the training times k in each subdomain is enough, the universal approximation ensures the distance between functions N i and u n is close enough in the Sobolev space, with the constant While u n converging to the minimum of u * i of J(u), by the optimality conditions it is clear that Under the assumption, ∃ C > 0 and the difference between two iterations can be represented as Consider equation (4.5), (4.6) and (4.8), we have (4.9) Theorem 2. For a given boundary function g and a fixed q, the optimal solution N * u of Equation Function v we optimized is also the optimal solution N u with the formula N * u = v * + g.Up to now, we prove that D3M can solve steady Poisson's equation with variational formulations.Then, we extend D3M to more general quasilinear parabolic PDEs (4.11) with physics-constrained approaches.
where τ i denotes ∇u i , and Ω i ∈ R d are decomposed boundary sets with smooth boundaries ∂Ω i .We recall the network space of subdomain Ω i with generated data according to [12] (4.12) where σ is any activation function, x ∈ R k is one set of generated data, β ∈ R n , α ∈ R k×n and θ ∈ R k×n denote coefficients of networks.Set N i (σ) = ∞ i=1 N n i (σ).Under the universal approximation of neural networks Lemma 1, in each subdomain the neural networks f n i satisfies (4.13) where h n and b n satisfy (4.14) h n 2 2,Ω i + b n 2 2,∂Ω i → 0, as n → ∞.For the following part of analysis, we make some assumptions.Assumption 2.
• There is a constant µ > 0 and positive functions κ(x), λ(x) such that for all x ∈ Ω i we have • In each subdomain, the derivatives of solutions from alternating Schwarz method (2.3), (2.4) converge to the derivative of solution u i .Precisely, there exists a constant ρ 1 ∈ (0, 1), such that for iteration times ∀k ≥ 0 Theorem 3. Suppose the domain Ω is decomposed into {Ω i } p i=1 , k > 0 denotes iteration times (omitted in notations for brief).N i (ψ) denotes networks space space in subdomain Ω i , where subdomains are compact.Assume that target function (4.11) has unique solution in each subdomain, nonlinear terms div(x, u, ∇u) and γ(x, u, ∇u) are locally Lipschitz in (u i , ∇u i ), and ∇u k i uniformly converges to ∇u * i with k.For ∀ > 0, there ∃ K > 0 such that there exists a set of neural networks {N i ∈ N i (ψ)} p i=1 satisfies the L 2 error E 2 (N i ) as follow Proof.In each subdomain Ω i , with iteration times k > 0, E k 2 (N i ) denotes the L 2 loss between N k i and u * i . (4.16) With Lemma 1, it is clear that the sum of last term is smaller than K 1 , where K 1 > 0 is a constant.We assumed that ∇u k i uniformly converges to ∇u * i with k, and this means that (4.17) where K 2 > 0 is a constant, and the error bound between u k i and N k i is proved from Theorem 7.1 in [37].
where K 3 > 0 is a constant depends on .While γ(•) is also Lipschitz continuous, we can prove the upper bound of Under Assumption 2 and Equation (4.14), with iteration times k → ∞, the set of neural networks N i converge to the unique solutions to (4.11), strongly in L ρ (Ω i ) for every ρ < 2. In addition, in each subdomain the sequence {N n i (x)} n∈N is bounded in n under the constraint of Proposition 2 and converges to u i .Proof.In each subdomain, the convergence can be obtained from the Theorems 7.1 and 7.3 in [37].With the Proposition 2, the sequence {N n i } n∈N is uniform bounded in n, and the rates of convergence to the solution u * are related to overlapping areas.
Specially, the n in {N n i (x)} n∈N means training times instead of iteration times.We leave the proof for time-dependent data-free variational formulations for future work.

D3M summary.
To summarize the strategies in Section 3 and 4, the full procedure of D3M comprises the following steps: • pre-step: Set the architecture of neural networks in each subdomain; • offline step: Construct functions for boundary conditions, train networks and generate local solutions; • online step: Estimate target input data using neural networks.If solutions don't converge, transfer information on interfaces and go back to the offline step.The pre-step is cheap, because the setting of neural networks is an easy task.In the offline stage, the complex system is fully decomposed into component parts, which means that there is no data exchange between subdomains.Since we only approximate the data on interface with normal simple approach such as Fourier series and polynomial chaos [7,4], the approximation is also low costly.After decomposition, the requirement for number of samples for the Monte Carlo integration of the residual loss function (3.11) is significantly reduced, while the density of samples does not change.Since the number of samples decreasing and the domain becoming simpler, we can use neural networks with few layers to achieve a relatively high accuracy.If we keep the same number of samples and layers as the global setting, D3M should obtain a better accuracy.The cost of the online step is low, since no PDE solve is required.This full approach is also summarized in Figure 5.1, where transformations of data happen between adjacent subdomains.
For the problems we focus on (systems governed by PDEs), the cost of the D3M is dominated by the local training procedures in the offline step.Here we present a rough order of magnitude analysis of the computational costs where C solve denotes the cost of one block in each training epoch (i.e., the cost of any block with the same number of neurons in each training iteration is taken to be equal for simplicity).The dominant cost of D3M is the total number of blocks of neural network and training times, p i=1 N i B i T i C solve , where N i is sample size, B i is number of blocks and T i is training times.If we consider equal offline sample sizes, number of blocks and training epochs, N i = N off , B i = B off , T i = T off for all subdomains {D i } p i=1 , then total cost can be written as pN off B off T off C solve .The total cost is decreased by employing the idea of hierarchical neural networks [29,17].Remark 2. According to the research on overfitting, the hypothesis set (w.r.t. the complexity of neural networks) should match the quantity and quality of data instead of target functions [1].So in the initial several iterations, the number of residual blocks is small.The number increases while the sample size in Figure 6.2 increases.
After decomposition, with designed g i satisfying Definition 1 for each subdomains, the function v i := u i − g i satisfies the homogeneous Poisson's equation and follows (2.3) and (2.4).
The results of D3M is shown in 6.3(a).For comparison, we plot the result of normal deep Ritz method (DRM) with the same type of network and the finite element method (FEM) in Figure 6.3(b) and 6.3(c).We set the result of FEM as the groundtruth and define the relative error e r = sol− f em sol 2 f em sol 2 . We compare results using residual network (ResNet), and the comparison including relative errors are shown in Table 6.1.We can see that with the same setting for networks, our D3M offers a higher accuracy than normal DRM in this experiment.Where h = h 2π is the reduced Planck constant, m is the particle's mass and E is a known constant related to the energy level.This equation occurs often in quantum mechanics where V(r) is the function for potential energy.Here we consider an infinite potential well The variational loss is (6.4) where Ω |Ψ(r)| 2 dr = 1, because Ψ(r) is the wave function and Ψ(r) 2 means the probability density of particle appearing.P i denotes the probability of particle appearing in Ω\Ω i .It should be noted that N i is an approximation of Ψ(r) i − g i , where Ψ(r) i is the global solution Ψ(r) restricted on the subdomain Ω i and g i is the boundary function for the interface of Ω i .The most significant contribution of the proposed approach is parallel computation, which lays a foundation for employing physics-constrained deep learning framework in large-scalar engineering simulations or designs.This is accomplished by incorporating domain decomposition method into the loss function.Based on the property of mesh-free, we propose a new D3M sampling method to improve computational efficiency.And our framework absorbs the idea of mixed finite element method, so that the boundary condition can be satisfied more accurately.We have demonstrated that deep domain decomposition method can solve general parabolic PDEs with high accuracy.Furthermore, the generalization performance of D3M should be better, because domains are smaller.
In theory, our approach is feasible to solve complex systems with many subdomains and corresponding neural networks.In practice, the approach suffers from three main bottlenecks.One is that a bad initialization of neural networks can lead to superfluous cost for following iterations.One is the choice of function for approximating interfaces.And another is that the choice of Lagrangian multiplier is important but lacks of prior.We leave these questions for future work.

Fig. 3 . 1 .
Fig. 3.1.Illustration of the neural networks.x, y are inputs, u, τ x , τ y are outputs and the dashed box with σ means the architecture of plain fully-connected neural networks.

Fig. 6 . 4 .
Fig. 6.4.The solutions of wave function and probability density for a time-independent Schrdinger equation.
in the same formula with Equation (4.19), denoted as K 4 .

Table 6 . 1
The relative error for Poisson's equation.

Table 6 . 2
Relative errors for the wave function and the probability density.This paper has proposed a new deep domain decomposition method.