Unsupervised Legendre–Galerkin Neural Network for Solving Partial Differential Equations

In recent years, machine learning methods have been used to solve partial differential equations (PDEs) and dynamical systems, leading to the development of a new research field called scientific machine learning, which combines techniques such as deep neural networks and statistical learning with classical problems in applied mathematics. In this paper, we present a novel numerical algorithm that uses machine learning and artificial intelligence to solve PDEs. Based on the Legendre-Galerkin framework, we propose an unsupervised machine learning algorithm that learns multiple instances of the solutions for different types of PDEs. Our approach addresses the limitations of both data-driven and physics-based methods. We apply the proposed neural network to general 1D and 2D PDEs with various boundary conditions, as well as convection-dominated singularly perturbed PDEs that exhibit strong boundary layer behavior.


I. INTRODUCTION
Modern machine learning methods using deep neural networks (DNNs) have achieved impressive results in a variety of areas, including image recognition [1], natural language processing [2], cognitive science [3], and time series classification [4]. Recently, machine learning techniques have become popular for solving partial differential equations (PDEs) and dynamical systems. Neural networks, which are effective at approximating nonlinear functions, have been used for computational parameterization through machine learning and optimization methods. This has led to the emergence of a new field called scientific machine learning, which uses machine learning techniques to solve a variety of PDEs; see e.g. [5], [6], [7], [8] and the references therein. In this paper, we propose a novel algorithm for numerically solving PDEs using machine learning and artificial intelligence. The algorithm is an unsupervised machine learning method based on the Legendre-Galerkin neural network framework, which allows us to find accurate approximations to the solutions of various types of PDEs. Our method has the unique ability to learn multiple instances without the need for a training database. The proposed neural network is effective in solving not only general 1D and 2D differential equations, but also convection-dominated singularly perturbed PDEs that exhibit boundary layer behavior.
We develop an unsupervised Legendre-Galerkin neural network (ULGNet) to solve various (partial) differential equations without a training dataset consisting of solutions of the PDE. When the boundary conditions (BCs) and external forcing functions are given as inputs, the proposed ULGNet predicts numerical solutions to the PDEs. Hence, the ULGNet can learn multiple instances of the solutions of PDEs. The loss function is set as a residual quantity motivated by the weak formulation of the Legendre-Galerkin approximation, as in the spectral element methods [9]. Generating a training dataset is often inefficient because the process relies on numerical methods such as finite difference, finite element, and finite volume approximation. Although unsupervised learning seems efficient, most unsupervised algorithms for scientific machine learning predict only a single instance of the PDE. Hence, when the input information of initial conditions, boundary conditions, and external forcing is changed, the neural network must be retrained. The proposed ULGNet is designed to overcome these drawbacks of recently developed algorithms in scientific machine learning: being unsupervised, it does not need a dataset of solutions, and it predicts multiple instances of PDE solutions. In order to construct an approximation to the solution of PDEs, we borrow a framework from the spectral method (SM) as in [10]. In particular, once the ULGNet infers coefficients, say $\alpha_i$, of polynomial basis functions $\varphi_i$, we construct the linear combination $\sum_{i=0}^{N-1} \alpha_i \varphi_i$ to approximate a solution to the PDEs [9], [11]. Since each basis function satisfies the boundary condition exactly, so does the predicted solution. In addition, since the SM is known to be a high-resolution numerical scheme, we expect our predicted numerical solutions to yield relatively small errors compared to other machine learning-based approaches.
The proposed ULGNet consists of a convolutional neural network (CNN) with a nonlinear activation function between each layer. A convolutional layer not only compresses the input data but also preserves correlations within the input. Hence, the main architecture of the ULGNet is adopted from network structures used in computer vision tasks [1], [4]. We consider the Legendre-Galerkin (LG) framework of the boundary value problem at the heart of our full solver. Compared to the residual of the strong form of the PDEs used in PINNs, the weak form reduces higher-order differentiation to lower order via integration by parts, lessening the errors caused by high-order differentiation. Hence, the performance of the network is further improved by avoiding high-order numerical derivatives. In addition, we set the Legendre polynomials as test functions of the LG approximation in the simulations. Hence, the numerical integration is highly accurate thanks to the Gauss-Legendre quadrature.
One of the main contributions of our study is the development of a learning architecture that accurately solves singular perturbation and boundary layer problems. Numerical methods for singularly perturbed convection-diffusion problems pose substantial difficulties since a small diffusive parameter produces a sharp transition inside thin layers. In most classical simulations of singularly perturbed equations, the computed solutions produce a large numerical error when the diffusive term is small. Hence, semi-analytic methods have been proposed and successfully applied as ''enriched spaces''. The main component of the enriched space method is to add a global basis function (a so-called corrector function in the analysis) and supplement the discrete space with the corrector function. The corrector functions constructed by singular perturbation analysis can absorb the boundary/interior layer singularities near the sharp transition. In machine learning, it is well known that neural networks prioritize learning the low-frequency components of the target function, a phenomenon known as spectral bias [12]. More precisely, since neural networks rely on a smooth prior, the spectral bias leads to a failure to accurately capture high oscillations or sharp transitions in solution functions. Hence, without care, neural networks cannot capture the sharp transition caused by the boundary layer [13], [14]. In our algorithm, since the coefficients of the spectral approximation are predicted by the neural network, the main structure of the numerical scheme is maintained. From this perspective, existing numerical methods, such as enriched space methods, are adaptable to the ULGNet architecture by adding a proper basis function according to [22], [23], [24], and [25]; for more details, see Section V. Hence, the ULGNet successfully resolves the boundary layer behavior, which is not well studied in recent machine learning approaches. In Table 1, we provide a comparison of recent machine learning approaches for solving PDEs.
Our contributions are listed as follows:
• We propose a novel ULGNet (Legendre-Galerkin-based) architecture that is able to learn multiple instances of solutions without the need for a training dataset.
• The predicted solution produced by the proposed algorithm satisfies an exact boundary condition, reducing numerical errors caused by boundary values.
• The ULGNet is expected to have low generalization errors, as the structure of our basis framework enhances the expressive power of the algorithm.
• The ULGNet is a learning architecture that is capable of accurately solving convection-dominated singularly perturbed problems that exhibit strong boundary layers.

II. RELATED WORKS
Several neural network approaches that do not require solution datasets (also known as training datasets) have been developed with success, such as physics-informed neural networks (PINNs) [8], [26], [27], [28], [29], [30], the deep Ritz method (DRM) [15], [31], [32], and the deep Galerkin method (DGM) [16]. These approaches introduce a residual function to define a loss function based on the differential equations, rather than relying on a solution dataset. Specifically, PINNs utilize collocation points as training data points in the space-time domain [7], [27], [28], [29], [33], [34], [35]. As a result, the PINN algorithm can be applied to time-dependent, multi-dimensional PDEs, fractional PDEs, computational domains of various shapes, and adaptive collocation points; see e.g., [36], [37], [38], [39], [40], [41], and [42]. However, these neural networks optimize only a single instance from a fixed input, such as initial conditions, boundary conditions, external forcing terms, and coefficients. Therefore, when the input is changed, the training process has to be repeated. Obtaining outputs in real time for complex systems that require various sets of input data is difficult with these approaches that learn a single PDE solution. On the other hand, data-supervised or data-driven methods (DDMs) have been explored in [10], [17], [19], [43], [44], [45], [46], and [47]. Recently, an impressive discovery in the field of nonlinear dynamics has been achieved with the development of a new method called Sparse Identification of Nonlinear Dynamics (SINDy), which promotes sparsity in the identified models; see e.g. [48] and [49]. Since the loss functions of DDMs directly compare a true solution with an approximation generated by DNNs, the neural networks fit the numerical solution in a process similar to supervised learning tasks in machine learning. Since neural networks for DDMs are designed to take various input data such as initial conditions (ICs), boundary conditions (BCs), and external forcing terms, they are able to produce multiple instances of the PDEs. However, to train the DNNs, solutions of the PDEs are required as a training dataset, which must be provided analytically or numerically in advance. The deep operator network (DeepONet, [18]) has also been studied intensively in multiple fields, such as multiphysics and multiscale problems in [50] and [51], a method using the Fourier Neural Operator in [52], a method for multiple-input operators in [53], and a method for input data with various degrees of accuracy in [54]. The authors in [21] and [20] introduced operator learning approaches that can learn multiple PDE solutions. However, as with other ML-based approaches, they are not able to accurately predict numerical solutions to stiff PDEs that exhibit a sharp transition near the boundary. Recently, the Galerkin Neural Network (GNN) was introduced to learn a test function for a single instance in [55] and [56] and to address the boundary layer issue arising from reaction-diffusion-type singular perturbation problems.

III. METHOD
Since the ULGNet is motivated by the spectral element method, we briefly describe the LG spectral element method underlying our algorithm. Let us consider second-order elliptic differential equations with boundary conditions,

$$-\epsilon \Delta u + F(u) = f \ \text{in } \Omega, \qquad B(u) = 0 \ \text{on } \partial\Omega, \quad (1)$$

where $F$ is a linear or nonlinear operator, $B$ is a boundary operator, and $\epsilon > 0$ is a constant. An integral form or a weak form of (1) can be defined as

$$\epsilon (\nabla u, \nabla \varphi) + (F(u), \varphi) = (f, \varphi) \quad (2)$$

for any $\varphi$ in an appropriate Hilbert space $H$. Thanks to the elliptic theory [57], [58], [59], it is well known that there exists a weak solution satisfying (2) in $H$. The numerical solution of (2) in $H$ can be represented as

$$u(x) \approx \sum_{k} \alpha_k \varphi_k(x). \quad (3)$$

Throughout this paper, we set the basis functions to be compact combinations of Legendre polynomials,

$$\varphi_k(x) = L_k(x) + a_k L_{k+1}(x) + b_k L_{k+2}(x), \quad (4)$$

where $L_k$ is the Legendre polynomial of degree $k$. One can represent various boundary conditions by choosing proper $a_k$ and $b_k$; see e.g., [9]. This paper provides examples for the homogeneous Dirichlet boundary condition (see (12) for more details) as well as the homogeneous Neumann boundary condition (see (17)) as paradigm examples. Applying the LG method to (3), we find an approximation to the PDEs in a finite subspace of $H$ spanned by $\{\varphi_k\}_{k=0}^{N-2}$, where $N$ is a finite integer; that is,

$$u_N = \sum_{k=0}^{N-2} \alpha_k \varphi_k. \quad (5)$$

Now, the LG method aims to determine the coefficients $\alpha_k$ for $0 \le k \le N-2$. While the spectral method is quite similar to the finite element method, the basis functions in the spectral method are global and continuously differentiable on the whole domain. Specifically, if a function $u$ belongs to the Sobolev space $H^m$, then the corresponding approximation error decays like $O(N^{-m})$; for more details see e.g., [9]. Hence, provided that $u$ is smooth enough, the spectral method has excellent error properties; that is, it achieves the so-called exponential convergence. The compact form (4) can represent various boundary conditions such as the Dirichlet (see Section IV-A) and Neumann conditions (see Section IV-B) by choosing $a_k$ and $b_k$. In addition, the orthogonality of the Legendre polynomials makes the mass and stiffness matrices sparse; see e.g., [9].
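For illustration, the compact basis (4) is straightforward to evaluate with NumPy's Legendre utilities. The following is a minimal sketch (the helper `compact_basis` is our own, not from the original implementation); it evaluates $\varphi_k = L_k + a_k L_{k+1} + b_k L_{k+2}$ and checks that the Dirichlet choice $a_k = 0$, $b_k = -1$ vanishes at $x = \pm 1$:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def compact_basis(k, x, a_k=0.0, b_k=-1.0):
    """Evaluate phi_k(x) = L_k(x) + a_k L_{k+1}(x) + b_k L_{k+2}(x)."""
    c = np.zeros(k + 3)                  # coefficients in the Legendre basis
    c[k], c[k + 1], c[k + 2] = 1.0, a_k, b_k
    return legval(x, c)

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
for k in range(4):
    phi = compact_basis(k, x)            # Dirichlet choice: a_k = 0, b_k = -1
    # L_k(+-1) = (+-1)^k, hence phi_k(+-1) = 0 for this choice
    print(k, phi[0], phi[-1])
```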
Regarding the neural network architecture, we construct the ULGNet based on the Legendre-Galerkin Deep Neural Network (LGNet) algorithm introduced in our previous study [10]. The input of the neural network is the external forcing function of the PDEs. The input feature then passes through the neural network and produces an output, $\{\hat{\alpha}_k\}$. After that, $\hat{u}$ is reconstructed by

$$\hat{u} = \sum_{k=0}^{N-2} \hat{\alpha}_k \varphi_k. \quad (6)$$

In order to train the ULGNet without a true solution, we employ a weak-form residual as the loss function,

$$\mathcal{L} = \frac{1}{P} \sum_{i=1}^{P} \sum_{j=0}^{N-2} \left| \epsilon (\nabla \hat{u}_i, \nabla \varphi_j) + (F(\hat{u}_i), \varphi_j) - (f_i, \varphi_j) \right|^2, \quad (7)$$

where $P$ is the number of inputs.
The parameters of the ULGNet are updated to minimize the loss function defined in (7). For the optimization algorithm, we adopt the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, as in [60]; a minimal sketch of the resulting training loop is given below. If the number of Legendre-Gauss-Lobatto (LGL) collocation points is N + 1, the number of corresponding basis functions is N − 1.
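The training loop can be sketched in PyTorch as follows. This is an illustration only, not the authors' implementation: the stand-in `model` and `weak_residual_loss` are placeholders for the CNN of Section III and the weak-form loss (7).

```python
import torch

# Placeholders so the sketch runs end to end; in the ULGNet, `model` is
# the CNN mapping forcing samples to coefficients and the loss is (7).
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 30))
f_batch = torch.randn(16, 32)                 # a batch of sampled forcings

def weak_residual_loss(alpha, f):
    return (alpha ** 2).mean() + 0.0 * f.sum()   # dummy residual

optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0,
                              max_iter=20, history_size=50)

def closure():
    # L-BFGS may re-evaluate the objective several times per step,
    # so loss and gradient are computed inside a closure.
    optimizer.zero_grad()
    loss = weak_residual_loss(model(f_batch), f_batch)
    loss.backward()
    return loss

for epoch in range(10):
    optimizer.step(closure)
```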
Accordingly, the loss function in (7) uses N − 1 test functions in total. Figure 1 displays a schematic diagram of our method, the ULGNet. Once forcing functions $f$ are given as input data, they are fed into the ULGNet. Subsequently, the input data pass through the neural network, which consists of multiple layers of convolutions and activations. The features are then flattened by a fully connected network. As the output, a set of coefficients is obtained. The approximation $\hat{u}$ corresponding to the given forcing function $f$ is constructed as $\sum_{k=0}^{N-2} \hat{\alpha}_k \varphi_k$. After that, the loss of $\hat{u}$ is measured by the N − 1 weak formulations of the target PDEs, which serves as a surrogate for the difference between $\hat{u}$ and the true solution in a certain norm.
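The forward pass just described can be summarized in code. The sketch below follows the hyperparameters reported in Section IV-A (32 convolutional filters, kernel size 5), while the depth and padding are our assumptions:

```python
import torch
import torch.nn as nn

class ULGNetSketch(nn.Module):
    """Schematic ULGNet: forcing samples in, coefficients and the
    reconstructed solution out (cf. Figure 1)."""
    def __init__(self, n_points=32, n_coeffs=30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.head = nn.Linear(32 * n_points, n_coeffs)

    def forward(self, f, basis):
        # f: (batch, n_points) forcing sampled at the LGL nodes
        # basis: (n_coeffs, n_points), row k holds phi_k at the nodes
        z = self.features(f.unsqueeze(1)).flatten(1)
        alpha = self.head(z)                  # predicted coefficients
        u_hat = alpha @ basis                 # reconstruction as in (6)
        return alpha, u_hat

model = ULGNetSketch()
alpha, u_hat = model(torch.randn(8, 32), torch.randn(30, 32))
```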
To close this section, we define some metrics to measure the errors. The relative $L^2$ error is defined as

$$\frac{\| \hat{u} - u \|_{L^2}}{\| u \|_{L^2}}. \quad (8)$$

The mean absolute error (MAE) is written as

$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} | \hat{u}(x_i) - u(x_i) |, \quad (9)$$

where $M$ is the number of nodal points. The $L^\infty$ error is defined by

$$\| \hat{u} - u \|_{L^\infty} = \max_{1 \le i \le M} | \hat{u}(x_i) - u(x_i) |. \quad (10)$$
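In NumPy, with $\hat{u}$ and $u$ sampled at the $M$ nodal points, these three metrics read as follows (a direct transcription, not library code):

```python
import numpy as np

def rel_l2(u_hat, u):
    # relative L2 error: ||u_hat - u|| / ||u||
    return np.linalg.norm(u_hat - u) / np.linalg.norm(u)

def mae(u_hat, u):
    # mean absolute error over the M nodal points
    return np.mean(np.abs(u_hat - u))

def linf(u_hat, u):
    # maximum pointwise error
    return np.max(np.abs(u_hat - u))
```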

IV. NUMERICAL EXPERIMENTS: NON-STIFF DIFFERENTIAL EQUATIONS
We train an ULGNet model to solve non-stiff (partial) differential equations. Since the solution profile is smooth, the LG framework can successfully approximate the solution.
In Table 2, convergence results and the key hyperparameters are presented to demonstrate the performance of our proposed algorithm.

A. CONVECTION DIFFUSION EQUATION
As a paradigm example, we demonstrate that the ULGNet is able to solve the one-dimensional convection-diffusion equation (CDE) with the homogeneous Dirichlet boundary condition,

$$-\epsilon u_{xx} - u_x = f \ \text{in } (-1, 1), \qquad u(\pm 1) = 0, \quad (11)$$

where $\epsilon = 0.1$.
To assign the homogeneous boundary condition, we set $a_k = 0$ and $b_k = -1$ for all $k$ in (4). Hence, the compact form of the Legendre polynomial basis can be written as

$$\varphi_k(x) = L_k(x) - L_{k+2}(x), \quad (12)$$

where $L_k$ is the $k$-th Legendre polynomial. In our experiments, we choose $N + 1 = 32$ LGL nodal points; a sketch of how such nodes can be computed is given below. Then, $\{\varphi_k\}_{k=0}^{29}$ becomes the corresponding set of basis functions for the nodal points. Following [10], we use the ReLU activation function for the linear problem, with 32 convolutional filters and a kernel size of 5.
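The LGL nodes themselves are the endpoints $\pm 1$ together with the roots of $L_N'$; one way to compute them with NumPy (a convenience sketch, not the paper's code) is:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def lgl_nodes(N):
    """Return the N+1 Legendre-Gauss-Lobatto nodes on [-1, 1]:
    the endpoints plus the roots of the derivative of L_N."""
    interior = Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(interior.real), [1.0]))

x = lgl_nodes(31)      # N + 1 = 32 nodes, as in the experiments
print(x.shape)         # (32,)
```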
As inputs to the ULGNet, we generate $P = 10{,}000$ forcing functions of the form (13), where $h_{ji}$ and $m_{ji}$ for $j = 1, 2$ and $i = 1, 2, \cdots, P$ are drawn from uniform distributions on $[3, 5]$ and $[0, 2\pi]$, respectively. After feeding the inputs, the ULGNet yields 10,000 sets of coefficients $\{\hat{\alpha}_{ik}\}_{k=0}^{29}$ corresponding to the $i$-th forcing function $f_i$. As a result, the $i$-th predicted numerical solution $\hat{u}_i$ is constructed by (6). We define residuals derived from the weak formulation of (11) as

$$R_{ij} = \epsilon (\hat{u}_i', \varphi_j') - (\hat{u}_i', \varphi_j) - (f_i, \varphi_j) \quad (14)$$

for $0 \le j \le 29$. From the weak formulations, the loss function for the ULGNet is defined as

$$\mathcal{L} = \frac{1}{P} \sum_{i=1}^{P} \sum_{j=0}^{29} | R_{ij} |^2, \quad (15)$$

where $\hat{u}_i$ is the predicted solution of $u_i$ and $P$ is the number of inputs. The trained neural network is then tested on an out-of-sample set (unseen inputs) of 1,000 randomly generated functions as in (13); these 1,000 test inputs do not appear among the training inputs. As shown in Figure 2(a), the loss values (15) for training and testing decrease with the epochs. The predicted solutions are close to the corresponding exact solutions, with relative $L^2$ errors on the order of $10^{-4}$. In Figure 2(c), the predicted solution on out-of-sample data is close to the true solution, with MAE: $7.940 \cdot 10^{-4}$, relative $L^2$: $4.257 \cdot 10^{-4}$, and $L^\infty$: $2.560 \cdot 10^{-3}$.
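To make the loss concrete, the sketch below assembles the residuals (14) for a single sample by Gauss-Legendre quadrature; it is a minimal sketch assuming the sign conventions of (11) and (14) above, and the forcing `f` is an arbitrary stand-in for a sample of (13):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval, legder

eps = 0.1
xq, wq = leggauss(64)                      # Gauss-Legendre quadrature rule

def phi(k, x, deriv=False):
    c = np.zeros(k + 3)
    c[k], c[k + 2] = 1.0, -1.0             # phi_k = L_k - L_{k+2}, as in (12)
    return legval(x, legder(c) if deriv else c)

def weak_residuals(alpha, f, n_test=30):
    """R_j = eps*(u', phi_j') - (u', phi_j) - (f, phi_j), cf. (14)."""
    u_x = sum(a * phi(k, xq, deriv=True) for k, a in enumerate(alpha))
    fq = f(xq)
    R = np.empty(n_test)
    for j in range(n_test):
        R[j] = wq @ (eps * u_x * phi(j, xq, deriv=True)
                     - u_x * phi(j, xq) - fq * phi(j, xq))
    return R

R = weak_residuals(np.random.randn(30), lambda x: np.sin(4.0 * x))
loss = np.mean(R ** 2)                     # one sample's contribution to (15)
```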

B. HELMHOLTZ EQUATION WITH NEUMANN BOUNDARY CONDITION
We turn our attention to the Helmholtz equation with the Neumann boundary condition, (16), where $k_u$ is a constant. In order to impose the homogeneous Neumann boundary condition, we consider the compact form of the basis as

$$\varphi_k(x) = L_k(x) - \frac{k(k+1)}{(k+2)(k+3)} L_{k+2}(x), \quad (17)$$

so that $\varphi_k'(\pm 1) = 0$ for every $k$. The corresponding weak formulation of (16) is given in (18), and the loss function is written as in (19), where $P$ is the number of inputs. The architecture of the neural network is the same as that described in Section IV-A. Figure 3(a) shows the loss values of the inputs for training and testing against the epochs. Figure 3(c) indicates that the predicted solution on the out-of-sample data is close to the corresponding exact solution, with MAE: $4.248 \cdot 10^{-5}$, relative $L^2$: $4.819 \cdot 10^{-4}$, and $L^\infty$: $2.910 \cdot 10^{-3}$. In fact, since the numerical solution predicted by the ULGNet satisfies the homogeneous Neumann boundary condition exactly, it attains good accuracy.
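A quick numerical check of (17), here a reconstruction of the standard Legendre-Galerkin Neumann basis [9], confirms that each basis function satisfies the homogeneous Neumann condition exactly:

```python
import numpy as np
from numpy.polynomial.legendre import legval, legder

def neumann_coeffs(k):
    # phi_k = L_k - k(k+1)/((k+2)(k+3)) L_{k+2}, as in (17)
    c = np.zeros(k + 3)
    c[k] = 1.0
    c[k + 2] = -k * (k + 1) / ((k + 2) * (k + 3))
    return c

for k in range(5):
    dc = legder(neumann_coeffs(k))
    # phi_k'(+-1) = 0 up to round-off, so the Neumann BC holds exactly
    print(k, legval(-1.0, dc), legval(1.0, dc))
```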

C. BURGERS' EQUATION
In this section, we apply the proposed method to the nonlinear Burgers' equation, (20). The corresponding weak formulation of (20) is defined in (21), where the $\varphi_j$'s are as described in (12). The loss function is then written as in (22), where $P$ is the number of inputs.
In order to deal with the nonlinearity, we use the Swish activation function, $\mathrm{Swish}(x) = x \cdot \sigma(x)$, where $\sigma$ is the sigmoid function. The architecture of the neural network is as described in Section IV-B. In Figure 4(a), the losses for training and testing decrease with the epochs. Figure 4(c) shows that the predicted solution on out-of-sample data is close to the corresponding exact solution, with MAE: $1.214 \cdot 10^{-3}$, relative $L^2$: $1.963 \cdot 10^{-3}$, and $L^\infty$: $3.173 \cdot 10^{-3}$.
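For reference, the Swish activation is a one-liner; we assume the common fixed slope $\beta = 1$:

```python
import torch

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); smooth in contrast to ReLU, which
    # helps when the loss involves the nonlinear convection term u u_x
    return x * torch.sigmoid(beta * x)
```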

D. TWO-DIMENSIONAL CONVECTION-DIFFUSION EQUATION
In this section, we consider the two-dimensional convection-diffusion equation,

$$-\epsilon \Delta u + \mathbf{v} \cdot \nabla u = f \ \text{in } \Omega = (-1, 1)^2, \qquad u = 0 \ \text{on } \partial\Omega. \quad (23)$$

To construct the two-dimensional basis functions with the Dirichlet BC, we naturally employ the tensor products

$$\varphi_{ij}(x, y) = \varphi_i^x(x)\, \varphi_j^y(y), \quad (24)$$

where $\varphi_i^x$ and $\varphi_j^y$ are bases in the $x$ and $y$ directions as in (12), respectively. The corresponding weak formulation is defined in (25), and we then obtain the loss function (26), where $P$ is the number of inputs. We generate forcing functions as input data,

$$f_p(x, y) = h_{1p} \sin(\pi(m_{1p} x + m_{2p} y)) + h_{2p} \cos(\pi(m_{3p} x + m_{4p} y)), \quad (27)$$

where $h_{ip}$ for $i = 1, 2$ and $m_{jp}$ for $1 \le j \le 4$ are drawn from uniform distributions on $[1, 2]$ and $[0, 1]$, respectively, and $p = 1, 2, \cdots, P$. For the numerical experiments, we set $\epsilon = 0.1$ and $\mathbf{v} = (-1, 0)$. As shown in Figure 5(a), the losses for training and testing decrease with the epochs. Figure 5(c) shows that the predicted solution on out-of-sample data is close to the corresponding exact solution, with MAE: $3.023 \cdot 10^{-5}$, relative $L^2$: $6.942 \cdot 10^{-3}$, and $L^\infty$: $5.248 \cdot 10^{-3}$.
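The tensor-product construction (24) is easy to verify numerically; the short sketch below builds a random expansion and checks that it vanishes on the boundary of the square:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def phi_1d(k, x):
    c = np.zeros(k + 3)
    c[k], c[k + 2] = 1.0, -1.0              # Dirichlet basis (12)
    return legval(x, c)

def phi_2d(i, j, X, Y):
    # phi_ij(x, y) = phi_i(x) phi_j(y), as in (24); the homogeneous
    # Dirichlet condition on the square is inherited from (12)
    return phi_1d(i, X) * phi_1d(j, Y)

x = np.linspace(-1.0, 1.0, 33)
X, Y = np.meshgrid(x, x, indexing="ij")
rng = np.random.default_rng(0)
u = sum(rng.standard_normal() * phi_2d(i, j, X, Y)
        for i in range(4) for j in range(4))
print(np.abs(u[0, :]).max(), np.abs(u[:, 0]).max())   # ~0 on the boundary
```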

V. NUMERICAL EXPERIMENTS: SINGULAR PERTURBATION PROBLEM
In numerical analysis, convection-dominated singularly perturbed problems are difficult to solve without special treatment, since a small diffusive parameter produces a sharp transition inside thin layers. As in classical scientific computing, ML approaches suffer from the boundary layer issue, since neural networks rely on a smooth prior. In this regard, we develop a basis-enriched method for our ULGNet to resolve the boundary layer issue. Our numerical experiments demonstrate that our algorithm is effective in finding accurate numerical solutions for singularly perturbed convection-diffusion equations that exhibit strong boundary layer behavior.

A. CONVECTION-DIFFUSION EQUATION WITH A BOUNDARY LAYER: ONE DIMENSIONAL CASE
We apply our enriched algorithm to the convection-diffusion equation in one dimension,

$$-\epsilon u_{xx} - u_x = f \ \text{in } (-1, 1), \qquad u(\pm 1) = 0, \quad (29)$$

with $\epsilon \ll 1$. In contrast to Section IV-A, due to the structure of (29), the solution profile has a sharp transition near $x = -1$, which is referred to as a boundary layer. Capturing the stiff boundary layer is challenging in numerical analysis, but understanding the boundary layer phenomenon is important in many scientific applications. In order to handle the boundary layer as in [22] and [23], an additional basis function, called a boundary layer corrector and given in (30), can be used (for the specific derivation, see the Appendix). The former exponential term in (30) stands for the boundary layer profile. After adding the latter linear term to the former, (30) satisfies the Dirichlet BC.
We then enrich (6) as

$$\hat{u} = \sum_{k=0}^{N-1} \hat{\alpha}_k \varphi_k, \quad (31)$$

where $\varphi_{N-1} = \varphi_{cor}$ in (30). We define the loss function with the $N - 1$ basis functions as in (32). For the numerical experiments below, we set $\epsilon = 10^{-6}$. Figure 6 depicts that the predicted solution using the ULGNet on the out-of-sample input is close to the corresponding exact solution, with MAE: $6.395 \cdot 10^{-4}$, relative $L^2$: $2.523 \cdot 10^{-4}$, and $L^\infty$: $2.689 \cdot 10^{-3}$. For comparison, we provide an experiment using the PINN method in panel (a) of Figure 6. The numerical results demonstrate that the ULGNet can provide an accurate numerical solution to the convection-dominated singularly perturbed problem, while the standard PINN method does not effectively capture the sharp transition.
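Since (30) is not reproduced here, the sketch below uses an assumed corrector of the stated shape (an exponential layer profile at $x = -1$ plus a linear term; the Dirichlet value at $x = 1$ then holds up to an e.s.t.), purely to illustrate how the enriched reconstruction (31) is assembled; `basis` is a hypothetical callable returning the compact Legendre basis of (12):

```python
import numpy as np

def phi_cor(x, eps=1e-6):
    """Assumed corrector in the spirit of (30): an exponential boundary
    layer profile at x = -1 plus a linear term so phi_cor(-1) = 0;
    phi_cor(1) = exp(-2/eps) is exponentially small."""
    return np.exp(-(1.0 + x) / eps) - (1.0 - x) / 2.0

def enriched_solution(alpha, basis, x):
    # (31): the first N-1 coefficients weight the compact Legendre
    # basis `basis(k, x)`; the last coefficient weights the corrector
    u = sum(a * basis(k, x) for k, a in enumerate(alpha[:-1]))
    return u + alpha[-1] * phi_cor(x)

print(phi_cor(np.array([-1.0, 1.0])))       # ~[0, 0]
```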

B. CONVECTION-DIFFUSION EQUATION WITH A BOUNDARY LAYER: TWO DIMENSIONAL CASE
We further investigate the singular perturbation problem in multiple dimensions. Here, we consider a two-dimensional convection-diffusion equation, (33), where $\epsilon \ll 1$. As in (24), we employ $\varphi_{ij}(x, y) = \varphi_i^x(x)\, \varphi_j^y(y)$, where $\varphi_i^x(x)$ and $\varphi_j^y(y)$ stand for the enriched LG basis of (31) in the $x$ and $y$ directions, respectively. Then, the corresponding weak equation for (33) is written as in (34), and the loss function accordingly as in (35). To simplify the boundary layer behavior, we assume a compatibility condition, $f = f_{yy} = 0$ at $y = -1$ and $y = 1$, which is required so that the boundary layer is expected near $x = -1$ only; see e.g., [23]. We generate the inputs for (33) as in (36), where $p = 1, 2, \cdots, P$; $h_{1p}$ and $h_{2p}$ are drawn from a uniform distribution over $[1, 2]$, and $m_{1p}$ and $m_{2p}$ over $[0, 2\pi]$. In addition, $n_{1p}$ and $n_{2p}$ are integers randomly drawn from $\{1, 2, 3\}$. In order to enhance the performance, the experiments in this section are conducted in double-precision floating-point format.
For the numerical experiments, we set $\epsilon = 10^{-6}$ and $\mathbf{v} = (-1, 0)$. Figure 7 indicates that the predicted solution using the ULGNet on the out-of-sample data is close to the corresponding exact solution, with MAE: $7.271 \cdot 10^{-4}$, relative $L^2$: $6.532 \cdot 10^{-4}$, and $L^\infty$: $3.779 \cdot 10^{-3}$. For comparison, we provide an experiment using the PINN method in panel (a) of Figure 7. Clearly, the PINN solution does not capture the sharp transition arising from the singular perturbation. A closer look is available in Figure 8, which shows a 1D cross section of the predicted solution $\hat{u}$ and the reference solution $u$ along the line $y = -0.05$.

VI. CONCLUDING REMARK
This work proposed the ULGNet, which is designed to predict solutions to elliptic PDEs without access to the true solutions.
To do this, we considered the Legendre-Galerkin framework, in which the solution of a PDE can be approximated by a linear combination of a set of orthogonal bases, such as the Legendre bases, in an appropriate $L^2$ space. Hence, the approximation process of the spectral element method closely parallels the way the proposed ULGNet is trained. One of the main contributions of our study is the development of a learning architecture that directly tackles singular perturbation and boundary layer problems. Thanks to the boundary layer corrector function serving as an additional basis function, the sharp transition of the stiff PDEs is captured by the enriched scheme.
The present research can be extended in several directions. First, as in Section V, machine learning with the enriched space method can be applied to various complex problems such as Burgers equations with interior layers [22], boundary layers on a circular domain [24], and the 2D Navier-Stokes equations with the no-slip boundary condition [61]. Furthermore, since our scheme is based on the Legendre-Galerkin method, the algorithm can be extended to other orthogonal polynomials, such as Chebyshev and Jacobi polynomials, depending on the features of the PDEs. The ULGNet is also expected to be adaptable to the finite element method (FEM), which employs, in the same manner, weak formulations of PDEs with basis functions defined on reference elements. Hence, we expect that an unsupervised FEM neural network can be developed for problems on domains of various shapes. In future research, we also plan to apply the ULGNet to time-dependent problems. In terms of numerical methods, time-dependent problems are implemented by Euler methods, Runge-Kutta methods, and so on. Such methods iteratively yield the solution at the next time step from the solution at the previous time step. Thus, we expect that once the ULGNet is trained on the corresponding time-independent problem, it can iteratively generate the next-time solution from the previous-time solution.
APPENDIX
Assuming $\epsilon = 0$, the formal limit equation of (38) reads

$$-u^0_x = f, \quad x \in (-1, 1), \qquad u^0(1) = 0. \quad (39)$$

Because the boundary condition of $u^0$ at $x = -1$ is not assigned, one can expect a discrepancy between $u^\epsilon$ and $u^0$ in the vicinity of $x = -1$; we observe that a boundary layer of size $\epsilon$ occurs near the outflow boundary at $x = -1$. In this regard, we derive the asymptotic equation for the corrector $\theta$, which approximates the discrepancy $u^\epsilon - u^0$, in the form

$$-\epsilon \theta_{xx} - \theta_x = 0, \quad x \in (-1, 1), \qquad \theta(-1) = -u^0(-1), \quad \theta(1) = 0. \quad (40)$$

The corrector $\theta$ is given in the form

$$\theta(x) = -u^0(-1)\, e^{-(1+x)/\epsilon} + \text{e.s.t.}, \quad (41)$$

where e.s.t. denotes an exponentially small term with respect to $\epsilon$. The 2D boundary corrector can be derived in a similar fashion to the 1D corrector; for more details, see e.g., [22] and [23].