A Data-driven Reduced Order Modeling for Fluid Flow Analysis based on Series Forecasting Intelligent Algorithm

In this work, we propose a data-driven reduced-order model (ROM) for high-dimensional flow fields by combining flow modal decomposition and multiple regression. SVD-based proper orthogonal decomposition (POD) is employed to extract the principal spatial modes, which represent the energy and dynamics of the flow field. The temporal coefficients of the flow modal series are regressed with three intelligent algorithms: light gradient boosting machine (LGBM), long short-term memory (LSTM), and temporal convolutional network (TCN). The performance of the ROMs is assessed by predicting and analyzing low Reynolds number flow around a circular cylinder and transonic flow around an airfoil. The experiments show that vortex flow and shock flow are both well predicted by POD-LGBM, POD-LSTM and POD-TCN, and that the POD-TCN prediction is the closest to the numerical solution, with the minimum root mean squared error. It should also be noted that the prediction accuracy depends on the reduced-order results of the flow field.


I. INTRODUCTION
Computational fluid dynamics (CFD) has greatly advanced the exploration of complex flows involving multiple scales, instabilities and turbulence, with the considerable advantage of not being restricted by environment or region. This is of technical significance for fundamental research and for major projects in aerospace, civil engineering, wind power generation, marine engineering, and other fields [1][2][3][4]. Data processing and flow analysis, however, are indispensable parts of CFD. In particular, the enormous computational scale of CFD and the need to process more data than a human can grasp severely restrict its progress. Reduced-order models (ROMs) have been proposed against this backdrop, to distill features from massive data and to study the physical mechanisms of fluid flow more cost-effectively [5].
There is prior research on modal reduction techniques for ROMs. Proper orthogonal decomposition (POD), a typical modal-decomposition technique, projects high-dimensional nonlinear data onto orthogonal low-dimensional "coordinates", thereby achieving order reduction. The reduced coordinates generated by POD are consistent with the principal components of the governing fluid flow. POD was extended to fluid mechanics by Lumley et al. in 1967 [6]. Their work verified that the coherent structures of turbulence could be extracted with POD and applied to the numerical simulation of flow fields, but the limited computing power of the time hampered its wider adoption. In 1987, Sirovich [7] proposed the fast snapshot method for processing the data matrix, which provided an efficient computing strategy for POD. Other POD-type reduced-order methods, including balanced POD [8], sequential POD [9], TPOD [10], and spectral POD [11], have evolved along with the use of POD to analyze physical mechanisms. These ROMs are employed to characterize complex fluid flows and to forecast future states by constructing dynamical models of the underlying data. A breakthrough in big data identification is to fuse ROMs with intelligent machine learning algorithms to improve the prediction accuracy [12][13][14].
As a non-intrusive dimension-reduction method, POD extracts the dominant modes from high-fidelity data [15][16][17][18][19]. Hesthaven et al. [20] successfully predicted the time coefficients of POD using an FNN for a Poisson equation problem and driven cavity flow. Swischuk et al. [21] used four machine learning techniques (FNN, multivariate polynomial regression, k-nearest neighbors and decision trees) to predict the airfoil pressure field, and analyzed the advantages and disadvantages of the four methods. Long short-term memory (LSTM), presented in 1997 [22] and continuously improved since [23][24], is a classical RNN with great strengths in time series prediction, and it is also popular here. Deng et al. [25] used LSTM-based POD to reconstruct a turbulent flow field, demonstrating the good learning ability of LSTM in reconstruction. Mohan et al. [26] adopted classical LSTM and bidirectional LSTM in turbulence control, and their results revealed that the latter reduces the prediction accuracy due to overfitting. In the construction of a large-scale finite element model, POD combined with LSTM was applied to predict plastic strain and von Mises stress, keeping the error within 1% [27]. For wind turbine wake prediction, Zhang et al. [28] established a POD-LSTM-based wake model using high-fidelity data from large eddy simulation, which could predict the unsteady vortex wake of wind turbines in a short time.
Although LSTM has certain advantages in time series modeling, its limitations mainly manifest in vanishing gradients, a complex internal structure, and a large number of parameters. The temporal convolutional network (TCN), with a relatively simple internal structure and more stable gradient computation, was presented by Bai et al. [29]. It redesigns the one-dimensional convolutional structure around causal convolution, so that the convolutions can be computed in parallel. Their experimental results also showed that TCN performs significantly better than LSTM across multiple sequence modeling tasks. Therefore, we adopt a temporal convolution structure to construct a POD-TCN reduced-order model.
To verify the advantages of POD-TCN, we choose POD-LSTM, which has been proposed by other researchers and shown to give good results, for comparison [30][31][32][33]. In addition, considering that the gradient boosting decision tree (GBDT) [34] is an enduring model in machine learning and that the light gradient boosting machine (LGBM) [35] was developed to overcome the difficulties GBDT encounters with massive data, we design POD-LGBM as another comparative model. The rest of this paper is organized as follows.
• Section 2: outlining the process from dimensionality reduction to reconstruction.
• Section 3: introducing POD and the three learning algorithms LGBM, LSTM, and TCN.
• Section 4: verifying the performance of these models via flow prediction around a circular cylinder and an airfoil, and demonstrating the advantages of POD-TCN over the other models.
• Section 5: summarizing our results.

II. PROCESS: FROM DIMENSIONALITY REDUCTION TO RECONSTRUCTION
This work presents a non-intrusive dimension-reduction method for high-dimensional flow field data. The configuration of flow field reconstruction with the proposed method is illustrated in Fig. 1. Firstly, we obtain the data $u(x, t)$ as historical snapshot information by computing low Reynolds number flow around a cylinder and transonic flow over a 2D airfoil. Secondly, the SVD-based POD method, an efficient reduced-order technique, is employed to extract the principal spatial modes $\phi_i(x)$, $i = 1, 2, \cdots, r$, representing the actual flow field features. Modal regression is then applied to determine the functional relation between the modes and the measured flow fields.
We apply TCN to predict the temporal coefficients in the regression model, yielding the reconstructed flow field $u(x_p, t_p) = \sum_{k=1}^{r} a_k(t_p)\phi_k(x) + \bar{u}$. In addition, we choose LSTM and LGBM as comparison models. For sequence data composed of flow field information, the high dimensionality makes it difficult for RNNs, which are often used in series forecasting, to converge. More importantly, given the many nonlinear relationships in flow field information, whether the flow is steady or unsteady, it is difficult to judge accurately whether the data at the next moment depend on the historical data and how far back that dependence reaches. Because an RNN cannot memorize all historical information, this shortcoming has an unpredictable impact on the prediction results. The convolutional structure of TCN means that each upper-layer neuron contains a part of the historical information. This structural design allows the model to capture more periodic information while also capturing the overall trend of the series, which reduces the computational cost and enhances robustness. The tree-based model LGBM is also powerful for series forecasting, but the POD coefficients are derived from orthogonal projection, and there is no correlation between them. LGBM requires a large number of related exogenous features to split nodes, so in a flow field prediction task based on orthogonal decomposition it may be affected by these data characteristics, resulting in lower prediction accuracy, whereas TCN does not have this defect of tree-based models. Therefore, to demonstrate that TCN is indeed more effective in practical engineering, we include LSTM and LGBM in the experiments.
We show the reduction-prediction process illustrated in Fig. 1 in pseudocode form, which can not only convey the procedure of our work, but also provide theoretical guidance to similar physical problems to a certain extent.
Algorithm 1 shows the core algorithm of the reduced-order predictive model. The main inputs and outputs, together with the formulas, are listed.

Algorithm 1 The stages of reduced-order predictive models
Require: Specific flow field parameters under known conditions $j, k, l$: $v$; truncation coefficient $r$; training snapshots $n$; prediction steps $T_p$; frontier parameters $v$ under predicted conditions $x$; historical raw information $u$.
Ensure: …
… temporal coefficients $a$ into training data for model $f$; …

III. METHODOLOGY
This section provides a theoretical basis for the subsequent numerical experiments by establishing the POD mathematical model and three intelligent learning algorithms well suited to time-series analysis.

A. PROPER ORTHOGONAL DECOMPOSITION
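Proper orthogonal decomposition (POD) [6,7] is a mathematical approach for extracting features that represent the energy and dynamics of discrete data. Here, the discrete data are flow field data computed with the finite volume method on a structured grid. We collect historical data of a flow quantity (e.g., density, pressure, velocity) at times $t_1, t_2, \cdots, t_m$ over spatial locations $x = x_i$, $i = 1, 2, \cdots, n$, as samples, which form the snapshot matrix

$$\begin{bmatrix} u(x_1, t_1) & u(x_1, t_2) & \cdots & u(x_1, t_m) \\ u(x_2, t_1) & u(x_2, t_2) & \cdots & u(x_2, t_m) \\ \vdots & \vdots & \ddots & \vdots \\ u(x_n, t_1) & u(x_n, t_2) & \cdots & u(x_n, t_m) \end{bmatrix} \in \mathbb{R}^{n \times m}. \tag{1}$$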

VOLUME 4, 2016
Each time instant corresponds to one dimension, so this matrix involves $m$-dimensional random variables. There may be correlation and information overlap between these variables. In algebraic terms, such correlation and overlap mean that the original high-dimensional system can be replaced by a low-dimensional system that retains most of the information of the original variables, which is precisely the idea of dimension reduction. For the flow field, we refer to the low-dimensional variables as the main flow modes, whose linear superposition constitutes the main characteristics of the flow field. The benefits are as follows.
i. If this main flow mode can be captured, we can investigate what kinds of flow have a similar flow mode.
ii. We can determine whether this flow mode is amplified or attenuated with time.
iii. These dominant flow modes express most of the flow behavior, so the complex flow problem can be transformed into the evolution of several main flow modes with time.
The mathematical model of POD is given below. The historical snapshot information $u(x, t)$ denotes a vector field (e.g., density, pressure, velocity) in formula (1) with temporal expectation $\bar{u}(x)$. The fluctuating field $u(x, t) - \bar{u}(x)$ can then be expanded by separating space and time as

$$u(x, t) - \bar{u}(x) = \sum_{i} a_i(t)\,\phi_i(x), \tag{2}$$

where $\phi_i(x)$ are spatial orthogonal modes, i.e., orthogonal basis functions, and $a_i(t)$ are temporal coefficients. POD of the field data is mainly performed by three methods: the spatial, snapshot and SVD methods.

1) Spatial POD method
To emphasize that each column vector is the snapshot at time $t$, we set

$$x(t) = u(x, t) - \bar{u}(x) \in \mathbb{R}^{n}, \qquad X = [x(t_1), x(t_2), \cdots, x(t_m)] \in \mathbb{R}^{n \times m},$$

and then form the covariance matrix

$$\Sigma = XX^T \in \mathbb{R}^{n \times n}.$$

According to the idea of POD extracting principal features, the larger the variance $\mathrm{Var}(\phi_i^T X)$ of the vector $\phi_i^T X$, the more information it contains. The vectors $\phi_i$, which are also the basis functions in formula (2), are obtained by solving for the orthonormal eigenvectors $\phi_i$ and the corresponding eigenvalues $\lambda_i$ of the matrix $\Sigma$:

$$\Sigma \phi_i = \lambda_i \phi_i.$$

The eigenvalues are the variances, namely

$$\mathrm{Var}(\phi_i^T X) = \phi_i^T \Sigma \phi_i = \lambda_i.$$

Hence, a larger eigenvalue $\lambda_i$ means that more features are captured by the eigenvector $\phi_i$. We refer to the eigenvector $\phi_i$ as a POD mode. For a velocity field, for instance, the eigenvalue $\lambda_i$ represents the kinetic energy contained in the corresponding POD mode. The objective of POD is to select a small number of basis functions that best express the given flow field data, by ignoring the components with smaller variance. We use the cumulative contribution rate to determine the number of modes. When the cumulative contribution rate of the first $r$ modes approximately reaches the overall energy, that is,

$$\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{j} \lambda_j} \approx 1, \tag{8}$$

we consider the reconstruction successful and keep the modes $\phi_i$, $i = 1, 2, \cdots, r$, to express the given flow field. Once the principal modes are determined, the flow field can be expressed by the truncated series

$$u(x, t) \approx \bar{u}(x) + \sum_{i=1}^{r} a_i(t)\,\phi_i(x).$$

The coefficients $a_i(t)$ are calculated by

$$a_i(t) = \phi_i^T x(t).$$

2) Snapshot POD method
In actual flow calculations, the number of computing nodes is much larger than the sample size, that is, $n \gg m$. Thus, the large size of the matrix $\Sigma = XX^T \in \mathbb{R}^{n \times n}$ strains memory resources and makes it very hard to solve for the main modes with the spatial POD method. The snapshot POD presented by Sirovich [7], an alternative and tractable approach, determines the flow field modes from the temporal covariance matrix $\tilde{\Sigma} = X^T X \in \mathbb{R}^{m \times m}$. Following the idea of snapshots, we first solve the eigenvalue problem for $\tilde{\Sigma}$ of size $m \times m$ instead of $\Sigma = XX^T$ of size $n \times n$:

$$\tilde{\Sigma} \psi_i = \lambda_i \psi_i.$$

Here, we suppose that the eigenvalues satisfy $\lambda_i > 0$, $i = 1, 2, \cdots, m$, since the modes corresponding to eigenvalues greater than zero may become the main modes.
The matrices $\tilde{\Sigma} = X^T X$ and $\Sigma = XX^T$ are built from $X$ and its transpose, so they share the same nonzero eigenvalues, and their eigenvectors can be transformed into each other. The next step is therefore to determine the POD modes by

$$\phi_i = \frac{1}{\sqrt{\lambda_i}} X \psi_i, \quad i = 1, 2, \cdots, m,$$

which can be expressed in matrix form as

$$\Phi = X \Psi \Lambda^{-1/2},$$

with $\Phi = [\phi_1, \phi_2, \cdots, \phi_m]$, $\Psi = [\psi_1, \psi_2, \cdots, \psi_m]$, and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_m)$. At present, the effectiveness of this snapshot-based method in processing big data has made it widely used and developed in the CFD field.
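The snapshot procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `snapshot_pod`, the random test matrix, and the relative tolerance used to discard numerically zero eigenvalues are all hypothetical choices.

```python
import numpy as np

def snapshot_pod(X):
    """Snapshot POD of a mean-subtracted snapshot matrix X (n points x m times)."""
    # Solve the small m x m eigenvalue problem instead of the n x n one.
    sigma_t = X.T @ X
    lam, psi = np.linalg.eigh(sigma_t)       # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]            # reorder to descending
    lam, psi = lam[order], psi[:, order]
    keep = lam > 1e-10 * lam[0]              # assumed tolerance: drop zero modes
    lam, psi = lam[keep], psi[:, keep]
    phi = X @ psi / np.sqrt(lam)             # phi_i = X psi_i / sqrt(lambda_i)
    return phi, lam, psi

# Hypothetical data: n = 500 grid points, m = 20 snapshots.
rng = np.random.default_rng(0)
U = rng.standard_normal((500, 20))
X = U - U.mean(axis=1, keepdims=True)        # subtract the temporal mean
phi, lam, psi = snapshot_pod(X)
```

Only an $m \times m$ matrix is ever decomposed, which is the point of the method when $n \gg m$. Because the temporal mean has been removed, at most $m - 1$ modes survive the tolerance check.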
3) SVD-based POD method
The aforementioned POD methods are carried out through eigenvalue decomposition of a square matrix. Singular value decomposition (SVD) is a convenient tool for decomposing a rectangular matrix; it can compress data in matrix form and provides an effective and robust modal extraction technique. SVD has therefore been widely applied to image compression, semantic extraction, and flow field analysis involving large data. Here, we employ SVD to find the POD modes.
SVD directly decomposes the given flow data matrix $X \in \mathbb{R}^{n \times m}$ by formula (14),

$$X = \sum_{i=1}^{m} \sigma_i \phi_i \psi_i^T, \tag{14}$$

or, in matrix form,

$$X = \Phi \tilde{\Lambda} \Psi^T.$$

Both $\Phi$ and $\Psi$ are orthogonal matrices satisfying $\Phi\Phi^T = I$ and $\Psi\Psi^T = I$. We refer to their column vectors $\phi_i$, $i = 1, 2, \cdots, n$, and $\psi_i$, $i = 1, 2, \cdots, m$, as the left and right singular vectors of $X$, respectively. The singular values $\sigma_i$, $i = 1, 2, \cdots, m$, in the generalized diagonal matrix $\tilde{\Lambda}$ are related to the above eigenvalues by $\sigma_i^2 = \lambda_i$, $i = 1, 2, \cdots, m$. SVD-based POD can thus determine the principal modes quickly.
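As a hedged illustration of SVD-based POD, the sketch below decomposes a synthetic travelling-wave field, selects the number of modes from the cumulative contribution rate, and reconstructs the field from the truncated series. The function name `svd_pod`, the `energy` threshold, and the test field are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def svd_pod(U, energy=0.99):
    """SVD-based POD of raw snapshots U (n points x m times)."""
    u_mean = U.mean(axis=1, keepdims=True)           # temporal mean field
    X = U - u_mean                                   # fluctuation matrix
    phi, sigma, psi_t = np.linalg.svd(X, full_matrices=False)
    lam = sigma ** 2                                 # sigma_i^2 = lambda_i
    ratio = np.cumsum(lam) / lam.sum()               # cumulative contribution rate
    r = int(np.searchsorted(ratio, energy) + 1)      # smallest r reaching `energy`
    modes = phi[:, :r]                               # retained spatial modes
    coeffs = modes.T @ X                             # a_i(t_j), shape (r, m)
    return u_mean, modes, coeffs, ratio

# Synthetic field: constant mean plus two travelling waves (rank-4 fluctuation).
x = np.linspace(0.0, 2.0 * np.pi, 200)[:, None]
t = np.linspace(0.0, 10.0, 60)[None, :]
U = 1.0 + np.sin(x - t) + 0.1 * np.sin(2.0 * x - 3.0 * t)

u_mean, modes, coeffs, ratio = svd_pod(U, energy=0.999)
U_rec = u_mean + modes @ coeffs                      # truncated reconstruction
rel_err = np.linalg.norm(U - U_rec) / np.linalg.norm(U)
```

Passing `full_matrices=False` keeps only the first $m$ left singular vectors, which avoids forming the full $n \times n$ matrix $\Phi$.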

1) LGBM
Light gradient boosting machine (LGBM) is a boosting framework based on the gradient boosting decision tree (GBDT) [30], which trains well and is less prone to overfitting. Fig. 2 shows an overview of the LGBM structure.
LGBM introduces two new techniques on top of GBDT: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). GOSS keeps the samples with larger gradients and randomly subsamples those with smaller gradients when computing the information gain, which limits the influence of the long-tail effect. EFB bundles many mutually exclusive features into one feature, thereby reducing dimensionality. In addition, LGBM uses a histogram algorithm to discretize continuous floating-point feature values into integer bins and accumulates statistics over the histogram. A leaf-wise growth strategy is used to find the optimal split node by traversal. The combination of these techniques improves the computational efficiency of LGBM by a factor of ten compared to the GBDT algorithm without reducing its capability, and reduces memory usage by one-third, making LGBM more suitable for large amounts of data and parallel computation.
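To make the GOSS idea concrete, the following sketch shows one sampling step in NumPy. It follows the published description of GOSS (keep the top fraction `a` of samples by absolute gradient, draw a random fraction `b` of the rest, and up-weight the drawn samples by `(1 - a) / b`). The function name and the parameter values are illustrative assumptions, and this is not the LightGBM library's internal code.

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Return sample indices and weights for one GOSS step."""
    rng = np.random.default_rng() if rng is None else rng
    n = grad.shape[0]
    top_n, rand_n = int(a * n), int(b * n)
    order = np.argsort(-np.abs(grad))            # largest |gradient| first
    top_idx = order[:top_n]                      # always kept
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weight = np.ones(idx.size)
    weight[top_n:] = (1.0 - a) / b               # compensate for subsampling
    return idx, weight

# Hypothetical gradients for 1000 training samples.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1000)
idx, weight = goss_sample(grad, a=0.2, b=0.1, rng=rng)
```

With these settings 300 of the 1000 samples are used, and the weights sum to roughly the original sample count, so the gain estimates stay on the same scale.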

2) LSTM
Recurrent neural networks (RNNs) often suffer from vanishing and exploding gradients during training, which makes it difficult to train the underlying parameters of the network. Long short-term memory (LSTM) [22] alleviates the gradient problem by adding three gate structures (input gate, forget gate, output gate) together with a cell state, a candidate state and a memory state to control the update and circulation of valid information. Fig. 3 shows the LSTM structure. The input gate represents the new information added to the memory. The output gate represents the information output by the current cell state. The forget gate selectively forgets the useless information in the cell state. The cell state represents long-term memory, and the memory state represents short-term memory. In the standard formulation, the input gate $i_t$, forget gate $f_t$, and output gate $o_t$ can be expressed by

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \quad o_t = \sigma(W_o [h_{t-1}, x_t] + b_o),$$

while the candidate state $\tilde{c}_t$, cell state $c_t$, and memory state $h_t$ can be written as

$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \tanh(c_t),$$

where $x_t$ is the input at time $t$, $W$ and $b$ denote weights and biases, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product.

3) TCN
TCN is mainly composed of three components: causal convolution, dilated convolution, and residual connection.
Causal convolution, which has only a one-way structure between layers, is visualized in Fig. 4. Unlike a traditional 1D-CNN, which can see future values, the convolution output at time $t$ depends only on the elements at time $t$ and earlier in the previous layer. Causal convolution therefore builds a strictly time-constrained model.
Given a filter $F = (f_1, f_2, f_3, \cdots, f_K)$ and a sequence $X = (x_1, x_2, x_3, \cdots, x_T)$, the causal convolution at $x_t$ can be written as

$$(F * X)(x_t) = \sum_{k=1}^{K} f_k\, x_{t-K+k}.$$

The simple causal convolution structure is still limited by its receptive field, that is, the length of time series the network can model depends on the size of the convolution kernel. To capture dependencies on values or features from long ago, many layers need to be stacked, which greatly increases the computing cost. To solve this problem, TCN applies dilated convolution. Fig. 5 shows the dilated convolution diagram. Dilated convolution allows interval sampling of the convolution input, and the sampling rate is controlled by $d$ in Fig. 5: $d = 1$ means that every node is sampled as input, and $d = 2$ means that every second node is sampled as input. In general, a larger $d$ is set in a higher layer. Dilated convolution therefore makes the receptive field grow exponentially with the number of layers, so that the network can obtain a large receptive field with fewer layers.
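A dilated causal convolution can be sketched directly in NumPy. The example below is a minimal illustration, assuming zero padding on the left and the indexing convention $y_t = \sum_{k=1}^{K} f_k\, x_{t-(K-k)d}$, which reduces to the causal convolution above for $d = 1$. The function name `causal_conv1d` is hypothetical.

```python
import numpy as np

def causal_conv1d(x, f, d=1):
    """Dilated causal convolution: y[t] = sum_k f[k] * x[t - (K - 1 - k) * d]."""
    x = np.asarray(x, dtype=float)
    K = len(f)
    pad = (K - 1) * d                          # left padding keeps causality
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k in range(K):                         # k = 0..K-1 maps to f_1..f_K
        shift = (K - 1 - k) * d                # how far back this tap looks
        y += f[k] * xp[pad - shift: pad - shift + len(x)]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f = [1.0, 1.0]
y1 = causal_conv1d(x, f, d=1)    # [1, 3, 5, 7, 9]: x[t-1] + x[t]
y2 = causal_conv1d(x, f, d=2)    # [1, 2, 4, 6, 8]: x[t-2] + x[t]
```

Changing a later input never alters an earlier output, which is the time constraint described above.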
Residual connection [36] has proved to be an important approach to training deep networks, as it enables cross-layer connections through which information flows between layers. It also has beneficial side effects, such as smoothing the optimization landscape, which are very effective for training. Fig. 6 shows the framework of the residual connection. Each single convolution layer is replaced by a temporal block, which contains two convolution layers with activation functions, and dropout is added to regularize the network.
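How quickly the receptive field grows with depth can be checked with a short calculation. The sketch below assumes that every temporal block contains two dilated causal convolutions with the same kernel size and dilation, as described above. The kernel size and the doubling dilation schedule are illustrative values, not the settings used in the experiments.

```python
def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of stacked temporal blocks, one dilation per block."""
    # Each causal convolution with dilation d looks (kernel_size - 1) * d steps back.
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

dilations = [2 ** i for i in range(4)]      # 1, 2, 4, 8
rf = receptive_field(3, dilations)          # 1 + 2 * 2 * 15 = 61
rf_small = receptive_field(2, [1, 2])       # 1 + 2 * 1 * 3 = 7
```

Four blocks with doubling dilations already cover 61 time steps, whereas four undilated blocks with the same kernel would cover only 17.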

IV. PERFORMANCE EVALUATION
The temporal coefficients are generated by the POD reduced-order decomposition of the experimental flow field. How to use a learning model to predict the coefficients of the flow modal series is one of the key points of this research. In this section, two examples are used to test the models: unsteady flow around a cylinder and steady flow around an airfoil.
In the method proposed in this paper, two machine learning libraries, Scikit-learn and PyTorch, are used to construct the LGBM, LSTM, and TCN models. Root mean squared error (RMSE) is used as the loss function to update gradients during model training. The loss function is given by formula (19),

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( u_i^t - u_{pre,i}^t \right)^2}, \tag{19}$$

where $n$ is the number of grid points, $u_i^t$ is the $i$th component of the true value at time $t$, and $u_{pre,i}^t$ is the $i$th component predicted by the POD-LSTM, POD-LGBM, or POD-TCN reduced-order model at time $t$.
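For reference, the RMSE over the grid points at one time instant can be computed as in the short sketch below. The function name and the sample values are illustrative only.

```python
import numpy as np

def rmse(u_true, u_pred):
    """Root mean squared error over all grid points at one time instant."""
    u_true = np.asarray(u_true, dtype=float)
    u_pred = np.asarray(u_pred, dtype=float)
    return float(np.sqrt(np.mean((u_true - u_pred) ** 2)))

u_true = np.array([1.0, 2.0, 3.0, 4.0])
u_pred = np.array([1.0, 2.0, 3.0, 6.0])
err = rmse(u_true, u_pred)    # sqrt((0 + 0 + 0 + 4) / 4) = 1.0
```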
The three predictive models are integrated into the reduced-order model in the same way. The snapshot data of the flow around a cylinder are computed with OpenFOAM, and the snapshot data of the airfoil flow field are obtained with our Fortran demo code. The reduced-order models based on intelligent algorithms are implemented in a Python environment. All computations are performed on a computer equipped with a Core i7-7700H CPU and 16 GB of memory.

A. EXPERIMENT 1: LOW REYNOLDS NUMBER FLOW AROUND A CIRCULAR CYLINDER
Flow around a circular cylinder is one of the basic flows in fluid mechanics, and it is also an important test case for studying reduced-order models. OpenFOAM is employed to compute the unsteady flow around the cylinder numerically in order to verify the proposed method. The computational grid is shown in Fig. 7. We simulated the flow field under three incoming flow velocities, $v_1 = 0.8$ m/s, $v_2 = 1.0$ m/s and $v_3 = 1.2$ m/s, and the results constitute the data set. The main parameters of the numerical simulations are given in Table 1. In this paper, the x-velocity component $U$, a representative physical quantity, is used for analysis. The time history of $U$ at a monitoring point behind the cylinder is given in Fig. 8. The velocity becomes stable at 260 s, and the period after stabilization is 10 s. To capture the dynamic characteristics of the cylinder wake more accurately, we collect the snapshots after the flow reaches the limit cycle state. Snapshots are collected from 400 s onward with a time step $\Delta t = 0.2$ s. A total of 251 snapshots covering 5 cycles are selected and combined into a 36800 × 251 matrix for POD order reduction analysis.
The energy proportion of the first ten modes is calculated by formula (8). Fig. 9 shows that the first two modes account for 92.63% of the energy, and that the first four, six, and eight modes account for 98.48%, 99.7%, and 99.94%, respectively. We choose the first four modes, which capture most of the energy, to reconstruct the velocity field. Three groups of temporal coefficients are obtained from the POD under the three working conditions, and this set of temporal coefficients is then used to train the intelligent methods. The time series of the velocity field based on POD is plotted in Fig. 10. We can observe that the temporal coefficients are periodic.
The trained POD-LGBM, POD-LSTM and POD-TCN models are applied to predict the flow field at inflow velocities $v = 0.9$ m/s and $v = 1.1$ m/s. For these conditions, the snapshots of the velocity field $U$ from $t = 400$ s to $t = 440$ s are used for training. The ordered spatial modes ($r = 4$), which approximate the real field features well, are used to reconstruct the predicted field. We predict the velocity field from $t = 440.2$ s to $t = 450$ s. Thus, the number of training snapshots is $n = 251$ and the number of prediction steps is $T_p = 50$.
The 251 training snapshots after POD reduction are transformed into a matrix of size [251 × m], where m is the number of modes used. Since POD uses orthogonal projection, there is no linear or nonlinear relationship between different modes, so the different temporal coefficients can be trained and predicted separately without affecting the final result. In this experiment, a constant sequence length of 7 is used for the input data, and the windows are organized with an interval of 1 time step. The input data for training and validating a single temporal coefficient under each working condition therefore has dimension [244 × 1 × 7], since 251 − 7 = 244 windows have a next-step target. Combining the data under the three similar working conditions with the first T data of the predicted working condition gives the final training and validation set of dimension [926 × 1 × 7]. This is consistent with 3 × 244 = 732 windows plus 201 − 7 = 194 windows from the 201 snapshots between t = 400 s and t = 440 s. For prediction, each input has dimension [1 × 1 × 7]: the data from timestep T − 6 to T are used to predict the (T + 1)th value. The input is then organized recursively, so that the data from timestep T − 5 to T together with the predicted (T + 1)th value are used to predict the (T + 2)th value, and so on, until the (T + 50)th prediction is completed.
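The window organization and the recursive multi-step prediction described above can be sketched as follows. This is an illustration under explicit assumptions: the sinusoidal series stands in for one periodic temporal coefficient, and a least-squares linear autoregressor stands in for the trained LGBM, LSTM, or TCN model. The helper names `make_windows` and `recursive_forecast` are hypothetical.

```python
import numpy as np

def make_windows(series, seq_len=7):
    """Return inputs of shape (N, 1, seq_len) and next-step targets of shape (N,)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - seq_len
    inputs = np.stack([series[i:i + seq_len] for i in range(n)])[:, None, :]
    return inputs, series[seq_len:]

def recursive_forecast(predict_one, history, steps=50, seq_len=7):
    """Feed each prediction back into the input window for the next step."""
    window = list(history[-seq_len:])
    out = []
    for _ in range(steps):
        nxt = float(predict_one(np.array(window)[None, None, :]))  # input (1, 1, 7)
        out.append(nxt)
        window = window[1:] + [nxt]            # drop oldest value, append prediction
    return np.array(out)

# Stand-in coefficient: period 10 s sampled at dt = 0.2 s (assumed for illustration).
t = 0.2 * np.arange(301)
a = np.sin(2.0 * np.pi * t / 10.0)
train, truth = a[:251], a[251:]

inputs, targets = make_windows(train)          # shapes (244, 1, 7) and (244,)

# ASSUMPTION: a linear autoregressor replaces the trained network in this sketch.
w, *_ = np.linalg.lstsq(inputs[:, 0, :], targets, rcond=1e-8)
predict_one = lambda win: win.reshape(-1) @ w

forecast = recursive_forecast(predict_one, train, steps=50)
max_err = np.max(np.abs(forecast - truth))
```

Any model that maps a `[1 × 1 × 7]` input to a scalar can be passed as `predict_one`. The recursion is what lets errors accumulate over the 50 steps, which is why the cumulative error growth is compared between models.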
After more than 100 rounds of hyperparameter tuning, we selected, for each of the three models, the set of hyperparameters that gave its best predictions as the final setting, and compared the prediction performance of the three models under these settings. To show the prediction performance, the flow field at a randomly selected time ($t = 447.6$ s) is reconstructed, and the results are illustrated in Fig. 11. POD-LGBM, POD-LSTM and POD-TCN all predict the position and structure of the vortex wake well at the inflow velocities $v = 0.9$ m/s and $v = 1.1$ m/s. This indicates that our ROMs combining SVD-based POD and intelligent algorithms are reasonable and reliable.
Moreover, the temporal coefficients predicted by the three ROMs for the incoming flow velocities $v = 0.9$ m/s and $v = 1.1$ m/s are compared with the real values in Fig. 12(a) and Fig. 13(a). In POD, the first several temporal coefficients and spatial modes have the major impact on the prediction of the overall flow field. Fig. 12(b) and Fig. 13(b) show the predicted values of the coefficient $a_1$ for closer inspection. The values predicted by POD-LGBM deviate from the actual values in some stages, although this does not affect the overall prediction performance of POD-LGBM. The predictions of POD-LSTM and POD-TCN are in good agreement with the real data. When the temporal coefficient becomes stable, the accuracy of the predicted value increases accordingly, which may be related to the periodicity of the temporal coefficients, since periodic sequences are theoretically easier for machine learning models to learn. To examine the generality of the proposed model, aperiodic temporal coefficient prediction is discussed in detail in Section 4.2.
To further evaluate the three methods, POD-LGBM, POD-LSTM and POD-TCN, we calculate their RMSE by formula (19), and the results are shown in Fig. 14. The RMSE of POD-TCN under the two conditions is between 1% and 1.5%. For the inflow velocity $v = 0.9$ m/s, the RMSE of POD-TCN is relatively high in the first few seconds of prediction, but as the number of prediction steps increases, its overall RMSE becomes lower than those of POD-LSTM and POD-LGBM. For the inflow velocity $v = 1.1$ m/s, the RMSE of POD-LSTM and POD-TCN within 50 steps is significantly lower than that of POD-LGBM, while the RMSE of POD-TCN is in turn lower than that of POD-LSTM at multiple time steps. Therefore, the POD-TCN model outperforms the other two models when predicting the flow around the cylinder.

B. EXPERIMENT 2: TRANSONIC FLOW AROUND RAE2822 AIRFOIL
Firstly, we establish the transonic flow database of the RAE2822 airfoil. Based on the fifth-order WENO-Z scheme, we simulate the transonic steady viscous flow over the RAE2822 airfoil for each fixed angle of attack. The computational grid is a C-type grid of size 280 × 60, as shown in Fig. 15. Since the flow at each angle of attack is steady and the angle of attack changes in step with time, the angle of attack can be treated as a time stamp. Table 2 shows the main calculation parameters. Flow fields at three Reynolds numbers, $Re_1 = 6.4 \times 10^6$, $Re_2 = 6.5 \times 10^6$ and $Re_3 = 6.6 \times 10^6$, and at 301 angles of attack are simulated to constitute the flow field database.
Then, for each Reynolds number, 301 flow fields are calculated and a snapshot matrix is formed. Without loss of generality, we focus on the pressure field. In principle, taking the pressure snapshots at the three Reynolds numbers as input, the proposed model can predict the pressure field at a similar Reynolds number. We now analyze the series of pressure field data by POD and extract the main modes. Specifically, SVD-based POD decomposes the correlation matrix of the pressure field into spatial modes and temporal coefficients. When the three field matrices at the three Reynolds numbers are decomposed by POD, the results differ little. Taking the case $Re_2 = 6.5 \times 10^6$ as an example, Fig. 16 shows the first four coefficients for angles of attack ranging from 0° to 3°, which play the role of temporal coefficients. Unlike the periodic temporal coefficients of the reduced-order flow around the cylinder, the temporal coefficients of the flow over the airfoil are non-periodic.

Table 2. Main calculation parameters.
Reynolds number    $6.4 \times 10^6$; $6.5 \times 10^6$; $6.6 \times 10^6$
Grid               16800
Attack angle       $a_i = -3 + 0.02 \times i$, $i = 0, 1, \cdots$; $a_i \in [-3, 3]$
Mach number        $Ma = 0.731$

Correspondingly, Fig. 17 illustrates the energy ratio of the first ten modes of the pressure field of the RAE2822 airfoil.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
We can see that the energy ratio of the first two modes reaches 99.69%, and that the first four, six and eight modes account for 99.92%, 99.97%, and 99.99% of the energy, respectively. From the perspective of modal energy ratio, constructing the transonic flow database of the airfoil by changing the angle of attack is very successful, even though shocks occur in the flow. It also confirms that flow field analysis using a reduced-order model is feasible. We choose the first four modes, which represent the main information of the pressure field, and employ POD-LGBM, POD-LSTM and POD-TCN to train on the series of temporal coefficients. For Reynolds numbers $Re = 6.45 \times 10^6$ and $Re = 6.55 \times 10^6$, given the pressure fields at the attack angles $a_i = -3 + 0.02 \times i$, $i = 0, 1, \cdots, 250$, $a_i \in [-3, 2]$, the pressure data in the range $2 < a \le 3$ are predicted. In experiment 2, we organized the training and test data in the same way as in experiment 1. Using the same sequence length and interval, we obtained a training and validation set of [294 × 1 × 7] for a single coefficient under a single working condition, and a final training and validation set of [1126 × 1 × 7]. The data from timestep T + 1 to T + 50 under the predicted conditions are obtained by calling the model recursively. Likewise, after more than 100 rounds of hyperparameter tuning, the three sets of hyperparameters that gave each model its best predictions were selected as the final settings. On this basis, we assume that more detailed parameter tuning will not change the final experimental conclusions. Fig. 18, at $a = 2.88°$ for $Re = 6.45 \times 10^6$, and Fig. 19, at $a = 2.46°$ for $Re = 6.55 \times 10^6$, show the pressure fields predicted by the three methods together with the numerical simulation results. In both cases the overall pressure gradient is roughly the same, and the shock is located in the middle and rear of the upper surface of the airfoil. The position and strength of the shock are also well predicted.
Next, we analyze the error generated during prediction. Fig. 20 and Fig. 21 show the first four coefficients $a_1, a_2, a_3, a_4$ of the POD series and the values predicted by the three methods: POD-LGBM, POD-LSTM and POD-TCN. Fig. 20(b) and Fig. 21(b) show the prediction of $a_1$ alone for clarity. The results indicate that POD-TCN and POD-LSTM deviate only slightly from the truth, whereas POD-LGBM diverges more from the truth and its error accumulation is obvious.
The errors of the three methods under the two conditions, $Re = 6.45 \times 10^6$ and $Re = 6.55 \times 10^6$, are shown in Fig. 22. The initial error and the cumulative error growth rate of POD-TCN are significantly smaller than those of POD-LGBM and POD-LSTM, indicating that POD-TCN predicts aperiodic sequences well.

V. DISCUSSION
Our work proposes a non-intrusive reduced-order methodology for flow field analysis that combines SVD-based POD with the intelligent algorithms LGBM, LSTM and TCN. It is worth noting that LGBM, LSTM and TCN are typical and advanced representatives of tree-based, RNN-based and CNN-based models, respectively. The principal flow modes expressing the flow behavior are obtained by applying SVD-based POD to the field snapshots. Modal regression analysis is applied to determine the mapping between the modes and the measured flow fields, where LGBM, LSTM and TCN are used to predict the temporal coefficients in the regression model, yielding the reconstructed flow field. The experimental results show that POD-LGBM, POD-LSTM, and POD-TCN can all predict vortex flow and shock flow well, while POD-TCN shows stronger prediction ability, better robustness and slower cumulative error growth in multi-step prediction. The hierarchical structure of POD-TCN allows the model to capture the general trend of the sequence data while also capturing more periodic information, which is one of the important reasons why it achieves higher prediction accuracy at lower cost.
In our experiments, we find that the prediction performance cannot be attributed simply to the limitations of the model itself. The size of the database used to construct the snapshots has an important impact on the prediction accuracy. We will investigate this issue in future work.
order modeling for turbulent flow control using LSTM neural networks," arXiv preprint, arXiv:1804.09269, 2018.