A novel particulate matter 2.5 concentration prediction model based on double-layer decomposition and feedback of model learning effect

Accurate and effective prediction of particulate matter 2.5 (PM2.5) concentration can provide early warning information to decision-making departments so that governance and preventive measures can be taken. This paper proposes a combined PM2.5 concentration prediction model based on double-layer decomposition (DLD) and feedback of the model learning effect. First, ensemble empirical mode decomposition (EEMD) and variational mode decomposition (VMD) are used for double-layer decomposition of the PM2.5 concentration series, to reduce the nonstationarity and nonlinearity of the series and improve its predictability. Second, a wavelet neural network (WNN) prediction model based on feedback of the model learning effect is established for each subsequence obtained by the double-layer decomposition. Finally, the prediction results of the subsequences are superimposed to obtain the final prediction result. A case study shows that the proposed prediction model is sound.


A. Motivation
With the rapid development of industry, particulate matter pollution in the air has become a global problem. Compared with coarser atmospheric particulate matter, fine particulate matter (PM2.5) has a smaller particle size, a larger specific surface area, and stronger activity, so it more readily carries toxic and harmful substances (for example, heavy metals and microorganisms). Its long residence time in the atmosphere and long transport distance give it a greater impact on human health and atmospheric environmental quality [1].
Therefore, if the concentration of PM2.5 can be accurately predicted, the air quality monitoring department can understand future air conditions and change trends in advance, analyze the causes of changes, and take effective and feasible prevention and treatment measures. At the same time, the prediction information can remind people to avoid going out when serious air pollution is about to occur, or to take protective measures in advance to reduce the harm of pollutants to the body. Hence, it is of great significance to carry out accurate and effective PM2.5 concentration prediction and early warning in areas with serious air pollution.

B. Related works about prediction models
PM2.5 concentration prediction models can be divided into three categories. The first is based on the mechanism of PM2.5 formation and is called a mechanism-driven model. The second is built from data and is called a data-driven model in this article. The third is the combined prediction model, which may combine mechanism prediction with statistical prediction or combine multiple data-driven prediction models. The mechanism prediction model is based on the study of the occurrence and development mechanism of PM2.5, combined with the process of chemical substance transfer and the sources and changes of chemical substances. For example, combining high-resolution aerosol products with the meteorological field of the weather research and forecasting model, a regional-scale regression model was established to predict PM2.5 concentration [1]; five three-dimensional chemical transport models were applied to simulate PM2.5 concentration in Central Europe [2]; and a chemical transport model was used to predict PM10 in Beijing [3]. Mechanism prediction modeling effectively analyzes the mechanism of PM2.5 occurrence and development, so that people can better understand its generation and evolution. However, mechanism models require more analysis, are more complex, are more time-consuming to compute, and their simulation results are vulnerable to the influence of pollutant emissions and the selected variables [4].
The data-driven prediction model is built by analyzing the characteristics of PM2.5 concentration data, and can be further divided into statistical models and machine learning models. The advantages of statistical prediction methods, such as a simple modeling process and easy operation, have led to their application in air pollution prediction. Common statistical models include the Markov model [5], the Lagrangian coupling model [6], and ARIMA [7,8]. However, the PM2.5 concentration series has not only linear characteristics but also nonlinear and non-stationary characteristics.
Facing these problems, machine learning prediction models have attracted attention. By analyzing and training on a large amount of historical data, a machine learning method obtains a prediction model that can track the changing trend of the PM2.5 concentration series. Typical representatives include shallow and deep artificial neural network (ANN) models [9][10][11][12], support vector machine (SVM) models [13], and least squares support vector machine (LSSVM) models [14]. Among various machine learning models, the wavelet neural network (WNN) combines the multi-resolution property of the wavelet transform with the self-learning ability of the neural network. However, it has not yet been applied to PM2.5 concentration prediction.
There are mainly three ways to build a combined forecasting model: the first combines a mechanism-driven model with a data-driven model [15]; the second combines multiple prediction models [16,17]; the third combines data analysis (preprocessing) with data modeling [18][19][20][21][22][23][24].

C. Related works about data preprocessing
One reason for the high prediction accuracy of the third category is that effectively decomposing the PM2.5 concentration series into components with certain scales or fluctuation trends can reduce the nonstationarity of the signal and improve the predictability of the original series. The data processing methods can be divided into EMD-based methods [19][20][21] and VMD-based methods [22,23].
Among various decomposition methods, ensemble empirical mode decomposition (EEMD) is an effective improvement on traditional empirical mode decomposition, solving its mode aliasing problem [19][20][21]. The subsequences obtained by EEMD perform well in the low-frequency range, but their volatility in the high-frequency range is too large, which is not conducive to prediction. VMD uses a non-recursive formulation to decompose the original sequence into a series of eigenmode functions [22][23][24] and is better at restraining high-frequency fluctuations, but it is relatively ordinary in the low-frequency range. Therefore, this paper explores a double-layer decomposition method integrating the advantages of EEMD and VMD for data preprocessing.

D. Contributions of this study
Considering the above analysis, this paper proposes a new PM2.5 concentration prediction model based on double-layer decomposition and feedback of the model learning effect of the wavelet neural network (FMLE-WNN). Firstly, the PM2.5 concentration series is processed by EEMD to obtain a series of subsequences. Secondly, based on sample entropy theory, the subsequences whose sample entropy is greater than the average value over all subsequences are superimposed to form a new sequence, which is then decomposed by VMD into a further series of subsequences. Then, FMLE-WNN is used to model and predict each subsequence. Finally, all wavelet neural network prediction results are superimposed to obtain the final prediction result. Correspondingly, the main contributions of this study are as follows:
1) A double-layer decomposition data processing method is proposed. EEMD is used for the first-layer decomposition to obtain a series of subsequences, and VMD is then used to decompose the superposition of the subsequences whose sample entropy is higher than the average value. The double-layer decomposition mines the characteristics of the PM2.5 series at a finer granularity.
2) WNN is applied to PM2.5 concentration prediction. WNN combines the advantages of wavelet transform and artificial neural networks and has a strong learning ability for sequences; to our knowledge, it has not previously been applied to PM2.5 concentration prediction.
3) Model learning effect feedback is used to improve the prediction ability of WNN. The model frame structure affects the prediction ability of WNN, and this paper explores using model learning effect feedback to determine the network frame structure of WNN.
4) Compared with the other models mentioned in Section IV, the model proposed in this paper achieves superior performance with relatively high accuracy.
The framework of this study is as follows. Section II introduces the methodology of the proposed model. Section III describes the flow chart of the proposed model. Section IV presents the PM2.5 concentration prediction experiments and the test results of the proposed model; two datasets are used for analysis and discussion. Section V concludes this paper.

II. Methodology

A. Ensemble empirical mode decomposition (EEMD)
Empirical mode decomposition (EMD) is essentially an adaptive signal screening method, which screens out the trends of different characteristic scales in the original sequence step by step to obtain intrinsic mode functions (IMFs). For a signal x(t), the EMD steps are as follows [19]:
1) Identify all local extrema of x(t), fit the upper and lower envelopes by cubic spline interpolation, and compute the mean m(t) of the two envelopes.
2) Subtract the envelope mean from the signal: h(t) = x(t) - m(t).
3) Repeat step 1) and step 2) until the difference h(t) after repeated sifting satisfies the IMF conditions; the result is called an IMF and is denoted as c_1(t) = h(t).
4) According to Eq. (1), once c_1(t) is obtained from the signal x(t), the remaining component r_1(t) is obtained:
r_1(t) = x(t) - c_1(t)    (1)
5) Repeat the above steps on r_1(t) to obtain the remaining IMF components, and terminate when the residual r_n(t) is a monotonic function.
Compared with the traditional wavelet decomposition algorithm, EMD reduces the influence of human factors on the decomposition results. However, the above algorithm suffers from mode aliasing in some cases. Ensemble EMD (EEMD) can effectively avoid this problem by exploiting the characteristics of white noise. The steps of EEMD are as follows [25,26]:
1) Add a white noise sequence to the PM2.5 concentration sequence x(t);
2) Use EMD to decompose the noise-added PM2.5 concentration sequence into several intrinsic mode components c_i(t) and a residual component r(t);
3) Repeat steps 1) and 2) a total of s times, with a different realization of the white noise sequence added each time;
4) Take the ensemble average of the IMFs obtained from the s decompositions and use it as the final IMF components of the original signal.
The white noise added in the above steps follows the normal distribution N(0, (αε)²), where α is the noise intensity parameter and ε is the standard deviation of the signal. When the ensemble number s is 100 and α is selected from [0.1, 0.3], a good decomposition result can generally be obtained. Therefore, in this article, s is set to 100 and α to 0.25.
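As an illustration of the ensemble-averaging idea above, the following Python sketch wraps an arbitrary EMD routine with the EEMD noise loop. The `toy_emd` moving-average splitter is a hypothetical stand-in for a real EMD implementation, used only to keep the example self-contained.

```python
import numpy as np

def eemd(x, emd_func, trials=100, noise_strength=0.25, seed=0):
    """Ensemble EMD sketch: average the EMD of many noise-perturbed copies.

    emd_func(signal) -> array of IMFs with shape (n_imfs, len(signal));
    a real EMD implementation is assumed to be supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    sigma = noise_strength * np.std(x)        # noise amplitude: alpha * std of signal
    acc = None
    for _ in range(trials):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)  # step 1: add white noise
        imfs = np.asarray(emd_func(noisy))                # step 2: EMD of noisy copy
        if acc is None:
            acc = np.zeros_like(imfs)
        n = min(len(acc), len(imfs))          # guard: IMF counts can differ per trial
        acc[:n] += imfs[:n]
    return acc / trials                       # step 4: ensemble average of the IMFs

def toy_emd(s):
    """Hypothetical stand-in for EMD: moving-average split into fast/slow parts."""
    slow = np.convolve(s, np.ones(11) / 11, mode="same")
    return np.stack([s - slow, slow])
```

Because the added noise has zero mean, averaging over many trials cancels it while preserving the decomposition structure of the underlying signal.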

B. Variational Mode Decomposition(VMD)
Variational mode decomposition is a completely non-recursive decomposition model [27,28]. Its purpose is to decompose a signal f(t) into K discrete modes u_k(t) (k = 1, 2, ⋯, K), each mode u_k(t) mainly surrounding a center frequency ω_k. The main process of decomposing the sequence f(t) into a series of finite-bandwidth modal functions is as follows:
1) For each mode u_k(t), the Hilbert transform is used to compute the associated analytic signal and obtain a one-sided spectrum.
2) For each mode u_k(t), the spectrum of the mode is shifted to baseband by mixing with an exponential tuned to the respective estimated center frequency.
3) The bandwidth is estimated through the H1 Gaussian smoothness of the demodulated signal. The constrained variational problem can then be expressed as:
min_{{u_k},{ω_k}} Σ_k || ∂_t [ (δ(t) + j/(πt)) * u_k(t) ] e^{-jω_k t} ||₂²   s.t.  Σ_k u_k(t) = f(t)    (2)
where u_k(t) is the k-th mode; ω_k is its center frequency; K is the total number of modes; δ(t) is the Dirac distribution; f(t) is the original signal; * denotes convolution. To remove the constraint in Eq. (2), a quadratic penalty term and a Lagrange multiplier are introduced, transforming the constrained optimization into an unconstrained one:
L({u_k},{ω_k},λ) = α Σ_k || ∂_t [ (δ(t) + j/(πt)) * u_k(t) ] e^{-jω_k t} ||₂² + || f(t) - Σ_k u_k(t) ||₂² + ⟨ λ(t), f(t) - Σ_k u_k(t) ⟩    (3)
where α is the penalty parameter and λ is the Lagrange multiplier.
The original minimization problem is thus transformed into a saddle-point problem, which can be solved by the alternating direction method of multipliers (ADMM). According to ADMM, the update formulas of û_k(ω) and ω_k in the frequency domain are:
û_k^{n+1}(ω) = ( f̂(ω) - Σ_{i≠k} û_i(ω) + λ̂(ω)/2 ) / ( 1 + 2α(ω - ω_k)² )    (4)
ω_k^{n+1} = ∫_0^∞ ω |û_k(ω)|² dω / ∫_0^∞ |û_k(ω)|² dω    (5)
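The frequency-domain ADMM updates can be prototyped in a few lines of NumPy. The sketch below is illustrative only: it skips the boundary mirroring and analytic-signal handling of a full VMD implementation, and the initial center frequencies and parameter values are assumptions.

```python
import numpy as np

def vmd_sketch(f, K=2, alpha=2000.0, tau=0.0, n_iter=100):
    """Illustrative VMD: ADMM mode/center-frequency updates in the Fourier domain.

    Omits the signal mirroring and one-sided-spectrum details of full VMD.
    """
    N = len(f)
    f_hat = np.fft.fft(f)
    w = np.fft.fftfreq(N)                    # normalized frequencies in [-0.5, 0.5)
    u_hat = np.zeros((K, N), dtype=complex)  # mode spectra
    omega = np.linspace(0.05, 0.45, K)       # assumed initial center frequencies
    lam = np.zeros(N, dtype=complex)         # Lagrange multiplier spectrum
    half = w >= 0                            # positive half-spectrum for omega update
    for _ in range(n_iter):
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter-like mode update around omega[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (w - omega[k]) ** 2)
            # center frequency = power-weighted mean of the positive frequencies
            p = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(w[half] * p) / (np.sum(p) + 1e-12)
        lam = lam + tau * (u_hat.sum(axis=0) - f_hat)  # dual ascent (off when tau = 0)
    u = np.real(np.fft.ifft(u_hat, axis=1))
    return u, omega
```

On a two-tone test signal, the two center frequencies migrate toward the two spectral peaks, which is the behavior the update formulas are designed to produce.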

C. Sample entropy (SE)
Sample entropy (SE) [29] is similar in physical meaning to approximate entropy: both measure sequence complexity via the probability of generating a new pattern in the signal. Compared with approximate entropy, sample entropy has two advantages: 1) its calculation does not depend on the data length; 2) it has better consistency. The lower the sample entropy, the higher the self-similarity of the sequence and the stronger its predictability; the larger the sample entropy, the more complicated the sequence and the more difficult the prediction. Sample entropy is currently used to assess the complexity of physiological time series (EEG [30], sEMG [31], etc.) and to diagnose pathological conditions. The specific calculation process of sample entropy can be found in the literature [29].
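For reference, a direct O(n²) NumPy sketch of the sample entropy calculation is given below, using the Chebyshev distance between templates and the tolerance r = 0.25 × std adopted later in this paper. Boundary conventions for the template counts vary slightly across implementations.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r); direct O(n^2) sketch, Chebyshev distance.

    r defaults to 0.25 * std(x), the tolerance used for the SE values here.
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.25 * np.std(x)

    def match_count(mm):
        # number of template pairs of length mm within tolerance r (no self-matches)
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        count = 0
        for i in range(len(templ) - 1):
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B = match_count(m)       # matches at template length m
    A = match_count(m + 1)   # matches at template length m + 1
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A smooth periodic series yields a lower SE than white noise, matching the interpretation above: lower SE means higher self-similarity and stronger predictability.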

D. Error evaluation index
Scientific and effective error evaluation criteria are of great significance for judging model performance. However, at present there is no uniform error evaluation standard for PM2.5 concentration prediction. This article selects two commonly used error criteria to judge model performance: the mean absolute error (MAE) (μg·m⁻³) and the mean square percent error (MSPE) (%), which are expressed as follows:
MAE = (1/n) Σ_{t=1}^{n} | x̂(t) - x(t) |
MSPE = (100/n) Σ_{t=1}^{n} ( (x̂(t) - x(t)) / x(t) )²  %
where x̂(t) is the predicted value and x(t) is the actual value.
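The two criteria can be written as short NumPy functions. Note that definitions of MSPE vary in the literature; the version below, the mean of squared relative errors expressed as a percentage, is one common reading of the name and is an assumption here.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error, in the data's units (here ug/m^3)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(predicted - actual)))

def mspe(actual, predicted):
    """Mean square percent error (%): mean of squared relative errors x 100.

    NOTE: this reading of "MSPE" is an assumption; definitions vary.
    """
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(((predicted - actual) / actual) ** 2))
```

For example, predictions of 11 and 22 against actual values of 10 and 20 give a relative error of 10% at each point, hence an MSPE of 1.0% under this definition.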

E. Wavelet neural network(WNN)
Wavelet neural network is the combination of wavelet theory and artificial neural network. It has the advantage of multiresolution of wavelet transform and the self-learning ability of neural networks [32]. The basic idea is to use the wavelet basis function as the excitation function of the neuron to establish the relationship between wavelet transform and the neural network. Because the wavelet neural network inherits the characteristics of wavelet decomposition, it can learn in the loose area of sequence distribution at low resolution, and in the dense area of sequence distribution at high resolution. These characteristics are conducive to the neural network to "capture" the internal law between input and output data more easily. Therefore, the wavelet neural network is more intelligent, efficient than the traditional neural network.
Let ψ(t) ∈ L²(R) be a wavelet basis function satisfying the admissibility condition:
C_ψ = ∫_R ( |ψ̂(ω)|² / |ω| ) dω < ∞
where ψ̂(ω) is the Fourier transform of ψ(t). A family of wavelet basis functions is generated from ψ(t) by translation and dilation:
ψ_{a,τ}(t) = |a|^{-1/2} ψ( (t - τ) / a )
where a is the dilation (scale) factor and τ is the translation factor. For a signal x(t) ∈ L²(R), the wavelet transform is:
W_x(a, τ) = |a|^{-1/2} ∫_R x(t) ψ*( (t - τ) / a ) dt

Figure 1 Topological structure of wavelet neural network
The topological structure of the three-layer wavelet neural network model is shown in Fig. 1.
In Fig. 1, x_i (i = 1, 2, ⋯, p) is the input sequence of the WNN, y is the output, w_{ij} is the connection weight between the input layer and the hidden layer, and w_{jk} is the connection weight between the hidden layer and the output layer. When the input signal sequence is x_i (i = 1, 2, ⋯, p), the output of the hidden layer is calculated as:
h(j) = ψ_j( ( Σ_{i=1}^{p} w_{ij} x_i - τ_j ) / a_j ),  j = 1, 2, ⋯, q
where h(j) is the output of the j-th node of the hidden layer, ψ_j is the wavelet basis function, a_j is the dilation factor of ψ_j, and τ_j is the translation factor of ψ_j. The output layer is calculated as:
y = Σ_{j=1}^{q} w_{jk} h(j)
The basic learning strategy of the WNN is to use the principle of error function minimization to continually adjust the shape and scale of the wavelet bases. The correction process of the weights and wavelet basis function coefficients is as follows:
1) Calculate the WNN fitting error:
e = Σ_t ( ŷ(t) - y(t) )    (13)
where ŷ(t) is the expected output and y(t) is the WNN fitting output.
2) Correct the wavelet neural network weights and wavelet basis function coefficients based on the error e:
w^{n+1} = w^n - η ∂e/∂w,  a_j^{n+1} = a_j^n - η ∂e/∂a_j,  τ_j^{n+1} = τ_j^n - η ∂e/∂τ_j    (15)
where η is the learning rate. The choice of the learning step size η is very important: if η is too large, training becomes unstable, although convergence is fast; if η is too small, convergence is slow, but instability is avoided. To overcome this contradiction, a momentum term can be added. With a momentum coefficient μ, the weight update, for example, becomes:
w^{n+1} = w^n - η ∂e/∂w + μ ( w^n - w^{n-1} )
The specific steps of the improved wavelet neural network prediction algorithm are as follows:
1) Initialize the network: the dilation factors a_j, translation factors τ_j, and network connection weights w_{ij}, w_{jk} are initialized randomly.
2) Calculate the fitting error: input the training samples into the network, calculate the prediction output of the network, and compute the error between the output and the expected output.
3) Modify the weights and wavelet basis function coefficients: adjust the network weights and wavelet function parameters according to Eq. (15).
4) Determine whether the algorithm ends: when the absolute value of the error function is less than the preset error ξ or the maximum number of iterations is reached, stop; otherwise, return to step 2).
The performance of the wavelet neural network prediction model is also related to the choice of wavelet basis function. Among the many available wavelet bases, the Morlet function has strong robustness and adaptability in regression prediction. In this paper, the Morlet function is used as the wavelet basis function; its mathematical expression is:
ψ(t) = cos(1.75t) e^{-t²/2}
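A minimal sketch of the Morlet activation and one forward pass of the three-layer WNN might look as follows; the weight shapes are assumptions matching the notation of Fig. 1.

```python
import numpy as np

def morlet(t):
    """Morlet wavelet used as the hidden-layer activation."""
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def wnn_forward(x, W_in, w_out, a, tau):
    """One forward pass of the three-layer WNN of Fig. 1 (shapes are assumptions).

    x:      input vector, shape (p,)
    W_in:   input-to-hidden weights, shape (q, p)
    w_out:  hidden-to-output weights, shape (q,)
    a, tau: per-neuron dilation and translation factors, shape (q,)
    """
    net = W_in @ x                  # weighted sum into each hidden node
    h = morlet((net - tau) / a)     # wavelet activation per hidden node
    return float(w_out @ h)         # single output node
```

Each hidden node thus evaluates a dilated, translated Morlet wavelet of its weighted input, and the output is a linear combination of these responses.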

F. Feedback of the model learning effect (FMLE)
As a kind of neural network, WNN shares the general characteristics of neural networks: the framework of the network structure has a great influence on its prediction performance. When the input dimension of the WNN is too small, it cannot fully mine the data features that affect the output layer; when the input dimension is too large, the generalization ability of the model is reduced. The input-layer dimension is usually determined by phase-space reconstruction or by ARIMA, ARMA, and other methods based on Box-Jenkins theory. Both approaches can effectively mine data features, but neither reveals how well the model actually learns from the data, much like providing a person with better food without knowing how well the food is absorbed. The number of hidden-layer neurons also affects network performance to a large extent. If there are too few hidden nodes, the fault tolerance of the network is reduced and a complex mapping relationship cannot be established; in extreme cases the network cannot be trained at all. If too many hidden neurons are selected, the training error is affected and the learning time of the network is prolonged. To address the choice of input-layer dimension and hidden-layer size, this paper proposes a network framework determination method based on feedback of the learning effect of the WNN model. The flow chart is shown in Fig. 2, and the corresponding steps are as follows:
1. The first step is to set the candidate input dimensions and numbers of hidden neuron nodes.
2. The second step is to train the WNN model of each combination and determine the learning effect of each combination.
3. The third step is to evaluate the learning effect of the model. At the end of model training, a segment of the training data is selected randomly to judge the fitting effect of the model; in this paper, MSPE is used to judge the learning effect. 4. The fourth step is to output the best WNN framework combination.
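The four steps above can be sketched as a small grid search. `train_fn`, `eval_fn`, and the hidden-node range `q_of_p` are hypothetical placeholders, since the paper's exact q bounds are not specified.

```python
import numpy as np

def select_wnn_frame(train_fn, eval_fn,
                     p_range=range(3, 9),
                     q_of_p=lambda p: range(p, 3 * p + 1)):
    """FMLE frame search: train a WNN per (p, q) pair, keep the lowest MSPE.

    train_fn(p, q) -> model and eval_fn(model) -> MSPE are supplied by the
    caller; q_of_p is a hypothetical hidden-node range (the exact bounds
    used in the paper are not specified).
    """
    best_frame, best_score = None, np.inf
    for p in p_range:                     # step 1: candidate input dimensions
        for q in q_of_p(p):               # ... and hidden-node counts
            model = train_fn(p, q)        # step 2: train this combination
            score = eval_fn(model)        # step 3: learning effect (MSPE)
            if score < best_score:
                best_frame, best_score = (p, q), score
    return best_frame, best_score         # step 4: best framework combination
```

The selection is driven entirely by the measured learning effect, which is the point of the FMLE idea: the framework is chosen by how well each candidate actually fits, not by a prior rule.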

III. EEMD-VMD-FMLE-WNN modeling process
The PM2.5 concentration series is a typical nonlinear and nonstationary series. The modeling flow chart of the EEMD-VMD-FMLE-WNN prediction model is shown in Fig. 3, and the modeling process is as follows:
Step 1: The original data sequence is acquired, and the sequence is divided into training and testing data.
Step 2: PM2.5 time series is decomposed by EEMD and a series of subsequences are obtained.
Step 3: The SE of each subsequence is calculated, and the average value D is obtained.
Step 4: If the SE of a subsequence is less than D, the subsequence is retained as-is. All the subsequences whose SE is greater than D are superposed to form a new sequence.
Step 5: VMD is used to decompose the new sequence formed in the previous step, and the new subsequence formed after VMD decomposition is obtained.
Step 6: The subsequences left after EEMD decomposition and VMD decomposition are combined to form a new subsequence set.
Step 7: A FMLE-WNN prediction model based on model learning effect feedback is established for all subsequences.
Step 8: The prediction results of each subsequence are superposed to get the final prediction results.
Step 9: Analyze the error of the prediction results.
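Steps 3 and 4 of the flow, the entropy-based split that decides which IMFs go on to VMD, can be sketched as follows; `se_fn` stands for any sample-entropy routine, and the handling of ties at exactly D is an assumption.

```python
import numpy as np

def split_by_entropy(imfs, se_fn):
    """Steps 3-4 of the modeling flow: SE-based split of the EEMD subsequences.

    imfs:  array of shape (n_imfs, n_samples)
    se_fn: any sample-entropy routine mapping a 1-D series to a scalar
    Ties at exactly the mean D are sent to the merged sequence (an assumption).
    """
    se = np.array([se_fn(c) for c in imfs])
    D = se.mean()                        # average SE over all subsequences
    keep = imfs[se < D]                  # low-SE IMFs: modeled directly
    merged = imfs[se >= D].sum(axis=0)   # high-SE IMFs: superposed, then VMD
    return keep, merged, D
```

The merged high-entropy sequence is then handed to VMD (Step 5), while the retained low-entropy IMFs proceed directly to FMLE-WNN modeling.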

IV. Case study

A. Data acquisition and double-layer decomposition
In order to verify the effectiveness and advancement of the model, PM2.5 data from two cities, Beijing and Guangzhou, are selected for research and analysis. Beijing, located in North China, is the capital of the People's Republic of China, a megacity, and the national political, cultural, international communication, and scientific and technological innovation center. Beijing has a typical warm temperate semi-humid continental monsoon climate, with hot and rainy summers and cold, dry winters. Guangzhou, located in South China, is the capital of Guangdong Province and the political, military, economic, cultural, scientific, and educational center of South China; it is the radiation center of Guangfu culture and a world-famous oriental port city. Guangzhou lies on the subtropical coast, with the Tropic of Cancer passing through its south-central part; it has a marine subtropical monsoon climate characterized by warm, rainy weather, ample light and heat, long summers, and a short frost period. The EEMD decomposition results of the two PM2.5 concentration series are shown in Fig. 5 and Fig. 6. The SE value of each subsequence obtained by EEMD decomposition is calculated, with embedding dimension m = 2 and similarity tolerance r = 0.25 × std, where std is the standard deviation of the sequence being evaluated. The SE calculation results are shown in Fig. 7 (Beijing) and Fig. 8 (Guangzhou).
As can be seen from Fig. 7, the SE values of the four sequences IMF6-IMF9 obtained from the EEMD decomposition of Beijing's PM2.5 concentration series are greater than the average value D; too large an SE value indicates poor predictability. The four subsequences IMF6-IMF9 are superimposed to obtain a new sequence, which is decomposed by VMD. Since the number of merged sequences is 4, the number of VMD decomposition modes K is set to 4; the new sequence and its VMD decomposition results are shown in Fig. 9. The four subsequences after decomposition replace IMF6-IMF9 to form the subsequence set after secondary decomposition for Beijing. The SE values of the first five IMFs in the new sequence set are unchanged; the comparison of the SE values of the last four (IMF6-IMF9) is shown in Fig. 10. Similarly, it can be seen from Fig. 8 that the SE values of the three sequences IMF7-IMF9 obtained from the EEMD decomposition of Guangzhou's PM2.5 concentration series are greater than the average value D. The three subsequences IMF7-IMF9 are superimposed to obtain a new sequence, which is decomposed by VMD. Since the number of merged sequences is 3, K is set to 3; the new sequence and its VMD decomposition results are shown in Fig. 11. The three subsequences after decomposition replace IMF7-IMF9 to form the subsequence set after secondary decomposition for Guangzhou. The SE values of the first six IMFs in the new sequence set are unchanged; the SE values of the last three IMFs are shown in Fig. 12.

B. Determination of FMLE-WNN prediction model framework
As can be seen from Fig. 10 and Fig. 12, after the second decomposition the SE values in the combined new sequence sets are reduced to some extent. To establish the WNN prediction model based on feedback of the model learning effect for each subsequence, the model framework of the WNN must first be determined, using the method proposed in Section II. Considering the characteristics of the data, the input dimension p is set to the range 3 ≤ p ≤ 8; for each input dimension p, the number of hidden layer nodes q is varied over a corresponding range. All frame combinations are used to learn and fit the training data of each subsequence, and the optimal frame is output according to the learning results. The optimal WNN frame combinations for each subsequence in Beijing and Guangzhou are shown in Table I. After the optimal frame of each subsequence is determined, the corresponding FMLE-WNN prediction model can be established.

C. Prediction results
In order to verify the performance of the proposed prediction model, three types of models, six models in total, are used for comparison. The first type is based on the double-layer decomposition (DLD) data processing proposed above: one model uses the WNN based on feedback of the model learning effect (FMLE) proposed in this paper, called the DLD-FMLE-WNN model; the other uses an LSSVM whose parameters are optimized by particle swarm optimization (PSO-LSSVM), called the DLD-PSO-LSSVM model. The prediction results of the first type of model are shown in Fig. 13 (Beijing) and Fig. 14 (Guangzhou). The second type is based on VMD data processing: the VMD-FMLE-WNN model and the VMD-PSO-LSSVM model, whose prediction results are shown in Fig. 15 (Beijing) and Fig. 16 (Guangzhou). The third type is based on EEMD data processing: the EEMD-FMLE-WNN model and the EEMD-PSO-LSSVM model, whose prediction results are shown in Fig. 17 (Beijing) and Fig. 18 (Guangzhou). The prediction errors of all models are shown in Table II.

D. Discussion
It can be found from Table II that the prediction error of the proposed model is smaller than that of the other models for both Beijing and Guangzhou, which shows the advancement and robustness of the model. The minimum MAE of the Beijing predictions is 16.491 μg/m³, and the minimum MSPE is 9.81%. The best prediction results for Guangzhou are 6.0155 μg/m³ and 2.11%. The minimum prediction errors of the two cities differ considerably, which is related to the fluctuation of PM2.5 concentration in the two cities: the fluctuation range of Beijing is large and prediction is difficult, while the fluctuation range of Guangzhou is smaller and the series is more predictable. This shows that the predictability of the sample has a great influence on the prediction effect of the model. Comparing the prediction results of the various methods on the data samples in this paper, for both Beijing and Guangzhou, the double-layer decomposition proposed in this paper yields better predictions than VMD or EEMD alone. This shows that deeper data mining can improve predictability to a certain extent.
Comparing the prediction results across the three data processing methods, it can be found that among the 12 groups of error comparisons for Beijing and Guangzhou, only the MAE of Beijing's VMD-PSO-LSSVM model is better than that of the VMD-FMLE-WNN model; in the other 11 groups, the prediction error of the FMLE-WNN models is smaller than that of the PSO-LSSVM models, i.e., the proposed method performs better in 91.67% of the comparisons.
In general, the combination prediction model proposed in this paper has a better data processing effect on the one hand and better performance on the other hand. The combination of the two makes the prediction model have better prediction performance.

V. Conclusions
A novel PM2.5 concentration prediction model, EEMD-VMD-FMLE-WNN, is presented. For the first time, the FMLE-WNN method is systematically introduced for the PM2.5 concentration prediction application. The feedback of the model learning effect of the WNN (FMLE-WNN), specialized in extracting time-series information, is applied to forecast the future PM2.5 concentration series. With the combined advantages of these methods, the proposed combined model achieves superior performance for PM2.5 concentration prediction, with relatively high accuracy compared with the other models mentioned herein.
According to the test results of the EEMD-VMD-FMLE-WNN predictions, it can be summarized that: 1) the proposed EEMD-VMD-FMLE-WNN model evidently achieves much higher forecasting accuracy than the single-layer decomposition methods; 2) FMLE-WNN has better forecasting performance than the other forecasting methods mentioned herein under the same data processing; 3) EEMD-VMD-FMLE-WNN shows smaller volatility of results in the experiments than the other forecasting methods.
Based on two different real PM2.5 concentration datasets, the results reveal that the proposed combined model has superior performance, particularly in its adaptability in practice.