A Novel Mutual Fractional Grey Bernoulli Model With Differential Evolution Algorithm and Its Application in Education Investment Forecasting in China

Nowadays, the nonlinear grey Bernoulli model [NGBM(1,1)] has been successfully applied to various fields. Its main advantage is that the power exponent can better reflect the non-linear characteristics of the original data. However, the parameters of the model (i.e., the order of accumulation, coefficient of background value, and power index) must be optimized to fit the development law of the system. In this study, a fractional non-linear grey Bernoulli model [MFNGBM (1,1)] is proposed to reduce the perturbation limit of the classical NGBM and further improve the accuracy of the model, which uses mutual fractional operators and a new optimization scheme with a differential evolution (DE) algorithm for forecasting education investment. In the scheme, the power exponent of the Bernoulli differential equation, coefficient of background, and cumulative order of the original sequence are taken as decision variables, and their optimal parameters obtained by iteratively adjusting fitness functions. The experimental evaluation is conducted on two types of open-source data, and the results show that the proposed method can be very competitive with the popular baselines. Finally, MFNGBM(1,1) is used to predict China’s education investment in 2020–2025.


I. INTRODUCTION
With rapid economic development and an aging population, the cultivation and building of a talent reserve have become one of the development strategies of various countries. Achieving this goal largely relies on investment in education. Taking the allocation of educational resources in China as an example, as shown in Fig. 1, one can easily find that China's investment in educational resources is increasing year by year. Second, the distribution of educational resources in southeast China has been far higher than that in northwest China. To balance and promote the development of the education industry in the central and western regions, The associate editor coordinating the review of this manuscript and approving it for publication was Gianmaria Silvello . the government is trying to change this situation. Forecasting education investment funds has profound guiding significance when making corresponding policies for the government and provides corresponding strategies for the government to allocate educational resources, which have practical application values.
However, the studies of education investment forecasting have been ignored in recent years, and forecasting educational investment funds face two main challenges. First, although forecasting education investment may be related to diverse data, such as population statistics, economy, transportation, and area range, the access required to obtain these data is limited, and obtaining these data can be time consuming, labor intensive, and delayed. In other words, how to improve prediction accuracy on a limited small sample VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ dataset is required. Second, although education investment is sustainable, it may remain unstable because of government policy and natural disasters. Therefore, how to make predictions while reducing the randomness and uncertainty and requiring a smaller amount of operational data should be analysed. Grey-based approaches are suitable to address the above challenges. A grey-based model is suitable for uncertain systems with poor information, and it requires fewer operational data, so it can be used for education funds prediction. A grey forecasting method as a novel prediction method has been applied in many fields [1]- [6], and a number of attempts have been made to carry out the related Grey system studies. Many systems in the real world, such as energy, agriculture, and education, can be theoretically regarded as an energy system, and their characteristic values follow the Grey exponential law. Based on this consideration, a pioneering work [7] by Deng, was given for small-sample forecasting, which is a basic grey model called GM (1,1). The main idea of GM(1,1) is to use the solution of the differential equation to establish a time-response equation, and then discretize the differential equation to obtain the difference equation. The estimated parameters of the difference equation are brought into the time-response sequence for prediction. With this structure, GM(1,1) can effectively fit a time series with exponential growth. However, many actual systems are affected more or less by other relative factors, and their characteristic values never follow the grey exponential law completely [8]. To solve this problem, a non-homogeneous exponential law is given in literature [9] that achieves better performance. Xie et al. proposed a discrete gray prediction model that can directly derive the time-response equation through the difference equation and avoid the error caused by the transition from the differential equation to the difference equation [10].
Zeng et al. proposed a new multivariable grey prediction method by adding the dependent lag term, linear correction term, and random perturbation term to the traditional GM(1,N) model, which achieved good results [11]. Zeng et al. followed this study and continued to present a new grey Verhulst model by introducing a new nonhomogeneous exponential function [12]. To overcome the shortage of a static value and improve effectiveness, Xia et al. presented a real-time rolling grey prediction method that can provide efficient and accurate prediction of the health of machinery when considering and analyzing the influence of operational load and other factors [13]. This method greatly expands the range of using a grey prediction model and greatly improves applicability and effectiveness. Other important studies on improved grey models and applications have been reported [14]- [18].
Although the currently grey model has achieved success in many areas, it must be further improved. For example, the time-response function of many grey models is always a monotonic sequence. Therefore, these basic models are not applicable to raw data with strong volatility, and cannot fit the non-linear characteristics of real-world data. NGBM(1,1) is proposed to solve this problem [19]. Compared with the traditional grey model, the power index of the background value of the grey derivative is added to the grey interaction, where the power index is an uncertain parameter. Thus, as long as the power index and other structural parameters can be accurately found, the fluctuation characteristics of the data can be reflected.
Some studies have applied NGBM(1,1) in analyzing and forecasting problem. For example, Chen et al. used the non-linear grey Bernoulli model for forecasting unemployment rate and trading partners [20], [21]. Then, some methods were designed to optimize NGBM(1,1). Particle swarm optimization and genetic algorithms were used to optimize the power index in the model, achieving better results [22], [23]. Wang et al. [24] established a non-linear model to simultaneously optimize the background value and power index, further enhancing the model's predictive ability, and applied the optimization model to the prediction and early warning of the compliance rate of industrial wastewater discharge. In addition, several different methods for optimizing NGBM (1,1) have been proposed, such as Nash NGBM (1,1) [25] and NLS-based NGBM (1,1) [26].
From the above analysis, it can be seen that the non-linear Bernoulli model is very dependent on the choice of parameters. Moreover, the current non-linear Bernoulli models are mostly integer-order accumulation, which limits the scope of use for a certain extent. This is mainly because integer-order accumulation may impede the model's ability to achieve the best performance. Meanwhile, the structure parameters of the model should be dynamically fitted according to the characteristics of the data themselves.
Based on such considerations, a novel fractional Bernoulli prediction method, abbreviated MFNGBM(1,1), is proposed, which uses a new optimization scheme with a differential evolution (DE) algorithm to optimize the power index, order, and background value at the same time, further improving the prediction accuracy. Previous grey Bernoulli models did not consider optimizing power index, exponents, background value at the same time. Finally, MFNGBM(1,1) is established to predict China's education funds. The main contributions of this study are the following: • A new non-linear Grey Bernoulli model is presented using a mutual fractional operator, which can reduce the original data perturbation bounds.
• A framework to optimize MFNGBM(1,1) using a DE algorithm is proposed, which can optimize the power, coefficient of background value, and order of original sequence.
• According to the characteristics of the X sequence dataset and China's education investment funds data, the proposed new model, that is, MFNGBM, is more adaptive and intelligent based on the optimization mechanism of the DE algorithm, which achieves the best performance.
• According to the experimental analysis of China's education funds, the new MFNGBM model is more accurate than the traditional classical grey GM, NGM, DGM, and FGM models. Based on this, China's educational funds in 2020-2025 are forecasted. Based on the prediction results, providing some effective suggestions to balance education sources. The remainder of this paper is organized as follows: In Section 2, we describe the detail of the NGBM(1,1) and MFNGBM, and parameter optimization with DE algorithm. In Section 3, we verify the performance of the MFNGBM (1, 1) on two datasets and give a discussion. In Section 4, we apply MFNGBM (1, 1) to forecast education funding from 2020 to 2025 in China. In section 5, we conclude and give future work.

II. MFNGBM METHOD
In this section, we show the detail of our approach. MFNGBM is an extension of NGBM. Firstly, previous traditional NGBM is introduced. Secondly, Based on a study of NGBM, some related definitions, theorems, and proofs of our approach are given. To demonstrate the advantages of our method, two data sets are used to verify the proposed method. We simultaneously optimize the accumulation order, background value, power index of our model (i.e., MAPE as the objective function), and compare it with grey models including the classic NGBM (1,1). At the same time, the DE optimizer and the classic GA optimizer for the optimization are compared in this paper.
A. NGBM (1,1) Previous grey models almost were based on linear methods and theories. Deng first proposed a non-linear Bernoulli model to solve the problem of non-linear prediction under poor information small samples [1]. Assuming is the original sequence, and the corresponding first-order accumulation generation sequence is is called whitening equation of NGBM(1,1), and α represents the power of the model. If α = 1, the NGBM (1,1) will becomes the GM (1,1) model. We can get the discrete form of the model through the trapezoidal formula calculation where z (1) (k) indicates the background value and it is obtained as The model parameters can be estimated by the least squares as Therefore, the solution to Eq. (3) iŝ By first-order difference calculation ofx (1) (k), the prediction result can be obtained aŝ The definition of fractional accumulation is given. Definition 1 [27]:. Let , · · ·, x (0) (n) be a non-negative sequence. If the accumulated generating operator is applied r times on X (0) , r ∈ R + , we obtain where The matrix expression of Eq. (11) is . . .
The definition of fractional subtraction operator is given. Definition 2 [27]: Set The r-order cumulative generation and r-order decrementing generation have the following relationship, Definition 3: Establish a whitenization differential equation (WDE) for X (r) as follows Corresponding difference form of our model is In previous traditional Grey studies, the value of ζ was always set as 0.5 [28]- [30]. There is a certain source of error for background value calculation, which uses right-angle trapezoid formula instead of the curved-edge trapezoid. As shown in Fig.2, x (1) (k) is convex, concave and non-convex in each subinterval, respectively. The related definitions of estimation for the parameter are given.
Definition 4: Assuming that definition 1 and definition 2 exist, then the least-squares estimation matrix form of the parameters can be estimated aŝ when ζ = 0.5 then (21), as shown at the bottom of this page.
97842 VOLUME 8, 2020 For further derivation of Y , see Eq. (22), (23), as shown at the bottom of this page.

C. PARAMETER ESTIMATION BASED ON DE ALGORITHM
The power, order, and background value coefficients of the MFNGBM are not given by an exact value. In this paper, we can find the best parameters of MFNGBM by DE algorithm.

1) DIFFERENTIAL EVOLUTION ALGORITHM
DE algorithms, which were first introduced in [31], depend on a few input parameters and require a very modest computational compared with the most traditional genetic algorithm (GA). Let i = 1, 2, · · · , N , j = 1, 2, · · · , L, rand ∈ [0, 1], t is the current generation, j ∈ [1, L], and j r is an integer, f is a fitness function, the main steps of DE algorithm are described as follows: (1) Initialization Encode population in an interval of [U min , U max ] and the initialization is described as where X ij represents the j − th gene on i − th chromosome.
(2) Mutation Two different individuals in the population are randomly selected and the vector difference is scaled to make vector synthesis with the individual to be mutated.
where F is the individual variation rate, and X t rd represents the rd − th individual in the t − th generation population.
(3) Crossover If the random rate of individuals in the population is less than the crossover rate CR, perform crossover operation.
where CR is individual crossover rate.

(4) Selection
The greedy algorithm is used to select individuals to enter the next generation of population.

2) OPTIMIZATION OF THE HYPERPARAMETERS WITH DIFFERENTIAL EVOLUTION ALGORITHM
Three parameters are optimized: power exponent of Bernoulli differential equation α, coefficient of background value r and cumulative order of original sequence ζ . As explained in Eq. (37), MAPE (mean absolute percentage error) is chose as objective function. Hyperparameters corresponding to the minimum MAPE will be obtained after optimization process. Optimization process is shown in Fig.3. We construct the mathematical programming model for optimization, which is formulated as Eq. (23).
where α, r and ζ represent power exponent of Bernoulli differential equation, production coefficient of background and cumulative order of original sequence, respectively. x (0) (k i ) andx (0) (k i ) are ground truth and the predicted series, respectively. Parameters optimization steps of MFNGBM (1,1) are described as follows: Step 1: Calculate the order accumulating generation sequence of an original data sequence.
Step 2: If iterations of algorithm reach the limit, then the procedure goes to step 6, otherwise, it goes to step 3.
Step 3: Perform mutation on three individuals randomly according to Eq. (33).
Step 5: Construct a new population by selecting new individuals according to Eq. (36). If the algorithm satisfies the termination condition, it goes to step 6, if not then go to step 3.
Step 6: Output the optimal individual and get the optimal solution after optimization. Correspondingly, pseudo-code of the above optimization process is described in Algorithm 1.

Input:
The given dataset of historical sequence (x

III. SIMULATION EXPERIMENTS
In this section, two datasets are used to evaluate the proposed approach and a discussion is given. Through the above analysis, it indicates that MFNGBM (1,1) can set appropriate parameters according to the law of the data itself, and can fit different types of data to characterize the development rules of different systems. The performance of MFNGBM (1,1) is evaluated on two datasets, which is X sequence data [32], [33] and the education investment data in real-world come from National Bureau of Statistics of China. X data contains six type characteristics of time series, which is widely used for evaluating Grey methods. The experiment on Chinese investment funds dataset is to verify the performance of our method in real-world dataset.

A. DATASET AND SETTING
The MFNGBM is evaluated with the best performance on two datasets, containing X sequence data and Chinese education funds. The investment funds dataset can be obtained at http://www.stats.gov.cn. The X data contains six sequences, which contains the characteristics of rising convex, rising concave, falling convex, falling concave, strictly nonhomogeneous, and approximate non-homogeneous. These six type characteristics represent the diversity of X sequence. Chinese education funds are real-world dataset, which comes from the official website of the National Bureau of Statistics of China. All experiments in this paper were deployed on a PC with configurations as follows: Intel Core i7-2600k 3.40GHz (8 cores), 8GB of RAM, Windows 10 of Operating System. Matlab libraries and program are used to build our model. The max iteration, scaling factor, and crossover factor are set to 30, 0.5 and 0.8, respectively.

B. ASSESSMENT CRITERIA
MAPE is used to evaluate predicting performance of baselines and our approach, which is defined as follows: where x (0) (k) is ground truth,x (0) (k) is the predicted series. Lewis' [34] benchmark of accuracy evaluation is applied in this paper, which is listed in Table 1.  1, k, k). Six baselines and our approach are employed to predict on six sequences and their predictive errors are shown in Table 2. As seen in Table 2, the predictive accuracy of MFNGBM wins the smallest error compared with those of the classical models. The predictive accuracy of MFNGBM is significantly improved compared with all baselines on six different sequence data, which indicates that the proposed approach adapts to different conditions and shows the robustness of MFNGBM. Our approach achieves the best performance in X-sequence data prediction, which demonstrates the overall effectiveness of MFNGBM. Compared with NGBM (1,1) and MFNGBM (1,1) on X sequence data, MFNGBM (1,1) achieve lower error, which further demonstrates the extension version of NGBM (1,1) [i.e., MFNGBM (1,1)] is effective. It mainly because that classic grey Bernoulli model is integer order accumulation, which is not necessarily the best accumulation of the model. In addition, we compared the performance of MFNGBM optimization using GA and DE algorithms on the X sequence dataset in 10 times independently. As shown in Fig.4, compared with DE, the GA algorithm achieves the small errors (average of 10 optimization results) at the beginning of the iteration. And then, their errors meet at an uncertain iteration, after the encounter, the errors of the DE algorithm decrease rapidly with the number of iterations compared with GA algorithm. As seen from Fig.4 (a)-(e), the DE algorithm reaches the smallest prediction errors after 30 iterations compared with DA, which denotes the advantages of the DE.

2) RESULTS ON EDUCATION FUNDS IN CHINA
The above chapter shows that the MFNGBM optimized by DE is suitable for small sample modeling without additional data. We conduct the experimental to predict on education investment data in China. The first experimental setting is to show the stability of MFNGBM. We observe the convergence of MFNGBM under different population sizes. As shown in Fig.5 (a), MFNGBM is always convergent when the population size is set 30 to 90. Meanwhile, the predictive error is very close after the stable of the model.
We ran the MFNGBM model 14 times randomly as the second experiment, which finds that the running fitness curve of each MFNGBM is convergent after oscillating within a certain range, as depicted in Fig.5 (b). The first and   second experiment indicates the better stability performance of MFNGBM.
Next, to better understand the parameter change process, the correlation between power, order and background value coefficient is visualized under a 20 populaton size and a different iteration. The iteration of Fig.6 (a) to (c), (d) to (f), (g) to (i), and (j) to (l) are set as 1, 4, 10, 25, respectively. In Fig.6 (a) to (c), the MAPE is larger than others ( Fig. 6 (d) to (l)), which indicates that there are a large number of solutions in the solution space. With the number of iterations increases, the poorer solutions are gradually eliminated and the better solutions are in the solution space, which indicates a better model with a smaller error (depicted Fig. 6 (j) to (l)).
We used historical data from 2006 to 2015 to train the model. The education funds of 2016 and 2017 are used as tests. Fig. 7 shows the curve of raw data, the prediction of our approach and baselines. Predictive errors of China's   education investment funds are given in Table 3. From Table 3, MFNGBM (1, 1) yielded lower MAPE compared with the traditional DGM (1,1), DGM (2,1), GM (1,1), and NGBM (1,1). To make it more intuitive, Fig.8 illustrates overall prediction error of all approaches. According to Table 1, the forecasting accuracy of MFNGBM (1,1), DGM (2,1) and Accurate prediction of educational funds is an important theoretical basis for scientific decision-making and balanced education resources, which is related to the overall development of the educational cause [36]. The above sections have shown the advantages of our approach. Education funds from 2020 to 2025 are predicted using the proposed approach in China. Prediction results are shown in Fig.9 (a). It is a continuous growth between 2020 and 2025, which probably indicates the government has been devoted to investing more funds in developing education.
We can see a slight decrease in the rate of Chinese investment funds between 2020 to 2025 in Fig.9 (b). This may indicate that China's education infrastructure has gradually improved.

IV. CONCLUSION
Forecasting education investment funds is an important part of developing a good education system and growing the talent reserve. Scientific, reasonable, and accurate forecasting of China's education funds can provide effective suggestions to solve the balanced education investment problem. Based on the features of grey prediction theory and the fractional extension operator, a fractional non-linear grey Bernoulli model [MFNGBM(1,1)] is proposed in this study to reduce the perturbation limit of the classic NGBM model and further improve the accuracy of the model, which uses mutual fractional operators and a new optimization scheme with a differential evolution (DE) algorithm for forecasting education investment. Although the MFNGBM (1,1) model achieves excellent performance, there is some improvement work to be done in the future.
• The proposed approach is a relatively complex structure based on the mutual fractional order theory and the non-linear Bernoulli model, which may have some limits. In the next work, the MFNGBM should be simplified.
• Although we realize the optimization of parameters of our approach and obtain better performance, a more efficient optimization framework should be constructed in the future. Her research interests include software engineering, system optimization, big data processing, and mining.