Industrial Ultra-Short-Term Load Forecasting With Data Completion

Accurate and efficient ultra-short-term load forecasting is crucial for industrial power users to stabilize and optimize their operations. In this paper, we develop novel strategies for industrial power users to handle the challenges of ultra-short-term load forecasting. First, this paper proposes a two-way Genetic Algorithm Back Propagation Neural Network (GABPNN) missing data completion model to handle data loss, which is common in power load data mining. A support vector regression algorithm based on particle swarm optimization (PSO-SVR) is further used to integrate the two-way completion results with better accuracy. In addition, the paper introduces a combined ultra-short-term load forecasting model for industrial power users. The proposed model combines the Cubature Kalman filter (CKF) prediction model, which performs well in nonlinear dynamic systems, with the least squares support vector machine (LS-SVM) prediction model, which performs well in small-scale data prediction. A grey neural network is used to integrate the two algorithms, which further improves the accuracy of ultra-short-term load forecasting. Lastly, we test the proposed strategy in a case study with real industrial data and demonstrate that the proposed model achieves high precision in load forecasting.


I. INTRODUCTION
As a foundation of national life, the stability of the power system underpins scientific, technological, and economic development. Load forecasting has become an important task in the smart grid. Ultra-short-term load forecasting is a type of power load forecasting that predicts the load within the next hour and therefore has strong real-time requirements. Its accuracy also affects the accuracy of short-term and medium-term load forecasting. The higher the accuracy of ultra-short-term load forecasting, the easier it is to improve the utilization rate of power generation equipment and economic dispatch. Therefore, the study of ultra-short-term load forecasting has practical significance.
In the power system, load data directly reflect the user's power consumption status, and deep mining of these data yields a large amount of useful information, which plays an important role in power grid planning, system maintenance, and accident alarms [1]. Although power users have upgraded their hardware devices, the integrity of load data transmission cannot be guaranteed. During acquisition and transmission, the data may be affected by problems such as network fluctuation or signal interference. This leads to incomplete data collection and erroneous results in data analysis. Therefore, to avoid the impact of data loss, missing data should be completed before data mining. As part of power load data mining, load forecasting provides important data for the planning and operation of the power system, and it is also one of the signs of management modernization. Load forecasting can be divided according to the forecasting horizon into ultra-short-term, short-term, medium-term, and long-term load forecasting [2]-[4]. Ultra-short-term load forecasting mainly refers to predicting the load value of the next 10-15 min to realize real-time control of the system. As competition mechanisms are gradually introduced into the electricity market, the accuracy of the load forecast directly affects the power trading strategy of industrial users, which in turn affects their economic benefits. Therefore, ultra-short-term load forecasting is crucial for industrial power users to optimize their operations [5]. Industrial load differs from other electric loads: external factors such as temperature, population, income, and holidays have little influence, and the load is mainly determined by the production plan of the industrial power user [6]. Therefore, industrial user load can only be forecast by analyzing the patterns of change in the load data itself [7]. Ultra-short-term prediction of electric load has been studied for a long time.
The associate editor coordinating the review of this manuscript and approving it for publication was Fanbiao Li.
Compared with classic prediction techniques, artificial intelligence technology has been widely applied due to its flexibility and accuracy. Particle swarm optimization is an iterative optimization algorithm that can search very large spaces of candidate solutions. Hao et al. proposed a chaos-optimized particle swarm least squares support vector machine (SVM) algorithm [9]. The method uses the global search ability of the particle swarm optimization algorithm and the ergodic characteristics of a chaotic algorithm, which improves the parameter selection method and the local search of the particle swarm optimization algorithm. However, the selection of the number of iterations and the determination of the initial value in the global search are not effectively solved in this method. Li et al. proposed an extreme learning machine prediction model based on the particle swarm optimization (PSO) algorithm [10]. The model uses the particle swarm optimization algorithm to obtain the optimal input weights and hidden layer biases of the extreme learning machine, so as to reduce the error caused by random parameters. In addition, a chaotic adaptive strategy is introduced to enhance the diversity of the particle swarm and prevent premature local convergence. However, this method only improves the parameter selection without optimizing the algorithm structure itself. Moreover, load forecasting with prior methods can still become trapped in a local optimal solution, and a single algorithm is not sufficient to handle complex processes.
Kalman filtering is a mature linear quadratic estimation algorithm and has been applied in various areas such as dynamic control and navigation. In recent years, the Kalman filtering algorithm has been increasingly used in load forecasting due to its strong real-time response and good adaptive ability [11], [12]. With an improved Kalman filter algorithm, short-term load forecasting can achieve better convergence speed and computational efficiency. However, the common load model for ultra-short-term load forecasting is nonlinear. The Kalman filter originated from linear systems, so discarding the terms above second order has a great influence on the final prediction result. In addition, the Kalman filter has a certain hysteresis, which causes a large prediction error when the load curve fluctuates strongly. Emre Akarslan proposed a new short-term load forecasting method based on the adaptive neuro-fuzzy inference system (ANFIS); however, it requires a large amount of historical data [13]. Therefore, it is not suited to ultra-short-term load forecasting, where data are scarce and the dynamics are fast. In particular, if the production plan behind the training data is not regular, the predictions will be disturbed by the different production plans, which increases the error. Gritsay proposed a short-term power load forecasting method combining a sinusoidal function and an artificial neural network [14]. The artificial neural network is used to calculate the approximation coefficients to complete the load forecasting. The method is convenient and fast and is suitable for small loads. For medium- and long-term load forecasting, the time series is relatively stable compared with the ultra-short-term case, so the PSO-LSSVM method is usually used [14].
In this paper, a neural network algorithm is used because neural networks are suitable for nonlinear, uncertain, and complex systems; however, a neural network may converge to a local optimal solution during training. Reference [45] proposes genetic algorithm optimization of the artificial neural network architecture as the first step of model development. The BP neural network adjusts the weights of its neurons through back propagation of the error [46]-[48], and the genetic algorithm has better global search ability, so the BP neural network and the genetic algorithm are combined.
Currently, the ultra-short-term power load forecasting of industrial users still faces the following major challenges:
• In the process of load data collection, data are lost due to various issues (e.g., communication failure). What method should be used to compensate for missing data?
• The load forecasting performed by a single algorithm cannot meet the needs of actual work, and it is easy to fall into local optimum or forecast delay. So how to establish a combined model to achieve ultra-short-term load forecasting?
• What is the basis for selecting different techniques to construct a combined model? And how to achieve the fusion between different algorithms?
• Usually, the load curve model is fitted by the least squares method, which has low precision in practical applications. What kind of method should be used to improve load modeling?
VOLUME 8, 2020
Establishing a combined forecasting model to perform ultra-short-term load forecasting has become an active research field [15], [16]. A forecasting model based on a single algorithm has limited ability to cope with complex load changes, and industrial load changes are less affected by external factors, which makes it more difficult to select a suitable explanatory variable as the basis for load forecasting. A combined forecasting model can therefore exploit the complementary advantages of different techniques. To handle data loss, the usual method is to complete the missing data by one-way prediction [18]. However, this has obvious limitations: if the missing point is at the highest point of the curve, the predicted value will continue to rise with the preceding trend; if it is at the lowest point, the predicted value will continue to decline. Therefore, it is necessary to determine the missing data from the trends on both sides of the missing point. Based on the above ideas, four improvement measures are proposed in this paper:
• An online completion method for missing power load data based on an improved two-way GABPNN neural network is proposed. The trained model completes the missing data points from two directions and introduces the PSO-SVR algorithm to fuse the results of the two directions, replacing the traditional weighted summation method. The model also improves the selection of the genetic algorithm mutation rate and retains populations with high fitness to a greater extent.
• The clonal immune genetic algorithm is used to estimate the state equation of the Cubature Kalman filter, replacing traditional fitting methods such as least squares. The problems of poor memory ability, high randomness of operation, and difficulty in controlling population diversity are also alleviated.
• A combined prediction model of the Cubature Kalman filter and the least squares support vector machine is proposed, which combines vertical and horizontal prediction on the data structure. It has the generalization ability of the least squares support vector machine and does not require a large amount of training data. It also has the real-time update capability of the Cubature Kalman filter. The model thus realizes the complementary advantages of the different algorithms.
• The grey neural network is used to fuse the two algorithms. This combination method is a considerable improvement over the simple linear weighting used in the past, and the accuracy of load prediction is further improved.
In this paper, we introduce the data completion algorithm based on the two-way GABPNN algorithm in Section II. The GABPNN algorithm is used for two-way data completion, and the PSO-SVR algorithm is used to fuse the two-way completion results. In Section III, we propose an ultra-short-term load forecasting algorithm based on the combined model. The LS-SVM algorithm and the CKF algorithm are each used for load prediction, and the grey neural network is used to fuse the results of the two algorithms. In Section IV, we conduct case studies using real industrial data and demonstrate the performance of the proposed strategy. Lastly, we discuss conclusions and future directions for this work.

II. LOAD DATA COMPLETION MODEL BASED ON IMPROVED TWO-WAY GABPNN
Data loss in power system load data undermines the integrity and validity of the data and can cause deviations in data mining tasks such as ultra-short-term load forecasting [17]. To solve this problem, an online completion method for missing power load data based on an improved two-way GABPNN neural network is proposed. Compared with the one-way data completion method [18], the proposed method uses not only the load data before the missing point to train the model but also the data after the missing point. The trained model completes the missing data points from two directions and introduces the PSO-SVR algorithm to fuse the results of the two directions, replacing the traditional weighted summation method, so that the data trend after the missing point is also considered and the accuracy of data completion is ensured. The method can effectively complete missing data, secure the input of system data and the training of the model, and complete data in real time without sacrificing accuracy. The structure of the two-way GABPNN neural network data completion is shown in Fig. 1.

A. BACK PROPAGATION NEURAL NETWORK
The Back Propagation Neural Network (BPNN) was proposed by Rumelhart et al. in 1986 as a neural network that continuously trains itself based on error backpropagation [19]. The network is suitable for all kinds of complex nonlinear problems, and the power load data to be completed in this paper are nonlinear. Therefore, the BP neural network is chosen as the basis of the algorithm, and the training flow chart of the improved BP neural network is shown in Fig. 2.
The core algorithm of the BP neural network is the gradient descent method, which takes the error between the actual output value and the expected output value of the network as the objective. The network connection weights are adjusted repeatedly so that the error is gradually reduced, and training is considered complete when the error converges to within a set region. Learning in the network consists of two parts: the forward propagation of the network input information and the back propagation of the output error information. The BP neural network also has shortcomings. It easily falls into a local optimum, or the error fails to converge during training. Moreover, slow calculation speed and low efficiency can lead to unsatisfactory training results, or even to an infinite loop. To address these problems, this paper combines a genetic algorithm with the BP neural network to complete the missing load data.

B. IMPROVED GENETIC ALGORITHM
Because the BP neural network is prone to falling into local optima and its error may fail to converge during training, this paper combines a genetic algorithm with the BP neural network [20]: the genetic algorithm optimizes the range of the initial weights, and narrowing the value range of the weights improves the prediction accuracy. The traditional genetic algorithm usually chooses a fixed value for the mutation rate and then adjusts it through repeated iterations until the desired result is achieved, which wastes computing resources and takes a long time. To improve the efficiency of the genetic algorithm and effectively retain good populations, an adaptive mutation rate is used here. Specifically, for individuals whose fitness is higher than the average fitness of the population, a small mutation rate is used to retain good individuals as far as possible and avoid turning an excellent individual into a bad one; for individuals whose fitness is lower than the average fitness of the population, a larger mutation rate is adopted to reduce the number of bad individuals and increase the chance that bad individuals become excellent ones. The specific formula is given in [21]. In that equation, $P_{m1}$ is 0.1 and $P_{m2}$ is 0.001, $f_{max}$ is the maximum fitness value in the population, and $f_{avg}$ is the average fitness value.
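The adaptive rule described above can be illustrated with a minimal sketch. The exact equation from [21] is not reproduced in the text, so the linear interpolation below is an assumption: it is one commonly used adaptive form that matches the stated symbols ($P_{m1}$ = 0.1, $P_{m2}$ = 0.001, $f_{max}$, $f_{avg}$) and the stated behaviour (small rate for above-average individuals, larger rate otherwise). The function name and the sample fitness values are hypothetical, and larger fitness is assumed to mean a better individual.

```python
def adaptive_mutation_rate(f, f_avg, f_max, pm1=0.1, pm2=0.001):
    """Mutation rate for an individual with fitness f (assumed: larger is better).

    Below-average individuals get the large rate pm1; above-average ones get a
    rate that shrinks linearly from pm1 (at f_avg) to pm2 (at f_max).
    This linear form is an assumption, not necessarily the formula of [21].
    """
    if f < f_avg or f_max == f_avg:
        return pm1
    return pm1 - (pm1 - pm2) * (f - f_avg) / (f_max - f_avg)


# Hypothetical population fitness values, for illustration only
fitness = [0.2, 0.5, 0.8, 1.0]
f_avg = sum(fitness) / len(fitness)
f_max = max(fitness)
rates = [adaptive_mutation_rate(f, f_avg, f_max) for f in fitness]
```

With this form, the best individual mutates at the minimum rate and every below-average individual mutates at the maximum rate, which matches the intent of preserving good individuals while exploring from poor ones.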

C. SUPPORT VECTOR REGRESSION ALGORITHM BASED ON PARTICLE SWARM OPTIMIZATION
Combining different algorithms in a combined model always introduces new problems. The traditional approaches are statistical methods, simple weighted summation, averaging, and regression, and each has disadvantages, in particular poor adaptability to nonlinear problems. Statistical methods are complicated, require a large number of data samples, and involve a large amount of calculation. Simple weighted summation has no good method for calculating the weights, and the selected weights are not updated, so the same weights cannot be applied to every data set. Similarly, the variables predicted by regression methods are continuous and linear in nature, while power system load prediction is a nonlinear problem with large and complicated computation, so linear regression generally produces large errors. Averaging has no theoretical support, and its results have large errors. To address these problems, this paper adopts a support vector regression (SVR) algorithm based on particle swarm optimization [22]. The advantages of the SVR algorithm are its fast operation, strong generalization ability, and good performance on nonlinear data. In addition, a particle swarm optimization algorithm is added to optimize parameter selection, which further improves the fitting of the algorithm and considerably improves the prediction results [23]. In previous work, PSO and reinforcement learning were adopted to optimize the SVM.

D. TWO-WAY GABPNN NEURAL NETWORK MODEL
In practice, data completion in a single direction is problematic, because the load value before the missing point may be a peak or a valley within a period of time, in which case the completed value continues the preceding trend. If the value before the missing point is a peak, the completed result will continue to increase, whereas the actual value is smaller than the completed value. Similarly, if the value before the missing point is a valley, the completed result will continue to decrease, whereas the actual value is larger than the completed value. To solve this problem, a two-way GABPNN algorithm model is designed in this paper. After the forward data completion, the small amount of load data after the missing point is used for reverse completion of the missing data, and the results in the forward and reverse directions are then combined. The flow chart of the two-way GABPNN neural network is shown in Fig. 3.
Assume a 3-layer BP neural network with 3 inputs and 1 output, in which the number of hidden layer neurons is N and the transfer functions are the hyperbolic tangent sigmoid function (tansig) and the linear function (purelin). The input-output relationship of the neural network is given in [24]. In that formula, $\omega_{ij}$ represents the connection weight from the input layer to the hidden layer, $\omega_{ki}$ represents the connection weight from the hidden layer to the output layer, $b_i^{(1)}$ represents the threshold of the hidden layer, and $b_k^{(2)}$ represents the threshold of the output layer. To ensure that the error between the final output layer data and the expected data is as small as possible, a fitness function based on this error is selected for the genetic algorithm.
In the fitness function, K represents the number of neurons in the output layer, M represents the total number of samples, $y_{pi}$ represents the expected output of the network, and $y_{oi}$ represents the actual output of the network [25]. As the number of iterations of the genetic algorithm increases, the value of the fitness function gradually decreases and the error between the output value and the expected value is further reduced, until the network converges and the output weights are obtained.
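As a minimal sketch of the network and fitness function described above, the snippet below evaluates a 3-input, N-hidden-neuron, 1-output network with a tanh hidden layer (tansig is mathematically equal to tanh) and a linear output layer (purelin). The fitness is written as the mean squared error between expected and actual outputs; this normalization is an assumption, since the paper's exact fitness expression is not reproduced in the text. All function names and numeric values are hypothetical.

```python
import math


def forward(x, w_in, b_hidden, w_out, b_out):
    """Output of a 3-layer network: tanh hidden layer, linear output layer.

    w_in[i][j] is the weight from input j to hidden neuron i,
    b_hidden[i] the hidden threshold, w_out[i] the weight from hidden
    neuron i to the single output, and b_out the output threshold.
    """
    hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
              for row, b in zip(w_in, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def fitness(samples, targets, params):
    """Mean squared error over all samples (assumed form; lower is better)."""
    errors = [(forward(x, *params) - y) ** 2 for x, y in zip(samples, targets)]
    return sum(errors) / len(errors)


# Hypothetical weights for N = 2 hidden neurons and two training samples
params = ([[0.1, 0.2, 0.3], [-0.1, 0.0, 0.1]], [0.0, 0.0], [1.0, 1.0], 0.5)
samples = [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8]]
targets = [0.8, 1.0]
error = fitness(samples, targets, params)
```

In a GA-BP setting, each chromosome would encode one candidate `params` tuple and the genetic algorithm would search for the encoding with the smallest fitness value before gradient-based training refines it.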
The specific steps of the model are as follows:
• Two GABPNN neural network models are established. The first uses forward-sorted load data to train the model. The main purpose of this step is to mine the trend information before the missing load data point and obtain the completion result based on the forward GABPNN neural network. The second uses reverse-sorted load data to train the model. This step mines the data trend after the missing point, so that the completion result based on the reverse GABPNN neural network can be obtained. Two sets of results are thus calculated.
• The PSO-SVR model is mainly used to assign weights to the results of the forward and reverse GABPNN neural network models. It combines the two models so that the influencing factors of both the forward data and the reverse data are taken into account, achieving a two-way combination on the data structure. A general combined model usually adopts simple linear weighting or averaging when assigning weights. Such simple processing makes the weight distribution and calculation unreasonable, resulting in inaccurate prediction results.
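The data arrangement behind the two steps above can be sketched as follows. The snippet only builds the forward-sorted and reverse-sorted training samples around a missing point; it does not train any network. The window length of 3 follows the 3-input network assumed earlier in this section, and the function names and sample series are hypothetical.

```python
def make_windows(seq, n_inputs=3):
    """Each run of n_inputs consecutive values is paired with the next value."""
    return [(seq[i:i + n_inputs], seq[i + n_inputs])
            for i in range(len(seq) - n_inputs)]


def two_way_training_sets(load, missing_idx, n_inputs=3):
    """Build training samples and query inputs for both completion directions."""
    before = load[:missing_idx]            # forward-sorted data before the gap
    after = load[missing_idx + 1:][::-1]   # data after the gap, reverse-sorted
    forward_set = make_windows(before, n_inputs)
    reverse_set = make_windows(after, n_inputs)
    forward_query = before[-n_inputs:]     # inputs for the forward model
    reverse_query = after[-n_inputs:]      # inputs for the reverse model
    return forward_set, reverse_set, forward_query, reverse_query


# Hypothetical load series with one missing point at index 5
load = [1, 2, 3, 4, 5, None, 7, 8, 9, 10, 11]
fwd_set, rev_set, fwd_query, rev_query = two_way_training_sets(load, 5)
```

Each model then predicts the missing value from its own query window, and the two predictions become the two input features of the fusion model.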
In the PSO-SVR model, the input data are the load data completion results of the forward GABPNN neural network and of the reverse GABPNN neural network, the output data are the actual load values, and the optimal weights are calculated by the PSO-SVR model. Suppose the i-th particle in the swarm is represented as a vector in the D-dimensional search space, and its velocity determines the single-iteration displacement of the particle during the search. The optimal particle in the current particle swarm is represented as $p_i = (p_{i1}, p_{i2}, \ldots, p_{il}, \ldots, p_{iD})$, and the global optimal particle is expressed as $g_i = (g_{i1}, g_{i2}, \ldots, g_{il}, \ldots, g_{iD})$.
During the iterative process, each particle updates its velocity and position using the optimal solution in the current population and the global optimal solution, following the update equations in [26]. In the present work, the local search ability parameter is $c_1 = 1.5$ and the global search ability parameter is $c_2 = 1.7$; the maximum number of generations is maxgen = 200; the population size is sizepop = 20. The kernel function parameter g ranges from 0.1 to 100, and the penalty factor C ranges from 0.1 to 100. $r_1$ and $r_2$ are random numbers in (0, 1).
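The search described above can be sketched with the standard PSO velocity and position update, using the parameter values stated in the text ($c_1$ = 1.5, $c_2$ = 1.7, maxgen = 200, sizepop = 20, and a search range of 0.1 to 100 for both C and g). Several elements are assumptions for illustration and are not taken from the paper: the inertia weight `inertia`, the clamping of positions to the bounds, the fixed random seed, and the toy objective that stands in for the SVR validation error.

```python
import random


def pso_minimize(objective, bounds, c1=1.5, c2=1.7, maxgen=200, sizepop=20,
                 inertia=0.7, seed=0):
    """Minimize objective over a box using a standard PSO update (sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(sizepop)]
    vel = [[0.0] * dim for _ in range(sizepop)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(sizepop), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(maxgen):
        for i in range(sizepop):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # velocity: inertia term + pull to personal best + pull to global best
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # position update, clamped to the search range (assumption)
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(p):
    """Hypothetical stand-in for the SVR validation error at (C, g) = p."""
    return (p[0] - 10.0) ** 2 + (p[1] - 1.0) ** 2


best, best_val = pso_minimize(toy_error, bounds=[(0.1, 100.0), (0.1, 100.0)])
```

In an actual PSO-SVR pipeline, `toy_error` would be replaced by a function that trains an SVR with the candidate (C, g) and returns its validation error.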
Set the training set T. In this set, $x_{l1}$ represents the forward completion result, $x_{l2}$ represents the reverse completion result, and $y_l$ represents the actual data. A map φ(·) is sought so that the nonlinear problem in the low-dimensional space can be mapped into a high-dimensional space, and the training set can then be expressed in terms of φ(·). In the corresponding equation, $\varepsilon$ represents the insensitive loss factor. The dual problem is obtained by using the duality principle and the Lagrange multiplier method [22], from which the values of $\alpha_i$ and $\alpha_i^*$ can be solved. The value of the parameter b is then obtained from the KKT conditions [23], which yields the optimal regression hyperplane [23]. In this paper, the radial basis function is used as the kernel function in the final regression equation.
To deal with the loss of load data caused by signal interference, network fluctuations, and similar problems during power load collection, an improved two-way GABPNN neural network model is proposed. Compared with the traditional one-way completion model, the two-way model completes the data from both sides of the missing point, taking into account the changes before and after the missing point and avoiding the insufficient trend information of single-direction prediction. The model also improves the selection of the mutation rate of the genetic algorithm: it adopts a low mutation rate for individuals with high fitness and a high mutation rate for individuals with low fitness, so as to produce as many highly fit individuals as possible. In this paper, the PSO-SVR algorithm is used to fuse the two-way completion data, which replaces the commonly used weighted average method and further improves the accuracy.
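The regression function with a radial basis kernel used in this section can be sketched in its standard textbook form, $f(x) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b$ with $K(u, v) = \exp(-g \lVert u - v \rVert^2)$. This is the usual SVR expression and is offered as an assumption about the final equation, which is not reproduced in the text. Each input is a pair (forward completion result, reverse completion result); the support vectors, coefficients, and parameter values below are hypothetical.

```python
import math


def rbf_kernel(u, v, g):
    """Radial basis kernel exp(-g * ||u - v||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, g):
    """Evaluate the SVR regression function at x.

    dual_coefs[i] stands for (alpha_i - alpha_i*) of support vector i.
    """
    return sum(c * rbf_kernel(sv, x, g)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical fitted model: inputs are (forward result, reverse result)
support_vectors = [[100.0, 104.0], [120.0, 118.0]]
dual_coefs = [2.0, -1.0]
fused = svr_predict([100.0, 104.0], support_vectors, dual_coefs, b=101.0, g=0.1)
```

The fused value replaces a fixed weighted average of the two completion results: the contribution of each direction depends on where the input pair lies relative to the support vectors.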

III. COMBINED MODELS FOR ULTRA-SHORT-TERM LOAD FORECASTING
In the industrial sector, the load of different secondary industries is little affected by external factors such as temperature, population, income, and holidays. It is mainly determined by the production plan of industrial power users. To realize ultra-short-term load forecasting for these users, this paper proposes an ultra-short-term load forecasting model based on machine learning. The least squares support vector machine and the Cubature Kalman filter are used to complete the vertical and horizontal prediction of the electrical load, respectively, and the grey neural network is used to combine the two algorithms. Because a load model needs to be established as the state equation of the Cubature Kalman filter in the combined model, this paper selects the clone immune algorithm for load modeling, replacing the more commonly used least squares method to improve the modeling accuracy.

A. ULTRA-SHORT-TERM LOAD FORECASTING WITH LEAST SQUARES SUPPORT VECTOR MACHINE
The least squares support vector machine is a variant of the support vector machine [27]. The model contains fewer parameters to be selected, so it does not require a large amount of training data to adjust parameters during prediction. An equality constraint is used instead of the inequality constraint, which further reduces the uncertainty of the model. Assume the training data set {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x represents the input data and y represents the output data [16]. The basic idea of least squares support vector machine regression is to find a nonlinear function that maps the input data to a high-dimensional space and to perform linear regression of the estimated function in that space. Since the least squares support vector machine converts the training problem into a set of linear equations, the amount of computation is much smaller than for the standard support vector machine and the calculation is faster, but its error is slightly larger and its data processing ability is weaker. Therefore, this paper combines LS-SVM with other algorithms to complete the prediction work [28]-[30].
In the combined model, the least squares support vector machine prediction model is mainly used to find the load value at a certain point in time as influenced by the load at the same time on different days. In actual industrial production, the production manager's short-term production plan affects the daily load change. The purpose of the model is to find the load trend at the same time across different days and its influence on the next load value. When the model is used for load forecasting, N consecutive data points are taken as a group, in which the input is the first N-1 load values and the output is the last load value. In terms of the structure of the load training data, the least squares support vector machine model performs vertical prediction.
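The grouping described above can be sketched as a small helper that extracts the load at one time of day across successive days and slices it into groups of N values, with the first N-1 as inputs and the last as the target. The function name, the layout of `daily_loads` (one list per day), and the sample numbers are assumptions for illustration.

```python
def vertical_samples(daily_loads, time_idx, n):
    """Build (inputs, target) pairs from same-time loads on successive days.

    Each group holds n consecutive values: the first n-1 are the inputs
    and the last one is the value to predict.
    """
    column = [day[time_idx] for day in daily_loads]
    return [(column[i:i + n - 1], column[i + n - 1])
            for i in range(len(column) - n + 1)]


# Hypothetical loads: 5 days, 3 sampling times per day
daily_loads = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
    [40, 41, 42],
    [50, 51, 52],
]
samples = vertical_samples(daily_loads, time_idx=1, n=4)
```

Reading down one column of the day-by-time table is what makes this a vertical prediction, in contrast to reading along one day's row.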

B. ULTRA-SHORT-TERM LOAD FORECASTING WITH CUBATURE KALMAN FILTER

1) TIME SERIES ANALYSIS MODEL
For load prediction based on the Cubature Kalman filter model, the system state equation must first be established. Here, time series analysis is used to establish this equation [31]. Time series analysis mainly includes three models: the Auto-Regressive model (AR), the Moving Average model (MA), and the Auto-Regressive Moving Average model (ARMA). Among them, the autoregressive moving average model is widely used, and the ARMA(p,q) model expression is [32]:

$$x_t = \sum_{n=1}^{p} \varphi_n x_{t-n} + a_t - \sum_{m=1}^{q} \theta_m a_{t-m}$$
Time series modeling generally consists of two parts: model ordering and model determination. In power load modeling, a third-order equation already meets the requirements on the model order, but there are many different methods for selecting the model parameters. The traditional parameter estimation methods are mainly the least squares method and the maximum likelihood estimation method. The least squares method can be used for parameter estimation of linear regression models as well as nonlinear regression models (such as curve models), but its underlying concept is basic, and it has difficulty handling the complex data changes that may be encountered in machine learning. Maximum likelihood estimation assumes that the observed sample is the most probable one under the true model, but in many practical situations this is not necessarily the case. Similar to the law of large numbers, this assumption only holds when the number of samples is large. When the number of samples is small, the assumption is likely to fail, so the requirement on the data sample is the limitation of maximum likelihood estimation. To solve these problems of parameter estimation, this paper uses the clone immune algorithm to estimate the parameters of the equation [33]. This method is based on the idea of the genetic algorithm: it increases the diversity of the population through crossover and mutation, and it is characterized by assigning different clone numbers to individuals with different fitness [16], [27]-[34]. The flow chart of the clone immune algorithm is shown in Fig. 4.

2) CUBATURE KALMAN FILTERS
The Cubature Kalman filter (CKF) is a nonlinear Gaussian filtering method based on the third-degree cubature rule [36]. The filter selects 2n equally weighted cubature points for nonlinear propagation, and the propagated cubature points are then weighted and summed to approximate the mean and covariance of the system state. The state equation and the measurement equation take the form given in [37]. In these equations, k represents discrete time; $x_k$ represents the n-dimensional state vector of the system at time k; $z_k$ represents the m-dimensional observation signal of the corresponding state; $w_k$ represents the white noise sequence of the state transition equation; $v_k$ represents the observation noise; f(·) represents the state equation function, and h(·) represents the observation equation function. The Cubature Kalman filter is selected as part of the combined model because it has good real-time performance and can effectively correct the prediction result using the latest data, so that the combined prediction model adjusts when special circumstances occur.
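The cubature-point step described above can be sketched as follows, using the standard third-degree rule: for an n-dimensional state with mean m and covariance P = S Sᵀ, the 2n points are m ± √n times each column of S, each with weight 1/(2n). The snippet covers only point generation and mean propagation, not a full filter; the function names and numeric values are hypothetical, and the Cholesky factor is one common choice for S.

```python
import math


def cholesky(a):
    """Lower-triangular factor l with l * l^T = a (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def cubature_points(mean, cov):
    """Return the 2n cubature points and their common weight 1/(2n)."""
    n = len(mean)
    l = cholesky(cov)
    scale = math.sqrt(n)
    points = []
    for sign in (1.0, -1.0):
        for i in range(n):
            # shift the mean along column i of the Cholesky factor
            points.append([mean[r] + sign * scale * l[r][i] for r in range(n)])
    return points, 1.0 / (2 * n)


def propagate_mean(f, mean, cov):
    """Approximate E[f(x)] by the equally weighted sum over cubature points."""
    points, w = cubature_points(mean, cov)
    transformed = [f(p) for p in points]
    dim = len(transformed[0])
    return [w * sum(t[d] for t in transformed) for d in range(dim)]


# Hypothetical 2-dimensional state and a nonlinear function of its first component
mean = [1.0, 2.0]
cov = [[4.0, 0.0], [0.0, 9.0]]
squared = propagate_mean(lambda p: [p[0] ** 2], mean, cov)
```

For this example the exact expectation of the squared first component is mean² + variance = 5, and the rule reproduces it because it is exact for polynomials up to degree three.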
The Cubature Kalman filter prediction model is mainly used to find the load value at a given time [38], which is affected by the load changes at other times on the same day. In actual industrial production, production managers formulate short- and medium-term production plans, so the situation differs from day to day. Within a day, the load rises or falls under the influence of certain uncontrollable factors. The purpose of the model is to capture the load change trend within the same day and its impact on the subsequent load. The model uses a third-order equation as the state equation of the Cubature Kalman filter, and the clone immune algorithm is used to fit the one-dimensional data and thereby estimate the parameters of the equation. When the model is used for load forecasting, its input is the load data at the three preceding points. The Cubature Kalman filter prediction model therefore has good real-time performance, and its parameters can also be updated in real time, which makes it convenient to reflect the influence of load change factors on the predicted value. In terms of the structure of the load training data, the prediction of the Cubature Kalman filter model is a lateral prediction.

C. GREY NEURAL NETWORK MODEL
The grey neural network is a combination of the grey system theory model and the neural network. Traditional neural networks are not well suited to small or incomplete data sets, and the grey model compensates for exactly these deficiencies. The grey neural network model can therefore not only handle scarce and incomplete data, but also has the characteristics of self-adaptability and self-learning.
When a traditional combination model is established, a commonly used fusion method is the weighted average, but it rarely achieves good results in practical applications. Such an overly simple method introduces bias to some extent and makes it difficult for the different algorithms to complement each other's strengths. In this paper, the ultra-short-term load forecasting uses the grey neural network to fuse the least squares support vector machine model and the Cubature Kalman filter model. This combines the lateral prediction and the longitudinal prediction of the prediction model and further improves the prediction through a more reasonable weight distribution.
The grey neural network learning process consists of five parts:
• Initialize the network structure and the parameters a, b according to the characteristics of the training data, and calculate µ;
• Calculate the network weights $\omega_{11}, \omega_{21}, \omega_{22}, \omega_{31}, \ldots$;
• Feed the input parameters forward and calculate the output of each layer of the neural network, as given in [39], [40];
• Calculate the prediction error and adjust the weights according to the error;
• Determine whether the training is over, and if not, return to the third step.
The grey neural network model is mainly used to calculate the weight distribution between the Cubature Kalman filter prediction model and the least squares support vector machine prediction model. Its function is to combine the two prediction models so that the influencing factors at different times on the same day and at the same time on different days are both taken into account, achieving a combination of lateral and longitudinal prediction over the data structure. A general combination model usually assigns weights by simple linear weighting or averaging. This simple treatment makes the weight distribution unreasonable and leads to inaccurate prediction results. In the grey neural network model, the input data are the prediction result of the least squares support vector machine prediction model and the prediction result of the Cubature Kalman filter prediction model, and the output data is the actual load value. The grey neural network learns the optimal weights from these data and uses them for prediction.

D. COMBINED FORECASTING MODEL FOR OPTIMIZATION WEIGHT
Ultra-short-term load forecasting based on intelligent algorithms has developed rapidly in recent years. Research has moved from improving single algorithms toward combining different algorithms to improve prediction accuracy, using their complementary advantages to avoid problems such as local optima, lengthy steps, and complicated calculations. In this paper, a combined model of the least squares support vector machine and the Cubature Kalman filter based on optimized weight distribution is designed. It differs from the traditional linear weighted summation method: the grey neural network model is adopted, the prediction results of the two algorithms are used as the input training data, and the actual data is used as the output data. In this way a reasonable weight distribution between the two algorithms is obtained, which effectively improves the prediction accuracy and is more suitable for practical applications. The model flow chart is shown in Fig. 5.
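The two-input, one-output arrangement of the fusion stage can be illustrated with a deliberately simplified stand-in. The sketch below does not implement the grey neural network; it substitutes ordinary least squares for it, purely to show how weights learned from the two forecasts and the actual load differ from a fixed average. The function names and the bias term are assumptions.

```python
import numpy as np


def fit_fusion_weights(pred_a, pred_b, actual):
    """Least-squares weights (w_a, w_b, bias) mapping two forecasts to the actual load."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    actual = np.asarray(actual, dtype=float)
    design = np.column_stack([pred_a, pred_b, np.ones(actual.size)])
    weights, *_ = np.linalg.lstsq(design, actual, rcond=None)
    return weights


def fuse(pred_a, pred_b, weights):
    """Combine two forecasts with previously learned weights."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return weights[0] * pred_a + weights[1] * pred_b + weights[2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actual = 500.0 + 50.0 * np.sin(np.linspace(0.0, 6.0, 96))
    pred_a = actual + rng.normal(0.0, 5.0, size=96)   # more accurate forecast
    pred_b = actual + rng.normal(0.0, 20.0, size=96)  # less accurate forecast
    w = fit_fusion_weights(pred_a, pred_b, actual)
    learned = np.mean(np.abs(fuse(pred_a, pred_b, w) - actual))
    averaged = np.mean(np.abs(0.5 * (pred_a + pred_b) - actual))
    print("weights:", w, "learned MAE:", learned, "average MAE:", averaged)
```

In practice the weights would be fitted on training data and evaluated on separate test data, as in the workflow of this paper.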
The combined model forecasting workflow includes the following steps:
• The load data of a single transformer of a power user is extracted from the database. The data form a one-dimensional sequence sorted in time order. If the sampling interval of the sensor is 15 minutes, 96 data points are obtained per day. The extracted data are then preprocessed, including the completion of missing data and the repair of abnormal data.
• The data are rearranged into two formats, n × 96 and 1 × m, where n represents the number of days of sampled data and m represents the total number of data points. The data in each format are divided into training data and test data and applied to the different models.
• The least squares support vector machine model is trained on the load data in n × 96 format, and its prediction result is obtained from the test data. The Cubature Kalman filter model first uses the clone immune algorithm to estimate the parameters of the load data in 1 × m format, and then establishes the corresponding prediction model to obtain its prediction result.
• The results predicted by the above two models are taken as the input and the actual load data as the output to train the weights of the grey neural network. After the model training is completed, the test data are input to obtain the final prediction result.
In this paper, the grey neural network has a two-input, one-output structure, as shown in Fig. 6 [41]. The initial weights of the network are set following [43].
This chapter designs an ultra-short-term load forecasting method based on a combined model for the characteristics of power load data. The least squares support vector machine model and the Cubature Kalman filter model are selected to combine the lateral prediction and the longitudinal prediction of the load data. In addition, the grey neural network is used to combine the prediction results of the two algorithms, replacing the traditional linear weighting method to further improve the prediction accuracy. This paper also selects the clone immune algorithm to estimate the parameters of the load model. Compared with the traditional mathematical fitting method, the modeling accuracy is greatly improved, and the accuracy of the ultra-short-term load forecasting improves accordingly.
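The rearrangement of the load sequence into the n × 96 and 1 × m formats used in the workflow can be sketched as follows. This is a minimal illustration assuming a 15-minute sampling interval; the function name `arrange_load_data` and the choice to drop an incomplete trailing day are assumptions.

```python
import numpy as np

POINTS_PER_DAY = 96  # 24 hours at one sample every 15 minutes


def arrange_load_data(load, points_per_day=POINTS_PER_DAY):
    """Return the load as an (n x 96) day matrix and as a (1 x m) row vector."""
    load = np.asarray(load, dtype=float)
    n_days = load.size // points_per_day
    trimmed = load[: n_days * points_per_day]  # drop an incomplete trailing day
    by_day = trimmed.reshape(n_days, points_per_day)  # row = day, column = time of day
    as_row = trimmed.reshape(1, -1)  # full sequence in time order
    return by_day, as_row


if __name__ == "__main__":
    week = np.arange(7 * POINTS_PER_DAY)  # placeholder values for one week
    by_day, as_row = arrange_load_data(week)
    print(by_day.shape, as_row.shape)
```

In the day matrix, each column collects the load at the same time on different days, while the row vector keeps the consecutive points within each day next to each other.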
In summary, an improved two-way GABPNN model and a combined forecasting model are proposed. The main work is as follows. Two groups of GABP neural network models are established. The first group performs forward data completion: the model is trained with load data sorted in the forward direction, so that the change trend before the missing points is mined for forward completion. The second group performs reverse data completion: the model is trained with load data sorted in the reverse direction, so that the data after the missing points are mined for reverse completion.
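The preparation of forward-sorted and reverse-sorted training samples can be sketched as follows. This is a minimal illustration that assumes the forward model is trained on the samples before a gap and the reverse model on the samples after it, each predicting one point from a fixed number of neighbors; the window length and function names are hypothetical, and the networks themselves are not shown.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (inputs, targets): `window` values -> the next value."""
    series = np.asarray(series, dtype=float)
    count = series.size - window
    inputs = np.array([series[i : i + window] for i in range(count)])
    targets = series[window:]
    return inputs, targets


def two_way_training_sets(before_gap, after_gap, window=4):
    """Build forward samples from data before the gap and reverse samples after it."""
    forward = make_windows(before_gap, window)
    reversed_after = np.asarray(after_gap, dtype=float)[::-1]  # sort in reverse time
    backward = make_windows(reversed_after, window)
    return forward, backward


if __name__ == "__main__":
    before = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    after = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
    (fx, fy), (bx, by) = two_way_training_sets(before, after, window=2)
    print(fx[-1], fy[-1])  # last forward sample
    print(bx[-1], by[-1])  # last reverse sample, closest to the gap
```

A model trained on the reverse samples predicts toward earlier times, so applying it to the first points after a gap yields an estimate of the missing point from the opposite side.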
Compared with the traditional one-way completion model, the two-way model completes the data from both sides of the missing point. It takes the changes before and after the missing point into account and avoids the shortage of change information that arises in single-direction prediction. The model also improves the selection of the mutation rate in the genetic algorithm: a low mutation rate is adopted for populations with high fitness and a high mutation rate for populations with low fitness, so as to produce as many high-fitness populations as possible. The PSO-SVR algorithm is then used to fuse the two-way completion results, replacing the commonly used weighted average method and further improving the completion accuracy. In the combined model, the least squares support vector machine model and the Cubature Kalman filter model are selected to combine the lateral prediction and the longitudinal prediction of the load data. In addition, the grey neural network is used to combine the prediction results of the two algorithms, replacing the traditional linear weighting method to further improve the prediction accuracy.
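The fitness-dependent mutation rate can be sketched as follows. This is a minimal illustration that applies the rule per individual and interpolates linearly between two bounds; the bounds `low` and `high`, the linear form, and the function name are assumptions, since the exact rule is not restated here.

```python
import numpy as np


def adaptive_mutation_rates(fitness, low=0.01, high=0.2):
    """Map fitness to a mutation rate: highest fitness -> `low`, lowest -> `high`."""
    fitness = np.asarray(fitness, dtype=float)
    span = fitness.max() - fitness.min()
    if span == 0.0:  # all individuals are equally fit
        return np.full(fitness.shape, (low + high) / 2.0)
    normalized = (fitness - fitness.min()) / span  # 0 = least fit, 1 = fittest
    return high - (high - low) * normalized


if __name__ == "__main__":
    print(adaptive_mutation_rates([1.0, 2.0, 3.0]))
```

Keeping the rate low for fit individuals preserves good solutions, while the higher rate for unfit individuals keeps the search exploring.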

IV. CASE STUDY
In this section, actual load data are used to test the validity of the established data completion model and the ultra-short-term load forecasting model. Appropriate evaluation criteria are selected as the basis for judging effectiveness. Finally, conclusions are drawn by comparing the methods proposed in this paper with other methods. The data in this paper are provided by Hangzhou Zhongheng Cloud Power Internet Technology Co., Ltd. and consist of real historical load data from different industries in a region of Hangzhou, China.

A. MODEL ACCURACY EVALUATION INDEX
To verify whether the forecasting model is effective, it is necessary to use appropriate indicators for evaluation. This paper selects the mean absolute deviation (mean absolute error), which is the average of the absolute values of the deviations of the individual predicted values from the actual values. Compared with the average error, the mean absolute error avoids the cancellation of positive and negative errors, so it accurately reflects the actual prediction error, and it is the most widely used index for evaluating prediction models. The equation can be expressed as [44]:

$$\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|x_i - X_i\right|$$

In the equation, $x_i$ represents the predicted value, $X_i$ represents the actual value, and $N$ represents the amount of data.
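The evaluation index can be computed as in the following minimal sketch. The first function follows the definition above (predicted value, actual value, amount of data). The second function is an assumption: since the results in this section are reported in percent, it normalizes each error by the actual value, which may differ from the exact normalization used for the reported figures.

```python
import numpy as np


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all data points."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def mean_absolute_percentage_error(predicted, actual):
    """Assumed relative variant in percent; requires nonzero actual values."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(100.0 * np.mean(np.abs((predicted - actual) / actual)))


if __name__ == "__main__":
    predicted = [110.0, 90.0]
    actual = [100.0, 100.0]
    # The signed errors +10 and -10 cancel in a plain average but not here.
    print(mean_absolute_error(predicted, actual))
    print(mean_absolute_percentage_error(predicted, actual))
```

The example uses two errors of opposite sign to show the cancellation that the plain average error would suffer from.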

B. MISSING DATA COMPLETION RESULT
In this paper, we select two sets of load data, one week each from two different power users, with 672 load samples per set. Five groups of load data are manually deleted to generate missing points, and the BP neural network model and the two-way GABPNN model are then used to complete the data. The obtained results are compared with the actual values and the errors are calculated. Finally, the error rate is calculated to show that the two-way GABPNN model performs well. The completion results of the two methods are shown in Example 1 and Example 2, whose power users are a manual product manufacturing user and an electromechanical manufacturing user, respectively. Validating the model with load data from different industries shows that it is suitable for various secondary industries in the industrial field.
Example 1: The experimental object of this group is a manual product manufacturing user. The completion results of the two methods are shown in Table 1 and Table 2.
Example 2: The experimental object of this group is an electromechanical manufacturing user. The completion results of the two methods are shown in Table 3 and Table 4.
The average absolute error of the BP neural network completion result is 9.84%, and the average absolute error of the two-way GABPNN completion result is 5.96%. The error rate of the BP neural network is concentrated in the interval [−12.2%, 11.9%], with a maximum error of −12.2%. The error rate of the two-way GABPNN model lies in the interval [−8.7%, 9.4%], with a maximum error of 9.4%. These comparisons show that load data completion based on the two-way GABPNN model is more accurate than completion based on a simple BP neural network, and that the model is better at completing missing univariate data in load data processing. Solving this problem is very helpful for the ultra-short-term load forecasting that follows: when data are missing, the completion work can be carried out effectively and the prediction accuracy can be improved.

C. ULTRA-SHORT-TERM LOAD FORECAST RESULT
To verify the effect of the proposed combined forecasting method, the time series method commonly used for load forecasting is first applied to the load data of the electromechanical manufacturing power user and the manual product manufacturing power user, and the combined forecasting method proposed in this paper is then applied to the same data. The two sets of prediction results are compared with the actual data, the prediction errors of the two methods are calculated, and the load prediction curve and the error curve are plotted. The error is calculated with the formula in Section IV-A. On this basis, we judge whether the combined prediction method proposed in this paper is superior.
Example 1: The experimental object of this group is an electromechanical manufacturing user. The forecast results of the two methods and their errors are shown in Fig. 7 and Fig. 8. The figures show that the error of the time series method is distributed in the interval (−150, 150); the distribution is relatively uniform, but the error is large, and the calculated average absolute error is 15.81%. The error of the combined prediction method is distributed in the interval (−40, 80). Although the predicted value at one point shows a large deviation, the errors of the other points lie in the interval (−40, 20). Overall, the prediction error is smaller than that of the time series method, and the calculated average absolute error is 6.91%.
Example 2: The experimental object of this group is a manual product manufacturing user. The forecast results of the two methods and their errors are shown in Fig. 9 and Fig. 10. The figures show that the error of the time series method is distributed in the interval (−150, 150); the distribution is relatively uniform, but the error is large, and the calculated average absolute error is 17.10%. The error of the combined prediction model is distributed in the interval (−60, 80). Overall, the prediction error is smaller than that of the time series method, and the calculated average absolute error is 9.18%.
The above examples show that the results of the combined prediction model are greatly improved compared with the traditional method. The combined model unites the strong generalization ability of the least squares support vector machine with the real-time correction of the Cubature Kalman filter, which further narrows the error interval and concentrates the error distribution around zero. The model can also be applied to different industries and therefore has a certain universality. Users do not need to adjust the model before using it; they only need to model the load curve, after which the model can be put into use, which is more convenient and efficient.

V. CONCLUSION
Ultra-short-term load forecasting is a complicated task that demands high accuracy, and it faces problems such as missing data and difficulty in modeling. In this paper, ultra-short-term load forecasting is improved to address these problems. Firstly, for the data loss that may occur in load forecasting, the improved two-way GABPNN algorithm is used to complete the missing power load data. This method completes the data from both ends of the missing segment toward the missing point, avoiding the delay and the growing error caused by one-way completion. Experiments with artificially removed load data show that the model effectively improves the completion of missing load data. Secondly, because a single load forecasting model easily falls into a local optimum, a combined model of the least squares support vector machine and the Cubature Kalman filter is used, and the weight distribution between the two algorithms is determined by the grey neural network. The combination unites the strong generalization ability of the least squares support vector machine on the training data with the strong real-time performance of the Cubature Kalman filter. Finally, the example calculations show that the prediction results are greatly improved. Although the method proposed in this paper improves the accuracy of ultra-short-term load forecasting to a certain extent, there is still room for further improvement. The next step is to consider whether the calculated load characteristic indices can be used as variables that influence the load change, and to find an appropriate load forecasting method by mining the changes in the daily load characteristic indices.