An Adaptive Interval Construction Based GRU Model for Short-Term Wind Speed Interval Prediction Using Two Phase Search Strategy

The application of wind power is greatly restricted by the volatility and intermittency of wind, and quantifying the uncertainty of wind speed prediction is a challenging task. To tackle this challenge, an adaptive interval construction-based gated recurrent unit (GRU) model is proposed in this article for directly generating short-term wind speed prediction intervals, using a two phase search strategy to search the model parameters. Different from traditional interval prediction techniques, the proposed model uses an adaptive interval construction method, in which two interval width adjustment variables applied to the target wind speed values determine the lower and upper bounds of the interval. A two phase search strategy is designed to optimize the parameters. In Phase I, the dynamic inertia weight particle swarm optimization algorithm is used to search the two interval width adjustment variables. In Phase II, the GRU networks are trained using the root mean square prop (RMSProp) algorithm (an effective gradient-based optimizer) to fit the upper and lower bounds of the constructed intervals, respectively. The two phases are executed alternately to obtain optimal prediction intervals. The proposed method is compared with eight other machine learning and deep learning methods, and the experimental results show that it generates better prediction intervals than the compared methods.

$pFit_j$: Fitness function of $p_j$
$p_j$: Best position of jth particle
$p\omega^*_j$: Parameters of GRU network corresponding to $p_j$
$R_1$, $R_2$: Random numbers belonging to [0, 1]
$\eta$: Amplification factor
$\hat{U}_i$, $\hat{L}_i$: Upper and lower bound of ith constructed auxiliary interval
$u_i$, $l_i$: ith upper bound, ith lower bound
$\mu$: Nominal confidence level
$V_j$: Velocity of jth particle
$\omega^*$: Parameters of GRU network
$w_s$, $w_e$: Initial inertia factor, the inertia factor when $k = K$
$y_i$: ith wind speed value
$\gamma$: Two-value function depending on ICP and $\mu$

I. INTRODUCTION
Wind energy, as an emerging clean energy source, has been playing an increasingly important role in the electric power industry. However, the application of wind power is greatly restricted due to the volatility and intermittency of wind [1], [2]. Wind speed forecasting for wind farms is an indispensable technique for the development of wind power, the regulation of power systems, and the security and reliability of the power grid. The prediction of wind speed can be divided into ultra-short-term (several seconds to several minutes in advance), short-term (several minutes to several hours in advance), medium-term (several days to several months in advance), and long-term forecasting (several months to several years in advance). Wind speed in a relatively short period of time affects wind power generation. Therefore, accurate short-term wind speed prediction has important application value for mitigating the power grid fluctuation caused by wind instability [3], [4].
In recent years, point prediction methods for wind speed have been widely studied [5], [6], [7], [8]. In [9], Tian et al. proposed an effective wind speed combination forecasting model, which adopted the variational mode decomposition (VMD) algorithm to decompose the wind speed series and used echo state networks as the prediction models. The echo state networks were optimized by the improved whale optimization algorithm. In [10], an integrated multi-model fusion prediction model was proposed for multi-step wind speed forecasting by mixing empirical mode decomposition and local mean decomposition to process the wind speed, and the stochastic configuration network and the particle swarm optimization (PSO)-based support vector machine (SVM) were used to obtain the prediction results. Such methods produce deterministic prediction values of future wind speed. However, due to the random and fluctuating characteristics of wind, point prediction may not be accurate and cannot meet the needs of power grid dispatching [11]. Compared with point prediction methods, interval prediction methods can quantify the forecasting uncertainty and provide more valuable information by generating the upper and lower bounds of future forecasts with a certain confidence level, and have been extensively investigated for wind speed forecasting [12], [13].
The mainstream interval prediction methods fall into two classes: 1) indirect interval prediction methods and 2) direct interval prediction methods. The indirect interval prediction methods cannot produce the prediction intervals (PIs) directly, and usually realize the prediction tasks based on certain distribution assumptions, such as bootstrap methods [14], mean-variance estimation methods [15], Bayesian methods [16], kernel density estimation methods [17], and quantile regression methods [18]. The first three methods rely on prior assumptions to estimate the parameters of the data distribution (mean and variance) so as to construct PIs. However, the bootstrap methods adopt the resampling technique and consume a large amount of memory. The mean-variance estimation methods generate intervals with low coverage probability. The Bayesian methods usually require relatively higher computational cost and obtain lower-quality intervals [19]. The other two methods do not need the prior assumptions, but they depend on the performance of the point prediction models and have difficulty matching an appropriate data distribution [20].
Direct interval prediction methods generate the upper and lower bounds of PIs directly from prediction models and make no assumptions about the data distribution of wind speed. These methods usually adopt artificial neural networks (ANNs) with two outputs (lower and upper bound outputs) as the prediction model, and this framework is called lower upper bound estimation (LUBE) [21]. In this framework, a few indexes, such as interval coverage probability (ICP) and interval width (IW), are calculated to evaluate the quality of the generated PIs.
A high-quality interval should have higher ICP and smaller IW. In addition, other indexes that integrate ICP and IW, such as coverage width criterion (CWC) and its variants are also developed to comprehensively evaluate the quality of intervals [22], [23]. These objective functions are nondifferentiable and highly complex, thus meta-heuristic algorithms, such as PSO [24], genetic algorithm [25], and differential evolution [26], can be employed to search the optimal parameters, which treat the prediction tasks as single-objective optimization (SOO) or multi-objective optimization (MOO) by considering the number of objective functions [27], [28], [29].
Based on the LUBE framework, many artificial intelligence algorithms have been applied to produce PIs, including least squares SVM [30], extreme learning machine (ELM) [31], and wavelet neural network [32]. In [30], Li and Jin constructed a MOO framework to produce the wind speed intervals by a least squares SVM-based mixture model. Mahmoud et al. [31] realized an advanced ELM algorithm to construct PIs with multiple confidence levels, and a more comprehensive objective function considering the sharpness and coverage errors was developed. In [32], a wavelet neural network was designed to build a MOO model to generate the wind speed intervals, and the preference-inspired coevolutionary algorithm using goal vectors was adopted to search the Pareto optimal solutions. Deep learning is developing rapidly and has been applied to various industries [33], and some of its techniques, such as the temporal convolutional network (TCN) [34] and the recurrent neural network (RNN) [35], have also been employed for generating forecasting intervals of wind speed. In [34], Gan et al. developed a TCN method by constructing effective intervals, and the model was optimized via a designed gradient-based objective function. In [35], Shi et al. designed a new CWC index to realize the SOO task based on RNN and the dragonfly algorithm. In addition, as variants of RNN, the long short-term memory (LSTM) network [36] and the gated recurrent unit (GRU) network [37] have also been studied to forecast the intervals of wind speed and obtained more satisfactory results. For example, Saeed et al. [36] built a bidirectional LSTM network as the feature extractor and predictor to produce the PIs. Li et al. [37] adopted a presupposed intervals-based framework to forecast the intervals by applying GRU networks and error correction theory. The GRU network has fewer parameters than LSTM, thus its optimization speed is faster.
Compared with shallow machine learning algorithms, deep learning techniques can usually obtain better prediction performance, but their large number of parameters is a disadvantage for meta-heuristic algorithms. Therefore, some specific cost functions were designed and gradient descent-based algorithms were advocated in [36] and [37] for wind speed interval prediction.
In order to enhance the quality of PIs, many hybrid models have also been extensively investigated in recent years. In [38], Zhang et al. presented a VMD-based hybrid framework to perform the point and interval prediction. The framework combined multiple basic models to train the subseries, and adopted the quantile regression averaging method to produce the uncertain intervals. In [39], Wang et al. utilized VMD and GRU networks to forecast the intervals of wind power, where an Adam optimization algorithm was adopted to fit the constructed intervals for each subseries. In [40], a multiple kernel robust ridge regression was applied to implement the interval prediction of wind speed and wind power by training the modes decomposed by VMD, and the chaotic water cycle algorithm was designed to perform the MOO task. Compared with simple models, hybrid models can generate high-quality PIs by making use of the virtues of multiple models, but their computational cost is also large and their programming is more complex.
Although direct interval prediction methods have been widely applied in interval prediction, these methods have two shortcomings:
1) The objective functions are nondifferentiable and the label information is inadequate, meaning that traditional gradient-based methods such as root mean square prop (RMSProp) may not work, whereas meta-heuristic algorithms can be used to realize the optimization procedure.
2) The prediction models, such as ANN, LSTM, and GRU, often involve a great number of weight and bias parameters, and it is an arduous task to find satisfactory solutions using meta-heuristic algorithms.
To achieve effective interval forecasting for short-term wind speed, an adaptive interval construction-based GRU model (AIC-GRU) is proposed in this article, using a two phase search strategy to search the model parameters. In this framework, the upper and lower bounds of the interval are adaptively constructed by increasing the upper bound or decreasing the lower bound through the adjustment of two key variables (details are given in Section III-B). Once the upper and lower bounds of the target intervals are presupposed, the label information can be directly utilized by a gradient descent-based optimization algorithm. Note that in traditional LUBE methods, evolutionary algorithms are usually adopted to solve the associated optimization problems (i.e., to optimize PIs based on nondifferentiable evaluation criteria). However, in the present study, due to the introduction of the two key variables, both evolutionary algorithms and gradient-based methods can be adopted in the LUBE methods. To optimize the proposed AIC-GRU model and obtain satisfactory forecasting intervals, a two phase search strategy is designed. In Phase I, the dynamic inertia weight PSO (DIWPSO) algorithm is adopted to search for appropriate interval width adjustment variables to adjust the auxiliary intervals. In Phase II, the GRU networks are trained using the RMSProp algorithm to fit the presupposed upper and lower bounds of the target intervals; this is a supervised learning problem. The RMSProp algorithm is an effective gradient descent algorithm that addresses a weakness of the adaptive gradient algorithm, whose learning rate decays too quickly. Additionally, the RMSProp algorithm can optimize the parameters of GRU networks well.
After the model optimization, the coverage probability and width of PIs are calculated to evaluate the quality of PIs. The two phases are executed alternately until a good optimization accuracy is reached.
Based on the above discussion, the novelty and main contributions of this work are summarized as follows:
1) An adaptive interval construction-based GRU model (i.e., AIC-GRU) is proposed to directly generate forecasting intervals for short-term wind speed.
2) To form effective label information for the upper and lower bounds of wind speed intervals, an adaptive auxiliary interval construction method is designed by introducing two interval width adjustment variables.
3) A two phase search strategy is designed to optimize the interval forecasting model based on the DIWPSO algorithm.
4) For generating high-quality PIs, multi-output GRU networks are established to conduct supervised learning using the constructed intervals.
The remainder of this article is arranged as follows. The definition of PIs for wind speed and the traditional LUBE framework are introduced in Section II. The proposed AIC-GRU model is presented in Section III. Section IV presents several numerical case studies to verify the effectiveness of the proposed method. The conclusion is given in Section V.

A. DEFINITION OF PIS FOR WIND SPEED
Given a sample set of wind speed time series $\{y_i\}$ $(i = 1, 2, \ldots, n)$, the task of interval prediction is to find intervals with upper bounds $\{u_i\}_{i=1}^{n}$ and lower bounds $\{l_i\}_{i=1}^{n}$ that cover the target outputs with a certain confidence level $(1-\alpha)$. As mentioned in the previous section, the LUBE framework has been proposed and studied in recent years. Correspondingly, some cost functions that are suitable for interval analysis have also been developed, such as ICP, interval normalized average width (INAW), and CWC [21], [22], which are introduced below.
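The coverage requirement in the definition of PIs above is commonly stated as a probability condition. The form below is a sketch assembled from the symbols defined in this subsection ($l_i$, $u_i$, $y_i$, and the confidence level $1-\alpha$); treating the bounds as a closed interval is an assumption.

```latex
P\left(l_i \le y_i \le u_i\right) = 1 - \alpha, \qquad i = 1, 2, \ldots, n
```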

B. EVALUATION INDEXES OF PIS FOR WIND SPEED
1) INTERVAL COVERAGE PROBABILITY (ICP)
ICP is designed to calculate the probability that the PIs cover the target values; it is a key evaluation index used to measure the confidence level. Mathematically, ICP is the average over the $n$ samples of a two-value function $e_i$: for a specific individual value $y_i$, if the interval covers the target value, $e_i$ is 1; otherwise, $e_i$ is 0. Under ideal conditions, the overall ICP reaches 100% if all the individual intervals cover their corresponding target values.
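With $e_i$ as described above, ICP can be written as the sample mean of the coverage indicator. The expressions below are a sketch consistent with that description; counting the interval endpoints as covered is an assumption.

```latex
\mathrm{ICP} = \frac{1}{n}\sum_{i=1}^{n} e_i, \qquad
e_i = \begin{cases} 1, & l_i \le y_i \le u_i \\ 0, & \text{otherwise} \end{cases}
```

As an illustration, on a test set of $n = 288$ points (the test set size used in the case studies), an ICP of 93.40% is consistent with 269 covered targets, since $269/288 \approx 0.9340$.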

2) INTERVAL NORMALIZED AVERAGE WIDTH (INAW)
Note that a higher ICP usually leads to wider PIs. However, PIs that are too wide become less informative and less useful. Therefore, an index that measures the width of PIs, INAW, is used; it is the average interval width normalized by $R$, where $R$ denotes the difference between the maximum and minimum of the target values. INAW measures how narrow the PIs are. The narrower the PIs, the smaller the ICP may be, so the PIs can only be narrowed until the ICP approaches the specified confidence level.
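Following the name of the index and the definition of $R$ above, INAW can be sketched as the mean interval width divided by the range of the targets. The exact typography of the original equation is not reproduced here; the expression is assembled from the symbols already defined.

```latex
\mathrm{INAW} = \frac{1}{nR}\sum_{i=1}^{n}\left(u_i - l_i\right), \qquad
R = \max_{i} y_i - \min_{i} y_i
```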

3) COVERAGE WIDTH CRITERION (CWC)
CWC is designed as a comprehensive index for simultaneously evaluating the coverage probability and width of PIs by integrating ICP and INAW. The smaller the CWC value, the higher the quality of PIs. In the CWC equation, $\gamma$ is fixed to 1 during training. During testing, $\gamma$ is a two-value function: when ICP is larger than $\mu$, $\gamma$ is 0; otherwise, $\gamma$ is 1. The role of $\gamma$ is as follows: when the coverage probability of PIs is larger than the confidence level $\mu$, the quality of PIs depends mainly on the INAW; otherwise, CWC is determined by ICP and INAW concurrently. $\eta$ and $\mu$ are used to constrain the optimization procedure, where $\eta$ is usually set to 50, and $\mu$ represents the nominal confidence level $(1-\alpha)$. Using CWC as the objective function, the task is usually regarded as an SOO process.
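A sketch of the CWC expression in the form commonly used with the LUBE framework is given below. It matches the roles of $\gamma$, $\eta$, and $\mu$ described above and gives CWC = INAW whenever $\gamma = 0$. The labels (4a) and (4b) are inferred from the later reference to those equations, and treating ICP $= \mu$ as meeting the confidence level is an assumption.

```latex
\begin{align}
\mathrm{CWC} &= \mathrm{INAW}\left(1 + \gamma(\mathrm{ICP})\, e^{-\eta\left(\mathrm{ICP} - \mu\right)}\right) \tag{4a} \\
\gamma(\mathrm{ICP}) &= \begin{cases} 0, & \mathrm{ICP} \ge \mu \\ 1, & \mathrm{ICP} < \mu \end{cases} \tag{4b}
\end{align}
```

For instance, with $\eta = 50$ and a hypothetical $\mu = 0.90$, an ICP of 0.88 gives $e^{-50(0.88 - 0.90)} = e^{1} \approx 2.718$, so CWC $\approx 3.718 \times$ INAW. A coverage shortfall of only two percentage points is therefore penalized heavily.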

C. INTERVAL PREDICTION BASED ON LUBE FRAMEWORK
The LUBE framework was designed by Khosravi et al. [21] to directly generate the PIs without assumptions about the data distribution. In the LUBE framework, a simple ANN can be built as the interval predictor. The LUBE framework based on a simple ANN is depicted in Fig. 1. It comprises a three-layer ANN model, including an input layer, a hidden layer, and an output layer; the output layer has two nodes, representing the upper and lower bounds of PIs, respectively. From an optimization perspective, the interval prediction task based on the LUBE framework can be described as a constrained SOO process based on the ICP and INAW, which requires that INAW be as small as possible while ICP meets the nominal confidence level. In this SOO task, $\omega$ is the parameter vector of the prediction model, including weights and biases. These objective functions are nondifferentiable, thus gradient descent-based algorithms cannot be directly used to optimize the model. Instead, meta-heuristic algorithms are usually used in the LUBE framework. In this optimization process, ICP and INAW are calculated in each iteration until a satisfactory INAW is obtained.
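The evaluation indexes and the constrained view of the LUBE task can be illustrated with a short NumPy sketch. The function names, the closed-interval coverage test, and the `training` flag are assumptions introduced here for illustration; the CWC form is the commonly used INAW · (1 + γ · exp(−η(ICP − μ))) and is not taken verbatim from the article.

```python
import numpy as np

def icp(y, lower, upper):
    """Fraction of targets inside [lower, upper] (endpoints count as covered)."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def inaw(y, lower, upper):
    """Average interval width divided by the range R of the targets."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    r = float(np.max(y) - np.min(y))
    return float(np.mean(upper - lower) / r)

def cwc(y, lower, upper, mu, eta=50.0, training=False):
    """Assumed form: CWC = INAW * (1 + gamma * exp(-eta * (ICP - mu)))."""
    p = icp(y, lower, upper)
    # gamma is fixed to 1 in training; in testing it is 0 once ICP reaches mu
    gamma = 1.0 if (training or p < mu) else 0.0
    return float(inaw(y, lower, upper) * (1.0 + gamma * np.exp(-eta * (p - mu))))

def lube_objective(y, lower, upper, mu):
    """Constrained SOO view: minimise INAW while requiring ICP >= mu."""
    return inaw(y, lower, upper), bool(icp(y, lower, upper) >= mu)

# Toy data: four targets, unit-width intervals, the last one misses its target.
y = np.array([4.0, 5.0, 6.0, 8.0])
lower = y - 0.5
upper = y + 0.5
lower[3], upper[3] = 8.2, 9.2

print(icp(y, lower, upper))    # 0.75
print(inaw(y, lower, upper))   # 0.25
print(lube_objective(y, lower, upper, mu=0.9))
```

Because the indicator inside `icp` is a step function of the bounds, its gradient is zero almost everywhere, which is why the text resorts to meta-heuristic search for this objective.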

III. PROPOSED AIC-GRU METHOD FOR SHORT-TERM WIND SPEED INTERVAL PREDICTION
A. PROPOSED AIC-GRU METHOD FOR SHORT-TERM WIND SPEED INTERVAL PREDICTION
In the classic LUBE framework, most model parameters need to be adjusted by meta-heuristic algorithms. In many real applications, the computational cost is enormous and the quality of the optimized parameters is unsatisfactory. To improve the quality of wind speed PIs, an adaptive interval construction-based GRU (i.e., AIC-GRU) interval prediction method is proposed in this article. This method consists of three parts: 1) the construction of the auxiliary intervals based on the interval width adjustment variables, 2) the design of the GRU-based interval prediction model, and 3) the two phase search strategy based on the DIWPSO algorithm. Fig. 2 shows the basic framework of the proposed AIC-GRU method, and the details are given as follows.

B. CONSTRUCTION OF ADAPTIVE AUXILIARY INTERVALS
Different from point prediction, interval prediction does not provide label information that can be used to determine the upper and lower bounds, and therefore gradient descent-based algorithms, such as the RMSProp algorithm, cannot be directly used to optimize the associated objective functions. In this article, an adaptive auxiliary interval construction method is designed to adaptively obtain the presupposed upper and lower bounds. For the PIs, the ideal upper and lower bounds should be as close as possible to the target values $y_i$, and cover all the target values. Therefore, the presupposed upper and lower bounds can be calculated by adding a width adjustment variable to, or subtracting one from, the target values. If the width adjustment variables are small enough, the specified intervals can be predicted by learning the presupposed auxiliary intervals. In this construction, $\hat{U}_i$ and $\hat{L}_i$ represent the upper and lower bounds of the constructed auxiliary intervals, respectively, and $B_u$ and $B_l$ are the upper bound width adjustment variable and the lower bound width adjustment variable, respectively.
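The construction described above amounts to shifting the targets by two constants, i.e., $\hat{U}_i = y_i + B_u$ and $\hat{L}_i = y_i - B_l$. A minimal sketch follows; the function name and the non-negativity check are assumptions made for illustration.

```python
import numpy as np

def build_auxiliary_intervals(y, b_u, b_l):
    """Return (U_hat, L_hat) with U_hat = y + b_u and L_hat = y - b_l."""
    y = np.asarray(y, dtype=float)
    if b_u < 0 or b_l < 0:
        # Assumed constraint: negative offsets would leave targets uncovered.
        raise ValueError("width adjustment variables must be non-negative")
    return y + b_u, y - b_l

# Example with normalised wind speeds and hypothetical offsets.
y_demo = np.array([0.20, 0.50, 0.90])
u_hat, l_hat = build_auxiliary_intervals(y_demo, b_u=0.05, b_l=0.03)
print(u_hat)          # upper labels for supervised training
print(l_hat)          # lower labels for supervised training
print(u_hat - l_hat)  # constant width b_u + b_l
```

Under this reading every auxiliary interval has the same width $B_u + B_l$ and always contains its target, so shrinking the two variables narrows the labels while the fitted network, not the labels, determines the final coverage.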

C. GRU-BASED INTERVAL PREDICTION MODEL
The upper and lower bounds of the auxiliary intervals can be used as label information of the predicted intervals, and the GRU networks are then built as the interval predictor to fit the upper and lower bounds of the auxiliary intervals by adopting the RMSProp algorithm. The GRU network is designed to capture the nonlinear relationship hidden in the time series. With only two gates, i.e., the update gate and the reset gate, the GRU network is easier and cheaper to implement and can obtain better results in certain cases than an LSTM network [41]. The basic structure of the GRU network is shown in Fig. 3. For input $x_t$ at time $t$, the update gate $z_t$ extracts the information of the hidden state $h_{t-1}$ at time $t-1$ to update the hidden state $h_t$. The reset gate controls the candidate state $\tilde{h}_t$ by resetting $h_{t-1}$. In the calculation process, $W_z$, $W_r$, and $W_{\tilde{h}}$ denote the weights of the update gate, reset gate, and candidate state, respectively. In order to realize the interval prediction, the outputs of the GRU networks are set to two units, representing the upper and lower bounds, respectively. The basic framework is shown in Fig. 4, which consists of three parts: 1) input layer, 2) GRU layer, and 3) fully connected layer. The ith input sample $x_i = (x_{i1}, x_{i2}, \cdots, x_{it})$ is transformed into abstract features $h_t$ by the GRU layer, and then the fully connected layer maps $h_t$ to the two outputs, i.e., the predicted upper and lower bounds. The optimization task is to minimize the difference between the predicted intervals and the auxiliary intervals using the RMSProp algorithm, and the objective function measures this difference.
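The gate computations and the two-output head can be sketched in NumPy as follows. This uses one widely used bias-free GRU formulation with weights $W_z$, $W_r$, and $W_{\tilde{h}}$; the exact variant in the article (bias terms, gate convention), feeding the window one value per time step, and the mean squared error loss are all assumptions made here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step (assumed bias-free form):
    z_t  = sigmoid(W_z [h_{t-1}, x_t])        update gate
    r_t  = sigmoid(W_r [h_{t-1}, x_t])        reset gate
    h~_t = tanh(W_h [r_t * h_{t-1}, x_t])     candidate state
    h_t  = (1 - z_t) * h_{t-1} + z_t * h~_t
    """
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)
    r = sigmoid(W_r @ hx)
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    return (1.0 - z) * h_prev + z * h_cand

def predict_interval(x_seq, params):
    """Run the GRU over one input window, then map the last state to two outputs."""
    W_z, W_r, W_h, W_out, b_out = params
    h = np.zeros(W_z.shape[0])
    for x_t in x_seq:
        h = gru_step(np.atleast_1d(x_t), h, W_z, W_r, W_h)
    upper, lower = W_out @ h + b_out   # fully connected layer with two units
    return upper, lower

def interval_mse(pred_upper, pred_lower, aux_upper, aux_lower):
    """Assumed objective: mean squared error to the auxiliary bounds."""
    du = np.asarray(pred_upper) - np.asarray(aux_upper)
    dl = np.asarray(pred_lower) - np.asarray(aux_lower)
    return float(0.5 * (np.mean(du ** 2) + np.mean(dl ** 2)))

rng = np.random.default_rng(0)
n_hidden, n_feat = 12, 1               # 12 hidden units, one feature per step
shape = (n_hidden, n_hidden + n_feat)
params = (rng.normal(0, 0.1, shape), rng.normal(0, 0.1, shape),
          rng.normal(0, 0.1, shape), rng.normal(0, 0.1, (2, n_hidden)), np.zeros(2))
window = rng.uniform(0, 1, 6)          # six past (normalised) wind speeds
u_pred, l_pred = predict_interval(window, params)
print(interval_mse(u_pred, l_pred, window[-1] + 0.05, window[-1] - 0.03))
```

The weights here are random, so the printed loss is only a starting value; with untrained weights the predicted upper bound is not even guaranteed to exceed the lower one. Training against the auxiliary labels is what orders and tightens the two outputs.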

D. TWO PHASE SEARCH STRATEGY BASED ON DIWPSO ALGORITHM
To ensure that the prediction model generates high-quality intervals, the constructed intervals should be as close as possible to the target values. This means that the two interval width adjustment variables need to be optimized to narrow the auxiliary intervals. To this end, a two phase search strategy based on the DIWPSO algorithm is designed to optimize the two interval width adjustment variables and the parameters of the GRU networks.

1) DIWPSO ALGORITHM
The DIWPSO algorithm is developed on the basis of the classic PSO algorithm, a meta-heuristic that mimics the swarm behaviour of bird flocks and fish schools to search for global optimal solutions. The position of each particle is updated iteratively to track its own best position and the global best position of the whole swarm [42], [43]. For the jth particle, its velocity in the kth iteration is $V_j^k = (V_{j1}^k, V_{j2}^k, \cdots, V_{jD}^k)$, and the corresponding position is $\beta_j^k = (\beta_{j1}^k, \beta_{j2}^k, \cdots, \beta_{jD}^k)$, where $D$ represents the number of parameters to be optimized. In the update rules, $a_1$ and $a_2$ denote the acceleration coefficients, and $R_1$ and $R_2$ denote random numbers belonging to [0, 1]. $p_j$ and $g$ are the best positions of the jth particle and of all the particles, respectively. $w$ represents the inertia factor and is adjusted by a linear decreasing method, $w = w_s - (w_s - w_e) \cdot k / K$, where $w_s$ denotes the initial inertia factor, $K$ is the total number of iterations, and $w_e$ represents the inertia factor when $k = K$. The dynamic inertia factor can improve the search ability of the DIWPSO algorithm.
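For reference, the standard PSO update rules expressed with the symbols above are sketched below. The labels (12) and (13) are inferred from the references to those equations in the implementation steps, and the inertia schedule is the one that appears in Algorithm 1. Whether $R_1$ and $R_2$ are drawn per particle or per dimension is not specified and is left open here.

```latex
\begin{align}
V_j^{k+1} &= w\,V_j^{k} + a_1 R_1 \left(p_j - \beta_j^{k}\right) + a_2 R_2 \left(g - \beta_j^{k}\right) \tag{12} \\
\beta_j^{k+1} &= \beta_j^{k} + V_j^{k+1} \tag{13} \\
w &= w_s - \left(w_s - w_e\right)\frac{k}{K} \notag
\end{align}
```

As a numerical illustration with hypothetical settings $w_s = 0.9$, $w_e = 0.4$, and $K = 50$, the inertia factor at $k = 25$ is $0.9 - 0.5 \times 0.5 = 0.65$. Early iterations thus favour exploration and later ones favour refinement around $p_j$ and $g$.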

2) OPTIMIZATION PROCESS OF THE PROPOSED TWO PHASE SEARCH STRATEGY BASED ON DIWPSO ALGORITHM
Let $\beta = (B_u, B_l)$ represent the two interval width adjustment variables searched by the DIWPSO algorithm. The position of the jth particle is set as $\beta_j = (B_{ju}, B_{jl})$ and is updated using (12) and (13) in Phase I, and the corresponding PIs are generated using GRU networks trained by the RMSProp algorithm in Phase II. Then the value of the fitness function is calculated to update $p_j$ and $g$. These two phases are performed alternately until the termination conditions are met. Based on the two phase search strategy, the optimization procedure can be regarded as the constrained SOO task (15), in which $\omega^*$ represents the weight and bias parameters of the GRU networks. Note that the parameters of the GRU model are optimized by the RMSProp algorithm, so the number of parameters to be optimized with the DIWPSO algorithm in the proposed method is only two. It is therefore much easier to find the optimal solution via the DIWPSO algorithm. Following the suggestion on dealing with constrained optimization problems in [44], a penalty is added to INAW if the constraint is violated. Therefore, (15) can be rewritten in penalized form, where PE is a penalty and $H(\beta, \omega^*)$ represents the penalty coefficient. The implementation procedure of the two phase search strategy is summarized below:
Step 1: Data preprocessing and split. The wind speed data set is preprocessed and normalized into [0, 1]. Then the training and test sets are split.
Step 2: Initialize all the parameters of the GRU networks and the DIWPSO algorithm, including $\omega^*$, $\beta_j$, $V_j$, $p_j$, $g$, $pFit_j$, $gFit$, $p\omega^*_j$, and $g\omega^*$, where $pFit_j$ and $gFit$ represent the fitness functions of $p_j$ and $g$, respectively. $p\omega^*_j$ and $g\omega^*$ are the parameters of the GRU networks corresponding to $p_j$ and $g$, respectively.
Step 3: Perform the two phase search strategy.
Step 3.1: In Phase I, update $\beta_j$ and $V_j$ using (12) and (13), and obtain two new interval width adjustment variables.
Step 3.3: In Phase II, train the multi-output GRU networks to fit the upper and lower bounds of the auxiliary intervals for the jth particle based on the RMSProp algorithm, and obtain the PIs and the parameters of the GRU networks $\omega^*_j$.
Step 3.6: Steps 3.1-3.5 are performed repeatedly for each particle until the termination conditions are satisfied. $g\omega^*$ is saved as the parameters of the prediction model.
Step 4: Test the quality of PIs. The GRU networks are constructed using the optimized $g\omega^*$ to predict the intervals in the test set, and then the evaluation indexes are calculated to measure the quality of the PIs.
The pseudo code of the two phase search strategy is given in Algorithm 1.
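The alternation of the two phases can be sketched end to end on synthetic data. To keep the example short and fast, a linear two-output model trained with RMSProp stands in for the GRU networks, which is a deliberate simplification. The fitness INAW + H · PE with H = 1 when ICP < μ, all hyperparameter values, and the clipping of positions to [0, 1] are assumptions, not settings from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def icp(y, lo, up):
    return np.mean((y >= lo) & (y <= up))

def inaw(y, lo, up):
    return np.mean(up - lo) / (y.max() - y.min())

def train_rmsprop(X, targets, epochs=200, lr=0.01, rho=0.9, eps=1e-8):
    """Phase II stand-in: fit a linear two-output model with RMSProp on MSE."""
    Xb = np.hstack([X, np.ones((len(X), 1))])           # append bias column
    W = np.zeros((Xb.shape[1], 2))
    cache = np.zeros_like(W)
    for _ in range(epochs):
        grad = 2.0 * Xb.T @ (Xb @ W - targets) / len(X)
        cache = rho * cache + (1.0 - rho) * grad ** 2   # running mean of squared gradients
        W -= lr * grad / (np.sqrt(cache) + eps)
    return W

def fitness(beta, X, y, mu, penalty):
    """Build auxiliary labels from beta = (B_u, B_l), train, then score the PIs."""
    targets = np.column_stack([y + beta[0], y - beta[1]])
    W = train_rmsprop(X, targets)
    pred = np.hstack([X, np.ones((len(X), 1))]) @ W
    up, lo = pred[:, 0], pred[:, 1]
    h = 0.0 if icp(y, lo, up) >= mu else 1.0            # assumed penalty coefficient
    return inaw(y, lo, up) + h * penalty, W

def two_phase_search(X, y, mu=0.9, n_particles=6, K=10, w_s=0.9, w_e=0.4,
                     a1=2.0, a2=2.0, penalty=10.0):
    beta = rng.uniform(0.0, 0.5, (n_particles, 2))      # particle positions (B_u, B_l)
    vel = np.zeros_like(beta)
    p_best, p_fit = beta.copy(), np.full(n_particles, np.inf)
    g_best, g_fit, g_W = beta[0].copy(), np.inf, None
    for k in range(1, K + 1):
        w = w_s - (w_s - w_e) * k / K                   # linearly decreasing inertia
        for j in range(n_particles):
            # Phase I: move the particle in (B_u, B_l) space
            r1, r2 = rng.random(2)
            vel[j] = (w * vel[j] + a1 * r1 * (p_best[j] - beta[j])
                      + a2 * r2 * (g_best - beta[j]))
            beta[j] = np.clip(beta[j] + vel[j], 0.0, 1.0)
            # Phase II: fit the bounds and evaluate the resulting PIs
            fit, W = fitness(beta[j], X, y, mu, penalty)
            if fit < p_fit[j]:
                p_best[j], p_fit[j] = beta[j].copy(), fit
            if fit < g_fit:
                g_best, g_fit, g_W = beta[j].copy(), fit, W
    return g_best, g_fit, g_W

# Synthetic, normalised series standing in for wind speed data.
t = np.arange(300)
series = 0.5 + 0.3 * np.sin(t / 20.0) + 0.05 * rng.standard_normal(300)
series = (series - series.min()) / (series.max() - series.min())
n_in = 6
X = np.array([series[i:i + n_in] for i in range(len(series) - n_in)])
y = series[n_in:]
best_beta, best_fit, best_W = two_phase_search(X, y)
print(best_beta, best_fit)
```

The outer search only ever handles two numbers per particle, while the gradient-based inner fit handles all model weights. This is the division of labour the section relies on to keep the meta-heuristic search tractable.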

A. DATA PREPARATION
To verify the performance of the proposed AIC-GRU method, two wind speed data sets were collected from a real-world wind farm for the case studies, which are labeled as D1 and D2, respectively. The units of these wind speed data are m/s. The two data sets each have 2304 data points, and their sampling intervals are 10 minutes and 1 hour, respectively. To show the differences between the two data sets, the descriptive statistics, including mean value (MEAN), standard deviation (STDEV), minimum value (MIN), 25th percentile (25%), 50th percentile (50%), 75th percentile (75%), and maximum value (MAX), are given in Table 1. In these case studies, for each of the two data sets, the first 2016 data points are used as the training set, and the last 288 data points are used as the test set. Based on the two data sets, single-step and multi-step prediction experiments for short-term wind speed interval prediction are conducted. All the experiments are performed in Anaconda3 (64-bit) on a computer with an Intel Core i5-9400F CPU running at a 2.9 GHz clock speed.
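The data preparation can be sketched as follows, using a synthetic series in place of D1 or D2, which are not reproduced here. Normalizing before splitting follows the order given in Step 1 of the implementation procedure. The function names, the sliding-window layout, and the window length of six (the value later selected for D1) are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(series):
    """Scale a series into [0, 1]; also return (min, max) for later inversion."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), (lo, hi)

def split_train_test(series, n_train=2016):
    """First n_train points for training, the remainder for testing."""
    return series[:n_train], series[n_train:]

def make_windows(series, n_in, horizon=1):
    """Inputs are n_in consecutive values; the target lies `horizon` steps after them."""
    X, y = [], []
    for start in range(len(series) - n_in - horizon + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in + horizon - 1])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 20.0, 2304)      # synthetic stand-in for wind speed in m/s
scaled, (lo, hi) = min_max_normalize(raw)
train, test = split_train_test(scaled)
X_train, y_train = make_windows(train, n_in=6)
print(len(train), len(test), X_train.shape)
```

Setting `horizon` above 1 gives one simple way to form targets for the multi-step experiments, although the article's own multi-step scheme may differ.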

B. PARAMETER SETTINGS AND COMPARISON MODELS
To obtain high-quality PIs, it is essential to select appropriate structure parameters of the GRU networks, including the number of input and hidden layer neurons (the number of hidden layers is set to 1). Considering the computational cost, the number of input layer neurons is chosen from {1, 2, 3,

Algorithm 1: The Two Phase Search Strategy for the Proposed AIC-GRU Method
Initialize the parameters of GRU networks and DIWPSO algorithm.
for k in K:
    w ← w_s − (w_s − w_e) · k / K;
    for j in particles:

C. SINGLE-STEP PREDICTION VALIDATION
To test the performance of the proposed method, single-step prediction experiments based on the two data sets were performed to generate the prediction interval of the next moment.

1) EXPERIMENT SETTINGS
As introduced in Section IV-B, the number of input and hidden layer neurons of the GRU networks needs to be selected. In this study, two groups of experiments based on the two data sets are conducted to search for the optimal values of the two parameters for the GRU networks. The comprehensive index, CWC, is calculated to evaluate the prediction performance. As an example, the results based on the data set D1 are shown in Fig. 5, from which it can be seen that CWC has its smallest value when the numbers of input and hidden layer neurons are 6 and 12, respectively. Therefore, these two numbers are chosen as the corresponding structure parameters, and six historical wind speed observations are used as input features. Through repeated tests, the parameter values of the proposed model based on the two data sets (D1 and D2) are given in Table 2.
In addition, to evaluate the performance of the proposed method more fairly, eight comparison methods, MBB, ANN-CWC, ANN-INAW, ELM, GRU, TCN, AIC-ELM, and AIC-BP, are also used to generate PIs. The partial parameter values of these comparison methods are given in Table 3, and the other parameters are the same as those of the proposed method. All the experiments were repeated 20 times; the means (MEAN) and standard deviations (STDEV) of the evaluation indexes, ICP, INAW, and CWC, are given in Table 4. According to the three evaluation indexes, for a high-quality prediction interval, its ICP value should meet the nominal confidence level $\mu$, and its INAW and CWC values should be as small as possible. It is noteworthy that, following (4a) and (4b), when the ICP value is larger than the nominal confidence level $\mu$, the values of INAW and CWC are equal. In Table 4, the training time and test time of each experiment are also provided. The values of the three evaluation indexes based on the two data sets, D1 and D2, are drawn in Fig. 6(a) and (b), respectively.

2) RESULTS AND DISCUSSION
The following observations are made based on the experimental results:
a) The ICP value of the proposed method reaches 92.97%, which shows that the proposed method performs better in meeting the predetermined nominal confidence level $\mu$.
b) Compared with the MBB method, the proposed AIC-GRU method clearly obtains smaller INAW and CWC values. The MBB method calculates the mean and variance of the wind speed series by repeated sampling. However, the method needs to assume that the data follow a certain distribution, which is usually inaccurate for wind speed series. That is why the PIs of the MBB method are wider, although it has a high ICP value. Wider PIs tend to convey less information.
c) Compared with the classic LUBE framework, the proposed AIC-GRU method can generate higher-quality PIs. From Table 4 and Fig. 6, the LUBE-based methods, ANN-CWC, ANN-INAW, ELM, GRU, and TCN, have larger INAW values, i.e., they generate wider PIs. Therefore, their CWC values are also larger. In addition, the CWC value of the ANN-CWC method is larger than that of the ANN-INAW method, indicating that the constrained SOO method is more suitable for solving the interval prediction problem. Numerically, for the prediction results based on the data set D1, the CWC values of the adaptive interval construction-based methods are 0.0582, 0.0608, and 0.0542, which are at least 53.48% smaller than those of the LUBE-based methods. The CWC value of the proposed method is 58.53%-72.25% smaller than those of the LUBE-based methods. Correspondingly, for the prediction results based on the data set D2, the CWC value of the proposed method is 84.68%-91.62% smaller than those of the LUBE-based methods. It is clear that the proposed method has better prediction ability. In addition, the TCN and GRU methods, as deep learning models, have been widely applied in wind speed interval prediction, but they have a large number of parameters that are difficult to optimize with meta-heuristic algorithms in the LUBE framework, which gives them worse ICP and INAW values than the proposed method. In contrast, the adaptive interval construction-based methods directly optimize the model parameters via gradient descent-based optimization algorithms, which are more suitable than meta-heuristic algorithms for dealing with a large-scale optimization problem where the objective function is differentiable.
d) Compared with the other adaptive interval construction-based methods (i.e., AIC-ELM and AIC-BP), the CWC value of the proposed method on the data set D1 is 6.87% and 10.86% smaller than those of AIC-ELM and AIC-BP, respectively. Correspondingly, the CWC value of the proposed method on the data set D2 is 8.05% and 5.29% smaller than those of AIC-ELM and AIC-BP, respectively. For both data sets, the proposed method performs better than the other two methods. This is mainly because GRU networks have a stronger ability to learn nonlinear time series and can capture the nonlinear relationships in wind speed series. Compared with ELM and BP, GRU networks can more accurately fit the upper and lower bounds of the presupposed auxiliary intervals in each iteration, so as to generate higher-quality PIs.
e) From Table 4, among the methods that meet the nominal confidence level, the standard deviation of CWC for the proposed method is the smallest on both data sets, which indicates that the proposed method has better stability.
f) It can also be seen from Table 4 and Fig. 6 that the PIs generated by these methods on the data set D2 are worse than those generated on the data set D1 (their ICP values often do not meet the nominal confidence level). This indicates that wind speed data with a 1-hour sampling interval are more difficult to predict than those with a 10-minute sampling interval.
g) The computation time of a method is also an important index. The training time and test time of each prediction model on the two data sets are given in Table 4. For training time, the LUBE-based methods take longer to optimize the parameters, because they require more iterations for the optimization objective to converge. The MBB method is simple to implement by adjusting the number of resampling times, but its prediction accuracy is difficult to improve with additional training because of its inappropriate distribution assumptions. Compared with the AIC-ELM and AIC-BP methods, the proposed method needs to train the GRU networks in each iteration, which consumes more time than training ELM and BP. For test time, all methods take only a short time (less than 0.1 s) to obtain predicted results using the trained models, and the test time of the proposed method is only 0.0060 s and 0.0050 s on the two data sets, respectively, so it can quickly generate the prediction intervals of wind speed.
To better show the prediction ability of the proposed method, the interval prediction results of the ANN-INAW method and the proposed method on the data set D1 are drawn in Fig. 7 as an example. The ANN-INAW method generates worse PIs that are wider (INAW = 0.1123) and have a lower coverage probability (ICP = 91.67%), while the PIs generated by the proposed method are 52.36% narrower than those of ANN-INAW, with a coverage probability of 93.40%. This means that the proposed method has better interval prediction ability.
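The two indices compared above can be illustrated with a minimal sketch. It assumes the commonly used definitions, namely ICP as the fraction of targets falling inside their interval and INAW as the mean interval width normalized by the range of the targets; the function names and the toy numbers are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def icp(y, lower, upper):
    """Interval coverage probability: share of targets inside [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def inaw(y, lower, upper):
    """Interval normalized average width: mean width divided by the target range.

    Assumed definition (the usual normalization by max(y) - min(y)).
    """
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean(upper - lower) / (np.max(y) - np.min(y)))

# Toy wind speeds and intervals (hypothetical, for illustration only).
y = np.array([5.0, 6.0, 7.0, 9.0])
lower = np.array([4.5, 5.5, 7.2, 8.0])
upper = np.array([5.5, 6.5, 8.0, 9.5])

print("ICP :", icp(y, lower, upper))   # three of the four targets are covered
print("INAW:", inaw(y, lower, upper))
```

A narrower interval lowers INAW but risks lowering ICP, which is why the two indices are reported together.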

3) CONVERGENCE ANALYSIS
To examine the convergence property of the proposed method, and considering the computing time, the population size and the number of iterations of the DIWPSO algorithm are both set to 50. Taking the data set D1 as an example, the optimization process of the two interval width adjustment variables (B u and B l ) and the values of the objective function (gFit) are drawn in Fig. 8, where it can be observed that gFit follows a trend similar to that of B u and B l . At the beginning of the iteration process, B u and B l take larger values, and the value of gFit is greater than 0.1600. As the iteration number increases, the two phase search strategy produces smaller B u and B l to reduce the width of the PIs. When the iteration number reaches 23, the values of B u and B l become steady, and correspondingly the change of the objective function gFit becomes very small. After around 50 iterations, the value of gFit falls to 0.0251, indicating that the proposed method has a good convergence property and can obtain optimal parameters for the construction of higher-quality PIs.
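The Phase I search can be sketched as a particle swarm with an inertia weight that moves from w s to w e over K iterations. This is a minimal sketch under stated assumptions: the linear schedule, the acceleration constants `c1` and `c2`, the search bounds, and the quadratic `toy_fitness` are all hypothetical stand-ins. In the proposed method each fitness evaluation would involve training the GRU networks, which is omitted here.

```python
import numpy as np

def diw_pso(fitness, dim=2, n_particles=50, n_iter=50,
            w_s=0.9, w_e=0.4, c1=2.0, c2=2.0, lo=0.0, hi=1.0, seed=0):
    """PSO with a dynamic inertia weight (assumed linear from w_s to w_e)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    p_best = pos.copy()                              # best position per particle
    p_fit = np.array([fitness(p) for p in pos])
    g_idx = int(np.argmin(p_fit))
    g_best, g_fit = p_best[g_idx].copy(), float(p_fit[g_idx])
    history = [g_fit]                                # global best fitness per iteration
    for k in range(1, n_iter + 1):
        w = w_s - (w_s - w_e) * k / n_iter           # inertia equals w_e when k = K
        r1 = rng.random((n_particles, dim))          # random numbers in [0, 1)
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fitness(p) for p in pos])
        improved = fit < p_fit
        p_best[improved] = pos[improved]
        p_fit[improved] = fit[improved]
        g_idx = int(np.argmin(p_fit))
        if p_fit[g_idx] < g_fit:
            g_best, g_fit = p_best[g_idx].copy(), float(p_fit[g_idx])
        history.append(g_fit)
    return g_best, g_fit, history

def toy_fitness(b):
    """Hypothetical stand-in objective over the two width variables (B_u, B_l)."""
    return float(np.sum((b - np.array([0.3, 0.2])) ** 2))

g_best, g_fit, history = diw_pso(toy_fitness)
print("best (B_u, B_l):", g_best, "gFit:", g_fit)
```

Because the global best is only replaced by a strictly better candidate, `history` is non-increasing, which mirrors the flattening of gFit once the width variables settle.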

D. MULTI-STEP PREDICTION VALIDATION
To verify the prediction ability of the proposed method over different prediction horizons, multi-step interval prediction experiments (from two steps to six steps) were performed on the two data sets. In these experiments, a separate prediction model was established for each prediction horizon. Seven methods, MBB, ANN-INAW, ELM, GRU, TCN, AIC-ELM, and AIC-BP, were used to conduct the same tasks for comparison purposes. The prediction results are reported in Table 5, and the visualization of CWC is shown in Fig. 9(a) and (b), where the single-step prediction results are also displayed. Fig. 9(a) shows the prediction results of all the methods on the data set D1. From Table 5, it can be seen that the ICP values of all the methods meet the nominal confidence level. When the ICP values meet the nominal confidence level, the values of the comprehensive index, CWC, gradually increase as the prediction horizon grows, which indicates that multi-step interval prediction is more difficult. However, in all prediction horizons, the proposed method obtains the best CWC values among the compared methods, which indicates that it achieves narrower intervals with a qualified coverage probability. In contrast, the LUBE-based methods (ANN-INAW, ELM, GRU, and TCN) do not produce comparable results, and their performance declines rapidly as the prediction horizon increases. Numerically, the CWC values of the proposed method are 40.62%-51.82% smaller than those of the LUBE-based methods in two-step prediction tasks, and 13.75%-16.96% smaller than those of the LUBE-based methods in six-step prediction tasks.
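Establishing a separate model for each horizon corresponds to what is often called a direct multi-step strategy. The sketch below shows one way the training pairs for each horizon could be built from a wind speed series; the function name, the lag length of 3, and the toy series are assumptions for illustration, not the settings used in the experiments.

```python
import numpy as np

def make_direct_dataset(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    first_target = n_lags + horizon - 1
    y = series[first_target:first_target + n_samples]
    return X, y

# Toy series 0, 1, ..., 9 (hypothetical); one data set per horizon from 1 to 6.
series = np.arange(10.0)
datasets = {h: make_direct_dataset(series, n_lags=3, horizon=h) for h in range(1, 7)}

X1, y1 = datasets[1]
X6, y6 = datasets[6]
print(X1[0], "->", y1[0])   # inputs [0, 1, 2] predict the next value
print(X6[0], "->", y6[0])   # same inputs predict the value six steps ahead
```

Longer horizons leave fewer usable samples and a weaker link between inputs and target, which is consistent with CWC growing as the horizon increases.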
Averaging the CWC values over all prediction horizons for each method, the mean CWC of the ELM method reaches 0.2713, which is 54.69% larger than that of the proposed method. Similarly, the GRU and TCN methods also obtain worse results, which are 41.18% and 41.93% larger than that of the proposed method, respectively. In addition, the adaptive interval construction-based methods perform similarly in all the prediction tasks, but the proposed method has a greater advantage in multi-step prediction tasks. For example, for four-step and five-step prediction tasks, the CWC values of the proposed method are 8.61% and 1.56% smaller than those of the AIC-BP method, respectively, and are 15.53% and 8.38% smaller than those of the AIC-ELM method, respectively.
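As a worked reading of these percentages, assume that "x% larger than that of the proposed method" is measured relative to the proposed method's mean CWC. Under that assumption, the mean CWC of the proposed method implied by the ELM figures is approximately the following (a derived value, not one reported in the tables):

```latex
\frac{\overline{\mathrm{CWC}}_{\mathrm{ELM}} - \overline{\mathrm{CWC}}_{\mathrm{prop}}}
     {\overline{\mathrm{CWC}}_{\mathrm{prop}}} = 0.5469
\quad\Longrightarrow\quad
\overline{\mathrm{CWC}}_{\mathrm{prop}} = \frac{0.2713}{1 + 0.5469} \approx 0.1754
```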
The prediction results on the data set D2 are drawn in Fig. 9(b). Compared with the data set D1 with a 10-minute sampling interval, predicting the wind speed interval is more difficult on the data set D2 with a 1-hour sampling interval, which leaves the LUBE-based methods unable to meet the nominal confidence level. Following (4a) and (4b), when the ICP value does not meet the nominal confidence level, the CWC value becomes large. Therefore, the CWC values in Fig. 9(b) change irregularly as the prediction horizon increases. However, it is notable that the AIC-ELM, AIC-BP, and proposed methods meet the nominal confidence level and obtain lower CWC values in all prediction horizons. Among these three methods, the proposed method uses GRU networks to learn the upper and lower bounds of the auxiliary intervals, achieving better interval prediction ability.
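The jump in CWC when coverage falls short can be seen in a minimal sketch. It assumes the widely used exponential-penalty form, CWC = INAW (1 + γ exp(−η (ICP − μ))), with γ = 0 when ICP ≥ μ and γ = 1 otherwise; whether (4a) and (4b) match this form exactly is an assumption, and the defaults `mu=0.90` and `eta=50.0` are hypothetical values chosen only for illustration.

```python
import math

def cwc(icp, inaw, mu=0.90, eta=50.0):
    """Coverage width-based criterion (assumed exponential-penalty form).

    gamma is the two-value function: 0 when coverage meets the nominal
    confidence level mu, 1 otherwise. eta is the amplification factor.
    """
    gamma = 0.0 if icp >= mu else 1.0
    return inaw * (1.0 + gamma * math.exp(-eta * (icp - mu)))

# Same interval width, coverage just above and just below the nominal level.
print(cwc(0.92, 0.10))  # coverage met: CWC equals INAW
print(cwc(0.88, 0.10))  # coverage missed: the penalty multiplies CWC several times over
```

With this form a two-point shortfall in coverage outweighs a large reduction in width, so methods that miss the nominal level produce CWC values that no longer track the prediction horizon smoothly.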

V. CONCLUSION
Improving the quality of the PIs for short-term wind speed prediction is significant for the application of wind power. In this article, an AIC-GRU method was proposed, which uses a two phase search strategy to optimize the model parameters for interval prediction. Compared with the traditional LUBE framework, the AIC-GRU method optimizes the parameters of the GRU networks using a gradient descent-based algorithm (RMSProp), and introduces two interval width adjustment variables to adjust the auxiliary intervals, so as to construct high-quality label information for the GRU networks. The two interval width adjustment variables are optimized using the DIWPSO algorithm. In addition, a comparison was performed with eight other methods on both single-step and multi-step prediction tasks using two data sets with different sampling intervals, and the experimental results indicate that the proposed method can generate higher-quality PIs with higher coverage probability and narrower width.
In the proposed method, only wind speed is used as the input variable to predict the intervals, but other factors, such as environmental temperature and atmospheric pressure, can also affect the change of wind speed. In addition, GRU networks are adopted as the prediction model, but because GRU networks have a large number of parameters to optimize, the iteration process is slow. Therefore, in our future work, on the one hand, more influencing factors will be considered and introduced into our method to obtain better performance; on the other hand, more suitable deep learning networks with fewer parameters and better performance will be integrated into our method.