LSTM-Attention-Embedding Model-Based Day-Ahead Prediction of Photovoltaic Power Output Using Bayesian Optimization

Photovoltaic (PV) output is susceptible to meteorological factors, which makes power generation intermittent and random. Accurate prediction of PV power output can not only reduce the impact of PV power generation on the grid but also provide a reference for grid dispatching. Therefore, this paper proposes an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output. Statistical features at multiple time scales, combined features, time features, and wind speed categorical features are explored for the PV-related meteorological factors. A deep learning model is constructed from an LSTM block and an embedding block joined by a merge layer. The LSTM block is used to memorize and attend to the historical information, and the embedding block is used to encode the categorical features. An output block then produces the prediction results, and residual connections are included in the model to alleviate the vanishing-gradient problem. Bayesian optimization is used to select the optimal combined features. The effectiveness of the proposed model is verified on two actual PV power plants in one area of China. The comparative experimental results show that the proposed model performs significantly better than the LSTM neural network, BPNN, SVR, and persistence models.


I. INTRODUCTION
With growing global concern about environmental issues, developing renewable energy resources such as wind [1], hydro [2], fuel cell [3], and photovoltaic (PV) [4], [5] has become a worldwide consensus. The International Energy Agency (IEA) estimates that the proportion of new energy resources will reach 60% in 2040, of which PV and wind energy will account for more than 50%. PV, also known as solar PV, has developed from a niche market of small-scale applications to a mainstream electricity source since 1992. In 2017, the cumulative global PV power generation reached nearly 460 TWh, accounting for about 2% of the total global energy; 60% of it is for utility-scale applications, and the remaining 40% is for distributed applications [6]. By the end of 2018, the total cumulative global PV capacity reached 512 GW, which is estimated to be enough to supply 2.55% of the global electricity demand [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Mehdi Savaghebi.
However, PV power generation is intermittent by nature, and meteorological factors such as ambient temperature, relative humidity, wind speed, and clearness index induce considerable fluctuations [8]. Neural networks have been used in [9] to verify that the PV power output is strongly related to temperature, wind speed, and relative humidity. It has also been shown that the average photon energy (APE) and the temperature have a large influence on the PV power output [10]. Because the PV power output is greatly affected by meteorological factors, its strong fluctuation and intermittency have a great impact on system operation and on grid-connected systems.
Grid oscillation can occur when a large proportion of PV systems is connected to the grid [11]. Therefore, accurate prediction of PV power output can significantly improve the operation of the power system and increase the penetration of PV systems.
Several studies have been conducted to predict PV power output and have achieved significant results. The methods for PV power prediction can be summarized as physical methods, direct prediction methods, and indirect prediction methods. The physical methods depend mainly on a physical model obtained from the theoretical analysis of the conversion of solar energy into electric energy. The physical model is usually based on numerical weather prediction (NWP) [12], satellite imagery TSI, and cloud movement models [13]. These models can predict the PV power output with high accuracy [14]. However, such prediction methods require additional information from satellite cloud maps, resulting in higher operating and computational costs.
To address this issue, the direct prediction methods, also known as time-series methods, are widely used in PV prediction. These methods establish a mathematical relationship between the historical power series and the future PV power according to the periodicity, tendency, and other properties of PV power profiles. In [15], the patterns of historical PV output curves under different weather patterns are analyzed, and the PV power output is predicted by superimposing its fluctuations at different scales. In [16], the chaotic characteristics of PV power output are presented, and RBF-based neural networks are used to fit the local variation of the phase space trajectory of PV power output. These methods are based on the historical PV output data and realize the prediction by finding the PV output law over a certain time period. However, meteorological information such as irradiation, temperature, humidity, wind speed, and wind direction is missing in such methods, and with incomplete weather information it is difficult to guarantee the prediction accuracy.
To improve the prediction accuracy, indirect prediction methods, also known as regression methods, have been proposed. By taking meteorological factors into consideration, the prediction performance of PV power output over a certain time period has been significantly improved. The indirect prediction methods include artificial neural networks [17], [18], support vector machines [21], and Markov chains [22]. Because days with the same weather pattern are obviously similar, PV power prediction based on weather type has been studied and improved in [19]: the Kohonen weather clustering model is improved to identify the weather type of the predicted date, and the meteorological data of the predicted date and the clustered historical data are used as the input of neural networks to predict the PV power output. In [20], artificial neural networks are established for different weather types; the daytime prediction of PV power output performs well for an individual weather type, but the prediction performance for time-varying weather is poor. Compared with the previous two methods, artificial neural networks achieve higher accuracy because multiple kinds of weather information are considered. However, NWP data do not respond effectively to meteorological factors; although a further weather division strategy has been applied, the meteorological features still need to be further explored.
Deep learning has developed rapidly in computer vision and natural language processing, and it is also used in the regression and classification of one-dimensional data [23]. Deep learning techniques include long short-term memory (LSTM) [24], convolutional neural networks (CNN) [25], attention [26], and embedding [27]. Several researchers have worked on PV power prediction based on deep learning. LSTM [28] and the LSTM-based autoencoder [29] perform prediction directly without considering any additional characteristics. With the deep belief networks used in [30], the factors that affect PV power output are analyzed and the relevant meteorological characteristics are used, but the characteristics are not further explored. In [31], a restricted Boltzmann machine is proposed; only the original characteristics related to PV power output are explored, with only temperature characteristics included, and no further weather information is used. The wavelet-decomposition-based CNN-LSTM [32], [33] uses only the direct prediction method, and no additional information is explored. The Bayesian deep learning model [34] is a method of uncertainty prediction. Also, a recurrent neural network model is used to learn the nonlinear characteristics of the PV sequence in [35], where the PV time-series data are divided into inter-day and intra-day data. This model is superior to the classical persistence method (Persistence), the back propagation neural network (BPNN), the radial basis function (RBF) neural network, the support vector machine (SVM), and the long short-term memory (LSTM) network.
Among them, different deep learning structures have been used, and all of them have improved the prediction accuracy of PV power output and solar irradiance. In [36], additional meteorological information is used as the input, and a nonlinear autoregressive recurrent neural network (NARX), a model based on feedforward neural networks (FFNN), is proposed to predict the solar irradiance. In [37], dry-bulb temperature, dew-point temperature, and relative humidity are used as the characteristics, and LSTM is used as the model to predict solar irradiance. Since the additional meteorological information can effectively improve the prediction accuracy, better prediction performance is obtained compared with the FFNN model.
To the best of the authors' knowledge, deep learning methods can fit the PV power output very well, but the characteristics related to PV power output need to be further explored, and the fitting ability of the models should also be further improved. Therefore, to take full advantage of different deep learning models, appropriate models need to be selected and combined to improve the performance of PV prediction. An improved deep learning model based on the LSTM-attention-embedding algorithm is proposed in this paper. The tendency of the meteorological factors related to PV power output within each time window and the combined features of various meteorological factors are explored. An LSTM-attention-embedding model, which combines LSTM-attention with an embedding mechanism, is built to optimize the meteorological characteristics. The LSTM-attention is used to learn the time-dependent numerical meteorological characteristics, and the embedding is used to describe the categorical features such as wind direction and time. Bayesian optimization is used to select the optimal combined characteristics for the LSTM-attention-embedding model.
This paper presents two original contributions that distinguish our work from existing schemes. First, the meteorological information related to PV power output is further explored, and the redundant characteristics are removed by using Bayesian optimization. Second, different deep learning models are combined to better represent the correlation between characteristics. In the model, LSTM extracts the sequence information between characteristics, attention focuses on the important information, and embedding encodes the categorical features.
The remainder of this paper is organized as follows. In Section II, the characteristics related to PV power output are constructed, and the improved deep learning model based on Bayesian optimization is proposed. In Section III, the prediction model of PV power output is proposed. The experimental results are analyzed in Section IV, and finally, the conclusions are drawn in Section V.

II. FEATURES EXPLORING MODEL-BASED PREDICTION OF PV POWER OUTPUT
A. FEATURES CONSTRUCTION
The prediction accuracy of the model depends on the features of the meteorological factors, and more effective meteorological features are required to improve the prediction performance. Let the meteorological characteristics be $X = [x_1, x_2, \dots, x_M]$, where $x_k$ is the $k$-th feature across the samples and $M$ is the total number of meteorological factors. The following kinds of features are explored.
1) The original features
The original features are the meteorological characteristics themselves, that is, $X$.
2) The statistical features
The statistical features reflect the data fluctuations within the time window $t$, and they can be expressed as
$$\bar{x}_{ikt} = \frac{1}{2t+1}\sum_{j=i-t}^{i+t} x_{jk}, \quad S_{ikt} = \sqrt{\frac{1}{2t+1}\sum_{j=i-t}^{i+t}\left(x_{jk}-\bar{x}_{ikt}\right)^2}, \quad x_{\max kt} = \max_{i-t \le j \le i+t} x_{jk} \tag{1}$$
where $\bar{x}_{ikt}$, $S_{ikt}$, and $x_{\max kt}$ are the mean value, standard deviation, and maximum value of the $k$-th feature within the time window $[i-t, i+t]$.
3) The combined features
The combined features, including the linear-combined features and nonlinear-combined features, are expressed as (2). The linear-combined features are used to explore the linear relationships between features, while the nonlinear-combined features are used to explore the nonlinear relationships.
$$z_{ia} = x_{ij} + x_{ik}, \quad z_{is} = x_{ij} - x_{ik}, \quad z_{im} = x_{ij}\,x_{ik}, \quad z_{id} = x_{ij}/x_{ik}, \quad j \ne k \tag{2}$$
where $z_{ia}$, $z_{is}$, $z_{im}$, and $z_{id}$ are the additive, subtractive, multiplicative, and divisive features at sample time $i$. Both the original features $X$ and the statistical features in (1) are included in $x_{ij}$.

4) Time features
The time features, such as the month of the year, the day of the week, the day of the month, the hour of the day, and the minute of the hour are extracted.
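The feature construction described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the truncation of windows at the ends of the series, the use of the population standard deviation, and the small constant `eps` that guards the division are all choices made for this sketch.

```python
import numpy as np
from datetime import datetime
from itertools import combinations


def statistical_features(x, t):
    """Mean, standard deviation, and maximum of x over the window [i-t, i+t].

    Assumptions: windows are truncated at the ends of the series and the
    population standard deviation is used.
    """
    n = len(x)
    mean, std, peak = np.empty(n), np.empty(n), np.empty(n)
    for i in range(n):
        window = x[max(0, i - t): i + t + 1]
        mean[i], std[i], peak[i] = window.mean(), window.std(), window.max()
    return mean, std, peak


def combined_features(X, eps=1e-8):
    """Pairwise additive, multiplicative, subtractive, and divisive features.

    X has shape (n_samples, n_features). eps is a hypothetical guard
    against division by zero.
    """
    columns = []
    for j, k in combinations(range(X.shape[1]), 2):
        a, b = X[:, j], X[:, k]
        columns += [a + b, a * b]                      # commutative: one per pair
        columns += [a - b, b - a]                      # non-commutative: both orders
        columns += [a / (b + eps), b / (a + eps)]
    return np.column_stack(columns)


def time_features(timestamps):
    """Month, day of week, day of month, hour, and minute of each timestamp."""
    return np.array([[ts.month, ts.weekday(), ts.day, ts.hour, ts.minute]
                     for ts in timestamps])
```

Each unordered pair of input columns yields six combined columns in this sketch (two commutative and four non-commutative), which matches the way the combined features are counted in Section IV-C.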

B. IMPROVED DEEP LEARNING MODEL
1) LSTM NEURAL NETWORKS
LSTM is currently a widely used deep learning approach in the machine learning area, and it was proposed by Hochreiter and Schmidhuber in 1997 [24]. LSTM can handle sequences with long-term dependencies, and its main structure is shown in FIGURE 1.
LSTM can be represented as a chain structure unrolled in time. The LSTM neural network architecture in FIGURE 1 shows this unrolled form from sample time 0 to sample time $t$. In the figure, A represents the LSTM unit; for each time instance there is one corresponding input $x_t$ and output $h_t$, and the output $h_t$ at sample time $t$ selectively attends to the input information from sample time 0 to sample time $t-1$.
There are four neural network layers in the repeating module. The neuron unit structure of LSTM in FIGURE 1 shows how the LSTM units are connected in time: the cell state ($C_{t-1}$) and output ($h_{t-1}$) at sample time $t-1$ are transferred to the LSTM unit at sample time $t$, then the cell state ($C_t$) and output ($h_t$) at sample time $t$ are transferred to the LSTM unit at sample time $t+1$ in the same way, until the last sample time.
The detailed structure of each LSTM unit is shown at the bottom of FIGURE 1. The unit is mainly divided into the forget gate, input gate, tanh layer, and output gate. The inputs of each gate and layer are $h_{t-1}$ and $x_t$. The output $f_t$ of the forget gate is obtained by the activation function $\sigma$. The output $i_t$ of the input gate is also obtained by $\sigma$, while the output $\tilde{C}_t$ of the tanh layer is obtained by the tanh function. The output $O_t$ of the output gate is calculated by $\sigma$, and the output $h_t$ of the LSTM is calculated from $C_t$ and $O_t$.
Each LSTM cell has four layers. The first layer of neurons is the sigmoid control layer of the forget gate, and it can be expressed as
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{3}$$
where $f_t$ is the output of the forget gate, $\sigma$ is the sigmoid activation function, $W_f$ and $b_f$ are the weight coefficient and offset of the forget gate, respectively, and $h_{t-1}$ and $x_t$ are the output at sample time $t-1$ and the input at sample time $t$, respectively.
The second and third layers are the input gate and the tanh layer, respectively. The sigmoid layer of the input gate determines which information to update, and the tanh layer creates a new candidate value that might be added to the cell state. They can be expressed as
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{4}$$
where $i_t$ is the output of the input gate, and $W_i$ and $b_i$ are the weight coefficient and offset of the input gate, respectively.
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{5}$$
where $\tilde{C}_t$ is the candidate state, and $W_C$ and $b_C$ are the weight coefficient and offset of the updated variables, respectively. Then, the old cell state $C_{t-1}$ is updated as
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{6}$$
Finally, the output gate ($O_t$ at sample time $t$), a sigmoid layer shown as (7), is multiplied by the cell state in (6) after it has passed through a tanh layer, which gives the final output of the LSTM in (8).
$$O_t = \sigma\left(W_O \cdot [h_{t-1}, x_t] + b_O\right) \tag{7}$$
$$h_t = O_t * \tanh(C_t) \tag{8}$$
where $W_O$ and $b_O$ are the weight coefficient and offset of the output gate, respectively. The tanh layer guarantees that the cell state passed to the output is bounded between -1 and 1.
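As an illustration of the gate computations described above, a single LSTM time step can be written in NumPy as below. This is a minimal sketch of the standard LSTM cell, assuming that each gate acts on the concatenation of $h_{t-1}$ and $x_t$; the dictionary layout of the weights (keys `"f"`, `"i"`, `"c"`, `"o"`) is a convention chosen for this example only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W[k] has shape (hidden, hidden + input) and b[k] has shape (hidden,)
    for k in {"f", "i", "c", "o"} (forget, input, candidate, output).
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate state (tanh layer)
    c_t = f_t * c_prev + i_t * c_tilde         # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    h_t = o_t * np.tanh(c_t)                   # LSTM output
    return h_t, c_t
```

Unrolling the network over a sequence amounts to calling `lstm_step` once per sample time and passing the returned `h_t` and `c_t` to the next call.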
2) ATTENTION MECHANISM
When a large amount of information is input into neural networks, different inputs influence the output value to different degrees. To improve the computational efficiency of neural networks, more computing power is allocated to the inputs that are important for a given output; this is called the attention mechanism. The attention mechanism is calculated in two steps: the first step is to calculate the attention distribution $\alpha_i$ over all input values, and the second step is to calculate the weighted average of the input information for a single output value. The attention mechanism is shown in FIGURE 2, where $x_i$ is the input value, $q$ is the query vector of the neural network, $s$ is the score function of the attention, and $\alpha_i$ is the attention distribution value of the input values for the query vector $q$.
The score function $s(x_i, q)$ is parameterized by $W$, the neural network parameters that are learned during training. Taking the softmax of the scores gives the attention distribution $\alpha_i$:
$$\alpha_i = \mathrm{softmax}\left(s(x_i, q)\right) = \frac{\exp\left(s(x_i, q)\right)}{\sum_{j}\exp\left(s(x_j, q)\right)} \tag{10}$$
The output $a$ is the weighted average of the input values $x_i$ under the attention distribution $\alpha_i$, and it can be expressed as
$$a = \sum_{i} \alpha_i x_i \tag{11}$$
3) EMBEDDING
To ensure that the network can learn categorical features such as time and wind direction, the embedding, a structure widely used in natural language processing, is applied here. The embedding maps discrete features to dense vectors: for the wind direction with $D$ categories, for example, a $D \times n_{emb}$ matrix is obtained after embedding, where $n_{emb}$ is the number of latent features. The same processing is performed for the time features.
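The two-step attention computation described above (scores turned into a softmax distribution, followed by a weighted average of the inputs) can be sketched as follows. The text does not give the form of the score function, so a bilinear score $s(x_i, q) = x_i^{\top} W q$ is assumed here purely for illustration; other score functions would change only the first line of the function body.

```python
import numpy as np


def attention(X, q, W):
    """Soft attention over the rows of X.

    X: (n, d) input values, q: (m,) query vector, W: (d, m) learnable
    matrix of the assumed bilinear score s(x_i, q) = x_i^T W q.
    """
    scores = X @ W @ q                       # one score per input value
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores)
    alpha = weights / weights.sum()          # attention distribution (softmax)
    a = alpha @ X                            # weighted average of the inputs
    return a, alpha
```

With all scores equal, the attention distribution is uniform and the output reduces to the plain mean of the inputs, which is a convenient sanity check.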

4) THE IMPROVED DEEP LEARNING NETWORK
To better learn the numerical features and categorical features over a long time span, an improved LSTM network for PV output prediction is proposed, as shown in FIGURE 3.
The improved LSTM network is made up of several parts: the continuous features (NWP features) and categorical features (time and wind direction), the LSTM-attention block, the embedding block, the merge layer, and the output block. The NWP features and categorical features are the inputs of the improved LSTM network. The LSTM-attention block is used to memorize and attend to the historical information, and the embedding block is used to encode the categorical features. The merge layer connects the LSTM-attention block and the embedding block, and the output block outputs the predicted PV power. A detailed connection of each block of the improved LSTM network is shown in FIGURE 4.
1) The LSTM block consists of two LSTM layers followed by two fully connected (FC) layers. A residual connection is added for the LSTM layer and the FC layer to alleviate the vanishing-gradient problem in multilayer neural networks [35]. The attention layer is connected between the LSTM layer and the FC layer, and it is used to focus on the important information.
2) The embedding block is used to encode each categorical feature, such as wind direction, the month of the year, and the minute of the hour. These features are then merged and connected to two FC layers. A residual connection is also added to the FC layer.
3) The output block is used to output the prediction results with two FC layers.
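To make the data flow between the blocks concrete, the sketch below runs a forward pass through an embedding block, a merge layer, and an output block in plain NumPy. It is a schematic illustration only: the layer widths, the ReLU activation, the exact placement of the residual connection, the category counts, and the random vector standing in for the LSTM-attention block output are all assumptions of this example, not values from the paper.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def embed(categories, tables):
    """Look up one embedding row per categorical feature and concatenate them."""
    return np.concatenate([tables[name][idx] for name, idx in categories.items()])


def fc_residual(x, W1, b1, W2, b2):
    """Two FC layers with a residual (skip) connection around them."""
    return x + relu(W2 @ relu(W1 @ x + b1) + b2)


def forward(lstm_out, categories, tables, params):
    """Merge both blocks by concatenation, then apply a two-layer output block."""
    emb = fc_residual(embed(categories, tables), *params["emb_fc"])
    merged = np.concatenate([lstm_out, emb])           # merge layer
    W1, b1, W2, b2 = params["out_fc"]
    return W2 @ relu(W1 @ merged + b1) + b2            # predicted PV power


rng = np.random.default_rng(0)
n_emb, n_lstm = 4, 16                                  # hypothetical sizes
tables = {"wind_dir": rng.normal(size=(8, n_emb)),     # 8 direction classes (assumed)
          "month": rng.normal(size=(12, n_emb)),
          "hour": rng.normal(size=(24, n_emb))}
d_emb = n_emb * len(tables)
d = n_lstm + d_emb
params = {
    "emb_fc": (0.1 * rng.normal(size=(d_emb, d_emb)), np.zeros(d_emb),
               0.1 * rng.normal(size=(d_emb, d_emb)), np.zeros(d_emb)),
    "out_fc": (0.1 * rng.normal(size=(32, d)), np.zeros(32),
               0.1 * rng.normal(size=(1, 32)), np.zeros(1)),
}
lstm_out = rng.normal(size=n_lstm)   # stand-in for the LSTM-attention block output
y_hat = forward(lstm_out, {"wind_dir": 3, "month": 5, "hour": 13}, tables, params)
```

In a Keras implementation the same structure would typically be expressed with embedding, concatenation, and dense layers; the NumPy form is used here only so that the sketch stays self-contained.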

C. BAYESIAN OPTIMIZATION
Bayesian optimization is usually used to select the optimal parameters in the machine learning process. Compared with the grid search method, Bayesian optimization has the following advantages: 1) the prior distribution is continuously updated based on the previously evaluated parameters; 2) because it needs only a small number of iterations, it is more efficient when the parameter dimension is large; 3) it is robust to non-convex problems, while grid search easily falls into a locally optimal solution.
In the process of Bayesian optimization, the loss function of the model as a function of the parameters is modeled as a Gaussian process, and the evaluated parameters are used to update the posterior distribution of this function. To reduce the computational cost of the model, the optimal combined features are obtained by Bayesian optimization, where the features presented in Section II-A are selected as the parameters.
The following parameters are optimized by Bayesian optimization: the time window $t$, the number of statistical features num1, and the number of combined features num2. To constrain the feature dimension and optimize the model parameters, the parameters of the samples are constructed as the combined parameters $[p_1, p_2, \dots, p_q]$. The process of Bayesian optimization is described as follows.
1) According to the combined parameters, the loss function is estimated as a Gaussian process.
Part of the combined parameters $[p_1, p_2, \dots, p_r]$ are selected to calculate their corresponding loss functions $[L(p_1), L(p_2), \dots, L(p_r)]$, and the parameters and the loss functions are then combined into a parameters-loss functions set. The estimated loss function obeys a Gaussian distribution, and the Gaussian distribution of $L(p_{1:r})$ is specified by the kernel function $k$, which represents the covariance.
2) Sample selection using the expected improvement (EI) function.
The new sampling point is determined by comparing the loss function of the new sampling point with that of the current sampling point, and this mathematical expectation is called the EI function. The Tree-structured Parzen Estimator (TPE) is used to establish the EI function [39]. The present $p^+$ is selected as the optimal combined parameters, and $y^*$, which is greater than $L(p^+)$, is used as the threshold value. In the EI function, $\ell(p)$ is the distribution of the parameters whose loss is smaller than $y^*$, and $g(p)$ is the distribution of the parameters whose loss is greater than $y^*$. The point that maximizes the EI function is selected as the sampling point.
3) Update the Gaussian distribution.
After adding the new sample, the Gaussian distribution is updated, and the Gaussian distribution of $L(p_{r+1})$ is calculated as in [40].
4) Repeat 2) and 3) until the maximum number of iterations is reached, and take the optimal point found so far as the output.
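The sampling loop in steps 1) to 4) can be illustrated with a simplified TPE-style search over a discrete parameter space. The sketch relies on the known property of TPE that maximizing the EI function is equivalent to maximizing the ratio $\ell(p)/g(p)$. Everything else is an assumption made for brevity: the densities are independent smoothed histograms per dimension, candidates are drawn uniformly rather than from $\ell(p)$, and the names `tpe_suggest`, `bayes_opt`, and `gamma` are hypothetical. A production run would normally use an established library instead.

```python
import numpy as np


def tpe_suggest(params, losses, space, gamma=0.25, n_candidates=64, rng=None):
    """Suggest the next point by maximizing l(p) / g(p).

    params: (r, q) array of evaluated points, losses: (r,) array,
    space: list of sorted 1-D arrays with the allowed values per dimension.
    """
    if rng is None:
        rng = np.random.default_rng()
    y_star = np.quantile(losses, gamma)       # threshold y*
    good = params[losses <= y_star]           # points used to build l(p)
    bad = params[losses > y_star]             # points used to build g(p)

    def log_density(group, cand):
        # Product of Laplace-smoothed categorical densities, one per dimension.
        logp = np.zeros(len(cand))
        for d, values in enumerate(space):
            counts = np.array([(group[:, d] == v).sum() for v in values]) + 1.0
            probs = counts / counts.sum()
            logp += np.log(probs[np.searchsorted(values, cand[:, d])])
        return logp

    cand = np.column_stack([rng.choice(v, size=n_candidates) for v in space])
    score = log_density(good, cand) - log_density(bad, cand)
    return cand[np.argmax(score)]


def bayes_opt(loss_fn, space, n_init=5, n_iter=30, seed=0):
    """Evaluate a few random points, then iterate suggest/evaluate/update."""
    rng = np.random.default_rng(seed)
    params = np.column_stack([rng.choice(v, size=n_init) for v in space])
    losses = np.array([loss_fn(p) for p in params])
    for _ in range(n_iter):
        p_new = tpe_suggest(params, losses, space, rng=rng)
        params = np.vstack([params, p_new])
        losses = np.append(losses, loss_fn(p_new))
    best = np.argmin(losses)
    return params[best], losses[best]
```

In the setting of this paper, `space` would hold the candidate values of the time window $t$, num1, and num2, and `loss_fn` would train the model with the corresponding features and return the test error.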

III. PREDICTION OF PV POWER OUTPUT
A. THE FLOWCHART OF PV PREDICTION
The flowchart of the PV prediction is shown in FIGURE 5. According to the meteorological factors, the statistical features and the combined features are constructed, and the time features are also extracted. Then, the LSTM-attention-embedding model is built. To reduce the redundancy of the features, Bayesian optimization is used to optimize the parameters. With this optimization method, the optimal time window, the number of statistical features, and the number of combined features are obtained. Finally, the prediction model of the PV power output is trained on the available data.

B. EVALUATION INDEX
To quantify how well a prediction model performs, the mean squared error (MSE), mean absolute error (MAE), coefficient of determination ($R^2$), and root mean squared error (RMSE) are used to analyze the experimental results. All evaluation indexes are based on the difference between each predicted value and the actual value.
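For reference, the four evaluation indexes can be computed as in the sketch below, which uses their standard definitions; the function name `evaluate` and the dictionary return format are choices made for this example.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MSE, MAE, RMSE, and R^2 for a pair of equally long series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return {
        "MSE": mse,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Because $R^2$ normalizes the residual error by the variance of the true data, it is dimensionless, whereas MSE, MAE, and RMSE scale with the magnitude of the power values.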

IV. EXPERIMENT ANALYSIS
A. DATA ACQUISITION AND SIMULATION CONFIGURATION
The dataset used for the experiment is a PV power output dataset covering 25 months (from 1 April 2016 to 30 April 2018) collected from two PV stations located in one area of China, and the sampling frequency of the data is $1.1\times10^{-3}$ Hz (one point every 15 min). From the PV stations, we can get day-ahead historical meteorological data, which are the same as the NWP data. The information of each PV station is shown in the corresponding table.
Because it is easy and simple to use, the Keras toolkit is used to build the deep learning network. Python is used for programming, NumPy and pandas are used for data processing, and matplotlib and seaborn are used for plotting. Our computing resources consist of an i7-9000 CPU, 32 GB of RAM, and two 2080Ti GPUs. The training and experimental phases of the proposed model are conducted in the GPU environment.

B. PARAMETERS OPTIMIZATION
The detailed training parameters for the improved deep learning model are shown in TABLE 3. The method for the parameter initialization can be found in [32]. To verify the effectiveness of the proposed model, it is compared with LSTM, BPNN, SVR, and Persistence [41]. The detailed parameters are shown in TABLE 4. The LSTM and BPNN models are neural networks built by stacking layers; the listed parameters show the connection between the layers, where the word before the parentheses gives the type of neural network layer and the number inside the parentheses gives the number of neural units in that layer.
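Of the compared models, the persistence baseline is the simplest to state in code. The text does not spell out the definition used in [41], so the sketch below assumes the common day-ahead form in which each sample is predicted by the value observed at the same time on the previous day, that is, 96 steps earlier at the 15 min sampling interval.

```python
import numpy as np

POINTS_PER_DAY = 96  # 24 h at one sample every 15 min


def persistence_forecast(power, horizon=POINTS_PER_DAY):
    """Predict each sample as the value observed `horizon` steps earlier.

    The first `horizon` samples have no predecessor and are returned as NaN.
    """
    power = np.asarray(power, dtype=float)
    forecast = np.full_like(power, np.nan)
    forecast[horizon:] = power[:-horizon]
    return forecast
```

Such a baseline needs no training, which makes it a useful lower bound when judging the learned models.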

C. DATA ANALYSIS
Following the features construction method presented in Section II-A, three statistical features are extracted for each of the five meteorological factors (irradiance, temperature, humidity, pressure, and wind speed), which gives 15 (5×3) statistical features. Among the combined features, $2\times C^2_{15+5}$ features are added for addition and multiplication, which are commutative, and $2\times2\times C^2_{15+5}$ features are added for subtraction and division, which are not commutative. An additional 5 time features and one wind direction feature are also added, so the total number of features is $5+15+2\times C^2_{15+5}+2\times2\times C^2_{15+5}+6 = 1166$. Excessive features significantly increase the training time and the computational cost of the model. To reduce the number of features, the effective features are selected by Bayesian optimization, and the search space is shown in TABLE 5.
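The feature count above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
from math import comb

n_original = 5                       # irradiance, temperature, humidity, pressure, wind speed
n_statistical = n_original * 3       # mean, standard deviation, maximum per factor
n_base = n_original + n_statistical  # 20 features enter the pairwise combinations
n_pairs = comb(n_base, 2)            # unordered pairs: C(20, 2) = 190

n_commutative = 2 * n_pairs          # addition and multiplication: one per pair
n_non_commutative = 2 * 2 * n_pairs  # subtraction and division: both orders per pair
n_categorical = 5 + 1                # five time features and one wind direction feature

total = (n_original + n_statistical + n_commutative
         + n_non_commutative + n_categorical)
print(total)  # 1166
```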
In our experiments, the LSTM-attention-embedding model needs 433 s to find the optimal features, while the BLSTM-attention-embedding model (the LSTM-attention-embedding model with Bayesian optimization) requires 73610 s. The time required by the BLSTM-attention-embedding model is about 170 times that of the LSTM-attention-embedding model, which is the disadvantage of the proposed method. However, the training and prediction costs after finding the optimal features are equivalent to those of other models. Also, once the optimal features have been selected by Bayesian optimization, a more accurate prediction can be obtained without the need to optimize the features every time. This is the advantage of the BLSTM-attention-embedding model.
The experimental dataset of PV station 1 is divided in chronological order into a training dataset and a test dataset at a ratio of 8:2, and the test error is selected as the target of Bayesian optimization. The optimization runs for 200 iterations, and the optimal parameters, {t = 18, num1 = 3, num2 = 12} with an MSE of 0.66 MW, appear at step 199. Compared with the model {t = 48, num1 = 5, num2 = 15} without any optimization, the MSE is decreased by 0.142 MW, and the feature dimension is also effectively reduced. Because the redundant and invalid features are eliminated, the computational cost of the model is significantly reduced. The joint distribution map of the optimization process is shown in FIGURE 6. The selected optimal features are listed in TABLE 6, where num1 and num2 are the numbers of statistical features and combined features, respectively. Among them, the most important feature is the irradiance, which is in line with the real situation, because the fundamental principle of photovoltaic power generation is to convert solar energy into electrical energy.
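The chronological 8:2 split mentioned above differs from a random split in that the samples are never shuffled, so the test set always lies after the training set in time. A minimal sketch, with a hypothetical function name:

```python
def chronological_split(samples, train_ratio=0.8):
    """Split a time-ordered sequence into a training part and a later test part."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


train, test = chronological_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Keeping the order intact avoids leaking future information into the training data, which matters for a day-ahead forecasting task.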

D. DAY-AHEAD PREDICTION RESULTS
To demonstrate the effectiveness of the proposed method, the following models are compared.
1) Model 1: The BLSTM-attention-embedding model is the prediction model, and the features obtained from Section II-A are the model input;
2) Model 2: The LSTM-attention-embedding model is the prediction model, and the features obtained from Section II-A are the model input;
The errors of the prediction results of PV station 1 and PV station 2 with different models are shown in TABLE 7. The BLSTM-attention-embedding model has the best MSE, MAE, $R^2$, and RMSE, followed by the LSTM model, the BPNN model, SVR, and the Persistence model. The prediction accuracy of PV power output can be effectively improved by using the LSTM-attention-embedding model, and after Bayesian optimization the prediction accuracy can be further improved, since more diverse effective features provide more useful information for training the model. The proposed LSTM-attention-embedding model can learn the long-term tendency and the short-term abrupt changes of PV power generation, and Bayesian optimization can remove the redundant features and select the optimal effective features to improve the prediction accuracy of PV power output.
To compare the prediction performance between PV station 1 and PV station 2, $R^2$ is selected as the evaluation index, since $R^2$ is the only index that shows the degree of fit between the predicted data and the true data independently of the training samples. In contrast, MSE, MAE, and RMSE are positively correlated with the magnitude of the true data, so they are used to evaluate the prediction performance on the same dataset. PV station 1 has a higher $R^2$, and its overall prediction performance is better than that of PV station 2 because station 1 has more data. The prediction performance of the LSTM-attention-embedding model with Bayesian optimization is significantly improved compared with the other models. The effectiveness of features construction and Bayesian optimization has been verified, and both can effectively improve the prediction accuracy of PV power output.
To further verify the superiority of the proposed model, the prediction results of PV power output under three typical weather conditions are analyzed on two PV stations.
Since there is no PV output at night, 64 daytime samples are selected as the experimental results of PV output.
1) The prediction results in sunny weather
The real-time prediction curves of the PV power output of the two stations in sunny weather are shown in FIGURE 10. The PV output prediction of each model is relatively accurate in sunny weather, mainly because the PV power curve on sunny days is smooth and fluctuates little. Compared with other models, the prediction of the proposed BLSTM-attention-embedding model is closer to the experimental results. The prediction errors of different models in sunny weather are summarized in TABLE 8. Compared with the LSTM, BPNN, SVR, and Persistence models, the prediction accuracy of the LSTM-attention-embedding model is slightly improved. However, the performance of the BLSTM-attention-embedding model with Bayesian optimization is significantly improved, showing a high degree of fit due to the optimal characteristics used. We can conclude that the proposed prediction model can improve the prediction accuracy of PV power output and can reflect the tendency of PV power generation in sunny weather.
2) The prediction results in cloudy weather
The real-time prediction curves of the PV power output of the two stations in cloudy weather are shown in FIGURE 10. Since more effective features are obtained by the BLSTM-attention-embedding model and the LSTM-attention-embedding model, these models can capture the fluctuation of PV power output very well, and their predictions are closer to the experimental results than those of the other models. However, the prediction accuracy of each model in cloudy weather is lower than that in sunny weather. The prediction errors of different models in cloudy weather are summarized in TABLE 9. Compared with the LSTM, BPNN, SVR, and Persistence models, the prediction accuracy of the LSTM-attention-embedding model is slightly improved. The BLSTM-attention-embedding model with Bayesian optimization is slightly more accurate than the LSTM-attention-embedding model, since the attention mechanism in the LSTM unit can capture the tendency of PV power generation. Therefore, the proposed model can improve the prediction accuracy of PV output in cloudy weather, although its accuracy is lower than that in sunny weather.
3) The prediction results in rainy weather
The real-time prediction curves of the PV power output of the two stations in rainy weather are shown in FIGURE 12. PV power generation is difficult to capture in rainy weather because it fluctuates more frequently and more violently. Therefore, the prediction performance of each model is not as good as in sunny and cloudy weather, and even the predictions of the BLSTM-attention-embedding model and the LSTM-attention-embedding model are not close to the experimental results. However, we can still see that the LSTM-attention-embedding model can represent the general tendency of PV power generation, and the BLSTM-attention-embedding model captures the sudden variations of the PV power more easily, since the features selected by Bayesian optimization allow the model to capture the PV power more accurately. The prediction errors of different models in rainy weather are summarized in TABLE 10. The prediction accuracy of the BLSTM-attention-embedding model and the LSTM-attention-embedding model for PV station 1 is higher than that for PV station 2 because more training data are available for PV station 1. Due to the randomness of PV power in rainy weather, the LSTM-attention-embedding model shows better accuracy on some individual rainy days, and this is the part that needs to be improved in future work. The proposed method can improve the prediction accuracy of PV output on rainy days, but the improvement is limited. This is a present challenge in predicting PV power. In [33] and [42], the prediction accuracy on rainy days is also poorer than that on cloudy and sunny days.

V. CONCLUSION
PV power output is strongly related to meteorological factors and therefore exhibits intermittency and volatility. Large-scale PV generation fed into the grid challenges the power balance of the grid and leads to a series of disturbances. Accurate prediction of PV power output is an effective way to alleviate the problems caused by a high proportion of grid-connected PV. Therefore, a BLSTM-attention-embedding model is proposed for the prediction of PV power output, with feature construction from meteorological factors.
The BLSTM-attention-embedding model includes several blocks: 1) LSTM, which takes the meteorological factors as input and attends to historical information; 2) attention mechanism, which focuses on the important features; 3) embedding, which encodes each feature; 4) Bayesian optimization, which removes redundant features and selects more effective combined features.
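How the attention and embedding blocks feed a merged representation can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation. The dot-product scoring with a single score vector, the concatenation used for the merge, and all function names are assumptions, and the hidden states stand in for the outputs of an LSTM that is not implemented here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   score_vector: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weight each time step's hidden state by softmax(h_t . v) and sum them.

    Assumption: dot-product scoring against one score vector v.
    """
    scores = [sum(h_i * v_i for h_i, v_i in zip(h, score_vector))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights


def embed(category_index: int, table: Sequence[Sequence[float]]) -> List[float]:
    """Embedding lookup: map a categorical index to its dense vector."""
    return list(table[category_index])


def merge(context: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Merge step, assumed here to be a simple concatenation."""
    return list(context) + list(embedding)


if __name__ == "__main__":
    # Two time steps of 2-dimensional hidden states (illustrative values).
    states = [[1.0, 0.0], [0.0, 1.0]]
    context, weights = attention_pool(states, score_vector=[0.0, 0.0])
    wind_table = [[0.1, 0.2], [0.3, 0.4]]  # hypothetical embedding table
    print(weights, merge(context, embed(1, wind_table)))
```

In a trained model the score vector and the embedding table would be learned parameters. The sketch only shows the data flow: attention reduces the sequence of hidden states to one context vector, and the embedding turns a categorical index into a dense vector that can be joined with it.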
The prediction results show the effectiveness of the proposed method. The LSTM-attention-embedding model effectively improves the prediction accuracy of PV power generation, and Bayesian optimization improves it further. Because more effective and diverse features are obtained, the LSTM-attention-embedding model can learn the long-term and short-term tendencies of PV power generation, and Bayesian optimization optimizes the combined features to improve the prediction accuracy.
The forecast performance of the proposed method on rainy days is poorer than on cloudy and sunny days. Therefore, the prediction performance of PV power on rainy days should be further improved in future research.

FIGURE 5. The flowchart of PV power prediction.

FIGURE 7. The joint distribution map of errors with the number of iterations.

FIGURE 8. Scatter plots of the true and predicted PV data for all models on station 1.

FIGURE 9. Scatter plots of the true and predicted PV data for all models on station 2.

FIGURE 10. The prediction results at sunny weather.

FIGURE 11. The prediction results at cloudy weather.

FIGURE 12. The prediction results at rainy weather.

TABLE 1, and the sampling data of each station includes time, irradiance, wind direction, temperature, pressure and humidity, with a resolution of 15 min. Detailed meteorological factors are predicted by an in-house model, which are listed as IV-B.

TABLE 1. Information of each PV station.

TABLE 2. The detailed meteorological parameters.

TABLE 3. Training parameters used in the model.

TABLE 4. Parameters for different models.

TABLE 6. Features selection result.

TABLE 7. Errors comparison with different models.

TABLE 8. Prediction errors at sunny weather.

TABLE 9. Prediction errors at cloudy weather.