Optimized Support Vector Regression-Based Model for Solar Power Generation Forecasting on the Basis of Online Weather Reports

Increasing the forecasting accuracy of photovoltaic (PV)-generated power is currently an important topic, particularly for maintaining the stability and reliability of modern electric grid systems. In this study, a model based on particle swarm optimization (PSO)-optimized support vector regression (SVR) is proposed for the accurate forecasting of PV output power. An SVR-based model is established using the most influential historical experimental data collected from an actual PV power station. A PSO-based algorithm is adapted to select the dominant SVR-based model parameters and improve performance. Moreover, a novel data preparation algorithm is developed to prepare a solar irradiance pattern on the basis of the weather conditions and percentages of cloud cover collected from online weather forecast reports. Finally, the proposed model is experimentally verified by deploying it to three different PV systems (1,875 Wp, 2,000 Wp, and 2,700 Wp). Analytical and experimental results indicate that the proposed forecasting model ensures improved accuracy. The nRMSE of the proposed forecasting model is 2.841%. The proposed model will be effective in forecasting PV output power in existing PV systems. A guideline for accurately forecasting PV output power in practical applications is presented.


I. INTRODUCTION
The global demand for electric energy is increasing daily. Conventional electricity is mainly generated using fossil fuels, which are largely responsible for environmental pollution and global warming [1]. Therefore, the modern world intends to minimize the use of fossil fuels, and different initiatives for increasing the use of renewable energy have been undertaken because of the growing demand for electricity [2]. Consequently, global electricity generated from renewable energy sources reached almost 29.0% of the total generation in 2020 (REN21.2021).
Solar photovoltaic (PV) energy is a desirable green energy that is extensively used in the renewable energy market. The worldwide total installed capacity of PV systems grew from 39.0 GW to 760.0 GW in 10 years (2010-2020; REN21.2021). Owing to the strategies of different governments and relevant international organizations, this installation rate will be maintained or will increase in the future [3]. However, PV output power fluctuates strongly, thereby adversely affecting the stability and reliability of PV-integrated electric grid systems [4]. Increasing the forecasting accuracy of PV output power has therefore become a critical element in energy management systems, and numerous studies have been conducted to forecast PV output power.
The persistence model is generally used in verifying the forecasting capability of models and as a benchmark [5]. A number of deterministic approaches have been used in forecasting PV output power through physical methods and numerical techniques. Moreover, the numerical weather prediction (NWP) model is built on physical and structural assumptions about the PV system [6][7]. Several physical methods have been used in forecasting PV output power on the basis of various sources of meteorological information for a PV site, such as solar irradiance, temperature, humidity, and wind speed, while also considering the component status and the output characteristic curve; these methods are comparatively accurate and practical [8]. However, these types of models are complex and computationally demanding because of the chaotic nature of environmental factors [9]. Furthermore, huge investments in equipment, the huge distance between the earth and satellites, and expensive maintenance are among the shortcomings of NWP methods [10]. Statistical methods use the internal regularities of historical PV power data to predict future power. These methods are easy to implement, but their limited prediction accuracy is a major drawback [9]. By contrast, probabilistic prediction methods have attracted considerable interest in recent times [11]. Ensemble forecasting approaches are widely employed probabilistic techniques for forecasting PV output power [12][13][14]. These methods provide a set of forecasts and utilize observations and deterministic NWP datasets. Gaussian process regression (GPR) based on the Bayesian framework is a newly developed probabilistic forecasting technique [15]. However, the regression accuracy of this technique depends strongly on the precision of measurement and on unavoidable outliers in meteorological data [16].
Different time series models have been widely used in predicting PV output power [17]. However, these models have low forecasting accuracy because of the nonstationary behavior of solar irradiance (SI) [18]. Among artificial neural network (ANN)-based methods, the backpropagation neural network (BPNN) has been extensively used in forecasting because of its outstanding nonlinear mapping capability. In [19], a novel BPNN-based prediction model that considers an additional input parameter (aerosol index) is proposed for the prediction of PV output power for the next 24 hours. The experimental results indicate that the proposed approach performs better than traditional methods. Moreover, ANN-based methods have been widely used in PV/solar power forecasting problems [20][21][22]. However, these types of models require a historical dataset of appropriate size and a suitable selection of network parameters even when very similar tasks are performed [23]. A DCNN is employed to forecast the next 24-hour PV-generated power on the basis of historical meteorological data [24]. The proposed model can effectively predict complex time series with a high degree of irregularity. However, the forecasting accuracy of the model can still be improved by considering highly correlated weather forecast data [25].
The SVR-based method has become a popular approach and has been effectively used in several forecasting applications, including stock market prediction, electrical load forecasting, tidal current forecasting, wind power and speed forecasting, SI and PV output power forecasting, and tourist arrival forecasting [26][27]. A forecasting model based on a support vector machine (SVM) and satellite images for solar power prediction is proposed in [28]. This model showed better results than conventional time series and ANN models. In [29], an SVR-based model for forecasting PV output power under different environmental conditions is proposed. The results showed that the SVR-based model performed better than other models, including the ANN-based model. This model can perform even better if the dominant parameters of the SVR-based model are initialized properly. SVR is effective in solving various forecasting problems and shows rapid training speed [30][31][32]. However, some essential parameters that can significantly affect the performance of the model should be initialized appropriately and simultaneously for the construction of an SVR-based model. In most cases, the dominant parameters of the SVR-based model are fixed through a trial-and-error method [29]. Only a few optimized SVR-based models have been proposed for forecasting problems [33][34][35], which is not sufficient. The selection of appropriate parameters for the PV output power prediction model is crucial owing to the highly nonlinear dataset of SI and PV output power. Therefore, a suitable structured method for efficiently selecting the SVR-based model parameters (i.e., an optimized SVR-based model) in PV power forecasting needs to be developed.
On the other hand, previous studies suggested inputting forecasted meteorological data into the model for forecasting PV output power [7]. These data are usually collected from weather stations or meteorological centers located near PV sites [36]. However, most regions have no meteorological centers. Numerous forecasting approaches forecast the meteorological factors, including SI, on the basis of NWP or cloud images. The forecasting accuracy of an NWP-based model is still unsatisfactory. Satellite- and cloud image-based forecasting processes are not only complicated but also highly expensive [37]. A prediction system using these processes may not be economically viable for small or medium PV stations. Therefore, the development of an optimized SVR-based forecasting model as well as suitable guidelines for collecting meteorological information is required for the practical forecasting of PV output power.
In the current study, an SVR-based model based on the most significant historical dataset for forecasting PV output power is developed. The SVR-based model has three most influential parameters, which need to be initialized accurately and simultaneously. A PSO-based algorithm is established and combined with the proposed model for the optimization of the dominant parameters of SVR. The proposed optimized model is trained and tested using separate historical datasets, and the forecasting results are then analyzed using different performance metrics. An SI pattern of a forecasted day is constructed using a novel algorithm based on online weather reports. The proposed model forecasts the next-day PV power generation on the basis of this SI pattern and the atmospheric temperature dataset forecasted online. The design strategy of the proposed model is highly relevant and exhibits a significant level of originality. The proposed model simultaneously selects and optimizes the dominant parameters of SVR, minimizes the complexity of the model, and increases the accuracy to a significant level. The main contribution of this study is the presentation of a PSO-optimized SVR-based model for improving the forecasting accuracy of PV power generation. Moreover, a novel rule-based algorithm for obtaining SI from online weather reports and effectively forecasting PV output power in real-life applications is proposed. This paper is organized as follows: Section II discusses the preparation of the training and testing datasets and a novel algorithm for estimating the hourly SI from online weather reports. Section III describes the different methods, including the proposed methodology. Section IV discusses the performance evaluation metrics of the forecasting model. The results and discussion points of this study are presented in Section V. Finally, Section VI summarizes and concludes this study.

II. DATA COLLECTION AND ANALYSIS
In this study, the proposed model for forecasting PV output power is established based on the historical PV output power and respective meteorological data of an actual PV station. Thus, the PV output power and respective meteorological data, such as SI, atmospheric and module temperature, and wind speed, are collected from three different PV systems installed in an institutional building at the same geographical location (latitude = 03°09′ N; longitude = 101°41′ E). Table II presents the details of these PV systems. The three PV plants are of monocrystalline, polycrystalline, and thin-film types with installed capacities of 1,875 Wp, 2,000 Wp, and 2,700 Wp, respectively. These data were collected using an automatic data logger device at 5-minute intervals from July 1, 2016 to December 31, 2017.

A. SELECTION OF METEOROLOGICAL INPUTS FOR THE PROPOSED MODEL
The meteorological factors have variable effects on PV electricity production. Thus, identifying the most significant factors among them is vital to the PV power prediction model-building process. Correlation analysis (Equation-1) is employed, and the results show that SI and atmospheric temperature profoundly influence PV power generation. Therefore, only these two vectors are considered as meteorological inputs in the construction of the proposed forecasting model, which reduces the computational cost.
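The input selection step above can be illustrated with a minimal sketch, assuming that the correlation analysis is a Pearson correlation between each candidate meteorological vector and the PV output power (the source does not reproduce Equation-1 here). The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_inputs(candidates, pv_power):
    """Return (name, r) pairs sorted by |r|, strongest correlation first."""
    scores = [(name, pearson_r(values, pv_power))
              for name, values in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Made-up hourly values, for illustration only (not data from the study).
pv_power = [120.0, 480.0, 910.0, 1300.0, 1150.0, 600.0]
candidates = {
    "solar_irradiance": [90.0, 350.0, 640.0, 920.0, 800.0, 430.0],
    "atmospheric_temperature": [26.0, 28.5, 31.0, 33.0, 32.5, 30.0],
    "wind_speed": [1.2, 0.8, 1.5, 1.1, 0.9, 1.4],
}
ranking = rank_inputs(candidates, pv_power)
```

With such a ranking, only the vectors whose absolute correlation exceeds a chosen threshold would be kept as model inputs; the threshold itself is a design choice that the text does not specify.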

B. DATA PREPARATION FOR TRAINING THE PROPOSED MODEL
In the considered area, PV output power is usually available from 7:00 am to 7:00 pm, and this time range is known as daylight time. The historical sample data of measured PV output power and the respective meteorological dataset are considered only within the daylight time. Data size is reduced to nearly half by eliminating nighttime data. This procedure not only accelerates the simulation process but also improves prediction precision. Wrong sensor data and NaN values are replaced by the neighboring data or by the data at the same time on a similar day. The range of the entire hourly averaged data is reduced through normalization before the data are inputted into the model. This process limits the data to a range of 0-1 and reduces regression error. The normalization formula [5] is as follows:

$$x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_n$ and $x$ are the normalized and original values of the considered input data vector, respectively, and $x_{\min}$ and $x_{\max}$ are the minimum value and maximum value of the actual data of a particular input vector, respectively. Notably, the model output data must be anti-normalized.
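The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming hourly records are labeled by the hour in which they start, so that hours 7 to 18 cover 7:00 am to 7:00 pm; the function names are hypothetical. The anti-normalization simply inverts the min-max mapping using the minimum and maximum of the original vector.

```python
def daylight_only(records, start_hour=7, end_hour=19):
    """Keep (hour, value) records whose hour lies in [start_hour, end_hour)."""
    return [(hour, value) for hour, value in records
            if start_hour <= hour < end_hour]


def fit_min_max(values):
    """Return the (min, max) pair that defines the scaling of one vector."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant vector")
    return lo, hi


def normalize(values, lo, hi):
    """Map values into the range 0-1 using min-max scaling."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Invert normalize(): recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

The same `lo` and `hi` obtained from the training data would normally be reused for the test and forecast inputs, so that all data share one scale.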

C. PREPARATION OF INPUT DATASET FROM ONLINE WEATHER REPORT
In previous studies, the weather condition of the forecasted day is classified into different categories, such as sunny, rainy, cloudy, and foggy. Subsequently, PV power generation is forecasted based on the weather report of the forecasted day. However, weather conditions vary continuously with time on a particular day. Therefore, accurately describing the weather pattern of a forecasted day with a single word is impossible, and determining the hourly weather pattern of a forecasted day is extremely important. Online weather forecasting reports can be the best option for accurately determining the hourly weather conditions of the forecasted days. However, online weather stations (for example, www.accuweather.com) do not directly predict SI. These online stations forecast hourly weather conditions, such as sunny/clear sky, mostly sunny, partly sunny, partly cloudy, mostly cloudy, cloudy, and rainy.
Moreover, hourly atmospheric temperature, wind speed, cloud cover, and humidity are directly forecasted. However, apart from atmospheric temperature, the hourly SI of a forecasted day is essential for the proposed PV power forecasting model. Therefore, a novel algorithm for extracting the hourly SI of forecasted days on the basis of the weather conditions and percentages of cloud cover obtained from online weather reports is proposed. Furthermore, the maximum and minimum indexes for each type of available weather condition are fixed according to previous online weather reports and the analysis of historical recorded data. The range of cloud cover for each type of weather condition is determined on the basis of the analysis of online weather reports from the previous year. The details of these weather types, including the range of cloud cover with maximum and minimum indexing, are presented in Table III.
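One possible reading of this rule-based procedure is sketched below. It is not the authors' algorithm: the lookup values stand in for Table III and are entirely hypothetical, and two further assumptions are made, namely that the index is interpolated linearly across the cloud cover range of a weather type and that the index scales a reference SI value for the same hour (for example, a historical clear-sky value).

```python
# Hypothetical stand-in for Table III:
# condition -> (cloud_min %, cloud_max %, index_max, index_min).
# These numbers are placeholders, not values from the study.
RULES = {
    "sunny": (0, 10, 1.00, 0.90),
    "mostly sunny": (10, 30, 0.90, 0.75),
    "partly sunny": (30, 50, 0.75, 0.60),
    "partly cloudy": (50, 65, 0.60, 0.45),
    "mostly cloudy": (65, 80, 0.45, 0.30),
    "cloudy": (80, 100, 0.30, 0.15),
    "rainy": (80, 100, 0.15, 0.05),
}


def si_index(condition, cloud_cover):
    """Interpolate an SI index from a weather type and cloud cover (%)."""
    key = condition.strip().lower()
    if key not in RULES:
        raise ValueError(f"unknown weather condition: {condition!r}")
    cc_lo, cc_hi, idx_max, idx_min = RULES[key]
    # Clamp the reported cloud cover into the range of this weather type.
    cc = min(max(cloud_cover, cc_lo), cc_hi)
    frac = (cc - cc_lo) / (cc_hi - cc_lo)
    # Less cloud gives an index near idx_max, more cloud near idx_min.
    return idx_max - frac * (idx_max - idx_min)


def estimate_hourly_si(report, reference_si):
    """Scale a reference SI pattern by the index of each forecasted hour.

    report: list of (condition, cloud_cover_percent), one entry per hour.
    reference_si: list of reference SI values (W/m^2) for the same hours.
    """
    if len(report) != len(reference_si):
        raise ValueError("report and reference_si must have the same length")
    return [si_index(cond, cc) * ref
            for (cond, cc), ref in zip(report, reference_si)]
```

For instance, `estimate_hourly_si([("Sunny", 0), ("Cloudy", 100)], [800.0, 600.0])` would scale the first hour by the highest index of the sunny class and the second hour by the lowest index of the cloudy class.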

A. SUPPORT VECTOR REGRESSION (SVR)
SVM is a supervised machine learning method that was developed based on the structural risk minimization (SRM) principle [38]. This method has considerable generalization capability and is widely used for classification. SVR, an extended version of SVM, was developed for nonlinear regression estimation and can significantly minimize generalization error through the SRM principle.
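In the SVR approach, the original input data $x$ is mapped into a high-dimensional feature space through nonlinear mapping before linear regression is executed. A set of training data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is considered, where $x_i \in \mathbb{R}^d$ is the input vector (different environment data) and $y_i \in \mathbb{R}^1$ is the respective output data value (measured PV output power). The estimation function $f(x)$ is calculated as follows:

$$f(x) = w^{T}\varphi(x) + b$$

where $\varphi(x)$ is known as the feature mapped nonlinearly into a high-dimensional feature space from the input space, and $w$ and $b \in \mathbb{R}$ are the weight vector and bias term, respectively. The regression problem can be reformed by minimizing the regularized risk function as in (4):

$$R(C) = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n} L_{\varepsilon}\big(y_i, f(x_i)\big) \qquad (4)$$

where

$$L_{\varepsilon}\big(y_i, f(x_i)\big) = \begin{cases} 0, & |y_i - f(x_i)| \le \varepsilon \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise} \end{cases}$$

Here, $L_{\varepsilon}(y_i, f(x_i))$ is known as the loss function, and $\varepsilon$ is the radius of the $\varepsilon$-tube. Loss is zero when the forecasted value is within the tube. The other part, $\frac{1}{2}\|w\|^{2}$, measures the flatness of the function. The parameter $C$ specifies the trade-off between empirical risk and model flatness. Meanwhile, $\varepsilon$ determines the data that are ignored in the regression process. The $C$ and $\varepsilon$ values must be adjusted carefully. Two slack variables, $\xi_i$ and $\xi_i^{*}$, are considered, which denote the gaps between the actual values and the corresponding boundary values at the top and bottom of the $\varepsilon$-tube, respectively. Then, the regression function can be reformed into the following constrained problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^{*}\big)$$

$$\text{subject to: } \; y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i; \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}; \quad \xi_i, \xi_i^{*} \ge 0; \quad i = 1, 2, \ldots, n.$$

This constrained problem can be solved by incorporating the Lagrange multiplier method. The kernel function $K(x_i, x_j)$ satisfying Mercer's condition is introduced in place of the explicit mapping, and the following model is obtained:

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\big(\alpha_i - \alpha_i^{*}\big)\big(\alpha_j - \alpha_j^{*}\big)K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}\big(\alpha_i + \alpha_i^{*}\big) + \sum_{i=1}^{n} y_i\big(\alpha_i - \alpha_i^{*}\big)$$

$$\text{subject to: } \; \sum_{i=1}^{n}\big(\alpha_i - \alpha_i^{*}\big) = 0; \quad 0 \le \alpha_i, \alpha_i^{*} \le C; \quad i = 1, 2, \ldots, n.$$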
By solving the optimization problem, the best fitting regression function is as follows:

$$f(x) = \sum_{i=1}^{n}\big(\alpha_i - \alpha_i^{*}\big)K(x_i, x) + b$$

Regression accuracy greatly depends on the appropriate selection of a kernel function and its parameters. The Gaussian radial basis function (RBF) is the most employed kernel in SVR because of its single parameter and broad application scope. This function can distribute training data effectively. Furthermore, the kernel is highly capable of mapping training data nonlinearly into an infinite-dimensional space. Therefore, the RBF has been used as a kernel function for the development of the forecasting model proposed in this study.
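As a minimal sketch of how the RBF kernel enters the regression function, the snippet below assumes the common Gaussian form $K(x_i, x_j) = \exp\big(-\|x_i - x_j\|^{2} / (2\sigma^{2})\big)$, which the text does not write out. The function names and the `dual_coefs` argument, which holds the differences $\alpha_i - \alpha_i^{*}$, are illustrative. Note that LIBSVM parameterizes the same kernel as $\exp(-\gamma\|u - v\|^{2})$, so $\gamma = 1/(2\sigma^{2})$ under this assumption.

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel between two vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Convert the kernel width sigma to the gamma used by LIBSVM."""
    return 1.0 / (2.0 * sigma ** 2)


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(coef * rbf_kernel(sv, x, sigma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias
```

A larger `sigma` makes the kernel decay more slowly with distance, so each support vector influences a wider region of the input space.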

B. PARTICLE SWARM OPTIMIZATION (PSO)
In PSO, a swarm is formed by many particles that alter their positions in the multidimensional search space to find an optimum result [39]. Each particle changes its status according to its own characteristics and those of other particles. In a single iteration, each particle acquires a new velocity that is calculated based on its present velocity and on the gaps between its position and both its previous best position and the global best position. Then, this new velocity is utilized in estimating the next position of the particle. This procedure is repeated until a pre-specified smallest error is obtained. The algorithm updates the velocity and positions of the particles with the following equations:

$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1\big(p_{id} - x_{id}^{k}\big) + c_2 r_2\big(p_{gd} - x_{id}^{k}\big)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$

where $v_{id}$ and $x_{id}$ represent the velocity and position of the $i$th particle in dimension $d$, respectively, $w$ is called the weight, and $r_1$ and $r_2$ are uniformly distributed random values within a range of 0-1. The positive constants $c_1$ and $c_2$ are called cognitive and social parameters, respectively, $p_{id}$ and $p_{gd}$ are the best positions of the $i$th particle and of the swarm, respectively, and $k$ is the number of iterations.
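A single update of one particle can be sketched as follows. This is a minimal illustration of the standard PSO velocity and position rules described above; the function name is hypothetical, and the random factors are passed in explicitly (one pair per call rather than per dimension) so that the arithmetic can be checked by hand.

```python
def pso_step(position, velocity, personal_best, global_best,
             w, c1, c2, r1, r2):
    """Apply one PSO velocity and position update to a single particle.

    All vector arguments are lists of equal length (one entry per
    dimension); r1 and r2 are random factors drawn from [0, 1].
    """
    new_velocity = [
        w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + nv for x, nv in zip(position, new_velocity)]
    return new_position, new_velocity
```

For example, with position 1.0, velocity 0.5, personal best 2.0, global best 3.0, w = 0.5, c1 = c2 = 2, r1 = 0.5, and r2 = 0.25, the new velocity is 0.25 + 1.0 + 1.0 = 2.25 and the new position is 3.25.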

C. PROPOSED PSO-OPTIMIZED SVR-BASED MODEL
Three hyper-parameters, namely, the penalty factor ($C$), the insensitive loss function parameter ($\varepsilon$), and the kernel function parameter ($\sigma^{2}$ for RBF), are involved in the development of the proposed SVR-based model. If the value of $C$ is extremely high (approaching infinity), then the regularized risk function considers only the empirical risk, that is, $L_{\varepsilon}(y_i, f(x_i))$, and ignores the model flatness. The parameter $\varepsilon$ determines the width of the $\varepsilon$-tube, which is important to the proper fitting of training data. By increasing the value of $\varepsilon$, generalization capability can be enhanced, and the number of support vectors and the algorithmic computation complexity decrease. However, if the value of $\varepsilon$ is extremely high, then the function is flat because of the absence of sufficient support vectors. Parameter $\sigma$ determines the width of the Gaussian function, which reflects the distribution range of the training data $x$.
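The role of the tube width can be made concrete with a small sketch of the ε-insensitive loss, assuming the standard definition in which residuals inside the tube cost nothing. The function names and the residual values in the usage note are illustrative only. Samples that fall outside the tube are the ones that end up as support vectors with non-zero loss, so counting them for different tube widths shows why a wider tube leaves fewer support vectors.

```python
def eps_insensitive_loss(actual, predicted, eps):
    """Loss is zero inside the tube and grows linearly outside it."""
    return max(0.0, abs(actual - predicted) - eps)


def count_outside_tube(actuals, predictions, eps):
    """Count samples whose absolute residual exceeds the tube radius."""
    return sum(1 for a, p in zip(actuals, predictions) if abs(a - p) > eps)
```

For normalized values such as actuals `[0.50, 0.62, 0.30, 0.81]` and predictions `[0.52, 0.50, 0.31, 0.70]`, a tube of radius 0.005 leaves all four samples outside, whereas a radius of 0.050 leaves only the two samples with the largest residuals.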
Inappropriate hyper-parameters in SVR can lead to over- or under-fitting problems. Thus, all these parameters affect the construction of the forecasting model in various ways, and setting them appropriately is a great challenge in increasing forecasting accuracy. Even a small improvement in PV power forecasting accuracy strongly benefits smart electric networks and reduces the spinning reserve required for an economical electricity supply. Therefore, a PSO-based algorithm is incorporated for the appropriate selection of dominant parameters. Compared with the genetic algorithm (GA), PSO is free from complex evolution operators, such as crossover and mutation. The implementation of PSO is easy because few parameters need to be adjusted. Therefore, PSO is considered a good alternative for the appropriate selection of SVR parameters. The flow diagram of the proposed PSO-optimized SVR-based model for forecasting PV output power is shown in Fig. 2.
In this study, MATLAB-based LIBSVM [40] is used in developing the SVR-based model for forecasting PV output power. A suitable training dataset, which should be free of overlap and cover nearly all weather conditions, is prepared. This dataset is obtained through the analysis of historical data collected from an actual PV power station in previous years. Regression error is reduced by normalizing the entire dataset. First, the essential parameters of PSO, including the maximum iteration number, are initialized. The number of parameters, particles, cognitive ($c_1$) and social ($c_2$) parameter, weight, and maximum iterations are set at 3, 10, 0.1, 0.11, and 20, respectively. The ranges of the dominant hyperparameters of SVR are set according to previous knowledge and experience as follows: $C \in [1, 100]$, $\varepsilon \in [0.005, 0.050]$, and $\sigma \in [1, 10]$. For the hourly resolution day-ahead approach, the hourly average normalized data samples are prepared for training and testing the proposed model. Subsequently, the dataset is separated for training (nearly 80% of the data samples) and testing. In PSO, the values of each particle of every SVR parameter are initialized by using a set of random numbers between 0 and 1. The SVR-based model is constructed using the particles of each parameter, and PV power output is initially forecasted on the basis of the test meteorological datasets. Forecasting accuracy is evaluated based on the defined fitness function, and the record of the best performance is maintained as the global best. The training and testing procedures of the SVR-based model, including the updating of the velocity and position of every particle, are continued until the maximum iteration number is reached.
The optimum maximum iteration number should be determined in terms of the minimum cost. In each iteration, the local best is modified, and the global best is updated by comparing the local best with the previous global best.
The optimum values of the SVR hyperparameters are obtained from the global best when this process is completed. These optimum values are used in the SVR-based model, and PV output power is finally forecasted by inputting the meteorological dataset of the forecasted day prepared based on online reports. A genetic algorithm-optimized SVR-based model, a backpropagation ANN-based model, and a GPR-based model are considered alternative models for the comparative performance evaluation of the proposed forecasting model. The persistence model is used to establish a benchmark of the proposed model for forecasting PV power generation.
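The search loop described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB/LIBSVM implementation: the fitness function here is a hypothetical smooth stand-in for "train the SVR with these parameters and return the test error", and the values of `w`, `c1`, and `c2` are placeholders chosen for the sketch. Only the parameter ranges, the number of particles, and the number of iterations follow the text.

```python
import random

# Search ranges for (C, epsilon, sigma), as stated in the text.
BOUNDS = [(1.0, 100.0), (0.005, 0.050), (1.0, 10.0)]


def toy_fitness(params):
    """Hypothetical stand-in for 'train SVR, return forecasting error'."""
    c, eps, sigma = params
    return (((c - 50.0) / 100.0) ** 2
            + ((eps - 0.02) / 0.05) ** 2
            + ((sigma - 5.0) / 10.0) ** 2)


def pso_minimize(fitness, bounds, n_particles=10, n_iterations=20,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness over box bounds with a basic PSO loop."""
    rng = random.Random(seed)
    # Initialize each coordinate from a random number in [0, 1],
    # scaled into the allowed range of that parameter.
    positions = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_cost = [fitness(p) for p in positions]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                # Keep the particle inside the allowed parameter range.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            cost = fitness(positions[i])
            if cost < pbest_cost[i]:          # update the local best
                pbest[i], pbest_cost[i] = positions[i][:], cost
                if cost < gbest_cost:         # update the global best
                    gbest, gbest_cost = positions[i][:], cost
    return gbest, gbest_cost


best_params, best_cost = pso_minimize(toy_fitness, BOUNDS)
```

In a real setup, `toy_fitness` would be replaced by a function that trains the SVR on the training split with the candidate parameters and returns an error metric such as nRMSE on the test split, which is far more expensive per evaluation than this placeholder.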

IV. PERFORMANCE EVALUATION METRICS
Several metrics can be used in calculating forecast model error. The most commonly used performance evaluation metrics, namely normalized root mean square error (nRMSE), mean absolute percentage error (MAPE), maximum error, and average error, are considered in this study. The mathematical equations of the most used metrics are as follows:

$$\mathrm{nRMSE} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(P_{a,i} - P_{f,i}\big)^{2}}}{\max(P_a)} \times 100\%$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{P_{a,i} - P_{f,i}}{P_{a,i}}\right| \times 100\%$$

where $P_{a,i}$ and $P_{f,i}$ are the hourly values of the measured actual and forecasted PV output power, respectively. In addition, $\max(P_a)$ is the highest value among the measured actual PV output power, and the number of considered test samples is represented by $N$.
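The two main metrics can be sketched as follows, assuming the usual definitions: nRMSE as the root mean square error divided by the highest measured actual power, and MAPE as the mean of the absolute errors relative to the actual values. Skipping samples whose actual power is zero in MAPE is an assumption of this sketch and is not stated in the text.

```python
import math


def nrmse_percent(actual, forecast):
    """RMSE normalized by the maximum measured actual value, in percent."""
    n = len(actual)
    if n == 0 or n != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    return 100.0 * rmse / max(actual)


def mape_percent(actual, forecast):
    """Mean absolute percentage error, ignoring zero actual values."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```

For actual values of 100 and 200 with forecasts of 110 and 190, the RMSE is 10, so the nRMSE is 5%, and the MAPE is the mean of 10% and 5%, that is, 7.5%.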

V. EXPERIMENTAL RESULTS AND DISCUSSION
The proposed model for forecasting PV output power is developed through the abovementioned procedures. This model provides the optimized values of the SVR parameters as follows: C = 35.9119, ε = 0.0162, and σ = 2.9025. The other competitive models, namely the GA-SVR, ANN, and GPR models, are developed on the basis of appropriate guidelines. All the models are trained and tested using the same dataset. Three grid-connected PV systems are employed for the evaluation of the proposed forecasting model. The results are presented in Table IV. The average results of all the considered models are presented in Table V.
The preceding figures and tables suggest that the proposed model can accurately forecast PV output power. The average results in all the metrics show that the proposed forecasting model performs better than the other competitive models. A slight variation is observed across samples because of the variations in weather patterns. Nevertheless, the error of the proposed model is extremely low and is within an acceptable range. In addition, the nRMSE of the proposed forecasting model is 2.841%. This result satisfies the industrial requirement that the short-term prediction error of PV output power in RMSE should be less than 20% [5].
Clearly, the proposed model can accurately forecast PV output power on the basis of the forecasted most influential meteorological data, namely, SI and atmospheric temperature. However, although the hourly average atmospheric temperature of the forecasted day is directly obtained from online weather reports, the SI data are unavailable in these reports. Accurately forecasting the SI of a forecasted day is difficult because of its unsteady characteristic. Moreover, existing SI forecasting models have several limitations, such as low efficiency, complexity, and high cost because of the use of expensive equipment, and, unlike other meteorological factors, SI is not forecasted by online weather forecasting agencies. Therefore, the accurate preparation of the SI pattern is a considerable obstacle to effective PV power generation forecasting using the proposed model. In this study, a novel algorithm that appropriately determines the hourly SI pattern of the forecasted day from online weather reports is proposed. Five forecasted days with different weather patterns are selected randomly from July and August 2017. A meteorological dataset is prepared for each forecasted day one day in advance using the proposed algorithm and online weather reports. Then, the prepared day-wise meteorological dataset is inputted to the proposed PV power generation forecasting model.
PV output power is forecasted on the basis of online weather reports. The experimental results show that the PV output power forecasted by the proposed model nearly matches the measured actual PV output power in all the cases. The forecasting errors of the different models in various metrics are calculated for each day, and the results for PV system-1 are shown in Table VI. This process is repeated for PV system-2 and PV system-3 for the validation of the proposed model, using the same meteorological dataset prepared from online reports. The errors of all the models for each day in different metrics are calculated for PV system-2 and PV system-3. The average errors of all the PV systems for the different models in various metrics are shown in Table VII. The preceding tables imply that the proposed forecasting model forecasts PV output power more accurately than the other competitive models. The performance of the proposed optimized model is better than that of the simple SVR-based model because of the appropriate selection of hyperparameters, which significantly improves forecasting accuracy and reduces the computational cost and complexity of the proposed model. Notably, the forecasting error is higher than the model testing error when online data are used. This error may be reduced by forecasting weather conditions more accurately. The proposed data preparation algorithm, which reduces forecasting cost and complexity, is efficient in preparing the SI pattern of the forecasted day. This simple process is easy to implement when the PV power generation forecasting model is used. The proposed model forecasts PV output power under any weather condition with nearly the same accuracy. The overall error in nRMSE is 10.023%, thereby satisfying the industrial requirement of short-term PV power forecasting performance.
The correlation between the actual PV power and the PV power forecasted by the proposed model for PV system-1 is shown in Fig. 4. The correlation factor $r^{2}$ of 0.8614 indicates the excellent performance of the proposed model. The estimated computational time of the forecasting models for the direct dataset and the pre-processed dataset is shown in Table VIII. In the direct dataset, all the collected data vectors are considered directly. By contrast, in the pre-processed dataset, only highly correlated meteorological data vectors are considered, and the elimination of nighttime data and the normalization process are also applied. Table VIII clearly shows that the forecasting models in this study significantly reduce the computational cost owing to the data pre-processing strategies. In addition, the computational time of the proposed forecasting model is comparatively lower than that of the other optimization-based forecasting technique (GA-SVR).

VI. CONCLUSION
Currently, accurately forecasting PV output power remains a significant challenge for integrating PV energy into smart electric networks. This study presents a one-day-ahead PSO-optimized SVR-based model for forecasting PV output power. The proper optimization of the SVR parameters yields improved forecasting results. The preprocessing of input data, along with correlation analysis, reduces the computational cost and substantially improves regression precision. Moreover, a novel data preparation algorithm is used in obtaining the meteorological dataset of the forecasted day from online weather reports. The algorithm contributes to the practicality of the proposed model. The proposed model is validated experimentally using three different PV systems. The analytical and experimental results indicate that the proposed model can accurately forecast PV output power, and the results of the different metrics show that the proposed model outperforms the GA-SVR, SVR, ANN, and GPR models as well as the persistence model. The nRMSE of the proposed forecasting model is 2.841%, which satisfies the industrial requirements. In addition, the correlation factor of 0.8614 between the actual measured and forecasted PV output power indicates the excellent performance of the proposed model. The proposed technique considerably reduces the forecasting cost and complexity.
Finally, the proposed model is promising and practical for existing grid-connected PV systems.