Prediction of Solar PV Power Using Deep Learning With Correlation-Based Signal Synthesis

Enhancing dispatching capacity and grid management efficiency requires advance knowledge of photovoltaic (PV) power generation. PV power generation is intrinsically volatile and irregular, which impedes accurate prediction. This paper proposes deep learning-based approaches and a pre-processing algorithm to handle these constraints. The proposed scheme employs Pearson’s Correlation Coefficient (PCC) to measure the similarity between atmospheric variables and PV power generation. The atmospheric variables with the highest PCC values and the PV power time series are passed through Empirical Mode Decomposition (EMD), which breaks the complex data streams into Intrinsic Mode Functions (IMFs). To streamline the prediction process further, the proposed correlation-based signal synthesis (CBSS) algorithm finds the combinations of these IMFs for which the atmospheric variables and the PV power data are highly correlated. Long Short Term Memory (LSTM) and Nonlinear Autoregressive Network with Exogenous Inputs (NARX) deep learning models are employed to predict the IMF combinations in three configurations: three networks, a single network, and a direct approach. The LSTM network was analyzed under Adaptive moment estimation (ADAM), Stochastic Gradient Descent with Momentum (SGDM), and Root Mean Square Propagation (RMSP) optimization. Extensive experiments were conducted using atmospheric data from the Climate, Energy, and Water Research Institute (CEWRI), NARC, Islamabad, Pakistan. The RMSE, MAE, MAPE, and $R^{2}$ performance measures show promising prediction results for the LSTM under the three-network configuration with ADAM optimization.


I. INTRODUCTION
Due to limited fossil fuel reserves and environmental constraints, renewable energy sources (RESs) are gaining popularity and expanding at a fast rate [1], [2]. Amongst these RESs, solar energy has emerged as a significant resource due to its continuity and pollution-free status [3]. Therefore, any developing country needs to increase the share of solar energy in its combined energy generation mix [4]. Frequent disruptions in grid power and the unavailability of large-scale grid infrastructure are limitations in providing electricity to a dense population [5]. Where sunlight is plentiful, solar PV can generate sufficient electric power and offer on-grid and off-grid solutions. However, unlike conventional power generation sources, solar PV depends on few controllable variables. Solar energy is inconsistent and varying because it depends heavily on weather data [6]. This data includes solar irradiance, relative humidity, temperature, and wind speed [7]. These weather-related factors cause instability and fluctuations in the output power of grid-connected and standalone solar PV systems [8]. Therefore, precise and efficient prediction of solar PV power is important for the smooth functioning and regulation of power systems [9]. Accuracy in solar PV prediction is a key challenge in interconnecting large-scale solar power to a conventional grid system [10].
A Unified Power Quality Conditioner (UPQC) is proposed to overcome quality issues of grid-connected PV systems [11], [12]. However, solar power can be connected to the main grid smoothly by estimating the power obtained from the solar plants. Grid operators can then use the predicted solar power for efficient planning and decision processes [13]. The interdependence between climate input data and output power is considered. It has been shown that solar irradiance is strongly correlated with output power and is therefore useful for predicting PV power [14]. However, increasing the number of input variables improves the accuracy of the prediction model at the price of greater complexity and computational cost [15]. Therefore, the important climate parameters need to be selected using correlation to achieve high accuracy with minimum computational cost.
In the literature, solar PV prediction models are classified as statistical, physical, and artificial intelligence (AI) based models [16]. Physical models forecast solar irradiance and output power using geographical and weather data [17]. These parameters include wind velocity, ambient temperature, humidity, and air pressure. The accuracy of this approach depends directly on the accuracy of the climate data, and previous solar data are not considered [18]. Statistical models correlate time series and real-time data to predict future scenarios. This approach needs historical data, and computational problems and data accuracy remain challenging issues [19]. AI methods, including ANN, support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), LSTM, deep belief network (DBN), RNN, and CNN-based deep learning models, are suggested for the prediction of solar power output [20]. An LSTM deep learning model is implemented to forecast using time series data. This approach becomes difficult due to cyclic and seasonal variations in the data [21]. A hybrid model using ANN is suggested to predict solar irradiance and power [22]. An indirect deep learning method is implemented using the LSTM approach to estimate solar PV power from solar irradiance with the help of weather data [23]. Solar irradiance is predicted using an LSTM model with sunshine hours, humidity, and temperature as inputs. The results of the proposed model are compared with support vector regression (SVR), and the LSTM model shows improved accuracy under the root mean square error criterion [24]. A multihorizon prediction of solar irradiance with an LSTM model using irradiance, pressure, temperature, and wind speed as inputs is also implemented [25]. Different deep learning models, such as LSTM, SVR, RNN, and GRU (Gated Recurrent Unit), are implemented to predict solar irradiance with good accuracy [26]. In a comparative study between deep learning networks and machine learning-based models, Gradient Boosted Regression Tree (GBRT) and Feed Forward Neural Network (FFNN) are evaluated [27]. The physical models are directly related to climate parameters in predicting solar power, whereas AI-based methods can overcome these issues for short and long-term prediction [28]. The AI-based methods have limitations such as higher computational cost and lower performance when handling large data sets [29]. Machine learning models can extract less complex features from multi-dimensional data [30]. Hybrid models that combine several deep learning methods are also gaining importance for achieving better results in predicting solar power [31]. A hybrid model is developed using CNN and LSTM to forecast solar irradiance based on solar angle, wind speed, precipitable water, wind direction, and temperature parameters [32].
The research work in literature based on solar power forecasting related to machine learning models, ANN, RNN, and CNN is summarized in Table 1.

A. MOTIVATION AND RESEARCH CHALLENGES
The following motivation and key scientific challenges are associated with predicting PV power. These challenges require further investigation.
[...] attention recently. Current energy crises, such as the imbalance between supply and demand and power shortages, can be resolved using these renewable energy technologies [42]. Solar energy has achieved around 849 GW of capacity, featuring almost 28% [...]
2) Renewable Energy Resources: Solar PV is considered an encouraging replacement for fossil fuels. It is turbulent and irregular, as it is directly related to climate data. Due to this fluctuating property, there is an imbalance between the supply and demand of energy. Therefore, solar energy prediction is highly recommended to balance the supply and demand of solar power output accurately and to maintain stable operation [43]. Large-scale connection of solar PV to the grid poses challenges and stability problems for conventional power grids. Accurate prediction of solar PV will be effective in solving these issues.
[...] [36]. However, these methods are still at an early stage and need further exploration [44]. These deep learning approaches have several limitations, including slow convergence and efficiency degradation. The relationship between solar radiation and climate parameters has inspired different authors to predict solar power accurately, but their work mostly relies on linear relationships between variables and on selecting the inputs that achieve a minimum forecasting error [45]. Nowadays, machine learning (ML) and deep learning (DL) methods, such as RNNs and LSTM, are extensively used to forecast solar PV [46]. The results of these studies depend on several factors, including climate conditions and the forecasting horizon [47]. However, the focus should be on data selection and its effect on different models, which can improve the accuracy of prediction results. Comparing several models can provide better forecast results than relying on an individual network.

B. NOVEL CONTRIBUTIONS
Different weather conditions are considered to predict the highly fluctuating PV power. A correlation-based signal synthesis (CBSS) algorithm is proposed to find the interdependence between Intrinsic Mode Function combinations of atmospheric variables and output PV power generation data. The following contributions are made to address the challenges mentioned earlier:
• A correlation-based signal synthesis (CBSS) algorithm has been proposed (Figure 1 step 1) to find the IMF combinations of atmospheric variables that are highly correlated with IMF combinations of PV power generation data.
• Deep learning approaches, namely the LSTM and NARX networks, have been employed (Figure 1 step 2) in three different configurations: three networks, a single network, and a direct approach, i.e., training on atmospheric and PV power data without any time series signal decomposition. Further, extensive experimentation is performed for the LSTM network using three different optimization schemes: Stochastic Gradient Descent with Momentum (SGDM), Adaptive moment estimation (ADAM), and Root Mean Square Propagation (RMSP).
• The subsequent analysis (Figure 1 step 3) demonstrates a clear prediction performance advantage over the direct approach on the collected atmospheric and PV power generation data.
Different weather conditions were pre-processed before prediction by the LSTM and NARX deep learning schemes to forecast the highly fluctuating and volatile generation of PV power. To improve the forecast accuracy, the proposed CBSS algorithm is employed to find the appropriate and related combinations of EMD components of vital weather variables and PV power data. The proposed technique is applied to locally collected data from different atmospheric sensors. For the proposed CBSS-LSTM integrated model, the three-network configuration with ADAM optimization achieved an RMSE of 8.17, an $R^{2}$ value of 0.99, an MAE of 3.3, and a MAPE of 2.72. For the LSTM model with the same optimization, the direct approach without any CBSS pre-processing achieved an RMSE of 27.11, an $R^{2}$ value of 0.90, an MAE of 15.61, and a MAPE of 9.72. For the NARX network, the single-network configuration achieved an RMSE of 8.24, an $R^{2}$ value of 0.96, an MAE of 2.63, and a MAPE of 1.46. For the NARX network, the direct approach without any CBSS pre-processing achieved an RMSE of 25.43, an $R^{2}$ value of 0.88, an MAE of 16.55, and a MAPE of 9.79.
The proposed work is divided into different sections in this paper. Each section provides the details of the proposed techniques. The introduction, related motivation, and research challenges are discussed in Section I, which also presents the related literature review regarding PV prediction. The different experimental streams and the proposed model are detailed in Section II. Section III provides the experimental results and discussion, including performance measures, different optimization schemes, and network configurations for the deep learning approaches. The last section concludes the research work.

II. PREDICTION FRAMEWORK

A. OVERVIEW
The proposed methodology predicts photovoltaic power generation using deep learning approaches. These approaches involve daily weather variables such as maximum temperature, pan evaporation, and humidity. Multiple streams are employed to verify the effectiveness of the proposed scheme. Each stream uses either the NARX or the LSTM network as its basic deep-learning technique for prediction; additional preprocessing is an option that can further simplify the prediction process. Figure 2 provides the complete block diagram with the step-wise details of each experimental stream. The initial phase (Figure 2 step 1) measures the similarity between the weather variables and PV data using Pearson's correlation coefficient (r). Prediction is easier when the weather variables are similar to the PV data. This stage identifies the variables with the highest average correlation with the PV data and passes them on for further processing. In the case of direct prediction (Figure 2 step 4), the NARX network utilizes the previously selected weather variables along with the PV data for prediction. A single-layer LSTM network (Figure 2 step 5) is likewise employed for prediction using the previously selected weather variables. This network contains a fully connected layer and a regression layer for forecasting.
Empirical Mode Decomposition (EMD) employs these selected features in step 1 to give intrinsic mode functions (IMFs) and residual of each weather variable and PV generation (Figure 2

B. CORRELATION-BASED SIGNAL SYNTHESIS (CBSS)
A strong correlation between input and output data suggests that any change in the input data is related to a predictable change in the output data, which can improve prediction accuracy [48]. Applying Pearson's correlation formula results in the selection of the climate variables displayed in Figure 3. All the possible IMFs of these selected variables are shown in Figure 4. To quantify the impact of the possible IMF combinations of the input data on the possible IMF combinations of the photovoltaic power generation data, the dot product of both is used to analyze the dependency between the IMF combinations of the atmospheric input data and the target PV power generation data. A scalar product, or dot product, indicates the closeness of two data vectors or variables, and thus helps in understanding the impact of one variable on another. The scalar product of two vectors is computed by element-wise multiplication followed by summation. Further, a ratio R is calculated by dividing this product by the product of the magnitudes of the two vectors:
$$R = \frac{X \cdot P}{\lVert X \rVert \, \lVert P \rVert} \tag{1}$$
X is the IMF combination of weather variables and P is the IMF combination of PV power generation data. The measure ranges from −1 to +1. A value near +1 indicates a strong positive correlation, while a value near −1 indicates a strong negative correlation. A value at or near zero indicates no correlation between the two variables. This analysis aims to find the possible IMF combinations that maximize the correlation between input weather variables and PV power generation data.

FIGURE 5. IMF combinations of PV power generation along with the corresponding IMF combinations of the maximum temperature, pan evaporation, and minimum temperature weather variables.

VOLUME 12, 2024
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Algorithm 1 in the Appendix searches for the combinations of IMFs of the selected input weather conditions that maximize the correlation with the possible IMF combinations of the PV output data. The first step is therefore to form pairs of the IMFs of each weather variable and of the PV generation data. Algorithm 1 obtains these combinations in lines 1 to 9; line 5 computes the sum of the time series of each IMF pair. Lines 10 to 17 of Algorithm 1 compute the correlation between each IMF combination of each weather condition and each combination of PV generation data. Table 2 lists the best correlation-based combinations of IMFs of each weather condition and PV generation data, as produced by Algorithm 1. The first column identifies the pairs of output combinations (OCs), where OC1, OC2, and OC3 represent IMF 1 and residual, IMF 2 and IMF 3, and IMF 4 and IMF 5, respectively. Similarly, input combinations (ICs) are selected, where IC-IJ denotes the Jth IMF combination of weather condition I. The investigation considers three IMF combinations for the 1st and 2nd weather variables and two IMF combinations for the 3rd weather variable. The table also reports the correlation value R between the input combinations (ICs) and the output combinations (OCs). Figure 5 shows the IMF combinations of PV power generation (in red) and the corresponding IMF combinations of the selected atmospheric data: maximum temperature (in orange), minimum temperature (in blue), and pan evaporation (in green).
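The pairing and correlation stage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `r_ratio`, `pair_combinations`, and `correlation_table` are hypothetical, the ratio R is taken to be the dot product divided by the product of the vector magnitudes, and the toy sinusoidal components merely stand in for real IMFs.

```python
from itertools import combinations

import numpy as np


def r_ratio(x, p):
    """Dot product of x and p divided by the product of their magnitudes."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(p)
    return float(np.dot(x, p) / denom) if denom > 0 else 0.0


def pair_combinations(components):
    """Sum every unordered pair of components (IMFs plus residual).

    components: array of shape (n_components, n_samples).
    Returns a dict mapping the index pair (i, j) to the summed series.
    """
    components = np.asarray(components, dtype=float)
    return {
        (i, j): components[i] + components[j]
        for i, j in combinations(range(len(components)), 2)
    }


def correlation_table(input_combos, output_combos):
    """R between every input combination and every output combination."""
    return {
        (in_key, out_key): r_ratio(in_series, out_series)
        for in_key, in_series in input_combos.items()
        for out_key, out_series in output_combos.items()
    }


# Toy stand-ins for the decomposed components (assumption, not real data).
t = np.linspace(0.0, 1.0, 200)
weather = np.vstack([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 2 * t), t])
pv = np.vstack([2 * np.sin(2 * np.pi * 5 * t), 0.5 * np.sin(2 * np.pi * 2 * t), 3 * t])

table = correlation_table(pair_combinations(weather), pair_combinations(pv))
best_pair = max(table, key=table.get)  # combination pair with the highest R
```

Storing the index pair alongside each summed series keeps track of which components a combination uses, which a later selection step needs in order to avoid repeated components.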

A. DATA ACQUISITION
To predict the solar PV power generation, a 100 kW grid-connected system is considered. The experimental evaluation utilizes 24-hour data recorded at 0900 hours daily. The meteorological data is recorded at the Climate, Energy and Water Research Institute (CEWRI) field station, National Agricultural Research Centre (NARC), Islamabad, Pakistan. The recording location has latitude and longitude coordinates of 33.4° north and 73.8° east, with an altitude of 1632 feet. The location has multiple temperature, wind, pan evaporation, and solar sensors, as shown in Figure 6. Initially, the methodology includes meteorological data such as the maximum temperature, minimum temperature, wind speed, pan evaporation, rainfall, and relative humidity at 0900 hours daily. Additionally, the experimental analysis utilizes the relative and average humidity at 1400 hours. All the daily data, including PV power generation data, is considered for prediction from June to December 2020.

B. PIVOTAL ATMOSPHERIC VARIABLES TO PV POWER
PV power generation can depend on various atmospheric data streams. Maximum and minimum temperature, pan evaporation, humidity, and relative humidity are considered as PV power prediction factors. However, their effect on prediction is not the same for all weather factors: some factors have more influence on the prediction of PV power data than others. To determine the most influential factors, Pearson's Correlation Coefficient (PCC) is employed. The weather variables with the lowest PCC values are discarded from the input data stream as unrelated. This pre-processing not only simplifies the prediction but also reduces the computational complexity of the prediction process.
$$r_{x,y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^{2}} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^{2}}} \tag{2}$$
In equation 2, $r_{x,y}$ denotes the Pearson's Correlation Coefficient between the PV power generation data $y_i$ and a candidate weather data series $x_i$. $N$ denotes the length of the time series, and $\bar{x}$ and $\bar{y}$ are the averages of x and y, respectively. Figure 7 displays the PCC values for all the atmospheric variables for 30 days. Among these weather variables, maximum and minimum temperatures have the lowest variations. Together with pan evaporation, these two series have the highest PCC values compared to the rest of the weather variables. Therefore, wind and humidity-related factors are removed from the input.
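As an illustration of this selection step, the sketch below ranks candidate weather series by their PCC with the PV series and keeps the top ones. The function names, the `keep` parameter, and the toy numbers are assumptions made for the example; ranking by the signed PCC value follows the wording above, although ranking by its absolute value would be an equally plausible design choice.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return float(np.sum(dx * dy) / denom) if denom > 0 else 0.0


def select_variables(weather, pv, keep=3):
    """Return the names of the `keep` series with the highest PCC to pv."""
    scores = {name: pearson_r(series, pv) for name, series in weather.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Toy daily values (hypothetical, for illustration only).
pv_power = [10.0, 20.0, 30.0, 40.0, 50.0]
weather_data = {
    "max_temp": [1.0, 2.0, 3.0, 4.0, 5.0],   # rises with PV power
    "humidity": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as PV power rises
    "wind": [1.0, -1.0, 0.0, -1.0, 1.0],     # unrelated to PV power
}

selected = select_variables(weather_data, pv_power, keep=2)
```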

C. EMD
Empirical Mode Decomposition (EMD) typically includes the adaptive decomposition of any complex signal into multiple intrinsic mode functions (IMFs).These IMFs represent the energies related to different intrinsic time scales and separate any event in time and frequency.The decomposition of the time series consists of the following steps.
1) The basic step of the EMD algorithm separates each Intrinsic Mode Function (IMF) from the signal. The local minima and maxima of the signal are located and used to construct the upper envelope $E_u(t)$ and the lower envelope $E_l(t)$. The local mean $\mu$ is then calculated as the average of these two envelopes.
2) The local mean $\mu$ is subtracted from the original signal x to produce an oscillating component y. y is then treated as the new signal x, and the process is repeated from step 1 to obtain a new y.
3) The process repeats until the stopping criterion is satisfied. The criterion is fulfilled when y does not have more than two extrema of the same sign, ensuring that y is in a single oscillatory mode. If the condition is not satisfied, the process repeats.
4) After extraction of all IMFs, the residual signal is obtained by subtracting the sum of all IMFs from the original signal.
The residual represents the overall trend of the original signal.
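The sifting steps above can be sketched as follows. This is a simplified illustration rather than a full EMD implementation: it assumes linear interpolation for the envelopes (cubic splines are the usual choice), and it replaces the stopping criterion of step 3 with a fixed number of sifting passes. The names `local_extrema`, `sift_once`, `emd`, `max_imfs`, and `max_sifts` are hypothetical.

```python
import numpy as np


def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D signal."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def sift_once(x):
    """One sifting pass: subtract the mean of the two envelopes from x."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema to build envelopes
    n = np.arange(len(x))
    upper = np.interp(n, maxima, x[maxima])  # upper envelope E_u(t)
    lower = np.interp(n, minima, x[minima])  # lower envelope E_l(t)
    local_mean = 0.5 * (upper + lower)
    return x - local_mean


def emd(x, max_imfs=5, max_sifts=10):
    """Decompose x into IMF-like components and a residual."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    imfs = []
    for _ in range(max_imfs):
        candidate = sift_once(residual)
        if candidate is None:
            break
        # Fixed number of extra passes (assumption) instead of a criterion.
        for _ in range(max_sifts - 1):
            refined = sift_once(candidate)
            if refined is None:
                break
            candidate = refined
        imfs.append(candidate)
        residual = residual - candidate
    return np.array(imfs).reshape(len(imfs), len(x)), residual


# Toy signal: fast oscillation + slow oscillation + linear trend.
t = np.linspace(0.0, 1.0, 400)
signal = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 3 * t) + t
imfs, residual = emd(signal)
reconstructed = imfs.sum(axis=0) + residual
```

By construction, the extracted components and the residual add back up to the original signal, which mirrors step 4.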

D. PERFORMANCE MEASURES
Root Mean Square Error (RMSE) is the most commonly used performance measure to verify the prediction. It is the square root of the mean squared difference between the predicted values of an estimator and the original ground-truth values. Mathematically, it represents the standard deviation of the residuals, i.e., the differences between the data points and the regression line. RMSE measures how tightly these points cluster around the regression line. The closer the RMSE is to 0, the smaller the residuals; in other words, the points cluster more tightly around the line.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - x_i)^{2}} \tag{4}$$
In equation 4, $y_i$ denotes the actual values, $x_i$ is the predicted value, and $N$ denotes the total length of $y_i$ or $x_i$. Mean absolute error (MAE) measures the absolute difference between the original value and the predicted value. Mathematically, it computes the non-negative error between the prediction and the original value and averages these errors over all the data points.
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - x_i \rvert \tag{5}$$
In equation 5, $y_i$ denotes the actual values, $x_i$ is the predicted value, and $N$ denotes the total length of $y_i$ or $x_i$. The mean absolute percentage error (MAPE) measures the accuracy of the prediction or forecasting method. It is commonly used because it provides the relative error, and it is even utilized as a loss function in regression training models. MAPE is defined as
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left\lvert \frac{y_i - x_i}{y_i} \right\rvert \tag{6}$$
In equation 6, $y_i$ denotes the original value, $x_i$ is the predicted value, and $N$ denotes the total length of $y_i$ or $x_i$. The $R^{2}$ performance measure determines how well the data fit the regression or prediction model. It calculates the proportion of variation in the dependent variable that can be attributed to the independent variable. The value of $R^{2}$ ranges between 0 and 1. A value near zero means the data do not fit the regression model well, while a value near one implies a good fit. However, it does not give any indication of the accuracy of the regression model. Therefore, $R^{2}$ is typically used together with other performance measures. It is defined as follows.
$$R^{2} = 1 - \frac{\sum_{i=1}^{N} (y_i - x_i)^{2}}{\sum_{i=1}^{N} (y_i - \bar{y})^{2}} \tag{7}$$
In equation 7, $y_i$ denotes the original value, $x_i$ is the predicted value, and $\bar{y}$ denotes the average over y.
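For concreteness, the four measures can be computed as in the sketch below, with y as the actual values and x as the predictions. The standard textbook definitions are assumed, MAPE is expressed in percent, and the sample numbers are invented for illustration.

```python
import numpy as np


def rmse(y, x):
    """Root mean square error between actual y and predicted x."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean((y - x) ** 2)))


def mae(y, x):
    """Mean absolute error."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.mean(np.abs(y - x)))


def mape(y, x):
    """Mean absolute percentage error, in percent (y must be non-zero)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(100.0 * np.mean(np.abs((y - x) / y)))


def r_squared(y, x):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    ss_res = np.sum((y - x) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical actual and predicted PV power values.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

In this toy case every prediction is off by exactly 10 units, so RMSE and MAE coincide, while MAPE weights the same absolute error more heavily for the smaller actual values.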

E. EXPERIMENTAL SETUP
The experimental evaluation involves deep learning approaches, namely the LSTM and NARX networks. Three configurations are considered for the LSTM network to obtain a complete analysis. In the first configuration, three individual LSTM networks are trained separately, one for each of the three IMF combinations of PV power generation.
In the second configuration, a single network is trained with all three outputs, one for each combination. Lastly, a direct approach uses the maximum temperature, minimum temperature, and pan evaporation as inputs, with the PV power generation time series as the prediction target. Each network is trained using three different optimization functions.

FIGURE 8. Graphs of prediction results of LSTM and NARX networks. In the 1st row, graphs (a-c) represent three LSTM networks with one output each; in the 2nd row, graphs (d-f) relate to a single LSTM network with three outputs; in the 3rd row, graphs (g-i) are the results of the direct approach. The last row, graphs (j-l), provides the results of the NARX network.
Stochastic Gradient Descent with Momentum (SGDM) [49] is the most widely used of these optimization schemes. SGD updates the parameters iteratively, minimizing the loss function by taking small steps in the direction of the negative gradient of the loss function. However, plain SGD can oscillate along the path of steepest descent toward the minimum of the objective function. Therefore, a momentum term is added to mitigate the oscillations. The standard parameter update is given in equation 8, where $\theta_i$ denotes training parameter i, the learning rate $l$ should be greater than zero, $m$ denotes the momentum term, and $\nabla L(\theta_i)$ denotes the gradient of the loss function.
SGDM employs a single learning rate for all parameter updates. In contrast, Root Mean Square Propagation (RMSP) uses different learning rates for different parameter updates. It computes a moving average of the squares of the elements of the parameter gradient. In equation 9, $\alpha_2$ is the decay rate of the moving average, whereas in equation 10, $\epsilon$ is a small value that prevents division by zero when the square root of $u_i$ is zero. Adaptive moment estimation (ADAM) [50] has an update mechanism comparable to that of RMSP and additionally includes a momentum term. The parameter gradients and their squares are used to calculate element-wise moving averages.
In equations 11 and 12, $\alpha_1$ and $\alpha_2$ are the decay rates of the gradients and their squares, respectively. Equation 13 shows the RMSP-like parameter update procedure.
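To make the three update rules concrete, the sketch below implements one step of each in their commonly published forms and applies them to a toy quadratic loss. It is an illustration under assumptions, not a transcription of equations 8 to 13: the function names and hyperparameter values are hypothetical, and the ADAM step includes the bias correction of the standard formulation, which the equations in this paper may or may not use.

```python
import numpy as np


def sgdm_step(theta, prev_theta, grad, lr=0.1, momentum=0.9):
    """Gradient step plus a momentum term built from the previous update."""
    return theta - lr * grad + momentum * (theta - prev_theta)


def rmsp_step(theta, grad, u, lr=0.02, decay=0.9, eps=1e-8):
    """RMSP: scale each step by a moving average of squared gradients."""
    u = decay * u + (1.0 - decay) * grad ** 2
    return theta - lr * grad / (np.sqrt(u) + eps), u


def adam_step(theta, grad, m, u, t, lr=0.05, a1=0.9, a2=0.999, eps=1e-8):
    """ADAM: moving averages of gradients and squared gradients.

    Bias correction (standard formulation) is an assumption here.
    """
    m = a1 * m + (1.0 - a1) * grad
    u = a2 * u + (1.0 - a2) * grad ** 2
    m_hat = m / (1.0 - a1 ** t)
    u_hat = u / (1.0 - a2 ** t)
    return theta - lr * m_hat / (np.sqrt(u_hat) + eps), m, u


def loss(theta):
    """Toy quadratic loss with its minimum at the origin."""
    return 0.5 * float(np.sum(theta ** 2))


def grad(theta):
    """Gradient of the toy quadratic loss."""
    return theta


theta0 = np.array([1.0, -2.0])
initial_loss = loss(theta0)

theta, prev = theta0.copy(), theta0.copy()
for _ in range(200):
    theta, prev = sgdm_step(theta, prev, grad(theta)), theta
loss_sgdm = loss(theta)

theta, u = theta0.copy(), np.zeros_like(theta0)
for _ in range(500):
    theta, u = rmsp_step(theta, grad(theta), u)
loss_rmsp = loss(theta)

theta, m, u = theta0.copy(), np.zeros_like(theta0), np.zeros_like(theta0)
for t in range(1, 501):
    theta, m, u = adam_step(theta, grad(theta), m, u, t)
loss_adam = loss(theta)
```

The contrast to note is that SGDM applies one learning rate to every parameter, whereas RMSP and ADAM divide each parameter's step by a running estimate of its own gradient magnitude.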
The experimental evaluation of all proposed methods and direct approaches shows that the LSTM variants overall outperform the NARX networks. For a single NARX network, the RMSE, MAE, MAPE, and $R^{2}$ values are 8.24, 2.63, 1.46, and 0.96, respectively. The direct NARX approach has values of 25.43, 16.55, 9.79, and 0.88, respectively. Overall, the single NARX network has better performance scores than the three-network and direct NARX configurations. The three-network LSTM with ADAM optimization has the best RMSE value of 8.17, while the single-network LSTM with ADAM optimization has the best MAE value of 2.81. Similarly, the single NARX network has the best MAPE value of 1.46, and the three-network LSTM with ADAM optimization has the best $R^{2}$ value of 0.99.
Table 3 demonstrates the Neural Network units utilized for each configuration of the NARX network.It also exhibits the LSTM units for each optimization type and network configuration.Single-layer LSTM shows better overall performance over NARX.However, within the LSTM network, the three network approach performs better than other network configurations but at the cost of more LSTM units.
Experimentation is extended to Gated Recurrent Units (GRUs) implementation in Table 4.The lowest RMSE achieved is 7.26, which belongs to GRU three network implementation with the ADAM optimization.The same optimizer configuration with a single network results in the minimum MAE value of 2.99.While for the same network and optimization configuration, the MAPE and R 2 values are 2.13 and 0.99, respectively.On average, for all experimentations, the time cost of three network configurations is the least as compared to other conventional and proposed methods.
Figure 9 presents the error analysis of the proposed optimizations and configurations. The error and its standard deviation are considered for the three-network and single-network configurations across all optimization schemes. The ADAM, RMSP, and SGDM entries represent the error averages across the single-network, three-network, and direct-approach configurations. The direct approach shows the maximum error in all error measures. Figure 9(a) shows the average and standard deviation of MSE, Figure 9(b) shows the average and standard deviation of MAE, and Figure 9(c) shows the same for MAPE.

IV. CONCLUSION
This paper proposes deep learning approaches and a pre-processing reconstruction algorithm to improve the prediction of highly irregular, random, and uncertain PV power generation. The proposed approach employs PCC to select the environmental variables based on their similarity with the PV power data, removing unrelated atmospheric variables. EMD decomposes the time series of the selected variables, i.e., minimum temperature, maximum temperature, pan evaporation, and PV power generation. Despite the use of PCC, these decomposed components have no direct relation that would facilitate the prediction of the decomposed PV power components. To improve the accuracy of the proposed model, a correlation-based signal synthesis (CBSS) algorithm was proposed to find the combinations of these decomposed components that are related to combinations of PV power components based on correlation. Further, LSTM and NARX deep learning approaches are used to predict the PV power data in three different network configurations: three networks, one for each output; a single network for all outputs; and a direct approach without preprocessing. For the LSTM network, Adaptive moment estimation (ADAM), Stochastic Gradient Descent with Momentum (SGDM), and Root Mean Square Propagation (RMSP) optimization are utilized.
The results demonstrate that, under the LSTM approach with ADAM and RMSP optimization, the $R^{2}$ value is above 99%, which shows a good fit. Compared to other approaches, these two optimizations differ significantly in RMSE and MAE: these error measures have relatively low values for these model combinations, which indicates superiority over the other models. To meet the supply and demand of power, improve dispatching capacity, and support grid planning by the power department, a PV power forecasting method is proposed, which is advantageous to the power department. In future work, prediction accuracy can be improved further by optimizing the number of layers and the number of units in the LSTM networks, for example with different heuristic methods.

FIGURE 1. Optimization flow diagram of proposed deep learning-based PV power generation prediction.

FIGURE 2. Complete block diagram of the step-by-step implementation of the proposed prediction technique.

FIGURE 3. Graphs of the correlation-based chosen weather variables (maximum temperature, pan evaporation, and minimum temperature) and photovoltaic power generation data.

FIGURE 4. Intrinsic Mode Functions (IMFs) and residuals of atmospheric recordings of maximum and minimum temperature, pan evaporation, and photovoltaic power data by Empirical Mode Decomposition (EMD).
step 2). The IMFs are further fed into the proposed correlation-based signal synthesis (CBSS) algorithm (Figure 2 step 3) to find the summed IMF combinations of PV and weather variables that have the highest correlation. The synthesized data is given to a single-layer LSTM network (Figure 2 step 6a) with three outputs and to three single-layer LSTM networks (Figure 2 step 6b) with one output each. Similarly, the newly constructed data is presented to a NARX network (Figure 2 step 7a) with three outputs and to three NARX networks (Figure 2 step 7b) with one output each. Finally, for each stream, an evaluation (Figure 2 step 8) determines the prediction accuracy. Conventional measures like Mean Square Error (MSE) and Mean Absolute Error (MAE) are employed for the performance evaluation.
Line 13 computes the correlation values between these combinations. Then, line 15 selects the three highest correlation values, on the condition that the corresponding pairs have unique IMF components. This ensures that each IMF contributes equally to the prediction. For each output IMF combination, lines 18 to 20 sum the correlation values selected at line 15 across all input IMF combinations of the weather variables, which gives a sum $S_y$ for each output. In lines 21 to 27, if the total number of PV data combinations $N_y$ is odd, the algorithm selects quotient($N_y$, 2) combinations and their corresponding indices. Otherwise, the else branch at line 25 selects $N_y/2$ combinations and their corresponding indices. Based on the selected indices, Y combinations are chosen from $C_{k,v,f}$ for the next processing step of prediction.
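One possible reading of this selection step is a greedy pass over the output combinations in descending order of their summed score, skipping any combination that reuses a component already taken. The sketch below follows that reading; it is an interpretation, not the authors' code. The function name, the `n_select` argument (which the text sets to quotient($N_y$, 2) or $N_y/2$), and the example scores are hypothetical.

```python
def select_output_combinations(scores, n_select):
    """Greedily pick the highest-scoring pairs with non-repeating components.

    scores: dict mapping a pair (i, j) of component indices to its sum S_y.
    n_select: number of combinations to keep.
    """
    chosen = []
    used = set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for pair, _score in ranked:
        if used.isdisjoint(pair):  # no component may appear twice
            chosen.append(pair)
            used.update(pair)
        if len(chosen) == n_select:
            break
    return chosen


# Hypothetical summed scores; indices 0-4 stand for IMFs 1-5, 5 for the residual.
example_scores = {
    (0, 5): 2.9,
    (0, 1): 2.8,
    (1, 2): 2.5,
    (3, 4): 2.1,
    (2, 3): 1.0,
}
selected_pairs = select_output_combinations(example_scores, n_select=3)
```

With these invented scores, the pair (0, 1) is skipped because component 0 is already used by (0, 5), which shows how the non-repetition condition can override the raw ranking.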

FIGURE 9. Graphs of errors across various configurations.

Algorithm 1 Correlation-Based Signal Synthesis (CBSS) Algorithm
Input: IMFs of weather variables and PV generation data
Output: Selected output combinations $C'_{y'_u, k'_v, f}$
1: k ← 1
2: for all f in N do
3:   for all i in $N_f$ do
4:     for all j ← i + 1 in $N_f$ do
5-9: [...]
10: for all y in $N_y$ do
11:   for all f in N − 1 do
12:     for all k in $N_c$ do
13:       $r_{y,k,f}$ ← Compute correlation between $C_{k,f}$ and $C_{y,f_y}$
14:     end for
15:     $r_{y,k'(1:3),f}$ ← ∀k select top three values from $r_{y,1:N_C,f}$ such that their additive components of combination do not repeat
16:   end for
17: end for
18: for all y in $N_y$ do
19:   $S_y \leftarrow \sum_{f=1}^{N-1} \sum_{l=1}^{3} r_{y,l,f}$
20: end for
21: if $N_y$ is odd then
22:   $C_{y'(1:\mathrm{quotient}(N_y,2))}$ ← ∀y select quotient($N_y$, 2) combinations and their indices with highest values of $S_y$ such that their additive components of combination do not repeat
23:   Y ← [1 : quotient($N_y$, 2)]
24: else
25:   $C_{y'(1:N_y/2)}$ ← ∀y select $N_y/2$ combinations and their indices with highest values of $S_y$ such that their additive components of combination do not repeat
26-27: [...]
28: for all u in Y do
29:   for all v in 3 do
30:     for all f in N − 1 do
31: [...]

TABLE 1. Comparative analysis of related literature review.

TABLE 2. Output IMF combinations (OCs) and their corresponding input combinations (ICs) by Algorithm 1.

TABLE 3. Experimental results of the proposed and conventional techniques and their corresponding unit usage on LSTM and NARX.

TABLE 4. Experimental results of the proposed and conventional techniques and their corresponding unit usage on GRU.

Table 3 exhibits the RMSE, MAE, MAPE, and $R^{2}$ performance criteria. Figures 8(a), (d), (g) show the SGDM optimization responses for the three, single, and direct LSTM networks. For three LSTM networks, 14.44, 10, 6.8, and 0.9 are the corresponding values of RMSE, MAE, MAPE, and $R^{2}$. Similarly, for a single LSTM network, 16.03, 8.39, 5.78, and 0.96 are the values of the same performance measures. In contrast, the direct approach has values of 17.5, 11.01, 7.02, and 0.96 for the same performance criteria. For SGDM optimization, the three-network and single-network configurations have better overall performance than the direct approach. With ADAM optimization, the three-network configuration reports values of 8.17, 3.30, 2.72, and 0.99 for RMSE, MAE, MAPE, and $R^{2}$, respectively. The single network has values of 8.40, 2.81, 2.41, and 0.99 for the same criteria. In comparison, the direct approach has values of 27.11, 15.61, 9.72, and 0.90 for the above-mentioned performance measures. [...]) demonstrate the better overall prediction scores of the three networks as compared to the direct or single network scores. Figures 8(j), (k), (l) demonstrate the prediction curves for the three, single, and direct NARX networks. The three-output NARX network has performance values of 15.45, 11.43, 6.39, and 0.96 for RMSE, MAE, MAPE, and $R^{2}$, respectively.

Table 5 presents all variables and metrics used in Algorithm 1.