Output Power Prediction of a Photovoltaic Module Through Artificial Neural Network

With the increase in energy demand, renewable energy has become a necessity for almost every country. Solar energy is an important constituent of it and contributes a large share. Forecasting the output power of a Photovoltaic (PV) system has been a challenging problem in the power sector for the last few decades. Provided the tilt angle is kept constant, the output power of a PV system depends on several environmental factors such as irradiance (G), temperature (T), humidity (H) and wind speed (W), among which irradiance plays the vital role. Researchers have used several techniques to predict the output power of a PV module, but every method has its pros and cons. In this paper, an experimental measurement dataset of 28296 samples from a Poly-Silicon (Poly-Si) PV module, with all the environmental parameters mentioned above as inputs and power as the output, is used to train an Artificial Neural Network (ANN) to predict the output power accurately. The proposed ANN has a layer size of 15 and is trained with the Levenberg-Marquardt algorithm. A detailed analysis and preprocessing of the data is carried out through Pearson's correlation method prior to training. The hyperparameters of the neural network are selected heuristically. The data is divided randomly, with 70% of the dataset used for training and 15% each for validation and testing. The statistical results show that the ANN accurately predicted the power output of the PV module. The regression values obtained are 98% and the MSE across all three phases is 0.0604.


I. INTRODUCTION
Renewable Energy (RE) refers to all those sources of energy that produce energy without consuming fossil fuels and without posing environmental or health hazards. Solar energy is an important constituent of RE, and a Photovoltaic (PV) module allows sunlight to be used for power production. Solar energy possesses all the features mentioned above and makes the most powerful contribution to RE [1], [2], [3], [4], [5]. However, there are a few concerns associated with the PV module. Economic and technical factors greatly affect the environmental and technological benefits of a PV system. It can only be made feasible for customers if public funding is available or government initiatives are taken to increase the reliance of electricity supply on RE. The PV module output is strongly affected by weather conditions, e.g. irradiance, temperature, humidity and wind speed. Some parameters have a greater effect than others. The worst case occurs when a band of clouds sweeps over a large area containing a large number of PV modules; the supply from the PV modules to the grid then drops, because the peak output power of the modules varies strongly. This is intensified further if the change in irradiance occurs during the maximum load hours [6]. Such a scenario is the worst case with respect to the optimal performance of a PV module. To avoid these worst-case scenarios, reliable prediction of the output power of a PV module is very important for planning and operational purposes. This prediction can be carried out with efficient tools such as Machine Learning (ML) and Deep Learning (DL).
The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso.
Several techniques have been used so far to maximize the power obtained from PV modules. To achieve the maximum efficiency of a PV module, the Maximum Power Point Tracking (MPPT) technique and several empirical techniques have been reported in the literature. Their disadvantage is that they require detailed knowledge of the physical parameters and manufacturing specifications of the PV modules, which are not always available. Several other adaptive methods are used in the literature to predict the output power of a PV module. In [7], different methods for forecasting PV power are studied and their accuracies are evaluated. A Multi input Support Regression Model is proposed to predict the power of a PV module [8]. In [9], an ANN based on ambient vectors is applied for prediction on a dataset obtained in Malaysia. General Regression Neural Network (GRNN) and Feedforward Back Propagation (FFBP) are applied to forecast the PV output power with four inputs: maximum temperature, mean temperature, minimum temperature and irradiance [10]. A method based on the Power Law Model (PLM) is used to predict the I-V characteristics of a PV module with temperature and irradiance as inputs [11]. A Nonlinear Autoregressive Neural Network is used to predict the PV model output power [12] from monthly and annual data. In [13], the author reported that an ANN trained by Particle Swarm Optimization (PSO) performs better than other techniques. A Recurrent Neural Network (RNN) is proposed in [14] for hourly forecasting of power output in Pakistan. A systematic literature review of PV power forecasting, covering primary studies carried out between 2010 and 2020, is presented in [15]. In [16], a Feedforward Neural Network is applied to hourly PV data obtained for Morocco, with irradiance and temperature as inputs.
The work already carried out in the literature on forecasting PV power output reveals that irradiance and temperature contribute greatly to the prediction. However, to estimate the exact power output of a PV module, all the environmental parameters, e.g. irradiance, temperature, humidity and wind speed, should be taken as inputs. Secondly, every researcher used a separate database from a specific location for predicting PV power output, so a standardized dataset containing all the environmental parameters should be used to estimate the PV power output correctly. Thirdly, specific reasons should be given for why the output power of a PV module depends on one parameter and not on another. In this paper, all environmental parameters, including temperature, irradiance and wind speed, are taken as inputs for accurate prediction of the output power of a PV module. The climate parameter of humidity is later filtered out due to the negative correlation between this parameter and the output power. Then, an efficient ANN with a layer size of 15 is trained on this experimental database of 28896 data samples. The training algorithm used is Levenberg-Marquardt because of its fast computation; the MSE is low, while the regression value is 0.98 on average across all three phases of training, validation and testing.
The paper is organized as follows: Section I gives a detailed introduction and literature review, Section II discusses the model of a solar cell and its characteristic I-V equations, and Section III gives a brief discussion of the methods applied. In Section IV, simulations and results are discussed, Section V presents the discussion and analysis, and Section VI concludes this study.

II. MODEL OF A SOLAR CELL AND POWER OUTPUT OF A PV MODULE
An accurate PV model should be able to forecast the Current-Voltage (I-V) and Power-Voltage (P-V) curves under real operating conditions, for designing and assessing the performance of a PV module. The most commonly used equivalent circuit is the 'five-parameter model', as it explains the electrical behavior of a PV system better than the alternatives. The one-diode equivalent circuit shown in Figure 1 comprises a photocurrent source I_L, a diode carrying a reverse saturation current I_0 in parallel with a shunt resistance R_sh, a series resistance R_s and a load resistance R_L. Mathematically, equation (1), which gives the I-V characteristic curve, can be derived from this model [17].
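A commonly used form of the single-diode characteristic equation is sketched here for orientation. It is an assumption about the form of equation (1) and may differ in detail from the expression in [17]; q (electron charge), k (Boltzmann constant), V (terminal voltage) and I (terminal current) are symbols introduced for this sketch. For a module of several series-connected cells, the thermal term in the exponent is usually scaled by the number of cells.

```latex
I = I_L - I_0\left[\exp\!\left(\frac{q\,(V + I R_s)}{n\,k\,T_c}\right) - 1\right] - \frac{V + I R_s}{R_{sh}}
```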
In equation (1), I_L depends on the irradiance level, I_0 depends on the silicon temperature, n is the ideality factor and T_c is the absolute temperature. The performance of a PV module is assessed by its 'peak power', which is the maximum electric power it supplies when it receives an irradiance level G of 1000 W/m² at a cell temperature of 25 °C. For given values of irradiance, temperature and load resistance, the operating point can be located by drawing load-resistance lines on the I-V characteristic curve [17], as shown in Figure 2 and Figure 3. The red circles mark the maximum power point. In Figure 2, the plot is between voltage and power. As the voltage increases, the power increases because the two are directly related, but there are different plots for different resistance values, since power is inversely related to resistance. The change in resistance is due to the change in irradiance level. With the temperature held constant, the maximum power point decreases as the resistance increases, so the red circle in Figure 2 sits at a lower position. In Figure 3, the plot is again between voltage and power, but here the irradiance level is kept constant while the temperature is varied. That is why, at a resistance value of 2.5, there are three different red circles, one for each temperature.
VOLUME 10, 2022

III. METHOD APPLIED
The method applied in this paper for forecasting the output power of a PV module is the ANN, which is inspired by the biological working of a neuron. Just as a neuron in the human brain processes signals, a neural network processes the dataset (inputs and outputs) of a dynamical system whose mathematical model is not easy to formulate. Its appeal is that no detailed information about the dynamical system is required, since collecting such information thoroughly is an extensive and cumbersome task. The working of an ANN depends on the working of its neurons. A neuron works as depicted in Figure 4. Each input is assigned to a node and given a specific weight; the weight is multiplied by the input value, and all the products are summed, along with the bias, at the transfer function block. The result of the summation is then passed through an activation function, which, in the case of the sigmoid, limits the output to between 0 and 1. Most commonly, the same activation function is used for all neurons. Types of activation function include the step function, the linear function and the sigmoid function.
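The neuron computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the function names, the input values, the weights and the bias are all hypothetical, and the inputs are assumed to be already normalised.

```python
import math


def sigmoid(z):
    """Logistic activation: maps a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical normalised inputs: irradiance, temperature, wind speed.
x = [0.8, 0.5, 0.2]
# Illustrative weights and bias, not trained values.
w = [1.2, -0.4, 0.1]
b = -0.3

y = neuron_output(x, w, b)
print(f"neuron output: {y:.4f}")
```

Swapping `activation` for a step or linear function changes only the last stage; the weighted sum is the same in every case.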

A. EXPERIMENTAL DATASET
The experimental database used for forecasting the output power of a PV module is obtained from [18]. A detailed and recent dataset is chosen so that the results are more reliable. The database is publicly available. In this database, data samples were obtained from 8 stations positioned at different locations and with different capacities. The one used in this research is from a single station with almost 28896 data samples, and its features are tabulated in Table 1. [...] PV panels that are laid in for the whole station, array tilt is the angle at which the PV panels receive the maximum radiation for maximum PV panel power output, while PV technology is the type of material of the PV panel. Panel size refers to the area of the PV panel. The latitude and longitude information represents the exact location of the station.
The experimental data used as inputs for the training of the ANN are irradiance (G), temperature (T), humidity (H) and wind speed (W).

B. PREPROCESSING OF THE DATASET
The peak power output P (W) is taken as the output parameter. Out of the total 28896 data samples, 70% are used for training while 15% each are used for validation and testing. The samples are assigned to these subsets at random so that the analysis in terms of MSE is accurate. Before starting the training phase, Pearson's correlation method is used to select the most relevant features, namely those with high correlation scores. This method is used because it is simple and computationally inexpensive: using a correlation measure, a score is calculated for every predictor in the data. A correlation matrix is computed between all four input parameters and the output parameter to determine the correlation among them. In this correlation matrix, the first column gives the correlation of humidity with itself and the other parameters, the second column is for irradiance, the third for temperature, the fourth for wind speed and the last for output power. From this correlation matrix, it can be seen that the correlation between humidity and output peak power is negative. From the mathematical model, humidity does not affect the output power of a PV module. Therefore, the humidity parameter is removed from the database to avoid unnecessary model complexity. All the correlation coefficients along the diagonal of the matrix are equal to 1, as each parameter is perfectly correlated with itself. The correlation between output peak power and each of the other parameters is shown as a bar graph in Figure 5.
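As a minimal sketch of the correlation step, the snippet below computes Pearson's correlation coefficient for every pair of columns. The readings are made-up toy values chosen only for illustration, not samples from the dataset in [18], and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Nested dict holding the correlation of every column with every column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy readings (hypothetical values, for illustration only).
data = {
    "humidity": [80, 70, 60, 50, 40],
    "irradiance": [100, 300, 500, 700, 900],
    "temperature": [15, 18, 22, 25, 27],
    "wind_speed": [2.0, 3.5, 1.0, 4.0, 2.5],
    "power": [10, 32, 49, 71, 88],
}

corr = correlation_matrix(data)
for name in data:
    print(f"{name:12s} vs power: {corr[name]['power']:+.3f}")
```

The diagonal entries come out as 1 because each column is compared with itself, and the matrix is symmetric, so only the column for power needs to be read when ranking the inputs.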

C. IMPLEMENTATION OF ANN
The exact architecture and scenario need to be kept in mind for a better understanding. The exact architecture is shown in Figure 6. The ANN is like a black box whose inputs are the inputs of the PV module, while its output is the output power of the PV module. For the implementation of the ANN, the hyperparameters are set heuristically on each trial so as to obtain the maximum regression value and the minimum MSE for all three phases, which is the desired response. These parameters and options are listed in Table 2.
To interpret the options listed in Table 2: the training algorithm 'Levenberg-Marquardt' (LM) is chosen because it is one of the fastest training algorithms in terms of computation. Secondly, MSE is chosen as the performance measure because it is easy to interpret. A layer size of 15 is selected because its training, validation and testing accuracies are much better than those of other layer sizes.
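To illustrate the idea behind the Levenberg-Marquardt update, the sketch below fits a two-parameter exponential curve rather than a neural network, which keeps the damped normal equations small enough to solve in closed form. The model, the data and the starting point are assumptions made for illustration; the training setup of this paper is not reproduced.

```python
import math


def model(x, a, b):
    """Illustrative two-parameter model y = a * exp(b * x)."""
    return a * math.exp(b * x)


def sum_squared_error(xs, ys, a, b):
    return sum((model(x, a, b) - y) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, a, b, damping=1e-2, iterations=50):
    """Minimise the squared error by damped Gauss-Newton steps."""
    error = sum_squared_error(xs, ys, a, b)
    for _ in range(iterations):
        # Accumulate J^T J (2x2) and J^T r (2x1) over all samples.
        h_aa = h_ab = h_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            residual = a * e - y
            j_a = e            # d(residual)/da
            j_b = a * x * e    # d(residual)/db
            h_aa += j_a * j_a
            h_ab += j_a * j_b
            h_bb += j_b * j_b
            g_a += j_a * residual
            g_b += j_b * residual
        # Solve (J^T J + damping * I) step = -J^T r by Cramer's rule.
        m_aa = h_aa + damping
        m_bb = h_bb + damping
        det = m_aa * m_bb - h_ab * h_ab
        step_a = (-g_a * m_bb + h_ab * g_b) / det
        step_b = (-m_aa * g_b + h_ab * g_a) / det
        trial_error = sum_squared_error(xs, ys, a + step_a, b + step_b)
        if trial_error < error:
            # Accept the step and trust the Gauss-Newton direction more.
            a, b, error = a + step_a, b + step_b, trial_error
            damping /= 10.0
        else:
            # Reject the step and move closer to gradient descent.
            damping *= 10.0
    return a, b, error


# Synthetic, noise-free data generated from a = 2.0, b = -1.5.
xs = [i / 10 for i in range(11)]
ys = [model(x, 2.0, -1.5) for x in xs]

initial_err = sum_squared_error(xs, ys, 1.0, 0.0)
a_fit, b_fit, final_err = levenberg_marquardt(xs, ys, a=1.0, b=0.0)
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}, error = {final_err:.3e}")
```

A step is kept only when it lowers the error, so the error never increases from one iteration to the next; the damping term decides how far each step leans towards Gauss-Newton or towards gradient descent.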
When training starts, certain stopping criteria are set, under which training is stopped either because of the validation criterion, [...]
Histogram plots make the results easier to interpret. The plot of instances versus errors is shown in Figure 9 for training, validation and testing. As can be seen from Figure 9, the zero-error line is drawn as an orange vertical line, while the training phase is shown in dark blue, validation in green and testing in red. The errors are at a minimum level.
The most important measure is the regression value, whose ideal value is 1. The regression plot shows how closely the predicted values match the target values. The regression analysis is plotted in Figure 10. The cluster of data samples reflects the size of the dataset on which the neural network is trained. There is also a fit line in all four subplots; the closer the values lie to that line, the closer the regression value is to 1. In the training phase the regression value is 0.9813, in the validation phase it is 0.98, and in the testing phase it is 0.9827. The overall value across all three phases is 0.9813.
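The two performance measures can be computed as sketched below, assuming that the regression value is the Pearson correlation coefficient R between targets and predictions and that the fit line is the least-squares line through those pairs. The target and prediction values are hypothetical.

```python
import math


def mse(targets, predictions):
    """Mean squared error between targets and predictions."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n


def regression_fit(targets, predictions):
    """Return (R, slope, intercept) for the line prediction ~ slope * target + intercept."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_pp = sum((p - mean_p) ** 2 for p in predictions)
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    r = s_tp / math.sqrt(s_tt * s_pp)
    slope = s_tp / s_tt
    intercept = mean_p - slope * mean_t
    return r, slope, intercept


# Hypothetical target and predicted powers (W), for illustration only.
targets = [10.0, 20.0, 30.0, 40.0]
predictions = [12.0, 18.0, 33.0, 39.0]

error = mse(targets, predictions)
r, slope, intercept = regression_fit(targets, predictions)
print(f"MSE = {error:.2f}, R = {r:.4f}, fit: y = {slope:.2f} * t + {intercept:.2f}")
```

A perfect predictor would give R = 1, a slope of 1 and an intercept of 0, which is why points close to the fit line correspond to a regression value close to 1.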

V. DISCUSSION AND ANALYSIS
The work carried out in this research verifies that the climate parameters which greatly affect the performance of a PV module are irradiance (G), temperature (T) and wind speed (W). These parameters cannot be controlled. As previously mentioned, there are three phases in an ANN, namely training, validation and testing. Efficient training requires a detailed database, which in this case consisted of 28896 data samples. Before training, preprocessing of the data is essential, as the complexity of the model is reduced if thorough filtering is done (in our case, filtering out the humidity parameter using the correlation matrix). After filtering, the next step is to split the data among the three phases. One option is to split the data section-wise, but for better accuracy the data is split randomly between the training (70%), validation (15%) and testing (15%) phases.
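A random 70/15/15 division of sample indices can be sketched as follows. The function name and the fixed seed are assumptions for illustration; the seed only makes the example repeatable.

```python
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains goes to testing
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(28896)
print(len(train_idx), len(val_idx), len(test_idx))
```

For 28896 samples this gives 20227 training, 4334 validation and 4335 testing indices. Because the three parts are slices of one shuffled list, no sample can appear in more than one phase.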
The results show that the ANN captured the dynamics of the system well enough to forecast the output power of the PV module. The overall results of the trained ANN for mapping the predictors to continuous responses are shown below for a concise and clear picture.
From the results listed above, it should be noted that validation and testing are not performed on the data that was used for training. Even so, their accuracies are 98%.

VI. CONCLUSION
In the current study, an ANN is successfully trained to accurately forecast the output power of a PV module under variations in different input climate parameters. The input parameters are irradiance, temperature, wind speed and humidity. A comprehensive experimental database of a PV module tilted 33° towards the south, containing 28896 data samples, is used. The accuracy of the ANN in terms of regression came out to be 98%, while the error in terms of MSE came out to be 0.0604 on average across all three phases. The proposed ANN with a layer size of 15 completed training in minimal computational time owing to the Levenberg-Marquardt training algorithm. The technique improves on existing work in that it takes all the environmental climate parameters as inputs and filters out the uncorrelated parameter to avoid model complexity.