Design and Development of a Short-Term Photovoltaic Power Output Forecasting Method Based on Random Forest, Deep Neural Network and LSTM Using Readily Available Weather Features

Renewable energy sources (RES) are an essential part of building a more sustainable future, offering a higher diversity of clean energy, reduced emissions and less dependence on finite fossil fuels such as coal, oil and natural gas. Advancements in the renewable energy domain bring higher hardware efficiency and lower costs, which improves the likelihood of wider RES adoption. However, integrating renewables such as photovoltaic (PV) systems into the current grid remains a major challenge. The main reason is the volatile, intermittent nature of RES, which increases the complexity of grid management and maintenance. Access to accurate PV power output forecasts could reduce the number of power supply disruptions, improve the planning of available and reserve capacities and decrease management and operational costs. In this context, this paper explores and evaluates three Artificial Intelligence (AI) methods - random forest (RF), deep neural network (DNN) and long short-term memory network (LSTM) - applied to the task of short-term PV output power forecasting. Following a statistical forecasting approach, the selected models are trained on weather and PV output data collected in Berlin, Germany. The assembled data set contains predominantly broadly accessible weather features, which makes the proposed approach more cost-efficient and easily applicable even for geographic locations without access to specialized hardware or hard-to-obtain input features. The performance achieved by two of the selected algorithms indicates that the RF and DNN models are able to generate accurate solar power forecasts and to handle sudden changes and shifts in the PV power output.


I. INTRODUCTION
In recent years, climate change and global warming have become an increasingly prevalent threat to a safe and sustainable future on the planet. (The associate editor coordinating the review of this manuscript and approving it for publication was Arash Asrari. Volume 11, 2023.) The main cause for global
warming is the so-called ''greenhouse effect'', 1 which occurs when gases in the atmosphere prevent heat emitted from the surface of Earth from escaping into space. When it comes to the causes, the greenhouse effect is mostly a byproduct of human activity (e.g. deforestation and burning of fossil fuels). However, from an industry perspective, a large share of greenhouse gas emissions is produced by the electricity sector, 2 where the burning of fossil fuels is a standard practice for producing electricity. In an attempt to reduce the negative impact of the energy industry on the environment, many EU countries try to employ alternative, renewable energy sources that are more environmentally friendly. Some of the most common renewable energy sources are solar, wind, hydro and geothermal power. These sources are gaining more and more popularity and are used more broadly due to the reduced costs, improved infrastructure and governmental incentives (e.g. reduced fees, loan programs). According to a report by the Fraunhofer Institute for Solar Energy Systems (ISE), 3 in 2020 more than 50% of the net electricity generation in Germany was contributed by renewable sources. Additionally, the combined energy produced by wind and photovoltaics has surpassed the energy generated by fossil fuels. This is a clear indication that renewable energy sources are getting more popular and widely adopted.
With this in mind, the renewable energy transition (RET) and the integration of RES into the current grid system is not a straightforward process and hides a number of social, political and technological challenges [1], [2]. From a technological perspective, RES such as wind and solar are volatile and dependent on external environmental factors (e.g. wind speed, clouds, humidity, etc.). This variability and the volatile nature of RES is one of the most significant obstacles on the path towards wide RES adoption, since adjusting power output in a rapid manner is an essential part of the grid balancing process (i.e. matching the energy supply to the demand). Balancing the grid in the presence of uncertain power production levels introduces a significant increase in operational costs. The reason is that the variable nature of RES implies that energy production and consumption are bound to diverge at some point in time. For instance, during the night the wind energy generation might go up significantly, while the consumption goes down. The opposite event, where the consumption exceeds the production, could also take place - e.g. photovoltaic production goes down around sunset, but consumption does not. Such a mismatch between supply and demand not only requires a dedicated action from the grid operator for balancing the grid, but might also lead to financial loss caused by the need for rapid up- or down-ramps in the energy generation of conventional sources [3]. Being able to plan such actions in advance under uncertainty is essential for the secure operation and management of the grid [3]. Therefore, finding a forecasting approach that addresses the uncertainty introduced by RES could not only speed up their integration into the current grid system, but could also improve the utilization and management of the already available power plant infrastructure. As pointed out by Antonanzas et al.
[4], the absence of an accurate energy production forecasting method is ''one of the key challenges'' standing in the way of wide photovoltaic adoption. Therefore, implementing a reliable and accurate power output forecasting mechanism is of crucial importance for the renewable energy transition and would make grid management easier and more cost-efficient [4].
Within this context, in this paper we propose Machine Learning (ML) based forecasting mechanisms for PV power output prediction. The models generate an hour-ahead PV power output prediction based on an hour-ahead weather forecast. The main contributions of the current work are threefold:
• Contribution 1: We conduct a comprehensive analysis of several AI-based models for hour-ahead solar power output forecasting. The observations collected throughout this analysis are assessed with the help of a diverse set of metrics, based on which we identify the most effective models for predicting PV outputs with a high level of accuracy. As a result, we offer practical insights for practitioners in the solar energy industry.
• Contribution 2: We provide a list of readily available weather features used for the training of the selected forecasting methods. The importance of this contribution stems from the observation that models trained on data sets with hard-to-obtain input features are less likely to be widely adopted. The reason is that acquiring uncommon weather features (e.g. solar irradiance) is often financially infeasible and/or requires special infrastructure and hardware.
• Contribution 3: We examine and explore the impact of rapidly changing environmental and climate conditions on the overall model performance. Models trained on data collected from locations with fairly stable weather conditions might not reach high accuracy in locations with a more dynamic climate. Therefore, the models proposed in this work are trained on weather and PV data collected from Berlin, Germany. We take this location as a representative of the dynamics of the weather conditions throughout the course of a year in an effort to achieve more expressive and abstract models with respect to varying weather patterns.
Finally, the models discussed in this work are evaluated according to three of the most common metrics for regression tasks:
• Mean Absolute Error (MAE) computes the average of the absolute differences between the predicted values $\hat{y}_i$ and the actual values $y_i$. It is calculated with the following formula:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE is more robust to outliers because it does not penalize large differences between predicted and actual values (i.e. errors) as severely as MSE and RMSE do. Therefore, it is often used when the goal is to reduce the impact of extreme outliers.
• Mean Squared Error (MSE) computes the average of the squared differences between the predicted values $\hat{y}_i$ and the actual values $y_i$. It can be calculated with the formula:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Since it utilizes the squared instead of the absolute difference, MSE is more sensitive to outliers and is suitable for situations where we want to train a model that minimizes predictions that are significantly far away from the actual values. However, MSE is expressed in value ranges that are noticeably larger than those of the target variable, which makes interpreting the results more challenging.
• Root Mean Squared Error (RMSE) is computed as the square root of the MSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

The main advantage of this error metric is that it converts back to the same value range as the target variable, which makes interpretation easier than with MSE. At the same time, RMSE penalizes outliers more severely than MAE, which makes it a suitable choice for many ML regression tasks.
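The three error metrics above map directly to a few lines of code. A minimal NumPy sketch with toy values (the arrays are illustrative, not taken from the paper's data):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute deviation, robust to outliers."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean Squared Error: penalizes large errors quadratically."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: MSE converted back to the target's units."""
    return np.sqrt(mse(y_true, y_pred))

# Toy PV outputs in watts (illustrative values only).
y_true = np.array([0.0, 100.0, 250.0, 400.0])
y_pred = np.array([10.0, 90.0, 260.0, 380.0])

print(mae(y_true, y_pred))   # 12.5
print(mse(y_true, y_pred))   # 175.0
print(rmse(y_true, y_pred))  # ~13.23
```

Note how the single large error (20 W on the last sample) dominates MSE far more than MAE, which is exactly the trade-off described above.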

II. RELATED WORK & TECHNOLOGICAL CONTEXT
Solar power forecasting methods can be classified based on multiple criteria. However, two of the more common options [5], [6] include classification based on the time horizon and the forecasting technique.

A. CLASSIFICATION BASED ON TIME HORIZON
In general, PV forecasting methods can be classified into time horizon categories - very short-term, short-term, medium-term and long-term forecasting [5], [6], [7], [8]. Each of these categories includes forecasting methods that are useful for a specific stage of the grid management process (e.g. scheduling, maintenance, distribution, etc.). As pointed out by Das et al. [6], the exact definitions of each of these temporal horizons differ among researchers, but they can be approximately specified as follows:
• Very short-term forecasts focus on the PV power output from one second up to an hour ahead. They are useful for real-time power dispatch, power smoothing and electricity market clearing [6], [8], [9], [10].
• Short-term forecasting methods generate predictions from one hour up to one week ahead. They are primarily applied for economic load dispatch, load balancing, plant management and securing the grid operation [6], [9], [10].
• Medium-term forecasts address the power production from one week up to one month ahead. This type of forecasts is utilized for planning and scheduling of the power dispatch in the near future [6], [8].
• Long-term forecasting covers time horizons longer than a month and up to a year ahead. The main application of these forecasts is the planning of capacity deployment and the transmission and distribution of energy [6], [8], [11].
The temporal horizon has a significant effect on the accuracy of the forecasting model [5], [6]. Longer-term forecasts tend to be less accurate and produce larger prediction errors [5].

B. CLASSIFICATION BASED ON FORECASTING TECHNIQUE
There are roughly three categories in which a PV power output forecasting model can be placed based on the forecasting technique it utilizes. These categories are physical, statistical and hybrid models [4], [7], [9].

1) PHYSICAL MODELS
Physical models try to capture the relationship between input features and PV energy generation with the help of mathematical equations. More specifically, physical models use meteorological data and the design parameters and system characteristics of the PV system (e.g. location, orientation, etc.) to generate predictions [12]. As pointed out by Mayer et al. [12], physical models employ multiple steps (described as ''model chains'') before generating the final PV power output prediction [4], [12]. These steps include the computation/prediction of the solar irradiance and the consequent conversion of this value and additional inputs (e.g. PV system characteristics) into a PV output prediction [4], [12], [13]. When it comes to specific implementations, the three sub-categories of physical models are based on Numerical Weather Prediction (NWP), Sky Imagery and Satellite-Imaging models [7].

2) STATISTICAL MODELS
Statistical models use historical data to generate PV power output predictions. Typically, the historical data includes weather and power production measurements which are utilized by the model in an attempt to learn the patterns within the data set. Therefore, in order to collect such data, the power plant has to be active and running for a period of time. Additionally, in order to achieve high accuracy, statistical models require a large, high quality, historical data set that is a good representative of the task at hand. Acquiring such data is a challenging task. However, as pointed out by Antonanzas et al. [4], statistical models are the most widely adopted PV forecasting technique. They are flexible and their main advantage over physical models is that they do not require PV plant characteristics to generate predictions [4], [14].

3) HYBRID MODELS
As the name suggests, hybrid models combine multiple different forecasting techniques in an attempt to achieve optimal
results. The main intuition behind utilizing hybrid methods is that they leverage the strengths of the included techniques. Hybrid models can combine a physical with a statistical model, but they can also combine an ensemble of multiple different statistical approaches. It is often concluded in research literature that hybrid models deliver better predictive results in comparison to stand-alone methods [6], [10]. At the same time, hybrid models require a more complex setup and configuration, which introduces an additional developmental and time overhead. Therefore, before utilizing hybrid approaches, one should consider whether the amount of potential performance gain is worth the implementation effort and the time investment.

C. RESEARCH EFFORTS AND APPLICATIONS
Each class of the discussed PV output forecasting techniques has a unique set of advantages and disadvantages and can be applied successfully under specific circumstances. Which type of a forecasting technique will be selected depends on the goals of the project, the available data and the researcher's access to specialized infrastructure. Additionally, in recent years each class of the presented forecasting techniques is subject to extensive analysis and optimization, which in turn has led to the collection of new insights about the specific configurations and methods that deliver the highest PV forecasting performance.
In this context, Mayer and Gróf [12] propose a seven-step physical model chain for the computation of the PV power output. The authors assemble a wide range of different physical model chain combinations and compare these in order to gain a better understanding of the importance of selecting the proper model and modeling step within the physical model chain. The PV plant data used during the experiments is collected from 16 power plants distributed across Hungary [12]. In addition, the examined physical model chains utilize NWP data (i.e. global horizontal irradiance (GHI), temperature and wind speed) provided by the Hungarian Meteorological Service [12]. The evaluation and analysis of the presented models reveal that the two most important modeling steps within the model chain are the irradiance separation and the transposition modeling. Additionally, the authors of [12] observe in their experiments that more complex model chains achieve the lowest Mean Absolute Error, but simpler model chains achieve the lowest Root Mean Squared Error. Furthermore, according to the study results, the usage of the wind speed as an input feature provides only marginal improvements in the model accuracy [12].
Alternatively, a statistical-based PV forecasting approach is presented by Theocharides et al. [15]. In their work, the authors analyse and compare the performance of three standard ML models - Bayesian neural network (BNN), support vector regression (SVR) and regression tree (RT) - applied for the task of PV power production forecasting. The models were trained on PV and meteorological data (e.g. solar irradiance, wind speed, temperature, etc.) collected over a two-year period for a PV system located on the premises of the University of Cyprus in Nicosia, Cyprus [15]. The experimental results and analysis show that the BNN outperforms the other two approaches in terms of normalized root mean squared error (nRMSE) and mean absolute percentage error (MAPE). Additionally, the BNN not only adapted better to frequent fluctuations, but also showed better computational efficiency, ease of implementation and optimization [15].
Given the strong correlation between the solar irradiance and the PV power output [16], many researchers attempt to develop new optimized irradiance forecasting methods. For instance, Michael et al. [17] introduce a hybrid method for GHI and plane of array (POA) irradiance prediction trained on data collected from Sweihan Photovoltaic Independent Power Project in Abu Dhabi. The proposed solution combines a convolutional neural network (CNN) and a long short-term memory neural network. More specifically, the input data is processed first by CNN layers, which aim at detecting and extracting features from the observed data points [17]. The generated CNN output is forwarded to the LSTM network, which produces the final irradiance prediction. In order to assess the performance of the proposed model, it is evaluated in terms of RMSE, MAE, R 2 and mean absolute percentage error. The resulting performance metrics indicate that the hybrid approach outperforms not only the standalone CNN and LSTM networks but also other traditional ML models such as SVR, artificial neural network (ANN) and linear regression (LR) [17].
Another approach for predicting the solar irradiance, based on XGBoost, is proposed by Li et al. [18]. In their work, the authors generate a day-ahead solar irradiance forecast by combining XGBoost and kernel density estimation (KDE). The XGBoost model utilized in this work is trained on historical data and generates multiple predictions iteratively. These predictions are transformed into probability prediction intervals with the help of KDE. The main advantage of the presented solar irradiance prediction model is its higher time efficiency during training and hyperparameter tuning. At the same time, the accuracy achieved by this hybrid model is comparable to and in some cases even slightly higher than that of other common ML and DL techniques (e.g. SVR, RF).
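The core idea of combining a point forecaster with KDE-based intervals can be sketched as follows. This is a rough illustration on synthetic data, using scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost (the original implementation and data are not available):

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))         # synthetic weather features
y = 2 * X[:, 0] + rng.normal(0, 1, size=500)  # synthetic irradiance proxy

# Point forecaster trained on "historical" data.
model = GradientBoostingRegressor(random_state=0).fit(X[:400], y[:400])

# KDE over the training residuals turns point forecasts into intervals.
residuals = y[:400] - model.predict(X[:400])
kde = gaussian_kde(residuals)
samples = kde.resample(10_000).ravel()
lo, hi = np.percentile(samples, [5, 95])

point = model.predict(X[400:401])[0]
print(f"point forecast {point:.2f}, 90% interval [{point + lo:.2f}, {point + hi:.2f}]")
```

The interval width reflects the spread of the residual distribution, which is the essence of turning deterministic predictions into probabilistic ones.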
An alternative method that also uses a CNN architecture but focuses on direct 4 PV power output forecasting is proposed by Huang and Kuo [19]. The method, called PVPNet, utilizes a deep convolutional neural network model which is trained on meteorological data and PV system characteristics such as the module temperature, solar irradiance and the historic PV output. As in other CNN-based methods, the CNN layers are used for feature extraction purposes. The conducted experiments compare the proposed method with other common ML methods (e.g. LSTM, RF, SVR, ANN, etc.), and the experimental results indicate that on average PVPNet achieves lower MAE and RMSE than the other traditional ML methods.
A different perspective on PV output forecasting is also presented by Massaoudi et al. [8]. In their review article, the authors classify and summarize DL-based methods into three categories: discriminative learning, generative learning, and deep reinforcement learning. For each of the categories the article describes some of the most promising neural network architectures and relates these to the PV output forecasting domain. Additionally, the authors also discuss the role of federated learning, deep transfer learning, incremental learning and Big Data and their relevance for the generation of accurate PV predictions [8]. Finally, the paper concludes with a summary of the possible directions for future research and some of the present challenges in the domain, such as the lack of long-term forecasts, the need for explainability and interpretability of ML-based PV systems, the limited amount of real-world validation and the need for data privacy guarantees [8].
Given the immense improvements in the field of Deep Learning in recent years, most researchers try to improve the accuracy of their PV output forecasts by applying newer, more complex and more capable ML models. However, an alternative way to approach the task of improving accuracy is to focus on improving the input data quality. As demonstrated by Liu et al., this can be achieved by collecting additional input features such as the so-called ''aerosol index'' [20]. The aerosol index encapsulates the effect of aerosols in the atmosphere (e.g. dust) on the PV power production. Some aerosols prevent and/or significantly reduce the solar radiation that reaches the power plant, which in turn reduces the produced energy [20]. According to the results presented by the authors, the aerosol index improves the PV forecasting model accuracy for cloudy days [20]. Since the predictions generated during cloudy days are less accurate than those achieved during sunny conditions, the collection of input features such as the aerosol index could lead to significantly better predictive performance of the PV output forecasting models.

III. METHODOLOGY
The aim of this paper is to present an analysis and a performance comparison of three statistical short-term PV power output forecasting models. As depicted in Figure 1, the models are capable of predicting hour-ahead PV power output based on hour-ahead weather forecasts and are implemented as part of an ML pipeline that includes multiple essential stages - data collection and preprocessing, model training and model evaluation. The implementation and execution details involved in each of these stages are described in the subsections that follow.

A. DATA COLLECTION AND PREPROCESSING
In order to train a highly accurate ML model which generalizes well to previously unseen instances, we need high quality input data. For the purposes of this work, we utilize readily accessible meteorological and historic PV output data. This is contrary to many of the current state-of-the-art solar forecasting approaches, which use solar irradiance as an essential input feature in the solar forecasting procedure. Acquiring such features often requires specialized hardware and infrastructure that might not be readily available. In order to address this challenge and provide a forecasting solution that can be applied in a wider range of use cases, the solar forecasting models proposed in this work are trained on meteorological and PV data collected from two external APIs.
The first API is the PVOutput service [21], which was used for fetching the photovoltaic data generated by a PV system located in Berlin, Germany. The PV system consists of 30 solar panels, each capable of producing 185 W of power output, resulting in a maximum of 5500 W of overall power generation, which is also the largest value of our target feature. The second data source is the OpenWeatherMap API [22], which was used to collect historic weather data for the geographic coordinates of the selected PV system. The decision to choose these two APIs as data sources was governed by the observation that they provide a good balance between expected quality (e.g. input features, data coverage and representativeness), cost, ease of access and availability. The historic data collected from the two APIs contains samples from the years 2017, 2018, 2019 and the first two months of 2020.
With this in mind, after collecting the two data sets, the samples are joined together according to their matching timestamps. The resulting joined data set contains 13919 samples, each consisting of input weather features and the corresponding PV power output for the given time of day (see Figure 1). The time resolution represented by each sample is one hour. The further analysis and preprocessing steps performed on this data are described in the subsections that follow.
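The timestamp-based join described above can be sketched with pandas. The records below are toy stand-ins for the PVOutput and OpenWeatherMap data; the actual column names and schemas may differ:

```python
import pandas as pd

# Toy hourly PV output records (stand-in for the PVOutput data).
pv = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-06-01 10:00", "2019-06-01 11:00", "2019-06-01 12:00"]),
    "power_w": [2100, 3400, 4100],
})

# Toy hourly weather records (stand-in for the OpenWeatherMap data).
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-06-01 10:00", "2019-06-01 11:00", "2019-06-01 13:00"]),
    "temp_c": [18.5, 20.1, 22.3],
    "humidity": [62, 55, 48],
})

# An inner join on the timestamp keeps only hours present in both sources.
joined = pd.merge(weather, pv, on="timestamp", how="inner")
print(len(joined))  # 2 (10:00 and 11:00 match; 12:00 and 13:00 do not)
```

An inner join implicitly drops samples that exist in only one of the two sources, which is one plausible reason the joined data set is smaller than either raw feed.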

1) EXPLORATORY DATA ANALYSIS
The first operation performed on the joined dataset is the Exploratory Data Analysis (EDA), which provides insights about the data and guides the data processing steps performed later on in the pipeline. The exact procedures performed during the EDA include the following:
Data analysis: In order to collect more insights about the available training data, it is essential to examine the relationship between the features and the target variable. In this context, Figure 2 depicts a Pearson correlation plot which visualizes the positive and negative linear relationships between the input features (i.e. weather data) and the label value (i.e. PV power output) in the joined data set. The first observation from the figure is that there is a strong positive correlation between the Instantaneous Power and the Energy Generation features. The reason is that the Energy Generation is computed as a function of the Instantaneous Power:

$$E_{Wh} = P_W \cdot t_h$$

where $E_{Wh}$ represents the Energy Generation, $P_W$ represents the Instantaneous Power and $t_h$ is the time duration for which the PV system was producing energy. Given that the Instantaneous Power is our target variable, we cannot use the Energy Generation as an input feature. Therefore, during the data cleaning phase of the EDA this attribute is removed from the data set. What can also be observed in Figure 2 is the positive correlation between the ambient temperature features (i.e. temp, temp_min, temp_max and feels_like) and the target value. This behavior is expected, since higher ambient temperatures are associated with sunny and warm meteorological conditions and, therefore, this feature can impact the forecasting accuracy of the proposed solutions.
Furthermore, utilizing the ambient temperature is supported by the results presented in related work, which demonstrate the key role that the ambient temperature plays in the prediction of the PV power output. With this in mind, it is important to note that the temperature-related features have a Pearson correlation coefficient of approximately 1 among each other. This is another expected dependency and one which can serve as an indicator of duplicate information. Given that the temperature-related features are so strongly correlated, we can keep only the temp feature and remove all of the remaining duplicates without a loss of information. In addition to the ambient temperature, there are multiple other weather features that correlate with the PV power output. In order to further analyse this correlation, we can examine the scatter plot in Figure 3, which visualizes the relationship between the Instantaneous Power (on the Y-axis) and the remaining correlating features (on the X-axis).
In this context, the first relationship that can be analyzed is the one between the time of day and the target variable. Figure 3 illustrates that the PV output follows a parabolic pattern with relation to the time of day. This is expected, since the sunlight intensity increases throughout the day until high noon (apex of the sun's motion) and then it goes steadily down. This explains the positive Pearson coefficient in the correlation plot from Figure 2.
Another important correlation is the one between the Instantaneous Power and the humidity weather feature.
In contrast to the previously discussed weather features, the Pearson coefficient here is equal to -0.44, which indicates a negative linear relationship between the two variables. This observation is also supported by the humidity scatter plot in Figure 3 which, despite including many outliers, still illustrates an inverse relationship between the humidity and the Instantaneous Power. This inverse relationship is the subject of discussion of multiple related research articles [23], [24], [25], which also confirm the observed behavior. In simple terms, higher levels of humidity lead to the formation of water vapor on the surface of the solar cell, which reflects and refracts the incoming sunlight [23], [25]. This reduces the efficiency of the power production of the PV system.
Similar to the humidity feature, the clouds_all feature demonstrates a negative correlation coefficient of -0.12 in Figure 2. At first glance, this behavior seems reasonable, since clouds prevent some of the sunlight from reaching the solar cell. However, there are two unexpected occurrences observed in both Figures 2 and 3 with regard to the relationship between the cloud coverage and the PV power output. The first is that the correlation coefficient is relatively low, and the second is that there is no clear distinctive pattern to be recognized in the clouds_all scatter plot. The reason is that the clouds_all feature describes only the cloud coverage but contains no information about the altitude, density and type of the clouds. As Chrobak et al. point out [26], this information is essential for determining the magnitude with which the clouds impede the PV power production. Therefore, acquiring additional cloud-related features might be useful for increasing the PV output forecasting accuracy.
The last analyzed weather feature is the wind speed. Firstly, the scatter plot in Figure 3 indicates a slight inverse relationship between the wind speed feature and the Instantaneous Power. At the same time the correlation coefficient between the wind speed and most other features is very close to zero. Therefore, based on this first analysis, it is challenging to make a conclusive statement about the impact of this feature on the overall model performance. Nevertheless, the behavior exhibited in Figures 2 and 3 is consistent with the findings by Mayer and Gróf [12], which state that the wind speed brings only marginal improvements to the PV output forecasting accuracy.
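The Pearson correlation analysis and the removal of near-duplicate temperature columns described above can be sketched as follows. The data is synthetic and the near-1 correlations are constructed for illustration; only the column names mirror the text:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
temp = rng.normal(15, 8, 1000)
df = pd.DataFrame({
    "temp": temp,
    "temp_min": temp - rng.uniform(0, 1, 1000),  # near-duplicate of temp
    "temp_max": temp + rng.uniform(0, 1, 1000),  # near-duplicate of temp
    "humidity": rng.uniform(30, 100, 1000),
    "power": 100 * np.clip(temp, 0, None) + rng.normal(0, 50, 1000),
})

# Pairwise Pearson coefficients, as visualized in Figure 2.
corr = df.corr(method="pearson")
print(corr["power"].round(2))

# Drop columns that carry duplicate information (correlation ~1 with temp).
redundant = [c for c in ["temp_min", "temp_max"] if corr.loc["temp", c] > 0.99]
df = df.drop(columns=redundant)
```

Inspecting the full correlation matrix before dropping columns is what guards against silently discarding a feature that only *appears* redundant.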
Data cleaning: In addition to the features identified during the data analysis step, there are other attributes that have to be removed from the data set. For instance, a standard practice is for columns with missing values to be either removed completely or to have the missing values replaced with the help of various imputation techniques. In the scope of this work, the columns with a large number of missing values were removed and no imputation techniques were employed. This decision was governed by the conducted data analysis, which showed that the columns with a large number of missing values have little to no predictive power for the PV power output. As a result, the final set of features selected for the training of the forecasting models includes the following:
• Ambient temperature: The temperature of the surrounding environment (measured in °C).
• Atmospheric pressure: The force exerted by the weight of the air above a given area/surface (measured in hPa).
• Humidity: The amount of water vapor present in the air (measured in %).
• Clouds percentage: The proportion of sky area covered by clouds at a given location and time (measured in %).
• Wind speed: The speed with which an air mass moves horizontally past a given point (measured in m/s).
• Month: One of the twelve months in a calendar year.
Data preparation: After analysing and cleaning the data, the final stage of the data pre-processing pipeline is the data preparation. This stage aims at converting the data into a format that allows the training and evaluation of the ML models. There are three main steps executed during the data preparation phase:
1) Data partitioning is performed in order to separate the data into the standard train, test and validation splits. For the purposes of this work, the data is divided into 80% training, 10% test and 10% validation sets. Important to note here is that there are two data splits used for the training of the LSTM network. The first is a univariate split where only the historic Instantaneous Power values are used to predict the current one. Conversely, the second split uses both historic weather data and the Instantaneous Power to predict the current PV output.
2) Handling categorical data is a data transformation procedure used to convert categorical features into their numeric equivalents. The two categorical weather features included in the data set are related to the weather description (e.g. clouds, broken clouds, clear, rain, etc.). These features are transformed into numerical ones with the help of one-hot encoding.
3) Feature scaling is a data transformation used for scaling all input features to the same value range. The feature scaling method used for the purposes of this work is the so-called min-max normalization, which normalizes the input features to the range between 0 and 1.
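The three preparation steps above can be sketched with pandas and scikit-learn. The data and column names are toy placeholders; the paper does not specify its exact implementation:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the cleaned weather/PV data set.
df = pd.DataFrame({
    "temp": [5.0, 12.0, 18.0, 25.0, 9.0, 21.0, 15.0, 3.0, 28.0, 11.0],
    "humidity": [80, 60, 55, 40, 75, 45, 65, 85, 35, 70],
    "weather": ["clouds", "clear", "clear", "clear", "rain",
                "clear", "clouds", "rain", "clear", "clouds"],
    "power_w": [300, 2500, 3800, 5200, 900, 4500, 2100, 100, 5400, 1800],
})

# 2) One-hot encode the categorical weather description.
df = pd.get_dummies(df, columns=["weather"])

X, y = df.drop(columns=["power_w"]), df["power_w"]

# 1) 80/10/10 split: carve out 20%, then halve it into test and validation.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# 3) Min-max scaling to [0, 1], fitted on the training data only (no leakage).
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler on the training split alone mirrors standard practice: the test and validation data must not influence the scaling parameters.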

B. MODELS
The PV power output forecasting solutions proposed in this work are developed with the help of three common Machine Learning algorithms - random forest, deep neural network and long short-term memory neural network. Random forest is a supervised ensemble learning algorithm proposed by Breiman [27] that can be used for both classification and regression tasks. A random forest forms its final prediction by combining and aggregating the predictions of multiple decision trees which are trained on different subsets of the training data. Figure 4 shows an example of a single decision tree. Decision trees work based on a chain of if...then...else conditionals that are traversed in sequential order from the root to the leaf nodes, where the leaf nodes represent either a class (i.e. for classification tasks) or a continuous value (i.e. for regression tasks). Constructing decision trees is often achieved with greedy algorithms for performance reasons. In the context of the random forest algorithm, each decision tree in the ensemble is trained on a dedicated so-called bootstrap sample, which is a fixed-size subset of the training data. The data instances in the bootstrap samples are selected based on a random sampling strategy with replacement. Additionally, each tree is trained on a subset of the complete feature set, which prevents features with higher predictive power from being universally chosen by a large number of trees and consequently suppressing less contributing features. Utilizing the bootstrapping (also known as bagging) procedure for the data samples and the feature set allows the individual trees to generate diverse predictions. Aggregating these predictions leads to less overfitting and more accurate results compared to the results produced by a single decision tree.
With this in mind, the random forest implemented in the scope of the current work does not rely on the standard training configuration. Instead, in order to maximize the model performance, the RF was configured with parameters selected with a hyperparameter optimization technique called randomized search. The hyperparameters that the randomized search identified as the most accurate configuration are summarized in Table 1. These parameters were utilized for the training of the final RF model discussed in Section IV.
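A sketch of how such a randomized search could look with scikit-learn's RandomizedSearchCV on synthetic data; the search space below is hypothetical and merely stands in for the actual ranges behind Table 1:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.random((200, 6))                                       # six weather features
y = 3000 * (1 - X[:, 2]) * X[:, 0] + rng.normal(0, 50, 200)    # synthetic PV output

# Hypothetical search space; the values actually selected are listed in Table 1.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20, 40],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", 0.5, 1.0],
}

# Randomized search samples a fixed number of configurations (n_iter) instead
# of exhaustively trying every combination like a grid search would.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```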
Deep neural network: Deep neural networks are a versatile AI method that can be used for learning complex predictive tasks. Neural networks are structured as a collection of small components called neurons, which are connected and transfer signals between one another (see Figure 6). A single neuron performs two main operations. The first operation is the computation of the weighted sum of inputs, which, as the name suggests, sums all inputs x_i multiplied by their corresponding weights w_i and adds a bias term b:

z = Σ_i w_i x_i + b

The result of this first computation is then forwarded to an activation function, which determines if the neuron is activated or not. Among the most commonly utilized activation functions are the Sigmoid σ(x), the Hyperbolic tangent function tanh(x) and the Rectified Linear Unit (ReLU) [29]. Their corresponding mathematical formulas are defined as:

σ(x) = 1 / (1 + e^(−x))
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
ReLU(x) = max(0, x)

Activation functions play an essential role in the training process of deep learning models, since they introduce nonlinearity, which in turn enables the model to learn complex relationships between the input features and the target variable. The activation function influences the learning rule of the algorithm through its derivative, which is essential for determining the model parameter updates by computing the gradients of the loss function.
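The two neuron operations described above can be sketched in a few lines of NumPy (the weights, bias and input values below are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

# A single neuron: weighted sum of inputs plus bias, passed through ReLU.
w = np.array([0.5, -0.3, 0.8])   # weights
b = 0.1                          # bias
x = np.array([1.0, 2.0, 0.5])    # inputs
z = np.dot(w, x) + b             # weighted sum: z = sum_i w_i * x_i + b
print(relu(z))                   # ≈ 0.4
```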
After explaining its inner workings, it is important to note that a single neuron is not capable of learning very complex tasks. Therefore, researchers stack multiple layers of neurons and produce more capable models called Deep Neural Networks. Figure 6 illustrates a basic architecture of a neural network with one hidden layer.
Similar to the processing steps performed by a single neuron, the neural network executes a series of sequential weighted sum and activation function computations, which produce a final prediction. In order to evaluate the model accuracy, we compute the so-called loss function, which measures the difference between the generated prediction and the original value. The main goal of the DNN training process is to minimize the prediction error and this is achieved with the help of the so-called backward propagation (or ''backpropagation''). During the backpropagation pass through the network we compute the gradient of the cost function with respect to the network weights by applying the chain rule of calculus. (In calculus the chain rule is used to compute the derivative of a composite function, e.g. f(g(x)). This is performed by decomposing the composite function into simpler ones and applying the derivative to each sub-function individually: (f(g(x)))′ = f′(g(x)) · g′(x).) Once the gradients are calculated, they are used to adjust the weights and biases of the network in a way that minimizes the loss function. The most common optimization algorithm used to perform the forward and backward passes through the network is called gradient descent. As indicated previously, the intuition behind the gradient descent algorithm is to update the model parameters (i.e. the weights and biases) by moving in the direction of the steepest descent of the cost function. Mathematically this can be expressed with the following formula:

θ_(t+1) = θ_t − η ∇J(θ_t)

In this equation, θ_t refers to the model parameters during iteration t, η represents the so-called learning rate and ∇J(θ_t) encapsulates the gradient of the cost function with respect to θ at iteration t. The fundamental idea underlying this expression is that by subtracting the gradient ∇J(θ_t) multiplied by the learning rate η from the current iteration parameter values θ_t, we move in the direction of the steepest descent, which in turn minimizes the cost function.
With a clear understanding of the equation's components, we can now explore more in-depth the role of the learning rate in the optimization process. The learning rate η is a hyperparameter that controls the size of the parameter updates during gradient descent. More precisely, η influences how fast or slow the algorithm will converge to an optimal solution. Larger learning rates might lead to faster convergence, but they could also cause negative effects - e.g. surpassing (or ''overshooting'') the global minimum. In comparison, smaller learning rates provide more stable convergence, but require more gradient descent iterations and consequently lead to longer training time.
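As a minimal illustration of the update rule, the following sketch minimizes a simple one-dimensional cost J(θ) = (θ − 3)² with plain gradient descent; the cost function and learning rate are chosen purely for demonstration:

```python
# Minimize J(theta) = (theta - 3)^2 with plain gradient descent.
# Its gradient is dJ/dtheta = 2 * (theta - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0      # initial parameter value
eta = 0.1        # learning rate
for t in range(100):
    # theta_{t+1} = theta_t - eta * grad J(theta_t)
    theta = theta - eta * grad(theta)

print(theta)     # converges toward the minimum at theta = 3
```

With a larger learning rate (e.g. eta = 1.1) the same loop diverges instead of converging, which is precisely the overshooting effect discussed above.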
With this theoretical context in mind, the neural network utilized for the PV output prediction in the scope of this work uses the architecture and parameters listed in Table 2. The model has four hidden layers with 100, 50, 20, and 5 neurons, and the gradient descent optimizer utilized for training the network is Adam [30]. In order to reduce overfitting, L2 regularization with a penalty of 0.001 and a dropout rate of 0.25 (in layers 1 and 2) are applied. Dropout regularization [31] helps with reducing overfitting by randomly dropping out some neurons and their connections during training. In addition, each hidden layer performs a specific transformation on the input data using the ReLU activation function. The choice of loss function also directly influences the learning rule and the model's ability to converge to the optimal solution. For instance, MSE has certain properties, such as being differentiable at all points (the same applies for RMSE) and having a parabolic shape, which help guide the gradient descent algorithm toward the global minimum. In contrast, MAE is not differentiable at all points, making it less suitable for gradient-based optimization algorithms. Therefore, for the purposes of the regression task at hand we choose MSE as the loss function. Finally, the output layer produces the predicted values for the photovoltaic output.
Equivalent to the procedure applied during the training of the random forest, the DNN's parameters listed above were chosen with the help of the randomized hyperparameter search algorithm.
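Assuming a scikit-learn-style implementation, a network with the Table 2 architecture can be sketched as follows on synthetic data. MLPRegressor is used here as a stand-in: it supports the listed layer sizes, ReLU activations, Adam optimizer, L2 penalty and early stopping, and it minimizes squared error, but it has no dropout, which the model described above additionally applies in the first two hidden layers:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((300, 6))                                       # six weather features
y = 3000 * (1 - X[:, 2]) * X[:, 0] + rng.normal(0, 50, 300)    # synthetic PV output

# Architecture from Table 2: four hidden layers with 100, 50, 20 and 5 neurons,
# ReLU activations, Adam optimizer, L2 penalty of 0.001 and early stopping.
model = MLPRegressor(
    hidden_layer_sizes=(100, 50, 20, 5),
    activation="relu",
    solver="adam",
    alpha=0.001,              # L2 regularization strength
    early_stopping=True,
    max_iter=500,
    random_state=0,
)
model.fit(X, y)               # trained by minimizing squared error (MSE)
preds = model.predict(X[:5])
print(preds.shape)            # (5,)
```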
Long short-term memory neural networks: When it comes to solving time-series prediction tasks, traditional DNN architectures display a major limitation with regards to their ability to carry information across multiple time steps. An alternative neural network architecture that tries to address this limitation is encompassed by the so-called recurrent neural networks (RNN). RNNs can operate with inputs of arbitrary length such as text and speech data and are also able to model the sequential nature of inputs captured at multiple time steps. To achieve this, the RNN computes the output for the current time step y_t as a function of the current inputs x_t and the output of the previous step y_(t−1) [32]:

y_t = f(x_t, y_(t−1))

In other words, RNNs allow information from past time steps to pass forward and influence the decisions in later time steps. This is especially useful for prediction tasks where information carried over from the previous samples might help to better understand the behavior observed in the current sample [33]. One downside of the RNN architecture is that with each consecutive time step the network starts to ''forget'' the information from the earliest time steps and all long-term dependencies slowly vanish [32]. In 1997 Hochreiter and Schmidhuber [34] proposed an improved RNN architecture called the Long Short-Term Memory network, which addresses the challenge of maintaining long-term dependencies across the time steps. An LSTM unit consists of a memory cell, an input gate, an output gate and a forget gate. As the name suggests, the purpose of the memory cell is to store information computed during the past and current steps, whereas the individual gates moderate what goes in and out of the cell. More specifically, the forget gate is responsible for deciding which information will be discarded or dropped from memory.
On the other hand, the input gate decides what new information will be added to the cell state in order to update it. Finally, the output gate decides which information leaves the cell and will be forwarded to the next time step. The combination of these mechanisms allows the LSTM networks to deliver the same benefits as traditional RNNs, while simultaneously avoiding their major shortcoming of learning long-term dependencies.
In that context, the exact LSTM architecture used for the purposes of this work and the corresponding hyperparameters were derived with the help of a hyperparameter optimization procedure and are presented in Table 3. The LSTM network utilizes one hidden layer with 500 neurons and the ReLU activation function. The LSTM hidden layer processes the input sequence by utilizing LSTM cells, which have the ability to selectively retain or forget information from the previous time steps. To generate the final photovoltaic prediction, the output of the LSTM layer is passed through a dense layer. The regularization technique used to reduce overfitting was dropout with a rate of 0.2. Finally, the LSTM uses 20 time steps from the past to predict the next target value.
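The sliding-window preparation implied by the 20-time-step setup can be sketched as follows; the series below is a synthetic stand-in for the Instantaneous Power signal:

```python
import numpy as np

def make_windows(series, n_steps=20):
    """Turn a 1-D power series into (samples, n_steps) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i : i + n_steps])   # the 20 past Instantaneous Power values
        y.append(series[i + n_steps])       # the value to predict
    return np.array(X), np.array(y)

# Toy signal loosely imitating a sequence of bell-shaped production days.
power = np.sin(np.linspace(0, 12 * np.pi, 500)).clip(min=0)
X, y = make_windows(power, n_steps=20)
print(X.shape, y.shape)   # (480, 20) (480,)

# For the univariate LSTM these windows are reshaped to (samples, 20, 1);
# the multivariate variant additionally stacks the weather features as channels.
X_lstm = X.reshape(X.shape[0], X.shape[1], 1)
```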

IV. RESULTS
The predictive performance of the PV output forecasting models trained in the scope of this work is evaluated according to three of the most common performance metrics - the mean absolute error, the mean squared error and the root mean squared error. To provide a more comprehensive overview of the achieved results, we also provide visual examples of the model behavior during days with both high and low accuracy.
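The three metrics, together with the normalized RMSE used below for the per-day comparison (here normalized by the observed value range, one common convention), can be written out as follows; the true/predicted values are illustrative:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

def nrmse(y_true, y_pred):
    # Normalizing by the observed range makes days with vastly different
    # production scales comparable.
    return rmse(y_true, y_pred) / (y_true.max() - y_true.min())

y_true = np.array([0.0, 500.0, 1500.0, 2000.0, 1000.0])   # Watts
y_pred = np.array([100.0, 600.0, 1400.0, 1900.0, 1100.0])
print(mae(y_true, y_pred))    # 100.0
print(rmse(y_true, y_pred))   # 100.0
print(nrmse(y_true, y_pred))  # 0.05
```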

A. RANDOM FOREST RESULTS
The first ML model we evaluate in the context of PV output prediction is a random forest. The results for the hyperparameter-tuned model are summarized in Table 4. One result that can be immediately noticed from the observed metrics is overfitting on the training data. Even after the hyperparameter tuning, the RF still displays noticeably better performance on the training data compared to the test and validation sets. Nevertheless, the model achieves a MAE and RMSE of 309.46 and 443.42 Watt respectively, which is a 5% to 10% lower error rate on the previously unseen examples compared to the non-tuned model version. Additionally, in order to gain a better understanding of the model behavior, we can examine the results presented in Figure 7. The diagram depicts the days during which the model predicts the PV output with the highest and lowest accuracy, sorted according to the normalized root mean squared error metric. By using nRMSE we ensure that days during which the generated power falls within vastly different value ranges are accounted for and that such days can be accurately compared with regards to the scale of the observed PV output. An example of such differing value ranges is observed in Figure 7, where the PV production on 08/11/2019 reaches approximately 3000 Watts, whereas the maximum production on 03/01/2020 is below 300 Watts.
Given the above considerations, the RF behavior during the high accuracy days illustrated in Figure 7 indicates that the model performs extremely well in conditions of steady PV production. More specifically, the majority of high performance days are characterized by high power production with very few fluctuations. At the same time, the lowest performance exhibited by the RF is observed during days with the opposite properties - low overall PV output and more production fluctuations. Additionally, the production curve during high accuracy days follows more of a bell-shaped pattern, whereas the curve during the low accuracy days has a flatter profile. According to these results, a clear pattern with regards to the strengths and weaknesses of the RF starts to emerge. To validate this pattern, we can explore not only the highest and lowest performance days, but also days during which the model performs with average accuracy. Two such days are depicted in Figure 8, where one of the days represents an instance of a high (i.e. maximum production of 3500 Watts) and the other of a low (i.e. maximum production of 300 Watts) PV production day. What is immediately noticeable is that the model shows a similar behavior and achieves approximately the same nRMSE value for both days. Therefore, based on the insights collected from all performance groups (i.e. highest, average, lowest accuracy days), we can assume that days with low PV production are not necessarily harder to predict than high production days, but rather that the inaccurate predictions arise during days when the production deviates from the previously mentioned bell-shaped pattern.
Another key insight from Figure 8 is that based on the environment and the present weather conditions, the RF model learned to recognize sudden changes in the PV power output. Multiple instances of such abrupt changes (marked with purple dots in Figure 8) were accurately predicted by the model, which indicates its ability to handle the volatile nature of solar power forecasting.

B. DNN RESULTS
The second PV output forecasting model trained in the scope of this work is a deep neural network. The results achieved by this model and its hyperparameter optimized version are summarized in Table 5. In contrast to the RF, which demonstrated slight overfitting on the training data, the tuned DNN has a lower variance. This is achieved by applying L2 and dropout regularization as well as early stopping techniques (see Table 2). Nevertheless, despite the lower variance, both the tuned RF and the DNN achieve similar results with regard to their RMSE and MAE error metrics. Therefore, comparing the two models purely based on the computed error metrics would be insufficient. In order to conduct a more thorough analysis of the DNN performance, Figure 9 displays the model behavior during its most and least accurate days, sorted according to the nRMSE metric.
What is immediately evident is that the DNN is least accurate when the PV power generation is extremely low in magnitude and exhibits a flat profile. In all three low performance cases, the DNN overestimates the Instantaneous Power value and the curve that the predictions form closely resembles the previously mentioned bell-shaped pattern. This behavior is expected and aligned with the most common pattern present in the training data. More precisely, throughout the year there are more days where the PV output reaches peak values and follows the bell-shaped curve. Therefore, since the models encounter this pattern more frequently during training, they learn to bias their output in a similar direction and generate higher PV output predictions even during low production days. Nonetheless, even during the low accuracy days, the DNN still exhibits positive signs of performance. In particular, as indicated in Figure 9, the DNN accurately predicts that the overall power production for the day will be substantially lower (e.g. 300 to 500 Watts) compared to the peak production days (e.g. 500 to 3500 Watts). This suggests that the model is able to accurately distinguish between low and high production days entirely based on the input weather conditions.
Another indication that the DNN model generates highly accurate PV output predictions can be found by examining the high performance days depicted in Figure 9. In contrast to the RF model, the DNN achieves some of its highest accuracy during days with a high number of fluctuations (e.g. 02/10/2019 in Figure 9). It is important to note that the DNN is able to accurately predict multiple abrupt changes in the PV output, which exemplifies the ability of this model to be applied in volatile conditions such as the ones common in the renewable energy forecasting field. What is even more impressive is that these results are achieved by a model trained exclusively on common basic weather features.

C. LSTM RESULTS
The final model trained for the task of PV output forecasting is an LSTM network. There are two versions of this model evaluated in the scope of this work - a univariate and a multivariate LSTM. The error metrics generated by the two models are summarized in Tables 6 and 7. The first conclusion that can be drawn from the generated predictions and the corresponding error metrics is that the multivariate LSTM outperforms the univariate LSTM. The reason behind this behavior is that the multivariate LSTM utilizes the weather features as an additional source of input about the environmental conditions, which allows the model to achieve higher accuracy. Nonetheless, both LSTM networks show considerably worse performance in comparison to the previously discussed DNN and RF models. In particular, the RMSE values produced by the LSTMs are in the range between 550 and 800 Watts, whereas the RF and the DNN achieve an RMSE between 400 and 500 Watts.
In addition to the error metrics analysis, in Figures 10 and 11 we can examine the multivariate LSTM behavior during individual days during which it achieves high, average and low prediction accuracy. (Since the multivariate LSTM outperforms the univariate version considerably, in the analysis of the model performance during individual days we focus exclusively on the multivariate LSTM.) What becomes immediately clear is that the LSTM shows similar strengths and weaknesses as the DNN and the RF. More specifically, the LSTM is most accurate during days with very few fluctuations and a PV output that follows the common bell-shaped production curve. At the same time, the model is most inaccurate for days with a flatter production profile. However, the LSTM is substantially worse compared to the RF and the DNN, and even during its highest performance days it predicts approximately 500 Watts lower PV output than the actual values (see 16/01/2020 plotted in Figure 10). A similar level of inaccuracy occurs during all three of the plotted high performance days, which indicates that even at its best the LSTM still noticeably underperforms compared to the other two models.
Likewise, during low performance days the LSTM falls short again in terms of predictive accuracy. In fact, the model predicts negative PV outputs early in the day. This is caused by the usage of the last 20 time steps (i.e. which represent roughly one day) for predicting the current PV output value.
Since each day contains a different number of samples and these 20 time steps contain data points that follow the familiar bell-shaped curve and end with decreasing values, it is reasonable that the model predicts this downward trend to continue. Furthermore, another noticeable pattern exhibited by the LSTM during the low performance days is that the model always predicts a bell-shaped production curve even when the actual PV output follows a completely flat profile, which deviates even further from the actual values than the predictions produced by the DNN and the RF.
Finally, a key requirement in the renewable energy forecasting domain is the ability to handle sudden changes in the environment. In this context, Figure 11 shows two days with slight production fluctuations and during which the LSTM has an average performance. In contrast to the RF and the DNN, which were able to accurately predict even multiple fluctuations throughout the course of a day (see Figures 8 and 9), the LSTM has difficulties predicting PV outputs that deviate from the standard bell-shaped pattern and include fluctuations.

D. DISCUSSION
Based on the collected error metrics and the accompanying individual day performance analysis, the LSTM is the worst performing model for the task of predicting PV outputs. Compared to the other two models, the LSTM showed at least a 10 to 20% larger RMSE, not only on the previously unseen data (i.e. the test set) but also on the training and validation sets. On the contrary, the DNN model demonstrated a lower error rate and the ability to recognize and accurately predict even PV outputs that contain multiple fluctuations. The DNN's performance came only second to the random forest, which achieved the highest accuracy in terms of the selected error metrics, as well as the lowest training and prediction time.
Based on the observed results, the RF and the DNN models are suitable tools for PV output forecasting that not only achieve decent accuracy, but are also able to address the volatile nature of renewable energy sources such as a PV system.
Another important point established during the conducted experiments is that achieving reasonable predictive performance for PV output forecasting is possible even in the absence of more obscure and harder to collect weather features such as the solar irradiance. The results demonstrated in this work were achieved by models trained on common weather features such as the ones described and analyzed in Subsection III-A and depicted in Figures 2 and 3. In fact, according to the feature importance score generated by the tuned RF model, there are two environmental conditions that are the primary contributors to an accurate model prediction and these are the humidity and the time of day. Some additional features such as the ambient temperature and the atmospheric pressure also positively contribute to the accurate model predictions.
In the same context, when it comes to PV output forecasting, the geographic location and the corresponding climate are of immense importance for the accuracy of the selected forecasting model. More specifically, locations with rapidly changing environmental conditions represent more difficult instances for generating renewable energy forecasts compared to places with a more stable climate. The results achieved in this work illustrate this point and provide an example of PV output forecasts generated for a geographic location (i.e. Berlin, Germany) with relatively diverse weather conditions throughout the year. The models trained on the data sourced at this location demonstrated their worst performance during days with low overall PV output and a flat production curve. Similar conclusions were also drawn in related research [9], [16], [35], where the highest model performance is achieved during ''sunny'' days, whereas during ''rainy'' or ''cloudy'' days the models struggle to generate accurate predictions. Such results indicate that PV output forecasting can achieve high levels of accuracy at locations with more consistent and warm conditions. For instance, the LSTM model proposed by Nasser et al. [33] is trained on PV data from Egypt, where the weather is warmer, sunnier, with higher ambient temperatures and with fewer fluctuations. Therefore, most days throughout the yearly cycle have a similar PV output profile (i.e. the familiar bell-shaped curve), which the PV forecasting techniques are well-suited to predict. Ultimately, we can conclude that the level of accuracy of the solar forecasting techniques depends not only on the selected model and the available data, but also on the geographic location of the PV system. Locations with stable, sunny, warm weather are more favorable candidates for the application of PV output forecasting solutions.

V. CONCLUSION AND FUTURE WORK
The intermittent, volatile nature of RES makes their integration in the current grid a very challenging task. Being able to accurately predict the PV system behavior ahead of time can have an immense positive impact on the management and safe operation of the grid, as well as on planning the available capacities that will meet the expected demand. Thus, in this paper we presented three ML-based algorithms - a random forest, a deep neural network and a long short-term memory network. These algorithms and the corresponding models were utilized for the task of solar energy generation forecasting based on weather and PV system data collected in Berlin, Germany. Based on the conducted experiments, we concluded that while the LSTM model performed suboptimally, the RF and DNN demonstrated high accuracy and the ability to generate accurate predictions even in the presence of PV output fluctuations. In fact, both the RF and the DNN were also well suited for generating highly accurate predictions during days with high overall power generation and a bell-shaped power production curve.
Another important outcome from this work is the utilization of more widely available weather features instead of relying on harder to obtain data (e.g. solar irradiance) that requires expensive infrastructure and/or involves extra financial cost for certain geographic locations. The results achieved throughout our experiments demonstrate that accurate PV output forecasting is feasible even with more broadly accessible weather data (e.g. ambient temperature, humidity, atmospheric pressure, etc.) that does not require highly specialized hardware or infrastructure to be obtained. Additionally, the RF and the DNN models described in this paper operated reasonably well even in the presence of abrupt PV output fluctuations, which indicates that the utilized widely available weather features are sufficient for reaching the essential performance baseline.
Finally, there are multiple directions for future work that can extend and build upon the contributions presented in this work. One such direction would be to train the same model architectures on a more sophisticated data set which contains features with higher predictive power. A comparison between these results and the ones achieved with the current data set would allow us to perform a more thorough cost-benefit analysis. Additionally, the high predictive accuracy reached with different LSTM-based architectures in related research [9], [33], [36], [37] indicates that the LSTM has a lot of potential and exploring it further might lead to more promising results. Finally, as the Machine Learning field continues to grow and improve, another option for future work includes evaluating and comparing the performance of additional alternative models (e.g. XGBoost [18] and CNN [19]) on the data set and weather features discussed in the current study.
NIKOLAY TCHOLTCHEV received the Ph.D. degree in engineering and the Diploma degree in computer science from the Berlin University of Technology. He is currently with the Fraunhofer Institute for Open Communication Systems (FOKUS), where he leads and participates in projects related to the areas of smart cities (open urban platforms), network and systems management, cybersecurity, autonomic communications, virtual and softwarized networks and testbeds, VoIP emergency communication, blockchain, smart energy, firewall/IDS/IPS, model-checking, IPv4/6, the IoT, artificial intelligence, model-based testing, and testing for security purposes.
PHILIPP LÄMMEL is currently a Senior Researcher and the Project Manager with the Fraunhofer FOKUS and is involved in several industrial and research projects. His research interests include the design, specification, implementation, and evaluation of open platforms in urban environments thereby covering aspects, such as stakeholder management, requirements engineering, software design, quality assurance, and DevOps. He is also active in the areas of DLT/blockchain, IT security, artificial intelligence, the IoT, e-health, and cloud computing, and contributes actively to standardization activities in the area of smart city.
MANFRED HAUSWIRTH (Senior Member, IEEE) is currently the Managing Director of the Fraunhofer Institute for Open Communication Systems FOKUS and the Chair for Open Distributed Systems with TU Berlin. His research interests include distributed information systems, the Internet of Things, streaming data processing and linked data, semantics, and artificial intelligence. In these fields, he has garnered numerous international prizes for his projects. He is an active member of many scientific and political committees for the development of digitalization. He is the Principal Investigator with the Weizenbaum Institute, Einstein Center Digital Future (ECDF), Berlin Big Data Center (BBDC), and Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS).