Comparative Analysis of Neural Networks Techniques to Forecast Global Horizontal Irradiance

Due to the continuously increasing importance of renewable energy sources as an alternative to fossil fuels for countering air pollution and global warming, the prediction of Global Horizontal Irradiance (GHI), one of the main parameters determining the energy production of photovoltaic systems, is an attractive research topic. Solar irradiance is determined by deterministic factors (e.g. the position of the sun) and stochastic factors (e.g. the presence of clouds). Since the stochastic component is difficult to model, this problem can benefit from machine learning techniques such as artificial neural networks. This work proposes a methodology to forecast GHI over short-term (i.e. from 15 min to 60 min) and mid-term (i.e. from 60 min to 120 min) time horizons. For this purpose, we designed, optimised and compared four neural network architectures for time-series forecasting, based respectively on: i) Non-Linear Autoregressive, ii) Feed-Forward, iii) Long Short-Term Memory and iv) Echo State Network. The original data-set, consisting of GHI values sampled every 15 min, has been pre-processed by applying different filtering techniques. Our analysis compares the performance of the proposed neural networks and identifies the best one in terms of error rate and forecast horizon. It shows that the clear-sky index is the preferred filtering technique, greatly improving the data-set pre-processing, and that the Echo State Network achieves the best accuracy.


I. INTRODUCTION
Nowadays, renewable energy is a very active research area. Finding viable, clean energy sources to replace fossil fuels, or at least to significantly reduce their use in the short to medium term, has become a critical goal. On the one hand, air pollution, to which fossil fuels are a major contributor, is causing a real health crisis [1]. According to the World Health Organisation (WHO), air pollution is responsible for 7 million deaths every year, and 91% of the world population lives in places where air quality does not meet the limits mandated by the WHO itself [2]. On the other hand, greenhouse gas emissions from fossil fuels are also one of the major drivers of anthropogenic climate change. According to a 2018 special report by the Intergovernmental Panel on Climate Change (IPCC), immediate action must be taken to limit the increase in global temperature to 1.5 °C and avoid the worst consequences of global warming [3]. For these reasons, renewable energy sources (RES) will play a key role in the future of our society. Among them, an important part is played by solar energy, which can be converted into electricity by photovoltaic (PV) systems. PV systems are rapidly spreading in our cities, with strong economic, social and environmental impacts [4]. At the same time, the scientific community is researching new models and optimisation methods to better manage these resources. In a PV system, the power output of a PV panel is directly proportional to the solar irradiance, which in turn depends on various factors (e.g. latitude, season, and sky conditions).

The associate editor coordinating the review of this manuscript and approving it for publication was Baozhen Yao.

VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
There are different components of solar irradiance, but the most important one for PV power generation is Global Horizontal Irradiance (GHI), which is the total irradiance received on a horizontal surface [5]. GHI combines Direct Normal Irradiance (DNI), which is the irradiance received on a surface perpendicular to the sun's rays, and Diffuse Horizontal Irradiance (DHI), which is the radiation from sunlight scattered by the atmosphere and received on a horizontal surface.
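The three components are linked by a standard closure relation, GHI = DNI · cos(θz) + DHI, where θz is the solar zenith angle. The short Python sketch below illustrates it; the function name is ours and purely hypothetical, and real measurements rarely satisfy the relation exactly because of sensor errors.

```python
import math

def ghi_from_components(dni: float, dhi: float, zenith_deg: float) -> float:
    """Closure relation GHI = DNI * cos(theta_z) + DHI (all in W/m^2).

    The direct beam only contributes while the sun is above the horizon,
    so the cosine term is clipped at zero for zenith angles >= 90 degrees.
    """
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dni * cos_z + dhi
```

For example, DNI = 800 W/m², DHI = 100 W/m² and a zenith angle of 60° give a GHI of approximately 500 W/m².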
To optimise smart grid operations and match power production, distribution and consumption efficiently and reliably, it is necessary to know in advance both the amount of energy produced by power plants [6] and the energy consumption. However, some of the most popular renewable energy sources, like wind and solar energy, are non-dispatchable and intermittent by nature. A dispatchable energy source can be turned on and off on demand within a short time. This is not true for PV power systems: the sun only shines for a limited number of hours during the day, depending on latitude and season, and the irradiance is also affected by clouds.
When integrating non-dispatchable RES into existing power grids, this intrinsic variability must be taken into account, particularly as the share of energy from these sources increases [7]. A promising possibility is to integrate RES using smart grid technologies. A traditional power grid is centralised and implements one-way communication, where power is sent from the power plant to customers. In a smart grid, on the other hand, the process becomes distributed, and consumers can also be active users, giving feedback on electricity use that allows the grid to tune itself to provide better performance and reliability. Examples of smart grid management applications are Demand-Response (DR) [8], Demand Side Management (DSM) [9] and electric-vehicle-based mobility [10]. DR refers to changes in the user's electricity consumption patterns in response to fluctuations in power production by renewable energy sources and to grid requirements, as well as for economic reasons such as changes in the price of electricity. DSM, instead, refers to a set of actions aimed at efficiently managing the consumption of a grid, in order to reduce the costs incurred for the supply of electricity and for general system charges.
As stated above, the most popular RES are non-dispatchable and intermittent by nature. These features introduce problems for grid stability and efficiency, which limit the amount of these resources that can be effectively added to the grid. In other words, the output generated by RES is driven by environmental and meteorological conditions and cannot respond to changes in demand. Solar energy, in particular, is determined both by deterministic factors (e.g. latitude, day of the year, hour of the day) and by stochastic factors (e.g. atmospheric effects and weather conditions such as cloud coverage). Consequently, the scientific community is pushing for innovation and optimisation in this scenario, and many studies currently focus on new and optimised methods to forecast GHI, which strongly influences PV production [11]-[14]. In this context, the main challenge is to find a methodology that accurately predicts the power generated by a photovoltaic system. Since PV energy generation is highly correlated with solar irradiance, it makes sense to concentrate on predicting the latter, in particular GHI, and then use these predictions to calculate the expected energy production. Accurate GHI forecasting is crucial to unlock novel control strategies for smart grid management, e.g. DR and DSM, that aim at mitigating the undesirable fluctuations introduced by RES (e.g. PV systems). For example, a photovoltaic simulator such as the one proposed in [15], [16] could take GHI predictions as input to estimate the energy production of PV systems over the next short- and mid-term time horizons. Thanks to DR and DSM policies, the energy demand can then dynamically respond to changes in energy generation, overturning the current paradigm where generation responds to changes in demand.
Since solar irradiance is a physical phenomenon, one possibility is to develop a physical model. The main problem with this approach is its complexity, especially when modelling the stochastic atmospheric phenomena that determine the GHI measured at the surface. A more straightforward approach is based on time-series forecasting: previous values of the time-series of interest, and/or of one or more related series, are used to predict one or more of its future values. Several studies in the literature propose physical and mathematical models to estimate and forecast solar radiation. Classical linear time-series models have been widely used [17]. Simple statistical models, for example, can be applied, but they may give sub-optimal results because solar irradiance is a complex non-linear time-series. Indeed, these studies have highlighted that such methodologies are not sufficient for analysing and predicting solar radiation, due to its non-stationary and non-linear characteristics [18]. To overcome these limits, a more robust approach is based on machine learning, and artificial neural networks are among its most used and studied applications [19], [20].
In this paper, we propose a methodology for short-term (i.e. from 15 min to 60 min) and mid-term (i.e. from 60 min to 120 min) GHI forecasting, with 15 min time-steps, exploiting state-of-the-art neural network models for time-series. The main goal is to obtain accurate predictions to use as input for PV simulators, as demonstrated in our previous works [21], [22]. More accurate inputs allow more robust PV simulations [16], and this leads to better analysis and management of the energy produced in operational contexts such as DR and DSM. Specifically, we designed and thoroughly optimised four architectures based respectively on a Non-Linear Autoregressive network (NAR), a Feed-Forward Neural Network (FFNN), a Long Short-Term Memory network (LSTM) and an Echo State Network (ESN), as they represent the most promising and suitable solutions for time-series prediction. In addition, we apply different data-set pre-processing and filtering techniques and identify which performs best by comparing the prediction results of the different architectures. We repeated the training, optimisation and test phases of all our neural network architectures for each pre-processing technique applied to the original data-set, which consists of a time-series of GHI values sampled every 15 min. The first pre-processing technique removes sampling errors from the original GHI data-set (hereinafter referred to as the raw GHI data-set). The second pre-processing technique applies Tikhonov regularisation to smooth the time-series, making the training phase easier. Finally, the third pre-processing technique is based on the clear-sky index, which is the ratio between the measured irradiance and the clear-sky irradiance. In this case, GHI samples are converted into clear-sky index values to remove the seasonal trend of the time-series. Both Tikhonov regularisation and the clear-sky index are applied to the raw GHI data-set.
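The clear-sky index conversion, and its inverse used to turn forecasts back into GHI, can be sketched as follows. This is a minimal illustration under stated assumptions: the clear-sky irradiance I_cs is assumed to be supplied by some clear-sky model (which model is used is not described here), and the night-time handling (index set to 0 when I_cs is below a small hypothetical threshold `min_cs`) is our choice, not necessarily the one adopted in this work.

```python
import numpy as np

def clear_sky_index(ghi, ghi_cs, min_cs=1.0):
    """Convert GHI samples into clear-sky index values k = GHI / I_cs.

    Where the clear-sky irradiance is (near) zero, i.e. at night, the ratio
    is undefined; this sketch assumes k = 0 there (hypothetical choice).
    """
    ghi = np.asarray(ghi, dtype=float)
    ghi_cs = np.asarray(ghi_cs, dtype=float)
    k = np.zeros_like(ghi)
    day = ghi_cs >= min_cs                 # samples with a meaningful clear-sky value
    k[day] = ghi[day] / ghi_cs[day]
    return k

def from_clear_sky_index(k, ghi_cs):
    """Map clear-sky index forecasts back to GHI: GHI = k * I_cs."""
    return np.asarray(k, dtype=float) * np.asarray(ghi_cs, dtype=float)
```

Because the seasonal and daily shape is carried by I_cs, a network trained on k only has to learn the stochastic, cloud-driven part of the signal.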
The rest of the paper is organised as follows. Section II reviews the scientific literature on solar irradiance forecasting. Section III illustrates the methodology proposed in this work. Section IV presents the experimental results, comparing the performance of the neural network architectures under the three pre-processing approaches described. Finally, Section V draws the concluding remarks.

II. RELATED WORKS
The literature contains several forecasting models for solar irradiance and PV power. These models are generally divided into four main categories [20], [23]: i) statistical models; ii) cloud imagery-based models; iii) numerical weather prediction (NWP) models and iv) hybrid models. As introduced in Section I, statistical models use previous values of the solar irradiance or PV power time-series to forecast the next values. For this reason, they are the category of interest for our work, from which we draw inspiration and against which we compare. They, in turn, can be divided into linear and non-linear models. However, according to [24], a limitation of linear models is that they cannot take into account the non-linearity of many real-life time-series, including solar irradiance. Accordingly, in line with the purpose of our work, this section reviews these models, highlighting their merits and weaknesses.
Linear methods are generally the simplest forecasting models, often used as a reference to evaluate more complex ones [12]. The simplest of all, the persistence model, assumes that the forecast value of the time-series is the same as the current value. One of the simplest linear models is the autoregressive (AR) model [25]. A slightly more complex model is the autoregressive moving average (ARMA) model, which combines autoregressive and moving-average components [25]. The ARMA model can be extended by including exogenous inputs (ARMAX). AR and ARMA models can be used to forecast stationary time-series, i.e. processes whose mean and variance remain constant over time [17]. Processes like solar irradiance, however, are non-stationary, so they must either be transformed into stationary time-series or be handled by different models. The ARIMA (autoregressive integrated moving average) model can be used for non-stationary time-series forecasting. Reikard [26] shows that ARIMA can give good short-term solar irradiance forecasting results. His experiments evaluate forecasting horizons of 5, 15, 30 and 60 min, and the ARIMA model not only outperforms simple AR models in all cases but also performs better than feed-forward artificial neural networks (ANNs) except for the shortest time horizon. This might be due to the difficulty of training ANNs, which can converge only to a local optimum.
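The two reference models mentioned above can be sketched in a few lines of NumPy. The AR(p) coefficients are estimated here by ordinary least squares on lagged values, which is one common estimator among several (Yule-Walker and maximum likelihood are alternatives); the function names are illustrative and not taken from any cited work.

```python
import numpy as np

def persistence_forecast(series, horizon=1):
    """Persistence: the forecast for t + horizon is the value observed at t."""
    series = np.asarray(series, dtype=float)
    return series[:-horizon]          # aligned with targets series[horizon:]

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + sum_i phi_i * y_{t-i} by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    # Row j holds [1, y_{t-1}, ..., y_{t-p}] for target index t = p + j.
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i:len(y) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef                        # [c, phi_1, ..., phi_p]

def ar_one_step(series, coef):
    """One-step-ahead AR forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(series, dtype=float)[-1:-p - 1:-1]   # y_t, y_{t-1}, ..., y_{t-p+1}
    return coef[0] + coef[1:] @ recent
```

Note that an AR model fitted directly on GHI inherits the non-stationarity discussed above; applying it to a transformed series (e.g. clear-sky index) is the usual workaround.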
However, as noted above, linear models cannot take into account the non-linearity of many real-life time-series, including solar irradiance. For this reason, non-linear techniques for time-series forecasting have become very popular, and they have been extensively used for solar irradiance prediction [27]. Martin et al. [28] forecast half-daily values of solar irradiance, i.e. "accumulated hourly global solar irradiance from solar raise to solar noon and from noon until dawn for each day". Since this time-series is non-stationary, two transformations are proposed: the clearness index, which is the ratio between the solar irradiance measured at ground level and the extraterrestrial irradiance, and the lost component, which is the difference between the same quantities. Different feed-forward neural network configurations, in terms of the number of hidden layers, neurons and inputs, are tested, and the best one is selected for each weather station where the prediction model is evaluated. The results show that ANNs improve on the forecasting accuracy of the reference persistence model and outperform a simpler linear AR model. Pedro and Coimbra [29] evaluate different PV power forecasting techniques, among them a feed-forward neural network with one hidden layer of 20 neurons. The network has 13 inputs, which are fed with the 13 previous values of the time-series. The forecasting is evaluated 1 h and 2 h ahead, and the neural network-based technique outperforms the other models evaluated in the study. Lauret et al. [30] instead compare several machine learning techniques, using simple persistence and AR models as references. The GHI time-series are pre-processed by transforming them into clear-sky index series. For the shortest time horizons, the machine learning techniques perform better than the reference models under unstable conditions, while under clear-sky conditions the AR model is also accurate. For longer time horizons, though, the machine learning models, including the feed-forward neural network, clearly outperform the persistence and linear AR techniques. Rana et al. [31] use an ensemble of neural networks for short-term (from 5 to 60 min ahead) PV power forecasting in both the univariate and multivariate cases. They test multiple ensembles E_i, where each ANN in ensemble E_i has i neurons in the hidden layer. Each ensemble is made of 20 networks, and the final forecast is the median of the 20 predictions. This method obtains better results than the reference persistence model and than another machine learning technique, the support vector machine (SVM). McCandless et al. [32] develop a cloud regime-dependent forecasting technique based on feed-forward neural networks. Instead of a single "global" neural network, a different ANN is trained and used for each cloud regime. To determine the cloud regime, a k-means algorithm is applied to the clearness index time-series. For the shortest time horizon (15 min), the ANNs do not improve upon the reference persistence model, except under the most unstable sky conditions. For longer horizons (from 60 to 180 min), however, the ANNs outperform persistence. It is also shown that regime-dependent forecasting always gives better results than a single global ANN. Monjoly et al. [33] use different multi-scale decomposition techniques to pre-process the GHI time-series after transforming it into the clear-sky index. The different time-scale components are then forecast separately, using separate ANNs or a hybrid AR-ANN model. The results show that multi-scale decomposition significantly improves forecasting results, both with ANNs and with the hybrid model.
Bouzgou and Gueymard [13] design a Wrapper Mutual Information Methodology (WMIM) optimisation approach based on the Extreme Learning Machine (ELM) regression technique to i) investigate the effect of the mutual information between the historical variables and the targeted future GHI value, and ii) select the best possible combination of historical variables from the existing time-series. Experimental results highlight that the ELM model, combined with WMIM, performs on par with the more conventional Multi Layer Perceptron (MLP) at a lower computing time.
On the other hand, another widely used class of neural networks is that of recurrent neural networks, in which feedback connections are added. The outputs of these networks depend not only on the current inputs but also on their current state (memory). This behaviour makes them very suitable for time-series analysis and forecasting. Among these are Long Short-Term Memory networks, i.e. recurrent neural networks using particular units (LSTM units) as nodes. These units are designed to retain information over long time intervals, which makes them particularly well suited to time-series forecasting. Alzahrani et al. [34] use a deep (i.e. with more than one hidden layer) recurrent neural network with LSTM units for short-term solar irradiance forecasting. The input time-series is sampled at a very high frequency (100 Hz), which has the advantage of capturing fast fluctuations. The results show that the deep LSTM is more accurate than the reference feed-forward neural network. In [35] the authors exploit LSTM networks for 1 h ahead PV power forecasting. Different LSTM models are evaluated, and the best one, LSTM for regression with time steps, is selected and compared with other forecasting techniques, including multiple linear regression and feed-forward neural networks. The LSTM is shown to give more accurate results than the other models. Srivastava and Lessmann [36] compare LSTM with other established forecasting techniques for day-ahead GHI forecasting. They use satellite-derived GHI values and other atmospheric variables as inputs. In contrast to the majority of studies in the literature, many locations in several countries with different climates are taken into account, which makes it possible to assess the validity of the proposed model under different conditions. The LSTM-based approach is compared to a simple persistence model, a feed-forward neural network model and another machine learning model called "Gradient Boosting Regression" (GBR). The results show the superior performance of the LSTM compared to the other methods. Finally, Zang et al. [14] address short-term solar irradiance forecasting with a spatio-temporal correlation model based on deep learning. In detail, they first apply a convolutional neural network (CNN) to extract spatial features and then an LSTM network to extract temporal features from historical data. In this way, they obtain spatio-temporal correlations to predict global horizontal irradiance one hour ahead.
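To make the role of the LSTM units concrete, the NumPy sketch below implements the forward step of a standard LSTM cell with input, forget and output gates, and runs it over a univariate input window. It is a generic textbook formulation, not the architecture used in [34]-[36] or in this work; the weight shapes and the gate ordering are our assumptions. A practical model would be trained with a deep learning library and followed by a regression layer mapping the final hidden state to the GHI forecast.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the (assumed) order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much cell state to expose
    c = f * c_prev + i * g         # additive update lets information persist over time
    h = o * np.tanh(c)
    return h, c

def lstm_last_hidden(window, W, U, b, hidden_size):
    """Run the cell over a univariate input window and return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for value in window:
        h, c = lstm_step(np.array([value]), h, c, W, U, b)
    return h
```

The additive cell update c = f * c_prev + i * g is what allows the memory to persist across many steps when the forget gate stays close to 1.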
Another interesting recurrent architecture is the Echo State Network. An ESN has a sparsely connected hidden layer, called the "reservoir", with fixed connections and weights; the only weights that are learnt are those of the output connections. This property makes these networks easier to train than other recurrent architectures. Kmet and Kmetova [37] used an ESN for 24 h ahead solar irradiance forecasting, using actual mean hourly values of irradiance and other meteorological variables like humidity and air temperature. The inputs of the network consist of 24 hourly values of the selected variables for the present day, and the outputs are the irradiance forecasts for the next day. The paper shows that this approach gives good results. In an earlier study, on the other hand, Ruffing and Venayagamoorthy [38] found that, in a real-world application, the results of an ESN-based solar irradiance forecasting model were not very promising. In a related field, Deihimi and Showkati [39] used an ESN for 1 h and 24 h ahead electric load forecasting; in this case, the results showed that the ESN has a good generalisation capability and can give very accurate results.

Table 1 summarises the key features of the analysed literature solutions based on machine learning to predict GHI. All these works represent real milestones; however, as can easily be seen, i) a large number of hard-to-find data are often used to predict GHI trends (e.g. surface weather observations, satellite-derived GHI, atmospheric variables, meteorological measurements); ii) very few works exploit filtering techniques to pre-process the original data-set before using it as input for the neural models (a data-set whose trend is easier to learn allows leaner neural architectures); iii) few works are compatible with DR and DSM scenarios (i.e. 15 min time-steps); and iv) hybrid or ensemble methodologies, rather than a single global architecture, are almost always used, which implies greater use of computational resources.

Compared with these literature solutions, our work proposes an optimised methodology for short-term (i.e. up to 60 min) and mid-term (i.e. from 60 to 120 min) GHI forecasting, exploiting and comparing four specifically designed and optimised neural networks. We applied three different data-set pre-processing and filtering techniques and identified which performs best by comparing the prediction results of the different architectures. In the first analysis, we fed our neural networks with the raw GHI data-set, to which we applied some basic filters that clean the data-set by removing possible errors, such as missing data or storage errors due to sensor sampling. In the second analysis, we applied Tikhonov regularisation to the very same raw data-set, which smooths the time-series trend and makes the training of our neural networks easier. Finally, in the third analysis, we converted the raw data-set, consisting of GHI samples, into clear-sky index values, thus removing the seasonal trend of the time-series. For a fair comparison, all the data-sets at our disposal are used as input to all the best neural architectures identified experimentally. The use of pre-processing techniques, together with a fine-grained optimisation of the neural structures, allows us to extend the prediction time horizon with an acceptable error rate. Moreover, compared to state-of-the-art solutions, optimising the neural architectures on specifically transformed data-sets allows us to obtain computationally leaner structures without affecting prediction accuracy.
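As noted in this section, only the output weights of an ESN are learnt. The NumPy sketch below illustrates that property on a univariate series: the reservoir and input weights are drawn once at random and kept fixed, and the linear readout is fitted by ridge regression after discarding an initial washout. All hyper-parameters (reservoir size, spectral radius, density, input scaling, ridge penalty, washout length) are illustrative assumptions, not the values of the optimised ESN in this work; rescaling the spectral radius below 1 is a common heuristic rather than a guarantee of the echo-state property.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal ESN (no leak rate): fixed random sparse reservoir, ridge-regression readout."""

    def __init__(self, n_reservoir=200, spectral_radius=0.9, density=0.1,
                 input_scaling=0.5, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        W *= rng.random((n_reservoir, n_reservoir)) < density        # sparse connectivity
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # heuristic rescaling
        self.W = W                                                    # fixed, never trained
        self.W_in = rng.uniform(-input_scaling, input_scaling, n_reservoir)  # fixed
        self.ridge = ridge
        self.W_out = None                                             # the only learnt weights

    def _states(self, u):
        """Drive the reservoir with the input sequence and collect its states."""
        x = np.zeros(self.W.shape[0])
        states = []
        for value in u:
            x = np.tanh(self.W_in * value + self.W @ x)
            states.append(x.copy())
        return np.array(states)

    def fit(self, u, y, washout=50):
        """Learn only the readout weights W_out by ridge regression."""
        X = self._states(u)[washout:]
        X = np.column_stack([np.ones(len(X)), X])                     # bias column
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ np.asarray(y, dtype=float)[washout:])
        return self

    def predict(self, u):
        X = self._states(u)
        return np.column_stack([np.ones(len(X)), X]) @ self.W_out
```

For one-step-ahead forecasting, the model would be fitted on pairs such as `u = series[:-1]`, `y = series[1:]`. Since training reduces to a single linear solve, it is much cheaper than back-propagation through time in an LSTM.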
To overcome the limitations of the state of the art and to highlight our scientific contribution, we propose a novel and reliable methodology to forecast GHI in the short and mid term, exploiting neural network models suitable for time-series analysis. Through this research work we aim at i) obtaining accurate GHI predictions compliant with smart grid management, as previously discussed; ii) obtaining an optimised methodology for short-term (i.e. from 15 up to 60 min) and mid-term (i.e. from 60 to 120 min) GHI forecasting; and iii) assessing the impact of different pre-processing techniques on different state-of-the-art neural models in a GHI forecasting scenario.
For this purpose, we optimised the methodology from the pre-processing of the data-set up to the choice of the best-performing neural model operating on time-series in this operational context. In brief, the most important operations we performed are:
• exploiting three different data-set pre-processing and filtering techniques (i.e. raw data-set, Tikhonov, and clear-sky index). To the best of our knowledge, this is the first use of Tikhonov regularisation in the field of energy forecasting;
• designing, thoroughly optimising and comparing the performance of four neural network architectures based on i) Non-linear Autoregressive Neural Network, ii) Feed-Forward Neural Network, iii) Long Short-Term Memory and iv) Echo State Network. In the literature, these are regarded as among the best solutions for time-series analysis. Our objective, as mentioned above, is to design and optimise highly specialised models for short- and mid-term GHI forecasting (i.e. from the next 15 to 120 min). This yields more accurate GHI forecasts ready to be used as input to PV simulators (e.g. [15], [16]) to unlock new energy scenarios and policies, such as DR, DSM and electric vehicle energy management;
• combining the investigated pre-processing techniques with the neural models. The results analysis shows that the clear-sky index approach is the most successful, giving the most accurate results, particularly for mid-term predictions, and that the Echo State Network is the neural architecture that performs best in terms of prediction accuracy.

III. METHODOLOGY
When approaching applications based on neural networks, it is common practice to follow a strict and precise procedure. In [40], the author provides a comprehensive procedure to identify a dynamical system. This procedure consists of four steps, as detailed in Figure 1. The first step, Experiment, corresponds to the problem analysis: it identifies the main characteristics of the problem and the expected goals, in order to start collecting a reliable data-set. The available data must then be divided into two different data-sets, the Training-set and the Test-set, used in the training and test phases of the neural network, respectively. In this work, we propose a methodology for short- and mid-term GHI forecasting. Therefore, we exploit 5 years of GHI measurements sampled every 15 minutes for training and 1 year for testing (see Section III-A). We also applied three different pre-processing techniques to the input data-set to obtain better performance in terms of prediction accuracy and computational cost. The first, necessary pre-processing technique removes missing and inconsistent measurements, i.e. errors due to sensor sampling, yielding the so-called raw data-set. Then, we applied two alternative filters to the resulting raw data-set: i) Tikhonov regularisation and ii) clear-sky index conversion. Tikhonov regularisation smooths the time-series, removing possible spikes, while converting the GHI time-series into clear-sky index values removes the seasonal trend. In both cases, we trained all our neural networks again and then optimised their architectures. Each of the three resulting data-sets has been split into a Training-set and a Test-set (i.e. data never used during training).
The second step, Model Structure Selection, identifies the neural network model to use. This step is crucial because the wrong neural model can compromise the expected results [41]. We design and compare four state-of-the-art neural networks for time-series: i) Non-linear Autoregressive Neural Network (NAR), ii) Feed-Forward Neural Network (FFNN), iii) Long Short-Term Memory Network (LSTM) and iv) Echo State Network (ESN) (see Section III-B). Moreover, we consider both one-step and multi-step predictions for each selected model. Making a one-step prediction means taking the GHI samples at times t, t − 1, t − 2, . . . , t − n to predict GHI at time t + 1, i.e. 15 min ahead, since that is the distance between two consecutive samples in the data-set. For multi-step prediction, instead, we evaluate two different techniques: iterative and multi-output. In the iterative approach, the artificial neural network has a single output, so it can only predict one step ahead. For subsequent steps, the predicted value for time t + 1 is used as one of the inputs for the prediction at time t + 2, and so on. In the multi-output approach, the network has h output nodes, giving the predictions for t + 1, t + 2, . . . , t + h in parallel.
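The input/output arrangement just described can be sketched as follows: `make_windows` builds the lagged input windows and the targets (one column for one-step, several columns for a multi-output network), and `iterative_forecast` feeds each one-step prediction back as the newest input. The function names, and the `one_step_model` callable standing in for any trained single-output network (NAR, FFNN, LSTM or ESN), are hypothetical. With 15 min samples, the 120 min horizon corresponds to 8 steps.

```python
import numpy as np

def make_windows(series, n_lags, n_out=1):
    """Build supervised pairs: inputs [y_{t-n_lags+1}, ..., y_t], targets [y_{t+1}, ..., y_{t+n_out}]."""
    y = np.asarray(series, dtype=float)
    n = len(y) - n_lags - n_out + 1
    X = np.array([y[i:i + n_lags] for i in range(n)])
    Y = np.array([y[i + n_lags:i + n_lags + n_out] for i in range(n)])
    return X, Y

def iterative_forecast(one_step_model, last_window, n_steps):
    """Iterative multi-step forecasting with a single-output model."""
    window = list(last_window)
    preds = []
    for _ in range(n_steps):
        y_hat = float(one_step_model(np.array(window)))
        preds.append(y_hat)
        window = window[1:] + [y_hat]      # slide the window forward by one 15-min step
    return np.array(preds)
```

The iterative scheme needs a single model but lets errors accumulate through the fed-back predictions, whereas a multi-output network trained on the `n_out`-column targets predicts all horizons at once without feedback.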
Once the network model is identified, the network is first implemented and then trained. This step is called Model Estimation. For time-series prediction, training a neural network requires providing i) the vector containing the desired output data, ii) the number of regressors that define the prediction, iii) the vector containing the weights of both the input-to-hidden and hidden-to-output layers and iv) the data structure containing the parameters of the selected training algorithm. Finally, the training phase produces a training error, which represents the network performance index [42]. To compare their performance in forecasting GHI fairly, we trained all our neural networks on the very same Training-set and tested them on the very same Test-set.
The Model Validation step validates the trained network, i.e. evaluates its capabilities. In time-series prediction, the most common validation method consists of analysing the residuals (i.e. the prediction errors) on the test set. This method allows a set of tests to be performed, including the auto-correlation function of the residuals and the cross-correlation function between the inputs (controls) and the residuals. This analysis provides the test error, which is regarded as an estimate of the generalisation error. This index should not be much higher than the training error; if it is, the network may be overfitting the training set, which generally means that the selected model structure contains too many weights. The structure is then subjected to network optimisation and final validation. This process requires returning to the Model Estimation step to change and redefine some structural parameters, optimising the whole architecture. For this purpose, unnecessary weights must be pruned, and once the new weights are obtained, the network architecture must be re-validated. This is an iterative process, as highlighted in Figure 1 (see dotted arrows). Going back to model estimation generally means that the problem has several local minima and finding the global minimum is not easy. Going back to model structure selection means that the neural structure is not fit for purpose; typically, it is oversized, so it is common to apply pruning techniques. Generally, an initial model structure that is large enough to describe the system is determined and then gradually reduced until the optimal structure is achieved. Finally, going back to the experiment phase implies that certain regimes of the operating range are not reflected in the data-set; thus, additional experiments are needed to acquire information about the missing regimes.
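The residual analysis described above can be sketched as follows: the sample autocorrelation of the residuals is compared with the approximate 95% band ±1.96/√N expected for white noise, and the RMSE is used to compare training and test error. The band, the number of lags and the function names are conventional choices we assume for illustration; the exact tests of [40] may differ.

```python
import numpy as np

def residual_autocorrelation(residuals, max_lag=10):
    """Sample autocorrelation of prediction residuals for lags 1..max_lag."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    denom = np.sum(e * e)
    return np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, max_lag + 1)])

def whiteness_check(residuals, max_lag=10):
    """Flag lags whose autocorrelation falls outside the approximate 95% band +/-1.96/sqrt(N)."""
    acf = residual_autocorrelation(residuals, max_lag)
    bound = 1.96 / np.sqrt(len(residuals))
    return acf, np.abs(acf) > bound

def rmse(y_true, y_pred):
    """Root mean squared error, used to compare training and test error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```

Many flagged lags suggest that the residuals still contain predictable structure the model failed to capture, while a test RMSE much larger than the training RMSE points to overfitting and hence to pruning.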
The whole process of model estimation, validation and optimisation is reported in Section III-B and Section IV, where the best neural structure identified for each model and the prediction results are described, respectively. Figure 2 summarizes the overall process. The rest of this section describes the proposed methodology in depth.

A. DATA-SET AND DATA TRANSFORMATION
We exploit a data-set of 6 years of GHI measurements, sampled every 15 minutes by a meteorological station in Turin [21]. This data-set required some preprocessing. First, GHI can never be negative, so any negative values were set to 0. Then, when the raw GHI values in the data-set were compared with the generated clear-sky values (I_cs), some of them turned out to be higher than the corresponding I_cs. This is probably due to sampling errors of the sensor or, in some cases, to short-term cloud enhancement effects [43]. Since most of these peaks occur when the solar zenith angle is large [44], which is also when the reliability of the sensor is lower, we decided to filter these anomalous peaks: for each GHI > I_cs, GHI was set equal to I_cs.
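The two filtering rules above can be sketched in NumPy as follows. The sketch assumes that the measured GHI and the clear-sky values I_cs are aligned arrays in W m⁻²; the clear-sky model itself is not reproduced, and `clean_ghi` is a hypothetical helper name.

```python
import numpy as np

def clean_ghi(ghi, ghi_clear_sky):
    """Clip GHI to the physically plausible range [0, I_cs] (W m^-2)."""
    ghi = np.asarray(ghi, dtype=float)
    cs = np.asarray(ghi_clear_sky, dtype=float)
    ghi = np.maximum(ghi, 0.0)   # negative readings are set to zero
    return np.minimum(ghi, cs)   # readings above the clear-sky value are capped at I_cs
```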
After this basic pre-processing, we divide the data-set into a Training-set and a Test-set as follows: 5 years for training (2010-2014) and 1 year for testing (2015), with 175 296 and 35 038 samples, respectively. Then, we use about 10% of the Training-set for validation purposes in the training and model validation phases.
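A chronological split consistent with these figures can be sketched as below. The sketch assumes a gap-free 15-min grid and assumes that the validation subset is the last ~10% of the training period; the text only states that about 10% of the Training-set is used for validation, so the exact selection is our assumption.

```python
import numpy as np

# Gap-free 15-min grid over 2010-2015 (assumption: the real series may have gaps).
timestamps = np.arange(np.datetime64("2010-01-01T00:00"),
                       np.datetime64("2016-01-01T00:00"),
                       np.timedelta64(15, "m"))
years = timestamps.astype("datetime64[Y]").astype(int) + 1970

train_idx = np.flatnonzero(years <= 2014)   # Training-set: 2010-2014
test_idx = np.flatnonzero(years == 2015)    # Test-set: 2015

# Assumption: the validation subset is the last ~10% of the training period,
# kept in chronological order so that no future values leak into training.
n_val = int(0.1 * train_idx.size)
fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]
```

On a gap-free grid this yields 175 296 training slots, matching the figure above, and 35 040 slots for 2015; the reported 35 038 test samples are two fewer, presumably because of gaps in the measured series.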
Furthermore, we introduced two data-set transformations to improve prediction performance: i) Tikhonov regularization and ii) clear-sky index.

1) TIKHONOV REGULARIZATION
Generally, raw GHI data is characterized by many sudden peaks and variations. For this reason, we decided to try smoothing the data to make the training of the neural networks easier. The disadvantage of this approach is that some information about the variability of the phenomenon is necessarily lost. However, the potential benefit is that the networks could more easily follow the trends in the data, particularly for medium- or long-term predictions. This is a trade-off: the choice of this approach might depend on the required prediction horizon and on the application for which the predictions are needed. To smooth the original data, we exploit Tikhonov regularization [45]. This technique has been used for time-series analysis and prediction in other domains, such as glucose level prediction [46].
The filtered signal is given by Equation 1, where ω is the N-dimensional first derivative of the input signal and U_d is the integral operator matrix (Equation 2).
To calculate ω, the function f (ω) (Equation 3) needs to be minimized.
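Since Equations 1-3 are not reproduced here, the sketch below assumes a common Tikhonov formulation consistent with the definitions in the text: the filtered signal is ŷ = U_d ω, and ω minimizes f(ω) = ||y − U_d ω||² + λ_d ||L_d ω||², with U_d a cumulative-sum (discrete integral) matrix and L_d a second-difference matrix applied to ω. The closed-form minimizer solves (U_dᵀU_d + λ_d L_dᵀL_d) ω = U_dᵀ y. The exact matrices used in [46] may differ; λ_d = 3000 is the value adopted in this work, while subtracting the first sample before integrating is our own choice, made so that the initial signal level is not penalized as if it were a derivative spike.

```python
import numpy as np

def tikhonov_smooth(y, lam=3000.0):
    """Tikhonov smoothing of a 1-D series (assumed formulation, see lead-in).

    omega minimises ||(y - y[0]) - U omega||^2 + lam * ||L omega||^2, where
    U is the cumulative-sum (discrete integral) matrix and L the
    second-difference matrix; the filtered signal is y[0] + U omega.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    U = np.tril(np.ones((n, n)))              # integral operator U_d
    L = np.diff(np.eye(n), n=2, axis=0)       # (n-2) x n second-difference operator L_d
    target = y - y[0]                         # our choice: anchor at the first sample
    A = U.T @ U + lam * (L.T @ L)             # normal equations of the quadratic cost
    omega = np.linalg.solve(A, U.T @ target)  # estimated first derivative
    return y[0] + U @ omega                   # filtered signal
```

The dense n × n matrices limit this sketch to a few thousand samples; a long series would have to be filtered segment by segment, or the operators built as sparse matrices.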
In Equation 3, L_d is the second derivative operator matrix, while λ_d is the regularization parameter, set to 3000 in accordance with [46]. An example of the results of the filtering can be seen in Figure 3. This figure shows twelve consecutive days of the year on which the three main weather conditions occur: i) sunny day, ii) cloudy day, and iii) rainy day. The filtered data will be used for training the networks. Since the filter eliminates some peaks and spikes in the GHI data, giving a smoother signal, it should be easier for the networks to approximate it, potentially increasing the generalization capability of the model. Once the networks are trained, the original unfiltered GHI data will be used for testing, as shown in Figure 2.

2) CLEAR-SKY INDEX
The solar irradiance time-series exhibits a seasonal component and is therefore non-stationary. Some authors assert that neural networks can work well even with non-stationary time-series, given enough training data [27]. Others prefer to transform the solar irradiance into a stationary series [29], [33]. For a stationary series, statistical properties like mean and variance are constant in time, which should make it easier to predict than a non-stationary series. For this purpose, the clear-sky index (K_c) is used in the literature. K_c is the ratio between the measured irradiance and the expected irradiance under clear-sky conditions. For this work, after evaluating the performance of the networks with the original GHI data, we chose to repeat the experiments using the clear-sky index, defined as:
$$K_c = \frac{I_m}{I_{cs}} \quad (4)$$
where I_m is the measured irradiance and I_cs is the calculated clear-sky irradiance.
The K_c series was then used to train the networks. Since I_cs ≥ I_m, it follows that 0 ≤ K_c ≤ 1, so it is not necessary to scale the input data. To make predictions, K_c values were used as inputs; the predicted values (i.e. the outputs of the network) were then multiplied by the corresponding clear-sky values to obtain the GHI predictions, which could then be compared with the expected values.
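The transformation to K_c and its inverse are simple element-wise operations. The sketch below assumes that night-time samples, where I_cs = 0 and K_c is undefined, are mapped to K_c = 0; the text does not state how these samples are handled, and the helper names are hypothetical.

```python
import numpy as np

def to_clear_sky_index(ghi, ghi_cs):
    """K_c = I_m / I_cs; night-time samples (I_cs == 0) are set to 0 (assumption)."""
    ghi = np.asarray(ghi, dtype=float)
    ghi_cs = np.asarray(ghi_cs, dtype=float)
    kc = np.zeros_like(ghi)
    day = ghi_cs > 0
    kc[day] = ghi[day] / ghi_cs[day]
    return np.clip(kc, 0.0, 1.0)   # already in [0, 1] after the I_cs capping step

def from_clear_sky_index(kc_pred, ghi_cs):
    """Convert predicted K_c back to GHI by multiplying by the clear-sky value."""
    return np.asarray(kc_pred, dtype=float) * np.asarray(ghi_cs, dtype=float)
```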

B. PREDICTION MODEL BUILDING
Among the solutions we propose and compare, one of the most commonly used is the Feed-Forward Neural Network (FFNN). Alongside it, we propose a Nonlinear Autoregressive Neural Network (NAR) architecture, which, like the FFNN, is based on the Multilayer Perceptron. In addition, we propose a Long Short-Term Memory Network (LSTM) architecture, often used for time-series forecasting and successfully applied to GHI prediction in recent studies [34], [35]. Finally, we propose an architecture based on the Echo State Network (ESN), which has shown promising results in time-series forecasting [37]. In the following subsections, we present the neural architectures considered in this study. For each of them, we describe its properties and strengths, with particular emphasis on the hyperparameters considered and investigated. In Section IV, instead, we present in detail all the network configurations w.r.t. the exploited data-set, because the optimization of an architecture strictly depends on the data-set under consideration.

1) NONLINEAR AUTOREGRESSIVE NEURAL NETWORK
The Nonlinear Autoregressive Neural Network is an ANN that extends a traditional linear autoregressive model. It is particularly suitable for non-linear time-series that exhibit unexpected spikes and fleeting, transient periods [47]. The general structure is shown in Figure 4. The model can be described as:
$$\hat{y}(t) = F\big(y(t-1), y(t-2), \ldots, y(t-n)\big) \quad (5)$$
where F is an unknown non-linear function and, at time t, the network is fed with the n regressors of the signal y. We then determined the hyperparameters of the network, in particular the number of regressors and the number of units in the hidden layer. Since there is no mathematical rule to determine the best number of regressors, the choice was made by trial-and-error, going from 2 up to 20 regressors. For the hidden layer, we deliberately overestimated the initial size, selecting 30 hidden units. This is because this methodology includes a pruning step that eliminates superfluous weights and determines the best network configuration [21]. Once the parameters were selected, the network was trained using the Levenberg-Marquardt algorithm [48]. We then pruned the obtained network with the Optimal Brain Surgeon algorithm [49] and retrained it before making the inference for the predictions.
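The regressor vector fed to the NAR is simply a sliding window over the series. The minimal NumPy sketch below (the helper name and the newest-lag-first column order are our assumptions) builds, for each time t, the inputs [y(t−1), …, y(t−n)] and the targets y(t), …, y(t+h−1); with h = 1 it gives the single-step setting, while with h > 1 the same windows could feed a multi-output network.

```python
import numpy as np

def make_regressors(y, n_lags, horizon=1):
    """Lagged inputs [y(t-1), ..., y(t-n)] and targets [y(t), ..., y(t+h-1)]."""
    y = np.asarray(y, dtype=float)
    n_samples = y.size - n_lags - horizon + 1
    # Column i holds lag i+1, so each row lists the most recent value first.
    X = np.stack([y[n_lags - 1 - i: n_lags - 1 - i + n_samples] for i in range(n_lags)], axis=1)
    # Column j holds the value j steps after the first target.
    Y = np.stack([y[n_lags + j: n_lags + j + n_samples] for j in range(horizon)], axis=1)
    return X, Y
```

For example, `make_regressors(np.arange(6), n_lags=2)` pairs the inputs (1, 0) with the target 2, (2, 1) with 3, and so on.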

2) FEED-FORWARD NEURAL NETWORK
The second neural network is based on the Feed-Forward Neural Network (FFNN). Also based on the Multilayer Perceptron, it is characterized by dense, fully connected layers, in which information only moves in one direction, from the input to the output.
In FFNN models, the output of each node is generally calculated by applying an activation function to the weighted sum of its inputs. The activation function is usually non-linear, a common choice being the hyperbolic tangent (tanh). Considering an MLP with n inputs, one hidden layer with m units and one output, the output of the network can be modelled as follows:
$$\hat{y} = F\left(\sum_{i=1}^{m} W_i\, f\left(\sum_{j=1}^{n} w_{ij}\, u_j + w_{i0}\right) + W_0\right) \quad (6)$$
In Equation 6, F and f are the activation functions of the output and hidden layers, respectively, W_i and w_ij are the weights between the hidden and output layers and between the input and hidden layers, respectively, W_0 and w_i0 are the biases, and u_j are the inputs.
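Equation 6 can be checked numerically with a direct NumPy transcription. This sketch covers the forward pass only; it assumes tanh for f and a linear F (the activation choices adopted in this work), and the array shapes and function name are our own conventions.

```python
import numpy as np

def mlp_output(u, w, w0, W, W0, f=np.tanh, F=lambda a: a):
    """Eq. 6: y_hat = F( sum_i W_i * f( sum_j w_ij * u_j + w_i0 ) + W_0 ).

    u: (n,) inputs; w: (m, n) input-to-hidden weights; w0: (m,) hidden biases;
    W: (m,) hidden-to-output weights; W0: scalar output bias.
    """
    hidden = f(w @ u + w0)       # activations of the m hidden units
    return F(W @ hidden + W0)    # single network output
```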
In a preliminary phase, we considered architectures with different numbers of hidden layers. We found that the best compromise between prediction accuracy and computational cost is an architecture with two hidden layers. Once the model had been chosen, we determined the hyperparameters, i.e. the number of regressors and the activation function. For the hidden layers, we opted for the hyperbolic tangent (tanh) activation function, since it is a common choice and gives good results [50]. For the output layer, instead, we chose a linear activation function. As for the number of inputs and of units in the hidden layers, there is no established mathematical technique to choose the best values, so also in this case we opted for a trial-and-error approach. First, for networks with a single hidden layer, we arbitrarily set the number of units to twice the number of inputs plus one (2n + 1). Then, we investigated from 2 to 20 regressors, evaluating the performance in each case. The optimization algorithm used for training is Adaptive Moment Estimation (the Adam optimizer) [51]. This algorithm is closely related to two other optimization techniques, Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad), and combines their features. To avoid overfitting, we used the early-stopping technique [52]: during training, the process is stopped when the loss on the validation set does not improve for a given number of epochs. An additional benefit of early-stopping is a significant reduction of the training times.
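The optimizer and the stopping rule can be illustrated on a toy problem. The sketch below is not the authors' Keras training loop: it fits a linear model instead of the FFNN, and the learning rate, patience and epoch limit are arbitrary assumptions. It implements the Adam update with bias-corrected first and second moment estimates, and it stops when the validation MSE has not improved for `patience` epochs, returning the best weights seen.

```python
import numpy as np

def train_adam_early_stopping(X, y, X_val, y_val, lr=0.01, beta1=0.9, beta2=0.999,
                              eps=1e-8, max_epochs=3000, patience=100):
    """Fit y ~ X @ w with full-batch Adam; stop when the validation MSE stalls."""
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(w)   # second-moment (uncentred variance) estimate
    best_w, best_val, wait = w.copy(), np.inf, 0
    for t in range(1, max_epochs + 1):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)      # gradient of the training MSE
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)                  # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        val_mse = np.mean((X_val @ w - y_val) ** 2)
        if val_mse < best_val:                        # improvement: remember these weights
            best_w, best_val, wait = w.copy(), val_mse, 0
        else:
            wait += 1
            if wait >= patience:                      # no improvement for `patience` epochs
                break
    return best_w, best_val, t
```

In Keras, the same behaviour is typically obtained with `tf.keras.optimizers.Adam` and the `tf.keras.callbacks.EarlyStopping` callback (`monitor="val_loss"`, `patience=...`, `restore_best_weights=True`); the patience actually used in this work is not reported.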

3) LONG SHORT-TERM MEMORY NEURAL NETWORK
The Long Short-Term Memory Neural Network (LSTM) is an evolution of the canonical recurrent neural network, developed to solve the ''vanishing gradient'' problem [53], which arises when such networks are trained with backpropagation methods. These architectures are particularly suitable for time-series prediction because their structure preserves the error that is backpropagated through time and layers. By maintaining a more constant error, they allow recurrent nets to continue to learn over many time steps. Since the LSTM was proposed, different variations of the architecture have been developed [54]. The typical LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The LSTM unit structure is shown in Figure 5. The cell state, represented by the top horizontal line in Figure 5, contains information that is passed on to the next cell and can be modified by the gates. First, the forget gate decides what information is kept in or removed from the cell state: its sigmoid activation function outputs a value between 0 and 1, where 0 means forgetting the previous state completely and 1 means keeping it as is. Then, new information can be stored in the cell state. This is done by the input gate, which is composed of a sigmoid and a tanh layer. Finally, the output of the LSTM unit is determined by the output gate based on the cell state.
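Mathematically, the LSTM cell computes its short-term state h_t, its long-term state c_t and its output y_t at each time step t using the following set of equations, expressed in vector form:
$$
\begin{aligned}
i_t &= \sigma\left(W_{xi}^{\top} x_t + W_{hi}^{\top} h_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_{xf}^{\top} x_t + W_{hf}^{\top} h_{t-1} + b_f\right)\\
o_t &= \sigma\left(W_{xo}^{\top} x_t + W_{ho}^{\top} h_{t-1} + b_o\right)\\
g_t &= \tanh\left(W_{xg}^{\top} x_t + W_{hg}^{\top} h_{t-1} + b_g\right)\\
c_t &= f_t \otimes c_{t-1} + i_t \otimes g_t\\
y_t &= h_t = o_t \otimes \tanh\left(c_t\right)
\end{aligned}
\quad (7)
$$
where W_xi, W_xf, W_xo and W_xg are the weight matrices of the connections to the input vector x_t, W_hi, W_hf, W_ho and W_hg are the weight matrices of the connections to the previous short-term state vector h_{t−1}, b_i, b_f, b_o and b_g are the bias terms, σ is the sigmoid function and ⊗ denotes element-wise multiplication. For our implementation, we used LSTM layers for the hidden part of the networks, while the output layer is a simple dense layer, as for the FFNN. As already discussed for the FFNN in Section III-B2, the same trial-and-error approach was adopted to determine the number of regressors, hidden units and hidden layers. The Adam optimizer was again used for training, together with the same early-stopping technique to avoid overfitting.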
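A single LSTM time step can be written directly in NumPy. This sketch is illustrative only: it uses row-vector inputs (so `x_t @ p["W_xi"]` corresponds to W_xiᵀ x_t), small random weights, and hypothetical helper names, and in the usage below it feeds a window of 16 regressors one value per step, which is one possible way of presenting the regressors to the LSTM.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names like 'W_xi', 'W_hi', 'b_i' to arrays."""
    i = sigmoid(x_t @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])   # input gate
    f = sigmoid(x_t @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])   # forget gate
    o = sigmoid(x_t @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])   # output gate
    g = np.tanh(x_t @ p["W_xg"] + h_prev @ p["W_hg"] + p["b_g"])   # candidate cell content
    c = f * c_prev + i * g          # long-term state c_t
    h = o * np.tanh(c)              # short-term state h_t (= output y_t)
    return h, c

def init_params(n_in, n_units, rng, scale=0.1):
    """Small random weights and zero biases for the four gates (i, f, o, g)."""
    p = {}
    for gate in "ifog":
        p[f"W_x{gate}"] = scale * rng.standard_normal((n_in, n_units))
        p[f"W_h{gate}"] = scale * rng.standard_normal((n_units, n_units))
        p[f"b_{gate}"] = np.zeros(n_units)
    return p

# Usage: run 16 regressors through a 10-unit cell, one value per time step.
rng = np.random.default_rng(0)
params = init_params(1, 10, rng)
h, c = np.zeros(10), np.zeros(10)
for value in np.linspace(0.0, 1.0, 16):
    h, c = lstm_step(np.array([value]), h, c, params)
```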

4) ECHO STATE NETWORK
The Echo State Network (ESN) is a recurrent neural network composed of an input layer, a recurrent hidden layer called the ''reservoir'' and an output layer (see Figure 6). ESNs are an implementation of so-called ''reservoir computing'' [55]. The main idea is to have a fixed, random, sparsely connected recurrent layer and a readout layer connecting the reservoir to the output. In the simplest architecture, these output connections are the only trainable ones. It is also possible to add direct trainable connections from input to output, bypassing the reservoir, and feedback connections from output to reservoir [56].
The reservoir must satisfy the ''echo state property'' (ESP), which guarantees that the effects of the initial conditions vanish as time passes. It has been empirically observed that the ESP usually holds when the spectral radius of the reservoir weight matrix is lower than one [56]. However, this condition is neither sufficient nor necessary. Stricter conditions have been determined by Yildiz et al. [57]. Jaeger [56] gives the following definitions for the ESN reservoir state (Equation 8) and output (Equation 9).
where x_t is the reservoir state at time t, u_t is the input vector, y_t is the output of the network, z_t = [x_t; u_t] is the system state, and W, W_in, W_fb and W_out are the reservoir weight matrix, the input weight matrix, the optional feedback matrix and the output matrix, respectively. One interesting feature of ESNs is that, unlike other recurrent neural networks, they are straightforward to train: since the recurrent layer is fixed, the training process is greatly simplified. Nevertheless, some hyperparameters need to be determined. The number of inputs (regressors) was determined by a trial-and-error approach. For the size of the reservoir, Lukoševičius [58] suggests that a ''big'' reservoir is usually better, given the sparsity of the connections between its units. Clearly, it is not possible to implement an arbitrarily big reservoir, since memory consumption needs to be taken into account. Keeping this in mind, the first choice was to use 500 units. Starting from that upper limit, reservoirs with 50, 100 and 200 units were also tested, to verify the assumption that ''bigger is better''. The reservoir density was again chosen by trial-and-error, and a value of 0.1 was selected. Another important parameter is the spectral radius. As already discussed, a value lower than 1 should guarantee the echo state property. A value close to 1 is usually a good choice, as suggested in [58], so 0.9 was selected.
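The ESN ingredients discussed above (a fixed sparse random reservoir, spectral-radius scaling, and a trained readout only) can be sketched in NumPy as follows. Several details are our assumptions and are not stated in the text: uniform random weights, a tanh state update without output feedback, and a ridge-regression readout on z_t = [x_t; u_t] with an arbitrary regularization of 1e-6. The density (0.1) and spectral radius (0.9) match the values selected in this work; as noted above, a spectral radius below 1 is only a heuristic for the echo state property.

```python
import numpy as np

def make_reservoir(n_units, density, spectral_radius, rng):
    """Sparse random reservoir W, rescaled to the requested spectral radius."""
    W = rng.uniform(-1.0, 1.0, (n_units, n_units))
    W *= rng.random((n_units, n_units)) < density      # keep ~density of the connections
    rho = np.max(np.abs(np.linalg.eigvals(W)))         # current spectral radius
    return W * (spectral_radius / rho)

def run_reservoir(U, W, W_in):
    """Collect the states x_t = tanh(W_in u_t + W x_{t-1}) (no output feedback)."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(U), W.shape[0]))
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

def fit_readout(states, U, Y, ridge=1e-6):
    """Ridge-regression readout on z_t = [x_t; u_t]; only W_out is trained."""
    Z = np.hstack([states, U])
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(Z.shape[1]), Z.T @ Y)
```

Here `W_out` is stored as (features × outputs), so predictions are computed as `np.hstack([states, U]) @ W_out`; the usual column-vector notation uses its transpose. In practice, the first states (a ''washout'' period) are discarded before fitting, so that the arbitrary initial state does not affect the readout.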

C. MULTI-STEP PREDICTIONS
Generally, multi-step time-series predictions can be obtained from neural networks in three different modes [59]:
• iterative prediction;
• multi-output neural network;
• dedicated network for each forecast horizon.
The iterative method is often used, particularly for short-term horizons (i.e. the next few time-steps). It is based on a network model trained for single-step predictions (i.e. with only one output), where, for each step after the first one, the prediction for the previous step is used as input. This has the disadvantage that prediction errors tend to accumulate. Another option is to use a single network with n outputs, where n is the number of steps to predict. Unlike the previous method, the number of future steps to be predicted must be fixed when the model is built. However, for longer forecast horizons, this method is expected to perform better than the iterative method. Finally, it is possible to use a different network for each forecast horizon. However, the study in [59] shows that this methodology gives better results for short-term horizons (e.g. the next 2 or 3 time-steps), but is outperformed by the multi-output method for longer horizons.
In this work, we evaluate the first two methods, showing that the multi-output approach outperforms the iterative one (see Section IV-B); it was therefore chosen for all the following experiments. This does not apply to the NAR network, however, since its implementation already contains a function to perform multi-step predictions, which was therefore used.
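The error accumulation of the iterative mode can be seen with a toy one-step model. The sketch assumes a generic callable `one_step_model` (a hypothetical name) that maps the most recent n lags, newest first, to the next value; a multi-output network would instead map the same window directly to all n future values in a single pass, so no prediction is fed back as an input.

```python
import numpy as np

def iterative_forecast(one_step_model, history, n_lags, n_steps):
    """Roll a single-output model forward, feeding back its own predictions."""
    window = list(np.asarray(history, dtype=float)[-n_lags:])   # oldest ... newest
    preds = []
    for _ in range(n_steps):
        y_next = float(one_step_model(np.array(window[::-1])))  # newest lag first
        preds.append(y_next)
        window = window[1:] + [y_next]   # the prediction (and its error) becomes an input
    return np.array(preds)

# Toy process y(t+1) = 0.5 y(t); a slightly biased model uses 0.55 instead.
exact = iterative_forecast(lambda lags: 0.5 * lags[0], [8.0], n_lags=1, n_steps=3)
biased = iterative_forecast(lambda lags: 0.55 * lags[0], [8.0], n_lags=1, n_steps=3)
relative_error = biased / exact - 1.0
```

With the biased model, the relative error of the iterative forecast is 10% at one step, 21% at two steps and 33.1% at three steps, because each prediction is fed back as an input.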

IV. RESULTS
In this Section, we present our experimental results. First, we briefly describe the statistical indicators used to analyse and compare the predictions. Then, we show that by using a multi-output artificial neural network for predictions many steps ahead, we can obtain better results than with the iterative method, justifying the choice of the former approach. Finally, we describe the prediction performance obtained with raw GHI data, Tikhonov regularisation and the clear-sky index.
The proposed ANNs are implemented in Python using the Keras library with the TensorFlow backend. To train and validate them, we ran our simulations on a server equipped with 2× Intel Xeon E5-2680 v3 2.50 GHz CPUs and 128 GB of RAM.

A. STATISTICAL INDICATORS
The literature offers a large number of statistical indicators to evaluate the prediction performance of neural networks [60]. In the study of time-series, the three main indexes adopted are:
• MAD - the Mean Absolute Difference between predicted and observed values;
• RMSD - the Root Mean Square Deviation, defined as the standard deviation of the difference between the predicted and the observed values;
• R² - the Coefficient of Determination, defined as the square of the correlation (R) between predicted and observed values.
Their mathematical expressions are shown in Equations 10, 11 and 12, respectively, where I_m is the measured value of irradiance, I_p is the predicted value, and the subscript avg indicates the average value. MAD and RMSD are expressed as percentages rather than in absolute units (i.e. W m⁻² for irradiance). For MAD and RMSD, a lower value indicates a smaller error and, therefore, better performance. R², on the other hand, shows the correlation between real and predicted values: a value of 1 means complete correlation, while lower values indicate weaker correlation. Therefore, to evaluate the performance of each single neural network model, we first compare the final GHI forecast with the corresponding real trend and express the error rate with the statistical performance indicators described above. Then, to perform a fair comparison between the different neural network models, we compare their performance indicators. To do so, all the models must be trained and tested i) with the very same original data-set, to which ii) the very same pre-processing technique has been applied, and iii) after their optimization phases have been completed.
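Since Equations 10-12 are not reproduced here, the sketch below assumes a common normalization for percentage indicators: MAD and RMSD are divided by the average measured irradiance I_m,avg and multiplied by 100, and R² is the squared Pearson correlation between I_m and I_p. RMSD is computed as the root of the mean squared difference, which coincides with the standard deviation of the difference only when the mean difference is zero. Whether night-time samples are excluded before computing the indicators is not stated; the function simply uses all the samples passed to it.

```python
import numpy as np

def forecast_metrics(i_m, i_p):
    """MAD% and RMSD% (normalised by the mean measured value, an assumption) and R^2."""
    i_m = np.asarray(i_m, dtype=float)
    i_p = np.asarray(i_p, dtype=float)
    diff = i_p - i_m
    mad = 100.0 * np.mean(np.abs(diff)) / np.mean(i_m)
    rmsd = 100.0 * np.sqrt(np.mean(diff ** 2)) / np.mean(i_m)
    r = np.corrcoef(i_m, i_p)[0, 1]      # Pearson correlation R
    return mad, rmsd, r ** 2
```

For measured values (100, 200, 300) W m⁻² and predictions (110, 190, 300) W m⁻², this gives MAD ≈ 3.33% and RMSD ≈ 4.08%.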

B. ITERATIVE VS. MULTI-OUTPUT NETWORKS
As introduced in Section III-C, we applied two different methodologies for multi-step predictions: iterative and multi-output. The former uses a single-step model and iteratively generates multiple predictions; the latter gives the desired n predictions (i.e. n steps into the future) in a single computation step. Figure 7 shows the comparison for the FFNN exploiting raw GHI data with time horizons from 15 min to 2 h. The performance is similar for the first 30 minutes, but the multi-output network starts to perform better for mid-term time horizons. As the time horizon increases, the error of the iterative approach accumulates, further widening the gap between the two approaches. Figure 8 depicts the same experiment for the LSTM network, still exploiting raw GHI. Again, the LSTM shows the same behaviour, with similar performance of the two approaches for the first 30 min. We did not apply the iterative approach to NAR and ESN, because their models already embed a feature to perform multi-step predictions. These results justify our choice of using networks with multiple outputs to predict GHI many steps ahead, in accordance with [59]. Consequently, all the results illustrated in the following sections are based on the multi-output approach.

C. GHI PREDICTION EXPLOITING RAW GHI DATA
In this section, we present the performance of our neural networks optimised to use the raw GHI data-set for both training and test (see Section III-A).
Starting from NAR, we used as reference the architecture described in detail in [21]. We designed the same network (with 7 regressors and 30 neurons, before pruning), exploiting the very same data-set (for both training and test) to guarantee a fair comparison. Then, as described in Section III-B1, different numbers of regressors were evaluated to improve the performance and find the best network. Table 2 reports the comparison of raw GHI predictions between the reference model with 7 regressors [21] (hereinafter NAR-7) and our best model with 12 regressors (hereinafter NAR-12) in terms of MAD, R² and RMSD.
The results show that NAR-12 performs slightly better than NAR-7 for all three statistical indicators, particularly for longer time horizons. Up to 45 min, the prediction performance is quite similar; the improvements become noticeable from 60 min onward. At 120 min, NAR-12 improves on NAR-7 by about 6%, 0.05 and 8% for MAD, R² and RMSD, respectively. Consequently, we chose this new model for comparison with the other networks.
For FFNN, LSTM and ESN, we tested different configurations, as described in Section III, and compared their RMSD values to decide which parameters give the best results. In the following, we report the configuration of each neural network:
• LSTM: 16 regressors with 2 hidden layers; the first layer with 10 neurons and the second with 5 neurons.
• ESN: 3 regressors with a reservoir of 500 neurons.
Looking at these parameters, it can be noted that both FFNN and LSTM work better with a greater number of regressors, while the ESN has the best performance with just 3 regressors. As for the size of the networks, the ESN works well with a big reservoir, as recognised in the literature [58]. The FFNN also gives better results with a large number of units, whereas the LSTM works better with a smaller architecture. Our first expectation was that the multi-output architecture used for multi-step prediction would improve on the results of the NAR, particularly for longer time horizons. We also expected the LSTM to perform better than the FFNN, since its recurrent structure should be more suitable to model the temporal behaviour of the GHI time-series. Moreover, the LSTM has shown very good performance in many tasks, including time-series forecasting [54].
Figures 9(a) to 9(c) show plots of MAD, R² and RMSD over the forecast horizon for all the proposed neural network architectures. These plots show that the results are very similar for short-term predictions (i.e. up to 60 min). Then, the four trends start to diverge from each other for all three indicators. The performance of NAR-12 (see the green dotted line) rapidly degrades compared to the other networks. Contrary to our expectations, the LSTM does not clearly outperform the FFNN (see the red and yellow dotted lines, respectively). In fact, both architectures have very similar trends with a few negligible differences. These plots highlight the very good results of the ESN for mid-term forecasts, which clearly outperforms the other neural networks. Table 3 details the values of MAD, R² and RMSD for the four networks for predictions up to 120 min ahead. In our view, this is a good time horizon within which the forecast error is still acceptable. As reported in Table 3, at 120 min the ESN outperforms the other neural networks with MAD = 29.03%, R² = 0.88 and RMSD = 55.22%, improving the performance by about 12%, 0.07 and 14% (for MAD, R² and RMSD, respectively) w.r.t. NAR-12.

D. GHI PREDICTION EXPLOITING TIKHONOV REGULARISATION
As a second analysis, we focused on predictions based on the Tikhonov regularisation applied to the raw GHI data-set, as described in Section III-A1. As discussed in [46], Tikhonov regularisation is not applicable to real-time data. Thus, we exploited the ''hybrid'' approach presented in [46], in which the Tikhonov regularisation is applied only to the training-set used for training our ANNs, while the test-set, used to assess the performance of our ANNs, consists of raw GHI data.
Since the training data differ from those used for NAR-7, we applied the very same methodology described in [21] to design the best NAR for this training-set pre-processed with Tikhonov regularisation. The resulting architecture (hereinafter NAR-10) is characterised by 10 regressors and 21 neurons, before pruning. To design and train the best network architecture for FFNN, LSTM and ESN, we applied the same trial-and-error approach described in Section IV-C. In the following, we report the configuration of each neural network:
• FFNN: 10 regressors and 2 hidden layers; the first layer with 50 neurons and the second with 25 neurons.
• LSTM: 16 regressors with 2 hidden layers; the first layer with 10 neurons and the second with 5 neurons.
• ESN: 3 regressors with a reservoir of 100 neurons.
Table 4 reports the values of MAD, R² and RMSD for the four ANNs for predictions up to 120 min ahead. Comparing these results with those in Table 3 (i.e. prediction performance exploiting raw GHI data), we notice that the performance of all the ANNs degrades when we exploit this ''hybrid'' approach. In general, the prediction error is too high to be acceptable. This is also confirmed by the trends reported in Figures 10(a) to 10(c). Thus, this ''hybrid'' approach is not suitable for this application scenario.
If we also apply the Tikhonov regularisation to the test-set, performance significantly improves, on average by about 10%, 1 and 25% for MAD, R² and RMSD, respectively (see Table 5 and Figures 11(a), 11(b) and 11(c)). However, the Tikhonov regularisation would have to be rethought to work in real time.

E. CLEAR-SKY INDEX PREDICTION
As a third analysis, we focus on predictions exploiting the clear-sky index (K_c). The same network configurations for the NAR, FFNN, LSTM and ESN used for the GHI prediction exploiting Tikhonov regularisation (see Section IV-D) were also applied to this new data-set.
The transformation of GHI into K_c removes the seasonal trends due to the changing position of the sun during the year, and the clear-sky model already takes into account the atmospheric turbidity. Thus, the networks only have to predict the stochastic component due to clouds. The expectation is that the results will be better than those described in the previous sections, because these filtered data should be easier for the neural networks to model. The obtained results confirm this assumption. As shown in Figures 12(a) to 12(c), the gap between the ESN and the other three neural networks is reduced in terms of MAD, R² and RMSD. The ESN improves only slightly but still outperforms the others. It appears that the ESN was already able to extract the seasonal trends from the raw GHI data better than the other networks. When only the stochastic component needs to be predicted, and the rest is handled by the clear-sky model, the differences among the networks greatly decrease. As reported in Table 6, at 120 min the ESN still outperforms the other neural networks with MAD = 25.47%, R² = 0.88 and RMSD = 55.02%. Compared with NAR-10, which gives the worst performance among the four neural networks, the ESN improves by about 0.6%, 0.03 and 5.4% (for MAD, R² and RMSD, respectively).

F. FINAL REMARKS ON PREDICTION RESULTS
Considering the results discussed in the previous sections and reported in Figure 13, the main findings can be summarised as follows:
• designing ANNs following the multi-output approach to forecast GHI provides better performance than the iterative approach;
• the Echo State Network is the best-performing ANN architecture among those tested;
• using the clear-sky index, the prediction accuracy significantly improves and smaller networks with fewer regressors can be used;
• the best performance is achieved when Tikhonov regularisation is applied; however, to be suitable for GHI forecasts, it should be rethought to work in real time;
• 120 min is the maximum time horizon reached by our ANNs within which the forecast error is still acceptable.

V. CONCLUSION
The objective of this manuscript is to propose a methodology for short-term (i.e. from 15 up to 60 min) and mid-term (i.e. from 60 to 120 min) GHI forecasting. Given the highly non-linear nature of the physical phenomenon of radiation, the methodology is based on four different artificial neural networks:
• Nonlinear Autoregressive Neural Network;
• Feed-Forward Neural Network;
• Long Short-Term Memory Network;
• Echo State Network.
In detail, we specifically designed, optimized and compared these four ANN architectures to understand which is the best to use in this research context.
To fully optimize the proposed methodology, we investigated these neural architectures by applying three different approaches, based mainly on the use of filtering techniques in the pre-processing phase of the data-set. Specifically, in the first scenario, the raw GHI was directly used to train the networks and to make predictions. In the second, the training set of the GHI series was filtered with Tikhonov regularisation before performing the training procedure. In the third, the GHI time-series was transformed into the clear-sky index series, which was then used to train the networks and make predictions. Finally, we also compared the iterative and multi-output prediction approaches to determine which performs best.
The experimental results suggest a few interesting considerations. First of all, using a multi-output approach significantly improves the accuracy of multi-step predictions when forecasts more than 60 min ahead are required. In our case, this means that for forecast horizons of 45 or 60 min and longer, this approach is to be preferred over a single-output model used iteratively. Considering that a single-output model is simpler, it may be better to use the iterative approach when longer-term predictions are not needed by the application.
From a neural network point of view, the experimental results show that the ESN gives very good results compared to the other models, especially when directly predicting GHI. Moreover, it needs fewer regressors than the other models to give very accurate results. The adoption of this model for GHI forecasting has been insufficiently investigated; given this lack of research in the literature, we believe that these findings are new and worthy of further study.
The proposed approach that uses Tikhonov regularisation to filter the training data and the unfiltered GHI for testing, which has been used successfully in other time-series forecasting fields such as blood glucose prediction, does not appear suitable for GHI forecasting. Therefore, filtered data were also used as input for testing. This is not ideal since, when the system is used for actual predictions ''in the field'', this method would require filtering the data every time a new forecast is requested, which might limit its applicability. Moreover, the Tikhonov filter was applied to the whole test set, divided into long segments; in a real application using real-time data, this is not possible, because new data would have to be filtered as they become available (e.g. every 15 min in our scenarios). The algorithm would have to be modified accordingly, and the way this might affect the results needs to be studied in more detail. However, with this approach (i.e. exploiting filtered data in the entire process, from training to inference), the results were more interesting, showing that filtered data potentially allow better accuracy to be maintained for short- and mid-term forecasts.
Finally, from a filtering point of view, the results show that the clear-sky index, $K_c$, greatly improves prediction accuracy when predicting many steps ahead, particularly for the NAR, FFNN and LSTM networks. As already stated, the improvement for the ESN is small. However, using $K_c$ allowed the ESN to achieve better results with a smaller reservoir, which is important in terms of memory usage.
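As a reminder of how the clear-sky index works, $K_c$ divides the measured GHI by a clear-sky estimate $GHI_{cs}$, so the network mostly has to forecast the stochastic, cloud-driven component while the deterministic solar geometry is multiplied back afterwards. The minimal sketch below assumes NumPy, Cooper's approximation for the solar declination, local solar time, and the Haurwitz clear-sky model; these choices, the night-time threshold `min_cs` and the function names are illustrative assumptions and not necessarily the clear-sky model used in this work.

```python
import numpy as np

def cos_zenith(lat_deg, day_of_year, solar_hour):
    """Cosine of the solar zenith angle (Cooper's declination, local solar time in hours)."""
    decl = np.radians(23.45) * np.sin(2 * np.pi * (284 + day_of_year) / 365)
    hour_angle = np.radians(15.0 * (np.asarray(solar_hour, dtype=float) - 12.0))
    lat = np.radians(lat_deg)
    return np.sin(lat) * np.sin(decl) + np.cos(lat) * np.cos(decl) * np.cos(hour_angle)

def ghi_clear_sky(cos_z):
    """Haurwitz clear-sky GHI in W/m^2; zero when the sun is below the horizon."""
    cz = np.asarray(cos_z, dtype=float)
    up = cz > 0
    safe = np.where(up, cz, 1.0)  # avoid dividing by non-positive values at night
    return np.where(up, 1098.0 * safe * np.exp(-0.059 / safe), 0.0)

def clear_sky_index(ghi, ghi_cs, min_cs=10.0):
    """K_c = GHI / GHI_cs, set to 0 where the clear-sky value is too small (night, dawn)."""
    ghi = np.asarray(ghi, dtype=float)
    cs = np.asarray(ghi_cs, dtype=float)
    ok = cs > min_cs
    return np.where(ok, ghi / np.where(ok, cs, 1.0), 0.0)
```

A forecast $\hat{K}_c$ is converted back to irradiance as $\widehat{GHI} = \hat{K}_c \cdot GHI_{cs}$, using the clear-sky value at the target time. Setting $K_c$ to 0 at night is one convention; masking night-time samples is another.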
ALESSANDRO ALIBERTI (Member, IEEE) received the B.S. and M.S. degrees in cinema and media engineering from the Politecnico di Torino, and the Ph.D. degree in computer engineering from the Politecnico di Torino, in 2020. He is currently a Postdoctoral Researcher with the Politecnico di Torino. During his academic career, he has proposed innovative and optimized data stream processing and machine learning methodologies for data ranging from energy and environmental data to data from cyber-physical systems (CPS). Moving toward smart and sustainable energy use, his research interests include the design and optimization of innovative machine learning methodologies, primarily exploiting neural networks for time-series forecasting in the smart city context.

DANIELE FUCINI received the M.S. degree in computer engineering from the Politecnico di Torino, Turin, in 2019. During his academic career, his activities focused mainly on the design of innovative neural network techniques to forecast global horizontal irradiance, investigating the implementation of such neural models on embedded systems.
LORENZO BOTTACCIOLI (Member, IEEE) received the Ph.D. degree (cum laude) in computer engineering from the Politecnico di Torino, Italy, in 2018. He is currently an Assistant Professor with the Energy Center Lab, Politecnico di Torino. His main research interests include smart energy, smart city, and smart communities, with a focus on software solutions for planning, analysing, and optimizing smart energy systems, and for spatial representation of energy information. He has been with the Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino.

He is currently a Full Professor of computer engineering with the Università di Bologna. His research interests include parallel computing for distributed embedded systems, such as multicore and sensor networks, software solutions for smart cities, and simulation and analysis of biological systems using parallel architectures. In the fields above, he has authored over 200 scientific publications between 2000 and 2018.
EDOARDO PATTI (Member, IEEE) received the M.Sc. and Ph.D. degrees in computer engineering from the Politecnico di Torino, in 2010 and 2014, respectively. From 2014 to 2015, he held an academic visiting position with The University of Manchester. He is currently an Assistant Professor with the Politecnico di Torino. His research interests include ubiquitous computing, the Internet of Things, smart systems and cities, software architectures with particular emphasis on infrastructure for ambient intelligence, software solutions for simulating and optimizing energy systems, and software solutions for energy data visualization to increase user awareness. In the fields above, he has authored over 70 scientific publications between 2011 and 2018.