A Selective Ensemble Approach for Accuracy Improvement and Computational Load Reduction in ANN-based PV power forecasting

Day-ahead power forecasting is an effective way to deal with the challenges posed by the increased penetration of photovoltaic power into the electric grid, which stem from its non-programmable nature. This is significantly beneficial for smart grid and micro-grid applications. Machine learning and hybrid approaches are well-established techniques, able to provide effective forecasting with a data-driven approach based on previous measurements from existing power plants. Ensemble methods can be employed to increase solar power forecasting accuracy by running several independent forecasting models in parallel. In this paper, a novel selective approach is proposed and assessed, where independently trained neural networks are evaluated in terms of accuracy in order to select a suitable forecast. Moreover, in order to reduce the associated computational burden, new normalization approaches are proposed and evaluated. The considered experimental case study shows that the combination of the proposed procedures is able to increase accuracy and to mitigate the overall computational load, resulting in a simple and lightweight algorithm. Additionally, a comparison with other commonly used techniques shows that the proposed approach is robust with respect to limited dataset size and discontinuities.


I. INTRODUCTION
In recent years, energy systems have been progressively shifting towards the integration of Renewable Energy Sources (RES) into the Smart Grid (SG). In particular, in 2015 the United Nations included among the 17 Sustainable Development Goals (SDGs) one objective regarding energy [1], underlining the need to significantly increase the share of renewables in the global energy mix, together with the overall energy efficiency of the system [2].
Among the RES, wind turbines and solar photovoltaic (PV) systems are the most relevant technologies and will be essential to meet future energy needs while reducing carbon emissions [3], [4]. Nevertheless, their intrinsic variability causes problems in balancing energy supply and demand that have yet to be fully addressed [5]; thus, more accurate energy forecasting approaches would benefit the stability of SGs and stand-alone microgrids, the balancing of demand and supply, the integration of distributed generation, monitoring, and energy market operation [6].
PV power production, in particular, strongly relies on meteorological conditions. Of particular importance is the solar radiation received by the solar module, which can generally be decomposed into two main contributions: one deterministic, depending on the relative position of the Earth and the Sun, and one stochastic, due to atmospheric conditions and clouds [7].
Distributed Energy Management Systems (EMS) are envisaged for an efficient utilization of PV power in smart and micro-grids, by suitable integration of forecasting and operation planning [8].
In order to perform the PV power forecast, different forecasting models can be implemented. Depending on their nature, they are generally divided into three classes: physical, statistical and hybrid [9].
Physical methods, also referred to as parametric or "white box" methods, directly use available weather conditions as input features to a physical model of the PV system; thus, they strongly rely on the accuracy of the weather forecast they are provided with [10] and on a proper selection of weather parameters. Furthermore, a complete model of both the system and the surrounding environment is hardly achievable with a high degree of precision.
Data-driven methods, such as Markov chains and exponential smoothing, are statistical approaches, where the underlying model is based on previous measurements rather than on a physical model of the system [11]; thus, they strongly rely on the accuracy of the historical data collected. Among data-driven methods, "black box" approaches such as Machine Learning (ML) and Artificial Neural Networks (ANN) have been reported to guarantee high levels of accuracy when applied to the power forecasting of PV systems [12]. More recently, the use of recurrent neural network models has also been proposed for short-term residential load forecasting in SG applications [13].
Currently, among the several data-driven approaches proposed in the literature, great attention has been gained by Deep Learning, Convolutional Neural Networks [14] and Long Short-Term Memory (LSTM) networks [15].
Hybrid methods are usually able to suitably combine the previous approaches, improving solar forecasting performance based on the historical data by compensating deviations [16]. Several approaches have been successfully proposed in the literature: in [17], an improvement in wind power forecasting is obtained by hybridizing independent forecasting models; in [18], a physical-hybrid-based forecasting model is used to effectively monitor PV modules and to evaluate faults or anomalous trends that may affect the PV plant; in [19], different components of irradiance and additional weather features are employed to improve the input of ML models.
In addition, increasing attention is nowadays given to the development of "ensemble methods", which improve forecast accuracy and overcome the limitations of single models [20]. Generally, an "ensemble" consists of a set of individually trained models whose predictions are combined when classifying novel instances [21]. Ensemble methods are primarily divided into cooperative ensembles, where each ensemble member performs the same task and their predictions are aggregated to obtain improved performance [22], and competitive ensembles where, on the contrary, the best prediction of a relevant ensemble member is selected for a particular input [23]. Ensembles can be obtained both by using different training sets and by adopting ANNs with different structures, and this has proved effective both in ambient temperature [24] and in solar radiation forecasting [25].
Recently, ML methods for PV have been based also on LSTM models and aggregation functions, in order to achieve accurate predictions thanks to their recurrent architecture and memory units [26]. Effective strategies for aggregating the predictions of deep learning models are also addressed in [27], exploiting long-term information to properly model solar irradiance fluctuations.
In this view, the authors recently proposed a preliminary attempt to employ several parallel ML models, suitably selected, as an effective way to improve power forecasting accuracy [28]. However, the use of the ensemble approach generally implies reaching a tradeoff between the improved accuracy and the increased computational load.
In the present work, the authors focus on day-ahead PV power prediction by means of ML techniques with an ensemble approach. Novel methodologies aiming at increasing the forecast accuracy and reducing the computational load are proposed. In particular, accuracy is addressed by analyzing the new selective ensemble approach, where several independent models are assessed to properly choose the most effective ensemble. Moreover, in order to reduce the computational burden associated with the proposed ensemble procedure, new normalization approaches are presented and evaluated.
Finally, a suitable experimental case study is considered in order to show the forecasting accuracy improvement and the reduced computational load, thus resulting in a simple, robust and lightweight algorithm, capable of being implemented on an industrial micro-controller.
The present paper is structured as follows: in Section II, the adopted physical hybrid approach is presented, together with the metrics used for performance measurement and a presentation of the case study. In Section III, the proposed parallel computing approach is analyzed, specifically addressing the accuracy of the selective ensemble technique. The computational burden issues are addressed in Section IV, where suitably defined techniques for pre- and post-processing data are presented to improve the performance. Numerical results are reported in Section V to validate the proposed approach. Section VI concludes the paper with a brief discussion of perspectives and future research.

II. THE FORECASTING MODEL
Among the previously mentioned approaches, hybrid predicting methods combining deterministic models with ANN have been demonstrated to improve forecasting accuracies [16]. In this work, a simple Multi-Layer Perceptron (MLP) with 12 neurons in the first hidden layer and 5 in the second one is hybridized with the physical Clear-sky Solar Radiation model (CSRM) [29]. This method is called Physical Hybrid Artificial Neural Network (PHANN) and it is described in detail in [30], while a simple scheme of its functioning is reported in Figure 1.
This ML architecture has been chosen for its simplicity and light weight in view of its implementation on an industrial micro-controller [18], [31]. This approach has already been demonstrated to provide a significant accuracy improvement with respect to both deterministic and standard ML approaches [16], thus providing a reliable day-ahead prediction based on forecasted weather parameters, which are the input of the network trained with the historical dataset.
However, a well-known drawback of the non-linear approximation ability of neural networks is the risk of poor generalization. In fact, an ANN that is properly sized, in terms of number of neurons in the hidden layers, can be correctly trained with an arbitrary training data set; nevertheless, it could learn both the investigated dependencies and the noise, worsening the predictive ability of the network. This gives rise to the well-known issue of overtraining: the network most likely learns the gross existing links among parameters first, and then the fine structure that is also generated by noise [32]. This issue can be mitigated by control techniques on the convergence of the ANN model, as described in the following section.

A. CASE STUDY
In order to check the performance and the accuracy of the novel approaches proposed in this work, the study was conducted on experimental data collected at the SolarTech LAB of Politecnico di Milano, Italy, located on the rooftop of the Department of Energy [33]. In this laboratory, different technologies are simultaneously tested; in order to test the modules under different meteorological conditions, each of them is individually controlled by a dedicated micro-inverter. All PV modules are oriented with an azimuth γ equal to −6°30′ (assuming 0° is due South, positive towards West) and a tilt θ of 30°. The tilt of the modules can be modified together with the distance among arrays.
In order to properly assess the capability of the proposed model, the production of a single monocrystalline module with a nominal power of 245 Wp is considered, adopting its DC power recordings for the year 2017. The PV module and micro-inverter datasheets are reported in Table 1. A publicly available dataset with all these measurements has been previously provided by the authors in [34], as described in [35].
Meteorological forecasts for the following days were collected from a commercial weather service provider every day at 11 p.m. The weather parameters considered by the proposed model are summarized in Table 2, where they are listed together with the theoretical irradiance in clear-sky conditions, computed according to the deterministic CSRM, the day of the year (DOY), the hour of the day (h) and the desired target P, i.e. the measured power production.
Finally, the dataset considered here covers 269 days of the year 2017; however, the described procedure has general validity, and it can easily be extended to different time ranges, PV modules, power plants and also load forecasting, depending on the availability of a suitable data set.
In order to properly set the forecasting model while avoiding overtraining issues, the data were divided into two subsets, called training and validation sets. They comprise the parameters reported in Table 2, together with the measured power of the PV module, in order to compute errors and evaluate accuracy, as proposed in [36].
In particular, the dataset is divided randomly assigning 90% of data to the training set and 10% to the validation set. These specific choices and shares have been previously defined through sensitivity analysis conducted in [37].
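The random 90%/10% assignment described above can be sketched in Python as follows. The function name, the fixed seed and the choice of splitting individual hourly samples (rather than whole days) are illustrative assumptions, not details stated in this work.

```python
import random


def split_train_validation(samples, train_share=0.9, seed=None):
    """Randomly assign samples to a training set and a validation set."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = round(train_share * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    validation = [samples[i] for i in indices[n_train:]]
    return train, validation


# Hypothetical hourly sample identifiers for a 269-day dataset.
hours = list(range(269 * 24))
train, validation = split_train_validation(hours, seed=0)
print(len(train), len(validation))
```

Re-drawing the split with a different seed for every trial is one simple way to give each network of an ensemble its own training and validation data.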

B. ACCURACY MEASUREMENT
The effectiveness and the accuracy of the forecasting techniques must be proven by means of evaluation indicators. A wide variety of definitions can be found in the literature [38], [39], and the most common are presented here. Since the considered model is for day-ahead forecasting with hourly resolution, the root of all the other metrics is the hourly error $e_h$, defined as:
$$e_h = P_{m,h} - P_{p,h} \tag{1}$$
where $P_{m,h}$ is the measured power corresponding to the hour $h$ and $P_{p,h}$ is the prediction provided by one of the forecasting methods. For a fair accuracy assessment and easily identifiable indicators, normalized errors have been preferred. Hence, starting from this basic definition, two indicators have been adopted: the normalized mean absolute error (NMAE), which is the Mean Absolute Error (MAE) normalized by the nominal power $P_n$ of the plant, and the normalized root mean square error (nRMSE), which is the Root Mean Square Error normalized by the maximum power recorded in the PV system. Besides, the weighted mean absolute error (WMAE) is defined as follows:
$$WMAE_\% = \frac{\sum_{h=1}^{N} |P_{m,h} - P_{p,h}|}{\sum_{h=1}^{N} P_{m,h}} \cdot 100 \tag{2}$$
where $N$ is the number of hourly samples in the considered period. This indicator is based on the total energy production in the considered period: when the latter is significantly lower than the predicted one, large values can occur (often significantly above 100%). Additionally, two novel error metrics have been introduced by the authors in [40]; these are specifically tailored to PV power production in order to provide a more significant evaluation of the overall error behavior.
The enveloped mean absolute error (EMAE) is similar to the WMAE, but its definition sets an upper bound of 100%, to avoid the previously described drawback of the standard WMAE indicator:
$$EMAE_\% = \frac{\sum_{h=1}^{N} |P_{m,h} - P_{p,h}|}{\sum_{h=1}^{N} \max(P_{m,h}, P_{p,h})} \cdot 100 \tag{3}$$
Finally, the objective mean absolute error (OMAE) has been specifically designed to introduce an adaptive normalization factor, more representative of the maximum level of irradiance theoretically available at a specific time:
$$OMAE_\% = \frac{\sum_{h=1}^{N} |P_{m,h} - P_{p,h}|}{\sum_{h=1}^{N} P_n \, \frac{G^{CS}_{POA,h}}{G_{STC}}} \cdot 100 \tag{4}$$
where $G^{CS}_{POA,h}$ is the theoretical irradiance on the plane of the array provided by the CSRM for a specific time and $G_{STC}$ is the reference irradiance in standard test conditions. This correction factor makes the indicator more closely related to the theoretically available power at any time of the day and of the year than the nominal capacity $P_n$ alone, which is used in NMAE. Night measurements are hence included in the definition of these metrics to properly compare errors in different periods of the year.
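As an illustration of how these indicators behave, the following Python sketch computes them for a toy hourly profile. The closed forms used for EMAE and OMAE are assumptions inferred from the verbal definitions above (envelope of measured and predicted power, and nominal power scaled by the clear-sky irradiance ratio); the function name and the sample values are hypothetical.

```python
import math


def forecast_metrics(p_meas, p_pred, p_nominal, g_cs_poa, g_stc=1000.0):
    """Return NMAE, nRMSE, WMAE, EMAE and OMAE in percent.

    p_meas, p_pred: hourly measured and predicted power (W), night hours included.
    g_cs_poa: hourly clear-sky irradiance on the plane of the array (W/m^2).
    g_stc: reference irradiance in standard test conditions (1000 W/m^2).
    """
    n = len(p_meas)
    abs_err = [abs(m - p) for m, p in zip(p_meas, p_pred)]
    total_abs_err = sum(abs_err)
    return {
        "NMAE": total_abs_err / (n * p_nominal) * 100,
        "nRMSE": math.sqrt(sum(e * e for e in abs_err) / n) / max(p_meas) * 100,
        "WMAE": total_abs_err / sum(p_meas) * 100,
        # Assumed form: envelope (hourly maximum) of measured and predicted power.
        "EMAE": total_abs_err / sum(max(m, p) for m, p in zip(p_meas, p_pred)) * 100,
        # Assumed form: nominal power scaled by the clear-sky irradiance ratio.
        "OMAE": total_abs_err / sum(p_nominal * g / g_stc for g in g_cs_poa) * 100,
    }


# Toy five-hour profile (illustrative values, not taken from the case study).
metrics = forecast_metrics(
    p_meas=[0.0, 50.0, 100.0, 50.0, 0.0],
    p_pred=[0.0, 60.0, 80.0, 50.0, 10.0],
    p_nominal=245.0,
    g_cs_poa=[0.0, 400.0, 800.0, 400.0, 0.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

With non-negative power values the EMAE computed this way cannot exceed 100%, whereas the WMAE grows without bound when the measured energy is small.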

III. ENSEMBLE FORECAST
The forecasting technique adopted in this work is based on an ML method, namely an ANN. During the network initialization, the weights among neurons are first randomly assigned and later optimized during training. As highlighted in [41], this process presents many local minima; hence, the obtained values for the weights can greatly differ from one run to another, resulting in a differentiation of the errors committed by the networks on different subsets of the input space. For this reason, an "ensemble method" is usually adopted, as reported in [42]. This method consists of forecasting the power profile with various models and finally averaging their results to obtain the desired single day profile.
In our work, the considered models for the ensemble are parallel runs of the MLP defined in Section II: in particular, for each run, commonly referred to as a "trial", the network's weights and biases are re-initialized and then trained; thus, every single MLP is trained on different training and validation datasets, ensuring a proper diversity among all the parallel trials. The number of computed trials and their respective accuracy have a great influence both on the overall prediction reliability and on the computational burden. Thus, Subsection III-A describes how the minimum number of parallel MLPs can be set, while Subsection III-B defines the acceptance criteria used to select a single trial for the considered ensemble.
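The averaging step of the ensemble can be sketched in a few lines of Python. The function name and the three short profiles are hypothetical; in practice each profile would contain the hourly forecast produced by one trained trial.

```python
def ensemble_average(trial_profiles):
    """Average hourly power profiles (one list per trial) hour by hour."""
    n_trials = len(trial_profiles)
    return [sum(hour_values) / n_trials for hour_values in zip(*trial_profiles)]


# Three hypothetical trials, each forecasting three consecutive hours (W).
trials = [
    [0.0, 100.0, 50.0],
    [0.0, 120.0, 40.0],
    [0.0, 110.0, 60.0],
]
print(ensemble_average(trials))  # [0.0, 110.0, 50.0]
```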

A. NUMBER OF TRIALS
To evaluate how many MLPs should be trained in parallel in order to have a reliable prediction, several ensembles are considered for every day of the year, each composed of an increasing number of trials, up to a maximum of 250. For every single MLP added to the ensemble, the corresponding predicted power profile is obtained by averaging the current number of trials; the error committed throughout the whole year is then evaluated for each trial added to the ensemble, and the corresponding metrics are computed as described above. An additional parameter can be determined with respect to each indicator and the number of considered trials, i.e. the marginal benefit $M_\%$ of increasing the number of trials $N_t$, which, e.g., for the EMAE indicator is defined in equation (5). It indicates the performance increase (error reduction) obtained for each trial added to an ensemble. For the sake of generalization, this process has been repeated 8 times, considering 8 independent ensembles of at most 250 trials each, for a total of 2000 independent trials.
In Figure 2, two graphs are shown: the first one represents the error committed in terms of the EMAE indicator, with respect to the number of adopted trials. The second graph, on the other hand, shows $M_\%$ as defined in equation (5). In both graphs, the grey lines represent the eight aforementioned independent ensembles, while the orange line is their average.
As can be seen for each model, accuracy increases (EMAE decreases) with the number of trials, approaching an asymptotic value. Nevertheless, as shown in the second graph, the marginal benefit, despite being generally positive, quickly tends toward zero in fewer than 40 trials. Since the computational load increases linearly with the number of parallel trials (at each repetition, a new network is created and trained, and the forecast is computed), a trade-off between the time required to predict the power profile and the performance increase is required. After the preliminary study described above, a number of trials equal to 40 is adopted in the following. In summary, the marginal benefit serves as a criterion to properly define the number of trials to be adopted in the ensemble.
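The following Python sketch shows one way the marginal benefit could be evaluated from a sequence of ensemble errors. The exact expression of equation (5) is not reproduced here: the relative error reduction between consecutive ensemble sizes is an assumed form, and the error values are purely illustrative.

```python
def marginal_benefit(errors):
    """Relative error reduction (%) brought by each trial added to the ensemble.

    errors[k] is the ensemble error (e.g. EMAE in percent) obtained with k + 1
    trials. Assumed form: (E_{N-1} - E_N) / E_{N-1} * 100.
    """
    return [(prev - curr) / prev * 100 for prev, curr in zip(errors, errors[1:])]


# Hypothetical EMAE values for ensembles of 1, 2, 3 and 4 trials.
emae_by_size = [40.0, 30.0, 27.0, 27.0]
print(marginal_benefit(emae_by_size))
```

A positive value means that the added trial reduced the error; values close to zero indicate that further trials add computational load without a comparable gain in accuracy.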

B. TRIALS SELECTION
One of the main advantages of forecasting PV power is that almost half of the desired outputs are known a priori. For instance, power production is known to be zero during night hours. Additionally, power production in daytime is expected not to exceed the nominal power, modulated by the hourly profile of maximum irradiation, which can be determined daily by the CSRM. Thus, a mask can be defined daily by setting the maximum and minimum admitted profiles $P_{h,top}$ and $P_{h,bot}$, respectively, as in equations (7) and (8), where $P_{h,top}$ has been defined following the same methodology adopted in (4) for OMAE. This valuable information can be used for admitting or rejecting values produced by the forecasting model: this helps validate the accuracy of the single outputs obtained by the ANN, which allows an improvement with respect to the previously introduced ensemble logic.
In fact, analyzing the result of each single trial, it is possible to detect the worst performing ones, which can be eliminated in advance without affecting the overall average prediction. For instance, in Figure 3 (top) the forecast of a sample day is presented: the gray lines represent the output of each of the 40 considered trials, the orange line is their average (i.e., the output of the ensemble), the blue line is the actual measurements and finally the yellow shaded area represents the mask defined by the CSRM. As it is possible to see in the first plot, a few trials present a power profile exceeding the admission mask: these are not consistent, since they strongly differ from the expected value. These unfortunate cases can be explained by the nature of ANN training, based on the Error Back-Propagation (EBP) algorithm, which can end up in sub-optimal solutions. Consequently, those single trials are not providing a good generalization, hence being not suited for the scope.
Starting from these considerations, an accuracy measure can be defined for each trial, introducing an exclusion criterion for a single trial based on the errors $P_{err,top}$ and $P_{err,bot}$ committed with respect to the upper and lower masks defined by (7) and (8), respectively, as given in equations (9) and (10). For each forecasted day, this cumulative error has to be lower than or equal to a control threshold $P_{thr}$:
$$P_{err,top} + P_{err,bot} \le P_{thr} \tag{11}$$
It is worth underlining that this methodology aims at identifying the properly trained networks, excluding from the ensemble the results for which (11) is not satisfied. Figure 3 (bottom) presents the power forecast of the same day, after the introduction of the control threshold. As can be noticed, both the power profiles and the observed EMAE error indicator are significantly improved for that particular day. With specific reference to the sample day reported in Fig. 3, it is worth highlighting that the precision of the forecast at nighttime is also particularly important in order to train the forecasting model to the best precision at sunrise and sunset: indeed, these are the two conditions in which the highest relative errors usually occur during the day, due to the small value of produced power, which usually appears in the denominator of error indicators, as shown in (2), for instance. Moreover, since PV power forecasting should address accuracy not only on average but also in peak hours, the percentage error behavior in different hours of the day (averaged over the whole year) is reported in Figure 4, with respect to the plant capacity.
As shown in this figure, while absolute errors at peak hours are the most relevant in percentage, the selective ensemble approach is effective in reducing them, as reported in the overall distribution during the year, shown in the boxplots: this improvement is clearly highlighted by the bottom histogram, where the average values from the previous boxplots are compared, showing the higher accuracy due to ensemble selection, in particular during peak hours. The overall procedure is synthesized in the flowchart of Figure 5: in order to compose the ensemble of trained MLPs, the output of each trial is compared to P thr to evaluate its accuracy: when the trial satisfies the control threshold, its model is included in the ensemble, otherwise it is discarded and a new MLP is trained and evaluated, until the completion of the ensemble. The ensemble forecast is hence computed a posteriori, once the well trained networks have been selected.
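The selection loop summarized in the flowchart can be sketched in Python as follows. Several details are assumptions made only for illustration: the mask violation is computed as the energy lying outside the admitted band (with hourly samples, W and Wh coincide numerically), the lower mask is set to zero, and `train_and_forecast` is a hypothetical stand-in for training one MLP and returning its daily profile.

```python
def mask_violation(profile, p_top, p_bot):
    """Energy (Wh, hourly steps) lying above the upper or below the lower mask."""
    above = sum(max(p - top, 0.0) for p, top in zip(profile, p_top))
    below = sum(max(bot - p, 0.0) for p, bot in zip(profile, p_bot))
    return above + below


def build_selective_ensemble(train_and_forecast, p_top, p_bot, p_thr,
                             n_valid, max_trials=1000):
    """Train trials until n_valid of them satisfy the control threshold."""
    accepted = []
    trained = 0
    while len(accepted) < n_valid and trained < max_trials:
        profile = train_and_forecast()
        trained += 1
        if mask_violation(profile, p_top, p_bot) <= p_thr:
            accepted.append(profile)  # compliant trial joins the ensemble
    if not accepted:
        raise RuntimeError("no trial satisfied the control threshold")
    # The ensemble forecast is computed a posteriori on the accepted trials.
    forecast = [sum(values) / len(accepted) for values in zip(*accepted)]
    return forecast, trained


# Stand-in for MLP training: alternates an inconsistent and a consistent profile.
candidates = iter([[50.0, 300.0, -20.0], [0.0, 100.0, 0.0]] * 2)
forecast, trained = build_selective_ensemble(
    train_and_forecast=lambda: next(candidates),
    p_top=[0.0, 200.0, 0.0],
    p_bot=[0.0, 0.0, 0.0],
    p_thr=30.0,
    n_valid=2,
)
print(forecast, trained)
```

The `max_trials` guard is an addition of this sketch: it prevents an endless loop when the threshold is so strict that almost no trial is accepted.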
While a preliminary implementation of this method was introduced and its overall validity demonstrated in [28] focusing mainly on night hours, in the current paper the control threshold is extended to full day analysis and an additional study is proposed, in the following subsection, to further analyze the performance improvement and mitigate the additional load associated with the computation of several trials.

C. COMPUTATIONAL LOAD
The main disadvantage coming from the implementation of the described method is given by the increased computational burden required to provide the desired number of valid trials: to achieve it, respecting the control threshold, some networks must be trained and later discarded.
In order to quantify the added computational burden, power profile predictions are simulated day-by-day for the whole year 2017, through 250 independent trials; these are later reassembled respecting the logic described above, to obtain a valid ensemble of 40 trials; this study is conducted by varying the value of the threshold P thr in equation (11).
In Figure 6, the blue bars represent the number of trials that was possible to select, on average, with respect to the control threshold P thr , on a yearly basis; the red stacks, on the other hand, represent the number of runs that had to be discarded. Hence, the overall number of networks that, on average, must be trained, is given by the sum of the two. As it is possible to see, reducing P thr , the vast majority of the trials are not compliant and are discarded.
In the lower part of Figure 6, the blue line shows the corresponding computational load, which, as expected, tends to infinity as $P_{thr}$ approaches zero. In fact, during training ANNs learn to generalize trends rather than the specific behaviour of the training dataset, which is instead what occurs in case of overtraining.
From the aforementioned considerations, it is evident that a trade-off between accuracy and increased computational burden is fundamental. For this reason, based also on the preliminary studies conducted in [28], a threshold $P_{thr}$ = 30 Wh is considered a good compromise and will be used in the following.
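The growth of the computational load for stricter thresholds can be illustrated with a simple estimate. Assuming, only for the sake of this sketch, that trials are independent and that each one satisfies the control threshold with a constant probability (the acceptance rate), the expected number of networks to be trained for an ensemble of 40 valid trials is the ensemble size divided by that rate. The rates below are hypothetical and are not read from Figure 6.

```python
def expected_trained_networks(n_valid, acceptance_rate):
    """Expected number of trained networks needed to collect n_valid accepted trials."""
    if not 0.0 < acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return n_valid / acceptance_rate


# Hypothetical acceptance rates: a stricter threshold means a lower rate.
for rate in (1.0, 0.5, 0.1, 0.01):
    print(f"acceptance {rate:>4}: {expected_trained_networks(40, rate):.0f} networks")
```

As the acceptance rate tends to zero, the expected number of trained networks diverges, in agreement with the behaviour described for small thresholds.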

IV. DATA PRE-PROCESSING AND POST-PROCESSING
In the previous section, a technique aiming at reducing the overall prediction error by means of a selective ensemble approach was presented. Since the required computational load is greatly increased, in the following a few approaches are proposed and analyzed to enhance the convergence speed of ANN training, thus reducing the total computational effort needed.
When dealing with ANNs and adopting a gradient descent method to perform the training process (for example EBP [43]), it is particularly important to pre-process and post-process the inputs and the outputs. According to [44], convergence is faster when every input variable of the training set has a mean value close to zero. In [45], the importance of normalizing data is further explained: it is shown how removing covariate shift from the internal activations of the network may aid the training process.
In the current section, a novel normalization procedure is proposed. As a starting point, it is always beneficial to shift the mean value according to [44]. Moreover, convergence is faster and works better if inputs are scaled so that they have approximately the same covariance $C_i$. Scaling speeds up the learning process because it helps to balance out the rate at which the weights connected to the input nodes learn, as shown in [46].
Data normalization is a fundamental pre-processing step for mining and learning from data; nevertheless, finding the proper approach to deal with time series normalization is not obvious: several normalization methods proposed in literature are valid only on specific time series [47].
In this section, we propose different approaches for normalizing non-stationary time series to be used with ANNs in forecast problems, by means of properly defined data pre- and post-processing, as shown in Fig. 1.
Generally speaking, the covariance value should be chosen with respect to the adopted neuron's activation function. Here, the hyperbolic tangent is used and a covariance equal to 1 is demonstrated to be a good choice. An exception to this guideline can be made when the importance of every input to the output (i.e. for example, the correlation) is known a priori [46].
In the following, a step-by-step analysis is conducted on how different input and output processing influence the overall prediction performances, both in terms of committed error and time required to compute the forecast, in order to define suitable data processing to improve ANN efficiency.

A. CORRELATION
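The importance of every input to the output is studied by means of the Bravais-Pearson correlation factor [48]:
$$\rho_{X,Y} = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\sum_{i} (y_i - \bar{y})^2}} \tag{12}$$
where $X$ and $Y$ are two generic variables, $\bar{x}$ and $\bar{y}$ their mean values, and $x_i$ and $y_i$ single observations belonging to $X$ and $Y$, respectively.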
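For reference, the correlation factor can be computed with a few lines of Python. The function name and the sample series (an irradiance-like input and a power-like output) are hypothetical.

```python
import math


def pearson(x, y):
    """Bravais-Pearson correlation factor between two equally long series."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    denominator = (math.sqrt(sum((xi - x_mean) ** 2 for xi in x))
                   * math.sqrt(sum((yi - y_mean) ** 2 for yi in y)))
    return numerator / denominator


# Illustrative values only: irradiance (W/m^2) and module power (W).
irradiance = [0.0, 200.0, 500.0, 800.0, 600.0, 100.0]
power = [0.0, 40.0, 110.0, 190.0, 150.0, 15.0]
print(round(pearson(irradiance, power), 3))
```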

B. NORMALIZATION
One of the most common ways of pre- and post-processing the data is normalizing each variable in a predefined range, usually [−1, +1]. Generally speaking, the normalization process is performed according to equation (13):
$$Z^* = \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \cdot (Z^*_{max} - Z^*_{min}) + Z^*_{min} \tag{13}$$
where $Z$ is a generic vector of data, whose minimum observation is $Z_{min}$ and maximum observation is $Z_{max}$, while $Z^*_{max}$ and $Z^*_{min}$ represent the upper and lower bounds of the desired range.
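A minimal Python sketch of this range normalization, together with the inverse mapping needed to post-process the network output, is given below. The helper name `make_scaler` and the 0 to 245 W range are illustrative assumptions.

```python
def make_scaler(z_min, z_max, lo=-1.0, hi=1.0):
    """Return forward and inverse maps between [z_min, z_max] and [lo, hi]."""
    scale = (hi - lo) / (z_max - z_min)

    def forward(z):
        # Pre-processing: observed value to normalized value.
        return (z - z_min) * scale + lo

    def inverse(z_star):
        # Post-processing: normalized value back to physical units.
        return (z_star - lo) / scale + z_min

    return forward, inverse


# Hypothetical power range of a 245 W module.
to_normalized, to_power = make_scaler(0.0, 245.0)
print(to_normalized(0.0), to_normalized(245.0))
print(to_power(to_normalized(100.0)))
```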

C. COVARIANCE
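As said, the covariance plays a central role in the ANN training process. It is defined for generic vectors $X$ and $Y$ as follows:
$$Cov_{X,Y} = E\big[(X - E[X]) \cdot (Y - E[Y])\big] \tag{14}$$
where $E[\cdot]$ is the expected value operator. When the normalization expressed in equation (13) is performed, the covariance coefficient changes accordingly, following equation (15):
$$Cov_{X^*,Y^*} = E\big[(X^* - E[X^*]) \cdot (Y^* - E[Y^*])\big] \tag{15}$$
and, given the linearity of the expectation operator $E[\cdot]$:
$$Cov_{X^*,Y^*} = \frac{X^*_{max} - X^*_{min}}{X_{max} - X_{min}} \cdot \frac{Y^*_{max} - Y^*_{min}}{Y_{max} - Y_{min}} \cdot Cov_{X,Y} \tag{16}$$
As can be noticed, the new covariance coefficient depends on both the observed and the desired upper and lower bounds.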
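The dependence of the covariance on the observed and desired bounds can be checked numerically. The sketch below rescales two short hypothetical series to [−1, +1] and compares the covariance computed directly on the rescaled data with the original covariance multiplied by the two range ratios; the population covariance (division by the number of samples) is assumed.

```python
def covariance(x, y):
    """Population covariance of two equally long series."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n


def rescale(z, lo, hi):
    """Linear range normalization of a series to [lo, hi]."""
    z_min, z_max = min(z), max(z)
    return [(v - z_min) / (z_max - z_min) * (hi - lo) + lo for v in z]


x = [2.0, 5.0, 9.0, 4.0]
y = [10.0, 40.0, 70.0, 20.0]

direct = covariance(rescale(x, -1.0, 1.0), rescale(y, -1.0, 1.0))
predicted = (2.0 / (max(x) - min(x))) * (2.0 / (max(y) - min(y))) * covariance(x, y)
print(direct, predicted)
```

The offsets of the linear map cancel in the covariance, so only the ratios between the desired and the observed range widths appear in the result.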

D. ADJUSTED RANGE
When the information about the correlation of the inputs with respect to the output is known, instead of normalizing all the variables in a range of [−1, +1] as is commonly done, a smarter normalization can be applied: each parameter's range can be adjusted in order to obtain a covariance equal to the correlation factor. As far as the power $P$ is concerned, since its correlation with itself is equal to 1, it is possible to rearrange equation (16) and, the desired range being symmetric with respect to the origin, it is possible to write:
$$P^*_{max} = -P^*_{min} = \frac{1}{2} \sqrt{\frac{\rho_{P,P}}{Cov_{P,P}}} \cdot (P_{max} - P_{min}) \tag{17}$$
which allows the optimal power range to be found.
As for each of the other i-th selected inputs of the ANN model, the corresponding range is obtained analogously through equation (18).
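The idea of matching covariances to correlation factors can be sketched numerically in Python. The scale factors below are derived under the assumption that the rescaled power has unit covariance with itself and that the rescaled covariance between an input and the power equals their correlation factor; this is one consistent reading of the procedure, not necessarily the exact expression used in this work, and all names and values are illustrative.

```python
import math


def covariance(x, y):
    """Population covariance of two equally long series."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n


def pearson(x, y):
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def adjusted_scales(x, p):
    """Scale factors (desired width / observed width) for input x and power p.

    Assumptions: Cov(P*, P*) = 1 and Cov(X*, P*) = rho(X, P).
    Requires a non-zero covariance between x and p.
    """
    scale_p = math.sqrt(1.0 / covariance(p, p))
    scale_x = pearson(x, p) / (scale_p * covariance(x, p))
    return scale_x, scale_p


# Illustrative values only: irradiance (W/m^2) and module power (W).
ghi = [0.0, 200.0, 500.0, 800.0, 600.0, 100.0]
power = [0.0, 40.0, 110.0, 190.0, 150.0, 15.0]

scale_x, scale_p = adjusted_scales(ghi, power)
p_star_max = 0.5 * scale_p * (max(power) - min(power))  # symmetric range bound
x_star = [scale_x * v for v in ghi]    # offsets do not affect the covariance
p_star = [scale_p * v for v in power]
print(p_star_max, covariance(x_star, p_star), pearson(ghi, power))
```

Under these assumptions the scale factor of each input reduces to the inverse of its standard deviation, so the adjusted ranges differ from variable to variable.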

E. CONSIDERED NORMALIZATION APPROACHES
Finally, four different approaches can be proposed for data normalization: a detailed description is presented in the following, while their comparison and analysis of numerical results is reported in the next Section.

Ap1: Baseline approach
The first approach (Ap1) can be considered as the baseline, since it does not include any pre- and post-processing of the dataset: the available data are not processed and are directly provided to the ANN in order to compute the forecast. In Table 3, the above described input and output parameters are given.

Ap2: Traditional normalization approach
The second approach (Ap2) normalizes all the data in a range of [−1, +1]. This normalization is the one traditionally adopted, as it is built into common ML libraries, and it can be used as a benchmark for the others. Normalizing the available variables in [−1, +1], the consequent covariance coefficients follow from equation (16) with a desired range width equal to 2 for every variable:
$$Cov_{X^*,Y^*} = \frac{4 \, Cov_{X,Y}}{(X_{max} - X_{min}) \cdot (Y_{max} - Y_{min})}$$
The obtained values are listed in Table 4.

Ap3: Adaptive normalization approach
The third approach considered here (Ap3) adjusts the normalization range of every available variable to reflect the importance of each input in determining the output. In Table 5, the obtained covariances and ranges deriving from equations (17) and (18) are reported.

Ap4: Enhanced normalization approach
The fourth approach considered here (Ap4) is similar to the previous one, but it divides all the obtained ranges by two, to properly match the activation window of the chosen transfer function (hyperbolic tangent). The active range of a transfer function is defined as the range of the input which produces a value of the first derivative significantly different from zero. For the hyperbolic tangent sigmoid function, the active input range is usually set as [−2, +2]. For this reason, the input range of every available parameter is halved with respect to Ap3, resulting in a covariance equal to a fourth of the previous one. It is worth highlighting that the information about the relative importance of each input to the output is retained. Results are shown in Table 6.
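Both statements can be checked with a short Python sketch: the first derivative of the hyperbolic tangent is evaluated at a few points to show how quickly it vanishes outside [−2, +2], and the covariance of two series is compared before and after halving their ranges. The sample values are illustrative.

```python
import math


def tanh_derivative(x):
    """First derivative of the hyperbolic tangent: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def covariance(x, y):
    """Population covariance of two equally long series."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n


for point in (0.0, 1.0, 2.0, 4.0):
    print(f"d/dx tanh at {point}: {tanh_derivative(point):.4f}")

# Halving both ranges divides the covariance by four.
x = [-2.0, -0.5, 1.0, 2.0]
y = [-1.5, 0.0, 0.5, 1.5]
halved = covariance([v / 2 for v in x], [v / 2 for v in y])
print(halved, covariance(x, y) / 4)
```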

V. COMPARISON AND NUMERICAL RESULTS
Following the aforementioned procedures, it was possible to compute the optimal ranges for data processing. In particular, as described in Section II, a simple Multi-Layer Perceptron (MLP) is adopted here, with 6 input neurons (parameters described in Table 2), 1 output neuron (predicted power), and 12 and 5 neurons in the first and second hidden layers, respectively. In order to analyse the performance of the four approaches presented above, they are compared from two different points of view: first, the committed error is evaluated; second, the time required to perform the simulations of the whole considered year is reported and used as a proxy for the computational burden.
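To give an idea of how small this network is, the following Python sketch builds a 6-12-5-1 perceptron with hyperbolic tangent hidden units, counts its trainable parameters and runs one forward pass. The uniform random initialization and the linear output neuron are assumptions of this sketch; training is not included.

```python
import math
import random

LAYER_SIZES = [6, 12, 5, 1]  # inputs, first hidden, second hidden, output


def init_network(layer_sizes, rng):
    """One (weights, biases) pair per layer, randomly initialized."""
    return [
        ([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [rng.uniform(-1.0, 1.0) for _ in range(n_out)])
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Forward pass: tanh in the hidden layers, linear output (assumed)."""
    activations = inputs
    for index, (weights, biases) in enumerate(network):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(network) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations[0]


def count_parameters(network):
    return sum(len(weights) * len(weights[0]) + len(biases)
               for weights, biases in network)


network = init_network(LAYER_SIZES, random.Random(0))
print(count_parameters(network))  # 155
print(forward(network, [0.0] * 6))
```

Re-running `init_network` with a different random generator corresponds to starting a new trial with re-initialized weights and biases.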

A. ACCURACY
The overall results obtained by combining the presented normalization approaches with the ensemble forecast are shown in Table 7, where the computed outcomes are filtered with a control threshold P_thr equal to 30 Wh/day. In particular, Table 7 compares the four considered normalization methodologies combined with the ensemble selection against the most widely adopted normalization approach with the ensemble selection disabled (Ap2*); for the sake of clarity, the lowest value attained for each error indicator is highlighted in bold. The comparison shows to what extent correct pre- and post-processing of inputs and outputs affects the prediction accuracy: indeed, Ap2, Ap3 and Ap4 show a significant improvement with respect to Ap1 (where pre- and post-processing are avoided). Moreover, the proposed data processing approaches Ap2, Ap3 and Ap4 combined with the selective ensemble show a similar improvement in forecasting error with respect to the benchmark case Ap2* (where the selective ensemble is not applied).
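The general idea of a selective ensemble can be sketched as follows: each independently trained network is scored, and only the best-scoring members contribute to the averaged forecast. The ranking by a single validation error, the fixed number of retained members and the toy numbers are assumptions of this example; this is one plausible reading and not necessarily the selection rule used in this work.

```python
from statistics import mean


def select_and_average(forecasts, validation_errors, keep):
    """Average the forecasts of the `keep` members with the lowest validation error."""
    if not 0 < keep <= len(forecasts):
        raise ValueError("keep must be between 1 and the number of members")
    ranked = sorted(range(len(forecasts)), key=lambda i: validation_errors[i])
    chosen = ranked[:keep]
    horizon = len(forecasts[0])
    return [mean(forecasts[i][t] for i in chosen) for t in range(horizon)]


# Hypothetical forecasts from three trained networks over three time steps.
member_forecasts = [
    [0.0, 10.0, 20.0],
    [0.0, 12.0, 22.0],
    [5.0, 30.0, 40.0],  # poorly trained member
]
member_errors = [4.0, 5.0, 19.0]  # hypothetical validation errors

ensemble = select_and_average(member_forecasts, member_errors, keep=2)
print(ensemble)
```

Setting `keep` equal to the number of members recovers the plain, non-selective ensemble average, which corresponds conceptually to the benchmark with the selection disabled.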
In order to validate the results of the proposed model, a comparison with other architectures commonly used for PV power forecasting has been conducted, considering in particular CNN [14] and LSTM [15].
As for the adopted CNN approach, the combined use of a convolutional layer and a pooling layer is the commonly considered structure, implemented here with a 1D convolutional layer (32 filters), a max pooling layer, a flattening layer and a dense layer (fully connected, 20 neurons); simulations were performed with different input sequences, considering T_amb (a), GHI (b), GPOA (c) and wind speed (d), as reported in Table 8.
As for the adopted LSTM approach, a sliding window forecast was assumed, considering different numbers of units and hidden layers and different dropout values: in Table 8, the considered models are indicated as (units × hidden layers × dropout).
This comparison suggests that, in general terms, CNN and LSTM are not well suited to the data available in the considered case study. Possible reasons are the overall dataset size, probably too small for the CNN approach, and some missing days in the dataset, which break the continuity of the data flow needed by LSTM. These are common issues with real data from PV plants, which are usually not easily available and are often subject to interruptions. Indeed, as reported in Table 8, the forecasts computed by these two approaches have similar accuracy, with errors slightly above the level reached by the presented PHANN with the selective ensemble. The approach proposed here thus appears simple and lightweight, better suited to day-ahead power forecasting based on numerical weather predictions, and more robust with respect to the dataset limitations described above.
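The sensitivity of a sliding-window scheme to missing data can be illustrated with a small sketch. It builds (input window, target) pairs from a series and discards every pair that overlaps a missing sample, marked here with `None`. The window length, the horizon and the toy series are assumptions of this example and do not reflect the configurations of Table 8.

```python
def sliding_windows(series, window, horizon=1):
    """Return all (inputs, targets) pairs obtained by sliding over the series."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window:start + window + horizon]
        samples.append((x, y))
    return samples


def drop_incomplete(samples):
    """Keep only the pairs that contain no missing value (None)."""
    return [(x, y) for x, y in samples if None not in x and None not in y]


# Hypothetical series with one missing sample in position 3.
series = [1.0, 2.0, 3.0, None, 5.0, 6.0, 7.0, 8.0]

all_samples = sliding_windows(series, window=3, horizon=1)
valid_samples = drop_incomplete(all_samples)
print(f"{len(valid_samples)} usable samples out of {len(all_samples)}")
```

In this toy case a single gap invalidates four of the five windows, since every window or target that touches the missing sample has to be discarded; a model that treats each day as an independent sample loses only the missing day itself.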

B. COMPUTATIONAL BURDEN
Additionally, it is possible to analyze the overall computational time taken by each approach to forecast the daily power profiles of a whole year (with 40 effective trials): Table 9 reports the results of simulations performed on an Intel(R) i7-7700 CPU @ 3.60 GHz with 64 GB of RAM. These results show how relevant pre-processing the available dataset is to reducing the computational burden. In fact, the two newly proposed adaptive and enhanced normalization approaches (Ap3 and Ap4, respectively) significantly reduce the time required to train the network with respect to the most commonly used normalization procedure, Ap2.
In particular, the newly proposed enhanced normalization approach (Ap4) reduces the required computational effort because, during the first iteration, when the weights are randomly picked, it allows the transfer function to work in a region of its domain where its derivative is greater, so that the optimization algorithm can proceed faster.
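The effect of a narrower input range on the initial slope of the transfer function can be checked numerically. The sketch below draws random weights and inputs for a single neuron with 6 inputs and compares the mean derivative of the hyperbolic tangent for inputs in [−1, +1] and in [−0.5, +0.5]. The uniform weight distribution, the absence of a bias term and the sample size are assumptions of this example, not the initialization used in the experiments.

```python
import math
import random


def tanh_derivative(z):
    """First derivative of the hyperbolic tangent: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


def mean_initial_slope(input_half_width, n_inputs=6, n_samples=2000, seed=0):
    """Mean tanh derivative at the pre-activation of one randomly initialized neuron."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        inputs = [rng.uniform(-input_half_width, input_half_width)
                  for _ in range(n_inputs)]
        z = sum(w * x for w, x in zip(weights, inputs))
        total += tanh_derivative(z)
    return total / n_samples


slope_full = mean_initial_slope(1.0)   # inputs normalized to [-1, +1]
slope_half = mean_initial_slope(0.5)   # inputs normalized to half that range
print(f"mean slope, full range: {slope_full:.3f}")
print(f"mean slope, half range: {slope_half:.3f}")
```

Because the derivative of tanh decreases as the magnitude of its argument grows, halving the inputs moves the pre-activations towards zero and increases the slope seen by the gradient-based optimizer at the start of training.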
Moreover, Table 10 compares the computational load of the different normalization procedures when combined with the selective ensemble approach introduced here, with the control threshold P_thr set to 30 Wh/day.

VI. CONCLUSION
In this paper, the combination of two novel approaches aimed at improving forecasting accuracy and computational efficiency has been analyzed and applied to day-ahead PV power prediction.
First, a selective ensemble methodology has been proposed, in order to select the most promising trained MLPs for the ensemble forecast. This approach led to a significant reduction of the overall error with respect to several error evaluation metrics (about 1 percentage point, on average).
Additionally, through the correlation and covariance analysis of the available dataset, it was possible to obtain the optimal normalization ranges for the ANN. By applying the newly proposed adaptive and enhanced normalization approaches, it was possible to obtain a relevant mitigation (−17%) of the additional computational load associated with ensemble selection, with respect to the traditionally adopted normalization procedure.
Future developments of this study include additional adaptive data analysis to further increase the computational efficiency of the proposed combined approach, and its application to the forecasting of power production and load consumption in challenging micro-grid management applications.