Incorporating Spatial and Temporal Correlations to Improve Aggregation of Decentralized Day-Ahead Wind Power Forecasts

In some electricity markets, individual wind farms are obliged to provide point forecasts to the power purchaser or system operator. These decentralized forecasts are usually based on on-site meteorological forecasts and measurements, and thus optimized for local conditions. Simply adding decentralized forecasts may not capture some of the spatial and temporal correlations of wind power, thereby lowering the potential accuracy of the aggregated forecast. This paper proposes explanatory variables that are used to train k-nearest neighbor and conditional kernel density estimator models to derive day-ahead aggregated point and probabilistic wind power forecasts from the decentralized point forecasts of geographically distributed wind farms. The proposed explanatory variables include (a) decentralized point forecasts clustered using the clustering large applications algorithm to reduce the dimensionality of the input matrices, (b) hour of day and month of year, to account for diurnal and seasonal cycles, respectively, and (c) atmospheric states derived from self-organizing maps to represent the large-scale synoptic circulation climatology of the study area. The proposed methodology is tested using day-ahead point forecast data obtained from 29 wind farms in South Africa. The results show a significant improvement compared to simply adding the decentralized point forecasts. The derived predictive densities are shown to be non-Gaussian and time-varying, as expected given the time-varying nature of wind uncertainty. The proposed methodology provides system operators with a method of not only producing more accurate aggregated forecasts from decentralized forecasts, but also improving operational decisions such as dynamic operating reserve allocation and stochastic unit commitment.


I. INTRODUCTION
The electricity generation from wind energy is increasing worldwide, as different regions continue the transition towards a decarbonized future. Wind power forecasting remains one of the most effective methods of reducing the impacts of wind power intermittency on power system operations. Traditionally, wind power forecasts consist of one value (also called point or deterministic forecasts) for each wind farm [1], [2], [3]. These are usually based on on-site meteorological forecasts and measurements, and thus optimized for local conditions [4]. However, as wind capacity continues to increase in a region, producing accurate aggregated wind power forecasts becomes a concern for system operators. One of the key considerations in aggregated wind power forecasting is the spatial-temporal correlations between geographically distributed wind farms. The power from geographically distributed wind farms exhibits spatial-temporal correlations, the degree of which can vary based on numerous factors such as separation distance and direction, timescale, diurnal weather variations and the movement of synoptic weather systems [1], [2], [5]. Recent literature has shown the potential benefit of incorporating spatial-temporal correlations in wind forecasting, especially in terms of improving forecasting accuracy [1], [2], [3], [6], [7], [8]. Another factor to consider is that uncertainty in wind power forecasting is unavoidable due to time-varying meteorological conditions, the weather-to-power conversion process, and the dynamic behavior of wind turbines [7]. Based on this realization, utilities are moving away from focusing only on increasing the accuracy of wind power forecasts towards also quantifying the uncertainty inherent in the forecast and incorporating this into their decisions [1], [2], [7], [9], [10], i.e. attaching a risk metric to the forecast. This is also referred to as probabilistic forecasting.
(The associate editor coordinating the review of this manuscript and approving it for publication was Ton Duc Do.)
Probabilistic wind power forecasting approaches that incorporate spatial-temporal correlations can be divided into two main categories: physical and statistical [1], [7], [11], [12], [13]. Physical approaches typically use numerical weather prediction (NWP) models and current weather conditions to predict wind speed [7], [8], [14]. NWP models formulate the problem of wind speed prediction as a set of mathematical equations describing the atmosphere and oceans. In [7], a recursively backtracking framework based on the particle filter was used to estimate the atmospheric state (with near-surface measurements) and forecast samples of aggregated wind power. The samples were used to derive predictive densities using a kernel density estimator (KDE). In [3], a weighted Euclidean distance was proposed to search for similar wind characteristics in historical NWP data, and the corresponding aggregated wind power measurements were used to construct probabilistic forecasts based on distance-weighted KDE. Weather ensemble predictions based on atmospheric models and time series were used together with the Gaussian KDE [15], the normalized prediction risk index [16], and generalized autoregressive conditional heteroscedasticity [17] to derive wind power forecasts and associated uncertainty. In [18], Gaussian processes combined with NWP were used to derive day-ahead wind power forecasts. In [19], the poor man's ensemble was used to estimate forecast errors for one wind farm, while [20] used input from 16 different European meteorological services in the Previento system to derive probabilistic forecasts for Germany. More studies based on physical models can be found in the literature reviews conducted in [11], [12], [21], [22], and [23]. Statistical approaches, on the other hand, take historical wind power data and/or NWP as inputs and use machine learning algorithms and/or other statistical models to generate aggregated wind forecasts [7], [8], [14].
These approaches assume that historical data can be used to infer spatial-temporal correlations among wind farms. In [8], machine learning algorithms were used to generate point forecasts of wind farms, the copula method to build spatial-temporal correlated aggregated wind power forecasts, and Bayesian theory to derive predictive densities.
In [2], Bayesian hierarchical models were used for obtaining spatial-temporal correlated probabilistic wind power forecasts. In [1], the alternating direction method of multipliers was proposed to capture spatial-temporal correlations of geographically distributed wind farms and used multiple quantile regression to derive predictive densities. In [24] and [25], the resampling approach is used to estimate the confidence intervals for wind power forecasts. In [26], local quantile regression is compared with the local Gaussian model and the Nadaraya-Watson Estimator. In [27], historical forecast error distributions were used to obtain scenarios for stochastic wind power generation. In [10], time adaptive conditional KDE was proposed for probabilistic wind power forecasting. In [28], [29], and [30], the beta distribution was used to estimate forecast errors at different wind power forecast bins while in [31] the gamma-like distributions were used to achieve the same. In [32], the logit transformation approach was used to estimate the confidence intervals of forecast errors. More studies based on statistical models can be found in literature reviews conducted in [11], [12], [21], [22], and [23].
The first observation made from the literature is that the majority of proposed forecasting methodologies assume that the problem of aggregated wind power forecasting is solved in a centralized manner, i.e., forecasts of all wind farms in a region are derived centrally by one forecasting company (facilitated by the system operator). A centralized forecaster will often use a consistent forecasting approach for all wind farms, leading to more consistent results [33]. In addition, such a forecaster will have access to measurements from all wind farms, making it easier to incorporate spatial-temporal correlations between wind farms into the forecasting methodology [1]. However, in some markets (e.g. South Africa), individual wind farms are obliged to provide point forecasts to the power purchaser or system operator [34]. These decentralized point forecasts are optimized for local conditions, and therefore simply adding them may not capture some of the common spatial and temporal correlations of wind power, thereby lowering the potential accuracy of the aggregated wind power forecast. It is well known that the correlation of wind power from geographically distributed wind farms depends on the proximity of the wind farms [2]. Wind farms in close proximity are highly correlated, whereas wind farms that are further apart are only weakly correlated. In addition, wind power profiles exhibit a high degree of statistical regularity along diurnal and seasonal timescales, as documented in the literature [35], [36], [37], [38], [39], [40], [41]. These spatial and temporal correlations form the basis of understanding wind power variability and ultimately improving wind power forecasting. Therefore, there is a need for a model that aggregates decentralized point forecasts while considering these correlations.
The second observation made from the literature is that most forecasting methodologies are based on microscale and/or mesoscale NWP models. However, it was illustrated in [42], [43], and [44] that the probabilistic properties of wind power generators' output, along with the level of correlation between wind generators' output, depend on the dominant large-scale atmospheric circulation archetypes. Thus, the information contained in large-scale atmospheric circulations can be useful in improving wind power forecasting. Large-scale atmospheric circulations have been incorporated in medium- to long-term wind forecasting [15], [45], [46], but have only been alluded to in short-term wind forecasting [7], [15], [47]. There is a need for more applied research on the potential benefits of incorporating large-scale atmospheric circulations in short-term wind power forecasting.
In light of the two aforementioned needs, the primary hypothesis of this paper is that the accuracy of aggregated forecasts can be improved by training machine learning models with features that account for some of the common spatial and temporal correlations of wind power, including those correlations caused by large-scale atmospheric circulations.
To test this hypothesis, a methodology is proposed that trains machine learning models using the explanatory variables listed below to derive aggregated point and probabilistic wind power forecasts:
(a) Decentralized point forecasts. To eliminate duplicate features and reduce the high-dimensional matrices required to model a large number of wind farms in a region (without losing important spatial information), the correlated point forecasts are first clustered into k clusters using the clustering large applications algorithm (CLARA).
(b) Hour of day and month of year. These are included to model the well-known statistical regularity of wind profiles along diurnal and seasonal timescales.
(c) Atmospheric states. These are derived from self-organizing maps (SOMs) to represent the large-scale synoptic circulation climatology of the study area.
The machine learning models proposed to test the hypothesis are (a) the k-nearest neighbor (k-NN) algorithm, to derive the aggregated wind power point forecast, and (b) the conditional KDE, to derive the aggregated wind power predictive densities. It should be noted that numerous machine learning models have been applied to wind power forecasting, as shown in the literature review conducted above. However, the k-NN- and KDE-based approaches are proposed to test this hypothesis because they are common and easy to implement. In addition, the conditional KDE is a nonparametric approach for predicting wind power densities and can thus account for the time-varying and non-Gaussian nature of wind power uncertainty. The proposed methodology is demonstrated using day-ahead point forecast data obtained from 29 wind farms in South Africa.
An overview of this paper is shown in Fig. 1, with the paper organized as follows: Section II provides a theoretical framework of the methodology for aggregated wind power forecasting. Section III defines the forecasting evaluation framework used in this paper. Section IV introduces the case study used for illustrating the proposed methodology and presents the results and discussions. Section V gives conclusions and identifies further research that may arise from this work.
The most significant contribution made by this paper is in proposing a simple approach to improve the accuracy of aggregated point and probabilistic wind power forecasts that can be derived from decentralized point forecasts. This is particularly important in regions where individual wind farms generate their own forecasts and do not necessarily have access to the measurements from other wind farms. The proposed approach provides system operators with a way of aggregating these forecasts while taking into account spatial and temporal correlations of wind power. In addition, the derived predictive densities can be used to improve operational decisions such as dynamic operating reserve allocation and stochastic unit commitment.
An additional contribution of the paper is contained in the proposed approach towards atmospheric states, derived from SOMs, which demonstrates another way in which large-scale atmospheric circulation patterns can be incorporated into short-term wind power forecasting.

II. FORECASTING METHODOLOGY
A. FEATURE SELECTION
1) CLUSTERS OF WIND POWER POINT FORECASTS
There can be hundreds of wind farms within a region, and thus the dataset containing individual wind power point forecasts can be high dimensional. In most machine learning algorithms, as the number of features grows, the amount of data required to generalize accurately grows exponentially (also known as the 'curse of dimensionality'). In addition, wind farms that are in close proximity are highly correlated (as mentioned in Section I), and therefore the machine learning model might not learn anything insightful by considering these wind farms independently. To mitigate this, this paper clusters wind farms with correlated point forecasts together. Some of the most common clustering algorithms that can be used to achieve this include hierarchical clustering, k-means, partitioning around medoids (PAM) and the clustering large applications algorithm (CLARA). Recent studies have shown that the CLARA algorithm produces better clustering results for large datasets [48], [49], and hence it is used in this paper. The CLARA algorithm can be summarised in the following steps:
• Creating random subsets of fixed size from the original dataset.
• Choosing the number of clusters k and corresponding k medoids for each subset.
• Calculating the dissimilarity matrix and assigning each observation of the dataset to the closest medoid.
• Calculating the mean of the dissimilarities of the observations to their closest medoid.
• Repeating the process while retaining the sub-dataset for which the mean is minimal.
The silhouette coefficient is used in this paper to find the optimal number of clusters k, while the distance metric used is the Euclidean distance (as also recommended for wind resource clustering in [48]).
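The steps above can be sketched in code. The following is a minimal, illustrative CLARA implementation; the function names, default subset sizes, and the greedy PAM swap search are simplifying assumptions for exposition, not the implementation used in the paper (which additionally selects k via the silhouette coefficient):

```python
import numpy as np

def pam(X, k, rng):
    """Greedy PAM: start from random medoids, swap medoids with
    non-medoids while the mean dissimilarity keeps decreasing."""
    idx = rng.choice(len(X), size=k, replace=False)

    def cost(ix):
        d = np.linalg.norm(X[:, None, :] - X[ix][None, :, :], axis=2)
        return d.min(axis=1).mean()

    improved = True
    while improved:
        improved = False
        for m in range(k):
            for j in range(len(X)):
                if j in idx:
                    continue
                trial = idx.copy()
                trial[m] = j
                if cost(trial) < cost(idx):
                    idx, improved = trial, True
    return X[idx]

def clara(X, k, n_samples=5, sample_size=40, rng=None):
    """Minimal CLARA: run PAM on random subsets and keep the medoids
    giving the lowest mean dissimilarity over the FULL dataset."""
    rng = np.random.default_rng(rng)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_samples):
        sub = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]
        medoids = pam(sub, k, rng)
        d = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
        cost = d.min(axis=1).mean()          # mean dissimilarity to closest medoid
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = np.linalg.norm(X[:, None, :] - best_medoids[None, :, :],
                            axis=2).argmin(axis=1)
    return best_medoids, labels
```

In practice each row of `X` would be the time series of one wind farm's point forecasts, so farms with strongly correlated forecasts land in the same cluster.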

2) ATMOSPHERIC STATES
A useful way to classify atmospheric circulation is by using SOMs, which are a class of self-learning artificial neural networks [50]. The classification output from the SOM procedure is a two-dimensional array of nodes, spaced on a lattice topology, which may be interpreted as maps showing typical patterns within a dataset. SOMs have often been used in meteorology and climatology [51], and may be preferable to other commonly employed classification procedures, notably k-means clustering and principal component analysis (PCA), as they do not discretise the data and do not force orthogonality [52].
The SOM training process is based on a competitive learning algorithm which successively measures the Euclidean distance between a predefined set of SOM nodes (or reference vectors) and the input feature vectors. For each iteration of the training process, the best matching unit's (BMU) weight, along with the weights of nodes located in the BMU's proximity on the SOM lattice, is updated towards that of the feature vector. Reference nodes on the SOM lattice thereby develop towards a generalized configuration of the training dataset. Once the training process has been completed, each feature vector in the classification time series is clustered based on its Euclidean distance to each node on the SOM map. In other words, each time step in the input data is retroactively clustered by being assigned the node number (or atmospheric state, in this instance) to which it is most similar.
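As an illustration of the competitive learning procedure described above, the following is a minimal online SOM sketch. The paper itself uses the Matlab SOM Toolbox with batch training; this simplified online variant, its decay schedules, and all parameter values are assumptions made purely for exposition:

```python
import numpy as np

def train_som(X, rows=4, cols=5, iters=2000, sigma0=2.0, lr0=0.5, seed=0):
    """Tiny online SOM: nodes on a rows x cols lattice compete for each
    sampled input vector; the BMU and its lattice neighbours are pulled
    towards the input, with shrinking radius and learning rate."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows * cols, X.shape[1]))        # reference vectors
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))  # best matching unit
        sigma = sigma0 * (0.01 / sigma0) ** (t / iters)   # shrinking radius
        lr = lr0 * (0.01 / lr0) ** (t / iters)            # decaying step size
        # Gaussian neighbourhood on the lattice around the BMU
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W

def atmospheric_state(W, x):
    """Retroactively assign a feature vector to its closest node index."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))
```

Here each row of `X` would be a flattened geopotential height field, and `atmospheric_state` returns the node number used as the categorical feature.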

3) HOUR OF DAY AND MONTH OF YEAR
The cyclical nature of the processes responsible for diurnal and seasonal variability (i.e., the rotation of the earth around its axis and its orbit around the sun) imbues these processes with a measure of statistical regularity. This statistical regularity increases the predictability of the wind power variations associated with such cyclical diurnal and seasonal processes. The value of modelling these variations in wind power forecasting has been shown in the recent literature [53]. This paper captures these variations by including the attributes 'hour of day' (for diurnal variations) and 'month of year' (for seasonal variations), which take the values 0, 1, . . ., 23 and 0, 1, . . ., 11, respectively.
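As a minimal illustration, the two temporal attributes can be extracted from a timestamp as follows (the function name is illustrative):

```python
from datetime import datetime

def temporal_features(ts: datetime):
    """Encode the diurnal and seasonal position of a timestamp exactly as
    in the text: hour of day in 0..23 and month of year in 0..11."""
    return ts.hour, ts.month - 1
```

Some forecasting models instead use a cyclical sine/cosine encoding so that hour 23 sits close to hour 0; the models here use the raw integer values as described above.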

B. K-NN ALGORITHM
The k-NN algorithm is a non-parametric method that averages the k closest training examples in feature space to predict a new data point (also known as the query point). This method resembles the similar-day approach that is still used by many system operators for short-term load demand forecasting [53]. The similar-day approach predicts the load demand using historical days with similar weather conditions and day types to the day of forecast. In the same way, the k-NN algorithm is used in this paper to predict the aggregated wind power using historical examples with similar month of year, hour of day, point forecasts (of wind farm clusters) and atmospheric states. The k-NN algorithm can be summarised in three steps:
• Calculating the distance between the query point and each training point.
• Selecting the k nearest neighbors from the training set with the k smallest distances.
• Predicting the aggregated wind power output as a weighted average of the k nearest neighbors.
If X_1, X_2, . . ., X_K are the k nearest neighbors and P_1, P_2, . . ., P_K are their corresponding aggregated wind power observations, then the aggregated wind power prediction \hat{p}(x) can be derived using the estimator:

\hat{p}(x) = \frac{\sum_{k=1}^{K} K(x, X_k) P_k}{\sum_{k=1}^{K} K(x, X_k)} \quad (1)

where K(x, X_k) is a kernel function that assigns weights to the aggregated wind power observations P_k based on the distance d(x, X_k) from the neighbor X_k to the query point x. The distance metric used in this paper is the Manhattan distance. The aggregated wind power observation of the neighbor with the smallest distance from the query point has more influence on the final point prediction \hat{p}(x) than the other observations. For a Gaussian kernel function (which is used in this paper) with a smoothing bandwidth h, the function K(x, X_k) can be written as:

K(x, X_k) = \exp\left(-\frac{d(x, X_k)^2}{2h^2}\right) \quad (2)
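A minimal sketch of the kernel-weighted k-NN estimator described above, using Manhattan distances and Gaussian weights (the function name and default parameter values are illustrative):

```python
import numpy as np

def knn_point_forecast(x, X_train, P_train, k=5, h=1.0):
    """Kernel-weighted k-NN point forecast: Gaussian weights on the
    aggregated power observations of the k Manhattan-nearest neighbours."""
    d = np.abs(X_train - x).sum(axis=1)      # Manhattan distance to each example
    nn = np.argsort(d)[:k]                   # indices of k nearest neighbours
    w = np.exp(-d[nn] ** 2 / (2 * h ** 2))   # Gaussian kernel weights
    return (w * P_train[nn]).sum() / w.sum() # weighted average of observations
```

Each row of `X_train` would hold the explanatory variables (cluster forecasts, hour, month, atmospheric state) of one historical example, and `P_train` the corresponding aggregated wind power observations.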

C. CONDITIONAL KERNEL DENSITY ESTIMATOR
Once the aggregated wind power point forecasts have been determined using the k-NN estimator in Section II-B, a KDE-based approach is used to derive the conditional aggregated wind power predictive density. This involves estimating the conditional probability density function of the aggregated wind power P, given that the explanatory variable X is equal to x. The main advantage of KDE is that it is a non-parametric, data-driven approach that can estimate the density of a random variable without a distributional hypothesis. As a result, KDE-based approaches are becoming popular in recent developments of probabilistic wind power forecasting [3], [10], [53], [54], [55], [56]. The main drawback of KDE-based approaches is the difficulty of selecting good bandwidths, especially in the presence of large datasets and high dimensionality. For this reason, this paper considers different combinations of explanatory variables (explained in Section II-A) to avoid using all variables at once (thus reducing the dimensionality of the dataset). The standard conditional KDE (also known as the Nadaraya-Watson conditional estimator) can be written as:

\hat{f}(p \mid x) = \frac{\sum_{i=1}^{N} K\left(\frac{x - X_i}{h_x}\right) K\left(\frac{p - P_i}{h_p}\right)}{h_p \sum_{i=1}^{N} K\left(\frac{x - X_i}{h_x}\right)} \quad (3)

where h_p and h_x are bandwidths controlling the smoothness of each conditional density in the p and x directions, respectively, X_i is a point in the training set, P_i is the corresponding aggregated wind power observation, and K is a kernel function. The choice of the kernel function K and the bandwidths h_p and h_x has a significant influence on the estimated conditional densities. For most applications, the Gaussian kernel function is the popular choice. However, it is well known that wind power output follows a non-Gaussian distribution [10], [53]. In general, the mean squared error of the Epanechnikov kernel function is optimal [55], [57], and hence it is used in this paper. The Epanechnikov kernel function can be written as:

K(\partial) = \frac{3}{4}\left(1 - \partial^2\right) \text{ for } |\partial| \le 1, \text{ and } 0 \text{ otherwise} \quad (4)

where \partial = \frac{p - P_i}{h_p} and \partial = \frac{x - X_i}{h_x} in the p and x directions, respectively.
To select the bandwidths h_p and h_x, this paper uses the least-squares cross-validation (LSCV) method. The method selects the h_p and h_x that minimise the integrated squared error (ISE), given by:

\mathrm{ISE} = \int \left(\hat{f}(p \mid x) - f(p \mid x)\right)^2 dp = \int \hat{f}(p \mid x)^2 \, dp - 2\int \hat{f}(p \mid x) f(p \mid x) \, dp + \int f(p \mid x)^2 \, dp \quad (5)

Minimising the ISE is equivalent to minimising the first two terms, \int \hat{f}(p \mid x)^2 dp - 2\int \hat{f}(p \mid x) f(p \mid x) dp, since the last term does not involve h_p and h_x. A cross-validated estimate of these terms is given by:

\mathrm{CV}(h_p, h_x) = \frac{1}{N}\sum_{i=1}^{N} \int \hat{f}_{-i}(p \mid X_i)^2 \, dp - \frac{2}{N}\sum_{i=1}^{N} \hat{f}_{-i}(P_i \mid X_i) \quad (6)

where \hat{f}_{-i} is \hat{f} evaluated with (X_i, P_i) left out.
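A minimal sketch of the Nadaraya-Watson conditional estimator described above, using the Epanechnikov kernel. For brevity it assumes a single scalar explanatory variable (a product of kernels in the x direction extends it to several) and omits the LSCV bandwidth search:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def conditional_kde(p_grid, x, X_train, P_train, h_p, h_x):
    """Estimate the conditional density f(p | x) on a grid of p values:
    observations P_i are smoothed in the p direction and weighted by how
    close their explanatory value X_i lies to the query x."""
    wx = epanechnikov((x - X_train) / h_x)                    # x-direction weights
    kp = epanechnikov((p_grid[:, None] - P_train[None, :]) / h_p)
    return (kp * wx[None, :]).sum(axis=1) / (h_p * wx.sum())  # density on p_grid
```

The returned array can be numerically integrated to obtain quantile forecasts, or plotted directly as a predictive density.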

D. SUMMARY OF THE PROPOSED METHODOLOGY
In summary, a methodology is proposed that trains k-NN and conditional KDE models to derive aggregated point and probabilistic wind power forecasts, respectively. The explanatory variables considered in training these models include: decentralized point forecasts (clustered using CLARA), atmospheric states (derived using SOMs), month of year and hour of day. Fig. 2 shows the flowchart summarising the proposed methodology for aggregated wind power forecasting.

III. FORECASTING EVALUATION FRAMEWORK
A. POINT FORECASTS
There are many metrics that can be used to evaluate the accuracy of wind power point forecasts. The most frequently used metrics are the mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R^2) [8], [13], [14], [55]. The same metrics are used in this paper to evaluate the accuracy of aggregated wind power point forecasts, and are calculated as:

\mathrm{MAE} = \frac{1}{N C_t} \sum_{i=1}^{N} \left| P_i - \hat{p}_i \right| \quad (7)

\mathrm{RMSE} = \frac{1}{C_t} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( P_i - \hat{p}_i \right)^2} \quad (8)

R^2 = 1 - \frac{\sum_{i=1}^{N} \left( P_i - \hat{p}_i \right)^2}{\sum_{i=1}^{N} \left( P_i - \bar{P} \right)^2} \quad (9)

where P_i and \hat{p}_i are the i-th actual and forecasted values of aggregated wind power, respectively, \bar{P} is the mean of the actual values, N is the total number of forecasting samples and C_t is the total wind capacity in the region. Note that the total wind capacity C_t is used as the denominator in (7) and (8) instead of the actual wind power P_i, to avoid distortion when the aggregated wind power is close to zero. In general, smaller MAE and RMSE, and an R^2 value close to 1, indicate better performance of the forecasting model.
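The three metrics can be computed as follows (a minimal sketch; MAE and RMSE are normalised by the total capacity C_t as described above):

```python
import numpy as np

def point_metrics(P, P_hat, C_t):
    """Capacity-normalised MAE and RMSE, plus R^2, for point forecasts."""
    e = P - P_hat
    mae = np.abs(e).mean() / C_t                           # normalised MAE
    rmse = np.sqrt((e ** 2).mean()) / C_t                  # normalised RMSE
    r2 = 1 - (e ** 2).sum() / ((P - P.mean()) ** 2).sum()  # coefficient of determination
    return mae, rmse, r2
```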

B. PROBABILISTIC FORECASTS
To evaluate the aggregated wind power predictive densities, this paper adopts the framework defined in [58].
The framework is based on three metrics: reliability or calibration, sharpness and skill score. Evaluating probabilistic forecasts based on these metrics requires the evaluation set consisting of quantile forecasts (of various nominal proportions) and observations.

1) RELIABILITY OR CALIBRATION
This metric measures the statistical consistency between quantile forecasts and observations. For example, a quantile forecast with a nominal level of 0.5 should have 50% of the observed values lower than or equal to its value. For a given quantile forecast \hat{q}_t^{(\alpha)} at time t with nominal level \alpha, and the corresponding observation p_t, the indicator variable \xi_t^{(\alpha)} is given by:

\xi_t^{(\alpha)} = \begin{cases} 1, & p_t \le \hat{q}_t^{(\alpha)} \\ 0, & \text{otherwise} \end{cases} \quad (10)

The empirical level \hat{a}^{(\alpha)} is obtained by calculating the mean of the indicator variable \xi_t^{(\alpha)} over a set of T quantile forecasts.
In a 'perfect' calibration, the empirical levels match the nominal proportions.
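A minimal sketch of the reliability computation described above: for each nominal level, the empirical level is the share of observations at or below the corresponding quantile forecasts:

```python
import numpy as np

def empirical_levels(q, p):
    """q[t, m] is the quantile forecast at time t for nominal level m;
    p[t] is the observation. Returns one empirical level per column,
    which should be close to the nominal level for a calibrated forecast."""
    return (p[:, None] <= q).mean(axis=0)
```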

2) SHARPNESS
This metric measures how tight or concentrated the predictive densities are. Let \delta_t^{(\beta)} = \hat{q}_t^{(1-\beta/2)} - \hat{q}_t^{(\beta/2)} denote the size of the central prediction interval with nominal coverage rate (1 - \beta) at time t; the sharpness of the predictive densities can then be summarised by the average interval size over the evaluation set. In general, narrow prediction intervals (subject to calibration) are preferred because they suggest that the predictions have more information content.
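A minimal sketch: given the lower and upper quantile forecasts bounding a central prediction interval, sharpness can be summarised by the mean interval width over the evaluation set:

```python
import numpy as np

def mean_interval_size(q_lo, q_hi):
    """Average width of the central prediction intervals, e.g. between
    the beta/2 and 1 - beta/2 quantile forecasts at each time step."""
    return float(np.mean(q_hi - q_lo))
```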

3) SKILL SCORE
This metric assesses probabilistic forecasts with a single score, just like the MAE and RMSE for point forecasts. The score incorporates a variety of aspects of probabilistic forecast evaluation and hence allows for easier comparison between probabilistic forecasting approaches. In this paper, the skill score S_c for a set of M quantiles with nominal levels \alpha_1, \ldots, \alpha_M is calculated as:

S_c = \frac{1}{T} \sum_{t=1}^{T} \sum_{m=1}^{M} \left( \xi_t^{(\alpha_m)} - \alpha_m \right) \left( p_t - \hat{q}_t^{(\alpha_m)} \right) \quad (11)

This score cannot be decomposed to provide the separate contributions of calibration, sharpness and other aspects of probabilistic forecast evaluation. In general, a higher score means higher skill of the probabilistic forecasting approach; the score is at most zero, and zero corresponds to a 'perfect' probabilistic forecast.
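The following sketch implements one common form of the quantile-based skill score, summing (indicator − nominal level) × (observation − quantile forecast) over quantiles and averaging over time; this particular algebraic form is an assumption based on the evaluation framework of [58]:

```python
import numpy as np

def skill_score(q, p, alphas):
    """q[t, m] is the quantile forecast at time t for nominal level
    alphas[m]; p[t] is the observation. Non-positive by construction;
    zero only for a perfect probabilistic forecast."""
    xi = (p[:, None] <= q).astype(float)                   # indicator variable
    terms = (xi - alphas[None, :]) * (p[:, None] - q)      # per-quantile score
    return float(terms.sum(axis=1).mean())                 # average over time
```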

IV. CASE STUDY
A. DESCRIPTION OF CASE STUDY
The proposed methodology for aggregated wind power forecasting is tested using data obtained from 29 wind farms in South Africa from 01 January 2018 to 31 March 2021. The average installed capacity of these wind farms is 92 MW (ranging between 13 MW and 140 MW) and their geographical locations are shown in Fig. 3. Since the individual wind power point forecast data is sensitive and confidential, Eskom (the utility company in South Africa) was only able to provide the day-ahead wind power point forecasts in groups of the three nearest wind farms (summed together). The challenge here is that some wind farms might not fall in the same cluster as their two nearest neighbors if CLARA were applied to the original dataset instead of the pre-grouped dataset received from the utility. This implies a drop in the resolution of the data, which may have affected the performance of the proposed methodology in this particular case study. The SOM Toolbox developed for Matlab by the Helsinki University of Technology was used [59] for the classification of atmospheric states. As the classification parameter, the hourly ERA5 850 hPa geopotential height reanalysis dataset [60] was selected, as geopotential heights provide a good representation of large-scale atmospheric circulation and are a known wind speed predictor (see e.g. [47] and [61]). It was further deemed that the 850 hPa pressure level provides sufficient elevation to capture circulation above the South African escarpment. The classification area was bounded between 22-40°S and 5-40°E, which includes the South African subcontinent along with significant areas of the Indian and Atlantic oceans, so as to adequately capture the large-scale circulation impacting the study area. To illustrate the proposed aggregation methodology, a 4 × 5 SOM-node topology was selected, which was deemed to provide sufficient detail.
It should, however, be noted that the number of SOM nodes chosen represents what is often a subjective trade-off between the level of generalization or detail required and the quantization error.
The SOM was trained in two phases with a batch training algorithm, using a rectangular lattice as recommended for batch training [62]. The first (rough) training phase consisted of 1000 iterations with the neighborhood radius decreasing from 5 to 1. This was followed by a fine-tuning phase of 5000 iterations in which the neighborhood radius decreased from 2 to 1 and thereafter remained fixed, so that only the BMU was updated. A number of studies have shown that the starting point of the neighborhood function has little impact on the patterns produced [63]; however, by continuing the training once the radius parameter equals 1, the SOM approach becomes equivalent to the k-means clustering procedure, which has been shown to be advantageous in terms of average classification error [64]. The Epanechnikov neighborhood function was used, as recommended by [62] when training small SOM maps.
The full dataset (consisting of hour of day, month of year, actual wind power generation, clusters of wind power point forecasts, and atmospheric states) was split into a training set (by randomly picking 90% of the rows from the full dataset) and a testing set (the remaining 10%). Seven-fold cross-validation was used to determine the optimal value of k in the k-NN algorithm. The aggregated wind power point forecast results obtained by training the k-NN algorithm with the proposed explanatory variables were compared to the results obtained by simply summing the point forecasts. In addition, the k-NN algorithm was trained using different combinations of explanatory variables to evaluate the impact of each variable on the accuracy of the resulting aggregated forecasts. Furthermore, in order to assess the proposed model under various conditions, its performance was evaluated during different hours of the day, months of the year, and atmospheric states.
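A minimal sketch of selecting k by cross-validated MAE, as described above; the fold construction and the use of a plain unweighted k-NN average (rather than the kernel-weighted version) are simplifying assumptions for exposition:

```python
import numpy as np

def choose_k_cv(X, P, candidate_ks, folds=7, seed=0):
    """Pick k for the k-NN model by 7-fold cross-validated MAE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    splits = np.array_split(idx, folds)
    best_k, best_mae = None, np.inf
    for k in candidate_ks:
        errs = []
        for f in range(folds):
            test = splits[f]
            train = np.concatenate([splits[g] for g in range(folds) if g != f])
            for i in test:
                d = np.abs(X[train] - X[i]).sum(axis=1)   # Manhattan distance
                nn = train[np.argsort(d)[:k]]             # k nearest in train fold
                errs.append(abs(P[nn].mean() - P[i]))     # unweighted k-NN error
        mae = float(np.mean(errs))
        if mae < best_mae:
            best_k, best_mae = k, mae
    return best_k
```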
This paper trained the conditional KDE using various combinations of proposed explanatory variables, and the results were compared to those obtained by training the conditional KDE with just the sum of point forecasts. This was done because it was difficult to train the conditional KDE using all proposed explanatory variables at once (due to bandwidth selection as explained in Section II-C).

B. EVALUATION OF POINT FORECASTS
In Table 1, the overall performance of training the k-NN model with the proposed explanatory variables is compared to that of simply adding the point forecasts of the individual wind farms. The proposed methodology performs better than the sum of point forecasts within the context of the proposed evaluation metrics. The MAE and RMSE of the proposed methodology are 55% and 47% lower, respectively, than those achieved by adding the point forecasts. Likewise, the R^2 values show that 94% of the observed data fit the proposed model, while 77% of the data fit the sum-of-point-forecasts model. Therefore, taking into account the spatial and temporal correlations through the proposed explanatory variables improves the accuracy of aggregating decentralized point forecasts. Table 2 displays the results of the various explanatory variable scenarios to show how the proposed explanatory variables impact forecasting accuracy. As shown in Table 2, taking into account the spatial information found within the various clusters results in better forecasting accuracy compared to the sum of point forecasts. The accuracy improves further when taking into account diurnal and seasonal cycles as well as large-scale atmospheric circulations. The seasonal cycle has the largest impact, followed by the large-scale atmospheric circulations.
Tables 3, 4 and 5 show the performance of the proposed model during different hours of the day, months of the year and atmospheric states, respectively. The proposed model performs better in the hours between 4h and 14h and in the months between March and July (the autumn and winter seasons). Fig. 4 shows the hourly average wind power generation for January (summer), April (autumn), July (winter) and October (spring). The average wind generation profile shows a daily ramp-up in wind power, especially during the summer, autumn, and spring seasons, from approximately 9h to 18h, which is in turn followed by an equivalent ramp-down during the evening. This ramp-up is likely associated with daily surface heating due to the diurnal cycle. A possible reason that prediction errors are comparatively small during a period associated with significant variability in the wind power profile (4h-14h) is the cyclical nature of said variability, which is a function of diurnal surface heating and cooling. Notwithstanding diurnal variability, Fig. 4 shows that the autumn and winter wind power profiles are comparatively flat throughout the day, which is in turn a possible reason for the better model performance during these seasons.
The model also performs better during the atmospheric states (2,4), (2,5), (3,1), (3,3), (3,4), (3,5), (4,2), (4,3) and (4,4). Fig. 5 shows the 20 (4 × 5) node SOM representing the 20 classified atmospheric states, together with the frequency of occurrence of each node as a percentage above it. The nodes where the proposed model performs better tend to represent atmospheric circulation dominated by high-pressure conditions. Such conditions are associated with clear skies and more stable weather, and thus less variability over different timescales, which increases predictive skill. Fig. 6 shows the day-ahead prediction intervals (10-90%) together with the actual observations of aggregated wind power generation on 30 March 2021 (reflective of a 'typical' autumn day in South Africa). Note that for Fig. 6, only the clusters of wind power point forecasts were used as explanatory variables, for illustration purposes. Only two observations (at 7h and 21h) fall outside the 90% prediction intervals. The width of the prediction intervals varies in time with the actual observations, as expected. To give a different view of these results, Fig. 7 shows the aggregated wind power generation predictive densities at 1h, 8h, 14h and 20h on 30 March 2021. The predictive densities are not only time-varying, but also multimodal and asymmetric (and thus non-Gaussian). Fig. 8 shows the reliability diagrams of the probabilistic forecasts using the proposed model for the different explanatory variable scenarios. In Fig. 8 (a), the empirical levels are plotted against the nominal proportions, together with the desired line (where the empirical levels equal the nominal proportions). The results from all considered scenarios align well with the desired line, indicating reliable probabilistic forecasts.
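Predictive densities such as those in Fig. 7 come from the conditional KDE model. A minimal sketch of such an estimator, assuming Gaussian product kernels and fixed hand-picked bandwidths `hx` and `hy` (the paper's own bandwidth-selection procedure is not reproduced here), is:

```python
import numpy as np

def conditional_kde_density(y_grid, x_query, X_train, y_train, hx, hy):
    """Kernel estimate of the predictive density f(y | x = x_query).

    Gaussian product kernels are assumed; hx and hy are fixed bandwidths
    for the explanatory variables and the target, respectively.
    """
    X_train = np.atleast_2d(np.asarray(X_train, float))
    y_train = np.asarray(y_train, float)
    y_grid = np.asarray(y_grid, float)
    # Weight each training sample by its kernel distance to the query point.
    d2 = np.sum(((X_train - x_query) / hx) ** 2, axis=1)
    w = np.exp(-0.5 * d2)
    w = w / w.sum()
    # Weighted one-dimensional KDE over the target on the evaluation grid.
    u = (y_grid[:, None] - y_train[None, :]) / hy
    dens = (w[None, :] * np.exp(-0.5 * u ** 2)).sum(axis=1) / (hy * np.sqrt(2.0 * np.pi))
    return dens
```

Integrating the returned density (or its cumulative sum) then yields the quantiles used to build central prediction intervals like the 10-90% band in Fig. 6.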
To compare the considered scenarios, Fig. 8 (b) plots the errors between the empirical levels and the desired outcomes against the nominal proportions. A general conclusion from Fig. 8 (b) is that the 'sum' scenario produces better reliability results than all other scenarios. In addition, the reliability of the forecasts appears to drop as more explanatory variables are considered in a scenario. One possible explanation is bandwidth selection, which becomes more complex for a large number of explanatory variables. As seen through the negative reliability results, the proposed model tends to underestimate the aggregated wind power generation for the 'sum' scenario. For the remaining scenarios, the model tends to underestimate the aggregated wind power generation when the nominal proportion is below 60% and to overestimate it for larger nominal proportions. Fig. 9 shows the sharpness results of the prediction intervals for the different explanatory variable scenarios. The 'clusters + month of year' scenario achieves the best sharpness (ranging between 2% and 20%), while the 'sum' scenario achieves the worst (ranging between 2% and 28%). The results show a significant improvement from the 'sum' scenario to the 'clusters' scenario, followed by noticeable improvements from the 'clusters' scenario to the remaining scenarios. This highlights the importance of considering spatial correlations, large-scale atmospheric circulations, and inter-hour and seasonal variations in aggregated wind power forecasting. However, good sharpness may come at the cost of underestimating or overestimating the aggregated wind power uncertainty, as deduced from the reliability diagram in Fig. 8. A trade-off between reliability and sharpness is therefore needed, depending on the specific requirements of the grid and its customers. Table 6 shows the skill score results for different combinations of explanatory variables.
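The reliability and sharpness diagnostics above can be sketched as follows. The quantile arrays `q_lo` and `q_hi` are hypothetical stand-ins for the lower and upper bounds of a central prediction interval produced by the model:

```python
import numpy as np

def empirical_coverage(y_obs, q_lo, q_hi):
    """Fraction of observations falling inside the prediction interval.

    Plotting this against the nominal proportion gives one point of a
    reliability diagram such as Fig. 8 (a).
    """
    inside = (np.asarray(y_obs) >= np.asarray(q_lo)) & (np.asarray(y_obs) <= np.asarray(q_hi))
    return float(np.mean(inside))

def sharpness(q_lo, q_hi, capacity):
    """Mean interval width as a percentage of installed capacity.

    Narrower intervals (smaller values) are sharper.
    """
    return 100.0 * float(np.mean(np.asarray(q_hi) - np.asarray(q_lo))) / capacity
```

A reliable interval has empirical coverage close to its nominal proportion; sharpness then discriminates between competing forecasts that are equally reliable.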
The 'clusters + month of year' scenario achieves the best skill score (−0.148), while the lowest skill score (−0.283) is achieved by the 'sum' scenario. The results show a significant improvement from the 'sum' scenario to the 'clusters' scenario, followed by smaller improvements from the 'clusters' scenario to the remaining scenarios.
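The skill scores in Table 6 are negatively oriented, with 0 as the ideal value. The paper's exact definition is not reproduced here; a common quantile-based skill score with that orientation (the pinball loss summed over quantiles, with the sign flipped) can be sketched as:

```python
import numpy as np

def quantile_skill_score(y_obs, quantiles, alphas):
    """Quantile-based skill score, negatively oriented (0 is perfect).

    quantiles: shape (T, K) array of forecast quantiles q_alpha,
    alphas: the K nominal proportions in (0, 1).
    Values closer to zero indicate better probabilistic forecasts.
    """
    y = np.asarray(y_obs, float)[:, None]
    q = np.asarray(quantiles, float)
    a = np.asarray(alphas, float)[None, :]
    indic = (y < q).astype(float)     # 1 if the observation falls below the quantile
    score = (indic - a) * (y - q)     # each term is <= 0 by construction
    return float(score.mean())
```

Under this definition a perfect forecast (every quantile exactly equal to the observation) scores 0, and larger miscalibration drives the score more negative, consistent with −0.148 being better than −0.283.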

C. EVALUATION OF PROBABILISTIC FORECASTS
The results above show that taking into account the spatial and temporal correlations through the proposed explanatory variables also improves the accuracy of aggregated probabilistic forecasts that can be derived from decentralized forecasts.

V. CONCLUSION
This paper proposed explanatory variables that were used to train the KDE and conditional KDE models to derive day-ahead aggregated point and probabilistic wind power forecasts from decentralized point forecasts of geographically distributed wind farms. The proposed explanatory variables include clusters of point forecasts (to account for spatial correlations between wind farms), hour of day (to account for diurnal cycles), month of year (to account for seasonal cycles) and atmospheric states (to account for correlations due to large-scale atmospheric circulations). The main findings of this paper can be summarized as follows: • The results from the proposed approach showed a significant improvement (47% lower RMSE) compared to simply adding the decentralized point forecasts. Therefore, the proposed explanatory variables, which encode well-known spatial and temporal correlations as well as large-scale atmospheric circulations, can be effective in capturing correlations that are not accounted for by simply adding the decentralized forecasts.
• In the case study, the proposed approach performed better in the hours between 4h and 14h. One possible reason is that the variability of the wind power profile between these hours is a function of diurnal surface heating and cooling, which was incorporated through the hour of day feature. In addition, the autumn and winter wind power profiles are comparatively flat throughout the day, which is a possible reason for the better performance of the approach during these seasons. Furthermore, the proposed approach performed better in SOM nodes that tend to represent atmospheric circulation dominated by high-pressure conditions.
• The predictive densities obtained from the proposed approach were shown to be non-Gaussian and time-varying, as expected given the time-varying nature of wind uncertainty. The probabilistic forecasts from the considered scenarios were relatively reliable (less than 8% error between empirical levels and desired outcomes). However, the reliability of the forecasts appears to drop as more explanatory variables are considered in a scenario. This can be due to bandwidth selection, which becomes more complex in the proposed KDE approach for a large number of explanatory variables.
• The sharpness and skill score results showed a significant improvement when the proposed explanatory variables were considered, compared to simply adding the decentralized forecasts. In addition, the month of year feature produces slightly better sharpness and skill scores than the hour of day and atmospheric state features, which highlights the importance of incorporating seasonality in aggregated forecasting. The most significant contribution of this paper is an approach for incorporating spatial and temporal correlations to improve the accuracy of aggregated forecasts derived from decentralized point forecasts. In addition, knowing the hours, months, and atmospheric states under which the proposed approach performs better, together with the derived predictive densities, can be used as inputs to operational processes such as stochastic unit commitment and dynamic operating reserve allocation. This paper also demonstrated a way in which large-scale atmospheric circulation patterns can be incorporated into short-term wind power forecasting, which has been lacking in the literature. In future work, the proposed methodology could be tested using ungrouped wind data to see whether the results improve further. In addition, more work is needed on bandwidth selection for high-dimensional datasets in KDE-based approaches.