Leveraging Turbine-Level Data for Improved Probabilistic Wind Power Forecasting

This paper describes two methods for creating improved probabilistic wind power forecasts through the use of turbine-level data. The first is a feature engineering approach whereby deterministic power forecasts from the turbine level are used as explanatory variables in a wind farm level forecasting model. The second is a novel bottom-up hierarchical approach where the wind farm forecast is inferred from the joint predictive distribution of the power output from individual turbines. Notably, the latter produces probabilistic forecasts that are coherent across both turbine and farm levels, which the former does not. The methods are tested at two utility scale wind farms and are shown to provide consistent improvements of up to 5% in terms of continuous ranked probability score compared to the best performing state-of-the-art benchmark model. The bottom-up hierarchical approach provides greater improvement at the site characterized by a complex layout and terrain, while both approaches perform similarly at the second location. We show that there is a clear benefit in leveraging readily available turbine-level information for wind power forecasting.

relationship between meteorological forecasts and corresponding wind farm power generation via a statistical learning technique. This allows for a fully data-driven statistical model, using inputs from the physics-based NWP, which implicitly accounts for complex physical processes influencing the wind to power conversion such as wake losses and any systematic bias in the weather forecasts [3].
Wind power prediction was initially approached as a deterministic problem, with research and early commercial products focusing on providing single-valued best estimates of future generation [4]. However, there has since been extensive research in the area of probabilistic forecasting, which is reviewed comprehensively in [5], driven by the economic value of quantifying uncertainty when making decisions [3]. Uncertainty at a single point in the future is commonly quantified by producing a predictive probability distribution of future power production, called a density forecast. Density forecasts are central to probabilistic forecasting, and as such have received much attention from the research community. Non-parametric methods, where no particular distribution shape is assumed, have emerged as superior to estimating parametric probability distributions conditional on NWP and other inputs. Popular statistical methods for generating these forecasts include additive quantile regression with splines [6], adapted resampling [7], and conditional kernel density estimation [8].
More recently, a number of competitions have been run in order to compare forecasting methods on the same dataset and under controlled conditions [9]. The two winning teams from GEFCom (2012 and 2014) utilised Gradient Boosting regression Trees (GBT), the latter for quantile regression to produce density forecasts, with input features engineered from NWPs [10], [11]. Other entrants also employed GBTs but did not produce forecasts that were as skilful, highlighting the importance of feature engineering in such methods. This approach was extended with spatio-temporal features engineered from a grid of NWP points to improve probabilistic forecast performance of both solar and wind power in [12]. Analog ensemble methods have also been successful in producing non-parametric density forecasts [13]; here, the definition of the distance measure used to define the ensemble is critical and plays a role similar to feature engineering in regression.
Hierarchical forecasting has received increased attention in recent years because of the desire from forecast users for coherency (or consistency), i.e. the forecast of each level in a hierarchy should sum together appropriately. Additionally, including coherency constraints in predictive models can improve performance at all levels of the hierarchy. Hierarchies can be both spatial and temporal in nature [14], [15]. There are different approaches to hierarchical forecasting, the simplest being the bottom-up approach, which forecasts the top level in the hierarchy by summing the constituent lower level forecasts [16], [17]. As discussed in [18], the bottom-up approach can in practice tend to deliver poor performance because of the low signal to noise ratio of the bottom hierarchy in applications such as load forecasting using smart meter data. However, this is not the case for wind farms where each wind turbine provides a consistent weather dependent signal. In the wind power forecasting domain, [19] evaluates a method of deterministic forecast reconciliation via a generalised least squares method to generate coherent forecasts.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The concept of coherent probabilistic forecasts is explored in [18], [20] where the importance of this property is emphasised in settings where forecasts from multiple levels of the hierarchy are used in decision-making. In these works, the marginal distributions are determined for nodes in the system and the dependence is modelled using an empirical copula. However, in the wind farm setting the structure of the hierarchy is relatively simple, and the size lends itself to families of parametric copulas rather than the empirical copula, which requires large volumes of data to satisfactorily estimate.
A wide variety of copula families exist, several of which have been applied to model spatial dependency in the wind power forecasting context but not, to the best of the authors' knowledge, in a hierarchical setting [21]. The most frequently used family is the Gaussian copula [22], [23], though temporal dependency has received more attention than spatial or spatio-temporal dependency. Copula vines, which are a series of linked bivariate copula families, offer a more flexible framework for modelling multivariate dependency, and have subsequently been the subject of recent studies in wind power forecasting [23], [24].
In this study, two methods are investigated to leverage turbine-level data and are compared to state-of-the-art benchmarks. The first is a feature engineering approach proposed in [25], where deterministic power forecasts for individual turbines are used as predictor variables when producing non-parametric wind farm forecasts. This is a hierarchical method in the sense that information from the turbine-level is used to supplement the available information set. However, forecast coherency is not guaranteed. This work also expands on [25] by extending the case study to a second wind farm with different site characteristics and testing a second novel approach based on hierarchical coherency. In this second bottom-up approach, density forecasts are produced for all turbines and the spatial dependence between them is modelled in a copula framework to allow aggregation to the wind farm level.
The turbine-level feature engineering method aims to improve the wind farm forecast by generating new covariates from individual turbine data, whereas the bottom-up probabilistic forecasts reflect the physical reality of the problem (that the total wind farm power output is the sum of individual turbine generation) and therefore have the added benefit of coherency. The main contributions of this paper are the proposed bottom-up hierarchical method and its evaluation, plus significantly expanding the evaluation of the feature engineering approach first introduced in [25]. We hypothesise that leveraging information from the turbine-level will enable us to improve forecast performance, particularly since modern utility scale wind farms are often distributed over large areas of complex terrain and, as a result, individual turbines can experience different conditions from one another at any given time. The advantages of the proposed hierarchical method are improved accuracy and coherency between turbine-level and wind farm total; however, the nature of the wind farm (terrain, layout, size, etc.) has a bearing on the extent of this improvement. This paper is organised as follows: Section II details the forecasting methods and benchmark models, Section III describes the case study based on two utility scale wind farms in the UK, Section IV presents and evaluates the results, and conclusions are drawn in Section V. Supplementary information provides additional detail and results [26].

II. FORECASTING METHODOLOGY
This section covers the two tested methods for leveraging turbine level data, the benchmark models, and the statistical learning techniques employed. The entire forecasting methodology is summarised in Figure 1, which details the training process, input data, and output forecast of each model. The turbine level feature engineering model is generated using quantile regression, where NWP predictions are supplemented with additional features; these include deterministic forecasts of individual turbine generation and wind farm-level generation [25]. The bottom-up probabilistic method involves estimating the full multivariate predictive distribution of generation from all turbines. To this end, the marginal distribution of each turbine is determined via quantile regression and the spatial dependency structure is modelled via a copula. The wind farm-level density forecast is then generated by sampling from the multivariate distribution and taking the empirical distribution of the aggregated turbine-level samples. The Gaussian copula with both empirical and parametric covariance matrices is examined, due to its simplicity and successful use in similar studies [21], [22], [27]; vine copulas with a range of copula families are also considered [23], [24].
Explanatory variables $x_t$ common to both proposed methods and benchmarks are derived from NWP wind speed and direction outputs at 10 m and 100 m. Features that capture wind shear, veer, and phase errors in the NWP are engineered, inspired by [11], [12]. Cubic spline basis functions are also included to capture diurnal bias in the NWP at the specific sites along the lines of [28]. Full details of all features are listed in the supplementary material [26].

A. Gradient Boosting Trees
This section introduces the statistical learning technique used to map the relationship between the input features derived from the NWP and the target measured time series, i.e. individual turbine or wind farm power measurements. The Gradient Boosting regression Tree algorithm (GBT) is an ensemble learning technique whereby powerful predictive models can be constructed by combining a number of individual regression trees, known as weak learners [29]. This technique can capture non-linear relationships such as the wind power curve, can be used with a variety of differentiable loss functions, and can intrinsically learn interactions between input features. GBTs are also naturally regularised by virtue of the way trees are constructed [30]. The use of GBTs for quantile regression here is motivated by their success in similar applications [10], [11], though this element could be substituted for other supervised learning algorithms. The gradient boosted tree $F_n(x_t)$ is defined as the sum of $n$ regression trees
$$F_n(x_t) = \sum_{i=1}^{n} f_i(x_t),$$
where each $f_i(x_t)$ is a regression tree. The ensemble of regression trees is constructed sequentially by estimating the new regression tree $f_{n+1}(x_t)$ as
$$\arg\min_{f_{n+1}} \sum_{t} L\big(y_t, F_n(x_t) + f_{n+1}(x_t)\big)$$
for some loss function $L(\cdot)$, where $y_t$ denotes the target measurement at time $t$. Where $L(\cdot)$ is differentiable, this optimisation can be solved by steepest descent [30]. Turbine-level deterministic forecasts used as features in this study are produced by GBTs fit with a squared loss function, and density forecasts are produced using GBTs for multiple quantile regression (quantile loss function [1]) and then using spline interpolation, with knots at each predicted quantile and the boundaries 0 and nominal power, to estimate the predictive Cumulative Distribution Function (CDF). GBTs include several hyper-parameters which control both the tree fitting and boosting processes to optimise performance while preventing over-fitting [31]. The two key parameters tuned to minimise out-of-sample error via k-fold cross validation are the interaction depth and shrinkage.
The interaction depth is the number of splits allowed to partition the input variable space per tree and the shrinkage or learning rate controls the weight of each tree in the ensemble.
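As a minimal illustration of the quantile loss used for quantile regression, the following Python sketch evaluates the pinball loss and checks numerically that the constant prediction minimising it is the corresponding empirical quantile. This is not the R implementation used in this study; the function names are hypothetical and the toy data are arbitrary.

```python
from statistics import mean


def pinball_loss(y, q_pred, q):
    """Quantile (pinball) loss for one observation y and a predicted q-quantile."""
    diff = y - q_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_pinball(ys, q_pred, q):
    """Average pinball loss of a constant prediction over a sample."""
    return mean(pinball_loss(y, q_pred, q) for y in ys)


def best_constant(ys, q):
    """Observed value that minimises the mean pinball loss (brute-force search)."""
    return min(sorted(set(ys)), key=lambda c: mean_pinball(ys, c, q))


if __name__ == "__main__":
    sample = list(range(11))  # toy "power" observations 0, 1, ..., 10
    for level in (0.1, 0.5, 0.9):
        print(level, best_constant(sample, level))
```

A boosted model replaces the constant with a sum of trees, but the asymmetry of the loss is what steers each fitted model towards its target quantile.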

B. Benchmark Models
Two highly competitive benchmark models are implemented based on wind farm level power measurements and input features $x_t$ derived solely from NWPs. These features include temporal averaging, shear and others; a full list is provided in the supplementary information [26]. The first benchmark is a wind farm-level GBT quantile regression model, WF($x_t$), and the second is an Analog Ensemble method, AnEn, described below. These benchmarks represent the state-of-the-art in wind power forecasting and were informed by [10], [11], [13] in particular.
The Analog Ensemble is a non-parametric algorithm that ranks similarity between the current forecast and a training dataset of historical forecasts with concurrent measurements. The measurements concurrent with the k most similar historical forecasts are used to construct an ensemble, whose members are assumed to be equally likely and from which empirical quantiles can be extracted. In this case, a mean GBT benchmark forecast is used as the explanatory variable and the model searches for the most similar out-of-sample mean power forecasts in the training dataset. The AnEn is also conditioned by lead time and the ranking metric used is Euclidean distance. This algorithm is similar to the k-Nearest-Neighbours regression solution used in the second placed entry to the GEFCom2014 wind track [13]. For more information, the reader is referred to [32].
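The analog search described above can be sketched in a few lines of Python. This is an illustrative simplification, not the kknn-based implementation used in the study: the tuple layout of `history`, the function names and the nearest-rank quantile rule are all assumptions. With a single explanatory variable, the Euclidean distance reduces to an absolute difference.

```python
import math


def analog_ensemble(history, forecast, lead_time, k):
    """Return the k measurements whose past forecasts are closest to `forecast`.

    history: list of (lead_time, past_forecast, measurement) tuples (assumed layout).
    Only analogs with the same lead time are considered.
    """
    candidates = [
        (abs(past - forecast), measured)
        for (lt, past, measured) in history
        if lt == lead_time
    ]
    candidates.sort(key=lambda pair: pair[0])
    # Ensemble members are treated as equally likely; sort them for quantile lookup.
    return sorted(measured for _, measured in candidates[:k])


def empirical_quantile(members, q):
    """Nearest-rank empirical quantile of a sorted ensemble."""
    rank = math.ceil(q * len(members)) - 1
    return members[min(len(members) - 1, max(0, rank))]


if __name__ == "__main__":
    history = [(1, 0.20, 0.25), (1, 0.50, 0.45), (1, 0.52, 0.60),
               (1, 0.90, 0.80), (2, 0.50, 0.10)]
    members = analog_ensemble(history, forecast=0.5, lead_time=1, k=2)
    print(members, empirical_quantile(members, 0.5))
```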

C. Turbine-Level Feature Engineering
Here, we present the method, from related work by the authors [25], to engineer features based on individual wind turbines to feed into the wind farm-level forecast. This approach comprises two layers: in the first layer, deterministic forecasts for individual wind turbines and the wind farm as a whole are produced; then in the second layer, density forecasts for the wind farm are produced by quantile regression using features from both NWP and the first layer. The deterministic forecasts for individual wind turbines $y_{i,t}$ are produced using the same explanatory variables $x_t$ as for the direct wind farm-level forecasting benchmark. These forecasts are combined via a weighted sum over all $D$ turbines to produce the deterministic wind farm forecast
$$\hat{z}_t = \sum_{i=1}^{D} \omega_i \, y_{i,t},$$
which completes the constitution of the supplementary feature set
$$x_t^{\mathrm{SUP}} = \{x_t, \, y_{1,t}, \ldots, y_{D,t}, \, \hat{z}_t\}.$$
The weights $\omega$ are estimated via elastic net regression, motivated by the necessity to regularise turbine forecasts because they are highly correlated. The weights are calculated via
$$\hat{\omega} = \arg\min_{\omega} \; \|Z - Y\omega\|_2^2 + \lambda\left[\tfrac{1}{2}(1-\alpha)\|\omega\|_2^2 + \alpha\|\omega\|_1\right],$$
where $\alpha$ and $\lambda$ are hyper-parameters requiring tuning, and $Z$ and $Y$ are matrices of vertically stacked instances of $z_t$ and $y_t$ [33]. The hyper-parameter $0 \le \alpha \le 1$ controls the weighting of the two penalty terms, in effect trading off between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression. Total regularisation is controlled by $\lambda \ge 0$. The optimal values of $\alpha$ and $\lambda$ are determined through grid search and k-fold cross validation. The final wind farm level density forecast, WFT($x_t^{\mathrm{SUP}}$), is produced using quantile regression in the same way as the benchmark model but with the expanded feature set $x_t^{\mathrm{SUP}}$. In order to refine the forecast skill, a reduced feature set selected from $x_t^{\mathrm{SUP}}$ is used. This selection process involves fitting a regularised GBT model with all the available inputs from $x_t^{\mathrm{SUP}}$, then selecting and retaining only the features that have the greatest influence.
This additional selection stage removes superfluous predictors which provide no additional information and only deteriorate forecast performance. The final variables retained in each model, and their relative importance, can be found in the supplementary information [26]. Low shrinkage and interaction depth hyperparameter choices for the GBT algorithm provide a degree of regularisation and feature selection from which the dimensions of the problem can be reduced substantially [30].
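To make the role of the two elastic net hyper-parameters concrete, the sketch below evaluates a weighted-sum farm forecast and an elastic net objective in plain Python. It assumes the parameterisation documented for the glmnet package, which scales the squared-error term by 1/(2N); the exact scaling used in this study is not recoverable from the text, and the function names and toy numbers are hypothetical. It evaluates the objective only and does not perform the minimisation.

```python
def weighted_farm_forecast(turbine_forecasts, weights):
    """Deterministic farm forecast as a weighted sum of turbine forecasts."""
    return sum(w * y for w, y in zip(weights, turbine_forecasts))


def elastic_net_objective(farm_power, turbine_rows, weights, alpha, lam):
    """Elastic net objective in the glmnet style (assumed, see lead-in).

    farm_power:   measured farm power per time step
    turbine_rows: one list of turbine forecasts per time step
    alpha = 0 gives a pure ridge penalty, alpha = 1 a pure lasso penalty.
    """
    n = len(farm_power)
    rss = sum(
        (z - weighted_farm_forecast(row, weights)) ** 2
        for z, row in zip(farm_power, turbine_rows)
    )
    ridge = sum(w * w for w in weights)
    lasso = sum(abs(w) for w in weights)
    return rss / (2 * n) + lam * ((1 - alpha) / 2 * ridge + alpha * lasso)


if __name__ == "__main__":
    rows = [[1.0, 2.0], [3.0, 4.0]]   # toy turbine forecasts
    farm = [3.0, 7.0]                 # toy farm measurements
    print(elastic_net_objective(farm, rows, [1.0, 1.0], alpha=1.0, lam=0.5))
    print(elastic_net_objective(farm, rows, [1.0, 1.0], alpha=0.0, lam=0.5))
```

In practice the weights would be obtained from a solver over a grid of `alpha` and `lam` values, with the pair chosen by cross validation as described above.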

D. Bottom-Up Probabilistic Method
Here, we propose a novel approach to forecast the power from the wind farm by estimating the joint predictive distribution of production from all wind turbines in the farm in a copula framework. The marginals of the copula comprise density forecasts, which are produced for each turbine using quantile regression and spline interpolation from the collection of quantiles. A range of copula functions is explored.
Let the random variable $Y_i$ denote the wind power generation at the $i$th turbine, and $y_i$ the corresponding realisation (time indices are dropped to avoid notational clutter). The predictive CDF of the $i$th turbine is
$$F_i(y_i) = P(Y_i \le y_i)$$
for $i = 1, 2, \ldots, D$ turbines. Sklar's theorem [34] states that for any $D$-dimensional cumulative distribution $F(\cdot)$ with continuous marginals $F_i(\cdot)$ there exists a unique copula function $C(\cdot)$ such that
$$F(y_1, y_2, \ldots, y_D) = C\big(F_1(y_1), F_2(y_2), \ldots, F_D(y_D)\big), \quad (6)$$
which separates the marginal distributions from the dependency structure between the marginals. This is useful because it decouples the problem into two constituent parts: 1) estimating the marginal distributions for each turbine, and 2) estimating the dependence structure via a copula function. Note that the copula function links uniformly distributed marginals $u_i = F_i(y_i)$ and therefore the calibration of the density forecasts that form the marginal distributions is critical. Equation 6 can be alternatively written as
$$C(u_1, u_2, \ldots, u_D) = F\big(F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_D^{-1}(u_D)\big),$$
where $F_i^{-1}(\cdot)$ is the inverse of the marginal distribution $F_i(\cdot)$. Therefore, via sampling from the multivariate copula, pseudo-observations can be back transformed into the original domain to produce spatial scenario forecasts of power generation [21]. Next we introduce a range of options for the copula function.
1) Gaussian Copula: The Gaussian copula is given by
$$C(u_1, u_2, \ldots, u_D) = \Phi_\Sigma\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_D)\big),$$
where $\Phi^{-1}(\cdot)$ indicates the inverse standard normal distribution function and $\Phi_\Sigma(\cdot)$ the $D$-dimensional normal distribution function with covariance matrix $\Sigma$ and zero mean. In this context, the covariance matrix encodes the spatial dependence structure for the $D$ turbines, which illustrates one of the reasons why the Gaussian copula is so popular: the dependency structure is characterised by a single covariance matrix. It should be noted that
$$v_i = \Phi^{-1}(u_i)$$
constitutes the transformation of the uniformly distributed marginals into the Gaussian domain, where $v_i \sim N(0, 1)$. Therefore, we can estimate the copula by calculating the sample covariance matrix for the transformed normally distributed variables. Using this framework, it is simple to sample from the multivariate distribution and generate $D$-dimensional spatial scenarios $\hat{v}_i$ of the future generation. Each sample is transformed back to the uniform domain and then into the original power domain using the inverse CDF for the $i$th turbine,
$$\hat{y}_i = F_i^{-1}\big(\Phi(\hat{v}_i)\big),$$
and the results are summed over the $D$ turbines to give a snapshot of the wind farm forecast generation $\hat{z}_j$ for the $j$th of $K$ ordered samples, $j = 1, 2, \ldots, K$. Using the empirical distribution function, the wind farm forecast with the correct underlying spatial dependence structure is finally given by
$$\hat{F}(z) = \frac{1}{K} \sum_{j=1}^{K} 1(\hat{z}_j \le z).$$
We refer to this approach (based on the empirical covariance matrix) as EGCop in the following text. Having observed the often noisy empirical covariance estimates in this and other studies based on temporal scenario forecasting [21], [27], we also consider a parametric exponential covariance structure (PGCop). This approach has been shown to be effective in increasing forecast skill by smoothing the empirical covariance matrix. The parametric spatial covariance between two turbines is
$$\Sigma_{i,j} = \exp\left(-\frac{\Delta s}{\eta}\right),$$
where $\Delta s$ is the distance between turbines $i$ and $j$, and the parameter $\eta$ is fit using weighted least squares regression using empirical covariance and distance information.
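The sampling and aggregation steps of the Gaussian copula approach can be sketched as follows. This is a simplified Python illustration, not the implementation used in the study: it assumes an exponential covariance of the form exp(-Δs/η), uses piecewise-linear rather than spline interpolation for the inverse marginals, and the function names, toy distances and toy quantiles are all hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian-to-uniform step


def exponential_covariance(distances, eta):
    """Parametric covariance exp(-distance / eta) for a matrix of separations."""
    return [[math.exp(-d / eta) for d in row] for row in distances]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive semi-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def inverse_cdf(levels, values, u):
    """Piecewise-linear inverse CDF through (probability level, power) knots."""
    for i in range(1, len(levels)):
        if u <= levels[i]:
            p0, p1 = levels[i - 1], levels[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (u - p0) / (p1 - p0)
    return values[-1]


def farm_samples(cov, marginals, n_samples, rng):
    """Sorted samples of total farm power drawn from the Gaussian copula.

    marginals: one (levels, values) pair per turbine, with levels from 0 to 1.
    """
    low = cholesky(cov)
    d = len(cov)
    totals = []
    for _ in range(n_samples):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        v = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        u = [STD_NORMAL.cdf(vi) for vi in v]          # Gaussian -> uniform
        y = [inverse_cdf(lv, vals, ui)                # uniform -> power
             for (lv, vals), ui in zip(marginals, u)]
        totals.append(sum(y))                         # turbine -> farm
    return sorted(totals)


if __name__ == "__main__":
    cov = exponential_covariance([[0.0, 1.0], [1.0, 0.0]], eta=2.0)
    marginal = ([0.0, 0.5, 1.0], [0.0, 1.0, 2.0])     # toy quantile forecast
    z = farm_samples(cov, [marginal, marginal], 1000, random.Random(0))
    print(z[len(z) // 2])                             # approximate farm median
```

Because every farm-level sample is the sum of turbine-level samples, the resulting farm forecast is coherent with the turbine forecasts by construction.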
2) Copula Vine: The vine copula (VCop) is a series of bivariate copulas in which a different distribution family may be used for each pair. This allows for more complex dependency structures with asymmetry and tail dependencies to be captured, at the expense of added computational cost compared to the Gaussian method. This flexibility has encouraged recent studies considering vine copulas in the wind power forecasting context [23], [24]. The vine method works by factorising the $D$-dimensional density into the product of $D(D - 1)/2$ bivariate copulas, where each pair copula is estimated via maximum likelihood from a set of distribution families (Gumbel, Gaussian, Student-t, etc.). The optimal family for each pair-copula is chosen by minimisation of the Akaike Information Criterion (AIC). The implementation here follows [23] and for more detail please refer to [35].
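The scale of the vine estimation problem, and the AIC-based family choice for a single pair, can be illustrated with a short Python sketch. The log-likelihood values below are invented purely for illustration and the function names are hypothetical; in the study this selection is carried out by the VineCopula package.

```python
def n_pair_copulas(d):
    """Number of bivariate copulas in a vine on d variables: d(d - 1)/2."""
    return d * (d - 1) // 2


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_family(fits):
    """Pick the family with the lowest AIC.

    fits: dict mapping family name -> (maximised log-likelihood, parameter count).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    print(n_pair_copulas(56), n_pair_copulas(35))   # pairs for 56 and 35 turbines
    toy_fits = {                                    # invented values
        "gaussian": (120.0, 1),
        "student_t": (120.5, 2),
        "gumbel": (118.0, 1),
    }
    print(select_family(toy_fits))
```

The quadratic growth in the number of pair copulas is one reason the vine approach is more expensive than the Gaussian copula, which needs only a single covariance matrix.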

III. CASE STUDY
The proposed methodologies and benchmarks are tested on two large UK wind farms, Wind Farm A (128 MW capacity, 56 turbines) and Wind Farm B (70 MW capacity, 35 turbines), which cover an area of approximately 20 km² and 15 km² respectively. Training and testing data are partitioned at Wind Farm A into 12 and 4 month blocks respectively and at Wind Farm B into 15 and 6 month blocks, due to differences in data availability. The test dataset covers the months of December to March for Wind Farm A and April to September for Wind Farm B. Both test periods contain periods of high, low, and variable wind speed, and results based on the shortest test dataset (Wind Farm A) cover the most challenging period for forecasters. An example density forecast at Wind Farm A using the parametric copula method is shown in Figure 2.
Generation data from individual turbine SCADA systems and the wind farm power export meter are used at 30-minute resolution, with instances of curtailment flagged and excluded from the forecasting exercise. Data are also adjusted for availability so the impact of outages on evaluation results is minimised. NWP data from the European Centre for Medium-Range Weather Forecasts are extracted at the closest grid point to each wind farm from 0 to 48 hours ahead in hourly intervals, with 2 issue times per day. Linear interpolation is used to match the resolution of the hourly forecasts and half hourly power data. The methodologies described are implemented in R using the packages glmnet, VineCopula, kknn, and gbm [36]-[40].
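The resolution-matching step can be sketched as below, assuming that each half-hourly value is simply the midpoint of the two neighbouring hourly forecasts. This Python snippet is illustrative only; the function name is hypothetical, and it is appropriate for scalar quantities such as wind speed. How a circular variable such as wind direction is treated is not stated in the text, and naive averaging across the 0°/360° boundary would not be appropriate.

```python
def hourly_to_half_hourly(values):
    """Linearly interpolate an hourly series onto a half-hourly grid."""
    if not values:
        return []
    out = []
    for current, following in zip(values, values[1:]):
        out.append(current)
        out.append((current + following) / 2)  # midpoint at the half hour
    out.append(values[-1])
    return out


if __name__ == "__main__":
    print(hourly_to_half_hourly([4.0, 6.0, 5.0]))  # toy wind speeds in m/s
```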

IV. RESULTS
The skill of probabilistic forecasts is evaluated using proper scoring rules and according to the principle that it is desirable for density forecasts to be as sharp as possible subject to calibration [41]. Sharpness is a measure of the spread of the distribution and calibration (or reliability) is the property that the forecast spread matches that of the observations. Calibration of individual quantiles $q$ is calculated as
$$\hat{a}^{(q)} = \frac{1}{T} \sum_{t=1}^{T} 1\big(z_t \le F_t^{-1}(q)\big),$$
where $1(\cdot)$ is the indicator function, $F_t^{-1}(q)$ is the forecast of quantile $q$ at time $t$, and $T$ is the number of forecast-observation pairs. If the forecast is calibrated, the empirical coverage should satisfy $\hat{a}^{(q)} \approx q$ for all $q$ [1]. Calibration is visualised using reliability diagrams and quantile bias $b^{(q)} = q - \hat{a}^{(q)}$ [42]. The sharpness and calibration can both be quantified via the Continuous Ranked Probability Score (CRPS) [41], which compares the predictive forecast distribution $F_t$ to observation $z_t$ and rewards both sharpness and reliability. The hyper-parameters of the GBT and AnEn models considered here are tuned in order to minimise CRPS, subject to reliability. It is beneficial to tune hyper-parameters for different quantiles separately. Here, we produce 19 GBT models for quantiles from 0.05 to 0.95 in steps of 0.05. To minimise the burden of hyper-parameter selection, only hyper-parameters for the 0.05, 0.3, 0.5, 0.7, and 0.95 quantiles are optimised and then used for neighbouring quantiles. The shrinkage and tree depth hyper-parameters are selected using k-fold cross validation and a grid search of the parameter space on the training data. The number of trees is kept constant at 500, as is the minimum number of observations in each terminal node at 30, and the bag fraction at 75%. For the AnEn benchmark, the number of members in the ensemble is selected by minimising the CRPS on the training data via k-fold cross validation.
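The two evaluation quantities can be illustrated with a short Python sketch. The coverage function follows the definition of empirical coverage given above; the CRPS function assumes the forecast is represented by a set of equally likely samples (as in the bottom-up method) and uses the standard identity CRPS = E|X - z| - ½E|X - X'| for an empirical distribution, which is not necessarily how the score was computed in this study. Function names and toy values are hypothetical.

```python
def empirical_coverage(observations, quantile_forecasts):
    """Fraction of observations at or below the forecast quantile."""
    hits = sum(1 for z, qf in zip(observations, quantile_forecasts) if z <= qf)
    return hits / len(observations)


def quantile_bias(q, observations, quantile_forecasts):
    """Nominal level minus empirical coverage; zero for a calibrated quantile."""
    return q - empirical_coverage(observations, quantile_forecasts)


def crps_from_samples(samples, z):
    """CRPS of the empirical distribution of `samples` against observation z."""
    k = len(samples)
    spread_to_obs = sum(abs(x - z) for x in samples) / k
    # Double sum over all sample pairs: O(k^2), fine for a sketch.
    internal_spread = sum(abs(a - b) for a in samples for b in samples) / (2 * k * k)
    return spread_to_obs - internal_spread


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    median_forecasts = [2.0, 2.0, 2.0, 2.0]
    print(quantile_bias(0.5, obs, median_forecasts))
    print(crps_from_samples([0.0, 2.0], 1.0))
```

With a single sample the score reduces to the absolute error, which is why the CRPS is often read as a probabilistic generalisation of the MAE.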
For VCop, C-vine and R-vine structures were both tested. The C-vine, which uses a star shaped configuration for each tree in the vine to connect the bivariate copulas, consistently provided lower error metrics than the R-vine structure, so only results from that structure are detailed here for brevity. Each bivariate copula is selected using the AIC on the training data and then used to produce forecasts on the test data. Full details of copula family selections are given in the supplementary information [26].
At Wind Farm A, all of the proposed methods show improvements over the two benchmarks across the whole forecast horizon. The CRPS and improvement over benchmark metrics at Wind Farm A are detailed in Table I. The feature engineering method reduces CRPS by 3.95% and 5.46% compared to direct wind farm-level forecasting using WF($x_t$) and AnEn respectively. The only difference between the WF($x_t$) benchmark and this method is the incorporation of features derived from turbine-level information. The copula-based methods also consistently outperform the benchmarks, and the Gaussian copula with parametric covariance matrix gives the best performance of all models across all lead-times, with reductions of 5.01% and 6.50% over WF($x_t$) and AnEn respectively. The calibration plots in Figure 3 reveal that the turbine-level feature engineering and copula methods also marginally improve the reliability of the forecast compared to the WF($x_t$) benchmark, and that these methods are all well calibrated, indicating that reductions in CRPS are mainly due to increased sharpness.
At Wind Farm B, as detailed in Table II, all proposed methods outperform the benchmarks, though to a lesser extent than at Wind Farm A. Unlike at Wind Farm A, the feature engineering approach provides the greatest improvement, reducing CRPS by 1.24% and 2.39% compared to the WF($x_t$) and AnEn benchmarks respectively. This improvement is also consistent across lead-times. The quantile bias plots, shown in Figure 4a, illustrate that the model calibration is slightly diminished when compared to the WF($x_t$) benchmark from the 15th-60th percentile, but otherwise provides improvement outside this range. The reliability diagram in Figure 4b reveals that the proposed models are well calibrated and that variations between the models are small.
Bootstrapping [43] is used here to estimate the uncertainty of evaluation results. The CRPS values from the test datasets are re-sampled with replacement (number of samples equal to the size of the test dataset) and averaged 1000 times in order to estimate the sampling variation of the average scores in Tables I and II. The results of this process are presented via boxplots in Figure 5 and show that improvement in CRPS compared to benchmarks is pronounced at both sites.
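The bootstrap procedure described above can be sketched in Python as follows. The function name, the seed and the toy scores are illustrative assumptions; the 1000 replicates and the resample size equal to the test set size follow the description in the text.

```python
import random
from statistics import mean


def bootstrap_means(scores, n_boot=1000, seed=0):
    """Bootstrap distribution of the average score.

    Each replicate resamples the per-forecast scores with replacement,
    keeping the sample size equal to the size of the test dataset.
    """
    rng = random.Random(seed)
    n = len(scores)
    return [mean(rng.choices(scores, k=n)) for _ in range(n_boot)]


if __name__ == "__main__":
    toy_crps = [0.08, 0.10, 0.07, 0.12, 0.09, 0.11]   # invented per-forecast scores
    replicates = sorted(bootstrap_means(toy_crps))
    # Rough 95% interval of the mean score from the bootstrap replicates.
    print(replicates[25], replicates[974])
```

Comparing the replicate distributions of two models, as in a boxplot, indicates whether a difference in average CRPS is large relative to its sampling variation.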
Comparing the copula methods at both wind farms, the Gaussian copula with parametric covariance matrix produces forecasts with lower CRPS and superior calibration, supporting parametrisation of the covariance matrix to produce a smooth spatial dependency structure. The more detailed and flexible dependency structure of the copula vine does not lead to further improvements in the forecast skill, and neither does the Gaussian copula with empirical covariance, suggesting that both of these models are over-parametrised given the volume of training data. The calibration of the vine copula in particular is poor compared to the WF($x_t$) benchmark.
The regular layout of turbines at Wind Farm B is evident in the covariance matrix for that wind farm, shown in Figure 6 and the layout of the farm can be found in the supplementary material [26]. The block pattern is consistent with the evenly spaced rows of turbines. The covariance is relatively high across the wind farm with only 6% of values below 0.7, which implies that there is little information to be gained by considering individual turbines as forecast errors are very similar across the site. At Wind Farm A, as shown in Figure 7, the covariance structure is more complex because of the wind farm's irregular layout and terrain. Covariance is high within small areas of the wind farm but weak between regions.
Deterministic forecast performance is summarised in Table III. The median (p50 in Figure 2) of each predictive distribution is taken as the deterministic forecast and evaluated in terms of Mean Absolute Error (MAE) [44]. As expected, the behaviour of the results is very similar to the probabilistic case. Performance evaluations separated by forecast horizon and in terms of Root Mean Square Error are available in the supplementary information [26].
One feature of the bottom-up probabilistic method is the extended computational time required to train all the models. In this study, with a desktop computer (8 virtual cores, 3.6 GHz CPU, 16 GB RAM) it takes approximately 10.5 minutes to fit the required 19 quantile regression models using parallelization. This is the length of the model training phase for the WF($x_t$) benchmark. The feature engineering method will take 10.5 minutes plus an additional 3.5 minutes multiplied by the number of turbines. The bottom-up hierarchical method training duration is 10.5 minutes multiplied by the number of turbines. However, significant additional time is required to determine the vine copula structure. Operationally the time required to issue a forecast is negligible for all but the VCop method and re-training models would be required infrequently.
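As a worked example of the training-time figures quoted above, the following Python sketch applies them to the two case study wind farms. It assumes that the per-turbine models are trained one after another on the same machine and that the quoted timings scale linearly with the number of turbines; it excludes the additional time needed to estimate the copula, and the function names are hypothetical.

```python
BENCHMARK_MIN = 10.5            # 19 quantile models for one target series
PER_TURBINE_FEATURE_MIN = 3.5   # extra time per turbine for the feature method


def feature_engineering_minutes(n_turbines):
    """Approximate training time of the feature engineering method."""
    return BENCHMARK_MIN + PER_TURBINE_FEATURE_MIN * n_turbines


def bottom_up_minutes(n_turbines):
    """Approximate training time of the bottom-up marginals (copula excluded)."""
    return BENCHMARK_MIN * n_turbines


if __name__ == "__main__":
    for name, turbines in (("Wind Farm A", 56), ("Wind Farm B", 35)):
        print(name,
              feature_engineering_minutes(turbines) / 60,
              bottom_up_minutes(turbines) / 60)   # hours
```

Under these assumptions the bottom-up marginals take roughly ten hours to train at Wind Farm A and six at Wind Farm B, compared with about three and a half and two hours respectively for the feature engineering method.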
The case study results indicate that turbine-level data can be leveraged to improve forecast skill, although the characteristics of the wind farm also have a bearing on the performance of the different methods. At a site with simple layout where the response of all turbines to the weather is similar, and therefore forecast errors are similar, only a modest improvement in forecast skill is realised by considering turbine-level information. In this situation there is no advantage in modelling the full spatial dependency structure between forecast errors at individual wind turbines; it is sufficient to supplement a conventional forecasting method with turbine-level features. However, at a complex site, modelling the spatial covariance structure provides greater improvement (5% greater in this case study) than feature engineering alone.
Importantly, these improvements come at very low cost. Turbine-level SCADA data is routinely collected and stored by operators, and only modest computational power is required to realise the benefits of the methods proposed here. Furthermore, turbine-level data is only required for training, not in real-time operation, so there is no need for new communications or data feeds, and third party forecast providers could enhance their forecasts for individual wind farms with a static dataset of historic turbine-level data. Moreover, the proposed framework is not constrained to GBTs, as these can be readily substituted with any other method of producing density forecasts.

V. CONCLUSION
Turbine-level data provides valuable information about how a wind farm responds to different weather conditions, and the nature of forecast errors, which is not accessible when only considering a wind farm's total power production. Two methods for improving wind power forecasting by leveraging data from individual wind turbines are evaluated. The first is a feature engineering approach whereby deterministic forecasts for individual turbines are aggregated and used as supplementary input variables to a conventional wind farm-level model [25]. The second is a novel bottom-up probabilistic approach which forecasts the joint predictive distribution of generation from all turbines in a copula framework, which is then used to produce a wind farm-level forecast.
Both methods are shown to increase forecast skill compared to two highly competitive benchmarks, particularly at the site with complex terrain. At Wind Farm A, the Gaussian copula method with parametric covariance matrix reduces CRPS by 5% compared to the best performing benchmark while the feature engineering approach provides a 4% improvement. At Wind Farm B, both methods improve forecast skill by approximately 1%.
These improvements come at almost no cost as turbine-level data is routinely recorded by SCADA systems and this data is only required for training forecast models; no additional communications or data flows are required operationally. Therefore, both utilities producing in-house power forecasts and third party forecast providers could enhance their forecast performance using a static dataset of turbine-level data. Future work should explore the benefits of turbine-level data in spatio-temporal forecasting and the dynamic evolution of covariance structures. For example, [22] propose an adaptive update scheme to track slow changes in temporal covariance, but fast changes require dependency structures to be conditional on suitable explanatory variables or regimes.