Predictability of Power Grid Frequency

The power grid frequency is the central observable in power system control, as it measures the balance of electrical supply and demand. A reliable frequency forecast can facilitate rapid control actions and may thus greatly improve power system stability. Here, we develop a weighted-nearest-neighbor (WNN) predictor to investigate how predictable the frequency trajectories are. Our forecasts for up to one hour are more precise than averaged daily profiles and could increase the efficiency of frequency control actions. Furthermore, we gain an increased understanding of the specific properties of different synchronous areas by interpreting the optimal prediction parameters (number of nearest neighbors, the prediction horizon, etc.) in terms of the physical system. Finally, prediction errors indicate the occurrence of exceptional external perturbations. Overall, we provide a diagnostics tool and an accurate predictor of the power grid frequency time series, allowing better understanding of the underlying dynamics.


INTRODUCTION
T HE electrical power system relies on a constant balance of supply and demand. Abundant energy will speed up generators and lead to an increase of the power grid's (mains) frequency. Similarly, a shortage of generation slows down the same generators and reduces the systems frequency as kinetic energy stored in the generator is transformed into electrical energy. Control systems, ordered from primary to tertiary control, help to ensure the balance of supply and demand by closely monitoring the frequency trajectory and maintaining it close to the desired reference value of f = 50 or 60 Hz [1]. Large deviations of the frequency away from the reference require decisive control actions and cause high costs [2].
To optimize the usage of costly control actions, we require a precise understanding of the power grid frequency. This frequency is neither constant nor varying slowly but is instead highly stochastic and subject to multiple external influences [3], [4]. For example, the organization of the energy market leads to deterministic imprints of dispatch activities in the frequency in forms of sudden jumps or drops [5]. Simultaneously, an increasing share of renewable generators decreases the inertia available in the grid [6] and introduces additional fluctuations [7], [8]. Given this hybrid stochastic and deterministic nature, the question arises to which extend the frequency trajectory is predictable. A precise estimate of the future frequency trajectory would be very beneficial as it would allow an estimate of necessary control power early in time, saving costs [2] and stabilizing the grid [1].
Beyond precise forecasts of the near future trajectories, a fundamental understanding of the power grid frequency dynamics is critical as this one-dimensional time series encodes vast information on the stability and the current state of the power system [9]. Only a solid understanding of how the energy mix, demand patterns and energy market rules impact the power system and its stability will allow us to implement and control highly renewable power systems in the future. As the starting point to develop such an understanding we study the power grid frequency since frequency data is much more readily available [10] than precise demand or generation values in a given synchronous area.
With the increasing popularity of machine learning techniques [11], there are many tools available to forecast time series, such as the power grid frequency. Recent studies used artificial neural networks (ANN) [12] to predict hourly frequency time series in India based on features such as wind power generation and power demand. Other authors [9] used a linear state space model and uncertain basis function to predict US frequency time series for up to one second, while a Bayesian network was used to predict the frequency time series for up to 3 minutes [13]. Finally, ARMA processes have been used in the British grid to achieve prediction horizons of tens to hundreds of seconds [14] and in the US to achieve forecasts of 5 to 30 minutes [15].
We will particularly focus on k-weighted-nearestneighbor (WNN) methods, which have gained popularity in a variety of fields from biology [16] to financial systems [17], but have also been applied in the energy sector, e.g. to forecast electricity prices [18] or power demand [19]. In contrast to earlier applications of WNN predictors on the power grid frequency [15], we improve and optimize the methodology, extend the prediction accuracy and horizon, compare different synchronous areas, and establish a system-relevant null model. Furthermore, we establish the accuracy and specificity of the forecasting algorithm as important characteristics to analyze the dynamics of the power system in general and the interplay of internal and external influences in particular. WNN predictors are particularly well suited to be interpreted as they are among the best explainable machine learning algorithms.
In this article, we use frequency recordings from several European synchronous areas to motivate the mean frequency (daily profile) as a suitable null model (Section 2) and develop a WNN predictor to forecast the time series (Section 3). We demonstrate how our predictions outperform the null models in particularly on short prediction horizons and provide in-depth analysis and interpretation of when and how the power grid frequency can be predicted (Section 4).

Data Sources and pre-processing
We train and test our frequency predictor on large highresolution datasets from three different European synchronous areas. In particular, we use publicly available frequency recordings of the years 2015-2018 from the Continental Europe (CE) [20], the Great Britain (GB) [21] and the Nordic synchronous areas [22], following the naming convention used in [23]. The data from CE and from GB comes with a one-second resolution, while the Nordic data exhibits a resolution of 0.1 s. Moreover, some of the datasets have varying formats and multiple frequency recordings are corrupted or missing. We therefore resample the data to a common one-second resolution and conduct a thorough preprocessing (Supplemental Material). The pre-processed time series are available online [24], thus providing a ready-touse database to develop new methods for frequency analysis and prediction.
We want to point out that our pre-processing involves the identification and exclusion of corrupted measurements. However, the k-nearest-neighbor method can cope with the resulting holes in the time series. Missing segments are simply ignored during the nearest neighbor search. This is a great advantage of the WNN predictor, as we can harness the full length of the dataset without manipulating it too much.

Characteristics of the frequency time series
The frequency trajectory exhibits deterministic as well as stochastic characteristics, which can be attributed to different dynamics within the power system. Firstly, a frequency deviation generally reflects a mismatch of power generation and demand ( Fig. 1(a)). Such a mismatch occurs when the power generation does not match the expected demand curve. The demand itself evolves continuously and shows typical daily, weekly and seasonal patterns [2]. In contrast, the power generation exhibits discontinuous behaviour due to the trading on electricity markets and the resulting changes of the power plant dispatch [5]. In Europe, this trading is operated on various different spot-markets such as the European Energy Exchange Power Spot (EPEX SPOT), which covers countries in Western and Central Europe. The resulting dispatch changes are commonly scheduled for discrete time intervals of one hour, 30 and 15 minutes [25], [26]. The mismatch between the step-like behaviour of the generation and the continuous behaviour of the load leads to regular frequency jumps at the beginning of these trading intervals [3], [5], [10]. Fig. 1(b) shows a frequency sample that displays these typical quasi-deterministic jumps after every 15 minute interval.
Secondly, the frequency characteristics are determined by the frequency control schemes. To assure a secure power system operation, these control measures drive back the frequency after a deviation from its reference value of 50 Hz [1]. They thus lead to a characteristic behaviour after a frequency jump or sag, which can for example be observed in Fig. 1(b). On time scales of seconds after a disturbance, the inertia of the rotating generators and the energy supplied by primary control limits the frequency deviation caused by the disturbance. Afterwards, on time scales of several minutes, secondary and tertiary control set in and restore the system to a state of stable operation at the reference frequency [1].
Finally, the frequency characteristics are influenced by other external factors, that are of rather stochastic nature. Fluctuations of the demand directly affect the power balance, where demand forecasting errors [27] and large social events [28] can lead to significant unexpected frequency deviations. The variability of renewable energy sources causes additional frequency fluctuations due to its inherent intermittency [29] or due to generation forecasting errors [30]. In summary, the frequency characteristics are thus determined by a complex mix of stochastic and quasi-deterministic processes.

Analysis of frequency patterns
Despite its complex characteristics, the power grid frequency still exhibits regular patterns with a specific recurrence period. We identify this period by searching for regular peaks in the auto-correlation function (ACF) with time lags of up to one month ( Fig. 2(a)). In all grid areas, the ACF displays regular peaks with a period of one day. Significant (but less pronounced) peaks with a period of 12h only show up in the CE data. In CE and GB, the ACF also exhibits regular peaks with shorter periods of 15 min, 30 min and 1h, but these peaks are much smaller than the daily peak [4]. Frequency patterns are thus most strongly correlated with patterns that occur one or multiple 24hperiods later. We conclude, that the one-day period is the main recurrence time for frequency patterns within all three synchronous areas.
The average pattern that belongs to this main recurrence period is the mean daily frequency evolution, which we call the daily profile. A formal definition of the daily profile is given later in (10). The daily profiles of our three datasets exhibit some common feature but also important differences ( Fig. 2(b)). All profiles show pronounced frequency jumps at the beginning of the full hour, which reflect the impact of the hourly trading interval. In particular, the CE profile displays sharp peaks of different heights, while the peaks in the GB profile are the least pronounced. The direction and height of the peaks in the CE profile are time-dependent and related to whether the demand curve is rising or falling [5]. These results are consistent with the ACFs in Fig. 2(a). There, we also observe the strongest correlation for the CE data and the lowest correlation for the GB data. The CE frequency is thus strongly determined by regular daily patterns, while the GB frequency only exhibits weak patterns within this period.
The relevance of regular patterns for the frequency time series is further characterized by the standard deviation (StD) in Fig. 2(c). We calculate the StD for each time within the hour, i.e. the StD at 0 min is computed as the StD of Over-production causes a frequency increase and under-production a frequency decline. (b): Example frequency time series from the CE synchronous area [20]. It displays the typical frequency jumps at 15 minute intervals that are caused by the trading on electricity markets and subsequent changes of the power plant dispatch. all frequency recordings with time stamps XX : 00 : 00 averaging over all hours XX and days. In general, CE exhibits the lowest and GB the highest variability. The StD peaks after the full hour trading event in the Nordic and especially in CE areas, where the StD almost doubles after the full hour trading peak. The exact value of the full hour frequency peak thus exhibits a particularly high uncertainty.
We conclude, that CE is a comparatively low-noise system with defining deterministic events that drive the standard deviation. Deterministic patterns are least pronounced in GB, such that random fluctuations are of highest importance compared to the other areas. The Nordic data is mostly in between. The differences between the grid areas can be attributed to different system properties as well as varying regulations for frequency control and market operation. For example, the low variance in the CE area is likely related to its large size [10], which provides much inertia and enables spatial balancing of nodal power mismatches. Moreover, the deadband, i.e. the frequency range without active control, is the largest in GB thus resulting in a high frequency variability [23]. Despite these differences, there is one important common result: In all three cases the main recurrence period of frequency patterns is one day. The same result was found for frequency time series from US grids [15]. This highlights the importance of the daily time scale and the corresponding daily profile for the prediction of future frequency patterns.

Weighted Nearest neighbors
The WNN method predicts future values of a time series by looking for similar patterns in the past. To predict the frequency f (t) for t ≥ t 0 , we thus cut the historical time series into non-overlapping patterns F n with γ data points and a time delay τ : The vectors F n form an embedding of the time series in a space of dimension γ, which is also referred to as delay embedding in the context of time series analysis [31,Chap. 2]. To include the information of all data points, we choose a delay equal to the original time resolution of τ = 1 s. The WNN predictor searches for patterns F n that are similar to the initial pattern F 0 , which ends at the prediction start t 0 . However, we already know that frequency patterns mainly recur with a period of one day (Section 2). Therefore, we only look for similar patterns at the same time of the day, i.e. only within the set From this set, we choose those patterns that are closest to the initial pattern in terms of the distance Here, · denotes the Euclidean distance. Given this metric, we sort the patterns as ( .
|F| being the total number of patterns. We then select k patterns with the smallest distance to the initial pattern and obtain the ordered set of nearest neighbors In practice, we use the scikit-learn package to search and sort the nearest neighbors [32]. Finally, we assume that trajectories, which were similar in the past, will likely be similar in the future (Fig. 3).
Technically, the prediction f p (t 0 + ∆t) is therefore given by a weighted average of the trajectories succeeding the knearest-neighbors: The weights α n are chosen to decrease with the distance ( . F nj ) which introduces an additional smoothing [31,Chap. 3]. Following [18], we use a linear weighting that has the following form: In practice, we apply the WNN method to predict the time steps ∆t ∈ {1s, 2s, ..., T } with a maximum prediction length of T = 3600s. Longer predictions are not relevant, since the superiority of the WNN method over the null models is mostly revealed within the first 30 minutes of the prediction (see Section 4).

Performance estimation
During the optimization and evaluation of the WNN predictor, we use the Mean Square Error (MSE) as the central performance measure. In particular, we evaluate the timedependent MSE of a general predictorf (t 0 + ∆t) for each prediction step ∆t by averaging over different starting times t i 0 : To select different starting times, we randomly choose N = 5000 different start hours h i 0 . The starting time is then given by t i 0 = h i 0 + ∆t 0 where h i 0 counts the hours after the start of 2015 and ∆t 0 represents a fixed starting time within the hour. In this way, we account for the frequency dynamics, that crucially depend on the time within the hour as discussed in Section 2.
To estimate the out-of-sample performance of our predictor, we split our data into different subsets (equally for all synchronous areas). In general, the years 2015 and 2016 serve as training set, which is searched for nearest neighbors during the WNN prediction. To optimize the hyperparameters of the WNN predictor, we evaluate its MSE on a validation set that comprises the year 2017 (Section 3.3). Finally, we define the year 2018 as our test set. On the test set, we compare the performance of our WNN predictor to system-specific null models (Section 3.4). To predict the future of the present (initial) pattern F 0 , the WNN method looks for similar patterns F nj in the past. The patterns that are most similar to the initial pattern form the set of nearest neighbors. Here, we have chosen a set S 2 of two nearest neighbors. The average of their subsequent trajectories generates the WNN prediction.

Hyperparameter optimization
Our WNN method exhibits two hyperparameters which are the number of nearest neighbors k and the window size (or pattern length) γτ . We use a window size of γτ = 60 min unless stated otherwise, which provides a good prediction at low computational effort. The window size is thus not explicitly optimized, but we investigate its impact on the prediction accuracy in Section 4.5.
In contrast, we strictly optimize the number of nearest neighbors k by using two different approaches. In the fixedk approach, we estimate an optimal number of nearest neighbors by minimizing the time-averaged prediction error MSE(f p ) of the WNN predictor f p : In practice, we perform a grid search on the set G = {1, 3, 5, ..., 451} to determine a fixed optimal value k opt ∈ G for all prediction times ∆t ∈ [1s, T ]. This is how the WNN method is commonly used [18], [19]. We denote this as fixedk WNN prediction.
In the adaptive-k approach, we minimize the timedependent error M SE ∆t (f p ) (6) for each prediction step ∆t individually. In this way, we account for the very different prediction horizons we investigate in our paper. These range from several seconds to one hour thus making it highly probable to obtain different optimal k-values for different prediction horizons. In practice, we therefore calculate a time-dependent estimator k opt (∆t) for each prediction step ∆t by performing a grid search on the set G. To make the estimator more robust against noise, we smooth k opt (∆t) using a sliding window with a length of one minute. Finally, the adaptive-k WNN prediction is calculated by simply inserting a time-dependent k into (4).

Null models
On our test set, we compare different predictors based on their Root Mean Square Errors (RMSE), which reflects the actual frequency error in Hz: We use two easily interpretable null models to benchmark the performance of the WNN predictor. Our first trivial null model is the reference value of 50Hz, which is also the frequency mean: f m (t 0 + ∆t) = 50 Hz.
Our second null model is the daily profile. In Section 2, we have shown that the daily profile is the most important system-specific pattern that recurs with a period of one day. It should therefore be a benchmark model for every newly proposed frequency predictor. In practice, we calculate the daily profile predictor by averaging over all the patterns in the set F (from (2)): To make its prediction comparable to the WNN predictor, we have restricted the set F to patterns from the training set. Note, that the WNN predictor (4) converges to the daily profile predictor in the limit k → ∞ when applying uniform weights.

Forecast examples
The best and worst prediction examples give us a first impression about the performance of the WNN predictor (Fig. 4). We complement these examples with an estimate of the prediction uncertainty σ ∆t that is based on the StD of the nearest neighbors: Here, · denotes the average over all n ∈ S k . For the adaptive-k WNN, we use k = max ∆t k opt (∆t) , which turns (11) into an upper bound for the uncertainty. We observe, that the best predictions are essentially a smoothed curve of the real trajectory. The prediction is often very similar to the daily profile, but performs better especially in the first 15 minutes. Even more, the prediction uncertainty provides a good estimate for the short-term variability of the real data.
The worst predictions in GB and CE make mistakes at the boundaries but still capture the remaining trajectory (e.g. 30-45 min in GB). In both examples, the daily profile and the WNN forecast predict the same direction for the hourly frequency jump but the real trajectory deviates in the opposite direction. The deviation indicates unforeseen events affecting the grid frequency trajectory, which are also not captured at all by the daily profile. This relation points to a potential application of time series prediction in the posteriori analysis of power system operation. A large forecasting error can serve as a tool to identify external (unforeseen) events.
Meanwhile, the worst prediction in the Nordic area stays nearly constant and the corresponding real trajectory randomly oscillates around a shifted value. This exemplifies the weak performance of the WNN predictor for unspecific patterns with strong noise.

Performance of forecasting methods
We evaluate the performance of our forecasting methods by calculating their RMSE on our test set (Fig. 5). The results show that our WNN predictor outperforms both null models in all grid areas. Its RMSE is smallest for CE and largest for GB. This relates to Section 2 where we identified GB as the most stochastic and CE as the most deterministic Figure 5: The WNN predictors outperform alternatives. The WNN predictor outperforms the null models in all three synchronous areas by returning the smallest RMSE, especially in the first 15-30 minutes. The scale of the y-axis differs between the subplots, since the GB area exhibits much larger errors than the CE area. The RMSE of the WNN predictor is further strongly time-dependent and converges to the daily profile towards the end of the prediction. and thus most predictable grid. The improvement of the WNN predictor relative to the daily profile is largest in Nordic (up to 30%) and smallest in CE (up to 20%). This is due to the fact, that the daily profile itself is already a good predictor in CE. Meanwhile, the daily profile performs much worse in the Nordic area, where its RMSE nearly follows the 50Hz prediction error.
Comparing performance over time, we observe that the WNN outperforms the null models especially during the first 15min. As the prediction length increases, the WNN prediction converges to the daily profile. On the other hand, the performance is also clearly affected by the trading events (especially in CE). This time-dependence will be investigated in more detail in Section 4.4 and 4.5.
Finally, we note that there is no significant difference between the adaptive-k and the fixed-k WNN predictor for long predictions of up to 60 minutes (Fig. 5). However, we observe a significant difference for very short prediction horizons, which we will discuss in the next section.

Optimal number of nearest neighbors
Determining the optimal number of nearest neighbors k opt (∆t) can help to better understand the functioning of the WNN predictor. Moreover, it yields valuable information about the grid frequency dynamics in general. We present the optimization results in Fig. 6, which shows the normalized RMSE landscape as a function of k and ∆t as well as the optimal values k opt (∆t). The adaptive number of nearest neighbors tends to increase the more the prediction is in the future. However, the minimum is very flat at most time steps and both the adaptive-k and the fixed-k predictor lead to very similar errors (in agreement with results from Section 4.2). We only observe a significant difference within the first minute, where the adaptive-k WNN yields up to 5% better results than the fixed-k approach. We conclude, that the adaptive approach is slightly better, especially in the first 1min. We will therefore only apply the adaptive-k WNN method throughout the rest of the paper.
As an application, we can interpret k opt (∆t) in terms of the predictability of frequency patterns. A low number of nearest neighbors corresponds to well-defined trajectories that match to some past trajectories accurately. Contrary, a higher number of nearest neighbors k opt (∆t) indicates that trajectories are rather unspecific with respect to the history. A large number of trajectories has to be averaged such that the prediction is similar to the daily profile. In particular in the first 15 minutes, the adaptive-k yields very low k values. The frequency trajectory is thus very specific in this time regime. As the prediction time increases, the optimal number k opt (∆t) rises. The trajectory thus becomes more unspecific with respect to past patterns and thus less predictable for the WNN predictor. Consistently, the WNN predictor approaches the daily profile at the end of the hour, which we obtain for k → ∞.

Impact of the prediction start
Up to now we have focused on predictions starting at full hours, such that the prediction interval coincides exactly with the main time scale of energy trading and power plant dispatch. We now widen our scope and assess the time-dependence of the WNN performance by initializing the prediction at different starting times ∆t 0 (Fig. 7). To still relate the WNN performance to our null models, we additionally assess its relative error RMSE(f p )/RMSE(f d ) The fixed k opt minimizes the aggregated MSE leading to very similar prediction errors in all but the the first minutes.
("relative RMSE"), which is normalized by the daily profile error RMSE(f d ).
Irrespective of the trading events, we observe two different time regimes depending on the prediction length. During the first 15 minutes, the relative RMSE and the optimal number k opt (∆t) are increasing while still being much lower than future values. Here, the frequency dynamics exhibit specific patterns, that resemble particular patterns in the past (as described in Section 4.3). This specific memory is lost over time, as the relative RMSE increases continuously during the first 15-30 minutes. In particularly in the CE and Nordic areas, one can identify two clearly distinct time scales of memory loss: Firstly, there is an initial rapid increase of the RMSE and the relative RMSE within approximately one minute. It is followed by a slower, not necessarily monotonous increase of the relative RMSE on timescales up to tens of minutes. This clear separation of time scales is especially visible when energy trading is important, i.e. at full hours being strongest in the CE area. It could be attributed to the grid inertia or to control measures that provide additional memory for a short period of time.
Finally after 15-30 minutes, the relative RMSE reaches a relatively constant level in CE and GB with values close to one. Here, the WNN prediction does not differ much from the daily profile anymore. Meanwhile, the relative RMSE and the optimal number k opt (∆t) continue to rise for up to 60 minutes in the Nordic area. Here, the memory of specific historic patterns thus reduces much slower compared to the other areas. We will come back to this effect in Section 4.5.
In addition to the prediction length, the trading events play a crucial role for the prediction. In all grid areas, the RMSE increases strongly around the one-hour trading event. For CE and Nordic, we observe this also at 15 and 45 minutes. Around these events, the dispatch is changed abruptly, causing large frequency deviations, which are hard to forecast accurately (Fig. 2(c)). The optimal number of nearest neighbors k opt (∆t) and the relative RMSE also peak at the trading event. This indicates a lack of specific information about the trading peak and a high uncertainty connected to it. CE is a special case, as its one-hour trading jump is particularly hard to forecast. Interestingly, k opt (∆t) decreases again after the peak. The trajectory thus becomes more specific and predictable again, probably due to the control measures reacting to the disturbance in a pre-defined way.
The trading peaks have another important impact on the prediction error. After a trading event, the RMSE loses its dependence on the starting time ∆t 0 and joins the error curve of earlier prediction starts. This happens in all grid areas, at latest during the one-hour trading event. In practice, it means that our prediction starting at 55 min performs approximately as well at 60 min as the one that started at 0 min. The information contained in the initial pattern thus looses its significance with the occurrence of a trading event. In other words, the trading events cause a memory loss in the frequency trajectory.
We conclude, that the best WNN prediction is always obtained right after the prediction starts. On a time horizon of up to 30 min, the prediction is significantly better than the daily profile. However, this time horizon is considerably shortened if there are trading events, such as the full hour dispatches.

Impact of the window size
We finalize the discussion of the WNN predictor by shortly investigating the impact of different window sizes. In addition to the window size γτ = 60 min (which we have used throughout this paper), we show the prediction errors for γτ = 15 min and 30 min in Fig. 8. Figure 7: Trading events shorten the prediction horizon. Here, we show the optimal number of nearest neighbors k opt (∆t) (a), the RMSE (b) and the relative RMSE (c), which is normalized by the daily profile error. Irrespective of the starting time within an hour ∆t 0 , the predictions perform best within a time horizon of 15min. However, trading events introduce additional uncertainty thus increasing the prediction error and shortening the prediction horizon.
On time scales of several minutes to one hour, there is no significant difference between the predictors in CE and GB. The large window is slightly better than the shorter ones. In contrast, the smallest window performs best in the Nordic area especially in the first 15 minutes. Shorter windows contain more specific information about the near past than longer windows. In the Nordic grid, the significance of very specific historic patterns thus prevail much longer than in the other grids. This is consistent with Section 4.4, where we have seen that the memory of specific historic patterns reduces relatively slow in the Nordic area.
On time scales below one minute the smallest window performs best for all grid areas (inset). Shortly after the prediction starts, the memory of the last few seconds determines the trajectory. Irrespective of the area, the shorter window thus performs best on this time scale, as it contains more specific information about about the near past of the trajectory.
We conclude, that small window sizes are best for prediction horizons below one minute. For several minutes to one hour, large window sizes are slightly better in CE and GB. If computational resources are scarce, smaller window sizes can also be used here, as they are less computationally expensive but only slightly less accurate. In the Nordic area, small window sizes are the best even for several minutes. However, the performance differences are small in all grid areas, which also justifies that we did not systematically determine the optimal value for γ, thus saving computational time during training.

DISCUSSION
Summarizing, we have demonstrated how a k-nearest neighbor (WNN) approach provides an accurate forecast Figure 8: Shorter windows predict more accurately at the beginning. The optimal window size (or pattern length) γτ is different depending on the prediction length. During the first minute, the shortest window performs best in all grid areas, as it contains more specific information about the near past. For several minutes to one hour, the results differ between the areas.
of the power grid frequency. The predictor performs particularly well when using an adaptive number of nearest neighbors. Compared to previously existing forecasts of the power grid frequency [9], [12], [14], [15], we make three key contributions: First, we introduce the daily profile as a relevant and system-specific null model. Secondly, we offer an extensive statistical demonstration and discussion of the performance of the WNN method for different synchronous areas on time scales of several seconds to one hour. Thirdly, we interpret the time-dependent predictability and optimization results based on the economic and physical dynamics in the different synchronous areas. In that way, we establish machine learning techniques as valuable tools for an a posteriori assessment of power system operation and stability.
Our results can be used to improve power system stability. Since our estimates are more precise than the daily profile, they could be used to estimate necessary control power capacities. This is particularly interesting since we have a solid prediction horizon of about 60 minutes, making slower, typically cheaper forms of control available, instead of purely relying on expensive primary control [1], [2]. Especially during the first 15-30 minutes, our predictor is significantly more accurate than the daily profile and could replace it for planning purposes. Crucially, our analysis is not limited to any specific grid but can be applied to any power system, given sufficient data to train the algorithm.
We even gained valuable lessons when the predictor performed worst: The largest prediction errors are associated with unforeseen events that are also missed by the daily profile. Therefore, the introduced WNN predictor could also be used as a diagnostics tool to identify external perturbations, where e.g. renewable generation [33] or singular demand patterns caused by large sports events [28] impact the frequency dynamics. Furthermore, even our worst predictions correctly returned the expected average and standard deviation of the frequency time series for the next hour. Hence, the predictor could be used as a worstcase estimator to determine how much control power will be maximally necessary during the next hour to guarantee stable operation.
Finally, we went beyond pure forecasting of the next sixty minutes of the power grid frequency dynamics but instead achieved a better understanding of the different synchronous area: Monitoring the number of nearest neighbors allowed us to distinguish deterministic and stochastic behavior of different synchronous areas but also of different time intervals. Our analysis reveals that before the market acts every 15 minutes, the time series becomes less predictable but becomes more predictable after the power has been dispatched. This insight could be used to modify dispatch strategies in order to minimize the unpredictable impact on the frequency, reducing the required control power and thereby saving money.
Our results on the forecast of the power grid frequency can be extended in multiple directions in the future. Firstly, we were restricted by data availability. A similar forecast and interpretation could be developed and applied to power grid frequency time series from other regions in the world, e.g. data from the Eastern Interconnection in the US or from the Irish grid, with its high wind penetration. Secondly, many alternative forecasting methods are available, from artificial neural networks (ANN) [11] and recurrent neural networks (RNN) [34] to classical methods of time series prediction [31]. However, a fully comprehensive review of all available methods was beyond the scope of this study and will be left for the future. Finally, we are convinced that our approach to forecasting and machine learning as a tool to understand a system's dynamics should also be applied to other time series, such as renewable generation [35], air pollution [36], [37] or the stock market [38]. Supplementary Material: Data preparation ! The frequency data for this paper has been preprocessed. A thorough pre-processing was necessary, as all datasets contain missing or corrupted data points and exhibit different formats and time resolutions. During the pre-processing we uniformly mark missing and corrupted data points, which offers the opportunity to individually fill, interpolate or exclude these invalid measurements. The whole process consists of two steps (Section 1) and exhibits some free parameters that are fixed based on the data (Section 2). Our code and the ready-to-use pre-processed data are available online [1].
Our time series include frequency measurements from the Continental Europe (CE), the Great Britain (GB) and the Nordic synchronous areas. The raw data is described in Table 1. By the 18th February 2020, the data has been publicly available on the websites of different Transmission System Operators (TSO). We follow the naming convention established in [2] and use "area" synonymous with "synchronous area".

Convert data to common format
The data contains time stamps with varying formats and Daylight Saving Time (DST) changes. We thus process the time stamps and convert them to a uniform format. We point out that we use local time including the DST changes in the processed data. This is relevant for our weighted-nearestneighbor prediction, since we compare frequency patterns based on their time stamp. By using the local time, we account for other important socio-economic patterns (such as the load) that rather follow local time than UTC time. In addition, we convert the time series to a common resolution of one second. We therefore resample the Nordic data by averaging it in non-overlapping windows of 10 data points. Finally, we mark missing time stamps by inserting a NaNvalue (Not a Number) into the time series.

Mark and clean corrupted data
We search for different types of corrupted data. In particular, we identify isolated peaks, too high or low frequency values and too long windows with constant frequency. The identification is based on the frequency f (t) or their increments ∆f (t) = f (t) − f (t − 1) and follows these definitions: • A time stamp t contains an isolated peak, if ∆f t and ∆f t+1 have the opposite sign and are too large, i.e. |∆f t |, |∆f t+1 | > ∆f c .
• Following [6], frequencies below 49Hz and above 51Hz are consider as unrealistic and thus as too low (and too high).

•
We define a too long constant window as a time intervals of length T > T c that exhibits increments |∆f | < 10 −9 Hz.
These three types of data points are marked and converted to NaN-values. This offers the possibility to apply custom cleaning methods to the data (for example interpolation or data exclusion). Here, we clean the data by filling (at maximum) N f NaN-values with the last valid frequency value. During the prediction, we exclude frequency patterns with remaining NaN-values from the simulations.

CHOICE OF PRE-PROCESSING PARAMETERS
The above procedure exhibits three free parameters. We select the parameters ∆f c , T c and N f in the following way.
• Isolated peaks mainly occur in CE data ( Table 2). The increment histogram for CE ( Fig. 1(a)) shows minima at ∆f = ±0.05 and separated maxima beyond this values. That indicates that the increments beyond this threshold are caused by another process than the regular stochastic frequency movement. The frequency sample in 1(b) also shows that ∆f c = 0.05 would include most of the isolated peak. The GB and Nordic area do not show as many isolated peaks in their data (Table 2). A manual inspection of the isolated peaks with ∆f c = 0.05 reveals, that no regular data points are accidentally marked as corrupted in the GB and Nordic data. We therefore keep the choice ∆f c = 0.05.

•
We allow intervals of constant values below a length of T c = 1 min. We mainly predict the frequency on time intervals of several minutes, so that constant windows with T < 1 min are negligible. • We choose N f = 6 since most of the corrupted CE and GB data can be cleaned in that way (Fig. 2). At the same time, we do not manipulate the time series too much on time scales relevant for our prediction. Note, that a large part of the marked data in the Nordic time series will not be filled by N f = 6s. This is mainly due to the large amount of missing data within the Nordic frequency recordings (Table 2).
Using the procedure outlined above, we obtain clean data to be used for training and validation, see also [1], where the clean data can be downloaded.  : Missing and corrupted data points are marked as NaN-values. As a result, there are many intervals of NaN-values in the processed data. This figure displays the histogram for the lengths of these segments. CE and GB exhibit a particularly large amount of intervals with lengths below 6 time steps. Filling N f = 6 values thus eliminates many corrupted or missing data segments while not changing the data too much. Table 1: Raw frequency recordings are available from different TSOs. For each synchronous area, we obtained publicly available frequency data from one of the Transmission System Operators (TSO). TransnetBW is a German TSO, while Nationalgrid and Fingrid are British and Finish, respectively. Although we have only used the years 2015-2018 in this paper, we have pre-processed a longer period of time.