Practical Framework for Frequency Stability Studies in Power Systems With Renewable Energy Sources

The transition from power systems dominated by synchronous machines to systems based on converter-based generation technologies (CGTs), is weakening currently robust power systems by reducing system inertia with the replacement of synchronous generators with low-inertia CGTs. From a frequency stability viewpoint, this is resulting in faster frequency dynamics and more frequent and larger frequency excursions after system contingencies, thus significantly affecting the stability of power systems dominated by CGTs, requiring detailed stability assessments to ensure the secure integration of CGTs. In this paper, a practical framework is presented for frequency stability studies based on time domain simulations of power systems with CGTs. A fundamental part of the proposed approach is the use of a filter to first identify worst-case scenarios among various possible system operating conditions. Once these worst-case scenarios are identified, a clustering technique is used to select representative worst-case operating conditions to evaluate the frequency stability of the system using time-domain simulations. The effectiveness of the proposed framework is demonstrated on the Chilean Northern Interconnected System (NIS), where it is shown that the proposed filter is able to quickly identify worst-case scenarios for further study. Moreover, we show that the selected representative operating conditions cover a wide-range of worst-case frequency responses, demonstrating the effectiveness of the proposed tool for frequency stability analyses.

to avoid the social and economic consequences that major blackouts may have on society [8].
To prevent power system instability, system operators perform different stability studies in order to detect potential problems [9]. Based on these studies, preventive and/or corrective measures aiming to maintain system integrity during contingencies can be defined [10]. Stability assessments are usually performed by means of offline time-domain simulations (TDS) [11], following a worst-case approach, performing simulations for only a limited set of relevant operating points (OP) and contingencies. These critical conditions and contingencies are usually selected based on the historical performance of the system and the experience of the operator and planner [12]. For instance, frequency problems are more likely to arise during periods of low net load, where only a limited number of SGs are available to support frequency response. The critical contingency considered for the simulations is the sudden outage of the largest online generation unit [13]. In this context, stability assessments of large power systems with thousands of buses and hundreds of generators and contingencies, and for a large number of typically encountered operating conditions are not realistically feasible, especially for real-time, online stability assessments, due to computational limitations [14].
Although worst-case scenarios used for assessing system stability are usually well defined, in power systems dominated by CGTs, traditional approaches for defining these scenarios may no longer be valid. With high levels of CGTs, power system dynamics change in new ways, thus making the process of defining worst-case scenarios even more challenging. Additionally, the high uncertainty of CGTs may not only result in a shift of the critical operating conditions, but also in an increase of the number of risky conditions in which system stability may be threaten [15]. Consequently, currently widely accepted criteria for defining critical scenarios for stability assessments may fail to cover all critical operating points and contingencies that might result in power system instabilities [13].
Only few works have addressed the issue of selecting representative operating points and contingencies for frequency stability studies. In [13], a method is proposed for selecting operating conditions based on a Complete-Linkage Clustering that uses as input data key factors affecting frequency stability such as system inertia H sys , power imbalance P, ROCOF, and the frequency nadir f min . To estimate the frequency nadir for each operating point and contingency, historical data of real contingencies is used. The results show that the methodology allows identifying more OPs with poor dynamic performance than the ones that would be selected by only following the traditional worst-case approach. However, to estimate the maximum frequency deviation during contingencies, the approach requires having large historical data of real contingencies, which in many cases may not be available. In [16], the authors use a clustering technique to identify representative OPs for the design of UFLSS. The approach consists on performing TDSs in all possible system OPs considering single and multiple generator outages. From these simulations, the authors extract the system-frequency response for each OP, which is then used as input data for the clustering process. The results show that this methodology allows covering a wider range of system responses than the common practice of scenario selection. However, the methodology is designed to identify all possible system frequency responses, including non-critical conditions in terms of frequency stability. Thus, the methodology is not practical for frequency stability assessments of large power systems, since only risky scenarios are of interest in this case.
The use of Artificial Intelligence in power system stability analysis is an active research area that has received significant attention in the past few years. Beside the aforementioned works, AI have been successfully used for dynamic stability assessment such as frequency stability [17], [18], rotor angle stability [10], [12], [19], voltage stability [20]- [22] and fault location, detection and diagnosis [23], [24]. While in online applications the computational speed of the AI-based approach is a critical issue, for offline applications such as [10], [12], [22] the speed of simulation is not critical but desired as it increases productivity and minimizes costs [25].
From the aforementioned review, there are no proposals in the technical literature for identifying and selecting representative risky situations in which frequency stability of systems with CGTs may be threaten. To fill this gap, this article proposes a practical framework for identifying worst-case OPs to be considered in offline frequency stability studies in large power systems with CGTs. The proposed framework uses a novel filter specifically designed to provide a first identification of worst-case operating points and contingencies (OPC) in terms of frequency stability performance. The input data of the filter are relevant features that affect system frequency performance, which can be computed with relatively low computational efforts. Then, the worst-case OPs identified by the filter are broken down into groups by means of a clustering technique, and one OP of each group is selected as representative of the group. Different from traditional approaches, where the OP lying closer to the center of gravity of the group is selected as the representative one, in this paper the representative OP of each group is selected as the one with the highest ROCOF. Both the filter and the approach for selecting representative OPs are key features of this proposal that allow focusing on risky situations in terms of frequency stability, thus helping system operators to choose scenarios for frequency stability studies in an efficient manner. Therefore, the main contributions of this work are: • A practical framework for selecting worst-case representative OPs to be considered in offline frequency stability studies in real-world large-scale power systems with CGTs.
• A novel filter that initially identifies worst-case OPCs in terms of frequency stability performance.
• New quasi-steady state indices that can be used as a proxy to assess system frequency stability. VOLUME 8, 2020 • Identification of effective features that can be used to characterize system OPs under different contingencies from a frequency stability viewpoint. The remainder of this article is organized as follows. Section II presents an overview of the proposed approach. Section III reviews key features that influence the system frequency performance during contingencies, along with a methodology for their fast computation. Section IV presents the filter design. The results obtained using a detailed model of the Chilean Northern Interconnected System (NIS) in the year 2016 are discussed in Section V. Finally, Section VI summarizes the main conclusions and contributions of the paper.

II. PROBLEM STATEMENT AND METHODOLOGY
A practical methodology for selecting worst case OPs to be considered in offline frequency stability studies is presented here. An overview of the proposed methodology is presented in Fig. 1. Thus, let X = {x d , d = 1, . . . , |OP|} denote the set containing the system operating states for all possible OPs that the system may experience. This set can be obtained from traditional market simulation tools such as a Unit Commitment (UC) [26] or an Optimal Power Flow (OPF) [27]. Since performing TDSs for all possible OPCs is not possible in practice, the task is to select an appropriate subset X WC r ⊂ X that contains only worst-case OPs from a frequency stability viewpoint, i.e. according to the frequency nadir. This subset X WC r should be small enough in order to keep the problem tractable, but should cover all possible critical system conditions.
As previously discussed, the use of a traditional clusterbased approach to select the subset X WC r would be inefficient, since it would result in representative OPs for all possible system frequency responses, including non-critical conditions. Furthermore, in real-world power systems, the number of non-critical OPs is significantly larger than the number of critical ones. Consequently, using a cluster approach using all possible OPs would require a large number of clusters to cover all critical system conditions. To overcome this issue, before the clustering process a filter specifically designed to provide an initial identification of worst-case OPs in terms of frequency performance is proposed here. The filter uses a targeted percentage (P T ) of worst-case OPCs to be identified. For example, the filter can be set to identify the worst 20% worst-case OPCs (P T = 20%). After applying the filter, the OPs identified as critical are extracted, and then a traditional clustering technique is used to select representative OPs among the ones identified first by the filter as worst cases. To identify the ''a-priori'' worst-case conditions, the filter uses as input data key features that affect the frequency stability, thus allowing a better characterization of OPCs in terms of their dynamic performance. The values of these features can be obtained with low computational effort, as shown in Section III.
Mathematically, the input data of the filter is defined as the set X * = x * d , d = 1, . . . , |OPC| of relevant features for each OPC. The result of the filter is the set X WC = FILTER (X * , N t ) = x WC d , d = 1, . . . , N T that contains OPCs that have ''a-priori'' the worst frequency performance during contingencies, where N T = P T |OPC| is the number of worst-case OPCs and · is the ceiling function.

III. SELECTION OF RELEVANT FEATURES
In this section the relevant features chosen for characterizing OPCs in terms of their frequency stability performance are described, along with a strategy for their fast computation.

A. INERTIAL RESPONSE
The inertial response of a power system is the inherent physical response of SGs to sudden power imbalances between generation and load, at which the mechanical power of the prime movers does not immediately change due to the time delay of the speed governors. Due to their electromechanical coupling, the rotating masses inject or absorb kinetic energy into or from the grid during a few seconds to counteract the frequency deviation according to their inertia. This natural counter response from SGs is observed whenever there is a mismatch between generation and consumption. This slows down the system frequency dynamic response and thus it is easier to regulate.
To identify the features that influence the system inertial response, a single-machine equivalent representation can be used to model a system with N g SGs, where the frequency f (t) is considered a global system parameter, and thus all SGs can be aggregated into a unit represented by a single mass model. With this approach, the dynamics of the system frequency during the first seconds after a power imbalance can be described with the following equation (in per unit) [3]: where f n is the nominal system frequency,P G is the total generated power,P L is the total system demand and H sys represents the equivalent total inertia of the system (in seconds).
Assuming that the power system only has SGs and neglecting the inertia of the loads, H sys can be calculated as follows: where H i and S i are the inertia constant and the nominal capacity (in MVA) of generator i, respectively. In case of the sudden trip of a SG, the power mismatch is initially compensated by extracting kinetic energy from the remaining rotating machines. This action arrests the decline in system frequency, and hence prevents the immediate activation of automatic UFLSS. Based on (1) it can be concluded that, during the first few seconds, the frequency dynamic is mainly determined by the power imbalance P t + 0 =P G t + 0 −P L t + 0 , and the total system inertia H sys,t + 0 after the fault. Accordingly, these quantities are some of the most basic indicators that have been traditionally used for characterizing system inertial response [3]. The ROCOF of the system right after the contingency is also an indicator widely used: Other more sophisticated indicators such as estimated time-domain frequency nadir, restoration time [28], and combinations thereof have also been used [13], [29].

B. PRIMARY FREQUENCY RESPONSE
After the inertial response, the turbine-governor controls on SGs begin to act upon its valves or gates, in order to increase the output power of the turbines in proportion to the drop in frequency. SGs will thus increase their generation output until the generation-demand balance is restored and the system frequency is stabilized.
The effectiveness of the combined control actions of all SGs during this second stage is mainly determined by the delay in the response of the governors, as well as the ramp capacity, inertia, and headroom (power reserves) of each machine. Fig. 2 illustrates the typical dynamic response of a generator i, in terms of its injected power P gi (t), after the sudden outage of a generating unit. Observe in this figure that P gi increases at a maximum ramp rate that can be approximated by a constant value r i [30]. The slope r i and the time delay t i d depend on the generation technology (e.g. gas, coal, hydro). Fig. 2 reveals that two relevant factors that influence the system frequency response in the second stage are the total FIGURE 2. Dynamic response of a primary controller for synchronous generator [31], and adopted approximation.
available power reserves and the combined ramp rate of all SGs. While the power reserves can be easily obtained from generators dispatch and their technical data, the combined (transient) ramp rate capacity is more difficult to assess, since its depends on the load level of each SG, their control schemes, and the system operating conditions, among others. To characterize the system performance during the primary frequency response, four indices are proposed here: PFR, t NAD , c sys , and a Frequency Response Index FRI. The first three indices respectively represent the system primary frequency response as a whole, the time when the frequency nadir is reached, and the effective ramp rate of the system primary frequency response. The new index FRI captures most relevant variables affecting system frequency performance.
A graphical representation of the first three indices is shown in Fig. 3, where the y-axis on the left corresponds to the frequency and on the right the turbine output power in p.u. Further details of these indices as well as a practical algorithm for computing them are presented in the next section. Since the amount of non-synchronous instantaneous penetration level can also influence the system frequency response, as shown in [30], this characteristic is also included among the relevant features.  Table 1 presents the set of relevant features that are used in this work for characterizing the system frequency dynamic performance during contingencies.

C. PRACTICAL ALGORITHM TO COMPUTE RELEVANT FEATURES
The exact computation of the features indicated with an * in Table 1 requires to determine, first, the dynamic response of the power system from its differential-algebraic equation model, which is computationally costly in real, complex grids. To overcome this, a practical algorithm for computing these features is proposed next.
The system frequency dynamics after the outage of unit j can be described in a simplified way by: where H h,j sys is the system inertia after the outage of unit j, and P j is the power output of unit j before the contingency that results in a generation-load mismatch. To solve (4), the governor response P j i of generator i due to the loss of generator j can be approximated as follows: where t i d is the time delay in the governor's response, r i is the ramp rate, and R i is the reserve level of unit i. From (5), it can be observed that each SG changes its power output after a time delay t i d , with the machine increasing its power output at a constant slope r i until its reserve R i runs out.
To estimate the system primary frequency response, (4) and (5) can be solved using Euler's method until the frequency nadir is reached. This approach assumes that short-term dynamics are stable and therefore the differential equations are replaced by their equilibrium (algebraic) equations. Consequently, this simulation is referred to a quasi-steady-state, differing from dynamic simulations where the system dynamic response is obtained by solving the full set of differential-algebraic equations. As a result, for each operating hour h and contingency j, an estimation can be obtained of the frequency nadir f h,j NAD , and the time when the nadir is reached t h,j NAD . It follows then that the primary frequency response of the system PFR h,j , and the effective system ramp rate in terms of power reserves deployment c h,j sys can be determined as follows: Finally, the proposed composite index FRI represents the overall system frequency performance and is defined as follows: This index combines the effective governor ramp rate of the system, the system inertia after the contingency j, and the power imbalance. Thus, a large FRI indicates a robust system in terms of frequency stability. Note that, due to the simplifications adopted, the aforementioned quasi steadystate indices represent a proxy of the frequency performance for each operating condition and contingency. However, as shown in the next subsections, these indices, along with the steady-state ones presented in Table 1, provide useful information that can be used by the filter to identify ''a-priori'' worst-case conditions and thus to effectively select representative worst-case conditions.

A. METHODOLOGY
To design the filter, the IEEE 14-bus test system defined in [32] is used, with an added a 50 MW wind turbine at Bus 6 and a 150 MW PV plant at Bus 1. For this purpose, a detailed dynamic model of the system was implemented in PowerFactory [33], to simulate the dynamic behaviour of the system for multiple contingencies. The existing 60 Hz model was adapted to 50 Hz to match the operating frequency in Chile, for comparison and evaluation purposes, and proper dynamic power plant models were added for each generator, including Automatic Voltage Regulators (AVRs) and governors.
To simulate the yearly operation with hourly resolution, the methodology presented in [27] was used. Thus, following the procedure illustrated in Fig. 4   nadirs. Fig. 5 shows the cumulative distribution function of the frequency nadir for all 18677 OPCs.
In the next step, the values of each of the relevant features in Table 1 are computed for each OPC, which yields the set X * with the values of the relevant features for all 18677 OPCs. Then, a dimensionality reduction technique is used to identify the most effective features to characterize the OPCs in terms of frequency stability performance. This allows improving the effectiveness of the filter by eliminating redundant and highly correlated data, thus avoiding overfitting. After selecting the effective features, several models to characterize OPCs in terms of their frequency stability performance are considered, which are used to choose the a-priori worst-case OPCs for a given targeted percentage of worst-case OPCs to be identified. These steps are described in detail next.

B. IDENTIFICATION OF EFFECTIVE FEATURES
The objective of this step is to identify an effective subset of relevant features, among all candidates presented in Table 1, for characterizing OPCs in terms of their frequency performance during contingencies. The input data for this step are the sets X * and F previously described.
To identify effective features, two feature selection techniques were implemented: a Linear Logistic Regression (LLR) technique combined with a Recursive Feature Elimination (RFE) [34], and a Random Forest (RF) [34]. An LLR is a generalized linear classification model that estimates the probability that a certain sample (in this case an OPC) belongs to a certain binary class, given its features. To focus on worst-case OPCs, the binary class was defined based on a desired threshold for the frequency nadir f th NAD . Several tests were performed with different values of f th NAD , which were selected to identify different targets of worst-case OPCs, ranging from 5% up to 50%. Thereby, in each test, OPCs were labeled as follows: where f i NAD is the frequency nadir for the i-th OPC. In the LLR model, the probability that a given OPC i belongs to class 1 is the following: where β is a vector with the weights of each feature in the vector x i . If for a given i-th OPC, the probability is above 50%, i.e. p (y i = 1|x i ) > 0.5, then the corresponding OPC is labeled as 1, otherwise is given a 0 value. The RFE strategy to identify effective features using the proposed LLR model for each value of f th NAD works as follows: first an LLR model is built using all features. Then, the feature with the lowest weight is removed from the set, and a new LLR model is built with the remaining features. This procedure is performed until the accuracy reaches a minimum value, which was set to 95%.
The second implemented strategy was based on RF, which is an ensemble of Decision Trees DTs. A single DT classifier uses the input features to learn a series of questions to infer the class label of the samples (in this case, the true value of y i for each OPC). The DT algorithm starts at the tree root and splits the data on the feature that results in the largest information gain. This procedure is repeated iteratively at each child node until the leaves are pure (e.g. the final estimation is reached). Consequently, in a DT model, important features are likely to appear closer to the root of the tree, while unimportant features will often appear closer to the leaves. For an RF model, the importance of the features can be estimated by computing the average depth at which a feature appears across all trees in the forest [34]. The proposed RF model consists of 30 DTs, as is in the case of the LLT model, several models were trained for different values of f th NAD . For illustrative purposes, Table 2 presents the results obtained for the 14-bus test system from both models for identifying 50% of worst-case OPCs, i.e. 50% of all OPCs are labeled as 1. Observe from Fig. 5 that the threshold of VOLUME 8, 2020 the frequency nadir for this case is f th NAD = 49 Hz, which corresponds to the activation threshold of the UFLSSs in Chile. Note also that both models are consistent regarding effective features for characterizing OPCs. Indeed, all features identified as relevant by the LLR+RFE model have a relevance above 6.94% according to the RF model. In addition, both models achieved a high level of accuracy in labeling the OPCs. Thus, while the LLR+RFE model achieved an accuracy of 97.4%, the RF accuracy was 99.5%; similar results were obtained for other values of f th NAD . Based on the results in this table, the selected effective features were the ones considered as relevant by the LLR+RFE model, which are used from here on.

C. FILTER DESIGN
In this section, the details of the filter designed to identify worst-case OPCs in terms of their frequency performance are discussed. Thus, for designing the filter, the set X † that contains the values of the effective features for each OPC and the set F with the frequency nadir calculated based on dynamic simulations for each OPC are used. For generalization purposes, each feature is first normalized, so that each one of them adopts a value between 0 and 1.
The filter is an ensemble of two types of artificial intelligence-based classification models: an RF and an artificial neural network (ANN) [34]. Basically, for a specific percentage P T of worst-case OPCs to be identified, an RF model and an ANN model are trained to identify whether or not an OPC belongs to the P T percentage of worst-case OPCs from a frequency performance viewpoint. The result of each model is the probability of an OPC belonging to the P T percentage of worst-cases among all OPCs. Then, each OPC is ranked in descending order according to the averaged probability. Finally, worst case OPCs are identified as the ones with the highest probability of being among the P T percentage of worst-case OPCs, with N T = P T |OPC| . The filter architecture is illustrated in Fig. 6, and was developed in Python [34], to identify worst-case OPCs ranging from 5% up to 50%, in 5% steps. It is important to mention that, once the input data were obtained, it took around 3 minutes to train the AI-based models. For illustrative purposes, Fig. 7 shows the performance of the filter for identifying 20% of the worst-case OPCs for the 14-bus test system. The frequency nadir threshold in this case is f th NAD = 48.47 Hz, as it can be seen in Fig. 5, where the threshold f th NAD is depicted with a vertical line. Observe that the filter is able to identify most worst-case OPCs, with the OPCs wrongly classified having a frequency nadir is very close to the targeted frequency threshold.  Note that the filter designed to identify 20% of worst-case OPCs achieved the best performance in terms of accuracy. In the next section, the impact of using different filter targets in the final results, i.e. for selecting worst-case OPCs for frequency stability studies, is discussed for a real power grid.

V. RESULTS AND DISCUSSION
This section presents the results of applying the proposed methodology in Fig. 1 to a detailed model of the Chilean Northern Interconnected System (NIS). The simulations were performed in a computer with 1 Intel Core i7 4890HQ (4 cores) and 16 GB of RAM.

A. SYSTEM DESCRIPTION
The NIS is a medium-sized system located in the northern part of Chile. It has a thermal generation mix based on coal, natural gas and oil, with a total installed capacity is 5908 MW by the end of 2016. The RES penetration level of the system in that year was 21%. The NIS was modeled in PowerFactory, and has about 80 SGs, more than 1800 buses, 300 transmission lines and 260 loads [35]. The system's operating condition with hourly resolution (8760 hours) were obtained from [36], and correspond to the real system operation in the year 2016. As described in Section II, for offline frequency stability studies, any market simulation tool such as an UC or an OPF can be used to obtain realistic system operating conditions (for example, see the approach described in [8]). In this case, uncertainties introduced by CGTs, whose variability is not significant for the frequency stability studies presented here due to the relatively short timeframes of these phenomena (e.g. [37], [38]), can be modelled within these tools as well. For analyzing the system frequency stability, the outage of all generating units available at each hour were considered, computing the values of the selected effective features presented in Table 1 for a total of 47244 OPCs. To compute the features indicated with an * in Table 1, the algorithm presented in Section III-C was implemented in Python, and it took around 10 minutes to compute all values. Furthermore, to validate the proposed methodology and analyze its performance, we also computed TDSs for each OPC and contingency, which required about a month to be completed.

B. IDENTIFICATION OF WORST-CASE OPCS
Based on the system operating conditions for each hour of the year and the values of the selected effective features for each of the 47144 OPCs, a-priori worst-case OPCs were identified using the filter for different targets. Recall that worst-case OPCs are defined based on the frequency nadir, i.e. OPCs that lead to lower nadir values are considered more critical than OPCs that lead to higher values. It is worth mentioning that the filter requires around 5 seconds to identify a-priori worst-case OPCs. For illustrative purposes, Fig. 8 shows the results of using the filter to identify 20% of the worst-case OPCs. Notice that the filter only uses steady-state and quasisteady-state indices, while to evaluate its performance, the actual results of the frequency nadir for each OPC obtained from TDSs are used, which in practice is not available due to the large computational costs (1 month in this case). The threshold of the frequency nadir obtained from the TDSs for this case was 48.89 Hz, which is just below the threshold for the activation of UFLSSs in Chile. From Fig. 8 it can be seen that the filter was able to identify most worst-case OPCs and, most importantly, discard a large number of OPCs that can be classified as non-critical according to the targeted frequency threshold. Even though some cases of false identification occurred, the frequency nadir of these OPCs was very close to the targeted frequency threshold. Indeed, the OPC with the lowest frequency nadir that was wrongly classified by the filter had a frequency nadir of 48.69 Hz, which is 0.14 Hz higher than the frequency nadir of the most critical OPC (48.55 Hz) and only 0.2 Hz below the threshold. A summary of the results of applying the filter with other targets is presented in Table 4, where it can be seen that the best performance in terms of accuracy was obtained for a 20% target; a significant drop in accuracy can be observed when targeting 10% and 5% of worst-case OPCs. It is worth mentioning that the classification error obtained by the filter does not affect the overall performance of the proposed methodology, as will be discussed in detail in the next section. The main reason is because the frequency nadirs of misclassified OPCs are closed to the targeted frequency threshold, and therefore the filter is still able to identify most actual worst-case OPCs for each target. Note that even though the filter was designed using the small-scale IEEE 14-bus test system, the filter was still able to identify the a-priori worst OPCs with acceptable accuracy for a real power grid.
The good performance of the filter for a real-world power system demonstrates its suitability and scalability for other power systems.

C. SELECTION OF REPRESENTATIVE WORST-CASE OPS AND PERFORMANCE EVALUATION
The last step consists of selecting representative worst-case OPs, among the ones identified as critical. Notice that an OP is considered critical if, for any contingency, the frequency nadir of the corresponding OPC lies below the desired threshold. For selecting representative worst-case OPs, a Complete Linkage Clustering was implemented [39]. The input data is the system operating condition in all worst-case OPs, using the Euclidean distance as a proximity measure between two OPs [39]. For each group (cluster), the representative OP is the one with the highest ROCOF according to (3).
For illustrative purposes, Fig. 9 depicts the results obtained of selecting 30 representative worst-case OPs after implementing a filter with a 20% target. The figure shows the frequency nadir of all OPs, among all contingencies, ordered from the highest value to the lowest one. Encircled in red are the representative OPs selected using the proposed methodology and in black is the OP that is usually selected using a traditional worst-case approach. This figure demonstrates the effectiveness of the proposed techniques, since the selected OPs not only cover a wide range of critical conditions, but also the frequency nadir is below the desired threshold in all of them. These results demonstrate that the proposed methodology allows selecting in practice the worst-case representative OPs in an effective manner, and without the need of computing complex and time-consuming TDSs. In-depth frequency stability analysis can be then performed in the selected representative worstcase OPs, which allows to use available resources effectively by focusing on worst-case conditions from a frequency stability viewpoint.
To evaluate the influence of the number of representative OPs in the representation accuracy of all other worst-case OPs, the clustering process was run using different number of clusters and, for each case, the accuracy of the worst-case OPs being represented by their corresponding representative OP was quantified. To this end, for each number of clusters, the distance of all worst-case OPs to their corresponding representative OP was determined, and the maximum value was then used as a representation of the error of the clusters. For illustrative purposes, Fig. 10 shows the results of the root mean square error (RMSE) obtained for filter targets of 5%, 20% and 50% and different number of clusters, with continuous lines representing the RMSE value obtained considering only worst-case OPs identified by the filter (both correct and incorrect), and dashed lines depicting the RMSE value obtained for actual worst-case OPs. In the latter, for each worst-case OP left out by the filter, its distance to each cluster centroid (i.e. representative OP of the cluster) is computed, assigning it to the cluster with the minimum distance. From these presented results, it can be concluded that the classification error of the filter (see Table 4) has relatively low impact in the representation accuracy of actual worst-case OPs (difference between continuous and dashed lines). Moreover, the representation error of the actual 5% worst-cases is even lower than the representation error obtained for the 5% worst-case OPs identified by the filter (both correct and incorrect). The main reason for this is the strategy used for selecting the representative OP for each group, since instead of using the centroid (i.e. the OP lying closer to the center of gravity of the cluster), which is a traditional strategy (e.g. [40]), the OP with the highest ROCOF was selected according to (3). This means that, in each group, the OP with the a-priori worst performance in terms of frequency stability will be selected as representative OP of the group, which in turn results in a better representation of true worst-case OPs. Fig. 10 also provides an indication of the number of clusters needed to achieve a given level of accuracy in representing worst-case OPs. For example, around 15 clusters are needed to represent the 5% worst-case OPs, with an error in terms of RMSE of less than 0.05. Similarly, the results of the proposed methodology indicates that around 28 clusters are required to represent the 20% worst-case OPs with an RMSE less than 0.05. However, in this case, to achieve the same level of accuracy to represent the actual 20% worst-case OPs (dashed blue line), around 32 clusters are required. These results show how an appropriate number of representative worst-case OPs can be chosen to achieve a desired level of accuracy.
In order to analyze the impact of the classification error obtained by the filter, the clustering process was run assuming an ideal filter, i.e. a theoretical one that has no classification error. To this end, the input data used for the clustering were actual worst-case OPs for each target. The results obtained for this case are presented in Fig. 11 for filter targets of 5%, 20% and 50%, with continuous lines representing the RMSE values of actual worst-case OPs, obtained when the filter is used (i.e. with classification error), and dashed lines corresponding to the RMSE values obtained when an ideal filter with no classification error is used. From this figure it can be seen that the impact of the classification errors of the filter presented in Table 4 is negligible.
Thus, note that, even though the use of a perfect filter with no classification error leads to a better representation of actual worst-case OPs for the same number of clusters, as expected, it would require, in all cases, at most 2 additional representative OPs to counteract this negative impact and achieve the same level of representation accuracy that would have been obtained if a perfect filter were available. These results demonstrate the robustness of the proposed methodology for selecting representative worst-case OPs, in spite of the classification errors incurred by the filter.
Finally, a case study was conducted to evaluate the accuracy in representing the actual 1% worst-case OPs with different filter targets, ranging from 5% up to 50%, and grouping the identified worst-case OPs in 20 clusters. The results of this case study are presented in Fig. 12, which depicts the minimum, mean and maximum RMSE values obtained for each filter target, as well as the 25% and 75% quantiles for each case (blue boxes). These results demonstrate the high performance of the proposed methodology for representing the actual 1% worst-case OPs, independent from the filter target being used. The only difference among the filter targets that can be observed is regarding the mean RMSE value; thus, while the lower mean RMSE values are obtained with filter targets of 5%, 10% and 30%, the highest mean values are obtained with filter targets of 20%, 40% and 50%. These results further confirm that the misidentification rates of the filter have low impact in the representation accuracy of actual worst-case OPs. In fact, one can conclude that to achieve better accuracy representing the actual 1% worst-case OPs, the best strategy is to use the filter with lower targets (e.g. 5% and 10%), despite their higher misidentification rate compared to the ones obtained with using the filter with higher targets.

VI. CONCLUSION
Future power systems dominated by CGTs will be characterized by faster frequency dynamics and more frequent and larger frequency excursion due to power mismatches between supply and demand, like the ones that are being observed today. In this context, as the deployment of CGTs in power systems continuous to increase, their high levels of uncertainty and variability will not only results in a shift of critical OPs, but also in an increase of the number of risky conditions in which system stability may be threaten. As a consequence, current practices for selecting worst-case scenarios for frequency stability studies may no longer be valid. In this context, in this article, a practical framework for identifying worst case scenarios to be considered in detailed frequency stability studies in practical power systems with CGTs has been proposed. A key feature of the proposed approach is the use of a novel filter specifically designed to provide a first identification of worst case OPs in terms of frequency stability performance. This in turn would allow system operators to choose scenarios for frequency stability studies in an efficient manner and with low computational efforts.
The results obtained for a case study based on a detailed dynamic model of the NIS in Chile demonstrated the effectiveness of the proposed methodology. Thus, the proposed methodology allowed the identification of a wide range of worst-case OPs in terms of frequency stability performance, and demonstrated that critical OPs obtained using the traditional worst-case approach (minimum system inertia) do not necessarily lead to the most critical system conditions. The capability of successfully identifying all risky conditions that may threaten the frequency stability of power systems with high levels of CGTs, together with the low computational efforts required, make the proposed methodology a practical support tool for system operators to select critical scenarios for frequency stability studies. The proposed framework covers thereby an existing gap in the area of offline stability studies of real power systems.
In future work, the proposed methodology will be applied to the Chilean power grid as RES penetration levels increase, in order to evaluate its performance for frequency stability studies in power systems with high penetration of RES. A second future work is related to the implementation of the proposed methodology for its use in the real-time operation. In this case, several challenges appear such as computational speed, dealing with noisy and bad data and dealing with limited instrumentation available to monitor the current system operation, which is usually performed throughout Phasor Measurement Units (PMUs).