Coherency-Based Supervised Learning Approach for Determining Optimal Post-Disturbance System Separation Strategies

Controlled islanding is the last remedial action to prevent cascading outages or blackouts in power systems. Conventional methods presented for controlled islanding strategy determination, particularly those calculating load shedding values using optimization methods, are not fast enough in online applications for modern power systems. In this paper, a novel learning-based approach is introduced for online coherency-based controlled islanding in transmission systems. The proposed approach presents a prediction and optimization model, which is faster than conventional optimization-based models in two ways. Firstly, the proposed approach uses a classification model to predict the splitting scheme in a short time following the occurrence of a disturbance, and secondly in the proposed approach, a simpler optimization problem with fewer variables is solved to find the load shedding amount required in each area. In the proposed load shedding approach, some candidate system partitioning schemes are calculated beforehand and therefore, the load shedding optimization problem is simplified significantly compared to similar optimization-based approaches. Note that appropriate features, which are used in this paper as the input of the classifier, are acquired by processing post-disturbance phase angle variations, which are measured across the network. The proposed approach is simulated on the 16-machine, 68-bus system, and its accuracy and efficacy have been demonstrated.


I. INTRODUCTION
Controlled islanding is considered the last remedial action to prevent a power system from moving toward an unstable operating condition following a disturbance [1]. However, applying an appropriate online islanding scheme in transmission systems, one that considers both the dynamics and the statics of the system, has always been a challenge. Fortunately, with the advent of new technologies like synchronized measurement technology (SMT), this task can be performed online using innovative techniques that use the real-time data obtained through SMT.
In general, two categories of methods have been proposed in the literature to solve the controlled islanding problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Emilio Barocio.
The first category includes methods that use graph theory to split a large power system into islands [2], [3], [4], [5], while the methods in the second category use mixed-integer programming to solve an optimization problem whose solution is an optimal islanding strategy [6], [7], [8], [9], [10]. There are also recent works, like [11], that employ both methods to achieve a computationally light approach to controlled islanding. As a preliminary requirement, in both categories, the use of slow-coherency or online coherency concepts for maintaining the dynamics of each island has also been addressed [12], [13], [14]. Traditionally, online controlled islanding methods, which belong to the second category, encompass two stages. In the first stage, coherent generators are determined using online coherency evaluation techniques. In the second stage, an optimization problem is solved to find the best splitting strategy, so that the least amount of load shedding is required in the system. However, this kind of online splitting has two disadvantages: (i) it needs to observe the rotor angles or speeds of generators for a duration that can exceed 15 seconds after the occurrence of a disturbance to find coherent generators, and (ii) the optimization problem includes extra variables related to the connectivity of coherent generators and the disconnection of lines, which complicates the problem and makes finding a solution time-consuming, particularly for large power systems.
To overcome the above shortcomings, this paper proposes the use of classifiers to predict splitting schemes quickly using the first oscillation of voltage phase angles. Nowadays, deep learning and pattern recognition techniques are suggested for, or already implemented in, various scientific fields. In power system studies, these techniques have increasingly attracted the attention of researchers for use in various applications [15], [16]. In [17], it has been stated that deep learning can be used to effectively solve numerous current power system problems, particularly those in which a huge volume of informative data is available. The use of learning techniques, such as decision trees, neural networks, and Bayesian networks, in power system studies dates back to the 1990s [18]. Over the decades, these techniques have been proposed to tackle problems such as instability prediction [19] and load and renewable energy forecasting [20]. A survey of the literature indicates that, although various studies have been reported on the use of deep learning and pattern recognition for controlled islanding strategy detection in distribution systems, these concepts have been examined less in transmission systems. Some examples are mentioned below. In [21], decision trees are used to predict controlled islanding. The authors assumed that islands and their borders were fixed, and therefore a decision tree (DT) was built for each island whose input was the Thevenin impedance observed from generators. In [22], the authors proposed the use of three DTs to predict the probability of the need for controlled islanding and the islanding strategy, which appeared to be difficult to implement. In [23], the authors used the slow-coherency concept to find possible generator grouping schemes. Then, a DT was assigned to each of the splitting schemes whose output was whether this grouping would occur or not. This method is ineffective for large systems or for online coherency evaluation, since a large number of DTs will be required in this case. In a recent work, several DTs are trained to enhance the evaluation of power system dynamic security, which is needed for the proper selection of controlled islanding strategies [24]. Note that DTs are not the only classifiers used in this field. For example, the authors in [25], [26], and [27] proposed machine-learning-based controlled islanding approaches, which used artificial neural networks and label propagation.
In this paper, a classifier is built using supervised learning techniques and is then employed for online controlled islanding. One advantage of this approach is that all the data it requires are obtained solely by processing voltage phase angle variations, so extra measurements or processing to obtain data such as rotor angle variations are not required. Another advantage is that the proposed approach makes online controlled islanding faster in two ways. Firstly, it uses the classifier, which can predict the splitting scheme in less than half a second and is therefore extremely fast compared to the traditional approaches. Secondly, it uses a simpler optimization problem to find the least amount of load shedding, which can be solved very quickly due to the lower number of variables included in the problem. Figure 1 depicts a comparison between the traditional approaches and the proposed approach in terms of the time needed to reach a solution. The figure shows the post-disturbance variations of generators' speeds. Please note that this is only an illustration used for comparison and therefore the time duration of each step is not exact. As the figure illustrates, conventional methods usually reach a solution in two steps, c1 and c2, and within a time duration of more than 10 seconds after the occurrence of a disturbance. The proposed approach, however, is considerably faster and reaches the solution within less than two seconds. Section IV-E presents an example of the time duration of each of steps p1 to p2 and c1 to c2. It should be noted that the splitting strategy identified in step p2 as the output of the classifier determines solely the borders of areas and provides no information regarding the load shedding schemes in them. Hence, as Figure 1 shows, a third step, step p3, is needed. In step p3, a simpler load shedding optimization problem is solved to determine the amount of load shedding that must be applied at each load bus.

II. ONLINE COHERENCY EVALUATION FOR SIMPLER CONTROLLED ISLANDING
Coherency evaluation is now an essential task for establishing a controlled islanding strategy. Traditionally, coherency evaluation was conducted based on the slow-coherency concept. According to this concept, a splitting scheme is obtained that is fixed for all disturbances occurring in the system. However, with the advent of synchronized measurement technology, coherency evaluation has moved toward a data-driven approach in which the data gathered all over the system are used to detect coherent generators online. In data-driven approaches, it is assumed that the degree of coherency between generators is not fixed and changes for different disturbances. A relatively up-to-date review of coherency evaluation techniques can be found in [28].

A. COHERENCY EVALUATION IN THE COMPLEX DOMAIN
In this paper, coherency evaluation is suggested to be conducted in the complex domain. In this regard, areas will be determined based on the similarity between the modes excited in the system. To establish a faster and simpler way of reaching the controlled islanding solution, traditional controlled islanding approaches have been revised here. Accordingly, instead of adding constraints related to the identification of island borders to the main optimization problem (which adds numerous additional variables to the problem), several candidate splitting schemes are determined in advance, and then a simpler load shedding problem is solved to find the best splitting scheme. In the proposed approach, first, the discrete Fourier transform (DFT) is applied to the voltage phase angle variations at all buses using (1).
In (1), $\delta_i$ is the phase angle of the voltage phasor at the $i$th bus and $N$ is the number of samples in the time frame. Note that coherency is conventionally evaluated between generators. However, as suggested by the authors in [29] and [30], the assessment of coherency between the phase angles of voltage phasors is also a meaningful way to determine the borders of areas. According to (1), $F_i(f)$ obtained for the $i$th bus is a vector of $N$ complex elements. To assess the similarity of two vectors related to buses $i$ and $j$, the correlation coefficient defined in (2) is calculated.
The value of $cc_{i,j}$ obtained from (2) is a complex value that lies somewhere within a circle of radius 1 centered at the origin of the complex plane. For two highly coherent buses, the value of $cc_{i,j}$ will be close to the point 1 + j0. To understand this better, consider the eight data points illustrated in the complex plane shown in Figure 2. The data point located at 1 + j0, i.e. $cc_{i,i}$, is the self-coefficient between bus $i$ and itself. The rest of the data points represent the locations of the correlation coefficients (CCs) between each of the other buses in the 8-bus system and the $i$th bus. According to the figure, bus $i$ is coherent with bus $j$ and forms cluster C1, while two other groups of coherent buses (clusters C2 and C3) can be seen. Furthermore, one can observe that clusters C2 and C3 are closer to each other, so that they may form a larger group of buses; however, further analysis is needed to confirm this.
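The similarity measure described above can be illustrated with a short sketch. Since the exact expressions of (1) and (2) are not reproduced here, the code below assumes a plain DFT and a normalized complex inner product of the two spectra. It also assumes that only the one-sided spectrum is used, because for real-valued signals the inner product over the full spectrum is always real. The function names `one_sided_dft` and `complex_cc` are hypothetical.

```python
import cmath
import math


def one_sided_dft(samples):
    """DFT bins 0..N/2 of a real-valued sequence (O(N^2), adequate for a sketch)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def complex_cc(spec_i, spec_j):
    """Normalized complex inner product of two spectra (assumed form of (2)).

    By the Cauchy-Schwarz inequality the magnitude never exceeds 1, so the
    result lies inside the unit circle of the complex plane.
    """
    num = sum(a * b.conjugate() for a, b in zip(spec_i, spec_j))
    den = math.sqrt(
        sum(abs(a) ** 2 for a in spec_i) * sum(abs(b) ** 2 for b in spec_j)
    )
    return num / den


# Two phase-angle swings with the same frequency but a 90-degree shift.
n_samples = 16
swing_a = [math.cos(2 * math.pi * 2 * m / n_samples) for m in range(n_samples)]
swing_b = [math.sin(2 * math.pi * 2 * m / n_samples) for m in range(n_samples)]

cc_self = complex_cc(one_sided_dft(swing_a), one_sided_dft(swing_a))
cc_shift = complex_cc(one_sided_dft(swing_a), one_sided_dft(swing_b))
```

Under these assumptions, `cc_self` sits at 1 + j0, while `cc_shift` moves along the unit circle to approximately 0 + j1, which shows how a phase difference between two buses appears as a rotation away from the point 1 + j0.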
Considering the concept illustrated in Figure 2, it is suggested in this paper to find some candidate splitting schemes in advance according to the dynamic response of the system to the disturbance. Then, a simpler load shedding problem will be solved to find the best scheme. To find the candidate schemes, area centers should first be determined based on the post-disturbance variations of phase angles. This can be achieved by calculating a density value using (3).
In (3), $r_a$ is a positive constant used to represent the desired neighboring radius. In addition, $N_B$ is the number of buses and $d_{i,j} = 1 - cc_{i,j}$ is the dissimilarity index. Looking at (3), one can conclude that the value of $D_i$ depends on the similarity of the $i$th bus with the rest of the buses. In other words, for a bus whose post-disturbance behavior is similar to that of a large number of buses, the value of $D_i$ will be high. Hence, the first area center will be detected as the bus with the highest density value. In the next step, a subtractive procedure is carried out to find the rest of the centers. For this purpose, the formula defined in (4) is used to revise the density values in the $h$th iteration with regard to the density value of the area center determined in the $(h-1)$th iteration.
The revision in (4) uses the density value of the area center that was determined in the $(h-1)$th iteration and the dissimilarity between the $i$th bus and that previous area center. Moreover, $r_b$ is another positive constant used to define the neighborhood that has measurable reductions in density values. Note that the ratio of $r_b$ to $r_a$ must be greater than 1. The stopping criterion for the above subtractive procedure can be defined as in (5). According to (5), the procedure stops in the $h$th iteration if the ratio of the density value of the area center found in the $h$th iteration to the density value of the center identified in the first iteration is lower than λ.
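The subtractive procedure can be sketched as follows. Because (3) to (5) are not reproduced here, the sketch assumes the usual subtractive-clustering form, in which the density is a sum of Gaussian kernels with width $r_a/2$ and the revision subtracts a kernel of width $r_b/2$ scaled by the density of the last center. The function name `find_area_centers` and the toy dissimilarity matrix are hypothetical, and the matrix is assumed to hold the magnitudes $|d_{i,j}|$.

```python
import math


def find_area_centers(dissim, r_a=0.5, r_b=0.75, lam=0.2):
    """Return bus indices chosen as area centers.

    dissim[i][j] is the magnitude of the dissimilarity between buses i and j.
    The kernel shapes below are an assumed form of (3) and (4).
    """
    n = len(dissim)
    alpha = 4.0 / r_a ** 2  # kernel sharpness for the density in (3)
    beta = 4.0 / r_b ** 2   # kernel sharpness for the revision in (4)
    density = [
        sum(math.exp(-alpha * dissim[i][j] ** 2) for j in range(n))
        for i in range(n)
    ]
    centers = []
    first_peak = None
    while len(centers) < n:
        c = max(range(n), key=density.__getitem__)
        peak = density[c]
        if first_peak is None:
            first_peak = peak
        elif peak / first_peak < lam:
            break  # stopping criterion in the spirit of (5)
        centers.append(c)
        # Reduce densities near the new center so the next center lies elsewhere.
        density = [
            density[i] - peak * math.exp(-beta * dissim[i][c] ** 2)
            for i in range(n)
        ]
    return centers


# Toy case: buses 0-2 behave alike, buses 3-4 behave alike, the groups differ.
groups = [0, 0, 0, 1, 1]
toy_dissim = [
    [0.0 if i == j else (0.05 if groups[i] == groups[j] else 1.5) for j in range(5)]
    for i in range(5)
]
toy_centers = find_area_centers(toy_dissim)
```

In this toy case the procedure returns two centers, one from each group, and stops because the third candidate has a density far below 20 % of the first one.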
One point to be considered here is that, in coherency-based system splitting for controlled islanding, at least one generator is required in each area. Otherwise, the islanding strategy would mathematically suggest tripping all buses in the area lacking generation. To avoid these circumstances, choosing a proper value for λ is essential. It is recommended in [29] that λ be set to 0.2.
After determining the $N_C$ area centers, it is possible to draw $N_C$ plots similar to Figure 2. Indeed, the aim is to form an area and to examine the coherency among the other buses in the rest of the system as observed from the area center. Therefore, the data points in each of the $N_C$ complex planes must also be clustered. Since a complex plane is a 2-D space and there is no prior assumption on the number of areas, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is a good choice in this case [31]. DBSCAN has several advantages over other clustering techniques, such as the fuzzy c-means (FCM) and k-means (KM) algorithms: (i) it does not need a prior assumption on the number of clusters, which is a key feature, since from a measurement viewpoint, groups of coherent buses are not fixed for different disturbances; (ii) it needs fewer parameters to be set; and (iii) it does not use any random selection operation, and therefore its solution is deterministic. Note that when using DBSCAN, two parameters need to be tuned, i.e. a radius for determining the neighborhood (α shown in Figure 2) and the minimum number of data points required to exist in the α-neighborhood of a data point to form a cluster (minPt).
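To make the clustering step concrete, a minimal DBSCAN sketch operating directly on complex-valued data points is given below. It is a compact textbook version written for illustration, not the implementation used in this study. The function name `dbscan`, the sample points, and the convention that a point counts as its own neighbor are assumptions.

```python
def dbscan(points, eps=0.15, min_pts=2):
    """Cluster complex points; returns one label per point, -1 meaning noise.

    A point is a core point when at least min_pts points (itself included)
    lie within distance eps of it.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical CC values seen from one area center.
sample_ccs = [1 + 0j, 0.95 + 0.05j, -0.5 + 0.5j, -0.45 + 0.55j, 0 - 1j]
sample_labels = dbscan(sample_ccs, eps=0.15, min_pts=2)
```

For these sample points, the first two form one cluster, the next two form a second cluster, and the isolated point at 0 − j1 is left as noise. No random initialization is involved, so repeated runs give the same labels.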
After applying DBSCAN to the $N_C$ complex planes, $N_C$ schemes for clustering the buses of the system are obtained (note that some of these schemes may be identical). A load shedding problem must then be solved to balance load and generation in the areas of each splitting scheme. The best solution is the scheme that requires the least amount of load shedding.
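The selection among candidate schemes can be illustrated with a deliberately simplified sketch. It replaces the load shedding optimization with a per-area power balance that ignores all network, voltage, and reactive power constraints, so it only shows the selection logic, not the optimization model itself. The function names and the four-bus data are hypothetical.

```python
def total_shedding(scheme, load_mw, gen_max_mw):
    """Shedding needed by one scheme when each area is balanced on its own.

    scheme maps bus -> area label. Network constraints are ignored here,
    which is a simplifying assumption of this sketch.
    """
    totals = {}
    for bus, area in scheme.items():
        area_totals = totals.setdefault(area, [0.0, 0.0])
        area_totals[0] += load_mw.get(bus, 0.0)
        area_totals[1] += gen_max_mw.get(bus, 0.0)
    return sum(max(0.0, load - gen) for load, gen in totals.values())


def best_scheme(schemes, load_mw, gen_max_mw):
    """Index of the candidate scheme with the least total load shedding."""
    return min(
        range(len(schemes)),
        key=lambda k: total_shedding(schemes[k], load_mw, gen_max_mw),
    )


# Hypothetical four-bus data with two candidate splitting schemes.
load_mw = {1: 50.0, 2: 30.0, 3: 40.0, 4: 20.0}
gen_max_mw = {1: 60.0, 4: 90.0}
scheme_a = {1: "A", 2: "A", 3: "A", 4: "B"}
scheme_b = {1: "A", 2: "A", 3: "B", 4: "B"}
chosen = best_scheme([scheme_a, scheme_b], load_mw, gen_max_mw)
```

Here the first scheme leaves area A short by 60 MW, whereas the second one is short by only 20 MW, so the second scheme is selected.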

B. LOAD SHEDDING PROBLEM
The load shedding problem suggested for this step is extremely simple, since the borders of islands are determined in advance. In the proposed approach, the aim is to achieve a controlled islanding solution quickly with a desirable level of accuracy. Therefore, in this stage, a linearized model of the non-linear AC load flow constraints is considered to form a linear optimization problem. It is worth noting that in the proposed approach, the dynamics of the system are considered prior to solving the optimization problem. In particular, under a severe disturbance, the resulting dynamics may propagate through weak connections or interties. Therefore, under a severe cascading outage and when other remedial actions fail to stop such propagation, it is required to disconnect such weak connections before the slow interactions become significant. To this end, most works, such as [6] and [8], adopted a linear model to ensure the disconnection of non-coherent generators. This model uses auxiliary variables and a fictitious DC power flow equation, which enter the main linear optimization problem as additional constraints and variables. However, in the proposed approach, as described in Section II-A, a novel technique is introduced to determine the borders of islands. Thus, the constraints and variables related to the disconnection of borders and the connectivity of coherent generators have been omitted, which makes the main problem even simpler to use in online applications.
Moreover, it should be noted that the use of the original AC power flow might be time-consuming for online applications. Therefore, linearized models of the AC power flow equations are used here to establish a trade-off between accuracy and computational efficiency. To be specific, the constraints of this optimization problem are the linearized versions of the AC power flow in lines and of the power balance at buses. These linearized equations are obtained from the Taylor series expansion of the full AC power flow equations assuming usual operating conditions, i.e. $V_i \approx V_j \approx 1$ and $\delta_{ij} \approx 0$. This is a reasonable assumption, since in an area with coherent generators, the difference between the post-disturbance phase angle variations of any pair of adjacent buses is approximately 0.
The purpose of this optimization problem is to find the minimum amount of load shedding required in each area while the load flow requirement for the area is met. Accordingly, the objective function is defined as in (6), where $S$ is the set of buses in the $s$th area, $N_A$ is the number of areas and $P^{sh}_{L,i}$ represents the amount of active load curtailed at the $i$th bus. The AC active and reactive power flow equations are defined in (7) and (8).
In (7) and (8), $V_i$ is the voltage at the $i$th bus, $P_{ij}$ and $Q_{ij}$ are the active and reactive powers flowing from bus $i$ to bus $j$, respectively, $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of the line, respectively, and $\delta_{ij}$ is the phase angle difference between the two ends of the line. A linearized model of the equations in (7) and (8), which is used to simplify the load shedding problem, can be obtained as in (9) and (10).
The linearized equations in (9) and (10) are then used to form the power balance equations at buses, as in (11) and (12), where $Q^{sh}_{L,i}$ is the amount of reactive load curtailment, $P_{G,i}$ and $Q_{G,i}$ are the active and reactive generation before load shedding, and the remaining variables are the amounts of increase and decrease in active and reactive power generation, all at the $i$th bus. There are also constraints on the output of generators in each area that should be satisfied. The constraints in (13) and (14) ensure that the outputs of generators remain within their operational limits, while the constraints defined in (15)-(18) keep the increases and decreases in output power within feasible limits. It is also required to maintain all voltages within secure limits, which is enforced by a further constraint on the bus voltages. In addition to the above operational and secure power flow constraints, the amount of load shedding at a bus cannot exceed the original load at that bus, so the load shedding variables are bounded accordingly. The formulation presented in this section shows that no extra variables related to the connectivity of coherent generators and the disconnection of border buses are needed here, since coherent generators and the borders of areas have been determined beforehand.
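For reference, the parts of the formulation that follow directly from the verbal description can be written compactly as below. This is a sketch based on the definitions given in the text rather than a reproduction of the numbered equations. The symbols $V^{\min}$, $V^{\max}$, $P_{L,i}$ and $Q_{L,i}$ (voltage limits and pre-disturbance active and reactive loads) are notation assumed here, and the bound on reactive load shedding is assumed by analogy with the active one.

```latex
\begin{align}
  \min \quad & \sum_{s=1}^{N_A} \sum_{i \in S} P^{sh}_{L,i}
      && \text{(total active load curtailed over all areas)} \\
  \text{s.t.} \quad & V^{\min} \le V_i \le V^{\max}
      && \text{(secure voltage limits at every bus)} \\
  & 0 \le P^{sh}_{L,i} \le P_{L,i}
      && \text{(shedding cannot exceed the original active load)} \\
  & 0 \le Q^{sh}_{L,i} \le Q_{L,i}
      && \text{(assumed analogous bound for reactive load)}
\end{align}
```

The linearized power flow, power balance, and generator limit constraints complete the problem. Since the area borders are fixed beforehand, no binary line-status or connectivity variables appear.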

III. PROPOSED APPROACH FOR FAST AND ONLINE SPLITTING STRATEGY DETERMINATION
In this paper, supervised learning is proposed to build decision trees as classifiers for the fast determination of the most suitable splitting scheme following disturbances. Note that the output of the DT determines only areas without providing any information about the load shedding scheme in each area. Therefore, knowing the splitting strategy, the optimization problem described in Section II-B will be then solved quickly. Figure 3 depicts the overall structure of the proposed approach.
To build a classifier, it is crucial to generate a training data set with a sufficient number of scenarios. By sufficiency, we mean that the training data set should cover, as far as possible, almost all situations that may arise. In the case of online coherency evaluation, there are several parameters that can affect the degree of coherency between generators and buses. These are the disturbance type, the disturbance location, the fault clearing time (in the case of a fault) and the system loading level at the time of the disturbance. Considering a probabilistic nature for these parameters, a set of $N_S$ scenarios can be defined, so that in each scenario, a specific disturbance is assumed to occur in a specific location at a specific load level with a specific clearing time (if it is a fault). Each scenario is then simulated in the time domain, and the best controlled islanding solution is determined. In the proposed approach, the splitting scheme pertaining to the best solution in each scenario is stored as the target for building the classifier.

A. FEATURE GENERATION
The accuracy of the proposed approach relies on the high accuracy and reliability of the features used as the input of the classifier. To achieve this goal, it is assumed that the power system is equipped with the wide area measurement system (WAMS), so that voltage phasors at all buses are easily available with high accuracy. Then, a set of features, which are appropriate for building the classifier, can be extracted from the voltage phasor data measured by WAMS. By appropriateness, we mean that the features should be defined, so that they reflect the hidden patterns existing between the uncertain parameters of the power system and the controlled islanding strategies.
The first feature considered for this study is the magnitude of the voltage phasor of all buses at the moment just before the occurrence of a disturbance. This feature, which is called $F_1$, is selected since the load shedding solution depends on the load level and load dispatch of the system. Voltage magnitudes at buses are a good reflection of the load level of the system.
To generate other features, mathematical morphology (MM) analysis is used in this paper. It is proposed in [32] to use MM for stability prediction, since it has a low computational burden and is suitable for extracting the useful information hidden in the shapes of variations. Indeed, MM is capable of extracting the edges in the variations of an original signal by converting the original signal into smooth and detail signals. Mathematically speaking, MM uses dilation and erosion as in (22) and (23), respectively.
In the above formulas, the dilation and erosion operators are denoted by the symbols ⊕ and ⊖, respectively. In (22) and (23), a(n) is the original discrete signal and b(m) is called the structuring element (SE) [33], whose shape can affect the modification of the original signal. Thus, it is essential to choose a proper shape for the SE according to the desired application. Having calculated the erosion and dilation of a(n), the opening and closing of the discrete signal a(n) by b(m) are obtained using (24) and (25), respectively.
In (24) and (25), the symbols ∘ and • denote the opening and closing operations, respectively. Now, the smooth and detail signals, which are extracted from the original signal a(n), can be obtained by (26) and (27), where the detail signal in (27) is given by $a_{detail} = a - a_{smooth}$. Note that the smooth signal is similar to the original signal except that the sharp edges have been smoothed. Accordingly, the non-zero values in the detail signal represent the sharpness of the edges of the original signal.
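The morphological operations can be sketched in a few lines. The dilation and erosion below follow the standard grey-scale definitions with the SE centered on the current sample. Since (26) and the SE shape are not reproduced here, the sketch assumes that the smooth signal is the average of the open-closing and close-opening of the signal and that the SE is flat; both are assumptions, as are the function names.

```python
def dilate(a, b):
    """Grey-scale dilation: max over the window of a(n - m) + b(m)."""
    half = len(b) // 2
    return [
        max(a[n - m + half] + b[m]
            for m in range(len(b)) if 0 <= n - m + half < len(a))
        for n in range(len(a))
    ]


def erode(a, b):
    """Grey-scale erosion: min over the window of a(n + m) - b(m)."""
    half = len(b) // 2
    return [
        min(a[n + m - half] - b[m]
            for m in range(len(b)) if 0 <= n + m - half < len(a))
        for n in range(len(a))
    ]


def opening(a, b):
    """Erosion followed by dilation; removes narrow positive peaks."""
    return dilate(erode(a, b), b)


def closing(a, b):
    """Dilation followed by erosion; fills narrow negative valleys."""
    return erode(dilate(a, b), b)


def smooth_and_detail(a, b):
    """Split a into smooth and detail parts (averaging filter is an assumption)."""
    open_close = closing(opening(a, b), b)
    close_open = opening(closing(a, b), b)
    smooth = [(x + y) / 2 for x, y in zip(open_close, close_open)]
    detail = [x - s for x, s in zip(a, smooth)]
    return smooth, detail


flat_se = [0, 0, 0]                # assumed flat structuring element
spike = [0, 0, 0, 5, 0, 0, 0]      # a single sharp edge in the signal
spike_smooth, spike_detail = smooth_and_detail(spike, flat_se)
```

For the single-sample spike, the smooth signal is flat and the whole spike appears in the detail signal, which matches the statement that non-zero detail values mark the sharp edges of the original signal.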
In this study, variations of the phase angles of voltage phasors following a disturbance are measured at all buses. MM analysis is then applied to convert the phase angle variation signal of each individual bus into smooth and detail signals. Then, appropriate features are extracted from the converted signals. As stated earlier in this section, the first feature ($F_1$) is the magnitude of the voltage phasor of all buses at the moment before the occurrence of a disturbance. Clearly, disturbance characteristics, such as type and location, can affect the shape of variations in the parameters of the system. By the term "shape", we mean the magnitude of the first peaks of variations, their sharpness and the time interval between consecutive peaks. Accordingly, in this paper, the magnitudes of the first and second peaks of the detail signal are considered as two other features called $F_2$ and $F_3$, respectively. In addition, the ratio of the first and second peaks in the detail signal is regarded as the fourth feature, $F_4$. The same ratio is also calculated between the first and second peaks of the smooth signal to form the fifth feature ($F_5$). Two other features are also considered in this paper to examine the propagation of disturbances more effectively. The first one ($F_6$) is the ratio between the first peaks of the detail signals calculated at the two ends of each transmission line, and the other one ($F_7$) is the ratio between the first peaks of the smooth signals calculated at the two ends of each line. Therefore, for a system with $N_B$ buses and $N_L$ lines, there will be $(5 \times N_B) + (2 \times N_L)$ features in total. Figure 4 illustrates a graphical view of the information included in the training data set. As the figure shows, there are $N_B$ components in each of $F_1$ to $F_5$, while the number of components of $F_6$ and $F_7$ is $N_L$.
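A short sketch of how the per-bus and per-line features could be assembled is given below. The peak detection rule (first two local maxima of the absolute value) and the handling of a zero denominator are assumptions of this sketch, since the text does not specify them, and all function names are hypothetical.

```python
def first_two_peaks(signal):
    """Magnitudes of the first two local maxima of |signal| (0.0 if missing)."""
    mags = [abs(x) for x in signal]
    peaks = [
        mags[k] for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks += [0.0, 0.0]  # pad so two values can always be returned
    return peaks[0], peaks[1]


def safe_ratio(x, y):
    """Ratio x / y, returning 0.0 when y is zero (assumed convention)."""
    return x / y if y else 0.0


def bus_features(v_mag_pre, smooth, detail):
    """Features F1 to F5 for one bus."""
    d1, d2 = first_two_peaks(detail)
    s1, s2 = first_two_peaks(smooth)
    return [v_mag_pre, d1, d2, safe_ratio(d1, d2), safe_ratio(s1, s2)]


def line_features(smooth_from, detail_from, smooth_to, detail_to):
    """Features F6 and F7 for one line, from the signals at its two ends."""
    f6 = safe_ratio(first_two_peaks(detail_from)[0], first_two_peaks(detail_to)[0])
    f7 = safe_ratio(first_two_peaks(smooth_from)[0], first_two_peaks(smooth_to)[0])
    return [f6, f7]


def feature_count(n_buses, n_lines):
    """Total number of classifier inputs: 5 per bus plus 2 per line."""
    return 5 * n_buses + 2 * n_lines
```

As a usage example, `first_two_peaks([0, 2, 1, 4, 0])` returns `(2, 4)`, and a hypothetical system with 3 buses and 2 lines yields `feature_count(3, 2) == 19` inputs.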

B. BUILDING THE CLASSIFIER
Neural networks, decision trees and Bayesian networks are among the classifiers built using supervised learning. In this paper, it is proposed to use decision trees as the classifier, since they are simple to build and are suitable for multi-class classification. It should be noted that in this paper, the purpose of classification is to determine which splitting strategy is suitable if system islanding becomes essential in the seconds following the occurrence of a disturbance. Hence, as described in Section III-A, the input data are extracted from the raw data (which are voltage phasors across the system) measured during a very short time following the disturbance. As Figure 4 shows, two types of data are required for generating the classifier. Both types are obtained by processing voltage phasors measured at all buses of the system. In each scenario, a specific disturbance is simulated, and then the splitting strategy (the target in Figure 4) is obtained using the method described in Section II and is stored. Moreover, the values of features $F_1$ to $F_7$ defined in Section III-A are also calculated and stored for each simulated scenario. After simulating a sufficient number of disturbances, the training data set is ready to be used to build the classifier.
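To illustrate how a multi-class decision tree maps feature vectors to splitting schemes, a minimal CART-style sketch with Gini impurity is given below. It is a generic textbook construction and not the tool used in this study. The function names, the depth limit, and the tiny data set are hypothetical, and class labels are assumed to be strings or integers.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=5):
    """Return a leaf label or a node (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so that the right branch is never empty.
        for t in sorted(set(row[f] for row in rows))[:-1]:
            left = [i for i, row in enumerate(rows) if row[f] <= t]
            right = [i for i, row in enumerate(rows) if row[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority  # all feature values identical, cannot split
    _, f, t, left, right = best
    return (
        f, t,
        build_tree([rows[i] for i in left], [labels[i] for i in left],
                   depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right],
                   depth + 1, max_depth),
    )


def predict(tree, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training set: two features, three splitting schemes.
train_rows = [[0.1, 1.0], [0.2, 1.1], [0.9, 1.0], [1.0, 0.9], [0.5, 2.0], [0.55, 2.1]]
train_targets = ["s1", "s1", "s2", "s2", "s3", "s3"]
tree = build_tree(train_rows, train_targets)
```

In this toy case, `predict(tree, [0.15, 1.05])` returns `"s1"`. Once trained offline, prediction only requires a handful of threshold comparisons, which is what makes the classifier attractive for online use.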

IV. SIMULATION RESULTS
In this paper, the proposed approach is applied to the 68-bus, 16-machine system shown in Fig. 5. This system consists of two large areas called the New England Test System (NETS) and the New York Power System (NYPS). Note that generators 14-16 are virtual large generators used to represent the dynamic characteristics of areas adjacent to NYPS. More details regarding this system can be found in [34]. It should also be noted that all simulations were performed in MATLAB. Dynamic and time-domain simulations were conducted using the Power System Toolbox (PST) available in [35].

A. PARAMETER SETTING
Two sets of parameters should be tuned in this paper. The first set includes the parameters related to the process of finding area centers, which are $r_a$, $r_b$ and λ. It has been suggested in the literature to set $r_a$ to 0.5 and $r_b = 1.5 \times r_a$ to avoid forming closely spaced areas [31]. Furthermore, as described in Section II-A, λ is set to 0.2 to avoid forming small areas that may lack generation. The second set of parameters that should be tuned includes minPt and α, which are used in the DBSCAN algorithm to cluster the data points. Here, minPt and α are set to 2 and 0.15, respectively. In other words, at least two buses at a Euclidean distance of less than 0.15 can form an area.
Moreover, the SE function used in the morphology analysis is defined as in (28).

B. PERFORMANCE OF THE PROPOSED SPLITTING STRATEGY DETERMINATION FOR OFFLINE TRAINING
First, the performance of the proposed approach for detecting the best splitting strategy is checked in this section. After disconnecting the line between buses 8 and 9 (which is done to increase the electrical distance between NETS and NYPS), a 3-phase fault occurs on the line connecting buses 1 and 2, near bus 2, and is cleared after 0.3 s. It is also assumed that the fault clears by itself, meaning that no line is tripped. Fig. 6 presents the post-disturbance speed variations of the generators. After applying the DFT to extract the frequency spectra of all buses and applying the method presented in Section II-A, buses 21, 64 and 52 are found to be the central buses. Therefore, the data points (in other words, the buses of the system) should be analyzed in three complex planes. Figure 7 shows these three complex planes. For example, the 68 data points in Figure 7(a) are the complex CCs between bus 21 and all buses of the system, including bus 21 itself, obtained using (2). Thus, using the DBSCAN algorithm, the data points in each plane are clustered. Note that the clusters identified in Figures 7(b) and 7(c) are the same. Figure 8 depicts a graphical view of the groups of buses shown in Figure 7, along with the locations of the area centers. Having obtained two different splitting strategies, the optimization problem is now solved for each of them to find the one needing the least load shedding. In this case, it is found that 11 MW should be curtailed for the splitting strategy shown in Figure 8(a), while the other strategy can be applied without any load shedding. Thus, the splitting strategy shown in Figure 8(b) is selected as the best solution. Figure 9 depicts the speed variations of generators G1-G9, which are on the right side of the test system. As the figure shows, the generators exhibit approximately the same variations in response to the disturbance. There are also several local oscillations that are damped in the first two seconds following the occurrence of the disturbance. These local modes are not captured because α has been set to 0.15. To include these modes and consider local areas, one could use a smaller value for α.

C. GENERATING THE TRAINING DATA SET
As described in Section III, the training data set should cover a sufficient number of situations that could occur in the system. To achieve this goal, four parameters, including the type of disturbance, the disturbance location, the fault clearing time and the system load level, have been assumed to have a probabilistic nature. Note that these uncertain parameters should be defined according to the history of the system, particularly for real cases. For the system used in this paper, these uncertain parameters are defined below.
Various types of disturbances, such as sudden load changes, large motor starting, and faults, can occur in the system. In this paper, large motor starting and sudden load changes are assumed to account for 10% of the disturbances in the system, while the rest are symmetrical and unsymmetrical faults. For sudden load changes, the load on which the curtailment occurs is selected uniformly from the set of loads. Furthermore, in power systems, single-line faults have the highest probability of occurrence, while three-phase faults, the most severe faults, have the lowest. Accordingly, the probabilities of occurrence of the different faults were adopted from [36] and are shown in Table 1. The test system contains 66 lines and 20 transformers. It has been assumed that all symmetrical and unsymmetrical faults occur only on transmission lines, at either the sending or the receiving end. Therefore, the fault location is uniformly selected from 132 locations. The fault clearing time depends on factors such as the circuit breaker operating time and the reliability of the protection systems. Although these data can be obtained from the history of a real system, it has been suggested in [37] to model the fault clearing time as a normal distribution with a mean value of 6 cycles and a standard deviation of 0.667 cycle, i.e. 2 cycles at 3σ. The fourth probabilistic parameter is the load level of the system. Although the load level of a power system varies daily, it also follows different patterns in the hot and cold seasons of the year. It is assumed in this paper that the loading factor of this system follows a normal distribution with nominal mean values as presented in [34] and a standard deviation of 3.33%, i.e. 10% at 3σ. Moreover, all loads are assumed to be uniformly dependent, i.e. they are scaled by the same loading factor.
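As a rough illustration, the four probabilistic parameters described above could be sampled as in the following sketch. The function and field names are hypothetical, the fault-type probabilities are placeholders (they are not the values of Table 1), the mean loading factor of 1.0 is an assumption standing in for the nominal values of [34], and clipping negative clearing times at zero is an added safeguard.

```python
import random

CYCLE_S = 1.0 / 60.0               # one cycle of the 60 Hz system, in seconds
N_LINES = 66                       # transmission lines in the test system
N_FAULT_LOCATIONS = 2 * N_LINES    # sending and receiving end of each line = 132

# Placeholder weights only; the paper takes the real values from Table 1 / [36].
PLACEHOLDER_FAULT_PROBS = {
    "single_line": 0.70,
    "line_to_line": 0.15,
    "double_line_to_ground": 0.10,
    "three_phase": 0.05,
}


def sample_scenario(rng, fault_type_probs):
    """Draw one scenario from the four probabilistic parameters."""
    # 10% of disturbances are load changes or motor starts; the rest are faults.
    is_fault = rng.random() >= 0.10
    scenario = {"kind": "fault" if is_fault else "load_change_or_motor_start"}
    if is_fault:
        types = list(fault_type_probs)
        weights = [fault_type_probs[t] for t in types]
        scenario["fault_type"] = rng.choices(types, weights=weights, k=1)[0]
        scenario["location"] = rng.randrange(N_FAULT_LOCATIONS)  # uniform over 132
        # Clearing time ~ N(6 cycles, 0.667 cycle), i.e. 2 cycles at 3 sigma.
        cycles = max(0.0, rng.gauss(6.0, 2.0 / 3.0))
        scenario["clearing_time_s"] = cycles * CYCLE_S
    # Loading factor ~ N(1.0, 3.33%), i.e. 10% at 3 sigma, shared by all loads.
    scenario["load_factor"] = rng.gauss(1.0, 0.10 / 3.0)
    return scenario


rng = random.Random(0)
scenarios = [sample_scenario(rng, PLACEHOLDER_FAULT_PROBS) for _ in range(2000)]
```

Each sampled dictionary would then parameterize one time-domain simulation; the simulation itself is outside this sketch.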
Considering the above four probabilistic parameters, 4000 scenarios are defined and simulated in the time domain. A time frame of 20 s is considered in each scenario to establish the coherency in the system. In each scenario, the variations of the voltage phasors of all buses, as well as the final splitting scheme obtained using the procedure defined in Section III and evaluated in Section IV-B, are stored. Having stored all the training data, the multi-class classifier can be built using deep learning tools. It should be noted that, after examining all scenarios, the proposed islanding approach was found to identify 27 different splitting schemes. Thus, the multi-class classifier is employed to predict the splitting scheme from a set containing these 27 classes.
Sampling continuous signals at higher rates can be beneficial for signal analysis applications. Since two signal analysis techniques, i.e. DFT and MM, are used in this approach, it is assumed that voltage phasor samples are available at a rate of 120 samples per second (one sample per half cycle of the 60 Hz frequency), which is a high rate for the WAMS employed in 60 Hz power systems. This allows us to obtain a frequency spectrum with higher resolution and more informative detail signals.
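To make the link between the sampling rate, the record length and the DFT bin spacing concrete, the following sketch evaluates the low-frequency DFT bins of a synthetic signal sampled at 120 samples per second. The 0.5 Hz test oscillation and the 20 s record length are illustrative assumptions, and `dft_magnitudes` is a hypothetical helper name.

```python
import cmath
import math

FS = 120.0  # samples per second, as assumed in the text


def dft_magnitudes(samples, max_freq_hz, fs=FS):
    """Return (frequency_hz, magnitude) pairs for the DFT bins up to max_freq_hz."""
    n = len(samples)
    resolution = fs / n                      # spacing between adjacent bins, in Hz
    n_bins = int(max_freq_hz * n / fs) + 1   # only the low-frequency bins are needed
    spectrum = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append((k * resolution, abs(acc) / n))
    return spectrum


# Synthetic phase-angle deviation: a 0.5 Hz oscillation observed for 20 s.
n_samples = int(20 * FS)
signal = [math.sin(2 * math.pi * 0.5 * t / FS) for t in range(n_samples)]
spectrum = dft_magnitudes(signal, max_freq_hz=2.0)
peak_freq = max(spectrum, key=lambda fm: fm[1])[0]
```

With 2400 samples at 120 samples per second, the bin spacing is 120/2400 = 0.05 Hz, and the largest magnitude falls in the 0.5 Hz bin. A practical implementation would normally use an FFT routine rather than this direct summation.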
To better understand how these features reflect the hidden patterns in the training data set, five scenarios are selected and the feature values associated with them are illustrated in Figure 10. Note that the values in Figure 10 are normalized for better illustration. Table 2 presents the details of the scenarios. It should be noted that in scenarios 3 and 4, the same areas were identified by the proposed algorithm. As Figure 10 shows, each of features 1, 2, 6 and 7 has approximately the same value for scenarios 3 and 4. The same is also true of features 3 and 5 for scenarios 2 and 3; however, the splitting strategies obtained for scenarios 2 and 3 are not the same. Now, a supervised learning technique should be applied to the training data set to extract the hidden patterns between the values of the features and the splitting strategies.

D. BUILDING THE CLASSIFIER
Various free and commercial tools exist for building classifiers. In this study, WEKA and IBM SPSS Modeler have been used to build DTs. These two software programs contain several learning techniques to train and test DTs. Therefore, a combination of techniques is available to build DTs, and their performance can be compared. Among the available techniques, C4.5 and its revised version obtained through bagging, which is called Random Forest, as well as C5.0 have been selected. Furthermore, boosting is applied to C5.0 to see how it can improve the accuracy of the model. Notably, in all simulations related to generating classifiers, a 10-fold cross-validation approach has been used. That is, the whole training data set is randomly split into 10 subsets. Then, a classifier is built considering one of the subsets as the test data set and the rest as the training data set. This procedure is repeated for all 10 subsets, so that 10 classifiers are built, and the one with the highest accuracy is returned as the best one. Table 3 presents the details of the performance of each technique used to build DTs. It can be observed that C5.0 performs better than C4.5. Additionally, bagging (i.e. using Random Forest instead of C4.5) and boosting can improve the performance of the classifiers, demonstrating the efficacy of these methods in building ensemble classifiers that offer better accuracy.
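The 10-fold procedure described above can be sketched as follows. This is a generic illustration rather than the WEKA or IBM SPSS Modeler workflow: the learner `train_nearest_mean` is a deliberately trivial stand-in (not C4.5, C5.0 or Random Forest), and the toy data are invented.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Build k classifiers, each tested on one held-out fold.

    train_fn(X, y) must return a function predict(x) -> label.
    Returns the per-fold accuracies and the classifier with the highest accuracy.
    """
    folds = k_fold_indices(len(labels), k, random.Random(seed))
    accuracies, best_model, best_acc = [], None, -1.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        hits = sum(model(features[i]) == labels[i] for i in test_idx)
        acc = hits / len(test_idx)
        accuracies.append(acc)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return accuracies, best_model


def train_nearest_mean(X, y):
    """Stand-in learner: assign the class whose mean of feature 0 is closest."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x[0]
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    return lambda x: min(means, key=lambda label: abs(x[0] - means[label]))


# Toy, well-separated two-class data (invented for illustration).
features = [[i / 250] for i in range(100)] + [[0.6 + i / 250] for i in range(100)]
labels = [0] * 100 + [1] * 100
accuracies, best_model = cross_validate(features, labels, train_nearest_mean)
```

Replacing `train_nearest_mean` with a decision-tree learner would reproduce the structure of the procedure while leaving the fold handling unchanged.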
Note that some studies have suggested that the synthetic minority over-sampling technique (SMOTE) should be applied to an unbalanced data set to balance the number of members in the different classes of a multi-class data set [38]. Using SMOTE, a number of synthetic scenarios are generated from the features in the generated data set. Indeed, instead of generating raw data and then extracting features from them, these new scenarios are generated directly from the features using the nearest cases in each of the minority classes. Figure 11 shows the distribution of each splitting strategy among the scenarios in the training data set before and after applying SMOTE. Note that some of the highly similar scenarios in most classes have been omitted to generate a balanced data set with a lower volume of data. Hence, a new data set is obtained, which can yield classifiers with higher accuracy. This is confirmed by the results shown in Table 3. It can be observed that after applying SMOTE, the accuracy of all models has increased; the increase is significant for the DTs obtained from the C4.5 and Random Forest techniques. However, the highest accuracy pertains to the model obtained from the boosted C5.0 technique, showing its high performance in building highly accurate models.
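The core idea of SMOTE, generating a synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbours, can be sketched as follows. The function name, the default `k`, and the toy feature vectors are illustrative assumptions; the sketch covers only the over-sampling part and not the removal of highly similar scenarios mentioned above.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic feature vectors from minority-class vectors.

    minority must contain at least two distinct list objects.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of the base sample within the minority class.
        others = sorted((m for m in minority if m is not base),
                        key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Invented two-feature minority class: the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Because every synthetic vector is a convex combination of two existing ones, the new samples stay inside the region already spanned by the minority class.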

E. ADDITIONAL NOTES ON THE CAPABILITIES OF THE PROPOSED APPROACH
First, in this paper, the classifier is built for a specific topology of the system. In other words, changes in topology, for example intentional line outages for maintenance, were not considered. It should be noted that a change in the topology of the system might have only a very slight effect on the post-disturbance dynamic response of the system. However, if the change is strategic, like the outage of a tie-line, it is recommended that the model be rebuilt for the new topology.
The second note deals with the capabilities of the proposed approach for online applications. The proposed approach is suitable for online controlled islanding and offers an even faster solution process. Figure 1 illustrates the time needed to find the solution by the proposed approach compared with traditional methods. As the figure illustrates, the online usage of the proposed approach consists of the following three steps:
Step p1 - Data acquisition
Step p2 - Splitting strategy selection
Step p3 - Finding the islanding solution (in this step, the load shedding scheme is determined once the borders have been determined in Step p2)
In contrast, conventional approaches find their solution through the following two steps (see also Figure 1):
Step c1 - Online coherency evaluation
Step c2 - Finding the islanding solution
A comparison between what occurs in each of the above steps can demonstrate the superiority of the proposed approach over conventional methods. In Step p1, the required raw data, which are the phase angle variations at the selected buses, are stored following the occurrence of a disturbance. Note that, as we use the first and second peaks of the low-frequency variations, the measurement time window used in our work covers the first 0.5 s following the occurrence of a disturbance. Other operations, such as phasor calculations performed within the measurement devices, are extremely fast and do not affect the execution time of this step. Splitting strategy selection, which is carried out in Step p2, is very fast. In this step, the required features, which are the inputs of the classifier, are first extracted and then fed into the classifier to be compared with its thresholds. It was observed that the execution time of this step on the computer used for this work was less than approximately one tenth of a second.
Finally, in Step p3, an optimization problem is solved to find the best load shedding strategy for each area. It is worth noting that the optimization problem defined in this step is a simpler version of what conventional optimization-based approaches have presented, since the borders of the areas have been detected in advance (in Step p2), and therefore about half of the variables that conventional approaches have to deal with in their optimization problems are omitted. Therefore, Step p3 takes less time than Step c2. Turning to the conventional approaches, in Step c1 the rotor angle signals of the generators must be measured for a sufficient time frame (which can be 10 seconds or more) to find the coherent generators. Then, in Step c2, an optimization problem must be solved, which, as mentioned, is more time-consuming than the optimization problem used in Step p3. Thus, the above comparison shows that the proposed approach is highly suitable for online applications when compared with conventional methods, since it is capable of reaching the final solution very quickly, as demonstrated by the numerical results shown in Table 4. The example simulated in this table pertains to a 3-phase fault occurring on the line connecting buses 1 and 31 with a clearing time of 250 ms.
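The structure of Steps p2 and p3 can be sketched as a lookup followed by a per-island calculation. Everything in this sketch is hypothetical: the function name, the data layout, the generation and load figures, and in particular the deficit rule, which is only a stand-in for the optimization problem actually solved in Step p3. The point is that once the classifier has selected a precomputed scheme, the island borders are no longer decision variables.

```python
def online_islanding(features, classify, candidate_schemes):
    """Sketch of Steps p2 and p3 for one disturbance.

    classify(features) returns the id of a precomputed splitting scheme.
    candidate_schemes maps scheme id -> {island name: (generation_mw, load_mw)}.
    """
    scheme_id = classify(features)          # Step p2: predicted splitting scheme
    islands = candidate_schemes[scheme_id]  # borders are already known here
    shedding = {}
    for name, (gen_mw, load_mw) in islands.items():
        # Stand-in for the Step p3 optimization: shed only the generation deficit.
        shedding[name] = max(0.0, load_mw - gen_mw)
    return scheme_id, shedding


# Invented candidate schemes and a dummy classifier that always returns scheme 0.
candidate_schemes = {
    0: {"A": (500.0, 511.0), "B": (300.0, 280.0)},
    1: {"A": (450.0, 440.0), "B": (350.0, 345.0)},
}
scheme_id, shedding = online_islanding([0.2, 0.7], lambda f: 0, candidate_schemes)
```

In this made-up example, island A of scheme 0 has an 11 MW deficit and island B has none, so only island A sheds load.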
It must also be noted that, whether the proposed approach or the methods presented in [8] and [14] are used, the same groups of coherent generators will be detected. Therefore, the dynamic response of the generators, represented by the variations in their speeds, is the same no matter which method is used. In fact, the purpose of presenting the proposed approach in this paper is to show that, by employing machine learning and a novel way of evaluating coherency in the complex domain, an islanding strategy can be reached faster. Nowadays, power systems operate close to their stability limits for various reasons. Hence, it is crucial to reach a solution quickly when fast remedial actions are necessary. The proposed approach can successfully cope with this need, as demonstrated by the execution times presented in Table 4.
The need to respond quickly is more vital when a severe disturbance causes one or more generators to lose synchronism, so that the system moves toward instability. The example shown in Fig. 12 demonstrates this necessity. In this example, a 3-phase fault occurs on the line connecting buses 18 and 17 and is cleared after 0.25 seconds. Although the fault is cleared, generators G2 and G3 start to lose synchronism with the rest of the system within 3 seconds after the fault occurrence. Then, the same happens to generators G1, G4, G5, G6, G7, G8, G9, G10 and G11 until the whole system collapses. Therefore, conventional methods cannot correctly identify a suitable islanding strategy within this short time frame, since they need a longer time to analyze coherency. However, using the proposed approach, a suitable islanding strategy can be identified quickly, and the operator can then apply the islanding scheme and separate the area containing G2, G3, G4, G5, G6 and G7 from the rest of the grid. Since the islanding strategy can be identified by our approach in less than 2 seconds, we have assumed that it is applied 2 seconds after the fault is cleared. As the rotor speed variations in Fig. 13 show, generators G2-G7 have been separated from the rest of the system, and the rest of the system has therefore settled at a new stable operating point after the separation takes place. For generators G2-G7, which are still unstable, further remedial actions are needed to stabilize them, which is out of the scope of this paper.
In addition to the above comparison, readers might expect another comparison, this time with existing machine learning-based approaches. However, please note that what we have proposed in this paper is a new version of previous optimization approaches in which a part of the process is replaced with classifiers, while the rest, i.e. the process of finding borders and the optimization problem, is totally revisited. Accordingly, we maintain that the best comparison is the one performed with conventional optimization-based methods. Furthermore, if we look at relevant machine learning-based approaches, we can observe that they are completely different in terms of their concepts and purposes. Examples of such works are [21], [23], [39], and [40], in which the classifier is built to predict the need for controlled islanding using some criteria extracted from the transient responses of the system. Therefore, a separate controlled islanding process needs to be performed when the classifier detects a controlled islanding situation. In contrast, the approach presented in this paper specifies a controlled islanding solution in advance, which is then applied at the time the need for performing the controlled islanding strategy is detected. This is a considerable advantage that makes the proposed approach fast for online applications. Hence, we argue that such a comparison is difficult, as it requires an exact implementation or reproduction of other works, and it might not add significant value.
The third note relates to the scalability of the proposed approach, particularly the data generation and training steps. First, the execution time of the offline training step depends on the scale of the system. This is because, for a larger power system, more scenarios need to be simulated, while the simulation of each scenario also takes more time. However, as mentioned, this process is performed offline, and the generated model remains valid until a significant change in the system occurs. In this paper, 4000 scenarios were simulated, which took approximately 78 hours on a PC with a medium configuration (a 3.4 GHz Core i7 CPU and 8 GB of RAM). The rest of this offline training process includes the feature extraction and classifier building steps, which took about 24 and 4 minutes, respectively. Here, we can present two suggestions for reducing the time needed for data generation. First, a more powerful computer can reduce the time significantly, and second, the user can consider a shorter simulation time for the scenarios, since in most cases the damping factor of the low-frequency modes is such that they are damped in less than 10 seconds. However, as a very conservative assumption, we consider a 20-second simulation time frame for the scenarios in this study, which is to some extent more than what is needed.
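As a back-of-the-envelope check of the figures above, the average simulation cost per scenario can be worked out as follows. The second line is only an estimate under the assumption, not verified in this paper, that the run time scales linearly with the simulated time frame.

```latex
\[
  t_{\text{scenario}} \approx \frac{78 \times 3600\ \text{s}}{4000} \approx 70\ \text{s},
\]
\[
  T_{10\,\text{s frame}} \approx \frac{10\ \text{s}}{20\ \text{s}} \times 78\ \text{h} = 39\ \text{h}
  \quad \text{(assuming linear scaling)}.
\]
```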

V. CONCLUSION
In this paper, a supervised learning approach was proposed to build classifiers suitable for online controlled islanding applications. The output of the classifier is an optimal splitting strategy, which is obtained through a new concept of coherency evaluation in the complex domain. This new way of coherency evaluation allows for a faster and simpler solution to the optimization problem defined as the last stage in the process of finding the optimal controlled islanding strategy.
In this study, the performance of the proposed coherency evaluation in the complex domain, along with the subsequent optimization stage, was evaluated first. Then, various techniques were employed to build the classifier, and their efficacy in terms of the accuracy of the obtained classifier was evaluated. Simulation results indicated that the boosted C5.0 technique led to a decision tree (as the classifier) with the highest level of accuracy for this application. In other words, using the classifier obtained by the proposed approach, the prediction and selection of an appropriate controlled islanding strategy are viable with high accuracy. This model can be made more accurate by using SMOTE, as demonstrated in the results.