A review on simulation models of cascading failures in power systems

Among various power system disturbances, cascading failures are considered the most serious and extreme threats to grid operations, potentially leading to significant stability issues or even widespread power blackouts. Simulating the behavior of power systems during cascading failures is essential for understanding how failures originate and propagate, and for developing effective preventive and mitigative control strategies. The intricate mechanisms of cascading failures, characterized by multi-timescale dynamics, pose exceptional challenges for their simulation. This paper provides a comprehensive review of simulation models for cascading failures, offering a systematic categorization and comparison of these models. Challenges and potential directions for future research are also discussed.

As one of the most complex engineering systems in the world, power systems interconnect a variety of electrical components. Among various power system disturbances, cascading failures are considered the most serious and extreme threats to the stability and reliability of power systems. They usually consist of long sequences of dependent failures that progressively weaken the grid in a chain-reaction manner. Although cascading failures do not occur frequently, their impacts can be severe and extensive, leading to widespread power outages or even a blackout of the entire power system. Examples of historical cascading and blackout events [1−3] include the 1965 Northeast Blackout, the 1977 New York City Blackout, the 1996 Western North America Blackouts, the 2003 US-Canada Blackout, the 2006 European Blackout, the 2011 Southwest Blackout and the 2012 Indian Blackout. Over the past half century, cascading failures and blackouts have occurred every few years worldwide. Simulation of cascading failures is therefore crucial to their prevention and mitigation: appropriately constructed simulation models can help reveal their mechanisms, evolution patterns, propagation paths and potential control strategies.
There are several definitions of cascading failures. According to the North American Electric Reliability Corporation (NERC), "cascading" refers to "the uncontrolled successive loss of system elements triggered by an incident at any location. Cascading results in widespread electric service interruption that cannot be restrained from sequentially spreading beyond an area predetermined by studies" [4]. Another definition, provided by the IEEE Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures [5], is "a sequence of dependent failures of individual components that successively weakens the power system". Based on historical records of cascading failures [1−3], various causes have been identified, including natural disasters, equipment failures and human factors. These factors can lead to an initial failure within a local region. If this initial failure is not cleared in time, it may propagate over a wide area and result in a cascading event, which involves multi-timescale dynamics such as overloading, power redispatch, voltage and frequency deviations, load shedding, frequency collapse and voltage collapse [6]. Because of these diverse causes and multi-timescale dynamics, modeling cascading failures remains a major challenge. Substantial efforts have been devoted to building cascading failure models; nevertheless, most existing simulation models concentrate on specific mechanisms rather than covering the entire evolution process [7]. This paper systematically reviews and compares existing simulation models and provides an overview of the approaches used for model benchmarking. It also highlights the difficulties in simulating cascading failures effectively, as well as the emerging challenges introduced by renewable energy sources, and discusses potential directions to address these challenges.
The rest of this paper is organized as follows. Section 1 introduces historical cascading and blackout events in recent decades. Section 2 presents the evolution of cascading failures. Section 3 compares simulation models of cascading failures, including physical models and probabilistic models. Other emerging models such as machine learning-based models are also introduced. Section 4 discusses challenges and potential directions for the future. Finally, Section 5 draws conclusions.

Historical cascading and blackout events
In this section, historical cascading and blackout events in recent decades are presented. The progression of cascading failures is also analyzed on the basis of a specific blackout event.

Analysis of the "August 14, 2003" blackout in the United States and Canada
On August 14, 2003, a widespread blackout occurred, affecting large areas of the Midwest and Northeast United States, as well as Ontario, Canada. Around 50 million people lost power during this blackout. In the United States, the estimated total economic loss ranged from $4 billion to $10 billion. In Canada, gross domestic product in August decreased by 0.7% [10]. Since this event is one of the most famous and well-studied blackouts of the past twenty years, this paper uses it as an example to illustrate how cascading failures develop.
The initiation of this blackout can be divided into four phases [10], as illustrated by the timeline in Figure 1. After the loss of Eastlake 5 at 13:31 Eastern Time, transmission line loadings were higher but still within the normal range. However, subsequent computer failures left the control room unaware of the seriousness of the situation, and the subsequent tripping of key FirstEnergy (FE) lines due to contact with trees triggered the blackout. Starting from 16:05, the cascade propagated extremely fast and could not be stopped by any control action, resulting in a large-scale blackout within just 7 minutes. This analysis suggests that the cascading failure comprised two stages: component failures progressed slowly in the early stage but accelerated significantly in the later stage. The detailed evolution of cascading failures is analyzed in the next section.

Evolution of cascading failures
Based on the investigation and analysis of those historical cascading failure events [8−13] , the causes, stages and sequences of cascading failures are presented in this section.

Causes of cascading failures
Cascading failures can be caused by various factors, which are categorized into three types as depicted in Figure 2 and summarized below [6,17−20]:
(1) Natural disasters: strong winds (such as tornadoes and hurricanes), earthquakes, thunderstorms and lightning, and other severe weather conditions. One example is the February 2021 cold weather-induced outages in Texas and the South Central United States [12].
(2) Equipment failures: electrical equipment failures, computer failures and communication network failures. One example is the "August 14, 2003" blackout in the United States and Canada, which was caused both by electrical equipment failures due to tree contacts and by computer program failures [10].
(3) Human factors: operator errors due to inadequate situational awareness or insufficient training. One example is the November 2006 UCTE system outage, which affected most European countries. One of the main causes of this event was that the operators failed to consider the correct protection system settings when taking corrective actions [9].
Preventing and mitigating cascading failures is challenging, especially for the first and third types of causes, namely natural disasters and human factors. Prevention and mitigation efforts against these two types are primarily led by industry. Mitigating cascading failures due to natural disasters involves providing backup capabilities for critical infrastructure, and even establishing a backup control room, whereas cascading failures due to human factors can be prevented and mitigated through regular operator training and emergency drills and by enhancing operators' decision support tools. In contrast, cascading failures due to the second type, i.e., equipment failures, can be effectively reduced or prevented through comprehensive planning studies. Consequently, numerous existing studies focus on cascading failures caused mainly by equipment failures, aiming to understand the mechanisms of failure initiation and propagation and thus develop effective prevention and mitigation strategies.
In addition to the main causes that initiate cascading failures, some factors further accelerate their propagation [8−13], such as reduced reliability margins and abnormal voltage or frequency conditions. The diversity and uncertainty of these causes make modeling cascading failures challenging, and various methods have been developed to address these complexities. Some use stochastic methods that account for the diversity and uncertainty of causes by varying the failure probabilities of electrical components [21−31]. Others focus on specific causes; for example, hidden failure models [32−35] explore the impact of hidden failures on cascading failures. In addition, high-level probabilistic models aim to approximate the average influence of causes on cascading failures [36−41].

Stages of cascading failures
According to the analysis of historical cascading failures [8−13], the evolution of cascading failures can be split into two stages: the slow stage and the fast stage. The characteristics of these two stages are summarized in Figure 3.
In the slow stage, the cascading failure evolves relatively slowly: the time interval between successive events ranges from tens of seconds to hours, and the stage usually lasts from several minutes to several hours. In this stage, the initial outages, caused by common events such as overloads, short circuits and open circuits, are typically "N-1" or "N-2" contingencies. Power systems are generally required to satisfy the "N-1 criterion" [42], so they can operate reliably even if a key generator or transmission line is tripped. Some systems are even designed to withstand the loss of two important facilities (i.e., the "N-2 criterion"). Therefore, after the initial outage, the system is typically in a normal or alert state. However, the system may directly enter an emergency state under some extreme initial events, or may transition from the alert state to an emergency state due to a subsequent contingency. In the emergency state, operators still have time to take corrective actions to drive the system back to a normal state. Dynamics such as power redispatch, overloading, low voltage, and frequency deviation can occur in this stage and usually have only a slight effect on system stability. Nevertheless, if operators do not take effective control actions promptly, or if additional faults occur before the system is restored to a reliable state, the system becomes over-stressed and the cascade enters the fast stage.
In the fast stage, the cascading failure evolves very rapidly and is mainly driven by transient dynamics. It is almost impossible for operators to take remedial actions to stop the cascade, because the time interval between successive events ranges only from milliseconds to tens of seconds [43]. In this stage, further transmission lines trip due to overloading, which may lead to low voltage, low frequency, loss of synchronism, islanding, and even voltage or frequency collapse. These subsequent failures accelerate the evolution of the cascade and may finally result in a blackout.
As the two stages show, cascading failures involve multi-timescale dynamics, which pose a significant challenge for modeling. Researchers have developed diverse models to address this challenge. Some models are constructed in the quasi-steady-state domain, efficiently capturing the dynamics of the slow stage while ignoring fast dynamics. In contrast, other models use dynamic models to trace detailed dynamics in both stages, achieving greater accuracy at the cost of lower efficiency.

Sequences of cascading failures
To facilitate statistical analysis, each cascade event is typically represented by a sequence [5]. For simulation data, each cascade event naturally forms an outage sequence. For utility data, however, criteria must be applied to divide the recorded outages into independent cascade events and to further divide each cascade event into a finite number of generations, each containing a set of failed components, thereby forming an outage sequence. Given that system operators typically complete control actions within an hour, and that automatic reclosure operations and fast transient dynamics are typically completed within a minute, Ref. [44] used these two timescales as the criteria for grouping the data. First, successive related outages separated by more than one hour are assigned to different cascade events. Second, successive outages separated by more than one minute are assigned to different generations of the corresponding cascade event. As a result, each cascade event is represented by an outage sequence with a finite number of generations, each containing one or more failed components. A cascade event terminates when no further component failures occur. Figure 4 shows the sequences of multiple independent cascade events, where N is the number of independent cascade events and each set in a sequence denotes the components that failed in generation i of cascade j.
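The two-timescale grouping rule of Ref. [44] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Ref. [44] itself: it assumes the records have already been filtered to related outages and are given as hypothetical (time in seconds, component id) pairs, and the function name `group_outages` is ours.

```python
CASCADE_GAP_S = 3600.0    # a gap over one hour starts a new cascade event
GENERATION_GAP_S = 60.0   # a gap over one minute starts a new generation


def group_outages(outages):
    """Group (time_s, component_id) records into cascades of generations.

    Returns a list of cascades; each cascade is a list of generations and
    each generation is a set of failed component ids.
    """
    cascades = []
    prev_time = None
    for time_s, component in sorted(outages):
        if prev_time is None or time_s - prev_time > CASCADE_GAP_S:
            cascades.append([set()])       # new cascade, starting at generation 0
        elif time_s - prev_time > GENERATION_GAP_S:
            cascades[-1].append(set())     # same cascade, next generation
        cascades[-1][-1].add(component)
        prev_time = time_s
    return cascades
```

For example, outages at 0 s and 30 s fall into the same generation, an outage at 200 s starts a second generation, and an outage at 10,000 s starts a new cascade event.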

Simulation models of cascading failures
As discussed in the previous section, the causes of cascading failures are diverse and uncertain, and cascading failures involve multi-timescale dynamics. Modeling cascading failures is therefore challenging, and many researchers have worked on the problem, proposing diverse models. In general, existing cascading failure models can be categorized into physical models and probabilistic models [36,45,46]. Physical models are established based on detailed power system networks and physical constraints. They differ from traditional power system tools, which are typically designed for specific purposes. For instance, one tool may focus on steady-state analysis to determine the optimal power flow solution under a particular loading condition, while others may be designed to evaluate transient stability or small-signal stability in response to specific disturbances. In contrast, physical models for cascading failures simulate progressive failures and their subsequent impacts on the system in a time-sequential manner. To characterize how cascading failures propagate, they model various factors such as automation, communication, EMS (energy management systems), protection relays, control and remedial actions, and operation modes, which are not necessarily power system components. Probabilistic models, on the other hand, are constructed offline based on assumed probability distributions or large amounts of historical or simulated event data.

OPA model
The OPA (ORNL-PSerc-Alaska) model is a DC optimal power flow-based tool developed by Oak Ridge National Laboratory, the Power Systems Engineering Research Center at the University of Wisconsin, and the University of Alaska [21,22]. The flowchart of the OPA model, shown in Figure 5, represents a general process for simulating cascading failures; the flowcharts of the other physical models are similar, with minor variations in algorithms. This model considers dynamics on two timescales: slow dynamics related to load growth and grid upgrades, and fast dynamics related to power redispatch following line outages or overloading. The power redispatch is performed by linear programming that minimizes load shedding. The OPA model was validated on the WECC (Western Electricity Coordinating Council) 1553-bus system [23]. The blackout statistics obtained from this model agreed reasonably well with WECC historical blackouts. However, the simulated probability distributions of line outages and cascade sizes did not match the utility data well.
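The fast-dynamics loop of OPA-type models is built on DC power flow. The sketch below is a simplified illustration under explicit assumptions, not the OPA implementation: it solves a DC power flow, trips every line whose flow exceeds its limit, and repeats until no overloads remain. It omits the linear-programming redispatch that minimizes load shedding, OPA's probabilistic line outages and the slow dynamics loop, and it assumes the network stays connected (an islanded network raises an error). All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular susceptance matrix (islanded network?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dc_power_flow(n_bus, lines, injections, slack=0):
    """Return {line: flow} for lines = {name: (from_bus, to_bus, reactance)}.

    injections[k] is net generation minus load at bus k (per unit); the
    slack bus absorbs any mismatch, so its own entry is ignored.
    """
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines.values():
        b[i][i] += 1.0 / x
        b[j][j] += 1.0 / x
        b[i][j] -= 1.0 / x
        b[j][i] -= 1.0 / x
    keep = [k for k in range(n_bus) if k != slack]
    theta_reduced = solve_linear([[b[r][c] for c in keep] for r in keep],
                                 [injections[k] for k in keep])
    theta = [0.0] * n_bus  # the slack bus angle is the 0 rad reference
    for k, value in zip(keep, theta_reduced):
        theta[k] = value
    return {name: (theta[i] - theta[j]) / x for name, (i, j, x) in lines.items()}


def overload_cascade(n_bus, lines, limits, injections, initial_outage):
    """Trip all overloaded lines generation by generation until none remain."""
    in_service = {name: data for name, data in lines.items() if name != initial_outage}
    generations = [[initial_outage]]
    while True:
        flows = dc_power_flow(n_bus, in_service, injections)
        tripped = sorted(name for name, f in flows.items() if abs(f) > limits[name])
        if not tripped:
            return generations
        generations.append(tripped)
        for name in tripped:
            del in_service[name]
```

A hand-written solver keeps the sketch dependency-free; a practical tool would use a sparse linear solver and would redispatch generation and shed load instead of simply stopping when the network islands.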
To overcome these limitations, an improved OPA model was proposed [24]. This model incorporates more practical and comprehensive factors of cascading failures, including dispatching, automation, communication, protective relays, and low-probability failures in control, operation modes, and planning. The distribution of blackout sizes generated by this model closely matches historical data. However, this model still relies on a DC optimal power flow (OPF) method, which, while efficient, ignores the influence of voltage deviations and reactive power on cascading failures.
To address the limitations of the DC-OPF-based OPA model, an AC-OPA model was developed [25]. This model replaces both the DC power flow and DC-OPF methods with AC-based algorithms and incorporates a voltage stability module, which enables cascading failures to be modeled with consideration of reactive power and voltage. However, it does not account for frequency effects or transient dynamics.
Later, an AC optimal power flow model considering frequency deviation (AC-OPFf) was introduced [26]. Within the AC-OPA model, this model adopts a dynamic load flow framework that takes into account the static frequency characteristics of loads and generators. By considering frequency, the model can represent frequency-related remedial controls, such as under-frequency load shedding and under/over-frequency protection. However, it is still a quasi-steady-state model and ignores transient dynamics.
To capture more detailed transient dynamics in cascading failures, an enhanced OPA model was developed [27]. This model incorporates transient dynamics through time-domain simulations. Although it can simulate more realistic sequences of cascading failures, it is much more time-consuming than the conventional OPA model.
The flowcharts of the OPA variants can be obtained by making several modifications to the flowchart of the original OPA model, as summarized below:
(1) Improved OPA model: (a) Assign a failure probability to the "Calculate OPF" step in the fast dynamics loop to simulate communication failures or EMS breakdowns. (b) Add a "DC-OPF" block in the slow dynamics loop when overloaded lines exist, to simulate the influence of operation modes and planning.
(2) AC-OPA model: Replace all the DC power flow and DC-OPF with the AC power flow and AC-OPF, respectively.
(3) AC-OPFf model: (a) Replace all the AC power flow and AC-OPF calculations with the dynamic load flow and the AC-OPF considering frequency, respectively. (b) Add under-frequency load shedding and generator frequency protection after the dynamic load flow calculation.
(4) Enhanced OPA model: Add a dynamic simulation model after each line outage in the fast dynamics loop.

Manchester model
The Manchester model was developed to calculate a security index, namely the expected cost of unscheduled outages [28,29]. This model is based on the AC power flow method and considers various factors, including hidden failures, generator instability, and weather conditions, which are modeled through different failure probabilities of electrical components. The disadvantage of this model is that its accuracy relies heavily on the accuracy of these failure probabilities, which are difficult to determine.

Hidden failure model
A hidden failure remains undetected while a system operates under normal conditions, but it is exposed by disturbances, leading to incorrect relay protection actions and unexpected line outages. Hidden failures are among the most important causes of cascading failures. To account for them, a hidden failure model based on the DC-OPF method was introduced [32−34]. In this model, hidden failures are represented by allowing exposed lines to trip incorrectly with a specified probability. The simple network shown in Figure 6 illustrates what exposed lines are: if line 3 is tripped due to disturbances, lines 1, 2 and 6 become exposed because they are connected to the tripped line. The probability that an exposed line trips incorrectly depends on its flow relative to the line flow limit. The probability remains low while the line flow is below the limit; it increases linearly from the point where the flow equals the limit until the flow reaches 1.4 times the limit, where it reaches its maximum value, and it remains constant as the flow increases further [35]. The model also considers the effect of repeated exposure: once an exposed line has tripped due to a hidden failure, its probability of tripping incorrectly is set to zero. This is reasonable because, in practice, the hidden failure of the exposed line is most likely to be fixed after the first exposure. Because the hidden failure model also relies on the DC-OPF method, it ignores the influence of voltage and reactive power on cascading failures. Furthermore, the probability function of hidden failures in this model is oversimplified.
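The exposure rule and the piecewise-linear tripping probability described above can be sketched as follows. This is an illustrative sketch, not the code of Refs. [32−35]: the low and maximum probabilities (`p_low=0.01`, `p_max=1.0`) are assumed values, lines are given as hypothetical `{name: (bus_a, bus_b)}` pairs, and the function names are ours.

```python
def hidden_failure_probability(flow, limit, p_low=0.01, p_max=1.0):
    """Probability that an exposed line trips incorrectly.

    Constant p_low up to the limit, linear rise between 1.0x and 1.4x the
    limit, constant p_max beyond 1.4x. p_low and p_max are illustrative.
    """
    ratio = abs(flow) / limit
    if ratio <= 1.0:
        return p_low
    if ratio >= 1.4:
        return p_max
    return p_low + (p_max - p_low) * (ratio - 1.0) / 0.4


def exposed_trip_probabilities(tripped, lines, flows, limits, hidden_tripped):
    """Return {line: incorrect-trip probability} for lines exposed by `tripped`.

    A line is exposed if it shares a bus with the tripped line. A line that
    has already tripped once due to a hidden failure gets probability 0.
    """
    ends = set(lines[tripped])
    probs = {}
    for name, (a, b) in lines.items():
        if name == tripped or not ends & {a, b}:
            continue  # the tripped line itself, or not adjacent to it
        if name in hidden_tripped:
            probs[name] = 0.0
        else:
            probs[name] = hidden_failure_probability(flows[name], limits[name])
    return probs
```

With these assumed values, an exposed line loaded to 1.2 times its limit has a tripping probability halfway between `p_low` and `p_max`.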

COSMIC model
The COSMIC (cascading outage simulator with multiprocess integration capabilities) model was developed to simulate the detailed dynamics of cascading failures [47]. This model describes system dynamics using hybrid differential-algebraic equations (DAEs), which comprise conventional DAEs describing transient dynamics and discrete equations representing the dynamics caused by protective relay actions. The model considers five protective schemes: over-current relays, temperature relays, distance relays, under-voltage load shedding, and under-frequency load shedding. Furthermore, loads are represented by a static "ZIPE" model [48], which can describe constant impedance (Z), constant current (I), constant power (P) and exponential (E) loads, as well as any combination of them. The COSMIC model simulates cascading failures through time-domain simulation, solving the hybrid DAEs with the trapezoidal rule. Its results agree closely with those of a DC power flow-based model in the early stage of cascading failures, but the two models differ significantly in the later stage, when the evolution of the cascading failure is mainly driven by rapid transient dynamics. In addition, COSMIC simulation results show that load models have a strong influence on cascade size, so accurate load modeling is crucial for cascading failure simulation. Because the COSMIC model relies on time-domain simulation to obtain detailed dynamics, it is inefficient for generating large amounts of cascade data for statistical analysis.
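To see why the load model matters, a ZIPE-type static load can be written as a weighted sum of powers of the per-unit voltage, with shares for the constant-impedance (voltage squared), constant-current (linear), constant-power (constant) and exponential parts. The exact formulation in [48] may differ, so this form, the exponent value `alpha=1.5` and the function name `zipe_load` are assumptions for illustration only.

```python
def zipe_load(p0, v, v0=1.0, a_z=0.0, a_i=0.0, a_p=1.0, a_e=0.0, alpha=1.5):
    """Active power drawn by a ZIPE-type load at voltage v.

    a_z, a_i, a_p, a_e are the shares of constant-impedance, constant-current,
    constant-power and exponential load and must sum to 1. alpha is the
    exponent of the exponential part (an arbitrary illustrative value).
    """
    if abs(a_z + a_i + a_p + a_e - 1.0) > 1e-9:
        raise ValueError("ZIPE shares must sum to 1")
    r = v / v0  # per-unit voltage ratio
    return p0 * (a_z * r ** 2 + a_i * r + a_p + a_e * r ** alpha)
```

At 0.9 p.u. voltage, a purely constant-impedance load draws 81% of its nominal power while a constant-power load still draws 100%, so the same voltage dip relieves the network very differently depending on the load composition.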

Dynamic PRA model
The dynamic probabilistic risk analysis (PRA) model was proposed to cover both the slow and fast stages of cascading failures [30,43]. This model divides the evolution of cascading failures into two phases: a slow cascade process mainly driven by thermal dynamics and a fast cascade process mainly driven by rapid transient dynamics. In this model, thermal dynamics, including the influence of climatic conditions, are modeled differently for overhead lines, underground cables, and transformers. Because the potential outage sequences for the various initial causes of cascading failures cannot be determined analytically, the Monte Carlo method is used to identify dangerous scenarios that could lead to a blackout. Simulation results demonstrated that this model can effectively identify dangerous scenarios that might be overlooked by models that do not account for the temperature evolution of components. However, the computational burden is substantial because time-domain simulations are required to capture fast transient dynamics. A further limitation of this model is that precise failure probabilities influenced by thermal factors are difficult to obtain, even when actual data are available.
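Why thermally driven failures unfold over minutes rather than milliseconds can be illustrated with a first-order lumped heat balance for a conductor, C dT/dt = R I^2 - G (T - T_amb). This simple model is an assumption made here for illustration and is not the thermal model of [30,43], which distinguishes lines, cables and transformers and includes climatic effects. Its solution approaches the steady-state temperature T_ss = T_amb + R I^2 / G exponentially with time constant tau = C / G, which gives a closed-form time until a temperature limit is reached. The function name and parameters are hypothetical.

```python
import math


def time_to_thermal_trip(current, t0, t_amb, t_max, r, g, c):
    """Time until a conductor carrying `current` reaches temperature t_max.

    Lumped first-order model (illustrative): c * dT/dt = r * I**2 - g * (T - t_amb),
    with r the resistance, g a heat-loss coefficient and c a heat capacity,
    all in consistent units. Returns math.inf if the limit is never reached.
    """
    if t0 >= t_max:
        return 0.0
    t_ss = t_amb + r * current ** 2 / g   # steady-state temperature
    if t_ss <= t_max:
        return math.inf
    tau = c / g                           # thermal time constant
    # Invert T(t) = t_ss + (t0 - t_ss) * exp(-t / tau) at T(t) = t_max.
    return tau * math.log((t_ss - t0) / (t_ss - t_max))
```

With a time constant of ten minutes, for example, an overload whose steady-state temperature lies well above the limit still takes several minutes to trip the line, which is consistent with the slow stage described earlier.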

Multi-timescale quasi-dynamic model
Ref. [31] established a multi-timescale quasi-dynamic model that balances efficiency and accuracy. This model divides the dynamics of the cascading process into three distinct timescales: short, medium, and long. The short timescale covers emergency load shedding, as well as the tripping of transmission lines or generators due to faults or overloading. The medium timescale covers line tripping triggered by factors such as tree contacts or overheating, generator outages due to over-excitation or under-excitation, and power flow redispatch. The long timescale accounts for load variations. The model simulates cascading failures through mid-term dynamic simulations conducted over segmented time intervals, with initial states determined by the long-term dynamics; short-term simulations are performed when specific events are triggered. By incorporating time-related information, the multi-timescale quasi-dynamic model achieves higher accuracy than quasi-steady-state cascading simulation models that neglect time effects. The model was applied to simulate the "August 14, 2003" blackout in the United States and Canada and closely reproduced the actual progression. It has also been employed in the risk assessment of cascading failures using a Markovian tree search approach [49]. However, a drawback of this model is that the total duration of the cascading failure must be estimated in order to divide time into intervals, and load variations within each interval are ignored.

Software tools for cascading failures
In addition to the academic physical models of cascading failures mentioned above, several industrial software tools are available for cascading failure analysis.
(1) DCAT: The dynamic contingency analysis tool (DCAT) was developed by Pacific Northwest National Laboratory [50]. This tool is an industry-grade open platform for analyzing extreme events and potential cascading failures. It has three typical features [51]. First, it conducts hybrid simulations that integrate both steady-state and dynamic analysis. Second, it models protection schemes for generating units, transmission lines and loads in the dynamic simulation. Third, it models both manual and automatic corrective actions, such as generator tripping, load shedding and system reconfiguration, during the post-dynamic steady-state simulation using PSS/E. Because this tool conducts dynamic simulation, it is computationally expensive [52].
(2) ASSESS: ASSESS is a commercial tool developed jointly by RTE in France and National Grid in the UK [53]. It provides a flexible and comprehensive platform for conducting simulations under uncertainty and for analyzing extensive datasets, which allows essential insights to be extracted to ensure the secure operation of power systems. This is achieved by integrating diverse functionalities, including the security-constrained OPF methodology [54], the quasi-steady simulation tool "Astre" [55,56], the time-domain simulation tool "Eurostag" [57], statistical analysis tools and a variety of sampling functions, all within a unified software environment. Although ASSESS is designed for a wide range of studies, including long-term planning, operational planning and system security assessment, it can also be used to simulate cascading sequences and analyze the results with its built-in statistical analysis tools. Nevertheless, cascading failure simulation with ASSESS requires careful configuration, so professional training is essential for this use.
(3) CAT: CAT (cascade analysis tool) is a commercial module within the TRANSMISSION 2000 suite [58], developed by Commonwealth Associates, Inc., USA. This tool performs cascade analysis in the steady-state domain using AC power flow calculations, with the aim of assessing the grid's vulnerability to cascading failures. It simulates cascade sequences by evaluating whether post-fault states fall within user-defined thresholds, namely a thermal overload threshold, a low voltage threshold, and a voltage deviation threshold. In each generation of a cascade sequence, only one component is disconnected: either the most severely overloaded component or the load connected to the bus with the lowest voltage. If the power flow of the post-fault network diverges, the load connected to the bus with the lowest voltage is tripped and the power flow of the updated network is recalculated; if divergence persists, the simulation is terminated. The simulation also terminates if any of the following conditions is met: no load is connected to the identified lowest-voltage bus; the subsequent load shedding exceeds the allowable maximum load shedding amount; or no violations are detected. Furthermore, a severity index for each cascade sequence is determined from the cumulative load loss of contingencies that result in violations. Because this tool relies on steady-state analysis, it does not consider the transient dynamics of cascading failures, and the assumption that only one component trips in each generation may not always hold.
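CAT's internal implementation is not public in this description, so the following is only a hypothetical reconstruction of the generation-by-generation rules listed above, not CAT's algorithm or API. The AC power flow is abstracted as a user-supplied callback `solve(out, loads)` (hypothetical) that returns `None` on divergence, or a pair of dicts giving component loading as a fraction of rating and bus voltage in p.u. The sketch assumes overloads take priority over low voltages when both occur, uses the last converged voltages to pick the bus after a divergence, and omits the voltage deviation threshold.

```python
def cat_style_cascade(solve, loads, thermal_limit=1.0, v_low=0.9,
                      max_shed=float("inf")):
    """Return (disconnection sequence, cumulative load loss) for one cascade."""
    loads = dict(loads)          # bus -> remaining load (MW)
    out = set()                  # disconnected components
    sequence = []
    shed = 0.0
    last_voltages = None

    def shed_at_lowest_voltage(voltages):
        """Trip the load at the lowest-voltage bus; False means terminate."""
        nonlocal shed
        bus = min(voltages, key=voltages.get)
        if loads.get(bus, 0.0) <= 0.0 or shed + loads[bus] > max_shed:
            return False         # no load at that bus, or shedding cap exceeded
        shed += loads.pop(bus)
        sequence.append(("shed", bus))
        return True

    result = solve(out, loads)
    while True:
        if result is None:       # power flow diverged
            if last_voltages is None or not shed_at_lowest_voltage(last_voltages):
                break
            result = solve(out, loads)
            if result is None:   # still diverges: terminate
                break
        loading, voltages = result
        last_voltages = voltages
        overloaded = {c: x for c, x in loading.items()
                      if c not in out and x > thermal_limit}
        if overloaded:           # exactly one disconnection per generation
            worst = max(overloaded, key=overloaded.get)
            out.add(worst)
            sequence.append(("trip", worst))
        elif min(voltages.values()) < v_low:
            if not shed_at_lowest_voltage(voltages):
                break
        else:
            break                # no violations
        result = solve(out, loads)
    return sequence, shed
```

The returned cumulative load loss plays the role of the severity index; keeping the power flow behind a callback separates the sequencing rules from the network solver.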
(4) TransCARE: TransCARE (transmission contingency and reliability evaluation) is a commercial tool developed by EPRI [59] as an enhanced version of TRELSS (transmission reliability evaluation of large-scale systems) [60]. TransCARE employs a quasi-steady-state model to simulate the progression of cascading failures. Compared with traditional OPF-based cascading failure simulators, this software models the initial events of cascading failures at the protection system level, allowing the customization of various thresholds, such as the overload percentage of transmission lines and the low voltage threshold for load shedding. TransCARE also considers the influence of relay protection locations on the evolution of cascading failures, allowing both automatic and user-defined breaker placements. Because it is a quasi-steady-state model, it overlooks the rapid transient dynamics of cascading failures. Another limitation is that it is a deterministic simulation approach, which ignores the uncertainties arising from inherent system failures that contribute to the complexity and non-uniqueness of cascading failure propagation.
(5) PCM and PCMTS: PCM (potential cascading modes) is a commercial tool integrated within the POM (physical and operational margins) suite, developed by V&R Energy and EPRI [61]. POM is contingency analysis software that employs the full Newton method to solve the AC power flow equations [62]. Within the POM framework, the PCM tool conducts steady-state analysis and simulates sequences of cascading failures. These sequences consist of component failures caused by various factors, such as transmission line tripping due to overloading, load shedding triggered by low voltage, and generator tripping resulting from voltage violations. PCM also includes evaluation and visualization functions that quantify and rank the impacts of cascading failures, providing insight into their evolution and suggesting available mitigation strategies. Because PCM is based on steady-state analysis, it ignores the rapid transient dynamics of cascading failures. To address this, a complementary tool named PCMTS (potential cascading modes-transient stability) has been incorporated into the POM environment [63]. PCMTS analyzes cascading failures through time-domain simulation, considering diverse types of relays, including over-current, distance, under-voltage and under-frequency relays. Although PCMTS improves accuracy by considering transient dynamics, it is computationally intensive. In addition, a common limitation of both PCM and PCMTS is that their deterministic modeling approach does not account for the uncertainties of cascading failures.

Probabilistic models
When cascading failures are simulated using physical models that reflect the actual topology of a power grid, the simulation time depends heavily on the complexity of the grid. In contrast, high-level probabilistic models can be simulated more efficiently because they do not require physical details of power grids. They can be constructed offline from a large amount of historical or simulated cascading failure data and then assist system operators in real-time decision making.

CASCADE model
The CASCADE model is a probabilistic model that only relies on the loading conditions of system components [45,46] .It starts with multiple identical components initialized under random loading conditions within a defined range.The cascade is triggered by introducing an initial load disturbance to all components.If a component's load exceeds the loading limit, it fails, leading to a fixed load increase for the remaining components.As more components fail, the stress on the remaining components increases, making further failures more likely.The CASCADE model is straightforward, and it offers an analytical probability distribution of cascade sizes, quantified by the number of failed components [64] .However, due to its lack of consideration for physical details, it can only provide general insights into cascading failures, such as the impact of loading conditions on cascade sizes.
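The load-transfer rule described above lends itself to a short Monte Carlo sketch. The Python code below (assuming NumPy) follows the description in this section:

- n identical components draw initial loads uniformly from [l_min, l_max].
- Every component receives the initial disturbance d.
- Any component above the failure limit l_fail fails.
- Each new failure adds a fixed load p to every surviving component.

The parameter names and the normalization l_fail = 1 are illustrative assumptions. The sketch does not reproduce the analytical distribution of [64]. Instead, it estimates the cascade-size distribution empirically from repeated runs.

```python
import numpy as np

def cascade_run(n, l_min, l_max, d, p, l_fail=1.0, rng=None):
    """One run of a CASCADE-style model; returns the number of failed components."""
    rng = np.random.default_rng() if rng is None else rng
    # identical components with random initial loads, plus disturbance d on each
    load = rng.uniform(l_min, l_max, size=n) + d
    failed = np.zeros(n, dtype=bool)
    while True:
        newly = ~failed & (load > l_fail)    # survivors pushed past the limit
        if not newly.any():
            return int(failed.sum())
        failed |= newly
        load[~failed] += p * newly.sum()     # fixed transfer p per new failure

def cascade_size_distribution(runs, n, l_min, l_max, d, p, seed=0):
    """Empirical probability of 0, 1, ..., n failed components over repeated runs."""
    rng = np.random.default_rng(seed)
    sizes = [cascade_run(n, l_min, l_max, d, p, rng=rng) for _ in range(runs)]
    return np.bincount(sizes, minlength=n + 1) / runs
```

For example, `cascade_size_distribution(1000, 100, 0.6, 1.0, 0.0004, 0.004)` returns an array whose k-th entry is the fraction of runs that ended with k failed components.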

Branching process model
The branching process, a stochastic approach used to model reproduction, has been applied to study cascading failures, leading to the branching process model [65]. In this model, the number of initially failed components in generation 0 is drawn from a Poisson distribution. Subsequently, each failed component independently causes a Poisson-distributed number of failures in the next generation, so the number of failed components in every following generation is also Poisson-distributed. The cascade terminates when the number of failed components in a generation reaches zero or when all the components have failed. Therefore, the branching process model can be viewed as simulating a cascading sequence by producing a series of random integers following Poisson distributions. The Poisson parameters are typically estimated using a maximum likelihood estimator based on historical data or simulation data from other cascade models [66−68]. The branching process model can effectively approximate the distribution of blackout sizes, measured by the number of failed components [44, 69−71]. Owing to its straightforward procedure, this model is highly efficient. However, similar to the CASCADE model, the branching process model ignores the detailed dynamics of cascading failures because it lacks specific physical information about the power network. Consequently, it cannot provide information regarding the propagation paths of cascading failures.
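The generation-by-generation procedure can be sketched in Python with NumPy. The sketch assumes the common Galton-Watson form: a Poisson mean θ (`theta`) for generation 0, a Poisson mean λ (`lam`) for the failures caused by each failed component, and a cap of n_max components. For estimation, it uses the ratio of the total failures in generations 1, 2, ... to the total failures in all generations except the last. This ratio is the maximum likelihood estimate of λ when the full generation counts are observed. Whether [66−68] use exactly this form is not stated here, so treat it as an illustrative choice. All names are hypothetical.

```python
import numpy as np

def simulate_branching(theta, lam, n_max, rng):
    """Return failures per generation for one Galton-Watson cascade."""
    gens = [min(int(rng.poisson(theta)), n_max)]  # generation 0
    total = gens[0]
    while gens[-1] > 0 and total < n_max:
        # k failures each causing Poisson(lam) failures -> Poisson(lam * k) in total
        nxt = min(int(rng.poisson(lam * gens[-1])), n_max - total)
        gens.append(nxt)
        total += nxt
    return gens

def estimate_lambda(cascades):
    """MLE of the mean number of failures caused per failure."""
    children = sum(sum(g[1:]) for g in cascades)  # failures in generations >= 1
    parents = sum(sum(g[:-1]) for g in cascades)  # failures that had a next generation
    return children / parents
```

As a quick self-check, simulate many cascades with known `theta` and `lam` and feed them back to `estimate_lambda`. The result should approximately recover `lam`.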

Interaction model
The interaction model is a data-driven probabilistic model [36−38]. It is derived from a large number of simulated or historical cascades, under the assumption that component failures in the next generation are determined only by component failures in the current generation.
Supposing the total number of components is n, an n × n matrix $\mathbf{A}$ is constructed from the cascade data, where each element $a_{ij}$ represents the number of times that the failure of component i is followed by the failure of component j in the next generation. However, this simple counting treats the failures of all components in the current generation as potential causes of each failure in the next generation. Consequently, it tends to overestimate the interactions between failed components. To address this issue, a corrective matrix, denoted as $\mathbf{A}'$, is constructed, whose element $a'_{ij}$ is given by Eq. (1). From Eq. (1), it is evident that the failure of component j in the next generation is not determined by the failures of all components in the current generation but rather by the failures of certain critical components. Based on $\mathbf{A}'$, an interaction matrix $\mathbf{B}$ is defined, and its element is formulated as follows:

$$b_{ij} = \frac{a'_{ij}}{N_i} \tag{2}$$

where $N_i$ is the total number of times that component i fails in all cascade events, and thus $b_{ij}$ indicates an empirical probability that the failure of component j is caused by the failure of component i.
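The counting steps above can be sketched in Python (assuming NumPy). A cascade is assumed to be a list of generations, each generation a list of component indices. Because Eq. (1) is not reproduced in this text, the correction step uses a hypothetical attribution rule chosen only for illustration. Each failure of component j is credited to the single previous-generation component with the largest naive count a_ij, with ties going to the first listed. The cited works [36−38] may define the correction differently. The final step applies b_ij = a'_ij / N_i as in Eq. (2).

```python
import numpy as np

def interaction_matrices(cascades, n):
    """Build naive counts A, corrected counts A', and B with b_ij = a'_ij / N_i."""
    A = np.zeros((n, n))
    N = np.zeros(n)
    for cascade in cascades:
        for generation in cascade:
            for i in generation:
                N[i] += 1                   # N_i: total failures of component i
        for cur, nxt in zip(cascade, cascade[1:]):
            for i in cur:
                for j in nxt:
                    A[i, j] += 1            # every current failure counted as a cause
    # Hypothetical correction rule (illustrative, not necessarily Eq. (1)):
    # credit each next-generation failure of j to ONE current-generation
    # component, namely the one with the largest naive count A[i, j].
    A_corr = np.zeros((n, n))
    for cascade in cascades:
        for cur, nxt in zip(cascade, cascade[1:]):
            if not cur:
                continue
            for j in nxt:
                i_best = max(cur, key=lambda i: A[i, j])
                A_corr[i_best, j] += 1
    # Eq. (2); rows of components that never failed stay zero
    B = np.divide(A_corr, N[:, None], out=np.zeros_like(A_corr), where=N[:, None] > 0)
    return A, A_corr, B
```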
The interaction matrix indicates interactions between component failures, and it can be visualized by an interaction graph, which differs from the actual topology of the grid. In this graph, components are depicted as vertices, and nonzero elements of the interaction matrix are depicted as directed links representing the causal relationships among component failures. For example, a link from vertex i to vertex j indicates that the failure of component i leads to the failure of component j. Each link has a weight obtained by calculating the expected number of times that component failures propagate through this link; this weight represents the contribution of the link to the propagation of cascading failures in the system. Moreover, the key links and key components that have the most significant impact on the propagation of cascading failures in the system are identified. Key links are defined as those with substantial weights, while key components are identified as vertices with significant out-strength, which is the sum of the weights of all links emanating from a vertex.
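Given link weights, selecting key links and key components reduces to sorting. The sketch below assumes a hypothetical weight matrix `W`, in which W[i, j] is the weight of link i→j, however that weight was obtained. It computes each vertex's out-strength as the row sum of `W`. It then returns the k heaviest links and the k vertices with the largest out-strength. The cutoff "top k" is an illustrative stand-in for "substantial weight".

```python
import numpy as np

def key_links_and_components(W, k):
    """Return the k heaviest links, the k vertices with largest out-strength, and out-strengths."""
    W = np.asarray(W, dtype=float)
    out_strength = W.sum(axis=1)              # sum of weights of links leaving each vertex
    links = [(W[i, j], i, j) for i, j in zip(*np.nonzero(W))]
    links.sort(key=lambda t: t[0], reverse=True)  # heaviest first, stable on ties
    key_links = [(int(i), int(j)) for _, i, j in links[:k]]
    key_components = [int(i) for i in np.argsort(-out_strength, kind="stable")[:k]]
    return key_links, key_components, out_strength
```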
Later, the single-layer interaction graph was extended to a multi-layer interaction graph [39]. This multi-layer interaction graph model offers insights into the propagation of cascading failures from three distinct aspects: the number of line failures, the load-shedding amount, and the propagated electrical distance. Other data-driven probabilistic models similar to interaction graphs include influence graph models [40,41].
Many mitigation strategies for cascading failures have been proposed based on the interaction models [72−75] .The basic idea is to determine the vulnerable components and the most likely propagation path using the interaction models.Such insights enable operators to take targeted control actions, such as load shedding or islanding to mitigate the propagation of outages.Therefore, interaction models can provide effective ways to understand the propagation patterns of cascading failures from cascading failure data and predict the most likely propagation scenarios, which can guide operators in taking prompt and effective control actions to mitigate the propagation of cascading failures.However, a drawback of the interaction models is that they assume a one-to-one causal relationship between component failures, which may not accurately represent practical situations where the causal relationship can be one-to-many, many-to-one, or even many-to-many.
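One illustrative way to extract a "most likely propagation path" from an interaction matrix is to treat each b_ij as the probability of propagation along link i→j. The path with the largest product of these probabilities is then a shortest path under the weights -log b_ij. This is an assumption made for illustration, not necessarily the procedure used in [72−75]. It requires 0 < b_ij ≤ 1, so that Dijkstra's algorithm sees non-negative weights, and it treats propagation along successive links as independent.

```python
import heapq
import math

def most_likely_path(B, source, target):
    """Dijkstra on -log(b_ij): path maximizing the product of link probabilities."""
    n = len(B)
    dist = [math.inf] * n
    prev = [None] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue                      # stale heap entry
        if i == target:
            break
        for j in range(n):
            p = B[i][j]
            if p > 0:
                nd = d - math.log(p)      # non-negative because p <= 1
                if nd < dist[j]:
                    dist[j] = nd
                    prev[j] = i
                    heapq.heappush(heap, (nd, j))
    if math.isinf(dist[target]):
        return None, 0.0                  # target unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[target])
```

For `B = [[0, 0.9, 0.5, 0], [0, 0, 0, 0.5], [0, 0, 0, 0.8], [0, 0, 0, 0]]`, the path 0→1→3 (probability 0.45) is preferred over 0→2→3 (probability 0.4).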

Comparison of physical and probabilistic models
The models introduced above are compared in Table 2. In summary, physical models are built on electrical networks and power system laws and can be used to simulate power system dynamics and the evolution of cascading failures. DC or AC power flow-based models, characterized as quasi-steady-state models, neglect transient dynamics, making them more efficient but less accurate than dynamic models. Within this category, DC power flow-based models are more efficient yet less accurate than AC power flow-based models. Probabilistic models, on the other hand, are constructed from assumed probability distributions or data-based statistical analysis and ignore detailed electrical networks and power system laws. Specifically, the CASCADE model and the branching process model are constructed from assumed probability distributions. While these models are more efficient than all the physical models, their application is mainly limited to approximating the distribution of cascade sizes. Interaction models are constructed from statistics of historical or simulated cascading failure data. Their construction can be time-consuming, but once established offline, they can be used online to efficiently predict the critical components of cascading failures and the most probable cascade path, offering valuable support for operators' online decision-making.

Other models
In recent years, machine learning tools have been widely used in the study of cascading failures to reduce computational burdens. Ref. [76] employed the support vector machine (SVM) algorithm to predict whether a cascading failure will occur given an initial condition. Ref. [77] constructed an identification model using a deep learning framework to determine the vulnerable set of cascading failures. Ref. [78] proposed a risk assessment strategy for cascading failures based on deep reinforcement learning by formulating cascading failures as Markov decision processes. Ref. [79] proposed a cascading failure screening scheme based on a deep convolutional neural network and depth-first search. Ref. [80] utilized a graph convolutional network to identify critical cascading failures, employing a layer-wise relevance propagation algorithm to uncover the reasons for the predicted results. Ref. [81] developed a hybrid machine learning approach integrating a random forest classifier and a regressor to analyze the vulnerability to cascading failures. Ref. [82] proposed an approach based on a deep convolutional generative adversarial network to determine the interactions between failed components in cascading failures. Ref. [83] developed a dual-path convolutional neural network classifier to identify the types of cascading failures in systems with a high proportion of renewable energy. Machine learning-based models are computationally efficient. However, most of these models focus on classifying whether a cascading failure will occur or on assessing its impacts, rather than on simulating cascading failure sequences. These models have three notable drawbacks: (1) they need a large amount of data, and model accuracy depends heavily on data quality; (2) hyperparameter tuning for neural network-based models during training can be time-consuming, especially for high-dimensional networks; (3) they do not provide detailed insights into the mechanisms of cascading failures.

Benchmarking of simulation models of cascading failures
As introduced above, various simulation models of cascading failures have been proposed.Benchmarking is crucial for verifying the effectiveness of these models.The benchmarking of simulation models for cascading failures is primarily conducted through three approaches, as summarized in Table 3 [84−87] .

Challenges and future research directions
This section explores the challenges of simulation models of cascading failures, while also presenting the potential directions to address these challenges.

Traditional challenges
One significant traditional challenge is maintaining a good balance between accuracy and efficiency. Most existing physical models primarily focus on capturing cascading failure dynamics within the quasi-steady-state domain while ignoring fast transient dynamics, which play a critical role in the evolution of cascading failures. Therefore, these quasi-steady-state models are efficient but lack accuracy. Although some models attempt to incorporate transient dynamics and relay protection through time-domain simulations, this approach is extremely time-consuming. A promising research direction is to develop hybrid models that achieve an optimal balance between accuracy and efficiency [88]. Furthermore, uncertainties in the evolution of cascading failures present an additional challenge for cascading failure modeling [89−91]. These uncertainties are induced by various factors, including the unpredictability of initial events, variations of system states, and potential hidden failures within the system. Deterministic models ignore these stochastic factors entirely, while models that do consider them often address only a subset, lacking comprehensiveness.

(2) Considering practical factors like dispatching, relay protection, and low-probability failures in control.
The model accuracy relies heavily on the accuracy of failure probabilities, which are not easy to determine.
Professional training is needed.
None
Having an analytical formula for the probability distribution of cascade sizes.
(1) Assuming all components to be identical. (2) Assuming the failure of each component only depends on its loading level.
Branching process model [65−67]: (2) Approximating the distribution of cascade sizes effectively.
Mainly focusing on predicting the final cascade sizes while ignoring the details in failure propagation.
Interaction model [36−41]: The one-to-one causal relation between component failures is not practical.

Approach: Key features and limitations
Sensitivity analysis: It can assess the impact of assumptions and associated parameters on the outcomes of a simulation model by adjusting one parameter at a time. If any assumption or parameter significantly influences the outcomes, the model may be deemed unreliable. Note that this method can detect sensitive assumptions or parameters but cannot reveal their true values; as a result, the model may still not accurately reproduce historical cascading events.
Comparing results to real data: It is employed to assess a simulation model's ability to replicate historical cascading events. However, this approach demands extensive real data and records of historical cascading events, which are often unavailable or difficult to acquire.
Cross-validation: It gauges differences by comparing the statistics of a simulation model's outcomes with those of other models, identifying parameters that significantly contribute to the differences or that require in-depth study. However, the success of cross-validation depends heavily on the models chosen for comparison.

Emerging challenges from the integration of renewable energy sources
With the increasing penetration of renewable energy sources, new challenges are emerging. One challenge arises from the stochastic nature of power systems, which is further amplified by the significant influence of uncertain weather conditions on renewable energy sources such as solar and wind power [92]. Consequently, future cascading failure models should address not only the stochastic aspects caused by conventional factors, e.g., load changes and contingencies, but also the uncertainties introduced by the integration of renewable energy sources.
Another challenge stems from the lower inertia and reduced reactive power reserves of renewable energy sources compared with traditional synchronous machines. This difference may result in distinct transient dynamics, such as larger frequency and voltage deviations [93,94], potentially leading to cascading failures of different sizes. Therefore, future modeling of cascading failures should account for the diverse responses to the same initial disturbance at different penetration levels of renewable energy sources.

Conclusions
This paper provides an overview of simulation models for cascading failures, categorizing them into physical models and probabilistic models. Other emerging models, such as machine learning-based models, are also introduced. Physical models allow for simulation of power system dynamics and responses and offer in-depth insights into the mechanisms of cascading failures, but they can be time-consuming for large-scale power grids. Probabilistic models ignore detailed power system information but enable more efficient simulations when only high-level probabilistic analysis is required. The probabilistic models and machine learning-based models can be derived from large amounts of simulation data generated by physical models. Therefore, physical models are of great importance for simulating cascading failure sequences and understanding the mechanisms of cascading failures. To verify the effectiveness of simulation models, benchmarking approaches are also discussed.
As the penetration of renewable energy sources increases, the system exhibits more uncertainties and distinct transient dynamics compared to the traditional power system.This presents new challenges in modeling cascading failures.Therefore, future research should focus on developing cascading failure models that balance efficiency and accuracy while also considering the influence of renewable energy sources.

(1) Considering a variety of cascading failure factors by modeling components with different failure probabilities.

Considering exposed times of hidden failures. (1) Ignoring voltage and reactive power effects on cascading failures.

(1) Modeling a wide range of protective schemes. (2) Considering different load models. Computationally expensive.

(1) Constructing a two-level model by decomposing the cascade process into two phases. (2) Establishing different thermal failure models with consideration of climatic conditions. (3) Considering failure probabilities with respect to the temperature evolution. Computationally expensive.
Providing a severity index for each cascade sequence. (1) Each cascade generation only allows one component to be tripped.
(1) PCM has evaluation and visualization functions. (2) PCMTS considers diverse types of relays. Ignoring uncertainties.

(1) Simulating a cascading sequence efficiently by yielding a sequence of random integers following the Poisson distribution.

(1) It is a high-level probabilistic model. (2) Enabling the prediction of the most probable cascade path and online decision-making support.