Power System Resilience: Current Practices, Challenges, and Future Directions

The frequency of extreme events (e.g., hurricanes, earthquakes, and floods) and man-made attacks (cyber and physical attacks) has increased dramatically in recent years. These events have severely impacted power systems, with consequences ranging from long outage times to the destruction of major equipment (e.g., substations, transmission lines, and power plants). This calls for developing control and operation methods and planning strategies to improve grid resilience against such events. The first step toward this goal is to develop resilience metrics and evaluation methods to compare planning and operation alternatives and to provide techno-economic justifications for resilience enhancement. Although several power system resilience definitions, metrics, and evaluation methods have been proposed in the literature, they have not been universally accepted or standardized. This paper provides a comprehensive and critical review of current practices in power system resilience metrics and evaluation methods and discusses future directions and recommendations to contribute to the development of universally accepted and standardized definitions, metrics, evaluation methods, and enhancement strategies. This paper thoroughly examines the consensus on the power system resilience concept provided by different organizations and scholars, as well as existing and currently practiced resilience enhancement methods. Research gaps, associated challenges, and potential solutions to existing limitations are also provided.


I. INTRODUCTION
Power system resilience evaluation and enhancement methods have been gaining significant momentum. The term "resilience" in power systems has several attributes, ranging from the ability of a power system to "resist" and "recover" from a disrupting event to the ability to proactively respond to potential disrupting events and newly emerging threats [1]-[7]. Although several power system resilience definitions, metrics, evaluation methods, and enhancement strategies have been proposed, there are still no standardized definitions and metrics to measure the resilience of power systems and evaluate potential solution alternatives [1]. Therefore, there is an urgent need for a critical review of current practices, challenges, and research gaps, and for comprehensive, concrete, and constructive recommendations and suggestions to contribute to developing universally accepted definitions, metrics, and evaluation methods.
The associate editor coordinating the review of this manuscript and approving it for publication was Ravindra Singh.
Several research papers have reviewed resilience definitions, metrics, and evaluation and enhancement methods [5]-[7]. The work presented in [5] reviews engineering resilience definitions, the differences between resilience and reliability, and adverse weather events and their impacts on power systems. The paper also discusses the development of resilience assessment methods and an enhancement framework. In [6], the authors review resilience-related definitions, a taxonomy of known, unknown, and unknowable extreme events, the impact of resilience on power systems, and resilience enhancement methods. The paper also discusses a resilience assessment framework and identifies and classifies effective strategies for resilience improvement according to four main criteria: preventive, corrective, restorative, and multifaceted. The work presented in [7] reviews the role of microgrids in power system resilience enhancement and presents several resilience enhancement methods based on dynamic microgrid formation.
Although these review papers provide a good overview of existing resilience definitions and metrics, they neither provide a critical and comprehensive review nor discuss challenges and potential future directions that would contribute to developing standardized definitions and metrics. They mostly either tackle a specific type of system, such as microgrids, or focus on listing definitions, metrics, and evaluation and enhancement methods and comparing them with power system reliability. Also, these papers do not discuss resilience evaluation criteria, optimization methods used for resilience evaluation and enhancement, solution algorithms, or the modeling of system components and extreme events (weather-related and man-made) for both resilience evaluation and enhancement. Therefore, a review paper that addresses these gaps will help the power and energy community develop universally accepted and standardized resilience definitions and metrics and establish a framework for resilience evaluation and enhancement.
This paper provides a critical and comprehensive review of current practices in power system resilience metrics, evaluation methods, and enhancement strategies. It also reviews optimization methods used for power system resilience evaluation and enhancement, as well as system, event, and failure modeling approaches. This paper is unique in the sense that it (1) tackles several aspects of power system resilience, including transmission and distribution system (DS) aspects, operation and planning practices, and deterministic and probabilistic resilience evaluation and enhancement; (2) provides a critical review of power system resilience, including definitions, metrics, evaluation methods, enhancement strategies, optimization methods used for resilience evaluation and enhancement, and system, event, and failure modeling approaches; and (3) discusses future directions and recommendations for developing resilience metrics, evaluation methods, and enhancement strategies. Therefore, this paper will contribute to the ongoing efforts of several entities to develop universally accepted and standardized power system resilience definitions, metrics, evaluation methods, and enhancement strategies.
The remainder of the paper is organized as follows. Section II discusses extreme events and their impacts on power system resilience. Sections III to IX review the state of the art and discuss current and future challenges in developing resilience metrics, evaluation methods, and enhancement strategies. The prior art in the area of power system resilience can be categorized as follows: definitions, enhancement strategies, evaluation methods, metrics and criteria, optimization methods for resilience enhancement and evaluation, and modeling of extreme events, failures, and system components. Section XI provides concluding remarks. Fig. 1 provides the framework of the paper.

II. IMPACTS OF EXTREME EVENTS ON POWER SYSTEM RESILIENCE
The frequency of extreme natural events (e.g., hurricanes, earthquakes, and floods) has been increasing rapidly. For example, the average annual number of disaster events in the United States from 2014 to 2018 is more than double the average annual number of disaster events from 1980 to 2018 [8]. Fig. 2 shows the number of disaster events in the United States from 1980 to 2019 that exceeded one billion dollars in losses.
Extreme weather events and other natural disasters have led to large blackouts and major destruction of power grids, resulting in economic losses and, more importantly, long outage durations. An ice storm in China in 2008 caused a power outage for 200 million people, and the direct cost of the event was estimated at more than 2.2 billion dollars [9]. The Great East Japan Earthquake (GEJE) and subsequent tsunami of March 2011 caused 8.5 million customers to lose power supply [10]. Superstorm Sandy in October 2012 caused over 8 million customers across 15 states in the United States to lose power [11]. Hurricane Irene in 2011 left 6.5 million people without power [11]. Hurricane Harvey in 2017 caused power outages for more than 2 million customers [12]. A fierce storm in Australia in 2016 caused power outages for 1.7 million people [13]. A windstorm in Canada in 2015 caused power outages for more than 710 thousand customers [14]. Cyclone Dagmar, a powerful European windstorm, caused 570 thousand customers to lose power [15]. A tornado in Jiangsu Province, China, in 2016 caused power outages for 135 thousand households [16].
Blackouts due to cyber-attacks and technical issues have also been increasing [17]. For example, the 2015 cyber-attack on the Ukrainian power grid caused power outages for approximately 225,000 customers [18]. The June 16, 2019 blackout in South America caused power outages for more than 48 million customers [19]. The August 3, 2019 power outage in the capital of Indonesia affected more than 10 million customers [20].
To provide a complete and concise summary of large blackouts caused by extreme weather events, technical issues, and cyber-attacks, we have collected data from several research papers and technical reports [8]-[20] and plotted them according to their intensity and date. The summary of these data is given in Fig. 3.
Fig. 3 shows that extreme weather events have significantly impacted the reliability and resilience of the power supply, with short- and long-term negative societal and economic impacts. Therefore, power system resilience evaluation and enhancement have become more important than ever. Developing strategies to enhance grid resilience, and methods to measure the improvements and compare different alternatives, have become important factors in future power system planning, operation, and control.

III. DEFINITIONS
The term "resilience" originally appeared in the fields of ecology and psychology. In ecology, it has been used to draw attention to trade-offs between conflicting objectives and attributes such as efficiency and persistence [21]; in psychology, it has been used to describe the ability to recover from trauma [22]. More recently, the term has been adopted in various fields such as interdependent infrastructures, national security, and power and energy systems. The Intergovernmental Panel on Climate Change has defined resilience in terms of the ability to anticipate, absorb, and recover quickly and efficiently from hazardous events [23]. In [24], U.S. Presidential Policy Directive 21 (PPD-21) defines resilience as the ability to prepare for, adapt to, withstand, and recover rapidly from disruptions, where a disruption could be a natural threat or a man-made threat such as a cyber-attack. Although definitions of resilience vary across fields of study, this paper focuses only on the definitions and attributes related to power systems, including both the transmission and distribution levels.

A. EXISTING DEFINITIONS
Converging on a universally accepted definition of power system resilience has been a concern for power and energy engineers. Several task forces have been formed, and research teams from different institutions have worked together to develop a commonly agreed upon, universally accepted definition. The definitions introduced so far are discussed as follows. The Electric Power Research Institute (EPRI) has defined power system resilience in terms of three elements: prevention, recovery, and survivability [25]. The U.S. National Infrastructure Advisory Council (NIAC) has defined power system resilience as the ability to prepare and plan for, absorb, recover from, and adapt to adverse events [26]. The North American Electric Reliability Corporation (NERC) has adopted the NIAC definition in [27]. The UK Energy Research Centre (UKERC) [28] has defined resilience as "the capability of an energy system to tolerate disturbance and to continue to deliver affordable energy service to consumers." According to UKERC, resilient energy systems should be able to recover quickly and provide fast alternatives to satisfy energy service needs during external calamities. ASIS International (originally the American Society for Industrial Security, later renamed to reflect its international membership) has defined resilience as the capability of a power system to resist extreme events and recover to an acceptable level in a timely manner [29]. The United Nations International Strategy for Disaster Reduction (UNISDR) [30] defines resilience as a measure of a system's ability to maintain its functionality and cope with hazards by organizing itself and learning from prior disasters. The U.S. National Association of Regulatory Utility Commissioners (NARUC) has described resilience in terms of the robustness and recovery characteristics of the power system during and after disasters [31]. NARUC has also provided a detailed review of the resilience definitions provided by different organizations in [32].
In [33], resilience has been defined as the ability to anticipate, absorb, and rapidly recover from low-frequency, high-impact events. In [34], power system resilience has been defined in terms of several properties: robustness, resourcefulness, rapid recovery, and adaptability. Robustness refers to the ability of a power system to absorb a shock and continue operating; resourcefulness is the ability to skillfully manage a crisis as it unfolds; rapid recovery signifies the ability to quickly restore service to the normal state; and adaptability is the ability to incorporate lessons learned from past events to improve resilience. In [35], resilience has been defined in terms of the ability to withstand, rapidly recover, and adapt to mitigate the impact of similar future disasters. In [4], resilience has been defined as the ability of the power system to withstand an event while remaining within an acceptable performance level and to recover within an acceptable time and cost. In [36], resilience has been defined as the "ability to prepare, plan for, recover from, and adapt to adverse events." In [37], resilience has been defined through three attributes: anticipate, perceive, and respond. Resilience has also been described in terms of four attributes, anticipate, perceive, respond, and adapt, in [38]-[40]. More definitions of power system resilience can be found in [5], [6], [41]-[43].

B. DISTURBANCE AND SYSTEM RESPONSE CURVES
The concept of resilience has been illustrated in [38], [39] through a disturbance and impact resilience evaluation (DIRE) curve, shown in Fig. 4. The DIRE curve shows the performance of a system relative to its optimal performance and to the minimum performance level (resilience threshold) that the system must maintain to be considered resilient. The DIRE curve has been suggested as a starting point for developing resilience metrics. Several common terms, such as robustness, agility, adaptive capacity, adaptive insufficiency, resilience, and brittleness, are represented in the DIRE curve. The DIRE curve provides the following temporal demarcation: $t_i$ is the instant the disturbance starts; $t_{Bi}$ is the time at which system performance falls below minimum normalcy; $t_R$ is the instant at which the system reaches its minimum performance level; $t_{Bf}$ is the time at which system performance returns to minimum normalcy; and $t_{f1}$ is the time at which the restoration processes start. In this notation, $i$ indicates the start of the event and $f$ indicates its end. Note that the restoration processes could take a long time (i.e., $t_{f2} \gg t_{f1}$).
A conceptual resilience curve, similar to the DIRE curve, has been presented in [33] to define the resilience concept. In this curve, different resilience features are associated with different system states. Before an event occurs, the system is in the resilient state; in this state, the system should be robust and strong enough to withstand initial disturbances. As the event progresses, the system enters a post-event degraded state, in which redundancy, resourcefulness, and adaptive organization enable corrective operation to adapt to and deal with changing conditions. In the restoration state, the system should respond and recover to the normal state as quickly as possible. After the restoration state, the post-restoration process starts, in which full restoration of the system might not happen, as some damaged infrastructure may take longer to recover to the normal state.
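The DIRE-curve instants defined in this subsection can be located in a sampled performance trace. The sketch below is a hedged illustration, not the procedure of [38], [39]: the function name `dire_instants`, the nominal level, the normalcy threshold, and the hourly trace are all hypothetical. $t_{f1}$ and $t_{f2}$ are not computed because they depend on when restoration actions begin, which a performance trace alone does not reveal.

```python
from typing import Optional, Sequence


def dire_instants(times: Sequence[float], perf: Sequence[float],
                  nominal: float, normalcy: float) -> Optional[dict]:
    """Locate DIRE-curve instants in a sampled performance trace.

    nominal  -- pre-event performance level (hypothetical parameter)
    normalcy -- minimum-normalcy threshold (hypothetical parameter)
    """
    # t_i: first sample where performance drops below its nominal level
    t_i = next((t for t, p in zip(times, perf) if p < nominal), None)
    if t_i is None:
        return None  # no disturbance in this trace
    # t_Bi: first sample below minimum normalcy (None if never breached)
    t_Bi = next((t for t, p in zip(times, perf) if p < normalcy), None)
    # t_R: instant of minimum performance (first occurrence if tied)
    k_R = min(range(len(perf)), key=lambda k: perf[k])
    t_R = times[k_R]
    # t_Bf: first sample after t_R at which performance is back at or above normalcy
    t_Bf = None
    if t_Bi is not None:
        t_Bf = next((times[k] for k in range(k_R + 1, len(perf))
                     if perf[k] >= normalcy), None)
    return {"t_i": t_i, "t_Bi": t_Bi, "t_R": t_R, "t_Bf": t_Bf}


# Hypothetical hourly trace of per-unit performance
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
perf = [1.0, 1.0, 0.9, 0.6, 0.4, 0.5, 0.8, 0.95, 1.0]
print(dire_instants(times, perf, nominal=1.0, normalcy=0.7))
```

Because each instant is the first sample that satisfies a threshold condition, the result depends on the sampling interval; a finer trace gives sharper estimates.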

C. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Although several studies have attempted to provide a clear definition of power system resilience, it has not yet been standardized. Some attempts have provided insights into the required features of power system resilience rather than a definition. Whereas some definitions relate resilience to system characteristics, others treat resilience as system performance, similar to power system reliability. Several studies have suggested that resilience should capture the dynamics of the system along with its performance. Furthermore, some of the terms used in the literature to define resilience are vague and sometimes do not capture the essence of resilience. Therefore, further research is required to develop a universally accepted definition that encompasses all essential elements of power system resilience so that it can eventually be standardized.
Although converging on a universally accepted definition seems daunting, there is general agreement about resilience attributes. Based on the above definitions, resilience attributes can be grouped into absorptivity, adaptability, and recoverability. These attributes can help in forming a definition that is not only commonly accepted but also leads to standardized resilience metrics.

IV. METRICS
To date, there are no standard resilience metrics, nor are there standard methods to evaluate them. Although several resilience metrics have been proposed, there is still considerable debate on how to establish a standardized set of resilience metrics. This section discusses advances in resilience metrics in terms of their general attributes and features.

A. GENERAL ATTRIBUTES OF POWER SYSTEM RESILIENCE METRICS
Generally, resilience metrics can be categorized into attribute-based and performance-based metrics [1], [44]. Attribute-based metrics answer the question of what makes the system more or less resilient than in its current state. For example, system attributes such as robustness, adaptability, resourcefulness, and recoverability are measured using attribute-based metrics. On the other hand, performance-based metrics answer the question of how resilient the system is. Performance-based metrics are used to interpret quantitative data that describe infrastructure outputs, specify disturbances, and formulate metrics of infrastructure resilience.
Several recommendations for developing resilience metrics have been proposed [1], [33], [44]. These recommendations can be summarized as follows. Resilience metrics should
• Capture only high-impact, low-probability (HILP) events and their consequences (i.e., loss of load, loss of revenue, cost of recovery, number of people without electric power, power outage to critical loads, and business interruption due to power loss);
• Be performance-based rather than attribute-based;
• Reflect the intrinsic uncertainties that drive response and planning activities, such as disruptive conditions, damage to the affected population, and response time;
• Be simple and highly consistent, and enable both retrospective and forward-looking analysis;
• Capture the spatiotemporal correlation of disasters and their effect on power system resilience; and
• Provide both global and component-specific measures of power system resilience.

B. METRICS BASED ON RESILIENCE FEATURES
Several resilience metrics have been proposed in the literature based on power system resilience features and attributes (resourcefulness, rapid recovery, robustness, and adaptability) [26]. Five resilience metrics have been proposed in [45]: (i) load shedding investment cost (for resourcefulness); (ii) restoration cost savings (for rapid recovery); (iii) algebraic connectivity (for robustness); (iv) betweenness centrality (for robustness); and (v) adaptability percentage (for adaptability). Different weights have been assigned to these metrics to form an overall resilience metric. In [46], three metrics have been proposed to capture various features of resilience: (i) flexibility metrics (for resourcefulness), the ratio of the load served after each recovery iteration through topology control to the total system demand; (ii) outage cost recovery metrics, the amount of total customer interruption cost regained after each corrective action; and (iii) outage recovery capacity metrics, the percentage of the total demand lost due to the disruption that is recovered in each recovery step.
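To make the step-wise definitions in [46] concrete, the sketch below computes the flexibility metric and the outage recovery capacity metric for a hypothetical restoration sequence. It assumes that load is tracked as one aggregate quantity after each recovery iteration and that the load "recovered in each step" is the increment over the previous step; the function name and numbers are illustrative, and the outage cost recovery metric is omitted because it requires customer interruption cost data.

```python
from typing import List, Sequence, Tuple


def recovery_step_metrics(total_demand: float, served_after_event: float,
                          served_per_step: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Flexibility and outage-recovery-capacity metrics per recovery step.

    total_demand       -- total system demand (e.g., MW)
    served_after_event -- load still served right after the disruption
    served_per_step    -- load served after each recovery iteration
    """
    lost = total_demand - served_after_event  # total demand lost to the disruption
    if lost <= 0:
        raise ValueError("no demand was lost; recovery capacity is undefined")
    flexibility, recovery_capacity_pct = [], []
    previous = served_after_event
    for served in served_per_step:
        # flexibility: load served after this iteration / total system demand
        flexibility.append(served / total_demand)
        # outage recovery capacity: load recovered in this step, as % of lost demand
        recovery_capacity_pct.append(100.0 * (served - previous) / lost)
        previous = served
    return flexibility, recovery_capacity_pct


# Hypothetical case: 1000 MW demand, 400 MW survives, three recovery iterations
flex, orc = recovery_step_metrics(1000.0, 400.0, [550.0, 700.0, 900.0])
print(flex, orc)
```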
In [47], the authors have proposed three metrics to quantify resilience: resistancy, recovery, and resilience. The resistancy metric is defined as the ratio of the sum of the active power supplied to non-interrupted customers to the total power demand of the system loads, considering the load priority factor. The recovery metric is the ratio of the expected energy supplied to recovered loads to the total energy required by the interrupted loads during the same period. The resilience metric is defined as the ratio of the expected energy supplied to the loads during the study period (loads recovered through the formation of microgrids (MGs) and loads connected to non-faulted feeders) to the total energy required by the system loads. A resilience metric based on the speed of the system response, the efficiency of the recovery, and the economy of the recovery has been proposed in [48] to quantify system resilience after extreme events. In [49], [50], a resilience metric has been defined as the ratio of recovered loads to the actual loads on the AC and DC sides of microgrids. This metric ensures the survivability of at least the most critical loads and is measured on a scale from zero to one, where zero represents the lowest resilience level and one represents the highest [49], [50].
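The three ratios in [47] can be sketched for a single event scenario as follows. This is a minimal illustration under stated assumptions: each load has constant demand over the study period, the priority factor weights both the numerator and the denominator of the resistancy metric, and an interrupted load is either unserved or re-supplied for a known number of hours. The `Load` class and the numbers are hypothetical; the expected values in [47] would be obtained by averaging these ratios over sampled scenarios.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Load:
    demand_kw: float           # constant demand over the study period
    priority: float            # load priority factor (hypothetical weights)
    interrupted: bool          # True if the load lost supply at the event
    served_hours: float = 0.0  # hours an interrupted load was re-supplied (e.g., by an MG)


def resistancy(loads: List[Load]) -> float:
    # priority-weighted power still supplied / priority-weighted total demand (assumed weighting)
    total = sum(ld.priority * ld.demand_kw for ld in loads)
    kept = sum(ld.priority * ld.demand_kw for ld in loads if not ld.interrupted)
    return kept / total


def recovery(loads: List[Load], horizon_h: float) -> float:
    # energy supplied to recovered loads / energy required by all interrupted loads
    required = sum(ld.demand_kw * horizon_h for ld in loads if ld.interrupted)
    supplied = sum(ld.demand_kw * ld.served_hours for ld in loads if ld.interrupted)
    return supplied / required if required > 0 else 1.0


def resilience(loads: List[Load], horizon_h: float) -> float:
    # energy supplied to all loads (non-faulted + recovered) / total energy required
    required = sum(ld.demand_kw * horizon_h for ld in loads)
    supplied = sum(ld.demand_kw * (ld.served_hours if ld.interrupted else horizon_h)
                   for ld in loads)
    return supplied / required


loads = [Load(100, 3, False), Load(200, 1, True, 6), Load(50, 2, True, 0)]
print(resistancy(loads), recovery(loads, 12), resilience(loads, 12))
```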
In [51], the authors have proposed a resilience metric based on social welfare across the power grid and water systems, where the resilience metric is defined as the sum of the system's robustness, its recoverability within a predetermined time, and the rapidness of its recovery.
A conceptual resilience curve has been developed in [33] to define and quantify power system resilience. As shown in Fig. 5, it expresses the level of resilience as a time-dependent function with respect to a disaster event. A set of metrics has been proposed in [52], [53] based on this resilience curve. These metrics, abbreviated as FLEP, measure how fast (F) and how low (L) resilience drops in phase I (disturbance progress); how extensive (E) the post-disturbance degraded state is in phase II (post-disturbance degradation); and how promptly (P) the network recovers in phase III (restorative). This resilience curve has also been used in [54] to develop a resilience metric that considers the critical load supplied in the restorative and post-restorative states, which is evaluated as follows.
where F(t) denotes the system performance function; $t_r$ represents the time at which the restoration phase starts; and $T_0$ represents the duration of the restoration and post-restoration phases. The system performance function has been defined as the total power supplied to critical loads according to their priority. Similar metrics have been proposed in [55]-[57].
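Since no formulas for FLEP are given here, the sketch below applies a common reading of these metrics to a trapezoidal resilience curve, which is an assumption rather than the exact definition in [52], [53]: F is the rate of degradation in phase I, L is the depth of the drop, E is the duration of the degraded state in phase II, and P is the rate of recovery in phase III. The `TrapezoidCurve` fields and the values used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrapezoidCurve:
    r0: float    # resilience level before the event
    r_pd: float  # post-disturbance (degraded) level
    r_pr: float  # post-restoration level
    t_oe: float  # event onset
    t_ee: float  # end of disturbance progress (end of phase I)
    t_or: float  # start of restoration (end of phase II)
    t_pr: float  # end of restoration (end of phase III)


def flep(c: TrapezoidCurve) -> dict:
    return {
        "F": (c.r0 - c.r_pd) / (c.t_ee - c.t_oe),    # how fast resilience drops
        "L": c.r0 - c.r_pd,                          # how low it drops
        "E": c.t_or - c.t_ee,                        # how extensive the degraded state is
        "P": (c.r_pr - c.r_pd) / (c.t_pr - c.t_or),  # how promptly the network recovers
    }


m = flep(TrapezoidCurve(r0=100, r_pd=40, r_pr=100, t_oe=0, t_ee=3, t_or=10, t_pr=16))
print(m)
```

Under this reading, a more resilient response has smaller F, L, and E and a larger P.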
In [58]-[60], the resilience of the power system has been defined as the ratio of the area under the actual performance curve to the area under the target performance curve. The target performance curve is usually modeled as constant, while the actual performance curve varies with time due to major disaster events and system restoration efforts. A resilience metric based on the maximum reduction in system performance and the loss incurred has been proposed in [61], [62], which is expressed as follows, where $L_{Mm}$ is the measure of the maximum reduction in system performance and $L_{max}$ is the loss incurred by the operator when all loads and distributed generators (DGs) are disconnected.
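The area-ratio metric of [58]-[60] can be approximated numerically from sampled performance data. The sketch below uses trapezoidal integration and assumes piecewise-linear performance between samples; the trace, the constant target, and the function names are hypothetical.

```python
from typing import Sequence


def trapezoid(t: Sequence[float], y: Sequence[float]) -> float:
    # area under a piecewise-linear curve sampled at times t
    return sum((t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2.0 for k in range(len(t) - 1))


def area_ratio_resilience(t: Sequence[float], actual: Sequence[float],
                          target: Sequence[float]) -> float:
    # area under actual performance / area under target performance over [t[0], t[-1]]
    return trapezoid(t, actual) / trapezoid(t, target)


t = [0, 1, 2, 3, 4]              # hours (hypothetical)
actual = [100, 60, 60, 80, 100]  # MW served
target = [100] * len(t)          # constant target performance
print(area_ratio_resilience(t, actual, target))
```

A value of one means performance never fell below the target during the window; the choice of window endpoints directly affects the result.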
A resilience metric based on event duration and profile has been proposed in [63].This metric is expressed as follows.
where F is the failure profile; R is the recovery profile; $T_i$ is the time to incident; $T_f$ is the failure duration; and $T_r$ is the recovery duration. A resilience metric based on the Cobb-Douglas production function of the attributes anticipate, adapt, perceive, and respond has been proposed in [38], which can be expressed as follows.

C. CODE-BASED METRICS
To capture both the magnitude and the duration of an outage, a code-based metric has been proposed in [64]. First, the unscaled resilience is calculated using (5) and (6); then a scaling process is applied to map metric values onto a scale from one to nine, as shown in Table 1. In (5) and (6), m is the unscaled resilience value; c is a binary variable that indicates whether an event occurs within the time frame considered; α is the outage duration in seconds; and f is the fraction of unaffected load (based on voltage or current distortion) due to power disturbance events. In the code-based metric, the repair time is assumed to range from $10^0$ to $10^6$ seconds.

D. RELIABILITY-BASED METRICS
Reliability-based metrics have been proposed in [65]-[68] to quantify the resilience of power systems. In [65], a system resilience approach based on time-series analysis has been developed to relate the loss of load frequency (LOLF), energy not supplied (ENS), loss of load expectation (LOLE), and capacity margin to the frequency and intensity of storms. Loss of load after the occurrence of disastrous events has been used in [66] to evaluate power system resilience.
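For reference, the adequacy indices named above can be computed from a load series and an available-capacity series, for example one affected by a storm. The sketch below is an illustration, not the time-series model of [65]: for one series, it counts loss-of-load hours (LOLE), sums the shortfall (ENS), and counts separate loss-of-load episodes (LOLF), then averages over equally likely scenarios. The series and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def adequacy_indices(load: Sequence[float], capacity: Sequence[float],
                     dt_h: float = 1.0) -> Tuple[float, float, int]:
    """Return (LOLE in hours, ENS in energy units, LOLF as episode count) for one series."""
    shortfall = [max(d - c, 0.0) for d, c in zip(load, capacity)]
    lol = [s > 0 for s in shortfall]  # loss-of-load flag per interval
    lole = dt_h * sum(lol)            # hours with load exceeding capacity
    ens = dt_h * sum(shortfall)       # energy not supplied (MW x h = MWh)
    # a new episode starts where loss of load begins after an interval without it
    lolf = sum(1 for k, x in enumerate(lol) if x and (k == 0 or not lol[k - 1]))
    return lole, ens, lolf


def expected_indices(scenarios: List[Tuple[Sequence[float], Sequence[float]]]) -> Tuple[float, ...]:
    # average the indices over equally likely sampled (load, capacity) scenarios
    results = [adequacy_indices(ld, cap) for ld, cap in scenarios]
    n = len(results)
    return tuple(sum(r[k] for r in results) / n for k in range(3))


load = [80, 90, 100, 110, 120, 100, 90, 80]        # MW, hypothetical
capacity = [100, 100, 100, 100, 100, 100, 85, 95]  # MW available during a storm
print(adequacy_indices(load, capacity))
print(expected_indices([(load, capacity), (load, [200] * 8)]))
```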
In [67], four metrics have been proposed to measure the impacts of extreme events on MGs: (i) metric-K, which measures the expected number of line outages due to destructive events; (ii) loss of load probability (LOLP), which measures the loss of load due to extreme events; (iii) expected demand not supplied (EDNS), which measures the expected demand not supplied due to extreme events; and (iv) metric-G, which measures the difficulty of grid recovery.
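Two of these quantities can be sketched for a hypothetical radial feeder. The expected number of line outages (the quantity metric-K measures) is the sum of the per-line outage probabilities, by linearity of expectation. For EDNS, the sketch assumes independent line failures and that a failed line disconnects every load downstream of it, and it checks a Monte Carlo estimate against the exact value for this topology. The feeder, the probabilities, the loads, and the function names are hypothetical, not data or code from [67].

```python
import random
from typing import Sequence


def expected_line_outages(p_fail: Sequence[float]) -> float:
    # by linearity of expectation, this holds even if line failures are correlated
    return sum(p_fail)


def demand_not_supplied(failed: Sequence[bool], loads: Sequence[float]) -> float:
    # hypothetical radial feeder: line k feeds load k and every load downstream of it,
    # so the first failed line (counted from the source) disconnects everything beyond it
    for k, is_failed in enumerate(failed):
        if is_failed:
            return sum(loads[k:])
    return 0.0


def edns_monte_carlo(p_fail: Sequence[float], loads: Sequence[float],
                     n_samples: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # assumes independent line failures for the given event
        failed = [rng.random() < p for p in p_fail]
        total += demand_not_supplied(failed, loads)
    return total / n_samples


def edns_exact(p_fail: Sequence[float], loads: Sequence[float]) -> float:
    # sum over k of P(lines 0..k-1 survive and line k fails) * load beyond line k
    edns, p_upstream_ok = 0.0, 1.0
    for k, p in enumerate(p_fail):
        edns += p_upstream_ok * p * sum(loads[k:])
        p_upstream_ok *= 1.0 - p
    return edns


p_fail = [0.1, 0.2, 0.3]          # hypothetical event-conditioned failure probabilities
feeder_loads = [50.0, 30.0, 20.0]  # MW served at each node
estimate = edns_monte_carlo(p_fail, feeder_loads)
print(expected_line_outages(p_fail), edns_exact(p_fail, feeder_loads), estimate)
```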
In addition, a resilience metric based on the availability of system components has been proposed in [68]. This metric captures both time- and performance-related properties of the system before and after disasters using steady-state availability and event time. The proposed metrics in [68] are evaluated by multiplying the ratio of availability and the natural logarithm of recovery time before and after an external shock.

E. OTHER RESILIENCE METRICS
In [69], a resilience metric has been defined as the reciprocal of the average comprehensive load loss, considering critical loads.
A resilience metric based on the average total energy curtailment in multi-microgrid systems during a disturbance event has been proposed in [70]. In [71], a resilience metric has been developed to measure the functional service loss during an extreme event. A resilience metric for power distribution systems using graph theory and the Choquet integral has been proposed in [72]. This metric is based on seven factors: overlapping branches, path redundancy, repeated sources, switch operations, penalty factor, probability of availability, and dominance of the aggregated central point.
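The Choquet integral used in [72] aggregates criterion scores with a fuzzy measure, which can reward or penalize combinations of factors rather than weighting each factor independently. The sketch below implements the discrete Choquet integral. The fuzzy measure of [72] is not given here, so two hypothetical measures are used: an additive one, under which the integral reduces to a weighted mean, and one that is nonzero only on the full set, under which it reduces to the minimum score. The criterion names, scores, and weights are illustrative.

```python
from typing import Callable, Dict, FrozenSet


def choquet(scores: Dict[str, float], mu: Callable[[FrozenSet[str]], float]) -> float:
    """Discrete Choquet integral of criterion scores with respect to a fuzzy measure mu.

    mu must be monotone, with mu(empty set) = 0 and mu(all criteria) = 1.
    """
    order = sorted(scores, key=scores.get)  # criteria in ascending score order
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        # coalition of criteria whose score is at least the current one
        coalition = frozenset(order[i:])
        total += (scores[name] - previous) * mu(coalition)
        previous = scores[name]
    return total


scores = {"path_redundancy": 0.8, "switch_operations": 0.4, "source_redundancy": 0.6}
weights = {"path_redundancy": 0.5, "switch_operations": 0.3, "source_redundancy": 0.2}


def additive(coalition: FrozenSet[str]) -> float:
    # additive measure: the integral reduces to the weighted mean of the scores
    return sum(weights[c] for c in coalition)


def pessimistic(coalition: FrozenSet[str]) -> float:
    # nonzero only on the full set: the integral reduces to the minimum score
    return 1.0 if len(coalition) == len(scores) else 0.0


print(choquet(scores, additive), choquet(scores, pessimistic))
```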
A metric to evaluate resilience against earthquakes, based on the ratio of the energy discharged by a battery energy storage system (BESS) during the emergency period to the energy demanded by critical loads, has been proposed in [73].

F. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Although numerous metrics have been proposed to evaluate the resilience of power systems, none has yet been universally accepted, and they cannot comprehensively capture the essence of power system resilience [45], [64]. These metrics (i) often undervalue the impacts of high-impact events and focus on normal operating scenarios (they cannot completely address outages caused by cyber-attacks and extreme natural disasters); (ii) use a flat-rate price for lost load, even though the cost of lost load can compound when outages caused by extreme events last for long durations; (iii) are mostly based on the probability of system failure and thus cannot assess system robustness against disruptions; and (iv) are defined on fixed steady-state probability analysis that usually involves approximating the system state before and after contingencies.
Several perspectives need to be considered to comprehensively capture the essence of power system resilience: system performance and system attributes; deterministic and probabilistic features; and quantitative and qualitative analysis. Moreover, resilience metrics must (i) address the impacts of HILP events such as cyber-attacks and extreme weather events; (ii) compound the price of lost load according to outage duration; (iii) account for system robustness; and (iv) capture the dynamics of system recovery from disruption along with the steady-state probability analysis.
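Requirement (ii) can be illustrated by contrasting a flat price for lost load with a duration-dependent one. In the sketch below, the geometric hourly escalation, the value of lost load (VOLL) of 1000 per MWh, the outage profile, and the function names are all hypothetical; the point is only that a flat rate understates the cost of long outages whenever the true price of lost load rises with duration.

```python
from typing import Sequence


def flat_cost(lost_energy_per_hour: Sequence[float], voll: float) -> float:
    # every MWh of lost load is priced the same, regardless of outage duration
    return voll * sum(lost_energy_per_hour)


def compounding_cost(lost_energy_per_hour: Sequence[float], voll: float,
                     hourly_growth: float) -> float:
    # hypothetical escalation: the price of lost load grows geometrically with each
    # additional hour the outage persists (hour 0 is priced at voll)
    return sum(e * voll * (1.0 + hourly_growth) ** h
               for h, e in enumerate(lost_energy_per_hour))


outage = [10.0] * 3  # MWh of load lost in each hour of a 3-hour outage
print(flat_cost(outage, 1000.0), compounding_cost(outage, 1000.0, 0.10))
```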

V. RESILIENCE EVALUATION CRITERIA
Identifying resilience evaluation criteria is the first step toward developing resilience metrics. For example, if a resilience metric is defined as in (1), the system performance function, F(t), represents the criterion. Several resilience evaluation criteria have been proposed in the literature, and some of them have been used to evaluate power system resilience. Interruption of service, duration of outages, cost of recovery, and cost of prevention have been used in [42] to evaluate power system resilience. Also, since universally accepted resilience metrics are not yet available, resilience has been evaluated in terms of several deterministic and probabilistic criteria: served/unserved energy, load curtailment and restoration, and outage duration.
Several criteria for developing resilience metrics have been suggested in [1] to capture the consequences of extreme events from different perspectives: (i) electric service: total customer-hours of outage, total customer energy demand not served, and average number of customers experiencing an outage during a given time period; (ii) critical electric service: cumulative outage hours of critical customers, energy demand not served to critical customers, and average number of critical loads that experience an outage; (iii) restoration time, recovery time, and cost of recovery; (iv) monetary: loss of revenue, cost of grid damage, avoided outage cost, loss of perishables and assets, cost of business interruption, and impact on total municipal product or total regional product; and (v) community function: critical services without power, critical services without power for more than a certain number of hours, and key facilities (such as military facilities) without power. Existing resilience evaluation criteria are explained as follows.
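Several of the electric-service and critical-service criteria in (i) and (ii) reduce to simple aggregations over outage records. The sketch below assumes a hypothetical record format (customers, duration, interrupted demand, critical flag) and computes customer-hours of outage, energy not served, the time-averaged number of customers out over the reporting period, and the corresponding critical-customer quantities. The class, function, and data are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class OutageRecord:
    customers: int   # customers interrupted
    hours: float     # outage duration
    load_kw: float   # interrupted demand
    critical: bool   # whether the interrupted customers are critical loads


def electric_service_criteria(records: Iterable[OutageRecord], period_h: float) -> Dict[str, float]:
    records = list(records)
    crit = [r for r in records if r.critical]
    cust_h = sum(r.customers * r.hours for r in records)
    return {
        "customer_hours": cust_h,
        "energy_not_served_kwh": sum(r.load_kw * r.hours for r in records),
        # time-averaged number of customers out over the reporting period
        "avg_customers_out": cust_h / period_h,
        "critical_customer_hours": sum(r.customers * r.hours for r in crit),
        "critical_energy_not_served_kwh": sum(r.load_kw * r.hours for r in crit),
    }


log = [
    OutageRecord(1000, 5.0, 2000.0, False),
    OutageRecord(200, 10.0, 500.0, True),
    OutageRecord(50, 2.0, 100.0, True),
]
crit_summary = electric_service_criteria(log, period_h=24.0)
print(crit_summary)
```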

A. LOAD CURTAILMENT MINIMIZATION (LOAD SHEDDING MINIMIZATION)
The minimization of load shedding/curtailment or of the cost of lost load has been considered as a resilience evaluation criterion in deterministic approaches. In these approaches, critical load curtailment is considered to degrade system resilience more than noncritical load curtailment. In other words, a system is considered more resilient if little or no critical load is curtailed due to a disaster. Although minimizing critical load curtailment is usually a priority, in some cases supplying part or all of the critical loads is not possible due to system constraints (e.g., damaged lines). In these cases, the remaining available power is supplied to noncritical loads. Minimizing load shedding or the cost of load shedding has been considered as a resilience evaluation criterion in several studies [41], [49], [61], [62], [66], [70], [71], [74]-[97]. Also, the minimization of load shedding or its cost combined with service restoration time has been considered in [45], [51].
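In the simplest setting, with divisible loads and a single aggregate capacity limit, minimizing priority-weighted curtailment is a fractional knapsack problem, and serving loads in descending priority order is optimal. The sketch below shows that case; the function name, load names, priorities, and capacity are hypothetical. Once network constraints such as damaged lines or feeder limits matter, this greedy rule is no longer sufficient and an optimization model is needed.

```python
from typing import Dict, List, Tuple


def priority_dispatch(demands: List[Tuple[str, float, float]], capacity: float):
    """Serve divisible loads in descending priority until capacity runs out.

    demands  -- list of (name, demand_mw, priority_weight)
    capacity -- available generation in MW
    """
    served: Dict[str, float] = {}
    remaining = capacity
    # sorted() is stable even with reverse=True, so equal priorities keep input order
    for name, demand, _ in sorted(demands, key=lambda d: d[2], reverse=True):
        served[name] = min(demand, remaining)
        remaining -= served[name]
    curtailed = sum(d - served[n] for n, d, _ in demands)
    weighted_curtailment = sum(w * (d - served[n]) for n, d, w in demands)
    return served, curtailed, weighted_curtailment


demands = [("hospital", 30, 10), ("water", 20, 5), ("res_a", 40, 1), ("res_b", 30, 1)]
served, curtailed, weighted = priority_dispatch(demands, capacity=70)
print(served, curtailed, weighted)
```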

B. RATE OF RECOVERY
Rate of recovery has been commonly used as a resilience evaluation criterion, with service restoration for critical loads given higher priority than for noncritical loads [47], [56], [58], [65], [98]- [102]. In [47], [98]- [100], the authors have considered the maximization of critical load restoration after the disaster as an evaluation criterion. Maximization of load restoration and minimization of restoration time have been considered as evaluation criteria in [56]. Minimization of the recovery time has been considered as a resilience evaluation criterion in [58], [65]. Moreover, reinforcement of the physical energy infrastructure and reduction of recovery time have been used as resilience evaluation criteria in [101].
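The rate-of-recovery criterion can be illustrated with a list of restoration actions. The sketch below uses a hypothetical record layout, `(time_h, restored_mw, is_critical)`. It reports three quantities, all measured from the end of the disturbance: the recovery time, the average restoration rate in MW/h, and the time needed to restore all critical loads.

```python
def recovery_metrics(restorations, t_disturbance_end_h):
    """restorations: list of (time_h, restored_mw, is_critical) actions (hypothetical layout)."""
    last_h = max(t for t, _, _ in restorations)
    recovery_time_h = last_h - t_disturbance_end_h
    restored_mw = sum(mw for _, mw, _ in restorations)
    critical_times = [t for t, _, crit in restorations if crit]
    critical_time_h = (max(critical_times) - t_disturbance_end_h) if critical_times else 0.0
    return {
        "recovery_time_h": recovery_time_h,
        "avg_rate_mw_per_h": restored_mw / recovery_time_h,
        "critical_restoration_time_h": critical_time_h,
    }

# Hypothetical restoration log: critical loads first, bulk residential load last.
log = [(2.0, 1.5, True), (3.0, 2.0, True), (6.0, 4.5, False)]
m = recovery_metrics(log, t_disturbance_end_h=1.0)
print(m)
```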

C. SERVED ENERGY
Both deterministic approaches (minimization of unserved energy and maximization of the weighted sum of restored loads over time) and probabilistic approaches (minimization of the weighted sum of expected energy not supplied) have been used as resilience criteria. Maximization of the energy supplied to critical loads has been used in [54], [56] as a resilience evaluation criterion. Also, the minimization of expected energy not supplied has been considered as a resilience evaluation criterion in [103]. Minimization of unserved energy [104], minimization of the weighted sum of curtailed loads [105], and maximization of the weighted sum of restored loads over time [106], [107] have also been used as resilience evaluation criteria.
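The deterministic and probabilistic served-energy criteria described above reduce to simple weighted sums. The sketch below rests on three assumptions:

- discrete time steps of length `dt_h`;
- a binary served/not-served status for each load;
- a finite set of scenarios with known probabilities.

All names are hypothetical.

```python
def weighted_restored_energy(weights, demands_mw, status, dt_h=1.0):
    """Sum over t and i of w_i * P_i * x[t][i] * dt, with x[t][i] = 1 if load i is served at step t."""
    return sum(
        w * p * x * dt_h
        for row in status
        for w, p, x in zip(weights, demands_mw, row)
    )

def expected_energy_not_supplied(scenarios):
    """EENS = sum_s pi_s * ENS_s for (probability, ens_mwh) scenario pairs."""
    if abs(sum(p for p, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * ens for p, ens in scenarios)

# Hypothetical data: a critical load (weight 10) and a noncritical load (weight 1)
# over three one-hour steps; the noncritical load is restored from the second step on.
wre = weighted_restored_energy([10.0, 1.0], [2.0, 3.0], [[1, 0], [1, 1], [1, 1]])
eens = expected_energy_not_supplied([(0.7, 0.0), (0.2, 5.0), (0.1, 20.0)])
print(wre, eens)
```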

D. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Although several studies have captured various aspects of resilience evaluation, these criteria cannot measure the dynamics of the system response. For example, the robustness of a power system to large disturbances should be considered as a resilience criterion. Also, resilience evaluation criteria should not be confused with resilience metrics.

VI. RESILIENCE ENHANCEMENT METHODS
Resilience enhancement strategies for both electric power distribution and transmission systems are gaining significant momentum [108]- [114]. Although resilience enhancement at the distribution level has gained significant interest [7], [56], [75], several resilience enhancement methods have also been proposed for the transmission level [45], [115] and for interdependent systems such as power and natural gas systems [78], [90], [116].
Resilience enhancement strategies can be generally classified into planning-based and operation-based methods. Planning-based methods focus on establishing grid expansion plans to harden transmission and distribution systems against future extreme events, whereas operation-based methods develop optimization-based strategies that utilize available assets against failures and extreme weather events [45], [111], [117]. An overview of the best-known strategies has been presented in [52]. Fig. 6 provides a summary of resilience enhancement strategies.

A. OPERATION-BASED RESILIENCE ENHANCEMENT METHODS
Operational resilience provides immediate solutions to reduce the impact of adverse events on the power grid [52]. Operation-based strategies for distribution systems can be generally classified into network reconfiguration-based methods, microgrid (island) formation, utilization of mobile emergency resources and energy storage units, and load restoration-based approaches.
Microgrid-based pre-disturbance scheduling to improve the resilience of distribution grids has been presented in [89]. Determining feasible islands of hybrid microgrids for resilience enhancement has also been studied in [49], [67]. In [53], the authors have used a defensive islanding strategy to prevent cascading events that can be triggered by lines damaged in extreme weather events. A demand response program to reduce load curtailment during emergency periods in microgrids has been developed in [50]. In [70], the unused capacity of available resources during extreme events has been employed to enhance microgrid resilience. A detailed review of the use of microgrids to enhance the resilience of power supply has been presented in [120].
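At its core, identifying candidate islands after line damage is a connected-components search. The sketch below removes damaged lines and finds the resulting islands with a depth-first search. It then flags an island as self-sufficient if its local generation covers its local load. This adequacy check is a deliberate simplification that ignores voltage, frequency, and flow limits. The bus, line, and function names are hypothetical.

```python
from collections import defaultdict

def find_islands(buses, lines, damaged):
    """Connected components after removing damaged lines (iterative DFS)."""
    adj = defaultdict(set)
    for line_id, (a, b) in lines.items():
        if line_id not in damaged:
            adj[a].add(b)
            adj[b].add(a)
    seen, islands = set(), []
    for start in buses:
        if start in seen:
            continue
        seen.add(start)
        stack, island = [start], set()
        while stack:
            u = stack.pop()
            island.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        islands.append(island)
    return islands

def classify_islands(islands, gen_mw, load_mw):
    """Flag an island as self-sufficient if local generation covers local load."""
    report = []
    for island in islands:
        gen = sum(gen_mw.get(b, 0.0) for b in island)
        load = sum(load_mw.get(b, 0.0) for b in island)
        report.append((sorted(island), gen, load, gen >= load))
    return report

# Hypothetical 5-bus feeder with DG at buses 1 and 4; line L23 is damaged.
lines = {"L12": (1, 2), "L23": (2, 3), "L34": (3, 4), "L45": (4, 5)}
islands = find_islands([1, 2, 3, 4, 5], lines, damaged={"L23"})
report = classify_islands(islands, gen_mw={1: 5.0, 4: 1.0}, load_mw={2: 3.0, 3: 0.5, 5: 2.0})
for row in report:
    print(row)
```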
In [75], the authors have proposed an optimal operation strategy for mobile energy resources during normal and emergency situations. In [97], transportable energy storage, generation rescheduling, and network reconfiguration are integrated to enhance the resilience of electric power distribution systems.
• Load restoration-based approaches have also been studied [45], [51], [64], [79], [98], [114], [121]. Some studies have focused on dispatching repair crews to improve the restoration of grid elements [106], [117]. The effect of DG on speeding up the restoration process during and after extreme events has been presented in [78]. A three-step look-ahead load restoration strategy that uses synchronized DG to restore critical loads after a major natural disaster has been developed in [57], [99]. In [104], the authors have developed a distribution service restoration model that generates an optimal switching sequence based on remotely and manually controlled switches and dispatchable DGs during extreme events. In [82], an integrated resilience response framework comprising situational awareness (e.g., predicted power outages), preventive response (e.g., security-constrained optimal power flow), and emergency response (e.g., topology switching and load shedding) has been developed to enhance power grid resilience. The scheduling and routing of mobile dc de-icing devices (MDIDs) to improve the resilience of electric power transmission systems have been presented in [88]. Transmission system reconfiguration has been comprehensively studied in [117].
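Repair-crew dispatch can be illustrated with a deliberately simplified case: one crew, no travel time, and repairs that each restore an independent block of load. Under these assumptions, minimizing unserved energy is the same as minimizing the weighted sum of repair completion times. Smith's rule solves that problem exactly by sorting repairs by repair time divided by restored load. This is a textbook scheduling result used here for illustration, not the method of [106] or [117]. The brute-force check below confirms it on a small hypothetical instance.

```python
from itertools import permutations

def smith_order(jobs):
    """Sort repairs by repair_h / restored_mw (Smith's weighted-shortest-processing-time rule)."""
    return sorted(jobs, key=lambda name: jobs[name][0] / jobs[name][1])

def unserved_energy(order, jobs):
    """MWh not served before each repair finishes, one crew working sequentially from t = 0."""
    t = total = 0.0
    for name in order:
        repair_h, restored_mw = jobs[name]
        t += repair_h                    # completion time of this repair
        total += restored_mw * t         # this load was out from t = 0 until now
    return total

# Hypothetical damaged elements: (repair time in h, load restored in MW).
jobs = {"feeder_A": (4.0, 2.0), "feeder_B": (1.0, 3.0), "substation_C": (2.0, 6.0)}
order = smith_order(jobs)
best = min(unserved_energy(p, jobs) for p in permutations(jobs))
print(order, unserved_energy(order, jobs), best)
```

Once travel times between sites or interdependent repairs enter the picture, the problem becomes a routing problem, and the cited works turn to mixed-integer formulations.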

C. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
A summary of research gaps and challenges to resilience enhancement strategies and potential solutions are provided as follows.
• The proposed resilience enhancement approaches have not placed serious emphasis on proactive operation strategies. Although the involvement of various operational components raises several challenges, implementing proactive operation strategies will help minimize damage and maximize the load served during disasters.
• Cost-benefit analysis has not received considerable attention. The trade-off between cost and uninterrupted power supply during extreme events makes cost justification very challenging. Serious research efforts are needed to study cost-benefit analysis when proposing enhancement techniques.
• Power systems are interdependent with other critical infrastructures such as water supply, communication, and gas supply systems. However, because of the complexity of interdependent systems, enhancement strategies for them are still in their early stages.
• Isolated areas (without any power source) have been ignored when modeling and proposing enhancement techniques, regardless of the criticality of their loads. It may not be feasible to supply all loads in these areas because of limited resources (e.g., mobile energy resources and connecting lines); however, the most critical loads should be supplied through mobile energy resources.
• Cost-benefit analysis should be carried out while proposing enhancement planning strategies.

VII. EVALUATION METHODS
Developing mathematically accurate and computationally efficient resilience evaluation methods is key to building resilient power systems. In the literature, several approaches have been proposed to evaluate power system resilience. These include sequential and non-sequential (state sampling) Monte Carlo simulation-based methods, preselected scenario-based methods, contingency-based methods, and machine learning-based methods. The existing approaches are described as follows.

A. SEQUENTIAL MONTE CARLO SIMULATIONS
Sequential Monte Carlo (SMC) simulation-based methods have been used in the literature to assess the impacts of failure events on both transmission and distribution systems.
They have been used to generate outage scenarios based on the failure probabilities of system components (e.g., transmission lines and towers) under extreme weather events. Depending on the size and type of the power system and the intensity of the event, extreme weather can cause outages across large regions and may divide the system into several islanded regions.
In [121], all transmission lines have been assumed to lie in a single corridor, and the failure probabilities of each line under different wind speeds and intensities have been used to generate outage scenarios. In [81], it has been assumed that the failure of a tower or line causes the corresponding transmission corridor to fail, and fragility curves are used to generate outage scenarios. In [35], [41], [113], [117], the test system has been divided into regions with different event intensities and failure probabilities. SMC simulation is used to generate outage scenarios for each region to determine the spatiotemporal impacts of extreme events on power systems. Also, in [58], a cell-partition method has been used to divide a power system into several regions. SMC simulations are then used to generate outage scenarios based on the intensity of weather events and the failure probabilities of system components.
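The following is a minimal single-region sequential Monte Carlo sketch in the spirit of the approaches above. An hourly wind profile is mapped to a per-hour failure probability through a piecewise-linear fragility curve. A failed line stays out for a fixed repair time, and the expected number of lines out is tracked hour by hour. Three choices are illustrative assumptions, not values from the cited studies: the fragility thresholds, the fixed repair time, and applying the fragility probability independently in each hour.

```python
import random

def linear_fragility(v, v_crit=20.0, v_collapse=60.0):
    """Hypothetical piecewise-linear fragility curve: failure probability vs wind speed (m/s)."""
    if v <= v_crit:
        return 0.0
    if v >= v_collapse:
        return 1.0
    return (v - v_crit) / (v_collapse - v_crit)

def smc_expected_lines_out(wind_profile, n_lines, repair_h, trials, seed=0):
    """Sequential MC: expected number of lines out of service in each hour."""
    rng = random.Random(seed)
    lines_out = [0.0] * len(wind_profile)
    for _ in range(trials):
        back_in_service = [0] * n_lines      # hour at which each line is available again
        for t, v in enumerate(wind_profile):
            p_fail = linear_fragility(v)
            for i in range(n_lines):
                # Only lines in service at hour t can fail at hour t.
                if back_in_service[i] <= t and rng.random() < p_fail:
                    back_in_service[i] = t + repair_h
            lines_out[t] += sum(1 for h in back_in_service if h > t)
    return [x / trials for x in lines_out]

# Hypothetical hourly gust profile (m/s) for a storm passing over one region.
profile = [12, 18, 28, 41, 52, 47, 35, 22, 15, 12]
print(smc_expected_lines_out(profile, n_lines=10, repair_h=6, trials=2000))
```

Extending this to several regions amounts to running the same loop with a region-specific wind profile for each group of lines.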

B. NON-SEQUENTIAL MONTE CARLO SIMULATIONS
Non-sequential Monte Carlo (NSMC) simulation can be used to evaluate the spatial impacts of extreme events on power systems. NSMC simulations have been used both independently and in combination with other methods (e.g., Markov chains and Kantorovich distance-based scenario reduction) to evaluate the impacts of failure events on power system resilience. Both historical data-based and weather intensity-based failure probabilities of system components have been used to sample outage scenarios. In [60], NSMC simulation has been used to evaluate the expected percentage of customers without power in different areas during hurricanes. The sampled scenarios in [60] consider two types of probabilities: (i) outage probabilities of overhead conductors due to tree wind-throw, and (ii) probabilities of the number of customers out of service due to the failure of local distribution circuits. NSMC has been used in [69] to sample outage scenarios using weather-based failure probabilities of system components. In [107], the expected amount of survived load under extreme events has been calculated with NSMC simulation by repeatedly generating damage scenarios. On the other hand, in [67], power system resilience has been evaluated in two steps. In step I, the state transition of a power system under extreme events is determined using a Markov chain. In step II, resilience indices are calculated when the network topology changes after extreme events. In [86], NSMC simulation has been integrated with a Kantorovich distance-based scenario reduction method to evaluate power system resilience. First, a large number of scenarios are generated using NSMC simulation to capture the full spectrum of possible scenarios. Second, the Kantorovich distance-based scenario reduction method is used to find the optimal number of scenarios. In [103], the availability of each component under an extreme event has first been determined based on the intensity of the event, and NSMC simulation has then been used to determine the reachability between nodes.
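The state-sampling idea, combined with the reachability check described for [103], can be sketched as follows. Component states are sampled independently from their failure probabilities. Any load that loses every path to a source is counted as lost. Sampling stops once the coefficient of variation of the estimated mean lost load drops below a tolerance. The network, probabilities, and tolerance are hypothetical. The assumption of independent component failures ignores the spatial correlation of weather damage.

```python
import math
import random
from collections import defaultdict

def reachable_nodes(sources, edges, in_service):
    """Nodes connected to at least one source through in-service edges."""
    adj = defaultdict(list)
    for (a, b), ok in zip(edges, in_service):
        if ok:
            adj[a].append(b)
            adj[b].append(a)
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def nsmc_survived_load(sources, edges, p_fail, loads, tol=0.05,
                       min_samples=100, max_samples=100_000, seed=0):
    """Non-sequential (state-sampling) MC estimate of the expected survived load.

    Stops once the coefficient of variation of the mean lost-load estimate
    falls below `tol`, or after `max_samples` samples.
    """
    rng = random.Random(seed)
    total_load = sum(loads.values())
    s1 = s2 = 0.0
    n = 0
    while n < max_samples:
        # Sample each component's state independently from its failure probability.
        in_service = [rng.random() >= p for p in p_fail]
        served = reachable_nodes(sources, edges, in_service)
        lost = sum(mw for node, mw in loads.items() if node not in served)
        s1 += lost
        s2 += lost * lost
        n += 1
        mean = s1 / n
        if n >= min_samples and mean > 0.0:
            var = max(s2 / n - mean * mean, 0.0)
            if math.sqrt(var / n) / mean < tol:
                break
    return total_load - s1 / n, n

# Hypothetical radial feeder S-A-B-C with a lateral A-D; probabilities are illustrative.
edges = [("S", "A"), ("A", "B"), ("B", "C"), ("A", "D")]
p_fail = [0.02, 0.10, 0.15, 0.05]
loads = {"A": 1.0, "B": 2.0, "C": 1.5, "D": 0.5}
survived, n_used = nsmc_survived_load({"S"}, edges, p_fail, loads)
print(f"expected survived load = {survived:.3f} MW after {n_used} samples")
```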

C. CONTINGENCY-BASED METHODS
Power system resilience has been evaluated for specific types of contingencies, which can be classified into the following groups:
• Vulnerability-based contingencies: Vulnerability-based contingencies have been used in a significant number of studies. Typically, the vulnerabilities of system components depend on both the intensity and the direction of weather events. In [77], [78], [83], a selected set of contingencies has been assumed using weather intensity-based vulnerabilities of power system components. In contrast, [56], [84], [85], [92] have selected contingencies based on the vulnerabilities of system components located in the path of extreme weather events. System vulnerability also depends on which components fail: the failure of specific components may make a power system highly vulnerable, whereas the failure of others may affect it much less. In [73], power system resilience has been evaluated for different system vulnerabilities.
• Failure probability-based contingencies: A selected set of contingencies has been assumed using the failure probabilities of system components in [65], [90], [93]. These failure probabilities have been determined either arbitrarily or using fragility curves.
• Microgrid formation probability-based contingencies: A selected set of contingencies has been assumed using extreme weather intensity-based probabilities of forming microgrids to evaluate resilience in [106].
• Cascading failure-based contingency: A contingency considering cascading failure of the entire system has been used in [45] to determine the restoration strategies after disasters.
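For failure probability-based contingency selection, one simple approach is to enumerate N-1 and N-2 outage sets and rank them by probability. The sketch below assumes independent component failures and uses hypothetical probabilities. In practice, these probabilities would come from fragility curves or historical data, as noted above.

```python
from itertools import combinations
from math import prod

def rank_contingencies(p_fail, k_max=2, top=5):
    """Enumerate N-1 ... N-k_max outage sets and rank them by probability.

    p_fail: dict component -> failure probability (assumed independent).
    The probability of an outage set is the probability that exactly those
    components fail and all others survive.
    """
    names = list(p_fail)
    ranked = []
    for k in range(1, k_max + 1):
        for outage in combinations(names, k):
            prob = prod(p_fail[n] if n in outage else 1.0 - p_fail[n] for n in names)
            ranked.append((prob, outage))
    ranked.sort(reverse=True)
    return ranked[:top]

# Hypothetical failure probabilities, e.g., read off fragility curves.
p = {"L1": 0.3, "L2": 0.1, "L3": 0.05}
for prob, outage in rank_contingencies(p, top=4):
    print(outage, round(prob, 4))
```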

D. MACHINE LEARNING-BASED METHOD
A predictive statistical machine learning algorithm has been developed in [122] to evaluate the resilience of power systems in terms of the number of outages, outage durations, and the number of interrupted customers. The data set required to train and validate the model is built from the characteristics of hurricanes, the climate of service areas, and network topologies. The data set is divided into two groups: 50% of the samples are used to train the model, and the remaining 50% are used to validate it. A validation technique (fivefold cross-validation) is used to find the optimal number of samples.
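The train/validate procedure can be illustrated with k-fold cross-validation. The sketch below uses ordinary least squares on a single feature purely as a stand-in model. It is not the algorithm of [122], and the data are synthetic. The contiguous fold split assumes the samples have already been shuffled.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cross_validated_mse(xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k

# Synthetic data: outages vs. peak wind speed with alternating +/-2 noise.
wind = [20.0 + 2.0 * i for i in range(25)]
outages = [3.0 * w - 40.0 + (2.0 if i % 2 else -2.0) for i, w in enumerate(wind)]
print(cross_validated_mse(wind, outages, k=5))
```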

E. BAYESIAN NETWORK-BASED METHOD
A dynamic Bayesian network-based method has been proposed in [68] to evaluate the resilience of power grids. Structural and maintenance resources have been considered the main elements of resilience in [68], and failure probabilities are evaluated both with and without external shocks.
Power system states (success and failure) are represented by the states of nodes. The failure and repair rates of each system component are modeled using a dynamic Bayesian network.
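Consider a dynamic Bayesian network with one two-state (up/down) node per component per time slice. For a single component, it reduces to a two-state Markov chain, and forward inference becomes a one-line update. The sketch below propagates the probability that a component is up. It can optionally multiply the failure rate during a hypothetical shock window. Independent components are then combined in series. The rates, the shock model, and the series/independence assumption are illustrative and are not taken from [68].

```python
def propagate_up_probability(p_up0, fail_rate, repair_rate, steps, dt=1.0, shock=None):
    """Forward inference on a two-state (up/down) DBN, one time slice at a time.

    shock: optional (start_step, end_step, multiplier) that scales the failure
    rate, a hypothetical stand-in for an external event.
    """
    p_up = p_up0
    trace = [p_up]
    for t in range(steps):
        lam = fail_rate
        if shock is not None and shock[0] <= t < shock[1]:
            lam *= shock[2]
        p_fail = min(lam * dt, 1.0)             # P(down at t+1 | up at t)
        p_repair = min(repair_rate * dt, 1.0)   # P(up at t+1 | down at t)
        # Sum over the two possible states in slice t.
        p_up = p_up * (1.0 - p_fail) + (1.0 - p_up) * p_repair
        trace.append(p_up)
    return trace

def series_system_up(traces):
    """P(system up) per slice when every (independent) component must be up."""
    result = []
    for slice_probs in zip(*traces):
        prob = 1.0
        for p in slice_probs:
            prob *= p
        result.append(prob)
    return result

# Hypothetical hourly rates: failure 0.01, repair 0.1; a 10-hour storm multiplies failures by 5.
steady = 0.1 / (0.01 + 0.1)
calm = propagate_up_probability(steady, 0.01, 0.1, steps=48)
storm = propagate_up_probability(steady, 0.01, 0.1, steps=48, shock=(5, 15, 5.0))
system = series_system_up([storm, calm])
print(round(min(storm), 4), round(min(system), 4))
```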

F. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Failure of selected components during extreme events has been assumed in a significant number of papers without proper justification for the selection of those components.
Moreover, selecting specific components may fail to capture the effects of a large number of other components, which reduces the accuracy of the results. Also, Monte Carlo simulations have been performed in several studies without defining proper stopping criteria, which are needed to achieve accurate results. Furthermore, Monte Carlo simulations may not capture rare and extreme events unless such events are deliberately emphasized (e.g., using importance sampling). In general, modeling the exact number, types, and locations of damaged system components in evaluation methods is computationally very demanding. Therefore, mathematically accurate and computationally efficient methods need to be developed to evaluate the actual impacts of failure events on power system resilience.
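Importance sampling addresses the rare-event problem by drawing failures from an inflated probability and reweighting each sample by its likelihood ratio. The sketch below estimates the probability that at least `k` of `n` identical, independent lines fail, and compares the estimate with the exact binomial tail. The numbers are hypothetical. The sampling probability `q = k / n` makes the expected number of failures equal `k`, which is the usual exponential-tilting choice for this kind of tail.

```python
import math
import random

def exact_tail(n, p, k):
    """P(at least k of n independent components fail), each with probability p."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def crude_mc(n, p, k, samples, rng):
    hits = sum(
        sum(rng.random() < p for _ in range(n)) >= k for _ in range(samples)
    )
    return hits / samples

def importance_sampling(n, p, k, q, samples, rng):
    """Sample failures with inflated probability q and reweight by the likelihood ratio."""
    total = 0.0
    for _ in range(samples):
        failures = sum(rng.random() < q for _ in range(n))
        if failures >= k:
            # Likelihood ratio of this outcome under p versus under q.
            total += (p / q) ** failures * ((1 - p) / (1 - q)) ** (n - failures)
    return total / samples

n, p, k = 10, 0.01, 4          # hypothetical: 10 lines, 1% failure probability each
rng = random.Random(1)
exact = exact_tail(n, p, k)
crude = crude_mc(n, p, k, 20_000, rng)
is_est = importance_sampling(n, p, k, q=k / n, samples=20_000, rng=rng)
print(f"exact={exact:.3e} crude={crude:.3e} importance={is_est:.3e}")
```

With 20,000 samples, crude sampling will usually observe no event at all for a probability on the order of 1e-6. The reweighted estimator, by contrast, sees the event in most samples.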

VIII. OPTIMIZATION METHODS FOR RESILIENCE EVALUATION AND ENHANCEMENT
Developing suitable optimization problems for power system resilience evaluation and enhancement has been a concern since resilience was introduced as a requirement. Power system resilience can be treated as an objective function, a constraint, or both. Although several optimization methods are commonly used for power system operation and control, more sophisticated methods are needed. The problem is complex and involves multiple interdependent infrastructures and other resources. For example, determining an optimal operation strategy for distribution systems with multiple resources during extreme events requires coordination between several infrastructures (e.g., gas and electricity) and available resources (e.g., repair crews and movable generators and storage devices) to serve critical loads. These problems involve continuous and discrete variables with different timescales and modeling approaches. Several optimization methods with different objective functions have been introduced in the literature for power system resilience evaluation and enhancement. They include the following classes:
- deterministic methods: linear programming [64], mixed-integer linear programming (MILP) [41], [47], [56], [57], [66], [70], [74], [82], [84], [89], [90], [93], [94], [97]- [100], [102], [104]- [106], [127], mixed-integer nonlinear programming (MINLP) [45], and mixed-integer second-order cone programming (MISOCP) [75], [88], [107], [116];
- stochastic methods: stochastic mixed-integer linear programming [91], [92], [96], [102] and stochastic mixed-integer nonlinear programming [61], [62], [83];
- population-based intelligent search methods: the genetic algorithm (GA) [51].

The objective functions can be categorized into resilience-based and multi-objective functions. They include minimization of load curtailment, unserved energy, and restoration time, and maximization of critical load restoration. An overview of the existing optimization techniques in the field of resilience is provided in Fig. 7. Existing optimization methods and objective functions in the field of power system resilience evaluation and enhancement are explained as follows.

A. RESILIENCE-BASED OBJECTIVE FUNCTIONS
Resilience-based objective functions have been expressed in terms of resilience elements. These include minimization of load curtailment and outage duration, and maximization of load restoration, the weighted sum of survived loads, and the rate of recovery. These objective functions relate to the various states of a power system, which can be represented as follows: (i) pre-event (resilient state); (ii) during the event (survivability state); and (iii) post-event (recovery state). In the pre-event state, all available resources of a power system are utilized to minimize operational (i.e., unit commitment and load shedding) costs. The weighted sum of survived loads during a disaster is maximized in the survivability state. In the recovery state, the weighted sum of restored loads is maximized while the costs of auxiliary functions, such as transportation of movable power systems (MPSs) and battery life-cycle degradation, are minimized.
In [82], the authors have presented two objective functions to minimize costs in the preventive and emergency states (i.e., load shedding costs considering a maximum load shedding limit). In [127], one objective function has been proposed for the preventive state (unit commitment cost minimization) and two for the post-event state. The first is a mid-level maximization of critical load supply that ignores operational costs. The second is an inner-level minimization of electricity, gas, and heat load curtailment in the worst-case scenario.
In [61], [62], a bi-level objective function has been proposed in which the operator tries to minimize operational costs (e.g., voltage regulation, load control, load shedding, and islanding) and the attacker tries to maximize the loss caused by the attacks. In [56], two interdependent stochastic stages have been constructed based on the unscented transformation (UT). The first stage maximizes the total energy supplied to customers and minimizes the generation cost. The second stage determines the shortest path for truck-mounted mobile emergency resources (MERs) using Dijkstra's algorithm. In [102], a two-stage dispatch framework has been presented to enhance power system resilience. The first stage minimizes the expected load outage duration based on the demand size and priority of loads. The second stage uses Dijkstra's shortest path algorithm within the vehicle routing problem.
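The MER routing stage in [56], [102] relies on Dijkstra's shortest path algorithm. The sketch below is a standard heap-based version over a hypothetical road graph with travel times in minutes. It only computes travel times and paths, not the dispatch decisions of the cited models.

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from `source` on a graph {node: {neighbor: minutes}}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale entry; a shorter path was already found
        for v, minutes in graph.get(u, {}).items():
            nd = d + minutes
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def route(prev, source, target):
    """Rebuild the node sequence from `source` to `target` (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical one-way road links between a depot and critical-load sites.
roads = {
    "depot": {"A": 10, "B": 25},
    "A": {"B": 5, "C": 30},
    "B": {"C": 10},
    "C": {},
}
dist, prev = dijkstra(roads, "depot")
print(dist["C"], route(prev, "depot", "C"))
```

Two-way roads can be modeled by listing each link in both directions.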
Objective functions focusing on the survivability and recovery of systems have been considered in [107]. Several objective functions have been proposed focusing on the minimization of priority-based load curtailment [41], [74], [77], [83], [93] and the maximization of load restoration [47], [57], [94], [99], [100]. In priority-based load curtailment, noncritical loads are curtailed before critical loads. The objective function formulated in [78] minimizes the costs of gas and electric load curtailment and of repair duration. An objective function has been proposed in [98] to maximize critical load restoration and minimize the effective restoration path unavailability. In [106], the objective function maximizes the weighted sum of restored load over time and minimizes the total number of trips of repair crews and movable power systems (MPSs). Load restoration receives a much higher weight than the trip count. In [54], [104], the authors have proposed objective functions that minimize unserved energy weighted by the criticality of loads.

B. MULTI-OBJECTIVE FUNCTIONS
Multi-objective functions have been expressed in terms of integrated elements such as resilience, operation, investment, and planning. In these objective functions, operational, planning, and investment-based elements are usually included in the first stage to minimize their cost before events occur. The second stage considers the terms related to events, both during and after their occurrence: minimizing load curtailment and outage duration, maximizing load restoration and the weighted sum of energy served, and other operational elements. Although operational and resilience elements have been combined in these objective functions, maximization of resilience has always been prioritized over minimization of operational costs. Two-stage optimization functions have been formulated focusing on the following:
(i) unit commitment decisions in the normal state (first stage) and minimization of costs for gas production, storage, electricity purchase, and load shedding in the second stage (worst contingency) [116];
(ii) minimization of costs for dispatchable and non-dispatchable renewable generating units and load curtailment of microgrids in the first stage, and of costs for dispatchable distributed generator units, renewable energy sources, battery energy storage systems, and load curtailment in the second stage [70];
(iii) minimization of operational cost in normal mode and of operational cost with a dynamic penalty cost for load curtailment in emergency mode [49], [50], [79];
(iv) minimization of normal operational cost and inelastic load curtailment with elastic load curtailment limits [89];
(v) minimization of investment cost at the planning stage and of operational cost with resilience constraints in the second stage [66];
(vi) minimization of investment cost in the first stage and maximization of performance, measured by the loads with access to power and water, in the second stage [51];
(vii) minimization of resilience-oriented design investment cost based on line hardening and allocation of resources such as distributed generators and switches [91];
(viii) minimization of resilience-oriented design investment costs such as line hardening and allocation of resources (e.g., distributed generators and switches) [92]; and
(ix) in the energy management system proposed in [75], minimization of the sum of generating unit operation cost and energy storage degradation cost in the normal stage, and of load loss penalty costs together with operational cost in the emergency stage.

A three-level objective function has been proposed in [85] to minimize load shedding costs and hardening investment under the worst-case scenario. The first level determines the vulnerable lines and hardening strategies. The second level determines the maximum damage that an event can cause. The third level minimizes load shedding based on the priority of loads and the available power.
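The three-level structure of [85] (hardening, then worst-case damage, then load shedding) can be illustrated by brute force on a tiny network. In this sketch, hardened lines cannot be damaged, and the attacker removes up to a fixed number of unhardened lines. The innermost level is simplified to shedding every load that loses its path to the source, weighted by a hypothetical priority. Real formulations solve this as a mixed-integer tri-level problem with power flow constraints, and enumeration is only viable for toy cases.

```python
from collections import defaultdict
from itertools import combinations

def weighted_shed(lines, damaged, source, loads):
    """Innermost level (simplified): shed every load with no path to the source."""
    adj = defaultdict(list)
    for name, (a, b) in lines.items():
        if name not in damaged:
            adj[a].append(b)
            adj[b].append(a)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return sum(w * mw for node, (mw, w) in loads.items() if node not in seen)

def harden_attack_shed(lines, source, loads, harden_budget, attack_budget):
    """Brute-force min over hardening of max over attacks of weighted load shed."""
    best = None
    for hardened in combinations(lines, harden_budget):
        exposed = [name for name in lines if name not in hardened]
        # Middle level: the attacker picks the most damaging set of exposed lines.
        worst = max(
            (weighted_shed(lines, set(hit), source, loads), hit)
            for hit in combinations(exposed, min(attack_budget, len(exposed)))
        )
        # Outer level: the defender keeps the plan with the smallest worst case.
        if best is None or worst[0] < best[0]:
            best = (worst[0], hardened, worst[1])
    return best

# Hypothetical meshed feeder; load C is critical (priority weight 10).
lines = {"L1": ("S", "A"), "L2": ("A", "B"), "L3": ("S", "B"), "L4": ("B", "C")}
loads = {"A": (2.0, 1.0), "B": (1.0, 1.0), "C": (3.0, 10.0)}   # node: (MW, weight)
for k in (1, 2):
    print(k, harden_attack_shed(lines, "S", loads, harden_budget=1, attack_budget=k))
```

With one attacked line, hardening the radial tap L4 protects everything. With two, the best plan shifts to hardening L3, which illustrates how the worst case drives the upper-level decision.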
Optimization functions have also been formulated to minimize the following:
(i) total costs for customer interruption, generation, transportation of the transportable energy storage system (TESS), and battery maintenance [97];
(ii) operational costs and load shedding penalty costs based on given priority conditions [71], [86];
(iii) the investment cost of making transmission line protection more reliable against physical attacks while minimizing load curtailment [80];
(iv) base-case startup, shutdown, and pre-positioning costs, worst-case operation costs (including load shedding and power generation costs), and movable resource costs [88];
(v) generation and power outage costs for critical customers [84];
(vi) operation costs (AC and DC generating units and electricity purchase) in normal mode, and both operating and load shedding penalty costs in emergency mode [76];
(vii) network upgrade cost with a limit on load shedding [96]; and
(viii) the total cost of PV and battery systems, using simulation-based optimization, to maintain a certain level of power supply reliability during islanding [87].
A bi-level optimization problem has been formulated in [45]. The upper level (sectionalization level) minimizes the total cost associated with restoration time, load curtailment, and power generation. The lower level (energization level) minimizes the cost of load loss, the cost of restoration delay, and the cost of generation. The authors of [90] have formulated a defender-attacker-defender (DAD) model for the gas-electric system. It minimizes the costs of power and gas production and gas storage, and the penalty for unserved power and gas. Another DAD model has been formulated in [105] to minimize the weighted sum of load shedding. Four objective functions have been proposed in [103]:
- the first minimizes the costs of PV and battery storage and of operating the entire system;
- the second maximizes the duration of load support by the PV and battery energy storage systems (BESSs) during a disruption;
- the third maximizes the support for non-black-start units;
- the fourth minimizes the expected energy not supplied.

C. SOLUTION TECHNIQUES USED TO SOLVE THE OPTIMIZATION PROBLEMS
Most of these optimization problems have been modeled as MILP, MINLP, MISOCP, stochastic MILP, or stochastic MINLP problems, so they are very complicated and computationally intensive. Therefore, a suitable decomposition algorithm is necessary to simplify and efficiently solve them. Commonly used solution algorithms are the following:
- column-and-constraint generation (C&CG);
- nested C&CG;
- Benders decomposition;
- greedy search algorithms;
- dual decomposition;
- scenario-based decomposition;
- the progressive hedging algorithm.

These algorithms have been implemented in modeling environments such as GAMS and MATLAB and solved with off-the-shelf solvers such as IBM ILOG CPLEX Optimization Studio, Gurobi, Interior Point Optimizer (IPOPT), and DDSIP.
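Progressive hedging, one of the decomposition algorithms listed above, alternates between two steps. It solves each scenario subproblem separately, then pushes the solutions toward a common first-stage decision. The sketch below applies it to a toy problem with quadratic scenario costs. This problem was chosen because each augmented subproblem then has a closed-form minimizer, and the scenario data are hypothetical. For mixed-integer resilience models, progressive hedging is typically used as a heuristic because its convergence guarantee for convex problems no longer holds.

```python
def progressive_hedging(scenarios, rho=1.0, max_iters=500, tol=1e-10):
    """Toy progressive hedging for  min_x  sum_s p_s * 0.5 * a_s * (x - t_s)**2.

    scenarios: list of (p_s, a_s, t_s) with probabilities summing to 1.
    Each augmented subproblem
        min_x 0.5*a_s*(x - t_s)**2 + w_s*x + 0.5*rho*(x - x_bar)**2
    has the closed-form solution x = (a_s*t_s - w_s + rho*x_bar) / (a_s + rho).
    """
    probs = [p for p, _, _ in scenarios]
    # Iteration 0: solve every scenario on its own (no multipliers, no proximal term).
    xs = [t for _, _, t in scenarios]
    x_bar = sum(p * x for p, x in zip(probs, xs))
    ws = [rho * (x - x_bar) for x in xs]
    for _ in range(max_iters):
        xs = [(a * t - w + rho * x_bar) / (a + rho)
              for (_, a, t), w in zip(scenarios, ws)]
        new_bar = sum(p * x for p, x in zip(probs, xs))
        # Multiplier update penalizes deviation from the consensus decision.
        ws = [w + rho * (x - new_bar) for w, x in zip(ws, xs)]
        converged = (max(abs(x - new_bar) for x in xs) < tol
                     and abs(new_bar - x_bar) < tol)
        x_bar = new_bar
        if converged:
            break
    return x_bar

# Hypothetical scenarios: (probability, curvature, scenario-optimal decision).
scen = [(0.5, 1.0, 0.0), (0.3, 2.0, 10.0), (0.2, 4.0, 5.0)]
x_ph = progressive_hedging(scen)
x_exact = sum(p * a * t for p, a, t in scen) / sum(p * a for p, a, _ in scen)
print(x_ph, x_exact)
```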

D. RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
A significant number of studies have linearized nonlinear optimization problems without assessing the accuracy of the linearization. Moreover, these problems are very complicated and computationally expensive. Therefore, further research is needed to develop computationally efficient methods and algorithms that can capture the true nonlinear behavior of power systems. Integrating machine learning with stochastic approaches could be a promising way to tackle these challenges.
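One quick way to quantify linearization accuracy is to measure the worst-case error of the approximation. Take a quadratic term such as a squared current and replace it with an equal-width piecewise-linear interpolation over m segments. The interpolation overestimates the term by at most h^2/4, where h is the segment width. The sketch below checks this bound numerically; the range and segment counts are arbitrary.

```python
def pwl_square(x, x_max, segments):
    """Equal-width piecewise-linear interpolation of x**2 on [0, x_max]."""
    h = x_max / segments
    k = min(int(x / h), segments - 1)   # index of the segment containing x
    x0, x1 = k * h, (k + 1) * h
    return x0**2 + (x1**2 - x0**2) / h * (x - x0)

def max_linearization_error(x_max, segments, samples=10_000):
    """Largest overestimate of x**2 by the interpolation, found by dense sampling."""
    return max(
        pwl_square(x, x_max, segments) - x**2
        for x in (x_max * i / samples for i in range(samples + 1))
    )

for m in (2, 4, 8):
    h = 1.0 / m
    print(m, max_linearization_error(1.0, m), h**2 / 4)
```

The error shrinks quadratically with the number of segments, while each extra segment adds variables and constraints to the MILP. That trade-off is worth reporting whenever a linearized model is used.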

IX. MODELING
Modeling the evolution of extreme events, the behavior of system components, and failure propagation is important for evaluating the resilience of power systems. Extreme weather events are modeled differently depending on the type and intensity of the event. For example, HAZUS (Hazards US) models are commonly used to model hurricanes and floods. Failure propagation due to simulated extreme weather events is usually modeled using fragility curves. System performance is usually evaluated using system-wide models and constraints such as power flow models. This section discusses the various models that have been used in power system resilience evaluation and enhancement.
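Fragility curves are often expressed as a lognormal cumulative distribution function of the hazard intensity. The sketch below implements that form. It then combines tower failure probabilities into a line failure probability, assuming that a line fails if any of its towers fails and that tower failures are independent. The median and dispersion values are hypothetical.

```python
import math

def lognormal_fragility(intensity, median, beta):
    """P(failure | intensity) = Phi(ln(intensity / median) / beta), a lognormal CDF."""
    if intensity <= 0.0:
        return 0.0
    z = math.log(intensity / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def line_failure_probability(tower_probs):
    """A line fails if any tower fails; tower failures are assumed independent."""
    survive = 1.0
    for p in tower_probs:
        survive *= 1.0 - p
    return 1.0 - survive

# Hypothetical tower fragility: median capacity 50 m/s, dispersion 0.15.
gust = 42.0
p_tower = lognormal_fragility(gust, median=50.0, beta=0.15)
p_line = line_failure_probability([p_tower] * 8)   # a line with 8 towers
print(round(p_tower, 4), round(p_line, 4))
```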

A. EXTREME EVENTS MODELING
Although power system resilience has been associated with HILP events, such events are no longer low-probability events [8]. Extreme weather events have catastrophic impacts on society [5], [7], [34], [110], [128] as well as on the resilience of power grids [108], [117]. Man-made events such as cyber-attacks have also been considered high-impact events [129]. Each type of extreme event has distinct impacts on power system performance. For example, earthquakes, windstorms, and hurricanes usually cause failures of underground cables, transmission poles, and overhead transmission lines [56], [64], [89]. Cyber events, on the other hand, affect the power grid through communication channels and control centers [130]. This section classifies extreme events and reviews the approaches used to model them in existing studies.
A proper model is required for any extreme event to identify its propagation and impact. For instance, wind speed has been widely used to determine the intensity of hurricanes [85], [117]. Several approaches have been presented to describe physical attacks and cyber-attacks. Both probabilistic [45] and deterministic [121] methods have been used to model weather-related events. Most resilience studies rely on either historical data of extreme events [50], [113], [114], [131] or forecasting models provided by meteorological agencies [77], [82]. Forecasting and historical weather data can be obtained from the National Weather Service (NWS), the National Oceanic and Atmospheric Administration (NOAA), and the Weather Research and Forecasting (WRF) model [132]. In this paper, existing extreme event modeling approaches are categorized into four groups (i.e., weather-related events, physical attacks, cyber-attacks, and cyber-physical attacks), as shown in Fig. 8.

1) WEATHER-RELATED EVENTS
Several models of weather events have been proposed in existing work. In [121], the Yan Meng wind field model has been used to calculate the wind speed of a moving typhoon and determine the duration of the event. Satellite big data has been used to identify the path of a hurricane in [56], whereas a tri-level scaled hourly historical wind profile during hurricane events has been applied in [117]. In [81], [113], an hourly wind speed profile obtained from MERRA (Modern-Era Retrospective analysis for Research and Applications) has been scaled up based on the Beaufort wind scale provided by the U.K. Meteorological Office to produce a realistic wind profile for modeling hurricanes. The scaling factors have been determined by characterizing the spatiotemporal properties of adverse events using U.K. historical time series of wind gust data. In [65], a simulator has been used to reproduce the observed spatial correlation and extreme statistics of adverse winds, incorporating the occurrence of storms throughout the year. HAZUS-MH, one of the most widely used hurricane models, has been used to simulate a real hurricane event based on historical records [131]. The HAZUS-MH model was developed by the Federal Emergency Management Agency (FEMA) and has been used to simulate flood scenarios based on historical data [133]. It has also been used to simulate a typhoon scenario for critical infrastructure resilience assessment [114].
Several models have also been proposed for earthquakes, wildfires, and floods. In [86], a wildfire model based on the rate of spread, solar radiation, and radiative heat flux has been proposed using historical data. A probabilistic earthquake energy transfer model based on the autoregressive (AR) estimation method has been proposed in [73]. This model estimates peak ground acceleration from three main variables: earthquake magnitude on the Richter scale, the distance between the earthquake epicenter and the location of interest, and the ground type. In [77], a flood model based on rainfall intensities from weather agencies' prediction models has been used. A forecasting model has been used in [88] to estimate the ice thickness forecast error. An ice disaster model has been proposed in [58] to calculate the rate of ice accretion from five main parameters: precipitation rate, liquid water content, wind speed, path, and moving speed.

2) CYBER-ATTACKS
Cyber-attacks can severely impact the resilience of power systems, especially if they are planned based on prior reconnaissance missions. Although sufficient historical data to model cyber-attacks are not available, modeling the cyber layers and their interactions with the physical layers can capture the extent to which cyber-attacks can impact the functionality of power systems. Cyber incidents can be classified as communication inefficiency, information distortion, device malfunction, secrecy leakage, and application misconfiguration. The main domains for cyber-attacks are application software, communication networks, and field devices. A cyber vector represents the path that an attacker takes to target specific cyber elements. Cyber-attack approaches have been reviewed in [130], illustrating several ways to create a cyber-attack event. To simulate a cyber-attack, the control systems of 50 generators have been infected by malware known as the Erebos Trojan. The malware was able to drive the generators into overload, leading to the collapse of the system [129], [130].
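The notion of a cyber vector as a path through cyber elements can be sketched as a shortest-path search over a reachability graph. The element names and connections below are entirely hypothetical, and breadth-first search (fewest hops) is an assumed stand-in for more realistic attack-graph models that weight each step by difficulty.

```python
from collections import deque

# Hypothetical reachability graph: an edge u -> v means an attacker who
# controls u can reach v (e.g., through a network connection or stolen credential).
ACCESS_GRAPH = {
    "corporate_email": ["engineering_workstation"],
    "vpn_gateway": ["engineering_workstation", "historian"],
    "engineering_workstation": ["scada_server"],
    "historian": [],
    "scada_server": ["rtu_substation_a", "generator_controller"],
    "rtu_substation_a": [],
    "generator_controller": [],
}


def shortest_attack_vector(graph, entry, target):
    """Breadth-first search for the fewest-hop path from entry to target, or None."""
    parent = {entry: None}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back to the entry point
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(shortest_attack_vector(ACCESS_GRAPH, "corporate_email", "generator_controller"))
```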

3) PHYSICAL-ATTACKS
Although physical attacks may not affect as large a part of the grid as cyber-attacks, the damage can be significant if an attacker identifies and attacks a critical component of the grid. In [59], two types of malicious attacks have been studied for a resilience enhancement strategy: high degree adaptive (HDA) and optimal collective influence (CI). Both models rely on identifying and attacking the most critical edges in a graph-based network. In [80], physical attacks have been simulated as N-2 contingencies for small transmission systems and N-3 contingencies for larger systems.
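A high degree adaptive attack can be sketched on an undirected graph: at each step the element with the highest degree in the remaining network is removed, and degrees are recomputed before the next removal. This sketch removes nodes (buses), as in the usual network-science formulation; that choice is an assumption, and an edge-based variant as described for [59] would rank lines instead. The test graph is hypothetical.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                      # iterative depth-first traversal
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def hda_attack(adj, budget):
    """Remove up to `budget` nodes, each time the highest-degree node of the remaining graph.

    Returns a list of (removed_node, largest_component_size_after_removal).
    Ties are broken by the larger node label so the result is deterministic.
    """
    removed, history = set(), []
    for _ in range(budget):
        degree = {n: sum(1 for nb in adj[n] if nb not in removed)
                  for n in adj if n not in removed}
        if not degree:
            break
        target = max(degree, key=lambda n: (degree[n], n))
        removed.add(target)
        history.append((target, largest_component_size(adj, removed)))
    return history


if __name__ == "__main__":
    # Hypothetical 7-bus network given as a symmetric adjacency list.
    grid = {1: [2, 3, 4, 5], 2: [1], 3: [1], 4: [1], 5: [1, 6], 6: [5, 7], 7: [6]}
    print(hda_attack(grid, 2))
```

The adaptive recomputation is what distinguishes HDA from a static ranking: once the hub is removed, a node that was only second-tier can become the most connected remaining target.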

4) CYBER-PHYSICAL ATTACKS
In [61], [62], a two-level cyber-physical disruption model has been presented. In the first level, distribution network disturbances are modeled based on predefined security scenarios developed by the National Electric Sector Cybersecurity Organization Resource (NESCOR) [134]. In the second level, transmission network disturbances are modeled as a sudden voltage drop or a sudden frequency drop. Security-constrained N-1 and N-2 contingency approaches have been used to simulate cyber-physical attacks in [135].

5) RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Although modeling extreme events, especially weather-related events, has been studied extensively, several research gaps remain. First, most available weather-related forecasting methods rely on assumptions and approximations that reduce the accuracy of the results. The meteorological data used in forecasting weather-related events mostly come from local historical datasets capturing the propagation of a single event in a specific geographical location; therefore, the resulting forecasting models can only be used for that location. Moreover, the data used are usually assumed to be fully reliable, whereas noise, communication and calibration errors, and other uncertainties should be modeled. To enhance weather forecasting models, big data analytics and deep learning methods can be utilized to obtain higher accuracy. They can also be utilized to develop a more generic scenario-creation tool that captures the spatiotemporal effects of adverse weather events. Finally, developing a method to simulate cyber-attacks remains a major challenge due to the lack of sufficient historical datasets and the many uncertainties involved.

B. FAILURE MODELING
Failures of power grids caused by extreme events can be represented by system-level and component-level failure models, as shown in Fig. 9. Component-level failure models usually apply probabilistic fragility curves to estimate the probability of failure as a function of a weather parameter. System-level failure models, in contrast, estimate the failure risk based on the characteristics of the power system, the event, and the geographical area [132].

1) SYSTEM-LEVEL FAILURE MODELS
Two main approaches have been presented to model overall system failure: multivariate regression-based statistical models and tree-based data mining models. Although these methods have been compared in detail in [136] using a statistical validation approach, none of them achieves 100% accuracy [132]. Moreover, most of these methods assume that power system components are stationary, i.e., that no changes take place over the time horizon.

a: SYSTEM STATISTICAL REGRESSION MODELS
In [136], two regression models have been studied: the generalized linear model (GLM) and the generalized additive model (GAM). The GLM is a linear regression model that requires: (1) a conditional distribution of the response (e.g., the number of outages) for each set of explanatory variables; (2) a link function relating the parameters of that distribution to a function of the explanatory variables; and (3) a regression equation defining this function of the explanatory variables. The GAM, on the other hand, can capture non-linear relationships by replacing the linear terms with smooth functions of the explanatory variables.
A multivariate approach has been proposed in [122] to estimate hurricane outage duration, outage frequency, and the number of affected customers based on hazard characteristics (e.g., wind speed and wind duration), system topology (e.g., protection devices), regional land cover, and topography and local conditions (e.g., soil type and tree trimming).
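A GLM of the kind described above can be sketched for outage counts with a Poisson distribution and a log link, using a single explanatory variable such as peak gust speed. The Poisson family, the single covariate, and the data below are assumptions for illustration; the models in [122], [136] use many more explanatory variables, and a GAM would replace the linear term b1·x with a smooth function of x.

```python
import math


def fit_poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit the Poisson GLM log(mu) = b0 + b1*x by Newton-Raphson.

    For the canonical log link, Newton-Raphson coincides with IRLS.
    x is centred internally to improve numerical conditioning.
    """
    n = len(x)
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]
    a, b1 = math.log(sum(y) / n), 0.0          # a is the intercept on the centred scale
    for _ in range(max_iter):
        mu = [math.exp(a + b1 * xi) for xi in xc]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(xc, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(xc, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(xc, mu))
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(Fisher information) @ score.
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a, b1 = a + step_a, b1 + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a - b1 * x_bar, b1                  # intercept back on the original x scale


if __name__ == "__main__":
    # Hypothetical data: peak gust speed (m/s) and outages per service area.
    gust = [15, 18, 20, 22, 25, 28, 30, 33]
    outages = [1, 2, 2, 4, 6, 9, 12, 18]
    b0, b1 = fit_poisson_glm(gust, outages)
    print(f"expected outages = exp({b0:.3f} + {b1:.3f} * gust)")
```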

b: SYSTEM TREE-BASED MINING MODELS
The tree-based data mining model uses recursive binary partitioning of historical datasets to exploit the relationship between response variables (e.g., failures of transmission poles) and explanatory variables [136]. Two tree-based approaches have been studied in [136]: classification and regression trees (CART) and Bayesian additive regression trees (BART). In the CART model, a single tree captures the relationship by clustering the data [136], whereas the BART model consists of a large number of small trees, each making a limited contribution to the final model [137].
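Recursive binary partitioning can be illustrated with a minimal CART-style regression tree on one explanatory variable (e.g., maximum wind speed) and one response (e.g., number of failed poles per area). The data and depth limit are hypothetical, and only the squared-error split criterion is shown; BART, which sums many shallow trees under a Bayesian prior, is not implemented here.

```python
def _sse(values):
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the binary split on x that minimises SSE."""
    pairs = sorted(zip(x, y))
    best_thr, best_sse = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # no threshold can separate equal x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        sse = _sse(left) + _sse(right)
        if sse < best_sse:
            best_thr, best_sse = (pairs[k - 1][0] + pairs[k][0]) / 2.0, sse
    return best_thr, best_sse


def grow_tree(x, y, max_depth=2):
    """Recursive binary partitioning; each leaf stores the mean response."""
    if max_depth == 0 or len(set(x)) < 2:
        return sum(y) / len(y)
    thr, _ = best_split(x, y)
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= thr]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > thr]
    return {
        "threshold": thr,
        "left": grow_tree([a for a, _ in left], [b for _, b in left], max_depth - 1),
        "right": grow_tree([a for a, _ in right], [b for _, b in right], max_depth - 1),
    }


def predict(tree, xi):
    """Follow the thresholds from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if xi <= tree["threshold"] else tree["right"]
    return tree


if __name__ == "__main__":
    wind = [10, 12, 14, 30, 32, 34]      # hypothetical max wind speed (m/s)
    failed = [0, 0, 0, 5, 5, 5]          # hypothetical failed poles
    tree = grow_tree(wind, failed, max_depth=1)
    print(tree, predict(tree, 31))
```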

2) COMPONENT-LEVEL FAILURE MODELS
Most resilience-based studies have focused on modeling the failure of system elements under extreme weather events. HILP events are very hard to model due to their stochastic behavior and the lack of historical data [43], [50], [57]. The most well-known methods to determine failed elements are random outage methods, scenario-based methods, and fragility curves [75], [117]. In random outage methods, several elements are randomly selected to be in the down state without considering a forecasted or real-time event scenario [67], [83], [118]. A scenario-based method applies either a historical event or a simulated event on a geographical map to determine the impacted points of a real power system [76], [83], [106]. Fragility curve models have been used extensively to calculate the probability of failure of system elements for a given event parameter such as wind speed or earthquake ground acceleration [6], [33], [35], [53], [66], [73], [74], [102], [121].

a: FRAGILITY CURVE
A fragility curve captures the stochastic response of system elements to weather conditions, with respect to sequential and regional characteristics, based on historical data [41]. Four main approaches have been studied to obtain a fragility curve: (1) statistical analysis of large historical failure datasets; (2) expert judgment; (3) experimental studies applying variable shocks to a given element; and (4) a combination of the three [65], [127]. The fragility curve varies with the parameter used to measure the event [117] and with the event severity level [73]. A detailed fragility modeling approach has been presented in [113] to estimate transmission tower failures caused by severe windstorms. That fragility model has been obtained by analyzing geometrical and material nonlinearities under a wide range of wind loads using finite element analysis and European codes. In [60], a lognormal distribution function has been used to create a fragility curve for hurricanes based on hurricane scenarios modeled with HAZUS-MH3. Another fragility curve has been constructed in [50] from logs of distribution line failures due to wind, obtained from the National Fault and Interruption Reporting Scheme database.
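A lognormal fragility curve gives the failure probability as P_f(v) = Φ(ln(v/θ)/β), where Φ is the standard normal CDF, θ is the median capacity (the intensity at which failure probability is 50%), and β is the logarithmic standard deviation. The sketch below evaluates it with the standard library; the median of 45 m/s and β = 0.15 are hypothetical values, not taken from [60], [138], or any other cited study.

```python
import math


def lognormal_fragility(intensity, median, beta):
    """Probability of failure at a given hazard intensity (e.g., wind speed in m/s).

    P_f = Phi(ln(intensity / median) / beta), with Phi the standard normal CDF,
    median the intensity giving 50% failure probability, and beta the log-std.
    """
    if intensity <= 0:
        return 0.0
    z = math.log(intensity / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical tower parameters: median capacity 45 m/s, beta 0.15.
    for v in (30, 40, 45, 50, 60):
        print(v, round(lognormal_fragility(v, 45.0, 0.15), 3))
```

A smaller β makes the curve steeper, i.e., the element switches from "almost surely intact" to "almost surely failed" over a narrower intensity range.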

b: WEATHER-RELATED FRAGILITY MODELS
A fragility curve provides a means to assess the impact of extreme events on system elements and to determine their unavailability. At every simulation instant, the forecasted weather profile is mapped onto the fragility curve to obtain failure probabilities [81]. Several fragility curves have been studied in weather-related resilience studies [35], [85], [121]. A seismic vulnerability assessment algorithm using four fragility curves based on the peak ground acceleration of the earthquake has been presented in [73]. A fragility curve model based on wind speed has been implemented for transmission lines and towers in [53], [121]. In [47], [50], [51], [69], [91], [93], [98], a predeveloped fragility curve has been used for distribution poles and conductors. A fragility model developed by the Resilient Electricity Networks for Great Britain (RESNET) has been used to assess element failures based on wind speed [41], [117]. A flood-induced fragility model based on rainfall intensity has been used in [77] for a proactive microgrid scheduling strategy. A detailed methodology has been presented in [138] to estimate the probability of line failure based on wind force and the maximum rated perpendicular stress resistance of the line. A lognormal fragility curve has also been presented in [138] to determine the probability of substation failures under windstorms. Fragility models have been used to determine failures of transmission poles and lines under ice storms in [58], [74], [95].
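The mapping of a forecasted hourly wind profile onto a fragility curve can be sketched as follows. The modeling assumptions here are ours, not necessarily those of [81]: the fragility value at each hour is treated as that hour's failure probability, a line is a series system of identical towers (it fails if any tower fails), failures are independent across hours and towers, and a failed line stays failed for the rest of the event. All parameter values are hypothetical.

```python
import math


def lognormal_fragility(v, median, beta):
    """Lognormal fragility: failure probability at wind speed v (m/s)."""
    if v <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf(math.log(v / median) / (beta * math.sqrt(2.0))))


def line_failure_probability(tower_probs):
    """A line treated as a series system fails if any tower fails (independence assumed)."""
    survive = 1.0
    for p in tower_probs:
        survive *= 1.0 - p
    return 1.0 - survive


def cumulative_failure(hourly_wind, n_towers, median, beta):
    """Probability that the line has failed by the end of each hour of the event."""
    survive, out = 1.0, []
    for v in hourly_wind:
        p_tower = lognormal_fragility(v, median, beta)
        p_line = line_failure_probability([p_tower] * n_towers)
        survive *= 1.0 - p_line          # line must survive every hour so far
        out.append(1.0 - survive)
    return out


if __name__ == "__main__":
    forecast = [20.0, 32.0, 41.0, 38.0, 25.0]   # hypothetical hourly wind speeds (m/s)
    probs = cumulative_failure(forecast, n_towers=10, median=45.0, beta=0.15)
    print([round(p, 3) for p in probs])
```

The series-system step explains why long lines are more vulnerable than their individual towers: even a small per-tower probability compounds over many towers.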

c: OTHER MODELS
Various other equipment failure models and approaches have been presented in the power system resilience literature. In [80], [97], [100], [104], physical attacks have been simulated by identifying a specific number of attacked lines, substations, and poles, and a scenario-based decomposition algorithm has been used to reduce the number of physical attacks to be considered, whereas in [105] the attacked nodes have been selected based on the attacker's budget. Another framework has been presented in [57] to estimate the location and duration of faults based on the number, type, location, and resources. A weather forecasting model has been integrated in [89] to estimate microgrid islanding time and duration for a proactive management strategy. In [66], [116], a distribution grid has been divided into a specified number of regions in which a defined number of power line outages has been determined using uncertainty modeling, and the same approach has been applied to transmission lines in [82]. Monte Carlo simulation has been used in [107] to simulate more than 10,000 randomly generated damage scenarios for power branches. A bi-level interdiction optimization model has been proposed in [90] to identify target points to be attacked in an interdependent gas/power system. In [88], the forecasted ice thickness on transmission lines has been used to determine the faulty lines. In [86], the main distribution feeder has been assumed to be affected by a wildfire, whereas in [106] the failure points have been identified based on a predefined weather scenario. A Trojan malware has been used in [130] to control 50 generators and initiate a cyber-attack scenario.
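Monte Carlo generation of damage scenarios, as in [107] and in random outage methods, can be sketched by sampling an independent on/off state for each branch from its failure probability (which could, for instance, come from a fragility curve). The independence assumption, branch probabilities, and seed are hypothetical; correlated failures (e.g., along a storm track) would require a joint sampling model.

```python
import random


def sample_damage_scenarios(fail_probs, n_scenarios, seed=0):
    """Each scenario is a list of booleans, True meaning the branch is damaged.

    Branch failures are drawn independently with the given probabilities.
    """
    rng = random.Random(seed)
    return [[rng.random() < p for p in fail_probs] for _ in range(n_scenarios)]


def expected_damaged(scenarios):
    """Monte Carlo estimate of the expected number of damaged branches."""
    return sum(sum(s) for s in scenarios) / len(scenarios)


def prob_at_least(scenarios, k):
    """Fraction of scenarios with at least k damaged branches."""
    return sum(1 for s in scenarios if sum(s) >= k) / len(scenarios)


if __name__ == "__main__":
    branch_probs = [0.1, 0.2, 0.5, 0.05]        # hypothetical per-branch failure probabilities
    scenarios = sample_damage_scenarios(branch_probs, 10_000, seed=42)
    print(expected_damaged(scenarios), prob_at_least(scenarios, 2))
```

With 10,000 scenarios the estimate of the mean number of damaged branches has a standard error on the order of 0.01 for these probabilities, which is one reason scenario counts of this size are common.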

3) RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Despite extensive research on accurate failure models for power system resilience assessment, further research is still required to develop holistic failure models. First, most studies implement fragility curves for failure modeling, which cannot capture the spatiotemporal effects of extreme events. In other words, fragility curves cannot provide a realistic representation of the realization and propagation of extreme events and their impacts on the failure of power grid devices. Moreover, modeling of the interdependency between the grid's cyber and physical layers is not well developed and requires more extensive study to understand the propagation of cyber-attacks to the physical layer and of physical attacks to the communication and cyber layers. Integrating scenario-based simulation methods with more accurate fragility curves could provide a means to develop holistic and accurate failure models. Furthermore, the correlation between interdependent energy systems should be studied extensively to capture the impact of the failure of each system's elements on the elements of connected systems.

C. SYSTEM MODELING
Several models have been presented to model electric power systems in resilience-based studies, which vary according to (1) system type (e.g., transmission systems and DSs, microgrids, and interdependent systems); (2) enhancement strategy (e.g., smart grid technologies, utilization of energy storage systems, and resilience-based maintenance scheduling); (3) power flow model (e.g., AC, DC, and linearized DistFlow); (4) solution algorithm (e.g., mixed-integer programming, heuristic algorithms, and optimal power flow); and (5) operational and technical constraints (e.g., power balance, ramp rates, and availability). Each category plays a vital role in formulating the system model for resilience evaluation and enhancement. Moreover, some studies have reviewed the main assessment models and tools used in resilience-based problems [109], [123].
The system type is the first and main consideration when conducting resilience assessment and enhancement studies. Distribution systems usually have a radial configuration [72], whereas transmission systems are mostly meshed [45]. Unit commitment, power balance, and power losses are the main constraints to be considered in any transmission-level study [81], [117]. On the other hand, distributed generators, energy storage units, and switch statuses are usually considered in distribution-level studies [77], [86]. In addition, interdependent infrastructure systems require modeling the interdependency between the different layers of the system [51], [78], [114].
As in other power system studies, power flow models are essential components in modeling power systems for resilience assessment and enhancement. The main differences between existing power flow models are their degrees of complexity and approximation. AC power flow models provide a detailed representation of power flow [66], [80], whereas DC power flow models are less accurate [74] because they neglect reactive power, losses, and voltage magnitude variations. Both AC and DC power flow models have been applied to model distribution systems and microgrids [50], [69]- [71], [76], [77] and transmission systems [82]- [84] for resilience evaluation and enhancement. Unbalanced three-phase power flow models have been implemented using OpenDSS in [92]. Although AC and DC power flow models are used more frequently, a linearized DistFlow model has been proposed for distribution systems [78], [97], [98], [107].
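The DC power flow approximation mentioned above can be sketched in a few lines: with flat voltage magnitudes, small angle differences, and negligible resistance, line flows become P_ij = (θ_i - θ_j)/x_ij, and bus angles follow from the linear system B'θ = P with the slack angle fixed at zero. The 3-bus data below are hypothetical, and a plain Gaussian elimination is used to keep the example dependency-free.

```python
def dc_power_flow(n_bus, lines, injections, slack=0):
    """Solve the DC power flow B' * theta = P with the slack-bus angle fixed at 0.

    lines: list of (from_bus, to_bus, reactance_pu); injections: net P (pu) per bus
    (the slack entry is ignored; the slack picks up the balance).
    Returns (theta, flows) where flows[(i, j)] = (theta_i - theta_j) / x_ij.
    """
    # Build the nodal susceptance matrix B.
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines:
        b = 1.0 / x
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b
    keep = [k for k in range(n_bus) if k != slack]
    # Reduced system (slack row and column removed) as an augmented matrix.
    A = [[B[r][c] for c in keep] + [injections[r]] for r in keep]
    m = len(keep)
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = A[r][m] - sum(A[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = s / A[r][r]
    theta = [0.0] * n_bus
    for k, val in zip(keep, sol):
        theta[k] = val
    flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
    return theta, flows


if __name__ == "__main__":
    # Hypothetical 3-bus system: bus 0 is the slack, buses 1 and 2 are loads (pu).
    lines = [(0, 1, 0.1), (0, 2, 0.2), (1, 2, 0.25)]
    theta, flows = dc_power_flow(3, lines, [0.0, -0.6, -0.4])
    print(theta, flows)
```

Because the model is linear, it is easy to embed in mixed-integer resilience optimization (e.g., with line on/off variables), which is a large part of its appeal despite the lost accuracy.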

RESEARCH GAPS, CHALLENGES, AND FUTURE DIRECTIONS
Voltage and frequency regulation is usually ignored in system modeling to simplify the computations. Most existing studies have neglected protection-related parameters and settings during a disaster because it is more challenging to implement segmentation and islanding with conventional protection systems. However, work similar to [139] could be a good starting point; new adaptive protection systems should be studied to accommodate bidirectional power flow in microgrid resilience-enhancement applications. In addition, the dynamic behavior of RES, BESS, and mobile emergency resources is usually neglected in microgrid islanding and formation approaches because of associated uncertainties such as weather-related variability. Also, a short scheduling horizon, usually one day, has a negative impact on grids with a high penetration of renewable energy and energy storage units: critical loads may be supplied on the current day at the cost of load shedding on the following day. Therefore, longer scheduling horizons should be considered by using multi-stage (parallel) optimization approaches.
In distribution systems, most existing approaches consider radial topologies for simplicity. However, real meshed network configurations still require more implementation and extensive study. Although integrating all meshed-network-related constraints is computationally demanding because of the high level of complexity, it is needed to capture the performance of practical systems.
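Part of the simplification that radiality brings can be seen in the linearized DistFlow (LinDistFlow) equations: with losses neglected, the flow on each branch equals the total load downstream of it, and squared voltage magnitudes follow v_j = v_i - 2(r_ij P_ij + x_ij Q_ij). In a meshed network, branch flows are no longer fixed by downstream sums and depend on the loop impedances. The feeder data below are hypothetical, and this sketch is not the specific formulation of [78], [97], [98], [107].

```python
def lindistflow(parent, r, x, p_load, q_load, v_root=1.0):
    """Linearized DistFlow for a radial feeder (losses neglected).

    parent[j] is the upstream node of j (the root has parent None); r[j], x[j] are the
    resistance/reactance (pu) of branch parent[j] -> j; p_load/q_load are node loads (pu).
    Returns squared voltage magnitudes v[j] = |V_j|^2 and branch flows P[j], Q[j]
    (for the root, P and Q hold the total feeder load).
    """
    children = {n: [] for n in parent}
    root = None
    for j, i in parent.items():
        if i is None:
            root = j
        else:
            children[i].append(j)
    # Depth-first preorder from the root: parents always precede their children.
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])
    # Accumulate downstream load: in reverse preorder, children are processed first.
    P = {n: p_load.get(n, 0.0) for n in parent}
    Q = {n: q_load.get(n, 0.0) for n in parent}
    for n in reversed(order):
        if parent[n] is not None:
            P[parent[n]] += P[n]
            Q[parent[n]] += Q[n]
    # Propagate squared voltages from the root down the feeder.
    v = {root: v_root}
    for j in order:
        i = parent[j]
        if i is not None:
            v[j] = v[i] - 2.0 * (r[j] * P[j] + x[j] * Q[j])
    return v, P, Q


if __name__ == "__main__":
    # Hypothetical 3-node chain feeder: 0 (substation) -> 1 -> 2.
    parent = {0: None, 1: 0, 2: 1}
    v, P, Q = lindistflow(parent, {1: 0.01, 2: 0.01}, {1: 0.02, 2: 0.02},
                          {1: 0.5, 2: 0.3}, {1: 0.2, 2: 0.1})
    print(v, P, Q)
```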
Lessons learned in Japan after the earthquake and tsunami show that demand-side management (DSM) plays an important role after disasters [10]. Although emulating customer behavior during disasters is very complicated, an algorithm that captures these behaviors would facilitate system restoration.
Most existing approaches neglect the role of customer-owned energy resources in the restoration phase. The current interconnection standard (primarily IEEE Std 1547), which requires customer-owned DGs to disconnect during disturbances for safety and power quality, is holding back the development of distributed control approaches that include such resources for faster system restoration. Moreover, there is no clear compensation scheme for privately owned energy resources during blackouts. Thus, proper rate incentive plans and new policies should be designed to encourage DG owners to participate in grid services. Also, perfect information has been assumed when modeling resource allocation, although such information may not be reliable or accessible during and after disasters.
For interdependent infrastructure systems, the impact of extreme events on fuel supply has been neglected, resulting in unrealistic evaluation results. Since extreme events directly affect fuel supply, the relationship between fuel and other power system elements must be studied. Also, the amount of fuel needed during disasters to maintain a minimum resilience level can be calculated using event information.

X. GENERAL CONSIDERATIONS
This section provides a general discussion of the attributes of power system resilience and its relevance to other studies.

1) SYSTEM PERFORMANCE VS SYSTEM CHARACTERISTICS
Power system resilience has been evaluated both as a system performance and as a system characteristic. In power system reliability and stability assessment, reliability measures system performance, whereas stability measures system dynamics (i.e., system characteristics). By analogy, resilience attributes such as withstand, absorb, recover, and adapt describe how the system responds to disturbances and can be regarded as intrinsic characteristics of power systems. A technical report prepared by Pacific Northwest National Laboratory also suggests that resilience is an intrinsic characteristic of a grid or portion of a grid [2]. Therefore, given these attributes, power system resilience can be regarded as an intrinsic system characteristic.

2) NUMBER OF RESILIENCE METRICS
Several metrics may be needed to assess power system resilience, as is the case for power system reliability and security assessment. Potential metrics can measure the aforementioned resilience attributes: withstand, absorb, recover, and adapt. Also, transmission and distribution systems should have different resilience metrics because (1) they have different response dynamics to disturbances; (2) they have different restoration and recovery processes; and (3) extreme events may affect them differently, since they differ in size and span different geographical areas.

3) RESILIENCE AND RELIABILITY
The North American Electric Reliability Corporation (NERC) defines the reliability of bulk power systems based on two concepts: adequacy and operating reliability [140]. Adequacy measures the ability of a power system to supply the load demand, whereas operating reliability measures the ability of a power system to withstand sudden disturbances. Power system reliability focuses on the rate of occurrence of events, whereas resilience may focus on withstanding disturbances and extreme events, as well as on the recovery process, rather than on the rate of occurrence. Also, reliability metrics typically describe system performance, but they neither describe system response nor include outage information. Therefore, resilience and reliability assessments are distinct but interlinked.

4) OPERATING AND PLANNING RESILIENCE
Power system resilience metrics can be divided into operating and planning metrics. Operating resilience has been defined as the characteristic that helps a power system maintain its operational strength and robustness against disasters (e.g., keeping all customers connected). Operating resilience metrics would measure the ability of power grids to withstand, absorb, and recover from sudden disturbances and extreme events with existing resources and control systems. Planning resilience metrics would identify critical components so that they can be targeted to enhance the resilience of the power grid.

5) IMPACTS OF RES ON POWER SYSTEM RESILIENCE
RES can both enhance and degrade the resilience of power systems. High penetration of RES has negative impacts on system inertial response and voltage stability [141], [142]. On the other hand, RES can improve system 'recovery' and expedite restoration after blackouts.
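The link between reduced inertia and a weaker ability to withstand disturbances can be made concrete with the aggregated swing equation (per unit on the system base, damping neglected), which gives the initial rate of change of frequency (RoCoF) after a sudden power imbalance ΔP. This worked example is illustrative; the numbers are hypothetical and are not taken from [141], [142].

```latex
\frac{2H}{f_0}\,\frac{df}{dt} = P_m - P_e
\quad\Longrightarrow\quad
\left.\frac{df}{dt}\right|_{t=0^+} = -\frac{\Delta P\, f_0}{2H},
\qquad
H = \frac{\sum_i H_i S_i}{S_{\mathrm{sys}}}
```

Converter-interfaced RES contribute no inherent rotating inertia (H_i = 0 unless synthetic inertia is provided), so displacing synchronous machines lowers the aggregate H. For f_0 = 60 Hz and a sudden generation loss of ΔP = 0.05 pu, H = 5 s gives an initial RoCoF of -0.05 × 60 / (2 × 5) = -0.3 Hz/s, whereas H = 2 s gives -0.75 Hz/s, leaving less time for primary frequency response to arrest the decline.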
The negative impacts of RES mainly relate to the angle and voltage stability of transmission systems, where a small disturbance may lead to system-wide instability. Several technical reports and research papers have shown that power systems become prone to instability if the penetration of RES reaches a certain level (these levels are system-dependent) [143]- [146]. Therefore, in terms of power system resilience definitions and attributes, RES may degrade the ability of bulk power systems to 'withstand' disturbances. Conversely, RES can improve system resilience by supplying local loads and isolated areas at the distribution level and by participating in blackout restoration at both the transmission and distribution levels. Also, RES can be used to provide local voltage support and to enable autonomous microgrid reconfiguration after disturbances [7], [119]. Therefore, in terms of power system resilience definitions and attributes, RES may improve the ability of power systems to 'recover' from disturbances and blackouts.

XI. CONCLUSION
This paper has provided a comprehensive and critical review of existing definitions and currently practiced power system resilience metrics and evaluation methods. It has also thoroughly examined the consensus on power system resilience definitions and metrics provided by different organizations and scholars. Furthermore, this paper has identified research gaps and associated challenges, proposed potential solutions, provided future directions for developing resilience metrics and evaluation methods, and discussed general considerations for various resilience-related attributes. The work presented in this paper is intended to contribute toward the development of universally accepted and standardized definitions, metrics, and evaluation methods for power system resilience. In addition to the necessity of developing universally accepted power system resilience definitions, metrics, and evaluation methods, it is critical to develop multi-objective optimization methods for both resilience enhancement and evaluation. Also, comprehensive modeling of system components and of the inter- and intra-actions between and within subsystems and interconnected systems is a necessary step toward developing effective resilience evaluation methods and enhancement strategies. Therefore, this paper has also reviewed optimization methods and strategies for resilience enhancement and various modeling approaches, together with their associated challenges, research gaps, and potential solutions.

FIGURE 1. Framework of the paper.

FIGURE 2. Number of disaster events in the United States from 1980 to 2019 that exceed one billion dollars in losses [8].

FIGURE 3. Examples of extreme events, where M denotes the number of customers without power in million.

FIGURE 7. Optimization methods for resilience evaluation and enhancement.

FIGURE 8. Types of extreme events.

FIGURE 9. Categories of failure models.