Evaluation of Power Systems Resilience to Extreme Weather Events: A Review of Methods and Assumptions

The requirement for a sustained supply of electricity intensifies during and in the aftermath of extreme events. In the past, events were considered extreme based on their extensive and devastating impacts and also because they were rare. However, recent studies report an increase in the frequency, intensity, and duration of weather-related power outages. Several studies have proposed approaches for evaluating and enhancing power system resilience. The generic approach entails characterizing weather threats, assessing the vulnerabilities of system components, analyzing the system response, evaluating baseline resilience, and assessing the effectiveness of resilience enhancement measures. This study reviews the different assumptions and models employed in this multiphase analysis process. To demonstrate its utility, a brief use case is provided to ascertain the resilience of Great Britain's transmission network against a lightning strike. The findings demonstrate that network performance during a threat can be influenced more severely by internal systemic maloperations than by the event's intensity. This challenges the pervasive view of resilience assessment, which mainly focuses on exogenous threats. Moreover, the study highlights the threat characterization and vulnerability assessment phases as the main sources of uncertainties, which can be moderated through the capture of relevant data aimed at developing holistic empirical fragility functions.


I. INTRODUCTION
Power system outages are projected to increase with the regular occurrence of high impact low probability (HILP) events [1], [2], [3]. Such events are known for their swift and devastating nature, causing damage to multiple components over large areas. This leads to high repair costs and economic losses. In the United States of America (USA), such losses from weather-related power interruptions have been estimated at $25-75 billion annually [4], whereas, in a low-income country like Uganda, drought events have previously caused $237 million in losses [5].
It is projected that HILPs will increase exponentially due to climate change [3], [6], intensified by a host of other factors such as the rapid increase and change in demand profiles [7]. This will inevitably require significant alterations to the current network architecture, potentially introducing new uncertainties to grid operations. Similarly, the increasing penetration of variable renewable energy (VRE) sources and distributed energy resources (DERs) has been demonstrated to exacerbate the effects of HILPs on power systems [1], [7], [8]. Ironically, VREs and DERs are in various contexts a means of enhancing grid resilience [9], [10], [11]. This dilemma illustrates that there are no neutral solutions to increasing system reliability during HILPs but rather a combination of operational, organizational, and structural trade-offs. Finding effective trade-offs for building and enhancing power system reliability with respect to HILPs is what informs most resilience analysis processes [12], [13], [14].
In the main, the study of resilience seeks to devise and assess adaptation measures and undertake risk assessments for operating and financing infrastructure projects [15]. These motivations necessitate the formulation and standardization of resilience metrics and the development of formal computational methods for evaluating both baseline resilience and the effectiveness of enhancement techniques [4], [16]. There is also additional interest from policymakers regarding the enactment of evidence-based regulations, strategies, and plans that would enable both the enhancement of current systems' resilience and the development of industry guidelines to normalize resilience-thinking throughout the infrastructure life cycle [4], [14].
Furthermore, the objective of building resilient infrastructure is not open-ended since it ought to reconcile the energy trilemma goals (affordability, sustainability, and security) [17]. Essentially, this requires a consideration of several parameters as well as a delineation of their interactions. These include target performance, performance thresholds, component damages, cascading failures, system performance, resourcefulness, robustness, redundancy, recovery, resilience capacities, failure detection, and human resource capabilities, among others [9]. Evaluating such parameters and making sense of their interactions requires complex models and massive datasets. In addition, given that such studies inherently attract numerous uncertainties due to the irregularities of extreme events, several assumptions are made, mainly to fill in data gaps and to provide proximate allusions to perceived or expected reality.
Accordingly, recent studies [6], [7], [12], [16], [18], [19], [20], [21], [22] have proposed different approaches to assessing and enhancing power system resilience. The employed methods can be condensed into a generic five-phase resilience analysis approach, namely: (i) threat characterization, (ii) component vulnerability assessment, (iii) system response analysis, (iv) baseline resilience evaluation, and (v) quantification of the effectiveness of resilience enhancement strategies. However, differences in assumptions and modeling approaches have led to differences in outcomes and their interpretation. For example, Espinoza et al. [16] characterized wind threats by estimating return levels whereas Panteli and Mancarella [6] scaled up wind intensities until the maximum equaled a desired set magnitude. In the latter instance, the likelihood of the occurrence of a certain intensity in a specific location was not considered at all. Secondly, Bruneau et al. [20] assessed power system resilience by the shortfall in the level of expected service whereas Vugrin et al. [4] quantified the probability of risk exceeding a set threshold of consequences. It is not apparent in the literature how such dissimilar assumptions and outcomes could be interpreted concerning a common event and whether they can be harmonized into a coherent narrative of the state of a power system's resilience. In addition, previous review studies [3], [9], [10], [12] were mainly focused on demonstrating the various resilience evaluation metrics with little consideration of models for assessing threats and component vulnerabilities. The authors do not know of any study in which such models have been collated or discussed.
Therefore, this paper aimed to analyze the conceptualization of power system resilience and specify its theoretical, operational, and evaluative underpinnings by reviewing recent publications on the topic. The study further sought to catalog and discuss mathematical models for each of the identified resilience analysis phases. Moreover, the phased process was applied to a real use case based on the impact of a lightning strike on Great Britain's (GB) power system. The main contribution to the existing literature is the collation of computational models across the five stages of resilience assessment and their standardization to a common reference resilience curve. Secondly, the case study utilized a variety of indicators to provide a better understanding of the infrastructural, operational, and organizational resilience of the system. Specifically, the study demonstrates the interpretation of the various model outputs against a common extreme event.
The sections that follow are arranged such that Section II provides the common resilience definitions and a breakdown of the resilience concept into its constituent elements. The section relies mainly on the features of the resilience curve to delineate the various compositions of resilience conceptualization and operationalization. In Section III, the multiphase resilience analysis process is described in detail, providing various models for each phase, whereas Section IV briefly illustrates the application of the phased process in evaluating the resilience of the GB transmission power system with respect to the 9th of August 2019 lightning event. The study's conclusions are drawn in Section V.

II. RESILIENCE CONCEPTUALIZATION AND OPERATIONALIZATION IN POWER SYSTEMS
A. RESILIENCE DEFINITIONS
The classical meaning of resilience from its Latin roots is to 'spring back' [10]. This was reflected in the definition provided by the U.S. National Association of Regulatory Utility Commissioners (NARUC) which described resilience in terms of the robustness and recovery qualities of power systems during and after disasters [3]. This was further expanded by the United Nations International Strategy for Disaster Reduction (UNISDR) which regarded resilience as the entity's capacity to adapt, resist, or change in order to maintain an acceptable level of functioning and structure [23]. These definitions can be largely classified as reactionary given that resilience is described primarily as a property of the system meant to stonewall attacks or change.
Proactive properties of resilience have been proposed to respond to projected threats. For example, the Intergovernmental Panel on Climate Change (IPCC) defined power system resilience as the ability to anticipate, absorb, and quickly and efficiently recover after hazardous events [24]. This conceptualization adds the capacity to anticipate (and prepare) to legacy resilience capacities, namely, resistance, recovery, and adaptation. Similar definitions have been proposed by [25], [26], and [27] which essentially regard resilience as the ability to prepare, plan, absorb, rapidly recover, and adapt before, during, and in the aftermath of an adverse event.

FIGURE 1. A conceptual resilience curve. Adapted from [3], [6], [9], [10], [28], [29], [30].
Given the role resilience plays in driving and sustaining development commitments, some have argued for the capacity to transform as an integral property of resilience. For instance, Manyena et al. [28] defined resilience as the ability of a system to anticipate, prevent, absorb, adapt, recover, and transform. The capacity to transform is essential in transitioning the system out of its status quo such that reoccurrences of extreme events do not slump it back to previously experienced degraded states.
Some definitions emphasize the functional aspects of infrastructure resilience to communities, such as the one provided by the United Kingdom Energy Research Centre (UKERC) which considered resilience as the capacity to cope with a disturbance and ''to continue to deliver affordable energy services to consumers'' [31]. Similar definitions conceive resilience as a means to ''minimize interruptions of service'' [32], ''prevent or mitigate the impact of similar events in the future'' [30], and ''continue operating and delivering power'' [4]. Notably, it is mostly these definitions that set the context for the desired resilience capacities and qualities.

B. RESILIENCE CAPACITIES AND QUALITIES
Resilience operationalization is a multiphase temporal process commonly represented in literature as a ''resilience trapezoid'', ''resilience conceptualization'', ''resilience framework'' or ''resilience curve'' [3], [9], [10], [28], [30], the latter nomenclature being adopted in this study. Although studies do not all agree on the composition and sequencing of phases, there is a generality to the proposed capacities of resilience. That is, a resilient system, at the least, ought to embody preventative, absorptive, recovery, adaptive, and transformative capacities [25], [26], [28], [33] as seen in FIGURE 1. These are time-dependent phases of a system's response to a disruptive event meant not only to stave off undesired system outcomes during an event but also to augment the system to an enhanced resilient state (R*_0) afterwards. It is not only desired that the system is brought back to the 'original' functionality, R_0, but that it transitions to a state that is less vulnerable in the event of the reoccurrence of similar trigger events. FIGURE 1 demonstrates a typical resilience curve in which the different capacities are separated but, in reality, at any given instance of a disruption, multiple capacities might be at play.
The preventive, and by extension the anticipative, capacity is primarily concerned with ensuring that both the impacts and consequences of disruptions stay within acceptable limits. If t_0 and t_1 are the time of occurrence of a windstorm and the time the system starts to degenerate in its functions respectively, the lag between these two instances represents the preventive capacity. A long lag would be preferable since it decouples the event and its associated effects, as is usually the case in transmission networks with sizeable redundancies. However, in the absence of rapid fault detection mechanisms, the detection of faults could coincide with cascading failures.
When the system is overwhelmed and displaced from its normal operational state, the absorptive capacity is relied upon to preserve the essential basic structure and functions of the system whilst avoiding permanent component or system damage. This phase is often referred to as the ''disruption transition'' [9] or ''disturbance progress'' [30], [34]. System degradation can be viewed as progressively moving from an alert stage, R_1, which is an initial violation of operational constraints [6], to the emergency stage, and ultimately, the system might descend to an extreme state, R_2 at t_2, below which a total blackout is imminent [9]. During the degraded state, [t_2 − t_3], corrective and emergency actions prioritize the reinstatement of critical loads, at t_4, and thereafter a full system recovery, [t_5 − t_6] [4].
In addition to capacities, a typical resilient system embodies qualities. These are the inherent system properties that prevent system breakdown [28]. They include robustness, resourcefulness, redundancy, rapidity, reliability, and reflectiveness [35], whose definitions are briefly provided below.
a) Robustness: The ability to absorb shocks and continue operating, or the reduced sensitivity of outputs [36].
b) Resourcefulness: The ability to skillfully manage a crisis as it unfolds [11].
c) Redundancy: Spare capacity to accommodate disruptions [35].
d) Rapidity: The ability to get services back as quickly as possible to their pre-event state [37].
e) Reliability: The ability to operate acceptably under a range of conditions [38].
f) Reflectiveness: Continuous evolution and modification of standards based on emerging evidence [35].
The arrangement of resilience qualities in FIGURE 1 is not meant to draw hard boundaries of instances in which they are applicable but rather to demonstrate that at different phases of the disruption or response, some qualities are more vital than others. For example, during the absorptive phase, a network with high redundancy from its overhead lines (OHLs), could supply load centers even if several circuits get tripped.
In addition, the curve reveals that resilience as an organizing concept can be considered in three broad categories, namely, infrastructural, operational, and organizational. Operational and infrastructure resilience are the most cited [3], [9], [13] but recent reflections on the impact of COVID-19 on power systems have underscored the necessity of organizational resilience [41]. Organizational resilience ensures the maintenance or recovery of the system's functionality through managerial decisions meant to support business continuity [41]. This could entail effective training and allocation of staff, and the swift mobilization and deployment of essential resources for rapid recovery. On the other hand, infrastructure resilience refers to the physical strength of the system's components to resist degradation whereas operational resilience is concerned with mechanisms that limit the frequency and duration of customer interruptions [41]. Recent literature suggests that organizational resilience ought to be the backbone of effective resilience enhancement strategies, with an assumption that, whichever action is undertaken before, during, and after a disruption, managerial considerations prove more effective than both structural and operational factors. It is against this background of determining resilience categories, capacities, and qualities that several studies [25], [28], [29], [42] build assessment tools.

III. A MULTIPHASE RESILIENCE EVALUATION PROCESS
A. GENERIC MODELLING AND ASSESSMENT TECHNIQUES
In their resilience assessment study, Dunn et al. [43] employed the Catastrophic Risk modeling framework which consisted of a hazard-generating model, a component exposure database, a vulnerability assessment criterion, and a resilience evaluation model. An extreme windstorm was generated in a selected geographical region of the power system and its temporal profile was applied to the system's OHLs, leading to an assessment of their eventual damage states. They proposed to evaluate resilience as a socio-economic consequence. Similar modeling techniques have been proposed by other studies [1], [12], [16], [17] which categorize the essential parameters of the resilience evaluation exercise into three main models: the weather, component, and power system models. The models tend to be multiphase and sequential, as seen in FIGURE 2, such that the weather model output serves as an input into the component model and, in turn, both model outputs are used in the system model. It then follows that the system output provides the basis upon which resilience is evaluated [16], [17] although, in some instances, resilience is computed at the vulnerability assessment level [44] (i.e., lines under outage, number of flooded substations, and damaged towers).
Most studies [16], [19], [30] assessed resilience based on the current state of the network's topology but Fu et al. [7] incorporated a network growth model which explored how different demand and supply growth pathways could impact the network's response. In their study, the network growth model was the basis for vulnerability assessment. Ultimately, resilience was evaluated using classical reliability indices. Although reliability studies are generally considered inadequate for assessing resilience, reliability metrics (Loss of Load Expectation (LOLE), Loss of Load Frequency (LOLF), and Energy Not Supplied (ENS)) are commonly used [16], [34], [44] in quantifying resilience given that, with slight alterations (such as considering tail-end values of probability distributions), they do capture the essential operational output of the system under extreme disruptions.
Considering the stochasticity of weather events, component failures, and fault occurrences, the Monte Carlo Simulation (MCS) technique is often utilized to quantify the probability associated with a given system output [3], [10], [12], [13]. The MCS approach primarily acts as an indicator of the sensitivity of stochastic variables. For instance, if the components' damage state is modeled as a random event, as proposed in [19], [44], and [45], the MCS approach captures the different system responses emerging from running the model several times. Subsequently, any given system impact or response can be interpreted in reference to a continuum of other possibilities.

FIGURE 2. Multiphase resilience assessment and enhancement framework. Adapted from [4], [14], [16], [39], [40].
The resilience assessment process (RAP) is an objective-oriented exercise in which several studies prioritize the evaluation of baseline resilience. The evaluation is often focused on ascertaining whether the system's functions and services are satisfactorily maintained in the event of a threat. The threats can be historical or projected. Historical data entails the magnitude and duration of the event, and the system response [4], [10], [46], whereas for predictive assessments such as Espinoza et al. [16], computational models were relied on to simulate both the event and the system response. Adaptation measures are then modeled by making adjustments to the baseline conditions such as the number of components within the system, the time taken to respond to a fault, the amount of available resources, and the exposure and vulnerability of components [4], [14], [19]. Such measures can also consider increasing and changing demand profiles, penetration of renewables and DERs, conformity to environmental regulations, and the proliferation of smart technologies [7]. This section has summarized the generic multiphase RAP framework as proposed by several studies [4], [14], [16], [39], [40]. It comprises threat assessment, component impact analysis, system response, baseline resilience evaluation, and resilience enhancement, which are hereunder expounded on.

B. THREAT CHARACTERISATION
The end goal of characterizing the threat is to determine or model its intensity, likelihood, and spatiotemporal profile [16]. Modeling extreme events is an exercise that relies on deterministic or probabilistic methods based on historical data or forecasting. The process involves simulating a representative extreme event in time and space which can be characterized by its magnitude, location, and duration of occurrence [4], [12], [40]. The underlying assumption is that the event ought to have a ruinous effect on the system's components or that it could lead to the functional degeneracy of the system.
Several studies [7], [16], [19], [43] opted to use climate model (reanalysis) data to develop extreme weather scenarios. The data was sourced mainly from the 5th generation European Centre for Medium-Range Weather Forecasts atmospheric reanalysis (ERA5) [47], the Global Meteorological Forcing Dataset for land surface modeling [48], and the Modern-Era Retrospective analysis for Research and Applications (MERRA) [49]. Whilst such data is available at a high spatial and temporal resolution compared to met-station data, it has been observed in the past that it fails to capture extreme intensities [16], [44]. Consequently, to be used in conducting impact studies of extreme events, it has to be scaled up to generate high-order intensities [16], [19]. This approach introduces uncertainties from the onset of the study, which might not be readily quantified.
Espinoza et al. [16] utilized the generalized extreme value (GEV) theory approach to predict different scenarios of return intensities of wind and rainfall. They also created hazard scenarios with combined weather types involving both windstorms and floods impacting the system's components simultaneously. Another study [7] utilized a wind simulator to create wind fields covering the whole of GB whereas another [46] recreated a spatial-temporal profile of a historical windstorm event. Weather generators such as HAZUS-MH2 and UKC09 have also been used to simulate drought and flood conditions [15], [43]. Alternatively, physical attacks on transmission systems were simulated as N-2 and N-3 events in [3] whereas, in assessing the impact of climate change on hydropower plants, General Climate Models (GCMs) that project extreme drought conditions were used to project future river discharge under several emissions scenarios [5], [50], [51]. Nevertheless, hazard model outputs are often produced at hourly resolution [16], [17], [19] and the temporal profile spans the entire duration of the event [19] or until the system components are fully restored [44]. TABLE 1 summarizes the different approaches and models proposed in the literature to characterize extreme events.
TABLE 1 notation: x_r is the return intensity in r years; μ ∈ R, σ > 0, and ξ ∈ R are the location, scale, and shape parameters respectively that maximize the log-likelihood function of the GEV cumulative probability distribution; p(x) and w_x are the probability of occurrence of event x and its weight respectively; I_T(x) is the modeled intensity from reanalysis data I_t(x) for a dataset whose maximum value is I_r(max), scaled up in proportion to the maximum historical value Î_O(max); the climate moisture index (CMI) is derived from the temporally aggregated precipitation (P) and the adjusted potential evapotranspiration (PET); f is the multiplicative factor for the selected combination of weather scenarios, nt is the number of thunders, and wg is the maximum wind gust speed (km/h); Î_t(x) is the temporally observed intensity for event x spanning the duration t_0 to t_T; pf_kj is the probability of flood depth at substation k falling in the interval [f_j, f_j+1] and g_k^f is the associated lognormal probability distribution function of the flood; P_l is the occurrence probability of a lightning strike l, D is the minimum distance between the lines and the lightning strike, N_g,l is the ground flash density, I_ms,l is the maximum steepness of the lightning peak current, I_l is the lightning peak current of lightning strike l, and LI_l is the lightning intensity; the mean number of events in the time interval (t, t + h] is governed by the rate of occurrence parameter υ(t); P(h) is the Poisson probability distribution function for the annual occurrence of hurricanes, λ is the average number of hurricanes, and h is the number of hurricanes per year.
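As a minimal illustration of the return-level approach used in [16], the sketch below fits a GEV distribution to hypothetical annual maximum gust speeds and evaluates an r-year return intensity x_r; the data values, variable names, and the 50-year return period are assumptions for demonstration rather than figures from the cited studies.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical annual maximum gust speeds (m/s) at one grid cell.
# In practice these would be extracted from reanalysis data (e.g. ERA5).
annual_max_gust = np.array([24.1, 27.3, 22.8, 31.5, 26.0, 29.7, 25.4,
                            33.2, 28.9, 23.6, 30.1, 27.8, 26.5, 32.4])

# Fit the GEV distribution; SciPy's shape parameter c corresponds to -xi
# in the (mu, sigma, xi) parameterization used in the extreme value literature.
c, loc, scale = genextreme.fit(annual_max_gust)

# r-year return level: the intensity exceeded on average once every r years,
# i.e. the (1 - 1/r) quantile of the fitted annual-maximum distribution.
r = 50
x_r = genextreme.ppf(1.0 - 1.0 / r, c, loc=loc, scale=scale)
print(f"Estimated {r}-year return gust speed: {x_r:.1f} m/s")
```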
The threat assessment phase can thus be summed up as entailing the evaluation of return levels, data scaling, the re-creation of historical events, the use of disaster maps, or the projection of 'worst-case' scenarios. The output is often a spatiotemporal profile of the event contextualized within its likelihood of occurrence or its severity weighted against other possible threats. Once the event is characterized, it is then mapped onto the structural components of the system to determine their vulnerabilities.

C. COMPONENT EXPOSURE AND VULNERABILITY ASSESSMENT
Extreme weather events affect power systems in three major ways by causing: (i) infrastructural damage, (ii) uncertainty in generation (and by extension grid balancing), and (iii) variability in demand [51]. These stem from the components' exposure and vulnerabilities to weather attacks. Exposure is the presence of infrastructure components in locations where they could be adversely affected by an extreme event whereas vulnerability is their corresponding propensity to suffer damage or loss [2], [57]. Exposure and vulnerability assessment is about evaluating real and apparent system failure in light of the sensitivities of the system's components. Accordingly, two main approaches have been identified in the literature: regression-based models and tree-based data mining models [3]. These models can be multivariate or univariate depending on the number of explanatory variables (i.e., event intensity, frequency, or duration) attributed to causing a response variable (i.e., component damages, outage duration, or outage frequency). Regression-based models rely on equations that best describe the relationships of the investigated variables. In tree-based mining models, the vulnerability of components or systems is assessed by recursive binary partitioning of historical data to establish a link between the response and explanatory variables [58]. In both approaches, point objects like towers, substations, and generation plants are subjected to event intensities within their locality, from which their probabilities of failure are evaluated [13], [16]. For OHLs that span several regions, by contrast, different approaches are employed: some studies use the maximum intensity within the line path [16], [19] or aggregated or weighted failure rates at different line sections [12], while others assume that each section's failure probability is independent [12].
In the case of exposure modeling, spatial analytic tools are leveraged either in programming packages (such as GeoPandas) or proprietary Geographic Information System software to expose the components to the weather threat. Alternatively, the event can be mapped onto the components randomly as demonstrated in [17].
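A minimal exposure-mapping sketch using GeoPandas is shown below, assuming the components and the hazard footprint are already available as geospatial files; the file names, column names (asset_id, asset_type, gust_ms), and coordinate reference system are illustrative assumptions rather than details from the cited studies.

```python
import geopandas as gpd

# Hypothetical inputs: point components (towers, substations) and a gridded
# hazard footprint (polygons carrying the local maximum gust speed, gust_ms).
components = gpd.read_file("components.geojson").to_crs(epsg=27700)
hazard_cells = gpd.read_file("windstorm_footprint.geojson").to_crs(epsg=27700)

# Spatial join: attach to each component the intensity of the cell it falls in.
exposed = gpd.sjoin(components, hazard_cells[["gust_ms", "geometry"]],
                    how="left", predicate="within")

# Components outside the footprint are treated as unexposed (zero intensity).
exposed["gust_ms"] = exposed["gust_ms"].fillna(0.0)
print(exposed[["asset_id", "asset_type", "gust_ms"]].head())
```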
For fragility analysis, the components' probability of failure is derived mainly from fragility curves [43], [59], [60], random-based outage approaches [6] or fuzzy logic models relating intensity bands to multiple damage states [49]. A typical fragility function can be generated through analytical processes using (1) as proposed by the Federal Emergency Management Agency (FEMA) [61]. If the function is reduced to a univariate regression model of the intensity and probability of failure (P_w), it can be demonstrated that the resultant cumulative probability distribution function follows a typical sigmoid function illustrated in (2). This model is usually applicable within the range of significant lift-off (w_critical) and the intensity at which the component's damage is certain (w_collapse). Outside this range, P_w for w < w_critical can be regarded as either zero (0) or set to a very low number whereas for w > w_collapse, P_w = 1.
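The display forms of (1) and (2) are reconstructed here as assumptions: (1) follows the FEMA HAZUS lognormal fragility formulation [61] implied by the parameter definitions below, and (2) is written as one common logistic (sigmoid) parameterization in terms of the distribution parameters A and B; the exact forms adopted by individual studies may differ.

P[ds \mid S_d] = \Phi\left[ \frac{1}{\beta_{ds}} \ln\left( \frac{S_d}{\bar{S}_{d,ds}} \right) \right]    (1)

P_w(w) = \frac{\exp(A + Bw)}{1 + \exp(A + Bw)}, \qquad w_{critical} \le w \le w_{collapse}    (2)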
where P is the probability of failure; S̄_{d,ds} is the median value of spectral displacement/acceleration at which the component reaches the threshold of damage state ds; β_ds is the standard deviation of the natural logarithm of spectral displacement for damage state ds; Φ is the standard normal cumulative distribution function; P_w is the probability of failure at weather intensity w, whereas A and B are distribution parameters.
Indicative fragility curves for various power system components are presented in FIGURE 3 and TABLE 2 shows their reinterpretation as univariate regression models following (2). In their study, Dunn et al. [43] empirically generated fragility models from observed electrical faults caused by windstorms. They demonstrated that increasing the spatial resolution improved the correlation between the event intensity and component failures such that, rather than applying a single fragility model across the entire network, regional fragility curves could be generated to reflect local threat peculiarities. The fragility curves by Dunn et al. [43] were generated for distribution OHLs whereas those of Murray and Bell [60] were for transmission OHLs, and yet they bear a resemblance as seen in FIGURE 3 (A). Most curves demonstrate that the failure of overhead lines due to windstorms lifts off significantly at 25 m/s and that, by 50 m/s, failure is almost certain. For towers, studies [7], [44], [49], [56], [62] demonstrate significant disparities, with critical windspeeds ranging between 30-120 m/s and the speed of imminent collapse being indeterminate as seen in FIGURE 3 (B). In the event that certain faults are deemed to be caused by multiple weather types (i.e., snowfall and wind gusts) or by multiple elements of the same weather type (i.e., snow density and snow depth), a multivariate fragility function was proposed by [60].
The analytical method of generating fragility curves involves structural simulation models which determine the performance limits of components under stress by considering the components' material, dimensions, and environmental factors [19], [43]. Alternatively, fragility curves can be generated empirically, experimentally, or by relying on expert judgment. The empirical approach, as documented in [43] and [60], utilizes historical component failure data. Experimental approaches entail destructive testing, deliberately damaging the component under simulated extreme conditions [19]. Expert judgment relies on the experiences and opinions of key stakeholders to ascertain the damage state of a component. In order to overcome some of the limitations inherent in any given approach, a hybrid outlook can be considered. For instance, the analytical approach can be informed by empirical data and expert judgment.
After determining the probability of failure of a component, it remains to ascertain its ultimate damage or functional state. Most studies take one of two options: (i) deterministic vs probabilistic or (ii) continuum vs binary functionality state approaches. Several studies [6], [12], [13], [16] employed a stochastic approach that generates a uniformly distributed random number, r ∼ U[0, 1], at every simulated time step such that a component failed if its probability of failure was greater than r (i.e., P_w > r). In contrast, Leandro et al. [64] considered a probability of failure of either 0 or 1 contingent on a set flood level. In their study, a probability equivalent to 1 depicted that the component was inundated and hence was under outage. A related approach was applied by Espinoza et al. [16] who considered that, for rainfall accumulation ≥ 20 mm/h, 38% and 33% of affected power plants and substations failed respectively. Most studies [13], [16], [17], [19] determined the functional state of a component as a binary outcome such that the component was either fully functional or completely out of service. Whereas this characterization might suit OHLs and towers, it might not represent the reality of other components such as substations and generation plants which might continue to operate even in a degraded mode. Therefore, Movahednia et al. [53] and FEMA [63] employed damage curves that estimated the percentage reduction in the operation of a damaged component.
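A minimal sketch of the stochastic, binary-state sampling described above is given below, assuming a logistic fragility curve between hypothetical thresholds w_critical = 25 m/s and w_collapse = 50 m/s; the fragility parameters and gust values are illustrative assumptions rather than fitted values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def failure_probability(w, w_critical=25.0, w_collapse=50.0, a=-10.0, b=0.27):
    """Illustrative sigmoid fragility: P_w = 0 below lift-off, 1 beyond collapse."""
    if w < w_critical:
        return 0.0
    if w > w_collapse:
        return 1.0
    return np.exp(a + b * w) / (1.0 + np.exp(a + b * w))

# Hypothetical gust speeds (m/s) experienced by three OHLs at one timestep.
gusts = {"OHL_1": 22.0, "OHL_2": 34.0, "OHL_3": 47.0}

# Binary damage state: a component fails if P_w exceeds a uniform random draw.
for line, w in gusts.items():
    p_w = failure_probability(w)
    failed = p_w > rng.uniform(0.0, 1.0)
    print(f"{line}: P_w = {p_w:.2f} -> {'FAILED' if failed else 'in service'}")
```

Repeating this sampling over many Monte Carlo runs yields the distribution of damage scenarios referred to in Section III-A.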
Some studies assess the failure of multiple components, treating the failure of individual components either independently or jointly as a common cause failure (CCF) scenario. For the latter, several studies [6], [7], [16], [65] model the failure of OHLs based on tower failures such that the collapse of a single tower supporting OHLs renders the entire transmission corridor inoperable. Studies that undertake failure analysis exclusively as CCF, such as [7], risk underestimating the fragility of the power system since the failure of the weaker components is then tethered to the failure of the stronger components. In reality, a considerably high number of lines will collapse without any collapse of towers. Furthermore, interdependency-related failures are classified into two other categories: cascading and escalating [66], [67]. A cascading failure occurs when the failure of one component or system causes the failure of another, such as an OHL trip after being struck by a falling tree. An escalating failure occurs when a disruption in one system exacerbates an independent disruption in another [67], as would be the case when an impassable road increases the recovery time of a power system following a hurricane.
Ultimately, modeling the failure of components can help in providing a vulnerability index that can be useful in applying resilience enhancement measures [16]. For instance, an OHL with a high percentage of out-of-service duration could imply a requirement of installing a more robust OHL or reorganization of the repair crew to allow swift access and response to the vulnerable circuit.
From this section, it can be deduced that the process of quantifying components' vulnerabilities engenders several uncertainties. These stem from the insufficiency of data used in developing fragility functions and the nonlinearity between the ultimate damage states of components and the corresponding causal events. Improved empirical fragility functions could scale back some of the uncertainties but this could require massive datasets for their development. It would necessitate that such datasets, at least, capture the age, material, location, type, and functional state of components, time of occurrence of a failure, and event's intensity, time of occurrence, and duration. This data would need to be recorded at a high spatiotemporal resolution to facilitate the generation of localized fragility functions at various temporal scales.

D. SYSTEM RESPONSE MODELLING
System response modeling is intended to evaluate the system functional output for every considered timestep subject to components' damage states and to assess the system repair and restoration capabilities.

1) REPAIR MODELLING
Following the failure of a component, its repair duration needs to be determined. Different time-to-repair (TTR) models, for the time required to bring a component back into service, have been proposed in the literature, although it is not apparent in every study whether the proposed models are empirical, experimental, or based on expert judgment. For example, there is a wide range of TTRs for OHLs, with typical durations being 10-50 hours, and 48-350 hours for towers [7], [16], [19]. It is commonly assumed that the repair duration increases exponentially with the event's intensity given that fault identification and analysis, resource mobilization and dispatch, and the actual repair works are increasingly difficult in harsh weather conditions [12], [13], [70]. This phenomenon is modeled by scaling up the mean time-to-repair under ordinary weather conditions (TTR_norm). Several studies [16], [19], [70] incorporate stochasticity in generating the TTR by scaling up TTR_norm with randomly generated coefficients, k, k_1, k_2, and k_3, as seen in TABLE 3. TABLE 3 presents some of the TTR models in the literature depicting the restoration duration of power system components under various weather events.
TABLE 3 notation: w(t) is the instantaneous weather intensity; w_norm is the threshold (normal) intensity beyond which TTR_norm is scaled up; d is the percentage damage of a component and T is the total time a component is under flood inundation; TTR_weekday and TTR_weekend are the TTRs for weekdays and weekends respectively, whereas TTR_7-17 and TTR_17-7 are the TTRs within the subscripted time-of-day ranges.
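The sketch below illustrates one TTR scaling model of the kind summarized in TABLE 3, in which the normal-weather repair time is inflated by a random coefficient once the intensity exceeds a threshold; the functional form, coefficient range, and numerical values are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def time_to_repair(w, ttr_norm=10.0, w_norm=20.0, k_range=(1.0, 3.0)):
    """Hypothetical TTR model: below the threshold intensity the normal repair
    time applies; above it, TTR_norm is scaled by a random coefficient k and
    by the ratio of the instantaneous intensity to the threshold intensity."""
    if w <= w_norm:
        return ttr_norm
    k = rng.uniform(*k_range)
    return k * ttr_norm * (w / w_norm)

# Example: repair time (hours) of an OHL for three gust intensities (m/s).
for gust in (15.0, 30.0, 45.0):
    print(f"gust {gust:>4.1f} m/s -> TTR ~ {time_to_repair(gust):5.1f} h")
```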
The models presented in TABLE 3 can be considered simplistic given that they are drawn from regression inferences between the repair time and the weather intensity deemed to have caused a component's damage. More complex models have been proposed in the literature which seek to represent the effect of several critical variables at play during the restoration process. For example, Zhang et al. [68] presented a model accounting for the distance to the repair site, the number of repair teams, the number of crew within each team, the travel speed, the traffic flow, and the traffic capacity of the road. It is not apparent in such studies whether the models achieve better performance; nevertheless, it seems axiomatic that a useful repair and restoration function ought to embody all critical factors such as crew dispatch and coordination, spare parts inventory, time or day of failure, fault detection, and site accessibility.
Some studies disregard the repair function whereas others adopt a more conservative approach in its application. Studies such as [53], [71], and [72] consider that a failed component cannot be repaired for the remainder of the simulated timesteps of the extreme event whereas others presuppose that no repair works can be done whilst the event intensity exceeds a certain threshold. In the latter category, Souto et al. [69] considered a substation's outage duration to be the sum of the flooding inundation and the time required to repair the damage. In that case, repair works are only possible once the flood is cleared.

2) POWER SYSTEM MODELLING
There are various methods employed in developing power system test models. Fu et al. [7] built a network model using graphs in which the components (OHLs, substations, and generators) were defined by their demand, capacity, and spatial coordinates. The OHLs were represented as edges whereas the vertices symbolized substations, generators, and demand centers. Likewise, other studies [12], [16], [73] developed network components as geospatial objects: points for substations and towers, and polylines for distribution or transmission lines. Reference [74] and OATS [75] have several test cases based on both real and abstract networks.
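A minimal sketch of the graph-based representation described above is shown below using networkx; the bus names, coordinates, capacities, and demands are fabricated purely for illustration.

```python
import networkx as nx

# Vertices: substations / generators / demand centres with spatial coordinates
# and nominal injections (MW, positive = generation, negative = demand).
g = nx.Graph()
g.add_node("SUB_A", x=531000, y=180000, injection_mw=-350.0)
g.add_node("SUB_B", x=545000, y=214000, injection_mw=-120.0)
g.add_node("GEN_C", x=512000, y=205000, injection_mw=500.0)

# Edges: OHLs carrying a thermal capacity and an in-service flag that a
# vulnerability model can toggle when a circuit is tripped.
g.add_edge("GEN_C", "SUB_A", capacity_mw=600.0, in_service=True)
g.add_edge("GEN_C", "SUB_B", capacity_mw=400.0, in_service=True)
g.add_edge("SUB_A", "SUB_B", capacity_mw=300.0, in_service=True)

# Example query: does SUB_A remain connected to generation after a trip?
g.edges["GEN_C", "SUB_A"]["in_service"] = False
live = g.edge_subgraph([e for e in g.edges if g.edges[e]["in_service"]])
print(nx.has_path(live, "GEN_C", "SUB_A"))  # True, supplied only via SUB_B
```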
The test models can be reduced or full-scale. Fu et al. [7] assessed resilience using the full-scale Great Britain transmission network model rather than the reduced Great Britain Network (RGBN) that has been used by several researchers [16], [19], [30]. The reduced network models lump multiple components into a few representative ones whilst maintaining the expected output of an actual network. This requires that they are validated against full network models. For example, RGBN was validated against a solved AC load flow reference case provided by the National Grid Electricity Transmission [76], [77]. In contrast, some studies employ abstract test models such as the IEEE 6-bus test system which was implemented in [6]. Such test systems are only meant to demonstrate the application of a particular methodology being advanced. Regardless of whether real or abstract, reduced or full, the test case components are subjected to weather attacks, a vulnerability assessment is undertaken, a repair function is effected and the power flows are simulated upon which the baseline resilience is evaluated.

E. RESILIENCE EVALUATION
Most metrics used to evaluate resilience can be broadly classified in terms of energy, time, and money [4]. A finer classification could entail a differentiation of infrastructure and operational, attribute- and performance-based, deterministic and probabilistic, absolute and normalized, threat-based and systemic, impact- and consequence-based, and phasic and integrative metrics.
Infrastructure-based metrics assess the level of robustness of the system's components whereas operational metrics evaluate the impacts on system functions and services [9]. The use of both categories of metrics has been credited with enabling a systematic risk-based assessment of the resilience degradation and recovery of the system [19]. Operational resilience is often quantified by the amount of connected load, generation capacity, generation output, energy not served (ENS), and voltage operational limits [44], [45]. The infrastructure metrics keep track of the number and rate of damaged components [19], [44], [78]. A related contrast is between attribute- and performance-based metrics. Attribute-based indicators assess the qualities that make the system resilient (i.e., robustness, reliability, and redundancy) whereas performance-based indicators seek to quantify system output that demonstrates how resilient it is (e.g., demand served, customers experiencing outages, and outage duration) [3].
System performance is often presented as a probabilistic metric (e.g. mean, Value at Risk (VaR), Conditional Value at Risk (CVaR)) which depicts the magnitude and likelihood of a measured parameter [4], [11]. In contrast, deterministic approaches quantify the value of a parameter without any indication of expectation. Moreno et al. [17] contended that VaR and CVaR were more suited for quantifying resilience given that they capture the essence of resilience analysis which is primarily focused on extreme outcomes. These metrics can be used as indicators of the probability of risk exceeding a particular threshold of impacts or consequences [4]. For example, 5% VaR of 1,000 GWh could mean that there is a 5% chance that the unserved energy exceeds 1,000 GWh whereas 5% CVaR of 1,000 GWh could mean that the average value of the largest probable 5% unserved energy values is 1,000 GWh [4]. Other studies [6], [7], [16], [19], [70] used expected energy not served (EENS) with an assumption that it inherently captures the mean of all possible occurrences although others like [60] regard the median as a more representative metric since it reduces the risk of skewness by outliers. FIGURE 4 demonstrates a typical relationship between EENS, VaR, and CVaR considering the probability distribution curve of ENS.
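The sketch below computes EENS, VaR, and CVaR from a set of Monte Carlo ENS samples in the sense described above; the sample values and the 95% confidence level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical ENS outcomes (GWh) from 10,000 Monte Carlo event simulations.
ens_samples = rng.lognormal(mean=5.0, sigma=1.2, size=10_000)

alpha = 0.95                                   # confidence level
eens = ens_samples.mean()                      # expected energy not served
var = np.quantile(ens_samples, alpha)          # 5% chance ENS exceeds this
cvar = ens_samples[ens_samples >= var].mean()  # mean of the worst 5% outcomes

print(f"EENS = {eens:,.0f} GWh, VaR(95%) = {var:,.0f} GWh, CVaR(95%) = {cvar:,.0f} GWh")
```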
The difference between a normalized and an absolute metric is that a normalized indicator takes into consideration the measured parameter per unit of a reference parameter. Espinoza et al. [16] used EENS as the absolute metric and the Energy Index of Unreliability (EIU) as a normalized metric. Metrics can also be either impact- or consequence-based. Impact metrics are system-centric and concerned with the response or operations of system components, such as the number of disconnected lines or the amount of load shedding [16], [44], whereas consequence-based metrics evaluate the effects on the community, such as the number of outage hours for customers, the number of customers experiencing outages, and the value of lost load (VoLL) [3], [4], [14], [17]. Collectively, these attributes can be considered systemic given that they depict the operational output of the system during and in the aftermath of disruptions. They can be contrasted with threat-based metrics which quantify resilience by the magnitude of intensity a system can withstand without significantly affecting its functions and structure (i.e., a 100-year return flood) [16].
Although in several studies [16], [17] integrative metrics such as EENS and VoLL were used to evaluate the collective resilience of the system, these metrics do little in communicating the resilience of individual capacities. Other studies such as [10] and [79] contend that, to quantify resilience, one ought to delineate the individual constituent capacities. They argue that no single indicator can sufficiently depict the level of resilience given that resilience is a dynamic phenomenon that cannot simply be captured in a single instance. Therefore, phasic indicators are proposed, such as the ΦΛΕΠ (pronounced ''FLEP'') metric, to capture the rate and magnitude of degeneration, the duration of degeneration, and the recovery rate [44]. TABLE 4 shows the commonly employed models for evaluating the resilience of power systems. Where possible, the reference times and associated instantaneous system performance have been standardized to the resilience curve in FIGURE 1 presented in Section II.
TABLE 4 notation: t ∈ [t_0, t_8] represents timesteps covering the whole duration of analysis; LOLE is the duration of load outages, LOLP is the number of times in which the load exceeds the generation capacity, ENS is the total energy not supplied, E is the cumulative energy demand during the assessment period, and ENS_t is the instantaneous energy not supplied; R(t) is the system's performance (i.e., demand, repair costs, economic costs) and A is the generation capacity; L_Mm is the observed maximum reduction in system performance whereas L_max is the expected loss of load in the event of a complete blackout; VoLL is the value of lost load; R_0 is the original system performance, R_2 is the performance level immediately post-disruption, and R*_0 is the performance at a new stable level after recovery efforts have been exhausted; t_δ is the maximum amount of time post-disaster that is acceptable before recovery ensues, t*_r (= t_4) is the time to complete initial recovery and t_r = t_8 is the time to final recovery or a new stable state; L_1h,j is the restored load amount of bus j in 1 h after t_3, L_eb,j is the actual load amount of bus j after the emergency condition, L_nb,j is the load demand of bus j under normal conditions, S_jl is the load loss at t_3, S_jl(t) is the real load loss l on bus j at time t during power restoration, n_j is the number of load splits (according to criticality) at bus j, c_jl is the economic loss of load l in unit time on bus j, and C_rep is the economic cost of repair works; N_t is a considered time period, i.e., week, month, or year; P is the probability of occurrence of the associated scenario; α is the confidence level or α-percentile of the system risk index X (a loss variable such as ENS), p is its associated probability density value, and E_X is the expectation.
In instances where Monte Carlo simulation is employed, the resilience, Re, can be modeled after (3), where Re_i is the quantified resilience in instance i and N is the number of considered simulations.
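Based on the definitions of Re_i and N above, (3) presumably averages the per-simulation resilience values over all Monte Carlo runs:

Re = \frac{1}{N} \sum_{i=1}^{N} Re_i    (3)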

F. RESILIENCE ENHANCEMENT
1) ADAPTATION MEASURES
In several studies [4], [6], [13], [16], [70], the evaluation of baseline resilience is followed by quantifying the system's resilience under adaptation measures. These measures are, in part, tailored to reduce the fragility of components, increase system redundancy, and reduce repair durations in order to reduce component failures, unserved energy, or the associated economic costs. They are meant to improve service to the customer by enhancing the reliability and availability of the system, although this is not always the case. For instance, Panteli et al. [30] explored the use of defensive islanding (DI) to mitigate cascading failures; in 25% of instances, it was shown to increase the likelihood and magnitude of load shedding. Colloquially, resilience enhancement techniques have in the past been classified as 'smarter, stronger, and bigger' [1], [13]. The stronger techniques are primarily concerned with upgrading infrastructure such as increasing tower strength and grounding lines. The bigger techniques entail expanding the transmission grid or increasing operating reserves whereas the smarter techniques are concerned with faster restoration, special protection schemes, and fast frequency response [1]. Objectively, all adaptation measures are meant to attenuate the system's sensitivity to the threat in view. TABLE 5 lists several resilience enhancement techniques presented within the common categories of resilience.
The adaptation measures can be applied under partial substitutability or complementarity approaches. Partial substitutability is the approach in which a single solution or a combination of solutions can be effected to deliver the same outcome at a lower cost, whereas complementarity reckons that the cumulative application of measures reinforces previous gains [1]. The optimal solution usually encompasses several measures which, when judiciously applied, prevent scenarios of maladaptation, over-engineering, and high investment costs. Resilience enhancement solutions can also be targeted with respect to system components that are susceptible to failure or to a specific region known to be highly exposed to threats. In contrast, the solutions could be applied generally across the system, such as choosing to install double-circuit OHLs across the 400 kV network. These measures can be decentralized or centralized. It was observed by [7] that decentralized solutions provided superior resilience enhancement albeit at a considerably higher investment cost. Ultimately, enhancement measures can be selected from probable options following an evaluation of the corresponding benefits [17].

2) RESILIENCE ENHANCEMENT MODELLING AND OPTIMISATION
Robustness, a measure of enhancing infrastructure resilience, was modeled in [6] by assuming a 20% shift to the right in the baseline fragility curves. This was meant to depict the increase in structural strength resulting from the use of reinforced OHL and tower material or improvements from enforcing rigorous installation standards. Others have modeled robustness by undergrounding OHLs [80], installing tiger dams around substations [53], hardening critical substations [4], and decentralizing energy systems [84]. The common motivation for enhancing robustness, i.e., infrastructure resilience, is to raise the threshold intensity at which a fault might be probable. In similar ways, most studies have proposed models for redundancy and responsiveness.
Redundancy was modeled in [19] by setting parallel transmission corridors to existing ones. In [82], redundancy was simulated by overlooking the battery limitations and constraints such that the charging level was always maintained between the minimum and maximum level, and the battery did not charge or discharge faster than the acceptable rates. In addition, in [79], redundancy was modeled by improving the remote terminal unit for battery capacity whereas in [84], electric boilers were incorporated within the system as backups to gas boilers.
Responsiveness was modeled by assuming a constant TTR of a given component for all weather intensities [6]. Two obvious limitations were overlooked in this consideration: (i) the extent of the disruption and the repair duration are contingent on the level of the threat intensity, and (ii) during an intense event, it might be practically impossible to attend to any repair works. Other variants of these assumptions have incorporated enhanced remote detection [4], an increase in the number of protection teams [53], improvement of the efficiency of line reparation [79], and isolation of critical sections of the grid to stop cascading failures [44].
Building resilience is a resource-intensive undertaking that ought to be optimized. That is, resilience should be enhanced within sections that maximize its gains across the system. Therefore, [19] adopted the Resilience Achievement Worth (RAW) index. The RAW index quantifies the projected increase in resilience contributed by a component when it is assumed to be unaffected by the extreme event. This is implemented in the models by ensuring that the component is kept in service throughout the entire duration of the event. For example, the RAW index for transmission corridor n is demonstrated in (4) by taking the probability of failure P_f,n of corridor n to be zero. In this case, the RAW index indicates how much more resilience a corridor would contribute if it were operative throughout the disturbance. Other optimization criteria for enhancing resilience include the minimization of the CVaR of grid operational costs [82], the VoLL to customers [1], and outages to customers [85], as well as compliance with set budget constraints [4] and the protection of critical sections within the network [44].
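One plausible formulation of (4), consistent with risk-achievement-worth style indices and the description above (the exact definition used in [19] may differ), compares a resilience metric such as EENS evaluated with corridor n guaranteed in service against its baseline value:

RAW_n = \frac{EENS_{baseline}}{EENS\,(P_{f,n} = 0)}    (4)

so that a larger RAW_n indicates a corridor whose guaranteed availability would yield a greater resilience gain.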

IV. A USE CASE BASED ON A LIGHTNING EVENT
A. EVENT CHARACTERISATION
On the 9th of August 2019 at 15:52:33 UTC, the 400 kV transmission OHL between Eaton Socon and Wymondley substations was struck by lightning [86]. It is reported that the cloud-to-ground (CG) strike had a lightning peak current of 3.5 kA and led to a phase-to-earth short circuit [87]. According to the data from the Gridwatch [88] and ESO [89] portals plotted in FIGURE 5, at the time of the event, the peak demand was 29.1 GW, of which 838 MW was disconnected following a frequency drop to 48.8 Hz. FIGURE 5 (A) and (B) show a slight mismatch of the system response between the demand and frequency curves mainly because the plotted data is at two different time scales: 1 second for frequency and 5 minutes for demand. FIGURE 5 (B) has labels for various timesteps and demand levels, t ∈ [t_0, t_6] and R ∈ [R_0, R_2], adopting the nomenclature used on the resilience curve presented in FIGURE 1.
The event fits the 'extreme' category not on the basis of the lightning peak current (LPC) but on the resultant effects on the system's operations and the extensive consequences to other critical infrastructure systems. The recorded LPC of 3.5 kA was considerably lower than what was observed by Pizzuti et al. [90] who demonstrated that the absolute average LPC for negative CG lightning incidences was 20 kA whereas the corresponding value for positive CG strikes was as high as 52 kA. In context, according to the cumulative probability model recommended by the IEEE [91], the event's LPC is exceeded 99.7% of the time. In other words, of the 60,000 CG strikes on the UK land area annually [92], the event's LPC was greater than that of only about 180 strikes.
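The 99.7% figure is consistent with the first-stroke peak-current distribution commonly associated with IEEE guidance, which, under the usual parameterization (31 kA median, exponent 2.6, taken here as an assumption), gives for the event's LPC:

P(I_{peak} \ge I) = \frac{1}{1 + (I/31)^{2.6}}, \qquad P(I_{peak} \ge 3.5\ \mathrm{kA}) = \frac{1}{1 + (3.5/31)^{2.6}} \approx 0.997

i.e., only about 0.3% of CG strikes would be expected to carry a peak current lower than the recorded 3.5 kA.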

B. VULNERABILITY ASSESSMENT
Considering the model for lightning intensity in Wang et al. [55] and adopting their fragility function for overhead contact lines as a proxy for transmission OHLs, the probability of failure of the OHL was evaluated. This was done by using a ground flash density of 0.24/km²/year derived from the lightning counts in [92], a maximum LPC steepness of 0.65 kA as recommended by CIGRÉ [91], and 0.12 km as the distance between the OHL and the lightning strike recorded by [87]. The eventual probability of failure was computed as 0.0033%. Based on this failure likelihood, the system was hardly expected to be affected by the lightning strike.

C. SYSTEM RESPONSE
The transients disrupted the operations of the Hornsea One offshore wind farm, causing it to de-load by 737 MW [86]. Further infeed losses of 641 MW were registered at Little Barford power station, along with about 500 MW of embedded generation. This caused outages to 1.2 million customers and cascading effects on the railway system, airports, hospitals, water supply, and an oil refinery [8]. For instance, about 1,000 train services were disrupted and two airports were disconnected from the grid and had to switch to backup generation [86]. It is reported that a total of 931 MW of demand was automatically disconnected [87]. This is about 10% more than the amount depicted in FIGURE 5, possibly because the plotted data was recorded in 5-minute bands or because the portal did not capture data from all demand centers. The frequency and demand were restored in 5 and 40 minutes respectively.

D. RESILIENCE ASSESSMENT
The narrative emerging from the computed metrics in TABLE 6 (based on the data illustrated in FIGURE 5) is that the system had a single interruption that lasted for about 40 minutes, leading to unserved energy of 270 MWh, which was equivalent to 1.9% of the expected served energy. It is also notable that, within the disturbance's worst period, an average of 97.1% of the load demand was served. The recovery of demand was twice as fast as its loss and the system spent 5 minutes in its lowest degenerated state. By assuming the VoLL to be £18,534/MWh based on the inflation-adjusted value proposed by London Economics [93], the economic value of the outages was estimated at £5 million. The actual value of the loss is likely to be significantly higher if the cascading consequences are incorporated into this estimation.
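Using the figures above, the outage cost estimate follows directly as

ENS \times VoLL = 270\ \mathrm{MWh} \times £18{,}534/\mathrm{MWh} \approx £5.0\ \mathrm{million}.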

E. RESILIENCE ENHANCEMENT
The cause of the frequency excursion and the subsequent demand disconnection was mainly attributed to the malfunctioning of the control system at the Hornsea wind farm, which triggered a loss of infeed generation greater than the 1,000 MW of balancing capacity held at the time. The effects were exacerbated by untimely or unclear inter-organizational communication, poor application of the Low Frequency Demand Disconnection (LFDD) scheme, and the poor response of the internal protection systems of several critical loads. Since then, the control system software for the wind farm has been updated, new sector communication guidelines have been issued, protection system settings for the affected category of trains have been revised, and LFDD services have been improved, especially concerning critical demand. Detailed resilience enhancement measures for this event are documented in [94] and [95].

V. CONCLUSION
The growing dependence of societies on power systems and the projected increase in their outages due to extreme weather events have sparked interest in resilience assessment and enhancement studies. To this end, several studies have proposed different frameworks and metrics tailored to evaluating power systems' resilience and the gains from a host of adaptation measures. Although the propositions differ, the general premise is that the evaluation process should be composed of an extreme weather model, a component model, and a power system model. The weather model screens and characterizes the extreme events whereas the component model assesses the fragility of the system's structural components. Finally, the power system model uses the outputs of the weather and component models to determine the system's response. It is upon the system response that baseline resilience is evaluated, which in turn forms the basis for applying resilience enhancement measures. This study has reviewed and discussed the different assumptions and models employed in implementing this multiphase process.
In addition, the study demonstrated a basic application of the process by using the impact of a historical lightning event on the GB transmission network. The results show that whilst the threat was insignificant in its intensity, its impact and consequences were exacerbated by systemic maloperation. Resilience enhancement was operationalized through improvements in power generation control systems, inter-organizational communication guidelines, demand disconnection services, and system protection settings for critical loads.
Given that not all extreme weather events result in component failure, this study underscores the necessity of characterizing extreme events based on a violation of internal design limits and evidence of widespread impacts and consequences. The study also demonstrates that most models used in resilience analysis are deductive and therefore a gap in the validation of their results exists. In particular, the threat assessment and the component vulnerability analysis phases attract numerous uncertainties which can be moderated by developing high spatio-temporal resolution empirical fragility models.