Incorporating Wind Modeling Into Electric Grid Outage Risk Prediction and Mitigation Solution

Electric grids are vulnerable to the impacts of extreme weather. Utility companies face the necessity to reduce the number of power outages caused by weather. This paper expands the approach of predicting weather outages in the distribution grid by incorporating wind modeling. The models for the grid outage State of Risk (SoR) prediction are used by utilities to mitigate potential impacts and reduce outage durations. We study the performance of such models when they are enhanced by incorporating data from Wide Area Fine Grid Wind Modeling (WAFGWM). For a given period, WAFGWM produces wind fields that characterize the direction and speed of the wind over the area of interest. The process of extracting features for the Machine Learning (ML) algorithm is described. The new solution is tested utilizing actual grid performance data from a utility company. The results from nested cross-validation obtained on three years of data reveal that the proposed method improves model performance.


I. INTRODUCTION
Network outages have been reported to cause significant equipment damage and socio-economic losses [1]. Addressing the problem of reducing the number and duration of outages remains among the top priorities of utility companies. Strong winds may cause short circuits in distribution systems by tree branches being blown into feeders [2]. At the same time, technology advances raise consumer needs and expectations of reliable electrical supply [3]. Given the importance of enhancing grid resilience, the industry focus has shifted to predicting outage occurrences through enabling a priori mitigation measures for avoiding or reducing the impacts [4], [5]. This paper incorporates wind modeling into the process of assessing the State of Risk (SoR) of outages caused by impacts of weather and other environmental conditions.
Traditionally, utilities deal with outage impacts a posteriori, i.e., restoration actions are taken as a reactive measure after an outage occurs [6]. The proactive approach, where the risk of an outage is predicted in advance, has been made possible by advances in Geographic Information Systems (GIS) and Machine Learning (ML) algorithms, complemented by Big Data technologies [7], [8], [9], [10], [11]. Using a proactive approach, utilities can reduce the risk with preventive actions such as tree trimming, work crew positioning, customer notifications, and back-up generator utilization. Such mitigation measures increase resilience and optimize capital and operational expenditures while limiting socio-economic impact and increasing the overall satisfaction of utility customers.
The associate editor coordinating the review of this manuscript and approving it for publication was Giacomo Fiumara.
Recent research [12], [13], [14] describes approaches to estimate the number of outages occurring due to thunderstorms using an optimized linear combination of ML models that include Random Forest (RF), Bayesian Additive Regression Tree, Ensemble regression, and Gradient Boosted Tree. Predicted SoR levels [15] are utilized for assessing the resilience of the distribution grid by adopting a regression model combined with the Naive Bayesian model [16]. The risk of distribution transformer failures due to weather is studied in [17]. Prediction of damages to the grid from extreme weather is discussed in [18]. These studies provide essential information on outage prediction in the grid, but they fall short of considering wind behavior comprehensively. Adding local topographic and vegetation effects to SoR models results in a more accurate wind representation.
We hypothesize that the addition of Wide Area Fine Grid Wind Modeling (WAFGWM) features to these methods can boost the performance of outage SoR prediction, allowing grid operators to make a more focused mitigation plan. Our novel refined wind modeling allows predicting possible wind impacts on the grid (e.g., broken wires, tree limbs contacting wires, and debris blown on feeders).
Our contribution is in improving the SoR prediction model with new features extracted from the WAFGWM process, which allows the selection and implementation of mitigation actions to alleviate the detrimental impact of the forced outages on the customers and utilities. The proposed methodology is evaluated on historical data from a utility company. We illustrate the advantages and shortcomings of our enhanced model by comparing its performance to the metrics of the baseline model. Section II discusses the problem background. Section III describes the process to enhance the SoR prediction accuracy using WAFGWM. The software tools for wind modeling are discussed in Section IV. Sections V and VI deal with data preparation and feature extraction. Prediction model training and testing are presented in Section VII, and conclusions are drawn in Section VIII.

II. BACKGROUND
Atmospheric wind analysis and forecasting are extensively used in several applications, such as assessing air transport efficiency and safety [19], evaluating air pollution dispersion [20], and understanding the behavior of wildfires and their possible progression scenarios [21]. In the utility industry, 3D wind forecasting is often used to estimate wind power plant efficiency and predict wind power production [22]. In our study, we use atmospheric wind analysis to enhance the performance of the SoR outage prediction model in power grids, which gives the probability of outage occurrence in a part of the network under given weather conditions. A forecasted wind field example is given in Fig. 1. The wind field is a set of points on a regularized grid that characterizes the horizontal wind vectors at a given height. Our study uses a height of 2 ft above treetops, near the typical distribution line height. There are many approaches to wind modeling. They range between two extremes: extrapolation and interpolation of available observations at one end, and comprehensive numerical solutions based on the Navier-Stokes equations for fluid motion at the other [23]. One type of model is a mass-consistent model, which is relatively fast and attempts to balance the simplicity of the interpolation approach and the intricacy of the complete set of equations by meeting the requirement of conservation of mass [24]. Another class comprises linear models, which are designed to satisfy both conservation of mass and conservation of momentum [25], [26]. We use the first type, a mass-consistent model, in our study.

III. WAFGWM PROCESS
We define WAFGWM as a process of calculating wind direction and speed at each small cell in the area of interest with consideration of elevation and underlying type of terrain. The process summarized in Fig. 2 consists of several steps. First, one needs to prepare the datasets necessary for modeling. The data are correlated in time and space, anomalous values are discarded, and the data are fed to the wind modeling software. In our case, we are using WindNinja [27]. It supports several types of wind modeling, each requiring different weather input data. In general, the inputs include Digital Elevation Model (DEM), surface roughness, and 10-m above-ground-level (AGL) wind observations from the Automated Surface Observing Systems (ASOS) [28], as detailed in the next section. The datasets need to be subsetted for the appropriate domain and fetched (downloaded) to a local machine (or cloud-based storage) for easier access. Big Data can become overwhelmingly space-consuming without proper subsetting, quickly growing over several terabytes [7].
After the datasets are prepared, cleaned, transformed, and wrangled into a suitable format (detailed below), the software may start modeling wind fields over a specified area at specified points in time. That process may be executed manually or can be automated. When the number of modeled cases is low, manual operation through a Graphical User Interface (GUI) is acceptable. However, for the tasks of implementing and training ML models, a large number of cases are needed, so in our study this process is automated.
The next steps are to analyze the resulting wind fields (ensure the quality of output, estimate the total computation time) and transform them into table-like datasets. That allows them to be joined with other features (dimensions) that are already present in the training dataset for the SoR prediction model. Since the outputs of WAFGWM are wind fields valid at several different times, spatiotemporal colocation is performed to match features to the events in the distribution grid. ArcGIS Pro GIS software and its Python library arcpy are used in our case [29].
The final steps are to merge the new features with the training and test datasets, run model training and testing, and calculate performance metrics. Based on these metrics, one can assess the impact and importance of new features on the model performance.

IV. WIND MODELING TOOL SELECTION AND USE
A variety of software packages are available for modeling the wind at distribution line height on a grid from scattered 10-m AGL observations. We have chosen WindNinja (WN), which is developed by the USDA Forest Service, Rocky Mountain Research Station [30], for use by wildfire management crews. WN accounts for our need to include the effect of terrain on the wind flow. Two solvers are available in WN for the calculation of wind fields: conservation of mass solver and conservation of mass and momentum solver. The second solver is generally more accurate, but the computational time required is significantly longer. It was shown that WN has difficulties modeling lee-side flow re-circulation during externally forced high wind events [31], which is linked to the absence of a momentum equation in the first mode of modeling [30]. Additionally, point initialization is not available for the mass and momentum solver. In this study, we are using the first solver, mainly due to a large number of cases, since it takes less than one minute for the first solver to calculate one case compared to anywhere between 10-30 min for the second [32].
Required datasets include a DEM file for the modeled area, surface (10-m AGL) wind speed and direction, and specifications for the vegetation type in the area. Several output formats are supported in WN; here, shapefiles (*.shp) are used.
Another indispensable feature of WN is the command line interface (CLI) that allows one to automate WN runs for several timestamps or different wind conditions. We are using WindNinja's CLI through a Python interface, which allows us to keep the entire process of wind modeling and subsequent ML model training and testing in the same environment.

V. DATA PREPARATION
This section discusses the data preparation for each of the input datasets: DEM, Vegetation Type, and Wind Fields.

A. DIGITAL ELEVATION MODEL
In our application, we decided to use the FARSITE comprehensive landscape file (*.lcp) format because it also contains information about vegetation in the area as one of the data layers. Such landscape files are primarily used to model fire behavior. These types of files include information on elevation, slope, aspect, fuel model, and canopy cover, and can optionally include stand height, height to live crown base, crown bulk density, duff loading, and coarse woody profiles [33]. We are using LANDFIRE (LF) 2016 Remap [34] products, specifically LCP 40 Fire Behavior Fuel Models-Scott/Burgan. The dataset reflects ground conditions for a time period close to 2016 for which we had records of outages. The spatial resolution of the acquired file is 30 by 30 meters.
Generally, the output cell size for the wind field is selected with consideration of the task at hand. For SoR predictions, we used 900 by 900 meters output resolution, which allowed us to balance the computational load and modeling accuracy. Smaller cell size substantially increases computation time, while bigger cell size yields coarser, less accurate results.
Another way of decreasing computation time is dividing the area into several smaller parts and merging the resulting wind fields afterward. However, that approach may yield inconsistencies in wind field values along the edges of the modeled region parts. In this paper, the problem of SoR prediction is solved with reference to separate feeders in the grid, which are not uniform in length, location, or shape. Thus, it is natural to model the entire area in a single run.

B. VEGETATION TYPE DATASET
Vegetation type in the area of modeling determines the surface drag due to vegetation and whether the diurnal flow or non-neutral stability is considered during simulation. It is also used for heat flux parameters by WN. By default, three vegetation types are accessible to the users: grass, brush, and trees. One downside of WN is that the standard way of specifying the vegetation type is by selecting the dominant vegetation type in the entire area, which leads to inaccurate modeling of wind fields when large areas containing disparate vegetation types are considered.
To overcome that challenge, we used previously mentioned landscape files to specify the underlying terrain. When that approach is utilized, WN is forced to calculate surface drag based on the canopy and fuel information enclosed in the landscape file.
Other software for wind modeling may offer an easier way to define drag over the area. For example, Continuum [35] allows users to use well-known National Land Cover Database (NLCD) land cover GeoTiff files [36] as a source of information for the surface drag.

C. WIND SPEED AND DIRECTION
Initialization values for wind speed and direction in the area are used as reference points to "fit" the resulting wind field. WN supports three ways of initializing winds in the area: domain average wind, point initialization, and weather model. Point initialization allows one to define wind speeds and direction at several arbitrary points in the area. The software then calculates a balanced wind field that matches the point observations. The calculation process is iterative and is stopped when the simulated values are within 0.1 m/s of the measured values at every observation point. The method is very convenient since one can acquire wind parameters from stationary weather stations that report weather with high time resolution.
We use the point initialization method in the present study so that forecast data can be introduced into our SoR prediction model. The weather data for the area of interest were obtained from the ASOS sites by using an API developed by the Iowa Environmental Mesonet [37]. The pandas library is used to reformat the weather data from the original ASOS report format into the format required by WN. The set of files for one simulation consists of a .csv file for each weather station (WS) in the area, where each line represents one observation at the specified timestamp. Each .csv file needs to incorporate the information required by WN for each observation. Optional parameters used for the diurnal and non-neutral stability sub-models are also included:
• Temperature
• Cloud Cover in Percent
In addition to the individual .csv file for each location, a summary file that lists all individual WS files is created.
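As a minimal sketch of the file layout described above, the following writes one .csv file per weather station and a summary file listing them. It uses only the Python standard library rather than pandas, and the column names (`station`, `timestamp`, `speed`, `direction`) and file names are hypothetical placeholders, not the header that WN actually expects.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_station_files(observations, out_dir):
    """Write one CSV per weather station plus a summary file listing them.

    `observations` is an iterable of dicts. The keys used here are
    hypothetical placeholders, not the header expected by WN.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group observations by station so that each station gets its own file.
    by_station = defaultdict(list)
    for obs in observations:
        by_station[obs["station"]].append(obs)

    station_paths = []
    for station, rows in sorted(by_station.items()):
        path = out_dir / f"{station}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            # One line per observation, ordered by timestamp.
            writer.writerows(sorted(rows, key=lambda r: r["timestamp"]))
        station_paths.append(path)

    # Summary file: one line per individual station file.
    summary = out_dir / "stations_summary.csv"
    summary.write_text("\n".join(str(p) for p in station_paths) + "\n")
    return summary, station_paths


sample = [
    {"station": "WS1", "timestamp": "2016-01-01T01:00", "speed": 4.1, "direction": 200},
    {"station": "WS2", "timestamp": "2016-01-01T00:00", "speed": 5.0, "direction": 180},
    {"station": "WS1", "timestamp": "2016-01-01T00:00", "speed": 3.2, "direction": 210},
]
summary_path, station_files = write_station_files(sample, tempfile.mkdtemp())
```

In a real run, the placeholder columns would be replaced by the fields that WN requires, including the optional temperature and cloud cover values.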

D. CLI CONFIGURATION FILE
To make the SoR prediction process scalable to numerous modeling cases, we have automated WN simulation runs through the CLI. The configuration file specifies the options for the modeling. Several important parameters are:
-num_threads: defines the number of virtual CPU cores used for simulation.
-elevation_file: specifies the path to the DEM file.
-initialization_method: selects from domain average wind, point initialization, and weather model.
-wx_station_filename: path to the summary file with the list of WS files.
We used a Python script to form the list of parameters that change for each run (mainly, the timestamp for each simulation) and then passed them as a command to the CLI of WN.
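A script of this kind could be sketched as follows. The sketch only assembles the argument list and does not launch WN. The executable name `WindNinja_cli`, the double-dash option syntax, the value `pointInitialization`, and all file names are assumptions made for illustration and should be checked against the installed WN version.

```python
import shlex


def build_wn_command(options, executable="WindNinja_cli"):
    """Return the argument list for one WN run.

    `options` maps option names to values. Each entry is appended as a
    `--name value` pair (the double-dash syntax is an assumption).
    """
    command = [executable]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


# Options named in the text; every value below is a placeholder.
options = {
    "num_threads": 4,
    "elevation_file": "area.lcp",
    "initialization_method": "pointInitialization",
    "wx_station_filename": "stations_summary.csv",
}
command = build_wn_command(options)
print(shlex.join(command))
# The list could then be passed to subprocess.run(command, check=True).
```

Options that change between runs, such as the simulation timestamp, would be added to `options` before each call.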
Once the simulation runs were configured, the simulation process was initiated. Initially, there was a total of 11230 timestamps, among which 5615 had one or more faults in the power distribution system. The time difference between subsequent faults resembles a heavy-tailed exponential distribution with a mean of 306.4 minutes, a median of 22 minutes, a mode of 1 minute, and a standard deviation of 995.8 minutes. However, wind fields were not properly produced for every case. Some cases did not converge, and some timestamps did not have all the required weather variables for modeling, so no resulting wind field was obtained for them. The number of successfully modeled cases is 10564, which yields an overall 94% rate of success in modeling.

VI. FEATURE EXTRACTION FROM WIND FIELDS
This section details the processing steps needed to extract features from wind fields and incorporate them with the rest of the datasets. The resulting wind field discussed in the previous section is represented as a shapefile, which is suitable for processing with ArcGIS Pro GIS software used for spatial analysis.
First, the DefineProjection function is used to set the projection for the shapefile with wind fields. The datum and projected coordinate system (PCS) are defined by the elevation file that was used for the modeling (WN only works in a PCS). In our case, this is the USA Contiguous Albers Equal Area Conic USGS version [38]. The PCS specifies the geographic location of the wind field with respect to the power distribution grid.
Next, to spatially correlate the feeders in the electric grid with the calculated wind field, the spatial join operation is performed (SpatialJoin). The goal is to match individual points in a wind field (each characterized by speed and direction) to the closest feeder in the network. As a result, each of the wind field points belongs to the closest feeder, and the set of such points characterizes wind behavior for each feeder. To guarantee that each feeder would have at least one wind point assigned to it, we used the following parameters for spatial join [39]: join_one_to_one, keep_common, '1273 Meters'. The first specifies that each point must be matched only with one feeder, and the second instructs the program only to consider wind points within a specified distance, which is the last parameter. The spatial extent of 1273 meters is chosen out of consideration for the spatial resolution of the wind fields: it is the rounded length of the hypotenuse of a right-angled triangle with both legs equal to 900 meters.
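The logic of this matching step can be illustrated with a small standard-library sketch. It is not the ArcGIS implementation: feeders are simplified to polylines in planar coordinates (meters), and all names and sample coordinates are hypothetical. The sketch also reproduces the search radius as the rounded diagonal of a 900 by 900 meter cell.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_points_to_feeders(wind_points, feeders, search_radius):
    """Match each wind point to its closest feeder within search_radius.

    Each feeder is a list of at least two (x, y) vertices. Points with no
    feeder inside the radius are dropped, so only matched pairs are kept.
    """
    matches = {}
    for point_id, xy in wind_points.items():
        best_name, best_dist = None, float("inf")
        for name, vertices in feeders.items():
            dist = min(
                point_segment_distance(xy, a, b)
                for a, b in zip(vertices, vertices[1:])
            )
            if dist < best_dist:
                best_name, best_dist = name, dist
        if best_dist <= search_radius:
            matches[point_id] = best_name
    return matches


cell_size = 900.0
# Diagonal of one output cell, rounded to whole meters.
search_radius = round(math.hypot(cell_size, cell_size))

feeders = {
    "F1": [(0.0, 0.0), (2000.0, 0.0)],
    "F2": [(0.0, 3000.0), (2000.0, 3000.0)],
}
wind_points = {"p1": (450.0, 450.0), "p2": (450.0, 2550.0), "p3": (450.0, 1500.0)}
matches = assign_points_to_feeders(wind_points, feeders, search_radius)
```

In this toy layout, `p3` lies 1500 meters from both feeders, which is beyond the radius, so it is left unassigned.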
To summarize the wind behavior for a feeder, we calculated several statistics from wind field values: Mean speed, Max speed, Min speed, Mean direction, Max direction, and Min direction. We exported the resulting attribute table as a .csv file, loaded it with pandas, and calculated the statistical values. In such a way, for each timestamp, each feeder gets six additional wind field features that aggregate wind information over the feeder and adjacent territory.
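A minimal sketch of this aggregation is shown below, using the standard library instead of pandas. The field names are hypothetical, and the direction statistics are plain arithmetic values: the text does not state whether a circular mean was used, so that choice is an assumption of the sketch.

```python
from collections import defaultdict
from statistics import mean


def summarize_wind_by_feeder(rows):
    """Aggregate wind-field points into six features per (feeder, timestamp)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["feeder"], row["timestamp"])].append(row)

    features = {}
    for key, points in groups.items():
        speeds = [p["speed"] for p in points]
        # Arithmetic statistics on degrees (assumption, see lead-in).
        directions = [p["direction"] for p in points]
        features[key] = {
            "mean_speed": mean(speeds),
            "max_speed": max(speeds),
            "min_speed": min(speeds),
            "mean_direction": mean(directions),
            "max_direction": max(directions),
            "min_direction": min(directions),
        }
    return features


rows = [
    {"feeder": "F1", "timestamp": "t0", "speed": 2.0, "direction": 180.0},
    {"feeder": "F1", "timestamp": "t0", "speed": 4.0, "direction": 200.0},
    {"feeder": "F1", "timestamp": "t0", "speed": 6.0, "direction": 220.0},
    {"feeder": "F2", "timestamp": "t0", "speed": 3.0, "direction": 90.0},
]
features = summarize_wind_by_feeder(rows)
```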
The described process took 11 hours and 48 minutes to run when utilizing 14 cores of an Intel® Core™ i9-9900 CPU at 3.1 GHz with 64 GB of RAM. The statistics could also be calculated with the appropriate ArcGIS functions; however, we found that processing times are longer in that case.
The last step is to augment the existing features with the new wind features. That is done by a simple join operation using the feeder's name and timestamp as join keys. The result is a training dataset enhanced with six additional wind field features. The formation of the existing (baseline) feature set is discussed in the next section.
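The join can be sketched in a few lines of standard-library Python. The field names are hypothetical, and the sketch assumes an inner join, i.e., rows without matching wind features are dropped. The text only calls it a simple join, so that detail is an assumption.

```python
def join_wind_features(base_rows, wind_features):
    """Inner-join base rows with wind features on (feeder, timestamp)."""
    joined = []
    for row in base_rows:
        key = (row["feeder"], row["timestamp"])
        if key in wind_features:
            # Merge the existing weather features with the new wind features.
            joined.append({**row, **wind_features[key]})
    return joined


base_rows = [
    {"feeder": "F1", "timestamp": "t0", "air_temperature": 21.5},
    {"feeder": "F2", "timestamp": "t0", "air_temperature": 21.5},
]
wind_features = {("F1", "t0"): {"mean_speed": 4.0, "max_speed": 6.0}}
enhanced_rows = join_wind_features(base_rows, wind_features)
```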

VII. ML MODEL TRAINING AND TESTING

A. BASELINE MODEL
The results of the base model for predicting SoR levels of outages in the system serve as the benchmark against which the new model with additional features is compared. The base model features are the weather variables obtained from ASOS observations. The training dataset uses the following features (dimensions): Air Temperature, Dew Point Temperature, Wind Direction in degrees from true north, Wind Speed, Wind Gust, One-hour Precipitation (for the period from the observation time to the time of the previous hourly precipitation reset), Relative Humidity, and Present Weather Codes. To ensure the robustness of the resulting metrics, we used nested cross-validation (CV) for temporal data with ten folds. The process is illustrated in Fig. 3. The sizes of the validation and testing folds are set to 500. A nested CV is known to provide a virtually unbiased measure of performance [40]. We train the model on the training part of the split, optimize hyperparameters on the validation part, and, finally, calculate metrics on the testing part. The data are then augmented with the next fold, and the process is repeated. To calculate the final metrics, we average the results over the folds. We note that the data are temporally sorted, so there is no "leakage" of data. The loss function is set to LogLoss.
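One way to generate such time-ordered splits is sketched below. The expanding-window layout (the training part grows by one fold at each step, followed by a validation fold and a testing fold of 500 samples each) is our reading of the description above. The exact placement of the folds in Fig. 3 is not reproduced here, so the index arithmetic is an assumption.

```python
def temporal_nested_splits(n_samples, n_folds=10, fold_size=500):
    """Yield (train, validation, test) index ranges for time-ordered data.

    The training range always precedes the validation range, which always
    precedes the testing range, so later data never leak into training.
    """
    first_train_end = n_samples - (n_folds + 1) * fold_size
    if first_train_end <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = first_train_end + k * fold_size
        val_end = train_end + fold_size
        test_end = val_end + fold_size
        yield range(0, train_end), range(train_end, val_end), range(val_end, test_end)


# Illustrative sample count, not the size of the dataset used in the paper.
splits = list(temporal_nested_splits(n_samples=6000))
```

Metrics would be computed on each testing range and then averaged over the ten folds.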
The following metrics are utilized for performance assessment: Precision, Recall, F1 Score, Area Under the Precision-Recall Curve (PRC AUC), and Area Under the Receiver Operating Characteristic Curve (ROC AUC) [41]. We use CatBoost [42] as our ML algorithm. It has demonstrated better performance when compared to alternative methods [43], is easy to implement, and has valuable features [44].
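For reference, the three threshold-based metrics can be computed from binary labels and predictions as in the sketch below, which uses the standard definitions. The sample labels are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, and F1 score for binary labels (1 = outage)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
```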

B. MODEL TRAINING WITH WIND FIELD FEATURES
The next step is to evaluate the performance of the ML algorithm trained using enhanced data with incorporated wind field features. The new features are joined based on the feeder ID and timestamps, which ensures spatiotemporal correlation. Having the exact same rows (examples) for both models ensures that the results are comparable. Special care is exercised when preprocessing the datasets because often the preprocessing procedures include dropping rows with missing data. In such cases, the same rows must be dropped from both training datasets.
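The row-alignment requirement can be sketched as follows: a row is kept only if it is complete in both datasets, so both models see exactly the same examples. The field names are hypothetical, and missing values are represented by `None` for the purpose of this sketch.

```python
def drop_incomplete_rows(baseline, enhanced, key_fields=("feeder", "timestamp")):
    """Drop from both datasets every row that is incomplete in either one."""

    def key(row):
        return tuple(row[field] for field in key_fields)

    def is_complete(row):
        return all(value is not None for value in row.values())

    # Keys of rows that have no missing values in both datasets.
    keep = {key(r) for r in baseline if is_complete(r)} & {
        key(r) for r in enhanced if is_complete(r)
    }
    return (
        [r for r in baseline if key(r) in keep],
        [r for r in enhanced if key(r) in keep],
    )


baseline = [
    {"feeder": "F1", "timestamp": "t0", "wind_speed": 5.0},
    {"feeder": "F1", "timestamp": "t1", "wind_speed": None},
    {"feeder": "F2", "timestamp": "t0", "wind_speed": 7.0},
]
enhanced = [
    {"feeder": "F1", "timestamp": "t0", "wind_speed": 5.0, "max_speed": 6.1},
    {"feeder": "F1", "timestamp": "t1", "wind_speed": None, "max_speed": 4.0},
    {"feeder": "F2", "timestamp": "t0", "wind_speed": 7.0, "max_speed": None},
]
baseline_clean, enhanced_clean = drop_incomplete_rows(baseline, enhanced)
```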
The newly created features enrich the dataset with information that was not available to the algorithm in the baseline case. The outage SoR levels are dependent on the speed and direction of the wind along the feeder.

C. RESULTS DISCUSSION
The performance results for both models are presented in Table 1. As can be observed, the CatBoost model with wind field features has outperformed the baseline model. The difference in ROC AUC is 1.0%, Precision did not change, Recall has improved by 2.7%, F1 Score increased by 1.7%, and PRC AUC had a change of 1.1%.
Feature importance is analyzed next to better understand the value of the newly proposed features. This is achieved by measuring the difference between the values of the CatBoost loss function with and without individual features. Table 2 shows the feature importance where the new wind features are in bold text, while old features are in italic. One can conclude that some of the new features are more important than the original weather parameters: max speed and mean speed have a more considerable weight when compared to the original wind speed. That is most likely due to the fact that the high wind speeds cause outages in the system, and the WAFGWM process is capable of capturing more granular phenomena in the area. A notable fact is that all of the direction features are among the least important features. This shows that for outage prediction, wind direction may be less important than wind speed when considered in absolute terms.
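The idea behind a loss-difference importance score can be illustrated with the sketch below. It is not CatBoost's internal procedure, which approximates the model without a feature in its own way. It only shows the arithmetic: the importance of a feature is taken as the LogLoss obtained without the feature minus the LogLoss obtained with it. The probability vectors are invented for illustration.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (LogLoss) of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def loss_change_importance(y_true, p_with, p_without):
    """Increase in LogLoss when the feature is not available to the model."""
    return log_loss(y_true, p_without) - log_loss(y_true, p_with)


y_true = [1, 0, 1, 0]
p_with = [0.9, 0.1, 0.8, 0.2]      # hypothetical predictions with the feature
p_without = [0.6, 0.4, 0.6, 0.4]   # hypothetical predictions without it
importance = loss_change_importance(y_true, p_with, p_without)
```

A positive value means that the loss grows when the feature is removed, i.e., the feature helps the model.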

VIII. CONCLUSION
We conclude that the wind fields have a positive effect on the SoR model performance because they characterize wind components that are strongly influenced by the topography and roughness length of the underlying surface along the feeders in the grid.
Specifically, we demonstrated that:
• The WAFGWM process improves the wind impact modeling on feeders since features from wind fields incorporate spatiotemporal dependencies.
• The ML prediction model with the new wind field features outperforms the baseline model without them.
• The incorporation of the wind field features as one of the dimensions for the ML model boosts the SoR model performance.
Given the findings, a promising approach is to account for wind direction in relation to feeder orientation. That is, a wind that is perpendicular to the feeder may have a higher impact than a wind directed along the feeder. Therefore, a closer analysis of wind directions is needed. For example, a more granular timescale needs to be used. These topics are left for future research.

He is also the Principal Consultant at XpertPower Associates, a consulting firm specializing in power systems data analytics. He has authored over 600 papers, given over 120 seminars, invited lectures, and short courses, and consulted for over 50 companies worldwide. His expertise is in protective relaying, automated power system disturbance analysis, computational intelligence, data analytics, and smart grids. He is a member of the U.S. National Academy of Engineering. He is a CIGRE fellow, an Honorary member and the Distinguished member. He is a Registered Professional Engineer in TX.

Data used in this project is provided by the
KEITH A. BREWSTER (Member, IEEE) received the B.S. degree in meteorology from the University of Utah, in 1981, and the M.S. and Ph.D. degrees from the School of Meteorology, The University of Oklahoma, in 1984 and 1999, respectively. Since 1993, he has been a Senior Research Scientist and the Director of operations at the Center for Analysis and Prediction of Storms, The University of Oklahoma, where he has worked on advancing research in weather radar data analysis, data assimilation, high resolution numerical prediction and ensemble prediction for improving forecasts of thunderstorms, flash flooding, renewable energy production, and winter storms. He has published more than 150 journal and conference papers. He is a member of the American Meteorological Society and the American Geophysical Union. He serves as the Co-Chair for the NOAA Unified Forecasting System Post-Processing Committee. He is on the AMS Nationwide Network of Networks Committee.
ZORAN OBRADOVIC (Senior Member, IEEE) is a Distinguished Professor and the Center Director at Temple University; an Academician with the Academia Europaea (the Academy of Europe); and a Foreign Academician with the Serbian Academy of Sciences and Arts. He has mentored 46 Postdoctoral Fellows and Ph.D. students, many of whom have independent research careers at academic institutions and industrial research laboratories. His research results were published in about 400 data science and complex networks articles addressing challenges related to big, heterogeneous, and spatial-temporal data analytics motivated by applications in healthcare management, power systems, and earth and social sciences. He was the general chair, the program chair, or the track chair of 11 international conferences. He is the Steering Committee Chair of the SIAM Data Mining Conference. He is an Editorial Board Member of 13 journals. He is also the Editor-in-Chief of the Journal of Big Data.