Risk Assessment of a Wind Turbine: A New FMECA-Based Tool With RPN Threshold Estimation

A wind turbine is a complex system used to convert the kinetic energy of the wind into electrical energy. During the turbine design phase, a risk assessment is mandatory to reduce the machine downtime and the Operation & Maintenance cost and to ensure service continuity. This paper proposes a procedure based on Failure Modes, Effects, and Criticality Analysis to take into account every possible criticality that could lead to a turbine shutdown. Currently, a standard procedure to be applied for evaluation of the risk priority number threshold is still not available. Trying to fill this need, this paper proposes a new approach for the Risk Priority Number (RPN) prioritization based on a statistical analysis and compares the proposed method with the only three quantitative prioritization techniques found in literature. The proposed procedure was applied to the electrical and electronic components included in a Spanish 2 MW on-shore wind turbine.


I. INTRODUCTION
Wind energy is one of many renewable energy sources that offer an alternative to burning fossil fuels [1] and is now one of the most widely used sources of renewable energy [2]. Wind energy is popular because of the lower investment cost and well-developed technology compared to the other renewable energy sources [3].
According to WindEurope (i.e. the Association for Wind Energy in Europe), the European Union (EU) is moving toward renewable energy sources, with hundreds of billions invested in renewable energy development and many new installations. About 95% of all new EU power installations in 2018 were for renewable energy: 19.8 GW out of a total of 20.7 GW of new power capacity [4]. To put this into context, in the last ten years, coal and natural gas have been the main forms of power generation in Europe, each with a total installed capacity of 150 GW to 200 GW [4].
The inevitable power fluctuations represent one of the greatest drawbacks of wind energy, as they introduce serious technical challenges into the electric power grid, such as power system quality and reliability, system protection, and power flow control [3]. Moreover, compared to other electricity generation systems, wind turbines (WTs) have relatively higher failure rates because of the harsher operating conditions and higher maintenance costs due to their relative inaccessibility [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Cristian Zambelli.
Consequently, the main purpose of this work is to propose a simple procedure based on the standardized Failure Modes, Effects and Criticality Analysis (FMECA) which must be both cost-effective and cost-efficient. Few studies on FMECA for wind turbines are presented in literature. Some papers simply present the results of a classical FMEA or FMECA on on-shore or off-shore wind turbines (see, for instance, [6]-[8]) without explaining how to set the optimal risk threshold. Other papers [9], [10] integrate the aspects of traditional FMEA with some economic parameters. Arabian-Hoseynabadi et al. [11] present the results obtained using a suitable FMEA software package. Tavner et al. [12] use the FMECA to compare the prospective reliabilities of three versions of the geared R80 turbine with different drive train solutions. Kahrobaee and Asgarpoor [10] present a quantitative approach called Risk-Based-FMEA, based on the failure probabilities and incurred failure costs instead of rating scales. Dinmohammadi and Shafiee [13] develop a fuzzy-FMEA approach for risk and failure mode analysis in offshore wind turbine systems. The proposed approach helps to identify the most critical components and to optimize the maintenance plan in order to reduce the unplanned system downtime due to corrective maintenance operations. Moreover, neither the international standard IEC 60812 [14], which regulates the FMECA technique, nor the existing literature on FMECA for wind turbines provides a method to identify a risk threshold and consequently to divide the failure modes into critical modes and negligible modes. Therefore, this paper introduces a new approach to evaluate the optimal risk level based on statistical parameters and compares it with three different threshold estimation methods found in literature.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Finally, the paper proposes a case study to test and validate the potential of the proposed methodology. A horizontal-axis wind turbine is a complex system that can be broken down into several subsystems, including nacelle, rotor, tower, and blades [15], [16]. The nacelle is an enclosure containing the electrical/electronic (the topic of this paper) and mechanical components needed to produce electricity (e.g. gearbox, brake, yaw mechanism, generator, control system, etc.).
Following the guidelines provided by the international standard ISO 14224 (2016) [17], Figure 1 illustrates the low-level taxonomy of the turbine tested during this analysis (in compliance with [16]). The turbine is divided into twelve different subsystems, each composed of several subunits and components. The state of the art for wind turbine taxonomy is RDS-PP®; however, in this paper a different approach was chosen because the classical taxonomy defined by the guidelines of the international standard ISO 14224 represents a better initial step for carrying out the FMECA procedure.

II. FMECA METHODOLOGY FOR ONSHORE WIND TURBINE
Failure modes and effects analysis (FMEA) is a systematic procedure to identify potential failure modes, their causes, and their effects on system performance [14]. FMECA (Failure Modes, Effects, and Criticality Analysis) is an extension of FMEA that includes a means of ranking the risk related to the failure modes to allow prioritization of countermeasures. This is done by combining the frequency of occurrence rank (usually called O), the severity measure rank (usually called S) and the detection index (usually called D) into the Risk Priority Number (RPN) as follows [14], [18]:
$$\mathrm{RPN} = O \cdot S \cdot D \tag{1}$$
More details on FMEA and FMECA processes and applications are given in references [19]-[25]. Table 1 summarizes the factors that influence the criticality index and the rules to assess the rating of each one. The table highlights that parameters O, S and D are generally measured on a 10-point scale wherein greater O and S numbers stand for increasing values of the frequency of occurrence and of the severity respectively, whereas D is ranked in reverse order, namely the higher the detection value, the lower the detection probability of the failure mode. Consequently, the RPN index assumes values within the range from 1 to 1000; a higher RPN indicates the necessity to solve the failure mode with higher priority.
The FMECA is a powerful tool to carry out a risk analysis [25]. Therefore, this paper discusses the risk assessment, using an FMECA tool, of a Spanish onshore 2 MW wind turbine located in the region of Aragon. As the number of WT installations continues to grow worldwide, the need for fault detection systems is increasingly important. Since most wind turbines are situated on high towers, installed in remote rural areas or offshore, distributed over large geographic regions, exposed to harsh environments, and subject to relatively high failure rates, their maintenance requires significant effort and cost [26]-[31]. An FMECA makes it possible to study every problem that might arise from malfunctions of the system being tested and to implement the optimal fault detection and diagnosis system. The investigation should start at the lowest taxonomic level and continue to the equipment unit level.
The first phase of the work focuses on the identification of all the failure modes and their respective causes for each of the electrical and electronic components inside the turbine. Each failure mode can have several failure causes, and every cause must be included in the FMECA final report. Thus, all the possible scenarios will be considered in the risk assessment. This is an important issue because a neglected cause could produce an unstudied situation linked to risk for the environment, the operator and the system itself, with a consequent loss of availability and safety.
The following step is the failure rate evaluation because the failure rate is an important and useful parameter linked to the failure probability and can be used to rank occurrence. The failure mode probability, usually expressed by α, represents the percentage of time that the equipment will fail in a given mode [32].
Thus, if λ is the failure rate of the component, then the mode failure rate λ^(M) is given by:
$$\lambda^{(M)} = \alpha \cdot \lambda \tag{2}$$
Table 2 shows the criteria proposed to assess occurrence based on the mode failure rate. As the table shows, occurrence is ranked on a scale from 1 (best case) to 10 (worst case); this scale appears on standard FMECA forms. The rating is based on the methodology proposed in the international standards IEC 60812 (2006) [33] and IEC 60812 (2018) [14]. In particular, a 1-to-10 scale is used, where the higher the mode failure rate, the higher the occurrence rank. The mode failure rate intervals are determined using data provided by the owner of the wind turbine tested. In particular, the minimum and the maximum mode failure rates were used to set the ranges for occurrence O = 1 and occurrence O = 10 respectively. The intermediate ranges are set so that they all have the same length.
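The occurrence rating described above can be illustrated with a minimal Python sketch. It assumes that the mode failure rate is the product of the mode probability α and the component failure rate λ, and that the ten occurrence classes are equal-width intervals spanning the observed minimum and maximum mode failure rates; this is only one reading of the equal-length rule. The function names and the numerical values are hypothetical and are not taken from Table 2.

```python
def mode_failure_rate(alpha: float, failure_rate: float) -> float:
    """Mode failure rate: share alpha of the component failure rate."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability between 0 and 1")
    return alpha * failure_rate


def occurrence_rank(lam_mode: float, lam_min: float, lam_max: float,
                    levels: int = 10) -> int:
    """Map a mode failure rate to an occurrence rank from 1 to `levels`.

    Assumption: the classes are equal-width bins between lam_min and lam_max,
    and a value on a bin boundary belongs to the higher class.
    """
    if lam_max <= lam_min:
        raise ValueError("lam_max must be greater than lam_min")
    if not lam_min <= lam_mode <= lam_max:
        raise ValueError("lam_mode lies outside [lam_min, lam_max]")
    width = (lam_max - lam_min) / levels
    rank = int((lam_mode - lam_min) // width) + 1
    return min(rank, levels)  # the maximum itself falls in the top class


# Hypothetical example: a component with failure rate 8.0 (arbitrary units)
# that fails in the mode of interest 25% of the time.
lam_m = mode_failure_rate(0.25, 8.0)
print(lam_m, occurrence_rank(lam_m, lam_min=0.0, lam_max=10.0))
```

Bin edges are a design choice here: assigning boundary values to the higher class errs on the side of a higher occurrence rank.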
The consequences of each failure mode on system element operation, function, or status need to be identified, evaluated and recorded. Failure effects are classified as local and global effects. The local effects describe the consequences of a failure mode on the operation, function, or status of the specific item under consideration, while the global effects stand for the consequences on the operation, function, or status of the higher taxonomy levels. In this work, the global effects refer to the effects on the nacelle and the whole wind turbine. In addition, this paper includes two effective parameters to evaluate the risk level:
• Turbine functionality: this parameter gives the turbine operational status after the failure:
  - No impact: the turbine continues its work although the failure mode has occurred.
  - No impact in the short term: initially the turbine continues its work with all functionality, but a maintenance action is needed.
  - Reduced: redundancy and auxiliary systems preserve the essential functionality of the turbine; the turbine continues to provide electricity, but some operations are not available.
  - Strongly reduced: most operations are not available; the turbine continues to provide electricity with low efficiency.
  - Doesn't work: the turbine can't produce electricity.
• Safety loss: this parameter indicates whether the failure mode could reduce the safety level, with a consequent risk for the environment, the operator, or the turbine itself.
Table 3 shows the rules to assess severity based on the two previous parameters: turbine functionality and safety loss.
At the initial phase of a project, little information about diagnostic systems is generally available. Therefore, detection is classified on a 3-value scale, from 1 (best case) to 3 (worst case), where 2 represents the partially detectable scenario, as shown in Table 4.
This solution is used to mitigate one of the RPN drawbacks that many papers have pointed out, namely the equal relative importance of O, S and D in equation (1) [18], [34]-[39]. The use of a 1-3 scale for detection gives the three parameters different importance, preserving the nature of the standard RPN while giving more weight to Severity and Occurrence.
According to these ratings, using eq. (1), the RPN can assume values in the interval [1; 300].
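The effect of the reduced detection scale on the RPN range can be checked with a short Python sketch. It assumes the standard product form of the RPN (occurrence times severity times detection); the function name `rpn` and the `max_detection` parameter are hypothetical and only serve the illustration.

```python
def rpn(occurrence: int, severity: int, detection: int,
        max_detection: int = 3) -> int:
    """Risk Priority Number as the product O * S * D.

    O and S are rated from 1 to 10; D is rated from 1 to `max_detection`
    (3 for the reduced scale used here, 10 for the classical scale).
    """
    limits = (("O", occurrence, 10), ("S", severity, 10),
              ("D", detection, max_detection))
    for name, value, upper in limits:
        if not 1 <= value <= upper:
            raise ValueError(f"{name} must be between 1 and {upper}")
    return occurrence * severity * detection


# Extremes of the reduced scale (D from 1 to 3).
print(rpn(1, 1, 1), rpn(10, 10, 3))
# Extremes of the classical scale (D from 1 to 10).
print(rpn(1, 1, 1, max_detection=10), rpn(10, 10, 10, max_detection=10))
```

With the reduced scale, a change of one step in D moves the RPN by a larger relative amount per step but over a narrower range, so O and S dominate the final ranking.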

III. APPLICATION TO E/E/PE COMPONENTS OF THE WIND TURBINE
The wind turbine tested is a G80/2000 machine manufactured by Gamesa Corporación Tecnológica. The turbine is mounted on the top of a 60-meter tubular tower and is operated by Vestas Wind Systems. This study focuses on the E/E/PE components (i.e. electrical/electronic/programmable electronic items) inside the turbine. As Figure 1 shows, all the E/E/PE items are gathered together in two subsystems: the control system and the electrical system.
[Figure caption, partially recovered: ... ''Equipment unit (6),'' as per ISO 14224 [17]. The items inside the top boxes belong to the ''Subunit (7)'' level; the ''Maintainable Items (8)'' level boxes are at the bottom of the figure.]
The control system is a very critical unit that serves several purposes, such as:
• To collect information coming from the SCADA (Supervisory Control and Data Acquisition) system and from the other external sensors;
• To communicate with the operating center, sending information about the current status of the turbine, including process information and diagnostic data useful for evaluating the health state of the system;
• To process the acquired data in order to manage all the turbine functionalities using the actuators, such as the movement of the nacelle toward the wind direction, the activation of the brake when the wind speed is too high, the management of the gearbox and the generator, and so on.
The electrical equipment unit (see Figure 3) is a generic subsystem containing all the electrical components in the turbine, except the generator. The taxonomy of the electrical level, ''Equipment unit (6),'' shown in Figure 3, contains the following equipment:
• A power converter including an IGBT module, a rectifier bridge, a crowbar system and other discrete components;
• A PFC system used to improve the power factor;
• A soft starter used with AC electrical motors to temporarily reduce the load and torque in the power train and the electric current surge of the motor during start-up.
The proposed approach should be carried out at the early phase of the design so that it is more cost-effective and efficient. Field data about component failures of the turbine under test are not available during the design phase, and the statistics available in literature may not be as detailed as necessary for the investigation; therefore, they are not taken into account. Since there are no specific standards or handbooks containing failure data of wind turbines, many generic handbooks are used to carry out the functional failure analysis of the G80/2000 WT tested in this work.
The main sources are: HDBK-217plus (2015) [40], Telcordia SR-332 (2016) [41], MIL-HDBK-338B (1998) [42], IEC TR 62380 (2004) [43], Italtel IRPH (2003) [44] and Siemens SN 29500-1 (2010) [45].
The first section of Table 5 gives an overview of the studied components. The ''Upper level taxonomy'' column includes the higher hierarchical levels; the ''Classification'' column shows the current taxonomy level; the ''Taxonomy'' column identifies the components, and the ''Function'' column explains the objective of the components. The table has a second section for the standard FMEA procedure including the ''Failure Mode'', ''Failure Cause,'' and a detailed explanation of the failure effects, as described in the previous section. Some useful parameters are included in the third section, such as the ''Turbine functionality'' and the ''Safety loss'' used to assess the ''Severity rate'' and the mode failure rate.

IV. RISK THRESHOLD EVALUATION
The components covered by the FMECA procedure usually differ widely in terms of risk value. The most important failure modes, characterized by a high RPN, should be separated from those characterized by a significantly lower RPN value. The selection of ''high priority'' failure modes is a very critical issue for the development of corrective action plans. The question is: ''How can such separation be achieved?'' Neither the international standard IEC 60812 (2018) [14], which defines and standardizes the FMECA procedure, nor the recent literature provides a method to evaluate the RPN threshold. Companies usually define this threshold using questionnaires that take into account the judgement of multiple experts in a qualitative manner. Only three quantitative approaches were found in literature, and they are explained below.

A. BLUVBAND METHOD
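Bluvband et al. [46] and Bluvband and Grabov [47] propose a graphical approach based on a Scree Plot of the RPN values, in which two straight lines, f_1(x) and f_2(x), are fitted to the lower and the upper parts of the plot. Note that the slopes of the two straight lines f_1(x) and f_2(x) are considerably different. In particular, the slope of the line that fits the uppermost part of the plot is almost seven times greater than that of the other line.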
The results of the proposed method are illustrated in the Scree Plot in Figure 5.
Analysis of the Scree Plot in Figure 5 makes it possible to define an RPN threshold value that represents the division between the negligible failure modes and the critical failure modes from the risk value point of view.
The threshold can be identified by evaluating the ordinate of the intersection between the two fit lines in Figure 5, and the result is approximately 100.
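The intersection of the two fit lines can be sketched in Python as follows. This is only an illustration under a strong assumption: the point at which the sorted RPNs are split into a lower and an upper part is chosen automatically as the split that minimizes the total squared error of the two least-squares lines. That criterion is not part of the method described above, and all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def two_line_threshold(rpn_values, min_points=3):
    """Ordinate of the intersection of two lines fitted to a scree plot."""
    ys = sorted(float(v) for v in rpn_values)      # RPNs in ascending order
    xs = [float(i) for i in range(1, len(ys) + 1)]  # rank of each failure mode
    if len(ys) < 2 * min_points:
        raise ValueError("not enough failure modes for two fits")
    best = None
    # Assumed split rule: try every split and keep the one with lowest error.
    for k in range(min_points, len(ys) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])
        right = fit_line(xs[k:], ys[k:])
        err = (squared_error(xs[:k], ys[:k], *left)
               + squared_error(xs[k:], ys[k:], *right))
        if best is None or err < best[0]:
            best = (err, left, right)
    _, (m1, q1), (m2, q2) = best
    if m1 == m2:
        raise ValueError("the two fit lines are parallel")
    x_cross = (q2 - q1) / (m1 - m2)
    return m1 * x_cross + q1


# Hypothetical data: 80 modes on a line of slope 1, then 20 on a line of slope 7.
data = [float(i) for i in range(1, 81)] + [80.0 + 7.0 * j for j in range(1, 21)]
print(round(two_line_threshold(data), 3))
```

Replacing the automatic split with a manually chosen one reproduces the dependence on the analyst's judgement that a purely graphical procedure carries.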

B. ZHAO METHOD
Zhao et al. [48] propose an alternative method to evaluate the RPN threshold value as follows:
• Create the Scree Plot, following the rules explained in Section IV.A.
• Fix the turning point of the linear growth trend of the RPN plot using the linear regression method: fit the RPN values to a straight line and obtain the turning point using the confidence interval.
• Determine the threshold value of the RPN from the turning point.
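One possible reading of these steps is sketched below in Python. The details are assumptions, since the steps above do not fix them: a single least-squares line is fitted to the sorted RPNs, the confidence bound is approximated by a band of constant width (the normal quantile times the residual standard deviation), and the turning point is taken as the first mode in the upper half of the ranking whose RPN lies above the upper bound. The function name and the data are hypothetical.

```python
from statistics import NormalDist


def regression_band_threshold(rpn_values, confidence=0.95):
    """RPN at the first upper-half point above the regression upper bound.

    Returns None when no point exceeds the bound.
    """
    ys = sorted(float(v) for v in rpn_values)      # scree plot ordinates
    n = len(ys)
    xs = [float(i) for i in range(1, n + 1)]       # rank of each failure mode
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    # Two-sided normal quantile (about 1.96 for a 95% level); assumed band.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    for x, y, r in zip(xs, ys, residuals):
        if x > n / 2 and r > z * sigma:
            return y
    return None


# Hypothetical data: 95 modes growing linearly, followed by 5 much higher RPNs.
data = [float(i) for i in range(1, 96)] + [200.0, 220.0, 240.0, 260.0, 280.0]
print(regression_band_threshold(data))
```

Because the band widens with the residual scatter, only the few points far above the overall trend exceed it, which tends to produce a small critical group.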
The results of the procedure applied to the E/E/PE components of the WT under test, considering a 95% confidence level, are illustrated in the Scree Plot in Figure 6.
The RPN threshold provided by the Zhao procedure using the linear regression method and with a 95% confidence level is approximately 140. The fitting curve is a 1st-degree polynomial.

C. PARETO METHOD
The use of the 80:20 Pareto principle is the most established approach in reliability analysis to rank failure modes according to their RPN value and to optimize corrective actions for critical components. The Pareto diagram is helpful to visualize the differences between the rankings for the failures and effects. The 80:20 principle can be explained as follows: 80% of the total Risk Priority Numbers calculated during the FMECA procedure comes from only 20% of the potential failure modes.
Pareto analysis starts with the prioritization of failure modes by ranking them in order, from the highest risk priority number to the lowest. The Pareto chart combines a bar graph with a cumulative line graph; the bars are placed from left to right in descending order, while the cumulative line distribution shows the percent contribution of all preceding failures. The combined chart uses the 80:20 rule to indicate where the engineering effort should be focused more [49]- [56].
The results of the analysis are illustrated in Figure 7. Each blue bar stands for the RPN assessment of the corresponding failure mode (y-scale on the left side of the chart), while the red curve represents the cumulative percentage distribution of the RPN (y-scale on the right side of the chart).
According to the 80:20 rule, the RPN threshold provided by the Pareto chart is approximately 48. Figure 7 shows the evaluation of the threshold using the Pareto method. The first step is the identification of the point at which the cumulative distribution of the Risk Priority Numbers reaches 80%; the RPN threshold is then the Risk Priority Number of the failure mode located at that point.
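The threshold evaluation just described can be sketched in a few lines of Python. The function name and the five RPN values are hypothetical; the sketch assumes that the threshold is the RPN of the first failure mode, in descending order, at which the cumulative share reaches the target.

```python
def pareto_threshold(rpn_values, share=0.80):
    """Return (threshold, number of modes with RPN >= threshold)."""
    ranked = sorted(rpn_values, reverse=True)   # highest RPN first
    total = sum(ranked)
    running = 0
    for value in ranked:
        running += value                        # cumulative RPN so far
        if running >= share * total:
            n_critical = sum(1 for v in ranked if v >= value)
            return value, n_critical
    raise ValueError("share must not exceed 1")


# Hypothetical RPN set: the cumulative shares are 40%, 70%, 90%, ...
threshold, n_critical = pareto_threshold([40, 30, 20, 5, 5])
print(threshold, n_critical)
```

Comparing `n_critical` with the total number of modes shows directly whether the 20% side of the 80:20 rule holds for a given RPN set.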

V. A NEW APPROACH FOR RPN THRESHOLD EVALUATION
The three procedures analyzed above give quite different results. The Zhao technique suggests considering only four failure modes inside the group of the most critical failure modes (threshold equal to 140), whereas the Bluvband approach recommends considering 11 failure modes inside this group (threshold equal to 100), and the Pareto chart indicates that 55 failure modes are critical (threshold equal to 48).
A detailed analysis of the obtained results makes it clear that all the previous techniques have some critical drawbacks. For instance, according to the 80:20 rule of the Pareto method, 80% of the criticality should arise from 20% of the causes. The study's results suggest that this principle does not fit this kind of application very well. As a matter of fact, 80% of the cumulative RPN of the E/E/PE components in the wind turbine comes from 55% of the failure modes. The Pareto chart cannot be considered a powerful technique to identify the RPN threshold of a system; rather, the principle used to select the numerical value of the threshold should be reviewed and specifically defined for each kind of application. In this case, it is not reasonable to select a threshold of 48, which indicates that more than half of the failure modes are critical.
Quite the opposite, the Zhao method suggests that only four failure modes are critical for the system under test. More generally, this technique provides untrustworthy results for many applications because of the manner in which the threshold is evaluated. In fact, using this procedure, very few risk priority numbers exceed the 95% confidence bound and fall in the critical modes group.
The Bluvband method provides interesting results: both the threshold value and the number of modes considered critical are reasonable. However, the procedure for the threshold evaluation is vague and extremely subjective. According to the authors, the calculated RPNs form a right-skewed distribution, with a first tail on the left and a second tail on the right with very different slopes, but no information is given about how to divide the distribution into two sections. As a consequence, the identification of the threshold depends on the judgment of the designer who carries out the procedure. Therefore, a new approach has been introduced to overcome the limits of the previous methodologies. The proposed procedure consists of the following steps:
1) Calculation of the Risk Priority Numbers according to the guidelines provided in Section II;
2) Identification of the main statistical parameters of the RPN set (25th percentile, mean value, median value, 75th percentile, outliers, minimum and maximum value);
3) Generation of the boxplot of all the assessed RPNs;
4) Classification as negligible modes of all the failure modes with RPNs below the median value;
5) Classification as critical modes of all the failure modes with RPNs above the 75th percentile;
6) Classification of the interval between the median value and the 75th percentile as the ALARP (''as low as reasonably practicable'') region.
As the acronym suggests, the ALARP region refers to reducing risk to a level that is as low as reasonably practicable. In practice, this means that the operator has to show through reasoned and supported arguments that there are no other practicable options that could reasonably be adopted to reduce risks further [57].
If a failure mode is characterized by an RPN value that falls inside the ALARP zone, then designers have to analyze possible countermeasures to reduce the risk bearing in mind the benefits resulting from its acceptance and taking into account the costs of any further reduction. Then designers could choose to apply countermeasures or not based on the previous consideration. The upper and lower limits of the ALARP region must be considered as low as reasonably practicable too.
Instead, if the RPN is above the 75 th percentile then the risk is regarded as intolerable and cannot be justified in any ordinary circumstance, so corrective actions must be implemented.
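The classification into negligible, ALARP and critical modes can be sketched with the Python standard library. Two points are assumptions: the 75th percentile is computed with the inclusive linear-interpolation convention, since the percentile convention is not specified above, and RPNs exactly equal to the median or to the 75th percentile are assigned to the ALARP region. The function name and the sample values are hypothetical.

```python
from statistics import median, quantiles


def classify_rpns(rpn_values):
    """Return (median, 75th percentile, list of labels, one per RPN)."""
    values = [float(v) for v in rpn_values]
    med = median(values)
    # Quartiles with linear interpolation between data points (assumed).
    q3 = quantiles(values, n=4, method="inclusive")[2]
    labels = []
    for v in values:
        if v > q3:
            labels.append("critical")      # above the 75th percentile
        elif v < med:
            labels.append("negligible")    # below the median
        else:
            labels.append("ALARP")         # between median and 75th percentile
    return med, q3, labels


# Hypothetical RPN set of nine failure modes.
med, q3, labels = classify_rpns([10, 20, 30, 40, 50, 60, 70, 80, 90])
print(med, q3, labels.count("negligible"), labels.count("ALARP"),
      labels.count("critical"))
```

Both thresholds follow from the data alone, so two analysts applying the sketch to the same RPN set obtain the same three groups.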
The proposed approach was applied to the case study described in the previous sections, and the results of the statistical analysis are the following:
• 75th Percentile: 87
• Outliers: none (considering as outliers all the RPNs more than three standard deviations away from the median).
Figure 8 shows the boxplot of the RPNs for the WT under test, highlighting the areas of interest with different colors. The green zone (below the median) stands for the negligible failures, the yellow region represents the ALARP region and the red region (above the 75th percentile) indicates the critical failure modes.
In particular, the proposed method suggests 25 failure modes inside the critical group (RPN higher than 87), 27 failure modes inside the ALARP region and 48 negligible modes (RPN lower than 54). Table 6 compares the results obtained with the proposed approach and the other methods (100 failure modes were identified in the subsystems under test).
The threshold identified by the proposed approach for the critical modes falls between those of the Bluvband and Pareto methods, as does the number of critical modes. Considering only the red zone of Figure 8, the Boxplot method is more conservative than the one proposed by Bluvband. Designers must always choose the best solution in terms of cost and risk level. It is generally advisable to select the worst-case scenario, that is, the procedure providing the lowest RPN threshold, which places a larger number of failure modes in the critical area. In this application, the worst-case scenario is the 80:20 rule applied in the Pareto chart, but its results are not reasonable in terms of the cost of the corrective actions. Indeed, it is not feasible to apply countermeasures to 55% of the failure modes. Therefore, the optimal trade-off between cost and threshold level is provided by the proposed method. Moreover, the new technique also introduces an ALARP zone where each mode could be considered critical or negligible, depending on the scenario.

VI. CONCLUSION
This paper focuses on risk assessment of a 2MW onshore wind turbine using a new procedure based on Failure Mode, Effects, and Criticality Analysis.
The proposed procedure starts with a functional failure analysis that is mandatory during the initial phase of the system design to identify every possible failure mode, failure cause, and failure effect related to the component tested.
The RPNs of all the analyzed failure modes are reported in Figure 4 in ascending order, highlighting how often each RPN value is repeated.
To separate the failure modes into critical and negligible failures, the paper compares three different RPN prioritization procedures: the 80:20 rule applied in the Pareto chart and two graphical procedures proposed respectively by Bluvband and Zhao. The Bluvband method includes 11 failure modes inside the group of the most critical failure modes, but the procedure is vague and extremely subjective. The Zhao method is too optimistic because it provides only four critical modes. The Pareto chart is just the opposite; it is too conservative and considers more than 50% of failure modes as critical. This is mainly linked to the way the Pareto method is defined and evaluated. In theory, the 80:20 rule suggests that 80% of the criticality should arise from 20% of the causes; therefore, taking 80% as the threshold value, 20% of the modes should be critical. The case study presented in this paper shows that this is not the case. With this kind of dataset, the 80:20 relationship does not hold, and the number of critical modes is much higher than 20%, leading to inaccurate and overly conservative results.
Therefore, this paper introduces a new approach based on a statistical analysis and a boxplot to separate negligible and critical modes. The proposed methodology represents the optimal trade-off between cost and threshold level, and it has several advantages:
• It is an easy, practical and repeatable solution;
• Unlike other methods, it takes into account the ALARP region;
• It is based on statistical analysis;
• It suffers no subjectivity in threshold definition.