Ensuring a Reliable Operation of Two-Level IGBT-Based Power Converters: A Review of Monitoring and Fault-Tolerant Approaches

Despite the emergence of multi-phase and multi-level converters, two-level insulated gate bipolar transistor (IGBT)-based power converters are still widely used in industrial applications. Thus, their reliability is of significant importance for ensuring industrial safety. In this paper, a review of the possibilities to ensure the reliable operation of these power converters is presented. The possible approaches are categorized into two groups: condition monitoring and fault-tolerant control. The former approach monitors the power converters so that device degradation can be identified. Accordingly, the mechanisms, indicators, and measuring methods of degradation are presented in this paper to assist the design of condition-based maintenance. In contrast, the latter approach is a post-fault one, where power converters remain in operation by activating the fault-tolerant units after faults are identified. For this approach, fault detection, fault isolation, and tolerant strategies are essential. Finally, the performance and cost-effectiveness of the two categories are discussed in this paper.


I. INTRODUCTION
Insulated gate bipolar transistor (IGBT)-based two-level converters are still widely adopted in today's industrial applications, e.g., wind turbine power systems [1], [2], photovoltaic (PV) systems [3], [4], and electrified vehicles [5]-[9]. In those applications, the power converters may face harsh conditions, which challenge their reliability and make them the main lifetime-limiting component. For instance, according to the study in [10], power converters account for 34% of all the failures in electrified railway traction systems, as shown in Fig. 1. A similar conclusion has been revealed in [11] (see Fig. 2) that PV inverters contribute 37% of overall unscheduled maintenance events. The power converter failures lead to 59% of the total unplanned maintenance expenditure. Furthermore, based on an industry-based survey in [12], the most fragile components in power converters are semiconductor power devices, contributing to around 31% of all the converter failures.
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Wang.
There are two types of failures in IGBT power devices: 1) wear-out failures and 2) catastrophic failures [13]. The first category of failure is package-related, where the failure is a consequence of accumulated damage due to the temperature, vibration, and humidity stresses on the devices [12]. Among those stressors, the temperature, more specifically the junction temperature, is considered the most critical failure-inducer in IGBT power devices, especially for the wear-out failure [14], [15]. This can be further demonstrated using the cross-section view of an IGBT device, as shown in Fig. 3, where the wear-out failure images and the coefficient of thermal expansion (CTE) of the materials of a wire-bonded IGBT can be observed [15]-[20]. Due to the CTE mismatch between the adjacent layers, bond wires, solder joints, and metallization layers, thermo-mechanical stresses in the case of temperature swings can accelerate the degradation (i.e., wear-out), leading to various wear-out failures like the bond wire lift-off, bond wire cracking, and metallization reconstruction. In contrast, the IGBT catastrophic failures are not accumulated failures but are triggered by a single overheat, overvoltage, or overcurrent event. They are categorized as open-circuit faults and short-circuit faults. Open-circuit faults may be caused by the failure of gate drivers or bond wires, while short-circuit faults may occur due to unclamped inductive switching, high-temperature latchup, second breakdown, or energy shocks [13]. Therefore, ensuring reliable operation of power electronic converters is important to avoid unexpected downtime of the entire system, and hence to lower the maintenance efforts.
Both failure types need to be addressed. In the literature, there are two ways to cope with the above IGBT failures and thereby improve the reliability of the entire power converter. The first approach is based on reliability assessment schemes, which provide information about the health status of the power devices. By evaluating the degradation level, condition-based maintenance can be performed before catastrophic failures take place, which could reduce the maintenance costs and potential losses. In contrast, the other approach attempts to improve reliability by increasing the system operation redundancy for the two-level converters. That is to say, it is a post-fault approach. When faults are identified, the power electronic converters can still operate by activating the redundant unit (e.g., reconfiguring the power converter system). Clearly, the fault-tolerant strategies inevitably require more power semiconductor devices as well as more sophisticated control schemes (i.e., fault identification, isolation, and reconfiguration). Consequently, costs rise. Nevertheless, the selection of a cost-effective approach to guarantee the reliable operation of power electronic converters is highly application-dependent.
In light of the above, this paper reviews the possibilities to enhance power converter reliability. The performance and cost of the methods are benchmarked considering different application conditions, where the selection criteria are also presented. The rest of this paper is organized as follows. An overview of the monitoring methods is given in Section II, including degradation mechanisms and indicator extraction methods. In Section III, fault detection, isolation, and tolerant strategies are discussed. A comparison of these approaches under different working conditions is presented in Section IV. Finally, concluding remarks and discussions are provided in Section V.

II. MONITORING OF DEGRADATIONS
To achieve highly reliable operation of power converters, one option is to evaluate the health state and then perform pre-fault schemes before the converter system fails. This strategy requires an understanding of how the converter will degrade and how degradation can be identified, which are discussed in parts A and B, respectively.
A. WEAR-OUT MONITORING
IGBT power devices age over time, and some parameters drift away from their initial values accordingly. Thus, the parametric changes can serve as indicators of wear-out failures. If the monitored parameters exceed the thresholds, an early warning to replace the aged device will be provided. Depending on the failure mechanisms, different monitoring techniques should be applied. Thus, the following first reviews the failure mechanisms and then the corresponding indicators.

1) BOND WIRE FATIGUE MECHANISM
Power chips are wire-bonded during packaging. The wire bonds suffer from severe temperature swings due to switching. As shown in Fig. 3, there is a significant CTE mismatch between the bond wires (aluminium-based) and chips (silicon-based), and thus shear stress is introduced at the interface. This further imposes repeated flexures on the aluminium wires, i.e., fatigue [21]. The bond wire fatigue, leading to the bond wire lift-off and/or bond wire cracking, is one of the dominant failures in IGBT modules. The bond wire lift-off initiates from a fracture at the tail of the bond and propagates to the center. Eventually, the bond wire lifts off and loses electrical contact with the IGBT chip because of the spring effect of the aluminium wire loop [20]. For the bond wire cracking, according to [22], the heel of the bond wire suffers from larger stresses than other parts, and it becomes the most critical part in terms of cracking.
Both of the above failures contribute to an increase in the total resistance of the bond wires. Accordingly, the on-state collector-emitter voltage V ce,on will increase [15], [18], [23]-[28]. Hence, the voltage V ce,on can be an observable variable for bond wire fatigue. For instance, increases of 5% [15], [18], 15% [25], and 20% [29] in the voltage V ce,on from the initial value are considered as the thresholds. In [30], the threshold was set as 1500/I Rated with I Rated being the rated current of the power device. Although the increase of the resistance leads to an on-state voltage increase, the change is sensitive to various factors. First, the voltage V ce,on is a temperature-sensitive electrical parameter (TSEP) [31], and it changes with the junction temperature (T j ) during operation. For example, in [25], a sudden drop followed by a sudden increase in the voltage V ce,on was reported. This is because, when solder fatigue occurs and the IGBT operates in the negative temperature coefficient region, the on-state voltage V ce,on decreases dramatically. Then, if bond wire fatigue happens, the on-state voltage V ce,on increases. In order to eliminate the thermal effect, compensation should be applied. In [26], [32], the relationship between the on-state voltage V ce,on and the junction temperature T j is determined through power cycling tests. Subsequently, the thermal effect can be compensated by subtracting the temperature-induced voltage changes. Additionally, variations of V ce,on are relatively small compared to the original voltage, which requires high-resolution measurements of the voltage V ce,on . This may increase the cost of the hardware system. Meanwhile, small variations may be hidden by load current or temperature changes under harsh operating conditions [33].
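The thresholding and temperature compensation described above can be sketched as follows. This is a minimal illustration and not the procedure of any cited work: the function names, the linear temperature coefficient `k_temp`, and the sample numbers are assumptions, and only the 5%, 15%, and 20% threshold levels come from the text.

```python
def compensate_vce_on(vce_on_meas, t_j, t_ref, k_temp):
    """Remove the temperature-induced part of V_ce,on.

    Assumes a linear dependence around t_ref with slope k_temp (V/degC),
    identified beforehand, e.g., from power cycling tests.
    """
    return vce_on_meas - k_temp * (t_j - t_ref)


def bond_wire_alarm(vce_on_meas, vce_on_init, t_j, t_ref, k_temp, threshold=0.05):
    """Return (relative_increase, alarm) for a compensated V_ce,on reading.

    threshold is the relative increase that triggers the warning,
    e.g., 0.05, 0.15, or 0.20 as reported in the literature.
    """
    vce_comp = compensate_vce_on(vce_on_meas, t_j, t_ref, k_temp)
    rel_increase = (vce_comp - vce_on_init) / vce_on_init
    return rel_increase, rel_increase >= threshold


# Hypothetical numbers: 2.00 V initial value at 25 degC, +2 mV/degC slope.
rel, alarm = bond_wire_alarm(
    vce_on_meas=2.25, vce_on_init=2.00, t_j=75.0, t_ref=25.0, k_temp=0.002
)
print(f"relative increase = {rel:.3f}, alarm = {alarm}")
```

In this sketch the raw reading would suggest a 12.5% increase, whereas the compensated value is lower, which illustrates why the temperature-induced part has to be removed before a threshold is applied.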
The bond wire crack and lift-off will affect the parasitic parameters. In turn, the changes of the parasitic parameters will inevitably affect the IGBT gate characteristics [34]. Therefore, the change of gate characteristics can be another indicator of the bond wire fatigue [16], [35]-[37]. More specifically, the bond wire lift-off will decrease the gate-emitter capacitance and increase the parasitic inductance. Hence, the voltage drop induced between the power emitter and auxiliary emitter V EE' increases and the Miller-plateau duration t gp declines [38], [39]. Meanwhile, the gate-emitter voltage V ge and the collector-emitter voltage V ce will rise faster during turn-on and turn-off, respectively, compared to the case without bond wire fatigue [35]. In addition, the metallization reconstruction and solder fatigue take place simultaneously with the bond wire degradation. Fortunately, the change of gate characteristics caused by the bond wire fatigue is not significant [37]. However, the results are entirely different in [16] in terms of the gate-emitter voltage V ge . There are no significant changes in the voltage V ge until all the bond wires are lifted off, which is in contrast with the results in [35]. Consequently, more attempts should be made to explore how different degradations contribute to the changes in the gate (switching) characteristics, and how the characteristics correlate to early wear-out failures.

2) SOLDER FATIGUE MECHANISM
The solder fatigue is another dominant wear-out failure in IGBT modules. It is typically induced by the temperature swing and CTE mismatch. There are two solder layers in one IGBT device, as shown in Fig. 3: one between the chip and the substrate and one between the substrate and the baseplate. Due to the thermo-mechanical stress, delamination, cracks, and voids can occur in solder joints. Typically, cracks start from the edge of the joints, shrink the thermal dissipation path, and increase the thermal resistance (R th ), which eventually leads to a higher junction temperature T j . In the positive temperature coefficient area, the increased junction temperature results in more power losses and an even higher junction temperature. Hence, the positive feedback accelerates the failure of the IGBT device. To prevent this, the junction-to-case thermal resistance R thjc (or junction-to-case thermal impedance) [40]-[42] and the junction temperature T j [43], [44] are commonly used as indicators to monitor the solder fatigue in IGBTs. Furthermore, considering the non-uniform distribution of the thermal resistance R th on the case due to the solder fatigue, the ratio of the junction-to-case-center thermal resistance to the junction-to-case-edge thermal resistance can also be utilized as an indicator [45]. Besides, the chip solder degradation leads to the rise of the gate-collector capacitance and trans-conductance, decrease of the gate-emitter, which eventually cause a decline of the voltage change rate dv ce /dt and an increase of the current change rate dI c /dt during turn-on [46], [47]. Additionally, because the junction temperature T j influences the turn-off transient, low-order harmonics will be affected when the solder fatigue occurs. Therefore, the 5th-order harmonic can also be used to monitor the solder fatigue [48].
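As a rough illustration of the thermal-resistance indicators above, the sketch below computes R thjc from the usual steady-state definition (temperature difference divided by dissipated power), its relative drift, and the center-to-edge ratio. The function names and the sample values are assumptions made for illustration; no alarm threshold is implied, since the text does not state one.

```python
def thermal_resistance_jc(t_j, t_case, p_loss):
    """Steady-state junction-to-case thermal resistance in K/W."""
    if p_loss <= 0:
        raise ValueError("p_loss must be positive")
    return (t_j - t_case) / p_loss


def relative_increase(r_th_now, r_th_init):
    """Relative drift of R_thjc with respect to its initial value."""
    return (r_th_now - r_th_init) / r_th_init


def center_edge_ratio(t_j, t_case_center, t_case_edge, p_loss):
    """Ratio of junction-to-case-center to junction-to-case-edge resistance."""
    r_center = thermal_resistance_jc(t_j, t_case_center, p_loss)
    r_edge = thermal_resistance_jc(t_j, t_case_edge, p_loss)
    return r_center / r_edge


# Hypothetical operating point: 200 W of losses, T_j = 100 degC.
r_now = thermal_resistance_jc(t_j=100.0, t_case=70.0, p_loss=200.0)
print(r_now, relative_increase(r_now, r_th_init=0.125))
print(center_edge_ratio(100.0, 70.0, 60.0, 200.0))
```

Because cracks typically start at the edge of the solder joint, the ratio is expected to change as the edge path degrades, which is the motivation for using it as an additional indicator.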

3) METALLIZATION RECONSTRUCTION MECHANISM
The aluminium metallization layer is deposited on the chip, providing electrical connection between the power dies and the emitter. Meanwhile, it maintains off-state parasitic components, which are inherent to the structures of power components [49]. The CTE mismatch between the chip (Si) and the aluminium layer (Al) is the reason for the metallization reconstruction, as shown in Fig. 3. Subsequently, the large thermo-mechanical stress may exceed the elastic limit of the thin aluminium film, which may cause plastic deformation at the grain boundaries, leading to the extrusion of aluminium grains or cavitation effects at the grain boundaries [50]. In this case, the active cross-section of the metallization is reduced and the sheet resistance R sheet increases linearly [50]-[55]. Thus, the collector-emitter voltage V ce,on will increase linearly, which can indicate the metallization reconstruction [45], [56].

4) GATE OXIDE DEGRADATION MECHANISM
The gate oxide degrades along with the above degradations. This is because the high temperature (e.g., a high electric field) causes time-dependent dielectric breakdown or a high current causes hot electrons [57]. Due to the accumulated charges in the gate oxide, the capacitance-voltage characteristics will shift along with the gate voltage, and consequently the threshold voltage (V th ) will increase [24]. In addition, the electron injection may degrade the quality of the oxide and lead to an increase in the gate leakage current (I ges ) [37], [57]. Further, the accumulated charges and the gate oxide degradation increase the gate oxide capacitance and the Miller capacitance, which extends t gp [39], [58].
The major degradations in IGBT modules are discussed above and summarized in TABLE 1, together with their mechanisms and the observable variables that indicate each degradation. The prior-art methods using those indicators to identify degradations are also given in TABLE 1. The main problem of using the indicators to monitor the health state of the IGBT is that the indicators are influenced by more than one failure mechanism. For example, t gp is affected by both the bond wire lift-off and the gate oxide degradation, while their effects on t gp are opposite. Furthermore, most of the indicators are temperature-dependent, i.e., they cannot reflect the correct health state unless they are measured at the same temperature. Thus, new indicators that are dedicated to one failure mechanism, or indicator combinations that can effectively monitor the health state of the IGBT, are expected in the future.

B. INDICATOR MEASUREMENT
The condition of the IGBTs can be monitored through the indicators. In this part, the methods used in the literature to do so are reviewed and compared.
FIGURE 4. Relay-based Vce,on measurement circuits [28], [56].

1) MONITORING THE COLLECTOR-EMITTER ON-STATE VOLTAGE V ce,on
The on-state voltage V ce,on can indicate the condition of the IGBT devices in terms of bond wire fatigue and metallization reconstruction. However, V ce,on does not vary significantly, and thus high-resolution measurements are required. In addition, the high voltage from the DC-link should be isolated and blocked to protect the measurement circuit when the IGBT is off. Relays [28], [56], MOSFETs [15], [18], [33], diodes [28], [59], [60], and multiplexers [25] are common blocking devices, which can be seen in TABLE 2.

a: RELAY-BASED METHODS
All the relay-based extraction methods for V ce,on are offline schemes, since the response time of relays is much longer than that of IGBT devices. In [56] (see Fig. 4(a)), the analog-to-digital converter (ADC) is directly connected to the IGBT through a reed relay. When the power converter is operating, the relay is open to block the high voltage from the power electronic converter. Only when the converter stops, or is forced to stop for the measurement routine, is a control signal generated to close the relay; the ADC is then connected to the IGBT device and the measurement starts. Meanwhile, a simple current injection switching sequence is applied to turn on each IGBT. Thus, the on-state voltage V ce,on of each IGBT can be measured with high noise immunity and high resolution (2-3 mV), as reported. The method in [28] (Fig. 4(b)) is similar to that in [56]. The difference is that it uses an amplifier to extract V ce,on . In summary, the relay-based measurement is a relatively simple but offline approach.

b: MOSFET-BASED METHODS
Since MOSFETs switch at a higher switching frequency than IGBTs, MOSFETs are also used to block the voltage as an online or quasi-online method to measure the voltage V ce,on . In [49] (Fig. 5(c)), when the IGBT is turned on, V ce decreases, and the falling edge is detected by the falling edge detector. Subsequently, the monostable multivibrator triggers the MOSFET to measure V ce,on . In [18], an online method to measure the voltage under high and low currents is proposed, as shown in Fig. 5(a). In this method, taking the upper switch as an example, for the measurement of V ce,on under a high current, AS 2 is turned off, and the following are obtained (1)
where V RC1 is the voltage across R C1 , V ASC1 is the drain-source voltage of AS C1 , V C1 indicates the voltage of the current source, and V SG1 represents the source-gate voltage of AS C1 . According to (2), when the power converter is operating and IGBT1 is on, AS C1 will turn on because V ce,on is small and V SG1 is larger than the threshold voltage of AS C1 (V SGTH1 ). Then, it can be obtained from (1) that V ce,on is almost equal to V RC1 , as the on-state resistance of AS C1 is relatively small compared to R C1 . When IGBT1 is off, the collector-emitter voltage V ce increases, V SG1 becomes smaller than V SGTH1 , and AS C1 is turned off. Consequently, the off-state high voltage is blocked. To measure the voltage V ce,on under a low current, AS 2 is turned on at the current zero-crossing, and R TJ2 is set to 0.1×V dc to inject a current of 100 mA into IGBT1. Then, V ce,on can be measured as mentioned above. This needs a short period of zero load current, which may affect the normal operation. Finally, V RC1 (V ce,on ) is generated on the resistor R LA1 by the amplifier, which gives a current i m1 equal to V RC1 /R LA1 , passing to the ADC through AS LA1 . This current can be a few tens of milliamperes. Thus, a high-immunity transmission is guaranteed and galvanic isolation is avoided. In [15], a simple method to measure V ce,on is proposed, using a depletion-mode small-signal MOSFET with auxiliary components (see Fig. 5(b)). Notably, when the voltage V CC is larger than the collector-emitter voltage V ce,on , the input impedance of the measurement circuit becomes higher. When the upper switch T UH is turned on, V C = V ce,on < V CC . In addition to the high impedance, I D cannot flow through the MOSFET, and V GS = 0. Thus, the MOSFET is turned on and V out = V ce,on . When the upper switch T UH is turned off, V ce increases and finally exceeds V CC . Then, I D flows through the MOSFET, and a voltage drop across R is produced, which makes V GS smaller than the threshold voltage. Therefore, the MOSFET is turned off, the high voltage is blocked, and V out is clamped to V CC . However, because of the low voltage rating of the MOSFET, more MOSFETs will be required to block the DC-link voltage in high-voltage applications.
FIGURE 6. Diode-based Vce,on measurement circuits [28], [59], [60].

c: DIODE-BASED METHODS
Compared to the relay-based and MOSFET-based methods, measuring the voltage V ce,on with diodes is a cheaper solution, especially for higher voltage applications. Fig. 6(a) shows a Zener diode-based circuit that measures the collector-emitter voltage [28]. The off-state voltage is clamped by the Zener diode, and another diode with low stray capacitance is connected in series to reduce the stray capacitance effect. When the IGBT is turned on, V ce,on can be measured by the amplifier. The resistors are used to limit currents and minimize the common-mode error. Nevertheless, when the IGBT is on, the clamped voltage on the resistance may induce variations, which may affect the operation of the power converter. In addition, limited by the rating of the Zener diode, the circuit in Fig. 6(a) is only effective and safe under 600-V off-state voltages.
In [61], a measurement circuit for higher voltages is thus proposed (Fig. 6(b)), where D 1 and D 2 are forward-biased by the current source when the IGBT is turned off. Thus, the off-state voltage is blocked. When the IGBT is turned on, V ce,on can be extracted. This is realized with two amplifiers by setting R 5 = R 6 and R 2 /R 1 = R 4 /R 3 . In this case, the difference between the two diodes may lead to measurement errors.

d: MULTIPLEXER-BASED METHODS
In [25], a two-to-one multiplexer circuit is used to filter out the off-state data. An amplifier is adopted to scale down the on-state voltage. When the gate signal is active, the scaled voltage V ce,on will be selected as the output. Otherwise, zero will be the output. Limited by the scaling circuit, this extractor cannot work under high voltage. The detailed circuit is given in Fig. 7.
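The selection logic of such a two-to-one multiplexer can be mimicked in software as shown below. This is only a behavioural sketch: the function names, the `scale` factor standing in for the scaling amplifier, and the sample values are assumptions, and the code does not model the analog circuit of Fig. 7.

```python
def mux_select(vce_samples, gate_signals, scale=1.0):
    """Output the scaled V_ce sample when the gate is active, otherwise zero."""
    if len(vce_samples) != len(gate_signals):
        raise ValueError("vce_samples and gate_signals must have equal length")
    return [scale * v if g else 0.0 for v, g in zip(vce_samples, gate_signals)]


def mean_on_state(selected, gate_signals):
    """Average of the samples taken while the gate signal was active."""
    on_values = [v for v, g in zip(selected, gate_signals) if g]
    return sum(on_values) / len(on_values) if on_values else None


# Hypothetical samples: 400 V while blocking, about 1.8-1.9 V while conducting.
vce = [400.0, 1.8, 1.9, 400.0]
gate = [False, True, True, False]
out = mux_select(vce, gate, scale=0.5)
print(out, mean_on_state(out, gate))
```

The off-state samples are replaced by zero rather than clamped, so the downstream ADC never sees the blocking voltage in this idealized model.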
The methods for measuring the on-state voltage of the IGBT have been discussed above and are summarized in TABLE 2. It can serve as a reference when selecting appropriate and cost-effective methods to measure or monitor the collector-emitter voltage, through which the fatigue assessment in IGBT modules can be enabled. The relay-based methods are preferred for offline conditions, as they can fulfil the task at the lowest cost. However, the diode-based methods outperform the other methods if both the cost and the online performance are taken into consideration.
Note that V ce,on depends on the temperature and current, which requires the measurement to be taken at the same working point and the same temperature. To tackle this problem, [62] decomposes the voltage into three parts and gives the temperature- and current-dependent formula of each part. In this way, the degradation-induced component can be estimated even if the temperature or current varies. It should be noted that the accuracy will be limited by the precision of the formulas. Besides, the so-called inflexion point, where V ce,on remains almost constant at different temperatures, could be another solution to avoid the influence of the temperature [63].

2) MONITORING THE JUNCTION TEMPERATURE T j
The junction temperature T j can be measured directly by a thermocouple sensor or an infrared camera, or indirectly through thermo-sensitive electrical parameters (TSEPs). In practice, however, it is impossible to perform the junction temperature measurement directly without modifying the package or housing (e.g., opening the module and removing the dielectric gel). As a result, direct measurement is of limited use for temperature monitoring in practice. Although NTC thermistors are provided in most modules, the calibration between T j and the thermistor is lacking. TSEPs, dependent on the junction temperature T j , are therefore preferred to estimate the junction temperature, since they can be measured directly.

a: TEMPERATURE ESTIMATION BY V ce,Ihigh
The relationship between V ce,Ihigh and T j is not exactly linear. With an acceptable range of errors, the relationship can be approximated as (3) [65], [66], where V ce,Ihigh is the measured collector-emitter voltage under high currents, T jo denotes the base junction temperature, V ceo is the on-state collector-emitter voltage at T jo , T r indicates the series resistance temperature, T ro represents the series resistance base temperature, r o is the series resistance at T jo , k Vceo and k ro indicate the temperature coefficients of V ce and r o , respectively, V ge is the gate-emitter voltage variation, k ge denotes the coefficient of V ge , and I c means the collector current.
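To make the roles of the listed parameters concrete, the sketch below implements one plausible linearized model built from them. The exact expression is an assumption, since only the symbol definitions are available here: the model form, the dictionary keys, and all numerical values are hypothetical. The inversion further assumes that the series resistance is at the junction temperature (T r = T j , T ro = T jo ) and that the gate-emitter voltage is constant.

```python
def vce_ihigh(t_j, t_r, i_c, p, dv_ge=0.0):
    """Assumed linearized on-state voltage under high current.

    Chip term + resistive packaging term + gate-voltage term; the exact
    published expression may differ from this illustrative form.
    """
    return (
        p["v_ceo"]
        + p["k_vceo"] * (t_j - p["t_jo"])
        + i_c * (p["r_o"] + p["k_ro"] * (t_r - p["t_ro"]))
        + p["k_ge"] * dv_ge
    )


def estimate_tj_homogeneous(v_meas, i_c, p):
    """Invert the assumed model with T_r = T_j, T_ro = T_jo and dv_ge = 0."""
    denom = p["k_vceo"] + p["k_ro"] * i_c
    if abs(denom) < 1e-12:
        raise ValueError("no temperature sensitivity at this current")
    return p["t_jo"] + (v_meas - p["v_ceo"] - i_c * p["r_o"]) / denom


# Hypothetical parameters (volts, ohms, degC); not taken from any datasheet.
params = {"v_ceo": 1.0, "t_jo": 25.0, "t_ro": 25.0, "r_o": 1e-3,
          "k_vceo": -1e-3, "k_ro": 5e-6, "k_ge": 0.0}
v = vce_ihigh(t_j=100.0, t_r=100.0, i_c=1000.0, p=params)
print(v, estimate_tj_homogeneous(v, 1000.0, params))
```

With these numbers the combined sensitivity is k Vceo + k ro·I c = 4 mV/°C at 1000 A, and it vanishes at 200 A, which shows why the usable sensitivity of this TSEP depends strongly on the collector current.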
Practically, it is difficult to measure V ce from the module terminals because the chip is packaged inside the module. The measured voltage V ce,Ihigh consists of V ce and the voltage across the series packaging resistance. In addition, V ce,Ihigh is affected by T j , T r , and V ge . However, V ge can be treated as a constant in the normal operation mode. Therefore, T j can be estimated as (4). In the literature, it is considered that T r is equivalent to T j , assuming that the temperature distribution is homogeneous when the coefficients k ro and k Vceo are identified by experiments. T j_est is then obtained as (5). However, it should be pointed out that T r and T j are not equal due to the non-uniform temperature distribution. In order to improve accuracy, T r should be estimated as a prerequisite. According to [66], T r is given by (6), where T H is the heatsink temperature and α is the scaling factor that is obtained by experiments. With (6), the temperature T j can be estimated accurately as (7), in which V ce,B is the voltage V ce at the base temperature. It can be found in (7) that the estimated temperature T j using V ce,Ihigh is sensitive to the coefficients k Vceo , k ro , and the collector current I c . The sensitivity varies from 1 mV/°C (300 A) to 5 mV/°C (1000 A) [67]. In addition, the online junction temperature estimation requires measuring V ce,Ihigh , I c , and some predefined parameters.

b: TEMPERATURE ESTIMATION BY V ce,Ilow
Unlike the positive temperature coefficient for the high current, the temperature coefficient for the low current is negative. Compared to the method using V ce,Ihigh , utilizing V ce,Ilow neglects the series packaging resistance because its voltage drop is relatively low. At the same time, the self-heating effect can be eliminated. Thus, it has higher accuracy and stronger linearity than the previous method (i.e., using V ce,Ihigh ). In this case, the relation can be expressed as (8), where k Vceo is the temperature coefficient for the low current. Fig. 9 depicts the voltage V ce,Ilow of an IGBT as a function of the junction temperature T j under various low currents. It can be observed that the voltage change rate with respect to the junction temperature is within the range of −0.19 mV/°C to −0.28 mV/°C, where the current varies from 0.5 mA to 1 A. Although the injection of lower currents leads to a higher change rate (absolute value), the relation becomes non-linear at high temperatures, as shown in Fig. 9. In most applications, the load current is normally much higher than the currents in Fig. 9. In this case, the power converter should be stopped first to allow the low current injection to estimate the junction temperature. This may not be easy to implement.
FIGURE 10. Temperature estimation using the resistance R G,int : (a) R G,int measurement circuit [72] and (b) gate driver RLC network [71].
Nonetheless, online current injection and voltage drop measurement strategies have been proposed in [68], [70]. However, the injection and measurement window may decrease system performance. It should be noted that the low current should be injected to measure the voltage V ce,Ilow immediately after the load current is suspended and the transient has passed. By doing so, the maximum error of the estimated junction temperature T j caused by the cooling system can be reduced. The advantage of this method is that the self-heating effects can be avoided and the voltage drop caused by packaging degradation is negligible.
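A low-current calibration of this kind can be sketched as a straight-line fit followed by inversion. The example below uses an ordinary least-squares fit on synthetic calibration points; the function names and the data are assumptions, with the slope chosen inside the range quoted above purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tj_from_vce_ilow(v_meas, slope, intercept):
    """Invert the calibration line to obtain the junction temperature."""
    return (v_meas - intercept) / slope


# Synthetic calibration: temperatures in degC, voltages in V (hypothetical).
temps = [25.0, 50.0, 75.0, 100.0, 125.0]
volts = [0.55 - 0.00025 * t for t in temps]
slope, intercept = fit_line(temps, volts)
print(slope, tj_from_vce_ilow(0.53, slope, intercept))
```

Since the relation becomes non-linear at high temperatures for very small injection currents, a straight-line fit such as this one is only meaningful within the calibrated temperature range.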

c: TEMPERATURE ESTIMATION BY R G,int
The thermo-sensitive resistance R G,int is a resistor in the center of the die. By calculating its temperature, the junction temperature T j can be estimated. Nevertheless, it is almost impossible to implement the measurement during the converter operation without opening the module and adding measurement circuits, as shown in Fig. 10(a). To improve this, Baker et al. proposed a peak gate current method, where R G,int is considered as the equivalent series resistance (ESR) of both the gate-emitter capacitor C ge and the gate-collector capacitor C gc . Therefore, the gate driver RLC network can be depicted as in Fig. 10(b) [71]. During the turn-on delay, both C ge and C gc are stable before the gate voltage reaches the threshold voltage V th . The gate current I g can be taken as a step response of the RLC network, and the parasitic gate inductor should satisfy R^2 > 4L/C. Hence, the RLC network is overdamped, and I g can be approximated by (9). Given that the gate capacitance is stable, the gate inductor then has a negligible effect on the overdamped circuit. Assuming that the external resistor is not strongly dependent on the temperature, the resistance R G,int can be estimated by (10) when the peak gate current is detected.
It should be noted that the gate driver voltage is affected by the high dV ce /dt and temperature. To create an exact step voltage, the difference between the gate voltage before (V G,neg ) and after turn-on (V G,pos ) is utilized. Considering that the voltage is easier to measure than the current, V peak /R G,ext is used to calculate the peak current, where V peak is the peak voltage on the external gate resistor.
FIGURE 11. Temperature estimation using the resistance R G,int : (a) gate peak voltage detection circuit and (b) calibration between R G,int and T j [71].
Fig. 11(a) shows the detection circuit for V peak [71]. Finally, R G,int can be measured as (11). Then, with calibration, T j can be estimated, as shown in Fig. 11(b) [71]. There is a strong linear relationship between the resistance and the estimated temperature, as shown in Fig. 11(b). The advantage of this method is that it is immune to the load current and the measurement circuit can be integrated into the driver. Nevertheless, measurement errors may appear due to the assumptions.
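The peak-gate-current idea can be illustrated numerically as follows. The sketch assumes that, with the inductance neglected in the overdamped case, the peak gate current is approximately the gate voltage step divided by the total gate resistance (R G,ext + R G,int ); this simplified relation, the function names, the linear calibration, and the sample values are assumptions rather than the exact expressions of [71].

```python
def is_overdamped(r_total, l_gate, c_gate):
    """Check the overdamping condition R^2 > 4L/C of the series RLC network."""
    return r_total ** 2 > 4.0 * l_gate / c_gate


def r_g_int_from_peak(v_g_pos, v_g_neg, v_peak, r_g_ext):
    """Estimate R_G,int from the peak voltage across the external gate resistor.

    Assumes I_peak = V_peak / R_G,ext and
    I_peak ~ (V_G,pos - V_G,neg) / (R_G,ext + R_G,int).
    """
    i_peak = v_peak / r_g_ext
    return (v_g_pos - v_g_neg) / i_peak - r_g_ext


def tj_from_r_g_int(r_g_int, r_ref, t_ref, k_r):
    """Linear calibration: k_r is the resistance slope in ohm/degC (assumed)."""
    return t_ref + (r_g_int - r_ref) / k_r


# Hypothetical values: +15 V / -8 V gate drive, 10 ohm external resistor.
r_int = r_g_int_from_peak(v_g_pos=15.0, v_g_neg=-8.0, v_peak=20.0, r_g_ext=10.0)
print(r_int, is_overdamped(10.0 + r_int, l_gate=20e-9, c_gate=10e-9))
print(tj_from_r_g_int(r_int, r_ref=1.4, t_ref=25.0, k_r=0.002))
```

Using the measured step V G,pos − V G,neg instead of the nominal supply values follows the reasoning in the text, since the driver voltage itself varies with dV ce /dt and temperature.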

d: TEMPERATURE ESTIMATION BY I sat
If the self-heating effect is neglected, the IGBT saturation current can be calculated as (12) [77], where β PNP indicates the PNP transistor current gain, µ ns is the surface mobility of electrons, C OX denotes the oxide capacitance, Z c is the channel width, and L c is the channel length. The relationship is nonlinear and is coupled with other thermo-sensitive parameters, as demonstrated in Fig. 12 [31]. Due to the nonlinearity and the coupling, it is not recommended to estimate T j using the saturation current I sat .

e: TEMPERATURE ESTIMATION BY I sc
In [78], a short-circuit current-based estimation method is proposed. The short-circuit pulse is introduced by a bypass IGBT, as shown in Fig. 13(a). When the device under test (DUT) is on, the bypass IGBT is triggered for a short time to create a short current pulse, whose amplitude is approximately linear with respect to the junction temperature T j . By measuring the amplitude of the short-circuit current, T j can be estimated, as exemplified in Fig. 13(b). This method has a relatively high sensitivity, which is about −0.35 °C/A. Meanwhile, it is immune to the DC-link voltage. However, due to the risk of short-circuit, an additional protection scheme is required. Furthermore, additional control for the bypass IGBT and a high-current sensor for the short-circuit pulse are needed to realize this.

f: TEMPERATURE ESTIMATION BY V ge
The gate-emitter voltage V ge is another TSEP for the junction temperature estimation. Berning et al. proposed a circuit to estimate T j, as shown in Fig. 14(a) [91]. In order to eliminate the voltage spike induced by the oscillation-free gate resistor, the difference between the cathode voltage and the gate voltage is utilized. The results show good linearity, and the sensitivity is within 11.6 mV/°C to 13 mV/°C for collector currents from 1 A to 25 A, as shown in Fig. 14(b). However, this method is an offline approach, as the bias current source, consisting of a 62-V voltage source and a 75-kΩ resistor, will affect the normal conversion operation.

g: TEMPERATURE ESTIMATION BY V th
The threshold voltage V th of an IGBT is the gate-emitter voltage at which the device begins to turn on. According to [81], V th can be described by (13), in which κ is the Boltzmann constant, q denotes the elementary charge, N A indicates the concentration of acceptors, N D is the concentration of donors, Q f is the fixed oxide charge, Q m is the charge of mobile ions, Q ot indicates the intrinsic charge within the oxide, C ox denotes the area-specific capacitance of the oxide layer, ε Si indicates the permittivity of silicon, and n i is the intrinsic carrier density. From (13), it can be found that V th depends only on the junction temperature T j, as all the other parameters are fixed. Although the dependence is not strictly linear, it is nearly linear in the range of operating temperatures. In this range, V th decreases linearly with the temperature due to the positive correlation between n i and T j. The sensitivity varies from −6 mV/°C to −9 mV/°C in different conditions [81], [92].
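The effect of the sensitivity spread on the estimate can be illustrated with a short sketch that inverts the linearised V th (T j) relation around one calibration point. The reference point (5.80 V at 25 °C) and the measured value are hypothetical; only the two sensitivity bounds come from the text.

```python
def junction_temperature_from_vth(v_th, v_th_ref, t_ref, sensitivity):
    """Invert the linearised V_th(T_j) relation around a calibration point.

    sensitivity is dV_th/dT_j in V/degC (negative for the cases quoted in the text).
    """
    return t_ref + (v_th - v_th_ref) / sensitivity


# Hypothetical calibration point and measurement; not taken from [81] or [92].
V_TH_REF, T_REF = 5.80, 25.0
v_th_measured = 5.35

for k in (-6e-3, -9e-3):  # sensitivity bounds quoted in the text
    t_j = junction_temperature_from_vth(v_th_measured, V_TH_REF, T_REF, k)
    print(f"sensitivity {k * 1e3:.0f} mV/degC -> T_j = {t_j:.1f} degC")
```

The same voltage reading maps to noticeably different temperatures at the two bounds, which suggests why a device-specific calibration is usually needed before V th is used as a TSEP.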
In order to measure V th online, the parasitic inductance L σ E between the Kelvin and power emitter terminals is utilized, as discussed in [81], [82], [83] (see Fig. 15(b)). When the IGBT starts to turn on and the current begins to flow through L σ E, the parasitic voltage V EE is induced. Then, by comparing V EE with the reference voltage, the trigger pulse is produced. With NAND-gates and the driver output voltage, the measurement pulse is generated, which can enable the sample-and-hold (SH) gate to hold V th and disable the SH when the freewheeling diode is on. This measurement circuit can be integrated into the gate driver due to the common ground reference. However, the reference voltage should be set carefully. Otherwise, V th may not be measured, as the turn-on transient is also relevant to the junction temperature.

FIGURE 15. Temperature estimation using V th: (a) calibration with T j and (b) measurement circuit [81], [82], [92].

h: TEMPERATURE ESTIMATION BY V gp
The Miller-plateau voltage V gp can be calculated by (14) [84], where g m is the transconductance. Since V th decreases and g m increases with the rise of the temperature, it can be concluded that V gp decreases monotonically with the temperature rise. In practice, it is difficult to measure V gp because R G,int is inaccessible. Thus, one has to estimate V gp through V meas by (15).
Since R G,int is temperature-dependent, compensation must be made before using it to estimate T j. According to the sensitivity and error analysis, precise knowledge of g m is crucial for the estimation. However, a small error of the current is acceptable, which implies that the averaged phase current utilized in the control software is applicable here. The comparison between the prediction and the sensing results for a hybrid IGBT module FS800R07A2E3B13 is given in Fig. 16. The sensitivity is 1.5 to 7 mV/K over the entire operating range. Considering that V gp is affected by the load current, the current dependency should be understood first. Otherwise, the method must work at the same current, which limits its application.

i: TEMPERATURE ESTIMATION BY t don
The turn-on delay t don is the time between the start of the gate-emitter voltage V ge rising and the beginning of the collector current I c rising. It can be described as in [85]. As aforementioned, R G,int increases and V th decreases with the increase of the junction temperature T j. As a result, t don is sensitive to the junction temperature. More specifically, it increases with the rise of temperature when the temperature effect of C g is neglected. Regarding the dependencies, V th depends only on T j and R G,int is affected only by T j. As for C g, it consists of the oxide capacitance C ox and the depletion capacitance C dep. C ox can be seen as a constant, while C dep is governed by a relation in which A represents the surface area of the capacitor and e 0 indicates the unit charge. C dep is dependent on the DC-link voltage V dc [85]. In addition, the intrinsic carrier concentration increases at higher temperatures, which indicates that C dep increases along with the temperature. Therefore, t don is monotonically temperature-dependent and is affected by V dc. However, it is immune to the load current. Considering that V dc is kept almost constant, this method is more applicable than the current-dependent methods. The delay time t don versus the junction temperature T j is depicted in Fig. 17, and the sensitivity is about 2 ns/°C.

j: TEMPERATURE ESTIMATION BY dI c /dt max,on
The turn-on current slope is related to the gate-emitter voltage change rate through a relation in which α PNP is the gain of the inherent bipolar transistor, µ is the mobility, W is the width, and L is the length of the MOS channel. According to (16), the temperature dependence of dI c /dt max,on caused by α PNP, µ, and V th can be obtained [85]. The correlation between dI c /dt max,on and T j under different currents and voltages is shown in Fig. 18(a). The relation is not as linear as for the TSEPs above. The sensitivity is about 40 A/(µs·°C) and is affected by the DC-link voltage and load current.

k: TEMPERATURE ESTIMATION BY t off
Equation (19) gives the description of the turn-off time t off [82], where R represents the gate resistance and C ISS is the input capacitance. As it contains g m and V th, both of which decrease with the increase of the junction temperature T j, the turn-off time t off is also a TSEP. The correlation between t off and T j under different currents and voltages is shown in Fig. 18(b).

l: TEMPERATURE ESTIMATION BY t doff
The turn-off delay t doff can be divided into three parts, denoted as t 1, t 2, and t 3, which are shown in Fig. 19 and described by (20)∼(22) [87], where V gp is the Miller-plateau voltage, L M represents half of the physical length under the gate region, n ac denotes the carrier concentration, J c is the collector current density, and J ch indicates the electron current density reduction in the MOS channel under the gate region.
As shown in (18), the first stage t 1 is mainly affected by R G and V gp. Since both of them have positive temperature coefficients, R G and V gp increase with the rising junction temperature T j, and so does t 1. For t 2 and t 3, when T j increases, α PNP and n drl increase at the same time [93], [94]. In addition, J ch can be described by a relation implying that it decreases with the temperature rise. Therefore, t 2 and t 3 will increase when the junction temperature T j rises. Because t 1, t 2, and t 3 all increase with T j, the turn-off delay t doff increases monotonically as T j goes up. Further, it can be found from (18)∼(20) that t doff is also affected by the load current and DC-link voltage. The correlation between t doff and T j under different load currents and voltages is shown in Fig. 20. It indicates that the sensitivity is about 4 ns/°C, although it varies slightly with the load current and DC-link voltage. In Fig. 19, it can be found that there are voltage pulses across L σ E at the beginning and the end of the turn-off delay, both of which can be utilized to measure t doff.

m: TEMPERATURE ESTIMATION BY V EE max
In the turn-off period, the negative voltage pulse across L σ E is induced by the drop of I c, which can be described as [88]

$$V_{EE} = L_{\sigma E}\,\frac{dI_c}{dt}$$

where V EE is the voltage between the Kelvin emitter and power emitter. Hence, instead of measuring dI/dt max,on, the maximum voltage V EE,max can also be used to estimate the junction temperature. The correlation between V EE,max and T j under different load currents and voltages is depicted in Fig. 21. The sensitivity varies from −29.11 mV/°C to −74.72 mV/°C, depending on the DC-link voltage and load current. In [89], T j with respect to V EE,max and I c is modelled through least-squares fitting in the form of (25). Hence, it could be more practicable in real-time T j estimation.
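The least-squares modelling of T j as a function of V EE,max and I c can be sketched as follows. Since the form of (25) is not reproduced here, a first-order model T j = k0 + k1·V EE,max + k2·I c is assumed purely for illustration, and the calibration data are synthetic values generated from hypothetical coefficients, not measurements from [89].

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_plane(v_ee_max, i_c, t_j):
    """Least-squares fit of T_j = k0 + k1 * V_EE,max + k2 * I_c (assumed form)."""
    rows = [[1.0, v, i] for v, i in zip(v_ee_max, i_c)]
    # Normal equations (A^T A) k = A^T t
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    atb = [sum(r[p] * t for r, t in zip(rows, t_j)) for p in range(3)]
    return solve_linear(ata, atb)


# Synthetic calibration set built from hypothetical coefficients (150, 25, 0.2).
v_cal = [-5.0, -4.0, -5.0, -3.0, -4.0, -2.0]        # V_EE,max in V
i_cal = [100.0, 100.0, 200.0, 200.0, 300.0, 300.0]  # I_c in A
t_cal = [150.0 + 25.0 * v + 0.2 * i for v, i in zip(v_cal, i_cal)]

k0, k1, k2 = fit_plane(v_cal, i_cal, t_cal)
print(f"T_j ~ {k0:.2f} + {k1:.2f} * V_EE,max + {k2:.3f} * I_c")
```

Once the coefficients are identified offline, the online estimate reduces to two multiplications and two additions, which is one reason such fitted models are attractive for real-time use.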

n: TEMPERATURE ESTIMATION BY V fb
The temperature dependence of V fb could be attributed to the mobile oxide charges Q ox induced by ionic contaminants, which change with temperature [90]. Thus, according to (26), V fb is a TSEP.
The measuring circuit and the different values of V fb are given in Fig. 22(a) and Fig. 22(b), respectively [90]. The gate capacitance reduces sharply when the gate voltage reaches V fb because the depletion capacitor is added in series with the oxide capacitor. As a result, the gate voltage increases faster before it reaches V th. Two differentiators and two comparators are utilized to capture this moment and trigger the ADC to sample V fb at this time. The results for an Infineon FF1000R17IE4 module are given in Fig. 22(b), with the sensitivity at about 3.1 mV/°C. It turns out that the temperature dependency of V fb is not strictly linear, which may limit the practical performance. Nevertheless, its advantage is that it is measured before the device conducts the load current, i.e., it can work well under different working conditions.
The above methods to estimate T j are further benchmarked in TABLE 3, in terms of online performance, selectivity, linearity, sensitivity, additional hardware, effects on converter performance, and integrability. The selectivity represents the factors that can affect the TSEPs. Because the DC-link voltage is kept constant in most cases, the collector current is the main factor that influences the estimation performance. Accurate current measurement may require additional expensive current sensors, which increases the costs. In this sense, the R G,int and V fb methods outperform the other methods. The linearity shows the theoretical accuracy of the estimation method, where higher linearity leads to higher accuracy. The sensitivity is the derivative of the TSEP with respect to T j. A higher sensitivity indicates a larger variation of the TSEP for the same junction temperature rise, which makes the method more robust against noise and measurement errors. Additional hardware evaluates the cost of the corresponding estimation method. It can be concluded from TABLE 3 that the TSEPs measured through the gate or auxiliary terminals are much cheaper because they are free of high voltage or current. Normally, they can also be integrated into the gate driver. Thus, this kind of TSEP has greater potential in commercial products. The converter performance effect indicates whether the converter performance would be affected by the TSEP measurement. Another concern of T j estimation by TSEPs is that most TSEPs are affected by the device degradation. For example, the parasitic inductance and gate capacitance vary with the fatigue of the package and gate oxide, which could lead to significant errors in the gate- or auxiliary-terminal-based methods. In this sense, the V ce,Ilow method has advantages because the sensing current is so low that the voltage deviation caused by the package degradation is negligible.

3) MONITORING THE JUNCTION-TO-CASE THERMAL RESISTANCE R thjc
The junction-to-case thermal resistance R thjc can be calculated as

$$R_{thjc} = \frac{T_j - T_c}{P_{loss}}$$

where T j is the junction temperature that can be estimated by TSEPs, T c is the case temperature that can be measured directly, and P loss denotes the power loss, including the switching loss and conduction loss. The power loss P loss can be obtained online with a predefined lookup table [45], [57] or a curve-fitted model from the datasheet [95], [96]. Then, the variation of the calculated R thjc can be monitored. Furthermore, R thjc indicates the degradation of the IGBT to a certain extent. Hence, Dawei et al. use R th to monitor the solder fatigue with a case-above-ambient temperature [45]. By calibrating P tot at different working points for the healthy IGBT, the power loss response surface is obtained. In addition, a Cauer thermal network of the heatsink is developed to calculate the real-time power loss with the case-to-ambient temperature. Besides, the solder fatigue will result in the resistance change ΔR th between chip and substrate or between substrate and baseplate, depending on the solder layer type. Then, V ce,on rises with the subsequently increased T j, which leads to a higher power loss P tot, as illustrated in Fig. 23(a). Finally, ΔR th can be obtained through the flowchart shown in Fig. 23(b). However, care should be taken that the ambient temperature is measured accurately when other heat sources are present in the application.

FIGURE 23. Using power losses to monitor the thermal resistance: (a) power losses vs. the case temperature and (b) flowchart to calculate the thermal resistance change ΔR th [42].

FIGURE 24. Control structure for the harmonic resonance and suppression to monitor device degradation [48].
Notably, the thermal resistance change ΔR th is induced by the reduced thermal dissipation path. Considering that the crack propagates from the edge to the center, the temperature at the edge of the case bottom surface declines while the temperature in the center of the case bottom surface increases [97]. This characteristic can be represented by the ratio of the junction-to-case-center thermal resistance to the junction-to-case-edge thermal resistance. Compared to the R thjc method, it is not cost-effective, but it eliminates the influence of different operation points without requiring full calibration and is independent of the ambient temperature.
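The R thjc monitoring idea can be sketched in a few lines: the thermal resistance is computed from T j, T c, and P loss and compared with the value recorded for the healthy device. All numbers are hypothetical, and the 20% alarm level is an assumption made for this illustration rather than a criterion given in this section.

```python
def thermal_resistance(t_j, t_c, p_loss):
    """Junction-to-case thermal resistance in K/W from (T_j - T_c) / P_loss."""
    if p_loss <= 0.0:
        raise ValueError("power loss must be positive")
    return (t_j - t_c) / p_loss


def relative_increase(r_now, r_healthy):
    """Relative change of the thermal resistance with respect to the healthy value."""
    return (r_now - r_healthy) / r_healthy


ALARM_LEVEL = 0.20  # assumed degradation criterion (hypothetical)

# Hypothetical operating points: (T_j in degC, T_c in degC, P_loss in W)
r_healthy = thermal_resistance(95.0, 70.0, 250.0)
r_aged = thermal_resistance(102.0, 70.0, 256.0)

change = relative_increase(r_aged, r_healthy)
print(f"R_thjc healthy = {r_healthy:.3f} K/W, aged = {r_aged:.3f} K/W")
print(f"relative increase = {change:.1%}, alarm = {change > ALARM_LEVEL}")
```

Because both T j (from a TSEP) and P loss (from a lookup table or fitted model) carry their own errors, a practical monitor would likely average the result over many operating points before raising an alarm.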

4) MONITORING THE 5 th HARMONIC
The 5th-order harmonic voltage can be extracted by the converter controller without additional hardware, as shown in Fig. 24 and discussed in [48]. The inner-loop harmonic resonance controller amplifies the small error before and after IGBT ageing to enhance the measurement accuracy. In Fig. 24, v * hc is forced by the outer loop to follow the harmonics produced by the inverter, and then the harmonic voltage can be measured. This method is cost-effective, as it requires no additional hardware. However, the system should operate at the setpoint, which makes it difficult to measure the harmonic online. Additionally, the degraded IGBT cannot be identified as the degradation is detected at the system level, i.e., the confidence level of the identified degradation is low.

5) MONITORING THE MILLER-PLATEAU DURATION T GP
The circuitry that measures t gp is shown in Fig. 25 [39]. The RC network receives the gate signal and outputs the differentiated result to provide the time instants before and after the Miller plateau. Then, the signal tracking circuit and the voltage divider R6, R7, R8 give the adaptive voltage reference for the comparator, so that the circuitry can work under different working points. Next, the output of the differentiator is compared to the adaptive voltage reference to generate the double-pulse signal, which carries the information of t gp. Finally, an isolator is used to separate the analogue circuit and the digital circuit. Some details of the measuring circuitry should be noted. First of all, C1 should be small enough so that the gate transients will not be influenced. Besides, R1 should be small enough to ensure high bandwidth and large enough to provide a detectable signal. Meanwhile, the time constant of the RC network should be smaller than 1/10 of t gp. In fact, the measured time interval is not exactly the same as t gp, but it is precise enough to monitor the state of the IGBT.

III. TOLERANCE OF THE CATASTROPHIC FAILURES
The catastrophic failure is caused by overstresses or wear-out, which makes the IGBT uncontrollable. It can be classified into open-circuit failure and short-circuit failure. Disconnections between the chip and terminal or between the driver and the terminal may induce open-circuit failures. The former disconnection results from the bond-wire lift-off or bond-wire rupture under high short-circuit currents. In contrast, the latter is mainly caused by vibration, corrosion, and driver failures. The short-circuit failures may be the consequences of high gate voltages, external failures, latch-up, and rapid increases of intrinsic temperatures due to the second breakdown or energy shock, high voltage breakdown, or thermal runaway [13]. Short-circuit failures can occur during turn-on transients or on-state operation, depending on which of the above mechanisms is involved.
As the focus of this paper is to provide the reliability improvement methods for two-level IGBT-based converters, only this kind of converter is considered below. S k (k = a, b, c) denotes the switching state function: S k = 1 means that the upper IGBT turns on while the lower one turns off, and S k = 0 means the opposite. Then, the phase voltage can be expressed as in (28). Under normal conditions, the estimated phase voltage is close to the measured phase voltage and the voltage error e kn = 0. Taking the measurement error, the discretization error, and non-ideal switching characteristics such as switching delay and dead time into consideration, e kn is not strictly equal to 0. Thus, the voltage threshold h and the time threshold T are adopted to avoid false alarms, which are given as h = 10 V and T = 50T s, where T s is the switching period [98], [99].
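The combination of a voltage threshold and a time threshold can be sketched as a simple persistence counter. The sketch assumes that one voltage error sample e kn is available per switching period, so that T = 50T s corresponds to 50 consecutive samples; the function name and the test sequences are hypothetical.

```python
def detect_open_circuit(errors, h=10.0, n_persist=50):
    """Return the sample index at which a fault is declared, or None.

    errors: voltage errors e_kn in volts, one sample per switching period (assumed).
    A fault is declared once |e_kn| > h holds for n_persist consecutive samples.
    """
    count = 0
    for k, e in enumerate(errors):
        count = count + 1 if abs(e) > h else 0
        if count >= n_persist:
            return k
    return None


# Hypothetical sequences: small errors from dead time, a short glitch, a real fault.
healthy = [3.0] * 200
glitch = [3.0] * 50 + [40.0] * 10 + [3.0] * 140
faulty = [3.0] * 20 + [40.0] * 60

print(detect_open_circuit(healthy))  # no detection expected
print(detect_open_circuit(glitch))   # glitch shorter than the time threshold
print(detect_open_circuit(faulty))   # declared after 50 consecutive violations
```

The time threshold trades detection speed for robustness: a larger value suppresses more transients but delays the alarm by the same number of switching periods.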
For a rectifier, however, the phase-to-phase voltage can be expressed as in (29), where u xy,est is the estimated phase-to-phase rectifier voltage and e xy represents the phase-to-phase grid voltage. Then, the switching state and the corresponding error can be estimated by (30).
$$S_{xy,est} = \frac{u_{xy,est}}{V_{dc}}, \qquad \varepsilon_{xy} = S_{xy} - S_{xy,est} \quad (30)$$

Thus, if a T 1 fault happens, the switching state error for phase A will be 1 while the errors for the other two phases are 0. This leads to ε ab greater than the threshold T th, ε ca smaller than −T th, and ε bc = 0. Accordingly, all single-switch faults can be diagnosed from the pattern of these errors.

For the observer-based methods [106], the state model is application-dependent, so the two-level voltage source inverter fed induction machine drive system (see Fig. 26) is taken as an example for illustration purposes. It can be described by a state-space model. Taking the Luenberger observer [88] as an example (other observers, e.g., the PI observer, can also be adopted), the stator current in the dq-frame can be observed, and the residuals between the measured and observed currents can then be obtained. Similar to the Luenberger observer, the first-order sliding mode observer can also be utilized in the form of (34) [106]:

ẋ = Ax + Bu + I s (34)

where I s is the switching vector, which is weighted by the observer gain. The ratio r n of the mean absolute value of the measured current to that of the observed current can be calculated by (35). Under normal conditions, the ratio is close to 1 because the two currents are almost identical. For the one-switch fault condition, however, the ratio is calculated by (36), indicating that r n is smaller than 0.318. As for the open-phase condition, the measured current approaches 0 and r n is approximately equal to 0. Hence, the fault can be detected when r n is smaller than a threshold K d.
where e n is the faulty component in the observed current which is close to the measured current maximum amplitude I m .
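The detection ratio r n can be illustrated with a minimal sketch that compares the mean absolute values of a measured and an observed phase current over one fundamental period. The observer itself is not modelled; the observed current is simply taken as a sinusoid, and the threshold K d = 0.5 is a hypothetical choice between the fault bound of 0.318 and the healthy value of 1.

```python
import math


def mean_abs(samples):
    """Mean absolute value of a sampled current."""
    return sum(abs(s) for s in samples) / len(samples)


def current_ratio(measured, observed):
    """Ratio r_n between the mean absolute measured and observed currents."""
    return mean_abs(measured) / mean_abs(observed)


N = 400                      # samples per fundamental period (assumed)
K_D = 0.5                    # hypothetical detection threshold
observed = [10.0 * math.sin(2.0 * math.pi * k / N) for k in range(N)]

healthy_measured = list(observed)   # measured current tracks the observer
open_phase_measured = [0.0] * N     # phase current collapses to zero

for name, measured in (("healthy", healthy_measured), ("open phase", open_phase_measured)):
    r_n = current_ratio(measured, observed)
    print(f"{name}: r_n = {r_n:.3f}, fault detected = {r_n < K_D}")
```

In a real drive, the averaging window would have to follow the fundamental period, which varies with the motor speed.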
The fault identification indicator s n is defined in (37). Under normal conditions, the mean value of the observed current is close to 0, and thus, s n is equal to 0. When an open-switch fault occurs, the observed fault component provides a DC bias which makes the observed current entirely positive or negative. Hence, s n equals 1 for an upper switch fault and −1 for a lower switch fault.

$$s_n = \frac{\int \hat{i}_n \, dt}{\int |\hat{i}_n| \, dt} \quad (37)$$

c: MIXED LOGICAL DYNAMIC MODEL [16], [107], [108]
A three-phase two-level converter can also be described by a switching-function model in which v kg is the voltage between the phase k (k = a, b, c) and the negative pole of the DC-link, s 1 ∼ s 6 represent the control signals of the corresponding IGBTs, V dc denotes the DC-link voltage, and δ k represents the current direction of each phase (positive if flowing into the load). Then, the phase voltages u an, u bn, and u cn of the converter can be represented by the mixed logical dynamic model. Subsequently, the residual is generated from δ' and δ, where δ' denotes the discrete input of the real plant (considering the control signal of the open-circuit IGBT as 0) and δ is the discrete input for the observer (generated by the controller). Finally, the fault can be detected if the residual exceeds the threshold, and the fault type can be recognized according to the residual vector phase, as shown in (46) and TABLE 6.
Nevertheless, when applying this method to a single-phase converter, the diagonal IGBTs cannot be separated unless extra operations are performed to the converter [107]. In order to settle this problem, the changing rate of the residual is adopted to identify the faults, as shown in (47). By adding the switch information to the fault indicators, all the fault types can be identified without extra operations.

d: MODEL REFERENCE ADAPTIVE SYSTEM [109]
For a permanent magnet synchronous motor drive system with a two-level converter, as shown in Fig. 23, the current dynamics in the dq-frame, including the open-circuit-induced voltage distortions, can be represented by (48), where λ m is the flux linkage established by the permanent magnet. For the reference model, it is assumed that the voltage distortions are zero in a healthy system, which gives (49). Combining (48) and (49) yields the voltage distortions v d_dist and v q_dist. Considering the dead-time effect, the threshold voltage is selected as V threshold = m × V dead, with m being a positive constant that minimises the false error detections induced by noise and the dead-time effect, and V dead being the voltage distortion caused by the dead-time effect. By transforming v q_dist and v d_dist to the variables in the abc-frame, the Boolean errors can be obtained. Based on (51), the faulty switch can be recognized through TABLE 7.
Overall, the model-based methods can detect and identify single-switch open faults and phase open faults effectively. Yet their ability to diagnose double-switch faults has not been reported, which requires further investigation.

2) SIGNAL-BASED APPROACH
Normally, the signal-based approach utilizes the intrinsic characteristics of the faulty converter, which means that the current or voltage behaves differently under healthy and faulty conditions, including the current trajectory pattern, the mean current (DC current), the reference value, and the current distortion. Thus, those signals are employed to identify the faults.

a: CURRENT PATTERN [110]-[112]
The αβ components of the AC currents can be obtained through the Clarke transformation. In a healthy condition, the current trajectory in the αβ-plane is a circle. When one or two IGBTs are in the open-circuit fault mode, the trajectory deviates from the circle to become a sector. Depending on the fault type, the sector has different sizes and angles, as demonstrated in Fig. 27. Therefore, the current trajectory pattern recognition can be one way to detect and identify the fault type.
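The change of the trajectory can be reproduced with a small sketch. The amplitude-invariant Clarke transformation is assumed here (the scaling used in [110]-[112] may differ), and the open-switch fault is modelled crudely by forcing the positive half-cycle of phase A to zero while the other two phases carry equal and opposite currents; this fault model is an assumption for illustration only.

```python
import math


def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transformation (one common convention)."""
    i_alpha = (2.0 * i_a - i_b - i_c) / 3.0
    i_beta = (i_b - i_c) / math.sqrt(3.0)
    return i_alpha, i_beta


def trajectory(n=360, amplitude=10.0, open_upper_a=False):
    """Sample the alpha-beta current trajectory over one fundamental period."""
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        i_a = amplitude * math.sin(theta)
        i_b = amplitude * math.sin(theta - 2.0 * math.pi / 3.0)
        i_c = amplitude * math.sin(theta + 2.0 * math.pi / 3.0)
        if open_upper_a and i_a > 0.0:
            # Crude fault model: phase A cannot carry positive current,
            # so phases B and C conduct equal and opposite currents.
            diff = i_b - i_c
            i_a, i_b, i_c = 0.0, diff / 2.0, -diff / 2.0
        points.append(clarke(i_a, i_b, i_c))
    return points


healthy = trajectory()
faulty = trajectory(open_upper_a=True)

radii = [math.hypot(a, b) for a, b in healthy]
print(f"healthy radius range: {min(radii):.3f} .. {max(radii):.3f}")
print(f"faulty alpha range: {min(a for a, _ in faulty):.3f} .. {max(a for a, _ in faulty):.3f}")
```

Under these assumptions, the healthy trajectory keeps a constant radius, whereas the faulty one never enters the positive-α half-plane, i.e., the circle degenerates into a half-circle sector.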
The easiest way to recognize the fault pattern is to calculate the slope of the αβ-current from consecutive samples, where i jk and i jk−1 represent the samples at instants k and k−1 (j = α, β). However, it is only effective for the one-switch fault case. Also, it is not able to distinguish the two faulty switches in the same leg. An extra measure that detects the missing half of the phase current is needed to identify the faulty switch, as shown in Fig. 27. Alternatively, the entire circle can be divided into 24 sections, and a 24-dimensional vector is defined as follows. If the fault pattern vector is in one section, the corresponding element in the vector is denoted as 1. Otherwise, it is −1. Because the fault patterns are different, each fault has a unique identification vector, based on which the faulty switch can be recognized. Additionally, the sector size, the mass center angle, and the difference between the maximum angle and the minimum angle of the sector can also be utilized to identify the faulty switch. In this case, the normalized current is recommended to eliminate the load effects on the sector size.

b: MEAN CURRENT [113]-[121]
If one IGBT is in the open-circuit fault, the three-phase current of the vector-controlled drive system is as shown in Fig. 28. Diagnostic variables based on the mean phase currents are defined to enable the multi-fault diagnosis,
where ⟨·⟩ means the average value, K 0 denotes the threshold, which is set as 5% of the rated current, l, m, n ∈ {a, b, c} with l ≠ m ≠ n, and S n is the auxiliary variable defined as two times the ratio between the mean absolute value of the target phase and that of the sum of the other two phases. Besides, D n (k) is defined as the normalized mean current and W n (k) is the long interval of near-zero currents. S n (k) is utilized to overcome the ill-conditioning of R n (k) when two faults are in the same leg. Combining R n (k) and S n (k), in total, 27 kinds of faults can be diagnosed. Considering that the current frequency in the drive system varies with the motor speed, the variable parameter moving average method is adopted to calculate the mean value adaptively [121].
Alternatively, a normalized current i nN can be used. Under normal conditions, the mean value of the absolute normalized current can be calculated analytically over one period of the current frequency ω s. When an open-circuit fault occurs in the system, one of |i nN | will be larger than 0.5198, and thus, the fault can be detected. However, |i nN | carries only the phase information; that is, the faulty IGBT cannot be identified. Therefore, the mean value of the normalized current should also be considered. By classifying |i nN | into 4 stages and i nN into positive and negative states, all single- and double-switch fault conditions can be properly detected and identified [115]. Considering that the diagnostic algorithm may not be reliable when the current approaches zero, i nN is calculated only when the current is larger than 2% of the rated current in [118]. Nevertheless, a fault under low current may then be missed. It is recommended in [120] that the inverse absolute phase current be used to avoid this problem, as shown in (59). Notably, the absolute mean value of the angle of deviation of the Clarke trajectory |φ| can be added to the diagnostic system to prevent false alarms and to enhance its robustness [117].
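The value 0.5198 quoted above can be checked numerically. The sketch below assumes that each phase current is normalised by the modulus of the current space vector in its power-invariant form; this normalisation is an assumption made for the check, although it does reproduce the quoted number for balanced sinusoidal currents.

```python
import math


def mean_abs_normalized(n=3600, amplitude=10.0):
    """Mean absolute value of the normalised phase-A current over one period."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        i_a = amplitude * math.sin(theta)
        i_b = amplitude * math.sin(theta - 2.0 * math.pi / 3.0)
        i_c = amplitude * math.sin(theta + 2.0 * math.pi / 3.0)
        # Power-invariant space vector components (assumed convention).
        i_d = math.sqrt(2.0 / 3.0) * i_a - (i_b + i_c) / math.sqrt(6.0)
        i_q = (i_b - i_c) / math.sqrt(2.0)
        modulus = math.hypot(i_d, i_q)
        total += abs(i_a / modulus)
    return total / n


numeric = mean_abs_normalized()
closed_form = math.sqrt(8.0 / 3.0) / math.pi  # (2 / pi) * sqrt(2 / 3)
print(f"numeric = {numeric:.4f}, closed form = {closed_form:.4f}")
```

Because the normalised quantity is independent of the current amplitude, the same reference value applies at any load level, which is the main motivation for normalising before diagnosis.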
Apart from the open-switch fault, intermittent faults, which are caused by electromagnetic interference or component ageing, also exist in industrial applications. In this case, fuzzy logic can be used to identify the faults effectively [119].
However, it must be pointed out that double-switch faults, involving two upper (or two lower) transistor failures, and triple faults, involving also the lower (upper) transistor in the remaining leg, are indistinguishable. For example, the double fault of T 1 and T 3 is indistinguishable from the triple fault of T 1, T 3, and T 6. Besides, the triple-switch fault with two switches in the same leg and the quadruple-switch fault with two healthy switches on the opposite sides of different legs are indistinguishable. For example, the triple-switch fault of T 1, T 2, and T 3 cannot be distinguished from the triple-switch fault of T 1, T 2, and T 6, nor can it be distinguished from the quadruple-switch fault of T 1, T 2, T 3, and T 6.

c: REFERENCE VALUE [122], [123]
If an open-circuit fault occurs in a three-phase two-level converter, some switch combinations of the converter cannot be reached. As a result, errors are produced, since the reference value cannot be tracked perfectly. Further, the controller will try to overcome this by adjusting the reference. With those considerations, the current and voltage reference values can be utilized to diagnose the faults. For instance, if T 1 is faulty (open-circuit fault), the positive half-cycle of the phase-A current is zero. Then, the phase-A mean reference current error normalized by the mean absolute value of the current (d a) can be calculated in terms of the current amplitude I m. The same holds for other IGBT fault conditions. However, this approach cannot identify the phase open condition, and again, (56) is adopted to solve this issue.

d: CURRENT DISTORTION [124]
As mentioned earlier, the IGBT open-switch fault will lead to the disappearance of the positive or negative half cycle of the phase current. On this basis, fault detection can be realised by a zero-crossing detector. Then, by identifying the increasing or decreasing trends of all the phases, the single-switch fault can be recognized [124].
It can be found that the signal-based methods are rather simple and only require a few mathematical operations. This makes it easy to integrate them into the control unit, as only limited computational resources are needed. It is also worth pointing out that both single- and double-switch faults can be diagnosed by this approach.

3) DATA-DRIVEN APPROACH
The data-driven methods do not require a precise model of the target system. They perform the fault detection and localization by means of machine learning. The first step of these methods is to extract the fault features, which include the wavelet coefficients, wavelet energy, raw currents, etc. Then, the features are fed to an artificial neural network, which could be the conventional BP neural network or an emerging deep network, for training. Finally, the well-trained network is used to perform the open-switch fault diagnosis. It should be pointed out that this approach requires a large amount of data.

a: FEATURE EXTRACTION
The most basic fault feature is the phase current itself with a certain length. In [125], 150 sampling points are acquired with a sampling frequency of 900 Hz to train the network. The influence of the current length on the diagnostic accuracy of a random vector functional link (RVFL) network has been discussed in [126], which shows that high accuracy can be achieved if the current length exceeds 60 ms. Besides, the double chain quantum genetic algorithm can be utilized to optimise the current length, and the denoising sparse autoencoder can extract the fault feature automatically [127], [128]. In [129], each phase current is shifted by 120 degrees and 240 degrees, and the Clarke transformation is then performed to generate the direct currents in the d-q axes. Wavelet decomposition is another widely adopted method to extract the fault feature in both the time and frequency domains. The coefficients of each detail part are used in [130], and the number of decomposition levels is determined by the sampling frequency and signal frequency. On the other hand, the energy of each detail part can be calculated after the decomposition, after which the principal component analysis is used to reduce the feature dimension. Thus, the training efficiency can be improved [131].
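The wavelet-energy feature can be sketched with a plain Haar decomposition. The Haar wavelet, the number of levels, and the test signals are assumptions chosen to keep the example self-contained; the wavelet family used in [130] and [131] is not stated here, and the subsequent principal component analysis is omitted.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[2 * i] + signal[2 * i + 1]) * scale for i in range(len(signal) // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * scale for i in range(len(signal) // 2)]
    return approx, detail


def wavelet_energies(signal, levels):
    """Energy of each detail band plus the energy of the final approximation."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies, sum(a * a for a in approx)


N = 64
healthy = [math.sin(2.0 * math.pi * k / N) for k in range(N)]
faulty = [max(0.0, -x) * -1.0 for x in healthy]  # positive half-cycle missing

for name, current in (("healthy", healthy), ("faulty", faulty)):
    detail_energy, approx_energy = wavelet_energies(current, levels=3)
    print(name, [round(e, 4) for e in detail_energy], round(approx_energy, 4))
```

Since the Haar transform is orthonormal, the band energies sum to the signal energy, so the feature vector describes how the fault redistributes energy across bands.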

b: NETWORKS
The BP neural network, which consists of one hidden layer, is the most common one in the literature [129], [130]. Nevertheless, the performance of the BP neural network is not fully satisfactory. For example, further steps have to be performed on the outputs of the BP neural network to confirm the fault diagnosis results in some cases [129]. Ensemble learning is one option to improve the performance of neural networks. A classifier that consists of 200 individually trained RVFL networks is constructed in [126] to diagnose the IGBT open-switch fault in a converter. Combined with a decision-making process, e.g., a voting process, the faulty IGBT can be detected and identified. Alternatively, the performance can also be improved through a deep network. The 7-hidden-layer sparse autoencoder based deep neural network and the 128-hidden-layer long short-term memory network have been shown to provide accurate diagnostic results [125], [128].
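The voting step of such an ensemble can be sketched as a plain majority vote over the labels returned by the individual classifiers. The classifiers themselves are not modelled; the label names and vote counts are hypothetical, and the actual decision rule of [126] may differ.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label and the fraction of votes it received."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical outputs of 200 individually trained classifiers for one sample.
ensemble_outputs = ["T1"] * 120 + ["healthy"] * 50 + ["T4"] * 30

label, share = majority_vote(ensemble_outputs)
print(f"diagnosis: {label} ({share:.0%} of the votes)")
```

The vote share can also serve as a rough confidence measure, e.g., a diagnosis could be withheld when no label reaches a chosen share.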
The comparison of the above methods is given in TABLE 9 in terms of the diagnostic time, load independency, and complexity. Note that the diagnostic time of the data-driven methods is not given because, unlike the other methods, they are not executed in the control unit, i.e., the data are transmitted to a host PC to finish the diagnosis. This also implies that the data-driven approach is more complex than the other two approaches. The requirement for large-scale data is another bottleneck for some applications. Its advantage is that, so long as data from the different operating conditions are fed to train the network, it can provide reliable diagnostic results. The diagnostic times of the model-based and signal-based methods are comparable, and both are simple enough to be integrated into the control unit. However, the signal-based methods are less dependent on the load variation because normalization is performed before the diagnostic process. So far, it seems that more papers have focused on signal-based methods and deep learning methods in recent years. The reason could be the simplicity and effectiveness of the former and the potential for big data applications of the latter.

B. SHORT-CIRCUIT FAULT DETECTION METHODS
The short-circuit fault can be categorised into the hard switch fault (HSF) and the fault-under-load (FUL) [132]. The HSF refers to the case in which the IGBT is turned on under short-circuit conditions, while the FUL occurs when the IGBT is already on under normal conditions. At the beginning of a normal turn-on, V_ce is still high, I_c is zero and V_ge starts to increase. At this point, I_g mainly charges C_ge, as C_gc is much smaller than C_ge because of the small C_dep. When V_ge reaches V_th, I_c begins to increase until it rises to the load current, and V_ce keeps dropping toward the saturation voltage. Then, C_gc becomes large and is charged by I_g, which causes the so-called Miller plateau. After that, V_ge goes up to the gate input voltage V_D and V_ce reaches V_ce,on. Under the HSF condition, however, V_ce cannot change, which keeps C_gc small. Consequently, the Miller plateau of V_ge disappears. Meanwhile, I_c rises fast to the short-circuit current. When the FUL occurs in an on-state IGBT, I_c increases sharply, which causes the IGBT to leave the saturation region, and V_ce rises quickly from V_ce,on to V_dc. Therefore, C_gc goes back to a small value and a displacement current from C_gc to the gate circuit is produced, which increases V_ge. The characteristics of the normal, HSF, and FUL conditions are described in Fig. 29, based on which the short-circuit can be detected.

1) GATE-BASED APPROACH
a: GATE CHARGE METHOD [133]-[136]
According to the above analysis, the gate charge characteristics under normal conditions and short-circuit fault conditions can be obtained as in Fig. 30. The amount of gate charge under the normal condition is larger than that under the HSF condition when V_ge is higher than the Miller plateau voltage V_gp, and larger than that under the FUL condition when V_ge is higher than V_D. Therefore, the threshold voltage and charge can be set as shown in Fig. 30. In this case, if V_ge is above V_ref while the gate charge is below Q_ref, the short-circuit fault can be confirmed. Fig. 31 shows the circuit of the above method. The voltage across the gate resistor is sampled by a differential amplifier and is integrated to obtain the gate charge. Then, it is fed to a comparator to check whether it exceeds Q_ref. The gate-emitter voltage V_ge is compared with V_ref at Comparator1. Finally, an AND gate is utilized to combine the two results to achieve the detection.

b: MILLER PLATEAU TIME METHOD [137], [138]
For the HSF, the Miller plateau disappears as shown in Fig. 29. Hence, the time that V_ge takes to rise from V_T to V_T + 5 V differs between the normal condition and the short-circuit condition. The measurement circuit is shown in Fig. 32. Three amplifiers are adopted to obtain the gate-emitter voltage, which can eliminate the interference caused by the emitter inductance. Then, a hysteresis comparator circuit is utilized to generate the detection pulse. This pulse enables the capacitor to be charged while it is active. With a different Miller plateau time, the capacitor is charged to a different voltage, which represents the time difference. By comparing the capacitor voltage with the threshold, the HSF can be detected. Note that the FUL cannot be detected with this method.

c: GATE VOLTAGE METHOD WITH MILLER PLATEAU [139], [140]
It has been demonstrated earlier that the gate voltage will rise when the FUL happens. Thus, it could be a good indicator to detect the FUL [141]. The same holds for an HSF with a high short-circuit inductance, because V_ce drops significantly, which produces a dV_gc/dt, and the current caused by this voltage variation increases the gate voltage. However, this indicator is not reliable for a low-inductance HSF, as V_ce drops only slightly [139]. Consequently, the gate voltage and the Miller plateau are combined to detect the short-circuit [139], [140]. The detecting circuit is given in Fig. 33. The gate voltage is fed to a filter which converts the two rising edges before and after the Miller plateau into two pulses. Then, the pulses drive the T flip-flop to generate a pulse under normal operation, and the output is latched high when the HSF occurs. V_ref2 is set as in (61), where V_gp,HSF is the Miller plateau voltage threshold under the HSF condition. Thus, the outputs of both CMP2 and the T flip-flop are high when the HSF happens. Accordingly, the AND gate outputs an HSF signal. V_ref3 is set such that it exceeds the gate supply voltage V_D while remaining smaller than the gate voltage under FUL, V_ge,FUL. A possible value is given in (62). Hence, when the voltage drop across R_2 is higher than V_ref3, the FUL can be detected.
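The decision logic of the gate charge method described earlier in this subsection (V_ge above V_ref while the integrated gate charge is still below Q_ref) can be sketched in software as below. This is only an illustration of the logic that the analog circuit implements; the function name, the rectangular integration, and all numerical values (gate resistance, thresholds, waveforms) are hypothetical and purely synthetic.

```python
def gate_charge_fault(v_rg, v_ge, dt, r_g, v_ref, q_ref):
    """Return True if V_ge exceeds v_ref while the gate charge is below q_ref."""
    q_g = 0.0
    for v_r, v_g in zip(v_rg, v_ge):
        q_g += (v_r / r_g) * dt  # integrate gate current I_g = V_Rg / R_g
        if v_g > v_ref and q_g < q_ref:
            return True          # both conditions met: AND gate output high
    return False

# Synthetic, hypothetical traces: 1 A gate current, 100 ns sample period.
R_G, DT, V_REF, Q_REF = 10.0, 100e-9, 11.0, 600e-9
normal_vge = [2, 5, 8, 10, 10, 10, 10, 12, 15]  # Miller plateau at 10 V
hsf_vge = [2, 5, 8, 12, 15]                     # no plateau
normal = gate_charge_fault([10.0] * 9, normal_vge, DT, R_G, V_REF, Q_REF)
hsf = gate_charge_fault([10.0] * 5, hsf_vge, DT, R_G, V_REF, Q_REF)
```

In the synthetic normal trace, the charge delivered during the plateau has already passed the charge threshold by the time V_ge crosses the voltage threshold, so no fault is flagged; in the plateau-free trace the voltage threshold is crossed first.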
2) COLLECTOR-CURRENT-BASED APPROACH
It can be seen from Fig. 29 that the collector current I_c rises sharply to a high level under both the HSF and the FUL. Consequently, the current slope can be utilized to detect the short-circuit fault.
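A minimal sketch of slope-based detection is given here for illustration. It assumes that a fault is declared only when the slope threshold is exceeded on two consecutive slope samples, in the spirit of the direct method described next; the function name, the sampling period and the threshold are hypothetical.

```python
def slope_fault(samples, dt, slope_limit, consecutive=2):
    """Return the sample index at which a short circuit is declared, or None."""
    hits = 0
    for k in range(1, len(samples)):
        slope = (samples[k] - samples[k - 1]) / dt
        # Count consecutive exceedances; any normal slope resets the count.
        hits = hits + 1 if slope > slope_limit else 0
        if hits >= consecutive:
            return k
    return None

# Hypothetical values: 1 us sampling, 500 A/us slope threshold.
DT, LIMIT = 1e-6, 500e6
normal = slope_fault([100, 101, 102, 103], DT, LIMIT)
spike = slope_fault([100, 900, 901, 902], DT, LIMIT)
fault = slope_fault([100, 101, 900, 1800, 2700], DT, LIMIT)
```

Requiring consecutive exceedances trades a one-sample delay for robustness against single noisy samples.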
a: DIRECT METHOD [142]-[144]
In this method, the collector current I_c is measured directly by a direct current-current transformer (DCCT). To achieve reliable detection results, three sample-and-hold (S/H) circuits that can generate two current slopes are adopted with two phase-shifted clocks 1 and 2 (clock 2 lags clock 1 by a half period). This configuration can sample two consecutive slopes with the help of two calculators. Then, both slopes are compared with the threshold. If the slope exceeds the threshold for two consecutive samples, the short-circuit of the IGBT can be determined. Alternatively, the current can also be measured by a shunt resistor [144]. By comparing the sensed voltage with a predefined threshold, the short circuit can be recognized. Another idea is to separate a small part from the main IGBT to form a so-called sense emitter, by which the user can measure the current easily [143].

b: di/dt METHOD [145]-[148]
The costs of the direct methods are relatively high for either the user or the manufacturer. To reduce the cost, the di/dt methods, which are free from the high voltage, are introduced. It is well known that a current variation generates a variation of the magnetic flux, and thus an electromotive force is produced in a nearby coil. A coil near the busbar and a printed circuit board Rogowski coil have been applied in [147], [148]. Also, the electromotive force can be integrated to obtain the current. Therefore, both the current change rate and the current amplitude can be utilized to detect the short-circuit fault. Care should be taken with the common-mode voltage when measuring the electromotive force and with the DC offset during integration. The cost can be reduced further by making use of the auxiliary emitter according to [145]. The detection circuit is shown in Fig. 34.
When the short-circuit occurs, the rapidly increasing current leads to a voltage V_EE across the stray inductance between the power emitter and the Kelvin emitter. If an RC filter is applied in parallel with the stray inductance L_EE, the steep transient short-circuit current can be described as in (63), where C_f, R_f and V_o are the filter capacitance, filter resistance, and output voltage of the filter, respectively. It can be found from (63) that I_c is proportional to V_o with constant C_f, R_f and L_EE. Under the normal condition, the S-terminal of the R-S latch is in a high state. After the short-circuit, the S-terminal can reach the maximum allowable low-level input voltage of the latch circuit. By setting the R-terminal to "1" for a single mode and to "PWM" for multiple modes, the fault can be detected.
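The proportionality between I_c and V_o can be made plausible with a short derivation. The following is a sketch under stated assumptions rather than a reproduction of (63): an ideal first-order R_f-C_f low-pass filter across L_EE, zero initial conditions, a filter time constant much longer than the current rise time, and signs chosen for convenience.

```latex
\begin{align}
  V_{EE} &= L_{EE}\,\frac{dI_c}{dt}
    && \text{(voltage across the stray inductance)} \\
  R_f C_f\,\frac{dV_o}{dt} + V_o &= V_{EE}
    && \text{(first-order RC filter)} \\
  V_o &\approx \frac{1}{R_f C_f}\int_0^t V_{EE}\,d\tau
       = \frac{L_{EE}}{R_f C_f}\,I_c
    && \text{(for } R_f C_f \gg \text{rise time)} \\
  I_c &\approx \frac{R_f C_f}{L_{EE}}\,V_o
\end{align}
```

Under these assumptions the filter acts as an integrator, so its output tracks the collector current scaled by the constant L_EE/(R_f C_f).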
3) DE-SATURATION APPROACH [132], [149]-[152]
According to Fig. 29, the gate voltage is consistent with the collector-emitter voltage under the normal condition. When the short-circuit fault occurs, however, they follow different trends. With this concept, the fault detection circuit can be designed as in Fig. 35. The comparator is locked by an AND logic operator during the switch-off period. When the IGBT is turned on, V_ce measured through the diode D1 is compared with the threshold voltage V_ref. Under the normal condition, the saturation voltage V_ce is quite low and V_ref is set higher than that. Thus, the AND gate outputs a "0" signal. If the short-circuit fault occurs at this time, the IGBT will leave the saturation region, because the short-circuit current and hence V_ce rise, and V_ce rapidly exceeds V_ref. Then, the fault is detected, and further protection measures will be implemented. However, it should be noted that V_ce will be higher than V_ref for a while during the turn-on transient when the AND gate is activated. To avoid false alarms, a delay is introduced in such a way that the comparator remains locked until V_ce reaches V_ce,on.
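The interplay of the comparator, the gate-signal lock and the turn-on delay can be sketched as below. This is an illustrative software analogue of the circuit logic only; the function name, the blanking length expressed in samples, the threshold and the waveforms are hypothetical.

```python
def desat_fault(gate_on, v_ce, v_ref, blanking_samples):
    """Return the index at which desaturation is flagged, or None."""
    on_count = 0
    for k, (on, v) in enumerate(zip(gate_on, v_ce)):
        if not on:
            on_count = 0  # comparator locked while the IGBT is off
            continue
        on_count += 1
        # Ignore V_ce during the blanking delay after turn-on.
        if on_count > blanking_samples and v > v_ref:
            return k
    return None

# Hypothetical traces: V_ref = 7 V, blanking of 2 samples.
gate = [False, True, True, True, True, True]
normal = desat_fault(gate, [600, 400, 50, 2, 2, 2], 7.0, 2)
hsf = desat_fault(gate, [600] * 6, 7.0, 2)
ful = desat_fault([True] * 6, [400, 50, 2, 2, 300, 600], 7.0, 2)
```

The blanking length sets the trade-off: too short causes false alarms during the turn-on transient, too long delays the detection of an HSF.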
The detection time, costs and detection performance of the above short-circuit detection methods are compared in TABLE 10. The gate-based and de-saturation methods are faster than the direct current method because they can finish the detection during the transient processes. In terms of cost, the direct current method and the de-saturation method are more expensive because they require high-current or high-voltage components. The gate-based approach and the auxiliary-emitter-based approach are preferred for detecting the short circuit because of their fast diagnostic speed and low costs. However, the variation of the parasitic parameters caused by degradation should be taken into consideration when setting the thresholds, so that false alarms can be avoided.

C. FAULT ISOLATION CIRCUIT
After a fault is detected, it should be isolated from the main circuit by the fault isolation circuit as soon as possible. Thus, the propagation of the fault effect can be limited. It is easy to isolate the open-circuit faults by blocking the corresponding gate signals of the faulty IGBTs, as shown in Fig. 36(a). For the short-circuit fault, however, fuses and potentially additional components are required.

The method in [153]-[155] isolates the faulty devices by blowing out the fuse. Fig. 36(b) shows a circuit that can deal with the single switch short-circuit fault. Assuming that T_p is shorted and that this has been detected by the short-circuit detection methods, the TRIAC TR will be triggered. As a result, the fuse F_l will be blown by the shoot-through current, and the faulty leg is isolated. The half DC-link voltage can be obtained through a split capacitor or an auxiliary IGBT, depending on the fault-tolerant topology. However, because the fuse is on the load side, the phase short-circuit fault cannot be cleared by this circuit. Fig. 36(c) presents a similar solution. By contrast, this circuit is dedicated to the neutral leg fault-tolerant topology. The load impedance should be low so that the shoot-through current is large enough to blow the fuse. It should also be pointed out that the current rating of the TRIAC should be high enough to survive the shoot-through current.

The circuit in Fig. 36(e) triggers SCR_p and SCR_n on the faulty leg at the same time after the detection. Then, the capacitor charging current will blow the fuses to isolate the fault. In this sense, the value of the capacitance and the current ratings of the thyristor and fuse should be selected carefully to ensure reliable isolation.

3) NO FUSE METHOD [159]
A fuse-free solution for isolation is given in Fig. 36(f). When a single switch short-circuit happens (say T_p is faulty), the complementary switch and the TRIAC will be turned off after the detection. It has been discussed in [159] that the current in the faulty phase will reach a zero crossing under this condition. Nevertheless, this circuit cannot handle the phase short-circuit fault.

TABLE 11 compares the complexity and isolation ability of the circuits shown in Fig. 36. Regarding the additional expenses, the cost of a standard two-level three-phase converter is taken as the base of 1 p.u., with an IGBT and a TRIAC each counted as 0.17 p.u., a turns ratio adjusted IGBT as 0.10 p.u., a capacitor as 0.42 p.u., a thyristor as 0.09 p.u. and a fuse as 0.09 p.u.; the cost of all the isolation circuits can then be calculated [160].
Considering the capability of isolating all kinds of faults, fuse-on-leg circuits are preferred. However, if the cost is taken into consideration, the circuit in Fig. 36(d) is promising.
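The per-unit cost bookkeeping described above amounts to a weighted sum over the added components. The sketch below uses the unit costs quoted in the text; the function name and the bill of materials are hypothetical and do not correspond to any specific circuit of Fig. 36 or entry of TABLE 11.

```python
# Unit costs in p.u. of a standard two-level three-phase converter (from the text).
UNIT_COST_PU = {
    "igbt": 0.17,
    "triac": 0.17,
    "turns_ratio_adjusted_igbt": 0.10,
    "capacitor": 0.42,
    "thyristor": 0.09,
    "fuse": 0.09,
}

def added_cost_pu(bill_of_materials):
    """Sum the per-unit cost of the additional components."""
    return sum(UNIT_COST_PU[part] * qty for part, qty in bill_of_materials.items())

# Hypothetical bill of materials, for illustration only.
example_bom = {"triac": 3, "fuse": 3}
extra_cost = added_cost_pu(example_bom)  # 3*0.17 + 3*0.09 p.u.
```

As a sanity check on the base value, six IGBTs at 0.17 p.u. each come to about 1 p.u.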

D. FAULT-TOLERANT TOPOLOGY
After the fault has been detected and isolated, the converter should be reconfigured to keep operating, which requires the fault-tolerant topology. Normally, this kind of topology is realised by redundancy for two-level three-phase converters.
1) CONVERTER-REDUNDANT TOPOLOGY [160]-[166]
In this topology, two converters are utilized to drive the motor, which are connected to the dual stator windings or cascaded to the stator winding (see Fig. 37(a) and Fig. 37(d)). If a fault occurs, the converter with a faulty switch will be blocked by the isolation circuit, and the healthy converter supplies power to the load. In this case, the trade-off between output power and system cost should be considered. It is assumed that the output voltage and current of the standard converter are 1 p.u. For the dual stator windings load, the maximum voltage vector and current of the fault-tolerant topology are the same as those of the standard converter if the turns ratio adjustment is not considered. Otherwise, the current will be reduced to 0.5 p.u., and the cost will decrease at the same time. The cascaded topology works as a two-phase full-bridge converter after faults, which can provide 1 p.u. voltage and 0.58 p.u. current without the turns ratio adjustment, and 0.5 p.u. voltage and 0.58 p.u. current with the turns ratio adjustment. Both topologies require modifying the control.
2) LEG-REDUNDANT TOPOLOGY [153], [155]-[158], [167]-[171]
A fourth leg is adopted as a redundant phase in this topology, which is connected to the three legs through TRIACs, as shown in Fig. 37(b). Under the normal condition, the TRIACs are blocked and the additional leg is inactive. Under faulty conditions, the faulty leg will be disconnected by the fault isolation circuit, and the fourth leg will replace it by triggering the corresponding TRIAC. Thus, the system can still work as a three-phase two-level converter. At the same time, the gate signals of the faulty leg are moved to the fourth leg, and only a minor control modification is required. In this condition, the output current rating and voltage rating are the same as those of the standard converter.

Another kind of leg-redundant topology connects the fourth leg to the neutral point of the load, which is shown in Fig. 37(c). The TRIAC is activated after the fault is detected and isolated, by which the system operates as a two-phase full-bridge converter. In this case, the post-fault configuration with two-phase control can only provide 1 p.u. voltage and 0.58 p.u. current. It should be pointed out that the TRIAC will carry 1.73 p.u. current because of the zero-sequence current.
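The 0.58 p.u. and 1.73 p.u. figures can be checked with a short phasor calculation. The sketch below assumes the commonly used two-phase strategy in which the two healthy phase currents are raised by a factor of √3 and placed 60 degrees apart so that the stationary-frame current vector is unchanged; this strategy is an assumption here, as the derivation is not given in the text, and the function names are hypothetical.

```python
import cmath
import math

def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform applied to phasors."""
    alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    beta = (i_b - i_c) / math.sqrt(3.0)
    return alpha, beta

def post_fault_currents(i_ref):
    """Phase b, phase c and neutral phasors after phase a is lost."""
    mag = math.sqrt(3.0) * i_ref
    i_b = cmath.rect(mag, math.radians(-150.0))
    i_c = cmath.rect(mag, math.radians(150.0))
    return i_b, i_c, i_b + i_c  # the neutral carries the sum

# Phase current limited to 1 p.u. -> equivalent three-phase current 1/sqrt(3).
i_ref = 1.0 / math.sqrt(3.0)
i_b, i_c, i_n = post_fault_currents(i_ref)
healthy = clarke(cmath.rect(i_ref, 0.0),
                 cmath.rect(i_ref, math.radians(-120.0)),
                 cmath.rect(i_ref, math.radians(120.0)))
faulted = clarke(0.0, i_b, i_c)
```

With the phase current held at 1 p.u., the equivalent three-phase current is 1/√3 ≈ 0.58 p.u. and the neutral current magnitude is √3 ≈ 1.73 p.u., while `healthy` and `faulted` give the same α-β components.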
3) SPLIT CAPACITOR TOPOLOGY [154], [159], [167], [169], [171]-[179]
The split capacitor topology is similar to the leg-redundant one (see Fig. 37(e) and Fig. 37(f)). There are also two connections available. The first one replaces the faulty leg, working as a four-switch three-phase converter (see Fig. 37(e)). The other one is connected to the neutral point of the load, working as a four-switch two-phase converter (see Fig. 37(f)). The reconfiguration method is to trigger the TRIAC, which is identical to the leg-redundant topology. The benefit of this kind of topology is that the cost can be reduced. Nevertheless, the output is reduced. Additionally, the capacitor voltage unbalance should be considered in the control design phase and during operation.

The ratio η of output power to cost is defined in (64). The cost for the 1-p.u. system is the same as that in TABLE 11, and the output power for 1 p.u. is equal to that of the standard converter.

$$\eta = \frac{\text{Output power}}{\text{Cost}} \tag{64}$$

In terms of reliability, the percentage of mean time to failure (MTTF) is used, which is defined as the MTTF increase of the fault-tolerant topology over that of the standard converter. The operation condition is assumed to be identical to [153]. Therefore, the constant failure rates of the IGBT and the TRIAC are 7.236 and 0.8735 failures per 10^6 hours, respectively. Additionally, the current level factor of the TRIAC is considered, which yields 1.0881 failures per 10^6 hours for the neutral leg and four-switch two-phase topologies, and 0.7025 failures per 10^6 hours for the four-switch three-phase topology. Note that the current and voltage level factors are not considered in the failure rate of the IGBT. The reason is that it is treated as a combination of a MOSFET and a BJT from the handbook [180]. The MTTF can be calculated by the Markov reliability model, and the percentage of the extended MTTF (MTTF%) can be obtained with (65). The extra requirements in

Overall, Fig. 38 summarizes the reviewed online methods that can ensure the reliability of two-level power converters.
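To make the MTTF bookkeeping concrete, the following is a deliberately simplified sketch. It assumes a series model for the standard converter, a two-stage Markov chain (healthy, post-fault, failed) with perfect detection and isolation and no standby failures, and it reads MTTF% as the relative MTTF increase in percent; all function names and the leg-redundant example are hypothetical, so the numbers it produces are illustrative and should not be read as the values reported for the reviewed topologies.

```python
LAMBDA_IGBT = 7.236    # failures per 10^6 h (from the text)
LAMBDA_TRIAC = 0.8735  # failures per 10^6 h (from the text)

def mttf_series(rates_per_1e6_h):
    """MTTF in hours of a series system with constant failure rates."""
    return 1e6 / sum(rates_per_1e6_h)

def mttf_two_stage(rate_healthy, rate_post_fault):
    """MTTF of a healthy -> post-fault -> failed chain (rates per 10^6 h)."""
    return 1e6 / rate_healthy + 1e6 / rate_post_fault

def mttf_gain_percent(mttf_ft, mttf_std):
    """Assumed reading of MTTF%: relative MTTF increase in percent."""
    return 100.0 * (mttf_ft - mttf_std) / mttf_std

# Standard converter: six IGBTs, any single failure stops the system.
mttf_std = mttf_series([LAMBDA_IGBT] * 6)

# Hypothetical leg-redundant case: six IGBTs before the fault,
# six IGBTs plus one conducting TRIAC after reconfiguration.
mttf_ft = mttf_two_stage(6 * LAMBDA_IGBT, 6 * LAMBDA_IGBT + LAMBDA_TRIAC)
gain = mttf_gain_percent(mttf_ft, mttf_std)
```

A fuller Markov model would add imperfect fault coverage and the application-specific stress factors, both of which reduce the gain obtained from this idealised chain.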

IV. COMPARISON BETWEEN CONDITION MONITORING AND FAULT TOLERANCE
Condition monitoring and fault tolerance have been reviewed above, both of which can ensure the reliable operation of power converters. However, the effectiveness of the two approaches may differ according to the target application. In this section, possible strategies for condition monitoring and fault tolerance will be given first. Then, they will be compared in different aspects, including maintenance availability, converter stress level, costs, mission importance, etc.

A. POSSIBLE STRATEGY FOR MONITORING AND TOLERANCE
It has been discussed in Section II and Section III that there are many ways to realize condition monitoring and fault tolerance. Therefore, it is necessary to select a possible strategy for each of the two solutions to be compared.

1) MONITORING STRATEGY
A good monitoring strategy should be able to handle all the common degradations, which have been discussed in Section II. Considering that each degradation mechanism has more than one indicator, and each indicator has several extraction methods, the most cost-effective strategy should be investigated for condition monitoring. For the bond wire fatigue and metallization reconstruction, it can be found from  Fig. 36(d)) method is selected to isolate the faults due to its low cost and good isolation ability, as can be seen in TABLE 11. As for the fault-tolerant topology, the best option depends on the specific application. For example, the fourth leg topology is the best solution for power-critical applications, as it has the highest post-fault output power. Similarly, the four-switch three-phase topology is suitable for reliability-critical applications, as it has the highest MTTF%.

B. SOLUTION COMPARISON FOR DIFFERENT APPLICATIONS
The performance of both condition monitoring and fault tolerance varies with the application domain, as it is related to the maintenance availability, the stress level of the converter, cost considerations, mission requirements, etc.

1) MAINTENANCE AVAILABILITY
The maintenance availability is of paramount importance to condition monitoring, because the converter needs to be maintained as soon as possible after the degradation has been detected, as the degradation will accelerate the wear-out process. If it takes too long to carry out the maintenance, the converter may fail first, making it unreliable. In extreme circumstances, taking a spacecraft as an example, repair is almost impossible even if the degradation is detected. Thus, the more difficult the maintenance is to obtain, the worse the monitoring performance is. On the contrary, the tolerance strategy is immune to maintenance availability, because it improves reliability by means of redundancy, which is independent of the maintenance.

2) STRESS LEVEL
The stress level that the converter suffers is another factor that may have impacts on the performance of reliability solutions. Most catastrophic failures are caused by overstresses rather than wear-out, which implies that the monitoring strategy may not be able to ensure the reliability of the converter in this case. The reason lies in that it could not predict this kind of failure and no maintenance can be implemented in time.
On the other hand, fault tolerance can detect a catastrophic failure and reconfigure the converter in time. Thus, the fault can be mitigated and the converter can be brought back to normal operation. Therefore, a harsher working environment leads to poorer monitoring performance, and the tolerance strategy outperforms the monitoring strategy in this case.

3) COST
In industrial applications, the cost is also a key factor in addition to the performance. For the monitoring strategy, the circuit cost mainly comes from the devices that block V_dc, namely, the diodes and/or the MOSFETs. The expense of the measurement of V_ge is relatively low, considering that it could be integrated into the gate driver. On the contrary, the circuit cost of the fault tolerance strategy is rather high. Although the cost of the fault detection circuits/algorithms is low, both the fault isolation circuit and the tolerant circuit have costs of the same order as the converter itself, as shown in TABLE 11 and TABLE 12. Thus, the monitoring solution is much better than the tolerance solution when the power rating of the converter is high. Yet this advantage decreases as the power rating declines. The same is true for the volume and weight, because the cost of silicon devices is proportional to the volume and weight. In this sense, the monitoring strategy is better for those applications that are cost-limited, volume-limited and weight-limited, e.g., electric vehicles.

4) MISSION IMPORTANCE
The performance of the two solutions is also affected by the output requirement of the mission. If the output is not constrained to a certain level, i.e., the output could be flexible, condition monitoring could extend the operational lifetime of the converter by employing smart derating even if maintenance is not available. More concretely, when the degradation of a component is detected by the condition monitoring system, the output of the converter will be reduced. As a result, the stress on the component can be diminished and the converter can work for a longer time than expected. Under this condition, fault tolerance can achieve similar performance with a limited cost increase by utilising the split capacitor topologies given in Fig. 37. Nevertheless, if the output requirement is critical, i.e., no derating or interruption is allowed, fault tolerance has the advantage, because it requires no maintenance and can deal with high stress.
Overall, the comparison of the aforementioned two strategies for reliable converter operation in different applications is given in Fig. 39. Condition monitoring has the advantage in cost-, weight- and volume-limited applications where maintenance can be accessed easily. On the other hand, the fault tolerance solution outperforms condition monitoring in mission-critical applications where maintenance is unlikely to be available and the converter works under high stress.

V. CONCLUSION AND PROSPECTS
In this paper, the monitoring strategies and tolerance strategies that can improve the reliability of power converters have been reviewed. In terms of monitoring, why IGBT devices degrade, what indicators can be used to monitor the degradation, and how to extract these indicators were discussed and compared in detail. As for fault tolerance, the fault detection algorithms and circuits, fault isolation circuits and fault-tolerant topologies were also reviewed and compared. Although monitoring can improve the reliability of power converters, its performance is limited, as it is a cheap and affordable method. On the other hand, the fault-tolerant strategies provide better performance, but the associated cost is also higher.
Besides the advances in reliability enhancement strategies, which have been presented in this paper, we have identified four main challenges to cope with in the near future: 1) Understanding the multiple-parameter shift caused by wear. All the modern condition monitoring techniques are based on calibration or re-calibration methods during operation over long-term time scales. In the case of multidimensional degradation, though, calibration strategies are not effective. This is an intrinsically big challenge in terms of knowledge; 2) Similarly to 1), a way to reliably decouple the temperature-sensitive electrical parameters and damage-sensitive electrical parameters (DSEPs) is highly demanded. To achieve this, a deeper physical insight, especially on the damage mechanism side, needs to be gained; 3) Regarding fault-tolerant strategies, reliable detection methods for the open-switch fault under near-zero current conditions and for multiple faults (i.e. more than three IGBTs failing at the same time) are demanded. Emerging technologies, such as artificial intelligence and deep learning, should be explored in the near future; 4) Fault-tolerant topologies are far from being as efficient and cheap as expected (see TABLE 12). New concepts to enhance the overall efficiency of fault-tolerant topologies are highly demanded.

He was a postgraduate student with Southeast University, China, from 2009 to 2011. In 2013, he spent three months as a Visiting Scholar with Texas A&M University, USA. He is currently an Associate Professor with the Department of Energy Technology, Aalborg University, where he also serves as the Vice Program Leader for the research program on photovoltaic systems. His current research is on the integration of grid-friendly photovoltaic systems with an emphasis on the power electronics converter design, control, and reliability.
Dr. Yang was a recipient of the 2018 IET Renewable Power Generation Premium Award. He is the Chair of the IEEE Denmark Section. He is the General Co-Chair of the IEEE International Future Energy Challenge (IFEC 2020) and a Publicity Co-Chair of the IEEE Energy Conversion Congress and Exposition (ECCE 2020).

He is currently a Professor of reliable power electronics with Aalborg University, Denmark, where he is also part of CORPE, the Center of Reliable Power Electronics. His research interests are in the field of reliability of power devices, including mission-profile based life estimation, condition monitoring, failure modeling, and testing up to MW-scale modules under extreme conditions. He has authored or coauthored more than 210 publications in journals and international conferences, three book chapters, and four patents. Besides publication activity, over the past years, he has contributed 17 technical seminars about reliability at leading conferences such as ISPSD, EPE, ECCE, PCIM, and APEC.