Introduction
Insulated-gate bipolar transistors (IGBTs) are widely used in solar energy, wind energy, electric vehicles, and other energy industries. In power electronics, IGBTs are estimated to be responsible for 34% of all inverter failures, indicating that the reliability of power electronics is highly dependent on the reliability of the IGBT module [1], [2]. It is desirable that the failures of IGBT occur in a fail-safe manner, and an online degradation state assessment helps in reaching that goal.
Several papers have concentrated on physics-of-failure-based and data-driven techniques for early anomaly detection and lifetime prediction of IGBTs. In particular, Valentine et al. [3] investigated the physics of failure of metal–oxide–semiconductor field-effect transistors (MOSFETs) and IGBTs and hypothesized that the failures are a combination of manufacturing defects and poor thermal management. Patil et al. [4]–[6] tested the degradation performance of IGBT under different stress conditions like temperature. Ghimire et al. [7], [8] studied online degradation diagnosis using a single parameter. Patil et al. [9]–[14] carried out the diagnostics and prognostics of IGBT by using Mahalanobis distance (MD) of the selected parameters. Sutrisno et al. [15] used a
In the process of predicting the degradation state of the module, collector to emitter voltage,
The distance-based method and the self-organizing map method are very commonly used for anomaly detection or degradation assessment. However, these two methods may give misleading results when the module passes through more than one degradation stage, especially when several degradation modes are superimposed in the later period, which can cause irregular fluctuations in the data.
In this paper, a prediction interval-based methodology for degradation state assessment of IGBTs has been developed based on run-to-failure measurements of multiple IGBTs. The time series raw data of collector voltage and case temperature were used to compute the power. The data were first preprocessed to remove the transients and nonoperational time periods and then downsampled before being labeled using a 3D representation of the preprocessed data. Thereafter, the degradation assessment was conducted using three different methods: i) the Mahalanobis distance (MD)-based method, ii) the self-organizing map (SOM), and iii) the developed prediction interval-based classification. The novel method was developed after critically analyzing the MD and SOM methods and their inability to classify the degradation states. The results are compared to determine the accuracy and effectiveness of the methods for online degradation state assessment. Figure 1 shows the flowchart of the approach that has been followed.
The rest of the paper is organized as follows. Section II presents the measurement setup and the data collection process, along with plots of the raw data that reveal some key trends. Section III discusses the data preprocessing. In Section IV, the results using distance-based and cluster-based methods are presented and analyzed. Section V explains the prediction interval-based method and presents and analyzes the results on multiple modules to validate the applicability of the developed methodology. Finally, Section VI presents the conclusions.
Measurement Setup
The measurement setup for the run-to-failure measurements of the IGBT is shown in Figure 2. The IGBT module (Figure 2a) is installed in an aging chamber shown in Figure 2b and wired according to the circuit diagram shown in Figure 2c. The measurements can be controlled from outside the chamber using the control panel shown in Figure 2d. The IGBT module, also called the device under test (DUT), was powered by two supplies, a program-controlled test power supply (PWR) of 5 V/300 A and a gate-foot program control power supply (VG) of 0–15 V. The gate pin series resistance (RG) of
In the paper, the case temperature is used to represent the effects of changes in the junction temperature. In order to validate this hypothesis, a simulation model before and after the degradation [22] was built as shown in Figure 3a. The junction to case thermal impedance is estimated using the transient double interface method whereas the case to ambient impedance is estimated by using experimental case temperature data and parameter estimator tool. Simulation results and temperature variations for both case and junction temperature are shown separately in Figure 3b and Figure 3c respectively. The results show that both case and junction temperature demonstrate similar trends before and after degradation as shown in Figure 3c. Therefore, it is reasonable to use the case temperature to represent the effects of changes in the junction temperature.
Data Preprocessing
Throughout this paper, we give values for the case temperature and the collector voltage. The power measurements have similar behavior to the collector voltage because of the constant current condition. Figure 4 and Figure 5 show the plots for the measured data after removing the off-phase data. It is observed that the original data did not show anything of significance for the case temperature or collector voltage before and after the removal of the off phase data. Thus, to avoid redundancy and make useful information clear, the original data is not shown here.
A. Removal of Off-Phase
The first step in data preprocessing is to delete the data where the circuit is switched off (the off-phase data). Figure 4 and Figure 5 show the plots for the measured data after deleting the off-phase data. Since we have deleted the off-phase data, we do not label the
Three distinct phases are evident in both these figures. In phase 1, both the case temperature and collector voltage are relatively constant. After 350 K data points, there is a jump in the values but the net trend is still constant. After 500 K data points, there are some fluctuations in both the case temperature and collector voltage.
B. Removal of Transient Data
In Figure 4, there are large peaks in case temperature that occur every time the aging machine is restarted. The temperature increases rapidly during that phase: it takes several minutes for the case temperature to go from 20°C to 70°C. We refer to these fluctuating data points as transient data points, defined as points where the temperature increases by more than 5°C between two adjacent points. The transient data points do not represent the degradation of the IGBT and may lead to incorrect conclusions, since the initial transient data points may interfere with the stable data points after degradation has occurred. This is shown in Figure 8a, a 2-D plot of collector voltage versus case temperature without the transient data removed. The transient data make up only 2% of all data, so removing them will not affect the result. Figure 8b shows the same 2-D plot after removing the transient points; the plot is much cleaner. In this paper, only the data remaining after removal of the transient phase are used for health assessment. Figure 6 and Figure 7 show the plots after removing the transient data points.
Collector voltage vs case temperature before (a) and after (b) removing the transient data. The red dots in (b) show the mean value of the group of data.
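The transient criterion described above (a case temperature rise of more than 5°C between two adjacent points) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `remove_transients`, the choice to always keep the first sample, and the sample values are assumptions, not taken from the measurement software.

```python
def remove_transients(case_temp, collector_v, max_rise=5.0):
    """Drop samples whose case temperature rose by more than `max_rise` (°C)
    relative to the previous raw sample; return the kept (temp, voltage) pairs."""
    kept = []
    for i, (temp, volt) in enumerate(zip(case_temp, collector_v)):
        # The first sample has no predecessor, so it is kept by convention
        # (an assumption of this sketch).
        rise = temp - case_temp[i - 1] if i > 0 else 0.0
        if rise <= max_rise:
            kept.append((temp, volt))
    return kept


# Hypothetical warm-up after a restart, followed by stable operation.
temps = [20.0, 28.0, 37.0, 45.0, 52.0, 58.0, 63.0, 67.0, 69.5, 70.0]
volts = [1.60, 1.62, 1.64, 1.66, 1.68, 1.70, 1.71, 1.72, 1.72, 1.72]
stable = remove_transients(temps, volts)
```

Because each sample is compared only with its immediate predecessor, the filter can run sample by sample as data arrive.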
C. Data Downsampling
Figure 8b shows how the data points are downsampled. It shows four groups of 100 data points along with their average value. The groups are chosen from different regions of the input space: the first group is from the first 100 points, the second group is taken right after 200 K data points, the third group after 400 K data points, and the fourth group after 600 K data points. These samples show interesting patterns. For the first group, the collector voltage increases from 1.72 V to 1.78 V with an average marked with a red circle, and the temperature remains relatively constant. For the second group, the average temperature increases a little. For the third group, the average temperature further increases and this time by more than 12°C. However, for the fourth group (taken after 600 K data points), the temperature decreases, making a different angle from the start of the process as shown in Figure 8b.
Different window sizes were considered for downsampling. For instance, averaging over 1000 data points or over 100 data points gave similar results. However, a window size of 100 data points retains more data points, which is helpful for the degradation state assessment. For the remainder of the analysis, the downsampled data are used. Every step is described in detail here so that the entire process can be implemented online.
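As a minimal sketch of the window averaging, the following Python function replaces each block of 100 consecutive samples with its mean. The function name `downsample` and the decision to drop an incomplete final window are assumptions made for illustration.

```python
from statistics import fmean


def downsample(values, window=100):
    """Average consecutive, non-overlapping windows of `window` samples.
    An incomplete final window is dropped (an assumption of this sketch)."""
    n_full = len(values) // window
    return [fmean(values[k * window:(k + 1) * window]) for k in range(n_full)]


# 250 hypothetical voltage samples become two averaged points.
raw = [1.72 + 0.0001 * i for i in range(250)]
averaged = downsample(raw, window=100)
```

In an online setting the same result can be obtained by buffering incoming samples and emitting one mean each time the buffer reaches the window size.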
D. Data Labeling
In order to see the degradation trend of the entire dataset, Figure 9 shows a 3D plot of the data. The downsampled data for collector voltage and case temperature has been plotted with data points (time) increasing in the
Degradation State Assessment
Two methods are commonly used for degradation level assessment: a distance-based method and a clustering-based method. In this section, both methods are discussed, and the results obtained with each of them are presented and analyzed.
A. Distance-Based Method
Mahalanobis distance (MD) is one of the most commonly used distance metrics. The MD represents the covariance distance between the test data and the reference distribution. It takes into account the relationship between the various features of the reference distribution. It is an effective way to calculate the similarity of two unknown sample sets. The MD of sample $x_{i}$ is given by Eq. (1), where $\mu$ and $S$ are the mean vector and the covariance matrix of the reference distribution: \begin{equation*} {\mathrm {MD}}_{i}=\sqrt {\left ({x_{i}-\mu }\right)^{T}S^{-1}\left ({x_{i}-\mu }\right)}\tag{1}\end{equation*}
Figure 11 shows the MD for all data points. Note that over the course of the run-to-failure measurements, the overall trend of the aging data points is increasing. There is a decrease phase after 5500 data points while the module is in the degraded state. Thus, MD does not represent the degradation of the module accurately. The decrease phase in the MD is the degradation level 3 in Figure 10. In degradation level 3, some data points are nearer to the initial (normal) data points than in degradation level 2. Therefore, if the data has this kind of fluctuating trend, the distance-based metric may not work.
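Eq. (1) can be illustrated with a short Python sketch for the two-feature case (for example, collector voltage and case temperature), where the 2×2 covariance matrix can be inverted in closed form. The function names `fit_reference` and `mahalanobis`, the restriction to two features, and the reference points are assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean


def fit_reference(ref):
    """Return the mean vector and inverse 2x2 sample covariance of the
    reference (healthy) points, each given as a (feature1, feature2) pair."""
    n = len(ref)
    m1 = fmean(p[0] for p in ref)
    m2 = fmean(p[1] for p in ref)
    s11 = sum((p[0] - m1) ** 2 for p in ref) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in ref) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in ref) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("singular covariance matrix")
    inv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))
    return (m1, m2), inv


def mahalanobis(x, mean, inv):
    """Eq. (1): sqrt((x - mu)^T S^-1 (x - mu)) for a two-feature sample."""
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    q = d1 * (inv[0][0] * d1 + inv[0][1] * d2) + d2 * (inv[1][0] * d1 + inv[1][1] * d2)
    return sqrt(max(q, 0.0))  # guard against tiny negative rounding errors


# Hypothetical reference points and one test point.
mean, inv = fit_reference([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
md = mahalanobis((3.0, 1.0), mean, inv)
```

The MD depends only on how far a point lies from the healthy distribution, not on the direction in which it has moved. A degraded point that drifts back toward the reference cloud therefore receives a small MD, which is the behavior observed in degradation level 3.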
B. Self-Organizing Map
A self-organizing map (SOM) is a type of clustering method that facilitates data visualization by projecting a multi-dimensional feature space onto a two-dimensional map. A SOM is an unsupervised learning method: it gathers data with similar characteristics through competition among the neurons to form a two-dimensional map. The entire process of the SOM is shown in Figure 12.
Generating a self-organizing map (SOM) and calculating minimum quantization error (MQE).
All the data is normalized in the first step. In the second step, the map size, or the number of neurons, is determined by the number of healthy data points, as shown in Eq. (2).\begin{equation*} M\approx 5\sqrt {N}\tag{2}\end{equation*}
\begin{equation*} w_{i}\left ({t+1 }\right)=w_{i}\left ({t }\right)+h_{ci}\left ({t }\right)\left [{x\left ({t }\right)-w_{i}\left ({t }\right)}\right ]\tag{3}\end{equation*}
\begin{align*} h_{ci}\left ({t }\right)=&\eta \left ({t }\right)\exp \left ({-\frac {{\parallel r_{c}-r_{i}\parallel }^{2}}{2 {\sigma \left ({t }\right)}^{2}} }\right) \tag{4}\\ \eta \left ({t }\right)=&\eta \left ({0 }\right)\exp \left ({-\frac {t}{n} }\right) \tag{5}\\ \sigma \left ({t }\right)=&\sigma \left ({0 }\right)\exp \left ({-\frac {t}{n/\ln \sigma \left ({0 }\right)} }\right)\tag{6}\end{align*}
Once the optimal weights are obtained for an input, another training data vector is picked and the process is repeated until optimal weights have been computed for the entire training data and at least 200 iterations have been reached. The process produces a SOM as shown in Figure 13. The SOM clusters data points of different degradation levels together: the blue region contains the normal data points, the green region on the right represents degradation level 2, and the yellow region contains the data points that are severely damaged (degradation level 3).
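The training loop described by Eqs. (2)–(6) can be sketched in plain Python as follows. Several choices here are assumptions made only for illustration: the neurons are arranged on a square grid whose side is derived from Eq. (2), the weights are initialised from randomly chosen samples, one random sample is drawn per iteration, the input is taken to be already normalized, the weight update uses the standard vector difference between the input and the weight, and the names `train_som`, `eta0` and `sigma0` are hypothetical.

```python
import math
import random


def train_som(data, n_iter=200, eta0=0.5, seed=0):
    """Train a small SOM on already-normalized data; return (weights, side)."""
    rng = random.Random(seed)
    n_neurons = 5 * math.sqrt(len(data))            # Eq. (2): M ~ 5 * sqrt(N)
    side = max(2, round(math.sqrt(n_neurons)))      # square grid (assumption)
    grid = [(r, c) for r in range(side) for c in range(side)]
    sigma0 = max(side / 2, 1.0)                     # initial neighbourhood radius
    dim = len(data[0])
    weights = [list(rng.choice(data)) for _ in grid]

    for t in range(n_iter):
        eta = eta0 * math.exp(-t / n_iter)                          # Eq. (5)
        sigma = sigma0 * math.exp(-t * math.log(sigma0) / n_iter)   # Eq. (6)
        x = rng.choice(data)
        # Best-matching unit c: the neuron whose weight is closest to x.
        c = min(range(len(grid)), key=lambda i: math.dist(x, weights[i]))
        for i, (r, col) in enumerate(grid):
            d2 = (grid[c][0] - r) ** 2 + (grid[c][1] - col) ** 2
            h = eta * math.exp(-d2 / (2 * sigma ** 2))              # Eq. (4)
            for k in range(dim):
                weights[i][k] += h * (x[k] - weights[i][k])         # Eq. (3)
    return weights, side


# Sixteen hypothetical normalized two-feature points.
data = [(0.1 * (i % 4), 0.1 * (i // 4)) for i in range(16)]
weights, side = train_som(data)
```

With this decay schedule the neighbourhood radius shrinks from `sigma0` to 1 over the `n_iter` iterations, so early updates move large regions of the map and later updates refine individual neurons.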
Unlike MD, SOM is able to find three clusters from the input. The SOM here is constructed based on all the data points in the aging or run-to-failure measurements. However, practical applications require online degradation assessment. If we want to classify assuming online arrivals, SOM becomes infeasible. For instance, Figure 14 shows the SOM map based on the initial (normal) data points. The observation data points are used as the input of the SOM and the minimum quantization error (MQE) as the output. Even though all the data points belong to the same class, the SOM still tries to cluster them in different classes and hence assigns incorrect labels to the data points. In fact, the performance is even worse than MD-based classification for the online case. The results are shown in Figure 15.
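For the online case, the minimum quantization error of an observation can be sketched as the distance to the closest neuron weight of the map trained on the healthy data. The function name `mqe`, the use of the Euclidean distance, and the example weights are assumptions made for illustration.

```python
import math


def mqe(x, weights):
    """Minimum quantization error: Euclidean distance from observation x
    to the nearest neuron weight vector of the trained map."""
    return min(math.dist(x, w) for w in weights)


# Hypothetical weights of a map trained on healthy data, and two observations.
healthy_weights = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
near = mqe((0.05, 0.05), healthy_weights)
far = mqe((0.5, 0.5), healthy_weights)
```

Like the MD, the MQE measures only the distance from the healthy region, so it inherits the same weakness when degraded data drift back toward the reference points.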
Prediction Interval-Based Method
In order to carry out an online degradation assessment, a prediction interval-based method has been developed. A prediction interval is a type of interval estimate, computed from the statistics of the observed data. Factors affecting the width of the prediction interval include the size of the sample, the confidence level, and the variability in the sample.
The prediction interval is based on the existing fit to the data; it accounts for both the uncertainty in estimating the population mean and the random variation of the individual values. The prediction interval is calculated using Eqs. (7)–(9): \begin{align*}&(\bar {x}-t_{b}\left ({n-1 }\right)\times s_{pred},\bar {x}+t_{b}\left ({n-1 }\right)\times s_{pred})\tag{7}\\&s_{pred}^{2}=s^{2}+s_{\widehat {y^{\ast }}}^{2}\tag{8}\\&s_{\widehat {y^{\ast }}}^{2}=s^{2}\left({\frac {1}{n}+\frac {(x^\ast -\bar {x})^{2}}{\Sigma (x_{i}-\bar {x})^{2}}}\right)\tag{9}\end{align*}
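A minimal Python sketch of Eqs. (7)–(9) is given here for the simplest case, in which the interval is built around the sample mean of the reference data, so the second term of Eq. (9) vanishes and $s_{pred}^{2}=s^{2}(1+1/n)$. The critical value is passed in by the caller. In the usage lines a standard normal quantile stands in for the Student $t$ quantile, which is only a large-sample approximation. The function name `prediction_interval` and the example data are assumptions.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def prediction_interval(ref, t_crit):
    """Two-sided prediction interval around the mean of the reference data.
    Assumes x* equals the sample mean, so Eq. (9) reduces to s^2 / n."""
    n = len(ref)
    mean = fmean(ref)
    s = stdev(ref)                      # sample standard deviation
    s_pred = sqrt(s ** 2 + s ** 2 / n)  # Eq. (8) with the reduced Eq. (9)
    return mean - t_crit * s_pred, mean + t_crit * s_pred   # Eq. (7)


# Critical value for a two-sided 0.9999 level. The normal quantile (about 3.89)
# is used as a stand-in for t_b(n - 1); this is an approximation for large n.
z = NormalDist().inv_cdf(1 - (1 - 0.9999) / 2)

# Hypothetical healthy reference values.
ref = [1.70, 1.72, 1.71, 1.73, 1.72, 1.71, 1.70, 1.72]
lower, upper = prediction_interval(ref, z)
```

For a small reference set such as the eight values used here, the Student $t$ quantile is noticeably larger than the normal quantile, so the exact $t$ value should be used when the reference set is short.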
In this paper, the initial 1000 data points are used as the reference data points, and the prediction bound is drawn based on the prediction interval of 0.9999 to make sure that the bound can cover most of the healthy data. The prediction bound separates data in degradation level 1 and 2 from degradation level 3. When the online test data is within the bound, it is either in degradation level 1 or 2. When 3 consecutive data points of the online test data points exceed the bound, the module is considered to be damaged and hence in degradation level 3. When the test data is within the prediction bound, MD is used to divide the state of the module by a threshold value. The conservative threshold value of
To show how degradation levels 1 and 2 can be separated from level 3, the absolute value of the difference from the mean value is plotted in Figure 17. The threshold is based on the confidence bound that ensures that 99.99% of the first 1000 data points are within the bounds. When three consecutive data points are over the threshold, the module is considered to be in degradation level 3. Note that each point here is the mean of 100 data points. The number three was chosen arbitrarily; requiring three consecutive points avoids possible misjudgment and is a fairly conservative criterion, since the impact of three hundred raw data points is considered before making a declaration. Once the module is classified as degradation level 3, it is not classified as level 2 again.
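The three-consecutive-points rule can be sketched as a simple run counter. The function name `first_level3_index` and the convention of returning the index of the point that completes the run are assumptions made for illustration.

```python
def first_level3_index(values, lower, upper, run_length=3):
    """Return the index at which `run_length` consecutive out-of-bound points
    have been seen for the first time, or None if that never happens."""
    run = 0
    for i, v in enumerate(values):
        # An in-bound point resets the counter, so isolated outliers are ignored.
        run = run + 1 if (v < lower or v > upper) else 0
        if run >= run_length:
            return i
    return None


# Hypothetical downsampled values with bounds (-1, 1).
series = [0.0, 5.0, 0.0, 5.0, 5.0, 5.0, 0.0]
declared_at = first_level3_index(series, -1.0, 1.0)
```

Since the declaration is never revoked, every point from the returned index onward is treated as degradation level 3, even if later values fall back inside the bound.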
Figure 17 clearly demonstrates that this approach can easily separate degradation level 3 from degradation levels 1 and 2. To distinguish degradation level 2 from level 1, MD is used; the result is shown in Figure 18. It can be seen that the data start in degradation level 1 and move to degradation level 2 after around 3100 data points. The degradation assessment of the prediction interval method can be realized in an online manner.
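Putting the two criteria together, the labeling of each downsampled point can be sketched as follows. The function name `label_levels`, the argument `md_threshold`, and the numbers in the usage lines are hypothetical placeholders and not the values used in this paper. Whether level 2 should also be latched once reached is not assumed here.

```python
def label_levels(md_values, level3_index, md_threshold):
    """Assign degradation level 1, 2 or 3 to each point.
    `level3_index` is the point at which level 3 was declared (or None)."""
    labels = []
    for i, md in enumerate(md_values):
        if level3_index is not None and i >= level3_index:
            labels.append(3)        # level 3 is latched once declared
        elif md > md_threshold:
            labels.append(2)        # inside the bound, but far from healthy data
        else:
            labels.append(1)
    return labels


# Hypothetical MD values, with level 3 declared at index 3.
labels = label_levels([0.5, 1.0, 4.0, 5.0, 0.2], level3_index=3, md_threshold=3.0)
```

Each label depends only on the current MD value and on whether level 3 has already been declared, so the labeling can run point by point.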
To verify the effectiveness of the developed method, the run-to-failure measurements of another IGBT (module B) were carried out, as shown in Figure 19. This module hardly experienced degradation level 2 and jumped directly from degradation level 1 to level 3. This result supports the effectiveness of the prediction interval-based degradation classification. The method achieved a classification accuracy of 98.4%; the errors were 76 data points classified as degradation level 3 instead of level 1. However, all those data points occurred just before the module reached degradation level 3, as shown in Figure 20. Some red data points within the prediction bound are misclassified, but they account for less than 2% of the total points. This validates the applicability of the developed method across different modules.
To demonstrate the generalizability of the method, additional modules aged under different aging modes are used to validate the methodology. As mentioned in Section II, there are two aging modes: one uses the temperature range and the other sets the on-off time. The temperature range for the aging of module C is set to 40°C–90°C, whereas the temperature range for the aging of module D is set to 30°C–70°C.
The time-series data points distribution and the prediction interval results for module C are shown separately in Figure 21a and Figure 21b.
Time series data points distribution and classification by prediction interval-based method of module C.
The time-series data points distribution and the prediction interval results for module D are shown separately in Figure 22a and Figure 22b.
Time series data points distribution and classification by prediction interval-based method of module D.
The results show that the prediction interval method worked well for both module C and module D: the degraded state was effectively detected in each case. However, unlike modules A and B, neither of them experienced degradation level 2; they moved directly from a healthy state to an unhealthy state shortly before they stopped working. Distance-based methods and self-organizing map-based methods are not suitable for such variations in the data, as they are more effective for data with little variance. The proposed methodology, in contrast, works well both for modules that experience three degradation levels and for modules that move directly from the healthy state to degradation level 3.
In addition, a change in ambient temperature affects both the case temperature and the voltages, but it does not affect the degradation assessment.
Conclusion
This paper developed a prediction interval-based classification methodology for the degradation state assessment of IGBT modules. First, we showed that Mahalanobis distance (MD) and self-organizing maps (SOMs) are not suitable for degradation assessment of IGBT modules in an online manner. Mahalanobis distance-based classification is not suitable because of the fluctuations of features such as collector voltage and temperature as the module gets degraded. Self-organizing maps are suitable for clustering the data points with similar features when all the data points are available. Online classification requires a large amount of training data, which may not be suitable for practical applications. The results of online degradation assessment using self-organizing maps and minimum quantization error (MQE) of all data points were no better than the Mahalanobis distance results.
The developed prediction interval-based method outperforms the self-organizing map and Mahalanobis distance methods for online data degradation assessment. The method classifies the data points into three degradation levels in an online manner and does not require a large number of data points (the initial 1000 points in the healthy state) for training. Furthermore, the prediction interval-based method can be extended to other modules with more than 98% accuracy.
The prediction interval-based method takes the distribution of the data in the feature space into consideration, in contrast to distance-based methods (not only the Mahalanobis distance), which are suitable only for data with two degradation states and may result in modules being replaced earlier than required. The Mahalanobis distance method is based on the distribution of healthy data and uses the MD as an indicator. The MD results could be misleading when the measurement data fluctuate, as happens when there are multiple degradation modes. The self-organizing map was not as good as MD-based classification when it was applied to online degradation assessment.
The work in the paper was carried out under experimental conditions. For further work, real working conditions will be considered. For real application situations, a simulation model will be built that takes the real power signal and ambient temperature as the input and outputs case temperature and collector voltage. The simulation results will be obtained and compared with the experimental results to obtain the degradation assessment.