Remaining Useful Lifetime Prediction of Discrete Power Devices by Means of Artificial Neural Networks

This work proposes a deep learning-based model for predicting the lifetime of power devices subjected to power cycling. To this end, a neural network based on bidirectional long short-term memory is adopted. The neural network is trained with experimental on-voltage degradation profiles. The proposed method relies on monitoring a precursor, namely the on-voltage degradation. Based on this precursor, the model predicts the remaining useful lifetime (RUL) of power components. To assess the accuracy of the model, TO-247 power devices are stressed under power cycling and their wear-out is experimentally investigated. The RUL predicted by the neural network is then compared with the experimental lifetime of the power devices. With the proposed deep learning model, the accuracy of the lifetime estimation improves as more information about the state of health of the device under test is acquired.


I. INTRODUCTION
Many industrial, healthcare, automotive, energy, transportation, and aerospace applications rely on power electronic circuits [1]. The requirement for reliability in this field has increased considerably [2], [3]. For instance, in some applications such as avionics, no failures at all are tolerated [1]. Moreover, the sustainability of a power electronic circuit or system is closely related to its durability. Consequently, it has a significant impact from both economic and safety perspectives [1], [4], [5], [6].
Among the failure mechanisms most likely to occur in power electronic circuits, those affecting semiconductor power devices are highly relevant. The power dissipated in electronic devices leads to self-heating effects, which in turn cause thermo-mechanical stress at the interface of materials with different coefficients of thermal expansion [7]. This phenomenon is particularly severe when the power dissipation varies over time, in which case it is referred to as power cycling. Two main failure mechanisms can occur, both in discrete devices and in modules: solder joint fatigue and wire bond degradation [8].
In general, the lifetime of power components can be estimated with model-driven or data-driven approaches. A model-driven approach can be either empirical [9], [10], [11], [12], [13], [14], [15] (i.e., calibrated according to accelerated lifetime tests) or physics-based [16], [17]. Such models, in combination with Miner's rule [18], allow estimating the lifetime consumption for a given mission profile in terms of temperature swing, average temperature, heating time and current density [19]. A data-driven approach is based on monitoring the State of Health (SoH) of the component. In the case of wire bond degradation, the on-voltage is usually adopted as a precursor, while in the case of solder joint fatigue the thermal impedance gives a better indication of the SoH [20]. Knowledge of the SoH allows implementing prognostic techniques and hence estimating the Remaining Useful Lifetime (RUL). The implementation of prognostic techniques is the key to achieving predictive maintenance and hence to avoiding catastrophic failure events [1], [21], [22], [23], [24], [25]. In fact, failure phenomena are intrinsically random events, which are modeled with the Weibull statistical distribution in the case of power cycling effects [26]. As a result, assessing the lifetime by means of a model-driven approach allows estimating the number of cycles to failure only for a given Probability of Failure (PoF). Selecting a low value of PoF (≤10%) is a conservative approach to estimating the component or circuit lifetime [27]. Prognostics based on SoH monitoring overcomes this limitation, since the RUL is estimated online for each specific device under test.
Some data-driven methods use the on-voltage to predict the fault event by implementing particle filter algorithms [28], [29], [30]. In [29], the Mahalanobis distance algorithm is also used for anomaly detection, which, however, can be affected by signal noise [31]. According to [32], [33], imprecise knowledge of the parameters of the function describing the SoH, as well as inaccurate initialization of the filter, can lead to inconsistent prognosis results.
Neural networks (NNs) represent a viable solution for data-driven prognostic methods, since they avoid the definition of explicit models, can learn online and can adapt themselves to the degradation profile [33]. In [33], a time delay neural network (TDNN) was developed to monitor the SoH of insulated gate bipolar transistors (IGBTs) through the on-voltage, and it was combined with a stochastic approach for the prediction of the RUL. In [34], a feedforward neural network (FFNN) was considered in order to estimate the RUL based on the evaluation of the on-resistance. However, the above-mentioned NN approaches considered a limited dataset for training the model. In particular, in [33], four profiles were considered, with three used for training and one for testing, along with their respective combinations. In [34], the focus was on only two profiles, one for training and the other for testing. Moreover, neither TDNNs nor FFNNs take into account memory effects, which become especially relevant when the SoH at a given moment is influenced by preceding events.
In [35], lifetime prediction was addressed using a network that incorporates memory effects, namely LSTM (Long Short-Term Memory). In that study, a total of six samples were used to train the network. Specifically, a leave-one-out cross-validation methodology was employed for the training phase. This form of training entails partitioning the data of each individual profile into training and validation samples. As a result, some profile data points are omitted from the training dataset, which could reduce the statistical significance and robustness of the outcomes. A similar approach was proposed in [36], where, despite improving the network performance through a physics-informed approach, the training method further fragmented individual profiles into data used for the training phase and data used for the testing phase.
In this work, a data-driven method based on NNs has been implemented to estimate the RUL of semiconductor power devices under power cycling stress. More specifically, the prognostic technique is based on a bLSTM (bidirectional LSTM) network, and the on-voltage degradation is adopted as a precursor parameter. Discrete IGBT devices are stressed under constant power cycling conditions, and experimental on-voltage ($V_{ce,on}$) degradation profiles are used to train the NN. The trained bLSTM network allows estimating the End of Life (EoL) of components based also on the real-time acquisition of the on-voltage degradation. By exploiting the memory capability of the bLSTM network, the accuracy of the RUL prediction is improved, which allows accounting for the intrinsic statistical distribution of the failure phenomenon. In contrast with the methodology proposed in [35], [36], in this work a sliding window approach is employed to consider the entire on-voltage profile independently of the chosen number of inputs. Furthermore, the approach pursued in this work is broader than that of previous studies. While prior studies considered a single stress condition with a limited number of samples and combinations, this work explores multiple datasets and stress conditions. Specifically, it incorporates the outcomes of a set of 28 networks for each stress condition, corresponding to all possible combinations of training and test samples.
The remainder of this work is organized as follows. Section II describes the data-driven model, focusing on the methodology adopted for training the bLSTM network and for using it in the RUL estimation. Section III presents the experimental setup along with the power cycling experimental tests. In Section IV, several test cases are considered for the evaluation of the RUL based on the developed model. Finally, the concluding section summarizes the main achievements.

II. METHODOLOGY FOR RUL ESTIMATION BASED ON BLSTM NETWORK
The proposed approach aims at developing a deep learning-based model for predicting the degradation profile of the on-voltage of switching devices under fixed stress conditions. Since the failure event is a stochastic phenomenon, NN models are well suited to account for the variability in the degradation process. Fig. 1 illustrates the expected outcome of the data-driven model, with the predicted on-voltage profile over time as the model output. In power cycling stress scenarios, the on-voltage is expected to increase due to wire bond degradation, and a 5% increment is considered as the failure threshold [20]. The estimated on-voltage profile, and consequently the lifetime prediction, relies on the real-time on-voltage acquisition. Initially, the prediction is mainly based on the off-line training of the model, resulting in an approximation close to the average of the voltage profiles used in the training phase. However, as the monitoring time increases and the on-voltage of the tested device is experimentally measured, the accuracy of the lifetime prediction improves. Consequently, the RUL estimation approaches the ideal value.

A. ARTIFICIAL NEURAL NETWORK MODEL
Recurrent neural networks (RNNs) are designed to process sequential data effectively, which makes them suitable for time-sequence forecasting. Compared to traditional feedforward NNs, where inputs are propagated and processed through the stack of hidden layers, RNNs allow previous outputs to be used as inputs. The key feature of RNNs is their ability to maintain an internal memory, or hidden state, that can capture temporal dependencies in the input data. This memory enables RNNs to process sequences of variable length and make predictions based on previous elements in the sequence.
RNNs are affected by the vanishing gradient issue, which makes it challenging for them to learn and capture long-term dependencies effectively. LSTM can be adopted to overcome this problem, thanks to its ability to selectively retain or discard information [37]. The basic element of an LSTM network is the gated cell shown in Fig. 2.
The cell is equipped with three gates, namely forget, input and output, which regulate the flow of information into and out of the cell. Each gate processes the linear combination of its inputs through a non-linear function (i.e., the activation function) and returns a value between 0 and 1 used to weigh the desired information. The forget gate combines the input $x_k$ and the previous output $h_{k-1}$:

$$F_k = \sigma\left(W_{F,h}\, h_{k-1} + W_{F,x}\, x_k + b_F\right) \tag{1}$$

where $W_{F,h}$ and $W_{F,x}$ are weight matrices, $b_F$ is a bias constant and $\sigma$ is the sigmoid activation function.
The input gate $I_k$ regulates the amount of new information, i.e., $G_k$, that has to be added to the LSTM cell's memory. $I_k$ and $G_k$ are non-linear functions of $x_k$ and $h_{k-1}$, each one with its respective activation function, sigmoid ($\sigma$) and tanh [36]:

$$I_k = \sigma\left(W_{I,h}\, h_{k-1} + W_{I,x}\, x_k + b_I\right) \tag{2}$$

$$G_k = \tanh\left(W_{G,h}\, h_{k-1} + W_{G,x}\, x_k + b_G\right) \tag{3}$$

where $W_{I,h}$, $W_{I,x}$, $W_{G,h}$ and $W_{G,x}$ are the weight matrices associated with these two layers, and $b_I$ and $b_G$ are bias constants. The outcomes of (2) and (3) are combined with the contribution of the previous state $c_{k-1}$ and with (1) to define the state $c_k$:

$$c_k = F_k \odot c_{k-1} + I_k \odot G_k \tag{4}$$

where $\odot$ denotes the element-wise product. The output gate $O_k$ is related to $x_k$ and $h_{k-1}$ as:

$$O_k = \sigma\left(W_{O,h}\, h_{k-1} + W_{O,x}\, x_k + b_O\right) \tag{5}$$

where $W_{O,h}$, $W_{O,x}$ and $b_O$ represent the weight matrices and the bias constant associated with the output gate. Ultimately, the cell output $h_k$ is given by:

$$h_k = O_k \odot \tanh\left(c_k\right) \tag{6}$$

Remarkably, inputs and states are both processed using the tanh function to mitigate the vanishing or exploding gradient issues.
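The gate description above can be illustrated with a minimal sketch of one gated-cell update. It assumes the standard LSTM formulation (sigmoid forget, input and output gates, a tanh candidate, and element-wise products), and it uses plain Python lists rather than a deep learning framework. The names `lstm_cell_step`, `affine` and the layout of `params` are illustrative choices, not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_h, W_x, b, h_prev, x):
    """Return W_h @ h_prev + W_x @ x + b using plain Python lists."""
    return [
        sum(wh * h for wh, h in zip(W_h[r], h_prev))
        + sum(wx * xi for wx, xi in zip(W_x[r], x))
        + b[r]
        for r in range(len(b))
    ]


def lstm_cell_step(x, h_prev, c_prev, params):
    """One gated-cell update.

    params maps a gate name ("F", "I", "G", "O") to a tuple (W_h, W_x, b).
    This layout is an assumption made for the sketch.
    """
    F = [sigmoid(z) for z in affine(*params["F"], h_prev, x)]    # forget gate
    I = [sigmoid(z) for z in affine(*params["I"], h_prev, x)]    # input gate
    G = [math.tanh(z) for z in affine(*params["G"], h_prev, x)]  # candidate information
    O = [sigmoid(z) for z in affine(*params["O"], h_prev, x)]    # output gate
    # New state: keep part of the old state, add part of the candidate.
    c = [f * cp + i * g for f, cp, i, g in zip(F, c_prev, I, G)]
    # Cell output: gated, squashed state.
    h = [o * math.tanh(ck) for o, ck in zip(O, c)]
    return h, c
```

With all weights and biases set to zero, every sigmoid gate returns 0.5 and the candidate is 0, so the state is simply halved at each step. This is a convenient sanity check before plugging in trained parameters.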
An extension of LSTM that improves its performance is the bidirectional LSTM (bLSTM) [38]. As illustrated in Fig. 3, a bLSTM consists of two chains of LSTM cells that consider both time directions. With respect to the temporal order of the inputs $x_k$, the gated cells connected in ascending order define the forward state, whereas those connected in descending order give the backward state. The output layer (i.e., the output sequence $y_k$) is then given by a combination of both forward and backward states.

B. TRAINING OF BLSTM MODEL
The architecture of the artificial neural network (ANN) model is based on a time-series forecasting structure. Multiple bLSTM layers are connected in cascade to capture the trend of the target on-voltage profile through the activation functions selected in each layer [39].
The target output is the on-voltage profile of a power device under power cycling stress. The device voltage is measured after each temperature cycle, meaning that the profile is a function of the number of applied cycles. To limit the complexity of the ANN, the samples are filtered and downsampled (i.e., 100:1).
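As an illustration of the 100:1 reduction, the sketch below averages consecutive blocks of samples, which filters and downsamples in a single step. The text does not specify the filter type, so block averaging is an assumption, and the function name `block_average_downsample` is hypothetical.

```python
def block_average_downsample(v_on, factor=100):
    """Average consecutive groups of `factor` samples.

    Assumption: a simple block mean acts as the low-pass filter.
    An incomplete block at the end of the profile is dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(v_on) // factor
    return [
        sum(v_on[b * factor:(b + 1) * factor]) / factor
        for b in range(n_blocks)
    ]
```

With `factor=100`, each output sample represents 100 power cycles, so a window of 45 samples spans 4500 cycles.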
The approach is based on a single-step time-series forecasting model. A fixed window containing $m$ samples from the input sequence $x$ is selected as the model's input (i.e., $x_k, \ldots, x_{k-m+1}$). The neural network predicts the subsequent value $\hat{x}_{k+1}$, where $k$ is the index of the last input value.
The learning process is aimed at tuning the parameters of the non-linear function $f_{NN}$ associated with the ANN architecture, by minimizing the loss function (e.g., RMSE) of the predicted value:

$$\hat{x}_{k+1} = f_{NN}\left(x_k, \ldots, x_{k-m+1}\right) \tag{7}$$

with respect to the real one, $x_{k+1}$. To this end, the input dataset used for the training is composed of portions of the on-voltage profiles arising from different samples. For each window, the target output is the next value of the sequence.
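The construction of the training pairs can be sketched as follows. Each window of $m$ consecutive samples is paired with the sample that follows it, and the pairs from all training profiles are pooled. The function names are illustrative, and the windows are stored oldest sample first, which is only a storage convention for this sketch.

```python
import math


def make_windows(profile, m):
    """Slide an m-sample window over one profile.

    Returns (inputs, targets): inputs[j] holds m consecutive samples
    (oldest first) and targets[j] is the sample that follows them.
    """
    inputs, targets = [], []
    for start in range(len(profile) - m):
        inputs.append(profile[start:start + m])
        targets.append(profile[start + m])
    return inputs, targets


def build_training_set(profiles, m):
    """Pool the windows of several on-voltage profiles into one dataset."""
    X, y = [], []
    for profile in profiles:
        xi, yi = make_windows(profile, m)
        X.extend(xi)
        y.extend(yi)
    return X, y


def rmse(predicted, actual):
    """Root-mean-square error between predictions and targets."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
```

Because the window slides one sample at a time, every point of a profile beyond the first $m$ samples is used as a target once.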

C. RUL ESTIMATION
The proposed approach is aimed at estimating the RUL of a device under constant power cycling stress. The forecast is based on recursive iterations of the bLSTM model to obtain the on-voltage profile along the thermal cycles, as schematically reported in Fig. 4. From the recursive definition in (8), the RUL can be expressed as

$$\mathrm{RUL}(k) = \min\left\{\, i : \hat{x}_{k+i} \geq x_{EoL} \,\right\} \tag{9}$$

where $k$ and $i$ represent the number of monitored cycles and the remaining number of cycles to failure, respectively, and $x_{EoL}$ is the failure threshold.
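The recursive forecast can be sketched as a loop that feeds each prediction back into the input window until the failure threshold is reached. Here `predict_next` stands in for the trained network $f_{NN}$. It is a hypothetical callable that takes a window of $m$ samples (oldest first) and returns the next value. The `max_steps` guard is an addition of this sketch, not part of the described method.

```python
def estimate_rul(observed, predict_next, m, x_eol, max_steps=100_000):
    """Recursively forecast the on-voltage until it reaches x_eol.

    observed     : monitored samples, oldest first (at least m of them)
    predict_next : stand-in for the trained model (assumption)
    Returns the number of forecast steps i at which the prediction first
    reaches the threshold, or None if it never does within max_steps.
    """
    if len(observed) < m:
        raise ValueError("need at least m observed samples")
    window = list(observed[-m:])
    for i in range(1, max_steps + 1):
        x_hat = predict_next(window)
        if x_hat >= x_eol:
            return i
        # Discard the oldest sample and append the prediction.
        window = window[1:] + [x_hat]
    return None
```

The returned value counts forecast samples. If the profile has been downsampled, multiplying by the decimation factor converts it to a number of cycles.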

III. EXPERIMENTAL POWER CYCLING TESTS

A. EXPERIMENTAL SETUP
The experimental investigation of the power cycling phenomenon requires the application of controlled temperature cycles to the devices under test (DUTs), along with the capability of monitoring the on-voltage in real time. The experimental setup adopted for this purpose is reported in Fig. 5 [40]. It consists of a power supply (EA-PSB 9080-120), a custom board with the DUTs placed on a liquid-cooled thermal plate, a temperature controller (Julabo Presto A40) and a compactRIO system. As reported in Fig. 6, two IGBT devices are stressed within the same experiment. The power supply provides a high current ($I_{dc}$), flowing alternately in the two DUTs. The compactRIO generates the control signals for switches $S_0$ and $S_1$, and allows for $V_{ce}$ measurements on the DUTs. In order to measure $V_{ce}$, an amplifier with a voltage gain of 3 is adopted. The conditioned signal is acquired by the compactRIO's analog-to-digital converter (voltage range of ±10 V, sampling frequency of 1 MS/s and resolution of 16 bits). The thermal cycling across the DUT arises from a heating-up phase and a cooling-down phase, lasting a time $t_{on}$ and $t_{off}$, respectively. The desired temperature swing ($\Delta T_j$) is achieved by properly selecting $I_{dc}$, the $t_{on}$/$t_{off}$ times and the temperature of the thermal plate. Although the current in both DUTs is the same, the temperature swings can be slightly different, because of mismatches in the thermal pads and intrinsic device characteristics, or because of mutual heating effects. In order to achieve the same $\Delta T_j$ on both devices, slightly different $t_{on}$ times are selected. According to the guidelines for the qualification of power devices, such as [41], the heating current and the $t_{on}$/$t_{off}$ times are kept constant during the entire experiment. Since the component degrades during the power cycling test, changes of $\Delta T_j$ are possible.
The gate of the DUTs is biased with a DC voltage of 15 V, hence the devices are in the conduction state for the entire experiment. During the on-phase, the on-voltage across the IGBT ($V_{ce,on}$) is acquired and used to monitor the degradation state of the component. Typically, increases in $V_{ce,on}$ ranging from 5% to 20% are regarded as EoL thresholds for determining device failure due to wire bond degradation (the sole failure mechanism considered in this work) [20]. In this study, an increase of 5% in $V_{ce,on}$ is considered as the EoL condition. During the off-phase, a small current $I_{ref}$ = 50 mA is injected into the device. The measured $V_{ce,off}$ voltage is used as a Temperature Sensitive Electrical Parameter, allowing the junction temperature of the component to be estimated.
The DUTs used in the experiments are commercial IGBTs in TO-247 packages, with a rated pulsed current of 120 A, a rated voltage of 600 V, a typical on-resistance of 10 mΩ, and a maximum junction temperature of 175 °C.
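The 5% EoL criterion can be applied to a measured profile with a few lines of code. This sketch assumes that the first sample of the profile is taken as the initial on-voltage, which the text does not state explicitly. The function name and the `cycles_per_sample` parameter are illustrative.

```python
def cycles_to_failure(v_on, cycles_per_sample=1, rel_increase=0.05):
    """Return the cycle count at which V_ce,on first reaches the EoL threshold.

    Assumption: v_on[0] is used as the initial (reference) on-voltage.
    Returns None if the threshold is never reached.
    """
    threshold = v_on[0] * (1.0 + rel_increase)
    for idx, v in enumerate(v_on):
        if v >= threshold:
            return idx * cycles_per_sample
    return None
```

Setting `rel_increase` to a value up to 0.20 covers the other thresholds mentioned above.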

B. POWER CYCLING EXPERIMENTS
Power cycling tests are carried out in this work by considering two different stress conditions: $\Delta T_j$ = 120 °C ($I_{dc}$ = 70.5 A) and $\Delta T_j$ = 140 °C ($I_{dc}$ = 68.5 A). In both cases, the minimum junction temperature is 25 °C. For each stress condition, eight different DUTs were considered. Experimental $V_{ce,on}$ profiles as a function of the number of cycles are extracted from [42] and reported in Fig. 7. Initially, $V_{ce,on}$ is almost constant, while for a large number of cycles an increase of $V_{ce,on}$ is observed, which can be ascribed to wire bond degradation. An increase of $V_{ce,on}$ by 5% with respect to the initial value is commonly considered as a failure criterion for the device. It is worth noting that the increase of $V_{ce,on}$ slightly changes the temperature in the device. In fact, at the end of each experiment, $\Delta T_j$ exceeds the nominal value by about 10 °C (not shown here). This temperature increase is expected to modify the number of cycles to failure. More specifically, according to [7], [43], [44], [45], [46], a shorter lifetime is expected with respect to the case of a constant $\Delta T_j$ for the entire experiment.
The application of a given thermal cycling stress (either 120 °C or 140 °C) leads to significant randomness in the device lifetime (in terms of the number of cycles to failure), which is well described by a Weibull distribution [40]. It is therefore essential that the proposed neural network model is trained with an adequate number of samples having different lifetimes. This allows the neural network to be robust against the intrinsic variability of failure events.

A. TEST RESULTS OF THE NEURAL NETWORK
The proposed neural network has been trained according to the procedure reported in Section II-B, by using the experimental $V_{ce,on}$ profiles reported in Fig. 7. These profiles are decimated by a factor of 100 in order to reduce the complexity of the neural network while maintaining good performance. A window size ($m$) of 45 elements (which also corresponds to the batch size of the bLSTM) is considered for both the training and testing phases, corresponding to 4500 cycles for the chosen decimation factor. The network structure consists of a sequence of bLSTM layers, with the initial layer comprising 16 units, followed by a subsequent layer with 36 units. Additionally, two individual bLSTM units are present, employing tanh and exponential activation functions, to better capture the behaviour of the $V_{ce,on}$ profile from the outputs of the 16-unit and 36-unit bLSTM layers. The outputs of these supplementary units are ultimately combined in the last layer of the network, which performs a summation.
Regularization techniques have been implemented to improve the network's learning ability, and the Adam algorithm with a learning rate of 0.1 has been used to train the neural network [47]. The dataset is split into a training subset (6 profiles) and a test subset (2 profiles). To verify the robustness of the model with respect to the partition of the available dataset, the model is trained using every possible unique combination of the 8 available samples, resulting in a total of 28 distinct neural networks. This number is given by the binomial coefficient $\binom{8}{6} = 28$, where 8 is the number of available experimental samples and 6 is the number of samples included in the training subset. It is worth mentioning that all 28 NNs share the same architecture, but each is trained with a different selection of 6 samples and tested with the remaining 2 samples, ensuring a unique combination of training and test subsets.
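The partition count can be checked by enumerating the splits directly. The sketch below lists every choice of 6 training samples out of 8 and counts how often each sample ends up in a test subset. The function name `train_test_splits` and the sample labels 1 to 8 are illustrative.

```python
from itertools import combinations
from math import comb


def train_test_splits(samples, n_train):
    """Return every (train, test) partition with n_train training samples."""
    splits = []
    for train in combinations(samples, n_train):
        test = tuple(s for s in samples if s not in train)
        splits.append((train, test))
    return splits


samples = list(range(1, 9))  # eight profiles per stress condition
splits = train_test_splits(samples, 6)

# How many differently trained networks test each sample.
tests_per_sample = {
    s: sum(s in test for _, test in splits) for s in samples
}
```

Each of the 28 splits tests 2 samples, giving 56 tests per stress condition, and each sample appears in the test subset of 7 splits.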
Two different conditions are considered for the training phase: $\Delta T_j$ = 120 °C and $\Delta T_j$ = 140 °C. An example of $V_{ce,on}$ profiles estimated by means of the neural network is reported in Fig. 8. In particular, Fig. 8(a) (or Fig. 8(b)) considers a neural network trained at $\Delta T_j$ = 120 °C (or $\Delta T_j$ = 140 °C) with samples #1, #2, #4, #5, #6, and #7 (or samples #9, #10, #12, #13, #14 and #15) and tested on sample #3 (or #11). Experimental $V_{ce,on}$ profiles as a function of the number of cycles are reported in black (solid lines), along with the thresholds assumed for the failure criterion (dashed lines). The other curves are those predicted by the neural network according to the selected observation windows, i.e., the monitored number of cycles indicated as $k$ in (8) and (9). After an observation of 4500 cycles, the predicted lifetimes differ noticeably from those experimentally evaluated. However, the predicted values are within the range of values adopted for the neural network training. It is worth noting that the training phase is based on power cycling experiments carried out with a constant current stress, where $\Delta T_j$ slightly increases over the wear-out phase. As a consequence, the proposed model is affected by inaccuracy in the initial stage of the monitoring phase, since the predicted lifetime is mainly based on the average of the profiles adopted for the training phase. Moreover, when a limited number of cycles is monitored, the degradation of $V_{ce,on}$ can be negligible and the SoH cannot be quantified by the model. As a result, the accuracy does not necessarily improve in this case. As the monitored number of cycles increases, and more knowledge is available about the SoH of the component, the predicted $V_{ce,on}$ profiles get closer to the expected ones, hence improving the accuracy of the lifetime estimation.

B. REMAINING USEFUL LIFETIME
The remaining useful lifetime represents the difference between the predicted lifetime and the monitoring time, both expressed as a number of cycles. In Fig. 8, the predicted lifetime is calculated as the number of cycles required to reach an increase of $V_{ce,on}$ by 5%. Hence, the RUL can be easily calculated as a function of the monitored number of cycles. For the given dataset, since 6 out of 8 samples are selected for the training phase, each sample is tested on 7 differently trained neural networks.
The results of the RUL analysis are reported in Figs. 9 and 10 for $\Delta T_j$ = 120 °C and $\Delta T_j$ = 140 °C, respectively. Both the RUL and the monitored number of cycles are expressed as a percentage of the actual lifetime. The RUL is estimated for all the 16 samples (8 for each stress condition) considered in this work, as summarized in Table I. As mentioned above, the 7 different curves reported in each subplot refer to different neural networks, each trained with a different combination of samples. For each $\Delta T_j$ stress condition, 28 neural networks are trained in total, each of which is used to test the 2 samples not adopted in its training phase. As a result, 56 RUL curves are visible in each figure. Although the estimated RULs can initially differ from the ideal ones (black dashed lines), in general the accuracy of the RUL prediction improves with the monitored number of cycles. For each sample reported in Figs. 9 and 10, the RULs predicted with the 7 different NNs are averaged and the results are summarized in Table I.
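The normalization used for the RUL curves can be sketched as follows. The predicted RUL is the predicted lifetime minus the monitored number of cycles, and both axes are divided by the experimental lifetime. The ideal RUL is taken here as the experimental lifetime minus the monitored cycles, which is an interpretation of the dashed reference lines. The function name and the numbers in the usage note are hypothetical.

```python
def rul_percent_curve(predicted_lifetimes, monitored_cycles, actual_lifetime):
    """Return (monitored %, predicted RUL %, ideal RUL %) for each observation.

    predicted_lifetimes[j] is the lifetime forecast made after
    monitored_cycles[j] cycles; all quantities are in cycles.
    """
    rows = []
    for n_pred, k in zip(predicted_lifetimes, monitored_cycles):
        monitored_pct = 100.0 * k / actual_lifetime
        rul_pct = 100.0 * (n_pred - k) / actual_lifetime
        ideal_pct = 100.0 - monitored_pct
        rows.append((monitored_pct, rul_pct, ideal_pct))
    return rows
```

For instance, with a hypothetical experimental lifetime of 10000 cycles, a forecast of 12000 cycles made after 5000 monitored cycles gives a predicted RUL of 70% against an ideal value of 50%.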
In order to assess the performance of the proposed neural network model, the relative error, defined as the relative difference between the predicted and the experimental lifetime, is averaged over all the 56 tests performed at a given $\Delta T_j$. The results are reported in Fig. 11. When the monitored number of cycles is between 20% and 100% of the device lifetime, the average relative error is always lower than 15%, although there are individual cases in which a larger error can be found (see the error bars of Fig. 11). As the number of monitored cycles increases, the relative error, along with the standard deviation associated with the averaging process, tends to decrease. For example, beyond 80% of the device lifetime, the average relative error is below 7%, with a standard deviation lower than 5%. This is a remarkable result for predictive maintenance, since the EoL can be accurately predicted well before the failure event.
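The averaging step can be sketched with the standard library. Two choices here are assumptions, since the text does not specify them: the relative error is taken in absolute value, and the population standard deviation is used. The function name is illustrative.

```python
from statistics import mean, pstdev


def relative_error_stats(predicted_lifetimes, actual_lifetimes):
    """Mean and standard deviation of the relative lifetime error, in percent.

    Assumptions: absolute relative error, population standard deviation.
    """
    errors = [
        100.0 * abs(p - a) / a
        for p, a in zip(predicted_lifetimes, actual_lifetimes)
    ]
    return mean(errors), pstdev(errors)
```

Calling this function once per monitoring fraction (for example at 20%, 40%, 60% and 80% of the lifetime) over the 56 tests of a stress condition would give the points and error bars of an error plot.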

V. CONCLUSION
In this article, the development of a deep learning-based model for the lifetime prediction of semiconductor power devices is discussed. The proposed NN model is composed of bidirectional LSTM blocks. The model is trained with experimental on-voltage degradation profiles arising from power cycling stresses with a temperature swing $\Delta T_j$ of 120 °C and 140 °C. Eight samples are considered for each stress condition, representing the dataset adopted to train and to test the proposed neural network.
A distinctive feature of the model is that the training phase is carried out by considering a significant number of experimental on-voltage profiles arising from different samples stressed under the same conditions. More specifically, 6 out of 8 samples are adopted for the training phase.
The model is applied to predict the lifetime based on the monitoring of the on-voltage profile. When a limited amount of data is available, the lifetime prediction is within the experimental range of the samples adopted in the training phase. As more data concerning the SoH of the device under test are acquired, the accuracy of the model improves.
In order to understand the impact of dataset partitioning on the NN performance, the model is trained with all the possible combinations of subsets. Therefore, 28 neural networks are trained for each $\Delta T_j$ stress condition. These networks are then used to evaluate the RUL of the test samples as a function of the monitored number of cycles. The relative error between the lifetime predicted by the NN and the actual experimental lifetime tends to decrease as the monitored number of cycles increases. Its average value (among all the trained neural networks) is always lower than 13% and it becomes as low as 5% when the monitoring time is above 80% of the device lifetime. The accuracy of the model is influenced by the size of the training dataset. Therefore, a larger number of experiments is expected to improve the capability of the model to recognize any on-voltage degradation profile.

FIGURE 1. Graphic representation of the expected outcome of the data-driven model.

FIGURE 2. Schematic description of a gated cell (LSTM network). $\sigma$ and tanh are the sigmoid and hyperbolic tangent functions, respectively.
$W_{I,h}$, $W_{I,x}$, $W_{G,h}$ and $W_{G,x}$ refer to the weight matrices associated with these two layers, and $b_I$ and $b_G$ are bias constants. The outcomes of (2) and (3) are combined with the contribution of the previous state $c_{k-1}$ and with (1) to define the state $c_k$ as follows:

$$c_k = F_k \odot c_{k-1} + I_k \odot G_k \tag{4}$$

At the first iteration (initial guess), $m$ samples of the experimental profile are provided to the NN model to predict the subsequent value $\hat{x}_{k+1}$. At the next iteration, the predicted value $\hat{x}_{k+1}$ is used as the model's input, discarding the oldest sample $x_{k-m+1}$ and sliding the $m$-length window one step forward. At the $i$-th iteration, with $i \geq 1$, the on-voltage is predicted from both experimental and predicted samples if $i < m$, or from predicted values only if $i \geq m$:

$$\hat{x}_{k+i} = \begin{cases} f_{NN}\left(\hat{x}_{k+i-1}, \ldots, \hat{x}_{k+1}, x_k, \ldots, x_{k-m+i+1}\right), & i < m \\ f_{NN}\left(\hat{x}_{k+i-1}, \ldots, \hat{x}_{k-m+i+1}\right), & i \geq m \end{cases} \tag{8}$$

This process is iterated until $\hat{x}_{k+i}$ reaches the EoL condition (i.e., an increase of 5% of the initial on-voltage value).

FIGURE 4. On-voltage prediction according to the proposed methodology. $m$ samples ($x_k, \ldots, x_{k-m+1}$) are considered as the input of the NN and allow calculating $\hat{x}_{k+1}$. Subsequently, the vector ($\hat{x}_{k+1}, \ldots, x_{k-m+2}$) is considered as a new input of the NN and another value ($\hat{x}_{k+2}$) is estimated. This process is repeated until the EoL condition is reached.

FIGURE 5. Picture of the experimental setup for power cycling tests.

FIGURE 6. Schematic description of power cycling tests.

FIGURE 7. Experimental on-voltage profiles as a function of the number of cycles. $V_{ce,on}$ profiles are obtained for (a) $\Delta T_j$ = 120 °C and (b) $\Delta T_j$ = 140 °C.

FIGURE 8. $V_{ce,on}$ profiles estimated by the neural network in the case of (a) $\Delta T_j$ = 120 °C and (b) $\Delta T_j$ = 140 °C. Each curve arises from the experimental observation of a given number of power cycles (as reported in the legend) and from the application of the proposed recursive algorithm. The accuracy of the lifetime estimation improves as the monitored number of cycles increases.

FIGURE 9. Predicted RULs in comparison with ideal RULs (dashed curves) for all 8 samples stressed at $\Delta T_j$ = 120 °C. The 7 curves reported in each subplot arise from different neural networks, each one trained with a different selection of the 6 (out of 8) training samples.

FIGURE 10. Predicted RULs in comparison with ideal RULs (dashed curves) for all 8 samples stressed at $\Delta T_j$ = 140 °C. The 7 curves reported in each subplot arise from different neural networks, each one trained with a different selection of the 6 (out of 8) training samples.

FIGURE 11. Relative error between predicted and experimental lifetime as a function of the monitored number of cycles. Errors are averaged over the 56 tests for both $\Delta T_j$ = 120 °C and $\Delta T_j$ = 140 °C. Error bars represent the standard deviations around the average values.