A Data Mining Approach for Transformer Failure Rate Modeling Based on Daily Oil Chromatographic Data

Evaluating the real-time failure rate of transformers can effectively guide the planning of maintenance and reduce their failure risk. This paper proposed a novel transformer failure rate model that considers the impact of maintenance based on daily oil chromatographic monitoring data mining. Firstly, to ensure the quality of the modeling data, an improved k-nearest neighbor (KNN) algorithm based on genetic algorithm (GA) is proposed to repair the missing monitoring data. The repaired data is then mapped to the equivalent state duration (ESD) by the M-BPNN proposed, which is used to modify the multistate Markov process of transformers so as to quantify the impact of maintenance on failure rate. Considering the changing characteristics of the dissolved gases’ content in the short period, the ESD is further merged in sequential periods to obtain the merged equivalent state duration (MESD). Finally, an analytical function of the transformer failure rate with respect to the MESD is obtained. Case studies on a typical substation demonstrate that the proposed approach has the ability to characterize the impact of maintenance and the actual failure rate, thereby improving the accuracy of the substation reliability assessment.


G
The target matrix G of storing the DGA data G k The DGA data matrix of fault type k N k The number of days in fault type k g s The child samples before mutation g s The child samples after mutation g k, min The minimum values of each dissolved gas content in G k g k, max The maximum values of each dissolved gas content in G k VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ g train,w The w th sample in G train g test (l) The l th dissolved gas in g test g test (y) The repairing value of the missing data ζ i,j The transition rate from state i to state j t i The SD corresponding to operation time t t " i The ESD corresponding to operation time t t " i The MESD corresponding to operation time t C The external random failure rate λ(t) The failure rate at time t Dist f ,f +1 The distance between the DGA data samples g f and g f +1 .
The ESD that corresponds to the b th data sample in period p. t p The MESD that corresponds to period p

I. INTRODUCTION
As the core electrical equipment in a substation, the transformer plays a significant role in the substation reliability [1]. The failure rate is a significant parameter that measures the ability of a transformer to maintain its normal operations.
Evaluating the failure rate of transformers accurately can lay a solid foundation for the reliability assessment of substations. So far, the average failure rate model has been widely used in the substation reliability assessment. However, this model assumes the constant failure rates and cannot reflect the short-term fluctuation of the reliability level of an individual transformer [2], [3]. In reality, the failure rate of a transformer changes in real time according to its own operating condition [4], [5]. Therefore, it is necessary to establish a novel time-varying transformer failure rate model [6], [7].
Transformers faults can be divided into internal latent fault and external random fault [8]. Owing to the differences in the development modes, these two kinds of fault should be considered separately when modeling the failure rate model [5]. This paper focuses on the modeling of internal latent failure rate of transformers.
At present, the main approach for establishing the internal latent failure rate model is to analyze the internal physical and chemical processes of the transformers [1]. The degree of the internal latent failure is characterized by a distribution function, such as the Arrhenius-Weibull distribution. Accordingly, the trends of the failure rate can be evaluated. Recently, modeling the failure rate of transformers based on oil chromatographic data has become a popular method [9].
Oil chromatographic data, also known as dissolved gas analysis data (DGA data), include the content of the dissolved gases in a transformer's oil [9]. These dissolved gases are generated from the operational and fault events of the transformer [10]. Therefore, the content of the dissolved gases has a tight connection to the fault type and fault severity of the transformers [11]. By now, DGA data has been widely applied in the transformer faults diagnostics and remaining useful life (RUL) estimation aided by some powerful machine learning algorithms such as dynamic bayesian networks and temporal fault trees [13]- [17]. Reference [1] captures the qualitative causal relationship between dissolved gases and transformer faults, then a probabilistic diagnosis framework is proposed. Reference [18] established an improved transformer RUL prediction approach combining the data-driven thermal forecasting models and model-based lifetime experimental models. These studies excavate the value of DGA data promote its application. To the best of the authors' knowledge, DGA data also accurately characterize the development degree of internal latent faults and can be used to establish the models of internal latent failure rate of transformer [19]- [25]. Reference [19]- [21] choose the DGA data as the features and solve the multistate Markov process to obtain the time-varying failure rate model of transformers [5]. However, the transformer failure rates obtained by those models are still equal as long as the state and the state duration are the same although operating at different operation time. That is these models cannot describe the impact of maintenance on the transformer failure rate [22]. The reason for this defect is that those models ignored the actual operating conditions of the day when calculating the real-time failure rate of the transformer. To overcome this problem, some improved models introduced more covariates so as to describe the operating conditions and quantify the impact of maintenance when modeling the failure rates [22]. Reference [23] proposed a transformer failure rate model based on Proportional hazard model. In this model, the information of real-time hot spot temperatures (HST) is introduced to reflect the real-time operating conditions. Reference [24] further used some covariates such as the moisture content of insulating paper to modify the Markov process. Reference [25] selected the service life of the equipment as new features to establish the failure rate model.
Unfortunately, these external covariates such as service life, HST are hard to accurately describe the internal operating conditions of the transformer on daily basis. Besides, the acquisition of these covariates relies on complex accelerated aging experiments and it is difficult to obtain the actual operating data. An alternative method is to mine the daily DGA data, which is the most direct reflection of the daily operating conditions of the transformer. However, the sampling resolution of the DGA data in the above models is usually measured in months [26], making it difficult to capture the short-term changes in the DGA data.
In addition, the quality of DGA data is another important factor that affects the accuracy of the transformer failure diagnosis and failure rate modeling [27]. Nonetheless, a repairing model for missing DGA data is absent in the existing literature.
For the system reliability assessment considering component failures, a whole range of different approaches to the problem have been proposed. Reference [28] established an uncertainty-aware dynamic reliability analysis framework without the need of exact failure data of its components. Reference [29] presented a framework for prognosticsupdated dynamic dependability assessment based on the online data analysis. Reference [30] used the FTA for performing reliability analysis of an management system. In this model, the system reliability was also evaluated based on the given failure rates of the components. Reference [31] established a conceptual framework to incorporate complex basic events in hierarchically performed hazard origin and propagation studies (HiP-HOPS), which can guarantee the modelling capability on complex failures and the efficiency of Model-based safety analysis (MBSA) effectively. Reference [16] gave the detailed classification for safety models associated with machine learning (ML) and further proposed a novel approach based on ML and real-time operational data to reduce the difficulty in modeling and evaluation these complex safety-critical systems. In [32], the state analysis method is introduced for generic power system reliability assessment. Considering the requirements of the case studies, state analysis method is adopted in this paper for the substation reliability evaluation.
This paper established a novel modeling framework for transformer failure rate based on daily DGA data mining. The framework solves two main problems of repairing the missing DGA data and quantifying the impact of maintenance on failure rates. The short-term change characteristics of dissolved gases' content are also analyzed in this paper.
In summary, the main contributions of this paper are as follows: 1. A data repairing method derived from the improved KNN is proposed; thus, the quality of the modeling data is well guaranteed.
2 In order to eliminate the influence of unbalanced sample space on the repairing accuracy, the Genetic algorithm (GA) is introduced to construct a balanced sample space.
3. The parameter of ESD, which can reflect the realtime operational conditions when calculating the failure rate, is proposed to modify the Markov failure rate model. And then, a Multi-back-propagation neural network (M-BPNN) model is constructed to map the DGA data to ESD.
4. The MESD is further obtained by merging the ESD based on the analysis of the variation in the content of the dissolved gases. Thus, an analytical function of the failure rate with respect to the MESD is obtained based on a multistate Markov process.
The rest of this paper is organized as follows. The model for repairing the missing DGA data based on the GA and improved KNN algorithm is proposed in Section II. Section III established the transformer failure rate model proposed and its validation method. Case studies are given in Section IV. Finally, Section V concludes the paper.

II. REPAIRING MODEL FOR MISSING DGA DATA
The quality of DGA data is an important factor that affects the accuracy of the transformer failure rate model. The DGA data could be missing in the process of collection, transmission and storage [33]. It is necessary to repair the missing data. Most existing data repairing methods are based on time series analysis [34]. These methods have performed well when the missing data occurs under the normal state of transformers. However, the missing data often appears when the transformer works in abnormal/fault states [35]. Therefore, this paper proposes an improved KNN algorithm to repair the missing data. However, the number of DGA data samples in the abnormal or fault state is much smaller than that in the normal state due to the low occurrence probability of abnormal and fault states. It could result in an unbalanced data sample space and errors for the KNN algorithm. To avoid this problem, a balanced sample space of DGA data is first constructed through sample expansion based on a GA.

A. EXPANSION OF DGA DATA BASED ON GA
The genetic algorithm imitates the process of biological evolution and includes operations such as selection, crossover and mutation [36]. This paper proposes a sample expansion method that is inspired by the population generation method of the GA.
A target matrix G is created for storing the daily DGA data, including the content of each type of gas in N days. The n th row of the N * L matrix G represents the DGA data sample of n th day. It can be expressed as: where 1 ≤ n ≤ N , and L is the total number of types of dissolved gases. The dissolved gases include H 2 , C 2 H 2 , CH 4 , C 2 H 6 , C 2 , CO, CO 2 and total hydrocarbons. The fault types for these N days could be determined through their content ratios of dissolved gases [37], [38]. The 12 fault types and their corresponding ranges for the ratios of dissolved gases are given in Table 1.
Accordingly, each DGA data sample in G can be allocated to one of these fault types. The number of days when the transformer in fault type k (1 ≤ k ≤ 12) is N k . It has the following relationship with N : The DGA data belonging to type k constitutes an N * k L matrix G k : Choose G k as the initial parent population and randomly match the data samples contained in this population into pairs. A total of N k /2 pairs of parent samples can be obtained.
Take a pair of parent samples, for example, g k,u and g k,v , and perform the crossover operation on this pair, which is expressed as follows: where g s1 , g s2 and g s3 are newly generated DGA data samples, named here child samples. Formula (4) presents a typical linear crossover in GA proposed by Wright [36].  In formula (4), the coefficient of g k,u and g k,v are set to 1.5 and -0.5 when generating g s1 , to prevent g s1 from being too similar to g k,u or g k,v . The same reason holds for g s2 and g s3 . By keeping all the parent samples and child samples, the number of DGA data samples corresponding to fault type k will be 2.5N k .
To ensure the heterogeneity between the child samples and parent samples, an uncertain increment is added to the original values of each child sample. The step length variation method [39] is used to process those child samples obtained from (5).
where a(q) is equal to 1 with a probability of 1/m. Otherwise, a(q) = 0. Here, m is set to 20 normally, and g is a variation vector with the value range [g k,min , g k,max ]. Formula (5) uses an uncertain probability multiplied by an uncertain step length g to describe the uncertain increment for g s . Note that the step length g is bounded by the value range of each dissolved gas in G k .
After the process of mutation for child samples, the validity of each child sample generated should be examined; that is, only the samples with values within [0.95 g k,min , 1.05 g k,max ] can be added to G k . Otherwise, the samples should be removed from G k .
Perform the above operations iteratively until the sample size meets the requirement. At this point, the algorithm ends, and G k is updated. The sample expanding to consider other fault types can be treated with the same method. Finally, the data matrixes for all fault types constitute the updated G.

B. REPAIRING METHOD BASED ON IMPROVED KNN
The repairing method proposed in this paper is developed from the observation that the DGA data under the same abnormal/fault state have similar patterns and the missing data can be estimated according to these DGA data [34]. This paper uses the KNN algorithm [40] to find adjacent samples of the DGA data samples. The missing data is then repaired according to these adjacent samples.
Due to the missing data, the dimensionalities of the test samples and the training samples are not equal. Therefore, it is hard to choose the nearest samples by the Euclidean distance used in the traditional KNN. In this paper, the Pearson similarity [41] is selected to be the measure of the distance because it is not affected by the dimensionality mismatches between the samples.
To avoid the error in the distance calculation caused by the dimensionality mismatch and elementwise numerical difference, the DGA data samples should first be centralized. For the n th DGA data sample in G, g n , its centralization can be realized according to the following formula (6): where µ n is the mean value of all elements in g n . For the testing sample g test to be repaired, if the y th element of g test, g test (y), is missing, then the centralization formula should be described as follows: where µ test is the mean of all elements in g test after setting the y th element g test (y) to 0. Similarly, the other data samples in G can be processed by formulas (6) and (7). The advantage of (6) and (7) is that each centralized element becomes a value regardless of its original unit and magnitude.
The distance D n between the centralized test sample, g test , and the n th sample in G, g n , can be calculated based on the Pearson similarity: where cov represents the covariance between g test and g n . σ means the standard deviation of the vector. A smaller D n implies a closer distance between g test and g n and more similarity between them. According to formulas (8), the distance between g test and each sample in the updated G can be calculated. Select r nearest training samples and record these samples as G train . The weighted average value in G train is calculated as the repaired value of the missing data.
where η train,w is the weight of each data sample in G train , and 1 ≤ w ≤ r. Details of the weight calculation method can be found in [42]. A smaller distance in (8) results in a larger weight in the estimation of the missing data, which indicates its greater similarity to the test sample g test .

III. REFINED FAILURE RATE MODEL OF THE TRANSFORMER A. MULTISTATE MARKOV PROCESS OF A TRANSFORMER
According to IEEE Std C57.104 [35], the operational states of transformers can be divided by their DGA data into four states: normal state, abnormal state, warning state and failure state.
As a typical repairable component, the operating state of the transformer generally transfers between the above four states. Specifically, the transformer may change from a normal state to an abnormal/warning state with an increasing failure rate during the operation. It can also return to the normal state from the abnormal/warning state through maintenance. The multistate Markov process of the transformer is illustrated in Fig 1.  In Fig 1, ζ i,j represents the transition rate from state i to state j, which can be calculated as follows: where count i,j is the times that the transformer transfer from state i to the j. T i is the total duration that the tranformer stays in state i. Formula (10) can be used to calculate the transition rate between states with a direct transition path. Based on the relationships in Fig 1, the transition rates between all states form a Markov time-varying state transition matrix A, as follows: where A i,j = 0 indicates that there is no direct transition path between state i and state j.
Assume that the probability of the transformer operating in various states at time t is P(t) = [P 1 (t), P 2 (t), P 3 (t), P 4 (t)]. Since the Markov process established in this paper is continuous in operation time, this section uses the Kolmogorov differential equation [43] to calculate the P(t). The Kolmogorov differential equation that corresponds to equation (11) can be expressed as follows: where the left side of equation (12) is the derivative of P(t) with respect to the operation time t.
The probability of the transformer operating in various states P(t) can be obtained by solving (13): where When operating in the failure state (i.e. state 4), the transformer needs to be out of service and protected immediately. Therefore, the failure state is considered as the ending state of the internal latent fault of the transformer. Consequently, the internal latent failure rate at time t can be considered as the probability that the transformer transfers from the operating state to the failure state at time t, that is P 4 (t).
It should be noted that P(0) is the initial conditions of the differential equation (12). It denotes the initial probability that the transformer operates in each state at time t. The value of the i th element is set to 1 if the transformer is in state i at time t. Otherwise, it is set to 0. For example, P(0) = [1 0 0 0] is the initial conditions for solving the analytical expressions of the time-varying internal latent failure rate when the transformer is operating in normal state at time t. Similarly, P(0) = [0 1 0 0] and P(0) = [0 0 1 0] are the initial conditions for obtaining the failure rate expressions when the transformer operates in abnormal state and warning state at time t respectively. Additionally, α 0 , α 1 , α 2 , and α 3 are exponential functions determined by the eigenvalues of matrix A. These functions can be solved by the Hamilton-Hailey method [24], [46].
Define the SD t i as the duration that the transformer has stayed in state i at time t, and 1 ≤ t i ≤ T i . From equations (10)- (14), the analytical expressions of the failure rate λ i (t) for the latent failure of the transformer can be expressed by equation. As shown in formula (15), as shown at the bottom of the next page, the time-varying internal latent failure rate of transformer at time t under different operating states has different analytical expressions.
When calculating the latent failure rate of the transformer at operation time t, the variable used should be the SD that corresponds to t, t i , rather than t. A constant C is chosen to represent the external random failure rate of the transformer [27]. The total failure rate at time t can be obtained as follows: The above failure rate model is called the Markov failure rate model (Markov model). An illustrative example is given in Fig 2, which shows the change in the failure rate with the operation time, according to the Markov failure rate model. Suppose that the maintenance is performed on the transformer at time t 3 in a warning state, which results in a significant decrease in the failure rate at time t 3 . The SDs in state 2 at time t 1 and t 2 are equal, and the failure rates at time t 1 and t 2 calculated by formula (16) and are also the same. The reason for equal SDs at times t 1 and t 2 is that the Markov model always assumes that the transformer can return to the initial state after the maintenance is performed at time t 3 [22].
Obviously, the values of SD t i used in the Markov model are determined only by the current state and the corresponding state duration at time t. As along as the transformer is under the same state and has the same state duration, the values of SD remain the same. However, the maintenance might not return the transformer to its original condition [22]. The SD at time t 2 after maintenance should be no longer the same as that at time t 1 . Therefore, the Markov failure rate model tends to have an optimistic view of the impact of maintenance by ignoring the actual operating condition after maintenance.
The next section introduces a modification of SD combined with the actual operating condition after the maintenance to effectively describe the impact of the maintenance.

B. REFINED MODEL OF THE TRANSFORMER FAILURE RATE 1) MODELING OF ESD
To quantify the impact of the maintenance, the SD t i in formula (16) is modified to its corresponding ESD t i . Compared with SD, ESD is able to simultaneously consider the current state, the corresponding state duration and the operating condition at time t. The DGA data should be accounted for in the calculation of ESD because it can efficiently describe the operating condition of the transformers. Fig 3 illustrates the modification of SD to ESD. The DGA data that corresponds to time t is used to obtain t i . The quantitative relationship with non-linear correlation between the ESD and the DGA data is modeled by BPNN in this paper. Note that the other neural network models like Radial Basis Function neural network (RBF) can also be used for modeling [44], [45]. BPNN is a multilayer feedforward network that is trained by an error back-propagation algorithm. This network is able to describe nonlinear relationships and has a good learning ability [46]. In addition, BPNN has been proved that it has a reasonable explainability [47].The network structure of the BPNN is shown in Fig 4: As seen from Fig 4, the BPNN includes an input layer, hidden layer and output layer. Formula (17) describes the feature delivered from the input layer to the hidden layer: where h 1 is the feature vector for the first hidden layer. X is the input vector. W 1 is the weight coefficient matrix of λ 1 (t) = α 3 ζ 1,2 ζ 2,3 ζ 3,4 if P(0) = 1 0 0 0 λ 2 (t) = ζ 2,3 ζ 3,4 [α 2 − α 3 (ζ 2,3 + ζ 3,4 + ζ 3,2 + ζ 3,1 + ζ 2,1 )] if P(0) = 0 1 0 0 λ 3 (t) = α 1 ζ 3,4 − α 2 ζ 3,4 (ζ 3,4 + ζ 3,2 + ζ 3,1 ) + α 3 ζ 3,4 [ζ 2,3 ζ 3,2 + (ζ 3,4 + ζ 3,2 + ζ 3,1 ) 2 ] if P(0) = 0 0 1 0 (15) the hidden layer. B 1 is the bias vector. 1 is the activation function.
A sigmoid function is used as the activation function, which is shown in (18). The relationships between the hidden layers can be expressed as (19) for the multilayer neural networks.
where e is the index of the hidden layers. W e is the weight coefficient matrix of the e th hidden layer. B e is the bias vector of the e th hidden layer.
Assuming that there are X hidden layers in total, the predicted outputŷ can be described as: Since the calculation of the output layer is exactly the same as the hidden layer, the output layer can be regarded as the X + 1 th hidden layer. The error δ y between the predicted output and the actual output can be calculated as δ y =ŷ − y (21) Combining (20) and (21), the back derivation of the error of the e th hidden layer can be expressed as: where the weight coefficient matrix of the e th hidden layer can be updated as: The above is an iteration in the training process of the BPNN. The learning parameters of the BPNN can be tuned after several iterations to obtain a more accurate BPNN model.
The ESD is equal to its SD at any time before the maintenance of the transformer. Nonetheless, this circumstance is not the case after maintenance. To correctly represent the relationships between the ESD and DGA data, only the DGA data and its corresponding ESD (equal to SD) before maintenance should be selected as the training samples of the BPNN.  STEP2: Tune the learning parameters. The training samples are inputted into M-BPNN for learning, in such a way that the mapping relationship between the input and output can be constructed.
Testing: STEP3: Choose the DGA data G aft after maintenance and determine the operating state of the transformer. Next, input the data into the different BPNN models according to the corresponding states. Finally, the ESD T aft corresponding to the DGA data G aft can be estimated from the BPNN model.

2) MODELING OF THE MESD
Minor changes in the dissolved gases within a short period of time are commonly seen in the DGA data. Such changes do not necessarily indicate changes in the failure rates. If the changes in the dissolved gases between any two days are less than a given threshold in this period, then the ESD and failure rate of the transformer can be considered to be the same. Therefore, the ESD in this period could be merged into the same value. To achieve this goal, this paper proposes a clustering algorithm based on the distance matrix between the time series of the DGA data. By this method, the time series of the DGA data can be partitioned into different sequential periods.
The first step of the clustering algorithm proposed is to create a distance matrix, which can be expressed as: where F is the number of DGA data samples of a particular transformer. Each element in matrix Dist adopts the standard VOLUME 8, 2020 Euclidean distance: where 1 ≤ f < F, and s i is the standard deviation of each element. The advantage of using the standard Euclidean distance is that this formula can avoid the influence of dimensionality differences in the distance calculation. Next, the procedure of the clustering algorithm proposed in this section is shown in Table 2. Assuming that time t is clustered into period p, the MESD t i corresponding to time t can be obtained by averaging the ESD of each day in period p: (26) where q is the number of data samples in period p. Here, t b is the ESD corresponding to the b th data sample in this period. The total failure rate at time t can be modified from (15) and (16) by replacing the parameter ESD t i with MESD t i (t p ): Combining (26) and (27), the analytical failure rate model proposed in this paper is established.

C. VERIFICATION METHOD OF THE FAILURE RATE MODEL
Although the failure rate of an individual transformer is affected by factors such as the region, operation time, and statistical methods, the transformers' failure rates have similar changing trends over time [48], [49]. The trend in the individual transformer failure rates should be consistent with the statistical data of the failure rate that is observed from multiple transformers. This paper proposes a verification method of the transformer failure rate model by comparing the similarity in the changing trend of the failure rate over time.
Suppose that λ pro = {λ pro,1 , λ pro,2 , λ pro,3 , . . . , λ pro,τ } is the time series of the time-varying failure rates of an individual transformer obtained from the failure rate model. Additionally, λ sta = {λ sta,1 , λ sta,2 , λ sta,3 , . . . , λ sta,υ } is the time the series of failure rates from the statistical data of multiple transformers. The similarity between the overall failure rate and the individual transformer failure rate can be characterized by the parameter RMSD as follows: where τ and υ are the number of points in the sequence λ pro and λ sta respectively. DTW(τ , υ) represents the distance between λ pro and λ sta with dynamic time regularization [50]: DTW(o, z) represents the Euclidean distance between the o th point of the sequence λ pro and the z th point of the sequence λ sta . A smaller RMSD value indicates greater similarity between the trends of the two failure rate time series. As a result, the change in the failure rate of an individual transformer calculated by the failure rate model is more consistent with that obtained from the statistical data; i.e., the model is more accurate.
To better explain how different methods mentioned above join together to established the novel transformer failure rate model above, the flowchart of the transformer failure rate modeling process proposed is given in Fig 6. Note that the proposed validation method in Section III-C does not need to be considered in the flowchart description below.
As shown in Fig 6 above, the failure rate modeling proposed can be divided into three main parts including repairing the missing DGA data, modifying the model parameters and calculating the transformer failure rate. The detailed explanation of part 1, part 2 and part 3 can be found in the section II and section III.

IV. CASE STUDY
This paper uses the DGA data of multiple 110 kV oil-immersed power transformers from an electrical company in southwest China in 2017 to verify the proposed failure rate model of a transformer. The model is applied to two transformers of a typical substation, as shown in Fig 7. The two 220/110 kV transformers, T1 and T2, use the same DGA data. The external failure rate of transformer C is set to 0.005. The constant failure rate λ c of the two transformers is set to 0.2. The reliability data and load data of this substation can be found in [51]. The failure rates from the statistical data of multiple transformers can be found in [52].
In this section, study 1 verifies the effectiveness of the repairing method for the missing DGA data. In Study 2, the failure rate of the transformer without maintenance was calculated. Study 3 calculates the failure rate of the transformer while considering maintenance first. The results are then applied in the reliability assessment of the substation by the state analysis method [53]. Study 4 verifies the accuracy  The samples with missing data are shown in Table 3. The unit of each gas content is µL/L. ' * ' represents the missing DGA data in each sample. Table 4 gives the corresponding RV, RE (Relative Error) and FTJ (Fault Types Judged) obtained by the three methods.
It can be seen from Table 4 that the three methods all have low relative errors (<10%) when the data missing occurs in a normal state (sample 1 and sample 2). At the same time, all three methods make the correct judgment of the fault types based the repaired DGA data. On closer examination, the repaired value based on the proposed method keeps the relative error less than 1%, which means that it is better than the other two methods when the data missing occurs in a normal state.
When the missing data occurs in abnormal or fault states (sample 3 and sample 4), the proposed method still shows good performance. The relative error is still less than 15%, which implies its accuracy for the fault type judgment. However, the relative errors obtained by the KNN and Arima methods are larger than 45%, which results in mistakes in the fault type judgment. The results confirm that the repairing method proposed is suitable for repairing the missing data that occurs in abnormal or fault states compared with the traditional KNN method and Arima method. Table 5 gives the maximum, minimum and the average values of RE obtained from the three methods.
It can be seen from Table 5 that the proposed repairing method always has a lower RE regardless of the operating states of the transformers. The average RE is reduced by 23.69% and 17.07% compared with KNN and Arima, respectively. Therefore, the method can provide accurate repaired data for the failure rate model.       According to Fig 9, the failure rate given by the proposed model is maintained at a constant value in a short time period, even though the DGA data are changing every day. For example, the failure rates from 2010-1-22 to 2010-1-27 maintain the same value. The reason is that the model proposed considers short-term changes in the dissolved gases to be a normal phenomenon. Overall, the failure rate curve still shows a trend of stepped increase, whereas that obtained from the Markov model is continuously increasing. Moreover, the failure rate curve obtained by the proposed model fluctuates around the Markov failure rate curve.
In fact, the transformer had just been put into service from 2010-1-22 to 2020-2-15, and the failure rate will not reach 0.2 in this period. The failure rate only increases from 0.00541 to 0.00555, and the internal latent failure rate only increases from 0.00041 to 0.00055. The reason is that the transformer is in a normal state during 2010-1-22 to 2010-2-15. The main cause of transformer failures should be attributed to external random failures rather than internal latent failures.

C. STUDY 3. SHORT-TERM FAILURE RATES WITH MAINTENANCE
To characterize the impact of maintenance on the transformer failure rate, study 3 selects the DGA data from 2013-11-3 to 2013-12-1 as the modeling data for transformer T1 and T2. The maintenance of the transformer is scheduled on 2013-11-12. Table 6 shows part of the data from 2013-11-3 to 2013-12-1. It can be seen from Table 6 that the ESD will not be equal to the SD for each day after the maintenance. This table also shows that the MESD in 2013-11-11 is 142.5 days in state 3, and the MESD in 2013-11-12 is 171.890 days in state 1, after maintenance. However, the Markov model considers that the transformer is able to return to state 1 of the first day after maintenance.
The failure rates from 2013-11-3 to 2013-12-1 based on the proposed model are displayed in Fig 10. They are compared with the results obtained by the Markov failure rate model. This case focuses on the analysis of the impact of maintenance on the transformer failure rate. The increment of the failure rate caused by maintenance is the difference between the failure rate obtained by the proposed method and the Markov model. The increments recorded in the period from 2013-11-12 to 2013-12-1 are shown in Table 7. Note that the unit for the values in the table below is 10 −2 .
After maintenance, the failure rate obtained by the Markov failure rate model is significantly smaller than that obtained by the proposed model. The reason is that the actual effect of the maintenance is overestimated in the Markov model. As can be seen in Table 7, the failure rates obtained from Markov model are keep in 0.005 from 2013-11-12 to 2013-12-1 because the internal latent failure rates calculated are ignored as they are very close to 0. Although maintenance indeed can reduce the failure rate, the transformer cannot be repaired to the initial operating state. The results of the reliability assessment for the substation can be calculated based on the two failure rate models. This paper selects Loss of load probability (LOLP) and Expected energy not supplied (EENS) as the indices for the reliability assessment [54]. The LOLP and EENS are shown in Fig 11. Before the maintenance of the transformer, the results of the assessments from the two models are close to each other. However, the LOLP and EENS calculated based on the proposed model are significantly higher than the results by the Markov failure rate model after maintenance. The results show that the maintenance operation has a large influence on the substation reliability assessment. If the actual impact of the maintenance is not considered in the reliability assessment, the assessment results will be overoptimistic.

D. STUDY 4. VERIFICATION OF THE FAILURE RATE MODEL
In this case, the RMSD between the failure rate curve obtained by the proposed model and the statistical failure rate curve is used to measure the accuracy of the proposed model. Details on the RMSD can be found in section III.
The daily DGA data used in the proposed failure rate mode is collected from the transformers whose operation time range is from 7.4 to 11.3 years. In total, 7 sampling points are selected, and the intervals between the sampling points are uneven. The average failure rates at the sampling points are then calculated separately.
The failure rate curves obtained from the proposed model and the statistical data are shown in Fig 12. For comparison,  Fig 12 also gives the curve obtained from the Markov model. The failure rate curve in blue is the overall failure rate curve based on the statistical data. The curve in orange is obtained from the proposed model. The gray curve is the transformer failure rate curve based on the Markov failure rate model.
To verify the effectiveness of the different failure rate models, the RMSD between the orange curve and the blue curve, and that between the gray curve and the blue curve, are calculated. The results are shown in Table 8. Table 8 shows that the RMSD of the proposed model is smaller than that of the Markov failure rate model and is   decreased by 60.34%. The trends in the failure rate curve proposed are more consistent with the overall failure rate curve. This finding shows that the effectiveness of the model is indeed improved compared with the Markov failure rate model. The results also show that maintenance has a large influence on the failure rate, and its impact cannot be ignored.

V. CONCLUSION
This paper proposes a new approach for transformer timevarying failure rate modeling by mining the DGA data on a short-time scale. The model obtained by the approach builds a connection between the DGA data and the time-varying failure rate of the transformer, which quantifies the impact of maintenance on the failure rate.
Case studies show that DGA data can effectively describe the operating conditions of transformers and the impact of maintenance on the transformer failure rate. After maintenance, the failure rate will decrease, but it is still higher than that obtained from the Markov failure rate model. On the other hand, the LOLP and EENS calculated based on the proposed model are higher than that obtained from the Markov failure rate model. The results show that the trend of the failure rate curve obtained from the model proposed is more in line with the statistical failure rate curve. Therefore, the proposed model can better characterize the actual transformer failure rate considering the impact of maintenance. Furthermore, the proposed repairing method for missing data can effectively improve the accuracy of the modeling data.
With the installation and operation of a large amount of transformer monitoring equipment, massive quantities monitoring data will be used in the modeling of transformer failure rates. Determining how to effectively identify the erroneous monitoring data and improve the accuracy of the transformer failure rate model will be an important research direction in the future.