Failure Analysis for Truncated and Fully Censored Lifetime Data With a Hierarchical Grid Algorithm

In the field of failure analysis and reliability evaluation, truncated and censored lifetime data due to observational constraints and periodic inspection schemes are highly common. The expectation-maximization algorithm is a widely employed approach for the parameter estimations of truncated and censored lifetime data. In this study, a new hierarchical grid algorithm is proposed to estimate the model parameters based on left-truncated and fully-censored data. An improved expectation-maximization algorithm is also adapted under incomplete information. These two methods are compared using Monte Carlo simulations. The confidence intervals corresponding to different quantile methods of the nonparametric bootstrap are compared in terms of coverage probabilities. In a case study, the failure analysis and prediction intervals for individual coupler knuckles of rail wagons are discussed. This simulation study and case study verify the effectiveness of the proposed framework.


I. INTRODUCTION
Lifetime data has been used to evaluate the time-to-failure distribution and to predict remaining useful life (RUL) for a long time in the field of reliability. The quality of the lifetime data collection strongly affects further research. However, incomplete data containing truncation and censored information are highly common due to long-interval inspections, especially during the process of establishing the condition monitoring system.
Censoring is generated by the scenario that researchers cannot fully observe the occurrence of a research event during the observation time window due to realistic limitations. This characteristic is an attribute of individuals in the sampling group, which means that under the same observation conditions, the entry information of a sample is known, but the start-stop information of individuals is (partly) unknown. Unlike censoring, truncation is an attribute of the population. Due to the limitations of observation conditions, the subjects can only be observed when they appear within a particular The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojun Li . range, and the existence of individuals who fall outside the certain range cannot be known. In other words, the entry information of the population is incomplete. The books by Cohen [1], Meeker and Escobar [2] provide detail discussions on truncation and censoring.
Many researchers have developed statistical inference methods based on truncated and censored data. Among these methods, the left-truncated and right-censored structure is the most widely studied data scenarios. Hong et al. [3] developed a statistically based analysis and predicted the RUL of high-voltage power transformers for maintenance planning and capital expenditures purposes. In their study, the collected power transformers lifetime data were lefttruncated at the starting date of record keeping and rightcensored at the ending date of the study. Following studies by these researchers, Balakrishnan and Mitra [4]- [6], Emura and Shiu [7], and Kundu and Mitra [8], [9] worked on likelihoodbased inference for left-truncated and right-censored data. Zhao et al. [10] and Bai [11] also studied left-truncated and right-censored data in other areas, such as clinical trials.
Right censoring could be viewed as a special form of interval censoring. Interval-censored observations consist of the upper and lower bounds of a failure time and often occur in, for example, epidemiological and medical follow-up studies. Shen [12], [13], Wang et al. [14], Shen and Li [15], and Shen et al. [16] proposed a series on studies of the conditional likelihood approach for statistical inference based on lefttruncated and interval-censored data in an AIDS cohort study. Balakrishnan and Mitra [17] proposed to replace the continuous checkup with a periodic checkup of power transformers to obtain left-truncated and interval-censored data.
The expectation-maximization (EM) algorithm [18]- [20] has been used by numerous researchers, especially in the context of incomplete data. The EM algorithm consists of two steps: the expectation step (E-step) and the maximization step (M-step). In the E-step, the conditional expectation of the complete data likelihood is obtained, given the observed incomplete data and the current value of the parameter. In the M-step, this expected complete likelihood is maximized by adjusting the parameters. The E-step and M-step are then iterated until reaching the convergence criterion [21].
Balakrishnan and Mitra [4] developed an EM algorithm for fitting a lognormal distribution to the left-truncated and right-censored data. Similar EM algorithms have been developed for Weibull distribution [5] and generalized gamma distribution [6]. Recently, Shen et al. [16] proposed an EM algorithm based on left-truncated and interval-censored data for obtaining the conditional maximum likelihood estimator (cMLE). Balakrishnan and Mitra [17] developed a stochastic expectation-maximization (St-EM) algorithm to estimate the model parameters.
However, the inference of parameter estimation methods and their applications to left-truncated and fully censored data have not been studied extensively. Table 1 summarizes a detailed comparison of the proposed method and the related literature.
Although the EM algorithm is widely used for parameter estimation, its limitations are also significant. In the case of small samples and high truncation ratios, the EM algorithm is relatively limited to the selection of the initial value, and the overall accuracy of parameter estimation under these conditions needs to be further improved. Furthermore, the EM algorithm possesses many iteration steps in the parameter estimation process and the derivation process is highly complicated with no closed-form solution for most cases.
In this study, a framework for analyzing left-truncated and fully censored data is proposed. A new hierarchical grid (HG) algorithm is proposed for parameter estimation and is compared with an adapted EM algorithm. The effectiveness, superiority, and practicality of the proposed HG algorithm is validated by a simulation study and case study.
The rest of this article is organized as follows. Section II describes the data features and the likelihood function construction. Section III introduces the proposed HG algorithm for the maximum likelihood estimation of the model parameters. Section IV presents a simulation study to compare the improved EM algorithm and the HG algorithm. The nonparametric bootstrap methods for the construction of the parameter confidence intervals are also discussed. Section V provides a case study to illustrate the whole framework of handling left-truncated and fully censored data for failure analysis and RUL predictions. Section VI summarizes this study with a general conclusion and suggestions of further research directions.

II. LIKELIHOOD CONSTRUCTION OF LEFT-TRUNCATED AND FULLY CENSORED DATA
In this section, a realistic scenario containing incomplete lifetime data is introduced to illustrate left-truncated and fully censored data. The construction of likelihood functions is also presented.

A. REALISTIC LEFT-TRUNCATED AND FULLY CENSORED DATA SCENARIO
The lifetime data of coupler knuckles is collected from a rail wagon maintenance factory, which was established in 2015. The lifetime data is observable only when the unit failed after the year 2015, as careful archival record keeping of knuckle couplers' lifetime data started in that year. In other words, the lifetime data has been left-truncated at 2015. The recording time-window is from 2015 to 2019, a four-year period. Clearly, the interval is significantly shorter than the average lifetime of a coupler knuckle of 8-9 years under normal operating conditions. However, the customers of the maintenance factory maintain numerous rail wagons, which have been running for decades. Fig. 1 shows an illustration of the truncation and censoring mechanism based on the observation time-window. For ''Subject a'', the service starting time and the failure time of the coupler knuckle are both before 2015. There is no information available (including its existence) for Subject a, as this coupler knuckle is completely outside the observation VOLUME 8, 2020   Fig. 2 shows the mechanism for generating different censored data based on the maintenance policy. All the observable units are pulled back to the same starting point (the beginning of their service time). It is known that coupler knuckles are inspected every two years. ''Subject e'' failed at the first inspection, and its failure should have happened sometime before the first inspection. Hence, Subject e is leftcensored at the first inspection. The ''Subject f'' passed the first inspection and failed at the second inspection. The failure of this subject should have happened sometime between the first inspection and the second inspection. Hence, Subject f is interval-censored. ''Subject g'' remained in service after the third inspection, and its failure should be sometime after the third inspection. Hence, Subject g is right-censored at the third inspection. It is worth noting that these collected data are also used in the case study section to illustrate the proposed models.

B. LIKELIHOOD CONSTRUCTION
Likelihood represents the probability of data generation. From this perspective, truncation could be viewed as a conditional sampling distribution, and censoring could be viewed as the cumulative likelihood of an interval. The likelihood construction of truncated and censored data has been well established in the literature [22], [23]. In this section, a concise summary of related concepts is given to avoid any misunderstanding.
The lifetime of a unit is denoted by the random variable t. Let i be the index of the unit and j be the index of inspection. Let t j i denote the lifetime of the i th unit in the j th (j ≥ 1) inspection. For a unit installed before 2015 (the truncation point), let τ L i denote the time between the year of installation and 2015. For a unit that is still working after 2019 (the rightcensored point), let c R i denote the time between the year of installation and 2019. Let α i denote the truncation indicator, that is, Let β i denote the censoring indicator, such that: Further, let S 1 and S 2 be two sets of units, where S 1 is the set of units installed on or after 2015 and S 2 is the set of units installed before 2015. The likelihood function for the lifetime data is represented as After simplification, the log-likelihood function for lifetime data is given as

III. PARAMETER ESTIMATION METHODS
In this section, the idea and procedure of the proposed hierarchical grid algorithm are introduced. An improved version of the EM algorithm that can handle fully censored data is also presented for comparison purposes.

A. PROPOSED HIERARCHICAL GRID ALGORITHM
For right-censored lifetime data, the log-likelihood function is relatively simple, and the analytical formula can be obtained by EM algorithm with Newton-Raphson method. For right-censored and left-truncated lifetime data, approximate solutions could be obtained by EM algorithm with the missing information principle [19]. However, the data structure studied in this paper contains fully censored data with a high truncation ratio, and it is highly difficult to obtain the derivative of likelihood functions. Since the proposed loglikelihood function model is convex-that is, it has a unique peak value, and the function value changes gently near the peak value-this paper proposes a hierarchical grid algorithm inspired by the idea of the gradient descent algorithm.
The gradient descent algorithm is one of the simplest and oldest methods for solving unconstrained optimization problems. Based on this approach, many effective algorithms have been improved. The gradient descent method uses the negative gradient direction as the search direction. Thus, in the proposed HG algorithm, the log-likelihood function is converted to its negative value, and then the maximum of the likelihood function is converted to obtain the minimum of the likelihood function. The multivariate function −Q (θ) is defined to be equivalent to the log likelihood function logL (θ | DATA). Then, solving the optimal parameters of the log-likelihood function is equivalent to solving the extremum of this multivariate function. Suppose the parameters to be solved follow normal distributions. Then the multivariate function to be solved is a binary function.
In view of the monotonicity of the likelihood function, if the maximum of the current log-likelihood function is located in the center of a rectangular grid lattice, the global maximum exists in the four-unit domains adjacent to the current maximum. In short, the HG algorithm first searches the region where the extremum exists from the global feasible solution space and then searches for the extreme points in the extremum region. As shown in Fig. 3, the HG is roughly divided into the following two stages.

1) STAGE 1. FIND THE REGION WHERE THE EXTREMUM EXISTS
Step 1: Set the current minimum point where θ * denote the global optimal value. Construct an equally spaced initial collection of points Define the diagonal length of the unit grid as the iteration step size of the corresponding parameter, and its size is The description of the symbols in Fig. 4 is as follows: where σ i1 and σ in are the minimum and maximum values of the collection of points i in the σ direction, µ i1 and µ in are the minimum and maximum values of collection of points i in the µ direction, f i is the step coefficient, and C is a constant. The larger C is, the larger the coverage of the collection of points i . The constant m determines the coverage of the initial point set. Let θ i,1×1 , θ i,1×n , θ i,n×1 , and θ i,n×n be the lower-left endpoint, the upper-left endpoint, the lower-right endpoint, and the upper-right endpoint of the collection of points, respectively. In Stage 1, θ 0 assumes the prior parameter, such as the mean and standard deviation of the sample. Let θ i = θ i−1 , i = 1, 2, 3, . . ..

2) STAGE 2 OBTAIN THE EXTREMUM IN THE REGION WHERE THE EXTREMUM EXISTS
Step 3: Set the current minimum point . .. Let parameter θ * denote the global optimal value. Construct the equally spaced initial collection of points j = {θ j,1×1 , θ j,1×2 , . . . , θ j,1×n , θ j,2×1 , . . . , θ j,n×(n−1) , θ j,n×n }. The description of the symbols in Fig. 5 is as follows: where σ j1 and σ jn are the minimum and maximum values of the collection of points j in the σ direction, µ j1 and µ jn are the minimum and maximum values of the collection of points j in the µ direction, f j is the step coefficient, and C is a constant. The larger C is, the larger the coverage of the collection of points j . Let θ j,1×1 , θ j,1×n , θ j,n×1 , and θ j,n×n be the lower-left endpoint, the upper-left endpoint, the lowerright endpoint, and the upper-right endpoint of the collection of points, respectively.
Step 4: As shown in Fig. 5, we obtain the current minimum It is worth noting that this algorithm requires that the precision of the two parameters both meet the requirements, rather than only one parameter.
In summary, the proposed HG algorithm shares three basic elements with the gradient descent method, namely, the initial point, the descending direction and the descending step size. Table 2 compares the three elements of the proposed HG algorithm and the gradient descent method.

B. IMPROVED EM ALGORITHM
The EM algorithm is a generalized iterative algorithm that computes a maximum likelihood estimation of a parameter in the case of incomplete data. The approach corrects the censored data using the parameter estimation from the previous  step and then updates the parameter estimation with the maximum likelihood criterion. The EM algorithms proposed in the literature were mainly focused on the case of left-truncated and right-censored data. Therefore, EM algorithms need to be improved to handle interval-censored data. In this section, an improved EM algorithm is proposed to analyze left-truncated and fully censored data with different censoring types. In general, the partial derivative of the function can be calculated to obtain the general expression of the extreme point of the function. If the multivariate function is complex and it is difficult to derive the partial derivative of the function, then the numerical solution can be used to solve the approximate optimal solution of the function.
Let Y = {y 1 , · · · , y k+1 , · · · , y n } be the complete logtransformed lifetime data. It is assumed that the lifetime of the units follows the normal distribution with a probability density function (PDF) as: Since certain lifetime data points are not observed exactly, let Z = {z 1 , · · · , z k , z + k+1 , · · · , z + n } be the observed function of Y, and among the points, z + k+1 , · · · , z + n are the censored data.
Based on the left-truncated and full-censored dataset Z, this paper obtains the model parameters θ = (µ, σ ) by the following improved EM algorithm. E-step: Calculate the conditional expectation of a loglikelihood function.
where µ (i) , σ (i) are the parameter values at the beginning of the i + 1 th iteration and S 1 is the left truncated data sets. To simplify the function, the following expressions are defined: where i = 1, 2, · · · and j = k + 1, · · · . Then, the summation part of the log likelihood function can be expressed as

M-step:
Obtain the maximum point of the function Q θ | θ (i) .
First, the following equations are defined to further simplify the likelihood function: Then, it is easy to verify that a (t) = (t) + µ (i) and b (t) = t + µ (i) (t) + µ (i)2 + σ (i)2 . Thus, (9) and (10) can be updated as: If the incomplete data does not contain truncated data, then the likelihood function can be partially derived to obtain the general formula of the parameter estimation as: If the incomplete data contains truncated data, then it is difficult to solve the partial derivative of the likelihood function. Therefore, the Nelder-Mead simplex algorithm is used to obtain the approximate optimal solution. Through improvement, the improved EM algorithm can solve the parameters based on left-truncated and full-censored data.

IV. NUMERICAL VALIDATION
In this section, the validity, superiority, and robustness of the proposed HG algorithm are discussed. The comparisons of different bootstrap confidence interval estimating methods are also discussed based on a simulation study.

A. NUMERICAL VALIDATION BASED ON LEFT-TRUNCATED AND RIGHT-CENSORED DATA
To verify the proposed HG algorithm in the case of lefttruncated and right-censored data, the EM algorithm of lognormal distribution introduced by Balakrishnan and Mitra [4] is employed for comparison. The sample mean µ = 3.03 and standard deviation σ = 0.41 are taken as the initial values to obtain the target parameter estimations asμ = 3.481 and σ = 0.456. The maximization step of the EM algorithm involves an approximation using a Taylor series expansion.  The tolerance is indicated by the Cartesian distance between the corresponding parameter point of the current iteration and that of the previous iteration. The iterative process and results of the two algorithms are shown in Table 3.
As shown in Table 3, the parameter estimation results of the HG algorithm and the improved EM algorithm are almost the same, which indicates that the HG algorithm is also suitable for dealing with left-truncation and right-censored data. The HG algorithm takes fewer steps to reach convergence than does the improved EM algorithm. The HG algorithm only needs four iterations to reach a tolerance of less than 0.001. These results reveal that the HG algorithm converges more rapidly than does the improved EM algorithm.

B. SIMULATION COMPARISONS FOR FULLY CENSORED DATA
To verify the applicability of the proposed HG algorithm to left-truncated and fully censored data, this section adds interval censoring to the simulation data from Balakrishnan and Mitra [4]. The improved EM algorithm is used for verification and comparison. Per Hong et al. [3], the truncation time is fixed as 1980, and the censoring time has been fixed as 2008. The specific simulation steps are summarized as follows.
Step 1: To analyze the influences of the percentage of truncation on the algorithm, the truncation percentages are set to be 30% or 60%.
Step 3: Set the sample size n. In this study, three levels of sample size are considered: n = 75, 100, and 200.
Step 5: Combine the lifetime with the corresponding installation year, and then, the year of failure of the machine is obtained. If the year of failure is before 1980, we discard the unobserved machine and generate a new observed one with the new installation year and lifetime. If the year of failure is between 1980 and 2008, then this entry is interval censored.
If the year of failure is after 2008, this entry is right-censored.
Step 6: Define two parameters following uniform distribution, i.e., δ 1 , δ 2 ∼ U (0.05,0.1). If the machine is intervalcensored, then add these two parameters to the endpoints of the lifetime, i.e., (L i , R i ] = (X i − δ 1 , X i + δ 2 ]. If the machine is right-censored, then add the first parameter to the left endpoint of the lifetime, i.e., (L i , Since the generated sample data is fully censored data, the sample mean and standard deviation of the left endpoint of the lifetime can be taken as the initial value. The results including the bias and mean square error of the parameter estimation, average number of iterations needed, and average tolerance limit based on a 1000-time Monte Carlo simulation are summarized in Table 5. EM I corresponds to the improved EM algorithm for adapting left-truncated and fully censored data, while HG corresponds to the hierarchical grid algorithm.
The simulation results of Table 5 show that the bias and mean square error values of the model parameters of the two methods are approximately the same. When the parameter has a large true value, the average number of iterations needed and the average tolerance limit of the HG algorithm are significantly lower than those of the EM I algorithm, which means that the iteration step size of the HG algorithm is larger than those of the EM I algorithm.
Both algorithms use conditional probability to correct the deviation caused by the truncation so that the estimation results should have less deviation from the true value. As the number of data point increases, the deviation between the estimation result and the true value decreases. In such cases, the truncation ratio has effects on the parameter estimation.
Based on the above analysis, the stability of both methods is good, but the HG algorithm has a faster convergence speed. For sets of simulation data with small true values of model parameters, both algorithms show better stability and convergence performance.

C. DISCUSSION ON INITIAL VALUE AND STEP SIZE
The above discussions verify the effectiveness of the HG algorithm. Based on the derivation and the simulation results, it is easy to conclude that the EM I algorithm can find the general formula of the parameter estimate without considering the truncated data. In this section, the influence of initial value and step size is discussed.
The EM algorithm depends on the choice of initial values and shows a problem of convergence when a good choice is not given. Table 6 presents the average number of iterations needed and the running time of the above two algorithms. The running time is counted when the algorithm ends iteratively, that is, when the parameter accuracy requirement is met. This discussion is based on the simulation data in the above section with µ = 3.5, σ = 0.5, and n = 200. This study selects TABLE 5. Bias (B), mean square error (MSE), average number of iterations (AI), and average tolerance limit (AT) for the lognormal distribution using two different methods (EM I and HG). two far-off initial values for µ and σ : µ = 0.1 or 10 and σ = 0.01 or 5.
As shown in Table 6, the EM I algorithm takes more steps to converge, and it even cannot converge in some cases. In contrast, the HG algorithm requires fewer iterations, has a shorter running time, and exhibits a good robustness.
Considering that the parameter value of the first solution is in the edge region, setting the size of the new parameter domain directly determines whether the global minimum point can be found. This section also compares and analyzes the effect of the step size on seeking the global minimum point.
Set the step size coefficients of the HG algorithm to be m = 25 and c = 10. Fig. 6 shows the process of obtaining the global minimum value according to the step size coefficient function f i = C 1+e −i × m−1 2 and constant, i.e., f i = m−1 2 . Evidently, the change in the step length effectively reduces the number of iterations of the algorithm and helps the algorithm to converge quickly. The authors suggest to further study the The proposed HG algorithm is similar to the Nelder-Mead method [26], since both of them are numerical optimization methods. The HG algorithm combines the ideas of simplex method and gradient descent method. In the first phase of the HG algorithm, the number of grid points is relatively large, which consumes more computing resources. The HG algorithm searches the grid points to get the optimal solution. The values of grid points are enumerated and compared to determine the peak domain, and the global optimal solution is obtained within the peak domain. It is worth noting the feature of global convergence of the HG algorithm is under the assumption that the log-likelihood function is strictly VOLUME 8, 2020 convex near the peak with a flat unimodal area. In summary, the number of iterations of the HG algorithm is significantly less than that of the improved EM algorithm, and the proposed method converges more quickly when given a far-off initial value. The HG algorithm has stronger applicability and can obtain a global optimal solution.

D. COMPARISONS AMONG BOOTSTRAP CONFIDENCE INTERVALS
The bootstrap method was originally developed by B.Efron as a method for deriving the standard error of arbitrary estimates, and it can be divided into the nonparametric bootstrap approach and the parametric bootstrap approach [27]. The parametric bootstrap method makes assumptions about the underlying distribution. However, when dealing with data involving censoring or truncation, the parametric bootstrap approach requires specifications as to how the censored and truncated data were generated. Since the distribution assumptions on the truncation time and censoring time are not available for most cases, the sample is fully censored, and the sample size is sufficiently large, this study employed the nonparametric bootstrap method to obtain the parameter interval estimations.
The classical approach for constructing the nonparametric bootstrap confidence intervals of a parameter of interest is to use the "appropriate" quantiles of the empirical bootstrap distribution of that parameter estimator. In this study, four approaches of quantile selection are considered, namely, the simple percentile method (SP), the BCa percentile method, the BC percentile method, and the nonparametric basic bootstrap method (BM). Table 7 and Table 8 present the coverage probabilities of the four confidence interval methods based on the nonparametric bootstrap for parameters µ and σ with different percentages of truncation and different nominal confidence levels.
From the results presented in Table 7, it is observed that for the parameter µ, the coverage probability of BM is closest to the nominal confidence level when the true value of the parameter is large. The coverage probability of SP is closer to the nominal confidence level when the true value is small. When the truncation ratio is higher, the coverage probability of BM is closer to the nominal confidence interval, and when the truncation ratio is lower, SP is closer to the nominal confidence interval. The choice of confidence level has an impact on the coverage probability. As the sample size increases, the coverage probability of SP for different combinations is closer to the nominal confidence interval. It is worth mentioning that the difference among the four methods exists in the thousandth position for the parameter µ, which can be ignored for most of the cases. From the results presented in Table 8, it is observed that for the parameter σ , the coverage probability of the BCa bootstrap method is closest to the nominal confidence level. In addition, the coverage probabilities of the different methods present significant differences in the interval estimation of the parameter σ . In summary, the BCa bootstrap method is the preferred approach.
Based on the simulation study, the proposed HG algorithm can handle left-truncated and fully censored data. The parameter estimation accuracy of the algorithm is at the same level as that of the improved EM algorithm. The convergence speed of the algorithm is generally higher than that of the EM algorithm. The dependence of the algorithm on the initial value is more robust than that of the EM algorithm.

V. CASE STUDY
The lifetime data of coupler knuckles mentioned in the previous section is employed as a case study to illustrate the proposed method and its applications. The coupler knuckle is a component used to achieve the connection between a locomotive and a car or between cars in a rail wagon. This component transmits traction and impact forces and maintains a certain distance between the vehicles. The failure of a coupler knuckle threatens the safety of rail transport.
In this case, randomly selected maintenance data by a fixed observation interval are collected. Based on the collected data and the methodological framework above, this section provides the lifetime analysis model for the coupler knuckle data and the predictions for the RUL of individuals. The cumulative number of failures for the overall population of coupler knuckles in service is also analyzed as a function of time.
The dataset used in this study contains 6724 observations including 3872 failures. Table 9 gives a summary of four types of lifetime data from different manufacturers. Except for the MB manufacturer group, most of the lifetime data of coupler knuckles are truncated. The overall truncation accounts for 52.62%.
The Weibull distribution is the most widely used lifetime distribution for time-to-failure modeling. The lognormal distribution is suitable to describe the time to fracture fatigue crack growth in metals. In addition, Xue [28] proposed that the fatigue life of the fatigue zone of the hook body and that the coupler knuckle follows a normal distribution when considering the truncated and censored data. In this study, these three distributions are analyzed and discussed. It is straightforward to obtain the log-likelihood function of the normal distribution by substituting specific parameters, as shown in the following equation: Based on the logarithm of the lifetime data, the likelihood function in (21) can be used to represent the likelihood function of the lognormal distribution as well. The log-likelihood function of the Weibull distribution is shown in the following equation: The Akaike information criterion (AIC) is selected as the model selection indicator in this study. Table 10 shows the parameter estimations based on the assumptions of the lognormal, Weibull, and normal distributions by the HG algorithm. The lognormal distribution exhibits the largest AIC among the three distributions, indicating that it is a relatively bad choice. However, the Weibull distribution and the normal distribution yield quite close results in all respects.
Prediction performance is also important in model selection. The Weibull distribution and the normal distribution are both used to predict the cumulative total failure number. Hong et al. [3] proposed the use of calibrated prediction intervals to obtain the RUL of the individual and the cumulative numbers of future failures in a population, as a function of time. This study adopted the same procedures to predict the cumulative failure number of coupler knuckles.  The cumulative number of failures are predicted for every three months of the next two years. Fig. 7 shows the expected predictions and the 99% prediction bands for the cumulative number of future failures for three distributions. By partially amplifying the total number of failures after one year, it can be found that the number of failures of the normal distribution are significantly lower than that of the Weibull distribution. This study adopts a conservative strategy to predict the number of future failures. Therefore, the Weibull distribution is selected. Fig. 8 shows expected predictions and the 90% and 95% pointwise prediction bands based on the Weibull distribution for the cumulative number of future failures.
It is known that there are 2852 coupler knuckles still in service within the sample data. As seen from Fig. 8, in the next year, the cumulative number of failures is approximately 664 with a 95% confidence band of [623,707]. Fig. 9 shows the 90% prediction intervals of the RUL for a subset of individual coupler knuckles that are at risk. For example, the current life of Subject 11 is 583 days with a 90% confidence probability that it will expire between 762 and 3350 days. The current life of Subject 22 is 2002 days, and it will expire between 2050 and 3662 days with a confidence of 90%, which means it is at high risk of failure within a short period of time.

VI. CONCLUSION
In this paper, a unified framework for analyzing left-truncated and fully censored data was proposed. The hierarchical gradient algorithm was introduced and compared with the EM algorithm using the classical data set. To verify the validity of the HG algorithm for the left-truncated and fully censored data, the EM algorithm was improved, and the two algorithms were compared based on the simulation data. The HG algorithm presents fewer iterations and is more stable to the choice of the initial value. Within the framework of the HG algorithm, four quantile methods of nonparametric bootstrap were used to construct the confidence intervals and evaluate the performance in terms of the coverage probability. The coverage probability of the BCa percentile method is closest to the nominal confidence interval. Finally, the proposed algorithm framework was applied to an actual case study and predictions of the RULs of individual coupler knuckles and the cumulative number of failures were given.
It can be seen from the simulation results that the left truncation ratio has an influence on the parameter estimation. Furthermore, the magnitude of the true value of the parameters and the sample size also affect the parameter estimation. It is suggested to further study the effect of the truncation ratio on the parameter estimation. Another topic of interest may be extended from the outcomes of this study to develop a reasonable maintenance strategy. Predictions for the RULs of individual coupler knuckles and the cumulative number of failures in the future provide useful information for maintenance strategies and the inventory management of spare parts.. Economic factors and security considerations should be included in the optimization of maintenance strategies.