First-Order Uncertain Hidden Semi-Markov Process for Failure Prognostics With Scarce Data

Failure prognostics aims at predicting the future degradation trend of the object equipment and deriving its remaining useful life with respect to a predefined failure threshold. The hidden semi-Markov process (HSMP) is widely adopted for failure prognostics of degradation processes with discrete states. The effective estimation of the holding time distribution on each degradation state is of critical importance for the prediction performance of an HSMP model. The distributions are generally estimated with frequency-based probabilities given a large amount of degradation data. In practical engineering applications, it is difficult to collect enough data and the data can even be scarce. In such situations, the estimated distributions are no longer reliable. A first-order uncertain hidden semi-Markov process (1-UHSMP) based on uncertain statistics is defined in this work. The holding time distributions in 1-UHSMP are described with uncertainty theory and are adaptively updated with conditional uncertainty distributions given observations related to the true degradation states. Analytical expressions are derived for the expected remaining useful life for 1-UHSMP with regular uncertainty distributions, i.e. normal and linear uncertainty distributions. The proposed method can build a degradation model from scarce data and adaptively derive the remaining useful life with an associated uncertainty interval. A case study concerning centrifugal pumps in a nuclear power plant is considered to verify the effectiveness of 1-UHSMP.


I. INTRODUCTION
Failure prognostics aims at predicting the future degradation evolution of an object equipment and, thus, its remaining useful life given its intended functions with the desired specifications [1]. Remaining useful life (RUL) is the length of time from the present time to the expected time at which the object equipment will no longer perform its intended function. Effective failure prognostics may increase maintenance efficiency and equipment availability, thus reducing the life cycle cost [2].
Many efforts have been devoted to the methodological research. The developed prognostic approaches can be mainly categorized as physics-of-failure (PoF) based and data-driven methods [3]. PoF-based methods model the degradation process by analyzing the inherent physical, chemical and structural interactions of the object equipment. Data-driven methods extract the statistical relations between the monitoring data and the RUL from the collected data. Since equipment is becoming more and more integrated and complex, it is quite difficult and time-consuming to build a precise PoF-based model. With the success of computer science, data-driven prognostic methods have become a major trend in failure prognostics-related research. The commonly adopted data-driven methods include auto-regressive models [4], Bayesian methods [5], Markov processes [6], artificial neural networks [7], support vector machines [8], deep learning [9], etc.
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Liu.
Although data-driven methods have achieved satisfactory results in many laboratory applications, much effort is still needed to bridge the gap between theory and industrial applications in failure prognostics. Unlike in laboratory applications, a major gap lies between the demand for a large amount of degradation data for training an effective data-driven model and the limited informative degradation data collected from equipment under operation [10]. The reasons for data scarcity are diverse, including the limited number of deployed equipment (e.g. satellites), conservative maintenance strategies (e.g. in nuclear power plants), and the low signal-to-noise ratio in data collected from equipment such as airplanes.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 1. The structure of a first-order hidden semi-Markov process.
For a degradation process with discrete states, the hidden semi-Markov process (HSMP) is among the most widely adopted models for failure prognostics [11]. Figure 1 shows the structure of a first-order HSMP for a degradation process with $N + 1$ discrete states. In the figure, $S_N$ represents the healthy state and $S_0$ is the failure state. Each degradation state $S_i$, $i = 1, \ldots, N$, is assigned a holding time distribution $\Phi_i$ that describes the possible holding times on that state. The lifetime of a new equipment following this degradation process is the sum of the holding times on all states $S_i$ with $i = 1, \ldots, N$. The time series data $z_t$ are the observations related to the degradation states of the object equipment. The true distributions of the holding times on the different states are hardly known and are usually approximated with empirical ones based on the collected observations.
Failure prognostics with a first-order HSMP can be divided into two parts, i.e. degradation process modelling and RUL prediction. For degradation process modelling, the holding time distributions $\Phi_i$ are estimated with the degradation data collected from similar equipment. The derived first-order HSMP model describes an average degradation process of this type of equipment. In the second step, RUL prediction for an object equipment, the observations $z_t$ related to its true degradation states are used to update the estimated holding time distributions $\hat{\Phi}_i$, in order to approximate the true RUL of the object equipment. Results on failure prognostics with first-order HSMP have been reported on hydraulic pumps [12], bearings [6], high-speed milling [13], fuel cells [14], railway turnout systems [15], reciprocating compressors [16], turbofan engines [17], etc.
In these previous works, the holding time distribution on a specific state is estimated with frequency-based probabilities and updated with Bayes' theorem, given a large amount of degradation data. However, it is not always possible to collect enough data, and the data can even be scarce in many industrial applications, such as aerospace and nuclear power plants. For scarce data, frequency-based probabilities may no longer approximate the true holding time distributions. To handle scarce data, the first-order uncertain hidden semi-Markov process (1-UHSMP) is proposed in this work. A first-order UHSMP is the conventional first-order HSMP with the holding time distributions described by uncertainty distributions in uncertainty theory.
Uncertainty theory, which is capable of tackling statistics with scarce data, was first proposed in [18]. For distribution estimation, instead of using the statistical frequencies from a large amount of data, belief degrees are assigned to possible events in uncertainty theory, describing the chance that these events may occur. The calculation of belief degrees does not rely on a sufficiently large amount of data. Thus, uncertainty theory is more suitable for statistical analysis in situations with limited or scarce data. It has been introduced to various domains, such as reliability analysis [19], [20], risk analysis [21], supply chain [22], accelerated degradation testing [23], data envelopment analysis [24], etc.
Facing the scarcity of degradation data, uncertainty theory is integrated with first-order HSMP for failure prognostics in this work. The uncertain graph is proposed in [25] and its characteristics are analyzed in recently published works [26]-[28]. To the knowledge of the author, this is the first time that 1-UHSMP is proposed and adopted for failure prognostics. An update strategy for 1-UHSMP is also given to adaptively adjust the holding time distributions with respect to the observations and, thus, the RUL. First-order UHSMP for failure prognostics takes the same structure as in Figure 1, while the holding time distributions are estimated and updated within the scope of uncertainty theory. An application in nuclear power plants is considered to verify the effectiveness of the proposed model.
The contributions of this work include: 1) the definition of first-order UHSMP; 2) the derivations of predicted RUL and the associated prediction interval with respect to 1-UHSMP; 3) the application of 1-UHSMP in nuclear power plants.
The remainder of the paper is structured as follows. Following a brief review of uncertainty theory, the UHSMP, and especially the 1-UHSMP, is described in detail and the expected value of the predicted RUL is derived for some regular uncertainty distributions. Results on the case study concerning centrifugal pumps in a nuclear power plant are reported and compared with those of the conventional first-order HSMP based on probability theory. Some conclusions are drawn in Section 4.

A. PRELIMINARIES ON UNCERTAINTY THEORY
In first-order UHSMP, the holding time distribution is estimated with uncertain statistics and updated with the conditional uncertain distribution. To explain the proposed method, a brief introduction to the uncertainty theory is given in this section.
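Let $(\Gamma, \mathcal{L})$ be a measurable space and $\mathcal{M}$ be an uncertain measure on the $\sigma$-algebra $\mathcal{L}$. The following four axioms form the cornerstones of uncertainty theory [18]:
Axiom 1 (normality): $\mathcal{M}\{\Gamma\} = 1$ for the universal set $\Gamma$.
Axiom 2 (duality): $\mathcal{M}\{\Lambda\} + \mathcal{M}\{\Lambda^c\} = 1$ for any event $\Lambda$.
Axiom 3 (subadditivity): for every countable sequence of events $\Lambda_1, \Lambda_2, \ldots$, $\mathcal{M}\{\bigcup_{k=1}^{\infty} \Lambda_k\} \le \sum_{k=1}^{\infty} \mathcal{M}\{\Lambda_k\}$.
Axiom 4 (product): let $(\Gamma_k, \mathcal{L}_k, \mathcal{M}_k)$ be uncertainty spaces for $k = 1, 2, \ldots$; the product uncertain measure $\mathcal{M}$ satisfies $\mathcal{M}\{\prod_{k=1}^{\infty} \Lambda_k\} = \bigwedge_{k=1}^{\infty} \mathcal{M}_k\{\Lambda_k\}$, where $\Lambda_k$ are arbitrarily chosen events from $\mathcal{L}_k$ for $k = 1, 2, \ldots$, respectively.
Definition 1: The uncertainty distribution $\Phi_i$ of an uncertain variable $\xi_i$ is defined by
$$\Phi_i(x) = \mathcal{M}\{\xi_i \le x\}$$
for any real number $x$ and $i = 1, \ldots, N$.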
In the proposed UHSMP, the uncertain variable $\xi_i$ is the holding time on the $i$-th state. $\Phi_i(x)$ represents the belief degree that the holding time $\xi_i$ does not exceed $x$.
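Definition 2: An uncertain variable $\xi$ is called linear if it has a linear uncertainty distribution
$$\Phi(x) = \begin{cases} 0, & x \le a \\ (x - a)/(b - a), & a < x \le b \\ 1, & x > b \end{cases}$$
denoted by $L(a, b)$, where $a$ and $b$ are real numbers with $a < b$.
Definition 3: An uncertain variable $\xi$ is called normal if it has a normal uncertainty distribution
$$\Phi(x) = \left(1 + \exp\left(\frac{\pi (e - x)}{\sqrt{3}\,\sigma}\right)\right)^{-1}, \quad x \in \mathbb{R},$$
denoted by $N(e, \sigma)$, where $e$ and $\sigma$ are real numbers with $\sigma > 0$. A normal uncertainty distribution is called standard if $e = 0$ and $\sigma = 1$.
Definition 4: An uncertainty distribution $\Phi(x)$ is said to be regular if it is a continuous and strictly increasing function with respect to $x$ at which $0 < \Phi(x) < 1$, and
$$\lim_{x \to -\infty} \Phi(x) = 0, \quad \lim_{x \to +\infty} \Phi(x) = 1.$$
Definition 5: Let $\xi$ be an uncertain variable with regular uncertainty distribution $\Phi(x)$. Then the inverse function $\Phi^{-1}(\alpha)$ is called the inverse uncertainty distribution of $\xi$.
Definition 6: Let $\xi$ be an uncertain variable. Then the expected value of $\xi$ is defined by
$$E[\xi] = \int_{0}^{+\infty} \mathcal{M}\{\xi \ge x\}\,dx - \int_{-\infty}^{0} \mathcal{M}\{\xi \le x\}\,dx \quad (1)$$
provided that at least one of the two integrals is finite.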
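Definition 7: The conditional uncertainty distribution of an uncertain variable $\xi$ given $A$ is defined by
$$\Phi(x \mid A) = \mathcal{M}\{\xi \le x \mid A\}$$
provided that $\mathcal{M}\{A\} > 0$.
Theorem 1: Let $\xi_1, \xi_2, \ldots, \xi_N$ be independent uncertain variables with regular uncertainty distributions $\Phi_1, \Phi_2, \ldots, \Phi_N$, respectively. If $f$ is a continuous and strictly increasing function, then $\xi = f(\xi_1, \xi_2, \ldots, \xi_N)$ has an inverse uncertainty distribution
$$\Psi^{-1}(\alpha) = f\left(\Phi_1^{-1}(\alpha), \Phi_2^{-1}(\alpha), \ldots, \Phi_N^{-1}(\alpha)\right).$$
In particular, the sum $\xi = \xi_1 + \xi_2 + \cdots + \xi_N$ has an inverse uncertainty distribution
$$\Psi^{-1}(\alpha) = \sum_{i=1}^{N} \Phi_i^{-1}(\alpha). \quad (2)$$
Theorem 2: Let $\xi_1, \xi_2, \ldots, \xi_N$ be independent uncertain variables with regular uncertainty distributions $\Phi_1, \Phi_2, \ldots, \Phi_N$, respectively. If $f$ is a continuous function that is strictly increasing with respect to $x_1, x_2, \ldots, x_m$ and strictly decreasing with respect to $x_{m+1}, x_{m+2}, \ldots, x_N$, then $\xi = f(\xi_1, \xi_2, \ldots, \xi_N)$ has an uncertainty distribution determined by $f$ and $\Phi_1, \Phi_2, \ldots, \Phi_N$.
Theorem 3: Let $\xi$ be an uncertain variable with regular uncertainty distribution $\Phi(x)$. Then
$$E[\xi] = \int_{0}^{1} \Phi^{-1}(\alpha)\,d\alpha.$$
Theorem 4: Let $\xi$ be an uncertain variable with regular uncertainty distribution $\Phi(x)$, and let $t$ be a real number with $\Phi(t) < 1$. Then the conditional uncertainty distribution of $\xi$ given $\xi > t$ is
$$\Phi(x \mid (t, +\infty)) = \begin{cases} 0, & \Phi(x) \le \Phi(t) \\ \dfrac{\Phi(x)}{1 - \Phi(t)} \wedge 0.5, & \Phi(t) < \Phi(x) \le (1 + \Phi(t))/2 \\ \dfrac{\Phi(x) - \Phi(t)}{1 - \Phi(t)}, & (1 + \Phi(t))/2 \le \Phi(x) \end{cases} \quad (3)$$
Theorem 5: Let $f$ and $g$ be comonotonic functions. Then, for any uncertain variable $\xi$, we have
$$E[f(\xi) + g(\xi)] = E[f(\xi)] + E[g(\xi)].$$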
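As a numerical illustration of Definitions 2, 3, 5 and Theorems 1 and 3, the following Python sketch implements the linear and normal uncertainty distributions in the closed forms commonly given in uncertainty theory [18], their inverses, the expected value as the integral of the inverse distribution, and the inverse distribution of a sum. The function names and the midpoint integration rule are illustrative assumptions and are not taken from the paper.

```python
import math

SQRT3 = math.sqrt(3.0)

def normal_cdf(x, e, sigma):
    """Normal uncertainty distribution N(e, sigma) with sigma > 0."""
    z = math.pi * (e - x) / (SQRT3 * sigma)
    if z > 700.0:  # exp would overflow; the distribution value is numerically 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def normal_inv(alpha, e, sigma):
    """Inverse of N(e, sigma) for a belief degree 0 < alpha < 1."""
    return e + (SQRT3 * sigma / math.pi) * math.log(alpha / (1.0 - alpha))

def linear_cdf(x, a, b):
    """Linear uncertainty distribution L(a, b) with a < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def linear_inv(alpha, a, b):
    """Inverse of L(a, b) for a belief degree 0 < alpha < 1."""
    return a + alpha * (b - a)

def expected_value(inv, n=10000):
    """Theorem 3: integrate the inverse distribution over (0, 1), midpoint rule."""
    h = 1.0 / n
    return h * sum(inv((j + 0.5) * h) for j in range(n))

def sum_inverse(inverses):
    """Theorem 1 for a sum: the inverse distributions add up pointwise."""
    return lambda alpha: sum(inv(alpha) for inv in inverses)
```

With these definitions, the sum of two independent normal uncertain variables N(10, 2) and N(5, 1) has the same inverse distribution as N(15, 3), which the sketch reproduces numerically.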

B. FAILURE PROGNOSTICS WITH FIRST-ORDER UNCERTAIN HIDDEN SEMI-MARKOV PROCESS MODEL
As illustrated in the Introduction, failure prognostics with 1-UHSMP can be divided into two steps: degradation process modelling and RUL prediction. First, the empirical distributions of $\Phi_i$, $i = 1, \ldots, N$, are derived from the holding times collected on the different degradation states of failed equipment. Then, during the RUL prediction of an object equipment, the time series data of observations related to the true degradation states are acquired. The holding time distributions of the object equipment are updated with the observations. This update is repeated dynamically as more observations become available, as shown in Figure 2.

1) DEGRADATION PROCESS MODELLING WITH FIRST-ORDER UHSMP
The first step is to estimate the uncertainty distributions of the holding times on the different degradation states with scarce data. In this work, a parametric method is adopted. The uncertain variable of the holding time on each degradation state is supposed to follow a specified regular uncertainty distribution with unknown parameters. Linear, zigzag and normal uncertainty distributions are commonly adopted regular uncertainty distributions. The aim is to estimate the unknown parameters with the collected degradation data. Suppose the sorted available holding time values of the uncertain variable $\xi_i$ on degradation state $S_i$ are $(x_{i1}, x_{i2}, \ldots, x_{in})$, with $n$ being the number of elements in the vector and $i = 1, \ldots, N$. The empirical distribution function method is adopted for estimating the corresponding uncertainty distribution. The belief degree $\alpha_{ij}$ of each element $x_{ij}$ in $(x_{i1}, x_{i2}, \ldots, x_{in})$ can be calculated with the approximate median rank method, which assigns equal intervals to the elements, i.e. $\alpha_{ij} = (j - 0.3)/(n + 0.4)$. If $n = 1$, then $\alpha_{i1} = 0.5$, which is in accordance with the maximum uncertainty principle. The unknown parameters $\theta_i$ in the uncertainty distribution $\Phi_i$ of $\xi_i$ are estimated by minimizing the following least squares objective function $Q$.
$$Q(\theta_i) = \sum_{j=1}^{n} \left(\Phi_i(x_{ij} \mid \theta_i) - \alpha_{ij}\right)^2 \quad (4)$$
By the end of the first step, the estimated uncertainty distributions $\hat{\Phi}_i$ of all uncertain variables $\xi_i$ are obtained, with $i = 1, \ldots, N$.
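The estimation step can be sketched in Python as follows, assuming a normal uncertainty distribution for the holding time and a least squares fit of its two parameters to the median-rank belief degrees. The compass (pattern) search, the starting point and the function names are illustrative assumptions; the paper does not state which optimizer is used.

```python
import math

def median_ranks(n):
    """Approximate median ranks: alpha_j = (j - 0.3) / (n + 0.4), j = 1..n."""
    return [(j - 0.3) / (n + 0.4) for j in range(1, n + 1)]

def normal_cdf(x, e, sigma):
    """Normal uncertainty distribution N(e, sigma) with sigma > 0."""
    z = math.pi * (e - x) / (math.sqrt(3.0) * sigma)
    if z > 700.0:  # exp would overflow; the distribution value is numerically 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def objective(e, sigma, xs, alphas):
    """Sum of squared gaps between fitted distribution and belief degrees."""
    return sum((normal_cdf(x, e, sigma) - a) ** 2 for x, a in zip(xs, alphas))

def fit_normal(holding_times, iterations=200):
    """Fit (e, sigma) by a simple compass search (optimizer is an assumption)."""
    xs = sorted(holding_times)
    alphas = median_ranks(len(xs))
    e = sum(xs) / len(xs)                 # starting guess: sample mean
    sigma = max(xs[-1] - xs[0], 1.0)      # starting guess: sample range
    step = sigma / 2.0
    best = objective(e, sigma, xs, alphas)
    for _ in range(iterations):
        candidates = [(e + step, sigma), (e - step, sigma),
                      (e, sigma + step), (e, sigma - step)]
        scored = [(objective(ce, cs, xs, alphas), ce, cs)
                  for ce, cs in candidates if cs > 1e-9]
        q, ce, cs = min(scored)
        if q < best:
            best, e, sigma = q, ce, cs    # move to the best neighbour
        else:
            step /= 2.0                   # no improvement: refine the step
    return e, sigma, best
```

With a single holding time the belief degree is 0.5, so only the location parameter is identified by the fit; the spread then has to come from another source, such as expert judgement.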

2) RUL PREDICTION WITH FIRST-ORDER UHSMP
Considering a first-order UHSMP, the second step is to adaptively update the uncertainty distribution of the predicted RUL and to derive the RUL prediction results including the point estimation and its associated prediction interval.
With the observations $z_{1:t_0}$ until the current time instance $t_0$, suppose the object equipment is on the $k$-th degradation state and has been on this state for a time $T^k_{t_0}$, which can be derived from $z_{1:t_0}$. The uncertain variable $\xi$ of the RUL is then expressed as
$$\xi = \sum_{i=1}^{k} \xi_i - T^k_{t_0},$$
with $\xi_i$ being the holding time on the $i$-th degradation state, $i = 1, \ldots, k$.
With Theorem 2, the uncertainty distribution of the uncertain variable $\xi$ is estimated as in Equation (9).
Note that, in this equation, the uncertainty distributions $\hat{\Phi}_i$ of the uncertain variables $\xi_i$, $i = 1, \ldots, k - 1$, can be adopted directly from Section 2.2.1, while the uncertainty distribution $\hat{\Phi}^*_k$ of $\xi_k$ is no longer $\hat{\Phi}_k$ from Section 2.2.1. Instead, $\hat{\Phi}^*_k$ is the conditional uncertainty distribution given $T^k_{t_0}$, i.e. $\hat{\Phi}^*_k(x) = \hat{\Phi}_k(x \mid (T^k_{t_0}, +\infty))$. Since the uncertainty distribution $\hat{\Phi}_k$ obtained from the first step is a regular uncertainty distribution, with respect to Theorem 4 the conditional uncertainty distribution is expressed as follows.
$$\hat{\Phi}^*_k(x) = \begin{cases} 0, & \hat{\Phi}_k(x) \le \hat{\Phi}_k(T^k_{t_0}) \\ \dfrac{\hat{\Phi}_k(x)}{1 - \hat{\Phi}_k(T^k_{t_0})} \wedge 0.5, & \hat{\Phi}_k(T^k_{t_0}) < \hat{\Phi}_k(x) \le \left(1 + \hat{\Phi}_k(T^k_{t_0})\right)/2 \\ \dfrac{\hat{\Phi}_k(x) - \hat{\Phi}_k(T^k_{t_0})}{1 - \hat{\Phi}_k(T^k_{t_0})}, & \left(1 + \hat{\Phi}_k(T^k_{t_0})\right)/2 \le \hat{\Phi}_k(x) \end{cases} \quad (7)$$
After the update in Equation (7), $\hat{\Phi}^*_k(x)$ is not always a regular uncertainty distribution, as it may not be a continuous and strictly increasing function with respect to $x$ at which $0 < \hat{\Phi}^*_k(x) < 1$. However, Theorem 2 requires that all uncertainty distributions be regular. Thus, Equation (7) should be slightly perturbed such that it becomes regular. The small perturbations are added with respect to two separate cases.
Case 1: $\hat{\Phi}_k(T^k_{t_0}) \ge 1/3$. Equation (7) can be rewritten as Equation (8), which introduces two small positive constants (indexed 1 and 2).
Case 2: $\hat{\Phi}_k(T^k_{t_0}) < 1/3$. Equation (7) can be rewritten as Equation (9), which introduces one small positive constant (indexed 2).
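The conditional update and its loss of regularity can be illustrated with a short Python sketch. It implements the standard conditional uncertainty distribution given that the variable exceeds t, in the unperturbed form; the perturbed versions in Equations (8) and (9) are not reproduced here. The linear distribution L(0, 10) and the elapsed time t = 2 are hypothetical values chosen only for illustration.

```python
def linear_cdf(x, a, b):
    """Linear uncertainty distribution L(a, b) with a < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def conditional_cdf(phi, t):
    """Distribution of xi given xi > t, for a base distribution phi with phi(t) < 1."""
    pt = phi(t)
    if pt >= 1.0:
        raise ValueError("conditioning event has belief degree 0")

    def cond(x):
        px = phi(x)
        if px <= pt:
            return 0.0
        if px <= (1.0 + pt) / 2.0:
            return min(px / (1.0 - pt), 0.5)   # capped at 0.5
        return (px - pt) / (1.0 - pt)

    return cond

# Hypothetical example: holding time ~ L(0, 10), already 2 time units on the state.
updated = conditional_cdf(lambda x: linear_cdf(x, 0.0, 10.0), 2.0)
plateau = [updated(x) for x in (4.0, 5.0, 6.0)]  # all equal to 0.5
```

In this example the updated distribution jumps from 0 to 0.25 just above t = 2 and stays flat at 0.5 between x = 4 and x = 6, so it is neither continuous nor strictly increasing, which is the reason a small perturbation is needed before operational laws for regular distributions can be applied.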
The prediction results for the current time include the predicted RUL, i.e. the expected value $E[\xi]$, and the associated prediction interval under a given confidence level $\alpha_0$.
With respect to Definition 5 and Theorems 3 and 5, the predicted RUL value $\hat{y}$ is given in the following equation.
The inverse uncertainty distribution $\hat{\Phi}^{*-1}_k(\alpha)$ is given in Equations (11) and (12) for Case 1 with $\hat{\Phi}_k(T^k_{t_0}) \ge 1/3$ and Case 2 with $\hat{\Phi}_k(T^k_{t_0}) < 1/3$, respectively. Now, the expected values of the conditional uncertainty distribution for two popular regular uncertainty distributions, i.e. the normal and linear uncertainty distributions, are given in the following two theorems.
Theorem 7: Let $\xi$ be an uncertain variable with linear uncertainty distribution $\Phi(x) = L(a, b)$, and let $t$ be a real number with $\Phi(t) < 1$. Then the conditional uncertainty distribution of $\xi$ given $\xi > t$, i.e. $\Phi(x \mid (t, +\infty))$, has an expected value expressed as below. Following the same steps as in the proof of Theorem 6, it is easy to prove Theorem 7, and the details are not provided in this paper.
Suppose $\hat{\Phi}_i$ follows a normal uncertainty distribution $N(e_i, \sigma_i)$, $i = 1, \ldots, k$. With Theorem 6, the predicted RUL $\hat{y}$ can be further expressed in terms of the parameters of these distributions and the conditional expected value $E(\xi_k)$. If $\Phi_i(x)$ follows a linear uncertainty distribution $L(a_i, b_i)$, $i = 1, \ldots, k$, the predicted RUL $\hat{y}$ is obtained in the same way with Theorem 7.
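A numerical counterpart of the predicted RUL can be sketched in Python under explicitly stated assumptions: the RUL is taken as the sum of the holding times on the remaining states minus the time already spent on the current state, the holding times follow normal uncertainty distributions, the conditional distribution is used in its unperturbed form, and its expected value is obtained by numerical integration rather than by the closed forms of Theorems 6 and 7. All function names and parameter values are hypothetical.

```python
import math

def normal_cdf(x, e, sigma):
    """Normal uncertainty distribution N(e, sigma) with sigma > 0."""
    z = math.pi * (e - x) / (math.sqrt(3.0) * sigma)
    if z > 700.0:  # exp would overflow; the distribution value is numerically 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def conditional_cdf(phi, t):
    """Distribution of xi given xi > t, for a base distribution phi with phi(t) < 1."""
    pt = phi(t)
    if pt >= 1.0:
        raise ValueError("conditioning event has belief degree 0")

    def cond(x):
        px = phi(x)
        if px <= pt:
            return 0.0
        if px <= (1.0 + pt) / 2.0:
            return min(px / (1.0 - pt), 0.5)
        return (px - pt) / (1.0 - pt)

    return cond

def conditional_mean(phi, t, upper, n=20000):
    """Expected value given xi > t (t >= 0): t plus the integral of
    1 - Phi*(x) over [t, upper], evaluated with the midpoint rule."""
    cond = conditional_cdf(phi, t)
    h = (upper - t) / n
    return t + h * sum(1.0 - cond(t + (j + 0.5) * h) for j in range(n))

def predicted_rul(params, t_held):
    """params = [(e_1, sigma_1), ..., (e_k, sigma_k)]; the last pair is the
    current state k, on which the equipment has spent t_held so far."""
    *earlier, (e_k, s_k) = params
    upper = max(e_k, t_held) + 40.0 * s_k   # truncation point (assumption)
    mean_k = conditional_mean(lambda x: normal_cdf(x, e_k, s_k), t_held, upper)
    # The expected value of N(e, sigma) is e for the states not yet reached.
    return sum(e for e, _ in earlier) + mean_k - t_held
```

When the elapsed time is far below the typical holding time, the conditioning has almost no effect and the prediction reduces to the sum of the expected holding times; as the elapsed time grows past the expected holding time, the conditional mean is pushed upward so that the predicted RUL stays positive.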
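For estimating the prediction interval, we try to find the minimal $b$ such that the interval $[\hat{y} - b, \hat{y} + b]$ covers the true RUL $y$ with a belief degree of at least $\alpha_0$, i.e. $\mathcal{M}\{\hat{y} - b < y \le \hat{y} + b\} \ge \alpha_0$.
Since $\mathcal{M}\{\hat{y} - b < y \le \hat{y} + b\} \ge \hat{\Psi}(\hat{y} + b) - \hat{\Psi}(\hat{y} - b)$, where $\hat{\Psi}$ denotes the estimated uncertainty distribution of the RUL, the prediction interval with a confidence level $\alpha_0$ is suggested as $[\hat{y} - b, \hat{y} + b]$, meaning that the interval covers $y$ with a chance of $\alpha_0$. Thus, the estimated value $\hat{b}$ is the smallest $b$ for which $\hat{\Psi}(\hat{y} + b) - \hat{\Psi}(\hat{y} - b) \ge \alpha_0$. With Equation (2), the inverse uncertainty distribution $\hat{\Psi}^{-1}$ of the RUL in Equation (18) with respect to a belief degree $\alpha$ is expressed as the sum of the inverse uncertainty distributions of $\xi^*_k$ and $\xi_i$, $i = 1, \ldots, k - 1$.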
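The search for the minimal half-width b can be sketched in Python as a bisection, assuming the uncertainty distribution of the RUL is available as a function and that the coverage bound (the difference of the distribution at the two interval ends) does not decrease as b grows. The function names, the search bound and the normal distribution N(15, 3) used as an example are illustrative assumptions.

```python
import math

def normal_cdf(x, e, sigma):
    """Normal uncertainty distribution N(e, sigma) with sigma > 0."""
    z = math.pi * (e - x) / (math.sqrt(3.0) * sigma)
    if z > 700.0:  # exp would overflow; the distribution value is numerically 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def interval_half_width(psi, y_hat, alpha0, b_max, tol=1e-9):
    """Smallest b in [0, b_max] with psi(y_hat + b) - psi(y_hat - b) >= alpha0."""
    def coverage(b):
        return psi(y_hat + b) - psi(y_hat - b)

    if coverage(b_max) < alpha0:
        raise ValueError("b_max is too small for the requested confidence level")
    lo, hi = 0.0, b_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) >= alpha0:
            hi = mid      # mid already satisfies the requirement
        else:
            lo = mid      # mid is too narrow
    return hi

# Hypothetical example: RUL distribution N(15, 3), 90% confidence level.
b_hat = interval_half_width(lambda x: normal_cdf(x, 15.0, 3.0), 15.0, 0.9, 100.0)
interval = (15.0 - b_hat, 15.0 + b_hat)
```

For a normal RUL distribution centered at the prediction, the result can be checked against the closed form b = (sqrt(3) sigma / pi) ln((1 + alpha0) / (1 - alpha0)), which follows from solving 2 Psi(e + b) - 1 = alpha0.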

III. CASE STUDY
A. DESCRIPTION OF THE PROBLEM
In nuclear power plants, increasing the inherent safety features is of critical importance and unexpected accidents are not acceptable. The passive residual heat removal system (PRHRS) removes the core decay heat by natural circulation in the case of emergency conditions and long-term cooling for repairing or refueling [29]. The layout of a passive residual heat removal system is shown in Figure 3. The circulation of the cooling liquid is controlled by the pneumatic valves driven by centrifugal pumps. In case of emergency, the normal function of the centrifugal pumps is highly important for the safety of a nuclear power plant. Effective failure prognostics may provide valuable information for proper maintenance actions. However, limited degradation data are collected concerning the degradation process of centrifugal pumps. The main reasons include: i) PRHRS, as a standby system, does not work for most of the time, so limited degradation may occur; ii) the conservative maintenance strategy in place barely tolerates the failure of a centrifugal pump; iii) for the safety of personnel and the plant, the restricted access to PRHRS makes it hard to acquire the health condition data.
FIGURE 3. Layout of a passive residual heat removal system [29].
The degradation process of a centrifugal pump includes four different states, i.e. $S_3$, $S_2$, $S_1$, $S_0$, with $S_3$ being the healthy state and $S_0$ being the failure state, as shown in Figure 4 [30]. The uncertain variable $\xi_i$ represents the holding time on state $S_i$, and it follows a normal uncertainty distribution $\Phi_i$ with unknown parameters $\{e_i, \sigma_i\}$. The holding times on different states are independent. Assume that maintenance is carried out at equal time intervals. The observations $z_t$ are collected during each scheduled maintenance and indicate the current state of the pump, so that the cumulated time $T^i_t$ that the pump has been on the current state $S_i$ can be derived. In the case study, the observations $z_t$ and the cumulated time $T^i_t$ are supposed to be precisely known. The degradation data (holding times on different states) are collected from only five failed pumps, as shown in Table 1. The degradation data of the first three pumps form the training data and the last two pumps are the test ones. The test pumps are supposed to be under operation, and their degradation states can only be observed during the scheduled maintenance. The objective is to adaptively predict the RUL of the test pumps with the observations $z_{1:t_0}$ until the current time $t_0$.

B. EXPERIMENTAL RESULTS
For failure prognostics, the first step is to model the degradation process with the holding times on different states of training pumps. Following Section 2.2.1, the degradation model can be obtained by estimating the unknown parameters in the corresponding uncertainty distributions. Given that the uncertain variable representing the holding time on a degradation state follows a normal uncertainty distribution, the unknown parameters are estimated with respect to least squares principles as in Equation (4). The estimated values are shown in Table 2.
The obtained degradation model gives a general trend of the degradation process. For failure prognostics of a specific pump, the uncertainty distributions of the holding times should be updated with respect to the observations and, thus, so should the prediction results. The uncertainty distribution and the prediction results are updated with Equation (3) and with Equations (9) and (15), respectively. The prediction results for pumps 4 and 5 are shown in Figures 5 and 6, respectively.
In these two figures, for a specific time instance $t_0$ on the horizontal axis, the observations $z_{1:t_0}$ (dots in the figures) until $t_0$ are known, indicating the degradation states for the past time. The holding time on the current state until $t_0$, i.e. $T^k_{t_0}$, is obtained from the observations. For example, in Figure 5, at time instance $t_0 = 30$, pump 4 is on state $S_2$ and $T^{k=2}_{t_0=30} = 9$. Thus, the holding time of pump 4 on $S_2$ is at least 9. The uncertainty distribution of the holding time on $S_2$ is updated with the conditional uncertainty distribution $\hat{\Phi}_2(x \mid (9, +\infty))$. The predicted RUL (i.e. 11.02) and its associated prediction interval with 90% confidence level (i.e. [3.93, 18.11]) for pump 4 at $t_0 = 30$ are derived and are marked as a five-pointed star and a vertical line segment in the figures, respectively. The diamond shows the true RUL for this instance (i.e. 11).
These figures show that satisfactory results have been obtained for both test pumps. Relatively better results are achieved for pump 4 than for pump 5. This is because the holding times of pump 4 on the different degradation states are close to the expected values of the estimated uncertainty distributions given in Table 2. Figure 6 shows clearly the effectiveness of the adaptive prediction update process with the conditional uncertainty distribution in 1-UHSMP. Since the true holding times of pump 5 on $S_2$ and $S_1$ are all higher than those in the training pumps, the predicted RUL becomes lower than the true RUL from time 18. However, the proposed method corrects for this discrepancy as new observations arrive and again approximates the true RUL well at the end.

C. COMPARISON WITH FIRST-ORDER HSMP
The traditional first-order HSMP with frequency-based probability estimation is considered as the benchmark method. Its holding time distribution is also updated with the observations, using Bayes' theorem. In addition to 1-UHSMP with normal uncertainty distribution, 1-UHSMP with linear uncertainty distribution is also considered in the comparison. They are denoted as 1-UHSMP$_{\text{normal}}$ and 1-UHSMP$_{\text{linear}}$, respectively. First-order UHSMP and HSMP are compared with respect to the mean relative squared error (MRSE) and the coverage percentage of the prediction interval on the true RUL. The prediction results of HSMP for the test pumps are shown in Figures 7 and 8, respectively, and the results of HSMP and 1-UHSMP are compared in Table 3. From Table 3, one may observe that, with respect to MRSE, all three methods give acceptable results on both pumps, with HSMP slightly better on pump 5 and 1-UHSMP$_{\text{linear}}$ better on pump 4. The proposed method gives higher coverage of the true RUL for pump 5. With respect to coverage, UHSMP with normal uncertainty distribution achieves better results than that with linear uncertainty distribution, since the linear uncertainty distribution is bounded above and below.
In this case study, it can be concluded that, for fixed training pumps, 1) first-order HSMP and UHSMP both perform well on the test pump (i.e. pump 4) with holding times within the range of those in the training pumps; 2) UHSMP performs better than HSMP for the test pump (pump 5) with holding times out of that range, because HSMP based on probability theory may overfit the limited training data and be misled by them in the distribution estimation; 3) for scarce data, 1-UHSMP with normal uncertainty distribution is preferred over that with linear uncertainty distribution.
Thus, 1-UHSMP is recommended for degradation process modelling when the training data is not enough to approximate the true holding time distributions with probability theory.
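The two comparison criteria can be computed as in the following Python sketch. The paper does not spell out the formula of the MRSE, so the definition used here, the mean of the squared prediction error divided by the squared true RUL, is an assumption, as are the function names. The single data point used below is the one reported above for pump 4 at time 30 (predicted RUL 11.02, true RUL 11, interval [3.93, 18.11]).

```python
def mrse(predicted, actual):
    """Mean relative squared error (assumed definition): mean of
    ((prediction - truth) / truth)^2, skipping instants with zero true RUL."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    return sum(((p - a) / a) ** 2 for p, a in pairs) / len(pairs)

def coverage(intervals, actual):
    """Fraction of instants whose prediction interval contains the true RUL."""
    hits = sum(1 for (lo, hi), a in zip(intervals, actual) if lo <= a <= hi)
    return hits / len(actual)

# Data point reported in the text for pump 4 at t0 = 30.
error = mrse([11.02], [11.0])
covered = coverage([(3.93, 18.11)], [11.0])
```

In a full evaluation these functions would be applied to the whole sequence of predictions of each test pump rather than to a single time instance.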

IV. CONCLUSIONS
For tackling failure prognostics with scarce data, the first-order uncertain hidden semi-Markov process, denoted as 1-UHSMP, is proposed for degradation processes with discrete states. The holding time distributions are estimated with uncertain statistics, and the proposed model can adaptively update the prediction results with the conditional uncertainty distribution considering observations related to the true degradation state of the object equipment. Analytical expressions are derived for the predicted RUL with two popular regular uncertainty distributions. A case study on the centrifugal pumps in nuclear power plants is carried out. Experimental results show the effectiveness of the proposed method. Comparison with the conventional HSMP demonstrates the benefits of 1-UHSMP in failure prognostics with limited degradation data.
The proposed model can be further improved in the future in the following two directions. 1) Imprecise observations can be integrated in the proposed model, since the assumption of precise observations is not always guaranteed. 2) The proposed semi-Markov process can be extended to higher orders, since some equipment exhibits more complex and diverse degradation processes.