Hybrid Dynamic Probability-Based Modeling Technique for Rolling Stock Failure Analysis

This study proposes a novel hybrid dynamic probability-based failure analysis technique that combines dynamic Bayesian discretization (DBD) and stochastic Petri nets (SPNs) for railway rolling stock (RS) failure analysis. Performing failure analysis and diagnosis for integrated RS subsystems is challenging and can lead to operational delays that affect fleet reliability and availability. The technique uses the adaptive updating feature of DBD to analyze prior continuous and discrete probability data by means of evidence-based propagation, thereby ascertaining the posterior states of faulty components, while the reachability tree characteristics of SPNs allow for rapid failure notification, detection, and isolation across multiple RS subsystems. Unlike other dynamic probability methods, the DBD-SPN hybrid model presented here reduces computational time and enhances convergence accuracy using the Kullback–Leibler measure, sequential event analysis, and stable and low-entropy-error characteristics. In an extensive UK-based RS case study, the approach proved suitable for rapid failure notification, detection, and isolation of traction door interlock failures. The study also contributes to the research and technology of hybrid DBD and SPNs for the failure analysis of systems consisting of multiple subsystems, since the hybrid model makes it possible to realistically evaluate common-cause and sequential failures of complex systems.


I. INTRODUCTION
The growing complexity of modern engineering systems and the dynamic behaviors of their individual components make it challenging to analyze the multiple interactions of faults with classical, steady-state probabilistic risk assessment (PRA) methods such as failure mode and effects analysis (FMEA), fault tree analysis (FTA) [1]-[4], and event tree analysis. This problem is further compounded by the fact that the subsystems associated with individual complex systems are furnished with unique software that offers minimal information about its underlying source code, diagnostic inaccuracies, and internal functionalities [5]-[10].
To overcome these limitations of engineering failure analysis, earlier studies have proposed other data-driven methods such as Bayesian networks (BNs), stochastic Petri nets (SPNs), Monte Carlo simulations (MCSs), and Markov chains [11], [12]. However, most such dynamic failure models are plagued by shortcomings that impede their functionality when applied as standalone methods, which has necessitated the incorporation of numerous sensors [13]-[16]. While recent sensor-enhancement initiatives have no doubt increased the population of detectable faults, the complexity of signal processing has correspondingly increased [16]. The fusion of complex signal processing with an already complicated system architecture greatly increases diagnosis downtime as well as the possibility of errors. On this basis, it would be beneficial to further investigate the use of error-based probability features of complex systems to improve the speed of failure detection and notification [17].
The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Pietrantuono.
Consequently, the current paper presents a hybrid dynamic probability-based model that combines DBD and SPNs. The proposed harmonized method relies on the relative strengths of the two techniques: the extended DBD method has an adaptive updating feature that allows for evidence-based forward and backward propagation, while the reachability tree feature of the SPN supports sequential failure analysis and provides reliable notification of faults, which subsequently prompts the initiation of appropriate maintenance actions.

II. LITERATURE REVIEW
Dynamic probability-based techniques are powerful mathematical models that are capable of explicitly handling interactions among the subsystems, components, and process variables of complex systems. They represent a more realistic way of modeling the probabilistic features of complex systems based on their physical behavior and characteristics, by integrating different types of system information (both quantitative and qualitative), including labeled, ranked, discrete real, continuous interval, and Boolean logic data. In instances where multiple failures from different subsystems can potentially affect overall system performance with a number of different consequences, the system model requires a representation of multiple state variables beyond that provided by conventional failure analysis techniques [17]. Although there are many different forms and extensions of dynamic probabilistic methods for complex failure analysis, including Markov chains, MCS, BNs, and SPNs, their capabilities have been shown to be limited when they are applied as standalone methods, as summarized in Table 1 [19]-[35].
Although the individual dynamic probability models listed in Table 1 have been shown to have various shortcomings, the integration of two or more such techniques provides an alternative approach in which the strengths of one tool compensate for the weaknesses of another. While there have been advances in research and application of standalone DBD and SPN methods [6], [27], [31], [33], [36], [41], [39], [45], [ [62] into hybrid BNs using dynamic discretization as well as Fenton et al. [61] that used BNs to predict software defects and reliability. This underscores the existence of knowledge gaps in the area of hybrid DBD-SPN methods for the failure analysis of systems that comprise multiple subsystems.
While the Markov chain Monte Carlo hybrid probability model attempts to simulate various continuous distribution functions from different sources, its rate of convergence is generally low and its capability to handle prior information is limited compared with the DBD-SPN hybrid model [22], [59]. In addition, hybrid SPN-MCS models suffer from fixed-based probability transitions, and the accuracy of the simulations can be affected by the quality of the pseudo-random numbers generated by the Monte Carlo simulation [32]. In contrast, DBD can handle almost any data type, and SPNs can handle both discrete and continuous data. On this basis, it is envisaged that DBD-SPN hybrid models could handle complex systems' faults with high accuracy and at reasonable speed. Although other models such as the SPN-Markov-chain model can also simulate some degree of complexity with reasonably good accuracy, they are less useful for modeling complex problems consisting of various data forms (such as ordinal data) and are not suitable for evidence updating [30]. Similarly, although Bayesian-Markov models handle complex systems with redundancies and concurrent behavior intuitively, they are less suitable for problems in continuous time, owing to the memoryless characteristics of the Markovian model. This in turn impedes their ability to handle prior information efficiently and to seek stationary solutions [34], [60]. Static hybrid BN-SPN models suffer from having too many states in low-probability regions and too few states in high-probability regions of the results [22], [44], [45]. By contrast, the DBD-SPN hybrid probability model allows for evidence updating using prior information for both backward and forward propagation, and it provides an efficient notification process via the firing of minimum cut set transitions, including sequential failure analysis through the reachability tree characteristics.
VOLUME 8, 2020
Hence, the primary contributions of this study can be summarized in two parts:
(i) The proposal of an innovative hybrid dynamic DBD-SPN probability-based model that can predict and diagnose fault states using multiple data types and sources from multiple subsystems in complex engineering systems.
(ii) The proposed DBD-SPN model allows for real-time evidence updating while simultaneously providing automatic notification of the failure status through sequential failure analysis. These features of the model allow for accurate and rapid failure diagnoses using known evidence while reducing the number of false alarms in complex engineering failure analysis.
Therefore, this paper presents the development of an innovative and enhanced dynamic hybrid probability-based modeling approach that uses the DBD and SPN techniques to analyze and diagnose failures of complex engineering systems. The remainder of the paper is organized as follows: the hybrid probability-based model of DBD and SPN is described in Section III, while its application to diagnosing complex RS electric multiple unit (EMU) interlock faults is presented as a case study in Section IV. The results and discussion are presented in Section V, and Section VI offers the concluding remarks.

III. HYBRID DYNAMIC PROBABILITY-BASED FAILURE MODELING TECHNIQUE
As highlighted earlier, this paper presents a hybrid dynamic probability-based DBD and SPN modeling technique for analyzing and diagnosing failures caused by multiple subsystems. BNs, as described in the literature [37]-[40], are directed acyclic graphs that represent a set of variables and their conditional dependencies. A BN model consists of a qualitative component, the directed acyclic graph, and a quantitative component, the prior and conditional probabilities of the BN nodes, underpinned by Bayes' theorem. An SPN, as described in the literature [32], [41]-[43], is a formal graphical and mathematical modeling technique appropriate for specifying and analyzing the behavior of complex, distributed, and concurrent systems. An SPN is a bipartite directed graph represented by a 6-tuple, whose elements are defined in Step 6. The DBD component of the model establishes the fault conditions of multiple subsystems within a complex system using a posterior probability-based theory with prior probability information. The SPNs, on the other hand, enable a sequence of events or failures to be identified in order to assess the probability of occurrence of those events or failures using a reachability tree. The fault conditions are thus realized through the DBD model using an evidence-based adaptive updating feature, and they serve as input tokens for the initial places in the SPN, which in turn provides fault detection, acknowledgment, and management.
The details of the dynamic probability-based modeling framework are discussed in this section under the following assumptions: (i) the component failure rates obey an exponential distribution; (ii) after repairs, the repaired component is considered to be as good as new; (iii) the overall maximum number of iterations required for the DBD model is 50; (iv) transition firings are considered to be exponentially distributed and immediate. Fig. 1 shows the dynamic probability-based model, and a flowchart of the proposed technique is shown in Fig. 2.
Step 1: The proposed dynamic probability-based modeling technique begins with the information extraction phase, in which data in the form of component failure rates are extracted from various subsystems; the sources include the design requirements, historical records, reliability databases, system reliability expert opinions, FMEA sheets, and environmental data (temperature, pressure). Because the collected data are generally in different forms (discrete, continuous, and alphanumeric, or imprecise, vague, and limited in nature), the obtained data are fuzzified into the node data type using node probability tables (NPTs). More specifically, the failure rate data are assigned specific continuous distributions, and discrete parameters are classified as ranked data or Boolean logic. The data are then used to construct the initial BN structure of the subsystem considering all component inputs that lead to subsystem failure.
Step 2: As soon as the input variables, nodes, and NPTs are known, the BN is converted to an intermediate structure for dynamic discretization, called a junction tree (JT) [44]-[46], which provides higher speed and accuracy than static discretization approaches [43]-[49]. An initial discretization, $\Psi_X^{(0)}$, is then assigned to all continuous variables in the JT.
Step 3: The discretized conditional probability density for each node in the NPT is estimated using the given initial discretization for each node, $\Psi_X^{(0)}$, and evidence is propagated through the BN structure. For example, if $X$ is a continuous numeric node in a BN, its range is denoted by the hypercube $\Omega_X$ and its probability density function (PDF) is denoted by $f_X$.
The discretization approximates $f_X$ by first partitioning $\Omega_X$ into a set of intervals $\Psi_X = \{w_j\}$, then defining a locally constant function $\tilde{f}_X$ on each of these partitioned intervals. This task involves finding an optimal discretization set $\Psi_X = \{w_j\}$ and optimal values for the discretized probability density function $\tilde{f}_X$. The approach dynamically searches $\Omega_X$ for the most accurate specification of the high-density regions (i.e., around the modes) given the model and evidence by iteratively calculating a sequence of discretization intervals in $\Omega_X$.
Step 4: The BN is queried to obtain a posterior marginal density for each node, and the intervals are split according to the highest entropy error in each node. At each stage in the iterative process, a candidate discretization, $\Psi_X$, is tested to determine whether the resulting discretized probability density function, $\tilde{f}_X$, has converged to the true probability density function, $f_X$, within an acceptable degree of precision. Thus, at convergence, $f_X$ can be approximated by $\tilde{f}_X$. The relative entropy, or Kullback-Leibler (KL) measure, is used to determine the convergence rule for the discretized function, $\tilde{f}_X$. The KL measure is defined as the distance between two density functions $f$ and $g$, expressed as a measure $D$ of the error introduced by approximating the true (but unknown) function $f(x)$ using some approximate function $g(x)$ [44]-[46], and is calculated by:
$$D(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx \tag{1}$$
In order to approximate the true function $f_X$, a bound on the KL distance ($E_j$), based on an estimate of the relative entropy error between a function $f$ and its discretization $\tilde{f}$ (determined using the mean $\bar{f}$, maximum $f_{\max}$, and minimum $f_{\min}$ probability density values in the discretization interval $w_j$), can be estimated as:
$$E_j = \left[ \frac{f_{\max} - \bar{f}}{f_{\max} - f_{\min}} \, f_{\min} \log \frac{f_{\min}}{\bar{f}} + \frac{\bar{f} - f_{\min}}{f_{\max} - f_{\min}} \, f_{\max} \log \frac{f_{\max}}{\bar{f}} \right] |w_j| \tag{2}$$
where $|w_j|$ denotes the length of the discretization interval $w_j$ and the probability density values $\bar{f}$, $f_{\max}$, and $f_{\min}$ are approximated using the midpoint of an interval and its points of intersection with neighboring intervals.
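The interval-splitting idea in Step 4 can be illustrated with a minimal sketch. It assumes that the density can be evaluated directly at the interval edges and midpoint (in the actual technique these values come from the propagated BN), that the mean density is the average of those three values, and that the entropy-error bound takes the form used in the dynamic discretization literature. The function names and the exponential test density are hypothetical.

```python
import math


def entropy_error_bound(f_min, f_mean, f_max, width):
    """Bound E_j on the relative entropy error of one interval w_j."""
    if f_max == f_min:
        return 0.0  # flat density on the interval: no approximation error

    def xlog(a, b):
        # a * log(a / b), with the convention 0 * log(0) = 0
        return 0.0 if a == 0.0 else a * math.log(a / b)

    span = f_max - f_min
    return ((f_max - f_mean) / span * xlog(f_min, f_mean)
            + (f_mean - f_min) / span * xlog(f_max, f_mean)) * width


def interval_error(pdf, lo, hi):
    """Estimate E_j from the density at the two edges and the midpoint."""
    values = [pdf(lo), pdf(0.5 * (lo + hi)), pdf(hi)]
    return entropy_error_bound(min(values), sum(values) / 3.0,
                               max(values), hi - lo)


def total_error(pdf, edges):
    """Sum of E_j over all intervals of the current discretization."""
    return sum(interval_error(pdf, a, b) for a, b in zip(edges, edges[1:]))


def split_worst(pdf, edges):
    """Return new edges with the highest-error interval split at its midpoint."""
    errors = [interval_error(pdf, a, b) for a, b in zip(edges, edges[1:])]
    j = max(range(len(errors)), key=errors.__getitem__)
    mid = 0.5 * (edges[j] + edges[j + 1])
    return edges[:j + 1] + [mid] + edges[j + 1:]
```

Repeatedly calling `split_worst` concentrates intervals where the density changes fastest, which is the behavior the text describes for high-density regions.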
Step 5: Using known evidence, such as symptoms of component failure, missing parameter readings, and abnormal measurements, the conditional probability densities of the model are iteratively recalculated by propagating the known evidence through the existing BN to obtain the marginal probabilities. The intervals with the highest entropy error are then split until the model converges to an acceptable level of accuracy as determined by two convergence stopping rules: the stable-entropy-error (SEE) stopping rule and the low-entropy-error (LEE) stopping rule, both of which apply at the node level [43]-[49]. Therefore, during calculations, some nodes will stop discretizing when a stopping rule is triggered, whereas others will continue. The maximum number of iterations that the entire algorithm is allowed to run can be determined using the beta distribution function, which estimates the number of iterations (i.e., samples) from the confidence level and reliability target. The SEE stopping rule observes three consecutive iterations, ending at iteration $k$, to determine whether the entropy error has converged to a stable value within some limiting region defined by $(1 - \alpha, 1 + \alpha)$, and is expressed as:
$$1 - \alpha \le \frac{\frac{1}{3} \sum_{l=k-2}^{k} S_X^{(l)}}{S_X^{(k)}} \le 1 + \alpha, \quad k = 3, \ldots, m \tag{3}$$
where $m$ is the maximum number of iterations and $S_X^{(l)} = \sum_{w_j} E_j$ is the approximate relative entropy error at iteration $l$. The LEE stopping rule, given in (4), determines whether a particular node has breached an absolute entropy error threshold. Finally, the maximum number of iterations $m$ cannot be exceeded; the algorithm stops after $m$ iterations, regardless of whether the nodes are sufficiently accurate.
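The node-level stopping logic of Step 5 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the SEE test is taken to compare the mean of the last three entropy errors with the latest one, the LEE test is taken to be a simple threshold on the latest error, and the default values of `alpha` and `threshold` are hypothetical. Only the cap of 50 iterations comes from the modeling assumptions listed earlier.

```python
def see_converged(errors, alpha=0.01):
    """Stable-entropy-error rule: the mean of the last three entropy errors
    stays within (1 - alpha, 1 + alpha) of the latest error."""
    if len(errors) < 3 or errors[-1] == 0.0:
        return False  # a zero error is left to the low-entropy-error rule
    ratio = (sum(errors[-3:]) / 3.0) / errors[-1]
    return 1.0 - alpha <= ratio <= 1.0 + alpha


def lee_converged(errors, threshold=1e-3):
    """Low-entropy-error rule: the latest error is below an absolute threshold."""
    return bool(errors) and errors[-1] <= threshold


def node_should_stop(errors, alpha=0.01, threshold=1e-3, max_iter=50):
    """A node stops discretizing when either rule fires or the cap is reached."""
    return (len(errors) >= max_iter
            or lee_converged(errors, threshold)
            or see_converged(errors, alpha))
```

Here `errors` is the history of the summed entropy error for one node, one value per iteration, so each node can be checked independently while the others continue to discretize.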
The maximum overall number of runs or iterations $m$ for the model can be determined by beta distribution logic based on the desired confidence and reliability target levels, as given in (5) [50]. Note that the confidence level and reliability target in (5) are established during initial testing to determine the optimal number of maximum iterations for which the model may run. The higher the confidence and reliability target levels, the higher the number of iterations required.
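One widely used relation with the behavior described here is the success-run formula, in which the number of samples is $\ln(1 - C) / \ln(R)$ for confidence $C$ and reliability target $R$. Whether the expression used in this study takes exactly this form is an assumption; the sketch below only illustrates how the iteration cap grows with both targets, and the function name is hypothetical.

```python
import math


def max_iterations(confidence, reliability):
    """Success-run sample size (assumed form): smallest n with R**n <= 1 - C."""
    if not (0.0 < confidence < 1.0 and 0.0 < reliability < 1.0):
        raise ValueError("confidence and reliability must lie strictly in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))
```

For example, `max_iterations(0.95, 0.90)` gives 29, and raising either target increases the result.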
Step 6: The SPN model for failure notification and detection is initiated when the DBD convergence rule completes the iteration process and delivers its output (the faulty component state). The faulty component state serves as an input token that initializes the SPN process through a sequence of failure and reachability tree analyses. The SPN process is modeled as a 6-tuple whose elements are defined as follows: $T = \{t_1, t_2, \ldots, t_n\}$ is a set of transitions, each of which represents an event or action that can be fired with a firing rate $\lambda_i$ corresponding to transition $t_i$, $i = 1, 2, \ldots, n$; $M = \{M_1, M_2, \ldots, M_n\}$ is a set of markings (places) in a reachability tree formed by the outputs from the DBD; $F_i(\tau)$ is a probability distribution function of the time interval $\tau$ between the time at which transition $t_i$, $i = 1, 2, \ldots, n$, becomes able to fire and the time at which transition $t_i$ is completed; $Q_{M_i M_j}(\tau)$ is a transitional probability function expressing the probability that marking $M_i$ changes to $M_j$ because transition $t_j$ fired in an amount of time less than or equal to $\tau$; $f_{ij}$ is the transitional probability that a process starting in marking $M_i$ will be in $M_j$ after $m$ additional transitions in a given sequence; and $E = \{M_i M_k, \ldots, M_h M_j\}$, $i \ne k \ne h \ne j$, is a sequence of events in which $M_i M_j$ indicates that marking $M_i$ changes to marking $M_j$, with $|E| = n$, where $n$ is the number of transitions [32], [41]-[43].
The reachability tree (Fig. 3) describes the dynamic behavior of the system determined by the outputs of the DBD model and shows all possible markings and firings at each marking. Thus, based on the markings (input places with tokens coming from the DBD model), the possible consequences of failure and possible sequences of failure, including the probability of occurrence of a given sequence of failures, can be computed.
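How a reachability tree is enumerated from an initial marking can be sketched as below. The net in the usage note is a hypothetical four-place example and not the net of Fig. 3; markings are tuples of token counts, and the sketch assumes a bounded net so that the search terminates.

```python
from collections import deque


def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[place] >= count for place, count in pre.items())


def fire(marking, pre, post):
    """Consume tokens from input places and produce tokens in output places."""
    new_marking = list(marking)
    for place, count in pre.items():
        new_marking[place] -= count
    for place, count in post.items():
        new_marking[place] += count
    return tuple(new_marking)


def reachability(initial, transitions):
    """Map each reachable marking to its (transition, next marking) firings."""
    graph = {}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        if marking in graph:
            continue  # already expanded
        graph[marking] = []
        for name, (pre, post) in transitions.items():
            if enabled(marking, pre):
                nxt = fire(marking, pre, post)
                graph[marking].append((name, nxt))
                queue.append(nxt)
    return graph


# Hypothetical net: place 0 holds the fault token delivered by the DBD model.
example_net = {
    "t1": ({0: 1}, {1: 1}),  # fault -> notified
    "t2": ({1: 1}, {2: 1}),  # notified -> isolated
    "t6": ({0: 1}, {3: 1}),  # fault -> alternative branch
}
example_tree = reachability((1, 0, 0, 0), example_net)
```

A marking with more than one entry in its firing list, such as the initial marking here, corresponds to a branching point of the tree.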
Step 7: Establish the transition firing times. The transition firing times $F_i(\tau)$ are considered to be exponentially distributed; therefore, the probability distribution function can be assumed as:
$$F_i(\tau) = 1 - e^{-\lambda_i \tau} \tag{6}$$
where $\tau$ is the transition firing time and $\lambda_i$ is the firing (failure) rate of transition $t_i$. Three possible branching scenarios can be considered when evaluating a reachability tree [51]-[53]. Scenario 1 considers situations in which there are no branches. In this case, the amount of time required to fire transition $t_l$ is less than or equal to $\tau$, and the transitional probability that marking $M_i$ changes to $M_j$ in the general case is given by (7). In Scenario 2, there is only one branch. In this case, there is uncertainty regarding which transition will be fired, so the probability that transition $t_l$ is fired given that the alternative transition $t_k$ is not fired up to time $\tau$ is estimated. The probability that transition $t_k$ is not fired up to time $\tau$ is given by $\bar{F}_k = 1 - F_k$, and the transitional probability is given by (8). Fig. 3 presents the alternatives of firing either transition $t_1$ or transition $t_6$; the probabilities that marking $M_1$ will change to $M_3$ or $M_2$ are given by (9) and (10), respectively. In Scenario 3, the number of branches is greater than one. In this case, the probability that transition $t_l$ has been fired given that the alternative transitions $t_k, \ldots, t_n$ are not fired up to time $\tau$ is estimated, which yields the transitional probability $Q_{M_i,M_j}(\tau)$ that marking $M_i$ changes to $M_j$ in (11). Based on Scenarios 1, 2, and 3, the transitional probabilities for all possible changes of markings are obtained in (12).
Step 8: The next step is to compute the probability of occurrence of sequential events assuming a stochastic (or random) process. A stochastic process is a family of random variables indexed by time.
Based on this definition of a stochastic process, the probability $f_{ij}$ for all $i, j \in E$ is given by (13). Consider a process that is observed at discrete time points to be in any one of the possible markings $M$, i.e., $M_1, M_2, \ldots, M_n$. After observing the state of the process for a period of time $T$ (i.e., assuming a finite time), a transition is fired based on output from the DBD model. Provided that the process is in marking $M_l$ at time $n$ when transition $t$ is chosen, the next marking of the system is determined according to the transition probabilities $Q_{M_l,M_j}(t)$. Based on this description, if $M_n$ denotes the marking of the process at time $n$, then the relation in (14) holds. Therefore, the transition probabilities are functions of only the present marking and the subsequent transition that can be fired to reflect the actual state of the system. Thus, given a sequence of events $E$, the probability $f_{ij}$ can be computed using (15). Equation (15) is derived from first principles as follows. Let $X = \{x_1, x_2, \ldots, x_n\}$ be the state vector indicating the sequence of transitions, where $x_i = 1$ if transition $t_i$ is fired and $x_i = 0$ otherwise, and let $\varphi(X)$ be the structural function of the sequence. A sequence of transitions is fired if all transitions in the sequence are fired; thus, $\varphi(X)$ assumes the value 1 when $x_1 = x_2 = \cdots = x_n = 1$, and 0 otherwise, which leads to (18).
Step 9: The next step is to estimate the probability $f_{ij}$ that the transitions in a sequence have been fired. The memoryless property of the exponential distribution means that the transitions fired according to (18) are independent of each other, so that the sequence probability in (20) is the product of the individual probabilities $p_i$, where $p_i$ is the probability that transition $t_i$ is fired; thus, $p_i = Q_{M_i,M_j}$. The probability that a process will make a state transition to state $j$ (notification and detection), given that the DBD model has an input token beginning from state $i$, can then be estimated using (21), where $h$ is the number of possible state transitions from state $i$ to state $j$.
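Steps 7 to 9 can be illustrated with a short sketch. It assumes the standard race model for competing exponential transitions, in which the transition with rate $\lambda_l$ fires first within $\tau$ with probability $\frac{\lambda_l}{\Lambda}\left(1 - e^{-\Lambda \tau}\right)$, where $\Lambda$ is the sum of the rates of all transitions enabled in the marking; whether this matches the branching expressions of the study term by term is an assumption. The rates and function names are hypothetical.

```python
import math


def firing_cdf(rate, tau):
    """Exponential firing-time distribution: 1 - exp(-rate * tau)."""
    return 1.0 - math.exp(-rate * tau)


def branch_probability(rates, fired, tau):
    """Probability that `fired` fires first, within tau, among the competing
    transitions enabled in the same marking (assumed race model)."""
    total = sum(rates.values())
    return rates[fired] / total * (1.0 - math.exp(-total * tau))


def sequence_probability(step_probabilities):
    """Probability that every transition of a sequence fires, treating the
    individual firings as independent."""
    return math.prod(step_probabilities)


# Hypothetical rates (per hour): t1 and t6 compete in marking M1, after
# which a single unbranched transition t2 completes the sequence.
rates_m1 = {"t1": 0.02, "t6": 0.01}
tau = 500.0
p_t1 = branch_probability(rates_m1, "t1", tau)
p_t2 = firing_cdf(0.05, tau)
p_sequence = sequence_probability([p_t1, p_t2])
```

With a single enabled transition, `branch_probability` reduces to `firing_cdf`, which corresponds to the no-branch scenario.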
Given the underlying stochastic nature of the process, it is not possible to compute the exact duration of the stay in state $j$, $\tau_j$, i.e., the time between the moment at which transition $t_i$, $i = 1, 2, \ldots, n$, can fire and the moment at which it is completed. Therefore, some approximation is required to compute the likelihood that the system will reach the failed state by acknowledging the failed status after a certain time. Accordingly, the procedure for finding the approximated duration of the stay in state $j$ consists of two steps. The first step is to normalize all firing rates of the transitions in a given sequence by summing them together, then dividing each by the sum to obtain the weight $w_j$. In the second step, the duration of stay, $\tau_j$, for each transition is computed as $\tau_j = w_j \times T$, where $T$ is the total time.
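The two-step approximation of the duration of stay translates directly into code. The sketch below follows the description above; the function name and the example rates are hypothetical.

```python
def stay_durations(firing_rates, total_time):
    """Normalize the firing rates of a sequence to weights w_j, then scale
    each weight by the total time T to obtain tau_j = w_j * T."""
    total_rate = sum(firing_rates)
    if total_rate <= 0.0:
        raise ValueError("firing rates must sum to a positive value")
    return [rate / total_rate * total_time for rate in firing_rates]
```

For instance, rates of 1.0 and 3.0 over a total time of 500 h give durations of 125 h and 375 h, which sum to the total time by construction.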
Step 10: Finally, an accept/reject failure notification is provided. If accepted, then the system stops; otherwise, the iterations can be initiated in step 5 to run the hybrid dynamic probability-based model again with new known evidence (in the form of fault symptoms). Maintenance and repair actions can be implemented upon acknowledgment of the failure notification.

IV. CASE STUDY OF ROLLING STOCK INTERLOCK FAILURE DIAGNOSIS USING THE PROPOSED MODEL
The traction door interlock circuit is a major safety feature of all passenger RS operating on the United Kingdom rail network. It ensures that all external train doors (either powered or slam) are correctly closed and locked before the driver can gain tractive power. The traction door interlock system is an electrical circuit of microswitches fitted to each external door subsystem (EDS) and connected in series. When an EDS is closed and locked, the microswitches complete the traction interlock circuit locally; consequently, all of the switches for each EDS on the train must be correctly operated in order for the driver to gain tractive power. If this traction power circuit is broken while the train is not at a station, whether by the opening of an EDS, a microswitch fault, or a passenger operating an emergency egress device, the train brakes are applied automatically [55].
The failure of the traction door interlock could result in a serious accident, such as a person falling out of a train moving at high speed. In fact, the UK Rail Accident Investigation Branch (RAIB) reported an accident in which a person was trapped in a train door and dragged at Jarrow station due to a fault related to the traction door interlock circuit [56]. This incident, along with other reported interlock incidents, led the UK Rail Safety and Standards Board (RSSB) to implement measures against train movement and provide for immediate withdrawal from service when a traction door interlock circuit fault occurs [57]. Maintenance and repair of a traction door interlock often requires the maintenance engineer to replicate and isolate the cause of the fault, which can be challenging for trains already in service. Furthermore, since trains with traction door interlock issues are typically withdrawn from service, as per RSSB guidelines [57], interlock faults play a significant role in the public performance measure (PPM), reported by the UK Office of Rail and Road (ORR) to quantify the number of delays and fleet cancellations. According to the ORR, in the fourth quarter of the 2019-2020 fiscal year, train cancellations were 3.8% worse than in the same quarter of the previous year, and delay minutes (i.e., delays of three minutes or more) increased by 21%, resulting in an average PPM of 83.8% due to technical fleet failures [57].

A. BACKGROUND OF THE CASE STUDY
The case study evaluated in this research was conducted on EMU rolling stock from an intercity train operator in the UK to improve failure isolation and notification during fault-finding, and aid in-situ repair and maintenance efforts to reduce delay minutes (i.e., train delays three minutes or more) and cancellations, thereby improving the reliability and availability of the RS fleet.
These efforts are expected to reduce penalty charges and prevent the need to withdraw RS from service, thereby improving the PPM and operator reputation while reducing operational costs. To preserve the commercial confidentiality of sensitive information, the name of the operator and the collected data are not presented in this paper.
An example of the interface between the local door control equipment and the traction circuit is shown in Fig. 4. In this evaluation, only the critical components of the EDS are considered: the electronic door control unit (EDCU), the limit switch (i.e., microswitch), the drive mechanism, and the power supply. The EDCU provides overall control and monitoring of the closing and opening of each passenger door. The limit switch detects when the door is in its closed and locked positions. The power supply provides power to the drive mechanism that opens and closes the door.
The traction subsystem (TS), with which the door interlock interfaces, has many components; however, the three most critical fault-contributing components are the motor converter, brake resistor, and traction motors. The motor converter converts the direct current link voltage to a variable voltage, variable frequency supply for the traction motors. The brake resistor unit provides forced ventilation to cool the motor converter; the TS is shut down when its temperature sensor detects that the maximum allowed temperature has been reached. The traction motor consists of a three-phase, four squirrel-cage asynchronous motor (specially designed to reduce pulsating torque, losses, and the noise levels caused by the converter supply) that provides the tractive force to move the train.
Fig. 5 shows a schematic of the three-car EMU evaluated in this study. The two driving motor cars of the EMU have two TSs per car. Each TS is connected by an OR gate to its three main components (motor converter, brake resistor, and traction motor). The failure of any of these components could, therefore, lead to a complete TS failure, and thus, to a train interlock failure. Each traction motor is redundant and is accordingly connected by an AND gate. There are four EDSs installed in each car, but the trailer car of the EMU has no TS, meaning that there must be a hardwired connection from all the EDSs directly to the train interlock line. Each EDS is connected to all four of its components (i.e., EDCU, limit switch, drive mechanism, and power supply) via an OR gate. Therefore, the failure of any of these components could lead to the complete failure of the EDS. Tractive power can thus only be obtained if the traction door interlock is achieved locally and between all three EMU cars at the same time.
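The gate logic described for the three-car EMU can be written as Boolean functions, where `True` means "failed". This is a sketch of the structure only: the number of traction motors per TS is not stated in the text, so it is left as a list argument, and all function names are hypothetical.

```python
def eds_failed(edcu, limit_switch, drive_mechanism, power_supply):
    """OR gate: an EDS fails if any of its four components fails."""
    return edcu or limit_switch or drive_mechanism or power_supply


def ts_failed(motor_converter, brake_resistor, traction_motors):
    """OR gate over the three main components; the redundant traction
    motors sit behind an AND gate, so all of them must fail."""
    return motor_converter or brake_resistor or all(traction_motors)


def interlock_failed(eds_states, ts_states):
    """The train interlock is lost if any EDS or any TS has failed."""
    return any(eds_states) or any(ts_states)


# Three cars with four EDSs each, and two TSs on each driving motor car.
healthy_eds = [eds_failed(False, False, False, False) for _ in range(12)]
healthy_ts = [ts_failed(False, False, [False, False]) for _ in range(4)]
```

Setting a single limit switch to failed in one EDS is enough to make `interlock_failed` return `True`, whereas a single failed traction motor is not, because of the redundancy.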

B. CASE STUDY DATA AND INFORMATION
The analysis began with data and information extraction to determine the prior probabilities for the primary event nodes of the critical components of both the EDS and TS using a criticality analysis. The root nodes for the components were defined to establish the node data types as continuous interval, discrete real, Boolean logic, ranked, and labeled. For components with continuous-interval nodes, the time-to-failure, $\tau$, was assumed to follow an exponential distribution with constant failure rate $\lambda$, so that the probability of surviving to operation time $t$ was evaluated as $e^{-\lambda t}$. Under this constant failure rate assumption, the NPTs for the components with continuous interval nodes, such as the EDCU, drive mechanism, traction motor, brake resistor, and motor converter, were computed as shown in Table 2. The temperature sensor was represented as a Boolean logic data type residing in either the 'On' or 'Fail' state. The BNs containing the JT structures of each node of the EDS and TS components were then constructed. The next step was to recalculate the NPT approximations (i.e., marginal posterior probabilities) over the current discretized domains for each component node. For the discrete variables (Boolean algebra), this was accomplished by propagating the discrete BN of each node to compute the approximate marginal posterior probability density functions, $f_X$, using (1)-(4).
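The exponential assumption for the continuous-interval nodes can be illustrated numerically. Because the operator's failure data are not presented, the failure rate in the usage note is purely hypothetical, as is the function name.

```python
import math


def survival_probability(failure_rate, operation_time):
    """Probability that an exponentially distributed time-to-failure exceeds
    the operation time: exp(-failure_rate * operation_time)."""
    if failure_rate < 0.0 or operation_time < 0.0:
        raise ValueError("failure rate and operation time must be non-negative")
    return math.exp(-failure_rate * operation_time)
```

For a hypothetical component with a failure rate of 0.0002 per hour, `survival_probability(0.0002, 500.0)` evaluates $e^{-0.1}$, the probability that the component is still functional at $t = 500$ h.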
Other synthetic nodes such as the EDS evaluation node (Eva), EDS car type failure, overall EDS failure, traction motor evaluation (TM_Eva), continuous traction evaluation (Cont_Eva), car type traction subsystems, reliability states, and overall TS failure were introduced as child nodes to allow for the mixture of continuous interval and Boolean logic nodes.

V. RESULTS AND DISCUSSION
The traction door interlock failure analysis was conducted using AgenaRisk, a Bayesian network software package for risk analysis and decision-making [58]. Table 2 summarizes the NPT and prior information used for each node. To evaluate the use of the proposed DBD-SPN modeling technique, we considered two test scenarios: (i) in Test Scenario 1, there was no interlock failure (normal system operation) after an operation time of t = 500 h, and (ii) in Test Scenario 2, the limit switch, power supply for the DMC1 EDS, and temperature sensors of DMC1 and DMC2 were considered to have failed after an operation time of t = 500 h.

A. TEST SCENARIO 1 (NORMAL OPERATION, NO FAILURE REPORTED)
The hybrid probability-based dynamic model was analyzed to establish whether a failure occurred in either the EDS or TS components that could contribute to a train interlock failure. The high-level DBD node model configurations for the EDS and TS based on the three EMU cars are shown in Figs. 6 and 7. In Test Scenario 1, neither the EDS nor TS reported any failures during an operation time of t = 500 h. Given the evidence (no failures), the time-to-failure for the binary child node (i.e., the evaluation of Eva) was established to be greater than the operational time t; thus, it was considered to be active. With Eva active, it is expected that the EDCU and drive mechanism were also functional. Furthermore, in light of this evidence, the limit switch and power supply were considered to be 'On'. All the EDSs in the three-car EMU had the same conditions. Similarly, considering the absence of symptomatic evidence, the binary child nodes of TM_Eva and Cont_Eva were considered active. Additionally, the temperature sensor and reliability states were considered to be 'On', meaning that the TS on DMC1 was considered to be functional. The model was run for 50 iterations, and the resulting probability risk graphs are shown in blue for the EDS and TS in Figs. 8 and 9, respectively. It can be seen that in the case of normal operation, represented by Test Scenario 1, after operating for 500 h, the EDS and TS are 'On' with overall posterior probabilities of 84.33% and 73.406%, respectively. Therefore, there was no failure from the DBD to send as a token to the SPN, and the SPN sequence failure analysis thus provided no notification.

B. TEST SCENARIO 2
In Test Scenario 2, the EDS limit switch and power supply of car DMC1 were considered faulty during an operation time of t = 500 h, as were the temperature sensors for the TS on both DMC1 and DMC2. Given this new evidence, the limit switch and power supply nodes for car DMC1 were updated to the 'Fail' state.
The prior information for the primary nodes and binary nodes remained the same as in Test Scenario 1 in Section V.A, including the posterior PDF. Considering the new evidence, the TS temperature sensors in both DMC1 and DMC2 were updated to the 'Fail' state. The model was run for 50 iterations, and the resulting probability risk graphs are shown in green for the EDS and TS in Figs. 8 and 9, respectively. The resulting posterior failure density function for both the EDS and TS during the operation time of t = 500 h was evaluated as 100% 'Fail'. Therefore, the failure of either or both the EDS and TS resulted in a train interlock failure, which initiated the SPN process for failure notification.
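The effect of entering 'Fail' evidence on the parent nodes can be illustrated with a toy discrete network. The sketch below is not the AgenaRisk model: it assumes only two Boolean parents, hypothetical prior failure probabilities, and a deterministic OR gate for the EDS node, and it computes the posterior by plain enumeration.

```python
from itertools import product
from typing import Dict

# Hypothetical priors P(component = 'Fail'); not values from Table 2.
PRIOR_FAIL: Dict[str, float] = {"limit_switch": 0.02, "power_supply": 0.01}

def eds_fails(states: Dict[str, bool]) -> bool:
    """Assumed deterministic OR gate: the EDS fails if any parent has failed."""
    return any(states.values())

def posterior_eds_fail(evidence: Dict[str, bool]) -> float:
    """P(EDS = 'Fail' | evidence), enumerating all parent states consistent
    with the evidence (True means the component is in the 'Fail' state)."""
    names = list(PRIOR_FAIL)
    numerator = 0.0
    denominator = 0.0
    for combo in product([False, True], repeat=len(names)):
        states = dict(zip(names, combo))
        if any(states[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the observed evidence
        weight = 1.0
        for name, failed in states.items():
            p_fail = PRIOR_FAIL[name]
            weight *= p_fail if failed else 1.0 - p_fail
        denominator += weight
        if eds_fails(states):
            numerator += weight
    return numerator / denominator

no_evidence = posterior_eds_fail({})
scenario_2 = posterior_eds_fail({"limit_switch": True, "power_supply": True})
print(f"P(EDS fails), no evidence: {no_evidence:.4f}")
print(f"P(EDS fails), both parents observed failed: {scenario_2:.1f}")
```

Under these assumptions, observing both parents in the 'Fail' state drives the EDS node to 'Fail' with certainty, which mirrors the 100% 'Fail' outcome reported for Test Scenario 2; the real model also contains continuous interval nodes that this toy example omits.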

C. FAILURE NOTIFICATION USING THE SPN FROM TEST SCENARIO 2
With both subsystems failing in Test Scenario 2, the failure notification sequence was initiated based on the resultant posterior PDF, which was treated as a token in the EDS and TS. An abstract of the SPN model was constructed (Fig. 10) with two tokens in places P_1 and P_2 after an operation time of t = 500 h with EDS and TS failures. As previously indicated, the failure of either or both the EDS and TS will always lead to a train interlock failure. The firing rates (failure rates) of the transitions for the EDS and TS were evaluated using historical failure data for λ_EDS and λ_TS, with values of 0.00006 and 0.00003, respectively. The immediate failure transition λ_3 was considered to be 0.01.
To identify the events and compute the probability of occurrence of the sequence of transitions leading to a traction door interlock failure, the reachability tree of the SPN model was constructed (Fig. 11) to include four markings: M_1, M_2, M_3, and M_4. Given the failure scenarios for both the EDS and TS, the initial marking M_1 on the reachability tree could be modeled by firing all possible transitions enabled in all markings reachable from M_1. The list of markings for the SPN shown in Fig. 11 is presented in Table 3, in which the markings that show a value of 1 in place P_3 […] cut sets) T_1, T_2, or both. In the case of a train interlock failure related to the EDS, transition T_1 fires, and after maintenance repair is conducted, the immediate transition T_3 will equally fire to reset to normal working conditions. Thus, the transition sequence of the train interlock failure due to EDS failure was determined by T_1 T_3.
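The idea of firing every enabled transition from every reachable marking can be sketched as a breadth-first search. The net below is a hypothetical simplification (three places, with T3 simply removing the token from P3 to stand in for the reset), so it is not expected to reproduce the four markings of Table 3 or the exact structure of Fig. 10.

```python
from collections import deque
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]  # token counts for places (P1, P2, P3)

# Hypothetical net: each transition maps to (tokens consumed, tokens produced).
TRANSITIONS: Dict[str, Tuple[Marking, Marking]] = {
    "T1": ((1, 0, 0), (0, 0, 1)),  # EDS failure moves the P1 token to P3
    "T2": ((0, 1, 0), (0, 0, 1)),  # TS failure moves the P2 token to P3
    "T3": ((0, 0, 1), (0, 0, 0)),  # reset removes one token from P3 (assumed)
}

def enabled(marking: Marking, consume: Marking) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(m >= c for m, c in zip(marking, consume))

def fire(marking: Marking, consume: Marking, produce: Marking) -> Marking:
    """Return the marking obtained by firing one transition."""
    return tuple(m - c + p for m, c, p in zip(marking, consume, produce))

def reachability(initial: Marking) -> Dict[Marking, List[Tuple[str, Marking]]]:
    """Breadth-first exploration of all markings reachable from `initial`."""
    edges: Dict[Marking, List[Tuple[str, Marking]]] = {}
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        edges[marking] = []
        for name, (consume, produce) in TRANSITIONS.items():
            if enabled(marking, consume):
                successor = fire(marking, consume, produce)
                edges[marking].append((name, successor))
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return edges

graph = reachability((1, 1, 0))
for marking, successors in graph.items():
    print(marking, "->", successors)
```

Transition sequences such as T1 T3 can then be read off as paths through `graph`. The search terminates here because the assumed net is bounded; an unbounded net would need a coverability construction instead.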
Similarly, the transition sequence of train interlock failure due to TS failure was established as T_2 T_3. Now, suppose that based on Test Scenario 2, the temperature sensor of the TS failed during the operation time at t = 500 h, causing a train interlock failure. The transitional probabilities can be computed by identifying the sequence E = M_2 M_3, M_3 M_4. Similarly, the sequence for a failed EDS during the operation time is E = M_1 M_3, M_3 M_4. The next step is to compute the approximate time interval between the time at which the subject transition in the sequence can fire and the time at which it fired. The approximate duration of the stay in state j was computed by first summing all transition firing rates and then dividing each rate by this sum to obtain the weight w_j, as shown in Table 4. Then, the approximate time at which the transition could fire, τ_j, was computed as τ_j = w_j × T, where T = 500 h, as shown in Table 4. Next, the transitional probabilities of train interlock failure occurring due to a failure of the EDS, TS, or both were computed using (6)-(11), where the superscript k in d_i^k indicates the index of transition events and the subscript i indicates the number of events in E, which was previously defined as E = M_1 M_3, M_3 M_4 and E = M_2 M_3, M_3 M_4 for an EDS and a TS failure, respectively. This indicates that E has two events for either an EDS or a TS failure; thus, i = 1, 2. Therefore, the sequence probabilities f_ij^n that began in marking i and developed into an interlock failure in marking j after n transitions in a given sequence are listed in Table 5.
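The normalization step described above (w_j as each firing rate divided by the sum of all rates, then τ_j = w_j × T) can be sketched as follows. The three rates and T = 500 h are the values quoted in the text; the dictionary keys and function name are illustrative, and the printed figures are not claimed to match the rounding used in Table 4.

```python
from typing import Dict

# Firing rates quoted in the text for the EDS, TS, and immediate transitions.
RATES: Dict[str, float] = {"T1": 0.00006, "T2": 0.00003, "T3": 0.01}
OPERATION_TIME_H = 500.0  # T in the text

def normalized_weights(rates: Dict[str, float]) -> Dict[str, float]:
    """w_j = rate_j / sum of all rates."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one firing rate must be positive")
    return {name: rate / total for name, rate in rates.items()}

weights = normalized_weights(RATES)
# tau_j = w_j * T: approximate time at which each transition could fire.
durations = {name: w * OPERATION_TIME_H for name, w in weights.items()}

for name in RATES:
    print(f"{name}: w = {weights[name]:.6f}, tau = {durations[name]:.3f} h")
```

Because the weights sum to one, the durations sum to the full operation time, so the much larger rate of the immediate transition T3 dominates the split.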
The probability that the train interlock failure notification process could fail in 500 h was estimated using (20) as ∑_{n=1}^{2} f^n = 0.36266. In light of the new evidence provided in Test Scenario 2, the probability of successfully detecting a train interlock failure caused by EDS and/or TS components during an operation time of 500 h was therefore estimated to be 1 − 0.36266 = 0.63734 ≈ 64%.
This result demonstrates that the hybrid DBD-SPN dynamic probability-based model provides an effective and rapid failure analysis and diagnosis model that can isolate and subsequently notify operators of the cause of a failure in a complex system such as that present in RS. The efficacy and precision of the model could be further improved by adding more components and performance nodes that contribute to subsystem failures in the Bayesian discretization model. However, a large number of nodes can adversely affect the processing speed and necessitate more iterations of the model. Therefore, a criticality analysis should be robustly implemented to identify the most critical components and functional parameters.
Additionally, the SPN notification accuracy could be improved by adding more firing transitions to minimize false alarms during the failure notification sequence. The more transitions added to the sequence, the lower the probability of sequence failure F_ij and the higher the probability of notification process success R_ij. However, the use of additional transitions leads to disadvantages such as increased cost and computation time to determine the transitional probabilities of the failure sequences.

VI. CONCLUSION
The novel DBD-SPN hybrid dynamic probability-based model for failure analysis and diagnosis for rolling stock systems was introduced in this study. It was shown that, based on the enhanced dynamic adaptive updating feature of DBD and the reachability tree characteristics derived from the SPN, complex engineering failures can be detected, isolated, and resolved in light of system evidence, such as symptoms, and that the sequences of the failures, including the minimum cut sets, can be identified.
Therefore, the proposed hybrid dynamic probability-based modeling technique provides a dynamic and comprehensive approach to isolating intermittent failure conditions such as common cause failure and sequence of failure events without physically simulating the failure scenarios of the physical complex engineering system via evidence-based forward and backward propagation. In this manner, the DBD-SPN technique establishes the true state of the subsystem conditions while minimizing false alarms through rigorous sequential notification and acknowledgment processes. The difference between the proposed hybrid dynamic probability-based method and traditional fault-finding methods lies in its ability to perform real-time fault detection and notification while simultaneously allowing for isolation without physically disassembling the system. Additionally, the unique characteristics required for identifying the sequential occurrence of failures via notification and adaptive evidence updating make the proposed DBD-SPN modeling technique robust and versatile for multiple failure diagnosis and detection in most industrial applications that are characterized by discrete and continuous data.
As is the case with all engineering failure analysis approaches, the accuracy of the proposed DBD-SPN approach relies on the quality and quantity of data and information fed into it, which may sometimes pose a limitation for novel systems that do not have historical data. However, as such information becomes available over time, the accuracy of earlier analyses can be continuously refined. While the results obtained from the case study provided some convincing findings as to the efficacy of the DBD-SPN model, considerable effort remains necessary to obtain quality component, subsystem, environmental, human, and functional data describing a complex system such as the RS system to which the proposed hybrid dynamic probability-based modeling technique was applied. Owing to the historical magnitude of human contribution to faults and errors within most safety-critical engineering operations, future studies are planned towards incorporating human reliability analysis concepts into the current model under specific human performance factor conditions that are related to repair and maintenance operations.
AKILU YUNUSA-KALTUNGO is currently a Senior Lecturer in reliability and maintenance engineering with the Department of Mechanical, Aerospace and Civil Engineering (MACE), The University of Manchester (UoM). He has research expertise in operational reliability, safety management, industrial maintenance, and asset management. He has recently served as the Principal Investigator for industrial safety research projects funded by Lloyds Register Foundation (LRF) as well as the Engineering and Physical Sciences Research Council (EPSRC). His LRF-funded Discovering Safety Research Project involved multidisciplinary collaborations between various academic institutions in Nigeria, such as the University of Lagos and the Yaba College of Technology; cement manufacturing organizations, such as LafargeHolcim, Nigeria, and Breedon Cement, U.K.; Nigerian governmental agencies, such as the Federal Ministry of Health, the Federal Ministry of Environment, the Council for Regulation of Engineering, the Institute of Safety Professionals, and so on; and non-governmental organizations (NGOs), with the aim of understanding how health indicators can be better incorporated into existing occupational health and safety performance indicators (OSH). His EPSRC-funded research project focuses on the development of safer drone-based inspection and stock level estimation mechanisms for confined spaces, to minimize human exposure to inherent hazards.
He also has extensive industrial experience with the world's largest manufacturer of building materials-LafargeHolcim PLC-in diverse roles, including a Health & Safety (H&S) Manager, a Maintenance Manager, a Training & Learning (T&L) Manager, a Reliability Engineer, a Mechanical Execution Engineer, a Plant Operations Champion, and a Project Core Team Lead for a multi-million GBP coal plant project. On this project, he will primarily contribute his expertise in H&S and T&L towards the successful realization of the project milestones. He has published over 45 technical articles (peer-reviewed top quartile journals and conference papers) with internationally reputable publishers. He has also published a book on condition monitoring of industrial assets.
He is currently a member of several industrial and academic committees and working groups, including the Institution of Mechanical Engineers (IMechE) Safety and Reliability Working Group (SRWG) and the British Standards Institute (BSI). He has delivered several national and international keynotes. He has reviewed several internationally reputable journal articles and engineering standards, such as those of the International Standards Organization and the International Electro-technical Commission.

He moved to academia, in 2005. He has 16 years of industrial experience. He joined The University of Manchester, U.K., in January 2007. Since 2007, he has widened his research activities. He is currently a Programme Director of the Reliability Engineering and Asset Management (REAM). He is also the Head of the Dynamics Laboratory. He is also the Head of the Structures, Health and Maintenance (SHM) Group, School of Engineering, The University of Manchester, Manchester, U.K. He was researching vibration and structural dynamics, including health monitoring techniques. He was also extensively involved in the development of a number of industrial innovative vibration-based techniques related to rotating machines, equipment, and structural components. He is an internationally renowned expert in vibration-based condition monitoring and maintenance of machines and structures. He is involved in and has solved a number of industrial vibration problems of machines and structures using in-situ vibration measurements and analysis in many plants over the last 31 years. He is the author of more than 250 publications (books, journals, conferences, books, edited book/conference proceedings, and technical reports). He has given a number of keynote lectures around the world.
Dr. Sinha received the prestigious Better Opportunity for Young Scientist in a Chosen Area of Science and Technology (BOYSCAST) Fellowship Award from the Department of Science and Technology, Government of India, in 1999, for outstanding work in solving a number of complex vibration-related problems to enhance plant reliability and reduce maintenance overhead of the nuclear power plants.