Fault-Recovery and Repair Modeling of Discrete Event Systems Using Petri Nets

Despite advances in automated manufacturing systems (AMSs), faults cannot be entirely avoided in a complex real system. A fault is one of the primary causes of failures that prevent some AMS operations from completing, and diagnosis is one of the most important steps in fault recovery and repair. This work develops a methodology for investigating the behavior of faults on the resources of failure-prone discrete event systems. We tackle the fault-tolerance problem and propose a method that allows the system to continue performing its duties while the failed resources are under repair and recovery. A failure-safe model is proposed together with a method for the recovery and repair of a faulty element that does not interrupt task processing when a fault occurs to some elements. A target element is an unreliable element that is prone to failure. Redundant elements replace the target elements and do the same work when faults occur to the target elements. After a faulty target element is repaired and recovered, its failure model is automatically replaced by its repaired model to indicate that the corresponding element has returned to work. The proposed method is tested using an application example. Compared with the results of existing studies in the literature, the simulation results show that the proposed method achieves higher resource utilization, higher throughput, and a shorter total time in system.


I. INTRODUCTION
Many contemporary applications in communication networks, transportation, synthesis, and modeling of industrial processes are characterized as dynamic evolution of discrete events. Systems that evolve in a discrete event manner are known as discrete event systems. Typically, systems modeled by a finite state automaton are such an example.
There are several methods for the controller design of discrete event systems, depending on the models used. However, most of these methods involve comprehensive simulations of, or searches over, the system behavior, which makes them impractical given the large number of states and events in such systems.
Fault tolerance, fault recovery, and fault repair are important issues that should be taken into account in many industrial processes [19], [28], [49]-[51], [55]. When a fault happens, the goal of a fault-tolerant controller is to diagnose and then repair it such that the system is recovered to its normal state. Thanks to their simplicity and intuitive graphical representation, Petri nets have an advantage over other modeling and simulation techniques: they enable a simple visualization of fault-recovery and repair systems [16], [51], [55], [57]. Petri nets can represent fault-recovery and repair systems in a top-down form at different levels of abstraction and detail, and they have a well-developed mathematical foundation that enables both quantitative and qualitative analysis of such systems [56]. Supervisory controllers are usually reconfigured or redesigned to make a system continue its operations after the detection of a fault. In [3], the authors introduce a framework for fault-tolerant monitoring of discrete event systems and present two specifications for non-faulty operations of the overall plant. In [4], the controller is designed as an independent supervisory system instead of being embedded into the main supervisory system. Thus, if there is a fault, the controller can be switched to a new controller designed for this particular case. Iordache and Antsaklis [13] present a method to rearrange the model. Miyagi and Riascos [14] use knowledge-based models and neural networks to deal with each fault.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
It is of great importance to maintain the integrity of a system in controller design, fault recovery, and fault repair. Keeping the system safe after a fault occurs requires knowledge of the system and its elements [10]. A fault can occur due to various causes such as internal system incidents, changes in environmental conditions, incorrect operation control behavior, and system design errors [5]. Nowadays, fault tolerance is considered in many manufacturing processes, as faults are almost inevitable in automated systems [1]. Faults are classified by their continuity as either persistent or intermittent [11]. A fault-tolerant controller allows a system to continue its operations under degraded specifications in the presence of intermittent faults, while it is difficult to do so with persistent faults (faults that prevent the system from performing a task).
In this article, we tackle the fault-tolerance problem in another way and propose a method that allows the system to continue performing its duties while actions are taken to repair and recover from the fault. Nazemzadeh et al. [9] introduce a method to deal with some of the most important issues in designing a fault-tolerant controller that preserves the safety of a system. Petri nets are used to model such processes, with places representing the status of target elements that are failure-prone. To make a system fault-tolerant, there should be a redundant element for each target element. When a fault occurs to a target element, the sub-model of its redundant element is used to replace the model of the faulty target element.
Zhou and DiCesare [56] present four adaptation methods for error recovery: input conditioning, alternate path, backward error recovery, and forward error recovery. These methods are used to augment the Petri net controller while ensuring some essential properties of the controller, i.e., liveness, boundedness, and reversibility. Fault detection is performed using watchdog timers that indicate a fault when a token stays in a place longer than a certain time. We observe that the resulting controlled system can recognize certain faults, but the system performance during faults may remain degraded since the system cannot carry out some tasks while the failed resources are being repaired and recovered. Liu et al. [16] propose a three-step robust deadlock control strategy for systems with unreliable elements. The strategy designs recovery subnets and monitors for some resource failures in the Petri net model. Normal and inhibitor arcs are inserted to connect the recovery subnets with the monitors. We observe that the main weakness of this strategy is that many recovery subnets with inhibitor arcs are added for unreliable resources, which leads to a high structural complexity relative to the initial model. In addition, the method does not guarantee that the system is able to continue performing its duties while the failed resources are under repair and recovery. Li et al. [57] propose elementary siphon-based robust controllers that can handle multi-type and multi-unit resource failures. We observe that this approach is not maximally permissive and has exponential complexity. Al-Ahmari et al. [51] propose a two-step robust deadlock control strategy for systems with unreliable and shared resources. The strategy designs a common recovery subnet based on colored Petri nets for all resource failures in the Petri net model to make the system reliable. We observe that this method does not ensure that the system is able to continue performing its duties while the failed resources are under repair and recovery.
Kaid et al. [55] propose a three-step deadlock control strategy for fault detection and treatment in systems with unreliable resources. The strategy is a hybrid approach that combines neural networks with colored Petri nets for the detection and treatment of faults. We observe that this method does not ensure that the system is able to continue performing its duties while the failed resources are under repair and recovery. Feng et al. [58] develop a deadlock prevention controller for AMSs that takes resource failures into consideration. The controller ensures that the system can handle all kinds of parts continuously through any one of their paths, even if one of the unreliable resources fails. They then use a one-step look-ahead method to develop a polynomial-complexity deadlock avoidance policy (DAP). They report that the controller is proved to be maximally permissive during the failure period of one resource. However, we observe that the controller requires a simulation-based tool to ensure the correctness, accuracy, and validity of the results.
In view of the above-mentioned limitations in [16], [51], [55]-[58], this work proposes a failure-safe model together with a method for the recovery and repair of a faulty element without a single element being shut down. When there are redundant elements to replace the target elements, these redundant elements are used to do the same work as the target elements do. After a faulty target element is repaired and recovered, its failure model is automatically replaced by its new model to indicate that the corresponding element has returned to the system.
Our contributions can be summarized as follows:
1. The proposed method does not require introducing inhibitor arcs or enumerating reachability graphs, which leads to low computational overhead;
2. The proposed method ensures that all part types can be processed continuously, no matter whether one or multiple unreliable resources fail;
3. A simulation-based tool is developed to ensure the correctness, accuracy, and validity of the proposed method, and comparisons are made with methods in the literature to verify its performance;
4. The proposed method can consider all unreliable resources in AMSs;
5. The proposed method has a simpler structure;
6. The proposed method can be applied to an unreliable complex Petri net model for AMSs.
The organization of this article is as follows. Section 2 discusses the preliminaries and fundamentals of Petri nets. In Section 3, we introduce the various system states in which a fault occurs and present how to switch their models to a new model for fault recovery and repair. In Section 4, the proposed method is explained using a practical example. Section 5 reports the simulation of the application example, and Section 6 provides the conclusion of the study.

II. PRELIMINARIES
We introduce some basics of Petri nets and their properties in this section, which are important in this article. We assume that readers are familiar with the fundamentals of Petri nets.
Definition 1: A basic Petri net (structure) is a four-tuple $N = (P, T, F, W)$, where $P$ and $T$ are finite, non-empty, and disjoint sets. $P = P_A \cup P_R \cup P_0$ is a set of places, where $P_A$ is the set of activity or operation places, $P_R$ is the set of resource places, and $P_0$ is the set of idle places. $T$ is a set of transitions with $P \cup T \neq \emptyset$ and $P \cap T = \emptyset$. $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation of the net, represented by arcs with arrows from places to transitions or from transitions to places. $W: (P \times T) \cup (T \times P) \to \mathbb{N} = \{0, 1, 2, \ldots\}$ is a mapping that assigns a weight to an arc: $W(x, y) > 0$ if $(x, y) \in F$, and $W(x, y) = 0$ otherwise, where $x, y \in P \cup T$. $F^{+}: (T \times P) \to \mathbb{N}$ is the output function, and $F^{-}: (P \times T) \to \mathbb{N}$ is the input function. The input and output functions can be tabulated and represented by matrices indexed by the place set $P$ and the transition set $T$. The incidence matrix $F$ is calculated as $F = F^{+} - F^{-}$. $T = T_c \cup T_{uc}$, where $T_{uc}$ is the set of uncontrollable transitions and $T_c$ is the set of controllable transitions [12]. For economy of space, markings and vectors are usually written in multiset (bag) or formal-sum notation. Accordingly, $\sum_{p \in P} M(p)\,p$ is used to denote the vector $M$. For example, in a net with $P = \{p_1, p_2, p_3, p_4, p_5, p_6\}$, a marking that puts four tokens in place $p_4$ and two tokens in place $p_6$ only is denoted as $M = 4p_4 + 2p_6$ instead of $(0, 0, 0, 4, 0, 2)$ [12].
In general, $(N, M_0)$ is called a net system or, when there is no ambiguity, simply a net. $N = (P, T, F, W)$ is referred to as an ordinary net if every arc has weight 1 (see Definition 4). The support of a vector $v$, written support($v$), is the set of places marked at $v$. In the above example, support($M$) $= \{p_4, p_6\}$.
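The formal-sum notation, the support function, and the incidence matrix can be illustrated with a short Python sketch. The helper names (`format_marking`, `support`, `incidence_matrix`) and the dictionary encoding of the weight function are illustrative assumptions, not part of the original formulation; place names are assumed to consist of the letter p followed by an index.

```python
def format_marking(marking):
    """Render a marking {place: tokens} as a formal sum such as '4p4 + 2p6'."""
    # Assumption: place names look like 'p4', so they can be sorted by index.
    marked = sorted((p for p, n in marking.items() if n > 0),
                    key=lambda p: int(p[1:]))
    terms = [f"{marking[p]}{p}" if marking[p] > 1 else p for p in marked]
    return " + ".join(terms) if terms else "0"


def support(marking):
    """Return the set of places that hold at least one token."""
    return {p for p, n in marking.items() if n > 0}


def incidence_matrix(places, transitions, weights):
    """Entry (p, t) is W(t, p) - W(p, t); a missing arc has weight 0."""
    return [[weights.get((t, p), 0) - weights.get((p, t), 0)
             for t in transitions]
            for p in places]


# The marking used as the example in the text: four tokens in p4, two in p6.
M = {"p1": 0, "p2": 0, "p3": 0, "p4": 4, "p5": 0, "p6": 2}
print(format_marking(M))
print(sorted(support(M)))
```

Storing only the arcs with positive weight keeps the encoding close to Definition 1, where the weight of a pair is zero exactly when the pair is not an arc.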
In an autonomous process, certain desirable properties that restrict the performance of a system are called specifications. Supervisory control aims at enforcing these pre-defined specifications. As a result, some states are permitted while others are prohibited. $M_R$ denotes the set of markings that can be reached by firing a finite sequence of transitions. Let $M_A \subseteq M_R$ be the set of permitted states and $M_F \subseteq M_R$ the set of prohibited states. The prohibited states fall into two groups [8].
Definition 3 [18] (Preset and Postset): $N = (P, T, F)$ is a net where $P$ is a finite place set, $T$ is a finite transition set, and $F$ is a set of arcs from a place to a transition or from a transition to a place. For all $x \in P \cup T$, a) ${}^{\bullet}x = \{y \mid y \in P \cup T \wedge (y, x) \in F\}$ denotes the preset of $x$; and b) $x^{\bullet} = \{y \mid y \in P \cup T \wedge (x, y) \in F\}$ denotes the postset of $x$. Similarly, given a node set $S \subset P \cup T$, the preset and postset of $S$ are defined as ${}^{\bullet}S = \cup_{x \in S}\, {}^{\bullet}x$ and $S^{\bullet} = \cup_{x \in S}\, x^{\bullet}$, respectively.
Theorem 1 [15]: Let $N = (P, T, F)$ be an acyclic Petri net (a Petri net that does not have directed circuits). If the net has no self-loop, it is pure or self-loop free, i.e., for all $t \in T$ and all $p \in P$, $|{}^{\bullet}t \cap t^{\bullet}| = 0$ and $|{}^{\bullet}p \cap p^{\bullet}| = 0$, where $|\cdot|$ denotes the cardinality of a set. Note that the incidence matrix of a pure net univocally determines its structure. In addition, one can trivially check that an acyclic net is pure.
We consider the model of an automated manufacturing system with unreliable resources proposed in [16], where $P_R$ denotes the set of unreliable resources in the system under consideration. A broken resource in a system can be fixed or repaired. Next, we introduce the notion of a recovery subnet.
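As a small illustration of presets, postsets, and the self-loop-free condition, the following Python sketch encodes the flow relation as a set of arcs. The function names and the example net are hypothetical and chosen only for this illustration.

```python
def preset(x, arcs):
    """All nodes y with an arc (y, x) in the flow relation."""
    return {y for (y, z) in arcs if z == x}


def postset(x, arcs):
    """All nodes z with an arc (x, z) in the flow relation."""
    return {z for (y, z) in arcs if y == x}


def is_pure(nodes, arcs):
    """A net is pure (self-loop free) if no node shares a preset and postset element."""
    return all(preset(x, arcs).isdisjoint(postset(x, arcs)) for x in nodes)


# Hypothetical net: p1 -> t1 -> p2 -> t2 -> p1.
arcs = {("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p1")}
nodes = {"p1", "p2", "t1", "t2"}

# Adding a place p3 that is both an input and an output of t1 creates a self-loop.
loop_arcs = arcs | {("p3", "t1"), ("t1", "p3")}
loop_nodes = nodes | {"p3"}

print(preset("t1", arcs), postset("t1", arcs))
print(is_pure(nodes, arcs), is_pure(loop_nodes, loop_arcs))
```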
Definition 4 [17]: A PN is said to be an ordinary PN if for any arc in the net its weight is 1.
Definition 5 [17]: If a transition is enabled, it can fire. Firing an enabled transition t at marking

III. FAULT-RECOVERY AND REPAIR
A faulty state of a system is the system failure resulting from a sequence of valid state transitions. A fault is an anomalous physical condition and a manifestation of a defect in a system, which may lead to a malfunction in the performance of the system.
Definition 6: A system or plant with target elements is fault-safe if, in the case that a fault occurs to a target element, the system can continue its operations by systematically and automatically replacing the model of the target element with the model of its corresponding redundant element.
Definition 7: Failure Recovery is a process that evolves from a faulty state to a normal operational state by restoring the faulty element.
A failure in a system means that a fault happens to a target element such that the system fails to perform some or all of its operations. To make a system fault-tolerant, we need to deal with such a failure so that all the operations can still be performed when it occurs. If a system enters a failed state, a controller should immediately and automatically respond and lead the system to a new state in which the system can continue its operations.
We assume that there is a redundant element as an alternative resource for performing the tasks originally designed for the target element when a fault occurs.
In this study, we investigate different situations of the unreliable target elements and provide a Petri net model that ensures the fault-safe property by replacing a path that leads the system to a non-fault-safe state with another path that keeps the system fault-safe until the former becomes safe again. In the developed Petri net, places model the different statuses of each device and element in the system. Therefore, we need to define places that describe all possible states of the target elements and prevent the activation of the fault states represented by these places when a fault occurs. We call the places for the target elements and for their corresponding redundant elements target places and redundant places, and denote them by $p_{Ai}$ and $p_{Ei}$, respectively.
We now deal with the failure representation problem in the framework of place/transition nets. We assume that failures occur through the firing of transitions that cannot be observed, while repair and maintenance operations are carried out through the firing of transitions that represent observable legal behaviors.
Definition 8: Let $t_{Si}$ be a transition representing the occurrence of an unobservable and uncontrollable failure event that may occur to a target element, and let $t_{Ri}$ be a transition representing the observable and controllable event of the repair and maintenance operation for a failed target element.
Let $e_F$ denote a failure event and $e_R$ the repair and recovery event for a failed element. When $e_F = 1$ and $e_R = 0$, transition $t_{Si}$ is enabled; when $e_F = 0$ and $e_R = 1$, transition $t_{Ri}$ is enabled. Let $\{f_1, f_2, \ldots\}$ be the set of faults that might occur in the system. We model a fault $f_i$ in this set by an unobservable fault transition (see Definition 8).
Consider the Petri net model in Fig. 1, which represents a system with two similar elements: a target element represented by the place $p_{Ai}$ and its corresponding redundant element represented by $p_{Ei}$. In this model, the redundant element is connected to the system and is therefore a part of it. However, the aim of the model is to make only one of these two elements operate at a time. Since operating only one of them is enough for the system to run properly, the purpose of the redundant element is to replace the target element only when there is a fault to the target element. Thus, a controller is needed that can automatically activate the redundant element in the event of a failure and deactivate it when the target element has been repaired and recovered.
The controller responsible for activating and deactivating transitions so that the system acts properly when there is a fault to the target element is presented in Fig. 2. In this subnet, $p_{Fi}$ represents the occurrence of a failure to a target element and is marked when the target element fails. If $p_{REi}$ is marked, the target element is under maintenance. $t_{FIi}$ and $t_{FBi}$ are transitions that indicate the occurrence of a failure to the target element, $t_{Si}$ is the transition that describes the time to failure, and $t_{Ri}$ is the transition that represents the repair and recovery of a target element.
The recovery subnet can be added to the Petri net model as shown in Fig. 3.
Definition 9: An arc $f$ connecting a place $p$ to a transition $t$ is said to be a test arc if a token can be released through $t$ into $t^{\bullet}$ when $M(p) \geq W(f)$ while the marking $M(p)$ remains unchanged.
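The behavior of a test arc can be sketched in Python as follows. This is a minimal illustration under assumed names: `consume` holds ordinary input arcs, `test` holds test arcs, `produce` holds output arcs, and the place names `p_RE`, `p_in`, and `p_out` are hypothetical.

```python
def is_enabled(marking, consume, test):
    """Enabled if every ordinary input place and every tested place has enough tokens."""
    needed = list(consume.items()) + list(test.items())
    return all(marking.get(p, 0) >= w for p, w in needed)


def fire(marking, consume, test, produce):
    """Fire a transition: ordinary inputs lose tokens, tested places are left unchanged."""
    if not is_enabled(marking, consume, test):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p, w in consume.items():
        new[p] = new.get(p, 0) - w
    for p, w in produce.items():
        new[p] = new.get(p, 0) + w
    return new


# p_RE is connected by a test arc: it must be marked, but keeps its token.
M0 = {"p_RE": 1, "p_in": 1, "p_out": 0}
M1 = fire(M0, consume={"p_in": 1}, test={"p_RE": 1}, produce={"p_out": 1})
print(M1)
```

Keeping the tested places in a separate mapping makes the difference from an ordinary input arc explicit: both contribute to the enabling condition, but only ordinary inputs change the marking.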
Definition 10: Let $p_{Ai} \in P_R$ be an unreliable element or resource in a system. The set of places that hold $p_{Ai}$, called the holders of $p_{Ai}$, is $H(p_{Ai}) = \{p \mid p \in P_A,\ p \in {}^{\bullet\bullet}p_{Ai} \cap P_A \neq \emptyset\}$.
Definition 11 [16], [51]: Let $p_{Ai} \in P_R$ be an unreliable element. For a system with failure-prone target elements, there is a failure repair, maintenance, and recovery process. To make such a system fault-safe, controllers should be designed for the different cases in order to handle any occurrence of a failure as well as its repair, maintenance, and recovery process.

A. THE CASE WITH NO TARGET ELEMENT FAILED
When there is no failure, a failure-prone target element is in operation, while its corresponding redundant element is idle. In this case, we model the system using the Petri net shown in Fig. 4, where places $p_{R1}$ and $p_{R2}$ are the resource places for a target element and its corresponding redundant element, respectively, with $M(p_{R1}) = 1$ and $M(p_{R2}) = 1$, denoting that each has a capacity of one. Note that the place that models the recovery and repair of the target element is not marked, i.e., $M(p_{REi}) = 0$, which prevents the redundant element from being put into operation. The meaning of the places and transitions in Fig. 4 is presented in Table 1.
Theorem 2: The Petri net $(N_{FR}, M_{FRo})$ with an unreliable element is live if the unreliable element does not fail.
Proof: A failure of the unreliable element can be observed when transition $t_{Ai} \in T_o$ fails to fire. Since there is no failure to the unreliable element, the net is live and the transition $t_{Ai} \in T_o$ is also live. This means that there are no dead transitions.

B. THE CASE WITH ONE TARGET ELEMENT FAILED
A failure can happen in one of the two possible conditions of a target element: a failure can occur when a target element is in operation or is idle. These two situations are discussed as follows, respectively.

1) FAILURE OF AN ELEMENT IN OPERATION
If the target element breaks down while processing a task, it cannot continue its operation, and the task must be switched to the redundant element once the latter is activated. The Petri net should describe this fact by disabling some transitions and changing the marking of some places. At the same time, to make the system failure-safe, the corresponding redundant element must be activated by automatically enabling some transitions and changing the marking of some places in the Petri net model. Therefore, we need to synthesize a supervisor for the Petri net model such that the system acts correctly. For this case, the Petri net model is shown in Fig. 5, in which the target element fails while it is in operation. Thus, $p_{R1}$, which represents the target element, is not marked, while the operation place $p_{Ai}$ is marked, i.e., $M(p_{R1}) = 0$ and $M(p_{Ai}) = 1$. To make the system failure-safe, we need to automatically remove the token in $p_{Ai}$ and activate the corresponding redundant element such that the operation can be continued.
The Petri net model shown in Fig. 5 presents a policy to control the system such that it can continue its operations correctly. At the marking shown in Fig. 5, transition $t_{Si}$ is enabled, and its firing deposits a token into $p_{Fi}$. As a result, $M(p) > 0$ for all $p \in {}^{\bullet}t_{FBi}$, such that $t_{FBi}$ is enabled. Firing $t_{FBi}$ indicates the occurrence of a failure to the target element. This takes the token in $p_{Ai}$ away and deposits a token into both $p_1$ and $p_{REi}$. Then, by firing $t_{FBi}$, the controller takes the token in $p_{Ai}$ away to stop its operation and adds a token to place $p_{Ei}$ to activate the corresponding redundant element, which performs the task that would have been performed by the target element. With $p_{REi}$ being marked, the target element is under repair, i.e., the repair mode is activated for the target element. After the target element is repaired and returns to work, transition $t_{Ri}$ is enabled and its firing deposits a token into $p_{R1}$. This process is as follows. After some time, the token in $p_{REi}$ triggers $t_{Ri}$ since, at this time, $M(p) > 0$ holds for all $p \in {}^{\bullet}t_{Ri}$. The firing of $t_{Ri}$ releases a token into $p_{R1}$, which represents the target element, such that the target element can perform the tasks again because $M(p) > 0$ for all $p \in {}^{\bullet}t_{Ai1}$, since $M(p_{REi}) > 0$. Suppose that the system contains a self-loop between $t_{Si}$ and $p_{Si}$. Then transition $t_{Si}$ releases a token into place $p_{Fi}$ when a failure occurs to the target element, according to a given rate based on statistical analysis. The proposed control policy ensures the safe operation of the system every time a target element fails.
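The token flow described for a failure during operation can be traced with a small Python token game. This is a sketch under stated assumptions: the arc structure in `NET` is reconstructed from the prose and not taken from Fig. 5, and in particular the transition `t_Ei1` (with a test arc from the recovery place) is assumed here to be the step that moves the waiting task to the redundant element.

```python
# Each transition maps to (consumed places, tested places, produced places).
# Assumption: the arcs below are inferred from the text, not read from Fig. 5.
NET = {
    "t_Si":  ({"p_Si": 1}, {}, {"p_Si": 1, "p_Fi": 1}),            # failure event, self-loop on p_Si
    "t_FBi": ({"p_Fi": 1, "p_Ai": 1}, {}, {"p_1": 1, "p_REi": 1}),  # failure while in operation
    "t_Ei1": ({"p_1": 1, "p_R2": 1}, {"p_REi": 1}, {"p_Ei": 1}),    # assumed: redundant element starts
    "t_Ri":  ({"p_REi": 1}, {}, {"p_R1": 1}),                       # repair and recovery completed
}


def enabled(marking, name):
    """A transition is enabled if all consumed and tested places hold enough tokens."""
    consume, test, _ = NET[name]
    return all(marking.get(p, 0) >= w for p, w in {**consume, **test}.items())


def fire(marking, name):
    """Return the marking reached by firing one enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"{name} is not enabled")
    consume, _, produce = NET[name]
    new = dict(marking)
    for p, w in consume.items():
        new[p] = new.get(p, 0) - w
    for p, w in produce.items():
        new[p] = new.get(p, 0) + w
    return new


# Target element busy (p_Ai marked, p_R1 empty); redundant element idle (p_R2 marked).
M0 = {"p_Si": 1, "p_Fi": 0, "p_Ai": 1, "p_1": 0,
      "p_R1": 0, "p_R2": 1, "p_REi": 0, "p_Ei": 0}

M_failed = fire(fire(M0, "t_Si"), "t_FBi")   # failure detected, task handed back to p_1
M_backup = fire(M_failed, "t_Ei1")           # redundant element takes over the task
M_repaired = fire(M_backup, "t_Ri")          # target element returns to p_R1
print(M_failed, M_backup, M_repaired, sep="\n")
```

Before the failure, `t_Ei1` is not enabled in this sketch because the recovery place is empty, which mirrors the requirement that only one of the two elements operates at a time.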

2) FAILURE TO AN IDLE TARGET ELEMENT
Suppose that a target element has just finished a task and suddenly it fails before it starts performing the next task. In this case, we can think that the target element fails when it is idle. For this case, we also need to control the system such that the tasks to be processed by this target element can be processed without interruption.
For this case, since the target element is idle, place $p_{R1}$ is marked as shown in Fig. 6, i.e., $M(p_{R1}) > 0$. Transition $t_{Si}$ is triggered to indicate that the target element ($p_{R1}$) has failed, and its firing deposits a token into $p_{Fi}$. Since $M(p) > 0$ for all $p \in {}^{\bullet}t_{FIi}$, $t_{FIi}$ fires and releases a token into $p_{REi}$. This activates the corresponding redundant element by enabling the transition $t_{Ei1}$.
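The two failure cases differ only in which failure transition becomes enabled once the failure place is marked. The following Python sketch makes that distinction explicit. The input places listed in `FAILURE_INPUTS` are assumptions inferred from the text, and the three markings are hypothetical states of a single-capacity target element.

```python
# Assumed input places of the two failure transitions (inferred from the prose).
FAILURE_INPUTS = {
    "t_FBi": {"p_Fi", "p_Ai"},  # failure while the target element is in operation
    "t_FIi": {"p_Fi", "p_R1"},  # failure while the target element is idle
}


def enabled_failure_transitions(marking):
    """List the failure transitions whose input places are all marked."""
    return sorted(t for t, inputs in FAILURE_INPUTS.items()
                  if all(marking.get(p, 0) > 0 for p in inputs))


busy = {"p_Fi": 1, "p_Ai": 1, "p_R1": 0}     # failure signalled during an operation
idle = {"p_Fi": 1, "p_Ai": 0, "p_R1": 1}     # failure signalled while idle
healthy = {"p_Fi": 0, "p_Ai": 0, "p_R1": 1}  # no failure signalled

for name, marking in [("busy", busy), ("idle", idle), ("healthy", healthy)]:
    print(name, enabled_failure_transitions(marking))
```

With a capacity of one, the operation place and the resource place are never marked together, so at most one of the two transitions is enabled in any of these states.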
In this case, when the target element is recovered, the same procedure as in the case of a failure to an element in operation can be followed to activate the target element and deactivate the redundant element. When the system has finished some tasks with the redundant element and more tasks need to be processed while the target element is still under repair, there is still a token in $p_{REi}$, which keeps the redundant element activated. Note that the dashed arc between a place $p$ and a transition $t$ is called a test arc. In Definition 11, $p_{REi}$ is said to be the recovery place of $p_i \in P_X$. Transitions $t_{FRi}$ and $t_{Ri}$ indicate that an unreliable
Proof: When the target element fails, the processing of a task modeled by $p_{Ai}$ cannot be continued, such that the token in $p_{Ai}$ should be removed. At the same time, there is no token in $p_{R1}$, which represents the availability of the target element. Thus, transitions $t_{Ai1}$ and $t_{Ai2}$ are disabled. When the target element fails, $p_{REi}$ gets a token, representing that the target element is under repair. Then, transition $t_{Ei} \in T_o$ is enabled, which means that the redundant element is activated to replace the failed unreliable target element. Thus, $t_{Ei} \in T_o$ is used to replace $t_{Ai} \in T_o$ in the model. No matter whether the unreliable target element is idle or busy, the recovery subnet $(N_{FR}, M_{FRo})$ always activates the redundant element when the unreliable element fails. Thus, the Petri net $(N_{FR}, M_{FRo})$ always remains live under the controlled fault recovery and repair process.

IV. A PRACTICAL EXAMPLE
The example is a polishing system for the repainting of metal objects, shown in Fig. 7. It comprises two air current pumps (AP) that use grains of sand to remove rust and the previous coating from the surface of the metal, one of which is used as redundancy; a washing machine (WM) with more than one spray pistol to remove oil and grease from the metal surface; a conveyor belt (CB) to transport the metal from the polishing and washing area to the place of painting; and two paint spray pumps (SP) to spray paint onto the surface of the metal, one of which is used as redundancy. The Petri net model for this system is shown in Fig. 8. Table 2 presents the meaning of the places and transitions in Fig. 8.
Assume that the main air pump, the main motor for driving the conveyor belt, and the main paint spray pump are failure-prone. When one of them fails, the following actions should be taken.
• When the main air pump fails, it should be replaced by the spare air pump until it is repaired and returned to work.
• When a failure occurs to the main motor of the conveyor belt, the system should switch to the redundant motor until the main motor is repaired.
• When there is a fault to the main paint spray pump, it should be replaced by the spare paint spray pump.
If the main air pump fails, its operation has to be stopped and the redundant one is put into operation until the main air pump is repaired. Therefore, in this case, the controller should prevent place $p_3$ from getting a token if there is no token in it (representing that the metal is ready for being processed), or it should remove the token from $p_3$ if there is one. If the main pump fails before it starts to work ($p_3$ is not marked and $p_{R1}$ is marked), then $t_{S1}$ is triggered and its firing releases a token into $p_{F1}$, thus enabling $t_{FI1}$, which in turn removes the token from $p_{R1}$ and delivers a token into $p_{RE1}$. In this way, the redundant air pump is activated to replace the main air pump and, at the same time, the main air pump is under repair and maintenance.
If the main pump fails when it is in operation ($p_3$ is marked and $p_{R1}$ is not marked), $t_{S1}$ fires and releases a token into $p_{F1}$, enabling $t_{FB1}$, which in turn removes the token in $p_3$ and puts a token into both $p_2$ and $p_{RE1}$. In this way, the redundant air pump is activated and the main air pump is put under repair and maintenance. The Petri net for realizing such a process is given in Fig. 9.
When the main air pump is repaired and recovered, t R1 fires and puts a token into p R1 , which activates the main air pump and, at the same time, turns off the redundant air pump.
Similarly, when the main conveyor belt motor or the main paint spray pump fails in different states, similar Petri net models can be constructed to control the system such that it is failure-safe, just as in the case of the air pump. As discussed above, we obtain the fault-recovery and repair controller by synchronizing the fault-recovery and repair models with the major plant model. The final controlled Petri net model of the application example is shown in Fig. 10. This controlled system is fault-safe with respect to a failure of any unreliable element (the main air pump, the main motor of the conveyor belt, or the main paint spray pump). For the controlled system shown in Fig. 10, we consider a marking $M$ with support($M$) $= \{p_{R1}, p_{R3}, p_{R5}\}$; for other markings $M$ with a different support($M$), the behavior of the controller is explained in Table 3.

V. SIMULATION OF THE APPLICATION EXAMPLE
Simulation is an important way to evaluate performance and verify the validity of a proposed method. Visual Object Net ++ is a software tool for modeling, simulating, and evaluating discrete event systems based on Petri net models. To validate the proposed method, the above application example is simulated using Visual Object Net ++. In the simulation, we consider failures that happen to the target elements after a period of operation of the example system, over different time periods. When a fault occurs to a target element, task processing is transferred to its redundant element, which continues to perform tasks until the target element is repaired. After the failed target element is recovered, the redundant element is turned off and the processing of tasks is returned to the target element. Fig. 11 shows the different states of a target element before the failure, after the failure, during repair, and after repair.
Assume that failures occur to some elements in the system at different times during the estimated time period. The simulation of the application example is based on the model given in Fig. 12.
In the model given in Fig. 12, the transition t si represents the self-loop for triggering failures in the simulation. The time periods for the simulation are given in Table 4.
To verify and validate the proposed method and to compare its time performance, we have simulated it alongside the approaches of Liu et al. [16], Al-Ahmari et al. [51], and Kaid et al. [55]. The results are summarized in Table 5. Table 5 and Fig. 13 show the results in terms of resource utilization, throughput (finished cars), and total time in system (throughput time for each car). As shown in Fig. 13, the proposed model achieves a higher resource utilization, a higher throughput, and a shorter total time in system than the other approaches in the literature. Thus, the proposed method is accurate and can potentially be applied to other instances.
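The three performance criteria can be computed from a simulation log as sketched below in Python. The formulas are the conventional definitions and are assumed here, since the exact computation is not spelled out above; the log values are made up for illustration and are not the data behind Table 5 or Fig. 13.

```python
def utilization(busy_intervals, horizon):
    """Fraction of the simulation horizon during which a resource is busy."""
    return sum(end - start for start, end in busy_intervals) / horizon


def throughput(exit_times, horizon):
    """Number of parts finished within the horizon, per unit of time."""
    return sum(1 for t in exit_times if t <= horizon) / horizon


def mean_time_in_system(entry_exit_pairs):
    """Average time between a part entering and leaving the system."""
    return sum(leave - enter for enter, leave in entry_exit_pairs) / len(entry_exit_pairs)


# Hypothetical log (time units are arbitrary): (entry, exit) per part,
# and (start, end) busy intervals of one resource.
parts = [(0, 10), (5, 20), (12, 30)]
busy = [(0, 8), (10, 18), (20, 26)]
horizon = 40

print(utilization(busy, horizon))
print(throughput([leave for _, leave in parts], horizon))
print(mean_time_in_system(parts))
```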

VI. CONCLUSION
This work proposes an approach for investigating the behavior of faults on the resources of AMSs that are prone to failure. We construct a redundant Petri net model for a failure-prone target element in order to synthesize a failure-safe controller for the system, which allows the system to continue performing its duties when a target element fails and is under repair or maintenance. The proposed method is tested using a practical example, and the results are compared with the studies in the literature. The main advantages of the proposed method are as follows: (1) In simulation, the proposed method achieves higher resource utilization, higher throughput, and a shorter total time in system than the studies in the literature [16], [51], and [55]. (2) The proposed method ensures that all part types can be processed continuously, no matter whether one or multiple unreliable resources fail. (3) The proposed method can handle all unreliable resources in AMSs. (4) The proposed method can be applied to an unreliable complex Petri net model for AMSs. (5) A simulation-based tool is developed to ensure the correctness, accuracy, and validity of the proposed method.
Our future work is to extend this method to systems with uncontrollable events.